What should be considered when building a sensor data platform for COs?
Description
A sensor data platform is a comprehensive framework designed to collect, process, store, manage, and utilize data generated by sensing devices deployed in various environments. Key components include the ingestion of data coming from distributed networks of sensors capturing physical parameters, mechanisms for data processing, storage solutions for organizing and archiving data, visualization tools for data representation, and security measures to protect data integrity and privacy. Additional features may include integration with other sources for contextual enrichment and scalability to accommodate growing data volumes.
Why is this relevant?
Sensor data platforms allow collecting, processing, and sharing data from distributed networks of sensing devices. However, there are many choices to make when designing and implementing such platforms, on both the technical and the non-technical side. In this section, we will mostly discuss conceptual choices, such as the ability of the platform to communicate with other platforms (interoperability), or of users to access data freely and easily (findability and accessibility). These aspects matter: the proliferation of platforms in recent years shows both the diversity of approaches and the great potential of the data on these platforms when these aspects are well thought through. Other very relevant aspects fall on the technical side, such as the transmission protocols to use, the type of database, and API choices, all of which will significantly affect the technical development effort and the future maintainability of the platform.
How can this be done?
When designing a sensor data platform, several technical and non-technical aspects must be weighed carefully so that development effort is used efficiently, the platform remains maintainable, and conceptual design choices make the data more findable and interoperable.
These considerations apply to each of the layers presented in our section about sensor platforms, namely: data ingestion, data storage, data processing, data visualization and API.
Data Ingestion: designing appropriate data ingestion workflows and formats will have an impact on compatibility, on interoperability across various devices and systems, and even on the power consumption of the device itself. We can first distinguish between the transmission channel (for instance, whether data is transmitted via wired or wireless communications) and the transmission protocol, i.e., the (standard) communication protocol that defines how data is encoded while it travels over that channel. Typically, the transmission channel will be wireless, via Wi-Fi, LoRa or GSM. This is discussed in “What technical aspects to consider when choosing an air quality sensing device?”. An important decision to make at this point concerns the transmission protocol. Using a standard protocol that is well tested, robust, scalable and well documented is a good choice. Examples are HTTP (Hypertext Transfer Protocol) and MQTT (Message Queuing Telemetry Transport); another promising alternative is CoAP (Constrained Application Protocol). Another very important aspect to keep in mind is the message format, i.e., how the device encodes the data that it sends. Using JSON is considered good practice and is a good option for encoding sensor data. Other options, likely more efficient but less standard, are MessagePack and Protocol Buffers (Protobuf). These choices can help reduce development effort in the long term if they are thought through from the beginning.
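As an illustration of the message format discussed above, the following is a minimal sketch of how a device could encode a reading as JSON and how the platform could validate it on arrival. The field names (`device_id`, `timestamp`, `readings`) and the function names are assumptions made for this example, not part of any standard or existing platform, and the transport (HTTP, MQTT, CoAP) is deliberately left out.

```python
import json
from datetime import datetime, timezone

# Hypothetical field names, chosen for this example only.
REQUIRED_FIELDS = {"device_id", "timestamp", "readings"}


def encode_reading(device_id, readings, timestamp=None):
    """Serialize one set of sensor readings as a compact JSON string."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    message = {
        "device_id": device_id,
        "timestamp": timestamp.isoformat(),  # ISO 8601 with UTC offset
        "readings": readings,
    }
    # Compact separators save a few bytes per message on constrained links.
    return json.dumps(message, separators=(",", ":"))


def decode_reading(payload):
    """Parse a JSON payload on the platform side and reject malformed messages."""
    message = json.loads(payload)  # raises ValueError on invalid JSON
    if not isinstance(message, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_FIELDS - message.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    # Raises ValueError if the timestamp is not in ISO 8601 format.
    message["timestamp"] = datetime.fromisoformat(message["timestamp"])
    return message
```

Validating messages at the ingestion boundary keeps malformed data out of the storage layer, whichever protocol carries the payload.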
Data Storage: designing robust data storage and management systems is critical for handling large volumes of sensor data efficiently. Considerations include selecting suitable databases (e.g., relational databases, NoSQL databases), implementing data partitioning and indexing strategies for performance optimization, and ensuring data security and privacy compliance. Consider implementing data replication and backup systems as well, to avoid data loss. Data storage is a specialized field, and guaranteeing that sensor data is stored safely requires a certain level of experience. Whatever the team's experience, consider using well-established databases with long-term support, such as PostgreSQL, InfluxDB, or similar options. See the useful resources section for some guides on the topic.
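To make the indexing point above concrete, here is a minimal sketch of a readings table with a time index. It uses SQLite from the Python standard library purely as a stand-in so the example runs anywhere; a production platform would more likely use PostgreSQL or a time-series database. The table, column and function names are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for a production database; all names are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    device_id   TEXT NOT NULL,
    measured_at TEXT NOT NULL,  -- ISO 8601 timestamp in UTC
    parameter   TEXT NOT NULL,  -- e.g. 'pm25'
    value       REAL NOT NULL,
    PRIMARY KEY (device_id, parameter, measured_at)
);
CREATE INDEX IF NOT EXISTS idx_readings_time ON readings (measured_at);
"""


def open_store(path=":memory:"):
    """Open the database and create the schema if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_reading(conn, device_id, measured_at, parameter, value):
    """Insert one reading; duplicates of the same key are silently ignored."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO readings VALUES (?, ?, ?, ?)",
            (device_id, measured_at, parameter, value),
        )


def readings_between(conn, device_id, parameter, start, end):
    """Return (measured_at, value) rows in [start, end), oldest first."""
    cursor = conn.execute(
        "SELECT measured_at, value FROM readings "
        "WHERE device_id = ? AND parameter = ? "
        "AND measured_at >= ? AND measured_at < ? "
        "ORDER BY measured_at",
        (device_id, parameter, start, end),
    )
    return cursor.fetchall()
```

The composite primary key makes repeated writes of the same reading harmless, which is useful when devices retransmit messages. Storing timestamps as uniformly formatted ISO 8601 strings keeps text comparison consistent with chronological order in this sketch.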
API: the API is the interface that computers will use to interact with the data platform(s). This interface will typically have various endpoints, where data from different tables in your database can be presented. The API needs to present data efficiently, interact correctly with your data storage, and be understandable enough for other people to use it. Databases typically support CRUD operations (Create, Read, Update, Delete), and the API should guard each of these operations with authentication and authorization. Search endpoints help users find information programmatically. If building from scratch, it is practical to implement a standard API format, such as STA+, which can contribute towards interoperability.
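The idea of guarding every CRUD operation with authentication and authorization can be sketched as follows. The roles, permissions and tokens are hypothetical placeholders. A real platform would verify tokens through its own authentication service and would typically do so in web framework middleware rather than with in-memory dictionaries.

```python
# Hypothetical roles and the CRUD operations each one may perform.
PERMISSIONS = {
    "anonymous": {"read"},
    "contributor": {"create", "read"},
    "admin": {"create", "read", "update", "delete"},
}

# Hypothetical API tokens mapped to roles (placeholder for a real auth service).
TOKENS = {
    "token-contributor": "contributor",
    "token-admin": "admin",
}

CRUD_OPERATIONS = {"create", "read", "update", "delete"}


def authenticate(token):
    """Resolve a token to a role; unknown or missing tokens are anonymous."""
    return TOKENS.get(token, "anonymous")


def is_allowed(token, operation):
    """Return True if the caller identified by `token` may perform `operation`."""
    if operation not in CRUD_OPERATIONS:
        raise ValueError(f"unknown operation: {operation}")
    role = authenticate(token)
    return operation in PERMISSIONS[role]
```

In this sketch anonymous callers can only read, which matches an open-data setup where published data is public but writes are restricted.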
Data processing / Data quality assurance: data processing involves applying quality control algorithms (e.g., removing noise and outliers), as well as calibration/correction algorithms (e.g., correcting for the effects of air humidity in air quality sensing, or for interference from other gases). Processing is usually done using statistics and other advanced methods, such as machine learning models. Consider using well-known programming languages and libraries for processing your data. Some languages have open libraries and repositories and are free to use, while others use proprietary licenses.
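As one possible example of a quality control step, the sketch below flags outliers using the modified z-score based on the median absolute deviation (MAD). This is only one of many techniques, chosen here because it is robust to the outliers it is trying to detect. The function name and the default threshold of 3.5 (a commonly quoted value) are assumptions for illustration and should be tuned to the sensor and pollutant in question.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return a list of booleans, True where a value is a likely outlier.

    Uses the modified z-score: 0.6745 * (x - median) / MAD.
    """
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # More than half of the values are identical, so the score is
        # undefined; this sketch conservatively flags nothing.
        return [False] * len(values)
    return [abs(0.6745 * (v - center) / mad) > threshold for v in values]


# Usage: drop flagged values from a short series of readings.
readings = [10.0, 11.0, 10.5, 9.8, 10.2, 250.0, 10.1]
flags = flag_outliers(readings)
cleaned = [v for v, is_outlier in zip(readings, flags) if not is_outlier]
```

In practice it is often preferable to store the flag alongside the raw value rather than deleting the reading, so that the original data remains available.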
Data visualization and presentation: presenting sensor data in a user-friendly and intuitive manner is essential for facilitating data interpretation and decision-making. Data visualization tools and dashboards enable users to explore and interact with sensor data through charts, graphs, maps, and other visualizations, allowing them to gain insights and derive meaning from the data.
In addition, we list below some factors that cut across all of these layers:
Scalability and performance: building a scalable sensor data infrastructure capable of handling growing data volumes and user demands is essential.
Data loss prevention: ensuring that data is stored safely is key, regardless of what may happen to any of the components in your platform. In particular, the data ingestion part of your platform should be carefully designed so that data persists even if your database is down for a certain period (either intentionally or unintentionally). Consider implementing intermediate buffers for temporary data storage, as well as safeguard mechanisms that check whether any component of the platform is down and restore it if necessary.
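The intermediate buffer mentioned above can be sketched as a small store-and-forward queue. This is a minimal illustration, assuming incoming messages are text payloads and that a hypothetical `store` callable writes one payload to the main database and raises an exception when the database is unavailable. SQLite is used for the buffer only because it ships with Python; pass a file path instead of `:memory:` to make the buffer survive restarts.

```python
import sqlite3


class IngestionBuffer:
    """Keeps incoming messages until the main database has accepted them."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        with self._conn:
            self._conn.execute(
                "CREATE TABLE IF NOT EXISTS pending "
                "(id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
            )

    def add(self, payload):
        """Persist a message in the buffer before anything else happens to it."""
        with self._conn:
            self._conn.execute(
                "INSERT INTO pending (payload) VALUES (?)", (payload,)
            )

    def pending_count(self):
        return self._conn.execute("SELECT COUNT(*) FROM pending").fetchone()[0]

    def flush(self, store):
        """Forward buffered messages in arrival order; stop at the first failure.

        Returns the number of messages that were forwarded and removed.
        """
        rows = self._conn.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        forwarded = 0
        for row_id, payload in rows:
            try:
                store(payload)
            except Exception:
                break  # database still down: keep this and later messages
            with self._conn:
                self._conn.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            forwarded += 1
        return forwarded
```

A message is only deleted after `store` has returned, so a crash between the two steps can lead to a message being delivered twice. The main database should therefore tolerate duplicate writes.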
Interoperability and integration: facilitating interoperability and seamless integration with existing systems and applications will maximize the utility of sensor data. This involves adhering to open standards and APIs (e.g., OGC SensorThings API+) for data exchange and integration, as well as implementing middleware and data transformation layers for interoperability across heterogeneous environments.
Traceability: establishing mechanisms for traceability ensures the provenance and lineage of sensor data throughout its lifecycle. By capturing metadata such as data source, acquisition timestamp, and processing history, organizations can trace the origins of sensor data and track any transformations or modifications it undergoes. This enhances transparency and accountability, enabling stakeholders to verify the accuracy, reliability, and integrity of sensor data and supporting data-driven decision-making processes. Additionally, traceability facilitates compliance with regulatory requirements and quality assurance standards, providing assurance regarding the authenticity and trustworthiness of sensor data for various applications and use cases.
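A minimal sketch of how processing history could be captured alongside a reading is shown below. The class and field names are assumptions for illustration. A real platform would more likely persist this metadata in its database, possibly following an established provenance vocabulary.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TracedReading:
    """A reading that carries its source, acquisition time and processing history."""

    device_id: str
    acquired_at: str  # ISO 8601 acquisition timestamp
    value: float
    history: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Apply a processing step to the value and record what it changed."""
        before = self.value
        self.value = func(self.value)
        self.history.append(
            {"step": step_name, "before": before, "after": self.value}
        )
        return self

    def fingerprint(self):
        """SHA-256 digest of the record, usable to detect later modifications."""
        body = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Usage: apply a hypothetical offset correction and keep a trace of it.
reading = TracedReading("device-001", "2024-01-01T12:00:00+00:00", 20.0)
reading.apply("offset_correction", lambda v: v - 1.5)
```

Because every step records the value before and after, the raw measurement can always be recovered from the first history entry.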
Contextualization: incorporating contextual information into sensor data enhances its relevance and usability by providing insights into the conditions under which measurements were made. Contextual factors such as location, environmental conditions, devices’ configuration, and operational context help interpret sensor readings and understand their significance. By capturing and integrating contextual information alongside sensor data, organizations can improve data interpretation, analysis, and decision-making processes. Contextualization also supports data quality assessment and validation by enabling anomaly detection and error correction based on contextual discrepancies. Furthermore, contextual information facilitates data fusion and integration with other datasets, enabling richer and more comprehensive analysis for various applications and domains.
FAIR data principles: adhering to the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles ensures that sensor data is managed in a manner that maximizes its usability and impact. This involves implementing metadata standards and conventions to make sensor data findable and accessible through search engines and data catalogues. Additionally, adopting standardized data formats and protocols promotes interoperability, enabling seamless integration with other datasets and analysis tools. Ensuring data quality and documentation facilitates data reuse by providing clear descriptions of data sources, collection methods, and processing steps. By following FAIR data principles, organizations can enhance the discoverability, accessibility, and usability of sensor data, fostering collaboration and knowledge sharing within the scientific community and beyond.
Security and privacy: ensuring the security and privacy of sensor data is paramount to protect against unauthorized access, data breaches, and privacy violations. This involves implementing encryption, access control mechanisms, authentication, and auditing features to safeguard data integrity and confidentiality throughout the data lifecycle.
Regulatory compliance: ensuring compliance with relevant regulations and standards governing data privacy, security, and environmental monitoring is essential. This includes adhering to laws such as GDPR (General Data Protection Regulation) and ensuring data collection and storage practices align with industry-specific regulations.
Data governance and ownership: establishing clear policies and procedures for data governance, including defining data ownership, access rights, and responsibilities, is crucial. This involves addressing questions about who owns the data collected by devices, who has permission to access it, and how it can be used and shared.
Licensing and Intellectual Property: addressing licensing and intellectual property considerations is crucial for ensuring legal compliance and protecting proprietary rights. This involves defining the licensing terms and conditions for sensor data, including permissions for use, redistribution, and modification. Choosing appropriate licensing models, such as open-source licenses or proprietary licenses, depending on the organization's goals and requirements, helps clarify the rights and responsibilities of data users and contributors while mitigating legal risks and conflicts. Additionally, establishing mechanisms for tracking and managing intellectual property rights associated with sensor data and related technologies supports innovation and collaboration while safeguarding against unauthorized use or infringement.
Sustainability and long-term viability: planning for the long-term sustainability and viability of the sensor data infrastructure is critical. This includes assessing the financial and resource requirements for ongoing maintenance, upgrades, and support, as well as exploring opportunities for revenue generation or cost-sharing models to sustain the infrastructure over time.
Open Data policies: incorporating open data principles into the sensor data infrastructure promotes transparency, collaboration, and innovation. This involves adopting policies that encourage the publication of sensor data in open formats and under open licenses, facilitating broader access and reuse by researchers, developers, and the public. Implementing data portals or APIs that provide easy access to sensor data and metadata further enhances transparency and fosters data-driven decision-making.
CARE Data Principles: incorporating CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) data principles into the sensor data infrastructure ensures that data collection and usage prioritize collective benefits, respect individual rights, and uphold ethical standards. This involves designing data governance frameworks that prioritize collective benefits and societal well-being over individual or organizational interests. Providing individuals with the authority to control their data empowers them to make informed decisions about data sharing and usage, enhancing trust and accountability. Implementing measures to ensure data responsibility, such as data stewardship and accountability mechanisms, promotes ethical data practices and mitigates risks associated with data misuse or harm. By adhering to CARE data principles, organizations can foster a culture of data stewardship, integrity, and respect for individual rights, contributing to the ethical and responsible use of sensor data for societal benefit.
Useful resources
How to design a good API, bytebytego blog: https://blog.bytebytego.com/p/how-to-design-a-good-api
What database should I use? bytebytego blog: https://blog.bytebytego.com/p/which-database-shall-i-use