Over the next several years, the connected devices that make up the internet of things (IoT) are expected to generate ever larger volumes of data. IT teams must decide where to process and store this data: on edge systems, in central data centers, on cloud platforms or in any combination of the three.
When starting to plan IoT storage, IT managers should focus on seven key areas related to their data: What type of data is it, and how will it be collected, managed, secured, processed, analyzed and stored?
What types of data will be collected?
To design the storage infrastructure, IT needs to understand what types of data the organization will collect. For example, the data might include video files from surveillance cameras, streaming data from machinery sensors or environmental data from field equipment. An IT team might have to contend with multiple types of IoT data and systems, each with unique storage requirements.
In addition, IoT data might be structured, semistructured or unstructured, and it might be generated across vast geographic regions using different collection technologies. Many IoT devices are unidirectional, streaming data only from the device to the collection point, but others are bidirectional so that they can also receive commands, for example to support smart controls.
IT must thoroughly understand the nature of the data and the devices that produce it.
How much data will be collected?
Just as important as the types of data is the amount being collected. Whether the IoT devices are commercial or industrial, they will generate large data sets, and those volumes will continue to grow. IT needs a clear sense of what to expect, in both the short and long term, for every type of data collected.
However, projecting data amounts is a difficult task. Much depends on how extensively an organization plans to embrace IoT technologies and how those technologies will fit into its long-range goals and objectives.
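One way to make such a projection concrete is to estimate each data type separately and apply an assumed growth rate. The Python sketch below assumes simple compound annual growth and no deletion; the stream names, starting volumes and growth rates are hypothetical placeholders, not benchmarks.

```python
from dataclasses import dataclass

@dataclass
class DataStream:
    """One type of IoT data, described by hypothetical planning inputs."""
    name: str
    tb_per_year_now: float  # volume collected in the current year, in TB
    annual_growth: float    # assumed growth rate, e.g. 0.3 for 30% per year

def project_cumulative_tb(stream: DataStream, years: int) -> float:
    """Total TB collected over `years` years, assuming compound growth and no deletion."""
    total = 0.0
    volume = stream.tb_per_year_now
    for _ in range(years):
        total += volume
        volume *= 1 + stream.annual_growth
    return total

# Placeholder figures for illustration only.
streams = [
    DataStream("surveillance video", tb_per_year_now=200.0, annual_growth=0.3),
    DataStream("machinery sensors", tb_per_year_now=5.0, annual_growth=0.5),
]

for s in streams:
    print(f"{s.name}: {project_cumulative_tb(s, 1):.1f} TB after 1 year, "
          f"{project_cumulative_tb(s, 5):.1f} TB after 5 years")
```

Keeping a separate estimate per data type makes it easier to revisit one assumption, such as the growth of video data, without redoing the whole plan.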
At the same time, planners must keep in mind the anticipated explosion of IoT devices in homes and in businesses and how that will affect the amount of data their organizations will collect. According to a recent IDC report, by 2025, IoT devices worldwide will generate about 40 zettabytes of real-time data. That comes to about 40 billion TB.
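The conversion in that figure can be checked directly, assuming decimal SI units, in which 1 ZB is 10^21 bytes and 1 TB is 10^12 bytes:

```latex
40\ \text{ZB} = 40 \times 10^{21}\ \text{bytes}
  = \frac{40 \times 10^{21}}{10^{12}}\ \text{TB}
  = 40 \times 10^{9}\ \text{TB}
  \approx 40\ \text{billion TB}
```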
Where will data be collected and stored?
IT teams can’t plan their IoT storage infrastructures if they don’t know where the data will be collected and stored. And those factors depend in part on the types and amounts of data, as well as the IoT devices themselves.
Some devices might have enough capacity to store their own data locally, while others will immediately stream it to edge gateways, centralized data centers, public cloud platforms or any combination of the three. Data collected at edge systems will also likely move to centralized platforms for permanent storage.
IT teams must understand exactly how data will move from the IoT devices to the collection points and then to centralized storage systems. For example, data might be streamed from the devices to a hyper-converged infrastructure (HCI) appliance at the network's edge and then transferred to cloud-based object storage. In this case, IT must ensure that the HCI appliance has enough storage capacity to support local operations and that the cloud storage platform offers a service-level agreement that meets the expected capacity and performance requirements.
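A rough sizing check for the edge appliance in this example might multiply the ingest rate by how long data must stay local, including any period when the link to cloud object storage is unavailable. The function below is a hypothetical back-of-the-envelope sketch: the parameter names, the outage buffer and the 30% headroom default are assumptions, not vendor guidance, and it ignores data reduction such as compression or deduplication.

```python
def edge_capacity_tb(ingest_mb_per_s: float,
                     local_retention_hours: float,
                     uplink_outage_hours: float,
                     headroom: float = 0.3) -> float:
    """Rough usable capacity, in decimal TB, for a hypothetical edge appliance.

    Data is assumed to stay local for `local_retention_hours` for edge
    processing, and everything that arrives while the uplink to cloud object
    storage is down for `uplink_outage_hours` must also be buffered.
    `headroom` adds a safety margin (0.3 = 30%).
    """
    hours_on_disk = local_retention_hours + uplink_outage_hours
    raw_tb = ingest_mb_per_s * 3600 * hours_on_disk / 1_000_000  # MB -> TB
    return raw_tb * (1 + headroom)

# Example: 50 MB/s of incoming data, one day of local retention
# and a one-day outage buffer.
print(f"{edge_capacity_tb(50, 24, 24):.2f} TB")  # prints "11.23 TB"
```

Treating the outage buffer as additive is deliberately conservative; a team with a reliable uplink might size it smaller.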
Where will data be processed and analyzed?
To fully understand the movement of the data, IT must identify where data will be processed and analyzed. Those operations can occur across multiple nodes in the IoT infrastructure. Some IoT devices might process the data to some degree before it’s transmitted, while others might stream the raw data to an edge system, where it will be cleansed, transformed and sent to a data center for analysis.
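The cleanse, transform and forward pattern described above can be sketched in a few lines. This Python example assumes a hypothetical reading format of (device_id, Unix timestamp, temperature in Fahrenheit), an arbitrary plausibility range and one-minute averages as the summary sent onward; a real pipeline would choose these per data type.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reading format: (device_id, unix_timestamp_s, temperature_f),
# where temperature_f may be None if the sensor reported nothing.

def cleanse(readings):
    """Drop readings with missing values or values outside a plausible range."""
    return [r for r in readings if r[2] is not None and -40.0 <= r[2] <= 250.0]

def transform(readings):
    """Convert temperatures from Fahrenheit to Celsius."""
    return [(dev, ts, (temp_f - 32) * 5 / 9) for dev, ts, temp_f in readings]

def aggregate_per_minute(readings):
    """Average each device's readings within 60-second windows."""
    buckets = defaultdict(list)
    for dev, ts, temp_c in readings:
        buckets[(dev, ts - ts % 60)].append(temp_c)
    return {key: mean(values) for key, values in sorted(buckets.items())}

raw = [
    ("pump-1", 0, 212.0),
    ("pump-1", 30, None),   # missing value: dropped
    ("pump-1", 45, 32.0),
    ("pump-1", 61, 999.0),  # implausible value: dropped
    ("pump-1", 70, 50.0),
]
summary = aggregate_per_minute(transform(cleanse(raw)))
print(summary)  # {('pump-1', 0): 50.0, ('pump-1', 60): 10.0}
```

Here five raw readings become two summary rows, which shows why processing at the edge can reduce both the data transmitted and the storage needed centrally.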
Sometimes both processing and analytics must happen close to the IoT devices to avoid latency, which is especially important for real-time analytics. The raw data might then be archived to a cloud platform or sent to the data center. In short, data can be processed and analyzed at any point in the IoT infrastructure.
IT must assess where all operations will occur to ensure that adequate storage resources are available at each stage, taking into account which operations will be conducted entirely in memory. Any combination is possible, depending on the circumstances, and IT might need to support multiple scenarios.
What data will be retained?
IT must also identify which data needs to be retained and for how long, as well as which data can be discarded. The IoT storage architecture might have to accommodate raw, staged, transformed, aggregated and analyzed data, and each type will have different storage requirements.
Raw data might need to be archived indefinitely, or stored only long enough to perform the transformations and analytics needed to generate alerts, key performance indicators or full reports. Each type of data will likely have its own retention requirements, and almost any combination is possible.
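A retention policy can be expressed as a simple table keyed by data type. In this sketch, the tier names follow the raw, staged, transformed, aggregated and analyzed categories discussed here, but every retention period is a hypothetical placeholder; actual periods would come from business needs and any applicable regulations. Many object storage services also offer lifecycle or expiration rules that can enforce this kind of policy on the storage side.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical retention periods per data tier; None means "keep indefinitely".
RETENTION = {
    "raw": timedelta(days=30),
    "staged": timedelta(days=7),
    "transformed": timedelta(days=90),
    "aggregated": timedelta(days=3 * 365),
    "analyzed": None,
}

def is_expired(tier: str, created: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if an object in `tier` is older than its retention period."""
    period = RETENTION[tier]
    if period is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created > period

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
created = datetime(2024, 4, 1, tzinfo=timezone.utc)  # 61 days earlier
for tier in RETENTION:
    print(tier, is_expired(tier, created, now))
```

With these placeholder periods, a 61-day-old object would be expired in the raw and staged tiers but kept in the others.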
How will data be managed and stored?
The tools and technologies used to manage data can also affect how storage is set up. Relational databases, NoSQL data stores and object storage platforms all have different storage requirements. Factors such as scaling, redundancy, high availability and disaster recovery can further affect how data is managed and stored. In addition, IT teams planning to retrofit their existing infrastructure to accommodate IoT could face even greater challenges.
Other important factors include the types and amounts of data, whether the data must be processed in real time and the nature of the analytics that will be performed. I/O patterns can vary significantly, as can throughput and latency requirements, and all of these are further complicated by how widely operations are distributed. IT teams will likely need to take several different approaches to meet all their IoT storage and related requirements, which can complicate data management and, in turn, make storage planning that much more difficult.
What are the security and privacy requirements for IoT storage?
As the number of IoT devices increases, so do the volume and dispersion of data, creating a data governance nightmare for IT teams used to the orderly data models of traditional data centers. The distributed nature of the IoT architecture opens attack surfaces in every direction. Organizations might have full control over the devices and collection processes, as can be the case with industrial IoT. But such control isn't always possible, especially if an organization wants to realize IoT's full potential.
Unfortunately, security has taken a back seat to other considerations in many IoT discussions, and there are still no clear security standards. For the most part, IT must figure out on its own how to implement IoT systems that protect sensitive data while complying with applicable regulations and standards.
Some compliance regulations require that certain types of data be retained for an extended period, while others put the responsibility squarely on the organization to protect all personally identifiable information, regardless of cost. Data governance is an organizationwide concern, and storage is one of the most important factors in it.
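One common way to limit exposure of personally identifiable information in stored IoT data is pseudonymization: replacing direct identifiers with keyed hashes before records reach long-term storage. The sketch below uses Python's standard `hmac` and `hashlib` modules. The `owner_email` field name and the key are hypothetical, key management is out of scope, and pseudonymized data may still count as personal data under some regulations, so this is not by itself a compliance measure.

```python
import hashlib
import hmac

def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed HMAC-SHA256 hex digest.

    The same identifier and key always yield the same digest, so records can
    still be grouped per owner without storing the raw identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def prepare_for_storage(record: dict, key: bytes,
                        pii_fields: tuple = ("owner_email",)) -> dict:
    """Return a copy of `record` with the listed PII fields pseudonymized."""
    out = dict(record)
    for field in pii_fields:
        if field in out:
            out[field] = pseudonymize(out[field], key)
    return out

# Hypothetical key; in practice it would come from a secrets manager and be
# kept separate from the pseudonymized data.
KEY = b"example-key-do-not-use"
stored = prepare_for_storage(
    {"device_id": "meter-17", "owner_email": "alice@example.com", "kwh": 3.2}, KEY)
print(stored["device_id"], len(stored["owner_email"]))  # meter-17 64
```

Using a keyed HMAC rather than a plain hash means someone without the key cannot simply hash candidate email addresses to reverse the mapping.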