The Democratization of Data
With the democratization of data, customers are now gaining access to data generated by the devices they own and making smarter business decisions.
The Internet of Things (IoT) means many things to many people. Fundamentally, it refers to the proliferation of sensors and interfaces that connect things that aren’t computers to the internet. To a retailer, IoT may mean passive RFID tags, GPS receivers and embedded sensors that track merchandise from supply chain through inventory to point of sale and even to the customer. To a city it can mean smart parking meters, smart trash cans or microphones to detect and locate gunshots.
Just as important as connected devices is the infrastructure to ingest, store, merge and analyze the signals they produce. Advances in big data technologies and in scalable, public and private cloud solutions make the cost of entry low. Cloud providers like Microsoft Azure and Amazon Web Services offer scalable storage and analysis platforms with an ever-increasing set of interfaces. The GE Predix platform is purpose-built for industrial IoT. Providers such as Rackspace offer dedicated products and services that allow customers to securely deploy IoT solutions with minimal up-front investment.
Often overlooked amidst the IoT buzz is the data. Until recently, most device manufacturers jealously guarded the data by storing it in proprietary databases or housing it themselves. End customers were granted access to aggregate versions of the data, to pre-built reports or to views of the data, but not to the data itself.
All of that is changing. The democratization of data is playing out in business strategy meetings, vendor negotiations and even the courtroom. Customers are increasingly demanding and winning access to data generated by the devices they own, arguing “our device, our data.”
Device manufacturers should see this democratization as an opportunity rather than a threat. First off, if their interfaces are any good, customers will keep using them. After all, if it ain’t broke, don’t fix it. More fundamentally, the data becomes more valuable when it’s merged with other data. Data on water flow through storm drains, for instance, is more valuable to a municipality when it’s merged with weather data, repair data, soil assays and so on. Data on customer movement through a shopping mall is more valuable merged with data on store locations, store sales and events. By encouraging customers to use data more extensively, device manufacturers secure their position by making switching more disruptive.
They can even pick up some professional services revenue along the way.
Customers need to develop the technological expertise, human resources and corporate culture to take advantage of the new data nirvana. An environment housing data from many sources is often called a data lake. Structured and unstructured data from IoT devices and back office systems flow into the data lake and insights flow out. Only it’s not that easy. Put a lot of data in one place and you run the risk of building a data swamp. The hallmarks of a data swamp are:
- Data organized by source system rather than business concept
- Inconsistent descriptions, identifiers and units of measure
- Inappropriate or poorly understood tools for merging and analyzing the data
- Poorly defined business objectives
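To make one of these hallmarks concrete, here is a minimal Python sketch, with entirely hypothetical field names and values, of inconsistent identifiers and units of measure: two source systems describe the same truck differently, and a small canonicalization step reconciles them so the records can be merged rather than sit in separate silos.

```python
# Hypothetical records: two source systems describe the SAME haul truck
# with different identifiers and different units -- data swamp ingredients.
maintenance_record = {"asset_id": "TRK-0042", "tire_pressure_psi": 102.0}
telemetry_record = {"unit": "42", "tire_pressure_kpa": 703.3}

PSI_TO_KPA = 6.89476  # standard conversion factor

def canonicalize(record):
    """Map source-specific fields onto one canonical schema (truck_id, kPa)."""
    out = {}
    if "asset_id" in record:                      # maintenance-system style
        out["truck_id"] = int(record["asset_id"].split("-")[1])
        out["tire_pressure_kpa"] = record["tire_pressure_psi"] * PSI_TO_KPA
    else:                                         # telemetry-system style
        out["truck_id"] = int(record["unit"])
        out["tire_pressure_kpa"] = record["tire_pressure_kpa"]
    return out

a = canonicalize(maintenance_record)
b = canonicalize(telemetry_record)
# Both records now refer to the same truck in the same units,
# so downstream analysis can merge on truck_id.
```

In a real data lake this normalization happens at much larger scale, but the principle is the same: agree on canonical identifiers and units before the data lands, or the swamp forms on its own.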
Building good data lakes is hard work. Successful deployments tend to follow certain guidelines:
- Lead with outcomes: start with well-defined business goals to bound the problem and deliver results quickly.
- Move from a product focus to a platform focus: build the data, analysis and human foundation for deploying custom solutions.
- Move from a system-centric view of data to a concept-centric view: organize data around specific concepts, like product or customer, and shape it for analysis.
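As an illustration of the last guideline, the sketch below (all field names and values are hypothetical) pivots rows from two system-centric sources into one record per truck, the kind of concept-centric "shape" the guideline describes:

```python
# Hypothetical system-centric rows: each source system keys data its own way.
sensor_rows = [
    {"truck": 7, "metric": "engine_temp_c", "value": 96.5},
    {"truck": 7, "metric": "oil_pressure_kpa", "value": 310.0},
    {"truck": 9, "metric": "engine_temp_c", "value": 101.2},
]
maintenance_rows = [
    {"truck": 7, "last_service": "2016-03-02"},
    {"truck": 9, "last_service": "2016-01-18"},
]

def shape_by_truck(sensors, maintenance):
    """Pivot system-centric rows into one concept-centric record per truck."""
    trucks = {}
    for row in sensors:
        trucks.setdefault(row["truck"], {})[row["metric"]] = row["value"]
    for row in maintenance:
        trucks.setdefault(row["truck"], {})["last_service"] = row["last_service"]
    return trucks

shapes = shape_by_truck(sensor_rows, maintenance_rows)
# shapes[7] now carries sensor readings and maintenance history side by side,
# ready for concept-level analyses such as engine-health modeling.
```

The analyst asking "how is truck 7 doing?" now queries one record instead of joining a dozen source systems by hand.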
Data shaping is time-consuming, but critical because it delivers a platform for new use cases. A case study highlights this point. A leading mining company sought to improve the reliability and utilization of its fleet of haul trucks. The haul fleet is bristling with sensors that, collectively, generate thousands of readings per second. The data is streamed in real time to a data historian.
Other data on haul trucks resides in over a dozen systems such as maintenance, dispatch, mine planning and labs. Collectively, they paint a complete picture of a day in the life of a haul truck. The manufacturer of the data historian, OSIsoft, recognized the value of merged data and provided an API for bulk extraction of the sensor data.
The mining company loaded the sensor data into a Hadoop data lake along with data from the other source systems. They then spent several months building data shapes related to truck health, and were able to develop engine failure models and re-evaluate maintenance schedules within a few months. Having delivered some quick wins, they merged in geospatial data and created new data shapes related to roads and haul cycles. Within several months they had deployed entirely new solutions for truck health, road quality and operator performance, all built on the same foundation.
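As a loose illustration only, and emphatically not the mining company's actual model: once sensor data is shaped around the truck, even a simple rolling-mean threshold over engine temperature (hypothetical readings below) can surface candidate problems for maintenance review.

```python
from collections import deque

def flag_anomalies(readings, window=3, threshold=1.10):
    """Flag readings more than `threshold` times the trailing-window mean."""
    recent = deque(maxlen=window)   # only the last `window` readings are kept
    flags = []
    for r in readings:
        if len(recent) == window and r > threshold * (sum(recent) / window):
            flags.append(r)
        recent.append(r)
    return flags

# Hypothetical engine temperatures (deg C) for one truck over a shift.
temps = [95.0, 96.0, 95.5, 96.2, 118.0, 96.1]
print(flag_anomalies(temps))   # -> [118.0]
```

Real engine failure models are far more sophisticated, but the point of the case study stands: the hard part was shaping the data, after which even simple analytics start paying off quickly.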
IoT is creating opportunities in old and new industries alike. All parties – device manufacturers, infrastructure vendors and end customers – stand to benefit. Realizing the full value requires thinking of data a little differently. Rather than seeing it as a precious commodity, like oil or gemstones, view it as a raw material like iron ore that has higher value as a finished product. The bounty will go to those who embrace the democratization of data and build those finished products, not to those who oppose it.