Device Management Requirements to Secure Enterprise IoT Edge Infrastructure
Intel Corporation Authors:
- Marcos Carranza, Senior IoT Solutions Architect
- Nicolas Oliver, Software Engineer
- Sindhu Pandian, Software Engineer
- Cesar Martinez Spessot, Senior Engineering Director, Internet of Things Group
- Lakshmi Talluru, Senior Director Digital Transformation
The Internet of Things (IoT) is quickly evolving, and the architecture required to support the industry's digital transformation goals includes an enterprise-grade device manageability capability as one of the main components.
Many device management platforms are on the market, but they generally lack key functions or do not support all types of devices. Scaling is another important issue: many platforms are not enterprise ready, which causes complex integration problems with IT divisions and creates adoption barriers. The cost of adapting these platforms is extremely high. Lastly, security is fundamental when considering a device management platform.
This paper presents the overall requirements and capabilities needed to address the needs of enterprise customers.
IoT edge networks are composed of small computing devices that are deployed to provide digital intelligence and control from outside typical computing environments. Generally, these devices are deployed in a decentralized way, often across many geographic locations, and require a centralized device management component to help manage their lifecycle needs and route value stream data to consumers.
Device management agents assist in all aspects of the device lifecycle. They help in the commissioning process by providing device identities, initial configurations, security credentials and access policies for the host platform. They provide an ongoing ability to monitor the status of the device and modify its configuration. They route incoming value stream data to other software services, store recovery information for lost or damaged devices, and can archive or migrate device data when deactivation or decommissioning is needed.
In the following sections, this paper examines some of the key concepts around the device lifecycle and important features to consider when choosing a device management component to do this job. The goal is to provide a roadmap for understanding different aspects of the decision-making process and provide guidance on industry best practices.
Figure 1 (Device Management Lifecycle) presents the different stages and functions for "Device Lifecycle" and "Software and Firmware Deployment Lifecycle."
IoT edge devices pass through different stages during their life:
- Early life, in which devices are prepared for use.
- Useful life, in which the devices are deployed and used in the field.
- End-of-life, when a device is planned to be removed or replaced.
- Reuse life, when a device is still usable but is repurposed.
- Decommission, when a device is discarded by following a set of procedures to clear or invalidate it.
Each of the above stages includes device lifecycle and/or software lifecycle functions, which are:
For secure onboarding, the recommendation is to use a hardware root of trust based on TPM 2.0. During the device commissioning phase, the device enrollment process leverages the Trusted Platform Module (TPM) identity function: devices can be pre-registered based on their immutable public endorsement key (EK), which is tied to the physical instance of the TPM 2.0. During the registration phase, the device is challenged to prove that it is a legitimate device that holds the private part of the EK. One approach is to have the management backend generate a random nonce and encrypt it with the public part of the EK. The device meets the challenge by decrypting the nonce and signing it with the private part of the EK before returning the response to the device management backend.
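The challenge-response exchange described above can be sketched as follows. This is a toy illustration only: it uses textbook RSA with small fixed primes from the Python standard library in place of a real TPM, the names `Device`, `make_challenge` and `verify_response` are hypothetical, and the response is a plain hash of the recovered nonce rather than a signature. A production flow would rely on the TPM's own commands and padded encryption.

```python
import hashlib
import secrets

# Toy key material: two known Mersenne primes stand in for an RSA key pair.
# A real endorsement key lives inside the TPM and is far larger.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held only by the "device"


class Device:
    """Stands in for a device whose TPM holds the private part of the key."""

    def __init__(self, private_exponent, modulus):
        self._d = private_exponent
        self._n = modulus

    def answer(self, encrypted_nonce):
        # Only the holder of the matching private key recovers the nonce.
        nonce = pow(encrypted_nonce, self._d, self._n)
        return hashlib.sha256(str(nonce).encode()).hexdigest()


def make_challenge(public_exponent, modulus):
    """Backend side: pick a random nonce and encrypt it with the public key."""
    nonce = secrets.randbelow(modulus - 3) + 2
    return nonce, pow(nonce, public_exponent, modulus)


def verify_response(nonce, response):
    """Backend side: accept only a response derived from the original nonce."""
    expected = hashlib.sha256(str(nonce).encode()).hexdigest()
    return secrets.compare_digest(expected, response)


nonce, challenge = make_challenge(E, N)
genuine = Device(D, N)
impostor = Device(D + 2, N)  # does not hold the right private key

print(verify_response(nonce, genuine.answer(challenge)))   # True
print(verify_response(nonce, impostor.answer(challenge)))  # False
```

Because the nonce is random for every registration attempt, a captured response cannot be replayed later.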
The registration and enrollment flow used here is not necessarily coupled specifically to TPM 2.0 but could be used with other Hardware Security Modules (HSMs) such as the Device Identity Composition Engine (DICE).
Tenant definition is also part of this function. Ideally, this function is fully automated and instrumented as zero touch so that IoT device deployment can scale.
The configuration function customizes the device to create specific behavior. This activity can be part of early life while the device is being prepared (before deployment), or it can be performed after deployment during useful life, either by manual intervention (not ideal for scalability from an operations point of view) or in an automated way by triggering scripts from the manageability solution. This function allows the configuration of properties such as the device parameters required to set communication mode, security, logging, etc.
Configuration management functions include batch updating based on the current state of configuration parameters like device type, revision level and hierarchy locations.
Remote access and diagnosis
These are key functions used to monitor and troubleshoot issues and solve them remotely. During the standard operational lifetime of the device, it is critical that the device meets compliance requirements which include integrity verification of the platform state. The remote attestation feature provides means for the device to measure and collect integrity measurements emanating from a variety of sources on the platform (such as BIOS firmware, bootloader, kernel, filesystem, etc.). The verifier of the attestation capability verifies through these measurements that the device is compliant. TPM 2.0 is used to get signed quotes from the underlying platform configuration register (PCR) banks along with data from the TPM Event Log. The verifier uses this information to determine the compliance state of the device and can then take an appropriate action (e.g. alarm, log, etc.).
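One part of the verifier's job, checking that the event log is consistent with the quoted PCR values, can be sketched as below. This is a minimal sketch under stated assumptions: the event log is reduced to hypothetical `(pcr_index, digest)` pairs, PCRs are assumed to start at all zeros, the PCR indices are illustrative, and verification of the quote's signature is omitted. The `extend` function follows the standard TPM rule that a new PCR value is the hash of the old value concatenated with the measurement digest.

```python
import hashlib

PCR_SIZE = 32  # bytes in a SHA-256 PCR bank


def measure(component):
    """Digest of a measured component (firmware image, bootloader, ...)."""
    return hashlib.sha256(component).digest()


def extend(pcr_value, measurement_digest):
    """TPM-style extend: new PCR = SHA-256(old PCR || measurement digest)."""
    return hashlib.sha256(pcr_value + measurement_digest).digest()


def replay_event_log(event_log):
    """Recompute the PCR values implied by a list of (pcr_index, digest) events."""
    pcrs = {}
    for pcr_index, digest in event_log:
        current = pcrs.get(pcr_index, bytes(PCR_SIZE))  # assumed reset value
        pcrs[pcr_index] = extend(current, digest)
    return pcrs


def log_matches_quote(event_log, quoted_pcrs):
    """True when replaying the log reproduces every quoted PCR value."""
    replayed = replay_event_log(event_log)
    return all(replayed.get(i) == v for i, v in quoted_pcrs.items())


# Hypothetical boot chain; the component names are placeholders.
boot_log = [
    (0, measure(b"bios-v1")),
    (4, measure(b"bootloader-v1")),
    (4, measure(b"kernel-v1")),
]
quoted = replay_event_log(boot_log)  # values a healthy device would report

tampered_log = [
    (0, measure(b"bios-v1")),
    (4, measure(b"bootloader-evil")),
    (4, measure(b"kernel-v1")),
]

print(log_matches_quote(boot_log, quoted))      # True
print(log_matches_quote(tampered_log, quoted))  # False
```

Because each extend folds the previous value into the next, changing any single measurement changes the final PCR value, which is what lets the verifier detect tampering anywhere in the measured chain.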
During the useful life, software/firmware lifecycle management is key because it allows the behavior of the device to change. Functions in this lifecycle include:
- Provisioning: this is one of the main functions, since it allows the installation of software and firmware.
- Activation/deactivation: software under subscription models can be enabled or disabled using an available Digital Rights Management (DRM) mechanism.
- Updates: this is the main function required by any software component. Without an update mechanism the solution cannot be improved or fixed (business logic functionality, general bugs or, more importantly, security issues such as zero-day attacks).
- Configuration: this is required to remotely modify software (SW) and firmware (FW) to change behavior, improve performance for certain use cases, etc.
- Uninstall: it is always convenient to remove software that is no longer required on the device, considering storage limitations and possible security vulnerabilities.
Rollback and recovery
These functions are important after a failed deployment (provisioning, update, configuration, etc.) and help to revert the changes to the previous stable configuration.
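As a rough illustration of the rollback idea, the sketch below keeps a copy of the last stable configuration and restores it when a post-deployment check fails. The class `ManagedConfig` and the check `interval_is_sane` are hypothetical names, and a real agent would also cover software and firmware images, not just configuration values.

```python
import copy


class ManagedConfig:
    """Keeps the last known-good configuration so a failed deployment can be reverted."""

    def __init__(self, initial):
        self.active = copy.deepcopy(initial)
        self._last_stable = copy.deepcopy(initial)

    def deploy(self, changes, health_check):
        """Apply changes and keep them only if health_check(active config) passes."""
        self.active = {**self.active, **changes}
        if health_check(self.active):
            self._last_stable = copy.deepcopy(self.active)
            return True
        self.active = copy.deepcopy(self._last_stable)  # rollback
        return False


def interval_is_sane(config):
    # Hypothetical post-deployment check.
    return 10 <= config["report_interval_s"] <= 3600


cfg = ManagedConfig({"report_interval_s": 60, "log_level": "info"})

print(cfg.deploy({"report_interval_s": 0}, interval_is_sane))  # False
print(cfg.active)  # {'report_interval_s': 60, 'log_level': 'info'}

print(cfg.deploy({"log_level": "debug"}, interval_is_sane))    # True
print(cfg.active)  # {'report_interval_s': 60, 'log_level': 'debug'}
```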
Migration can happen when a device is in the end-of-life stage (planned to be replaced) and all of its assets (SW, FW, data, configurations) are gathered and made available to a new device. Migration can also happen in the reuse-life stage, by redeploying a previous configuration to a device that is still usable but requires some maintenance.
In the decommissioning phase, a device may be removed, deleted, blacklisted or unregistered. This process includes securely sanitizing any sensitive data from the device. This may be particularly important if the device ends up missing or is otherwise unaccounted for. The data sanitization process can be triggered from the device management component, but this requires the device to be online and connected. It is therefore also important to support a set of locally enforced triggers for offline scenarios.
During a data sanitization process, sensitive data is rendered inaccessible so that it becomes infeasible for an adversary to recover the data without a significant level of effort. There are different levels of data sanitization with respect to how difficult it is for an adversary to gain access to the data. Here we consider Cryptographic Erase (CE) on encrypted storage, a process that effectively destroys the passphrase protecting the entire volume by removing it from its storage (in this case, the TPM non-volatile memory). There are several prerequisites and guiding principles that determine the effectiveness of this approach, such as:
- Local backups of the partition
- Length and entropy of the passphrase protecting the volume
- Cryptographic algorithms used and associated key lengths
- Configuration of hibernation and swap partitions
Device grouping is the process of assigning a device to a group that represents a tenant, hierarchy and/or asset based on application needs. This provides context for the physical location of the device and enables resource-based access controls.
Manageability components usually do not cover all types of devices, creating an operational issue for activities like updates, troubleshooting, etc. This causes IT teams to incur additional costs to manage several different systems (e.g., training costs). It is key for an enterprise-grade, end-to-end IoT device management component to cover each type of device that is part of the IoT infrastructure:
Edge compute devices (also known as IoT Gateways) have enough processing power to host one or more IoT workloads. They are deployed close to sensors, receive data directly from them, and interact with actuators, motors, etc. to perform specific actions. These devices are standardized by IT including hardware and software stack. These devices can be divided in subgroups depending on connectivity options since workload management is different for a device connected via Ethernet to the local network Edge Server vs a device using cellular (communicating directly to the cloud infrastructure).
Things are the connected sensors and actuators with no or minimal processing capabilities.
Edge servers are on-premises, high-processing capacity servers required for consolidating data coming from edge compute devices (e.g., sensor data), or for processing heavy workloads such as video analytics. They can also enable connectivity between co-located Edge Compute Devices without the need for messaging to travel to the cloud. These servers provide independence from cloud providers, allowing higher availability, lower latency, and reduced outbound data transmission.
There are many characteristics that go into choosing device manageability components. Customer requirements for each IoT solution are unique and device management components should support a range of functionalities to enable maximum capability. However, to compare and select device management components that meet customer needs, we need to understand what they all have to offer. It is also important to note that these features are subjective and based on what the customer considers to be priority.
For a baseline standard to efficiently and safely manage IoT devices, we consider the following to be of utmost importance:
Automatic device onboarding
Onboarding should be as close to zero-touch provisioning as possible. Device onboarding is simplified to avoid manual intervention, including automated provisioning of cryptographic keys and certificates. Additional benefits include the ability for devices to be directed to their correct regional cloud endpoints upon power-up in multi-geographical deployment scenarios.
For example, an operating system (OS) image is built with the device management agent, so that the device is onboarded automatically when it is turned on and connected to a network in the correct deployment geography.
Device manageability dashboard
Most device management components come with prebuilt dashboards for the default methods and properties of devices. Device health can be monitored using device status (online/offline), location, CPU utilization, available memory, etc. These metrics are very important for operations teams, and prebuilt dashboards improve the user experience and simplify adoption compared with developing custom dashboards based on complex and ever-changing APIs. Multitenancy should also be considered here. See the Device grouping and hierarchy management section below.
Remote login should be used only for troubleshooting purposes, and SSH should not be exposed directly to the internet. For direct connection from the cloud, remote tunneling via the management dashboard should be used. Logging of all local and remote activities is a best practice and highly recommended.
Device grouping and hierarchy management
This means batch management of devices and support for maintenance at distinct levels of an organization or department. Some components refer to these groups as tenants. The ability of a component to follow a hierarchical structure of devices may help an enterprise serve more than one customer or project at the same time, and lets a customer organize devices into multiple tenants. It also makes it possible to create resource-based authorization, so that users of the system can access only devices that belong to tenants that have granted them access permissions.
IoT systems designed for asset-centric operational technology (OT) industries such as manufacturing, oil and gas, etc., often have IT services that orchestrate hierarchy separate from the IoT device management service and that adhere to ISO standards (e.g. ISO 14224 Asset Hierarchy standard for the petrochemical industry). These hierarchy and asset management (HAM) services are built as a digital twin of the company's asset hierarchy. Defining a clear way for the IoT system to integrate with the HAM service should be a primary design goal of up-front architecture efforts because this integration allows for IoT devices to be associated to both their physical location in the hierarchy as well as the assets they are deployed to monitor.
When this is done correctly it provides the ability for the IoT system to be truly multi-tenant and enables the platform to transform IoT data into the context of the asset being monitored for the operation center team. For example, an operation center can receive a custom notification from the platform: "The primary bearing on Pump A is 150 degrees" instead of "IoT Device 1 is currently reading 150 degrees."
Device grouping can also be performed by filtering or searching devices based on a feature called tags. This is the ability to give one or more descriptive tags to a group of devices so they can be filtered accordingly for searching, performing updates, etc.
When choosing a strategy for hierarchy management, performance needs (e.g. how fast data needs to be returned when queried) should be a primary consideration, especially in applications where scaling up can equate to tens of thousands of devices across multiple tenants.
The relationships created here are interdependent with the data management strategy.
Remote script execution
A script or a command can be run directly from the device management portal. Most vendors also offer ways to configure the parameters, choose an executable, etc.
Updates – wired and over the air
Device updates should have the ability to be staged, executed and verified remotely. The best security practices include code signing for update images that can be delivered "over the air" and preferably as deltas and not full replacements of system images. For example, it is often desirable to trigger a firmware update on devices of a certain revision level or in a certain geographical or hierarchical location. A device management component can enable this by providing the ability to query the device registry (tags and configurations) prior to making batch updates, allowing the system to target devices for update based on relevant criteria. The next time the device checks in with the device management component it can be informed of the pending update, retrieve the files associated with the update, execute the update, and report the results back to the device management system.
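The registry query that precedes a batch update might look like the following sketch. The record fields, the hierarchy path format and the function `select_targets` are assumptions made for illustration; actual device registries expose their own query interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DeviceRecord:
    device_id: str
    firmware: tuple  # revision level, e.g. (1, 4, 2)
    hierarchy: str   # location path, e.g. "plant-a/line-2"
    tags: set = field(default_factory=set)


def select_targets(registry, below_version, hierarchy_prefix="", required_tags=frozenset()):
    """Return the IDs of devices that should receive the update."""
    return [
        d.device_id
        for d in registry
        if d.firmware < below_version                # older revision only
        and d.hierarchy.startswith(hierarchy_prefix)  # inside the target location
        and set(required_tags) <= d.tags              # carries every required tag
    ]


registry = [
    DeviceRecord("gw-001", (1, 4, 2), "plant-a/line-1", {"gateway", "cellular"}),
    DeviceRecord("gw-002", (2, 0, 0), "plant-a/line-2", {"gateway"}),
    DeviceRecord("gw-003", (1, 3, 9), "plant-b/line-1", {"gateway"}),
]

# Target gateways in plant A that run firmware older than 2.0.0.
print(select_targets(registry, (2, 0, 0), "plant-a/", {"gateway"}))  # ['gw-001']
```

Each selected device would then be told about the pending update at its next check-in, as described above.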
A campaign is functionality that allows multiple devices (even under multiple tenants) to be updated as a batch remote execution, rather than as single-device over-the-air updates.
Device management components offer commands to provision device properties, OS configuration files, network files, etc.
When considering device configuration features, it is important to think about the range of devices that could use the platform and what those devices need. For example, battery-operated devices or devices with poor connectivity require a store-and-forward publish/subscribe capability because they do not have a constant connection to the cloud. Configuration changes made while such devices are offline or in sleep mode need to be cached until the devices wake up and check in. Many device management components solve this problem through "shadows" or "twins," which are managed JSON configurations for each individual device in the registry. This arrangement provides a dedicated cache for individual device configurations that sits at the access point of the IoT system and reduces the need for centralized processing of check-ins. Using these tools, and the underlying check-in pattern that they enable, is considered a best practice.
As mentioned previously, it is also important for device configuration changes to be managed in groups based on device tags or existing parameters. For example, if it becomes desirable to change the access point name (APN) for all devices in a certain geographical or hierarchical location, a device management component can enable this by providing the ability to query the device registry (tags and configurations) prior to making batch updates. This allows the system to target devices for update based on relevant criteria. The managed JSON configuration twin of each targeted device is updated and informs the device of the change at its next check-in, so that each device handles the update according to its own scheduling needs.
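The twin pattern can be sketched as a small cache that holds a desired and a reported configuration per device and hands the difference to the device when it checks in. The class `DeviceTwin`, its method names and the example settings are hypothetical; vendor implementations differ in naming and in how the JSON document is structured.

```python
import copy


class DeviceTwin:
    """Backend-side cache of one device's desired and last reported configuration."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.desired = {}
        self.reported = {}

    def set_desired(self, **changes):
        # Called by operators or batch jobs; the device may be asleep or offline.
        self.desired.update(changes)

    def pending_changes(self):
        """Settings whose desired value differs from what the device last reported."""
        return {k: v for k, v in self.desired.items() if self.reported.get(k) != v}

    def check_in(self, reported_state):
        """Device wakes up, reports its state and receives any pending changes."""
        self.reported = copy.deepcopy(reported_state)
        return self.pending_changes()


twin = DeviceTwin("sensor-17")
twin.set_desired(apn="iot.example.net", report_interval_s=900)

# Later, the device wakes from its sleep cycle and checks in.
print(twin.check_in({"apn": "old.example.net", "report_interval_s": 900}))
# {'apn': 'iot.example.net'}

# After the device applies the change, nothing is pending at the next check-in.
print(twin.check_in({"apn": "iot.example.net", "report_interval_s": 900}))
# {}
```

Keeping the desired state on the backend means a configuration change never has to be retried against a sleeping device; it simply waits in the twin.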
A rules engine is required for operational data processing. Users can create rules from the dashboard that react to events and triggers and execute actions such as sending notifications and alerts.
For many IoT systems, the main purpose of the system is to route value data from the devices to data ingestion services. The ability to securely route incoming data packets to data processing services and/or data storage services is a key capability of a device management component.
Some solutions do this by allowing routing rules to be applied directly to the topics of the publish/subscribe protocols used to get data from devices to the management component. Other solutions do this through rules engines that can read key-value pairs in the devices' message payloads and make routing decisions based on the information contained there. Having these options reduces the amount of custom code needed to get IoT data to the services that rely on it and streamlines the routing process.
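Both routing styles can be combined in a few lines, as in the sketch below. The topic layout, destination names and the `ROUTES` table are hypothetical; the wildcard handling follows MQTT conventions, where `+` matches exactly one topic level and `#` matches all remaining levels.

```python
def topic_matches(pattern, topic):
    """MQTT-style match: '+' matches one level, '#' matches all remaining levels."""
    p_parts = pattern.split("/")
    t_parts = topic.split("/")
    for i, p in enumerate(p_parts):
        if p == "#":
            return True
        if i >= len(t_parts):
            return False
        if p != "+" and p != t_parts[i]:
            return False
    return len(p_parts) == len(t_parts)


# Each route: (topic pattern, payload predicate, destination service).
ROUTES = [
    ("devices/+/telemetry", lambda m: m.get("temperature", 0) > 100, "alerting-service"),
    ("devices/+/telemetry", lambda m: True, "timeseries-store"),
    ("devices/+/logs/#", lambda m: True, "log-archive"),
]


def route(topic, payload):
    """Return every destination whose topic pattern and payload rule both match."""
    return [
        destination
        for pattern, predicate, destination in ROUTES
        if topic_matches(pattern, topic) and predicate(payload)
    ]


print(route("devices/gw-001/telemetry", {"temperature": 150}))
# ['alerting-service', 'timeseries-store']
print(route("devices/gw-001/telemetry", {"temperature": 20}))
# ['timeseries-store']
print(route("devices/gw-001/logs/app/error", {"msg": "disk full"}))
# ['log-archive']
```

Expressing routes as data rather than code is what allows them to be edited from a dashboard without redeploying services.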
Deployment model support
Device management components support a variety of deployment models: cloud (SaaS) or hybrid edge-to-cloud solutions.
Manageability standards support
A versatile device management component includes support for widely used standards like LwM2M, TR-069 and OMA-DM. This allows a simple deployment process for devices that support these standards, which are usually intended for things management. IoT solutions often have small devices like sensors and actuators that need to be paired with edge gateways. Device management components offer things management in the same way that edge devices are managed.
Management of certificates is particularly important in this space to ensure security, and it must be done securely throughout the certificate lifecycle: private key generation, certificate renewal and revocation of certificates as needed. This can be accomplished by deploying a custom certificate authority (CA) and registration authority (RA) that can be integrated with an enterprise public key infrastructure (PKI) or other platform providing this service. If possible, a hardware security module (HSM) appliance should be used for the secure generation and storage of the private keys corresponding to certificates.
How certificate management is handled often becomes a consideration for multi-geo deployments because multiple cloud endpoints will exist in a multi-geo deployment, and it is often unknown which geo the device will be deployed to at the time of provisioning.
Transport protocols supported
Flexibility to support a multitude of different transport protocols such as HTTP/S, (S)MQTT, AMQP and CoAP is very important to reduce integration complexity with enterprise data infrastructure.
Scalability is the characteristic that allows IoT solutions to grow to include a large number of edge devices and connected devices such as sensors and actuators. Device management components need to support this growth so that thousands of devices can be managed and maintained without restriction.
As previously mentioned, many security aspects need to be considered.
Device management components usually include features providing data protection (sealing, encryption, sanitization) as well as integrity attestation and mutual authentication. Multi-tenancy is also a security concern that requires careful planning.
It is extremely important to have a secure, enterprise-grade IoT device management solution for both IT and OT organizations and for any IoT use case. Device management components should help simplify getting started with IoT, automate management at scale, and extend IT security standards to the edge and the IoT infrastructure.
In combination with Intel's built-in foundation of security capabilities provided by Intel Security Essentials, it is possible to manage the "Device Lifecycle" and the "Software and Firmware Deployment Lifecycle" while keeping IoT devices secured.
TPM based device enrollment/registration
There are different options to onboard a device into a device management system such as:
- Basic Authentication - A simple authentication scheme built into the HTTP protocol. The client sends HTTP requests with an Authorization header that contains the word Basic followed by a space and the base64-encoded string username:password.
- Token-Based Authentication - Creates a single-use device credential with signature and expiration-time verification.
- Property-Based Authentication - Creates a single-use device credential with device identity value verification.
- Device Onboarding using TPM - Uses the TPM 2.0 module for device onboarding. The unique Endorsement Key of the device's TPM is used for identity. This is the most secure way of device onboarding.
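For comparison with the TPM option, the Basic Authentication header from the first item in the list can be built and parsed as shown below. The helper names and the example credentials are hypothetical. Base64 is an encoding rather than encryption, so this scheme protects the credentials only when the connection itself is encrypted.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an HTTP Authorization header for Basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_auth(header):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = header.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


header = basic_auth_header("device-42", "s3cret")
print(header)                    # Basic ZGV2aWNlLTQyOnMzY3JldA==
print(parse_basic_auth(header))  # ('device-42', 's3cret')
```

Anyone who observes the header can reverse it, and the same shared secret is sent with every request, which is why the TPM-based option keeps the private credential inside the hardware instead.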
Using TPM based onboarding, trust is established between the device and the device management component, guaranteeing that the onboarding credentials are unique per device, and the private parts of those credentials are kept secured in the TPM, without the possibility of disclosing them to a third party.
A device management agent is installed on the device operating system. This agent could run as a system service that periodically communicates with the device management service backend.
The device management system offers capabilities to pre-register devices before they are deployed into production and have their first communication with the backend itself. This allows the administrator to define properties, metrics, and connected things associated with the device to be deployed.
Whitelisting is another security measure that ensures only trusted software and files can run on, or be onboarded onto, the device. Along with the configurations, the administrator can define whitelisting requirements.
Boot integrity attestation
Security profiles can be created on the device management backend, associated with device instances, and validated periodically against the data reported by each of the physical devices. The security profile definition is based on the TPM 2.0 platform configuration registers (PCRs), which represent the integrity state of each device according to the measurements taken during the measured boot process.
Any modification is detected by the boot integrity attestation system, and alerts are triggered to notify the administrator. Remediation steps or a decommissioning process can then follow to protect the infrastructure from a potential malicious actor.
Runtime integrity attestation
To establish trust with the runtime itself, where applications and/or containers are running, runtime attestation can be done using the services offered by the operating system of the device itself.
Devices that run Linux based operating systems can make use of the Linux Integrity Measurement Architecture (IMA), to extend the hardware root of trust to the user space.
Again, in the event of an unexpected modification, the attestation system detects it and alerts the administrator. Remediation steps or decommissioning of the device can then follow.
Policies and Alerts in Device Management backend
Security profiles define the expected integrity state of devices. They are associated with device instances and are continuously monitored by the attestation system.
If a reported digest does not match the expected value in the security profiles, alarms and notifications can be triggered to the administrators and/or operators to execute the respective troubleshooting or mitigation process for the event.
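The comparison between a security profile and a device report can be sketched as follows. The `SecurityProfile` structure, the alert text and the digests (derived here from placeholder labels) are assumptions for illustration; a real backend would receive signed values from the attestation flow and route alerts through its own notification channels.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityProfile:
    name: str
    expected_pcrs: dict  # PCR index -> expected hex digest


def evaluate(profile, device_id, reported_pcrs):
    """Return one alert per PCR that is missing or differs from the profile."""
    alerts = []
    for index, expected in sorted(profile.expected_pcrs.items()):
        actual = reported_pcrs.get(index)
        if actual != expected:
            shown = actual[:8] if actual else "missing"
            alerts.append(
                f"{device_id}: PCR {index} does not match profile "
                f"'{profile.name}' (expected {expected[:8]}, got {shown})"
            )
    return alerts


def digest(label):
    # Placeholder digests; real values come from measured boot.
    return hashlib.sha256(label.encode()).hexdigest()


profile = SecurityProfile(
    "gateway-baseline",
    {0: digest("bios-v1"), 4: digest("bootloader-v1")},
)
healthy = {0: digest("bios-v1"), 4: digest("bootloader-v1")}
tampered = {0: digest("bios-v1"), 4: digest("bootloader-evil")}

print(evaluate(profile, "gw-001", healthy))  # []
for alert in evaluate(profile, "gw-001", tampered):
    print(alert)
```

An empty result means the device is compliant; any returned alert would start the troubleshooting or mitigation process described above.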
Data sanitization is a process to render access to target data on the storage media infeasible for a given level of recovery effort.
Cryptographic Erase (CE) leverages the encryption of target data by enabling sanitization of the target data's encryption key.
The data sanitization process can be triggered by the device management agent running in the device (in case the architecture is not agentless), as part of a decommissioning or repurpose process.
For more information about Data Sanitization, refer to the NIST Special Publication 800-88 Guidelines for Media Sanitization.
Device management components also offer integration with third-party clouds. These integrations are helpful in forwarding data and maintaining your data plane in third-party clouds where you can perform data aggregation, analytics, etc.
Many cloud-based device management components also provide edge runtimes for edge servers to enable some processing and device management on-premises. They also provide device software development kits to assist in the integration of devices into the management component.
Identity and access management (IAM) is an important integration component for device management. IAM allows the device management component to identify which system users have access to the data and controls of which devices, using common security standards such as JSON Web Tokens (JWT).
Most device management and IAM components also use a hierarchy and asset management (HAM) component to assist in access management. The HAM component provides a digital twin of the physical hierarchy of devices and/or assets being monitored that can be referenced by other components for determining physical location and deriving access permissions (e.g. User 1 has access to all devices in Floor 3 of Building 2). The HAM can also provide context for the data used by other platform components by identifying physical location and device/asset types. For example, Device 1 is sending temperature data about the refrigerator on Floor 3 of Building 2, and Device 2 is sending temperature data about the oil transfer pump on the Manufacturing floor of Building 2.
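A minimal sketch of how hierarchy paths could drive both access decisions and data context is shown below. The path format, the `DEVICE_LOCATIONS` and `USER_GRANTS` tables and the function names are hypothetical; in practice this information would come from the HAM and IAM services.

```python
# Hypothetical hierarchy paths and grants; a real HAM/IAM pair would supply these.
DEVICE_LOCATIONS = {
    "device-1": "building-2/floor-3/refrigerator",
    "device-2": "building-2/manufacturing/oil-transfer-pump",
}
USER_GRANTS = {
    "user-1": ["building-2/floor-3"],
    "user-2": ["building-2"],
}


def is_within(node, ancestor):
    """True when node is the ancestor itself or sits anywhere below it."""
    return node == ancestor or node.startswith(ancestor + "/")


def can_access(user, device_id):
    """A user may access a device if any granted node contains the device's location."""
    location = DEVICE_LOCATIONS.get(device_id)
    if location is None:
        return False
    return any(is_within(location, grant) for grant in USER_GRANTS.get(user, []))


def describe(device_id, metric, value):
    """Translate a raw device reading into the context of the monitored asset."""
    *place, asset = DEVICE_LOCATIONS[device_id].split("/")
    return f"{metric} of {asset} at {'/'.join(place)} is {value}"


print(can_access("user-1", "device-1"))  # True
print(can_access("user-1", "device-2"))  # False
print(can_access("user-2", "device-2"))  # True
print(describe("device-1", "temperature", 4))
# temperature of refrigerator at building-2/floor-3 is 4
```

Granting access at a hierarchy node rather than per device means newly deployed devices inherit the right permissions as soon as they are placed in the hierarchy.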
Understanding how the device management component will integrate with the IAM and HAM components will drive design requirements of the device management component. Having an architectural strategy for these integration points early in the design process is recommended.
Simple container orchestration capabilities should be added to the device management solution, because some IoT deployments cannot support complex orchestration tools such as Kubernetes due to the restrictions of the target environment. Lightweight orchestration may be beneficial in scenarios in which connectivity is unreliable and/or devices are resource constrained.
Deploying an IoT device management component at scale requires a security-first approach using TPM-based provisioning and deployment services, individual device access policies, multitenancy, and resource-based authorization. It needs tools to assist in passing data to other systems, such as rules engines and other secure data routing capabilities. Additionally, it needs to provide capabilities to help manage the device and application lifecycle, such as infrastructure for multi-geo deployments, device updates and configuration management.
Integration points are also important to a device management strategy. Having a clear plan for integrating your device manager component with an identity and access management service and understanding how it will tie into the business from a hierarchy and asset management perspective are both key points in creating a secure system that will provide value to the business. If your device management component doesn't provide a public key infrastructure (PKI) solution that meets security needs, this is also a key integration point.
When choosing a device management component, it is important to know what features you want in your solution and how the device management components you are evaluating enable those features. Understanding this will help to ease the implementation journey and will contribute to the success of your IoT project.