
The Mobile Field Kit (MFK) software, which has been deployed at events like the Boston Marathon, the Super Bowl and the presidential inauguration, is an advanced operations control and threat monitoring system used to help ensure public safety at events and entertainment venues. The MFK is continually being improved with new features that are critical to enabling security personnel on the ground to monitor a situation and detect threats in real time. These features, such as the integration of disparate sensors and mobile devices, enable teams on the ground to respond rapidly and deploy the necessary resources to mitigate a potential threat.

The MFK software includes features that help teams on the ground ingest and visualize information, including maps that can be annotated with points and images, user/team chat, and dashboards for visualizing IoT sensor readings. The Application Services team has recently been investigating a new feature for the MFK that would allow teams to classify objects in an image using artificial intelligence (AI) techniques.

This would help teams on the ground quickly identify people and objects and determine whether they pose a potential threat. Security cameras located across a given geographical area and connected to the MFK software capture video and images, which are labeled based on detected objects and then displayed for personnel on the ground to interpret.

In this article, two different architectures for productionizing deep learning models to enable object detection in MFK are compared:

  1. CPU-based traditional compute architecture, where machine learning (ML) model inferencing is performed on the same device on which the MFK software is being run.
  2. TPU-based on-device edge AI architecture, where model inferencing is performed at the edge, by a device connected to the remote IP camera.

Comparing two ML model inference approaches

To compare these two approaches, MFK was integrated with a Bosch FLEXIDOME IP camera, along with a deep learning model for object detection. In the first approach, a traditional centralized compute architecture was used: a pure Java solution in which CPU-based model inference was performed on the same machine that was running the MFK software.

In the second approach, an on-device edge AI architecture was used, leveraging the Google Coral Dev Board with its built-in Edge Tensor Processing Unit (TPU) to perform model inferencing at the edge.

Traditional centralized compute architecture

For the centralized compute architecture, we chose a CPU-based model inference design. A pure Java solution was designed and implemented, in which deep learning model inference was performed on the same machine on which the MFK software was running.

Figure 1: CPU-based model inference network architecture

As shown in Figure 1, this solution connected the ruggedized laptop running the MFK software to the Bosch FLEXIDOME IP camera via a router/switch.

Figure 2: CPU-based model inference sequence diagram illustrating the communication between MFK and the Bosch FLEXIDOME IP camera, model inference, and visualization

To acquire a frame to be labeled and then rendered, the following steps are performed:

  1. MFK performs a GET request to the endpoint provided by the Bosch FLEXIDOME IP camera.
  2. The camera then captures the current video frame as a snapshot.
  3. The camera returns the current video frame to MFK.
  4. MFK uses the OpenCV library to first pre-process the received image and then feed the input forward through the deep neural network (DNN).

For this architecture, a YOLOv3 Tiny model pre-trained with the COCO dataset was provided to the OpenCV library. The YOLO deep learning model uses a single convolutional neural network (CNN) to simultaneously predict multiple bounding boxes for an input image, along with class probabilities for those boxes. Because YOLO can "see" the entire image during training, it is able to encode contextual information about specific classes while also encoding information about the appearance of each class (Redmon, Divvala, Girshick, & Farhadi, 2016). 

The output from the model includes a collection of candidate objects, where each object has a bounding box and label identifier along with a confidence value. For each object where the confidence is above a specific threshold (0.6), MFK uses the OpenCV library to draw the bounding box with its associated label, such as "person" or "car."

  5. MFK then renders the labeled image, described above, in the UI frame.

Figure 3: CPU-based model inference application architecture

For the CPU-based application architecture, the MFK software (a Java application) was enhanced with model inference code and with connection logic for making a request to the Bosch FLEXIDOME camera, as shown in Figure 3. The model inference code was written in Java and used the OpenCV library with a YOLOv3 Tiny model.
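The production inference code described above is Java; as an illustrative sketch of the same flow, the following Python snippet uses OpenCV's DNN module with a YOLOv3 Tiny model pre-trained on COCO. The camera snapshot URL, model file names, and output handling are assumptions made for illustration; only the 0.6 confidence threshold comes from the description above.

```python
# Illustrative sketch only: the production MFK code is Java, but the flow is the same.
# The camera URL and model/label file names below are assumptions.
import cv2
import numpy as np
import urllib.request

CAMERA_SNAPSHOT_URL = "http://192.168.1.50/snap.jpg"   # hypothetical Bosch FLEXIDOME snapshot endpoint
CONFIDENCE_THRESHOLD = 0.6

# Load YOLOv3 Tiny pre-trained on COCO (file names are assumptions).
net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
with open("coco.names") as f:
    labels = [line.strip() for line in f]

# Steps 1-3: request the current video frame (snapshot) from the IP camera.
raw = urllib.request.urlopen(CAMERA_SNAPSHOT_URL).read()
frame = cv2.imdecode(np.frombuffer(raw, dtype=np.uint8), cv2.IMREAD_COLOR)
h, w = frame.shape[:2]

# Step 4: pre-process the image and feed it forward through the DNN.
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

# Keep candidate objects whose confidence exceeds the threshold and draw their boxes.
for output in outputs:
    for detection in output:
        scores = detection[5:]
        class_id = int(np.argmax(scores))
        confidence = float(scores[class_id])
        if confidence > CONFIDENCE_THRESHOLD:
            cx, cy, bw, bh = detection[0:4] * np.array([w, h, w, h])
            x, y = int(cx - bw / 2), int(cy - bh / 2)
            cv2.rectangle(frame, (x, y), (x + int(bw), y + int(bh)), (0, 255, 0), 2)
            cv2.putText(frame, labels[class_id], (x, y - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

# Step 5: the labeled frame is then handed to the UI for rendering.
cv2.imwrite("labeled_frame.jpg", frame)
```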

On-device edge AI architecture

Edge AI combines the flexibility of edge computing with the predictive capabilities of ML. For the on-device edge AI architecture, we chose a TPU-based design that performs deep learning model inference at the edge, using the TPU on the Coral Dev Board.

A TPU is a processing unit designed to perform ML with neural networks; it performs computation in parallel rather than serially. This parallelism can further increase the performance of the system, boosting the advantage of the on-device edge AI architecture over the traditional compute architecture.

Figure 4: On-device TPU-based model inference network architecture

The network architecture diagram in Figure 4 illustrates the network connections between the ruggedized laptop running the MFK software, the Coral Dev Board and the Bosch FLEXIDOME IP camera.

Figure 5: On-device TPU-based model inference sequence diagram displaying the interoperability of MFK with the Coral Dev Board and the Bosch FLEXIDOME IP camera

In order to quickly acquire and display a sequence of labeled frames in near-real time, the MFK node delegates tasks to the Coral Dev Board and the Bosch FLEXIDOME IP camera through the following steps:

1. The MFK node first makes a REST GET request to the API endpoint running on the Coral Dev Board. This REST API is implemented in Python using the Flask microframework.

2. Immediately after receiving this request, the service initiates an HTTP request to obtain a current snapshot image from the IP camera.

3. The IP camera captures a snapshot.

4. The IP camera then returns the image in its HTTP response.

5. Once the Flask service running on the Coral Dev Board has received the snapshot image, the service performs on-device model inference by passing that image as input to the Edge TPU Python API.

Using this API for inference with our pre-trained MobileNetV2 SSD TensorFlow Lite model ensures the calculations are performed by the Edge TPU on the Coral Dev Board. MobileNet was designed to make it practical to run DNNs on mobile devices; by using depth-wise separable convolutions, it reduces model size compared to typical CNNs (Howard et al., 2017).

On the Dev Board's storage, the model file occupies 6.7 megabytes (MB). Single Shot MultiBox Detector (SSD) is an object detection method that uses a single DNN (Liu et al., 2016). (In our case, MobileNetV2 was used as the base DNN.) SSD is as accurate as slower techniques that perform explicit region proposals and pooling. SSD works by predicting category scores and box offsets for a fixed set of default bounding boxes using small convolutional filters applied to feature maps (Liu et al., 2016).
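To make the size reduction from depth-wise separable convolutions concrete, the rough parameter-count comparison below uses made-up layer dimensions purely for illustration; none of the numbers are taken from MobileNetV2 itself.

```python
# Rough parameter-count comparison (per Howard et al., 2017). The layer sizes
# below are illustrative assumptions, not values from MobileNetV2.
k, c_in, c_out = 3, 32, 64                     # kernel size, input/output channels

standard = k * k * c_in * c_out                # standard convolution: 18,432 params
separable = k * k * c_in + c_in * c_out        # depthwise + pointwise: 2,336 params

print(f"standard: {standard}, separable: {separable}, "
      f"reduction: {standard / separable:.1f}x")  # roughly 8x fewer parameters
```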

After model inference, the API returns a collection of detected objects where each object contains a bounding box describing the detected object's location within the image along with a label identifier corresponding to the appropriate COCO label (such as "person" or "car"). Our REST API then labels the image by using the Pillow library to draw a rectangle with the appropriate label text for each object detected by the SSD model.

6. The service then returns that image in the HTTP response, along with metadata such as detection duration and the total request duration.

7. The MFK then renders the frame in its UI.

On-device TPU-based application architecture
Figure 6: On-device TPU-based model inference application architecture

The key software component of the on-device edge AI application architecture, portrayed in Figure 6, is the Object Detection API, which is responsible for model serving. The Object Detection API is an HTTP service implemented in Python using the Flask microframework and uses the DetectionEngine from the Edge TPU Python API (Coral, n.d.) to perform fast model inference on the TPU. The DetectionEngine is provided with a TensorFlow Lite file containing a MobileNetV2 SSD model pre-trained with the COCO dataset (Liu et al., 2016). 
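A minimal sketch of such an Object Detection API is shown below. It assumes the legacy edgetpu Python package (which provides DetectionEngine), the Flask and Pillow libraries, and placeholder camera, model, and label file locations; the detection threshold, routes, and response format are assumptions, and the production service may differ.

```python
# Minimal sketch of the Object Detection API, assuming the legacy edgetpu Python
# package (DetectionEngine). Camera URL, model, label file, and threshold are assumptions.
import io
import time

import requests
from flask import Flask, send_file
from PIL import Image, ImageDraw
from edgetpu.detection.engine import DetectionEngine

app = Flask(__name__)

CAMERA_SNAPSHOT_URL = "http://192.168.1.50/snap.jpg"  # hypothetical camera endpoint

# Load the TensorFlow Lite MobileNetV2 SSD model (compiled for the Edge TPU)
# and the COCO label map; file names are assumptions.
engine = DetectionEngine("mobilenet_ssd_v2_coco_quant_postprocess_edgetpu.tflite")
labels = {}
with open("coco_labels.txt") as f:
    for line in f:
        idx, name = line.strip().split(maxsplit=1)
        labels[int(idx)] = name


@app.route("/detect")
def detect():
    request_start = time.monotonic()

    # Steps 2-4: fetch the current snapshot from the IP camera.
    image = Image.open(io.BytesIO(requests.get(CAMERA_SNAPSHOT_URL).content)).convert("RGB")

    # Step 5: run on-device inference on the Edge TPU.
    inference_start = time.monotonic()
    candidates = engine.detect_with_image(
        image, threshold=0.6, top_k=10, relative_coord=False)
    detection_ms = (time.monotonic() - inference_start) * 1000

    # Draw a rectangle and label text for each detected object with Pillow.
    draw = ImageDraw.Draw(image)
    for obj in candidates:
        box = obj.bounding_box.flatten().tolist()  # [x1, y1, x2, y2] in pixels
        draw.rectangle(box, outline="red")
        draw.text((box[0], box[1]), labels.get(obj.label_id, "unknown"), fill="red")

    # Step 6: return the labeled image, with timing metadata in response headers.
    buf = io.BytesIO()
    image.save(buf, format="JPEG")
    buf.seek(0)
    response = send_file(buf, mimetype="image/jpeg")
    response.headers["X-Detection-Duration-Ms"] = f"{detection_ms:.1f}"
    response.headers["X-Request-Duration-Ms"] = f"{(time.monotonic() - request_start) * 1000:.1f}"
    return response


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

Returning the timing metadata in response headers is just one way to expose the detection and request durations mentioned in step 6; the actual service may package this metadata differently.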

After obtaining the bounding box and label for each object, the Pillow library is used to draw the rectangle and textual label on the image before returning the HTTP response to the client. For the on-device TPU-based architecture, a simple code update to MFK was needed to make an HTTP request to the Object Detection API, and then functionality was added to render the video frames in the UI.
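On the MFK side, the integration reduces to requesting a labeled frame and rendering it. MFK itself is a Java application; the hypothetical Python call below, matching the endpoint and header names assumed in the sketch above, is only meant to illustrate the shape of that request.

```python
# Hypothetical client request to the Object Detection API. The endpoint and header
# names match the sketch above; they are assumptions, not a published contract.
import requests

response = requests.get("http://coral-dev-board.local:5000/detect", timeout=5)
labeled_jpeg = response.content  # labeled frame bytes, ready to render in the UI
print("detection took", response.headers.get("X-Detection-Duration-Ms"), "ms")
```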

Results

Of the two candidate architectures assessed (CPU-based and on-device TPU-based) for enhancing MFK with IP camera and object detection capabilities, the on-device TPU-based architecture was chosen as the preferred approach. This architecture provides clear-cut advantages over the CPU-based architecture, including superior scalability, greater flexibility and better operational efficiency.

Scalability

The on-device TPU-based architecture provides greater scalability as more IP cameras are added to the MFK network. 

Figure 7: With the on-CPU model inference architecture, the single MFK Node bears the computational load for all connected cameras

As shown in Figure 7, as more IP cameras are added to the MFK network, the MFK node must bear the load of all the computations required for object detection model inference for each newly added camera.

As the MFK node is typically run on a ruggedized laptop and not on a high-powered datacenter server, the computational load will quickly surpass the capabilities of the laptop as additional IP cameras are connected. Furthermore, other processing on the MFK node, such as displaying maps to the user, will be negatively impacted.

Figure 8: With the on-device TPU-based model inference architecture, the computation is balanced across the TPUs

However, in the on-device TPU-based architecture (depicted in Figure 8), as cameras are added, each Coral Dev Board handles the object detection model inference, so the level of AI-related computation required by the MFK node remains constant. As a result, computations performed by the MFK node are focused on MFK core features, such as displaying map information and sharing information with other nodes in the MFK network. 

Flexibility

The on-device TPU-based architecture also provides greater flexibility than the traditional compute architecture. Because object detection model inference is performed on a separate device from the MFK node, the two parts of the architecture are decoupled and can be updated independently.

For example, if a more efficient or accurate object detection deep learning model has been recently trained, deploying that model to the on-device system requires simply copying the new model file to the Coral Dev Board. However, in the traditional compute architecture, deploying a newly trained model would require re-configuring MFK.

Figure 9: With the on-device TPU-based model inference architecture, different models can be run on different cameras across the network

The on-device TPU-based architecture also allows a greater level of customization of model inference by permitting a heterogeneous mix of models to be run on different cameras in different locations throughout the network.

As shown in Figure 9, in a scenario where a specialized model (displayed as M2) has been trained for a specific camera location, the Coral Dev Board connected to that camera could be updated with that model while all the other boards continue to leverage the original model (displayed as M1). In the traditional architecture, because the model inference is centralized, a single model is used for all connected cameras.

Performance

Finally, the on-device edge AI architecture devotes more processing power to model inferencing than the traditional compute architecture does, starting with the TPU itself. Compared to contemporary processing units such as CPUs and GPUs, TPUs deliver 15-30 times higher performance and 30-80 times better performance per watt (Patterson, Sato, & Young, 2017).

Therefore, as shown in Figure 10, a TPU is a much better option than a CPU in this situation, as it is far more effective at ML model inferencing. In addition, the TPU is about half the size of a traditional CPU; given that the cost of a chip is largely a function of its area, this also makes the TPU a much cheaper option than the CPU.

Table illustrating the operations per cycle produced by various processing unit types
Figure 10: Compared to other contemporary processing units, TPUs provide up to 100 times more operations per cycle. Adapted from "AI & Machine Learning: An in-depth look at Google's first Tensor Processing Unit (TPU)", by David Patterson, Kaz Sato and Cliff Young. Retrieved February 26, 2020, from https://cloud.google.com/blog/products/gcp/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu. © 2017.

Conclusion

To enhance the MFK distributed IoT sensor platform with IP camera and near-real-time object detection capabilities, two approaches to enabling ML model inferencing were evaluated: a traditional architecture with centralized, CPU-based compute and an on-device TPU-based edge AI architecture.

In the future, other MFK situational awareness features can be enhanced with object detection predictions. For example, specific detected object types from the video stream (such as people or vehicles in an unauthorized area) can trigger a chat message to be sent to team members within the vicinity, alerting those members to a potential threat. Because the object detection feature includes bounding boxes, images of objects could be captured and shared with the team using the MFK images feature. Each of these events can also be shared as points on the MFK map to provide additional situational awareness to team members in the command center.

Comparing the two approaches using additional processing units (e.g., graphics processing units (GPUs), field programmable gate arrays (FPGAs), vision processing units (VPUs)) may yield interesting results as well. TPUs are more beneficial than CPUs because while CPUs execute commands serially (i.e., one at a time, executing each only after the previous command has completed), TPUs execute commands in parallel (i.e., at the same time), allowing more commands to be run in a given period of time.

Similarly, GPUs, FPGAs and VPUs also process commands in parallel, which leads us to believe that, from an efficiency standpoint, using them would yield benefits comparable to those of the TPU.

Finally, MFK could be run on a machine with a parallel-computation processing unit. This would accelerate its ability to perform ML model inference, making it more comparable to an on-device edge AI architecture in terms of operational efficiency.

As a result of our investigation, between the CPU-based architecture and the TPU-based edge AI architecture, we concluded that the TPU-based edge AI architecture is the more effective in enabling real-time object detection when integrated with the MFK distributed IoT platform. The TPU-based edge AI architecture provides greater scalability and flexibility when compared to the CPU-based architecture, and also offers a performance advantage due to the greater number of operations per cycle provided by the Edge TPU.

References

"Edge TPU Python API Overview". (n.d.) Coral. Retrieved January 19, 2020. https://coral.ai/docs/edgetpu/api-intro/

Howard, Andrew G.; Zhu, Menglong; Chen, Bo; Kalenichenko, Dmitry; Wang, Weijun; Weyand, Tobias; Andreetto, Marco; Adam, Hartwig. "MobileNets: Efficient convolutional neural networks for mobile vision applications", Cornell University. 2017. arXiv:1704.04861. https://arxiv.org/abs/1704.04861

Liu, Wei; Anguelov, Dragomir; Erhan, Dumitru; Szegedy, Christian; Reed, Scott; Fu, Cheng-Yang; Berg, Alexander C. "SSD: Single Shot MultiBox Detector", Cornell University, 2016. arXiv:1512.02325. https://arxiv.org/abs/1512.02325

Patterson, David; Sato, Kaz; Young, Cliff. "An in-depth look at Google's first Tensor Processing Unit (TPU)". Google Cloud: AI & Machine Learning. 2017. Retrieved February 25, 2020. https://cloud.google.com/blog/products/gcp/an-in-depth-look-at-googles-first-tensor-processing-unit-tpu

Redmon, Joseph; Divvala, Santosh; Girshick, Ross; Farhadi, Ali. "You Only Look Once: Unified, Real-time Object Detection". University of Washington, Allen Institute for AI, Facebook AI Research. 2016. https://arxiv.org/pdf/1506.02640v5.pdf