GPUs for ETL: Transforming the data science landscape

Summary

Many organizations are interested in expanding and leveraging their AI and Deep Learning capabilities across various use cases. However, acquiring the necessary AI infrastructure remains a chicken-and-egg problem: one needs the infrastructure to experiment with AI project ideas, but one needs AI project ideas to justify the investment in infrastructure. It is much easier to justify the investment if the same infrastructure can be used to accelerate current processes as well.

A primary component of many data science projects is cleaning and transforming data, evaluating missing information, and creating additional features that can be used for further discovery and model training. When a model is pushed to production, the entire data processing pipeline must be set up as well. That pipeline involves steps such as Extracting, Transforming, and Loading (ETL) data into a data lake. For a model to run efficiently and quickly, this pipeline also needs to be fast. The process of making ETL robust is iterative and therefore laborious. Moreover, with the exponential growth in the amount of data available for analysis today, it becomes imperative to understand how GPUs can help cut down the processing time of ETL steps.
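To make the pipeline concrete, the sketch below shows the same small ETL step written twice: once on the CPU with pandas and once on the GPU with the RAPIDS cuDF library. The choice of cuDF, the file paths, and the column names are illustrative assumptions, not the actual configuration tested in the ATC.

```python
# Minimal ETL sketch: Extract, clean, Transform, Load.
# Assumes RAPIDS cuDF for the GPU path; file paths and columns are hypothetical.
import pandas as pd


def etl_cpu(path_in: str, path_out: str) -> None:
    df = pd.read_csv(path_in)                          # Extract from raw CSV
    df = df.dropna(subset=["amount"])                  # Clean: drop rows missing the key column
    df["amount_usd"] = df["amount"] * df["fx_rate"]    # Transform: derive a feature
    agg = df.groupby("customer_id", as_index=False)["amount_usd"].sum()
    agg.to_parquet(path_out)                           # Load into columnar storage


def etl_gpu(path_in: str, path_out: str) -> None:
    import cudf                                        # GPU DataFrame library (RAPIDS)
    gdf = cudf.read_csv(path_in)                       # Same steps, executed on the GPU
    gdf = gdf.dropna(subset=["amount"])
    gdf["amount_usd"] = gdf["amount"] * gdf["fx_rate"]
    agg = gdf.groupby("customer_id", as_index=False)["amount_usd"].sum()
    agg.to_parquet(path_out)
```

Because cuDF mirrors much of the pandas API, an existing CPU pipeline can often be ported with minimal code changes, which is what makes GPU acceleration of ETL attractive.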

To show how GPUs can reduce this processing time, we ran comparative benchmark tests between GPUs and CPUs and recorded the results. If you would like to see what we found, keep reading in the ATC Insight section.
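The snippet below is a minimal sketch of how such a CPU-versus-GPU timing comparison can be structured; it is not the ATC test plan. The data volume, column names, and the use of cuDF are assumptions for illustration, and a real benchmark would add warm-up runs and repeated trials.

```python
# Hypothetical timing harness comparing the same transform on CPU (pandas) and GPU (cuDF).
import time

import numpy as np
import pandas as pd

rows = 10_000_000  # synthetic data volume, chosen only for illustration
pdf = pd.DataFrame({
    "customer_id": np.random.randint(0, 100_000, size=rows),
    "amount": np.random.rand(rows),
    "fx_rate": np.random.uniform(0.8, 1.2, size=rows),
})

# CPU path
start = time.perf_counter()
pdf["amount_usd"] = pdf["amount"] * pdf["fx_rate"]
cpu_result = pdf.groupby("customer_id")["amount_usd"].sum()
cpu_seconds = time.perf_counter() - start

# GPU path (requires a CUDA-capable GPU and the RAPIDS stack)
import cudf

gdf = cudf.DataFrame.from_pandas(pdf)
start = time.perf_counter()
gdf["amount_usd"] = gdf["amount"] * gdf["fx_rate"]
gpu_result = gdf.groupby("customer_id")["amount_usd"].sum()
gpu_seconds = time.perf_counter() - start

print(f"CPU (pandas): {cpu_seconds:.2f}s   GPU (cuDF): {gpu_seconds:.2f}s")
```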

 

ATC Insight
Test Plan/Test Case
Technologies Under Test
Documentation
