Partner POV | AI Inferencing With AMD EPYC™ Processors
In this partner contribution
This article was written and contributed by our partner, AMD.
It's important for Information Technology (IT) organizations to recognize the ubiquitous potential of AI and the innovative possibilities it offers. AI can be harnessed across various sectors, including commercial and enterprise, cloud data centers, transportation, smart retail, healthcare and life sciences, smart homes, intelligent factories, smart cities, and communication providers. Embracing AI can help businesses stay competitive and explore fresh avenues for growth.
The two most important parts of the deep learning lifecycle are AI training and AI inferencing:
AI TRAINING. This is the most data- and processing-intensive part of the AI lifecycle. Vast amounts of data are fed through models in order to train them to recognize patterns. The significant amount of processing required to train a model demands substantial computing power. Servers equipped with AMD Instinct™ accelerators are specifically designed to accelerate this training process. In this AI lifecycle phase, the aim is to find the right parameter weights for the model to work accurately.
AI INFERENCING. Once the model is trained, it needs a comparatively small amount of processing power to process incoming data in real time. While models are trained at the beginning of the process with a large amount of concentrated computing power, inferencing happens close to the data: in a retail store, in a moving automobile, on factory floors, or in radiology departments. Where computing power meets data is the point where efficiency is everything.
Embedded inferencing devices are crucial components in various applications, including surgical robotics, security endpoints, and smart-grid systems. These applications frequently rely on specialized hardware, which integrates versatile components like AMD Versal™ and Zynq™ adaptive systems-on-chip (SoCs) to enhance their performance and adaptability to specific tasks. This integration of specialized hardware enables efficient and tailored AI processing in these diverse fields.
A highly economical solution is to use off-the-shelf servers equipped with AMD EPYC™ processors. While training is a highly compute-intensive operation, inferencing uses the parameters from training to actually execute the model, which imposes much lower compute demands than training. Using 4th Generation EPYC processors with up to 128 cores, off-the-shelf servers can accelerate a range of data center and edge applications including customer support, retail, automotive, financial services, medical, and manufacturing. These models typically fall into the categories of computer vision, natural language processing, and recommendation systems.
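The training/inferencing split above can be sketched in a few lines: once training has fixed the weights, inference is just a forward pass of matrix multiplications over incoming data, which is why it fits comfortably on CPU cores. A minimal NumPy illustration (all names, sizes, and weights here are invented for this sketch, not AMD code):

```python
# Illustrative sketch: inferencing is a forward pass through a model
# whose parameter weights were fixed during training.
import numpy as np

rng = np.random.default_rng(0)

# Pretend these weights came out of an earlier GPU training run.
W1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 3)), np.zeros(3)

def infer(x):
    """One forward pass: two matmuls and a ReLU, cheap enough for a CPU."""
    h = np.maximum(x @ W1 + b1, 0.0)   # hidden layer with ReLU
    logits = h @ W2 + b2
    return logits.argmax(axis=-1)      # predicted class per input row

batch = rng.standard_normal((4, 8))    # a small batch of incoming data
print(infer(batch))                    # one class prediction per row
```

The point of the sketch is the asymmetry: finding W1 and W2 is the expensive part; applying them at inference time is a handful of matrix operations per request.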
The AMD approach, using a unified inferencing model, facilitates training on high-performance GPU hardware, and then enables you to seamlessly move the model into production with inferencing on servers with AMD EPYC processors. AMD offers software support throughout your AI lifecycle, inferencing in the core, cloud, and edge, and access to optimized models and software stacks that complement AMD hardware.
Three different types of models have had a dramatic impact on business across multiple industries. Computer vision technology can recognize and classify objects, as well as detect anomalies, by analyzing images and video feeds, making it invaluable in applications such as surveillance, quality control in manufacturing, and even autonomous vehicles. Natural language processing can help recognize speech and make meaning out of written words to assist customers of all types. Recommendation systems can help predict everything from customer needs to anomalies in telemetry data. By focusing on accelerating these three model classes, you can reap the benefits regardless of your industry.
- Automotive. Computer vision models help propel self-driving cars and also help recognize signage, pedestrians, and other vehicles to be avoided. Natural-language processing models can help recognize spoken commands to in-car telematics.
- Manufacturing. Use computer vision models to monitor the quality of manufactured products from food items to printed-circuit boards. Feed telemetry data into recommendation engines to suggest proactive maintenance: Are disk drives about to fail? Is the engine using too much oil?
- Retail. Automate checkout lines by recognizing products, or even create autonomous shopping experiences where the models link customers with the items they choose and put into their bags. Use product recommendation engines to offer alternatives, whether online or in the store.
- Financial Services. AI-powered anomaly detection helps stop credit card fraud, while computer vision models watch for suspicious documents including customer checks.
- Medical. Detect anomalies including fractures and tumors with computer vision models. Use the same models in research to assess in vitro cell growth and proliferation.
- Service Automation. Where IT meets customers, natural-language processing can help take action based on spoken requests, and recommendation engines can help point customers to satisfactory solutions and product alternatives.
Enabling AI models to function effectively across different industries requires developers to transition seamlessly from the training phase to the inferencing phase, ensuring that both stages deliver high-performance results. AMD supports three AI software stacks, one for each of our three architectures: AMD EPYC processors, AMD Instinct GPUs, and Xilinx Versal and Zynq adaptive SoCs (see grey components in Figure 1). Each of these three stacks is optimized to deliver excellent performance on the underlying hardware.
The AMD Unified Inference Frontend (UIF), which is represented by the blue components in Figure 1, provides consistent access to these stacks through the three most popular frameworks for AI. Underneath each framework are tools and libraries optimized for the supporting hardware platform. The frameworks we support include:
- TensorFlow: This Google-owned platform focuses on the training and inference of deep neural networks.
- PyTorch: Originally developed by Meta, PyTorch was recently welcomed into the Linux® Foundation.
- ONNX Runtime: A Microsoft-sponsored inference engine for models in the Open Neural Network Exchange (ONNX) format.
To provide a single entry point for model development and deployment, UIF gathers a set of optimized inference models into a single model zoo, a repository that serves as a central hub for model development and deployment. These optimized models plug seamlessly into each of the three hardware stacks and, because they are transportable, your model can run on any of our stacks without modification. Now you gain the power to use the best hardware platform for AI inferencing and the flexibility to change platforms as different needs arise. As we see customers mixing CPUs, GPUs, and adaptive SoCs in their inferencing operations, the Unified Inference Frontend can enhance the AI lifecycle. These SoCs are commonly used in embedded systems and devices to perform specific functions, and in this case they are utilized for AI inferencing operations.
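As a hedged sketch of what this portability can look like in practice, frameworks such as ONNX Runtime let a deployment script select an execution provider list at run time and fall back to the CPU, so the same model file serves every stack. The preference order and helper below are illustrative, not AMD tooling:

```python
# Illustrative only: choose ONNX Runtime execution providers, preferring an
# accelerator when the host exposes one, always falling back to the CPU.
def pick_providers(available):
    preferred = [
        "ROCMExecutionProvider",     # AMD Instinct GPUs
        "VitisAIExecutionProvider",  # Versal/Zynq adaptive SoCs
    ]
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")  # EPYC CPU path always works
    return chosen

# On a CPU-only EPYC server, only the CPU provider is selected:
print(pick_providers(["CPUExecutionProvider"]))
```

The resulting list would then be passed to an `onnxruntime.InferenceSession(..., providers=...)` call, with the same ONNX model file reused unchanged on each hardware platform.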
AI inferencing generally takes place close to the data, and servers with AMD EPYC processors are often there too, ready to take on the task. In retail environments, video streams can be processed for monitoring inventory with on-site edge servers. In manufacturing, assembly-line images of products can be inspected for defects.
In medical imaging applications, hospitals already store and use images on their centralized servers. Financial and consumer services are generally centralized, so the data that needs to be analyzed is already collocated with servers often powered by AMD EPYC processors.
Whether at the core or at the edge, using servers with AMD EPYC processors for inferencing is an easy choice. The 4th Gen AMD EPYC processors have made huge gains in areas that accelerate inferencing operations, enabling a hardware boost. The AMD Unified Inference Frontend includes software optimizations that help make models running with our software run at peak performance.
AMD EPYC processors power the servers most often selected for AI inferencing, making them the AI platform of choice (EPYC-037). Now, with 4th Gen EPYC processors, AMD is making huge strides in further improving the aptitude of AMD EPYC processors for performing AI inferencing, including the following:
- 'Zen 4' Core: The core architecture has been optimized to deliver approximately 14% more instructions per clock cycle than the previous generation, based on a mix of workloads covering different types of tasks.
- More Cores: The new processors have 50% more cores compared to the previous generation. Having more cores allows the processor to handle multiple tasks at the same time without needing additional GPU support.
- AVX-512 Instruction Support: The 'Zen 4' core now supports AVX-512 extensions, which significantly boost AI inferencing performance for various data types. These extensions also support BF16 data types, providing better data throughput without the complexities of using INT8 data. This implementation is more efficient and can maintain higher frequencies compared to the previous AVX2 support.
- Faster Memory with More Channels: The new processors use DDR5 memory and have 50% more memory channels (12 in total), resulting in 2.25 times more memory throughput compared to the previous generation.
- Faster I/O: The processors support PCIe Gen 5, which doubles the I/O (input/output) throughput compared to the previous generation. This means data can be moved in and out of the processor much faster, which is especially useful for AI inferencing tasks.
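Whether a given server actually exposes instruction extensions such as AVX-512 and BF16 can be checked from software; on Linux they appear as feature flags in /proc/cpuinfo. A small illustrative parser (the flag names follow Linux kernel conventions; this is not AMD tooling):

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.lower().startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

# Trimmed example input; on a real machine you would read
# open("/proc/cpuinfo").read() instead of this sample string.
sample = "flags\t\t: fpu sse2 avx2 avx512f avx512_bf16 avx512_vnni"
flags = cpu_flags(sample)
for wanted in ("avx512f", "avx512_bf16", "avx512_vnni"):
    print(wanted, wanted in flags)
```

A framework build optimized for AVX-512 can use such a check to decide at startup which vectorized kernels to dispatch.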
Performance measurements by AMD and third parties confirm the benefits of using AMD EPYC processors for inferencing. Enabling AVX-512 instructions makes image processing run dramatically faster while also increasing the amount of work done per watt—proof of the efficiency of their implementation. Comparing AMD server processors generation over generation, all three of the areas in which AMD strives to excel demonstrate dramatic speedups: computer vision, natural language processing, and recommendation engines.
The bottom line is that when you are considering using server CPUs for AI inferencing, AMD EPYC processors can deliver the performance you need with superb energy efficiency characteristics. And those are just the hardware measurements. Next, we add the benefits of AMD software optimizations.
Your AI models are trained infrequently. In comparison, inferencing happens every day, minute by minute, across your business. Inferencing needs to be close to your customers and it must deliver the high performance and economics that can help AI transform your business.
Servers powered by AMD EPYC processors provide an excellent platform for CPU-based AI inferencing. With performance propelled by the energy-efficient AMD EPYC processor, up to 128 cores of processing power, and optimized libraries whose primitives drive the processor to deliver its might to your solutions, it is hard to find a better solution. With the AMD Unified Inference Frontend, you are covered. Whether your AI inferencing delivers the performance you need on AMD EPYC processors, on servers with AMD Instinct GPU accelerators, or on Versal and Zynq adaptive SoCs, AMD offers the freedom to run your model across its hardware platforms to take advantage of the best that AMD has to offer.