Partner POV | AMD Instinct MI350 Series and Beyond: Accelerating the Future of AI and HPC
In this article
- Powering tomorrow's AI workloads
- The numbers that matter: Instinct MI350 Series specifications
- Ecosystem momentum ready to deploy
- ROCm™ 7: The open software engine for AI acceleration
- What's next: Previewing the AMD Instinct MI400 series and "Helios" AI rack
- Laying the foundation for the future of AI
Written and provided by: Vamsi Boppana, Senior Vice President, Artificial Intelligence Group, AMD
At a Glance:
- AMD launched the AMD Instinct™ MI350 Series, delivering up to a 4x generation-on-generation AI compute improvement and up to a 35x leap in inferencing performance
- AMD launched ROCm 7.0, delivering over 4x inference and 3x training performance improvements over ROCm 6.0
- AMD also showcased its new developer cloud to empower AI developers with seamless access to AMD Instinct GPUs and ROCm for their AI innovation
- The company also previewed its next-gen "Helios" AI rack infrastructure, integrating MI400 GPUs, EPYC "Venice" CPUs, and Pensando "Vulcano" NICs for unprecedented AI compute density and scalability
The world of AI isn't slowing down, and neither is AMD. The company isn't just keeping pace; it's setting the bar. AMD customers are demanding real, deployable solutions that scale, and that's exactly what AMD is delivering with the AMD Instinct MI350 Series. With cutting-edge performance, massive memory bandwidth, and flexible, open infrastructure, AMD is empowering innovators across industries to go faster, scale smarter, and build what's next.
Powering tomorrow's AI workloads
Built on the AMD CDNA™ 4 architecture, the AMD Instinct MI350X and MI355X GPUs are purpose-built for the demands of modern AI infrastructure. The MI350 Series delivers a 4x generation-on-generation AI compute increase as well as a 35x generational leap in inferencing, paving the way for transformative AI solutions across industries. These GPUs deliver leading memory capacity (288 GB of HBM3E from Micron and Samsung Electronics) and bandwidth (up to 8 TB/s), ensuring exceptional throughput for inference and training alike.
With flexible air-cooled and direct liquid-cooled configurations, the Instinct MI350 Series is optimized for seamless deployment. It supports up to 64 GPUs in an air-cooled rack and up to 128 GPUs in a direct liquid-cooled rack, delivering up to 2.6 exaFLOPS of FP4/FP6 performance. The result is faster time-to-AI and reduced costs on industry-standards-based infrastructure.
The numbers that matter: Instinct MI350 Series specifications
| Specifications (peak theoretical) | AMD Instinct™ MI350X GPU | AMD Instinct™ MI350X Platform | AMD Instinct™ MI355X GPU | AMD Instinct™ MI355X Platform |
| --- | --- | --- | --- | --- |
| GPUs | Instinct MI350X OAM | 8 x Instinct MI350X OAM | Instinct MI355X OAM | 8 x Instinct MI355X OAM |
| GPU architecture | CDNA 4 | CDNA 4 | CDNA 4 | CDNA 4 |
| Dedicated memory size | 288 GB HBM3E | 2.3 TB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E |
| Memory bandwidth | 8 TB/s | 8 TB/s per OAM | 8 TB/s | 8 TB/s per OAM |
| FP64 performance | 72 TFLOPS | 577 TFLOPS | 78.6 TFLOPS | 628.8 TFLOPS |
| FP16 performance* | 4.6 PFLOPS | 36.8 PFLOPS | 5 PFLOPS | 40.2 PFLOPS |
| FP8 performance* | 9.2 PFLOPS | 73.82 PFLOPS | 10.1 PFLOPS | 80.5 PFLOPS |
| FP6 performance* | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |
| FP4 performance* | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |

*With structured sparsity
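A quick sanity check ties these per-GPU figures to the rack-level claim above: the up-to-2.6-exaFLOPS number follows from a 128-GPU direct liquid-cooled rack using the MI355X peak FP4 figure (with structured sparsity):

```latex
% 128 MI355X GPUs at 20.1 PFLOPS peak FP4 (structured sparsity)
128 \times 20.1\,\text{PFLOPS} = 2572.8\,\text{PFLOPS} \approx 2.6\,\text{EFLOPS}
```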
Ecosystem momentum ready to deploy
The MI350 Series will be broadly available through leading cloud service providers, including major hyperscalers and next-generation neoclouds, giving customers flexible options to scale AI in the cloud. At the same time, top OEMs like Dell, HPE, and Supermicro are integrating MI350 Series solutions into their platforms, delivering powerful on-premises and hybrid AI infrastructure.
ROCm™ 7: The open software engine for AI acceleration
AI is evolving at record speed, and AMD's vision with ROCm is to unlock that innovation for everyone through an open, scalable, and developer-focused platform. Over the past year, ROCm has rapidly matured, delivering leadership inference performance, expanding training capabilities, and deepening its integration with the open-source community. ROCm now powers some of the largest AI platforms in the world, supporting major models like LLaMA and DeepSeek from day one, and delivering over 3.5x inference gains in the upcoming ROCm 7 release. With frequent updates, advanced data types like FP4, and new algorithms like FAv3 (FlashAttention v3), ROCm is enabling next-generation AI performance while driving open-source frameworks like vLLM and SGLang forward faster than closed alternatives.
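To make that concrete, vLLM's standard offline-inference API runs unchanged on ROCm builds targeting AMD Instinct GPUs. The sketch below assumes such a build; the checkpoint name is only an illustrative placeholder, and any vLLM-supported model works the same way:

```python
# Minimal vLLM offline-inference sketch. Assumes a ROCm build of vLLM
# running on an AMD Instinct GPU; the model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Summarize what FP4 quantization buys you at inference time."], params
)
for out in outputs:
    print(out.outputs[0].text)
```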
As AI adoption shifts from research to real-world enterprise deployment, ROCm is evolving with it. ROCm Enterprise AI brings a full-stack MLOps platform to the forefront, enabling secure, scalable AI with turnkey tools for fine-tuning, compliance, deployment, and integration. With over 1.8 million Hugging Face models running out of the box and industry benchmarks now in play, ROCm is not just catching up; it's leading the open AI revolution.
Developers are at the heart of everything we do. We're deeply committed to delivering an exceptional experience, making it easier than ever to build on ROCm with better out-of-the-box tools, real-time CI dashboards, rich collateral, and an active developer community. From hackathons to high-performance kernel contests, momentum is building fast. And on June 12, 2025, we were thrilled to announce and launch the AMD Developer Cloud, giving developers instant, barrier-free access to ROCm and AMD GPUs to accelerate innovation.
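For a first session on the AMD Developer Cloud, or any ROCm machine, a minimal sanity check confirms the framework actually sees the GPU. ROCm builds of PyTorch report AMD GPUs through the familiar torch.cuda namespace, so the check looks like this:

```python
# Environment sanity check on a ROCm machine. PyTorch's ROCm build
# exposes AMD GPUs through the torch.cuda API for compatibility.
import torch

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. an Instinct accelerator
    print("HIP version:", torch.version.hip)         # set on ROCm builds, None on CUDA builds
```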
Whether optimizing large language models (LLMs) or scaling out inferencing platforms, ROCm 7 gives developers the tools they need to move from experimentation to production fast.
What's next: Previewing the AMD Instinct MI400 series and "Helios" AI rack
The AMD commitment to innovation doesn't stop with the Instinct MI350 Series. The company previewed its next-generation AMD Instinct MI400 Series, which will offer a new level of performance in 2026.
The AMD Instinct MI400 Series will represent a dramatic generational leap in performance, enabling full rack-level solutions for large-scale training and distributed inference. Key performance innovations include:
- Up to 432 GB of HBM4 memory
- 19.6 TB/s memory bandwidth
- 40 PFLOPS of FP4 and 20 PFLOPS of FP8 performance
- 300 GB/s scale-out bandwidth
The "Helios" AI rack infrastructure – coming in 2026 – is engineered from the ground up to unify AMD's leadership silicon—AMD EPYC "Venice" CPUs, Instinct MI400 series GPUs and Pensando "Vulcano" AI NICs—and ROCm software into a fully integrated solution. Helios is designed as a unified system supporting a tightly coupled scale-up domain of up to 72 MI400 series GPUs with 260 terabytes per second of scale up bandwidth with support for Ultra Accelerator Link.
Laying the foundation for the future of AI
Built on the latest AMD CDNA 4 architecture and supported by the open and optimized ROCm software stack, Instinct MI350X and MI355X GPUs enable customers to deploy powerful, future-ready AI infrastructure today.
ROCm is unlocking AI innovation with open-source speed, developer-first design, and breakthrough performance. From inference to training to full-stack deployment, it's built to scale with the future of AI.
And with ROCm 7 and the AMD Developer Cloud, AMD is just getting started.
As AMD looks ahead to the next era of AI with the upcoming MI400 Series and "Helios" rack architecture, the Instinct MI350 Series sets a new standard today, empowering organizations to move faster, scale smarter, and unlock the full potential of generative AI and high-performance computing.