Partner POV | AMD Instinct MI350 Series and Beyond: Accelerating the Future of AI and HPC
In this article
- Powering tomorrow's AI workloads
- The numbers that matter: Instinct MI350 Series specifications
- Ecosystem momentum ready to deploy
- ROCm™ 7: The open software engine for AI acceleration
- What's next: Previewing the AMD Instinct MI400 series and "Helios" AI rack
- Laying the foundation for the future of AI
Written and provided by: Vamsi Boppana, Senior Vice President, Artificial Intelligence Group, AMD
At a Glance:
- AMD launched the AMD Instinct™ MI350 Series, delivering up to a 4x generation-on-generation AI compute improvement and up to a 35x leap in inferencing performance
- AMD launched ROCm 7.0, delivering over 4x inference and 3x training performance improvements over ROCm 6.0
- AMD also showcased its new developer cloud to empower AI developers with seamless access to AMD Instinct GPUs and ROCm for their AI innovation
- The company also previewed its next-gen "Helios" AI rack infrastructure, integrating MI400 GPUs, EPYC "Venice" CPUs, and Pensando "Vulcano" NICs for unprecedented AI compute density and scalability
The world of AI isn't slowing down, and neither is AMD. The company isn't just keeping pace; it's setting the bar. AMD customers are demanding real, deployable solutions that scale, and that's exactly what AMD is delivering with the AMD Instinct MI350 Series. With cutting-edge performance, massive memory bandwidth, and flexible, open infrastructure, AMD is empowering innovators across industries to go faster, scale smarter, and build what's next.
Powering tomorrow's AI workloads
Built on the AMD CDNA™ 4 architecture, the AMD Instinct MI350X and MI355X GPUs are purpose-built for the demands of modern AI infrastructure. The MI350 Series delivers a 4x generation-on-generation AI compute increase as well as a 35x generational leap in inferencing, paving the way for transformative AI solutions across industries. These GPUs deliver leading memory capacity (288 GB of HBM3E from Micron and Samsung Electronics) and bandwidth (up to 8 TB/s), ensuring exceptional throughput for inference and training alike.
With flexible air-cooled and direct liquid-cooled configurations, the Instinct MI350 Series is optimized for seamless deployment. It supports up to 64 GPUs in an air-cooled rack and up to 128 GPUs in a direct liquid-cooled rack, delivering up to 2.6 exaFLOPS of FP4/FP6 performance. The result is faster time-to-AI and reduced costs on industry-standards-based infrastructure.
The numbers that matter: Instinct MI350 Series specifications
| Specifications (peak theoretical) | AMD Instinct™ MI350X GPU | AMD Instinct™ MI350X Platform | AMD Instinct™ MI355X GPU | AMD Instinct™ MI355X Platform |
| --- | --- | --- | --- | --- |
| GPUs | Instinct MI350X OAM | 8 x Instinct MI350X OAM | Instinct MI355X OAM | 8 x Instinct MI355X OAM |
| GPU architecture | CDNA 4 | CDNA 4 | CDNA 4 | CDNA 4 |
| Dedicated memory size | 288 GB HBM3E | 2.3 TB HBM3E | 288 GB HBM3E | 2.3 TB HBM3E |
| Memory bandwidth | 8 TB/s | 8 TB/s per OAM | 8 TB/s | 8 TB/s per OAM |
| FP64 performance | 72 TFLOPS | 577 TFLOPS | 78.6 TFLOPS | 628.8 TFLOPS |
| FP16 performance* | 4.6 PFLOPS | 36.8 PFLOPS | 5 PFLOPS | 40.2 PFLOPS |
| FP8 performance* | 9.2 PFLOPS | 73.82 PFLOPS | 10.1 PFLOPS | 80.5 PFLOPS |
| FP6 performance* | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |
| FP4 performance* | 18.45 PFLOPS | 147.6 PFLOPS | 20.1 PFLOPS | 161 PFLOPS |

*With structured sparsity
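A quick sanity check ties these per-GPU figures to the rack-level claim above: the up-to-2.6-exaFLOPS number follows from a 128-GPU direct liquid-cooled rack using the MI355X peak FP4 figure (with structured sparsity):

```latex
% 128 MI355X GPUs at 20.1 PFLOPS peak FP4 (structured sparsity)
128 \times 20.1\,\text{PFLOPS} = 2572.8\,\text{PFLOPS} \approx 2.6\,\text{EFLOPS}
```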
Ecosystem momentum ready to deploy
The MI350 Series will be broadly available through leading cloud service providers, including major hyperscalers and next-generation neoclouds, giving customers flexible options to scale AI in the cloud. At the same time, top OEMs like Dell, HPE, and Supermicro are integrating MI350 Series solutions into their platforms, delivering powerful on-premises and hybrid AI infrastructure.
ROCm™ 7: The open software engine for AI acceleration
AI is evolving at record speed, and AMD's vision with ROCm is to unlock that innovation for everyone through an open, scalable, and developer-focused platform. Over the past year, ROCm has rapidly matured, delivering leadership inference performance, expanding training capabilities, and deepening its integration with the open-source community. ROCm now powers some of the largest AI platforms in the world, supporting major models like LLaMA and DeepSeek from day one, and delivering over 3.5x inference gains in the upcoming ROCm 7 release. With frequent updates, advanced data types like FP4, and new algorithms like FAv3 (FlashAttention v3), ROCm is enabling next-generation AI performance while driving open-source frameworks like vLLM and SGLang forward faster than closed alternatives.
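To make that concrete, vLLM's standard offline-inference API runs unchanged on ROCm builds targeting AMD Instinct GPUs. The sketch below assumes such a build; the checkpoint name is only an illustrative placeholder, and any vLLM-supported model works the same way:

```python
# Minimal vLLM offline-inference sketch. Assumes a ROCm build of vLLM
# running on an AMD Instinct GPU; the model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(
    ["Summarize what FP4 quantization buys you at inference time."], params
)
for out in outputs:
    print(out.outputs[0].text)
```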
As AI adoption shifts from research to real-world enterprise deployment, ROCm is evolving with it. ROCm Enterprise AI brings a full-stack MLOps platform to the forefront, enabling secure, scalable AI with turnkey tools for fine-tuning, compliance, deployment, and integration. With over 1.8 million Hugging Face models running out of the box and industry benchmarks now in play, ROCm is not just catching up; it's leading the open AI revolution.
Developers are at the heart of everything we do. We're deeply committed to delivering an exceptional experience, making it easier than ever to build on ROCm with better out-of-the-box tools, real-time CI dashboards, rich collateral, and an active developer community. From hackathons to high-performance kernel contests, momentum is building fast. And on June 12, 2025, we were thrilled to announce and launch the AMD Developer Cloud, giving developers instant, barrier-free access to ROCm and AMD GPUs to accelerate innovation.
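For a first session on the AMD Developer Cloud, or any ROCm machine, a minimal sanity check confirms the framework actually sees the GPU. ROCm builds of PyTorch report AMD GPUs through the familiar torch.cuda namespace, so the check looks like this:

```python
# Environment sanity check on a ROCm machine. PyTorch's ROCm build
# exposes AMD GPUs through the torch.cuda API for compatibility.
import torch

print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))  # e.g. an Instinct accelerator
    print("HIP version:", torch.version.hip)         # set on ROCm builds, None on CUDA builds
```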
Whether optimizing large language models (LLMs) or scaling out inferencing platforms, ROCm 7 gives developers the tools they need to move from experimentation to production fast.
What's next: Previewing the AMD Instinct MI400 series and "Helios" AI rack
The AMD commitment to innovation doesn't stop with the Instinct MI350 Series. The company previewed its next-generation AMD Instinct MI400 Series, which will offer a new level of performance in 2026.
The AMD Instinct MI400 Series will represent a dramatic generational leap in performance, enabling full rack-level solutions for large-scale training and distributed inference. Key performance innovations include:
- Up to 432 GB of HBM4 memory
- 19.6 TB/s memory bandwidth
- 40 PFLOPS of FP4 and 20 PFLOPS of FP8 performance
- 300 GB/s scale-out bandwidth
The "Helios" AI rack infrastructure – coming in 2026 – is engineered from the ground up to unify AMD's leadership silicon—AMD EPYC "Venice" CPUs, Instinct MI400 series GPUs and Pensando "Vulcano" AI NICs—and ROCm software into a fully integrated solution. Helios is designed as a unified system supporting a tightly coupled scale-up domain of up to 72 MI400 series GPUs with 260 terabytes per second of scale up bandwidth with support for Ultra Accelerator Link.
Laying the foundation for the future of AI
Built on the latest AMD CDNA 4 architecture and supported by the open and optimized ROCm software stack, Instinct MI350X and MI355X GPUs enable customers to deploy powerful, future-ready AI infrastructure today.
ROCm is unlocking AI innovation with open-source speed, developer-first design, and breakthrough performance. From inference to training to full-stack deployment, it's built to scale with the future of AI.
And with ROCm 7 and the AMD Developer Cloud, AMD is just getting started.
As AMD looks ahead to the next era of AI with the upcoming MI400 Series and "Helios" rack architecture, the Instinct MI350 Series sets a new standard today, empowering organizations to move faster, scale smarter, and unlock the full potential of generative AI and high-performance computing.