Building the Future of Enterprise AI: IBM watsonx.ai on Red Hat OpenShift with Intel Gaudi 3
A modern AI platform for the enterprise
At the heart of this solution is a layered, cloud-native architecture that aligns rapid AI innovation with the operational rigor required in enterprise environments. Rather than treating AI as a standalone tool, the platform integrates compute, orchestration, MLOps, and foundation model services into a unified ecosystem that supports the full AI lifecycle from experimentation to large-scale production.
The foundation of the platform is built on Intel Gaudi 3-based systems, delivering the high-performance acceleration required for training and serving modern AI and foundation models. These systems provide the throughput, memory bandwidth, and scalability needed to support demanding workloads while maintaining cost-efficient operations.
IBM watsonx.ai: Enterprise-Grade AI Studio
IBM watsonx.ai provides the tools to build, fine-tune, govern, and deploy foundation-model-based AI solutions. It enables organizations to move beyond isolated experiments and create AI applications that are trusted, explainable, and integrated into real business workflows.
With watsonx.ai, teams can:
- Develop and orchestrate generative AI solutions
- Leverage foundation models for reasoning and content generation
- Manage prompts, pipelines and model lifecycles
- Apply governance, monitoring and risk controls
Red Hat OpenShift: The Hybrid Cloud Foundation
Red Hat OpenShift delivers the enterprise Kubernetes platform required to run AI workloads consistently across on-premises, public cloud, and edge environments. It provides:
- Secure, containerized infrastructure
- Built-in scalability and resilience
- DevSecOps and CI/CD integration
- Hybrid and multi-cloud portability
This ensures AI workloads are not locked into a single environment and can evolve alongside business needs.
Red Hat OpenShift AI: End-to-End MLOps
OpenShift AI extends OpenShift with a comprehensive data science and MLOps layer, supporting:
- Data preparation and notebooks
- Distributed training and pipelines
- Model serving and monitoring
- Lifecycle automation and governance
It underpins watsonx.ai workloads, enabling teams to manage the full AI lifecycle from development through deployment and continuous optimization.
Intel® Gaudi® 3: High-Performance AI Acceleration
Intel Gaudi 3 accelerators are purpose-built for large-scale AI training and inference. They deliver:
- High throughput for foundation models
- Strong performance-per-dollar economics
- High-bandwidth memory and networking
- An open, developer-friendly software ecosystem
Running this stack on Gaudi 3 systems allows enterprises to scale AI workloads efficiently while maintaining flexibility and control.
Supermicro AI Training Super Server
The platform runs on the Supermicro SuperServer SYS-822GA-NGR3, an 8U rackmount AI training system designed for large-scale machine learning, deep learning, large language model (LLM), and HPC workloads.
Key features
- High-Density Accelerators: Supports up to 8 Intel Gaudi® 3 AI accelerators (OAM form factor) for massive parallel computing and model training.
- Dual CPU Support: Dual Intel® Xeon® 6900 series processors with P-cores via LGA-7529 sockets, up to 128 cores/256 threads per CPU.
- Memory Capacity: 24 DIMM slots supporting up to 6 TB of DDR5 ECC memory (RDIMM/LRDIMM at 6400 MT/s or MRDIMM at 8800 MT/s).
- Networking: 6 × OSFP 800 GbE ports onboard — ideal for high-bandwidth interconnects in AI clusters.
- Storage: 8 hot-swap 2.5″ NVMe Gen5 bays at the front, plus 2 M.2 PCIe 5.0 x2 NVMe slots for boot or cache.
- PCIe Expansion: 2 × PCIe 5.0 x16 FHFL slots, 2 × PCIe 5.0 x8 FHFL slots, and 1 × PCIe 5.0 x4 AIOM (OCP 3.0) slot.
Power, Cooling & Chassis
- Redundant Power Supplies: 8 × 3000 W Titanium Level (96% efficiency) units for stable, high-power delivery.
- Chassis: 8U rackmount with heavy-duty cooling fans optimized for dense GPU/accelerator loads.
- Management & Security: SuperCloud Composer, Supermicro Server Manager (SSM), hardware TPM 2.0, secure boot, firmware signing, and remote management options.
Why this stack matters
Performance with Choice
Intel Gaudi 3 delivers enterprise-class acceleration without locking organizations into proprietary ecosystems.
Open by Design
Red Hat OpenShift ensures portability, interoperability, and long-term flexibility across environments.
Trusted Enterprise AI
IBM watsonx.ai brings governance, transparency, and integration: key requirements for regulated and large-scale enterprises.
Operational Excellence
OpenShift AI provides the tooling to manage AI as a living system, not a one-time project.
Intel Gaudi 3: Accelerating RAG & Visual Summarization Workflows
Intel Gaudi 3 is Intel's third-generation AI accelerator, designed for high-performance generative AI and large-scale RAG pipelines, including multimodal tasks such as video summarization and vision-augmented retrieval. It builds on Gaudi 2's architecture with increased compute, memory bandwidth, and efficient scalability for LLM and multimodal AI workloads.
While Intel's primary focus has been text-centric RAG, video and visual summarization emerge naturally as a next step.
Integrating Visual Summarization into RAG
Modern video summarization techniques use vision-language models (VLMs) to process video frames, audio transcripts, and visual embeddings, producing concise, narrative summaries of long videos. This approach extracts key events and semantic context, dramatically simplifying browsing and search.
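As an illustrative sketch (not the product's actual pipeline), the first step of such a workflow, selecting keyframes worth summarizing, can be reduced to comparing embeddings of consecutive frames and keeping a frame whenever it diverges enough from the last kept keyframe. The vectors below are toy stand-ins for real VLM embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_keyframes(frame_embeddings, threshold=0.9):
    """Keep a frame when its similarity to the last kept keyframe
    drops below `threshold` (i.e., the scene has changed enough)."""
    if not frame_embeddings:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(frame_embeddings)):
        if cosine(frame_embeddings[i], frame_embeddings[keyframes[-1]]) < threshold:
            keyframes.append(i)
    return keyframes

# Toy embeddings: frames 0-1 are near-identical, frame 2 is a scene change.
frames = [[1.0, 0.0], [0.99, 0.05], [0.1, 1.0]]
print(select_keyframes(frames))  # -> [0, 2]
```

In production, the embeddings would come from a VLM running on the accelerators, but the selection logic stays this simple.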
Visual Summarization
Combining vision models with RAG enables "visual RAG summarization" workflows where:
- Frames or segments are embedded into vector stores,
- A retriever surfaces relevant segments based on text or visual queries,
- LLMs articulate summaries, answers, or highlights in natural language.
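The three steps above can be sketched as a minimal in-memory pipeline. This is a hedged illustration, not the WWT/Intel implementation: the vector store is a plain list, the embeddings are toy vectors, and `summarize` stands in for a real LLM call:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class SegmentStore:
    """Step 1: in-memory stand-in for a vector database of video segments."""
    def __init__(self):
        self.segments = []  # (embedding, caption) pairs

    def add(self, embedding, caption):
        self.segments.append((embedding, caption))

    def retrieve(self, query_embedding, k=2):
        """Step 2: return the k captions most similar to the query."""
        ranked = sorted(self.segments,
                        key=lambda s: cosine(s[0], query_embedding),
                        reverse=True)
        return [caption for _, caption in ranked[:k]]

def summarize(captions):
    """Step 3: stand-in for an LLM that turns retrieved context into prose."""
    return "Summary of retrieved segments: " + "; ".join(captions)

store = SegmentStore()
store.add([0.9, 0.1], "forklift enters loading dock")
store.add([0.1, 0.9], "customer at checkout")
store.add([0.8, 0.2], "pallet unloaded at dock")

# A query embedding close to the "loading dock" segments.
print(summarize(store.retrieve([1.0, 0.0], k=2)))
```

Swapping the toy pieces for a real vector database, VLM embeddings, and an LLM endpoint yields the workflow described above.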
This methodology is increasingly highlighted in industry demonstrations (e.g., retail surveillance summaries and interactive Q&A interfaces over video).
WWT and Intel created a Visual RAG architecture using LangChain-based orchestration and a Llama large language model. Within this architecture, Visual Question Answering (VQA) is the task of answering open-ended questions about an image: the input to models supporting this task is typically a combination of an image and a question, and the output is an answer expressed in natural language.
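To make the VQA contract concrete, a minimal interface sketch is shown below. `DummyVQAModel` is a hypothetical stand-in for a real vision-language model (in the architecture above, that role is played by a Llama-based model behind LangChain orchestration); only the request/response shape is the point:

```python
from dataclasses import dataclass

@dataclass
class VQARequest:
    image: bytes   # encoded frame or image
    question: str  # open-ended natural-language question

@dataclass
class VQAResponse:
    answer: str    # natural-language answer

class DummyVQAModel:
    """Hypothetical stand-in: answers from a lookup table.
    A production model runs image + question through a multimodal LLM."""
    def __init__(self, canned):
        self.canned = canned

    def __call__(self, request: VQARequest) -> VQAResponse:
        return VQAResponse(answer=self.canned.get(request.question, "I don't know."))

model = DummyVQAModel({"How many forklifts are visible?": "Two."})
resp = model(VQARequest(image=b"<jpeg bytes>",
                        question="How many forklifts are visible?"))
print(resp.answer)  # -> Two.
```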
Visual RAG Architecture
Video Summarization Architecture
Why Intel Gaudi 3 for Video RAG?
Compared to traditional GPU clusters:
- Optimized specifically for generative AI workloads
- High memory capacity for multimodal models
- Ethernet-based scale-out
- Competitive cost-per-token economics
- Open software ecosystem
For enterprises running OpenShift, Gaudi integrates cleanly into Kubernetes-native workflows without proprietary lock-in.
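As a sketch of that integration: with the Intel Gaudi (Habana) device plugin installed on the cluster, a pod requests accelerators through the extended resource name `habana.ai/gaudi`, just like any other Kubernetes resource. Names and the image below are illustrative placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gaudi-training-job        # illustrative name
spec:
  containers:
  - name: trainer
    image: <your-gaudi-enabled-training-image>   # placeholder
    resources:
      limits:
        habana.ai/gaudi: 8        # request all 8 accelerators on the node
```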
Video and Visual Summarization represent the next frontier of enterprise AI, converting unstructured video into searchable, explainable intelligence.
By combining:
- Intel Gaudi 3 accelerators
- IBM watsonx.ai platform
- Red Hat OpenShift container platform
- OpenShift AI model lifecycle management
- Vector databases for retrieval
Enterprises can deploy an open, scalable, secure, and high-performance video intelligence platform capable of handling petabyte-scale archives.
This architecture turns video from passive storage into an active decision-support system.
Intel Gaudi 3 is purpose-built to power production-scale generative AI across industry verticals, including:
- Consumer Goods and Retail
- Healthcare and Medicine
- Manufacturing
- Media and Entertainment
- Financial Services
The path forward
As AI becomes embedded in the core of enterprise operations, success will depend less on isolated models and more on robust platforms: platforms that can scale, adapt, and govern AI responsibly.
By deploying IBM watsonx.ai with Red Hat OpenShift and OpenShift AI on Intel Gaudi 3 systems, organizations establish a future-proof foundation for enterprise AI, one that turns innovation into impact.
The future of AI is not just about intelligence. It's about infrastructure, integration, and execution at scale.