AI's potential to completely transform how organizations operate is exciting. Whether AI is used to enable new revenue streams, cultivate client loyalty through personalization, drive efficiencies through automation, or extract data-driven insights to guide decision-making, the possibilities are limited only by the imagination.

As data centers, workflows and applications evolve to support the technical demands of AI, navigating the complexities of this modernization work — including integrating AI solutions with legacy IT systems — can overwhelm even the most seasoned IT professionals. The explosion of generative AI (GenAI) has heightened the need for organizations to modernize their data centers and quickly embrace high-performance architecture (HPA). So, where do organizations go from here? 

Recommended reading: WWT Research – A Guide for CEOs to Accelerate AI Excitement and Adoption


Enter the AI Proving Ground. 

What is the AI Proving Ground?

The AI Proving Ground is a dynamic environment composed of industry-leading software, hardware and component solutions that can be integrated quickly. Combined with the knowledge and experience of our AI and infrastructure experts, and supported by our longstanding manufacturer partnerships, the AI Proving Ground allows organizations to experience the art of the possible for themselves while accelerating their time to market.

Developed within our Advanced Technology Center (ATC), this one-of-a-kind lab environment empowers IT teams to evaluate and test AI infrastructure, software and solutions for efficacy, scalability and flexibility — all under one roof. The AI Proving Ground provides visibility into data flows across the entire development pipeline, enabling more informed decision-making while safeguarding production environments. 

By addressing common hurdles to AI success — including hardware availability, high costs, skills gaps, power and cooling concerns, connectivity challenges, environment management, and complex architecture designs — the AI Proving Ground enables organizations to quickly, confidently and safely develop transformational AI solutions that deliver real business results in a fraction of the time and expense it would take to achieve on their own.

How clients are using the AI Proving Ground

Over the last several months of working with clients in the AI Proving Ground, we've witnessed growing demand for WWT's GPU-as-a-service offering. This on-demand service gives clients access to a collection of powerful GPU resources critical for powering both pre-built and customizable AI applications.

Below are a few of the other ways clients use the environment. 

Risk-free learning 

Fear of disrupting production environments often hinders experimentation. The AI Proving Ground allays that fear by providing a safe and secure sandbox for data scientists, data center engineers and software developers to learn, test, iterate and innovate. It's a playground for bold ideas, unencumbered by the inherent constraints and risks of live systems.

The AI Proving Ground supports IT professionals with hands-on access so they can evaluate AI hardware, software and reference architectures before deploying these solutions in their production environments:

  • Data scientists can use the AI Proving Ground to evaluate a range of AI models, including large language models (LLMs) for GenAI and application development; natural language processing (NLP) models for smart assistants, language translation and digital phone call response; and computer vision models for image classification, object recognition or object tracking (a minimal sketch follows this list).
  • Facilities engineers can use the AI Proving Ground to investigate the impact that prospective modernization efforts and AI integrations will have on existing data centers, and they can test drive the latest innovations in cooling technologies.
  • IT infrastructure engineers can use the AI Proving Ground to validate hardware and software integrations, assess the performance-per-watt of AI workflows, and ensure the overall supportability of a desired AI solution. They can also test their ability to provision, reclaim and understand chargeback components for each business unit.
  • Security engineers can use the AI Proving Ground to compare and contrast the ability of new AI solutions to protect the organization against attacks, breaches, and the exposure or loss of sensitive data.
  • Software engineers can use the AI Proving Ground to design and deploy AI solutions in a hybrid environment that features easy access to cloud, on-premises, edge and cloud-adjacent components.
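
To make the data science use case concrete, below is a minimal sketch of the kind of evaluation run a data scientist might start with in the lab: loading a small open LLM and timing a single inference pass. The Hugging Face transformers library and the gpt2 model are illustrative assumptions, not lab defaults.

```python
# Minimal sketch: evaluating a small language model for inference.
# Assumes the Hugging Face "transformers" and "torch" packages are
# installed; the model choice (gpt2) is illustrative, not a lab default.
import time

import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # use a GPU if one is present
generator = pipeline("text-generation", model="gpt2", device=device)

prompt = "The AI Proving Ground lets engineers"
start = time.perf_counter()
output = generator(prompt, max_new_tokens=50, do_sample=False)
elapsed = time.perf_counter() - start

print(output[0]["generated_text"])
print(f"Inference latency: {elapsed:.2f}s")
```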

What makes up the AI Proving Ground?

From a technical lens, the AI Proving Ground is a heterogeneous "lab of labs" ecosystem that currently houses 13 different AI environments. You can find details about these environments below in "Environment details: 13 AI labs and growing."

Our AI Proving Ground labs range in focus from full reference architectures to automated component orchestration. By validating the performance of the latest AI hardware and software integrations here first, technologists can quickly and confidently pursue the AI-powered solutions that deliver the most business value.

The AI Proving Ground is a lab of labs

Environment details: 13 AI labs and growing

This section details each of the 13 lab environments currently operating in the AI Proving Ground. We plan to expand the number of dedicated AI labs available to our clients and partners in the coming months.


AI Lab 1: NVIDIA DGX H100 BasePOD lab

The DGX H100 BasePOD is a prescriptive AI reference architecture from NVIDIA designed for enterprise AI workflows. The DGX H100 BasePOD environment inside the AI Proving Ground features four NVIDIA DGX H100 appliances, two different 400GbE Ethernet fabrics (Cisco and Arista), a 400Gb NVIDIA Mellanox InfiniBand fabric, and seven storage solutions to choose from (Dell, NetApp, Pure Storage, VAST Data, IBM, DataDirect Networks, HPE GreenLake/VAST Data). This lab also includes the NVIDIA AI Enterprise (NVAIE) software platform and Run:ai's optimization and orchestration solution.

Use cases: Clients can use this lab to understand the level of effort needed to build and support their own AI environments and validate the integration of different enterprise networking and storage solutions. Clients can also leverage the lab's synthetic load generation solutions to record power and performance metrics or even build their own use cases to better understand the performance of different workloads within an integrated solution of their choice.
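
As a concrete illustration of recording power and performance metrics under synthetic load, the sketch below times a sustained matrix-multiplication workload with PyTorch and samples board power via nvidia-smi. The workload and sampling method are assumptions for illustration; they are not the lab's actual load-generation tooling.

```python
# Minimal sketch of a synthetic GPU load test with power sampling.
# Assumes PyTorch with CUDA and the nvidia-smi CLI are available.
import subprocess

import torch

def gpu_power_watts() -> float:
    """Sample the current board power draw from nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]
    )
    return float(out.decode().splitlines()[0])  # first GPU only

size = 8192
a = torch.randn(size, size, device="cuda", dtype=torch.float16)
b = torch.randn(size, size, device="cuda", dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
for _ in range(100):  # sustained load so the power sample is meaningful
    c = a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
tflops = 100 * 2 * size**3 / (ms / 1000) / 1e12  # ~2*n^3 FLOPs per matmul
print(f"~{tflops:.1f} TFLOP/s at ~{gpu_power_watts():.0f} W board power")
```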

AI Lab 2: NVIDIA GH200 Grace Hopper Superchip lab

The NVIDIA GH200 Grace Hopper Superchip is a breakthrough processor designed for giant-scale AI and high-performance computing (HPC) applications. The GH200 lab environment inside the AI Proving Ground supports a single GH200 appliance with both 400Gb InfiniBand and 400GbE Ethernet connections. According to NVIDIA, the superchip can deliver up to 10 times the performance for applications running terabytes of data.

Use cases: Clients can test HPC workloads on the GH200, recording performance and power metrics for each test. 

AI Lab 3: Composable XPU-as-a-Service lab

The AI Proving Ground features a dedicated, composable XPU-as-a-Service environment. This fully automated solution enables our engineers to build physical server environments with different server, CPU, GPU and operating system options. Thanks to our Liqid Composable Disaggregated Infrastructure solution and the RackN Digital Rebar platform, dedicated specialized builds can happen within minutes without the need to physically touch the servers.

The following options are available for specialized, dedicated servers:

  • Server partners: Dell and HPE
  • CPU partners: Intel and AMD
  • GPU partners:
    • NVIDIA: A100, A30, L40
    • Intel: Flex 140, Flex 170, Max 1100
    • AMD: MI210
  • Operating systems: RHEL 8, RHEL 9 and Ubuntu 22.04

Use cases: In this lab, clients can stand up servers in different configurations to evaluate AI models for training or inference, without having to build and integrate different accelerators by hand.
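
To illustrate what composability looks like from a user's perspective, the sketch below submits a hypothetical build request over REST, assembling the server, CPU, GPU and OS options listed above. The endpoint URL and payload schema are invented for illustration only; they are not the actual Liqid or RackN Digital Rebar APIs.

```python
# Illustrative only: a hypothetical composition request showing the kind
# of parameters a composable-infrastructure workflow assembles. The URL,
# endpoint and payload schema are invented; they are NOT the real Liqid
# or RackN APIs.
import requests

build_request = {
    "server": "Dell",            # Dell or HPE
    "cpu": "Intel",              # Intel or AMD
    "gpus": [{"model": "NVIDIA A100", "count": 2}],
    "os": "Ubuntu 22.04",        # RHEL 8, RHEL 9 or Ubuntu 22.04
}

resp = requests.post(
    "https://provisioning.example.internal/api/compose",  # hypothetical endpoint
    json=build_request,
    timeout=30,
)
resp.raise_for_status()
print("Build accepted:", resp.json())
```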

AI Lab 4: Intel Gaudi performance cluster lab

Intel's Gaudi AI accelerator cluster supports a single Gaudi-1 appliance (with eight first-generation deep learning processors) and a single Gaudi-2 appliance (with eight second-generation deep learning processors). Each HPC appliance can leverage either local NVMe storage or high-speed storage systems via a dedicated 100GbE network fabric.

Use cases: Clients can leverage this lab to validate different deep learning training or inferencing solutions while recording both performance and power metrics during testing.

AI Lab 5: NVIDIA Omniverse digital twin lab (with Dell)

Our digital twin environment features dedicated Dell PowerEdge 16G server nodes in support of NVIDIA's Omniverse developer platform, accelerated by NVIDIA L40 and A40 GPUs. The lab also supports Omniverse's database and collaboration engine (the Enterprise Nucleus Server) in a dedicated environment, allowing developers to build highly scalable, high-performance solutions.

Use cases: Clients can use this lab to evaluate and build digital twin solutions specific to their needs within an NVIDIA Omniverse framework.

AI Lab 6: Dell reference architecture lab (with NVIDIA)

This Dell reference architecture environment is a full-stack solution. Hardware components include dedicated Dell PowerSwitches for high-speed networking, Dell PowerEdge accelerator-optimized compute nodes (XE9680 and R760xa servers), and Dell PowerScale storage (the F600 array). The lab also features multiple MLOps and Kubernetes platform solutions, and it's enabled with NVIDIA H100 and L40 GPUs. Clients can apply the NVIDIA NVAIE framework to the environment or select a preferred MLOps and Kubernetes solution.

Use cases: This lab allows clients to evaluate full-stack solutions from both management and performance validation standpoints, including power consumption and performance metrics.

AI Lab 7: Data scientist development for GenAI lab (with Dell and Red Hat)

Our data scientist development cluster is a dedicated HPC environment that enables our data scientists to develop and train the LLMs used in GenAI solutions. This integrated solution includes dedicated high-speed networking, Dell PowerEdge 15G servers and NVIDIA A100 GPUs. Additionally, the environment is configured with an OpenShift container platform and a dedicated MLOps solution that gives each data scientist a dedicated workspace within the cluster, so they can execute their work without interfering with others using the cluster concurrently. The team can check in approved models for quick access.

Use cases: In this lab, data scientists can demonstrate different LLM solutions and training techniques.
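
One generic way to achieve the kind of workspace isolation described above on a Kubernetes-based platform such as OpenShift is a per-user namespace with a GPU resource quota. The sketch below uses the Kubernetes Python client to show that pattern; the namespace name and quota values are hypothetical, and this is not WWT's actual MLOps configuration.

```python
# Sketch of per-user workspace isolation on a Kubernetes-based platform.
# The namespace name and GPU quota are hypothetical examples; this is
# not WWT's actual MLOps configuration.
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
core = client.CoreV1Api()

team_ns = "ds-alice"  # hypothetical per-scientist namespace

# Create an isolated namespace for one data scientist.
core.create_namespace(
    client.V1Namespace(metadata=client.V1ObjectMeta(name=team_ns))
)

# Cap how many GPUs the workspace can request, so one user's jobs
# cannot starve everyone else sharing the A100 pool.
core.create_namespaced_resource_quota(
    namespace=team_ns,
    body=client.V1ResourceQuota(
        metadata=client.V1ObjectMeta(name="gpu-quota"),
        spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "2"}),
    ),
)
```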

AI Lab 8: Data scientist capabilities lab

The data scientist capabilities cluster is a dedicated high-performance computing environment that enables our data scientists to evaluate and demo LLM solutions and training techniques for our clients and partners. The cluster is built on NVIDIA A100 GPUs and configured with an OpenShift Container Platform and a dedicated MLOps solution that provides data scientists with a dedicated space to execute their work without interference from other concurrent efforts within the cluster.

Use cases: Data scientists can leverage this lab environment to demo and validate different LLM solutions and training techniques.

AI Lab 9: AI security and application services lab (with Dell and Liqid)

This AI lab environment contains Dell PowerEdge 15G servers and a Liqid CDI fabric that lets additional GPUs be dynamically swapped in as needed.

Use cases: The lab features a dynamic, dedicated cluster that enables our security and application services teams to evaluate and showcase independent software vendor (ISV) solutions that leverage accelerators for faster, more accurate outcomes. This AI lab environment gives organizations the ability to leverage multiple types of GPUs from WWT's roster of manufacturer partners.

AI Lab 10: Intel AI Reference Kit lab (with Dell and Red Hat)

The Intel AI Reference Kit cluster consists of a five-node Dell PowerEdge 16G server environment with 5th Gen Intel Xeon Scalable processors. The nodes are connected via 100GbE Ethernet and are managed by the Red Hat OpenShift AI platform. Users can leverage one of the lab's five prebuilt AI solutions, or they can request that we build a special instance from one of Intel's other 29 solutions.

Use cases: Clients can experience demonstrations of one of the five prebuilt solutions on demand or reserve a cluster for their specific use case validation.

AI Lab 11: Liqid CDI POC lab (with Dell, Liqid and more)

The Liqid proof of concept (POC) cluster features composable disaggregated infrastructure solutions from Liqid along with Dell PowerEdge 15G servers. The dedicated environment currently supports two Dell PowerEdge Intel-based servers, two Dell PowerEdge AMD-based servers, and an 8-slot Liqid chassis that can be populated with Intel, AMD or NVIDIA GPUs along with Liqid NVMe IO Accelerator storage cards.

Use cases: Clients can validate the performance of in-server GPUs versus Liqid-attached GPUs for VDI, inference or training workflows.

AI Lab 12: HPE reference architecture for GenAI lab (with Aruba)

Our HPE reference architecture lab environment inside the AI Proving Ground is a full-stack solution. Hardware components include dedicated Aruba high-speed networking, HPE ProLiant and Cray accelerator-optimized compute nodes (Cray XD-670 and ProLiant DL-385 servers), and a dedicated HPE GreenLake/VAST Data array. The environment includes an MLOps platform from Determined AI as well as data fabric from Pachyderm, and it is enabled with NVIDIA H100 and L40 GPUs.

Use cases: Clients can use this lab to evaluate full-stack solutions from both management and performance validation standpoints, including power consumption and performance metrics.

AI Lab 13: NetApp AIPod lab (with NVIDIA)

NetApp's ONTAP AI Base Pod (AIPod) is a dedicated NVIDIA NeMo RAG (retrieval-augmented generation) demo environment. The lab includes dedicated high-speed networking, an NVIDIA DGX H100 appliance and a NetApp AFF A800 array. The environment will showcase NetApp's BlueXP portfolio and highlight NetApp's ability to provide industry-leading data mobility and multitenancy for AI workloads across all ATC-connected public cloud providers. This instance will be the first of the AI Proving Ground's hybrid cloud solutions.

Use cases: The environment will leverage the NVIDIA NeMo framework to quickly deploy different RAG frameworks, including NeMo Retriever, with NetApp storage endpoints (StorageGRID, ONTAP, FSxN, ANF and GCNV) that can be configured for vertical-specific use cases with the appropriate data ingestion.
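
For readers new to the pattern, the sketch below shows RAG in miniature: embed a small set of documents, retrieve the one closest to the query, and prepend it to the prompt sent to the generator. It uses a toy bag-of-words embedding so it runs anywhere; a NeMo Retriever deployment would instead use learned embeddings and a vector store backed by the NetApp endpoints above.

```python
# Toy illustration of the retrieval-augmented generation (RAG) pattern.
# Uses a bag-of-words embedding so the example is self-contained; a real
# deployment would use learned embeddings and a vector database.
import math
from collections import Counter

documents = [
    "The AIPod lab pairs an NVIDIA DGX H100 with a NetApp array.",
    "RAG grounds a language model's answers in retrieved documents.",
    "The ATC connects lab environments to public cloud providers.",
]

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(documents, key=lambda d: cosine(q, embed(d)))

query = "What does RAG do?"
context = retrieve(query)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
print(prompt)  # this prompt would then be sent to the LLM
```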


Conclusion

As AI and data solutions continue to evolve across manufacturers and industries, so too will the AI Proving Ground. WWT is dedicated to enhancing and scaling the capabilities of the AI Proving Ground, in close collaboration with our partners, to deliver cutting-edge AI-powered solutions and high-performance architectures that generate real business value. 

AI is revolutionizing the way we do business. Together, we can drive innovation to make a new world happen.

Follow WWT's artificial intelligence and data page to stay up to date.