
As organizations increasingly adopt artificial intelligence (AI) and machine learning (ML) workloads, data centers play a pivotal role in supporting these compute-intensive applications. This article focuses on evaluating data center readiness, specifically tailored to accommodate AI Large Language Models (LLMs). By addressing critical factors such as power availability, cooling systems, structural support, security, connectivity and reliability, organizations can ensure optimal performance and scalability for their AI initiatives.

Explore the sections below for deeper insights into what it takes to have an AI-ready data center.  

Power

High power racks

As AI-type workloads are driving power requirements on a per-rack basis far beyond what a typical enterprise data center can deliver, accommodations will have to be made. The current requirements are for 50 kW/rack. However, given the roadmaps of the relevant chip manufacturers, it is expected that those requirements will quickly rise to 80-120 kW per rack.
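
To make the effect of rising per-rack power concrete, the following minimal sketch computes how many racks a fixed IT power budget can feed. The 1 MW budget and the function name `racks_supported` are illustrative assumptions. The per-rack figures are the ones quoted above.

```python
import math


def racks_supported(it_power_budget_kw: float, kw_per_rack: float) -> int:
    """Whole racks that a fixed IT power budget can feed at a given per-rack draw."""
    if kw_per_rack <= 0:
        raise ValueError("kw_per_rack must be positive")
    return math.floor(it_power_budget_kw / kw_per_rack)


if __name__ == "__main__":
    budget_kw = 1000  # hypothetical 1 MW IT power budget (assumption)
    for kw in (50, 80, 120):  # per-rack figures quoted in the text
        print(f"{kw} kW/rack -> {racks_supported(budget_kw, kw)} racks")
```

Under these assumptions, the same power budget feeds fewer than half as many racks at 120 kW as at 50 kW, which is why per-rack delivery, and not only total capacity, needs to be planned.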

High power compute 

The driving force behind these rack power requirements is the compute elements being deployed. The power requirements of these GPU and CPU elements are expected to rise dramatically in the coming years.

Latency 

AI workloads depend on a large volume of low-latency network communication. This drives rack designs to be dense, which minimizes the number of intermediate switches and, to a lesser extent, the delay introduced by cable length. Lowering rack density adversely affects the performance of the overall system.
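
The relative weight of switch hops and cable length can be sketched as follows. The refractive index is a typical value for silica fiber. The per-hop switch latency of 500 ns and the example paths are illustrative assumptions only, since actual figures depend on the switch and the fabric design.

```python
C_VACUUM_M_PER_S = 299_792_458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.47   # assumption: typical value for silica fiber


def cable_delay_ns(length_m: float) -> float:
    """One-way propagation delay through a fiber of the given length."""
    return length_m * FIBER_REFRACTIVE_INDEX / C_VACUUM_M_PER_S * 1e9


def path_latency_ns(switch_hops: int, per_hop_ns: float, cable_m: float) -> float:
    """Switch forwarding delay plus cable propagation delay for one path."""
    return switch_hops * per_hop_ns + cable_delay_ns(cable_m)


if __name__ == "__main__":
    per_hop_ns = 500.0  # hypothetical per-switch latency (assumption)
    dense = path_latency_ns(switch_hops=1, per_hop_ns=per_hop_ns, cable_m=5)
    sparse = path_latency_ns(switch_hops=3, per_hop_ns=per_hop_ns, cable_m=50)
    print(f"dense layout:  {dense:.0f} ns")
    print(f"sparse layout: {sparse:.0f} ns")
```

Fiber adds roughly 5 ns per meter, so under these assumptions each additional switch hop costs about as much as 100 m of cable. That is consistent with hop count being the primary concern and cable length the secondary one.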

While the initial footprint of high-power racks and systems may be small, as the workloads grow and become strategic for the organization, expanded deployments will have to be accommodated.

415 V Power @ 100A

When it comes to powering AI systems, 415 V three-phase power is preferred over 208 V power for several reasons. First, 415 V distribution improves overall efficiency by avoiding a voltage transformation stage, and its losses, along the power path. Second, most servers have single-phase power supplies that accept 240 V directly, which is the line-to-neutral voltage of a 415 V three-phase feed (415 V / √3 ≈ 240 V), so no intermediate transformation is needed. Third, safety remains a priority, and properly designed 415 V systems are safe for data center personnel. In summary, 415 V power offers efficiency benefits, reduces losses and aligns well with server requirements for AI workloads.
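
The arithmetic behind a 415 V, 100 A feed can be sketched with the standard three-phase relations. The 0.8 derating factor is an assumption that reflects a common continuous-load practice. The applicable figure depends on local electrical code, and the function names are illustrative.

```python
import math


def line_to_neutral_v(line_to_line_v: float) -> float:
    """Phase (line-to-neutral) voltage of a balanced three-phase supply."""
    return line_to_line_v / math.sqrt(3)


def three_phase_kva(line_to_line_v: float, amps: float, derate: float = 1.0) -> float:
    """Apparent power of a balanced three-phase circuit, optionally derated."""
    return math.sqrt(3) * line_to_line_v * amps * derate / 1000


if __name__ == "__main__":
    print(f"415 V line-to-neutral: {line_to_neutral_v(415):.1f} V")
    for volts in (208, 415):
        full = three_phase_kva(volts, 100)
        derated = three_phase_kva(volts, 100, derate=0.8)  # assumed 80% continuous load
        print(f"{volts} V @ 100 A: {full:.1f} kVA ({derated:.1f} kVA derated)")
```

At the same current, the 415 V circuit carries roughly twice the power of the 208 V circuit, so fewer or smaller circuits are needed to reach a given rack power.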

Cooling

Cool water feed and warm water return

An AI-ready data center must ensure a consistent supply of cool water for efficient cooling systems. Simultaneously, it needs a mechanism to handle warm water generated during the cooling process. This balance is crucial to prevent overheating and maintain optimal operating conditions for servers and equipment.

Tunable flow and temperature

Cooling requirements can vary significantly across different racks and workloads. Therefore, data centers should allow tunable flow rates and temperature adjustments on a per-rack basis. Fine-tuning these parameters ensures efficient performance and prevents thermal bottlenecks.
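
The trade-off between flow rate and temperature rise follows from the heat balance Q = m × cp × ΔT. The sketch below assumes that all rack heat is carried away by water, uses an approximate specific heat of 4186 J/(kg·K) and treats 1 kg of water as 1 liter. The rack power and temperature rise values are illustrative.

```python
WATER_SPECIFIC_HEAT_J_PER_KG_K = 4186.0  # approximate value for liquid water


def flow_l_per_min(heat_load_kw: float, delta_t_k: float) -> float:
    """Water flow needed to remove a heat load at a given supply/return temperature rise.

    Assumes all heat is transferred to the water and 1 kg of water is about 1 liter.
    """
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    kg_per_s = heat_load_kw * 1000 / (WATER_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k)
    return kg_per_s * 60


if __name__ == "__main__":
    for delta_t in (5, 10, 15):  # illustrative temperature rises in kelvin
        print(f"50 kW rack, dT={delta_t} K: {flow_l_per_min(50, delta_t):.1f} L/min")
```

Halving the allowed temperature rise doubles the required flow, which is why flow and temperature need to be adjustable together on a per-rack basis.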

Water availability and leak detection

Understanding the availability of facilities water and chilled water is essential. Data centers rely on chilled water systems to maintain consistent temperatures. Additionally, leak detection instrumentation should be in place to promptly identify and address any water leaks. Early notification helps prevent potential damage to equipment and ensures uninterrupted operations.

Redundancy in water supply

Dual water supply sources are critical for reliability. Having redundant water feeds ensures that cooling systems remain functional even if one supply line fails. Redundancy mitigates risks associated with water system disruptions, safeguarding against downtime and potential data loss.

Hot/cold aisle containment and heat management

Achieving high power densities (such as 20 kW/rack and beyond) requires effective heat containment. Hot/cold aisle separation, chimney designs and other protocols optimize cooling efficiency. By directing airflow appropriately and minimizing mixing between hot and cold air, data centers can maintain stable operating conditions.
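
To illustrate why containment matters at these densities, the sketch below estimates the airflow needed to carry a rack's heat away with air alone. It assumes approximate air properties near room temperature (density 1.2 kg/m³, specific heat 1005 J/(kg·K)) and an illustrative 12 K rise between cold and hot aisle. Real values vary with altitude and temperature.

```python
AIR_DENSITY_KG_PER_M3 = 1.2             # approximate, near sea level and room temperature
AIR_SPECIFIC_HEAT_J_PER_KG_K = 1005.0   # approximate
CFM_PER_M3_PER_S = 2118.88              # cubic feet per minute in one cubic meter per second


def airflow_m3_per_s(heat_load_kw: float, delta_t_k: float) -> float:
    """Volumetric airflow needed to remove a heat load at a given air temperature rise."""
    if delta_t_k <= 0:
        raise ValueError("delta_t_k must be positive")
    return heat_load_kw * 1000 / (
        AIR_DENSITY_KG_PER_M3 * AIR_SPECIFIC_HEAT_J_PER_KG_K * delta_t_k
    )


def airflow_cfm(heat_load_kw: float, delta_t_k: float) -> float:
    """Same airflow expressed in cubic feet per minute."""
    return airflow_m3_per_s(heat_load_kw, delta_t_k) * CFM_PER_M3_PER_S


if __name__ == "__main__":
    print(f"20 kW rack, dT=12 K: {airflow_cfm(20, 12):.0f} CFM")
```

If hot and cold air mix, the effective temperature rise shrinks and the required airflow grows in proportion, which is the effect that containment is meant to prevent.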

Rear Door Heat Exchangers (RDX)

RDX units play a crucial role in cooling high-power-density racks because they dissipate heat directly at the server rack. Data center aisles must be wide enough to accommodate RDX deployments. These rear door heat exchangers are particularly relevant for systems requiring cooling beyond approximately 30 kW.

Physical Infrastructure

Deep racks

An AI-ready data center must accommodate deep racks with dimensions exceeding 1200 mm. These racks provide the necessary space for high-density computing equipment, including AI servers, GPUs and storage arrays. Adequate clearance ensures efficient airflow and accessibility for maintenance.

Room for RDX/DLC piping and manifold

The data center layout should allocate space for RDX (Rear Door Heat Exchangers) or DLC (Direct Liquid Cooling) systems. These solutions enhance cooling efficiency by dissipating heat directly from server racks. The piping and manifolds for these cooling systems require dedicated space within the data center.

Power distribution

An AI-ready data center demands reliable power distribution. This includes sufficient electrical capacity, redundant power feeds and properly configured power distribution units (PDUs). Ensuring consistent power availability is critical for AI workloads.

Networking cabling

Structured networking cabling is essential. High-speed connections, such as fiber-optic cables, facilitate seamless communication between servers, switches and storage devices. Proper cable management minimizes signal interference and simplifies troubleshooting.

Monitoring and management

Implementing Data Center Infrastructure Management (DCIM) software allows real-time monitoring of critical parameters such as temperature, humidity, power usage and equipment status. 

Environmental monitors ensure optimal conditions. 

Remote access and control enable efficient troubleshooting, maintenance, and updates without physical presence.
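
As a rough illustration of the kind of check a monitoring system performs, the sketch below compares sensor readings with configured limits. It does not represent any particular DCIM product. The parameter names and threshold values are hypothetical placeholders and should be replaced with the facility's own operating envelope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    low: float
    high: float


# Hypothetical limits for illustration only (assumptions, not recommendations).
LIMITS = {
    "inlet_temp_c": Limit(18.0, 27.0),
    "humidity_pct": Limit(20.0, 80.0),
    "rack_power_kw": Limit(0.0, 50.0),
}


def out_of_range(reading: dict[str, float]) -> list[str]:
    """Return one alert message per monitored value that falls outside its limits."""
    alerts = []
    for name, value in reading.items():
        limit = LIMITS.get(name)
        if limit is not None and not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


if __name__ == "__main__":
    sample = {"inlet_temp_c": 31.0, "humidity_pct": 45.0, "rack_power_kw": 42.0}
    for alert in out_of_range(sample):
        print("ALERT:", alert)
```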

Environmental, social and governance (ESG) footprint of the data center

Understand the origin of power, whether it's from renewable sources, fossil fuels or a mix. Sustainable energy choices align with environmental goals.

Trace the water supply from its source to cooling systems and wastewater disposal. Efficient water usage and responsible disposal contribute to environmental stewardship. 

Comply with current European Environmental, Social, and Governance (ESG) reporting requirements. Anticipate similar regulations in the US. Transparent reporting demonstrates a commitment to sustainability.

On-slab or heavy-duty raised floor

Data centers can have either an on-slab design (equipment sits directly on the building floor) or a heavy-duty raised floor. The latter provides space underneath for cable routing and cooling infrastructure, and must be rated for the weight it carries. For example, fully populated AI racks can weigh up to 1,500 pounds (about 680 kg). The floor must handle this load without compromising stability.

Deployments must adhere to the floor's PSF (pounds per square foot) or KSM (kilograms per square meter) loading limits. Overloading the floor risks structural damage. If equipment is transported by elevator, check the elevator's weight limit as well. Elevators must safely handle heavy server racks during installation or maintenance.
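
A first-pass floor loading estimate divides rack weight by rack footprint. The sketch below uses the 1,500-pound weight and 1200 mm depth mentioned in this article. The 600 mm rack width is an assumption. A structural assessment would also consider point loads at casters or feet and the load spread over the surrounding floor area, so treat this only as a screening figure.

```python
LB_PER_KG = 2.20462262
FT2_PER_M2 = 10.7639104


def floor_load(rack_weight_lb: float, width_mm: float, depth_mm: float) -> tuple[float, float]:
    """Return (PSF, KSM) for a rack's weight spread uniformly over its own footprint."""
    area_m2 = (width_mm / 1000) * (depth_mm / 1000)
    if area_m2 <= 0:
        raise ValueError("rack footprint must be positive")
    psf = rack_weight_lb / (area_m2 * FT2_PER_M2)
    ksm = (rack_weight_lb / LB_PER_KG) / area_m2
    return psf, ksm


def within_limit(load: float, limit: float) -> bool:
    """True when a load (floor or elevator) does not exceed its rated limit."""
    return load <= limit


if __name__ == "__main__":
    psf, ksm = floor_load(1500, width_mm=600, depth_mm=1200)  # 600 mm width is assumed
    print(f"{psf:.0f} PSF, {ksm:.0f} KSM over the rack footprint")
    # Hypothetical elevator rating of 2,500 lb; rack plus 200 lb of handling equipment.
    print("elevator OK:", within_limit(1500 + 200, 2500))
```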

Space, security and availability

Understand the data center's growth strategy. Available space for future expansion ensures scalability. Assess the availability of key resources like electricity, water and other utilities. Reliable access to these resources is fundamental for sustained operations.

Other Key Considerations

There are other considerations for evaluating a data center for AI use. 

Connectivity

  • Latency: Efficient communication between servers and devices relies on low latency. Minimizing delays ensures smooth data flow.
  • Bandwidth: Adequate bandwidth is essential for handling large-scale data transfers. High-speed connections prevent bottlenecks.
  • Redundancy: Having redundant connections safeguards against service disruptions. If one link fails, another takes over seamlessly.

Reliability and resiliency

  • Disaster planning and mitigation: Robust disaster recovery plans are critical. Evaluate how the data center handles emergencies, such as natural disasters or power outages.
  • Power backups and service level agreements (SLAs):
    • Generators: Backup generators kick in during power outages, ensuring uninterrupted operations.
    • Battery Systems: These provide short-term power backup until generators activate.
    • Flywheel Systems: Kinetic energy storage systems offer rapid power support.
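
The role of short-term storage can be sketched as an energy calculation: batteries or flywheels must carry the load until the generators take over. The 500 kW load and 60-second generator start time below are illustrative assumptions. The sketch ignores conversion losses and usable-capacity limits, which increase the required storage in practice.

```python
def ride_through_kwh(load_kw: float, bridge_seconds: float) -> float:
    """Energy needed to carry a load for the time it takes backup generation to start.

    Ignores inverter losses and depth-of-discharge limits (simplifying assumptions).
    """
    if load_kw < 0 or bridge_seconds < 0:
        raise ValueError("load and bridge time must be non-negative")
    return load_kw * bridge_seconds / 3600


if __name__ == "__main__":
    # Hypothetical 500 kW critical load and 60 s until generators carry it.
    print(f"{ride_through_kwh(500, 60):.2f} kWh of usable storage required")
```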

Compliance requirements

Consider legal and regulatory obligations. For instance, compliance with GDPR (General Data Protection Regulation) and HIPAA (Health Insurance Portability and Accountability Act) is crucial where those regulations apply.

Future-Proofing Power and Cooling

As AI workloads grow, anticipate increasing demands. Plan for scalable power infrastructure and efficient cooling solutions.

Remember, data centers play a pivotal role in supporting AI applications, and thoughtful evaluation ensures optimal performance and reliability.
