This article was written by Kevin Brown, SVP and CMO of Secure Power, Data Centers, & Global Services at Schneider Electric and David McGlocklin, Cooling Development/Sustaining Engineering Manager, North America at Schneider Electric. 

The AI shift: Why cooling is being redefined

The growth of Graphics Processing Unit (GPU)-based accelerated computing that powers AI workloads is changing data center architecture. The power consumption of these chips is significantly higher than that of traditional Central Processing Units (CPUs). For many years, servers were designed around CPUs that consumed ~150 watts. Today's NVIDIA Blackwell chips are in the 1,000-1,400 watt range.

This level of power consumption is driving up a common metric in our industry: power per rack. Traditional data centers consume from 10kW up to 20kW per rack. The latest NVIDIA rack designs reach 142kW, and NVIDIA has publicly stated that 1 MW per rack is on the horizon.
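For readers who like to see the math, here is a rough, back-of-the-envelope sketch in Python. The accelerator count and overhead fraction are illustrative assumptions, not the specification of any particular NVIDIA rack.

```python
# Back-of-the-envelope rack power estimate (illustrative assumptions only).
ACCELERATORS_PER_RACK = 72     # assumed GPU count for a dense AI rack
WATTS_PER_ACCELERATOR = 1200   # mid-range of the 1,000-1,400 W figure above
OVERHEAD_FRACTION = 0.35       # assumed CPUs, memory, networking, fans, power conversion

gpu_power_kw = ACCELERATORS_PER_RACK * WATTS_PER_ACCELERATOR / 1000
rack_power_kw = gpu_power_kw * (1 + OVERHEAD_FRACTION)

print(f"GPU power alone: {gpu_power_kw:.0f} kW per rack")
print(f"Estimated total: {rack_power_kw:.0f} kW per rack")
# GPU power alone: 86 kW per rack
# Estimated total: 117 kW per rack
```

Even with conservative assumptions, the estimate lands far above what traditional 10-20kW racks and the air-cooling systems behind them were designed to handle.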

Why air cooling is no longer enough  

Delivering this much power in AI data centers is a challenge, but common power components are addressing the need. However, all of this power turns into heat, and removing that heat is no longer possible with traditional air cooling; it requires a liquid. Liquid cooling comes in various forms, but direct liquid cooling (DLC), also known as direct-to-chip, has become the preferred technology for cooling these chips. The technology has been used in supercomputers for years. In fact, Motivair by Schneider Electric has been at the forefront of DLC deployment for over a decade, bringing proven expertise from HPC environments to high-density AI data centers.

Deploying direct liquid cooling at scale

However, deploying direct liquid cooling at scale in AI data centers is new, and it introduces more complexity into an already complex environment. So, if you were under the illusion that liquid cooling is easy, think again. The move towards liquid cooling demands a greater level of system engineering between the IT systems and the facility infrastructure.

At Schneider Electric, we have detailed these liquid cooling challenges in White Paper 210, Direct Liquid Cooling System Challenges in Data Centers, which provides critical insights for organizations planning to implement liquid cooling for AI workloads.

So, what could go wrong with liquid cooling for AI data centers?

The short answer is: a lot.

1. Possible corrosion and server damage

Liquid cooling uses various wetted materials, and their selection is especially important because every material must be compatible with the others and with the coolant itself. Manufacturers provide a list of the materials used, along with general water quality guidelines, in the installation manual. All other wetted materials in the Technology Cooling System (TCS) loop, including the fluids themselves, should be verified for compatibility (a simple compatibility-check sketch follows the list below). The industry is still developing standards and guidelines around liquid cooling, so materials from one manufacturer may not be compatible with materials from another.

  • Fluid Types

For example, the coolant used in liquid cooling solutions predominantly comes in two forms: deionized (DI) water or a PG 25 solution, a propylene glycol-based fluid. Fluids from different manufacturers cannot be mixed.

  • Additives and Materials

Both fluid types contain additives that can interact with certain types of brass or steel and cause corrosion. Even with corrosion inhibitors in the fluid, if the wrong material gets wet or the inhibitors are allowed to drift out of their acceptable range, corrosion or biofilm can develop and create debris within the coolant, putting servers at risk of damage.

  • New Fluids Entering the Market

We are also seeing companies bring new nanofluids and engineered fluids to the market, adding to the options and the confusion.
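The practical takeaway is that compatibility has to be verified explicitly rather than assumed. The sketch below (the compatibility check referenced above) shows one hypothetical way to compare a loop's wetted-materials list against a fluid's approved-materials list; all of the material and fluid names are illustrative, not taken from any vendor's documentation.

```python
# Hypothetical compatibility check for a TCS loop (illustrative data, not vendor specs).
TCS_WETTED_MATERIALS = {"copper", "EPDM", "stainless_304", "brass"}

# Assumed approved-materials lists; real values must come from the manufacturers' manuals.
FLUID_COMPATIBILITY = {
    "PG25_vendor_A": {"copper", "EPDM", "stainless_304", "stainless_316"},
    "DI_water":      {"copper", "EPDM", "stainless_316"},
}

def check_loop(fluid: str, wetted_materials: set[str]) -> list[str]:
    """Return wetted materials that are not on the fluid's approved list."""
    approved = FLUID_COMPATIBILITY[fluid]
    return sorted(wetted_materials - approved)

for fluid in FLUID_COMPATIBILITY:
    issues = check_loop(fluid, TCS_WETTED_MATERIALS)
    status = "OK" if not issues else f"verify before filling: {', '.join(issues)}"
    print(f"{fluid}: {status}")
# PG25_vendor_A: verify before filling: brass
# DI_water: verify before filling: brass, stainless_304
```

A check like this is only as good as the data behind it, which is exactly why the manufacturer's wetted-materials list and fluid specification need to be in hand before the loop is ever filled.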

2. Warranty confusion and challenges to SLAs

Prior to liquid cooling, data center operators could simply add air conditioning near the servers: hot air coming out of the servers would be contained and fed back to the air conditioner. With liquid cooling for AI data centers, complexity creeps in because the IT and facility domains are no longer separate. The servers and the cooling equipment are connected by pipes and effectively become a shared appliance with interconnected controls.

With air cooling, you can check your server specification, know that it needs a specific supply air temperature, and, generally speaking, compensate for design errors by modifying the airflow at the facility level. For years, data center operators have added containment, changed floor tiles, and installed close-coupled cooling to compensate for errors.

With liquid cooling, you do not have the same ability to compensate after implementation; the system engineering must be far more precise. Instead of a single supply temperature, there are also specifications for the pressure and flow rate of the liquid. Operators must be familiar with the specifications of the various server manufacturers and know which servers require different temperatures, pressures, and flow rates. Operating outside these specifications puts server performance at risk.
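To make the point about precision concrete, here is a minimal sketch of the kind of envelope check a commissioning script or monitoring layer might run against the TCS supply. The limits, field names, and measured values are hypothetical, not any manufacturer's actual specifications.

```python
# Minimal envelope check for DLC supply conditions (illustrative limits only).
from dataclasses import dataclass

@dataclass
class CoolantEnvelope:
    """Assumed server-side limits on the TCS supply; real values come from the server vendor."""
    max_supply_temp_c: float
    max_pressure_kpa: float
    min_flow_lpm: float

def check_supply(env: CoolantEnvelope, temp_c: float, pressure_kpa: float, flow_lpm: float) -> list[str]:
    """Return a list of violations for the measured supply conditions."""
    violations = []
    if temp_c > env.max_supply_temp_c:
        violations.append(f"supply temperature {temp_c}°C exceeds {env.max_supply_temp_c}°C")
    if pressure_kpa > env.max_pressure_kpa:
        violations.append(f"supply pressure {pressure_kpa} kPa exceeds {env.max_pressure_kpa} kPa")
    if flow_lpm < env.min_flow_lpm:
        violations.append(f"flow {flow_lpm} L/min is below the minimum {env.min_flow_lpm} L/min")
    return violations

# Example: a hypothetical server envelope and one set of measured conditions.
envelope = CoolantEnvelope(max_supply_temp_c=40.0, max_pressure_kpa=450.0, min_flow_lpm=1.5)
for problem in check_supply(envelope, temp_c=42.0, pressure_kpa=400.0, flow_lpm=1.2):
    print("ALERT:", problem)
# ALERT: supply temperature 42.0°C exceeds 40.0°C
# ALERT: flow 1.2 L/min is below the minimum 1.5 L/min
```

The harder part is not the check itself but agreeing on whose envelope governs when servers from different manufacturers share the same loop.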

Without one trusted vendor providing the end-to-end liquid cooling solution, you could end up with a lot of finger-pointing, or worse.

3. Energy savings left unrealized

Air cooling and liquid cooling operate at different supply temperatures, and design choices at the chiller plant can unlock additional energy savings. Compared to air, water is more than 23 times better at conducting heat and can hold over 3,000 times more heat by volume, which is what allows a liquid cooling loop to run at much warmer supply temperatures than an air cooling plant.

That difference creates a trade-off. You can have a single chiller plant that supplies the same water temperature to both air- and liquid-cooled units, but this approach limits how warm you can run your liquid cooling and how many free-cooling hours you can capture. If you purchase a second chiller and operate two plants instead of one (one for air cooling and a separate one for liquid cooling), you can run the liquid cooling side much warmer and realize that efficiency. As a rule of thumb, every 1°C you can raise your chiller supply temperature translates to roughly 2 to 2.5% savings in electrical efficiency. But if you stick with one chiller plant, you are sacrificing the energy savings offered by liquid cooling.
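As a rough illustration of that rule of thumb, the sketch below estimates the chiller energy saved by running a dedicated, warmer liquid-cooling plant. The baseline and dedicated supply temperatures are assumptions chosen for illustration, not design recommendations.

```python
# Rough estimate of chiller energy savings from a warmer liquid-cooling loop.
# Rule of thumb from the text: each 1°C of chiller supply temperature increase
# saves roughly 2-2.5% in electrical efficiency. Temperatures below are assumed.

SAVINGS_PER_DEG_C = 0.02          # conservative end of the 2-2.5% rule of thumb
shared_plant_supply_c = 18        # assumed supply temp when one plant serves air + liquid
dedicated_liquid_supply_c = 32    # assumed warmer supply temp for a dedicated DLC plant

delta_c = dedicated_liquid_supply_c - shared_plant_supply_c
estimated_savings = delta_c * SAVINGS_PER_DEG_C

print(f"Raising supply by {delta_c}°C -> roughly {estimated_savings:.0%} chiller energy savings")
# Raising supply by 14°C -> roughly 28% chiller energy savings
```

The relationship is not perfectly linear across a swing that large, so treat a number like this as directional rather than predictive; the point is that the savings left on the table with a single shared plant are material.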

The solution is an end-to-end liquid cooling approach from a trusted partner

These are only a few of the things that can go wrong, and we hope they illustrate why the best liquid cooling for AI data centers requires an end-to-end approach that accounts for technology sourcing, installation, and ongoing maintenance. Adding liquid cooling to your existing data center can be complicated, but when done successfully, it effectively cools hotter workloads and keeps critical infrastructure running at peak uptime and efficiency.

Motivair by Schneider Electric: Designed for AI, HPC, and high-density GPU workloads

Available globally, Motivair by Schneider Electric cooling solutions meet the power- and GPU-intensive demands of high-density data centers reliably and at scale. Our complete liquid- and air-cooled portfolio comprises data center physical infrastructure including:

  • Coolant Distribution Units (CDUs)
  • Rear Door Heat Exchangers (RDHx)
  • Heat Dissipation Units (HDUs)
  • Dynamic cold plates
  • Chillers
  • Software and services

All are designed to handle the thermal management requirements of next-generation HPC, AI, and accelerated computing workloads.

Schneider Electric and Motivair are providing customers with the most comprehensive data center and liquid cooling portfolio available in the market, inclusive of all core cooling infrastructure, alongside a supply chain capable of serving global demands. 

Learn more about WWT and Schneider Electric
Connect with our experts
