Data centers face a pressing issue — heat.
Thermal Design Power (TDP) is a metric that quantifies the heat a component such as a CPU or GPU generates under sustained load, and therefore how much heat its cooling system must remove. High-performance devices now have TDPs above 500 watts, which puts more compute into each server rack but creates a new problem: power density. Enterprise server rack power density has more than doubled in the past decade, reaching 8 to 10 kilowatts per rack, and some US data centers peak at 16 to 20 kilowatts per rack. Given the roadmaps for CPUs, GPUs and other compute elements, a fully populated rack is expected to exceed 40 kilowatts in the near future.
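To see how quickly per-device TDPs add up to rack-level power, the sketch below totals a hypothetical GPU server. The `ServerConfig` class and all of its component counts and wattages are illustrative assumptions, not a real product, and treating TDP as each component's sustained power draw is a simplification.

```python
from dataclasses import dataclass


@dataclass
class ServerConfig:
    """Hypothetical server; every count and wattage here is an illustrative assumption."""
    gpus: int
    gpu_tdp_w: float
    cpus: int
    cpu_tdp_w: float
    other_w: float  # assumed lump sum for memory, storage, NICs, fans and PSU losses

    def power_w(self) -> float:
        # Treat each component's TDP as its sustained draw (a simplification).
        return self.gpus * self.gpu_tdp_w + self.cpus * self.cpu_tdp_w + self.other_w


def rack_power_kw(server: ServerConfig, servers_per_rack: int) -> float:
    """Rack draw if every server runs all components at TDP at the same time."""
    return server.power_w() * servers_per_rack / 1000.0


def max_servers(server: ServerConfig, rack_budget_kw: float) -> int:
    """Number of whole servers that fit under a rack power budget."""
    return int(rack_budget_kw * 1000.0 // server.power_w())


node = ServerConfig(gpus=4, gpu_tdp_w=500.0, cpus=2, cpu_tdp_w=350.0, other_w=500.0)
print(f"{node.power_w():.0f} W per server")
print(f"{rack_power_kw(node, 10):.1f} kW for 10 servers")
print(max_servers(node, 20.0), "servers fit in a 20 kW rack")
```

With these assumed numbers, each server draws 3.2 kW, so ten of them reach 32 kW, well past the 16 to 20 kW peaks cited above. In this sketch it is the power budget, not physical rack space, that limits how many servers fit.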
Energy cost is one concern; cooling efficiency is another. Cooling is a major source of energy waste in data centers, consuming as much power as the computing equipment itself, or more. Air cooling, the typical method of heat dissipation, is inefficient, and relying on it leads to low-density racks, performance drops and higher failure rates.
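One standard way to express this overhead is Power Usage Effectiveness (PUE), defined by The Green Grid as total facility power divided by the power delivered to IT equipment. If cooling alone draws as much power as the IT load, PUE is at least 2.0. The helper below is a minimal sketch; the load figures in the example are hypothetical.

```python
def pue(it_kw: float, cooling_kw: float, other_overhead_kw: float = 0.0) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power.

    other_overhead_kw covers non-cooling overhead such as power distribution
    losses and lighting; it defaults to zero for a cooling-only estimate.
    """
    if it_kw <= 0:
        raise ValueError("IT load must be positive")
    return (it_kw + cooling_kw + other_overhead_kw) / it_kw


# Hypothetical loads: cooling that matches the IT load vs. a much leaner cooling plant.
print(pue(it_kw=100.0, cooling_kw=100.0))
print(pue(it_kw=100.0, cooling_kw=30.0, other_overhead_kw=10.0))
```

A PUE of 1.0 would mean every watt goes to computing, so every kilowatt of cooling power saved moves the ratio toward that ideal.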
In this article, we discuss the difference between efficiency and sustainability, and how workload consolidation and liquid cooling can help address the heat challenges facing data centers.
Data centers have a large and growing impact on the environment because of the electricity they use to operate and cool their systems. As technologies such as generative AI advance, rising power and cooling demands are putting more pressure on data centers to support these systems while maintaining optimal performance.
But what does it mean to have an efficient versus sustainable data center?
Efficiency and sustainability are interrelated and often used interchangeably, but they are not synonyms. Efficiency pertains to optimizing resources and reducing waste, ultimately improving the bottom line; sustainability, on the other hand, encompasses environmental, social and economic dimensions. A sustainable data center, one that accounts for its environmental impact, is the result of running a data center efficiently.
Read the WWT Research Report "Efficiency is the Path to a Sustainable Data Center" to see how efficiency in the data center can lead to sustainable outcomes.
By adopting practices such as workload consolidation and liquid cooling, data centers can address environmental concerns while optimizing resources.
One approach to reducing data center energy consumption and cooling demand is to consolidate workloads onto fewer, more efficient systems. This starts with an audit to identify zombie servers: servers that are idle or underutilized but still draw power. Getting the most out of a consolidation program also requires understanding the characteristics and requirements of different workloads, such as CPU-intensive, GPU-intensive or memory-intensive tasks.
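A first pass at a zombie-server audit can be as simple as flagging machines whose average utilization stays below a threshold over an observation window, then ranking them by power draw. The sketch below assumes you already collect periodic CPU-utilization and power samples per host; the `ServerSample` type, the 5% threshold and the fleet data are hypothetical. A real audit would also check network, disk and memory activity before decommissioning anything.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ServerSample:
    """Telemetry for one host over an observation window (hypothetical format)."""
    hostname: str
    cpu_util_pct: list[float]  # periodic CPU utilization samples, 0-100
    power_w: list[float]       # power readings taken over the same window


def find_zombies(servers: list[ServerSample], util_threshold_pct: float = 5.0):
    """Return (hostname, avg CPU %, avg watts) for hosts below the threshold.

    Results are sorted by average power, highest first, so the hosts wasting
    the most energy come up for review first.
    """
    zombies = []
    for s in servers:
        if not s.cpu_util_pct or not s.power_w:
            continue  # no data: cannot judge this host
        avg_util = mean(s.cpu_util_pct)
        if avg_util < util_threshold_pct:
            zombies.append((s.hostname, avg_util, mean(s.power_w)))
    zombies.sort(key=lambda z: z[2], reverse=True)
    return zombies


fleet = [
    ServerSample("db-01", [45.0, 60.0, 52.0], [410.0, 450.0, 430.0]),
    ServerSample("legacy-07", [1.0, 2.0, 1.5], [180.0, 175.0, 185.0]),
    ServerSample("test-12", [3.0, 4.0, 2.0], [220.0, 230.0, 210.0]),
]
for host, util, watts in find_zombies(fleet):
    print(f"{host}: {util:.1f}% CPU, {watts:.0f} W")
```

Here "test-12" and "legacy-07" are flagged, with the higher-draw host listed first; "db-01" is busy and is left alone.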
By consolidating workloads onto modern CPUs or GPUs, data centers can reduce the total number of servers they need and lower their net energy use. However, this creates new challenges, because the computationally powerful next generation of CPUs and GPUs consumes a great deal of electricity and dissipates substantial heat. To take full advantage of that computational capability, these devices need more sophisticated cooling solutions and workload management strategies. Some workloads may also need software re-architecture to run energy-efficiently.
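Deciding which workloads can share a server is a form of bin packing. The sketch below uses first-fit decreasing, a common packing heuristic, not necessarily what any particular consolidation tool does. It assumes a single resource dimension (CPU cores) and hypothetical workload sizes; real consolidation has to balance CPU, GPU, memory and I/O together and leave headroom for peaks.

```python
def consolidate_ffd(demands: dict[str, int], capacity: int) -> list[list[str]]:
    """Pack workloads onto as few servers as possible with first-fit decreasing.

    demands:  workload name -> resource demand (e.g. CPU cores)
    capacity: usable capacity of one server, in the same units
    Returns one list of workload names per server.
    """
    remaining: list[int] = []          # free capacity left on each server
    placements: list[list[str]] = []   # workloads assigned to each server
    # Place the largest workloads first; smaller ones fill the gaps later.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{name} needs {demand}, more than one server offers")
        for i, free in enumerate(remaining):
            if free >= demand:         # first server with enough room wins
                remaining[i] -= demand
                placements[i].append(name)
                break
        else:                          # no existing server fits: add one
            remaining.append(capacity - demand)
            placements.append([name])
    return placements


# Hypothetical workloads that each ran on their own server, sized in CPU cores.
workloads = {"web": 8, "batch": 20, "cache": 12, "etl": 16, "api": 6, "ci": 10}
plan = consolidate_ffd(workloads, capacity=32)
print(f"{len(workloads)} servers -> {len(plan)} servers: {plan}")
```

In this example six single-workload servers collapse onto three 32-core servers, which matches the lower bound of 72 total cores divided by 32 cores per server, rounded up.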
Air cooling is the predominant method of cooling equipment in enterprise data centers, but it is not efficient. Air can dissipate only a limited amount of heat, and it needs a low ambient temperature to work effectively. Moreover, fan power rises steeply as the heat load grows: under the fan affinity laws, fan power scales roughly with the cube of fan speed, so doubling the airflow takes about eight times the fan power. High heat is also a significant problem for electronic systems because it degrades their reliability and performance.
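The fan affinity laws show how quickly fan power climbs with heat load. For a given fan, airflow is proportional to fan speed and fan power to roughly the cube of speed; if the air's temperature rise through the server stays fixed, airflow must grow in proportion to the heat load. The function below is a back-of-the-envelope sketch under those idealized assumptions; real fans and chassis deviate from the affinity laws, especially far from their design point.

```python
def relative_fan_power(heat_ratio: float) -> float:
    """Fan power needed, relative to today, to remove heat_ratio times more heat.

    Idealized assumptions:
    - fixed air temperature rise, so by Q = rho * V_dot * c_p * dT the
      airflow V_dot must scale linearly with the heat load Q;
    - fan affinity laws: airflow ~ fan speed, fan power ~ fan speed**3.
    """
    if heat_ratio <= 0:
        raise ValueError("heat_ratio must be positive")
    airflow_ratio = heat_ratio    # more heat needs proportionally more air
    speed_ratio = airflow_ratio   # airflow scales with fan speed
    return speed_ratio ** 3       # power scales with the cube of speed


for ratio in (1.0, 1.5, 2.0):
    print(f"{ratio:.1f}x heat -> {relative_fan_power(ratio):.2f}x fan power")
```

In this model, doubling the heat load calls for eight times the fan power, which is why simply spinning fans faster stops being a practical answer as TDPs climb.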
To avoid overheating and reliability issues, computing devices will "throttle," temporarily reducing their clock speed and, with it, their power consumption and heat generation. Throttling also reduces the device's performance and efficiency, which on its own is undesirable in a data center. In some cases, the device will "thrash," rapidly switching between high and low power states, which results in unstable performance and reliability issues.
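A toy simulation shows how thrashing arises and why steadier temperatures help. The model below is purely illustrative, not how any real CPU's power management works: each step, the temperature moves a fixed fraction (`alpha`) of the way toward a steady-state value set by the current power state, and the device throttles at one temperature and resumes full speed at another. All temperatures and the `alpha` constant are hypothetical.

```python
def count_switches(throttle_at_c: float, resume_at_c: float,
                   full_power_steady_c: float = 95.0, throttled_steady_c: float = 75.0,
                   start_c: float = 60.0, steps: int = 200, alpha: float = 0.1) -> int:
    """Count power-state changes in a toy first-order thermal model.

    Each step, temperature moves a fraction `alpha` of the way toward the
    steady-state temperature of the current state. The device throttles at
    `throttle_at_c` and returns to full power at or below `resume_at_c`.
    """
    steady = {"full": full_power_steady_c, "throttled": throttled_steady_c}
    temp_c, state, switches = start_c, "full", 0
    for _ in range(steps):
        temp_c += alpha * (steady[state] - temp_c)
        if state == "full" and temp_c >= throttle_at_c:
            state, switches = "throttled", switches + 1
        elif state == "throttled" and temp_c <= resume_at_c:
            state, switches = "full", switches + 1
    return switches


# Hypothetical scenarios (degrees C).
print("single threshold:", count_switches(throttle_at_c=90.0, resume_at_c=90.0))
print("with hysteresis: ", count_switches(throttle_at_c=90.0, resume_at_c=82.0))
print("cooler device:   ", count_switches(90.0, 90.0, full_power_steady_c=78.0))
```

With a single threshold the device flips state every few steps; a gap between the throttle and resume points (hysteresis) cuts the switching sharply, and cooling good enough to keep the full-power temperature below the throttle point eliminates it in this model.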
Direct-to-chip liquid cooling (DLC) offers several advantages over air cooling. A liquid, such as water or a refrigerant, is circulated directly over the hottest parts of the system, carrying heat from the device to a heat exchanger or radiator far more efficiently than air can. Because of this efficiency, liquid-cooled systems can operate in data centers with higher ambient temperatures, saving on air-conditioning costs. And because the devices now run at a lower, more stable temperature, liquid cooling lets them run at full speed without throttling, improving the performance and efficiency of the system.
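A quick heat-balance calculation shows why liquid carries heat so much more effectively than air. Rearranging Q = rho * V_dot * c_p * dT gives the volume flow V_dot needed to remove a heat load Q for a given coolant temperature rise dT. The sketch compares air and water for a single 500 W device, using approximate room-temperature textbook properties and an assumed 10 K rise. It covers only the coolant's capacity to carry heat, not the cold-plate heat-transfer coefficients, which also favor liquid.

```python
# Approximate properties near room temperature (rounded textbook values).
AIR = {"density_kg_m3": 1.2, "cp_j_per_kg_k": 1005.0}
WATER = {"density_kg_m3": 998.0, "cp_j_per_kg_k": 4186.0}


def flow_l_per_s(heat_w: float, fluid: dict[str, float], delta_t_k: float) -> float:
    """Coolant volume flow that carries heat_w watts with a delta_t_k temperature rise.

    From Q = rho * V_dot * c_p * dT  =>  V_dot = Q / (rho * c_p * dT).
    """
    m3_per_s = heat_w / (fluid["density_kg_m3"] * fluid["cp_j_per_kg_k"] * delta_t_k)
    return m3_per_s * 1000.0  # convert m^3/s to litres per second


# One 500 W device and an assumed 10 K coolant temperature rise.
air = flow_l_per_s(500.0, AIR, 10.0)
water = flow_l_per_s(500.0, WATER, 10.0)
print(f"air:   {air:.2f} L/s")
print(f"water: {water:.4f} L/s")
print(f"air needs roughly {air / water:,.0f}x the volume flow of water")
```

Under these assumptions, air needs about 41 L/s while water needs about 0.012 L/s, a ratio of roughly 3,500 to 1.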
A two-phase direct-to-chip liquid cooling system, such as ZutaCore's, changes the coolant from liquid to vapor and back again. When a liquid with a low boiling point is applied directly to a device such as a CPU, it absorbs more heat by vaporizing than it would if it remained liquid. The vapor then travels to a condenser, where it is cooled and returns to its liquid state. Two-phase liquid cooling can remove heat from CPUs at high rates: in WWT's experiments, it reduced CPU temperatures from 90 °C to 70 °C compared with air cooling.
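The advantage of letting the coolant boil comes from latent heat: as liquid turns to vapor it absorbs its latent heat of vaporization at a nearly constant temperature, on top of any sensible heat it picks up while warming. The sketch compares the coolant mass flow needed for a 500 W load in single-phase and two-phase operation. Water's properties at atmospheric pressure are used only because they are well known; the low-boiling-point fluids used in two-phase systems have different (typically much smaller) latent heats, and the 50% vapor fraction is a hypothetical value.

```python
def heat_per_kg_j(cp_j_per_kg_k: float, delta_t_k: float,
                  latent_j_per_kg: float = 0.0, vapor_fraction: float = 0.0) -> float:
    """Heat absorbed by 1 kg of coolant passing over the device.

    Sensible part: c_p * dT  (liquid warms up without changing phase).
    Latent part:   h_fg * x  (fraction x of the liquid boils into vapor).
    """
    if not 0.0 <= vapor_fraction <= 1.0:
        raise ValueError("vapor_fraction must be between 0 and 1")
    return cp_j_per_kg_k * delta_t_k + latent_j_per_kg * vapor_fraction


def mass_flow_kg_s(heat_w: float, j_per_kg: float) -> float:
    """Coolant mass flow needed to absorb heat_w watts."""
    return heat_w / j_per_kg


# Water at atmospheric pressure, for illustration only.
CP_WATER = 4186.0      # J/(kg*K), liquid
H_FG_WATER = 2.257e6   # J/kg, latent heat of vaporization at 100 C

single_phase = mass_flow_kg_s(500.0, heat_per_kg_j(CP_WATER, delta_t_k=10.0))
# Two-phase: coolant arrives near its boiling point, so dT ~ 0 and the heat
# goes into boiling an (assumed) 50% of the liquid.
two_phase = mass_flow_kg_s(
    500.0, heat_per_kg_j(CP_WATER, 0.0, latent_j_per_kg=H_FG_WATER, vapor_fraction=0.5))
print(f"single-phase: {single_phase * 1000:.2f} g/s, two-phase: {two_phase * 1000:.3f} g/s")
```

With these illustrative values, two-phase operation needs roughly 27 times less coolant flow. Because boiling happens at a near-constant temperature for a given pressure, it also helps hold the chip at a steady temperature.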
In addition to these system-level benefits, liquid cooling has several knock-on effects for data center design and operation. It can reduce the energy consumption and cost of cooling data center equipment, and it enables higher density and scalability, allowing more devices to be packed into a smaller space without overheating.
Though there have been advancements in performance and cooling options, architecting an efficient data center that meets your organization's needs can be challenging.
At WWT, we test new solutions and integrations in our Advanced Technology Center lab to help our clients meet their efficiency and sustainability goals, as well as provide data center design consulting through our Sustainable IT Briefing. During this consultation, we can help you assess the current state of your data center, map systems, and identify workloads and initial areas for optimization.
This report may not be copied, reproduced, distributed, republished, downloaded, displayed, posted or transmitted in any form or by any means, including, but not limited to, electronic, mechanical, photocopying, recording, or otherwise, without the prior express written permission of WWT Research. It consists of the opinions of WWT Research and as such should not be construed as statements of fact. WWT provides the Report "AS-IS", although the information contained in the Report has been obtained from sources that are believed to be reliable. WWT disclaims all warranties as to the accuracy, completeness or adequacy of the information.