A pivot from ESG to AI?

In 2022, ESG was a top-of-mind issue for almost every enterprise. Companies around the world were announcing their support for sustainability initiatives and setting goals to become carbon neutral. They were considering ways to reduce their carbon footprint, evaluating products that promised to do more in a smaller footprint, and looking to cut waste through good housekeeping practices. Some were taking even more dramatic steps, such as consolidating and closing data centers and migrating workloads to the cloud and colocation facilities. Most major enterprises published commitments on their corporate climate action pages to reducing, and ultimately eliminating, their carbon footprint.

Then at the end of 2022, the release of one generative transformer caught everyone's attention and started a frenzy that continues to grow and expand today.

By the end of 2023, it almost felt like we had done an about-face on the topic of ESG. Forget carbon footprint, how fast can we buy and deploy GPUs? Due to the demand, customers are waiting in line just for the chance to buy AI infrastructure. This seems to be a real-life version of the adage "one step forward, two steps back."

The drive to deploy AI

Businesses and government entities are now caught up in the drive to deploy AI into critical aspects of their operations. The infrastructure required to run AI projects is extremely power-hungry and generates more heat than most data centers were designed to handle.

Typical corporate data centers cannot support the demands of modern AI workloads, whose power and cooling requirements are an order of magnitude larger. The power footprint of a conventional IT rack is usually less than 10 kW, while AI workloads start at around 40 kW per rack, with demand expected to increase dramatically over the next few years. The roadmaps for compute elements such as GPUs and CPUs are trending toward exponential growth in power requirements.
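
To put that gap in perspective, here is a minimal back-of-the-envelope sketch. The rack figures come from the paragraph above; the PUE (power usage effectiveness) overhead factor is an illustrative assumption:

    # Back-of-the-envelope facility power estimate.
    # Rack figures from the text; the PUE overhead is an assumed value.
    TYPICAL_RACK_KW = 10   # upper end of a conventional IT rack
    AI_RACK_KW = 40        # entry point for current AI racks
    PUE = 1.5              # assumed facility overhead (cooling, distribution)

    def facility_kw(racks: int, kw_per_rack: float, pue: float = PUE) -> float:
        """Total facility draw: IT load plus cooling/distribution overhead."""
        return racks * kw_per_rack * pue

    print(facility_kw(20, TYPICAL_RACK_KW))  # 20 conventional racks -> 300 kW
    print(facility_kw(20, AI_RACK_KW))       # 20 AI racks -> 1,200 kW (1.2 MW)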

Can an enterprise continue to make progress on its ESG goals while deploying AI? What options exist to lessen the impact of the AI carbon explosion?

Responsible AI includes ESG

We can't stop progress, so it's imperative that we find a way to address both needs at the same time. Every enterprise AI initiative must adopt a "responsible AI" approach, which includes not only taking proper care of data but also ensuring AI systems operate as responsibly and efficiently as possible.

Technology is constantly improving

We also need to remember that the infrastructure used to deploy AI today will only get better, more efficient and lighter in carbon footprint, and it will get there at unbelievable speed. It's easy to forget how far we've come: thirty years ago, the computing power in today's mobile phone would have filled a room and required huge HVAC systems to keep it cool. With the entire world hungry for AI infrastructure, the rate of innovation will exceed our imagination, and the chips available in a few short years will make today's infrastructure look archaic.

A "systems view" of AI

Fundamentally, an AI system converts inputs of power, space and cooling into models and inferences, using the infrastructure provided. An AI factory is an environment where a variable input, electricity, is turned into a model, and we know AI consumes a great deal of that input. Absent a significant breakthrough in training efficiency, the rest of the process will need to be optimized. As with any system, an AI system can be tuned to maximize the return on those resources and the return on investment in the infrastructure.
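
As a toy illustration of that factory view, the sketch below converts an assumed GPU fleet and training duration into facility energy; every number is hypothetical:

    # Toy "AI factory" model: electricity in, a trained model out.
    # All inputs are hypothetical, chosen only for illustration.

    def training_energy_mwh(gpus: int, watts_per_gpu: float,
                            days: float, pue: float = 1.5) -> float:
        """Estimate facility energy for one training run, cooling included."""
        it_kwh = gpus * watts_per_gpu * days * 24 / 1000
        return it_kwh * pue / 1000  # kWh -> MWh

    # e.g., 1,024 GPUs at 700 W each, training for 30 days:
    print(f"{training_energy_mwh(1024, 700, 30):,.0f} MWh")  # ~774 MWh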

Power

The amount of power AI systems require is astonishing. In 2024, a low-density IT rack consumes about 3.5 kW of electricity, about the same as an electric patio heater. A high-density IT rack consumes between 10 and 15 kW, about the same as two domestic ovens on self-clean. A single AI LLM rack consuming 50 kW exceeds what an entire household can pull from the electric grid through a domestic 200 A electrical panel. Moreover, the typical data center is set up to deliver power through 208 V power distribution units (PDUs), while AI systems, with their higher requirements, often use 415 V, which may require a data center retrofit. Data centers before AI were power hungry; AI has made them ravenous. For example, training GPT-3 is estimated to have consumed roughly 1,300 MWh of electricity, about the annual usage of 130 US homes.
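
The arithmetic behind two of those comparisons, with the household figures as stated assumptions:

    # The arithmetic behind two comparisons above (approximate inputs).

    # A 200 A residential panel at 240 V split-phase:
    panel_kw = 200 * 240 / 1000    # 48 kW; a 50 kW AI rack exceeds it

    # GPT-3 training (~1,300 MWh) versus an average US household,
    # assumed here at roughly 10,000 kWh per year:
    homes = 1_300_000 / 10_000     # kWh / (kWh per home-year)

    print(f"200 A panel capacity: {panel_kw:.0f} kW")
    print(f"GPT-3 training energy: ~{homes:.0f} US homes for a year")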

Cooling

Each watt of power consumed by an AI system is dissipated as heat. There are several ways to manage heat in a data center, but well-managed air-only cooling is limited to about 20 kW/rack. Advanced cooling systems are required to cool workloads beyond that limit. These may include full-rack containment, rear-door heat exchangers (RDX) and direct-to-chip liquid cooling (DLC).
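
A standard airflow rule of thumb shows why air alone runs out of headroom around that point; the temperature rise used below is an assumption:

    # Rule of thumb for air cooling: CFM ~= 3.16 x watts / dT(deg F).
    # Assumes standard air density; dT is the inlet-to-outlet rise.

    def required_cfm(watts: float, delta_t_f: float = 20.0) -> float:
        """Airflow in cubic feet per minute to remove a given heat load."""
        return 3.16 * watts / delta_t_f

    for kw in (10, 20, 40):
        print(f"{kw} kW rack -> {required_cfm(kw * 1000):,.0f} CFM")
    # 10 kW needs ~1,600 CFM; 40 kW needs ~6,300 CFM, more air than a
    # rack can realistically move, hence liquid-assisted cooling.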

Key chip suppliers are moving toward liquid-cooled devices across their product lines. If an enterprise intends to be ready to adopt those platforms, the data center impact must be considered early. From an ESG point of view, advanced cooling technologies such as DLC, rack containment and RDXs are much more efficient than data center air cooling.

Space

A trade-off that can be made, at least in the short term, is to exchange space for power and cooling. To mitigate both the high power and high cooling requirements, some data centers are adopting a "low-density" deployment strategy, spreading the load to keep per-rack power and cooling manageable, at least temporarily. This trade-off is at best an interim solution: it introduces additional communications latency and switch hops, lowers switch port utilization, and does nothing to reduce the overall power draw and heat that must be rejected.
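
A quick sketch of that trade-off, using an illustrative 400 kW cluster:

    # Spreading a fixed AI load across lower-density racks (illustrative).
    TOTAL_IT_KW = 400  # assumed cluster size

    for kw_per_rack in (40, 20, 10):
        racks = TOTAL_IT_KW / kw_per_rack
        print(f"{kw_per_rack:>2} kW/rack -> {racks:.0f} racks")
    # 40 kW/rack -> 10 racks; 10 kW/rack -> 40 racks: four times the
    # floor space and cabling, with the same total power and heat.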

How the industry is responding

AI deployments represent a significant challenge to the overall technology industry. The emphasis on AI from CEOs and boards of directors is changing how organizations operate, and that's changing how the technology industry delivers solutions.

The necessity of cloud

Enterprise AI programs have board-level visibility, and there is great pressure on technical teams to deliver something quickly. Public cloud hyperscalers (e.g., AWS, Google Cloud, Azure) have responded by making cloud-based AI systems readily available, albeit at a significant cost.

Leveraging the cloud for AI development, even if temporary, allows for rapid progress. However, as requirements become clear, organizations will need to carefully select the right place to run their high-demand workloads (a rough break-even sketch follows this list):

  • Cloud: The public cloud provides maximum flexibility at a premium.
  • On-prem: On-premises deployments may have the lowest long-term cost, but the underlying infrastructure may require significant modernization or retrofitting to run AI's high-demand workloads.
  • Colo: Co-location facilities can provide a balance between those options.
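
As a rough guide to that choice, the break-even sketch below compares renting GPU capacity against owning it; every price is a hypothetical assumption, not a quote:

    # Cloud-vs-on-prem break-even sketch. All prices are hypothetical.
    CLOUD_RATE_PER_GPU_HR = 4.00    # assumed on-demand $/GPU-hour
    ONPREM_CAPEX_PER_GPU = 35_000   # assumed purchase + install, per GPU
    ONPREM_OPEX_PER_GPU_HR = 0.50   # assumed power/cooling/ops, per GPU-hour

    def breakeven_gpu_hours() -> float:
        """GPU-hours at which owning becomes cheaper than renting."""
        return ONPREM_CAPEX_PER_GPU / (CLOUD_RATE_PER_GPU_HR - ONPREM_OPEX_PER_GPU_HR)

    h = breakeven_gpu_hours()
    print(f"Break-even at ~{h:,.0f} GPU-hours (~{h / 8760:.1f} years at 24/7)")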

Power is going green, driven by economics

The electricity demand of AI and other compute systems is extreme, so new capacity will have to come online. New electrical generation is a capital-intensive activity, and understanding the costs is key. Grid-scale electricity costs are measured by LCOE (levelized cost of electricity). As of 2024, the least expensive sources of new power generation are wind and solar, even when grid-scale battery storage is included.

Thus, new electrical capacity will come largely from these and other renewable sources, as evidenced by the exponential growth in solar and wind installations. Moreover, the cost of renewables keeps dropping as manufacturing those technologies becomes more efficient.
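
For reference, LCOE is simply lifetime discounted cost divided by lifetime discounted generation. A minimal sketch, with placeholder inputs for a hypothetical solar farm:

    # Levelized cost of electricity: discounted lifetime costs divided by
    # discounted lifetime output. All inputs below are placeholders.

    def lcoe(capex: float, annual_opex: float, annual_mwh: float,
             years: int, rate: float) -> float:
        """$/MWh over the asset's life."""
        costs, energy = capex, 0.0
        for t in range(1, years + 1):
            df = (1 + rate) ** t
            costs += annual_opex / df
            energy += annual_mwh / df
        return costs / energy

    # Hypothetical solar farm: $60M capex, $1M/yr O&M,
    # 100,000 MWh/yr over 25 years, 7% discount rate:
    print(f"${lcoe(60e6, 1e6, 100_000, 25, 0.07):,.0f}/MWh")  # ~$61/MWh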

Advanced cooling will just be cooling

The power requirements of high-performance compute, storage and network systems are also increasing exponentially. GPU-driven compute is currently driving the demand, but the same fundamentals will drive higher power requirements for other data center components as well.

How enterprises should respond

Integrating AI systems into enterprise operations is a challenge that will require organizations to adapt. Here are some suggestions for what that adaptation might look like.

Relentlessly & continuously optimize

By adopting the principles of continuous integration/continuous deployment (CI/CD), the enterprise's infrastructure can adapt to the challenge presented by AI-type workloads while maintaining its commitment to corporate responsibility goals. 

Optimization is predicated on measurement and goal setting — but optimization for what? And how can we know when a goal has been reached?
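
One way to make those questions concrete is to track a small set of measurable efficiency metrics, such as PUE and carbon footprint, and check them against targets on every infrastructure change, CI/CD-style. The thresholds below are illustrative assumptions:

    # Efficiency metrics checked like a CI gate. Thresholds are illustrative.
    facility_kwh = 1_500_000   # metered facility energy for the period
    it_kwh = 1_000_000         # metered IT-equipment energy for the period
    grid_kg_co2_per_kwh = 0.4  # assumed grid carbon intensity

    pue = facility_kwh / it_kwh                    # power usage effectiveness
    tonnes_co2e = facility_kwh * grid_kg_co2_per_kwh / 1000

    assert pue <= 1.6, f"PUE regression: {pue:.2f}"
    print(f"PUE = {pue:.2f}, footprint = {tonnes_co2e:,.0f} tCO2e")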

Cost, performance and sustainability are not mutually exclusive

There is a misconception that sustainable solutions are more expensive than non-sustainable ones. The sustainability of IT infrastructure reflects the efficiency of its design and implementation. The key is to measure the overall implementation in terms of complete cost, which must include application, systems, facilities and energy requirements. Good systems and application hygiene are key to optimizing both costs and carbon footprint.
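
A simple complete-cost comparison makes the point; all figures are hypothetical:

    # "Complete cost" of two designs, per year. All inputs hypothetical.
    def complete_cost(hardware, software, facilities, kwh, usd_per_kwh=0.12):
        return hardware + software + facilities + kwh * usd_per_kwh

    baseline  = complete_cost(500_000, 200_000, 150_000, kwh=4_000_000)
    efficient = complete_cost(550_000, 200_000, 120_000, kwh=2_500_000)
    print(f"baseline ${baseline:,.0f}/yr vs efficient ${efficient:,.0f}/yr")
    # The efficient design spends more on hardware yet costs less overall,
    # because the energy savings outweigh the added hardware spend.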

A greener cloud

Public cloud hyperscalers and colocation providers are investing heavily in green power and designing next-generation cooling solutions that should dramatically reduce carbon impact.

These hyperscalers are strongly motivated to implement the best practices of data center optimization as it has a direct impact on their own economics. However, the hyperscalers are also strongly motivated to maximize the consumption of their resources.

Thus, enterprises must carefully optimize the footprint of their AI applications in the cloud to manage both costs and their carbon footprint.

Application optimization

While many clients have completed the "easy" migrations to the cloud, application rationalization can provide additional migration options:

  • Refactored applications can be optimized to provide business functions that meet the enterprise's needs while consuming the minimum number of resources.
  • Refactoring legacy applications into modern, elastic design patterns will result in better, more robust applications and reduce their resource consumption.

Migrating legacy applications to newer SaaS options can also help:

  • SaaS applications use common infrastructure optimized for that application, resulting in high utilization rates. High utilization rates are an indicator of resources being efficiently utilized.

AI applications are not architected like enterprise applications and have different design patterns. They are intrinsically more akin to high-performance computing (HPC) applications than enterprise IT applications. Thus, choosing infrastructure that is architected to support those types of workloads will be optimal compared to traditional IT designs.

On-premises facilities & IT optimization

The key to deploying AI is rethinking the approach to facilities infrastructure. AI places a very high demand on power and cooling and will require many organizations to refactor their data centers. Rack-scale AI systems currently require 40 kW, a figure expected to rise rapidly, and the power distribution required will be at 415 V. Power is only part of the demand; cooling requirements will rise commensurately, and many, if not all, future AI systems will require advanced cooling.

As mentioned above, advanced cooling solutions such as rear-door heat exchangers (RDX) and direct liquid cooling (DLC) are much more efficient than room air conditioning.

  • To enable the data center for the future:
    • Implement good hygiene, such as hot and cold containment.
    • Ensure there is environmental monitoring.
    • Consider investing in economic green power initiatives.
  • To deploy AI systems, data centers must:
    • Plan for advanced cooling, including water to the rack.
    • Deliver 415 V power at a minimum of 60 A, and likely much more (see the sketch after this list).
    • Prepare for very heavy floor loads.
    • Secure an adequate power allotment from the utility.
  • Modernize IT systems:
    • Densify IT workloads; new IT systems can do an equivalent amount of work in a much smaller footprint of servers and data center resources, including power and cooling.
  • Track emerging technologies:
    • There is a revolution going on in technology, driven by high-demand AI systems. It is key that enterprises remain well and objectively informed about these technologies, as they will play a critical role in reducing the carbon footprint of AI and IT workloads. Innovations in chip design, energy-efficient computing and advances in cooling technology all help mitigate environmental impacts.
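
To size the 415 V feed called out in the list above: a three-phase circuit delivers P = sqrt(3) x V x I, so a derated 60 A feed falls short of a 40 kW rack. A minimal sketch, with the 80% derate as an assumption:

    import math

    # Three-phase circuit capacity: P = sqrt(3) x V(line-to-line) x I.
    def three_phase_kw(volts: float, amps: float, derate: float = 0.8) -> float:
        """Usable kW from a three-phase feed after an assumed 80% derate."""
        return math.sqrt(3) * volts * amps * derate / 1000

    print(f"415 V / 60 A  -> {three_phase_kw(415, 60):.1f} kW usable")   # ~34.5 kW
    print(f"415 V / 100 A -> {three_phase_kw(415, 100):.1f} kW usable")  # ~57.5 kW
    # A 40 kW rack already needs more than a derated 60 A circuit, which
    # is why "likely much more" is the safer planning assumption.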

ESG, AI and corporate responsibility

The critical need to reduce enterprise carbon footprints has not vanished with the emergence and growing pressure to adopt enterprise AI solutions. Indeed, AI can help organizations achieve ESG goals when applied judiciously, and even revolutionize ESG reporting.

Organizations should note that ESG regulations are still being written and will have a direct impact on business bottom lines. As regulatory bodies face the growing demand for AI, they must address the environmental challenges this demand creates. We can expect to see guidelines and regulations that reshape how organizations approach AI deployments.

Responsible AI, which every organization should strive to embrace, must consider the environmental and social impacts of AI development and deployment. This is all the more crucial given that public perception of AI and its environmental impact is by and large negative; enterprises need to be aware of and address those perceptions.

In the market's current state, it is not yet possible to deploy AI solutions without substantially adding to the enterprise's carbon footprint. To address this and other public concerns, there must be more transparency around the benefits and risks that AI systems pose to the public, consumers and employees.

Both AI and ESG are just beginning a long journey

We are at the very beginning of the age of AI, a period of potential transformation unlike any other in technological or human history. How the story unfolds from here will be dictated by leadership that is not only visionary but anchored by a strong culture that values innovation as much as being a responsible global citizen.

