Note: This is the fourth post in a series exploring how AI agents can support IT Operations teams. In this installment, we consider the over- or under-allocation of computing resources: catching under-allocation improves application performance, while catching over-allocation improves the return on hardware investments.

Explore the series: 

Part 1: Transforming IT Operations with Large Language Models 

Part 2: Transforming IT Operations - A Daily Ops Summary Agent 

Part 3: Transforming IT Operations - An Incident Knowledge Assistant 

The context 

Most IT organizations today deploy their applications onto virtual machines, which separate the logical server from the physical hardware it runs on. This additional layer of abstraction brings a host of benefits, one of which is the ability to dynamically adjust the physical resources allocated to any server. With little to no downtime, the virtualization software can increase or decrease the CPU, memory and disk resources assigned to any server.
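
As a concrete illustration, with VMware vSphere this kind of adjustment can be made programmatically. The sketch below uses the pyVmomi SDK and assumes a vm object has already been looked up in the inventory and that CPU/memory hot-add is enabled for the guest; the optimizer itself does not depend on this particular API.

```python
# Minimal sketch (not part of the optimizer itself): resizing a virtual machine's
# CPU and memory allocation with pyVmomi, the VMware vSphere Python SDK.
# Assumes `vm` is a vim.VirtualMachine already retrieved from the inventory and
# that CPU/memory hot-add is enabled so the change requires no downtime.
from pyVmomi import vim

def resize_vm(vm, cpus: int, memory_mb: int):
    """Request a new CPU and memory allocation for an existing VM."""
    spec = vim.vm.ConfigSpec()
    spec.numCPUs = cpus        # new vCPU count
    spec.memoryMB = memory_mb  # new memory size in MB
    return vm.ReconfigVM_Task(spec=spec)  # returns a vSphere task to monitor

# Example: downsize a mostly idle server to 2 vCPUs and 4 GB of memory
# task = resize_vm(vm, cpus=2, memory_mb=4096)
```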

This capability allows for flexible management of all the physical compute hardware available in the data center or cloud tenant. Servers that spend a majority of their time idle, using very little of the CPU, memory or disk assigned to them, can quickly be downsized. In aggregate, this leads to cost savings, as less physical infrastructure is needed to run business applications. Conversely, servers that spend a majority of their time using almost all of the CPU, memory or disk assigned to them can quickly be upsized. Keeping servers below their resource ceilings addresses, or prevents altogether, application performance issues.

The challenge 

With hundreds or thousands of servers in a typical data center environment, the challenge becomes identifying, on an ongoing basis, which servers are under- or over-utilized. IT operations teams, of course, already have monitoring software that tracks CPU, memory and disk utilization across their environment in real time. 

These solutions are usually configured to send alerts and notifications when upper thresholds are hit, which typically means users of business applications may already be experiencing performance problems as they work. But an individual point-in-time threshold crossing does not always point to a systemic problem on a given server. As a result, teams can quickly become alert-fatigued after receiving a flurry of notifications, only to log in and find that levels are back to normal at that moment.
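
One way to separate transient spikes from systemic pressure is to look at how much of a recent window exceeded the threshold rather than reacting to a single crossing. Here is a minimal sketch, assuming utilization samples are already available as a pandas Series; the column and host names are illustrative.

```python
import pandas as pd

def sustained_breach(samples: pd.Series, threshold: float = 0.90,
                     min_fraction: float = 0.5) -> bool:
    """Flag a host only if utilization exceeded `threshold` for at least
    `min_fraction` of the samples in the window, not for a single spike."""
    return (samples > threshold).mean() >= min_fraction

# Example with 5-minute CPU samples collected over the last 24 hours:
# cpu = utilization_df.loc[utilization_df["host"] == "app-01", "cpu"]  # hypothetical frame
# alert = sustained_breach(cpu)  # True only when the pressure is persistent
```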

Meanwhile, at the other end of the spectrum, most teams do not set alerts to fire when utilization levels fall below certain benchmarks. Typically, this is because users of business applications experience nothing unusual when a server has plenty of headroom available. As a result, these scenarios go largely unobserved, and more resources than needed can remain assigned to a machine.

The remedy 

What's needed is a holistic, continuous analysis of the historical resource utilization patterns of every server in the environment. To accomplish this, we've taken granular time series utilization data for CPU, memory and disk and provided it to a combination of AI models for analysis and forecasting.
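
The post does not spell out the data layout, so purely as an assumption for illustration, imagine raw samples arriving as one row per host, metric and timestamp. Condensing them into per-host statistics is the natural first step before any model reasons over them.

```python
import pandas as pd

def summarize_utilization(samples: pd.DataFrame) -> pd.DataFrame:
    """Condense granular samples (assumed columns: host, metric, timestamp, value)
    into per-host, per-metric statistics for the downstream models."""
    return (
        samples.groupby(["host", "metric"])["value"]
               .agg(mean="mean", p95=lambda s: s.quantile(0.95), peak="max")
               .reset_index()
    )

# stats = summarize_utilization(raw_samples)  # one row per host/metric: mean, p95, peak
```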

These models identify, based on the historical statistics, which machines are already in an under- or over-utilized state. An LLM then summarizes the utilization patterns observed, producing both an overview report of the environment as a whole and detailed reports on every individual host.
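
As a rough sketch of that step (the thresholds and prompt wording here are ours, not those used in the actual solution), the per-host statistics can be labelled and then handed to the LLM as context for its reports.

```python
def classify(row, low: float = 0.20, high: float = 0.85) -> str:
    """Label one host/metric row from its historical statistics.
    Thresholds are illustrative, not the ones used in the optimizer."""
    if row["p95"] < low:
        return "underutilized"
    if row["mean"] > high:
        return "overutilized"
    return "normal"

# stats is the per-host summary produced earlier:
# stats["state"] = stats.apply(classify, axis=1)

# The labelled table then becomes context for the LLM's reports:
# prompt = (
#     "You are an IT operations analyst. Using the utilization statistics and "
#     "labels below, write a short overview of the environment and a detailed "
#     f"paragraph for each host:\n{stats.to_markdown(index=False)}"
# )
```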

Then a proactive step is taken: the time series data is passed to a machine learning model trained on this type of data set. The model uses the historical data to predict how utilization will trend in the future, identifying the systems that will soon reach an over- or under-utilized state. As a result, operations teams can plan to scale up or scale down before a problem actually materializes.
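
The post does not name the forecasting model, so purely as an illustration of the idea, even a simple linear trend fit over recent daily averages can estimate how soon a host will cross a threshold.

```python
import numpy as np

def days_until_threshold(daily_means: np.ndarray, threshold: float = 0.85):
    """Fit a straight-line trend to daily mean utilization and estimate how many
    days remain until it crosses `threshold`. Returns None if the trend is flat
    or falling. (Illustrative only; the solution uses a trained forecasting model.)"""
    days = np.arange(len(daily_means))
    slope, intercept = np.polyfit(days, daily_means, 1)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(crossing_day - days[-1], 0.0)

# Example: a host climbing about one percentage point per day from 60% utilization
# days_until_threshold(np.linspace(0.60, 0.74, 15))  # roughly 11 days to the 85% line
```

In the actual solution this role is played by a model trained specifically on infrastructure time series, but the proactive pattern is the same: project the trend, then act before the ceiling is hit.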

Conclusion 

Taken together, we think this optimizer enables IT teams to more intelligently manage the investments they have made in their compute infrastructure. By acting on the analysis it provides, teams can avoid both application throttling and infrastructure overspending. 

We're excited to demonstrate the optimizer running live in our AI Proving Ground. It was developed using the NVIDIA NeMo Agent Toolkit and relies on models deployed as NVIDIA NIM™ microservices. We deployed the solution on an HPE Medium Private Cloud AI cluster, where it is hosted with easy access via wwt.com.

Follow HPE and NVIDIA on wwt.com now to stay informed on all of our progress!  
