Intelligent Resource Optimizer
Solution overview
In this lab we will explore the Intelligent Resource Optimizer and the full stack it runs on, starting with the underlying HPE environment, then moving up the NVIDIA stack, and finally arriving at the full stack software solution. We will walk through each layer of the project to show the benefits that every component brings to power the complete solution.
Intelligent Resource Optimizer
The Intelligent Resource Optimizer leverages AI-powered analysis to help IT operations teams quickly identify which computing infrastructure in their environment is over- or under-utilized. By analyzing historical point-in-time utilization statistics for CPU, disk, and RAM, it identifies machines that should be either sized up or scaled down based on current patterns. It then goes a step further and provides a predictive analysis, identifying not only which systems currently have misallocated resources but also those likely to be in that position in the near future. This enables teams to proactively right-size their compute resources, leading to better cost management where resources are over-allocated and better application performance where they are under-allocated.
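The core classification described above can be sketched in a few lines. This is a minimal illustration, not the product's actual logic: the threshold values and function names are assumptions chosen for the example.

```python
from statistics import mean

# Illustrative thresholds; a real deployment would tune these per workload.
UNDER_UTILIZED = 20.0   # average % below this -> candidate to size down
OVER_UTILIZED = 80.0    # average % above this -> candidate to size up

def classify(samples: list[float]) -> str:
    """Classify a machine from historical point-in-time utilization samples (%)."""
    avg = mean(samples)
    if avg < UNDER_UTILIZED:
        return "size down"
    if avg > OVER_UTILIZED:
        return "size up"
    return "right-sized"

# Example: point-in-time CPU utilization samples for two VMs
print(classify([5.2, 8.1, 3.9, 12.0, 6.5]))   # -> size down
print(classify([91.0, 88.5, 95.2, 97.0]))     # -> size up
```

The predictive layer mentioned above would extend this by fitting a trend to the samples rather than averaging them, flagging machines whose projected utilization crosses a threshold.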
HPE Private Cloud AI Platform
We start with an overview of the HPE Private Cloud AI Platform and the benefits it provides as a secure, well-known, and fully modularized platform. The key components of this platform that we will guide you through are the foundation that many AI-powered systems gloss over: how is the data secured, ingested, and then leveraged to bring an "AI solution" into production? The HPE Private Cloud AI platform helps address the questions that any IT organization must answer:
- Where is our data stored?
- How does this tie into our existing data stores?
- How can I maintain the integrity of the data?
HPE Private Cloud AI helps answer these questions by providing a robust, all-in-one solution for storage, orchestration, and compute in one integrated system. The Intelligent Resource Optimizer leverages this system by integrating with the built-in HPE GreenLake storage, the Ezmeral orchestration layer, and NVIDIA L40S GPUs.
NVIDIA Stack
The HPE Private Cloud AI system comes in several different configurations to fit a wide variety of use cases. For this use case, we chose the smaller configuration powered by two L40S GPUs. The L40S has 48 GB of memory per GPU, which means model sizes are limited by the maximum memory available per GPU; in this case, the largest LLM that can be used with 48 GB of memory is roughly a 32B-parameter model. With that constraint in mind, we focused on selecting the right NVIDIA inference NIM for the job, supported by an additional ML model to round out the system. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers, and workstations. This solution relies on:
- OpenAI's GPT OSS 20B NIM
- Cisco-time-series-model-1.0
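The memory constraint above can be checked with a common rule of thumb: model memory is roughly parameter count times bytes per parameter, plus runtime overhead for the KV cache and activations. The overhead factor below is an illustrative assumption, not a measured value.

```python
def model_memory_gb(params_billion: float, bytes_per_param: float,
                    overhead: float = 1.2) -> float:
    """Rough serving-memory estimate: params * precision * runtime overhead.

    The 1.2 overhead factor (KV cache, activations, buffers) is illustrative.
    """
    return params_billion * bytes_per_param * overhead

# A 32B-parameter model quantized to 8-bit (1 byte/param):
print(round(model_memory_gb(32, 1), 1))   # 38.4 -> fits within a 48 GB L40S
# The same model at FP16 (2 bytes/param):
print(round(model_memory_gb(32, 2), 1))   # 76.8 -> exceeds a single 48 GB GPU
```

This back-of-the-envelope math is why a ~32B-parameter model is a practical ceiling for a single 48 GB GPU, and it depends heavily on quantization and context length.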
To leverage these NVIDIA NIMs, we have chosen the NVIDIA NeMo Agent Toolkit (NAT) as the foundational orchestration framework of the Intelligent Resource Optimizer. NeMo Agent Toolkit is an open-source AI framework for building, profiling, and optimizing agents and tools from any framework, enabling unified, cross-framework integration across connected agent systems. NAT has allowed us to quickly develop the AI pipeline that ingests large amounts of Service Management data and makes targeted recommendations from it.
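A deployed NIM exposes an OpenAI-compatible HTTP API, so orchestration frameworks can talk to it with standard chat-completion requests. The sketch below builds such a request; the host, port, and model identifier are assumptions for illustration, not this solution's actual configuration.

```python
import json

# Illustrative endpoint for a locally deployed NIM (host/port are assumptions).
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_request(question: str) -> dict:
    """Build an OpenAI-style chat payload for an inference NIM.

    The model identifier below is a placeholder for this sketch.
    """
    return {
        "model": "openai/gpt-oss-20b",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,
    }

payload = build_request("Which VMs look over-provisioned this week?")
print(json.dumps(payload, indent=2))
# In a live environment: requests.post(NIM_URL, json=payload).json()
```

Because the interface is OpenAI-compatible, the same payload works whether the model is served by a NIM on-premises or by a hosted endpoint, which is what lets NAT orchestrate these models without framework-specific glue code.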
Full Stack AI Solution
The last layer of this lab focuses on the full stack solution that powers the Intelligent Resource Optimizer. Built with a mixture of Python and JavaScript, the Intelligent Resource Optimizer serves as an example of modern full stack architecture built for deployment at scale. At WWT, we understand the need to build solutions that fit seamlessly into our clients' organizational landscape. Some portions of an AI application will always be new, but the other technical decisions don't have to be. Using modern software development practices in tandem with new technologies allows WWT to quickly build solutions that are ready for the enterprise.