Daily Ops Summary Agent
Solution overview
In this lab we will explore the Daily Ops Summary Agent and the full stack that it is optimized for, starting with the underlying HPE environment, then moving up the NVIDIA stack, and finally landing with the full stack software solution. We will explain the various layers of the project to clearly investigate the benefits that each component brings to power the full solution.
Daily Ops Summary Agent
The Daily Ops Summary Agent leverages AI-powered summarization to help IT operations teams uncover insights and trends hidden in service management data. By parsing both structured and unstructured information from incident and change management systems, it generates natural language reports that highlight performance, patterns, service impacts, and actionable recommendations. This enables teams to quickly identify issues, improve operations, and enhance visibility across their IT environment.
HPE Private Cloud AI Platform
We start with an overview of the HPE Private Cloud AI Platform and the benefits that the platform provides through a secure, well known, and completely modularized platform. The key components of this platform that we will guide you through is the foundation of what many AI powered systems gloss over. How is the data secured, ingested into, and then leveraged to bring an "AI Solution" into production? The HPE Private Cloud AI platform helps answer the many questions that any IT organization has to answer.
- Where is our data stored?
- How does this tie into our existing data stores?
- How can I maintain the integrity of the data?
HPE Private Cloud AI helps to answer these questions by providing a robust, alll in one solution to storage, orchestration, and compute in one integrated system. The Daily Ops Summary Agent leverages this system by integrating with the built in HPE Greenlake storage, Ezmeral orchestration layer, and NVIDIA L40s GPUs.
NVIDIA Stack
The HPE Private Cloud AI system comes with several different configurations to fit a wide variety of use cases. For this use case, we chose to use the smaller configuration powered by two L40s GPUs. The L40s has 48 GBs per GPU which means model sizes are limited to the maximum amount of memory available per GPU. In this case, the largest LLM model that can be used with 48GB of memory is a 32B parameter model. Leveraging that, we have chosen to focus our efforts on picking the right inference NVIDIA NIM for this job with a supporting cast of other NIMs to fill the system. NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers and workstations. This solution relies on:
- OpenAI's GPT OSS 20B NIM
- NVEmdebQA E5 - Large NIM
- Llama-3.2-nv-rerankqa-1b-v2 NIM
To leverage these NVIDIA NIMs, we have chosen to use the NVIDIA NeMo Agent Toolkit (NAT) as the foundational orchestration framework of the Daily Ops Summary Agent. NeMo Agent Toolkit is an open-source AI framework for building, profiling, and optimizing agents and tools from any framework, enabling unified, cross-framework integration across connected Agent systems. NAT has allowed us to quickly develop the AI pipeline to ingest large amounts of Service Management data and make a detailed report from that data.
Full Stack AI Solution
The last layer of this lab will focus on the full stack solution that powers the Daily Ops Summary Agent. Built using a mixture of Python and JavaScript, the Daily Ops Summary Agent serves as an example of modern full stack architecture that is built for deployment at scale. At WWT, we understand the need to build solutions that seamlessly fit into our clients' organizational landscape. Some portions of an AI application will always be new, but the other technical decisions don't have to be. Using modern software development practices in tandem with new technologies allows WWT to quickly build solutions that are ready for the enterprise. To learn more about this use case, view our blog, Part 2: Transforming IT Operations - A Daily Ops Summary Agent.