In the world of artificial intelligence (AI), current hot topics include agents, agentic AI and agentic workflows. Many blogs have been written about them, and you'll find several on our own platform, wwt.com.

In this blog, we will discuss the effects of dynamic agentic workflows on high-performance architecture (HPA), especially high-performance storage (HPS) systems.

What is agentic AI?

According to Atom, our WWT chatbot:

Agentic AI refers to AI systems that can autonomously plan, make decisions, and take actions to achieve specific goals without constant human input. Unlike traditional AI, which typically responds to direct prompts, agentic AI models can break down complex tasks, adapt to changing conditions, and iteratively refine their approach. These systems are emerging in areas like autonomous research assistants, self-improving software agents, and AI-driven automation platforms, enabling AI to function with greater independence […].

Before agentic workflows, the inference system for a typical generative AI (GenAI) application in the enterprise wasn't especially demanding. A classic GenAI application might be a basic LLM-based chatbot or a virtual assistant. The architecture of such an application may include additional models, such as an embedding model and a re-ranking model if retrieval-augmented generation (RAG) is used, along with a vector database. Of course, these components may need to scale with the number of users and use cases. An important distinction in these more basic, non-agentic systems is that the workflow typically involves a single pass: a user enters a prompt, and the system processes it in a "one-shot" fashion to produce the output.

Example of a non-agentic workflow.
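To make that single-pass flow concrete, here is a minimal sketch. The embed, retrieve, rerank and generate helpers are illustrative stand-ins (not any particular product's API) for an embedding model, a vector database lookup, a re-ranking model and an LLM call:

```python
# Minimal sketch of a single-pass ("one-shot") RAG flow. embed(), rerank() and
# generate() are illustrative stand-ins for an embedding model, a re-ranking
# model and an LLM call; the "vector DB" here is just an in-memory list.

def embed(text: str) -> list[float]:
    # Stand-in: a real system would call an embedding model here.
    return [float(ord(c)) for c in text[:8]]

def retrieve(query_vec: list[float], vector_db: list[tuple[list[float], str]], k: int = 2) -> list[str]:
    # Stand-in nearest-neighbour lookup against the in-memory "vector DB".
    def dist(vec: list[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(query_vec, vec))
    return [doc for _, doc in sorted((dist(vec), doc) for vec, doc in vector_db)[:k]]

def rerank(query: str, docs: list[str]) -> list[str]:
    # Stand-in: a real re-ranking model would score each query/document pair.
    return sorted(docs, key=len)

def generate(prompt: str) -> str:
    # Stand-in: a real system would call the LLM here.
    return f"[answer grounded in: {prompt[:80]}...]"

def answer(query: str, vector_db: list[tuple[list[float], str]]) -> str:
    # One pass through the pipeline: embed -> retrieve -> rerank -> generate.
    docs = rerank(query, retrieve(embed(query), vector_db))
    return generate(f"Context: {docs}  Question: {query}")

db = [(embed(t), t) for t in ("storage sizing notes", "GPU cluster design", "RAG basics")]
print(answer("How do I size storage for RAG?", db))
```

Each user request touches the vector database and the models once, so the storage load scales roughly linearly with the number of prompts.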

By contrast, an agentic workflow involves multiple iterations of processing and integration with other data sources, often guided by an initial planning stage. In many cases, there is no human in the loop; however, in mission-critical scenarios, such as agentic AI coding assistants, it's best practice to include one. The result of the planning stage is a workflow that automatically carries out multiple stepwise tasks. As you might imagine, accomplishing this requires a lot of processing.

Example of an agentic workflow.
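A comparable sketch of an agentic workflow shows the difference: the system plans first, then iterates over the steps, calling tools and accumulating intermediate state along the way. The plan and call_tool helpers below are hypothetical stand-ins, not a real agent framework:

```python
# Minimal sketch of an agentic workflow: plan first, then execute and iterate.
# plan() and call_tool() are hypothetical stand-ins for an LLM-based planner
# and for external tools/APIs/databases, not a real agent framework.

def plan(goal: str) -> list[dict]:
    # Stand-in planner: a real agent would ask an LLM to decompose the goal.
    return [{"tool": "web_search", "input": goal},
            {"tool": "db_query", "input": goal},
            {"tool": "summarize", "input": goal}]

def call_tool(step: dict) -> str:
    # Stand-in tool dispatch: each call typically triggers more reads and writes downstream.
    return f"result of {step['tool']}({step['input']!r})"

def run_agent(goal: str) -> str:
    context: list[str] = []                 # intermediate state that must be preserved
    for step in plan(goal):                 # multiple passes instead of a single shot
        context.append(call_tool(step))     # a real agent may also re-plan here
    return f"final answer synthesized from {len(context)} intermediate results"

print(run_agent("summarize last quarter's support tickets"))
```

Every step in the loop is another round of tool calls, retrievals and state updates, which is where the processing and I/O costs come from.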

HPA environments must be architected to support agentic AI and dynamic agentic workflows at enterprise scale. One component that is often overlooked is the HPS layer.

What effect will agentic AI and agentic workflows have on storage?

Agentic workflows are expected to significantly impact both storage capacity and workload. Several key factors contribute to this shift. Let's explore each in turn.

Multi-step reasoning and retrieval

The first factor is multi-step reasoning and retrieval, guided by the initial planning stage. Each planned step often involves agents calling external tools, APIs and databases, sometimes multiple times per interaction. Compared to traditional batch-based or passive data processing, this results in a substantial increase in concurrent queries and I/O operations, especially when an agentic system supports a large user community. A single reasoning step may trigger numerous reads across vector databases, caches and relational systems, and orchestrating the tasks an agentic system requests often means spawning multiple concurrent subtasks, which multiplies the I/O demand.
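To illustrate the fan-out, here is a rough sketch of a single reasoning step issuing concurrent reads against several backends at once. The query functions are placeholders, but the shape of the I/O is the point: N subtasks times M data sources lands on the storage layer as N x M near-simultaneous requests:

```python
import asyncio

# Illustrative sketch of how one reasoning step fans out into many concurrent
# reads. query_vector_db(), query_cache() and query_sql() are hypothetical
# stand-ins for calls to a vector DB, a cache tier and a relational system.

async def query_vector_db(q: str) -> str:
    await asyncio.sleep(0.01)   # placeholder for a real network/storage round trip
    return f"vectors for {q}"

async def query_cache(q: str) -> str:
    await asyncio.sleep(0.01)
    return f"cache hit for {q}"

async def query_sql(q: str) -> str:
    await asyncio.sleep(0.01)
    return f"rows for {q}"

async def reasoning_step(subtasks: list[str]) -> list[str]:
    # Each subtask issues several reads; N subtasks x M backends = N*M I/Os,
    # all arriving at the storage layer at roughly the same time.
    calls = [fn(q) for q in subtasks for fn in (query_vector_db, query_cache, query_sql)]
    return await asyncio.gather(*calls)

print(asyncio.run(reasoning_step(["plan step 1", "plan step 2", "plan step 3"])))
```

Multiply that by thousands of concurrent users and the storage system sees a very different request profile than a one-shot chatbot would generate.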

Context and state preservation

Agentic systems must store not only inputs and outputs, but also intermediate states, execution plans, tool usage history, decision paths and memory traces. These are read and written frequently, particularly in parallel or long-running workflows, because the system must preserve its progress mid-task. The result is a steady stream of small, latency-sensitive writes that only a highly performant storage architecture can handle at scale. Mechanisms like KV cache management and cache-augmented generation (CAG), which are critical to achieving adequate performance from the compute side of the architecture, further amplify the storage system's load.
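As a sketch of what that write path can look like, the snippet below persists a small JSON state file after every agent step so a long-running workflow can resume. The on-disk layout is an assumption for illustration, not a description of any particular framework:

```python
import json
import os
import time
import uuid

# Minimal sketch of persisting intermediate agent state after every step.
# The directory layout and file format here are illustrative assumptions.

STATE_DIR = "agent_state"

def save_step_state(run_id: str, step: int, state: dict) -> str:
    os.makedirs(STATE_DIR, exist_ok=True)
    path = os.path.join(STATE_DIR, f"{run_id}-step{step:04d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"ts": time.time(), "step": step, **state}, f)
    os.replace(tmp, path)    # atomic rename so readers never see partial writes
    return path

run_id = uuid.uuid4().hex
for step in range(3):        # every step is another small, latency-sensitive write
    save_step_state(run_id, step, {"plan": ["search", "summarize"],
                                   "tool_calls": step,
                                   "memory": f"notes from step {step}"})
```

Per-step writes like these are small individually, but across many agents and many steps they add up to a sustained, latency-sensitive write workload.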

Moreover, agentic systems are frequently used to solve relatively complex tasks that require large amounts of input data. In addition, the models used tend to be reasoning models that shift compute from training time to inference time by generating what is often a large number of chain-of-thought and reasoning tokens. All of this means the models must support a relatively large context window.

As the context window grows, the KV cache may need to be evicted more often from GPU memory to CPU memory, then to disk, and later retrieved. Without a performant storage system, this operation can introduce unacceptable latency for users. Some teams have even resorted to recomputing the KV cache rather than retrieving it from storage in order to meet latency requirements.
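A back-of-envelope calculation shows why. Assuming a hypothetical model configuration (the figures below are illustrative and not tied to any specific model), the KV cache alone can run into tens of gigabytes per long-context sequence:

```python
# Back-of-envelope sketch of why long contexts push the KV cache out of GPU memory.
# The model configuration below is a hypothetical example, not a real model.

layers = 64            # transformer layers
kv_heads = 8           # KV heads (assuming grouped-query attention)
head_dim = 128         # dimension per head
bytes_per_value = 2    # fp16/bf16

# Both K and V are cached, for every layer, for every token in the context.
bytes_per_token = 2 * layers * kv_heads * head_dim * bytes_per_value

for context_tokens in (8_000, 128_000):
    gib = bytes_per_token * context_tokens / 2**30
    print(f"{context_tokens:>7} tokens -> ~{gib:.1f} GiB of KV cache per sequence")

# Multiply by the number of concurrent sequences and the cache quickly spills
# from GPU memory to CPU memory and then to storage, which is where HPS latency matters.
```

Multiplied across concurrent users, that spill-and-retrieve traffic is exactly the kind of load a high-performance storage tier has to absorb.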

Vector and graph indices

Vector and graph indices are common in RAG and memory-graph systems. These structures are highly storage-intensive and almost always exceed the size of the raw input data. For example, we found that converting text content to vectors can increase capacity requirements by 10x. With agentic systems, there tend to be many more, and larger, vector databases.
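A rough, illustrative calculation shows how quickly this adds up (the chunking and embedding parameters below are assumptions):

```python
# Rough sketch of why vector indices can dwarf the raw text they index.
# Chunk size, overlap and embedding dimension below are illustrative assumptions.

raw_text_bytes = 1 * 2**30          # 1 GiB of source text
chunk_chars = 1_000                 # ~1,000 characters per chunk
overlap = 0.2                       # 20% chunk overlap
dims = 1536                         # embedding dimension
bytes_per_dim = 4                   # float32

chunks = raw_text_bytes / (chunk_chars * (1 - overlap))
vector_bytes = chunks * dims * bytes_per_dim

print(f"chunks: {chunks:,.0f}")
print(f"vectors alone: {vector_bytes / 2**30:.1f} GiB "
      f"(~{vector_bytes / raw_text_bytes:.1f}x the raw text, before index overhead)")
```

Index structures (such as HNSW graphs) and chunk metadata add further overhead on top of the raw vectors, which is how the growth can approach the 10x figure mentioned above.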

Fine-tuning artifacts and model checkpoints

While model tuning techniques like LoRA (Low-Rank Adaptation) and PEFT (Parameter-Efficient Fine-Tuning) are designed to be storage-efficient compared to full fine-tuning, they can consume additional storage capacity because they don't replace the original model. Instead, these methods create smaller, supplementary sets of parameters, or adapters, which are stored separately. Each specific task or fine-tuning iteration results in a new adapter. So while each individual adapter is significantly smaller than the base model (often megabytes instead of gigabytes), the need to store the original large model plus potentially numerous adapters for different tasks leads to an overall increase in storage.
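A simple, illustrative calculation (the sizes below are assumptions, not measurements) shows the pattern:

```python
# Illustrative arithmetic for a base model plus many LoRA adapters.
# All sizes below are assumptions for the sake of the example.

base_model_gb = 14.0        # e.g., a 7B-parameter model stored in fp16
adapter_mb = 60.0           # one LoRA adapter (varies with rank and target modules)
adapters = 40               # adapters accumulated across tasks and tuning iterations

adapters_gb = adapters * adapter_mb / 1024
print(f"base model: {base_model_gb:.1f} GB, "
      f"{adapters} adapters: {adapters_gb:.1f} GB, "
      f"total: {base_model_gb + adapters_gb:.1f} GB")
```

Each adapter is tiny on its own, but the base model never goes away and the adapter count tends to grow with every new task, agent persona and tuning iteration.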

Traditional model checkpoints are written to disk at regular intervals to protect against data and work loss in the event of an error during the training phase. For agentic AI, checkpointing protects the operational state of the agent instead. This provides additional benefits, including a persistent state for the agent, the management of long-running tasks, and auditing. These agentic checkpoints write the system's memory to disk, which consumes additional capacity.
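As a rough, illustrative sizing exercise (all figures below are assumptions), the retained checkpoint footprint grows quickly with the number of agents and the retention window:

```python
# Rough capacity sketch for agentic checkpoints. All figures are assumptions.

checkpoint_mb = 50          # serialized agent memory/state per checkpoint
interval_min = 5            # checkpoint every 5 minutes
agents = 200                # concurrently running agents
retention_hours = 24        # how long checkpoints are kept for audit/replay

checkpoints = agents * retention_hours * 60 / interval_min
retained_tb = checkpoints * checkpoint_mb / 1024 / 1024
print(f"~{checkpoints:,.0f} checkpoints retained, ~{retained_tb:.1f} TB of capacity")
```

The exact numbers will differ for every environment, but the multiplication of agents, checkpoint frequency and retention is what drives the capacity growth.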

Conclusion

It's important to note that the impact on storage will vary based on the complexity of the workflows, the models and tools involved, and the specific tasks agents are performing. Because agentic systems are inherently non-deterministic, their operation is unpredictable. This makes storage workload forecasting challenging and highlights the need for flexible, scalable infrastructure that can handle fluctuating I/O and capacity demands.

What's next?

If you are investigating further, or you're already on the journey to implement AI agents and/or agentic workflows and need guidance, please contact your local WWT representative.