
With Cohesity, you can not only view your backup data holistically through one lens, no matter where it lives (in the cloud, on-premises, or in SaaS), but also move through it in time to see what has changed.

Soon, Cohesity will provide you with the capability to use Generative AI along with natural language processing (NLP) programs from a cloud provider of your choice to gain deeper insights from your secondary data. This capability will be part of Cohesity Turing, a unique, dynamic, quickly evolving set of AI technologies to bring the power of AI to data security and management. 

Through the mechanisms of data protection, you have created a de facto time series data lake with your secondary data. Now you will be able to pair it with NLP, artificial intelligence / machine learning (AI/ML), and GenAI to have conversations with that data and use historical context to develop a deeper understanding of what's going on. This will let you put your own data back to work: driving operational efficiency by reducing time to action, and drawing more information from your systems to drive innovation.

Cohesity Context

For the last decade, Cohesity has been revolutionizing the way companies handle and leverage their secondary data. Not surprisingly, the company has been recognized as a leader in Gartner's Magic Quadrant for Enterprise Backup & Recovery Software Solutions for four years in a row.

In response to inefficient legacy infrastructure, Cohesity initially brought backup data onto a modern platform that allowed customers to back up and restore their data very efficiently. The company has since expanded its focus from data protection alone to include data security, then data mobility, and now data insights. Specifically, Cohesity will help you fully mine your data.

How Does it Work?

One of Cohesity's founding principles was hyper-convergence: keeping data and compute close together. This serves the company well in the AI world, which is very compute- and GPU-intensive. Cohesity gives customers the flexibility to integrate with any cloud provider. It works on the backup data customers already hold in Cohesity: you select the objects to protect, and when you want deeper insights you create AI-ready indexes over your data in Cohesity's platform.

Cohesity will leverage hyperscalers to go find those files through the index, wherever they live in Cohesity around the world, and then: 

  • Open them up, calculate the semantic data, and store it as embeddings
  • Calculate those embeddings using an AI model separate from the large language model
  • Send the data that matches your question to your preferred large language model, which answers through a conversational interface

You will receive your answer along with resource links back to the actual files, so you can double-click into them if you want to dig deeper. You also gain the ability to do semantic enterprise search and discovery across your own documents.
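The index, embedding, and matching steps described above can be illustrated with a minimal sketch. This is not Cohesity's implementation: the embed function is a toy word-count stand-in for a real embedding model, and the file paths, file contents, and function names are all hypothetical, chosen only to show how stored embeddings let a question be matched to files and returned with links back to them.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(files):
    """Compute and store one embedding per file path."""
    return {path: embed(text) for path, text in files.items()}

def search(index, question, top_k=2):
    """Return up to top_k (path, score) pairs with a non-zero match."""
    query = embed(question)
    scored = [(cosine(query, vec), path) for path, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(path, score) for score, path in scored[:top_k] if score > 0]

# Hypothetical file contents, invented for this example.
files = {
    "/backups/hr/policy.txt": "vacation policy for employees",
    "/backups/it/runbook.txt": "restore procedure for failed backup jobs",
    "/backups/fin/q3.txt": "quarterly revenue report",
}
index = build_index(files)
hits = search(index, "how do I restore a failed backup")
for path, score in hits:
    print(f"{path} (similarity {score:.2f})")
```

A production system would replace the word counts with dense vectors from a dedicated embedding model, which is what lets a search match on meaning rather than on exact words.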


The Generative AI model you would use is similar to ChatGPT but much more secure and protected, both because it only allows you to interact with data you have access to and because it operates under a shared responsibility model with cloud providers.

Also, while ChatGPT can record and store your questions and answers in order to retrain its AI models (which is scary from a security perspective), Cohesity doesn't store any of your data to train or fine-tune models. Neither do the cloud providers; large language models are never fine-tuned on any of your data. They come pre-trained. You never want your data trained into a model, where it could get out inappropriately.

Cohesity will fetch context just-in-time (an approach known as Retrieval Augmented Generation), provide it as part of the question, and use this data in the AI model only to answer that question.
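As a rough sketch of this just-in-time pattern, the example below filters retrieved passages down to what the asking user may read, places them in the prompt next to the question, and hands that prompt to a model. Everything here is an assumption for illustration: the Passage type, the function names, the paths, and fake_llm (a placeholder for whichever large language model you choose) are hypothetical and do not reflect Cohesity's actual interfaces.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    path: str   # resource link back to the source file
    text: str   # retrieved excerpt

def visible_to(passages, readable_paths):
    """Drop any passage the asking user is not allowed to read."""
    return [p for p in passages if p.path in readable_paths]

def build_prompt(question, passages):
    """Place retrieved context next to the question; nothing is trained."""
    context = "\n".join(
        f"[{i}] {p.path}: {p.text}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

def answer(question, passages, readable_paths, llm):
    """Filter by access, build the prompt, and call the supplied model."""
    visible = visible_to(passages, readable_paths)
    reply = llm(build_prompt(question, visible))
    return reply, [p.path for p in visible]

def fake_llm(prompt):
    """Placeholder for a real model call; it only reports prompt size."""
    return f"(model reply based on {len(prompt)} prompt characters)"

# Hypothetical retrieved passages, invented for this example.
retrieved = [
    Passage("/backups/it/runbook.txt", "Restore failed jobs from the last snapshot."),
    Passage("/backups/hr/salaries.txt", "Confidential salary table."),
]
reply, sources = answer(
    "How do I recover a failed backup job?",
    retrieved,
    readable_paths={"/backups/it/runbook.txt"},
    llm=fake_llm,
)
print(reply)
print("Sources:", sources)
```

Because the context travels only inside the prompt, the model's weights are untouched, and the access filter runs before the model ever sees a passage.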

Cohesity also ensures a secure outcome by helping you through the process of preparing data for AI: cleansing data sets, ensuring good data practices, and providing the large computational spend this requires. Cohesity makes your data AI-ready in a way that, for many companies, is quicker, much more direct, and safer than building it yourself.

The Development Process

For over a year, Cohesity has been developing the capability for customers to use generative AI to better leverage their backup data.

First, Cohesity developed thought leadership around how this new capability could empower its customers. The company then developed partnerships with all three major cloud providers to ensure that you will have the flexibility to use any cloud hyperscaler you want, and any natural language models you prefer. Cohesity stores and secures the data but doesn't build foundational language models, so it turned to the cloud providers, who each have their own. This ensures end-to-end security governance and a shared responsibility model.

All Cohesity services are natively integrated in the cloud. The company is currently building the ability to find the data very quickly, encode it the way the AI models need it, and query that encoding through a simple interface. The company is now enrolling a number of existing Cohesity clients into early access programs.

Cohesity wanted to start conversations around how companies can use these models as early as possible, to better learn what the use cases are, where the value is, and how this can be used to make your life better, so that it can quickly share that knowledge.

Cohesity is intentionally entering the market during the second wave of the hype cycle, when companies are familiar with what AI models can and can't do well. The company is currently focusing on use cases and data.

The solution will be available in the first half of 2024.

Emerging Use Cases

Use cases are already emerging from early discussions. These include:

  1. eDiscovery. Advanced eDiscovery is low-hanging fruit. This use case starts with a question about what happened, sifts through the data payload to understand what is relevant to that particular incident, and then finds data in support of it. Both identifying the data payload and finding the relevant pieces can be done by querying the data in natural language.
  2. Enterprise search. This includes finding keywords and understanding the meaning behind them. Words can have different meanings based on where they live in a sentence or a paragraph.
  3. Customer issue resolution. This use case includes getting assistance through support organizations, product help, and autonomous agents like bots helping to resolve customer issues faster.
  4. Cybersecurity tools. Paired with existing security tools, GenAI models can help bridge today's massive cybersecurity skills gap. You can use your suite of security tools and GenAI to go through logs and data feeds, not only to understand what's happening in real time, using GenAI as a translation language, but also to be told clearly what is happening and how to respond to it step by step.
  5. Law firms. Surprisingly, law firms have been early adopters. Although they are not usually the first to try new things, many law offices have asked to be part of the early access program, as they see great value in being able to compare case notes and history and put them through GenAI to reduce the amount of time it takes to do research.
  6. Support for IT Administration. GenAI can be extremely helpful when applied to backup data, including telemetry data about all the systems and all the backup jobs, so that IT administrators and backup administrators can ask questions of it. Many customers have expressed their desire for Cohesity to work in this area and roll it out soon.

What's Next?

As a strategic partner, WWT has been working closely with Cohesity to help organizations with AI-related business solutions and technology requirements. We make it our business to understand your needs and how this offering can benefit you and your organization's overall AI and data strategy.

Although this new GenAI capability to gain deeper insights from your secondary data won't be available until the first half of 2024, you can still be a part of these early conversations to evolve your data protection and AI practices.

Learn more about Data Protection and Cohesity, or contact a WWT expert.

About the Authors

Dustin Zitzmann, WWT

Greg Stratton, Cohesity