WWT Research • Applied Research Report • May 28, 2024 • 12 minute read

Part 5: Inside Atom Ai – Evaluating the Impact of RAG at Scale on AI Efficacy

In the fifth article in this series, we describe the custom evaluation framework designed to assess the performance of Atom Ai (formerly WWT GPT), an LLM-based chatbot that generates responses using a RAG pipeline.

One of the biggest challenges in building LLM-based applications is evaluating their performance. This is due primarily to the subjective nature of assessing LLM-generated responses and to their inherent randomness.

When Atom Ai (formerly WWT GPT) was initially released within WWT, evaluating its generated responses relied on manual review and feedback from a limited group of subject matter expert (SME) testers. To gauge the chat assistant's performance, each query and response pair was reviewed to identify potential areas of improvement in the Atom retrieval-augmented generation (RAG) pipeline.

After release to a wider audience, the user base surpassed 3,500 individuals within the organization, with a significant number engaging with the GPT assistant daily. Manual assessment became impractical at that scale, prompting us to seek a more efficient and scalable solution.

While the chatbot has a mechanism for feedback collection, only a fraction of users (7.5 percent, to be precise) provided feedback on the responses generated by our GPT model. Moreover, user feedback is currently limited to a binary format (i.e., thumbs up or thumbs down), which lacks the nuance needed to understand the intricacies of model performance. Users can also provide comments to clarify their feedback, but the quantity and quality of feedback collected this way are not sufficient to gauge the performance of the application at scale.

Figure 1: Atom Ai user trend: steady daily user count across weeks, with performance improving significantly as thumbs-up ratings increased over time.
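The two feedback signals discussed above, the share of users who leave any feedback and the proportion of thumbs-up ratings, can be derived from interaction logs. The following is a minimal sketch under stated assumptions: the `Interaction` record layout and the `feedback_summary` helper are hypothetical names chosen for illustration and do not reflect how Atom Ai actually stores or processes feedback.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Interaction:
    """Hypothetical log record for one query/response pair."""
    user_id: str
    # True = thumbs up, False = thumbs down, None = no feedback given.
    feedback: Optional[bool] = None


def feedback_summary(interactions: List[Interaction]) -> Dict[str, float]:
    """Summarize binary feedback across a list of logged interactions."""
    all_users = {i.user_id for i in interactions}
    rated = [i for i in interactions if i.feedback is not None]
    users_with_feedback = {i.user_id for i in rated}
    thumbs_up = sum(1 for i in rated if i.feedback)

    return {
        # Share of distinct users who rated at least one response.
        "user_feedback_rate": (
            len(users_with_feedback) / len(all_users) if all_users else 0.0
        ),
        # Share of rated responses that received a thumbs up.
        "thumbs_up_rate": thumbs_up / len(rated) if rated else 0.0,
        # Number of responses that received any rating.
        "rated_responses": float(len(rated)),
    }


# Illustrative data only; these are not real Atom Ai logs.
logs = [
    Interaction("a", True),
    Interaction("a"),
    Interaction("b"),
    Interaction("c", False),
    Interaction("d"),
]
summary = feedback_summary(logs)
```

A summary like this makes the limitation concrete: when the user feedback rate is low, the thumbs-up rate describes only a small, self-selected slice of responses.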

To evaluate generated LLM responses and user behavior trends more robustly, we designed a custom Evaluation Framework. Its primary objective is to tackle the challenge of assessing LLM-based chat assistants, a task hindered by the absence of standardized benchmarks for evaluating their responses.
