WWT Research • Applied Research Report • January 11, 2024 • 16 minute read

An Ensemble Approach to Data Mining for Real-time Information Retrieval

Learn about an approach to Information Retrieval (IR) using a combination of multiple Natural Language Processing (NLP) models.

This report was originally published in September 2020.

Abstract

Information retrieval (IR), the task of locating relevant documents in a collection based on a user's query, is a common problem in text analysis. Traditional keyword-based IR engines are good at finding relevant information but struggle to provide semantic and contextual results for complex queries. We propose an ensemble approach: a combination of multiple Natural Language Processing (NLP) models that transform documents into vectors, then score and rank the documents by their relevance to the user's search query. To provide an effective search mechanism over a large document corpus, we incorporate Elasticsearch, a popular distributed search engine that supports indexing and searching documents in near real time. From the retrieved documents, we generate multi-sentence summaries with an extractive text summarizer so that users can glean the relevant content more easily. All of these components are packaged into an end-to-end solution behind a Python Flask UI, where users can enter a search query and get relevant results in near real time. A major challenge in evaluating NLP-based IR models is the absence of relevance labels or scores for document-query pairs. To benchmark our ensemble approach, we use the titles of the documents as queries and calculate the Mean Reciprocal Rank (MRR) as a validation metric for the individual techniques and for their ensemble.
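
The title-as-query evaluation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the individual model scores are combined, so the averaging of min-max-normalised scores in `ensemble_rank` is purely an assumption made for illustration, and all function names and document IDs here are hypothetical. The MRR part follows the standard definition: for each query, take the reciprocal of the rank at which the source document appears, then average over all queries.

```python
from typing import Dict, List, Sequence


def reciprocal_rank(ranked_doc_ids: Sequence[str], relevant_doc_id: str) -> float:
    """Return 1/rank of the relevant document, or 0.0 if it was not retrieved."""
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id == relevant_doc_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: Dict[str, Sequence[str]]) -> float:
    """Average the reciprocal rank over all queries.

    `rankings` maps the ID of the document whose title served as the query
    to the ranked list of document IDs returned for that query.
    """
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(ranked, doc_id) for doc_id, ranked in rankings.items())
    return total / len(rankings)


def ensemble_rank(model_scores: List[Dict[str, float]]) -> List[str]:
    """Rank documents by the mean of min-max-normalised per-model scores.

    ASSUMPTION: this combination rule is illustrative only; the report's
    abstract does not specify how the ensemble merges model scores.
    """
    combined: Dict[str, float] = {}
    for scores in model_scores:
        low, high = min(scores.values()), max(scores.values())
        span = (high - low) or 1.0  # avoid division by zero when all scores are equal
        for doc_id, score in scores.items():
            normalised = (score - low) / span
            combined[doc_id] = combined.get(doc_id, 0.0) + normalised / len(model_scores)
    # Highest combined score first; ties are broken by document ID for determinism.
    return sorted(combined, key=lambda doc_id: (-combined[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical similarity scores from two models for one query.
    model_a = {"d1": 0.9, "d2": 0.1, "d3": 0.5}
    model_b = {"d1": 0.4, "d2": 0.2, "d3": 0.8}
    print(ensemble_rank([model_a, model_b]))

    # Hypothetical rankings: each key is the document whose title was the query.
    rankings = {
        "d1": ["d1", "d2", "d3"],  # found at rank 1
        "d2": ["d3", "d2", "d1"],  # found at rank 2
        "d3": ["d1", "d2"],        # not retrieved
    }
    print(mean_reciprocal_rank(rankings))
```

Because each title has exactly one known source document, this setup needs no human relevance labels: an MRR close to 1 means the source document is usually ranked first for its own title.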
