WWT Research • Applied Research Report • September 10, 2020 • 16 minute read

An Ensemble Approach to Data Mining for Real-time Information Retrieval

Learn about an approach to Information Retrieval (IR) using a combination of multiple Natural Language Processing (NLP) models.

Abstract

Information retrieval (IR), the task of locating documents in a collection that are relevant to a user's query, is a common problem in text analysis. Traditional keyword-based IR engines are good at finding relevant information but struggle to provide semantically and contextually relevant results for complex queries. We propose an ensemble approach that combines multiple Natural Language Processing (NLP) models: each model transforms documents into vectors, and documents are then scored and ranked by their relevance to the user's search query. To search a large document corpus effectively, we incorporate Elasticsearch, a popular distributed search engine that indexes and searches documents in near real time. For the retrieved documents, we generate multi-sentence summaries with an extractive text summarizer so that users can glean the relevant content more easily. These components are packaged into an end-to-end solution with a Python Flask UI, where users enter a search query and receive relevant results in near real time. A major challenge in evaluating NLP-based IR models is the absence of relevance labels or scores for document-query pairs. To benchmark our ensemble approach, we use the titles of the documents as queries and calculate the Mean Reciprocal Rank (MRR) as a validation metric for the individual techniques and their ensemble.
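The title-as-query evaluation described in the abstract can be illustrated with a minimal Python sketch of MRR. It is not the report's implementation. It assumes that, for each query built from a document's title, that document is the single relevant result, so the reciprocal rank is 1 divided by the position at which the document appears in the returned ranking (0 if it is not returned). The function names and the toy document IDs are hypothetical.

```python
from typing import Hashable, Mapping, Sequence


def reciprocal_rank(ranked_ids: Sequence[Hashable], target_id: Hashable) -> float:
    """Return 1/rank of target_id in ranked_ids (1-based), or 0.0 if absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == target_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(rankings: Mapping[Hashable, Sequence[Hashable]]) -> float:
    """Average the reciprocal rank over all title queries.

    `rankings` maps the ID of the document whose title was used as the query
    to the ranked list of document IDs returned for that query.
    """
    if not rankings:
        raise ValueError("rankings must not be empty")
    total = sum(reciprocal_rank(ranked, target) for target, ranked in rankings.items())
    return total / len(rankings)


# Hypothetical toy results: key = source document of the title query,
# value = ranking returned by some retrieval model.
toy_rankings = {
    "doc_a": ["doc_a", "doc_b", "doc_c"],  # found at rank 1 -> 1.0
    "doc_b": ["doc_c", "doc_b", "doc_a"],  # found at rank 2 -> 0.5
    "doc_c": ["doc_a", "doc_b"],           # not returned    -> 0.0
}

mrr = mean_reciprocal_rank(toy_rankings)
print(f"MRR = {mrr:.3f}")  # MRR = 0.500
```

Running the same function on the rankings from each individual model and from the ensemble gives one comparable score per technique, which is how a metric of this kind can be used without hand-labeled relevance judgments.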

© World Wide Technology. All Rights Reserved