Hands-On Lab Workshop: Vector DBs and Semantic Search

Event Overview

Join us for an immersive, hands-on workshop exploring the mechanics of vector search and semantic retrieval. This session goes beyond simply wiring a vector store into a pipeline: it teaches you why certain queries succeed, why others fail, and which components are responsible. Through four focused Jupyter notebooks, you will explore how embedding models, distance metrics, and chunking strategies each shape retrieval results. Whether you are building retrieval-augmented generation (RAG) pipelines or tuning search applications, this lab equips you with the diagnostic skills and evaluation frameworks to improve retrieval quality on your own data. All experiments run locally using open-source models and FAISS, and no prior vector search experience is required.

Brandon Swagman

World Wide Technology

Sr Mgr, ATC Delivery

I am part of WWT’s Advanced Technology Center (ATC) Solution Development team focused on infrastructure technologies. We create and sustain high-qu...

Sam Burck

World Wide Technology

Sr Data Scientist

As a Senior Data Scientist, I specialize in developing advanced, impact-driven solutions for business and product teams, drawing on a broad foundat...

What to expect

The workshop covers the following topics:
  • Embedding Model Behavior: Explore how embedding models handle difficult language — negation, sarcasm, idioms, and domain jargon — and identify failure modes that degrade retrieval quality.
  • Distance Metric Comparisons: Compare how metrics like dot product and cosine similarity change what a vector store returns for the same query.
  • Embedding Model Evaluation: Analyze how the choice of embedding model and model size affect retrieval behavior across different datasets.
  • Chunking Strategy Analysis: Apply different chunking configurations to documents and measure their impact on retrieval accuracy.
  • Evaluation Framework Building: Construct a repeatable evaluation framework using recall@k and MRR metrics to diagnose retrieval problems and tune your own pipelines.
  • Interactive Q&A: Collaborate with WWT experts and fellow participants during live discussion segments.
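The following is a minimal plain-Python sketch that makes the distance-metric comparison concrete. It is not taken from the workshop notebooks. It ranks two documents for one query, first by dot product and then by cosine similarity. The vectors and document names are hypothetical toy values, not output from a real embedding model. Dot product rewards vector magnitude, while cosine similarity measures only direction, so the two metrics return the documents in opposite order.

```python
import math

def dot(a, b):
    """Dot product: grows with both alignment and vector magnitude."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by both vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical toy 2-D "embeddings", not produced by a real model.
query = [1.0, 0.0]
docs = {
    "short_on_topic": [0.9, 0.1],  # points almost the same way as the query
    "long_off_topic": [3.0, 3.0],  # large magnitude, 45 degrees away
}

def rank(metric):
    """Return document names ordered from most to least similar."""
    return sorted(docs, key=lambda name: metric(query, docs[name]), reverse=True)

print("dot:   ", rank(dot))
print("cosine:", rank(cosine))
```

Running it prints `['long_off_topic', 'short_on_topic']` for dot product and `['short_on_topic', 'long_off_topic']` for cosine similarity. If the embeddings are L2-normalized first, the two metrics produce identical rankings. In FAISS, cosine search is commonly done exactly that way: normalize the vectors (for example with `faiss.normalize_L2`) and search an inner-product index such as `IndexFlatIP`.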

Goals and Objectives

  • Test how embedding models handle difficult language to identify failure modes that degrade retrieval quality.
  • Compare how distance metrics change what a vector store returns for the same query and dataset.
  • Evaluate how the choice of embedding model and model size influences retrieval behavior.
  • Apply different chunking configurations and measure their impact on retrieval performance.
  • Build a repeatable evaluation framework using recall@k and MRR to diagnose and tune pipelines on your own data.
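As a rough illustration of the evaluation metrics named above, the following sketch computes recall@k and mean reciprocal rank (MRR) in plain Python. The function names and the three labeled queries are hypothetical, and they are not the workshop's evaluation framework or data. In this sketch, recall@k is the fraction of a query's relevant documents that appear in the top k results. MRR averages 1/rank of the first relevant result across queries.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant doc IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / (1-based rank of the first relevant result), or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k):
    """Average recall@k and MRR over (retrieved, relevant) pairs, one pair per query."""
    n = len(runs)
    recall = sum(recall_at_k(ret, rel, k) for ret, rel in runs) / n
    mrr = sum(reciprocal_rank(ret, rel) for ret, rel in runs) / n
    return recall, mrr

# Hypothetical labeled queries: ranked IDs from a retriever, plus the relevant IDs.
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # only relevant doc at rank 2
    (["d2", "d5", "d9"], {"d2", "d9"}),  # relevant docs at ranks 1 and 3
    (["d4", "d6", "d8"], {"d8"}),        # only relevant doc at rank 3
]
k = 2
recall, mrr = evaluate(runs, k)
print(f"recall@{k}={recall:.3f}  MRR={mrr:.3f}")
```

With k=2 this prints `recall@2=0.500  MRR=0.611`. The third query's relevant document sits at rank 3, so it adds nothing to recall@2 but still contributes 1/3 to MRR. Some tools report recall@k as a hit rate instead, scoring 1 if any relevant document appears in the top k. Check which definition a library uses before comparing numbers.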

Who should attend?

This workshop is ideal for data scientists, ML engineers, and AI practitioners who build or optimize retrieval-augmented generation (RAG) pipelines. It also suits developers working with vector databases and semantic search applications, and anyone who wants to move beyond basic vector store usage to understand and improve retrieval quality. No prior vector search experience is required. If you work with AI applications that rely on search or retrieval, this lab is for you.