Hands-On Lab Workshop: Vector DBs and Semantic Search
Event Overview
Join us for an immersive, hands-on workshop exploring the mechanics of vector search and semantic retrieval. This session goes beyond simply wiring a vector store into a pipeline — it teaches you to understand why certain queries succeed, why others fail, and which components are responsible. Through four focused Jupyter Notebooks, you will explore how embedding models, distance metrics, and chunking strategies each shape retrieval results. Whether you are building retrieval-augmented generation (RAG) pipelines or tuning search applications, this lab equips you with the diagnostic skills and evaluation frameworks to improve retrieval quality on your own data. All experiments run locally using open-source models and FAISS — no prior vector search experience required.
What to expect
- Embedding Model Behavior: Explore how embedding models handle difficult language — negation, sarcasm, idioms, and domain jargon — and identify failure modes that degrade retrieval quality.
- Distance Metric Comparisons: Compare how metrics like dot product and cosine similarity change what a vector store returns for the same query.
- Embedding Model Evaluation: Analyze how the choice of embedding model and its size affects retrieval behavior across different datasets.
- Chunking Strategy Analysis: Apply different chunking configurations to documents and measure their impact on retrieval accuracy.
- Evaluation Framework Building: Construct a repeatable evaluation framework using recall@k and MRR metrics to diagnose retrieval problems and tune your own pipelines.
- Interactive Q&A: Collaborate with WWT experts and fellow participants during live discussion segments.
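As a taste of the distance metric comparison, here is a minimal sketch of how dot product and cosine similarity can rank the same two documents differently. It is not taken from the workshop notebooks: the two-dimensional vectors, the document names, and the helper functions `dot`, `cosine`, and `rank` are all made up for illustration, and it uses plain Python rather than FAISS.

```python
import math

def dot(u, v):
    """Dot product: rewards both alignment and vector length."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product of the length-normalized vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def rank(query, docs, score):
    """Return document names ordered from best to worst score."""
    return sorted(docs, key=lambda name: score(query, docs[name]), reverse=True)

# Hypothetical toy embeddings, chosen so the two metrics disagree.
query = [1.0, 0.0]
docs = {
    "long_off_axis": [3.0, 3.0],   # large norm, 45 degrees from the query
    "short_aligned": [1.0, 0.1],   # small norm, nearly parallel to the query
}

by_dot = rank(query, docs, dot)
by_cosine = rank(query, docs, cosine)
print("dot product:", by_dot)
print("cosine:     ", by_cosine)
```

In this toy case the dot product favors the longer vector while cosine similarity favors the better-aligned one. If all embeddings are normalized to unit length, the two metrics produce the same ranking.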
Goals and Objectives
- Test how embedding models handle difficult language to identify failure modes that degrade retrieval quality.
- Compare how distance metrics change what a vector store returns for the same query and dataset.
- Evaluate how the choice of embedding model and model size influences retrieval behavior.
- Apply different chunking configurations and measure their impact on retrieval performance.
- Build a repeatable evaluation framework using recall@k and MRR to diagnose and tune pipelines on your own data.
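For readers new to the two metrics named above, the following is a rough sketch of how recall@k and MRR can be computed from ranked results. It is an illustration under stated assumptions, not the workshop's evaluation framework: the function names, the `(retrieved, relevant)` pair format, and the document ids are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant id, or 0.0 if none was retrieved."""
    relevant = set(relevant)
    for position, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0

def evaluate(runs, k):
    """Average both metrics over (retrieved, relevant) pairs, one per query."""
    n = len(runs)
    mean_recall = sum(recall_at_k(ret, rel, k) for ret, rel in runs) / n
    mrr = sum(reciprocal_rank(ret, rel) for ret, rel in runs) / n
    return {"recall@k": mean_recall, "mrr": mrr}

# Hypothetical results for two queries: ranked ids, then the known relevant ids.
runs = [
    (["d3", "d1", "d7"], ["d1"]),        # relevant doc found at rank 2
    (["d2", "d9", "d4"], ["d4", "d5"]),  # one of two relevant docs, at rank 3
]
scores = evaluate(runs, k=2)
print(scores)
```

Recall@k asks whether the relevant documents made it into the top k at all, while MRR rewards placing the first relevant document near the top, so tracking both helps separate "not retrieved" problems from "retrieved but ranked too low" problems.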
Who should attend?
This workshop is ideal for data scientists, ML engineers, and AI practitioners building or optimizing retrieval-augmented generation (RAG) pipelines, developers working with vector databases and semantic search applications, and anyone looking to move beyond basic vector store usage to truly understand and improve retrieval quality. No prior vector search experience is required — if you work with AI applications that rely on search or retrieval, this lab is for you.