Hybrid Retrieval is a pattern that adapts the retrieval strategy to how information is actually stored in the index. Instead of treating every query the same, it reshapes the query (hypothetical answers, query expansion) or the search mechanism itself (combined vector and keyword search, graph traversal) to bridge the gap between what users say and what documents contain.
What problem does Hybrid Retrieval solve?
A user asks your RAG system "how do I fix a flaky deploy?" and gets back nothing useful. The knowledge base absolutely contains the answer, buried in a document titled "Resolving Intermittent CI/CD Pipeline Failures." The problem is obvious once you see it. The user said "flaky deploy." The document says "intermittent pipeline failures." Vector similarity between those two phrases is lower than you would expect.
This vocabulary mismatch shows up constantly in real systems. Domain experts write documentation using precise terminology. End users ask questions using casual language, abbreviations, or descriptions of symptoms rather than root causes. A support engineer might index a troubleshooting guide under "connection pool exhaustion" while a developer searches for "database keeps timing out."
The gap gets worse when your knowledge base spans multiple teams or time periods. Terminology drifts. One team calls it "deployment," another calls it "release," a third says "ship." Basic vector search treats these as related but not identical, and that slight distance can push the right document below your relevance threshold.
How does Hybrid Retrieval work?
Hybrid Retrieval covers a family of index-aware techniques that reshape either the query or the search mechanism to account for how information is actually stored in your index. Instead of hoping the user's words land close enough to the right embeddings, you actively bridge the gap.
HyDE (Hypothetical Document Embeddings) flips the retrieval problem on its head. Before searching, you ask an LLM to generate a hypothetical answer to the user's question. You do not show this answer to the user. Instead, you embed the hypothetical answer and use that embedding as your search vector. The intuition is that a plausible answer will use vocabulary much closer to what exists in your knowledge base than the original question did. A hypothetical answer to "fix a flaky deploy" might mention "intermittent failures," "retry logic," and "pipeline stability," which are exactly the terms your indexed documents contain.
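The HyDE flow can be sketched in a few lines. This is a minimal sketch, not a definitive implementation: `generate`, `embed`, and `search_by_vector` are assumed placeholder callables you would wire to your own LLM, embedding model, and vector store.

```python
def hyde_search(question, generate, embed, search_by_vector, top_k=5):
    """Retrieve documents by embedding a hypothetical answer, not the question.

    generate:         hypothetical LLM callable, prompt -> str
    embed:            hypothetical embedding callable, text -> vector
    search_by_vector: hypothetical index search, vector -> ranked results
    """
    # 1. Ask the LLM for a plausible answer. It is never shown to the user.
    hypothetical = generate(
        f"Write a short passage that answers this question:\n{question}"
    )
    # 2. Embed the hypothetical answer instead of the original question.
    query_vector = embed(hypothetical)
    # 3. Search the index with the answer's embedding.
    return search_by_vector(query_vector, top_k=top_k)
```

The key design point is step 2: the question's embedding never touches the index, so the user's casual vocabulary no longer constrains retrieval.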
Query expansion takes a different approach. Rather than generating a full answer, you rewrite the original query into multiple variations that cover different phrasings. "Fix a flaky deploy" becomes three or four queries: "resolve intermittent deployment failures," "CI/CD pipeline instability troubleshooting," "unreliable release process." You run all of them and merge the results. This casts a wider net without inventing a hypothetical answer.
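A query-expansion loop might look like the sketch below. `generate` is again an assumed LLM callable, and `search` stands in for whatever retrieval function you already have; merging keeps each document's best score across variants, which is one simple fusion choice among several.

```python
def expanded_search(query, generate, search, n_variants=3, top_k=10):
    """Run the original query plus LLM-generated rephrasings, merge results.

    generate: hypothetical LLM callable, prompt -> str (one variant per line)
    search:   hypothetical retrieval callable, query -> [(doc_id, score)]
    """
    prompt = (
        f"Rewrite this search query {n_variants} different ways, "
        f"one per line, using alternative phrasings:\n{query}"
    )
    variants = [query] + [
        line.strip() for line in generate(prompt).splitlines() if line.strip()
    ]
    # Merge across variants: keep each document's best score.
    merged = {}
    for q in variants:
        for doc_id, score in search(q, top_k=top_k):
            merged[doc_id] = max(score, merged.get(doc_id, float("-inf")))
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```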
Hybrid search attacks the problem from the retrieval engine side. Pure vector search captures semantic meaning but can miss exact keyword matches. Traditional keyword search (BM25) catches exact terms but misses semantic similarity. Hybrid search runs both in parallel and combines their scores with a tunable weight, often called alpha. At alpha=0 you get pure keyword search. At alpha=1, pure vector. Most production systems land somewhere around 0.5 to 0.7, leaning toward semantic but keeping keyword matching as a safety net.
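Because BM25 and vector scores live on different scales, the blend only works after normalization. The sketch below uses min-max normalization before applying alpha; this is one common convention (it mirrors the alpha semantics described above), but engines differ in how they fuse scores.

```python
def normalize(scores):
    """Min-max normalize a {doc_id: score} dict into the [0, 1] range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_scores(vector_scores, bm25_scores, alpha=0.6):
    """Blend normalized scores: alpha=0 is pure keyword, alpha=1 pure vector."""
    v, k = normalize(vector_scores), normalize(bm25_scores)
    # A doc missing from one retriever contributes 0 from that side.
    return {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
```

Documents found by only one retriever still surface, just with a weaker combined score, which is exactly the safety-net behavior you want from the keyword side.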
Graph-based retrieval adds structural relationships between chunks. Instead of treating every chunk as an independent point in vector space, you build a graph where chunks link to related chunks, parent documents, entities, and concepts. When a query matches one chunk, the graph lets you pull in neighboring chunks that share entities or belong to the same topic cluster. This is especially powerful for questions that span multiple documents or require connecting information from different sections.
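The traversal step can be sketched as a breadth-first expansion over an adjacency map. The map itself (chunk ID to linked chunk IDs) is assumed to be built offline via entity extraction and relationship mapping; this sketch only shows how seed hits from vector search get enriched with their neighbors.

```python
def expand_with_neighbors(seed_ids, graph, max_hops=1):
    """Expand vector-search hits with graph neighbors, up to max_hops away.

    seed_ids: chunk IDs returned by the initial vector search
    graph:    {chunk_id: [linked chunk_ids]} adjacency map, built offline
    """
    seen = set(seed_ids)
    frontier = list(seed_ids)
    for _ in range(max_hops):
        next_frontier = []
        for chunk_id in frontier:
            for neighbor in graph.get(chunk_id, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen
```

Keeping `max_hops` small (1 or 2) matters in practice: each hop widens the candidate set, and distant neighbors are increasingly likely to be off-topic.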
When should you use Hybrid Retrieval?
Start with basic vector search and measure your retrieval quality. If you notice that relevant documents frequently appear outside the top-k results, or that users rephrase questions multiple times before getting a good answer, you have a vocabulary gap problem.
HyDE works well when your knowledge base uses specialized terminology and your users do not. It adds one LLM call per query, so it is best suited for use cases where latency tolerance is moderate and retrieval accuracy matters more than speed.
Query expansion is a lighter touch. It works well when users tend to ask short, ambiguous questions. If your average query is three to five words, expansion helps fill in the missing context.
Hybrid search should be your default in production. The cost of running BM25 alongside vector search is minimal, and the accuracy improvement is consistent across domains. There is rarely a good reason not to use it.
Graph-based retrieval makes sense when your knowledge base has strong entity relationships, when questions often require connecting information across documents, or when you have a well-structured corpus like technical documentation with cross-references.
What are the common pitfalls?
HyDE can backfire if the LLM generates a confidently wrong hypothetical answer. If the hypothetical mentions incorrect terminology, you will retrieve documents related to the wrong concept entirely. This is especially risky in specialized domains where the LLM lacks deep knowledge.
Query expansion can introduce noise. If your rewritten queries drift too far from the original intent, you pull in irrelevant results that dilute the good ones. Five expanded queries that each return ten results means fifty candidates to process, and many of them may be off-topic.
Hybrid search requires tuning the alpha parameter per domain. A value that works for legal documents may perform poorly for code documentation. If you set it once and forget it, you lose much of the benefit.
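One way to avoid set-and-forget is a periodic sweep over alpha against a labeled evaluation set. This is a sketch under assumptions: `run_hybrid` is a hypothetical wrapper around your search engine that returns ranked document IDs for a given query and alpha, and recall@k is used as the (simplest possible) quality metric.

```python
def tune_alpha(eval_set, run_hybrid, alphas=(0.3, 0.5, 0.7, 0.9), k=10):
    """Pick the alpha that maximizes recall@k on a labeled eval set.

    eval_set:   [(query, relevant_doc_id)] pairs
    run_hybrid: hypothetical callable, (query, alpha) -> ranked doc IDs
    """
    best_alpha, best_recall = None, -1.0
    for alpha in alphas:
        hits = sum(
            1 for query, relevant_id in eval_set
            if relevant_id in run_hybrid(query, alpha)[:k]
        )
        recall = hits / len(eval_set)
        if recall > best_recall:
            best_alpha, best_recall = alpha, recall
    return best_alpha, best_recall
```

Re-running this sweep whenever the corpus or query mix shifts keeps the blend matched to the domain instead of frozen at launch-day settings.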
Graph-based approaches carry the highest implementation cost. Building and maintaining the graph requires entity extraction, relationship mapping, and ongoing updates as your knowledge base changes. If your corpus is flat and unstructured, the graph adds complexity without proportional benefit.
What are the trade-offs?
Every technique here adds latency, complexity, or both. HyDE adds an LLM call before retrieval. Query expansion multiplies your search load. Hybrid search requires maintaining two index types. Graph retrieval requires building and updating a knowledge graph.
The question is always whether your retrieval quality problems justify the added complexity. If basic vector search gives you 90% accuracy on your evaluation set, adding HyDE to get to 93% may not be worth the extra 500ms per query. If basic search gives you 60% accuracy, these techniques are essential.
Start with hybrid search because the cost is low and the benefit is broad. Add query expansion if short queries are common. Reserve HyDE for domains with severe vocabulary mismatch. Consider graph retrieval only when your questions genuinely require multi-document reasoning and you have the engineering capacity to maintain the graph.
Goes Well With
Semantic Indexing improves the quality of what gets stored in the index. Index-aware retrieval improves how you search it. Together they attack the vocabulary gap from both sides.
Retrieval Refinement picks up where retrieval leaves off. Even with better queries, your retrieved set will contain noise. Reranking and filtering clean up the results before they reach the LLM.
Basic RAG is the foundation. Understanding the baseline pipeline makes it easier to see exactly where each index-aware technique plugs in and what it changes.