How do they differ?
Basic RAG and Semantic Indexing are not competing patterns. They operate at different levels of abstraction, and understanding that relationship is essential before you start building.
Basic RAG (Retrieval-Augmented Generation) is an end-to-end architecture. It has three stages. First, you retrieve relevant documents from a knowledge base given a user query. Second, you inject those documents into the LLM prompt as context. Third, the LLM generates an answer grounded in the retrieved information. The pattern addresses a fundamental limitation of language models: their knowledge is frozen at training time. RAG lets you plug in up-to-date, domain-specific information without retraining.
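The three stages can be sketched in a few lines. Everything below is a toy stand-in: `keyword_retrieve` approximates what a real search index would do, and `call_llm` is a placeholder for whatever LLM client you use.

```python
def keyword_retrieve(query, docs, k=2):
    """Stage 1: score each document by query-term overlap (toy stand-in for BM25)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.lower().split())), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]

def build_prompt(query, context_docs):
    """Stage 2: inject the retrieved documents into the prompt as context."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def answer(query, docs, call_llm):
    """Stage 3: the LLM generates an answer grounded in the retrieved context."""
    context_docs = keyword_retrieve(query, docs)
    prompt = build_prompt(query, context_docs)
    return call_llm(prompt)
```

Swapping `keyword_retrieve` for a semantic retriever changes nothing else in the pipeline, which is exactly the relationship described below.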
Semantic Indexing is specifically about how you organize and search your knowledge base. Instead of relying on keyword matching (BM25, TF-IDF), you convert documents and queries into dense vector embeddings that capture meaning. Similar concepts end up close together in embedding space, even when they use completely different words. Semantic Indexing replaces or augments the retrieval mechanism within RAG.
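A toy illustration of the idea, with hand-made 3-dimensional vectors standing in for the output of a real embedding model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hand-made "embeddings": the two phrasings share no keywords,
# but their vectors sit close together in the same space.
doc_vectors = {
    "terminate your plan": [0.9, 0.1, 0.0],
    "shipping rates":      [0.0, 0.2, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # stands in for "cancel my subscription"

best = max(doc_vectors, key=lambda d: cosine(query_vector, doc_vectors[d]))
```

A keyword match between "cancel my subscription" and "terminate your plan" would score zero; in vector space they are near neighbors.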
The relationship is straightforward. Basic RAG defines the overall pipeline. Semantic Indexing is one implementation choice for the retrieval layer within that pipeline. You can run Basic RAG with keyword search, with semantic search, or with a hybrid of both. On its own, Semantic Indexing is just a retrieval mechanism; it needs a generation step to consume its results.
| Dimension | Basic RAG | Semantic Indexing |
|---|---|---|
| Scope | Full pipeline (retrieve + augment + generate) | Retrieval/indexing layer only |
| Core mechanism | Combine external knowledge with LLM generation | Embed documents and queries into vector space for similarity search |
| Default retrieval | Keyword-based (BM25, full-text search) | Vector similarity (cosine, dot product) |
| Handles synonyms | Poorly (keyword mismatch problem) | Well (semantic similarity captures meaning) |
| Handles exact terms | Well (direct keyword match) | Poorly (embeddings may blur precise terms) |
| Infrastructure | Document store + LLM | Vector database + embedding model + LLM |
| Indexing cost | Low (inverted index) | Higher (embedding computation + vector storage) |
| Query latency | Fast (keyword lookup) | Slightly slower (embedding + ANN search) |
When to use Basic RAG with keyword search
Keyword-based RAG is not obsolete. It is the right starting point for many applications, and it remains the better choice in several scenarios.
When your domain has precise terminology. Legal documents, medical codes, product SKUs, API endpoint names. If users search by exact terms and the vocabulary is well-defined, keyword matching is fast, predictable, and accurate. A search for "ICD-10 code M54.5" should return documents containing exactly that code, not semantically similar medical concepts.
When you are validating the RAG pipeline. Before investing in embedding infrastructure, build a working RAG system with keyword search. This lets you validate the chunking strategy, prompt template, and generation quality without the added complexity of vector databases and embedding models. Many teams discover that their problems are in chunking or prompting, not in retrieval.
When your corpus is small and well-structured. If you have a few hundred documents with clear titles and consistent formatting, full-text search with a good tokenizer will find what you need. The overhead of running an embedding model and maintaining a vector index is not justified.
When cost is a hard constraint. Keyword indexing is essentially free. Semantic indexing requires an embedding model (either a hosted API or a self-hosted model) and a vector database. For projects with tight budgets or minimal infrastructure, keyword search keeps the system simple and cheap.
When explainability matters. Keyword search is transparent. You can show exactly which terms matched and why a document was retrieved. With semantic search, the matching happens in a high-dimensional space that is difficult to explain to end users or auditors.
When to use Semantic Indexing within RAG
Semantic Indexing becomes valuable when keyword matching starts failing for your use case. Here are the signals that it is time to upgrade.
When users phrase queries differently from your documents. This is the vocabulary mismatch problem, and it is the primary reason Semantic Indexing exists. A user asks "how do I cancel my subscription" but your documentation says "terminate your plan." Keyword search returns nothing. Semantic search maps both phrases to the same region of embedding space and returns the right document.
When your corpus covers broad, overlapping topics. If your knowledge base is a large collection of support articles, research papers, or internal wiki pages, queries often relate to concepts rather than specific terms. Semantic search excels at finding conceptually relevant documents even when the surface-level wording diverges.
When you need multilingual retrieval. Multilingual embedding models can map documents in one language and queries in another to the same embedding space. This means a user can ask a question in Spanish and retrieve English-language documentation. Keyword search cannot do this without explicit translation.
When you are building a conversational interface. Users in a chat interface ask natural language questions, not keyword queries. They say "what is the return policy for electronics bought during the holiday sale" rather than typing "return policy electronics holiday." Semantic search handles natural language queries much more gracefully.
When recall matters more than precision. If missing a relevant document is worse than including a few irrelevant ones (think medical or safety contexts), semantic search casts a wider net. It finds documents that are conceptually related even without exact keyword overlap.
Can they work together?
They are designed to work together, and most production RAG systems end up using both.
The most common approach is hybrid retrieval. You run a keyword search (BM25) and a semantic search (vector similarity) in parallel, then merge the results using reciprocal rank fusion or a weighted scoring function. This gives you the precision of keyword matching for exact terms and the recall of semantic matching for conceptual queries.
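The reciprocal rank fusion step can be sketched as follows. Each document's fused score is the sum of 1 / (k + rank) over every result list it appears in; k=60 is the constant from the original RRF formulation, and the document IDs here are made up.

```python
def rrf_merge(ranked_lists, k=60):
    """Merge several rankings with reciprocal rank fusion."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits  = ["d3", "d1", "d7"]   # BM25 ranking
semantic_hits = ["d1", "d5", "d3"]   # vector-similarity ranking
merged = rrf_merge([keyword_hits, semantic_hits])
```

Because RRF uses only ranks, not raw scores, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.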
The architecture typically looks like this. Documents are indexed in two ways simultaneously. They go into a traditional search index (Elasticsearch, OpenSearch, or even SQLite FTS) for keyword retrieval. They also get embedded and stored in a vector database (Pinecone, Weaviate, Qdrant, pgvector) for semantic retrieval. At query time, both indexes are searched, results are merged, and the top-K documents go into the LLM prompt.
Some vector databases now support hybrid search natively, running both keyword and vector queries inside a single system. Weaviate, Qdrant, and Milvus all offer this. This simplifies the infrastructure by removing the need for two separate indexes.
Another integration pattern is to use keyword search as a pre-filter and semantic search as a re-ranker. First, you retrieve a broad set of candidates using keyword matching (fast and cheap). Then you embed only those candidates and the query, and re-rank by semantic similarity. Because embeddings are computed only for the small candidate set, you avoid pre-computing and storing vectors for the entire corpus.
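A sketch of the re-rank step, assuming a hypothetical `embed` function standing in for your embedding model; note that only the keyword-matched candidates get embedded, not the corpus.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rerank(query, candidates, embed, top_k=3):
    """Embed the query and the candidate texts, then sort candidates by similarity."""
    query_vec = embed(query)
    return sorted(candidates,
                  key=lambda doc: cosine(query_vec, embed(doc)),
                  reverse=True)[:top_k]
```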
You can also go the other direction: semantic search first, then keyword filtering. Retrieve the top 50 by vector similarity, then filter down to documents that contain specific required terms. This is useful when the query combines conceptual intent with specific entity references.
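The post-filter in that reversed pattern is a one-liner; this sketch assumes the vector search returns `(doc_id, text)` pairs.

```python
def filter_by_terms(candidates, required_terms):
    """Keep only candidates whose text contains every required term (case-insensitive)."""
    required = [t.lower() for t in required_terms]
    return [(doc_id, text) for doc_id, text in candidates
            if all(t in text.lower() for t in required)]
```

For example, a query like "breaking changes in Python 3.11" can be answered by vector search for the conceptual part, then filtered to documents that literally contain "3.11".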
Common mistakes
Skipping keyword search entirely. Some teams jump straight to semantic indexing because it sounds more sophisticated. Then they discover that the embedding model maps "Python 3.11" and "Python 3.12" to nearly identical vectors, and users cannot find version-specific documentation. Always evaluate whether keyword search alone meets your requirements before adding the complexity of embeddings.
Using the wrong embedding model for your domain. General-purpose embedding models (like OpenAI's text-embedding-3-small) work well for everyday language. But if your domain uses specialized vocabulary, like legal terminology, chemical compound names, or financial jargon, the embedding model may not represent these terms well. Consider fine-tuning or choosing a domain-specific model.
Not chunking documents before embedding. Embedding entire documents produces low-quality vectors because the meaning of a long document gets averaged into a single point. Chunk your documents into coherent passages of 200 to 500 tokens before embedding. The chunk size should match the granularity of answers you want to retrieve.
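A minimal fixed-size chunker with overlap, as a sketch of the idea. It splits on whitespace words as a rough proxy for tokens; a production pipeline would typically use the embedding model's own tokenizer and sentence-aware boundaries.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the final window already covers the tail
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary fully present in at least one chunk.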
Confusing the component with the system. Semantic Indexing improves retrieval, but it does not fix problems in other parts of the RAG pipeline. If your chunks are too large, your prompt template is poorly designed, or the LLM is not following grounding instructions, better retrieval will not save you. Evaluate each stage independently.
Over-indexing on embedding benchmarks. The MTEB leaderboard is useful, but benchmark performance does not always correlate with real-world retrieval quality for your specific data. Test embedding models on your actual queries and documents. A model that ranks fifth on the leaderboard might outperform the leader for your domain.
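A minimal way to run that test is recall@k over a small set of labeled queries; `search` here is a stand-in for whichever retriever (or embedding model) you are evaluating.

```python
def recall_at_k(search, labeled_queries, k=5):
    """Fraction of queries whose known-relevant document appears in the top-k results.

    labeled_queries: list of (query, relevant_doc_id) pairs.
    search: function mapping a query to a ranked list of doc ids.
    """
    hits = sum(1 for query, relevant in labeled_queries
               if relevant in search(query)[:k])
    return hits / len(labeled_queries)
```

Running this with each candidate embedding model plugged into `search`, on a few dozen real queries from your logs, is usually more informative than any leaderboard position.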
Ignoring indexing latency and cost. Every time you add or update a document, you need to recompute its embedding. For corpora that change frequently (daily news, live support tickets), the cost and latency of re-embedding can become significant. Plan your indexing pipeline accordingly, and consider incremental updates rather than full re-indexing.
References
- Lewis, P., Perez, E., Piktus, A., et al. "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." NeurIPS 2020.
- Karpukhin, V., Oguz, B., Min, S., et al. "Dense Passage Retrieval for Open-Domain Question Answering." EMNLP 2020.
- Robertson, S., Zaragoza, H. "The Probabilistic Relevance Framework: BM25 and Beyond." Foundations and Trends in Information Retrieval, 2009.
- Muennighoff, N., Tazi, N., Magne, L., Reimers, N. "MTEB: Massive Text Embedding Benchmark." EACL 2023.
- Bruch, S. "Foundations of Vector Retrieval." Springer, 2024.