RAG & Semantic Search

Connecting AI to real-world knowledge — retrieval-augmented generation, vector databases, embeddings, and semantic search patterns.

Three structural limits of query transformation: latency tax, query drift, hallucinated documents from LLM rewriters
MONA explainer 12 min

Query Transformation Limits: Latency Tax, Drift, Hallucinated Documents

Query transformation in RAG hits three hard limits: latency tax from extra LLM calls, query drift on simple inputs, and …

Diagram of a compound query splitting into parallel retrievable sub-queries via decomposition, routing, and RAG-Fusion
MONA explainer 11 min

From Recall Failures to RAG-Fusion: Prerequisites and Inner Workings of Query Decomposition and Routing

Vector retrievers collapse compound questions into a single embedding, losing their parts. Query decomposition, routing, and RAG-Fusion fix it by …

Two-stage retrieve-and-rerank pipeline where a fast bi-encoder retrieves candidates and a cross-encoder reorders them
MONA explainer 12 min

Cross-Encoders, Bi-Encoders, and Listwise Scoring in Reranking

A reranker reorders the top candidates from vector search using a heavier model. Cross-encoders, bi-encoders, and …

Cross-encoder reranker scaling: latency grows with candidate count and token length, plus MS MARCO domain drift
MONA explainer 14 min

Cross-Encoder Reranker Limits: Latency Walls and Domain Drift

Cross-encoder rerankers hit two architectural walls: latency scales linearly with candidates and quadratically with …

Two ranked retrieval lists — keyword and semantic — fusing into a single hybrid result for RAG pipelines
MONA explainer 12 min

What Is Hybrid Search and How BM25 Plus Dense Vectors Beat Either Alone in RAG

Hybrid search fuses BM25 keyword retrieval with dense vector search using reciprocal rank fusion. Why two ranked lists …

Diagram of hybrid search: BM25 lexical index and dense vector index merged by reciprocal rank fusion into one ranked list
MONA explainer 11 min

BM25, SPLADE, and Reciprocal Rank Fusion: The Building Blocks of Production Hybrid Search

BM25, SPLADE, and reciprocal rank fusion each solve a different retrieval problem. Here's how the three combine into a …
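Reciprocal rank fusion is simple enough to show inline. A minimal sketch, assuming two ranked lists of document IDs and the conventional damping constant k=60 from the original RRF formulation:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one.

    Each document scores sum(1 / (k + rank)) over every list it
    appears in; k=60 damps the dominance of top-ranked items.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]     # lexical ranking
dense_hits = ["d3", "d1", "d4"]    # vector ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# d1 appears near the top of both lists, so it wins the fusion.
```

Because RRF only uses rank positions, it never has to reconcile BM25's unbounded scores with cosine similarities in [-1, 1].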

Three structural failure surfaces in production RAG: retrieval misses, position bias on long context, grounding conflicts
MONA explainer 11 min

Why RAG Still Fails in Production: Retrieval, Chunking, Grounding

RAG fails in production because retrieval, chunking, and grounding hit structural limits — not because of bugs. Why …

Particles forming a knowledge retrieval graph that grounds an LLM response in source documents
MONA explainer 10 min

What Is RAG and How LLMs Use Vector Search to Ground Their Answers

Retrieval-augmented generation pairs an LLM with a vector index so answers are grounded in real documents — not just …

Hybrid search fusion: BM25 and vector score distributions colliding in a merge step that yields inconsistent rankings
MONA explainer 13 min

Score Mismatch, Tuning Hell: The Hard Limits of Hybrid Search Fusion

Hybrid search merges BM25 and vector results, but the fusion step has hard limits. Score mismatch, RRF blindness, and …

RAG pipeline as a chain of transformations: chunking, embedding, vector storage, retrieval, and reranking
MONA explainer 12 min

From Chunking to Reranking: RAG Pipeline Components and Prerequisites

Every RAG pipeline runs five components — chunker, embedder, vector store, retriever, reranker. Here is what each one …
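The five components can be sketched end to end in a few dozen lines. This is a toy, assuming a hashed bag-of-words embedder and a token-overlap reranker in place of real models; only the shape of the pipeline is the point:

```python
def chunk(text, size=40):
    # Chunker: fixed-size character windows (real systems split
    # on sentence or semantic boundaries).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text, dim=16):
    # Embedder (toy): hashed bag-of-words vector, not a learned model.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    # Vector store + retriever: brute-force scan; production stores
    # use ANN indexes such as HNSW instead of O(n) search.
    def __init__(self):
        self.items = []
    def add(self, text):
        self.items.append((embed(text), text))
    def retrieve(self, query, top_k=3):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:top_k]]

def rerank(query, candidates):
    # Reranker (toy): exact token overlap stands in for a cross-encoder.
    q_tokens = set(query.lower().split())
    return sorted(candidates,
                  key=lambda c: len(q_tokens & set(c.lower().split())),
                  reverse=True)
```

Swapping any one toy component for a real one (a sentence-transformer embedder, an HNSW store, a cross-encoder reranker) leaves the other four interfaces unchanged, which is why the five-stage decomposition is useful.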

Geometric visualization of sentence vectors converging in embedding space through contrastive learning
MONA explainer 9 min

What Is Sentence Transformers and How Contrastive Learning Produces Sentence-Level Embeddings

Sentence Transformers turns transformers into sentence encoders via contrastive learning. Covers bi-encoders, loss …

Geometric visualization of sentence embedding vectors collapsing into a narrow cone in high-dimensional space
MONA explainer 11 min

From Cosine Similarity to Anisotropy: Prerequisites and Hard Limits of Sentence-Level Embeddings

Sentence Transformers encode meaning as geometry. Learn the prerequisites, token limits, and anisotropy traps that …

Geometric grid of per-token vectors with MaxSim scoring paths connecting query and document token matrices
MONA explainer 10 min

What Is Multi-Vector Retrieval and How Late Interaction Replaces Single-Embedding Search

Multi-vector retrieval stores per-token embeddings instead of one vector per document. Learn how ColBERT MaxSim scoring …
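MaxSim itself is a one-liner. A sketch of ColBERT-style late interaction, assuming toy 2-D unit vectors in place of real per-token embeddings: each query token takes its best match over all document tokens, and the per-token maxima are summed.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    # Late interaction: for each query token, keep only its
    # best-matching document token, then sum across query tokens.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

query = [(1.0, 0.0), (0.0, 1.0)]   # two query-token embeddings
doc_a = [(1.0, 0.0), (0.6, 0.8)]   # a strong match for each query token
doc_b = [(0.7, 0.7), (0.7, 0.7)]   # only partial matches

score_a = maxsim_score(query, doc_a)  # 1.0 + 0.8 = 1.8
score_b = maxsim_score(query, doc_b)  # 0.7 + 0.7 = 1.4
```

The cost shows up in storage, not in this function: every document keeps one vector per token rather than one vector total.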

Comparison of single-vector and token-level multi-vector retrieval showing storage and latency cost explosion
MONA explainer 9 min

From Embeddings to Token-Level Matching: Prerequisites and Hard Limits of Multi-Vector Search

Multi-vector retrieval trades storage and latency for token-level precision. Learn the prerequisites, storage math, and …

Hierarchical graph layers connecting scattered data points across dimensional space for nearest-neighbor search
MONA explainer 10 min

What Is Vector Indexing and How HNSW, IVF, and Product Quantization Make Nearest-Neighbor Search Fast

Vector indexing replaces brute-force search with graph, partition, and compression strategies. Learn how HNSW, IVF, and …

Abstract visualization of expanding graph nodes consuming memory while search accuracy fractures at scale
MONA explainer 10 min

Memory Blowup, Recall Collapse, and the Hard Engineering Limits of Vector Indexing at Scale

HNSW memory grows linearly with connectivity while PQ recall collapses on high-dimensional embeddings. Learn where …
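The linear memory growth is easy to estimate. A back-of-envelope sketch, assuming float32 vectors, 4-byte neighbor IDs, and up to 2*M links per node at the base layer (a common implementation choice; real indexes add further overhead):

```python
def hnsw_memory_gb(n_vectors, dim, M=32,
                   bytes_per_float=4, bytes_per_link=4):
    # Raw vector storage: n * dim float32 components.
    vector_bytes = n_vectors * dim * bytes_per_float
    # Graph storage: up to 2*M neighbor IDs per node at the base layer.
    link_bytes = n_vectors * 2 * M * bytes_per_link
    return (vector_bytes + link_bytes) / 1e9

# 100M vectors at 768 dimensions, M=32: roughly 333 GB before overhead.
print(round(hnsw_memory_gb(100_000_000, 768), 1))
```

Doubling M doubles only the link term, but at high dimensionality the vectors themselves dominate, which is why PQ compression targets the vector bytes first.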

Geometric visualization of distance metrics converging into layered graph structures for nearest neighbor search
MONA explainer 10 min

From Distance Metrics to Graph Traversal: Prerequisites for Understanding Vector Index Internals

Distance metrics, high-dimensional geometry, exact vs approximate search — the prerequisites you need before HNSW and …

MAX mapping database indexing concepts onto vector search architecture for backend developers
MAX Bridge 10 min

Vector Search for Developers: What Transfers and What Breaks

Vector search mapped for backend developers. Learn which database instincts transfer, where approximate results break …

Geometric vector paths converging toward a nearest point in high-dimensional space
MONA explainer 10 min

What Are Similarity Search Algorithms and How Nearest Neighbor Methods Find Matching Vectors

Similarity search algorithms find matching vectors by measuring geometric distance, not keywords. Learn how HNSW, PQ, …
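"Geometric distance, not keywords" reduces to a few lines of arithmetic. A minimal exact nearest-neighbor sketch using cosine similarity; ANN methods like HNSW exist precisely to approximate this O(n) scan:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors):
    # Brute-force exact search: compare the query against every
    # stored vector and return the index of the best match.
    return max(range(len(vectors)),
               key=lambda i: cosine_similarity(query, vectors[i]))

corpus = [(1.0, 0.0), (0.0, 1.0), (0.9, 0.1)]
best = nearest((1.0, 0.0), corpus)  # index 0: an exact directional match
```

Everything downstream (HNSW, IVF, PQ) is a strategy for getting approximately this answer without touching every vector.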

Geometric lattice of connected nodes transforming into layered proximity graphs above a high-dimensional vector grid
MONA explainer 10 min

From Distance Metrics to Index Structures: The Building Blocks of Vector Similarity Search

Similarity search combines distance metrics, index structures, and quantization. Learn how HNSW, IVF, LSH, and product …

Geometric visualization of distance convergence in high-dimensional vector space with collapsing nearest neighbor boundaries
MONA explainer 11 min

Curse of Dimensionality, Recall vs. Speed, and the Hard Limits of Approximate Nearest Neighbor Search

High-dimensional similarity search faces hard mathematical limits. Explore the curse of dimensionality, recall-speed …