RAG Poisoning
Also known as: knowledge database poisoning, retrieval poisoning, knowledge store injection
- RAG Poisoning
- RAG poisoning is an attack that injects crafted malicious documents into the external knowledge store a Retrieval-Augmented Generation system retrieves from, causing the LLM to produce attacker-chosen outputs when a target query is processed.
RAG poisoning is an attack where malicious documents are injected into a RAG system’s external knowledge store, causing the LLM to return attacker-controlled answers when a target query arrives.
What It Is
A Retrieval-Augmented Generation (RAG) system answers questions differently from a standard LLM: instead of relying solely on what the model learned during training, it first searches an external knowledge store — a vector database, document index, or enterprise wiki — retrieves the most relevant passages, and feeds them to the LLM as context before generating a response. This lets organizations connect models to live internal documentation without retraining. The tradeoff: the LLM trusts whatever it retrieves.
RAG poisoning exploits that trust. An attacker who gains write access to the knowledge store — through a submission form, an exposed content API, or a compromised ingestion pipeline — can inject documents engineered to hijack the LLM’s output on specific queries. The model weights stay untouched. Training-time defenses don’t apply. The compromise lives entirely in the data layer.
Think of it as a targeted shelf attack. A company’s RAG assistant answers HR questions by querying an internal policy database. If an attacker can add a document to that database, they can plant a forged policy that overrides the real one — but only for users who ask the relevant question. Everyone else gets the correct answer. The attack is quiet and targeted by design.
The injection must satisfy two conditions simultaneously. According to arXiv (PoisonedRAG), the poisoned document must rank high in retrieval for the target query (retrieval condition), and once retrieved, it must steer the LLM toward the attacker’s intended output (generation condition). The research demonstrated this in both white-box and black-box settings — the latter being more realistic, since most attackers won’t know the retriever or model architecture.
The injection doesn’t require volume. According to arXiv (PoisonedRAG), a small number of precisely crafted documents placed among millions can achieve high attack success rates on target queries — a sharp asymmetry between attacker effort and impact.
Unlike the training data poisoning attacks in the parent article — backdoor triggers, clean-label, label flipping — RAG poisoning operates at inference time. A clean, well-trained model can still produce corrupted outputs if the knowledge store it queries has been compromised. The threat surface is the data pipeline feeding retrieval, not the model’s training pipeline.
How It’s Used in Practice
The most common exposure is enterprise RAG deployments where the knowledge store is populated from multiple sources or teams. A legal team’s contract review assistant pulling from a shared document repository, a customer support agent querying a product knowledge base maintained by multiple contributors, or a compliance tool ingesting regulatory documents from a third-party feed — all share the same structural vulnerability: the LLM’s outputs depend on a data source whose integrity is only as strong as the access controls protecting it.
The attack also applies to RAG systems built on semi-public data: scraped web content, open document repositories, or crowdsourced knowledge bases. If the data ingestion pipeline doesn’t validate for adversarial content before indexing, injected documents can reach the vector database without manual review.
According to arXiv (RAG Security Survey), RAG-specific defenses remain an open research area as of 2025, and no detection method yet handles the black-box attack case reliably. Organizations running RAG systems carry this exposure whether or not they’ve named it.
Pro Tip: Audit who has write access to every data source your RAG system ingests from — including third-party integrations and automated feeds. Access control over the knowledge store is your primary defense layer. If the knowledge store is treated as a read-only, curated source with strict contributor controls and ingestion-time validation, the attack surface shrinks to near zero.
When to Use / When Not
The table below uses “Use” to signal when RAG poisoning is an active threat worth dedicated mitigation, and “Avoid” (lower priority) where standard access controls are likely sufficient.
| Scenario | Use | Avoid |
|---|---|---|
| Knowledge store accepts contributions from multiple teams or external parties | ✅ | |
| Internal read-only store maintained by a single trusted team with audited access | ❌ | |
| RAG system informs high-stakes decisions (medical, legal, financial, compliance) | ✅ | |
| Low-stakes informational use where all responses receive human review | ❌ | |
| Customer-facing deployment where adversaries have incentive to manipulate outputs | ✅ | |
| Closed internal deployment with strict, logged contributor access and no external feeds | ❌ |
Common Misconception
Myth: RAG poisoning requires access to the model’s training data or weights, so it’s primarily a concern for model providers — not organizations deploying existing models.
Reality: RAG poisoning only requires write access to the external knowledge store, not the model. Organizations that deploy off-the-shelf models remain fully exposed if their knowledge store lacks adequate access controls. The model provider’s security posture is irrelevant to this attack vector. You can run the most carefully trained model in the world and still have poisoned outputs if the retrieval layer is compromised.
One Sentence to Remember
If you’re building with RAG, treat your knowledge store with the same security discipline you’d apply to your application database — because from the model’s perspective, what it retrieves is what it believes.
FAQ
Q: How is RAG poisoning different from prompt injection? A: Prompt injection manipulates the model through crafted user inputs at query time. RAG poisoning pre-plants malicious content in the knowledge store before any user interacts with the system — a persistent compromise rather than a real-time manipulation attempt.
Q: Can you detect poisoned documents in a knowledge store? A: Automated detection is an open research problem. Anomaly detection on document similarity, provenance tracking, and retrieval auditing are early mitigation approaches, but according to arXiv (RAG Security Survey), no robust detection method exists yet for black-box attack settings.
Q: Does using a large knowledge base protect against this attack? A: No. According to arXiv (PoisonedRAG), a small number of precisely crafted documents can achieve high attack success rates on target queries even in very large databases. The scale of the knowledge base provides minimal protection against a targeted injection.
Sources
- arXiv (PoisonedRAG): PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models — foundational paper by Zou, Geng, Wang, Jia; presented at USENIX Security 2025
- OWASP: OWASP Top 10 for LLM Applications 2025 — classifies RAG poisoning under LLM04:2025 Data and Model Poisoning
Expert Takes
RAG poisoning is a product of the trust model underlying retrieval-augmented generation. The LLM treats retrieved context as authoritative — by design, because that’s what makes RAG useful. A poisoned document doesn’t have to trick the model; it has to trick the retriever. Once it clears retrieval, the generation stage processes it like any other context. The attack surface is the gap between “document is in the index” and “document was verified as legitimate.”
The practical risk surface expands with how loosely you specify your ingestion pipeline. If your knowledge store has no access gate — anyone can push documents, third-party integrations write directly, there’s no review step — you’ve opened an injection vector. Treat the knowledge store the way you’d treat an API contract: define who can write, what format is accepted, and what validation runs before indexing. That spec gap is where this attack lives.
Every organization running a RAG system is also running an adversarial content moderation problem — whether they know it or not. The productivity gains from connecting LLMs to internal knowledge bases are real. So is the attack vector those connections create. Security posture for RAG can’t be scoped in after deployment; it has to be part of the architecture from the first design decision about who controls the knowledge store.
When a poisoned document steers an AI to give legally incorrect advice or route a support case to a harmful resolution, who’s accountable? The attacker who planted the document is obvious — but they’re often anonymous and unreachable. The organization that deployed the system with insufficient knowledge store controls is also implicated. RAG poisoning makes the question of organizational responsibility for AI outputs more concrete and more urgent than most AI ethics discussions manage to be.