RAG Security Fundamentals
Learn why retrieval changes the AI security model, how untrusted documents reach the model, and what defenders must review before treating retrieved content as useful context.
Listen to hear this room section by section.
Task 1
What RAG Adds To The System
A retrieval-augmented system takes a user request, searches a knowledge source, and then places selected documents or snippets into the model context before the model answers. That can make the assistant more useful because the model no longer depends only on what it learned during training.
For defenders, the important point is that RAG introduces another path through which text reaches the model. A secure AI application is not just the model and the user prompt. It is also the retriever, the document store, the ranking logic, and the policy that decides how retrieved material is presented.
That means defenders have to review not only whether the answer is helpful, but also whether the retrieval path allows hostile, irrelevant, stale, or sensitive content to influence the system.
Task 2
Why Retrieved Content Is Untrusted By Default
A common beginner mistake is to treat retrieved text like trusted system knowledge simply because it came from the application's own search layer. In reality, many documents in a knowledge base begin as human-authored content, imported records, uploaded files, tickets, emails, web content, or synced internal documents.
That means retrieved content may be wrong, stale, adversarial, overly sensitive, or written in a way that tries to influence the model's behavior. Even when the content is not malicious, it can still be unsafe to treat it like a higher-priority instruction source.
The safer mental model is this: retrieval provides reference material, not policy authority. The application should preserve that distinction clearly.
Task 3
Common RAG Failure Paths
RAG failures often happen when the system mixes useful retrieval with too much trust. A retrieved document may contain hidden instructions, misleading claims, privileged data, or text that only makes sense in a narrow internal workflow. If the assistant treats that content like guidance it must obey, the system may become easier to manipulate.
Other failures happen earlier in the pipeline. Defenders should ask whether the corpus includes documents that should not have been indexed, whether search results are scoped correctly for the user, whether the model can cite stale or superseded procedures, and whether retrieved content is labeled with enough context for the application to treat it safely.
In short, RAG risk is not only "can a document inject the prompt?" It is also "did the system retrieve the right thing, for the right user, with the right trust handling?"
Task 4
Defensive Priorities For Retrieval
Blue teams usually defend RAG systems by combining several controls. They reduce what can be indexed, improve metadata and provenance, scope retrieval results to the right tenant or role, preserve source labels, prevent retrieved content from overriding policy, and add monitoring around suspicious retrieval patterns.
Some controls happen before retrieval, such as ingestion review and document classification. Some happen during retrieval, such as access filtering, ranking rules, and scope checks. Some happen after retrieval, such as context separation, output controls, and approval gates.
The key beginner lesson is that RAG is safer when retrieval is treated as one layer in a system, not a magical source of truth.
Task 5
Practical
Name one reason defenders should treat retrieved content as untrusted by default.
Task 6
Pipeline Check
Name two parts of a RAG system a defender should review beyond the model itself.
Task 7
Control Check
Name two controls that make RAG systems safer.
Ready To Move On?
Up next: Document Trust, Provenance, and Ingestion Hygiene