Trusted Instructions vs Untrusted Content
Learn how AI applications mix system policy, user input, retrieved content, and tool results, and why defenders must keep trusted instructions separate from untrusted content.
Listen to hear this room section by section.
Task 1
How AI Applications Build Context
AI applications rarely send only the user's visible prompt to the model. In practice, the application usually assembles a larger context package that may include system instructions, conversation history, user-provided content, retrieved documents, tool results, memory, and formatting rules.
This matters because not every part of that context should be trusted equally.
Some parts are meant to define the assistant's behavior. Other parts are meant to be treated as data, reference material, or task input.
For defenders, context assembly is where many AI security problems begin.
If the application mixes different trust levels together without clear boundaries, the model may give unsafe authority to text that should have stayed untrusted.
Task 2
What Counts as Trusted Instruction
Trusted instructions are the parts of the system that define how the assistant is supposed to behave. They usually come from the application owner or developer and describe the task boundaries, safety rules, and operating constraints the assistant should follow.
In a support assistant, trusted instructions might define how tickets should be handled, what data must stay private, which actions require approval, and which tools the model is never allowed to use without extra checks.
Trusted does not mean perfect. It means the application intentionally treats that source as policy rather than ordinary task content.
Task 3
What Counts as Untrusted Content
Untrusted content is everything the application should treat as data rather than policy. That includes user prompts, uploaded documents, retrieved articles, emails, tickets, CRM notes, websites, and many tool results. These sources may be useful, but they do not automatically deserve authority over model behavior.
The key defensive mistake is to let untrusted content behave like instruction. A retrieved article should help answer a question, not quietly redefine the assistant's rules. A user message should describe a task, not rewrite the system policy simply by sounding confident or authoritative.
Defenders should assume that any external text could contain attempts to manipulate the model, even if the text appears in an otherwise legitimate workflow.
Task 4
How Trust Levels Collapse
Trust levels collapse when the application fails to distinguish policy from reference material. In insecure designs, the model sees all text in one mixed context and the surrounding system gives it no reliable signal about which parts are instructions, which parts are data, and which actions require separate approval.
That collapse is what allows indirect prompt injection and disclosure problems to grow. Once untrusted content can influence high-trust behavior, the model may reveal restricted information, follow instructions hidden in documents, or attempt actions outside the intended workflow.
Defenders reduce this risk by separating trusted instructions from untrusted content, preserving metadata about source trust, narrowing what the model can do with external material, and placing additional checks at high-risk boundaries.
Task 5
Practical
Name one trusted source of instruction and one untrusted source of content in a typical AI application.
Task 6
Separation Check
Name one control or design choice that helps keep trusted instructions separate from untrusted content.
Task 7
Failure Check
Name one thing that can happen when untrusted content is treated like trusted instruction.
Ready To Move On?
Up next: Prompt Leakage and Sensitive Information Disclosure