Defense in Depth for AI
Learn the fundamentals of AI security defense and why strong blue teams secure the full application stack, not only the model.
Task 1
What is AI Security Defense?
In this room, AI security defense means applying blue-team thinking across the whole AI application, not only the model. That includes user input, context construction, retrieved content, connected tools, generated output, permissions, logging, and the monitoring systems around them.
Strong defenders do not ask only whether the model is "safe." They ask where failure would move next, what privilege exists after the model, and which controls are waiting there when something goes wrong. The blue-team job is not to assume the model will always behave; the job is to make failure less useful, less damaging, and easier to detect.
Task 2
Layered Controls
Strong AI defense works in layers. In practice, that usually means validating input, separating trusted policy from untrusted context, treating retrieval as a trust boundary, scoping tool permissions, filtering risky outputs, and monitoring suspicious patterns.
Each layer answers a different question. Input handling reduces obvious abuse early. Context construction decides what the model should trust. Tool gates limit impact when the assistant can act. Output controls reduce accidental disclosure. Monitoring gives the team evidence when something still slips through.
Good defensive design does not ask one control to do every job. It distributes responsibility so later layers can still reduce damage when earlier ones miss something.
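As a minimal sketch of that idea, the example below chains two independent layers around a model call: an input check and an output filter. Everything in it is an assumption made for illustration, including the function names, the `Review` record, and the two regular expressions; real deployments would use far richer detection. The point it shows is that the output layer still runs when the input layer sees nothing wrong.

```python
import re
from dataclasses import dataclass, field

# Hypothetical patterns, chosen only for illustration.
SUSPICIOUS_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
SECRET_PATTERN = re.compile(r"sk-[A-Za-z0-9]{8,}")


@dataclass
class Review:
    """Collects evidence from each layer for later investigation."""
    events: list = field(default_factory=list)


def check_input(text, review):
    """Input layer: reject obvious abuse early and record why."""
    if SUSPICIOUS_INPUT.search(text):
        review.events.append("input:suspicious_phrase")
        return False
    return True


def filter_output(text, review):
    """Output layer: redact secret-like strings before they reach the user."""
    redacted, count = SECRET_PATTERN.subn("[REDACTED]", text)
    if count:
        review.events.append("output:secret_redacted")
    return redacted


def handle_request(user_text, model, review):
    """Run the layers in order; `model` is any callable taking and returning text."""
    if not check_input(user_text, review):
        return "Request blocked."
    raw = model(user_text)
    # This layer runs even though the input looked harmless.
    return filter_output(raw, review)
```

If a stand-in model leaks a key in response to a harmless-looking question, the input layer passes the request, yet the output layer redacts the key and the `Review` record keeps evidence of it.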
Task 3
Trust Boundaries
A trust boundary is the point where untrusted content gains influence over something more privileged. In AI systems, that often happens when external text enters context, when the model can call tools, or when outputs are trusted too quickly by users or downstream systems.
Retrieved documents are a common example. If the system treats them as neutral reference material while their text can still steer model behavior the way instructions do, a hidden boundary has already been crossed.
Blue-team thinking becomes much more concrete once you stop asking "is the model safe?" and start asking "where does untrusted content cross into something with more power?"
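One way to make the retrieval boundary visible is to carry a trust label on every piece of context and render untrusted text inside explicit markers. The sketch below assumes a hypothetical `Segment` type and an `<untrusted>` marker format that are not part of any specific framework. Marking content this way does not stop a model from following injected text; it only keeps the boundary explicit so that later controls can reason about it.

```python
from dataclasses import dataclass

CLOSE_TAG = "</untrusted>"


@dataclass(frozen=True)
class Segment:
    text: str
    source: str    # hypothetical labels: "system_policy", "user", "retrieved"
    trusted: bool


def build_context(policy, user_text, documents):
    """Only the operator's policy is trusted; user and retrieved text are not."""
    segments = [
        Segment(policy, "system_policy", True),
        Segment(user_text, "user", False),
    ]
    segments += [Segment(doc, "retrieved", False) for doc in documents]
    return segments


def render(segments):
    """Wrap untrusted text in markers and neutralize attempts to close them early."""
    parts = []
    for seg in segments:
        if seg.trusted:
            parts.append(seg.text)
        else:
            safe = seg.text.replace(CLOSE_TAG, "[removed]")
            parts.append(f"<untrusted source={seg.source!r}>\n{safe}\n{CLOSE_TAG}")
    return "\n\n".join(parts)
```

A retrieved document that contains its own closing marker followed by instructions stays inside its wrapper, because the embedded marker is replaced before rendering.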
Task 4
Prevention, Containment, and Visibility
A secure AI feature should not be judged only by whether it blocks malicious prompts. It should also be judged by whether the system limits damage when prevention fails, whether defenders can detect that failure quickly, and whether the assistant is prevented from turning language manipulation into real-world impact.
Containment lives in things like permission scope, approval gates, and safer defaults. Visibility lives in telemetry, logging, alerting, and investigation workflows. Both matter because prevention is never perfect.
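A small sketch of how containment and visibility can sit in the same control: a gate that checks every tool call against an allowlist, holds high-risk actions until a human approves them, and logs each decision. The tool names, risk tiers, and the `authorize_tool_call` function are hypothetical and exist only for this example.

```python
import logging

logger = logging.getLogger("assistant.tools")

# Hypothetical tools and risk tiers for a support assistant.
ALLOWED_TOOLS = {"lookup_order": "low", "issue_refund": "high"}


def authorize_tool_call(tool, approved_by=None):
    """Return True only if the call is in scope and, for high-risk tools, approved."""
    risk = ALLOWED_TOOLS.get(tool)
    if risk is None:
        # Permission scope: anything not listed is denied by default.
        logger.warning("denied tool=%s reason=not_in_scope", tool)
        return False
    if risk == "high" and approved_by is None:
        # Approval gate: the model alone cannot trigger high-risk actions.
        logger.warning("held tool=%s reason=needs_approval", tool)
        return False
    logger.info("allowed tool=%s approved_by=%s", tool, approved_by)
    return True
```

The deny-by-default branch is the safer default in this sketch: a manipulated model that requests an unlisted tool gets nothing, and the warning gives the blue team something to investigate.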
Layered defense is valuable precisely because no single control can carry the whole system safely by itself.
Task 5
Follow-on Controls Check
A prompt injection has already influenced the model context in a support assistant. Select the follow-on controls that still reduce damage even after that first preventive layer failed.
Task 6
Layer Ownership Check
Match each scenario to the layer doing the main defensive work.
Task 7
Operating Order Check
Put the defensive controls in the order in which they usually create value during a live failure.
Practical
Complete the live review task below to apply the lesson the way a defender would in a real design review.
Defense in depth
Layered Defense Drill
Build a practical control stack for a support assistant so one prompt failure does not immediately become data exposure, risky action, or a silent incident.
Prompt and context layer
Which control best reduces the chance that hostile text quietly becomes trusted instruction?
Action and tool layer
Which control best limits damage if the model still tries to take a high-risk action?
Output and disclosure layer
Which control best stops harmful or privileged content from reaching the user?
Monitoring and response layer
Which control best helps the blue team notice and investigate the failure if it still happens?
Choose the single strongest control for each layer, then validate the stack to clear the practical.
Up next: AI Attack Surface for Defenders