Defense in Depth for AI

Learn the fundamentals of AI security defense and why strong blue teams secure the full application stack, not only the model.

55 minAI Security Blue Teameasy90 XP

Listen to hear this room section by section.

Task 1

What is AI Security Defense?

In this room, AI security defense means applying blue-team thinking across the whole AI application, not only the model. That includes user input, context construction, retrieved content, connected tools, generated output, permissions, logging, and the monitoring systems around them.

Strong defenders do not ask only whether the model is "safe." They ask where failure would move next, what privilege exists after the model, and which controls are waiting there when something goes wrong. The blue-team job is not to assume the model will always behave; the job is to make failure less useful, less damaging, and easier to detect.

Task 2

Layered Controls

Strong AI defense works in layers. In practice, that usually means validating input, separating trusted policy from untrusted context, treating retrieval as a trust boundary, scoping tool permissions, filtering risky outputs, and monitoring suspicious patterns.

Each layer answers a different question. Input handling reduces obvious abuse early. Context construction decides what the model should trust. Tool gates limit impact when the assistant can act. Output controls reduce accidental disclosure. Monitoring gives the team evidence when something still slips through.

Good defensive design does not ask one control to do every job. It distributes responsibility so later layers can still reduce damage when earlier ones miss something.

Task 3

Trust Boundaries

A trust boundary is the point where untrusted content gains influence over something more privileged. In AI systems, that often happens when external text enters context, when the model can call tools, or when outputs are trusted too quickly by users or downstream systems.

Retrieved documents are a common example. If the system treats them like neutral reference material, but they can quietly steer model behavior like instructions, a hidden boundary has already been crossed.

Blue-team thinking becomes much more concrete once you stop asking "is the model safe?" and start asking "where does untrusted content cross into something with more power?"

Task 4

Prevention, Containment, and Visibility

A secure AI feature should not be judged only by whether it blocks malicious prompts. It should also be judged by whether the system limits damage when prevention fails, whether defenders can detect that failure quickly, and whether the assistant is prevented from turning language manipulation into real-world impact.

Containment lives in things like permission scope, approval gates, and safer defaults. Visibility lives in telemetry, logging, alerting, and investigation workflows. Both matter because prevention is never perfect.

Layered defense is valuable precisely because no single control can carry the whole system safely by itself.

Task 5

Follow-on Controls Check

A prompt injection has already influenced the model context in a support assistant. Select the follow-on controls that still reduce damage even after that first preventive layer failed.

Which follow-on controls still reduce damage after the initial prompt-injection prevention layer failed?

Alert on disclosure attempts and unusual tool behaviorNarrow tool permissions so the assistant cannot act with broad authorityRequire approval before export, external email, or credit actionsChange the chatbot mascot and marketing headlineReview or redact sensitive output before it reaches the userAdd more interface animation so the product feels safer

Task 6

Layer Ownership Check

Match each scenario to the layer doing the main defensive work.

For each situation, choose the defensive layer that is carrying the main load.

Retrieved knowledge-base text is wrapped and kept separate from system instructions before the model sees it.

The assistant wants to export customer data, but the workflow stops until a human approves it.

A draft answer contains internal-only instructions, and a safeguard blocks that text before the user sees it.

Repeated refusal-bypass attempts trigger an alert and give the blue team investigation evidence.

Task 7

Operating Order Check

Put the defensive sequence in the order those controls usually create value during a live failure.

Put the blue-team sequence in order from early reduction to post-incident improvement.

Keep retrieved or external content in an untrusted lane before it shapes behavior

Require approval or least privilege before high-impact actions can happen

Generate telemetry and alerts when suspicious behavior still occurs

Investigate, contain, and improve the system after the event

Practical

Complete the live review task below to apply the lesson the way a defender would in a real design review.

practical

Defense in depth

Layered Defense Drill

Live lab

Build a practical control stack for a support assistant so one prompt failure does not immediately become data exposure, risky action, or a silent incident.

Study lab progress0%

Prompt and context layer

Which control best reduces the chance that hostile text quietly becomes trusted instruction?

Action and tool layer

Which control best limits damage if the model still tries to take a high-risk action?

Output and disclosure layer

Which control best stops harmful or privileged content from reaching the user?

Monitoring and response layer

Which control best helps the blue team notice and investigate the failure if it still happens?

Choose one strongest control for each layer, then validate the stack to clear the practical.

Ready To Move On?

Up next: AI Attack Surface for Defenders

Back to Path Continue to Next Room