Repair the Training Set

Convert a weak dataset repair plan into a trustworthy one by blocking unlabeled rows, duplicates, and split leakage with explicit rules for this toy customer dataset.

Intermediate · 35 min · 140 XP

Mission

This room is meant to be completed end-to-end in one workspace: theory, validation, and the live solve.

Flow

Read, clear the guided checkpoints, then use the runtime. The room assumes the learner is proving understanding step by step.

Time

Expect roughly 35 minutes if you work through the room properly rather than skipping straight to the solve.

Task 1

Briefing

The dataset audit is done. Now the team needs a repair plan that actually changes behavior. In this builder lab, you are editing a compact policy document that controls how the training set gets cleaned before model development continues.

This is a guided builder, not an open notebook. The attack prompts represent the bad outcomes you still need to block. When your plan contains the right safeguards, every attack in the suite will be blocked and the checkpoints will clear.

Focus on the three issues from the triage lab: unlabeled rows, duplicate examples, and customer-level leakage across splits. The exact repair rules in this builder are specific to this toy customer dataset, but the principles match real ML data-cleaning work.
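As a rough illustration of the three repairs, here is a minimal Python sketch. The rows, the field names (`customer_id`, `text`, `label`), and the helper functions are all hypothetical; they are not the lab's actual schema or its builder document, only one way the same principles could look in code.

```python
import random

# Hypothetical toy rows; field names are assumptions, not the lab's schema.
rows = [
    {"customer_id": "c1", "text": "late delivery", "label": "complaint"},
    {"customer_id": "c1", "text": "late delivery", "label": "complaint"},  # exact duplicate
    {"customer_id": "c2", "text": "love the app", "label": None},          # unlabeled
    {"customer_id": "c2", "text": "refund please", "label": "complaint"},
    {"customer_id": "c3", "text": "great support", "label": "praise"},
    {"customer_id": "c4", "text": "works fine", "label": "praise"},
]

def drop_unlabeled(rows):
    """Keep only rows whose label is present and non-empty."""
    return [r for r in rows if r["label"] not in (None, "")]

def drop_duplicates(rows):
    """Keep the first occurrence of each (customer_id, text, label) triple."""
    seen, kept = set(), []
    for r in rows:
        key = (r["customer_id"], r["text"], r["label"])
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept

def split_by_customer(rows, test_fraction=0.25, seed=0):
    """Assign whole customers to one split so no customer spans both."""
    customers = sorted({r["customer_id"] for r in rows})
    random.Random(seed).shuffle(customers)
    n_test = max(1, round(len(customers) * test_fraction))
    test_ids = set(customers[:n_test])
    train = [r for r in rows if r["customer_id"] not in test_ids]
    test = [r for r in rows if r["customer_id"] in test_ids]
    return train, test

clean = drop_duplicates(drop_unlabeled(rows))
train, test = split_by_customer(clean)
```

The split is decided per customer rather than per row, so a customer with several records can never contribute to both training and evaluation.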

Task 2

Objectives

Name the critical data failures

Identify the specific quality problems your repair plan must eliminate before training.

Encode the fixes in the plan

Edit the builder document so each weak practice is replaced with a concrete cleanup rule.

Pass the validation suite

Run the attack checks and clear every checkpoint with a repair plan that blocks all three issues.

Task 3

Key Terms

Validation suite

A set of checks used to test whether a defensive or governance implementation meets the room objective.

Attack case

A representative adversarial or risky input used to pressure-test the learner's implementation.

Task 4

How this room is meant to be used

This builder lab is expected to be completed inside the room rather than skimmed like static documentation. Start with the briefing, move through the objectives in order, and use the runtime or validation steps to prove understanding before you claim completion.

Task 5

What to pay attention to

Focus on the system behavior the room is trying to teach, not just the final answer. Strong room work means understanding why the objective matters, which assumptions are being tested, and what evidence would prove success or failure in a real environment.

  • Track where trust changes inside the scenario.
  • Notice which inputs are attacker-controlled and which controls are supposed to contain them.
  • Use mistakes as signal about the concept gap, not just as failed attempts.

Task 6

What good completion looks like

A strong solve leaves the learner able to explain the technique, reproduce the key step deliberately, and describe how the same issue would be attacked or defended in a real deployment. The room should feel like practice, not trivia.

Task 7

Hint Ladder

Tier 1 · 5 XP

Make the rules explicit

Vague notes like "clean later" will not block the attacks. The plan needs direct instructions for missing labels, duplicates, and split overlap.

Tier 2 · 10 XP

Think in data pipeline controls

Good repair plans specify what gets dropped, what gets reviewed, and how the split is enforced before metrics are calculated.
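One way to picture "dropped, reviewed, enforced" is the sketch below. It assumes hypothetical field names (`customer_id`, `text`, `label`) and an assumed policy: unlabeled rows and exact repeats are dropped, records that appear with conflicting labels go to human review, and a guard raises an error before any metric is computed if a customer appears in both splits. The lab's own rules may differ.

```python
from collections import defaultdict

def triage(rows):
    """Sort rows into keep / drop / review buckets.

    drop:   rows with no label, and exact repeats of a row already kept
    review: rows whose (customer_id, text) appears with conflicting labels
    """
    labels_by_key = defaultdict(set)
    for r in rows:
        if r["label"]:
            labels_by_key[(r["customer_id"], r["text"])].add(r["label"])

    keep, drop, review, seen = [], [], [], set()
    for r in rows:
        key = (r["customer_id"], r["text"])
        if not r["label"]:
            drop.append(r)      # unlabeled
        elif len(labels_by_key[key]) > 1:
            review.append(r)    # conflicting labels need a human decision
        elif key in seen:
            drop.append(r)      # exact duplicate of a kept row
        else:
            seen.add(key)
            keep.append(r)
    return keep, drop, review

def assert_no_customer_overlap(train, test):
    """Raise before any metric is computed if a customer is in both splits."""
    shared = {r["customer_id"] for r in train} & {r["customer_id"] for r in test}
    if shared:
        raise ValueError(f"customer leakage across splits: {sorted(shared)}")

# Hypothetical rows for illustration only.
rows = [
    {"customer_id": "c1", "text": "late delivery", "label": "complaint"},
    {"customer_id": "c1", "text": "late delivery", "label": "complaint"},
    {"customer_id": "c2", "text": "it is fine", "label": "praise"},
    {"customer_id": "c2", "text": "it is fine", "label": "complaint"},
    {"customer_id": "c3", "text": "no label yet", "label": None},
]
keep, drop, review = triage(rows)
```

Calling `assert_no_customer_overlap(train, test)` immediately after splitting turns the leakage rule into a hard stop rather than a note someone has to remember.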

Tier 3 · 15 XP

The attack suite is looking for concrete phrases

Add specific cleanup rules for unlabeled rows, duplicate customer records, and grouping splits by customer ID in this lab's dataset.
