Dataset Triage Lab

Inspect a toy churn dataset, identify missing labels, duplicate records, and split leakage, then prove what is wrong from the artifacts in the lab.

Beginner | 30 min | 120 XP

Mission

This room is meant to be completed end-to-end in one workspace: theory, validation, and the live solve.

Flow

Read, clear the guided checkpoints, then use the runtime. The room assumes the learner is proving understanding step by step.

Time

Expect roughly 30 minutes if you work through the room properly rather than skipping straight to the solve.

Task 1

Briefing

You have been handed a tiny churn-training workspace that already "looks good enough" to a rushed team. Your job is to slow down and inspect the artifacts before anyone trusts the metrics.

The dataset is small enough to reason about manually. Use the terminal to inspect the files, compare the splits, and answer each checkpoint in order.

Treat this like a first-pass audit. You are not fixing the dataset yet. You are proving what is broken and why it matters.

Task 2

Objectives

Inspect the training artifacts

Use the terminal to read the dataset files, audit notes, and split manifest inside the workspace.

Prove the quality issues

Identify the missing labels, duplicate row pair, and leaking customer split from the evidence in the files.

Build a trustworthy audit trail

Answer each checkpoint with the concrete signal the operator would report back to the team.
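
The missing-label and duplicate-row checks named in the objectives above can be sketched in a few lines of Python. This is a minimal sketch, not the lab's own tooling: the sample data, the column names (row_id, customer_id, churned), and the helper names are all invented for illustration, and the real workspace files may be laid out differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data: the real lab files and column names may differ.
SAMPLE_CSV = """row_id,customer_id,tenure_months,monthly_spend,churned
1,C001,12,40.0,0
2,C002,3,75.5,1
3,C003,24,20.0,
4,C004,8,55.0,0
5,C002,3,75.5,1
6,C005,15,33.0,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_label_rows(rows, label="churned"):
    """Return rows whose label is absent, empty, or whitespace only."""
    return [r for r in rows if not (r.get(label) or "").strip()]


def duplicate_groups(rows, ignore=("row_id",)):
    """Group rows that match on every column except those in `ignore`."""
    groups = defaultdict(list)
    for r in rows:
        key = tuple((k, v) for k, v in sorted(r.items()) if k not in ignore)
        groups[key].append(r)
    return [g for g in groups.values() if len(g) > 1]


rows = load_rows(SAMPLE_CSV)
missing_ids = [r["row_id"] for r in missing_label_rows(rows)]
duplicate_ids = [[r["row_id"] for r in g] for g in duplicate_groups(rows)]
print("rows with no label:", missing_ids)
print("duplicate groups:", duplicate_ids)
```

Reporting row identifiers rather than counts is deliberate: a checkpoint answer such as "rows 2 and 5 are identical apart from row_id" is something a teammate can verify directly against the raw file.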

Task 3

Key Terms

Artifact

A file, trace, or operational clue inside the lab that helps the learner progress toward the solve.

Working directory

The current filesystem location from which terminal commands operate inside the lab.

Runtime

The live environment where the learner inspects artifacts, executes tasks, and proves the objective.

Task 4

How this room is meant to be used

This terminal lab is expected to be completed inside the room rather than skimmed like static documentation. Start with the briefing, move through the objectives in order, and use the runtime or validation steps to prove understanding before you claim completion.

Task 5

What to pay attention to

Focus on the system behavior the room is trying to teach, not just the final answer. Strong room work means understanding why the objective matters, which assumptions are being tested, and what evidence would prove success or failure in a real environment.

  • Track where trust changes inside the scenario.
  • Notice which inputs are attacker-controlled and which controls are supposed to contain them.
  • Use mistakes as signal about the concept gap, not just as failed attempts.

Task 6

What good completion looks like

A strong solve leaves the learner able to explain the technique, reproduce the key step deliberately, and describe how the same issue would be attacked or defended in a real deployment. The room should feel like practice, not trivia.

Task 7

Hint Ladder

Tier 1 | 5 XP

Start with the directory map

Use find or ls first so you know which data, reports, and notes are available before diving into a single file.
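
If you prefer a script to find or ls, the same directory map can be built with Python's standard library. This is a rough sketch under the assumption that Python is available in the runtime; list_artifacts is a hypothetical helper name, not part of the lab.

```python
from pathlib import Path


def list_artifacts(root="."):
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    base = Path(root)
    files = [p for p in base.rglob("*") if p.is_file()]
    return sorted((p.relative_to(base).as_posix(), p.stat().st_size) for p in files)


def format_map(entries):
    """Render the entries as aligned 'size  path' lines for printing."""
    return "\n".join(f"{size:>8}  {rel_path}" for rel_path, size in entries)
```

Calling print(format_map(list_artifacts("."))) from the workspace root prints one line per file. The sizes are a cheap first signal: an empty file or two files of identical size are worth opening early.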

Tier 2 | 10 XP

Compare the split artifacts, not just the row counts

Leakage often hides in repeated entities across train and validation, even when the files look separate.
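
One way to check for repeated entities is to intersect the entity IDs of the two splits. The sketch below assumes each row carries a customer_id column, which is a guess at the lab's schema; the toy splits are invented and are not the lab's data.

```python
def shared_entities(train_rows, val_rows, key="customer_id"):
    """Return the sorted entity IDs that appear in both splits."""
    train_ids = {r[key] for r in train_rows}
    val_ids = {r[key] for r in val_rows}
    return sorted(train_ids & val_ids)


# Hypothetical toy splits: no row is repeated verbatim and the row counts
# differ, yet one customer appears on both sides of the split.
train = [
    {"customer_id": "C001", "month": "2024-01", "churned": "0"},
    {"customer_id": "C002", "month": "2024-01", "churned": "1"},
    {"customer_id": "C003", "month": "2024-01", "churned": "0"},
]
val = [
    {"customer_id": "C004", "month": "2024-02", "churned": "0"},
    {"customer_id": "C002", "month": "2024-02", "churned": "1"},
]

leaked = shared_entities(train, val)
print("customers in both splits:", leaked)
```

Comparing IDs rather than whole rows matters here: the two C002 rows differ in their month value, so an exact-row duplicate check would miss the overlap even though the model has already seen that customer during training.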

Tier 3 | 15 XP

The audit report already hints at the three issues

Use the summary report to narrow the search, then confirm the exact rows or customer IDs from the raw files.
