Clean Up a Tiny Dataset

Inspect a tiny dataset with missing values, inconsistent labels, a duplicate row, and an identifier field that should not be treated as a normal feature. The goal is to practice cleanup judgment, not just artifact hunting.

beginner · 35 min · 125 XP

Mission

This room is meant to be completed end-to-end in one workspace: theory, validation, and the live solve.

Flow

Read the material, clear the guided checkpoints, then use the runtime. The room assumes you prove your understanding step by step.

Time

Expect roughly 35 minutes if you work through the room properly rather than skipping straight to the solve.

Task 1

Briefing

You have been handed a small dataset that looks almost usable. The problem is that "almost usable" is exactly where beginners can get into trouble.

In this lab, your job is not to build a model. Your job is to inspect the files, notice the visible issues, and make the kind of cleanup judgments a careful beginner should make before training is even discussed.

This lab stays intentionally small so you can focus on the reasoning: inspect, compare, decide, and explain.

Task 2

Objectives

Inspect the workspace methodically

Use the terminal to map the workspace, read the data, and compare it with the notes.

Identify concrete quality issues

Find the missing value, duplicate row, inconsistent label style, and weak identifier field.

Make practical cleanup judgments

Explain which fields should be normalized, excluded, or flagged before trusting the data.

Task 3

Key Terms

Artifact

A file, trace, or operational clue inside the lab that helps the learner progress toward the solve.

Working directory

The current filesystem location from which terminal commands operate inside the lab.

Runtime

The live environment where the learner inspects artifacts, executes tasks, and proves the objective.

Task 4

How this room is meant to be used

This terminal lab is expected to be completed inside the room rather than skimmed like static documentation. Start with the briefing, move through the objectives in order, and use the runtime or validation steps to prove understanding before you claim completion.

Task 5

What to pay attention to

Focus on the system behavior the room is trying to teach, not just the final answer. Strong room work means understanding why the objective matters, which assumptions are being tested, and what evidence would prove success or failure in a real environment.

  • Track where trust changes inside the scenario.
  • Notice which inputs are attacker-controlled and which controls are supposed to contain them.
  • Use mistakes as signal about the concept gap, not just as failed attempts.

Task 6

What good completion looks like

A strong solve leaves the learner able to explain the technique, reproduce the key step deliberately, and describe how the same issue would be attacked or defended in a real deployment. The room should feel like practice, not trivia.

Task 7

Hint Ladder

Tier 1 · 5 XP

Start by mapping the files

Use ls or find first so you know where the data, notes, and cleanup guide live.
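
If you would rather map the workspace from Python than from the shell, the sketch below does roughly what `find . -type f` does. It is an illustration only: `map_workspace` is a hypothetical helper name, and the lab's actual file layout is not assumed.

```python
from pathlib import Path


def map_workspace(root="."):
    """Return every file under root as a sorted list of relative paths."""
    base = Path(root)
    # rglob("*") walks the tree recursively; keep files, skip directories.
    return sorted(
        str(path.relative_to(base))
        for path in base.rglob("*")
        if path.is_file()
    )


# Example call (left commented out so the sketch has no side effects):
# for name in map_workspace("."):
#     print(name)
```

Reading the full listing before opening any single file makes it easier to tell the data apart from the notes and the cleanup guide.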

Tier 2 · 10 XP

Compare values carefully

Look for blanks, repeated rows, inconsistent categories, and fields that look like identifiers rather than real features.
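
The four checks named in this hint can be sketched with the Python standard library. Everything about the data here is an assumption made for illustration: the rows, the column names (`customer_id`, `plan`, `monthly_spend`, `churned`), and the helper function names are hypothetical and will not match the lab's real file.

```python
import csv
import io

# Hypothetical sample data; the lab's real file and columns may differ.
SAMPLE_CSV = """customer_id,plan,monthly_spend,churned
1001,basic,20,no
1002,Premium,55,yes
1003,premium,,no
1004,basic,20,no
1004,basic,20,no
1005,BASIC,18,yes
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def find_blanks(rows):
    """Return (row_index, column) pairs where the value is empty."""
    return [
        (index, column)
        for index, row in enumerate(rows)
        for column, value in row.items()
        if value.strip() == ""
    ]


def find_duplicates(rows):
    """Return indexes of rows that exactly repeat an earlier row."""
    seen, duplicates = set(), []
    for index, row in enumerate(rows):
        key = tuple(row.items())
        if key in seen:
            duplicates.append(index)
        seen.add(key)
    return duplicates


def inconsistent_labels(rows, column):
    """Group spellings that differ only by letter case or outer spaces."""
    groups = {}
    for row in rows:
        raw = row[column]
        groups.setdefault(raw.strip().lower(), set()).add(raw)
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}


def identifier_like(rows, column):
    """Heuristic: True if each distinct row has its own value in column."""
    distinct_rows = {tuple(row.items()) for row in rows}
    distinct_values = {row[column] for row in rows}
    return len(distinct_values) == len(distinct_rows)
```

On the sample above, `find_blanks` reports row 2 in `monthly_spend`, `find_duplicates` reports row 4, `inconsistent_labels(rows, "plan")` groups `BASIC`/`basic` and `Premium`/`premium`, and `identifier_like(rows, "customer_id")` returns `True`. The identifier check is only a heuristic: in a dataset this small, a genuine numeric feature can also happen to be unique per row, so it still needs a human judgment.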

Tier 3 · 15 XP

Think like a reviewer, not a collector

The goal is not only to spot issues, but to decide what should be normalized, excluded, or investigated before training.
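
One way to make those decisions reviewable is to write them down as an explicit per-column plan and apply it. The sketch below is a minimal illustration under stated assumptions: the rows, the column names, and the action names (`exclude`, `normalize`, `flag`, `keep`) are hypothetical, not the lab's expected answer.

```python
# Hypothetical rows and column names, for illustration only.
ROWS = [
    {"customer_id": "1001", "plan": "basic", "monthly_spend": "20", "churned": "no"},
    {"customer_id": "1002", "plan": "Premium", "monthly_spend": "55", "churned": "yes"},
    {"customer_id": "1003", "plan": "premium", "monthly_spend": "", "churned": "no"},
    {"customer_id": "1004", "plan": "basic", "monthly_spend": "20", "churned": "no"},
    {"customer_id": "1004", "plan": "basic", "monthly_spend": "20", "churned": "no"},
    {"customer_id": "1005", "plan": "BASIC", "monthly_spend": "18", "churned": "yes"},
]

# One explicit decision per column, so the reasoning can be reviewed.
PLAN = {
    "customer_id": "exclude",   # identifier, not a real feature
    "plan": "normalize",        # same category written in several styles
    "monthly_spend": "flag",    # blanks are reported, not silently filled
    "churned": "keep",
}


def clean(rows, plan):
    """Apply the plan and return (feature_rows, flagged_cells)."""
    features, flagged, seen = [], [], set()
    for index, row in enumerate(rows):
        key = tuple(row.items())
        if key in seen:
            continue  # drop exact duplicates, keeping the first copy
        seen.add(key)
        cleaned = {}
        for column, value in row.items():
            action = plan[column]
            if action == "exclude":
                continue
            if action == "normalize":
                value = value.strip().lower()
            if action == "flag" and value.strip() == "":
                flagged.append((index, column))
            cleaned[column] = value
        features.append(cleaned)
    return features, flagged
```

Calling `clean(ROWS, PLAN)` on this sample returns five feature rows without `customer_id`, with `plan` reduced to `basic` and `premium`, and one flagged cell at `(2, "monthly_spend")`. The blank is deliberately left in place and reported rather than filled, because choosing a fill value is a separate judgment that should be explained, not hidden in code.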
