Hailey’s labeling approach

Published

August 25, 2022

Hailey had an excellent idea!

Despite having access to a massive dataset of x-ray pictures, she didn’t have too many labels. The company couldn’t afford to have doctors spending their time labeling samples, so she was in a catch-22 situation: to save the doctor’s time, she wanted to build a supervised learning model, but she couldn’t do that without having the doctors spend more time labeling data.

Hailey found a way to solve the problem: she built a model using the labels she had. Then started processing the unlabeled samples and setting aside those for which her model wasn’t confident.

After each round, Hailey asked doctors only to spend time labeling those low-confidence samples, after which she built a new model and repeated the process.

This approach saved the company a massive amount of money.

How would you classify Hailey’s approach?

  1. Hailey’s approach is a Weak Supervision technique.

  2. Hailey’s approach is a Reinforcement Learning technique.

  3. Hailey’s approach is an Unsupervised Learning technique.

  4. Hailey’s approach is a Semi-supervised Learning technique.

4

Hailey encountered a widespread problem we often face when we want to build machine learning models for companies. Sometimes, we have access to plenty of data, but labeling the dataset is time-consuming or hard to collect without a prohibitive cost.

Hailey did something clever: she only asked doctors to spend time labeling images that had the most value to her model. She built a good model with a fraction of the labels that otherwise would have been necessary.

We call this technique “Active Learning”, a semi-supervised learning technique. Hailey followed a pool-based sampling approach, where she queried the pool of unlabeled samples and used the confidence of her model to decide which instances to label next.

Recommended reading