Bagging or Boosting?

Published September 22, 2022

Katherine wants to use an ensemble model to process her dataset.

There’s only one question for her to answer: Should she use bagging or boosting?

Each technique has its own advantages and disadvantages, and Katherine wants to ensure she evaluates them correctly before committing to one.

Which of the following statements are true about bagging and boosting?

  1. Bagging trains individual models sequentially, using the results from the previous model to inform the selection of training samples.

  2. Boosting trains individual models sequentially, using the results from the previous model to inform the selection of training samples.

  3. Bagging trains a group of models, each using a subset of data selected randomly with replacement from the original dataset.

  4. Each model receives equal weight in bagging to compute the final prediction while boosting uses some way of weighing each model.

The correct answers are 2, 3, and 4.

Ensembling combines a group of models to produce a new model that yields better results than any of the individual models alone. Bagging and boosting are two popular ensemble techniques.

Bagging trains a group of models in parallel and independently from each other. Each model uses a subset of the data selected randomly with replacement from the original dataset. In contrast, boosting trains a group of learners sequentially, using the results from each model to inform which samples to use when training the next one.
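To make the contrast concrete, here is a minimal sketch using scikit-learn's BaggingClassifier and AdaBoostClassifier (a popular boosting implementation). The synthetic dataset and hyperparameters are made up for illustration; they are not part of the original question.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Katherine's dataset.
X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Bagging: 50 models trained independently, each on a bootstrap
# sample drawn with replacement from the training data.
bagging = BaggingClassifier(n_estimators=50, random_state=42)
bagging.fit(X_train, y_train)

# Boosting: 50 weak learners trained sequentially; each round
# emphasizes the samples the previous learner got wrong.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)
boosting.fit(X_train, y_train)

print("Bagging accuracy: ", bagging.score(X_test, y_test))
print("Boosting accuracy:", boosting.score(X_test, y_test))
```

Because the bagged models are independent, they can be trained in parallel; the boosted learners cannot, since each one depends on the results of its predecessor.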

This summary shows that the first choice is incorrect, while the second and third choices are correct.

Finally, when computing the final prediction, bagging averages the results of each model, giving every model equal weight. Boosting, however, weights each model's contribution according to its performance. Therefore, the fourth choice is also correct.
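A toy example shows how the two aggregation schemes can disagree. The predictions and weights below are made up purely for illustration:

```python
import numpy as np

# Hypothetical binary predictions from three models on three samples.
predictions = np.array([
    [1, 0, 1],  # model 1
    [0, 1, 1],  # model 2
    [0, 1, 0],  # model 3
])

# Bagging: every model counts equally, so the final prediction is a
# simple majority vote (average the votes and round).
bagging_pred = (predictions.mean(axis=0) >= 0.5).astype(int)
print(bagging_pred)  # [0 1 1]

# Boosting: each model's vote is scaled by a weight derived from its
# performance. These weights are made up; model 1 dominates.
weights = np.array([0.6, 0.25, 0.15])
boosting_pred = (weights @ predictions >= 0.5).astype(int)
print(boosting_pred)  # [1 0 1]
```

With equal votes, the two models that disagree with model 1 win on the first two samples; once model 1's stronger performance gives it a dominant weight, its predictions prevail instead.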
