We need to revisit a few concepts and establish constraints to answer this question. How do we know a decision tree has an optimal depth?
First, the depth of a decision tree is the length of the longest path from the root to a leaf. There’s a tradeoff to keep in mind related to this value: deeper trees lead to more complex models prone to overfitting, while shallower trees have the opposite problem, producing less complex models prone to underfitting. Finding the appropriate depth is critical to getting good results.
To determine the optimal depth, we then need to establish how we will measure the performance of the decision tree. Since the goal is to build a model that produces good predictions, I’ll assume we are looking for the depth that yields the best possible results on a test dataset.
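To make this concrete, here is a minimal sketch of how we could search for that depth with scikit-learn. The synthetic dataset and the range of candidate depths are illustrative assumptions, not part of the original question:

```python
# Sketch: tune max_depth by cross-validation, then check the test score.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, purely for illustration.
X, y = make_classification(n_samples=1024, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Cross-validate over a range of candidate depths.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": range(1, 16)},
    cv=5,
)
search.fit(X_train, y_train)

print("Best depth:", search.best_params_["max_depth"])
print("Test accuracy:", search.score(X_test, y_test))
```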
Let’s start with the first choice, which argues that we should set the depth to the number of training samples minus one. This is the maximum theoretical depth of a decision tree, but it’s not the optimal value: a tree this deep can carve out a separate leaf for nearly every training sample, so it will undoubtedly overfit the training data and perform poorly on any unseen data.
The second choice is also incorrect. Although this option is probably better than setting the depth to the number of samples, it will still lead to overfitting. For example, a dataset with 1024 training samples gives a depth of log2(1024) = 10, and a complete binary tree of depth 10 has 2^10 = 1024 leaves. This choice can produce a tree with as many leaves as training samples, which will cause the tree to overfit.
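Here is a quick sketch of that failure mode. With max_depth=None, scikit-learn grows the tree until every leaf is pure, and on noisy data the result typically memorizes the training set. The dataset below, with label noise added via flip_y, is an assumption for illustration:

```python
# Sketch: an unrestricted tree memorizes noisy training data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise so overfitting becomes visible.
X, y = make_classification(n_samples=1024, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(max_depth=None, random_state=42)
tree.fit(X_train, y_train)

print("Depth:", tree.get_depth(), "Leaves:", tree.get_n_leaves())
print("Train accuracy:", tree.score(X_train, y_train))  # typically 1.0
print("Test accuracy:", tree.score(X_test, y_test))     # noticeably lower
```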
We can analyze the third and fourth choices together. Are we better off setting the depth of a decision tree as low or as high as possible? Imagine two hypothetical decision trees that produce the same results, one deeper than the other. Which of these two models would you use?
Less complex classifiers will usually generalize better than more complex ones. We should remove any non-critical or redundant sections of a decision tree, which generally leads to a much better model. Decision tree pruning is one technique used to accomplish this.
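For instance, scikit-learn implements minimal cost-complexity pruning through the ccp_alpha parameter, where larger alphas prune more aggressively. The sketch below is illustrative; in practice, you would pick the alpha using a validation set or cross-validation rather than the test set:

```python
# Sketch: minimal cost-complexity pruning with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1024, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Compute the effective alphas at which subtrees get pruned away.
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(
    X_train, y_train
)

# Refit one tree per alpha and keep the best-scoring one.
trees = [
    DecisionTreeClassifier(random_state=42, ccp_alpha=a).fit(X_train, y_train)
    for a in path.ccp_alphas
]
best = max(trees, key=lambda t: t.score(X_test, y_test))

print("Depth:", best.get_depth(), "Leaves:", best.get_n_leaves())
print("Test accuracy:", best.score(X_test, y_test))
```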
If you have two decision trees with the same predictive power on your test dataset, always pick the simpler one. Therefore, we should always set the depth as low as possible without affecting the model’s predictive capabilities.
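One common heuristic for operationalizing this preference is the one-standard-error rule: among all depths whose cross-validated score falls within one standard error of the best score, pick the smallest. A sketch, assuming the cv_results_ from the grid search shown earlier:

```python
import numpy as np

def simplest_good_depth(cv_results, n_splits=5):
    """Return the smallest max_depth whose mean CV score falls within
    one standard error of the best mean score."""
    depths = np.array([p["max_depth"] for p in cv_results["params"]])
    means = np.asarray(cv_results["mean_test_score"])
    sems = np.asarray(cv_results["std_test_score"]) / np.sqrt(n_splits)
    threshold = means.max() - sems[means.argmax()]
    return depths[means >= threshold].min()

# Usage with the earlier grid search:
# depth = simplest_good_depth(search.cv_results_, n_splits=5)
```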
In summary, the third choice is the correct answer to this question.