Decision Trees are considered fragile because small changes in the training data can lead to large changes in the tree structure and predictions.
Why this happens
Greedy splitting
- Trees are built by making locally optimal splits at each node.
- A slight change in the data (e.g., adding one sample or a little noise) can change which split is chosen, altering the entire subtree downstream.
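A minimal sketch of greedy threshold selection makes this concrete (Gini impurity on a single 1-D feature; the data and function names are invented for illustration, not from any library): flipping a single label moves the chosen split, and everything below that node would change with it.

```python
# Hypothetical sketch of the greedy criterion a CART-style tree applies at
# each node: pick the threshold minimizing weighted child Gini impurity.

def gini(labels):
    """Gini impurity for binary 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # fraction of class 1
    return 1.0 - p * p - (1 - p) * (1 - p)

def best_split(xs, ys):
    """Return the threshold with the lowest weighted child impurity."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_t, best_score = t, score
    return best_t

xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))   # → 3 (clean separation)

# Flip one label: the greedy choice moves, and in a real tree the whole
# subtree under this node would be rebuilt differently.
ys_flipped = [0, 0, 1, 1, 1, 1]
print(best_split(xs, ys_flipped))  # → 2
```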
High variance
- Decision trees have low bias but high variance.
- They overfit easily to noise or outliers unless pruned or regularized.
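The overfitting half of this is easy to see in a sketch (purely illustrative data): an unpruned tree on distinct feature values can keep splitting until every leaf holds one sample, which on a 1-D feature is equivalent to a lookup table, so it fits even pure noise perfectly.

```python
import random

# Hypothetical sketch: labels here are pure noise, yet a fully grown tree
# (one leaf per unique x, modeled as a dict) reaches 100% training accuracy.

random.seed(1)
xs = list(range(20))
ys = [random.randint(0, 1) for _ in xs]  # random labels, no signal at all

leaves = dict(zip(xs, ys))  # fully grown 1-D tree == memorization
train_acc = sum(leaves[x] == y for x, y in zip(xs, ys)) / len(xs)
print(train_acc)  # → 1.0: zero bias on the training set, nothing generalizes
```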
Hierarchical structure
- Early splits strongly influence the rest of the tree.
- A different root split due to minor changes cascades into a completely different structure.
Sensitivity to outliers
- Extreme values can dominate split selection, further increasing instability.
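For regression targets the same effect can be sketched with a sum-of-squared-errors split criterion (data and names invented for illustration): a single extreme target value drags the chosen threshold toward isolating that one point.

```python
# Hypothetical sketch: variance-reduction (SSE) splitting for regression.

def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(xs, ys):
    """Threshold minimizing total child SSE (greedy, as in regression trees)."""
    best_t, best = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # skip the split that leaves one side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = sse(left) + sse(right)
        if score < best:
            best_t, best = t, score
    return best_t

xs = [1, 2, 3, 4, 5]
ys = [1.0, 1.1, 0.9, 5.0, 5.2]
print(best_split(xs, ys))   # → 3: separates the low and high groups

# Replace one target with an extreme outlier: the split now chases it.
ys_outlier = [1.0, 1.1, 0.9, 5.0, 100.0]
print(best_split(xs, ys_outlier))  # → 4: the outlier dominates the criterion
```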
Consequence
- Different training sets produce very different trees → poor generalization.
- This is why ensemble methods like Random Forest and Gradient Boosted Trees are widely used; they reduce variance by combining many trees.
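The variance-reduction idea behind bagging (and, with the addition of per-split feature subsampling, Random Forests) can be sketched in a few lines: train many shallow trees on bootstrap resamples and majority-vote their predictions. Everything below (the depth-1 "stump" learner, the data) is a made-up illustration, not a library API.

```python
import random

def stump_fit(xs, ys):
    """Depth-1 tree: a threshold plus a majority class on each side."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lp = round(sum(left) / len(left)) if left else 0
        rp = round(sum(right) / len(right)) if right else 1
        err = sum(y != lp for y in left) + sum(y != rp for y in right)
        if best is None or err < best[0]:
            best = (err, t, lp, rp)
    _, t, lp, rp = best
    return lambda x: lp if x <= t else rp

def bagged_predict(stumps, x):
    """Majority vote over the ensemble; ties go to class 1."""
    votes = sum(s(x) for s in stumps)
    return 1 if votes * 2 >= len(stumps) else 0

random.seed(0)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Bagging: each stump sees a different bootstrap resample, so individual
# stumps vary, but averaging their votes smooths that variance out.
stumps = []
for _ in range(25):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    stumps.append(stump_fit([xs[i] for i in idx], [ys[i] for i in idx]))

print([bagged_predict(stumps, x) for x in [2, 7]])  # → [0, 1]
```

Any single stump here can be skewed by its resample; the ensemble's vote is far more stable, which is exactly the variance reduction the bullet above describes.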