In decision trees, each leaf node stores a summary of the target values of the training examples that fall into it. This summary can be represented in two ways:
1. Single Predicted Value
- Classification: The leaf predicts the majority class (e.g., if 60% of samples are class A, predict A).
- Regression: The leaf predicts the mean of the target values in that leaf (see the sketch after this list).
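A minimal sketch of both cases, assuming scikit-learn (the toy data and the `max_depth=1` setting are made up for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3], [4], [5]]
y_class = [0, 0, 0, 1, 1, 1]              # class labels
y_reg = [1.0, 1.2, 0.9, 4.8, 5.1, 5.0]    # continuous targets

clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_reg)

print(clf.predict([[4]]))  # [1]: the majority class in the reached leaf
print(reg.predict([[4]]))  # [~4.97]: the mean of the targets in the reached leaf
```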
2. Distribution of Target Values
Instead of storing just a single prediction, the leaf can store the distribution of the target variable:
- Classification: Store class probabilities. For example, if a leaf holds 100 samples, 60 of class 1 and 40 of class 2, the stored distribution is {class 1: 0.6, class 2: 0.4}.
- Regression: Store a histogram or density estimate of the continuous values rather than just their mean (see the sketch after this list).
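A sketch of the distributional view under the same assumptions (scikit-learn, illustrative toy data): `predict_proba` returns the class fractions of the reached leaf, and `apply` reports which leaf each sample lands in, so the raw per-leaf targets can be recovered.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3], [4], [5]]
y_class = [0, 0, 0, 1, 1, 0]
y_reg = np.array([1.0, 1.2, 0.9, 4.8, 5.1, 5.0])

# Classification: the leaf stores class fractions, not just the majority label
clf = DecisionTreeClassifier(max_depth=1).fit(X, y_class)
print(clf.predict_proba([[4]]))   # [[0.33, 0.67]]: a mixed leaf

# Regression: recover the raw targets in each leaf; a histogram or density
# estimate could be built from these instead of keeping only their mean
reg = DecisionTreeRegressor(max_depth=1).fit(X, y_reg)
leaf_ids = reg.apply(X)
for leaf in np.unique(leaf_ids):
    print(f"leaf {leaf}:", list(y_reg[leaf_ids == leaf]))
```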
Why use distributions?
- Enables probabilistic predictions (e.g., predict class probabilities, not just hard labels).
- Provides uncertainty estimates, useful for Bayesian methods and risk-sensitive decisions (see the sketch after this list).
- Improves interpretability by showing the variability in outcomes for samples reaching the same leaf.
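One simple way to obtain such an uncertainty estimate, sketched below under the same scikit-learn assumption (the synthetic data is illustrative): the spread of the training targets inside the reached leaf serves as a rough measure of predictive variance.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)  # noisy targets

reg = DecisionTreeRegressor(max_depth=3).fit(X, y)
train_leaves = reg.apply(X)        # leaf index of each training sample

x_new = np.array([[2.5]])
leaf = reg.apply(x_new)[0]         # leaf the query point falls into
targets = y[train_leaves == leaf]  # training targets that share this leaf
print(f"prediction {targets.mean():.2f} +/- {targets.std():.2f} (per-leaf std)")
```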
This method is commonly applied in:
- Random Forests and Gradient Boosted Trees (both of which produce class probabilities for classification; see the sketch after this list).
- Probabilistic decision tree models for uncertainty-aware predictions.
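As an example of the Random Forest case, a sketch assuming scikit-learn's `RandomForestClassifier` on a synthetic dataset: `predict_proba` averages the per-tree class probability estimates, which in turn come from the class fractions of each tree's leaves.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Averages the class distributions of the leaves reached in each tree,
# giving a probabilistic prediction rather than a single hard label.
print(rf.predict_proba(X[:3]))
```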