In decision trees, each leaf node stores a summary of the training examples that fall into it; this summary becomes the prediction for any new sample routed to that leaf. It can be represented in two ways:

1. Single Predicted Value

  • Classification: The leaf predicts the majority class (e.g., if 60% of samples are class A, predict A).
  • Regression: The leaf predicts the mean of the target values in that leaf.
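
Both single-value rules are simple to state in code. A minimal pure-Python sketch (the function names are illustrative, not from any library):

```python
from collections import Counter
from statistics import mean

def leaf_predict_classification(labels):
    # Majority vote: return the most common class among the leaf's samples.
    return Counter(labels).most_common(1)[0][0]

def leaf_predict_regression(targets):
    # Return the mean of the target values stored in the leaf.
    return mean(targets)

# 60% of samples are class "A", so the leaf predicts "A".
print(leaf_predict_classification(["A"] * 60 + ["B"] * 40))  # A
print(leaf_predict_regression([1.0, 2.0, 3.0]))              # 2.0
```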

2. Distribution of Target Values

Instead of storing just a single prediction, the leaf can store the distribution of the target variable:

  • Classification: Store class probabilities. Example: a leaf with 100 samples, 60 of class 1 and 40 of class 2, stores the distribution {class 1: 0.6, class 2: 0.4}.
  • Regression: Store a histogram or density estimate of the continuous values rather than just their mean.
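
A sketch of both options, again in plain Python with illustrative function names: class counts normalised into probabilities for classification, and a coarse equal-width histogram for regression.

```python
from collections import Counter

def leaf_class_distribution(labels):
    # Normalise class counts into probabilities.
    n = len(labels)
    return {cls: count / n for cls, count in Counter(labels).items()}

def leaf_histogram(targets, n_bins=4):
    # Equal-width histogram of continuous targets: {bin_left_edge: count}.
    lo, hi = min(targets), max(targets)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal targets
    counts = Counter(min(int((t - lo) / width), n_bins - 1) for t in targets)
    return {lo + i * width: counts.get(i, 0) for i in range(n_bins)}

# The example from the text: 100 samples, 60 of class 1 and 40 of class 2.
print(leaf_class_distribution([1] * 60 + [2] * 40))  # {1: 0.6, 2: 0.4}
```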

Why use distributions?

  • Enables probabilistic predictions (e.g., predict class probabilities, not just hard labels).
  • Provides uncertainty estimates, useful for Bayesian methods and risk-sensitive decisions.
  • Improves interpretability by showing the variability in outcomes for samples reaching the same leaf.
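
These benefits are easy to illustrate once the leaf stores a distribution. The sketch below (hypothetical helper names) uses Shannon entropy as an uncertainty score for a classification leaf, and the sample standard deviation as a spread estimate for a regression leaf:

```python
from math import log2
from statistics import mean, stdev

def leaf_entropy(distribution):
    # Shannon entropy (in bits) of the leaf's class distribution:
    # 0 for a pure leaf, higher values mean more uncertainty.
    return sum(-p * log2(p) for p in distribution.values() if p > 0)

def leaf_regression_uncertainty(targets):
    # Mean prediction plus a spread estimate from the stored values.
    return mean(targets), stdev(targets)

print(leaf_entropy({"A": 0.5, "B": 0.5}))  # 1.0, maximum two-class uncertainty
```

Two samples can receive the same hard label yet very different uncertainty scores, which is exactly the information risk-sensitive applications need.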

This method is commonly applied in:

  • Random Forests and Gradient Boosted Trees, which produce class probabilities for classification.
  • Probabilistic decision tree models for uncertainty-aware predictions.
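
For instance, a Random Forest classifier's probability output can be seen as soft voting over per-tree leaf distributions. A minimal sketch of that aggregation step (not any particular library's implementation):

```python
def average_distributions(leaf_dists):
    # Soft voting: average the class-probability distributions of the
    # leaves a sample reaches, one distribution per tree in the ensemble.
    classes = set().union(*leaf_dists)
    n = len(leaf_dists)
    return {c: sum(d.get(c, 0.0) for d in leaf_dists) / n for c in classes}

# Three trees route the same sample to leaves with these distributions:
per_tree = [
    {"A": 0.9, "B": 0.1},
    {"A": 0.6, "B": 0.4},
    {"A": 0.3, "B": 0.7},
]
print(average_distributions(per_tree))  # roughly {"A": 0.6, "B": 0.4}
```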