Z-scores standardize a value relative to a distribution by measuring how many standard deviations it lies from the mean. They are useful for outlier detection and normalisation.

Definition:
The Z-score of a value $x$ is given by:

$$z = \frac{x - \bar{x}}{s}$$

where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
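
A minimal Python sketch of the definition follows. It uses the standard-library `statistics` module, whose `statistics.stdev` computes the sample (n − 1) standard deviation, matching the sample standard deviation in the definition. The function name `z_scores` is hypothetical.

```python
import statistics

def z_scores(values):
    """Return the Z-score of each value, using the sample mean and sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD: divides by n - 1
    if sd == 0:
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    return [(x - mean) / sd for x in values]

print(z_scores([2, 4, 6]))  # [-1.0, 0.0, 1.0]
```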

Interpretation:

  • $z = 0$: The value equals the mean.
  • $|z| > 3$: Indicates a possible outlier (if normality is assumed).
  • Z-scores allow comparisons across different distributions.
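
The sketch below illustrates comparing values across distributions. All of its figures are hypothetical: two classes with made-up exam means and standard deviations.

```python
def z_score(x, mean, sd):
    """Standardize x against a distribution with the given mean and standard deviation."""
    return (x - mean) / sd

# Hypothetical exam scores from two classes with different score distributions.
z_a = z_score(85, mean=70, sd=10)  # class A
z_b = z_score(80, mean=72, sd=4)   # class B

print(z_a, z_b)  # 1.5 2.0
```

Although 80 is the lower raw score, it sits 2.0 standard deviations above its class mean, compared with 1.5 for the 85. It is therefore the more exceptional result relative to its own distribution.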

Assumptions:

  • Data is approximately normally distributed.
  • Useful primarily when comparing existing values to a distribution.

Use Cases:

  • Standardizing data for machine learning algorithms.
  • Detecting anomalies.
  • Ranking or scoring values.
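
For anomaly detection, a common approach flags points whose absolute Z-score exceeds a cutoff. Here is a minimal sketch. The name `flag_outliers` and the default threshold of 3 are illustrative assumptions, not a fixed rule, and the data is made up.

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Return (index, value, z) for each point whose |z| exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD
    flagged = []
    for i, x in enumerate(values):
        z = (x - mean) / sd
        if abs(z) > threshold:
            flagged.append((i, x, z))
    return flagged

# Hypothetical readings: nineteen values near 10 and one extreme value.
data = [10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 11, 9, 10, 12, 10, 9, 11, 10, 10, 100]
print(flag_outliers(data))  # flags only index 19 (the value 100)
```

Two caveats apply. First, the outlier itself inflates the mean and standard deviation, which can mask it. Second, with $n$ points no sample Z-score can exceed $(n-1)/\sqrt{n}$ in magnitude, so a cutoff of 3 can never fire when $n \le 10$.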

Related terms:

Modified Z-Score

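  • Formula: $M_i = \dfrac{0.6745\,(x_i - \tilde{x})}{\text{MAD}}$, where $\tilde{x}$ is the median of the data
    • $\text{MAD} = \operatorname{median}(|x_i - \tilde{x}|)$: Median Absolute Deviation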
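  • Usage:
    • Use this method for datasets with extreme outliers, since the median and MAD are far less sensitive to them than the mean and standard deviation.
    • Points with $|M_i| > 3.5$ are typically flagged as anomalies.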
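
Below is a sketch of the modified Z-score, again using the standard-library `statistics` module. The names `modified_z_scores` and `flag_modified_outliers` are hypothetical. The default cutoff is the commonly used value of 3.5.

```python
import statistics

def modified_z_scores(values):
    """Return the modified Z-score 0.6745 * (x - median) / MAD for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    # 0.6745 is roughly the 75th percentile of the standard normal; it scales MAD
    # so the score is comparable to an ordinary Z-score for normal data.
    return [0.6745 * (x - med) / mad for x in values]

def flag_modified_outliers(values, threshold=3.5):
    """Return the indices of values whose |modified Z-score| exceeds the threshold."""
    return [i for i, m in enumerate(modified_z_scores(values)) if abs(m) > threshold]

# Hypothetical readings with one extreme value.
data = [10, 11, 9, 10, 12, 10, 9, 11, 100]
print(flag_modified_outliers(data))  # [8]
```

Here the median (10) and MAD (1) are essentially unaffected by the 100, so its modified score is about 60.7. By contrast, with only nine points an ordinary Z-score cannot exceed $8/3 \approx 2.67$ in magnitude, so a $|z| > 3$ cutoff could not flag it.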