A Z-score standardizes a value relative to a distribution by measuring how many standard deviations it lies above or below the mean. This makes Z-scores useful for outlier detection and normalisation.
Definition:
The Z-score of a value $x$ is given by:
$$z = \frac{x - \bar{x}}{s}$$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
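The definition translates directly into code. Below is a minimal sketch that uses only Python's standard `statistics` module. The function name `z_scores` and the sample data are illustrative assumptions. To match the definition, it uses `statistics.stdev`, which is the $n - 1$ sample standard deviation. It also raises an error rather than dividing by zero when all values are identical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the Z-score of each value relative to the sample itself.

    Uses the sample mean and the sample standard deviation (n - 1 denominator).
    """
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation; needs at least two values
    if s == 0:
        raise ValueError("standard deviation is zero; Z-scores are undefined")
    return [(x - x_bar) / s for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print([round(z, 3) for z in z_scores(data)])
```

For this sample the mean is 5 and $s \approx 2.138$. The largest value, 9, therefore sits about 1.87 standard deviations above the mean.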
Interpretation:
- $z = 0$: The value equals the mean.
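- $|z| > 3$ (a common cutoff): Indicates a possible outlier if normality is assumed. Under a normal distribution, only about 0.27% of values lie this far from the mean.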
- Z-scores are unitless, so they allow values from different distributions (for example, measured on different scales) to be compared.
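The sketch below puts two raw scores on a common scale to illustrate comparison across distributions. The exam data and the `z_score` helper are hypothetical and were made up for this example.

```python
from statistics import mean, stdev


def z_score(x, sample):
    """Z-score of x relative to a sample (sample mean and sample std dev)."""
    return (x - mean(sample)) / stdev(sample)


# Hypothetical class scores for two exams graded on different scales.
math_scores = [55, 60, 65, 70, 75]      # out of 100
history_scores = [12, 14, 15, 16, 18]   # out of 20

# A student scored 72 in maths and 18 in history. The raw scores are not
# comparable, but Z-scores express both in "standard deviations from the mean".
z_math = z_score(72, math_scores)
z_history = z_score(18, history_scores)
print(f"maths: {z_math:.2f}, history: {z_history:.2f}")
```

Taken alone, 72 looks like the stronger raw score. In standard-deviation terms, however, the history result is further above its class mean (about 1.34 versus 0.89).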
Assumptions:
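- Data is approximately normally distributed. Outlier cutoffs and any probability-based reading of $z$ depend on this assumption.
- Z-scores are most useful for comparing observed values against the distribution they were drawn from.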
Use Cases:
- Standardizing data for machine learning algorithms.
- Detecting anomalies.
- Ranking or scoring values.
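For the machine-learning use case, standardization is usually applied to each feature column separately. Here is a minimal pure-Python sketch, assuming the data is a row-major table of numbers. The helper `standardize_columns` and the feature values are hypothetical.

```python
from statistics import mean, stdev


def standardize_columns(rows):
    """Z-score each column (feature) of a row-major table independently.

    Afterwards each feature has mean 0 and sample standard deviation 1, so
    features on very different scales contribute comparably to a model.
    """
    columns = list(zip(*rows))  # transpose rows into feature columns
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        [(value - m) / s for value, (m, s) in zip(row, stats)]
        for row in rows
    ]


# Hypothetical features: (age in years, annual income in dollars)
X = [
    [25, 40_000],
    [35, 60_000],
    [45, 80_000],
    [55, 100_000],
]
for row in standardize_columns(X):
    print([round(v, 3) for v in row])
```

This version divides by the sample standard deviation to stay consistent with the definition of the Z-score. Libraries may differ. For example, scikit-learn's `StandardScaler` uses the population standard deviation ($n$ denominator), so its output differs slightly on small datasets.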
Related terms:
Modified Z-Score:
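- Formula: $M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}$, where $\tilde{x}$ is the sample median. The constant 0.6745 makes $M_i$ roughly comparable to an ordinary Z-score when the data is normally distributed.
- $\mathrm{MAD} = \operatorname{median}_i\left(\lvert x_i - \tilde{x} \rvert\right)$: Median Absolute Deviation.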
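- Procedure:
  - Use this method for datasets with extreme outliers. The median and MAD are far less influenced by such points than the mean and standard deviation are.
  - Points with $|M_i| > 3.5$ are typically flagged as anomalies.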
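The modified Z-score procedure can be sketched as follows. The sketch assumes the commonly used constant 0.6745 and cutoff 3.5. The function names and the data are hypothetical.

```python
from statistics import median


def modified_z_scores(values):
    """Modified Z-scores: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(x - med) for x in values])  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in values]


def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified Z-score exceeds the threshold in magnitude."""
    scores = modified_z_scores(values)
    return [x for x, m in zip(values, scores) if abs(m) > threshold]


data = [10, 11, 12, 12, 13, 13, 14, 15, 95]
print(flag_anomalies(data))
```

Here the median is 13 and the MAD is 1, so 95 receives a modified Z-score of about 55 and is flagged. Its ordinary Z-score in the same data is only about 2.66, because the outlier itself inflates the mean and standard deviation. This masking effect is why the robust version suits datasets with extreme outliers.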