Anomaly Detection

Identify unusual or unexpected data points that deviate from the norm.

There are several ways to detect Outliers

Use visual methods like boxplots, statistical methods like Z-scores, or clustering techniques.

Visual Methods

  • Boxplot: Displays the distribution and identifies outliers using the interquartile range (IQR).
  • Scatter Plot: Helps visually identify outliers.

Clustering

  • Description: Outliers often form small clusters or are isolated from main clusters.

PCA-Based Anomaly Detection

In ML_Tools see: PCA_Based_Anomaly_Detection.py

Time Series Methods

See Time Series.

Statistical Methods

Z-Score: Identifies outliers by measuring how many standard deviations a data point is from the mean.

Gaussian method

Anomaly Detection

Example: You have a dataset of servers unlabled We aim to detect those that do not work (anomalies).

Guassian model

To perform anomaly detection, you will first need to fit a model to the data’s distribution.

  • Given a training set you want to estimate the Gaussian distribution for each of the features .

  • Recall that the Gaussian distribution is given by

    where is the mean and controls the variance.

  • For each feature , you need to find parameters and that fit the data in the -th dimension (the -th dimension of each example).

You can estimate the parameters, (, ), of the -th feature by using the following equations. To estimate the mean, you will use:

and for the variance you will use:

Low proabaility of being togerher. Make a 2D plot of two features. Permute feature cominbations if necessary.

What is multivariate guassian?

  • The low probability examples are more likely to be the anomalies in our dataset.
  • One way to determine which examples are anomalies is to select a threshold based on a cross validation set. What epsilon to choose