Anomaly Detection

Anomaly detection involves identifying Outliers. Detecting these anomalies is necessary for maintaining data integrity and improving model performance.

Outlier detection needs formalized criteria (e.g., statistical models, distance measures, density estimates) because “weird” is contextual and depends on distribution.

Methods for Detecting Anomalies

In ML_Tools see:

Pycaret_Anomaly.ipynb
Explore PyOD

Process of Detection

Data Preparation

Data Cleansing: Handle missing values and remove any irrelevant data points.
Normalisation/Standardisation: Scale the data if necessary, especially if using methods sensitive to the scale.

Normality Assumption: Data Distribution

Many classic methods (like z-score, Gaussian Mixture Models) assume normally distributed data]
If the data is not normal, other methods (e.g., Isolation Forest, One-Class SVM) may be better.
Always inspect distributions first (e.g., histogram, Q-Q Plot.

Anomaly Detection with a model: Use a chosen method to flag anomalies

Training and Testing on Same Set (Careful!)

In outlier detection, sometimes you train and test on the same dataset — especially in pure unsupervised settings.
However, overfitting is a risk. It’s common to:
- Train initial model
- Remove detected outliers
- Re-train
- Repeat inspection → removal → retraining (iterative loop you wrote about).

Interpretation | Validation: Why is it unusual

Validate the detected anomalies by comparing them against known anomalies (if available) or using domain knowledge.
Adjust thresholds or methods based on validation results.
Finding an outlier is not enough. You must ask: “Why is this point different?” Necessary for Trust & Actionability.
Example: Anomalous electricity usage — is it fraud, error, or legitimate increased demand?

Visualization

Visualize the results using plots (e.g., scatter plots, box plots) to understand the distribution of data and the identified anomalies.
Boxplot: Displays the distribution and identifies outliers using the interquartile range (IQR).
Scatter Plot: Helps visually identify outliers.

Data Archive

Explorer

Anomaly Detection

Methods for Detecting Anomalies

Process of Detection

Backlinks

Explorer