DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups data points by density: points packed closely together form a cluster, and points in sparse regions are treated as noise. It is particularly useful where K-means performs poorly, such as on datasets with non-convex cluster shapes or with outliers.
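- Used when K-means doesn’t work: DBSCAN handles irregular (non-spherical) cluster shapes and, unlike K-means, is not strongly affected by outliers, because it has no centroids for outliers to pull on.
- When you have nesting of clusters: It can separate clusters that sit inside one another (for example, concentric rings), and unlike K-means it does not need the number of clusters specified in advance. It works best when clusters have similar density, because a single eps value applies to the whole dataset.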
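- Groups core points to make clusters: A core point is one with at least min_samples points (itself included) within distance eps. DBSCAN links core points that lie within eps of each other into one cluster and attaches non-core points within eps of a core point as border points.
- Can identify outliers: Points that are neither core points nor within eps of a core point are labeled as noise and assigned to no cluster.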
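To make the contrast with K-means concrete, here is a minimal sketch on scikit-learn's two-moons toy dataset. The dataset choice, the parameter values (eps=0.2, min_samples=5), and the use of the adjusted Rand index as the score are assumptions made for illustration, not part of the notes above.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-circles: a non-convex shape (illustrative choice)
X_moons, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

# K-means must be told the number of clusters and splits space around centroids
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_moons)

# DBSCAN is given no cluster count; eps and min_samples are assumed values
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X_moons)

# Adjusted Rand index: 1.0 means perfect agreement with the true moons
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)
print(f"K-means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN ARI:  {ari_dbscan:.2f}")
```

On data like this, K-means tends to cut each moon in half, while DBSCAN follows the dense arcs. The result depends on eps, so it is worth trying a few values on your own data.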
Python Example:
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt
# Create sample data
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
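# Apply DBSCAN: eps is the neighborhood radius; min_samples is how many
# points (including the point itself) must lie within eps for a core point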
dbscan = DBSCAN(eps=0.3, min_samples=5)
clusters = dbscan.fit_predict(X)
# Plot results
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='plasma')
plt.show()
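This clusters the data and plots it, coloring each point by its cluster label. Noise points (outliers) receive the label -1 rather than a cluster of their own, so they all appear in one separate color. The plot does not distinguish core points from border points.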
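To see which points are core, border, or noise, the fitted model can be inspected directly. The sketch below reuses the same data and parameters as the example above; the variable names (core_mask, border_mask, noise_mask) are hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Same sample data and parameters as the example above
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
model = DBSCAN(eps=0.3, min_samples=5).fit(X)
labels = model.labels_

# Core points: indices stored by scikit-learn in core_sample_indices_
core_mask = np.zeros(len(X), dtype=bool)
core_mask[model.core_sample_indices_] = True

# Noise points carry the label -1; border points are clustered but not core
noise_mask = labels == -1
border_mask = ~core_mask & ~noise_mask

# Count clusters, excluding the noise label
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(f"clusters: {n_clusters}")
print(f"core: {core_mask.sum()}, border: {border_mask.sum()}, noise: {noise_mask.sum()}")
```

If the noise count looks too high, increasing eps or lowering min_samples usually makes the clustering less strict.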