File Path: \ML_Tools\Build\Clustering\KMeans\K_Means.py

Key Concepts Used in the Script

  1. Data Loading:

    • The script reads data from a CSV file (penguins.csv) and uses a sample dataset with random features for demonstration purposes.
  2. Data Preprocessing:

    • Standardization: Features are standardized using sklearn.preprocessing.scale and StandardScaler to ensure that all features contribute equally to the clustering process.
  3. Feature Selection:

    • Specific features, such as bill_length_mm and bill_depth_mm, are selected for clustering.
  4. K-Means Clustering:

    • The core clustering algorithm is applied with n_clusters=3.
    • Outputs include cluster centroids and labels for each data point.
  5. Visualization:

    • Scatter plots are used to display the clustering results, highlighting the cluster centroids.
  6. Evaluation of Optimal Clusters:

    • Elbow Method: This method iterates through different numbers of clusters to determine the optimal number based on the within-cluster sum of squares (WCSS).
  7. Cluster Assignment:

    • Labels are assigned to data points, and the results are visualized to show the clustering outcome.
  8. Exploratory Analysis:

    • The script examines the impact of different numbers of clusters using an example function (scatter_elbow).