Overview

  • Objective: To classify Iris flowers using SVM and explore various hyperparameters like kernel type, regularization (C), and gamma.
  • Dataset: The Iris dataset contains information about sepal and petal dimensions for three flower species.
  • Soft margins: The regularization parameter C controls how soft the SVM decision boundary is. A smaller C allows a softer boundary (more margin violations), which favors generalization. A larger C enforces a harder boundary with fewer margin violations, but may lead to overfitting. The script includes this exploration.
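
The effect of C on the softness of the boundary can be sketched as follows. This is a minimal illustration, not the original script: the linear kernel, the three C values, and `random_state=0` are assumptions chosen for the example. It counts support vectors, since a softer margin usually leaves more training points on or inside the margin.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # fixed seed is an assumption, for repeatability
)

sv_counts = {}
for C in (0.01, 1, 100):  # illustrative values, from soft to hard margin
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    # n_support_ holds the number of support vectors per class
    sv_counts[C] = int(model.n_support_.sum())
    print(f"C={C}: support vectors={sv_counts[C]}, "
          f"test accuracy={model.score(X_test, y_test):.3f}")
```

A falling support-vector count as C grows is the usual sign that the margin is getting narrower and harder.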

Steps in the Script

1. Data Loading and Preparation

  • The Iris dataset is loaded using sklearn.datasets.load_iris.
  • A DataFrame is created with:
    • Features: Sepal and petal dimensions.
    • Target: Numerical representation of flower species.
    • Flower name: Categorical species name derived from the target.
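
The loading step described above might look like the sketch below. The column names `target` and `flower_name` are assumptions made for illustration; the original script may name them differently.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Features: the four sepal/petal measurements, one column each.
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Target: integer class label (0, 1, or 2).
df["target"] = iris.target

# Flower name: species name looked up from the integer label.
df["flower_name"] = df["target"].apply(lambda i: iris.target_names[i])

print(df.head())
```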

2. Data Visualization

  • The data is visualized to explore relationships between features:
    • Sepal Length vs. Sepal Width for two species (Setosa vs. Versicolor).
    • Petal Length vs. Petal Width for the same species.
  • Scatter plots are used to identify separable patterns.
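
A possible version of these two scatter plots is sketched below. The helper name `plot_two_species`, the side-by-side layout, and the marker style are assumptions, not taken from the original script.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

def plot_two_species(X, y, names, labels=(0, 1)):
    """Scatter sepal and petal dimensions for two species side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    # (x column, y column, flower part) for each panel
    panels = [(0, 1, "Sepal"), (2, 3, "Petal")]
    for ax, (xi, yi, part) in zip(axes, panels):
        for label in labels:
            mask = y == label  # rows belonging to this species
            ax.scatter(X[mask, xi], X[mask, yi], marker="+", label=names[label])
        ax.set_xlabel(f"{part} length (cm)")
        ax.set_ylabel(f"{part} width (cm)")
        ax.legend()
    fig.tight_layout()
    return fig

# Labels 0 and 1 are Setosa and Versicolor in load_iris.
fig = plot_two_species(X, y, iris.target_names)
```

Calling `plt.show()` afterwards displays the figure in an interactive session.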

3. Model Training

  • The data is split into training and testing sets (80%-20%).
  • An SVM classifier (sklearn.svm.SVC) is trained on the training set.
  • The model’s performance is evaluated on the test set using the .score() method, which returns the mean accuracy.
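
The split, fit, and score steps can be sketched as follows. The default `SVC()` settings and `random_state=0` are assumptions; without a fixed seed the reported accuracy varies from run to run.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # fixed seed is an assumption
)

model = SVC()  # default hyperparameters
model.fit(X_train, y_train)

# .score() returns the mean accuracy on the given data
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```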

4. Hyperparameter Tuning

  • Regularization (C):
    • Adjusting C controls the trade-off between achieving a large margin and minimizing classification errors.
    • Lower values of C allow a larger margin but tolerate more misclassified training points.
    • Higher values of C prioritize classifying the training points correctly over keeping the margin large.
  • Gamma:
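    • Controls how far the influence of a single training point reaches (it applies to kernels such as rbf, not to linear). A high value means each point influences only its immediate neighborhood, which produces a more complex boundary that can overfit. A low value means each point's influence reaches far, which produces a smoother boundary.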
  • Kernel:
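    • Different kernels (e.g., linear, rbf) are tested to find the one that separates the classes best. The linear kernel works in the original feature space, while rbf implicitly maps the data into a higher-dimensional space where a nonlinear boundary becomes possible.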
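
One way to try these combinations is a plain loop over candidate settings, as sketched below. The specific C and gamma values are illustrative assumptions, and gamma is only varied for the rbf kernel because the linear kernel ignores it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # fixed seed is an assumption
)

# Candidate settings: linear kernel varies C only, rbf varies C and gamma.
configs = [{"kernel": "linear", "C": C} for C in (0.1, 1, 10)]
configs += [
    {"kernel": "rbf", "C": C, "gamma": gamma}
    for C in (0.1, 1, 10)
    for gamma in (0.01, 0.1, 10)
]

results = []
for params in configs:
    model = SVC(**params).fit(X_train, y_train)
    results.append((params, model.score(X_test, y_test)))

for params, acc in results:
    print(params, f"accuracy={acc:.3f}")
```

With only 30 test samples, differences of a few percent between settings are within noise; cross-validation (for example `sklearn.model_selection.GridSearchCV`) gives a steadier comparison.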

5. Prediction and Accuracy

  • The model is used to predict flower species for new samples.
  • The accuracy of the model is reported for each combination of hyperparameters.
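
Predicting species for new samples might look like the sketch below. The two measurement rows are hypothetical values made up for illustration, in the order sepal length, sepal width, petal length, petal width (cm).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0  # seed is an assumption
)
model = SVC().fit(X_train, y_train)

# Hypothetical new flowers, not taken from the dataset
new_samples = [
    [5.0, 3.4, 1.5, 0.2],
    [6.5, 3.0, 5.5, 2.0],
]

predicted_labels = model.predict(new_samples)
# Map integer labels back to species names
predicted_names = [str(iris.target_names[i]) for i in predicted_labels]

for sample, name in zip(new_samples, predicted_names):
    print(sample, "->", name)
```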