K-Nearest Neighbors (KNN) is a non-parametric, supervised learning algorithm used for both classification and regression tasks. It predicts the label of a new data point based on the labels of its k nearest neighbors in the training data, where k is a user-defined positive integer.

How It Works

  • Classification: Assigns the class most common among the k nearest neighbors.
  • Regression: Predicts the average of the target values of the k nearest neighbors.
  • Distance Metric: Common choices include Euclidean and Manhattan distance; the choice affects neighbor selection and model performance.
  • Choice of k: A small k is sensitive to noise; a large k yields smoother predictions but may blur decision boundaries.
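The classification rule above can be sketched in plain Python. This is a minimal illustration, not a library API; the function names and toy data are our own:

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line (Euclidean) distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(X_train, y_train, query, k):
    # Compute the distance from the query to every training point
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(X_train, y_train)
    )
    # Take the labels of the k nearest neighbors and return the majority vote
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy data: two well-separated clusters on a line
X = [(1.0,), (1.5,), (2.0,), (8.0,), (8.5,), (9.0,)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(X, y, (2.2,), k=3))  # → a
```

Swapping `euclidean` for a Manhattan distance (sum of absolute differences) changes only the distance function; the vote is unchanged.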

Characteristics

  • Non-parametric: Makes no assumptions about the underlying data distribution.
  • Instance-based: Stores training data and delays computation until prediction.
  • Simple and interpretable: Easy to understand and implement.
  • Computationally expensive: Requires distance computation to all training points at prediction time.
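The instance-based and computationally expensive characteristics are visible in a regression sketch: "training" merely stores the data, and every prediction scans all stored points. This is an illustrative class of our own, not a standard API:

```python
import math

class KNNRegressor:
    """Minimal instance-based regressor (a sketch, not a library API):
    fitting just stores the data; prediction averages the k nearest targets."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # "Training" only stores the data; no parameters are learned
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, query):
        # Prediction computes a distance to every stored point: O(n) per query
        dists = sorted(
            (math.dist(x, query), target) for x, target in zip(self.X, self.y)
        )
        neighbors = [target for _, target in dists[: self.k]]
        return sum(neighbors) / len(neighbors)

model = KNNRegressor(k=2).fit([(0.0,), (1.0,), (2.0,), (10.0,)],
                              [0.0, 1.0, 2.0, 10.0])
print(model.predict((1.4,)))  # → 1.5 (average of targets 1.0 and 2.0)
```

In practice, libraries mitigate the per-query cost with spatial index structures such as k-d trees or ball trees rather than a brute-force scan.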

Use Cases

  • Works well when the decision boundary is irregular or non-linear.
  • Most effective on smaller datasets due to computational cost.
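The irregular-boundary point can be made concrete with XOR-style data, where opposite corners share a class and no single straight line separates the classes, yet a local neighbor vote still classifies correctly. The data and helper below are a hypothetical illustration:

```python
from collections import Counter
import math

def knn_classify(X_train, y_train, query, k=3):
    # Majority vote among the k nearest training points (Euclidean distance)
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# XOR-style layout: corners (0,0)/(1,1) are class 0, corners (0,1)/(1,0) are class 1,
# so the true decision boundary is non-linear
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1),   # class 0
     (0.0, 1.0), (0.2, 0.9), (1.0, 0.0), (1.1, 0.1)]   # class 1
y = [0, 0, 0, 0, 1, 1, 1, 1]

print(knn_classify(X, y, (0.95, 0.95)))  # query near corner (1, 1) → 0
```

Because KNN makes no global assumption about the boundary's shape, it adapts to whatever local structure the training data exhibits.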

Applications