K-Nearest Neighbors (KNN) is a non-parametric, supervised learning algorithm used for both classification and regression tasks. It predicts the label of a new data point from the labels of its k nearest neighbors in the training data, where k is a user-defined positive integer.
How It Works
- Classification: Assigns the class most common among the k nearest neighbors.
- Regression: Predicts the average of the target values of the k nearest neighbors.
- Distance Metric: Common choices include Euclidean and Manhattan distance; the choice affects neighbor selection and model performance.
- Choice of k: A small k is sensitive to noise; a large k gives smoother predictions but may blur decision boundaries. (A sketch of these mechanics follows this list.)
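To make the mechanics concrete, here is a minimal from-scratch sketch in Python. It is illustrative only: the function name `knn_predict` and the toy data are assumptions for this example, and a production system would typically use an optimized library implementation instead.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3, metric="euclidean", task="classification"):
    """Predict the label of x_new from its k nearest training points."""
    # Compute the distance from x_new to every training point.
    if metric == "euclidean":
        dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    elif metric == "manhattan":
        dists = np.abs(X_train - x_new).sum(axis=1)
    else:
        raise ValueError(f"unsupported metric: {metric}")

    # Indices of the k smallest distances.
    nearest = np.argsort(dists)[:k]
    neighbor_labels = y_train[nearest]

    if task == "classification":
        # Majority vote among the k neighbors.
        return Counter(neighbor_labels).most_common(1)[0][0]
    # Regression: average of the neighbors' target values.
    return neighbor_labels.mean()

# Tiny illustration with made-up 2-D points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.5, 0.5]), k=3))  # -> 0
print(knn_predict(X, y, np.array([5.2, 5.4]), k=3))  # -> 1
```

Note how the same neighbor set serves both tasks: classification takes a majority vote over the neighbors' labels, while regression averages their target values.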
Characteristics
- Non-parametric: Makes no assumptions about the underlying data distribution.
- Instance-based: Stores the training data and defers computation to prediction time (hence "lazy learning").
- Simple and interpretable: Easy to understand and implement.
- Computationally expensive: Requires computing distances to all training points at prediction time (see the sketch after this list).
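The lazy-learning trade-off is easy to see with scikit-learn (an assumed dependency for this sketch, not something the text above prescribes): `fit` mostly stores or indexes the training set, while `predict` does the expensive neighbor search. The `algorithm` parameter selects the search strategy: `'brute'` scans every point, while `'kd_tree'` builds a spatial index that can speed up queries in low dimensions.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for a real dataset here.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

clf = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree")
clf.fit(X, y)              # cheap: essentially stores/indexes the training set
print(clf.predict(X[:3]))  # costly part: the neighbor search happens here
```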
Use Cases
- Works well when the decision boundary is irregular or non-linear (demonstrated after this list).
- Most effective on smaller datasets due to computational cost.
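As a hypothetical illustration of both points, the snippet below fits KNN on scikit-learn's `make_moons`, a small synthetic dataset whose interleaving half-circles form a non-linear boundary that a linear classifier would struggle with; the dataset choice and parameter values are assumptions for the demo.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Small, non-linearly separable dataset: two interleaving half-circles.
X, y = make_moons(n_samples=500, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = KNeighborsClassifier(n_neighbors=7).fit(X_tr, y_tr)
print(f"accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```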
Applications
- Recommender systems
- Pattern recognition