A probabilistic classifier based on Bayes’ theorem, assuming that features are conditionally independent given the class. This assumption is why it’s called “naive.”
Bayes' theorem:

P(C|X) = P(X|C) · P(C) / P(X)

where:
- P(C|X) = Posterior probability (probability of class C given features X)
- P(X|C) = Likelihood
- P(C) = Prior probability
- P(X) = Evidence
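The formula above can be checked with a quick numeric sketch. The probabilities below are made up purely for illustration: classifying an email as spam given that it contains the word "offer".

```python
# Hypothetical numbers (assumptions, not real data):
p_spam = 0.2        # prior P(C): fraction of all emails that are spam
p_word_spam = 0.5   # likelihood P(X|C): "offer" appears in a spam email
p_word_ham = 0.1    # P(X|not C): "offer" appears in a non-spam email

# Evidence P(X) via the law of total probability
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# Posterior P(C|X) = P(X|C) * P(C) / P(X)
posterior = p_word_spam * p_spam / p_word   # ≈ 0.556
```

Note how a word five times more likely in spam lifts the posterior from the 0.2 prior to roughly 0.56.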
When to use:
- Text classification (spam detection, sentiment analysis).
- Problems with many independent features.
Why Naive Bayes?
It assumes features are conditionally independent, which is rarely true in practice, yet the classifier often works surprisingly well anyway.
- Simple, fast, and works well for high-dimensional data (e.g., Text Classification).
- Treats data as a bag of features (order doesn’t matter).
- Handles categorical and numeric data (with assumptions):
  - Categorical: works directly or via encoding.
  - Numeric: assumes each feature follows a normal distribution (Gaussian NB).
Advantages:
- Scales well to large datasets.
- Requires only a small amount of training data.
- Effective for text and document classification.
Smoothing Issue:
- A feature value never seen with a class during training would get zero probability, which zeroes out the entire product of likelihoods. Adding a small count alpha to every feature (Laplace smoothing) avoids this.
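The arithmetic behind Laplace smoothing is simple enough to show directly. The counts below are hypothetical; V is the vocabulary size, and alpha is the smoothing constant:

```python
# Hypothetical counts for one word w in one class c:
count_w = 0    # w never appeared with class c in training
total = 100    # total word tokens observed in class c
vocab = 50     # vocabulary size V
alpha = 1.0    # Laplace smoothing constant

# Without smoothing: a hard zero that wipes out the whole product
p_unsmoothed = count_w / total

# With smoothing: (count + alpha) / (total + alpha * V) stays positive
p_smoothed = (count_w + alpha) / (total + alpha * vocab)
```

With alpha = 1 the unseen word gets probability 1/150 instead of 0, so one missing word no longer vetoes the class.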
Types of Naive Bayes
- BernoulliNB: For binary features (e.g., word presence/absence).
- MultinomialNB: For count-based features (e.g., word counts in text).
- GaussianNB: For continuous features (assumes Gaussian distribution).
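The three variants share the same API and differ only in how they model each feature. A minimal sketch on tiny made-up count data (4 samples, 3 features):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy data, invented for illustration: 4 "documents", 3 "words"
X_counts = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
y = np.array([0, 1, 0, 1])

# MultinomialNB models the raw counts directly
mnb = MultinomialNB().fit(X_counts, y)
# BernoulliNB binarizes internally: only presence/absence matters
bnb = BernoulliNB().fit(X_counts, y)
# GaussianNB treats each column as a continuous Gaussian feature
gnb = GaussianNB().fit(X_counts.astype(float), y)

preds = [m.predict(X_counts) for m in (mnb, bnb, gnb)]
```

Because BernoulliNB discards the magnitudes, it can disagree with MultinomialNB whenever repeated words carry the signal.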
Example
```python
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB(alpha=1.0)  # alpha=1.0 applies Laplace smoothing
model.fit(X_train, y_train)       # X_train: count features, e.g. word counts
pred = model.predict(X_test)
```
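To make the snippet above runnable end to end, here is a self-contained sketch with a tiny invented corpus (the documents and labels are assumptions for illustration, not real data), using CountVectorizer to produce the count features MultinomialNB expects:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up corpus; 1 = spam, 0 = ham
train_docs = [
    "free money now",
    "win a free prize",
    "meeting at noon",
    "project status update",
]
train_labels = [1, 1, 0, 0]

vec = CountVectorizer()
X_train = vec.fit_transform(train_docs)  # bag-of-words counts

model = MultinomialNB(alpha=1.0)         # Laplace smoothing
model.fit(X_train, train_labels)

# Every word here appears only in the spam documents
X_new = vec.transform(["free prize money"])
pred = model.predict(X_new)              # → array([1])
```

The vectorizer must be fit on the training corpus only; new documents go through `transform` so they map onto the same vocabulary.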