A probabilistic classifier based on Bayes’ theorem, assuming that features are conditionally independent given the class. This assumption is why it’s called “naive.”

Bayes’ theorem:

  P(C | X) = P(X | C) · P(C) / P(X)

where:

  • P(C | X) = Posterior probability (class given the features)
  • P(X | C) = Likelihood (features given the class)
  • P(C) = Prior probability of the class
  • P(X) = Evidence (probability of the features)
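To make the formula concrete, here is a worked calculation with hypothetical numbers (the probabilities below are made up for illustration): the posterior probability that an email is spam given that it contains the word "free".

```python
# Hypothetical probabilities for a spam filter.
p_spam = 0.3                # prior P(spam)
p_free_given_spam = 0.8     # likelihood P("free" | spam)
p_free_given_ham = 0.1      # likelihood P("free" | not spam)

# Evidence P("free") via the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(spam | "free") by Bayes' theorem.
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # → 0.774
```

Seeing "free" raises the spam probability from the 0.3 prior to about 0.77.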

When to use:

  • Text classification (spam detection, sentiment analysis).
  • Problems with many independent features.

Why Naive Bayes?

It assumes features are conditionally independent given the class — rarely true in practice — yet the classifier often works surprisingly well anyway.

  • Simple, fast, and works well for high-dimensional data (e.g., text classification).
  • Treats data as a bag of features (order doesn’t matter).
  • Handles categorical and numeric data (with assumptions):
    • Categorical: Works directly or via encoding.
    • Numeric: Assumes normal distribution (Gaussian NB).

Advantages:

  • Scales well to large datasets.
  • Requires a small amount of training data.
  • Effective for text and document classification.

Smoothing Issue:

  • If a feature value never co-occurs with a class in training, its estimated likelihood is zero, which zeroes out the entire posterior product. Adding a small constant to every count (Laplace smoothing) avoids this.
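The smoothing arithmetic can be shown directly. The counts below are hypothetical: a word that never appeared with a given class in training.

```python
# Hypothetical counts: the word "prize" never appears in ham training emails.
count_in_class = 0        # occurrences of "prize" in ham
total_in_class = 100      # total word occurrences in ham
vocab_size = 50           # distinct words in the vocabulary
alpha = 1.0               # Laplace smoothing constant

# Without smoothing: P("prize" | ham) = 0, which zeroes out the whole product.
unsmoothed = count_in_class / total_in_class

# With smoothing: (count + alpha) / (total + alpha * vocab_size)
smoothed = (count_in_class + alpha) / (total_in_class + alpha * vocab_size)
print(unsmoothed, round(smoothed, 4))  # → 0.0 0.0067
```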

Types of Naive Bayes

  • BernoulliNB: For binary features (e.g., word presence/absence).
  • MultinomialNB: For count-based features (e.g., word counts in text).
  • GaussianNB: For continuous features (assumes Gaussian distribution).

Example

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count features (e.g., word counts per document).
X_train = np.array([[2, 1, 0], [0, 1, 3], [1, 0, 2], [3, 2, 0]])
y_train = [0, 1, 1, 0]
X_test = np.array([[2, 0, 1]])

model = MultinomialNB(alpha=1.0)  # alpha=1.0 applies Laplace smoothing
model.fit(X_train, y_train)
pred = model.predict(X_test)
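For text classification, the counts are usually produced by a vectorizer rather than written by hand. A minimal end-to-end sketch, assuming a tiny made-up corpus (the texts and labels below are hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training corpus.
texts = ["win a free prize now", "free money win",
         "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]

vec = CountVectorizer()          # bag-of-words: order doesn't matter
X = vec.fit_transform(texts)     # sparse matrix of word counts

model = MultinomialNB(alpha=1.0)
model.fit(X, labels)
print(model.predict(vec.transform(["free prize"])))  # → ['spam']
```

Note that new documents must go through the same fitted vectorizer (`vec.transform`) so the columns line up with the training vocabulary.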