The ROC (Receiver Operating Characteristic) curve is a graphical representation of a classifier’s performance across different thresholds, showing the trade-off between sensitivity (the true positive rate) and specificity (which equals 1 minus the false positive rate).

It is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.

It plots the true positive rate (TPR) against the false positive rate (FPR) at different threshold settings.
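
Concretely, TPR = TP / (TP + FN) and FPR = FP / (FP + TN). The sketch below, using hypothetical labels and predictions (illustrative values only), computes both rates from a confusion matrix:

from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted classes (illustrative only)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, ravel() returns tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

tpr = tp / (tp + fn)   # sensitivity (recall)
fpr = fp / (fp + tn)   # 1 - specificity
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")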

Why Use Predicted Probabilities?

In ROC analysis, predicted probabilities (y_probs) are used instead of predicted classes (y_pred) because the ROC curve evaluates the model’s performance across different threshold levels. Probabilities allow you to adjust the threshold to see how it affects sensitivity and specificity.
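
A minimal sketch of this idea, using hypothetical probability scores: a probability vector can be re-thresholded into class labels at any cut-off, whereas fixed class labels cannot be un-thresholded.

import numpy as np

# Hypothetical predicted probabilities for the positive class (illustrative only)
y_probs = np.array([0.9, 0.6, 0.35, 0.8, 0.1])

# Hard class labels depend entirely on the chosen threshold
y_pred_default = (y_probs >= 0.5).astype(int)   # [1, 1, 0, 1, 0]
y_pred_strict = (y_probs >= 0.7).astype(int)    # [1, 0, 0, 1, 0]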

Threshold Level

The threshold level is the probability value above which an instance is classified as the positive class. Adjusting the threshold changes the balance between sensitivity and specificity (see the sketch after this list):

  • Lower Threshold: Increases sensitivity but may decrease specificity.
  • Higher Threshold: Increases specificity but may decrease sensitivity.
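
The sketch below makes this concrete, using hypothetical labels and probabilities (illustrative values only) to compute sensitivity and specificity at a low and a high threshold:

import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and predicted probabilities (illustrative only)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_probs = np.array([0.9, 0.4, 0.55, 0.7, 0.2, 0.6, 0.35, 0.1])

for threshold in (0.3, 0.7):
    y_pred = (y_probs >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    print(f"threshold={threshold}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")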

Example Code

from sklearn.metrics import roc_curve, RocCurveDisplay
import matplotlib.pyplot as plt

# Actual labels and predicted probabilities for the positive class
# (these probabilities correspond to predicted classes 1, 1, 0, 1, 0 at a 0.5 threshold)
y_act = [1, 0, 1, 1, 0]
y_probs = [0.9, 0.6, 0.35, 0.8, 0.1]

# Compute ROC curve from the probability scores
fpr, tpr, thresholds = roc_curve(y_act, y_probs)

# Display ROC curve
roc_display = RocCurveDisplay(fpr=fpr, tpr=tpr).plot()
plt.show()

Logistic Regression Example

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

# Example data (replace with your own X_train, X_test, y_train, y_test)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train a logistic regression model
logreg = LogisticRegression()
logreg.fit(X_train, y_train)

# Predict probabilities for the positive class
y_probs = logreg.predict_proba(X_test)[:, 1]

# Compute ROC curve and AUC
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
roc_auc = roc_auc_score(y_test, y_probs)

# Plot ROC curve
plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, color='blue', lw=2, label=f'ROC Curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='red', linestyle='--', lw=2, label='Random Guessing')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend()
plt.show()
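
The AUC (area under the ROC curve) reported in the plot summarizes the curve in a single number: a value of 0.5 corresponds to random guessing (the dashed diagonal), while a value of 1.0 indicates a perfect classifier.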