Study Card: Loss Function of Logistic Regression

Direct Answer

Logistic regression uses binary cross-entropy, also called log loss, as its loss function. Log loss measures the dissimilarity between the predicted probabilities and the actual binary labels (0 or 1): minimizing it pushes the model to assign high probability to the correct class and low probability to the incorrect one, so the model learns the relationship between the input features and the target variable. It is chosen because, for logistic regression, the resulting objective is convex in the model's weights, so optimization algorithms like gradient descent can reliably find the global minimum and training is efficient.
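
Written out, using the same averaging convention as the code below, for N training examples with true labels y_i in {0, 1} and predicted probabilities \hat{y}_i:

$$
L = -\frac{1}{N} \sum_{i=1}^{N} \Big[\, y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \,\Big]
$$

Only one of the two terms is active for each example: the loss is $-\log(\hat{y}_i)$ when the label is 1 and $-\log(1 - \hat{y}_i)$ when it is 0.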

Key Terms

Binary cross-entropy (log loss); predicted probability; sigmoid function; convexity; gradient descent.

Example

Consider a model predicting whether a customer will click on an ad (1 for click, 0 for no click). For a given customer, if the true label is 1 (click) and the predicted probability of a click is 0.9, the log loss will be low. If the predicted probability is 0.1, the log loss will be high. Minimizing the log loss across all training examples encourages the model to make accurate probability predictions. For instance, if the model predicts probabilities close to 0.5 for all instances, the log loss will be relatively high because the model isn't confident in its predictions.
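
To make those cases concrete, here is a minimal check using only Python's standard library (the per-example loss when the true label is 1 is simply the negative log of the predicted probability):

import math

# Per-example log loss when the true label is 1: -log(predicted probability)
for p in (0.9, 0.5, 0.1):
    print(f"p = {p}: loss = {-math.log(p):.3f}")
# p = 0.9 -> 0.105 (confident and correct: small loss)
# p = 0.5 -> 0.693 (unsure: moderate loss)
# p = 0.1 -> 2.303 (confident and wrong: large loss)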

Code Implementation

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """Sigmoid function: maps a real-valued score (e.g., w·x + b) to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

def log_loss(y_true, y_pred):
    """
    Calculates the binary cross-entropy/log loss.
    Handles extreme probability values (0 or 1) using epsilon to avoid numerical instability.

    Args:
      y_true: True binary labels (0 or 1).
      y_pred: Predicted probabilities.

    Returns:
      The log loss.
    """
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon) # clip predicted probability
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss

# Example usage
y_true = np.array([0, 1, 1, 0, 1])
y_pred_prob = np.array([0.1, 0.9, 0.8, 0.2, 0.7])

loss = log_loss(y_true, y_pred_prob)
print(f"Log Loss: {loss}")

# Visualizing the per-example log loss across predicted probabilities when y_true = 1
y_pred_prob_example = np.linspace(0.001, 0.999, 100)
loss_curve = -np.log(y_pred_prob_example)  # per-example loss for a positive label
plt.figure(figsize=(8, 6))
plt.plot(y_pred_prob_example, loss_curve)
plt.xlabel("Predicted Probability (y_pred) with y_true=1")
plt.ylabel("Log Loss")
plt.title("Log Loss vs Predicted Probability (y_true=1)")
plt.grid(True)

# Visualizing the per-example log loss across predicted probabilities when y_true = 0
loss_curve = -np.log(1 - y_pred_prob_example)  # per-example loss for a negative label
plt.figure(figsize=(8, 6))
plt.plot(y_pred_prob_example, loss_curve)
plt.xlabel("Predicted Probability (y_pred) with y_true=0")
plt.ylabel("Log Loss")
plt.title("Log Loss vs Predicted Probability (y_true=0)")
plt.grid(True)
plt.show()
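
As a sanity check, the hand-rolled log_loss above should agree (up to floating-point tolerance) with a library implementation. The snippet below is a minimal sketch assuming scikit-learn is installed; it reuses the arrays from the example usage and imports sklearn.metrics.log_loss under a different name to avoid shadowing our function.

from sklearn.metrics import log_loss as sk_log_loss

y_true = np.array([0, 1, 1, 0, 1])
y_pred_prob = np.array([0.1, 0.9, 0.8, 0.2, 0.7])

# Both values should match to within floating-point tolerance.
print("Manual implementation:", log_loss(y_true, y_pred_prob))
print("scikit-learn:         ", sk_log_loss(y_true, y_pred_prob))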

Related Concepts