Logistic regression uses binary cross-entropy (log loss) as its loss function. This loss measures the dissimilarity between the predicted probabilities and the actual binary labels (0 or 1). Minimizing the log loss pushes the model to assign high probability to the correct class and low probability to the incorrect class, so it learns the relationship between the input features and the target variable. Log loss is chosen because it is convex in the model parameters for logistic regression, so optimization algorithms like gradient descent can reliably find the global minimum, which makes training efficient.
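For a single example with true label y in {0, 1} and predicted probability p, the log loss is

    loss(y, p) = -(y * log(p) + (1 - y) * log(1 - p))

and the training objective is the mean of this quantity over all training examples.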
Consider a model predicting whether a customer will click on an ad (1 for click, 0 for no click). For a given customer, if the true label is 1 (click) and the predicted probability of a click is 0.9, the log loss will be low. If the predicted probability is 0.1, the log loss will be high. Minimizing the log loss across all training examples encourages the model to make accurate probability predictions. For instance, if the model predicts probabilities close to 0.5 for all instances, the log loss will be relatively high because the model isn't confident in its predictions.
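To make the ad example concrete with y = 1: a predicted probability of 0.9 gives -log(0.9) ≈ 0.105, a prediction of 0.1 gives -log(0.1) ≈ 2.303, and an unconfident prediction of 0.5 gives -log(0.5) ≈ 0.693, so confident correct predictions incur almost no loss while confident wrong ones are penalized heavily.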
import numpy as np
import matplotlib.pyplot as plt
def sigmoid(z):
    """Sigmoid function."""
    return 1 / (1 + np.exp(-z))
def log_loss(y_true, y_pred):
    """
    Calculates the binary cross-entropy/log loss.
    Handles extreme probability values (0 or 1) using epsilon to avoid numerical instability.
    Args:
      y_true: True binary labels (0 or 1).
      y_pred: Predicted probabilities.
    Returns:
      The log loss.
    """
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon) # clip predicted probability
    loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return loss
# Example usage
y_true = np.array([0, 1, 1, 0, 1])
y_pred_prob = np.array([0.1, 0.9, 0.8, 0.2, 0.7])
loss = log_loss(y_true, y_pred_prob)
print(f"Log Loss: {loss}")
# Visualizing log loss across predicted probabilities when y_true = 1
y_pred_prob_example = np.linspace(0.001, 0.999, 100)
loss_curve = -np.log(y_pred_prob_example)  # per-example loss when the true label is 1
plt.figure(figsize=(8, 6))
plt.plot(y_pred_prob_example, loss_curve)
plt.xlabel("Predicted Probability (y_pred) with y_true=1")
plt.ylabel("Log Loss")
plt.title("Log Loss vs Predicted Probability (y_true=1)")
plt.grid(True)

# Visualizing log loss across predicted probabilities when y_true = 0
loss_curve = -np.log(1 - y_pred_prob_example)  # per-example loss when the true label is 0
plt.figure(figsize=(8, 6))
plt.plot(y_pred_prob_example, loss_curve)
plt.xlabel("Predicted Probability (y_pred) with y_true=0")
plt.ylabel("Log Loss")
plt.title("Log Loss vs Predicted Probability (y_true=0)")
plt.grid(True)
plt.show()
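To tie this back to training: because the log loss is convex in the weights, plain gradient descent converges to the global minimum. Below is a minimal sketch that reuses the sigmoid and log_loss functions defined above to fit logistic regression weights; the synthetic data, learning rate, and iteration count are illustrative assumptions rather than tuned choices.

# Gradient-descent sketch: fit logistic regression weights by minimizing the log loss above.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 2
X = rng.normal(size=(n_samples, n_features))
true_w = np.array([2.0, -1.0])                                   # "ground truth" weights for the synthetic data
y = (sigmoid(X @ true_w) > rng.uniform(size=n_samples)).astype(float)  # labels drawn with Bernoulli noise

w = np.zeros(n_features)   # start from zero weights
learning_rate = 0.1        # illustrative choice
for step in range(500):
    p = sigmoid(X @ w)                      # predicted probabilities
    gradient = X.T @ (p - y) / n_samples    # gradient of the mean log loss with respect to w
    w -= learning_rate * gradient
    if step % 100 == 0:
        print(f"step {step}: log loss = {log_loss(y, p):.4f}")

print(f"Learned weights: {w}")

The printed log loss should decrease over the iterations, and the learned weights should move toward the synthetic ground-truth weights, which is exactly the behavior the convexity argument above predicts.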