Study Card: Pros and Cons of the Sigmoid Function

Direct Answer

The sigmoid function, σ(z) = 1 / (1 + e^(-z)), is an activation function that maps any real input to a value strictly between 0 and 1. Its main advantages are that its output can be read as a probability and that it introduces non-linearity into a model. However, it suffers from vanishing gradients (its derivative never exceeds 0.25 and approaches 0 for large |z|), slow convergence during training, and output that is not zero-centered, all of which can hinder optimization. Despite these limitations, it remains widely used for binary classification, particularly in the output layer, because of its probability interpretation.
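The vanishing-gradient claim follows from the sigmoid's derivative. The derivation below is standard calculus written in this card's notation (input z). The n-layer bound considers only the sigmoid factors and ignores the weights that also appear in the chain rule.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
1 - \sigma(z) = \frac{e^{-z}}{1 + e^{-z}}

\sigma'(z) = \frac{e^{-z}}{(1 + e^{-z})^{2}}
           = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)

\max_{z} \sigma'(z) = \sigma'(0) = \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4},
\qquad
\prod_{k=1}^{n} \sigma'(z_k) \le \left(\tfrac{1}{4}\right)^{n}
```

For n = 10 sigmoid layers, the sigmoid factors alone shrink the gradient by at least a factor of 4^10, roughly a million.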

Key Terms

Example

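In a model predicting whether a customer will click on an ad (click/no-click), the sigmoid function is used in the output layer. An output of 0.8 means the model estimates an 80% probability that the customer clicks. During backpropagation, however, whenever a sigmoid unit's output is close to 0 or 1, its local gradient σ(z)(1 - σ(z)) is close to 0. In deep networks that use sigmoid in hidden layers, these local gradients are multiplied together layer by layer, so the gradient reaching the earliest layers can shrink toward zero (vanishing gradients) and those layers learn slowly. In addition, because sigmoid outputs are always positive, every input to the next layer is positive. The gradients of a neuron's incoming weights then all share the same sign, so each gradient-descent step moves those weights together in one direction, which produces a zig-zag path toward the optimum.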
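A minimal NumPy sketch of the effects described in the example. The logit value ln(4), the toy chain of scalar units with weight 1, the random inputs, and the upstream error delta = -0.3 are illustrative assumptions, not part of the original card. The helper names `sigmoid_grad` and `chain_gradient` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic sigmoid: exp(-|z|) lies in (0, 1], so no overflow.
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def sigmoid_grad(z):
    # Local derivative of the sigmoid: s * (1 - s), at most 0.25 (at z = 0).
    s = sigmoid(z)
    return s * (1.0 - s)

def chain_gradient(depth, w=1.0, x0=0.0):
    # Hypothetical toy network of scalar units: h_k = sigmoid(w * h_{k-1}).
    # Returns d h_depth / d h_0, accumulated with the chain rule.
    h, grad = x0, 1.0
    for _ in range(depth):
        a = w * h
        grad *= w * float(sigmoid_grad(a))
        h = float(sigmoid(a))
    return grad

# Probability reading: a hypothetical click logit of ln(4) maps to 0.8.
p_click = float(sigmoid(np.log(4.0)))
print(f"P(click) = {p_click:.2f}")

# Saturation: outputs near 0 or 1 come with near-zero local gradients.
for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid={float(sigmoid(z)):.5f}  grad={float(sigmoid_grad(z)):.2e}")

# Depth: one factor <= 0.25 per layer, so the product shrinks geometrically.
for depth in [1, 5, 10, 20]:
    print(f"depth={depth:2d}  d h_n / d h_0 = {chain_gradient(depth):.2e}")

# Zig-zag: for one neuron, dL/dw_i = delta * x_i. When every x_i is a sigmoid
# output (always positive), every dL/dw_i has the same sign as delta.
rng = np.random.default_rng(0)
x = sigmoid(rng.normal(size=4))  # positive inputs from a previous sigmoid layer
delta = -0.3                     # hypothetical upstream error for this neuron
grad_w = delta * x
print("weight gradients:", np.round(grad_w, 4))
print("all same sign:", bool(np.all(grad_w < 0)))
```

Because each layer in this toy chain multiplies in a factor of at most 0.25 (the weight is fixed at 1), the printed gradient drops by at least a factor of 4 per added layer.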

Code Implementation

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """
    Computes the sigmoid of z.

    Args:
        z: A scalar or numpy array of any size.

    Returns:
        Sigmoid of z.
    """
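    # Numerically stable form: exp(-|z|) lies in (0, 1], so it never overflows.
    # For z >= 0 this is 1 / (1 + e^-z); for z < 0 it is e^z / (1 + e^z).
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))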

# Generate data for plotting
z = np.linspace(-10, 10, 200)
sigmoid_z = sigmoid(z)

# Plotting the sigmoid function
plt.figure(figsize=(8, 6))
plt.plot(z, sigmoid_z)
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.title("Sigmoid Function")
plt.grid(True)

# Illustrating vanishing gradients
plt.figure(figsize=(8, 6))
plt.plot(z, sigmoid_z * (1 - sigmoid_z))  # sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)); peaks at 0.25 when z = 0
plt.xlabel("z")
plt.ylabel("sigmoid'(z)")
plt.title("Derivative of Sigmoid Function (Illustrates Vanishing Gradients)")
plt.grid(True)

plt.show()
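The derivative plot relies on the identity sigmoid'(z) = sigmoid(z)(1 - sigmoid(z)). One way to sanity-check that identity (or any hand-written gradient) is a central finite-difference comparison. The step size h = 1e-5 and the helper names `analytic_grad` and `numeric_grad` are illustrative choices, not part of the original card.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic sigmoid; exp(-|z|) lies in (0, 1], so it cannot overflow.
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def analytic_grad(z):
    # Closed-form derivative used in the plot: s * (1 - s).
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_grad(z, h=1e-5):
    # Central finite difference: (f(z + h) - f(z - h)) / (2h).
    return (sigmoid(z + h) - sigmoid(z - h)) / (2.0 * h)

z = np.linspace(-10, 10, 200)
max_abs_err = float(np.max(np.abs(analytic_grad(z) - numeric_grad(z))))
print(f"max |analytic - numeric| = {max_abs_err:.2e}")
print(f"peak derivative = {float(analytic_grad(0.0)):.4f} at z = 0")
```

A tiny maximum error confirms the closed form; the same pattern works for checking gradients of other activation functions.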

Related Concepts