The sigmoid function is an activation function that maps any real-valued input to a value between 0 and 1 via sigmoid(z) = 1 / (1 + e^(-z)). Its primary advantages are that its output can be interpreted as a probability and that it introduces non-linearity into a model. However, it suffers from vanishing gradients, slow convergence during training, and non-zero-centered output, all of which can hinder optimization. Despite these limitations, it remains widely used in binary classification problems, particularly in the output layer, because of its probability interpretation.
In a model predicting whether a customer will click on an ad (click/no-click), the sigmoid function is used in the output layer: an output of 0.8 signifies an 80% predicted probability of a click. During backpropagation, however, when the output saturates near 0 or 1, the local gradient sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) approaches 0. This becomes a problem in deep networks because these small factors are multiplied through the layers via the chain rule, so the gradient vanishes and the earlier layers learn very slowly (see the sketch after the code below). In addition, because the output is always positive (between 0 and 1), the gradients of all weights feeding a given neuron share the same sign, which can produce a zig-zag pattern in weight updates during gradient descent.
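The snippet below implements the sigmoid with NumPy and plots both the function and its derivative. The derivative plot makes the vanishing-gradient problem visible: sigmoid'(z) peaks at only 0.25 (at z = 0) and falls toward 0 once |z| grows past roughly 4.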
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    """
    Computes the sigmoid of z.

    Args:
        z: A scalar or numpy array of any size.

    Returns:
        Sigmoid of z, with the same shape as the input.
    """
    # Clip z to keep np.exp from overflowing for large negative inputs.
    # Note: np.exp on arrays does not raise OverflowError; it silently
    # returns inf, so clipping is the reliable way to stay numerically safe.
    z = np.clip(z, -500, 500)
    return 1 / (1 + np.exp(-z))
# Generate data for plotting
z = np.linspace(-10, 10, 200)
sigmoid_z = sigmoid(z)
# Plotting the sigmoid function
plt.figure(figsize=(8, 6))
plt.plot(z, sigmoid_z)
plt.xlabel("z")
plt.ylabel("sigmoid(z)")
plt.title("Sigmoid Function")
plt.grid(True)
# Illustrating vanishing gradients
plt.figure(figsize=(8,6))
plt.plot(z, sigmoid(z)*(1-sigmoid(z))) # Plot the derivative of the sigmoid function
plt.xlabel("z")
plt.ylabel("sigmoid'(z)")
plt.title("Derivative of Sigmoid Function (Illustrates Vanishing Gradients)")
plt.grid(True)
plt.show()
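To make the multiplication-through-layers effect from the ad-click example concrete, the short sketch below chains the local sigmoid derivative across a few layers. It is a minimal illustration rather than full backpropagation: the pre-activation values are made up, the weight terms of the chain rule are ignored, and it reuses the sigmoid function defined above, so only the shrinking effect of the sigmoid derivative is visible.

# Minimal sketch: how sigmoid derivatives shrink the gradient layer by layer.
# The pre-activation values are hypothetical; in a real network they would
# come from the forward pass.
def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)  # derivative of the sigmoid, at most 0.25

pre_activations = [3.0, -2.5, 4.0, -3.5, 2.0]  # one assumed value per hidden layer

grad = 1.0  # gradient arriving from the loss at the output
for layer, z_value in enumerate(reversed(pre_activations), start=1):
    grad *= sigmoid_grad(z_value)  # chain rule: multiply by the local derivative
    print(f"gradient factor after {layer} layer(s): {grad:.6f}")

Running this shows the gradient factor collapsing by several orders of magnitude after only five layers, which is exactly the slow learning in early layers described above.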