Study Card: Pros and Cons of Tanh Activation Function

Direct Answer

The hyperbolic tangent (tanh) activation function is a non-linear function used in neural networks, defined as tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), which maps any real input to the range (-1, 1). A key advantage of tanh over sigmoid is its zero-centered output, which often leads to faster convergence during training. However, tanh still suffers from the vanishing gradient problem: for inputs of large magnitude it saturates near ±1 and its gradient approaches zero, much as sigmoid does. Tanh is commonly used in the hidden layers of neural networks when faster training than sigmoid is desired, but it is usually a poor choice for very deep networks, where gradients must flow through many layers without diminishing.
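
To make the zero-centered claim concrete, here is a minimal sketch comparing mean activations of tanh and sigmoid over symmetric inputs. The sigmoid helper is added here purely for comparison; it is not part of the card's main implementation below.

import numpy as np

# Hypothetical sigmoid helper, included only for comparison with tanh
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 11)
print("mean tanh output:   ", np.tanh(x).mean())   # ~0: outputs centered on zero
print("mean sigmoid output:", sigmoid(x).mean())   # ~0.5: outputs always positive

Because tanh outputs average around zero while sigmoid outputs average around 0.5, downstream weight updates with tanh are less biased toward one sign, which is the intuition behind its faster convergence.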

Key Terms

Tanh (hyperbolic tangent): a non-linear activation function mapping real inputs to the range (-1, 1).
Zero-centered output: activations symmetric about 0, allowing weight updates in both positive and negative directions more evenly.
Saturation: the flattening of tanh near ±1 for large-magnitude inputs, where its gradient approaches zero.
Vanishing gradient problem: gradients that shrink as they propagate through saturated activations, slowing or stalling learning.
Sigmoid: a related activation function with range (0, 1), against which tanh is commonly compared.

Example

Consider a neural network for image classification. Using tanh in the hidden layers means each neuron's output falls between -1 and 1. This zero-centered output can speed up training compared to sigmoid, whose output ranges from 0 to 1, because it allows weights to be updated in both positive and negative directions more evenly. For example, if a neuron's weighted sum is 3, tanh(3) ≈ 0.995; if it is -3, tanh(-3) ≈ -0.995. However, if the weighted sums grow large in magnitude (e.g., 10 or -10), tanh saturates (tanh(10) ≈ 1, tanh(-10) ≈ -1), producing gradients near zero and slowing down learning. In contrast, for weighted sums close to zero (e.g., 0.5), tanh(0.5) ≈ 0.46 and the gradient is much larger (1 - 0.46^2 ≈ 0.79), allowing effective weight updates.

Code Implementation

import numpy as np
import matplotlib.pyplot as plt

# Tanh activation function
def tanh(x):
    return np.tanh(x)

# Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2
def tanh_derivative(x):
    return 1.0 - np.tanh(x)**2

# Generate input values
x = np.linspace(-10, 10, 100)

# Calculate tanh and its derivative
y_tanh = tanh(x)
y_tanh_derivative = tanh_derivative(x)

# Plot tanh
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.plot(x, y_tanh)
plt.title('Tanh Activation Function')
plt.xlabel('Input')
plt.ylabel('Output')
plt.grid(True)

# Plot tanh derivative
plt.subplot(1, 2, 2)
plt.plot(x, y_tanh_derivative)
plt.title('Tanh Derivative')
plt.xlabel('Input')
plt.ylabel('Derivative')
plt.grid(True)

plt.tight_layout()
plt.show()

# Demonstrate saturation at large |x| versus a larger gradient near zero
large_x = 10
print("tanh(10):", tanh(large_x))
print("tanh'(10):", tanh_derivative(large_x))

negative_x = -10
print("tanh(-10):", tanh(negative_x))
print("tanh'(-10):", tanh_derivative(negative_x))

near_zero = 0.5
print("tanh(0.5):", tanh(near_zero))
print("tanh'(0.5):", tanh_derivative(near_zero))

Related Concepts

Sigmoid activation function
Vanishing gradient problem
Activation function saturation
Gradient flow in deep networks