Study Card: ResNet (Residual Neural Network)

Direct Answer

ResNet (Residual Neural Network) is a deep convolutional neural network architecture built from residual blocks, each of which wraps a few layers in a skip (shortcut) connection. The skip connection lets each block learn a residual mapping F(x) and output F(x) + x, instead of having to learn the entire transformation from input to output. Because gradients can also flow backward through the identity path, this mitigates the vanishing gradient problem, enabling the training of far deeper networks and improved performance on image recognition and other tasks.
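In symbols, a residual block outputs H(x) = F(x) + x, so the identity part of the mapping comes for free. A minimal sketch of that forward computation (the toy map F below is a stand-in for the block's layers, not part of any real ResNet):

```python
import numpy as np

def residual_forward(x, F):
    # A residual block outputs F(x) + x: the learned residual plus the identity
    return F(x) + x

x = np.array([1.0, 2.0, 3.0])
F = lambda v: 0.1 * v          # toy stand-in for the block's conv layers
out = residual_forward(x, F)   # -> [1.1, 2.2, 3.3]
```

If F learns to output zeros, the block reduces to the identity mapping, which is why adding residual blocks does not easily degrade a deeper network.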

Key Terms

- Residual block: a unit that computes F(x) + x, learning the residual F(x) rather than the full mapping.
- Skip (shortcut) connection: the identity path that adds a block's input directly to its output.
- Residual mapping: the difference F(x) = H(x) - x between the desired output H(x) and the input x.
- Vanishing gradient problem: gradients shrinking toward zero as they propagate backward through many layers, stalling learning in the early layers.

Example

Consider training a very deep CNN for image classification. As the network gets deeper, a plain stack of convolutional layers is prone to the vanishing gradient problem: gradients reaching the early layers become extremely small, making learning slow or even impossible. In a ResNet, gradients can flow directly through the skip connections, which makes significantly deeper networks trainable. Each residual block learns a residual mapping, the difference between the desired output and the block's input, which smooths optimization. Instead of learning the complete transformation from the input all the way to the classification layer, each block only has to refine the features extracted by shallower layers, making the learning process more robust.
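The gradient argument can be checked numerically: for y = F(x) + x, dy/dx = F'(x) + 1, so even when F's gradient has all but vanished, the skip connection keeps the overall gradient near 1. A small finite-difference sketch (the 1e-6 scale is an arbitrary stand-in for a vanished gradient):

```python
def F(x):
    return 1e-6 * x  # a layer whose gradient has effectively vanished

def residual(x):
    return F(x) + x  # the same layer wrapped in a skip connection

def numerical_grad(f, x, h=1e-5):
    # Central finite-difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

g_plain = numerical_grad(F, 2.0)         # ~1e-6: signal nearly gone
g_resid = numerical_grad(residual, 2.0)  # ~1.000001: identity path preserves it
```

This is why early layers of a ResNet still receive a usable learning signal even when the residual branches contribute almost nothing to the gradient.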

Code Implementation

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, Add, GlobalAveragePooling2D, Dense

def residual_block(x, filters, kernel_size=3, strides=1):
    shortcut = x

    y = Conv2D(filters, kernel_size, strides=strides, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)

    y = Conv2D(filters, kernel_size, padding='same')(y)
    y = BatchNormalization()(y)

    if strides != 1 or x.shape[-1] != filters:
        # Adjust the shortcut with a 1x1 projection when the shape changes
        shortcut = Conv2D(filters, (1, 1), strides=strides)(x)
        shortcut = BatchNormalization()(shortcut)

    y = Add()([shortcut, y]) # Skip connection
    y = Activation('relu')(y)

    return y

def create_resnet(input_shape=(32, 32, 3), num_classes=10):
    inputs = Input(shape=input_shape)

    x = Conv2D(64, 7, strides=2, padding='same')(inputs)  # 7x7 stem convolution
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = residual_block(x, 64)
    x = residual_block(x, 64)
    x = residual_block(x, 128, strides=2)
    x = residual_block(x, 128)
    x = residual_block(x, 256, strides=2)
    x = residual_block(x, 256)
    x = residual_block(x, 512, strides=2)
    x = residual_block(x, 512)

    x = GlobalAveragePooling2D()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

# Example usage (CIFAR-10 dataset)
model = create_resnet()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

(x_train, y_train), _ = keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
model.fit(x_train, y_train, epochs=1, batch_size=128)
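Counting the weighted layers in the sketch above: one stem convolution, eight residual blocks of two convolutions each, and the final dense layer give 18 layers, matching the ResNet-18 counting convention (the 1x1 projection convolutions on some shortcuts are conventionally not counted):

```python
stem_convs = 1         # initial 7x7 convolution
blocks = 8             # residual_block calls in create_resnet
convs_per_block = 2    # two 3x3 convolutions per block
classifier = 1         # final Dense layer

depth = stem_convs + blocks * convs_per_block + classifier
# depth == 18
```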

Related Concepts

- Vanishing gradient problem
- Batch normalization
- Highway Networks (gated precursor to residual connections)
- DenseNet (dense connectivity as an alternative to residual addition)