Study Card: ResNet (Residual Neural Network)

Direct Answer

ResNet (Residual Neural Network) is a deep convolutional neural network architecture built from residual blocks, each of which wraps a few layers in a skip (shortcut) connection. The skip connection lets each block learn a residual mapping F(x) and output F(x) + x, instead of having to learn the entire transformation from input to output. Because gradients can also flow backward through the identity path, this mitigates the vanishing gradient problem, enabling the training of far deeper networks and improved performance on image recognition and other tasks.
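In symbols, a residual block outputs H(x) = F(x) + x, so the identity part of the mapping comes for free. A minimal sketch of that forward computation (the toy map F below is a stand-in for the block's layers, not part of any real ResNet):

```python
import numpy as np

def residual_forward(x, F):
    # A residual block outputs F(x) + x: the learned residual plus the identity
    return F(x) + x

x = np.array([1.0, 2.0, 3.0])
F = lambda v: 0.1 * v          # toy stand-in for the block's conv layers
out = residual_forward(x, F)   # -> [1.1, 2.2, 3.3]
```

If F learns to output zeros, the block reduces to the identity mapping, which is why adding residual blocks does not easily degrade a deeper network.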

Key Terms

- Residual block: a unit that computes F(x) + x, learning the residual F(x) rather than the full mapping.
- Skip (shortcut) connection: the identity path that adds a block's input directly to its output.
- Residual mapping: the difference F(x) = H(x) - x between the desired output H(x) and the input x.
- Vanishing gradient problem: gradients shrinking toward zero as they propagate backward through many layers, stalling learning in the early layers.

Example

Consider training a very deep CNN for image classification. As the network gets deeper, a plain stack of convolutional layers is prone to the vanishing gradient problem: gradients reaching the early layers become extremely small, making learning slow or even impossible. In a ResNet, gradients can flow directly through the skip connections, which makes significantly deeper networks trainable. Each residual block learns a residual mapping, the difference between the desired output and the block's input, which smooths optimization. Instead of learning the complete transformation from the input all the way to the classification layer, each block only has to refine the features extracted by shallower layers, making the learning process more robust.
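The gradient argument can be checked numerically: for y = F(x) + x, dy/dx = F'(x) + 1, so even when F's gradient has all but vanished, the skip connection keeps the overall gradient near 1. A small finite-difference sketch (the 1e-6 scale is an arbitrary stand-in for a vanished gradient):

```python
def F(x):
    return 1e-6 * x  # a layer whose gradient has effectively vanished

def residual(x):
    return F(x) + x  # the same layer wrapped in a skip connection

def numerical_grad(f, x, h=1e-5):
    # Central finite-difference approximation of df/dx
    return (f(x + h) - f(x - h)) / (2 * h)

g_plain = numerical_grad(F, 2.0)         # ~1e-6: signal nearly gone
g_resid = numerical_grad(residual, 2.0)  # ~1.000001: identity path preserves it
```

This is why early layers of a ResNet still receive a usable learning signal even when the residual branches contribute almost nothing to the gradient.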

Code Implementation

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, Conv2D, BatchNormalization, Activation, Add, GlobalAveragePooling2D, Dense

def residual_block(x, filters, kernel_size=3, strides=1):
    shortcut = x

    y = Conv2D(filters, kernel_size, strides=strides, padding='same')(x)
    y = BatchNormalization()(y)
    y = Activation('relu')(y)

    y = Conv2D(filters, kernel_size, padding='same')(y)
    y = BatchNormalization()(y)

    if strides != 1 or x.shape[-1] != filters:
        # Adjust the shortcut with a 1x1 projection when the shape changes
        shortcut = Conv2D(filters, (1, 1), strides=strides)(x)
        shortcut = BatchNormalization()(shortcut)

    y = Add()([shortcut, y]) # Skip connection
    y = Activation('relu')(y)

    return y

def create_resnet(input_shape=(32, 32, 3), num_classes=10):
    inputs = Input(shape=input_shape)

    x = Conv2D(64, 7, strides=2, padding='same')(inputs)  # 7x7 stem convolution
    x = BatchNormalization()(x)
    x = Activation('relu')(x)

    x = residual_block(x, 64)
    x = residual_block(x, 64)
    x = residual_block(x, 128, strides=2)
    x = residual_block(x, 128)
    x = residual_block(x, 256, strides=2)
    x = residual_block(x, 256)
    x = residual_block(x, 512, strides=2)
    x = residual_block(x, 512)

    x = GlobalAveragePooling2D()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = keras.Model(inputs=inputs, outputs=outputs)
    return model

# Example usage (CIFAR-10 dataset)
model = create_resnet()

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

(x_train, y_train), _ = keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
model.fit(x_train, y_train, epochs=1, batch_size=128)
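Counting the weighted layers in the sketch above: one stem convolution, eight residual blocks of two convolutions each, and the final dense layer give 18 layers, matching the ResNet-18 counting convention (the 1x1 projection convolutions on some shortcuts are conventionally not counted):

```python
stem_convs = 1         # initial 7x7 convolution
blocks = 8             # residual_block calls in create_resnet
convs_per_block = 2    # two 3x3 convolutions per block
classifier = 1         # final Dense layer

depth = stem_convs + blocks * convs_per_block + classifier
# depth == 18
```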

Related Concepts

- Vanishing gradient problem
- Batch normalization
- Highway Networks (gated precursor to residual connections)
- DenseNet (dense connectivity as an alternative to residual addition)