Perceptual loss is a loss function used in image generation tasks that measures the perceptual similarity between generated and target images by comparing high-level features extracted from a pre-trained convolutional neural network (CNN), such as VGG or Inception. Rather than penalizing pixel-wise differences the way traditional losses (e.g., MSE) do, it captures differences in semantic content and overall visual appearance, so generated images can look perceptually close to real ones even when individual pixel values differ. Because it compares features rather than pixels, it is more robust to minor pixel variations and better aligned with similarity as perceived by humans. Key applications include style transfer, super-resolution, and image restoration, where preserving perceptual quality is paramount.
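A common formulation sums weighted distances between feature maps taken at several layers of the pretrained network: L_perceptual(ŷ, y) = Σ_l w_l · ||φ_l(ŷ) − φ_l(y)||², where φ_l(·) denotes the activations at layer l and w_l is a per-layer weight.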
Imagine generating an image of a cat. Pixel-wise loss would penalize even minor differences in pixel values between the generated image and a target image of a cat. Perceptual loss, however, would focus on whether the generated image captures the essential features of a cat (e.g., shape, fur, eyes), even if the pixel values aren't identical to the target image. This would allow for minor variations in texture or color, as long as the overall perceptual quality is preserved.
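The PyTorch example below implements this idea with VGG19: both images are passed through a few early blocks of the pretrained network, and weighted MSE terms between the resulting feature maps are summed. Note that the pretrained VGG19 expects 3-channel, ImageNet-normalized inputs; the random tensors at the end are placeholders for real, preprocessed images.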
import torch
import torch.nn as nn
import torchvision.models as models

# Example using VGG19 as the feature extractor
class VGGPerceptualLoss(nn.Module):
    def __init__(self, layer_weights=(1.0, 0.75, 0.5, 0.25)):
        super().__init__()
        # Pretrained VGG19 feature stack, kept in eval mode
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
        blocks = list(vgg.children())
        # nn.ModuleList (rather than a plain Python list) so the sub-modules are
        # registered, move with .to(device), and are reached by self.parameters()
        self.blocks = nn.ModuleList([
            nn.Sequential(*blocks[:4]),     # through relu1_2 (conv1_2 activations)
            nn.Sequential(*blocks[4:9]),    # through relu2_2 (conv2_2 activations)
            nn.Sequential(*blocks[9:16]),   # through relu3_3 (conv3_3 activations)
            nn.Sequential(*blocks[16:25]),  # through relu4_3 (conv4_3 activations)
        ])
        self.layer_weights = layer_weights
        for param in self.parameters():
            param.requires_grad = False  # Freeze VGG parameters

    def forward(self, generated_image, target_image):
        loss = 0.0
        gen_features, target_features = generated_image, target_image
        for weight, block in zip(self.layer_weights, self.blocks):
            # Each block consumes the previous block's output, so features are
            # compared at progressively deeper levels of the network
            gen_features = block(gen_features)
            target_features = block(target_features)
            # Weighted MSE between the two feature maps at this depth
            loss += weight * nn.functional.mse_loss(gen_features, target_features)
        return loss

# Example usage (random tensors stand in for real, preprocessed images)
generated_image = torch.randn(1, 3, 256, 256)  # Example generated image
target_image = torch.randn(1, 3, 256, 256)     # Example target image
perceptual_loss = VGGPerceptualLoss()
loss = perceptual_loss(generated_image, target_image)
print(loss)
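Two practical details are worth noting: torchvision's pretrained VGG19 expects 3-channel inputs in [0, 1] normalized with the ImageNet mean and standard deviation, and the perceptual term is usually combined with a pixel-wise loss rather than used on its own. A minimal sketch of that combination, assuming the VGGPerceptualLoss class above is in scope (the normalize helper and the 0.1 weighting are illustrative assumptions, not fixed values):

import torch
import torch.nn as nn

# ImageNet statistics expected by torchvision's pretrained VGG19 (assumes inputs in [0, 1])
imagenet_mean = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)
imagenet_std = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

def normalize(img):
    # img: (N, 3, H, W) tensor with values in [0, 1]
    return (img - imagenet_mean) / imagenet_std

pixel_loss = nn.L1Loss()
perceptual_loss = VGGPerceptualLoss()

def combined_loss(generated, target, perceptual_weight=0.1):
    # Pixel-wise term anchors colors and intensities; perceptual term preserves structure.
    # The 0.1 weight is an illustrative assumption and is normally tuned per task.
    return pixel_loss(generated, target) + perceptual_weight * perceptual_loss(
        normalize(generated), normalize(target)
    )

In super-resolution and restoration pipelines, a weighted sum of this kind is typically what the generator is trained to minimize.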