Study Card: Key Differences between GPT and BERT

Direct Answer

GPT and BERT are both transformer-based language models, but they differ significantly in architecture and training objective. GPT is an autoregressive model trained to predict the next word in a sequence, which makes it well suited to text generation. BERT is a bidirectional model pre-trained with masked language modeling (MLM) and next sentence prediction (NSP); it excels at understanding context and relationships within text, making it a strong choice for tasks like text classification and question answering. The key differences are unidirectional vs. bidirectional processing, text generation vs. context understanding, and different pre-training tasks.
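
As a rough sketch of what "unidirectional vs. bidirectional processing" means in practice, the snippet below contrasts the attention masks the two architectures rely on (the sequence length and variable names are illustrative, not taken from either model's actual implementation):

# Illustrative only: GPT applies a causal (lower-triangular) attention mask so each
# position can attend only to earlier positions; BERT lets every position attend
# to the full sequence.
import torch

seq_len = 5  # hypothetical sequence length

causal_mask = torch.tril(torch.ones(seq_len, seq_len))   # GPT-style: position i sees positions <= i
bidirectional_mask = torch.ones(seq_len, seq_len)        # BERT-style: position i sees all positions

print("GPT (causal) mask:\n", causal_mask)
print("BERT (bidirectional) mask:\n", bidirectional_mask)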

Key Terms

Autoregressive: generates text left to right, predicting each token from the tokens that precede it (GPT).
Bidirectional: attends to context on both sides of a token at once (BERT).
Masked Language Modeling (MLM): pre-training task in which randomly masked tokens must be predicted from their surrounding context.
Next Sentence Prediction (NSP): pre-training task in which the model judges whether one sentence follows another in the original text.

Example

Given the sentence "The quick brown fox jumps over the lazy dog.", GPT would be trained to predict "jumps" given "The quick brown fox," "over" given "The quick brown fox jumps," and so on. BERT, using MLM, might be given "The quick brown [MASK] jumps over the [MASK] dog" and trained to predict "fox" and "lazy". For NSP, BERT would be given pairs of sentences and determine if they are consecutive in the original text.
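
To make the two objectives concrete, the sketch below builds the corresponding training targets for this sentence (using simple whitespace tokenization for illustration; the real models use subword tokenizers):

# Illustrative sketch of the two pre-training objectives on the example sentence
tokens = "The quick brown fox jumps over the lazy dog .".split()

# GPT-style next-token prediction: each prefix is paired with the token that follows it
gpt_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
print(gpt_pairs[3])  # (['The', 'quick', 'brown', 'fox'], 'jumps')

# BERT-style masked language modeling: some tokens are replaced with [MASK],
# and the model is trained to recover only those masked positions
masked_tokens = ["The", "quick", "brown", "[MASK]", "jumps", "over", "the", "[MASK]", "dog", "."]
mlm_targets = {3: "fox", 7: "lazy"}
print(masked_tokens, mlm_targets)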

Code Implementation

# Demonstrates usage of pre-trained GPT-2 and BERT models for different tasks

from transformers import pipeline

# Text Generation with GPT-2
generator = pipeline('text-generation', model='gpt2')
generated_text = generator("The quick brown fox jumps over the", max_length=30, num_return_sequences=1)
print("GPT-2 Generated Text:", generated_text[0]['generated_text'])

# Masked Language Modeling with BERT
unmasker = pipeline('fill-mask', model='bert-base-uncased')
masked_text = "The quick brown [MASK] jumps over the lazy dog."
filled_text = unmasker(masked_text)
print("BERT Filled Text:", filled_text[0]['sequence'])

# Note: For Next Sentence Prediction (NSP) with BERT, the two sentences are
# passed to the tokenizer as a pair so it inserts the [SEP] token and sets the
# segment (token_type) IDs, and the NSP head's logits are read from the model
# output. Simplified example below:
from transformers import BertTokenizer, BertForNextSentencePrediction
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

sentence_a = "The quick brown fox jumps over the lazy dog."
sentence_b = "The dog barked."
encoded = tokenizer(sentence_a, sentence_b, return_tensors='pt')

with torch.no_grad():
    outputs = model(**encoded)
    seq_relationship_logits = outputs.logits  # shape (1, 2); label 0 = "is next", 1 = "not next"
    is_next = seq_relationship_logits.argmax().item()
    print(f"Are the sentences consecutive according to BERT? {'Yes' if is_next == 0 else 'No'}")

Related Concepts