Study Card: Evaluating and Mitigating Hallucination in LLMs

Direct Answer

Hallucination in LLMs refers to the generation of factually incorrect, nonsensical, or irrelevant output that deviates from the provided context or from real-world knowledge. Evaluation involves comparing model outputs against ground truth, using metrics such as accuracy, precision, and recall over factual claims, and employing human evaluation for nuanced judgments of coherence and relevance. Mitigation strategies include improving training data quality, grounding generation in external knowledge, applying reinforcement learning from human feedback (RLHF), and using prompt engineering for tighter context control.
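
As an illustration of claim-level metrics, the sketch below computes precision and recall by comparing claims extracted from a model's output against a reference set. The claim lists and the exact-match comparison are simplifying assumptions; a real pipeline would extract and match claims automatically.

# Hypothetical claim-level precision/recall for factual consistency.
# Assumes claims have already been extracted and normalized as strings.
def claim_precision_recall(generated_claims, reference_claims):
    """Precision: share of generated claims supported by the reference.
    Recall: share of reference claims that the generation covers."""
    generated, reference = set(generated_claims), set(reference_claims)
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Example usage
generated = ["Lincoln was the 1st US president", "Lincoln led the US through the Civil War"]
reference = ["Lincoln was the 16th US president", "Lincoln led the US through the Civil War"]
print(claim_precision_recall(generated, reference))  # (0.5, 0.5)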

Key Terms

- Hallucination: model output that is factually incorrect, nonsensical, or unsupported by the provided context.
- Knowledge grounding: supplying verified facts (e.g., retrieved from a knowledge base) alongside the prompt so the output stays anchored to them.
- RLHF: reinforcement learning from human feedback, used to align model outputs with human preferences.
- Prompt engineering: structuring instructions and supplied facts in the prompt to control what the model generates.
- BLEU score: an n-gram overlap metric, used here as a rough automatic proxy for consistency with a reference text.

Example

Consider an LLM asked to generate a biography of a historical figure. The model might hallucinate details, inventing events or attributing incorrect achievements to the person. Evaluation could involve comparing the generated biography with reliable historical sources and checking the accuracy of specific claims. If the prompt is "Write a biography of Abraham Lincoln" and the LLM writes that Lincoln was the first US president, the claim can be checked against the historical record and flagged as a hallucination (Lincoln was the 16th president). Mitigation could involve retrieving relevant information about Abraham Lincoln from a knowledge base and supplying it alongside the prompt to guide the model and ground its output in factual information; a sketch of this retrieval step follows below. Further fine-tuning the model on biographies with RLHF can produce outputs that are better aligned with human expectations.
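
A minimal sketch of that retrieval step is shown below, assuming a tiny in-memory fact store; the fact_store dict and build_grounded_prompt helper are illustrative stand-ins for a real retrieval system (for example, a vector database).

# Illustrative retrieval-augmented prompting: look up facts for entities
# mentioned in the request and prepend them to the prompt.
fact_store = {
    "Abraham Lincoln": [
        "Abraham Lincoln was the 16th president of the USA.",
        "He led the country through the American Civil War.",
    ],
}

def build_grounded_prompt(user_request, fact_store):
    """Collect facts for any known entity named in the request and prepend them."""
    facts = []
    for entity, entity_facts in fact_store.items():
        if entity in user_request:
            facts.extend(entity_facts)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Use only the facts below when answering.\nFacts:\n{context}\n\nRequest: {user_request}"

print(build_grounded_prompt("Write a biography of Abraham Lincoln", fact_store))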

Code Implementation

import nltk  # For tokenization and BLEU score calculation (example)
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# nltk.word_tokenize needs the 'punkt' tokenizer data: run nltk.download('punkt') once if missing.

# Example: Evaluating factual consistency using BLEU score against a reference text
def evaluate_factual_consistency(generated_text, reference_text):
    """Calculates a BLEU score comparing the generated text with a reference.

    Note: BLEU measures surface n-gram overlap, so it is only a rough proxy for factual consistency.
    """
    generated_tokens = nltk.word_tokenize(generated_text)
    reference_tokens = nltk.word_tokenize(reference_text)
    # Smoothing avoids zero scores when short sentences miss higher-order n-grams.
    smoothing = SmoothingFunction().method1
    return sentence_bleu([reference_tokens], generated_tokens, smoothing_function=smoothing)

# Example usage
generated_text = "Abraham Lincoln was the first president of the USA."  # Hallucinated output
reference_text = "Abraham Lincoln was the 16th president of the USA." # Ground truth
bleu = evaluate_factual_consistency(generated_text, reference_text)
print(f"BLEU Score: {bleu}")

# Example: Knowledge grounding with a simple lookup (Illustrative)
knowledge_base = {"Abraham Lincoln": "16th president of the USA"}

def ground_text(text, knowledge_base):
    """Simple knowledge grounding: annotate each recognized entity with its verified fact."""
    for entity, fact in knowledge_base.items():
        if entity in text:
            # Attach the stored fact next to the entity mention so downstream
            # steps (or a reader) can spot and correct contradictory claims.
            text = text.replace(entity, f"{entity} ({fact})")
    return text

grounded_text = ground_text(generated_text, knowledge_base)
print(f"Grounded Text: {grounded_text}")

# Example: Enhanced prompt for context control
original_prompt = "Write a short story about a historical figure."
enhanced_prompt = f"""Write a short story about Abraham Lincoln, adhering to the following facts:
- Abraham Lincoln was the 16th president of the USA.
- ... other relevant facts ...
"""

Related Concepts