Study Card: Evaluating and Mitigating Hallucination in LLMs

Direct Answer

Hallucination in LLMs refers to the generation of factually incorrect, nonsensical, or irrelevant output that deviates from the provided context or from real-world knowledge. Evaluation involves comparing model outputs against ground truth, using metrics such as accuracy, precision, and recall over factual claims, and employing human evaluation for nuanced judgments of coherence and relevance. Mitigation strategies include improving training data quality, grounding generation in external knowledge, applying reinforcement learning from human feedback (RLHF), and using prompt engineering for tighter context control.
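
As an illustration of claim-level metrics, the sketch below computes precision and recall by comparing claims extracted from a model's output against a reference set. The claim lists and the exact-match comparison are simplifying assumptions; a real pipeline would extract and match claims automatically.

# Hypothetical claim-level precision/recall for factual consistency.
# Assumes claims have already been extracted and normalized as strings.
def claim_precision_recall(generated_claims, reference_claims):
    """Precision: share of generated claims supported by the reference.
    Recall: share of reference claims that the generation covers."""
    generated, reference = set(generated_claims), set(reference_claims)
    true_positives = len(generated & reference)
    precision = true_positives / len(generated) if generated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall

# Example usage
generated = ["Lincoln was the 1st US president", "Lincoln led the US through the Civil War"]
reference = ["Lincoln was the 16th US president", "Lincoln led the US through the Civil War"]
print(claim_precision_recall(generated, reference))  # (0.5, 0.5)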

Key Terms

- Hallucination: model output that is factually incorrect, nonsensical, or unsupported by the provided context.
- Knowledge grounding: supplying verified facts (e.g., retrieved from a knowledge base) alongside the prompt so the output stays anchored to them.
- RLHF: reinforcement learning from human feedback, used to align model outputs with human preferences.
- Prompt engineering: structuring instructions and supplied facts in the prompt to control what the model generates.
- BLEU score: an n-gram overlap metric, used here as a rough automatic proxy for consistency with a reference text.

Example

Consider an LLM asked to generate a biography of a historical figure. The model might hallucinate details, inventing events or attributing incorrect achievements to the person. Evaluation could involve comparing the generated biography with reliable historical sources and checking the accuracy of specific claims. If the prompt is "Write a biography of Abraham Lincoln" and the LLM writes that Lincoln was the first US president, the claim can be checked against the historical record and flagged as a hallucination (Lincoln was the 16th president). Mitigation could involve retrieving relevant information about Abraham Lincoln from a knowledge base and supplying it alongside the prompt to guide the model and ground its output in factual information; a sketch of this retrieval step follows below. Further fine-tuning the model on biographies with RLHF can produce outputs that are better aligned with human expectations.
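
A minimal sketch of that retrieval step is shown below, assuming a tiny in-memory fact store; the fact_store dict and build_grounded_prompt helper are illustrative stand-ins for a real retrieval system (for example, a vector database).

# Illustrative retrieval-augmented prompting: look up facts for entities
# mentioned in the request and prepend them to the prompt.
fact_store = {
    "Abraham Lincoln": [
        "Abraham Lincoln was the 16th president of the USA.",
        "He led the country through the American Civil War.",
    ],
}

def build_grounded_prompt(user_request, fact_store):
    """Collect facts for any known entity named in the request and prepend them."""
    facts = []
    for entity, entity_facts in fact_store.items():
        if entity in user_request:
            facts.extend(entity_facts)
    context = "\n".join(f"- {fact}" for fact in facts)
    return f"Use only the facts below when answering.\nFacts:\n{context}\n\nRequest: {user_request}"

print(build_grounded_prompt("Write a biography of Abraham Lincoln", fact_store))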

Code Implementation

import nltk  # For tokenization and BLEU score calculation (example)
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# nltk.word_tokenize needs the 'punkt' tokenizer data: run nltk.download('punkt') once if missing.

# Example: Evaluating factual consistency using BLEU score against a reference text
def evaluate_factual_consistency(generated_text, reference_text):
    """Calculates a BLEU score comparing the generated text with a reference.

    Note: BLEU measures surface n-gram overlap, so it is only a rough proxy for factual consistency.
    """
    generated_tokens = nltk.word_tokenize(generated_text)
    reference_tokens = nltk.word_tokenize(reference_text)
    # Smoothing avoids zero scores when short sentences miss higher-order n-grams.
    smoothing = SmoothingFunction().method1
    return sentence_bleu([reference_tokens], generated_tokens, smoothing_function=smoothing)

# Example usage
generated_text = "Abraham Lincoln was the first president of the USA."  # Hallucinated output
reference_text = "Abraham Lincoln was the 16th president of the USA." # Ground truth
bleu = evaluate_factual_consistency(generated_text, reference_text)
print(f"BLEU Score: {bleu}")

# Example: Knowledge grounding with a simple lookup (Illustrative)
knowledge_base = {"Abraham Lincoln": "16th president of the USA"}

def ground_text(text, knowledge_base):
    """Simple knowledge grounding: annotate each recognized entity with its verified fact."""
    for entity, fact in knowledge_base.items():
        if entity in text:
            # Attach the stored fact next to the entity mention so downstream
            # steps (or a reader) can spot and correct contradictory claims.
            text = text.replace(entity, f"{entity} ({fact})")
    return text

grounded_text = ground_text(generated_text, knowledge_base)
print(f"Grounded Text: {grounded_text}")

# Example: Enhanced prompt for context control
original_prompt = "Write a short story about a historical figure."
enhanced_prompt = f"""Write a short story about Abraham Lincoln, adhering to the following facts:
- Abraham Lincoln was the 16th president of the USA.
- ... other relevant facts ...
"""

Related Concepts