For image-text matching, training data is typically annotated by creating aligned pairs of images and text snippets that describe the same concept or share semantic meaning. This can involve collecting image-caption pairs, manually writing text descriptions for images, or gathering data from sources where images and text are naturally aligned, such as product descriptions or news articles with accompanying images. Key considerations include ensuring that pairs are accurately aligned, covering diverse concepts and scenarios, balancing positive pairs (image and text that match) against negative pairs (image and text that are unrelated), and addressing potential biases in the data. High-quality annotations are essential for training robust image-text matching models.
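Two of these considerations, alignment accuracy and positive/negative balance, can be checked mechanically before training. The sketch below assumes a hypothetical record format in which several annotators label each image-caption pair as matching (1) or non-matching (0). Pairs without unanimous agreement are flagged for re-review rather than silently resolved, and the label counts of the accepted pairs show the positive/negative balance. The record layout, IDs, and the `audit` helper are illustrative assumptions, not part of any particular annotation tool.

```python
from collections import Counter, defaultdict

# Hypothetical annotation records: (image_id, caption_id, annotator, label),
# where label 1 = matching pair and 0 = non-matching pair.
annotations = [
    ("img1", "cap1", "ann_a", 1),
    ("img1", "cap1", "ann_b", 1),
    ("img1", "cap2", "ann_a", 0),
    ("img1", "cap2", "ann_b", 1),  # annotators disagree on this pair
    ("img2", "cap2", "ann_a", 1),
    ("img2", "cap2", "ann_b", 1),
    ("img2", "cap1", "ann_a", 0),
    ("img2", "cap1", "ann_b", 0),
]

def audit(records):
    """Group votes per pair, flag disagreements, and report label balance."""
    votes = defaultdict(list)
    for image_id, caption_id, _annotator, label in records:
        votes[(image_id, caption_id)].append(label)

    # Any pair with more than one distinct label is a conflict.
    conflicts = sorted(pair for pair, labels in votes.items() if len(set(labels)) > 1)
    # Only unanimously labeled pairs receive a final label.
    final = {pair: labels[0] for pair, labels in votes.items() if len(set(labels)) == 1}
    balance = Counter(final.values())
    return final, conflicts, balance

final, conflicts, balance = audit(annotations)
print("Conflicting pairs:", conflicts)
print("Label counts:", dict(balance))
```

Requiring unanimity is a deliberately strict choice. With more annotators per pair, a majority vote plus an agreement threshold is a common, less wasteful alternative.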
A dataset for image-text matching might include images of animals paired with captions describing them (e.g., an image of a cat with the caption "A fluffy ginger cat"). To create negative pairs, the image of the cat could be paired with captions describing other animals or unrelated concepts. The annotation process would involve ensuring that the positive pairs are accurately matched and that the negative pairs are clearly non-matching.
```python
# Example demonstrating creation of positive and negative pairs
import random

random.seed(0)  # fixed seed so the sampled negatives are reproducible

images = ["image1.jpg", "image2.jpg", "image3.jpg"]
captions = ["A cat sitting on a mat.", "A dog playing with a ball.", "A bird flying in the sky."]

# Positive pairs: the i-th image is described by the i-th caption
positive_pairs = list(zip(images, captions))

# Candidate negative pairs: every image-caption combination that is not a correct match
positive_set = set(positive_pairs)
negative_pairs = [
    (image, caption)
    for image in images
    for caption in captions
    if (image, caption) not in positive_set
]

# Randomly sample negatives to balance positives and negatives at a 1:1 ratio.
# Change num_negative_to_sample to use a different ratio.
num_negative_to_sample = len(positive_pairs)
sampled_negative_pairs = random.sample(negative_pairs, min(num_negative_to_sample, len(negative_pairs)))

# Combine positive and negative pairs with labels (1 = match, 0 = non-match)
all_pairs = positive_pairs + sampled_negative_pairs
labels = [1] * len(positive_pairs) + [0] * len(sampled_negative_pairs)

print("Positive Pairs:", positive_pairs)
print("Sampled Negative Pairs:", sampled_negative_pairs)
print("All Pairs:", all_pairs)
print("Labels:", labels)

# In a real application, a dedicated annotation tool might be used
# to manage the annotation process and ensure alignment quality.
```
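Whatever tool is used, a common quality check is to have a person re-review a random sample of the labeled pairs. The sketch below draws a stratified sample so that both matching and non-matching pairs get checked. `review_sample` and its `fraction` parameter (assumed to lie in (0, 1]) are hypothetical names, and the example pairs are synthetic.

```python
import random
from collections import defaultdict

def review_sample(pairs, labels, fraction, seed=0):
    """Draw a stratified random sample of labeled pairs for manual re-checking.

    At least one item is drawn from every label class that is present.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for pair, label in zip(pairs, labels):
        by_label[label].append((pair, label))
    sample = []
    for label in sorted(by_label):
        items = by_label[label]
        k = max(1, round(fraction * len(items)))
        sample.extend(rng.sample(items, k))
    return sample

# Synthetic labeled pairs in the same shape as all_pairs/labels above.
pairs = (
    [(f"image{i}.jpg", f"caption{i}") for i in range(10)]  # matching
    + [(f"image{i}.jpg", f"caption{(i + 1) % 10}") for i in range(10)]  # shifted, non-matching
)
labels = [1] * 10 + [0] * 10

for (image, caption), label in review_sample(pairs, labels, fraction=0.2):
    print(label, image, caption)
```

Stratifying by label keeps the smaller class from being missed when positives and negatives are imbalanced. If reviewers find many errors in the sample, that suggests the whole batch should be re-annotated.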