Study Card: t-SNE (t-distributed Stochastic Neighbor Embedding)

Direct Answer

Key Terms

Example

Imagine a dataset of thousands of images, each represented as a high-dimensional vector (for example, its pixel values). t-SNE can map these vectors to a 2D plane in which images that are close in the original space tend to land close together, so groups of similar images appear as visible clusters. This reveals patterns that would be difficult or impossible to discern in the original high-dimensional space. For instance, images of cats might cluster in one region, images of dogs in another, and images of cars in a third, provided the vectors actually capture those distinctions (raw pixel values of natural photographs often do not).
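To make "close in the original space" and "close in 2D" concrete, here is a minimal NumPy sketch of the quantities t-SNE compares: joint probabilities P built from Gaussian kernels on the original vectors, joint probabilities Q built from a heavy-tailed Student-t kernel (the "t" in the name) on the 2D points, and the cost KL(P || Q) that the optimizer drives down by moving the 2D points. It is a simplification, not scikit-learn's implementation: it assumes one fixed bandwidth `sigma` for every point (real t-SNE searches a per-point bandwidth to match the chosen perplexity) and omits the gradient-descent step entirely. The function names are hypothetical.

```python
import numpy as np


def high_dim_affinities(X, sigma=1.0):
    """Symmetric joint probabilities P from Gaussian kernels.

    Simplification (assumption): one fixed bandwidth `sigma` for all points,
    whereas real t-SNE tunes a per-point bandwidth to match the perplexity.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)             # a point is not its own neighbor
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability before exp
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)       # p_{j|i}: each row sums to 1
    n = X.shape[0]
    return (cond + cond.T) / (2.0 * n)            # p_ij: whole matrix sums to 1


def low_dim_affinities(Y):
    """Joint probabilities Q from a Student-t kernel with one degree of freedom."""
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    kernel = 1.0 / (1.0 + sq_dists)               # heavy tails let far points stay far
    np.fill_diagonal(kernel, 0.0)
    return kernel / kernel.sum()


def kl_divergence(P, Q, eps=1e-12):
    """t-SNE cost KL(P || Q), summed over pairs where p_ij > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


# Usage: score a random 2D layout of random 10-D points (illustrative data only).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 10))
Y_demo = rng.normal(size=(50, 2))
P = high_dim_affinities(X_demo, sigma=3.0)
Q = low_dim_affinities(Y_demo)
cost = kl_divergence(P, Q)
print(f"KL(P || Q) for a random layout: {cost:.3f}")
```

A random layout gives a positive cost; t-SNE's optimizer repeatedly nudges the 2D points along the gradient of this cost so that pairs with large p_ij end up with large q_ij.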

Code Implementation

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

# Load scikit-learn's digits dataset: 1,797 8x8 grayscale images, each flattened to 64 features
digits = load_digits()
X = digits.data
y = digits.target

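# Apply t-SNE. These are example parameters; tuning (especially perplexity) may be needed.
# The iteration count is left at its default of 1000: the argument is named `n_iter`
# before scikit-learn 1.5 and `max_iter` from 1.5 on (`n_iter` was deprecated, then removed).
tsne = TSNE(n_components=2, perplexity=30.0, learning_rate=200.0, random_state=42)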
X_embedded = tsne.fit_transform(X)

# Plot the embedded data
plt.figure(figsize=(10, 8))
plt.scatter(X_embedded[:, 0], X_embedded[:, 1], c=y, cmap='Spectral')
plt.colorbar(label="Digit Label")
plt.title("t-SNE Visualization of Digits Dataset")
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.show()
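The scatter plot is a qualitative check. A sketch of a quantitative one, assuming scikit-learn's `sklearn.manifold.trustworthiness` function: it scores how well an embedding keeps local neighborhoods (values near 1.0 mean points that are neighbors in 2D were also neighbors in the original 64-D space). The 500-sample subset, `n_neighbors=5`, and the PCA baseline are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Use a subset of the digits data to keep the run short (illustrative choice).
X, y = load_digits(return_X_y=True)
X = X[:500]

# Two 2D embeddings of the same data: t-SNE and, as a linear baseline, PCA.
X_tsne = TSNE(n_components=2, perplexity=30.0, random_state=42).fit_transform(X)
X_pca = PCA(n_components=2, random_state=42).fit_transform(X)

# Trustworthiness lies in [0, 1] and penalizes points that appear among a
# point's k nearest neighbors in 2D without being near it in the original space.
score_tsne = trustworthiness(X, X_tsne, n_neighbors=5)
score_pca = trustworthiness(X, X_pca, n_neighbors=5)
print(f"t-SNE trustworthiness: {score_tsne:.3f}")
print(f"PCA trustworthiness:   {score_pca:.3f}")
```

Trustworthiness only measures local structure, which is what t-SNE optimizes for; it says nothing about whether cluster sizes or the gaps between clusters in the plot are meaningful.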

Related Concepts