Study Card: Bagging vs. Boosting

Direct Answer

Bagging and boosting are both ensemble methods that combine multiple base learners into a stronger model, but they differ significantly in how those learners are built and combined. Bagging (Bootstrap Aggregating) trains learners independently, each on a bootstrap sample of the data (drawn with replacement), and combines their predictions by averaging or majority vote. Boosting, on the other hand, trains learners sequentially, where each new learner focuses on correcting the errors of its predecessors. Consequently, bagging primarily reduces variance, so it pairs well with low-bias, high-variance learners such as deep decision trees, while boosting primarily reduces bias, so it typically uses simple weak learners such as decision stumps.
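The two training schemes can be contrasted directly. The following is an illustrative sketch only, not a library implementation: it uses a trivial 1-D threshold "stump" as the base learner, draws bootstrap resamples for bagging, and reweights misclassified points for a boosting-style (AdaBoost-like) loop. All names and the toy dataset are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D data: class 1 when x > 0.5, with ~10% label noise
X = rng.uniform(0, 1, size=200)
y = (X > 0.5).astype(int)
flip = rng.random(200) < 0.1
y[flip] = 1 - y[flip]

def stump_predict(x, threshold):
    """Weak learner: predict class 1 when x exceeds a threshold."""
    return (x > threshold).astype(int)

cands = np.linspace(0, 1, 51)  # candidate thresholds

# --- Bagging: independent learners on bootstrap resamples, majority vote ---
thresholds = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample (with replacement)
    Xb, yb = X[idx], y[idx]
    errs = [np.mean(stump_predict(Xb, t) != yb) for t in cands]
    thresholds.append(cands[np.argmin(errs)])   # best stump for this resample
votes = np.mean([stump_predict(X, t) for t in thresholds], axis=0)
bagged_pred = (votes > 0.5).astype(int)

# --- Boosting (AdaBoost-style): sequential learners, reweight mistakes ---
w = np.full(len(X), 1 / len(X))                 # uniform sample weights
learners = []                                   # (threshold, vote weight) pairs
for _ in range(25):
    errs = [np.sum(w * (stump_predict(X, t) != y)) for t in cands]
    t_best = cands[int(np.argmin(errs))]
    err = max(min(errs), 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)       # confident learners get bigger votes
    miss = stump_predict(X, t_best) != y
    w *= np.exp(alpha * np.where(miss, 1, -1))  # upweight mistakes, downweight hits
    w /= w.sum()
    learners.append((t_best, alpha))
score = sum(a * (2 * stump_predict(X, t) - 1) for t, a in learners)
boosted_pred = (score > 0).astype(int)
```

Note how the bagging loop iterations are independent (they could run in parallel), while each boosting iteration depends on the weights produced by the previous one.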

Key Terms

Ensemble: a model that combines the predictions of several base learners.
Bootstrap sample: a resample of the training set drawn with replacement, the same size as the original.
Weak learner: a model only slightly better than random guessing (e.g., a decision stump).
Variance: error from sensitivity to the particular training sample; the component bagging targets.
Bias: error from overly simple modeling assumptions; the component boosting targets.

Example

Consider classifying images of cats and dogs. A bagging ensemble trains many classifiers, each on a different bootstrap sample of the images, and takes a majority vote; individual mistakes tend to cancel out, reducing variance. A boosting ensemble trains classifiers one at a time, increasing the weight of images that earlier classifiers misclassified (for example, breeds that resemble the other class), so later learners concentrate on the hard cases, reducing bias.

Code Implementation

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier # Bagging and Boosting examples
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_wine

# Load sample dataset
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Bagging
bagging_clf = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=42)  # 10 decision trees whose predictions are combined by majority vote
bagging_clf.fit(X_train, y_train)
bagging_predictions = bagging_clf.predict(X_test)
bagging_accuracy = accuracy_score(y_test, bagging_predictions)

# Boosting (AdaBoost)
boosting_clf = AdaBoostClassifier(n_estimators=50, random_state=42)  # 50 sequential learners; the default base learner is a depth-1 decision stump
boosting_clf.fit(X_train, y_train)
boosting_predictions = boosting_clf.predict(X_test)
boosting_accuracy = accuracy_score(y_test, boosting_predictions)

print(f"Bagging Accuracy: {bagging_accuracy:.4f}")
print(f"Boosting Accuracy: {boosting_accuracy:.4f}")

Related Concepts

Random Forest: bagging of decision trees plus random feature subsetting at each split.
Gradient Boosting: boosting that fits each new learner to the gradient of the loss (e.g., XGBoost, LightGBM).
Stacking: an ensemble that trains a meta-model on the base learners' outputs.
Bias-variance tradeoff: the error decomposition that explains why bagging and boosting target different failure modes.