Study Card: Bagging vs. Boosting
Direct Answer
Bagging and boosting are both ensemble methods that combine multiple base learners into a stronger model, but they differ significantly in how those learners are built and combined. Bagging (Bootstrap Aggregating) trains learners independently on bootstrapped subsets of the data and aggregates their predictions by averaging or voting. Boosting, on the other hand, trains learners sequentially, with each learner focusing on correcting the errors of its predecessors. Consequently, bagging primarily aims to reduce variance, while boosting focuses on reducing bias.
Key Terms
- Ensemble Method: A machine learning technique that combines multiple base learners to improve predictive performance.
- Bootstrap Aggregating (Bagging): An ensemble method that creates multiple subsets of the data by sampling with replacement and trains a base learner on each subset. Final predictions are aggregated by averaging (regression) or majority voting (classification).
- Boosting: An ensemble method that trains base learners sequentially. Each new learner focuses on the errors made by the previous ones, typically by increasing the weights of misclassified samples so that they contribute more to the next training iteration.
- Bias: Error introduced by approximating real-world problems with simplified models, potentially leading to underfitting if not managed.
- Variance: Sensitivity to fluctuations in the training data, potentially leading to overfitting.
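The bootstrap sampling at the heart of bagging can be shown in a few lines. This is a minimal sketch using NumPy; the toy dataset of ten points and the fixed seed are illustrative, not part of any real pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # toy dataset of 10 samples

# Sampling with replacement: same size as the original, duplicates allowed
bootstrap_sample = rng.choice(data, size=len(data), replace=True)

# Each bootstrap sample omits some original points (on average ~37%),
# which is why each base learner in bagging sees a different view of the data
unique_fraction = len(np.unique(bootstrap_sample)) / len(data)
print(bootstrap_sample)
print(f"Fraction of original points present: {unique_fraction:.2f}")
```

In bagging, one such sample is drawn per base learner, so every learner trains on a slightly different dataset.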
Example
Consider classifying images of cats and dogs.
- Bagging: We create several subsets of the image data, each drawn by random sampling with replacement, and train a separate decision tree on each subset. To classify a new image, every tree makes a prediction, and we take the majority vote as the final classification. Because each tree sees a different sample, the influence of any outlier images is diluted by the vote, which reduces variance and improves generalization.
- Boosting: We train an initial, simple model (e.g., a shallow decision tree). Then we train a second model that focuses on the images the first one misclassified, by giving those images higher weights for the next round. This continues sequentially, each model concentrating on its predecessors' mistakes. The final prediction is a weighted combination of all models, each of which has specialized in images that were hard for the others to classify. This specialization reduces overall bias and improves accuracy.
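The weight-update loop described above can be sketched as a simplified AdaBoost-style procedure. This is a from-scratch sketch for intuition, using decision stumps and the classic exponential weight update; the synthetic dataset and round count are arbitrary choices:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=42)
y_signed = np.where(y == 1, 1, -1)  # AdaBoost works with labels in {-1, +1}

n_rounds = 10
weights = np.full(len(X), 1 / len(X))  # start with uniform sample weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
    stump.fit(X, y_signed, sample_weight=weights)
    pred = stump.predict(X)

    # Weighted error rate of this round's stump
    err = np.sum(weights * (pred != y_signed)) / np.sum(weights)
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # this learner's vote weight

    # Misclassified samples get larger weights, correct ones smaller
    weights *= np.exp(-alpha * y_signed * pred)
    weights /= weights.sum()

    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted sum of all stump votes
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
ensemble_pred = np.sign(scores)
accuracy = np.mean(ensemble_pred == y_signed)
```

Each round reweights the data so the next stump concentrates on the currently hard samples, which is exactly the sequential error-correction the example describes.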
Code Implementation
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier, AdaBoostClassifier # Bagging and Boosting examples
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_wine
# Load sample dataset
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Bagging
bagging_clf = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=10, random_state=42)  # 10 decision trees combined by majority vote
bagging_clf.fit(X_train, y_train)
bagging_predictions = bagging_clf.predict(X_test)
bagging_accuracy = accuracy_score(y_test, bagging_predictions)
# Boosting (AdaBoost)
boosting_clf = AdaBoostClassifier(n_estimators=50, random_state=42)  # 50 decision stumps (the default base learner)
boosting_clf.fit(X_train, y_train)
boosting_predictions = boosting_clf.predict(X_test)
boosting_accuracy = accuracy_score(y_test, boosting_predictions)
print(f"Bagging Accuracy: {bagging_accuracy:.4f}")
print(f"Boosting Accuracy: {boosting_accuracy:.4f}")
Related Concepts
- Bias-Variance Tradeoff: Bagging mainly addresses variance, while boosting tackles bias. Interviewers often probe this tradeoff; a good discussion point is how averaging many independently trained learners reduces sensitivity to the particulars of the training sample, improving generalization.
- Random Forests: A specific type of bagging that uses decision trees as base learners and additionally considers a random subset of features at each split. It inherits bagging's variance-reduction properties, so discussing Random Forests naturally leads into bagging, the bias-variance tradeoff, and parameter tuning.
- Gradient Boosting Machines (GBM): A popular boosting family in which each new weak learner is fit to the residual errors of the current ensemble, reducing bias. Discussion points include advantages and disadvantages compared to other boosting methods such as AdaBoost.
- Stacking: Another ensemble technique in which the predictions of multiple base learners serve as input features for a meta-learner, rather than being combined by simple voting or averaging. Key discussion points include how the meta-learner can compensate for the limitations of individual models.
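As a companion to the bagging and boosting code above, stacking can be sketched with scikit-learn's StackingClassifier on the same wine dataset. The choice of base learners (a decision tree and k-nearest neighbors) and of logistic regression as the meta-learner is illustrative, not canonical:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Base learners; their (cross-validated) predictions become the meta-learner's inputs
base_learners = [
    ("tree", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier()),
]
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),  # meta-learner
)
stack.fit(X_train, y_train)
stacking_accuracy = accuracy_score(y_test, stack.predict(X_test))
print(f"Stacking Accuracy: {stacking_accuracy:.4f}")
```

Unlike bagging's simple vote, here the meta-learner learns how much to trust each base model, which is the key talking point when comparing stacking to the other two methods.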