[ML_8] Ensemble: Voting, Bagging, Boosting(+)

2025. 8. 22. 13:45 | python/ML

Ensemble Method Definition

An ensemble method is a machine learning approach that combines multiple individual models (often weak learners) to produce a stronger, more accurate predictive model.
The key idea is that aggregating the predictions of several models can reduce variance or bias and improve generalization compared to any single model.

 

Representative Types of Ensemble Methods

  1. Voting
    • Concept: Voting combines predictions from multiple models of different types (e.g., logistic regression, decision tree, k-nearest neighbors) that are all trained on the same dataset.
    • How it works:
      • Hard Voting: Each model votes for a class label, and the class with the majority votes is selected.
      • Soft Voting: Each model outputs class probabilities, and the probabilities are averaged to make the final decision.
    • Key Point: Voting is a model-agnostic ensemble technique since it can combine any type of algorithm.
#%% 
#### Voting classifier 
from sklearn.ensemble import VotingClassifier 
from sklearn.linear_model import LogisticRegression 
from sklearn.neighbors import KNeighborsClassifier 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score 
from sklearn.datasets import load_breast_cancer 
import pandas as pd 

bc = load_breast_cancer() 
data_df = pd.DataFrame(bc.data, columns=bc.feature_names)  # only for inspecting the features

#%% 
# Two base models of different types, trained on the same data
lr_ml = LogisticRegression(solver='liblinear') 
kc_ml = KNeighborsClassifier(n_neighbors=8) 

# Soft voting: average the class probabilities of both models
vo_clf = VotingClassifier(estimators=[('LR', lr_ml), ('KNN', kc_ml)], voting='soft') 

X_train, X_test, y_train, y_test = train_test_split(bc.data, bc.target, test_size=0.2, random_state=156) 

vo_clf.fit(X_train, y_train) 
y_pred = vo_clf.predict(X_test) 
acc_score = accuracy_score(y_test, y_pred) 
print(f"voting acc_score : {acc_score}") 

# Compare the ensemble against each base model on its own
classifiers = [lr_ml, kc_ml] 

for classifier in classifiers: 
    classifier.fit(X_train, y_train) 
    y_pred = classifier.predict(X_test) 
    class_name = classifier.__class__.__name__ 
    print(f"class_name : {class_name}, acc_score : {accuracy_score(y_test, y_pred)}")

  2. Bagging (Bootstrap Aggregating)
    • Concept: Bagging uses the same algorithm (e.g., decision trees) but trains each model on a different bootstrapped sample of the data.
    • Bootstrapping: Sampling with replacement, which means some data points may appear multiple times in a sample while others may not appear at all.
    • How it works:
      • Each model (e.g., tree) is trained independently on its own random sample.
      • The final prediction is made by averaging (for regression) or voting (for classification) the outputs of all models.
    • Example: Random Forest is a well-known bagging-based ensemble of decision trees (see the sketch below).
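As a concrete illustration, here is a minimal sketch (not from the original post) that applies bagging to the same breast-cancer split as above. The estimator keyword assumes scikit-learn >= 1.2 (earlier versions call it base_estimator), and n_estimators=100 is an arbitrary illustrative choice.

#%% 
#### Bagging classifier (illustrative sketch) 
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier 
from sklearn.tree import DecisionTreeClassifier 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score 
from sklearn.datasets import load_breast_cancer 

bc = load_breast_cancer() 
X_train, X_test, y_train, y_test = train_test_split(bc.data, bc.target, test_size=0.2, random_state=156) 

# 100 decision trees, each trained on its own bootstrap sample (sampling with replacement)
bag_ml = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=156) 
bag_ml.fit(X_train, y_train) 
print(f"bagging acc_score : {accuracy_score(y_test, bag_ml.predict(X_test))}") 

# Random Forest = bagged trees plus a random subset of features at each split
rf_ml = RandomForestClassifier(n_estimators=100, random_state=156) 
rf_ml.fit(X_train, y_train) 
print(f"random forest acc_score : {accuracy_score(y_test, rf_ml.predict(X_test))}")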

  3. Boosting
    • Concept: Boosting trains models sequentially, where each new model focuses on correcting the errors of the previous models.
    • How it works:
      • The first model is trained on the dataset.
      • Each subsequent model is trained with higher weights on the data points that previous models misclassified or predicted with high error.
      • Final predictions are made by combining all models with weighted voting or averaging.
    • Key Intuition: Boosting “boosts” weak learners by focusing more on difficult cases.
    • Examples: AdaBoost, Gradient Boosting, XGBoost, LightGBM (see the sketch below).
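The sketch below (again illustrative, not code from the post) fits two boosting ensembles from scikit-learn on the same breast-cancer split; the hyperparameter values are arbitrary assumptions.

#%% 
#### Boosting classifiers (illustrative sketch) 
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier 
from sklearn.model_selection import train_test_split 
from sklearn.metrics import accuracy_score 
from sklearn.datasets import load_breast_cancer 

bc = load_breast_cancer() 
X_train, X_test, y_train, y_test = train_test_split(bc.data, bc.target, test_size=0.2, random_state=156) 

# AdaBoost: each round up-weights the samples the previous weak learners got wrong
ada_ml = AdaBoostClassifier(n_estimators=100, random_state=156) 
ada_ml.fit(X_train, y_train) 
print(f"adaboost acc_score : {accuracy_score(y_test, ada_ml.predict(X_test))}") 

# Gradient Boosting: each new tree is fit to the remaining errors (the loss gradient) of the ensemble
gb_ml = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=156) 
gb_ml.fit(X_train, y_train) 
print(f"gradient boosting acc_score : {accuracy_score(y_test, gb_ml.predict(X_test))}")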

Ref :
- Ensemble Learning — Voting and Bagging with Python: https://medium.com/@chyun55555/ensemble-learning-voting-and-bagging-with-python-40de683b8ff0
- Comparing Model Ensembling: Bagging, Boosting, and Stacking (NBD Lite #7): https://www.nb-data.com/p/comparing-model-ensembling-bagging
