Ensemble Method Definition
An ensemble method is a machine learning approach that combines multiple weak learners (individual models) to produce a stronger and more accurate predictive model.
The key idea is that by aggregating the predictions of several models, the ensemble can reduce variance and bias and generalize better than any single model.
Representative Types of Ensemble Methods
- Voting
- Concept: Voting combines predictions from multiple models of different types (e.g., logistic regression, decision tree, k-nearest neighbors) that are all trained on the same dataset.
- How it works:
- Hard Voting: Each model votes for a class label, and the class that receives the most votes is selected.
- Soft Voting: Each model outputs class probabilities, and the probabilities are averaged to make the final decision.
- Key Point: Voting is a model-agnostic ensemble technique since it can combine any type of algorithm.
#%%
#### Voting classifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load the breast cancer dataset and keep the features in a DataFrame for inspection
bc = load_breast_cancer()
data_df = pd.DataFrame(bc.data, columns=bc.feature_names)
#%%
# Base estimators: a logistic regression and a k-nearest neighbors classifier
lr_ml = LogisticRegression(solver='liblinear')
kc_ml = KNeighborsClassifier(n_neighbors=8)

# Soft voting: average the predicted class probabilities of the base estimators
vo_clf = VotingClassifier(estimators=[('LR', lr_ml), ('KNN', kc_ml)], voting='soft')

X_train, X_test, y_train, y_test = train_test_split(
    bc.data, bc.target, test_size=0.2, random_state=156)

vo_clf.fit(X_train, y_train)
y_pred = vo_clf.predict(X_test)
acc_score = accuracy_score(y_test, y_pred)
print(f"voting acc_score : {acc_score}")

# Compare the ensemble against each base estimator trained on its own
classifiers = [lr_ml, kc_ml]
for classifier in classifiers:
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)
    class_name = classifier.__class__.__name__
    print(f"class_name : {class_name} , acc_score : {accuracy_score(y_test, y_pred)}")
- Bagging (Bootstrap Aggregating)
- Concept: Bagging uses the same algorithm (e.g., decision trees) but trains each model on a different bootstrapped sample of the data.
- Bootstrapping: Sampling with replacement, which means some data points may appear multiple times in a sample while others may not appear at all.
- How it works:
- Each model (e.g., tree) is trained independently on its own random sample.
- The final prediction is made by averaging (for regression) or voting (for classification) the outputs of all models.
- Example: Random Forest is a well-known bagging-based ensemble of decision trees; a short scikit-learn sketch follows this list.
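As a minimal, illustrative sketch (not part of the original post), the following trains a bagging ensemble of decision trees and a Random Forest on the same breast cancer split used above; the n_estimators and random_state values are arbitrary choices, not tuned settings.
#%%
#### Bagging sketch (illustrative): reuses X_train/X_test/y_train/y_test from above
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: 100 decision trees, each fit on a bootstrap sample (sampling with replacement)
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=156)
bag_clf.fit(X_train, y_train)
print(f"bagging acc_score : {accuracy_score(y_test, bag_clf.predict(X_test))}")

# Random Forest: bagged decision trees plus a random feature subset at each split
rf_clf = RandomForestClassifier(n_estimators=100, random_state=156)
rf_clf.fit(X_train, y_train)
print(f"random forest acc_score : {accuracy_score(y_test, rf_clf.predict(X_test))}")
Because this is classification, the predictions are combined by voting; the corresponding Regressor classes average the outputs instead.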
- Boosting
- Concept: Boosting trains models sequentially, where each new model focuses on correcting the errors of the previous models.
- How it works:
- The first model is trained on the dataset.
- Subsequent models are trained with higher weights assigned to the data points that previous models misclassified or predicted with large error.
- Final predictions are made by combining all models with weighted voting or averaging.
- Key Intuition: Boosting “boosts” weak learners by focusing more on difficult cases.
- Examples: AdaBoost, Gradient Boosting, XGBoost, LightGBM; a short scikit-learn sketch follows this list.
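As a minimal, illustrative sketch (not part of the original post), the following fits AdaBoost and Gradient Boosting classifiers from scikit-learn on the same train/test split; the hyperparameter values are arbitrary illustrations, not tuned choices.
#%%
#### Boosting sketch (illustrative): reuses the same train/test split as above
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# AdaBoost: re-weights misclassified samples so later weak learners focus on them
ada_clf = AdaBoostClassifier(n_estimators=100, random_state=156)
ada_clf.fit(X_train, y_train)
print(f"adaboost acc_score : {accuracy_score(y_test, ada_clf.predict(X_test))}")

# Gradient Boosting: each new tree is fit to the residual errors of the current ensemble
gb_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=156)
gb_clf.fit(X_train, y_train)
print(f"gradient boosting acc_score : {accuracy_score(y_test, gb_clf.predict(X_test))}")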
Ref :
- Ensemble Learning — Voting and Bagging with Python (medium.com): https://medium.com/@chyun55555/ensemble-learning-voting-and-bagging-with-python-40de683b8ff0
- Comparing Model Ensembling: Bagging, Boosting, and Stacking - NBD Lite #7 (nb-data.com): https://www.nb-data.com/p/comparing-model-ensembling-bagging