This time, let's take a look at Ensemble Methods.
An ensemble method combines multiple predictive models to achieve better performance than any single model could on its own.
This raises prediction accuracy, improves model stability, and helps reduce overfitting.
The Purpose of Ensemble Methods
So what exactly do we gain by using an ensemble method?
- Better predictive performance: combining several models achieves higher prediction accuracy than any individual model.
- Reduced overfitting: combining the outputs of diverse models keeps any single model from overfitting the training data.
- Improved stability: ensembles reduce model variance and make predictions more consistent.
Types of Ensemble Methods
There are three main kinds of ensemble methods. Let's walk through each of them in detail below.
- Bagging
- Boosting
- Stacking
Bagging
Bagging (Bootstrap Aggregating) trains multiple models in parallel and determines the final prediction by averaging their outputs or by majority vote.
How Bagging Works
Bootstrap sampling
- Draw several samples at random, with replacement, from the original dataset. Each model trains on a different sample, which gives the ensemble its diversity.
Training individual models
- Train one model on each bootstrap sample. Every model uses the same algorithm but is trained on different data.
Combining predictions
- Decide the final prediction by averaging all models' predictions or by majority vote. In this step the individual models' errors tend to cancel out, which improves predictive performance. (A minimal from-scratch sketch of this procedure follows the list below.)
Representative algorithm
- Random Forest: an algorithm that combines many decision trees in a bagging fashion.
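To make the procedure concrete, here is a minimal from-scratch sketch of bagging, assuming decision trees as base learners and majority voting for classification; the number of models and the variable names are illustrative choices, not part of the original post.
# Minimal bagging sketch: bootstrap sampling + majority vote (illustrative)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # illustrative ensemble size
models = []
for _ in range(n_models):
    # Bootstrap sampling: draw training indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=42)
    tree.fit(X_train[idx], y_train[idx])
    models.append(tree)

# Combine: majority vote over all trees' predictions
all_preds = np.stack([m.predict(X_test) for m in models])   # shape (n_models, n_samples)
y_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, all_preds)
print('Bagging accuracy:', (y_pred == y_test).mean())
Scikit-learn's BaggingClassifier wraps this same loop, and Random Forest additionally subsamples the features considered at each split.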
Boosting
Boosting trains models sequentially, assigning larger weights to the samples the previous model mispredicted so that each new model compensates for the errors of the ones before it.
How Boosting Works
Training the initial model
- Train the first model. The initial model is trained on the full dataset.
Increasing the weights of misclassified samples
- Increase the weights of the samples the first model got wrong. This steers subsequent models toward learning from those errors.
Sequential model training
- Train the next model on the re-weighted samples. Repeating this process trains the models one after another.
Combining predictions
- Produce the final prediction as a weighted average of all models' predictions, where each model's weight reflects its performance. (A minimal sketch of this weight-update loop follows the algorithm list below.)
Representative algorithms
- AdaBoost
- Gradient Boosting
- XGBoost
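As a rough illustration of the re-weighting loop, here is a minimal AdaBoost-style sketch using decision stumps on a synthetic binary problem; the update formulas follow classic discrete AdaBoost, and the dataset, round count, and variable names are illustrative assumptions.
# Minimal AdaBoost-style sketch for binary classification (illustrative)
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=42)
y = np.where(y == 0, -1, 1)            # AdaBoost convention: labels in {-1, +1}

n_rounds = 20
w = np.full(len(X), 1 / len(X))        # start from uniform sample weights
stumps, alphas = [], []
for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1, random_state=42)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w * (pred != y)) / np.sum(w)         # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))   # this model's vote weight
    w *= np.exp(-alpha * y * pred)     # raise weights of misclassified samples
    w /= w.sum()                       # renormalize to a distribution
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted sum of all stump votes
agg = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print('Training accuracy:', (np.sign(agg) == y).mean())
Gradient Boosting and XGBoost follow the same sequential idea but fit each new model to the gradient of the loss instead of re-weighting samples directly.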
Stacking
Stacking trains a meta-model that takes the predictions of several base models as its input.
How Stacking Works
Training the base models
- Train several base models. Each base model is trained independently, and they can use different algorithms.
Training the meta-model
- Train a meta-model that takes the base models' predictions as input. The meta-model combines the base models' outputs into the final prediction.
Final prediction
- Use the meta-model's output as the final prediction. This lets the ensemble exploit the strengths of each base model.
Representative algorithm
- A representative example is stacking with a multi-layer perceptron (MLP) as the meta-model. (A minimal manual sketch of the two stages follows below.)
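To show the two stages explicitly, here is a minimal manual stacking sketch in which the base models' predicted class probabilities on a held-out split become the meta-model's training features; the three-way split sizes and model choices are illustrative. In practice, out-of-fold predictions (as scikit-learn's StackingClassifier computes internally; see the example code later) make better use of the data and avoid leakage.
# Minimal manual stacking sketch: base-model outputs feed a meta-model (illustrative)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
# Three-way split: train the base models, train the meta-model, then test
X_base, X_rest, y_base, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
X_meta, X_test, y_meta, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=42)

# Stage 1: train the base models independently
base_models = [RandomForestClassifier(n_estimators=50, random_state=42),
               SVC(probability=True, random_state=42)]
for m in base_models:
    m.fit(X_base, y_base)

# Stage 2: the base models' predicted probabilities become meta-features
meta_features = np.hstack([m.predict_proba(X_meta) for m in base_models])
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(meta_features, y_meta)

# Final prediction: run the test data through both stages
test_features = np.hstack([m.predict_proba(X_test) for m in base_models])
y_pred = meta_model.predict(test_features)
print('Stacking accuracy:', (y_pred == y_test).mean())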
Pros and Cons of Ensemble Methods
Advantages of ensemble methods
- Better predictive performance: combining the predictions of several models yields higher accuracy.
- Reduced overfitting: combining the outputs of diverse models keeps individual models from overfitting.
- Improved stability: ensembles reduce model variance and make predictions more consistent.
Disadvantages of ensemble methods
- Increased complexity: training and combining multiple models can be complicated, and designing and tuning the ensemble can be tricky.
- Harder to interpret: the results are harder to explain than a single model's; complex ensembles such as stacking are especially opaque.
- Computational cost: training multiple models takes considerable time and compute, particularly on large datasets.
Ensemble Method Example Code
Bagging Example
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create the Random Forest model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
rf.fit(X_train, y_train)
# Make predictions
y_pred = rf.predict(X_test)
# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 7))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
# Print the classification report
print(classification_report(y_test, y_pred, target_names=iris.target_names))
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
Boosting Example - Gradient Boosting
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create the Gradient Boosting model
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
# Train the model
gb.fit(X_train, y_train)
# Make predictions
y_pred = gb.predict(X_test)
# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 7))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
# Print the classification report
print(classification_report(y_test, y_pred, target_names=iris.target_names))
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45
Stacking Example
# Import the required libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Define the base models
estimators = [
    ('rf', RandomForestClassifier(n_estimators=10, random_state=42)),
    ('svc', SVC(kernel='rbf', probability=True, random_state=42))
]
# Create the stacking model
stacking = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression())
# Train the model
stacking.fit(X_train, y_train)
# Make predictions
y_pred = stacking.predict(X_test)
# Compute the accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
# Visualize the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 7))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix')
plt.show()
# Print the classification report
print(classification_report(y_test, y_pred, target_names=iris.target_names))
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        19
  versicolor       1.00      1.00      1.00        13
   virginica       1.00      1.00      1.00        13

    accuracy                           1.00        45
   macro avg       1.00      1.00      1.00        45
weighted avg       1.00      1.00      1.00        45