[혼공머신] Tree Ensembles - Extra Trees

Extra Trees

Extra Trees works very much like a random forest and trains 100 decision trees by default.

  • Like a random forest, this model supports most decision tree parameters and splits each node using a randomly selected subset of the features.
  • The main difference from a random forest is that Extra Trees does not use bootstrap samples.
  • That is, each decision tree is built from the entire training set. Instead, when splitting a node, it does not search for the best split but splits at random (in scikit-learn, a random threshold is drawn for each candidate feature and the best of those random splits is used).
  • In fact, setting the splitter parameter of DecisionTreeClassifier to 'random', as done earlier, is exactly the approach Extra Trees uses.
  • Because each tree splits on features at random, the performance of an individual tree may drop.
  • However, ensembling many such trees helps prevent overfitting and can raise the validation score.
  • In scikit-learn, Extra Trees is provided by the ExtraTreesClassifier class.
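import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_validate

# train_input and train_target are assumed to be the training split prepared earlier.
et = ExtraTreesClassifier(n_jobs=-1, random_state=42)
scores = cross_validate(et, train_input, train_target, return_train_score=True, n_jobs=-1)

# Mean training score, then mean cross-validation score
print(np.mean(scores['train_score']), np.mean(scores['test_score']))

# 0.9974503966084433 0.8887848893166506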
  • The cross-validation score of the Extra Trees model is similar to that of the random forest.
  • This example has only a few features, so the difference between the two models is small.
  • In general, Extra Trees is more random, so it may need to train more decision trees than a random forest.
  • On the other hand, splitting nodes at random makes it faster to compute.
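The comparison above can be tried on any dataset. The following is a minimal sketch, not the measurement from this post: it assumes synthetic data from make_classification instead of the wine data, and the names X, y, models, and results are illustrative. cross_validate also returns fit_time per fold, which gives a rough view of training cost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (an assumption; the post itself uses a wine dataset).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=42)

models = {
    "random forest": RandomForestClassifier(random_state=42),
    "extra trees": ExtraTreesClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation by default; fit_time is measured per fold.
    scores = cross_validate(model, X, y, return_train_score=True)
    results[name] = {
        "train": np.mean(scores["train_score"]),
        "test": np.mean(scores["test_score"]),
        "fit_time": np.mean(scores["fit_time"]),
    }
    print(name, results[name])
```

Scores and timings depend on the data and the machine, so treat the printed numbers as indicative only.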
Like a random forest, Extra Trees also provides feature importances.
The features are ordered [alcohol, sugar, pH]. The result shows that Extra Trees, too, relies less on sugar than a single decision tree does.
et.fit(train_input, train_target)
print(et.feature_importances_)

# [0.20183568 0.52242907 0.27573525]
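To make the printed array easier to read, the values can be paired with feature labels and sorted. This is a small sketch: the labels are only display names following the [alcohol, sugar, pH] order stated in the text, and the numbers are copied from the output above rather than recomputed.

```python
# Display labels in the feature order given in the text (illustrative names).
feature_names = ["alcohol", "sugar", "pH"]
# Importances copied from the printed output of et.feature_importances_.
importances = [0.20183568, 0.52242907, 0.27573525]

# Sort features from most to least important.
ranked = sorted(zip(feature_names, importances),
                key=lambda pair: pair[1], reverse=True)
for name, value in ranked:
    print(f"{name}: {value:.3f}")

# sugar: 0.522
# pH: 0.276
# alcohol: 0.202
```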
  • The regression version of Extra Trees is the ExtraTreesRegressor class.
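ExtraTreesRegressor is used the same way as the classifier. The sketch below assumes synthetic data from make_regression, since the post does not include a regression dataset. The names X, y, and etr are illustrative. For regressors, the default score reported by cross_validate is R^2.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.model_selection import cross_validate

# Synthetic regression data (an assumption, not part of the original example).
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

etr = ExtraTreesRegressor(n_jobs=-1, random_state=42)
scores = cross_validate(etr, X, y, return_train_score=True)

# Mean R^2 on the training folds, then on the validation folds
print(np.mean(scores["train_score"]), np.mean(scores["test_score"]))

# Feature importances are available after fitting, as with the classifier.
etr.fit(X, y)
print(etr.feature_importances_)
```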

Summary

Extra Trees

  • Extra Trees: similar to a random forest, but reduces overfitting by not using bootstrap samples and by splitting nodes at random.
  • Key parameters:
    • n_estimators, criterion, max_depth, min_samples_split, max_features: same as random forest
    • bootstrap: whether to use bootstrap samples (default: False)
    • oob_score: whether to evaluate the model on OOB samples (default: False; requires bootstrap=True)
    • n_jobs: number of CPU cores to use for parallel execution (default: None, which means 1)
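The defaults listed above can be checked directly on fitted models. This is a minimal sketch on synthetic data (an assumption, as are the names X, y, et, rf, and et_oob). It also shows that OOB scoring only works once bootstrap sampling is switched on.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic stand-in data (an assumption; any classification data works).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

et = ExtraTreesClassifier(random_state=42).fit(X, y)
rf = RandomForestClassifier(random_state=42).fit(X, y)

# Extra Trees skips bootstrap sampling by default; random forest uses it.
print(et.bootstrap, rf.bootstrap)
# False True

# Each fitted tree records its split strategy.
print(et.estimators_[0].splitter, rf.estimators_[0].splitter)
# random best

# Both ensembles train 100 trees by default.
print(len(et.estimators_), len(rf.estimators_))
# 100 100

# OOB evaluation needs left-out samples, so bootstrap must be enabled with it.
et_oob = ExtraTreesClassifier(bootstrap=True, oob_score=True,
                              random_state=42).fit(X, y)
print(et_oob.oob_score_)
```

With bootstrap=False, every tree sees all training rows, so there are no out-of-bag samples to score on. That is why oob_score=True is only valid together with bootstrap=True.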