[ML] Support Vector Machine (SVM)
์ด๋ฒˆ์—๋Š” Support Vector Machine (์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹ )์— ๋ฐํ•˜์—ฌ ํ•œ๋ฒˆ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹ (Support Vector Machine, SVM)์€ ๋ณต์žกํ•œ ๋ฐ์ดํ„ฐ์…‹์—์„œ๋„ ํšจ๊ณผ์ ์ธ ๋ถ„๋ฅ˜๋ฅผ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋Š” ๊ฐ•๋ ฅํ•œ ์ง€๋„ํ•™์Šต ์•Œ๊ณ ๋ฆฌ์ฆ˜์ž…๋‹ˆ๋‹ค. ์ด ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„๋ฆฌํ•˜๋Š” ์ตœ์ ์˜ ์ดˆํ‰๋ฉด(๊ฒฐ์ • ๊ฒฝ๊ณ„)์„ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ์„ ๋ชฉํ‘œ๋กœ ํ•ฉ๋‹ˆ๋‹ค.

SVM์˜ ์ฃผ์š” ํŠน์ง•๊ณผ ์›๋ฆฌ๋ฅผ ์ž์„ธํžˆ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.


Support Vector Machine (SVM)์˜ ์ฃผ์š” ํŠน์ง•

  • ๊ฒฐ์ • ์ดˆํ‰๋ฉด(Decision Hyperplane): ๋‘ ํด๋ž˜์Šค๋ฅผ ๋ถ„๋ฆฌํ•˜๋Š” ๊ฐ€์žฅ ์ข‹์€ ์ดˆํ‰๋ฉด์„ ์ฐพ์Šต๋‹ˆ๋‹ค. ์ด ํ‰๋ฉด์€ ๋‘ ํด๋ž˜์Šค ๊ฐ„์˜ ๋งˆ์ง„(๊ฑฐ๋ฆฌ)์„ ์ตœ๋Œ€ํ™”ํ•ฉ๋‹ˆ๋‹ค.
    • w: ์ดˆํ‰๋ฉด์˜ ๋ฒ•์„  ๋ฒกํ„ฐ, x: ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ, b: ์ ˆํŽธ
    • w * x + b = 0
  • ์„œํฌํŠธ ๋ฒกํ„ฐ(Support Vectors): ๊ฒฐ์ • ๊ฒฝ๊ณ„์— ๊ฐ€์žฅ ๊ฐ€๊นŒ์ด ์œ„์น˜ํ•œ ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ๋“ค๋กœ, ๊ฒฝ๊ณ„์˜ ์œ„์น˜์™€ ๋ฐฉํ–ฅ์„ ๊ฒฐ์ •ํ•˜๋Š” ๋ฐ ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.
    • ||w||: ๋ฒ•์„  ๋ฒกํ„ฐ์˜ ํฌ๊ธฐ
    • M = 2 / ||w||
  • ๋งˆ์ง„ ์ตœ๋Œ€ํ™”(Maximizing Margin): ์„œํฌํŠธ ๋ฒกํ„ฐ์™€ ๊ฒฐ์ • ์ดˆํ‰๋ฉด ์‚ฌ์ด์˜ ๊ฑฐ๋ฆฌ(๋งˆ์ง„)์„ ์ตœ๋Œ€ํ™”ํ•˜์—ฌ, ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ๋†’์ž…๋‹ˆ๋‹ค.

yi: ๋ฐ์ดํ„ฐ ํฌ์ธํŠธ xi์˜ ๋ ˆ์ด๋ธ”


SVM์˜ ๊ธฐ๋ณธ ์›๋ฆฌ


  • ์ดˆํ‰๋ฉด(Hyperplane):
    • w⋅x + b = 0 ํ˜•ํƒœ์˜ ์„ ํ˜• ๋ฐฉ์ •์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ๋ฅผ ๋ถ„๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
    • ww๋Š” ์ดˆํ‰๋ฉด์˜ ๋ฒ•์„  ๋ฒกํ„ฐ์ด๊ณ , ๋Š” ์ ˆํŽธ์ž…๋‹ˆ๋‹ค.
  • ๋งˆ์ง„(Margin):
    • ์ดˆํ‰๋ฉด์œผ๋กœ๋ถ€ํ„ฐ ์„œํฌํŠธ ๋ฒกํ„ฐ๊นŒ์ง€์˜ ์ตœ์†Œ ๊ฑฐ๋ฆฌ์ž…๋‹ˆ๋‹ค.
    • ๋งˆ์ง„์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ชจ๋ธ์˜ ๋ชฉํ‘œ์ž…๋‹ˆ๋‹ค.
  • ์ปค๋„ ํŠธ๋ฆญ(Kernel Trick):
    • ์„ ํ˜• ๋ถ„๋ฆฌ๊ฐ€ ๋ถˆ๊ฐ€๋Šฅํ•œ ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•ด์„œ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ๋งคํ•‘ํ•˜์—ฌ ์„ ํ˜•์ ์œผ๋กœ ๋ถ„๋ฆฌํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
    • ์ปค๋„ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์˜ ๋‚ด์ ์„ ๊ฐ„์ ‘์ ์œผ๋กœ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.

Kernel Trick (์ปค๋„ ํŠธ๋ฆญ)

  • SVM์€ ๋ณธ์งˆ์ ์œผ๋กœ ์„ ํ˜• ๋ถ„๋ฆฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ž…๋‹ˆ๋‹ค.
  • ๋น„์„ ํ˜• ๋ฐ์ดํ„ฐ์— ์ปค๋„ํŠธ๋ฆญ์„ ์‚ฌ์šฉํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์„ ํ˜•์ ์œผ๋กœ ๋ถ„๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ๋งคํ•‘์„ ๋ช…์‹œ์ ์œผ๋กœ ์ˆ˜ํ–‰ํ•˜์ง€ ์•Š๊ณ ๋„ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์—์„œ์˜ ๋‚ด์ (Inner Product)์„ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๊ฒŒ ํ•ฉ๋‹ˆ๋‹ค.
  • ์—ฌ๊ธฐ์„œ ์ปค๋„์€? ๋ฐ์ดํ„ฐ๋ฅผ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ๋งคํ•‘ํ•˜๋Š” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.

์ฃผ์š” Kernel Function (์ปค๋„ ํ•จ์ˆ˜)

Kernel Function์€ ์ฃผ๋กœ 4๊ฐœ์˜ ํ•จ์ˆ˜๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ํ•œ๋ฒˆ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

1. Linear Kernel

  • K(x, z) = x ⋅ z
  • The plain inner product, i.e. no mapping at all; appropriate when the data is already linearly separable.
2. Polynomial Kernel

  • K(x, z) = (x ⋅ z + c)^d, where c is a constant and d is the degree of the polynomial.

3. RBF ์ปค๋„ (Radial Basis Function, Gaussian Kernel)

  • ์—ฌ๊ธฐ์„œ, γ๋Š” ์ปค๋Ÿด ํ•จ์ˆ˜์˜ ํญ์„ ์กฐ์ ˆ ํ•˜๋Š” ์—ญํ•  ์ž…๋‹ˆ๋‹ค.

4. Sigmoid Kernel

  • K(x, z) = tanh(a x ⋅ z + c)
  • Here, a and c are hyperparameters.


SVM์˜ ์žฅ, ๋‹จ์ 

์žฅ์ 

  • ๊ณ ์ฐจ์› ๋ฐ์ดํ„ฐ์—์„œ๋„ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.
  • ๋งˆ์ง„ ์ตœ๋Œ€ํ™”๋ฅผ ํ†ตํ•œ ๋†’์€ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ๋ณด์ž…๋‹ˆ๋‹ค.
  • ๋น„์„ ํ˜• ๋ถ„๋ฅ˜๊ฐ€ ๊ฐ€๋Šฅํ•œ ์ปค๋„ ํŠธ๋ฆญ์˜ ํ™œ์šฉ์ด ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค.

๋‹จ์ 

  • ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์—์„œ๋Š” ํ•™์Šต ์‹œ๊ฐ„์ด ๊ธธ์–ด์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ปค๋„์˜ ์„ ํƒ๊ณผ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ ์„ค์ •์ด ์ค‘์š”ํ•˜๋ฉฐ, ์ด์— ๋”ฐ๋ผ ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ํฌ๊ฒŒ ์˜ํ–ฅ์„ ๋ฐ›์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ธฐ ์œ„ํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜์™€ ํ•จ๊ป˜ ์ •ํ™•ํ•œ ๋ชจ๋ธ ํŠœ๋‹์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

SVM Example Code

# ์„œํฌํŠธ ๋ฒกํ„ฐ ๋จธ์‹  (SVM) ์˜ˆ์ œ

# ํ•„์š”ํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์ž„ํฌํŠธ
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt


# ์™€์ธ ๋ฐ์ดํ„ฐ์…‹ ๋กœ๋“œ
wine = load_wine()
X, y = wine.data, wine.target

# ๋ฐ์ดํ„ฐ์…‹์„ ํ•™์Šต ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋ถ„ํ• 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ๋ฐ์ดํ„ฐ ํ‘œ์ค€ํ™”
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the SVM model
svm = SVC(kernel='linear', random_state=42)
svm.fit(X_train, y_train)

# ์˜ˆ์ธก ๋ฐ ํ‰๊ฐ€
y_pred = svm.predict(X_test)
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        14
           1       1.00      0.93      0.96        14
           2       0.89      1.00      0.94         8

    accuracy                           0.97        36
   macro avg       0.96      0.98      0.97        36
weighted avg       0.98      0.97      0.97        36

# Visualize the confusion matrix
ConfusionMatrixDisplay.from_estimator(svm, X_test, y_test)
plt.title("SVM Confusion Matrix")
plt.show()