[ML] Logistic Regression
In this post, we take a look at logistic regression.

Logistic Regression

Logistic regression is a statistical model used mainly for binary classification problems.
It predicts the probability that a binary dependent variable takes the positive value by passing a linear combination of the input (independent) variables through the logistic function.


Key Characteristics of Logistic Regression

  • Classification algorithm: It is used mainly for binary classification, and it can be extended to multiclass classification problems.
  • Probability output: The prediction is output as a probability between 0 and 1.
  • Difference from linear regression: Linear regression predicts a continuous value, whereas logistic regression predicts the probability of a binary outcome, which is then mapped to a class label.

Basic Principles of Logistic Regression

So what are the basic principles behind logistic regression? Let's go through them.

1. Linear Model

Logistic regression first computes a linear combination of the input variables X and the weights W. This linear combination is expressed as follows.

\[ z = W^{\top} X + b \]

Here b is the bias. The value z is then fed into the logistic function and converted into a probability.
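
As a minimal sketch of the linear combination described above, assuming NumPy and made-up values for the weights and bias (the function name `linear_combination` and all numbers are illustrative, not from the original post), z can be computed for a small batch of inputs like this:

```python
import numpy as np

def linear_combination(X, W, b):
    """Return z = X @ W + b, one value per row (sample) of X."""
    return X @ W + b

# Hypothetical example values, chosen only for illustration.
X = np.array([[1.0, 2.0],
              [3.0, -1.0]])   # two samples, two features
W = np.array([0.5, -0.25])    # one weight per feature
b = 0.1                       # bias term

z = linear_combination(X, W, b)
print(z)
```

Each row of `X` is one sample, so `X @ W` applies the same weights to every sample and `b` shifts all results by the same amount.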

 

 

2. Logistic Function (Sigmoid Function)

The result of the linear combination, z, is passed through the logistic function to obtain a probability between 0 and 1. The logistic function is defined as follows.

\[ \sigma(z) = \frac{1}{1 + e^{-z}} \]

As the input increases, the output of this function changes smoothly from 0 to 1, tracing an S-shaped curve.

3. Decision Boundary

Classification uses an output probability of 0.5 as the threshold: if the output probability is 0.5 or higher, the input is assigned to the positive class (1); if it is below 0.5, it is assigned to the negative class (0).

Because the probability equals 0.5 exactly where z = 0, this decision boundary is linear in the input variables. The threshold itself can be adjusted to suit the characteristics of the data.
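
The thresholding rule can be sketched as follows, assuming NumPy. The function name `classify`, the sample probabilities, and the stricter threshold of 0.7 are hypothetical values used only to show the effect of moving the threshold:

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    """Assign class 1 where probability >= threshold, otherwise class 0."""
    return (np.asarray(probabilities) >= threshold).astype(int)

probs = np.array([0.10, 0.50, 0.49, 0.80])

labels_default = classify(probs)                 # default threshold of 0.5
labels_strict = classify(probs, threshold=0.7)   # hypothetical stricter threshold

print(labels_default)
print(labels_strict)
```

Raising the threshold makes the model more conservative about predicting the positive class, which trades recall for precision.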


 

Advantages and Disadvantages of Logistic Regression

Advantages of Logistic Regression

  • Interpretability: Because the output of logistic regression is a probability, the result is easy to interpret. For example, an output of 0.8 for a given input can be read as an 80% probability of belonging to the positive class.
  • Efficiency: The computational cost is low, and training is fast even on large datasets. This makes it a practical choice when data is plentiful.

Disadvantages of Logistic Regression

  • Linear decision boundary: Logistic regression assumes that the log-odds of the dependent variable are a linear function of the input variables. Performance can therefore degrade on data whose classes cannot be separated by a linear boundary. In such cases, it may be necessary to add polynomial features or use kernel methods.
  • Limited to binary classification by default: Logistic regression is fundamentally suited to binary classification. To solve multiclass problems, it can be extended with techniques such as softmax regression. Softmax regression learns a separate set of weights for each class and normalizes the resulting scores so that the class probabilities sum to 1, which lets it handle all classes at once.
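
To illustrate the softmax extension mentioned above, here is a minimal sketch assuming NumPy. The weights, biases, and inputs are hypothetical numbers for a problem with two features and three classes; they are not fitted to any data:

```python
import numpy as np

def softmax(scores):
    """Convert per-class scores into probabilities that sum to 1 per row."""
    scores = np.asarray(scores, dtype=float)
    # Subtracting the row maximum does not change the result
    # but avoids overflow in np.exp.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

# Hypothetical parameters: one weight column and one bias per class.
X = np.array([[1.0, 2.0],
              [0.0, -1.0]])
W = np.array([[0.2, -0.1, 0.4],
              [0.5, 0.3, -0.2]])
b = np.array([0.0, 0.1, -0.1])

class_probs = softmax(X @ W + b)          # shape: (samples, classes)
predicted = class_probs.argmax(axis=1)    # most probable class per sample

print(class_probs)
print(predicted)
```

With two classes, softmax reduces to the logistic function applied to the difference of the two scores, which is why it is regarded as the multiclass generalization of logistic regression.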

Applications of Logistic Regression

Logistic regression is used in a wide range of fields, including medical diagnosis, marketing analysis, and credit score prediction.

Because the model is simple yet delivers effective predictive performance, it is a method favored by many data scientists and analysts.


Logistic Regression Example Code

So how do we write code that uses logistic regression? Let's look at an example.
# Import the required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Load the breast cancer dataset
cancer = load_breast_cancer()
X, y = cancer.data, cancer.target

# ๋ฐ์ดํ„ฐ์…‹์„ ํ•™์Šต ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋ถ„ํ• 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ๋ฐ์ดํ„ฐ ํ‘œ์ค€ํ™”
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train the logistic regression model
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)

# Predict and evaluate
y_pred = log_reg.predict(X_test)
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support

           0       0.98      0.95      0.96        43
           1       0.97      0.99      0.98        71

    accuracy                           0.97       114
   macro avg       0.97      0.97      0.97       114
weighted avg       0.97      0.97      0.97       114
# ํ˜ผ๋™ ํ–‰๋ ฌ ์‹œ๊ฐํ™”
ConfusionMatrixDisplay.from_estimator(log_reg, X_test, y_test)
plt.title("Logistic Regression Confusion Matrix")
plt.show()
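
The fitted model also exposes the class probabilities discussed earlier through `predict_proba`. The following self-contained sketch repeats the setup of the example (same dataset, split, and scaling) and compares a manual 0.5 threshold with `predict`. The variable names `proba`, `positive_proba`, and `manual_pred` are illustrative additions, not part of the original example:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Same data preparation as in the example above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)

# Column k of predict_proba corresponds to log_reg.classes_[k].
proba = log_reg.predict_proba(X_test)
positive_proba = proba[:, 1]

# Thresholding the positive-class probability at 0.5 by hand.
manual_pred = (positive_proba > 0.5).astype(int)
agreement = np.mean(manual_pred == log_reg.predict(X_test))

print(proba[:3])
print("agreement with predict():", agreement)
```

Working with `positive_proba` directly makes it possible to try thresholds other than 0.5 without retraining the model.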