[ML] Neural Network - Multilayer Perceptron
This time, we will look at the Multilayer Perceptron (MLP).


๋‹ค์ธต ํผ์…‰ํŠธ๋ก (Multilayer Perceptron, MLP)

๋‹ค์ธต ํผ์…‰ํŠธ๋ก (Multilayer Perceptron, MLP)์€ ๊ธฐ๋ณธ์ ์ธ ์ธ๊ณต ์‹ ๊ฒฝ๋ง์˜ ํ˜•ํƒœ ์ค‘ ํ•˜๋‚˜๋กœ, ํŠนํžˆ ๋ณต์žกํ•œ ๋น„์„ ํ˜• ๊ด€๊ณ„์™€ ํŒจํ„ด์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋Š” ๋Šฅ๋ ฅ์œผ๋กœ ์ธํ•ด ๋ถ„๋ฅ˜ ๋ฐ ํšŒ๊ท€ ๋ฌธ์ œ์— ๋„๋ฆฌ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.


Its basic structure is a feedforward neural network consisting of an input layer, one or more hidden layers, and an output layer, where each layer is made up of multiple neurons.

๋˜ํ•œ ๊ฐ ๋‰ด๋Ÿฐ์€ ์ด์ „์ธต์˜ ๋‰ด๋Ÿฐ์œผ๋กœ๋ถ€ํ„ฐ ์ž…๋ ฅ์„ ๋ฐ›์•„ ๊ฐ€์ค‘์น˜๋ฅผ ์ ์šฉํ•˜๊ณ , ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ํ†ตํ•ด ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.


๋‹ค์ธต ํผ์…‰ํŠธ๋ก (Multilayer Perceptron, MLP)์˜ ๊ตฌ์กฐ

๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๊ตฌ์กฐ๋Š” 3๊ฐ€์ง€๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

Input Layer

  • ๋ชจ๋ธ์— ๋ฐ์ดํ„ฐ๋ฅผ ์ž…๋ ฅํ•˜๋Š” ์ตœ์ดˆ์˜ ์ธต.
  • ๊ฐ ์ž…๋ ฅ ๋‰ด๋Ÿฐ์€ ํ•˜๋‚˜์˜ ํŠน์„ฑ(feature)๋ฅผ ๋Œ€ํ‘œํ•ฉ๋‹ˆ๋‹ค.

Hidden Layers

  • One or more hidden layers sit between the input layer and the output layer.
  • They extract complex patterns and features from the data.
  • Each neuron takes the outputs of the previous layer as input and processes them through weights and an activation function.

Output Layer

  • ์ตœ์ข…์ ์ธ ์˜ˆ์ธก ๊ฒฐ๊ณผ๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  • ๋ถ„๋ฅ˜ ๋ฌธ์ œ์˜ ๊ฒฝ์šฐ, ๊ฐ ๋‰ด๋Ÿฐ์€ ํŠน์ • ํด๋ž˜์Šค๋ฅผ ๋Œ€ํ‘œํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํšŒ๊ท€ ๋ฌธ์ œ์˜ ๊ฒฝ์šฐ ํ•˜๋‚˜ ๋˜๋Š” ์—ฌ๋Ÿฌ ๊ฐ’์œผ๋กœ ๊ตฌ์„ฑ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‹ค์ธต ํผ์…‰ํŠธ๋ก (Multilayer Perceptron, MLP)์˜ ๊ธฐ๋ณธ ์›๋ฆฌ

์œ„์—์„œ๋Š” ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๊ตฌ์กฐ๊ฐ€ ์–ด๋–ป๊ฒŒ ๊ตฌ์„ฑ๋˜๋Š”์ง€ ์•Œ์•˜์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋ฉด ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์€ ์–ด๋– ํ•œ ์›๋ฆฌ๋กœ ์ž‘๋™ํ• ๊นŒ์š”?


Neuron

๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๊ธฐ๋ณธ ๋‹จ์œ„๋Š” ๋‰ด๋Ÿฐ์œผ๋กœ, ์ด๋Š” ์ธ๊ฐ„์˜ ์‹ ๊ฒฝ ์„ธํฌ๋ฅผ ๋ชจ๋ฐฉํ•œ ๊ฐœ๋…์ž…๋‹ˆ๋‹ค.

๊ฐ ๋‰ด๋Ÿฐ์€ ์—ฌ๋Ÿฌ ์ž…๋ ฅ ๊ฐ’์„ ๋ฐ›์•„๋“ค์—ฌ ๊ฐ€์ค‘์น˜(weight)๋ฅผ ์ ์šฉํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ์„ ํ˜• ๊ฒฐํ•ฉํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ ์„ ํ˜• ๊ฒฐํ•ฉ์˜ ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค

z = w₁x₁ + w₂x₂ + ⋯ + wₙxₙ + b

  • Here wᵢ are the weights, xᵢ the input values, and b the bias (offset) term.
  • The result z of this linear combination is then passed to an activation function, which produces the neuron's output.
  • The activation function's output a is passed on to the neurons of the next layer.
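
Below is a minimal numpy sketch of a single neuron; the input values, weights, and bias are arbitrary example numbers, and sigmoid is used as the activation:

import numpy as np

# Arbitrary example inputs, weights, and bias for one neuron
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.4, 0.7, -0.2])   # weights w_i
b = 0.1                          # bias b

z = np.dot(w, x) + b             # linear combination z = sum(w_i * x_i) + b
a = 1 / (1 + np.exp(-z))         # sigmoid activation -> neuron output a
print(z, a)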

Activation Function

The activation function transforms the neuron's output nonlinearly, which is what allows the model to learn complex patterns.

  • ๋Œ€ํ‘œ์ ์ธ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋กœ๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜(sigmoid), ํ•˜์ดํผ๋ณผ๋ฆญ ํƒ„์  ํŠธ ํ•จ์ˆ˜(tanh), ๋ ๋ฃจ ํ•จ์ˆ˜(ReLU)๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜: ์ถœ๋ ฅ ๊ฐ’์„ 0๊ณผ 1 ์‚ฌ์ด๋กœ ์ œํ•œํ•˜๋ฉฐ, ์ˆ˜์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค.

Forward Propagation

Forward propagation is the process by which input data starts at the input layer, passes through the hidden layers, and arrives at the output layer.

  • ๊ฐ ์ธต์˜ ๋‰ด๋Ÿฐ์€ ์ด์ „ ์ธต์—์„œ ์ „๋‹ฌ๋œ ๊ฐ’์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ ๊ณ„์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , ๊ทธ ์ถœ๋ ฅ์„ ๋‹ค์Œ ์ธต์œผ๋กœ ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
  • ์ด ๊ณผ์ •์—์„œ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๊ฐ€ ๋ชจ๋ธ์˜ ๊ณ„์ธต์„ ๋”ฐ๋ผ ํ๋ฅด๋ฉฐ, ์ตœ์ข…์ ์œผ๋กœ ์˜ˆ์ธก ๊ฐ’์„ ์ถœ๋ ฅํ•˜๊ฒŒ ๋ฉ๋‹ˆ๋‹ค.


Loss Function

The loss function measures the difference between the model's predictions and the actual values, and is used to evaluate the model's performance.

๋Œ€ํ‘œ์ ์ธ ์†์‹ค ํ•จ์ˆ˜๋กœ๋Š” ํ‰๊ท ์ œ๊ณฑ์˜ค์ฐจ(MSE)์™€ ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ์†์‹ค(Cross-Entropy Loss)๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.

  • Mean squared error (MSE): commonly used for regression problems; it is the average of the squared differences between predicted and actual values.

  • ๊ต์ฐจ ์—”ํŠธ๋กœํ”ผ ์†์‹ค: ๋ถ„๋ฅ˜ ๋ฌธ์ œ์—์„œ ์ฃผ๋กœ ์‚ฌ์šฉ๋˜๋ฉฐ, ์˜ˆ์ธก ํ™•๋ฅ ๊ณผ ์‹ค์ œ ํด๋ž˜์Šค ๊ฐ„์˜ ๋ถˆ์ผ์น˜๋ฅผ ์ธก์ •ํ•ฉ๋‹ˆ๋‹ค.


Backpropagation

Backpropagation is the process of adjusting the model's weights and biases in order to minimize the loss function.

  • ์ด ๊ณผ์ •์€ ๊ฐ ์ธต์˜ ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค์— ๋Œ€ํ•œ ์†์‹ค ํ•จ์ˆ˜์˜ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ณ„์‚ฐํ•˜๊ณ , ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•(Gradient Descent)์„ ์‚ฌ์šฉํ•ด ๊ฐ€์ค‘์น˜์™€ ๋ฐ”์ด์–ด์Šค๋ฅผ ์—…๋ฐ์ดํŠธํ•˜์—ฌ ํ•™์Šต์„ ์ง„ํ–‰ํ•ฉ๋‹ˆ๋‹ค.

๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ์žฅ์  & ๋‹จ์ 


๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ์žฅ์ 

  • Nonlinearity: nonlinear activation functions let it learn complex patterns and relationships.
  • Expressive power: multiple hidden layers give it high expressive power, enabling it to solve a wide variety of problems.
  • Flexibility: with different structures and activation functions it can handle many types of data.

๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๋‹จ์ 

  • Computational cost: training takes a long time on large datasets, and the computational cost is high.
  • Overfitting: if the model becomes too complex, it is likely to overfit the training data.
  • Hard to interpret: with many parameters and hidden layers, it is difficult to interpret what the model is doing internally.

๋‹ค์ธต ํผ์…‰ํŠธ๋ก (Multilayer Perceptron, MLP) Example Code

# ๋‹ค์ธต ํผ์…‰ํŠธ๋ก  (Multilayer Perceptron) ์˜ˆ์ œ

# Import the required libraries
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
# Load the MNIST dataset (pinning the version and requesting plain numpy
# arrays instead of a DataFrame keeps the rest of the script simple)
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data / 255., mnist.target
# ๋ฐ์ดํ„ฐ์…‹์„ ํ•™์Šต ์„ธํŠธ์™€ ํ…Œ์ŠคํŠธ ์„ธํŠธ๋กœ ๋ถ„ํ• 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ๋ฐ์ดํ„ฐ ํ‘œ์ค€ํ™”
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ๋‹ค์ธต ํผ์…‰ํŠธ๋ก  ๋ชจ๋ธ ํ•™์Šต
mlp = MLPClassifier(hidden_layer_sizes=(30,), max_iter=20, alpha=1e-4,
                    solver='sgd', verbose=10, random_state=42,
                    learning_rate_init=0.1)
mlp.fit(X_train, y_train)

# ์˜ˆ์ธก ๋ฐ ํ‰๊ฐ€
y_pred = mlp.predict(X_test)
print(classification_report(y_test, y_pred))
Output:

Iteration 1, loss = 0.32666430
Iteration 2, loss = 0.25799440
Iteration 3, loss = 0.20402879
Iteration 4, loss = 0.17531073
Iteration 5, loss = 0.14613715
Iteration 6, loss = 0.13942319
Iteration 7, loss = 0.13021557
Iteration 8, loss = 0.13022002
Iteration 9, loss = 0.12556882
Iteration 10, loss = 0.11247478
Iteration 11, loss = 0.10460484
Iteration 12, loss = 0.11144142
Iteration 13, loss = 0.11057812
Iteration 14, loss = 0.11260484
Iteration 15, loss = 0.11193568
Iteration 16, loss = 0.13083183
Iteration 17, loss = 0.13530305
Iteration 18, loss = 0.11458551
Iteration 19, loss = 0.12796077
Iteration 20, loss = 0.11219598
              precision    recall  f1-score   support

           0       0.97      0.97      0.97      1343
           1       0.97      0.98      0.97      1600
           2       0.95      0.94      0.94      1380
           3       0.95      0.93      0.94      1433
           4       0.96      0.95      0.95      1295
           5       0.93      0.93      0.93      1273
           6       0.96      0.97      0.96      1396
           7       0.96      0.96      0.96      1503
           8       0.92      0.93      0.93      1357
           9       0.93      0.95      0.94      1420

    accuracy                           0.95     14000
   macro avg       0.95      0.95      0.95     14000
weighted avg       0.95      0.95      0.95     14000
# ํ˜ผ๋™ ํ–‰๋ ฌ ์‹œ๊ฐํ™”
ConfusionMatrixDisplay.from_estimator(mlp, X_test, y_test)
plt.title("MLP Confusion Matrix")
plt.show()