[NLP] Inference-Based Methods & Neural Networks
  • In this post, we will take a look at inference-based methods and Neural Networks.

 

ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•์˜ ๋ฌธ์ œ์ 

Recently, methods for representing words as vectors can broadly be divided into two categories: 'statistical-based methods' and 'inference-based methods'.
  • The two approaches obtain word meanings in different ways, but both are grounded in the distributional hypothesis.
  • Statistical-based methods represent a word based on the frequencies of its surrounding words.
  • Concretely, a Co-Occurrence Matrix of the words is built, and Singular Value Decomposition (SVD) is applied to that matrix to obtain dense vectors.
  • However, this approach runs into problems when dealing with a large Corpus.
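As a small sketch of this statistical pipeline: the co-occurrence counts below are hand-built for the sentence "You say goodbye and I say hello ." with a window size of 1 (the period counted as a word), which are illustrative assumptions rather than anything computed here.

```python
import numpy as np

# Hand-built Co-Occurrence Matrix (window size 1) for
# "You say goodbye and I say hello ." - the period is counted as a word.
# Row/column order: you, say, goodbye, and, i, hello, .
C = np.array([
    [0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 0],
], dtype=np.float64)

# SVD factorizes C into U, S, V; truncating U to its first k columns
# yields a k-dimensional dense vector per word.
U, S, V = np.linalg.svd(C)
word_vecs = U[:, :2]    # 2-dimensional dense representations

print(word_vecs.shape)  # (7, 2)
```

Note that this whole computation touches the entire matrix at once, which is exactly what becomes expensive as the Corpus grows.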

 

  • First, statistical-based methods use statistics over the entire Corpus (the Co-Occurrence Matrix, PPMI) to obtain distributed representations of words in a single batch process (e.g., SVD).
  • Inference-based methods, i.e., those using a Neural Network, typically learn with Mini-Batches.
    • In Mini-Batch learning, the Neural Network repeatedly learns from a small batch of training samples at a time, updating its Weights after each batch.
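The Mini-Batch update cycle can be sketched as follows; the model here (plain linear regression with an MSE loss), the batch size, and the learning rate are illustrative choices, not anything specific to word vectors.

```python
import numpy as np

# Illustrative Mini-Batch SGD loop on a toy linear-regression problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))          # 100 samples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])         # targets from a known weight vector

W = np.zeros(3)                            # weights to learn
batch_size, lr = 10, 0.1

for epoch in range(50):
    idx = rng.permutation(len(X))          # shuffle once per epoch
    for i in range(0, len(X), batch_size):
        batch = idx[i:i + batch_size]      # pick a small batch of samples
        xb, yb = X[batch], y[batch]
        grad = 2 * xb.T @ (xb @ W - yb) / len(batch)  # gradient of the MSE loss
        W -= lr * grad                     # update weights after each Mini-Batch

print(W)   # approaches [1.0, -2.0, 0.5]
```

Each weight update only ever sees `batch_size` samples, so memory use stays flat no matter how large the dataset is, and independent batches can be processed in parallel.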

Comparing Statistical-Based & Inference-Based Methods

  • Inference-based methods can train a Neural Network even when the Corpus vocabulary is so large that processing it all in one go is infeasible.
  • This is because the data is split into small chunks for learning. Parallel computation is also possible, which can speed up training.
  • A representative example of an inference-based method is Word2Vec. I have linked an explanatory post below for anyone who wants to read further.
 

[NLP] Word2Vec, CBOW, Skip-Gram - Concepts & Model

1. What is Word2Vec? Word2Vec is a popular algorithm used to convert words into vectors. Here, a word is usually a 'Token'. This algorithm maps the semantic relationships between words (Tokens) into a vector space…

daehyun-bigbread.tistory.com


Overview of Inference-Based Methods

์ถ”๋ก  ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•์—์„œ๋Š” '์ถ”๋ก '์ด ์ฃผ๋œ ์ž‘์—…์ž…๋‹ˆ๋‹ค. 
  • ์ถ”๋ก ์ด๋ž€? ์ฃผ๋ณ€ ๋‹จ์–ด(๋งฅ๋žต)์ด ์ฃผ์–ด์กŒ์„๋•Œ "?" ์— ๋ฌด์Šจ ๋‹จ์–ด๊ฐ€ ๋“ค์–ด๊ฐ€๋Š”์ง€๋ฅผ ์ถ”์ธกํ•˜๋Š” ์ž‘์—…์ž…๋‹ˆ๋‹ค.

์ฃผ๋ณ€ ๋‹จ์–ด๋“ค์„ ๋งฅ๋žต์œผ๋กœ ์‚ฌ์šฉํ•ด "?"์— ๋“ค์–ด๊ฐˆ ๋‹จ์–ด๋ฅผ ์ถ”์ธกํ•œ๋‹ค

  • ์œ„์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ ์ถ”๋ก  ๋ฌธ์ œ๋ฅผ ํ’€๊ณ  ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ด '์ถ”๋ก  ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•'์ด ๋‹ค๋ฃจ๋Š” ๋ฌธ์ œ์ž…๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ์ถ”๋ก  ๋ฌธ์ œ๋ฅผ ๋ฐ˜๋ณตํ•ด์„œ ํ’€๋ฉด์„œ ๋‹จ์–ด์˜ ์ถœํ˜„ ํŒจํ„ด์„ ํ•™์Šตํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.

  • Here, the model is a Neural Network: it receives the context information and outputs the probability of each word appearing.
  • Within this framework, the Corpus is used to train the model to make correct guesses.
  • As a result of that training, we obtain distributed representations of words; this is the overall flow of inference-based methods.
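A minimal sketch of that setup: the network takes context words and outputs one probability per vocabulary word. The layer sizes are illustrative, and the weights are random, i.e. this is an untrained model.

```python
import numpy as np

# Untrained sketch: context words in, a probability for every word out.
rng = np.random.default_rng(42)
vocab_size, hidden_size = 7, 3

W_in = rng.standard_normal((vocab_size, hidden_size))   # input-side weights
W_out = rng.standard_normal((hidden_size, vocab_size))  # output-side weights

def softmax(x):
    e = np.exp(x - x.max())            # subtract the max for numerical stability
    return e / e.sum()

# Context: the words with IDs 0 and 2 on either side of the "?" position
c0 = np.eye(vocab_size)[0]
c1 = np.eye(vocab_size)[2]

h = 0.5 * (c0 @ W_in + c1 @ W_in)      # average the two context projections
scores = h @ W_out
probs = softmax(scores)                # one probability per vocabulary word

print(probs.sum())                     # ≈ 1.0
```

Training would then adjust `W_in` and `W_out` so that the probability mass lands on the correct word; the rows of the learned `W_in` end up being the distributed representations.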

Handling Words in a Neural Network

From here on, we will process 'words' with a Neural Network.
  • However, a Neural Network cannot process words like "you" or "say" as they are.
  • So words must be converted into "fixed-length vectors". A representative way to do this is to convert them into One-Hot Vectors.
A One-Hot Vector is a vector in which exactly one element is 1 and all the rest are 0.
  • Let's look at an example.
"You say goodbye and I say hello" - example sentence
  • The example sentence contains 7 word tokens: "You", "say", "goodbye", "and", "I", "say", "hello" ("say" appears twice).
  • The One-Hot representations of two of these words are shown in the figure.
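The conversion itself can be sketched with a small helper; `convert_one_hot` is a hypothetical name, and the ID assignment (by order of first appearance, with a 7-word vocabulary as in the figures here) is an assumption for illustration.

```python
import numpy as np

# Hypothetical helper: convert word IDs to One-Hot rows via np.eye.
def convert_one_hot(word_ids, vocab_size):
    return np.eye(vocab_size, dtype=np.int32)[word_ids]

# Assumed IDs by order of first appearance:
# you=0, say=1, goodbye=2, and=3, i=4, hello=5
one_hot = convert_one_hot([0, 1], 7)
print(one_hot)
# [[1 0 0 0 0 0 0]
#  [0 1 0 0 0 0 0]]
```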

  • ์œ„์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ ๋‹จ์–ด๋Š” Text, ๋‹จ์–ด ID, ๊ทธ๋ฆฌ๊ณ  One-Hot ํ‘œํ˜„ ํ˜•ํƒœ๋กœ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ด ์–ดํœ˜์ˆ˜๋งŒํผ ์›์†Œ๋ฅผ ๊ฐ€์ง€๋Š” Vector๋ฅผ ์ค€๋น„ํ›„, Index๊ฐ€ ๋‹จ์–ด ID์™€ ๊ฐ™์€ ์›์†Œ๋ฅผ 1, ๋‚˜๋จธ์ง€๋Š” ๋ชจ๋‘ 0์œผ๋กœ ์„ค์ •ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด์ฒ˜๋Ÿผ ๋‹จ์–ด๋ฅผ ๊ณ ์ • ๊ธธ์ด ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•˜๋ฉด Neural Network(์‹ ๊ฒฝ๋ง)์˜ Input Layer๋Š” ์•„๋ž˜์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ Neuron(๋‰ด๋Ÿฐ)์˜ ์ˆ˜๋ฅผ ๊ณ ์ •ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Input Layer์˜ Neuron: ๊ฐ Neuron์ด ๊ฐ ๋‹จ์–ด์— ๋Œ€์‘ (๊ฒ€์€์ƒ‰:1, ํฐ์ƒ‰:0)

  • ์œ„์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ Input Layer(์ž…๋ ฅ์ธต)์˜ ๋‰ด๋Ÿฐ์€ 7๊ฐœ ์ž…๋‹ˆ๋‹ค.
  • ์ด 7๊ฐœ์˜ ๋‰ด๋Ÿฐ์€ ์ฐจ๋ก€๋กœ 7๊ฐœ์˜ ๋‹จ์–ด๋“ค์— ๋Œ€์‘ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์‹ ๊ฒฝ๋ง์˜ Layer๋Š” Vector ์ฒ˜๋ฆฌ๊ฐ€ ๊ฐ€๋Šฅํ•˜๋ฏ€๋กœ, One-Hot ํ‘œํ˜„์œผ๋กœ ๋œ ๋‹จ์–ด๋ฅผ Fully-Connected Layer(์™„์ „-์—ฐ๊ฒฐ ๊ณ„์ธต)์„ ํ†ตํ•ด ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Fully-Connected Layer์— ์˜ํ•œ ๋ณ€ํ™˜: Input Layer์˜ ๊ฐ Neuron์€ 7๊ฐœ ๋‹จ์–ด ๊ฐ๊ฐ์— ๋Œ€์‘

  • Because this is a Fully-Connected Layer, each node is connected to every node in the neighboring layer by an arrow.
  • Each arrow carries a Weight, and the weighted sum of the Input Layer Neurons becomes the value of each Hidden Layer Neuron.

Fully-Connected Layer์— ์˜ํ•œ ๋ณ€ํ™˜์„ ๋‹จ์ˆœํ™” ํ•œ ๊ทธ๋ฆผ์ž…๋‹ˆ๋‹ค.


Fully-Connected Layer์— ์˜ํ•œ ๋ณ€ํ™˜ Code (by Python)

Fully-Connected Layer์— ์˜ํ•œ ๋ณ€ํ™˜๋œ ์ฝ”๋“œ๋Š” ์ด๋ ‡๊ฒŒ ์ ์„์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
import numpy as np
c = np.array([[1, 0, 0, 0, 0, 0, 0]]) # Input(์ž…๋ ฅ)
W = np.random.randn(7, 3) # Weight(๊ฐ€์ค‘์น˜)
h = np.matmul(c, W) # ์ค‘๊ฐ„ ๋…ธ๋“œ

print(h)
# [[-0.70012195 0.25204755 -0.79774592]]
  • This code represents the word with ID 0 as a One-Hot Vector and then transforms it by passing it through a Fully-Connected Layer.
  • c is the One-Hot representation: only the element corresponding to the word ID is 1, and the rest are 0.

Multiplying the context c by the Weight matrix W extracts the row vector at the corresponding position. The magnitude of each Weight is shown by the shading.
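This row-extraction behavior is easy to verify directly: the matrix product of a One-Hot vector and W simply selects one row of W.

```python
import numpy as np

# A One-Hot input times W just selects the corresponding row of W.
W = np.random.randn(7, 3)
c = np.array([[1, 0, 0, 0, 0, 0, 0]])  # One-Hot for word ID 0

h = np.matmul(c, W)
print(np.allclose(h, W[0]))  # True: h equals the 0th row of W
```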

  • The Fully-Connected Layer transformation implemented above can also be performed with a MatMul layer.
import sys
sys.path.append('..')
import numpy as np
from common.layers import MatMul

c = np.array([[1, 0, 0, 0, 0, 0, 0]])
W = np.random.randn(7, 3)
layer = MatMul(W)
h = layer.forward(c)

print(h)
# [[-0.70012195 0.25204755 -0.79774592]]
  • This code sets the Weight W on the MatMul Layer and calls its forward() Method to perform Forward Propagation.

MatMul Layer

 ๊ธฐ๋ณธ์ ์ธ ์‹ ๊ฒฝ๋ง Layer์—์„œ ํ–‰๋ ฌ ๊ณฑ์…ˆ ์—ฐ์‚ฐ์„ ์ˆ˜ํ–‰ํ•˜๊ณ , Backpropagation(์—ญ์ „ํŒŒ)๋ฅผ ํ†ตํ•ด Gradient(๊ธฐ์šธ๊ธฐ)๋ฅผ ๊ณ„์‚ฐํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.
import numpy as np

class MatMul:
    def __init__(self, W):
        """
        Initializer. Receives the weight matrix W.

        Parameters:
        W (numpy.ndarray): weight matrix
        """
        self.params = [W]                   # store the weight matrix W as a parameter
        self.grads = [np.zeros_like(W)]     # initialize the gradient as a zero matrix shaped like W
        self.x = None                       # placeholder for the input x

    def forward(self, x):
        """
        Forward propagation. Computes the matrix product of the input x and the weights W.

        Parameters:
        x (numpy.ndarray): input data

        Returns:
        out (numpy.ndarray): matrix product of the input and the weights
        """
        W, = self.params                    # unpack the weight matrix W from the parameters
        out = np.dot(x, W)                  # matrix product of the input x and the weights W
        self.x = x                          # store the input x for use in backward()
        return out                          # return the result

    def backward(self, dout):
        """
        Backward propagation. Receives the upstream gradient dout and computes
        the gradients with respect to the input and the weights.

        Parameters:
        dout (numpy.ndarray): upstream gradient

        Returns:
        dx (numpy.ndarray): gradient with respect to the input
        """
        W, = self.params                    # unpack the weight matrix W from the parameters
        dx = np.dot(dout, W.T)              # multiply dout by W transposed to get the input gradient
        dW = np.dot(self.x.T, dout)         # multiply x transposed by dout to get the weight gradient
        self.grads[0][...] = dW             # store the weight gradient in the grads list (in place)
        return dx                           # return the input gradient
  • __init__: the class initializer Method; it receives the Weight matrix W and stores it in the params list.
    • It also stores a zero matrix of the same shape as W in the grads list to initialize the Gradient.
    • x is initialized so that the input can be stored later during forward propagation.
  • forward: the Method that performs Forward Propagation.
    • It computes the matrix product of the input x and the Weight W and returns the result.
    • Along the way, it stores the input x in an instance variable so that backward propagation can use it later.
  • backward: the Method that performs Back Propagation.
    • Given the upstream Gradient dout, it computes the Gradient dx with respect to the input and dW with respect to the Weight.
    • The computed dW is stored in the grads list, and dx is returned.