[NLP] The Vanilla RNN Model and the Long-Term Dependency Problem

1. Limitations of the Vanilla RNN Model

RNN๋ถ€๋ถ„์„ ์„ค๋ช…ํ•œ ๊ธ€์—์„œ ๊ธฐ๋ณธ RNN Model์„ ์•Œ์•„๋ณด๊ณ  ๊ตฌํ˜„ํ•ด ๋ณด์•˜์Šต๋‹ˆ๋‹ค.
  • ๋ณดํ†ต RNN Model์„ ๊ฐ€์žฅ ๋‹จ์ˆœํ•œ ํ˜•ํƒœ์˜ RNN ์ด๋ผ๊ณ  ํ•˜๋ฉฐ ๋ฐ”๋‹๋ผ RNN (Vanilla RNN)์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทผ๋ฐ, Vanilla RNN ๋ชจ๋ธ์— ๋‹จ์ ์œผ๋กœ ์ธํ•˜์—ฌ, ๊ทธ ๋‹จ์ ๋“ค์„ ๊ทน๋ณตํ•˜๊ธฐ ์œ„ํ•œ ๋‹ค์–‘ํ•œ RNN ๋ณ€ํ˜• Model์ด ๋‚˜์™”์Šต๋‹ˆ๋‹ค.
  • ๋Œ€ํ‘œ์ ์œผ๋กœ LSTM, GRU ๋ชจ๋ธ์ด ์žˆ๋Š”๋ฐ, ์ผ๋‹จ ์ด๋ฒˆ๊ธ€์—์„œ๋Š” LSTM Model์— ๋Œ€ํ•œ ์„ค๋ช…์„ ํ•˜๊ณ , ๋‹ค์Œ ๊ธ€์—์„œ๋Š” GRU Model์— ๋Œ€ํ•˜์—ฌ ์„ค๋ช…์„ ํ•˜๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.
Vanilla RNN์€ ์ด์ „์˜ ๊ณ„์‚ฐ ๊ฒฐ๊ณผ์— ์˜์กดํ•˜์—ฌ ์ถœ๋ ฅ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด ๋ƒ…๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ๋ฐฉ์‹์€ Vanilla RNN์€ ์งง์€ Sequence์—๋Š” ํšจ๊ณผ๊ฐ€ ์žˆ์ง€๋งŒ, ๊ธด Sequnce๋Š” ์ •๋ณด๊ฐ€ ์ž˜ ์ „๋‹ฌ์ด ์•ˆ๋˜๋Š” ํ˜„์ƒ์ด ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.
  • ์ด์œ ๋Š” RNN์€ ์‹œํ€€์Šค ๊ธธ์ด๊ฐ€ ๊ธธ์–ด์งˆ์ˆ˜๋ก ์ •๋ณด ์••์ถ•์— ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
  • RNN์„ ์„ค๋ช…ํ•œ ๊ธ€์—์„œ๋„ ๋‚˜์™€์žˆ๋“ฏ์ด, RNN์€ ์ž…๋ ฅ ์ •๋ณด๋ฅผ ์ฐจ๋ก€๋Œ€๋กœ ์ฒ˜๋ฆฌํ•˜๊ณ  ์˜ค๋ž˜ ์ „์— ์ฝ์—ˆ๋˜ ๋‹จ์–ด๋Š” ์žŠ์–ด๋ฒ„๋ฆฌ๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

Long-Term Dependency ํ˜„์ƒ: ์ž…๋ ฅ ์ •๋ณด๋ฅผ ์ฐจ๋ก€๋Œ€๋กœ ์ฒ˜๋ฆฌํ•˜๊ณ  ์˜ค๋ž˜ ์ „์— ์ฝ์—ˆ๋˜ ๋‹จ์–ด๋Š” ์žŠ์–ด๋ฒ„๋ฆฌ๋Š” ๊ฒฝํ–ฅ

์œ„์˜ ๊ทธ๋ฆผ์„ ๋ณด๋ฉด, Vanilla RNN Model์ด ๋Œ์•„๊ฐˆ๋•Œ ์ฒซ๋ฒˆ์งธ๋กœ ๋“ค์–ด๊ฐ€๋Š” input๊ฐ’ ๊ฒ€์€์ƒ‰์ด ์ง„ํ•˜๊ฒŒ ๋ณด์ด๋Š”๊ฑด, ๋“ค์–ด๊ฐ€๋Š” ๊ฐ’์˜ ๊ธฐ์–ต๋ ฅ์„ ๋‚˜ํƒ€๋‚ธ๊ฒƒ์ž…๋‹ˆ๋‹ค. 
  • ๊ทผ๋ฐ, ์ ์ฐจ ๋‹ค์Œ ์‹œ์ ์œผ๋กœ ๋„˜์–ด๊ฐˆ์ˆ˜๋ก ์ƒ‰๊ฐˆ์ด ํ๋ ค์ง€๋Š”๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ด๊ฒƒ์€ ๋‹ค์Œ ์‹œ์ ์œผ๋กœ ๋„˜์–ด๊ฐ€๋ฉด ๋„˜์–ด๊ฐˆ์ˆ˜๋ก Vanilla RNN Model์˜ ์ฒซ๋ฒˆ์งธ input์ด ๊ฐ’์ด ์†์‹ค๋˜์–ด์ง€๋Š” ๊ณผ์ •์„ ๋‚˜ํƒ€๋‚ด์—ˆ์Šต๋‹ˆ๋‹ค. ๋‹ค๋ฅธ๋ง๋กœ๋Š” ์ ์  ๊นŒ๋จน๋Š”๋‹ค ๋ผ๊ณ  ๋ณผ์ˆ˜๋„ ์žˆ๊ฒ ๋„ค์š”.
  • ๊ฒฐ๋ก ์€ ์ฒซ๋ฒˆ์งธ๋กœ ๋“ค์–ด๊ฐ„ input์˜ ๊ธฐ์–ต๋ ฅ์€ ์ ์  ์†์‹ค๋˜๊ณ , ๋งŒ์•ฝ ์‹œ์ ์ด ๊ธธ๋‹ค๊ณ  ํ•˜๋ฉด ์ด RNN ๋ชจ๋ธ์— ๋Œ€ํ•œ ์˜ํ–ฅ๋ ฅ์€ ์ ์  ์‚ฌ๋ผ์ง„๋‹ค๊ณ  ๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

2. The Long-Term Dependency Problem

์ด ํ˜„์ƒ์„ Long-Term Dependency (์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ) ํ˜„์ƒ์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • Long-Term Dependency ํ˜„์ƒ์€ Time step์ด ์ง€๋‚˜๋ฉด์„œ ์ž…๋ ฅ๊ฐ’์˜ ์˜ํ–ฅ๋ ฅ์ด ์ ์  ๊ฐ์†Œํ•œ๋‹ค.
  • Sequence์˜ ๊ธธ์ด๊ฐ€ ๊ธธ์–ด์งˆ์ˆ˜๋ก, ์˜ค๋ž˜ ์ „ ์ž…๋ ฅ ๊ฐ’์˜ ์ •๋ณด๋ฅผ ์ œ๋Œ€๋กœ ๋ฐ˜์˜ํ•˜์ง€ ๋ชปํ•œ๋‹ค..
๐Ÿ’ก Long-Term Dependency ํ˜„์ƒ
ํ•œ๋ฒˆ ์˜ˆ์‹œ๋กœ ๋“ค์–ด์„œ ์„ค๋ช…ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • Long-Term Dependency ํ˜„์ƒ์ด ๋‚˜ํƒ€๋‚˜๋ฉด์„œ ๋ฌธ์ œ๊ฐ€ ๋˜๋Š” ๊ฒฝ์šฐ๋Š” ์ฒซ๋ฒˆ์งธ๋‚˜ ์•ž์ชฝ์— ๋“ค์–ด๊ฐ„ input์ด ์ค‘์š”ํ•œ ์ •๋ณด ์ผ๋•Œ ์ž…๋‹ˆ๋‹ค.
๐Ÿ’ก example of Long-Term Dependency ํ˜„์ƒ
"์š”์ฆ˜ ์ผ๋ณธ๊ฐ€๋Š” ๋น„ํ–‰๊ธฐ๊ฐ€ ์ธ์ฒœ๊ณตํ•ญ์—์„œ ๋‚˜๋ฆฌํƒ€ ๊ณตํ•ญ ๊ฐ€๋Š” ๋น„ํ–‰๊ธฐ๊ฐ’์ด ๊น€ํฌ๊ณตํ•ญ์—์„œ ํ•˜๋„ค๋‹ค ๊ณตํ•ญ ๊ฐ€๋Š” ๋น„ํ–‰๊ธฐ ๊ฐ’๋ณด๋‹ค ๋” ๋น„์‹ธ๋”๋ผ. ๊ทธ๋Ÿฌ๋ฉด ์‹ผ ๊ณณ์œผ๋กœ ๊ฐ€์•ผ๊ฒ ๋Š”๋ฐ. ๊ทธ๋ž˜์„œ ๋‚˜๋Š”         "
  • ๊ธ€์„ ๋ณด๋ฉด ๋นˆ์นธ์— ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์žฅ์†Œ์— ๋Œ€ํ•œ ๊ธฐ์–ต์ด ์žˆ์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทผ๋ฐ, ์žฅ์†Œ ์ •๋ณด์— ๋Œ€ํ•œ "์ธ์ฒœ๊ณตํ•ญ"์€ ์•ž์ชฝ์— ์œ„์น˜ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • RNN Model์ด ์ถฉ๋ถ„ํ•œ ๊ธฐ์–ต๋ ฅ์ด ์—†๋‹ค๋ฉด? ์•„๋งˆ ๋‹ค์Œ ๋‹จ์–ด๋ฅผ ์—‰๋šฑํ•˜๊ฒŒ ์˜ˆ์ธก ํ•  ๊ฒƒ ์ž…๋‹ˆ๋‹ค.
  • ์ด ํ˜„์ƒ์„ Long-Term Dependency ํ˜„์ƒ. ์žฅ๊ธฐ ์˜์กด์„ฑ ๋ฌธ์ œ ํ˜„์ƒ ์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  RNN Model์ด ๋™์ž‘ํ•˜๋Š” ๊ณผ์ •์„ ํ•œ๋ฒˆ ๋ณด์—ฌ์ฃผ๋ฉด ์ดํ•ด๋ฅผ ํ•˜๋Š”๋ฐ ๋„์›€์ด ๋ ๊ฒธ, ํ•œ๋ฒˆ Train & Test ํ•˜๋Š” ๊ณผ์ •์„ ์„ค๋ช…ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

3. RNN Training & Test Example

RNN Training Example

An RNN model is typically trained character by character to generate strings in the same style as its training dataset.
  • For example, to produce the word "hello", we feed "h" into the RNN model as the first input and train it to generate "ello".
  • The training data is given as characters, along with a vocabulary (Vocab).
- Example training sequence: "hello"
- Vocabulary: [h, e, l, o]
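The vocabulary lookup and the one-hot encoding used throughout this example can be written in a few lines of plain Python (no assumptions beyond the [h, e, l, o] vocabulary above):

```python
# Map each character in the vocabulary to an index, then encode a
# character as a one-hot vector: 1 at its index, 0 everywhere else.
vocab = ['h', 'e', 'l', 'o']
char_to_idx = {c: i for i, c in enumerate(vocab)}

def one_hot(char):
    vec = [0] * len(vocab)
    vec[char_to_idx[char]] = 1
    return vec

print(one_hot('h'))  # [1, 0, 0, 0]
print(one_hot('e'))  # [0, 1, 0, 0]
```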

RNN Example
The RNN training process (this figure shows an example with incorrect predictions; it is only meant to convey the rough flow, so please read it with that in mind).

 

  • First, the input value "h" is converted into a one-hot encoding over the 4-entry vocabulary [h, e, l, o].
  • The one-hot encoded value and the hidden state from the previous time step are used to perform the hidden layer computation.
  • The hidden-to-hidden weight matrix is called Whh; the input value (here, the one-hot vector) and the previous hidden state are each multiplied by their weight matrices.
    • Although not shown in the figure, a bias term is also added, and the sum is passed through an activation function to obtain the new hidden state.
Hidden layer computation: ht = tanh(Whh⋅ht−1 + Wxh⋅xt + bh)  *[tanh is the activation function, bh is the bias, and Wxh is the input-to-hidden weight matrix]
  • After the hidden layer computation produces ht, multiplying it by the weight matrix Why passes the result to the output layer, while ht itself (multiplied by Whh at the next step) is passed on to the next time step.
Supplementary note: Wxh is the input-to-hidden weight, Whh is the weight between hidden layers, and Why is the weight from the hidden layer to the output layer.

Output-layer computation with the weight matrix Why: yt = Why⋅ht  [yt: the output value at the current time step, ht: the hidden state at the current time step]
Passing to the next time step via Whh: the next step computes ht+1 = tanh(Whh⋅ht + Wxh⋅xt+1 + bh)  [ht+1 is the hidden state at the next time step]
  • Next, the softmax function turns the output into probabilities, the most probable of the 4 vocabulary entries [h, e, l, o] is taken as the target, and it is fed in as the next input.
  • That next input then goes through the same steps: one-hot encoding, the weighted-sum computation, the output layer, passing the hidden state forward, and the softmax to produce the next target. This process repeats for as many steps as the length of the input sequence.

RNN Test Example

Let's look at an example of the RNN test process.

The RNN test process

  • The RNN test process is similar to the training process; here I will just fill in some details left out of the training explanation.
  • In the figure above, after one pass the sampled value "e" comes out, and it is fed in as the second input.
  • Putting the training process together: if you feed "hell" as input, you get "ello" as output. That is the way to read it.

4. Inside the Vanilla RNN

Let's open up the Vanilla RNN and look inside.

Vanilla RNN ๋‚ด๋ถ€ (์ด ๊ทธ๋ฆผ์—์„œ๋Š” ํŽธํ–ฅ(bias) ์ž…๋ ฅ์„ ์€ ์—†์Šต๋‹ˆ๋‹ค.)

  • Vanilla RNN์˜ ๋‚ด๋ถ€๋Š” ์ด ๊ธ€ ์•ž์—์„œ ์„ค๋ช…ํ–ˆ๋˜ RNN Training & Test Example์„ ์ดํ•ดํ•˜๊ณ  ๋ณด์‹œ๋ฉด ์ดํ•ด๊ฐ€ ๋ ๊ฒ๋‹ˆ๋‹ค.
htโ€‹= tanh(Whh⋅ht−1โ€‹ + Why⋅xtโ€‹ + bh) *[tanh๋Š” ํ™œ์„ฑํ™” ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค. - ์—ฌ๊ธฐ์„œ๋Š” Softmax ํ•จ์ˆ˜, bh๋Š” ํŽธํ–ฅ(bias)]
  • ์„ค๋ช…์„ ํ•ด๋ณด๋ฉด Whh๋Š” ์€๋‹‰์ธต (hidden layer)๊ฐ„์˜ ๊ฐ€์ค‘์น˜
  • Why๋Š” ์€๋‹‰์ธต (hidden layer) ์—์„œ ์ถœ๋ ฅ์ธต (output layer)์œผ๋กœ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.
  • ์ถœ๋ ฅ์ธต์œผ๋กœ ์ „๋‹ฌ๋˜๋Š” ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ Why ์ˆ˜์‹: xtโ€‹ = Why⋅htโ€‹ [xt: ํ˜„์žฌ ์‹œ์ ์˜ ์ถœ๋ ฅ๊ฐ’, ht๋Š” ํ˜„์žฌ ์‹œ์ ์˜ hidden state]
  • ๋‹ค์Œ ์‹œ์ ์œผ๋กœ ์ „๋‹ฌ๋˜๋Š” ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ Whh ์ˆ˜์‹: ht+1 = Whh⋅ht ์ž…๋‹ˆ๋‹ค.
    • [ht+1์€ ๋‹ค์Œ ์‹œ์ ์˜ hidden state, ์ˆ˜์‹์—์„œ ๋‚˜์˜จ ht-1์€ ์ด์ „ ์‹œ์ ์˜ hidden state]
๋‹ค์‹œ ๊ทธ๋ฆผ์„ ๋ณด๊ณ  ์„ค๋ช… ํ•œ๋‹ค๋ฉด
  1. ํ˜„์žฌ ์‹œ์  xt์—์„œ์˜ input ๊ฐ’์„ ๋ฐ›์Šต๋‹ˆ๋‹ค.
  2. ht-1 (์ด์ „ ์‹œ์ ์—์„œ hidden state) ๊ฐ’์„ ์˜ต๋‹ˆ๋‹ค.
  3. ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ W(Whh, Why)๋ฅผ ๊ณฑํ•˜์—ฌ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
  4. ๊ทธ ๋‹ค์Œ์— ํ™œ์„ฑํ™” ํ•จ์ˆ˜ tanh๋ฅผ ์ ์šฉํ•ฉ๋‹ˆ๋‹ค.
  5. ์ตœ์ข…์ ์œผ๋กœ ๊ณ„์‚ฐ๋œ ๊ฐ’์ด ํ˜„์žฌ ์‹œ์  t์˜ hidden state ht๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.
  • Vanilla RNN์€ xt์™€ ht-1์ด๋ผ๋Š” 2๊ฐœ์˜ input์ด ๊ฐ๊ฐ ๊ฐ€์ค‘์น˜ ํ–‰๋ ฌ W(Whh, Why)์™€ ๊ณฑํ•ด์„œ Memory Cell์˜ ์ž…๋ ฅ์ด ๋ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์ด๋ฅผ tanh (ํ•˜์ดํผ๋ณผ๋ฆญํƒ„์  ํŠธ) ํ•จ์ˆ˜์˜ input์œผ๋กœ ์‚ฌ์šฉํ•˜๊ณ , ์ด ๊ฐ’๋“ค์€ hidden layer(์€๋‹‰์ธต)์˜ ์ถœ๋ ฅ์ธ hidden state(์€๋‹‰ ์ƒํƒœ)๊ฐ€ ๋ฉ๋‹ˆ๋‹ค.

Inside the Vanilla RNN (unrolling the RNN train/test model for the length of the input sequence produces this structure).