[NLP] LSTM - Long Short Term Memory Model

1. What Is the LSTM Model?

LSTM stands for Long Short-Term Memory. It is a model proposed to solve the Long-Term Dependency problem of the RNN (Recurrent Neural Network).
  • A conventional RNN is useful for learning and predicting temporal and spatial patterns, which makes it a strong model for processing sequential data.
  • However, because of the Long-Term Dependency problem, it has difficulty processing long sequences.
    • The Long-Term Dependency problem is explained in the post linked below, so please refer to it.
 

[NLP] Vanilla RNN Model, Long-Term Dependency - The Long-Term Dependency Problem

daehyun-bigbread.tistory.com

  • What sets the LSTM apart from the plain RNN is that its structure was changed so that the Weight is not multiplied into the Gradient Flow along the cell state.

  • The reason is to solve Gradient Vanishing, the training problem of the basic RNN.
  • In a basic RNN, during Backpropagation the Gradient is multiplied by the weight W at every step as it travels back through time.
  • As a result, when the number of time-steps grows, the Gradient approaches 0 (whenever the repeated factor is smaller than 1).
    • In the end, the network cannot remember information from earlier time-steps well; this is the Gradient Vanishing problem.
  • So, to solve the Gradient Vanishing problem, we need an "Uninterrupted Gradient Flow", that is, the Gradient must be able to flow smoothly through the network.
 Tip. This way of letting the Gradient flow smoothly is similar to the Residual Connection in ResNet.
* Residual Connection: one of the core concepts of ResNet, a connection scheme that helps the network train.

Figure: Residual Connection in ResNet

  • Only then can Backpropagation proceed properly without the Gradient Vanishing problem.
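The effect of multiplying by the same weight at every step can be seen in a minimal sketch. It assumes a scalar recurrent weight and ignores the activation function, both simplifications that are not from the original post; the names `gradient_factor`, `w`, and `steps` are illustrative.

```python
def gradient_factor(w: float, steps: int) -> float:
    """Factor picked up by a gradient after flowing back through `steps` time-steps.

    Simplifying assumption: a scalar linear RNN h_t = w * h_{t-1}, so each
    backward step multiplies the gradient by the same weight w.
    """
    factor = 1.0
    for _ in range(steps):
        factor *= w
    return factor


# Compare a weight below 1 (vanishing) with a weight above 1 (exploding).
for steps in (1, 10, 50):
    print(steps, gradient_factor(0.5, steps), gradient_factor(1.5, steps))
```

With `w = 0.5` the factor shrinks toward 0 as the number of steps grows, while with `w = 1.5` it keeps growing, which is the exploding counterpart of the same problem.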

2. Structure of the LSTM Model

Let's look at how the LSTM model is structured and how it works.
  • The LSTM model proceeds by combining long-term memory and short-term memory with each new Event and updating them.
  • It has a structure in which the two memories are formed separately: long-term memory is made to persist, while short-term memory focuses on recent events.
    • Long-term memory is held in the "Cell State" and short-term memory in the "Hidden State". Both are explained below.

Figure: Rough structure of the LSTM model.

Now let's look at the detailed structure of the LSTM model.

Figure: Detailed structure of the LSTM model.

Looking at the structure, you will notice the concept of a "Gate".
To solve the Long-Term Dependency problem, the LSTM model introduces the concept of the 'Cell State'.
  • The "Cell State" is the 'Memory' maintained in each cell of the LSTM model.
  • Through it, the LSTM model can keep holding information or discard information it no longer needs. This is controlled by structures called "Gates".
  • This post describes the LSTM model in terms of four Gates. (The standard formulation has three sigmoid gates, Forget, Input, and Output; the "Remember Gate" here names the additive Cell State update.) Let's go through the role of each one.
  1. Forget Gate
  2. Remember Gate
  3. Input Gate
  4. Output Gate

Forget Gate

The Forget Gate chooses which information in the long-term memory, the Cell State, to remove and which to keep.
  • It removes the parts of past information that are no longer needed from the Cell State, helping the network retain only the information it needs.
The Forget Gate works as follows.
  1. The current input and the previous hidden state are passed to the Forget Gate.
  2. The current input and hidden state are adjusted by Weights and a Bias and then passed through a Sigmoid function. The Sigmoid outputs a value between 0 and 1, which decides how much of each piece of information to forget.
  3. If the Sigmoid output is close to 1, the information is not forgotten (it is kept); if it is close to 0, the information is forgotten.
  4. This output is multiplied element-wise with the previous Cell State, so that specific information is forgotten from the Cell State.
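The four steps above can be sketched in NumPy. This is a minimal illustration, not the post's implementation: the names `W_f`, `U_f`, `b_f`, the vector sizes, and the random (untrained) weights are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def forget_gate(x_t, h_prev, W_f, U_f, b_f):
    """f_t = sigmoid(W_f x_t + U_f h_prev + b_f); every entry lies in (0, 1)."""
    return sigmoid(W_f @ x_t + U_f @ h_prev + b_f)


rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4            # illustrative sizes
x_t = rng.normal(size=input_dim)        # current input
h_prev = rng.normal(size=hidden_dim)    # previous hidden state
c_prev = rng.normal(size=hidden_dim)    # previous cell state
W_f = rng.normal(size=(hidden_dim, input_dim))    # random placeholder weights
U_f = rng.normal(size=(hidden_dim, hidden_dim))
b_f = np.zeros(hidden_dim)

f_t = forget_gate(x_t, h_prev, W_f, U_f, b_f)   # steps 1-3
c_kept = f_t * c_prev                           # step 4: element-wise product
print(f_t)
print(c_kept)
```

Entries of `f_t` near 1 leave the matching entry of the cell state almost unchanged, and entries near 0 wipe it out.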

Remember Gate

The Remember Gate updates the long-term memory with the newly selected memory. Unlike the other gates, it has no separate gate operation of its own and consists only of an addition.
  • It updates the Cell State by adding new information to it, so it helps determine how the new input is remembered and how the previous state is updated.
The Remember Gate works as follows. (Steps 1 to 4 are the same computation described under the Input Gate; the addition in step 5 is the Remember Gate's own part.)
  1. The current input and the previous hidden state are passed in.
  2. They are adjusted by Weights and a Bias; one branch passes through a Sigmoid function and the other through a tanh (hyperbolic tangent) function.
  3. The Sigmoid outputs a value between 0 and 1 that determines how important the new information is, and the tanh outputs a value between -1 and 1 that generates the new values.
  4. The Sigmoid output and the tanh output are multiplied to produce the new value to be added to the Cell State.
  5. This new value is added to the Cell State already scaled by the Forget Gate, forming the final Cell State.

Input Gate

The Input Gate decides which new information to add to the Cell State.
  • From the new memory formed by combining the short-term memory with the new Event, it selects the part needed for prediction.
    • In other words, the model maintains the Cell State (long-term memory) and the hidden state (short-term memory) and updates them every time a new input (Event) arrives.
The Input Gate works as follows.
  1. The current input and the previous hidden state are passed to the Input Gate.
  2. They are adjusted by Weights and a Bias; one branch passes through a Sigmoid function and the other through a tanh (hyperbolic tangent) function.
  3. The Sigmoid outputs a value between 0 and 1 that decides which information to add to the Cell State, and the tanh outputs a value between -1 and 1 that generates the new candidate values.
  4. The Sigmoid output and the tanh output are multiplied to produce the new value to be added to the Cell State.
  5. This new value is added to the Cell State scaled by the Forget Gate, forming the final Cell State.
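The steps above, including the final addition that the post calls the Remember Gate, can be sketched as follows. The weight names `W_i` and `W_g`, the sizes, the random weights, and the fixed forget-gate value of 0.9 are assumptions for illustration only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


rng = np.random.default_rng(1)
input_dim, hidden_dim = 3, 4            # illustrative sizes
x_t = rng.normal(size=input_dim)        # current input
h_prev = rng.normal(size=hidden_dim)    # previous hidden state
c_prev = rng.normal(size=hidden_dim)    # previous cell state
z = np.concatenate([h_prev, x_t])       # step 1: what both branches receive

# One weight matrix and bias per branch (random placeholders, not trained).
W_i, b_i = rng.normal(size=(hidden_dim, z.size)), np.zeros(hidden_dim)
W_g, b_g = rng.normal(size=(hidden_dim, z.size)), np.zeros(hidden_dim)

i_t = sigmoid(W_i @ z + b_i)    # steps 2-3: how much of each candidate to add
g_t = np.tanh(W_g @ z + b_g)    # steps 2-3: candidate values in (-1, 1)
new_value = i_t * g_t           # step 4: gated candidate

f_t = np.full(hidden_dim, 0.9)  # assumed forget-gate output
c_t = f_t * c_prev + new_value  # step 5: additive update of the cell state
print(c_t)
```

Because `i_t` lies between 0 and 1 and `g_t` between -1 and 1, the amount added to each element of the cell state never exceeds 1 in magnitude.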

Output Gate

The Output Gate decides which information from the Cell State to send out as the output.
It helps the LSTM model pass on only the information it needs as the final output.
  • From the updated long-term memory, in which the event, short-term memory, and long-term memory are combined, it selects the part needed for prediction.
The Output Gate works as follows.
  1. The current input and the previous hidden state are passed to the Output Gate.
  2. They are adjusted by Weights and a Bias and then passed through a Sigmoid function.
  3. The Sigmoid outputs a value between 0 and 1 that decides which parts of the Cell State to output.
  4. At the same time, the current Cell State is passed through a tanh (hyperbolic tangent) function, which outputs a value between -1 and 1.
  5. The Sigmoid output and the tanh output are multiplied to produce the final output.
  6. In this way the Output Gate decides which part of the Cell State is passed on as the next hidden state and used as the output of the cell.
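A small NumPy sketch of these steps is shown below. The function name `output_step`, the weight names `W_o` and `b_o`, the sizes, and the random values are assumptions for illustration, not part of the original post.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def output_step(x_t, h_prev, c_t, W_o, b_o):
    """Return (o_t, h_t) with h_t = o_t * tanh(c_t)."""
    z = np.concatenate([h_prev, x_t])
    o_t = sigmoid(W_o @ z + b_o)    # steps 1-3: which parts of the cell state to expose
    h_t = o_t * np.tanh(c_t)        # steps 4-5: squash the cell state, then gate it
    return o_t, h_t


rng = np.random.default_rng(2)
input_dim, hidden_dim = 3, 4                # illustrative sizes
x_t = rng.normal(size=input_dim)
h_prev = rng.normal(size=hidden_dim)
c_t = 3.0 * rng.normal(size=hidden_dim)     # cell state values may lie outside [-1, 1]
W_o = rng.normal(size=(hidden_dim, hidden_dim + input_dim))  # random placeholder
b_o = np.zeros(hidden_dim)

o_t, h_t = output_step(x_t, h_prev, c_t, W_o, b_o)
print(h_t)
```

Even when the cell state holds large values, the resulting hidden state stays within -1 and 1 because of the tanh.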

3. A Closer Look Inside the LSTM Model

How much the short-term memory and the long-term memory of past events contribute to the prediction is controlled by the LSTM's Gate structure.
  • The LSTM model can hold long-term memory for a long time and can choose which parts to remember.

Figure: Example of the LSTM model retaining long-term memory and choosing which parts to remember.

Now let's look at the LSTM model in detail.

Figure: Detailed structure of the LSTM model

  • In the figure, the gates "ft" (Forget Gate), "it" (Input Gate), and "ot" (Output Gate) were already covered above, roles and mechanics included, so they are skipped here. Please refer back to that section as you read on.

Detailed explanation of the structure

  • Here "Ct" denotes the Cell State; it is described as an *Internal state that serves as the long-term memory.
    • *Internal state: the "State" the model maintains while processing the input sequence over time.
    • In an RNN, the internal state is the hidden state at each time-step, which depends on the previous time-step's hidden state and the current time-step's input.

 

  • The Cell State is controlled by Sigmoid functions (the gates, each a small network), and the core idea is that consecutive Cell States interact only through a *Linear Interaction, which creates a shortcut for the *Gradient Flow.
    • *Linear Interaction: the relationship between input and output is linear, i.e. when the input changes in size, the output changes in proportion.
    • *Gradient Flow: the mechanism by which the error is backpropagated through the network during training to update the Weights.

 

  • The Cell State is updated by adding two terms: the Forget Gate multiplied by the previous Cell State, and the Input Gate multiplied by the new memory (candidate) formed in the cell.
  • Because the Input Gate lies between 0 and 1 and the candidate between -1 and 1, the added term can raise or lower each element of the Cell State by at most 1 per time-step, so each element behaves roughly like a counter.
    • In other words, the LSTM model accumulates or removes information as it processes each element of the sequence.

 

  • "ht", the Hidden State, is the output of the cell and represents the short-term memory.
  • The value of "ht" is obtained by multiplying the Output Gate by the Cell State "Ct" passed through the hyperbolic tangent (tanh) function.
  • Here the counter value is squashed into the range -1 to 1. "Squashing" means mapping the value of the Cell State into the range -1 to 1.
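The update rules described in this section can be written compactly as below. This is the standard LSTM formulation rather than anything specific to this post; the symbols W, U, and b for the weights and biases, g_t for the candidate memory, x_t for the input, and ⊙ for the element-wise product are notational assumptions, since the post itself only names ft, it, ot, Ct, and ht.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
C_t &= f_t \odot C_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}
```

The fifth line is the Cell State update and the sixth is the Hidden State; only these two involve the Cell State directly.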

Figure: Structure of the LSTM model

As noted earlier, the LSTM model changes the structure so that the Weight is not multiplied into the Gradient Flow.
  • Concretely, in the Gradient computation from "Ct", the current Cell State, back to "Ct-1", the previous Cell State, the weight matrix W disappears entirely from the direct path.
  • The reason is that "Ct-1" enters "Ct" only through an element-wise product with "ft", the Forget Gate, so the Local Gradient is the Forget Gate itself.
    • That is, the Forget Gate's output, a value between 0 and 1 that multiplies each element of the Cell State in the update, determines how much Gradient flows back, so the Local Gradient equals the Forget Gate value.
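The claim that the Local Gradient equals the Forget Gate value can be checked numerically. The sketch below holds the gate values fixed, which is an assumption: it covers only the direct path from the previous Cell State to the current one and ignores the indirect dependence through the hidden state. The function name `cell_update` and the random gate values are illustrative.

```python
import numpy as np


def cell_update(c_prev, f_t, i_t, g_t):
    """C_t = f_t * C_{t-1} + i_t * g_t (all element-wise)."""
    return f_t * c_prev + i_t * g_t


rng = np.random.default_rng(3)
n = 4
c_prev = rng.normal(size=n)
f_t = rng.uniform(0.0, 1.0, size=n)    # assumed gate values, held fixed
i_t = rng.uniform(0.0, 1.0, size=n)
g_t = rng.uniform(-1.0, 1.0, size=n)

# Numerical Jacobian dC_t / dC_{t-1} using central differences.
eps = 1e-6
jac = np.zeros((n, n))
for j in range(n):
    step = np.zeros(n)
    step[j] = eps
    plus = cell_update(c_prev + step, f_t, i_t, g_t)
    minus = cell_update(c_prev - step, f_t, i_t, g_t)
    jac[:, j] = (plus - minus) / (2 * eps)

print(np.round(jac, 6))   # diagonal matrix
print(f_t)                # its diagonal entries
```

The Jacobian comes out diagonal with the Forget Gate values on the diagonal, and no weight matrix appears in it.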

 

Next, let's see why the Gradient Flow is good in the LSTM.

 

  1. The value of "ft", the Forget Gate, changes at every time-step, so the same value is not multiplied over and over.
  2. The Forget Gate value lies in the range (0 to 1), so the Gradient does not explode along the Cell State path.
  3. On the backward path from the final hidden state to the first Cell State, the tanh (hyperbolic tangent) function appears only once. (That is, the repeated multiplication by tanh terms disappears.)
+ Because the Forget Gate value is smaller than 1, Gradient Vanishing can still occur; to mitigate this, the bias of the Forget Gate is commonly initialized to 1.
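The benefit of initializing the Forget Gate bias to 1 can be illustrated with a rough calculation. The sketch assumes the weighted input to the gate is 0 at every step, so the gate value depends on the bias alone; that simplification and the name `cell_path_factor` are not from the original post.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def cell_path_factor(forget_bias: float, steps: int) -> float:
    """Product of forget-gate values along the cell-state path over `steps` time-steps.

    Simplifying assumption: the weighted input to the gate is 0 at every step,
    so f_t = sigmoid(forget_bias) throughout.
    """
    return sigmoid(forget_bias) ** steps


# Compare a zero bias with a bias of 1 over 20 time-steps.
for bias in (0.0, 1.0):
    print(bias, cell_path_factor(bias, 20))
```

Under this assumption a bias of 1 leaves a much larger factor after 20 steps than a bias of 0, so more Gradient survives early in training.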

4. LSTM Model Code Example

Let's look at a code example of an LSTM model.
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Example values (assumptions for illustration); set them to match your data
timesteps = 10   # length of each input sequence
input_dim = 8    # number of features per time-step

# Initialize the model
model = Sequential()

# Add the LSTM layer
# 100 is the number of hidden units; the input shape depends on the number of features.
model.add(LSTM(100, input_shape=(timesteps, input_dim)))

# Add a fully connected layer; there is one output node, so the first argument is 1
model.add(Dense(1, activation='sigmoid'))

# Compile the model
# This is a binary classification problem, so 'binary_crossentropy' is the loss.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  • The LSTM layer is declared in the form "model.add(LSTM(100, input_shape=(timesteps, input_dim)))".
    • Here 100 is the number of hidden units.
    • "timesteps" is the length of the sequence, and input_dim is the number of features of each sequence element.
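To make the shapes concrete, the sketch below builds a dummy batch and counts the weights of the LSTM layer by hand. The values of `timesteps`, `input_dim`, and `batch_size` are example assumptions; the count uses the usual layout of four weight blocks (three gates plus the candidate), which should correspond to the layer's default settings.

```python
import numpy as np

units, timesteps, input_dim = 100, 10, 8   # timesteps and input_dim are example values
batch_size = 32                            # illustrative batch size

# A batch fed to the LSTM layer has shape (batch, timesteps, input_dim).
x = np.zeros((batch_size, timesteps, input_dim), dtype="float32")

# Four weight blocks (three gates and the candidate), each with input weights,
# recurrent weights, and a bias vector.
lstm_params = 4 * (units * input_dim + units * units + units)

print(x.shape)        # (32, 10, 8)
print(lstm_params)    # 43600
```

Changing `input_dim` only affects the input-weight term, while changing `units` affects all three terms.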

 

  • Next, a Dense layer (fully connected layer) with a single output node is used.
    • "model.add(Dense(1, activation='sigmoid'))" -> since there is one output node, 1 is passed as the first argument, the number of output nodes, just before activation.

 

  • This simple model is built for a Binary classification problem.
    • That is why the output Dense layer uses the "Sigmoid function", which outputs a probability between 0 and 1, and the compile step uses "binary_crossentropy" for Binary classification.

 

  • For a multi-class classification problem, the activation function ("activation") should be "softmax", with one output node per class,
  • and in the compile step the loss function ("loss") should be set to 'categorical_crossentropy'.
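A multi-class variant of the model might look like the sketch below. The values of `timesteps`, `input_dim`, and `num_classes` are example assumptions, and the input shape is declared with an `Input` layer instead of the `input_shape` argument; the random data is only there to check the output shape.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Input, LSTM, Dense

timesteps, input_dim, num_classes = 10, 8, 3   # example values (assumptions)

model = Sequential()
model.add(Input(shape=(timesteps, input_dim)))
model.add(LSTM(100))
# One output node per class; softmax turns the outputs into class probabilities.
model.add(Dense(num_classes, activation='softmax'))

# 'categorical_crossentropy' expects one-hot encoded labels.
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Forward pass on random data, only to inspect the output shape.
dummy = np.random.rand(4, timesteps, input_dim).astype('float32')
probs = model.predict(dummy, verbose=0)
print(probs.shape)
```

Each row of `probs` holds one probability per class, and the row sums to 1.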

 

  • The official documentation lists the parameters the layer accepts, so please refer to it.
  • The next post will cover the GRU model.
 

tf.keras.layers.LSTM  |  TensorFlow v2.15.0.post1

Long Short-Term Memory layer - Hochreiter 1997.

www.tensorflow.org

+ Softmax function: takes a vector as input and converts each element to a value between 0 and 1 such that the values sum to 1. It is mainly used to represent a probability distribution.
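A minimal NumPy version of the Softmax function is sketched below. Subtracting the maximum before exponentiating is a common numerical-stability step added here as an assumption; it does not change the result. The input scores are illustrative.

```python
import numpy as np


def softmax(x):
    """Map a vector to values in (0, 1) that sum to 1."""
    shifted = x - np.max(x)     # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()


scores = np.array([2.0, 1.0, 0.1])   # illustrative input vector
probs = softmax(scores)
print(probs, probs.sum())
```

The largest score receives the largest probability, and the ordering of the inputs is preserved.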