[NLP] RNN (Recurrent Neural Network)

1. What Is an RNN?

An RNN is a neural network architecture designed to process sequence data.
  • It is used mainly in sequence modeling tasks, including natural language processing (NLP).
  • Sequence data carries context that comes from the temporal or spatial order of its elements.
💡 example
I want to have an apple
  • ์ด 'apple'์— ํ•œ๋ฒˆ ์ฃผ๋ชฉํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • ์ด apple์ด๋ผ๋Š” ๋‹จ์–ด๋Š” ๋ฌธ๋งฅ์ด ํ˜•์„ฑํ•˜๋Š” ์ฃผ๋ณ€์˜ ๋‹จ์–ด๋“ค์„ ํ•จ๊ป˜ ์‚ดํŽด๋ด์•ผ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

2. About RNNs

What are the characteristics of an RNN?
  • In an RNN, a node in the hidden layer sends the result of its activation function toward the output layer and, at the same time, feeds it back as an input to the hidden layer node's next computation.
  • However, as the sequence gets longer, an RNN has trouble compressing the information. This is because it processes the input in order and tends to forget words it read long ago.
    • In other words, words that were input long ago are often forgotten, or the information of particular words is over-weighted and distorts the overall information.

3. RNN Cell: Characteristics, Structure, and How It Works

Let's go over the characteristics of the RNN cell.
  1. Sequential processing: each element of the sequence is processed in order, and the hidden state produced at each step is used as an input to the next step.
  2. Parameter sharing: the RNN cell uses the same weights and biases at every step of the sequence. This reduces the number of parameters and lets what is learned at one position apply at every other position.
  3. Variable-length sequences: an RNN can process input sequences of varying length.
Next, let's look at the structure of the RNN cell and how it works.
  • Take a look at the figure below.
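The three properties above can be sketched in a few lines of NumPy. This is a minimal illustration, not a library implementation: the sizes (input_dim = 3, hidden_dim = 4), the random weights, and the helper name run_rnn are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4  # illustrative sizes

# One set of parameters, reused at every time step (parameter sharing).
W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

def run_rnn(sequence):
    """Process a (timesteps, input_dim) sequence one element at a time."""
    h = np.zeros(hidden_dim)
    for x_t in sequence:                      # sequential processing
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # previous h feeds the next step
    return h

# The same function and the same weights handle sequences of different lengths.
short_seq = rng.normal(size=(2, input_dim))
long_seq = rng.normal(size=(7, input_dim))
print(run_rnn(short_seq).shape, run_rnn(long_seq).shape)
```

Note that the number of parameters depends only on input_dim and hidden_dim, never on the sequence length.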

Here x is the input vector of the input layer and y is the output vector of the output layer.
  • The h in the middle is the RNN cell.
  • In an RNN, the hidden layer node that emits the result of the activation function is called a cell.
  • Because this cell tends to remember previous values, it acts as a kind of memory. It is therefore called a memory cell or an RNN cell.
  • At each time step, the memory cell in the hidden layer uses the value that the hidden layer's memory cell produced at the immediately preceding time step as one of its own inputs. In an RNN this process is called feedback.

More on feedback
  • Feedback happens through the recurrent passing of the hidden state. The hidden state from the previous time step is multiplied by its weight and combined with the input at the current time step to produce a new hidden state.
    • In short: the hidden state stores a memory of the previous inputs, and that memory is updated every time a new input arrives.

  • ๋‹ค์‹œ RNN ์…€์˜ ์„ค๋ช…์œผ๋กœ ๋Œ์•„์™€์„œ ํ˜„์žฌ ์‹œ์ ์˜ ๋ณ€์ˆ˜๋ช…์„ t๋ผ๊ณ  ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. (์œ„์˜ ์‚ฌ์ง„์—๋Š” ์—†์ง€๋งŒ x,y์˜†์— t๊ฐ€ ๋ถ™์–ด์žˆ๋‹ค๊ณ  ์ƒ๊ฐํ•ด์ฃผ์„ธ์š”) like Xt, Yt
  • ํ˜„์žฌ์‹œ์ ์—์„œ t์—์„œ ๋ฉ”๋ชจ๋ฆฌ ์…€์ด ๊ฐ–๊ณ ์žˆ๋Š” ๊ฐ’์€ ๊ณผ๊ฑฐ์˜ ๋ฉ”๋ชจ๋ฆฌ ์…€๋“ค์˜ ๊ฐ’์— ์˜ํ–ฅ์„ ๋ฐ›์€๊ฒƒ์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฉ”๋ชจ๋ฆฌ ์…€์ด output layer ๋ฐฉํ–ฅ ๋˜๋Š” ๋‹ค์Œ ์‹œ์ ์ธ t+1์˜ ์ž์‹ ์—๊ฒŒ ๋ณด๋‚ด๋Š” ๊ฐ’์„ ์€๋‹‰์ƒํƒœ (hidden state)๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • t์‹œ์ ์˜ ๋ฉ”๋ชจ๋ฆฌ์…€์€ t-1์˜ ๋ฉ”๋ชจ๋ฆฌ ์…€์ด ๋ณด๋‚ธ hidden state ๊ฐ’์„ t์‹œ์ ์˜ ์€๋‹‰ ์ƒ๋Œ€ ๊ณ„์‚ฐ์„ ์œ„ํ•œ ์ž…๋ ฅ๊ฐ’์œผ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.RNN์„ ํ‘œํ˜„ํ•  ๋•Œ ์ขŒ์ธก๊ณผ ๊ฐ™์ด ํ™”์‚ดํ‘œ๋กœ ์‚ฌ์ดํด์„ ๊ทธ๋ ค์„œ ํ‘œํ˜„ํ•˜๊ธฐ๋„ ํ•˜์ง€๋งŒ, ์šฐ์ธก๊ณผ ๊ฐ™์ด ํ™”์‚ดํ‘œ๋กœ ์‚ฌ์ดํด์„ ๊ทธ๋ ค์„œ ํ‘œํ˜„ ํ•˜๋Š” ๋Œ€์‹  ์—ฌ๋Ÿฌ ์‹œ์ ์œผ๋กœ ํŽผ์ณ์„œ ํ‘œํ˜„ํ•˜๊ธฐ๋„ ํ•ฉ๋‹ˆ๋‹ค.

RNN Unfolding

  • ๋‘ ๊ทธ๋ฆผ๋‹ค ๋™์ผํ•œ ๊ทธ๋ฆผ์œผ๋กœ ์‚ฌ์ดํด์„ ๊ทธ๋ฆฌ๋Š” ํ™”์‚ดํ‘œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ‘œํ˜„ํ•˜์˜€์ง€๋งŒ, ํ๋ฆ„์— ๋”ฐ๋ผ์„œ ์–ด๋–ป๊ฒŒ ํ‘œํ˜„ํ•˜์˜€๋Š๋ƒ์˜ ์ฐจ์ด์ผ ๋ฟ ๋‘˜ ๋‹ค ๋™์ผํ•œ RNN์„ ํ‘œํ˜„ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • RNN์—์„œ๋Š” input layer(์ž…๋ ฅ์ธต)๊ณผ output layer(์ถœ๋ ฅ์ธต)์—์„œ๋Š” ๊ฐ๊ฐ input, output ๋ฒกํ„ฐ, hidden layer(์€๋‹‰์ธต)์—์„œ๋Š” ์€๋‹‰ ์ƒํƒœ(hidden state)์ด๋ผ๋Š” ํ‘œํ˜„์„ ์ฃผ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

4. Types of RNN Models

Let's go over the different forms an RNN model can take.
  • An RNN can be designed with different input and output lengths, so it can be used for a variety of purposes.
  • The unit of input and output at each time step of an RNN cell is up to the user, but the most common unit is the word vector.

Many-to-One model

Many-to-One structure (many inputs, one output)
  • A Many-to-One model is used for sentiment classification, which judges whether an input sequence is positive or negative,
  • and for spam detection, which decides whether an email is legitimate or spam.

Many-to-Many model

Many-to-Many structure (many inputs, many outputs)
  • A Many-to-Many model is used for chatbots, which take a sentence from the user, predict words one by one, and output a reply sentence (the same kind of task that ChatGPT handles, although ChatGPT itself is built on a Transformer rather than an RNN),
  • for translators, which output a translated sentence from an input sentence, and for tasks such as named entity recognition and part-of-speech tagging,
  • and for classifying video frame by frame.

One-to-Many model

One-to-Many structure (one input, many outputs)
  • A One-to-Many model can perform image captioning, which outputs a caption for a single input image.
  • A caption is a series of words, so the output is a sequence.

Bidirectional model

Bidirectional structure (bidirectional recurrent neural network)
  • A bidirectional model processes sequence data in both directions, front to back and back to front, so that both past and future information is taken into account at each time step.
  • It is made up of the following parts:
    • A forward RNN, which processes the sequence in order from start to end and models past information.
    • A backward RNN, which processes the sequence in reverse from end to start and models the future information at each time step.
    • Together, the forward and backward RNNs make up two RNN layers.
    • The hidden states of the forward and backward RNNs are then concatenated or otherwise combined to produce the final output.
  • Because a bidirectional model can see the context both before and after a given time step, it is useful when the whole sequence is available and is preferred for tasks that require understanding each part of the sequence in light of the whole.
  • Bidirectional RNNs are used in tasks such as text classification, sentiment analysis, machine translation, and speech recognition.

5. RNN Formulas

Let's write down the formulas for an RNN.

RNN Formula

  • The hidden state value at the current time step t is defined as h_t.
  • To compute h_t, the memory cell in the hidden layer has two weights.
  • One is W_x, the weight applied to the input from the input layer, and the other is W_h, the weight applied to h_{t-1}, the hidden state value at the previous time step t-1.
Written as formulas, this looks as follows.

 

Left: hidden layer formula, right: output layer formula
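In the usual textbook formulation, which matches the weights W_x and W_h named above and the tanh activation and output weight W_y mentioned in this section, the two formulas can be written as below. The bias terms b and b_y and the output activation f (for example softmax) are assumptions, since the text does not name them.

```latex
h_t = \tanh\left(W_x x_t + W_h h_{t-1} + b\right)
\qquad
y_t = f\left(W_y h_t + b_y\right)
```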

Let's work through the hidden layer computation of an RNN using vector and matrix operations.
  • In natural language processing (NLP), the RNN input x_t can in most cases be regarded as a word vector.
  • If the dimension of the word vector is d and the size of the hidden state is D_h, the sizes of each vector and matrix are as follows.

  • With a batch size of 1 and d and D_h both set to 4, the hidden layer computation of the RNN is illustrated in the figure.

Source: https://wikidocs.net/22886
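The same computation can be checked in NumPy with d = D_h = 4 and a batch size of 1. The sketch assumes the column-vector convention and a bias term b; the constant weight values are placeholders chosen only so the result is easy to verify.

```python
import numpy as np

d, D_h = 4, 4  # word-vector dimension and hidden-state size, as in the text

x_t = np.ones((d, 1))            # input word vector      (d x 1)
h_prev = np.zeros((D_h, 1))      # previous hidden state  (D_h x 1)
W_x = np.full((D_h, d), 0.1)     # input weights          (D_h x d)
W_h = np.full((D_h, D_h), 0.1)   # recurrent weights      (D_h x D_h)
b = np.zeros((D_h, 1))           # bias (assumed)         (D_h x 1)

# (D_h x d)(d x 1) + (D_h x D_h)(D_h x 1) + (D_h x 1) -> (D_h x 1)
h_t = np.tanh(W_x @ x_t + W_h @ h_prev + b)
print(h_t.shape)  # (4, 1)
```

The new hidden state has the same shape as the previous one, which is what allows it to be fed back in at the next time step.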

The function used to compute h_t in the hidden layer formula is the hyperbolic tangent function (tanh).
  • Within a single layer, the weights W_x, W_h, and W_y are shared across all time steps.
  • When there are two or more hidden layers, however, each hidden layer has its own separate weights.

6. Implementing an RNN with Keras

Let's write code that adds an RNN layer using TensorFlow (TF) and the Keras library.
import tensorflow as tf
  • ์ด ์ฝ”๋“œ๋Š” tf.keras.layers.SimpleRNN์„ ์ด์šฉํ•˜์—ฌ ๊ธฐ๋ณธ์ ์ธ RNN ์ธต์œผ๋กœ ์ถ”๊ฐ€ํ–ˆ์Šต๋‹ˆ๋‹ค.
  • ์ด layer๋Š” 128๊ฐœ์˜ ๋‰ด๋Ÿฐ(unit)์„ ๊ฐ€์ง€๋ฉฐ, ๊ธฐ๋ณธ์ ์œผ๋กœ tanh ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ์ผ๋ฐ˜์ ์œผ๋กœ NLP์—์„œ Embedding์ธต์€ ๋‹จ์–ด์˜ ์ •์ˆ˜ encoding์„ Vector๋กœ ๋ณ€ํ™˜ํ•  ๋•Œ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
💡 example
: For a vocabulary of 1,000 words where each word is converted into a 32-dimensional vector, the code looks like this.
model = tf.keras.Sequential([
	tf.keras.layers.Embedding(input_dim=1000, output_dim=32),
  • input_dim์€ Embedding ์ธต์— ๋„ฃ์–ด์ฃผ๋Š” parameter, ์˜ˆ์‹œ๋กœ๋Š” 1000๊ฐœ์˜ ์–ดํœ˜์‚ฌ์ „(vocab)์ด๋ฏ€๋กœ, 1000์„ ์ ์–ด์ค๋‹ˆ๋‹ค.
  • output_dim์€ input์œผ๋กœ ๋„ฃ์–ด์ค€ ์–ดํœ˜์‚ฌ์ „(vocab) & ๋‹จ์–ด๋ฅผ ๋ฒกํ„ฐ๋กœ ๋ณ€ํ™˜ํ•˜์—ฌ ์ถœ๋ ฅํ•ด์ค๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ๋Š” 32์ฐจ์›์˜ Vector๋กœ ๋ณ€ํ™˜ํ•œ๋‹ค๊ณ  ํ–ˆ์œผ๋‹ˆ๊นŒ, 32๋ฅผ ์ ์–ด์ค๋‹ˆ๋‹ค.
simple RNN์€ ๊ธฐ๋ณธ์ ์ธ RNN์ธต์œผ๋กœ, ์—ฌ๊ธฐ์„œ๋Š” 128๊ฐœ์˜ unit์„ ์‚ฌ์šฉํ•œ๋‹ค๊ณ  ํ–ˆ์œผ๋‹ˆ๊นŒ, ์•„๋ž˜ ์ฝ”๋“œ์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
  • SimpleRNN์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ tanh ํ™œ์„ฑํ™” ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ, ๊ตณ์ด activation='tanh' ์„ ์จ์ค„ ํ•„์š”๋Š” ์—†์Šต๋‹ˆ๋‹ค.
tf.keras.layers.SimpleRNN(128),
Let's look at the parameters of SimpleRNN.
SimpleRNN(hidden_units, input_shape=(timesteps, input_dim))
  • hidden_units = hidden state์˜ ํฌ๊ธฐ๋ฅผ ์ •์˜. ๋ฉ”๋ชจ๋ฆฌ ์…€์ด ๋‹ค์Œ ์‹œ์ ์˜ ๋ฉ”๋ชจ๋ฆฌ ์…€๊ณผ ์ถœ๋ ฅ์ธต์œผ๋กœ ๋ณด๋‚ด๋Š” ๊ฐ’์˜ ํฌ๊ธฐ(output_dim)์™€๋„ ๋™์ผ. RNN ๋ชจ๋ธ์˜ ์šฉ๋Ÿ‰(capacity)์„ ๋Š˜๋ฆฐ๋‹ค๊ณ  ๋ณด๋ฉด ๋˜๋ฉฐ, ๋ณดํ†ต ์ž‘์€ ๋ชจ๋ธ์˜ ๊ฒฝ์šฐ ๋ณดํ†ต 128, 256, 512, 1024 ๋“ฑ์˜ ๊ฐ’์„ ๊ฐ€์ง„๋‹ค.
  • timesteps = input Sequence์˜ ๊ธธ์ด(input_length)๋ผ๊ณ  ํ‘œํ˜„ํ•˜๊ธฐ๋„ ํ•จ.
  • input_dim = ์ž…๋ ฅ์˜ ํฌ๊ธฐ.

An RNN layer takes a 3D tensor as input.

 

  • RNN ์ธต์€ (batch_size, timesteps, input_dim) ํฌ๊ธฐ์˜ 3D ํ…์„œ๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์Šต๋‹ˆ๋‹ค.
๊ทธ๋ฆฌ๊ณ , ๋งŒ์•ฝ 10๊ฐœ์˜ Class๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ๋ฉด, Dense์ธต์„ ์ถ”๊ฐ€ํ•ฉ๋‹ˆ๋‹ค. - output layer
    tf.keras.layers.Dense(10, activation='softmax')
])
  • ๋งŒ์•ฝ์— 2๊ฐœ์˜ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ฌธ์ œ๋ฉด, unit ๊ฐ’์„ 2๊ฐœ๋กœ ํ•ด์ฃผ๊ณ , ํ™œ์„ฑํ™” ํ•จ์ˆ˜ (activation)ํ•จ์ˆ˜๋ฅผ 'binary'๋กœ ํ•ด์ฃผ๋ฉด ๋ฉ๋‹ˆ๋‹ค.
๋งˆ์ง€๋ง‰์œผ๋กœ compile ํ•ด์ฃผ๋Š” ์ฝ”๋“œ๋ž‘ ๋ชจ๋ธ์˜ ์š”์•ฝ๊ฐ’์„ ์ถœ๋ ฅ ํ•ด์ฃผ๋Š” ์ฝ”๋“œ์„ ์ž‘์„ฑํ•ด์ค๋‹ˆ๋‹ค.
# ๋ชจ๋ธ ์ปดํŒŒ์ผ
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# ๋ชจ๋ธ ์š”์•ฝ ์ถœ๋ ฅ
model.summary()
  • ์‹คํ–‰ ํ•˜๋ฉด ๋ชจ๋ธ์— ๋Œ€ํ•œ ์š”์•ฝ ๊ฐ’์ด ๋‚˜์˜ต๋‹ˆ๋‹ค.
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding (Embedding)       (None, None, 32)          32000     
                                                                 
 simple_rnn (SimpleRNN)      (None, 128)               20608     
                                                                 
 dense (Dense)               (None, 10)                1290      
                                                                 
=================================================================
Total params: 53898 (210.54 KB)
Trainable params: 53898 (210.54 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
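The parameter counts in the summary can be reproduced by hand. The arithmetic below uses only the sizes from the example (1,000-word vocabulary, 32-dimensional embeddings, 128 units, 10 classes); the variable names are chosen here for illustration.

```python
vocab_size, embed_dim, units, n_classes = 1000, 32, 128, 10

embedding_params = vocab_size * embed_dim       # one 32-dim vector per word
rnn_params = units * (embed_dim + units + 1)    # input weights, recurrent weights, bias
dense_params = units * n_classes + n_classes    # weights and bias

print(embedding_params, rnn_params, dense_params)      # 32000 20608 1290
print(embedding_params + rnn_params + dense_params)    # 53898
```

The SimpleRNN count does not depend on the sequence length, because the same weights are reused at every time step.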
So how does an RNN layer take in a 3D tensor, and how does it output the hidden state?
  • An RNN layer produces one of two kinds of output, depending on how the user configures it.
  • To return only the hidden state of the memory cell at the final time step, the output is
(batch_size, output_dim)
  • that is, a 2D tensor.

 

  • ๊ทธ๋Ÿฌ๋‚˜, Memory Cell์˜ ๊ฐ Time step(์‹œ์ )์˜ hidden state ๊ฐ’๋“ค์„ ๋ชจ์•„์„œ return ํ•˜๋ ค๋ฉด
(batch_size, timesteps, output_dim)
  • ์ด๋ ‡๊ฒŒ 3D Tensor๋ฅผ return ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๊ฒŒ ๊ฐ€๋Šฅํ•œ ์ด์œ ๋Š”, RNN ์ธต์˜ return_sequences parameter(๋งค๊ฐœ๋ณ€์ˆ˜)์— True๋ฅผ ์ง€์ •ํ•ด์„œ ์„ค์ •์„ ํ•˜์˜€๊ธฐ์— ๊ฐ€๋Šฅํ•œ ์ผ์ž…๋‹ˆ๋‹ค.
  • output_dim์€ hidden_unit์˜ ๊ฐ’์œผ๋กœ ์„ค์ •๋ฉ๋‹ˆ๋‹ค.
💡 example
Let's walk through an example.
With 4 time steps, let's see how the output differs when return_sequences=True is set and when it is not.

Left: only the last hidden state is passed to the next layer. Right: all hidden states are passed to the next layer.

  • return_sequences = True๋ฅผ ์„ค์ •ํ•˜๋ฉด, Memory Cell์ด ๋ชจ๋“  time step(๋ชจ๋“  ์‹œ์ )์— ๋Œ€ํ•ด์„œ hidden state (์€๋‹‰ ์ƒํƒœ)๊ฐ’์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทผ๋ฐ, parameter๋ฅผ ์•ˆ์ ์—ˆ๊ฑฐ๋‚˜, return_sequences = False ์ด๋ฉด, Memory cell์€ ํ•˜๋‚˜์˜ ์€๋‹‰ ์ƒํƒœ ๊ฐ’๋งŒ์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์ด ํ•˜๋‚˜์˜ ๊ฐ’์€ ์•ž์—์„œ ์„ค๋ช…ํ–ˆ๋“ฏ์ด, ๋งˆ์ง€๋ง‰ ์‹œ์ (time step)์˜ Memory cell์˜ ์€๋‹‰ ์ƒํƒœ ๊ฐ’์ž…๋‹ˆ๋‹ค.
๋งˆ์ง€๋ง‰ hidden state (์€๋‹‰ ์ƒํƒœ)๋งŒ ์ „๋‹ฌํ•˜๋ฉด many-to-one (๋‹ค ๋Œ€ ์ผ)์ด๊ณ , ๋ชจ๋“  ์‹œ์ ์˜ hidden state (์€๋‹‰ ์ƒํƒœ)๋ฅผ ์ „๋‹ฌํ•˜๋ฉด, ๋‹ค์Œ์ธต์— RNN์˜ hidden layer๊ฐ€ ํ•˜๋‚˜ ๋” ์žˆ๋Š” ๊ฒฝ์šฐ๋‚˜, ์•„๋‹ˆ๋ฉด many-to-many (๋‹ค ๋Œ€ ๋‹ค) ๊ฒฝ์šฐ ์ž…๋‹ˆ๋‹ค.
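To make the two output shapes concrete, here is a small NumPy stand-in for an RNN layer with a return_sequences switch. It is a sketch of the behavior described above, not Keras's actual implementation; the function name simple_rnn, the sizes, and the random weights are assumptions.

```python
import numpy as np

def simple_rnn(x, W_x, W_h, b, return_sequences=False):
    """x has shape (batch_size, timesteps, input_dim)."""
    batch_size, timesteps, _ = x.shape
    units = W_h.shape[0]
    h = np.zeros((batch_size, units))
    states = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W_x + h @ W_h + b)
        states.append(h)
    if return_sequences:
        return np.stack(states, axis=1)  # (batch_size, timesteps, units)
    return h                             # (batch_size, units)

rng = np.random.default_rng(0)
batch_size, timesteps, input_dim, units = 2, 4, 5, 3
x = rng.normal(size=(batch_size, timesteps, input_dim))
W_x = rng.normal(size=(input_dim, units))
W_h = rng.normal(size=(units, units))
b = np.zeros(units)

last = simple_rnn(x, W_x, W_h, b)
full = simple_rnn(x, W_x, W_h, b, return_sequences=True)
print(last.shape)  # (2, 3)
print(full.shape)  # (2, 4, 3)
```

The last slice of the full sequence, full[:, -1, :], is the same value that is returned when return_sequences is False.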

RNN implementation - building a deep RNN model

An RNN can have multiple hidden layers.

Deep RNN model, source: https://wikidocs.net/22886

  • ์œ„์˜ ์˜ˆ์‹œ๋Š” RNN ์—์„œ hidden layer๊ฐ€ 1๊ฐœ ๋” ์ถ”๊ฐ€๋˜์–ด ์€๋‹‰์ธต์ด 2๊ฐœ์ธ ๊นŠ์€(deep) ์ˆœํ™˜ ์‹ ๊ฒฝ๋ง์˜ ๋ชจ์Šต์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
  • ์•„๋ž˜ ์ฝ”๋“œ๋Š” ์€๋‹‰์ธต์„ 2๊ฐœ ์ถ”๊ฐ€ํ•˜๋Š” ์ฝ”๋“œ์ž…๋‹ˆ๋‹ค.
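from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN
hidden_units = 128  # illustrative size, matching the 128 units used earlier
model = Sequential()
model.add(SimpleRNN(hidden_units, input_shape=(10, 5), return_sequences=True))  # (timesteps, input_dim)
model.add(SimpleRNN(hidden_units, return_sequences=True))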
  • ์ฒซ๋ฒˆ์งธ ์€๋‹‰์ธต ์ฝ”๋“œ๋Š” ๋‹ค์Œ ์€๋‹‰์ธต์ด ์กด์žฌํ•จ์œผ๋กœ, return_sequences = True๋ฅผ ์„ค์ •ํ•˜์˜€์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋Ÿฌ๋ฉด ๋ชจ๋“  ์‹œ์ ์— ๋Œ€ํ•ด์„œ hidden state ๊ฐ’์„ ๋‹ค์Œ ์€๋‹‰์ธต์œผ๋กœ ๋ณด๋‚ด์ค๋‹ˆ๋‹ค.

RNN implementation - building a bidirectional RNN model

When predicting the output at time step t, not only the inputs from earlier time steps but also the inputs from later time steps can contribute to the prediction.
  • In some cases the hint lies not only in past inputs but also in future inputs.
  • A bidirectional RNN is a model that takes both earlier and later time steps into account so that the prediction at the current time step can be more accurate.

Bidirectional RNN, source: https://wikidocs.net/22886

  • ์–‘๋ฐฉํ–ฅ RNN์€ ํ•˜๋‚˜์˜ ์ถœ๋ ฅ๊ฐ’์„ ์˜ˆ์ธก ํ•˜๊ธฐ ์œ„ํ•ด์„œ ๊ธฐ๋ณธ์ ์œผ๋กœ 2๊ฐœ์˜ Memory Cell์„ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ์ฒซ๋ฒˆ์งธ Memory Cell์€ ์•ž ์‹œ์ ์˜ ์€๋‹‰ ์ƒํƒœ (Forward State)๋ฅผ ์ „๋‹ฌ๋ฐ›์•„ ํ˜„์žฌ์˜ hidden state(์€๋‹‰ ์ƒํƒœ)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
    • ๊ทธ๋ฆผ์—์„œ๋Š” A' Memory Cell์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค.
  • ๋‘๋ฒˆ์งธ Memory Cell์€ ๋‘ ์‹œ์ ์˜ ์€๋‹‰ ์ƒํƒœ (Forward State)๋ฅผ ์ „๋‹ฌ๋ฐ›์•„ ํ˜„์žฌ์˜ hidden state(์€๋‹‰ ์ƒํƒœ)๋ฅผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค
    • ๊ทธ๋ฆผ์—์„œ๋Š” A Memory Cell์— ํ•ด๋‹นํ•˜๋ฉฐ, input sequence๋ฅผ ๋ฐ˜๋Œ€๋กœ ์ฝ์Šต๋‹ˆ๋‹ค.
  • ์ด 2๊ฐœ์˜ Memory Cell์€ ํ˜„์žฌ ์‹œ์ ์—์„œ์˜ ์ถœ๋ ฅ์ธต์—์„œ ์ถœ๋ ฅ ๊ฐ’์„ ์˜ˆ์ธก ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
์•„๋ž˜ ์ฝ”๋“œ๋Š” ๋‹จ์ง€ ์˜ˆ์‹œ์ผ๋ฟ ์œ„์˜ ๊ทธ๋ฆผ์„ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•œ๊ฒƒ์ด ์•„๋‹™๋‹ˆ๋‹ค!
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, SimpleRNN

timesteps = 8
input_dim = 4
hidden_units = 128  # illustrative size

model = Sequential()
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True), input_shape=(timesteps, input_dim)))

 

Like an ordinary RNN, a bidirectional RNN can have multiple hidden layers.
The figure below shows a deep bidirectional RNN in which one more hidden layer has been added, for a total of two.

Deep bidirectional RNN model, source: https://wikidocs.net/22886

 

The code below shows the case with two hidden layers.
model = Sequential()
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True), input_shape=(timesteps, input_dim)))
model.add(Bidirectional(SimpleRNN(hidden_units, return_sequences=True)))
  • ๊ทผ๋ฐ, hidden layer (์€๋‹‰์ธต)๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค๊ณ  ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ๊ผญ ์ข‹์•„์ง€๋Š”๊ฑด ์•„๋‹™๋‹ˆ๋‹ค.
  • hidden layer (์€๋‹‰์ธต)๋ฅผ ์ถ”๊ฐ€ํ•˜๋ฉด ๋ชจ๋ธ์ด ํ•™์Šตํ•˜๋Š” ์–‘๋„ ๋งŽ์•„์ง€์ง€๋งŒ, ๊ทธ์— ๋น„๋ก€ํ•˜์—ฌ ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ์…‹์˜ ์–‘๋„ ๋Š˜์–ด๋‚˜๊ธฐ ๋•Œ๋ฌธ์— ์‚ฌ์šฉ์ž๊ฐ€ ์ž˜ ํŒ๋‹จํ•ด์„œ hidden layer (์€๋‹‰์ธต)๋ฅผ ๋Š˜๋ฆด์ง€, ์ค„์ผ์ง€ ํŒ๋‹จํ•ด์„œ ๋ชจ๋ธ์„ ๋งŒ๋“ค์–ด์•ผ ํ•  ๊ฒƒ ์ž…๋‹ˆ๋‹ค.