[NLP] Seq2Seq, Encoder & Decoder

1. Sequence-to-sequence

💡 The Transformer model is a model for performing sequence-to-sequence tasks such as machine translation.
  • sequence: an ordered series of items, such as words.
  • Here, sequence-to-sequence is the task of converting a sequence with one set of properties into a sequence with different properties.
  • Sequence-to-sequence uses a many-to-many RNN setup; RNNs themselves will be explained later.
💡 example
Machine translation: the task of converting a word sequence in one language (the source language) into a word sequence in another language (the target language)

Sequence-to-sequence in machine translation

  • Notice that the source sequence length (6 words) and the target sequence length (10 words) are different.
  • Caution: sequence-to-sequence has to work without trouble even when the source and target lengths differ.
    • If a model cannot handle source and target sequences of different lengths, it may run into problems or fail to run at all.

2. Sequence-to-sequence basics

Sequence-to-sequence machine translation starts by feeding an input sequence into the model.
  • As explained in 3-1 earlier, the input sequence is written x1, x2, ..., xm and the output sequence y1, y2, ..., yn.
  • The input and output sequences can have different lengths.
Here, translation can be thought of as finding the sequence (ordering) that is most probable given the input.
  • Conditional probability formula -> formalizing how a human translates: y* = argmax p(y|x).
  • For machine translation, however, the formula becomes p(y|x, θ) [θ denotes the model], and when translating with a machine we search for the argmax of this function (written out below).
    • *Here θ (theta) denotes the model's parameters.
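Written out, this is the encoder-decoder formulation from the course linked below; the word-by-word factorization is the usual chain-rule form, added here only to make the argmax concrete:

```latex
% Inference: pick the target sequence the model considers most probable
y^{*} = \operatorname*{argmax}_{y} \; p(y \mid x, \theta)

% The model probability is usually factorized word by word (chain rule)
p(y \mid x, \theta) = \prod_{t=1}^{n} p\bigl(y_t \mid y_1, \dots, y_{t-1},\, x, \theta\bigr)
```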

Source: https://lena-voita.github.io/nlp_course/seq2seq_and_attention.html#enc_dec_framework

To build a system that translates by machine, we need to understand three parts.
  • Modeling part: how does the model work with p(y|x, θ)?
  • Learning part: how do we find the parameters θ? (a standard formulation is sketched below)
  • Inference part: how do we find the best y?
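For the learning part, the usual approach (the standard maximum-likelihood setup, stated here as a sketch rather than something spelled out in this post) is to choose θ that maximizes the likelihood of the parallel training pairs:

```latex
\theta^{*} = \operatorname*{argmax}_{\theta} \sum_{i} \log p\bigl(y^{(i)} \mid x^{(i)}, \theta\bigr)
```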

3. Sequence-to-sequence structure

  • Let's take a look at how a sequence-to-sequence (seq2seq) model is structured.
Models that perform sequence-to-sequence (seq2seq) tasks usually consist of two parts: an encoder and a decoder.

 

Encoder

  • The encoder receives all the words of the input sentence (i.e., the source sequence) one by one, and at the end compresses all of them into a single vector.
  • This is called the context vector.
  • Once the information of the input sentence (the source sequence) has been compressed into a single context vector, the encoder passes the context vector to the decoder.
  • The context vector is usually made up of hundreds of dimensions or more.

Context vector example. Source: https://wikidocs.net/24996

Decoder

  • The decoder receives the context vector and outputs the translated words one at a time, in order (a minimal sketch of the whole encoder-decoder flow follows below).
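Here is a minimal sketch of that encoder -> context vector -> decoder flow in PyTorch. The class name `Seq2Seq`, the GRU layers, and all sizes are illustrative assumptions, not the exact model from the figures:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy encoder-decoder: the encoder compresses the source sentence
    into one context vector, and the decoder unrolls from that vector."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=32, hid_dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder: read the whole source sequence, keep only the final hidden state.
        _, context = self.encoder(self.src_emb(src_ids))   # (1, batch, hid_dim)
        # Decoder: start from the context vector and read the target tokens.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_out)                            # (batch, tgt_len, tgt_vocab)

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 6))    # source: 6 tokens
tgt = torch.randint(0, 120, (2, 10))   # target: 10 tokens; lengths may differ, as noted above
print(model(src, tgt).shape)           # torch.Size([2, 10, 120])
```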

RNN, LSTM

This time, let's step inside the encoder and the decoder.

 

  • Both the encoder and the decoder are built with RNNs.
  • An RNN (Recurrent Neural Network) is a neural network architecture designed to process sequential data.
    • It is designed to process information while preserving an internal state over the sequence.
  • However, when a sequence gets long, an RNN has trouble remembering information from far in the past -> aka the vanishing gradient (short-term memory) problem.
    • When the sequence grows long, the model tends to forget the earlier parts of it...
  • That is why the encoder and the decoder use LSTM, a model that modifies the plain RNN.
LSTM (Long Short-Term Memory) - to be explained in more detail in another post, together with RNN.
  • It is a variant of RNN designed to solve the vanishing gradient (short-term memory) problem.
  • What changed: a plain RNN uses only a hidden state, while an LSTM also keeps a cell state so that it can learn long-term dependencies better.
Internally, the encoder and the decoder are simply two RNNs.

LSTM encoder-decoder structure (Source: https://wikidocs.net/24996)

  • The RNN cell that receives the input sequence (sentence) is the encoder, and the RNN cell that outputs the output sequence (sentence) is the decoder.
  • In the figure above, the encoder's RNN cells and the decoder's RNN cells are drawn in different colors.
  • Note, however, that they are not built from plain RNN cells but from LSTM cells or GRU cells (see the sketch just below).
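One quick way to see the hidden-state vs. cell-state difference in code, again assuming PyTorch (a sketch, not from the original post): `nn.RNN` returns only a hidden state, while `nn.LSTM` additionally returns a cell state.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 6, 32)                 # (batch, seq_len, input_dim)

rnn = nn.RNN(32, 64, batch_first=True)
_, h_n = rnn(x)                           # plain RNN: hidden state only

lstm = nn.LSTM(32, 64, batch_first=True)
_, (h_n, c_n) = lstm(x)                   # LSTM: hidden state AND cell state
print(h_n.shape, c_n.shape)               # torch.Size([1, 1, 64]) twice
```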

How the Encoder Works

Encoder
  • The encoder tokenizes the input sequence (sentence) into words, and each token becomes the input to the RNN cell at its time step.
  • After the encoder RNN cell has received all the words, it hands the hidden state of its last time step over to the decoder RNN cell as the context vector.
  • The context vector is used as the first hidden state of the decoder RNN cell.
Context Vector
  • Looking at the figure above, word embeddings are used when the text fed into each LSTM of the encoder is turned into vectors.
  • In other words, every word used in seq2seq is turned into an embedding vector before being fed into the LSTM as input.
  • The figure below shows the embedding layer, the step that runs every word through an embedding (a small sketch follows the figure).

Embedding vector example (Source: https://wikidocs.net/24996)
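As a rough sketch of that tokenize -> embed -> encode pipeline, assuming PyTorch; the token ids, the 4-dimensional embedding (matching the figure), and the LSTM size are made-up illustration values:

```python
import torch
import torch.nn as nn

# Pretend token ids for a tokenized source sentence of 6 words.
token_ids = torch.tensor([[5, 12, 7, 3, 9, 2]])               # (batch=1, seq_len=6)

embedding = nn.Embedding(num_embeddings=50, embedding_dim=4)  # 4-dim, as in the figure
encoder = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

embedded = embedding(token_ids)            # (1, 6, 4): one embedding vector per token
outputs, (h_n, c_n) = encoder(embedded)    # h_n: hidden state of the LAST time step
context_vector = h_n                       # this is what gets handed to the decoder
print(context_vector.shape)                # torch.Size([1, 1, 8])
```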

RNN cells in the Encoder

 

  • As mentioned, word embeddings are used when text is turned into vectors in NLP.
  • We also said that an embedding vector has hundreds of dimensions; in the figure below the embedding vector has only 4 dimensions, but in practice it can have hundreds.
  • And the RNN cell receives 2 inputs at every time step.

RNN cell example (Source: https://wikidocs.net/24996)

 

 

  • If we call the time step t, the RNN cell takes the hidden state from t-1 and the input vector at t as its inputs, and produces the hidden state at t.
  • The hidden state at t is then either sent up to another hidden layer or an output layer, if one sits above it, or simply ignored when it is not needed.
  • The RNN cell also sends the current hidden state at t as an input to the RNN cell at t+1 (the next time step).
In this structure, the hidden state at t can be seen as a value that accumulates the influence of all the hidden states the same RNN cell produced at earlier time steps.
  • The context vector, then, is the hidden state of the encoder's last RNN cell, and it holds a summary of the information from all the word tokens of the input sentence (see the sketch below).
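The recurrence described above can also be written out by hand; this is a minimal sketch of the vanilla RNN update h_t = tanh(W_x x_t + W_h h_(t-1) + b), with made-up sizes (a real model would use nn.RNN or nn.LSTM):

```python
import torch

torch.manual_seed(0)
emb_dim, hid_dim, seq_len = 4, 8, 6

W_x = torch.randn(emb_dim, hid_dim)
W_h = torch.randn(hid_dim, hid_dim)
b = torch.zeros(hid_dim)

x = torch.randn(seq_len, emb_dim)      # one embedded input vector per time step
h = torch.zeros(hid_dim)               # initial hidden state

for t in range(seq_len):
    # The hidden state at t depends on the input at t AND the hidden state at t-1,
    # so it accumulates the influence of every earlier step.
    h = torch.tanh(x[t] @ W_x + h @ W_h + b)

context_vector = h                     # hidden state of the last time step
print(context_vector.shape)            # torch.Size([8])
```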

How the Decoder Works

Decoder
  • The decoder is an RNN language model (RNNLM).
  • Conceptually, the decoder receives a token called <sos>, which marks the start of a sentence, as its initial input.
  • Once <sos> has been fed in, the decoder predicts the word most likely to appear next.
RNN cells in the Decoder
  • The decoder uses the context vector, i.e. the last hidden state of the encoder's RNN cell, as its hidden state.
  • The decoder's first RNN cell predicts the next word from this first hidden state together with the input at the current time step t, the <sos> token.
  • The predicted word, i.e. the word that follows <sos>, becomes the input to the RNN at the next time step t+1.
  • The RNN at t+1 in turn predicts the following word (the output vector at t+1) from this input and the hidden state from t (a greedy-decoding sketch follows the figure below).

Predicting the next word in the decoder (Source: https://wikidocs.net/24996)
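A sketch of that prediction loop (greedy decoding) in PyTorch; the `<sos>`/`<eos>` token ids, the vocabulary size, and the GRU decoder are illustrative assumptions:

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 120, 32, 64
SOS, EOS = 1, 2                                   # assumed special-token ids

embedding = nn.Embedding(vocab_size, emb_dim)
decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
to_vocab = nn.Linear(hid_dim, vocab_size)

hidden = torch.zeros(1, 1, hid_dim)               # stand-in for the context vector
token = torch.tensor([[SOS]])                     # start with <sos>

generated = []
for _ in range(10):                               # cap the output length
    out, hidden = decoder(embedding(token), hidden)
    probs = torch.softmax(to_vocab(out[:, -1]), dim=-1)
    token = probs.argmax(dim=-1, keepdim=True)    # most probable next word...
    if token.item() == EOS:
        break
    generated.append(token.item())                # ...is fed back in at t+1

print(generated)
```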

  • The seq2seq model has to pick and predict one word out of all the words it can choose from.
  • It makes this prediction using a function called softmax.

The Softmax function

  • The softmax function is an activation function used in multi-class classification problems.
  • It converts a vector of values into a probability distribution.
  • Each element of the output lies between 0 and 1, and all the elements of the output vector sum to 1.
  • The largest value in the input vector also receives the largest probability in the output vector.

Softmax function graph & formula
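Since the figure itself is not reproduced here, the standard softmax formula is:

```latex
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```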

 

Using the softmax function, score values can be converted into probability values (a small worked example follows the figure below).

Example of converting scores into probabilities
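A small worked example of that score-to-probability conversion, assuming PyTorch (the score values are made up):

```python
import torch

scores = torch.tensor([2.0, 1.0, 0.1, -1.0])   # raw decoder scores (made-up values)
probs = torch.softmax(scores, dim=0)

print(probs)           # ~ tensor([0.6381, 0.2347, 0.0954, 0.0318]): every value is in (0, 1)
print(probs.sum())     # tensor(1.): the probabilities sum to 1
print(probs.argmax())  # tensor(0): the largest score keeps the largest probability
```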

 

Now let's go back to the decoder.
  • In the decoder, the output vector from the RNN cell at each time step (the left part of the figure) is used.
  • That output vector is run through the softmax function to produce a probability for each word of the output vocabulary, and the decoder uses those probabilities to decide which word to output (see the formula below).
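Written as a formula, with h_t standing for the decoder's output vector at step t and W, b for the output layer (notation assumed here, not taken from the figures):

```latex
p(y_t = w \mid y_{<t}, x) = \operatorname{softmax}\bigl(W h_t + b\bigr)_w,
\qquad \hat{y}_t = \operatorname*{argmax}_{w} \; p(y_t = w \mid y_{<t}, x)
```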