๐Ÿ“• Natural_Language_Processing

๐Ÿ“• Natural_Language_Processing

[NLP] RNNLM - A Language Model Using an RNN

RNNLM (a Language Model built with an RNN). This time we will implement a Language Model using an RNN. Before that, let's first look at the Neural Network it uses. On the left is the layer structure of the RNNLM, and on the right is the same network unrolled along the time axis. The Embedding layer in the figure converts word IDs into distributed representations (word vectors). Those distributed representations are then fed into the RNN layer. The RNN layer outputs its hidden state to the layer above and, at the same time, to the RNN layer at the next time step (to the right). The hidden state the RNN layer outputs upward …
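As a reference, here is a minimal numpy sketch of one forward step of the Embedding → RNN stack the excerpt describes. The sizes and weight names (W_embed, Wx, Wh) are illustrative assumptions, not the post's actual code.

```python
import numpy as np

V, D, H = 10000, 100, 100   # vocab size, embedding dim, hidden dim (assumed)

W_embed = np.random.randn(V, D) * 0.01   # Embedding layer: word ID -> word vector
Wx = np.random.randn(D, H) * 0.01        # input-to-hidden weights
Wh = np.random.randn(H, H) * 0.01        # hidden-to-hidden weights
b  = np.zeros(H)

def rnnlm_step(word_id, h_prev):
    """One time step: embed the word ID, then update the RNN hidden state.
    The hidden state goes both 'up' (toward the output layer) and 'right'
    (to the next time step), exactly as the unrolled diagram shows."""
    x = W_embed[word_id]                   # distributed representation
    h = np.tanh(x @ Wx + h_prev @ Wh + b)  # new hidden state
    return h

h = np.zeros(H)
for wid in [0, 42, 7]:                     # a toy word-ID sequence
    h = rnnlm_step(wid, h)
```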

๐Ÿ“• Natural_Language_Processing

[NLP] BPTT (Backpropagation Through Time)

BPTT (Backpropagation Through Time) is an extension of the Backpropagation algorithm used to train Recurrent Neural Networks (RNNs). What does Backpropagation mean here? The name BPTT comes from its meaning: 'error backpropagation applied to the network unrolled along the time axis'. Using BPTT, an RNN can be trained. The basic concepts of RNNs are covered in the post below, so please refer to it. [DL] RNN (Recurrent Neural Network) 1. What is an RNN? An RNN is a Sequ…
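To make "backpropagation through the unrolled network" concrete, here is a toy numpy sketch that runs a simple RNN forward over T steps and then accumulates weight gradients backward through time. The shapes and the assumed gradient on the final hidden state are illustrative only.

```python
import numpy as np

D, H, T = 3, 4, 5
Wx = np.random.randn(D, H) * 0.1
Wh = np.random.randn(H, H) * 0.1
xs = np.random.randn(T, D)                 # toy input sequence
hs, h = [], np.zeros(H)

# forward: unroll the RNN along the time axis
for t in range(T):
    h = np.tanh(xs[t] @ Wx + h @ Wh)
    hs.append(h)

# backward: propagate an (assumed) gradient on the last hidden state
# back through every time step, accumulating weight gradients
dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
dh = np.ones(H)                            # pretend dLoss/dh_T = 1
for t in reversed(range(T)):
    dt = dh * (1 - hs[t] ** 2)             # back through tanh
    h_prev = hs[t - 1] if t > 0 else np.zeros(H)
    dWx += np.outer(xs[t], dt)
    dWh += np.outer(h_prev, dt)
    dh = dt @ Wh.T                         # flows on to the previous time step
```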

๐Ÿ“• Natural_Language_Processing

[NLP] ์ถ”๋ก  ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ• & Neural Network (์‹ ๊ฒฝ๋ง)

์ด๋ฒˆ ๊ธ€์—์„œ๋Š” ์ถ”๋ก  ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•๊ณผ Neural Network(์‹ ๊ฒฝ๋ง)์— ๋ฐํ•˜์—ฌ ํ•œ๋ฒˆ ์•Œ์•„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•์˜ ๋ฌธ์ œ์ ๋‹จ์–ด๋ฅผ Vector๋กœ ํ‘œํ˜„ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ตœ๊ทผ์—๋Š” ํฌ๊ฒŒ ๋‘ ๋ถ€๋ฅ˜๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. 'ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•'๊ณผ '์ถ”๋ก  ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•' ์ž…๋‹ˆ๋‹ค.๋‘ ๋ฐฉ๋ฒ•์ด ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ์–ป๋Š” ๋ฐฉ์‹์€ ์„œ๋กœ ๋‹ค๋ฅด์ง€๋งŒ, ๊ทธ ๋ฐฐ๊ฒฝ์—๋Š” ๋ชจ๋‘ ๋ถ„ํฌ ๊ฐ€์„ค์ด ์žˆ์Šต๋‹ˆ๋‹ค.ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•์—์„œ๋Š” ์ฃผ๋ณ€ ๋ฐ˜์–ด์˜ ๋นˆ๋„๋ฅผ ๊ธฐ์ดˆ๋กœ ๋‹จ์–ด๋ฅผ ํ‘œํ˜„ ํ–ˆ์Šต๋‹ˆ๋‹ค.๊ตฌ์ฒด์ ์œผ๋กœ๋Š” ๋‹จ์–ด์˜ Co-Occurance Matrix(๋™์‹œ ๋ฐœ์ƒ ํ–‰๋ ฌ)์„ ๋งŒ๋“ค๊ณ  ๊ทธ ํ–‰๋ ฌ์— ํŠน์ž‡๊ฐ’๋ถ„ํ•ด(Singular Value Decomposition, SVD)๋ฅผ ์ ์šฉํ•˜์—ฌ ๋ฐ€์ง‘๋ฒกํ„ฐ๋ฅผ ์–ป์Šต๋‹ˆ๋‹ค.๊ทธ๋Ÿฌ๋‚˜, ์ด ๋ฐฉ์‹์€ ๋Œ€๊ทœ๋ชจ Corpus(๋ง๋ญ‰์น˜)๋ฅผ ๋‹ค๋ฃฐ ๋•Œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค. ์ผ๋‹จ, ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ..

๐Ÿ“• Natural_Language_Processing

[NLP] ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ• ๊ฐœ์„ ํ•˜๊ธฐ

์•ž์— ๊ธ€, Thesaurus(์‹œ์†Œ๋Ÿฌ์Šค), Co-occurence Matrix(๋™์‹œ๋ฐœ์ƒ ํ–‰๋ ฌ)๋ถ€๋ถ„์—์„œ ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๊ธฐ๋ฒ•์— ๋ฐํ•˜์—ฌ ์„ค๋ช…ํ–ˆ์Šต๋‹ˆ๋‹ค.Thesaurus(์‹œ์†Œ๋Ÿฌ์Šค), Co-occurence Matrix(๋™์‹œ๋ฐœ์ƒ ํ–‰๋ ฌ) ๊ธ€์ž…๋‹ˆ๋‹ค. ์ง€๊ธˆ ๋‚ด์šฉ๊ณผ ์—ฐ๊ฒฐ๋˜๋Š” ๊ธ€์ด๋‹ˆ๊นŒ ํ•œ๋ฒˆ ์ฝ์–ด๋ณด์„ธ์š”. [NLP] Thesaurus(์‹œ์†Œ๋Ÿฌ์Šค), Co-occurence Matrix(๋™์‹œ๋ฐœ์ƒ ํ–‰๋ ฌ)์˜ค๋žœ๋งŒ์— NLP ๊ด€๋ จ ๊ธ€์„ ์“ฐ๋„ค์š”.. ์‹œ๊ฐ„ ๋‚˜๋Š”๋Œ€๋กœ ์—ด์‹ฌํžˆ ์“ฐ๊ณ  ์˜ฌ๋ ค ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. Thesaursus - ์‹œ์†Œ๋Ÿฌ์Šค์‹œ์†Œ๋Ÿฌ์Šค(Thesaurus)๋Š” ๋‹จ์–ด์™€ ๊ทธ ์˜๋ฏธ๋ฅผ ์—ฐ๊ฒฐ์‹œ์ผœ์ฃผ๋Š” ๋„๊ตฌ์ž…๋‹ˆ๋‹ค.์ฃผ๋กœ ํŠน์ • ๋‹จ์–ด์™€ ์˜๋ฏธdaehyun-bigbread.tistory.com Pointwise Mutual Information (PMI) - ์ ๋ณ„ ์ƒํ˜ธ์ •..

๐Ÿ“• Natural_Language_Processing

[NLP] Thesaurus & Co-occurrence Matrix

์˜ค๋žœ๋งŒ์— NLP ๊ด€๋ จ ๊ธ€์„ ์“ฐ๋„ค์š”.. ์‹œ๊ฐ„ ๋‚˜๋Š”๋Œ€๋กœ ์—ด์‹ฌํžˆ ์“ฐ๊ณ  ์˜ฌ๋ ค ๋ณด๋„๋ก ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. Thesaursus - ์‹œ์†Œ๋Ÿฌ์Šค์‹œ์†Œ๋Ÿฌ์Šค(Thesaurus)๋Š” ๋‹จ์–ด์™€ ๊ทธ ์˜๋ฏธ๋ฅผ ์—ฐ๊ฒฐ์‹œ์ผœ์ฃผ๋Š” ๋„๊ตฌ์ž…๋‹ˆ๋‹ค.์ฃผ๋กœ ํŠน์ • ๋‹จ์–ด์™€ ์˜๋ฏธ์ ์œผ๋กœ ์œ ์‚ฌํ•œ ๋‹จ์–ด(๋™์˜์–ด)์™€ ๋ฐ˜๋Œ€ ์˜๋ฏธ๋ฅผ ๊ฐ€์ง„ ๋‹จ์–ด(๋ฐ˜์˜์–ด)๋ฅผ ์ œ๊ณตํ•˜์—ฌ, ๊ธ€์„ ์“ฐ๊ฑฐ๋‚˜ ๋ง์„ ํ•  ๋•Œ ๋‹ค์–‘ํ•œ ํ‘œํ˜„์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ๋•์Šต๋‹ˆ๋‹ค.๋‹ค๋ฅธ ์˜๋ฏธ๋กœ ๋งํ•˜๋ฉด, ์œ ์˜์–ด ์‚ฌ์ „์œผ๋กœ '๋œป์ด ๊ฐ™์€ ๋‹จ์–ด(๋™์˜์–ด)'๋‚˜ '๋œป์ด ๋น„์Šทํ•œ ๋‹จ์–ด(์œ ์˜์–ด)'๊ฐ€ ํ•œ ๊ทธ๋ฃน์œผ๋กœ ๋ถ„๋ฅ˜๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.๋˜ํ•œ NLP์—์„œ ์ด์šฉ๋˜๋Š” ์‹œ์†Œ๋Ÿฌ์Šค์—์„œ๋Š” ๋‹จ์–ด ์‚ฌ์ด์˜ '์ƒ์œ„, ํ•˜์œ„' ํ˜น์€ '์ „์ฒด, ๋ถ€๋ถ„'๋“ฑ ๋” ์„ธ์„ธํ•œ ๊ด€๊ณ„๊นŒ์ง€ ์ •์˜ํ•ด๋‘” ๊ฒฝ์šฐ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.์˜ˆ๋ฅผ ๋“ค์–ด์„œ ์•„๋ž˜์˜ ๊ทธ๋ž˜ํ”„ ์ฒ˜๋Ÿผ ๊ด€๊ณ„๋ฅผ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.์ด์ฒ˜๋Ÿผ ๋ชจ๋“  ๋‹จ์–ด์— ๋ฐํ•œ ์œ ์˜์–ด ์ง‘ํ•ฉ์„ ๋งŒ..

๐Ÿ“• Natural_Language_Processing

[NLP] Transformer Model - An Overview

์ด๋ฒˆ ๊ธ€์—์„œ๋Š” Transformer ๋ชจ๋ธ์˜ ์ „๋ฐ˜์ ์ธ Architecture ๋ฐ ๊ตฌ์„ฑ์— ๋ฐํ•˜์—ฌ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. Transformer: Attention is All You Need Transformer ๋ชจ๋ธ์€ 2017๋…„์— "Attention is All You Need"๋ผ๋Š” ๋…ผ๋ฌธ์„ ํ†ตํ•ด์„œ ์†Œ๊ฐœ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ์ฃผ์š”ํ•œ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” "Self-Attention" ์ด๋ผ๋Š” ๋งค์ปค๋‹ˆ์ฆ˜์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ, ๋ฌธ์žฅ ๋‚ด์˜ ๋ชจ๋“  ๋‹จ์–ด๋“ค ์‚ฌ์ด์˜ ๊ด€๊ณ„๋ฅผ ํ•œ ๋ฒˆ์— ํŒŒ์•…ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์— ์žˆ์Šต๋‹ˆ๋‹ค. ์ด์ „์˜ ์„ค๋ช…ํ–ˆ๋˜ RNN(Recurrent Neural Network), LSTM(Long Short-Term Memory)๊ณผ ๊ฐ™์€ ์ˆœ์ฐจ์ ์ธ Model์ด ๊ฐ€์ง„ ์ˆœ์ฐจ์  ์ฒ˜๋ฆฌ์˜ ํ•œ๊ณ„๋ฅผ ๊ทน๋ณตํ–ˆ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋ฆฌ๊ณ  ํ˜„์žฌ Transformer ๋ชจ๋ธ..

๐Ÿ“• Natural_Language_Processing

[NLP] ํ•ฉ์„ฑ๊ณฑ, ์ˆœํ™˜์‹ ๊ฒฝ๋ง, Encoder, Decoder์—์„œ ์ˆ˜ํ–‰ํ•˜๋Š” Self-Attention

์ „์— ์ผ๋˜ ๋‚ด์šฉ์— ์ด์–ด์„œ ์จ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค. ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง (CNN Model)๊ณผ ๋น„๊ตํ•œ Self-Attention CNN์€ *Convolution filter(ํ•ฉ์„ฑ๊ณฑ ํ•„ํ„ฐ)๋ผ๋Š” ํŠน์ˆ˜ํ•œ ์žฅ์น˜๋ฅผ ์ด์šฉํ•ด์„œ Sequence์˜ ์ง€์—ญ์ ์ธ ํŠน์ง•์„ ์žก์•„๋‚ด๋Š” ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ Convolution filter(ํ•ฉ์„ฑ๊ณฑ ํ•„ํ„ฐ)๋Š” ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง์„ ๊ตฌ์„ฑํ•˜๋Š” ํ•˜๋‚˜์˜ ์š”์†Œ-ํ•„ํ„ฐ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ „์ฒด์ ์œผ๋กœ ํ›‘์œผ๋ฉด์„œ ์ธ์ ‘ํ•œ ์ •๋ณด๋ฅผ ์ถ”์ถœํ•˜๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ž์—ฐ์–ด๋Š” ๊ธฐ๋ณธ์ ์œผ๋กœ Sequence(๋‹จ์–ด ํ˜น์€ ํ˜•ํƒœ์†Œ์˜ ๋‚˜์—ด)์ด๊ณ  ํŠน์ • ๋‹จ์–ด ๊ธฐ์ค€ ์ฃผ๋ณ€ ๋ฌธ๋งฅ์ด ์˜๋ฏธ ํ˜•์„ฑ์— ์ค‘์š”ํ•œ ์—ญํ• ์„ ํ•˜๊ณ  ์žˆ์œผ๋ฏ€๋กœ, CNN์ด ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ์— ๋„๋ฆฌ ์“ฐ์ด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค. ์œ„์˜ ๊ทธ๋ฆผ์€ CNN ๋ฌธ์žฅ์˜ Encoding ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค. Convolution filter(ํ•ฉ์„ฑ๊ณฑ ํ•„ํ„ฐ)๊ฐ€ ..

๐Ÿ“• Natural_Language_Processing

[NLP] Attention

1. Attention. Attention is considered one of the important concepts in CS and ML. The Attention mechanism is mainly used in models that process or generate sequence data -> a kind of machine-learning method that operates on sequence inputs. The idea of Attention is that, at every time step at which the Decoder predicts an output, it consults the entire input sentence on the Encoder side once more. However, rather than attending to the whole input sentence in equal proportion, it attends (focuses) on the parts of the input that are relevant to what must be predicted at that time step. This is the key method for grasping context, and applying this approach to DL (deep learning) models is 'Attent…
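A toy numpy sketch of that idea: at one decoder time step, score every encoder hidden state against the current decoder state, turn the scores into weights with a softmax (so the input is not consulted in equal proportion), and take the weighted sum as the context vector. All shapes and values are made up.

```python
import numpy as np

H, T = 4, 6
enc_hs = np.random.randn(T, H)        # encoder hidden states (whole input sentence)
dec_h  = np.random.randn(H)           # decoder state at the current time step

scores  = enc_hs @ dec_h              # how relevant is each input word?
weights = np.exp(scores - scores.max())
weights /= weights.sum()              # softmax: unequal "attention" per word
context = weights @ enc_hs            # weighted sum used to predict the output
```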

๐Ÿ“• Natural_Language_Processing

[NLP] Word Embedding

1. Word Embedding? What is Word Embedding? It is a method of converting text data into numeric vectors. Put differently, it converts the words in a text into vector form that a computer can understand. That is, it maps words into low-dimensional dense vectors rather than high-dimensional sparse representations. A vector produced by the Word Embedding process can express a word's meaning, context, and similarity numerically. Broadly, word embedding is done in one of two ways. 2. Word Embedding methods. As stated above, there are broadly two approaches to Word Embedding: one is the count-based method, the other is the predic…
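A minimal sketch of "words as numeric vectors": a toy embedding table plus cosine similarity, so that similarity between words becomes a single number. The vocabulary, dimension, and random values are assumptions for illustration.

```python
import numpy as np

vocab = {"king": 0, "queen": 1, "apple": 2}
E = np.random.randn(len(vocab), 5)        # embedding table: one 5-d vector per word

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

v_king, v_queen = E[vocab["king"]], E[vocab["queen"]]
print(cosine(v_king, v_queen))            # similarity as a single number
```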

๐Ÿ“• Natural_Language_Processing

[NLP] Word2Vec, CBOW, Skip-Gram - Concepts & Models

1. What is Word2Vec? Word2Vec is a popular algorithm used to convert words into vectors. Here a 'word' usually means a 'Token'. The algorithm is designed as unsupervised learning that learns how to represent the semantic relationships between words (Tokens) well in a vector space. It works by predicting each word from its surrounding words (the context), or conversely, by looking at each word and predicting its surrounding words. By analogy, much as a model learns from images, it treats words as vectors and learns from them. In this way Word2Vec captures the semantic relationships between words. And, to train the model on the sentence shown in the figure above, each word (Token…
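The two prediction directions described above correspond to CBOW (context -> center word) and Skip-Gram (center word -> context). A small usage sketch with gensim, assuming it is installed; in gensim's API, sg=0 selects CBOW and sg=1 selects Skip-Gram:

```python
from gensim.models import Word2Vec

sentences = [["the", "cat", "sits", "on", "the", "mat"],
             ["the", "dog", "sits", "on", "the", "rug"]]

# sg=0 -> CBOW (predict a word from its context), sg=1 -> Skip-Gram
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)
print(model.wv.most_similar("cat", topn=2))   # semantic neighbors in vector space
```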

Bigbread1129 · Post list for the '📕 Natural_Language_Processing' category