🖥️ Deep Learning

[DL] Quantization, LoRA & QLoRA

This post looks at quantization, LoRA, and QLoRA. What is quantization? Quantization is a technique for reducing a deep learning model's memory usage and improving its computational efficiency by representing the model's weights and activations with fixed-point numbers. The goal is to improve performance while preserving the trained model's accuracy as much as possible. Why quantization is needed: Memory savings: shrinking the model parameters reduces memory usage. Faster computation: fixed-point arithmetic is typically faster than floating-point arithmetic, so inference speeds up..
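
The excerpt does not say which quantization scheme the post uses, so the following is only a minimal sketch of one common choice: symmetric per-tensor int8 quantization, where each float32 weight is stored as an 8-bit integer plus a single shared scale. The function names `quantize_int8` and `dequantize` and the random weight matrix are illustrative assumptions, not taken from the post.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 array -> (int8 array, scale)."""
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to approximate float32 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.1, size=(256, 256)).astype(np.float32)  # toy weights

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = float(np.max(np.abs(weights - restored)))

print("float32 bytes:", weights.nbytes)
print("int8 bytes:   ", q_weights.nbytes)   # 4x smaller: 1 byte per weight instead of 4
print("max abs error:", max_error)          # bounded by about scale / 2 (rounding)
```

The memory saving comes directly from the narrower data type, and the price is a rounding error of at most about half a quantization step per weight.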

[DL] Model Distillation

This post explains model distillation. What is model distillation? Model distillation is a deep learning technique that transfers the knowledge of a large model to a small one so that the small model mimics the large model's performance as closely as possible. The large model (the teacher model) has already learned complex patterns and knowledge, and the aim is to pass that knowledge on to the small model (the student model) to obtain a more efficient model. Why model distillation is needed: Resource efficiency: large models perform well, but..
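
One common way to express "the student mimics the teacher" is a soft-target loss: the cross-entropy between the teacher's and the student's temperature-softened output distributions. The sketch below assumes that formulation. The temperature value, the function names, and the toy logits are illustrative assumptions, and a full training setup would usually combine this term with the ordinary hard-label loss.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis, with temperature scaling."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between softened teacher and student outputs, batch-averaged."""
    p_teacher = softmax(teacher_logits, temperature)
    log_p_student = np.log(softmax(student_logits, temperature) + 1e-12)
    return float(-(p_teacher * log_p_student).sum(axis=-1).mean())

teacher = np.array([[5.0, 2.0, 0.5]])        # toy teacher logits for one sample
close_student = np.array([[4.5, 2.2, 0.4]])  # ranks the classes like the teacher
far_student = np.array([[0.5, 2.0, 5.0]])    # ranks the classes in reverse

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
print(loss_close, loss_far)  # the student closer to the teacher gets the lower loss
```

Minimizing this loss pushes the student's whole output distribution toward the teacher's, not only its top prediction.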

[DL] Finetuning

This post looks at fine-tuning. Fine-tuning is a methodology widely used in both deep learning and LLMs: it means further training an already pre-trained model so that it fits a specific task. The underlying principle is the same in both settings. What is fine-tuning? Fine-tuning is the process of further training a pre-trained model for a specific task. During pre-training the model learns general patterns, and during fine-tuning it is optimized for a specific purpose. The goal is to adapt quickly to a new task or dataset by reusing the general knowledge acquired during pre-training. Pre-Training vs Fine..

[DL] Deep Learning Model Optimization

This post explains deep learning model optimization techniques. To maximize a deep learning model's performance, a range of hyperparameters must be tuned. Each hyperparameter has a significant effect on training and needs an appropriate setting. Below is a detailed summary of the main hyperparameters and how to set them. Hyperparameters for deep learning model optimization: 1. Learning rate. Definition: the learning rate is the hyperparameter that determines the size of each weight update in gradient descent. Effect: High learning rate: training can be faster, but the loss may fail to reach its minimum and diverge. Low learning rate: training is slower, but the loss more stably ..
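
The effect of the learning rate can be seen on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2, whose gradient is 2*w. The function, the starting point, and the three learning rates are illustrative assumptions chosen only to show the behaviours described above.

```python
def gradient_descent(lr, steps=20, w0=5.0):
    """Minimize f(w) = w**2 with a fixed learning rate, starting from w0."""
    w = w0
    for _ in range(steps):
        grad = 2 * w        # derivative of w**2
        w -= lr * grad      # the learning rate sets the size of each update
    return w

w_high = gradient_descent(lr=1.1)    # too high: every step overshoots further
w_mid = gradient_descent(lr=0.1)     # reasonable: moves quickly toward 0
w_low = gradient_descent(lr=0.01)    # too low: stable but still far from 0

print("lr=1.1 :", w_high)
print("lr=0.1 :", w_mid)
print("lr=0.01:", w_low)
```

Each update multiplies w by (1 - 2 * lr), so the iterate shrinks toward the minimum only while that factor has magnitude below 1. With lr=1.1 the factor is -1.2 and the iterate grows in magnitude at every step.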

[DL] Transfer Learning

Transfer learning is a technique in machine learning (ML) and deep learning (DL) that reuses an existing pre-trained model for a new task. It is especially useful when a model trained on a large dataset is applied to a small one. By applying previously learned knowledge to a new problem, transfer learning can speed up training and improve performance. Transfer learning: redefining the topmost part of an existing neural network and then training only that part is called transfer learning. The lower part of the network reuses an already-trained neural network..
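
The "frozen lower part, newly trained top part" idea can be sketched without any deep learning framework. In the toy example below, a fixed random ReLU layer stands in for the pre-trained lower network (an assumption made purely for illustration, since no real pre-trained weights are involved), and only a new logistic-regression head is trained on top of its features. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained lower layers: fixed weights that are never updated.
W_frozen = rng.normal(size=(2, 16))
W_frozen_before = W_frozen.copy()

def features(x):
    """Frozen feature extractor (one ReLU layer)."""
    return np.maximum(0.0, x @ W_frozen)

# Toy binary classification task for the new head.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Newly defined top layer: the only trainable parameters.
w_head = np.zeros(16)
b_head = 0.0

def loss_and_grads(w, b):
    """Binary cross-entropy of the head, with gradients for the head only."""
    h = features(X)
    p = 1.0 / (1.0 + np.exp(-(h @ w + b)))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return loss, h.T @ (p - y) / len(y), np.mean(p - y)

initial_loss, _, _ = loss_and_grads(w_head, b_head)
for _ in range(200):
    _, grad_w, grad_b = loss_and_grads(w_head, b_head)
    w_head -= 0.1 * grad_w   # update the head
    b_head -= 0.1 * grad_b   # W_frozen is deliberately left untouched
final_loss, _, _ = loss_and_grads(w_head, b_head)

print(initial_loss, final_loss)
```

Because gradients are computed only for the head, training is cheap and the reused layers keep whatever they learned before.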

[DL] Representative CNN Networks - LeNet 5, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet

This post looks at a range of CNN networks. LeNet 5: LeNet-5 defined the basic CNN architecture and became a foundation of today's deep learning. It was developed mainly to solve handwritten digit recognition (the MNIST dataset) and is regarded as the model that laid the groundwork for modern CNNs. LeNet-5 is a network of seven layers, not counting the input. Its structure can be split into two main parts: Convolutional Neural Network (CNN) and Fully Connected Network (FCN). Each layer has a specific role, with convolutional layers and subsampling layers (pooling layers) alternating..

[DL] Implementing Convolution & Pooling Layers

This post implements a convolution layer and a pooling layer. Implementing Convolution & Pooling Layers: 4-dimensional arrays. In a convolutional neural network (CNN), the data flowing between layers is four-dimensional. For example, a data shape of (10, 1, 28, 28) means 10 samples, each with 1 channel, a height of 28, and a width of 28. In Python this looks like the following:
    import numpy as np
    x = np.random.rand(10, 1, 28, 28)  # generate random data
    x[0, 0]  # or x[0][0]: access the spatial data of the first sample's first channel
Here..
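
To make the (N, C, H, W) layout concrete, the short sketch below prints the shape at each indexing level. It uses a seeded `np.random.default_rng` generator instead of `np.random.rand` only so that runs are reproducible. That substitution is an assumption for illustration, not what the post does.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((10, 1, 28, 28))   # (batch, channel, height, width)

print(x.shape)         # whole batch: 10 samples
print(x[0].shape)      # first sample: (channel, height, width)
print(x[0, 0].shape)   # first channel of the first sample: (height, width)

# x[0, 0] and x[0][0] select the same 28 x 28 block of values.
same = np.array_equal(x[0, 0], x[0][0])
print(same)
```

Both index forms return the same values. `x[0, 0]` does it in a single indexing step, while `x[0][0]` first builds the intermediate `x[0]` view.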

[DL] Convolution Neural Network - CNN, Convolution Layer, Pooling Layer

Convolutional neural networks (CNNs) are used in many areas, including image recognition and speech recognition. In image recognition in particular, almost every deep-learning-based method builds on CNNs. Overall CNN structure: the convolution layer and the pooling layer are the new components introduced here. In the neural networks seen so far, every neuron was connected to all neurons in the adjacent layer. This is called fully connected (FC), and the fully connected layer was implemented under the name 'Affine layer'. With Affine layers, a fully connected neural network with 5 layers looks like the figure below..

[DL] For Proper Training - Overfitting, Dropout, Hyperparameter

For proper training: overfitting is a frequent problem in machine learning. Overfitting is the state in which a neural network has fit the training data too closely and cannot respond properly to any other data. Overfitting: it mainly occurs when the model has many parameters and high expressive power, or when training data is scarce. Let's deliberately meet both conditions to trigger overfitting. We use only 300 samples from the MNIST training set and a 7-layer network to raise the network's complexity. Each layer has 100 neurons, and ReLU is used as the activation function..

[DL] Batch Normalization

Batch Normalization: batch normalization is a method proposed in 2015. It has drawn attention for the following reasons. It speeds up training. It does not depend heavily on the initial weight values. It suppresses overfitting, which reduces the need for techniques such as dropout. As mentioned earlier, the basic idea of batch normalization is to adjust the activation values in each layer so that they are appropriately distributed. Let's look at an example..
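
The basic idea of adjusting each layer's activations can be sketched as a training-time forward pass: normalize every feature over the mini-batch, then apply a scale and a shift. This is only a minimal sketch under stated assumptions. The function name, the `eps` value, and the toy activations are illustrative, and the running statistics used at inference time are left out.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # roughly zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale (gamma) and shift (beta)

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # batch of 64, 4 features

normalized = batch_norm_forward(activations, gamma=np.ones(4), beta=np.zeros(4))
print(normalized.mean(axis=0))   # close to 0 for every feature
print(normalized.std(axis=0))    # close to 1 for every feature
```

With `gamma` set to ones and `beta` to zeros the output is simply the standardized activations. During training these two parameters are learned, so the network can recover any scale and offset it needs.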

Bigbread1129
Posts in the '🖥️ Deep Learning' category