[DL] ๋Œ€ํ‘œ์ ์ธ CNN Network - LeNet 5, AlexNet, ZFNet, VGGNet, GoogLeNet, ResNet
์ด๋ฒˆ๊ธ€์—์„œ๋Š” ๋‹ค์–‘ํ•œ CNN ๋„คํŠธ์›Œํฌ์— ๋ฐํ•˜์—ฌ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

LeNet-5

LeNet-5 defined the basic CNN architecture and is regarded as a foundation of modern deep learning.
It was developed mainly to solve handwritten digit recognition on the MNIST dataset, and it established the template that modern CNNs still follow.

  • LeNet-5๋Š” ์ด 7๊ฐœ์˜ ๋ ˆ์ด์–ด(์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ ํฌํ•จ)๋กœ ๊ตฌ์„ฑ๋œ ๋„คํŠธ์›Œํฌ์ž…๋‹ˆ๋‹ค.
  • LeNet-5์˜ ๊ตฌ์กฐ๋Š” ํฌ๊ฒŒ ๋‘ ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    • Convolutional Neural Network (CNN)
    • Fully Connected Network (FCN)
  • ๊ฐ ๋ ˆ์ด์–ด๋Š” ํŠน์ •ํ•œ ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•˜๋ฉฐ, Convolutional Neural Network(CNN - ํ•ฉ์„ฑ๊ณฑ ๋ ˆ์ด์–ด)์™€ ์„œ๋ธŒ์ƒ˜ํ”Œ๋ง ๋ ˆ์ด์–ด(Pooling Layer)๋ฅผ ๊ต๋Œ€๋กœ ๋ฐฐ์น˜ํ•˜์—ฌ ์ด๋ฏธ์ง€์˜ ํŠน์ง•์„ ์ถ”์ถœํ•˜๊ณ  ์ฐจ์›์„ ์ถ•์†Œํ•ฉ๋‹ˆ๋‹ค.
  • ํ•œ๋ฒˆ LeNet 5 ๋„คํŠธ์›Œํฌ์˜ ๊ตฌ์กฐ๋ฅผ ํ•œ๋ฒˆ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

  • C1 - Convolutional Layer: applies filters to the input image to produce feature maps. These filters detect local patterns in the image.
  • S2 - Subsampling Layer: downsamples the feature maps from C1 to reduce their size, which lowers the computational cost while preserving spatial information.
  • C3 - Convolutional Layer: applies filters again to the feature maps from S2 to extract richer features.
  • S4 - Subsampling Layer: downsamples the feature maps from C3.
  • C5, F6 - Fully Connected Layers: as in a traditional neural network, every neuron is connected to every neuron in the previous layer.
  • Output Layer: classifies the input image into a class using the output of F6.

 

  • LeNet5 ๋ชจ๋ธ์˜ ํŠน์ง• 
• 7 Layer: [CONV-POOL-CONV-POOL-FC-FC-FC]
• Conv Layer: 5x5 ํ•„ํ„ฐ, ์ŠคํŠธ๋ผ์ด๋“œ 1
• Pooling Layer: 2x2 ํ‰๊ท  ํ’€๋ง, ์ŠคํŠธ๋ผ์ด๋“œ 2
• Activation Function: ์‹œ๊ทธ๋ชจ์ด๋“œ/ํƒ„ํ—ˆ (Sigmoid/tanh)
• Parameters: 60,000๊ฐœ
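The layer sizes above can be traced with the standard convolution/pooling size formula. A minimal sketch (channel counts 6, 16, 120 follow the original LeNet-5 paper):

```python
# Trace LeNet-5 feature-map sizes with the size formula
# out = (in - kernel + 2*pad) // stride + 1.

def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a conv or pooling layer."""
    return (size - kernel + 2 * pad) // stride + 1

size, shapes = 32, []                                        # 32x32 grayscale input
size = conv_out(size, 5);    shapes.append(("C1", size, 6))   # 28x28x6
size = conv_out(size, 2, 2); shapes.append(("S2", size, 6))   # 14x14x6
size = conv_out(size, 5);    shapes.append(("C3", size, 16))  # 10x10x16
size = conv_out(size, 2, 2); shapes.append(("S4", size, 16))  # 5x5x16
size = conv_out(size, 5);    shapes.append(("C5", size, 120)) # 1x1x120

for name, s, c in shapes:
    print(f"{name}: {s}x{s}x{c}")
```

Note how C5's 5x5 filter exactly covers the 5x5 map from S4, which is why C5 behaves like a fully connected layer.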

AlexNet

AlexNet์€ 2012๋…„์— ILSVRC (ImageNet Large Scale Visual Recognition Challenge)์—์„œ ์šฐ์Šนํ•œ ์œ ๋ช…ํ•œ ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง(CNN) ๋ชจ๋ธ์ž…๋‹ˆ๋‹ค.
  •  AlexNet์€ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜์˜ ์„ฑ๋Šฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œ์ผฐ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ํŠน์ง•์€ ๋„คํŠธ์›Œํฌ๋ฅผ ์ชผ๊ฐœ์„œ GPU๋กœ ๋ณ‘๋ ฌ์ฒ˜๋ฆฌ๋ฅผ ํ–ˆ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค. ์ฆ‰, ๋ชจ๋ธ์„ ๋ฐ˜์œผ๋กœ ์ชผ๊ฐœ์„œ ๋„ฃ๋Š”๋‹ค๋Š” ๊ฐœ๋…์ž…๋‹ˆ๋‹ค.
  • ReLU Activation Function (ReLU ํ™œ์„ฑํ™” ํ•จ์ˆ˜): AlexNet์—์„œ๋Š” ์Œ์ˆ˜ ๊ฐ’์— ๋Œ€ํ•œ ํ™œ์„ฑํ™”๋ฅผ ๊ฐ์†Œ์‹œํ‚ค๊ณ  Non-Linear(๋น„์„ ํ˜•์„ฑ)์„ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์ตœ์†Œ๋กœ ReLU(Rectified Linear Unit) Activation Function(ํ™œ์„ฑํ™” ํ•จ์ˆ˜)๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
    • ์ด๋Š” ๋„คํŠธ์›Œํฌ์˜ ํ•™์Šต ์†๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๊ณ , Gradient ์†Œ์‹ค ๋ฌธ์ œ๋ฅผ ์™„ํ™”์‹œํ‚ค๋Š” ํšจ๊ณผ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค.
  • Dropout: AlexNet์€ ๋“œ๋กญ์•„์›ƒ(dropout)์„ 0.5 ์ •๋„๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์‚ฌ์šฉํ•˜์—ฌ ๊ณผ์ ํ•ฉ(overfitting)์„ ์ค„์ด๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฐ์ดํ„ฐ ์ •๊ทœํ™” (Normalization): AlexNet์—์„œ๋Š” Local Response Normalization (LRN)์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์ •๊ทœํ™”๋ฅผ ํ•ฉ๋‹ˆ๋‹ค.
  • ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• (Data Augmentation): AlexNet์—์„œ๋Š” ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ณผ์ ํ•ฉ(overfitting)์„ ์ค„์ด๊ณ  ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค.
• 8 layers: [CONV-RELU-POOL-CONV-RELU-POOL-CONV-RELU-CONV-RELU-CONV-RELU-POOL-FC-FC-FC]
• Conv layers: filters of various sizes (11x11, 5x5, 3x3), stride 4 (first layer) or 1
• Pooling layers: 3x3 max pooling, stride 2
• Activation function: ReLU (Rectified Linear Unit)
• Parameters: about 60 million
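As a quick sanity check on these specs, here is a sketch of the first conv layer's output size and parameter count, assuming the commonly cited 227x227x3 input with 96 filters of 11x11 at stride 4:

```python
# First AlexNet conv layer: output size and parameter count.
# (227x227 input and 96 filters are the commonly cited configuration.)

def conv_out(size, kernel, stride=1, pad=0):
    return (size - kernel + 2 * pad) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * in_ch + 1) * out_ch

print(conv_out(227, 11, 4))    # 55  -> the CONV1 feature map is 55x55x96
print(conv_params(3, 96, 11))  # 34944 parameters in CONV1
```

Even so, most of AlexNet's ~60M parameters live in the three FC layers at the end, not in the conv layers.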

ZFNet

AlexNet์˜ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜๊ณ  CNN์˜ ๋‚ด๋ถ€ ์ž‘๋™ ๋ฐฉ์‹์„ ์‹œ๊ฐํ™”ํ•˜๋Š” ๋ฐ ์ค‘์ ์„ ๋‘์—ˆ์Šต๋‹ˆ๋‹ค. 

  • Its architecture is identical to AlexNet's, but the hyperparameters were tuned to reduce the error rate.
    • Smaller convolution filters: ZFNet shrank AlexNet's first convolution filters so that finer-grained features could be extracted.
    • CONV1: changed from 11x11 with stride 4 to 7x7 with stride 2.
    • Wider convolutional layers: ZFNet widened the middle convolutional layers, letting them learn more abstract features and improving the model's performance.
    • CONV3, 4, 5: the 384, 384, 256 filters were increased to 512, 1024, 512.
    • Visualization: ZFNet also visualized the inner workings of the CNN, helping researchers understand how the model learns.
      • These visualizations show which features are being learned at each layer.
    • Dropout: dropout is used in the fully connected layers to prevent overfitting.
• 8 layers: [CONV-RELU-POOL-CONV-RELU-POOL-CONV-RELU-CONV-RELU-CONV-RELU-POOL-FC-FC-FC]
• Conv layers: filters of various sizes (7x7, 5x5, 3x3), stride 2 or 1
• Pooling layers: 3x3 max pooling, stride 2
• Activation function: ReLU (Rectified Linear Unit)
• Parameters: about 60 million
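The effect of the CONV1 change can be sketched numerically: a smaller filter with a smaller stride samples the input far more densely. The 227-pixel input width below is assumed purely to keep the arithmetic integral, not taken from the ZFNet paper:

```python
# Compare how densely AlexNet's and ZFNet's first conv layers sample
# the input (output positions along one spatial axis).

def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

alexnet_conv1 = conv_out(227, 11, 4)  # 11x11, stride 4
zfnet_conv1 = conv_out(227, 7, 2)     # 7x7, stride 2
print(alexnet_conv1, zfnet_conv1)     # 55 111
```

Roughly twice as many positions per axis means the first layer discards much less spatial detail, which is what ZFNet's visualizations suggested AlexNet was losing.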

VGGNet

VGGNet is a convolutional neural network developed by the Visual Geometry Group (VGG) in 2014; it became widely known for its strong performance in the ILSVRC 2014 competition.
  • Its main characteristics are a simple, uniform structure and improved performance obtained by increasing network depth.

  • VGGNet์€ ์ด์ „์˜ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ๋ณด๋‹ค ํ›จ์”ฌ ๊นŠ์€ ๋„คํŠธ์›Œํฌ๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ์œ„์—์„œ ์„ค๋ช… ๋“œ๋ ธ์ง€๋งŒ, VGGNet์€ 16๊ฐœ ๋˜๋Š” 19๊ฐœ์˜ ์ธต์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋ชจ๋“  Convolution Matrix Layer(ํ•ฉ์„ฑ๊ณฑ์ธต)์˜ ํ•„ํ„ฐ ํฌ๊ธฐ๊ฐ€ 3x3: VGGNet์€ ๋ชจ๋“  ํ•ฉ์„ฑ๊ณฑ์ธต์— 3x3 ํฌ๊ธฐ์˜ ์ž‘์€ ํ•„ํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€๋ฅผ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
  • ํ’€๋ง์ธต (Pooling Layers): VGGNet์€ 2x2 Max ํ’€๋ง์ธต(max-pooling layers)์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ณต๊ฐ„์ ์ธ ๋ถˆ๋ณ€์„ฑ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.

VGGNet Network์˜ ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.

  • VGGNet์ด 32x32 ์˜์ƒ์„ ์ฒ˜๋ฆฌํ•˜๊ณ , 1000๊ฐœ์˜ class๊ฐ€ ๋˜๋„๋ก ์กฐ์ ˆํ•ฉ๋‹ˆ๋‹ค.
  • ๋‚˜๋จธ์ง€๋Š” ๊ธฐ๋ณธ์˜ Neural Network๊ณผ ๊ฐ™์€ ๊ตฌ์กฐ๋กœ ์œ ์ง€ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด 16 or 19๊ฐœ์˜ ๋‹จ์œผ๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.

 

Receptive Field of 3 Stacked Convolution Filters (1D)

VGGNet์€ Receptive Field๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค. ํ•œ๋ฒˆ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  • The receptive field is the region of the input space that a particular neuron in a neural network "sees".
    • In other words, as consecutive convolutional layers are stacked in a CNN, each neuron comes to see a wider region of the input image.
    • Stacking three convolution filters (1D) means the receptive field accumulates across those layers.
  • For a 1-dimensional convolution filter, the receptive field indicates which range of the input a given neuron covers. It depends on the filter size and the depth of the network.

 

  • In the first convolutional layer, each neuron has a receptive field over a small region of the input image, defined by the 3x3 filter.
  • In the second convolutional layer, each neuron sees the receptive fields of neurons in the previous layer, as determined by the previous layer's filter size and the convolution operation.
    • For example, if the second layer also uses a 3x3 filter, each of its neurons covers a 3x3 region of the previous layer, which corresponds to a 5x5 region of the original input.
  • Repeating this process, the deeper the network gets, the wider the region of the input image each neuron's receptive field covers.

  • Receptive Field๋Š” ์ž‘์€ filter๋กœ ์—ฌ๋Ÿฌ๊ฐœ์˜ Layer๊ฐ€ ์Œ“๋Š” ๋ฐฉ์‹์œผ๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค.
    • 5x5 ์™€ (3x3)x2 ํ•„ํ„ฐ๋Š” ๊ฐ™์€ ์˜์—ญ(Receptive Field)์„ ์ฒ˜๋ฆฌํ•ฉ๋‹ˆ๋‹ค.
    • ์ธต์ด ๊นŠ์–ด์ ธ ReLU ๊ฐ™์€ Activation(๋น„์„ ํ˜•์„ฑ )์ด ์ถ”๊ฐ€๋ฉ๋‹ˆ๋‹ค.

  • ์œ„์˜ ๊ทธ๋ฆผ, ์ฆ‰ ๋ชจ๋ธ์˜ Memory ์‚ฌ์šฉ๊ฐ’์„ ๋ณด์‹œ๋ฉด, ์ดˆ๊ธฐ Convolution Layer์—์„œ Memory ์‚ฌ์šฉ์ด ์ง‘์ค‘ ๋œ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋˜ํ•œ ๋งˆ์ง€๋ง‰ Fully-Connected Layer์— Parameter ์‚ฌ์šฉ์ด ์ง‘์ค‘๋ฉ๋‹ˆ๋‹ค.

In short, VGG19 uses more memory than VGG16, but its performance is slightly better.

• 16 or 19 layers: [CONV-RELU-CONV-RELU-POOL-CONV-RELU-CONV-RELU-POOL-CONV-RELU-CONV-RELU-CONV-RELU-POOL-CONV-RELU-CONV-RELU-CONV-RELU-POOL-CONV-RELU-CONV-RELU-CONV-RELU-POOL-FC-FC-FC]
• Conv layers: 3x3 filters, stride 1, padding 1
• Pooling layers: 2x2 max pooling, stride 2
• Activation function: ReLU (Rectified Linear Unit)
• Parameters: about 138 million

GoogLeNet

GoogLeNet is the deep learning model that won the 2014 ILSVRC (ImageNet Large Scale Visual Recognition Challenge); it was developed at Google.

• 22 layers: the network is 22 layers deep.
• Inception module: a structure combining filters of multiple sizes (1x1, 3x3, 5x5) with a pooling layer.
• Global average pooling: used at the end instead of fully connected (FC) layers, reducing the parameter count.
• Auxiliary classifiers: extra classifiers attached to intermediate outputs to stabilize training and mitigate the vanishing-gradient problem.
• Parameters: about 5 million
  • The key idea is the Inception module, which expands the network's depth and width in a balanced way.
  • An Inception module performs convolutions of several sizes and a max-pooling operation in parallel within a single layer, then concatenates the results.
  • Rather than simply stacking more layers, GoogLeNet was designed so that the Inception modules expand depth and width in balance, keeping training efficient.
  • It also introduced 1x1 convolutions for dimensionality reduction.
    • This reduces the amount of computation before the larger convolutions.
  • Finally, the fully connected (FC) layers were removed; instead, global average pooling is applied to the output of the last convolutional layer.
    • This greatly reduces the number of parameters in the network and helps prevent overfitting.

 

Inception Module

์ธ์…‰์…˜ ๋ชจ๋“ˆ์€ ์—ฌ๋Ÿฌ ํฌ๊ธฐ์˜ Convolution ์—ฐ์‚ฐ๊ณผ Max pooling ์—ฐ์‚ฐ์„ ํ•œ ๋ ˆ์ด์–ด์—์„œ ๋ณ‘๋ ฌ๋กœ ์ˆ˜ํ–‰ํ•˜๊ณ , ๊ทธ ๊ฒฐ๊ณผ๋ฅผ ํ•ฉ์น˜๋Š” ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.
Inception Module์˜ ๊ตฌ์„ฑ์š”์†Œ -> ๋ชจ๋“ˆ์˜ ์ถœ๋ ฅ์€ ์ฑ„๋„ ๋ฐฉํ–ฅ์œผ๋กœ ํ•ฉ์ณ์ ธ ๋‹ค์Œ ๋ ˆ์ด์–ด๋กœ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค.
• 1x1 Convolution: ์ฐจ์› ์ถ•์†Œ ๋ฐ ๋น„์„ ํ˜•์„ฑ ์ถ”๊ฐ€
• 3x3 Convolution: ์ค‘๊ฐ„ ๊ทœ๋ชจ์˜ ํŠน์ง• ์ถ”์ถœ
• 5x5 Convolution: ํฐ ๊ทœ๋ชจ์˜ ํŠน์ง• ์ถ”์ถœ
• 3x3 Max Pooling: ๊ณต๊ฐ„์ ์ธ ์ถ•์†Œ์™€ ํ•จ๊ป˜ ๊ฐ•ํ•œ ํŠน์ง• ๊ฐ•์กฐ

  • However, this naive module is computationally expensive. Why? Let's find out by looking at the figure below.

  • ๊ณ„์‚ฐ์–‘์ด ๋งค์šฐ ๋งŽ๊ณ , Feature map depth๊ฐ€ ์ ์  ์ค‘๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  Pooling Layer๊ฐ€ feature depth๋ฅผ ์œ ์ง€ํ•˜๊ธฐ ๋•Œ๋ฌธ์— filter concatํ›„ depth๊ฐ€ ์ ์ • ์ฆ๊ฐ€ํ•˜๊ฒŒ ๋œ๋‹ค๋Š” ๋ฌธ์ œ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋ž˜์„œ feature depth๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ "bottleneck" layer๋ฅผ ์‚ฌ์šฉํ•ด์„œ feature depth๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

 

Bottleneck Layer

๊ณ ์ฐจ์› ๊ณต๊ฐ„์˜ ํŠน์ง•์„ ์ €์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ์••์ถ•ํ•œ ๋‹ค์Œ ๋‹ค์‹œ ๊ณ ์ฐจ์› ๊ณต๊ฐ„์œผ๋กœ ํ™•์žฅํ•˜๋Š” ๊ณผ์ •์„ ํ†ตํ•ด ๊ณ„์‚ฐ์˜ ํšจ์œจ์  ์ฒ˜๋ฆฌ๋ฅผ ํ• ์ˆ˜ ์žˆ๊ฒŒ ํ•˜๋Š” Layer ์ž…๋‹ˆ๋‹ค.

  • Bottleneck Layer์€ 3๊ฐœ์˜ ์—ฐ์†๋œ Layer๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.
1. 1x1 Convolution: ์ฐจ์› ์ถ•์†Œ(Dimensionality Reduction)
2. 3x3 ๋˜๋Š” 5x5 Convolution: ํŠน์ง• ์ถ”์ถœ
3. 1x1 Convolution: ์ฐจ์› ํ™•์žฅ(Dimensionality Expansion)
  • ๊ทธ๋ฆฌ๊ณ  Bottleneck Layer์€ Dimensionality Reduction(์ฐจ์› ์ถ•์†Œ)๋ฅผ ํ•˜๋ฉด์„œ ๊ฐ Channel์˜ Weight(๊ฐ€์ค‘์น˜)๋ฅผ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
  • ์—ฌ๊ธฐ์„œ ์ด์ œ Activation map์˜ depth(๊นŠ์ด)๋ฅผ ์ค„์—ฌ์„œ ๊ณ„์‚ฐ์˜ ํšจ์œจ์„ฑ์„ ๋†’์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
* Activation map (Feature map): Convolutional Neural Network, CNN ์—์„œ ๊ฐ ํ•„ํ„ฐ๊ฐ€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•ด ์ƒ์„ฑํ•œ ์ถœ๋ ฅ
  • ๊ทธ๋Ÿฌ๋ฉด ํ•œ๋ฒˆ ์ด์ œ Inception Module์— 'bottleneck' Layer๋ฅผ ์ ์šฉ์‹œํ‚จ ๊ตฌ์กฐ๋ฅผ ํ•œ๋ฒˆ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  • Pooling Layer ๋‹ค์Œ Bottleneck์„ ํ†ตํ•ด depth๋ฅผ ์ค„์˜€๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  Naive Inception Module์˜ Parameter ๋ณด๋‹ค ์ ˆ๋ฐ˜ ์ดํ•˜๋กœ ์ค„์˜€๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค.
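The savings are easy to verify with multiply counts. The 28x28 map size and the 256 → 64 → 32 channel numbers below are illustrative choices, not figures from the GoogLeNet paper:

```python
# Multiply counts for a 5x5 conv branch with and without a 1x1
# bottleneck in front of it.

def conv_ops(out_size, kernel, in_ch, out_ch):
    """Multiplies for a conv producing an out_size x out_size map."""
    return out_size * out_size * kernel * kernel * in_ch * out_ch

naive = conv_ops(28, 5, 256, 32)                            # direct 5x5 conv
bottleneck = conv_ops(28, 1, 256, 64) + conv_ops(28, 5, 64, 32)
print(naive, bottleneck)   # 160563200 52985856
```

The 1x1 reduction makes the expensive 5x5 conv operate on 64 channels instead of 256, cutting the work by roughly 3x in this example.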

 

Full GoogLeNet Architecture

  • GoogLeNet์€ ํ•˜์œ„ ๊ณ„์ธต์— Gradient๋ฅผ ์›ํ™œํžˆ ๊ณต๊ธ‰ํ•˜๊ฒŒ ์œ„ํ•ด์„œ ๋ณด์กฐ Classification ์ถœ๋ ฅ์„ ๋‘์—ˆ์Šต๋‹ˆ๋‹ค.
    • AVGPool - 1x1 Conv Layer - FC - FC - Softmax(activation) [๋ณด์กฐ์ถœ๋ ฅ]
    • ์ „์ฒด 22 ๊ณ„์ธต → Inception Module 9๊ฐœ + Conv Layer 4๊ฐœ
    • Parallel Layer๋Š” 1๊ฐœ๋กœ ๊ณ„์‚ฐ → Inception ๋ชจ๋“ˆ ๋‚ด๋ถ€์˜ ๋ณ‘๋ ฌ ์—ฐ์‚ฐ ๊ตฌ์กฐ๋ฅผ ๋‹จ์ผ ๊ณ„์ธต์œผ๋กœ ๊ฐ„์ฃผํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.
  • Inception Module ๋ณ„๋กœ 2-Layer๋กœ ๊ณ„์‚ฐ → ๊ณ„์‚ฐ์ƒ 2๊ฐœ์˜ ๊ณ„์ธต์œผ๋กœ ๊ฐ„์ฃผํ•œ๋‹ค๋Š” ์˜๋ฏธ์ž…๋‹ˆ๋‹ค.
    • ๋˜ํ•œ Gradient Loss ๋ฌธ์ œ๋ž‘ ํ•™์Šต์„ ์•ˆ์ •ํ™” ํ•˜๊ธฐ ์œ„ํ•˜์—ฌ ์ค‘๊ฐ„ Layer์— Auxiliary Classifiers๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทผ๋ฐ, ์—ฌ๊ธฐ์„œ Auxiliary Classifiers๊ฐ€ ๋ญ˜๊นŒ์š”?

 

Auxiliary Classifiers

GoogLeNet(๋˜๋Š” Inception V1)์—์„œ ๋„์ž…๋œ ๊ธฐ๋ฒ•์œผ๋กœ, ๋„คํŠธ์›Œํฌ์˜ ์ค‘๊ฐ„ ๋‹จ๊ณ„์—์„œ ์ถ”๊ฐ€์ ์ธ ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์Šต์„ ๋•๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค.
๋˜ํ•œ Gradient Loss ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•˜๊ณ , ๋„คํŠธ์›Œํฌ๊ฐ€ ๋” ๋‚˜์€ ํŠน์„ฑ์„ ํ•™์Šตํ•˜๋„๋ก ์œ ๋„ํ•˜๋ฉฐ, ํ•™์Šต์„ ์•ˆ์ •ํ™”์‹œํ‚ค๋Š” ๋ฐ ๊ธฐ์—ฌํ•ฉ๋‹ˆ๋‹ค.

๋…ธ๋ž€์ƒ‰ ์ ์„  ๋ฐ•์Šค์•ˆ์— ์žˆ๋Š” ๊ฒƒ์ด Auxiliary Classifier ์ž…๋‹ˆ๋‹ค. ์•ž์—๊ฐ€ 4a, ๋’ค์—๊ฐ€ 4d

  • GoogLeNet์—์„œ๋Š” Inception ๋ชจ๋“ˆ ๊ทธ๋ฃน ๋’ค์— ๋ฐฐ์น˜๋ฉ๋‹ˆ๋‹ค.
  • ๋ณดํ†ต ๋‘ ๊ฐœ์˜ Auxiliary Classifiers๊ฐ€ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค. ํ•˜๋‚˜๋Š” Inception ๋ชจ๋“ˆ(4a) ๋’ค์—, ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” Inception ๋ชจ๋“ˆ(4d) ๋’ค์— ์œ„์น˜ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์ค‘๊ฐ„ Layer์—์„œ Gradient๋ฅผ ์ƒ์„ฑํ•˜์—ฌ ๋’ค๋กœ ์ „๋‹ฌํ•จ์œผ๋กœ์จ, ๊นŠ์€ ๋„คํŠธ์›Œํฌ์—์„œ ๋ฐœ์ƒํ•˜๋Š” Gradient Loss ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•ฉ๋‹ˆ๋‹ค.
  • Auxiliary Classifiers๋Š” ์ฃผ๋กœ ์•„๋ž˜์™€ ๊ฐ™์ด Layer๊ฐ€ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.
• Average Pooling Layer: 5x5 ํฌ๊ธฐ, stride 3
• Convolution Layer: 1x1 filter, ์ถœ๋ ฅ channel ์ˆ˜๋Š” 128
• Fully Connected Layer (FC- ์™„์ „ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด): 1024๊ฐœ Neuron
• Dropout Layer: Dropout ํ™•๋ฅ  0.7
• Output Layer: 1000๊ฐœ ํด๋ž˜์Šค์— ๋Œ€ํ•œ Softmax(์†Œํ”„ํŠธ๋งฅ์Šค) Activation Function
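The shapes through this head can be traced the same way as before. The 14x14x512 input below is an assumption for the feature map feeding classifier 4a:

```python
# Trace shapes through the auxiliary-classifier head:
# 5x5 avg pool (stride 3) -> 1x1 conv to 128 channels -> flatten -> FC.

def pool_out(size, kernel, stride):
    return (size - kernel) // stride + 1

size = pool_out(14, 5, 3)   # 14x14 map -> 4x4 after the avg pool
flat = size * size * 128    # flattened size after the 1x1 conv to 128 ch
print(size, flat)           # 4 2048 -> feeds the 1024-neuron FC layer
```

Note the deliberately aggressive pooling: the auxiliary head is meant to be cheap, since its predictions are discarded at inference time.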

ResNet

ResNet์€ 2015๋…„์— Microsoft Research์—์„œ ๊ฐœ๋ฐœ๋œ CNN(Convolutional Neural Network) ๊ตฌ์กฐ์ž…๋‹ˆ๋‹ค.
  • ResNet์€ ๊ทธ ์ด์ „๊นŒ์ง€์˜ ๋ฌธ์ œ์˜€๋˜ ๊นŠ์€ ๋„คํŠธ์›Œํฌ๋ฅผ ํ›ˆ๋ จ์‹œํ‚ฌ ๋•Œ ๋ฐœ์ƒํ•˜๋Š” ์†Œ์‹ค๋œ(gradients vanishing) ๋˜๋Š” ํญ๋ฐœํ•˜๋Š” (gradients exploding) Gradient Saturation(๊ทธ๋ž˜๋””์–ธํŠธ ๋ฌธ์ œ)๋ฅผ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๊ณ ์•ˆ๋˜์—ˆ์Šต๋‹ˆ๋‹ค.

• 1x1 convolution: dimensionality reduction and added non-linearity
• 3x3 convolution: medium-scale feature extraction
• 1x1 convolution: dimensionality expansion and added non-linearity
• Batch normalization: reduces internal covariate shift
• ReLU activation: adds non-linearity
• Identity shortcut connection: mitigates the vanishing-gradient problem
  • Let's go through the main features.
  • Residual block: a residual block adds the input to the output through a direct (identity shortcut) connection. This lets gradients propagate without vanishing, so even very deep networks can be trained effectively.
  • Identity shortcut connection: adding the input to the output of a residual block reduces what the model has to learn and preserves the original input, improving stability. This keeps training feasible even as the network gets deeper.
  • So, what exactly is a "residual block"?

Residual Learning

ResNet์˜ ํ•ต์‹ฌ ์•„์ด๋””์–ด๋Š” "์ž”์ฐจ ํ•™์Šต(Residual Learning)” ์ž…๋‹ˆ๋‹ค.

  • ์ž”์ฐจ ๋ธ”๋ก์€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์ง์ ‘์ ์œผ๋กœ ๋‹ค์Œ ์ธต์— ์ „๋‹ฌํ•˜๋Š” ์Šคํ‚ต ์—ฐ๊ฒฐ(skip connection) ๋˜๋Š” ๋‹จ์ถ• ๊ฒฝ๋กœ(shortcut connection)๋ฅผ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค.

  • ์ผ๋ฐ˜์ ์ธ ๊ฒฝ๋กœ๋กœ ์ง„ํ–‰์„ ํ•˜๋Š” ๋ฐฉ๋ฒ•๋„ ์žˆ์ง€๋งŒ Layer๋ฅผ ๊ฑด๋„ˆ ๋›ฐ์–ด๋„˜๋Š”(์šฐํšŒํ•˜๋Š”)์—ฐ๊ฒฐ์„ ์ถ”๊ฐ€ํ•˜์—ฌ ReLU(Activation Function)์œผ๋กœ ์ด๋™ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋Ÿฌ๋ฉด Backpropagation(์—ญ์ „ํŒŒ)์‹œ Graident(๊ธฐ์šธ๊ธฐ)๊ฐ€ ์†Œ์‹ค๋˜์ง€ ์•Š๊ณ  ๊ทธ๋Œ€๋กœ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค.
๊ฐ ์ž”์ฐจ ๋ธ”๋ก์€ ์ž…๋ ฅ์„ ์ถœ๋ ฅ์— ์ง์ ‘ ๋”ํ•˜๋Š” skip connection์„ ํฌํ•จํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ๋„คํŠธ์›Œํฌ๊ฐ€ ํ•™์Šตํ•ด์•ผ ํ•˜๋Š” ๊ฒƒ์ด ์ž…๋ ฅ๊ณผ ์ถœ๋ ฅ์˜ ์ฐจ์ด, ์ฆ‰ "์ž”์ฐจ"์ž„์„ ์˜๋ฏธํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์€ ๋„คํŠธ์›Œํฌ๊ฐ€ ๋” ๊นŠ์–ด์ ธ๋„ ์•ˆ์ •์ ์œผ๋กœ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๊ฒŒ ๋•์Šต๋‹ˆ๋‹ค.
  • By carrying the input data directly to deeper layers, these skip connections help gradients propagate effectively even in deep networks.
  • The residual blocks used in residual learning also let each layer fall back on an "identity mapping" that passes its input straight through to its output.
  • This resolves the vanishing-gradient problem even in very deep networks.
  • A residual block keeps the input data intact and learns the difference between input and output.
  • This difference is called the residual function.

  • In addition, the residual layers periodically double the number of filters and use stride 2 to downsample.
  • At the end, the fully connected (FC) layers of the classifier were removed.
    • Only a single 1,000-way fully connected layer remains for the class output.
    • Global average pooling is used in their place.