[CV] Understanding Object Detection

Intro to Object Detection

Object Detection has advanced on the back of Deep Learning.
  • Object detection is one of the key research topics in computer vision: it is the technique of locating objects in an image or video and identifying what each object is.
  • It is used across a wide range of applications, including security systems, autonomous vehicles, face recognition, and image search engines.

Graph of average deep learning model performance in the Pascal VOC competition

  • In the PASCAL VOC competition, the performance metric rises sharply once convnets come into use, compared with the results before them.

Localization, Detection, Segmentation

What Localization, Detection, and Segmentation have in common is that all three find where an Object is.

  • Localization: finds the position of a single Object and marks it with a Bounding box.
  • Detection: finds the positions of multiple Objects and marks each with a Bounding box.
  • Segmentation: a more advanced form of Detection that performs Detection at the Pixel level.
    • As a result, it is more precise than Detection.
  • Localization / Detection finds the position of an Object with a Bounding box and then determines which Object is inside that Bounding Box.
  • Localization / Detection therefore combines two problems: Bounding box regression (predicting the coordinate values of the box) and Classification.
  • Unlike Localization, Detection has to find two or more Objects at arbitrary positions in the image, so it runs into a number of harder problems than Localization does.

Main components of Object Detection

  • Region Proposal
    • Region proposal is the process of identifying the regions of an image that are likely to contain an object. It is the first stage of an object detection algorithm: it evaluates how likely an object is to be present at each position in the image and proposes candidate regions where an object appears to be.
  • Deep Learning Network for Detection
    • Deep learning networks for object detection are mostly based on Convolutional Neural Networks (CNN). These networks extract features from the image and use those features to predict the position and class of objects at the same time.
  • Other elements of Detection
    • IOU (Intersection Over Union): IOU is the area where two regions overlap divided by the area of their union.
      • It is used to measure how well a predicted bounding box matches the ground truth bounding box. The higher the IOU, the more accurate the predicted bounding box.
    • NMS (Non-Maximum Suppression): NMS selects the bounding box with the highest probability among many bounding boxes and removes the remaining boxes that overlap heavily with it.
      • This keeps several bounding boxes from being reported for a single object.
    • mAP (mean Average Precision): mAP is a metric for evaluating the accuracy of an object detection model.
      • To measure how well the model performs across classes, the Average Precision (AP) is computed for each class and the results are averaged.
    • Anchor Box: anchor boxes are reference boxes with a variety of aspect ratios and sizes, used to predict potential object positions in the image.
      • They are used when a network such as RPN proposes candidate regions, and they help detect objects of varied shapes and sizes.
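
The IOU and NMS definitions above can be sketched in a few lines of plain Python. This is a minimal illustration rather than any particular library's implementation; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` corner format, and the `iou_threshold=0.5` default are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    # Indices sorted by score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Retain only boxes whose overlap with the chosen box is small enough.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate boxes around one object plus one separate box.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]: the second box overlaps the first (IOU about 0.82) and is removed
```

Raising `iou_threshold` makes the filter more permissive, which can help when objects genuinely overlap; lowering it removes duplicates more aggressively.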

Main Backend CNN Classification networks

The CNN architecture used as the backend of an object detection model plays an important role in the feature extraction part.
    • The networks most commonly used here are ResNet, Inception, and MobileNet.
    • Of these, the general-purpose choice is ResNet (Residual Networks).
      • ResNet introduces residual learning, a technique for training deep neural networks, to mitigate the vanishing gradient problem that can occur in deep networks.
    • Inception and MobileNet are the ones mainly used with the Tensorflow Object Detection API.
      • Inception: the Inception network applies filters of different sizes through a structure of parallel convolution layers. This lets it extract features at multiple scales, which is useful for tasks such as object detection where objects of various sizes must be recognized.
      • MobileNet: MobileNet is a lightweight deep learning model designed for environments with limited compute resources, such as mobile or embedded systems. It uses depthwise separable convolution to greatly reduce model size and computation while keeping accuracy competitive.
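
To make the savings from depthwise separable convolution concrete, the arithmetic below compares weight counts for a single layer. It is a rough sketch that ignores biases; the helper names and the layer sizes (3x3 kernel, 64 input channels, 128 output channels) are assumptions chosen for illustration, not values from any specific MobileNet configuration.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard convolution: one k x k x c_in filter per output channel."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise (k x k per input channel) plus pointwise (1 x 1) convolution."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


std = standard_conv_params(3, 64, 128)        # 3 * 3 * 64 * 128 = 73728
sep = depthwise_separable_params(3, 64, 128)  # 3 * 3 * 64 + 64 * 128 = 8768
print(std, sep, round(std / sep, 2))          # 73728 8768 8.41
```

For this hypothetical layer the separable version needs roughly 8 times fewer weights, which is the kind of reduction that makes the architecture attractive on constrained devices.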

Challenges in Object Detection

There are five main challenges.

  • Classification and Regression must be done at the same time: multiple objects in an image have to be classified while their positions are also found.
  • Objects of various sizes and types are mixed together. Objects that differ in size and appearance appear in the same image, and all of them must be detected.
  • Detection time matters. The demand for detection on real-time video keeps growing.
  • The object in the image is often unclear. In addition, the Object to be detected usually occupies only a small share of the whole image.
    • In many cases the background takes up most of the image.
  • Trainable datasets are scarce. Because annotations have to be created, building a training dataset is relatively difficult.

Object Localization overview

Object Localization is the process of finding the position of a specific object within an image.
    • It is one part of Object Detection: working out where in the image the object is located.
    • Unlike Object Detection, Object Localization typically finds the position of only a single main object in the image.

 

  • Here is a brief walkthrough of the main steps.
  1. Image input: the image to be analyzed is fed into the model.
  2. Feature Extractor: as the image passes through the layers of the CNN (convolutional layers, pooling layers, and so on), its important features are extracted. Early layers recognize simple features (for example edges and colors), and deeper layers recognize increasingly complex features (for example parts of an object).
  3. Feature Map: the data produced by passing through the CNN layers, holding the important features of the image. It has reduced spatial dimensions compared with the original image, or has particular features emphasized. This preserves the important information in the image while reducing the amount of data, which speeds up processing.
  4. Fully-Connected Layer (FC-Layer): takes the Feature Map as input and, based on it, decides which class the image belongs to. The Feature Map is flattened into a single vector (Flatten) and fed into the Fully Connected Layer, which uses its learned weights to compute a score for each class.
  5. Softmax Class Score: the scores computed by the Fully Connected Layer (FC Layer) are converted into the probability of belonging to each class. The Softmax function is used as the Activation Function here, so that the probabilities over all classes sum to 1.
  • In Object Localization, the features extracted by the Feature Extractor are also used to predict the Bounding Box coordinates (x1, y1, x2, y2) that indicate where the object is in the image.
    • Here (x1, y1) is the coordinate of the top-left corner of the Bounding Box, and (x2, y2) is the coordinate of the bottom-right corner.
  • Based on the features from the Feature Extractor, a region that is likely to contain the object (Region of Interest, ROI) is proposed.
    • A characteristic of Object Localization is that it works on the entire image.
  • Finally, the Bounding Box coordinates (x1, y1, x2, y2) refined through Bounding Box Regression are output as the final result.
  • These coordinates indicate the position of the object within the image.
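
The flow described above (Feature Map, Flatten, FC Layer, Softmax, plus a Bounding Box output) can be sketched with NumPy. This is only an illustration under stated assumptions: the 7x7x16 feature map, the three classes, and the randomly initialized weights `w_cls` and `w_box` are all hypothetical stand-ins, so the outputs carry no meaning beyond their shapes.

```python
import numpy as np


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


rng = np.random.default_rng(0)

# Hypothetical feature map, standing in for the output of a CNN feature extractor.
feature_map = rng.normal(size=(7, 7, 16))
flat = feature_map.reshape(-1)  # Flatten: 7 * 7 * 16 = 784 values

num_classes = 3
# Untrained, randomly initialized FC weights (assumption for the sketch).
w_cls = rng.normal(scale=0.01, size=(flat.size, num_classes))
w_box = rng.normal(scale=0.01, size=(flat.size, 4))

class_probs = softmax(flat @ w_cls)  # classification head
box = flat @ w_box                   # regression head: (x1, y1, x2, y2)

print(class_probs.sum())  # approximately 1.0
print(box.shape)          # (4,)
```

Both heads read the same flattened features, which mirrors the point that localization combines Classification with Bounding box regression.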

Object Localization - Bounding Box training

  • When a Bounding Box is learned in Object Localization, "updating the Weights" means that the Neural Network adjusts its internal parameters (the Weights) so that it predicts the position of the object in the input image more accurately.
  • By repeating this process many times, the network gradually outputs Bounding Boxes that match the actual object position more closely.
  1. Initial prediction: using its initial weights, the network predicts Bounding Box coordinates for the input image.
  2. Loss calculation: the difference between the predicted Bounding Box and the actual Bounding Box of the object (provided by the supervised training data) is computed. This difference is computed with a Loss Function; various choices are used, such as Mean Squared Error (MSE) or an Intersection over Union (IoU)-based loss.
  3. Weight update: based on the computed Loss, the Neural Network adjusts (updates) its Weights using the Backpropagation algorithm. The Weights are adjusted in the direction that minimizes the Loss.
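
The three steps above can be sketched as a tiny training loop. To stay self-contained this example assumes a single linear layer on synthetic data, so the gradient is written out by hand instead of being obtained by backpropagation through a CNN; the data, the `learning_rate` of 0.5, and the 200 iterations are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 8-dim feature vectors and target boxes (x1, y1, x2, y2).
features = rng.normal(size=(32, 8))
true_weights = rng.normal(size=(8, 4))
gt_boxes = features @ true_weights  # stand-in for annotated ground-truth boxes

weights = np.zeros((8, 4))  # initial weights
learning_rate = 0.5
losses = []

for step in range(200):
    pred_boxes = features @ weights               # 1. predict box coordinates
    error = pred_boxes - gt_boxes
    loss = np.mean(error ** 2)                    # 2. MSE loss against ground truth
    losses.append(loss)
    grad = 2.0 * features.T @ error / error.size  # gradient of the MSE w.r.t. weights
    weights -= learning_rate * grad               # 3. step in the loss-reducing direction

print(losses[0], losses[-1])  # the loss shrinks as the weights are updated
```

In a real detector the same loop applies, but the gradient flows back through every layer of the network, and the loss usually combines the box term with a classification term.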
  • Performing Object Localization this way yields a prediction result like the one shown below.

  • So what if we want to detect two or more Objects: where in the image should we look for them? That will be covered in the next post.