A A
[CV] Object Detection Network ๊ตฌ์กฐ, R-CNN ๊ฐœ์š”
์ด๋ฒˆ ๊ธ€์—์„œ๋Š” Object Detection Network ๊ตฌ์กฐ ๊ฐœ์š”, FPS, Resolution๊ณผ ์„ฑ๋Šฅ ์ƒ๊ด€ ๊ด€๊ณ„, R-CNN ์— ๋ฐํ•˜์—ฌ ์•Œ์•„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

Object Detection Network ๊ฐœ์š”

  • Object Detetction Network ๊ตฌ์กฐ๋Š” ๋‘ ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:
  • ํŠน์ง• ์ถ”์ถœ ๋„คํŠธ์›Œํฌ(Feature Extractor Network)์™€ ๊ฐ์ฒด ํƒ์ง€ ๋„คํŠธ์›Œํฌ(Object Detection Network)์ž…๋‹ˆ๋‹ค.
  • ๋‘ ๋„คํŠธ์›Œํฌ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ํŠน์ • ์ž‘์—…์— ๋งž๊ฒŒ ๋ฏธ์„ธ ์กฐ์ •๋ฉ๋‹ˆ๋‹ค.

 

Feature Extractor Network (ํŠน์ง• ์ถ”์ถœ ๋„คํŠธ์›Œํฌ)

  • ์ž…๋ ฅ ์ด๋ฏธ์ง€์—์„œ ์œ ์šฉํ•œ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • VGG, ResNet, Inception ๋“ฑ๊ณผ ๊ฐ™์€ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋ฉฐ, ๋ณดํ†ต ImageNet ๋ฐ์ดํ„ฐ์…‹์œผ๋กœ ์‚ฌ์ „ ํ•™์Šต(Pretrained)๋ฉ๋‹ˆ๋‹ค.
  • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ ˆ์ด์–ด(Layer 1, Layer 2, ..., Layer N)๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค.
  • ์ถ”์ถœ๋œ ํŠน์ง• ๋งต(feature map)์„ ์ƒ์„ฑํ•˜์—ฌ ๊ฐ์ฒด ํƒ์ง€ ๋„คํŠธ์›Œํฌ์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.

 

Object Detection Network (๊ฐ์ฒด ํƒ์ง€ ๋„คํŠธ์›Œํฌ)

  • ์ถ”์ถœ๋œ ํŠน์ง• ๋งต์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด๋ฅผ ํƒ์ง€ํ•˜๊ณ , ๊ฐ ๊ฐ์ฒด์˜ ํด๋ž˜์Šค์™€ ์œ„์น˜๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
  • ๋ณดํ†ต Pascal VOC๋‚˜ MS-COCO์™€ ๊ฐ™์€ ๋ฐ์ดํ„ฐ์…‹์„ ๊ธฐ๋ฐ˜์œผ๋กœ Pre-Train ๋ฉ๋‹ˆ๋‹ค.
    • Feature Extractor Network์˜ Feature map์œผ๋กœ ๊ธฐ๋ฐ˜.
  • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ ˆ์ด์–ด(Layer A, Layer B, Layer C)๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, ๊ฐ ๋ ˆ์ด์–ด๋Š” ํŠน์ • ์—ญํ• ์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  • ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด์˜ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์™€ ํด๋ž˜์Šค ๋ ˆ์ด๋ธ”์„ ํฌํ•จํ•˜๋Š” ๊ฒฐ๊ณผ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • ๋˜ํ•œ ๋ณ„๋„์˜ Network๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋Š” ํŠน์ง•์ด ์žˆ์Šต๋‹ˆ๋‹ค.

Image Resolution, FPS, Detection ์„ฑ๋Šฅ ์ƒ๊ด€ ๊ด€๊ณ„

  • ์—ฌ๊ธฐ์„œ ๋ด์•ผํ•˜๋Š”๊ฑด FPS์™€ Image Resoultion์˜ ๊ด€๊ณ„ ์ž…๋‹ˆ๋‹ค. ์„œ๋กœ ์ƒ๊ด€๊ด€๊ณ„ ์ž…๋‹ˆ๋‹ค.
  • Image์˜ Resoultion์ด ๋†’์„์ˆ˜๋ก? 1์ดˆ์— Object Detection ํ•  ์ˆ˜ ์žˆ๋Š” ์ด๋ฏธ์ง€์˜ ๊ฐœ์ˆ˜๊ฐ€ ๊ฐ์†Œํ•ฉ๋‹ˆ๋‹ค.

R-CNN(Regions with CNN)

์•„๋ž˜๋Š” Object Detection์˜ ๊ฐœ์š”์ž…๋‹ˆ๋‹ค. ์‚ฌ์ง„ ์•„๋ž˜ ๋งํฌ ๋‹ฌ์•„๋†“์„ํ…Œ๋‹ˆ ์ฐธ๊ณ ํ•ด์ฃผ์„ธ์š”!
 

[CV] Object Detection์˜ ์ดํ•ด

Intro Object DetectionObject Detection์€ Deep Learning(๋”ฅ๋Ÿฌ๋‹) ๊ธฐ๋ฐ˜์œผ๋กœ ๋ฐœ์ „ํ•˜์˜€์Šต๋‹ˆ๋‹ค.Object detection์€ ์ปดํ“จํ„ฐ ๋น„์ „ ๋ถ„์•ผ์—์„œ ์ค‘์š”ํ•œ ์—ฐ๊ตฌ ์ฃผ์ œ ์ค‘ ํ•˜๋‚˜๋กœ, ์ด๋ฏธ์ง€๋‚˜ ๋น„๋””์˜ค ๋‚ด์—์„œ ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ์ฐพ๊ณ , ํ•ด

daehyun-bigbread.tistory.com

Object Detection์˜ ์ง„ํ–‰ ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.

 

  • ์ด ๋‹ค์ด์–ด๊ทธ๋žจ์€ ๊ฐ์ฒด ํƒ์ง€ ๊ณผ์ •์˜ ์ฃผ์š” ๋‹จ๊ณ„๋ฅผ ์‹œ๊ฐ์ ์œผ๋กœ ์„ค๋ช…ํ•˜๋ฉฐ, ๊ฐ ๋‹จ๊ณ„๊ฐ€ ์–ด๋–ป๊ฒŒ ์—ฐ๊ฒฐ๋˜๋Š”์ง€๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.

 

์ฃผ์š” ๊ตฌ์„ฑ ์š”์†Œ ๋ฐ ๋‹จ๊ณ„

  • ์›๋ณธ ์ด๋ฏธ์ง€ (Input Image)
    • ๊ฐ์ฒด ํƒ์ง€์˜ ์‹œ์ž‘์ ์ž…๋‹ˆ๋‹ค. ์ž…๋ ฅ ์ด๋ฏธ์ง€๋Š” ๋„คํŠธ์›Œํฌ์— ์ „๋‹ฌ๋˜์–ด ๋ถ„์„๋ฉ๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ๋กœ, ๋นจ๊ฐ„์ƒ‰ ์ฐจ๊ฐ€ ํฌํ•จ๋œ ์ด๋ฏธ์ง€๊ฐ€ ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • Feature Extractor (ํŠน์ง• ์ถ”์ถœ๊ธฐ)
    • ์—ฌ๊ธฐ์„œ๋Š” VGG-16 ๋„คํŠธ์›Œํฌ๊ฐ€ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
    • ์ž…๋ ฅ ์ด๋ฏธ์ง€์—์„œ ์œ ์šฉํ•œ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค. VGG-16์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์ปจ๋ณผ๋ฃจ์…˜ ๋ ˆ์ด์–ด์™€ ํ’€๋ง ๋ ˆ์ด์–ด๋กœ ๊ตฌ์„ฑ๋˜์–ด ์žˆ์œผ๋ฉฐ, ์ด๋ฏธ์ง€์˜ ๋‹ค์–‘ํ•œ ํŒจํ„ด๊ณผ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
    • ์ถ”์ถœ๋œ ํŠน์ง• ๋งต(Feature Map)์€ ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค.
  • Feature Map (ํŠน์ง• ๋งต)
    • ํŠน์ง• ์ถ”์ถœ๊ธฐ์—์„œ ์ถ”์ถœ๋œ ๋‹ค์ฐจ์› ๋ฐ์ดํ„ฐ์ž…๋‹ˆ๋‹ค.
    • ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์ฑ„๋„๋กœ ์ด๋ฃจ์–ด์ง„ ํ…์„œ๋กœ, ๊ฐ ์ฑ„๋„์€ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ๋‹ค๋ฅธ ์ธก๋ฉด์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.
  • Fully Connected Layer - FC (์™„์ „ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด)
    • ํŠน์ง• ๋งต์„ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ ๊ฐ ํ”ฝ์…€์ด ์–ด๋–ค ํด๋ž˜์Šค์— ์†ํ•˜๋Š”์ง€ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
    • ๊ฐ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์ด๋ฏธ์ง€์˜ ๊ฐ์ฒด๊ฐ€ ์ž๋™์ฐจ์ผ ํ™•๋ฅ ์ด 0.8, ๊ณ ์–‘์ด์ผ ํ™•๋ฅ ์ด 0.1, ๊ฐœ์ผ ํ™•๋ฅ ์ด 0.1๋กœ ์ถœ๋ ฅ๋ฉ๋‹ˆ๋‹ค.
  • Softmax Class Score (ํด๋ž˜์Šค ์ ์ˆ˜)
    • ๊ฐ ํด๋ž˜์Šค์— ์†ํ•  ํ™•๋ฅ ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.
    • ์˜ˆ์‹œ: Car: 0.8, Cat: 0.1, Dog: 0.1
  • Bounding Box Regression (๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ํšŒ๊ท€)
    • ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค. ๊ฐ์ฒด๋ฅผ ํฌํ•จํ•˜๋Š” ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค๋ฅผ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
    • Bounding Box ์˜ ์ขŒํ‘œ(x1, y1, x2, y2)๋ฅผ ์˜ˆ์ธกํ•˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ๊ฐ์ฒด์˜ ์œ„์น˜์™€ ํฌ๊ธฐ๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค.
    • ๊ฐ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์ขŒํ‘œ. ์˜ˆ๋ฅผ ๋“ค์–ด, ์ž๋™์ฐจ ํด๋ž˜์Šค์— ๋Œ€ํ•œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์ขŒํ‘œ๊ฐ€ (x1, y1, x2, y2)๋กœ ์ถœ๋ ฅ๋ฉ๋‹ˆ๋‹ค.

๋ณต์Šต - Sliding Window ๋ฐฉ์‹๊ณผ Region Propsal ๋ฐฉ์‹

Sliding Window ๋ฐฉ์‹

  • R-CNN (Regions with CNN)์„ ์„ค๋ช…ํ•˜๋‹ค๊ฐ€ ๊ฐ‘์ž๊ธฐ ์™œ Sliding Window ๋ฐฉ์‹๊ณผ Region Proposal ๋ฐฉ์‹์— ๋ฐํ•œ ์„ค๋ช…์„ ๊ฐ‘์ž๊ธฐ ์™œ ๋“ค๊ณ  ์˜จ ์ด์œ ๋Š” ๋ฌด์—‡์ผ๊นŒ์š”?
  • ์ด์œ ๋Š” R-CNN์ด Region Proposal ๋ฐฉ์‹์— ๊ธฐ๋ฐ˜์„ ํ•˜๊ณ  ์žˆ๊ณ , Sliding Window ๋ฐฉ์‹์ด ๋“ค์–ด๊ฐ€ ์žˆ๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.
  • ํ•œ๋ฒˆ ๋ณต์Šตํ•˜๋Š” ๊น€์— ์•„๋ž˜์˜ ๊ฐœ๋…๋“ค์„ ํ•œ๋ฒˆ ํ›‘์–ด๋ณธ ๋‹ค์Œ, R-CNN ์— ๋ฐํ•˜์—ฌ ์„ค๋ช…ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

 

Sliding Window ๋ฐฉ์‹์€ Window๋ฅผ ์™ผ์ชฝ ์ƒ๋‹จ๋ถ€ํ„ฐ ์˜ค๋ฅธ์ชฝ ํ•˜๋‹จ์œผ๋กœ ์ด๋™์‹œํ‚ค๋ฉด์„œ Object๋ฅผ Detection ํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
์•ฝ๊ฐ„ ์ด˜์ด˜ํžˆ, ์„ธ๋ฐ€ํ•˜๊ฒŒ window๋ฅผ ์ด๋™์‹œํ‚ค๋ฉด์„œ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.

  • ์ด๋ฏธ์ง€๋ฅผ ์ž‘์€ ์˜์—ญ์œผ๋กœ ๋‚˜๋ˆ„๊ณ , ๊ฐ๊ฐ์˜ ์ž‘์€ ์˜์—ญ(์œˆ๋„์šฐ)์—์„œ ๊ฐ์ฒด๊ฐ€ ์กด์žฌํ•˜๋Š”์ง€๋ฅผ ํƒ์ง€ํ•˜๋Š” ๋ฐฉ๋ฒ•์ž…๋‹ˆ๋‹ค.
  • ์žฅ์ ์€ ๋งŽ์€ ์˜์—ญ์„ Scan ํ• ์ˆ˜ ์žˆ๋Š”๊ฒƒ, Window์˜ ํ˜•ํƒœ๋ž‘ Image Scale์„ ๋‹ค์–‘ํ•˜๊ฒŒ ๋ณ€๊ฒฝํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค.
  • ๋‹จ์ ์€ Object ์—†๋Š” ์˜์—ญ๋„ ๋ฌด์กฐ๊ฑด ์Šฌ๋ผ์ด๋”ฉ ํ•˜์—ฌ์•ผ ํ•˜๋ฉฐ ์—ฌ๋Ÿฌ ํ˜•ํƒœ์˜ Window์™€ ์—ฌ๋Ÿฌ Scale์„ ๊ฐ€์ง„ ์ด๋ฏธ์ง€๋ฅผ ์Šค์บ”ํ•ด์„œ ๊ฒ€์ถœํ•ด์•ผ ํ•˜๋ฏ€๋กœ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์ด ์˜ค๋ž˜ ๊ฑธ๋ฆฌ๊ณ  ๊ฒ€์ถœ ์„ฑ๋Šฅ์ด ์ƒ๋Œ€์ ์œผ๋กœ ๋‚ฎ์Šต๋‹ˆ๋‹ค.
  • Region Proposal(์˜์—ญ ์ถ”์ •) ๊ธฐ๋ฒ•์˜ ๋“ฑ์žฅ์œผ๋กœ ํ™œ์šฉ๋„๋Š” ๋–จ์–ด์กŒ์ง€๋งŒ Object Detection ๋ฐœ์ „์„ ์œ„ํ•œ ๊ธฐ์ˆ ์  ํ† ๋Œ€ ์ œ๊ณตํ–ˆ๋‹ค๋Š” ์ ์ด ์žˆ์Šต๋‹ˆ๋‹ค.

์ด๋ ‡๊ฒŒ Window๋ฅผ ์ด˜์ด˜, ์„ธ๋ฐ€ํ•˜๊ฒŒ ์ด๋™์‹œํ‚ค๋ฉด์„œ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ํ•ฉ๋‹ˆ๋‹ค.

Slicing Window ๋ฐฉ์‹ ๊ณผ์ •

์ง„ํ–‰ ๋ฐฉ์‹์€ ์•„๋ž˜์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค.
  1. ์œˆ๋„์šฐ ํฌ๊ธฐ ์„ค์ •: ํƒ์ง€ํ•˜๊ณ ์ž ํ•˜๋Š” ๊ฐ์ฒด์˜ ํฌ๊ธฐ์— ๋”ฐ๋ผ ์œˆ๋„์šฐ์˜ ํฌ๊ธฐ๋ฅผ ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด ์œˆ๋„์šฐ๋Š” ์ด๋ฏธ์ง€ ์ „์ฒด์— ๊ฑธ์ณ ์ด๋™ํ•  ๊ฒƒ์ž…๋‹ˆ๋‹ค.
  2. ์œˆ๋„์šฐ ์ด๋™: ์œˆ๋„์šฐ๋ฅผ ์ด๋ฏธ์ง€์˜ ์‹œ์ž‘์ ์—์„œ๋ถ€ํ„ฐ, ์ผ๋ฐ˜์ ์œผ๋กœ ์ขŒ์ธก ์ƒ๋‹จ์—์„œ ์šฐ์ธก ํ•˜๋‹จ ๋ฐฉํ–ฅ์œผ๋กœ, ์ง€์ •๋œ ์Šคํ… ํฌ๊ธฐ๋งŒํผ ์ด๋™์‹œํ‚ค๋ฉฐ ๊ฐ ์œ„์น˜์—์„œ ๊ฐ์ฒด๋ฅผ ํƒ์ง€ํ•ฉ๋‹ˆ๋‹ค.
  3. ๊ฐ์ฒด ํƒ์ง€: ๊ฐ ์œˆ๋„์šฐ ์œ„์น˜์—์„œ, ์ด๋ฏธ ์ •์˜๋œ ๊ฐ์ฒด ํƒ์ง€ ์•Œ๊ณ ๋ฆฌ์ฆ˜(์˜ˆ: Haar feature-based cascade classifiers, SVM ๋“ฑ)์„ ์‚ฌ์šฉํ•˜์—ฌ ์œˆ๋„์šฐ ๋‚ด๋ถ€์— ๊ฐ์ฒด๊ฐ€ ์žˆ๋Š”์ง€ ํŒ๋‹จํ•ฉ๋‹ˆ๋‹ค.
  4. ๊ฒฐ๊ณผ ์ฒ˜๋ฆฌ: ๊ฐ ์œˆ๋„์šฐ์—์„œ์˜ ํƒ์ง€ ๊ฒฐ๊ณผ๋ฅผ ์ข…ํ•ฉํ•˜์—ฌ, ์ตœ์ข…์ ์œผ๋กœ ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ๊ฒฐ์ •ํ•ฉ๋‹ˆ๋‹ค. ์ด๋•Œ ์—ฌ๋Ÿฌ ์œˆ๋„์šฐ์—์„œ ์ค‘๋ณต์œผ๋กœ ๊ฐ์ฒด๋ฅผ ํƒ์ง€ํ–ˆ์„ ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ, ์ค‘๋ณต ์ œ๊ฑฐ ๊ณผ์ •(non-max suppression)์„ ๊ฑฐ์ณ ์ตœ์ข… ํƒ์ง€ ๊ฒฐ๊ณผ๋ฅผ ์ •๋ฆฌํ•ฉ๋‹ˆ๋‹ค.

Region Proposal (์˜์—ญ ์ถ”์ •) ๋ฐฉ์‹

"Object๊ฐ€ ์žˆ์„ ๋งŒํ•œ ํ›„๋ณด ์˜์—ญ์„ ์ฐพ์ž" ์ด๋Ÿฌํ•œ ๊ฐœ๋…์œผ๋กœ ์˜์—ญ ์ถ”์ •์„ ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ Object Detection์„ ํ•ฉ๋‹ˆ๋‹ค.

Region Proposal (์˜์—ญ ์ถ”์ •) ๋ฐฉ์‹์˜ ์ฃผ์š” ๊ณผ์ •

  • ์ด ๋ฐฉ๋ฒ•์€ ์ด๋ฏธ์ง€ ๋‚ด์—์„œ ๊ฐ์ฒด๊ฐ€ ์กด์žฌํ•  ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ์˜์—ญ๋“ค์„ ๋จผ์ € ์‹๋ณ„ํ•˜๊ณ , ๊ทธ ํ›„์— ์‹๋ณ„๋œ ์˜์—ญ๋“ค์„ ๋Œ€์ƒ์œผ๋กœ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ฐฉ์‹์ž…๋‹ˆ๋‹ค.
  • ์ด๋Ÿฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์€ ๊ณ„์‚ฐ ๋น„์šฉ์„ ํฌ๊ฒŒ ์ค„์ผ ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์ฒ˜๋ฆฌ ์†๋„์™€ ์ •ํ™•๋„๋ฅผ ๋™์‹œ์— ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Region Propsal ๋ฐฉ์‹์— ๊ธฐ๋ฐ˜ํ•œ Object Detection - RCNN

๊ทธ๋Ÿฌ๋ฉด Sliding Window ๋ฐฉ์‹๊ณผ Region Proposal ๋ฐฉ์‹์— ๋ฐํ•œ ๊ฐœ๋…์„ ๋‹ค์‹œ ํ•œ๋ฒˆ ๋ณด์•˜์œผ๋‹ˆ๊นŒ, ๊ณ„์†ํ•ด์„œ R-CNN (Regions with CNN)์„ ์„ค๋ช…ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

  • RCNN(Regions with Convolutional Neural Networks) ๊ฐ์ฒด ํƒ์ง€ ๋ฐฉ๋ฒ•์˜ ์ฃผ์š” ๋‹จ๊ณ„๋ฅผ ์„ค๋ช…ํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • RCNN์€ ๊ฐ์ฒด ํƒ์ง€๋ฅผ ์œ„ํ•ด Region Proposal ๋ฐฉ์‹์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด๋ฅผ ์ฐพ๊ณ  ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค.

 

RCNN์˜ ๊ฐ์ฒด ํƒ์ง€ ๊ณผ์ •

  • ์ž…๋ ฅ ์ด๋ฏธ์ง€ ์ฒ˜๋ฆฌ: Selective Search๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€ ๋‚ด ๊ฐ์ฒด๊ฐ€ ์žˆ์„ ๊ฐ€๋Šฅ์„ฑ์ด ๋†’์€ ์˜์—ญ(Region Proposal)์„ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค.
  • ํŠน์ง• ์ถ”์ถœ: AlexNet์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ์ œ์•ˆ๋œ ์˜์—ญ์—์„œ ๊ณ ์ˆ˜์ค€ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค.
  • ํŠน์ง• ๋งต ์ฒ˜๋ฆฌ: ์ถ”์ถœ๋œ Feature Map(ํŠน์ง• ๋งต)์„ Flatten(ํ‰ํƒ„ํ™”)ํ•˜์—ฌ ์™„์ „ ์—ฐ๊ฒฐ ๋ ˆ์ด์–ด์˜ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.
  • ๊ฐ์ฒด ๋ถ„๋ฅ˜: SVM Classifier(๋ถ„๋ฅ˜๊ธฐ)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๊ฐ์ฒด์˜ ํด๋ž˜์Šค๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
  • ์œ„์น˜ ์˜ˆ์ธก: Bounding Box Regression(๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ํšŒ๊ท€)๋ฅผ ํ†ตํ•ด ๊ฐ ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ์˜ˆ์ธกํ•ฉ๋‹ˆ๋‹ค.
  • ์ตœ์ข… ๊ฒฐ๊ณผ: ์˜ˆ์ธก๋œ ํด๋ž˜์Šค์™€ Bounding Box(๋ฐ”์šด๋”ฉ ๋ฐ•์Šค)๋ฅผ ํ†ตํ•ด ์ด๋ฏธ์ง€์—์„œ ๊ฐ์ฒด๋ฅผ ํƒ์ง€ํ•˜๊ณ  ์‹œ๊ฐํ™”ํ•ฉ๋‹ˆ๋‹ค.

R-CNN ๊ฐœ์š”

R-CNN์€ ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ๋กœ, ์ด๋ฏธ์ง€์—์„œ ๊ฐ์ฒด๋ฅผ ํƒ์ง€ํ•˜๊ณ  ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ณผ์ •์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค

์ฃผ์š” ๋‹จ๊ณ„

  • Input Image (์ž…๋ ฅ ์ด๋ฏธ์ง€)
    • R-CNN ๋ชจ๋ธ์€ ์ž…๋ ฅ ์ด๋ฏธ์ง€๋ฅผ ๋ฐ›์Šต๋‹ˆ๋‹ค. ์˜ˆ์‹œ ์ด๋ฏธ์ง€์—์„œ ์นด์šฐ๋ณด์ด์™€ ๋ง์ด ํฌํ•จ๋œ ์‚ฌ์ง„์ด ์‚ฌ์šฉ๋˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • Extract Region Proposals (~2k) (์˜์—ญ ์ œ์•ˆ ์ถ”์ถœ)
    • Selective Search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์‚ฌ์šฉํ•˜์—ฌ ์•ฝ 2000๊ฐœ์˜ ์ž ์žฌ์ ์ธ ๊ฐ์ฒด ์œ„์น˜๋ฅผ ์ œ์•ˆํ•ฉ๋‹ˆ๋‹ค. ์ด ์ œ์•ˆ๋œ ์˜์—ญ์€ ๋‹ค์Œ ๋‹จ๊ณ„์—์„œ ๋ถ„์„๋ฉ๋‹ˆ๋‹ค.
  • Compute CNN Features (CNN ํŠน์ง• ๊ณ„์‚ฐ)
    • ๊ฐ ์ œ์•ˆ๋œ ์˜์—ญ์„ CNN์— ์ž…๋ ฅํ•˜์—ฌ ํŠน์ง•์„ ์ถ”์ถœํ•ฉ๋‹ˆ๋‹ค. CNN์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ(AlexNet, VGG ๋“ฑ)์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ํŠน์ง• ๋งต์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
  • Classify Regions (์˜์—ญ ๋ถ„๋ฅ˜)
    • SVM ๋ถ„๋ฅ˜๊ธฐ๋Š” CNN์—์„œ ์ถ”์ถœ๋œ ํŠน์ง•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ์˜์—ญ์„ ๋ถ„๋ฅ˜ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์‚ฌ๋žŒ(person), ๋น„ํ–‰๊ธฐ(aeroplane), ๋ชจ๋‹ˆํ„ฐ(tvmonitor) ๋“ฑ์˜ ํด๋ž˜์Šค๋กœ ๋ถ„๋ฅ˜๋ฉ๋‹ˆ๋‹ค.
    • ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ํšŒ๊ท€๋Š” ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ๋”์šฑ ์ •ํ™•ํ•˜๊ฒŒ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค.

 

Bounding Box Regression

Bounding Box Regression์€ ๊ฐ์ฒด ํƒ์ง€ ๋ชจ๋ธ์—์„œ ๊ฐ์ฒด์˜ ์œ„์น˜๋ฅผ ๋” ์ •ํ™•ํ•˜๊ฒŒ ์˜ˆ์ธกํ•˜๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ๊ธฐ๋ฒ•์ž…๋‹ˆ๋‹ค.

  • ํฌ๊ฒŒ ์˜ˆ์ธก๋œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์กฐ์ •(Adjusted Predictions), ๋ชฉํ‘œ(Target), ์†์‹ค ํ•จ์ˆ˜(Loss Function) 3๊ฐœ์˜ ๋ถ€๋ถ„์œผ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
    • ์˜ˆ์ธก๋œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์กฐ์ • (Adjusted Predictions): ์˜ˆ์ธก๋œ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค์˜ ์ขŒํ‘œ์™€ ํฌ๊ธฐ๋ฅผ ์กฐ์ •ํ•˜์—ฌ ์‹ค์ œ ๊ฐ์ฒด ์œ„์น˜์— ๋งž์ถฅ๋‹ˆ๋‹ค.
    • ๋ชฉํ‘œ (Target): ์‹ค์ œ ๊ฐ์ฒด์˜ ๋ฐ”์šด๋”ฉ ๋ฐ•์Šค ์ขŒํ‘œ์™€ ํฌ๊ธฐ๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ชฉํ‘œ๊ฐ’์„ ์ •์˜ํ•ฉ๋‹ˆ๋‹ค.
    • ์†์‹ค ํ•จ์ˆ˜(Loss Function): ์˜ˆ์ธก๋œ ๊ฐ’๊ณผ ์‹ค์ œ ๊ฐ’ ๊ฐ„์˜ ์ฐจ์ด๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ์†์‹ค ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ํ•™์Šต์‹œํ‚ต๋‹ˆ๋‹ค.

 

R-CNN ์žฅ๋‹จ์ 

R-CNN์˜ ์žฅ๋‹จ์ ์— ๋ฐํ•˜์—ฌ ์•Œ์•„๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • ์žฅ์ : ๋™์‹œ๋Œ€์˜ ๋‹ค๋ฅธ ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋Œ€๋น„ ๋งค์šฐ ๋†’์€ Detection ์ •ํ™•๋„๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • ๋‹จ์ : ๋„ˆ๋ฌด ๋Š๋ฆฐ Detection ์‹œ๊ฐ„๊ณผ ๋ณต์žกํ•œ ์•„ํ‚คํ…์ฒ˜ ๋ฐ ํ•™์Šต ํ”„๋กœ์„ธ์Šค๊ฐ€ ๋‹จ์ ์ž…๋‹ˆ๋‹ค.
    • ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€๋งˆ๋‹ค selective search๋ฅผ ์ˆ˜ํ–‰ํ•˜์—ฌ 2000๊ฐœ์˜ region ์˜์—ญ ์ด๋ฏธ์ง€๋“ค ๋„์ถœํ•ฉ๋‹ˆ๋‹ค.
    • ๊ฐœ๋ณ„ ์ด๋ฏธ์ง€๋ณ„๋กœ 2000๊ฐœ์”ฉ ์ƒ์„ฑ๋œ region ์ด๋ฏธ์ง€๋ฅผ CNN Feature map ์ƒ์„ฑ ํ•ฉ๋‹ˆ๋‹ค.
    • ๊ฐ๊ธฐ ๋”ฐ๋กœ ๋…ธ๋Š” ๊ตฌ์„ฑ ์š”์†Œ๋“ค. Selective search, CNN Feature Extractor, SVM๊ณผ Bounding box regressor๋กœ ๊ตฌ์„ฑ๋˜์–ด ๋ณต์žกํ•œ ํ”„๋กœ์„ธ์Šค๋ฅผ ๊ฑฐ์ณ์„œ ํ•™์Šต ๋ฐ Object Detection์ด ๋˜์–ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

1์žฅ์˜ ์ด๋ฏธ์ง€๋ฅผ Object Detection ํ•˜๋Š”๋ฐ ์•ฝ 50์ดˆ๊ฐ€ ์†Œ์š”๋ฉ๋‹ˆ๋‹ค.

 

R-CNN ์ดํ›„ Object Detection ์—ฐ๊ตฌ ๋ฐฉํ–ฅ์„ฑ

  • Deep Learning ๊ธฐ๋ฐ˜ Object Detection ์„ฑ๋Šฅ์„ ์ž…์ฆ
  • Region Proposal ๊ธฐ๋ฐ˜ ์„ฑ๋Šฅ ์ž…์ฆ - DL ๊ธฐ๋ฐ˜
  • Detection ์ˆ˜ํ–‰ ์‹œ๊ฐ„์„ ์ค„์ด๊ณ  ๋ณต์žกํ•˜๊ฒŒ ๋ถ„๋ฆฌ๋œ ๊ฐœ๋ณ„ ์•„ํ‚คํ…์ฒ˜๋ฅผ ํ†ตํ•ฉ ํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ์•ˆ ์—ฐ๊ตฌ์— ๋งค์ง„ ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.