[CV] Faster R-CNN (Faster Region-based Convolutional Neural Network)

Faster R-CNN

Faster R-CNN is one of the most advanced models in the R-CNN family of object detectors and offers high accuracy and efficiency in object detection.
It introduces the Region Proposal Network (RPN), which greatly improves the speed and accuracy of the whole system.

  • Faster R-CNN is a combination of an RPN (Region Proposal Network) and Fast R-CNN.
  • It proposes bounding boxes at likely object locations, the job previously done by Selective Search, and does so on the GPU.
  • The Region Proposal Network takes over the role of Selective Search.
  • As a result, the entire model is built only from neural networks.

 

Faster R-CNN Architecture

Let's look at the structure of Faster R-CNN.

  • The key change is that Selective Search is replaced by a neural network.
  • Running on the GPU makes both training and inference fast.
  • The model is trained as an *end-to-end network.
*An end-to-end network is a system in which a single model handles the whole process from input data to final output. No separate manual work or extra processing happens at intermediate stages; the model carries out every step automatically from start to finish.
  • The steps are described one by one below.

1. Input image

  • The input image is fed into a CNN, which produces a feature map.

2. Shared CNN

  • Convolutional Network: the input image is passed through a CNN to produce a feature map.
    • A pre-trained model such as VGG-16 or ResNet is typically used.

3. Region Proposal Network (RPN)

  • RPN (Region Proposal Network): a neural network that takes the feature map and proposes regions likely to contain an object.
  • The RPN scans the feature map with a sliding window and places anchor boxes of several sizes and aspect ratios at each position.
  • For each anchor box it predicts an objectness score and bounding box coordinates.
  • Anchor boxes with a high objectness score are selected as region proposals.

4. RoI Pooling Layer

  • RoI Pooling (Region of Interest Pooling): the region proposals from the RPN are passed to the RoI Pooling layer.
  • The RoI Pooling layer converts each region proposal into a fixed-size feature map.
  • Converting proposals of different sizes to the same size lets the later stages process them uniformly.
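To make the "different sizes in, same size out" idea concrete, here is a minimal pure-Python sketch of RoI max pooling on a single-channel feature map. The function name `roi_max_pool`, the inclusive-corner RoI format, and the integer bin-splitting rule are assumptions made for illustration; real implementations work on multi-channel tensors and handle coordinate rounding in their own way.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a 2D feature map into a fixed out_h x out_w grid.

    feature_map: list of rows (each a list of numbers).
    roi: (x1, y1, x2, y2) in feature-map coordinates, corners inclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Row range covered by bin i; every bin keeps at least one row.
        r0 = y1 + (i * roi_h) // out_h
        r1 = max(r0 + 1, y1 + ((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            # Column range covered by bin j; at least one column.
            c0 = x1 + (j * roi_w) // out_w
            c1 = max(c0 + 1, x1 + ((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled
```

A 4 x 4 RoI and a 3 x 2 RoI both come out as a 2 x 2 grid, which is what lets the fully connected layers that follow accept any proposal.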

5. Fully-Connected Layer (FC)

  • The output of the RoI Pooling layer is fed into fully connected layers, which predict the object class and regress the bounding box coordinates.

6. Softmax Classifier and Bounding Box Regression

  • Softmax Classifier: predicts the class probabilities of each region proposal.
    • Example: Obj1: 0.8, Obj2: 0.1, Obj3: 0.1
  • Bounding Box Regression: predicts the bounding box coordinates of each region proposal, which determine the precise location of the object.

Region Proposal Network Implementation Issue

Implementing a Region Proposal Network to replace Selective Search raises a question.
  • Selective Search works from hand-crafted cues such as detected edges and color values.
  • The network, however, is given only pixel values as features and ground truth bounding boxes as targets. How can it produce region proposals as good as Selective Search from that alone?
  • The answer is the concept of the anchor box.
    • Anchor boxes serve as the reference candidate boxes against which the network decides whether an object is present.

Anchor Box Composition

Anchor boxes are a key concept in object detection models: boxes of various sizes and aspect ratios are defined in advance and used to propose object locations in the input image.

    • An anchor box set is defined by size, aspect ratio, and the number of boxes.
    • There are 9 anchor boxes in total: 3 different sizes times 3 different ratios.
    • Size: 128 x 128, 256 x 256, and 512 x 512.
    • Ratio: 1:1, 1:2, and 2:1.
    • All anchor boxes at a location share the same center point and differ only in size and ratio.
    • This makes it possible to detect objects of different sizes and shapes.
      • 1:1 (aspect ratio 1): a square box (green)
      • 1:2 (aspect ratio 0.5): a tall box (blue)
      • 2:1 (aspect ratio 2): a wide box (red)
Anchor Box -> object present or not, plus a rough region proposal
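The 3 sizes x 3 ratios combination can be sketched in a few lines of Python. This is a minimal illustration, assuming that "size" means the side of a square with the same area and that the ratio is width / height (so 0.5 gives the tall box, as in the list above); `make_anchors` is a name made up for this example.

```python
import math

def make_anchors(cx, cy, sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return 9 anchors (x1, y1, x2, y2) centered at (cx, cy).

    Assumption: each anchor keeps an area of size * size, and
    ratio = width / height.
    """
    anchors = []
    for size in sizes:
        for ratio in ratios:
            w = size * math.sqrt(ratio)
            h = size / math.sqrt(ratio)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

Calling `make_anchors(300, 300)` gives nine boxes that all share the center (300, 300); the 1:1 box of size 128 spans 236 to 364 on both axes.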

 

Anchor Box Characteristics

  • The left part of the picture above shows anchor boxes of various sizes placed on the image.
  • These anchor boxes sit at fixed positions and have predefined sizes.
  • Anchor Box 1 and 2 on the right are a tall box and a wide box.
  • The dot in the middle is the anchor center; boxes of several sizes and ratios overlap at that same position.
  • This allows objects of different sizes and shapes to be detected at the same location.

 

Anchor Box Mapping Between Image and Feature Map

  • This section describes how anchor boxes are mapped.
  • An anchor box generated at a point in the original image maps to the corresponding point on the feature map.
  • Passing the image through a backbone CNN (e.g. VGG, ResNet) produces a smaller feature map, so each point on the feature map represents a larger region of the original image.
  • For example, a 600 x 1000 image may come out of the CNN as a 40 x 60 feature map.
  • In that case each feature-map point represents roughly a 15 x 16 pixel region of the original image (600 / 40 = 15, 1000 / 60 ≈ 16.7).
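The mapping between a feature-map cell and the image can be written down directly. The sketch below assumes a uniform stride and places the anchor center in the middle of the image patch that the cell covers; both helper names are hypothetical, and actual implementations may use the patch corner instead of its middle.

```python
def effective_stride(image_size, feature_size):
    """How many image pixels one feature-map cell spans along one axis."""
    return image_size / feature_size

def cell_to_image_center(row, col, stride=16):
    """Image-space (x, y) of the anchor center for feature-map cell (row, col).

    Assumption: the center sits in the middle of the stride x stride patch.
    """
    return (col * stride + stride / 2, row * stride + stride / 2)
```

With the numbers from the example, `effective_stride(600, 40)` is 15.0 and `effective_stride(1000, 60)` is about 16.7, which is why a stride of 16 is the usual round figure.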

  • This picture shows the center points at which anchor boxes will be generated.
  • Each point is used as a center, and anchor boxes of various sizes and ratios are generated around it.

  • The picture above shows all the anchor boxes generated at a single point.
  • It illustrates how anchor boxes of different sizes and ratios are formed around a center point.
  • Left: the anchor centers. Middle: the anchors generated from one center point. Right: all anchors drawn together.

  • The anchor boxes in the picture above combine several ratios and sizes. The red, green, and blue boxes represent the different ratios and sizes.
    • Ratios: 0.5, 1, 2
    • Sizes: 128 x 128, 256 x 256, 512 x 512
    • Combining the ratios and sizes yields 9 anchor boxes.

 

  • 9 anchor boxes are generated at every grid position. For example, for an 800 x 600 image and a 16 x 16 grid cell there are 50 x 38 = 1900 grid positions (600 / 16 = 37.5, rounded up to 38), so the total number of anchor boxes is 1900 x 9 = 17100.

 

  • Grid positions are computed by dividing the image size by a fixed stride length.
  • The picture shows the grid positions obtained with a stride length of 16 x 16. Anchor boxes are generated at each grid position.
  • For example, with a stride length of 16, dividing an 800 x 600 image by 16 gives 50 x 37.5.
  • Rounding the 37.5 up to 38 gives the 50 x 38 = 1900 grid positions that are actually used.
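The grid arithmetic is easy to check in code. This is a small sketch that assumes partial cells at the image border are rounded up, which is what reproduces the 1900 figure; `grid_shape` and `total_anchors` are names invented for the example.

```python
import math

def grid_shape(image_w, image_h, stride=16):
    """Number of grid columns and rows, rounding partial cells up."""
    return math.ceil(image_w / stride), math.ceil(image_h / stride)

def total_anchors(image_w, image_h, stride=16, anchors_per_cell=9):
    """Total anchor boxes laid over the image."""
    cols, rows = grid_shape(image_w, image_h, stride)
    return cols * rows * anchors_per_cell
```

For an 800 x 600 image, `grid_shape(800, 600)` is `(50, 38)` and `total_anchors(800, 600)` is 17100.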

 

  • Anchor Box mapping
    • 9 anchor boxes are generated at each grid position so that together they cover the whole image.
    • This allows objects of various sizes and ratios to be detected at every position in the image.
    • The picture visualizes the anchor boxes generated over the whole image; the red area is the anchor boxes covering it.

 

  • With 1900 grid positions and 9 anchor boxes per position, a total of 17,100 anchor boxes are ready to detect objects in the image.
  • Some boxes extend past the image boundary, but the network can handle these appropriately.

Region Proposal Network (RPN) Overview

The picture below shows how the RPN works inside the Faster R-CNN model.
The RPN (Region Proposal Network) does the job previously done by Selective Search: it draws and proposes bounding boxes at likely object locations.

  • Original image
    • The first step is to feed the original image into the network.
  • Feature Extractor (VGG, etc.)
    • The input image is processed by a pre-trained CNN (Convolutional Neural Network) such as VGG, producing a high-level feature map.
    • VGG analyzes the image through many layers and extracts features at several levels.
  • Feature Map
    • The feature map produced by passing the image through VGG.
    • It holds the important visual information of the image and is the input to the RPN in the next step.
  • RPN (Region Proposal Network)
    • The RPN takes the feature map and generates region proposals where an object is likely to exist.
    • Using a sliding window, it places anchor boxes of various sizes and ratios at every position of the feature map.
    • Each anchor box is given a score for how likely it is to contain an object, and its position is refined.
  • Region Proposal
    • Candidates that score highly in the RPN are selected as the final region proposals.
    • Non-Maximum Suppression (NMS) is applied here so that, among overlapping boxes, only the most confident one remains.
    • The selected regions are passed to the next stage, the object detection and classification network.
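The NMS step mentioned above can be sketched as a greedy loop. This is a minimal pure-Python version for illustration; the helper names and the default threshold of 0.7 are assumptions, since the post does not state which threshold is used.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

Lowering `iou_threshold` makes the filter stricter: two boxes shifted by one pixel on a 10 x 10 object overlap with IoU of about 0.68, so they both survive at 0.7 but the weaker one is dropped at 0.5.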

 

Region Proposal Network (RPN) Structure

This section describes the structure of the RPN (Region Proposal Network) in detail.

  • Feature Map
    • The 40x50 feature map produced by the CNN.
    • It represents the high-level features of the original image.
  • 3x3 Convolution, 512 Channels
    • A convolution layer with 3x3 filters and 512 channels is applied to the feature map.
    • This layer learns more complex patterns in the feature map so that region proposals can be made more accurately.
  • 1x1 Fully Convolutional Layer
    • The output of the 3x3 convolution layer is passed to two sibling layers with 1x1 filters. These are convolution layers, not fully connected layers.
  • Softmax Classification
    • The first layer performs softmax classification: it decides whether each anchor is an object (FG, foreground) or background (BG) and outputs the result as a probability.
    • Anchors with a high probability are selected as regions likely to contain an object.
  • Bounding Box Regression
    • The second layer performs bounding box regression, correcting the position and size (x, y, w, h) of each anchor.
    • Adjusting each anchor's center coordinates (x, y) and size (w, h) produces more accurate region proposals.

RPN Structure (Additional Notes)


  • The RPN takes the feature map extracted by the CNN as input. Let its size be H x W x C: height, width, and number of channels.
  • A 3x3 convolution with 256 or 512 channels is applied to the feature map. This corresponds to the intermediate layer in the figure above.
    • Padding is set to 1 so that H x W is preserved.
    • The intermediate layer outputs a second feature map of size H x W x 256 or H x W x 512.
  • The classification and bounding box regression predictions are computed from this second feature map.
  • Note that this is done with 1 x 1 convolutions rather than fully connected layers, which makes the RPN a fully convolutional network.
    • This lets it work regardless of the input image size.
  • For classification, a 1 x 1 convolution with 2 (object or not) x 9 (number of anchors) channels is applied, giving an H x W x 18 feature map.
    • Each index on H x W is a coordinate on the feature map, and the 18 channels beneath it hold the object / not-object predictions for the 9 anchor boxes anchored at that coordinate.
    • In other words, a single 1x1 convolution makes the predictions for every coordinate on H x W at once.
    • The values are then reshaped appropriately and passed through softmax to obtain the probability that each anchor is an object.
  • For bounding box regression, a second 1 x 1 convolution with 4 x 9 channels is applied.
    • Because this is regression, the resulting values are used as they are.
  • The RoIs are then computed from these values. First, the object probabilities from classification are sorted and the top K anchors are kept.
    • Bounding box regression is applied to each of these K anchors, and then Non-Maximum Suppression is applied to obtain the RoIs.
Feature map: the output of the filters applied to the input data at each layer of the network.
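The channel counts and the "reshape, then softmax" step can be illustrated for a single feature-map location. This sketch assumes the 18 classification values are laid out as (background, foreground) pairs, one pair per anchor; that layout and both function names are assumptions for the example, and real implementations may order the channels differently.

```python
import math

def rpn_head_channels(num_anchors=9):
    """Output channels of the two 1x1 convolutions in the RPN head."""
    return {"cls": 2 * num_anchors, "reg": 4 * num_anchors}

def objectness_probs(cls_scores, num_anchors=9):
    """Turn 2 * num_anchors raw scores at one location into object probabilities.

    Assumption: scores are stored as (bg, fg) pairs, one pair per anchor.
    """
    probs = []
    for a in range(num_anchors):
        bg, fg = cls_scores[2 * a], cls_scores[2 * a + 1]
        m = max(bg, fg)  # subtract the max for numerical stability
        e_bg, e_fg = math.exp(bg - m), math.exp(fg - m)
        probs.append(e_fg / (e_bg + e_fg))
    return probs
```

`rpn_head_channels()` returns 18 and 36, matching the H x W x 18 and 4 x 9 shapes above, and a pair of equal scores maps to a probability of 0.5.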

Anchor Box and Predicted Anchor Box

  • The predicted anchor boxes above are judged by how much each anchor box overlaps an object.
  • That is, for each anchor box the network estimates the probability that it contains an object: whether it is foreground (FG) or background (BG).

 

Region Proposal Network (RPN) Output

  • The RPN output passes through the 3x3 convolution layer with 512 channels and then through two convolution layers with 1x1 filters (convolutions, not fully connected layers).
  • Softmax Classification (1x1 Fully Convolutional Layer, 2x9 Output Channels)
    • The first layer performs softmax classification: it decides whether each anchor is an object (FG, foreground) or background (BG) and outputs the result as a probability.
    • Anchors with a high probability are selected as regions likely to contain an object.
  • Bounding Box Regression (1x1 Fully Convolutional Layer, 4x9 Output Channels)
    • The second layer performs bounding box regression, correcting the position and size (x, y, w, h) of each anchor.
    • Adjusting each anchor's center coordinates (x, y) and size (w, h) produces more accurate region proposals.

Positive Anchor Box, Negative Anchor Box

  • Anchor boxes are labeled positive or negative according to their IoU with the ground truth bounding box (BB).
    • IoU of 0.7 or higher: positive
    • The anchor with the highest IoU for a ground truth box: positive
    • IoU below 0.3: negative
  • Anchors with an IoU between 0.3 and 0.7 are too ambiguous to label and are left out entirely.
IoU (Intersection over Union): a measure of how well a predicted bounding box matches the ground truth.
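The labeling rule can be sketched in plain Python. The code below uses the 0.7 and 0.3 thresholds from the list above and marks ignored anchors with -1; the function names and the -1 convention are choices made for this example, and it assumes at least one ground truth box is given.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor.

    Assumes gt_boxes is non-empty.
    """
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # ambiguous: left out of training
    # The anchor with the highest IoU for each ground truth box is positive too.
    for j in range(len(gt_boxes)):
        best_i = max(range(len(anchors)), key=lambda i: overlaps[i][j])
        if overlaps[best_i][j] > 0:
            labels[best_i] = 1
    return labels
```

The second rule matters when no anchor reaches 0.7: an anchor with IoU 0.5 would normally be ignored, but it becomes positive if it is the best match for its ground truth box.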

Bounding Box Regression with the Anchor Box as Reference

  • The predicted box is regressed using the positive anchor box as the reference: training minimizes the gap between the predicted offsets from that anchor and the ground truth box's offsets from it.

RPN Classification and Bounding Box Regression

This section covers RPN classification and bounding box regression.

RPN Classification

  • Anchor Box
    • Positive anchor: an anchor box with a high IoU (Intersection over Union) with a ground truth box that contains a real object. Positive anchors are treated as potential object regions.
    • Negative anchor: an anchor box with a low IoU with the ground truth boxes. Negative anchors are treated as background.
  • Classification process:
    • The network classifies each anchor box as "foreground (object)" or "background (not object)".
    • The classification loss function is usually a softmax or sigmoid loss, which trains the network to tell foreground from background.

Bounding Box Regression

  • Ground truth box: the actual bounding box that tightly encloses an object in the image.
  • Predicted anchor box
    • The network produces predictions that adjust the anchor box to better fit the ground truth box.
    • These adjustments are usually written Δx, Δy, Δw, Δh and represent the shift and scaling needed to fit the anchor box to the ground truth box.
  • Regression process:
    • The network uses these adjustments to fit the anchor box more precisely to the ground truth box.
    • The bounding box regression loss function is usually smooth L1 loss, which trains the network to predict the adjustments accurately.
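A common way to define Δx, Δy, Δw, Δh is to normalize the center shift by the anchor size and to take the log of the size ratio. The sketch below uses that parameterization with boxes in (center x, center y, width, height) form; the function names `encode` and `decode` are made up for this example.

```python
import math

def encode(anchor, gt):
    """Regression targets (dx, dy, dw, dh) that move `anchor` onto `gt`.

    Both boxes are (cx, cy, w, h).
    """
    xa, ya, wa, ha = anchor
    x, y, w, h = gt
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))

def decode(anchor, deltas):
    """Apply predicted (dx, dy, dw, dh) to `anchor` and return (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = deltas
    return (xa + dx * wa, ya + dy * ha, wa * math.exp(dw), ha * math.exp(dh))
```

`encode` produces the training targets, and `decode` is the inverse that turns the network's predictions back into a box, so `decode(anchor, encode(anchor, gt))` recovers `gt` up to floating-point error.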

RPN Output per Anchor Box

Let's look at the output of the Region Proposal Network (RPN) in terms of anchor boxes.

  • The RPN moves across the convolutional feature map and generates region proposals at each position.
  • It uses a sliding window: a fixed-size window is applied at the reference point of each position.
  • At each sliding-window position, anchor boxes of several sizes and ratios are generated.
  • The information extracted by the sliding window at each position goes through the intermediate layer, which converts it into a 256-dimensional vector.
    • With a VGG network, the intermediate layer produces a 512-dimensional vector.
  • After the intermediate layer come the classification and regression layers.
    • The classification layer (2 x 9) outputs 2k scores that indicate whether each anchor box is an object or not.
    • Here k is the number of anchor boxes.
    • The regression layer (4 x 9) outputs 4k coordinates that adjust each anchor box to better fit the actual object.
    • These coordinates are the values for shifting and scaling the anchor box, usually written Δx, Δy, Δw, Δh.

RPN Loss Function

  • The RPN (Region Proposal Network) loss function combines a classification loss and a regression loss to improve detection accuracy.
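Written out, the combined loss takes the following form. The symbols follow the list below (Lcls, Lsmooth1, Ncls, Nbox, pi, ti); λ is the notation used here for the balancing parameter, and i indexes the anchors in a mini-batch. The factor p_i^* switches the regression term on only for positive anchors.

```latex
L(\{p_i\}, \{t_i\}) =
  \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{box}} \sum_i p_i^* \, L_{smooth_{L1}}(t_i - t_i^*)

L_{smooth_{L1}}(x) =
  \begin{cases}
    0.5\,x^2   & \text{if } |x| < 1 \\
    |x| - 0.5  & \text{otherwise}
  \end{cases}
```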

 

  • Classification Loss: Lcls
    • Used to predict whether each anchor box contains an object.
    • Computes the loss between the predicted probability pi and the ground truth label pi*.
    • Measures how well the prediction agrees with the ground truth.
  • Regression Loss: Lsmooth1
    • Used to adjust the coordinates of anchor boxes that contain an object.
    • Computes the difference between the predicted coordinates ti and the ground truth coordinates ti*.
    • Fits the anchor box to the actual object to produce a more accurate bounding box.
  • Normalization: Ncls & Nbox
    • The loss terms are normalized by dividing by the mini-batch size Ncls and the number of boxes Nbox.
    • This keeps training stable.
  • Balancing Parameter:
    • A parameter that balances the classification loss against the regression loss.
    • Adjusting the relative weight of the two losses helps reach the best training result.
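The pieces above can be put together in a short numeric sketch. It assumes a binary cross-entropy form for Lcls and treats pi as the predicted object probability; the function names, the `lam` argument for the balancing parameter, and its default of 1.0 are choices made for this example, not values taken from the post.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear for |x| >= 1."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def rpn_loss(probs, labels, pred_deltas, target_deltas, n_cls, n_box, lam=1.0):
    """Classification loss / n_cls + lam * regression loss / n_box.

    probs: predicted object probability p_i per anchor.
    labels: ground truth p_i* per anchor (1 = object, 0 = background).
    pred_deltas, target_deltas: 4-tuples t_i and t_i* per anchor.
    """
    eps = 1e-12  # avoids log(0)
    cls_sum = 0.0
    reg_sum = 0.0
    for p, p_star, t, t_star in zip(probs, labels, pred_deltas, target_deltas):
        cls_sum += -(p_star * math.log(p + eps)
                     + (1 - p_star) * math.log(1 - p + eps))
        if p_star == 1:  # only positive anchors contribute to regression
            reg_sum += sum(smooth_l1(a - b) for a, b in zip(t, t_star))
    return cls_sum / n_cls + lam * reg_sum / n_box
```

Raising `lam` makes box localization count for more relative to the object / background decision, which is the balancing role described in the last bullet.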

RPN Training & Region Proposal Filtering

  • RPN training has already been covered at length above, so it is skipped here.
  • When filtering region proposals, what gets passed on depends on the foreground (FG) prediction.
  • Only the proposals found to be FG are passed on to the next stage.

Objectness Score of a Predicted Region Proposal Box

Objectness Score: the probability that the predicted box is an object (the softmax value) * its IoU with the ground truth bounding box. Ground truth is only available during training, so at inference the softmax probability alone ranks the boxes.

  • The point to note is that region proposal boxes are extracted in descending order of objectness score.
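Selecting proposals in score order amounts to a sort followed by a cut-off. A minimal sketch, where `top_n_proposals` and the cut-off `n` are hypothetical names for this example:

```python
def top_n_proposals(boxes, scores, n):
    """Keep the n boxes with the highest objectness score, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return [boxes[i] for i in order], [scores[i] for i in order]
```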

Faster R-CNN Training

 

  • Alternating Training: the RPN and the Fast R-CNN model are trained and fine-tuned in turn to reach the best performance. The order is as follows.
    • RPN training: the RPN is trained first to generate the initial region proposals.
    • Fast R-CNN training: classification and bounding box regression are trained on those region proposals.
    • Fine tuning: the RPN and Fast R-CNN are fine-tuned alternately to optimize performance.

Faster R-CNN Detection Performance & Runtime Comparison

  • The performance comparison tables can be summarized as follows.
  • PASCAL VOC dataset: Faster R-CNN with the RPN (Region Proposal Network) achieved a higher mAP than Fast R-CNN with Selective Search.
  • COCO dataset: Faster R-CNN with the RPN performed better overall.
  • Overall: Faster R-CNN reached high performance with fewer proposal boxes, which underlines its efficiency.
  • Systems using Selective Search are slow because generating proposal boxes takes a long time.
  • Using the RPN cuts proposal generation time sharply, which greatly improves overall speed.
  • The ZF model is faster than the VGG model, so the RPN + Fast R-CNN combination with ZF shows the highest processing speed.

Summary

This section summarizes R-CNN, Fast R-CNN, and Faster R-CNN.

  • R-CNN (Region-based Convolutional Neural Network)
    • Proposal extraction: Selective Search generates a set of proposal boxes (Regions of Interest, RoIs) from the input image.
    • CNN: each RoI is passed through the CNN separately and converted into a feature map.
    • SVM: each feature map is classified with an SVM.
    • Bounding Box Regression: the bounding box is adjusted based on the classification result.
    • Drawback: very slow. The CNN (Convolutional Neural Network) has to be run separately on every RoI, so the computational cost is high.
  • Fast R-CNN
    • Input image: the input image is fed into a ConvNet to produce a feature map.
    • RoI extraction: RoI Pooling extracts the features of each region proposal from the feature map.
    • Fully Connected Layers (FCs): the extracted RoIs are passed through FC layers.
    • Softmax Classifier & Bounding Box Regressor: each RoI is classified by object type and its bounding box is adjusted.
    • Advantage: faster. The whole feature map is computed in a single CNN pass and the RoIs are processed on top of it.
  • Faster R-CNN
    • Input image: a ConvNet produces the feature map.
    • Region Proposal Network (RPN): proposes RoIs from the feature map.
    • RoI Pooling: extracts the RoIs proposed by the RPN from the feature map.
    • Fully Connected Layers (FCs): the extracted RoIs are passed through FC layers.
    • Softmax Classifier & Bounding Box Regressor: each RoI is classified by object type and its bounding box is adjusted.
    • Advantage: faster and more accurate. RoI proposal is handled inside the CNN by the RPN, which lowers computational cost and improves performance.
Feature Map: the multi-dimensional array produced by passing an image through a Convolutional Neural Network (CNN); it holds the features of the original image.
Region Proposal: the process of finding candidate regions (Regions of Interest, RoIs) in the original image that are likely to contain an object.