[CV] Fast R-CNN (Fast Region-based Convolutional Neural Network)

Fast R-CNN Overview

Fast R-CNN is a model that addresses the shortcomings of R-CNN (Region-based Convolutional Neural Network) and SPP-Net (Spatial Pyramid Pooling Network) and greatly improves both the speed and the accuracy of object detection.

 

  • Fast R-CNN consists of the following components.
  1. Input: the original image together with a set of region proposals.
  2. Shared CNN: extracts a *feature map from the whole image.
  3. RoI Pooling Layer: converts each region proposal into a fixed-size feature map.
  4. Fully connected layers: take the fixed-size feature map as input and perform classification and bounding box regression.
  5. Output: the predicted class and bounding box coordinates of each object.
*A feature map is a representation of specific features extracted from the input data.
A CNN recognizes patterns in each part of the input image and builds the feature map from them.

 

  • Fast R-CNN also has the following advantages.
    • The feature map is computed with a single CNN pass over the whole image, so computation cost and time drop sharply compared with R-CNN, which runs the CNN once per region proposal.
    • RoI Pooling produces a fixed-size feature map for each region proposal from that shared feature map, so no additional CNN computation is needed per proposal.
    • Because the RoI Pooling Layer always outputs a fixed-size feature map, the input size of the Fully Connected (FC) layers stays constant.
    • Classification and bounding box regression run together in one unified network, which improves both efficiency and detection accuracy.
    • Apart from the external region proposal step, the network can be trained end to end, so it is optimized as a whole.

Fast R-CNN Key Features (From SPP-Net)

Let's take a look at the key features of Fast R-CNN.

Replacing the SPP Layer with the RoI Pooling Layer

  • SPP (Spatial Pyramid Pooling) Layer
    • Used to convert inputs of varying size into a fixed-size *feature vector.
    • Divides the feature map into grids at several pyramid levels and applies *max pooling within each grid cell.
*A feature vector is a vector of numbers that represents the important features of the input data.
*Max pooling keeps the largest value in each window of the feature map, which preserves the strongest responses while reducing the size of the data.
  • RoI (Region of Interest) Pooling Layer
    • The layer Fast R-CNN uses in place of the SPP Layer. It has a single pooling level instead of a pyramid, and its output is flattened into a vector.
    • It extracts the area of each region proposal from the feature map produced by the CNN and converts it into a feature map of fixed size.
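
The RoI pooling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the reference implementation: the function name `roi_max_pool`, the single-channel list-of-lists feature map, the 2x2 output size, and the floor/ceil binning rule are all assumptions made for the example, and real implementations differ in how they round bin boundaries.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one RoI of a single-channel feature map into an out_h x out_w grid.

    feature_map: list of rows (H x W).
    roi: (y0, x0, y1, x1) in feature-map cells, end-exclusive.
    """
    y0, x0, y1, x1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Floor for the bin start and ceil for the bin end, so no bin is empty.
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + ceil_div((i + 1) * roi_h, out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + ceil_div((j + 1) * roi_w, out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Hypothetical 6 x 8 feature map whose values grow with row and column.
feature_map = [[r * 10 + c for c in range(8)] for r in range(6)]

pooled_large = roi_max_pool(feature_map, (1, 2, 6, 7))  # 5 x 5 region
pooled_small = roi_max_pool(feature_map, (0, 0, 2, 3))  # 2 x 3 region
print(pooled_large)
print(pooled_small)
```

Both regions come out as a 2x2 grid even though they differ in size, which is what lets the fully connected layers receive a fixed-length input.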

 

End-to-End Network Learning

  • Replacing the SVM with Softmax
    • Unlike R-CNN, Fast R-CNN uses a softmax classifier instead of SVM classifiers.
    • Softmax outputs a probability for each class, which enables more effective multi-class classification and lets the classifier be trained jointly with the rest of the network.
  • Multi-task loss function
    • Fast R-CNN uses a multi-task loss function to optimize classification and bounding box regression at the same time.
    • Because the classification loss and the regression loss are trained together, the network learns the class and the location of an object simultaneously.
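
As a small illustration of the softmax classifier mentioned above, the sketch below turns raw class scores into probabilities. The scores are made-up values, and treating index 0 as the background class follows the usual Fast R-CNN convention of K object classes plus background.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for one RoI: background plus two object classes.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted = probs.index(max(probs))  # index of the most likely class
print(probs, predicted)
```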

Fast R-CNN Architecture

This section describes the architecture of Fast R-CNN.

  • Up to the feature map, the process is the same as in SPP-Net. What follows the feature map is different.
  • RoI Pooling Layer: extracts the area of each region proposal from the feature map and converts it into a fixed-size feature map.
  • Fully-Connected Layer: takes the output of the RoI Pooling Layer and feeds the classification and regression branches.
  • Softmax Classifier, Bounding Box Regression: finally predicts the class and bounding box of each region proposal.

  • The figure below illustrates the Fast R-CNN architecture using an example input image. Each step is explained in turn.
  • Input image: an image of size 600x1000 is fed into the CNN.
  • Feature maps: passing through the CNN produces a feature map of roughly 40x60, a downsampling factor of about 16.
  • Region proposals: the proposed object regions are processed. In Fast R-CNN itself the proposals come from an external method such as Selective Search; the Region Proposal Network (RPN) was introduced later, in Faster R-CNN.
  • RoI Pooling: the RoI Pooling Layer converts each region proposal into a fixed-size feature map.
  • Fully-Connected Layer: the fixed-size feature map is passed to the fully connected layers.
  • Classification and Regression: for each region proposal, softmax predicts the class and regression predicts the bounding box.
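
To make the size arithmetic in this walk-through concrete, here is a minimal sketch of how a proposal given in image pixels could be mapped onto the 40x60 feature map. It assumes that 600x1000 and 40x60 are both height x width; the function name `project_box`, the example box, and the floor/ceil rounding are illustrative choices rather than anything specified in the text.

```python
def project_box(box, image_size, feature_size):
    """Map an image-space box (x0, y0, x1, y1) onto feature-map cells.

    Start coordinates are floored and end coordinates are ceiled so the
    projected region covers every cell the box touches. Integer arithmetic
    avoids floating-point rounding surprises.
    """
    x0, y0, x1, y1 = box
    img_h, img_w = image_size
    feat_h, feat_w = feature_size
    fx0 = x0 * feat_w // img_w
    fy0 = y0 * feat_h // img_h
    fx1 = -(-x1 * feat_w // img_w)  # ceiling division
    fy1 = -(-y1 * feat_h // img_h)
    return fx0, fy0, fx1, fy1


image_size = (600, 1000)   # height, width (assumed order)
feature_size = (40, 60)    # height, width (assumed order)

# Hypothetical proposal in image pixels.
roi_on_features = project_box((130, 95, 515, 410), image_size, feature_size)
print(roi_on_features)
```

The projected region is what the RoI Pooling Layer then divides into a fixed grid.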

Multi-task Loss Function

The multi-task loss is the loss function used to optimize classification and regression at the same time.
  • It has two components: the classification loss and the regression loss.
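
For reference, the two components combine as follows in the notation of the Fast R-CNN paper, where p is the predicted class probability vector, u the true class (0 for background), t^u the predicted box offsets for class u, v the ground-truth regression target, and λ a balancing weight:

```latex
L(p, u, t^{u}, v) = L_{\mathrm{cls}}(p, u) + \lambda \, [u \ge 1] \, L_{\mathrm{loc}}(t^{u}, v)

L_{\mathrm{cls}}(p, u) = -\log p_{u}

L_{\mathrm{loc}}(t^{u}, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}\!\left(t^{u}_{i} - v_{i}\right)

\mathrm{smooth}_{L_1}(x) =
\begin{cases}
  0.5\,x^{2} & \text{if } |x| < 1 \\
  |x| - 0.5  & \text{otherwise}
\end{cases}
```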

  • Classification Loss
    • The loss function for predicting the class label of an object.
    • Classification is the process of learning the relationships among data categories so that new data can be assigned to one of them.
    • L_cls(p, u) measures the mismatch between the predicted class probabilities p and the true class label u. It is the log loss for the true class, -log p_u.
  • Regression Loss
    • The loss function that measures the difference between the predicted bounding box offsets t^u (for the true class u) and the ground-truth regression target v.
    • Fast R-CNN uses the Smooth L1 loss, which behaves like the L2 loss for small errors and like the L1 loss for large errors, making it less sensitive to outliers than the L2 loss.
  • Multi-task Loss
    • Optimizes the classification loss and the regression loss together to improve object detection accuracy.
    • λ is a hyperparameter that balances the weight between the two losses.
    • [u≥1] is an indicator that equals 1 when the RoI contains an object and 0 for background (u = 0), so the regression loss is computed only when an object is present.
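
The loss described above can be sketched for a single RoI as follows. The function names, the example probabilities, and the box values are assumptions made for illustration; the default of λ = 1 matches the setting used in the paper's experiments, and class index 0 is taken to be background.

```python
import math


def smooth_l1(x):
    """Quadratic near zero (like L2), linear for large errors (like L1)."""
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5


def multi_task_loss(probs, true_class, pred_box, target_box, lam=1.0):
    """Log loss for the true class plus lambda-weighted Smooth L1 box loss."""
    cls_loss = -math.log(probs[true_class])
    if true_class == 0:
        # Background RoI: the indicator [u >= 1] is 0, so no box loss.
        return cls_loss
    loc_loss = sum(smooth_l1(t - v) for t, v in zip(pred_box, target_box))
    return cls_loss + lam * loc_loss


# Hypothetical RoI: class probabilities and (x, y, w, h) offsets.
probs = [0.1, 0.7, 0.2]
pred_box = [0.1, -0.2, 0.5, 2.0]
target_box = [0.0, 0.0, 0.0, 0.0]

object_loss = multi_task_loss(probs, 1, pred_box, target_box)
background_loss = multi_task_loss(probs, 0, pred_box, target_box)
print(object_loss, background_loss)
```

In the object case the largest offset error (2.0) falls in the linear regime and contributes 1.5 rather than the 2.0 a pure L2 term of 0.5x² would give, which is the outlier robustness Smooth L1 is chosen for.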

Fast R-CNN Performance Comparison

  • This comparison uses the PASCAL VOC 2012 dataset to evaluate the accuracy and running time of R-CNN, SPP-Net, and Fast R-CNN.
  • In summary, Fast R-CNN records the highest mAP when more training data is used, showing strong object detection performance.
  • Fast R-CNN is the most efficient of the three in both training time and test time.
  • Fast R-CNN also runs the fastest at test time even when region proposal time is included.