[CV] Major Datasets for Object Detection & Segmentation

Major Datasets

Many detection and segmentation deep learning packages ship with models pretrained on the datasets below.

  • PASCAL VOC: XML format, 20 object categories
    • Annotations are bounding boxes stored as XML, with one annotation file per image.
  • MS COCO: JSON format, 80 object categories
  • Google Open Images: CSV format, 600 object categories

 

 

PASCAL VOC 2012

PASCAL VOC (Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes) is a dataset and evaluation challenge widely used in computer vision.
    • The dataset is used to evaluate model performance on several computer vision tasks, including object detection, image segmentation, and image classification.
    • PASCAL VOC covers a range of visual object classes. For example, the 2007 and 2012 datasets each contain 20 object classes.
    • The classes include person, bicycle, car, cat, dog, and dining table, among others.
 

The PASCAL Visual Object Classes Challenge 2012 (VOC2012)

host.robots.ox.ac.uk

 

Annotation

First, what is an annotation?
  • Detection information for an image that is provided in a separate description file is generally called an annotation.
  • An annotation gives each object's bounding box position, object name, and similar details in a specific file format.

  • The part inside the yellow box of this annotation file is the bounding box for the airplane in the original image.

 

PASCAL VOC Dataset Structure

Let's look at how the PASCAL VOC dataset is organized.

This description is based on VOC 2012.

  • Annotations: XML files; each XML file holds the annotation information for one image.
    • The file name without the .xml extension matches the image file name without the .jpg extension.
  • ImageSets: text files that record which images belong to the train, val, trainval, and test splits, with separate files per object class.
  • JPEGImages: the original images used for detection and segmentation.
  • SegmentationClass: mask images used for semantic segmentation.
  • SegmentationObject: mask images used for instance segmentation.
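
The layout above can be sketched in code. This is a minimal illustration, not part of the VOC devkit: `voc_paths` and `read_image_ids` are hypothetical helper names, and the sketch assumes the standard folder names under `VOCdevkit/VOC2012` and that each line of an ImageSets split file starts with an image ID.

```python
import os

def voc_paths(voc_root, image_id):
    """Build the per-image file paths that share one image ID (hypothetical helper)."""
    return {
        "annotation": os.path.join(voc_root, "Annotations", image_id + ".xml"),
        "image": os.path.join(voc_root, "JPEGImages", image_id + ".jpg"),
        "semantic_mask": os.path.join(voc_root, "SegmentationClass", image_id + ".png"),
        "instance_mask": os.path.join(voc_root, "SegmentationObject", image_id + ".png"),
    }

def read_image_ids(split_text):
    """Parse the text of an ImageSets split file.

    Assumption: the first whitespace-separated token on each line is the image ID;
    per-class files may carry an extra flag column, which is ignored here.
    """
    ids = []
    for line in split_text.splitlines():
        tokens = line.split()
        if tokens:
            ids.append(tokens[0])
    return ids

paths = voc_paths("VOCdevkit/VOC2012", "2007_000032")
print(paths["annotation"])
print(read_image_ids("2007_000032  1\n2007_000033 -1\n"))
```

Only a subset of the images has segmentation masks, so the two mask paths should be checked with `os.path.exists` before use.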

 

Annotation File Example

This is part of the annotation file 2007_000032.xml.
  • It holds the annotation information for the file 2007_000032.jpg.

 

Exploring the PASCAL VOC 2012 Dataset

Let's explore the PASCAL VOC 2012 dataset.
  • Download the PASCAL VOC 2012 data
!wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
!mkdir -p ~/DLCV/data/voc
!tar -xvf VOCtrainval_11-May-2012.tar -C ~/DLCV/data/voc
!ls ~/DLCV/data/voc/VOCdevkit/VOC2012
!ls ~/DLCV/data/voc/VOCdevkit/VOC2012/JPEGImages | head -n 5

 

  • View a sample image from the JPEGImages directory
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

img = cv2.imread('../../data/voc/VOCdevkit/VOC2012/JPEGImages/2007_000032.jpg')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # convert BGR -> RGB
print('img shape:', img.shape)

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
plt.show()
  • img shape: (281, 500, 3)

 

  • View a sample annotation file from the Annotations directory
!cat ~/DLCV/data/voc/VOCdevkit/VOC2012/Annotations/2007_000032.xml
<annotation>
	<folder>VOC2012</folder>
	<filename>2007_000032.jpg</filename>
	<source>
		<database>The VOC2007 Database</database>
		<annotation>PASCAL VOC2007</annotation>
		<image>flickr</image>
	</source>
	<size>
		<width>500</width>
		<height>281</height>
		<depth>3</depth>
	</size>
	<segmented>1</segmented>
	<object>
		<name>aeroplane</name>
		<pose>Frontal</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>104</xmin>
			<ymin>78</ymin>
			<xmax>375</xmax>
			<ymax>183</ymax>
		</bndbox>
	</object>
	<object>
		<name>aeroplane</name>
		<pose>Left</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>133</xmin>
			<ymin>88</ymin>
			<xmax>197</xmax>
			<ymax>123</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>195</xmin>
			<ymin>180</ymin>
			<xmax>213</xmax>
			<ymax>229</ymax>
		</bndbox>
	</object>
	<object>
		<name>person</name>
		<pose>Rear</pose>
		<truncated>0</truncated>
		<difficult>0</difficult>
		<bndbox>
			<xmin>26</xmin>
			<ymin>189</ymin>
			<xmax>44</xmax>
			<ymax>238</ymax>
		</bndbox>
	</object>
</annotation>
    • This XML file is an example of an annotation file used in the PASCAL VOC dataset.
    • It defines the locations and attributes of the objects in the image.
    • Each object has a class label, a pose, and other attributes along with its bounding box coordinates in the image.
    • The XML file can be summarized as follows:
    • Image size: 500x281, 3 channels (RGB)
    • There are 4 objects, listed below.
  1. `aeroplane`: pose `Frontal`, bounding box (104, 78) to (375, 183)
  2. `aeroplane`: pose `Left`, bounding box (133, 88) to (197, 123)
  3. `person`: pose `Rear`, bounding box (195, 180) to (213, 229)
  4. `person`: pose `Rear`, bounding box (26, 189) to (44, 238)
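
To make the corner coordinates above more concrete, the sketch below turns each box into a width, a height, and a share of the image area. `box_size` is a hypothetical helper, and it uses plain coordinate differences; some tools add 1 to each side to treat the corner pixels as inclusive.

```python
def box_size(xmin, ymin, xmax, ymax):
    """Width and height as plain coordinate differences (no +1 for inclusive pixels)."""
    return xmax - xmin, ymax - ymin

# The four boxes listed in the annotation for 2007_000032.jpg
boxes = [
    ("aeroplane", (104, 78, 375, 183)),
    ("aeroplane", (133, 88, 197, 123)),
    ("person", (195, 180, 213, 229)),
    ("person", (26, 189, 44, 238)),
]

image_area = 500 * 281  # width x height from the <size> element

for name, box in boxes:
    width, height = box_size(*box)
    share = width * height / image_area
    print(f"{name}: {width}x{height}, {share:.1%} of the image")
```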

 

View a sample mask image from the SegmentationObject directory

img = cv2.imread('../../data/voc/VOCdevkit/VOC2012/SegmentationObject/2007_000032.png')
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
print('img shape:', img.shape)

plt.figure(figsize=(8, 8))
plt.imshow(img_rgb)
plt.show()
  • img shape: (281, 500, 3)
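
The mask above is loaded as a 3-channel color image. To work with instance IDs rather than colors, one option is to read the PNG as an index mask. The sketch below assumes the VOC convention of palette-indexed masks where 0 is background and 255 marks the void boundary; `instance_ids` is a hypothetical helper, and the toy mask stands in for a real file.

```python
def instance_ids(index_mask):
    """Return the sorted instance IDs in an index mask, skipping background (0) and void (255).

    Works on nested lists or on a 2-D array of palette indices.
    """
    found = set()
    for row in index_mask:
        for value in row:
            found.add(int(value))
    return sorted(found - {0, 255})

# Toy 3x4 index mask: two instances (1 and 2), background 0, one void pixel 255.
toy_mask = [
    [0, 0, 1, 1],
    [0, 255, 1, 2],
    [0, 0, 2, 2],
]
print(instance_ids(toy_mask))

# Assumption: with Pillow installed, a real mask could be read as indices with
#   numpy.array(PIL.Image.open(path))
# whereas cv2.imread(path) returns the palette colors instead of the indices.
```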

 

Parsing the annotation XML file to access its elements

import os
import random

VOC_ROOT_DIR ="../../data/voc/VOCdevkit/VOC2012/"
ANNO_DIR = os.path.join(VOC_ROOT_DIR, "Annotations") # path to the Annotations directory
IMAGE_DIR = os.path.join(VOC_ROOT_DIR, "JPEGImages")

xml_files = os.listdir(ANNO_DIR)                       
print(xml_files[:5]); print(len(xml_files))
['2008_007279.xml', '2010_005972.xml', '2012_003581.xml', '2008_004452.xml', '2009_003508.xml']
17125 (total number of annotation files)

 

 

# !pip install lxml
import os
import xml.etree.ElementTree as ET

xml_file = os.path.join(ANNO_DIR, '2007_000032.xml')

# Parse the XML file into an ElementTree
tree = ET.parse(xml_file)
root = tree.getroot()

# Image-level information lives in direct children of root
image_name = root.find('filename').text
full_image_name = os.path.join(IMAGE_DIR, image_name)
image_size = root.find('size') # find() returns the first matching child element
image_width = int(image_size.find('width').text)
image_height = int(image_size.find('height').text)

# Find every object element in the file (one per annotated object).
objects_list = []
for obj in root.findall('object'): # findall() returns a list of matching children
    # Find bndbox among the children of the object element.
    xmlbox = obj.find('bndbox')
    # Find xmin, ymin, xmax, ymax under bndbox and extract their values (text).
    x1 = int(xmlbox.find('xmin').text)
    y1 = int(xmlbox.find('ymin').text)
    x2 = int(xmlbox.find('xmax').text)
    y2 = int(xmlbox.find('ymax').text)

    bndbox_pos = (x1, y1, x2, y2)
    class_name = obj.find('name').text
    object_dict = {'class_name': class_name, 'bndbox_pos': bndbox_pos}
    objects_list.append(object_dict)

print('full_image_name:', full_image_name,'\n', 'image_size:', (image_width, image_height))

# Avoid naming the loop variable `object`, which would shadow the Python builtin.
for object_info in objects_list:
    print(object_info)
full_image_name: ../../data/voc/VOCdevkit/VOC2012/JPEGImages/2007_000032.jpg 
 image_size: (500, 281)
{'class_name': 'aeroplane', 'bndbox_pos': (104, 78, 375, 183)}
{'class_name': 'aeroplane', 'bndbox_pos': (133, 88, 197, 123)}
{'class_name': 'person', 'bndbox_pos': (195, 180, 213, 229)}
{'class_name': 'person', 'bndbox_pos': (26, 189, 44, 238)}
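
The parsing steps above can be wrapped in a reusable function. This is a sketch under stated assumptions: `parse_voc_annotation` and `SAMPLE_XML` are hypothetical names, and the XML string is a shortened copy of the annotation shown earlier, so the example runs without the dataset on disk.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened copy of 2007_000032.xml (two of the four objects), for illustration only.
SAMPLE_XML = """<annotation>
  <filename>2007_000032.jpg</filename>
  <size><width>500</width><height>281</height><depth>3</depth></size>
  <object>
    <name>aeroplane</name>
    <bndbox><xmin>104</xmin><ymin>78</ymin><xmax>375</xmax><ymax>183</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>195</xmin><ymin>180</ymin><xmax>213</xmax><ymax>229</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc_annotation(xml_text):
    """Parse one VOC-style annotation string into a plain dict."""
    root = ET.fromstring(xml_text)
    size = root.find('size')
    objects = []
    for obj in root.findall('object'):
        box = obj.find('bndbox')
        coords = tuple(int(box.find(tag).text) for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        objects.append({'class_name': obj.find('name').text, 'bndbox_pos': coords})
    return {
        'filename': root.find('filename').text,
        'size': (int(size.find('width').text), int(size.find('height').text)),
        'objects': objects,
    }

parsed = parse_voc_annotation(SAMPLE_XML)
class_counts = Counter(o['class_name'] for o in parsed['objects'])
print(parsed['filename'], parsed['size'], dict(class_counts))
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(xml_text)`.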

 

Visualizing bounding boxes using the bounding box information of the objects in the annotation

import cv2
import os
import xml.etree.ElementTree as ET

xml_file = os.path.join(ANNO_DIR, '2007_000032.xml')

tree = ET.parse(xml_file)
root = tree.getroot()

image_name = root.find('filename').text
full_image_name = os.path.join(IMAGE_DIR, image_name)

img = cv2.imread(full_image_name)
# cv2.rectangle() draws directly on the array it receives, so draw on a copy to keep the original intact.
draw_img = img.copy()
# OpenCV uses BGR rather than RGB, so red is (0, 0, 255)
green_color=(0, 255, 0)
red_color=(0, 0, 255)

# Find every object element in the file.
objects_list = []
for obj in root.findall('object'):
    xmlbox = obj.find('bndbox')
    
    left = int(xmlbox.find('xmin').text)
    top = int(xmlbox.find('ymin').text)
    right = int(xmlbox.find('xmax').text)
    bottom = int(xmlbox.find('ymax').text)
    
    class_name=obj.find('name').text
    
    # Draw a green box from the top-left to the bottom-right corner on draw_img
    cv2.rectangle(draw_img, (left, top), (right, bottom), color=green_color, thickness=1)
    # Write the class name in red just above the top-left corner on draw_img
    cv2.putText(draw_img, class_name, (left, top - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.4, red_color, thickness=1)

img_rgb = cv2.cvtColor(draw_img, cv2.COLOR_BGR2RGB)
plt.figure(figsize=(10, 10))
plt.imshow(img_rgb)
plt.show()


MS-COCO Dataset

The MS-COCO (Microsoft Common Objects in Context) dataset is a large-scale dataset for computer vision research.

  • It is used for a variety of vision tasks, including object detection, image segmentation, and image captioning.
  • Beyond object locations and classes, COCO also provides segmentation information for objects and descriptive captions for images.
  • It has 80 object categories, about 300K images, and about 1.5 million objects.
    • (An average of about 5 objects per image.)
  • The TensorFlow Object Detection API and many major open-source packages provide models pretrained on COCO.

 

MS-COCO Dataset Object Categories

 

MS-COCO Dataset Download

  • The MS-COCO dataset can be downloaded from the site below.
 

COCO - Common Objects in Context

 

cocodataset.org

 

COCO Explorer

On the site, choosing Explore under the Dataset menu lets you click an object category and browse the matching images.

 

MS-COCO Dataset Organization

This description is based on the COCO 2017 dataset.
  • Rather than one file per image, the annotations for all images in a split are stored in one JSON file (written as a single line).
  • The file is organized into the following top-level sections.
    • info: header information such as the creation date of the COCO dataset.
    • licenses: license information for the image files.
    • images: the id, file name, width, and height of every image.
    • annotations: per-object details such as the target image id, annotation id, segmentation, bounding box, and pixel area.
    • categories: the id, name, and group (supercategory) of each of the 80 object categories.
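
The sections above can be illustrated with a toy dictionary. Every value below is made up for illustration and is not copied from the real annotation files; only the key names and the `[x, y, width, height]` bounding box layout follow the COCO format. The sketch joins `images`, `annotations`, and `categories` through their ids.

```python
import json
from collections import defaultdict

# Toy data: all values are invented; only the structure mirrors a COCO instances file.
coco = {
    "info": {"description": "toy example", "year": 2017},
    "licenses": [{"id": 1, "name": "example license"}],
    "images": [
        {"id": 1, "file_name": "000000000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "iscrowd": 0,
         "bbox": [100.0, 50.0, 80.0, 200.0], "area": 16000.0,
         "segmentation": [[100.0, 50.0, 180.0, 50.0, 180.0, 250.0, 100.0, 250.0]]},
        {"id": 11, "image_id": 1, "category_id": 18, "iscrowd": 0,
         "bbox": [300.0, 200.0, 120.0, 90.0], "area": 10800.0,
         "segmentation": [[300.0, 200.0, 420.0, 200.0, 420.0, 290.0, 300.0, 290.0]]},
    ],
    "categories": [
        {"id": 1, "name": "person", "supercategory": "person"},
        {"id": 18, "name": "dog", "supercategory": "animal"},
    ],
}

# A real annotation file is a single JSON document; round-trip the toy dict the same way.
loaded = json.loads(json.dumps(coco))

# Index categories by id and group annotations by the image they belong to.
category_names = {c["id"]: c["name"] for c in loaded["categories"]}
anns_by_image = defaultdict(list)
for ann in loaded["annotations"]:
    anns_by_image[ann["image_id"]].append(ann)

for image in loaded["images"]:
    names = [category_names[a["category_id"]] for a in anns_by_image[image["id"]]]
    print(image["file_name"], names)
```

For a real file, `json.load(open(path))` would replace the round trip, and the `pycocotools` package offers a ready-made index over the same structure.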

 

MS-COCO JSON File Example

The example shows the attributes recorded for a single image.

 

MS-COCO Dataset Characteristics

Each COCO image contains multiple objects, which makes the data more challenging than other datasets.

Categories per image: how many distinct categories appear in one image
Instances per image: how many object instances appear in one image
Percentage of images: the share of images at each value
Number of categories: the number of object categories in the dataset
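
The first two quantities can be computed directly from a list of annotations. The sketch below uses invented toy annotations with only the `image_id` and `category_id` fields; `per_image_stats` is a hypothetical helper name.

```python
from collections import defaultdict

# Toy annotations: image 1 has two persons and one dog, image 2 has one car.
annotations = [
    {"image_id": 1, "category_id": 1},
    {"image_id": 1, "category_id": 1},
    {"image_id": 1, "category_id": 18},
    {"image_id": 2, "category_id": 3},
]

def per_image_stats(annotations):
    """Map each image id to (distinct categories, object instances)."""
    instances = defaultdict(int)
    categories = defaultdict(set)
    for ann in annotations:
        instances[ann["image_id"]] += 1
        categories[ann["image_id"]].add(ann["category_id"])
    return {img: (len(categories[img]), instances[img]) for img in instances}

stats = per_image_stats(annotations)
for image_id, (n_categories, n_instances) in stats.items():
    print(f"image {image_id}: {n_categories} categories, {n_instances} instances")
```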
  • Diverse object classes
    • COCO contains 80 object classes.
    • The classes include person, bicycle, car, dog, cat, dining table, and chair, among others.
  • Rich annotations
    • Object detection: bounding box coordinates and class labels for the objects in each image.
    • Image segmentation: pixel-level segmentation of objects (in polygon format).
    • Keypoint detection: locations of key human body parts (for example, eyes, ears, and shoulders).
    • Image captioning: several natural-language descriptions per image.
  • Large-scale dataset
    • COCO contains hundreds of thousands of images, each with multiple objects, providing rich training data.
    • It is split into train, val, and test sets.
  • Complex scenes
    • COCO includes complex real-world scenes in which objects appear at varied sizes and shapes and interact with one another.
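
One practical difference between the two datasets is how bounding boxes are stored: PASCAL VOC uses corner coordinates (`xmin`, `ymin`, `xmax`, `ymax`), while COCO uses `[x, y, width, height]` with the top-left corner as the origin. The conversion can be sketched as follows; the function names are hypothetical, and the sketch ignores any one-pixel offset between the two conventions.

```python
def coco_to_voc_box(bbox):
    """Convert a COCO [x, y, width, height] box to VOC-style (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)

def voc_to_coco_box(box):
    """Convert a VOC-style (xmin, ymin, xmax, ymax) box to COCO [x, y, width, height]."""
    xmin, ymin, xmax, ymax = box
    return [xmin, ymin, xmax - xmin, ymax - ymin]

# The first aeroplane box from 2007_000032.xml, expressed both ways.
voc_box = (104, 78, 375, 183)
coco_box = voc_to_coco_box(voc_box)
print(coco_box)
print(coco_to_voc_box(coco_box))
```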