A A
[PyTorch] Model ์ €์žฅ & ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
โš ๏ธ ๋ณธ ๋‚ด์šฉ์€ PyTorch Korea์˜ ๊ณต์‹ ๋ฌธ์„œ์— ๊ธฐ๋ฐ˜ํ•˜์—ฌ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ ์€๊ฒƒ์ด๋‹ˆ ์–‘ํ•ด๋ฐ”๋ž๋‹ˆ๋‹ค!
 

๋ชจ๋ธ ์ €์žฅํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

ํŒŒ์ดํ† ์น˜(PyTorch) ๊ธฐ๋ณธ ์ตํžˆ๊ธฐ|| ๋น ๋ฅธ ์‹œ์ž‘|| ํ…์„œ(Tensor)|| Dataset๊ณผ Dataloader|| ๋ณ€ํ˜•(Transform)|| ์‹ ๊ฒฝ๋ง ๋ชจ๋ธ ๊ตฌ์„ฑํ•˜๊ธฐ|| Autograd|| ์ตœ์ ํ™”(Optimization)|| ๋ชจ๋ธ ์ €์žฅํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ์ด๋ฒˆ ์žฅ์—์„œ๋Š” ์ €์žฅํ•˜๊ธฐ๋‚˜

tutorials.pytorch.kr


Model ์ €์žฅํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

์ด๋ฒˆ์—๋Š” ์ €์žฅ or ๋ถˆ๋Ÿฌ์˜ค๊ธฐ๋ฅผ ํ†ตํ•ด ๋ชจ๋ธ์˜ ์ƒํƒœ ์œ ์ง€(persist)๋ฐ ๋ชจ๋ธ์˜ ์˜ˆ์ธก์„ ์‹œํ–‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

import torch
import torchvision.models as models

Model Weight(๊ฐ€์ค‘์น˜) ์ €์žฅํ•˜๊ณ  ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

  • PyTorch ๋ชจ๋ธ์€ ํ•™์Šตํ•œ ๋งค๊ฐœ๋ณ€์ˆ˜๋ฅผ state_dict๋ผ๊ณ  ๋ถˆ๋ฆฌ๋Š” ๋‚ด๋ถ€ ์ƒํƒœ ์‚ฌ์ „(internal state dictionary)์— ์ €์žฅํ•ฉ๋‹ˆ๋‹ค.
  • ์ด ์ƒํƒœ ๊ฐ’๋“ค์€ torch.save ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ €์žฅ(persist)ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
model = models.vgg16(weights='IMAGENET1K_V1')
torch.save(model.state_dict(), 'model_weights.pth')
Downloading: "https://download.pytorch.org/models/vgg16-397923af.pth" to /root/.cache/torch/hub/checkpoints/vgg16-397923af.pth
100%|โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ| 528M/528M [00:07<00:00, 69.5MB/s]
  • ๋ชจ๋ธ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ์œ„ํ•ด์„œ๋Š”, ๋จผ์ € ๋™์ผํ•œ ๋ชจ๋ธ์˜ ์ธ์Šคํ„ด์Šค(instance)๋ฅผ ์ƒ์„ฑํ•œ ๋‹ค์Œ์— load_state_dict() ๋ฉ”์†Œ๋“œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋งค๊ฐœ๋ณ€์ˆ˜๋“ค์„ ๋ถˆ๋Ÿฌ์˜ต๋‹ˆ๋‹ค.
model = models.vgg16() # ์—ฌ๊ธฐ์„œ๋Š” ``weights`` ๋ฅผ ์ง€์ •ํ•˜์ง€ ์•Š์•˜์œผ๋ฏ€๋กœ, ํ•™์Šต๋˜์ง€ ์•Š์€ ๋ชจ๋ธ์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.
model.load_state_dict(torch.load('model_weights.pth'))
model.eval()
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
  • ๋งŒ์•ฝ์—, ์‚ฌ์ „์— ํ•™์Šต๋œ Weight(๊ฐ€์ค‘์น˜)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ๋กœ๋“œ ํ•œ๋‹ค๊ณ  ํ•˜๋ฉด? ์•„๋ž˜์˜ ์ฝ”๋“œ์ฒ˜๋Ÿผ ์ž‘์„ฑํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.
# ์‚ฌ์ „ ํ•™์Šต๋œ ๊ฐ€์ค‘์น˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ ๋กœ๋“œ
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# ๋ชจ๋ธ์„ ํ‰๊ฐ€ ๋ชจ๋“œ๋กœ ์„ค์ •
model.eval()
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Note

์ถ”๋ก (inference)์„ ํ•˜๊ธฐ ์ „์— model.eval() ๋ฉ”์†Œ๋“œ๋ฅผ ํ˜ธ์ถœํ•˜์—ฌ ๋“œ๋กญ์•„์›ƒ(dropout)๊ณผ ๋ฐฐ์น˜ ์ •๊ทœํ™”(batch normalization)๋ฅผ ํ‰๊ฐ€ ๋ชจ๋“œ(evaluation mode)๋กœ ์„ค์ •ํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ๋ ‡์ง€ ์•Š์œผ๋ฉด ์ผ๊ด€์„ฑ ์—†๋Š” ์ถ”๋ก  ๊ฒฐ๊ณผ๊ฐ€ ์ƒ์„ฑ๋ฉ๋‹ˆ๋‹ค.

๋ชจ๋ธ์˜ ํ˜•ํƒœ๋ฅผ ํฌํ•จํ•˜์—ฌ ์ €์žฅ & ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

๋ชจ๋ธ์˜ ๊ฐ€์ค‘์น˜๋ฅผ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ, ์‹ ๊ฒฝ๋ง์˜ ๊ตฌ์กฐ๋ฅผ ์ •์˜ํ•˜๊ธฐ ์œ„ํ•ด ๋ชจ๋ธ ํด๋ž˜์Šค๋ฅผ ๋จผ์ € ์ƒ์„ฑ(instantiate)ํ•ด์•ผ ํ–ˆ์Šต๋‹ˆ๋‹ค.
  • ์ด ํด๋ž˜์Šค์˜ ๊ตฌ์กฐ๋ฅผ ๋ชจ๋ธ๊ณผ ํ•จ๊ป˜ ์ €์žฅํ•˜๊ณ  ์‹ถ์œผ๋ฉด, (model.state_dict()๊ฐ€ ์•„๋‹Œ) model ์„ ์ €์žฅ ํ•จ์ˆ˜์— ์ „๋‹ฌํ•ฉ๋‹ˆ๋‹ค.
torch.save(model, 'model.pth')
  • ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
model = torch.load('model.pth')
model
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace=True)
    (2): Dropout(p=0.5, inplace=False)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Dropout(p=0.5, inplace=False)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

Note

์ด ์ ‘๊ทผ ๋ฐฉ์‹์€ Python pickle ๋ชจ๋“ˆ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ชจ๋ธ์„ ์ง๋ ฌํ™”(serialize)ํ•˜๋ฏ€๋กœ, ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ ์‹ค์ œ ํด๋ž˜์Šค ์ •์˜(definition)๋ฅผ ์ ์šฉ(rely on)ํ•ฉ๋‹ˆ๋‹ค.