I kept telling myself I should read more papers... so I finally worked up the courage to read one and implement what I read in code.
VGGNet Review
My review of the paper is linked below!
VGGNet Architecture
Now let's implement VGGNet in code. I implemented configuration D from the paper, i.e., VGG16.
- Image input - 224 x 224 RGB
- Convolution stride - fixed at 1 pixel
- 3 x 3 convolution x 2 (64 channels)
- Max pooling - 2 x 2 window, stride 2
- 3 x 3 convolution x 2 (128 channels)
- Max pooling - 2 x 2 window, stride 2
- 3 x 3 convolution x 3 (256 channels)
- Max pooling - 2 x 2 window, stride 2
- 3 x 3 convolution x 3 (512 channels)
- Max pooling - 2 x 2 window, stride 2
- 3 x 3 convolution x 3 (512 channels)
- Max pooling - 2 x 2 window, stride 2
- FC (fully connected) layer - 4096, ReLU
- FC (fully connected) layer - 4096, ReLU
- FC (fully connected) layer - 1000, SoftMax
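A quick check on the spatial dimensions (my own arithmetic, not from the paper text): each of the five 2 x 2 max-pooling layers halves the feature map, so 224 → 112 → 56 → 28 → 14 → 7. The last convolutional block therefore outputs a 512 x 7 x 7 tensor, which is why the first FC layer takes 512 × 7 × 7 = 25,088 inputs; this number shows up again in the model code and the torchsummary output below.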
Why use 3 x 3 filters here? Stacking small filters makes the network deeper and adds more non-linearity (one ReLU per conv layer) while using fewer parameters than one large filter... (that is the short version). A small illustration follows below.
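As a concrete illustration of the parameter saving (my own example, not code from the paper), two stacked 3 x 3 convolutions cover the same 5 x 5 receptive field as a single 5 x 5 convolution, but with fewer weights and an extra ReLU in between:

import torch.nn as nn

in_ch = out_ch = 64

# Two stacked 3x3 convolutions: same 5x5 receptive field, two non-linearities
stacked_3x3 = nn.Sequential(
    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
)

# One 5x5 convolution: same receptive field, a single non-linearity
single_5x5 = nn.Sequential(nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2), nn.ReLU())

count_params = lambda m: sum(p.numel() for p in m.parameters())
print(count_params(stacked_3x3))  # 2 * (64*64*3*3 + 64) = 73,856
print(count_params(single_5x5))   # 64*64*5*5 + 64 = 102,464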
What We Need to Implement VGG16
Let's first lay out what is needed to implement VGG16.
To implement VGG16, we need to define the network architecture, compile the model, and set up procedures for training and evaluation.
1. Data preparation
Split the data into training, validation, and test sets, and preprocess it into the form the model expects (see the validation-split sketch after this list).
2. Model architecture definition
- Input layer: the size of the input images (e.g., 224x224x3)
- Convolutional layers: several convolutional layers (3x3 filters)
- Pooling layers: mostly max-pooling layers (2x2 pooling)
- Fully connected (FC) layers: typically 2-3 fully connected layers
- Output layer: an output layer with a Softmax activation (e.g., one neuron per class)
3. Model compilation
- Loss function: for classification, categorical_crossentropy (cross-entropy) is the usual choice.
- Optimizer: for example Adam, RMSprop, or SGD.
- Metrics: set evaluation metrics such as accuracy.
4. Model training
Train the model on the training data. This includes hyperparameter choices such as the batch size and the number of epochs.
5. Model evaluation and prediction
Use the trained model to make predictions on new data, and evaluate its performance on the validation and test data.
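Step 1 mentions a validation set, but the implementation below only uses CIFAR-10's built-in train/test split. If you wanted a proper validation set, one simple sketch (my addition, assuming the `trainset` defined in the data-loading section below) is to carve it out of the training data with torch.utils.data.random_split:

from torch.utils.data import DataLoader, random_split

# Hold out 10% of the training set for validation
val_size = len(trainset) // 10
train_size = len(trainset) - val_size
train_subset, val_subset = random_split(trainset, [train_size, val_size])

trainloader_split = DataLoader(train_subset, batch_size=64, shuffle=True, num_workers=2)
valloader = DataLoader(val_subset, batch_size=64, shuffle=False, num_workers=2)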
VGG16 Implementation in PyTorch
Now let's implement it in PyTorch.
Loading the Libraries and Data
The code for the actual paper uses a dataset with 1,000 classes (ImageNet). Since I'm running the model locally, I used the CIFAR-10 dataset, which has only 10 classes, instead.
Let's load the required libraries and the dataset and run the preprocessing.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
# Dataset preprocessing
transform = transforms.Compose([
transforms.Resize((224, 224)),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
# CIFAR-10 Dataset download & load
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
The input images are resized to 224 x 224 because the paper specifies a 224 x 224 RGB input, so CIFAR-10's smaller 32 x 32 images have to be resized before being fed to the network.
# Inspect one sample
index = 1  # index of the sample to inspect
image, label = trainset[index]  # separate the image tensor and the label
print(image.shape, classes[label])  # the transform resizes every image to (3, 224, 224)

# Convert the tensor to a numpy array for visualization
image_np = image.numpy().transpose((1, 2, 0))  # (C, H, W) -> (H, W, C)
image_np = image_np * 0.5 + 0.5  # undo the normalization so the colors display correctly

# Show the image
plt.imshow(image_np)
plt.show()
VGG16 Model Code
Below is the model code.
import torch.nn as nn
class VGG16(nn.Module):
    def __init__(self):
        super(VGG16, self).__init__()
        self.features = nn.Sequential(
            # Block 1 (2 x 3x3 convolution, 64 filters)
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),    # (3, 224, 224) -> (64, 224, 224)
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 2x2 max pooling, stride 2 -> (64, 112, 112)
            # Block 2 (2 x 3x3 convolution, 128 filters)
            nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1),  # (64, 112, 112) -> (128, 112, 112)
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 2x2 max pooling, stride 2 -> (128, 56, 56)
            # Block 3 (3 x 3x3 convolution, 256 filters)
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1), # (128, 56, 56) -> (256, 56, 56)
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 2x2 max pooling, stride 2 -> (256, 28, 28)
            # Block 4 (3 x 3x3 convolution, 512 filters)
            nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1), # (256, 28, 28) -> (512, 28, 28)
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 2x2 max pooling, stride 2 -> (512, 14, 14)
            # Block 5 (3 x 3x3 convolution, 512 filters)
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1), # (512, 14, 14) -> (512, 14, 14)
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),                   # 2x2 max pooling, stride 2 -> (512, 7, 7)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096),  # first FC layer (4096)
            nn.ReLU(inplace=True),
            nn.Linear(4096, 4096),         # second FC layer (4096)
            nn.ReLU(inplace=True),
            nn.Linear(4096, 10),           # third FC layer: 1000 in the paper, 10 here because CIFAR-10 has 10 classes
        )

    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return x
model = VGG16()
print(model)
VGG16(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (6): ReLU(inplace=True)
    (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): ReLU(inplace=True)
    (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): ReLU(inplace=True)
    (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (20): ReLU(inplace=True)
    (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (22): ReLU(inplace=True)
    (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (25): ReLU(inplace=True)
    (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (27): ReLU(inplace=True)
    (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (29): ReLU(inplace=True)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=25088, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace=True)
    (5): Linear(in_features=4096, out_features=10, bias=True)
  )
)
In the original architecture, SoftMax is applied after the last FC layer. It is not added here because loss functions such as nn.CrossEntropyLoss already include the softmax computation, so there is no need to apply it explicitly in the model's final layer.
The model outputs raw logits, and the loss function applies SoftMax internally to compute the class probabilities.
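If you do want class probabilities at inference time, you can apply softmax to the logits yourself. A minimal sketch (assuming a batch `images`, plus the `model` and `device` defined in the next section):

import torch.nn.functional as F

model.eval()
with torch.no_grad():
    logits = model(images.to(device))  # raw scores of shape (batch_size, 10)
    probs = F.softmax(logits, dim=1)   # class probabilities; each row sums to 1
    preds = probs.argmax(dim=1)        # predicted class indices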
Model Compile
Here we define the loss function, the optimizer, and so on.
# Initialize the model on the available device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = VGG16().to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()  # cross-entropy loss (applies softmax internally)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)  # SGD optimizer with momentum
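For reference, the paper trains with SGD using momentum 0.9, weight decay 5e-4, and an initial learning rate of 0.01 that is divided by 10 when the validation accuracy stops improving. A rough sketch of that setup (the variable names and the patience value are mine; this is not what was used for the runs below):

# Closer to the paper's schedule: SGD + momentum + weight decay, lr divided by 10 on plateau
paper_optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(paper_optimizer, mode='max', factor=0.1, patience=2)
# call scheduler.step(validation_accuracy) once per epoch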
Defining the Training and Evaluation Functions
Next we define a function that trains the model on the training data and one that evaluates it on the test data. The batch size (64) was already fixed in the DataLoader, and the number of epochs is set further below.
def train(model, device, train_loader, optimizer, epoch):
    model.train()  # switch the model to training mode
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)  # move the batch to the device
        optimizer.zero_grad()             # reset gradients from the previous step
        output = model(data)              # forward pass
        loss = criterion(output, target)  # compute the loss (uses the global criterion defined above)
        loss.backward()                   # backpropagate to compute gradients
        optimizer.step()                  # update the weights
        train_loss += loss.item()         # accumulate the batch loss
        # accumulate training accuracy
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()
        total += target.size(0)
        if batch_idx % 100 == 0:  # log every 100th batch
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')
    train_loss /= len(train_loader)  # average loss per batch
    train_accuracy = 100. * correct / total
    return train_loss, train_accuracy
# Define the evaluation function
def test(model, device, test_loader):
    model.eval()  # switch the model to evaluation mode
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients are needed during evaluation
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()           # accumulate the batch loss
            pred = output.argmax(dim=1, keepdim=True)               # class with the highest score
            correct += pred.eq(target.view_as(pred)).sum().item()   # count correct predictions
    test_loss /= len(test_loader)  # average loss per batch (criterion already averages within a batch)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    print(f'\nTest set: Average loss: {test_loss:.4f}, Accuracy: {correct}/{len(test_loader.dataset)} '
          f'({test_accuracy:.0f}%)\n')
    return test_loss, test_accuracy
I wanted to train for as many epochs as the paper does, but because of how long that would take, I'll only train for 10 epochs.
epochs = 10
train_losses, test_losses, train_accuracies, test_accuracies = [], [], [], []
# Training loop
for epoch in range(1, epochs + 1):
    train_loss, train_accuracy = train(model, device, trainloader, optimizer, epoch)
    test_loss, test_accuracy = test(model, device, testloader)
    train_losses.append(train_loss)
    test_losses.append(test_loss)
    train_accuracies.append(train_accuracy)
    test_accuracies.append(test_accuracy)
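As a quick sanity check after training (my own addition, not part of the original run), you can push a single test image through the trained model and compare the prediction with the ground-truth label:

# Predict the class of one test image with the trained model
model.eval()
image, label = testset[0]
with torch.no_grad():
    output = model(image.unsqueeze(0).to(device))  # add a batch dimension: (1, 3, 224, 224)
    pred = output.argmax(dim=1).item()
print(f'Predicted: {classes[pred]}, Actual: {classes[label]}')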
Model Evaluation and Prediction
Let's look at the model's overall architecture with torchsummary, and then plot the accuracy curves to see how much over- or underfitting occurred.
from torchsummary import summary
summary(model, input_size=(3, 224, 224))
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 224, 224] 1,792
ReLU-2 [-1, 64, 224, 224] 0
Conv2d-3 [-1, 64, 224, 224] 36,928
ReLU-4 [-1, 64, 224, 224] 0
MaxPool2d-5 [-1, 64, 112, 112] 0
Conv2d-6 [-1, 128, 112, 112] 73,856
ReLU-7 [-1, 128, 112, 112] 0
Conv2d-8 [-1, 128, 112, 112] 147,584
ReLU-9 [-1, 128, 112, 112] 0
MaxPool2d-10 [-1, 128, 56, 56] 0
Conv2d-11 [-1, 256, 56, 56] 295,168
ReLU-12 [-1, 256, 56, 56] 0
Conv2d-13 [-1, 256, 56, 56] 590,080
ReLU-14 [-1, 256, 56, 56] 0
Conv2d-15 [-1, 256, 56, 56] 590,080
ReLU-16 [-1, 256, 56, 56] 0
MaxPool2d-17 [-1, 256, 28, 28] 0
Conv2d-18 [-1, 512, 28, 28] 1,180,160
ReLU-19 [-1, 512, 28, 28] 0
Conv2d-20 [-1, 512, 28, 28] 2,359,808
ReLU-21 [-1, 512, 28, 28] 0
Conv2d-22 [-1, 512, 28, 28] 2,359,808
ReLU-23 [-1, 512, 28, 28] 0
MaxPool2d-24 [-1, 512, 14, 14] 0
Conv2d-25 [-1, 512, 14, 14] 2,359,808
ReLU-26 [-1, 512, 14, 14] 0
Conv2d-27 [-1, 512, 14, 14] 2,359,808
ReLU-28 [-1, 512, 14, 14] 0
Conv2d-29 [-1, 512, 14, 14] 2,359,808
ReLU-30 [-1, 512, 14, 14] 0
MaxPool2d-31 [-1, 512, 7, 7] 0
Flatten-32 [-1, 25088] 0
Linear-33 [-1, 4096] 102,764,544
ReLU-34 [-1, 4096] 0
Linear-35 [-1, 4096] 16,781,312
ReLU-36 [-1, 4096] 0
Linear-37 [-1, 10] 40,970
================================================================
Total params: 134,301,514
Trainable params: 134,301,514
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.71
Params size (MB): 512.32
Estimated Total Size (MB): 731.60
----------------------------------------------------------------
# Plot the train/test accuracy curves
plt.plot(range(1, epochs + 1), train_accuracies, label='Train Accuracy')
plt.plot(range(1, epochs + 1), test_accuracies, label='Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.title('Train and Test Accuracy over Epochs')
plt.legend()
plt.show()
Looking at the resulting graph, you can see that the model overfits: the training accuracy keeps climbing while the test accuracy flattens out.
I think this is because, with a batch size of 64 and 10 epochs, the model ends up passing over the training data enough times to start memorizing it.
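One likely contributor worth pointing out: the original paper regularizes the first two fully connected layers with dropout (ratio 0.5), which this implementation leaves out. Below is a sketch of the classifier head with dropout added as in the paper; the variable name is mine and I have not re-run the experiments with this change, so its effect on the curves here is untested.

import torch.nn as nn

# Classifier head with dropout on the two 4096-unit FC layers, as described in the paper
classifier_with_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),   # dropout after the first FC layer
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),   # dropout after the second FC layer
    nn.Linear(4096, 10),
)
# e.g. model.classifier = classifier_with_dropout, then re-train from scratch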