[DL] Implementing Simple Layers

 

In this post, we will start by implementing some simple layers.
  • The multiplication node of the computational graph from the previous post is implemented under the name 'MulLayer', and the addition node under the name 'AddLayer'.

Multiplication Layer

  • Every layer is implemented with a common interface of two methods: forward() and backward().
  • forward() handles forward propagation, and backward() handles back propagation.
  • Let's implement it.
# coding: utf-8

class MulLayer:
    def __init__(self):
        self.x = None
        self.y = None

    # take x and y as arguments and return their product
    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y

        return out

    # multiply the upstream derivative (dout) by the values from the
    # forward pass, swapped, and pass the results downstream
    def backward(self, dout):
        dx = dout * self.y  # swap x and y
        dy = dout * self.x

        return dx, dy
  • The instance variables x and y are initialized; these two variables are used to keep the input values from the forward pass.
  • forward() takes x and y as arguments, multiplies them, and returns the product.
  • backward(), on the other hand, multiplies the derivative from upstream (dout) by the forward-pass values 'swapped', and passes the results downstream.

  • Using MulLayer, the forward pass can be implemented as follows.
# coding: utf-8
from layer_naive import *

apple = 100
apple_num = 2
tax = 1.1

mul_apple_layer = MulLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
price = mul_tax_layer.forward(apple_price, tax)

print(price) # 220.00000000000003
  • The derivative with respect to each variable can be obtained from backward().
# backward

dprice = 1
dapple_price, dtax = mul_tax_layer.backward(dprice)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)

print(dapple, dapple_num, dtax) # 2.2 110 200
  • backward() is called in the reverse order of the forward() calls.
  • Also, note that the argument backward() takes is 'the derivative with respect to the forward-pass output'.

Addition Layer

class AddLayer:
    def __init__(self):
        pass

    # add the two inputs x and y and return the sum
    def forward(self, x, y):
        out = x + y
        return out

    # pass the upstream derivative (dout) downstream unchanged
    def backward(self, dout):
        dx = dout * 1
        dy = dout * 1
        return dx, dy
  • The addition layer needs no initialization, so __init__() does nothing.
    • That is, pass is a statement that does nothing.
  • The addition layer's forward() adds the two inputs x and y and returns the sum.
  • backward() simply passes the derivative from upstream (dout) downstream unchanged.
Now let's use the addition and multiplication layers to implement the scenario of buying 2 apples and 3 oranges.

# coding: utf-8
from layer_naive import *

apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1

# layer
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()

# forward
apple_price = mul_apple_layer.forward(apple, apple_num)  # (1)
orange_price = mul_orange_layer.forward(orange, orange_num)  # (2)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # (3)
price = mul_tax_layer.forward(all_price, tax)  # (4)

# backward
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)  # (4)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)  # (3)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)  # (2)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # (1)

print("price:", int(price))
print("dApple:", dapple)
print("dApple_num:", int(dapple_num))
print("dOrange:", dorange)
print("dOrange_num:", int(dorange_num))
print("dTax:", dtax)
  • Each individual statement is simple. We create the layers we need and call the forward-propagation method forward() in the appropriate order.
  • Then calling the back-propagation method backward() in the reverse order of the forward pass yields the derivatives we want.

Model Code (by Python)

# Multiplication layer
class MulLayer:
    def __init__(self):
        self.x = None  # initialized to keep the forward-pass input
        self.y = None  # initialized to keep the forward-pass input

    def forward(self, x, y):
        self.x = x  # store the x input from the forward pass
        self.y = y  # store the y input from the forward pass
        out = x * y  # return the product of the two inputs
        return out

    # multiply the upstream derivative (dout) by the forward-pass values,
    # swapped, and pass the results downstream
    def backward(self, dout):
        dx = dout * self.y  # derivative with respect to x
        dy = dout * self.x  # derivative with respect to y
        return dx, dy  # return the derivatives


# Addition layer
class AddLayer:
    def __init__(self):
        pass  # the addition layer needs no initialization

    def forward(self, x, y):
        out = x + y  # return the sum of the two inputs
        return out

    # pass the upstream derivative (dout) downstream unchanged
    def backward(self, dout):
        dx = dout * 1  # derivative with respect to x
        dy = dout * 1  # derivative with respect to y
        return dx, dy  # return the derivatives


if __name__ == '__main__':
    # Problem 1: apple price example
    apple = 100  # price of one apple
    apple_num = 2  # number of apples
    tax = 1.1  # tax

    # create layers
    mul_apple_layer = MulLayer()  # multiplication layer for the apple price
    mul_tax_layer = MulLayer()  # multiplication layer for the tax

    # forward pass
    apple_price = mul_apple_layer.forward(apple, apple_num)  # apple price
    price = mul_tax_layer.forward(apple_price, tax)  # final price

    print(price)  # print the final price

    # backward pass
    dprice = 1  # derivative with respect to the price
    dapple_price, dtax = mul_tax_layer.backward(dprice)  # backprop through the tax layer
    dapple, dapple_num = mul_apple_layer.backward(dapple_price)  # backprop through the apple layer

    print(dapple, dapple_num, dtax)  # print the derivatives

    # Problem 2: apple and orange price example
    orange = 150  # price of one orange
    orange_num = 3  # number of oranges

    # recreate the layers and add the new ones
    mul_apple_layer = MulLayer()  # multiplication layer for the apple price (reused)
    mul_orange_layer = MulLayer()  # multiplication layer for the orange price
    add_apple_orange_layer = AddLayer()  # addition layer summing apple and orange prices
    mul_tax_layer = MulLayer()  # multiplication layer for the tax (reused)

    # forward pass
    apple_price = mul_apple_layer.forward(apple, apple_num)  # apple price
    orange_price = mul_orange_layer.forward(orange, orange_num)  # orange price
    all_price = add_apple_orange_layer.forward(apple_price, orange_price)  # total fruit price
    price = mul_tax_layer.forward(all_price, tax)  # final price with tax

    print(price)

    # backward pass
    dprice = 1
    dall_price, dtax = mul_tax_layer.backward(dprice)
    dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
    dorange, dorange_num = mul_orange_layer.backward(dorange_price)
    dapple, dapple_num = mul_apple_layer.backward(dapple_price)

    print(dapple, dapple_num, dorange, dorange_num, dtax)

Implementing Activation Function Layers

Let's apply computational graphs to a neural network. Each layer that makes up the network is implemented as a single class.

ReLU Layer

  • The equation for ReLU, used as an activation function, is as follows.
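\[
y = \begin{cases} x & (x > 0) \\ 0 & (x \le 0) \end{cases}
\]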

  • The derivative of y with respect to x is obtained as in the equation below.
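\[
\frac{\partial y}{\partial x} = \begin{cases} 1 & (x > 0) \\ 0 & (x \le 0) \end{cases}
\]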

  • ์œ„์˜ ์ˆ˜์‹๊ณผ ๊ฐ™์ด, Forward Propagation(์ˆœ์ „ํŒŒ) ๋•Œ์˜ ์ž…๋ ฅ์ธ x๊ฐ€ 0๋ณด๋‹ค ํฌ๋ฉด Back Propagation(์—ญ์ „ํŒŒ)๋Š” ์ƒ๋ฅ˜์˜ ๊ฐ’์„ ๊ทธ๋Œ€๋กœ ํ•˜๋ฅ˜๋กœ ํ˜๋ฆฝ๋‹ˆ๋‹ค.
  • ๋‹ค๋งŒ, Forward Propagation(์ˆœ์ „ํŒŒ) ๋•Œ x๊ฐ€ 0 ์ดํ•˜๋ฉด Back Propagation(์—ญ์ „ํŒŒ) ๋•Œ๋Š” ํ•˜๋ฅ˜๋กœ ์‹ ํ˜ธ๋ฅผ ๋ณด๋‚ด์ง€ ์•Š์Šต๋‹ˆ๋‹ค. (0์„ ๋ณด๋ƒ…๋‹ˆ๋‹ค.) ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋กœ๋Š” ์•„๋ž˜์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ ๊ทธ๋ฆฝ๋‹ˆ๋‹ค.

  • Now let's implement the ReLU layer.
class Relu:
    def __init__(self):
        self.mask = None  # initialize the variable that records whether each input is 0 or less

    def forward(self, x):
        self.mask = (x <= 0)  # build an array that is True where x is 0 or less, False elsewhere
        out = x.copy()  # make a copy of the input x
        out[self.mask] = 0  # set the elements where mask is True (x is 0 or less) to 0

        return out  # return the result of applying the activation function

    def backward(self, dout):
        dout[self.mask] = 0  # gradients for elements that were 0 or less in the forward pass become 0
        dx = dout  # the remaining elements pass dout through unchanged

        return dx  # return the derivative with respect to the input
  • The Relu class has an instance variable called mask.
  • mask is a NumPy array of True/False values: the indices where the forward-pass input x is 0 or less are kept as True, and the rest (elements greater than 0) as False.
  • In other words, the mask variable holds a NumPy array of True/False values.
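For instance, a minimal sketch of how mask behaves (the array values here are arbitrary):

import numpy as np

x = np.array([[1.0, -0.5], [-2.0, 3.0]])
mask = (x <= 0)  # True wherever the element is 0 or less
print(mask)
# [[False  True]
#  [ True False]]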

Sigmoid Layer

  • The sigmoid function is given by the following equation.
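\[
y = \frac{1}{1 + \exp(-x)}
\]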

  • ์œ„์˜ ์‹์„ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋กœ ๊ทธ๋ฆฌ๋ฉด ์•„๋ž˜์˜ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ ๋ฉ๋‹ˆ๋‹ค.

Sigmoid ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„ (Forward Propagation - ์ˆœ์ „ํŒŒ)

  • Besides the '×' and '+' nodes, there are 'exp' and '/' nodes: the 'exp' node computes y = exp(x), and the '/' node computes y = 1/x.
  • The computation proceeds as a propagation of 'local computations'. Now let's trace the back-propagation flow of the graph above one step at a time, from right to left.

Step 1

  • Differentiating the '/' node, y = 1/x, gives the following equation.
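\[
\frac{\partial y}{\partial x} = -\frac{1}{x^2} = -y^2
\]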

  • During back propagation, the value flowing from upstream is multiplied by -y**2 (the forward-pass output squared, with a minus sign) and passed downstream.
  • In the computational graph it looks like this.

Step 2

  • The '+' node does nothing more than pass the upstream value downstream unchanged.

Step 3

  • The 'exp' node computes y = exp(x), and its derivative is as follows.
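\[
\frac{\partial y}{\partial x} = \exp(x)
\]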

  • In the computational graph, the upstream value is multiplied by the forward-pass output (exp(-x) in this example) and propagated downstream.

Step 4

  • The '×' node multiplies by the forward-pass values 'swapped'. In this example, that means multiplying by -1.

  • With the steps above, the computational graph for the Sigmoid layer's back propagation is complete.
  • All the intermediate steps of this graph can be grouped and replaced by a single, simpler 'sigmoid' node.

Sigmoid ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„ (๊ฐ„์†Œํ™” ๋ฒ„์ „)

  • The full computational graph and the simplified version produce the same result.
  • However, the simplified version omits the intermediate steps of back propagation, so it is a more efficient computation.
  • Another good point is that by grouping the nodes we can hide the fine details of the Sigmoid layer and focus only on its input and output.

Simplifying the Derivative

  • As shown below, the Sigmoid layer's back propagation can be computed from the forward-pass output y alone.
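Written out, the simplification goes like this (the standard sigmoid derivation):

\[
\frac{\partial L}{\partial y} y^2 \exp(-x)
= \frac{\partial L}{\partial y} \frac{1}{(1 + \exp(-x))^2} \exp(-x)
= \frac{\partial L}{\partial y} \frac{1}{1 + \exp(-x)} \frac{\exp(-x)}{1 + \exp(-x)}
= \frac{\partial L}{\partial y} y(1 - y)
\]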

Sigmoid ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„: ์ˆœ์ „ํŒŒ์˜ ์ถœ๋ ฅ y๋งŒ์œผ๋กœ ์—ญ์ „ํŒŒ๋ฅผ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๋‹ค.

  • Now let's implement the Sigmoid layer in Python.
  • Here the forward-pass output is kept in the instance variable out, and that value is used during back propagation.
import numpy as np

class Sigmoid:
    def __init__(self):
        self.out = None  # initialize the variable that stores the forward-pass output

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))  # apply the sigmoid function
        self.out = out  # store the forward-pass result; it is used in the backward pass
        return out  # return the result of applying the activation function

    def backward(self, dout):
        dx = dout * (1.0 - self.out) * self.out  # apply the derivative of the sigmoid function
        return dx  # return the derivative with respect to the input
        
# dout์€ ์ƒ๋ฅ˜(๋‹ค์Œ ๊ณ„์ธต)์—์„œ ๋„˜์–ด์˜จ ๋ฏธ๋ถ„๊ฐ’์ž…๋‹ˆ๋‹ค.
# ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ๋ฏธ๋ถ„์€ y(1-y)์ด๋ฉฐ, ์—ฌ๊ธฐ์„œ y๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ์ถœ๋ ฅ๊ฐ’์ž…๋‹ˆ๋‹ค.
# ๋”ฐ๋ผ์„œ, self.out์ด y์— ํ•ด๋‹นํ•˜๊ณ , (1.0 - self.out) * self.out์ด y(1-y)์— ํ•ด๋‹นํ•ฉ๋‹ˆ๋‹ค.
# ์ด๋ฅผ ์ƒ๋ฅ˜์—์„œ ๋„˜์–ด์˜จ ๋ฏธ๋ถ„๊ฐ’๊ณผ ๊ณฑํ•˜์—ฌ ์ด ๊ณ„์ธต์„ ํ†ต๊ณผํ•  ๋•Œ์˜ ๋ฏธ๋ถ„๊ฐ’์„ ๊ตฌํ•ฉ๋‹ˆ๋‹ค.
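A quick usage sketch of this layer (the input values are arbitrary, and an upstream gradient of ones is assumed):

import numpy as np

sig = Sigmoid()
x = np.array([[1.0, -0.5], [-2.0, 3.0]])
out = sig.forward(x)                  # forward pass; the output is kept in sig.out
dx = sig.backward(np.ones_like(out))  # backward pass with an upstream gradient of ones
# here dx equals out * (1 - out) elementwise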

Affine Layer

์‹ ๊ฒฝ๋ง์˜ Forward Propagation(์ˆœ์ „ํŒŒ)์—์„œ๋Š” Weight(๊ฐ€์ค‘์น˜) ์‹ ํ˜ธ์˜ ์ดํ•ฉ์„ ๊ณ„์‚ฐํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํ–‰๋ ฌ์˜ ๊ณฑ(Numpy์—์„œ๋Š” np.dot())์„ ์‚ฌ์šฉํ–ˆ์Šต๋‹ˆ๋‹ค.
  • Neuron(๋‰ด๋Ÿฐ)์˜ Weight(๊ฐ€์ค‘์น˜) ํ•ฉ์€ Y = np.dot(X, W) + B์ฒ˜๋Ÿผ ๊ณ„์‚ฐํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  ์ด Y๋ฅผ Activation Function(ํ™œ์„ฑํ™” ํ•จ์ˆ˜)๋กœ ๋ณ€ํ™˜ํ•ด ๋‹ค์Œ Layer(์ธต)์œผ๋กœ ์ „ํŒŒํ•˜๋Š” ๊ฒƒ์ด ์‹ ๊ฒฝ๋ง Forward Propagation(์ˆœ์ „ํŒŒ)์˜ ํ๋ฆ„์ด์˜€์Šต๋‹ˆ๋‹ค.
  • ํ–‰๋ ฌ์˜ ๊ณฑ ๊ณ„์‚ฐ์€ ๋Œ€์‘ํ•˜๋Š” Dimension(์ฐจ์›)์˜ ์›์†Œ ์ˆ˜๋ฅผ ์ผ์น˜์‹œํ‚ค๋Š”๊ฒŒ ํ•ต์‹ฌ์ž…๋‹ˆ๋‹ค. ํ–‰๋ ฌ์˜ ํ˜•์ƒ์„ (2, 3)์ฒ˜๋Ÿผ ๊ด„ํ˜ธ๋กœ ํ‘œ๊ธฐํ•˜๋Š” ์ด๋‰ด๋Š” Numpy shapeํ•จ์ˆ˜์˜ Output(์ถœ๋ ฅ) & ํ˜•ํƒœ๋ฅผ ํ†ต์ผํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ž…๋‹ˆ๋‹ค.

์‹ ๊ฒฝ๋ง์˜ Forward Propagation(์ˆœ์ „ํŒŒ) ๋•Œ ์ˆ˜ํ–‰ํ•˜๋Š” ํ–‰๋ ฌ์˜ ๊ณฑ์€ ๊ธฐํ•˜ํ•™์—์„œ Affine Transformation(์–ดํŒŒ์ธ ๋ณ€ํ™˜)์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋Ÿฌ๋ฉด ํ–‰๋ ฌ์˜ ๊ณฑ๊ณผ Bias(ํŽธํ–ฅ)์˜ ํ•ฉ์„ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋กœ ํ•œ๋ฒˆ ๊ทธ๋ ค๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.
  • ๊ณฑ์„ ๊ณ„์‚ฐํ•˜๋Š” ๋…ธ๋“œ๋ฅผ 'dot'์ด๋ผ ํ•˜๋ฉด np.dot(X, W) + B ๊ณ„์‚ฐ์€ ์•„๋ž˜์˜ ๊ทธ๋ž˜ํ”„์ฒ˜๋Ÿผ ๊ทธ๋ ค์ง‘๋‹ˆ๋‹ค.
  • ์ฐธ๊ณ ๋กœ ์ง€๊ธˆ๊นŒ์ง€์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„๋Š” ๋…ธ๋“œ ์‚ฌ์ด์— '์Šค์นผ๋ผ ๊ฐ’'์ด ํ˜๋ €๋Š”๋ฐ ๋ฐ˜ํ•ด, ์ด ์˜ˆ์—์„œ๋Š” 'ํ–‰๋ ฌ'์ด ํ๋ฅด๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Affine ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„: ๋ณ€์ˆ˜๊ฐ€ ํ–‰๋ ฌ์ž„์— ์ฃผ์˜, ๊ฐ ๋ณ€์ˆ˜์˜ ํ˜•์ƒ์„ ๋ณ€์ˆ˜๋ช… ์œ„์— ํ‘œ๊ธฐ

  • Now let's think about back propagation.
  • Back propagation with matrices can be reasoned about in the same order as the scalar computational graphs so far, if we expand it element by element; the resulting formulas are shown below.
  • The T in W^T stands for the transpose: the element at position (i, j) of W is moved to position (j, i).

Affine ๊ณ„์ธต์˜ ์—ญ์ „ํŒŒ, ๋ณ€์ˆ˜๊ฐ€ ๋‹ค์ฐจ์› ๋ฐฐ์—ด์ž„์— ์ฃผ์˜.

  • In the computational graph, we must watch the shape of each variable carefully.
  • In particular, remember that X and ∂L/∂X have the same shape, and W and ∂L/∂W have the same shape.

  • ๊ทผ๋ฐ, ์—ฌ๊ธฐ์„œ ์˜๋ฌธ์ด ๋“œ๋Š”๊ฒŒ ์žˆ์Šต๋‹ˆ๋‹ค. ์™œ ํ–‰๋ ฌ์˜ ํ˜•์ƒ์— ์ฃผ์˜๋ฅผ ํ•ด์•ผ ํ• ๊นŒ์š”?
  • ํ–‰๋ ฌ์˜ ๊ณฑ์—์„œ ๋Œ€์‘ํ•˜๋Š” ์ฐจ์›์˜ ์›์†Œ ์ˆ˜๋ฅผ ์ผ์น˜์‹œ์ผœ์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

The backward pass of the matrix product (the 'dot' node) can be derived by assembling the product so that the element counts of the corresponding dimensions match.


Batch Affine Layer

  • The Affine layer above considered only a single input X.
  • In this section we consider forward-propagating N data points bundled together, that is, a batch Affine layer.
  • Such bundled data is called a 'batch'.

Computational graph of the batch Affine layer

  • The only difference from before is that the shape of the input X is now (N, 2). After that, the matrix calculations simply follow the order of the computational graph, as before.
  • During back propagation, if we mind the matrix shapes, ∂L/∂X and ∂L/∂W can be derived the same way as before.
  • Adding the bias also needs care: in the forward pass, the bias is added to each individual data point of X·W.
  • So in the backward pass, the back-propagated values from every data point must be accumulated into the bias elements.
  • The Affine layer can then be implemented like this.
import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W = W  # weights
        self.b = b  # bias
        self.x = None  # initialize the variable that stores the input data
        self.dW = None  # initialize the variable that stores the gradient of the weights
        self.db = None  # initialize the variable that stores the gradient of the bias

    def forward(self, x):
        self.x = x  # store the input data
        out = np.dot(x, self.W) + self.b  # dot product of the input and the weights, plus the bias
        return out  # return the computed result

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)  # multiply the upstream derivative by the transposed weights: derivative w.r.t. the input
        self.dW = np.dot(self.x.T, dout)  # multiply the transposed input by the upstream derivative: derivative w.r.t. the weights
        self.db = np.sum(dout, axis=0)  # sum the upstream derivative along axis 0: derivative w.r.t. the bias
        return dx  # return the derivative with respect to the input data
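To see why the bias gradient is summed along axis 0, consider a small sketch with a batch of two made-up upstream gradients:

import numpy as np

dY = np.array([[1, 2, 3],
               [4, 5, 6]])   # upstream gradients for a batch of N = 2
dB = np.sum(dY, axis=0)      # each bias element accumulates the gradients of every sample
print(dB)  # [5 7 9]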

Softmax-with-Loss Layer

Finally, let's look at the softmax function used in the output layer.
  • The softmax function normalizes its input values before outputting them. For example, in handwritten digit recognition on the MNIST dataset, the output of the Softmax layer looks like the figure below.
    • The input image is transformed by Affine and ReLU layers, and the final Softmax layer normalizes its 10 inputs.
    • In this figure, the score for the digit '0' is 5.3, which the Softmax layer converts to 0.008 (0.8%).
    • Likewise, the score for '2' goes from 10.1 to 0.991 (99.1%).

  • The Softmax layer normalizes its input values (transforms them so the outputs sum to 1) before outputting them.
  • And since there are 10 handwritten digits (10 classes to classify), the Softmax layer has 10 inputs.
A neural network performs two kinds of work, training and inference, and at inference time the Softmax layer is generally not used.
At inference time, the network uses the output of the last Affine layer as its recognition result.
The non-normalized output of a neural network is called the score.
That is, when the network only needs to give a single answer at inference time, knowing the highest score is enough, so the Softmax layer is unnecessary.
When training the network, however, the Softmax layer is needed.
  • Now let's look at the Softmax layer. We implement it together with the loss function, cross-entropy error, under the name 'Softmax-with-Loss layer'.

Softmax-with-Loss ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„

  • As you can see, the Softmax-with-Loss layer is complicated, so here we will just look at the result.

๊ฐ„์†Œํ™”๋œ Softmax-with-Loss ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„

  • ์œ„์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„์—์„œ ์†Œํ”„ํŠธ๋งฅ์ˆ˜ ํ•จ์ˆ˜๋Š” 'Softmax" ๊ณ„์ธต์œผ๋กœ, Cross-Entropy Error'๊ณ„์ธต์œผ๋กœ ํ‘œ๊ธฐํ–ˆ์Šต๋‹ˆ๋‹ค.
  • ์—ฌ๊ธฐ์„œ 3๊ฐœ์˜ ํด๋ž˜์Šค ๋ถ„๋ฅ˜๋ฅผ ๊ฐ€์ •ํ•˜๊ณ  ์ด์ „ Layer(๊ณ„์ธต)์—์„œ 3๊ฐœ์˜ ์ž…๋ ฅ(Score)๋ฅผ ๋ฐ›์Šต๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆผ๊ณผ ๊ฐ™์ด Softmax ๊ณ„์ธต์€ ์ž…๋ ฅ (a1, a2, a3)๋ฅผ ์ •๊ทœํ™”ํ•˜์—ฌ (y1, y2, y3)๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  • Cross-Entropy Error ๊ณ„์ธต์€ Softmax ๊ณ„์ธต์˜ ์ถœ๋ ฅ (y1, y2, y3)์™€ ์ •๋‹ต ๋ ˆ์ด๋ธ” (t1, t2, t3)๋ฅผ ๋ฐ›๊ณ , ์ด ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ Loss(์†์‹ค) L์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.
  • ๊ฐ„์†Œํ™”๋œ Softmax-with-Loss ๊ณ„์ธต์˜ ๊ณ„์‚ฐ ๊ทธ๋ž˜ํ”„์—์„œ ์ฃผ๋ชฉํ• ๊ฑด Back propagation(์—ญ์ „ํŒŒ)์˜ ๊ฒฐ๊ณผ์ž…๋‹ˆ๋‹ค.
  • Softmax ๊ณ„์ธต์˜ Back propagation(์—ญ์ „ํŒŒ)๋Š” (y1 - t1, y2- t2, y3 - t3)๋ผ๋Š” '๋ง๋”ํ•œ' ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋†“๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.
  • (y1 ~ y3)๋Š” Softmax ๊ณ„์ธต์˜ ์ถœ๋ ฅ์ด๊ณ , (t1 ~ t3)๋Š” ์ •๋‹ต ๋ ˆ์ด๋ธ” ์ด๋ฏ€๋กœ, (y1 - t1, y2- t2, y3 - t3)๋Š” Softmax ๊ณ„์ธต์˜ ์ถœ๋ ฅ๊ณผ ์ •๋‹ต ๋ ˆ์ด๋ธ”์˜ ์ฐจ๋ถ„์ธ๊ฒƒ์ž…๋‹ˆ๋‹ค.
  • ์‹ ๊ฒฝ๋ง์˜ Back propagation(์—ญ์ „ํŒŒ)๋Š” ์ด ์ฐจ์ด์ธ ์˜ค์ฐจ๊ฐ€ ์•ž ๊ณ„์ธต์— ์ „ํ•ด์ง€๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค.
  • ์ด๊ฒƒ์ด ์‹ ๊ฒฝ๋ง ํ•™์Šต์˜ ์ค‘์š”ํ•œ ์„ฑ์งˆ์ž…๋‹ˆ๋‹ค.
  • ๊ทธ๋Ÿฌ๋ฉด Softmax-with-Loss ๊ณ„์ธต์„ ๊ตฌํ˜„ํ•œ ์ฝ”๋“œ๋ฅผ ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

Softmax-with-Loss Example Code (by Python)

from common.functions import softmax, cross_entropy_error  # helper functions from the book's common/ module

class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss
        self.y = None  # output of softmax
        self.t = None  # answer labels (one-hot vectors)

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)  # apply the softmax function to the input x
        self.loss = cross_entropy_error(self.y, self.t)
        # compute the cross-entropy error from the softmax output and the answer labels
        return self.loss  # return the computed loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]  # get the batch size
        dx = (self.y - self.t) / batch_size
        # derivative of the loss function; dout=1 because the derivative of the loss with respect to itself is 1
        return dx  # return the derivative with respect to the input
  • ์ฃผ์˜ํ•ด์•ผ ํ•˜๋Š”์ ์€ Back propagation(์—ญ์ „ํŒŒ) ๋•Œ๋Š” ์ „ํŒŒํ•˜๋Š” ๊ฐ’์„ Batch_size๋กœ ๋‚˜๋ˆ ์„œ ๋ฐ์ดํ„ฐ 1๊ฐœ๋‹น ์˜ค์ฐจ๋ฅผ ์•ž Layer(๊ณ„์ธต)์œผ๋กœ ์ „ํŒŒํ•ฉ๋‹ˆ๋‹ค.

Implementing Backpropagation

์‹ ๊ฒฝ๋ง ํ•™์Šต์˜ ์ „์ฒด์ ์ธ Flow

Let's review the steps of neural network training once more.

Premise

  • ์‹ ๊ฒฝ๋ง์—๋Š” ์ ์‘ ๊ฐ€๋Šฅํ•œ Weight(๊ฐ€์ค‘์น˜)์™€ Bias(ํŽธํ–ฅ)์ด ์žˆ๊ณ , ์ด Weight(๊ฐ€์ค‘์น˜)์™€ Bias(ํŽธํ–ฅ)์„ Training Data(ํ›ˆ๋ จ ๋ฐ์ดํ„ฐ)์— ์ ์‘ํ•˜๋„๋ก ์กฐ์ •ํ•˜๋Š” ๊ณผ์ •์„ Training(ํ•™์Šต)์ด๋ผ๊ณ  ํ•ฉ๋‹ˆ๋‹ค.
  • ๊ทธ๋ฆฌ๊ณ  Neural Network Training(์‹ ๊ฒฝ๋ง ํ•™์Šต)์€ 4๋‹จ๊ณ„๋กœ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Step 1 - Mini-Batch

  • A portion of the training data is selected at random. The selected data is called a mini-batch.
  • The goal is to reduce the value of the loss function on that mini-batch.

Step 2 - Computing the Gradient

  • Mini-Batch์˜ Loss Function ๊ฐ’์„ ์ค„์ด๊ธฐ ์œ„ํ•ด์„œ ๊ฐ Weight Paraemter(๊ฐ€์ค‘์น˜ ๋งค๊ฐœ๋ณ€์ˆ˜)์˜ Gradient(๊ธฐ์šธ๊ธฐ)๋ฅผ ๊ตฌํ•ฉ๋‹ˆ๋‹ค.
  • Gradient(๊ธฐ์šธ๊ธฐ)๋Š” Loss Function Value(์†์‹ค ํ•จ์ˆ˜ ๊ฐ’)์„ ๊ฐ€์žฅ ์ž‘๊ฒŒ ํ•˜๋Š” ๋ฐฉํ–ฅ์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.

Step 3 - Updating the Parameters

  • The weight parameters are updated a tiny amount in the gradient direction.

Step 4 - Repeat

  • Steps 1-3 are repeated.

  • Backpropagation comes into play in step 2, 'computing the gradient'.
  • Earlier we used numerical differentiation to obtain this gradient, but with backpropagation the gradient can be computed efficiently and quickly.

์˜ค์ฐจ์—ญ์ „ํŒŒ๋ฒ•์„ ์ด์šฉํ•œ ์‹ ๊ฒฝ๋ง ๊ตฌํ˜„ํ•˜๊ธฐ

Here the 2-layer network is implemented as the TwoLayerNet class. Let's look at the tables defining the class's instance variables and methods.

TwoLayerNet class: instance variables
TwoLayerNet class: methods

  • The important point is that this implementation uses layers.
  • Thanks to the layers, obtaining recognition results (predict()) and computing gradients (gradient()) are both accomplished purely by propagating through the layers.
# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # make files in the parent directory importable
import numpy as np
from common.layers import *
from common.gradient import numerical_gradient
from collections import OrderedDict


class TwoLayerNet:

    def __init__(self, input_size, hidden_size, output_size, weight_init_std = 0.01):
        # initialize weights
        self.params = {}
        self.params['W1'] = weight_init_std * np.random.randn(input_size, hidden_size)
        self.params['b1'] = np.zeros(hidden_size)
        self.params['W2'] = weight_init_std * np.random.randn(hidden_size, output_size) 
        self.params['b2'] = np.zeros(output_size)

        # create layers
        self.layers = OrderedDict()
        self.layers['Affine1'] = Affine(self.params['W1'], self.params['b1'])
        self.layers['Relu1'] = Relu()
        self.layers['Affine2'] = Affine(self.params['W2'], self.params['b2'])

        self.lastLayer = SoftmaxWithLoss()
        
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        
        return x
        
    # x : input data, t : answer labels
    def loss(self, x, t):
        y = self.predict(x)
        return self.lastLayer.forward(y, t)
    
    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1 : t = np.argmax(t, axis=1)
        
        accuracy = np.sum(y == t) / float(x.shape[0])
        return accuracy
        
    # x : input data, t : answer labels
    def numerical_gradient(self, x, t):
        loss_W = lambda W: self.loss(x, t)
        
        grads = {}
        grads['W1'] = numerical_gradient(loss_W, self.params['W1'])
        grads['b1'] = numerical_gradient(loss_W, self.params['b1'])
        grads['W2'] = numerical_gradient(loss_W, self.params['W2'])
        grads['b2'] = numerical_gradient(loss_W, self.params['b2'])
        
        return grads
        
    def gradient(self, x, t):
        # forward
        self.loss(x, t)

        # backward
        dout = 1
        dout = self.lastLayer.backward(dout)
        
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)

        # store the results
        grads = {}
        grads['W1'], grads['b1'] = self.layers['Affine1'].dW, self.layers['Affine1'].db
        grads['W2'], grads['b2'] = self.layers['Affine2'].dW, self.layers['Affine2'].db

        return grads
  • The arguments are, in order: the number of input-layer neurons, hidden-layer neurons, output-layer neurons, and the scale of the normal distribution used for weight initialization.
  • OrderedDict is an ordered dictionary: 'ordered' means it remembers the order in which items were added.
  • So in the forward pass, we only need to call each layer's forward() method in the order the layers were added.
  • In the backward pass, we only need to call the layers in the reverse order.
  • Since the Affine and ReLU layers each handle forward and back propagation internally, all we do here is connect the layers in the right order and call them.

Verifying the Gradient

  • There are broadly two ways to obtain the gradient: using numerical differentiation, and solving the equations analytically.
    • The analytic method, via backpropagation, computes the gradient efficiently even when there are many parameters.
  • Here, numerical differentiation is used to verify that backpropagation was implemented correctly, by comparing the results of the two.
  • This verification step is called a 'gradient check'.
# coding: utf-8
import sys, os
sys.path.append(os.pardir)  # make files in the parent directory importable
import numpy as np
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

x_batch = x_train[:3]
t_batch = t_train[:3]

grad_numerical = network.numerical_gradient(x_batch, t_batch)
grad_backprop = network.gradient(x_batch, t_batch)

# take the absolute difference for each weight, then average those absolute values
for key in grad_numerical.keys():
    diff = np.average( np.abs(grad_backprop[key] - grad_numerical[key]) )
    print(key + ":" + str(diff))
# Result

        W2:9.71260696544e-13
        b2:1.20570232964e-10
        W1:2.86152966578e-13
        b1:1.19419626098e-12

  • The difference between the gradients obtained by numerical differentiation and by backpropagation is very small.
  • So it is highly likely that the implementation is free of mistakes.
  • Because numerical precision is finite, the error does not become exactly 0.

์˜ค์ฐจ์—ญ์ „ํŒŒ๋ฒ•์„ ์ด์šฉํ•œ ํ•™์Šต ๊ตฌํ˜„ํ•˜๊ธฐ

The only difference from before is that the gradient is now obtained by backpropagation.
# coding: utf-8
import sys, os
sys.path.append(os.pardir)

import numpy as np
from dataset.mnist import load_mnist
from two_layer_net import TwoLayerNet

# ๋ฐ์ดํ„ฐ ์ฝ๊ธฐ
(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True, one_hot_label=True)

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

# hyperparameter
iters_num = 10000 # number of iterations
train_size = x_train.shape[0]
batch_size = 100 # mini-batch size
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

# iterations per epoch
iter_per_epoch = max(train_size / batch_size, 1)

for i in range(iters_num):
    # get a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    # compute the gradient by backpropagation (the changed part)
    #grad = network.numerical_gradient(x_batch, t_batch) # numerical differentiation
    grad = network.gradient(x_batch, t_batch) # backpropagation (much faster)
    
    # update the parameters
    for key in ('W1', 'b1', 'W2', 'b2'):
        network.params[key] -= learning_rate * grad[key]
    
    # record training progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)
    
    # compute accuracy once per epoch
    if i % iter_per_epoch == 0:
        train_acc = network.accuracy(x_train, t_train)
        test_acc = network.accuracy(x_test, t_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print(train_acc, test_acc)
# Result

"""
train acc, test acc | 0.0992833333333, 0.1032
train acc, test acc | 0.898, 0.9026
train acc, test acc | 0.92135, 0.9216
train acc, test acc | 0.936016666667, 0.9337
train acc, test acc | 0.945316666667, 0.9431
train acc, test acc | 0.94675, 0.9427
train acc, test acc | 0.954766666667, 0.9521
train acc, test acc | 0.9602, 0.9551
train acc, test acc | 0.9634, 0.9581
train acc, test acc | 0.9656, 0.9597
train acc, test acc | 0.9683, 0.9615
train acc, test acc | 0.970516666667, 0.9629
train acc, test acc | 0.97305, 0.9649
train acc, test acc | 0.9731, 0.9661
train acc, test acc | 0.975916666667, 0.9659
train acc, test acc | 0.976383333333, 0.9666
train acc, test acc | 0.977916666667, 0.969
[Finished in 45.5s]
"""

Summary

- Computational graphs let us grasp the computation process visually.
- The nodes of a computational graph consist of local computations; combining local computations builds up the whole computation.
- The forward pass of a computational graph performs the ordinary computation, while the backward pass gives the derivative of each node.
- By implementing the components of a neural network as layers, gradients can be computed efficiently (backpropagation).
- Comparing the results of numerical differentiation and backpropagation confirms that the backpropagation implementation is correct (gradient check).