[Hongong Machine Learning] Linear Regression

Limitations of K-Nearest Neighbors

The limitation of a k-nearest neighbors model is that, when a new sample falls outside the range of the Training_set, the model can predict a value that is far off.
  • To see this, let's prepare the data and the model used previously and run them once more.
import numpy as np

perch_length = np.array([8.4, 13.7, 15.0, 16.2, 17.4, 18.0, 18.7, 19.0, 19.6, 20.0, 21.0,
       21.0, 21.0, 21.3, 22.0, 22.0, 22.0, 22.0, 22.0, 22.5, 22.5, 22.7,
       23.0, 23.5, 24.0, 24.0, 24.6, 25.0, 25.6, 26.5, 27.3, 27.5, 27.5,
       27.5, 28.0, 28.7, 30.0, 32.8, 34.5, 35.0, 36.5, 36.0, 37.0, 37.0,
       39.0, 39.0, 39.0, 40.0, 40.0, 40.0, 40.0, 42.0, 43.0, 43.0, 43.5,
       44.0])
perch_weight = np.array([5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0, 110.0,
       115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0, 130.0,
       150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0, 197.0,
       218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0, 514.0,
       556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0, 820.0,
       850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0, 1000.0,
       1000.0])
  • As before, we split the data into a Training_set and a Test_set, convert the feature data to a 2-D array, and train the model.
# A scikit-learn training set must be a 2-D array, so use NumPy's reshape method to convert it to 2-D
from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(perch_length, perch_weight, random_state=42)

train_input = train_input.reshape(-1, 1) # one column; -1 lets NumPy infer the number of rows
test_input = test_input.reshape(-1, 1)
print(train_input.shape, test_input.shape)
(42, 1) (14, 1)  # (42, 1): Training_set array, (14, 1): Test_set array
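The effect of reshape(-1, 1) can be checked on a small array. This is only a minimal sketch: the three-element array `lengths` is a hypothetical example, not part of the original code.

```python
import numpy as np

# Hypothetical 1-D feature array (three lengths in cm), used only for illustration.
lengths = np.array([8.4, 13.7, 15.0])

# -1 asks NumPy to infer the number of rows; 1 fixes the number of columns.
lengths_2d = lengths.reshape(-1, 1)

print(lengths.shape)     # (3,)
print(lengths_2d.shape)  # (3, 1)
```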
from sklearn.neighbors import KNeighborsRegressor

knr = KNeighborsRegressor(n_neighbors=3) # create the model object

# Train the k-nearest neighbors regression model.
knr.fit(train_input, train_target) # pass the training inputs and targets, then train

# Use the trained model to predict the weight of a 50 cm perch
print(knr.predict([[50]]))
[1033.33333333]
  • The model predicts that a 50 cm perch weighs about 1,033 g, but the actual weight is said to be considerably higher.
  • So let's draw a scatter plot to see what is going on.
import matplotlib.pyplot as plt

# Find the neighbors of the 50 cm perch.
distances, indexes = knr.kneighbors([[50]])

# Draw a scatter plot of the training set.
plt.scatter(train_input, train_target)

# Redraw only the neighbor samples from the training set.
plt.scatter(train_input[indexes], train_target[indexes], marker='D')

# The 50 cm perch
plt.scatter(50, 1033, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

  • Here the perch with a length of 50 cm and a weight of 1,033 g is shown with a triangle marker, and the neighboring samples are shown with diamond markers.
  • The scatter plot shows that the weight tends to increase as the length of the perch increases.
  • However, the k-nearest neighbors algorithm used here computes its prediction by averaging the weights of the samples nearest to the one being predicted. Let's compute that average ourselves.
print(np.mean(train_target[indexes]))
1033.3333333333333
print(knr.predict([[100]]))
[1033.33333333]
  • As you can see, entering a 100 cm perch gives the same weight. This shows that when a new sample falls outside the range of the training set, the model can predict a value that is far off.
  • Let's draw the plot once more to confirm.
# Find the neighbors of the 100 cm perch.
distances, indexes = knr.kneighbors([[100]])

# Draw a scatter plot of the training set.
plt.scatter(train_input, train_target)

# Redraw only the neighbor samples from the training set.
plt.scatter(train_input[indexes], train_target[indexes], marker='D')

# The 100 cm perch
plt.scatter(100, 1033, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

  • With a plot like this, the model will keep predicting the same weight no matter how much longer the perch gets.
  • So let's build a model with an algorithm other than k-nearest neighbors.
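The plateau described above follows directly from how the prediction is computed. The sketch below is a NumPy-only illustration of the idea, not scikit-learn's implementation; the helper `knn_predict` and the toy data are hypothetical.

```python
import numpy as np

def knn_predict(x_train, y_train, x_new, k=3):
    """Average the targets of the k training samples closest to x_new."""
    distances = np.abs(x_train - x_new)
    nearest = np.argsort(distances)[:k]
    return y_train[nearest].mean()

# Hypothetical toy data: lengths (cm) and weights (g), not the perch data set.
x_train = np.array([10.0, 20.0, 30.0, 40.0, 44.0])
y_train = np.array([50.0, 150.0, 400.0, 900.0, 1000.0])

# Every query beyond the largest training length shares the same three neighbors.
pred_50 = knn_predict(x_train, y_train, 50.0)
pred_100 = knn_predict(x_train, y_train, 100.0)
pred_1000 = knn_predict(x_train, y_train, 1000.0)
print(pred_50, pred_100, pred_1000)
```

Because the three largest training samples are the nearest neighbors of every query past 44 cm, all three predictions are the same average.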

Linear Regression

Linear regression is one of the most widely used regression algorithms.
It is relatively simple and performs well, so it is usually one of the first machine learning algorithms people encounter.
  • When there is a single feature, it learns a straight line. Let's implement it with scikit-learn.
  • scikit-learn implements the linear regression algorithm as the LinearRegression class in the sklearn.linear_model package.
  • scikit-learn model classes all use the same method names for training, evaluating, and predicting.
  • So the LinearRegression class also has fit(), score(), and predict() methods.
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
# Train the linear regression model
lr.fit(train_input, train_target)

# Predict the weight of the 50 cm perch
print(lr.predict([[50]]))
[1241.83860323]
  • This model predicts a much higher weight for the 50 cm perch than k-nearest neighbors regression did. Let's see why.
  • To draw a straight line you need a slope and an intercept; the line can be written as y = a * x + b.
  • Here x is the length of the perch and y is its weight.
perch weight (y) = a (slope) * perch length (x) + b (y-intercept, the value where the line crosses the y-axis)

  • So which a and b fit the data best? The a and b found by the LinearRegression class are stored in the coef_ and intercept_ attributes of the lr object.
# scikit-learn models append '_' to attributes that store values learned from the data, to distinguish them from other attributes
print(lr.coef_, lr.intercept_) # the model parameters
[39.01714496] -709.0186449535474
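The printed parameters can be checked by hand: plugging them into y = a * x + b should reproduce the earlier prediction for the 50 cm perch. A minimal sketch, with the values copied from the output above (the variable names are my own):

```python
# Values copied from the printed coef_ and intercept_ above.
slope = 39.01714496
intercept = -709.0186449535474

length_cm = 50
predicted_weight = slope * length_cm + intercept
print(round(predicted_weight, 2))  # 1241.84
```

This agrees with the value returned by lr.predict([[50]]) up to the rounding of the printed slope.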
  • Let's draw this line for perch lengths from 15 cm to 50 cm.
  • To draw the line, connect the two points (15, 15 × 39 − 709) and (50, 50 × 39 − 709), using the slope and intercept found above.
# Draw a scatter plot of the training set.
plt.scatter(train_input, train_target)

# Draw the first-degree equation from 15 to 50 (slope lr.coef_, intercept lr.intercept_).
plt.plot([15, 50], [15*lr.coef_+lr.intercept_, 50*lr.coef_+lr.intercept_])

# The 50 cm perch
plt.scatter(50, 1241.8, marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

  • Now let's check the R^2 scores on the Training_set and the Test_set.
# Is the training score lower than the k-nearest neighbors score, i.e. underfitting?
print(lr.score(train_input, train_target))

# The gap from the training score could also point to overfitting
print(lr.score(test_input, test_target))
0.9398463339976041
0.824750312331356
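For regressors, score() returns the coefficient of determination, R^2. The sketch below shows how that quantity is defined; the helper `r2_score_manual` and the sample numbers are hypothetical and are not the perch data.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (sum of squared residuals) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions, for illustration only.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_pred = np.array([110.0, 190.0, 310.0, 390.0])

r2 = r2_score_manual(y_true, y_pred)
r2_perfect = r2_score_manual(y_true, y_true)
print(r2)          # about 0.992
print(r2_perfect)  # 1.0 for perfect predictions
```

A score close to 1 means the predictions explain most of the variance in the targets.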
  • The Training_set and Test_set scores differ somewhat. Rather than calling this overfitting, it seems fair to say that the model is underfit overall, since the training score is not especially high either. But that is not the only problem: something looks odd in the lower left of the plot.
  • The line in the plot continues downward to the lower left, which does not match the distribution of the data in the scatter plot. It means the predicted weight can drop below zero.
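The negative predictions can be made concrete with the slope and intercept printed earlier. A minimal sketch (the helper `predict_weight` is hypothetical; the two constants are copied from the coef_ and intercept_ output):

```python
# Values copied from the printed coef_ and intercept_ of the straight-line model.
slope = 39.01714496
intercept = -709.0186449535474

def predict_weight(length_cm):
    """Evaluate the fitted line y = slope * x + intercept."""
    return slope * length_cm + intercept

# Length at which the fitted line crosses zero weight.
zero_crossing = -intercept / slope
print(zero_crossing)        # a little over 18 cm
print(predict_weight(15))   # negative
print(predict_weight(8.4))  # 8.4 cm is the shortest perch in the data: also negative
```

Any perch shorter than the zero-crossing length gets a negative predicted weight, which cannot happen in reality.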

Polynomial Regression

Polynomial regression is linear regression performed on a polynomial of x.
  • Looking at this dataset, a best-fit curve seems more appropriate than the straight line used in the linear regression above.

  • To draw such a curve, a squared-length term has to be added to the Training_set.
  • This is easy to do with NumPy. Let's build it.
# Add the squared length to the training set so that a second-degree equation can be fitted
# NumPy broadcasting is applied
train_poly = np.column_stack((train_input ** 2, train_input))
test_poly = np.column_stack((test_input ** 2, test_input))
# Check the size of the newly created datasets
print(train_poly.shape, test_poly.shape)
(42, 2) (14, 2)
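What np.column_stack does here is easier to see on a tiny array. A minimal sketch with a hypothetical three-row input (the names `lengths` and `poly` are my own):

```python
import numpy as np

# Hypothetical 2-D feature array with one column, like train_input.
lengths = np.array([[10.0], [20.0], [30.0]])

# Left column: squared length. Right column: original length.
poly = np.column_stack((lengths ** 2, lengths))
print(poly)
print(poly.shape)  # (3, 2)
```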
  • The expression train_input ** 2 also uses NumPy broadcasting: every element of train_input is squared.
  • The squared length is added as the left column next to the original length feature, so both the Training_set and the Test_set now have two columns.
  • Now we train a linear regression model with train_poly. As with the training set, when predicting we must give this model both the squared length and the original length of the perch.
# Train the model again
lr = LinearRegression()
lr.fit(train_poly, train_target) # train_target: the perch weights to predict (unchanged)

print(lr.predict([[50**2, 50]]))
[1573.98423528]
  • This model predicts an even higher value than the one trained earlier. Let's print the coefficients and the intercept it learned.
print(lr.coef_, lr.intercept_)
# The model was trained on two features (squared length and length), so two coefficients are printed: [1.01433211 -21.55792498], intercept: 116.05021078278338
[ 1.01433211 -21.55792498] 116.05021078278338
weight = a * length^2 + b * length + c
The model learned the following curve.
weight = 1.01 * length^2 - 21.6 * length + 116.05
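As a quick check, the learned curve can be evaluated by hand for the 50 cm perch. A minimal sketch, with the coefficients copied from the printed output (the variable names are my own):

```python
# Coefficients and intercept copied from the printed coef_ and intercept_ above.
a, b = 1.01433211, -21.55792498
c = 116.05021078278338

length_cm = 50
predicted_weight = a * length_cm**2 + b * length_cm + c
print(round(predicted_weight, 1))
```

The result matches the roughly 1,574 g returned by lr.predict([[50**2, 50]]), up to the rounding of the printed coefficients.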
  • An equation like this is called a polynomial, and linear regression that uses a polynomial is called polynomial regression.
  • Let's draw the scatter plot with polynomial regression applied.
# Build an integer array from 15 to 49 so the curve can be drawn as short line segments (np.arange excludes the end value 50).
point = np.arange(15,50)

# Draw a scatter plot of the training set.
plt.scatter(train_input, train_target)

# Draw the second-degree equation from 15 to 50.
plt.plot(point, 1.01*point**2 - 21.6*point + 116.05)

# The 50 cm perch
plt.scatter([50], [1574], marker='^')
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

  • This curve fits much better than the simple linear regression model above. Now let's evaluate the R^2 scores on the Training_set and the Test_set.
print(lr.score(train_poly, train_target))
print(lr.score(test_poly, test_target))
0.9706807451768623
0.9775935108325122
  • The scores on both the Training_set and the Test_set have risen considerably. However, the Test_set score is now slightly higher than the Training_set score, which suggests that some underfitting remains.

Keywords

  • Linear regression finds the linear equation that best describes the relationship between the features and the target. With a single feature, this is the equation of a straight line.
  • The relationship that linear regression finds between the features and the target is stored in the coefficients, or weights, of the linear equation. In machine learning, "weights" often refers to both the slope and the intercept of the equation.
  • Model parameters are the parameters that a machine learning model learns from the features, such as the weights found by linear regression.
  • Polynomial regression uses a polynomial to express the relationship between the features and the target. The resulting function can be nonlinear, but it can still be expressed as a linear regression.
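The last point, that a curved fit is still a linear regression, can be sketched with plain NumPy least squares. The data below is hypothetical and noise-free, and np.linalg.lstsq stands in for the fitting step; it is not how the earlier examples were trained.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 2*x^2 - 3*x + 5.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x**2 - 3.0 * x + 5.0

# Design matrix with columns [x^2, x, 1]. The curve is nonlinear in x,
# but the model is linear in its three coefficients.
X = np.column_stack((x**2, x, np.ones_like(x)))
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(coeffs, 6))
```

Ordinary least squares recovers the three coefficients because the squared term is just another input column.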

Key Packages and Functions

Scikit-learn

  • LinearRegression is scikit-learn's linear regression class.
  • Setting the fit_intercept parameter to False means the intercept is not learned. The default value of this parameter is True.
  • The coef_ attribute of a trained model is an array containing the coefficient for each feature, so the size of this array equals the number of features. The intercept_ attribute stores the intercept.
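The attributes described above can be inspected on a small model. A minimal sketch, assuming a hypothetical two-feature dataset whose target has no constant term:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data with two features: y = 3*x1 + 2*x2 (no intercept term).
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = 3.0 * X[:, 0] + 2.0 * X[:, 1]

# With fit_intercept=False the model does not learn an intercept.
model = LinearRegression(fit_intercept=False).fit(X, y)
print(model.coef_.shape)  # one coefficient per feature
print(model.intercept_)   # stays at 0.0
```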