[혼공머신] K-Nearest Neighbors Regression

K-Nearest Neighbors Regression

  • Before explaining k-nearest neighbors regression, let's first look at what regression is.
  • Regression is one kind of supervised learning. Unlike classification, which assigns a sample to one of several classes, regression predicts an arbitrary numeric value.
  • As in classification, the other supervised learning task, the algorithm selects the k samples closest to the sample being predicted.

  • As shown in the figure, suppose we want the target value of sample X. If the targets of its neighboring samples are 100, 80, and 60, their average, 80, becomes the predicted target for sample X.
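The averaging step described above can be sketched in a few lines of NumPy. This is only an illustration: the arrays `train_x` and `train_y` and the helper `knn_predict` are hypothetical names with made-up values, not part of scikit-learn or of the perch data used later.

```python
import numpy as np

# Hypothetical toy data: one feature per sample and a numeric target.
train_x = np.array([1.0, 2.0, 3.0, 10.0, 11.0])
train_y = np.array([100.0, 80.0, 60.0, 500.0, 520.0])

def knn_predict(x, k=3):
    """Average the targets of the k training samples closest to x."""
    distances = np.abs(train_x - x)        # distance from x to every training sample
    nearest = np.argsort(distances)[:k]    # indices of the k smallest distances
    return train_y[nearest].mean()         # prediction = mean of their targets

prediction = knn_predict(2.0, k=3)
print(prediction)
```

For x = 2.0 the three nearest samples have targets 100, 80, and 60, so the prediction is their mean.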

Data preparation

This time we create the training data directly as NumPy arrays.
  • The perch length is the feature and the weight is the target.
import numpy as np
perch_length = np.array([8.4, 13.7, 15.0, 16.2, 17.4, 18.0, 18.7, 19.0, 19.6, 20.0, 21.0,
       21.0, 21.0, 21.3, 22.0, 22.0, 22.0, 22.0, 22.0, 22.5, 22.5, 22.7,
       23.0, 23.5, 24.0, 24.0, 24.6, 25.0, 25.6, 26.5, 27.3, 27.5, 27.5,
       27.5, 28.0, 28.7, 30.0, 32.8, 34.5, 35.0, 36.5, 36.0, 37.0, 37.0,
       39.0, 39.0, 39.0, 40.0, 40.0, 40.0, 40.0, 42.0, 43.0, 43.0, 43.5,
       44.0])
       
perch_weight = np.array([5.9, 32.0, 40.0, 51.5, 70.0, 100.0, 78.0, 80.0, 85.0, 85.0, 110.0,
       115.0, 125.0, 130.0, 120.0, 120.0, 130.0, 135.0, 110.0, 130.0,
       150.0, 145.0, 150.0, 170.0, 225.0, 145.0, 188.0, 180.0, 197.0,
       218.0, 300.0, 260.0, 265.0, 250.0, 250.0, 300.0, 320.0, 514.0,
       556.0, 840.0, 685.0, 700.0, 700.0, 690.0, 900.0, 650.0, 820.0,
       850.0, 900.0, 1015.0, 820.0, 1100.0, 1000.0, 1100.0, 1000.0,
       1000.0])

 

  • First we need to see what this data looks like.
  • Because there is only one feature, we put the feature on the x-axis and the target on the y-axis and draw a scatter plot with the scatter() function.
import matplotlib.pyplot as plt

plt.scatter(perch_length, perch_weight) # perch_weight(target)
plt.xlabel('length')
plt.ylabel('weight')
plt.show()

  • The plot confirms that as a perch gets longer, its weight increases too.
  • To use the data with an ML model, we split it into a training set and a test set.
# The target is an arbitrary numeric value (the measured perch weight)
# scikit-learn expects a 2D training set, so NumPy's reshape method is used afterwards to convert it
from sklearn.model_selection import train_test_split

train_input, test_input, train_target, test_target = train_test_split(
	perch_length, perch_weight, random_state=42)
There is one thing to remember here: a training set passed to scikit-learn must be a 2D array.
  • perch_length is a 1D array, so train_input and test_input, which were split from it, are 1D arrays as well.
  • scikit-learn needs 2D arrays, so we use the reshape() method to convert them.
  • reshape() lets you specify the new shape of the array. However, if the number of elements in the original array does not match the specified shape, an error is raised and the conversion fails.
# Check the shape of test_array
test_array = np.array([1,2,3,4])
print(test_array.shape) # (4,)

# Reshape it to (2, 2); the element count (4) stays the same
test_array = test_array.reshape(2, 2)
print(test_array.shape) # (2, 2)

# Use reshape() to turn train_input & test_input into 2D arrays
train_input = train_input.reshape(-1, 1) # one column; -1 lets NumPy infer the number of rows
test_input = test_input.reshape(-1, 1)
print(train_input.shape, test_input.shape)
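As a small sketch of the two reshape() behaviors mentioned above, the following uses a hypothetical four-element array `arr` (not the perch data). It shows `-1` inferring the row count, and the error raised when the requested shape does not match the number of elements.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# -1 asks NumPy to infer that dimension: 4 elements / 1 column = 4 rows.
column = arr.reshape(-1, 1)
print(column.shape)

# A (3, 2) shape needs 6 elements, but arr has only 4, so NumPy raises ValueError.
try:
    arr.reshape(3, 2)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("reshape failed:", err)
```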

Coefficient of determination

  • In scikit-learn, the class that implements k-nearest neighbors regression is KNeighborsRegressor.
  • It is used in much the same way as KNeighborsClassifier.
from sklearn.neighbors import KNeighborsRegressor

knr = KNeighborsRegressor() # create the model object

# Train the k-nearest neighbors regression model.
knr.fit(train_input, train_target) # pass the training inputs and targets and fit
knr.score(test_input, test_target) # score on the test set (R**2, not accuracy)
0.992809406101064
Formula for the coefficient of determination: R**2 = 1 - sum((target - prediction)**2) / sum((target - mean(target))**2)
If the model does no better than predicting the mean of the targets, the numerator and denominator are about equal and R**2 approaches 0. If the predictions are close to the targets, the numerator approaches 0 and R**2 approaches 1.
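To make the formula concrete, here is a minimal hand computation with NumPy. The `target` and `prediction` arrays are hypothetical values chosen for easy arithmetic; they are not outputs of the model above.

```python
import numpy as np

# Hypothetical targets and predictions, used only to illustrate the formula.
target = np.array([100.0, 80.0, 60.0, 40.0])
prediction = np.array([90.0, 80.0, 70.0, 40.0])

ss_res = np.sum((target - prediction) ** 2)     # squared error of the predictions
ss_tot = np.sum((target - target.mean()) ** 2)  # squared error of always predicting the mean
r2 = 1 - ss_res / ss_tot
print(r2)

# Predicting the mean for every sample makes numerator == denominator, so R**2 is 0.
mean_prediction = np.full_like(target, target.mean())
r2_mean = 1 - np.sum((target - mean_prediction) ** 2) / ss_tot
print(r2_mean)
```

Here the squared prediction error is 200 and the squared deviation from the mean is 2000, so R**2 works out to 1 - 200/2000.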
  • An R**2 score of 0.99 is quite good.
  • Next, let's measure the difference between the targets and the predicted values. This tells us how far off the predictions are.
  • For this, scikit-learn provides several metrics in the sklearn.metrics package; here we use mean_absolute_error.
  • It returns the average of the absolute errors between the targets and the predictions.
# mean_absolute_error (mean absolute error; metrics like this live in sklearn.metrics)
from sklearn.metrics import mean_absolute_error

test_prediction = knr.predict(test_input) # predictions for the test inputs
mae = mean_absolute_error(test_target, test_prediction) # mean absolute error between targets and predictions
print(mae) # off by about 19 g on average
19.157142857142862 
  • The result shows that the predictions differ from the targets by about 19 g on average.

Overfitting vs. underfitting

Using the model trained above, let's check the R**2 scores on the training set and the test set.
knr.score(train_input, train_target)
0.9698823289099254
knr.score(test_input, test_target)
0.992809406101064
  • Something is odd about these scores, though. A model is fit on the training set, so the training-set score usually comes out a little higher.
If the score is good on the training set but much lower on the test set, the model is said to be overfit (overfitting) to the training set.
Conversely, if the test-set score is higher than the training-set score, or both scores are too low, the model is said to be underfit (underfitting) to the training set.
  • Now look again. How do the training-set and test-set scores of the current k-nearest neighbors regression compare?
  • The test-set score is higher than the training-set score, so the model is underfit. How can this be fixed?
  • Make the model a little more complex. With the k-nearest neighbors algorithm, that means reducing the number of neighbors, k.
  • The default k in scikit-learn's k-nearest neighbors algorithm is 5. Let's lower it to 3.
# Set the number of neighbors to 3
knr.n_neighbors = 3

# Retrain the model
knr.fit(train_input, train_target)
print(knr.score(train_input, train_target))
0.9804899950518966
print(knr.score(test_input, test_target))
0.9746459963987609
  • Lowering k raised the training-set score, so the underfitting problem is resolved to some degree.
  • The gap to the test-set score is also small, so the model does not appear to be overfit either.
  • Finally, a scatter plot of the training set and the test set looks like this.
plt.scatter(train_input, train_target)
plt.scatter(test_input, test_target)
plt.xlabel('length')
plt.ylabel('weight')
plt.show()
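The effect of k on model complexity can be explored by training with several values and comparing scores. The sketch below uses synthetic stand-in data (an assumption: a noisy cubic curve, not the perch measurements), so the printed scores will not match the numbers above. Only the trend matters: a smaller k fits the training set more closely.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (an assumption, not the perch measurements).
rng = np.random.default_rng(42)
x = np.sort(rng.uniform(5, 45, size=60)).reshape(-1, 1)
y = 0.02 * x.ravel() ** 3 + rng.normal(0, 20, size=60)

# Alternate samples between the training set and the test set.
train_x, test_x = x[::2], x[1::2]
train_y, test_y = y[::2], y[1::2]

scores = {}
for k in (1, 5, 10):
    model = KNeighborsRegressor(n_neighbors=k)
    model.fit(train_x, train_y)
    # Store (training R**2, test R**2) for this k.
    scores[k] = (model.score(train_x, train_y), model.score(test_x, test_y))
    print(k, scores[k])
```

With k = 1 every training sample is its own nearest neighbor, so the training score is perfect. That is the most complex model this algorithm can produce and the most prone to overfitting.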


Keywords

  • Regression is the problem of predicting an arbitrary numeric value, so the target is also an arbitrary number.
  • k-nearest neighbors regression solves regression problems with the k-nearest neighbors algorithm. It finds the nearest neighbor samples and uses the average of their targets as the prediction.
  • The coefficient of determination (R**2) is the standard performance metric for regression. The closer to 1 the better; a value near 0 indicates a poor model.
  • Overfitting occurs when the model performs much better on the training set than on the test set. The model clings too closely to the training set and fails to capture the broader pattern in the data. Underfitting is the opposite: it occurs when performance on both the training set and the test set is similarly low, or when test-set performance is actually higher. In that case, use a more complex model so that it fits the training set well.

Key packages and functions

Scikit-learn

  • KNeighborsRegressor is the scikit-learn class for building a k-nearest neighbors regression model. The n_neighbors parameter sets the number of neighbors; the default is 5.
  • The other parameters are almost the same as those of the KNeighborsClassifier class.
  • mean_absolute_error() computes the mean absolute error of a regression model. The first argument is the targets and the second is the predictions. A similar function is mean_squared_error(), which computes the mean squared error.
  • mean_squared_error() squares the difference between each target and prediction and returns the average over all samples.
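As a quick check of what these two metrics compute, the sketch below compares them with the same quantities written out in NumPy. The `target` and `prediction` values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical values; the errors are -10, 0, and 20.
target = np.array([100.0, 80.0, 60.0])
prediction = np.array([110.0, 80.0, 40.0])

mae = mean_absolute_error(target, prediction)   # first argument: targets, second: predictions
mse = mean_squared_error(target, prediction)

# The same quantities computed by hand.
mae_manual = np.mean(np.abs(target - prediction))
mse_manual = np.mean((target - prediction) ** 2)

print(mae, mae_manual)
print(mse, mse_manual)
```

Because the errors are squared, mean_squared_error weights the single large error (20) much more heavily than mean_absolute_error does.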

Numpy

  • reshape() is the method that changes the shape of an array. Pass the desired shape as the argument. The number of elements must be the same before and after.
  • NumPy often provides a standalone function equivalent to an array method. In that case, the function's first argument is the array to change. For example, test_array.reshape(2, 2) can also be written as np.reshape(test_array, (2, 2)).
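The equivalence of the method and the function form can be confirmed directly. The names `via_method` and `via_function` are only illustrative.

```python
import numpy as np

test_array = np.array([1, 2, 3, 4])

via_method = test_array.reshape(2, 2)           # array method: shape given as arguments
via_function = np.reshape(test_array, (2, 2))   # function: array first, then the shape as a tuple

print(np.array_equal(via_method, via_function))
```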