Data Mining

[Data Mining] Getting Data Part.1

Getting Data

    from collections import Counter
    import math, random, csv, json, re
    from bs4 import BeautifulSoup
    import requests

What should you do if a module such as beautifulsoup is not installed? Search Google for how to install beautifulsoup with Anaconda, and look for the Anaconda Cloud entry among the results; the packages hosted there are tested and safe. You will end up spending an embarrassing amount of time acquiring, cleaning, and transforming data.

stdin and stdout

Number of lines containing num..

[Data Mining] Gradient Descent

Gradient Descent

Gradient descent is an optimization algorithm that finds the minimum of a given function. It is used mainly when training machine learning and deep learning models.

The Idea Behind Gradient Descent

We will often need to maximize (or minimize) a function f. That is, we need to find the input v that produces the largest (or smallest) possible value. This comes up in many problems. For example, minimizing a cost function..
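The idea in this excerpt can be illustrated with a minimal sketch. It assumes a simple target, f(v) = sum of squares of the components of v, whose gradient is 2v; the function names, the starting point, the step size, and the step count are all hypothetical choices for illustration, not taken from the post.

```python
from typing import Callable, List

Vector = List[float]


def sum_of_squares(v: Vector) -> float:
    """Target function f(v) = v_1^2 + ... + v_n^2 (assumed for illustration)."""
    return sum(v_i ** 2 for v_i in v)


def sum_of_squares_gradient(v: Vector) -> Vector:
    """Gradient of f: the partial derivative with respect to v_i is 2 * v_i."""
    return [2 * v_i for v_i in v]


def gradient_step(v: Vector, gradient: Vector, step_size: float) -> Vector:
    """Move step_size along the gradient direction, starting from v."""
    return [v_i + step_size * g_i for v_i, g_i in zip(v, gradient)]


def minimize(gradient_fn: Callable[[Vector], Vector],
             start: Vector,
             step_size: float = 0.1,
             steps: int = 200) -> Vector:
    """Repeatedly step against the gradient, which decreases f."""
    v = list(start)
    for _ in range(steps):
        v = gradient_step(v, gradient_fn(v), -step_size)
    return v


# Hypothetical starting point; the minimum of f is at the origin.
v_min = minimize(sum_of_squares_gradient, [3.0, -4.0, 5.0])
print(v_min, sum_of_squares(v_min))
```

With this target, each step shrinks every component by a constant factor, so the result ends up very close to the zero vector. A step size that is too large would make the iterates overshoot instead of converging.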

[Data Mining] Statistics

Describing a Single Set of Data

Describing a single set of data means summarizing and analyzing the characteristics of one data set. Doing so reveals the data's central tendency, dispersion, shape, and distribution. For example, suppose the vice president of a fundraising organization asks for a description of how many friends the members have.

    from collections import Counter
    from linear_algebra import sum_of_squares, dot
    import math
    from operator import add

    num_friends = [100,49,41,40,25,21,21,19,19,18,18,16..
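As a rough illustration of central tendency and dispersion, the sketch below computes a few summary statistics in plain Python. The `sample` list is hypothetical (it is not the post's truncated `num_friends` data), and the helper names are chosen here for illustration only.

```python
from collections import Counter
import math
from typing import List

# Hypothetical data, standing in for a list such as num_friends.
sample = [1, 2, 2, 3, 4, 6, 10]


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def median(xs: List[float]) -> float:
    """Middle value of the sorted data (average of the two middle values if even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def mode(xs: List[float]) -> List[float]:
    """All values that occur most often (there may be more than one)."""
    counts = Counter(xs)
    max_count = max(counts.values())
    return [x for x, count in counts.items() if count == max_count]


def variance(xs: List[float]) -> float:
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def standard_deviation(xs: List[float]) -> float:
    return math.sqrt(variance(xs))


print(mean(sample), median(sample), mode(sample))
print(max(sample) - min(sample), variance(sample), standard_deviation(sample))
```

The mean and median describe where the data is centered, while the range, variance, and standard deviation describe how spread out it is. The single large value in the sample pulls the mean above the median.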

[Data Mining] Linear Algebra

Linear Algebra

Linear algebra is the branch of mathematics that studies vector spaces, matrices, and linear transformations. It deals mainly with operations on vectors and matrices in multidimensional spaces and the relationships between them, and it plays an important role in engineering, physics, computer science, and other fields. It also underpins many data science concepts and techniques.

    import re, math, random  # regexes, math functions, random numbers
    import matplotlib.pyplot as plt  # pyplot
    from collections import defaultdict, Counter
    from functools import partial, reduce

Vectors

Vectors are some finite..
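One simple way to experiment with vectors is to represent them as plain Python lists of numbers. The sketch below takes that approach; the function names are assumptions made here for illustration, and the post's own definitions (cut off in the excerpt) may differ.

```python
import math
from typing import List

Vector = List[float]


def add(v: Vector, w: Vector) -> Vector:
    """Component-wise sum of two vectors of equal length."""
    assert len(v) == len(w), "vectors must be the same length"
    return [v_i + w_i for v_i, w_i in zip(v, w)]


def scalar_multiply(c: float, v: Vector) -> Vector:
    """Multiply every component by the scalar c."""
    return [c * v_i for v_i in v]


def dot(v: Vector, w: Vector) -> float:
    """Sum of component-wise products: v_1 * w_1 + ... + v_n * w_n."""
    assert len(v) == len(w), "vectors must be the same length"
    return sum(v_i * w_i for v_i, w_i in zip(v, w))


def magnitude(v: Vector) -> float:
    """Euclidean length of v, the square root of dot(v, v)."""
    return math.sqrt(dot(v, v))


print(add([1, 2, 3], [4, 5, 6]))
print(dot([1, 2, 3], [4, 5, 6]))
print(magnitude([3, 4]))
```

Lists keep the example dependency-free, but they are slow for large data; an array library such as NumPy is the usual choice in practice.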

[Data Mining] Introduction to Numpy part.2

Broadcasting

NumPy's broadcasting is a powerful feature that makes operations between arrays of different shapes possible. Through broadcasting, NumPy conceptually stretches the smaller array to the shape of the larger one so that element-wise operations can be performed. This enables efficient vectorized operations without explicit loops. Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is "broadcast" across the larger array so that the two have compatible shapes.

Examples

    A      (2d array):  5 x 4
    B      (1d array):      1
    Result (2d array):  5 x 4..
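The 5 x 4 and 1 shape example from the excerpt can be tried directly. The following is a small sketch assuming NumPy is installed; the array contents and the variable names (`row`, `col`, `incompatible`) are arbitrary choices for illustration.

```python
import numpy as np

A = np.arange(20).reshape(5, 4)    # shape (5, 4), values 0..19
B = np.array([10])                 # shape (1,)

# The length-1 array is stretched across every element of A.
result = A + B                     # shape (5, 4)

# A 1d array of length 4 lines up with the last axis of A,
# so it is applied to each of the 5 rows.
row = np.array([1, 2, 3, 4])       # shape (4,)
row_result = A * row               # shape (5, 4)

# A (5, 1) column is stretched across the 4 columns of A.
col = np.arange(5).reshape(5, 1)   # shape (5, 1)
col_result = A + col               # shape (5, 4)

# Shapes that cannot be aligned raise a ValueError:
# (5, 4) and (3,) disagree on the last axis, and neither size is 1.
try:
    A + np.array([1, 2, 3])
    incompatible = False
except ValueError:
    incompatible = True

print(result.shape, row_result.shape, col_result.shape, incompatible)
```

Shapes are compared from the trailing axis backwards, and two sizes are compatible when they are equal or when one of them is 1.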

[Data Mining] Introduction to Numpy part.1

Introduction to Numpy

NumPy, short for Numerical Python, is a Python library for numerical computing. Its basic data structure is the ndarray, a multidimensional array object, and it provides a set of functions for manipulating ndarray elements efficiently. For the documentation, see the link below.

NumPy documentation (NumPy v2.0 Manual): The reference guide contains a detailed description of the functions, modules, and objects included in NumPy. The reference describes how the met..
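To make the ndarray description concrete, here is a brief sketch assuming NumPy is installed. The array values and variable names are arbitrary and chosen only for illustration.

```python
import numpy as np

# Build a 2 x 3 ndarray from a nested Python list.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.ndim)    # number of dimensions
print(a.shape)   # size along each dimension
print(a.size)    # total number of elements

# Arithmetic applies to every element, with no explicit loop.
doubled = a * 2

# Reductions can run along a chosen axis.
col_sums = a.sum(axis=0)     # one sum per column
row_means = a.mean(axis=1)   # one mean per row

print(doubled, col_sums, row_means)
```

Operating on the whole array at once is what makes ndarray code both shorter and faster than the equivalent Python loops.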

[Data Mining] Visualizing Data

Two Main Uses of Data Visualization

To explore data, you need to know the two main uses of visualization.

Exploratory Data Analysis (EDA)
Purpose: to identify and understand patterns, trends, and outliers in the data. This matters most in the early stages, when you are working out the data's structure and discovering statistical relationships.
Tools: mostly histograms, box plots, scatter plots, and heat maps.

Explanatory Data Analysis
Purpose: to present data visually in order to communicate a specific finding or insight. This matters when telling a story with data or supporting a decision.
Tools: mostly bar charts, pie charts, line charts, and dashboards..

[Data Mining] Crash_Course in Python Part.2

The Not-So-Basics

Sorting

    x = [4,1,2,3]
    y = sorted(x)   # is [1,2,3,4], x is unchanged
    x.sort()        # now x is [1,2,3,4]

    # sort the list by absolute value from largest to smallest
    x = sorted([-4,1,-2,3], key=abs, reverse=True)  # is [-4,3,-2,1]

    # sort the words and counts from highest count to lowest
    wc = sorted(word_counts.items(),
                key=lambda x: x[1],   # sort by x[1], the second value
                rev..

[Data Mining] Crash_Course in Python Part.1

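Whitespace Formatting

Many languages use braces to delimit blocks of code. Python uses indentation instead, with each block introduced by a colon (':').

    for i in [1, 2, 3, 4, 5]:
        print(i)                    # first line in the "for i" block
        for j in [1, 2, 3, 4, 5]:
            print(j)                # first line in the "for j" block
            print(i + j)            # last line in the "for j" block
        print(i)                    # last line in the "for i" block
    print("done looping")

Output (each value is printed on its own line; shown here with one row per outer iteration):

    1 1 2 2 3 3 4 4 5 5 6 1
    2 1 3 2 4 3 5 4 6 5 7 2
    3 1 4 2 5 3 6 4 7 5 8 3
    4 1 5 2 6 3 7 4 8 5 9 4
    5 1 6 2 7 3 8 4 9 5 10 5
    done looping

Whitespace is ignored inside parentheses and brackets.

    long_winded_computation = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 + 12 + ..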

Bigbread1129
Posts in the 'Data Mining' category