The Not-So-Basics
Sorting
x = [4,1,2,3]
y = sorted(x) # is [1,2,3,4], x is unchanged
x.sort() # now x is [1,2,3,4]
# sort the list by absolute value from largest to smallest
x = sorted([-4,1,-2,3], key=abs, reverse=True) # is [-4,3,-2,1]
# sort the words and counts from highest count to lowest
# (word_counts is assumed to be a dict mapping each word to its count, built earlier)
wc = sorted(word_counts.items(),
            key=lambda x: x[1],  # sort by the second value, x[1] (the count)
            reverse=True)
wc
# [('I', 2), ('am', 1), ('a', 1), ('boy', 1), ('love', 1), ('you', 1)]
- In the first example, the list x is sorted by absolute value in descending order.
- sorted() returns a new sorted list and leaves its input unchanged.
- key=abs sets the sort key to the absolute value of each element.
- reverse=True sorts in descending order.
- wc holds the words and their counts from the dictionary, sorted from the highest count to the lowest.
- The key parameter of sorted() specifies a function that is applied to each element to produce its sort key.
- Here a lambda function sorts by the second value of each element, the count (how often the word occurs).
word_counts.items() yields each dictionary entry as a (key, value) tuple.
For example, for the dictionary { 'I': 2, 'am': 1, 'a': 1 }, items() yields the tuples ('I', 2), ('am', 1), ('a', 1).
sorted() then orders these tuples by their second element in descending order.
x[0]: the word, x[1]: the count
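The sorting steps above can be sketched end to end. This is only an illustration: the sample sentence and the use of collections.Counter to build word_counts are assumptions, since this section does not show how the original dictionary was created. It also shows one way to break ties between equal counts.

```python
from collections import Counter

# Hypothetical sample text; the original word_counts is not shown in this section.
words = "I am a boy I love you".split()
word_counts = Counter(words)

# Sort by count, highest first. Python's sort is stable, so words with equal
# counts keep the order in which they first appeared.
by_count = sorted(word_counts.items(), key=lambda pair: pair[1], reverse=True)

# Break ties alphabetically by sorting on a (negated count, word) tuple.
by_count_then_word = sorted(word_counts.items(),
                            key=lambda pair: (-pair[1], pair[0]))

print(by_count)
print(by_count_then_word)
```

Negating the count in the tuple key avoids reverse=True, which would also reverse the alphabetical order of the tied words.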
List Comprehensions
- Frequently you will want to transform a list into another list by choosing only certain elements, by transforming elements, or both.
- The Pythonic way to do this is a list comprehension.
- Use list comprehensions whenever possible.
even_numbers = [x for x in range(5) if x % 2 == 0] # [0, 2, 4], the even numbers among 0 to 4
squares = [x * x for x in range(5)] # [0, 1, 4, 9, 16]
even_squares = [x * x for x in even_numbers] # [0, 4, 16]
- Similarly, you can turn a list into a dictionary or a set.
square_dict = { x : x * x for x in range(5) } # { 0:0, 1:1, 2:4, 3:9, 4:16 }
square_set = { x * x for x in [1, -1] } # { 1 }, a set holds no duplicate elements
- It is conventional to use an underscore as the variable when you don't need the value.
- For example, even_numbers = [0, 2, 4] gives a list of zeros: [0, 0, 0].
zeroes = [0 for _ in even_numbers] # has the same length as even_numbers
- A list comprehension can include multiple for clauses:
pairs = [(x, y)
         for x in range(10)
         for y in range(10)] # 100 pairs (0,0) (0,1) ... (9,8), (9,9)
- Later for clauses can use the results of earlier ones, which makes things like a distance matrix easy to compute:
increasing_pairs = [(x, y)
                    for x in range(10)
                    for y in range(x + 1, 10)]
# [(0,1), (0,2), (0,3) ... (8, 9)]
- This generates pairs of numbers taken from 0 to 9.
- Each pair is increasing: its first element is smaller than its second.
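The distance-matrix remark can be made concrete with a small sketch. The points below are hypothetical, and math.dist (available from Python 3.8) is used as the distance function; neither comes from the original notes.

```python
import math

# Hypothetical 2-D points, used only for illustration.
points = [(0, 0), (3, 4), (6, 8)]

# One distance per increasing pair (i, j) with i < j,
# so no pair is computed twice and no point is paired with itself.
pair_distances = {
    (i, j): math.dist(points[i], points[j])
    for i in range(len(points))
    for j in range(i + 1, len(points))
}

# A nested comprehension builds the full symmetric matrix row by row.
distance_matrix = [[math.dist(p, q) for q in points] for p in points]

print(pair_distances)
print(distance_matrix)
```

The inner range(i + 1, ...) is the same trick as in increasing_pairs: the later for clause depends on the value bound by the earlier one.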
Generators & Iterators
- A problem with lists is that they can easily grow very big.
- list(range(1000000)) creates an actual list of 1 million elements. (In Python 3, range by itself is already lazy; in Python 2 it built the full list.)
- If you only need to deal with the elements one at a time, this can be a source of great inefficiency (or of running out of memory).
- If you potentially only need the first few values, then computing them all is a waste.
- A generator is something that you can iterate over (for us, usually using for) but whose values are produced only as needed (lazily).
- One way to create a generator is with a function and the yield keyword:
def lazy_range(n):
    """a lazy version of range"""
    i = 0
    while i < n:
        yield i
        i += 1
# The following loop will consume the yielded values one at a time until none are left:
for i in lazy_range(10):
    print(i)
0
1
2
3
4
5
6
7
8
9
# Breaking out of the loop early means the remaining values are never generated:
for i in lazy_range(10000):
    if i == 3: break
    print(i)
0
1
2
t = lazy_range(3)
next(t) # 0
next(t) # 1
next(t) # 2
# next(t) # a fourth call would raise StopIteration
def lazy_inf_range():
    """an infinite generator; use it only with next() or a loop that breaks"""
    i = 0
    while True:
        yield i
        i += 1
t = lazy_inf_range()
next(t) # 0
next(t) # 1
next(t) # 2
- A second way to create generators is to use a comprehension wrapped in parentheses:
lazy_evens_below_20 = (i for i in lazy_range(20) if i % 2 == 0)
lazy_evens_below_20
# <generator object <genexpr> at 0x7cce6849c4a0>
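Printing a generator expression only shows the generator object, not its values. The sketch below shows two ways to look inside, using itertools.islice (a standard-library helper not mentioned in the notes) to take a few values from an infinite generator, and it also shows that a generator can be consumed only once.

```python
from itertools import islice

def lazy_inf_range():
    """Yield 0, 1, 2, ... forever (same idea as the generator in the text)."""
    i = 0
    while True:
        yield i
        i += 1

# islice stops after 5 even values, so the infinite generator never runs away.
lazy_evens = (i for i in lazy_inf_range() if i % 2 == 0)
first_five_evens = list(islice(lazy_evens, 5))
print(first_five_evens)  # [0, 2, 4, 6, 8]

# A generator can be consumed only once: a second pass over it is empty.
squares = (x * x for x in range(3))
first_pass = list(squares)
second_pass = list(squares)
print(first_pass, second_pass)  # [0, 1, 4] []
```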
Randomness
- To generate random numbers, we can use the random module.
- random.random() produces numbers uniformly between 0 and 1 (0 included, 1 excluded).
import random
four_uniform_randoms = [random.random() for _ in range(4)]
four_uniform_randoms
[0.15001730378211198,
0.047689363188983425,
0.4438845111618783,
0.8064273339306516]
- If you want reproducible results, set the seed with random.seed:
random.seed(10)
print(random.random())
random.seed(10)
print(random.random())
0.5714025946899135
0.5714025946899135
- random.randrange takes either one or two arguments and returns an element chosen randomly from the corresponding range():
random.randrange(10) # choose randomly from range(10) = [0, 1, ..., 9]
random.randrange(3, 6) # choose randomly from range(3, 6) = [3, 4, 5]
- random.shuffle randomly reorders the elements of a list in place:
up_to_ten = list(range(10))
random.shuffle(up_to_ten)
print(up_to_ten)
# e.g. [4, 5, 8, 1, 2, 6, 7, 3, 0, 9] (your result will probably differ)
- To pick one element from a list at random, use random.choice:
my_best_friend = random.choice(["Alice", "Bob", "Charlie"])
- To choose a random sample of elements without replacement (that is, with no duplicates), use random.sample:
lottery_numbers = range(60)
winning_numbers = random.sample(lottery_numbers, 6)
- To choose a sample of elements with replacement (that is, allowing duplicates), make multiple calls to random.choice:
four_with_replacement = [random.choice(range(10)) for _ in range(4)]
four_with_replacement
# e.g. [2, 9, 5, 6]
- random.choice(range(10)) picks one number at random from 0 through 9.
- The list comprehension repeats this 4 times, so the result is a list of 4 numbers from 0 through 9 in which duplicates may occur.
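As a compact recap of seeding and of the two kinds of sampling, here is a minimal sketch. It uses random.choices (standard library, Python 3.6+), which the notes do not mention, as an alternative to repeated random.choice calls; no expected values are shown because they depend on the generator state.

```python
import random

# Re-seeding with the same value replays the same sequence of random numbers.
random.seed(10)
first_run = [random.randrange(10) for _ in range(4)]
random.seed(10)
second_run = [random.randrange(10) for _ in range(4)]
print(first_run == second_run)  # True

# With replacement: duplicates are possible.
with_replacement = random.choices(range(10), k=4)

# Without replacement: every element is picked at most once.
without_replacement = random.sample(range(10), 4)

print(with_replacement, without_replacement)
```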
Regular Expressions
- Regular expressions provide a way of searching text.
- They are incredibly useful but also fairly complicated, so much so that entire books have been written about them.
import re
# Perform several string operations with regular expressions and check the result of each.
# Prints True if every condition holds.
print(all([
    not re.match("a", "cat"),  # "cat" does not start with "a"
    re.search("a", "cat"),  # "cat" contains "a"
    not re.search("c", "dog"),  # "dog" does not contain "c"
    3 == len(re.split("[ab]", "carbs")),  # splitting "carbs" on "[ab]" gives ['c', 'r', 's'], which has length 3
    "R-D-" == re.sub("[0-9]", "-", "R2D2")  # replacing the digits in "R2D2" with "-" gives "R-D-"
])) # prints True
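The checks above only report True or False. To see what each function actually returns, the following sketch applies the same functions, plus re.findall, to one hypothetical sample string.

```python
import re

text = "R2D2 and C3PO"  # hypothetical sample string

# re.match only looks at the beginning of the string; re.search scans all of it.
at_start = re.match("[0-9]", text)     # None, because the text starts with "R"
anywhere = re.search("[0-9]", text)    # match object for the first digit

# re.findall returns every non-overlapping match as a list of strings.
digits = re.findall("[0-9]", text)

# re.split cuts the string at each match; re.sub replaces each match.
parts = re.split(" and ", text)
masked = re.sub("[0-9]", "-", text)

print(at_start, anywhere.group(), digits, parts, masked)
# None 2 ['2', '2', '3'] ['R2D2', 'C3PO'] R-D- and C-PO
```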
Object-Oriented Programming
# By convention, classes are given PascalCase names.
class Set:
    # These are the member functions.
    # Every one takes a first parameter "self" (another convention)
    # that refers to the particular Set object being used.
    def __init__(self, values=None):
        """This is the constructor.
        It gets called when you create a new Set.
        You would use it like this:
        s1 = Set()          # empty set
        s2 = Set([1,2,2,3]) # initialize with values"""
        self.dict = {}  # each instance of Set has its own dict property,
                        # which is used to track memberships
        if values is not None:
            for value in values:
                self.add(value)
    def __repr__(self):
        """This is the string representation of a Set object,
        used when you type it at the Python prompt or pass it to str()."""
        return "Set: " + str(list(self.dict.keys()))
    # Membership is represented by being a key in self.dict.
    def add(self, value):
        self.dict[value] = True
    # A value is in the Set if it is a key in the dictionary.
    def contains(self, value):
        return value in self.dict
    def remove(self, value):
        del self.dict[value]
s = Set([1,2,3])
s.add(4)
print(s.contains(4)) # True
s.remove(3)
print(s.contains(3)) # False
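As a possible extension, special methods let a class like this work with Python's own syntax. The DictBackedSet name and the choice to add __contains__ and __len__ are assumptions made for illustration; they are not part of the original class.

```python
class DictBackedSet:
    """Hypothetical variant of the Set class above that supports `in` and len()."""

    def __init__(self, values=None):
        self.dict = {}  # keys are the members; the values are unused
        if values is not None:
            for value in values:
                self.add(value)

    def add(self, value):
        self.dict[value] = True

    def remove(self, value):
        del self.dict[value]

    def __contains__(self, value):
        # Called for `value in s`.
        return value in self.dict

    def __len__(self):
        # Called for len(s).
        return len(self.dict)

    def __repr__(self):
        return "DictBackedSet: " + str(list(self.dict))


s = DictBackedSet([1, 2, 2, 3])
s.add(4)
s.remove(3)
print(4 in s, 3 in s, len(s))  # True False 3
print(s)                       # DictBackedSet: [1, 2, 4]
```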
Function Tools
- When passing functions around, sometimes we want to partially apply (or curry) functions to create new functions.
def exp(base, power):
    return base ** power
def two_to_the(power):
    return exp(2, power)
two_to_the(3)
# 8
- A different approach is to use functools.partial:
from functools import partial
two_to_the = partial(exp, 2) # is now a function of one variable
print(two_to_the(3)) # 8
square_of = partial(exp, power=2)
print(square_of(3)) # 9
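To see what partial actually stores, the sketch below inspects a partial object and applies the same idea to the builtin int. The parse_binary name is hypothetical and only serves the illustration.

```python
from functools import partial

def exp(base, power):
    return base ** power

# Bind power by keyword, leaving base as the only free argument.
square_of = partial(exp, power=2)
print(square_of(3))           # 9

# A partial object remembers the function and the arguments it was built from.
print(square_of.func is exp)  # True
print(square_of.keywords)     # {'power': 2}

# The same idea with a builtin: fix base=2 to parse binary strings.
parse_binary = partial(int, base=2)
print(parse_binary("1010"))   # 10
```

Binding by keyword, as in power=2, is what makes it possible to fix an argument other than the first one.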
- We will also occasionally use map, reduce, and filter, which provide functional alternatives to list comprehensions:
- Use map, reduce, and filter whenever possible.
Map
def double(x):
    return 2 * x
xs = [1, 2, 3, 4]
# Double each element with a list comprehension.
twice_xs = [double(x) for x in xs]
# Double each element with map (in Python 3 this returns a lazy iterator; wrap it in list() to get a list).
twice_xs = map(double, xs)
# Use partial to bind double as the function argument of map.
list_doubler = partial(map, double)
# Double each element of xs with list_doubler.
twice_xs = list_doubler(xs)
def multiply(x, y): return x * y
products = map(multiply, [1, 2], [4, 5]) # [1 * 4, 2 * 5] = [4, 10]
list(products)
# [4, 10]
def multiply(x, y, z): return x * y * z
products = map(multiply, [1, 2], [4, 5], [10, 20]) # [1 * 4 * 10, 2 * 5 * 20]
list(products)
# [40, 200]
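Two details of map in Python 3 are easy to miss: the result is an iterator that can be consumed only once, and with several inputs it stops at the shortest one. A minimal sketch with hypothetical inputs:

```python
def multiply(x, y):
    return x * y

# The first list has three elements, the second only two: map stops after two.
products = map(multiply, [1, 2, 3], [4, 5])

first_pass = list(products)
second_pass = list(products)  # the iterator is already exhausted

print(first_pass)   # [4, 10]
print(second_pass)  # []
```

If the result is needed more than once, convert it to a list right away.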
Filter
def is_even(x):
    """Return True if x is even, False if x is odd."""
    return x % 2 == 0
xs = [1, 2, 3, 4]
# Keep only the even numbers with a list comprehension.
x_evens = [x for x in xs if is_even(x)]
print(x_evens)
# Keep only the even numbers with filter.
x_evens = filter(is_even, xs)
print(list(x_evens))
# Use partial to bind is_even as the function argument of filter.
list_evener = partial(filter, is_even)
# Keep only the even numbers of xs with list_evener.
x_evens = list_evener(xs)
print(list(x_evens))
[2, 4]
[2, 4]
[2, 4]
Reduce
from functools import reduce # in Python 3, reduce is no longer a builtin
def multiply(x, y): return x * y
xs = [1,2,3]
x_product = reduce(multiply, xs)
print(x_product)
list_product = partial(reduce, multiply)
x_product = list_product(xs)
print(x_product)
6
6
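reduce combines the first two elements, then combines that result with the third, and so on. The sketch below spells this out and shows the optional initial value, using operator.mul and operator.add from the standard library in place of a hand-written multiply.

```python
from functools import reduce
import operator

xs = [1, 2, 3]

# ((1 * 2) * 3)
product = reduce(operator.mul, xs)
print(product)        # 6

# The third argument is an initial value; it is also what reduce returns
# for an empty input (without it, an empty input raises TypeError).
empty_product = reduce(operator.mul, [], 1)
print(empty_product)  # 1

# (((10 + 1) + 2) + 3)
offset_sum = reduce(operator.add, xs, 10)
print(offset_sum)     # 16
```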
Enumerate
- To iterate over a list and use both its elements and their indexes, you could do the following.
documents = ["I", "am", "a", "boy"]
# not Pythonic
for i in range(len(documents)):
    document = documents[i]
    print(i, document)
# also not Pythonic
i = 0
for document in documents:
    print(i, document)
    i += 1
0 I
1 am
2 a
3 boy
0 I
1 am
2 a
3 boy
- The Pythonic solution is enumerate, which produces tuples (index, element):
for i, document in enumerate(documents):
    print(i, document)
0 I
1 am
2 a
3 boy
for i in range(len(documents)): print(i) # not Pythonic
for i, _ in enumerate(documents): print(i) # Pythonic
0
1
2
3
0
1
2
3
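enumerate also accepts a start argument, which helps when the numbering should not begin at 0. A short sketch, reusing the documents list from above (numbered and index_of are hypothetical names):

```python
documents = ["I", "am", "a", "boy"]

# Number the documents from 1 instead of 0.
numbered = [f"{n}. {doc}" for n, doc in enumerate(documents, start=1)]
print(numbered)        # ['1. I', '2. am', '3. a', '4. boy']

# Build a lookup table from each document to its (zero-based) index.
index_of = {doc: i for i, doc in enumerate(documents)}
print(index_of["a"])   # 2
```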
Zip & Unzip
- Often we need to zip two or more lists together.
- zip transforms multiple lists into a single list of tuples of corresponding elements:
list1 = ['a', 'b', 'c']
list2 = [1, 2, 3]
list(zip(list1, list2)) # is [('a', 1), ('b', 2), ('c', 3)]
- You can also "unzip" a list using a strange trick:
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters, numbers)
# ('a', 'b', 'c') (1, 2, 3)
# The asterisk performs argument unpacking, so the call above is equivalent to:
letters, numbers = zip(('a', 1), ('b', 2), ('c', 3))
print(letters, numbers)
# ('a', 'b', 'c') (1, 2, 3)
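When the inputs have different lengths, zip silently stops at the shortest one. The sketch below shows that behavior, the standard-library itertools.zip_longest as an alternative, and the common dict(zip(...)) idiom. The input lists are hypothetical.

```python
from itertools import zip_longest

letters = ['a', 'b', 'c']
numbers = [1, 2]

# zip stops as soon as the shortest input runs out.
truncated = list(zip(letters, numbers))
print(truncated)  # [('a', 1), ('b', 2)]

# zip_longest pads the shorter input with None instead.
padded = list(zip_longest(letters, numbers))
print(padded)     # [('a', 1), ('b', 2), ('c', None)]

# Zipping keys with values is a quick way to build a dictionary.
lookup = dict(zip(letters, [1, 2, 3]))
print(lookup)     # {'a': 1, 'b': 2, 'c': 3}
```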