[ML] Hierarchical Clustering
Hierarchical clustering is also a type of unsupervised learning.

 

Hierarchical clustering is a clustering method that builds a hierarchy of clusters based on the similarity between data points.

It represents the data as a tree and clusters it step by step, which helps you understand the relationships and structure within the data.

Types of Hierarchical Clustering

So what types of hierarchical clustering are there? Let's take a look.

https://www.datacamp.com/tutorial/introduction-hierarchical-clustering-python

 

1. Agglomerative Clustering

  • Agglomerative clustering starts with every data point as its own cluster and repeatedly merges the two closest clusters.
  • This is a bottom-up approach: many small clusters are gradually combined into larger ones, forming a hierarchical structure.
  • Merging is repeated until a single cluster containing all the data remains.

2. Divisive Clustering

  • Divisive clustering starts with all the data in a single cluster and progressively splits that cluster.
  • This is a top-down approach: a large cluster is gradually divided into smaller ones, forming a hierarchical structure.
  • Splitting is repeated until every data point forms its own cluster.

Distance Measurement

Hierarchical clustering uses a distance measure to quantify how similar data points are.
  • Euclidean distance is the usual choice, but Manhattan distance or cosine distance (one minus cosine similarity) can also be used.
  • A distance expresses the difference between two data points as a number, and the choice of measure directly affects which clusters get merged.
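To make the three measures concrete, here is a minimal sketch using `scipy.spatial.distance`. The two points are made up for illustration. Note that SciPy's `cosine` returns a cosine distance (1 minus the cosine similarity), not the similarity itself.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine

# Two illustrative points (values chosen only for this example).
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

d_euclid = euclidean(a, b)     # straight-line distance: sqrt(3^2 + 4^2)
d_manhattan = cityblock(a, b)  # sum of absolute differences: |3| + |4|
d_cosine = cosine(a, b)        # 1 - cosine similarity (angle-based)

print("Euclidean:", d_euclid)
print("Manhattan:", d_manhattan)
print("Cosine distance:", d_cosine)
```

The two points lie in almost the same direction from the origin, so the cosine distance is close to zero even though the Euclidean and Manhattan distances are not.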

Distance Between Clusters (Linkage Criteria)

There are several criteria for measuring the distance between two clusters, and the choice of linkage criterion has a significant effect on the clustering result.
  • Single Linkage: uses the minimum distance between the two clusters, that is, the distance between their two closest points (one from each cluster).
  • Complete Linkage: uses the maximum distance between the two clusters, that is, the distance between their two farthest points (one from each cluster).
  • Average Linkage: uses the average distance over all pairs of points, with one point taken from each cluster.
  • Centroid Linkage: uses the distance between the centroids of the two clusters. The centroid is the mean position of all points in a cluster.
  • Ward's Linkage: merges the pair of clusters whose merge causes the smallest increase in within-cluster variance, so the data is grouped in the direction that keeps that variance as low as possible.
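The criteria above can be computed by hand for two small clusters. The following is a sketch with made-up points, not a library implementation. The Ward value is calculated here as the increase in the within-cluster sum of squared deviations caused by the merge, which is one common way to state the criterion.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two illustrative clusters (values chosen only for this example).
A = np.array([[0.0, 0.0], [0.0, 1.0]])
B = np.array([[3.0, 0.0], [4.0, 0.0]])

# All point-to-point Euclidean distances between A and B, shape (2, 2).
pairwise = cdist(A, B)

single = pairwise.min()     # closest pair of points
complete = pairwise.max()   # farthest pair of points
average = pairwise.mean()   # mean over all cross-cluster pairs
centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def sse(points):
    """Sum of squared distances from each point to the cluster mean."""
    return ((points - points.mean(axis=0)) ** 2).sum()

# Ward: how much the within-cluster sum of squares grows if A and B merge.
ward_increase = sse(np.vstack([A, B])) - (sse(A) + sse(B))

print(single, complete, average, centroid, ward_increase)
```

Even for this tiny example the five criteria give different numbers, which is why the same data can produce different hierarchies depending on the linkage chosen.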

Dendrogram

A dendrogram is a tree diagram that visualizes the hierarchical clustering process.

https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8

  • In this tree each node represents a cluster, and the height at which two branches join represents the distance (dissimilarity) between the clusters being merged.
  • The dendrogram lets you inspect the clustering process visually and understand the structural relationships in the data.
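Because merge heights are distances, cutting the dendrogram at a chosen height yields flat clusters. The sketch below assumes SciPy's `linkage` and `fcluster`. The six points and the cut height of 3.0 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated groups of three points (illustrative data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])

# Each row of Z describes one merge:
# [cluster index, cluster index, merge distance, size of new cluster]
Z = linkage(X, method="single")
heights = Z[:, 2]

# Cut the tree at height 3.0: merges above that distance are undone.
flat = fcluster(Z, t=3.0, criterion="distance")

print("merge heights:", heights)
print("flat cluster labels:", flat)
```

The last merge happens at a much greater height than the others, and that large gap is the usual visual cue for where to cut.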

Basic Principles of Hierarchical Clustering

So how does hierarchical clustering work? The steps below follow the agglomerative (bottom-up) procedure.

http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/117-hcpc-hierarchical-clustering-on-principal-components-essentials/

1. Initialization

Start with each data point as its own cluster. Initially there are n clusters, where n is the number of data points.

2. Distance Calculation

Compute the distance between every pair of data points. These distances are then used to measure how close two clusters are during clustering.

3. Cluster Merging

Merge the two closest clusters. After the merge, recompute the distances between the new cluster and the remaining clusters to prepare for the next merge.

4. Iteration

Repeat the merging step until only one cluster remains. The sequence of merges defines the dendrogram, which can then be plotted and analyzed.
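The four steps can be sketched directly in NumPy. This is a deliberately naive illustration that assumes single linkage and Euclidean distance. The function name `naive_single_linkage` is made up for this example, and production code would normally use `scipy.cluster.hierarchy.linkage` instead.

```python
import numpy as np

def naive_single_linkage(X):
    """Return the merge history as (members_a, members_b, distance) tuples."""
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(X))]

    # Step 2: Euclidean distance between every pair of points.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    merges = []
    # Step 4: keep merging until a single cluster remains.
    while len(clusters) > 1:
        best = (np.inf, None, None)
        # Step 3: find the closest pair of clusters. With single linkage the
        # cluster distance is the smallest point-to-point distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(d)))
        merged = clusters[a] + clusters[b]
        # Delete the higher index first so the lower index stays valid.
        del clusters[b]
        del clusters[a]
        clusters.append(merged)
    return merges

# Illustrative data: two close pairs of points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]])
merges = naive_single_linkage(X)
for left, right, d in merges:
    print(left, "+", right, "at distance", round(d, 3))
```

The recorded merge distances are exactly the heights that a dendrogram would draw. Rescanning every pair of clusters on each iteration is simple but slow, which is why this is only a teaching sketch.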


Hierarchical Clustering: Pros and Cons

Advantages of Hierarchical Clustering

1. Visualization of the hierarchy: The dendrogram shows the clustering process visually, which makes the structural relationships in the data easy to understand intuitively.

2. No need to fix the number of clusters: You do not have to decide the number of clusters in advance. You can choose an appropriate number afterwards by inspecting the dendrogram, which is useful when the structure of the data is not known beforehand.

3. Flexibility: Clustering can be performed with a variety of linkage criteria, so the method can be applied to many kinds of data.

 

Disadvantages of Hierarchical Clustering

1. Computational cost: The distance between every pair of data points must be computed and stored, so memory use grows quadratically with the number of points. On large datasets this becomes expensive and can be a problem in environments with limited computing resources.

2. Merges cannot be undone: Once clusters are merged they are never split again, so a poor merge made early on carries through to the final result and can lead to an incorrect clustering.

3. Sensitivity to noise: Hierarchical clustering can be sensitive to outliers and noisy data. These can distort the clustering result, so additional preprocessing may be needed.
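To get a feel for the first disadvantage, the sketch below counts how many pairwise distances have to be stored as the dataset grows. It assumes SciPy's `pdist`, which returns one distance per unordered pair, and uses random data purely for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)  # fixed seed; the data itself is arbitrary

for n in (10, 100, 1000):
    X = rng.normal(size=(n, 2))
    condensed = pdist(X)  # one entry per unordered pair of points
    # The number of pairs is n * (n - 1) / 2, i.e. quadratic in n.
    print(n, "points ->", condensed.shape[0], "distances,",
          condensed.nbytes, "bytes")
```

Multiplying the number of points by 10 multiplies the number of stored distances by roughly 100. At 100,000 points the same array would hold about 5 billion float64 values, or roughly 40 GB.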


Hierarchical Clustering Example Code

!kaggle datasets download -d mlg-ulb/creditcardfraud
!unzip creditcardfraud.zip

 

# Hierarchical clustering example

# Import the required libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt

# Define a small 2-D sample dataset (10 points)
X = np.array([[5, 3],
              [10, 15],
              [15, 12],
              [24, 10],
              [30, 30],
              [85, 70],
              [71, 80],
              [60, 78],
              [70, 55],
              [80, 91]])

# Leaf labels 1..10, one per row of X
labels = list(range(1, 11))

# Build the merge hierarchy with single linkage (minimum inter-cluster distance)
linked = linkage(X, method='single')

# Plot the dendrogram; the y-axis shows the distance at which clusters merge
plt.figure(figsize=(10, 7))
dendrogram(linked,
           orientation='top',
           labels=labels,
           distance_sort='descending',
           show_leaf_counts=True)
plt.show()