[Syntax] Syntactic analysis in NLP

Syntactic analysis in NLP 


Parsing - repeated PPs and NPs

  • Dependency Parsing compensates for the shortcomings of Constituency Parsing
  • Constituency Parsing Structure Tree

  • Dependency Parsing Structure Tree


Dependency Grammar

  • A word can be a head in one relation and a dependent in another, and the roles can be reversed.
  • Based on dependency relations
  • A dependency structure is determined by the relations between a word (the head) and its dependents.
    • Only semantically related words are connected. Because words can be linked whenever they are semantically related, the structure is relatively free.

  • Well suited to analyzing languages with free word order
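A minimal sketch of how a dependency structure can be stored as head/dependent pairs. The sentence "Her new shipment arrived" and its analysis are assumptions chosen for illustration, and the function names are hypothetical. Note that the pairs record no word positions, which is why the representation copes with free word order.

```python
from collections import defaultdict

# (head, dependent) pairs for "Her new shipment arrived".
# This particular analysis is an illustrative assumption.
ARCS = [
    ("arrived", "shipment"),
    ("shipment", "Her"),
    ("shipment", "new"),
]


def dependents_by_head(arcs):
    """Group the dependents under each head word."""
    table = defaultdict(list)
    for head, dependent in arcs:
        table[head].append(dependent)
    return dict(table)


def heads_that_are_dependents(arcs):
    """Words that are a head in one arc and a dependent in another."""
    heads = {head for head, _ in arcs}
    dependents = {dependent for _, dependent in arcs}
    return heads & dependents


table = dependents_by_head(ARCS)
print(table["shipment"])                # dependents of "shipment"
print(heads_that_are_dependents(ARCS))  # words playing both roles
```

Here "shipment" is the head of "Her" and "new" while also being a dependent of "arrived".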

  • PG
    • S -> NP VP
    • NP -> Det N
    • VP -> V NP …
    • → Word order is important
  • But, Korean
    • Free in word order
    • Omission
    • ungrammatical sentences on the internet
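To see why word order matters for the rules above, here is a toy recognizer. It is a sketch that implements only the three rules as listed (the further VP alternatives behind the "…" are not modeled), and the lexicon and example sentences are hypothetical.

```python
# Hypothetical mini-lexicon mapping words to categories.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "ball": "N", "chased": "V"}


def parse_np(tags, i):
    """NP -> Det N. Return the index just past the NP, or None."""
    if tags[i:i + 2] == ["Det", "N"]:
        return i + 2
    return None


def parse_vp(tags, i):
    """VP -> V NP. Return the index just past the VP, or None."""
    if i < len(tags) and tags[i] == "V":
        return parse_np(tags, i + 1)
    return None


def accepts(sentence):
    """S -> NP VP. True only if the whole sentence is consumed."""
    tags = [LEXICON[word] for word in sentence.lower().split()]
    after_np = parse_np(tags, 0)
    if after_np is None:
        return False
    return parse_vp(tags, after_np) == len(tags)


print(accepts("the dog chased a ball"))   # canonical order
print(accepts("chased the dog a ball"))   # same words, different order
```

The second sentence uses exactly the same words but is rejected, because the rules fix the position of each constituent.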

  • Just group words by meaning! (Semantic relations are easy to see, and the diagram stays simple.)
    • Recently the trend has moved on to Partial Parsing and Chunking

Partial parsing

  • Parse only part of the sentence, not all of it
  • Full parse trees - very useful
    • They make the structure and meaning of a sentence easy to grasp
    • But the grammar is complex, and grammar rules can conflict with one another.
      • They also generate ill-formed and useless sentences.
    • They take a long time to build → costly and laborious
    • Some NLP tasks may not require full hierarchical parses.
  • Simpler parsing can be more effective for such tasks.
    • Syntactic complexity drops and processing time shrinks.
  • Full parsing is not robust enough for many NLP applications.
  • In noisy environments, full parsing fails to identify a good parse tree.

What Is Partial Parsing?

  • Introduced as an alternative to the difficulties of traditional full parsing
  • Described as a technique that recovers syntactic information efficiently and reliably from unrestricted text, at the cost of completeness and depth of analysis
  • The analysis is less complete and less deep, hence "shallow parsing"
    • Splits a sentence into a sequence of syntactic constituents, or chunks
    • That is, sequences of words grouped by their linguistic properties
  • A single-level structure
    • Used when working with entity names.
    • Terminology Discovery
    • Named Entity Recognition
    • Text Mining
    • An intermediate step providing input to further full parsing stages

Chunking

  • Splits a sentence into non-overlapping, non-recursive segments.
  • Chunks are never drawn overlapping.
💡 [Her new shipment] NP [of] PP [facemasks] NP [arrived] VP
  • Not hierarchical
  • Flat segmented representation

The process of identifying and classifying the flat, non-overlapping segments of a sentence

  • Segments that make up the basic non-recursive phrases corresponding to the parts of speech of the major content words

→ NPs, VPs, APs & PPs


Lacking hierarchical structure

  • A simple bracketing notation is enough to show the position and type of each chunk
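Because the notation is flat, a single regular expression is enough to read it back. The sketch below assumes the "[words] LABEL" layout used in these notes; the function name and pattern are hypothetical.

```python
import re

# One chunk: bracketed words followed by an upper-case label.
CHUNK_PATTERN = re.compile(r"\[([^\[\]]+)\]\s*([A-Z]+)")


def parse_brackets(text):
    """Return (label, words) pairs from '[words] LABEL' notation."""
    return [(label, words.split())
            for words, label in CHUNK_PATTERN.findall(text)]


chunks = parse_brackets(
    "[Her new shipment] NP [of] PP [facemasks] NP [arrived] VP"
)
for label, words in chunks:
    print(label, words)

# Chunks are flat and non-overlapping, so joining them in order
# gives back the original sentence.
sentence = " ".join(word for _, words in chunks for word in words)
print(sentence)
```

No stack or tree is needed, which is exactly what the lack of hierarchy buys.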

Segmenting

Split the sentence into non-overlapping, non-recursive segments:

Her new shipment of facemasks arrived
[Her new shipment] [of] [facemasks] [arrived]
  • Find the fundamental phrases

Labeling

  • Assign the correct tag to each chunk found
  • The tag set may vary depending on the task.

General guidelines

  • Non-recursive → a chunk is not subdivided (a segment is not split further into smaller chunks).
  • Each chunk keeps the head of its phrase.
    • NP → noun
  • All material that appears before the head word stays inside the constituent
💡 [Her new shipment] NP of facemasks arrived

Basic phrases in most approaches

  1. Include the headword of the phrase and all pre-head material within the constituent
  2. Exclude post-head material entirely.
    • This exclusion can lead to oddities, such as PPs and VPs that often consist of nothing but their heads.
💡 Example
[Her new shipment] NP [of] PP [facemasks] NP [arrived] VP
[a flight] NP [from] PP [Indianapolis] NP [to] PP [Houston] NP [on] PP [NWA] NP
    • But it removes much of the attachment ambiguity!
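The two rules above (keep the head plus pre-head material, drop post-head material) can be sketched as a small rule-based chunker. This is an illustration under stated assumptions, not a standard implementation: the input is already part-of-speech tagged, and the simplified tag names (POSS, DET, ADJ, NOUN, PREP, VERB) and function names are hypothetical.

```python
# Tags treated as pre-head material, and the chunk label for each head tag.
PRE_HEAD = {"DET", "POSS", "ADJ"}
HEAD_LABEL = {"NOUN": "NP", "PREP": "PP", "VERB": "VP"}


def chunk(tagged):
    """Segment and label: each chunk ends at its head word."""
    chunks, pending = [], []
    for word, tag in tagged:
        if tag in PRE_HEAD:
            pending.append(word)  # held until the head arrives
        elif tag in HEAD_LABEL:
            # Segmenting: close the chunk at the head, so nothing
            # after the head (post-head material) is ever included.
            # Labeling: the label comes from the head's tag.
            chunks.append((HEAD_LABEL[tag], pending + [word]))
            pending = []
    return chunks  # trailing pre-head words with no head stay unchunked


def to_brackets(chunks):
    """Render chunks in the '[words] LABEL' notation."""
    return " ".join(f"[{' '.join(words)}] {label}" for label, words in chunks)


example = [("Her", "POSS"), ("new", "ADJ"), ("shipment", "NOUN"),
           ("of", "PREP"), ("facemasks", "NOUN"), ("arrived", "VERB")]
print(to_brackets(chunk(example)))
```

Since every chunk closes at its head, "of" and "arrived" end up as one-word PP and VP chunks, and the chunker never has to decide what "of facemasks" attaches to. Noun compounds would be split by this sketch, so a practical chunker needs richer rules.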