
Dependency parsing

With dependency parsing, you examine the dependencies between the words of a sentence in order to determine its grammatical structure. You parse an input sequence by starting at the root of the input, and then you jump from token to token based on grammatical binary relationships (dependencies).

Once you have jumped to every part of the sentence, you have parsed the sentence. The result of the parse is a **directed graph**.

[Images: three example dependency parses]

Description of the images above: you start at the root, which points to the main verb of the sentence. From the verb you go to its subject, then to the argument of the verb, then to the modifiers of that argument, for instance *morning*.

So there is no phrase-structure hierarchy; you simply describe the sentence through the relationships that exist between the tokens. These relations are mostly based on structure rather than meaning, but they can also be based on meaning.
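A minimal sketch of this idea: a parsed sentence is just a set of binary head-to-dependent relations. The token indices and relation labels below are illustrative, not the output of a real parser.

```python
# Represent the dependency parse of "I saw her this morning"
# as (head index, dependent index, relation label) triples.
sentence = ["ROOT", "I", "saw", "her", "this", "morning"]

arcs = [
    (0, 2, "root"),   # ROOT -> saw (main verb)
    (2, 1, "nsubj"),  # saw -> I (subject of the verb)
    (2, 3, "obj"),    # saw -> her (argument of the verb)
    (5, 4, "det"),    # morning -> this (modifier of the argument)
    (2, 5, "obl"),    # saw -> morning (temporal modifier)
]

# Walking the arcs recovers the whole structure with no extra hierarchy.
for head, dep, rel in arcs:
    print(f"{sentence[head]} --{rel}--> {sentence[dep]}")
```

Note that the flat list of triples is the entire analysis: there are no intermediate phrase nodes, only tokens and labelled relations between them.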

What is this for

Dependency parsing explicitly and directly encodes information about relationships between tokens, information which is often buried and hard to see in constituent-based trees such as those produced by a context-free grammar.

Dependency parsing also deals very well with morphologically complex languages with free word order (unlike English) without needing a highly specific grammar. For instance, an affix might mark the subject. For these kinds of languages, chunking or constituency-based parsing works worse. The result is simply a set of words connected by binary grammatical relationships.

The main content words in the sentence are called heads, and the relations are defined with respect to the heads.

Types of relations

Clauses

Syntactic roles such as subject and object, defined relative to a head (typically the verb).

Modifier

There are modifiers of heads, as in *blue sky*, where *blue* is a modifier of the head *sky*.

Data structure

After you have analysed a sentence, you can store it as a graph. A graph can be stored as a set of tuples with two items: $\{(a,b), (b,c), (c,d)\}$. In this example there is an arc from $a$ to $b$, from $b$ to $c$, and from $c$ to $d$.

We call this graph $G$, with $G = (V,A)$, where $V$ is a set of vertices (tokens in the vocabulary, or stems and affixes), i.e. the sentence, or what you can connect. $A$ is a set of labelled ordered pairs of vertices (the arcs); the label records the type of the binary relationship.

$V$ is what you can connect and $A$ is how it is connected.
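A small sketch of $G = (V,A)$ in code, using the *blue sky* example from above (the `ROOT` vertex and the relation labels are illustrative assumptions):

```python
# V: the set of vertices (tokens) -- what you can connect.
V = {"ROOT", "blue", "sky"}

# A: labelled ordered pairs (head, dependent, label) -- how they connect.
A = {
    ("ROOT", "sky", "root"),
    ("sky", "blue", "amod"),  # "blue" modifies the head "sky"
}

G = (V, A)

# Sanity check: every arc must connect vertices that are actually in V.
assert all(head in V and dep in V for (head, dep, _) in A)
```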

Dependency tree

A dependency tree is a directed graph where

  • There is one root node with no incoming arc.
  • Every other vertex has exactly one incoming arc.
  • There is only one path from the root to each vertex in $V$.

These conditions are what make the dependency tree well-formed in the way we care about.

How to get these models?

Using special treebanks: dependency treebanks. Linguists have created corpora annotated with typed, binary, directed dependency relations, which are used to train dependency parsers.

If you don't have a dependency treebank, you can also derive dependencies deterministically from a constituency-based treebank, but this does not work very well; a trained linguist can do better.

Evaluation

Labelled score

This checks whether the relation has the correct label. An arc counts as correct as long as its label matches the gold standard, regardless of whether the head it comes from is correct.

Unlabelled score

This does not check the label. An arc counts as correct as long as the dependent is attached to the correct head.

Labelled attachment score

This combines the labelled and unlabelled scores.

It counts how many estimated dependencies have the same head and dependent as the gold standard, and also the correct relation type.
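A sketch of how these scores can be computed, given gold and predicted arcs. The sentence, labels, and the `attachment_scores` helper are all made up for illustration.

```python
def attachment_scores(gold, pred):
    """Unlabelled and labelled attachment scores.

    gold, pred: dicts mapping each dependent token -> (head, label).
    """
    assert gold.keys() == pred.keys()
    n = len(gold)
    # Unlabelled: correct head, label ignored.
    uas = sum(gold[d][0] == pred[d][0] for d in gold) / n
    # Labelled: correct head AND correct label.
    las = sum(gold[d] == pred[d] for d in gold) / n
    return uas, las

gold = {"I": ("saw", "nsubj"), "her": ("saw", "obj"), "saw": ("ROOT", "root")}
# Predicted parse gets all heads right but mislabels one relation:
pred = {"I": ("saw", "nsubj"), "her": ("saw", "iobj"), "saw": ("ROOT", "root")}

uas, las = attachment_scores(gold, pred)
print(uas, las)  # all 3 heads correct; only 2 of 3 head+label pairs correct
```

Note how a wrong label hurts the labelled score but not the unlabelled one, which is exactly the difference between the two metrics above.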