Non-Projective Dependency Parsing in Expected Linear Time
Nivre, Joakim

Article Structure

Abstract

We present a novel transition system for dependency parsing, which constructs arcs only between adjacent words but can parse arbitrary non-projective trees by swapping the order of words in the input.

Introduction

Syntactic parsing using dependency structures has become a standard technique in natural language processing with many different parsing models, in particular data-driven models that can be trained on syntactically annotated corpora (Yamada and Matsumoto, 2003; Nivre et al., 2004; McDonald et al., 2005a; Attardi, 2006; Titov and Henderson, 2007).

Background Notions 2.1 Dependency Graphs and Trees

Given a set L of dependency labels, a dependency graph for a sentence x = w1, …, wn …

Transitions for Dependency Parsing

Having defined the set of configurations, including initial and terminal configurations, we will now focus on the transition set T required for dependency parsing.
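The transition set described in this section can be sketched in Python as a stack-and-buffer configuration with the transitions SHIFT, LEFT-ARC, RIGHT-ARC, and SWAP. This is an illustrative sketch only: the class and method names are ours, and arc labels are omitted for brevity.

```python
# Illustrative sketch of a stack-and-buffer transition system for
# dependency parsing (labels omitted). Node 0 is the artificial root.

class Configuration:
    def __init__(self, n):
        self.stack = [0]                    # starts with the root node
        self.buffer = list(range(1, n + 1)) # input words 1..n
        self.arcs = set()                   # (head, dependent) pairs

    def shift(self):
        # SHIFT: move the first buffer node onto the stack
        self.stack.append(self.buffer.pop(0))

    def left_arc(self):
        # LEFT-ARC: add an arc from the top node to the node below it,
        # then remove the lower node from the stack
        top, second = self.stack[-1], self.stack[-2]
        self.arcs.add((top, second))
        del self.stack[-2]

    def right_arc(self):
        # RIGHT-ARC: add an arc from the second node to the top node,
        # then pop the top node
        top, second = self.stack[-1], self.stack[-2]
        self.arcs.add((second, top))
        self.stack.pop()

    def swap(self):
        # SWAP: move the second-topmost node back to the buffer;
        # only permissible when 0 < stack[-2] < stack[-1]
        i = self.stack.pop(-2)
        self.buffer.insert(0, i)

    def is_terminal(self):
        return not self.buffer and len(self.stack) == 1
```

For example, a two-word sentence whose tree is 0 → 1 → 2 can be parsed with the sequence SHIFT, SHIFT, RIGHT-ARC, RIGHT-ARC.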

Experiments

Our experiments are based on five data sets from the CoNLL-X shared task: Arabic, Czech, Danish, Slovene, and Turkish (Buchholz and Marsi, 2006).

Related Work

Processing non-projective trees by swapping the order of words has recently been proposed by both Nivre (2008b) and Titov et al.

Conclusion

We have presented a novel transition system for dependency parsing that can handle unrestricted non-projective trees.

Topics

dependency trees

Appears in 16 sentences as: dependency tree (6) dependency trees (10)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. dependency trees, as illustrated in Figure 1.
    Page 1, “Introduction”
  2. In a projective dependency tree, the yield of every subtree is a contiguous substring of the sentence.
    Page 1, “Introduction”
  3. But allowing non-projective dependency trees also makes parsing empirically harder, because it requires that we model relations between nonadjacent structures over potentially unbounded distances, which often has a negative impact on parsing accuracy.
    Page 1, “Introduction”
  4. In Section 3, we first define a minimal transition system and explain how it can be used to perform projective dependency parsing in linear time; we then extend the system with a single transition for swapping the order of words in the input and demonstrate that the extended system can be used to parse unrestricted dependency trees with a time complexity that is quadratic in the worst case but still linear in the best case.
    Page 2, “Introduction”
  5. The system Sp = (C, Tp, cs, Ct) is sound and complete for the set of projective dependency trees (over some label set L) and has been used, in slightly different variants, by a number of transition-based dependency parsers (Yamada and Matsumoto, 2003; Nivre, 2004; Attardi, 2006;
    Page 3, “Transitions for Dependency Parsing”
  6. Given the simplicity of the extension, it is rather remarkable that the system Su = (C, Tu, cs, Ct) is sound and complete for the set of all dependency trees (over some label set L), including all non-projective trees.
    Page 5, “Transitions for Dependency Parsing”
  7. For completeness, we note first that projectivity is not a property of a dependency tree in itself, but of the tree in combination with a word order, and that a tree can always be made projective by reordering the nodes.
    Page 5, “Transitions for Dependency Parsing”
  8. For instance, let x be a sentence with dependency tree G = (Vx, A), and let <G be the total order on Vx defined by an inorder traversal of G that respects the local ordering of a node and its children given by the original word order.
    Page 5, “Transitions for Dependency Parsing”
  9. If the words of a sentence x with dependency tree G are already in projective order, this means that G is projective with respect to x and that we can parse the sentence using only transitions in Tp,
    Page 5, “Transitions for Dependency Parsing”
  10. Since any input can be sorted using SHIFT and SWAP, and any projective tree can be built using SHIFT, LEFT-ARCl, and RIGHT-ARCl, the system Su is complete for the set of all dependency trees.
    Page 5, “Transitions for Dependency Parsing”
  11. In Figure 4, we define an oracle function o for the system Su, which implements this “sort and parse” strategy and predicts the optimal transition t out of the current configuration c, given the target dependency tree G = (Vx, A) and the projective order <G.
    Page 5, “Transitions for Dependency Parsing”
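The projective order <G described in these excerpts can be computed by a straightforward inorder traversal. The sketch below assumes a head-array representation of the tree; the function name and representation are ours, not from the paper.

```python
# Illustrative sketch: compute a projective (inorder) order for the
# nodes of a dependency tree, keeping each node's children in their
# original word order. heads[i] is the head of node i; heads[0] is
# None because node 0 is the artificial root.

def projective_order(heads):
    n = len(heads)
    children = {i: [] for i in range(n)}
    for i in range(1, n):
        children[heads[i]].append(i)  # children listed in word order

    order = []

    def visit(node):
        # children to the left of the node come first, then the node,
        # then children to its right
        for c in children[node]:
            if c < node:
                visit(c)
        order.append(node)
        for c in children[node]:
            if c > node:
                visit(c)

    visit(0)
    return order
```

On a projective tree this traversal returns the nodes in their original order, so no reordering (and hence no SWAP) is needed; on a non-projective tree it returns the order into which SWAP must sort the input.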


dependency parsing

Appears in 13 sentences as: dependency parsers (1) Dependency Parsing (2) dependency parsing (10)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. We present a novel transition system for dependency parsing, which constructs arcs only between adjacent words but can parse arbitrary non-projective trees by swapping the order of words in the input.
    Page 1, “Abstract”
  2. However, one problem that still has not found a satisfactory solution in data-driven dependency parsing is the treatment of discontinuous syntactic constructions, usually modeled by non-projective
    Page 1, “Introduction”
  3. Current approaches to data-driven dependency parsing typically use one of two strategies to deal with non-projective trees (unless they ignore them completely).
    Page 1, “Introduction”
  4. In Section 2, we define the formal representations needed and introduce the framework of transition-based dependency parsing.
    Page 2, “Introduction”
  5. In Section 3, we first define a minimal transition system and explain how it can be used to perform projective dependency parsing in linear time; we then extend the system with a single transition for swapping the order of words in the input and demonstrate that the extended system can be used to parse unrestricted dependency trees with a time complexity that is quadratic in the worst case but still linear in the best case.
    Page 2, “Introduction”
  6. Following Nivre (2008a), we define a transition system for dependency parsing as a quadruple S = (C, T, cs, Ct), where
    Page 2, “Background Notions 2.1 Dependency Graphs and Trees”
  7. Figure 2: Transitions for dependency parsing; Tp =
    Page 3, “Background Notions 2.1 Dependency Graphs and Trees”
  8. Having defined the set of configurations, including initial and terminal configurations, we will now focus on the transition set T required for dependency parsing.
    Page 3, “Transitions for Dependency Parsing”
  9. 3.1 Projective Dependency Parsing
    Page 3, “Transitions for Dependency Parsing”
  10. The minimal transition set Tp for projective dependency parsing contains three transitions:
    Page 3, “Transitions for Dependency Parsing”
  11. The system Sp = (C, Tp, cs, Ct) is sound and complete for the set of projective dependency trees (over some label set L) and has been used, in slightly different variants, by a number of transition-based dependency parsers (Yamada and Matsumoto, 2003; Nivre, 2004; Attardi, 2006;
    Page 3, “Transitions for Dependency Parsing”


time complexity

Appears in 7 sentences as: time complexity (7)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. Adding the swapping operation changes the time complexity for deterministic parsing from linear to quadratic in the worst case, but empirical estimates based on treebank data show that the expected running time is in fact linear for the range of data attested in the corpora.
    Page 1, “Abstract”
  2. There is recent work on algorithms that can cope with important subsets of all non-projective trees in polynomial time (Kuhlmann and Satta, 2009; Gomez-Rodriguez et al., 2009), but the time complexity is at best O(n⁶), which can be problematic in practical applications.
    Page 1, “Introduction”
  3. In Section 3, we first define a minimal transition system and explain how it can be used to perform projective dependency parsing in linear time; we then extend the system with a single transition for swapping the order of words in the input and demonstrate that the extended system can be used to parse unrestricted dependency trees with a time complexity that is quadratic in the worst case but still linear in the best case.
    Page 2, “Introduction”
  4. Assuming that the calls o(c) and t(c) can both be performed in constant time, the worst-case time complexity of a deterministic parser based on a transition system S is given by an upper bound on the length of transition sequences in S.
    Page 3, “Background Notions 2.1 Dependency Graphs and Trees”
  5. As noted in section 2, the worst-case time complexity of a deterministic transition-based parser is given by an upper bound on the length of transition sequences.
    Page 4, “Transitions for Dependency Parsing”
  6. Let us now consider the time complexity of the extended system Su = (C, Tu, cs, Ct) and let us begin by observing that 2n is still a lower bound on the number of transitions required to reach a terminal configuration.
    Page 5, “Transitions for Dependency Parsing”
  7. As a result, the time complexity of deterministic parsing is O(n²) in the worst case, which is rare, but O(n) in the best case, which is common, and experimental results on data from five languages support the conclusion that expected running time is linear in the length of the sentence.
    Page 8, “Conclusion”
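A rough way to illustrate this complexity argument: since SWAP performs an adjacent transposition, the number of SWAPs needed to sort the input into projective order can be estimated by counting inversions between the original word order and the projective order. The helper below is our illustration, not taken from the paper.

```python
# Illustrative sketch: count inversions between an ordering and the
# identity order. Zero inversions (a projective tree) means no SWAPs,
# so parsing takes 2n transitions (linear); a fully reversed order has
# n(n-1)/2 inversions, matching the quadratic worst case.

def count_inversions(order):
    """Number of pairs (a, b) with a < b that appear out of order."""
    inv = 0
    for a in range(len(order)):
        for b in range(a + 1, len(order)):
            if order[a] > order[b]:
                inv += 1
    return inv
```

For example, an already-projective order like [0, 1, 2, 3] has zero inversions, while a fully reversed order of n nodes has n(n-1)/2, which is where the O(n²) worst case comes from.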


word order

Appears in 5 sentences as: word order (5)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. This has the effect that the order of the nodes i and j in the appended list Σ + B is reversed compared to the original word order in the sentence.
    Page 4, “Transitions for Dependency Parsing”
  2. It is important to note that SWAP is only permissible when the two nodes on top of the stack are in the original word order, which prevents the same two nodes from being swapped more than once, and when the leftmost node i is distinct from the root node 0.
    Page 4, “Transitions for Dependency Parsing”
  3. LEFT-ARCl or RIGHT-ARCl to subtrees whose yields are not adjacent according to the original word order.
    Page 5, “Transitions for Dependency Parsing”
  4. For completeness, we note first that projectivity is not a property of a dependency tree in itself, but of the tree in combination with a word order, and that a tree can always be made projective by reordering the nodes.
    Page 5, “Transitions for Dependency Parsing”
  5. For instance, let x be a sentence with dependency tree G = (Vx, A), and let <G be the total order on Vx defined by an inorder traversal of G that respects the local ordering of a node and its children given by the original word order.
    Page 5, “Transitions for Dependency Parsing”


subtrees

Appears in 4 sentences as: subtrees (4)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. This is not the case for the tree in Figure 1, where the subtrees rooted at node 2 (hearing) and node 4 (scheduled) both have discontinuous yields.
    Page 1, “Introduction”
  2. The fact that we can swap the order of nodes, implicitly representing subtrees, means that we can construct non-projective trees by applying
    Page 4, “Transitions for Dependency Parsing”
  3. LEFT-ARCl or RIGHT-ARCl to subtrees whose yields are not adjacent according to the original word order.
    Page 5, “Transitions for Dependency Parsing”
  4. The system reuses standard techniques for building projective trees by combining adjacent nodes (representing subtrees with adjacent yields), but adds a simple mechanism for swapping the order of nodes on the stack, which gives a system that is sound and complete for the set of all dependency trees over a given label set but behaves exactly like the standard system for the subset of projective trees.
    Page 8, “Conclusion”


treebank

Appears in 3 sentences as: treebank (2) treebanks (1)
In Non-Projective Dependency Parsing in Expected Linear Time
  1. Adding the swapping operation changes the time complexity for deterministic parsing from linear to quadratic in the worst case, but empirical estimates based on treebank data show that the expected running time is in fact linear for the range of data attested in the corpora.
    Page 1, “Abstract”
  2. When building practical parsing systems, the oracle can be approximated by a classifier trained on treebank data, a technique that has been used successfully in a number of systems (Yamada and Matsumoto, 2003; Nivre et al., 2004; Attardi, 2006).
    Page 3, “Background Notions 2.1 Dependency Graphs and Trees”
  3. These languages have been selected because the data come from genuine dependency treebanks, whereas all the other data sets are based on some kind of conversion from another type of representation, which could potentially distort the distribution of different types of structures in the data.
    Page 6, “Experiments”
