Efficient Third-Order Dependency Parsers
Koo, Terry and Collins, Michael

Article Structure

Abstract

We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time.

Introduction

Dependency grammar has proven to be a very useful syntactic formalism, due in no small part to the development of efficient parsing algorithms (Eisner, 2000; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007), which can be leveraged for a wide variety of learning methods, such as feature-rich discriminative models (Lafferty et al., 2001; Collins, 2002; Taskar et al., 2003).

Dependency parsing

In dependency grammar, syntactic relationships are represented as head-modifier dependencies: directed arcs between a head, which is the more “essential” word in the relationship, and a modifier, which supplements the meaning of the head.
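
A small invented example may make this representation concrete (the sentence and arcs below are ours, not taken from the paper): each arc points from a head to one of its modifiers, and an artificial root token “*” heads the main verb.

    # "* John saw Mary": token 0 is the artificial root "*"; each dependency is a
    # (head_index, modifier_index) pair pointing from head to modifier.
    sentence = ["*", "John", "saw", "Mary"]
    dependencies = {(0, 2),   # * -> saw : the root heads the main verb
                    (2, 1),   # saw -> John
                    (2, 3)}   # saw -> Mary

    # In a well-formed dependency tree every non-root word has exactly one head,
    # and the arcs form a directed tree spanning the sentence.
    heads = {m: h for (h, m) in dependencies}
    assert all(m in heads for m in range(1, len(sentence)))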

Existing parsing algorithms

Our new third-order dependency parsers build on ideas from existing parsing algorithms.

New third-order parsing algorithms

In this section we describe our new third-order dependency parsing algorithms.

Extensions

We briefly outline a few extensions to our algorithms; we hope to explore these in future work.

Related work

Our method augments each span with the index of the head that governs that span, in a manner superficially similar to parent annotation in CFGs (Johnson, 1998).
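
A rough sketch of what such an augmentation might look like (the item type below is hypothetical; the paper's actual chart items differ in detail): each chart entry records its two boundary indices plus the index of the external head governing the span, so a derivation that additionally picks a split point involves at most four free indices, consistent with O(n^4) time.

    from typing import NamedTuple

    class GSpan(NamedTuple):
        """Hypothetical chart item: a span annotated with its governing head."""
        left: int    # left boundary of the span
        right: int   # right boundary of the span
        head: int    # index of the word outside the span that governs it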

Parsing experiments

In order to evaluate the effectiveness of our parsers in practice, we apply them to the Penn WSJ Treebank (Marcus et al., 1993) and the Prague Dependency Treebank (Hajic et al., 2001; Hajic, 1998). We use standard training, validation, and test splits to facilitate comparisons.

Conclusion

We have presented new parsing algorithms that are capable of efficiently parsing third-order factorizations, including both grandchild and sibling interactions.

Topics

parsing algorithms

Appears in 29 sentences as: parsing algorithm (7) parsing algorithms (22)
In Efficient Third-Order Dependency Parsers
  1. Dependency grammar has proven to be a very useful syntactic formalism, due in no small part to the development of efficient parsing algorithms (Eisner, 2000; McDonald et al., 2005b; McDonald and Pereira, 2006; Carreras, 2007), which can be leveraged for a wide variety of learning methods, such as feature-rich discriminative models (Lafferty et al., 2001; Collins, 2002; Taskar et al., 2003).
    Page 1, “Introduction”
  2. These parsing algorithms share an important characteristic: they factor dependency trees into sets of parts that have limited interactions.
    Page 1, “Introduction”
  3. A crucial limitation of factored parsing algorithms is that the associated parts are typically quite small, losing much of the contextual information within the dependency tree.
    Page 1, “Introduction”
  4. must be balanced against any resulting increase in the computational cost of the parsing algorithm.
    Page 1, “Introduction”
  5. In this paper, we present new third-order parsing algorithms that increase both the size and variety of the parts participating in the factorization, while simultaneously maintaining computational requirements of O(n^4) time and O(n^3) space.
    Page 1, “Introduction”
  6. Efficient new third-order parsing algorithms.
    Page 1, “Introduction”
  7. The remainder of this paper is divided as follows: Sections 2 and 3 give background, Sections 4 and 5 describe our new parsing algorithms, Section 6 discusses related work, Section 7 presents our experimental results, and Section 8 concludes.
    Page 1, “Introduction”
  8. For certain factorizations, efficient parsing algorithms exist for solving Eq.
    Page 2, “Dependency parsing”
  9. We define the order of a part according to the number of dependencies it contains, with analogous terminology for factorizations and parsing algorithms.
    Page 2, “Dependency parsing”
  10. Our new third-order dependency parsers build on ideas from existing parsing algorithms.
    Page 2, “Existing parsing algorithms”
  11. Since each derivation is defined by two fixed indices (the boundaries of the span) and a third free index (the split point), the parsing algorithm requires O(n^3) time and O(n^2) space (Eisner, 1996; McAllester, 1999). (A sketch of this first-order dynamic program follows this list.)
    Page 3, “Existing parsing algorithms”
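
As a point of reference for item 11, here is a minimal sketch of the classic first-order projective dynamic program (Eisner, 1996) that the quoted complexity analysis refers to; this is illustrative code, not the authors' implementation, and the dense arc-score matrix is an assumption of the sketch.

    def eisner_first_order(score):
        """score[h][m]: score of an arc from head h to modifier m, over tokens
        0..n where token 0 is the artificial root "*".  Returns the score of
        the best projective tree (backpointers omitted for brevity)."""
        n = len(score) - 1
        NEG = float("-inf")
        # C[s][t][d]: best complete span over s..t; I[s][t][d]: best incomplete span.
        # d = 1 means the head is the left endpoint s; d = 0 means the head is t.
        C = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
        I = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
        for s in range(n + 1):
            C[s][s][0] = C[s][s][1] = 0.0
        for width in range(1, n + 1):
            for s in range(n + 1 - width):
                t = s + width
                # Two fixed indices (s, t) plus one free split point r: O(n^3) time.
                best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
                I[s][t][1] = best + score[s][t]   # arc s -> t
                I[s][t][0] = best + score[t][s]   # arc t -> s
                C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
                C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
        return C[0][n][1]   # complete span headed by the root token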

dependency parsing

Appears in 10 sentences as: dependency parse (1) dependency parsers (2) dependency parsing (7)
In Efficient Third-Order Dependency Parsers
  1. We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time.
    Page 1, “Abstract”
  2. Consequently, recent work in dependency parsing has been restricted to applications of second-order parsers, the most powerful of which (Carreras, 2007) requires O(n^4) time and O(n^3) space, while being limited to second-order parts.
    Page 1, “Introduction”
  3. For a sentence x, we define dependency parsing as a search for the highest-scoring analysis of x (the objective is written out after this list):
    Page 2, “Dependency parsing”
  4. Our new third-order dependency parsers build on ideas from existing parsing algorithms.
    Page 2, “Existing parsing algorithms”
  5. In this section we describe our new third-order dependency parsing algorithms.
    Page 3, “New third-order parsing algorithms”
  6. As a final note, the parsing algorithms described in this section fall into the category of projective dependency parsers, which forbid crossing dependencies.
    Page 6, “New third-order parsing algorithms”
  7. Eisner (2000) defines dependency parsing models where each word has a set of possible “senses” and the parser recovers the best joint assignment of syntax and senses.
    Page 6, “Related work”
  8. Following standard practice for higher-order dependency parsing (McDonald and Pereira, 2006; Carreras, 2007), Models 1 and 2 evaluate not only the relevant third-order parts, but also the lower-order parts that are implicit in their third-order factorizations.
    Page 7, “Parsing experiments”
  9. For example, Model 1 defines feature mappings for dependencies, siblings, grandchildren, and grand-siblings, so that the score of a dependency parse is given by: (the formula is reconstructed after this list)
    Page 7, “Parsing experiments”
  10. A second area for future work lies in applications of dependency parsing.
    Page 9, “Conclusion”
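
A hedged reconstruction of the formulas referenced in items 3 and 9 above, in our own notation (Y(x) is the set of candidate trees for x and w is the parameter vector; fdep and fgsib appear in the excerpts, while fsib and fgch are our guesses for the remaining two mappings):

    y^*(x) = \operatorname*{argmax}_{y \in Y(x)} \; \mathrm{SCORE}(x, y)

    \mathrm{SCORE}(x, y) \;=\; \sum_{(h,m)} w \cdot f_{\mathrm{dep}}(x, h, m)
        \;+\; \sum_{(h,m,s)} w \cdot f_{\mathrm{sib}}(x, h, m, s)
        \;+\; \sum_{(g,h,m)} w \cdot f_{\mathrm{gch}}(x, g, h, m)
        \;+\; \sum_{(g,h,m,s)} w \cdot f_{\mathrm{gsib}}(x, g, h, m, s)

Here the four sums range over the dependencies, siblings, grandchildren, and grand-siblings of y, matching the lower-order and third-order parts that Model 1 evaluates.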

dependency tree

Appears in 9 sentences as: dependency tree (7) dependency trees (2)
In Efficient Third-Order Dependency Parsers
  1. These parsing algorithms share an important characteristic: they factor dependency trees into sets of parts that have limited interactions.
    Page 1, “Introduction”
  2. By exploiting the additional constraints arising from the factorization, maximizations or summations over the set of possible dependency trees can be performed efficiently and exactly.
    Page 1, “Introduction”
  3. A crucial limitation of factored parsing algorithms is that the associated parts are typically quite small, losing much of the contextual information within the dependency tree.
    Page 1, “Introduction”
  4. A complete analysis of a sentence is given by a dependency tree : a set of dependencies that forms a rooted, directed tree spanning the words of the sentence.
    Page 1, “Dependency parsing”
  5. Every dependency tree is rooted at a special “*” token, allowing the …
    Page 1, “Dependency parsing”
  6. A common strategy, and one which forms the focus of this paper, is to factor each dependency tree into small parts, which can be scored in isolation.
    Page 2, “Dependency parsing”
  7. That is, we treat the dependency tree y as a set of parts p, each of which makes a separate contribution to the score of y (see the formula after this list).
    Page 2, “Dependency parsing”
  8. The first type of parser we describe uses a “first-order” factorization, which decomposes a dependency tree into its individual dependencies.
    Page 2, “Existing parsing algorithms”
  9. The first parser, Model 0, factors each dependency tree into a set of grandchild parts—pairs of dependencies connected head-to-tail.
    Page 3, “New third-order parsing algorithms”
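
Items 6, 7, and 9 above amount to the standard part-factored scoring scheme; the formula below is our paraphrase rather than a quotation:

    \mathrm{SCORE}(x, y) = \sum_{p \in y} \mathrm{SCOREPART}(x, p)

Under Model 0 each part p is a grandchild part (g, h, m): the pair of dependencies (g, h) and (h, m) connected head-to-tail.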

POS tags

Appears in 9 sentences as: POS tag (3) POS tags (10)
In Efficient Third-Order Dependency Parsers
  1. These indices allow the use of arbitrary features predicated on the position of the grandparent (e.g., word identity, POS tag, contextual POS tags) without affecting the asymptotic complexity of the parsing algorithm.
    Page 6, “Related work”
  2. For example, fdep contains lexicalized “in-between” features that depend on the head and modifier words as well as a word lying in between the two; in contrast, previous work has generally defined in-between features for POS tags only.
    Page 7, “Parsing experiments”
  3. As in previous work, English evaluation ignores any token whose gold-standard POS tag is one of {`` '' : , .}.
    Page 7, “Parsing experiments”
  4. First, we define 4-gram features that characterize the four relevant indices using words and POS tags; examples include POS 4-grams and mixed 4-grams with one word and three POS tags.
    Page 8, “Parsing experiments”
  5. Second, we define 4-gram context features consisting of POS 4-grams augmented with adjacent POS tags: for example, fgsib(w, g, h, m, s) includes POS 7-grams for the tags at positions (g, h, m, s, g+1, h+1, m+1) (a feature-construction sketch follows this list).
    Page 8, “Parsing experiments”
  6. For example, fgsib(w, g, h, m, s) contains features examining the implicit head-modifier relationship (g, m) that are only activated when the POS tag of s is a coordinating conjunction.
    Page 8, “Parsing experiments”
  7. Finally, we make two brief remarks regarding the use of POS tags.
    Page 8, “Parsing experiments”
  8. First, we assume that input sentences have been automatically tagged in a preprocessing step. Second, for any feature that depends on POS tags, we include two copies of the feature: one using normal POS tags and another using coarsened versions of the POS tags.
    Page 8, “Parsing experiments”
  9. Note that the reliance on POS-tagged input can be relaxed slightly by treating POS tags as word senses; see Section 5.3 and McDonald (2006, Table 6.1).
    Page 8, “Parsing experiments”
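
A loose illustration of items 4, 5, and 8; the function name, feature-string format, and coarsen() helper are invented for this sketch, and the paper's actual feature templates are considerably richer.

    def gsib_pos_features(pos, g, h, m, s, coarsen):
        """pos: POS tags of one sentence; g, h, m, s: the four relevant indices
        of a grand-sibling part; coarsen: maps a fine POS tag to its coarse
        version.  Boundary handling for the +1 positions is omitted."""
        feats = {}
        # 4-gram feature over the four relevant indices (item 4).
        feats["gsib-4gram"] = (pos[g], pos[h], pos[m], pos[s])
        # Context feature: the 4-gram plus the tags adjacent to g, h, and m (item 5).
        feats["gsib-7gram"] = feats["gsib-4gram"] + (pos[g + 1], pos[h + 1], pos[m + 1])
        strings = []
        for name, tags in feats.items():
            # Item 8: every tag-based feature is emitted twice, once with normal
            # tags and once with coarsened tags.
            strings.append(name + "=" + "_".join(tags))
            strings.append("coarse-" + name + "=" + "_".join(coarsen(t) for t in tags))
        return strings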

UAS

Appears in 4 sentences as: UAS (5)
In Efficient Third-Order Dependency Parsers
  1. measured with unlabeled attachment score (UAS): the percentage of words with the correct head (a small scoring sketch follows this list).
    Page 7, “Parsing experiments”
  2. Pass = %dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations.
    Page 8, “Parsing experiments”
  3. Table 2: UAS of Models 1 and 2 on test data, with relevant results from related work.
    Page 9, “Parsing experiments”
  4. Table 3: UAS for modified versions of our parsers on validation data.
    Page 9, “Conclusion”
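
A minimal sketch of the metric in item 1 (function and argument names are ours, not the paper's):

    def unlabeled_attachment_score(gold_heads, pred_heads, ignore=()):
        """gold_heads / pred_heads: the head index assigned to each word;
        ignore: positions excluded from scoring (e.g., punctuation tokens
        in the English evaluation)."""
        kept = [i for i in range(len(gold_heads)) if i not in ignore]
        correct = sum(1 for i in kept if pred_heads[i] == gold_heads[i])
        return 100.0 * correct / len(kept)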

perceptron

Appears in 3 sentences as: perceptron (3)
In Efficient Third-Order Dependency Parsers
  1. 7.2 Averaged perceptron training
    Page 8, “Parsing experiments”
  2. We chose the averaged structured perceptron (Freund and Schapire, 1999; Collins, 2002) as it combines highly competitive performance with fast training times, typically converging in 5-10 iterations (a generic training sketch follows this list).
    Page 8, “Parsing experiments”
  3. Pass = %dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations.
    Page 8, “Parsing experiments”
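
For item 2, a generic sketch of averaged structured perceptron training (Collins, 2002); the decode/feats interfaces and the naive averaging scheme are assumptions of this sketch, not details of the paper's system.

    from collections import defaultdict

    def train_averaged_perceptron(data, feats, decode, iterations=10):
        """data: list of (x, gold_y) pairs; feats(x, y): sparse feature counts
        as a dict; decode(x, w): highest-scoring y under weights w."""
        w = defaultdict(float)        # current weights
        total = defaultdict(float)    # running sum of weights, for averaging
        steps = 0
        for _ in range(iterations):
            for x, gold in data:
                pred = decode(x, w)
                if pred != gold:
                    for f, v in feats(x, gold).items():
                        w[f] += v
                    for f, v in feats(x, pred).items():
                        w[f] -= v
                # Accumulate after every example so the average spans all steps.
                steps += 1
                for f, v in w.items():
                    total[f] += v
        return {f: v / steps for f, v in total.items()}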

Treebank

Appears in 3 sentences as: Treebank (6)
In Efficient Third-Order Dependency Parsers
  1. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank , achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
    Page 1, “Abstract”
  2. We evaluate our parsers on the Penn WSJ Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic et al., 2001), achieving unlabeled attachment scores of 93.04% and 87.38%.
    Page 1, “Introduction”
  3. In order to evaluate the effectiveness of our parsers in practice, we apply them to the Penn WSJ Treebank (Marcus et al., 1993) and the Prague Dependency Treebank (Hajic et al., 2001; Hajic, 1998). We use standard training, validation, and test splits to facilitate comparisons.
    Page 7, “Parsing experiments”
