Dependency Parsing and Projection Based on Word-Pair Classification
Jiang, Wenbin and Liu, Qun

Article Structure

Abstract

In this paper we describe an intuitive method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge.

Introduction

Supervised dependency parsing has achieved state-of-the-art performance in recent years (McDonald et al., 2005a; McDonald and Pereira, 2006; Nivre et al., 2006).

Word-Pair Classification Model

2.1 Model Definition

Projected Classification Instance

Having introduced the word-pair classification model, we now describe the extraction of projected dependency instances.

Boosting an MST Parser

The classifier can be used to boost an existing parser trained on human-annotated trees.

Related Works

5.1 Dependency Parsing

Experiments

In this section, we first validate the word-pair classification model by experimenting on human-annotated treebanks.

Conclusion and Future Works

In this paper, we first describe an intuitive method for dependency parsing, which resorts to a classifier to determine whether a word pair forms a dependency edge, and then propose an effective strategy for dependency projection, which produces a set of projected classification instances rather than complete projected trees.

Topics

dependency parsing

Appears in 24 sentences as: dependency parse (1) Dependency Parser (1) dependency parser (5) dependency parsers (3) Dependency Parsing (2) dependency parsing (14)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. In this paper we describe an intuitive method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge.
    Page 1, “Abstract”
  2. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers.
    Page 1, “Abstract”
  3. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
    Page 1, “Abstract”
  4. Supervised dependency parsing has achieved state-of-the-art performance in recent years (McDonald et al., 2005a; McDonald and Pereira, 2006; Nivre et al., 2006).
    Page 1, “Introduction”
  5. Examples include unsupervised dependency parsing (Klein and Manning, 2004), which relies entirely on unannotated data, and semi-supervised dependency parsing (Koo et al., 2008), which uses both annotated and unannotated data.
    Page 1, “Introduction”
  6. Meanwhile, we propose an intuitive model for dependency parsing, which uses a classifier to determine whether a pair of words forms a dependency edge.
    Page 1, “Introduction”
  7. The classifier can then be trained on the projected classification instance set, so as to build a projected dependency parser without the need for complete projected trees.
    Page 1, “Introduction”
  8. Experimental results show that the classifier trained on the projected classification instances significantly outperforms the projected dependency parsers in previous work.
    Page 2, “Introduction”
  9. More importantly, when this classifier is integrated into a second-order maximum spanning tree (MST) dependency parser (McDonald and Pereira, 2006) in a weighted average manner, significant improvement is obtained over the MST baselines.
    Page 2, “Introduction”
  10. In the rest of this paper, we first describe the word-pair classification model for dependency parsing (section 2) and the generation method of projected classification instances (section 3).
    Page 2, “Introduction”
  11. After comparisons with previous work on dependency parsing and projection, we finally give the experimental results.
    Page 2, “Introduction”
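
The sentences above describe the model's core: a maximum-entropy (ME) classifier scores each candidate word pair. Below is a minimal sketch of that idea in Python; the feature templates are a small illustrative subset, and the use of scikit-learn's logistic regression as the ME classifier is an assumption, not the paper's implementation.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression

    def pair_features(words, tags, i, j):
        # Words and POS tags at positions i and j, a tag bigram, and a
        # bucketed signed distance -- a tiny subset of the paper's templates.
        d = j - i
        dist = str(d) if abs(d) <= 5 else ("+5" if d > 5 else "-5")
        return {"w_i": words[i], "w_j": words[j],
                "t_i": tags[i], "t_j": tags[j],
                "t_i|t_j": tags[i] + "|" + tags[j],
                "dist": dist}

    def train(instances):
        # instances: list of ((words, tags, i, j), label) pairs,
        # label = 1 iff word i is the head of word j in the gold tree.
        vec = DictVectorizer()
        X = vec.fit_transform([pair_features(*x) for x, _ in instances])
        clf = LogisticRegression(max_iter=1000)  # multinomial logit, i.e. an ME classifier
        clf.fit(X, [label for _, label in instances])
        return vec, clf

    def dependency_prob(vec, clf, words, tags, i, j):
        # C(i, j): probability that word i heads word j.
        X = vec.transform([pair_features(words, tags, i, j)])
        return clf.predict_proba(X)[0, list(clf.classes_).index(1)]

Given C(i, j) for every ordered pair, decoding then searches for the best well-formed tree over these edge scores (see the sketch under the "dynamic programming" topic below).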

treebanks

Appears in 12 sentences as: Treebank (3) treebank (3) treebanks (8)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Since it is costly and difficult to build human-annotated treebanks, much work has also been devoted to the utilization of unannotated text.
    Page 1, “Introduction”
  2. For the second-order MST parser trained on Penn Chinese Treebank (CTB) 5.0, the classifier gives a precision increment of 0.5 points.
    Page 2, “Introduction”
  3. On the training method, however, our model obviously differs from other graph-based models, in that we only need a set of word-pair dependency instances rather than a regular dependency treebank.
    Page 5, “Related Works”
  4. In this section, we first validate the word-pair classification model by experimenting on human-annotated treebanks.
    Page 6, “Experiments”
  5. We experiment on two popular treebanks, the Wall Street Journal (WSJ) portion of the Penn English Treebank (Marcus et al., 1993), and the Penn Chinese Treebank (CTB) 5.0 (Xue et al., 2005).
    Page 6, “Experiments”
  6. The constituent trees in the two treebanks are transformed to dependency trees according to the head-finding rules of Yamada and Matsumoto (2003).
    Page 6, “Experiments”
  7. Each treebank is split into three partitions, for training, development and testing, respectively, as shown in Table 2.
    Page 6, “Experiments”
  8. Therefore, if trained on instances extracted from human-annotated treebanks, the word-pair classification model would not demonstrate its advantage over existing state-of-the-art dependency parsing methods.
    Page 7, “Experiments”
  9. The projected classifier significantly outperforms previous work on both test sets, which demonstrates that the word-pair classification model, although falling behind the state of the art on human-annotated treebanks, performs well in projected dependency parsing.
    Page 7, “Experiments”
  10. This indicates that the smaller the human-annotated treebank we have, the more significant the improvement we can achieve by integrating the projected classifier.
    Page 8, “Experiments”
  11. In addition, when integrated into a second-order MST parser, the projected parser brings significant improvement to the baseline, especially for the baseline trained on smaller treebanks.
    Page 8, “Conclusion and Future Works”

dependency tree

Appears in 9 sentences as: dependency tree (5) dependency trees (4)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. y denotes the dependency tree for sentence x, and (i, j) ∈ y represents a dependency edge from word x_i to word x_j, where x_i is the parent of x_j.
    Page 2, “Word-Pair Classification Model”
  2. Following the edge-based factorization method (Eisner, 1996), we factorize the score s(x, y) of a dependency tree into its dependency edges, and design a dynamic programming algorithm to search for the candidate parse with maximum score.
    Page 2, “Word-Pair Classification Model”
  3. Here y is searched over the set of well-formed dependency trees.
    Page 3, “Word-Pair Classification Model”
  4. As described previously, the score of a dependency tree given by a word-pair classifier can be factored across the candidate dependency edges in the tree.
    Page 5, “Boosting an MST Parser”
  5. Jiang and Liu (2009) resort to the alignment matrix and a dynamic programming search algorithm to obtain better projected dependency trees.
    Page 5, “Related Works”
  6. Because of the free translation, the word alignment errors, and the heterogeneity between the two languages, it is unreliable and less effective to project the dependency tree completely to the target language sentence.
    Page 5, “Related Works”
  7. The constituent trees in the two treebanks are transformed to dependency trees according to the head-finding rules of Yamada and Matsumoto (2003).
    Page 6, “Experiments”
  8. For a dependency tree with n words, only n − 1 positive dependency instances can be extracted.
    Page 6, “Experiments”
  9. The English sentences are then parsed by an implementation of the second-order MST model of McDonald and Pereira (2006), which is trained on dependency trees extracted from WSJ.
    Page 7, “Experiments”
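
For reference, the edge-based factorization these sentences invoke (Eisner, 1996) decomposes the tree score over edges. This is a standard rendering consistent with the quoted notation, with Y(x) denoting the set of well-formed dependency trees over sentence x:

    s(x, y) = \sum_{(i,j) \in y} s(i, j), \qquad
    \hat{y} = \operatorname{argmax}_{y \in Y(x)} \; s(x, y)

In the word-pair classification model, each edge score s(i, j) would be supplied by the classifier's dependency probability C(i, j).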

word pairs

Appears in 7 sentences as: word pair (4) word pairs (5)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. We also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree.
    Page 1, “Abstract”
  2. Given a word-aligned bilingual corpus with source language sentences parsed, the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language.
    Page 1, “Introduction”
  3. A dependency relationship is a boolean value that represents whether this word pair forms a dependency edge.
    Page 1, “Introduction”
  4. The task of the word-pair classification model is to determine whether any candidate word pair, x_i and x_j s.t. 1 ≤ i, j ≤ |x| and i ≠ j, forms a dependency edge.
    Page 2, “Word-Pair Classification Model”
  5. Ideally, given the classification results for all candidate word pairs, the dependency parse tree can be composed of the candidate edges with higher score (1 for the boolean-valued classifier, and large p for the real-valued classifier).
    Page 2, “Word-Pair Classification Model”
  6. Here we give the calculation of the dependency probability C(i, j). We use w to denote the parameter vector of the ME model, and f(i, j, r) to denote the feature vector for the assumption that the word pair i and j has a dependency relationship r.
    Page 3, “Word-Pair Classification Model”
  7. In this paper, we first describe an intuitive method for dependency parsing, which resorts to a classifier to determine whether a word pair forms a dependency edge, and then propose an effective strategy for dependency projection, which produces a set of projected classification instances rather than complete projected trees.
    Page 8, “Conclusion and Future Works”
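
The formula behind sentence 6 is the standard maximum-entropy form; it is reconstructed here from the quoted definitions of w and f(i, j, r), under the assumption that r ranges over the boolean relationship values {0, 1}:

    p(r \mid i, j) = \frac{\exp\big(w \cdot f(i, j, r)\big)}
                          {\sum_{r' \in \{0,1\}} \exp\big(w \cdot f(i, j, r')\big)},
    \qquad C(i, j) = p(r = 1 \mid i, j)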

word alignment

Appears in 6 sentences as: word alignment (7)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. For dependency projection, the relationship between words in the parsed sentences can be simply projected across the word alignment to words in the unparsed sentences, according to the DCA assumption (Hwa et al., 2005).
    Page 1, “Introduction”
  2. Such a projection procedure suffers greatly from word alignment errors and syntactic isomerism between languages, which usually lead to relationship projection conflicts and incomplete projected dependency structures.
    Page 1, “Introduction”
  3. Because of the free translation, the syntactic isomerism between languages and word alignment errors, it would be problematic to project the dependency structure completely from one language to another.
    Page 1, “Introduction”
  4. In order to alleviate the effect of word alignment errors, we base the projection on the alignment matrix, a compact representation of multiple GIZA++ (Och and Ney, 2000) results, rather than on the single word alignment used in previous dependency projection work.
    Page 4, “Projected Classification Instance”
  5. Figure 2: The word alignment matrix between a Chinese sentence and its English translation.
    Page 4, “Projected Classification Instance”
  6. Because of the free translation, the word alignment errors, and the heterogeneity between the two languages, it is unreliable and less effective to project the dependency tree completely to the target language sentence.
    Page 5, “Related Works”

dependency relationship

Appears in 5 sentences as: dependency relationship (3) dependency relationships (2)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. We also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree.
    Page 1, “Abstract”
  2. Given a word-aligned bilingual corpus with source language sentences parsed, the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language.
    Page 1, “Introduction”
  3. A dependency relationship is a boolean value that represents whether this word pair forms a dependency edge.
    Page 1, “Introduction”
  4. Here we give the calculation of the dependency probability C(i, j). We use w to denote the parameter vector of the ME model, and f(i, j, r) to denote the feature vector for the assumption that the word pair i and j has a dependency relationship r.
    Page 3, “Word-Pair Classification Model”
  5. We define a boolean-valued function δ(y, i, j, r) to investigate the dependency relationship of word i and word j in parse tree y:
    Page 4, “Projected Classification Instance”
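
A minimal sketch of the quoted δ function, together with one plausible way to extract projected instances through an alignment matrix. The paper projects word-pair relationships across the alignment matrix, but the exact weighting scheme and the cutoffs below are illustrative assumptions.

    def delta(heads, i, j):
        # delta(y, i, j, r=1): 1 iff word i is the parent of word j in tree y,
        # where heads[j] holds the parent index of word j.
        return 1 if heads[j] == i else 0

    def project_instances(src_heads, align, theta=0.5, neg_cutoff=0.1):
        # align[s][t]: alignment probability between source word s and target
        # word t (a compact representation of multiple GIZA++ results).
        # For each target pair (i, j), accumulate evidence from all source
        # pairs (s, t) forming an edge, weighted by alignment probabilities.
        n_src, n_tgt = len(align), len(align[0])
        instances = []
        for i in range(n_tgt):
            for j in range(n_tgt):
                if i == j:
                    continue
                w = sum(align[s][i] * align[t][j] * delta(src_heads, s, t)
                        for s in range(n_src) for t in range(n_src) if s != t)
                if w >= theta:                 # confident positive instance
                    instances.append((i, j, 1))
                elif w <= neg_cutoff:          # near-zero evidence: negative
                    instances.append((i, j, 0))
        return instances

The values of theta and neg_cutoff here are hypothetical; the paper tunes a threshold θ on the development set (see the "development set" topic below).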

graph-based

Appears in 5 sentences as: graph-based (5)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Previous graph-based dependency models usually use the index distance of word i and word j
    Page 3, “Word-Pair Classification Model”
  2. Both the graph-based (McDonald et al., 2005a; McDonald and Pereira, 2006; Carreras et al., 2006) and the transition-based (Yamada and Matsumoto, 2003; Nivre et al., 2006) parsing algorithms are related to our word-pair classification model.
    Page 5, “Related Works”
  3. Similar to the graph-based method, our model is factored on dependency edges, and its decoding procedure also aims to find a maximum spanning tree in a fully connected directed graph.
    Page 5, “Related Works”
  4. From this point of view, our model can be classified into the graph-based category.
    Page 5, “Related Works”
  5. On the training method, however, our model obviously differs from other graph-based models, in that we only need a set of word-pair dependency instances rather than a regular dependency treebank.
    Page 5, “Related Works”

dynamic programming

Appears in 5 sentences as: dynamic programming (5)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Jiang and Liu (2009) resort to a dynamic programming procedure to search for a complete projected tree.
    Page 1, “Introduction”
  2. Following the edge-based factorization method (Eisner, 1996), we factorize the score s(x, y) of a dependency tree into its dependency edges, and design a dynamic programming algorithm to search for the candidate parse with maximum score.
    Page 2, “Word-Pair Classification Model”
  3. In this work, however, we still adopt the more general, bottom-up dynamic programming algorithm (Algorithm 1) in order to facilitate possible extensions.
    Page 4, “Word-Pair Classification Model”
  4. Jiang and Liu (2009) resort to the alignment matrix and a dynamic programming search algorithm to obtain better projected dependency trees.
    Page 5, “Related Works”
  5. First, we implement a chart-based dynamic programming parser for the second-order MST model, and develop a training procedure based on the perceptron algorithm with averaged parameters (Collins, 2002).
    Page 7, “Experiments”
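
Sentence 2 refers to a dynamic programming search over edge-factored scores. As a compact stand-in for the paper's Algorithm 1 (whose details are not quoted here), the following is the classic first-order, Eisner-style projective decoder; scores[h][m] would hold the classifier's edge scores, with token 0 as an artificial root:

    import numpy as np

    def eisner(scores):
        # scores[h, m]: score of edge h -> m; token 0 is an artificial ROOT.
        # Returns heads[m] (m = 1..n-1) of the best projective tree.
        n = scores.shape[0]
        NEG = float("-inf")
        # Spans [i, j] with direction d: 1 = head at left end, 0 = head at right.
        comp = np.full((n, n, 2), NEG); comp_bp = np.zeros((n, n, 2), int)
        inc = np.full((n, n, 2), NEG); inc_bp = np.zeros((n, n, 2), int)
        for i in range(n):
            comp[i, i, 0] = comp[i, i, 1] = 0.0
        for k in range(1, n):
            for i in range(n - k):
                j = i + k
                for r in range(i, j):  # incomplete spans: add the edge between i and j
                    s = comp[i, r, 1] + comp[r + 1, j, 0]
                    if s + scores[j, i] > inc[i, j, 0]:
                        inc[i, j, 0], inc_bp[i, j, 0] = s + scores[j, i], r
                    if s + scores[i, j] > inc[i, j, 1]:
                        inc[i, j, 1], inc_bp[i, j, 1] = s + scores[i, j], r
                for r in range(i, j):  # complete span, head at right end
                    s = comp[i, r, 0] + inc[r, j, 0]
                    if s > comp[i, j, 0]:
                        comp[i, j, 0], comp_bp[i, j, 0] = s, r
                for r in range(i + 1, j + 1):  # complete span, head at left end
                    s = inc[i, r, 1] + comp[r, j, 1]
                    if s > comp[i, j, 1]:
                        comp[i, j, 1], comp_bp[i, j, 1] = s, r
        heads = [-1] * n
        def walk(i, j, d, complete):
            if i == j:
                return
            if complete:
                r = comp_bp[i, j, d]
                if d == 1:
                    walk(i, r, 1, False); walk(r, j, 1, True)
                else:
                    walk(i, r, 0, True); walk(r, j, 0, False)
            else:
                r = inc_bp[i, j, d]
                heads[j if d == 1 else i] = i if d == 1 else j
                walk(i, r, 1, True); walk(r + 1, j, 0, True)
        walk(0, n - 1, 1, True)
        return heads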

parse tree

Appears in 4 sentences as: parse tree (3) parsing tree (1)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Ideally, given the classification results for all candidate word pairs, the dependency parse tree can be composed of the candidate edges with higher score (1 for the boolean-valued classifier, and large p for the real-valued classifier).
    Page 2, “Word-Pair Classification Model”
  2. This strategy alleviates the classification errors to some degree and ensures a valid, complete dependency parse tree.
    Page 2, “Word-Pair Classification Model”
  3. Suppose a bilingual sentence pair composed of a source sentence e and its target translation f, where y_e is the parse tree of the source sentence.
    Page 4, “Projected Classification Instance”
  4. We define a boolean-valued function δ(y, i, j, r) to investigate the dependency relationship of word i and word j in parse tree y:
    Page 4, “Projected Classification Instance”

POS tags

Appears in 4 sentences as: POS tagger (2) POS tags (3)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Each feature is composed of some words and POS tags surrounding word i and/or word j, as well as an optional distance representation between these two words.
    Page 3, “Word-Pair Classification Model”
  2. For English, we use the automatically assigned POS tags produced by an implementation of the POS tagger of Collins (2002).
    Page 6, “Experiments”
  3. For Chinese, we simply use the gold-standard POS tags, following tradition.
    Page 6, “Experiments”
  4. Both English and Chinese sentences are tagged by implementations of the POS tagger of Collins (2002), trained on WSJ and CTB 5.0, respectively.
    Page 7, “Experiments”

development set

Appears in 4 sentences as: development set (3) development sets (1)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. The relative weight λ is adjusted to maximize the performance on the development set, using an algorithm similar to minimum error-rate training (Och, 2003).
    Page 5, “Boosting an MST Parser”
  2. Figure 3: Performance curves of the word-pair classification model on the development sets of WSJ and CTB 5.0, with respect to a series of ratios r.
    Page 6, “Experiments”
  3. Figure 4: The performance curve of the word-pair classification model on the development set of CTB 5.0, with respect to a series of thresholds θ.
    Page 7, “Experiments”
  4. Then, on each instance set we train a classifier and test it on the development set of CTB 5.0.
    Page 7, “Experiments”
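
Sentence 1 describes tuning the interpolation weight λ on the development set with a MERT-like procedure. A minimal sketch follows, assuming the integration is a per-edge weighted average s = λ · s_MST + (1 − λ) · C(i, j) and that a simple grid search stands in for minimum error-rate training; all function names here are illustrative:

    def interpolate(lmbda, mst_edge_score, cls_prob):
        # Weighted average of the second-order MST edge score and the
        # word-pair classifier's dependency probability C(i, j).
        return lmbda * mst_edge_score + (1.0 - lmbda) * cls_prob

    def tune_lambda(parse_dev, dev_gold, steps=20):
        # parse_dev(lmbda) -> predicted head lists for the dev sentences;
        # dev_gold: the gold head lists. Returns the lambda maximizing
        # unlabeled attachment accuracy on the development set.
        def accuracy(pred):
            pairs = [(p, g) for ps, gs in zip(pred, dev_gold)
                     for p, g in zip(ps, gs)]
            return sum(p == g for p, g in pairs) / len(pairs)
        grid = [i / steps for i in range(steps + 1)]
        return max(grid, key=lambda l: accuracy(parse_dev(l)))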

maximum spanning

Appears in 3 sentences as: maximum spanning (3)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
    Page 1, “Abstract”
  2. More importantly, when this classifier is integrated into a second-order maximum spanning tree (MST) dependency parser (McDonald and Pereira, 2006) in a weighted average manner, significant improvement is obtained over the MST baselines.
    Page 2, “Introduction”
  3. Similar to the graph-based method, our model is factored on dependency edges, and its decoding procedure also aims to find a maximum spanning tree in a fully connected directed graph.
    Page 5, “Related Works”

Parsing Algorithm

Appears in 3 sentences as: Parsing Algorithm (2) parsing algorithms (1)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. 2.3 Parsing Algorithm
    Page 3, “Word-Pair Classification Model”
  2. Algorithm 1: Dependency Parsing Algorithm.
    Page 4, “Word-Pair Classification Model”
  3. Both the graph-based (McDonald et al., 2005a; McDonald and Pereira, 2006; Carreras et al., 2006) and the transition-based (Yamada and Matsumoto, 2003; Nivre et al., 2006) parsing algorithms are related to our word-pair classification model.
    Page 5, “Related Works”

sentence pairs

Appears in 3 sentences as: sentence pair (1) sentence pairs (2)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Suppose a bilingual sentence pair composed of a source sentence e and its target translation f, where y_e is the parse tree of the source sentence.
    Page 4, “Projected Classification Instance”
  2. It contains 239K sentence pairs with about 6.9M/8.9M words in Chinese/English.
    Page 7, “Experiments”
  3. The alignment matrices for sentence pairs are generated according to Liu et al. (2009).
    Page 7, “Experiments”

significant improvement

Appears in 3 sentences as: significant improvement (3)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. More importantly, when this classifier is integrated into a second-order maximum spanning tree (MST) dependency parser (McDonald and Pereira, 2006) in a weighted average manner, significant improvement is obtained over the MST baselines.
    Page 2, “Introduction”
  2. This indicates that the smaller the human-annotated treebank we have, the more significant the improvement we can achieve by integrating the projected classifier.
    Page 8, “Experiments”
  3. In addition, when integrated into a second-order MST parser, the projected parser brings significant improvement to the baseline, especially for the baseline trained on smaller treebanks.
    Page 8, “Conclusion and Future Works”

significantly outperforms

Appears in 3 sentences as: significantly outperforms (3)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers.
    Page 1, “Abstract”
  2. Experimental results show that the classifier trained on the projected classification instances significantly outperforms the projected dependency parsers in previous work.
    Page 2, “Introduction”
  3. The projected classifier significantly outperforms previous work on both test sets, which demonstrates that the word-pair classification model, although falling behind the state of the art on human-annotated treebanks, performs well in projected dependency parsing.
    Page 7, “Experiments”

spanning tree

Appears in 3 sentences as: spanning tree (3)
In Dependency Parsing and Projection Based on Word-Pair Classification
  1. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
    Page 1, “Abstract”
  2. More importantly, when this classifier is integrated into a second-order maximum spanning tree (MST) dependency parser (McDonald and Pereira, 2006) in a weighted average manner, significant improvement is obtained over the MST baselines.
    Page 2, “Introduction”
  3. Similar to the graph-based method, our model is factored on dependency edges, and its decoding procedure also aims to find a maximum spanning tree in a fully connected directed graph.
    Page 5, “Related Works”
