Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
Yuan Zhang, Tao Lei, Regina Barzilay, Tommi Jaakkola, and Amir Globerson

Article Structure

Abstract

Much of the recent work on dependency parsing has been focused on solving inherent combinatorial problems associated with rich scoring functions.

Introduction

Dependency parsing is commonly cast as a maximization problem over a parameterized scoring function.

Related Work

Earlier works on dependency parsing focused on inference with tractable scoring functions.

Sampling-Based Dependency Parsing with Global Features

In this section, we introduce our novel sampling-based dependency parser which can incorporate

Features

First- to Third-Order Features The feature templates of first- to third-order features are mainly drawn from previous work on graph-based parsing (McDonald and Pereira, 2006), transition-based parsing (Nivre et al., 2006) and dual decomposition-based parsing (Martins et al., 2011).

Experimental Setup

Datasets We evaluate our model on standard benchmark corpora — CoNLL 2006 and CoNLL 2008 (Buchholz and Marsi, 2006; Surdeanu et al., 2008) — which include dependency treebanks for 14 different languages.

Results

Comparison with State-of-the-art Parsers Table 2 summarizes the performance of our model and of the baselines.

Conclusions

This paper demonstrates the power of combining a simple inference procedure with a highly expressive scoring function.

Topics

scoring function

Appears in 29 sentences as: scoring function (18) scoring functions (12)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Much of the recent work on dependency parsing has been focused on solving inherent combinatorial problems associated with rich scoring functions.
    Page 1, “Abstract”
  2. In contrast, we demonstrate that highly expressive scoring functions can be used with substantially simpler inference procedures.
    Page 1, “Abstract”
  3. Dependency parsing is commonly cast as a maximization problem over a parameterized scoring function.
    Page 1, “Introduction”
  4. In this view, the use of more expressive scoring functions leads to more challenging combinatorial problems of finding the maximizing parse.
    Page 1, “Introduction”
  5. We depart from this view and instead focus on using highly expressive scoring functions with substantially simpler inference procedures.
    Page 1, “Introduction”
  6. Our combination outperforms the state-of-the-art parsers and remains comparable even if we adopt their scoring functions.
    Page 1, “Introduction”
  7. Rich scoring functions have been used for some time.
    Page 1, “Introduction”
  8. They first appeared in the context of reranking (Collins, 2000), where a simple parser is used to generate a candidate list which is then reranked according to the scoring function.
    Page 1, “Introduction”
  9. Because the number of alternatives is small, the scoring function could in principle involve arbitrary (global) features of parse trees.
    Page 1, “Introduction”
  10. We dispense with the notion of a candidate set and seek to exploit the scoring function more directly.
    Page 1, “Introduction”
  11. In this paper, we introduce a sampling-based parser that places few or no constraints on the scoring function.
    Page 1, “Introduction”

POS tags

Appears in 19 sentences as: POS Tag (1) POS tag (8) POS tagging (2) POS tags (9)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. When proposing a small move, i.e., sampling a head of the word, we can also jointly sample its POS tag from a set of alternatives provided by the tagger.
    Page 2, “Introduction”
  2. For instance, we can sample the POS tag, the dependency relation or morphology information.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  3. POS correction scenario in which only the predicted POS tags are provided in the testing phase, while both gold and predicted tags are available for the training set.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  4. We extend our model such that it jointly learns how to predict a parse tree and also correct the predicted POS tags for better parsing performance.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  5. For each sampling, let H be the set of candidate heads and T be the set of candidate POS tags.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  6. ²In our work we choose α = 0.003, which gives a 98.9% oracle POS tagging accuracy on the CATiB development set.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  7. • Coordination In a coordinate structure, the two adjacent conjuncts usually agree with each other on POS tags and their span lengths.
    Page 6, “Features”
  8. Therefore, we add different features to capture POS tag and span length consistency in a coordinate structure.
    Page 6, “Features”
  9. • Span Length This feature captures the distribution of the binned span length of each POS tag.
    Page 6, “Features”
  10. • Neighbors The POS tags of the neighboring words to the left and right of each span, together with the binned span length and the POS tag at the span root.
    Page 6, “Features”
  11. • Valency We consider valency features for each POS tag.
    Page 6, “Features”
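
Footnote 2 (item 6 above) says the tagger's alternative tags are pruned with a threshold α = 0.003. A minimal sketch of such pruning, assuming the tagger exposes per-token tag probabilities; the function name and the relative-threshold rule are illustrative assumptions, not the paper's specification:

```python
# Hypothetical sketch: build the candidate POS tag set T for one token
# from a tagger's marginal probabilities, keeping every tag within a
# factor alpha of the best tag (alpha = 0.003 as in footnote 2 above).
# The relative-threshold rule itself is an assumption; the paper only
# says the alternatives are provided by the tagger.

def candidate_tags(tag_probs, alpha=0.003):
    """tag_probs: dict mapping POS tag -> tagger probability for one token."""
    best = max(tag_probs.values())
    return {t for t, p in tag_probs.items() if p >= alpha * best}

# A confident token keeps one tag; an ambiguous token keeps several.
print(candidate_tags({"NN": 0.9, "VB": 0.099, "JJ": 0.001}))  # {'NN', 'VB'}
```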

reranker

Appears in 16 sentences as: reranked (1) reranker (10) Reranking (2) reranking (7)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. They first appeared in the context of reranking (Collins, 2000), where a simple parser is used to generate a candidate list which is then reranked according to the scoring function.
    Page 1, “Introduction”
  2. Our method provides a more effective mechanism for handling global features than reranking, outperforming it by 1.3%.
    Page 2, “Introduction”
  3. The first successful approach in this arena was reranking (Collins, 2000; Charniak and Johnson, 2005) on constituency parsing.
    Page 2, “Related Work”
  4. Reranking can be combined with an arbitrary scoring function, and thus can easily incorporate global features over the entire parse tree.
    Page 2, “Related Work”
  5. Its main disadvantage is that the output parse can only be one of the few parses passed to the reranker.
    Page 2, “Related Work”
  6. Global Features We used features shown to be promising in prior reranking work: Charniak and Johnson (2005), Collins (2000) and Huang (2008).
    Page 6, “Features”
  7. We also compare our model against a discriminative reranker.
    Page 7, “Experimental Setup”
  8. The reranker operates over the
    Page 7, “Experimental Setup”
  9. We then train the reranker by running 10 epochs of cost-augmented MIRA.
    Page 7, “Experimental Setup”
  10. The reranker uses the same features as our model, along with the tree scores obtained from the MST parser (which is a standard practice in reranking).
    Page 7, “Experimental Setup”
  11. ⁴The MST parser is trained in projective mode for reranking because generating a top-k list from a second-order non-projective model is intractable.
    Page 7, “Results”
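
Items 9 and 10 above describe the reranker: the same global features as the main model, plus the base tree score from the MST parser, trained with cost-augmented MIRA. A minimal sketch of the selection step only, under assumed names (`features`, weight dict `w`); the training loop is omitted:

```python
# Hypothetical sketch of the reranker's selection step: each candidate
# tree from the base parser's top-k list is scored with the learned
# global-feature weights plus the MST parser's tree score, used as one
# extra feature (item 10 above). Training (cost-augmented MIRA) omitted.

def rerank(sentence, candidates, w, features):
    """candidates: list of (tree, mst_score) pairs from the base parser."""
    def score(tree, mst_score):
        f = features(sentence, tree)  # global feature vector as a dict
        s = sum(w.get(name, 0.0) * value for name, value in f.items())
        return s + w.get("mst_score", 0.0) * mst_score
    return max(candidates, key=lambda c: score(*c))[0]
```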

UAS

Appears in 10 sentences as: UAS (10)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Evaluation Measures Following standard practice, we use Unlabeled Attachment Score (UAS) as the evaluation metric in all our experiments.
    Page 7, “Experimental Setup”
  2. We report UAS excluding punctuation on CoNLL datasets, following Martins et al.
    Page 7, “Experimental Setup”
  3. For the CATiB dataset, we report UAS including punctuation in order to be consistent with the published results in the 2013 SPMRL shared task (Seddah et al., 2013).
    Page 7, “Experimental Setup”
  4. Moreover, our model also outperforms the 88.80% average UAS reported in Martins et al.
    Page 8, “Results”
  5. With these features our model achieves an average UAS of 89.28%.
    Page 8, “Results”
  6. UAS POS Acc. (column headers of Table 3)
    Page 9, “Results”
  7. Table 3:
                         Dev                 Test
                         POS Acc.   UAS      POS Acc.   UAS
       Gold              —          90.27    —          88.46
       Predicted         96.87      88.81    96.82      86.95
       POS Correction    97.72      90.08    97.49      88.38
       CADIM             96.87      87.4     96.82      85.78
       IMS-Single        —          —        —          86.96
       IMS-Ensemble      —          —        —          88.32
    Page 9, “Results”
  8. The upper part shows UAS of our model with gold/predicted information or POS correction.
    Page 9, “Results”
  9. Bottom part shows UAS of the best systems in the SPMRL shared task.
    Page 9, “Results”
  10. Figure 8: UAS on four languages when training with different constraints.
    Page 9, “Results”
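
Items 1-3 above define the metric and the two punctuation conventions. A minimal sketch of UAS computation, with a flag for the CoNLL (excluding punctuation) versus CATiB (including punctuation) conventions:

```python
# Minimal sketch of Unlabeled Attachment Score (UAS): the fraction of
# words whose predicted head matches the gold head. Punctuation can be
# excluded (the CoNLL convention here) or included (the CATiB convention).

def uas(gold_heads, pred_heads, is_punct, include_punct=False):
    pairs = [(g, p)
             for g, p, punct in zip(gold_heads, pred_heads, is_punct)
             if include_punct or not punct]
    return sum(g == p for g, p in pairs) / len(pairs)

print(uas([2, 0, 2], [2, 0, 1], [False, False, True]))  # 1.0 without punct
```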

Gibbs sampler

Appears in 7 sentences as: Gibbs sampler (3) Gibbs Sampling (1) Gibbs sampling (3)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Our first strategy is akin to Gibbs sampling and samples a new head for each word in the sentence, modifying one arc at a time.
    Page 1, “Introduction”
  2. 3.2.1 Gibbs Sampling
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  3. One shortcoming of the Gibbs sampler is that it only changes one variable (arc) at a time.
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  4. Note that blocked Gibbs sampling would be exponential in K, and is thus very slow already at K = 4.
    Page 4, “Sampling-Based Dependency Parsing with Global Features”
  5. The Gibbs sampler will generate a new sample from the space H × T.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  6. Therefore, the first-order distribution is not well-defined and we only employ Gibbs sampling for simplicity.
    Page 7, “Experimental Setup”
  7. iteration of this sampler makes multiple changes to the tree, in contrast to a single-edge change of the Gibbs sampler.
    Page 9, “Results”
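
Items 1 and 3 above describe the basic move: resample one word's head at a time from the conditional distribution induced by the model score. A minimal sketch of one sweep, assuming a tree-scoring function `score(x, heads)` and a candidate-head generator; tree-validity checks and the annealing schedule used in practice are omitted:

```python
import math
import random

# Hypothetical sketch of one Gibbs sweep (items 1 and 3 above): resample
# each word's head in turn from the conditional distribution induced by
# the tree score, changing a single arc at a time.

def gibbs_sweep(x, heads, candidate_heads, score, temp=1.0):
    for i in range(len(heads)):
        cands = candidate_heads(x, i)          # e.g. all other positions
        weights = []
        for h in cands:
            heads[i] = h                       # tentatively set the arc
            weights.append(math.exp(score(x, heads) / temp))
        total = sum(weights)
        r, acc = random.random() * total, 0.0
        for h, w in zip(cands, weights):
            acc += w
            if r <= acc:                       # sample proportionally to weight
                heads[i] = h
                break
    return heads
```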

CoNLL

Appears in 6 sentences as: CoNLL (7)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. The model outperforms state-of-the-art results when evaluated on 14 languages of non-projective CoNLL datasets.
    Page 1, “Abstract”
  2. Datasets We evaluate our model on standard benchmark corpora — CoNLL 2006 and CoNLL 2008 (Buchholz and Marsi, 2006; Surdeanu et al., 2008) — which include dependency treebanks for 14 different languages.
    Page 6, “Experimental Setup”
  3. We use all sentences in CoNLL datasets during training and testing.
    Page 7, “Experimental Setup”
  4. We report UAS excluding punctuation on CoNLL datasets, following Martins et al.
    Page 7, “Experimental Setup”
  5. Because the CoNLL datasets do not have a standard development set, we randomly select a held-out set of 200 sentences from the training set.
    Page 7, “Experimental Setup”
  6. However, we do not impose this constraint for the CoNLL datasets.
    Page 7, “Experimental Setup”

dependency parsing

Appears in 6 sentences as: dependency parser (1) Dependency parsing (1) dependency parsing (4)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Much of the recent work on dependency parsing has been focused on solving inherent combinatorial problems associated with rich scoring functions.
    Page 1, “Abstract”
  2. Dependency parsing is commonly cast as a maximization problem over a parameterized scoring function.
    Page 1, “Introduction”
  3. Earlier works on dependency parsing focused on inference with tractable scoring functions.
    Page 2, “Related Work”
  4. In this section, we introduce our novel sampling-based dependency parser which can incorporate
    Page 2, “Sampling-Based Dependency Parsing with Global Features”
  5. We apply the Random Walk-based sampling method (see Section 3.2.2) for the standard dependency parsing task.
    Page 7, “Experimental Setup”
  6. Our model achieves the best results on the standard dependency parsing benchmark, outperforming parsing methods with elaborate inference procedures.
    Page 9, “Conclusions”

development set

Appears in 6 sentences as: development set (4) development sets (2)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. ²In our work we choose α = 0.003, which gives a 98.9% oracle POS tagging accuracy on the CATiB development set.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  2. For efficiency, we limit the sentence length to 70 tokens in training and development sets.
    Page 7, “Experimental Setup”
  3. This gives a 99% pruning recall on the CATiB development set.
    Page 7, “Experimental Setup”
  4. After pruning, we tune the regularization parameter C ∈ {0.1, 0.01, 0.001} on development sets for different languages.
    Page 7, “Experimental Setup”
  5. Because the CoNLL datasets do not have a standard development set, we randomly select a held-out set of 200 sentences from the training set.
    Page 7, “Experimental Setup”
  6. We also pick the training epochs from {50, 100, 150} which gives the best performance on the development set for each language.
    Page 7, “Experimental Setup”
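
Items 4 and 6 above describe the tuning procedure: regularization in {0.1, 0.01, 0.001} and training epochs in {50, 100, 150}, chosen by development-set performance. A minimal grid-search sketch, with `train` and `evaluate` as assumed callables rather than the paper's actual interface:

```python
# Hypothetical grid-search sketch for items 4 and 6 above: choose the
# regularization constant and the number of training epochs that give
# the best development-set score.

def tune(train, evaluate, dev_set):
    best = (None, None, float("-inf"))
    for c in (0.1, 0.01, 0.001):
        for epochs in (50, 100, 150):
            model = train(c=c, epochs=epochs)
            score = evaluate(model, dev_set)
            if score > best[2]:
                best = (c, epochs, score)
    return best  # (best_c, best_epochs, dev_score)
```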

parse tree

Appears in 6 sentences as: parse tree (5) parse trees (1)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Because the number of alternatives is small, the scoring function could in principle involve arbitrary (global) features of parse trees.
    Page 1, “Introduction”
  2. Reranking can be combined with an arbitrary scoring function, and thus can easily incorporate global features over the entire parse tree.
    Page 2, “Related Work”
  3. Ideally, we would change multiple heads in the parse tree simultaneously, and sample those choices from the corresponding conditional distribution of p. While in general this is increasingly difficult with more heads, it is indeed tractable if
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  4. y′ is always a valid parse tree if we allow multiple children of the root and do not impose the projectivity constraint.
    Page 4, “Sampling-Based Dependency Parsing with Global Features”
  5. We extend our model such that it jointly learns how to predict a parse tree and also correct the predicted POS tags for better parsing performance.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
  6. We split the sentence based on the ending punctuation, predict the parse tree for each segment and group the roots of resulting trees into a single node.
    Page 7, “Experimental Setup”
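
Item 6 above describes how long sentences are handled: split at ending punctuation, parse each segment on its own, and group the segment roots under a single node. A minimal sketch under assumed conventions (0-based token lists, 1-based head indices, head 0 marking the root; `parse_segment` and `is_ending_punct` are illustrative):

```python
# Hypothetical sketch of item 6 above: split a long sentence at ending
# punctuation, parse each segment independently, then attach the segment
# roots to a single shared root.

def parse_by_segments(tokens, parse_segment, is_ending_punct):
    heads = [0] * (len(tokens) + 1)            # heads[0] is the shared root
    start = 0
    for end in range(len(tokens)):
        if is_ending_punct(tokens[end]) or end == len(tokens) - 1:
            seg_heads = parse_segment(tokens[start:end + 1])
            for j, h in enumerate(seg_heads):
                # offset segment-internal heads to sentence positions;
                # each segment root attaches to the shared root node 0
                heads[start + j + 1] = 0 if h == 0 else start + h
            start = end + 1
    return heads
```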

feature templates

Appears in 5 sentences as: feature templates (5)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. First- to Third-Order Features The feature templates of first- to third-order features are mainly drawn from previous work on graph-based parsing (McDonald and Pereira, 2006), transition-based parsing (Nivre et al., 2006) and dual decomposition-based parsing (Martins et al., 2011).
    Page 6, “Features”
  2. The feature templates are inspired by previous feature-rich POS tagging work (Toutanova et al., 2003).
    Page 6, “Features”
  3. In our work we use feature templates up to 5-gram.
    Page 6, “Features”
  4. Table 1 summarizes all POS tag feature templates.
    Page 6, “Features”
  5. Table 1: POS tag feature templates.
    Page 7, “Experimental Setup”
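
Items 2-4 above say the POS tag templates go up to 5-grams; the full inventory is the paper's Table 1. A minimal sketch of the extraction pattern only, with hypothetical feature naming:

```python
# Hypothetical sketch of the n-gram POS tag templates (items 2-4 above):
# emit every tag n-gram, up to 5-grams, that covers position i, with edge
# padding. Only the extraction pattern and naming here are illustrative.

def tag_ngram_features(tags, i, max_n=5):
    pad = max_n - 1
    padded = ["<s>"] * pad + list(tags) + ["</s>"] * pad
    feats = []
    for n in range(1, max_n + 1):
        for offset in range(-(n - 1), 1):      # windows covering position i
            start = i + pad + offset
            window = padded[start:start + n]
            feats.append(f"{n}gram[{offset}]=" + "_".join(window))
    return feats
```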

dependency trees

Appears in 4 sentences as: dependency tree (1) dependency trees (3)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. We denote sentences by x and the corresponding dependency trees by y ∈ Y(x), where Y(x)
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  2. is the set of valid (projective or non-projective) dependency trees for sentence x.
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  3. The decoding problem consists of finding a valid dependency tree y ∈ Y(x) that maximizes the score s(x, y) = θ · f(x, y) with parameters θ.
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  4. contain non-projective dependency trees.
    Page 7, “Experimental Setup”
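
Assembled from items 1-3 above, the decoding objective in one consistent formula:

```latex
\hat{y} = \operatorname*{arg\,max}_{y \in \mathcal{Y}(x)} s(x, y),
\qquad s(x, y) = \theta \cdot f(x, y)
```

where Y(x) is the set of valid (projective or non-projective) dependency trees for sentence x and θ are the learned parameters.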

shared task

Appears in 4 sentences as: shared task (4)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. This is better than the best published results in the 2013 SPMRL shared task (Seddah et al., 2013), including parser ensembles.
    Page 2, “Introduction”
  2. For the CATiB dataset, we report UAS including punctuation in order to be consistent with the published results in the 2013 SPMRL shared task (Seddah et al., 2013).
    Page 7, “Experimental Setup”
  3. To put these numbers into perspective, the bottom part of Table 3 shows the accuracy of the best systems from the 2013 SPMRL shared task on Arabic parsing using predicted information (Seddah et al., 2013).
    Page 8, “Results”
  4. Bottom part shows UAS of the best systems in the SPMRL shared task.
    Page 9, “Results”

ground truth

Appears in 3 sentences as: ground truth (3)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. A training set of size N is given as a set of pairs D = {(x^(i), y^(i))}_{i=1}^N, where y^(i) is the ground truth parse for sentence x^(i).
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  2. The Effect of Constraints in Learning Our training method updates parameters to satisfy the pairwise constraints between (1) subsequent samples on the sampling path and (2) selected samples and the ground truth.
    Page 9, “Results”
  3. “Neighbor” corresponds to pairwise constraints between subsequent samples, “Gold” represents constraints between a single sample and the ground truth, “Both” means applying both types of constraints.
    Page 9, “Results”
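
These constraints match the SampleRank scheme cited under “learning algorithm” below: whenever the model scores the worse of two trees (by loss against the ground truth) at least as high as the better one, the weights are updated toward the better tree's features. A minimal sketch with assumed `features` and `loss` callables and a fixed step size; the paper's exact margin and update rule may differ:

```python
# Hypothetical SampleRank-style update for the pairwise constraints in
# items 2 and 3 above: given two trees for the same sentence (subsequent
# samples, or a sample and the ground truth), update the weights when the
# model's score ranking disagrees with the loss ranking.

def samplerank_update(w, x, y_a, y_b, gold, features, loss, eta=0.1):
    better, worse = ((y_a, y_b) if loss(y_a, gold) < loss(y_b, gold)
                     else (y_b, y_a))
    f_better, f_worse = features(x, better), features(x, worse)
    score = lambda f: sum(w.get(k, 0.0) * v for k, v in f.items())
    if score(f_better) <= score(f_worse):      # model ranks the pair wrongly
        for k in set(f_better) | set(f_worse):
            w[k] = w.get(k, 0.0) + eta * (f_better.get(k, 0.0)
                                          - f_worse.get(k, 0.0))
    return w
```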

learning algorithm

Appears in 3 sentences as: learning algorithm (2) learning algorithms (1)
In Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
  1. Our work builds on one such approach — SampleRank (Wick et al., 2011), a sampling-based learning algorithm .
    Page 2, “Related Work”
  2. We begin with the notation before addressing the decoding and learning algorithms .
    Page 3, “Sampling-Based Dependency Parsing with Global Features”
  3. Figure 4 summarizes the learning algorithm.
    Page 5, “Sampling-Based Dependency Parsing with Global Features”
