Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
Li, Zhenghua and Zhang, Min and Chen, Wenliang

Article Structure

Abstract

This paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level, referred to as ambiguity-aware ensemble training.

Introduction

Supervised dependency parsing has made great progress during the past decade.

Supervised Dependency Parsing

Given an input sentence $\mathbf{x} = w_0 w_1 \ldots w_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m) : 0 \le h \le n,\ 0 < m \le n\}$, where $(h, m)$ indicates a directed arc from the head word $w_h$ to the modifier $w_m$, and $w_0$ is an artificial node linking to the root of the sentence.
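
To make the notation concrete, such a tree can be stored as a head array with index 0 reserved for the artificial root $w_0$. The following Python sketch is illustrative only; the sentence and its parse are hypothetical, not taken from the paper.

    # heads[m] = h encodes the arc (h, m) from head w_h to modifier w_m.
    sentence = ["<root>", "I", "saw", "Sarah", "with", "a", "telescope"]
    heads = [-1, 2, 0, 2, 2, 6, 4]  # hypothetical parse; heads[0] is unused

    def arcs(heads):
        """Recover the arc set {(h, m) : 0 <= h <= n, 0 < m <= n}."""
        return {(h, m) for m, h in enumerate(heads) if m > 0}

    print(sorted(arcs(heads)))  # [(0, 2), (2, 1), (2, 3), (2, 4), (4, 6), (6, 5)]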

Ambiguity-aware Ensemble Training

In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
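
The contrast with 1-best training can be made explicit in code. Below is a minimal sketch, assuming each parser emits a head array per sentence: the 1-best trees are merged into per-word candidate-head sets, i.e., the parse forest (ambiguous labeling) used in place of a single noisy gold-standard tree.

    def build_forest(parses):
        """parses: 1-best head arrays from different parsers for one sentence.
        Returns forest[m] = set of candidate heads for word m."""
        n = len(parses[0])
        forest = [set() for _ in range(n)]
        for heads in parses:
            for m in range(1, n):
                forest[m].add(heads[m])
        return forest

    gparser = [-1, 2, 0, 2, 2, 6, 4]  # hypothetical 1-best outputs
    zpar    = [-1, 2, 0, 2, 2, 6, 2]  # disagrees on the head of word 6
    forest = build_forest([gparser, zpar])  # word 6 keeps both candidates {2, 4}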

Experiments and Analysis

To verify the effectiveness of our proposed approach, we conduct experiments on Penn Treebank (PTB) and Penn Chinese Treebank 5.1 (CTB5).

Related Work

Our work is originally inspired by the work of Täckström et al.

Conclusions

This paper proposes a generalized training framework of semi-supervised dependency parsing based on ambiguous labelings.

Topics

unlabeled data

Appears in 58 sentences as: Unlabeled Data (2) Unlabeled data (1) unlabeled data (58)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Instead of only using 1-best parse trees in previous work, our core idea is to utilize parse forest (ambiguous labelings) to combine multiple 1-best parse trees generated from diverse parsers on unlabeled data.
    Page 1, “Abstract”
  2. With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
    Page 1, “Abstract”
  3. In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest.
    Page 1, “Introduction”
  4. Previously, unlabeled data has been explored to derive useful local-context features such as word clusters (Koo et al., 2008), subtree frequencies (Chen et al., 2009; Chen et al., 2013), and word co-occurrence counts (Zhou et al., 2011; Bansal and Klein, 2011).
    Page 1, “Introduction”
  5. A few effective learning methods are also proposed for dependency parsing to implicitly utilize distributions on unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
    Page 1, “Introduction”
  6. Another line of research is to pick up some high-quality auto-parsed training instances from unlabeled data using bootstrapping methods, such as self-training (Yarowsky, 1995), co-training (Blum and Mitchell, 1998), and tri-training (Zhou and Li, 2005).
    Page 1, “Introduction”
  7. The reason may be that dependency parsing models are prone to amplify previous mistakes during training on self-parsed unlabeled data.
    Page 1, “Introduction”
  8. Both works employ two parsers to process the unlabeled data, and only select as extra training data sentences on which the 1-best parse trees of the two parsers are identical.
    Page 1, “Introduction”
  9. In this way, the auto-parsed unlabeled data becomes more reliable.
    Page 1, “Introduction”
  10. However, one obvious drawback of these methods is that they are unable to exploit unlabeled data with divergent outputs from different parsers.
    Page 2, “Introduction”
  11. Our experiments show that unlabeled data with identical outputs from different parsers tends to be short (18.25 words per sentence on average) and accounts for only a small proportion (40%) of all sentences (see Table 6).
    Page 2, “Introduction”
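
The selection rule criticized in excerpts 8-11 is easy to state in code. A hedged sketch, again assuming head-array outputs: co-/tri-training keeps an unlabeled sentence only when two parsers agree exactly, discarding the divergent remainder that the forest-based approach can still exploit as ambiguous labelings.

    def agreement_filter(parse_pairs):
        """parse_pairs: (heads_a, heads_b) tuples over unlabeled sentences.
        Keep a sentence only if the two 1-best trees are identical."""
        return [heads_a for heads_a, heads_b in parse_pairs if heads_a == heads_b]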

dependency parsing

Appears in 31 sentences as: Dependency Parser (1) dependency parser (8) dependency parsers (3) dependency parsing (20)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. This paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level, referred to as ambiguity-aware ensemble training.
    Page 1, “Abstract”
  2. With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
    Page 1, “Abstract”
  3. Supervised dependency parsing has made great progress during the past decade.
    Page 1, “Introduction”
  4. A few effective learning methods are also proposed for dependency parsing to implicitly utilize distributions on unlabeled data (Smith and Eisner, 2007; Wang et al., 2008; Suzuki et al., 2009).
    Page 1, “Introduction”
  5. However, these methods gain limited success in dependency parsing.
    Page 1, “Introduction”
  6. Although working well on constituent parsing (McClosky et al., 2006; Huang and Harper, 2009), self-training is shown unsuccessful for dependency parsing (Spreyer and Kuhn, 2009).
    Page 1, “Introduction”
  7. The reason may be that dependency parsing models are prone to amplify previous mistakes during training on self-parsed unlabeled data.
    Page 1, “Introduction”
  8. Sagae and Tsujii (2007) apply a variant of co-training to dependency parsing and report positive results on out-of-domain text.
    Page 1, “Introduction”
  9. To solve the above issues, this paper proposes a more general and effective framework for semi-supervised dependency parsing, referred to as ambiguity-aware ensemble training.
    Page 2, “Introduction”
  10. To construct parse forest on unlabeled data, we employ three supervised parsers based on different paradigms, including our baseline graph-based dependency parser, a transition-based dependency parser (Zhang and Nivre, 2011), and a generative constituent parser (Petrov and Klein, 2007).
    Page 2, “Introduction”
  11. We propose a generalized ambiguity-aware ensemble training framework for semi-supervised dependency parsing, which can
    Page 2, “Introduction”

parse trees

Appears in 24 sentences as: parse tree (2) parse trees (23)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Instead of only using 1-best parse trees in previous work, our core idea is to utilize parse forest (ambiguous labelings) to combine multiple 1-best parse trees generated from diverse parsers on unlabeled data.
    Page 1, “Abstract”
  2. 1) ambiguity encoded in parse forests compromises noise in 1-best parse trees.
    Page 1, “Abstract”
  3. During training, the parser is aware of these ambiguous structures, and has the flexibility to distribute probability mass to its preferred parse trees as long as the likelihood improves.
    Page 1, “Abstract”
  4. Both works employ two parsers to process the unlabeled data, and only select as extra training data sentences on which the 1-best parse trees of the two parsers are identical.
    Page 1, “Introduction”
  5. Different from traditional self/co/tri-training, which only uses 1-best parse trees on unlabeled data, our approach adopts ambiguous labelings, represented by parse forest, as gold-standard for unlabeled sentences.
    Page 2, “Introduction”
  6. The forest is formed by two parse trees, respectively shown at the upper and lower sides of the sentence.
    Page 2, “Introduction”
  7. The differences between the two parse trees are highlighted using dashed arcs.
    Page 2, “Introduction”
  8. First, noise in unlabeled data is largely alleviated, since parse forest encodes only a few highly possible parse trees with high oracle score.
    Page 2, “Introduction”
  9. Please note that the parse forest in Figure 1 contains four parse trees after combination of the two different choices.
    Page 2, “Introduction”
  10. Finally, with sufficient unlabeled data, it is possible that the parser can learn to resolve such uncertainty by biasing to more reasonable parse trees.
    Page 2, “Introduction”
  11. The 1-best parse trees of these three parsers are aggregated in different ways.
    Page 2, “Introduction”
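
Excerpt 8 above appeals to the oracle score of a parse forest. Illustratively (this helper is ours, not the paper's), the oracle counts a word as correct whenever its gold head appears among the word's candidate heads, so a forest's oracle accuracy upper-bounds the accuracy of any single tree it encodes.

    def forest_oracle(forest, gold_heads):
        """forest[m]: candidate-head set; gold_heads[m]: gold head of word m."""
        n = len(gold_heads) - 1  # exclude the artificial root at index 0
        hits = sum(1 for m in range(1, n + 1) if gold_heads[m] in forest[m])
        return hits / n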

semi-supervised

Appears in 23 sentences as: semi-supervised (23)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. This paper proposes a simple yet effective framework for semi-supervised dependency parsing at entire tree level, referred to as ambiguity-aware ensemble training.
    Page 1, “Abstract”
  2. Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training and tri-training.
    Page 1, “Abstract”
  3. In contrast, semi-supervised approaches, which can make use of large-scale unlabeled data, have attracted more and more interest.
    Page 1, “Introduction”
  4. To solve the above issues, this paper proposes a more general and effective framework for semi-supervised dependency parsing, referred to as ambiguity-aware ensemble training.
    Page 2, “Introduction”
  5. We propose a generalized ambiguity-aware ensemble training framework for semi-supervised dependency parsing, which can
    Page 2, “Introduction”
  6. We first employ a generative constituent parser for semi-supervised dependency parsing.
    Page 3, “Introduction”
  7. Experiments show that the constituent parser is very helpful since it produces more divergent structures for our semi-supervised parser than discriminative dependency parsers.
    Page 3, “Introduction”
  8. In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
    Page 4, “Ambiguity-aware Ensemble Training”
  9. We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers.
    Page 5, “Ambiguity-aware Ensemble Training”
  10. For the semi-supervised parsers trained with Algorithm 1, we use N1 = 20K and M1 = 50K for English, and N1 = 15K and M1 = 50K for Chinese, based on a few preliminary experiments.
    Page 6, “Experiments and Analysis”
  11. For semi-supervised cases, one iteration takes about 2 hours on an IBM server having 2.0 GHz Intel Xeon CPUs and 72G memory.
    Page 6, “Experiments and Analysis”

Berkeley Parser

Appears in 15 sentences as: Berkeley parse (1) Berkeley Parser (14)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Default parameter settings are used for training ZPar and Berkeley Parser.
    Page 6, “Experiments and Analysis”
  2. For Berkeley Parser, we use the model after 5 split-merge iterations to avoid over-fitting the training data, according to the manual.
    Page 6, “Experiments and Analysis”
  3. The phrase-structure outputs of Berkeley Parser are converted into dependency structures using the same head-finding rules.
    Page 6, “Experiments and Analysis”
  4. Using unlabeled data with the results of Berkeley Parser (“Unlabeled ← B”) significantly improves parsing accuracy by 0.55% (93.40-92.85) on English and 1.06% (83.34-82.28) on Chinese.
    Page 6, “Experiments and Analysis”
  5. We believe the reason is that being a generative model designed for constituent parsing, Berkeley Parser is more different from discriminative dependency parsers, and therefore can provide more divergent syntactic structures.
    Page 6, “Experiments and Analysis”
  6. unlabeled sentences on which Berkeley Parser and ZPar produce identical outputs (“Unlabeled ← B=Z”).
    Page 7, “Experiments and Analysis”
  7. “Unlabeled ← B+(Z∩G)” means that the parse forest is initialized with the Berkeley parse and augmented with the intersection of dependencies of the 1-best outputs of ZPar and GParser.
    Page 7, “Experiments and Analysis”
  8. Combining the outputs of Berkeley Parser and
    Page 7, “Experiments and Analysis”
  9. GParser (“Unlabeled ← B+G”), we get higher oracle score (96.37% on English and 89.72% on Chinese) and higher syntactic divergence (1.085 candidate heads per word on English, and 1.188 on Chinese) than “Unlabeled ← Z+G”, which verifies our earlier discussion that Berkeley Parser produces more different structures than ZPar.
    Page 7, “Experiments and Analysis”
  10. However, it leads to slightly worse accuracy than co-training with Berkeley Parser (“Unlabeled ← B”).
    Page 7, “Experiments and Analysis”
  11. Combining the outputs of Berkeley Parser and ZPar (“Unlabeled ← B+Z”), we get the best performance on English, which is also significantly better than both co-training (“Unlabeled ← B”) and tri-training (“Unlabeled ← B=Z”) on both English and Chinese.
    Page 7, “Experiments and Analysis”

graph-based

Appears in 12 sentences as: Graph-based (1) graph-based (11)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. For example, Koo and Collins (2010) and Zhang and McDonald (2012) show that incorporating higher-order features into a graph-based parser only leads to modest increase in parsing accuracy.
    Page 1, “Introduction”
  2. To construct parse forest on unlabeled data, we employ three supervised parsers based on different paradigms, including our baseline graph-based dependency parser, a transition-based dependency parser (Zhang and Nivre, 2011), and a generative constituent parser (Petrov and Klein, 2007).
    Page 2, “Introduction”
  3. The graph-based method views the problem as finding an optimal tree from a fully-connected directed graph (McDonald et al., 2005; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010), while the transition-based method tries to find a highest-scoring transition sequence that leads to a legal dependency tree (Yamada and Matsumoto, 2003; Nivre, 2003; Zhang and Nivre, 2011).
    Page 3, “Supervised Dependency Parsing”
  4. 2.1 Graph-based Dependency Parser (GParser)
    Page 3, “Supervised Dependency Parsing”
  5. In this work, we adopt the graph-based paradigm because it allows us to naturally derive conditional probability of a dependency tree $d$ given a sentence $\mathbf{x}$, which is required to compute likelihood of both labeled and unlabeled data.
    Page 3, “Supervised Dependency Parsing”
  6. Under the graph-based model, the score of a dependency tree is factored into the scores of small subtrees p.
    Page 3, “Supervised Dependency Parsing”
  7. Figure 2: Two types of scoring subtrees in our second-order graph-based parsers.
    Page 3, “Supervised Dependency Parsing”
  8. We adopt the second-order graph-based dependency parsing model of McDonald and Pereira (2006) as our core parser, which incorporates features from the two kinds of subtrees in Fig. 2.
    Page 3, “Supervised Dependency Parsing”
  9. Previous work on graph-based dependency parsing mostly adopts linear models and perceptron based training procedures, which lack probabilistic explanations of dependency trees and do not need to compute likelihood of labeled training data.
    Page 3, “Supervised Dependency Parsing”
  10. McDonald and Pereira (2006) propose a second-order graph-based parser, but use a smaller feature set than our work.
    Page 8, “Experiments and Analysis”
  11. Koo and Collins (2010) propose a third-order graph-based parser.
    Page 8, “Experiments and Analysis”
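
The factorization described in excerpts 6-8 can be written out explicitly. The display below is a reconstruction in the usual second-order (dependency plus sibling) form; the feature-vector names $\mathbf{f}_{dep}$ and $\mathbf{f}_{sib}$ are assumed rather than quoted:

    $\mathrm{Score}(\mathbf{x}, d) = \sum_{(h,m) \in d} \mathbf{w}_{dep} \cdot \mathbf{f}_{dep}(\mathbf{x}, h, m) + \sum_{\{(h,s),(h,m)\} \subseteq d} \mathbf{w}_{sib} \cdot \mathbf{f}_{sib}(\mathbf{x}, h, s, m)$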

dependency tree

Appears in 9 sentences as: dependency tree (6) dependency trees (3)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Given an input sentence $\mathbf{x} = w_0 w_1 \ldots w_n$, the goal of dependency parsing is to build a dependency tree as depicted in Figure 1, denoted by $d = \{(h, m) : 0 \le h \le n,\ 0 < m \le n\}$, where $(h, m)$ indicates a directed arc from the head word $w_h$ to the modifier $w_m$, and $w_0$ is an artificial node linking to the root of the sentence.
    Page 3, “Supervised Dependency Parsing”
  2. The graph-based method views the problem as finding an optimal tree from a fully-connected directed graph (McDonald et al., 2005; McDonald and Pereira, 2006; Carreras, 2007; Koo and Collins, 2010), while the transition-based method tries to find a highest-scoring transition sequence that leads to a legal dependency tree (Yamada and Matsumoto, 2003; Nivre, 2003; Zhang and Nivre, 2011).
    Page 3, “Supervised Dependency Parsing”
  3. In this work, we adopt the graph-based paradigm because it allows us to naturally derive conditional probability of a dependency tree $d$ given a sentence $\mathbf{x}$, which is required to compute likelihood of both labeled and unlabeled data.
    Page 3, “Supervised Dependency Parsing”
  4. Under the graph-based model, the score of a dependency tree is factored into the scores of small subtrees p.
    Page 3, “Supervised Dependency Parsing”
  5. Then the score of a dependency tree is:
    Page 3, “Supervised Dependency Parsing”
  6. Previous work on graph-based dependency parsing mostly adopts linear models and perceptron based training procedures, which lack probabilistic explanations of dependency trees and do not need to compute likelihood of labeled training data.
    Page 3, “Supervised Dependency Parsing”
  7. Assuming the feature weights $\mathbf{w}$ are known, the probability of a dependency tree $d$ given an input sentence $\mathbf{x}$ is defined as:
    Page 4, “Supervised Dependency Parsing”
  8. where $Z(\mathbf{x})$ is the normalization factor and $\mathcal{Y}(\mathbf{x})$ is the set of all legal dependency trees for $\mathbf{x}$.
    Page 4, “Supervised Dependency Parsing”
  9. Since $\mathcal{Y}(\mathbf{x}_i)$ contains exponentially many dependency trees, direct calculation of the second term is prohibitive.
    Page 4, “Supervised Dependency Parsing”
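
Excerpts 7-9 together pin down the log-linear form, presumably the standard CRF definition:

    $p(d \mid \mathbf{x}; \mathbf{w}) = \dfrac{\exp(\mathrm{Score}(\mathbf{x}, d))}{Z(\mathbf{x})}, \qquad Z(\mathbf{x}) = \sum_{d' \in \mathcal{Y}(\mathbf{x})} \exp(\mathrm{Score}(\mathbf{x}, d'))$

The "second term" whose direct calculation excerpt 9 calls prohibitive is then the feature expectation under this distribution, which is presumably computed by dynamic programming over the factored scores rather than by enumerating $\mathcal{Y}(\mathbf{x}_i)$.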

labeled data

Appears in 9 sentences as: Labeled data (1) labeled data (8)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. With a conditional random field based probabilistic dependency parser, our training objective is to maximize mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
    Page 1, “Abstract”
  2. Such sentences can provide more discriminative instances for training which may be unavailable in labeled data.
    Page 2, “Introduction”
  3. Evaluation on labeled data shows the oracle accuracy of parse forest is much higher than that of 1-best outputs of single parsers (see Table 3).
    Page 2, “Introduction”
  4. Finally, using a conditional random field (CRF) based probabilistic parser, we train a better model by maximizing mixed likelihood of labeled data and auto-parsed unlabeled data with ambiguous labelings.
    Page 2, “Introduction”
  5. not sufficiently covered in manually labeled data.
    Page 4, “Ambiguity-aware Ensemble Training”
  6. Since $\mathcal{D}'$ contains many more instances than $\mathcal{D}$ (1.7M vs. 40K for English, and 4M vs. 16K for Chinese), it is likely that the unlabeled data may overwhelm the labeled data during SGD training.
    Page 5, “Ambiguity-aware Ensemble Training”
  7. 1: Input: Labeled data $\mathcal{D} = \{(\mathbf{x}_i, \mathbf{d}_i)\}_{i=1}^{N}$, and unlabeled data $\mathcal{D}' = \{(\mathbf{u}_i, \mathbf{V}_i)\}_{i=1}^{M}$; Parameters: $I$, $N_1$, $M_1$, $b$; Output: $\mathbf{w}$; Initialization: $\mathbf{w}^{(0)} = \mathbf{0}$, $k = 0$; for $i = 1$ to $I$ do {iterations}: randomly select $N_1$ instances from $\mathcal{D}$ and $M_1$ instances from $\mathcal{D}'$ to compose a new dataset $\mathcal{D}_i$, and shuffle it.
    Page 5, “Ambiguity-aware Ensemble Training”
  8. These three parsers are trained on labeled data and then used to parse each unlabeled sentence.
    Page 5, “Ambiguity-aware Ensemble Training”
  9. The training objective is to maximize the mixed likelihood of both the labeled data and the auto-parsed unlabeled data with ambiguous labelings.
    Page 9, “Conclusions”
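
Excerpts 6 and 7 describe the corpus-weighting trick of Algorithm 1. A minimal Python sketch, assuming a helper sgd_update (the L2-regularized CRF gradient step, not shown here) and an initial weight vector w0:

    import random

    def ambiguity_aware_training(D, D_prime, I, N1, M1, b, sgd_update, w0):
        """D: labeled instances; D_prime: auto-parsed unlabeled instances with
        ambiguous labelings. Sampling N1 + M1 instances per iteration keeps the
        much larger D_prime from overwhelming D during SGD."""
        w = w0
        for _ in range(I):                       # I iterations
            D_i = random.sample(D, N1) + random.sample(D_prime, M1)
            random.shuffle(D_i)
            for j in range(0, len(D_i), b):      # minibatches of size b
                w = sgd_update(w, D_i[j:j + b])
        return w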

constituent parser

Appears in 8 sentences as: constituent parser (5) constituent parsers (1) constituent parsing (2)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Although working well on constituent parsing (McClosky et al., 2006; Huang and Harper, 2009), self-training is shown unsuccessful for dependency parsing (Spreyer and Kuhn, 2009).
    Page 1, “Introduction”
  2. To construct parse forest on unlabeled data, we employ three supervised parsers based on different paradigms, including our baseline graph-based dependency parser, a transition-based dependency parser (Zhang and Nivre, 2011), and a generative constituent parser (Petrov and Klein, 2007).
    Page 2, “Introduction”
  3. We first employ a generative constituent parser for semi-supervised dependency parsing.
    Page 3, “Introduction”
  4. Experiments show that the constituent parser is very helpful since it produces more divergent structures for our semi-supervised parser than discriminative dependency parsers.
    Page 3, “Introduction”
  5. Instead, we build a log-linear CRF-based dependency parser, which is similar to the CRF-based constituent parser of Finkel et al.
    Page 4, “Supervised Dependency Parsing”
  6. To construct parse forests for unlabeled data, we employ three diverse parsers, i.e., our baseline GParser, a transition-based parser (ZPar) (Zhang and Nivre, 2011), and a generative constituent parser (Berkeley Parser) (Petrov and Klein, 2007).
    Page 5, “Ambiguity-aware Ensemble Training”
  7. We believe the reason is that being a generative model designed for constituent parsing, Berkeley Parser is more different from discriminative dependency parsers, and therefore can provide more divergent syntactic structures.
    Page 6, “Experiments and Analysis”
  8. For future work, among other possible extensions, we would like to see how our approach performs when employing more diverse parsers to compose the parse forest of higher quality for the unlabeled data, such as the easy-first nondirectional dependency parser (Goldberg and Elhadad, 2010) and other constituent parsers (Collins and Koo, 2005; Charniak and Johnson, 2005; Finkel et al., 2008).
    Page 9, “Conclusions”

UAS

Appears in 6 sentences as: UAS (6)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. We measure parsing performance using the standard unlabeled attachment score (UAS), excluding punctuation marks.
    Page 6, “Experiments and Analysis”
  2. Table 4: UAS comparison on English test data.
    Page 8, “Experiments and Analysis”
  3. UAS Li et al.
    Page 8, “Experiments and Analysis”
  4. Table 5: UAS comparison on Chinese test data.
    Page 8, “Experiments and Analysis”
  5. Unlabeled data UAS #Sent Len Head/Word Oracle
    Page 8, “Experiments and Analysis”
  6. UAS
    Page 9, “Experiments and Analysis”

feature weights

Appears in 6 sentences as: feature weight (1) feature weights (5)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. $\mathbf{w}_{dep}$ and $\mathbf{w}_{sib}$ are feature weight vectors; the dot product gives scores contributed by corresponding subtrees.
    Page 3, “Supervised Dependency Parsing”
  2. Assuming the feature weights $\mathbf{w}$ are known, the probability of a dependency tree $d$ given an input sentence $\mathbf{x}$ is defined as:
    Page 4, “Supervised Dependency Parsing”
  3. The partial derivative with respect to the feature weights w is:
    Page 4, “Supervised Dependency Parsing”
  4. We apply L2-norm regularized SGD training to iteratively learn feature weights w for our CRF-based baseline and semi-supervised parsers.
    Page 5, “Ambiguity-aware Ensemble Training”
  5. We follow the implementation in CRFsuite. At each step, the algorithm approximates a gradient with a small subset of the training examples, and then updates the feature weights.
    Page 5, “Ambiguity-aware Ensemble Training”
  6. Once the feature weights w are learnt, we can
    Page 5, “Ambiguity-aware Ensemble Training”
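
The gradient introduced (but cut off) in excerpt 3 is presumably the standard CRF one, observed features minus their model expectation, summed over labeled instances $(\mathbf{x}_i, \mathbf{d}_i)$:

    $\dfrac{\partial \mathcal{L}}{\partial \mathbf{w}} = \sum_i \Big( \mathbf{f}(\mathbf{x}_i, \mathbf{d}_i) - \sum_{d \in \mathcal{Y}(\mathbf{x}_i)} p(d \mid \mathbf{x}_i; \mathbf{w})\, \mathbf{f}(\mathbf{x}_i, d) \Big)$

For an unlabeled sentence with parse forest $\mathbf{V}_i$, the first term would be replaced by the expectation of $\mathbf{f}$ over trees constrained to lie inside $\mathbf{V}_i$; this reading is inferred from the mixed-likelihood objective, not quoted from the paper.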

subtrees

Appears in 5 sentences as: subtrees (5)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Under the graph-based model, the score of a dependency tree is factored into the scores of small subtrees p.
    Page 3, “Supervised Dependency Parsing”
  2. Figure 2: Two types of scoring subtrees in our second-order graph-based parsers.
    Page 3, “Supervised Dependency Parsing”
  3. We adopt the second-order graph-based dependency parsing model of McDonald and Pereira (2006) as our core parser, which incorporates features from the two kinds of subtrees in Fig. 2.
    Page 3, “Supervised Dependency Parsing”
  4. $\mathbf{w}_{dep}$ and $\mathbf{w}_{sib}$ are feature weight vectors; the dot product gives scores contributed by corresponding subtrees.
    Page 3, “Supervised Dependency Parsing”
  5. For syntactic features, we adopt those of Bohnet (2010), which include two categories corresponding to the two types of scoring subtrees in Fig. 2.
    Page 3, “Supervised Dependency Parsing”

POS tagging

Appears in 4 sentences as: POS tag (1) POS tagging (2) POS tags (1)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. $t_i$ denotes the POS tag of $w_i$. $b$ is an index between $h$ and $m$. $\mathrm{dir}(i, j)$ and $\mathrm{dist}(i, j)$ denote the direction and distance of the dependency $(i, j)$.
    Page 3, “Supervised Dependency Parsing”
  2. We build a CRF-based bigram part-of-speech (POS) tagger with the features described in (Li et al., 2012), and produce POS tags for all train/development/test/unlabeled sets (10-way jackknifing for training sets).
    Page 6, “Experiments and Analysis”
  3. (2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts.
    Page 8, “Experiments and Analysis”
  4. Our approach can be combined with their work to utilize unlabeled data to improve both POS tagging and parsing simultaneously.
    Page 8, “Experiments and Analysis”

significantly outperforms

Appears in 4 sentences as: significantly outperforming (1) significantly outperforms (3)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Experimental results on benchmark data show that our method significantly outperforms the baseline supervised parser and other entire-tree based semi-supervised methods, such as self-training, co-training and tri-training.
    Page 1, “Abstract”
  2. Using unlabeled data with the results of ZPar (“Unlabeled ← Z”) significantly outperforms the baseline GParser by 0.30% (93.15-92.85) on English.
    Page 6, “Experiments and Analysis”
  3. However, we find that although the parser significantly outperforms the supervised GParser on English, it does not gain significant improvement over co-training with ZPar (“Unlabeled ← Z”) on both English and Chinese.
    Page 7, “Experiments and Analysis”
  4. (2012) and Bohnet and Nivre (2012) use joint models for POS tagging and dependency parsing, significantly outperforming their pipeline counterparts.
    Page 8, “Experiments and Analysis”

parsing model

Appears in 3 sentences as: parsing model (2) parsing models (1)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. The reason may be that dependency parsing models are prone to amplify previous mistakes during training on self-parsed unlabeled data.
    Page 1, “Introduction”
  2. We adopt the second-order graph-based dependency parsing model of McDonald and Pereira (2006) as our core parser, which incorporates features from the two kinds of subtrees in Fig. 2.
    Page 3, “Supervised Dependency Parsing”
  3. (2013) adopt the higher-order parsing model of Carreras (2007), and Suzuki et al.
    Page 8, “Experiments and Analysis”

gold-standard

Appears in 3 sentences as: gold-standard (3)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. Different from traditional self/co/tri-training, which only uses 1-best parse trees on unlabeled data, our approach adopts ambiguous labelings, represented by parse forest, as gold-standard for unlabeled sentences.
    Page 2, “Introduction”
  2. In standard entire-tree based semi-supervised methods such as self/co/tri-training, automatically parsed unlabeled sentences are used as additional training data, and noisy 1-best parse trees are considered as gold-standard.
    Page 4, “Ambiguity-aware Ensemble Training”
  3. Here, “ambiguous labelings” mean an unlabeled sentence may have multiple parse trees as gold-standard reference, represented by parse forest (see Figure 1).
    Page 4, “Ambiguity-aware Ensemble Training”

significant improvement

Appears in 3 sentences as: significant improvement (2) significantly improves (1)
In Ambiguity-aware Ensemble Training for Semi-supervised Dependency Parsing
  1. All the above work leads to significant improvement in parsing accuracy.
    Page 1, “Introduction”
  2. Using unlabeled data with the results of Berkeley Parser (“Unlabeled ← B”) significantly improves parsing accuracy by 0.55% (93.40-92.85) on English and 1.06% (83.34-82.28) on Chinese.
    Page 6, “Experiments and Analysis”
  3. However, we find that although the parser significantly outperforms the supervised GParser on English, it does not gain significant improvement over co-training with ZPar (“Unlabeled ← Z”) on both English and Chinese.
    Page 7, “Experiments and Analysis”
