Abstract | We compare two parsing models for temporal dependency structures, and show that a deterministic non-projective dependency parser outperforms a graph-based maximum spanning tree parser, achieving labeled attachment accuracy of 0.647 and labeled tree edit distance of 0.596.
Corpus Annotation | train a temporal dependency parsing model.
Evaluations | To evaluate the parsing models (SRP and MST), we proposed two baselines.
Evaluations | In terms of labeled attachment score, both dependency parsing models outperformed the baseline models — the maximum spanning tree parser achieved 0.614 LAS, and the shift-reduce parser achieved 0.647 LAS. |
Evaluations | These results indicate that dependency parsing models are a good fit for our whole-story timeline extraction task.
Feature Design | The full set of features proposed for both parsing models, derived from the state-of-the-art systems for temporal relation labeling, is presented in Table 2.
Parsing Models | Formally, a parsing model is a function (W → Π) where W = w1w2…wn.
Parsing Models | 4.1 Shift-Reduce Parsing Model |
Parsing Models | 4.2 Graph-Based Parsing Model |
Character-Level Dependency Tree | (2012) also use Zhang and Clark (2010)’s features, the arc-standard and arc-eager character-level dependency parsing models have the same features for joint word segmentation and POS-tagging.
Character-Level Dependency Tree | The first consists of a joint segmentation and POS-tagging model (Zhang and Clark, 2010) and a word-based dependency parsing model using the arc-standard algorithm (Huang et al., 2009).
Character-Level Dependency Tree | The second consists of the same joint segmentation and POS-tagging model and a word-based dependency parsing model using the arc-eager algorithm
Introduction | For direct comparison with word-based parsers, we incorporate the traditional word segmentation, POS-tagging and dependency parsing stages in our joint parsing models.
Introduction | Experimental results show that the character-level dependency parsing models outperform the word-based methods on all the data sets. |
Abstract | Third, to enhance the power of parsing models, we enlarge the feature set with nonlocal features and semi-supervised word cluster features.
Introduction | This creates a chicken-and-egg problem that needs to be addressed when designing a parsing model.
Introduction | Second, due to the existence of unary rules in constituent trees, competing candidate parses often have different numbers of actions, and this increases the disambiguation difficulty for the parsing model.
Introduction | With this strategy, parser states and their unary extensions are put into the same beam, therefore the parsing model could decide whether or not to use unary actions within local decision beams. |
Joint POS Tagging and Parsing with Nonlocal Features | To address the drawbacks of the standard transition-based constituent parsing model (described in Section 1), we propose a model to jointly solve POS tagging and constituent parsing with nonlocal features. |
Joint POS Tagging and Parsing with Nonlocal Features | This makes the lengths of complete action sequences very different, and the parsing model has to disambiguate among terminal states with varying action sizes. |
Joint POS Tagging and Parsing with Nonlocal Features | We find that our new method aligns states with their RU-X extensions in the same beam, therefore the parsing model could decide whether or not to use RU-X actions within local decision beams.
Related Work | Finally, we enhanced our parsing model by enlarging the feature set with nonlocal features and semi-supervised word cluster features. |
Transition-based Constituent Parsing | This section describes the transition-based constituent parsing model, which is the basis of Section 3 and the baseline model in Section 4.
Transition-based Constituent Parsing | 2.1 Transition-based Constituent Parsing Model |
Transition-based Constituent Parsing | A transition-based constituent parsing model is a quadruple C = (S, T, s0, St), where S is a set of parser states (sometimes called configurations), T is a finite set of actions, s0 is an initialization function to map each input sentence into a unique initial state, and St ⊆ S is a set of terminal states.
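The quadruple C = (S, T, s0, St) above can be made concrete with a tiny transition system. The sketch below is hypothetical and uses dependency-style arc-standard actions for brevity rather than the paper's constituent actions; all names are illustrative, not from any specific parser.

```python
from dataclasses import dataclass, field

# Hedged sketch of a transition system C = (S, T, s0, St): states hold a
# stack and a buffer, actions map state -> state, s0 initializes from a
# sentence, and terminal states have an empty buffer and one stack item.

@dataclass
class State:
    stack: list
    buffer: list
    arcs: list = field(default_factory=list)

def s0(sentence):
    """Initialization function: map a sentence to its unique initial state."""
    return State(stack=[], buffer=list(sentence))

def shift(s):
    """Move the first buffer item onto the stack."""
    return State(s.stack + [s.buffer[0]], s.buffer[1:], list(s.arcs))

def left_arc(s):
    """Make the stack top the head of the item below it."""
    *rest, below, top = s.stack
    return State(rest + [top], list(s.buffer), s.arcs + [(top, below)])

def right_arc(s):
    """Make the item below the stack top the head of the top."""
    *rest, below, top = s.stack
    return State(rest + [below], list(s.buffer), s.arcs + [(below, top)])

def is_terminal(s):
    """Membership in St: empty buffer and a single remaining stack item."""
    return not s.buffer and len(s.stack) == 1

# Oracle action sequence for "the cat sleeps" (the <- cat <- sleeps):
state = s0(["the", "cat", "sleeps"])
for action in [shift, shift, left_arc, shift, left_arc]:
    state = action(state)
```

A beam-search parser over this system would simply keep the k highest-scoring states at each step instead of following a single oracle sequence.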
Abstract | We train probabilistic parsing models for resource-poor languages by transferring cross-lingual knowledge from resource-rich language with entropy regularization. |
Introduction | We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data. |
Our Approach | Central to our approach is a maximum likelihood learning framework, in which we use an English parser and parallel text to estimate the “transferring distribution” of the target language parsing model (see Section 2.2 for more details).
Our Approach | 2.1 Edge-Factored Parsing Model |
Our Approach | A common strategy to make this parsing model efficiently computable is to factor dependency trees into sets of edges: |
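The edge-factoring idea can be illustrated with a toy sketch (scores and the candidate-tree list below are assumptions, not the paper's implementation): a dependency tree is scored as the sum of independent edge scores, so decoding reduces to finding the highest-scoring spanning tree.

```python
# Sketch of edge-factored scoring: a tree is factored into (head, modifier)
# edges and scored as the sum of independent per-edge scores. In practice
# the best tree is found with an MST or Eisner-style algorithm; here we
# enumerate the three possible trees for a two-word sentence explicitly.

def tree_score(edges, edge_score):
    return sum(edge_score[edge] for edge in edges)

# Token 0 is the artificial root; tokens 1 and 2 are the words.
edge_score = {(0, 1): 4.0, (1, 2): 3.0, (0, 2): 2.0, (2, 1): 1.0}

candidate_trees = [
    [(0, 1), (1, 2)],  # root -> w1 -> w2
    [(0, 2), (2, 1)],  # root -> w2 -> w1
    [(0, 1), (0, 2)],  # root heads both words
]

best_tree = max(candidate_trees, key=lambda t: tree_score(t, edge_score))
```

Because the score decomposes over edges, higher-order features (siblings, grandparents) require richer factorizations and correspondingly more expensive decoding, which is the trade-off several of the excerpts above discuss.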
Character-based Chinese Parsing | To produce character-level trees for Chinese NLP tasks, we develop a character-based parsing model, which can jointly perform word segmentation, POS tagging and phrase-structure parsing.
Character-based Chinese Parsing | Our character-based Chinese parsing model is based on the work of Zhang and Clark (2009), which is a transition-based model for lexicalized constituent parsing. |
Experiments | The character-level parsing model has the advantage that deep character information can be extracted as features for parsing. |
Experiments | Zhang and Clark (2010), and the phrase-structure parsing model of Zhang and Clark (2009). |
Experiments | The phrase-structure parsing model is trained with a 64-beam. |
Introduction | We build a character-based Chinese parsing model to parse the character-level syntax trees. |
Related Work | Our character-level parsing model is inspired by the work of Zhang and Clark (2009), which is a transition-based model with a beam-search decoder for word-based constituent parsing. |
Introduction | To address this limitation, as the first contribution, we propose a novel document-level discourse parser based on probabilistic discriminative parsing models, represented as Conditional Random Fields (CRFs) (Sutton et al., 2007), to infer the probability of all possible DT constituents.
Introduction | Two separate parsing models could exploit the fact that rhetorical relations are distributed differently intra-sententially vs. multi-sententially. |
Our Discourse Parsing Framework | Both of our parsers have the same two components: a parsing model assigns a probability to every possible DT, and a parsing algorithm identifies the most probable DT among the candidate DTs in that scenario. |
Our Discourse Parsing Framework | Before describing our parsing models and the parsing algorithm, we introduce some terminology that we will use throughout the paper. |
Parsing Models and Parsing Algorithm | The job of our intra-sentential and multi-sentential parsing models is to assign a probability to each of the constituents of all possible DTs at the sentence level and at the document level, respectively. |
Parsing Models and Parsing Algorithm | Formally, given the model parameters Θ, for each possible constituent R[i, m, j] in a candidate DT at the sentence or document level, the parsing model estimates P(R[i, m, j] | Θ), which specifies a joint distribution over the label R and the structure [i, m, j] of the constituent.
Parsing Models and Parsing Algorithm | 4.1 Intra-Sentential Parsing Model |
Abstract | Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models.
Dependency Parsing | In the current research, we adopt the graph-based parsing models for their state-of-the-art performance in a variety of languages. Graph-based models view the problem as finding the highest scoring tree from a directed graph.
Dependency Parsing | We implement three parsing models of varying strengths in capturing features to better understand the effect of the proposed QG features. |
Dependency Parsing | parsing models (Yamada and Matsumoto, 2003; Nivre, 2003) with minor modifications. |
Dependency Parsing with QG Features | Figure 4 presents the three kinds of TPs used in our model, which correspond to the three scoring parts of our parsing models.
Dependency Parsing with QG Features | Based on these TPs, we propose the QG features for enhancing the baseline parsing models, which are shown in Table 2.
Dependency Parsing with QG Features | The type of the TP is conjoined with the related words and POS tags, such that the QG-enhanced parsing models can make more elaborate decisions based on the context.
Introduction | Therefore, studies have recently resorted to other resources for the enhancement of parsing models, such as large-scale unlabeled data (Koo et al., 2008; Chen et al., 2009; Bansal and Klein, 2011; Zhou et al., 2011), and bilingual texts or cross-lingual treebanks (Burkett and Klein, 2008; Huang et al., 2009; Burkett et al., 2010; Chen et al., 2010).
Related Work | enhanced parsing models to softly learn the systematic inconsistencies based on QG features, making our approach simpler and more robust. |
Related Work | Our approach is also intuitively related to stacked learning (SL), a machine learning framework that has recently been applied to dependency parsing to integrate two mainstream parsing models, i.e., graph-based and transition-based models (Nivre and McDonald, 2008; Martins et al., 2008).
Abstract | Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. |
Abstract | In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
Abstract | Based on the dependency language model, we represent a set of features for the parsing model.
Dependency language model | In this paper, we use a linear model to calculate the scores for the parsing models (defined in Section 3.1). |
Introduction | Among them, graph-based dependency parsing models have achieved state-of-the-art performance for a wide range of languages as shown in recent CoNLL shared tasks
Introduction | The parsing model searches for the final dependency trees by considering the original scores and the scores of DLM. |
Introduction | The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model.
Abstract | Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides.
Bilingual subtree constraints | Finally, we design the bilingual subtree features based on the mapping rules for the parsing model.
Bilingual subtree constraints | However, as described in Section 4.3.1, the generated subtrees are verified by looking up the list ST before they are used in the parsing models.
Conclusion | based parsing models (Nivre, 2003; Yamada and Matsumoto, 2003). |
Dependency parsing | For dependency parsing, there are two main types of parsing models (Nivre and McDonald, 2008; Nivre and Kübler, 2006): transition-based (Nivre, 2003; Yamada and Matsumoto, 2003) and graph-based (McDonald et al., 2005; Carreras, 2007).
Dependency parsing | Our approach can be applied to both parsing models.
Dependency parsing | In this paper, we employ the graph-based MST parsing model proposed by McDonald and Pereira |
Introduction | Based on the mapping rules, we design a set of features for parsing models.
Experiment | Our parsing models are evaluated on both English and Chinese treebanks, i.e., the WSJ section of Penn Treebank 3.0 (LDC99T42) and the Chinese Treebank 5.1 (LDC2005T01U01). |
Experiment | The parameters θ of each parsing model are estimated from a training set using an averaged perceptron algorithm, following Collins (2002) and Huang (2008).
Experiment | The performance of our first- and higher-order parsing models on all sentences of the two test sets is presented in Table 3, where λ indicates a tuned balance factor.
Higher-order Constituent Parsing | The first feature φ0(Q(r), s) is calculated with a PCFG-based generative parsing model (Petrov and Klein, 2007), as defined in (4) below, where r is the grammar rule instance A → B C that covers the span from the b-th
Higher-order Constituent Parsing | With only lexical features in a part, this parsing model backs off to a first-order one similar to those in the previous works. |
Higher-order Constituent Parsing | Adding structural features, each involving at least a neighboring rule instance, makes it a higher-order parsing model.
Introduction | Previous discriminative parsing models usually factor a parse tree into a set of parts. |
Introduction | Then, the previous discriminative constituent parsing models (Johnson, 2001; Henderson, 2004; Taskar et al., 2004; Petrov and Klein, 2008a; |
Abstract | The dependency backbone of an HPSG analysis is used to provide general linguistic insights which, when combined with state-of-the-art statistical dependency parsing models, achieves performance improvements on out-domain tests.
Dependency Parsing with HPSG | One is to extract the dependency backbone from the HPSG analyses of the sentences and directly convert them into the target representation; the other way is to encode the HPSG outputs as additional features into the existing statistical dependency parsing models.
Experiment Results & Error Analyses | To evaluate the performance of our different dependency parsing models, we tested our approaches on several dependency treebanks for English in a similar spirit to the CoNLL 2006-2008 Shared Tasks.
Experiment Results & Error Analyses | The larger part is converted from the Penn Treebank Wall Street Journal Sections #2-#21, and is used for training statistical dependency parsing models; the smaller part, which covers sentences from Section #23, is used for testing.
Introduction | In combination with machine learning methods, several statistical dependency parsing models have reached comparable high parsing accuracy (McDonald et al., 2005b; Nivre et al., 2007b). |
Parser Domain Adaptation | Granted the differences between their approaches, both systems heavily rely on machine learning methods to estimate the parsing model from an annotated corpus as training set.
Parser Domain Adaptation | Due to the heavy cost of developing high-quality, large-scale syntactically annotated corpora, even for a resource-rich language like English, only very few of them meet the criteria for training a general-purpose statistical parsing model.
Parser Domain Adaptation | Figure 1: Different dependency parsing models and their combinations. |
Bilingually-Guided Dependency Grammar Induction | Use the parsing model to build a new treebank on the target language for the next iteration.
Bilingually-Guided Dependency Grammar Induction | With this approach, we can optimize the mixed parsing model by maximizing the objective in Formula (9). |
Introduction | We evaluate the final automatically-induced dependency parsing model on 5 languages. |
Unsupervised Dependency Grammar Induction | The framework of our unsupervised model first builds a random treebank on the monolingual corpus for initialization and then trains a discriminative parsing model on it.
Unsupervised Dependency Grammar Induction | Algorithm 1 outlines the unsupervised training in its entirety, where the treebank DE and the unsupervised parsing model with λ are updated iteratively.
Unsupervised Dependency Grammar Induction | In line 1 we build a random treebank DE on the monolingual corpus, and then train the parsing model with it (line 2) through a training procedure train(·), which needs DE and 15F as classification instances.
Conclusions and future work | Second, simply treating POS tags within a small window of the verb as pseudo-GRs produces state-of-the-art results without the need for a parsing model . |
Introduction | However, the treebanks necessary for training a high-accuracy parsing model are expensive to build for new domains. |
Previous work | These typically rely on language-specific knowledge, either directly through heuristics, or indirectly through parsing models trained on treebanks. |
Previous work | Note that both methods require extensive manual work: the Preiss system involves the a priori definition of the SCF inventory, careful construction of matching rules, and an unlexicalized parsing model.
Previous work | The BioLexicon system induces its SCF inventory automatically, but requires a lexicalized parsing model, rendering it more sensitive to domain variation.
Results | Since POS tagging is more reliable and robust across domains than parsing, retraining on new domains will not suffer the effects of a mismatched parsing model (Lippincott et al., 2010). |
CCG and Supertagging | However, the technique is inherently approximate: it will return a lower probability parse under the parsing model if a higher probability parse can only be constructed from a supertag sequence returned by a subsequent iteration. |
Integrated Supertagging and Parsing | An obvious way to exploit this without being bound by its decisions is to incorporate these features directly into the parsing model.
Integrated Supertagging and Parsing | We apply both techniques to our integrated supertagging and parsing model.
Integrated Supertagging and Parsing | Our parsing model is also a distribution over variables Ti, along with an additional quadratic number of span(i, j) variables.
Oracle Parsing | Digging deeper, we compared parser model score against Viterbi F-score and oracle F-score at a va-
Introduction | 3 A Maximum Entropy Based Shift-Reduce Parsing Model |
Introduction | 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model.
Introduction | Table 3: Contribution of maximum entropy shift-reduce parsing model.
Background | As our baseline parsers, we use two state-of-the-art lexicalised parsing models , namely the Bikel parser (Bikel, 2004) and Charniak parser (Charniak, 2000). |
Background | While a detailed description of the respective parsing models is beyond the scope of this paper, it is worth noting that both parsers induce a context free grammar as well as a generative parsing model from a training set of parse trees, and use a development set to tune internal parameters. |
Background | (2005) experimented with first-sense and hypernym features from HowNet and CiLin (both WordNets for Chinese) in a generative parse model applied to the Chinese Penn Treebank. |
Discussion | Tighter integration of semantics into the parsing models, possibly in the form of discriminative reranking models (Collins and Koo, 2005; Charniak and Johnson, 2005; McClosky et al., 2006), is a promising way forward in this regard.
Introduction | Given our simple procedure for incorporating lexical semantics into the parsing process, our hope is that this research will open the door to further gains using more sophisticated parsing models and richer semantic options. |
Abstract | Based on an extension of the incremental joint model for POS tagging and dependency parsing (Hatori et al., 2011), we propose an efficient character-based decoding method that can combine features from state-of-the-art segmentation, POS tagging, and dependency parsing models . |
Model | Based on the joint POS tagging and dependency parsing model by Hatori et al. |
Model | • TagDep: the joint POS tagging and dependency parsing model (Hatori et al., 2011), where the lookahead features are omitted.
Related Works | Therefore, we place no restriction on the segmentation possibilities to consider, and we assess the full potential of the joint segmentation and dependency parsing model.
Related Works | The incremental framework of our model is based on the joint POS tagging and dependency parsing model for Chinese (Hatori et al., 2011), which is an extension of the shift-reduce dependency parser with dynamic programming (Huang and Sagae, 2010). |
Dependency Parsing | The parsing model can be defined as a conditional distribution p(y|x; w) over each projective parse tree y for a particular sentence x, parameterized by a vector w. The probability of a parse tree is
Introduction | By leveraging some assistant data, the dependency parsing model can directly utilize the additional information to capture the word-to-word level relationships. |
Web-Derived Selectional Preference Features | If both PMI features exist and PMI(with, hit, bat) > PMI(with, wall, bat), this indicates to our dependency parsing model that attaching the preposition with to the verb hit is a good choice.
Web-Derived Selectional Preference Features | Web-derived selectional preference features based on PMI values are trickier to incorporate into the dependency parsing model because they are continuous rather than discrete. |
Web-Derived Selectional Preference Features | A log-linear dependency parsing model is sensitive to inappropriately scaled features.
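One standard workaround for the scaling sensitivity just mentioned (a sketch, not necessarily the authors' exact scheme) is to discretize each continuous PMI value into indicator buckets, so the log-linear model only ever sees binary features. The bucket boundaries below are illustrative assumptions.

```python
import math

def pmi(p_joint, p_x, p_y):
    """Pointwise mutual information in bits: log2( p(x,y) / (p(x) p(y)) )."""
    return math.log2(p_joint / (p_x * p_y))

def pmi_bucket(value, boundaries=(-2.0, 0.0, 2.0)):
    """Map a continuous PMI value to a discrete bucket feature name.
    The boundaries are hypothetical; real systems tune them on held-out data."""
    for i, bound in enumerate(boundaries):
        if value < bound:
            return f"PMI_BUCKET_{i}"
    return f"PMI_BUCKET_{len(boundaries)}"
```

Each bucket name then becomes an ordinary binary feature, keeping all feature values on the same scale regardless of how large the raw PMI scores are.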
Experiments | prove comparability, we reimplemented this approach using our parsing model , which has richer features than were used in their paper. |
Parser Design | This section describes the Combinatory Categorial Grammar (CCG) parsing model used by ASP. |
Parser Design | The parser uses category and relation predicates from a broad coverage knowledge base both to construct logical forms and to parametrize the parsing model.
Prior Work | The parsing model in this paper is loosely based on C&C (Clark and Curran, 2007b; Clark and Curran, 2007a), a discriminative log-linear model for statistical parsing. |
Approach | After filtering to identify well-behaved sentences and high confidence projected dependencies, we learn a probabilistic parsing model using the posterior regularization framework (Graca et al., 2008). |
Experiments | We conducted experiments on two languages: Bulgarian and Spanish, using each of the parsing models.
Parsing Models | We explored two parsing models: a generative model used by several authors for unsupervised induction and a discriminative model used for fully supervised training.
Parsing Models | The parsing model defines a conditional distribution pθ(z | x) over each projective parse tree z for a particular sentence x, parameterized by a vector θ.
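Written out, such a discriminative distribution typically takes the log-linear form pθ(z | x) = exp(θ · f(x, z)) / Z(x), where Z(x) normalizes over the candidate parse trees of x. The sketch below uses assumed toy features and candidate trees; it is an illustration of the form, not the paper's model.

```python
import math

# Toy sketch of a conditional log-linear parsing model: feature vectors per
# candidate tree are assumed given, and the probability of a tree is its
# exponentiated score normalized by the partition function Z(x).

def dot_score(theta, feats):
    return sum(theta.get(name, 0.0) * value for name, value in feats.items())

def log_linear_prob(theta, feats_by_tree, tree):
    z = sum(math.exp(dot_score(theta, f)) for f in feats_by_tree.values())
    return math.exp(dot_score(theta, feats_by_tree[tree])) / z

# Hypothetical feature vectors for two candidate trees of one sentence.
theta = {"head_left": 1.0}
feats_by_tree = {
    "tree_a": {"head_left": 1.0},
    "tree_b": {"head_left": 0.0},
}
p_a = log_linear_prob(theta, feats_by_tree, "tree_a")
p_b = log_linear_prob(theta, feats_by_tree, "tree_b")
```

In a real parser the sum over trees is exponential, so Z(x) is computed with dynamic programming (e.g. the inside algorithm over projective trees) rather than by enumeration.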
Capturing Syntagmatic Relations via Constituency Parsing | We can see that the Bagging model taking both sequential tagging and chart parsing models as basic systems outperforms the baseline systems and the Bagging model taking either model in isolation as basic systems.
Capturing Syntagmatic Relations via Constituency Parsing | An interesting phenomenon is that the Bagging method can also improve the parsing model, but there is a decrease while only combining taggers.
Introduction | and a (syntax-based) chart parsing model.
Base Models | When segment length is not restricted, the inference procedure is the same as that used in parsing (Finkel and Manning, 2009c). In this work we do not enforce a length restriction, and directly utilize the fact that the model can be transformed into a parsing model.
Base Models | Our parsing model is the discriminatively trained, conditional random field-based context-free grammar parser (CRF-CFG) of (Finkel et al., 2008). |
Base Models | In the parsing model , the grammar consists of only the rules observed in the training data. |
Evaluation | We used the hybrid parsing model described in Clark and Curran (2007), and the Viterbi decoder to find the highest-scoring derivation. |
Evaluation | For the biomedical parser evaluation we have used the parsing model and grammatical relation conversion script from Rimell and Clark (2009). |
Results | In the first two experiments, we explore performance on the newswire domain, which is the source of training data for the parsing model and the baseline supertagging model. |
Experiments of Parsing | Finally we evaluated two parsing models, the generative parser and the reranking parser, on the test set, with results shown in Table 5.
Experiments of Parsing | A possible reason is that most non-perfect parses can provide useful syntactic structure information for building parsing models.
Our Two-Step Solution | After grammar formalism conversion, the problem we now face is reduced to how to build parsing models on multiple homogeneous treebanks.
Experimental Setup | For our parser, we train both a first-order parsing model (as described in Section 3 and 4) as well as a third-order model. |
Problem Formulation | We expect a dependency parsing model to benefit from several aspects of the low-rank tensor scoring. |
Problem Formulation | Combined Scoring Our parsing model aims to combine the strengths of both traditional features from the MST/Turbo parser as well as the new low-rank tensor features. |
Experiments and Analysis | (2013) adopt the higher-order parsing model of Carreras (2007), and Suzuki et al. |
Introduction | The reason may be that dependency parsing models are prone to amplify previous mistakes during training on self-parsed unlabeled data. |
Supervised Dependency Parsing | We adopt the second-order graph-based dependency parsing model of McDonald and Pereira (2006) as our core parser, which incorporates features from the two kinds of subtrees in Fig. |
Ensuring Meaning Composition | Both subtasks require a training set of NLs paired with their MRs. Each NL sentence also requires a syntactic parse generated using Bikel’s (2004) implementation of Collins parsing model 2. |
Experimental Evaluation | First Bikel’s implementation of Collins parsing model 2 was trained to generate syntactic parses. |
Introduction | Ge and Mooney (2005) use training examples with semantically annotated parse trees, and Zettlemoyer and Collins (2005) learn a probabilistic semantic parsing model
Conclusion | Directions for future research include a more detailed analysis of the effect of feature-based integration, as well as the exploration of other strategies for integrating different parsing models.
Introduction | Many of these parsers are based on data-driven parsing models, which learn to produce dependency graphs for sentences solely from an annotated corpus and can be easily ported to any
Related Work | The combined parsing model is essentially an instance of the graph-based model, where arc scores are derived from the output of the different component parsers. |