Abstract | We present a novel approach, called selectional branching, which uses confidence estimates to decide when to employ a beam, providing the accuracy of beam search at speeds close to a greedy transition-based dependency parsing approach.
Abstract | We also present a new transition-based dependency parsing algorithm that gives a complexity of O(n) for projective parsing and expected linear time for non-projective parsing.
Experiments | For English, we mostly adapt features from Zhang and Nivre (2011), who have shown state-of-the-art parsing accuracy for transition-based dependency parsing.
Experiments | Bohnet and Nivre (2012)’s transition-based system jointly performs POS tagging and dependency parsing, which shows higher accuracy than ours.
Introduction | Transition-based dependency parsing has gained considerable interest because it runs fast and performs accurately. |
Introduction | Greedy transition-based dependency parsing has been widely deployed because of its speed (Cer et al., 2010); however, state-of-the-art accuracies have been achieved by globally optimized parsers using beam search (Zhang and Clark, 2008; Huang and Sagae, 2010; Zhang and Nivre, 2011; Bohnet and Nivre, 2012).
Introduction | Coupled with dynamic programming, transition-based dependency parsing with beam search can be done very efficiently and gives significant improvement to parsing accuracy. |
Related work | There are other transition-based dependency parsing algorithms that take a similar approach; Nivre (2009) integrated a SWAP transition into Nivre’s arc-standard algorithm (Nivre, 2004) and Fernandez-Gonzalez and Gomez-Rodriguez (2012) integrated a buffer transition into Nivre’s arc-eager algorithm to handle non-projectivity. |
Related work | Our selectional branching method is most relevant to Zhang and Clark (2008) who introduced a transition-based dependency parsing model that uses beam search. |
Transition-based dependency parsing | We introduce a transition-based dependency parsing algorithm that is a hybrid between Nivre’s arc-eager and list-based algorithms (Nivre, 2003; Nivre, 2008). |
Transition-based dependency parsing | The parsing complexity of a transition-based dependency parsing algorithm is determined by the number of transitions performed with respect to the number of tokens in a sentence, say n (Kubler et al., 2009).
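To make this concrete, here is a minimal sketch (our illustration with a static oracle, not any of the cited papers' actual systems) of why the transition count yields linearity: an arc-standard-style parser shifts each of the n tokens exactly once and removes each with exactly one arc transition, so it performs exactly 2n transitions.

```python
def count_transitions(heads):
    """Oracle arc-standard parse of a projective tree; heads[i] is the
    1-based head of token i+1 (0 = artificial root). Returns the number
    of transitions performed, which is always 2n for n tokens."""
    n = len(heads)
    head = {i + 1: h for i, h in enumerate(heads)}
    unattached = {i: 0 for i in range(n + 1)}   # dependents not yet attached
    for h in head.values():
        unattached[h] += 1
    stack, buffer, transitions = [0], list(range(1, n + 1)), 0
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and stack[-2] != 0 and head[stack[-2]] == stack[-1]:
            unattached[stack[-1]] -= 1          # LEFT-ARC: arc s1 -> s2, pop s2
            stack.pop(-2)
        elif (len(stack) >= 2 and head[stack[-1]] == stack[-2]
              and unattached[stack[-1]] == 0):
            unattached[stack[-2]] -= 1          # RIGHT-ARC: arc s2 -> s1, pop s1
            stack.pop()
        else:
            stack.append(buffer.pop(0))         # SHIFT
        transitions += 1
    return transitions
```

For the three-token sentence with heads [2, 0, 2] (tokens 1 and 3 attach to token 2, the root), exactly 6 = 2·3 transitions are performed.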
Discussion | The performance of our methods depends not only on the quality of the induced tag sets but also on the performance of the dependency parser learned in Step 3 of Section 4.1. |
Discussion | Thus we split the 10,000 instances into the first 9,000 for training and the remaining 1,000 for testing, and then learned a dependency parser in the same way as in Step 3.
Experiment | In the training process, the following steps are performed sequentially: preprocessing, inducing a POS tagset for a source language, training a POS tagger and a dependency parser, and training a forest-to-string MT model.
Experiment | The Japanese sentences are parsed using CaboCha (Kudo and Matsumoto, 2002), which generates dependency structures using a phrasal unit called a bunsetsu, rather than a word unit as in English or Chinese dependency parsing.
Experiment | Training a POS Tagger and a Dependency Parser |
Introduction | In recent years, syntax-based SMT has made promising progress by employing either dependency parsing (Lin, 2004; Ding and Palmer, 2005; Quirk et al., 2005; Shen et al., 2008; Mi and Liu, 2010) or constituency parsing (Huang et al., 2006; Liu et al., 2006; Galley et al., 2006; Mi and Huang, 2008; Zhang et al., 2008; Cohn and Blunsom, 2009; Liu et al., 2009; Mi and Liu, 2010; Zhang et al., 2011) on the source side, the target side, or both. |
Introduction | However, dependency parsing, which is a popular choice for Japanese, can incorporate only shallow syntactic information, i.e., POS tags, compared with the richer syntactic phrasal categories in constituency parsing.
Abstract | In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. |
Easy-first dependency parsing | The easy-first dependency parsing algorithm (Goldberg and Elhadad, 2010) builds a dependency tree by performing two types of actions, LEFT(i) and RIGHT(i), on a list of subtree structures p1, …, pk.
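A minimal sketch of this control loop (our illustration with a pluggable scorer; Goldberg and Elhadad's actual system uses trained feature weights): each step greedily applies the highest-scoring attachment between adjacent items in the pending list, removing the new dependent.

```python
def easy_first_parse(words, score):
    """Greedy easy-first loop. `score(action, left, right)` rates attaching
    the adjacent pending items `left` and `right` (token indices)."""
    pending = list(range(len(words)))   # indices of current subtree roots
    arcs = []                           # (head, dependent) pairs
    while len(pending) > 1:
        act, i = max(
            ((a, i) for i in range(len(pending) - 1) for a in ("LEFT", "RIGHT")),
            key=lambda ai: score(ai[0], pending[ai[1]], pending[ai[1] + 1]),
        )
        if act == "LEFT":               # pending[i] becomes a dependent of pending[i+1]
            arcs.append((pending[i + 1], pending[i]))
            del pending[i]
        else:                           # pending[i+1] becomes a dependent of pending[i]
            arcs.append((pending[i], pending[i + 1]))
            del pending[i + 1]
    return arcs                         # pending[0] is left as the root
```

With a toy scorer that always prefers LEFT, a three-word sentence collapses into a head-final chain: the arcs (1, 0) and (2, 1) are built, leaving word 2 as the root.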
Introduction | The easy-first dependency parsing algorithm (Goldberg and Elhadad, 2010) is attractive due to its good accuracy, fast speed and simplicity. |
Introduction | However, to the best of our knowledge, no work in the literature has ever applied the two techniques to easy-first dependency parsing.
Introduction | While applying beam-search is relatively straightforward, the main difficulty comes from combining easy-first dependency parsing with perceptron-based global learning. |
Training | As shown in Goldberg and Nivre (2012), most transition-based dependency parsers (Nivre et al., 2003; Huang and Sagae, 2010; Zhang and Clark, 2008) ignore spurious ambiguity by using a static oracle which maps a dependency tree to a single action sequence.
Training | Table 1: Feature templates for English dependency parsing.
Abstract | Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. |
Improved hypotheses comparison | Unlike dependency parsing, constituent parse trees for the same sentence can have different numbers of nodes, mainly due to the existence of unary nodes.
Introduction | Various methods have been proposed to address the disadvantages of greedy local parsing, among which a framework of beam search and global discriminative training has been shown effective for dependency parsing (Zhang and Clark, 2008; Huang and Sagae, 2010).
Introduction | With the use of rich nonlocal features, transition-based dependency parsers achieve state-of-the-art accuracies that are comparable to the best graph-based parsers (Zhang and Nivre, 2011; Bohnet and Nivre, 2012).
Introduction | In addition, processing tens of sentences per second (Zhang and Nivre, 2011), these transition-based parsers can be a favorable choice for dependency parsing.
Semi-supervised Parsing with Large Data | Word clusters are regarded as lexical intermediaries for dependency parsing (Koo et al., 2008) and POS tagging (Sun and Uszkoreit, 2012). |
Semi-supervised Parsing with Large Data | The idea of exploiting lexical dependency information from auto-parsed data has been explored before for dependency parsing (Chen et al., 2009) and constituent parsing (Zhu et al., 2012). |
Semi-supervised Parsing with Large Data | (2008) and is used as additional information for graph-based dependency parsing in Chen et al. |
Abstract | We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. |
Concluding Remarks | In the context of transition-based dependency parsers, right spines have also been exploited by Kitagawa and Tanaka-Ishii (2010) to decide where to attach the next word from the buffer.
Dependency Parser | Transition-based dependency parsers use a stack data structure, where each stack element is associated with a tree spanning some (contiguous) substring of the input.
Dependency Parser | We assume the reader is familiar with the formal framework of transition-based dependency parsing originally introduced by Nivre (2003); see Nivre (2008) for an introduction. |
Introduction | This development is probably due to many factors, such as the increased availability of dependency treebanks and the perceived usefulness of dependency structures as an interface to downstream applications, but a very important reason is also the high efficiency offered by dependency parsers, enabling web-scale parsing with high throughput.
Introduction | However, while these parsers are capable of processing tens of thousands of tokens per second with the right choice of classifiers, they are also known to perform slightly below the state-of-the-art because of search errors and subsequent error propagation (McDonald and Nivre, 2007), and recent research on transition-based dependency parsing has therefore explored different ways of improving their accuracy. |
Model and Training | Standard transition-based dependency parsers are trained by associating each gold tree with a canonical complete computation. |
Model and Training | In the context of dependency parsing, the strategy of delaying arc construction when the current configuration is not informative is called the easy-first strategy, and was first explored by Goldberg and Elhadad (2010).
Static vs. Dynamic Parsing | In the context of dependency parsing, a parsing strategy is called purely bottom-up if every dependency h → d is constructed only after all dependencies of the form d → i have been constructed.
Static vs. Dynamic Parsing | If we consider transition-based dependency parsing (Nivre, 2008), the purely bottom-up strategy is implemented by the arc-standard model of Nivre (2004). |
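The definition above can be stated operationally; as a small sketch (ours, not from the paper), a construction order is purely bottom-up exactly when each dependent has all of its own dependents attached before it is itself attached:

```python
def is_purely_bottom_up(arcs_in_order):
    """arcs_in_order: (head, dependent) pairs in construction order.
    Purely bottom-up iff every arc h -> d is built only after all
    arcs d -> i, i.e. after d's own subtree is complete."""
    unbuilt = {}
    for h, _ in arcs_in_order:
        unbuilt[h] = unbuilt.get(h, 0) + 1      # arcs still to build, per head
    for h, d in arcs_in_order:
        if unbuilt.get(d, 0):                   # d still has pending dependents
            return False
        unbuilt[h] -= 1
    return True

# Arc-standard builds the chain 1 <- 2 <- 3 bottom-up;
# the reverse (top-down) order violates the property.
assert is_purely_bottom_up([(2, 1), (3, 2)])
assert not is_purely_bottom_up([(3, 2), (2, 1)])
```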
Catenae as semantic units | Figure 1 shows a dependency parse that generates 21 catenae in total (using i for xi): 1, 2, 3, 4, 5, 6, 12, 23, 34, 45, 56, 123, 234, 345, 456, 1234, 2345, 3456, 12345, 23456, 123456.
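The catena count can be checked mechanically. The sketch below (our illustration, not the paper's code) enumerates all word subsets that induce a connected subgraph of the dependency tree; a chain-shaped six-word parse, where each word is governed by the next, yields exactly the 21 catenae listed above (all contiguous spans).

```python
from itertools import combinations

def catenae(heads):
    """heads[i] = head of word i+1 (1-based words, 0 = root). Return all
    word subsets forming a connected subgraph of the dependency tree."""
    n = len(heads)
    adj = {i: set() for i in range(1, n + 1)}
    for d, h in enumerate(heads, start=1):
        if h != 0:
            adj[d].add(h)
            adj[h].add(d)
    result = []
    for k in range(1, n + 1):
        for subset in combinations(range(1, n + 1), k):
            nodes = set(subset)
            seen, frontier = {subset[0]}, [subset[0]]
            while frontier:                     # search within the subset only
                v = frontier.pop()
                for u in adj[v] & nodes - seen:
                    seen.add(u)
                    frontier.append(u)
            if seen == nodes:                   # subset is connected: a catena
                result.append(subset)
    return result

chain = [2, 3, 4, 5, 6, 0]                      # x1 <- x2 <- ... <- x6
assert len(catenae(chain)) == 21
```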
Catenae as semantic units | This highlights the fact that a single dependency parse may only partially represent the ambiguous semantics of a query. |
Conclusion | We presented a flexible implementation of dependency paths for long queries in ad hoc IR that does not require dependency parsing the collection.
Introduction | These approaches are motivated by the idea that sentence meaning can be flexibly captured by the syntactic and semantic relations between words, and encoded in dependency parse tree fragments. |
Selection method for catenae | We use a pseudo-projective joint dependency parsing and semantic role labelling system (Johansson and Nugues, 2008) to generate the dependency parse.
Selection method for catenae | However, any dependency parser may be applied instead. |
Abstract | Even though the quality of unsupervised dependency parsers grows, they often fail to recognize very basic dependencies.
Conclusions and Future Work | We proved that such prior knowledge about stop-probabilities incorporated into the standard DMV model significantly improves unsupervised dependency parsing and, since we are not aware of any other fully unsupervised dependency parser with a higher average attachment score over CoNLL data, we state that we have reached a new state-of-the-art result.
Conclusions and Future Work | We suppose that many of the current works on unsupervised dependency parsers use gold POS tags only as a simplification of this task, and that the ultimate purpose of this effort is to develop a fully unsupervised induction of linguistic structure from raw texts that would be useful across many languages, domains, and applications. |
Introduction | The task of unsupervised dependency parsing (which strongly relates to the grammar induction task) has become popular in the last decade, and its quality has improved greatly over this period.
Related Work | We have directly utilized the aforementioned criteria for dependency relations in unsupervised dependency parsing in our previous paper (Marecek and Zabokrtsky, 2012). |
Related Work | Dependency Model with Valence (DMV) has been the most popular approach to unsupervised dependency parsing in recent years.
Related Work | Other approaches to unsupervised dependency parsing were described e.g. |
Conclusions | Additionally, this work introduces a parser-centric view of normalization, in which the performance of the normalizer is directly tied to the performance of a downstream dependency parser.
Conclusions | Using this metric, this work established that, when dependency parsing is the goal, typical word-to-word normalization approaches are insufficient. |
Discussion | The results in Section 5.2 establish a point that has often been assumed but, to the best of our knowledge, has never been explicitly shown: performing normalization is indeed beneficial to dependency parsing on informal text. |
Evaluation | The goal is to evaluate the framework in two aspects: (1) usefulness for downstream applications (specifically dependency parsing), and (2) domain adaptability.
Evaluation | We then run an off-the-shelf dependency parser on the gold standard normalized data to produce our gold standard parses. |
Evaluation | These results validate the hypothesis that simple word-to-word normalization is insufficient if the goal of normalization is to improve dependency parsing; even if a system could produce perfect word-to-word normalization, it would produce lower quality parses than those produced by our approach.
Introduction | To address this problem, this work introduces an evaluation metric that ties normalization performance directly to the performance of a downstream dependency parser . |
Background and Motivation | (2011) successfully apply this idea to the transfer of dependency parsers , using part-of-speech tags as the shared representation of words. |
Evaluation | In the low-resource setting, we cannot always rely on the availability of an accurate dependency parser for the target language. |
Model Transfer | If a target language is poor in resources, one can obtain a dependency parser for the target language by means of cross-lingual model transfer (Zeman and Resnik, 2008). |
Related Work | Cross-lingual annotation projection (Yarowsky et al., 2001) approaches have been applied extensively to a variety of tasks, including POS tagging (Xi and Hwa, 2005; Das and Petrov, 2011), morphology segmentation (Snyder and Barzilay, 2008), verb classification (Merlo et al., 2002), mention detection (Zitouni and Florian, 2008), LFG parsing (Wroblewska and Frank, 2009), information extraction (Kim et al., 2010), SRL (Pado and Lapata, 2009; van der Plas et al., 2011; Annesi and Basili, 2010; Tonelli and Pianta, 2008), dependency parsing (Naseem et al., 2012; Ganchev et al., 2009; Smith and Eisner, 2009; Hwa et al., 2005) or temporal relation pre-
Results | Secondly, in the model transfer setup it is more important how closely the syntactic-semantic interface on the target side resembles that on the source side than how well it matches the “true” structure of the target language, and in this respect a transferred dependency parser may have an advantage over one trained on target-language data. |
Setup | With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2). |
Abstract | As an illustrative case, we study a generative model for dependency parsing . |
Experiments | All our experiments use the DMV for unsupervised dependency parsing of part-of-speech (POS) tag sequences. |
Introduction | We focus on the well-studied but unsolved task of unsupervised dependency parsing.
Related Work | Gimpel and Smith (2012) proposed a concave model for unsupervised dependency parsing using IBM Model 1. |
Related Work | Several integer linear programming (ILP) formulations of dependency parsing (Riedel and Clarke, 2006; Martins et al., 2009; Riedel et al., 2012) inspired our definition of grammar induction as an MP.
Related Work | For semi-supervised dependency parsing, Wang et al.
Conclusion | A comparison of the parsing accuracy with previous works on Japanese dependency parsing and English CCG parsing indicates that our parser can analyze real-world Japanese texts fairly well and that there is room for improvement in disambiguation models. |
Evaluation | The integrated corpus is divided into training, development, and final test sets following the standard data split in previous works on Japanese dependency parsing (Kudo and Matsumoto, 2002). |
Evaluation | Following conventions in research on Japanese dependency parsing , gold morphological analysis results were input to a parser. |
Evaluation | Comparing the parser’s performance with previous works on Japanese dependency parsing is difficult as our figures are not directly comparable to theirs. |
Introduction | Syntactic parsing for Japanese has been dominated by a dependency-based pipeline in which chunk-based dependency parsing is applied and then semantic role labeling is performed on the dependencies (Sasano and Kurohashi, 2011; Kawahara and Kurohashi, 2011; Kudo and Matsumoto, 2002; Iida and Poesio, 2011; Hayashibe et al., 2011). |
Experiments | (2011), which additionally uses the Chinese Gigaword Corpus; Li ’11 denotes a generative model that can perform word segmentation, POS tagging and phrase-structure parsing jointly (Li, 2011); Li+ ’12 denotes a unified dependency parsing model that can perform joint word segmentation, POS tagging and dependency parsing (Li and Zhou, 2012); Li ’11 and Li+ ’12 exploited annotated morphological-level word structures for Chinese; Hatori+ ’12 denotes an incremental joint model for word segmentation, POS tagging and dependency parsing (Hatori et al., 2012); they use external dictionary resources including HowNet Word List and page names from the Chinese Wikipedia; Qian+ ’12 denotes a joint segmentation, POS tagging and parsing system using a unified framework for decoding, incorporating a word segmentation model, a POS tagging model and a phrase-structure parsing model together (Qian and Liu, 2012); their word segmentation model is a combination of character-based model and word-based model. |
Related Work | Zhao (2009) studied character-level dependencies for Chinese word segmentation by formalizing the segmentation task in a dependency parsing framework.
Related Work | Li and Zhou (2012) also exploited the morphological-level word structures for Chinese dependency parsing.
Related Work | (2012) proposed the first joint model for word segmentation, POS tagging and dependency parsing.
Word Structures and Syntax Trees | They studied the influence of such morphology on Chinese dependency parsing (Li and Zhou, 2012).
Introduction | (c) Knowledge for dependency parsing
Introduction | Figure 1: Natural annotations for word segmentation and dependency parsing.
Introduction | a Chinese phrase (meaning NLP), and it probably corresponds to a connected subgraph for dependency parsing . |
Knowledge in Natural Annotations | For dependency parsing , the subsequence P tends to form a connected dependency graph if it contains more than one word. |
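To make the claim concrete, here is a small sketch (our illustration, not the paper's implementation) of the connectivity test: a token span forms a connected dependency subgraph exactly when all but one of its tokens have their head inside the span.

```python
def is_connected_span(heads, i, j):
    """heads: 1-based head indices per token (0 = root); span is tokens
    i..j inclusive. In a tree, the induced subgraph is connected iff
    exactly one token in the span has its head outside the span."""
    span = set(range(i, j + 1))
    external = [t for t in span if heads[t - 1] not in span]
    return len(external) == 1           # the unique "root" of the span
```

For heads [2, 0, 2] (tokens 1 and 3 attach to root token 2) the full span 1..3 is connected, whereas for heads [3, 3, 0] the span 1..2 is not, since both tokens attach to token 3 outside it.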
Related Work | While enriching the related work during writing, we found work on dependency parsing (Spitkovsky et al., 2010) which utilized parsing constraints derived from hypertext annotations to improve unsupervised dependency grammar induction.
Coordination Structures in Treebanks | Some of the treebanks were downloaded individually from the web, but most of them came from previously published collections for dependency parsing campaigns: six languages from CoNLL-2006 (Buchholz and Marsi, 2006), seven languages from CoNLL-2007 (Nivre et al., 2007), two languages from CoNLL-2009 (Hajic et al., 2009), three languages from ICON-2010 (Husain et al., 2010).
Introduction | In the last decade, dependency parsing has gradually been receiving visible attention. |
Related work | MTT possesses a complex set of linguistic criteria for identifying the governor of a relation (see Mazziotta (2011) for an overview), which lead to MS. MS is preferred in the rule-based dependency parsing system of Lombardo and Lesmo (1998).
Related work | The primitive format used for CoNLL shared tasks is widely used in dependency parsing , but its weaknesses have already been pointed out (cf. |
Variations in representing coordination structures | Most state-of-the-art dependency parsers can produce labeled edges. |
Experiments | In this section, we evaluate the performance of the MST dependency parser (McDonald et al., 2005b) which is trained by our bilingually-guided model on 5 languages. |
Introduction | In past decades supervised methods achieved the state-of-the-art in constituency parsing (Collins, 2003; Charniak and Johnson, 2005; Petrov et al., 2006) and dependency parsing (McDonald et al., 2005a; McDonald et al., 2006; Nivre et al., 2006; Nivre et al., 2007; Koo and Collins, 2010).
Introduction | We evaluate the final automatically-induced dependency parsing model on 5 languages. |
Introduction | In the rest of the paper, we first describe the unsupervised dependency grammar induction framework in section 2 (where the unsupervised optimization objective is given) and introduce the bilingual projection method for dependency parsing in section 3 (where the projected optimization objective is given); then, in section 4, we present the bilingually-guided induction strategy for dependency grammar (where the two objectives above are jointly optimized, as shown in Figure 1).
Introduction | They suffice to operate on well-formed structures and produce projective dependency parse trees. |
Introduction | This is often referred to as conflict in the shift-reduce dependency parsing literature (Huang et al., 2009). |
Introduction | Unfortunately, such an oracle turns out to be non-unique even for monolingual shift-reduce dependency parsing (Huang et al., 2009).
Experiments | from dependency parse tree) along with computing similarity in semantic spaces (using WordNet) clearly produces an improvement in the summarization quality (+1.4 improvement in ROUGE-l F-score). |
Using the Framework | Instead, we model sentences using a structured representation, i.e., its syntax structure using dependency parse trees. |
Using the Framework | We first use a dependency parser (de Marneffe et al., 2006) to parse each sentence and extract the set of dependency relations associated with the sentence.
Using the Framework | This allows us to perform approximate matching of syntactic treelets obtained from the dependency parses using semantic (WordNet) similarity. |
Headline generation | PREPROCESSDATA: We start by preprocessing all the news in the news collections with a standard NLP pipeline: tokenization and sentence boundary detection (Gillick, 2009), part-of-speech tagging, dependency parsing (Nivre, 2006), co-reference resolution (Haghighi and Klein, 2009) and entity linking based on Wikipedia and Freebase. |
Headline generation | Figure 1: Pattern extraction process from an annotated dependency parse.
Headline generation | GETMENTIONNODES: Using the dependency parse T for a sentence s, we first identify the set of nodes Mi that mention the entities in E. If T does not contain exactly one mention of each target entity in E, then the sentence is ignored.
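A minimal sketch of this filtering step (the function name and the exact string-match rule are our assumptions; the paper works on parse nodes annotated by entity linking): keep the sentence only if every target entity is mentioned exactly once, returning the matching node indices.

```python
def get_mention_nodes(tokens, entities):
    """tokens: list of token strings for one sentence; entities: set of
    target entity strings. Return {entity: node index}, or None if the
    sentence must be ignored (an entity with zero or multiple mentions)."""
    mentions = {}
    for e in entities:
        idxs = [i for i, t in enumerate(tokens) if t == e]
        if len(idxs) != 1:
            return None                 # not exactly one mention: discard
        mentions[e] = idxs[0]
    return mentions
```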
Abstract | We model the problem as a joint dependency parsing and semantic role labeling task. |
Introduction | We model our problem as a joint dependency parsing and role labeling task, assuming a Bayesian generative process. |
Problem Formulation | We formalize the learning problem as a dependency parsing and role labeling problem. |
Introduction | Research in dependency parsing — computational methods to predict such representations — has increased dramatically, due in large part to the availability of dependency treebanks in a number of languages. |
Introduction | In particular, the CoNLL shared tasks on dependency parsing have provided over twenty data sets in a standardized format (Buchholz and Marsi, 2006; Nivre et al., 2007).
Towards A Universal Treebank | We use the so-called basic dependencies (with punctuation included), where every dependency structure is a tree spanning all the input tokens, because this is the kind of representation that most available dependency parsers require. |
Architecture of BRAINSUP | The sentence generation process is based on morpho-syntactic patterns which we automatically discover from a corpus of dependency-parsed sentences P.
Conclusion | BRAINSUP makes heavy use of dependency parsed data and statistics collected from dependency treebanks to ensure the grammaticality of the generated sentences, and to trim the search space while seeking the sentences that maximize the user satisfaction. |
Evaluation | Dependency operators were learned by dependency parsing the British National Corpus.