Abstract | Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there exist multiple corpora with different and incompatible annotation guidelines or standards. |
Abstract | Experiments show that adaptation from the much larger People’s Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracies (with error reductions of 30.2% and 14%, respectively), which in turn helps improve Chinese parsing accuracy. |
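The relation between accuracy gains and the relative error reductions quoted above can be made concrete with a small sketch; the accuracies used below are illustrative placeholders, not the paper's actual numbers.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative error reduction between two accuracies (given as fractions)."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err

# A 30.2% error reduction means the new model removes 30.2% of the
# baseline's remaining errors, e.g. going from 90.00% to 93.02% accuracy.
assert abs(error_reduction(0.90, 0.9302) - 0.302) < 1e-9
```

Relative error reduction is the standard way to report such gains because it stays meaningful even when baseline accuracy is already high.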
Conclusion and Future Works | In particular, we will work on annotation standard adaptation between different treebanks, for example from the HPSG LinGo Redwoods Treebank to the PTB, or even from a dependency treebank to the PTB, in order to obtain more powerful PTB annotation-style parsers. |
Experiments | Our adaptation experiments are conducted from People’s Daily (PD) to Penn Chinese Treebank 5.0 (CTB). |
Introduction | Much of statistical NLP research relies on manually annotated corpora to train models, but these resources are extremely expensive to build, especially at a large scale, as in treebanking (Marcus et al., 1993). |
Introduction | For example, just for English treebanking there have been the Chomskian-style Penn Treebank (Marcus et al., 1993), the HPSG LinGo Redwoods Treebank (Oepen et al., 2002), and a smaller dependency treebank (Buchholz and Marsi, 2006).
Related Works | In addition, many efforts have been devoted to manual treebank adaptation, adapting the PTB to other grammar formalisms, such as CCG and LFG (Hockenmaier and Steedman, 2008; Cahill and Mccarthy, 2007). |
Abstract | This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language. |
Abstract | A simple statistical machine translation method, word-by-word decoding, which requires only a bilingual lexicon rather than a parallel corpus, is adopted for the treebank translation. |
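Word-by-word decoding with a bilingual lexicon can be sketched as a per-token lookup that leaves the source-side tree structure untouched; the lexicon entries below are toy stand-ins, not the paper's actual resource.

```python
# Toy bilingual lexicon (hypothetical entries): source word -> target word.
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}

def translate_word_by_word(tokens, lexicon):
    # Each token is translated independently; unknown words pass through
    # unchanged, and the dependency arcs of the source tree are kept as-is.
    return [lexicon.get(tok, tok) for tok in tokens]

assert translate_word_by_word(["the", "cat", "sleeps"], LEXICON) == ["le", "chat", "dort"]
```

Because each word is translated in isolation, no parallel corpus is needed, which is the point the abstract makes.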
Abstract | The proposed method is evaluated on English and Chinese treebanks. |
Introduction | However, this is not the case when we consider all treebanks across different languages as a whole. |
Introduction | For example, of the ten treebanks for the CoNLL-2007 shared task, none includes more than 500K |
Introduction | 1It is a tradition in the parsing community to call an annotated syntactic corpus a treebank. |
Dependency Parsing with HPSG | Note that all grammar rules in ERG are either unary or binary, giving us relatively deep trees when compared with annotations such as Penn Treebank. |
Dependency Parsing with HPSG | For these rules, we refer to the conversion of the Penn Treebank into dependency structures used in the CoNLL 2008 Shared Task, and mark the heads of these rules in a way that will arrive at a compatible dependency backbone. |
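Head marking of this kind can be sketched as head percolation over binary rules; the head table below is a hypothetical toy, not the actual CoNLL 2008 conversion rules.

```python
# Hypothetical head table: maps (parent, left-child, right-child) labels
# to the index (0 = left, 1 = right) of the head-bearing child.
HEAD_TABLE = {("S", "NP", "VP"): 1, ("NP", "DT", "NN"): 1, ("VP", "V", "NN"): 0}

def head_of(tree, deps, table=HEAD_TABLE):
    """tree is (label, word) for a preterminal or (label, left, right)
    for a binary node; collects (dependent, head) word pairs in deps
    and returns the head word of the subtree."""
    if len(tree) == 2:
        return tree[1]                      # preterminal: the word itself
    label, left, right = tree
    lh = head_of(left, deps, table)
    rh = head_of(right, deps, table)
    if table[(label, left[0], right[0])] == 0:
        head, dep = lh, rh
    else:
        head, dep = rh, lh
    deps.append((dep, head))
    return head

deps = []
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "cat")),
        ("VP", ("V", "chased"), ("NN", "mice")))
root = head_of(tree, deps)
assert root == "chased"
```

Running this over a binarized tree yields (dependent, head) word pairs, i.e. an unlabeled dependency backbone compatible with the chosen head conventions.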
Dependency Parsing with HPSG | 2A more recent study shows that, with carefully designed retokenization and preprocessing rules, over 80% sentential coverage can be achieved on the WSJ sections of the Penn Treebank data using the same version of ERG. |
Experiment Results & Error Analyses | To evaluate the performance of our different dependency parsing models, we tested our approaches on several dependency treebanks for English in a similar spirit to the CoNLL 2006-2008 Shared Tasks. |
Experiment Results & Error Analyses | Most of them are converted automatically from existing treebanks in various forms. |
Experiment Results & Error Analyses | The larger part is converted from the Penn Treebank Wall Street Journal Sections #2-#21, and is used for training statistical dependency parsing models; the smaller part, which covers sentences from Section #23, is used for testing. |
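The conventional section split can be sketched as a filter over WSJ file ids; the file-naming pattern assumed below (wsj_SSNN.mrg, where SS is the section) is the usual Penn Treebank layout.

```python
def wsj_split(file_id):
    """Assign a Penn Treebank WSJ file such as 'wsj_2301.mrg' to a split
    under the conventional sections 02-21 train / section 23 test partition."""
    section = int(file_id[4:6])   # two digits after the 'wsj_' prefix
    if 2 <= section <= 21:
        return "train"
    if section == 23:
        return "test"
    return "other"                # e.g. section 22 or 24, often held out as dev

assert wsj_split("wsj_0201.mrg") == "train"
assert wsj_split("wsj_2301.mrg") == "test"
```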
Introduction | In the meantime, the successful continuation of the CoNLL Shared Tasks since 2006 (Buchholz and Marsi, 2006; Nivre et al., 2007a; Surdeanu et al., 2008) has witnessed how easy it has become to train a statistical syntactic dependency parser provided that an annotated treebank is available. |
Introduction | the Wall Street Journal (WSJ) sections of the Penn Treebank (Marcus et al., 1993) as training set, tests on BROWN Sections typically result in a 6-8% drop in labeled attachment scores, although the average sentence length is much shorter in BROWN than that in WSJ. |
Abstract | We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems, converting grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. |
Abstract | Then we provide two strategies to refine conversion results, and adopt a corpus weighting technique for parsing on homogeneous treebanks. |
Abstract | Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. |
Introduction | The last few decades have seen the emergence of multiple treebanks annotated with different grammar formalisms, motivated by the diversity of languages and linguistic theories, which is crucial to the success of statistical parsing (Abeille et al., 2000; Brants et al., 1999; Bohmova et al., 2003; Han et al., 2002; Kurohashi and Nagao, 1998; Marcus et al., 1993; Moreno et al., 2003; Xue et al., 2005). |
Introduction | Availability of multiple treebanks creates a scenario where we have a treebank annotated with one grammar formalism, and another treebank annotated with another grammar formalism that we are interested in. |
Introduction | We call the first a source treebank, and the second a target treebank. |
Argument Mapping Model | By examining the arguments that the verbal category combines with in the treebank, we can identify the corresponding semantic role for each argument that is marked on the verbal category. |
Enabling Cross-System Comparison | Results (P / R / F): G&H (treebank): 67.5% / 60.0% / 63.5%; Brutus (treebank): 88.18% / 85.00% / 86.56%. |
Error Analysis | Many of the errors made by the Brutus system can be traced directly to erroneous parses, either in the automatic or treebank parse. |
Error Analysis | However, because "in 1956" erroneously modifies the verb "using" rather than the verb "stopped" in the treebank parse, the system trusts the syntactic analysis and places Arg1 of "stopped" on "using asbestos in 1956". |
Identification and Labeling Models | The same features are extracted for both treebank and automatic parses. |
Results | Results (P / R / F): P. et al (treebank): 86.22% / 87.40% / 86.81%; Brutus (treebank): 88.29% / 86.39% / 87.33%. |
Results | Headword (treebank): 88.94% / 86.98% / 87.95% (P / R / F). |
Results | Boundary (treebank): 88.29% / 86.39% / 87.33% (P / R / F). |
The Contribution of the New Features | Removing them has a strong effect on accuracy when labeling treebank parses, as shown in our feature ablation results in table 4. |
This is easily read off of the CCG PARG relationships. | For gold-standard parses, we remove functional tag and trace information from the Penn Treebank parses before we extract features over them, so as to simulate the conditions of an automatic parse. |
This is easily read off of the CCG PARG relationships. | The Penn Treebank features are as follows: |
Dependency parsing experiments | Our training data includes newswire from the English translation treebank (LDC2007T02) and the English-Arabic Treebank (LDC2006T10), which are respectively translations of sections of the Chinese treebank (CTB) and Arabic treebank (ATB). |
Dependency parsing experiments | We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn treebank. |
Dependency parsing experiments | Our other test set is the standard Section 23 of the Penn treebank. |
Machine translation experiments | To extract dependencies from treebanks, we used the LTH Penn Converter (http://nlp. |
Machine translation experiments | We constrain the converter not to use functional tags found in the treebanks, in order to make it possible to use automatically parsed texts (i.e., perform self-training) in future work. |
Machine translation experiments | Chinese words were automatically segmented with a conditional random field (CRF) classifier (Chang et al., 2008) that conforms to the Chinese Treebank (CTB) standard. |
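A character-tagging segmenter of this kind typically emits one tag per character; the sketch below recovers words from a B/M/E/S tag sequence, one common scheme for CRF segmenters, though not necessarily the exact tag set of Chang et al. (2008).

```python
def tags_to_words(chars, tags):
    """Recover a word segmentation from per-character tags:
    B = begin word, M = middle, E = end, S = single-character word.
    Assumes the tag sequence is well-formed."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            words.append(ch)
        elif tag == "B":
            current = ch
        elif tag == "M":
            current += ch
        else:  # "E"
            words.append(current + ch)
            current = ""
    return words

assert tags_to_words(list("我爱北京"), ["S", "S", "B", "E"]) == ["我", "爱", "北京"]
```

The CRF only has to learn the tagging; segmentation then falls out of this deterministic decode.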
Abstract | Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports. |
Abstract | All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank: discourse connectives and their senses. |
Conclusion | This paper has, for the first time, provided genre information about the articles in the Penn TreeBank. |
Conclusion | It has characterised each genre in terms of features manually annotated in the Penn Discourse TreeBank, and used this to show that genre should be made a factor in automated sense labelling of discourse relations that are not explicitly marked. |
Genre in the Penn TreeBank | Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7). |
Genre in the Penn TreeBank | In lieu of any informative meta-data in the PTB files1, I looked at line-level patterns in the 2159 files that make up the Penn Discourse TreeBank subset of the PTB, and then manually confirmed the text types I found.2 The resulting set includes all the files in the Penn TreeBank that aren't included in the PDTB.
Introduction | This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008). |
Introduction | After a brief introduction to the Penn Discourse TreeBank (hereafter, PDTB) in Section 4, Sections 5 and 6 show that these four genres display differences in connective frequency and in terms of the senses associated with intra-sentential connectives (e.g., subordinating conjunctions), inter-sentential connectives (e.g., inter-sentential coordinating conjunctions) and those inter-sentential relations that are not lexically marked. |
The Penn Discourse TreeBank | Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008). |
Experiments | For these experiments, we use the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993). |
Experiments | Following the CoNLL shared task from 2000, we use sections 15-18 of the Penn Treebank for our labeled training data for the supervised sequence labeler in all experiments (Tjong et al., 2000). |
Experiments | For the tagging experiments, we train and test using the gold standard POS tags contained in the Penn Treebank. |
A Latent Variable Parser | Latent variable parsing assumes that an observed treebank represents a coarse approximation of an underlying, optimally refined grammar which makes more fine-grained distinctions in the syntactic categories. |
A Latent Variable Parser | For example, the noun phrase category NP in a treebank could be viewed as a coarse approximation of two noun phrase categories corresponding to subjects and objects, NP^S and NP^VP. |
A Latent Variable Parser | It starts with a simple binarized X-bar grammar style backbone, and goes through iterations of splitting and merging nonterminals, in order to maximize the likelihood of the training treebank. |
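One split step of this procedure can be sketched on unary rules: each nonterminal is split in two, and each rule's probability mass is shared equally among the refined rules with a small random perturbation to break symmetry. This is a toy sketch of the splitting idea only, not the full split-merge EM training of the parser.

```python
import random

def split_grammar(unary_rules, seed=0):
    """One split step over unary rules A -> B (probabilities as a dict):
    every symbol becomes two subsymbols, each rule becomes four refined
    rules, and each refined rule gets half the original probability,
    perturbed by up to 1% so EM can differentiate the subsymbols."""
    rng = random.Random(seed)
    refined = {}
    for (lhs, rhs), p in unary_rules.items():
        for l in (lhs + "-0", lhs + "-1"):
            for r in (rhs + "-0", rhs + "-1"):
                refined[(l, r)] = (p / 2) * (1 + rng.uniform(-0.01, 0.01))
    return refined

refined = split_grammar({("NP", "NN"): 1.0})
assert len(refined) == 4
```

After such a split, EM re-estimates the refined rule probabilities, and a merge step undoes splits that do not improve the likelihood.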
Experiments | Incorporating edge label information does not appear to improve performance, possibly because it oversplits the initial treebank and interferes with the parser’s ability to determine optimal splits for refining the grammar. |
Introduction | Hockenmaier (2006) has translated the German TIGER corpus (Brants et al., 2002) into a CCG-based treebank to model word order variations in German. |
Introduction | The corpus-based, stochastic topological field parser of Becker and Frank (2002) is based on a standard treebank PCFG model, in which rule probabilities are estimated by frequency counts. |
Introduction | Ule (2003) proposes a process termed Directed Treebank Refinement (DTR). |
Introduction | We use the standard test set for this task, a 24,115-word subset of the Penn Treebank, for which a gold tag sequence is available. |
Introduction | They show considerable improvements in tagging accuracy when using a coarser-grained version (with 17 tags) of the tag set from the Penn Treebank. |
Introduction | In contrast, we keep all the original dictionary entries derived from the Penn Treebank data for our experiments. |
Restarts and More Data | Their models are trained on the entire Penn Treebank data (instead of using only the 24,115-token test data), and so are the tagging models used by Goldberg et al. |
Restarts and More Data | increasing the training data from the 24,115-token set to the entire Penn Treebank (973k tokens). |
Smaller Tagset and Incomplete Dictionaries | Their systems were shown to obtain considerable improvements in accuracy when using a 17-tagset (a coarser-grained version of the tag labels from the Penn Treebank) instead of the 45-tagset. |
Smaller Tagset and Incomplete Dictionaries | The accuracy numbers reported for Init-HMM and LDA+AC are for models that are trained on all the available unlabeled data from the Penn Treebank . |
Smaller Tagset and Incomplete Dictionaries | The IP+EM models used in the 17-tagset experiments reported here were not trained on the entire Penn Treebank, but instead used a smaller section containing 77,963 tokens for estimating model parameters. |
Abstract | We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser. |
Introduction | The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics. |
Introduction | The second approach is to apply statistical methods to parsers based on linguistic formalisms, such as HPSG, LFG, TAG, and CCG, with the grammar being defined manually or extracted from a formalism-specific treebank. |
Introduction | Evaluation is typically performed by comparing against predicate-argument structures extracted from the treebank, or against a test set of manually annotated grammatical relations (GRs). |
The CCG to PTB Conversion | However, there are a number of differences between the two treebanks which make the conversion back far from trivial. |
The CCG to PTB Conversion | First, the corresponding derivations in the treebanks are not isomorphic: a CCG derivation is not simply a relabelling of the nodes in the PTB tree; there are many constructions, such as coordination and control structures, where the trees are a different shape, as well as having different labels. |
Data and Evaluation | The 9 documents include 3 texts from the RST literature2, 3 online product reviews from Epinions.com, and 3 Wall Street Journal articles taken from the Penn Treebank. |
Principles For Discourse Segmentation | Many of our differences with Carlson and Marcu (2001), who defined EDUs for the RST Discourse Treebank (Carlson et al., 2002), are due to the fact that we adhere closer to the original RST proposals (Mann and Thompson, 1988), which defined as ‘spans’ adjunct clauses, rather than complement (subject and object) clauses. |
Related Work | SPADE is trained on the RST Discourse Treebank (Carlson et al., 2002). |
Related Work | (2004) construct a rule-based segmenter, employing manually annotated parses from the Penn Treebank . |
Results | The high F-score on the Treebank data can be attributed to the parsers having been trained on the Treebank. |
QG for Paraphrase Modeling | (2005), trained on sections 2-21 of the WSJ Penn Treebank, transformed to dependency trees following Yamada and Matsumoto (2003). |
QG for Paraphrase Modeling | (The same treebank data were also used to estimate many of the parameters of our model, as discussed in the text.) |
QG for Paraphrase Modeling | 4 is estimated in our model using the transformed treebank (see footnote 4). |
Discussion | This corpus contains text of a similar domain to the TIGER treebank. |
Generation Ranking Experiments | We train the log-linear ranking model on 7759 F-structures from the TIGER treebank . |
Generation Ranking Experiments | We generate strings from each F-structure and take the original treebank string to be the labelled example. |
Generation Ranking Experiments | We evaluate the string chosen by the log-linear model against the original treebank string in terms of exact match and BLEU score (Papineni et al., 2002). |
Experimental setup | Data The Penn Korean Treebank (Han et al., 2002) consists of 5,083 Korean sentences translated into English for the purposes of language training in a military setting. |
Experimental setup | The English-Urdu parallel corpus3 consists of 4,325 sentences from the first three sections of the Penn Treebank and their Urdu translations annotated at the part-of-speech level. |
Experimental setup | We use the remaining sections of the Penn Treebank for English testing. |
Features for sense prediction of implicit discourse relations | Our final verb features were the part of speech tags (gold-standard from the Penn Treebank) of the main verb. |
Introduction | For our experiments, we use the Penn Discourse Treebank, the largest existing corpus of discourse annotations for both implicit and explicit relations. |
Penn Discourse Treebank | For our experiments, we use the Penn Discourse Treebank (PDTB; Prasad et al., 2008), the largest available annotated corpus of discourse relations. |
Penn Discourse Treebank | The PDTB contains discourse annotations over the same 2,312 Wall Street Journal (WSJ) articles as the Penn Treebank. |
Abstract | Trained on 8,975 dependency structures of a Chinese Dependency Treebank , the realizer achieves a BLEU score of 0.8874. |
Experiments | For training the headword model, we use both the HIT-CDT and the HIT Chinese Skeletal Dependency Treebank (HIT-CSDT). |
Introduction | The grammar rules are either developed by hand, such as those used in LinGo (Carroll et al., 1999), OpenCCG (White, 2004) and XLE (Crouch et al., 2007), or extracted automatically from annotated corpora, like the HPSG (Nakanishi et al., 2005), LFG (Cahill and van Genabith, 2006; Hogan et al., 2007) and CCG (White et al., 2007) resources derived from the Penn-II Treebank, |
Sentence Realization from Dependency Structure | The input to our sentence realizer is a dependency structure as represented in the HIT Chinese Dependency Treebank (HIT-CDT)1. |
Conclusion and Future work | By exploiting an existing syntactic parser trained on a large treebank, our approach produces improved results on standard corpora, particularly when training data is limited or sentences are long. |
Experimental Evaluation | Experiments on CLANG and GEOQUERY showed that the performance can be greatly improved by adding a small number of treebanked examples from the corresponding training set together with the WSJ corpus. |
Experimental Evaluation | Listed together with their PARSEVAL F-measures these are: gold-standard parses from the treebank (GoldSyn, 100%), a parser trained on WSJ plus a small number of in-domain training sentences required to achieve good performance, 20 for CLANG (Syn20, 88.21%) and 40 for GEOQUERY (Syn40, 91.46%), and a parser trained on no in-domain data (Syn0, 82.15% for CLANG and 76.44% for GEOQUERY). |
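The PARSEVAL F-measures quoted here are the harmonic mean of bracket precision and recall over labelled constituent spans; a minimal sketch, with spans represented as (label, start, end) tuples over illustrative data:

```python
def parseval_f1(gold_brackets, test_brackets):
    """PARSEVAL bracket scoring: F1 of precision and recall over
    labelled constituent spans, given as (label, start, end) tuples."""
    gold, test = set(gold_brackets), set(test_brackets)
    matched = len(gold & test)
    if not matched:
        return 0.0
    p, r = matched / len(test), matched / len(gold)
    return 2 * p * r / (p + r)

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("NP", 3, 5)]
assert abs(parseval_f1(gold, test) - 2 / 3) < 1e-9
```

The standard EVALB scorer implements this metric (with additional options such as ignoring punctuation).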
Experimental Evaluation | This demonstrates the advantage of utilizing existing syntactic parsers that are learned from large open domain treebanks instead of relying just on the training data. |
Abstract | Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. |
Approach | In our experiments we evaluate the learned models on dependency treebanks (Nivre et al., 2007). |
Experiments | (2005) with projective decoding, trained on sections 2-21 of the Penn treebank with dependencies extracted using the head rules of Yamada and Matsumoto (2003b). |
Introduction | We evaluate our approach by transferring from an English parser trained on the Penn treebank to Bulgarian and Spanish. |
Experimental Comparison with Unsupervised Learning | We use the WSJ 10 corpus (as processed by Smith (2006)), which is comprised of English sentences of ten words or fewer (after stripping punctuation) from the WSJ portion of the Penn Treebank. |
Experimental Comparison with Unsupervised Learning | It is our hope that this method will permit more effective leveraging of linguistic insight and resources and enable the construction of parsers in languages and domains where treebanks are not available. |
Introduction | While such supervised approaches have yielded accurate parsers (Charniak, 2001), the syntactic annotation of corpora such as the Penn Treebank is extremely costly, and consequently there are few treebanks of comparable size. |
Related Work | The above methods can be applied to small seed corpora, but McDonald1 has criticized such methods as working from an unrealistic premise, as a significant amount of the effort required to build a treebank comes in the first 100 sentences (both because of the time it takes to create an appropriate rubric and to train annotators). |
Abstract | Adding the swapping operation changes the time complexity for deterministic parsing from linear to quadratic in the worst case, but empirical estimates based on treebank data show that the expected running time is in fact linear for the range of data attested in the corpora. |
Background Notions 2.1 Dependency Graphs and Trees | When building practical parsing systems, the oracle can be approximated by a classifier trained on treebank data, a technique that has been used successfully in a number of systems (Yamada and Matsumoto, 2003; Nivre et al., 2004; Attardi, 2006). |
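The classifier-as-oracle idea can be sketched as an arc-standard shift-reduce loop in which a `predict` callable, standing in for a model trained on treebank-derived oracle transitions, chooses each action; the predictor below is a hand-written toy, not a trained classifier.

```python
def parse(words, predict):
    """Deterministic arc-standard parsing: `predict` picks the next
    transition from the current stack/buffer configuration.
    Returns arcs as (dependent_index, head_index) pairs."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = predict(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":      # second-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((dep, stack[-1]))
        else:                           # "RIGHT-ARC": top depends on second-top
            dep = stack.pop()
            arcs.append((dep, stack[-1]))
    return arcs

def toy_predict(stack, buffer):
    # Hand-written stand-in for a trained classifier.
    return "SHIFT" if len(stack) < 2 else "LEFT-ARC"

# "the cat sleeps": the <- cat, cat <- sleeps, as (dependent, head) index pairs.
assert parse(["the", "cat", "sleeps"], toy_predict) == [(0, 1), (1, 2)]
```

In a real system `predict` is a classifier over features of the stack and buffer, which is what makes the overall parser run in (expected) linear time.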
Experiments | These languages have been selected because the data come from genuine dependency treebanks , whereas all the other data sets are based on some kind of conversion from another type of representation, which could potentially distort the distribution of different types of structures in the data. |
Materials and Method | Proposition Bank (Palmer et al., 2005) adds Levin's style predicate-argument annotation and indication of verbs' alternations to the syntactic structures of the Penn Treebank (Marcus et al., 1993). |
Materials and Method | Verbal predicates in the Penn Treebank (PTB) receive the label REL, and their arguments are annotated with abstract semantic role labels A0-A5 or AA for those complements of the predicative verb that are considered arguments; complements of the verb labelled with a semantic functional label in the original PTB receive the composite semantic role label AM-X, where X stands for labels such as LOC, TMP or ADV, for locative, temporal and adverbial modifiers respectively. |
Materials and Method | SemLink1 provides mappings from PropBank to VerbNet for the WSJ portion of the Penn Treebank. |
Corpus and features | In our work we use the Penn Discourse Treebank (PDTB) (Prasad et al., 2008), the largest public resource containing discourse annotations. |
Corpus and features | The syntactic features we used were extracted from the gold standard Penn Treebank (Marcus et al., 1994) parses of the PDTB articles: |
Abstract | We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature. |
Experiments | Previous studies on joint Chinese word segmentation and POS tagging have used the Penn Chinese Treebank (CTB) (Xia et al., 2000) in experiments. |
Introduction | We conducted our experiments on the Penn Chinese Treebank (Xia et al., 2000) and compared our approach with the best previous approaches reported in the literature. |
Building a Discourse Parser | In this set, the 75 relations originally used in the RST Discourse Treebank (RST-DT) corpus (Carlson et al., 2001) are partitioned into 18 classes according to rhetorical similarity (e.g. |
Building a Discourse Parser | directly from the Penn Treebank corpus (which covers a superset of the RST-DT corpus), then “lexicalized” (i.e. |
Introduction | Figure 1: Example of a simple RST tree (Source: RST Discourse Treebank (Carlson et al., 2001), |