Abstract | Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there exist multiple corpora with different and incompatible annotation guidelines or standards. |
Abstract | Experiments show that adaptation from the much larger People’s Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracies (with error reductions of 30.2% and 14%, respectively), which in turn helps improve Chinese parsing accuracy. |
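The relation between accuracy gains and the relative error reductions quoted above can be made concrete with a small sketch; the accuracies used below are illustrative placeholders, not the paper's actual numbers.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative error reduction between two accuracies (given as fractions)."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err

# A 30.2% error reduction means the new model removes 30.2% of the
# baseline's remaining errors, e.g. going from 90.00% to 93.02% accuracy.
assert abs(error_reduction(0.90, 0.9302) - 0.302) < 1e-9
```

Relative error reduction is the standard way to report such gains because it stays meaningful even when baseline accuracy is already high.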
Conclusion and Future Works | In particular, we will work on annotation standard adaptation between different treebanks, for example from the HPSG LinGo Redwoods Treebank to the PTB, or even from a dependency treebank to the PTB, in order to obtain more powerful PTB annotation-style parsers. |
Experiments | Our adaptation experiments are conducted from People’s Daily (PD) to Penn Chinese Treebank 5.0 (CTB). |
Introduction | Much of statistical NLP research relies on manually annotated corpora to train models, but these resources are extremely expensive to build, especially at a large scale, as in treebanking (Marcus et al., 1993). |
Introduction | For example, just for English treebanking there have been the Chomskian-style Penn Treebank (Marcus et al., 1993), the HPSG LinGo Redwoods Treebank (Oepen et al., 2002), and a smaller dependency treebank (Buchholz and Marsi, 2006).
Related Works | In addition, many efforts have been devoted to manual treebank adaptation, adapting the PTB to other grammar formalisms, such as CCG and LFG (Hockenmaier and Steedman, 2008; Cahill and Mccarthy, 2007). |
Abstract | This paper proposes an approach to enhance dependency parsing in a language by using a translated treebank from another language. |
Abstract | A simple statistical machine translation method, word-by-word decoding, which requires only a bilingual lexicon rather than a parallel corpus, is adopted for the treebank translation. |
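Word-by-word decoding with a bilingual lexicon can be sketched as a per-token lookup that leaves the source-side tree structure untouched; the lexicon entries below are toy stand-ins, not the paper's actual resource.

```python
# Toy bilingual lexicon (hypothetical entries): source word -> target word.
LEXICON = {"the": "le", "cat": "chat", "sleeps": "dort"}

def translate_word_by_word(tokens, lexicon):
    # Each token is translated independently; unknown words pass through
    # unchanged, and the dependency arcs of the source tree are kept as-is.
    return [lexicon.get(tok, tok) for tok in tokens]

assert translate_word_by_word(["the", "cat", "sleeps"], LEXICON) == ["le", "chat", "dort"]
```

Because each word is translated in isolation, no parallel corpus is needed, which is the point the abstract makes.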
Abstract | The proposed method is evaluated on English and Chinese treebanks. |
Introduction | However, this is not the case when we consider all treebanks across different languages as a whole. |
Introduction | For example, of the ten treebanks for the CoNLL-2007 shared task, none includes more than 500K |
Introduction | 1It is a tradition in the parsing community to call an annotated syntactic corpus a treebank. |
Dependency Parsing with HPSG | Note that all grammar rules in ERG are either unary or binary, giving us relatively deep trees when compared with annotations such as Penn Treebank. |
Dependency Parsing with HPSG | For these rules, we refer to the conversion of the Penn Treebank into dependency structures used in the CoNLL 2008 Shared Task, and mark the heads of these rules in a way that will arrive at a compatible dependency backbone. |
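Head marking of this kind can be sketched as head percolation over binary rules; the head table below is a hypothetical toy, not the actual CoNLL 2008 conversion rules.

```python
# Hypothetical head table: maps (parent, left-child, right-child) labels
# to the index (0 = left, 1 = right) of the head-bearing child.
HEAD_TABLE = {("S", "NP", "VP"): 1, ("NP", "DT", "NN"): 1, ("VP", "V", "NN"): 0}

def head_of(tree, deps, table=HEAD_TABLE):
    """tree is (label, word) for a preterminal or (label, left, right)
    for a binary node; collects (dependent, head) word pairs in deps
    and returns the head word of the subtree."""
    if len(tree) == 2:
        return tree[1]                      # preterminal: the word itself
    label, left, right = tree
    lh = head_of(left, deps, table)
    rh = head_of(right, deps, table)
    if table[(label, left[0], right[0])] == 0:
        head, dep = lh, rh
    else:
        head, dep = rh, lh
    deps.append((dep, head))
    return head

deps = []
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "cat")),
        ("VP", ("V", "chased"), ("NN", "mice")))
root = head_of(tree, deps)
assert root == "chased"
```

Running this over a binarized tree yields (dependent, head) word pairs, i.e. an unlabeled dependency backbone compatible with the chosen head conventions.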
Dependency Parsing with HPSG | 2A more recent study shows that, with carefully designed retokenization and preprocessing rules, over 80% sentential coverage can be achieved on the WSJ sections of the Penn Treebank data using the same version of ERG. |
Experiment Results & Error Analyses | To evaluate the performance of our different dependency parsing models, we tested our approaches on several dependency treebanks for English in a similar spirit to the CoNLL 2006-2008 Shared Tasks. |
Experiment Results & Error Analyses | Most of them are converted automatically from existing treebanks in various forms. |
Experiment Results & Error Analyses | The larger part is converted from the Penn Treebank Wall Street Journal Sections #2-#21, and is used for training statistical dependency parsing models; the smaller part, which covers sentences from Section #23, is used for testing. |
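The conventional section split can be sketched as a filter over WSJ file ids; the file-naming pattern assumed below (wsj_SSNN.mrg, where SS is the section) is the usual Penn Treebank layout.

```python
def wsj_split(file_id):
    """Assign a Penn Treebank WSJ file such as 'wsj_2301.mrg' to a split
    under the conventional sections 02-21 train / section 23 test partition."""
    section = int(file_id[4:6])   # two digits after the 'wsj_' prefix
    if 2 <= section <= 21:
        return "train"
    if section == 23:
        return "test"
    return "other"                # e.g. section 22 or 24, often held out as dev

assert wsj_split("wsj_0201.mrg") == "train"
assert wsj_split("wsj_2301.mrg") == "test"
```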
Introduction | In the meantime, the successful continuation of the CoNLL Shared Tasks since 2006 (Buchholz and Marsi, 2006; Nivre et al., 2007a; Surdeanu et al., 2008) has witnessed how easy it has become to train a statistical syntactic dependency parser provided that an annotated treebank is available. |
Introduction | the Wall Street Journal (WSJ) sections of the Penn Treebank (Marcus et al., 1993) as training set, tests on BROWN Sections typically result in a 6-8% drop in labeled attachment scores, although the average sentence length is much shorter in BROWN than that in WSJ. |
Abstract | We address the issue of using heterogeneous treebanks for parsing by breaking it down into two sub-problems, converting grammar formalisms of the treebanks to the same one, and parsing on these homogeneous treebanks. |
Abstract | Then we provide two strategies to refine conversion results, and adopt a corpus weighting technique for parsing on homogeneous treebanks. |
Abstract | Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. |
Introduction | The last few decades have seen the emergence of multiple treebanks annotated with different grammar formalisms, motivated by the diversity of languages and linguistic theories, which is crucial to the success of statistical parsing (Abeille et al., 2000; Brants et al., 1999; Bohmova et al., 2003; Han et al., 2002; Kurohashi and Nagao, 1998; Marcus et al., 1993; Moreno et al., 2003; Xue et al., 2005). |
Introduction | Availability of multiple treebanks creates a scenario where we have a treebank annotated with one grammar formalism, and another treebank annotated with another grammar formalism that we are interested in. |
Introduction | We call the first a source treebank, and the second a target treebank. |
Argument Mapping Model | By examining the arguments that the verbal category combines with in the treebank, we can identify the corresponding semantic role for each argument that is marked on the verbal category. |
Enabling Cross-System Comparison | Results (P / R / F): G&H (treebank): 67.5% / 60.0% / 63.5%; Brutus (treebank): 88.18% / 85.00% / 86.56%. |
Error Analysis | Many of the errors made by the Brutus system can be traced directly to erroneous parses, either in the automatic or treebank parse. |
Error Analysis | However, because "in 1956" erroneously modifies the verb "using" rather than the verb "stopped" in the treebank parse, the system trusts the syntactic analysis and places Arg1 of "stopped" on "using asbestos in 1956". |
Identification and Labeling Models | The same features are extracted for both treebank and automatic parses. |
Results | Results (P / R / F): P. et al (treebank): 86.22% / 87.40% / 86.81%; Brutus (treebank): 88.29% / 86.39% / 87.33%. |
Results | Headword (treebank): 88.94% / 86.98% / 87.95% (P / R / F). |
Results | Boundary (treebank): 88.29% / 86.39% / 87.33% (P / R / F). |
The Contribution of the New Features | Removing them has a strong effect on accuracy when labeling treebank parses, as shown in our feature ablation results in table 4. |
This is easily read off of the CCG PARG relationships. | For gold-standard parses, we remove functional tag and trace information from the Penn Treebank parses before we extract features over them, so as to simulate the conditions of an automatic parse. |
This is easily read off of the CCG PARG relationships. | The Penn Treebank features are as follows: |
Dependency parsing experiments | Our training data includes newswire from the English translation treebank (LDC2007T02) and the English-Arabic Treebank (LDC2006T10), which are respectively translations of sections of the Chinese treebank (CTB) and Arabic treebank (ATB). |
Dependency parsing experiments | We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn treebank. |
Dependency parsing experiments | Our other test set is the standard Section 23 of the Penn treebank. |
Machine translation experiments | To extract dependencies from treebanks, we used the LTH Penn Converter (http://nlp. |
Machine translation experiments | We constrain the converter not to use functional tags found in the treebanks, in order to make it possible to use automatically parsed texts (i.e., perform self-training) in future work. |
Machine translation experiments | Chinese words were automatically segmented with a conditional random field (CRF) classifier (Chang et al., 2008) that conforms to the Chinese Treebank (CTB) standard. |
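A character-tagging segmenter of this kind typically emits one tag per character; the sketch below recovers words from a B/M/E/S tag sequence, one common scheme for CRF segmenters, though not necessarily the exact tag set of Chang et al. (2008).

```python
def tags_to_words(chars, tags):
    """Recover a word segmentation from per-character tags:
    B = begin word, M = middle, E = end, S = single-character word.
    Assumes the tag sequence is well-formed."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            words.append(ch)
        elif tag == "B":
            current = ch
        elif tag == "M":
            current += ch
        else:  # "E"
            words.append(current + ch)
            current = ""
    return words

assert tags_to_words(list("我爱北京"), ["S", "S", "B", "E"]) == ["我", "爱", "北京"]
```

The CRF only has to learn the tagging; segmentation then falls out of this deterministic decode.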
Abstract | Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports. |
Abstract | All but the latter three were then characterised in terms of features manually annotated in the Penn Discourse TreeBank: discourse connectives and their senses. |
Conclusion | This paper has, for the first time, provided genre information about the articles in the Penn TreeBank. |
Conclusion | It has characterised each genre in terms of features manually annotated in the Penn Discourse TreeBank, and used this to show that genre should be made a factor in automated sense labelling of discourse relations that are not explicitly marked. |
Genre in the Penn TreeBank | Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7). |
Genre in the Penn TreeBank | In lieu of any informative meta-data in the PTB files1, I looked at line-level patterns in the 2159 files that make up the Penn Discourse TreeBank subset of the PTB, and then manually confirmed the text types I found.2 The resulting set includes all the files in the Penn TreeBank that aren't included in the PDTB.
Introduction | This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008). |
Introduction | After a brief introduction to the Penn Discourse TreeBank (hereafter, PDTB) in Section 4, Sections 5 and 6 show that these four genres display differences in connective frequency and in terms of the senses associated with intra-sentential connectives (e.g., subordinating conjunctions), inter-sentential connectives (e.g., inter-sentential coordinating conjunctions) and those inter-sentential relations that are not lexically marked. |
The Penn Discourse TreeBank | Genre differences at the level of discourse in the PTB can be seen in the manual annotations of the Penn Discourse TreeBank (Prasad et al., 2008). |
Experiments | For these experiments, we use the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993). |
Experiments | Following the CoNLL shared task from 2000, we use sections 15-18 of the Penn Treebank for our labeled training data for the supervised sequence labeler in all experiments (Tjong et al., 2000). |
Experiments | For the tagging experiments, we train and test using the gold standard POS tags contained in the Penn Treebank. |
A Latent Variable Parser | Latent variable parsing assumes that an observed treebank represents a coarse approximation of an underlying, optimally refined grammar which makes more fine-grained distinctions in the syntactic categories. |
A Latent Variable Parser | For example, the noun phrase category NP in a treebank could be viewed as a coarse approximation of two noun phrase categories corresponding to subjects and objects, NP^S and NP^VP. |
A Latent Variable Parser | It starts with a simple binarized X-bar grammar style backbone, and goes through iterations of splitting and merging nonterminals, in order to maximize the likelihood of the training treebank. |
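One split step of this procedure can be sketched on unary rules: each nonterminal is split in two, and each rule's probability mass is shared equally among the refined rules with a small random perturbation to break symmetry. This is a toy sketch of the splitting idea only, not the full split-merge EM training of the parser.

```python
import random

def split_grammar(unary_rules, seed=0):
    """One split step over unary rules A -> B (probabilities as a dict):
    every symbol becomes two subsymbols, each rule becomes four refined
    rules, and each refined rule gets half the original probability,
    perturbed by up to 1% so EM can differentiate the subsymbols."""
    rng = random.Random(seed)
    refined = {}
    for (lhs, rhs), p in unary_rules.items():
        for l in (lhs + "-0", lhs + "-1"):
            for r in (rhs + "-0", rhs + "-1"):
                refined[(l, r)] = (p / 2) * (1 + rng.uniform(-0.01, 0.01))
    return refined

refined = split_grammar({("NP", "NN"): 1.0})
assert len(refined) == 4
```

After such a split, EM re-estimates the refined rule probabilities, and a merge step undoes splits that do not improve the likelihood.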
Experiments | Incorporating edge label information does not appear to improve performance, possibly because it oversplits the initial treebank and interferes with the parser’s ability to determine optimal splits for refining the grammar. |
Introduction | Hockenmaier (2006) has translated the German TIGER corpus (Brants et al., 2002) into a CCG-based treebank to model word order variations in German. |
Introduction | The corpus-based, stochastic topological field parser of Becker and Frank (2002) is based on a standard treebank PCFG model, in which rule probabilities are estimated by frequency counts. |
Introduction | Ule (2003) proposes a process termed Directed Treebank Refinement (DTR). |
Introduction | We use the standard test set for this task, a 24,115-word subset of the Penn Treebank, for which a gold tag sequence is available. |
Introduction | They show considerable improvements in tagging accuracy when using a coarser-grained version (with 17 tags) of the tag set from the Penn Treebank. |
Introduction | In contrast, we keep all the original dictionary entries derived from the Penn Treebank data for our experiments. |
Restarts and More Data | Their models are trained on the entire Penn Treebank data (instead of using only the 24,115-token test data), and so are the tagging models used by Goldberg et al. |
Restarts and More Data | increasing the training data from the 24,115-token set to the entire Penn Treebank (973k tokens). |
Smaller Tagset and Incomplete Dictionaries | Their systems were shown to obtain considerable improvements in accuracy when using a 17-tagset (a coarser-grained version of the tag labels from the Penn Treebank) instead of the 45-tagset. |
Smaller Tagset and Incomplete Dictionaries | The accuracy numbers reported for Init-HMM and LDA+AC are for models that are trained on all the available unlabeled data from the Penn Treebank . |
Smaller Tagset and Incomplete Dictionaries | The IP+EM models used in the 17-tagset experiments reported here were not trained on the entire Penn Treebank, but instead used a smaller section containing 77,963 tokens for estimating model parameters. |
Abstract | We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser. |
Introduction | The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics. |
Introduction | The second approach is to apply statistical methods to parsers based on linguistic formalisms, such as HPSG, LFG, TAG, and CCG, with the grammar being defined manually or extracted from a formalism-specific treebank. |
Introduction | Evaluation is typically performed by comparing against predicate-argument structures extracted from the treebank, or against a test set of manually annotated grammatical relations (GRs). |
The CCG to PTB Conversion | However, there are a number of differences between the two treebanks which make the conversion back far from trivial. |
The CCG to PTB Conversion | First, the corresponding derivations in the treebanks are not isomorphic: a CCG derivation is not simply a relabelling of the nodes in the PTB tree; there are many constructions, such as coordination and control structures, where the trees are a different shape, as well as having different labels. |
Data and Evaluation | The 9 documents include 3 texts from the RST literature2, 3 online product reviews from Epinions.com, and 3 Wall Street Journal articles taken from the Penn Treebank. |
Principles For Discourse Segmentation | Many of our differences with Carlson and Marcu (2001), who defined EDUs for the RST Discourse Treebank (Carlson et al., 2002), are due to the fact that we adhere closer to the original RST proposals (Mann and Thompson, 1988), which defined as ‘spans’ adjunct clauses, rather than complement (subject and object) clauses. |
Related Work | SPADE is trained on the RST Discourse Treebank (Carlson et al., 2002). |
Related Work | (2004) construct a rule-based segmenter, employing manually annotated parses from the Penn Treebank . |
Results | The high F-score on the Treebank data can be attributed to the parsers having been trained on the Treebank. |
QG for Paraphrase Modeling | (2005), trained on sections 2-21 of the WSJ Penn Treebank, transformed to dependency trees following Yamada and Matsumoto (2003). |
QG for Paraphrase Modeling | (The same treebank data were also used to estimate many of the parameters of our model, as discussed in the text.) |
QG for Paraphrase Modeling | 4 is estimated in our model using the transformed treebank (see footnote 4). |
Discussion | This corpus contains text of a similar domain to the TIGER treebank. |
Generation Ranking Experiments | We train the log-linear ranking model on 7759 F-structures from the TIGER treebank . |
Generation Ranking Experiments | We generate strings from each F-structure and take the original treebank string to be the labelled example. |
Generation Ranking Experiments | We evaluate the string chosen by the log-linear model against the original treebank string in terms of exact match and BLEU score (Papineni et al., 2002). |
Experimental setup | Data The Penn Korean Treebank (Han et al., 2002) consists of 5,083 Korean sentences translated into English for the purposes of language training in a military setting. |
Experimental setup | The English-Urdu parallel corpus3 consists of 4,325 sentences from the first three sections of the Penn Treebank and their Urdu translations annotated at the part-of-speech level. |
Experimental setup | We use the remaining sections of the Penn Treebank for English testing. |
Features for sense prediction of implicit discourse relations | Our final verb features were the part of speech tags (gold-standard from the Penn Treebank) of the main verb. |
Introduction | For our experiments, we use the Penn Discourse Treebank, the largest existing corpus of discourse annotations for both implicit and explicit relations. |
Penn Discourse Treebank | For our experiments, we use the Penn Discourse Treebank (PDTB; Prasad et al., 2008), the largest available annotated corpus of discourse relations. |
Penn Discourse Treebank | The PDTB contains discourse annotations over the same 2,312 Wall Street Journal (WSJ) articles as the Penn Treebank. |
Abstract | Trained on 8,975 dependency structures of a Chinese Dependency Treebank , the realizer achieves a BLEU score of 0.8874. |
Experiments | For training the headword model, we use both the HIT-CDT and the HIT Chinese Skeletal Dependency Treebank (HIT-CSDT). |
Introduction | The grammar rules are either developed by hand, such as those used in LinGo (Carroll et al., 1999), OpenCCG (White, 2004) and XLE (Crouch et al., 2007), or extracted automatically from annotated corpora, like the HPSG (Nakanishi et al., 2005), LFG (Cahill and van Genabith, 2006; Hogan et al., 2007) and CCG (White et al., 2007) resources derived from the Penn-II Treebank, |
Sentence Realization from Dependency Structure | The input to our sentence realizer is a dependency structure as represented in the HIT Chinese Dependency Treebank (HIT-CDT)1. |
Conclusion and Future work | By exploiting an existing syntactic parser trained on a large treebank, our approach produces improved results on standard corpora, particularly when training data is limited or sentences are long. |
Experimental Evaluation | Experiments on CLANG and GEOQUERY showed that the performance can be greatly improved by adding a small number of treebanked examples from the corresponding training set together with the WSJ corpus. |
Experimental Evaluation | Listed together with their PARSEVAL F-measures these are: gold-standard parses from the treebank (GoldSyn, 100%), a parser trained on WSJ plus a small number of in-domain training sentences required to achieve good performance, 20 for CLANG (Syn20, 88.21%) and 40 for GEOQUERY (Syn40, 91.46%), and a parser trained on no in-domain data (Syn0, 82.15% for CLANG and 76.44% for GEOQUERY). |
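The PARSEVAL F-measures quoted here are the harmonic mean of bracket precision and recall over labelled constituent spans; a minimal sketch, with spans represented as (label, start, end) tuples over illustrative data:

```python
def parseval_f1(gold_brackets, test_brackets):
    """PARSEVAL bracket scoring: F1 of precision and recall over
    labelled constituent spans, given as (label, start, end) tuples."""
    gold, test = set(gold_brackets), set(test_brackets)
    matched = len(gold & test)
    if not matched:
        return 0.0
    p, r = matched / len(test), matched / len(gold)
    return 2 * p * r / (p + r)

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("NP", 3, 5)]
assert abs(parseval_f1(gold, test) - 2 / 3) < 1e-9
```

The standard EVALB scorer implements this metric (with additional options such as ignoring punctuation).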
Experimental Evaluation | This demonstrates the advantage of utilizing existing syntactic parsers that are learned from large open domain treebanks instead of relying just on the training data. |
Abstract | Broad-coverage annotated treebanks necessary to train parsers do not exist for many resource-poor languages. |
Approach | In our experiments we evaluate the learned models on dependency treebanks (Nivre et al., 2007). |
Experiments | (2005) with projective decoding, trained on sections 2-21 of the Penn treebank with dependencies extracted using the head rules of Yamada and Matsumoto (2003b). |
Introduction | We evaluate our approach by transferring from an English parser trained on the Penn treebank to Bulgarian and Spanish. |
Experimental Comparison with Unsupervised Learning | We use the WSJ 10 corpus (as processed by Smith (2006)), which is comprised of English sentences of ten words or fewer (after stripping punctuation) from the WSJ portion of the Penn Treebank. |
Experimental Comparison with Unsupervised Learning | It is our hope that this method will permit more effective leveraging of linguistic insight and resources and enable the construction of parsers in languages and domains where treebanks are not available. |
Introduction | While such supervised approaches have yielded accurate parsers (Charniak, 2001), the syntactic annotation of corpora such as the Penn Treebank is extremely costly, and consequently there are few treebanks of comparable size. |
Related Work | The above methods can be applied to small seed corpora, but McDonald1 has criticized such methods as working from an unrealistic premise, as a significant amount of the effort required to build a treebank comes in the first 100 sentences (both because of the time it takes to create an appropriate rubric and to train annotators). |
Abstract | Adding the swapping operation changes the time complexity for deterministic parsing from linear to quadratic in the worst case, but empirical estimates based on treebank data show that the expected running time is in fact linear for the range of data attested in the corpora. |
Background Notions 2.1 Dependency Graphs and Trees | When building practical parsing systems, the oracle can be approximated by a classifier trained on treebank data, a technique that has been used successfully in a number of systems (Yamada and Matsumoto, 2003; Nivre et al., 2004; Attardi, 2006). |
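The classifier-as-oracle idea can be sketched as an arc-standard shift-reduce loop in which a `predict` callable, standing in for a model trained on treebank-derived oracle transitions, chooses each action; the predictor below is a hand-written toy, not a trained classifier.

```python
def parse(words, predict):
    """Deterministic arc-standard parsing: `predict` picks the next
    transition from the current stack/buffer configuration.
    Returns arcs as (dependent_index, head_index) pairs."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = predict(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":      # second-top becomes dependent of top
            dep = stack.pop(-2)
            arcs.append((dep, stack[-1]))
        else:                           # "RIGHT-ARC": top depends on second-top
            dep = stack.pop()
            arcs.append((dep, stack[-1]))
    return arcs

def toy_predict(stack, buffer):
    # Hand-written stand-in for a trained classifier.
    return "SHIFT" if len(stack) < 2 else "LEFT-ARC"

# "the cat sleeps": the <- cat, cat <- sleeps, as (dependent, head) index pairs.
assert parse(["the", "cat", "sleeps"], toy_predict) == [(0, 1), (1, 2)]
```

In a real system `predict` is a classifier over features of the stack and buffer, which is what makes the overall parser run in (expected) linear time.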
Experiments | These languages have been selected because the data come from genuine dependency treebanks , whereas all the other data sets are based on some kind of conversion from another type of representation, which could potentially distort the distribution of different types of structures in the data. |
Materials and Method | Proposition Bank (Palmer et al., 2005) adds Levin's style predicate-argument annotation and indication of verbs' alternations to the syntactic structures of the Penn Treebank (Marcus et al., 1993). |
Materials and Method | Verbal predicates in the Penn Treebank (PTB) receive the label REL, and their arguments are annotated with abstract semantic role labels A0-A5 or AA for those complements of the predicative verb that are considered arguments; complements of the verb labelled with a semantic functional label in the original PTB receive the composite semantic role label AM-X, where X stands for labels such as LOC, TMP or ADV, for locative, temporal and adverbial modifiers respectively. |
Materials and Method | SemLink1 provides mappings from PropBank to VerbNet for the WSJ portion of the Penn Treebank. |
Corpus and features | In our work we use the Penn Discourse Treebank (PDTB) (Prasad et al., 2008), the largest public resource containing discourse annotations. |
Corpus and features | The syntactic features we used were extracted from the gold standard Penn Treebank (Marcus et al., 1994) parses of the PDTB articles: |
Abstract | We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature. |
Experiments | Previous studies on joint Chinese word segmentation and POS tagging have used the Penn Chinese Treebank (CTB) (Xia et al., 2000) in experiments. |
Introduction | We conducted our experiments on the Penn Chinese Treebank (Xia et al., 2000) and compared our approach with the best previous approaches reported in the literature. |
Building a Discourse Parser | In this set, the 75 relations originally used in the RST Discourse Treebank (RST-DT) corpus (Carlson et al., 2001) are partitioned into 18 classes according to rhetorical similarity (e.g. |
Building a Discourse Parser | directly from the Penn Treebank corpus (which covers a superset of the RST-DT corpus), then “lexicalized” (i.e. |
Introduction | Figure 1: Example of a simple RST tree (Source: RST Discourse Treebank (Carlson et al., 2001), |