Index of papers in Proc. ACL 2009 that mention
  • Penn Treebank
Huang, Fei and Yates, Alexander
Experiments
For these experiments, we use the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993).
Experiments
Following the CoNLL shared task from 2000, we use sections 15-18 of the Penn Treebank for our labeled training data for the supervised sequence labeler in all experiments (Tjong Kim Sang and Buchholz, 2000).
Experiments
For the tagging experiments, we train and test using the gold standard POS tags contained in the Penn Treebank.
Penn Treebank is mentioned in 9 sentences in this paper.
Ravi, Sujith and Knight, Kevin
Introduction
We use the standard test set for this task, a 24,115-word subset of the Penn Treebank, for which a gold tag sequence is available.
Introduction
They show considerable improvements in tagging accuracy when using a coarser-grained version (with 17 tags) of the tag set from the Penn Treebank.
Introduction
In contrast, we keep all the original dictionary entries derived from the Penn Treebank data for our experiments.
Restarts and More Data
Their models are trained on the entire Penn Treebank data (instead of using only the 24,115-token test data), and so are the tagging models used by Goldberg et al.
Restarts and More Data
…ing data from the 24,115-token set to the entire Penn Treebank (973k tokens).
Smaller Tagset and Incomplete Dictionaries
Their systems were shown to obtain considerable improvements in accuracy when using a 17-tagset (a coarser-grained version of the tag labels from the Penn Treebank) instead of the 45-tagset.
Smaller Tagset and Incomplete Dictionaries
The accuracy numbers reported for Init-HMM and LDA+AC are for models that are trained on all the available unlabeled data from the Penn Treebank .
Smaller Tagset and Incomplete Dictionaries
The IP+EM models used in the 17-tagset experiments reported here were not trained on the entire Penn Treebank, but instead used a smaller section containing 77,963 tokens for estimating model parameters.
Penn Treebank is mentioned in 8 sentences in this paper.
Zhang, Yi and Wang, Rui
Dependency Parsing with HPSG
Note that all grammar rules in ERG are either unary or binary, giving us relatively deep trees when compared with annotations such as the Penn Treebank.
Dependency Parsing with HPSG
For these rules, we refer to the conversion of the Penn Treebank into dependency structures used in the CoNLL 2008 Shared Task, and mark the heads of these rules in a way that will arrive at a compatible dependency backbone.
Dependency Parsing with HPSG
A more recent study shows that, with carefully designed retokenization and preprocessing rules, over 80% sentential coverage can be achieved on the WSJ sections of the Penn Treebank data using the same version of ERG.
Experiment Results & Error Analyses
The larger part is converted from the Penn Treebank Wall Street Journal Sections #2-#21, and is used for training statistical dependency parsing models; the smaller part, which covers sentences from Section #23, is used for testing.
Experiment Results & Error Analyses
Brown: This dataset contains a subset of converted sentences from the BROWN sections of the Penn Treebank.
Experiment Results & Error Analyses
Although the original annotation scheme is similar to the Penn Treebank, the dependency extraction setting is slightly different from the CoNLL WSJ dependencies (e.g.
Introduction
…the Wall Street Journal (WSJ) sections of the Penn Treebank (Marcus et al., 1993) as the training set, tests on BROWN sections typically result in a 6-8% drop in labeled attachment scores, although the average sentence length is much shorter in BROWN than in WSJ.
Penn Treebank is mentioned in 7 sentences in this paper.
Niu, Zheng-Yu and Wang, Haifeng and Wu, Hua
Abstract
Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result.
Conclusion
Future work includes further investigation of our conversion method for other pairs of grammar formalisms, e.g., from the grammar formalism of the Penn Treebank to deeper linguistic formalisms such as CCG, HPSG, or LFG.
Experiments of Grammar Formalism Conversion
Xia et al. (2008) used WSJ section 19 from the Penn Treebank to extract DS to PS conversion rules and then produced dependency trees from WSJ section 22 for evaluation of their DS to PS conversion algorithm.
Experiments of Grammar Formalism Conversion
We used the tool “Penn2Malt” to produce dependency structures from the Penn Treebank, which was also used for PS to DS conversion in our conversion algorithm.
Introduction
We have evaluated our conversion algorithm on a dependency structure treebank (produced from the Penn Treebank) for comparison with previous work (Xia et al., 2008).
Introduction
Section 3 provides experimental results of grammar formalism conversion on a dependency treebank produced from the Penn Treebank .
Penn Treebank is mentioned in 6 sentences in this paper.
Webber, Bonnie
Abstract
Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports.
Conclusion
This paper has, for the first time, provided genre information about the articles in the Penn TreeBank .
Genre in the Penn TreeBank
Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7).
Genre in the Penn TreeBank
…the Penn TreeBank that aren’t included in the PDTB.
Introduction
This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008).
Penn Treebank is mentioned in 5 sentences in this paper.
Boxwell, Stephen and Mehay, Dennis and Brew, Chris
Error Analysis
This particular problem is caused by an annotation error in the original Penn Treebank that was carried through in the conversion to CCGbank.
Features
For gold-standard parses, we remove functional tag and trace information from the Penn Treebank parses before we extract features over them, so as to simulate the conditions of an automatic parse.
Features
The Penn Treebank features are as follows:
Penn Treebank is mentioned in 3 sentences in this paper.
Clark, Stephen and Curran, James R.
Abstract
We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser.
Introduction
The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics.
Introduction
The formalism-based parser we use is the CCG parser of Clark and Curran (2007), which is based on CCGbank (Hockenmaier and Steedman, 2007), a CCG version of the Penn Treebank.
Penn Treebank is mentioned in 3 sentences in this paper.
Galley, Michel and Manning, Christopher D.
Dependency parsing experiments
We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn treebank.
Dependency parsing experiments
Our other test set is the standard Section 23 of the Penn treebank.
Dependency parsing experiments
For parsing, sentences are cased and tokenization abides by the PTB segmentation as used in the Penn treebank version 3.
Penn Treebank is mentioned in 3 sentences in this paper.
Merlo, Paola and van der Plas, Lonneke
Materials and Method
Proposition Bank (Palmer et al., 2005) adds Levin-style predicate-argument annotation and indication of verbs’ alternations to the syntactic structures of the Penn Treebank (Marcus et al., 1993).
Materials and Method
Verbal predicates in the Penn Treebank (PTB) receive a label REL, and their arguments are annotated with abstract semantic role labels A0-A5 or AA for those complements of the predicative verb that are considered arguments; those complements of the verb labelled with a semantic functional label in the original PTB receive the composite semantic role label AM-X, where X stands for labels such as LOC, TMP, or ADV, for locative, temporal, and adverbial modifiers respectively.
Materials and Method
SemLink provides mappings from PropBank to VerbNet for the WSJ portion of the Penn Treebank.
Penn Treebank is mentioned in 3 sentences in this paper.