Abstract | We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank.
Abstract | While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. |
Introduction | The standard solution to this bottleneck has relied on manually crafted transformation rules that map readily available syntactic annotations (e.g., the Penn Treebank) to the desired formalism.
Introduction | A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammar, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982) or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
Introduction | All of these formalisms share a similar basic syntactic structure with Penn Treebank CFG. |
Related Work | For instance, mappings may specify how to convert traces and functional tags in the Penn Treebank to the f-structure in LFG (Cahill, 2004).
Related Work | For instance, Hockenmaier and Steedman (2002) made thousands of POS and constituent modifications to the Penn Treebank to facilitate transfer to CCG. |
A Latent Variable CCG Parser | Unlike the context-free grammars extracted from the Penn Treebank, these allow for the categorial semantics that accompanies any categorial parse and for a more elegant analysis of linguistic structures such as extraction and coordination.
A Latent Variable CCG Parser | In Petrov’s experiments on the Penn Treebank, the syntactic category NP was refined to the more fine-grained NP1 and NP2, roughly corresponding to NPs in subject and object positions.
A Latent Variable CCG Parser | In the supertagging literature, POS tagging and supertagging are distinguished — POS tags are the traditional Penn Treebank tags (e.g. NN, VBZ and DT) and supertags are CCG categories.
Introduction | The Petrov parser (Petrov and Klein, 2007) uses latent variables to refine the grammar extracted from a corpus to improve accuracy, originally used to improve parsing results on the Penn treebank (PTB). |
Introduction | These results should not be interpreted as proof that grammars extracted from the Penn treebank and from CCGbank are equivalent. |
The Language Classes of Combinatory Categorial Grammars | CCGbank (Hockenmaier and Steedman, 2007) is a corpus of CCG derivations that was semiautomatically converted from the Wall Street Journal section of the Penn Treebank.
Difficulties in Learner Corpus Creation | For POS/parsing annotation, there are also a number of annotation schemes including the Brown tag set, the Claws tag set, and the Penn Treebank tag set. |
Difficulties in Learner Corpus Creation | For instance, there are at least three possibilities for POS-tagging the word sing in the sentence everyone sing together using the Penn Treebank tag set: sing/VB, sing/VBP, or sing/VBZ.
Introduction | For similar reasons, to the best of our knowledge, there exists no such learner corpus that is manually shallow-parsed and which is also publicly available, unlike, say, native-speaker corpora such as the Penn Treebank.
Method | We selected the Penn Treebank tag set, which is one of the most widely used tag sets, for our |
Method | Similar to the error annotation scheme, we conducted a pilot study to determine what modifications we needed to make to the Penn Treebank scheme. |
Method | As a result of the pilot study, we found that the Penn Treebank tag set sufficed in most cases except for errors which learners made. |
UK and XP stand for unknown and X phrase, respectively. | Both use the Penn Treebank POS tag set. |
UK and XP stand for unknown and X phrase, respectively. | An obvious cause of mistakes in both taggers is that they inevitably make errors in the POSs that are not defined in the Penn Treebank tag set, that is, UK and CE. |
Experiments | For these experiments, we use the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993). |
Experiments | Following the CoNLL shared task from 2000, we use sections 15-18 of the Penn Treebank for our labeled training data for the supervised sequence labeler in all experiments (Tjong Kim Sang and Buchholz, 2000).
Experiments | For the tagging experiments, we train and test using the gold standard POS tags contained in the Penn Treebank.
Abstract | We devise a gold-standard sense- and parse tree-annotated dataset based on the intersection of the Penn Treebank and SemCor, and experiment with different approaches to both semantic representation and disambiguation. |
Background | Traditionally, the two parsers have been trained and evaluated over the WSJ portion of the Penn Treebank (PTB: Marcus et al. |
Background | We diverge from this norm in focusing exclusively on a sense-annotated subset of the Brown Corpus portion of the Penn Treebank, in order to investigate the upper bound performance of the models given gold-standard sense information.
Background | The most closely related research is that of Bikel (2000), who merged the Brown portion of the Penn Treebank with SemCor (similarly to our approach in Section 4.1), and used this as the basis for evaluation of a generative bilexical model for joint WSD and parsing.
Conclusions | As far as we know, these are the first results over both WordNet and the Penn Treebank to show that semantic processing helps parsing. |
Experimental setting | The only publicly-available resource with these two characteristics at the time of this work was the subset of the Brown Corpus that is included in both SemCor (Landes et al., 1998) and the Penn Treebank (PTB). This provided the basis of our dataset.
Introduction | We provide the first definitive results that word sense information can enhance Penn Treebank parser performance, building on earlier results of Bikel (2000) and Xiong et al. |
Abstract | This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank.
Background | Recently, Vadas and Curran (2007a) annotated internal NP structure for the entire Penn Treebank, providing a large gold-standard corpus for NP bracketing.
Conversion Process | We apply one preprocessing step on the Penn Treebank data, where if multiple tokens are enclosed by brackets, then an NML node is placed around those
Conversion Process | Since we are applying these to CCGbank NP structures rather than the Penn Treebank, the POS tag based heuristics are sufficient to determine heads accurately.
Experiments | Vadas and Curran (2007a) experienced a similar drop in performance on Penn Treebank data, and noted that the F-score for NML and JJP brackets was about 20% lower than the overall figure. |
Introduction | This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure. |
Introduction | The flat structure described by the Penn Treebank can be seen in this example: |
Introduction | CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semiautomatic conversion from the Penn Treebank.
Conclusion | Using the Penn Treebank sentences associated with each SR Task dependency tree, we will create the two tree sets necessary to support error mining by dividing the set of trees output by the surface realiser into a set of trees (FAIL) associated with overgeneration (the generated sentences do not match the original sentences) and a set of trees (SUCCESS) associated with success (the generated sentence matches the original sentences). |
Experiment and Results | The shallow input data provided by the SR Task was obtained from the Penn Treebank using the LTH Constituent-to-Dependency Conversion Tool for Penn-style Treebanks (Pennconverter; Johansson and Nugues, 2007).
Experiment and Results | The chunking was performed by retrieving from the Penn Treebank (PTB), for each phrase type, the yields of the constituents of that type and by using the alignment between words and dependency tree nodes provided by the organisers of the SR Task. |
Experiment and Results | In the Penn Treebank, POS is the tag assigned to the possessive ’s.
Related Work | Callaway (2003) avoids this shortcoming by converting the Penn Treebank to the format expected by his realiser.
Introduction | We use the standard test set for this task, a 24,115-word subset of the Penn Treebank, for which a gold tag sequence is available.
Introduction | They show considerable improvements in tagging accuracy when using a coarser-grained version (with 17 tags) of the tag set from the Penn Treebank.
Introduction | In contrast, we keep all the original dictionary entries derived from the Penn Treebank data for our experiments. |
Restarts and More Data | Their models are trained on the entire Penn Treebank data (instead of using only the 24,115-token test data), and so are the tagging models used by Goldberg et al. |
Restarts and More Data | ing data from the 24,115-token set to the entire Penn Treebank (973k tokens).
Smaller Tagset and Incomplete Dictionaries | Their systems were shown to obtain considerable improvements in accuracy when using a 17-tagset (a coarser-grained version of the tag labels from the Penn Treebank) instead of the 45-tagset.
Smaller Tagset and Incomplete Dictionaries | The accuracy numbers reported for Init-HMM and LDA+AC are for models that are trained on all the available unlabeled data from the Penn Treebank.
Smaller Tagset and Incomplete Dictionaries | The IP+EM models used in the 17-tagset experiments reported here were not trained on the entire Penn Treebank, but instead used a smaller section containing 77,963 tokens for estimating model parameters.
Abstract | In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Conclusion | We translated and transformed a subset of parse trees of Penn Treebank to Turkish. |
Conclusion | As future work, we plan to expand the dataset to include all Penn Treebank sentences.
Corpus construction strategy | In order to constrain the syntactic complexity of the sentences in the corpus, we selected from the Penn Treebank II 9,560 trees which contain a maximum of 15 tokens.
Corpus construction strategy | These include 8,660 trees from the training set of the Penn Treebank, 360 trees from its development set and 540 trees from its test set.
Literature Review | MaltParser is trained on the Penn Treebank for English, on the Swedish treebank Talbanken05 (Nivre et al., 2006b), and on the METU-Sabancı Turkish Treebank (Atalay et al., 2003), respectively.
Transformation heuristics | In the Penn Treebank II annotation, the movement leaves a trace that is associated with the wh-constituent by a numeric marker.
Background and motivation | Statistical parsers induce their grammars from corpora, and the corpora for linguistically motivated formalisms currently do not contain high quality predicate-argument annotation, because they were derived from the Penn Treebank (PTB; Marcus et al., 1993).
Combining CCGbank corrections | The structure of such compound noun phrases is left underspecified in the Penn Treebank (PTB), because the annotation procedure involved stitching together partial parses produced by the Fidditch parser (Hindle, 1983), which produced flat brackets for these constructions.
Combining CCGbank corrections | The syntactic analysis of punctuation is notoriously difficult, and punctuation is not always treated consistently in the Penn Treebank (Bies et al., 1995). |
Conclusion | The most cited computational linguistics work to date is the Penn Treebank (Marcus et al., 1993).
Introduction | We chose to work on CCGbank (Hockenmaier and Steedman, 2007), a Combinatory Categorial Grammar (Steedman, 2000) treebank acquired from the Penn Treebank (Marcus et al., 1993). |
Noun predicate-argument structure | Our analysis requires semantic role labels for each argument of the nominal predicates in the Penn Treebank — precisely what NomBank (Meyers et al., 2004) provides. |
Noun predicate-argument structure | First, we align CCGbank and the Penn Treebank, and produce a version of NomBank that refers to CCGbank nodes.
Dependency Parsing with HPSG | Note that all grammar rules in ERG are either unary or binary, giving us relatively deep trees when compared with annotations such as the Penn Treebank.
Dependency Parsing with HPSG | For these rules, we refer to the conversion of the Penn Treebank into dependency structures used in the CoNLL 2008 Shared Task, and mark the heads of these rules in a way that will arrive at a compatible dependency backbone. |
Dependency Parsing with HPSG | A more recent study shows that with carefully designed retokenization and preprocessing rules, over 80% sentential coverage can be achieved on the WSJ sections of the Penn Treebank data using the same version of ERG.
Experiment Results & Error Analyses | The larger part is converted from the Penn Treebank Wall Street Journal Sections #2—#21, and is used for training statistical dependency parsing models; the smaller part, which covers sentences from Section #23, is used for testing. |
Experiment Results & Error Analyses | Brown This dataset contains a subset of converted sentences from the BROWN sections of the Penn Treebank.
Experiment Results & Error Analyses | Although the original annotation scheme is similar to the Penn Treebank, the dependency extraction setting is slightly different from the CoNLL WSJ dependencies (e.g.
Introduction | With the Wall Street Journal (WSJ) sections of the Penn Treebank (Marcus et al., 1993) as the training set, tests on BROWN sections typically result in a 6-8% drop in labeled attachment scores, although the average sentence length is much shorter in BROWN than in WSJ.
Dataset Creation | Of the 21,938 total examples, 15,330 come from sections 2-21 of the Penn Treebank (Marcus et al., 1993).
Dataset Creation | For the Penn Treebank, we extracted the examples using the provided gold standard parse trees, whereas, for the latter cases, we used the output of an open source parser (Tratz and Hovy, 2011).
Experiments | The accuracy figures for the test instances from the Penn Treebank, The Jungle Book, and The History of the Decline and Fall of the Roman Empire were 88.8%, 84.7%, and 80.6%, respectively.
Related Work | The NomBank project (Meyers et al., 2004) provides coarse annotations for some of the possessive constructions in the Penn Treebank, but only those that meet their criteria.
Semantic Relation Inventory | Penn Treebank, respectively.
Semantic Relation Inventory | portion of the Penn Treebank.
Semantic Relation Inventory | The Penn Treebank and The History of the Decline and Fall of the Roman Empire were substantially more similar, although there are notable differences.
Evaluation Methodology | It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank.
Evaluation Methodology | Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ sections 2-21 of the Penn Treebank.
Introduction | This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994). |
Introduction | Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and they have been criticized as potentially overfitting to WSJ text (Gildea, 2001;
Syntactic Parsers and Their Representations | In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
Syntactic Parsers and Their Representations | Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing.
Syntactic Parsers and Their Representations | ENJU The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.
Abstract | Results on the Penn Treebank show that our conversion method achieves 42% error reduction over the previous best result. |
Conclusion | Future work includes further investigation of our conversion method for other pairs of grammar formalisms, e.g., from the grammar formalism of the Penn Treebank to deeper linguistic formalisms like CCG, HPSG, or LFG.
Experiments of Grammar Formalism Conversion | Xia et al. (2008) used WSJ section 19 from the Penn Treebank to extract DS to PS conversion rules and then produced dependency trees from WSJ section 22 for evaluation of their DS to PS conversion algorithm.
Experiments of Grammar Formalism Conversion | We used the tool “Penn2Malt” to produce dependency structures from the Penn Treebank, which was also used for PS to DS conversion in our conversion algorithm.
Introduction | We have evaluated our conversion algorithm on a dependency structure treebank (produced from the Penn Treebank) for comparison with previous work (Xia et al., 2008).
Introduction | Section 3 provides experimental results of grammar formalism conversion on a dependency treebank produced from the Penn Treebank . |
Experiments | In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank.
Experiments | For training data, we constructed a large treebank by concatenating the WSJ and Brown portions of the Penn Treebank , the 50K BLLIP training sentences from Post (2011), and the AFP and APW portions of English Gigaword version 3 (Graff, 2003), totaling about 1.3 billion tokens. |
Experiments | We used the human-annotated parses for the sentences in the Penn Treebank , but parsed the Gigaword and BLLIP sentences with the Berkeley Parser. |
Tree Transformations | Figure 2: A sample parse from the Penn Treebank after the tree transformations described in Section 3. |
Tree Transformations | Although the Penn Treebank annotates temporal NPs, most off-the-shelf parsers do not retain these tags, and we do not assume their presence.
Treelet Language Modeling | There is one additional hurdle in the estimation of our model: while there exist corpora with human-annotated constituency parses like the Penn Treebank (Marcus et al., 1993), these corpora are quite small — on the order of millions of tokens — and we cannot gather nearly as many counts as we can for n-grams, for which billions or even trillions (Brants et al., 2007) of tokens are available on the Web.
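The count-gathering contrast in the treelet snippet above can be illustrated with a minimal sketch (not the paper's actual model): tallying rule expansions from parsed trees the way an n-gram counter tallies word windows. The tree encoding and the `productions` helper are hypothetical.

```python
from collections import Counter

def productions(tree):
    """Yield (parent_label, tuple_of_child_labels) for each internal node.

    A tree is (label, [children]); a leaf is (pos_tag, word).
    """
    label, children = tree
    if isinstance(children, str):   # leaf: nothing to expand
        return
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from productions(child)

# Toy "treebank" of one parsed sentence.
corpus = [("S", [("NP", [("DT", "the"), ("NN", "dog")]),
                 ("VP", [("VBD", "barked")])])]

# Count every rule expansion observed in the corpus.
counts = Counter(p for tree in corpus for p in productions(tree))
print(counts[("S", ("NP", "VP"))])   # 1
```

With a real treebank the same counter would be run over millions of trees, which is exactly where the sparsity gap relative to Web-scale n-gram counts arises.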
Annotations | Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations | Table 3: Final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank.
Annotations | Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank . |
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Parsing Model | Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task. |
Surface Feature Framework | Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Introduction | The most well-known multiply-annotated and validated corpus of English is the one million word Wall Street Journal corpus known as the Penn Treebank (Marcus et al., 1993), which over the years has been fully or partially annotated for several phenomena over and above the original part-of-speech tagging and phrase structure annotation. |
Introduction | More recently, the OntoNotes project (Pradhan et al., 2007) released a one million word English corpus of newswire, broadcast news, and broadcast conversation that is annotated for Penn Treebank syntax, PropBank predicate argument structures, coreference, and named entities. |
MASC Annotations | Table of annotations (type, status, number of texts, number of words):
  Token             Validated  118  222,472
  Sentence          Validated  118  222,472
  POS/lemma         Validated  118  222,472
  Noun chunks       Validated  118  222,472
  Verb chunks       Validated  118  222,472
  Named entities    Validated  118  222,472
  FrameNet frames   Manual      21   17,829
  HPSG              Validated   40*  30,106
  Discourse         Manual      40*  30,106
  Penn Treebank     Validated   97   87,383
  PropBank          Validated   92   50,165
  Opinion           Manual      97   47,583
  TimeBank          Validated   34    5,434
  Committed belief  Manual      13    4,614
  Event             Manual      13    4,614
  Coreference       Manual       2    1,877
MASC Annotations | Annotations produced by other projects and the FrameNet and Penn Treebank annotations produced specifically for MASC are semiautomatically and/or manually produced by those projects and subjected to their internal quality controls. |
MASC: The Corpus | All of the first 80K increment is annotated for Penn Treebank syntax. |
MASC: The Corpus | The second 120K increment includes 5.5K words of Wall Street Journal texts that have been annotated by several projects, including Penn Treebank, PropBank, Penn Discourse Treebank, TimeML, and the Pittsburgh Opinion project.
Abstract | We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design.
Experiments | As a proof of concept, we investigate OSTAG in the context of the classic Penn Treebank statistical parsing setup: training on sections 2-21 and testing on section 23.
Experiments | Furthermore, the various parameterizations of adjunction with OSTAG indicate that, at least in the case of the Penn Treebank, the finer grained modeling of a full table of adjunction probabilities for each Goodman index (OSTAG3) overcomes the danger of sparse data estimates.
Introduction | We evaluate OSTAG on the familiar task of parsing the Penn Treebank . |
TAG and Variants | We propose a simple but empirically effective heuristic for grammar induction for our experiments on Penn Treebank data. |
Experiments | We used three corpora for experiments: the WSJ portion of the Penn Treebank, Wikipedia, and the general Web.
Experiments | In contrast, TextRunner was trained with 91,687 positive examples and 96,795 negative examples generated from the WSJ dataset in Penn Treebank . |
Experiments | We used three parsing options on the WSJ dataset: Stanford parsing, CJ50 parsing (Charniak and Johnson, 2005), and the gold parses from the Penn Treebank.
Introduction | For example, TextRunner uses a small set of handwritten rules to heuristically label training examples from sentences in the Penn Treebank . |
Wikipedia-based Open IE | In both cases, however, we generate training data from Wikipedia by matching sentences with infoboxes, while TextRunner used a small set of handwritten rules to label training examples from the Penn Treebank.
Abstract | Articles in the Penn TreeBank were identified as being reviews, summaries, letters to the editor, news reportage, corrections, wit and short verse, or quarterly profit reports. |
Conclusion | This paper has, for the first time, provided genre information about the articles in the Penn TreeBank.
Genre in the Penn TreeBank | Although the files in the Penn TreeBank (PTB) lack any classificatory meta-data, leading the PTB to be treated as a single homogeneous collection of “news articles”, researchers who have manually examined it in detail have noted that it includes a variety of “financial reports, general interest stories, business-related news, cultural reviews, editorials and letters to the editor” (Carlson et al., 2002, p. 7). |
Genre in the Penn TreeBank | the Penn TreeBank that aren’t included in the PDTB. |
Introduction | This paper considers differences in texts in the well-known Penn TreeBank (hereafter, PTB) and in particular, how these differences show up in the Penn Discourse TreeBank (Prasad et al., 2008). |
Abstract | We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions. |
Experiments | The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank to a dependency tree representation. We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
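The head-selection conversion mentioned in the snippet above can be sketched as follows. This is a toy illustration, not the Yamada and Matsumoto (2003) rule set itself: the `HEAD_RULES` table and the tree encoding are hypothetical stand-ins.

```python
# Toy constituency-to-dependency conversion via head-percolation rules.
# The rule table below is a hypothetical stand-in for real head rules.
HEAD_RULES = {            # label -> (search direction, priority list)
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NP"]),
}

def convert(tree, arcs, offset=0):
    """Return (head_word_index, width) of `tree`; append (head, dependent) arcs.

    A tree is (label, [children]); a leaf is (pos_tag, word).
    Word indices are 0-based, left to right.
    """
    label, children = tree
    if isinstance(children, str):            # leaf: the word heads itself
        return offset, 1
    # Recurse into children, recording each child's label and head index.
    child_heads, pos = [], offset
    for child in children:
        head, width = convert(child, arcs, pos)
        child_heads.append((child[0], head))
        pos += width
    # Pick the head child by scanning in the rule's direction.
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    ordered = child_heads if direction == "left" else child_heads[::-1]
    head = next((h for lbl in priorities for (l, h) in ordered if l == lbl),
                ordered[0][1])
    # Heads of all non-head children become dependents of the chosen head.
    arcs.extend((head, h) for (_, h) in child_heads if h != head)
    return head, pos - offset

tree = ("S", [("NP", [("DT", "the"), ("NN", "dog")]),
              ("VP", [("VBD", "barked")])])
arcs = []
root, _ = convert(tree, arcs)
print(root, sorted(arcs))   # 2 [(1, 0), (2, 1)]
```

Here "barked" (index 2) ends up as the root, with "dog" depending on it and "the" depending on "dog", mirroring the head-percolation conversions commonly applied to the Penn Treebank.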
Experiments | We ensured that the sentences of the Penn Treebank were excluded from the text used for the clustering.
Introduction | We show that our semi-supervised approach yields improvements for fixed datasets by performing parsing experiments on the Penn Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic, 1998; Hajic et al., 2001) (see Sections 4.1 and 4.3). |
Data annotation and analysis | Implicit arguments have not been annotated within the Penn TreeBank, which is the textual and syntactic basis for NomBank.
Implicit argument identification | Consider the following abridged sentences, which are adjacent in their Penn TreeBank document: |
Implicit argument identification | Starting with a wide range of features, we performed floating forward feature selection (Pudil et al., 1994) over held-out development data comprising implicit argument annotations from section 24 of the Penn TreeBank.
Introduction | However, as shown by the following example from the Penn TreeBank (Marcus et al., 1993), this restriction excludes extra-sentential arguments: |
Experiments | It achieves 87.8% labelled attachment score and 88.8% unlabelled attachment score on the standard Penn Treebank test set.
Experiments | On the standard Penn Treebank test set, it achieves an F-score of 89.5%. |
Experiments | The parser preprocesses the Penn Treebank training data through binarization. |
Source Tree Binarization | For example, Penn Treebank annotations are often flat at the phrase level. |
Anatomy of a Dense GPU Parser | Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction | As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of over 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993). |
Minimum Bayes risk parsing | Table 2: Performance numbers for computing max constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank . |
Minimum Bayes risk parsing | We measured parsing accuracy on sentences of length ≤ 40 from section 22 of the Penn Treebank.
Abstract | We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank . |
Experimental Framework | A semi-supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction | Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank . |
Related work | The results showed a significant improvement, giving the first results over both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Abstract | Following this POS representation, there are as many as 10 possible POS tags that may occur in between the–of, as estimated from the WSJ corpus of the Penn Treebank.
Abstract | To explore determinacy in the distribution of POS tags in the Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
Abstract | Table 1: Morph features of frequent words and rare words as computed from the WSJ Corpus of the Penn Treebank.
Results | We trained our model on sections 2—21 of the WSJ part of the Penn Treebank (Marcus et al., 1999). |
Results | Unfortunately, marking for argument/modifiers in the Penn Treebank is incomplete, and is limited to certain adverbials, e.g. |
Results | This corpus adds annotations indicating, for each node in the Penn Treebank , whether that node is a modifier. |
Abstract | This data sparsity problem is quite severe — for example, the Penn Treebank (Marcus et al., 1993) has a total number of 43,498 sentences, with 42,246 unique POS tag sequences, an average of 1.04 sentences per unique sequence.
Abstract | For English we use the Penn treebank (Marcus et al., 1993), with sections 2—21 for training and section 23 for final testing. |
Abstract | For both methods we chose the best parameters for sentences of length ≤ 10 on the English Penn Treebank (training) and used this set for all other experiments.
Introduction | According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure |
Introduction | Table 1 shows the relative position’s distribution of head words around measure words in the Chinese Penn Treebank, where a negative position indicates that the head word is to the left of the measure word and a positive position indicates that the head word is to the right of the measure word.
Our Method | According to our survey, about 70.4% of measure words in the Chinese Penn Treebank need |
Experiments | Labeled English data employed in this paper were derived from the Wall Street Journal (WSJ) corpus of the Penn Treebank (Marcus et al., 1993). |
Experiments | In addition, we removed from the unlabeled English data the sentences that appear in the WSJ corpus of the Penn Treebank.
Introduction | On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser. |
Error Analysis | This particular problem is caused by an annotation error in the original Penn Treebank that was carried through in the conversion to CCGbank. |
This is easily read off of the CCG PARG relationships. | For gold-standard parses, we remove functional tag and trace information from the Penn Treebank parses before we extract features over them, so as to simulate the conditions of an automatic parse. |
This is easily read off of the CCG PARG relationships. | The Penn Treebank features are as follows: |
Abstract | We compare the CCG parser of Clark and Curran (2007) with a state-of-the-art Penn Treebank (PTB) parser. |
Introduction | The first approach, which began in the mid-90s and now has an extensive literature, is based on the Penn Treebank (PTB) parsing task: inferring skeletal phrase-structure trees for unseen sentences of the WSJ, and evaluating accuracy according to the Parseval metrics.
Introduction | The formalism-based parser we use is the CCG parser of Clark and Curran (2007), which is based on CCGbank (Hockenmaier and Steedman, 2007), a CCG version of the Penn Treebank.
Dependency parsing experiments | We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn Treebank.
Dependency parsing experiments | Our other test set is the standard Section 23 of the Penn Treebank.
Dependency parsing experiments | For parsing, sentences are cased and tokenization abides by the PTB segmentation as used in Penn Treebank version 3.
Introduction | For example, Higgins and Sadock (2003) find fewer than 1000 sentences with two or more explicit quantifiers in the Wall Street Journal section of the Penn Treebank.
Introduction | Plurals form 18% of the NPs in our corpus and 20% of the nouns in the Penn Treebank.
Introduction | Explicit universals, on the other hand, form less than 1% of the determiners in the Penn Treebank.
Experiments | The experiments are conducted on the Penn Treebank Wall Street Journal corpus.
Experiments | Because we are trying to improve on (Yatbaz et al., 2012), we take the experiment on the Penn Treebank Wall Street Journal corpus in that work as our baseline and replicate it.
Introduction | For instance, the gold tag perplexity of the word “offers” in the Penn Treebank Wall Street Journal corpus we worked on is 1.966.
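Gold tag perplexity is conventionally 2 raised to the entropy of a word's empirical tag distribution. A minimal sketch with hypothetical tag counts (the 1.966 figure above comes from the actual corpus counts, not from these toy numbers):

```python
import math

def tag_perplexity(tag_counts):
    """Perplexity 2**H of a word's gold-tag distribution, where H is
    the entropy of the empirical distribution over its annotated
    POS tags (given here as a tag -> count mapping)."""
    total = sum(tag_counts.values())
    probs = [c / total for c in tag_counts.values() if c > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 2 ** entropy
```

An unambiguous word has perplexity 1.0; a word split evenly between two tags has perplexity 2.0.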
Data | CCGbank was created by semiautomatically converting the Penn Treebank to CCG derivations (Hockenmaier and Steedman, 2007). |
Introduction | Most work has focused on POS tagging for English using the Penn Treebank (Marcus et al., 1993), such as (Banko and Moore, 2004; Goldwater and Griffiths, 2007; Toutanova and Johnson, 2008; Goldberg et al., 2008; Ravi and Knight, 2009).
Introduction | This generally involves working with the standard set of 45 POS tags employed in the Penn Treebank.
Abstract | Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
Experiment | We ran experiments on the Wall Street Journal (WSJ) portion of the English Penn Treebank data set (Marcus et al., 1993), using a standard data split (sections 2-21 for training, 22 for development and 23 for testing).
Introduction | Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers. |
Materials and Method | Proposition Bank (Palmer et al., 2005) adds Levin-style predicate-argument annotation and indication of verbs' alternations to the syntactic structures of the Penn Treebank (Marcus et al., 1993).
Materials and Method | Verbal predicates in the Penn Treebank (PTB) receive the label REL, and those complements of the predicative verb that are considered arguments are annotated with the abstract semantic role labels A0-A5 or AA. Complements of the verb labelled with a semantic functional label in the original PTB receive the composite semantic role label AM-X, where X stands for labels such as LOC, TMP or ADV, for locative, temporal and adverbial modifiers respectively.
Materials and Method | SemLink provides mappings from PropBank to VerbNet for the WSJ portion of the Penn Treebank.
Experiments | The experiments were performed on the Penn Treebank (PTB) (Marcus et al., 1993), using a standard set of head-selection rules (Yamada |
Introduction | With the availability of large-scale annotated corpora such as the Penn Treebank (Marcus et al., 1993), it is easy to train a high-performance dependency parser using supervised learning methods.
Introduction | We conduct the experiments on the English Penn Treebank (PTB) (Marcus et al., 1993). |
Introduction | Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remains.
Introduction | Figure 1: A PP attachment error in the parse output of the Berkeley parser (on the Penn Treebank).
Parsing Experiments | We use the standard splits of the Penn Treebank into training (sections 2-21), development (section 22) and test (section 23).
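A small helper illustrating this standard split, assuming WSJ file names of the form wsj_SSNN.mrg where SS is the two-digit section number (the function name and file-naming convention are our assumptions):

```python
def wsj_split(file_id):
    """Assign a WSJ file such as 'wsj_0231.mrg' to the standard PTB
    split: sections 02-21 -> 'train', 22 -> 'dev', 23 -> 'test';
    other sections are left out (None)."""
    section = int(file_id.split("_")[1][:2])
    if 2 <= section <= 21:
        return "train"
    if section == 22:
        return "dev"
    if section == 23:
        return "test"
    return None
```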
Background | This method has been used effectively to improve parsing performance on newspaper text (McClosky et al., 2006a), as well as adapting a Penn Treebank parser to a new domain (McClosky et al., 2006b). |
Data | We have used Sections 02-21 of CCGbank (Hockenmaier and Steedman, 2007), the CCG version of the Penn Treebank (Marcus et al., 1993), as training data for the newspaper domain.
Introduction | Since the CCG lexical category set used by the supertagger is much larger than the Penn Treebank POS tag set, supertagging accuracy is much lower than POS tagging accuracy; hence the CCG supertagger assigns multiple supertags to a word when the local context does not provide enough information to decide on the correct supertag.
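One common way to realize such multi-assignment is a probability beam that keeps every category within a fixed factor of the most probable one; a minimal sketch, with the function name, input format and β value as illustrative assumptions:

```python
def multitag(tag_probs, beta=0.1):
    """Keep every supertag whose probability is within a factor
    `beta` of the most probable one, so ambiguous local contexts
    receive more than one CCG category."""
    p_max = max(tag_probs.values())
    return {t: p for t, p in tag_probs.items() if p >= beta * p_max}
```

With a sharper distribution the beam keeps a single category, while a flatter one passes several candidates on to the parser.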
Substructure Spaces for BTKs | Compared with the widely used Penn Treebank annotation, the new criterion uses a different set of grammar tags and is able to effectively describe some rare linguistic phenomena in Chinese.
Substructure Spaces for BTKs | The annotator still uses Penn Treebank annotation on the English side.
Substructure Spaces for BTKs | In addition, the HIT corpus is not applicable to our MT experiments due to problems of domain divergence, annotation discrepancy (the Chinese parse trees employ a different grammar from Penn Treebank annotations) and its degree of tolerance for parsing errors.