Index of papers in Proc. ACL 2010 that mention
  • Penn Treebank
Fowler, Timothy A. D. and Penn, Gerald
A Latent Variable CCG Parser
Unlike the context-free grammars extracted from the Penn treebank, these allow for the categorial semantics that accompanies any categorial parse and for a more elegant analysis of linguistic structures such as extraction and coordination.
A Latent Variable CCG Parser
In Petrov’s experiments on the Penn treebank, the syntactic category NP was refined to the more fine-grained NP1 and NP2, roughly corresponding to NPs in subject and object positions.
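To make the refinement concrete, here is a minimal sketch of latent-variable symbol splitting, assuming each coarse nonterminal is split into k indexed subsymbols; the symbol names and the toy rule are illustrative, not taken from the Petrov parser itself:

```python
# Toy illustration of latent-variable grammar refinement: each
# coarse symbol X is split into subsymbols X-0 .. X-(k-1); rule
# probabilities over the refined symbols would then be
# re-estimated with EM (not shown).
from itertools import product

def split_symbol(symbol, k=2):
    """Refine one coarse nonterminal into k latent subsymbols."""
    return [f"{symbol}-{i}" for i in range(k)]

def split_rule(lhs, rhs, k=2):
    """Expand a coarse rule into all combinations of refined symbols."""
    lhs_splits = split_symbol(lhs, k)
    rhs_splits = [split_symbol(s, k) for s in rhs]
    return [(l, list(r)) for l in lhs_splits for r in product(*rhs_splits)]

# The coarse rule S -> NP VP yields 2 * 2 * 2 = 8 refined rules,
# e.g. S-0 -> NP-0 VP-1; during training, splits like NP-0/NP-1
# can come to specialize for subject vs. object positions.
for lhs, rhs in split_rule("S", ["NP", "VP"]):
    print(lhs, "->", " ".join(rhs))
```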
A Latent Variable CCG Parser
In the supertagging literature, POS tagging and supertagging are distinguished — POS tags are the traditional Penn treebank tags (e.g., NN, VBZ, and DT) and supertags are CCG categories.
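A toy contrast between the two tag types (the tag assignments below are hand-picked for illustration, not the output of any tagger):

```python
# Illustrative contrast between coarse PTB POS tags and CCG
# supertags for the same sentence; the tags are hand-chosen.
sentence = ["The", "company", "buys", "shares"]

# Penn Treebank POS tags: a small, closed tag set.
pos_tags = ["DT", "NN", "VBZ", "NNS"]

# CCG supertags: rich lexical categories encoding subcategorization;
# "(S\NP)/NP" marks a transitive verb seeking an object NP to its
# right and a subject NP to its left.
supertags = ["NP/N", "N", r"(S\NP)/NP", "N"]

for word, pos, cat in zip(sentence, pos_tags, supertags):
    print(f"{word:10} POS={pos:4} supertag={cat}")
```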
Introduction
The Petrov parser (Petrov and Klein, 2007) uses latent variables to refine the grammar extracted from a corpus, improving accuracy; it was originally used to improve parsing results on the Penn treebank (PTB).
Introduction
These results should not be interpreted as proof that grammars extracted from the Penn treebank and from CCGbank are equivalent.
The Language Classes of Combinatory Categorial Grammars
CCGbank (Hockenmaier and Steedman, 2007) is a corpus of CCG derivations that was semiautomatically converted from the Wall Street Journal section of the Penn treebank.
Penn Treebank is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Honnibal, Matthew and Curran, James R. and Bos, Johan
Background and motivation
Statistical parsers induce their grammars from corpora, and the corpora for linguistically motivated formalisms currently do not contain high quality predicate-argument annotation, because they were derived from the Penn Treebank (PTB; Marcus et al., 1993).
Combining CCGbank corrections
The structure of such compound noun phrases is left underspecified in the Penn Treebank (PTB), because the annotation procedure involved stitching together partial parses produced by the Fidditch parser (Hindle, 1983), which produced flat brackets for these constructions.
Combining CCGbank corrections
The syntactic analysis of punctuation is notoriously difficult, and punctuation is not always treated consistently in the Penn Treebank (Bies et al., 1995).
Conclusion
The most cited computational linguistics work to date is the Penn Treebank (Marcus et al., 1993).
Introduction
We chose to work on CCGbank (Hockenmaier and Steedman, 2007), a Combinatory Categorial Grammar (Steedman, 2000) treebank acquired from the Penn Treebank (Marcus et al., 1993).
Noun predicate-argument structure
Our analysis requires semantic role labels for each argument of the nominal predicates in the Penn Treebank — precisely what NomBank (Meyers et al., 2004) provides.
Noun predicate-argument structure
First, we align CCGbank and the Penn Treebank, and produce a version of NomBank that refers to CCGbank nodes.
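A rough sketch of what such a node-level alignment might look like, assuming the two corpora share a tokenization and CCGbank constituents are represented as token spans; all data structures here are invented for illustration:

```python
# Hypothetical alignment sketch: map a NomBank argument span,
# given as PTB token offsets, onto the smallest CCGbank node
# covering the same tokens. CCGbank nodes are modeled here
# simply as (start, end) token spans.
def find_covering_node(ccg_spans, start, end):
    """Return the smallest CCGbank span covering [start, end)."""
    candidates = [(s, e) for (s, e) in ccg_spans if s <= start and end <= e]
    return min(candidates, key=lambda se: se[1] - se[0]) if candidates else None

# Toy derivation spans for a 5-token sentence.
ccg_spans = [(0, 5), (0, 2), (2, 5), (2, 3), (3, 5),
             (0, 1), (1, 2), (3, 4), (4, 5)]

# A NomBank argument covering tokens 3-4 maps to the node (3, 5).
print(find_covering_node(ccg_spans, 3, 5))  # -> (3, 5)
```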
Penn Treebank is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ide, Nancy and Baker, Collin and Fellbaum, Christiane and Passonneau, Rebecca
Introduction
The most well-known multiply-annotated and validated corpus of English is the one million word Wall Street Journal corpus known as the Penn Treebank (Marcus et al., 1993), which over the years has been fully or partially annotated for several phenomena over and above the original part-of-speech tagging and phrase structure annotation.
Introduction
More recently, the OntoNotes project (Pradhan et al., 2007) released a one million word English corpus of newswire, broadcast news, and broadcast conversation that is annotated for Penn Treebank syntax, PropBank predicate argument structures, coreference, and named entities.
MASC Annotations
Annotation         Method     Texts  Words
Token              Validated  118    222472
Sentence           Validated  118    222472
POS/lemma          Validated  118    222472
Noun chunks        Validated  118    222472
Verb chunks        Validated  118    222472
Named entities     Validated  118    222472
FrameNet frames    Manual     21     17829
HPSG               Validated  40*    30106
Discourse          Manual     40*    30106
Penn Treebank      Validated  97     87383
PropBank           Validated  92     50165
Opinion            Manual     97     47583
TimeBank           Validated  34     5434
Committed belief   Manual     13     4614
Event              Manual     13     4614
Coreference        Manual     2      1877
MASC Annotations
Annotations produced by other projects and the FrameNet and Penn Treebank annotations produced specifically for MASC are semiautomatically and/or manually produced by those projects and subjected to their internal quality controls.
MASC: The Corpus
All of the first 80K increment is annotated for Penn Treebank syntax.
MASC: The Corpus
The second 120K increment includes 5.5K words of Wall Street Journal texts that have been annotated by several projects, including Penn Treebank, PropBank, Penn Discourse Treebank, TimeML, and the Pittsburgh Opinion project.
Penn Treebank is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Experiments
We used three corpora for experiments: WSJ from Penn Treebank, Wikipedia, and the general Web.
Experiments
In contrast, TextRunner was trained with 91,687 positive examples and 96,795 negative examples generated from the WSJ dataset in Penn Treebank.
Experiments
We used three parsing options on the WSJ dataset: Stanford parsing, CJ50 parsing (Charniak and Johnson, 2005), and the gold parses from the Penn Treebank.
Introduction
For example, TextRunner uses a small set of handwritten rules to heuristically label training examples from sentences in the Penn Treebank .
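A toy sketch of this style of heuristic labeling, with an invented rule standing in for TextRunner's actual handwritten heuristics:

```python
# Made-up heuristic labeling rule: mark a candidate extraction as a
# positive training example if the parse path between its arguments
# is short and passes through a verb; otherwise label it negative.
def is_positive(parse_path):
    """parse_path: list of (word, POS tag) pairs between the arguments."""
    short_enough = len(parse_path) <= 4
    through_verb = any(tag.startswith("VB") for _, tag in parse_path)
    return short_enough and through_verb

# Path between "company" and "shares" in a toy sentence.
path = [("buys", "VBZ")]
print(is_positive(path))  # True -> positive example
```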
Wikipedia-based Open IE
In both cases, however, we generate training data from Wikipedia by matching sentences with infoboxes, while TextRunner used a small set of handwritten rules to label training examples from the Penn Treebank .
Penn Treebank is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Gerber, Matthew and Chai, Joyce
Data annotation and analysis
Implicit arguments have not been annotated within the Penn TreeBank, which is the textual and syntactic basis for NomBank.
Implicit argument identification
Consider the following abridged sentences, which are adjacent in their Penn TreeBank document:
Implicit argument identification
Starting with a wide range of features, we performed floating forward feature selection (Pudil et al., 1994) over held-out development data comprising implicit argument annotations from section 24 of the Penn TreeBank.
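For reference, a minimal sketch of sequential floating forward selection in the spirit of Pudil et al. (1994), with a toy scoring function standing in for evaluation on held-out data:

```python
# Floating forward feature selection: greedily add the single best
# remaining feature, then "float" backwards, dropping features as
# long as dropping improves the held-out score.
def sffs(features, score):
    selected, best = [], float("-inf")
    while True:
        # Forward step: try adding each remaining feature.
        gains = [(score(selected + [f]), f) for f in features if f not in selected]
        if not gains:
            break
        new_score, f = max(gains)
        if new_score <= best:
            break  # no addition helps; stop
        selected.append(f)
        best = new_score
        # Backward (floating) step: drop features while it helps.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for g in list(selected):
                trial = [x for x in selected if x != g]
                if score(trial) > best:
                    selected, best, improved = trial, score(trial), True
    return selected, best

# Toy score that rewards features "a" and "b" and penalizes size.
feats = ["a", "b", "c"]
score = lambda s: len(set(s) & {"a", "b"}) - 0.1 * len(s)
print(sffs(feats, score))  # (['b', 'a'], 1.8)
```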
Introduction
However, as shown by the following example from the Penn TreeBank (Marcus et al., 1993), this restriction excludes extra-sentential arguments:
Penn Treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Background
This method has been used effectively to improve parsing performance on newspaper text (McClosky et al., 2006a), as well as adapting a Penn Treebank parser to a new domain (McClosky et al., 2006b).
Data
We have used Sections 02-21 of CCGbank (Hockenmaier and Steedman, 2007), the CCG version of the Penn Treebank (Marcus et al., 1993), as training data for the newspaper domain.
Introduction
Since the CCG lexical category set used by the supertagger is much larger than the Penn Treebank POS tag set, the accuracy of supertagging is much lower than POS tagging; hence the CCG supertagger assigns multiple supertags to a word, when the local context does not provide enough information to decide on the correct supertag.
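A minimal sketch of such multitagging, assuming the common scheme of keeping every category whose probability is within a factor beta of the best one; the categories and probabilities are invented:

```python
# Multitagging sketch: instead of committing to the single best
# supertag, keep all categories within a factor `beta` of the most
# probable one for the word.
def multitag(category_probs, beta=0.1):
    """category_probs: dict mapping CCG category -> probability."""
    best = max(category_probs.values())
    return [c for c, p in category_probs.items() if p >= beta * best]

probs = {"NP": 0.05, "N": 0.55, r"(S\NP)/NP": 0.30, "N/N": 0.10}
print(multitag(probs, beta=0.3))  # ['N', '(S\\NP)/NP']
```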
Penn Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Baldridge, Jason and Knight, Kevin
Data
CCGbank was created by semiautomatically converting the Penn Treebank to CCG derivations (Hockenmaier and Steedman, 2007).
Introduction
Most work has focused on POS-tagging for English using the Penn Treebank (Marcus et al., 1993), such as (Banko and Moore, 2004; Goldwater and Griffiths, 2007; Toutanova and Johnson, 2008; Goldberg et al., 2008; Ravi and Knight, 2009).
Introduction
This generally involves working with the standard set of 45 POS-tags employed in the Penn Treebank.
Penn Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Substructure Spaces for BTKs
Compared with the widely used Penn TreeBank annotation, the new criterion uses different grammar tags and can more effectively describe some rare linguistic phenomena in Chinese.
Substructure Spaces for BTKs
The annotator still uses Penn TreeBank annotation on the English side.
Substructure Spaces for BTKs
In addition, the HIT corpus is not applicable for MT experiments due to domain divergence, annotation discrepancy (its Chinese parse trees employ a grammar different from the Penn Treebank annotations), and its degree of tolerance for parsing errors.
Penn Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: