Index of papers in Proc. ACL 2012 that mention
  • Penn Treebank
Gardent, Claire and Narayan, Shashi
Conclusion
Using the Penn Treebank sentences associated with each SR Task dependency tree, we will create the two tree sets necessary to support error mining by dividing the set of trees output by the surface realiser into a set of trees (FAIL) associated with overgeneration (the generated sentences do not match the original sentences) and a set of trees (SUCCESS) associated with success (the generated sentence matches the original sentences).
Experiment and Results
The shallow input data provided by the SR Task was obtained from the Penn Treebank using the LTH Constituent-to-Dependency Conversion Tool for Penn-style Treebanks (Pennconverter, (Johansson and Nugues, 2007)).
Experiment and Results
The chunking was performed by retrieving from the Penn Treebank (PTB), for each phrase type, the yields of the constituents of that type and by using the alignment between words and dependency tree nodes provided by the organisers of the SR Task.
Experiment and Results
In the Penn Treebank, the POS tag is the category assigned to possessive ’s.
Related Work
(Callaway, 2003) avoids this shortcoming by converting the Penn Treebank to the format expected by his realiser.
Penn Treebank is mentioned in 8 sentences in this paper.
Pauls, Adam and Klein, Dan
Experiments
In Table 1, we show the first four samples of length between 15 and 20 generated from our model and a 5-gram model trained on the Penn Treebank.
Experiments
For training data, we constructed a large treebank by concatenating the WSJ and Brown portions of the Penn Treebank, the 50K BLLIP training sentences from Post (2011), and the AFP and APW portions of English Gigaword version 3 (Graff, 2003), totaling about 1.3 billion tokens.
Experiments
We used the human-annotated parses for the sentences in the Penn Treebank, but parsed the Gigaword and BLLIP sentences with the Berkeley Parser.
Tree Transformations
Figure 2: A sample parse from the Penn Treebank after the tree transformations described in Section 3.
Tree Transformations
Although the Penn Treebank annotates temporal NPs, most off-the-shelf parsers do not retain these tags, and we do not assume their presence.
Treelet Language Modeling
There is one additional hurdle in the estimation of our model: while there exist corpora with human-annotated constituency parses like the Penn Treebank (Marcus et al., 1993), these corpora are quite small, on the order of millions of tokens, and we cannot gather nearly as many counts as we can for n-grams, for which billions or even trillions (Brants et al., 2007) of tokens are available on the Web.
Penn Treebank is mentioned in 6 sentences in this paper.
Zhao, Qiuye and Marcus, Mitch
Abstract
Following this POS representation, there are as many as 10 possible POS tags that may occur in between the-of, as estimated from the WSJ corpus of Penn Treebank.
Abstract
To explore determinacy in the distribution of POS tags in Penn Treebank, we need to consider that a POS tag marks the basic syntactic category of a word as well as its morphological inflection.
Abstract
Table 1: Morph features of frequent words and rare words as computed from the WSJ Corpus of Penn Treebank.
Penn Treebank is mentioned in 4 sentences in this paper.
Shindo, Hiroyuki and Miyao, Yusuke and Fujino, Akinori and Nagata, Masaaki
Abstract
Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
Experiment
We ran experiments on the Wall Street Journal (WSJ) portion of the English Penn Treebank data set (Marcus et al., 1993), using a standard data split (sections 2-21 for training, 22 for development and 23 for testing).
Introduction
Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
Penn Treebank is mentioned in 3 sentences in this paper.