Index of papers in Proc. ACL 2008 that mention
  • Penn Treebank
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Abstract
We devise a gold-standard sense- and parse tree-annotated dataset based on the intersection of the Penn Treebank and SemCor, and experiment with different approaches to both semantic representation and disambiguation.
Background
Traditionally, the two parsers have been trained and evaluated over the WSJ portion of the Penn Treebank (PTB: Marcus et al.
Background
We diverge from this norm in focusing exclusively on a sense-annotated subset of the Brown Corpus portion of the Penn Treebank, in order to investigate the upper bound performance of the models given gold-standard sense information.
Background
The most closely related research is that of Bikel (2000), who merged the Brown portion of the Penn Treebank with SemCor (similarly to our approach in Section 4.1), and used this as the basis for evaluation of a generative bilexical model for joint WSD and parsing.
Conclusions
As far as we know, these are the first results over both WordNet and the Penn Treebank to show that semantic processing helps parsing.
Experimental setting
The only publicly-available resource with these two characteristics at the time of this work was the subset of the Brown Corpus that is included in both SemCor (Landes et al., 1998) and the Penn Treebank (PTB). This provided the basis of our dataset.
Introduction
We provide the first definitive results that word sense information can enhance Penn Treebank parser performance, building on earlier results of Bikel (2000) and Xiong et al.
Penn Treebank is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Vadas, David and Curran, James R.
Abstract
This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank.
Background
Recently, Vadas and Curran (2007a) annotated internal NP structure for the entire Penn Treebank, providing a large gold-standard corpus for NP bracketing.
Conversion Process
We apply one preprocessing step on the Penn Treebank data, where if multiple tokens are enclosed by brackets, then an NML node is placed around those
Conversion Process
Since we are applying these to CCGbank NP structures rather than the Penn Treebank, the POS tag based heuristics are sufficient to determine heads accurately.
Experiments
Vadas and Curran (2007a) experienced a similar drop in performance on Penn Treebank data, and noted that the F-score for NML and JJP brackets was about 20% lower than the overall figure.
Introduction
This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure.
Introduction
The flat structure described by the Penn Treebank can be seen in this example:
Introduction
CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semiautomatic conversion from the Penn Treebank.
Penn Treebank is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Miyao, Yusuke and Saetre, Rune and Sagae, Kenji and Matsuzaki, Takuya and Tsujii, Jun'ichi
Evaluation Methodology
It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank .
Evaluation Methodology
Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ sections 2-21 of the Penn Treebank.
Introduction
This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994).
Introduction
Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and it has been criticized that these parsers may overfit to WSJ text (Gildea, 2001;
Syntactic Parsers and Their Representations
In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
Syntactic Parsers and Their Representations
Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing.
Syntactic Parsers and Their Representations
ENJU The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.
Penn Treebank is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Abstract
We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions.
Experiments
The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank to a dependency tree representation. We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
Experiments
We ensured that the sentences of the Penn Treebank were excluded from the text used for the clustering.
Introduction
We show that our semi-supervised approach yields improvements for fixed datasets by performing parsing experiments on the Penn Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic, 1998; Hajic et al., 2001) (see Sections 4.1 and 4.3).
Penn Treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Introduction
According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
Introduction
Table 1 shows the distribution of head words' positions relative to measure words in the Chinese Penn Treebank, where a negative position indicates that the head word is to the left of the measure word and a positive position indicates that it is to the right.
Our Method
According to our survey, about 70.4% of measure words in the Chinese Penn Treebank need
Penn Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: