Index of papers in Proc. ACL 2008 that mention
  • Treebank
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Abstract
We devise a gold-standard sense- and parse tree-annotated dataset based on the intersection of the Penn Treebank and SemCor, and experiment with different approaches to both semantic representation and disambiguation.
Background
Traditionally, the two parsers have been trained and evaluated over the WSJ portion of the Penn Treebank (PTB; Marcus et al., 1993).
Background
We diverge from this norm in focusing exclusively on a sense-annotated subset of the Brown Corpus portion of the Penn Treebank, in order to investigate the upper bound performance of the models given gold-standard sense information.
Background
most closely related research is that of Bikel (2000), who merged the Brown portion of the Penn Treebank with SemCor (similarly to our approach in Section 4.1), and used this as the basis for evaluation of a generative bilexical model for joint WSD and parsing.
Experimental setting
The only publicly-available resource with these two characteristics at the time of this work was the subset of the Brown Corpus that is included in both SemCor (Landes et al., 1998) and the Penn Treebank (PTB).2 This provided the basis of our dataset.
Experimental setting
2OntoNotes (Hovy et al., 2006) includes large-scale treebank and (selective) sense data, which we plan to use for future experiments when it becomes fully available.
Introduction
Our approach to exploring the impact of lexical semantics on parsing performance is to take two state-of-the-art statistical treebank parsers and pre-process the inputs variously.
Introduction
This simple method allows us to incorporate semantic information into the parser without having to reimplement a full statistical parser, and also allows for maximum comparability with existing results in the treebank parsing community.
Introduction
We provide the first definitive results that word sense information can enhance Penn Treebank parser performance, building on earlier results of Bikel (2000) and Xiong et al.
Treebank is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Dickinson, Markus
Abstract
We outline the problem of ad hoc rules in treebanks, rules used for specific constructions in one data set and unlikely to be used again.
Abstract
Based on a simple notion of rule equivalence and on the idea of finding rules unlike any others, we develop two methods for detecting ad hoc rules in flat treebanks and show they are successful in detecting such rules.
Background
Treebank (Marcus et al., 1993), six of which are errors.
Background
For example, in (2), the daughters list RB TO JJ NNS is a daughters list with no correlates in the treebank; it is erroneous because close to wholesale needs another layer of structure, namely adjective phrase (ADJP) (Bies et al., 1995, p. 179).
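The idea of flagging daughters lists with no correlates can be sketched in a few lines of Python. This is a simplified illustration, not the paper's actual method (which also relies on a notion of rule equivalence); the function and variable names are made up:

```python
from collections import Counter

def flag_ad_hoc_rules(rules):
    """Flag rules whose daughters list has no correlates in the data.

    `rules` is an iterable of (parent, daughters) pairs extracted from
    a flat treebank. A daughters list that occurs only once, unlike any
    other, is a candidate ad hoc (possibly erroneous) rule. This sketch
    uses exact-match counting only.
    """
    counts = Counter(daughters for _, daughters in rules)
    return [(parent, daughters) for parent, daughters in rules
            if counts[daughters] == 1]

rules = [
    ("NP", ("DT", "NN")),
    ("NP", ("DT", "NN")),
    ("ADVP", ("RB", "TO", "JJ", "NNS")),  # the daughters list from (2) above
]
print(flag_ad_hoc_rules(rules))  # [('ADVP', ('RB', 'TO', 'JJ', 'NNS'))]
```

A real detector would additionally group near-equivalent daughters lists before counting, so that a rule differing only trivially from common rules is not flagged.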
Introduction and Motivation
When extracting rules from constituency-based treebanks employing flat structures, grammars often limit the set of rules (e.g., Charniak, 1996), due to the large number of rules (Krotov et al., 1998) and “leaky” rules that can lead to mis-analysis (Foth and Menzel, 2006).
Introduction and Motivation
Thus, we need to carefully consider the applicability of rules in a treebank to new text.
Introduction and Motivation
For example, when ungrammatical or nonstandard text is used, treebanks employ rules to cover it, but do not usually indicate ungrammaticality in the annotation.
Treebank is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Goldberg, Yoav and Tsarfaty, Reut
Abstract
Using a treebank grammar, a data-driven lexicon, and a linguistically motivated unknown-tokens handling technique our model outperforms previous pipelined, integrated or factorized systems for Hebrew morphological and syntactic processing, yielding an error reduction of 12% over the best published results so far.
Discussion and Conclusion
The overall performance of our joint framework demonstrates that a probability distribution obtained over mere syntactic contexts using a Treebank grammar and a data-driven lexicon outperforms upper bounds proposed by previous joint disambiguation systems and achieves segmentation and parsing results on a par with state-of-the-art standalone applications results.
Experimental Setup
Data We use the Hebrew Treebank (Sima’an et al., 2001), provided by the knowledge center for processing Hebrew, in which sentences from the daily newspaper “Ha’aretz” are morphologically segmented and syntactically annotated.
Experimental Setup
The treebank has two versions, v1.0 and v2.0, containing 5001 and 6501 sentences respectively.
Experimental Setup
6Unfortunately, running our setup on the v2.0 data set is currently not possible due to missing tokens-morphemes alignment in the v2.0 treebank.
Introduction
Morphological segmentation decisions in our model are delegated to a lexeme-based PCFG and we show that using a simple treebank grammar, a data-driven lexicon, and a linguistically motivated unknown-tokens handling our model outperforms (Tsarfaty, 2006) and (Cohen and Smith, 2007) on the joint task and achieves state-of-the-art results on a par with current respective standalone models.2
Previous Work on Hebrew Processing
The development of the very first Hebrew Treebank (Sima’an et al., 2001) called for the exploration of general statistical parsing methods, but the application was at first limited.
Previous Work on Hebrew Processing
Tsarfaty (2006) was the first to demonstrate that fully automatic Hebrew parsing is feasible using the newly available 5,000-sentence treebank.
Treebank is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Abstract
Since exact inference is intractable with nonlocal features, we present an approximate algorithm inspired by forest rescoring that makes discriminative training practical over the whole Treebank.
Abstract
Our final result, an F-score of 91.7, outperforms both 50-best and 100-best reranking baselines, and is better than any previously reported system trained on the Treebank.
Conclusion
With efficient approximate decoding, perceptron training on the whole Treebank becomes practical, which can be done in about a day even with a Python implementation.
Conclusion
Our final result outperforms both 50-best and 100-best reranking baselines, and is better than any previously reported system trained on the Treebank.
Experiments
We compare the performance of our forest reranker against n-best reranking on the Penn English Treebank (Marcus et al., 1993).
Experiments
We use the standard split of the Treebank: sections 02-21 as the training data (39832 sentences), section 22 as the development set (1700 sentences), and section 23 as the test set (2416 sentences).
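The conventional split described here is easy to reproduce programmatically; a minimal helper (the function name and the "unused" label are illustrative, not from the paper):

```python
def ptb_split(section: int) -> str:
    """Map a WSJ section number (0-24) to the conventional PTB split:
    sections 02-21 train, section 22 dev, section 23 test."""
    if 2 <= section <= 21:
        return "train"
    if section == 22:
        return "dev"
    if section == 23:
        return "test"
    return "unused"

print(ptb_split(5), ptb_split(22), ptb_split(23))  # train dev test
```

Using the same split makes F-scores directly comparable across the treebank parsing literature.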
Experiments
Our final result (91.7) is better than any previously reported system trained on the Treebank, although
Introduction
Although previous work on discriminative parsing has mainly focused on short sentences (≤ 15 words) (Taskar et al., 2004; Turian and Melamed, 2007), our work scales to the whole Treebank, where
Introduction
This result is also better than any previously reported system trained on the Treebank.
Packed Forests as Hypergraphs
However, in this work, we use forests from a Treebank parser (Charniak, 2000) whose grammar is often flat in many productions.
Treebank is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Mírovský, Jiří
Abstract
Linguistically annotated treebanks play an essential part in modern computational linguistics.
Abstract
The more complex the treebanks become, the more sophisticated tools are required for using them, namely for searching in the data.
Abstract
We study linguistic phenomena annotated in the Prague Dependency Treebank 2.0 and create a list of requirements these phenomena set on a search tool, especially on its query language.
Introduction
Searching in a linguistically annotated treebank is a principal task in modern computational linguistics.
Introduction
A search tool helps extract useful information from the treebank, in order to study the language, the annotation system or even to search for errors in the annotation.
Introduction
The more complex the treebank is, the more sophisticated the search tool and its query language need to be.
Treebank is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Miyao, Yusuke and Saetre, Rune and Sagae, Kenji and Matsuzaki, Takuya and Tsujii, Jun'ichi
Conclusion and Future Work
Although we restricted ourselves to parsers trainable with Penn Treebank-style treebanks, our methodology can be applied to any English parser.
Evaluation Methodology
It should be noted, however, that this conversion cannot work perfectly with automatic parsing, because the conversion program relies on function tags and empty categories of the original Penn Treebank .
Evaluation Methodology
Next, we run parsers retrained with GENIA11 (8127 sentences), which is a Penn Treebank-style treebank of biomedical paper abstracts.
Evaluation Methodology
10Some of the parser packages include parsing models trained with extended data, but we used the models trained with WSJ section 2-21 of the Penn Treebank .
Introduction
This assumes the existence of a gold-standard test corpus, such as the Penn Treebank (Marcus et al., 1994).
Introduction
Most state-of-the-art parsers for English were trained with the Wall Street Journal (WSJ) portion of the Penn Treebank, and high accuracy has been reported for WSJ text; however, these parsers rely on lexical information to attain high accuracy, and it has been criticized that these parsers may overfit to WSJ text (Gildea, 2001;
Introduction
When training data in the target domain is available, as is the case with the GENIA Treebank (Kim et al., 2003) for biomedical papers, a parser can be retrained to adapt to the target domain, and larger accuracy improvements are expected, if the training method is sufficiently general.
Syntactic Parsers and Their Representations
In general, our evaluation methodology can be applied to English parsers based on any framework; however, in this paper, we chose parsers that were originally developed and trained with the Penn Treebank or its variants, since such parsers can be retrained with GENIA, thus allowing us to investigate the effect of domain adaptation.
Syntactic Parsers and Their Representations
Owing largely to the Penn Treebank, the mainstream of data-driven parsing research has been dedicated to phrase structure parsing.
Syntactic Parsers and Their Representations
ENJU: The HPSG parser that consists of an HPSG grammar extracted from the Penn Treebank, and a maximum entropy model trained with an HPSG treebank derived from the Penn Treebank.7
Treebank is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Dridan, Rebecca and Kordoni, Valia and Nicholson, Jeremy
Background
In the case study we describe here, the tools, grammars and treebanks we use are taken from work carried out in the DELPH-IN1 collaboration.
Background
We also use the PET parser, and the [incr tsdb()] system profiler and treebanking tool (Oepen, 2001) for evaluation.
Parser Restriction
The data set used for these experiments is the jh5 section of the treebank released with the ERG.
Parser Restriction
Since a gold standard treebank for our data set was available, it was possible to evaluate the accuracy of the parser.
Parser Restriction
Consequently, we developed a Maximum Entropy model for supertagging using the OpenNLP implementation.2 Similarly to Zhang and Kordoni (2006), we took training data from the gold-standard lexical types in the treebank associated with ERG (in our case, the July-07 version).
Unknown Word Handling
Four sets are English text: jh5 described in Section 3; tree consisting of questions from TREC and included in the treebanks released with the ERG; a00 which is taken from the BNC and consists of factsheets and newsletters; and depbank, the 700 sentences of the Briscoe and Carroll version of DepBank (Briscoe and Carroll, 2006) taken from the Wall Street Journal.
Unknown Word Handling
The last two data sets are German text: clef700, consisting of German questions taken from the CLEF competition, and eiche564, a sample of sentences taken from a treebank parsed with the German HPSG grammar GG, consisting of transcribed German speech data concerning appointment scheduling from the Verbmobil project.
Unknown Word Handling
Since the primary effect of adding POS tags is shown with those data sets for which we do not have gold standard treebanks, evaluating accuracy in this case is more difficult.
Treebank is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Koo, Terry and Carreras, Xavier and Collins, Michael
Abstract
We demonstrate the effectiveness of the approach in a series of dependency parsing experiments on the Penn Treebank and Prague Dependency Treebank, and we show that the cluster-based features yield substantial gains in performance across a wide range of conditions.
Conclusions
A natural avenue for further research would be the development of clustering algorithms that reflect the syntactic behavior of words; e.g., an algorithm that attempts to maximize the likelihood of a treebank, according to a probabilistic dependency model.
Experiments
The English experiments were performed on the Penn Treebank (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank to a dependency tree representation.6 We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
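Head-based conversion of this kind can be sketched in a few lines. This is a toy illustration of head percolation with a made-up rule table, not the actual Yamada and Matsumoto rules:

```python
def head_word(tree, rules):
    """Return the lexical head of a tree. Leaves are word strings;
    internal nodes are (label, children) pairs. `rules` maps a parent
    label to an ordered list of preferred head-child labels; the
    fallback is the rightmost child."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    for preferred in rules.get(label, []):
        for child in children:
            if not isinstance(child, str) and child[0] == preferred:
                return head_word(child, rules)
    return head_word(children[-1], rules)

def to_dependencies(tree, rules, deps=None):
    """Collect (dependent, head) word pairs: each non-head child's
    head word depends on the head word of its parent node."""
    if deps is None:
        deps = []
    if isinstance(tree, str):
        return deps
    _, children = tree
    head = head_word(tree, rules)
    for child in children:
        child_head = head_word(child, rules)
        if child_head != head:
            deps.append((child_head, head))
        to_dependencies(child, rules, deps)
    return deps

tree = ("S", [("NP", ["the", "dog"]), ("VP", ["barks"])])
rules = {"S": ["VP"]}  # hypothetical head rule: VP heads S
print(to_dependencies(tree, rules))  # [('dog', 'barks'), ('the', 'dog')]
```

Real conversion tables also handle coordination, punctuation, and direction of search, but the percolation principle is the same.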
Experiments
The Czech experiments were performed on the Prague Dependency Treebank 1.0 (Hajic, 1998; Hajic et al., 2001), which is directly annotated with dependency structures.
Experiments
9We ensured that the sentences of the Penn Treebank were excluded from the text used for the clustering.
Introduction
We show that our semi-supervised approach yields improvements for fixed datasets by performing parsing experiments on the Penn Treebank (Marcus et al., 1993) and Prague Dependency Treebank (Hajic, 1998; Hajic et al., 2001) (see Sections 4.1 and 4.3).
Treebank is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Vadas, David and Curran, James R.
Abstract
This is a significant problem for CCGbank, where binary branching NP derivations are often incorrect, a result of the automatic conversion from the Penn Treebank.
Background
Recently, Vadas and Curran (2007a) annotated internal NP structure for the entire Penn Treebank, providing a large gold-standard corpus for NP bracketing.
Conversion Process
We apply one preprocessing step on the Penn Treebank data, where if multiple tokens are enclosed by brackets, then an NML node is placed around those
Conversion Process
Since we are applying these to CCGbank NP structures rather than the Penn Treebank, the POS tag based heuristics are sufficient to determine heads accurately.
Experiments
Vadas and Curran (2007a) experienced a similar drop in performance on Penn Treebank data, and noted that the F-score for NML and JJP brackets was about 20% lower than the overall figure.
Introduction
This is because their training data, the Penn Treebank (Marcus et al., 1993), does not fully annotate NP structure.
Introduction
The flat structure described by the Penn Treebank can be seen in this example:
Introduction
CCGbank (Hockenmaier and Steedman, 2007) is the primary English corpus for Combinatory Categorial Grammar (CCG) (Steedman, 2000) and was created by a semiautomatic conversion from the Penn Treebank .
Treebank is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Koller, Alexander and Regneri, Michaela and Thater, Stefan
Computing best configurations
In practice, we can extract the best reading of the most ambiguous sentence in the Rondane treebank (4.5 x 1012 readings, 75 000 grammar rules) with random soft edges in about a second.
Expressive completeness and redundancy elimination
For instance, the following sentence from the Rondane treebank is analyzed as having six quantifiers and 480 readings by the ERG grammar; these readings fall into just two semantic equivalence classes, characterized by the relative scope of “the lee of” and “a small hillside”.
Expressive completeness and redundancy elimination
To measure the extent to which the new algorithm improves upon KT06, we compare both algorithms on the USRs in the Rondane treebank (version of January 2006).
Expressive completeness and redundancy elimination
The Rondane treebank is a “Redwoods style” treebank (Oepen et al., 2002) containing MRS-based underspecified representations for sentences from the tourism domain, and is distributed together with the English Resource Grammar (ERG) (Copestake and Flickinger, 2000).
Regular tree grammars
2 compares the average number of configurations and the average number of RTG production rules for USRs of increasing sizes in the Rondane treebank (see Sect.
Regular tree grammars
Computing the charts for all 999 MRS-nets in the treebank takes about 45 seconds.
Treebank is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bender, Emily M.
Background
It is compatible with the broader range of DELPH-IN tools, e.g., for machine translation (Lonning and Oepen, 2006), treebanking (Oepen et al., 2004) and parse selection (Toutanova et al., 2005).
Introduction
treebanks.
Wambaya grammar
With no prior knowledge of this language beyond its most general typological properties, we were able to develop in under 5.5 person-weeks of development time (210 hours) a grammar able to assign appropriate analyses to 91% of the examples in the development set.4 The 210 hours include 25 hours of an RA’s time entering lexical entries, 7 hours spent preparing the development test suite, and 15 hours treebanking (using the LinGO Redwoods software (Oepen et al., 2004) to annotate the intended parse for each item).
Wambaya grammar
The resulting treebank was used to select appropriate parameters by 10-fold cross-validation, applying the experimentation environment and feature templates of (Velldal, 2007).
Treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Qin Iris and Schuurmans, Dale and Lin, Dekang
Conclusion and Future Work
One obvious direction is to use the whole Penn Treebank as labeled data and use some other unannotated data source as unlabeled data for semi-supervised training.
Experimental Results
For the experiment on English, we used the English Penn Treebank (PTB) (Marcus et al., 1993), and the constituency structures were converted to dependency trees using the same rules as Yamada and Matsumoto (2003).
Experimental Results
For Chinese, we experimented on the Penn Chinese Treebank 4.0 (CTB4) (Palmer et al., 2004) and we used the rules in (Bikel, 2004) for conversion.
Experimental Results
We evaluate parsing accuracy by comparing the directed dependency links in the parser output against the directed links in the treebank.
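Directed-link comparison of this kind is the standard unlabeled attachment score. A minimal sketch (generic, not the authors' exact scoring script; names are illustrative):

```python
def unlabeled_attachment_score(gold_heads, pred_heads):
    """Fraction of tokens whose predicted head index matches the gold
    head index (0 denotes the artificial root)."""
    assert len(gold_heads) == len(pred_heads)
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)

# 4-token sentence; the parser gets one head wrong
gold = [2, 2, 0, 2]
pred = [2, 3, 0, 2]
print(unlabeled_attachment_score(gold, pred))  # 0.75
```

Because every token contributes exactly one directed link to its head, the score is simply the proportion of correctly attached tokens.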
Treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Experiments
The training corpus for the Mo-ME model consists of the Chinese Penn Treebank and the Chinese part of the LDC parallel corpus with about 2 million sentences.
Introduction
According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
Introduction
Table 1 shows the distribution of the relative positions of head words around measure words in the Chinese Penn Treebank, where a negative position indicates that the head word is to the left of the measure word and a positive position indicates that the head word is to the right of the measure word.
Our Method
According to our survey, about 70.4% of measure words in the Chinese Penn Treebank need
Treebank is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Espinosa, Dominic and White, Michael and Mehay, Dennis
Background
(2007) describe an ongoing effort to engineer a grammar from the CCGbank (Hockenmaier and Steedman, 2007) — a corpus of CCG derivations derived from the Penn Treebank — suitable for realization with OpenCCG.
Related Work
Our approach follows Langkilde-Geary (2002) and Callaway (2003) in aiming to leverage the Penn Treebank to develop a broad-coverage surface realizer for English.
Related Work
However, while these earlier, generation-only approaches made use of converters for transforming the outputs of Treebank parsers to inputs for realization, our approach instead employs a shared bidirectional grammar, so that the input to realization is guaranteed to be the same logical form constructed by the parser.
Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kaufmann, Tobias and Pfister, Beat
Introduction
These models have in common that they explicitly or implicitly use a context-free grammar induced from a treebank, with the exception of Chelba and Jelinek (2000).
Language Model 2.1 The General Approach
In order to compute the probability of a parse tree, it is transformed to a flat dependency tree similar to the syntax graph representation used in the TIGER treebank (Brants et al., 2002).
Language Model 2.1 The General Approach
The resulting probability distributions were trained on the German TIGER treebank, which consists of about 50,000 sentences of newspaper text.
Treebank is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: