Index of papers in Proc. ACL 2011 that mention
  • Treebank
Das, Dipanjan and Petrov, Slav
Experiments and Results
For monolingual treebank data we relied on the CoNLL-X and CoNLL-2007 shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
Experiments and Results
We extracted only the words and their POS tags from the treebanks.
Experiments and Results
(2011) provide a mapping A from the fine-grained language-specific POS tags in the foreign treebank to the universal POS tags.
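A minimal sketch of how such a deterministic tag mapping is applied (the tag pairs and the helper name to_universal below are illustrative assumptions, not the actual Petrov et al. (2011) mapping files):

# Hypothetical fine-grained -> universal tag pairs, for illustration only.
A = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ",  "RB": "ADV",
}

def to_universal(tagged_sentence, mapping, default="X"):
    # Replace each fine-grained tag with its universal tag; unknown tags fall back to "X".
    return [(word, mapping.get(tag, default)) for word, tag in tagged_sentence]

print(to_universal([("dogs", "NNS"), ("bark", "VB")], A))
# [('dogs', 'NOUN'), ('bark', 'VERB')]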
Graph Construction
We used a tagger based on a trigram Markov model (Brants, 2000) trained on the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993), for its fast speed and reasonable accuracy (96.7% on sections 22-24 of the treebank, but presumably much lower on the (out-of-domain) parallel cor-
Introduction
Because there might be some controversy about the exact definitions of such universals, this set of coarse-grained POS categories is defined operationally, by collapsing language (or treebank) specific distinctions to a set of categories that exists across all languages.
Treebank is mentioned in 13 sentences in this paper.
Nagata, Ryo and Whittaker, Edward and Sheinman, Vera
Difficulties in Learner Corpus Creation
For POS/parsing annotation, there are also a number of annotation schemes including the Brown tag set, the Claws tag set, and the Penn Treebank tag set.
Difficulties in Learner Corpus Creation
For instance, there are at least three possibilities for POS-tagging the word sing in the sentence everyone sing together using the Penn Treebank tag set: sing/VB, sing/VBP, or sing/VBZ.
Introduction
For similar reasons, to the best of our knowledge, there exists no such learner corpus that is manually shallow-parsed and which is also publicly available, unlike, say, native-speaker corpora such as the Penn Treebank.
Method
We selected the Penn Treebank tag set, which is one of the most widely used tag sets, for our
Method
Similar to the error annotation scheme, we conducted a pilot study to determine what modifications we needed to make to the Penn Treebank scheme.
Method
As a result of the pilot study, we found that the Penn Treebank tag set sufficed in most cases except for errors which learners made.
UK and XP stand for unknown and X phrase, respectively.
Both use the Penn Treebank POS tag set.
UK and XP stand for unknown and X phrase, respectively.
An obvious cause of mistakes in both taggers is that they inevitably make errors in the POSs that are not defined in the Penn Treebank tag set, that is, UK and CE.
Treebank is mentioned in 9 sentences in this paper.
Lee, John and Naradowsky, Jason and Smith, David A.
Experimental Results
The Ancient Greek treebank comprises both archaic texts, before the development of a definite article, and later classic Greek, which has a definite article; Hungarian has both a definite and an indefinite article.
Experimental Results
More importantly, the Latin Dependency Treebank has grown from about 30K at the time of the previous work to 53K at present, resulting in significantly different training and testing material.
Experimental Setup
Our evaluation focused on the Latin Dependency Treebank (Bamman and Crane, 2006), created at the Perseus Digital Library by tailoring the Prague Dependency Treebank guidelines for the Latin language.
Experimental Setup
We randomly divided the 53K-word treebank into 10 folds of roughly equal sizes, with an average of 5314 words (347 sentences) per fold.
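A sketch of that kind of fold construction, assuming sentences are shuffled and dealt round-robin into folds of roughly equal size (the exact splitting procedure used by the authors is not specified here):

import random

def make_folds(sentences, k=10, seed=0):
    # Shuffle sentence indices and deal them round-robin into k folds.
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    folds = [[] for _ in range(k)]
    for i, idx in enumerate(order):
        folds[i % k].append(sentences[idx])
    return folds

# For a 53K-word treebank this yields roughly 347 sentences (~5.3K words) per fold.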
Experimental Setup
Their respective datasets consist of 8000 sentences from the Ancient Greek Dependency Treebank (Bamman et al., 2009), 5800 from the Hungarian Szeged Dependency Treebank (Vincze et al., 2010), and a subset of 3100 from the Prague Dependency Treebank (Bohmova et al., 2003).
Previous Work
Similarly, the English POS tags in the Penn Treebank combine word class information with morphologi-
Treebank is mentioned in 7 sentences in this paper.
Ponvert, Elias and Baldridge, Jason and Erk, Katrin
CD
As a result, many structures that in other treebanks would be prepositional phrases with embedded noun phrases — and thus nonlocal constituents — are flat prepositional phrases here.
CD
For the Penn Treebank tagset, see Marcus et al.
Data
1999); for German, the Negra corpus V2 (Krenn et al., 1998); for Chinese, the Penn Chinese Treebank V5.0 (CTB, Palmer et al., 2006).
Data
Sentence segmentation and tokenization from the treebank are used.
Related work
Their output is not evaluated directly using treebanks, but rather applied to several information retrieval problems.
Tasks and Benchmark
Examples of constituent chunks extracted from treebank constituent trees are in Fig.
Tasks and Benchmark
One study by Cramer (2007) found that none of the three performs particularly well under treebank evaluation.
Treebank is mentioned in 7 sentences in this paper.
Blunsom, Phil and Cohn, Trevor
Background
Though PoS induction was not their aim, this restriction is largely validated by empirical analysis of treebanked data, and moreover conveys the significant advantage that all the tags for a given word type can be updated at the same time, allowing very efficient inference using the exchange algorithm.
Background
Recent work on unsupervised PoS induction has focussed on encouraging sparsity in the emission distributions in order to match empirical distributions derived from treebank data (Goldwater and Griffiths, 2007; Johnson, 2007; Gao and Johnson, 2008).
Experiments
Treebank (Marcus et al., 1993), while for other languages we use the corpora made available for the CoNLL-X Shared Task (Buchholz and Marsi, 2006).
Experiments
Treebank, along with a number of state-of-the-art results previously reported (Table 1).
Experiments
The former shows that both our models and mkcls induce a more uniform distribution over tags than specified by the treebank.
Treebank is mentioned in 6 sentences in this paper.
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
The experiments were performed on the Penn Treebank (PTB) (Marcus et al., 1993), using a standard set of head-selection rules (Yamada
Experiments
and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank into a dependency tree representation; dependency labels were obtained via the “Malt” hard-coded setting. We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
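As a rough illustration of head-selection rules of this kind, the sketch below picks a lexical head per constituent and emits dependency arcs for the non-head children; the rule table and tree encoding are invented for the example and are not the actual Yamada and Matsumoto (2003) rules:

# Hypothetical head-percolation rules: for each parent label, the child labels to
# prefer as head, scanned left to right (NOT the actual Yamada & Matsumoto table).
HEAD_RULES = {
    "S":  ["VP", "NP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
}

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def head_child(children, label):
    # Pick the head child via the rule list for this label, else fall back to the rightmost child.
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]

def head_word(node):
    # The lexical head of a constituent is the head word of its head child.
    if is_preterminal(node):
        return node[1]
    return head_word(head_child(node[1:], node[0]))

def dependencies(node, deps=None):
    # Emit (dependent_word, head_word) arcs: each non-head child attaches to the head child's head.
    if deps is None:
        deps = []
    if is_preterminal(node):
        return deps
    children = node[1:]
    head = head_child(children, node[0])
    for child in children:
        if child is not head:
            deps.append((head_word(child), head_word(head)))
        dependencies(child, deps)
    return deps

tree = ["S", ["NP", ["DT", "the"], ["NN", "dog"]], ["VP", ["VBD", "barked"]]]
print(dependencies(tree))   # [('dog', 'barked'), ('the', 'dog')]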
Experiments
The results show that our second order model incorporating the N-gram features (92.64) performs better than most previously reported discriminative systems trained on the Treebank.
Introduction
With the availability of large-scale annotated corpora such as Penn Treebank (Marcus et al., 1993), it is easy to train a high-performance dependency parser using supervised learning methods.
Introduction
We conduct the experiments on the English Penn Treebank (PTB) (Marcus et al., 1993).
Treebank is mentioned in 5 sentences in this paper.
Bodenstab, Nathan and Dunlop, Aaron and Hall, Keith and Roark, Brian
Experimental Setup
We run all experiments on the WSJ treebank (Marcus et al., 1999) using the standard splits: section 2-21 for training, section 22 for development, and section 23 for testing.
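A sketch of assembling those splits, assuming the usual Penn Treebank directory layout with one two-digit directory per WSJ section (the root path below is hypothetical):

import glob, os

# Assumes the usual layout .../parsed/mrg/wsj/<section>/wsj_*.mrg, sections 00-24.
WSJ = "ptb/parsed/mrg/wsj"

def sections(lo, hi):
    # Collect parse files for WSJ sections lo..hi inclusive.
    files = []
    for sec in range(lo, hi + 1):
        files.extend(sorted(glob.glob(os.path.join(WSJ, f"{sec:02d}", "*.mrg"))))
    return files

train = sections(2, 21)   # training:    sections 02-21
dev   = sections(22, 22)  # development: section 22
test  = sections(23, 23)  # test:        section 23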
Experimental Setup
We preprocess the treebank by removing empty nodes, temporal labels, and spurious unary productions (X→X), as is standard in published works on syntactic parsing.
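A minimal sketch of that kind of preprocessing on trees encoded as nested lists; the empty-element tag -NONE- is standard PTB notation, while the label-stripping and unary-collapsing details here are simplified assumptions rather than the authors' exact pipeline:

def strip_label(label):
    # Drop functional/temporal annotations, e.g. 'NP-TMP-2' -> 'NP'.
    if label.startswith("-"):              # keep -NONE-, -LRB-, -RRB- intact
        return label
    return label.split("=")[0].split("-")[0]

def clean(node):
    # Remove empty nodes, simplify labels, and collapse X->X unary chains.
    if isinstance(node, str):              # terminal word
        return node
    label = strip_label(node[0])
    if label == "-NONE-":                  # empty element: delete the whole node
        return None
    children = [c for c in (clean(c) for c in node[1:]) if c is not None]
    if not children:                       # all children were empty nodes
        return None
    if len(children) == 1 and isinstance(children[0], list) and children[0][0] == label:
        return children[0]                 # collapse spurious unary X -> X
    return [label] + children

tree = ["S", ["NP-SBJ", ["-NONE-", "*"]], ["VP", ["VB", "Go"]]]
print(clean(tree))                         # ['S', ['VP', ['VB', 'Go']]]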
Experimental Setup
To achieve state-of-the-art accuracy levels, we parse with the Berkeley SM6 latent-variable grammar (Petrov and Klein, 2007b) where the original treebank non-terminals are automatically split into subclasses to optimize parsing accuracy.
Introduction
We simply parse sections 2-21 of the WSJ treebank and train our search models from the output of these trees, with no prior knowledge of the nonterminal set or other grammar characteristics to guide the process.
Treebank is mentioned in 4 sentences in this paper.
Subotin, Michael
Corpora and baselines
We investigate the models using the 2009 edition of the parallel treebank from UFAL (Bojar and Zabokrtsky, 2009), containing 8,029,801 sentence pairs from various genres.
Corpora and baselines
The English-side annotation follows the standards of the Penn Treebank and includes dependency parses and structural role labels such as subject and object.
Corpora and baselines
The Czech tags follow the standards of the Prague Dependency Treebank 2.0.
Features
The inflection for number is particularly easy to model in translating from English, since it is generally marked on the source side, and POS taggers based on the Penn treebank tag set attempt to infer it in cases where it is not.
Treebank is mentioned in 4 sentences in this paper.
Zhang, Hao and Fang, Licheng and Xu, Peng and Wu, Xiaoyun
Experiments
It achieves 87.8% labelled attachment score and 88.8% unlabeled attachment score on the standard Penn Treebank test set.
Experiments
On the standard Penn Treebank test set, it achieves an F-score of 89.5%.
Experiments
The parser preprocesses the Penn Treebank training data through binarization.
Source Tree Binarization
For example, Penn Treebank annotations are often flat at the phrase level.
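A sketch of left-branching binarization of such flat productions, using nested lists for trees and a starred virtual label for intermediate nodes (the label scheme is an assumption for illustration, not necessarily the one used by the authors):

def binarize(node):
    # Left-branching binarization: (NP A B C D) -> (NP (NP* (NP* A B) C) D).
    if isinstance(node, str):                       # terminal word
        return node
    children = [binarize(c) for c in node[1:]]
    if len(children) <= 2:                          # already unary or binary
        return [node[0]] + children
    label = node[0]
    left = [label + "*", children[0], children[1]]  # virtual intermediate node
    for child in children[2:-1]:
        left = [label + "*", left, child]
    return [label, left, children[-1]]

flat = ["NP", ["DT", "a"], ["JJ", "big"], ["JJ", "red"], ["NN", "dog"]]
print(binarize(flat))
# ['NP', ['NP*', ['NP*', ['DT', 'a'], ['JJ', 'big']], ['JJ', 'red']], ['NN', 'dog']]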
Treebank is mentioned in 4 sentences in this paper.
Bansal, Mohit and Klein, Dan
Introduction
Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remain.
Introduction
Figure 1: A PP attachment error in the parse output of the Berkeley parser (on Penn Treebank).
Parsing Experiments
We use the standard splits of Penn Treebank into training (sections 2-21), development (section 22) and test (section 23).
Treebank is mentioned in 3 sentences in this paper.
Zollmann, Andreas and Vogel, Stephan
Experiments
, 36 (the number of Penn treebank POS tags, used for the ‘POS’ models, is 36). For ‘Clust’, we see a comfortably wide plateau of nearly-identical scores from N = 7, …
Introduction
Label-based approaches have resulted in improvements in translation quality over the single X label approach (Zollmann et al., 2008; Mi and Huang, 2008); however, all the works cited here rely on stochastic parsers that have been trained on manually created syntactic treebanks.
Introduction
These treebanks are difficult and expensive to produce and exist for a limited set of languages only.
Treebank is mentioned in 3 sentences in this paper.