Index of papers in Proc. ACL 2011 that mention
  • Treebank
Das, Dipanjan and Petrov, Slav
Experiments and Results
For monolingual treebank data we relied on the CoNLL-X and CoNLL-2007 shared tasks on dependency parsing (Buchholz and Marsi, 2006; Nivre et al., 2007).
Experiments and Results
We extracted only the words and their POS tags from the treebanks.
Experiments and Results
(2011) provide a mapping A from the fine-grained language-specific POS tags in the foreign treebank to the universal POS tags.
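A minimal sketch of how such a deterministic tag mapping is applied (the tag pairs and the helper name to_universal below are illustrative assumptions, not the actual Petrov et al. (2011) mapping files):

# Hypothetical fine-grained -> universal tag pairs, for illustration only.
A = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB",
    "JJ": "ADJ",  "RB": "ADV",
}

def to_universal(tagged_sentence, mapping, default="X"):
    # Replace each fine-grained tag with its universal tag; unknown tags fall back to "X".
    return [(word, mapping.get(tag, default)) for word, tag in tagged_sentence]

print(to_universal([("dogs", "NNS"), ("bark", "VB")], A))
# [('dogs', 'NOUN'), ('bark', 'VERB')]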
Graph Construction
We used a tagger based on a trigram Markov model (Brants, 2000) trained on the Wall Street Journal portion of the Penn Treebank (Marcus et al., 1993), for its fast speed and reasonable accuracy (96.7% on sections 22-24 of the treebank, but presumably much lower on the (out-of-domain) parallel cor-
Introduction
Because there might be some controversy about the exact definitions of such universals, this set of coarse-grained POS categories is defined operationally, by collapsing language (or treebank) specific distinctions to a set of categories that exists across all languages.
Treebank is mentioned in 13 sentences in this paper.
Nagata, Ryo and Whittaker, Edward and Sheinman, Vera
Difficulties in Learner Corpus Creation
For POS/parsing annotation, there are also a number of annotation schemes including the Brown tag set, the Claws tag set, and the Penn Treebank tag set.
Difficulties in Learner Corpus Creation
For instance, there are at least three possibilities for POS-tagging the word sing in the sentence everyone sing together using the Penn Treebank tag set: sing/VB, sing/VBP, or sing/VBZ.
Introduction
For similar reasons, to the best of our knowledge, there exists no such learner corpus that is manually shallow-parsed and which is also publicly available, unlike, say, native-speaker corpora such as the Penn Treebank.
Method
We selected the Penn Treebank tag set, which is one of the most widely used tag sets, for our
Method
Similar to the error annotation scheme, we conducted a pilot study to determine what modifications we needed to make to the Penn Treebank scheme.
Method
As a result of the pilot study, we found that the Penn Treebank tag set sufficed in most cases except for errors which learners made.
UK and XP stand for unknown and X phrase, respectively.
Both use the Penn Treebank POS tag set.
UK and XP stand for unknown and X phrase, respectively.
An obvious cause of mistakes in both taggers is that they inevitably make errors in the POSs that are not defined in the Penn Treebank tag set, that is, UK and CE.
Treebank is mentioned in 9 sentences in this paper.
Lee, John and Naradowsky, Jason and Smith, David A.
Experimental Results
The Ancient Greek treebank comprises both archaic texts, before the development of a definite article, and later classic Greek, which has a definite article; Hungarian has both a definite and an indefinite article.
Experimental Results
More importantly, the Latin Dependency Treebank has grown from about 30K at the time of the previous work to 53K at present, resulting in significantly different training and testing material.
Experimental Setup
Our evaluation focused on the Latin Dependency Treebank (Bamman and Crane, 2006), created at the Perseus Digital Library by tailoring the Prague Dependency Treebank guidelines for the Latin language.
Experimental Setup
We randomly divided the 53K-word treebank into 10 folds of roughly equal sizes, with an average of 5314 words (347 sentences) per fold.
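A sketch of that kind of fold construction, assuming sentences are shuffled and dealt round-robin into folds of roughly equal size (the exact splitting procedure used by the authors is not specified here):

import random

def make_folds(sentences, k=10, seed=0):
    # Shuffle sentence indices and deal them round-robin into k folds.
    order = list(range(len(sentences)))
    random.Random(seed).shuffle(order)
    folds = [[] for _ in range(k)]
    for i, idx in enumerate(order):
        folds[i % k].append(sentences[idx])
    return folds

# For a 53K-word treebank this yields roughly 347 sentences (~5.3K words) per fold.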
Experimental Setup
Their respective datasets consist of 8000 sentences from the Ancient Greek Dependency Treebank (Bamman et al., 2009), 5800 from the Hungarian Szeged Dependency Treebank (Vincze et al., 2010), and a subset of 3100 from the Prague Dependency Treebank (Bohmova et al., 2003).
Previous Work
Similarly, the English POS tags in the Penn Treebank combine word class information with morphologi-
Treebank is mentioned in 7 sentences in this paper.
Ponvert, Elias and Baldridge, Jason and Erk, Katrin
CD
As a result, many structures that in other treebanks would be prepositional phrases with embedded noun phrases — and thus nonlocal constituents — are flat prepositional phrases here.
CD
For the Penn Treebank tagset, see Marcus et al.
Data
1999); for German, the Negra corpus V2 (Krenn et al., 1998); for Chinese, the Penn Chinese Treebank V5.0 (CTB, Palmer et al., 2006).
Data
Sentence segmentation and tokenization from the treebank are used.
Related work
Their output is not evaluated directly using treebanks, but rather applied to several information retrieval problems.
Tasks and Benchmark
Examples of constituent chunks extracted from treebank constituent trees are in Fig.
Tasks and Benchmark
One study by Cramer (2007) found that none of the three performs particularly well under treebank evaluation.
Treebank is mentioned in 7 sentences in this paper.
Blunsom, Phil and Cohn, Trevor
Background
Though PoS induction was not their aim, this restriction is largely validated by empirical analysis of treebanked data, and moreover conveys the significant advantage that all the tags for a given word type can be updated at the same time, allowing very efficient inference using the exchange algorithm.
Background
Recent work on unsupervised PoS induction has focussed on encouraging sparsity in the emission distributions in order to match empirical distributions derived from treebank data (Goldwater and Griffiths, 2007; Johnson, 2007; Gao and Johnson, 2008).
Experiments
Treebank (Marcus et al., 1993), while for other languages we use the corpora made available for the CoNLL-X Shared Task (Buchholz and Marsi, 2006).
Experiments
Treebank, along with a number of state-of-the-art results previously reported (Table 1).
Experiments
The former shows that both our models and mkcls induce a more uniform distribution over tags than specified by the treebank.
Treebank is mentioned in 6 sentences in this paper.
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
The experiments were performed on the Penn Treebank (PTB) (Marcus et al., 1993), using a standard set of head-selection rules (Yamada
Experiments
and Matsumoto, 2003) to convert the phrase structure syntax of the Treebank into a dependency tree representation; dependency labels were obtained via the “Malt” hard-coded setting. We split the Treebank into a training set (Sections 2-21), a development set (Section 22), and several test sets (Sections 0, 1, 23, and 24).
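As a rough illustration of head-selection rules of this kind, the sketch below picks a lexical head per constituent and emits dependency arcs for the non-head children; the rule table and tree encoding are invented for the example and are not the actual Yamada and Matsumoto (2003) rules:

# Hypothetical head-percolation rules: for each parent label, the child labels to
# prefer as head, scanned left to right (NOT the actual Yamada & Matsumoto table).
HEAD_RULES = {
    "S":  ["VP", "NP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
}

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def head_child(children, label):
    # Pick the head child via the rule list for this label, else fall back to the rightmost child.
    for wanted in HEAD_RULES.get(label, []):
        for child in children:
            if child[0] == wanted:
                return child
    return children[-1]

def head_word(node):
    # The lexical head of a constituent is the head word of its head child.
    if is_preterminal(node):
        return node[1]
    return head_word(head_child(node[1:], node[0]))

def dependencies(node, deps=None):
    # Emit (dependent_word, head_word) arcs: each non-head child attaches to the head child's head.
    if deps is None:
        deps = []
    if is_preterminal(node):
        return deps
    children = node[1:]
    head = head_child(children, node[0])
    for child in children:
        if child is not head:
            deps.append((head_word(child), head_word(head)))
        dependencies(child, deps)
    return deps

tree = ["S", ["NP", ["DT", "the"], ["NN", "dog"]], ["VP", ["VBD", "barked"]]]
print(dependencies(tree))   # [('dog', 'barked'), ('the', 'dog')]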
Experiments
The results show that our second order model incorporating the N-gram features (92.64) performs better than most previously reported discriminative systems trained on the Treebank.
Introduction
With the availability of large-scale annotated corpora such as Penn Treebank (Marcus et al., 1993), it is easy to train a high-performance dependency parser using supervised learning methods.
Introduction
We conduct the experiments on the English Penn Treebank (PTB) (Marcus et al., 1993).
Treebank is mentioned in 5 sentences in this paper.
Bodenstab, Nathan and Dunlop, Aaron and Hall, Keith and Roark, Brian
Experimental Setup
We run all experiments on the WSJ treebank (Marcus et al., 1999) using the standard splits: section 2-21 for training, section 22 for development, and section 23 for testing.
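A sketch of assembling those splits, assuming the usual Penn Treebank directory layout with one two-digit directory per WSJ section (the root path below is hypothetical):

import glob, os

# Assumes the usual layout .../parsed/mrg/wsj/<section>/wsj_*.mrg, sections 00-24.
WSJ = "ptb/parsed/mrg/wsj"

def sections(lo, hi):
    # Collect parse files for WSJ sections lo..hi inclusive.
    files = []
    for sec in range(lo, hi + 1):
        files.extend(sorted(glob.glob(os.path.join(WSJ, f"{sec:02d}", "*.mrg"))))
    return files

train = sections(2, 21)   # training:    sections 02-21
dev   = sections(22, 22)  # development: section 22
test  = sections(23, 23)  # test:        section 23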
Experimental Setup
We preprocess the treebank by removing empty nodes, temporal labels, and spurious unary productions (X→X), as is standard in published works on syntactic parsing.
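A minimal sketch of that kind of preprocessing on trees encoded as nested lists; the empty-element tag -NONE- is standard PTB notation, while the label-stripping and unary-collapsing details here are simplified assumptions rather than the authors' exact pipeline:

def strip_label(label):
    # Drop functional/temporal annotations, e.g. 'NP-TMP-2' -> 'NP'.
    if label.startswith("-"):              # keep -NONE-, -LRB-, -RRB- intact
        return label
    return label.split("=")[0].split("-")[0]

def clean(node):
    # Remove empty nodes, simplify labels, and collapse X->X unary chains.
    if isinstance(node, str):              # terminal word
        return node
    label = strip_label(node[0])
    if label == "-NONE-":                  # empty element: delete the whole node
        return None
    children = [c for c in (clean(c) for c in node[1:]) if c is not None]
    if not children:                       # all children were empty nodes
        return None
    if len(children) == 1 and isinstance(children[0], list) and children[0][0] == label:
        return children[0]                 # collapse spurious unary X -> X
    return [label] + children

tree = ["S", ["NP-SBJ", ["-NONE-", "*"]], ["VP", ["VB", "Go"]]]
print(clean(tree))                         # ['S', ['VP', ['VB', 'Go']]]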
Experimental Setup
To achieve state-of-the-art accuracy levels, we parse with the Berkeley SM6 latent-variable grammar (Petrov and Klein, 2007b) where the original treebank non-terminals are automatically split into subclasses to optimize parsing accuracy.
Introduction
We simply parse sections 2-21 of the WSJ treebank and train our search models from the output of these trees, with no prior knowledge of the nonterminal set or other grammar characteristics to guide the process.
Treebank is mentioned in 4 sentences in this paper.
Subotin, Michael
Corpora and baselines
We investigate the models using the 2009 edition of the parallel treebank from UFAL (Bojar and Zabokrtsky, 2009), containing 8,029,801 sentence pairs from various genres.
Corpora and baselines
The English-side annotation follows the standards of the Penn Treebank and includes dependency parses and structural role labels such as subject and object.
Corpora and baselines
The Czech tags follow the standards of the Prague Dependency Treebank 2.0.
Features
The inflection for number is particularly easy to model in translating from English, since it is generally marked on the source side, and POS taggers based on the Penn treebank tag set attempt to infer it in cases where it is not.
Treebank is mentioned in 4 sentences in this paper.
Zhang, Hao and Fang, Licheng and Xu, Peng and Wu, Xiaoyun
Experiments
It achieves 87.8% labelled attachment score and 88.8% unlabeled attachment score on the standard Penn Treebank test set.
Experiments
On the standard Penn Treebank test set, it achieves an F-score of 89.5%.
Experiments
The parser preprocesses the Penn Treebank training data through binarization.
Source Tree Binarization
For example, Penn Treebank annotations are often flat at the phrase level.
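A sketch of left-branching binarization of such flat productions, using nested lists for trees and a starred virtual label for intermediate nodes (the label scheme is an assumption for illustration, not necessarily the one used by the authors):

def binarize(node):
    # Left-branching binarization: (NP A B C D) -> (NP (NP* (NP* A B) C) D).
    if isinstance(node, str):                       # terminal word
        return node
    children = [binarize(c) for c in node[1:]]
    if len(children) <= 2:                          # already unary or binary
        return [node[0]] + children
    label = node[0]
    left = [label + "*", children[0], children[1]]  # virtual intermediate node
    for child in children[2:-1]:
        left = [label + "*", left, child]
    return [label, left, children[-1]]

flat = ["NP", ["DT", "a"], ["JJ", "big"], ["JJ", "red"], ["NN", "dog"]]
print(binarize(flat))
# ['NP', ['NP*', ['NP*', ['DT', 'a'], ['JJ', 'big']], ['JJ', 'red']], ['NN', 'dog']]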
Treebank is mentioned in 4 sentences in this paper.
Bansal, Mohit and Klein, Dan
Introduction
Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remain.
Introduction
Figure 1: A PP attachment error in the parse output of the Berkeley parser (on Penn Treebank).
Parsing Experiments
We use the standard splits of Penn Treebank into training (sections 2-21), development (section 22) and test (section 23).
Treebank is mentioned in 3 sentences in this paper.
Zollmann, Andreas and Vogel, Stephan
Experiments
, 36 (the number of Penn treebank POS tags, used for the ‘POS’ models, is 36). For ‘Clust’, we see a comfortably wide plateau of nearly-identical scores from N = 7, …
Introduction
Label-based approaches have resulted in improvements in translation quality over the single X label approach (Zollmann et al., 2008; Mi and Huang, 2008); however, all the works cited here rely on stochastic parsers that have been trained on manually created syntactic treebanks.
Introduction
These treebanks are difficult and expensive to produce and exist for a limited set of languages only.
Treebank is mentioned in 3 sentences in this paper.