Index of papers in Proc. ACL 2014 that mention
  • Penn Treebank
Yıldız, Olcay Taner and Solak, Ercan and Görgün, Onur and Ehsani, Razieh
Abstract
In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Conclusion
We translated and transformed a subset of the parse trees of the Penn Treebank into Turkish.
Conclusion
As future work, we plan to expand the dataset to include all Penn Treebank sentences.
Corpus construction strategy
In order to constrain the syntactic complexity of the sentences in the corpus, we selected 9,560 trees from the Penn Treebank II, each containing a maximum of 15 tokens.
Corpus construction strategy
These include 8,660 trees from the training set of the Penn Treebank, 360 trees from its development set, and 540 trees from its test set.
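As an illustration of this selection step, the following Python sketch (ours, not the authors' code) filters parse trees to those with at most 15 tokens; it uses the 10% Penn Treebank sample bundled with NLTK as a stand-in for the full Treebank:

    import nltk
    from nltk.corpus import treebank

    nltk.download("treebank", quiet=True)  # the bundled 10% PTB sample

    MAX_TOKENS = 15
    trees = treebank.parsed_sents()
    # Keep only trees whose yield (leaf sequence) has at most MAX_TOKENS tokens.
    short_trees = [t for t in trees if len(t.leaves()) <= MAX_TOKENS]
    print(f"{len(short_trees)} of {len(trees)} trees have <= {MAX_TOKENS} tokens")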
Literature Review
MaltParser is trained on the Penn Treebank for English, on the Swedish treebank Talbanken05 (Nivre et al., 2006b), and on the METU-Sabancı Turkish Treebank (Atalay et al., 2003), respectively.
Transformation heuristics
In the Penn Treebank II annotation, the movement leaves a trace that is associated with the wh- constituent by a numeric marker.
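For illustration, a constructed example of this convention (not taken from the paper): the fronted wh-phrase carries a numeric index, and its extraction site holds a co-indexed trace:

    (SBARQ (WHNP-1 (WP What))
           (SQ (VBD did)
               (NP-SBJ (NNP John))
               (VP (VB buy)
                   (NP (-NONE- *T*-1))))
           (. ?))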
Penn Treebank is mentioned in 7 sentences in this paper.
Hall, David and Durrett, Greg and Klein, Dan
Annotations
Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations
Table 3: Final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank.
Annotations
Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank.
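For readers unfamiliar with the metric, the following Python sketch computes a simplified Parseval labeled bracket F1 (our illustration, not the authors' evaluation code; the standard evalb tool adds normalizations such as punctuation handling):

    from collections import Counter
    from nltk import Tree

    def spans(tree, start=0):
        """Yield (label, start, end) for every non-terminal constituent."""
        if isinstance(tree, str):  # leaf token, no bracket to emit
            return
        end = start
        for child in tree:
            yield from spans(child, end)
            end += 1 if isinstance(child, str) else len(child.leaves())
        yield (tree.label(), start, end)

    def parseval_f1(gold, guess):
        # Usage: parseval_f1(Tree.fromstring(gold_str), Tree.fromstring(guess_str))
        g, p = Counter(spans(gold)), Counter(spans(guess))
        matched = sum((g & p).values())  # multiset intersection of brackets
        precision = matched / sum(p.values())
        recall = matched / sum(g.values())
        return 2 * precision * recall / (precision + recall)

Corpus-level F1 aggregates matched and total bracket counts over all sentences before taking the harmonic mean, rather than averaging per-sentence scores.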
Features
Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set.
Parsing Model
Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task.
Surface Feature Framework
Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Penn Treebank is mentioned in 6 sentences in this paper.
Bengoetxea, Kepa and Agirre, Eneko and Nivre, Joakim and Zhang, Yue and Gojenola, Koldo
Abstract
We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank.
Experimental Framework
…supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction
Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank.
Related work
The results showed a significant improvement, giving the first results over both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Penn Treebank is mentioned in 4 sentences in this paper.
Hall, David and Berg-Kirkpatrick, Taylor and Klein, Dan
Anatomy of a Dense GPU Parser
Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction
As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of more than 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993).
Minimum Bayes risk parsing
Table 2: Performance numbers for computing max-constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank.
Minimum Bayes risk parsing
We measured parsing accuracy on sentences of length ≤ 40 from Section 22 of the Penn Treebank.
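As background, the max-constituent objective of Goodman (1996) referenced above picks the tree that maximizes the expected number of correct labeled constituents; in standard notation (our gloss, not a formula quoted from the paper):

    \hat{T} = \arg\max_{T} \sum_{(X, i, j) \in T} P\big(X \text{ spans } (i, j) \mid w_1 \dots w_n\big)

The span posteriors come from inside and outside scores, which is why this minimum-Bayes-risk pass costs more than best-derivation Viterbi decoding.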
Penn Treebank is mentioned in 4 sentences in this paper.
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
This data sparsity problem is quite severe: for example, the Penn treebank (Marcus et al., 1993) has a total of 43,498 sentences with 42,246 unique POS tag sequences, an average of only 1.04 sentences per sequence.
Abstract
For English we use the Penn treebank (Marcus et al., 1993), with sections 2–21 for training and section 23 for final testing.
Abstract
For both methods we chose the best parameters for sentences of length ≤ 10 on the English Penn Treebank (training) and used this set for all other experiments.
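The sparsity statistic quoted above is easy to reproduce; this Python sketch (ours, run against NLTK's 10% PTB sample, so absolute counts will differ from the full-treebank figures) counts unique POS tag sequences:

    from nltk.corpus import treebank

    tagged = treebank.tagged_sents()
    # Reduce each sentence to its sequence of POS tags and deduplicate.
    unique_seqs = {tuple(tag for _, tag in sent) for sent in tagged}
    print(len(tagged), "sentences,", len(unique_seqs), "unique POS tag sequences")
    print("average sentences per sequence:", len(tagged) / len(unique_seqs))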
Penn Treebank is mentioned in 3 sentences in this paper.