Abstract | In the corpus, we manually generated parallel trees for about 5,000 sentences from the Penn Treebank.
Conclusion | We translated and transformed a subset of the parse trees of the Penn Treebank into Turkish.
Conclusion | As future work, we plan to expand the dataset to include all Penn Treebank sentences.
Corpus construction strategy | To constrain the syntactic complexity of the sentences in the corpus, we selected 9,560 trees from the Penn Treebank II that contain at most 15 tokens.
Corpus construction strategy | These comprise 8,660 trees from the training set of the Penn Treebank, 360 trees from its development set, and 540 trees from its test set.
Literature Review | MaltParser is trained on the Penn Treebank for English, on the Swedish treebank Talbanken05 (Nivre et al., 2006b), and on the METU-Sabancı Turkish Treebank (Atalay et al., 2003), respectively.
Transformation heuristics | In the Penn Treebank II annotation, the movement leaves a trace that is associated with the wh- constituent by a numeric marker.
Annotations | Table 2: Results for the Penn Treebank development set, sentences of length ≤ 40, for different annotation schemes implemented on top of the X-bar grammar.
Annotations | Table 3: Final Parseval results for the v = 1, h = 0 parser on Section 23 of the Penn Treebank.
Annotations | Finally, Table 3 shows our final evaluation on Section 23 of the Penn Treebank.
Features | Table 1 shows the results of incrementally building up our feature set on the Penn Treebank development set. |
Parsing Model | Because the X-bar grammar is so minimal, this grammar does not parse very accurately, scoring just 73 F1 on the standard English Penn Treebank task. |
Surface Feature Framework | Throughout this and the following section, we will draw on motivating examples from the English Penn Treebank, though similar examples could be equally argued for other languages.
Abstract | We study the effect of semantic classes in three dependency parsers, using two types of constituency-to-dependency conversions of the English Penn Treebank . |
Experimental Framework | semi-supervised approach that makes use of cluster features induced from unlabeled data, providing significant performance improvements for supervised dependency parsers on the Penn Treebank for English and the Prague Dependency Treebank for Czech.
Introduction | Most experiments for English were evaluated on the Penn2Malt conversion of the constituency-based Penn Treebank . |
Related work | The results showed a significant improvement, giving the first results using both WordNet and the Penn Treebank (PTB) to show that semantics helps parsing.
Anatomy of a Dense GPU Parser | Table 1: Performance numbers for computing Viterbi inside charts on 20,000 sentences of length ≤ 40 from the Penn Treebank.
Introduction | As with other grammars with a parse/derivation distinction, the grammars of Petrov and Klein (2007) only achieve their full accuracy using minimum-Bayes-risk parsing, with improvements of over 1.5 F1 over best-derivation Viterbi parsing on the Penn Treebank (Marcus et al., 1993). |
Minimum Bayes risk parsing | Table 2: Performance numbers for computing max constituent (Goodman, 1996) trees on 20,000 sentences of length 40 or less from the Penn Treebank.
Minimum Bayes risk parsing | We measured parsing accuracy on sentences of length ≤ 40 from section 22 of the Penn Treebank.
Abstract | This data sparsity problem is quite severe: for example, the Penn Treebank (Marcus et al., 1993) has a total of 43,498 sentences with 42,246 unique POS tag sequences, an average of only 1.04 sentences per sequence.
Abstract | For English we use the Penn Treebank (Marcus et al., 1993), with sections 2–21 for training and section 23 for final testing.
Abstract | For both methods we chose the best parameters for sentences of length ≤ 10 on the English Penn Treebank (training) and used this set for all other experiments.