Difficulties in Learner Corpus Creation | For POS/parsing annotation, there are also a number of annotation schemes, including the Brown, CLAWS, and Penn Treebank tag sets. |
Difficulties in Learner Corpus Creation | For instance, there are at least three possibilities for POS-tagging the word sing in the sentence everyone sing together using the Penn Treebank tag set: sing/VB, sing/VBP, or sing/VBZ. |
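As an illustrative sketch (not from the paper), the three candidate analyses can be represented as a small lookup table; the function name, the fallback tag, and the slash-joined display format are assumptions for illustration:

```python
# Hypothetical candidate table: plausible Penn Treebank tags for "sing"
# in the learner sentence "everyone sing together" (VB = base form,
# VBP = non-3rd-person singular present, VBZ = the intended 3rd-person form).
CANDIDATES = {
    "sing": ["VB", "VBP", "VBZ"],
}

def tag_candidates(token):
    """Return the plausible PTB analyses for a token in word/TAG form.

    The ["NN"] fallback for unknown words is an assumption, not a claim
    about any real tagger.
    """
    return [f"{token}/{tag}" for tag in CANDIDATES.get(token, ["NN"])]

print(tag_candidates("sing"))  # ['sing/VB', 'sing/VBP', 'sing/VBZ']
```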
Introduction | For similar reasons, to the best of our knowledge, there exists no learner corpus that is both manually shallow-parsed and publicly available, unlike native-speaker corpora such as the Penn Treebank. |
Method | We selected the Penn Treebank tag set, which is one of the most widely used tag sets, for our POS annotation. |
Method | Similar to the error annotation scheme, we conducted a pilot study to determine what modifications we needed to make to the Penn Treebank scheme. |
Method | As a result of the pilot study, we found that the Penn Treebank tag set sufficed in most cases, except for learner errors. |
UK and XP stand for unknown and X phrase, respectively. | Both use the Penn Treebank POS tag set. |
UK and XP stand for unknown and X phrase, respectively. | An obvious cause of mistakes in both taggers is that they inevitably err on POS tags that are not defined in the Penn Treebank tag set, namely UK and CE. |
Experiments | It achieves an 87.8% labeled attachment score and an 88.8% unlabeled attachment score on the standard Penn Treebank test set. |
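For readers unfamiliar with the two metrics, here is a minimal sketch of how labeled (LAS) and unlabeled (UAS) attachment scores are computed; the function name and the toy four-token analyses are illustrative, not output of the evaluated parser:

```python
def attachment_scores(gold, predicted):
    """Compute (UAS, LAS) from per-token (head_index, relation_label) pairs.

    UAS counts tokens whose head is correct; LAS additionally requires
    the dependency label to match.
    """
    assert len(gold) == len(predicted)
    n = len(gold)
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    las_hits = sum(g == p for g, p in zip(gold, predicted))
    return uas_hits / n, las_hits / n

# Toy example: one head error (token 3) and one extra label error (token 4).
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (3, "amod")]
pred = [(2, "nsubj"), (0, "root"), (1, "dobj"), (3, "nmod")]
uas, las = attachment_scores(gold, pred)
print(uas, las)  # 0.75 0.5
```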
Experiments | On the standard Penn Treebank test set, it achieves an F-score of 89.5%. |
Experiments | The parser preprocesses the Penn Treebank training data through binarization. |
Source Tree Binarization | For example, Penn Treebank annotations are often flat at the phrase level. |
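A minimal sketch of one common binarization scheme, assuming trees are represented as nested (label, children) tuples and intermediate nodes receive primed labels (e.g. "NP'"); the representation and function name are illustrative, not the cited parser's actual preprocessing:

```python
def binarize(tree):
    """Left-branching binarization of a flat constituency tree.

    Leaves are (POS, word) pairs; any node with more than two children
    is folded leftward into primed intermediate nodes.
    """
    label, children = tree
    if isinstance(children, str):        # leaf: (POS, word)
        return tree
    kids = [binarize(c) for c in children]
    while len(kids) > 2:                 # fold the two leftmost children
        kids = [(label + "'", kids[:2])] + kids[2:]
    return (label, kids)

# A flat NP of the kind found in Penn Treebank annotation.
flat_np = ("NP", [("DT", "the"), ("JJ", "big"), ("JJ", "red"), ("NN", "dog")])
print(binarize(flat_np))
```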
Introduction | Current state-of-the-art syntactic parsers have achieved accuracies in the range of 90% F1 on the Penn Treebank, but a range of errors remain. |
Introduction | Figure 1: A PP attachment error in the parse output of the Berkeley parser (on the Penn Treebank). |
Parsing Experiments | We use the standard splits of the Penn Treebank into training (sections 2-21), development (section 22), and test (section 23). |
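The standard WSJ split can be sketched as a lookup by section number, assuming files are named in the usual wsj_SSNN pattern (e.g. wsj_0201.mrg) with a two-digit section prefix; the helper names are hypothetical:

```python
# Standard Penn Treebank WSJ split: train 02-21, dev 22, test 23.
TRAIN_SECTIONS = range(2, 22)
DEV_SECTIONS = [22]
TEST_SECTIONS = [23]

def section_of(filename):
    """Extract the two-digit section number from a wsj_SSNN-style filename."""
    return int(filename.split("_")[1][:2])

def split_of(filename):
    sec = section_of(filename)
    if sec in TRAIN_SECTIONS:
        return "train"
    if sec in DEV_SECTIONS:
        return "dev"
    if sec in TEST_SECTIONS:
        return "test"
    return "unused"

print(split_of("wsj_0201.mrg"))  # train
print(split_of("wsj_2300.mrg"))  # test
```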
Experiments | The experiments were performed on the Penn Treebank (PTB) (Marcus et al., 1993), using a standard set of head-selection rules (Yamada and Matsumoto, 2003). |
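To illustrate what head-selection rules do when converting constituency trees to dependencies, here is a toy rule table in that spirit; the specific priorities, directions, and function name are assumptions for illustration, not the rule set actually used in the experiments:

```python
# Toy head-selection table: for each constituent label, a priority list of
# child labels to search and a search direction.
HEAD_RULES = {
    "NP": (["NN", "NNS", "NNP"], "rightmost"),
    "VP": (["VB", "VBD", "VBZ", "VBP"], "leftmost"),
    "PP": (["IN"], "leftmost"),
}

def head_child(label, child_labels):
    """Return the index of the head child under the toy rules.

    Falls back to the first child when no rule matches (an assumption).
    """
    priorities, direction = HEAD_RULES.get(label, ([], "leftmost"))
    indices = list(range(len(child_labels)))
    if direction == "rightmost":
        indices.reverse()
    for tag in priorities:
        for i in indices:
            if child_labels[i] == tag:
                return i
    return 0

print(head_child("NP", ["DT", "JJ", "NN"]))  # 2
print(head_child("VP", ["VBD", "NP", "PP"]))  # 0
```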
Introduction | With the availability of large-scale annotated corpora such as the Penn Treebank (Marcus et al., 1993), it is easy to train a high-performance dependency parser using supervised learning methods. |
Introduction | We conduct the experiments on the English Penn Treebank (PTB) (Marcus et al., 1993). |