Abstract | We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific to the target formalism and a large quantity of coarse CFG annotations from the Penn Treebank.
Abstract | While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. |
Introduction | The standard solution to this bottleneck has relied on manually crafted transformation rules that map readily available syntactic annotations (e.g., the Penn Treebank) to the desired formalism.
Introduction | A natural candidate for such coarse annotations is context-free grammar (CFG) from the Penn Treebank, while the target formalism can be any constituency-based grammar, such as Combinatory Categorial Grammar (CCG) (Steedman, 2001), Lexical Functional Grammar (LFG) (Bresnan, 1982), or Head-Driven Phrase Structure Grammar (HPSG) (Pollard and Sag, 1994).
Introduction | All of these formalisms share a similar basic syntactic structure with Penn Treebank CFG. |
Related Work | For instance, mappings may specify how to convert traces and functional tags in the Penn Treebank to the f-structure in LFG (Cahill, 2004).
Related Work | For instance, Hockenmaier and Steedman (2002) made thousands of POS and constituent modifications to the Penn Treebank to facilitate transfer to CCG. |
Dataset Creation | Of the 21,938 total examples, 15,330 come from sections 2–21 of the Penn Treebank (Marcus et al., 1993).
Dataset Creation | For the Penn Treebank, we extracted the examples using the provided gold-standard parse trees, whereas, for the latter cases, we used the output of an open-source parser (Tratz and Hovy, 2011).
Experiments | The accuracy figures for the test instances from the Penn Treebank, The Jungle Book, and The History of the Decline and Fall of the Roman Empire were 88.8%, 84.7%, and 80.6%, respectively.
Related Work | The NomBank project (Meyers et al., 2004) provides coarse annotations for some of the possessive constructions in the Penn Treebank, but only those that meet their criteria.
Semantic Relation Inventory | Penn Treebank, respectively.
Semantic Relation Inventory | portion of the Penn Treebank.
Semantic Relation Inventory | The Penn Treebank and The History of the Decline and Fall of the Roman Empire were substantially more similar, although there were notable differences.
Abstract | We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design.
Experiments | As a proof of concept, we investigate OSTAG in the context of the classic Penn Treebank statistical parsing setup: training on sections 2–21 and testing on section 23.
Experiments | Furthermore, the various parameterizations of adjunction with OSTAG indicate that, at least in the case of the Penn Treebank, the finer-grained modeling of a full table of adjunction probabilities for each Goodman index (OSTAG3) overcomes the danger of sparse data estimates.
Introduction | We evaluate OSTAG on the familiar task of parsing the Penn Treebank . |
TAG and Variants | We propose a simple but empirically effective heuristic for grammar induction in our experiments on Penn Treebank data.
Results | We trained our model on sections 2–21 of the WSJ part of the Penn Treebank (Marcus et al., 1999).
Results | Unfortunately, marking of arguments/modifiers in the Penn Treebank is incomplete and is limited to certain adverbials, e.g.
Results | This corpus adds annotations indicating, for each node in the Penn Treebank , whether that node is a modifier. |
Experiments | The experiments are conducted on the Penn Treebank Wall Street Journal corpus.
Experiments | Because we are trying to improve on (Yatbaz et al., 2012), we select the experiment on the Penn Treebank Wall Street Journal corpus in that work as our baseline and replicate it.
Introduction | For instance, the gold-tag perplexity of the word “offers” in the Penn Treebank Wall Street Journal corpus we worked on equals 1.966.
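The perplexity figure quoted above is the exponentiated entropy of a word's gold-tag distribution. A minimal sketch of that computation, using a hypothetical tag distribution for an ambiguous word such as "offers" (split between a plural-noun and a verb reading; the exact probabilities are illustrative, not taken from the corpus):

```python
import math

def tag_perplexity(tag_probs):
    """Perplexity of a tag distribution: 2 ** H(p),
    where H(p) is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in tag_probs if p > 0)
    return 2 ** entropy

# Hypothetical gold-tag distribution for an ambiguous word like
# "offers", split between NNS (plural noun) and VBZ (3sg verb).
probs = {"NNS": 0.6, "VBZ": 0.4}
print(round(tag_perplexity(probs.values()), 3))
```

A word with a single possible tag has perplexity 1, and a word split evenly between two tags has perplexity 2, so a value of 1.966 indicates a nearly even two-way tag ambiguity.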
Introduction | For example, Higgins and Sadock (2003) find fewer than 1,000 sentences with two or more explicit quantifiers in the Wall Street Journal section of the Penn Treebank.
Introduction | Plurals form 18% of the NPs in our corpus and 20% of the nouns in the Penn Treebank.
Introduction | Explicit universals, on the other hand, form less than 1% of the determiners in the Penn Treebank.
Experiments | Labeled English data employed in this paper were derived from the Wall Street Journal (WSJ) corpus of the Penn Treebank (Marcus et al., 1993). |
Experiments | In addition, we removed from the unlabeled English data the sentences that appear in the WSJ corpus of the Penn Treebank . |
Introduction | On standard evaluations using both the Penn Treebank and the Penn Chinese Treebank, our parser gave higher accuracies than the Berkeley parser (Petrov and Klein, 2007), a state-of-the-art chart parser. |