Annotations | Table: results by annotation (dev, len ≤ 40): v = 0, h = 0: 90.1; v = 1, h = 0: 90.5; v = 0, h = 1: 90.2; v = 1, h = 1: 90.9; Lexicalized: 90.3 |
Annotations | Another commonly-used kind of structural annotation is lexicalization (Eisner, 1996; Collins, 1997; Charniak, 1997). |
Annotations | Table 2 shows results from lexicalizing the X-bar grammar; it provides meager improvements. |
Features | Because heads of constituents are often at the beginning or the end of a span, these feature templates can (noisily) capture monolexical properties of heads without having to incur the inferential cost of lexicalized annotations. |
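The feature templates described above can be sketched as follows. This is a minimal illustration, not the paper's actual feature set; the function name and feature-string format are hypothetical, and it shows only the core idea that features on a span's first and last words act as a cheap proxy for head lexicalization.

```python
# Hypothetical sketch of span-boundary lexical feature templates.
# Since constituent heads often sit at a span's first or last word,
# features on the boundary words can (noisily) capture monolexical
# properties of heads without lexicalizing the grammar itself.
def span_boundary_features(words, i, j, label):
    """Feature templates for a candidate span words[i:j] with symbol `label`."""
    first, last = words[i], words[j - 1]
    return [
        f"FIRST-WORD={first}^{label}",  # word at the left edge of the span
        f"LAST-WORD={last}^{label}",    # word at the right edge of the span
        f"LEN={j - i}^{label}",         # span length, conjoined with the label
    ]

feats = span_boundary_features("the cat on the mat".split(), 1, 5, "NP")
# feats now contains boundary-word features for the span "cat on the mat"
```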
Introduction | For example, head lexicalization (Eisner, 1996; Collins, 1997; Charniak, 1997), structural annotation (Johnson, 1998; Klein and Manning, 2003), and state-splitting (Matsuzaki et al., 2005; Petrov et al., 2006) are all designed to take coarse symbols like PP and decorate them with additional context. |
Other Languages | Historically, many annotation schemes for parsers have required language-specific engineering: for example, lexicalized parsers require a set of head rules and manually-annotated grammars require detailed analysis of the treebank itself (Klein and Manning, 2003). |
Parsing Model | Hall and Klein (2012) employed both kinds of annotations, along with lexicalized head word annotation. |
Sentiment Analysis | Our features can also lexicalize on other discourse connectives such as but or however, which often occur at the split point between two spans. |
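A feature that lexicalizes on the word at a split point might look like the following sketch. The function name, the feature strings, and the small connective list are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: lexicalized features on the split point of a span.
# Discourse connectives such as "but" or "however" often mark the boundary
# between two sub-spans, so the split word is a useful feature.
CONNECTIVES = {"but", "however", "although", "though"}  # assumed toy list

def split_point_features(words, i, k, j):
    """Features for splitting the span words[i:j] at position k."""
    split_word = words[k].lower()
    feats = [f"SPLIT-WORD={split_word}"]  # fully lexicalized split feature
    if split_word in CONNECTIVES:
        feats.append("SPLIT-IS-CONNECTIVE")  # back-off indicator feature
    return feats
```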
Abstract | Our model is purely lexicalized and can be integrated into any MT decoder. |
Introduction | In this paper we use a basic neural network architecture and a lexicalized probability model to create a powerful MT decoding feature. |
Model Variations | Although there has been a substantial amount of past work in lexicalized joint models (Marino et al., 2006; Crego and Yvon, 2010), nearly all of these papers have used older statistical techniques such as Kneser-Ney or Maximum Entropy. |
Model Variations | Le’s model also uses minimal phrases rather than being purely lexicalized, which has two main downsides: (a) a number of complex, handcrafted heuristics are required to define phrase boundaries, which may not transfer well to new languages, and (b) the effective vocabulary size is much larger, which substantially increases data sparsity issues.
Model Variations | The model is purely lexicalized, which avoids both data sparsity and implementation complexity.
Comparison to BabySRL | The difference in transitive settings stems from increased lexicalization, as is apparent from their results alone; the model presented here initially performs close to their weakly lexicalized model, though training impedes agent-prediction accuracy due to an increased probability of non-canonical objects. |
Comparison to BabySRL | In sum, the unlexicalized model presented in this paper is able to achieve greater labelling accuracy than the lexicalized BabySRL models in intransitive settings, though this model does perform slightly worse in the less common transitive setting.
Discussion | This could also be an area where a lexicalized model could do better. |
Discussion | In future, it would be interesting to incorporate lexicalization into the model presented in this paper, as this feature seems likely to bridge the gap between this model and BabySRL in transitive settings. |
Discussion | Lexicalization should also help further distinguish modifiers from arguments and improve the overall accuracy of the model. |
Evaluation | Since the model is not lexicalized, these roles correspond to the semantic roles most commonly associated with subject and object.
Conclusions and Future Work | First, we defined two simple discourse-aware similarity metrics (lexicalized and un-lexicalized), which use the all-subtree kernel to compute similarity between discourse parse trees in accordance with the Rhetorical Structure Theory.
Experimental Results | As expected, DR-LEX performs better than DR since it is lexicalized (at the unigram level), and also gives partial credit to correct structures. |
Our Discourse-Based Measures | We experiment with TKs applied to two different representations of the discourse tree: non-lexicalized (DR), and lexicalized (DR-LEX). |
Generating from the KBGen Knowledge-Base | To generate from the KBGen data, we induce a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG, (Vijay-Shanker and Joshi, 1988)) augmented with a unification-based semantics (Gardent and Kallmeyer, 2003) from the training data.
Generating from the KBGen Knowledge-Base | 4.1 Feature-Based Lexicalised Tree Adjoining Grammar |
Generating from the KBGen Knowledge-Base | To extract a Feature-Based Lexicalised Tree Adjoining Grammar (FB-LTAG) from the KBGen data, we parse the sentences of the training corpus; project the entity and event variables to the syntactic projection of the strings they are aligned with; and extract the elementary trees of the resulting FB-LTAG from the parse tree using semantic information.
Introduction | While recent work has introduced increasingly powerful features (Feng and Hirst, 2012) and inference techniques (Joty et al., 2013), discourse relations remain hard to detect, due in part to a long tail of “alternative lexicalizations” that can be used to realize each relation (Prasad et al., 2010).
Model | (2010) show that there is a long tail of alternative lexicalizations for discourse relations in the Penn Discourse Treebank, posing obvious challenges for approaches based on directly matching lexical features observed in the training data. |
Related Work | Prior learning-based work has largely focused on lexical, syntactic, and structural features, but the close relationship between discourse structure and semantics (Forbes-Riley et al., 2006) suggests that shallow feature sets may struggle to capture the long tail of alternative lexicalizations that can be used to realize discourse relations (Prasad et al., 2010; Marcu and Echihabi, 2002). |