Abstract | We propose a novel approach that integrates the distributional representation of multiple subsets of the MWP’s words. |
Background and Related Work | While previous work focused either on improving the quality of the distributional representations themselves or on their incorporation into more elaborate systems, we focus on the integration of the distributional representation of multiple LCs to improve the identification of inference relations between MWPs. |
Background and Related Work | Much work in recent years has concentrated on the relation between the distributional representations of composite phrases and the representations of their component subparts (Widdows, 2008; Mitchell and Lapata, 2010; Baroni and Zamparelli, 2010; Coecke et al., 2010).
Background and Related Work | Despite significant advances, previous work has mostly been concerned with highly compositional cases and does not address the distributional representation of predicates of varying degrees of compositionality. |
Conclusion | We have presented a novel approach to the distributional representation of multi-word predicates. |
Discussion | Much recent work subsumed under the title Compositional Distributional Semantics addressed the distributional representation of multi-word phrases (see Section 2). |
Discussion | A standard approach in CDS is to compose distributional representations by taking their vector sum: v_L = v_l1 + v_l2 + ... + v_ln and v_R = v_r1 + v_r2 + ... + v_rn (Mitchell and Lapata, 2010).
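The additive composition above (Mitchell and Lapata, 2010) can be sketched in a few lines. This is a minimal illustration with made-up toy vectors; the words and values are hypothetical, not taken from any trained model:

```python
import numpy as np

# Toy word vectors (illustrative values only, not from a trained model).
vecs = {
    "take": np.array([0.8, 0.1, 0.3]),
    "a":    np.array([0.0, 0.2, 0.1]),
    "walk": np.array([0.4, 0.7, 0.2]),
}

def compose_sum(words, vecs):
    """Additive composition: the phrase vector is the sum of its word vectors."""
    return np.sum([vecs[w] for w in words], axis=0)

v_phrase = compose_sum(["take", "a", "walk"], vecs)
```

The phrase representation is then compared or classified exactly like a single-word vector, which is what makes the additive approach so widely used as a baseline.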
Introduction | This heterogeneity of “take” is likely to have a negative effect on downstream systems that use its distributional representation.
Introduction | For instance, while “take” and “accept” are often considered lexically similar, the high frequency with which “take” participates in non-compositional MWPs is likely to push the two verbs’ distributional representations apart.
Introduction | This approach allows the classifier that uses the distributional representations to take into account the most relevant LCs in order to make the prediction. |
Our Proposal: A Latent LC Approach | We propose a method for addressing MWPs of varying degrees of compositionality through the integration of the distributional representation of multiple subsets of the predicate’s words (LCs). |
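The idea of representing a multi-word predicate through multiple word subsets (LCs) can be sketched as follows. This is only an illustration of subset enumeration with simple additive composition over toy vectors; the vector values and the choice of additive composition are assumptions, not the paper's actual model:

```python
from itertools import combinations

import numpy as np

# Toy word vectors for the MWP "take a walk" (illustrative values only).
vecs = {
    "take": np.array([0.8, 0.1]),
    "a":    np.array([0.0, 0.2]),
    "walk": np.array([0.4, 0.7]),
}

def lc_representations(words, vecs):
    """One additive representation per non-empty subset (candidate LC)
    of the predicate's words."""
    reps = {}
    for r in range(1, len(words) + 1):
        for subset in combinations(words, r):
            reps[subset] = np.sum([vecs[w] for w in subset], axis=0)
    return reps

reps = lc_representations(["take", "a", "walk"], vecs)
```

A downstream classifier could then weight these candidate-LC representations, letting it rely on the subset most relevant to the prediction.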
Experiments | (2012), learning distributed representations on the Europarl corpus and evaluating on documents from the Reuters RCV1/RCV2 corpora.
Experiments | We use the training data of the corpus to learn distributed representations across 12 languages. |
Experiments | In a third evaluation (Table 4), we apply the embeddings learnt with our models to a monolingual classification task, enabling us to compare with prior work on distributed representation learning.
Introduction | Distributed representations of words provide the basis for many state-of-the-art approaches to various problems in natural language processing today. |
Overview | Distributed representation learning describes the task of learning continuous representations for discrete objects. |
Overview | Such distributed representations allow a model to share meaning between similar words, and have been used to capture semantic, syntactic and morphological content (Collobert and Weston, 2008; Turian et al., 2010, inter alia). |
Overview | Some work has exploited this idea for transferring linguistic knowledge into low-resource languages or to learn distributed representations at the word level (Klementiev et al., 2012; Zou et al., 2013; Lauly et al., 2013, inter alia). |
Related Work | Distributed Representations Distributed representations can be learned through a number of approaches. |
Related Work | Tasks where the use of distributed representations has resulted in improvements include topic modelling (Blei et al., 2003) and named entity recognition (Turian et al., 2010; Collobert et al., 2011).
Related Work | Multilingual Representation Learning Most research on distributed representation induction has focused on single languages. |
Conventional Neural Network | The idea of distributed representations for symbolic data is one of the most important reasons why neural networks work well.
Experiment | Wang and Manning (2013) conduct an empirical study on the effect of nonlinearity and the results suggest that nonlinear models are highly effective only when distributed representation is used. |
Experiment | To explain why distributed representation captures more information than discrete features, we show in Table 4 the effect of character embeddings which are obtained from the lookup table of MMTNN after training. |
Experiment | Therefore, compared with discrete feature representations, distributed representation can capture the syntactic and semantic similarity between characters. |
Max-Margin Tensor Neural Network | To better model the tag-tag interaction given the context characters, our model uses distributed representations for tags instead of the traditional discrete symbolic representation.
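The notion of a tag lookup table can be sketched as follows. The tag set, embedding dimension, and concatenation into a pair feature are all hypothetical choices for illustration, not the MMTNN's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Word-segmentation tags (begin, middle, end, single) -- an assumed tag set.
tags = ["B", "M", "E", "S"]

# A lookup table maps each discrete tag to a dense vector, so the model can
# learn graded tag-tag interactions instead of treating tags as atomic symbols.
dim = 4
lookup = {t: rng.standard_normal(dim) for t in tags}

def tag_pair_features(prev_tag, cur_tag):
    """Concatenate the two tag embeddings to feed a scoring layer."""
    return np.concatenate([lookup[prev_tag], lookup[cur_tag]])

feat = tag_pair_features("B", "E")
```

Because the tag vectors are trained jointly with the rest of the network, similar tag transitions end up with similar feature vectors, which a discrete indicator representation cannot express.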
Introduction | Notable among the most effective distributional representations are the recent deep-learning approaches by Socher et al. |
Introduction | With such a working definition, contiguous motifs are likely to make distributional representations less noisy and also assist in disambiguating context. |
Introduction | Also, the lack of specificity ensures that such motifs are common enough to meaningfully influence distributional representation beyond single tokens. |
Introduction | Distributional representations of words have been successfully used in many language processing tasks such as entity set expansion (Pantel et al., 2009), part-of-speech (POS) tagging and chunking (Huang and Yates, 2009), ontology learning (Curran, 2005), computing semantic textual similarity (Besancon et al., 1999), and lexical inference (Kotlerman et al., 2012). |
Introduction | Consequently, the distributional representations of the word lightweight will differ considerably between the two domains. |
Introduction | The SVD smoothing in the first step reduces both the data sparseness in distributional representations of individual words and the dimensionality of the feature space, thereby enabling us to efficiently and accurately learn a prediction model using PLSR in the second step.
O \ | We first create a distributional representation for a word using the data from a single domain, and then learn a Partial Least Squares Regression (PLSR) model to predict the distribution of a word in a target domain given its distribution in a source domain.
Abstract | We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings. |
Introduction | Distributed representations of words have proved useful for a number of tasks. |
Overview | A word embedding is a distributed representation of meaning where each word is represented as a vector in R^n.
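With words as vectors in R^n, semantic relatedness is typically measured by cosine similarity. The embeddings below are made-up toy vectors chosen so that related words point in similar directions; they are not from any trained model:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings in R^3 (illustrative values only).
king  = np.array([0.90, 0.80, 0.10])
queen = np.array([0.85, 0.75, 0.20])
apple = np.array([0.10, 0.20, 0.90])

sim_related   = cosine(king, queen)
sim_unrelated = cosine(king, apple)
```

Because cosine similarity ignores vector length, it compares only direction, which is why it is the standard choice for comparing embeddings.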