Integrating Semantic Constraint into Surprisal | The factor A(wn, h) is essentially based on a comparison between the vector representing the current word wn and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
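The additive and multiplicative composition functions mentioned above can be sketched as follows; the vectors and their dimensionality are illustrative placeholders, not taken from any of the cited models.

```python
import numpy as np

# Toy word vectors; in the papers these would come from LDA or a
# co-occurrence space. Here they are invented 4-dimensional
# probability-like vectors, renormalised after composition.
w_prev = np.array([0.4, 0.3, 0.2, 0.1])   # history word
w_curr = np.array([0.1, 0.5, 0.3, 0.1])   # current word

def compose_additive(u, v):
    """Additive composition: normalised component-wise sum."""
    s = u + v
    return s / s.sum()

def compose_multiplicative(u, v):
    """Multiplicative composition: normalised component-wise product."""
    p = u * v
    return p / p.sum()

h_add = compose_additive(w_prev, w_curr)
h_mul = compose_multiplicative(w_prev, w_curr)
```

Either composed vector can then stand in for the prior context h when it is compared against the vector of the incoming word.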
Introduction | Expectations are represented by a vector of probabilities which reflects the likely location in semantic space of the upcoming word. |
Introduction | The model essentially integrates the predictions of an incremental parser (Roark 2001) together with those of a semantic space model (Mitchell and Lapata 2009). |
Method | Following Mitchell and Lapata (2009), we constructed a simple semantic space based on co-occurrence statistics from the BLLIP training set.
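A minimal sketch of building such a window-based co-occurrence space; the toy corpus and window size are invented for illustration (the paper used the BLLIP training set).

```python
from collections import Counter

# Hypothetical three-sentence corpus, stand-in for real training data.
corpus = ["the dog chased the cat",
          "the cat chased the mouse",
          "the dog ate the bone"]

window = 2
counts = {}
for sent in corpus:
    toks = sent.split()
    for i, w in enumerate(toks):
        # Count every word within `window` positions of w.
        for j in range(max(0, i - window), min(len(toks), i + window + 1)):
            if i != j:
                counts.setdefault(w, Counter())[toks[j]] += 1

# Each word is now represented by its vector of co-occurrence counts.
dog_vec = counts["dog"]
```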
Models of Processing Difficulty | As LSA is one of the best-known semantic space models, it comes as no surprise that it has been used to analyze semantic constraint.
Models of Processing Difficulty | Context is represented by a vector of probabilities which reflects the likely location in semantic space of the upcoming word. |
Models of Processing Difficulty | Importantly, composition models are not defined with a specific semantic space in mind; they could easily be adapted to LSA, or simple co-occurrence vectors, or more sophisticated semantic representations (e.g., Griffiths et al.
Experimental Setup | 4.2 Visual Semantic Spaces |
Experimental Setup | 4.3 Linguistic Semantic Spaces |
Introduction | This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space.
Introduction | We first test the effectiveness of our cross-modal semantic space on the so-called zero-shot learning task (Palatucci et al., 2009), which has recently been explored in the machine learning community (Frome et al., 2013; Socher et al., 2013). |
Introduction | We show that the induced cross-modal semantic space is powerful enough that sensible guesses about the correct word denoting an object can be made, even when the linguistic context vector representing the word has been created from as little as 1 sentence containing it. |
Related Work | Most importantly, by projecting visual representations of objects into a shared semantic space, we do not limit ourselves to establishing a link between ob-
Related Work | (2013) focus on zero-shot learning in the vision-language domain by exploiting a shared visual-linguistic semantic space.
Related Work | (2013) learn to project unsupervised vector-based image representations onto a word-based semantic space using a neural network architecture. |
Zero-shot learning and fast mapping | Concretely, we assume that concepts, denoted for convenience by word labels, are represented in linguistic terms by vectors in a text-based distributional semantic space (see Section 4.3). |
Zero-shot learning and fast mapping | Objects corresponding to concepts are represented in visual terms by vectors in an image-based semantic space (Section 4.2). |
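As a rough illustration of this zero-shot setup, the sketch below uses a least-squares linear map in place of the neural projection described in the papers; all vectors and dimensions are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in: 6 concepts with paired image- and text-space
# vectors related by an (assumed) linear map; the last concept is
# held out as the "zero-shot" case.
d_img, d_txt = 4, 3
W_true = rng.normal(size=(d_img, d_txt))
X_img = rng.normal(size=(6, d_img))        # image-space vectors
Y_txt = X_img @ W_true                     # text-space vectors

X_train, Y_train = X_img[:5], Y_txt[:5]
x_unseen = X_img[5]

# Least-squares linear projection, a simple stand-in for the
# neural-network mapping used in the actual work.
W, *_ = np.linalg.lstsq(X_train, Y_train, rcond=None)

proj = x_unseen @ W                        # map the unseen image vector

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Zero-shot prediction: nearest text-space vector to the projection.
sims = [cos(proj, y) for y in Y_txt]
guess = int(np.argmax(sims))
```

Because the synthetic relation is exactly linear, the held-out concept is recovered; with real image and text vectors the mapping is only approximate.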
Application of the vector-space model | One of the advantages conferred by the quantification of semantic similarity is that lexical items can be considered precisely in relation to each other. By aggregating the similarity information for all items in the distribution, we can produce a visual representation of the structure of the semantic domain of the construction, observe how verbs in that domain are related to each other, and immediately identify the regions of the semantic space that are densely populated (with tight clusters of verbs) and those that are more sparsely populated (fewer and/or more scattered verbs).
Application of the vector-space model | Outside of these two clusters, the semantic space is much more sparsely populated. |
Application of the vector-space model | In sum, the semantic plots show that densely populated regions of the semantic space appear to be the most likely to attract new members. |
Distributional measure of semantic similarity | The resulting matrix, which contains the distributional information (in 4,683 columns) for 92 verbs occurring in the hell-construction, constitutes the semantic space under consideration in this case study. |
Distributional measure of semantic similarity | Besides, using the same data presents the advantage that the distribution is modeled with the same semantic space in all time periods, which makes it easier to visualize changes. |
Introduction | Coverage relates to how the semantic domain of a construction is populated in the vicinity of a given target coinage, and in particular to the density of the semantic space.
Composition Models | The construction of the semantic space depends on the definition of linguistic context (e.g., neighbouring words can be documents or collocations), the number of components used (e.g., the k most frequent words in a corpus), and their values (e.g., as raw co-occurrence frequencies or ratios of probabilities).
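One common choice for the component values, ratios of observed to expected co-occurrence probability, clipped at zero (PPMI), can be sketched as follows; the toy count matrix is invented for illustration.

```python
import numpy as np

# Toy co-occurrence matrix: rows = target words, columns = context words.
counts = np.array([[10, 0, 2],
                   [ 0, 8, 1],
                   [ 3, 1, 5]], dtype=float)

total = counts.sum()
p_wc = counts / total                       # joint probabilities
p_w = p_wc.sum(axis=1, keepdims=True)       # target marginals
p_c = p_wc.sum(axis=0, keepdims=True)       # context marginals

# Log ratio of observed to expected co-occurrence probability;
# zero counts give -inf, which PPMI clips to 0.
with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0.0)
```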
Composition Models | A hypothetical semantic space is illustrated in Figure 1. |
Composition Models | A detailed treatment of existing semantic space models is outside the scope of the present paper.
Evaluation Setup | This change in the verb’s sense is equated to a shift in its position in semantic space.
Evaluation Setup | Model Parameters Irrespective of their form, all composition models discussed here are based on a semantic space for representing the meanings of individual words.
Evaluation Setup | The semantic space we used in our experiments was built on a lemmatised version of the BNC. |
Introduction | Moreover, the vector similarities within such semantic spaces have been shown to substantially correlate with human similarity judgments (McDonald, 2000) and word association norms (Denhière and Lemaire, 2004).
Related Work | Figure 1: A hypothetical semantic space for horse and run
Machine Learning Method | We represent the semantic space for verbs as a matrix of frequencies, where each row corresponds to a Levin verb and each column represents a given feature. |
Machine Learning Method | We construct a semantic space with each feature set. |
Machine Learning Method | For instance, the semantic space with CO features contains over one million columns, which is too large and cumbersome to process.
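A feature-selection step of the kind implied here, keeping only the k most frequent columns, might look like the following sketch; the matrix is randomly generated, and the sizes (10,000 raw features, cutoff k = 500) are arbitrary stand-ins for the over-one-million-column CO space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical verb-by-feature frequency matrix: 92 verbs (as in the
# study) by 10,000 synthetic co-occurrence features.
M = rng.poisson(0.5, size=(92, 10_000))

# Keep only the k most frequent features to make the space tractable.
k = 500
col_totals = M.sum(axis=0)
top_k = np.argsort(col_totals)[-k:]
M_reduced = M[:, top_k]
```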
Composition methods | Distributional semantic models (DSMs), also known as vector-space models, semantic spaces, or by the names of famous incarnations such as Latent Semantic Analysis or Topic Models, approximate the meaning of words with vectors that record their patterns of co-occurrence with corpus context features (often, other words).
Experimental setup | tion is that a vector, in order to be a good representation of the meaning of the corresponding word, should lie in a region of semantic space populated by intuitively similar meanings, e.g., we are more likely to have captured the meaning of car if the NN of its vector is the automobile vector rather than potato.
Experimental setup | All 900 derived vectors from the test set were matched with their three closest NNs in our semantic space (see Section 4.2), thus producing a set of 2,700 word pairs.
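Nearest-neighbour retrieval by cosine similarity, as used in this kind of evaluation, can be sketched as follows; the vocabulary and vectors are random placeholders, so no claim is made about which neighbours come out.

```python
import numpy as np

rng = np.random.default_rng(2)
vocab = ["car", "automobile", "truck", "potato", "carrot", "engine"]
V = rng.normal(size=(len(vocab), 10))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-length rows

def nearest_neighbours(word, n=3):
    """Return the n words closest to `word` by cosine similarity."""
    i = vocab.index(word)
    sims = V @ V[i]                 # cosine, since rows are normalised
    order = np.argsort(-sims)       # most similar first
    return [vocab[j] for j in order if j != i][:n]

nns = nearest_neighbours("car")     # three closest NNs, word excluded
```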
Experimental setup | Most steps of the semantic space construction and composition pipelines were implemented using
Conclusion | (2012) reconstruct phrase tables based on phrase similarity scores in semantic space.
Introduction | Recent work on grounding language in vision shows that it is possible to represent images and linguistic expressions in a common vector-based semantic space (Frome et al., 2013; Socher et al., 2013). |
Introduction | Translation is another potential application of the generation framework: Given a semantic space shared between two or more languages, one can compose a word sequence in one language and generate translations in another, with the shared semantic vector space functioning as interlingua. |
Noun phrase translation | Creation of cross-lingual vector spaces A common semantic space is required in order to map words and phrases across languages. |
Noun phrase translation | Cross-lingual decomposition training Training proceeds as in the monolingual case, this time concatenating the training data sets and estimating a single (de)-composition function for the two languages in the shared semantic space.
Experimental Results | We compared JNNSE(Brain+Text) and NNSE(Text) models by measuring the correlation of all pairwise distances in JNNSE(Brain+Text) and NNSE(Text) space to the pairwise distances in the 218-dimensional semantic space.
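Comparing two spaces by correlating their pairwise distances can be sketched like this; the two spaces below are synthetic (one a noisy copy of the other), not the JNNSE or behavioural data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 15
A = rng.normal(size=(n, 10))                # one model's space
B = A + 0.1 * rng.normal(size=(n, 10))      # a slightly perturbed space

def pairwise(X):
    """All pairwise Euclidean distances, flattened (upper triangle)."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return d[np.triu_indices(len(X), k=1)]

def distance_correlation(X, Y):
    """Pearson correlation between the two spaces' distance profiles."""
    return np.corrcoef(pairwise(X), pairwise(Y))[0, 1]

r = distance_correlation(A, B)
```

A high correlation indicates that the two spaces place the same items close together and far apart, even if their dimensionalities differ.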
Experimental Results | Figure 1: Correlation of JNNSE(Brain+Text) and NNSE(Text) models with the distances in a semantic space constructed from behavioral data. |
Experimental Results | screwdriver and hammer) are closer in semantic space than words in different word categories, which makes some 2 vs. 2 tests more difficult than others. |
NonNegative Sparse Embedding | The sparse and nonnegative representation in A produces a more interpretable semantic space , where interpretability is quantified with a behavioral task (Chang et al., 2009; Murphy et al., 2012a). |
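A generic nonnegative factorisation (the classic Lee–Seung multiplicative updates), not the authors' actual sparse-coding algorithm, gives the flavour of how a nonnegative representation A is obtained from a word-by-context matrix:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy nonnegative word-by-context matrix to factor as X ≈ A @ D;
# sizes and data are synthetic stand-ins.
X = rng.random((20, 12))
k = 4
A = rng.random((20, k)) + 0.1   # word representations (kept nonnegative)
D = rng.random((k, 12)) + 0.1   # basis / dictionary

# Multiplicative NMF updates preserve nonnegativity by construction.
for _ in range(200):
    A *= (X @ D.T) / (A @ D @ D.T + 1e-9)
    D *= (A.T @ X) / (A.T @ A @ D + 1e-9)

rel_err = np.linalg.norm(X - A @ D) / np.linalg.norm(X)
```

Nonnegativity means each row of A is a set of additive, non-cancelling feature weights, which is what makes the resulting dimensions easier to interpret.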
Conclusion | Further experiments and analysis support our hypothesis that bilingual signals are a useful tool for learning distributed representations by enabling models to abstract away from monolingual surface realisations into a deeper semantic space . |
Experiments | This setting causes words from all languages to be embedded in a single semantic space . |
Experiments | These results further support our hypothesis that the bilingual contrastive error function can learn semantically plausible embeddings and furthermore, that it can abstract away from monolingual surface realisations into a shared semantic space across languages. |
Introduction | Unlike most methods for learning word representations, which are restricted to a single language, our approach learns to represent meaning across languages in a shared multilingual semantic space . |
Related Work | (2013), that learn embeddings across a large variety of languages and models such as ours, that learn joint embeddings, that is a projection into a shared semantic space across multiple languages. |
CMSMs Encode Symbolic Approaches | From the perspective of our compositionality framework, those approaches employ a group (or pre-group) (G, ·) as semantical space S, where the group operation (often written as multiplication) is used as composition operation ⋈.
Compositionality and Matrices | More formally, the underlying idea can be described as follows: given a mapping [[·]] : Σ → S from a set of tokens (words) Σ into some semantical space S (the elements of which we will simply call “meanings”), we find a semantic composition operation ⋈ : S* → S mapping sequences of meanings to meanings such that the meaning of a sequence of tokens σ1σ2 .
Compositionality and Matrices | the semantical space consists of quadratic matrices, and the composition operator ⋈ coincides with matrix multiplication as introduced in Section 2.
Compositionality and Matrices | This way, abstracting from specific initial mental state vectors, our semantic space S can be seen as a function space of mental transformations represented by matrices, whereby matrix multiplication realizes subsequent execution of those transformations triggered by the input token sequence. |
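Composition by matrix multiplication, as described above, reduces to folding the token sequence through repeated matrix products; the token matrices below are random placeholders, not learned transformations.

```python
import numpy as np

rng = np.random.default_rng(4)

# Each token is assigned a matrix (a mental-state transformation);
# composing a sequence is just multiplying the matrices in order.
d = 3
M = {w: rng.normal(size=(d, d)) for w in ["very", "tall", "tree"]}

def compose(tokens):
    out = np.eye(d)                 # identity = empty sequence
    for t in tokens:
        out = out @ M[t]            # subsequent execution of transformations
    return out

seq = compose(["very", "tall", "tree"])

# Applying the composed transformation to an initial mental-state vector:
v0 = np.ones(d)
final_state = v0 @ seq
```

Because matrix multiplication is associative, the same final state results however the sequence is bracketed, which is what lets the model process token streams incrementally.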
Conclusions and Future Work | the semantic space in one language to the other. |
Experiments | Given a phrase pair (s, t), the BRAE model first obtains their semantic phrase representations (ps, pt), and then transforms ps into the target semantic space as ps*, and pt into the source semantic space as pt*.
Introduction | Furthermore, a transformation function between the Chinese and English semantic spaces can be learned as well. |
Related Work | Although we also follow the composition-based phrase embedding, we are the first to focus on the semantic meanings of the phrases and propose a bilingually-constrained model to induce the semantic information and learn transformation of the semantic space in one language to the other. |
Conclusion | Add to this that, beyond the standard lexical semantics challenges we tested here, predict models are currently being successfully applied in cutting-edge domains such as representing phrases (Mikolov et al., 2013c; Socher et al., 2012) or fusing language and vision in a common semantic space (Frome et al., 2013; Socher et al., 2013).
Evaluation materials | (2012) with a method that is in the spirit of the predict models, but lets synonymy information from WordNet constrain the learning process (by favoring solutions in which WordNet synonyms are near in semantic space ). |
Evaluation materials | Systems are evaluated in terms of proportion of questions where the nearest neighbour from the whole semantic space is the correct answer (the given example and test vector triples are excluded from the nearest neighbour search). |
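This evaluation criterion, nearest neighbour over the whole space with the given example and test vectors excluded, can be sketched on a hand-built toy space in which the expected analogy holds by construction:

```python
import numpy as np

vocab = ["man", "woman", "king", "queen", "apple"]
# Invented toy vectors arranged so that king - man + woman lands at queen.
V = np.array([[1.0, 0.0, 0.0],    # man
              [0.0, 1.0, 0.0],    # woman
              [1.0, 0.0, 1.0],    # king
              [0.0, 1.0, 1.0],    # queen
              [0.3, 0.3, 0.0]])   # apple (distractor)

def analogy(a, b, c):
    """Nearest neighbour to vec(b) - vec(a) + vec(c), with the three
    given words excluded from the search, as in the evaluation."""
    ia, ib, ic = (vocab.index(w) for w in (a, b, c))
    target = V[ib] - V[ia] + V[ic]
    sims = (V @ target) / (np.linalg.norm(V, axis=1) * np.linalg.norm(target))
    for j in np.argsort(-sims):
        if j not in (ia, ib, ic):
            return vocab[j]

answer = analogy("man", "woman", "king")
```

A question counts as correct only when this excluded-neighbour search returns the gold answer, so dense distractor regions of the space directly lower the score.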