Index of papers in Proc. ACL that mention
  • embeddings
Huang, Eric and Socher, Richard and Manning, Christopher and Ng, Andrew
Abstract
We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word.
Experiments
In this section, we first present a qualitative analysis comparing the nearest neighbors of our model’s embeddings with those of others, showing our embeddings better capture the semantics of words, with the use of global context.
Experiments
For all experiments, our models use 50-dimensional embeddings.
Experiments
The nearest neighbors of “market” that C&W’s embeddings give are more constrained by the syntactic constraint that words in plural form are only close to other words in plural form, whereas our model captures that the singular and plural forms of a word are similar in meaning.
Global Context-Aware Neural Language Model
C_{s,d} = Σ_{w∈V} max(0, 1 - g(s, d) + g(s^w, d))    (1)
Collobert and Weston (2008) showed that this ranking approach can produce good word embeddings that are useful in several NLP tasks, and allows much faster training of the model compared to optimizing the log-likelihood of the next word.
Global Context-Aware Neural Language Model
where [x_1, x_2, ..., x_m] is the concatenation of the m word embeddings representing sequence s, f is an element-wise activation function such as tanh, a_1 ∈ R^{h×1} is the activation of the hidden layer with h hidden nodes, W_1 ∈ R^{h×(md)} and W_2 ∈ R^{1×h} are respectively the first and second layer weights of the neural network, and b_1, b_2 are the biases of each layer.
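As a rough illustration of the scorer and the ranking objective in equation (1), here is a minimal NumPy sketch; the function and parameter names are illustrative, and the global-context argument d of the paper's g(s, d) is omitted.

import numpy as np

def score(word_embs, W1, b1, W2, b2):
    # Concatenate the m word embeddings of the sequence, apply a tanh hidden
    # layer, then a linear output layer, as described above.
    x = np.concatenate(word_embs)       # shape (m*d,)
    a1 = np.tanh(W1 @ x + b1)           # hidden activation, shape (h,)
    return (W2 @ a1 + b2).item()        # scalar score

def ranking_loss(true_seq, corrupted_seqs, params):
    # Hinge ranking loss: the observed sequence should outscore each corrupted
    # sequence (last word replaced) by a margin of 1.
    g_true = score(true_seq, *params)
    return sum(max(0.0, 1.0 - g_true + score(s, *params)) for s in corrupted_seqs)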
Global Context-Aware Neural Language Model
For the score of the global context, we represent the document also as an ordered list of word embeddings, d = (d_1, d_2, ..., d_k).
Multi-Prototype Neural Language Model
We present a way to use our learned single-prototype embeddings to represent each context window, which can then be used by clustering to perform word sense discrimination (Schutze, 1998).
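To make the sense-discrimination step concrete, a hedged sketch follows: each occurrence of a target word is represented by the average of its context-window embeddings and the occurrences are clustered. The paper's own setup may differ in details (context weighting, clustering algorithm); plain k-means from scikit-learn and all names here are assumptions.

import numpy as np
from sklearn.cluster import KMeans

def induce_senses(context_windows, emb, k=10):
    # context_windows: list of token lists around each occurrence of the word.
    # emb: dict token -> single-prototype embedding.  Assumes each window has
    # at least one word with a known embedding.  Returns one sense id per occurrence.
    vecs = np.array([np.mean([emb[w] for w in win if w in emb], axis=0)
                     for win in context_windows])
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs)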
embeddings is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Abstract
We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
Introduction
The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules.
Introduction
(2011) make the phrase embeddings capture the sentiment information.
Introduction
(2013a) enable the phrase embeddings to mainly capture the syntactic knowledge.
embeddings is mentioned in 34 sentences in this paper.
Topics mentioned in this paper:
Pei, Wenzhe and Ge, Tao and Chang, Baobao
Abstract
By exploiting tag embeddings and tensor-based transformation, MMTNN has the ability to model complicated interactions between tags and context characters.
Conventional Neural Network
The character embeddings are then stacked into an embedding matrix M ∈ R^{d×|C|}.
Conventional Neural Network
We will analyze the effect of character embeddings in more detail in Section 4.
Conventional Neural Network
The character embeddings extracted by the Lookup Table layer are then concatenated into a single vector a ∈ R^{H_1}, where H_1 = w · d is the size of Layer 1.
Introduction
between tags and context characters by exploiting tag embeddings and tensor-based transformation.
Max-Margin Tensor Neural Network
Similar to character embeddings, given a fixed-sized tag set T, the tag embeddings for tags are stored in a tag embedding matrix L ∈ R^{d×|T|}, where d is the dimensionality
Max-Margin Tensor Neural Network
of the vector space (the same as the character embeddings).
Max-Margin Tensor Neural Network
The tag embeddings start from a random initialization and can be automatically trained by back-propagation.
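A hedged sketch of the tag-embedding lookup just described; the tag set and the initialisation range are illustrative assumptions, not the paper's values.

import numpy as np

d = 50                                  # tag embedding dimensionality (same as characters)
tags = ["B", "M", "E", "S"]             # an illustrative tag set T
rng = np.random.default_rng(0)
L = rng.uniform(-0.01, 0.01, size=(d, len(tags)))   # randomly initialised tag embedding matrix

def tag_embedding(tag):
    # Column lookup in L; during training, gradients flow back into L, so the
    # tag embeddings are learned by back-propagation.
    return L[:, tags.index(tag)]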
embeddings is mentioned in 33 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Thien Huu and Grishman, Ralph
Abstract
This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems.
Abstract
We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information.
Introduction
valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively.
Introduction
ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains.
Introduction
More importantly, we show empirically that word embeddings and word clusters capture different information and their combination would further improve the adaptability of relation extractors.
Related Work
Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent.
Related Work
(2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
Related Work
(2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
embeddings is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Abstract
We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings.
Abstract
Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences.
Experiments
We also investigate the learned embeddings from a qualitative perspective in §5.4.
Experiments
All our embeddings have dimensionality d=128, with the margin set to m=d. Further, we use L2 regularization with λ=1 and step-size in {0.01, 0.05}.
Experiments
This task involves learning language independent embeddings which are then used for document classification across the English-German language pair.
Introduction
Such word embeddings are naturally richer representations than those of symbolic or discrete models, and have been shown to be able to capture both syntactic and semantic information.
Introduction
In this work, we extend this hypothesis to multilingual data and joint-space embeddings.
Overview
We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings.
embeddings is mentioned in 28 sentences in this paper.
Topics mentioned in this paper:
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Abstract
This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Background
In this paper, we aim to identify hypernym-hyponym relations using word embeddings, which have been shown to preserve good properties for capturing the semantic relationship between words.
Introduction
This paper proposes a novel approach for semantic hierarchy construction based on word embeddings.
Introduction
Word embeddings, also known as distributed word representations, typically represent words with dense, low-dimensional and real-valued vectors.
Introduction
Word embeddings have been empirically shown to preserve linguistic regularities, such as the semantic relationship between words (Mikolov et al., 2013b).
Method
Various models for learning word embeddings have been proposed, including neural net language models (Bengio et al., 2003; Mnih and Hinton, 2008; Mikolov et al., 2013b) and spectral models (Dhillon et al., 2011).
Method
(2013a) propose two log-linear models, namely the Skip-gram and CBOW model, to efficiently induce word embeddings .
Method
Therefore, we employ the Skip- gram model for estimating word embeddings in this study.
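For illustration only, a Skip-gram model can be estimated with the gensim toolkit roughly as below; the tooling, toy corpus and hyperparameters are assumptions, not the paper's setup.

from gensim.models import Word2Vec

# Toy tokenised corpus; in practice this would be the large raw corpus used for training.
sentences = [["semantic", "hierarchy", "construction"],
             ["word", "embeddings", "capture", "semantic", "relations"]]
model = Word2Vec(sentences, sg=1, vector_size=100, window=5, min_count=1)  # sg=1 selects Skip-gram
vector = model.wv["semantic"]    # the induced word embedding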
embeddings is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Andreas, Jacob and Klein, Dan
Abstract
Do continuous word embeddings encode any useful information for constituency parsing?
Abstract
We isolate three ways in which word embeddings might augment a state-of-the-art statistical parser: by connecting out-of-vocabulary words to known ones, by encouraging common behavior among related in-vocabulary words, and by directly providing features for the lexicon.
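The first strategy above (connecting out-of-vocabulary words to known ones) can be pictured with the following hedged sketch; the function and argument names are hypothetical and cosine similarity is an assumption about the similarity measure.

import numpy as np

def nearest_in_vocabulary(word, emb, vocab):
    # Replace an out-of-vocabulary word with its closest known word in the
    # embedding space, so the parser can treat it as that word.
    if word in vocab or word not in emb:
        return word
    v = emb[word]
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max((w for w in vocab if w in emb), key=lambda w: cosine(v, emb[w]))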
Abstract
Despite small gains on extremely small supervised training sets, we find that extra information from embeddings appears to make little or no difference to a parser with adequate training data.
Introduction
This paper investigates a variety of ways in which word embeddings might augment a constituency parser with a discrete state space.
Introduction
While word embeddings can be constructed directly from surface distributional statistics, as in LSA, more sophisticated tools for unsupervised extraction of word representations have recently gained popularity (Collobert et al., 2011; Mikolov et al., 2013a).
Introduction
(Turian et al., 2010) have been shown to benefit from the inclusion of word embeddings as features.
embeddings is mentioned in 38 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Abstract
We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking.
Distributed representations
Distributed word representations are called word embeddings.
Distributed representations
Word embeddings are typically induced using neural language models, which use neural networks as the underlying predictive model (Bengio, 2008).
Distributed representations
4.1 Collobert and Weston (2008) embeddings
Introduction
word embeddings using unsupervised approaches.
Introduction
(2009) about Collobert and Weston (2008) embeddings, given training improvements that we describe in Section 7.1.
embeddings is mentioned in 51 sentences in this paper.
Topics mentioned in this paper:
Labutov, Igor and Lipson, Hod
Abstract
Recently, with an increase in computing resources, it became possible to learn rich word embeddings from massive amounts of unlabeled data.
Abstract
However, some methods take days or weeks to learn good embeddings, and some are notoriously difficult to train.
Introduction
Moreover, we may already have on our hands embeddings for X and Y obtained from yet another (possibly unsupervised) task (C), in which X and Y are, for example, orthogonal.
Introduction
If the embeddings for task C happen to be learned from a much larger dataset, it would make sense to reuse task C embeddings, but adapt them for task A and/or task B.
Introduction
We will refer to task C and its embeddings as the source task and the source embeddings, and task A/B and its embeddings as the target task and the target embeddings.
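To make the source/target distinction concrete, here is a minimal sketch of one way to adapt source embeddings to a target task by regularising toward them; the objective, step rule and names are illustrative assumptions, not the paper's exact formulation.

import numpy as np

def adapt_embeddings_step(phi_target, phi_source, task_gradient, lr=0.1, lam=1.0):
    # One gradient step on a hedged adaptation objective: fit the target task
    # while keeping the target embeddings close to the source embeddings
    # (squared-distance regulariser; lam trades off reuse vs. adaptation).
    grad = task_gradient(phi_target) + 2.0 * lam * (phi_target - phi_source)
    return phi_target - lr * grad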
embeddings is mentioned in 38 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
DNN for word alignment
Words are converted to embeddings using the lookup table LT, and the concatenation of the embeddings is fed to a classic neural network with two hidden layers; the output of the network is our lexical translation score:
DNN structures for NLP
Word embeddings often implicitly encode syntactic or semantic knowledge of the words.
DNN structures for NLP
Assuming a finite-sized vocabulary V, word embeddings form an (L × |V|)-dimensional embedding matrix WV, where L is a predetermined embedding length; mapping words to embeddings is done by simply looking up their respective columns in the embedding matrix WV.
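A minimal sketch of that lookup, with a toy vocabulary (names and sizes are illustrative only).

import numpy as np

L = 50                                       # predetermined embedding length
vocab = {"the": 0, "house": 1, "is": 2}      # toy vocabulary V
WV = np.random.randn(L, len(vocab))          # (L x |V|) embedding matrix

def lookup(words):
    # Mapping words to embeddings is a column lookup in WV.
    return WV[:, [vocab[w] for w in words]]  # shape (L, number of words)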
DNN structures for NLP
After words have been transformed to their embeddings, they can be fed into subsequent classical network layers to model highly nonlinear relations:
Introduction
Based on the above analysis, in this paper, both the source-side and target-side words are first mapped to vectors via discriminatively trained word embeddings, and word pairs are scored by a multilayer neural network which takes rich contexts (surrounding words on both source and target sides) into consideration; an HMM-like distortion model is then applied on top of the neural network to characterize the structural aspects of bilingual sentences.
Related Work
(Titov et al., 2012) learns context-free cross-lingual word embeddings to facilitate cross-lingual information retrieval.
Training
Tunable parameters in the neural network alignment model include: word embeddings in the lookup table LT, parameters W^l, b^l for linear transformations in the hidden layers of the neural network, and distortion parameters s_d of jump distance d.
Training
Most parameters reside in the word embeddings.
Training
To get a good initial value, the usual approach is to pre-train the embeddings on a large monolingual corpus.
embeddings is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract
Then, latent states are generated for each bracket, and finally, the latent states at the yield of the bracketing parse tree generate the words of the sentence (in the form of embeddings).
Abstract
Let V := {w_1, ..., w_ℓ, z_1, ..., z_H}, with w_i representing the word embeddings, and z_i representing the latent states of the bracketings.
embeddings is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Introduction
Specifically, our training encourages word embeddings to be consistent across alignment directions by introducing into the objective function a penalty term that expresses the difference between the embeddings of words.
RNN-based Alignment Model
In the lookup layer, each of these words is converted to its word embedding, and then the concatenation of the two embeddings is fed to the hidden layer in the same manner as in the FFNN-based model.
Related Work
Word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words (Bengio et al., 2003).
Training
The constraint concretely enforces agreement in word embeddings of both directions.
Training
The proposed method trains two directional models concurrently based on the following objective by incorporating a penalty term that expresses the difference between word embeddings:
Training
where θ_FE (or θ_EF) denotes the weights of layers in a source-to-target (or target-to-source) alignment model, θ_L denotes the weights of a lookup layer, i.e., word embeddings, and α is a parameter that controls the strength of the agreement constraint.
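The agreement constraint can be pictured with a sketch like the following; the names and the exact form of the penalty (a squared Frobenius norm here) are assumptions, not the paper's definition.

import numpy as np

def joint_objective(loss_fe, loss_ef, L_fe, L_ef, alpha):
    # Two directional alignment losses plus an agreement penalty on the
    # difference between the two lookup layers (the two sets of word embeddings).
    return loss_fe + loss_ef + alpha * np.sum((L_fe - L_ef) ** 2)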
embeddings is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Abstract
We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Experiments
The second baseline tries to decouple the WSABIE training from the embedding input, and trains a log-linear model using the embeddings.
Experiments
Hyperparameters For our frame identification model with embeddings, we search for the WSABIE hyperparameters using the development data.
Frame Identification with Embeddings
First, we extract the words in the syntactic context of runs; next, we concatenate their word embeddings as described in §2.2 to create an initial vector space representation.
Frame Identification with Embeddings
Formally, let x represent the actual sentence with a marked predicate, along with the associated syntactic parse tree; let our initial representation of the predicate context be g(x). Suppose that the word embeddings we start with are of dimension n. Then g is a function from a parsed sentence x to R^{nk}, where k is the number of possible syntactic context types.
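A hedged sketch of one way to realise such a g(x): one n-dimensional block per syntactic context type, filled with the embedding of the word found in that slot (zeros if the slot is empty), concatenated into a vector in R^{nk}. The data structures and names here are assumptions for illustration.

import numpy as np

def g(context_words, emb, context_types, n):
    # context_words: dict mapping a syntactic context type to the observed word (if any).
    # emb: dict word -> n-dimensional embedding.  context_types: the k possible types.
    blocks = [emb.get(context_words.get(t), np.zeros(n)) for t in context_types]
    return np.concatenate(blocks)    # shape (n * k,)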
Frame Identification with Embeddings
So for example “He runs the company” could help the model disambiguate “He owns the company.” Moreover, since g(x) relies on word embeddings rather than word identities, information is shared between words.
Overview
We present a model that takes word embeddings as input and learns to identify semantic frames.
Overview
We use word embeddings to represent the syntactic context of a particular predicate instance as a vector.
embeddings is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Abstract
We use this model to learn high dimensional embeddings for sentences and evaluate them in a range of tasks, demonstrating that the incorporation of syntax allows a concise model to learn representations that are both effective and general.
Experiments
We conclude with some qualitative analysis to get a better idea of whether the combination of CCG and RAE can learn semantically expressive embeddings.
Experiments
We use word-vectors of size 50, initialized using the embeddings provided by Turian et al.
Experiments
Experiment 1: Semi-Supervised Training In the first experiment, we use the semi-supervised training strategy described previously and initialize our models with the embeddings provided by Turian et al.
embeddings is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
The embeddings of C&W (Collobert et al., 2011), word2vec, WVSA (Maas et al., 2011) and our models are trained with the same dataset and the same parameter setting.
Related Work
ReEmb(C&W) and ReEmb(w2v) stand for the use of embeddings learned from 10 million distant-supervised tweets with C&W and word2vec, respectively.
Related Work
Table 3: Macro-F1 on positive/negative classification of tweets with different word embeddings.
embeddings is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Abstract
This is problematic when features lack clear linguistic meaning as in embeddings or when the information is blended across features.
Introduction
First, features may lack clear linguistic interpretation as in distributional features or continuous vector embeddings of words.
Introduction
Our low-dimensional embeddings are tailored to the syntactic context of words (head, modifier).
Problem Formulation
By learning parameters U, V, and W that function well in dependency parsing, we also learn context-dependent embeddings for words and arcs.
Related Work
Word-level vector space embeddings have so far had limited impact on parsing performance.
Related Work
This framework enables us to learn new syntactically guided embeddings while also leveraging separately estimated word vectors as starting features, leading to improved parsing performance.
Results
For this purpose, we train a model with only a tensor component (such that it has to learn an accurate tensor) on the English dataset and obtain low-dimensional embeddings Uφ_w and Vφ_w for each word.
Results
The upper part shows our learned embeddings group words with similar syntactic behavior.
embeddings is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Experiments and Results
As we mentioned in Section 5, constructing phrase pair embeddings from word embeddings may not be suitable.
Experiments and Results
We first train the source and target word embeddings separately using large monolingual data, following (Collobert et al., 2011).
Experiments and Results
E_wms(s_i) and E_wmt(t_j) are the monolingual word embeddings, and E_wbs(s_i) and E_wbt(t_j) are the bilingual word embeddings.
Model Training
Back-propagation is performed along the tree structure, and the phrase pair embeddings of the leaf nodes are updated.
Phrase Pair Embedding
A simple approach to construct a phrase pair embedding is to use the average of the embeddings of the words in the phrase pair.
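One reading of that baseline, as a hedged sketch: average the word embeddings on each side of the phrase pair and concatenate the two averages (the paper may combine the sides differently; names are illustrative).

import numpy as np

def average_phrase_pair_embedding(src_phrase, tgt_phrase, emb_src, emb_tgt):
    # Average the source-side and target-side word embeddings, then concatenate.
    src = np.mean([emb_src[w] for w in src_phrase], axis=0)
    tgt = np.mean([emb_tgt[w] for w in tgt_phrase], axis=0)
    return np.concatenate([src, tgt])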
Phrase Pair Embedding
We use a recurrent neural network to generate two smoothed translation confidence scores based on source and target word embeddings.
Related Work
Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance.
Related Work
RNNLM (Mikolov et al., 2010) is first used to generate the source and target word embeddings, which are fed into a one-hidden-layer neural network to get a translation confidence score.
embeddings is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Srivastava, Shashank and Hovy, Eduard
Abstract
Hellinger PCA embeddings learnt using the framework show competitive results on empirical tasks.
Introduction
While word embeddings and language models from such methods have been useful for tasks such as relation classification, polarity detection, event coreference and parsing, much of the existing literature on composition is based on abstract linguistic theory and conjecture, and there is little evidence to support that learnt representations for larger linguistic units correspond to their semantic meanings.
Introduction
While this framework is attractive in the lack of assumptions on representation that it makes, the use of distributional embeddings for individual tokens means
Introduction
Recent work (Lebret and Lebret, 2013) has shown that the Hellinger distance is an especially effective measure in learning distributional embeddings, with Hellinger PCA being much more computationally inexpensive than neural language modeling approaches, while performing much better than standard PCA, and competitive with the state-of-the-art in downstream evaluations.
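A minimal sketch of the Hellinger PCA idea just mentioned (not the paper's code; names and the SVD-based projection are assumptions): row-normalise a word-by-context co-occurrence matrix into probabilities, take element-wise square roots so that Euclidean distance between rows is proportional to the Hellinger distance, and keep the leading principal directions.

import numpy as np

def hellinger_pca(cooc, dim=50):
    # cooc: word-by-context co-occurrence count matrix (rows assumed non-empty).
    P = cooc / cooc.sum(axis=1, keepdims=True)   # rows as probability distributions
    H = np.sqrt(P)                               # Hellinger transform
    H = H - H.mean(axis=0)                       # centre before PCA
    U, S, _ = np.linalg.svd(H, full_matrices=False)
    return U[:, :dim] * S[:dim]                  # low-dimensional word embeddings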
embeddings is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
This result illustrates that the ngram-level knowledge captures more complex interactions of the web text, which cannot be recovered by using only word embeddings.
Experiments
(2012), who found that using both the word embeddings and the hidden units of a trigram WRRBM as additional features for a CRF chunker yields larger improvements than using word embeddings only.
Related Work
(2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling.
Related Work
(2013) induce bilingual word embeddings for word alignment.
Related Work
(2013) investigate Chinese character embeddings for joint word segmentation and POS tagging.
embeddings is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Background
These generally consist of a projection layer that maps words, sub-word units or n-grams to high-dimensional embeddings; the latter are then combined component-wise with an operation such as summation.
Convolutional Neural Networks with Dynamic k-Max Pooling
Word embeddings have size d = 4.
Convolutional Neural Networks with Dynamic k-Max Pooling
The values in the embeddings w_i are parameters that are optimised during training.
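The section title above refers to dynamic k-max pooling; as a rough illustration of the underlying operation (not the paper's dynamic variant or its code), plain k-max pooling over a feature map can be sketched as:

import numpy as np

def k_max_pooling(feature_map, k):
    # For each feature row, keep the k largest values along the sentence axis,
    # preserving their original left-to-right order.
    idx = np.argsort(feature_map, axis=1)[:, -k:]   # positions of the k largest values
    idx.sort(axis=1)                                # restore original order
    return np.take_along_axis(feature_map, idx, axis=1)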
Experiments
The set of parameters comprises the word embeddings, the filter weights and the weights from the fully connected layers.
Experiments
As the dataset is rather small, we use lower-dimensional word vectors with d = 32 that are initialised with embeddings trained in an unsupervised way to predict contexts of occurrence (Turian et al., 2010).
Experiments
The randomly initialised word embeddings are increased in length to a dimension of d = 60.
embeddings is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Ji, Yangfeng and Eisenstein, Jacob
Experiments
Another way to construct B is to use neural word embeddings (Collobert and Weston, 2008).
Experiments
In this case, we can view the product Bv as a composition of the word embeddings , using the simple additive composition model proposed by Mitchell
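The additive-composition reading of Bv can be made concrete with a short sketch (illustrative names; assumes v is a bag-of-words count vector and the columns of B are word embeddings).

import numpy as np

def additive_composition(B, word_ids, vocab_size):
    # With the columns of B set to word embeddings, B @ v with a bag-of-words
    # count vector v equals the sum of the embeddings of the words that occur.
    v = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return B @ v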
Experiments
We used the word embeddings from Collobert and Weston (2008) with dimension {25, 50, 100}.
embeddings is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Silberer, Carina and Lapata, Mirella
Abstract
We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input.
Experimental Setup
Finally, we also compare to the word embeddings obtained using Mikolov et al.’s (2011) recurrent neural network based language model.
Experimental Setup
These were pre-trained on Broadcast news data (400M words) using the word2vec tool. We report results with the 640-dimensional embeddings as they performed best.
Introduction
We evaluate the embeddings it produces on two tasks, namely word similarity and categorization.
Results
This indicates that higher level embeddings may be beneficial to NLP tasks in general, not only to those requiring multimodal information.
embeddings is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Iyyer, Mohit and Enns, Peter and Boyd-Graber, Jordan and Resnik, Philip
Experiments
LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
Experiments
RNN2-(W2V) is initialized using word2vec embeddings and also includes annotated phrase labels in its training.
Recursive Neural Networks
The word2vec embeddings have linear relationships (e.g., the closest vectors to the average of
Where Compositionality Helps Detect Ideological Bias
Initializing the RNN W_e matrix with word2vec embeddings improves accuracy over random initialization by 1%.
embeddings is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Dyer, Chris and Smith, Noah A.
Model
The first is the representation matrix W, which encodes the real-valued embeddings for each word in the vocabulary.
Model
Backpropagation using (input, output) word tuples learns the values of W (the embeddings) and X (the output parameter matrix) that maximize the likelihood of y (i.e., the context words) conditioned on the input word.
Model
Given an input word w and a set of active variable values A (e.g., A = {state = MA}), we calculate the hidden layer h as the sum of these independent embeddings: h = w^T W_main + Σ_{a∈A} w^T W_a.
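That sum of independent embeddings can be sketched as follows (a hedged illustration with assumed data structures: W_main as a vocabulary-by-dimension array and W_vars as a dict of per-variable arrays of the same shape).

import numpy as np

def hidden_layer(word_id, active_vars, W_main, W_vars):
    # The word's main embedding plus its embedding under each active variable
    # value (e.g., "state=MA"), summed component-wise.
    h = W_main[word_id].copy()
    for a in active_vars:
        h += W_vars[a][word_id]
    return h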
embeddings is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: