Index of papers in Proc. ACL that mention
  • word embeddings
Huang, Eric and Socher, Richard and Manning, Christopher and Ng, Andrew
Abstract
We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word.
Experiments
Table 2: Nearest neighbors of word embeddings learned by our model using the multi-prototype approach based on cosine similarity.
Experiments
Table 3: Spearman's ρ correlation on WordSim-353, showing our model's improvement over previous neural models for learning word embeddings.
Experiments
C&W* denotes the word embeddings trained and provided by C&W.
Global Context-Aware Neural Language Model
C_{s,d} = Σ_{w∈V} max(0, 1 − g(s, d) + g(s^w, d))   (1), where s^w denotes s with its last word replaced by w. Collobert and Weston (2008) showed that this ranking approach can produce good word embeddings that are useful in several NLP tasks, and allows much faster training of the model compared to optimizing log-likelihood of the next word.
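As a reading aid, here is a minimal Python sketch of this ranking (hinge) loss; the function and variable names are illustrative assumptions, not code from the paper.

    import numpy as np

    def ranking_loss(score_true, scores_corrupted):
        # score_true: g(s, d) for the observed word sequence s in document d.
        # scores_corrupted: g(s^w, d) for s with its last word replaced by each w in V.
        margins = 1.0 - score_true + np.asarray(scores_corrupted, dtype=float)
        return float(np.sum(np.maximum(0.0, margins)))

    # Toy usage: the true sequence should outscore each corruption by a margin of 1.
    print(ranking_loss(2.5, [0.3, 1.1, 2.9]))  # 1.4 -- only the 2.9 corruption violates the margin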
Global Context-Aware Neural Language Model
where [x_1, x_2, ..., x_m] is the concatenation of the m word embeddings representing sequence s, f is an element-wise activation function such as tanh, a_1 ∈ R^{h×1} is the activation of the hidden layer with h hidden nodes, W_1 ∈ R^{h×(mn)} and W_2 ∈ R^{1×h} are respectively the first and second layer weights of the neural network, and b_1, b_2 are the biases of each layer.
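A minimal sketch of this scoring network under the shapes above; the specific dimensions chosen here are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, h = 5, 50, 100                    # context length, embedding size, hidden nodes
    x = rng.standard_normal(m * n)          # [x_1; x_2; ...; x_m], concatenated embeddings
    W1 = rng.standard_normal((h, m * n)); b1 = np.zeros(h)
    W2 = rng.standard_normal((1, h));     b2 = np.zeros(1)

    a1 = np.tanh(W1 @ x + b1)               # hidden activation a_1 = f(W_1 x + b_1)
    score = W2 @ a1 + b2                    # scalar score of the local word sequence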
Global Context-Aware Neural Language Model
For the score of the global context, we represent the document also as an ordered list of word embeddings, d = (d_1, d_2, ..., d_k).
Related Work
Our model uses a neural network architecture similar to these models and uses the ranking-loss training objective proposed by Collobert and Weston (2008), but introduces a new way to combine local and global context to train word embeddings.
Related Work
Besides language modeling, word embeddings induced by neural language models have been useful in chunking, NER (Turian et al., 2010), parsing (Socher et al., 2011b), sentiment analysis (Socher et al., 2011c) and paraphrase detection (Socher et al., 2011a).
word embeddings is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Liu, Shujie and Li, Mu and Zhou, Ming and Zong, Chengqing
Bilingually-constrained Recursive Auto-encoders
After learning word embeddings with a DNN (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013), each word in the vocabulary V corresponds to a vector x ∈ R^n, and all the vectors are stacked into an embedding matrix L ∈ R^{n×|V|}.
Bilingually-constrained Recursive Auto-encoders
Since word embeddings for the two languages are learned separately and lie in different vector spaces, we do not enforce the phrase embeddings in the two languages to be in the same semantic vector space.
Bilingually-constrained Recursive Auto-encoders
• L: word embedding matrix L for two languages (Section 3.1.1);
Introduction
The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules.
Introduction
Kalchbrenner and Blunsom (2013) utilize a simple convolution model to generate phrase embeddings from word embeddings.
Related Work
One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013).
Related Work
Instead, our bilingually-constrained recursive auto-encoders not only learn the composition mechanism of generating phrases from words, but also fine-tune the word embeddings during the model training stage, so that we can induce the full information of the phrases and internal words.
word embeddings is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Abstract
In this paper, we present a method that learns word embedding for Twitter sentiment classification.
Abstract
We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words.
Abstract
To obtain large scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons.
Introduction
Accordingly, it is a crucial step to learn the word representation (or word embedding), which is a dense, low-dimensional and real-valued vector for a word.
Introduction
Although existing word embedding learning algorithms (Collobert et al., 2011; Mikolov et al., 2013) are intuitive choices, they are not effective enough if directly used for sentiment classification.
Introduction
In this paper, we propose learning sentiment-specific word embedding (SSWE) for sentiment analysis.
word embeddings is mentioned in 46 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Abstract
To overcome this limitation, we encourage agreement between the two directional models by introducing a penalty function that ensures word embedding consistency across two directional models during training.
Introduction
Specifically, our training encourages word embeddings to be consistent across alignment directions by introducing into the objective function a penalty term that expresses the difference between the embeddings of words.
RNN-based Alignment Model
In the lookup layer, each of these words is converted to its word embedding, and then the concatenation of the two embeddings is fed to the hidden layer in the same manner as in the FFNN-based model.
Related Work
First, the lookup layer converts each input word into its word embedding by looking up its corresponding column in the embedding matrix (L), and then concatenates them.
Related Work
Word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words (Bengio et al., 2003).
Training
Concretely, this constraint enforces agreement between the word embeddings of the two directions.
Training
The proposed method trains two directional models concurrently based on the following objective by incorporating a penalty term that expresses the difference between word embeddings:
Training
where θ_{FE} (or θ_{EF}) denotes the weights of the layers in the source-to-target (or target-to-source) alignment model, θ_L denotes the weights of the lookup layer, i.e., the word embeddings, and α is a parameter that controls the strength of the agreement constraint.
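A minimal sketch of how such a penalized joint objective can be computed; the function and argument names are assumptions for illustration, not the authors' code.

    import numpy as np

    def joint_objective(loss_fe, loss_ef, emb_fe, emb_ef, alpha=0.1):
        # loss_fe / loss_ef: losses of the source-to-target and target-to-source models;
        # emb_fe / emb_ef: their lookup-layer (word embedding) matrices;
        # alpha: strength of the agreement constraint.
        penalty = float(np.sum((emb_fe - emb_ef) ** 2))
        return loss_fe + loss_ef + alpha * penalty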
word embeddings is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Nguyen, Thien Huu and Grishman, Ralph
Abstract
This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems.
Abstract
We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information.
Introduction
valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively.
Introduction
ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains.
Introduction
We explore the embedding-based features in a principled way and demonstrate that word embedding itself is also an effective representation for domain adaptation of RE.
Related Work
Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent.
Related Work
(2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
Related Work
(2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
word embeddings is mentioned in 30 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Introduction
Word embedding is a dense, low dimensional, real-valued vector.
Introduction
Word embedding is usually learnt from a large monolingual corpus first, and then fine-tuned for specific tasks.
Introduction
In their work, bilingual word embedding is trained to capture lexical translation information, and surrounding words are utilized to model context information.
Our Model
Word embedding x_t is integrated with
Our Model
The new history h_t is used for the future prediction, and updated with new information from word embedding x_t recurrently.
Our Model
Word embedding x_t is integrated as new input information in recurrent neural networks for each prediction, but in recursive neural networks, no additional input information is used except the two representation vectors of the child nodes.
Related Work
In their work, the initial word embedding is first trained with a huge monolingual corpus; then the word embedding is adapted and fine-tuned bilingually in a context-dependent DNN-HMM framework.
Related Work
Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance.
Related Work
In their work, not only the target word embedding is used as the input of the network, but also the embedding of the source word, which is aligned to the current target word.
word embeddings is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Abstract
This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Background
In this paper, we aim to identify hypernym-hyponym relations using word embeddings, which have been shown to preserve good properties for capturing the semantic relationship between words.
Introduction
This paper proposes a novel approach for semantic hierarchy construction based on word embeddings.
Introduction
Word embeddings, also known as distributed word representations, typically represent words with dense, low-dimensional and real-valued vectors.
Introduction
Word embeddings have been empirically shown to preserve linguistic regularities, such as the semantic relationship between words (Mikolov et al., 2013b).
Method
Then we elaborate on our proposed method composed of three major steps, namely, word embedding training, projection learning, and hypernym-hyponym relation identification.
Method
3.2 Word Embedding Training
Method
Various models for learning word embeddings have been proposed, including neural net language models (Bengio et al., 2003; Mnih and Hinton, 2008; Mikolov et al., 2013b) and spectral models (Dhillon et al., 2011).
word embeddings is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Andreas, Jacob and Klein, Dan
Abstract
Do continuous word embeddings encode any useful information for constituency parsing?
Abstract
We isolate three ways in which word embeddings might augment a state-of-the-art statistical parser: by connecting out-of-vocabulary words to known ones, by encouraging common behavior among related in-vocabulary words, and by directly providing features for the lexicon.
Abstract
Our results support an overall hypothesis that word embeddings import syntactic information that is ultimately redundant with distinctions learned from tree-banks in other ways.
Introduction
This paper investigates a variety of ways in which word embeddings might augment a constituency parser with a discrete state space.
Introduction
While word embeddings can be constructed directly from surface distributional statistics, as in LSA, more sophisticated tools for unsupervised extraction of word representations have recently gained popularity (Collobert et al., 2011; Mikolov et al., 2013a).
Introduction
(Turian et al., 2010) have been shown to benefit from the inclusion of word embeddings as features.
word embeddings is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
Abstract
We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences.
DNN structures for NLP
To apply a DNN to an NLP task, the first step is to transform a discrete word into its word embedding, a low-dimensional, dense, real-valued vector (Bengio et al., 2006).
DNN structures for NLP
Word embeddings often implicitly encode syntactic or semantic knowledge of the words.
DNN structures for NLP
Assuming a finite-sized vocabulary V, the word embeddings form an (L × |V|)-dimensional embedding matrix W_V, where L is a predetermined embedding length; mapping words to embeddings is done by simply looking up their respective columns in the embedding matrix W_V.
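A minimal sketch of this column lookup; the vocabulary and dimensions are toy assumptions.

    import numpy as np

    L_dim, vocab = 50, ["the", "cat", "sat"]
    W_V = np.random.default_rng(0).standard_normal((L_dim, len(vocab)))  # (L x |V|) matrix
    word_to_id = {w: i for i, w in enumerate(vocab)}

    def embed(word):
        return W_V[:, word_to_id[word]]     # look up the word's column, shape (L,)

    print(embed("cat").shape)               # (50,)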
Introduction
Most works convert atomic lexical entries into a dense, low-dimensional, real-valued representation, called a word embedding; each dimension represents a latent aspect of a word, capturing its semantic and syntactic properties (Bengio et al., 2006).
Introduction
Word embedding is usually first learned from a huge amount of monolingual text, and then fine-tuned with task-specific objectives.
Introduction
As we mentioned in the last paragraph, word embedding (trained with huge monolingual texts) has the ability to map a word into a vector space, in which similar words are near each other.
Related Work
Most methods using DNNs in NLP start with a word embedding phase, which maps words into fixed-length, real-valued vectors.
Related Work
(Titov et al., 2012) learns context-free cross-lingual word embeddings to facilitate cross-lingual information retrieval.
Training
Tunable parameters in the neural network alignment model include: word embeddings in the lookup table LT, parameters W_l and b_l for the linear transformations in the hidden layers of the neural network, and the distortion parameters of jump distance.
word embeddings is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
liu, lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector.
Introduction
We also integrate word embedding into the model by representing each word as a feature vector (Collobert and Weston, 2008).
Introduction
For the local feature vector h’ in Eq (5), we employ word embedding features as described in the following subsection.
Introduction
3.3 Word Embedding features for AdNN
word embeddings is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Distributed representations
Distributed word representations are called word embeddings.
Distributed representations
Word embeddings are typically induced using neural language models, which use neural networks as the underlying predictive model (Bengio, 2008).
Introduction
word embeddings using unsupervised approaches.
Supervised evaluation tasks
The word embeddings also required a scaling hyperparameter, as described in Section 7.2.
Unlabeled Data
For rare words, which are typically updated only 143 times per epoch, and given that our embedding learning rate was typically 1e-6 or 1e-7, this means that rare word embeddings will be concentrated around zero, instead of spread out randomly.
Unlabeled Data
7.2 Scaling of Word Embeddings
Unlabeled Data
The word embeddings, however, are real numbers that are not necessarily in a bounded range.
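One plausible reading of such a scaling step, sketched under the assumption that the hyperparameter is a target standard deviation for the otherwise unbounded embedding values; the exact scheme in the paper may differ.

    import numpy as np

    def scale_embeddings(E, sigma=0.1):
        # Rescale the embedding matrix so its values have standard deviation sigma.
        return sigma * E / np.std(E)

    E = np.random.default_rng(0).standard_normal((50, 1000))   # toy embedding matrix
    print(round(float(np.std(scale_embeddings(E))), 3))        # ~0.1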
word embeddings is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Xu, Liheng and Liu, Kang and Lai, Siwei and Zhao, Jun
Experiments
The dimension of the word embedding is n = 100, the convergence threshold ε = 10^{-7}, and the number of expanded seeds T = 40.
Experiments
In contrast, CONT exploits latent semantics of each word in context, and LEX takes advantage of word embedding, which is induced from global word co-occurrence statistics.
The Proposed Method
To capture the lexical semantic clue, each word is first converted into a word embedding, which is a continuous vector in which each dimension's value corresponds to a semantic or grammatical interpretation (Turian et al., 2010).
The Proposed Method
Learning large-scale word embeddings is very time-consuming (Collobert et al., 2011), so we employ a faster method, the Skip-gram model (Mikolov et al., 2013).
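A minimal sketch of training Skip-gram embeddings with gensim (assuming gensim 4.x, where the dimensionality argument is vector_size); the toy corpus is an illustration only.

    from gensim.models import Word2Vec

    sentences = [["great", "battery", "life"], ["battery", "drains", "fast"]]
    model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1 -> Skip-gram
    vec = model.wv["battery"]   # 100-dimensional embedding for "battery"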
The Proposed Method
3.2.1 Learning Word Embedding for Semantic Representation
word embeddings is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Abstract
We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Experiments
as described in §3.1 but conjoins them with the word identity rather than a word embedding.
Frame Identification with Embeddings
First, we extract the words in the syntactic context of runs; next, we concatenate their word embeddings as described in §2.2 to create an initial vector space representation.
Frame Identification with Embeddings
Formally, let x represent the actual sentence with a marked predicate, along with the associated syntactic parse tree; let our initial representation of the predicate context be g(x). Suppose that the word embeddings we start with are of dimension n. Then g is a function from a parsed sentence x to R^{nk}, where k is the number of possible syntactic context types.
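A minimal sketch of such a context representation: concatenate the n-dimensional embeddings of the words filling each of the k syntactic context slots, using zeros for empty slots (the slot handling and names are assumptions).

    import numpy as np

    def context_representation(context_words, embeddings, n):
        # context_words: one entry per syntactic context type (None if the slot is empty);
        # embeddings: dict mapping word -> n-dimensional vector; output lives in R^{n*k}.
        parts = [embeddings[w] if w in embeddings else np.zeros(n) for w in context_words]
        return np.concatenate(parts)

    emb = {"he": np.ones(3), "company": 2 * np.ones(3)}
    print(context_representation(["he", None, "company"], emb, 3))  # vector of length 9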
Frame Identification with Embeddings
So for example “He runs the company” could help the model disambiguate “He owns the company.” Moreover, since g(x) relies on word embeddings rather than word identities, information is shared between words.
Overview
We present a model that takes word embeddings as input and learns to identify semantic frames.
Overview
A word embedding is a distributed representation of meaning where each word is represented as a vector in R^n.
Overview
We use word embeddings to represent the syntactic context of a particular predicate instance as a vector.
word embeddings is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Conclusion
To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models.
Experiments
(2013), who published word embeddings across 100 languages, including all languages considered in this paper.
Experiments
While the classification experiments focused on establishing the semantic content of the sentence-level representations, we also want to briefly investigate the induced word embeddings.
Introduction
Such word embeddings are naturally richer representations than those of symbolic or discrete models, and have been shown to be able to capture both syntactic and semantic information.
Overview
We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings .
Related Work
In their simplest form, distributional information from large corpora can be used to learn embeddings, where the words appearing within a certain window of the target word are used to compute that word's embedding.
Related Work
(2011) further popularised using neural network architectures for learning word embeddings from large amounts of largely unlabelled data by showing the embeddings can then be used to improve standard supervised tasks.
word embeddings is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ji, Yangfeng and Eisenstein, Jacob
Experiments
Another way to construct B is to use neural word embeddings (Collobert and Weston, 2008).
Experiments
In this case, we can view the product Bv as a composition of the word embeddings, using the simple additive composition model proposed by Mitchell
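A minimal sketch of this view: if the columns of B are word embeddings and v is a bag-of-words count vector, then Bv is just the sum of the embeddings of the words in the span (toy vectors, illustrative only).

    import numpy as np

    emb = {"good": np.array([0.2, 0.5]), "movie": np.array([0.1, -0.3])}
    vocab = list(emb)                                  # ["good", "movie"]
    B = np.stack([emb[w] for w in vocab], axis=1)      # columns of B are word embeddings
    v = np.array([1, 1])                               # bag-of-words counts for the span
    print(B @ v)                                       # equals emb["good"] + emb["movie"]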
Experiments
We used the word embeddings from Collobert and Weston (2008) with dimension {25, 50, 100}.
word embeddings is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Ma, Ji and Zhang, Yue and Zhu, Jingbo
Experiments
This result illustrates that the ngram-level knowledge captures more complex interactions of the web text, which cannot be recovered by using only word embeddings.
Experiments
(2012), who found that using both the word embeddings and the hidden units of a trigram WRRBM as additional features for a CRF chunker yields larger improvements than using word embeddings only.
Related Work
(2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling.
Related Work
(2013) induce bilingual word embeddings for word alignment.
Related Work
In particular, we both use a nonlinear layer to model complex relations underlying word embeddings.
word embeddings is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Iyyer, Mohit and Enns, Peter and Boyd-Graber, Jordan and Resnik, Philip
Experiments
• LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
Recursive Neural Networks
The word-level vectors x_a and x_b come from a d × V dimensional word embedding matrix W_e, where V is the size of the vocabulary.
Recursive Neural Networks
Random: The most straightforward choice is to initialize the word embedding matrix W_e and composition matrices W_L and W_R randomly, such that without any training, representations for words and phrases are arbitrarily projected into the vector space.
Recursive Neural Networks
word2vec: The other alternative is to initialize the word embedding matrix W_e with values that reflect the meanings of the associated word types.
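A minimal sketch contrasting the two initialization choices described above; the vocabulary, dimensions, and the stand-in for pretrained word2vec vectors are assumptions.

    import numpy as np

    d, vocab = 300, ["taxes", "freedom", "healthcare"]
    rng = np.random.default_rng(0)

    # Option 1: random initialization of the d x V embedding matrix W_e.
    W_e_random = rng.uniform(-0.05, 0.05, size=(d, len(vocab)))

    # Option 2: initialize W_e from pretrained vectors (random stand-ins here).
    pretrained = {w: rng.standard_normal(d) for w in vocab}
    W_e_w2v = np.stack([pretrained[w] for w in vocab], axis=1)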
word embeddings is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Parikh, Ankur P. and Cohen, Shay B. and Xing, Eric P.
Abstract
The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract
Let V := {w_1, ..., w_ℓ, z_1, ..., z_H}, with w_i representing the word embeddings, and z_i representing the latent states of the bracketings.
Abstract
Word embeddings: As mentioned earlier, each w_i can be an arbitrary feature vector.
word embeddings is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Chen, Yanqing and Skiena, Steven
Related Work
proaches draft off of distributed word embeddings, which offer concise features reflecting the semantics of the underlying vocabulary.
Related Work
(2010) create powerful word embeddings by training on real and corrupted phrases, optimizing for the replaceability of words.
Related Work
(2012) demonstrates a powerful approach to English sentiment using word embeddings, which can easily be extended to other languages by training on appropriate text corpora.
word embeddings is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Convolutional Neural Networks with Dynamic k-Max Pooling
Word embeddings have size d = 4.
Experiments
The set of parameters comprises the word embeddings, the filter weights and the weights from the fully connected layers.
Experiments
The randomly initialised word embeddings are increased in length to a dimension of d = 60.
word embeddings is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Experiments
Instead of initialising the model with external word embeddings, we first train it on a large amount of data with the aim of overcoming the sparsity issues encountered in the previous experiment.
Experiments
In this phase only the reconstruction signal is used to learn word embeddings and transformation matrices.
Experiments
By learning word embeddings and composition matrices on more data, the model is likely to generalise better.
word embeddings is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: