Abstract | Do continuous word embeddings encode any useful information for constituency parsing?
Abstract | We isolate three ways in which word embeddings might augment a state-of-the-art statistical parser: by connecting out-of-vocabulary words to known ones, by encouraging common behavior among related in-vocabulary words, and by directly providing features for the lexicon.
Abstract | Our results support an overall hypothesis that word embeddings import syntactic information that is ultimately redundant with distinctions learned from treebanks in other ways.
Introduction | This paper investigates a variety of ways in which word embeddings might augment a constituency parser with a discrete state space. |
Introduction | While word embeddings can be constructed directly from surface distributional statistics, as in LSA, more sophisticated tools for unsupervised extraction of word representations have recently gained popularity (Collobert et al., 2011; Mikolov et al., 2013a). |
Introduction | Several NLP tasks (Turian et al., 2010) have been shown to benefit from the inclusion of word embeddings as features.
Abstract | This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Background | In this paper, we aim to identify hypernym-hyponym relations using word embeddings, which have been shown to preserve good properties for capturing the semantic relationship between words.
Introduction | This paper proposes a novel approach for semantic hierarchy construction based on word embeddings.
Introduction | Word embeddings, also known as distributed word representations, typically represent words with dense, low-dimensional and real-valued vectors.
Introduction | Word embeddings have been empirically shown to preserve linguistic regularities, such as the semantic relationship between words (Mikolov et al., 2013b). |
Method | Then we elaborate on our proposed method, which is composed of three major steps, namely word embedding training, projection learning, and hypernym-hyponym relation identification.
Method | 3.2 Word Embedding Training |
Method | Various models for learning word embeddings have been proposed, including neural net language models (Bengio et al., 2003; Mnih and Hinton, 2008; Mikolov et al., 2013b) and spectral models (Dhillon et al., 2011). |
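Below is a minimal sketch of the word embedding training step using gensim's Word2Vec in skip-gram mode; the toolkit, the toy corpus, and the hyperparameter values are illustrative assumptions, not the setup reported in the papers above.

```python
from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences (placeholder data)
sentences = [
    ["the", "dog", "chases", "the", "cat"],
    ["the", "cat", "chases", "the", "mouse"],
]

# skip-gram (sg=1) with 100-dimensional vectors; hyperparameters are illustrative
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1, epochs=50)

vec = model.wv["dog"]                        # the 100-dimensional embedding of "dog"
print(model.wv.most_similar("dog", topn=3))  # nearest neighbours in the toy space
```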
Introduction | A word embedding is a dense, low-dimensional, real-valued vector.
Introduction | Word embeddings are usually learned from a large monolingual corpus first, and then fine-tuned for specific tasks.
Introduction | In their work, bilingual word embeddings are trained to capture lexical translation information, and surrounding words are utilized to model context information.
Our Model | Word embedding x_t is integrated with the history h_{t-1} for each prediction.
Our Model | The new history h_t is used for the future prediction, and updated with new information from word embedding x_t recurrently.
Our Model | Word embedding x_t is integrated as new input information in recurrent neural networks for each prediction, but in recursive neural networks, no additional input information is used except the two representation vectors of the child nodes.
Related Work | In their work, an initial word embedding is first trained on a huge monolingual corpus, and then the word embedding is adapted and fine-tuned bilingually in a context-dependent DNN-HMM framework.
Related Work | Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance. |
Related Work | In their work, not only the target word embedding is used as input to the network, but also the embedding of the source word that is aligned to the current target word.
Abstract | This paper evaluates word embeddings and word clustering for adapting feature-based relation extraction systems.
Abstract | We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information. |
Introduction | valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively. |
Introduction | ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains. |
Introduction | We explore the embedding-based features in a principled way and demonstrate that word embedding itself is also an effective representation for domain adaptation of RE. |
Related Work | Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent. |
Related Work | (2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
Related Work | (2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
Abstract | To overcome this limitation, we encourage agreement between the two directional models by introducing a penalty function that ensures word embedding consistency across two directional models during training. |
Introduction | Specifically, our training encourages word embeddings to be consistent across alignment directions by introducing into the objective function a penalty term that expresses the difference between the word embeddings of the two directional models.
RNN-based Alignment Model | In the lookup layer, each of these words is converted to its word embedding, and then the concatenation of the two embeddings is fed to the hidden layer in the same manner as in the FFNN-based model.
Related Work | First, the lookup layer converts each input word into its word embedding by looking up its corresponding column in the embedding matrix (L), and then concatenates them. |
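A small sketch of the lookup-and-concatenate step described above: each word id selects a column of an embedding matrix L, the two embeddings are concatenated, and the result is passed through a hidden layer. The matrix and layer names are illustrative placeholders rather than the models' actual parameters.

```python
import numpy as np

n, V, hidden = 50, 10000, 100                 # embedding size, vocabulary size, hidden size
rng = np.random.default_rng(0)
L = rng.normal(scale=0.1, size=(n, V))        # embedding matrix: one column per word
W_h = rng.normal(scale=0.1, size=(hidden, 2 * n))
b_h = np.zeros(hidden)

def lookup_and_hidden(word_id_1, word_id_2):
    # lookup layer: fetch the embedding column of each word and concatenate
    concat = np.concatenate([L[:, word_id_1], L[:, word_id_2]])
    # hidden layer with a tanh non-linearity
    return np.tanh(W_h @ concat + b_h)

h = lookup_and_hidden(42, 137)                # a 100-dimensional hidden representation
```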
Related Work | Word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words (Bengio et al., 2003). |
Training | Concretely, the constraint enforces agreement between the word embeddings of the two directions.
Training | The proposed method trains two directional models concurrently based on the following objective by incorporating a penalty term that expresses the difference between word embeddings: |
Training | where θ_FE (or θ_EF) denotes the weights of the layers in the source-to-target (or target-to-source) alignment model, θ_L denotes the weights of the lookup layer, i.e., the word embeddings, and α is a parameter that controls the strength of the agreement constraint.
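A minimal sketch of such an agreement-constrained objective, assuming a squared-difference penalty between the embedding matrices of the two directional models; the loss terms are placeholders for the models' actual training losses.

```python
import numpy as np

def joint_objective(loss_fe, loss_ef, L_fe, L_ef, alpha):
    """loss_fe / loss_ef: training losses of the source-to-target and target-to-source
    models; L_fe / L_ef: their word embedding (lookup-layer) matrices."""
    agreement_penalty = np.sum((L_fe - L_ef) ** 2)   # squared difference of the embeddings
    return loss_fe + loss_ef + alpha * agreement_penalty
```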
Abstract | In this paper, we present a method that learns word embeddings for Twitter sentiment classification.
Abstract | We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. |
Abstract | To obtain large-scale training corpora, we learn the sentiment-specific word embedding from massive distantly-supervised tweets collected using positive and negative emoticons.
Introduction | Accordingly, it is a crucial step to learn the word representation (or word embedding), which is a dense, low-dimensional and real-valued vector for a word.
Introduction | Although existing word embedding learning algorithms (Collobert et al., 2011; Mikolov et al., 2013) are intuitive choices, they are not effective enough if directly used for sentiment classification.
Introduction | In this paper, we propose learning sentiment-specific word embedding (SSWE) for sentiment analysis. |
Bilingually-constrained Recursive Auto-encoders | After learning word embeddings with a DNN (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013), each word in the vocabulary V corresponds to a vector x ∈ R^n, and all the vectors are stacked into an embedding matrix L ∈ R^{n×|V|}.
Bilingually-constrained Recursive Auto-encoders | Since the word embeddings for the two languages are learned separately and lie in different vector spaces, we do not enforce the phrase embeddings of the two languages to be in the same semantic vector space.
Bilingually-constrained Recursive Auto-encoders | • L: word embedding matrix L for the two languages (Section 3.1.1);
Introduction | The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules. |
Introduction | Kalchbrenner and Blunsom (2013) utilize a simple convolution model to generate phrase embeddings from word embeddings.
Related Work | One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013). |
Related Work | Instead, our bilingually-constrained recursive auto-encoders not only learn the composition mechanism of generating phrases from words, but also fine-tune the word embeddings during the model training stage, so that we can induce the full information of the phrases and internal words.
Experiments | The dimension of the word embeddings is n = 100, the convergence threshold is 10^-7, and the number of expanded seeds is T = 40.
Experiments | In contrast, CONT exploits the latent semantics of each word in context, and LEX takes advantage of word embeddings, which are induced from global word co-occurrence statistics.
The Proposed Method | To capture the lexical semantic clue, each word is first converted into its word embedding, a continuous vector in which each dimension's value corresponds to a semantic or grammatical interpretation (Turian et al., 2010).
The Proposed Method | Learning large-scale word embeddings is very time-consuming (Collobert et al., 2011); we thus employ a faster method, the Skip-gram model (Mikolov et al., 2013).
The Proposed Method | 3.2.1 Learning Word Embedding for Semantic Representation |
Abstract | We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Experiments | as described in §3.1 but conjoins them with the word identity rather than a word embedding.
Frame Identification with Embeddings | First, we extract the words in the syntactic context of “runs”; next, we concatenate their word embeddings as described in §2.2 to create an initial vector space representation.
Frame Identification with Embeddings | Formally, let x represent the actual sentence with a marked predicate, along with the associated syntactic parse tree; let our initial representation of the predicate context be g(x). Suppose that the word embeddings we start with are of dimension n. Then g is a function from a parsed sentence x to R^{nk}, where k is the number of possible syntactic context types.
Frame Identification with Embeddings | So for example “He runs the company” could help the model disambiguate “He owns the company.” Moreover, since g(x) relies on word embeddings rather than word identities, information is shared between words.
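A sketch of how such a context representation might be assembled: for each of k syntactic context types, look up the embedding of the word filling that slot (or a zero vector if it is empty) and concatenate, giving a vector in R^{nk}. The slot names and the embedding table are hypothetical.

```python
import numpy as np

def context_representation(context_words, embeddings, slots, n):
    """context_words: mapping from syntactic slot to the word filling it (or None);
    embeddings: mapping from word to an n-dimensional vector."""
    parts = []
    for slot in slots:                        # fixed order over the k context types
        word = context_words.get(slot)
        vec = embeddings.get(word, np.zeros(n)) if word else np.zeros(n)
        parts.append(vec)
    return np.concatenate(parts)              # a vector of length n * k

# "He runs the company": context of the predicate "runs" (toy 3-dimensional embeddings)
slots = ["nsubj", "dobj"]
emb = {"He": np.ones(3), "company": np.full(3, 2.0)}
g_x = context_representation({"nsubj": "He", "dobj": "company"}, emb, slots, n=3)
print(g_x)                                    # 6-dimensional concatenation
```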
Overview | We present a model that takes word embeddings as input and learns to identify semantic frames. |
Overview | A word embedding is a distributed representation of meaning where each word is represented as a vector in R^n.
Overview | We use word embeddings to represent the syntactic context of a particular predicate instance as a vector. |
Conclusion | To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models. |
Experiments | (2013), who published word embeddings across 100 languages, including all languages considered in this paper. |
Experiments | While the classification experiments focused on establishing the semantic content of the sentence level representations, we also want to briefly investigate the induced word embeddings . |
Introduction | Such word embeddings are naturally richer representations than those of symbolic or discrete models, and have been shown to be able to capture both syntactic and semantic information. |
Overview | We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings . |
Related Work | In their simplest form, distributional information from large corpora can be used to learn embeddings, where the words appearing within a certain window of the target word are used to compute that word's embedding.
Related Work | (2011) further popularised using neural network architectures for learning word embeddings from large amounts of largely unlabelled data by showing the embeddings can then be used to improve standard supervised tasks. |
Experiments | Another way to construct B is to use neural word embeddings (Collobert and Weston, 2008). |
Experiments | In this case, we can view the product Bv as a composition of the word embeddings, using the simple additive composition model proposed by Mitchell
Experiments | We used the word embeddings from Collobert and Weston (2008) with dimension {25, 50, 100}. |
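The additive-composition reading of Bv can be made concrete with a short sketch: with one word embedding per column of B and a bag-of-words count vector v for the phrase, Bv is exactly the sum of the phrase's word embeddings. The vocabulary and the 25-dimensional random embeddings are toy values.

```python
import numpy as np

vocab = ["not", "very", "good"]
rng = np.random.default_rng(0)
B = rng.normal(size=(25, len(vocab)))         # one 25-dimensional embedding per column

def phrase_vector(words):
    v = np.zeros(len(vocab))
    for w in words:
        v[vocab.index(w)] += 1                # bag-of-words count vector for the phrase
    return B @ v                              # Bv equals the sum of the words' embeddings

assert np.allclose(phrase_vector(["very", "good"]), B[:, 1] + B[:, 2])
```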
Experiments | This result illustrates that the ngram-level knowledge captures more complex interactions of the web text, which cannot be recovered by using only word embeddings . |
Experiments | (2012), who found that using both the word embeddings and the hidden units of a trigram WRRBM as additional features for a CRF chunker yields larger improvements than using word embeddings only. |
Related Work | (2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling. |
Related Work | (2013) induce bilingual word embeddings for word alignment. |
Related Work | In particular, we both use a nonlinear layer to model complex relations underlying word embeddings.
Experiments | • LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
Recursive Neural Networks | The word-level vectors come from a d × V dimensional word embedding matrix W_e, where V is the size of the vocabulary.
Recursive Neural Networks | Random: The most straightforward choice is to initialize the word embedding matrix W_e and the composition matrices W_L and W_R randomly, such that without any training, representations for words and phrases are arbitrarily projected into the vector space.
Recursive Neural Networks | word2vec: The other alternative is to initialize the word embedding matrix W_e with values that reflect the meanings of the associated word types.
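A brief sketch of the two initialisation strategies for the embedding matrix W_e (d × V): purely random values versus copying pretrained word2vec vectors where available. The use of gensim's KeyedVectors and the file path are illustrative assumptions.

```python
import numpy as np
from gensim.models import KeyedVectors

def init_embeddings(vocab, d, pretrained_path=None, seed=0):
    rng = np.random.default_rng(seed)
    W_e = rng.uniform(-0.05, 0.05, size=(d, len(vocab)))  # random initialisation
    if pretrained_path is not None:
        kv = KeyedVectors.load_word2vec_format(pretrained_path, binary=True)
        for j, word in enumerate(vocab):
            if word in kv:                     # overwrite with the pretrained vector
                W_e[:, j] = kv[word][:d]       # assumes the pretrained dimension is >= d
    return W_e
```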
Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract | Let V := {w_1, ..., w_m, z_1, ..., z_H}, with w_i representing the word embeddings, and z_i representing the latent states of the bracketings.
Abstract | Word embeddings: As mentioned earlier, each w_i can be an arbitrary feature vector.
Related Work | approaches draft off of distributed word embeddings, which offer concise features reflecting the semantics of the underlying vocabulary.
Related Work | (2010) create powerful word embeddings by training on real and corrupted phrases, optimizing for the replaceability of words.
Related Work | (2012) demonstrates a powerful approach to English sentiment using word embeddings, which can easily be extended to other languages by training on appropriate text corpora.
Convolutional Neural Networks with Dynamic k-Max Pooling | Word embeddings have size d = 4. |
Experiments | The set of parameters comprises the word embeddings , the filter weights and the weights from the fully connected layers. |
Experiments | The randomly initialised word embeddings are increased in length to a dimension of d = 60. |