Abstract | We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish phrases with different semantic meanings.
Introduction | Models that use word embeddings as direct inputs to a DNN cannot make full use of the syntactic and semantic information of the phrasal translation rules.
Introduction | (2011) make the phrase embeddings capture the sentiment information. |
Introduction | (2013a) enable the phrase embeddings to mainly capture the syntactic knowledge. |
Abstract | This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Background | In this paper, we aim to identify hypernym-hyponym relations using word embeddings, which have been shown to preserve good properties for capturing semantic relationships between words.
Introduction | This paper proposes a novel approach for semantic hierarchy construction based on word embeddings.
Introduction | Word embeddings, also known as distributed word representations, typically represent words with dense, low-dimensional and real-valued vectors.
Introduction | Word embeddings have been empirically shown to preserve linguistic regularities, such as the semantic relationship between words (Mikolov et al., 2013b). |
Method | Various models for learning word embeddings have been proposed, including neural net language models (Bengio et al., 2003; Mnih and Hinton, 2008; Mikolov et al., 2013b) and spectral models (Dhillon et al., 2011). |
Method | (2013a) propose two log-linear models, namely the Skip-gram and CBOW models, to efficiently induce word embeddings.
Method | Therefore, we employ the Skip-gram model for estimating word embeddings in this study.
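Method | As a minimal illustration of estimating Skip-gram embeddings, the sketch below uses the gensim library; the toy corpus, vector size, and window are assumptions for illustration, not settings reported in the paper (gensim 4.x API).

    from gensim.models import Word2Vec

    # Toy corpus: each sentence is a list of tokens (placeholder data).
    corpus = [["semantic", "hierarchy", "construction"],
              ["word", "embeddings", "capture", "semantic", "relations"]]

    # sg=1 selects the Skip-gram architecture (sg=0 would give CBOW).
    model = Word2Vec(corpus, vector_size=100, window=5, min_count=1, sg=1)

    vector = model.wv["semantic"]                       # learned embedding for a word
    neighbours = model.wv.most_similar("semantic", topn=3)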
Abstract | We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings.
Abstract | Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. |
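Abstract | A minimal numpy sketch of the kind of margin-based objective described above: embeddings of aligned sentences are pulled together while a randomly sampled non-parallel sentence is pushed at least a margin m away (the vectors and the margin value here are illustrative assumptions).

    import numpy as np

    def margin_loss(a, b, n, m=1.0):
        """Hinge loss: the aligned pair (a, b) should be closer than (a, n) by margin m."""
        d_pos = np.sum((a - b) ** 2)   # distance to the aligned translation
        d_neg = np.sum((a - n) ** 2)   # distance to a sampled noise sentence
        return max(0.0, m + d_pos - d_neg)

    a = np.random.randn(128)   # embedding of an English sentence
    b = np.random.randn(128)   # embedding of its German translation
    n = np.random.randn(128)   # embedding of an unrelated German sentence
    print(margin_loss(a, b, n))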
Experiments | We also investigate the learned embeddings from a qualitative perspective in §5.4. |
Experiments | All our embeddings have dimensionality d = 128, with the margin set to m = d. Further, we use L2 regularization with λ = 1 and step-size in {0.01, 0.05}.
Experiments | This task involves learning language independent embeddings which are then used for document classification across the English-German language pair. |
Introduction | Such word embeddings are naturally richer representations than those of symbolic or discrete models, and have been shown to be able to capture both syntactic and semantic information. |
Introduction | In this work, we extend this hypothesis to multilingual data and joint-space embeddings.
Overview | We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings.
Abstract | By exploiting tag embeddings and tensor-based transformation, MMTNN has the ability to model complicated interactions between tags and context characters. |
Conventional Neural Network | The character embeddings are then stacked into an embedding matrix M ∈ R^(d×|C|).
Conventional Neural Network | We will analyze the effect of character embeddings in more detail in Section 4.
Conventional Neural Network | The character embeddings extracted by the Lookup Table layer are then concatenated into a single vector a ∈ R^(H1), where H1 = w · d is the size of Layer 1.
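Conventional Neural Network | As a rough numpy sketch of the lookup-and-concatenate step described above (the character vocabulary size, embedding dimension d, and window width w below are toy assumptions):

    import numpy as np

    d, vocab_size, w = 50, 5000, 5            # toy dimensions
    M = np.random.randn(d, vocab_size)        # embedding matrix: one column per character

    window_ids = [17, 42, 3, 99, 7]           # indices of the w characters in the window
    a = np.concatenate([M[:, i] for i in window_ids])   # Layer-1 input of size H1 = w * d
    assert a.shape == (w * d,)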
Introduction | between tags and context characters by exploiting tag embeddings and tensor-based transformation. |
Max-Margin Tensor Neural Network | Similar to character embeddings, given a fixed-sized tag set T, the tag embeddings for tags are stored in a tag embedding matrix L ∈ R^(d×|T|), where d is the dimensionality
Max-Margin Tensor Neural Network | of the vector space (same as character embeddings).
Max-Margin Tensor Neural Network | The tag embeddings start from a random initialization and can be automatically trained by back-propagation. |
Abstract | Do continuous word embeddings encode any useful information for constituency parsing? |
Abstract | We isolate three ways in which word embeddings might augment a state-of-the-art statistical parser: by connecting out-of-vocabulary words to known ones, by encouraging common behavior among related in-vocabulary words, and by directly providing features for the lexicon. |
Abstract | Despite small gains on extremely small supervised training sets, we find that extra information from embeddings appears to make little or no difference to a parser with adequate training data. |
Introduction | This paper investigates a variety of ways in which word embeddings might augment a constituency parser with a discrete state space. |
Introduction | While word embeddings can be constructed directly from surface distributional statistics, as in LSA, more sophisticated tools for unsupervised extraction of word representations have recently gained popularity (Collobert et al., 2011; Mikolov et al., 2013a). |
Introduction | (Turian et al., 2010) have been shown to benefit from the inclusion of word embeddings as features. |
Abstract | This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems. |
Abstract | We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information. |
Introduction | valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively. |
Introduction | ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains. |
Introduction | More importantly, we show empirically that word embeddings and word clusters capture different information and their combination would further improve the adaptability of relation extractors. |
Related Work | Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent. |
Related Work | (2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
Related Work | (2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
Introduction | Specifically, our training encourages word embeddings to be consistent across alignment directions by introducing into the objective function a penalty term that expresses the difference between the word embeddings of the two directional models.
RNN-based Alignment Model | In the lookup layer, each of these words is converted to its word embedding, and then the concatenation of the two embeddings is fed to the hidden layer in the same manner as in the FFNN-based model.
Related Work | Word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words (Bengio et al., 2003). |
Training | The constraint concretely enforces agreement in word embeddings of both directions. |
Training | The proposed method trains two directional models concurrently based on the following objective by incorporating a penalty term that expresses the difference between word embeddings: |
Training | where θ_FE (or θ_EF) denotes the weights of layers in a source-to-target (or target-to-source) alignment model, θ_L denotes the weights of a lookup layer, i.e., word embeddings, and α is a parameter that controls the strength of the agreement constraint.
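Training | Written out, the joint objective sketched above takes roughly the following form; this is a reconstruction for illustration using the notation above (with separate lookup weights θ_L^FE and θ_L^EF for the two directions), not the paper's exact formula.

    \min_{\theta_{FE},\,\theta_{EF},\,\theta_{L}^{FE},\,\theta_{L}^{EF}}\;
      \ell_{FE}\bigl(\theta_{FE},\theta_{L}^{FE}\bigr)
      + \ell_{EF}\bigl(\theta_{EF},\theta_{L}^{EF}\bigr)
      + \alpha \,\bigl\lVert \theta_{L}^{FE} - \theta_{L}^{EF} \bigr\rVert^{2}

Training | Here the two ℓ terms are the source-to-target and target-to-source alignment losses, and the last term penalizes disagreement between the two sets of word embeddings.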
Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract | Then, latent states are generated for each bracket, and finally, the latent states at the yield of the bracketing parse tree generate the words of the sentence (in the form of embeddings).
Abstract | Let V = {w_1, ..., w_N, z_1, ..., z_H}, with w_i representing the word embeddings, and z_i representing the latent states of the bracketings.
Abstract | We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Experiments | The second baseline tries to decouple the WSABIE training from the embedding input, and trains a log-linear model using the embeddings.
Experiments | Hyperparameters For our frame identification model with embeddings, we search for the WSABIE hyperparameters using the development data.
Frame Identification with Embeddings | First, we extract the words in the syntactic context of runs; next, we concatenate their word embeddings as described in §2.2 to create an initial vector space representation. |
Frame Identification with Embeddings | Formally, let x represent the actual sentence with a marked predicate, along with the associated syntactic parse tree; let our initial representation of the predicate context be g(x). Suppose that the word embeddings we start with are of dimension n. Then g is a function from a parsed sentence x to R^(nk), where k is the number of possible syntactic context types.
Frame Identification with Embeddings | So for example “He runs the company” could help the model disambiguate “He owns the company.” Moreover, since g(x) relies on word embeddings rather than word identities, information is shared between words.
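Frame Identification with Embeddings | A small numpy sketch of building g(x) as described above: the embedding of the word filling each of k syntactic context slots is concatenated, with a zero block for empty or unknown slots (the slot names, words, and dimensions here are illustrative assumptions).

    import numpy as np

    n, k = 64, 3                              # embedding dimension and number of context slots
    emb = {"he": np.random.randn(n),          # toy pretrained word embeddings
           "company": np.random.randn(n)}

    def g(slot_words):
        """Concatenate one embedding block per syntactic slot (subject, object, ...)."""
        blocks = [emb.get(word, np.zeros(n)) for word in slot_words]
        return np.concatenate(blocks)         # a vector in R^(n*k)

    x_repr = g(["he", "company", None])       # e.g. subject, direct object, empty slot
    assert x_repr.shape == (n * k,)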
Overview | We present a model that takes word embeddings as input and learns to identify semantic frames. |
Overview | We use word embeddings to represent the syntactic context of a particular predicate instance as a vector. |
Related Work | The embeddings of C&W (Collobert et al., 2011), word2vec, WVSA (Maas et al., 2011) and our models are trained with the same dataset and same parameter setting.
Related Work | ReEmb(C&W) and ReEmb(w2v) stand for the use of embeddings learned from 10 million distant-supervised tweets with C&W and word2vec, respectively. |
Related Work | Table 3: Macro-F1 on positive/negative classification of tweets with different word embeddings.
Experiments and Results | As we mentioned in Section 5, constructing phrase pair embeddings from word embeddings may not be suitable.
Experiments and Results | We first train the source and target word embeddings separately using large monolingual data, following Collobert et al. (2011).
Experiments and Results | E_wm(s_i) and E_wm(t_j) are the monolingual word embeddings, and E_wb(s_i) and E_wb(t_j) are the bilingual word embeddings.
Model Training | Back propagation is performed along the tree structure, and the phrase pair embeddings of the leaf nodes are updated.
Phrase Pair Embedding | A simple approach to construct phrase pair embedding is to use the average of the embeddings of the words in the phrase pair. |
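Phrase Pair Embedding | A small numpy sketch of this averaging construction (the example words and the embedding dimension are toy assumptions):

    import numpy as np

    dim = 50
    emb_src = {w: np.random.randn(dim) for w in ["la", "maison", "bleue"]}
    emb_tgt = {w: np.random.randn(dim) for w in ["the", "blue", "house"]}

    def average(words, table):
        return np.mean([table[w] for w in words], axis=0)

    # Phrase pair embedding: concatenation of the averaged source and target sides.
    pair_vec = np.concatenate([average(["la", "maison", "bleue"], emb_src),
                               average(["the", "blue", "house"], emb_tgt)])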
Phrase Pair Embedding | We use a recurrent neural network to generate two smoothed translation confidence scores based on source and target word embeddings.
Related Work | Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance. |
Related Work | RNNLM (Mikolov et al., 2010) is first used to generate the source and target word embeddings, which are fed into a one-hidden-layer neural network to get a translation confidence score.
Abstract | Hellinger PCA embeddings learnt using the framework show competitive results on empirical tasks. |
Introduction | While word embeddings and language models from such methods have been useful for tasks such as relation classification, polarity detection, event coreference and parsing; much of existing literature on composition is based on abstract linguistic theory and conjecture, and there is little evidence to support that learnt representations for larger linguistic units correspond to their semantic meanings. |
Introduction | While this framework is attractive in the lack of assumptions on representation that it makes, the use of distributional embeddings for individual tokens means |
Introduction | Recent work (Lebret and Lebret, 2013) has shown that the Hellinger distance is an especially effective measure in learning distributional embeddings, with Hellinger PCA being much less computationally expensive than neural language modeling approaches, while performing much better than standard PCA, and competitive with the state-of-the-art in downstream evaluations.
Abstract | This is problematic when features lack clear linguistic meaning as in embeddings or when the information is blended across features. |
Introduction | First, features may lack clear linguistic interpretation as in distributional features or continuous vector embeddings of words. |
Introduction | Our low dimensional embeddings are tailored to the syntactic context of words (head, modifier).
Problem Formulation | By learning parameters U, V, and W that function well in dependency parsing, we also learn context-dependent embeddings for words and arcs. |
Related Work | Word-level vector space embeddings have so far had limited impact on parsing performance. |
Related Work | This framework enables us to learn new syntactically guided embeddings while also leveraging separately estimated word vectors as starting features, leading to improved parsing performance. |
Results | For this purpose, we train a model with only a tensor component (such that it has to learn an accurate tensor) on the English dataset and obtain low dimensional embeddings Uφ_w and Vφ_w for each word.
Results | The upper part shows that our learned embeddings group words with similar syntactic behavior.
Experiments | This result illustrates that the ngram-level knowledge captures more complex interactions of the web text, which cannot be recovered by using only word embeddings.
Experiments | (2012), who found that using both the word embeddings and the hidden units of a trigram WRRBM as additional features for a CRF chunker yields larger improvements than using word embeddings only. |
Related Work | (2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling. |
Related Work | (2013) induce bilingual word embeddings for word alignment. |
Related Work | (2013) investigate Chinese character embeddings for joint word segmentation and POS tagging. |
Background | These generally consist of a projection layer that maps words, sub-word units or n-grams to high dimensional embeddings; the latter are then combined component-wise with an operation such as summation.
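Background | A small numpy sketch of such a projection-plus-summation model: each token is mapped to its embedding and the embeddings are combined component-wise by summation (the vocabulary and dimension are toy assumptions).

    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}
    P = np.random.randn(len(vocab), 100)      # projection layer: one embedding per word

    def encode(tokens):
        """Sum the embeddings of the tokens component-wise."""
        return np.sum([P[vocab[t]] for t in tokens], axis=0)

    sentence_vec = encode(["the", "cat", "sat"])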
Convolutional Neural Networks with Dynamic k-Max Pooling | Word embeddings have size d = 4. |
Convolutional Neural Networks with Dynamic k-Max Pooling | The values in the embeddings w_i are parameters that are optimised during training.
Experiments | The set of parameters comprises the word embeddings, the filter weights and the weights from the fully connected layers.
Experiments | As the dataset is rather small, we use lower-dimensional word vectors with d = 32 that are initialised with embeddings trained in an unsupervised way to predict contexts of occurrence (Turian et al., 2010). |
Experiments | The randomly initialised word embeddings are increased in length to a dimension of d = 60. |
Abstract | We introduce a new model which uses stacked autoencoders to learn higher-level embeddings from textual and visual input. |
Experimental Setup | Finally, we also compare to the word embeddings obtained using Mikolov et al.’s (2011) recurrent neural network based language model. |
Experimental Setup | These were pre-trained on Broadcast News data (400M words) using the word2vec tool. We report results with the 640-dimensional embeddings as they performed best.
Introduction | We evaluate the embeddings it produces on two tasks, namely word similarity and categorization. |
Results | This indicates that higher level embeddings may be beneficial to NLP tasks in general, not only to those requiring multimodal information. |
Experiments | Another way to construct B is to use neural word embeddings (Collobert and Weston, 2008). |
Experiments | In this case, we can view the product Bv as a composition of the word embeddings , using the simple additive composition model proposed by Mitchell |
Experiments | We used the word embeddings from Collobert and Weston (2008) with dimension {25, 50, 100}. |
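Experiments | A small numpy sketch of the additive composition referred to above: the columns of B hold word embeddings, v is a bag-of-words count vector for a phrase, and Bv is then simply the sum of the embeddings of the words in the phrase (the vocabulary and dimension are toy assumptions).

    import numpy as np

    dim, vocab = 25, ["not", "very", "good"]
    B = np.random.randn(dim, len(vocab))      # column i is the embedding of vocab[i]

    v = np.array([1, 0, 1])                   # counts for the phrase "not good"
    phrase_vec = B @ v                        # additive composition of word embeddings
    assert np.allclose(phrase_vec, B[:, 0] + B[:, 2])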
Experiments | LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
Experiments | RNN2-(W2V) is initialized using word2vec embeddings and also includes annotated phrase labels in its training.
Recursive Neural Networks | The word2vec embeddings have linear relationships (e.g., the closest vectors to the average of |
Where Compositionality Helps Detect Ideological Bias | Initializing the RNN W_e matrix with word2vec embeddings improves accuracy over random initialization by 1%.
Model | The first is the representation matrix W, which encodes the real-valued embeddings for each word in the vocabulary.
Model | Backpropagation using (input x, output y) word tuples learns the values of W (the embeddings) and X (the output parameter matrix) that maximize the likelihood of y (i.e., the context words) conditioned on x (i.e., the input word).
Model | Given an input word w and set of active variable values A (e.g., A = {state = MA}), we calculate the hidden layer h as the sum of these independent embeddings: h = w^T W_main + Σ_{a∈A} w^T W_a.
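Model | A small numpy sketch of the hidden-layer computation above: the word's shared embedding is added to the embedding it receives under each active variable value (here a single assumed geographic variable).

    import numpy as np

    vocab_size, dim = 1000, 100
    W_main = np.random.randn(vocab_size, dim)               # shared word embeddings
    W_var = {"state=MA": np.random.randn(vocab_size, dim)}  # one matrix per variable value

    def hidden(word_id, active_values):
        """h = w^T W_main + sum over active variable values a of w^T W_a."""
        w = np.zeros(vocab_size)
        w[word_id] = 1.0                                     # one-hot input word
        h = w @ W_main
        for a in active_values:
            h = h + w @ W_var[a]
        return h

    h = hidden(42, active_values=["state=MA"])
    assert h.shape == (dim,)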