Abstract | Do continuous word embeddings encode any useful information for constituency parsing?
Abstract | We isolate three ways in which word embeddings might augment a state-of-the-art statistical parser: by connecting out-of-vocabulary words to known ones, by encouraging common behavior among related in-vocabulary words, and by directly providing features for the lexicon.
Abstract | Our results support an overall hypothesis that word embeddings import syntactic information that is ultimately redundant with distinctions learned from treebanks in other ways.
Introduction | This paper investigates a variety of ways in which word embeddings might augment a constituency parser with a discrete state space. |
Introduction | While word embeddings can be constructed directly from surface distributional statistics, as in LSA, more sophisticated tools for unsupervised extraction of word representations have recently gained popularity (Collobert et al., 2011; Mikolov et al., 2013a). |
Introduction | Several NLP tasks (Turian et al., 2010) have been shown to benefit from the inclusion of word embeddings as features.
Abstract | This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Background | In this paper, we aim to identify hypernym-hyponym relations using word embeddings, which have been shown to preserve good properties for capturing the semantic relationship between words.
Introduction | This paper proposes a novel approach for semantic hierarchy construction based on word embeddings.
Introduction | Word embeddings, also known as distributed word representations, typically represent words with dense, low-dimensional and real-valued vectors.
Introduction | Word embeddings have been empirically shown to preserve linguistic regularities, such as the semantic relationship between words (Mikolov et al., 2013b). |
Method | Then we elaborate on our proposed method, which is composed of three major steps, namely word embedding training, projection learning, and hypernym-hyponym relation identification.
Method | 3.2 Word Embedding Training |
Method | Various models for learning word embeddings have been proposed, including neural net language models (Bengio et al., 2003; Mnih and Hinton, 2008; Mikolov et al., 2013b) and spectral models (Dhillon et al., 2011). |
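Below is a minimal sketch of the word embedding training step using gensim's Word2Vec in skip-gram mode; the toolkit, the toy corpus, and the hyperparameter values are illustrative assumptions, not the setup reported in the papers above.

```python
from gensim.models import Word2Vec

# toy corpus: a list of tokenized sentences (placeholder data)
sentences = [
    ["the", "dog", "chases", "the", "cat"],
    ["the", "cat", "chases", "the", "mouse"],
]

# skip-gram (sg=1) with 100-dimensional vectors; hyperparameters are illustrative
model = Word2Vec(sentences, vector_size=100, window=5, sg=1, min_count=1, epochs=50)

vec = model.wv["dog"]                        # the 100-dimensional embedding of "dog"
print(model.wv.most_similar("dog", topn=3))  # nearest neighbours in the toy space
```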
Introduction | A word embedding is a dense, low-dimensional, real-valued vector.
Introduction | Word embeddings are usually learned from a large monolingual corpus first, and then fine-tuned for specific tasks.
Introduction | In their work, bilingual word embeddings are trained to capture lexical translation information, and surrounding words are utilized to model context information.
Our Model | Word embedding x_t is integrated with the history h_{t-1} for each prediction.
Our Model | The new history h_t is used for the future prediction, and updated with new information from word embedding x_t recurrently.
Our Model | Word embedding x_t is integrated as new input information in recurrent neural networks for each prediction, but in recursive neural networks, no additional input information is used except the two representation vectors of the child nodes.
Related Work | In their work, an initial word embedding is first trained on a huge monolingual corpus, and then the word embedding is adapted and fine-tuned bilingually in a context-dependent DNN-HMM framework.
Related Work | Word embeddings capturing lexical translation information and surrounding words modeling context information are leveraged to improve the word alignment performance. |
Related Work | In their work, not only the target word embedding is used as input to the network, but also the embedding of the source word that is aligned to the current target word.
Abstract | This paper evaluates word embeddings and word clustering for adapting feature-based relation extraction systems.
Abstract | We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information. |
Introduction | valued features of words (such as word embeddings (Mnih and Hinton, 2007; Collobert and Weston, 2008)) effectively. |
Introduction | ing word embeddings (Bengio et al., 2001; Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Turian et al., 2010) on feature-based methods to adapt RE systems to new domains. |
Introduction | We explore the embedding-based features in a principled way and demonstrate that word embedding itself is also an effective representation for domain adaptation of RE. |
Related Work | Although word embeddings have been successfully employed in many NLP tasks (Collobert and Weston, 2008; Turian et al., 2010; Maas and Ng, 2010), the application of word embeddings in RE is very recent. |
Related Work | (2010) propose an abstraction-augmented string kernel for bio-relation extraction via word embeddings.
Related Work | (2012) and Khashabi (2013) use pre-trained word embeddings as input for Matrix-Vector Recursive Neural Networks (MV-RNN) to learn compositional structures for RE.
Abstract | To overcome this limitation, we encourage agreement between the two directional models by introducing a penalty function that ensures word embedding consistency across two directional models during training. |
Introduction | Specifically, our training encourages word embeddings to be consistent across alignment directions by introducing into the objective function a penalty term that expresses the difference between the word embeddings of the two directional models.
RNN-based Alignment Model | In the lookup layer, each of these words is converted to its word embedding, and then the concatenation of the two embeddings is fed to the hidden layer in the same manner as in the FFNN-based model.
Related Work | First, the lookup layer converts each input word into its word embedding by looking up its corresponding column in the embedding matrix (L), and then concatenates them. |
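A small sketch of the lookup-and-concatenate step described above: each word id selects a column of an embedding matrix L, the two embeddings are concatenated, and the result is passed through a hidden layer. The matrix and layer names are illustrative placeholders rather than the models' actual parameters.

```python
import numpy as np

n, V, hidden = 50, 10000, 100                 # embedding size, vocabulary size, hidden size
rng = np.random.default_rng(0)
L = rng.normal(scale=0.1, size=(n, V))        # embedding matrix: one column per word
W_h = rng.normal(scale=0.1, size=(hidden, 2 * n))
b_h = np.zeros(hidden)

def lookup_and_hidden(word_id_1, word_id_2):
    # lookup layer: fetch the embedding column of each word and concatenate
    concat = np.concatenate([L[:, word_id_1], L[:, word_id_2]])
    # hidden layer with a tanh non-linearity
    return np.tanh(W_h @ concat + b_h)

h = lookup_and_hidden(42, 137)                # a 100-dimensional hidden representation
```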
Related Work | Word embeddings are dense, low dimensional, and real-valued vectors that can capture syntactic and semantic properties of the words (Bengio et al., 2003). |
Training | Concretely, the constraint enforces agreement between the word embeddings of the two directions.
Training | The proposed method trains two directional models concurrently based on the following objective by incorporating a penalty term that expresses the difference between word embeddings: |
Training | where θ_FE (or θ_EF) denotes the weights of the layers in the source-to-target (or target-to-source) alignment model, θ_L denotes the weights of the lookup layer, i.e., the word embeddings, and α is a parameter that controls the strength of the agreement constraint.
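A minimal sketch of such an agreement-constrained objective, assuming a squared-difference penalty between the embedding matrices of the two directional models; the loss terms are placeholders for the models' actual training losses.

```python
import numpy as np

def joint_objective(loss_fe, loss_ef, L_fe, L_ef, alpha):
    """loss_fe / loss_ef: training losses of the source-to-target and target-to-source
    models; L_fe / L_ef: their word embedding (lookup-layer) matrices."""
    agreement_penalty = np.sum((L_fe - L_ef) ** 2)   # squared difference of the embeddings
    return loss_fe + loss_ef + alpha * agreement_penalty
```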
Abstract | In this paper, we present a method that learns word embeddings for Twitter sentiment classification.
Abstract | We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. |
Abstract | To obtain large-scale training corpora, we learn the sentiment-specific word embedding from massive distantly-supervised tweets collected using positive and negative emoticons.
Introduction | Accordingly, it is a crucial step to learn the word representation (or word embedding), which is a dense, low-dimensional and real-valued vector for a word.
Introduction | Although existing word embedding learning algorithms (Collobert et al., 2011; Mikolov et al., 2013) are intuitive choices, they are not effective enough if directly used for sentiment classification.
Introduction | In this paper, we propose learning sentiment-specific word embedding (SSWE) for sentiment analysis. |
Bilingually-constrained Recursive Auto-encoders | After learning word embeddings with a DNN (Bengio et al., 2003; Collobert and Weston, 2008; Mikolov et al., 2013), each word in the vocabulary V corresponds to a vector x ∈ R^n, and all the vectors are stacked into an embedding matrix L ∈ R^{n×|V|}.
Bilingually-constrained Recursive Auto-encoders | Since the word embeddings for the two languages are learned separately and lie in different vector spaces, we do not enforce the phrase embeddings of the two languages to be in the same semantic vector space.
Bilingually-constrained Recursive Auto-encoders | • L: word embedding matrix L for the two languages (Section 3.1.1);
Introduction | The models using word embeddings as the direct inputs to DNN cannot make full use of the whole syntactic and semantic information of the phrasal translation rules. |
Introduction | Kalchbrenner and Blunsom (2013) utilize a simple convolution model to generate phrase embeddings from word embeddings.
Related Work | One method considers the phrases as bag-of-words and employs a convolution model to transform the word embeddings to phrase embeddings (Collobert et al., 2011; Kalchbrenner and Blunsom, 2013). |
Related Work | Instead, our bilingually-constrained recursive auto-encoders not only learn the composition mechanism of generating phrases from words, but also fine-tune the word embeddings during the model training stage, so that we can induce the full information of the phrases and internal words.
Experiments | The dimension of the word embeddings is n = 100, the convergence threshold is 10^-7, and the number of expanded seeds is T = 40.
Experiments | In contrast, CONT exploits the latent semantics of each word in context, and LEX takes advantage of word embeddings, which are induced from global word co-occurrence statistics.
The Proposed Method | To capture the lexical semantic clue, each word is first converted into its word embedding, a continuous vector in which each dimension's value corresponds to a semantic or grammatical interpretation (Turian et al., 2010).
The Proposed Method | Learning large-scale word embeddings is very time-consuming (Collobert et al., 2011); we thus employ a faster method, the Skip-gram model (Mikolov et al., 2013).
The Proposed Method | 3.2.1 Learning Word Embedding for Semantic Representation |
Abstract | We present a novel technique for semantic frame identification using distributed representations of predicates and their syntactic context; this technique leverages automatic syntactic parses and a generic set of word embeddings.
Experiments | as described in §3.1 but conjoins them with the word identity rather than a word embedding.
Frame Identification with Embeddings | First, we extract the words in the syntactic context of “runs”; next, we concatenate their word embeddings as described in §2.2 to create an initial vector space representation.
Frame Identification with Embeddings | Formally, let x represent the actual sentence with a marked predicate, along with the associated syntactic parse tree; let our initial representation of the predicate context be g(x). Suppose that the word embeddings we start with are of dimension n. Then g is a function from a parsed sentence x to R^{nk}, where k is the number of possible syntactic context types.
Frame Identification with Embeddings | So for example “He runs the company” could help the model disambiguate “He owns the company.” Moreover, since g(x) relies on word embeddings rather than word identities, information is shared between words.
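A sketch of how such a context representation might be assembled: for each of k syntactic context types, look up the embedding of the word filling that slot (or a zero vector if it is empty) and concatenate, giving a vector in R^{nk}. The slot names and the embedding table are hypothetical.

```python
import numpy as np

def context_representation(context_words, embeddings, slots, n):
    """context_words: mapping from syntactic slot to the word filling it (or None);
    embeddings: mapping from word to an n-dimensional vector."""
    parts = []
    for slot in slots:                        # fixed order over the k context types
        word = context_words.get(slot)
        vec = embeddings.get(word, np.zeros(n)) if word else np.zeros(n)
        parts.append(vec)
    return np.concatenate(parts)              # a vector of length n * k

# "He runs the company": context of the predicate "runs" (toy 3-dimensional embeddings)
slots = ["nsubj", "dobj"]
emb = {"He": np.ones(3), "company": np.full(3, 2.0)}
g_x = context_representation({"nsubj": "He", "dobj": "company"}, emb, slots, n=3)
print(g_x)                                    # 6-dimensional concatenation
```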
Overview | We present a model that takes word embeddings as input and learns to identify semantic frames. |
Overview | A word embedding is a distributed representation of meaning where each word is represented as a vector in R^n.
Overview | We use word embeddings to represent the syntactic context of a particular predicate instance as a vector. |
Conclusion | To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models. |
Experiments | (2013), who published word embeddings across 100 languages, including all languages considered in this paper. |
Experiments | While the classification experiments focused on establishing the semantic content of the sentence level representations, we also want to briefly investigate the induced word embeddings . |
Introduction | Such word embeddings are naturally richer representations than those of symbolic or discrete models, and have been shown to be able to capture both syntactic and semantic information. |
Overview | We describe a multilingual objective function that uses a noise-contrastive update between semantic representations of different languages to learn these word embeddings . |
Related Work | In their simplest form, distributional information from large corpora can be used to learn embeddings, where the words appearing within a certain window of the target word are used to compute that word's embedding.
Related Work | (2011) further popularised using neural network architectures for learning word embeddings from large amounts of largely unlabelled data by showing the embeddings can then be used to improve standard supervised tasks. |
Experiments | Another way to construct B is to use neural word embeddings (Collobert and Weston, 2008). |
Experiments | In this case, we can view the product Bv as a composition of the word embeddings, using the simple additive composition model proposed by Mitchell
Experiments | We used the word embeddings from Collobert and Weston (2008) with dimension {25, 50, 100}. |
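The additive-composition reading of Bv can be made concrete with a short sketch: with one word embedding per column of B and a bag-of-words count vector v for the phrase, Bv is exactly the sum of the phrase's word embeddings. The vocabulary and the 25-dimensional random embeddings are toy values.

```python
import numpy as np

vocab = ["not", "very", "good"]
rng = np.random.default_rng(0)
B = rng.normal(size=(25, len(vocab)))         # one 25-dimensional embedding per column

def phrase_vector(words):
    v = np.zeros(len(vocab))
    for w in words:
        v[vocab.index(w)] += 1                # bag-of-words count vector for the phrase
    return B @ v                              # Bv equals the sum of the words' embeddings

assert np.allclose(phrase_vector(["very", "good"]), B[:, 1] + B[:, 2])
```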
Experiments | This result illustrates that the ngram-level knowledge captures more complex interactions of the web text, which cannot be recovered by using only word embeddings . |
Experiments | (2012), who found that using both the word embeddings and the hidden units of a trigram WRRBM as additional features for a CRF chunker yields larger improvements than using word embeddings only. |
Related Work | (2010) learn word embeddings to improve the performance of in-domain POS tagging, named entity recognition, chunking and semantic role labelling. |
Related Work | (2013) induce bilingual word embeddings for word alignment. |
Related Work | In particular, we both use a nonlinear layer to model complex relations underlying word embeddings.
Experiments | • LR-(W2V) is a logistic regression model trained on the average of the pretrained word embeddings for each sentence (Section 2.2).
Recursive Neural Networks | The word-level vectors come from a d × V dimensional word embedding matrix W_e, where V is the size of the vocabulary.
Recursive Neural Networks | Random: The most straightforward choice is to initialize the word embedding matrix W_e and the composition matrices W_L and W_R randomly, such that without any training, representations for words and phrases are arbitrarily projected into the vector space.
Recursive Neural Networks | word2vec: The other alternative is to initialize the word embedding matrix W_e with values that reflect the meanings of the associated word types.
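A brief sketch of the two initialisation strategies for the embedding matrix W_e (d × V): purely random values versus copying pretrained word2vec vectors where available. The use of gensim's KeyedVectors and the file path are illustrative assumptions.

```python
import numpy as np
from gensim.models import KeyedVectors

def init_embeddings(vocab, d, pretrained_path=None, seed=0):
    rng = np.random.default_rng(seed)
    W_e = rng.uniform(-0.05, 0.05, size=(d, len(vocab)))  # random initialisation
    if pretrained_path is not None:
        kv = KeyedVectors.load_word2vec_format(pretrained_path, binary=True)
        for j, word in enumerate(vocab):
            if word in kv:                     # overwrite with the pretrained vector
                W_e[:, j] = kv[word][:d]       # assumes the pretrained dimension is >= d
    return W_e
```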
Abstract | The word embeddings are used during the learning process, but the final decoder that the learning algorithm outputs maps a POS tag sequence x to a parse tree.
Abstract | Let V := {w_1, ..., w_m, z_1, ..., z_H}, with w_i representing the word embeddings, and z_i representing the latent states of the bracketings.
Abstract | Word embeddings: As mentioned earlier, each w_i can be an arbitrary feature vector.
Related Work | approaches draft off of distributed word embeddings, which offer concise features reflecting the semantics of the underlying vocabulary.
Related Work | (2010) create powerful word embeddings by training on real and corrupted phrases, optimizing for the replaceability of words.
Related Work | (2012) demonstrates a powerful approach to English sentiment using word embeddings, which can easily be extended to other languages by training on appropriate text corpora.
Convolutional Neural Networks with Dynamic k-Max Pooling | Word embeddings have size d = 4. |
Experiments | The set of parameters comprises the word embeddings , the filter weights and the weights from the fully connected layers. |
Experiments | The randomly initialised word embeddings are increased in length to a dimension of d = 60. |