Abstract | In addition, word embeddings are employed as the input to the neural network, encoding each word as a feature vector. |
Introduction | We also integrate word embedding into the model by representing each word as a feature vector (Collobert and Weston, 2008). |
Introduction | For the local feature vector h' in Eq. (5), we employ word embedding features as described in the following subsection. |
Introduction | 3.3 Word Embedding Features for AdNN |
Abstract | We describe in detail how we adapt and extend the CD-DNN-HMM (Dahl et al., 2012) method introduced in speech recognition to the HMM-based word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. |
DNN structures for NLP | To apply a DNN to an NLP task, the first step is to transform a discrete word into its word embedding, a low-dimensional, dense, real-valued vector (Bengio et al., 2006). |
DNN structures for NLP | Word embeddings often implicitly encode syntactic or semantic knowledge of the words. |
DNN structures for NLP | Assuming a finite-sized vocabulary V, word embeddings form an (L × |V|)-dimensional embedding matrix W_V, where L is a predetermined embedding length; mapping words to embeddings is done by simply looking up their respective columns in the embedding matrix W_V. |
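A minimal sketch of this column lookup, assuming a toy vocabulary and a randomly initialised matrix (the names `embedding_matrix`, `word_to_index`, and `embed` are illustrative, not from the paper):

```python
import numpy as np

# Toy vocabulary V and embedding length L (illustrative values).
vocab = ["the", "cat", "sat", "<unk>"]
word_to_index = {w: i for i, w in enumerate(vocab)}
L = 5  # predetermined embedding length

# (L x |V|) embedding matrix; learned in practice, random here.
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(size=(L, len(vocab)))

def embed(word: str) -> np.ndarray:
    """Map a word to its embedding by looking up its column in W_V."""
    idx = word_to_index.get(word, word_to_index["<unk>"])
    return embedding_matrix[:, idx]

print(embed("cat").shape)  # (5,)
```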
Introduction | Most works convert atomic lexical entries into a dense, low-dimensional, real-valued representation, called a word embedding; each dimension represents a latent aspect of the word, capturing its semantic and syntactic properties (Bengio et al., 2006). |
Introduction | Word embeddings are usually first learned from large amounts of monolingual text, and then fine-tuned with task-specific objectives. |
Introduction | As we mentioned in the last paragraph, word embeddings (trained on large amounts of monolingual text) map words into a vector space in which similar words lie near each other. |
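A small sketch of this "similar words are near each other" property, measuring closeness with cosine similarity over an embedding matrix such as the one above (our illustration, not the paper's code):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest_neighbors(word, embedding_matrix, word_to_index, k=3):
    """Return the k words whose embeddings are closest to `word`'s."""
    index_to_word = {i: w for w, i in word_to_index.items()}
    query = embedding_matrix[:, word_to_index[word]]
    scores = [
        (index_to_word[i], cosine_similarity(query, embedding_matrix[:, i]))
        for i in range(embedding_matrix.shape[1])
        if index_to_word[i] != word
    ]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]
```

With embeddings trained on enough monolingual text, the top neighbors of a query word tend to be semantically or syntactically related words.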
Related Work | Most methods using DNNs in NLP start with a word embedding phase, which maps words into fixed-length, real-valued vectors. |
Related Work | (Titov et al., 2012) learn context-free cross-lingual word embeddings to facilitate cross-lingual information retrieval. |
Training | Tunable parameters in the neural network alignment model include: the word embeddings in the lookup table LT, the parameters W_l, b_l for the linear transformations in the hidden layers of the neural network, and the distortion parameters for jump distances. |
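A minimal sketch of how these tunable parameters might be grouped, with illustrative shapes and initialisation (the class and field names are ours, not the paper's):

```python
import numpy as np

class AlignmentModelParams:
    """Illustrative container for the tunable parameters listed above."""

    def __init__(self, vocab_size, embed_len, hidden_sizes, max_jump, seed=0):
        rng = np.random.default_rng(seed)
        # Word embeddings in the lookup table LT: one column per word.
        self.lookup_table = rng.normal(size=(embed_len, vocab_size))
        # W_l, b_l for each hidden-layer linear transformation.
        sizes = [embed_len] + list(hidden_sizes)
        self.W = [rng.normal(size=(o, i)) for i, o in zip(sizes, sizes[1:])]
        self.b = [np.zeros(o) for o in sizes[1:]]
        # One distortion parameter per jump distance in [-max_jump, max_jump].
        self.distortion = np.zeros(2 * max_jump + 1)
```

In practice the first hidden layer would consume a concatenation of several embeddings (a context window) rather than a single one; the sketch only fixes the inventory of parameters that training updates.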
Experiments | Instead of initialising the model with external word embeddings, we first train it on a large amount of data with the aim of overcoming the sparsity issues encountered in the previous experiment. |
Experiments | In this phase only the reconstruction signal is used to learn word embeddings and transformation matrices. |
Experiments | By learning word embeddings and composition matrices on more data, the model is likely to generalise better. |
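A toy sketch of learning from a reconstruction signal alone, in the spirit of an autoencoder-style objective over composed embeddings (our simplification under assumed shapes; the experiments' actual training procedure may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
L = 5  # embedding length (illustrative)

# Composition matrix: combines two child embeddings into one parent.
W_comp = rng.normal(size=(L, 2 * L)) * 0.1
# Reconstruction matrix: tries to recover the children from the parent.
W_rec = rng.normal(size=(2 * L, L)) * 0.1

def compose(c1, c2):
    """Compose two child embeddings into a parent embedding."""
    return np.tanh(W_comp @ np.concatenate([c1, c2]))

def reconstruction_loss(c1, c2):
    """Squared error between the children and their reconstruction."""
    parent = compose(c1, c2)
    rec = W_rec @ parent
    target = np.concatenate([c1, c2])
    return float(np.sum((rec - target) ** 2))

# During this pretraining phase, only the reconstruction loss drives
# updates to the embeddings and the matrices (gradients omitted here).
```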