Abstract | Recently, with the increase in available computing resources, it has become possible to learn rich word embeddings from massive amounts of unlabeled data. |
Abstract | However, some methods take days or weeks to learn good embeddings, and some are notoriously difficult to train. |
Introduction | Moreover, we may already have embeddings for X and Y obtained from yet another (possibly unsupervised) task C, in which X and Y are, for example, orthogonal. |
Introduction | If the embeddings for task C happen to be learned from a much larger dataset, it would make sense to reuse the task C embeddings, but adapt them for task A and/or task B. |
Introduction | We will refer to task C and its embeddings as the source task and the source embeddings, and to task A/B and its embeddings as the target task and the target embeddings. |
DNN for word alignment | Words are converted to embeddings using the lookup table LT, the concatenation of these embeddings is fed to a classic neural network with two hidden layers, and the output of the network is our lexical translation score. |
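DNN for word alignment | A minimal sketch of such a scorer is given below (our illustration in PyTorch, not the authors' code); the embedding dimension, context window size, hidden-layer width, and tanh non-linearity are all assumptions. |

import torch
import torch.nn as nn

class LexicalScorer(nn.Module):
    """Toy lexical translation scorer: lookup table -> two hidden layers -> score."""
    def __init__(self, vocab_size, emb_dim=50, window=3, hidden=100):
        super().__init__()
        self.lookup = nn.Embedding(vocab_size, emb_dim)  # lookup table LT
        in_dim = 2 * window * emb_dim                    # source + target context windows
        self.h1 = nn.Linear(in_dim, hidden)              # first hidden layer
        self.h2 = nn.Linear(hidden, hidden)              # second hidden layer
        self.out = nn.Linear(hidden, 1)                  # scalar lexical translation score

    def forward(self, src_ids, tgt_ids):
        # src_ids, tgt_ids: (batch, window) tensors of word indices
        x = torch.cat([self.lookup(src_ids), self.lookup(tgt_ids)], dim=1)
        x = x.view(x.size(0), -1)                        # concatenation of the embeddings
        x = torch.tanh(self.h1(x))
        x = torch.tanh(self.h2(x))
        return self.out(x).squeeze(-1)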
DNN structures for NLP | Word embeddings often implicitly encode syntactic or semantic knowledge of the words. |
DNN structures for NLP | Assuming a finite-sized vocabulary V, word embeddings form an L × |V| embedding matrix WV, where L is a predetermined embedding length; mapping words to embeddings is done by simply looking up their respective columns in WV. |
DNN structures for NLP | After words have been transformed into their embeddings, they can be fed into subsequent classical network layers to model highly nonlinear relations. |
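DNN structures for NLP | As a toy illustration of this lookup-then-transform pattern (the shapes, vocabulary, and layer below are assumptions made for illustration, not taken from the paper): |

import numpy as np

L, V = 50, 10000                      # embedding length and vocabulary size
WV = 0.01 * np.random.randn(L, V)     # L x |V| embedding matrix
word_to_index = {"the": 0, "cat": 1}  # toy vocabulary

def embed(words):
    """Look up the columns of WV corresponding to the given words."""
    idx = [word_to_index[w] for w in words]
    return WV[:, idx]                 # shape (L, number of words)

# The looked-up embeddings can then be concatenated and passed through a
# classical nonlinear layer (W1 and b1 are hypothetical layer parameters):
W1, b1 = 0.01 * np.random.randn(64, 2 * L), np.zeros((64, 1))
h = np.tanh(W1 @ embed(["the", "cat"]).reshape(-1, 1) + b1)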
Introduction | Based on the above analysis, in this paper words on both the source and target sides are first mapped to vectors via discriminatively trained word embeddings, and word pairs are scored by a multilayer neural network that takes rich contexts (surrounding words on both the source and target sides) into consideration; an HMM-like distortion model is then applied on top of the neural network to characterize the structural aspects of bilingual sentences. |
Related Work | Titov et al. (2012) learn context-free cross-lingual word embeddings to facilitate cross-lingual information retrieval. |
Training | The tunable parameters in the neural network alignment model include the word embeddings in the lookup table LT, the parameters Wl and bl of the linear transformations in the hidden layers of the neural network, and the distortion parameters over jump distances. |
Training | Most parameters reside in the word embeddings. |
Training | To get a good initial value, the usual approach is to pre-train the embeddings on a large monolingual corpus. |
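Training | A hedged sketch of this initialization step is shown below; it assumes the pre-trained monolingual vectors are stored in a plain text file with one word and its L values per line, and falls back to small random values for out-of-vocabulary words (both details are assumptions, not from the paper). |

import numpy as np

def init_lookup_table(vocab, pretrained_path, emb_dim=50):
    """Return an emb_dim x |vocab| matrix initialized from pre-trained embeddings."""
    pretrained = {}
    with open(pretrained_path, encoding="utf-8") as f:
        for line in f:                            # expected format: word v1 v2 ... vL
            parts = line.rstrip().split()
            if len(parts) == emb_dim + 1:
                pretrained[parts[0]] = np.array(parts[1:], dtype=np.float32)

    table = 0.01 * np.random.randn(emb_dim, len(vocab)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in pretrained:
            table[:, i] = pretrained[word]        # reuse the pre-trained embedding
    return table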
Abstract | We use this model to learn high-dimensional embeddings for sentences and evaluate them in a range of tasks, demonstrating that the incorporation of syntax allows a concise model to learn representations that are both effective and general. |
Experiments | We conclude with some qualitative analysis to get a better idea of whether the combination of CCG and RAE can learn semantically expressive embeddings. |
Experiments | We use word-vectors of size 50, initialized using the embeddings provided by Turian et al. |
Experiments | Experiment 1: Semi-Supervised Training In the first experiment, we use the semi-supervised training strategy described previously and initialize our models with the embeddings provided by Turian et al. |