Distributed representations | Word embeddings are typically induced using neural language models, which use neural networks as the underlying predictive model (Bengio, 2008). |
Distributed representations | We predict a score s(x) for x by passing e(x) through a single-hidden-layer neural network. |
Distributed representations | We minimize this loss stochastically over the n-grams in the corpus, performing gradient descent simultaneously on the neural network parameters and the embedding lookup table. |
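
The scoring and joint update described in the rows above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not define the loss, so the sketch assumes a margin ranking (hinge) loss between an observed n-gram and a corrupted copy whose last word is replaced. It also assumes that e(x) is the concatenation of the word embeddings in x and that the hidden layer uses tanh. The names `init_params`, `score`, and `sgd_step`, and all sizes and the learning rate, are hypothetical.

```python
import math
import random

def init_params(vocab_size, dim, n, hidden, rng):
    """Small random initial values for the lookup table and the network."""
    u = lambda: rng.uniform(-0.5, 0.5)
    return {
        "E": [[u() for _ in range(dim)] for _ in range(vocab_size)],   # embedding lookup table
        "W1": [[u() for _ in range(n * dim)] for _ in range(hidden)],  # input -> hidden weights
        "b1": [0.0] * hidden,                                          # hidden biases
        "w2": [u() for _ in range(hidden)],                            # hidden -> scalar score
    }

def score(p, ngram):
    """s(x): build e(x) by concatenating embeddings (assumed), then one tanh hidden layer."""
    x = [v for w in ngram for v in p["E"][w]]
    h = [math.tanh(sum(a * b for a, b in zip(row, x)) + bias)
         for row, bias in zip(p["W1"], p["b1"])]
    return sum(a * b for a, b in zip(p["w2"], h)), h, x

def sgd_step(p, ngram, corrupt, lr=0.05):
    """One stochastic step on the ASSUMED hinge loss max(0, 1 - s(x) + s(x_corrupt))."""
    s_pos, h_pos, x_pos = score(p, ngram)
    s_neg, h_neg, x_neg = score(p, corrupt)
    loss = max(0.0, 1.0 - s_pos + s_neg)
    if loss == 0.0:
        return loss  # margin already satisfied, so the gradient is zero
    dim = len(p["E"][0])
    grads = []  # compute gradients for both n-grams before changing any parameter
    for words, h, x, g in ((ngram, h_pos, x_pos, -1.0), (corrupt, h_neg, x_neg, 1.0)):
        dz = [g * w * (1.0 - hj * hj) for w, hj in zip(p["w2"], h)]  # grad at hidden pre-activation
        dx = [sum(dz[j] * p["W1"][j][i] for j in range(len(dz))) for i in range(len(x))]
        grads.append((words, h, x, g, dz, dx))
    for words, h, x, g, dz, dx in grads:
        for j, dzj in enumerate(dz):
            p["w2"][j] -= lr * g * h[j]
            p["b1"][j] -= lr * dzj
            for i, xi in enumerate(x):
                p["W1"][j][i] -= lr * dzj * xi
        for k, w in enumerate(words):  # the same step also updates the embedding rows
            for t in range(dim):
                p["E"][w][t] -= lr * dx[k * dim + t]
    return loss

# Toy usage: word ids are arbitrary; the corrupted n-gram swaps the last word.
rng = random.Random(0)
params = init_params(vocab_size=10, dim=4, n=3, hidden=5, rng=rng)
ngram, corrupt = [1, 2, 3], [1, 2, 7]
losses = [sgd_step(params, ngram, corrupt) for _ in range(200)]
```

Each step changes the network weights and the embedding rows of the words that appear in the two n-grams, which is what updating the parameters and the lookup table simultaneously amounts to here. Rows for words outside the current n-grams are left untouched.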