Distributed representations | Distributed word representations are called word embeddings. |
Distributed representations | Word embeddings are typically induced using neural language models, which use neural networks as the underlying predictive model (Bengio, 2008). |
Introduction | word embeddings using unsupervised approaches. |
Supervised evaluation tasks | The word embeddings also required a scaling hyperparameter, as described in Section 7.2. |
Unlabeled Data | Rare words are typically updated only 143 times per epoch; given that our embedding learning rate was typically 1e-6 or 1e-7, their embeddings will remain concentrated around zero instead of spread out randomly. |
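The arithmetic behind this claim can be sketched numerically: with a learning rate of 1e-6 and only 143 updates per epoch, a rare word's embedding barely moves from its starting point. The sketch below assumes a 50-dimensional embedding initialized at zero and per-update gradients of unit scale; these particular values are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 50       # assumed embedding dimensionality, for illustration
lr = 1e-6      # embedding learning rate from the text
updates = 143  # per-epoch update count for a rare word, from the text

# Rare-word embedding starting at zero (assumed initialization);
# each SGD step moves it by lr times a unit-scale random gradient.
emb = np.zeros(dim)
for _ in range(updates):
    emb -= lr * rng.normal(size=dim)

# After a full epoch the embedding is still tiny in every dimension,
# i.e. concentrated around zero rather than spread out.
print(np.abs(emb).max())
```

Even under generous gradient magnitudes, the total displacement is bounded by roughly lr × updates ≈ 1.4e-4 per dimension, which is why rare-word embeddings cluster near the origin.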
Unlabeled Data | 7.2 Scaling of Word Embeddings |
Unlabeled Data | The word embeddings, however, are real numbers that are not necessarily in a bounded range. |
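One common way to handle unbounded embedding values, consistent with the scaling hyperparameter mentioned above, is to rescale the embedding matrix so its entries have a fixed target standard deviation. The sketch below assumes this normalization scheme; the function name, the per-matrix (rather than per-dimension) normalization, and the target value of 0.1 are all assumptions for illustration.

```python
import numpy as np

def scale_embeddings(E, target_std=0.1):
    """Rescale an embedding matrix so its entries have a fixed standard
    deviation; target_std plays the role of a scaling hyperparameter."""
    return E * (target_std / E.std())

rng = np.random.default_rng(1)
# Unbounded real-valued embeddings, e.g. as produced by unsupervised training.
E = rng.normal(scale=3.0, size=(1000, 50))
E_scaled = scale_embeddings(E, target_std=0.1)
print(E_scaled.std())  # now exactly the target standard deviation
```

Rescaling like this puts embedding features on a magnitude comparable to other (often binary or bounded) features, so a single supervised learning rate can work for all of them.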