Abstract | We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. |
Experiments | We use 10-word windows of text as the local context, 100 hidden units, and no weight regularization for both neural networks.
Global Context-Aware Neural Language Model | In this section, we describe the training objective of our model, followed by a description of the neural network architecture, ending with a brief description of our model’s training method. |
Global Context-Aware Neural Language Model | We compute scores g(s, d) and g(s^w, d), where s^w is s with the last word replaced by word w, and g(·, ·) is the scoring function that represents the neural networks used.
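The score comparison above can be sketched in code. This is a hypothetical illustration, assuming a Collobert & Weston-style margin ranking setup: the network should score the observed sequence s (paired with document context d) above a corrupted sequence s^w whose last word was replaced. The toy scoring function `g`, the vector dimensions, and the single hidden layer are all assumptions for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def g(sequence_vec, doc_vec, W, b):
    """Toy scoring function g(., .): one tanh hidden layer over the
    concatenated local-sequence and global-document representations
    (an assumption; the paper uses separate local and global networks)."""
    x = np.concatenate([sequence_vec, doc_vec])
    h = np.tanh(W @ x + b)
    return float(h.sum())

# Toy vectors standing in for s, s^w (last word corrupted), and document d.
dim = 8
s_vec, sw_vec, d_vec = rng.normal(size=(3, dim))
W, b = rng.normal(size=(16, 2 * dim)), np.zeros(16)

# Margin ranking loss: zero only when g(s, d) exceeds g(s^w, d) by at least 1.
loss = max(0.0, 1.0 - g(s_vec, d_vec, W, b) + g(sw_vec, d_vec, W, b))
```

Training would then push `loss` toward zero for each observed/corrupted pair, so the true last word is ranked above random replacements.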
Global Context-Aware Neural Language Model | 2.2 Neural Network Architecture |
Discussion | Second, for the Holmes data, LSA total similarity beats the recurrent neural network, which in turn is better than the baseline n-gram model.
Discussion | It is an interesting research question whether this could be done implicitly with a machine learning technique, for example recurrent or recursive neural networks.
Experimental Results 5.1 Data Resources | In this case, the best combination is to blend LSA, the Good-Turing language model, and the recurrent neural network.
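One common way to blend such models is linear interpolation of their per-candidate scores; the sketch below assumes that setup. The interpolation weights and the score values are illustrative placeholders, not the paper's tuned values.

```python
def blend(lsa, good_turing, rnn, weights=(0.4, 0.3, 0.3)):
    """Combine per-candidate scores from three models with fixed
    interpolation weights (hypothetical values, not the paper's)."""
    w1, w2, w3 = weights
    return [w1 * a + w2 * b + w3 * c for a, b, c in zip(lsa, good_turing, rnn)]

# Five answer candidates for one sentence-completion question:
# pick the candidate with the highest blended score.
scores = blend([0.9, 0.2, 0.1, 0.3, 0.4],   # LSA total similarity
               [0.5, 0.6, 0.2, 0.1, 0.3],   # Good-Turing language model
               [0.7, 0.4, 0.3, 0.2, 0.5])   # recurrent neural network
best = max(range(5), key=lambda i: scores[i])
```

In practice the weights would be tuned on held-out data rather than fixed in advance.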
Introduction | Also in the language modeling vein, but with potentially global context, we evaluate the use of a recurrent neural network language model. |