Abstract | If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. |
Abstract | We use near-state-of-the-art supervised baselines and find that each of the three word representations improves the accuracy of these baselines.
Abstract | We find further improvements by combining different word representations.
Distributional representations | Distributional word representations are based upon a cooccurrence matrix F of size W × C, where W is the vocabulary size, each row F_w is the initial representation of word w, and each column F_c is some context.
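Distributional representations | As a concrete illustration, the sketch below builds such a matrix F from raw sentences. The toy corpus, the window size, and the choice of the words themselves as contexts (so that C = W) are assumptions made for the example, not the setup used in the paper.

```python
import numpy as np

def cooccurrence_matrix(sentences, window=2):
    """Return (F, vocab): F[w, c] counts how often word c appears
    within `window` tokens of word w."""
    vocab = sorted({tok for sent in sentences for tok in sent})
    index = {tok: i for i, tok in enumerate(vocab)}
    W = len(vocab)
    F = np.zeros((W, W))  # contexts are the words themselves, so C = W
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    F[index[w], index[sent[j]]] += 1.0
    return F, vocab

F, vocab = cooccurrence_matrix([["the", "cat", "sat"], ["the", "dog", "sat"]])
# Row F[index of w] is the initial distributional representation of w.
```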
Distributional representations | Hyperspace Analogue to Language (HAL) is another early distributional approach to inducing word representations (Lund et al., 1995; Lund & Burgess, 1996).
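Distributional representations | The sketch below illustrates the HAL-style variant of the counting above, where each cooccurrence is weighted by closeness and left and right contexts are kept in separate column blocks; it is a simplified reading of Lund & Burgess (1996), not a faithful reimplementation.

```python
import numpy as np

def hal_matrix(tokens, vocab, window=5):
    """HAL-style counts: weight = window - distance + 1, with left and
    right contexts stored in separate column blocks."""
    index = {w: i for i, w in enumerate(vocab)}
    W = len(vocab)
    F = np.zeros((W, 2 * W))  # columns 0..W-1: left; W..2W-1: right
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1
            if i - d >= 0:                   # context word to the left
                F[index[w], index[tokens[i - d]]] += weight
            if i + d < len(tokens):          # context word to the right
                F[index[w], W + index[tokens[i + d]]] += weight
    return F

tokens = ["the", "cat", "sat", "on", "the", "mat"]
F = hal_matrix(tokens, sorted(set(tokens)), window=3)
```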
Introduction | A word representation is a mathematical object associated with each word, often a vector. |
Introduction | These limitations of one-hot word representations have prompted researchers to investigate unsupervised methods for inducing word representations over large unlabeled corpora. |
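Introduction | As a toy illustration of those limitations (with a made-up vocabulary), the snippet below shows that one-hot vectors grow with the vocabulary size and make all distinct words equidistant, capturing no similarity between words.

```python
import numpy as np

vocab = ["cat", "dog", "sat", "the"]  # made-up vocabulary for the example

def one_hot(word, vocab):
    v = np.zeros(len(vocab))  # dimension equals the vocabulary size
    v[vocab.index(word)] = 1.0
    return v

cat, dog = one_hot("cat", vocab), one_hot("dog", vocab)
# All distinct words are orthogonal, so "cat" looks no more similar
# to "dog" than to "the".
assert np.dot(cat, dog) == 0.0
```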
Introduction | One common approach to inducing unsupervised word representations is to use clustering, perhaps hierarchical.
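Introduction | A minimal sketch of this idea appears below: it hierarchically clusters the rows of a cooccurrence matrix (here a random stand-in) with SciPy and uses the resulting cluster IDs as discrete word features. Brown clustering, one such method, builds its hierarchy differently; this is only an illustration of the general approach.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
F = rng.random((6, 10))           # stand-in for a 6-word cooccurrence matrix
Z = linkage(F, method="average")  # agglomerative hierarchy over the words
cluster_ids = fcluster(Z, t=3, criterion="maxclust")  # cut into 3 clusters
# Each word's extra feature is its discrete cluster ID at the chosen cut.
print(cluster_ids)
```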