Abstract | The resulting clusterings are then used in training partially class-based language models. |
Distributed Clustering | The clusterings generated in each iteration, as well as the initial clustering, are stored as the set of words in each cluster, the total number of occurrences of each cluster in the training corpus, and the list of words preceding each cluster. |
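This per-iteration representation lends itself to a simple record type. Below is a minimal sketch in Python, assuming a word-to-cluster mapping is available; all names are hypothetical, since the paper does not give an implementation:

```python
from collections import defaultdict

class ClusteringSnapshot:
    """One iteration's clustering: the set of words in each cluster,
    the total occurrences of each cluster in the training corpus,
    and the words observed to precede each cluster."""

    def __init__(self):
        self.members = defaultdict(set)       # cluster id -> words in the cluster
        self.occurrences = defaultdict(int)   # cluster id -> total corpus count
        self.predecessors = defaultdict(set)  # cluster id -> preceding words

    def observe_bigram(self, prev_word, word, cluster_of):
        """Record one (prev_word, word) bigram under the mapping
        cluster_of: word -> cluster id."""
        c = cluster_of[word]
        self.members[c].add(word)
        self.occurrences[c] += 1
        self.predecessors[c].add(prev_word)

# Illustrative usage with a toy word-to-cluster map.
snap = ClusteringSnapshot()
snap.observe_bigram("the", "cat", {"cat": 1})
```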
Distributed Clustering | The quality of class-based models trained using the resulting clusterings did not differ noticeably from those trained using clusterings for which the full vocabulary was considered in each iteration. |
Experiments | We trained a number of predictive class-based language models on different Arabic and English corpora using clusterings trained on the complete data of the same corpus. |
Experiments | For the first experiment we trained predictive class-based 5-gram models using clusterings with 64, 128, 256 and 512 clusters on the en target data. |
Background 2.1 Dependency parsing | By using prefixes of various lengths, we can produce clusterings of different granularities (Miller et al., 2004). |
Feature design | Following Miller et al. (2004), we use prefixes of the Brown cluster hierarchy to produce clusterings of varying granularity. |
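Brown clusters are identified by bit strings encoding each word's path through the binary merge hierarchy, so truncating a path to a fixed prefix length merges nearby clusters into a coarser one. A small illustrative sketch (the bit strings and prefix lengths below are made up):

```python
# Each word's Brown cluster is its path in the binary merge tree,
# written as a bit string. Shorter prefixes give coarser clusterings.
brown_paths = {
    "apple":  "001010",
    "pear":   "001011",
    "walked": "110100",
    "ran":    "110101",
}

def cluster_at(word, prefix_len):
    """Return the cluster of `word` at the granularity given by
    `prefix_len` (shorter prefix = coarser clustering)."""
    return brown_paths[word][:prefix_len]

# At full length "apple" and "pear" are distinct clusters;
# a 5-bit prefix merges them into one coarser cluster.
assert cluster_at("apple", 6) != cluster_at("pear", 6)
assert cluster_at("apple", 5) == cluster_at("pear", 5)
```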
Feature design | One possible explanation is that the clusterings generated by the Brown algorithm can be noisy or only weakly relevant to syntax; thus, the clusters are best exploited when “anchored” to words or parts of speech. |
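Concretely, "anchoring" means conjoining a cluster (or cluster prefix) with a word or part-of-speech tag inside a feature template, rather than letting cluster identities fire on their own. A hypothetical sketch of such templates for a head-modifier dependency arc; the template names and the particular feature set are illustrative, not the paper's actual features:

```python
def dependency_arc_features(head_word, head_pos, head_cluster4,
                            mod_word, mod_pos, mod_cluster4):
    """Example first-order arc features: 4-bit cluster prefixes are
    'anchored' by conjoining them with words or POS tags."""
    return [
        # cluster-only feature: potentially noisy on its own
        f"hc4={head_cluster4}:mc4={mod_cluster4}",
        # anchored to parts of speech
        f"hc4={head_cluster4}:mp={mod_pos}",
        f"hp={head_pos}:mc4={mod_cluster4}",
        # anchored to a word
        f"hw={head_word}:mc4={mod_cluster4}",
    ]

features = dependency_arc_features("ate", "VBD", "1101",
                                   "apple", "NN", "0010")
```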