Cross-Language Text Classification | For instance, cross-language latent semantic indexing (Dumais et al., 1997) and cross-language explicit semantic analysis (Potthast et al., 2008) estimate θ using a parallel corpus. |
Introduction | The approach uses unlabeled documents from both languages along with a small number (100 - 500) of translated words, instead of employing a parallel corpus or an extensive bilingual dictionary. |
Related Work | (1997) is considered seminal work in CLIR: they propose a method that induces semantic correspondences between two languages by performing latent semantic analysis (LSA) on a parallel corpus. |
Related Work | The major limitations of these approaches are their computational complexity and, in particular, their dependence on a parallel corpus, which is hard to obtain, especially for less resource-rich languages. |
Related Work | Gliozzo and Strapparava (2005) circumvent the dependence on a parallel corpus by using so-called multilingual domain models, which can be acquired from comparable corpora in an unsupervised manner. |
Abstract | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. |
Experiments and Results | A 5-gram language model is trained on the English side of the parallel corpus. |
Experiments and Results | SSP puts overly strong alignment constraints on the parallel corpus, which impacts performance dramatically. |
Introduction | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and then with a model-weight tuning step. |
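
The phrase-table induction step mentioned in the last row can be illustrated with a minimal sketch. It assumes the commonly used consistency criterion for phrase pairs (a target span may not contain a word linked to a source word outside the source span) and is not taken from the quoted papers. The function name `extract_phrases`, the `max_len` parameter, and the toy sentence pair are hypothetical; extension over unaligned boundary words and phrase scoring are omitted.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Collect phrase pairs consistent with a word alignment.

    src, tgt   -- lists of tokens
    alignment  -- set of (src_index, tgt_index) links
    max_len    -- hypothetical cap on phrase length, in tokens
    """
    pairs = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word inside the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue  # this sketch skips fully unaligned source spans
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency check: every link that touches the target span
            # must originate inside the source span.
            consistent = all(
                s_start <= s <= s_end
                for (s, t) in alignment
                if t_start <= t <= t_end
            )
            if consistent:
                pairs.add((" ".join(src[s_start:s_end + 1]),
                           " ".join(tgt[t_start:t_end + 1])))
    return pairs


# Toy example with one reordering (noun and adjective swap places).
src = "la maison bleue".split()
tgt = "the blue house".split()
alignment = {(0, 0), (1, 2), (2, 1)}
phrases = extract_phrases(src, tgt, alignment)
```

In this toy pair, "la maison" is not extracted: its links span "the blue house", which also contains "blue", a word aligned to "bleue" outside the source span. A real system would additionally count the extracted pairs over the whole corpus to estimate translation probabilities.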