Abstract | We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target. |
Experiments | These extra features assess translation quality past the synchronous grammar derivation and learn general reordering or word emission preferences for the language pair. |
Experiments | We evaluate our method on four different language pairs with English as the source language and French, German, Dutch and Chinese as target. |
Experiments | The data for the first three language pairs are derived from parliament proceedings sourced from the Europarl corpus (Koehn, 2005), with WMT-07 development and test data for French and German. |
Introduction | utilised an ITG-flavour which focused on hierarchical phrase-pairs to capture context-driven translation and reordering patterns with ‘gaps’, offering competitive performance particularly for language pairs with extensive reordering. |
Introduction | By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline. |
Experiments | Their systems behave differently on English/Russian than on other language pairs. |
Experiments | We create gold standards for both language pairs by randomly selecting a few thousand word pairs from the lists of word pairs extracted from the two corpora. |
Experiments | This solution is appropriate for all of the language pairs used in our experiments, but should be revisited if there is inflection realized as prefixes, etc. |
Extraction of Transliteration Pairs | In this section, we present an iterative method for the extraction of transliteration pairs from parallel corpora which is fully unsupervised and language pair independent. |
Extraction of Transliteration Pairs | We ignore non-1-to-1 alignments because they are less likely to be transliterations for most language pairs. |
Introduction | Such resources are also not applicable to other language pairs. |
Introduction | We compare our unsupervised transliteration mining method with the semi-supervised systems presented at the NEWS 2010 shared task on transliteration mining (Kumaran et al., 2010) using four language pairs. |
Introduction | We also do experiments on parallel corpora for two language pairs. |
Abstract | These results persist when using automatically learned word tags, suggesting broad applicability of our technique across diverse language pairs for which syntactic resources are not available. |
Conclusion and discussion | Using automatically obtained word clusters instead of POS tags yields essentially the same results, thus making our methods applicable to all language pairs with parallel corpora, whether syntactic resources are available for them or not. |
Experiments | Even though a key advantage of our method is its applicability to resource-poor languages, we used a language pair for which lin- |
Experiments | Accordingly, we use Chiang’s hierarchical phrase-based translation model (Chiang, 2007) as a base line, and the syntax-augmented MT model (Zollmann and Venugopal, 2006) as a ‘target line’, a model that would not be applicable for language pairs without linguistic resources. |
Introduction | The Probabilistic Synchronous Context Free Grammar (PSCFG) formalism suggests an intuitive approach to model the long-distance and lexically sensitive reordering phenomena that often occur across language pairs considered for statistical machine translation. |
Introduction | Of course, for many language pairs and domains, parallel data is not available. |
Introduction | As successful work develops along this line, we expect more domains and language pairs to be conquered by SMT. |
Machine Translation as a Decipherment Task | Data: We work with the Spanish/English language pair and use the following corpora in our MT experiments: |
Machine Translation as a Decipherment Task | OPUS movie subtitle corpus: This is a large open source collection of parallel corpora available for multiple language pairs (Tiedemann, 2009). |
Abstract | Our experiments show that such a bilingual bootstrapping algorithm, when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair, performs better than monolingual bootstrapping and significantly reduces annotation cost. |
Introduction | Such a bilingual bootstrapping strategy, when tested on two domains, viz., Tourism and Health, using Hindi (L1) and Marathi (L2) as the language pair, consistently does better than a baseline strategy which uses only seed data for training without performing any bootstrapping. |
Synset Aligned Multilingual Dictionary | The average number of such links per synset per language pair is approximately 3. |