Abstract | We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. |
Abstract | We also apply our method to English/Hindi and English/Arabic parallel corpora and compare the results with manually built gold standards which mark transliterated word pairs. |
Experiments | We evaluate our transliteration mining algorithm on three tasks: transliteration mining from Wikipedia InterLanguage Links, transliteration mining from parallel corpora, and word alignment using a word aligner with a transliteration component. |
Experiments | In the evaluation on parallel corpora, we compare our mining results with a manually built gold standard in which each word pair is marked either as a transliteration or as a non-transliteration. |
Experiments | 4.2 Experiments Using Parallel Corpora |
Extraction of Transliteration Pairs | In this section, we present an iterative method for the extraction of transliteration pairs from parallel corpora which is fully unsupervised and language-pair independent. |
Introduction | Transliteration mining on the WIL data sets is easier because they contain a higher percentage of transliterations than parallel corpora do. |
Introduction | We also conduct experiments on parallel corpora for two language pairs. |
Introduction | The transliteration module is trained on the transliteration pairs which our mining method extracts from the parallel corpora. |
Conclusion and discussion | Using automatically obtained word clusters instead of POS tags yields essentially the same results, thus making our methods applicable to all language pairs with parallel corpora, whether or not syntactic resources are available for them. |
Introduction | Several techniques have recently been proposed to automatically identify and estimate parameters for PSCFGs (or related synchronous grammars) from parallel corpora (Galley et al., 2004; Chiang, 2005; Zollmann and Venugopal, 2006; Liu et al., 2006; Marcu et al., 2006). |
PSCFG-based translation | In this work we experiment with PSCFGs that have been automatically learned from word-aligned parallel corpora. |
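The gold-standard comparison described for the parallel-corpus experiments can be sketched as a precision/recall computation over word pairs. This is a minimal sketch, not the authors' evaluation code. It makes three assumptions. First, it assumes the gold standard labels every candidate pair as a transliteration or a non-transliteration. Second, it assumes results are summarized as precision, recall and F-measure; the text does not name the metrics. Third, `evaluate_mining` and its argument layout are hypothetical.

```python
def evaluate_mining(mined, gold):
    """Score mined transliteration pairs against a gold standard (hypothetical helper).

    mined: set of (source_word, target_word) pairs the miner labeled as transliterations.
    gold:  dict mapping each candidate (source_word, target_word) pair to True
           (transliteration) or False (non-transliteration). Assumption: the gold
           standard covers every candidate pair the miner could output.
    Returns (precision, recall, f1); each is 0.0 when its denominator is zero.
    """
    # Pairs the annotators marked as transliterations.
    gold_pos = {pair for pair, is_translit in gold.items() if is_translit}
    # Mined pairs that the gold standard confirms.
    true_pos = len(mined & gold_pos)
    precision = true_pos / len(mined) if mined else 0.0
    recall = true_pos / len(gold_pos) if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy English/Hindi data, invented purely for illustration.
    gold = {
        ("london", "लंदन"): True,
        ("delhi", "दिल्ली"): True,
        ("house", "घर"): False,
        ("water", "पानी"): False,
    }
    mined = {("london", "लंदन"), ("house", "घर")}
    print(evaluate_mining(mined, gold))  # (0.5, 0.5, 0.5)
```

Keeping the gold standard as a dict of labels, rather than a set of positive pairs only, mirrors the description of a gold standard in which every word pair is marked one way or the other. It also makes it easy to count false positives separately if needed.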