Index of papers in Proc. ACL 2008 that mention
  • parallel corpora
Li, Zhifei and Yarowsky, David
Abstract
Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated.
Introduction
While research in statistical machine translation (SMT) has made significant progress, most SMT systems (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) rely on parallel corpora to extract translation entries.
Introduction
In particular, many Chinese abbreviations may not appear in available parallel corpora, in which case current SMT systems treat them as unknown words and leave them untranslated.
Introduction
To be able to translate a Chinese abbreviation that is unseen in available parallel corpora, one may annotate more parallel data.
Unsupervised Translation Induction for Chinese Abbreviations
In this section, we describe an unsupervised method to induce translation entries for Chinese abbreviations, even when these abbreviations never appear in the Chinese side of the parallel corpora.
Unsupervised Translation Induction for Chinese Abbreviations
Regarding the data resources used, Steps 1, 2, and 3 rely on the English monolingual corpora, the parallel corpora, and the Chinese monolingual corpora, respectively.
Unsupervised Translation Induction for Chinese Abbreviations
This is because the entities are extracted from the English monolingual corpora, which have a much larger vocabulary than the English side of the parallel corpora.
parallel corpora is mentioned in 9 sentences in this paper.
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Conclusion
Our experiments show that high-precision translations can be mined without any access to parallel corpora.
Experimental Setup
EN-AR-D: English: 1st 50k sentences of 1994 proceedings of UN parallel corpora; Arabic: 2nd 50k sentences.
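The EN-AR-D split above takes its English side and its Arabic side from disjoint sentence ranges of the same parallel corpus, so no English sentence has its own translation on the Arabic side. A minimal sketch of that split, assuming the corpus is available as an ordered list of (English, Arabic) sentence pairs; the function name `split_non_parallel` is hypothetical and not from the paper:

```python
from typing import List, Sequence, Tuple


def split_non_parallel(
    parallel: Sequence[Tuple[str, str]], n: int = 50_000
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: build two non-parallel monolingual corpora.

    English comes from sentence pairs [0, n) and Arabic from pairs [n, 2n),
    so the two sides never contain translations of each other.
    """
    english = [en for en, _ in parallel[:n]]        # 1st n sentences, English side only
    arabic = [ar for _, ar in parallel[n:2 * n]]    # 2nd n sentences, Arabic side only
    return english, arabic
```

With the default `n = 50_000`, this reproduces the "1st 50k / 2nd 50k" description; a small `n` is handy for checking the logic on toy data.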
Experimental Setup
For English-Arabic, we extract a lexicon from 100k parallel sentences of UN parallel corpora by running the HMM intersected alignment model (Liang et al., 2008), adding (s, t) to the lexicon if s was aligned to t at least three times and more than any other word.
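The lexicon-filtering rule above can be sketched as a count-and-threshold pass over aligned word pairs. This sketch assumes the alignments have already been produced (it does not implement the HMM intersected alignment model) and reads "more than any other word" as "strictly more often than s is aligned to any other target word"; the function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def extract_lexicon(
    aligned_pairs: Iterable[Tuple[str, str]], min_count: int = 3
) -> Set[Tuple[str, str]]:
    """Hypothetical sketch: keep (s, t) if s is aligned to t at least
    min_count times and strictly more often than to any other target word."""
    counts = Counter(aligned_pairs)  # c(s, t) over all alignment links

    # Group counts by source word: by_source[s][t] = c(s, t)
    by_source: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (s, t), c in counts.items():
        by_source[s][t] = c

    lexicon = set()
    for s, targets in by_source.items():
        best_t, best_c = max(targets.items(), key=lambda kv: kv[1])
        # Highest count among the remaining targets (0 if there are none)
        runner_up = max((c for t, c in targets.items() if t != best_t), default=0)
        # Require the threshold and a strict win (ties are rejected)
        if best_c >= min_count and best_c > runner_up:
            lexicon.add((s, best_t))
    return lexicon


# Toy alignment links (made up for illustration)
pairs = (
    [("house", "bayt")] * 4 + [("house", "dar")]         # clear winner, count 4
    + [("book", "kitab")] * 3 + [("book", "daftar")] * 3  # tie, so rejected
    + [("car", "sayyara")] * 2                            # below threshold
)
print(extract_lexicon(pairs))  # {('house', 'bayt')}
```

Rejecting ties keeps only source words with one unambiguous dominant translation, which trades recall for a higher-precision seed lexicon.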
Introduction
Current statistical machine translation systems use parallel corpora to induce translation correspondences, whether those correspondences be at the level of phrases (Koehn, 2004), treelets (Galley et al., 2006), or simply single words (Brown et al., 1994).
parallel corpora is mentioned in 4 sentences in this paper.