Index of papers in Proc. ACL 2008 that mention parallel corpora

Seen in text as: parallel corpora

Seen in 13 sentences in 2 papers.

Li, Zhifei and Yarowsky, David

Abstract	Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated.
Introduction	While the research in statistical machine translation (SMT) has made significant progress, most SMT systems (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) rely on parallel corpora to extract translation entries.
Introduction	In particular, many Chinese abbreviations may not appear in available parallel corpora, in which case current SMT systems treat them as unknown words and leave them untranslated.
Introduction	To be able to translate a Chinese abbreviation that is unseen in available parallel corpora, one may annotate more parallel data.
Unsupervised Translation Induction for Chinese Abbreviations	In this section, we describe an unsupervised method to induce translation entries for Chinese abbreviations, even when these abbreviations never appear in the Chinese side of the parallel corpora.
Unsupervised Translation Induction for Chinese Abbreviations	Regarding the data resource used, Step-1, -2, and -3 rely on the English monolingual corpora, parallel corpora, and the Chinese monolingual corpora, respectively.
Unsupervised Translation Induction for Chinese Abbreviations	This is because the entities are extracted from the English monolingual corpora, which has a much larger vocabulary than the English side of the parallel corpora.

parallel corpora is mentioned in 9 sentences in this paper.

Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan

Conclusion	Our experiments show that high-precision translations can be mined without any access to parallel corpora.
Experimental Setup	EN-AR-D: English: 1st 50k sentences of 1994 proceedings of UN parallel corpora; Arabic: 2nd 50k sentences.
Experimental Setup	For English-Arabic, we extract a lexicon from 100k parallel sentences of UN parallel corpora by running the HMM intersected alignment model (Liang et al., 2008), adding (s, t) to the lexicon if s was aligned to t at least three times and more than any other word.
Introduction	Current statistical machine translation systems use parallel corpora to induce translation correspondences, whether those correspondences be at the level of phrases (Koehn, 2004), treelets (Galley et al., 2006), or simply single words (Brown et al., 1994).

parallel corpora is mentioned in 4 sentences in this paper.
