Abstract | Instead of using a parallel corpus, labeled and unlabeled instances in one language are translated into ones in the other language, and all instances in both languages are then fed into a bilingual active learning engine as pseudo parallel corpora. |
Abstract | Instead of using a parallel corpus, which would need entity/relation alignment information and is thus difficult to obtain, this paper employs an off-the-shelf machine translator to translate both labeled and unlabeled instances from one language into the other, forming pseudo parallel corpora. |
Abstract | (2010) propose a cross-lingual annotation projection approach that uses parallel corpora to acquire a relation detector for the target language. |
Bilingual Lexicon Extraction | As McEnery and Xiao (2007, p. 21) observe, a specialized comparable corpus is built to be balanced by analogy with a parallel corpus: “Therefore, in relation to parallel corpora, it is more likely for comparable corpora to be designed as general balanced corpora”. |
Introduction | The bilingual lexicon extraction task from bilingual corpora was initially addressed by using parallel corpora (i.e. |
Introduction | However, despite good results in the compilation of bilingual lexicons, parallel corpora are scarce resources, especially for technical domains and for language pairs not involving English. |
Introduction | ung (2004), who range bilingual corpora from parallel corpora to quasi-comparable corpora going through comparable corpora, there is a continuum from parallel to comparable corpora (i.e. |
Introduction | Statistical machine translation (SMT) systems are trained using bilingual sentence-aligned parallel corpora . |
Introduction | Because of this, collecting parallel corpora for minor languages has become an interesting research challenge. |
Related work | (2012) used MTurk to create parallel corpora for six Indian languages for less than $0.01 per word. |
Introduction | For example, the EuroParl corpus (Koehn, 2002), one of the largest parallel corpora used in statistical machine translation, contains 22 languages (but not Turkish). |
Introduction | Although some recent work has produced parallel corpora for the Turkish-English pair, the resulting corpus is applicable only to phrase-based training (Yeniterzi and Oflazer, 2010; El-Kahlout, 2009). |
Introduction | In recent years, many efforts have been made to annotate parallel corpora with syntactic structure to build parallel treebanks. |