Index of papers in Proc. ACL 2009 that mention
  • parallel corpus
Bernhard, Delphine and Gurevych, Iryna
Parallel Datasets
Question-Answer Pairs (WAQA). In this setting, question-answer pairs are considered as a parallel corpus.
Parallel Datasets
(2008) has shown that the best results are obtained by pooling the question-answer pairs {(q, a)_1, ..., (q, a)_n} and the answer-question pairs {(a, q)_1, ..., (a, q)_n} for training, so that we obtain the following parallel corpus: {(q, a)_1, ..., (q, a)_n} ∪ {(a, q)_1, ..., (a, q)_n}. Overall, this corpus contains 1,227,362 parallel pairs and will be referred to as WAQA (WikiAnswers Question-Answers) in the rest of the paper.
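The pooling step quoted above can be illustrated with a minimal sketch. The function name `pool_pairs` and the toy data are hypothetical, and treating the union as a plain list concatenation is an assumption; the paper's own preprocessing is not shown in this index.

```python
def pool_pairs(qa_pairs):
    """Return the (q, a) pairs followed by the reversed (a, q) pairs.

    Assumption: the pooled corpus is a simple concatenation, so each
    original pair contributes two training examples, one per direction.
    """
    reversed_pairs = [(a, q) for q, a in qa_pairs]
    return list(qa_pairs) + reversed_pairs


# Toy data, invented for illustration only.
qa_pairs = [
    ("what is smt", "statistical machine translation"),
    ("who wrote hamlet", "shakespeare"),
]
pooled = pool_pairs(qa_pairs)
```

With n input pairs the pooled list holds 2n pairs, which is why both translation directions are seen during training.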
Parallel Datasets
Question Reformulations (WAQ). In this setting, question and question reformulation pairs are considered as a parallel corpus, e.g.
Related Work
Murdock and Croft (2005) created a first parallel corpus of synonym pairs extracted from WordNet, and an additional parallel corpus of English words translating to the same Arabic term in a parallel English-Arabic corpus.
parallel corpus is mentioned in 7 sentences in this paper.
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
Consider a multilingual parallel corpus, such as EuroParl, which contains parallel sentences for several languages.
Experiments
We preprocessed the EuroParl corpus (http://www.statmt.org/europarl) (Koehn, 2005) and built a multilingual parallel corpus with 653,513 sentences, excluding the Q4/2000 portion of the data (2000-10 to 2000-12) which is reserved as the test set.
Experiments
The test set consists of 2,000 multi-language sentences and comes from the multilingual parallel corpus built from Q4/2000 portion of the data.
Introduction
The main source of training data for statistical machine translation (SMT) models is a parallel corpus.
Introduction
In many cases, the same information is available in multiple languages simultaneously as a multilingual parallel corpus, e.g., European Parliament (EuroParl) and UN.
Introduction
In this paper, we consider how to use active learning (AL) in order to add a new language to such a multilingual parallel corpus and at the same time we construct an MT system from each language in the original corpus into this new target language.
parallel corpus is mentioned in 7 sentences in this paper.
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Abstract
A simple statistical machine translation method, word-by-word decoding, where not a parallel corpus but a bilingual lexicon is necessary, is adopted for the treebank translation.
Introduction
In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if it is not able to find a parallel corpus that right covers source and target treebanks.
Introduction
However, dependency parsing focuses on the relations of word pairs, this allows us to use a dictionary-based translation without assuming a parallel corpus available, and the training stage of translation may be ignored and the decoding will be quite fast in this case.
The Related Work
The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach holds a merit of simplicity as only a bilingual lexicon is required.
Treebank Translation and Dependency Transformation
Since we use a lexicon rather than a parallel corpus to estimate the translation probabilities, we simply assign uniform probabilities to all translation options.
parallel corpus is mentioned in 5 sentences in this paper.
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Experiments
In contrast, because the alignment entropy doesn’t depend on the gold standard, one can easily report the alignment performance on any unaligned parallel corpus.
Introduction
Since the models are trained on an aligned parallel corpus, the resulting statistical models can only be as good as the alignment of the corpus.
Transliteration alignment techniques
The alignment can be performed via the Expectation-Maximization (EM) by starting with a random initial alignment and calculating the affinity matrix count(e_i, c_j) over the whole parallel corpus, where element (i, j) is the number of times character e_i was aligned to c_j.
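As a rough illustration of the counting step in the sentence above, the sketch below tallies how often each source unit is linked to each target unit in an already aligned corpus. It covers only the count(e_i, c_j) computation, not the full EM loop, and the data layout (token lists plus index-pair links) and the name `affinity_counts` are assumptions made for this example.

```python
from collections import Counter


def affinity_counts(aligned_corpus):
    """Count how often each (source unit, target unit) pair is aligned.

    Assumed input: an iterable of (src, tgt, links) triples, where src and
    tgt are token lists and links is a list of (i, j) index pairs meaning
    src[i] is aligned to tgt[j].
    """
    counts = Counter()
    for src, tgt, links in aligned_corpus:
        for i, j in links:
            counts[(src[i], tgt[j])] += 1
    return counts


# Toy aligned pairs, invented for illustration only.
corpus = [
    (["a", "b"], ["x", "y"], [(0, 0), (1, 1)]),
    (["a", "c"], ["x", "z"], [(0, 0), (1, 1)]),
]
counts = affinity_counts(corpus)
```

In an EM-style procedure these counts would be recomputed after each re-alignment pass; that iteration is omitted here.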
parallel corpus is mentioned in 3 sentences in this paper.