Index of papers in Proc. ACL 2008 that mention
  • language pairs
Haghighi, Aria and Liang, Percy and Berg-Kirkpatrick, Taylor and Klein, Dan
Abstract
We show that high-precision lexicons can be learned in a variety of language pairs and from a range of corpus types.
Experimental Setup
For all language pairs except English-Arabic, we extract evaluation lexicons from the Wiktionary online dictionary.
Experiments
We also explored how system performance varies for language pairs other than English-Spanish.
Experiments
One concern is how our system performs on language pairs where orthographic features are less applicable.
Features
While orthographic features are clearly effective for historically related language pairs, they are more limited for other language pairs, where we need to appeal to other clues.
Features
(section 6.2), (c) a variety of language pairs (see section 6.3).
Introduction
Although parallel text is plentiful for some language pairs such as English-Chinese or English-Arabic, it is scarce or even nonexistent for most others, such as English-Hindi or French-Japanese.
Introduction
Moreover, parallel text could be scarce for a language pair even if monolingual data is readily available for both languages.
Introduction
This task, though clearly more difficult than the standard parallel text approach, can operate on language pairs and in domains where standard approaches cannot.
The term "language pairs" is mentioned in 10 sentences in this paper.
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Abstract
We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs.
Abstract
We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Conclusions
Table 3: BLEU scores for all language pairs using all available data.
Conclusions
We tested this hypothesis on six different language pairs from three different domains, and found that the new alignment scheme not only performs better than the baseline, but also improves over a more complicated, intractable model.
Introduction
language pairs used in recent MT competitions.
Phrase-based machine translation
Our next set of experiments looks at our performance in both directions across our 6 corpora when we have small to moderate amounts of training data: for the language pairs with more than 100,000 sentences, we use only the first 100,000 sentences.
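As a quick sketch of the truncation just described (the variable names are assumptions, not from the paper):

    # Keep only the first 100,000 sentence pairs of a larger parallel corpus.
    sentence_pairs = [("source sentence", "target sentence")] * 250000  # toy stand-in
    capped = sentence_pairs[:100000]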
Phrase-based machine translation
Table 2: BLEU scores for all language pairs using up to 100k sentences.
The term "language pairs" is mentioned in 7 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina
Experimental Setup
To obtain our corpus of short parallel phrases, we preprocessed each language pair using the Giza++ alignment toolkit. Given word alignments for each language pair, we extract a list of phrase pairs that form independent sets in the bipartite alignment graph.
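As a rough illustration of this extraction step (a sketch, not the authors' implementation), the function below collects span pairs whose alignment links all stay inside the pair, which is one concrete reading of the "independent sets" condition; the input format, function name, and this reading are assumptions:

    # Sketch: given word-alignment links as (src_index, tgt_index) pairs,
    # e.g. read from Giza++ output, collect span pairs whose links stay
    # entirely within the pair (no link leaves either span).
    def independent_span_pairs(src_len, links):
        pairs = []
        for i1 in range(src_len):
            for i2 in range(i1, src_len):
                # Target positions linked to the source span [i1, i2].
                tgt = {j for (i, j) in links if i1 <= i <= i2}
                if not tgt:
                    continue
                j1, j2 = min(tgt), max(tgt)
                # Source positions linked back to the target span [j1, j2].
                src = {i for (i, j) in links if j1 <= j <= j2}
                if min(src) >= i1 and max(src) <= i2:
                    pairs.append(((i1, i2), (j1, j2)))
        return pairs

For links {(0, 0), (1, 2), (2, 1)} over a three-word source, the whole-sentence pair ((0, 2), (0, 2)) and single-link pairs such as ((1, 1), (2, 2)) qualify, while a pair built on the source span (0, 1) does not, since the link (2, 1) reaches into its target span from outside.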
Introduction
When modeled in tandem, gains are observed for all language pairs, reducing relative error by as much as 24%.
Introduction
Furthermore, our experiments show that both related and unrelated language pairs benefit from multilingual learning.
Results
However, once character-to-character phonetic correspondences are added as an abstract morpheme prior (final two rows), we find that the performance of related language pairs outstrips that of English, reducing relative error over MONOLINGUAL by 10% and 24% for the Hebrew/Arabic pair.
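The relative-error figures above presumably follow the standard definition of relative error reduction; a minimal check with made-up error rates (illustrative values only, not the paper's):

    # Assumed formula: rer = (baseline_error - model_error) / baseline_error
    err_mono, err_multi = 0.25, 0.19          # illustrative error rates only
    rer = (err_mono - err_multi) / err_mono   # 0.24, i.e. a 24% reduction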
The term "language pairs" is mentioned in 4 sentences in this paper.
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Conclusion and future work
In the future, we would like to include more sophistication in the design of a lexicon for a particular language pair based on error analysis, and extend our preprocessing to include other operations such as word segmentation.
Integration of inflection models with MT systems
However, for some language pairs, stemming one language can make word alignment worse, if it leads to more violations in the assumptions of current word alignment models, rather than making the source look more like the target.
Machine translation systems and data
For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
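A sketch of how these four parallel splits might be organized per language pair; the class and file names below are assumptions, not from the paper:

    # Illustrative record of the four parallel data splits described above.
    from dataclasses import dataclass

    @dataclass
    class SplitPaths:
        train: str    # trains MT sub-models (phrase tables, language model)
        lambda_: str  # trains combination weights via max-BLEU training
        dev: str      # trains integration-method combination parameters
        test: str     # held out for final evaluation

    en_ar = SplitPaths("train.en-ar", "lambda.en-ar", "dev.en-ar", "test.en-ar")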
Machine translation systems and data
All MT systems for a given language pair used the same datasets.
The term "language pairs" is mentioned in 4 sentences in this paper.