Sentence Level Dialect Identification for Machine Translation System Selection
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona

Article Structure

Abstract

In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input.

Introduction

A language can be described as a set of dialects, among which one "standard variety" has a special representative status. Despite being increasingly ubiquitous in informal written genres such as social media, most nonstandard dialects are resource-poor compared to their standard variety.

Related Work

Arabic Dialect Machine Translation.

Machine Translation Experiments

In this section, we present our MT experimental setup and the four baseline systems we built, and we evaluate their performance and the potential of their combination.

MT System Selection

The approach we take in this paper benefits from the techniques and conclusions of previous papers in that we build different MT systems similar to those discussed above but instead of trying to find which one is the best, we try to leverage the use of all of them by automatically deciding what sentences should go to which system.

Discussion and Error Analysis

DA versus MSA Performance.

Conclusion and Future Work

We presented a sentence-level classification approach for MT system selection for diglossic languages.

Topics

MT system

Appears in 41 sentences as: MT System (1) MT system (27) MT Systems (1) MT systems (15)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. Our best result improves over the best single MT system baseline by 1.0% BLEU and over a strong system selection baseline by 0.6% BLEU on a blind test set.
    Page 1, “Abstract”
  2. In this paper we study the use of sentence-level dialect identification together with various linguistic features in optimizing the selection of outputs of four different MT systems on input text that includes a mix of dialects.
    Page 1, “Introduction”
  3. Our best system selection approach improves over our best baseline single MT system by 1.0% absolute BLEU point on a blind test set.
    Page 1, “Introduction”
  4. Sawaf (2010) and Salloum and Habash (2013) used hybrid solutions that combine rule-based algorithms and resources such as lexicons and morphological analyzers with statistical models to map DA to MSA before using MSA-to-English MT systems.
    Page 1, “Related Work”
  5. In this paper we use four MT systems that translate from DA to English in different ways.
    Page 1, “Related Work”
  6. Our fourth MT system uses ELISSA, the DA-to-MSA MT tool by Salloum and Habash (2013), to produce an MSA pivot.
    Page 1, “Related Work”
  7. In this paper, we use AIDA, the system of Elfardy and Diab (2013), to provide a variety of dialect ID features to train classifiers that select, for a given sentence, the MT system that produces the best translation.
    Page 2, “Related Work”
  8. The most popular approach to MT system combination involves building confusion networks from the outputs of different MT systems and decoding them to generate new translations (Rosti et al., 2007; Karakos et al., 2008; He et al., 2008; Xu et al., 2011).
    Page 2, “Related Work”
  9. Other researchers explored the idea of re-ranking the n-best output of MT systems using different types of syntactic models (Och et al., 2004; Hasan et al., 2006; Ma and McKeown, 2013).
    Page 2, “Related Work”
  10. Most MT system combination work uses MT systems employing different techniques to train on the same data.
    Page 2, “Related Work”
  11. Our approach runs a classifier trained only on source language features to decide which system should translate each sentence in the test set, which means that each sentence goes through one MT system only.
    Page 2, “Related Work”
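
The source-feature routing described in excerpt 11 can be sketched as a tiny classifier. Everything here is an illustrative stand-in, not the paper's actual classifier: the feature names, the system names, and the nearest-centroid decision rule are all assumptions.

```python
import math

# Toy source-side features per training sentence: (dialectal-word ratio,
# OOV rate). Labels name the MT system with the best sentence-level BLEU.
TRAIN = [
    ((0.90, 0.20), "DA-System"),
    ((0.80, 0.25), "DA-System"),
    ((0.05, 0.02), "MSA-Pivot"),
    ((0.10, 0.05), "MSA-Pivot"),
]

def centroids(data):
    """Average the feature vectors of each class label."""
    sums, counts = {}, {}
    for feats, label in data:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, f in enumerate(feats):
            acc[i] += f
        counts[label] = counts.get(label, 0) + 1
    return {lab: tuple(v / counts[lab] for v in acc)
            for lab, acc in sums.items()}

CENTROIDS = centroids(TRAIN)

def route(feats):
    """Send the sentence to the system whose training centroid is closest;
    each sentence goes through exactly one MT system, as in the paper."""
    return min(CENTROIDS, key=lambda lab: math.dist(feats, CENTROIDS[lab]))
```

The key property matching the paper is that only source-language features are consulted, so no sentence needs to be translated by more than one system.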

See all papers in Proc. ACL 2014 that mention MT system.

BLEU

Appears in 25 sentences as: BLEU (31)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. Our best result improves over the best single MT system baseline by 1.0% BLEU and over a strong system selection baseline by 0.6% BLEU on a blind test set.
    Page 1, “Abstract”
  2. Our best system selection approach improves over our best baseline single MT system by 1.0% absolute BLEU point on a blind test set.
    Page 1, “Introduction”
  3. Feature weights are tuned to maximize BLEU on tuning sets using Minimum Error Rate Training (Och, 2003).
    Page 2, “Machine Translation Experiments”
  4. Results are presented in terms of BLEU (Papineni et al., 2002).
    Page 2, “Machine Translation Experiments”
  5. All differences in BLEU scores between the four systems are statistically significant above the 95% level.
    Page 3, “Machine Translation Experiments”
  6. We also report in Table 1 an oracle system selection where we pick, for each sentence, the English translation that yields the best BLEU score.
    Page 3, “Machine Translation Experiments”
  7. This oracle indicates that the upper bound for improvement achievable from system selection is 5.4% BLEU.
    Page 3, “Machine Translation Experiments”
  8. We run the 5,562 sentences of the classification training data through our four MT systems and produce sentence-level BLEU scores (with length penalty).
    Page 3, “MT System Selection”
  9. We pick the name of the MT system with the highest BLEU score as the class label for that sentence.
    Page 3, “MT System Selection”
  10. When there is a tie in BLEU scores, we pick, among the tied systems, the label of the one with the better overall BLEU score.
    Page 3, “MT System Selection”
  11. It improves over the best single system baseline (MSA-Pivot) by a statistically significant 0.5% BLEU .
    Page 4, “MT System Selection”
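
The labeling scheme in excerpts 8-10 can be sketched as follows. System names and scores are illustrative, and the paper's exact tie-breaking bookkeeping may differ:

```python
def label_sentence(bleu_by_system, overall_bleu):
    """Assign a class label to a training sentence: the MT system with the
    highest sentence-level BLEU; on ties, prefer the tied system with the
    better overall (corpus-level) BLEU."""
    best = max(bleu_by_system.values())
    tied = [s for s, b in bleu_by_system.items() if b == best]
    return max(tied, key=lambda s: overall_bleu[s])
```

Running this over all 5,562 classification training sentences would yield one four-way class label per sentence, which is what the selection classifier is trained on.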

sentence-level

Appears in 7 sentences as: sentence-level (7)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input.
    Page 1, “Abstract”
  2. In this paper we study the use of sentence-level dialect identification together with various linguistic features in optimizing the selection of outputs of four different MT systems on input text that includes a mix of dialects.
    Page 1, “Introduction”
  3. used features from their token-level system to train a classifier that performs sentence-level dialect ID (Elfardy and Diab, 2013).
    Page 2, “Related Work”
  4. For baseline system selection, we use the classification decision of Elfardy and Diab (2013)’s sentence-level dialect identification system to decide on the target MT system.
    Page 3, “MT System Selection”
  5. We run the 5,562 sentences of the classification training data through our four MT systems and produce sentence-level BLEU scores (with length penalty).
    Page 3, “MT System Selection”
  6. In 21% of the error cases, our classifier predicted a better translation than the one considered gold by BLEU due to BLEU bias, e.g., a severe sentence-level length penalty caused by an extra punctuation mark in a short sentence.
    Page 5, “Discussion and Error Analysis”
  7. We presented a sentence-level classification approach for MT system selection for diglossic languages.
    Page 5, “Conclusion and Future Work”
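
The length bias mentioned in excerpt 6 comes from BLEU's brevity penalty (Papineni et al., 2002), which at the sentence level can swing sharply with a single extra or missing token:

```python
import math

def brevity_penalty(cand_len, ref_len):
    """Standard BLEU brevity penalty: 1.0 when the candidate is at least
    as long as the reference, exp(1 - r/c) otherwise. For very short
    sentences, one extra reference token cuts the score substantially,
    which is the sentence-level bias noted in excerpt 6."""
    if cand_len >= ref_len:
        return 1.0
    return math.exp(1.0 - ref_len / cand_len)
```

For a 5-word candidate against a 10-word reference, the penalty alone multiplies the score by about 0.37, regardless of n-gram quality.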

Binary Classification

Appears in 6 sentences as: Binary Classification (4) Binary Classifier (1) binary classifier (1)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. 4.1 Dialect ID Binary Classification
    Page 3, “MT System Selection”
  2. We run the sentence through the Dialect ID binary classifier and we use the predicted class label (DA or MSA) as a feature in our system.
    Page 4, “MT System Selection”
  3. It improves over our best baseline single MT system by 1.3% BLEU and over the Dialect ID Binary Classification system selection baseline by 0.8% BLEU.
    Page 4, “MT System Selection”
  4. Best Dialect ID Binary Classifier: 32.9 BLEU (+0.4). Best Classifier, Basic + Extended Features: 33.5 BLEU (+1.0).
    Page 4, “MT System Selection”
  5. The third part shows the results of the Dialect ID Binary Classification, which improves by 0.4% BLEU.
    Page 5, “MT System Selection”
  6. The last row shows the four-class classifier results, which improve by 1.0% BLEU over the best single MT system baseline and by 0.6% BLEU over the Dialect ID Binary Classification.
    Page 5, “MT System Selection”

BLEU scores

Appears in 6 sentences as: BLEU score (3) BLEU scores (4)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. All differences in BLEU scores between the four systems are statistically significant above the 95% level.
    Page 3, “Machine Translation Experiments”
  2. We also report in Table 1 an oracle system selection where we pick, for each sentence, the English translation that yields the best BLEU score.
    Page 3, “Machine Translation Experiments”
  3. We run the 5,562 sentences of the classification training data through our four MT systems and produce sentence-level BLEU scores (with length penalty).
    Page 3, “MT System Selection”
  4. We pick the name of the MT system with the highest BLEU score as the class label for that sentence.
    Page 3, “MT System Selection”
  5. When there is a tie in BLEU scores, we pick, among the tied systems, the label of the one with the better overall BLEU score.
    Page 3, “MT System Selection”
  6. We plan to give different weights to different training examples based on the drop in BLEU score the example can cause if classified incorrectly.
    Page 5, “Conclusion and Future Work”
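
The weighting idea in excerpt 6 is left to future work in the paper; one plausible reading, which uses the worst-case BLEU drop an example could cause as its weight, can be sketched as:

```python
def example_weight(bleu_by_system):
    """Weight a training example by the worst-case sentence-level BLEU
    drop incurred if the classifier picks the wrong system: the gap
    between the best and worst system scores for this sentence. This
    exact scheme is an assumption; the paper does not specify one."""
    scores = sorted(bleu_by_system.values(), reverse=True)
    return scores[0] - scores[-1]
```

Under this reading, sentences on which all four systems perform similarly get near-zero weight, so the classifier concentrates on the examples where the choice actually matters.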

machine translation

Appears in 6 sentences as: Machine Translation (2) machine translation (4)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input.
    Page 1, “Abstract”
  2. We test our approach on Arabic, a prototypical diglossic language, and we optimize the combination of four different machine translation systems.
    Page 1, “Abstract”
  3. For statistical machine translation (MT), which relies on the existence of parallel data, translating from nonstandard dialects is a challenge.
    Page 1, “Introduction”
  4. Arabic Dialect Machine Translation.
    Page 1, “Related Work”
  5. System Selection and Combination in Machine Translation .
    Page 2, “Related Work”
  6. We use the open-source Moses toolkit (Koehn et al., 2007) to build four Arabic-English phrase-based statistical machine translation systems (SMT).
    Page 2, “Machine Translation Experiments”

statistically significant

Appears in 6 sentences as: Statistical significance (1) statistically significant (5)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. All differences in BLEU scores between the four systems are statistically significant above the 95% level.
    Page 3, “Machine Translation Experiments”
  2. Statistical significance is computed using paired bootstrap re-sampling (Koehn, 2004).
    Page 3, “Machine Translation Experiments”
  3. It improves over the best single system baseline (MSA-Pivot) by a statistically significant 0.5% BLEU.
    Page 4, “MT System Selection”
  4. Improvements are statistically significant.
    Page 4, “MT System Selection”
  5. The differences in BLEU are statistically significant.
    Page 5, “MT System Selection”
  6. However, our best system selection approach improves over MSA-Pivot by a small margin of only 0.2% BLEU absolute, albeit a statistically significant improvement.
    Page 5, “Discussion and Error Analysis”
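
The paired bootstrap re-sampling of excerpt 2 can be sketched as follows. This is a simplification: it sums fixed per-sentence scores for each resample, whereas the real test (Koehn, 2004) recomputes corpus BLEU on every resampled test set.

```python
import random

def paired_bootstrap(scores_a, scores_b, trials=1000, seed=0):
    """Paired bootstrap re-sampling over per-sentence scores. Draw the
    same sentence indices for both systems each trial, and return the
    fraction of resamples in which system A outscores system B; values
    near 1.0 (or 0.0) indicate a significant difference."""
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(trials):
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / trials
```

Pairing matters: because both systems are evaluated on the identical resampled sentences, sentence-difficulty variance cancels out of the comparison.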

language models

Appears in 5 sentences as: language model (2) language models (3)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. The language model for our systems is trained on English Gigaword (Graff and Cieri, 2003).
    Page 2, “Machine Translation Experiments”
  2. We use the SRILM Toolkit (Stolcke, 2002) to build a 5-gram language model with modified Kneser-Ney smoothing.
    Page 2, “Machine Translation Experiments”
  3. These features rely on language models, MSA and Egyptian morphological analyzers, and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary.
    Page 3, “MT System Selection”
  4. two language models: MSA and Egyptian.
    Page 4, “MT System Selection”
  5. The second set of features uses perplexity against language models built from the source side of the training data of each of the four MT systems.
    Page 4, “MT System Selection”
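
The perplexity features in excerpt 5 can be illustrated with a deliberately minimal add-alpha unigram model; the paper's actual LMs are not specified in these excerpts, so this is a stand-in for the idea, not their setup.

```python
import math
from collections import Counter

def unigram_lm(corpus_tokens, alpha=1.0):
    """Build an add-alpha smoothed unigram LM from one MT system's
    source-side training tokens; returns a probability function."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 slot for unseen words
    def prob(tok):
        return (counts.get(tok, 0) + alpha) / (total + alpha * vocab)
    return prob

def perplexity(prob, sentence_tokens):
    """Perplexity of a sentence under the LM. One such value per system's
    LM gives a small set of selection features: a low perplexity suggests
    the sentence resembles that system's training data."""
    logp = sum(math.log(prob(t)) for t in sentence_tokens)
    return math.exp(-logp / len(sentence_tokens))
```

With four LMs, each input sentence contributes four perplexity values to the classifier's feature vector.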

baseline system

Appears in 4 sentences as: baseline system (2) baseline systems (2)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. In this section, we present our MT experimental setup and the four baseline systems we built, and we evaluate their performance and the potential of their combination.
    Page 2, “Machine Translation Experiments”
  2. For baseline system selection, we use the classification decision of Elfardy and Diab (2013)’s sentence-level dialect identification system to decide on the target MT system.
    Page 3, “MT System Selection”
  3. baseline systems.
    Page 4, “MT System Selection”
  4. The first part of Table 2 repeats the best baseline system and the four-system oracle combination from Table 1 for convenience.
    Page 4, “MT System Selection”

morphological analyzer

Appears in 4 sentences as: morphological analyzer (2) morphological analyzers (2)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. Sawaf (2010) and Salloum and Habash (2013) used hybrid solutions that combine rule-based algorithms and resources such as lexicons and morphological analyzers with statistical models to map DA to MSA before using MSA-to-English MT systems.
    Page 1, “Related Work”
  2. The MSA portion of the Arabic side is segmented according to the Arabic Treebank (ATB) tokenization scheme (Maamouri et al., 2004; Sadat and Habash, 2006) using the MADA+TOKAN morphological analyzer and tokenizer v3.1 (Roth et al., 2008), while the DA portion is ATB-tokenized with MADA-ARZ (Habash et al., 2013).
    Page 2, “Machine Translation Experiments”
  3. These features rely on language models, MSA and Egyptian morphological analyzers and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary.
    Page 3, “MT System Selection”
  4. These features are: sentence length (in words); percentage of selected words and phrases; number of selected words; number of selected phrases; number of words morphologically selected as dialectal by a mainly Levantine morphological analyzer; number of words selected as dialectal by the tool’s DA-MSA lexicons; number of OOV words against the MSA-Pivot system training data; number of words in the sentence that appeared fewer than 5 times in the training data; number of words that appeared between 5 and 10 times in the training data; number of words that appeared between 10 and 15 times in the training data; number of words with spelling errors corrected by this tool (e.g., word lengthening); number of punctuation marks; and number of words written in Latin script.
    Page 4, “MT System Selection”
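
A subset of the counts in excerpt 4 can be sketched as a feature extractor. The lexicon and vocabulary-count inputs here are hypothetical stand-ins for the tool's actual resources, and only a few of the listed features are shown:

```python
import re

def surface_features(tokens, da_lexicon, train_vocab_counts):
    """Compute a few illustrative source-side counts for one sentence:
    length, DA-lexicon hits, OOV and rare words against the training
    vocabulary, Latin-script tokens, and single-character punctuation."""
    return {
        "length": len(tokens),
        "da_lexicon_hits": sum(t in da_lexicon for t in tokens),
        "oov": sum(train_vocab_counts.get(t, 0) == 0 for t in tokens),
        "rare_lt5": sum(0 < train_vocab_counts.get(t, 0) < 5 for t in tokens),
        "latin_script": sum(bool(re.fullmatch(r"[A-Za-z]+", t)) for t in tokens),
        "punct": sum(len(t) == 1 and not t.isalnum() for t in tokens),
    }
```

Each count becomes one dimension of the sentence's feature vector for the selection classifier.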

parallel data

Appears in 3 sentences as: parallel data (4)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. For statistical machine translation (MT), which relies on the existence of parallel data, translating from nonstandard dialects is a challenge.
    Page 1, “Introduction”
  2. Two approaches have emerged to alleviate the problem of DA-English parallel data scarcity: using MSA as a bridge language (Sawaf, 2010; Salloum and Habash, 2011; Salloum and Habash, 2013; Sajjad et al., 2013), and using crowd sourcing to acquire parallel data (Zbib et al., 2012).
    Page 1, “Related Work”
  3. DAT (in the fourth column) is the DA part of the 5M word DA-En parallel data processed with the DA-MSA MT system.
    Page 3, “Machine Translation Experiments”

SMT system

Appears in 3 sentences as: SMT system (2) SMT systems (1)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. We used MT08 and EgyDevV3 to tune SMT systems while we divided the remaining sets among classifier training data (5,562 sentences), dev (1,802 sentences) and blind test (1,804 sentences) sets to ensure each of these new sets has a variety of dialects and genres (weblog and newswire).
    Page 2, “Machine Translation Experiments”
  2. This MSA-pivoting system uses Salloum and Habash (2013)’s DA-MSA MT system followed by an Arabic-English SMT system trained on both corpora augmented with the DA-English data, where the DA side is preprocessed with the same DA-MSA MT system and then tokenized with MADA-ARZ.
    Page 3, “Machine Translation Experiments”
  3. Test sets are similarly preprocessed before decoding with the SMT system.
    Page 3, “Machine Translation Experiments”

translation systems

Appears in 3 sentences as: translation system (1) translation systems (2)
In Sentence Level Dialect Identification for Machine Translation System Selection
  1. In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input.
    Page 1, “Abstract”
  2. We test our approach on Arabic, a prototypical diglossic language, and we optimize the combination of four different machine translation systems.
    Page 1, “Abstract”
  3. We use the open-source Moses toolkit (Koehn et al., 2007) to build four Arabic-English phrase-based statistical machine translation systems (SMT).
    Page 2, “Machine Translation Experiments”
