Hindi-to-Urdu Machine Translation through Transliteration
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut

Article Structure

Abstract

We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation.

Introduction

Hindi is an official language of India and is written in Devanagari script.

Previous Work

There has been a significant amount of work on transliteration.

Our Approach

Both of our models combine a character-based transliteration model with a word-based translation model.

Evaluation

4.1 Training

Error Analysis

Based on preliminary experiments we found three major flaws in our initial formulations.

Final Results

Sample Output

This section gives two examples showing how our model (M1H2) performs disambiguation.

Conclusion

We have presented a novel way to integrate transliterations into machine translation.

Topics

language model

Appears in 18 sentences as: Language Model (2) language model (16) language model: (1)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. (1) 3.1.1 Language Model
    Page 3, “Our Approach”
The language model (LM) p(u_1^N)
    Page 3, “Our Approach”
  3. The parameters of the language model are learned from a monolingual Urdu corpus.
    Page 3, “Our Approach”
  4. The language model is defined as:
    Page 3, “Our Approach”
For a multi-word ui we do multiple language model lookups, one for each word uij in ui = ui1, …, uim.
    Page 3, “Our Approach”
  6. Language Model for Unknown Words: Our model generates transliterations that can be known or unknown to the language model and the translation model.
    Page 3, “Our Approach”
  7. We refer to the words known to the language model and to the translation model as LM-known and TM-known words respectively and to words that are unknown as LM-unknown and TM-unknown respectively.
    Page 3, “Our Approach”
If one or more uij in a multi-word ui are LM-unknown, we assign a language model score pLM = w for the entire ui, meaning that we consider partially known transliterations to be as bad as fully unknown transliterations.
    Page 3, “Our Approach”
  9. language model:
    Page 4, “Our Approach”
  10. It searches for an Urdu string that maximizes the product of translation probability and the language model probability (equation 1) by translating one Hindi word at a time.
    Page 5, “Our Approach”
  11. At the higher level, transliteration probabilities are interpolated with pw and then multiplied with language model probabilities to give the probability of a hypothesis.
    Page 5, “Our Approach”
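The language-model treatment of multi-word transliterations quoted above (one lookup per component word, a floor score as soon as any component is LM-unknown) can be sketched as follows. This is a minimal illustration assuming a toy unigram LM; `FLOOR` stands in for the constant penalty w and is not the paper's actual value.

```python
# Minimal sketch of the LM scoring described above, assuming a toy unigram
# language model. FLOOR stands in for the constant penalty score w that is
# assigned when any word of a multi-word unit ui is LM-unknown.

FLOOR = 1e-10  # illustrative value for the LM-unknown penalty w

def lm_score(unit, unigram_probs):
    """Score a candidate Urdu unit ui. Multi-word units get one lookup per
    component word; any LM-unknown component triggers the floor score."""
    words = unit.split()
    if any(w not in unigram_probs for w in words):
        # partially known transliterations are treated as fully unknown
        return FLOOR
    score = 1.0
    for w in words:
        score *= unigram_probs[w]
    return score
```

Note how a unit with one unknown word scores no better than a fully unknown one, mirroring the quoted design decision.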

See all papers in Proc. ACL 2010 that mention language model.

See all papers in Proc. ACL that mention language model.

Back to top.

translation model

Appears in 16 sentences as: Translation Model (1) translation model (15)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. Section 3 introduces two probabilistic models for integrating translations and transliterations into a translation model which are based on conditional and joint probability distributions.
    Page 2, “Introduction”
  2. Moreover, they are working with a large bitext so they can rely on their translation model and only need to transliterate NEs and OOVs.
    Page 3, “Previous Work”
  3. Our translation model is based on data which is both sparse and noisy.
    Page 3, “Previous Work”
Both of our models combine a character-based transliteration model with a word-based translation model.
    Page 3, “Our Approach”
Language Model for Unknown Words: Our model generates transliterations that can be known or unknown to the language model and the translation model.
    Page 3, “Our Approach”
  6. We refer to the words known to the language model and to the translation model as LM-known and TM-known words respectively and to words that are unknown as LM-unknown and TM-unknown respectively.
    Page 3, “Our Approach”
  7. 3.1.2 Translation Model
    Page 4, “Our Approach”
The translation model (TM) p(h_1^N | u_1^N) is approximated with a context-independent model:
    Page 4, “Our Approach”
  9. The parameters of the word-based translation model pw are estimated from the word alignments of a small parallel corpus.
    Page 4, “Our Approach”
Again, the translation model p(h_1^N | u_1^N) is approximated with a context-independent model:
    Page 4, “Our Approach”
The parameters of the translation model pw(hi|ui) and the word-based prior probabilities pw(ui) are estimated from the 1-1/1-N word-aligned corpus (the one that we also used to estimate translation probabilities pw previously).
    Page 4, “Our Approach”
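The estimates quoted above fit together as a simple interpolation: the character-based (transliteration) model's joint probability is turned into a conditional via its prior and then mixed with the word-based estimate pw. The sketch below is a hedged illustration; the interpolation weight and any input values are assumptions, not the paper's tuned parameters.

```python
# Hedged sketch of the interpolated translation model (Model-1): mix the
# word-based estimate pw(hi|ui) with the character-based transliteration
# estimate, whose joint probability pc(hi, ui) is marginalized to a
# conditional via the character-based prior pc(ui). lmbda is an
# illustrative interpolation weight, not the paper's tuned value.

def interpolate(p_word_cond, p_char_joint, p_char_prior, lmbda=0.5):
    """p(h|u) ~ lmbda * pw(h|u) + (1 - lmbda) * pc(h,u)/pc(u)."""
    p_char_cond = p_char_joint / p_char_prior if p_char_prior > 0 else 0.0
    return lmbda * p_word_cond + (1 - lmbda) * p_char_cond
```

A zero character-based prior simply disables the transliteration component in this sketch.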

BLEU

Appears in 10 sentences as: BLEU (11)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
    Page 1, “Abstract”
Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores.
    Page 2, “Introduction”
System:  Pb0    Pb1    Pb2    M1     M2
BLEU:    14.30  16.25  16.13  18.60  17.05
    Page 7, “Evaluation”
  4. Both our systems (Model-1 and Model-2) beat the baseline phrase-based system with a BLEU point difference of 4.30 and 2.75 respectively.
    Page 7, “Evaluation”
The difference of 2.35 BLEU points between M1 and Pb1 indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
    Page 7, “Evaluation”
This section shows the improvement in BLEU score obtained by applying heuristics and combinations of heuristics to both models.
    Page 8, “Final Results”
  7. BLEU point improvement and combined with all the heuristics (M2H123) gives an overall gain of 1.95 BLEU points and is close to our best results (M1H12).
    Page 9, “Final Results”
One important issue that has not been investigated yet is that BLEU has not been shown to perform well for morphologically rich target languages like Urdu, but there is no metric known to work better.
    Page 9, “Final Results”
We observed that, on parts of the data where the translators preferred to translate rather than transliterate, our system is sometimes penalized by BLEU even though our output string is a valid translation.
    Page 9, “Final Results”
  10. For other parts of the data where the translators have heavily used transliteration, the system may receive a higher BLEU score.
    Page 9, “Final Results”

phrase-based

Appears in 8 sentences as: Phrase-based (1) phrase-based (8)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
    Page 1, “Abstract”
Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores.
    Page 2, “Introduction”
  3. Table 4: Comparing Model-1 and Model-2 with Phrase-based Systems
    Page 7, “Evaluation”
  4. We also used two methods to incorporate transliterations in the phrase-based system:
    Page 7, “Evaluation”
Post-process Pb1: All the OOV words in the phrase-based output are replaced with their top-candidate transliteration as given by our transliteration system.
    Page 7, “Evaluation”
  6. Both our systems (Model-1 and Model-2) beat the baseline phrase-based system with a BLEU point difference of 4.30 and 2.75 respectively.
    Page 7, “Evaluation”
The transliteration-aided phrase-based systems Pb1 and Pb2 are closer to our Model-2 results but are well below the Model-1 results.
    Page 7, “Evaluation”
Our models choose between translations and transliterations based on context, unlike the phrase-based systems Pb1 and Pb2, which use transliteration only as a tool to translate OOV words.
    Page 7, “Evaluation”
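The context-sensitive choice described above can be sketched in a few lines: rather than transliterating only OOV words (as Pb1 and Pb2 do), every candidate, translation or transliteration, is scored against the language model in context and the best-scoring one is kept. All names and numbers below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: pick the best candidate (translation or transliteration)
# for a source word by combining its model probability with a language-model
# score for the current context, instead of falling back to transliteration
# only for OOV words.

def choose(candidates, lm_score):
    """candidates: list of (urdu_word, model_prob) pairs; lm_score maps a
    word to its language-model probability in the current context."""
    return max(candidates, key=lambda c: c[1] * lm_score(c[0]))
```

A transliteration with a lower model probability can still win if the language model strongly prefers it in context.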

probability model

Appears in 8 sentences as: probabilistic models (2) Probability Model (2) probability model (7)
In Hindi-to-Urdu Machine Translation through Transliteration
We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem.
    Page 1, “Abstract”
We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
    Page 1, “Abstract”
  3. Section 3 introduces two probabilistic models for integrating translations and transliterations into a translation model which are based on conditional and joint probability distributions.
    Page 2, “Introduction”
  4. 3.1 Model-1 : Conditional Probability Model
    Page 3, “Our Approach”
Because our overall model is a conditional probability model, joint-probabilities are marginalized using character-based prior probabilities:
    Page 4, “Our Approach”
  6. 3.2 Model-2 : Joint Probability Model
    Page 4, “Our Approach”
We refer to the results as McHy, where c denotes the model number (1 for the conditional probability model, 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
    Page 8, “Final Results”
  8. We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well.
    Page 9, “Conclusion”

conditional probability

Appears in 7 sentences as: conditional probabilities (1) Conditional Probability (1) conditional probability (5)
In Hindi-to-Urdu Machine Translation through Transliteration
We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
    Page 1, “Abstract”
  2. 3.1 Model-1 : Conditional Probability Model
    Page 3, “Our Approach”
  3. Our model estimates the conditional probability by interpolating a word-based model and a character-based (transliteration) model.
    Page 4, “Our Approach”
  4. Because our overall model is a conditional probability model, joint-probabilities are marginalized using character-based prior probabilities:
    Page 4, “Our Approach”
This section briefly defines a variant of our model where we interpolate joint probabilities instead of conditional probabilities.
    Page 4, “Our Approach”
We refer to the results as McHy, where c denotes the model number (1 for the conditional probability model, 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
    Page 8, “Final Results”
  7. We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well.
    Page 9, “Conclusion”

BLEU score

Appears in 4 sentences as: BLEU score (2) BLEU scores (2)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
    Page 1, “Abstract”
Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores.
    Page 2, “Introduction”
This section shows the improvement in BLEU score obtained by applying heuristics and combinations of heuristics to both models.
    Page 8, “Final Results”
For other parts of the data where the translators have heavily used transliteration, the system may receive a higher BLEU score.
    Page 9, “Final Results”

BLEU point

Appears in 3 sentences as: BLEU point (2) BLEU points (2)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. Both our systems (Model-1 and Model-2) beat the baseline phrase-based system with a BLEU point difference of 4.30 and 2.75 respectively.
    Page 7, “Evaluation”
The difference of 2.35 BLEU points between M1 and Pb1 indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
    Page 7, “Evaluation”
  3. BLEU point improvement and combined with all the heuristics (M2H123) gives an overall gain of 1.95 BLEU points and is close to our best results (M1H12).
    Page 9, “Final Results”

cross validation

Appears in 3 sentences as: cross validation (3)
In Hindi-to-Urdu Machine Translation through Transliteration
This comprises 108K sentences from the data made available by the University of Leipzig plus 5600 sentences from the training data of each fold during cross validation.
    Page 5, “Evaluation”
  2. We perform a 5-fold cross validation taking 4/5 of the data as training and 1/5 as test data.
    Page 6, “Evaluation”
Baseline Pb0: We ran Moses (Koehn et al., 2007) using Koehn's training scripts, doing a 5-fold cross validation with no reordering.
    Page 6, “Evaluation”
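The 5-fold cross validation described above (4/5 of the sentence pairs for training, the remaining 1/5 for testing, per fold) can be sketched as follows. Contiguous chunking is an assumption here; the quoted text does not say how the folds were drawn.

```python
# Hedged sketch of the 5-fold cross validation described above: each fold
# uses 4/5 of the sentence pairs for training and the remaining 1/5 for
# testing. Contiguous chunking is an assumption, not the paper's stated
# procedure.

def five_fold(sentences):
    n = len(sentences)
    folds = []
    for k in range(5):
        lo, hi = k * n // 5, (k + 1) * n // 5
        # (train, test) split for fold k
        folds.append((sentences[:lo] + sentences[hi:], sentences[lo:hi]))
    return folds
```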

language pairs

Appears in 3 sentences as: language pairs (3)
In Hindi-to-Urdu Machine Translation through Transliteration
  1. This indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
    Page 1, “Abstract”
The difference of 2.35 BLEU points between M1 and Pb1 indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.
    Page 7, “Evaluation”
  3. In closely related language pairs such as Hindi-Urdu with a significant amount of vocabulary overlap,
    Page 9, “Conclusion”

machine translation

Appears in 3 sentences as: machine translation (3)
In Hindi-to-Urdu Machine Translation through Transliteration
We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation.
    Page 1, “Abstract”
We have presented a novel way to integrate transliterations into machine translation.
    Page 9, “Conclusion”
  3. transliteration can be very effective in machine translation for more than just translating OOV words.
    Page 9, “Conclusion”

phrase table

Appears in 3 sentences as: phrase table (2) phrase tables (1)
In Hindi-to-Urdu Machine Translation through Transliteration
do not compete with internal phrase tables.
    Page 3, “Previous Work”
  2. (2008) use a tagger to identify good candidates for transliteration (which are mostly NEs) in input text and add transliterations to the SMT phrase table dynamically such that they can directly compete with translations during decoding.
    Page 3, “Previous Work”
After obtaining the MERT parameters, we add the 600 dev sentences back into the training corpus, retrain GIZA, and then estimate a new phrase table on all 5600 sentences.
    Page 7, “Error Analysis”
