Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
Yang, Fan and Zhao, Jun and Zou, Bo and Liu, Kang and Liu, Feifan

Article Structure

Abstract

In this paper, we present a novel backward transliteration approach which can further assist the existing statistical model by mining monolingual web resources.

Introduction*

The task of Name Entity (NE) translation is to translate a name entity from a source language into a target language; it plays an important role in machine translation and cross-language information retrieval (CLIR).

System Framework

Our system has three main modules.

Statistical Transliteration Model

We use syllables as translation units to build a statistical Chinese-English backward transliteration model in our system.

Topics

edit distance

Appears in 8 sentences as: edit distance (8)
In Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
  1. We use an algorithm similar to the one for calculating the edit distance between two words.
    Page 4, “Statistical Transliteration Model”
  2. The edit distance calculation finds the best match, with the least operation cost, for changing one word into another by using deletion/addition/insertion operations on syllables.
    Page 4, “Statistical Transliteration Model”
  3. But the complexity will be too high to afford if we calculate the edit distance between a query and each word in the list.
    Page 4, “Statistical Transliteration Model”
  4. So we calculate the edit distance only for the words that get a high score without the order limitation.
    Page 4, “Statistical Transliteration Model”
  5. Syllable expansion based on syllable edit distance: The disadvantage of the last two expansions is that they are entirely dependent on the training set.
    Page 5, “Statistical Transliteration Model”
  6. To solve the problem, we use the method of expansion based on edit distance.
    Page 5, “Statistical Transliteration Model”
  7. We use edit distance to measure the similarity between two syllables, one of which is in the training set and the other absent from it.
    Page 5, “Statistical Transliteration Model”
  8. Because the edit distance expansion is not very relevant to pronunciation, we will give this expansion method a low weight in combination.
    Page 5, “Statistical Transliteration Model”
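
The sentences above describe matching on syllables rather than characters, and restricting the expensive edit distance to words that already score well on an order-free comparison. A minimal Python sketch of that two-stage idea follows. It is an illustration, not the authors' implementation: the unit costs, the use of substitution (standard Levenshtein), the order-free overlap score, the `top_k` cutoff, and the example syllabifications are all assumptions.

```python
from collections import Counter
from typing import List, Sequence


def syllable_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance whose units are syllables, not characters.

    Assumption: every insertion, deletion, and substitution costs 1.
    """
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, syl_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, syl_b in enumerate(b, start=1):
            cost = 0 if syl_a == syl_b else 1
            curr.append(min(
                prev[j] + 1,         # delete syl_a
                curr[j - 1] + 1,     # insert syl_b
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def shared_syllables(a: Sequence[str], b: Sequence[str]) -> int:
    """Cheap order-free score: size of the multiset intersection."""
    return sum((Counter(a) & Counter(b)).values())


def rank_candidates(query: Sequence[str],
                    candidates: List[Sequence[str]],
                    top_k: int = 2) -> List[Sequence[str]]:
    """Shortlist by the order-free score, then sort by edit distance."""
    # Stage 1: keep only the top_k candidates by syllable overlap.
    shortlist = sorted(candidates,
                       key=lambda c: -shared_syllables(query, c))[:top_k]
    # Stage 2: run the costlier edit distance on the shortlist only.
    return sorted(shortlist,
                  key=lambda c: syllable_edit_distance(query, c))


if __name__ == "__main__":
    # Hypothetical syllabifications, for illustration only.
    words = [["ton", "clin"], ["hil", "la", "ry"], ["clin", "ton"]]
    print(rank_candidates(["clin", "ton"], words))
```

The shortlist step keeps the quadratic dynamic program off most of the word list, which is the cost concern raised in sentence 3.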

See all papers in Proc. ACL 2008 that mention edit distance.

Chinese-English

Appears in 5 sentences as: Chinese-English (5)
In Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
  1. We can’t get Chinese-English bilingual pages when the input is a Chinese query.
    Page 1, “Introduction*”
  2. We use syllables as translation units to build a statistical Chinese-English backward transliteration model in our system.
    Page 3, “Statistical Transliteration Model”
  3. Based on the above alignment method, we can get our statistical Chinese-English backward transliteration model as,
    Page 3, “Statistical Transliteration Model”
  4. Chinese-English backward transliteration has some differences from traditional translation.
    Page 3, “Statistical Transliteration Model”
  5. Ruling out those pairs which are not suitable for the research on Chinese-English backward transliteration, such as Chinese-Japanese, we select a training set which contains 14,443 pairs of Chinese-European & American person names.
    Page 6, “Statistical Transliteration Model”

language model

Appears in 4 sentences as: language model (4)
In Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
  1. The language model P(e) is trained from English texts.
    Page 3, “Statistical Transliteration Model”
  2. generative probability of an English syllable language model.
    Page 3, “Statistical Transliteration Model”
  3. 2) The language model in backward transliteration describes the relationship of syllables in words.
    Page 3, “Statistical Transliteration Model”
  4. It can’t work as well as the language model describing the word relationship in sentences.
    Page 3, “Statistical Transliteration Model”
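
To make the notion of a language model over syllables within a word concrete, here is a small Python sketch of a syllable bigram model that scores a candidate English word P(e) from its syllable sequence. The bigram order, the add-one smoothing, the boundary symbols, and the toy training words are assumptions chosen for illustration; the quoted sentences do not specify these details.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"  # assumed word-boundary symbols

Model = Tuple[Counter, Counter, int]


def train_bigram(words: List[Sequence[str]]) -> Model:
    """Count syllable bigrams inside each word (not across words)."""
    contexts: Counter = Counter()
    bigrams: Counter = Counter()
    for syllables in words:
        seq = [BOS, *syllables, EOS]
        contexts.update(seq[:-1])          # every symbol that has a successor
        bigrams.update(zip(seq, seq[1:]))  # adjacent syllable pairs
    # Vocabulary of symbols that can be predicted: syllables plus EOS.
    vocab = {s for syllables in words for s in syllables} | {EOS}
    return contexts, bigrams, len(vocab)


def log_prob(syllables: Sequence[str], model: Model) -> float:
    """Log P(e) under the bigram model with add-one smoothing."""
    contexts, bigrams, vocab_size = model
    seq = [BOS, *syllables, EOS]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(seq, seq[1:])
    )


if __name__ == "__main__":
    # Toy training data with hypothetical syllabifications.
    model = train_bigram([["clin", "ton"], ["ben", "ton"]])
    print(log_prob(["clin", "ton"], model))  # seen order
    print(log_prob(["ton", "clin"], model))  # unseen order, lower score
```

Because a name has only a few syllables, such a model sees far less context than a word-level model over sentences, which is consistent with the limitation noted in sentence 4.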

Name Entity

Appears in 4 sentences as: Name Entity (2) name entity (1) named entity (2)
In Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
  1. The task of Name Entity (NE) translation is to translate a name entity from a source language into a target language; it plays an important role in machine translation and cross-language information retrieval (CLIR).
    Page 1, “Introduction*”
  2. Because the process of named entity recognition may lose some NEs, we will reserve all the words in web corpus without any filtering.
    Page 4, “Statistical Transliteration Model”
  3. Then for every Tik, we use the named entity recognition (NER) software to determine whether rci is a NE or not.
    Page 5, “Statistical Transliteration Model”
  4. The training corpus for statistical transliteration model comes from the corpus of Chinese <-> English Name Entity Lists v 1.0 (LDC2005T34).
    Page 6, “Statistical Transliteration Model”
