Measure Word Generation for English-Chinese SMT Systems
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming

Article Structure

Abstract

Measure words in Chinese are used to indicate the count of nouns.

Introduction

In linguistics, measure words (MW) are words or morphemes used in combination with numerals or demonstrative pronouns to indicate the count of nouns, which are often referred to as head words (HW).

Our Method

2.1 Measure word generation in Chinese

Model Training and Application

3.1 Training

We parsed English and Chinese sentences to get training samples for the measure word generation model.

Experiments

4.1 Data

Related Work

Most existing rule-based English-to-Chinese MT systems have a dedicated module handling measure word generation.

Conclusion and Future Work

In this paper we propose a statistical model for measure word generation for English-to-Chinese SMT systems, in which contextual knowledge from both source and target sentences is involved.

Topics

SMT system

Appears in 23 sentences as: SMT system (14) SMT Systems (1) SMT systems (6) SMT system’s (2)
In Measure Word Generation for English-Chinese SMT Systems
  1. In this paper, we propose a statistical model to generate appropriate measure words of nouns for an English-to-Chinese SMT system.
    Page 1, “Abstract”
  2. Our model works as a postprocessing procedure over output of statistical machine translation systems, and can work with any SMT system.
    Page 1, “Abstract”
  3. English-Chinese SMT Systems
    Page 1, “Introduction”
  4. However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words.
    Page 2, “Introduction”
  5. Due to the limited size of bilingual corpora, many measure words, as well as the collocations between a measure and its head word, cannot be well covered by the phrase translation table in an SMT system.
    Page 2, “Introduction”
  6. To overcome the disadvantage of measure word generation in a general SMT system, this paper proposes a dedicated statistical model to generate measure words for English-to-Chinese translation.
    Page 2, “Introduction”
  7. Our method is performed as a postprocessing procedure of the output of SMT systems.
    Page 2, “Introduction”
  8. The advantage is that it can be easily integrated into any SMT system.
    Page 2, “Introduction”
  9. For those having English translations, such as “米” (meter), “吨” (ton), we just use the translation produced by the SMT system itself.
    Page 2, “Our Method”
  10. The model is applied to SMT system outputs as a postprocessing procedure.
    Page 3, “Our Method”
  11. Based on contextual information contained in both input source sentence and SMT system’s output translation, a measure word candidate set M is constructed.
    Page 3, “Our Method”
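The two-step procedure quoted above — build a candidate set M from source and target context, then select the best candidate — can be sketched as a weighted-feature argmax. The candidates, feature functions, and weights below are invented for illustration; in the paper the model parameters are learned from training data.

```python
def select_measure_word(candidates, features, weights):
    """Pick argmax_m sum_k w_k * f_k(m) over the candidate set M.
    The feature functions stand in for the paper's contextual features."""
    def score(m):
        return sum(weights[k] * f(m) for k, f in features.items())
    return max(candidates, key=score)

# Toy example: two hypothetical features voting between two candidates.
candidates = ["个", "条"]
features = {
    "lm":   lambda m: 0.8 if m == "个" else 0.3,  # stand-in LM score
    "coll": lambda m: 0.2 if m == "个" else 0.9,  # stand-in collocation score
}
weights = {"lm": 1.0, "coll": 2.0}
print(select_measure_word(candidates, features, weights))  # → 条
```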

See all papers in Proc. ACL 2008 that mention SMT system.

language model

Appears in 7 sentences as: language model (7) Language Modeling (1)
In Measure Word Generation for English-Chinese SMT Systems
  1. Moreover, Chinese measure words often have a long distance dependency to their head words which makes language model ineffective in selecting the correct measure words from the measure word candidate set.
    Page 2, “Introduction”
  2. In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation.
    Page 2, “Introduction”
  3. For target features, n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure
    Page 4, “Our Method”
  4. Target features: n-gram language model; MW-HW collocation; surrounding words; punctuation position. Source features: MW-HW collocation score; surrounding words; source head word; POS tags.
    Page 4, “Our Method”
  5. We used the SRI Language Modeling Toolkit (Stolcke, 2002) to train a five-gram model with modified Kneser-Ney smoothing (Chen and Goodman, 1998).
    Page 5, “Model Training and Application 3.1 Training”
  6. In the experiments, the language model is a Chinese 5-gram language model trained with the Chinese part of the LDC parallel corpus and the Xinhua part of the Chinese Gigaword corpus with about 27 million words.
    Page 5, “Experiments”
  7. In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, Hs denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
    Page 7, “Experiments”
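The target-side language model feature quoted above is the sum of log n-gram probabilities within a window. A minimal sketch, with a toy probability table and a fixed floor standing in for the real smoothed SRILM model:

```python
import math

def lm_window_score(tokens, ngram_logprob, n=3):
    """Sum of log n-gram probabilities over a token window.
    `ngram_logprob` maps n-gram tuples to log probabilities; unseen
    n-grams fall back to a small floor (a stand-in for real smoothing)."""
    floor = math.log(1e-6)
    return sum(ngram_logprob.get(tuple(tokens[i:i + n]), floor)
               for i in range(len(tokens) - n + 1))
```

A real system would use a five-gram model with modified Kneser-Ney smoothing, as the paper does.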

contextual information

Appears in 6 sentences as: contextual information (6)
In Measure Word Generation for English-Chinese SMT Systems
  1. We also compared the performance of our model based on different contextual information, and show that both large-scale monolingual data and parallel bilingual data can be helpful to generate correct measure words.
    Page 2, “Introduction”
  2. Based on contextual information contained in both input source sentence and SMT system’s output translation, a measure word candidate set M is constructed.
    Page 3, “Our Method”
  3. After obtaining the measure word candidate set M, a measure word selection model is employed to select the best one from M. Given the contextual information C in both source window and target
    Page 3, “Our Method”
  4. Then, the collocation between measure words and head words and their surrounding contextual information are extracted to train the measure word selection models.
    Page 4, “Model Training and Application 3.1 Training”
  5. Then, contextual information within the windows in the source and the target sentence is extracted and fed to the measure word selection model.
    Page 5, “Model Training and Application 3.1 Training”
  6. We do not integrate our measure word generation module into the SMT decoder since little target contextual information is available during SMT decoding.
    Page 5, “Model Training and Application 3.1 Training”

machine translation

Appears in 6 sentences as: machine translation (6)
In Measure Word Generation for English-Chinese SMT Systems
  1. Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
    Page 1, “Abstract”
  2. Our model works as a postprocessing procedure over output of statistical machine translation systems, and can work with any SMT system.
    Page 1, “Abstract”
  3. According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
    Page 1, “Introduction”
  4. Therefore, in the English-to-Chinese machine translation task we need to take additional efforts to generate the missing measure words in Chinese.
    Page 1, “Introduction”
  5. In most statistical machine translation (SMT) models (Och et al., 2004; Koehn et al., 2003; Chiang, 2005), some of the measure words can be generated without modification or additional processing.
    Page 1, “Introduction”
  6. We also compared our method with a well-known rule-based machine translation system —SYSTRAN3.
    Page 8, “Experiments”

precision and recall

Appears in 6 sentences as: precision and recall (6)
In Measure Word Generation for English-Chinese SMT Systems
  1. Experimental results show our method can achieve high precision and recall in measure word generation.
    Page 1, “Abstract”
  2. Table 3 and Table 4 show the precision and recall of our measure word generation method.
    Page 5, “Experiments”
  3. In addition to precision and recall, we also evaluate the Bleu score (Papineni et al., 2002) changes before and after applying our measure word generation method to the SMT output.
    Page 6, “Experiments”
  4. Table 6 and Table 7 show the precision and recall when using different features.
    Page 7, “Experiments”
  5. The precision and recall are 63.82% and 51.09% respectively, which are also lower than our method.
    Page 8, “Experiments”
  6. Experimental results show that our method not only achieves high precision and recall for generating measure words, but also improves the quality of English-to-Chinese SMT systems.
    Page 8, “Conclusion and Future Work”
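The paper does not spell out its exact precision/recall definitions, but a natural reading — precision over all generated measure words, recall over all gold measure words — can be sketched as:

```python
def precision_recall(predictions, references):
    """`predictions` maps a test-case id to the generated measure word
    (None when the system generates nothing); `references` maps the same
    ids to the gold measure word. Both structures are illustrative."""
    generated = {k: v for k, v in predictions.items() if v is not None}
    correct = sum(1 for k, v in generated.items() if references.get(k) == v)
    precision = correct / len(generated) if generated else 0.0
    recall = correct / len(references) if references else 0.0
    return precision, recall
```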

parse tree

Appears in 5 sentences as: parse tree (3) parse trees (2)
In Measure Word Generation for English-Chinese SMT Systems
  1. The source head word feature is defined to be a function f_1 to indicate whether a word e_i is the source head word in English according to a parse tree of the source sentence.
    Page 4, “Our Method”
  2. Based on the source syntax parse tree, for each measure word, we identified its head word by using a toolkit from (Chiang and Bikel, 2002) which can heuristically identify head words for sub-trees.
    Page 4, “Model Training and Application 3.1 Training”
  3. might be incorrect due to errors in English parse trees.
    Page 7, “Experiments”
  4. Given a source sentence, the corresponding syntax parse tree T_S is first constructed with an English parser.
    Page 7, “Experiments”
  5. The other problem comes from the English head word selection error introduced by using source parse trees.
    Page 8, “Experiments”
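The head-word identification above relies on the head-percolation toolkit of Chiang and Bikel (2002). A toy version of the idea — per-label rules giving a search direction and a priority list over child labels — might look like this; the rule table and tree encoding are simplified assumptions, not the toolkit's actual tables.

```python
# Hypothetical head rules: (search direction, child-label priority list).
HEAD_RULES = {
    "NP": ("right", ["NN", "NNS", "NP"]),
    "PP": ("left",  ["IN", "TO"]),
}

def head_word(tree):
    """tree = (label, [children]) for internal nodes, (tag, word) for leaves."""
    label, kids = tree
    if isinstance(kids, str):              # leaf: (POS tag, word)
        return kids
    direction, priority = HEAD_RULES.get(label, ("right", []))
    order = kids if direction == "left" else list(reversed(kids))
    for wanted in priority:                # first matching label wins
        for child in order:
            if child[0] == wanted:
                return head_word(child)
    return head_word(order[0])             # fallback: first child in search order
```

For example, `head_word(("NP", [("DT", "three"), ("NN", "books")]))` picks "books" as the head of the noun phrase.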

data sparseness

Appears in 4 sentences as: data sparseness (4)
In Measure Word Generation for English-Chinese SMT Systems
  1. Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
    Page 1, “Abstract”
  2. However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words.
    Page 2, “Introduction”
  3. Compared with the baseline, the Mo-ME method takes advantage of a large monolingual training corpus and reduces the data sparseness problem.
    Page 6, “Experiments”
  4. One problem is data sparseness with respect to collocations be-
    Page 7, “Experiments”

Lm

Appears in 4 sentences as: Lm (4)
In Measure Word Generation for English-Chinese SMT Systems
  1. In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, Hs denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
    Page 7, “Experiments”
  2. Feature setting   Precision   Recall
     Baseline          54.82%      45.61%
     Lm                51.11%      41.24%
     +Tmh              61.43%      49.22%
     +Punc             62.54%      50.08%
     +Tlex             64.80%      51.87%
    Page 7, “Experiments”
  3. Feature setting   Precision   Recall
     Baseline          54.82%      45.61%
     Lm                51.11%      41.24%
     +Tmh+Smh          64.50%      51.64%
     +Hs               65.32%      52.26%
     +Punc             66.29%      53.10%
     +Pos              66.53%      53.25%
     +Tlex             67.50%      54.02%
     +Slex             69.52%      55.54%
    Page 7, “Experiments”
  4. The method with only the Lm feature performs worse than the baseline.
    Page 7, “Experiments”
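The eight features above (Lm, Tmh, Smh, Hs, Punc, Pos, Tlex, Slex) can be pictured as one feature vector per candidate measure word. Everything below — the context dictionaries, the collocation tables, the stand-in LM score — is a hypothetical simplification of what the paper extracts from the parser, alignments, and corpora.

```python
def extract_features(mw, src_ctx, tgt_ctx):
    """Assemble one feature vector for a candidate measure word `mw`.
    Field names and extractors are illustrative stand-ins."""
    return {
        "Lm":   tgt_ctx["lm_score"](mw),                              # n-gram LM score
        "Tmh":  tgt_ctx["colloc"].get((mw, tgt_ctx["head"]), 0.0),    # target MW-HW collocation
        "Smh":  src_ctx["colloc"].get((mw, src_ctx["head"]), 0.0),    # source MW-HW collocation
        "Hs":   1.0 if src_ctx["head"] in src_ctx["words"] else 0.0,  # source head word selected
        "Punc": 1.0 if tgt_ctx["punct_adjacent"] else 0.0,            # target punctuation position
        "Pos":  1.0 if src_ctx["head_pos"] == "NN" else 0.0,          # POS of source head
        "Tlex": sum(1.0 for w in tgt_ctx["window"]
                    if (mw, w) in tgt_ctx["colloc"]),                 # target surrounding words
        "Slex": sum(1.0 for w in src_ctx["window"]
                    if (mw, w) in src_ctx["colloc"]),                 # source surrounding words
    }

# Toy contexts for an English "three books" / Chinese "三 本 书" case.
src_ctx = {"head": "books", "head_pos": "NN", "words": ["three", "books"],
           "window": ["three", "books"], "colloc": {("本", "books"): 1.5}}
tgt_ctx = {"head": "书", "window": ["三", "书"], "punct_adjacent": False,
           "colloc": {("本", "书"): 2.0}, "lm_score": lambda m: -3.2}
feats = extract_features("本", src_ctx, tgt_ctx)
```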

Treebank

Appears in 4 sentences as: Treebank (4)
In Measure Word Generation for English-Chinese SMT Systems
  1. According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
    Page 1, “Introduction”
  2. Table 1 shows the relative position’s distribution of head words around measure words in the Chinese Penn Treebank, where a negative position indicates that the head word is to the left of the measure word and a positive position indicates that the head word is to the right of the measure word.
    Page 2, “Introduction”
  3. According to our survey, about 70.4% of measure words in the Chinese Penn Treebank need
    Page 2, “Our Method”
  4. The training corpus for Mo-ME model consists of the Chinese Penn Treebank and the Chinese part of the LDC parallel corpus with about 2 million sentences.
    Page 5, “Experiments”

n-gram

Appears in 4 sentences as: n-gram (5)
In Measure Word Generation for English-Chinese SMT Systems
  1. In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation.
    Page 2, “Introduction”
  2. For target features, n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure
    Page 4, “Our Method”
  3. Target features: n-gram language model; MW-HW collocation; surrounding words; punctuation position. Source features: MW-HW collocation score; surrounding words; source head word; POS tags.
    Page 4, “Our Method”
  4. In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, Hs denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
    Page 7, “Experiments”

rule-based

Appears in 4 sentences as: rule-based (4)
In Measure Word Generation for English-Chinese SMT Systems
  1. In this section we compare our statistical methods with the preprocessing method and the rule-based methods for measure word generation in a translation task.
    Page 7, “Experiments”
  2. We also compared our method with a well-known rule-based machine translation system —SYSTRAN3.
    Page 8, “Experiments”
  3. Most existing rule-based English-to-Chinese MT systems have a dedicated module handling measure word generation.
    Page 8, “Related Work”
  4. In general a rule-based method uses manually constructed rule patterns to predict measure words.
    Page 8, “Related Work”

sentence pairs

Appears in 4 sentences as: sentence pair (1) sentence pairs (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair.
    Page 5, “Model Training and Application 3.1 Training”
  2. We extracted both development and test data set from years of NIST Chinese-to-English evaluation data by filtering out sentence pairs not containing measure words.
    Page 5, “Experiments”
  3. The development set is extracted from NIST evaluation data from 2002 to 2004, and the test set consists of sentence pairs from NIST evaluation data from 2005 to 2006.
    Page 5, “Experiments”
  4. There are 759 testing cases for measure word generation in our test data consisting of 2746 sentence pairs.
    Page 5, “Experiments”

word alignment

Appears in 3 sentences as: word alignment (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. For the bilingual corpus, we also perform word alignment to get correspondences between source and target words.
    Page 4, “Model Training and Application 3.1 Training”
  2. According to word alignment results, we classify
    Page 4, “Model Training and Application 3.1 Training”
  3. We ran GIZA++ (Och and Ney, 2000) on the training corpus in both directions with IBM model 4, and then applied the refinement rule described in (Koehn et al., 2003) to obtain a many-to-many word alignment for each sentence pair.
    Page 5, “Model Training and Application 3.1 Training”
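The refinement rule of Koehn et al. (2003) is the grow-diag heuristic over the two directional GIZA++ alignments. The sketch below keeps only its core idea — start from the intersection, then grow with union points adjacent (including diagonally) to accepted points — and omits the coverage checks and final steps of the real heuristic.

```python
def symmetrize(src2tgt, tgt2src):
    """Simplified grow-style symmetrization of two directional word
    alignments, each a set of (source_index, target_index) pairs."""
    aligned = set(src2tgt & tgt2src)        # start from the intersection
    union = src2tgt | tgt2src
    added = True
    while added:                            # grow until no point can be added
        added = False
        for (i, j) in sorted(union - aligned):
            if any((i + di, j + dj) in aligned
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)):
                aligned.add((i, j))
                added = True
    return aligned
```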

Bleu

Appears in 3 sentences as: Bleu (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. In addition to precision and recall, we also evaluate the Bleu score (Papineni et al., 2002) changes before and after applying our measure word generation method to the SMT output.
    Page 6, “Experiments”
  2. For our test data, we only consider sentences containing measure words for Bleu score evaluation.
    Page 6, “Experiments”
  3. Our measure word generation step leads to a Bleu score improvement of 0.32 where the window size is set to 10, which shows that it can improve the translation quality of an English-to-Chinese SMT system.
    Page 6, “Experiments”
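The Bleu score (Papineni et al., 2002) is the geometric mean of modified n-gram precisions times a brevity penalty. A minimal single-reference, sentence-level sketch — the paper's evaluation is corpus-level, and real implementations smooth zero counts — is:

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Single-reference sentence-level BLEU over token lists; returns 0.0
    as soon as any n-gram order has no match instead of smoothing."""
    def ngrams(toks, n):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        match = sum(min(c, ref[g]) for g, c in cand.items())  # clipped counts
        total = sum(cand.values())
        if total == 0 or match == 0:
            return 0.0
        log_prec += math.log(match / total) / max_n
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_prec)
```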

statistical machine translation

Appears in 3 sentences as: statistical machine translation (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
    Page 1, “Abstract”
  2. Our model works as a postprocessing procedure over output of statistical machine translation systems, and can work with any SMT system.
    Page 1, “Abstract”
  3. In most statistical machine translation (SMT) models (Och et al., 2004; Koehn et al., 2003; Chiang, 2005), some of the measure words can be generated without modification or additional processing.
    Page 1, “Introduction”

Penn Treebank

Appears in 3 sentences as: Penn Treebank (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
    Page 1, “Introduction”
  2. Table 1 shows the relative position’s distribution of head words around measure words in the Chinese Penn Treebank, where a negative position indicates that the head word is to the left of the measure word and a positive position indicates that the head word is to the right of the measure word.
    Page 2, “Introduction”
  3. According to our survey, about 70.4% of measure words in the Chinese Penn Treebank need
    Page 2, “Our Method”

model training

Appears in 3 sentences as: model trained (1) model training (2)
In Measure Word Generation for English-Chinese SMT Systems
  1. In the experiments, the language model is a Chinese 5-gram language model trained with the Chinese part of the LDC parallel corpus and the Xinhua part of the Chinese Gigaword corpus with about 27 million words.
    Page 5, “Experiments”
  2. The Bi-ME model is trained with FBIS corpus, whose size is smaller than that used in Mo-ME model training.
    Page 5, “Experiments”
  3. We can see that the Bi-ME model can achieve better results than the Mo-ME model in both recall and precision metrics although only a small sized bilingual corpus is used for Bi-ME model training.
    Page 6, “Experiments”

Bleu score

Appears in 3 sentences as: Bleu score (3)
In Measure Word Generation for English-Chinese SMT Systems
  1. In addition to precision and recall, we also evaluate the Bleu score (Papineni et al., 2002) changes before and after applying our measure word generation method to the SMT output.
    Page 6, “Experiments”
  2. For our test data, we only consider sentences containing measure words for Bleu score evaluation.
    Page 6, “Experiments”
  3. Our measure word generation step leads to a Bleu score improvement of 0.32 where the window size is set to 10, which shows that it can improve the translation quality of an English-to-Chinese SMT system.
    Page 6, “Experiments”
