Pseudo-Word for Phrase-Based Machine Translation
Duan, Xiangyu and Zhang, Min and Li, Haizhou

Article Structure

Abstract

The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus.

Introduction

The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and ends with a model-weight tuning step.

Searching for Pseudo-words

The pseudo-word searching problem is equivalent to decomposing a given sentence into pseudo-words.

Experiments and Results

In our experiments, pseudo-words are fed into the PB-SMT pipeline.

Conclusion

We have presented the pseudo-word as a novel translational unit for phrase-based machine translation.

Topics

word alignments

Appears in 18 sentences as: word aligned (2) word aligner (4) word alignment (4) word alignments (9)
In Pseudo-Word for Phrase-Based Machine Translation
  1. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
    Page 1, “Abstract”
  2. But word appears to be too fine-grained in some cases such as non-compositional phrasal equivalences, where no clear word alignments exist.
    Page 1, “Abstract”
  3. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
    Page 1, “Introduction”
  4. But there is a deficiency in such manner that word is too fine-grained in some cases such as non-compositional phrasal equivalences, where clear word alignments do not exist.
    Page 1, “Introduction”
  5. No clear word alignments
    Page 1, “Introduction”
  6. Marcu and Wong (2002) attempted to directly learn phrasal alignments instead of word alignments.
    Page 1, “Introduction”
  7. (2004) used word aligner to build a Chinese dictionary to re-segment Chinese sentence.
    Page 1, “Introduction”
  8. Ma and Way (2009) tackle this problem by using word aligner to bootstrap bilingual segmentation suitable for machine translation.
    Page 1, “Introduction”
  9. pressions by monotonically segmenting a given Spanish-English sentence pair into bilingual units, where word aligner is also used.
    Page 2, “Introduction”
  10. This paper focuses on determining the basic translational units on both language sides without using word aligner before feeding them into PB-SMT pipeline.
    Page 2, “Introduction”
  11. Then we apply word alignment techniques to build pseudo-word alignments.
    Page 3, “Searching for Pseudo-words”
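
The word-based alignment models the excerpts above refer to (Brown et al., 1993) can be illustrated with a minimal EM sketch of IBM Model 1; the function name and toy data are illustrative, not the paper's setup or code.

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 translation probabilities t(f|e)
    (Brown et al., 1993). `pairs` holds (source_words, target_words)."""
    # Initialize t(f|e) uniformly over co-occurring word pairs.
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(float)
    for fs, es in pairs:
        for f in fs:
            for e in es:
                t[(f, e)] = 1.0 / len(f_vocab)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:           # M-step: renormalize per target word
            t[(f, e)] = count[(f, e)] / total[e]
    return t
```

On a few toy pairs such as ("das haus", "the house") the probability mass concentrates on the correct lexical correspondences after a handful of iterations, which is exactly the behavior GIZA++-style aligners build on.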

sentence pair

Appears in 12 sentences as: sentence pair (12) sentence pairs (1)
  1. But computational complexity is prohibitively high for the exponentially large number of decompositions of a sentence pair into phrase pairs.
    Page 1, “Introduction”
  2. pressions by monotonically segmenting a given Spanish-English sentence pair into bilingual units, where word aligner is also used.
    Page 2, “Introduction”
  3. X and Y are sentence pair and multi-word pairs respectively in this bilingual scenario.
    Page 2, “Searching for Pseudo-words”
  4. Pseudo-word pairs of one sentence pair are such pairs that maximize the sum of the span-pairs’ bilingual sequence significances: pwp_1^K = argmax_{span-pair_1^K} Σ_{k=1}^{K} Sig_{span-pair_k}   (6)
    Page 3, “Searching for Pseudo-words”
  5. Searching for pseudo-word pairs pwp_1^K is equal to bilingual segmentation of a sentence pair into optimal span-pair_1^K.
    Page 3, “Searching for Pseudo-words”
  6. Maximal sum of bilingual sequence significances for one sentence pair is guaranteed through this bottom-up way, and the optimal decomposition of the sentence pair is obtained correspondingly.
    Page 4, “Searching for Pseudo-words”
  7. In fact, SSP requires that all parts of a sentence pair should be aligned.
    Page 4, “Searching for Pseudo-words”
  8. Since we limit excluded parts in the middle of a span-pair, the algorithm will end without considering boundary parts of a sentence pair as NULL alignments.
    Page 5, “Searching for Pseudo-words”
  9. Statistics of corpora, “Ch” denotes Chinese, “En” denotes English, “Sent.” row is the number of sentence pairs, “word” row is the number of words,
    Page 5, “Experiments and Results”
  10. Figure 5 illustrates examples of pseudo-words of one Chinese-to-English sentence pair.
    Page 6, “Experiments and Results”
  11. SSP has a strong constraint that all parts of a sentence pair should be aligned, so source sentence and target sentence have same length after merging words into
    Page 6, “Experiments and Results”
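
Items 4-6 describe a bottom-up search for the decomposition that maximizes summed sequence significance. A monolingual dynamic-programming sketch of that idea follows; the `sig` scoring function and the small lexicon are made-up stand-ins for the paper's bilingual sequence significance.

```python
from functools import lru_cache

def best_segmentation(words, significance, max_len=4):
    """DP search: split `words` into segments (candidate pseudo-words)
    maximizing the sum of per-segment significance scores."""
    n = len(words)

    @lru_cache(maxsize=None)
    def solve(i):
        # Best (score, segmentation) for the suffix words[i:].
        if i == n:
            return 0.0, ()
        best = (float("-inf"), ())
        for j in range(i + 1, min(i + max_len, n) + 1):
            seg = tuple(words[i:j])
            tail_score, tail = solve(j)
            cand = (significance(seg) + tail_score, (seg,) + tail)
            if cand[0] > best[0]:
                best = cand
        return best

    score, segs = solve(0)
    return score, [list(s) for s in segs]

# Toy significance: known multi-word expressions score high,
# single words score 1.0, other multi-word spans score 0.
LEXICON = {("new", "york"): 3.0, ("machine", "translation"): 2.5}

def sig(seg):
    return LEXICON.get(seg, 1.0 if len(seg) == 1 else 0.0)

score, segs = best_segmentation(
    ["we", "study", "machine", "translation", "in", "new", "york"], sig)
```

The search keeps known multi-word expressions together as single segments and falls back to unary words elsewhere, which mirrors how pseudo-word decomposition is meant to behave on one language side.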

language model

Appears in 11 sentences as: language model (12) Language Modeling (1)
  1. Further experiments of removing the power of higher order language model and longer max phrase length, which are inherent in pseudo-words, show that pseudo-words still improve translational performance significantly over unary words.
    Page 2, “Introduction”
  2. The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, uses Moses (Koehn et al., 2007) as phrase-based decoder, uses the SRI Language Modeling Toolkit to train language model with modified Kneser-Ney smoothing (Kneser and Ney 1995; Chen and Goodman 1998).
    Page 5, “Experiments and Results”
  3. A 5-gram language model is trained on English side of parallel corpus.
    Page 6, “Experiments and Results”
  4. Xinhua portion of the English Gigaword3 corpus is used together with English side of large corpus to train a 4-gram language model.
    Page 6, “Experiments and Results”
  5. The setting of language model order for each corpus is not changed.
    Page 6, “Experiments and Results”
  6. Because pseudo-word is a kind of multi-word expression, it has inborn advantage of higher language model order and longer max phrase length over unary word.
    Page 6, “Experiments and Results”
  7. The advantage of longer max phrase length is removed during phrase extraction, and the advantage of higher order of language model is also removed during decoding since we use language model trained on unary words.
    Page 6, “Experiments and Results”
  8. pseudo-word itself as basic translational unit, does not rely very much on higher language model order or longer max phrase length setting.
    Page 7, “Experiments and Results”
  9. It shows that the improvement derives from pseudo-word itself as basic translational unit, does not rely very much on higher language model order or longer max phrase length setting.
    Page 8, “Experiments and Results”
  10. In fact, slight improvement in pwchpwen and pwchwen is seen after pseudo-word unpacking, which indicates that higher language model order and longer max phrase length impact the performance in these two configurations.
    Page 8, “Experiments and Results”
  11. Removing the power of higher order language model and longer max phrase length, which are inherent in pseudo-words, shows that pseudo-words still improve translational performance significantly over unary words.
    Page 8, “Conclusion”

BLEU

Appears in 10 sentences as: BLEU (11)
  1. Statistical significance in BLEU score differences was tested by paired bootstrap re-sampling (Koehn, 2004).
    Page 6, “Experiments and Results”
  2. BLEU 0.4029 / 0.3146; NIST 7.0419 / 8.8462; METEOR 0.5785 / 0.5335
    Page 6, “Experiments and Results”
  3. Both SMP and ESSP outperform baseline consistently in BLEU , NIST and METEOR.
    Page 6, “Experiments and Results”
  4. Best ESSP (wchpwen) is significantly better than baseline (p<0.01) in BLEU score, best SMP (wchpwen) is significantly better than baseline (p<0.05) in BLEU score.
    Page 6, “Experiments and Results”
  5. BLEU 0.4097 / 0.4182 / 0.4031 / 0.4029; NIST 7.5547 / 7.2893 / 7.2670 / 7.0419; METEOR 0.5951 / 0.5874 / 0.5846 / 0.5785
    Page 7, “Experiments and Results”
  6. wchpwen is significantly better than baseline (p<0.04) in BLEU score.
    Page 7, “Experiments and Results”
  7. BLEU 0.3185 / 0.3230 / 0.3166 / 0.3146; NIST 8.9216 / 9.0447 / 8.9210 / 8.8462; METEOR 0.5402 / 0.5489 / 0.5435 / 0.5335
    Page 7, “Experiments and Results”
  8. wchpwen attains the best BLEU among the three configurations, and is significantly better than baseline (p<0.03).
    Page 7, “Experiments and Results”
  9. BLEU 0.3219 / 0.3192 / 0.3187 / 0.3146; NIST 8.9458 / 8.9325 / 8.9801 / 8.8462; METEOR 0.5429 / 0.5424 / 0.5411 / 0.5335
    Page 8, “Experiments and Results”
  10. The experimental results show that English chunking performs far below baseline, usually 8 absolute BLEU points below.
    Page 8, “Experiments and Results”
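
The significance test named in item 1 (paired bootstrap resampling; Koehn, 2004) can be sketched as below. The `metric` argument is any corpus-level scorer (corpus BLEU in the paper); the 0.95 threshold in the comment reflects common practice, not this paper's exact procedure.

```python
import random

def paired_bootstrap(metric, sys_a, sys_b, refs, n_samples=1000, seed=0):
    """Koehn (2004): resample the test set with replacement many times
    and count how often system A scores above system B under `metric`."""
    rng = random.Random(seed)
    n = len(refs)
    wins = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        score_a = metric([sys_a[i] for i in idx], [refs[i] for i in idx])
        score_b = metric([sys_b[i] for i in idx], [refs[i] for i in idx])
        if score_a > score_b:
            wins += 1
    # Fraction of resampled test sets on which A beats B;
    # >= 0.95 is commonly read as "A better at p < 0.05".
    return wins / n_samples
```

Because the same resampled indices are used for both systems, sentence-level difficulty cancels out, which is what makes the test *paired*.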

machine translation

Appears in 9 sentences as: Machine Translation (2) machine translation (7) machine translational (1)
  1. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
    Page 1, “Abstract”
  2. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
    Page 1, “Introduction”
  3. Some researchers have explored coarse-grained translational unit for machine translation.
    Page 1, “Introduction”
  4. (2008) used a Bayesian semi-supervised method that combines Chinese word segmentation model and Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation.
    Page 1, “Introduction”
  5. There are also researches focusing on the impact of various segmentation tools on machine translation (Ma et al.
    Page 1, “Introduction”
  6. Ma and Way (2009) tackle this problem by using word aligner to bootstrap bilingual segmentation suitable for machine translation.
    Page 1, “Introduction”
  7. We conduct experiments on Chinese-to-English machine translation.
    Page 5, “Experiments and Results”
  8. We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation.
    Page 8, “Conclusion”
  9. Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain.
    Page 8, “Conclusion”

NIST

Appears in 6 sentences as: NIST (6)
  1. Additionally, NIST score (Doddington, 2002) and METEOR (Banerjee and Lavie, 2005) are also used to check the consistency of experimental results.
    Page 6, “Experiments and Results”
  2. BLEU 0.4029 / 0.3146; NIST 7.0419 / 8.8462; METEOR 0.5785 / 0.5335
    Page 6, “Experiments and Results”
  3. Both SMP and ESSP outperform baseline consistently in BLEU, NIST and METEOR.
    Page 6, “Experiments and Results”
  4. BLEU 0.4097 / 0.4182 / 0.4031 / 0.4029; NIST 7.5547 / 7.2893 / 7.2670 / 7.0419; METEOR 0.5951 / 0.5874 / 0.5846 / 0.5785
    Page 7, “Experiments and Results”
  5. BLEU 0.3185 / 0.3230 / 0.3166 / 0.3146; NIST 8.9216 / 9.0447 / 8.9210 / 8.8462; METEOR 0.5402 / 0.5489 / 0.5435 / 0.5335
    Page 7, “Experiments and Results”
  6. BLEU 0.3219 / 0.3192 / 0.3187 / 0.3146; NIST 8.9458 / 8.9325 / 8.9801 / 8.8462; METEOR 0.5429 / 0.5424 / 0.5411 / 0.5335
    Page 8, “Experiments and Results”

phrase-based

Appears in 6 sentences as: Phrase-Based (2) phrase-based (4)
  1. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
    Page 1, “Abstract”
  2. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
    Page 1, “Introduction”
  3. The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, uses Moses (Koehn et al., 2007) as phrase-based decoder, uses the SRI Language Modeling Toolkit to train language model with modified Kneser-Ney smoothing (Kneser and Ney 1995; Chen and Goodman 1998).
    Page 5, “Experiments and Results”
  4. We use GIZA++ model 4 for word alignment, use Moses for phrase-based decoding.
    Page 6, “Experiments and Results”
  5. We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation.
    Page 8, “Conclusion”
  6. Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain.
    Page 8, “Conclusion”

Chinese word

Appears in 5 sentences as: Chinese word (5)
  1. In Chinese-to-English translation task where Chinese word boundaries are not marked, Xu et al.
    Page 1, “Introduction”
  2. (2008) used a Bayesian semi-supervised method that combines Chinese word segmentation model and Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation.
    Page 1, “Introduction”
  3. 2009), only focusing on Chinese word as basic translational unit is not adequate to model 1-to-n translations.
    Page 1, “Introduction”
  4. It seems that Chinese word prefers to have English pseudo-word equivalence which has more than or equal to one word.
    Page 6, “Experiments and Results”
  5. Similar to performances on small corpus, wchpwen always performs better than the other two cases, which indicates that Chinese word prefers to have English pseudo-word equivalence which has more than or equal to one word.
    Page 7, “Experiments and Results”

parallel corpus

Appears in 4 sentences as: parallel corpus (4)
  1. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
    Page 1, “Abstract”
  2. The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
    Page 1, “Introduction”
  3. A 5-gram language model is trained on English side of parallel corpus.
    Page 6, “Experiments and Results”
  4. SSP puts overly strong alignment constraints on parallel corpus, which impacts performance dramatically.
    Page 6, “Experiments and Results”

BLEU score

Appears in 3 sentences as: BLEU score (4)
  1. Statistical significance in BLEU score differences was tested by paired bootstrap re-sampling (Koehn, 2004).
    Page 6, “Experiments and Results”
  2. Best ESSP (wchpwen) is significantly better than baseline (p<0.01) in BLEU score, best SMP (wchpwen) is significantly better than baseline (p<0.05) in BLEU score.
    Page 6, “Experiments and Results”
  3. wchpwen is significantly better than baseline (p<0.04) in BLEU score.
    Page 7, “Experiments and Results”

fine-grained

Appears in 3 sentences as: fine-grained (3)
  1. But word appears to be too fine-grained in some cases such as non-compositional phrasal equivalences, where no clear word alignments exist.
    Page 1, “Abstract”
  2. But there is a deficiency in such manner that word is too fine-grained in some cases such as non-compositional phrasal equivalences, where clear word alignments do not exist.
    Page 1, “Introduction”
  3. It is proposed to replace too fine-grained word as basic translational unit.
    Page 8, “Conclusion”

Gold standard

Appears in 3 sentences as: Gold standard (2) gold standard (1)
  1. Gold standard word alignments are shown at the bottom of figure 5.
    Page 6, “Experiments and Results”
  2. Gold standard word alignments
    Page 7, “Experiments and Results”
  3. Outputs of the three algorithms ESSP, SMP and SSP on one sentence pair and gold standard word alignments.
    Page 7, “Experiments and Results”
