A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
Shen, Libin and Xu, Jinxi and Weischedel, Ralph

Article Structure

Abstract

In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.

Introduction

In recent years, hierarchical methods have been successfully applied to Statistical Machine Translation (Graehl and Knight, 2004; Chiang, 2005; Ding and Palmer, 2005; Quirk et al., 2005).

String-to-Dependency Translation

2.1 Transfer Rules with Well-Formed Dependency Structures

Dependency Language Model

For the dependency tree in Figure 1, we calculate the probability of the tree as follows
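The excerpt ends where the equation begins. As a hedged sketch, not the paper's verbatim equation: a head-outward dependency LM of this kind scores the Figure 1 tree for "the boy will find it interesting" (head: find; left dependents: boy, will; right dependents: it, interesting; the attached under boy) roughly as

```latex
% Sketch of a head-outward trigram dependency LM factorization.
% The exact conditioning contexts (and any P_T top-probability terms,
% omitted here) are assumptions; the "-as-head" marker notation is
% attested in the paper's Dependency Language Model section.
P(T) \approx\; P_L(\text{will} \mid \text{find-as-head})
  \cdot P_L(\text{boy} \mid \text{will},\ \text{find-as-head})
  \cdot P_L(\text{the} \mid \text{boy-as-head})
  \cdot P_R(\text{it} \mid \text{find-as-head})
  \cdot P_R(\text{interesting} \mid \text{it},\ \text{find-as-head})
```

Each left dependent is generated outward from the head, conditioned on the previously generated sibling and the head marker, and symmetrically on the right.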

Implementation Details

Features

Experiments

We carried out experiments on three models.

Discussion

The well-formed dependency structures defined here are similar to the data structures in previous work on monolingual parsing (Eisner and Satta, 1999; McDonald et al., 2005).

Conclusions and Future Work

In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.

Topics

dependency tree

Appears in 18 sentences as: dependency tree (8) Dependency Trees (1) Dependency trees (2) dependency trees (7)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. Figure 1: The dependency tree for sentence the boy will find it interesting
    Page 2, “Introduction”
  2. 1.2 Dependency Trees
    Page 2, “Introduction”
  3. Dependency trees reveal long-distance relations between words.
    Page 2, “Introduction”
  4. Figure 1 shows an example of a dependency tree.
    Page 2, “Introduction”
  5. Dependency trees are simpler in form than CFG trees since there are no constituent labels.
    Page 2, “Introduction”
  6. As such, dependency trees are a desirable prior model of the target sentence.
    Page 2, “Introduction”
  7. In order to take advantage of dynamic programming, we fixed the positions onto which another tree could be attached by specifying NTs in dependency trees.
    Page 2, “Introduction”
  8. The use of dependency structures is due to the flexibility of dependency trees as a representation method which does not rely on constituents (Fox, 2002; Ding and Palmer, 2005; Quirk et al., 2005).
    Page 2, “Introduction”
  9. In one kind, we keep dependency trees with a sub-root, where all the children of the sub-root are complete.
    Page 3, “String-to-Dependency Translation”
  10. Figure 5: A dependency tree with flexible combination
    Page 4, “String-to-Dependency Translation”
  11. Figure 1 shows a traditional dependency tree.
    Page 4, “String-to-Dependency Translation”


BLEU

Appears in 11 sentences as: BLEU (12)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. Our experiments show that the string-to-dependency decoder achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER compared to a standard hierarchical string-to-string system on the NIST 04 Chinese-English evaluation set.
    Page 1, “Abstract”
  2. For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system.
    Page 1, “Introduction”
  3. Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set.
    Page 1, “Introduction”
  4. All models are tuned on BLEU (Papineni et al., 2001), and evaluated on both BLEU and Translation Error Rate (TER) (Snover et al., 2006) so that we could detect over-tuning on one metric.
    Page 7, “Experiments”
  5.                         BLEU%           TER%
                             lower   mixed   lower   mixed
     Decoding (3-gram LM)
       baseline              38.18   35.77   58.91   56.60
       filtered              37.92   35.48   57.80   55.43
       str-dep               39.52   37.25   56.27   54.07
     Rescoring (5-gram LM)
       baseline              40.53   38.26   56.35   54.15
       filtered              40.49   38.26   55.57   53.47
       str-dep               41.60   39.47   55.06   52.96
    Page 8, “Experiments”
  6. Table 2: BLEU and TER scores on the test set.
    Page 8, “Experiments”
  7. Table 2 shows the BLEU and TER scores on MT04.
    Page 8, “Experiments”
  8. On decoding output, the string-to-dependency system achieved 1.48 point improvement in BLEU and 2.53 point improvement in TER compared to the baseline hierarchical string-to-string system.
    Page 8, “Experiments”
  9. After 5-gram rescoring, it achieved 1.21 point improvement in BLEU and 1.19 point improvement in TER.
    Page 8, “Experiments”
  10. The filtered model does not show improvement on BLEU.
    Page 8, “Experiments”
  11. Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set.
    Page 8, “Conclusions and Future Work”


language model

Appears in 11 sentences as: Language model (1) language model (11) language models (1)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. With this new framework, we employ a target dependency language model during decoding to exploit long-distance word relations, which are unavailable with a traditional n-gram language model.
    Page 1, “Abstract”
  2. language model during decoding, in order to exploit long-distance word relations which are unavailable with a traditional n-gram language model on target strings.
    Page 1, “Introduction”
  3. Section 3 illustrates the use of dependency language models.
    Page 1, “Introduction”
  4. Formal definitions also allow us to easily extend the framework to incorporate a dependency language model in decoding.
    Page 3, “String-to-Dependency Translation”
  5. Supposing we use a traditional trigram language model in decoding, we need to specify the leftmost two words and the rightmost two words in a state (see the sketch after this list).
    Page 6, “String-to-Dependency Translation”
  6. In the next section, we will explain how to extend categories and states to exploit a dependency language model during decoding.
    Page 6, “String-to-Dependency Translation”
  7. w_h-as-head represents w_h used as the head, and it is different from w_h in the dependency language model.
    Page 6, “Dependency Language Model”
  8. In order to calculate the dependency language model score, or depLM score for short, on the fly for
    Page 6, “Dependency Language Model”
  9. Language model score.
    Page 7, “Implementation Details”
  10. Dependency language model score.
    Page 7, “Implementation Details”
  11. Charniak et al. (2003) described a two-step string-to-CFG-tree translation model which employed a syntax-based language model to select the best translation from a target parse forest built in the first step.
    Page 8, “Discussion”
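Item 5 above explains why a decoding state must expose boundary words for an n-gram string LM. A minimal sketch of that bookkeeping, assuming a trigram LM; the names and types are illustrative, not the paper's implementation:

```python
from typing import Tuple

# With a trigram LM, words buried more than two positions inside a partial
# translation can no longer influence any future LM probability, so a
# hypothesis state only needs its two leftmost and two rightmost words.
# Hypotheses sharing a state can be recombined during dynamic programming.
def lm_state(words: Tuple[str, ...]) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return the (leftmost two, rightmost two) words of a partial translation."""
    return (words[:2], words[-2:])

# Two hypotheses differing only in their interior collapse to one state.
h1 = ("the", "boy", "will", "find", "it")
h2 = ("the", "boy", "would", "find", "it")
assert lm_state(h1) == lm_state(h2)
```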


LM

Appears in 10 sentences as: LM (12) LM, (1)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. Suppose we use a trigram dependency LM,
    Page 6, “Dependency Language Model”
  2. We rescore 1000-best translations (Huang and Chiang, 2005) by replacing the 3-gram LM score with the 5-gram LM score computed offline (see the rescoring sketch after this list).
    Page 7, “Implementation Details”
  3. str-dep: a string-to-dependency system with a dependency LM.
    Page 7, “Experiments”
  4. The English side of this subset was also used to train a 3-gram dependency LM .
    Page 7, “Experiments”
  5.                         BLEU%           TER%
                             lower   mixed   lower   mixed
     Decoding (3-gram LM)
       baseline              38.18   35.77   58.91   56.60
       filtered              37.92   35.48   57.80   55.43
       str-dep               39.52   37.25   56.27   54.07
     Rescoring (5-gram LM)
       baseline              40.53   38.26   56.35   54.15
       filtered              40.49   38.26   55.57   53.47
       str-dep               41.60   39.47   55.06   52.96
    Page 8, “Experiments”
  6. However, dependency structures allow the use of a dependency LM which gives rise to significant improvement.
    Page 8, “Experiments”
  7. Only translation probability P was employed in the construction of the target forest due to the complexity of the syntax-based LM.
    Page 8, “Discussion”
  8. Since our dependency LM models structures over target words directly based on dependency trees, we can build a single-step system.
    Page 8, “Discussion”
  9. This dependency LM can also be used in hierarchical MT systems using lexicalized CFG trees.
    Page 8, “Discussion”
  10. The use of a dependency LM in MT is similar to the use of a structured LM in ASR (Xu et al., 2002), which was also designed to exploit long-distance relations.
    Page 8, “Discussion”
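Item 2 above describes the 5-gram rescoring step. A minimal sketch of the score swap, under the assumption that the weighted 3-gram LM term is simply replaced by the offline 5-gram score at the same feature weight; `Candidate` and its fields are illustrative names, not the paper's code:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    translation: str
    total: float  # decoder model score, including the weighted 3-gram LM term
    lm3: float    # 3-gram LM log-probability already counted in `total`
    lm5: float    # 5-gram LM log-probability computed offline

def rescore(nbest: List[Candidate], lm_weight: float) -> Candidate:
    """Swap the weighted 3-gram LM term for the 5-gram one; return the best."""
    return max(nbest,
               key=lambda c: c.total - lm_weight * c.lm3 + lm_weight * c.lm5)
```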


TER

Appears in 9 sentences as: TER (9)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. Our experiments show that the string-to-dependency decoder achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER compared to a standard hierarchical string-to-string system on the NIST 04 Chinese-English evaluation set.
    Page 1, “Abstract”
  2. Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set.
    Page 1, “Introduction”
  3. All models are tuned on BLEU (Papineni et al., 2001), and evaluated on both BLEU and Translation Error Rate (TER) (Snover et al., 2006) so that we could detect over-tuning on one metric.
    Page 7, “Experiments”
  4.                         BLEU%           TER%
                             lower   mixed   lower   mixed
     Decoding (3-gram LM)
       baseline              38.18   35.77   58.91   56.60
       filtered              37.92   35.48   57.80   55.43
       str-dep               39.52   37.25   56.27   54.07
     Rescoring (5-gram LM)
       baseline              40.53   38.26   56.35   54.15
       filtered              40.49   38.26   55.57   53.47
       str-dep               41.60   39.47   55.06   52.96
    Page 8, “Experiments”
  5. Table 2: BLEU and TER scores on the test set.
    Page 8, “Experiments”
  6. Table 2 shows the BLEU and TER scores on MT04.
    Page 8, “Experiments”
  7. On decoding output, the string-to-dependency system achieved 1.48 point improvement in BLEU and 2.53 point improvement in TER compared to the baseline hierarchical string-to-string system.
    Page 8, “Experiments”
  8. After 5-gram rescoring, it achieved 1.21 point improvement in BLEU and 1.19 point improvement in TER.
    Page 8, “Experiments”
  9. Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set.
    Page 8, “Conclusions and Future Work”


dynamic programming

Appears in 5 sentences as: Dynamic Programming (1) dynamic programming (4)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. We restrict the target side to so-called well-formed dependency structures, in order to cover a large set of non-constituent transfer rules (Marcu et al., 2006), and enable efficient decoding through dynamic programming.
    Page 1, “Introduction”
  2. Dynamic Programming
    Page 2, “Introduction”
  3. tation and make it hard to employ shared structures for efficient dynamic programming.
    Page 2, “Introduction”
  4. In order to take advantage of dynamic programming, we fixed the positions onto which another tree could be attached by specifying NTs in dependency trees.
    Page 2, “Introduction”
  5. The well-formedness of the dependency structures enables efficient decoding through dynamic programming (see the recombination sketch after this list).
    Page 2, “Introduction”
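A minimal sketch of the recombination that items 4 and 5 rely on, assuming partial hypotheses are grouped by (span, state signature) in a chart and only the best-scoring one per group is kept; all names here are illustrative, not the paper's decoder:

```python
from collections import defaultdict
from typing import Any, Dict, Tuple

# chart[span][state] -> (best score, backpointer). Because attachment
# positions are fixed by NTs and the target structures are well-formed,
# hypotheses with identical (span, state) are interchangeable for all
# future rule applications, which is what licenses this dynamic program.
chart: Dict[Tuple[int, int], Dict[Any, Tuple[float, Any]]] = defaultdict(dict)

def add_hypothesis(span: Tuple[int, int], state: Any,
                   score: float, backpointer: Any) -> None:
    """Keep only the highest-scoring hypothesis per (span, state) cell."""
    best = chart[span].get(state)
    if best is None or score > best[0]:
        chart[span][state] = (score, backpointer)
```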


NIST

Appears in 5 sentences as: NIST (5)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system.
    Page 1, “Introduction”
  2. Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set.
    Page 1, “Introduction”
  3. We used part of the NIST 2006 Chinese-English large track data as well as some LDC corpora collected for the DARPA GALE program (LDC2005E83, LDC2006E34 and LDC2006G05) as our bilingual training data.
    Page 7, “Experiments”
  4. We tuned the weights on NIST MT05 and tested on MT04.
    Page 7, “Experiments”
  5. Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set.
    Page 8, “Conclusions and Future Work”


Chinese-English

Appears in 4 sentences as: Chinese-English (4)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. For example, Chiang (2007) showed that the Hiero system achieved about 1 to 3 point improvement in BLEU on the NIST 03/04/05 Chinese-English evaluation sets compared to a state-of-the-art phrasal system.
    Page 1, “Introduction”
  2. Our string-to-dependency decoder shows 1.48 point improvement in BLEU and 2.53 point improvement in TER on the NIST 04 Chinese-English MT evaluation set.
    Page 1, “Introduction”
  3. We used part of the NIST 2006 Chinese-English large track data as well as some LDC corpora collected for the DARPA GALE program (LDC2005E83, LDC2006E34 and LDC2006G05) as our bilingual training data.
    Page 7, “Experiments”
  4. Our string-to-dependency system generates 80% fewer rules, and achieves 1.48 point improvement in BLEU and 2.53 point improvement in TER on the decoding output on the NIST 04 Chinese-English evaluation set.
    Page 8, “Conclusions and Future Work”


Machine Translation

Appears in 4 sentences as: Machine Translation (2) machine translation (2)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
    Page 1, “Abstract”
  2. In recent years, hierarchical methods have been successfully applied to Statistical Machine Translation (Graehl and Knight, 2004; Chiang, 2005; Ding and Palmer, 2005; Quirk et al., 2005).
    Page 1, “Introduction”
  3. 1.1 Hierarchical Machine Translation
    Page 1, “Introduction”
  4. In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
    Page 8, “Conclusions and Future Work”


model score

Appears in 3 sentences as: model score (3)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. In order to calculate the dependency language model score, or depLM score for short, on the fly for
    Page 6, “Dependency Language Model”
  2. Language model score.
    Page 7, “Implementation Details”
  3. Dependency language model score.
    Page 7, “Implementation Details”


MT system

Appears in 3 sentences as: MT system (2) MT systems (1)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. In Section 4, we describe the implementation details of our MT system.
    Page 1, “Introduction”
  2. filtered: a string-to-string MT system as in baseline.
    Page 7, “Experiments”
  3. This dependency LM can also be used in hierarchical MT systems using lexicalized CFG trees.
    Page 8, “Discussion”


statistical machine translation

Appears in 3 sentences as: Statistical Machine Translation (1) statistical machine translation (2)
In A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
  1. In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
    Page 1, “Abstract”
  2. In recent years, hierarchical methods have been successfully applied to Statistical Machine Translation (Graehl and Knight, 2004; Chiang, 2005; Ding and Palmer, 2005; Quirk et al., 2005).
    Page 1, “Introduction”
  3. In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation.
    Page 8, “Conclusions and Future Work”
