Forest-Based Translation
Mi, Haitao and Huang, Liang and Liu, Qun

Article Structure

Abstract

Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction being faster and simpler than its string-based counterpart.

Introduction

Syntax-based machine translation has witnessed promising improvements in recent years.

Tree-based systems

Current tree-based systems perform translation in two separate steps: parsing and decoding.

Forest-based translation

We now extend the tree-based idea from the previous section to the case of forest-based translation.

Experiments

We can extend the simple model in Equation 1 to a log-linear one (Liu et al., 2006; Huang et al., 2006):
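
The equation itself is not reproduced on this index page. Purely as a rough sketch, a tree-to-string log-linear model of the kind cited combines a handful of weighted features over a derivation d of the source tree T; the particular features and weight indices below are illustrative assumptions, not the paper's Equation 1:

    % Hypothetical log-linear model over derivations d of source tree T.
    % Feature choice and lambda indices are illustrative assumptions.
    d^{*} = \arg\max_{d}\;
            P(d \mid T)^{\lambda_{0}}               % translation (rule) probabilities
            \cdot e^{\lambda_{1}\,|d|}              % penalty on the number of rules used
            \cdot P_{\mathrm{lm}}(s)^{\lambda_{2}}  % language model over the target string s
            \cdot e^{\lambda_{3}\,|s|}              % target-length penalty term

Here s denotes the target-language string produced by d; the last two factors correspond to the language-model and length-penalty terms quoted under the "language model" topic below.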

Conclusion and future work

We have presented a novel forest-based translation approach which uses a packed forest rather than the 1-best parse tree (or k-best parse trees) to direct the translation.

Topics

BLEU

Appears in 12 sentences as: BLEU (12)
In Forest-Based Translation
  1. Large-scale experiments show an absolute improvement of 1.7 BLEU points over the 1-best baseline.
    Page 1, “Abstract”
  2. Large-scale experiments (Section 4) show an improvement of 1.7 BLEU points over the 1-best baseline, which is also 0.8 points higher than decoding with 30-best trees, and takes even less time thanks to the sharing of common subtrees.
    Page 2, “Introduction”
  3. BLEU score
    Page 6, “Experiments”
  4. We use the standard minimum error-rate training (Och, 2003) to tune the feature weights to maximize the system’s BLEU score on the dev set.
    Page 6, “Experiments”
  5. The BLEU score of the baseline 1-best decoding is 0.2325, which is consistent with the result of 0.2302 in (Liu et al., 2007) on the same training, development and test sets, and with the same rule extraction procedure.
    Page 6, “Experiments”
  6. The corresponding BLEU score of Pharaoh (Koehn, 2004) is 0.2182 on this dataset.
    Page 6, “Experiments”
  7. Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees.
    Page 6, “Experiments”
  8. and produces consistently better BLEU scores.
    Page 7, “Experiments”
  9. With pruning threshold p = 12, it achieved a BLEU score of 0.2485, which is an absolute improvement of 1.6% points over the 1-best baseline, and is statistically significant using the sign-test of Collins et al.
    Page 7, “Experiments”
  10. Table 1: BLEU score results from training on large data.
    Page 7, “Experiments”
  11. The BLEU score of forest decoder with TR is 0.2839, which is a 1.7% points improvement over the 1-best baseline, and this difference is statistically significant (p < 0.01).
    Page 7, “Experiments”
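
All of the numbers above are BLEU scores (reported either as fractions such as 0.2485 or as "% points"). As a reminder of what is being measured, the following is a toy sentence-level BLEU computation following the standard definition of Papineni et al. (2002); it is not the paper's evaluation script, and the handling of zero n-gram counts here is a simplification:

    import math
    from collections import Counter

    def ngrams(tokens, n):
        """All n-grams of a token list."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def bleu(candidate, reference, max_n=4):
        """Toy single-reference BLEU: clipped n-gram precisions (n = 1..4),
        their geometric mean, and a brevity penalty."""
        cand, ref = candidate.split(), reference.split()
        log_prec = 0.0
        for n in range(1, max_n + 1):
            c_counts, r_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
            clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
            total = max(sum(c_counts.values()), 1)
            log_prec += math.log(max(clipped, 1e-9) / total)  # crude zero handling
        bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
        return bp * math.exp(log_prec / max_n)

    print(round(bleu("bush held a talk with sharon", "bush held talks with sharon"), 3))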

parse tree

Appears in 11 sentences as: parse tree (9) parse trees (3)
In Forest-Based Translation
  1. Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction being faster and simpler than its string-based counterpart.
    Page 1, “Abstract”
  2. Depending on the type of input, these efforts can be divided into two broad categories: the string-based systems whose input is a string to be simultaneously parsed and translated by a synchronous grammar (Wu, 1997; Chiang, 2005; Galley et al., 2006), and the tree-based systems whose input is already a parse tree to be directly converted into a target tree or string (Lin, 2004; Ding and Palmer, 2005; Quirk et al., 2005; Liu et al., 2006; Huang et al., 2006).
    Page 1, “Introduction”
  3. However, despite these advantages, current tree-based systems suffer from a major drawback: they only use the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors (Quirk and Corston-Oliver, 2006).
    Page 1, “Introduction”
  4. Informally, a packed parse forest, or forest in short, is a compact representation of all the derivations (i.e., parse trees) for a given sentence under a context-free grammar (Billot and Lang, 1989).
    Page 3, “Forest-based translation”
  5. The parse tree for the preposition case is shown in Figure 2(b) as the 1-best parse, while for the conjunction case, the two proper nouns (Bùshí and Shālóng) are combined to form a coordinated NP
    Page 3, “Forest-based translation”
  6. Shown in Figure 3(a), these two parse trees can be represented as a single forest by sharing common subtrees such as NPB_{0,1} and VPB_{3,6}.
    Page 3, “Forest-based translation”
  7. Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees.
    Page 6, “Experiments”
  8. i (rank of the parse tree picked by the decoder)
    Page 7, “Experiments”
  9. Figure 5: Percentage of the i-th best parse tree being picked in decoding.
    Page 7, “Experiments”
  10. We also investigate the question of how often the ith-best parse tree is picked to direct the translation (i = 1, 2, ...).
    Page 7, “Experiments”
  11. We have presented a novel forest-based translation approach which uses a packed forest rather than the 1-best parse tree (or k-best parse trees) to direct the translation.
    Page 7, “Conclusion and future work”
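
Mentions 4 and 6 above describe the packed forest as a hypergraph in which parses that agree on a subtree (such as NPB_{0,1} or VPB_{3,6}) share a single node. A minimal data-structure sketch of that idea follows; the class names, internal node labels, and the toy example are assumptions for illustration, not the paper's implementation:

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Node:
        """A forest node: a nonterminal over a source span, e.g. NPB over [0, 1]."""
        label: str
        span: tuple  # (start, end)

    @dataclass
    class Forest:
        """A packed forest: each node stores its incoming hyperedges,
        i.e. the alternative ways of building it from smaller nodes."""
        edges: dict = field(default_factory=dict)  # Node -> list of tail-node tuples

        def add_edge(self, head, tails):
            self.edges.setdefault(head, []).append(tuple(tails))

        def num_trees(self, node):
            """Count derivations (parse trees) rooted at `node` by dynamic programming."""
            if node not in self.edges:  # leaf node: exactly one tree
                return 1
            total = 0
            for tails in self.edges[node]:
                prod = 1
                for t in tails:
                    prod *= self.num_trees(t)
                total += prod
            return total

    # Toy forest: a preposition reading and a coordination reading of one span
    # share the NPB[0,1] and VPB[3,6] subtrees (internal labels are made up).
    f = Forest()
    npb01, vpb36 = Node("NPB", (0, 1)), Node("VPB", (3, 6))
    pp13, np03, ip = Node("PP", (1, 3)), Node("NP", (0, 3)), Node("IP", (0, 6))
    f.add_edge(ip, [npb01, pp13, vpb36])  # preposition reading
    f.add_edge(ip, [np03, vpb36])         # coordination reading
    print(f.num_trees(ip))                # 2 trees packed into one forest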

BLEU score

Appears in 10 sentences as: BLEU score (9) BLEU scores (1)
In Forest-Based Translation
  1. BLEU score
    Page 6, “Experiments”
  2. We use the standard minimum error-rate training (Och, 2003) to tune the feature weights to maximize the system’s BLEU score on the dev set.
    Page 6, “Experiments”
  3. The BLEU score of the baseline 1-best decoding is 0.2325, which is consistent with the result of 0.2302 in (Liu et al., 2007) on the same training, development and test sets, and with the same rule extraction procedure.
    Page 6, “Experiments”
  4. The corresponding BLEU score of Pharaoh (Koehn, 2004) is 0.2182 on this dataset.
    Page 6, “Experiments”
  5. Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees.
    Page 6, “Experiments”
  6. and produces consistently better BLEU scores.
    Page 7, “Experiments”
  7. With pruning threshold p = 12, it achieved a BLEU score of 0.2485, which is an absolute improvement of 1.6% points over the 1-best baseline, and is statistically significant using the sign-test of Collins et al.
    Page 7, “Experiments”
  8. Table 1: BLEU score results from training on large data.
    Page 7, “Experiments”
  9. The BLEU score of forest decoder with TR is 0.2839, which is a 1.7% points improvement over the 1-best baseline, and this difference is statistically significant (p < 0.01).
    Page 7, “Experiments”
  10. Using bilingual phrases further improves the BLEU score by 3.1% points, which is 2.1% points higher than the respective 1-best baseline.
    Page 7, “Experiments”

subtrees

Appears in 7 sentences as: subtrees (7)
In Forest-Based Translation
  1. However, a k-best list, with its limited scope, often has too few variations and too many redundancies; for example, a 50-best list typically encodes a combination of 5 or 6 binary ambiguities (since 2^5 < 50 < 2^6), and many subtrees are repeated across different parses (Huang, 2008).
    Page 1, “Introduction”
  2. Large-scale experiments (Section 4) show an improvement of 1.7 BLEU points over the 1-best baseline, which is also 0.8 points higher than decoding with 30-best trees, and takes even less time thanks to the sharing of common subtrees.
    Page 2, “Introduction”
  3. The term “forest” in (Liu et al., 2007) was a misnomer which actually refers to a set of several unrelated subtrees over disjoint spans, and should not be confused with the standard concept of packed forest.
    Page 2, “Tree-based systems”
  4. which results in two unfinished subtrees in (c).
    Page 2, “Tree-based systems”
  5. which perform phrasal translations for the two remaining subtrees, respectively, and get the Chinese translation in (e).
    Page 3, “Tree-based systems”
  6. Shown in Figure 3(a), these two parse trees can be represented as a single forest by sharing common subtrees such as NPB_{0,1} and VPB_{3,6}.
    Page 3, “Forest-based translation”
  7. Using more than one parse tree apparently improves the BLEU score, but at the cost of much slower decoding, since each of the top-k trees has to be decoded individually although they share many common subtrees.
    Page 6, “Experiments”
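
The arithmetic in mention 1 is worth spelling out: if the only differences among the trees in a k-best list come from b independent binary ambiguities, the list can contain at most 2^b distinct trees, so covering 50 parses needs b = ceil(log2 50) = 6 such ambiguities, and everything else in the list is shared, repeated structure. A one-line check (not from the paper):

    import math
    print(math.ceil(math.log2(50)))  # 6 binary ambiguities suffice for a 50-best list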

+LM

Appears in 6 sentences as: +LM (5) LM (4)
In Forest-Based Translation
  1. The decoder performs two tasks on the translation forest: 1-best search with integrated language model (LM), and k-best search with LM to be used in minimum error rate training.
    Page 5, “Forest-based translation”
  2. For 1-best search, we use the cube pruning technique (Chiang, 2007; Huang and Chiang, 2007) which approximately intersects the translation forest with the LM.
    Page 5, “Forest-based translation”
  3. Basically, cube pruning works bottom up in a forest, keeping at most k +LM items at each node, and uses the best-first expansion idea from Algorithm 2 of Huang and Chiang (2005) to speed up the computation.
    Page 5, “Forest-based translation”
  4. An +LM item of node v has the form (v^{a*b}), where a and b are the target-language boundary words.
    Page 5, “Forest-based translation”
  5. For example, (VP_{1,6}^{held * Sharon}) is an +LM item with its translation starting with “held” and ending with “Sharon”.
    Page 5, “Forest-based translation”
  6. However, this time we work on a finer-grained forest, called translation +LM forest, resulting from the intersection of the translation forest and the LM, with its nodes being the +LM items during cube pruning.
    Page 5, “Forest-based translation”
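
Mentions 1 through 6 above sketch how the decoder integrates the language model: working bottom-up over the forest, it keeps at most k +LM items per node, where each item carries its target-side boundary words (as in (v^{a*b})) so that adjoining items can be scored by the LM. The following is a heavily simplified illustration of that idea; the data shapes, the toy bigram LM, and the use of exhaustive combination plus top-k selection in place of the best-first cube pruning of Huang and Chiang (2005) are all assumptions, not the paper's decoder:

    import heapq
    from itertools import product

    def lm_bigram(w1, w2, bigram_logprobs):
        """Toy bigram LM lookup with a flat assumed backoff cost."""
        return bigram_logprobs.get((w1, w2), -5.0)

    def plus_lm_items(node, edges, items_at, bigram_logprobs, k=10):
        """Compute at most k +LM items for `node`.
        An item is (score, left_boundary, right_boundary, words).
        `edges[node]` holds (rule_score, template, tails); `template` mixes
        literal target words with None slots filled by the tail items."""
        candidates = []
        for rule_score, template, tails in edges[node]:
            for combo in product(*(items_at[t] for t in tails)):
                score = rule_score + sum(item[0] for item in combo)
                words, slot = [], iter(combo)
                for tok in template:
                    piece = [tok] if tok is not None else list(next(slot)[3])
                    if words:  # score the LM across the join point
                        score += lm_bigram(words[-1], piece[0], bigram_logprobs)
                    words.extend(piece)
                candidates.append((score, words[0], words[-1], words))
        return heapq.nlargest(k, candidates, key=lambda c: c[0])

    # Toy usage: a VP node whose rule yields "held a talk with" + an NP translation,
    # producing an item whose boundary words are "held" ... "sharon".
    items_at = {"NP": [(0.0, "sharon", "sharon", ["sharon"])]}
    edges = {"VP": [(-1.0, ["held", "a", "talk", "with", None], ["NP"])]}
    print(plus_lm_items("VP", edges, items_at, {}, k=5)[0])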

language model

Appears in 4 sentences as: language model (4) Language Modeling (1)
In Forest-Based Translation
  1. The decoder performs two tasks on the translation forest: 1-best search with integrated language model (LM), and k-best search with LM to be used in minimum error rate training.
    Page 5, “Forest-based translation”
  2. language model and e^{λ|s|} is the length penalty term
    Page 5, “Experiments”
  3. We also use the SRI Language Modeling Toolkit (Stolcke, 2002) to train a trigram language model with Kneser-Ney smoothing on the English side of the bitext.
    Page 6, “Experiments”
  4. Besides the trigram language model trained on the English side of these bitexts, we also use another trigram model trained on the first 1/3 of the Xinhua portion of the Gigaword corpus.
    Page 7, “Experiments”
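
Mentions 3 and 4 above state that the trigram language models were trained with Kneser-Ney smoothing using the SRI Language Modeling Toolkit; the Kneser-Ney estimation itself is done by that toolkit. Purely as an illustration of what a trigram model scores, here is a toy trigram scorer with a crude backoff (not Kneser-Ney); all names, the data, and the backoff constant are assumptions:

    import math
    from collections import Counter

    def train_trigram_counts(sentences):
        """Collect unigram/bigram/trigram counts with sentence-boundary padding."""
        uni, bi, tri = Counter(), Counter(), Counter()
        for s in sentences:
            toks = ["<s>", "<s>"] + s.split() + ["</s>"]
            uni.update(toks)
            bi.update(zip(toks, toks[1:]))
            tri.update(zip(toks, toks[1:], toks[2:]))
        return uni, bi, tri

    def trigram_logprob(sentence, counts, alpha=0.4):
        """Log10 score of a sentence under a trigram model with a crude backoff
        to lower orders; the paper instead uses Kneser-Ney smoothing via SRILM."""
        uni, bi, tri = counts
        total = sum(uni.values())
        toks = ["<s>", "<s>"] + sentence.split() + ["</s>"]
        logp = 0.0
        for w1, w2, w3 in zip(toks, toks[1:], toks[2:]):
            if tri[(w1, w2, w3)] > 0:
                p = tri[(w1, w2, w3)] / bi[(w1, w2)]
            elif bi[(w2, w3)] > 0:
                p = alpha * bi[(w2, w3)] / uni[w2]
            else:
                p = alpha * alpha * (uni[w3] + 1) / (total + len(uni))
            logp += math.log10(p)
        return logp

    counts = train_trigram_counts(["bush held a talk with sharon", "bush met sharon"])
    print(round(trigram_logprob("bush held a talk", counts), 2))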
