Improving Tree-to-Tree Translation with Packed Forests
Liu, Yang and Lü, Yajuan and Liu, Qun

Article Structure

Abstract

Current tree-to-tree models suffer from parsing errors as they usually use only 1-best parses for rule extraction and decoding.

Introduction

Approaches to syntax-based statistical machine translation make use of parallel data with syntactic annotations, either in the form of phrase structure trees or dependency trees.

Model

Figure 1 shows an aligned forest pair for a Chinese sentence and an English sentence.

Rule Extraction

Given an aligned forest pair as shown in Figure 1, how do we extract all valid tree-to-tree rules that explain its synchronous generation process?

Decoding

Given a source packed forest F_s, our decoder finds the target yield of the single best derivation d that has source yield T(d) ∈ F_s:

Experiments

5.1 Data Preparation

Related Work

In machine translation, the concept of packed forest is first used by Huang and Chiang (2007) to characterize the search space of decoding with language models.

Conclusion

We have shown how to improve tree-to-tree translation with packed forests, which compactly encode exponentially many parses.
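The conclusion's claim that a packed forest compactly encodes exponentially many parses can be made concrete with a small sketch. The hypergraph below is a hypothetical toy example (the node names and the `count_parses` helper are illustrative, not from the paper): because subtrees are shared across alternative expansions, counting every tree the forest encodes takes time linear in the number of hyperedges.

```python
from math import prod

def count_parses(hyperedges, root):
    """Count the trees encoded by a packed forest.

    `hyperedges` maps each node to its alternative expansions, each a
    tuple of child nodes. The count sums over alternatives and
    multiplies over children, bottom-up with memoization, so a shared
    subtree is counted once even when the forest encodes exponentially
    many trees.
    """
    memo = {}

    def count(node):
        if node not in memo:
            alts = hyperedges.get(node, [])
            # a node with no expansions is a leaf: one way to realize it
            memo[node] = sum(prod(count(c) for c in tail) for tail in alts) if alts else 1
        return memo[node]

    return count(root)

# Hypothetical toy forest: IP has two expansions, both sharing the VP
# node, which itself has two expansions -- 2 x 2 = 4 trees in total.
forest = {
    "IP": [("NP", "VP"), ("NP", "VP", "PU")],
    "VP": [("VV", "NN"), ("VV",)],
}
print(count_parses(forest, "IP"))  # 4
```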

Topics

BLEU

Appears in 14 sentences as: BLEU (15)
In Improving Tree-to-Tree Translation with Packed Forests
  1. Comparable to the state-of-the-art phrase-based system Moses, using packed forests in tree-to-tree translation results in a significant absolute improvement of 3.6 BLEU points over using 1-best trees.
    Page 1, “Abstract”
  2. Comparable to Moses, our forest-based tree-to-tree model achieves an absolute improvement of 3.6 BLEU points over the conventional tree-based model.
    Page 1, “Introduction”
  3. We evaluated the translation quality using the BLEU metric, as calculated by mteval-v11b.pl with its default setting except that we used case-insensitive matching of n-grams.
    Page 6, “Experiments”
  4. avg trees # of rules BLEU
    Page 7, “Experiments”
  5. Table 3: Comparison of BLEU scores for tree-based and forest-based tree-to-tree models.
    Page 7, “Experiments”
  6. Table 3 shows the BLEU scores of tree-based and forest-based tree-to-tree models achieved on the test set over different pruning thresholds.
    Page 7, “Experiments”
  7. With the increase of the number of rules used, the BLEU score increased accordingly.
    Page 7, “Experiments”
  8. The absolute improvement of 3.6 BLEU points (from 0.2021 to 0.2385) is statistically significant at p < 0.01 using the sign-test as described by Collins et al.
    Page 7, “Experiments”
  9. We also ran Moses (Koehn et al., 2007) with its default setting using the same data and obtained a BLEU score of 0.2366, slightly lower than our best result (i.e., 0.2385).
    Page 7, “Experiments”
  10. BLEU
    Page 8, “Experiments”
  11. Figure 5: Effect of maximal node count on BLEU scores.
    Page 8, “Experiments”
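Several excerpts above report scores computed by the mteval scoring script. As a rough illustration of what that metric measures, here is a hedged, sentence-level sketch (the function name and toy sentences are mine, and the real script works at the corpus level over multiple references): the geometric mean of clipped n-gram precisions scaled by a brevity penalty, with case-insensitive matching as in the paper's setup.

```python
from collections import Counter
from math import exp, log

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions times a brevity penalty. Unsmoothed, single-reference,
    case-insensitive -- an illustration only, not the mteval script."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # clip each candidate n-gram count by its count in the reference
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    # brevity penalty punishes candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else exp(1 - len(ref) / max(len(cand), 1))
    return bp * exp(sum(log(p) for p in precisions) / max_n)

print(round(bleu("the cat sat on the mat", "the cat sat on the mat"), 4))  # 1.0
```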

BLEU score

Appears in 8 sentences as: BLEU score (5) BLEU scores (4)
  1. Table 3: Comparison of BLEU scores for tree-based and forest-based tree-to-tree models.
    Page 7, “Experiments”
  2. Table 3 shows the BLEU scores of tree-based and forest-based tree-to-tree models achieved on the test set over different pruning thresholds.
    Page 7, “Experiments”
  3. With the increase of the number of rules used, the BLEU score increased accordingly.
    Page 7, “Experiments”
  4. We also ran Moses (Koehn et al., 2007) with its default setting using the same data and obtained a BLEU score of 0.2366, slightly lower than our best result (i.e., 0.2385).
    Page 7, “Experiments”
  5. Figure 5: Effect of maximal node count on BLEU scores.
    Page 8, “Experiments”
  6. Figure 5 shows the effect of maximal node count on BLEU scores.
    Page 8, “Experiments”
  7. With the increase of maximal node count, the BLEU score increased dramatically.
    Page 8, “Experiments”
  8. Moses obtained a BLEU score of 0.3043 and our forest-based tree-to-tree system achieved a BLEU score of 0.3059.
    Page 8, “Experiments”

word alignments

Appears in 6 sentences as: word alignments (6)
  1. The solid lines denote hyperedges and the dashed lines denote word alignments.
    Page 2, “Introduction”
  2. The solid lines denote hyperedges and the dashed lines denote word alignments between the two forests.
    Page 2, “Model”
  3. By constructing a theory that gives formal semantics to word alignments, Galley et al.
    Page 3, “Rule Extraction”
  4. Their GHKM procedure draws connections among word alignments, derivations, and rules.
    Page 3, “Rule Extraction”
  5. They first identify the tree nodes that subsume tree-string pairs consistent with word alignments and then extract rules from these nodes.
    Page 3, “Rule Extraction”
  6. We obtained word alignments of the training data by first running GIZA++ (Och and Ney, 2003) and then applying the refinement rule “grow-diag-final-and” (Koehn et al., 2003).
    Page 7, “Experiments”
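The GHKM-style excerpts above hinge on identifying nodes that subsume tree-string pairs "consistent with word alignments". A minimal sketch of that consistency check, under my own assumptions (0-based word indices, alignment as a set of link pairs; the helper name is illustrative, not the paper's code): a source span is consistent if its aligned target words form a span whose links all point back inside the source span.

```python
def consistent_target_span(src_span, alignment):
    """Return the target span aligned to src_span if the span pair is
    consistent with the word alignment (no link crosses either span
    boundary), else None. `alignment` is a set of (src, tgt) pairs."""
    s_lo, s_hi = src_span
    tgt = [j for i, j in alignment if s_lo <= i <= s_hi]
    if not tgt:
        return None  # unaligned source spans yield no phrase pair here
    t_lo, t_hi = min(tgt), max(tgt)
    # reject if any target word inside the span aligns outside src_span
    for i, j in alignment:
        if t_lo <= j <= t_hi and not (s_lo <= i <= s_hi):
            return None
    return (t_lo, t_hi)

# Toy alignment over 4 source and 4 target words.
align = {(0, 0), (1, 2), (2, 1), (3, 3)}
print(consistent_target_span((1, 2), align))  # (1, 2)
print(consistent_target_span((0, 1), align))  # None: target span 0..2 also covers source 2
```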

language model

Appears in 4 sentences as: language model (3) language models (1)
  1. 1 to a log-linear model (Och and Ney, 2002) that uses the following eight features: relative frequencies in two directions, lexical weights in two directions, number of rules used, language model score, number of target words produced, and the probability of matched source tree (Mi et al., 2008).
    Page 6, “Decoding”
  2. We use the cube pruning method (Chiang, 2007) to approximately intersect the translation forest with the language model.
    Page 6, “Decoding”
  3. A trigram language model was trained on the English sentences of the training corpus.
    Page 6, “Experiments”
  4. In machine translation, the concept of packed forest is first used by Huang and Chiang (2007) to characterize the search space of decoding with language models.
    Page 8, “Related Work”
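The cube pruning excerpt above refers to Chiang (2007). Here is a minimal sketch of the core idea for a binary rule, under stated assumptions (scores are log-probabilities to be maximized, both hypothesis lists are sorted best-first, and the function name is mine): explore the grid of combinations best-first with a heap instead of scoring all pairs. With a language model component the grid is only roughly monotonic, which is why the paper describes the intersection as approximate.

```python
import heapq

def cube_top_k(left, right, combine, k):
    """Enumerate the k best combinations of two hypothesis lists, each
    sorted by descending score, exploring the score grid best-first.
    `combine(a, b)` returns the combined score of a pair."""
    if not left or not right:
        return []
    heap = [(-combine(left[0], right[0]), 0, 0)]  # max-heap via negation
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append((-neg, i, j))
        # push the two untried neighbors of the popped grid cell
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-combine(left[ni], right[nj]), ni, nj))
    return out

# Toy log-probability scores, already sorted in descending order.
best = cube_top_k([-0.1, -0.5, -2.0], [-0.2, -0.4, -1.0], lambda a, b: a + b, 3)
print([round(s, 2) for s, _, _ in best])  # [-0.3, -0.5, -0.7]
```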

lexicalized

Appears in 4 sentences as: lexicalized (4)
  1. If the tree contains only one node, it must be a lexicalized frontier node;
    Page 4, “Rule Extraction”
  2. If the tree contains more than one node, its leaves are either non-lexicalized frontier nodes or lexicalized non-frontier nodes.
    Page 4, “Rule Extraction”
  3. Figure 4: Coverage of lexicalized STSG rules on bilingual phrases.
    Page 7, “Experiments”
  4. We extracted phrase pairs from the training data to investigate how many phrase pairs can be captured by lexicalized tree-to-tree rules that contain only terminals.
    Page 7, “Experiments”

phrase-based

Appears in 4 sentences as: phrase-based (4)
  1. Comparable to the state-of-the-art phrase-based system Moses, using packed forests in tree-to-tree translation results in a significant absolute improvement of 3.6 BLEU points over using 1-best trees.
    Page 1, “Abstract”
  2. The absence of such non-syntactic mappings prevents tree-based tree-to-tree models from achieving comparable results to phrase-based models.
    Page 7, “Experiments”
  3. They replace 1-best trees with packed forests both in training and decoding and show superior translation quality over the state-of-the-art hierarchical phrase-based system.
    Page 8, “Related Work”
  4. Our system also achieves comparable performance with the state-of-the-art phrase-based system Moses.
    Page 8, “Conclusion”

translation quality

Appears in 4 sentences as: translation quality (4)
  1. Studies reveal that the absence of such non-syntactic mappings will impair translation quality dramatically (Marcu et al., 2006; Liu et al., 2007; DeNeefe et al., 2007; Zhang et al., 2008).
    Page 1, “Introduction”
  2. We evaluated the translation quality using the BLEU metric, as calculated by mteval-v11b.pl with its default setting except that we used case-insensitive matching of n-grams.
    Page 6, “Experiments”
  3. As a result, packed forests enable tree-to-tree models to capture more useful source-target mappings and therefore improve translation quality.
    Page 8, “Experiments”
  4. They replace 1-best trees with packed forests both in training and decoding and show superior translation quality over the state-of-the-art hierarchical phrase-based system.
    Page 8, “Related Work”

BLEU points

Appears in 3 sentences as: BLEU points (3)
  1. Comparable to the state-of-the-art phrase-based system Moses, using packed forests in tree-to-tree translation results in a significant absolute improvement of 3.6 BLEU points over using 1-best trees.
    Page 1, “Abstract”
  2. Comparable to Moses, our forest-based tree-to-tree model achieves an absolute improvement of 3.6 BLEU points over the conventional tree-based model.
    Page 1, “Introduction”
  3. The absolute improvement of 3.6 BLEU points (from 0.2021 to 0.2385) is statistically significant at p < 0.01 using the sign-test as described by Collins et al.
    Page 7, “Experiments”

phrase pairs

Appears in 3 sentences as: phrase pair (1) phrase pairs (3)
  1. We extracted phrase pairs from the training data to investigate how many phrase pairs can be captured by lexicalized tree-to-tree rules that contain only terminals.
    Page 7, “Experiments”
  2. We set the maximal length of phrase pairs to 10.
    Page 7, “Experiments”
  3. While this method works for 1-best tree pairs, it cannot be applied to packed forest pairs because it is impractical to enumerate all tree pairs over a phrase pair.
    Page 8, “Related Work”

sentence pairs

Appears in 3 sentences as: sentence pairs (3)
  1. Table 4: Comparison of rule extraction time (seconds/1,000 sentence pairs) and decoding time (seconds/sentence)
    Page 7, “Experiments”
  2. Table 4 gives the rule extraction time (seconds/1,000 sentence pairs) and decoding time (seconds/sentence) with varying pruning thresholds.
    Page 8, “Experiments”
  3. the new training corpus contained about 260K sentence pairs with 7.39M Chinese words and 9.41M English words.
    Page 8, “Experiments”
