Cohesive Phrase-Based Decoding for Statistical Machine Translation
Cherry, Colin

Article Structure

Abstract

Phrase-based decoding produces state-of-the-art translations with no regard for syntax.

Introduction

Statistical machine translation (SMT) is complicated by the fact that words can move during translation.

Cohesive Phrasal Output

Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003).

Cohesive Decoding

This section describes a modification to standard phrase-based decoding, so that the system is constrained to produce only cohesive output.

Experiments

We have adapted the notion of syntactic cohesion so that it is applicable to phrase-based decoding.

Discussion

Examining the French translations produced by our cohesion constrained phrasal decoder, we can draw some qualitative generalizations.

Conclusion

We have presented a definition of syntactic cohesion that is applicable to phrase-based SMT.

Topics

subtrees

Appears in 11 sentences as: subtrees (13)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. Equivalently, one can say that phrases in the source, defined by subtrees in its parse, remain contiguous after translation.
    Page 1, “Introduction”
  2. This alignment is used to project the spans of subtrees from the source tree onto the target sentence.
    Page 2, “Cohesive Phrasal Output”
  3. The key to checking for interruptions quickly is knowing which subtrees T(r) to check for qualities (l:a,b,c).
    Page 4, “Cohesive Decoding”
  4. A naïve approach would check every subtree that has begun translation; Figure 3a highlights the roots of all such subtrees for a hypothetical T and f̄_1^{h+1}. Fortunately, with a little analysis that accounts for f_{h+1}, we can show that at most two subtrees need to be checked.
    Page 4, “Cohesive Decoding”
  5. For a given interruption-free f̄_1^h, we call subtrees that have begun translation, but are not yet complete, open subtrees.
    Page 4, “Cohesive Decoding”
  6. Only open subtrees can lead to interruptions.
    Page 4, “Cohesive Decoding”
  7. This leaves the subtrees highlighted in Figure 3b to be checked.
    Page 4, “Cohesive Decoding”
  8. Furthermore, we need only consider subtrees that contain the leftmost and rightmost source tokens, e_L and e_R, translated by f_h.
    Page 4, “Cohesive Decoding”
  9. Any lower subtree would be a descendant of r, and therefore the check for the lower subtree is subsumed by the check for T(r). This leaves only two subtrees, highlighted in our running example in Figure 3c.
    Page 4, “Cohesive Decoding”
  10. The tree structure caches the source spans corresponding to each of its subtrees .
    Page 5, “Cohesive Decoding”
  11. 4A hard cohesion constraint used in conjunction with a traditional distortion limit also requires a second linear-time check to ensure that all subtrees currently in progress can be finished under the constraints induced by the distortion limit.
    Page 5, “Cohesive Decoding”
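
Taken together, the excerpts above describe checking open subtrees for interruptions during decoding. As a rough offline sketch (my own simplification with invented names, not the paper's constant-time incremental check), a subtree is interrupted when a source token outside it is aligned into the subtree's projected target span:

```python
def is_interrupted(subtree_tokens, alignment, n_source):
    """Return True if a source token outside the subtree projects into
    the subtree's target span (a simplified, offline cohesion check)."""
    # Target positions covered by the subtree's projection.
    inside = [t for s in subtree_tokens for t in alignment.get(s, [])]
    if not inside:
        return False  # an unaligned subtree cannot be interrupted
    lo, hi = min(inside), max(inside)
    # Any outside source token landing inside [lo, hi] is an interruption.
    for s in range(n_source):
        if s in subtree_tokens:
            continue
        if any(lo <= t <= hi for t in alignment.get(s, [])):
            return True
    return False
```

The decoder described above avoids scanning every subtree this way; the excerpts show it narrows the work to at most two open subtrees per extension.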

See all papers in Proc. ACL 2008 that mention subtrees.

phrase-based

Appears in 10 sentences as: Phrase-based (2) phrase-based (8)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. Phrase-based decoding produces state-of-the-art translations with no regard for syntax.
    Page 1, “Abstract”
  2. The resulting cohesive, phrase-based decoder is shown to produce translations that are preferred over noncohesive output in both automatic and human evaluations.
    Page 1, “Abstract”
  3. We attempt to use this strong, but imperfect, characterization of movement to assist a non-syntactic translation method: phrase-based SMT.
    Page 1, “Introduction”
  4. Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation.
    Page 1, “Introduction”
  5. This section describes a modification to standard phrase-based decoding, so that the system is constrained to produce only cohesive output.
    Page 3, “Cohesive Decoding”
  6. We have adapted the notion of syntactic cohesion so that it is applicable to phrase-based decoding.
    Page 6, “Experiments”
  7. The lexical reordering model provides a good comparison point as a non-syntactic, and potentially orthogonal, improvement to phrase-based movement modeling.
    Page 6, “Experiments”
  8. This indicates that the cohesive subset is easier to translate with a phrase-based system.
    Page 6, “Experiments”
  9. We have presented a definition of syntactic cohesion that is applicable to phrase-based SMT.
    Page 8, “Conclusion”
  10. This suggests that cohesion could be a strong feature in estimating the confidence of phrase-based translations.
    Page 8, “Conclusion”

BLEU

Appears in 9 sentences as: BLEU (9)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. We tested this approach on our English-French development set, and saw no improvement in BLEU score.
    Page 3, “Cohesive Phrasal Output”
  2. Initially, we were not certain to what extent this feature would be used by the MERT module, as BLEU is not always sensitive to syntactic improvements.
    Page 5, “Cohesive Decoding”
  3. We first present our soft cohesion constraint’s effect on BLEU score (Papineni et al., 2002) for both our dev-test and test sets.
    Page 6, “Experiments”
  4. First of all, looking across columns, we can see that there is a definite divide in BLEU score between our two evaluation subsets.
    Page 6, “Experiments”
  5. Sentences with cohesive baseline translations receive much higher BLEU scores than those with uncohesive baseline translations.
    Page 6, “Experiments”
  6. Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU , when measured on all sentence pairs, with improvements ranging between 0.2 and 0.5 absolute points.
    Page 6, “Experiments”
  7. Its BLEU scores are very similar, with lex-
    Page 6, “Experiments”
  8. Our experiments have shown that roughly 1/5 of our baseline English-French translations contain cohesion violations, and these translations tend to receive lower BLEU scores.
    Page 8, “Conclusion”
  9. Our soft constraint produced improvements ranging between 0.5 and 1.1 BLEU points on sentences for which the baseline produces uncohesive translations.
    Page 8, “Conclusion”
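
For reference, the BLEU score cited throughout these excerpts is the geometric mean of modified n-gram precisions scaled by a brevity penalty (Papineni et al., 2002). A bare-bones sentence-level sketch (real evaluations use corpus-level counts and smoothing; the function name is my own):

```python
import math
from collections import Counter

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU over token lists."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0.0:
        return 0.0  # geometric mean collapses without smoothing
    # Brevity penalty discourages overly short candidates.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```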

dependency tree

Appears in 7 sentences as: dependency tree (5) dependency trees (2)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. We add syntax to this process with a cohesion constraint based on a dependency tree for the source sentence.
    Page 1, “Abstract”
  2. Next, we introduce our source dependency tree T. Each source token e_i is also a node in T. We define T(e_i) to be the subtree of T rooted at e_i.
    Page 2, “Cohesive Phrasal Output”
  3. $\mathrm{span}(e_i, T, a) = \left[\,\min_{\{j \mid e_j \in T(e_i)\}} a_j,\ \max_{\{k \mid e_k \in T(e_i)\}} a_k\,\right]$ (1). Consider the simple phrasal translation shown in Figure 1 along with a dependency tree for the English source.
    Page 2, “Cohesive Phrasal Output”
  4. The decoder stores the flat sentence in the original sentence data structure, and the head-encoded dependency tree in an attached tree data structure.
    Page 5, “Cohesive Decoding”
  5. Since we require source dependency trees , all experiments test English to French translation.
    Page 6, “Experiments”
  6. English dependency trees are provided by Minipar (Lin, 1994).
    Page 6, “Experiments”
  7. This algorithm was used to implement a soft cohesion constraint for the Moses decoder, based on a source-side dependency tree .
    Page 8, “Conclusion”
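
The span projection described in items 2 and 3 can be sketched directly from the formula: collect the subtree T(e_i), then take the min and max aligned target positions. A minimal illustration with invented names (`tree_children` maps a node to its dependents; `alignment` maps a source index to its aligned target indices):

```python
def subtree_span(i, tree_children, alignment):
    """Project the subtree T(e_i) onto the target side:
    span(e_i, T, a) = [min a_j, max a_k] over e_j, e_k in T(e_i)."""
    # Collect all source indices in T(e_i) by depth-first traversal.
    stack, members = [i], []
    while stack:
        node = stack.pop()
        members.append(node)
        stack.extend(tree_children.get(node, []))
    # Gather every target position aligned to a token in the subtree.
    targets = [t for s in members for t in alignment.get(s, [])]
    if not targets:
        return None  # a fully unaligned subtree projects to no span
    return (min(targets), max(targets))
```

Caching these spans per subtree, as item 4 of the "subtrees" entry notes, is what makes the decoder-time check cheap.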

BLEU score

Appears in 6 sentences as: BLEU score (3) BLEU scores (3)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. We tested this approach on our English-French development set, and saw no improvement in BLEU score .
    Page 3, “Cohesive Phrasal Output”
  2. We first present our soft cohesion constraint’s effect on BLEU score (Papineni et al., 2002) for both our dev-test and test sets.
    Page 6, “Experiments”
  3. First of all, looking across columns, we can see that there is a definite divide in BLEU score between our two evaluation subsets.
    Page 6, “Experiments”
  4. Sentences with cohesive baseline translations receive much higher BLEU scores than those with uncohesive baseline translations.
    Page 6, “Experiments”
  5. Its BLEU scores are very similar, with lex-
    Page 6, “Experiments”
  6. Our experiments have shown that roughly 1/5 of our baseline English-French translations contain cohesion violations, and these translations tend to receive lower BLEU scores .
    Page 8, “Conclusion”

soft constraint

Appears in 6 sentences as: Soft Constraint (1) soft constraint (5)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. To further increase flexibility, we incorporate cohesion as a decoder feature, creating a soft constraint .
    Page 1, “Abstract”
  2. 3.2 Soft Constraint
    Page 5, “Cohesive Decoding”
  3. ilarly, with the soft constraint providing more stable, and generally better results.
    Page 6, “Experiments”
  4. We confirmed these trends on our test set, but to conserve space, we provide detailed results for only the soft constraint .
    Page 6, “Experiments”
  5. The cohesive system, even with a soft constraint , cannot reproduce the same movement, and returns a less grammatical translation.
    Page 8, “Discussion”
  6. Our soft constraint produced improvements ranging between 0.5 and 1.1 BLEU points on sentences for which the baseline produces uncohesive translations.
    Page 8, “Conclusion”
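
The hard/soft distinction in these excerpts amounts to where the cohesion check acts during beam search: a hard constraint prunes violating hypotheses outright, while a soft constraint keeps them and charges a tuned feature penalty. A toy sketch (the penalty weight is illustrative, not a value from the paper):

```python
def extend_score(base_score, new_violations, mode, penalty_weight=-2.5):
    """Score a hypothesis extension under a cohesion constraint.

    Hard: any new violation kills the extension (returns None).
    Soft: the violation count is a penalized feature instead."""
    if mode == "hard" and new_violations > 0:
        return None  # pruned from the beam outright
    if mode == "soft":
        return base_score + penalty_weight * new_violations
    return base_score
```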

word alignment

Appears in 4 sentences as: word alignment (3) Word alignments (1)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. Fox (2002) showed that cohesion is held in the vast majority of cases for English-French, while Cherry and Lin (2006) have shown it to be a strong feature for word alignment .
    Page 1, “Introduction”
  2. Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003).
    Page 2, “Cohesive Phrasal Output”
  3. showed that a soft cohesion constraint is superior to a hard constraint for word alignment .
    Page 5, “Cohesive Decoding”
  4. Word alignments are provided by GIZA++ (Och and Ney, 2003) with grow-diag-final combination, with infrastructure for alignment combination and phrase extraction provided by the shared task.
    Page 6, “Experiments”
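
Grow-diag-final, mentioned in item 4, symmetrizes two directional GIZA++ alignments by starting from their intersection and growing toward their union. Only those two bracketing sets are sketched here (the growing heuristic itself is omitted); alignments are represented as sets of (source, target) index pairs:

```python
def alignment_bounds(e2f, f2e):
    """Given two directional alignments as sets of (src, tgt) pairs,
    return the intersection (high precision) and union (high recall)
    that bracket grow-diag-final's search space."""
    return e2f & f2e, e2f | f2e
```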

language model

Appears in 3 sentences as: language model (3)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. order, forcing the decoder to rely heavily on its language model .
    Page 2, “Introduction”
  2. So long as the vocabulary present in our phrase table and language model supports a literal translation, cohesion tends to produce an improvement.
    Page 8, “Discussion”
  3. In the baseline translation, the language model encourages the system to move the negation away from “exist” and toward “reduce.” The result is a tragic reversal of meaning in the translation.
    Page 8, “Discussion”

log-linear

Appears in 3 sentences as: log-linear (3)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. This count becomes a feature in the decoder’s log-linear model, the weight of which is trained with MERT.
    Page 5, “Cohesive Decoding”
  2. Weights for the log-linear model are set using MERT, as implemented by Venugopal and Vogel (2005).
    Page 6, “Experiments”
  3. Since adding features to the decoder’s log-linear model is straightforward, we also experiment with a combined system that uses both the cohesion constraint and a lexical reordering model.
    Page 6, “Experiments”

log-linear model

Appears in 3 sentences as: log-linear model (3)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. This count becomes a feature in the decoder’s log-linear model , the weight of which is trained with MERT.
    Page 5, “Cohesive Decoding”
  2. Weights for the log-linear model are set using MERT, as implemented by Venugopal and Vogel (2005).
    Page 6, “Experiments”
  3. Since adding features to the decoder’s log-linear model is straightforward, we also experiment with a combined system that uses both the cohesion constraint and a lexical reordering model.
    Page 6, “Experiments”
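
The log-linear model in these excerpts scores a hypothesis as a weighted sum of feature values, with the cohesion violation count entering as just one more feature. A minimal sketch (feature names and weights are illustrative; MERT tuning is not shown):

```python
def loglinear_score(features, weights):
    """score(h) = sum_k w_k * f_k(h); decoding seeks the argmax."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical hypothesis: LM and TM log-probs plus one cohesion violation.
hyp = {"lm": -12.4, "tm": -8.1, "cohesion_violations": 1.0}
w = {"lm": 1.0, "tm": 0.9, "cohesion_violations": -2.5}
score = loglinear_score(hyp, w)
```

Because the model is just a weighted sum, adding the cohesion feature (or a lexical reordering feature) requires no change to the search itself.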

machine translation

Appears in 3 sentences as: machine translation (3)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. Statistical machine translation (SMT) is complicated by the fact that words can move during translation.
    Page 1, “Introduction”
  2. 1We use the term “syntactic cohesion” throughout this paper to mean what has previously been referred to as “phrasal cohesion”, because the nonlinguistic sense of “phrase” has become so common in machine translation literature.
    Page 1, “Introduction”
  3. Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation .
    Page 1, “Introduction”

sentence pairs

Appears in 3 sentences as: sentence pair (1) sentence pairs (2)
In Cohesive Phrase-Based Decoding for Statistical Machine Translation
  1. Previous approaches to measuring the cohesion of a sentence pair have worked with a word alignment (Fox, 2002; Lin and Cherry, 2003).
    Page 2, “Cohesive Phrasal Output”
  2. We test our cohesion-enhanced Moses decoder trained using 688K sentence pairs of Europarl French-English data, provided by the SMT 2006 Shared Task (Koehn and Monz, 2006).
    Page 6, “Experiments”
  3. Comparing the baseline with and without the soft cohesion constraint, we see that cohesion has only a modest effect on BLEU, when measured on all sentence pairs , with improvements ranging between 0.2 and 0.5 absolute points.
    Page 6, “Experiments”
