Incremental Syntactic Language Models for Phrase-based Translation
Lane Schwartz, Chris Callison-Burch, William Schuler, and Stephen Wu

Article Structure

Abstract

This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing.

Introduction

Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,

Related Work

Neither phrase-based (Koehn et al., 2003) nor hierarchical phrase-based translation (Chiang, 2005) takes explicit advantage of the syntactic structure of either source or target language.

Topics

language model

Appears in 47 sentences as: Language Model (3) language model (32) language modeling (2) language models (14) languages models (2)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation.
    Page 1, “Abstract”
  2. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system.
    Page 1, “Abstract”
  3. Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
    Page 1, “Introduction”
  4. Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
    Page 1, “Introduction”
  5. Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output.
    Page 1, “Introduction”
  6. We observe that incremental parsers, used as structured language models, provide an appropriate algorithmic match to incremental phrase-based decoding.
    Page 1, “Introduction”
  7. This approach re-exerts the role of the language model as a mechanism for encouraging syntactically fluent translations.
    Page 1, “Introduction”
  8. Instead, we incorporate syntax into the language model.
    Page 2, “Related Work”
  9. Traditional approaches to language models in
    Page 2, “Related Work”
  10. Chelba and Jelinek (1998) proposed that syntactic structure could be used as an alternative technique in language modeling.
    Page 2, “Related Work”
  11. Syntactic language models have also been explored with tree-based translation models.
    Page 2, “Related Work”
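
The excerpts above rest on a single mechanism: an incremental syntactic language model can be queried one word at a time, exactly like an n-gram model, so the decoder never needs a finished sentence. Below is a minimal sketch of that shared left-to-right interface; every name in it (LanguageModel, score_word, the parser's initial_state/advance methods) is illustrative, not the paper's or Moses' actual API.

```python
from abc import ABC, abstractmethod
from typing import Hashable, Tuple


class LanguageModel(ABC):
    """Left-to-right LM interface: thread a state through the sentence."""

    @abstractmethod
    def start_state(self) -> Hashable:
        """State summarizing the empty (sentence-initial) prefix."""

    @abstractmethod
    def score_word(self, state: Hashable, word: str) -> Tuple[Hashable, float]:
        """Return (next_state, log P(word | prefix summarized by state))."""


class NGramLM(LanguageModel):
    """An n-gram LM's state is just the last n-1 target words."""

    def __init__(self, order: int, logprob_fn):
        self.order = order
        self.logprob_fn = logprob_fn  # assumed: maps an n-gram tuple to a log-prob

    def start_state(self):
        return ("<s>",) * (self.order - 1)

    def score_word(self, state, word):
        lp = self.logprob_fn(state + (word,))
        return (state + (word,))[-(self.order - 1):], lp


class IncrementalSyntacticLM(LanguageModel):
    """A syntactic LM's state is a set of weighted partial phrase-structure
    parses of the prefix (e.g. HHMM parser states); scoring a word advances
    every parse one time step, summing over parses rather than keeping
    only the 1-best."""

    def __init__(self, parser):
        self.parser = parser  # assumed incremental parser, e.g. an HHMM

    def start_state(self):
        return self.parser.initial_state()  # hypothetical parser API

    def score_word(self, state, word):
        next_state, lp = self.parser.advance(state, word)  # hypothetical parser API
        return next_state, lp
```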

phrase-based

Appears in 32 sentences as: Phrase-Based (1) Phrase-based (2) phrase-based (32)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing.
    Page 1, “Abstract”
  2. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right.
    Page 1, “Abstract”
  3. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation.
    Page 1, “Abstract”
  4. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system.
    Page 1, “Abstract”
  5. Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output.
    Page 1, “Introduction”
  6. Bottom-up and top-down parsers typically require a completed string as input; this requirement makes it difficult to incorporate these parsers into phrase-based translation, which generates hypothesized translations incrementally, from left-to-right. As a workaround, parsers can rerank the translated output of translation systems (Och et al., 2004).
    Page 1, “Introduction”
  7. We observe that incremental parsers, used as structured language models, provide an appropriate algorithmic match to incremental phrase-based decoding.
    Page 1, “Introduction”
  8. We directly integrate incremental syntactic parsing into phrase-based translation.
    Page 1, “Introduction”
  9. • A novel method for integrating syntactic LMs into phrase-based translation (§3)
    Page 1, “Introduction”
  10. Neither phrase-based (Koehn et al., 2003) nor hierarchical phrase-based translation (Chiang, 2005) takes explicit advantage of the syntactic structure of either source or target language.
    Page 2, “Related Work”
  11. Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality.
    Page 2, “Related Work”
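
Several excerpts above turn on the mechanics of phrase-based decoding: the target string grows strictly left to right while a coverage vector records which source words have already been translated. A minimal sketch of extending one partial hypothesis under those assumptions (the Hypothesis fields and the lm object follow the hypothetical interface sketched earlier, not Moses' internals):

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Hypothesis:
    covered: FrozenSet[int]   # coverage vector: source positions translated so far
    target: Tuple[str, ...]   # target words emitted so far, strictly left to right
    lm_state: object          # whatever state the LM threads along the prefix
    score: float              # accumulated log-score


def extend(hyp: Hypothesis, src_span, tgt_phrase, phrase_score, lm) -> Hypothesis:
    """Append one target phrase to a partial hypothesis."""
    assert not (hyp.covered & frozenset(src_span)), "translate each source word once"
    state, score = hyp.lm_state, hyp.score + phrase_score
    for word in tgt_phrase:
        # The LM is consulted word by word, so it never needs a completed
        # sentence: the property that makes incremental parsers a fit here.
        state, lp = lm.score_word(state, word)
        score += lp
    return Hypothesis(hyp.covered | frozenset(src_span),
                      hyp.target + tuple(tgt_phrase), state, score)
```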

n-gram

Appears in 16 sentences as: n-gram (18)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
    Page 1, “Introduction”
  2. Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output.
    Page 1, “Introduction”
  3. (2007) use supertag n-gram LMs.
    Page 2, “Related Work”
  4. An n-gram language model history is also maintained at each node in the translation lattice.
    Page 4, “Related Work”
  5. The search space is further trimmed with hypothesis recombination, which collapses lattice nodes that share a common coverage vector and n-gram state.
    Page 4, “Related Work”
  6. We trained the syntactic language model from §4 (HHMM) and an interpolated n-gram language model with modified Kneser-Ney smoothing (Chen and Goodman, 1998); models were trained on sections 2-21 of the Wall Street Journal (WSJ) treebank (Marcus et al., 1993).
    Page 8, “Related Work”
  7. The HHMM outperforms the n-gram model in terms of out-of-domain test set perplexity when trained on the same WSJ data; the best perplexity results for in-domain and out-of-domain test sets are found by interpolating HHMM and n-gram LMs (Figure 7).
    Page 8, “Related Work”
  8. We trained the HHMM and n-gram LMs on the WSJ data in order to make them as similar as possible.
    Page 9, “Related Work”
  9. During tuning, Moses was first configured to use just the n-gram LM, then configured to use both the n-gram LM and the syntactic HHMM LM.
    Page 9, “Related Work”
  10. MERT consistently assigned positive weight to the syntactic LM feature, typically slightly less than the n-gram LM weight.
    Page 9, “Related Work”
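
Items 4 and 5 above describe how the lattice is kept tractable: two hypotheses that share a coverage vector and an LM state are interchangeable for every future decision, so only the better-scoring one is kept. A sketch of that recombination step, reusing the hypothetical Hypothesis shape from the previous section (with a syntactic LM the lm_state would also carry parser state, so fewer hypotheses share a signature and recombination collapses less):

```python
def recombine(hypotheses):
    """Keep the single best hypothesis per (coverage vector, LM state) pair."""
    best = {}
    for hyp in hypotheses:
        key = (hyp.covered, hyp.lm_state)   # both assumed hashable
        if key not in best or hyp.score > best[key].score:
            best[key] = hyp
    return list(best.values())
```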

LM

Appears in 9 sentences as: LM (13)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Like (Galley and Manning, 2009), our work implements an incremental syntactic language model; our approach differs by calculating syntactic LM scores over all available phrase-structure parses at each hypothesis instead of the 1-best dependency parse.
    Page 2, “Related Work”
  2. In-domain (WSJ 23 ppl) and out-of-domain (ur-en dev ppl) perplexity by LM:
       LM                               In-domain (WSJ 23 ppl)   Out-of-domain (ur-en dev ppl)
       WSJ 1-gram                       1973.57                  3581.72
       WSJ 2-gram                        349.18                  1312.61
       WSJ 3-gram                        262.04                  1264.47
       WSJ 4-gram                        244.12                  1261.37
       WSJ 5-gram                        232.08                  1261.90
       WSJ HHMM                          384.66                   529.41
       Interpolated WSJ 5-gram + HHMM    209.13                   225.48
       Giga 5-gram                       258.35                   312.28
       Interp.
    Page 8, “Related Work”
  3. HHMM parser beam sizes are indicated for the syntactic LM.
    Page 9, “Related Work”
  4. To show the effects of training an LM on more data, we also report perplexity results on the 5-gram LM trained for the GALE Arabic-English task using the English Gigaword corpus.
    Page 9, “Related Work”
  5. During tuning, Moses was first configured to use just the n-gram LM, then configured to use both the n-gram LM and the syntactic HHMM LM.
    Page 9, “Related Work”
  6. MERT consistently assigned positive weight to the syntactic LM feature, typically slightly less than the n-gram LM weight.
    Page 9, “Related Work”
  7. [Results-table header: Moses LM(s) | BLEU]
    Page 9, “Related Work”
  8. This means incremental syntactic LM scores can be calculated during the decoding process, rather than waiting until a complete sentence is posited, which is typically necessary in top-down or bottom-up parsing.
    Page 9, “Related Work”
  9. Various additional improvements could include caching the HHMM LM calculations, and exploiting properties of the right-corner transform that limit the number of decisions between successive time steps.
    Page 9, “Related Work”
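
The table in item 2 shows the interpolated WSJ 5-gram + HHMM model beating either component alone on both test sets. A worked sketch of the two operations involved; the 0.5 mixture weight is an assumption for illustration, since the excerpts do not state the weight actually used:

```python
import math

def interpolated_logprob(p_ngram: float, p_hhmm: float, alpha: float = 0.5) -> float:
    """Linear interpolation: P(w | h) = alpha * P_ngram + (1 - alpha) * P_hhmm.
    alpha = 0.5 is illustrative, not the paper's tuned value."""
    return math.log(alpha * p_ngram + (1.0 - alpha) * p_hhmm)

def perplexity(total_logprob: float, num_words: int) -> float:
    """Perplexity is the exponentiated average negative log-likelihood,
    so lower is better (cf. 209.13 vs. 232.08 and 384.66 in the table)."""
    return math.exp(-total_logprob / num_words)
```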

translation model

Appears in 9 sentences as: translation model (6) translation models (6)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
    Page 1, “Introduction”
  2. The translation models in these techniques define phrases as contiguous word sequences (with gaps allowed in the case of hierarchical phrases) which may or may not correspond to any linguistic constituent.
    Page 2, “Related Work”
  3. Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality.
    Page 2, “Related Work”
  4. Significant research has examined the extent to which syntax can be usefully incorporated into statistical tree-based translation models: string-to-tree (Yamada and Knight, 2001; Gildea, 2003; Imamura et al., 2004; Galley et al., 2004; Graehl and Knight, 2004; Melamed, 2004; Galley et al., 2006; Huang et al., 2006; Shen et al., 2008), tree-to-string (Liu et al., 2006; Liu et al., 2007; Mi et al., 2008; Mi and Huang, 2008; Huang and Mi, 2010), tree-to-tree (Abeille et al., 1990; Shieber and Schabes, 1990; Poutsma, 1998; Eisner, 2003; Shieber, 2004; Cowan et al., 2006; Nesson et al., 2006; Zhang et al., 2007; DeNeefe et al., 2007; DeNeefe and Knight, 2009; Liu et al., 2009; Chiang, 2010), and treelet (Ding and Palmer, 2005; Quirk et al., 2005) techniques use syntactic information to inform the translation model.
    Page 2, “Related Work”
  5. In contrast to the above tree-based translation models, our approach maintains a standard (non-syntactic) phrase-based translation model.
    Page 2, “Related Work”
  6. Syntactic language models have also been explored with tree-based translation models.
    Page 2, “Related Work”
  7. Our work, in contrast to the above approaches, explores the use of incremental syntactic language models in conjunction with phrase-based translation models.
    Page 2, “Related Work”
  8. Given a source language input sentence f, a trained source-to-target translation model, and a target language model, the task of translation is to find the maximally probable translation ê using a linear combination of j feature functions h weighted ac-
    Page 4, “Related Work”
  9. We trained a phrase-based translation model on the full NIST Open MT08 Urdu-English translation task using the full training data.
    Page 9, “Related Work”
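
Item 8 is cut off mid-sentence in the source; for reference, the standard log-linear decoding objective it describes, with feature functions h_j (translation model, language models, etc.) and weights λ_j (tuned with MERT, per the excerpts elsewhere on this page), is:

```latex
\hat{e} \;=\; \arg\max_{e} \sum_{j} \lambda_j \, h_j(e, f)
```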

machine translation

Appears in 7 sentences as: machine translation (7)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing.
    Page 1, “Abstract”
  2. Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
    Page 1, “Introduction”
  3. Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
    Page 1, “Introduction”
  4. Recent work has shown that parsing-based machine translation using syntax-augmented (Zollmann and Venugopal, 2006) hierarchical translation grammars with rich nonterminal sets can demonstrate substantial gains over hierarchical grammars for certain language pairs (Baker et al., 2009).
    Page 2, “Related Work”
  5. speech recognition and statistical machine translation focus on the use of n-grams, which provide a simple finite-state model approximation of the target language.
    Page 2, “Related Work”
  6. appropriate algorithmic fit for incorporating syntax into phrase-based statistical machine translation, since both process sentences in an incremental left-to-right fashion.
    Page 9, “Related Work”
  7. The use of very large n-gram language models is typically a key ingredient in the best-performing machine translation systems (Brants et al., 2007).
    Page 9, “Related Work”

translation system

Appears in 5 sentences as: translation system (3) translation systems (2)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system.
    Page 1, “Abstract”
  2. Bottom-up and top-down parsers typically require a completed string as input; this requirement makes it difficult to incorporate these parsers into phrase-based translation, which generates hypothesized translations incrementally, from left-to-right. As a workaround, parsers can rerank the translated output of translation systems (Och et al., 2004).
    Page 1, “Introduction”
  3. (2003) use syntactic language models to rescore the output of a tree-based translation system.
    Page 2, “Related Work”
  4. Post and Gildea (2009) use tree substitution grammar parsing for language modeling, but do not use this language model in a translation system.
    Page 2, “Related Work”
  5. The use of very large n-gram language models is typically a key ingredient in the best-performing machine translation systems (Brants et al., 2007).
    Page 9, “Related Work”

beam size

Appears in 4 sentences as: beam size (3) beam sizes (1)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. HHMM was run with a beam size of 2000.
    Page 8, “Related Work”
  2. HHMM parser beam sizes are indicated for the syntactic LM.
    Page 9, “Related Work”
  3. Figure 9: Results for Ur-En devtest (only sentences with 1-20 words) with HHMM beam size of 2000 and Moses settings of distortion limit 10, stack size 200, and ttable_limit 20.
    Page 9, “Related Work”
  4. By increasing the beam size and distortion limit of the baseline system, future work may examine whether a baseline system with comparable runtimes can achieve comparable translation quality.
    Page 9, “Related Work”
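
The excerpts above quote two pruning knobs: the HHMM parser beam (2000) and the Moses decoder's stack size (200). Both cap how many hypotheses survive at each step. A generic sketch of that histogram-style pruning, illustrative only and not the actual Moses or HHMM implementation:

```python
def prune(hypotheses, beam_size: int = 2000):
    """Keep only the beam_size highest-scoring hypotheses at this step."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)[:beam_size]
```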

statistical machine translation

Appears in 4 sentences as: statistical machine translation (4)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
    Page 1, “Introduction”
  2. Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
    Page 1, “Introduction”
  3. speech recognition and statistical machine translation focus on the use of n-grams, which provide a simple finite-state model approximation of the target language.
    Page 2, “Related Work”
  4. appropriate algorithmic fit for incorporating syntax into phrase-based statistical machine translation, since both process sentences in an incremental left-to-right fashion.
    Page 9, “Related Work”

BLEU

Appears in 3 sentences as: BLEU (3)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.
    Page 1, “Abstract”
  2. Figure 9 shows a statistically significant improvement to the BLEU score when using the HHMM and the n-gram LMs together on this reduced test set.
    Page 9, “Related Work”
  3. [Results-table header: Moses LM(s) | BLEU]
    Page 9, “Related Work”

syntactic parsing

Appears in 3 sentences as: Syntactic parsing (1) syntactic parsing (2)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing.
    Page 1, “Abstract”
  2. Syntactic parsing may help produce more grammatical output by better modeling structural relationships and long-distance dependencies.
    Page 1, “Introduction”
  3. We directly integrate incremental syntactic parsing into phrase-based translation.
    Page 1, “Introduction”

translation quality

Appears in 3 sentences as: translation quality (4)
In Incremental Syntactic Language Models for Phrase-based Translation
  1. Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality.
    Page 2, “Related Work”
  2. The translation quality significantly improved on a constrained task, and the perplexity improvements suggest that interpolating between n-gram and syntactic LMs may hold promise on larger data sets.
    Page 9, “Related Work”
  3. By increasing the beam size and distortion limit of the baseline system, future work may examine whether a baseline system with comparable runtimes can achieve comparable translation quality.
    Page 9, “Related Work”
