An exponential translation model for target language morphology
Subotin, Michael

Article Structure

Abstract

This paper presents an exponential model for translation into highly inflected languages which can be scaled to very large datasets.

Introduction

Translation into languages with rich morphology presents special challenges for phrase-based methods.

Hierarchical phrase-based translation

We take as our starting point David Chiang’s Hiero system, which generalizes phrase-based translation to substrings with gaps (Chiang, 2007).

Exponential phrase models with shared features

The model used in this work is based on the familiar equation for conditional exponential models:
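The equation itself is not reproduced in this excerpt; a generic form of a conditional exponential (log-linear) model over a source phrase f and target phrase e, with weights λ_i and feature functions h_i (notation assumed here, not necessarily the paper's), is:

```latex
p(e \mid f) \;=\; \frac{\exp\bigl(\sum_i \lambda_i\, h_i(f, e)\bigr)}
                       {\sum_{e'} \exp\bigl(\sum_i \lambda_i\, h_i(f, e')\bigr)}
```

The denominator sums over the candidate target phrases e' for the given source phrase, so the weights can be estimated by maximizing conditional likelihood.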

Features

The feature space for target-side inflection models used in this work consists of features tracking the source phrase and the corresponding target phrase together with its complete morphological tag, which will be referred to as rule features for brevity.
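As a hypothetical illustration of such rule features, each one can be encoded as a single sparse-feature string conjoining the source phrase with the target phrase and its full morphological tag. The function name and the positional-tag string below are illustrative, not the paper's actual encoding:

```python
# Hypothetical sketch of a "rule feature": the source phrase, the target
# phrase, and the target's complete morphological tag, packed into one
# sparse-feature string as is common for exponential models.
def rule_feature(source_phrase: str, target_phrase: str, target_tag: str) -> str:
    return f"rule|{source_phrase}|{target_phrase}|{target_tag}"

# Illustrative Czech example with a made-up positional tag.
feat = rule_feature("new houses", "nové domy", "AAIP1----1A----")
```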

Decoding with target-side model dependencies

The procedure for decoding with nonlocal target-side feature dependencies is similar in its general outlines to the standard method of decoding with a language model, as described in Chiang (2007).

Modeling unobserved target inflections

As a consequence of translating into a morphologically rich language, some inflected forms of target words are unobserved in training data and cannot be generated by the decoder under standard phrase-based approaches.
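One way to score such unseen inflections, in the spirit of the lemma-level backoff described later in this paper, is to discount the surface-pair counts and redistribute the freed mass through a coarser lemmatized distribution. The counts, discount value, and names below are hypothetical, and the score is an unnormalized sketch rather than a proper distribution; the point is only that unobserved inflections of an observed lemma receive a nonzero score:

```python
from collections import Counter

# Hypothetical toy counts: (source word, target surface form) pairs and
# (source word, target lemma) pairs.
surface = Counter({("houses", "domy"): 3, ("houses", "domů"): 1})
lemmatized = Counter({("houses", "dům"): 4})
D = 0.75  # absolute discount (illustrative)

def p_lemma(src, lem):
    # Lower-order distribution over lemmatized target forms given the source.
    total = sum(c for (s, _), c in lemmatized.items() if s == src)
    return lemmatized[(src, lem)] / total if total else 0.0

def score(src, tgt, lem):
    # Discounted surface estimate plus backoff mass routed through the
    # lemma-level distribution (absolute-discounting interpolation).
    total = sum(c for (s, _), c in surface.items() if s == src)
    if total == 0:
        return p_lemma(src, lem)
    types = sum(1 for (s, _), c in surface.items() if s == src and c > 0)
    backoff_mass = D * types / total
    return max(surface[(src, tgt)] - D, 0.0) / total + backoff_mass * p_lemma(src, lem)
```

An inflected form never seen in training, e.g. the locative "domech" for the lemma "dům", still gets a positive score via the backoff term, which is what lets a decoder generate it at all.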

Corpora and baselines

We investigate the models using the 2009 edition of the parallel treebank from UFAL (Bojar and Zabokrtsky, 2009), containing 8,029,801 sentence pairs from various genres.

Parameter estimation

Parameter estimation was performed using a modified version of the maximum entropy module from SciPy (Jones et al., 2001) and the LBFGS-B algorithm (Byrd et al., 1995).
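A minimal sketch of this kind of training loop, assuming toy data and an illustrative L2 regularizer (the paper used a modified SciPy maximum entropy module, not this exact code), pairs the conditional negative log-likelihood and its gradient with SciPy's L-BFGS-B optimizer:

```python
import numpy as np
from scipy.optimize import minimize

# Toy conditional maximum-entropy training problem (hypothetical data).
# F[i][y] is the feature vector of candidate output y for instance i.
F = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # instance 0: candidates y=0, y=1
    [[1.0, 1.0], [0.0, 0.0]],  # instance 1
])
gold = np.array([0, 1])        # observed outputs
l2 = 0.1                       # regularization strength (illustrative)

def neg_log_likelihood(w):
    scores = F @ w                                # (instances, candidates)
    log_Z = np.logaddexp.reduce(scores, axis=1)   # log partition per instance
    gold_scores = scores[np.arange(len(gold)), gold]
    nll = np.sum(log_Z - gold_scores) + 0.5 * l2 * w @ w
    # Gradient: expected features under the model minus observed features.
    p = np.exp(scores - log_Z[:, None])
    expected = np.einsum('iy,iyk->k', p, F)
    observed = F[np.arange(len(gold)), gold].sum(axis=0)
    return nll, expected - observed + l2 * w

res = minimize(neg_log_likelihood, np.zeros(2), jac=True, method='L-BFGS-B')
```

Supplying the analytic gradient via `jac=True` is what makes L-BFGS-B practical at the feature counts reported for models of this kind.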

Results and discussion

Table 1 shows the results.

Conclusion

This paper has introduced a scalable exponential phrase model for target languages with complex morphology that can be trained on the full parallel corpus.

Topics

phrase-based

Appears in 7 sentences as: phrase-based (8)
In An exponential translation model for target language morphology
  1. Translation into languages with rich morphology presents special challenges for phrase-based methods.
    Page 1, “Introduction”
  2. Thus, Birch et al. (2008) find that translation quality achieved by a popular phrase-based system correlates significantly with a measure of target-side, but not source-side, morphological complexity.
    Page 1, “Introduction”
  3. Recently, several studies (Bojar, 2007; Avramidis and Koehn, 2009; Ramanathan et al., 2009; Yeniterzi and Oflazer, 2010) proposed modeling target-side morphology in a phrase-based factored models framework (Koehn and Hoang, 2007).
    Page 1, “Introduction”
  4. We take as our starting point David Chiang’s Hiero system, which generalizes phrase-based translation to substrings with gaps (Chiang, 2007).
    Page 2, “Hierarchical phrase-based translation”
  5. As shown by Chiang (2007), a weighted grammar of this form can be collected and scored by simple extensions of standard methods for phrase-based translation and efficiently combined with a language model in a CKY decoder to achieve large improvements over a state-of-the-art phrase-based system.
    Page 2, “Hierarchical phrase-based translation”
  6. Although a variety of scores interpolated into the decision rule for phrase-based systems have been investigated over the years, only a handful have been discovered to be consistently useful.
    Page 2, “Hierarchical phrase-based translation”
  7. As a consequence of translating into a morphologically rich language, some inflected forms of target words are unobserved in training data and cannot be generated by the decoder under standard phrase-based approaches.
    Page 6, “Modeling unobserved target inflections”

language model

Appears in 5 sentences as: language model (5)
In An exponential translation model for target language morphology
  1. As shown by Chiang (2007), a weighted grammar of this form can be collected and scored by simple extensions of standard methods for phrase-based translation and efficiently combined with a language model in a CKY decoder to achieve large improvements over a state-of-the-art phrase-based system.
    Page 2, “Hierarchical phrase-based translation”
  2. language model, as described in Chiang (2007).
    Page 4, “Decoding with target-side model dependencies”
  3. In the case of the language model these aspects include any of its target-side words that are part of still incomplete n-grams.
    Page 4, “Decoding with target-side model dependencies”
  4. A 5-gram language model with modified interpolated Kneser-Ney smoothing (Chen and Goodman, 1998) was trained by the SRILM toolkit (Stolcke, 2002) on a set of 208 million running words of text obtained by combining the monolingual Czech text distributed by the 2010
    Page 7, “Corpora and baselines”
  5. The baselines consisted of the language model, two phrase translation models, two lexical models, and a brevity penalty.
    Page 7, “Corpora and baselines”

parallel corpus

Appears in 4 sentences as: parallel corpus (4)
In An exponential translation model for target language morphology
  1. We obtain the counts of instances and features from the standard heuristics used to extract the grammar from a word-aligned parallel corpus.
    Page 2, “Exponential phrase models with shared features”
  2. Note that this model is estimated from the full parallel corpus, rather than a held-out development set.
    Page 3, “Features”
  3. This paper has introduced a scalable exponential phrase model for target languages with complex morphology that can be trained on the full parallel corpus.
    Page 9, “Conclusion”
  4. The results suggest that the model should be especially useful for languages with sparser resources, but that performance improvements can be obtained even for a very large parallel corpus.
    Page 9, “Conclusion”

Treebank

Appears in 4 sentences as: Treebank (2) treebank (2)
In An exponential translation model for target language morphology
  1. The inflection for number is particularly easy to model in translating from English, since it is generally marked on the source side, and POS taggers based on the Penn treebank tag set attempt to infer it in cases where it is not.
    Page 3, “Features”
  2. We investigate the models using the 2009 edition of the parallel treebank from UFAL (Bojar and Zabokrtsky, 2009), containing 8,029,801 sentence pairs from various genres.
    Page 7, “Corpora and baselines”
  3. The English-side annotation follows the standards of the Penn Treebank and includes dependency parses and structural role labels such as subject and object.
    Page 7, “Corpora and baselines”
  4. The Czech tags follow the standards of the Prague Dependency Treebank 2.0.
    Page 7, “Corpora and baselines”

word alignments

Appears in 4 sentences as: word aligned (1) word alignments (2) words aligned (1)
In An exponential translation model for target language morphology
  1. We add inflection features for all words aligned to at least one English verb, adjective, noun, pronoun, or determiner, excepting definite and indefinite articles.
    Page 3, “Features”
  2. These features would be more properly defined based on the identity of the target word aligned to these quantifiers, but little ambiguity seems to arise from this substitution in practice.
    Page 3, “Features”
  3. These dependencies are inferred from source-side annotation via word alignments, as depicted in figure 1, without any use of target-side dependency parses.
    Page 4, “Features”
  4. All conditions use word alignments produced by sequential iterations of IBM model 1, HMM, and IBM model 4 in GIZA++, followed by “diag-and” symmetrization (Koehn et al., 2003).
    Page 7, “Corpora and baselines”

word pairs

Appears in 3 sentences as: word pair (1) word pairs (2)
In An exponential translation model for target language morphology
  1. For word pairs whose source-side word is a verb, we add a feature marking the number of its subject, with separate features for noun and pronoun subjects.
    Page 3, “Features”
  2. For word pairs whose source side is an adjective, we add a feature marking the number of the head of the smallest noun phrase that contains it.
    Page 3, “Features”
  3. For greater speed we estimate the probabilities for the other two models using interpolated Kneser-Ney smoothing (Chen and Goodman, 1998), where the surface form of a rule or an aligned word pair plays the role of a trigram, the pairing of the source surface form with the lemmatized target form plays the role of a bigram, and the source surface form alone plays the role of a unigram.
    Page 6, “Modeling unobserved target inflections”
