A Hybrid Approach to Skeleton-based Translation
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang

Article Structure

Abstract

In this paper we explicitly consider sentence skeleton information for Machine Translation (MT).

Introduction

Current Statistical Machine Translation (SMT) approaches model the translation problem as a process of generating a derivation of atomic translation units, assuming that every unit is drawn out of the same model.

A Skeleton-based Approach to MT

2.1 Skeleton Identification

The first issue that arises is how to identify the skeleton for a given source sentence.

Evaluation

3.1 Experimental Setup

Related Work

The skeleton is a concept that has been used in several subareas of MT for years.

Conclusion and Future Work

We have presented a simple but effective approach to integrating the sentence skeleton information into a phrase-based system.

Topics

language model

Appears in 19 sentences as: language model (19) language modeling (2) language models (2)
In A Hybrid Approach to Skeleton-based Translation
  1. We develop a skeletal language model to describe the probability of the translation skeleton and to handle some long-distance word dependencies.
    Page 1, “Introduction”
  2. In this work, both the skeleton translation model gskel(d) and the full translation model gfull(d) resemble the usual forms used in phrase-based MT, i.e., the model score is computed by a linear combination of a group of phrase-based features and language models.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  3. Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  4. lm(d) and wlm are the score and weight of the language model, respectively.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  5. For language modeling, lm is the standard n-gram language model adopted in the baseline system.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  6. With this string representation, the skeletal language model can be implemented as a standard n-gram language model; that is, a string probability is calculated as a product of a sequence of n-gram probabilities (involving normal words and X).
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  7. To learn the skeletal language model, we replace the non-skeleton parts of the target sentences in the bilingual corpus with Xs, using the source sentence skeletons and word alignments.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  8. The skeletal language model is then trained on these generalized strings in the standard way of n-gram language modeling.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  9. Given a baseline phrase-based system, all we need is to learn the feature weights w and wτ on the development set (with source-language skeleton annotation) and the skeletal language model lmτ on the target-language side of the bilingual corpus.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  10. (7), we can perform standard decoding while “doubly weighting” the phrases which cover a skeletal section of the sentence, and combining the two language models and the translation model in a linear fashion.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  11. A 5-gram language model was trained on the Xinhua portion of the English Gigaword corpus in addition to the target side of the bilingual data.
    Page 4, “Evaluation”
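
The corpus-generalization step in sentence 7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a target word is kept only if it is aligned to at least one skeleton source word, unaligned target words count as non-skeleton, and each maximal run of non-skeleton target words collapses into a single X. The function name generalize_target and the toy sentence are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def generalize_target(
    target: List[str],
    alignment: Iterable[Tuple[int, int]],
    skeleton: Set[int],
) -> List[str]:
    """Rewrite a target sentence into a skeleton string with X placeholders.

    target:    target-language tokens
    alignment: word-alignment links as (source_index, target_index) pairs
    skeleton:  source positions annotated as belonging to the skeleton
    """
    # Assumption: a target word belongs to the skeleton iff it is aligned
    # to at least one skeleton source word (unaligned words do not).
    kept = {t for s, t in alignment if s in skeleton}
    out: List[str] = []
    in_gap = False  # are we inside a run of non-skeleton words?
    for j, word in enumerate(target):
        if j in kept:
            out.append(word)
            in_gap = False
        elif not in_gap:
            out.append("X")  # one X per maximal non-skeleton run
            in_gap = True
    return out


# Hypothetical toy example: source positions 0-2 form the skeleton.
target = ["we", "met", "him", "in", "beijing", "yesterday"]
alignment = [(0, 0), (1, 1), (2, 2), (3, 3), (3, 4), (4, 5)]
print(generalize_target(target, alignment, {0, 1, 2}))  # ['we', 'met', 'him', 'X']
```

Strings such as `we met him X` are the kind of generalized data on which the skeletal language model would then be trained.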

See all papers in Proc. ACL 2014 that mention language model.

translation model

Appears in 14 sentences as: translation model (18) translation models (2)
  1. The basic idea is that we translate the key elements of the input sentence using a skeleton translation model, and then cover the remaining segments using a full translation model.
    Page 1, “Abstract”
  2. Note that the source-language structural information has been intensively investigated in recent studies of syntactic translation models.
    Page 1, “Introduction”
  3. We develop a skeleton-based model which divides translation into two sub-models: a skeleton translation model (i.e., translating the key elements) and a full translation model (i.e., translating the remaining source words and generating the complete translation).
    Page 1, “Introduction”
  4. To compute g(d), we use a linear combination of a skeleton translation model gskel(d) and a full translation model gfull(d):
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  5. $g(d) = g_{\text{skel}}(d) + g_{\text{full}}(d)$   (3), where the skeleton translation model handles the translation of the sentence skeleton, while the full translation model is the baseline model and handles the original problem of translating the whole sentence.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  6. The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles; while the full translation model computes the model score for all those phrase-pairs, i.e., all solid and dashed rectangles.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  7. In this work, both the skeleton translation model gskel(d) and the full translation model gfull(d) resemble the usual forms used in phrase-based MT, i.e., the model score is computed by a linear combination of a group of phrase-based features and language models.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  8. Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  9. Both gskel(d) and gfull(d) share the same translation model m, which can easily be learned from the bilingual data¹.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  10. Note that this method does not require any new translation models for implementation.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  11. (7), we can perform standard decoding while “doubly weighting” the phrases which cover a skeletal section of the sentence, and combining the two language models and the translation model in a linear fashion.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
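
The decomposition g(d) = gskel(d) + gfull(d), where both terms share the translation model m but use separate weights and language models, can be sketched as below. Assumptions (not taken from the paper): each phrase pair carries a dictionary of log-scale translation-model features, lm and lmτ are any callables returning a log-probability for a token list, and the skeleton sub-derivation dτ keeps the skeleton phrase pairs while each run of non-skeleton phrases becomes one X. All class and function names are hypothetical, and reordering and other features are omitted. Because skeleton phrase pairs contribute to both terms, the sketch reproduces the "double weighting" described in sentence 11.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

LM = Callable[[List[str]], float]  # returns a log-probability for a token list


@dataclass
class PhrasePair:
    tgt: List[str]           # target words of this phrase pair, in output order
    feats: Dict[str, float]  # translation-model features f_m (log-scale)
    in_skeleton: bool        # does the source side cover a skeleton segment?


def g(pairs: List[PhrasePair], target: List[str],
      w_m: Dict[str, float], w_lm: float, lm: LM) -> float:
    """Eq. (4): g(d; w, m, lm) = w_m . f_m(d) + w_lm * lm(d)."""
    f_m: Dict[str, float] = {}
    for p in pairs:  # f_m(d) sums the phrase-level features over the derivation
        for name, value in p.feats.items():
            f_m[name] = f_m.get(name, 0.0) + value
    tm_score = sum(w_m.get(name, 0.0) * value for name, value in f_m.items())
    return tm_score + w_lm * lm(target)


def skeleton_derivation(d: List[PhrasePair]) -> Tuple[List[PhrasePair], List[str]]:
    """d_tau: skeleton phrase pairs plus a target string where X marks each gap."""
    pairs: List[PhrasePair] = []
    target: List[str] = []
    in_gap = False
    for p in d:
        if p.in_skeleton:
            pairs.append(p)
            target.extend(p.tgt)
            in_gap = False
        elif not in_gap:
            target.append("X")
            in_gap = True
    return pairs, target


def hybrid_score(d: List[PhrasePair],
                 w_m: Dict[str, float], w_lm: float, lm: LM,
                 w_m_tau: Dict[str, float], w_lm_tau: float, lm_tau: LM) -> float:
    """Eqs. (3), (5), (6): g(d) = g_skel(d) + g_full(d)."""
    full_target = [t for p in d for t in p.tgt]
    g_full = g(d, full_target, w_m, w_lm, lm)                       # Eq. (6)
    skel_pairs, skel_target = skeleton_derivation(d)
    g_skel = g(skel_pairs, skel_target, w_m_tau, w_lm_tau, lm_tau)  # Eq. (5)
    return g_skel + g_full


def toy_lm(toks: List[str]) -> float:
    """Hypothetical stand-in LM: log-probability of -1 per token."""
    return -float(len(toks))


d = [PhrasePair(["we"], {"tm": -1.0}, True),
     PhrasePair(["met"], {"tm": -2.0}, True),
     PhrasePair(["in", "beijing"], {"tm": -3.0}, False)]
print(hybrid_score(d, {"tm": 1.0}, 1.0, toy_lm, {"tm": 0.5}, 2.0, toy_lm))  # -17.5
```

Here gfull(d) = -6 - 4 = -10 and gskel(d) = 0.5 × (-3) + 2 × (-3) = -7.5, so the two skeleton phrase pairs are scored twice.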

phrase-based

Appears in 13 sentences as: phrase-based (14)
  1. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
    Page 1, “Abstract”
  2. The simplest of these is the phrase-based approach (Och et al., 1999; Koehn et al., 2003) which employs a global model to process any sub-strings of the input sentence.
    Page 1, “Introduction”
  3. However, these approaches suffer from the same problem as their phrase-based counterpart: they use a single global model to handle all translation units, no matter whether the units come from the skeleton of the input tree/sentence or from other, less important substructures.
    Page 1, “Introduction”
  4. We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 1, “Introduction”
  5. As is standard in SMT, we further assume that 1) the translation process can be decomposed into a derivation of phrase-pairs (for phrase-based models) or translation rules (for syntax-based models), and 2) a linear function is used to assign a model score to each derivation.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  6. See Figure 1 for an example of applying the above model to phrase-based MT.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  7. While we restrict ourselves to phrase-based translation in the following description and experiments, we can choose different models/features for gskel(d) and gfull(d).
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  8. E.g., one may introduce syntactic features into gskel(d) because of their ability to capture structural information, and employ a standard phrase-based model for gfull(d), in which not all segments of the sentence need to respect syntactic constraints.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  9. In this work, both the skeleton translation model gskel(d) and the full translation model gfull(d) resemble the usual forms used in phrase-based MT, i.e., the model score is computed by a linear combination of a group of phrase-based features and language models.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  10. In phrase-based MT, the translation problem is modeled by a derivation of phrase-pairs.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  11. This model makes skeleton translation and full translation much simpler, because both are performed as string translation, in the same way as in phrase-based MT.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”

model score

Appears in 9 sentences as: Model Score (1) model score (6) model scores (2)
  1. As is standard in SMT, we further assume that 1) the translation process can be decomposed into a derivation of phrase-pairs (for phrase-based models) or translation rules (for syntax-based models), and 2) a linear function is used to assign a model score to each derivation.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  2. The above problem can be redefined in a Viterbi fashion: we find the derivation d with the highest model score given s and τ:
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  3. The skeleton translation model focuses on the translation of the sentence skeleton, i.e., the solid (red) rectangles; while the full translation model computes the model score for all those phrase-pairs, i.e., all solid and dashed rectangles.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  4. 2.3 Model Score Computation
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  5. In this work, both the skeleton translation model gskel(d) and the full translation model gfull(d) resemble the usual forms used in phrase-based MT, i.e., the model score is computed by a linear combination of a group of phrase-based features and language models.
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  6. Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  7. Then, we can simply define gskel(d) and gfull(d) as the model scores of dτ and d:
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  8. ¹ In gskel(d), we compute the reordering model score on the skeleton, although the reordering model is learned from full sentences.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  9. Figure 1 shows the translation process and associated model scores for the example sentence.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
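
The search in sentence 2 (find the highest-scoring derivation given s and τ) can be illustrated on a toy input. This sketch swaps the real decoder's beam search for brute-force enumeration of monotone segmentations, which only works for very short inputs and ignores reordering and language models. The phrase table, its scores, and the rule that a phrase counts as skeletal only when all of its source words are skeleton words are hypothetical. Each phrase's log-score enters the gfull part, and skeletal phrases enter the gskel part as well.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Set, Tuple

# Hypothetical toy phrase table: source phrase -> [(target phrase, log-score)]
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]


def segmentations(start: int, n: int, max_len: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield every split of source positions start..n-1 into contiguous spans."""
    if start == n:
        yield []
        return
    for end in range(start + 1, min(n, start + max_len) + 1):
        for rest in segmentations(end, n, max_len):
            yield [(start, end)] + rest


def best_derivation(src: List[str], table: PhraseTable, skeleton: Set[int],
                    w_full: float = 1.0, w_skel: float = 1.0,
                    max_len: int = 3) -> Tuple[Optional[str], float]:
    """Brute-force argmax over monotone derivations (no reordering, no LM)."""
    best, best_score = None, float("-inf")
    for spans in segmentations(0, len(src), max_len):
        options = [table.get(tuple(src[i:j]), []) for i, j in spans]
        if not all(options):
            continue  # some span has no phrase-table entry
        for choice in product(*options):
            score = 0.0
            for (i, j), (_, s) in zip(spans, choice):
                score += w_full * s                   # g_full part
                if all(k in skeleton for k in range(i, j)):
                    score += w_skel * s               # g_skel part
            if score > best_score:
                best, best_score = " ".join(t for t, _ in choice), score
    return best, best_score


table: PhraseTable = {
    ("a",): [("A", -1.0)],
    ("b",): [("B", -1.0)],
    ("a", "b"): [("AB", -1.5)],
    ("c",): [("C", -0.5), ("C2", -0.4)],
}
print(best_derivation(["a", "b", "c"], table, skeleton={0, 1}))
```

On this toy input the winner is "AB C2" with a score of about -3.4: the skeletal phrase "AB" is counted twice, yet it still beats translating "a" and "b" separately.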

BLEU

Appears in 6 sentences as: BLEU (6)
  1. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
    Page 1, “Abstract”
  2. We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 1, “Introduction”
  3. Table 1 shows the case-insensitive IBM-version BLEU and TER scores of different systems.
    Page 4, “Evaluation”
  4. As seen from row -lmτ of Table 1, removing the skeletal language model results in a significant drop in both BLEU and TER performance.
    Page 4, “Evaluation”
  5. Row s-space of Table 1 shows the BLEU and TER results of restricting the baseline system to the space of skeleton-consistent derivations, i.e., we remove both the skeleton-based translation model and language model from the SBMT system.
    Page 5, “Evaluation”
  6. The experimental results show that the proposed approach achieves very promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 5, “Conclusion and Future Work”

lm

Appears in 6 sentences as: lm (7)
  1. Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  2. $g(d; w, m, lm) = w_m \cdot f_m(d) + w_{lm} \cdot lm(d)$   (4)
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  3. lm(d) and wlm are the score and weight of the language model, respectively.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  4. $g_{\text{skel}}(d) \equiv g(d_\tau; w_\tau, m, lm_\tau)$   (5),   $g_{\text{full}}(d) \equiv g(d; w, m, lm)$   (6)
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  5. For language modeling, lm is the standard n-gram language model adopted in the baseline system.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  6. $\hat{d} = \arg\max_{d} \big( w_m \cdot f_m(d) + w_{lm} \cdot lm(d) + \cdots$
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”

TER

Appears in 6 sentences as: TER (6)
  1. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
    Page 1, “Abstract”
  2. We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 1, “Introduction”
  3. Table 1 shows the case-insensitive IBM-version BLEU and TER scores of different systems.
    Page 4, “Evaluation”
  4. As seen from row -lmτ of Table 1, removing the skeletal language model results in a significant drop in both BLEU and TER performance.
    Page 4, “Evaluation”
  5. Row s-space of Table 1 shows the BLEU and TER results of restricting the baseline system to the space of skeleton-consistent derivations, i.e., we remove both the skeleton-based translation model and language model from the SBMT system.
    Page 5, “Evaluation”
  6. The experimental results show that the proposed approach achieves very promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 5, “Conclusion and Future Work”

baseline system

Appears in 4 sentences as: baseline system (4)
  1. For language modeling, lm is the standard n-gram language model adopted in the baseline system.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  2. Row s-space of Table 1 shows the BLEU and TER results of restricting the baseline system to the space of skeleton-consistent derivations, i.e., we remove both the skeleton-based translation model and language model from the SBMT system.
    Page 5, “Evaluation”
  3. We see that the restricted search space slightly hurts the baseline system.
    Page 5, “Evaluation”
  4. Further, we treated skeleton-consistency of a derivation as an indicator feature and introduced it into the baseline system.
    Page 5, “Evaluation”
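
The indicator feature in sentence 4 can be sketched as a function that returns 1 for a skeleton-consistent derivation and 0 otherwise. The excerpts here do not define skeleton-consistency precisely, so this sketch assumes a hypothetical definition: no phrase's source span mixes skeleton and non-skeleton words. Derivations are represented only by their source spans [i, j), and the function names are illustrative.

```python
from typing import Iterable, Set, Tuple


def is_skeleton_consistent(spans: Iterable[Tuple[int, int]], skeleton: Set[int]) -> bool:
    """Assumed definition: every source span [i, j) lies entirely inside
    or entirely outside the skeleton positions."""
    for i, j in spans:
        inside = [k in skeleton for k in range(i, j)]
        if any(inside) and not all(inside):
            return False
    return True


def skeleton_indicator(spans: Iterable[Tuple[int, int]], skeleton: Set[int]) -> float:
    """Indicator feature value that would be added to the baseline features."""
    return 1.0 if is_skeleton_consistent(spans, skeleton) else 0.0


print(skeleton_indicator([(0, 2), (2, 3)], {0, 1}))  # 1.0
print(skeleton_indicator([(0, 1), (1, 3)], {0, 1}))  # 0.0
```

Under this reading, the s-space setting would discard every derivation whose value is 0, whereas the indicator feature only rewards consistent derivations and leaves the strength of that preference to the tuned feature weight.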

Chinese-English

Appears in 4 sentences as: Chinese-English (4)
  1. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
    Page 1, “Abstract”
  2. We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 1, “Introduction”
  3. We experimented with our approach on Chinese-English translation using the NiuTrans open-source MT toolkit (Xiao et al., 2012).
    Page 4, “Evaluation”
  4. It contains the annotation of sentence skeletons on the Chinese side of the Penn Parallel Chinese-English Treebank (LDC2003E07).
    Page 4, “Evaluation”

feature weights

Appears in 4 sentences as: feature weight (1) feature weights (3)
  1. Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
    Page 2, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  2. On the other hand, it has different feature weight vectors for the individual models (i.e., w and wτ).
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  3. Given a baseline phrase-based system, all we need is to learn the feature weights w and wτ on the development set (with source-language skeleton annotation) and the skeletal language model lmτ on the target-language side of the bilingual corpus.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  4. All feature weights were learned using minimum error rate training (Och, 2003).
    Page 4, “Evaluation”

NIST

Appears in 4 sentences as: NIST (4)
  1. We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data.
    Page 1, “Abstract”
  2. We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 1, “Introduction”
  3. We used the newswire portion of the NIST MT06 evaluation data as our development set, and used the evaluation data of MT04 and MT05 as our test sets.
    Page 4, “Evaluation”
  4. The experimental results show that the proposed approach achieves very promising BLEU improvements and TER reductions on the NIST evaluation data.
    Page 5, “Conclusion and Future Work”

MT system

Appears in 3 sentences as: MT system (3)
  1. We see, first of all, that the MT system benefits from our approach in most cases.
    Page 4, “Evaluation”
  2. Apart from showing the effects of the skeleton-based model, we also studied the behavior of the MT system under different search-space settings.
    Page 5, “Evaluation”
  3. More importantly, we develop a complete approach to this issue and show its effectiveness in a state-of-the-art MT system.
    Page 5, “Related Work”

n-gram

Appears in 3 sentences as: n-gram (4)
  1. For language modeling, lm is the standard n-gram language model adopted in the baseline system.
    Page 3, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  2. With this string representation, the skeletal language model can be implemented as a standard n-gram language model; that is, a string probability is calculated as a product of a sequence of n-gram probabilities (involving normal words and X).
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
  3. The skeletal language model is then trained on these generalized strings in the standard way of n-gram language modeling.
    Page 4, “A Skeleton-based Approach to MT 2.1 Skeleton Identification”
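
Sentences 2 and 3 describe the skeletal language model as an ordinary n-gram model over strings that contain X. Below is a minimal sketch with n = 2 and unsmoothed maximum-likelihood estimates. The order and smoothing of the actual skeletal model are not given in these excerpts, so both are assumptions, and the class name BigramLM is hypothetical. X is treated like any other vocabulary word.

```python
from collections import Counter
from typing import List


class BigramLM:
    """Unsmoothed maximum-likelihood bigram model; X is an ordinary token."""

    def __init__(self, corpus: List[List[str]]) -> None:
        self.history = Counter()  # c(w_{i-1}), counted only as a history
        self.bigram = Counter()   # c(w_{i-1}, w_i)
        for sent in corpus:
            toks = ["<s>"] + sent + ["</s>"]
            self.history.update(toks[:-1])
            self.bigram.update(zip(toks[:-1], toks[1:]))

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev) = c(prev, word) / c(prev); 0.0 for unseen histories."""
        if self.history[prev] == 0:
            return 0.0
        return self.bigram[(prev, word)] / self.history[prev]

    def sentence_prob(self, sent: List[str]) -> float:
        """Product of bigram probabilities over <s> sent </s>."""
        toks = ["<s>"] + sent + ["</s>"]
        p = 1.0
        for prev, word in zip(toks[:-1], toks[1:]):
            p *= self.prob(prev, word)
        return p


# Hypothetical generalized training strings (skeleton words plus X).
lm = BigramLM([["we", "met", "X"], ["we", "X"]])
print(lm.sentence_prob(["we", "X"]))  # 0.5
```

Here P(we | <s>) = 1, P(X | we) = 1/2 and P(</s> | X) = 1, which gives 0.5. A real system would use a higher order and a smoothing method so that unseen skeleton strings do not receive zero probability.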
