Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
Hendra Setiawan, Bowen Zhou, Bing Xiang and Libin Shen

Article Structure

Abstract

Long distance reordering remains one of the greatest challenges in statistical machine translation research, as the key contextual information may well be beyond the confines of translation units.

Introduction

Long distance reordering remains one of the greatest challenges in Statistical Machine Translation (SMT) research.

Two-Neighbor Orientation Model

Given an aligned sentence pair Φ = (F, E, N), let A(Φ) be all possible chunks that can be extracted from Φ.

Maximal Orientation Span

As shown in Eq.

Model Decomposition and Variants

To make the model more tractable, we decompose P_TNO in Eq.

Training

The TNO model training consists of two different training regimes: 1) discriminative, for training P_OL and P_OR; and 2) generative, for training P_ML and P_MR.

Decoding

Integrating the TNO Model into syntax-based SMT systems is nontrivial, especially with the MOS modeling.

Experiments

Our baseline system is a state-of-the-art string-to-dependency system (Shen et al., 2008).

Related Work

Our work intersects with existing work in many different respects.

Conclusion

We presented a novel approach to address a kind of long-distance reordering that requires global cross-boundary contextual information.

Topics

BLEU

Appears in 10 sentences as: BLEU (13)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
    Page 1, “Abstract”
  2. MT08 nw: BLEU, TER; MT08 wb: BLEU, TER
    Page 8, “Experiments”
  3. The best TER and BLEU results on each genre are in bold.
    Page 8, “Experiments”
  4. For BLEU , higher scores are better, while for TER, lower scores are better.
    Page 8, “Experiments”
  5. In columns 2 and 4, we report the BLEU scores, while in columns 3 and 5, we report the TER scores.
    Page 8, “Experiments”
  6. Model 1 provides most of the gain, around +0.9 to +1.2 BLEU and -0.5 to -1.1 TER across the two genres.
    Page 8, “Experiments”
  7. Model 2, which conditions P_OL on O_R, provides an additional +0.2 BLEU improvement consistently across the two genres.
    Page 8, “Experiments”
  8. In newswire, Model 3 gives an additional +0.4 BLEU and -0.2 TER, while in weblog, it gives a stronger improvement of an additional +0.5 BLEU and -0.3 TER.
    Page 8, “Experiments”
  9. The inclusion of explicit MOS modeling in Model 4 gives a significant BLEU score improvement of +0.5 but no TER improvement in newswire.
    Page 8, “Experiments”
  10. In weblog, Model 4 gives mixed results: a +0.2 BLEU improvement but a +0.6 TER degradation.
    Page 8, “Experiments”
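The BLEU deltas listed above compare candidate translations against references via modified n-gram precision with a brevity penalty. As background, here is a minimal sentence-level sketch (real evaluations use corpus-level statistics; the add-one smoothing and function names here are illustrative, not the scorer used in the paper):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty (add-one smoothed to avoid log(0))."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        log_prec += math.log((clipped + 1) / (total + 1)) / max_n
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_prec)

cand = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(bleu(cand, ref), 3))
```

A +1.0 BLEU delta on this 0-to-1 scale corresponds to the "+1.0 BLEU" convention above after multiplying by 100.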

TER

Appears in 10 sentences as: TER (13)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
    Page 1, “Abstract”
  2. MT08 nw: BLEU, TER; MT08 wb: BLEU, TER
    Page 8, “Experiments”
  3. The best TER and BLEU results on each genre are in bold.
    Page 8, “Experiments”
  4. For BLEU, higher scores are better, while for TER , lower scores are better.
    Page 8, “Experiments”
  5. In columns 2 and 4, we report the BLEU scores, while in columns 3 and 5, we report the TER scores.
    Page 8, “Experiments”
  6. Model 1 provides most of the gain, around +0.9 to +1.2 BLEU and -0.5 to -1.1 TER across the two genres.
    Page 8, “Experiments”
  7. In newswire, Model 3 gives an additional +0.4 BLEU and -0.2 TER, while in weblog, it gives a stronger improvement of an additional +0.5 BLEU and -0.3 TER .
    Page 8, “Experiments”
  8. The inclusion of explicit MOS modeling in Model 4 gives a significant BLEU score improvement of +0.5 but no TER improvement in newswire.
    Page 8, “Experiments”
  9. In weblog, Model 4 gives mixed results: a +0.2 BLEU improvement but a +0.6 TER degradation.
    Page 8, “Experiments”
  10. Our most advanced model gives a significant improvement of +1.8 BLEU/-0.8 TER in the newswire domain and +2.1 BLEU/-1.0 TER in weblog over a strong string-to-dependency syntax-based SMT system enhanced with additional state-of-the-art features.
    Page 8, “Experiments”
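TER (Translation Edit Rate), for which lower is better, counts the minimum number of insertions, deletions, substitutions, and block shifts needed to turn the hypothesis into the reference, normalized by reference length. Ignoring shifts, it reduces to word-level edit distance, which this sketch computes (an illustration only, not the official TER scorer):

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance via dynamic programming."""
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def ter_no_shifts(hyp, ref):
    """Shift-free TER approximation: edits / reference length."""
    return edit_distance(hyp, ref) / len(ref)

hyp = "the cat sat on mat".split()
ref = "the cat is on the mat".split()
print(ter_no_shifts(hyp, ref))
```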

contextual information

Appears in 7 sentences as: contextual information (7)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. Long distance reordering remains one of the greatest challenges in statistical machine translation research, as the key contextual information may well be beyond the confines of translation units.
    Page 1, “Abstract”
  2. Often, such reordering decisions require contexts that span across multiple translation units. Unfortunately, previous approaches fall short in capturing such cross-unit contextual information that could be
    Page 1, “Introduction”
  3. In this paper, we argue that reordering modeling would greatly benefit from richer cross-boundary contextual information
    Page 1, “Introduction”
  4. We introduce a reordering model that incorporates such contextual information , named the Two-Neighbor Orientation (TNO) model.
    Page 1, “Introduction”
  5. As shown, the empirical results confirm our intuition that SMT can greatly benefit from a reordering model that incorporates cross-unit contextual information.
    Page 8, “Experiments”
  6. We presented a novel approach to address a kind of long-distance reordering that requires global cross-boundary contextual information .
    Page 9, “Conclusion”
  7. Empirical results confirm our intuition that incorporating cross-boundary contextual information improves translation quality.
    Page 9, “Conclusion”

phrase-based

Appears in 7 sentences as: phrase-based (7)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. We define translation units as phrases in phrase-based SMT, and as translation rules in syntax-based SMT.
    Page 1, “Introduction”
  2. Specifically, the popular distortion or lexicalized reordering models in phrase-based SMT focus only on making good local prediction (i.e.
    Page 1, “Introduction”
  3. Even though the experiments carried out in this paper employ SCFG-based SMT systems, we would like to point out that our model is applicable to other systems, including phrase-based SMT systems.
    Page 2, “Introduction”
  4. We use a hierarchical phrase-based translation system as a case in point, but the merit is generalizable to other systems.
    Page 3, “Maximal Orientation Span”
  5. Additionally, this illustration shows a case where MOS acts as a cross-boundary context which effectively relaxes the context-free assumption of the hierarchical phrase-based formalism.
    Page 4, “Maximal Orientation Span”
  6. Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007).
    Page 9, “Related Work”
  7. Our MOS concept is also closely related to the hierarchical reordering model (Galley and Manning, 2008) in phrase-based decoding, which computes o of b with respect to a multi-block unit that may go beyond b′.
    Page 9, “Related Work”

SMT system

Appears in 6 sentences as: SMT system (4) SMT systems (3)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system , which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation.
    Page 2, “Introduction”
  2. We show the efficacy of our proposal in a large-scale Chinese-to-English translation task where the introduction of our TNO model provides a significant gain over a state-of-the-art string-to-dependency SMT system (Shen et al., 2008) that we enhance with additional state-of-the-art features.
    Page 2, “Introduction”
  3. Even though the experiments carried out in this paper employ SCFG-based SMT systems, we would like to point out that our model is applicable to other systems, including phrase-based SMT systems.
    Page 2, “Introduction”
  4. Each of these factors will act as an additional feature in the log-linear framework of our SMT system .
    Page 5, “Model Decomposition and Variants”
  5. Integrating the TNO Model into syntax-based SMT systems is nontrivial, especially with the MOS modeling.
    Page 7, “Decoding”
  6. We describe four versions of the model and implement an algorithm to integrate our proposed model into a syntax-based SMT system .
    Page 9, “Conclusion”

jointly model

Appears in 5 sentences as: joint modeling (1) jointly model (2) jointly models (2)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. In this paper, we propose the Two-Neighbor Orientation (TNO) model, which jointly models the orientation decisions between anchors and two neighboring multi-unit chunks that may cross phrase or rule boundaries.
    Page 1, “Abstract”
  2. Then, we jointly model the orientations of chunks that immediately precede and follow the anchors (hence, the name “two-neighbor”) along with the maximal span of these chunks, to which we refer as Maximal Orientation Span (MOS).
    Page 1, “Introduction”
  3. To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation.
    Page 2, “Introduction”
  4. Our Two-Neighbor Orientation (TNO) model designates A ⊆ A(Φ) as anchors and jointly models the orientation of chunks that appear immediately to the left and to the right of the anchors as well as the identities of these chunks.
    Page 2, “Two-Neighbor Orientation Model”
  5. Our approach, which we formulate as a Two-Neighbor Orientation model, includes the joint modeling of two orientation decisions and the modeling of the maximal span of the reordered chunks through the concept of Maximal Orientation Span.
    Page 9, “Conclusion”

part-of-speech

Appears in 5 sentences as: part-of-speech (5)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. In our experiments, we use a simple heuristic based on part-of-speech tags, which will be described in Section 7.
    Page 2, “Two-Neighbor Orientation Model”
  2. In total, we consider 21 part-of-speech tags, some of which are as follows: VC (copula), DEC, DEG, DER, DEV (de-related), PU (punctuation), AD (adverbs) and P (prepositions).
    Page 5, “Training”
  3. We train the classifiers on a rich set of binary features, ranging from lexical to part-of-speech (POS) to syntactic features.
    Page 6, “Training”
  4. 1. anchor-related: slex (the anchor's actual source word), spos (the part-of-speech (POS) tag of slex), sparent (spos's parent in the parse tree), tlex (the anchor's actual target word).
    Page 6, “Training”
  5. Subsequent improvements use the P(0|b, b’) formula, for example, for incorporating various linguistics feature like part-of-speech (Zens and Ney, 2006), syntactic (Chang et al., 2009), dependency information (Bach et al., 2009) and predicate-argument structure (Xiong et al., 2012).
    Page 9, “Related Work”

POS tag

Appears in 5 sentences as: POS tag (7)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. 2. surrounding: lslex (the previous source word), rslex (the next source word), lspos (lslex's POS tag), rspos (rslex's POS tag), lsparent (lslex's parent), rsparent (rslex's parent)
    Page 6, “Training”
  2. 3. nonlocal: lanchorslex (the previous anchor's word), ranchorslex (the next anchor's word), lanchorspos (lanchorslex's POS tag), ranchorspos (ranchorslex's POS tag).
    Page 6, “Training”
  3. of mosl_int_spos (mosl_int_slex's POS tag), mosl_ext_spos (mosl_ext_slex's POS tag), mosr_int_slex (the actual word
    Page 6, “Training”
  4. of mosr_ext_slex (the actual word of …), mosr_int_spos (mosr_int_slex's POS tag),
    Page 6, “Training”
  5. mosr_ext_spos (mosr_ext_slex's POS tag).
    Page 6, “Training”
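The templates above (anchor word, neighboring words, their POS tags, parents, and nearby anchors) are binary indicator features for the orientation classifiers. A sketch of how such templates might be instantiated into feature strings; the helper name and toy tokens are illustrative, not the paper's implementation, and parse-tree-parent and neighboring-anchor templates are omitted:

```python
def extract_features(words, pos, anchor_idx):
    """Fire binary indicator features around an anchor position,
    mimicking templates like slex, spos, lslex, lspos, rslex, rspos."""
    feats = []
    feats.append(f"slex={words[anchor_idx]}")   # anchor's source word
    feats.append(f"spos={pos[anchor_idx]}")     # anchor's POS tag
    if anchor_idx > 0:                          # left neighbor
        feats.append(f"lslex={words[anchor_idx - 1]}")
        feats.append(f"lspos={pos[anchor_idx - 1]}")
    if anchor_idx < len(words) - 1:             # right neighbor
        feats.append(f"rslex={words[anchor_idx + 1]}")
        feats.append(f"rspos={pos[anchor_idx + 1]}")
    return feats

words = ["wo", "de", "shu"]    # toy romanized Chinese tokens
pos = ["PN", "DEG", "NN"]      # DEG marks the anchor, per the heuristic
print(extract_features(words, pos, 1))
```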

sentence pair

Appears in 5 sentences as: sentence pair (4) sentence pairs (1)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. Given an aligned sentence pair Φ = (F, E, N), let A(Φ) be all possible chunks that can be extracted from Φ.
    Page 2, “Two-Neighbor Orientation Model”
  2. Figure 1: An aligned Chinese-English sentence pair .
    Page 3, “Two-Neighbor Orientation Model”
  3. To be more concrete, let us consider an aligned sentence pair in Fig.
    Page 3, “Two-Neighbor Orientation Model”
  4. For each aligned sentence pair (F, E, N) in the training data, the training starts with the identification of the regions in the source sentences as anchors (A).
    Page 5, “Training”
  5. To train our Two-Neighbor Orientation model, we select a subset of 5 million aligned sentence pairs .
    Page 8, “Experiments”

word alignment

Appears in 4 sentences as: word alignment (3) words aligned (1)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. We first identify anchors as regions in the source sentences around which ambiguous reordering patterns frequently occur and chunks as regions that are consistent with word alignment which may span multiple translation units at decoding time.
    Page 1, “Introduction”
  2. We also attach the word indices as superscripts of the source words and project the indices to the aligned target words, such that "have5" indicates that the word "have" is aligned to the 5th source word, i.e.
    Page 4, “Maximal Orientation Span”
  3. Note that to facilitate the projection, the rules must come with internal word alignment in practice.
    Page 4, “Maximal Orientation Span”
  4. the source sentence, the complete translation and the word alignment .
    Page 4, “Maximal Orientation Span”

translation task

Appears in 4 sentences as: translation task (4)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. We integrate our proposed model into a state-of-the-art string-to-dependency translation system and demonstrate the efficacy of our proposal in a large-scale Chinese-to-English translation task .
    Page 1, “Abstract”
  2. We show the efficacy of our proposal in a large-scale Chinese-to-English translation task where the introduction of our TNO model provides a significant gain over a state-of-the-art string-to-dependency SMT system (Shen et al., 2008) that we enhance with additional state-of-the-art features.
    Page 2, “Introduction”
  3. Here, we would like to point out that even in this simple example, where all local decisions are accurate, this ambiguity occurs, and it would occur even more in a real translation task, where local decisions may be highly inaccurate.
    Page 4, “Maximal Orientation Span”
  4. In a large scale Chinese-to-English translation task , we achieve a significant improvement over a strong baseline.
    Page 9, “Conclusion”

NIST

Appears in 4 sentences as: NIST (4)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. On NIST MT08 set, our most advanced model brings around +2.0 BLEU and -1.0 TER improvement.
    Page 1, “Abstract”
  2. As for the blind test set, we report the performance on the NIST MT08 evaluation set, which consists of 691 sentences from newswire and 666 sentences from weblog.
    Page 8, “Experiments”
  3. Table 4 summarizes the experimental results on NIST MT08 newswire and weblog.
    Page 8, “Experiments”
  4. Table 4: The NIST MT08 results on newswire (nw) and weblog (wb) genres.
    Page 8, “Experiments”

BLEU score

Appears in 4 sentences as: BLEU score (3) BLEU scores (1)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. In columns 2 and 4, we report the BLEU scores , while in columns 3 and 5, we report the TER scores.
    Page 8, “Experiments”
  2. Model 2, which conditions P_OL on O_R, provides an additional +0.2 BLEU improvement consistently across the two genres.
    Page 8, “Experiments”
  3. The inclusion of explicit MOS modeling in Model 4 gives a significant BLEU score improvement of +0.5 but no TER improvement in newswire.
    Page 8, “Experiments”
  4. In weblog, Model 4 gives mixed results: a +0.2 BLEU improvement but a +0.6 TER degradation.
    Page 8, “Experiments”

shift-reduce

Appears in 3 sentences as: shift-reduce (3)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. We implement an efficient shift-reduce algorithm that facilitates the accumulation of partial context in a bottom-up fashion, allowing our model to influence the translation process even in the absence of full context.
    Page 2, “Introduction”
  2. In Section 6, we describe our shift-reduce algorithm which integrates
    Page 2, “Introduction”
  3. The algorithm bears a close resemblance to the shift-reduce algorithm, where a stack is used to accumulate (partial) information about a, M_L and M_R for each a ∈ A in the derivation.
    Page 7, “Decoding”
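The stack-based accumulation can be pictured with a generic shift-reduce loop: shift pushes a new partial item, and reduce merges the top of the stack so that partial context grows bottom-up. This is a schematic sketch of that bookkeeping under assumed, simplified data (source spans), not the paper's actual decoder integration:

```python
def shift_reduce(items, can_reduce, merge):
    """Generic shift-reduce accumulation: push items left to right,
    and whenever the top two stack entries can be combined, merge
    them so partial information is accumulated bottom-up."""
    stack = []
    for item in items:
        stack.append(item)            # shift
        while len(stack) >= 2 and can_reduce(stack[-2], stack[-1]):
            right = stack.pop()       # reduce: merge top two entries
            left = stack.pop()
            stack.append(merge(left, right))
    return stack

# Toy use: merge adjacent source spans into maximal contiguous spans,
# loosely analogous to growing a chunk's span during decoding.
spans = [(0, 1), (1, 2), (4, 5), (5, 6), (6, 7)]
adjacent = lambda a, b: a[1] == b[0]
join = lambda a, b: (a[0], b[1])
print(shift_reduce(spans, adjacent, join))
```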

phrase pair

Appears in 3 sentences as: phrase pair (3) phrase pairs (1)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. We represent a chunk as a source and target phrase pair (f_{j}^{j'}/e_{i}^{i'}) where the subscript and the superscript indicate the
    Page 2, “Two-Neighbor Orientation Model”
  2. UOM views reordering as a process of generating (b, o) in a left-to-right fashion, where b is the current phrase pair and o is the orientation of b with respect to the previously generated phrase pair b′.
    Page 9, “Related Work”
  3. Tillmann and Zhang (2007) proposed a Bigram Orientation Model (BOM) to include both phrase pairs b and b′ in the model.
    Page 9, “Related Work”

Chinese-English

Appears in 3 sentences as: Chinese-English (3)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. Figure 1: An aligned Chinese-English sentence pair.
    Page 3, “Two-Neighbor Orientation Model”
  2. For our Chinese-English experiments, we use a simple heuristic that treats as anchors single-word chunks whose word class belongs to the closed word classes, bearing a close resemblance to (Setiawan et al., 2007).
    Page 5, “Training”
  3. The system is trained on 10 million parallel sentences that are available to the Phase 1 of the DARPA BOLT Chinese-English MT task.
    Page 8, “Experiments”

unigram

Appears in 3 sentences as: Unigram (1) unigram (2)
In Two-Neighbor Orientation Model with Cross-Boundary Global Contexts
  1. In this way, we hope to upgrade the unigram formulation of existing reordering models to a higher-order formulation.
    Page 1, “Introduction”
  2. As the backbone of our string-to-dependency system, we train 3-gram models for left and right dependencies and a unigram model for heads using the target side of the bilingual training data.
    Page 8, “Experiments”
  3. Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007).
    Page 9, “Related Work”
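The Unigram Orientation Model mentioned above estimates, for each phrase pair, a distribution over orientations (monotone, swap, discontinuous) relative to the previously translated phrase. A hedged sketch of counting such orientations from phrase source spans; the data layout and adjacency-based classification are simplifying assumptions, since real training extracts orientations from word-aligned corpora:

```python
from collections import Counter, defaultdict

def orientation(prev, cur):
    """Classify the orientation of the current phrase w.r.t. the
    previous one; items are (phrase, src_start, src_end) tuples."""
    if cur[1] == prev[2]:
        return "monotone"       # current starts where previous ended
    if cur[2] == prev[1]:
        return "swap"           # current ends where previous started
    return "discontinuous"

def train_uom(derivations):
    """Count orientations per phrase across derivations and
    normalize the counts into probabilities."""
    counts = defaultdict(Counter)
    for deriv in derivations:
        for prev, cur in zip(deriv, deriv[1:]):
            counts[cur[0]][orientation(prev, cur)] += 1
    return {p: {o: c / sum(cnt.values()) for o, c in cnt.items()}
            for p, cnt in counts.items()}

# Toy derivations: "b" follows monotonically once and swaps once.
derivs = [[("a", 0, 1), ("b", 1, 2)],
          [("c", 2, 3), ("b", 1, 2)]]
print(train_uom(derivs))
```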
