Discriminative Pruning for Discriminative ITG Alignment
Liu, Shujie and Li, Chi-Ho and Zhou, Ming

Article Structure

Abstract

While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed.

Introduction

Inversion transduction grammar (ITG) (Wu, 1997) is an adaptation of SCFG to bilingual parsing.

Basics of ITG

The simplest formulation of ITG contains three types of rules: terminal unary rules X → e/f, where e and f represent words (possibly a null word, ε) in the English and foreign language respectively, and the binary rules X → [X X] and X → ⟨X X⟩, which combine the component English and foreign phrases in the same and in inverted order respectively.
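
As a concrete sketch of these three rule types (standard ITG notation; the parenthetical glosses are ours, not the paper's):

```latex
\begin{align*}
X &\rightarrow e/f && \text{terminal: link English word $e$ to foreign word $f$ (either may be the null word $\epsilon$)}\\
X &\rightarrow [X\ X] && \text{straight rule: the two sub-spans keep the same order in both languages}\\
X &\rightarrow \langle X\ X\rangle && \text{inverted rule: the foreign sub-spans are combined in reverse order}
\end{align*}
```

For instance, the straight rule alone derives monotone alignments, while one application of the inverted rule swaps the order of two adjacent foreign phrases relative to their English counterparts.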

Basics of ITG Parsing

Based on the rules in normal form, ITG word alignment is done in a similar way to chart parsing (Wu, 1997).
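
A toy illustration of this bottom-up, chart-parsing-style procedure (a minimal sketch under simplifying assumptions: only 1-to-1 links, no null alignments, and a user-supplied link_score function; it is not the paper's actual parser):

```python
from itertools import product

def itg_align_score(e_words, f_words, link_score):
    """Toy bottom-up ITG alignment in the spirit of chart parsing.

    best[(i1, i2, j1, j2)] is the best derivation score covering English
    span [i1, i2) and foreign span [j1, j2).  Only 1-to-1 links are used,
    so complete parses exist only when both sentences have equal length.
    """
    n, m = len(e_words), len(f_words)
    best = {}

    # Base step: terminal unary rules establish the links of word pairs.
    for i, j in product(range(n), range(m)):
        best[(i, i + 1, j, j + 1)] = link_score(e_words[i], f_words[j])

    # Recursive step: combine smaller span pairs with straight or inverted rules.
    for e_len in range(2, n + 1):
        for f_len in range(2, m + 1):
            for i1 in range(n - e_len + 1):
                for j1 in range(m - f_len + 1):
                    i2, j2 = i1 + e_len, j1 + f_len
                    candidates = []
                    for k in range(i1 + 1, i2):      # split point in the English span
                        for l in range(j1 + 1, j2):  # split point in the foreign span
                            # straight rule X -> [X X]
                            if (i1, k, j1, l) in best and (k, i2, l, j2) in best:
                                candidates.append(best[(i1, k, j1, l)] + best[(k, i2, l, j2)])
                            # inverted rule X -> <X X>
                            if (i1, k, l, j2) in best and (k, i2, j1, l) in best:
                                candidates.append(best[(i1, k, l, j2)] + best[(k, i2, j1, l)])
                    if candidates:
                        best[(i1, i2, j1, j2)] = max(candidates)

    return best.get((0, n, 0, m))

# e.g. itg_align_score(["a", "b", "c"], ["x", "y", "z"], lambda e, f: 1.0) == 3.0
```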

Pruning in ITG Parsing

The ITG parsing framework has three levels of pruning:

The DPDI Framework

DPDI, the discriminative pruning model proposed in this paper, assigns a score to a span pair (an E-span/F-span pair) as a probability from a log-linear model:
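
Only the opening sentence of that section is excerpted here, so the formula itself is missing; a generic log-linear form of such a span-pair score (our reconstruction of the shape of the model, with unspecified feature functions h_k and MERT-trained weights lambda_k, not the paper's exact equation) is:

```latex
P(\tilde{e} \mid \tilde{f}) \;=\;
  \frac{\exp\!\big(\sum_{k} \lambda_k \, h_k(\tilde{e}, \tilde{f})\big)}
       {\sum_{\tilde{e}'} \exp\!\big(\sum_{k} \lambda_k \, h_k(\tilde{e}', \tilde{f})\big)}
```

Here f̃ is an F-span and ẽ ranges over candidate E-spans; pruning then keeps only the highest-scoring E-spans for each F-span (cf. the beam of at most 10 E-spans per F-span mentioned under "beam size" below).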

The DITG Models

The discriminative ITG alignment can be conceived as a two-staged process.
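
From the excerpts in the rest of this summary, the two stages are span-pair pruning by DPDI followed by discriminative ITG alignment over the surviving span pairs. A schematic sketch (function names and signatures are illustrative placeholders, not the paper's API):

```python
def two_stage_alignment(e_words, f_words, dpdi_score, ditg_align, beam_size=10):
    """Schematic two-stage discriminative ITG alignment.

    Stage 1: for every foreign span (F-span), keep only the beam_size
             English spans (E-spans) ranked highest by the DPDI model.
    Stage 2: run the discriminative ITG aligner over the pruned space.
    """
    # Stage 1: discriminative pruning of span pairs (DPDI).
    surviving = {}
    for f_span in all_spans(f_words):
        ranked = sorted(all_spans(e_words),
                        key=lambda e_span: dpdi_score(e_span, f_span),
                        reverse=True)
        surviving[f_span] = ranked[:beam_size]

    # Stage 2: discriminative ITG alignment restricted to surviving span pairs.
    return ditg_align(e_words, f_words, surviving)


def all_spans(words):
    """All contiguous spans of a word sequence, as (start, end) index pairs."""
    n = len(words)
    return [(i, j) for i in range(n) for j in range(i + 1, n + 1)]
```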

Evaluation

DPDI is evaluated against the baselines of Tic-tac-toe (TTT) pruning (Zhang and Gildea, 2005) and Dynamic Program (DP) pruning (Haghighi et al., 2009; DeNero et al., 2009) with respect to Chinese-to-English alignment and translation.

Conclusion and Future Work

This paper reviews word alignment through ITG parsing, and clarifies the problem of ITG pruning.

Topics

phrase pair

Appears in 19 sentences as: phrase pair (15) Phrase Pairs (1) phrase pairs (9)
In Discriminative Pruning for Discriminative ITG Alignment
  1. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs , which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
    Page 1, “Abstract”
  2. On top of the discriminative pruning method, we also propose a discriminative ITG alignment system using hierarchical phrase pairs .
    Page 1, “Introduction”
  3. #links_incon / (f_len + e_len), where #links_incon is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g. HMM).
    Page 5, “The DPDI Framework”
  4. An inconsistent link connects a word within the phrase pair to some word outside the phrase pair.
    Page 5, “The DPDI Framework”
  5. Another is DITG with hierarchical phrase pairs (henceforth HP-DITG), which relaxes the 1-to-1 constraint by adopting hierarchical phrase pairs in Chiang (2007).
    Page 5, “The DITG Models”
  6. 6.2 DITG with Hierarchical Phrase Pairs
    Page 6, “The DITG Models”
  7. Wu (1997) proposes a bilingual segmentation grammar extending the terminal rules by including phrase pairs .
    Page 6, “The DITG Models”
  8. Cherry and Lin (2007) incorporate phrase pairs in phrase-based SMT into ITG, and Haghighi et al.
    Page 6, “The DITG Models”
  9. HP-DITG extends Cherry and Lin’s approach by not only employing simple phrase pairs but also hierarchical phrase pairs (Chiang, 2007).
    Page 6, “The DITG Models”
  10. The grammar is enriched with rules of the format: X → ẽ_i / f̃_i, where ẽ_i and f̃_i refer to the English and foreign side of the i-th (simple/hierarchical) phrase pair respectively.
    Page 6, “The DITG Models”
  11. As an example, if there is a simple phrase pair
    Page 6, “The DITG Models”

F-score

Appears in 16 sentences as: F-Score (1) F-score (18)
In Discriminative Pruning for Discriminative ITG Alignment
  1. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
    Page 1, “Abstract”
  2. The MERT module for DITG takes alignment F-score of a sentence pair as the performance measure.
    Page 5, “The DITG Models”
  3. Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of DITG-produced alignment.
    Page 5, “The DITG Models”
  4. An alternative criterion is the upper bound on alignment F-score , which essentially measures how many links in annotated alignment can be kept in ITG parse.
    Page 6, “Evaluation”
  5. The calculation of F-score upper bound is done in a bottom-up way like ITG parsing.
    Page 6, “Evaluation”
  6. The upper bound of alignment F-score can thus be calculated as well.
    Page 7, “Evaluation”
  7. Finally, we also do end-to-end evaluation using both F-score in alignment and Bleu score in translation.
    Page 7, “Evaluation”
  8. In terms of F-score upper bound, DPDI is 1 percent higher.
    Page 7, “Evaluation”
  9. DPDI achieves even larger improvement in actual F-score .
    Page 7, “Evaluation”
  10. To enable TTT to achieve a similar F-score or F-score upper bound, the beam size has to be doubled and the time cost is more than twice the original (c.f.
    Page 7, “Evaluation”
  11. In fact, we fail to enable DP to achieve the same F-score upper bound as the other two methods before DP leads to intolerable memory consumption.
    Page 7, “Evaluation”

word pairs

Appears in 14 sentences as: Word pair (1) word pair (3) word pairs (10)
In Discriminative Pruning for Discriminative ITG Alignment
  1. HMM); Zhang and Gildea (2005) propose Tic-tac-toe pruning, which is based on the Model 1 probabilities of word pairs inside and outside a pair of spans.
    Page 1, “Introduction”
  2. From the viewpoint of word alignment, the terminal unary rules provide the links of word pairs , whereas the binary rules represent the reordering factor.
    Page 1, “Basics of ITG”
  3. For instance, there are two parses for three consecutive word pairs , viz.
    Page 2, “Basics of ITG”
  4. The base step applies all relevant terminal unary rules to establish the links of word pairs .
    Page 2, “Basics of ITG Parsing”
  5. The word pairs are then combined into span pairs in all possible ways.
    Page 2, “Basics of ITG Parsing”
  6. In the base step, only the word pairs listed in sentence-level annotation are inserted in the hypergraph, and the recursive steps are just the same as usual.
    Page 3, “The DPDI Framework”
  7. Zhang and Gildea (2005) show that Model 1 (Brown et al., 1993; Och and Ney, 2000) probabilities of the word pairs inside and outside a span pair ([e_i1, e_i2]/[f_j1, f_j2]) are useful.
    Page 4, “The DPDI Framework”
  8. probability of word pairs within the span pair):
    Page 4, “The DPDI Framework”
  9. probability of the word pairs outside the span pair):
    Page 4, “The DPDI Framework”
  10. The word pair e2/f1 is the only link in the span pair.
    Page 5, “The DPDI Framework”
  11. The following features about alignment links are used in W-DITG: 1) Word pair translation probabilities trained from the HMM model (Vogel et al., 1996) and IBM Model 4 (Brown et al., 1993; Och and Ney, 2000).
    Page 5, “The DITG Models”

word alignment

Appears in 9 sentences as: word alignment (8) word aligns (1)
In Discriminative Pruning for Discriminative ITG Alignment
  1. For this reason ITG has gained more and more attention recently in the word alignment community (Zhang and Gildea, 2005; Cherry and Lin, 2006; Haghighi et al., 2009).
    Page 1, “Introduction”
  2. From the viewpoint of word alignment , the terminal unary rules provide the links of word pairs, whereas the binary rules represent the reordering factor.
    Page 1, “Basics of ITG”
  3. First of all, it imposes a 1-to-1 constraint in word alignment.
    Page 1, “Basics of ITG”
  4. Secondly, the simple ITG leads to redundancy if word alignment is the sole purpose of applying ITG.
    Page 2, “Basics of ITG”
  5. Based on the rules in normal form, ITG word alignment is done in a similar way to chart parsing (Wu, 1997).
    Page 2, “Basics of ITG Parsing”
  6. Discriminative approaches to word alignment use manually annotated alignment for sentence pairs.
    Page 3, “The DPDI Framework”
  7. However, in reality there are often cases where a foreign word aligns to more than one English word.
    Page 3, “The DPDI Framework”
  8. Table 3 lists the word alignment time cost and SMT performance of different pruning methods.
    Page 8, “Evaluation”
  9. This paper reviews word alignment through ITG parsing, and clarifies the problem of ITG pruning.
    Page 8, “Conclusion and Future Work”

sentence pair

Appears in 8 sentences as: sentence pair (5) sentence pairs (3)
In Discriminative Pruning for Discriminative ITG Alignment
  1. Larger and larger span pairs are recursively built until the sentence pair is built.
    Page 2, “Basics of ITG Parsing”
  2. Figure 1(a) shows one possible derivation for a toy example sentence pair with three words in each sentence.
    Page 2, “Basics of ITG Parsing”
  3. Discriminative approaches to word alignment use manually annotated alignment for sentence pairs .
    Page 3, “The DPDI Framework”
  4. Discriminative pruning, however, handles not only a sentence pair but every possible span pair.
    Page 3, “The DPDI Framework”
  5. The MERT module for DITG takes alignment F-score of a sentence pair as the performance measure.
    Page 5, “The DITG Models”
  6. Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of DITG-produced alignment.
    Page 5, “The DITG Models”
  7. The 491 sentence pairs in this dataset are adapted to our own Chinese word segmentation standard.
    Page 7, “Evaluation”
  8. 250 sentence pairs are used as training data and the other 241 are test data.
    Page 7, “Evaluation”

alignment model

Appears in 7 sentences as: alignment model (7)
In Discriminative Pruning for Discriminative ITG Alignment
  1. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
    Page 1, “Abstract”
  2. (2009) do pruning based on the probabilities of links from a simpler alignment model (viz.
    Page 1, “Introduction”
  3. The simpler alignment model we used is HMM.
    Page 3, “The DPDI Framework”
  4. The four links are produced by some simpler alignment model like HMM.
    Page 5, “The DPDI Framework”
  5. #links_incon / (f_len + e_len), where #links_incon is the number of links which are inconsistent with the phrase pair according to some simpler alignment model (e.g. HMM).
    Page 5, “The DPDI Framework”
  6. (2009) show that posterior probabilities from the HMM alignment model are useful for pruning.
    Page 5, “The DPDI Framework”
  7. IBM Model 1 and HMM alignment model are re-implemented as they are required by the three ITG pruning methods.
    Page 7, “Evaluation”

Bleu

Appears in 6 sentences as: Bleu (6)
In Discriminative Pruning for Discriminative ITG Alignment
  1. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
    Page 1, “Abstract”
  2. Finally, we also do end-to-end evaluation using both F-score in alignment and Bleu score in translation.
    Page 7, “Evaluation”
  3. HP-DITG using DPDI achieves the best Bleu score with acceptable time cost.
    Page 8, “Evaluation”
  4. It shows that HP-DITG (with DPDI) is better than the three baselines both in alignment F-score and Bleu score.
    Page 8, “Evaluation”
  5. Note that the Bleu score differences between HP-DITG and the three baselines are statistically significant (Koehn, 2004).
    Page 8, “Evaluation”
  6. This also explains (in Tables 2 and 3) why DPDI with beam size 10 leads to higher Bleu than TTT with beam size 20, even though both pruning methods lead to roughly the same alignment F-score.
    Page 8, “Evaluation”

beam size

Appears in 5 sentences as: beam size (6)
In Discriminative Pruning for Discriminative ITG Alignment
  1. The third type of pruning is equivalent to minimizing the beam size of alignment hypotheses in each hypernode.
    Page 2, “Pruning in ITG Parsing”
  2. Note that the beam size (max number of E-spans) for each F-span is 10.
    Page 4, “The DPDI Framework”
  3. The results for W-DITG are listed in Table 1. Tests 1 and 2 show that with the same beam size (i.e.
    Page 7, “Evaluation”
  4. To enable TTT to achieve a similar F-score or F-score upper bound, the beam size has to be doubled and the time cost is more than twice the original (c.f.
    Page 7, “Evaluation”
  5. This also explains (in Tables 2 and 3) why DPDI with beam size 10 leads to higher Bleu than TTT with beam size 20, even though both pruning methods lead to roughly the same alignment F-score.
    Page 8, “Evaluation”

Bleu score

Appears in 5 sentences as: Bleu score (5)
In Discriminative Pruning for Discriminative ITG Alignment
  1. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
    Page 1, “Abstract”
  2. Finally, we also do end-to-end evaluation using both F-score in alignment and Bleu score in translation.
    Page 7, “Evaluation”
  3. HP-DITG using DPDI achieves the best Bleu score with acceptable time cost.
    Page 8, “Evaluation”
  4. It shows that HP-DITG (with DPDI) is better than the three baselines both in alignment F-score and Bleu score .
    Page 8, “Evaluation”
  5. Note that the Bleu score differences between HP-DITG and the three baselines are statistically significant (Koehn, 2004).
    Page 8, “Evaluation”

sentence-level

Appears in 4 sentences as: sentence-level (4)
In Discriminative Pruning for Discriminative ITG Alignment
  1. Rather than recruiting annotators for marking span pairs, we modify the parsing algorithm in Section 3 so as to produce span pair annotation out of sentence-level annotation.
    Page 3, “The DPDI Framework”
  2. In the base step, only the word pairs listed in sentence-level annotation are inserted in the hypergraph, and the recursive steps are just the same as usual.
    Page 3, “The DPDI Framework”
  3. If the sentence-level annotation satisfies the alignment constraints of ITG, then each F-span will have only one E-span in the parse tree.
    Page 3, “The DPDI Framework”
  4. It should be noted that this automatic span pair annotation may violate some of the links in the original sentence-level alignment annotation.
    Page 3, “The DPDI Framework”

Error Rate

Appears in 3 sentences as: Error Rate (2) error rate (1)
In Discriminative Pruning for Discriminative ITG Alignment
  1. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment.
    Page 1, “Abstract”
  2. Parameter training of DPDI is based on Minimum Error Rate Training (MERT) (Och, 2003), a widely used method in SMT.
    Page 3, “The DPDI Framework”
  3. That is, the first evaluation metric, pruning error rate (henceforth PER), measures how many correct E-spans are discarded.
    Page 6, “Evaluation”

parse tree

Appears in 3 sentences as: parse tree (3)
In Discriminative Pruning for Discriminative ITG Alignment
  1. Once the complete parse tree is built, the k-best list of the topmost span is obtained by minimally expanding the list of alignment hypotheses of minimal number of span pairs.
    Page 2, “Pruning in ITG Parsing”
  2. If the sentence-level annotation satisfies the alignment constraints of ITG, then each F-span will have only one E-span in the parse tree .
    Page 3, “The DPDI Framework”
  3. The major drawback of PER is that not all decisions in pruning would impact on alignment quality, since certain F-spans are of little use to the entire ITG parse tree .
    Page 6, “Evaluation”
