Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
Zhao, Bing and Lee, Young-Suk and Luo, Xiaoqiang and Li, Liu

Article Structure

Abstract

We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars.

Introduction

Most syntax-based machine translation models with synchronous context-free grammar (SCFG) have been relying on the off-the-shelf monolingual parse structures to learn the translation equivalences for string-to-tree, tree-to-string or tree-to-tree grammars.

The Projectable Structures

A context-free style nonterminal in PSCFG rules means the source span governed by the nonterminal should be translated into a contiguous target chunk.
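
The contiguity requirement can be checked directly against a word alignment: a source span is projectable only if its aligned target positions form one contiguous block that no source word outside the span aligns into. A minimal sketch, assuming an alignment encoded as a set of (source, target) index pairs (the encoding and function name are mine, not the paper's):

```python
def is_projectable(alignment, lo, hi):
    """True iff source span [lo, hi) can be a PSCFG nonterminal: its
    aligned target positions form one contiguous block, and no source
    word outside the span aligns into that block."""
    inside = {t for (s, t) in alignment if lo <= s < hi}
    if not inside:
        return False  # treat unaligned spans as non-projectable here
    outside = {t for (s, t) in alignment if not (lo <= s < hi)}
    block = range(min(inside), max(inside) + 1)
    return not any(t in outside for t in block)

# The span over source words 1-2 maps to targets {0, 2}, but source
# word 0 aligns into the middle of that block, so it is not projectable.
A = {(0, 1), (1, 0), (2, 2)}
print(is_projectable(A, 1, 3))  # False
print(is_projectable(A, 0, 3))  # True: the whole sentence is contiguous
```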

Elementary Trees to String Grammar

We propose to use variations of an elementary tree, which is a connected subgraph fitted in the original monolingual parse tree.
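
Taking the definition spelled out later (frontier nodes v^f holding nonterminals or words, interior nodes v^i holding source labels, edges E, and a root that immediately dominates the frontier), a minimal data-structure sketch; the field names are illustrative assumptions, not the paper's code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementaryTree:
    """Connected subgraph of the source parse: frontier nodes v^f
    (nonterminals or words), interior nodes v^i (source labels),
    edges E, and the root that immediately dominates the frontier."""
    root: str
    frontier: tuple                  # v^f
    interior: tuple = ()             # v^i
    edges: frozenset = frozenset()   # E: (parent, child) pairs

# e.g. a flattened NP whose frontier is two child nonterminals:
np_tree = ElementaryTree(root="NP", frontier=("NOUN", "ADJ"))
```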

Decoding

Decoding using the proposed elementary tree to string grammar naturally resembles bottom-up chart parsing algorithms.
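
As a rough, runnable illustration of that bottom-up flow (a sketch under simplifying assumptions, not the authors' decoder: the phrase-keyed rule table, monotone glue combination, and additive scoring below are stand-ins for rule matching over transformed elementary trees):

```python
from collections import defaultdict

def prune(hyps, beam=50):
    """Beam prune: keep the best-scoring (score, translation) pairs."""
    return sorted(hyps, reverse=True)[:beam]

def decode(src, rules, glue=-0.5):
    """Bottom-up chart decoding over spans [lo, hi): translate short
    spans first, then combine adjacent hypotheses into longer ones."""
    n = len(src)
    chart = defaultdict(list)
    for width in range(1, n + 1):
        for lo in range(n - width + 1):
            hi = lo + width
            cell = list(rules.get(tuple(src[lo:hi]), []))
            for mid in range(lo + 1, hi):  # combine adjacent sub-spans
                for sl, tl in chart[(lo, mid)]:
                    for sr, tr in chart[(mid, hi)]:
                        cell.append((sl + sr + glue, tl + " " + tr))
            chart[(lo, hi)] = prune(cell)
    return chart[(0, n)][0] if chart[(0, n)] else None

rules = {("a",): [(0.0, "A")], ("b", "c"): [(0.0, "C B")]}
print(decode(["a", "b", "c"], rules))  # (-0.5, 'A C B')
```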

Experiments

In our experiments, we built our system using most of the parallel training data available to us: 250M Arabic running tokens, corresponding to the “unconstrained” condition.

Discussions and Conclusions

We proposed a framework to learn models that predict how to transform an elementary tree into its simplified forms for better execution of word reorderings.

Topics

binarization

Appears in 18 sentences as: binarization (9) Binarizations (1) binarizations (7) binarizing (1)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries.
    Page 1, “Abstract”
  2. We specified a few operators for transforming an elementary tree γ, including flattening operators such as removing interior nodes in v^i, or grouping the children via binarizations.
    Page 3, “Elementary Trees to String Grammar”
  3. Obvious systematic linguistic divergences between language-pairs could be handled by some simple operators such as using binarization to regroup contiguously aligned children.
    Page 4, “Elementary Trees to String Grammar”
  4. 3.4.1 Binarizations
    Page 4, “Elementary Trees to String Grammar”
  5. One of the simplest ways to transform a tree is via binarization.
    Page 4, “Elementary Trees to String Grammar”
  6. Monolingual binarization regroups children into a smaller subtree with a suitable label for the newly created root.
    Page 4, “Elementary Trees to String Grammar”
  7. At decoding time, we need to select trees from all possible binarizations, while at training time we restrict the allowed choices with the alignment constraint that every group of children should be aligned contiguously on the target side.
    Page 4, “Elementary Trees to String Grammar”
  8. Our goal is to simulate the synchronous binarization as much as we can.
    Page 4, “Elementary Trees to String Grammar”
  9. In this paper, we applied the four basic operators for binarizing a tree: leftmost, rightmost, and additionally head-out left and head-out right for nodes with more than three children (sketched in code after this list).
    Page 4, “Elementary Trees to String Grammar”
  10. With the proper binarization, the structure becomes rich in substructures which allow certain reorderings to happen more readily than others.
    Page 4, “Elementary Trees to String Grammar”
  11. the elementary tree γ as above-mentioned, such as binarizations, at different levels for constructing partial hypotheses.
    Page 6, “Decoding”
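
Here is a small sketch of the four binarization operators named in item 9, using nested tuples to stand in for newly created subtree nodes; the function names and tree encoding are mine, not the paper's:

```python
def leftmost(children):
    """(A B C D) -> (((A, B), C), D): group the leftmost pair first."""
    node = children[0]
    for c in children[1:]:
        node = (node, c)
    return node

def rightmost(children):
    """(A B C D) -> (A, (B, (C, D))): group the rightmost pair first."""
    node = children[-1]
    for c in reversed(children[:-1]):
        node = (c, node)
    return node

def head_out(children, head, left_first=True):
    """Binarize outward from the head child, nearest sibling first;
    left_first selects head-out left v.s. head-out right."""
    left, right = list(children[:head]), list(children[head + 1:])
    node = children[head]
    order = ((left, True), (right, False)) if left_first \
        else ((right, False), (left, True))
    for side, is_left in order:
        while side:
            node = (side.pop(), node) if is_left else (node, side.pop(0))
    return node

kids = ("A", "B", "H", "C", "D")
print(leftmost(kids))            # (((('A', 'B'), 'H'), 'C'), 'D')
print(rightmost(kids))           # ('A', ('B', ('H', ('C', 'D'))))
print(head_out(kids, 2))         # head-out left
print(head_out(kids, 2, False))  # head-out right
```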

parse trees

Appears in 15 sentences as: parse tree (4) parse trees (11) parsed trees (1)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars.
    Page 1, “Abstract”
  2. For instance, in Arabic-to-English translation, we find only 45.5% of Arabic NP-SBJ structures are mapped to the English NP-SBJ with machine alignment and parse trees, and only 60.1% of NP-SBJs are mapped with human alignment and parse trees as in § 2.
    Page 1, “Introduction”
  3. Mi and Huang (2008) introduced parse forests to blur the chunking decisions to a certain degree, to expand the search space and reduce parsing errors from 1-best trees (Mi et al., 2008); others tried to use the parse trees as soft constraints on top of unlabeled grammars such as Hiero (Marton and Resnik, 2008; Chiang, 2010; Huang et al., 2010; Shen et al., 2010) without sufficiently leveraging rich tree context.
    Page 1, “Introduction”
  4. On the basis of our study investigating the language divergence between Arabic-English with human-aligned and parsed data, we integrate several simple statistical operations to transform parse trees adaptively to serve the
    Page 1, “Introduction”
  5. We carried out a controlled study on the projectable structures using human-annotated parse trees and word alignment for 5k Arabic-English sentence-pairs.
    Page 2, “The Projectable Structures”
  6. In Table 1, the unlabeled F-measures with machine alignment and parse trees show that, for only 48.71% of the time, the boundaries introduced by the source parses
    Page 2, “The Projectable Structures”
  7. Table 1: The labeled and unlabeled F-measures for projecting the source nodes onto the target side via alignments and parse trees; unlabeled F-measures show the bracketing accuracies for translating a source span contiguously.
    Page 2, “The Projectable Structures”
  8. We propose to use variations of an elementary tree, which is a connected subgraph fitted in the original monolingual parse tree.
    Page 3, “Elementary Trees to String Grammar”
  9. where v^f is a set of frontier nodes which contain nonterminals or words; v^i are the interior nodes with source labels/symbols; E is the set of edges connecting the nodes v = v^f + v^i into a connected subgraph fitted in the source parse tree; and the root is the immediate common parent of the frontier nodes v^f.
    Page 3, “Elementary Trees to String Grammar”
  10. Given a grammar G, and the input source parse tree π from a monolingual parser, we first construct the elementary tree for a source span, and then retrieve all the relevant subgraphs seen in the given grammar through the proposed operators (see the sketch after this list).
    Page 6, “Decoding”
  11. There are 16 thousand human parse trees with human alignment; an additional 1 thousand human-parsed and aligned sentence-pairs are used as an unseen test set to verify our MaxEnt models and parsers.
    Page 6, “Experiments”
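
Item 10's construct-then-retrieve step, as a hedged sketch: encode an elementary tree as (label, children), apply transformation operators (here just identity and a flattener that removes interior nodes), and look each variant up in a grammar table. The encoding, operator set, and rule format are assumptions for illustration:

```python
def flatten(tree):
    """Remove interior nodes: keep the root label over its leaf frontier."""
    root, children = tree
    leaves, queue = [], list(children)
    while queue:
        node = queue.pop(0)
        if isinstance(node, tuple):       # interior node: splice children in
            queue = list(node[1]) + queue
        else:
            leaves.append(node)
    return (root, tuple(leaves))

def rules_for_tree(gamma, grammar, operators):
    """Collect grammar rules matching any transformed variant of gamma."""
    matched = []
    for op in operators:
        matched.extend(grammar.get(op(gamma), []))
    return matched

gamma = ("NP-SBJ", (("NP", ("NOUN", "ADJ")),))         # (label, children)
grammar = {("NP-SBJ", ("NOUN", "ADJ")): ["ADJ NOUN"]}  # flattened form matches
print(rules_for_tree(gamma, grammar, [lambda t: t, flatten]))  # ['ADJ NOUN']
```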

BLEU

Appears in 14 sentences as: BLEU (16)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST-08 evaluations by 1.3 absolute BLEU, which is statistically significant.
    Page 1, “Abstract”
  2. We use BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) to evaluate translation qualities.
    Page 6, “Experiments”
  3. and we achieved a BLEUr4n4 of 55.01 for MT08-NW, or a cased BLEU of 53.31, which is close to the best officially reported result of 53.85 for unconstrained systems. We expose the statistical decisions in Eqn.
    Page 7, “Experiments”
  4. 3 as additional cost, the translation results in Table 11 show it helps BLEU by 0.29 BLEU points (56.13 v.s.
    Page 7, “Experiments”
  5. Table 10: TER and BLEU for MT08-NW, using only Ψ(γ)
    Page 7, “Experiments”
  6. Table 11: TER and BLEU for MT08-NW, using Ψ(γ,
    Page 7, “Experiments”
  7. over the baseline from +0.18 (via rightmost binarization) to +0.52 (via head-out-right) BLEU points.
    Page 8, “Experiments”
  8. Additionally, regrouping the verbs is marginally helpful for BLEU and TER.
    Page 8, “Experiments”
  9. It helps more on TER than BLEU for MT08-NW.
    Page 8, “Experiments”
  10. Table 11 shows that when we add such boundary markups in our rules, an improvement of 0.33 BLEU points was obtained (56.46 v.s.
    Page 8, “Experiments”
  11. We conclude the investigation with the full function Ψ(γ, m-1), which leads to a BLEUr4n4 of 56.87 (cased BLEUr4n4c 55.16), a significant improvement of 1.77 BLEU points over an already strong baseline.
    Page 8, “Experiments”

MaxEnt

Appears in 9 sentences as: MaXEnt (1) MaxEnt (8)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. The boundary cases were not addressed in the previous literature for trees, and here we include them in our feature sets for learning a MaxEnt model to predict the transformations.
    Page 2, “Introduction”
  2. The rest of the paper is organized as follows: in section 2, we analyze the projectable structures using human aligned and parsed data, to identify the problems for SCFG in general; in section 3, our proposed approach is explained in detail, including the statistical operators using a MaxEnt model; in section 4, we illustrate the integration of the proposed approach in our decoder; in section 5, we present experimental results; in section 6, we conclude with discussions and future work.
    Page 2, “Introduction”
  3. During training, we label nodes with translation boundaries as one additional function tag; during decoding, we employ the MaxEnt model to predict the translation boundary label probability for each span associated with a subgraph γ, and discourage derivations accordingly for using nonterminals over non-translation-boundary spans (see the sketch after this list).
    Page 6, “Elementary Trees to String Grammar”
  4. To learn our MaxEnt models defined in § 3.3, we collect the events while extracting the elm2str grammar at training time, and learn the model using improved iterative scaling.
    Page 6, “Experiments”
  5. There are 16 thousand human parse trees with human alignment; an additional 1 thousand human-parsed and aligned sentence-pairs are used as an unseen test set to verify our MaxEnt models and parsers.
    Page 6, “Experiments”
  6. It showed our MaxEnt model is very accurate using human trees: 94.5% accuracy, and about 84.7% accuracy using machine-parsed trees.
    Page 7, “Experiments”
  7. According to our MaxEnt model, 20% of the time we should discourage a VP tree from being translated contiguously; such VP trees have an average span length of 16.9 tokens in MT08-NW.
    Page 7, “Experiments”
  8. The highlighted parts in Figure 3 show that the rules on partial trees are effectively selected and applied for capturing long-distance word reordering, which is otherwise rather difficult to get correct in a phrasal system even with a MaxEnt reordering model.
    Page 8, “Experiments”
  9. We achieved a high accuracy of 84.7% for predicting such boundaries using a MaxEnt model on machine parse trees.
    Page 9, “Discussions and Conclusions”
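
A minimal sketch of the translation-boundary predictor described in item 3: featurize a span, score it with a binary MaxEnt (logistic) model, and turn a low boundary probability into an extra derivation cost. The feature names and toy weights are illustrative; the paper trains its model with improved iterative scaling, which is not shown here:

```python
import math

def boundary_features(node_label, span_len, left_pos, right_pos):
    """Feature sketch for a span: node label, bucketed span length,
    and neighboring POS context (names are illustrative)."""
    return {
        f"label={node_label}": 1.0,
        f"len_bucket={min(span_len, 10)}": 1.0,
        f"left_pos={left_pos}": 1.0,
        f"right_pos={right_pos}": 1.0,
    }

def boundary_prob(features, weights):
    """Binary MaxEnt: P(boundary | span) = sigmoid(w . f)."""
    z = sum(weights.get(name, 0.0) * v for name, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def derivation_penalty(prob, scale=1.0):
    """Extra cost for a nonterminal over a likely non-boundary span."""
    return -scale * math.log(max(prob, 1e-9))

w = {"label=VP": -0.8, "len_bucket=10": -0.6}  # toy weights
f = boundary_features("VP", 17, "NN", "IN")
p = boundary_prob(f, w)
print(round(p, 3), round(derivation_penalty(p), 3))  # ~0.198 1.62
```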

TER

Appears in 7 sentences as: TER (7)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. We use BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) to evaluate translation qualities.
    Page 6, “Experiments”
  2. | Setups | TER | BLEUr4n4 |
    Page 7, “Experiments”
  3. Table 10: TER and BLEU for MT08-NW, using only Ψ(γ)
    Page 7, “Experiments”
  4. | Setups | TER | BLEUr4n4 |
    Page 7, “Experiments”
  5. Table 11: TER and BLEU for MT08-NW, using Ψ(γ,
    Page 7, “Experiments”
  6. Additionally, regrouping the verbs is marginally helpful for BLEU and TER.
    Page 8, “Experiments”
  7. It helps more on TER than BLEU for MT08-NW.
    Page 8, “Experiments”

BLEU points

Appears in 5 sentences as: BLEU point (1) BLEU points (5)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. 3 as additional cost, the translation results in Table 11 show it helps BLEU by 0.29 BLEU points (56.13 v.s.
    Page 7, “Experiments”
  2. over the baseline from +0.18 (via rightmost binarization) to +0.52 (via head-out-right) BLEU points.
    Page 8, “Experiments”
  3. Table 11 shows that when we add such boundary markups in our rules, an improvement of 0.33 BLEU points was obtained (56.46 v.s.
    Page 8, “Experiments”
  4. We conclude the investigation with the full function Ψ(γ, m-1), which leads to a BLEUr4n4 of 56.87 (cased BLEUr4n4c 55.16), a significant improvement of 1.77 BLEU points over an already strong baseline.
    Page 8, “Experiments”
  5. On DEV10-NW, we observed 1.29 BLEU points improvement, and about 0.63 and 0.98 improved BLEU points for MT08-WB and DEV10-WB, respectively.
    Page 8, “Experiments”

lexicalization

Appears in 5 sentences as: lexicalization (2) lexicalized (2) lexicalizing (1)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. The transformations could be as simple as merging two adjacent nonterminals into one bracket to accommodate non-contiguity on the target side, or lexicalizing those words which have fork-style, many-to-many alignment, or unaligned content words to enable the rest of the span to be generalized into nonterminals.
    Page 2, “The Projectable Structures”
  2. It should be easier to model the swapping of (NOUN ADJ) using the tree (NP NOUN, ADJ) instead of the original bigger tree of (NP-SBJ Azmp, NOUN, ADJ) with one lexicalized node.
    Page 3, “The Projectable Structures”
  3. The frontier nodes can be merged, lexicalized, or even deleted in the tree-to-string rule associated with γ′, as long as the alignment for the nonterminals is book-kept in the derivations.
    Page 3, “Elementary Trees to String Grammar”
  4. The NP tree in Figure 2 happens to be an “inside-out” style alignment, and a context-free grammar such as ITG (Wu, 1997) cannot explain this structure well without the necessary lexicalization (a small check for such permutations is sketched after this list).
    Page 5, “Elementary Trees to String Grammar”
  5. With lexicalization, a Hiero-style rule “de X Aly AlAnfjAr ↦ to ignite X” is potentially a better alternative for translating the NP tree.
    Page 5, “Elementary Trees to String Grammar”
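
The claim in item 4 is that inside-out alignments are exactly the permutations that binary straight/inverted (ITG) splits cannot produce. A small recursive check over 0-indexed target positions (my own helper, not code from the paper):

```python
def itg_parsable(perm):
    """True iff the target-side permutation of a source span can be
    built from binary straight/inverted splits (ITG). Inside-out
    permutations like (1, 3, 0, 2) fail, which is why such NP trees
    need lexicalization or a Hiero-style rule instead."""
    if len(perm) <= 1:
        return True
    for mid in range(1, len(perm)):
        left, right = perm[:mid], perm[mid:]
        lo_l, hi_l = min(left), max(left)
        lo_r, hi_r = min(right), max(right)
        contiguous = (hi_l - lo_l + 1 == len(left) and
                      hi_r - lo_r + 1 == len(right))
        adjacent = hi_l + 1 == lo_r or hi_r + 1 == lo_l  # straight/inverted
        if contiguous and adjacent and itg_parsable(left) and itg_parsable(right):
            return True
    return False

print(itg_parsable((1, 3, 0, 2)))  # False: the classic inside-out case
print(itg_parsable((2, 3, 0, 1)))  # True: one inverted split suffices
```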

content word

Appears in 3 sentences as: content word (2) content words (1)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. The transformations could be as simple as merging two adjacent nonterminals into one bracket to accommodate non-contiguity on the target side, or lexicalizing those words which have fork-style, many-to-many alignment, or unaligned content words to enable the rest of the span to be generalized into nonterminals.
    Page 2, “The Projectable Structures”
  2. (v^f, γ′) deletes a src content word
    Page 4, “Elementary Trees to String Grammar”
  3. (v^f, γ′) over-generates a tgt content word
    Page 4, “Elementary Trees to String Grammar”

machine translation

Appears in 3 sentences as: machine translation (3)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. Most syntax-based machine translation models with synchronous context-free grammar (SCFG) have been relying on the off-the-shelf monolingual parse structures to learn the translation equivalences for string-to-tree, tree-to-string or tree-to-tree grammars.
    Page 1, “Introduction”
  2. However, state-of-the-art monolingual parsers are not necessarily well suited for machine translation in terms of both labels and chunks/brackets.
    Page 1, “Introduction”
  3. 3 significantly enriches reordering powers for syntax-based machine translation.
    Page 4, “Elementary Trees to String Grammar”

translation qualities

Appears in 3 sentences as: translation qualities (3)
In Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
  1. We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars.
    Page 1, “Abstract”
  2. We integrate the neighboring context of the subgraph in our transformation preference predictions, and this improves translation qualities further.
    Page 2, “Introduction”
  3. We use BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) to evaluate translation qualities.
    Page 6, “Experiments”
