Dependency-based Pre-ordering for Chinese-English Machine Translation
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie

Article Structure

Abstract

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.

Introduction

SMT systems have difficulties translating between distant language pairs such as Chinese and English.

Dependency-based Pre-ordering Rule Set

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same

Experiments

We used the MOSES PBSMT system (Koehn et al., 2007) in our experiments.

Conclusion

In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system.

Topics

dependency parse

Appears in 20 sentences as: dependency parse (11) dependency parsers (4) dependency parsing (6)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT.
    Page 1, “Abstract”
  2. Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).
    Page 1, “Introduction”
  3. In contrast, we propose a set of pre-ordering rules for dependency parsers.
    Page 1, “Introduction”
  4. (2007) exist, it is almost impossible to automatically convert their rules into rules that are applicable to dependency parsers.
    Page 1, “Introduction”
  5. In fact, we abandoned our initial attempts to automatically convert their rules into rules for dependency parsers, and
    Page 1, “Introduction”
  6. (b) Stanford typed dependency parse tree
    Page 2, “Introduction”
  7. Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.
    Page 2, “Introduction”
  8. They created a pre-ordering rule set for dependency parsers from English to several SOV languages.
    Page 2, “Introduction”
  9. Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same
    Page 2, “Dependency-based Pre-ordering Rule Set”
  10. As shown in the figure, the number of nodes in the dependency parse tree (i.e.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  11. Because dependency parse trees are generally more concise than the constituent ones, they can conduct long-distance reorderings in a finer way.
    Page 2, “Dependency-based Pre-ordering Rule Set”

parse trees

Appears in 16 sentences as: parse tree (8) parse trees (13)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. These pre-ordering approaches first parse the source language sentences to create parse trees.
    Page 1, “Introduction”
  2. Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language.
    Page 1, “Introduction”
  3. (a) A constituent parse tree
    Page 2, “Introduction”
  4. (b) Stanford typed dependency parse tree
    Page 2, “Introduction”
  5. Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.
    Page 2, “Introduction”
  6. Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same
    Page 2, “Dependency-based Pre-ordering Rule Set”
  7. As shown in the figure, the number of nodes in the dependency parse tree (i.e.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  8. 9) is much fewer than that in its corresponding constituent parse tree (i.e.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  9. Because dependency parse trees are generally more concise than the constituent ones, they can conduct long-distance reorderings in a finer way.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  10. 1) Search the Chinese dependency parse trees in the corpus and rank all of the structures matching the two types of rules respectively according to their frequencies.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  11. For each kind of structure, we selected some of the sample dependency parse trees that contained it, tried to restructure the parse trees according to the matched rule and judged the reordered Chinese phrases.
    Page 3, “Dependency-based Pre-ordering Rule Set”

constituent parse

Appears in 12 sentences as: constituency parsers (1) constituent parse (7) constituent parsers (3) constituent parsing (2)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. Syntax-based pre-ordering by employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
    Page 1, “Introduction”
  2. Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).
    Page 1, “Introduction”
  3. They created a set of pre-ordering rules for constituent parsers for Chinese-English PBSMT.
    Page 1, “Introduction”
  4. (a) A constituent parse tree
    Page 2, “Introduction”
  5. Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.
    Page 2, “Introduction”
  6. By applying our rules and Wang et al.’s rules, one can use both dependency and constituency parsers for pre-ordering in Chinese-English PBSMT.
    Page 2, “Introduction”
  7. Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same
    Page 2, “Dependency-based Pre-ordering Rule Set”
  8. 9) is much fewer than that in its corresponding constituent parse tree (i.e.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  9. First, we converted the constituent parse trees in the results of the Berkeley Parser into dependency parse trees by employing a tool in the Stanford Parser (Klein and Manning, 2003).
    Page 4, “Experiments”
  10. In our opinion, the reason for the great decrease was that the dependency parse trees were more concise than the constituent parse trees in describing sentences and they could also describe the reordering at the sentence level in a finer way.
    Page 5, “Experiments”
  11. In contrast, the constituent parse trees were more redundant and they needed more nodes to conduct long-distance reordering.
    Page 5, “Experiments”

Chinese-English

Appears in 11 sentences as: Chinese-English (11)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT.
    Page 1, “Abstract”
  2. Syntax-based pre-ordering by employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
    Page 1, “Introduction”
  3. The purpose of this paper is to introduce a novel dependency-based pre-ordering approach through creating a pre-ordering rule set and applying it to the Chinese-English PBSMT system.
    Page 1, “Introduction”
  4. To our knowledge, our manually created pre-ordering rule set is the first Chinese-English dependency-based pre-ordering rule set.
    Page 1, “Introduction”
  5. They created a set of pre-ordering rules for constituent parsers for Chinese-English PBSMT.
    Page 1, “Introduction”
  6. By applying our rules and Wang et al.’s rules, one can use both dependency and constituency parsers for pre-ordering in Chinese-English PBSMT.
    Page 2, “Introduction”
  7. In contrast, our rule set is for Chinese-English PBSMT.
    Page 2, “Introduction”
  8. Because there are a lot of language specific decisions that reflect specific aspects of the source language and the language pair combination, our rule set provides a valuable resource for pre-ordering in Chinese-English PBSMT.
    Page 2, “Introduction”
  9. Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs.
    Page 4, “Experiments”
  10. In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system.
    Page 5, “Conclusion”
  11. These results indicated that dependency parsing is more effective for conducting pre-ordering for Chinese-English PBSMT.
    Page 5, “Conclusion”

Berkeley Parser

Appears in 7 sentences as: Berkeley Parser (7)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. The Berkeley Parser (Petrov et al., 2006) was employed for parsing the Chinese sentences.
    Page 4, “Experiments”
  2. For training the Berkeley Parser, we used Chinese Treebank (CTB) 7.0.
    Page 4, “Experiments”
  3. We conducted our dependency-based pre-ordering experiments on the Berkeley Parser and the Mate Parser (Bohnet, 2010), which were shown to be the two best parsers for Stanford typed dependencies (Che et al., 2012).
    Page 4, “Experiments”
  4. First, we converted the constituent parse trees in the results of the Berkeley Parser into dependency parse trees by employing a tool in the Stanford Parser (Klein and Manning, 2003).
    Page 4, “Experiments”
  5. Thus, we then extracted the POS information from the results of the Berkeley Parser and used these as the pre-specified POS tags for the Mate Parser.
    Page 4, “Experiments”
  6. Finally, we applied our dependency-based pre-ordering rule set to the dependency parse trees created from the converted Berkeley Parser and the Mate Parser, respectively.
    Page 4, “Experiments”
  7. Table 1 presents a comparison of the system without pre-ordering, the constituent system using WR07 and two dependency systems employing the converted Berkeley Parser and the Mate Parser, respectively.
    Page 4, “Experiments”

BLEU

Appears in 7 sentences as: BLEU (7)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. We present a set of dependency-based pre-ordering rules which improved the BLEU score by 1.61 on the NIST 2006 evaluation data.
    Page 1, “Abstract”
  2. Experiment results showed that our pre-ordering rule set improved the BLEU score on the NIST 2006 evaluation data by 1.61.
    Page 1, “Introduction”
  3. In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  4. Lng the performance (BLEU) on the test set, the total
    Page 4, “Experiments”
  5. For evaluation, we used BLEU scores (Papineni et al., 2002).
    Page 4, “Experiments”
  6. It shows the BLEU scores on the test set and the statistics of pre-ordering on the training set, which includes the total count of each rule set and the number of sentences they were ap-
    Page 4, “Experiments”
  7. The results showed that our approach achieved a BLEU score gain of 1.61.
    Page 5, “Conclusion”

BLEU score

Appears in 6 sentences as: BLEU score (3) BLEU scores (3)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. We present a set of dependency-based pre-ordering rules which improved the BLEU score by 1.61 on the NIST 2006 evaluation data.
    Page 1, “Abstract”
  2. Experiment results showed that our pre-ordering rule set improved the BLEU score on the NIST 2006 evaluation data by 1.61.
    Page 1, “Introduction”
  3. In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  4. For evaluation, we used BLEU scores (Papineni et al., 2002).
    Page 4, “Experiments”
  5. It shows the BLEU scores on the test set and the statistics of pre-ordering on the training set, which includes the total count of each rule set and the number of sentences they were ap-
    Page 4, “Experiments”
  6. The results showed that our approach achieved a BLEU score gain of 1.61.
    Page 5, “Conclusion”

dependency relation

Appears in 6 sentences as: dependency relation (4) dependency relations (3)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. Here, both x and y are dependency relations (e.g., plmod or lobj in Figure 2).
    Page 2, “Dependency-based Pre-ordering Rule Set”
  2. We define the dependency structure of a dependency relation as the structure containing the dependent word (e.g., the word directly indicated by plmod, or “El?” in Figure 2) and the whole subtree under the dependency relation (all of the words that directly or indirectly depend on the dependent word, or the words under “El?” in Figure 2).
    Page 2, “Dependency-based Pre-ordering Rule Set”
  3. Further, we define X and Y as the corresponding dependency structures of the dependency relations x and y, respectively.
    Page 2, “Dependency-based Pre-ordering Rule Set”
  4. For example, in Figure 2, let x and y denote plmod and lobj dependency relations, then X represents “El?” and all words under “E'TJ”, Y represents “iii/LEE” and all words under “iii/3E”, and X\Y represents
    Page 2, “Dependency-based Pre-ordering Rule Set”
  5. 2) Filter out the structures from which it was almost impossible to derive candidate pre-ordering rules because x or y was an “irrespective” dependency relation, for example, root, conj, cc and so on.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  6. As a result, we obtained eight pre-ordering rules in total, which can be divided into three dependency relation categories.
    Page 3, “Dependency-based Pre-ordering Rule Set”
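
The definition of a dependency structure quoted above (the dependent word plus every word that depends on it directly or indirectly) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the head-array representation, the function name dependency_structure, and the five-token toy tree are all hypothetical and do not come from the paper.

```python
from collections import defaultdict


def dependency_structure(heads, dependent):
    """Return the sorted token indices of `dependent` and its whole subtree.

    `heads[i]` is the index of token i's head, or None for the root.
    This representation is an assumption made for the sketch.
    """
    # Build a head -> children map from the head array.
    children = defaultdict(list)
    for token, head in enumerate(heads):
        if head is not None:
            children[head].append(token)

    # Walk down from the dependent word, collecting every token that
    # depends on it directly or indirectly.
    collected = []
    stack = [dependent]
    while stack:
        node = stack.pop()
        collected.append(node)
        stack.extend(children[node])
    return sorted(collected)


# Hypothetical toy tree: token 2 is the root, tokens 1 and 4 depend on it,
# token 0 depends on token 1, and token 3 depends on token 4.
heads = [1, 2, None, 4, 2]
x_structure = dependency_structure(heads, 1)  # structure X for the relation ending at token 1
y_structure = dependency_structure(heads, 4)  # structure Y for the relation ending at token 4
```

In this toy tree, x_structure covers tokens 0 and 1 and y_structure covers tokens 3 and 4, so each structure is a contiguous span that a reordering rule could move as a unit.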

language pairs

Appears in 6 sentences as: language pair (1) language pairs (5)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
    Page 1, “Abstract”
  2. SMT systems have difficulties translating between distant language pairs such as Chinese and English.
    Page 1, “Introduction”
  3. Reordering therefore becomes a key issue in SMT systems between distant language pairs.
    Page 1, “Introduction”
  4. Syntax-based pre-ordering by employing constituent parsing has demonstrated effectiveness in many language pairs, such as English-French (Xia and McCord, 2004), German-English (Collins et al., 2005), Chinese-English (Wang et al., 2007; Zhang et al., 2008), and English-Japanese (Lee et al., 2010).
    Page 1, “Introduction”
  5. Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).
    Page 1, “Introduction”
  6. Because there are a lot of language specific decisions that reflect specific aspects of the source language and the language pair combination, our rule set provides a valuable resource for pre-ordering in Chinese-English PBSMT.
    Page 2, “Introduction”

word order

Appears in 6 sentences as: word order (4) word orders (2)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
    Page 1, “Abstract”
  2. The reason for this is that there are great differences in their word orders.
    Page 1, “Introduction”
  3. Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language.
    Page 1, “Introduction”
  4. If the reordering produced a Chinese phrase that had a closer word order to that of the English one, this structure would be a candidate pre-ordering rule.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  5. In this example, with the application of an nsubj : rcmod rule, the phrase can be translated into “a senior official close to Sharon say”, which has a word order very close to English.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  6. A bilingual speaker of Chinese and English looked at an original Chinese phrase and the pre-ordered one with their corresponding English phrase and judged whether the pre-ordering obtained a Chinese phrase that had a closer word order to the English one.
    Page 5, “Experiments”

development set

Appears in 5 sentences as: development set (5)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. 4) Conduct primary experiments which used the same training set and development set as the experiments described in Section 3.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  2. In the primary experiments, we tested the effectiveness of the candidate rules and filtered the ones that did not work based on the BLEU scores on the development set.
    Page 3, “Dependency-based Pre-ordering Rule Set”
  3. Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs.
    Page 4, “Experiments”
  4. lected from the development set.
    Page 5, “Experiments”
  5. The evaluation set contained 200 sentences randomly selected from the development set.
    Page 5, “Experiments”

NIST

Appears in 4 sentences as: NIST (4)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. We present a set of dependency-based pre-ordering rules which improved the BLEU score by 1.61 on the NIST 2006 evaluation data.
    Page 1, “Abstract”
  2. Experiment results showed that our pre-ordering rule set improved the BLEU score on the NIST 2006 evaluation data by 1.61.
    Page 1, “Introduction”
  3. Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs.
    Page 4, “Experiments”
  4. Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs.
    Page 4, “Experiments”

machine translation

Appears in 3 sentences as: machine translation (3)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
    Page 1, “Abstract”
  2. This is especially important on the point of the system combination of PBSMT systems, because the diversity of outputs from machine translation systems is important for system combination (Cer et al., 2013).
    Page 2, “Introduction”
  3. By using both our rules and Wang et al.’s rules, one can obtain diverse machine translation results because the pre-ordering results of these two rule sets are generally different.
    Page 2, “Introduction”

sentences pairs

Appears in 3 sentences as: sentence pairs (1) sentences pairs (2)
In Dependency-based Pre-ordering for Chinese-English Machine Translation
  1. Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentences pairs.
    Page 4, “Experiments”
  2. Our test set was the NIST 2006 MT evaluation data, consisting of 1664 sentence pairs.
    Page 4, “Experiments”
  3. Moreover, our dependency-based pre-ordering rule set substantially decreased the time for applying pre-ordering rules by about 60% compared with WR07, on the training set of 1M sentences pairs.
    Page 5, “Conclusion”
