A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Wu, Xianchao and Sudoh, Katsuhito and Duh, Kevin and Tsukada, Hajime and Nagata, Masaaki

Article Structure

Abstract

This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers.

Introduction

Target language side dependency structures have been successfully used in statistical machine translation (SMT) by Shen et al. (2008) and achieved state-of-the-art results as reported in the NIST 2008 Open MT Evaluation workshop and the NTCIR-9 Chinese-to-English patent translation task (Goto et al., 2011; Ma and Matsoukas, 2011).

Gaining Dependency Structures

2.1 Dependency tree

Experiments

3.1 Setup

Conclusion

We have constructed a string-to-dependency translation platform for comparing non-isomorphic target dependency structures.

Topics

dependency trees

Appears in 17 sentences as: Dependency tree (1) dependency tree (7) dependency trees (9)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. For example, using the constituent-to-dependency conversion approach proposed by Johansson and Nugues (2007), we can easily yield dependency trees from PCFG-style trees.
    Page 1, “Introduction”
  2. 2.1 Dependency tree
    Page 1, “Gaining Dependency Structures”
  3. We follow the definition of dependency graph and dependency tree as given in (McDonald and Nivre, 2011).
    Page 1, “Gaining Dependency Structures”
  4. A dependency graph G for sentence s is called a dependency tree when it satisfies: (1) the nodes cover all the words in s besides the ROOT; (2) one node can have one and only one head (word) with a determined syntactic role; and (3) the ROOT of the graph is reachable from all other nodes.
    Page 1, “Gaining Dependency Structures” (see the code sketch after this list)
  5. In order to generate word-level dependency trees from the PCFG tree, we use the LTH constituent-to-dependency conversion tool written by Johansson and Nugues (2007).
    Page 2, “Gaining Dependency Structures”
  6. We thus need an algorithm to transform these phrasal predicate-argument dependencies into a word-to-word dependency tree.
    Page 2, “Gaining Dependency Structures”
  7. Our algorithm (refer to Figure 1 for an example) for changing PASs into word-based dependency trees is as follows:
    Page 2, “Gaining Dependency Structures”
  8. 3. checking, i.e., post-modifying the dependency graph according to the definition of dependency tree (Section 2.1).
    Page 2, “Gaining Dependency Structures”
  9. We use “PAS+syn” to represent the dependency trees generated from the HPSG PASs guided by the syntactic heads.
    Page 2, “Gaining Dependency Structures”
  10. We need to post-modify the dependency graph after applying the mapping, since it is not guaranteed to be a dependency tree.
    Page 2, “Gaining Dependency Structures”
  11. Referring to the definition of dependency tree (Section 2.1), we need a strategy for (1) selecting only one head from multiple heads and (2) appending dependency relations for those words/punctuation that do not have any head.
    Page 2, “Gaining Dependency Structures”
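
Items 4 and 8 together suggest a mechanical well-formedness check. Below is a minimal sketch of such a check, assuming a head-array encoding of our own devising; it illustrates the Section 2.1 definition and is not the authors' code.

```python
# Minimal sketch (our encoding, not the paper's): heads[i-1] is the head of
# word i, with 0 denoting the artificial ROOT. Condition (1) -- coverage of
# all words -- is implicit in the array encoding (one entry per word).

def is_dependency_tree(heads):
    n = len(heads)
    # Condition (2): exactly one valid head per word, and no self-loops.
    for i, h in enumerate(heads, start=1):
        if not 0 <= h <= n or h == i:
            return False
    # Condition (3): ROOT (node 0) must be reachable from every node,
    # i.e. following head links upward never cycles.
    for i in range(1, n + 1):
        seen = set()
        node = i
        while node != 0:
            if node in seen:  # cycle detected: ROOT is unreachable
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# Well-formed: word 2 is the sentence root, words 1 and 3 depend on it.
assert is_dependency_tree([2, 0, 2])
# Ill-formed: words 1 and 2 head each other, so ROOT is unreachable.
assert not is_dependency_tree([2, 1, 2])
```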

CCG

Appears in 10 sentences as: CCG (12)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs), which are generated by an HPSG parser and a CCG parser.
    Page 1, “Abstract”
  2. Predicate-argument structures (PASs), a semantic dependency representation of a whole sentence, are also included in the output trees of (1) a state-of-the-art head-driven phrase structure grammar (HPSG) (Pollard and Sag, 1994; Sag et al., 2003) parser, Enju (Miyao and Tsujii, 2008), and (2) a state-of-the-art CCG parser (Clark and Curran, 2007).
    Page 1, “Introduction”
  3. 2.5 CCG parsing
    Page 3, “Gaining Dependency Structures”
  4. We also use the predicate-argument dependencies generated by the CCG parser developed by Clark and Curran (2007).
    Page 3, “Gaining Dependency Structures”
  5. The algorithm for generating word-level dependency trees is simpler than processing the PASs included in the HPSG trees, since word-level predicate-argument relations are already included in the output of the CCG parser.
    Page 3, “Gaining Dependency Structures”
  6. The postprocessing is like that described for HPSG parsing, except that we greedily use the MST’s sentence root when we cannot determine it from the CCG parser’s PASs.
    Page 3, “Gaining Dependency Structures” (see the code sketch after this list)
  7. In terms of root word comparison, we observe that MST and CCG share 87.3% of identical root words, which results from borrowing roots from MST for CCG.
    Page 3, “Experiments”
  8. MST row of the comparison table: Malt 70.5 (77.3), Berkeley 62.5 (64.6), PAS+syn 69.2 (58.5), PAS+sem 53.3 (58.…), CCG 87.3 (…).
    Page 4, “Experiments”
  9. For example, CCG is worse than Malt in terms of P/R yet achieves a higher BLEU score.
    Page 4, “Experiments”
  10. From the table, we have the following observations: (1) all the dependency structures (except Malt) achieved a significantly better BLEU score than the phrasal Moses; (2) PAS+syn performed best on the test set (0.3376), and it is significantly better than phrasal/hierarchical Moses (p < 0.01), MST (p < 0.05), Malt (p < 0.01), Berkeley (p < 0.05), and CCG (p < 0.05); and (3) CCG performed as well as MST and Berkeley.
    Page 4, “Experiments”
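
Items 5 and 6 together describe a straightforward conversion: the CCG parser's PASs are already word-level arcs, so the main work is picking a root, falling back on MST when needed. Below is a hedged sketch of that idea; the function name, edge-list encoding, and tie-breaking rules are our assumptions, not the authors' code.

```python
# Illustrative sketch (not the paper's implementation): build a word-level
# dependency tree from CCG predicate-argument arcs, borrowing the sentence
# root from an MST parse when the CCG arcs leave it undetermined (item 6).

def tree_from_ccg_pas(n_words, pas_arcs, mst_root):
    """pas_arcs: (head, dependent) word-index pairs in 1..n_words, as read
    off the CCG parser's predicate-argument output; mst_root: sentence root
    proposed by the MST parser, used as a greedy fallback."""
    heads = {}
    for head, dep in pas_arcs:
        heads.setdefault(dep, head)  # keep only one head per word
    unheaded = [i for i in range(1, n_words + 1) if i not in heads]
    root = unheaded[0] if len(unheaded) == 1 else mst_root
    heads[root] = 0  # 0 stands for the artificial ROOT
    for i in range(1, n_words + 1):
        heads.setdefault(i, root)  # attach leftover head-less words to root
    return [heads[i] for i in range(1, n_words + 1)]
```

A real implementation would finish by running the well-formedness check from Section 2.1, since nothing above rules out cycles among the CCG arcs.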

BLEU

Appears in 5 sentences as: BLEU (5)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. training data and do not necessarily follow the tendency of the final BLEU scores.
    Page 4, “Experiments”
  2. For example, CCG is worse than Malt in terms of P/R yet achieves a higher BLEU score.
    Page 4, “Experiments”
  3. Also, PAS+sem has a lower P/R than Berkeley, yet their final BLEU scores are not statistically different.
    Page 4, “Experiments”
  4. Table 3 also shows the BLEU scores, the number of flat phrases and hierarchical rules (both integrated with target dependency structures), and the number of illegal dependency trees generated by each parser.
    Page 4, “Experiments”
  5. From the table, we have the following observations: (1) all the dependency structures (except Malt) achieved a significantly better BLEU score than the phrasal Moses; (2) PAS+syn performed best on the test set (0.3376), and it is significantly better than phrasal/hierarchical Moses (p < 0.01), MST (p < 0.05), Malt (p < 0.01), Berkeley (p < 0.05), and CCG (p < 0.05); and (3) CCG performed as well as MST and Berkeley.
    Page 4, “Experiments”

BLEU scores

Appears in 5 sentences as: BLEU score (2) BLEU scores (3)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. training data and do not necessarily follow the tendency of the final BLEU scores.
    Page 4, “Experiments”
  2. For example, CCG is worse than Malt in terms of P/R yet achieves a higher BLEU score.
    Page 4, “Experiments”
  3. Also, PAS+sem has a lower P/R than Berkeley, yet their final BLEU scores are not statistically different.
    Page 4, “Experiments”
  4. Table 3 also shows the BLEU scores, the number of flat phrases and hierarchical rules (both integrated with target dependency structures), and the number of illegal dependency trees generated by each parser.
    Page 4, “Experiments”
  5. From the table, we have the following observations: (1) all the dependency structures (except Malt) achieved a significantly better BLEU score than the phrasal Moses; (2) PAS+syn performed best on the test set (0.3376), and it is significantly better than phrasal/hierarchical Moses (p < 0.01), MST (p < 0.05), Malt (p < 0.01), Berkeley (p < 0.05), and CCG (p < 0.05); and (3) CCG performed as well as MST and Berkeley.
    Page 4, “Experiments”
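
The excerpts do not name the significance test behind the p-values in item 5. Paired bootstrap resampling (Koehn, 2004) is the usual choice for BLEU comparisons in SMT, so the sketch below illustrates that procedure under that assumption; `metric` stands in for a corpus-level BLEU function and is not defined in the paper.

```python
# Hedged sketch of paired bootstrap resampling (Koehn, 2004), a common way
# to obtain p-values like those in item 5. Assumed, not taken from the paper.
import random

def paired_bootstrap_p(metric, sys_a, sys_b, refs, trials=1000):
    """metric(hyps, refs) -> corpus-level score such as BLEU (higher = better).
    sys_a, sys_b: per-sentence outputs of the two systems being compared."""
    n = len(refs)
    wins = 0
    for _ in range(trials):
        # Resample the test set with replacement, keeping sentences paired.
        idx = [random.randrange(n) for _ in range(n)]
        score_a = metric([sys_a[i] for i in idx], [refs[i] for i in idx])
        score_b = metric([sys_b[i] for i in idx], [refs[i] for i in idx])
        if score_a > score_b:
            wins += 1
    # The fraction of resamples in which A fails to beat B approximates the
    # p-value for the claim "system A is better than system B".
    return 1.0 - wins / trials
```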

word-level

Appears in 5 sentences as: word-level (5)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. Arrows in red (upper): PASs; orange (bottom): word-level dependencies generated from PASs; blue: newly appended dependencies.
    Page 2, “Gaining Dependency Structures”
  2. In order to generate word-level dependency trees from the PCFG tree, we use the LTH constituent-to-dependency conversion tool written by Johansson and Nugues (2007).
    Page 2, “Gaining Dependency Structures”
  3. Table 1 lists the mapping from HPSG’s PAS types to word-level dependency arcs.
    Page 2, “Gaining Dependency Structures”
  4. In Table 1, this PAS type corresponds to arg2→pred→arg1, so the resulting word-level dependency is t6(is)→t0(when)→th(is).
    Page 2, “Gaining Dependency Structures” (see the code sketch after this list)
  5. The algorithm for generating word-level dependency trees is simpler than processing the PASs included in the HPSG trees, since word-level predicate-argument relations are already included in the output of the CCG parser.
    Page 3, “Gaining Dependency Structures”
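
Item 4 shows the general shape of the Table 1 mapping: each PAS type names an arc pattern over the roles pred/arg1/arg2, and instantiating the pattern with word indices yields word-level arcs. The sketch below is our illustration of that idea; the pattern table, type name, and indices are hypothetical, not Table 1's actual contents.

```python
# Hypothetical illustration of applying a Table-1-style mapping. A pattern
# such as arg2 -> pred -> arg1 (read dependent -> head, left to right) turns
# one PAS instance into a chain of word-level dependency arcs, as in item 4's
# t6(is) -> t0(when) example.

# Assumed pattern table; the type name and pattern are ours, not the paper's.
PATTERNS = {
    "conj_arg12": ["arg2", "pred", "arg1"],
}

def pas_to_arcs(pas_type, roles):
    """roles: role name -> word index. Returns (dependent, head) arc pairs."""
    chain = [roles[r] for r in PATTERNS[pas_type] if r in roles]
    return list(zip(chain, chain[1:]))

# A conjunction predicate at index 3 linking arguments at indices 1 and 8:
print(pas_to_arcs("conj_arg12", {"pred": 3, "arg1": 8, "arg2": 1}))
# -> [(1, 3), (3, 8)]  i.e. word 1 depends on word 3, word 3 on word 8
```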

dependency parsers

Appears in 4 sentences as: dependency parsers (2) Dependency parsing (1) dependency parsing (1)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs), which are generated by an HPSG parser and a CCG parser.
    Page 1, “Abstract”
  2. 2.2 Dependency parsing
    Page 2, “Gaining Dependency Structures”
  3. Graph-based and transition-based approaches are the two predominant paradigms for data-driven dependency parsing.
    Page 2, “Gaining Dependency Structures” (see the code sketch after this list)
  4. These results lead us to argue that the robustness of deep syntactic parsers can be advantageous in SMT compared with traditional dependency parsers .
    Page 4, “Experiments”
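
As a point of reference for item 3, here is a minimal sketch of the arc-standard transition system that underlies transition-based parsers such as Malt; graph-based parsers such as MST instead score all candidate arcs and extract a maximum spanning tree. The static-oracle formulation below is a textbook simplification, not MaltParser's implementation.

```python
# Minimal arc-standard transition system (the basis of transition-based
# parsing), run with a static oracle against gold heads. Sketch only.

def arc_standard_oracle(gold_heads):
    """gold_heads[i-1] is the gold head of word i, with 0 for ROOT; the
    input must be a projective tree. Returns the (head, dependent) arcs
    in the order they are derived."""
    n = len(gold_heads)
    stack, buf, arcs = [0], list(range(1, n + 1)), []
    attached = set()
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # LEFT-ARC: the word below the top is a gold dependent of the top.
            if below != 0 and gold_heads[below - 1] == top:
                arcs.append((top, below)); attached.add(below)
                stack.pop(-2); continue
            # RIGHT-ARC: the top's gold head is below it, and all of the
            # top's own gold dependents have already been attached.
            if gold_heads[top - 1] == below and all(
                    gold_heads[d - 1] != top or d in attached
                    for d in range(1, n + 1)):
                arcs.append((below, top)); attached.add(top)
                stack.pop(); continue
        if not buf:  # no transition applies: input was non-projective
            raise ValueError("non-projective gold tree")
        stack.append(buf.pop(0))  # SHIFT the next word onto the stack
    return arcs

# 3-word sentence whose root is word 2: derives (2,1), (2,3), (0,2).
print(arc_standard_oracle([2, 0, 2]))
```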

dependency relations

Appears in 3 sentences as: Dependency Relation (1) dependency relations (2)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. Dependency Relation
    Page 3, “Gaining Dependency Structures”
  2. Table 1: Mapping from HPSG’s PAS types to dependency relations.
    Page 3, “Gaining Dependency Structures”
  3. heads and (2) appending dependency relations for those words/punctuation that do not have any head.
    Page 3, “Gaining Dependency Structures”
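
Below is a hedged sketch of strategy (1), echoing the “PAS+syn” idea from the dependency-trees section of letting syntactic heads guide the choice when the PASs propose several heads, followed by strategy (2). The excerpts do not spell out the authors' tie-breaking rules, so the choices here (prefer the syntactic head, attach orphans to the sentence root) are our assumptions.

```python
# Assumed illustration of the two post-modification strategies in item 3;
# the tie-breaking rules are ours, not the authors' algorithm.

def select_heads(n_words, candidates, syntactic_head, root):
    """candidates: word index -> list of heads proposed by the PASs.
    syntactic_head: word index -> head from the syntactic parse, used to
    break ties in the spirit of "PAS+syn". root: sentence root index."""
    heads = {root: 0}  # 0 denotes the artificial ROOT
    # Strategy (1): keep exactly one head per word, preferring the head
    # that agrees with the syntactic parse when several are proposed.
    for word, cands in candidates.items():
        if word == root:
            continue
        if syntactic_head.get(word) in cands:
            heads[word] = syntactic_head[word]
        else:
            heads[word] = cands[0]  # fall back to the first proposed head
    # Strategy (2): append arcs for words/punctuation left without a head.
    for i in range(1, n_words + 1):
        heads.setdefault(i, root)
    return [heads[i] for i in range(1, n_words + 1)]
```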

NIST

Appears in 3 sentences as: NIST (3)
In A Comparative Study of Target Dependency Structures for Statistical Machine Translation
  1. Target language side dependency structures have been successfully used in statistical machine translation (SMT) by Shen et al. (2008) and achieved state-of-the-art results as reported in the NIST 2008 Open MT Evaluation workshop and the NTCIR-9 Chinese-to-English patent translation task (Goto et al., 2011; Ma and Matsoukas, 2011).
    Page 1, “Introduction”
  2. For Chinese-to-English translation, we use the parallel data from NIST Open Machine Translation Evaluation tasks.
    Page 3, “Experiments”
  3. The NIST 2003 and 2005 test data are respectively taken as the development and test set.
    Page 3, “Experiments”
