Quadratic-Time Dependency Parsing for Machine Translation
Galley, Michel and Manning, Christopher D.

Article Structure

Abstract

Efficiency is a prime concern in syntactic MT decoding, yet significant developments in statistical parsing with respect to asymptotic efficiency have not yet been explored in MT.

Introduction

Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).

Dependency parsing for machine translation

In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding.

Dependency parsing experiments

In this section, we compare the performance of our parsing model to those of McDonald et al.

Machine translation experiments

In our experiments, we use a re-implementation of the Moses phrase-based decoder (Koehn et al., 2007).

Related work

Perhaps due to the high computational cost of synchronous CFG decoding, there have been various attempts to exploit syntactic knowledge and hierarchical structure in other machine translation experiments that do not require chart parsing.

Conclusion and future work

In this paper, we presented a non-projective dependency parser whose O(n²) time complexity improves upon the cubic-time implementation of McDonald et al. (2005b), and does so with little loss in dependency accuracy (0.25% to 0.34%).

Topics

phrase-based

Appears in 22 sentences as: phrase-based (22)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores.
    Page 1, “Abstract”
  2. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
    Page 1, “Abstract”
  3. Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).
    Page 1, “Introduction”
  4. In comparison, phrase-based decoding can run in linear time if a distortion limit is imposed.
    Page 1, “Introduction”
  5. Since exact MT decoding is NP-complete (Knight, 1999), there is no exact search algorithm for either phrase-based or syntactic MT that runs in polynomial time (unless P = NP).
    Page 1, “Introduction”
  6. competitive phrase-based systems in large-scale experiments such as NIST evaluations.² This lack of significant difference may not be completely surprising.
    Page 1, “Introduction”
  7. Indeed, researchers have shown that gigantic language models are key to state-of-the-art performance (Brants et al., 2007), and the ability of phrase-based decoders to handle large-size, high-order language models with no consequence on asymptotic running time during decoding presents a compelling advantage over CKY decoders, whose time complexity grows prohibitively large with higher-order language models.
    Page 1, “Introduction”
  8. ²Results of the 2008 NIST Open MT evaluation (http://www.itl.nist.gov/iad/mig/tests/mt/2008/doc/mt08_official_results_v0.html) reveal that, while many of the best systems in the Chinese-English and Arabic-English tasks incorporate synchronous CFG models, score differences with the best phrase-based system were insignificantly small.
    Page 1, “Introduction”
  9. dependency structure is built as a byproduct of phrase-based decoding, without reliance on a dynamic-programming or chart parsing algorithm such as CKY or Earley.
    Page 2, “Introduction”
  10. In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding.
    Page 2, “Dependency parsing for machine translation”
  11. We now formalize weighted non-projective dependency parsing similarly to (McDonald et al., 2005b) and then describe a modified and more efficient version that can be integrated into a phrase-based decoder.
    Page 3, “Dependency parsing for machine translation”


language model

Appears in 19 sentences as: language model (15) language modeling (1) language models (4)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores.
    Page 1, “Abstract”
  2. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
    Page 1, “Abstract”
  3. Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).
    Page 1, “Introduction”
  4. Indeed, researchers have shown that gigantic language models are key to state-of-the-art performance (Brants et al., 2007), and the ability of phrase-based decoders to handle large-size, high-order language models with no consequence on asymptotic running time during decoding presents a compelling advantage over CKY decoders, whose time complexity grows prohibitively large with higher-order language models.
    Page 1, “Introduction”
  5. Most interestingly, the time complexity of non-projective dependency parsing remains quadratic as the order of the language model increases.
    Page 2, “Introduction”
  6. This provides a compelling advantage over previous dependency language models for MT (Shen et al., 2008), which use a 5-gram LM only during reranking.
    Page 2, “Introduction”
  7. In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores.
    Page 2, “Introduction”
  8. While it seems that loopy graphs are undesirable when the goal is to obtain a syntactic analysis, that is not necessarily the case when one just needs a language modeling score.
    Page 4, “Dependency parsing for machine translation”
  9. We use the standard features implemented almost exactly as in Moses: four translation features (phrase-based translation probabilities and lexically-weighted probabilities), word penalty, phrase penalty, linear distortion, and language model score.
    Page 6, “Machine translation experiments”
  10. In order to train a competitive baseline given our computational resources, we built a large 5-gram language model using the Xinhua and AFP sections of the Gigaword corpus (LDC2007T40) in addition to the target side of the parallel data.
    Page 6, “Machine translation experiments”
  11. The language model was smoothed with the modified Kneser-Ney algorithm as implemented in (Stolcke, 2002), and we only kept 4-grams and 5-grams that occurred at least three times in the training data.6
    Page 6, “Machine translation experiments”
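Item 11 above describes a count cutoff applied before estimating the 5-gram LM: only 4-grams and 5-grams seen at least three times are kept. As a rough illustration only (the paper used the toolkit of Stolcke (2002); the plain-dictionary count representation and the function name below are assumptions, not the paper's code), that filtering step could look like this in Python:

# Sketch of the count cutoff described in item 11: keep every 1-3 gram, but drop
# 4-grams and 5-grams seen fewer than three times before LM estimation.
# The plain-dict representation of counts is an assumption for illustration.

def prune_counts(ngram_counts, min_count=3, cutoff_orders=(4, 5)):
    """ngram_counts: dict mapping an n-gram (tuple of tokens) to its corpus count."""
    return {ng: c for ng, c in ngram_counts.items()
            if len(ng) not in cutoff_orders or c >= min_count}

counts = {("the",): 100, ("the", "new", "rules", "were"): 2,
          ("the", "new", "rules", "were", "announced"): 3}
print(prune_counts(counts))   # drops the 4-gram seen only twice, keeps the rest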


dependency parsing

Appears in 18 sentences as: dependency parser (2) dependency parsing (18)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. McDonald et al. (2005b) formalized dependency parsing as a maximum spanning tree (MST) problem, which can be solved in quadratic time relative to the length of the sentence.
    Page 1, “Abstract”
  2. They show that MST parsing is almost as accurate as cubic-time dependency parsing in the case of English, and that it is more accurate with free word order languages.
    Page 1, “Abstract”
  3. While deterministic parsers are often deemed inadequate for dealing with ambiguities of natural language, highly accurate O(n²) algorithms exist in the case of dependency parsing.
    Page 1, “Introduction”
  4. McDonald et al. (2005b) present a quadratic-time dependency parsing algorithm that is just 0.7% less accurate than “full-fledged” chart parsing (which, in the case of dependency parsing, runs in time O(n³) (Eisner, 1996)).
    Page 1, “Introduction”
  5. Most interestingly, the time complexity of non-projective dependency parsing remains quadratic as the order of the language model increases.
    Page 2, “Introduction”
  6. In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding.
    Page 2, “Dependency parsing for machine translation”
  7. In the case of dependency parsing for Czech, (McDonald et al., 2005b) even outperforms projective parsing, and was one of the top systems in the CoNLL-06 shared task in multilingual dependency parsing .
    Page 3, “Dependency parsing for machine translation”
  8. 2.1 O(n²)-time dependency parsing for MT
    Page 3, “Dependency parsing for machine translation”
  9. We now formalize weighted non-projective dependency parsing similarly to (McDonald et al., 2005b) and then describe a modified and more efficient version that can be integrated into a phrase-based decoder.
    Page 3, “Dependency parsing for machine translation”
  10. whose weight vector λ is trained using MIRA (Crammer and Singer, 2003) to optimize dependency parsing accuracy (McDonald et al., 2005a).
    Page 3, “Dependency parsing for machine translation”
  11. Hence, non-projective dependency parsing is solved in quadratic time.
    Page 3, “Dependency parsing for machine translation”
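The excerpts above mention an edge-factored score and a weight vector trained with MIRA, and later excerpts refer to "Equation 2". Since the index does not reproduce the paper's equations, here is a sketch of the standard MST parsing objective of McDonald et al. (2005b) that these sentences describe, writing λ for the weight vector and f(i, j) for the feature vector of the edge from head i to modifier j (notation assumed, not quoted from the paper):

\[
  s(i, j) = \lambda \cdot \mathbf{f}(i, j), \qquad
  s(x, y) = \sum_{(i, j) \in y} s(i, j)
\]
\[
  \hat{y} = \operatorname*{arg\,max}_{y \in \mathcal{T}(x)} \; \sum_{(i, j) \in y} \lambda \cdot \mathbf{f}(i, j)
\]

where \(\mathcal{T}(x)\) is the set of spanning trees over the words of x rooted at the artificial root x0.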


treebank

Appears in 11 sentences as: Treebank (2) treebank (8) Treebanks (1) treebanks (4)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Our training data includes newswire from the English translation treebank (LDC2007T02) and the English-Arabic Treebank (LDC2006T10), which are respectively translations of sections of the Chinese treebank (CTB) and Arabic treebank (ATB).
    Page 5, “Dependency parsing experiments”
  2. We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn treebank .
    Page 5, “Dependency parsing experiments”
  3. Our other test set is the standard Section 23 of the Penn treebank .
    Page 5, “Dependency parsing experiments”
  4. For parsing, sentences are cased and tokenization abides by the PTB segmentation as used in the Penn Treebank version 3.
    Page 5, “Dependency parsing experiments”
  5. For this setting, we also had to harmonize the four treebanks .
    Page 5, “Dependency parsing experiments”
  6. The most crucial modification was to add NP internal bracketing to the WSJ (Vadas and Curran, 2007), since the three other treebanks contain that information.
    Page 5, “Dependency parsing experiments”
  7. Treebanks were also transformed to be consistent with MT tokenization.
    Page 5, “Dependency parsing experiments”
  8. To extract dependencies from treebanks, we used the LTH Penn Converter (http://nlp.
    Page 6, “Machine translation experiments”
  9. We constrain the converter not to use functional tags found in the treebanks , in order to make it possible to use automatically parsed texts (i.e., perform self-training) in future work.
    Page 6, “Machine translation experiments”
  10. Chinese words were automatically segmented with a conditional random field (CRF) classifier (Chang et al., 2008) that conforms to the Chinese Treebank (CTB) standard.
    Page 6, “Machine translation experiments”
  11. We used the dependency model trained on the English CTB and ATB treebanks, WSJ, and OntoNotes.
    Page 7, “Machine translation experiments”


LM

Appears in 9 sentences as: LM (9)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. This provides a compelling advantage over previous dependency language models for MT (Shen et al., 2008), which use a 5-gram LM only during reranking.
    Page 2, “Introduction”
  2. In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores.
    Page 2, “Introduction”
  3. This LM required 16GB of RAM during training.
    Page 6, “Machine translation experiments”
  4. Regarding the small difference in BLEU scores on MT08, we would like to point out that tuning on MT05 and testing on MT08 had a rather adverse effect with respect to translation length: while the two systems are relatively close in terms of BLEU scores (24.83 and 24.91, respectively), the dependency LM provides a much bigger gain when evaluated with BLEU precision (27.73 vs. 28.79), i.e., by ignoring the brevity penalty.
    Page 7, “Machine translation experiments”
  5. LM newswire web speech all
    Page 7, “Machine translation experiments”
  6. LM newswire web speech all
    Page 7, “Machine translation experiments”
  7. Figure 2: Running time of our phrase-based decoder with and without quadratic-time dependency LM scoring.
    Page 7, “Machine translation experiments”
  8. In the case of English translations of 40 words and shorter, the baseline system took 6.5 seconds per sentence, whereas the dependency LM system spent 15.6 seconds per sentence, i.e., 2.4 times the baseline running time.
    Page 7, “Machine translation experiments”
  9. In the latter paper, Huang and Chiang introduce rescoring methods named “cube pruning” and “cube growing”, which first use a baseline decoder (either synchronous CFG or a phrase-based system) and no LM to generate a hypergraph, and then rescore this hypergraph with a language model.
    Page 8, “Related work”


BLEU

Appears in 9 sentences as: BLEU (11)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
    Page 1, “Abstract”
  2. In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores.
    Page 2, “Introduction”
  3. We found that dependency scores with or without loop elimination are generally close and highly correlated, and that MT performance without final loop removal was about the same (generally less than 0.2% BLEU ).
    Page 4, “Dependency parsing for machine translation”
  4. Parameter tuning was done with minimum error rate training (Och, 2003), which was used to maximize BLEU (Papineni et al., 2001).
    Page 6, “Machine translation experiments”
  5. In the final evaluations, we report results using both TER (Snover et al., 2006) and the original BLEU metric as described in (Papineni et al., 2001).
    Page 6, “Machine translation experiments”
  6. For BLEU evaluations, differences are significant in four out of six cases, and in the case of TER, all differences are significant.
    Page 7, “Machine translation experiments”
  7. Regarding the small difference in BLEU scores on MT08, we would like to point out that tuning on MT05 and testing on MT08 had a rather adverse effect with respect to translation length: while the two systems are relatively close in terms of BLEU scores (24.83 and 24.91, respectively), the dependency LM provides a much bigger gain when evaluated with BLEU precision (27.73 vs. 28.79), i.e., by ignoring the brevity penalty.
    Page 7, “Machine translation experiments”
  8. BLEU [%]
    Page 7, “Machine translation experiments”
  9. We use dependency scores as an extra feature in our MT experiments, and found that our dependency model provides significant gains over a competitive baseline that incorporates a large 5-gram language model (0.92% TER and 0.45% BLEU absolute improvements).
    Page 8, “Conclusion and future work”
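Item 7 contrasts BLEU with "BLEU precision", i.e., the score with the brevity penalty ignored. For reference, a sketch of the standard BLEU definition (Papineni et al., 2001) behind that comparison, with p_n the modified n-gram precisions, c the candidate length, and r the reference length:

\[
  \mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\Big( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \Big),
  \qquad
  \mathrm{BP} =
  \begin{cases}
    1 & \text{if } c > r \\
    e^{\,1 - r/c} & \text{if } c \le r
  \end{cases}
\]

The "BLEU precision" figures quoted above (27.73 vs. 28.79) correspond to dropping the BP factor, which is what makes the translation-length effect visible.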


TER

Appears in 7 sentences as: TER (7)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
    Page 1, “Abstract”
  2. In our experiments, we build a competitive baseline (Koehn et al., 2007) incorporating a 5-gram LM trained on a large part of Gigaword and show that our dependency language model provides improvements on five different test sets, with an overall gain of 0.92 in TER and 0.45 in BLEU scores.
    Page 2, “Introduction”
  3. In the final evaluations, we report results using both TER (Snover et al., 2006) and the original BLEU metric as described in (Papineni et al., 2001).
    Page 6, “Machine translation experiments”
  4. For BLEU evaluations, differences are significant in four out of six cases, and in the case of TER , all differences are significant.
    Page 7, “Machine translation experiments”
  5. On the other hand, the difference on MT08 is significant in terms of TER .
    Page 7, “Machine translation experiments”
  6. TER [ %]
    Page 7, “Machine translation experiments”
  7. We use dependency scores as an extra feature in our MT experiments, and found that our dependency model provides significant gains over a competitive baseline that incorporates a large 5-gram language model (0.92% TER and 0.45% BLEU absolute improvements).
    Page 8, “Conclusion and future work”


machine translation

Appears in 7 sentences as: machine translation (8)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).
    Page 1, “Introduction”
  2. may sometimes appear too computationally expensive for high-end statistical machine translation, there are many alternative parsing algorithms that have seldom been explored in the machine translation literature.
    Page 1, “Introduction”
  3. In this paper, we show how to exploit syntactic dependency structure for better machine translation, under the constraint that the dependency structure is built as a byproduct of phrase-based decoding.
    Page 1, “Introduction”
  4. (2005b) for machine translation, we incrementally build dependency structure left-to-right in time O(n²) during decoding.
    Page 2, “Introduction”
  5. Dependency models have recently gained considerable interest in many NLP applications, including machine translation (Ding and Palmer, 2005; Quirk et al., 2005; Shen et al., 2008).
    Page 2, “Dependency parsing for machine translation”
  6. For the MT setting, texts are all lower case, and tokenization was changed to improve machine translation (e.g., most hyphenated words were split).
    Page 5, “Dependency parsing experiments”
  7. Perhaps due to the high computational cost of synchronous CFG decoding, there have been various attempts to exploit syntactic knowledge and hierarchical structure in other machine translation experiments that do not require chart parsing.
    Page 8, “Related work”


word order

Appears in 6 sentences as: word order (6)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. They show that MST parsing is almost as accurate as cubic-time dependency parsing in the case of English, and that it is more accurate with free word order languages.
    Page 1, “Abstract”
  2. Finally, dependency models are more flexible and account for (non-projective) head-modifier relations that CFG models fail to represent adequately, which is problematic with certain types of grammatical constructions and with free word order languages,
    Page 2, “Dependency parsing for machine translation”
  3. It is also linguistically desirable in the case of free word order languages such as Czech, Dutch, and German.
    Page 2, “Dependency parsing for machine translation”
  4. with relatively rigid word order such as English, there may be some concern that searching the space of non-projective dependency trees, which is considerably larger than the space of projective dependency trees, would yield poor performance.
    Page 3, “Dependency parsing for machine translation”
  5. (2007) sidestep the need to operate large-scale word order changes during decoding (and thus lessening the need for syntactic decoding) by rearranging input words in the training data to match the syntactic structure of the target language.
    Page 8, “Related work”
  6. It would also be interesting to apply these models to target languages that have free word order , which would presumably benefit more from the flexibility of non-projective dependency models.
    Page 8, “Conclusion and future work”


spanning tree

Appears in 6 sentences as: spanning tree (7)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. McDonald et al. (2005b) formalized dependency parsing as a maximum spanning tree (MST) problem, which can be solved in quadratic time relative to the length of the sentence.
    Page 1, “Abstract”
  2. In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding.
    Page 2, “Dependency parsing for machine translation”
  3. This optimization problem is a known instance of the maximum spanning tree (MST) problem.
    Page 3, “Dependency parsing for machine translation”
  4. Finding the spanning tree y ⊆ E rooted at x0 that maximizes s(x, y) as defined in Equation 2 has a straightforward solution in O(n² log n) time for dense graphs such as G, though Tarjan (1977) shows that the problem can be solved in O(n²).
    Page 3, “Dependency parsing for machine translation”
  5. Once all loops have been eliminated, the algorithm maps back the maximum spanning tree of the contracted graph onto the original graph G, and it can be shown that this yields a spanning tree that is optimal with respect to G and s (Georgiadis, 2003).
    Page 3, “Dependency parsing for machine translation”
  6. finds the maximum spanning tree .
    Page 4, “Dependency parsing for machine translation”
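Items 2 and 3 of the "highest scoring" topic below describe the Chu-Liu-Edmonds (CLE) procedure that these excerpts summarize: greedily pick the best incoming edge for every word, then repeatedly find and contract loops. A minimal Python sketch of the greedy step and the loop check follows; the contraction and expansion steps of the full algorithm are only indicated in a comment, and all names (edge_score, greedy_heads, find_loop) are illustrative assumptions, not the paper's code.

# Sketch of the first phase of the CLE procedure described above: pick the
# highest-scoring incoming edge for every non-root word, then look for a loop.
# Names and data layout are illustrative; they do not come from the paper.

def greedy_heads(edge_score, n):
    """edge_score[(i, j)]: score of attaching modifier j to head i; node 0 is the root.
    Returns a dict mapping each modifier 1..n-1 to its best head."""
    return {j: max((i for i in range(n) if i != j), key=lambda i: edge_score[(i, j)])
            for j in range(1, n)}

def find_loop(heads):
    """Follow head pointers from every node; return one loop as a list, or None."""
    for start in heads:
        path, node = [], start
        while node in heads and node not in path:
            path.append(node)
            node = heads[node]
        if node in path:                 # walked back onto the path: a loop
            return path[path.index(node):]
    return None

# Example: a 3-word "sentence" (node 0 is the artificial root).
scores = {(i, j): 1.0 for i in range(4) for j in range(1, 4) if i != j}
scores[(2, 1)] = scores[(1, 2)] = 5.0    # words 1 and 2 prefer each other -> a loop
heads = greedy_heads(scores, 4)
print(heads, find_loop(heads))
# Full CLE would now contract the loop {1, 2} into one vertex, rescore its incoming
# and outgoing edges, and recurse; for MT decoding the paper notes that loops can
# simply be tolerated when only a language-model-like score is needed.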


dependency trees

Appears in 5 sentences as: dependency tree (1) dependency trees (5)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. The parsing literature presents faster alternatives for both phrase-structure and dependency trees, e.g., O(n) shift-reduce parsers and variants ((Ratnaparkhi, 1997; Nivre, 2003), inter alia).
    Page 1, “Introduction”
  2. Second, dependency trees contain exactly one node per word, which contributes to cutting down the search space during parsing: indeed, the task of the parser is merely to connect existing nodes rather than hypothesizing new ones.
    Page 2, “Dependency parsing for machine translation”
  3. Figure 1: A dependency tree with directed edges going from heads to modifiers.
    Page 2, “Dependency parsing for machine translation”
  4. Their algorithm exploits the special properties of dependency trees to reduce the worst-case complexity of bilexical parsing, which otherwise requires O(n⁴) for bilexical constituency-based parsing.
    Page 2, “Dependency parsing for machine translation”
  5. with relatively rigid word order such as English, there may be some concern that searching the space of non-projective dependency trees, which is considerably larger than the space of projective dependency trees , would yield poor performance.
    Page 3, “Dependency parsing for machine translation”


maximum spanning

Appears in 5 sentences as: maximum spanning (5)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. McDonald et al. (2005b) formalized dependency parsing as a maximum spanning tree (MST) problem, which can be solved in quadratic time relative to the length of the sentence.
    Page 1, “Abstract”
  2. In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding.
    Page 2, “Dependency parsing for machine translation”
  3. This optimization problem is a known instance of the maximum spanning tree (MST) problem.
    Page 3, “Dependency parsing for machine translation”
  4. Once all loops have been eliminated, the algorithm maps back the maximum spanning tree of the contracted graph onto the original graph G, and it can be shown that this yields a spanning tree that is optimal with respect to G and s (Georgiadis, 2003).
    Page 3, “Dependency parsing for machine translation”
  5. finds the maximum spanning tree.
    Page 4, “Dependency parsing for machine translation”


NIST

Appears in 5 sentences as: NIST (5)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets.
    Page 1, “Abstract”
  2. competitive phrase-based systems in large-scale experiments such as NIST evaluations.² This lack of significant difference may not be completely surprising.
    Page 1, “Introduction”
  3. ²Results of the 2008 NIST Open MT evaluation (http://www.itl.nist.gov/iad/mig/tests/mt/2008/doc/mt08_official_results_v0.html) reveal that, while many of the best systems in the Chinese-English and Arabic-English tasks incorporate synchronous CFG models, score differences with the best phrase-based system were insignificantly small.
    Page 1, “Introduction”
  4. For tuning and testing, we use the official NIST MT evaluation data for Chinese from 2002 to 2008 (MT02 to MT08), which all have four English references for each input sentence.
    Page 6, “Machine translation experiments”
  5. Table 6 provides experimental results on the NIST test data (excluding the tuning set MT05) for each of the three genres: newswire, web data, and speech (broadcast news and conversation).
    Page 7, “Machine translation experiments”


POS tags

Appears in 5 sentences as: POS tag (1) POS tags (4)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. • a predicted POS tag tj; • a dependency score sj.
    Page 3, “Dependency parsing for machine translation”
  2. We write h-word, h-pos, m-word, m-pos to refer to head and modifier words and POS tags , and append a numerical value to shift the word offset either to the left or to the right (e.g., h-pos+1 is the POS to the right of the head word).
    Page 4, “Dependency parsing for machine translation”
  3. It is quite similar to the McDonald (2005a) feature set, except that it does not include the set of all POS tags that appear between each candidate head-modifier pair (i, j).
    Page 4, “Dependency parsing for machine translation”
  4. First, we restrict extraction of in-between POS tags to those words that appear within a window of five words relative to either the head or the modifier.
    Page 4, “Dependency parsing for machine translation”
  5. The only features that are not cached are the ones that include contextual POS tags , since their miss rate is relatively high.
    Page 5, “Dependency parsing experiments”
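Items 1-4 above describe the feature templates: head and modifier words and POS tags, shifted-offset tags such as h-pos+1, and in-between POS tags restricted to a five-word window around the head or modifier. A small Python sketch of what such an extractor could look like; the function, padding, and exact feature strings are illustrative guesses, not the paper's actual implementation.

# Illustrative sketch of edge feature extraction along the lines described above.
# Feature-template names (h-word, h-pos, h-pos+1, ...) follow the excerpts; the
# function itself is a guess at one possible implementation, not the paper's code.

def edge_features(words, tags, head, mod, window=5):
    """Return a list of string-valued features for the candidate edge head -> mod."""
    def pos(k):                     # POS tag with padding outside the sentence
        return tags[k] if 0 <= k < len(tags) else "<PAD>"

    feats = [
        f"h-word={words[head]}",
        f"h-pos={tags[head]}",
        f"m-word={words[mod]}",
        f"m-pos={tags[mod]}",
        f"h-pos+1={pos(head + 1)}",            # shifted-offset templates
        f"m-pos-1={pos(mod - 1)}",
        f"h-pos,m-pos={tags[head]},{tags[mod]}",
    ]
    # In-between POS tags, restricted to words within `window` positions of the
    # head or the modifier (the restriction mentioned in item 4 above).
    lo, hi = min(head, mod), max(head, mod)
    for k in range(lo + 1, hi):
        if abs(k - head) <= window or abs(k - mod) <= window:
            feats.append(f"between-pos={pos(k)}")
    return feats

words = ["<root>", "economic", "news", "had", "little", "effect"]
tags  = ["<root>", "JJ", "NN", "VBD", "JJ", "NN"]
print(edge_features(words, tags, head=3, mod=2))   # candidate edge "had" -> "news"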


model score

Appears in 4 sentences as: model score (2) model scores (1) modeling score (1)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores .
    Page 1, “Abstract”
  2. While it seems that loopy graphs are undesirable when the goal is to obtain a syntactic analysis, that is not necessarily the case when one just needs a language modeling score .
    Page 4, “Dependency parsing for machine translation”
  3. We use the standard features implemented almost exactly as in Moses: four translation features (phrase-based translation probabilities and lexically-weighted probabilities), word penalty, phrase penalty, linear distortion, and language model score .
    Page 6, “Machine translation experiments”
  4. model score computed with the dependency parsing algorithm described in Section 2.
    Page 7, “Machine translation experiments”
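Item 3 lists the standard Moses-style features and item 4 refers to the dependency model score that is added to them, with weights tuned by minimum error rate training. A sketch of the usual log-linear combination under which such an extra feature enters the decoder's search (the notation below is assumed, not quoted from the paper):

\[
  \hat{e} = \operatorname*{arg\,max}_{e}
  \Big( \sum_{k} \lambda_k \, h_k(e, f) \;+\; \lambda_{\mathrm{dep}} \, h_{\mathrm{dep}}(e) \Big)
\]

Here the h_k are the translation probabilities, word and phrase penalties, linear distortion, and n-gram LM score, and h_dep is the dependency score computed incrementally during decoding with the parsing algorithm of Section 2.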


parsing algorithm

Appears in 4 sentences as: parsing algorithm (3) parsing algorithms (1)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. may sometimes appear too computationally expensive for high-end statistical machine translation, there are many alternative parsing algorithms that have seldom been explored in the machine translation literature.
    Page 1, “Introduction”
  2. McDonald et al. (2005b) present a quadratic-time dependency parsing algorithm that is just 0.7% less accurate than “full-fledged” chart parsing (which, in the case of dependency parsing, runs in time O(n³) (Eisner, 1996)).
    Page 1, “Introduction”
  3. dependency structure is built as a byproduct of phrase-based decoding, without reliance on a dynamic-programming or chart parsing algorithm such as CKY or Earley.
    Page 2, “Introduction”
  4. model score computed with the dependency parsing algorithm described in Section 2.
    Page 7, “Machine translation experiments”


Penn treebank

Appears in 3 sentences as: Penn treebank (3)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. We also trained the parser on the broadcast-news treebank available in the OntoNotes corpus (LDC2008T04), and added sections 02-21 of the WSJ Penn treebank .
    Page 5, “Dependency parsing experiments”
  2. Our other test set is the standard Section 23 of the Penn treebank .
    Page 5, “Dependency parsing experiments”
  3. For parsing, sentences are cased and tokenization abides by the PTB segmentation as used in the Penn Treebank version 3.
    Page 5, “Dependency parsing experiments”


highest scoring

Appears in 3 sentences as: highest score (1) highest scoring (2)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. When there is no need to ensure projectivity, one can independently select the highest scoring edge (i, j) for each modifier xj, yet we generally want to ensure that the resulting structure is a tree, i.e., that it does not contain any circular dependencies.
    Page 3, “Dependency parsing for machine translation”
  2. The main idea behind the CLE algorithm is to first greedily select for each word xj the incoming edge (i, j) with highest score, then to successively repeat the following two steps: (a) identify a loop in the graph, and if there is none, halt; (b) contract the loop into a single vertex, and update scores for edges coming in and out of the loop.
    Page 3, “Dependency parsing for machine translation”
  3. The greedy approach of selecting the highest scoring edge (i, j) for each modifier xj can easily be applied left-to-right during phrase-based decoding, which proceeds in the same order.
    Page 3, “Dependency parsing for machine translation”
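Item 3 says the greedy selection of the highest-scoring incoming edge for each modifier can be applied left-to-right as the phrase-based decoder extends a hypothesis. A minimal sketch of that idea follows, assuming some edge-scoring function; the incremental bookkeeping shown here is only an illustration of the O(n²) behaviour, not the paper's actual decoder code.

# Left-to-right greedy dependency scoring, in the spirit of item 3 above.
# Each time the decoder appends a target word, the word picks its best head among the
# words generated so far (or the root), and earlier words may switch to the new word
# if it scores higher as their head.  O(n) work per word, O(n^2) per hypothesis.
# `edge_score` and the bookkeeping below are illustrative assumptions.

def extend(state, new_index, edge_score):
    """state: dict word index -> (best head, best edge score) for the prefix so far."""
    state = dict(state)                                   # decoder states must not be shared
    prev = list(state)                                    # previously generated words
    # best head for the new word, chosen among the root (0) and all previous words
    candidates = [0] + prev
    best = max(candidates, key=lambda i: edge_score(i, new_index))
    state[new_index] = (best, edge_score(best, new_index))
    # the new word may also be a better head for words generated earlier
    for j in prev:
        s = edge_score(new_index, j)
        if s > state[j][1]:
            state[j] = (new_index, s)
    return state

def hypothesis_score(state):
    """Dependency feature value: sum of the currently best edge scores."""
    return sum(s for (_, s) in state.values())

# Toy usage: word 0 is the root; the toy score simply prefers short attachments.
def toy_score(i, j): return -abs(i - j)
state = {}
for w in range(1, 5):                                     # words are appended left to right
    state = extend(state, w, toy_score)
print(state, hypothesis_score(state))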


feature sets

Appears in 3 sentences as: feature set (1) feature sets (2)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. The three feature sets that were used in our experiments are shown in Table 2.
    Page 4, “Dependency parsing for machine translation”
  2. It is quite similar to the McDonald (2005a) feature set, except that it does not include the set of all POS tags that appear between each candidate head-modifier pair (i, j).
    Page 4, “Dependency parsing for machine translation”
  3. The primary difference between our feature sets and those of McDonald et al.
    Page 4, “Dependency parsing for machine translation”


Chinese words

Appears in 3 sentences as: Chinese words (3)
In Quadratic-Time Dependency Parsing for Machine Translation
  1. Chinese words drawn from various news parallel corpora distributed by the Linguistic Data Consortium (LDC).
    Page 6, “Machine translation experiments”
  2. Chinese words were automatically segmented with a conditional random field (CRF) classifier (Chang et al., 2008) that conforms to the Chinese Treebank (CTB) standard.
    Page 6, “Machine translation experiments”
  3. Since parsing of the source is relatively inexpensive compared to the target side, it would be relatively easy to condition head-modifier dependencies not only on the two target words, but also on their corresponding Chinese words and their relative positions in the Chinese tree.
    Page 8, “Conclusion and future work”
