A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
Liu, Yang

Article Structure

Abstract

We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.

Introduction

Modern statistical machine translation approaches can be roughly divided into two broad categories: phrase-based and syntax-based.

Topics

phrase-based

Appears in 32 sentences as: Phrase-based (2) phrase-based (31)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 1, “Abstract”
  2. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
    Page 1, “Abstract”
  3. Modern statistical machine translation approaches can be roughly divided into two broad categories: phrase-based and syntax-based.
    Page 1, “Introduction”
  4. Phrase-based approaches treat phrase, which is usually a sequence of consecutive words, as the basic unit of translation (Koehn et al., 2003; Och and Ney, 2004).
    Page 1, “Introduction”
  5. As phrases are capable of memorizing local context, phrase-based approaches excel at handling local word selection and reordering.
    Page 1, “Introduction”
  6. In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
    Page 1, “Introduction”
  7. As a result, phrase-based decoders only need to maintain the boundary words on one end to calculate language model probabilities.
    Page 1, “Introduction”
  8. However, as phrase-based decoding usually casts translation as a string concatenation problem and permits arbitrary permutation, it proves to be NP-complete (Knight, 1999).
    Page 1, “Introduction”
  9. While some authors try to integrate syntax into phrase-based decoding (Galley and Manning, 2008; Galley and Manning, 2009; Feng et al., 2010), others develop incremental algorithms for syntax-based models (Watanabe et al., 2006; Huang and Mi, 2010; Dyer and Resnik, 2010; Feng et al., 2012).
    Page 1, “Introduction”
  10. While parsing algorithms can be used to parse partial translations in phrase-based decoding, the search space is significantly enlarged since there are exponentially many parse trees for exponentially many translations.
    Page 1, “Introduction”
  11. In this paper, we propose a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 2, “Introduction”
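
Excerpts 6 and 7 above explain why left-to-right decoding makes n-gram language model integration cheap: a hypothesis only has to remember the boundary words on one end, i.e., the last n-1 target words. A minimal sketch of that idea, where the Hypothesis type and the ngram_logprob callback are illustrative names rather than anything from the paper:

    # Minimal sketch of left-to-right hypothesis extension in a phrase-based decoder.
    # Hypothesis and ngram_logprob are illustrative, not the paper's implementation.
    from typing import NamedTuple, Tuple

    N = 3  # n-gram order of the language model

    class Hypothesis(NamedTuple):
        coverage: frozenset          # covered source positions
        lm_state: Tuple[str, ...]    # last N-1 target words: the only LM context kept
        score: float

    def extend(hyp, src_span, tgt_words, phrase_score, ngram_logprob):
        """Append one target phrase; only boundary words on one end are maintained."""
        lm_score, context = 0.0, hyp.lm_state
        for w in tgt_words:
            lm_score += ngram_logprob(context, w)    # log P(w | previous N-1 words)
            context = (context + (w,))[-(N - 1):]    # slide the boundary window
        return Hypothesis(hyp.coverage | frozenset(src_span),
                          context,
                          hyp.score + phrase_score + lm_score)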

shift-reduce

Appears in 27 sentences as: Shift-Reduce (2) Shift-reduce (2) shift-reduce (23)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 1, “Abstract”
  2. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data.
    Page 1, “Abstract”
  3. In this paper, we propose a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 2, “Introduction”
  4. 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering.
    Page 2, “Introduction”
  5. 2 Shift-Reduce Parsing for Phrase-based String-to-Dependency Translation
    Page 2, “Introduction”
  6. To integrate the advantages of phrase-based and string-to-dependency models, we propose a shift-reduce algorithm for phrase-based string-to-dependency translation.
    Page 3, “Introduction”
  7. Note that unlike monolingual shift-reduce parsers (Nivre, 2004; Zhang and Clark, 2008; Huang et al., 2009), our algorithm does not maintain a queue for remaining words of the input because the future dependency structure to be shifted is unknown in advance in the translation scenario.
    Page 3, “Introduction”
  8. Table 1: Conflicts in shift-reduce parsing.
    Page 4, “Introduction”
  9. 3 A Maximum Entropy Based Shift-Reduce Parsing Model
    Page 4, “Introduction”
  10. Shift-reduce parsing is efficient but suffers from parsing errors caused by syntactic ambiguity.
    Page 4, “Introduction”
  11. This is often referred to as conflict in the shift-reduce dependency parsing literature (Huang et al., 2009).
    Page 4, “Introduction”
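
The excerpts above outline the core of the algorithm: a stack of well-formed target dependency structures that is extended by shift and reduce transitions, with no queue of remaining words because future structures come from phrase pairs chosen during decoding. A rough sketch under those assumptions; the Node type, the action names, and the head-direction convention follow common arc-standard practice and are not necessarily the paper's exact reduce definitions:

    # Sketch of transitions over a stack of target-side dependency structures.
    # shift pushes the memorized structure of a newly translated phrase pair;
    # the two reduce actions attach the top two structures to each other.
    class Node:
        def __init__(self, word):
            self.word = word
            self.left, self.right = [], []   # left and right dependents

    def shift(stack, structure):
        stack.append(structure)              # structure comes from a phrase pair

    def reduce_left(stack):
        dep = stack.pop(-2)                  # second-from-top becomes a left
        stack[-1].left.insert(0, dep)        # dependent of the new top

    def reduce_right(stack):
        dep = stack.pop()                    # old top becomes a right dependent
        stack[-1].right.append(dep)          # of the structure below it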

language models

Appears in 15 sentences as: language model (8) language models (9)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models.
    Page 1, “Abstract”
  2. In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
    Page 1, “Introduction”
  3. As a result, phrase-based decoders only need to maintain the boundary words on one end to calculate language model probabilities.
    Page 1, “Introduction”
  4. Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007).
    Page 1, “Introduction”
  5. 3. efficient integration of n-gram language model: as translation grows left-to-right in our algorithm, integrating n-gram language models is straightforward.
    Page 2, “Introduction”
  6. 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering.
    Page 2, “Introduction”
  7. Therefore, with dependency structures present in the stacks, it is possible to use dependency language models to encourage linguistically plausible phrase reordering.
    Page 4, “Introduction”
  8. 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model.
    Page 6, “Introduction”
  9. As the stack of a state keeps changing during the decoding process, the context information needed to calculate dependency language model and maximum entropy model probabilities (e.g., root word, leftmost child, etc.)
    Page 6, “Introduction”
  10. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  11. 4-gram language model on the Xinhua portion of the GIGAWORD corpus, which contains 238M English words.
    Page 7, “Introduction”
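
Besides the n-gram model, the excerpts mention dependency language models scored over the target trees built during decoding. The following is a hedged sketch of what such a score might look like, with the dependents of each head scored head-outward in the spirit of Shen et al. (2008); prob_left and prob_right are assumed probability lookups, and the node objects are assumed to expose word/left/right attributes as in the shift-reduce sketch above:

    # Hedged sketch of a head-outward dependency LM score: left and right
    # dependents of each head are scored as chains anchored at the head word.
    # prob_left/prob_right are assumed lookup functions, not the paper's model.
    import math

    def dep_lm_logprob(node, prob_left, prob_right):
        score, prev = 0.0, node.word
        for child in reversed(node.left):        # nearest-to-farthest on the left
            score += math.log(prob_left(child.word, prev, node.word))
            prev = child.word
        prev = node.word
        for child in node.right:                 # nearest-to-farthest on the right
            score += math.log(prob_right(child.word, prev, node.word))
            prev = child.word
        for child in node.left + node.right:     # recurse into the subtrees
            score += dep_lm_logprob(child, prob_left, prob_right)
        return score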

BLEU

Appears in 11 sentences as: BLEU (11)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Experiments show that our approach significantly outperforms both phrase-based (Koehn et al., 2007) and string-to-dependency approaches (Shen et al., 2008) in terms of BLEU and TER.
    Page 2, “Introduction”
  2. | features | BLEU | TER |
    Page 7, “Introduction”
  3. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  4. We evaluated translation quality using uncased BLEU (Papineni et al., 2002) and TER (Snover et al., 2006).
    Page 7, “Introduction”
  5. The features were optimized with respect to BLEU using the minimum error rate training algorithm (Och, 2003).
    Page 7, “Introduction”
  6. | rules | coverage | BLEU | TER |
     | well-formed | 44.87 | 34.42 | 57.35 |
     | all | 100.00 | 35.71** | 55.87** |
    Page 7, “Introduction”
  7. BLEU and TER scores are calculated on the development set.
    Page 7, “Introduction”
  8. Table 2 gives the performance of Moses, the bottom-up string-to-dependency system, and our system in terms of uncased BLEU and TER scores.
    Page 7, “Introduction”
  9. BLEU
    Page 8, “Introduction”
  10. The gains in TER are much larger than BLEU because dependency language models do not model n-grams directly.
    Page 8, “Introduction”
  11. The BLEU score is slightly lower than Moses with the same configuration.
    Page 8, “Introduction”
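
For reference, the uncased BLEU score used above follows the standard definition of Papineni et al. (2002), combining modified n-gram precisions p_n up to order 4 with a brevity penalty:

    \mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \right),
    \qquad \mathrm{BP} = \min\left( 1,\; e^{1 - r/c} \right)

where c is the candidate length and r the effective reference length. TER (Snover et al., 2006) counts the edits (insertions, deletions, substitutions, and block shifts) needed to turn the candidate into a reference, normalized by the average reference length, so lower TER is better while higher BLEU is better.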

dependency trees

Appears in 11 sentences as: dependency tree (3) dependency trees (8)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models.
    Page 1, “Abstract”
  2. 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering.
    Page 2, “Introduction”
  3. 5. resolving local parsing ambiguity: as dependency trees for phrases are memorized in rules, our approach avoids resolving local parsing ambiguity and explores in a smaller search space than parsing word-by-word on the fly in decoding (Galley and Manning, 2009).
    Page 2, “Introduction”
  4. Figure 1 shows a training example consisting of a (romanized) Chinese sentence, an English dependency tree, and the word alignment between them.
    Page 2, “Introduction”
  5. The decoding process terminates when all source words are covered and there is a complete dependency tree in the stack.
    Page 3, “Introduction”
  6. Figure 3 shows two (partial) derivations for a dependency tree.
    Page 4, “Introduction”
  7. We used the Stanford parser (Klein and Manning, 2003) to get dependency trees for English sentences.
    Page 6, “Introduction”
  8. Note that the first pass does not work like a phrase-based decoder because it yields dependency trees on the target side.
    Page 6, “Introduction”
  9. A 3-gram dependency language model was trained on the English dependency trees.
    Page 7, “Introduction”
  10. More importantly, in our work dependency trees are memorized for phrases rather than being generated word by word on the fly in decoding.
    Page 8, “Introduction”
  11. In addition, their algorithm produces phrasal dependency parse trees while the leaves of our dependency trees are words, so that dependency language models can be directly used.
    Page 8, “Introduction”
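
The excerpts stress that dependency trees for phrases are memorized inside rules rather than built word by word during decoding. A minimal sketch of such a rule; the class and field names are hypothetical, since the paper does not prescribe a data format:

    # Sketch of a string-to-dependency rule: a source phrase paired with a target
    # phrase whose internal dependency links were memorized at extraction time.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class DepPhrase:
        src_words: List[str]
        tgt_words: List[str]
        heads: List[Optional[int]]   # heads[i] = index of the head of target word i,
                                     # or None if its head lies outside the phrase

        def roots(self) -> List[int]:
            """Target words whose heads are not inside the phrase."""
            return [i for i, h in enumerate(self.heads) if h is None]

    # Toy rule (made-up words): "meeting" depends on "held", "a" on "meeting",
    # and the head of "held" lies outside the phrase.
    rule = DepPhrase(["juxing", "le", "huitan"], ["held", "a", "meeting"], [None, 2, 0])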

maximum entropy

Appears in 10 sentences as: Maximum Entropy (1) maximum entropy (9)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data.
    Page 1, “Abstract”
  2. 3 A Maximum Entropy Based Shift-Reduce Parsing Model
    Page 4, “Introduction”
  3. We propose a maximum entropy model to resolve the conflicts for “h+h”:
    Page 5, “Introduction”
  4. 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model.
    Page 6, “Introduction”
  5. As the stack of a state keeps changing during the decoding process, the context information needed to calculate dependency language model and maximum entropy model probabilities (e.g., root word, leftmost child, etc.)
    Page 6, “Introduction”
  6. Table 3: Contribution of maximum entropy shift-reduce parsing model.
    Page 7, “Introduction”
  7. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  8. For our system, we used Le Zhang’s maximum entropy modeling toolkit to train the shift-reduce parsing model after extracting 32.6M events from the training data.
    Page 7, “Introduction”
  9. We find that adding dependency language and maximum entropy shift-reduce models consistently brings significant improvements, both separately and jointly.
    Page 8, “Introduction”
  10. In the future, we plan to include more contextual information (e.g., the uncovered source phrases) in the maximum entropy model to resolve conflicts.
    Page 9, “Introduction”
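
The excerpts describe a maximum entropy model that chooses among conflicting actions given the current context. A toy sketch of the conditional model P(action | context) = exp(w·f(action, context)) / Z; the action set, feature names, and weights are illustrative, not the paper's feature templates:

    # Toy maximum entropy (multinomial logistic) scorer for shift/reduce conflicts.
    import math

    ACTIONS = ["shift", "reduce_left", "reduce_right"]

    def maxent_prob(action, features, weights):
        def dot(a):
            return sum(weights.get((a, name), 0.0) * value
                       for name, value in features.items())
        z = sum(math.exp(dot(a)) for a in ACTIONS)
        return math.exp(dot(action)) / z

    # Example: features drawn from the stack context of the current state.
    features = {"top_root=talks": 1.0, "second_root=held": 1.0}
    weights = {("reduce_right", "top_root=talks"): 0.7}
    p = maxent_prob("reduce_right", features, weights)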

TER

Appears in 8 sentences as: TER (8)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Experiments show that our approach significantly outperforms both phrase-based (Koehn et al., 2007) and string-to-dependency approaches (Shen et al., 2008) in terms of BLEU and TER.
    Page 2, “Introduction”
  2. | features | BLEU | TER |
    Page 7, “Introduction”
  3. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  4. We evaluated translation quality using uncased BLEU (Papineni et al., 2002) and TER (Snover et al., 2006).
    Page 7, “Introduction”
  5. | rules | coverage | BLEU | TER |
     | well-formed | 44.87 | 34.42 | 57.35 |
     | all | 100.00 | 35.71** | 55.87** |
    Page 7, “Introduction”
  6. BLEU and TER scores are calculated on the development set.
    Page 7, “Introduction”
  7. Table 2 gives the performance of Moses, the bottom-up string-to-dependency system, and our system in terms of uncased BLEU and TER scores.
    Page 7, “Introduction”
  8. The gains in TER are much larger than BLEU because dependency language models do not model n-grams directly.
    Page 8, “Introduction”

phrase pairs

Appears in 7 sentences as: phrase pair (2) phrase pairs (5)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Moreover, syntax-based approaches often suffer from the rule coverage problem since syntactic constraints rule out a large portion of non-syntactic phrase pairs, which might help decoders generalize well to unseen data (Marcu et al., 2006).
    Page 1, “Introduction”
  2. The basic unit of translation in our model is the string-to-dependency phrase pair, which consists of a phrase on the source side and a dependency structure on the target side.
    Page 2, “Introduction”
  3. The algorithm generates well-formed dependency structures for partial translations left-to-right using string-to-dependency phrase pairs.
    Page 2, “Introduction”
  4. 2. full rule coverage: all phrase pairs, both syntactic and non-syntactic, can be used in our algorithm.
    Page 2, “Introduction”
  5. To control the grammar size, we only extracted “tight” initial phrase pairs (i.e., the boundary words of a phrase must be aligned) as suggested by Chiang (2007).
    Page 7, “Introduction”
  6. The major difference is that our hypergraph is not a phrasal lattice because each phrase pair is associated with a dependency structure on the target side.
    Page 8, “Introduction”
  7. The algorithm generates dependency structures incrementally using string-to-dependency phrase pairs.
    Page 9, “Introduction”
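
Excerpt 5 mentions extracting only “tight” initial phrase pairs, i.e., phrase pairs whose boundary words are aligned (Chiang, 2007). A small sketch of that check, assuming the word alignment is given as a set of (source index, target index) pairs; only tightness is checked here, not the usual alignment-consistency constraint:

    # Sketch of the "tight" phrase-pair condition: the boundary words on both
    # sides of the candidate phrase pair must be aligned.
    def is_tight(src_span, tgt_span, alignment):
        """Spans are inclusive (start, end) index pairs; alignment is a set of (i, j)."""
        src_aligned = {i for i, _ in alignment}
        tgt_aligned = {j for _, j in alignment}
        (s1, s2), (t1, t2) = src_span, tgt_span
        return (s1 in src_aligned and s2 in src_aligned and
                t1 in tgt_aligned and t2 in tgt_aligned)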

parsing algorithm

Appears in 6 sentences as: parsing algorithm (5) parsing algorithms (1)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 1, “Abstract”
  2. While parsing algorithms can be used to parse partial translations in phrase-based decoding, the search space is significantly enlarged since there are exponentially many parse trees for exponentially many translations.
    Page 1, “Introduction”
  3. In this paper, we propose a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 2, “Introduction”
  4. 4. exploiting syntactic information: as the shift-reduce parsing algorithm generates target language dependency trees in decoding, dependency language models (Shen et al., 2008; Shen et al., 2010) can be used to encourage linguistically-motivated reordering.
    Page 2, “Introduction”
  5. Our decoding algorithm is similar to Gimpel and Smith (2011)’s lattice parsing algorithm as we divide decoding into two steps: hypergraph generation and hypergraph rescoring.
    Page 8, “Introduction”
  6. We have presented a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
    Page 9, “Introduction”

Chinese-English

Appears in 5 sentences as: Chinese-English (5)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
    Page 1, “Abstract”
  2. We evaluate our method on the NIST Chinese-English translation datasets.
    Page 2, “Introduction”
  3. Empirically, we find that the average number of stacks for J words is about 1.5 × J on the Chinese-English data.
    Page 3, “Introduction”
  4. We evaluated our phrase-based string-to-dependency translation system on Chinese-English translation.
    Page 6, “Introduction”
  5. We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the test sets.
    Page 7, “Introduction”

parsing model

Appears in 5 sentences as: Parsing Model (1) parsing model (4)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. 3 A Maximum Entropy Based Shift-Reduce Parsing Model
    Page 4, “Introduction”
  2. 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model.
    Page 6, “Introduction”
  3. Table 3: Contribution of maximum entropy shift-reduce parsing model.
    Page 7, “Introduction”
  4. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  5. For our system, we used Le Zhang’s maximum entropy modeling toolkit to train the shift-reduce parsing model after extracting 32.6M events from the training data.
    Page 7, “Introduction”

n-gram

Appears in 5 sentences as: n-gram (6)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models.
    Page 1, “Abstract”
  2. In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
    Page 1, “Introduction”
  3. Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007).
    Page 1, “Introduction”
  4. 3. efficient integration of n-gram language model: as translation grows left-to-right in our algorithm, integrating n-gram language models is straightforward.
    Page 2, “Introduction”
  5. 1. relative frequencies in two directions; 2. lexical weights in two directions; 3. phrase penalty; 4. distance-based reordering model; 5. lexicalized reordering model; 6. n-gram language model; 7. word penalty; 8. ill-formed structure penalty; 9. dependency language model; 10. maximum entropy parsing model.
    Page 6, “Introduction”

development set

Appears in 4 sentences as: development set (4)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  2. We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the test sets.
    Page 7, “Introduction”
  3. BLEU and TER scores are calculated on the development set.
    Page 7, “Introduction”
  4. Figure 5 shows the performance of Moses and our system with various distortion limits on the development set.
    Page 8, “Introduction”

reranking

Appears in 4 sentences as: reranking (3) reranks (1)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Therefore, we use hypergraph reranking (Huang and Chiang, 2007; Huang, 2008), which proves to be effective for integrating nonlocal features into dynamic programming, to alleviate this problem.
    Page 6, “Introduction”
  2. In the second pass, we use the hypergraph reranking algorithm (Huang, 2008) to find promising translations using additional dependency features (i.e., features 8-10 in the list).
    Page 6, “Introduction”
  3. Table 3 shows the effect of hypergraph reranking.
    Page 8, “Introduction”
  4. In the second pass, our decoder reranks the hypergraph with additional dependency features.
    Page 8, “Introduction”
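
The excerpts describe two-pass decoding: a hypergraph is first built with the local features, then reranked with the nonlocal dependency features. Below is a much-simplified 1-best sketch of the second pass; it walks the hypergraph bottom-up and adds weighted extra feature scores on each hyperedge, without the cube pruning of Huang (2008); the edge interface and names are assumptions rather than the actual algorithm:

    # Simplified 1-best hypergraph rescoring: keep the best derivation score per
    # node, adding weighted nonlocal feature scores (dependency LM, maxent
    # parsing model, ...) on every incoming hyperedge.
    def rescore(nodes, incoming_edges, extra_features, weights):
        """nodes must be in bottom-up (topological) order."""
        best = {}
        for node in nodes:
            edges = incoming_edges(node)
            if not edges:                      # leaf node: nothing to combine
                best[node] = 0.0
                continue
            best[node] = max(
                edge.score
                + sum(best[t] for t in edge.tails)
                + sum(weights.get(name, 0.0) * value
                      for name, value in extra_features(edge).items())
                for edge in edges)
        return best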

dependency parse

Appears in 4 sentences as: dependency parse (2) dependency parsing (2)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. They suffice to operate on well-formed structures and produce projective dependency parse trees.
    Page 4, “Introduction”
  2. This is often referred to as conflict in the shift-reduce dependency parsing literature (Huang et al., 2009).
    Page 4, “Introduction”
  3. Unfortunately, such an oracle turns out to be non-unique even for monolingual shift-reduce dependency parsing (Huang et al., 2009).
    Page 5, “Introduction”
  4. In addition, their algorithm produces phrasal dependency parse trees while the leaves of our dependency trees are words, so that dependency language models can be directly used.
    Page 8, “Introduction”

NIST

Appears in 3 sentences as: NIST (4)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
    Page 1, “Abstract”
  2. We evaluate our method on the NIST Chinese-English translation datasets.
    Page 2, “Introduction”
  3. We used the 2002 NIST MT Chinese-English dataset as the development set and the 2003-2005 NIST datasets as the test sets.
    Page 7, “Introduction”

parse trees

Appears in 3 sentences as: parse trees (3)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. While parsing algorithms can be used to parse partial translations in phrase-based decoding, the search space is significantly enlarged since there are exponentially many parse trees for exponentially many translations.
    Page 1, “Introduction”
  2. They suffice to operate on well-formed structures and produce projective dependency parse trees.
    Page 4, “Introduction”
  3. In addition, their algorithm produces phrasal dependency parse trees while the leaves of our dependency trees are words, so that dependency language models can be directly used.
    Page 8, “Introduction”

maxent

Appears in 3 sentences as: maxent (2) “maxent” (1)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. | standard | 34.79 | 56.93 |
     | + depLM | 35.29* | 56.17** |
     | + maxent | 35.40** | 56.09** |
    Page 7, “Introduction”
  2. | + depLM & maxent | 35.71** | 55.87** |
    Page 7, “Introduction”
  3. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”

phrase table

Appears in 3 sentences as: phrase table (2) phrase tables (1)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. Furthermore, the introduction of non-terminals makes the grammar size significantly bigger than phrase tables and leads to higher memory requirement (Chiang, 2007).
    Page 1, “Introduction”
  2. On the other hand, although target words can be generated left-to-right by altering the way of tree traversal in syntax-based models, it is still difficult to reach full rule coverage as compared with the phrase table.
    Page 1, “Introduction”
  3. The phrase table limit is set to 20 for all three systems.
    Page 7, “Introduction”

significant improvements

Appears in 3 sentences as: significant improvements (2) significantly improves (1)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
    Page 1, “Abstract”
  2. Adding dependency language model (“depLM”) and the maximum entropy shift-reduce parsing model (“maxent”) significantly improves BLEU and TER on the development set, both separately and jointly.
    Page 7, “Introduction”
  3. We find that adding dependency language and maximum entropy shift-reduce models consistently brings significant improvements, both separately and jointly.
    Page 8, “Introduction”

contextual information

Appears in 3 sentences as: context information (1) contextual information (2)
In A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation
  1. As the stack of a state keeps changing during the decoding process, the context information needed to calculate dependency language model and maximum entropy model probabilities (e.g., root word, leftmost child, etc.)
    Page 6, “Introduction”
  2. As a result, the chance of risk-free hypothesis recombination (Koehn et al., 2003) significantly decreases because complicated contextual information is much less likely to be identical.
    Page 6, “Introduction”
  3. In the future, we plan to include more contextual information (e.g., the uncovered source phrases) in the maximum entropy model to resolve conflicts.
    Page 9, “Introduction”
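
The first two excerpts explain why hypothesis recombination becomes rarer: two hypotheses can only be merged when every piece of context that future feature scores may consult is identical, and with dependency models that context goes beyond coverage and n-gram state to include stack-internal information such as root words and their boundary children. A hedged sketch of what such a recombination signature could contain; the field names are illustrative:

    # Sketch of a recombination signature: hypotheses are merged only if these
    # values match, since later dependency LM and maxent scores depend on them.
    def recombination_key(hyp):
        return (
            frozenset(hyp.coverage),                       # covered source positions
            hyp.lm_state,                                  # last n-1 target words
            tuple((item.word,                              # root word of each stack item
                   tuple(c.word for c in item.left[:1]),   # its leftmost child, if any
                   tuple(c.word for c in item.right[-1:])) # its rightmost child, if any
                  for item in hyp.stack),
        )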
