Abstract | This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores. |
Abstract | Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets. |
Dependency parsing for machine translation | In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding. |
Dependency parsing for machine translation | We now formalize weighted non-projective dependency parsing similarly to (McDonald et al., 2005b) and then describe a modified and more efficient version that can be integrated into a phrase-based decoder. |
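The arc-factored view above can be illustrated with a minimal sketch, assuming a hypothetical toy score function (not the paper's trained model): each word independently picks its highest-scoring head, which takes O(n²) time and coincides with the maximum spanning tree whenever the result is acyclic; a full MST parser additionally contracts cycles (Chu-Liu/Edmonds) when it is not.

```python
# Minimal sketch of arc-factored dependency head selection (toy scores).
# Each word picks its highest-scoring head independently: O(n^2) overall.
# This equals the maximum spanning tree when the result is acyclic; a full
# MST parser (Chu-Liu/Edmonds) would contract cycles otherwise.

def greedy_heads(words, score):
    """Return the head index for each word (0 = artificial ROOT)."""
    n = len(words)
    heads = [0] * n
    for dep in range(1, n + 1):          # 1-based word positions
        best, best_s = 0, score(0, dep)  # ROOT as the default head
        for head in range(1, n + 1):
            if head == dep:
                continue
            s = score(head, dep)
            if s > best_s:
                best, best_s = head, s
        heads[dep - 1] = best
    return heads

# Hypothetical toy score: prefer attaching each word to the previous one.
toy_score = lambda h, d: 1.0 if h == d - 1 else 0.0
print(greedy_heads(["the", "cat", "sat"], toy_score))  # [0, 1, 2]
```

The quadratic inner loop is what makes this formulation attractive inside a decoder, where parse scores must be recomputed for many partial hypotheses.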
Introduction | Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996). |
Introduction | In comparison, phrase-based decoding can run in linear time if a distortion limit is imposed. |
Introduction | Since exact MT decoding is NP-complete (Knight, 1999), there is no exact search algorithm for either phrase-based or syntactic MT that runs in polynomial time (unless P = NP). |
Discussion | We believe that our efficient algorithms will make them more widely applicable in both SCFG-based and phrase-based MT systems. |
Experiments | Our phrase-based statistical MT system is similar to the alignment template system described in (Och and Ney, 2004; Tromble et al., 2008). |
Experiments | We also train two SCFG-based MT systems: a hierarchical phrase-based SMT (Chiang, 2007) system and a syntax augmented machine translation (SAMT) system using the approach described in Zollmann and Venugopal (2006). |
Experiments | Both systems are built on top of our phrase-based systems. |
Introduction | These two techniques were originally developed for N-best lists of translation hypotheses and recently extended to translation lattices (Macherey et al., 2008; Tromble et al., 2008) generated by a phrase-based SMT system (Och and Ney, 2004). |
Introduction | SMT systems based on synchronous context free grammars (SCFG) (Chiang, 2007; Zollmann and Venugopal, 2006; Galley et al., 2006) have recently been shown to give competitive performance relative to phrase-based SMT. |
Translation Hypergraphs | A translation lattice compactly encodes a large number of hypotheses produced by a phrase-based SMT system. |
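As a sketch of how compact this encoding is, consider a hypothetical toy lattice: a DAG whose edges carry target phrases, where each start-to-final path is one translation hypothesis. A linear-time dynamic program counts the hypotheses the lattice encodes.

```python
# Toy translation lattice (hypothetical example): edges carry target phrases;
# every path from the start state to the final state is one hypothesis.
# Dynamic programming counts the encoded hypotheses in time linear in the
# number of edges, even though the count can be exponential in lattice depth.

from collections import defaultdict

edges = {  # state -> list of (next_state, target_phrase)
    0: [(1, "he"), (1, "it")],
    1: [(2, "bought"), (2, "purchased")],
    2: [(3, "a book")],
}

def count_paths(edges, start, final):
    counts = defaultdict(int)
    counts[start] = 1
    for state in sorted(edges):          # states assumed topologically ordered
        for nxt, _ in edges[state]:
            counts[nxt] += counts[state]
    return counts[final]

print(count_paths(edges, 0, 3))  # 4 hypotheses: 2 x 2 x 1
```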
Background | In phrase-based models, a decision can be translating a source phrase into a target phrase or reordering the target phrases. |
Experiments | The first model was the hierarchical phrase-based model (Chiang, 2005; Chiang, 2007). |
Introduction | We evaluated our joint decoder that integrated a hierarchical phrase-based model (Chiang, 2005; Chiang, 2007) and a tree-to-string model (Liu et al., 2006) on the NIST 2005 Chinese-English test set. |
Introduction | Some researchers prefer to say “phrase-based approaches” or “phrase-based systems”. |
Introduction | On the other hand, other authors (e.g., Och and Ney, 2004; Koehn et al., 2003; Chiang, 2007) do use the expression “phrase-based models”. |
Joint Decoding | Figure 2(a) demonstrates a translation hypergraph for one model, for example, a hierarchical phrase-based model. |
Joint Decoding | Although phrase-based decoders usually produce translations from left to right, they can adopt bottom-up decoding in principle. |
Joint Decoding | (2006) propose left-to-right target generation for hierarchical phrase-based translation. |
Abstract | Hierarchical phrase-based models are attractive because they provide a consistent framework within which to characterize both local and long-distance reorderings, but they also make it difficult to distinguish many implausible reorderings from those that are linguistically plausible. |
Abstract | Rather than appealing to annotation-driven syntactic modeling, we address this problem by observing the influential role of function words in determining syntactic structure, and introducing soft constraints on function word relationships as part of a standard log-linear hierarchical phrase-based model. |
Hierarchical Phrase-based System | Formally, a hierarchical phrase-based SMT system is based on a weighted synchronous context free grammar (SCFG) with one type of nonterminal symbol. |
Hierarchical Phrase-based System | Synchronous rules in hierarchical phrase-based models take the following form: |
Hierarchical Phrase-based System | Translation of a source sentence e using hierarchical phrase-based models is formulated as a search for the most probable derivation D* whose source side is equal to e: |
Introduction | Hierarchical phrase-based models (Chiang, 2005; Chiang, 2007) offer a number of attractive benefits in statistical machine translation (SMT), while maintaining the strengths of phrase-based systems (Koehn et al., 2003). |
Introduction | To model such a reordering, a hierarchical phrase-based system demands no additional parameters, since long and short distance reorderings are modeled identically using synchronous context free grammar (SCFG) rules. |
Introduction | Interestingly, hierarchical phrase-based models provide this benefit without making any linguistic commitments beyond the structure of the model. |
Overgeneration and Topological Ordering of Function Words | The problem may be less severe in hierarchical phrase-based MT than in BTG, since lexical items on the rules’ right-hand sides often limit the span of nonterminals. |
Abstract | We also present useful features that reflect the compositionality and discriminative power of a phrase and its constituent words for optimizing the weights of phrase use in phrase-based retrieval models. |
Introduction | Our approach to phrase-based retrieval is motivated by the following linguistic intuitions: a) phrases have relatively different degrees of significance, and b) the influence of a phrase should be differentiated based on the phrase’s constituents in retrieval models. |
Previous Work | One of the earliest works on phrase-based retrieval was done by Fagan (1987). |
Previous Work | In many cases, early research on phrase-based retrieval focused only on extracting phrases, without considering how to devise a retrieval model that effectively uses both words and phrases in ranking. |
Previous Work | While a phrase-based approach selectively incorporates potentially useful relations between words, the probabilistic approaches are forced to estimate parameters for all possible combinations of words in text. |
Proposed Method | In this section, we present a phrase-based retrieval framework that utilizes both words and phrases effectively in ranking. |
Proposed Method | 3.1 Basic Phrase-based Retrieval Model |
Proposed Method | We start out by presenting a simple phrase-based language modeling retrieval model that assumes uniform contribution of words and phrases. |
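A minimal sketch of such a uniform-contribution model, under assumed simplifications (add-one smoothed maximum-likelihood estimates and a fixed 0.5/0.5 interpolation; the segmentation into phrases is taken as given): the query is scored under a document model as a mixture of a word component and a phrase component.

```python
# Sketch of a phrase-based language-modeling retrieval score (hypothetical
# smoothing and weights, not the paper's exact model): a query is scored as
# a uniform (0.5/0.5) mixture of word-level and phrase-level log-likelihoods.

import math

def unit_logprob(unit, doc_units, vocab_size):
    # MLE with add-one smoothing over the document's units.
    return math.log((doc_units.count(unit) + 1) /
                    (len(doc_units) + vocab_size))

def score(query_words, query_phrases, doc_words, doc_phrases, v_w, v_p):
    word_lp = sum(unit_logprob(w, doc_words, v_w) for w in query_words)
    phrase_lp = sum(unit_logprob(p, doc_phrases, v_p) for p in query_phrases)
    return 0.5 * word_lp + 0.5 * phrase_lp  # uniform contribution

doc_w = ["machine", "translation", "machine", "learning"]
doc_p = ["machine translation", "machine learning"]
print(score(["machine"], ["machine translation"], doc_w, doc_p, 1000, 1000))
```

Replacing the fixed 0.5 weights with learned, phrase-specific weights is precisely the refinement the compositionality features above are meant to support.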
Abstract | Previous efforts add syntactic constraints to phrase-based translation by directly rewarding/punishing a hypothesis whenever it matches/violates source-side constituents. |
Introduction | The phrase-based approach is widely adopted in statistical machine translation (SMT). |
Introduction | In such a process, original phrase-based decoding (Koehn et al., 2003) does not take advantage of any linguistic analysis, which, however, is broadly used in rule-based approaches. |
Introduction | Since it is not linguistically motivated, original phrase-based decoding might produce ungrammatical or even wrong translations. |
The Syntax-Driven Bracketing Model 3.1 The Model | 3.3 The Integration of the SDB Model into Phrase-Based SMT |
The Syntax-Driven Bracketing Model 3.1 The Model | We integrate the SDB model into phrase-based SMT to help the decoder perform syntax-driven phrase translation. |
The Syntax-Driven Bracketing Model 3.1 The Model | In this paper, we implement the SDB model in a state-of-the-art phrase-based system which adapts a binary bracketing transduction grammar (BTG) (Wu, 1997) to phrase translation and reordering, described in (Xiong et al., 2006). |
Abstract | In this paper, we focus on the reverse mapping, showing that any phrase-based SMT decoding problem can be directly reformulated as a TSP. |
Conclusion | The main contribution of this paper has been to propose a transformation for an arbitrary phrase-based SMT decoding instance into a TSP instance. |
Introduction | Phrase-based systems (Koehn et al., 2003) are probably the most widespread class of Statistical Machine Translation systems, and arguably one of the most successful. |
Introduction | We will see in the next section that some characteristics of beam-search make it a suboptimal choice for phrase-based decoding, and we will propose an alternative. |
Introduction | This alternative is based on the observation that phrase-based decoding can be very naturally cast as a Traveling Salesman Problem (TSP), one of the best studied problems in combinatorial optimization. |
Phrase-based Decoding as TSP | Successful phrase-based systems typically employ language models of order higher than two. |
Related work | In (Tillmann and Ney, 2003) and (Tillmann, 2006), the authors modify a certain Dynamic Programming technique used for TSP for use with an IBM-4 word-based model and a phrase-based model respectively. |
Related work | with the mainstream phrase-based SMT models, and therefore making it possible to directly apply existing TSP solvers to SMT. |
The Traveling Salesman Problem and its variants | As will be shown in the next section, phrase-based SMT decoding can be directly reformulated as an AGTSP. |
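The AGTSP view can be made concrete with a toy instance under hypothetical costs (the options, cost function, and brute-force search below are illustrative stand-ins, not the paper's construction): groups are source words, nodes are phrase options covering them, and a valid tour visits exactly one node per group.

```python
# Toy illustration of phrase-based decoding as an asymmetric generalized TSP
# (hypothetical phrase options and distortion-style costs). Groups are source
# words, nodes are phrase options; a tour covers each source word exactly once.
# Brute force over node orderings stands in for a real TSP solver.

from itertools import permutations

# Each option covers a set of source positions and emits a target string.
options = [
    ({0}, "this"), ({0}, "it"),
    ({1}, "works"),
    ({0, 1}, "this works"),
]

def transition_cost(prev, nxt):          # hypothetical monotone-preferring cost
    return 0.0 if max(prev[0]) + 1 == min(nxt[0]) else 1.0

def decode(options, n_words):
    best = (float("inf"), None)
    for k in range(1, len(options) + 1):
        for tour in permutations(options, k):
            covered = set().union(*(o[0] for o in tour))
            if covered != set(range(n_words)):
                continue
            if sum(len(o[0]) for o in tour) != n_words:  # each word once
                continue
            cost = sum(transition_cost(a, b) for a, b in zip(tour, tour[1:]))
            best = min(best, (cost, " ".join(o[1] for o in tour)))
    return best

cost, translation = decode(options, 2)
print(cost, translation)  # cost 0.0 for any monotone covering tour
```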
Conclusions and Future Work | The experimental results show that our model outperforms the baseline models and verify the effectiveness of noncontiguous translational equivalences for noncontiguous phrase modeling in both syntax-based and phrase-based systems. |
Experiments | We compare the SncTSSG based model against two baseline models: the phrase-based and the STSSG-based models. |
Experiments | For the phrase-based model, we use Moses (Koehn et al., 2007) with its default settings; for the STSSG- and SncTSSG-based models we use our decoder Pisces by setting the following parameters: d = 4, h = 6, C = 6, l = 6, α = 50, β = 50. |
Experiments | Table 3 explores the contribution of the noncontiguous translational equivalence to phrase-based models (all the rules in Table 3 have no grammar tags, but a gap <***> is allowed in the last three rows). |
Introduction | Current research in statistical machine translation (SMT) mostly falls into either the phrase-based or the syntax-based paradigm. |
Introduction | Between them, the phrase-based approach (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004) allows local reordering and contiguous phrase translation. |
Introduction | However, it is hard for phrase-based models to learn global reorderings and to deal with noncontiguous phrases. |
The Pisces decoder | Consequently, this distortional operation, like phrase-based models, is much more flexible in ordering the target constituents than the traditional syntax-based models, which are limited by the syntactic structure. |
Abstract | We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting. |
Introduction | 0 We introduce new highly effective sentence selection methods that improve phrase-based SMT in the multilingual and single language pair setting. |
Sentence Selection: Multiple Language Pairs | For the single language pair setting, Haffari et al. (2009) present and compare several sentence selection methods for statistical phrase-based machine translation. |
Sentence Selection: Single Language Pair | Phrases are basic units of translation in phrase-based SMT models. |
Abstract | Comparable to the state-of-the-art phrase-based system Moses, using packed forests in tree-to-tree translation results in a significant absolute improvement of 3.6 BLEU points over using 1-best trees. |
Conclusion | Our system also achieves comparable performance with the state-of-the-art phrase-based system Moses. |
Experiments | The absence of such non-syntactic mappings prevents tree-based tree-to-tree models from achieving comparable results to phrase-based models. |
Related Work | They replace 1-best trees with packed forests both in training and decoding and show superior translation quality over the state-of-the-art hierarchical phrase-based system. |
Collaborative Decoding | Similar to a typical phrase-based decoder (Koehn, 2004), we associate each hypothesis with a coverage vector C to track translated source words in it. |
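A minimal sketch of this coverage-vector bookkeeping, using a bitmask over source positions (the helper below is a hypothetical illustration, not the paper's decoder): a partial hypothesis records which source words it has translated, so expansions never re-translate a covered word.

```python
# Sketch of coverage-vector bookkeeping (hypothetical toy decoder state):
# a bitmask over source positions records which words a partial hypothesis
# has already translated; an expansion is rejected if its phrase span
# overlaps the words already covered.

def expand(coverage, start, end, n_words):
    """Try to extend a hypothesis with a phrase spanning [start, end)."""
    span = ((1 << (end - start)) - 1) << start
    if coverage & span:                  # overlap: phrase re-covers a word
        return None
    new_cov = coverage | span
    complete = new_cov == (1 << n_words) - 1
    return new_cov, complete

cov, done = expand(0b000, 0, 2, 3)       # translate words 0-1
print(bin(cov), done)                    # 0b11 False
cov, done = expand(cov, 2, 3, 3)         # then word 2
print(bin(cov), done)                    # 0b111 True
```

Comparing these bitmasks is also exactly what lets a decoder decide which hypotheses are recombinable within a stack.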
Collaborative Decoding | But to be a general framework, this step is necessary for some state-of-the-art phrase-based decoders (Koehn, 2007; Och and Ney, 2004) because in these decoders, hypotheses with different coverage vectors can coexist in the same bin, or hypotheses associated with the same coverage vector might appear in different bins. |
Experiments | The first one (SYS 1) is a re-implementation of Hiero, a hierarchical phrase-based decoder. |
Background 2.1 Terminology | In MT, spurious ambiguity occurs both in regular phrase-based systems (e.g., Koehn et al. |
Background 2.1 Terminology | Figure 1: Segmentation ambiguity in phrase-based MT: two different segmentations lead to the same translation string. |
Background 2.1 Terminology | It can be used to encode exponentially many hypotheses generated by a phrase-based MT system (e.g., Koehn et al. |
Introduction | With phrase-based clustering, “Land of Odds” is grouped with many names that are labeled as company names, which is a strong indication that it is a company name as well. |
Introduction | The disambiguation power of phrases is also evidenced by the improvements of phrase-based machine translation systems (Koehn et. |
Introduction | We demonstrate the advantages of phrase-based clusters over word-based ones with experimental results from two distinct application domains: named entity recognition and query classification. |
Experiment | We use the first three syntax-based systems (TT2S, TTS2S, FT2S) and Moses (Koehn et al., 2007), the state-of-the-art phrase-based system, as our baseline systems. |
Forest-based tree sequence to string model | We use seven basic features that are analogous to the commonly used features in phrase-based systems (Koehn, 2003): 1) bidirectional rule mapping probabilities, 2) bidirectional lexical rule translation probabilities, 3) target language model, 4) number of rules used, and 5) number of target words. |
Related work | Motivated by the fact that non-syntactic phrases make nontrivial contribution to phrase-based SMT, the tree sequence-based translation model is proposed (Liu et al., 2007; Zhang et al., 2008a) that uses tree sequence as the basic translation unit, rather than using single subtree as in the STSG. |