Abstract | This paper proposes new distortion models for phrase-based SMT. |
Distortion Model for Phrase-Based SMT | A Moses-style phrase-based SMT system generates target hypotheses sequentially from left to right.
Introduction | To address this problem, there has been a great deal of research into word reordering: lexical reordering models (Tillmann, 2004), one family of distortion models; reordering constraints (Zens et al., 2004); pre-ordering (Xia and McCord, 2004); hierarchical phrase-based SMT (Chiang, 2007); and syntax-based SMT (Yamada and Knight, 2001).
Introduction | Phrase-based SMT (Koehn et al., 2007) is a widely used SMT method that does not use a parser. |
Introduction | Phrase-based SMT mainly estimates word reordering using distortion models.
Abstractive Caption Generation | Phrase-based Model The model outlined in equation (8) will generate captions with function words. |
Abstractive Caption Generation | Search To generate a caption it is necessary to find the sequence of words that maximizes P(w1,w2, ...,wn) for the word-based model (equation (8)) and P(p1, p2, ..., pm) for the phrase-based model (equation (15)). |
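Such a search can be sketched as a beam search over partial word sequences. The bigram scoring interface `log_prob` below is a hypothetical stand-in for whichever model (the word-based or phrase-based one) supplies the probabilities; maximizing the sum of log-probabilities maximizes the product P(w1, w2, ..., wn):

```python
import heapq
import math

def beam_search(vocab, log_prob, length, beam_size=10, start="<s>"):
    """Find a high-probability word sequence of a fixed length.

    log_prob(prev, word) returns log P(word | prev); summing these
    log-probabilities corresponds to multiplying the probabilities.
    """
    beam = [(0.0, [start])]
    for _ in range(length):
        candidates = []
        for score, seq in beam:
            for w in vocab:
                candidates.append((score + log_prob(seq[-1], w), seq + [w]))
        # keep only the best partial hypotheses
        beam = heapq.nlargest(beam_size, candidates, key=lambda c: c[0])
    best_score, best_seq = max(beam, key=lambda c: c[0])
    return best_seq[1:], best_score
```

With a richer history (phrase contexts, length penalties) the same loop structure applies; only the scoring function changes.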
Experimental Setup | Documents and captions were parsed with the Stanford parser (Klein and Manning, 2003) in order to obtain dependencies for the phrase-based abstractive model. |
Experimental Setup | We tuned the caption length parameter on the development set using a range of [5, 14] tokens for the word-based model and [2, 5] phrases for the phrase-based model. |
Experimental Setup | For the phrase-based model, we also experimented with reducing the search scope, either by considering only the n most similar sentences to the keywords (range [2,10]), or simply the single most similar sentence and its neighbors (range [2, 5]). |
Results | different from the phrase-based abstractive system.
Results | Table 4: Captions written by humans (G) and generated by the extractive (KL), word-based abstractive (AW), and phrase-based abstractive (AP) systems.
Results | It is significantly worse than the phrase-based abstractive system (α < 0.01), the extractive system (α < 0.01), and the gold standard (α < 0.01).
Abstract | We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.
Experiments | We use a BTG phrase-based system with a MaxEnt-based lexicalized reordering model (Wu, 1997; Xiong et al., 2006) as our baseline system for
Integration into SMT system | There are two ways to integrate the ranking reordering model into a phrase-based SMT system: the pre-reorder method, and the decoding time constraint method. |
Integration into SMT system | Reordered sentences can go through the normal pipeline of a phrase-based decoder. |
Integration into SMT system | The above three penalties are added as additional features into the log-linear model of the phrase-based system. |
Introduction | In phrase-based models (Och, 2002; Koehn et al., 2003), the phrase is introduced to serve as the fundamental translation element and to handle local reordering, while a distance-based distortion model coarsely depicts the exponentially decaying word-movement probabilities in language translation.
Introduction | Long-distance word reordering between language pairs with substantial word order differences, such as Japanese with Subject-Object-Verb (SOV) structure and English with Subject-Verb-Object (SVO) structure, is generally viewed as beyond the scope of the phrase-based systems discussed above, because of either distortion limits or a lack of discriminative features for modeling.
Introduction | This is usually done in a preprocessing step, and then followed by a standard phrase-based SMT system that takes the reordered source sentence as input to finish the translation. |
Abstract | We make use of collocation probabilities, which are estimated from monolingual corpora, in two ways: improving word alignment for various kinds of SMT systems, and improving the phrase table for phrase-based SMT.
Abstract | As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system. |
Experiments on Phrase-Based SMT | Moses (Koehn et al., 2007) is used as the baseline phrase-based SMT system. |
Experiments on Phrase-Based SMT | 6.2 Effect of improved word alignment on phrase-based SMT |
Improving Phrase Table | A phrase-based SMT system automatically extracts bilingual phrase pairs from the word-aligned bilingual corpus.
Improving Phrase Table | These collocation probabilities are incorporated into the phrase-based SMT system as features. |
Introduction | In phrase-based SMT (Koehn et al., 2003), the phrase boundary is usually determined based on the bidirectional word alignments. |
Introduction | Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve the phrase table for phrase-based SMT.
Introduction | Then the phrase collocation probabilities are used as additional features in phrase-based SMT systems. |
Abstract | Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora. |
Abstract | This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior. |
Analysis | We have presented a novel method for learning a phrase-based model of translation directly from parallel data, which we have framed as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Experiments | As a baseline, we train a phrase-based model using the Moses toolkit based on the word alignments obtained using GIZA++ in both directions and symmetrized using the grow-diag-final-and heuristic (Koehn et al., 2003).
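The grow-diag-final-and heuristic referenced here can be sketched compactly. Alignments are represented as sets of (source, target) index pairs; this is an illustrative reading of the heuristic, not Moses' exact implementation:

```python
def grow_diag_final_and(e2f, f2e):
    """Symmetrize two directional word alignments (sets of index pairs).

    Start from the intersection, grow into neighbouring points found in
    the union (as long as one side of the point is still unaligned),
    then add union points whose words are both unaligned ("final-and").
    """
    union = e2f | f2e
    aligned = e2f & f2e
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                  (0, 1), (1, -1), (1, 0), (1, 1)]
    added = True
    while added:  # grow phase: repeat until no point can be added
        added = False
        for (e, f) in sorted(aligned):
            for (de, df) in neighbours:
                p = (e + de, f + df)
                if p in union and p not in aligned and (
                        all(pe != p[0] for pe, _ in aligned)
                        or all(pf != p[1] for _, pf in aligned)):
                    aligned.add(p)
                    added = True
    # final-and: union points whose source AND target are both unaligned
    for p in sorted(union - aligned):
        if all(pe != p[0] for pe, _ in aligned) and \
           all(pf != p[1] for _, pf in aligned):
            aligned.add(p)
    return aligned
```

The intersection is high-precision, the union high-recall; growing from one toward the other is what makes the heuristic a useful middle ground for phrase extraction.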
Introduction | The phrase-based approach (Koehn et al., 2003) to machine translation (MT) has transformed MT from a narrow research topic into a truly useful technology to end users. |
Introduction | Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence aligned parallel data, from |
Introduction | Firstly, many phrase-based phenomena which do not decompose into word translations (e.g., idioms) will be missed, as the underlying word-based alignment model is unlikely to propose the correct alignments. |
Related Work | A number of other approaches have been developed for learning phrase-based models from bilingual data, starting with Marcu and Wong (2002) who developed an extension to IBM model 1 to handle multi-word units. |
A Phrase-Based Error Model | The goal of the phrase-based error model is to transform a correctly spelled query C into a misspelled query Q. |
A Phrase-Based Error Model | If we assume a uniform probability over segmentations, then the phrase-based probability can be defined as: |
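The formula this sentence introduces did not survive extraction. A plausible reconstruction, in the Brill-and-Moore style that phrase-based error models build on (with S ranging over segmentations of C into phrases c_i, each aligned to a phrase q_i of Q; the exact notation is an assumption):

```latex
P(Q \mid C) \;=\; \max_{S} \prod_{i=1}^{|S|} P(q_i \mid c_i)
```

Under the uniform-segmentation assumption, the segmentation prior is a constant and drops out of the maximization, leaving only the product of phrase transformation probabilities.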
Abstract | Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. |
Abstract | Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
Introduction | Among these models, the most effective one is a phrase-based error model that captures the probability of transforming one multi-term phrase into another multi-term phrase. |
Introduction | Compared to traditional error models that account for transformation probabilities between single characters (Kernighan et al., 1990) or sub-word strings (Brill and Moore, 2000), the phrase-based model is more powerful in that it captures some contextual information by retaining inter-term dependencies.
Introduction | In particular, the speller system incorporating a phrase-based error model significantly outperforms its baseline systems. |
Related Work | To this end, inspired by the phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003; Och and Ney, 2004), we propose a phrase-based error model where we assume that query spelling correction is performed at the phrase level. |
Related Work | In what follows, before presenting the phrase-based error model, we will first describe the clickthrough data and the query speller system we used in this study. |
The Baseline Speller System | Figure 2: Example demonstrating the generative procedure behind the phrase-based error model. |
Abstract | Previous efforts add syntactic constraints to phrase-based translation by directly rewarding/punishing a hypothesis whenever it matches/violates source-side constituents. |
Introduction | The phrase-based approach is widely adopted in statistical machine translation (SMT). |
Introduction | In such a process, original phrase-based decoding (Koehn et al., 2003) does not take advantage of any linguistic analysis, which, however, is broadly used in rule-based approaches. |
Introduction | Since it is not linguistically motivated, original phrase-based decoding might produce ungrammatical or even wrong translations. |
The Syntax-Driven Bracketing Model 3.1 The Model | 3.3 The Integration of the SDB Model into Phrase-Based SMT |
The Syntax-Driven Bracketing Model 3.1 The Model | We integrate the SDB model into phrase-based SMT to help the decoder perform syntax-driven phrase translation.
The Syntax-Driven Bracketing Model 3.1 The Model | In this paper, we implement the SDB model in a state-of-the-art phrase-based system which adapts a binary bracketing transduction grammar (BTG) (Wu, 1997) to phrase translation and reordering, described in (Xiong et al., 2006). |
Conclusion & Future Work | We will also add capability of supporting syntax-based ensemble decoding and experiment how a phrase-based system can benefit from syntax information present in a syntax-aware MT system. |
Ensemble Decoding | The current implementation is able to combine hierarchical phrase-based systems (Chiang, 2005) as well as phrase-based translation systems (Koehn et al., 2003). |
Ensemble Decoding | phrase-based, hierarchical phrase-based, and/or syntax-based systems.
Experiments & Results 4.1 Experimental Setup | For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count. |
Experiments & Results 4.1 Experimental Setup | For ensemble decoding, we modified an in-house implementation of hierarchical phrase-based system, Kriya (Sankaran et al., 2012) which uses the same features mentioned in (Chiang, 2005): forward and backward relative-frequency and lexical TM probabilities; LM; word, phrase and glue-rules penalty. |
Experiments & Results 4.1 Experimental Setup | The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system. |
Introduction | We have modified Kriya (Sankaran et al., 2012), an in-house implementation of hierarchical phrase-based translation system (Chiang, 2005), to implement ensemble decoding using multiple translation models. |
Related Work 5.1 Domain Adaptation | Among these approaches are sentence-based, phrase-based and word-based output combination methods. |
Related Work 5.1 Domain Adaptation | Firstly, unlike the multi-table support of Moses which only supports phrase-based translation table combination, our approach supports ensembles of both hierarchical and phrase-based systems. |
Abstract | We also present useful features that reflect the compositionality and discriminative power of a phrase and its constituent words for optimizing the weights of phrase use in phrase-based retrieval models. |
Introduction | Our approach to phrase-based retrieval is motivated from the following linguistic intuitions: a) phrases have relatively different degrees of significance, and b) the influence of a phrase should be differentiated based on the phrase’s constituents in retrieval models. |
Previous Work | One of the earliest works on phrase-based retrieval was done by Fagan (1987).
Previous Work | In many cases, early research on phrase-based retrieval focused only on extracting phrases, without considering how to devise a retrieval model that effectively considers both words and phrases in ranking.
Previous Work | While a phrase-based approach selectively incorporates potentially useful relations between words, the probabilistic approaches are forced to estimate parameters for all possible combinations of words in text.
Proposed Method | In this section, we present a phrase-based retrieval framework that utilizes both words and phrases effectively in ranking. |
Proposed Method | 3.1 Basic Phrase-based Retrieval Model |
Proposed Method | We start out by presenting a simple phrase-based language modeling retrieval model that assumes uniform contribution of words and phrases. |
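A minimal sketch of such a uniform-contribution model: the document is scored by query likelihood under two smoothed unigram models, one over words and one over phrases, combined with equal weight. The add-one smoothing and the 0.5/0.5 interpolation are illustrative simplifications, not the paper's exact formulation:

```python
import math
from collections import Counter

def lm_score(units, doc_units, smoothing=1.0):
    """Log query likelihood of `units` under an add-`smoothing`
    (Laplace) smoothed unigram model of the document."""
    counts = Counter(doc_units)
    total = len(doc_units)
    vocab = len(counts) + 1  # reserve one slot for unseen units
    return sum(math.log((counts[u] + smoothing) / (total + smoothing * vocab))
               for u in units)

def phrase_based_score(query_words, query_phrases, doc_words, doc_phrases):
    """Uniform-contribution combination: the word model and the phrase
    model each contribute with equal weight, as in the basic model."""
    return 0.5 * lm_score(query_words, doc_words) + \
           0.5 * lm_score(query_phrases, doc_phrases)
```

The later sections' refinements amount to replacing the fixed 0.5 weights with per-phrase weights derived from compositionality and discriminative-power features.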
Abstract | Hierarchical phrase-based models are attractive because they provide a consistent framework within which to characterize both local and long-distance reorderings, but they also make it difficult to distinguish many implausible reorderings from those that are linguistically plausible. |
Abstract | Rather than appealing to annotation-driven syntactic modeling, we address this problem by observing the influential role of function words in determining syntactic structure, and introducing soft constraints on function word relationships as part of a standard log-linear hierarchical phrase-based model. |
Hierarchical Phrase-based System | Formally, a hierarchical phrase-based SMT system is based on a weighted synchronous context free grammar (SCFG) with one type of nonterminal symbol. |
Hierarchical Phrase-based System | Synchronous rules in hierarchical phrase-based models take the following form: |
Hierarchical Phrase-based System | Translation of a source sentence e using hierarchical phrase-based models is formulated as a search for the most probable derivation D* whose source side is equal to e:
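The search objective can be written out as follows (a standard log-linear formulation; the feature functions phi_i and weights lambda_i are the usual assumption rather than quoted from the source):

```latex
D^{*} \;=\; \operatorname*{argmax}_{D \,:\, \mathrm{src}(D) = e} P(D)
      \;=\; \operatorname*{argmax}_{D \,:\, \mathrm{src}(D) = e} \prod_{i} \phi_i(D)^{\lambda_i}
```

Decoding then amounts to CKY-style chart parsing over the SCFG rules, with language model scores folded into the chart items.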
Introduction | Hierarchical phrase-based models (Chiang, 2005; Chiang, 2007) offer a number of attractive benefits in statistical machine translation (SMT), while maintaining the strengths of phrase-based systems (Koehn et al., 2003). |
Introduction | To model such a reordering, a hierarchical phrase-based system demands no additional parameters, since long and short distance reorderings are modeled identically using synchronous context free grammar (SCFG) rules. |
Introduction | Interestingly, hierarchical phrase-based models provide this benefit without making any linguistic commitments beyond the structure of the model. |
Overgeneration and Topological Ordering of Function Words | The problem may be less severe in hierarchical phrase-based MT than in BTG, since lexical items on the rules’ right hand sides often limit the span of nonterminals. |
Abstract | However phrase-based approaches are much less able to model sentence level effects between different phrase-pairs. |
Experiments | However in this paper we limit our focus to inducing word alignments, i.e., by using the model to infer alignments which are then used in a standard phrase-based translation pipeline. |
Experiments | We leave full decoding for later work, which we anticipate would further improve performance by exploiting gapping phrases and other phenomena that implicitly form part of our model but are not represented in the phrase-based decoder. |
Introduction | Recent years have witnessed burgeoning development of statistical machine translation research, notably phrase-based (Koehn et al., 2003) and syntax-based approaches (Chiang, 2005; Galley et al., 2006; Liu et al., 2006). |
Introduction | These approaches model sentence translation as a sequence of simple translation decisions, such as the application of a phrase translation in phrase-based methods or a grammar rule in syntax-based approaches. |
Introduction | This conflicts with the intuition behind phrase-based MT, namely that translation decisions should be dependent on con- |
Model | We consider a process in which the target string is generated using a left-to-right order, similar to the decoding strategy used by phrase-based machine translation systems (Koehn et al., 2003). |
Model | In contrast to phrase-based models, we use words as our basic translation unit, rather than multi-word phrases. |
Related Work | This idea has been developed explicitly in a number of previous approaches, in grammar based (Chiang, 2005) and phrase-based systems (Galley and Manning, 2010). |
Background | In phrase-based models, a decision can be translating a source phrase into a target phrase or reordering the target phrases. |
Experiments | first model was the hierarchical phrase-based model (Chiang, 2005; Chiang, 2007). |
Introduction | We evaluated our joint decoder that integrated a hierarchical phrase-based model (Chiang, 2005; Chiang, 2007) and a tree-to-string model (Liu et al., 2006) on the NIST 2005 Chinese-English test-set. |
Introduction | Some researchers prefer to say “phrase-based approaches” or “phrase-based systems”.
Introduction | On the other hand, other authors (e.g., (Och and Ney, 2004; Koehn et al., 2003; Chiang, 2007)) do use the expression “phrase-based models”.
Joint Decoding | Figure 2(a) demonstrates a translation hypergraph for one model, for example, a hierarchical phrase-based model. |
Joint Decoding | Although phrase-based decoders usually produce translations from left to right, they can adopt bottom-up decoding in principle. |
Joint Decoding | (2006) propose left-to-right target generation for hierarchical phrase-based translation. |
Abstract | We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. |
Abstract | As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets. |
Introduction | Modern statistical machine translation approaches can be roughly divided into two broad categories: phrase-based and syntax-based. |
Introduction | Phrase-based approaches treat phrase, which is usually a sequence of consecutive words, as the basic unit of translation (Koehn et al., 2003; Och and Ney, 2004). |
Introduction | As phrases are capable of memorizing local context, phrase-based approaches excel at handling local word selection and reordering. |
Abstract | Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model. |
Abstract | Phrasal-Hiero still has the same hypothesis space as the original Hiero but incorporates a phrase-based distance cost feature and lexicalized reordering features into the chart decoder.
Abstract | The work consists of two parts: 1) for each Hiero translation derivation, find its corresponding discontinuous phrase-based path. |
Introduction | Phrase-based and tree-based translation models are the two main streams in state-of-the-art machine translation.
Introduction | Yet, tree-based translation often underperforms phrase-based translation in language pairs with short range reordering such as Arabic-English translation (Zollmann et al., 2008; Birch et al., 2009). |
Introduction | (2003) for our phrase-based system and Chiang (2005) for our Hiero system. |
Discussion | We believe that our efficient algorithms will make them more widely applicable in both SCFG-based and phrase-based MT systems.
Experiments | Our phrase-based statistical MT system is similar to the alignment template system described in (Och and Ney, 2004; Tromble et al., 2008). |
Experiments | We also train two SCFG-based MT systems: a hierarchical phrase-based SMT (Chiang, 2007) system and a syntax-augmented machine translation (SAMT) system using the approach described in Zollmann and Venugopal (2006).
Experiments | Both systems are built on top of our phrase-based systems. |
Introduction | These two techniques were originally developed for N -best lists of translation hypotheses and recently extended to translation lattices (Macherey et al., 2008; Tromble et al., 2008) generated by a phrase-based SMT system (Och and Ney, 2004). |
Introduction | SMT systems based on synchronous context free grammars (SCFG) (Chiang, 2007; Zollmann and Venugopal, 2006; Galley et al., 2006) have recently been shown to give competitive performance relative to phrase-based SMT. |
Translation Hypergraphs | A translation lattice compactly encodes a large number of hypotheses produced by a phrase-based SMT system. |
Abstract | We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. |
Background | The first SMT system is a phrase-based system with two reordering models including the maximum entropy-based lexicalized reordering model proposed by Xiong et al. |
Background | The second SMT system is an in-house reim-plementation of the Hiero system which is based on the hierarchical phrase-based model proposed by Chiang (2005). |
Background | After 5, 7 and 8 iterations, relatively stable improvements are achieved by the phrase-based system, the Hiero system and the syntax-based system, respectively.
Introduction | Many SMT frameworks have been developed, including phrase-based SMT (Koehn et al., 2003), hierarchical phrase-based SMT (Chiang, 2005), syntax-based SMT (Eisner, 2003; Ding and Palmer, 2005; Liu et al., 2006; Galley et al., 2006; Cowan et al., 2006), etc. |
Introduction | Our experiments are conducted on Chinese-to-English translation in three state-of-the-art SMT systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
Abstract | This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores. |
Abstract | Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets. |
Dependency parsing for machine translation | In this section, we review dependency parsing formulated as a maximum spanning tree problem (McDonald et al., 2005b), which can be solved in quadratic time, and then present its adaptation and novel application to phrase-based decoding. |
Dependency parsing for machine translation | We now formalize weighted non-projective dependency parsing similarly to (McDonald et al., 2005b) and then describe a modified and more efficient version that can be integrated into a phrase-based decoder. |
Introduction | Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target-language fluency and adequacy. However, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).
Introduction | In comparison, phrase-based decoding can run in linear time if a distortion limit is imposed. |
Introduction | Since exact MT decoding is NP complete (Knight, 1999), there is no exact search algorithm for either phrase-based or syntactic MT that runs in polynomial time (unless P = NP). |
Abstract | The model leverages on the strengths of both phrase-based and linguistically syntax-based method. |
Introduction | The phrase-based modeling method (Koehn et al., 2003; Och and Ney, 2004a) is a simple but powerful approach to machine translation, since it can model local reorderings and translations of multi-word expressions well.
Introduction | It is designed to combine the strengths of phrase-based and syntax-based methods. |
Introduction | Experiment results on the NIST MT-2005 Chinese-English translation task show that our method significantly outperforms Moses (Koehn et al., 2007), a state-of-the-art phrase-based SMT system, and other linguistically syntax-based methods, such as SCFG-based and STSG-based methods (Zhang et al., 2007). |
Related Work | However, most of them fail to make good use of non-syntactic phrases, which have proven useful in phrase-based methods (Koehn et al., 2003).
Related Work | Chiang (2005)’s hierarchical phrase-based model achieves significant performance improvement.
Related Work | In the last two years, many research efforts were devoted to integrating the strengths of phrase-based and syntax-based methods. |
Tree Sequence Alignment Model | We use seven basic features that are analogous to the commonly used features in phrase-based systems (Koehn, 2004): 1) bidirectional rule mapping probabilities; 2) bidirectional lexical rule translation probabilities; 3) the target language model; 4) the number of rules used and 5) the number of target words. |
Abstract | We present a novel scheme to apply factored phrase-based SMT to a language pair with very disparate morphological structures. |
Conclusions | We have presented a novel way to incorporate source syntactic structure in English-to-Turkish phrase-based machine translation by parsing the source sentences and then encoding many local and nonlocal source syntactic structures as additional complex tag factors. |
Experimental Setup and Results | We evaluated the impact of the transformations in factored phrase-based SMT with an English-Turkish data set which consists of 52712 parallel sentences. |
Experimental Setup and Results | As a baseline system, we built a standard phrase-based system, using the surface forms of the words without any transformations, and with a 3-gram LM in the decoder.
Experimental Setup and Results | Factored phrase-based SMT allows the use of multiple language models for the target side, for different factors during decoding. |
Introduction | Once these were identified as separate tokens, they were then used as “words” in a standard phrase-based framework (Koehn et al., 2003). |
Introduction | This facilitates the use of factored phrase-based translation that was not previously applicable due to the morphological complexity on the target side and mismatch between source and target morphologies. |
Introduction | We assume that the reader is familiar with the basics of phrase-based statistical machine translation (Koehn et al., 2003) and factored statistical machine translation (Koehn and Hoang, 2007). |
Related Work | Koehn (2005) applied standard phrase-based SMT to Finnish using the Europarl corpus and reported that translation to Finnish had the worst BLEU scores. |
Related Work | Yang and Kirchhoff (2006) have used phrase-based backoff models to translate unknown words by morphologically decomposing the unknown source words. |
Related Work | They used both CCG supertags and LTAG supertags in Arabic-to-English phrase-based translation and have reported about 6% relative improvement in BLEU scores.
Abstract | In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary.
Experiments | We construct two kinds of phrase-based models using Moses (Koehn et al., 2007): one uses out-of-domain data and the other uses in-domain data. |
Introduction | Novel translation models, such as phrase-based models (Koehn et al., 2007), hierarchical phrase-based models (Chiang, 2007) and linguistically syntax-based models (Liu et al., 2006; Huang et al., 2006; Galley, 2006; Zhang et al., 2008; Chiang, 2010; Zhang et al., 2011; Zhai et al., 2011, 2012) have been proposed and have achieved higher and higher translation performance.
Introduction | Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012). |
Introduction | level translation rules and learn a phrase-based model from the monolingual corpora. |
Phrase Pair Refinement and Parameterization | It is well known that in the phrase-based SMT there are four translation probabilities and the reordering probability for each phrase pair. |
Phrase Pair Refinement and Parameterization | The translation probabilities in the traditional phrase-based SMT include bidirectional phrase translation probabilities and bidirectional lexical weights. |
Probabilistic Bilingual Lexicon Acquisition | In order to assign probabilities to each entry, we apply the corpus translation probability used in Wu et al. (2008): given in-domain source-language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon and the in-domain target-language monolingual data (for language model estimation).
Related Work | For the target-side monolingual data, they simply use it to train a language model; for the source-side monolingual data, they employ a baseline system (word-based SMT, or phrase-based SMT trained with a small-scale bitext) to first translate the source sentences, combine each source sentence with its target translation as a bilingual sentence pair, and then train a new phrase-based SMT system with these pseudo sentence pairs.
A Skeleton-based Approach to MT 2.1 Skeleton Identification | As is standard in SMT, we further assume that 1) the translation process can be decomposed into a derivation of phrase-pairs (for phrase-based models) or translation rules (for syntax-based models); 2) and a linear function is used to assign a model score to each derivation. |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | See Figure 1 for an example of applying the above model to phrase-based MT.
A Skeleton-based Approach to MT 2.1 Skeleton Identification | While we will restrict ourselves to phrase-based translation in the following description and experiments, we can choose different models/features for gskel(d) and gfull(d).
Abstract | We apply our approach to a state-of-the-art phrase-based system and demonstrate very promising BLEU improvements and TER reductions on the NIST Chinese-English MT evaluation data. |
Introduction | The simplest of these is the phrase-based approach (Och et al., 1999; Koehn et al., 2003) which employs a global model to process any sub-strings of the input sentence. |
Introduction | However, these approaches suffer from the same problem as their phrase-based counterpart and use a single global model to handle different translation units, regardless of whether they come from the skeleton of the input tree/sentence or from other, less important substructures.
Introduction | 0 We apply the proposed model to Chinese-English phrase-based MT and demonstrate promising BLEU improvements and TER reductions on the NIST evaluation data. |
Abstract | This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. |
Abstract | This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. |
Abstract | Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. |
Introduction | Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output. |
Introduction | Bottom-up and top-down parsers typically require a completed string as input; this requirement makes it difficult to incorporate these parsers into phrase-based translation, which generates hypothesized translations incrementally, from left-to-right.1 As a workaround, parsers can rerank the translated output of translation systems (Och et al., 2004). |
Introduction | We observe that incremental parsers, used as structured language models, provide an appropriate algorithmic match to incremental phrase-based decoding. |
Related Work | Neither phrase-based (Koehn et al., 2003) nor hierarchical phrase-based translation (Chiang, 2005) takes explicit advantage of the syntactic structure of either source or target language.
Related Work | Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality. |
Abstract | Phrase-based decoding produces state-of-the-art translations with no regard for syntax. |
Abstract | The resulting cohesive, phrase-based decoder is shown to produce translations that are preferred over noncohesive output in both automatic and human evaluations. |
Cohesive Decoding | This section describes a modification to standard phrase-based decoding, so that the system is constrained to produce only cohesive output. |
Conclusion | We have presented a definition of syntactic cohesion that is applicable to phrase-based SMT. |
Conclusion | This suggests that cohesion could be a strong feature in estimating the confidence of phrase-based translations. |
Experiments | We have adapted the notion of syntactic cohesion so that it is applicable to phrase-based decoding. |
Experiments | The lexical reordering model provides a good comparison point as a non-syntactic, and potentially orthogonal, improvement to phrase-based movement modeling. |
Experiments | This indicates that the cohesive subset is easier to translate with a phrase-based system. |
Introduction | We attempt to use this strong, but imperfect, characterization of movement to assist a non-syntactic translation method: phrase-based SMT. |
Introduction | Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation. |
A Class-based Model of Agreement | We chose a bigram model due to the aggressive recombination strategy in our phrase-based decoder. |
Abstract | The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. |
Discussion of Translation Results | Phrase Table Coverage In a standard phrase-based system, effective translation into a highly inflected target language requires that the phrase table contain the inflected word forms necessary to construct an output with correct agreement. |
Discussion of Translation Results | This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance can be improved dramatically by altering the translation model through features such as ours, without expanding the search space of the decoder. |
Experiments | Experimental Setup Our decoder is based on the phrase-based approach to translation (Och and Ney, 2004) and contains various feature functions including phrase relative frequency, word-level alignment statistics, and lexicalized reordering models (Tillmann, 2004; Och et al., 2004). |
Inference during Translation Decoding | 3.1 Phrase-based Translation Decoding |
Inference during Translation Decoding | We consider the standard phrase-based approach to MT (Och and Ney, 2004). |
Introduction | Agreement relations that cross statistical phrase boundaries are not explicitly modeled in most phrase-based MT systems (Avramidis and Koehn, 2008). |
Introduction | The model can be implemented using the feature APIs of popular phrase-based decoders such as Moses (Koehn et al., 2007) and Phrasal (Cer et al., 2010). |
Related Work | Subotin (2011) recently extended factored translation models to hierarchical phrase-based translation and developed a discriminative model for predicting target-side morphology in English-Czech. |
Abstract | In this paper, we focus on the reverse mapping, showing that any phrase-based SMT decoding problem can be directly reformulated as a TSP. |
Conclusion | The main contribution of this paper has been to propose a transformation for an arbitrary phrase-based SMT decoding instance into a TSP instance. |
Introduction | Phrase-based systems (Koehn et al., 2003) are probably the most widespread class of Statistical Machine Translation systems, and arguably one of the most successful. |
Introduction | We will see in the next section that some characteristics of beam-search make it a suboptimal choice for phrase-based decoding, and we will propose an alternative. |
Introduction | This alternative is based on the observation that phrase-based decoding can be very naturally cast as a Traveling Salesman Problem (TSP), one of the best studied problems in combinatorial optimization. |
Phrase-based Decoding as TSP | Successful phrase-based systems typically employ language models of order higher than two. |
Related work | In (Tillmann and Ney, 2003) and (Tillmann, 2006), the authors modify a certain Dynamic Programming technique used for TSP for use with an IBM-4 word-based model and a phrase-based model respectively. |
Related work | with the mainstream phrase-based SMT models, and therefore making it possible to directly apply existing TSP solvers to SMT. |
The Traveling Salesman Problem and its variants | As will be shown in the next section, phrase-based SMT decoding can be directly reformulated as an AGTSP. |
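The Traveling Salesman Problem and its variants | As a toy illustration of this reformulation (with invented phrases and costs, and a brute-force search standing in for a real TSP solver), decoding amounts to finding the cheapest ordering of bi-phrases that visits every source position exactly once:

```python
from itertools import permutations

# Toy bi-phrases: (covered source positions, target string, cost).
# All phrases and costs here are invented for illustration.
BIPHRASES = [
    (frozenset({0}), "this", 1.0),
    (frozenset({1}), "is", 1.0),
    (frozenset({2, 3}), "an example", 1.5),
    (frozenset({2}), "a", 2.0),
    (frozenset({3}), "sample", 2.0),
]
SRC_LEN = 4

def transition_cost(prev, nxt):
    # Stand-in for the distortion cost between consecutive "cities":
    # zero for a monotone continuation, positive for jumps.
    return 0.5 * abs(min(nxt[0]) - max(prev[0]) - 1)

def decode_as_tour():
    # A "tour" is an ordered subset of bi-phrases visiting each source
    # position exactly once; the best translation is the cheapest tour.
    best = (float("inf"), None)
    for r in range(1, len(BIPHRASES) + 1):
        for tour in permutations(BIPHRASES, r):
            covered = frozenset().union(*(p[0] for p in tour))
            if covered != frozenset(range(SRC_LEN)):
                continue
            if sum(len(p[0]) for p in tour) != SRC_LEN:
                continue  # a source position was visited twice
            cost = sum(p[2] for p in tour)
            cost += sum(transition_cost(a, b) for a, b in zip(tour, tour[1:]))
            best = min(best, (cost, " ".join(p[1] for p in tour)))
    return best

print(decode_as_tour())  # → (3.5, 'this is an example')
```

The exhaustive enumeration above is exponential; the point of the AGTSP reformulation is precisely that mature TSP solvers can search this space far more efficiently.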
Abstract | Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. |
Alignment | We apply our normal phrase-based decoder on the source side of the training data and constrain the translations to the corresponding target sentences from the training data. |
Conclusion | We have shown that training phrase models can improve translation performance on a state-of-the-art phrase-based translation model. |
Experimental Evaluation | The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model. |
Introduction | A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003). |
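Introduction | The two-step process described here can be sketched in a few lines; the phrase table, the greedy longest-match segmentation, and the purely monotone order are illustrative simplifications (real decoders search over many segmentations, translation options, and reorderings):

```python
# A minimal, monotone phrase-based translation sketch with a toy
# phrase table (illustrative entries, not from any real system).
PHRASE_TABLE = {
    ("das", "haus"): "the house",
    ("das",): "the",
    ("haus",): "house",
    ("ist", "klein"): "is small",
}

def translate(source, max_phrase_len=2):
    # Greedy left-to-right segmentation: prefer the longest phrase
    # in the table that matches at the current position.
    out, i = [], 0
    while i < len(source):
        for k in range(max_phrase_len, 0, -1):
            phrase = tuple(source[i:i + k])
            if phrase in PHRASE_TABLE:
                out.append(PHRASE_TABLE[phrase])
                i += k
                break
        else:
            out.append(source[i])  # pass OOV words through unchanged
            i += 1
    return " ".join(out)

print(translate(["das", "haus", "ist", "klein"]))  # → the house is small
```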
Introduction | We use a modified version of a phrase-based decoder to perform the forced alignment. |
Related Work | For the hierarchical phrase-based approach, (Blunsom et al., 2008) present a discriminative rule model and show the difference between using only the viterbi alignment in training and using the full sum over all possible derivations. |
Related Work | We also include these word lexica, as they are standard components of the phrase-based system. |
Related Work | They report improvements over a phrase-based model that uses an inverse phrase model and a language model. |
Abstract | Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. |
Conclusion and Future Work | Unlike previous pipeline approaches, which directly merge TM phrases into the final translation result, we integrate the TM information for each source phrase into the phrase-based SMT system during decoding.
Experiments | For the phrase-based SMT system, we adopted the Moses toolkit (Koehn et al., 2007). |
Experiments | conducted using the Moses phrase-based decoder (Koehn et al., 2007). |
Introduction | Statistical machine translation (SMT), especially the phrase-based model (Koehn et al., 2003), has developed rapidly over the last decade.
Introduction | On a TM database of Chinese-English computer technical documents, our experiments have shown that the proposed Model-III improves translation quality significantly over either the pure phrase-based SMT system or the TM system when the fuzzy match score is above 0.4.
Problem Formulation | Compared with the standard phrase-based machine translation model, the translation problem is reformulated as follows (based only on the best TM match; the formulation is similar for multiple TM sentences):
Problem Formulation | Formula (3) is just the typical phrase-based SMT model, and the second factor P(Mk|Lk, z) (to be specified in Section 3) is the information derived from the TM sentence pair.
Problem Formulation | Therefore, we can still keep the original phrase-based SMT model and only pay attention to how to extract |
Abstract | We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system. |
Evaluation | Table 4: Comparing Model-1 and Model-2 with Phrase-based Systems |
Evaluation | We also used two methods to incorporate transliterations in the phrase-based system: |
Evaluation | Post-process P191: All the OOV words in the phrase-based output are replaced with their top-candidate transliteration as given by our transliteration system.
Introduction | Section 4 discusses the training data, parameter optimization and the initial set of experiments that compare our two models with a baseline Hindi-Urdu phrase-based system and with two transliteration-aided phrase-based systems in terms of BLEU scores |
Conclusions and Future Work | The experimental results show that our model outperforms the baseline models and verify the effectiveness of noncontiguous translational equivalences for noncontiguous phrase modeling in both syntax-based and phrase-based systems.
Experiments | We compare the SncTSSG based model against two baseline models: the phrase-based and the STSSG-based models. |
Experiments | For the phrase-based model, we use Moses (Koehn et al., 2007) with its default settings; for the STSSG- and SncTSSG-based models we use our decoder Pisces with the following parameter settings: d = 4, h = 6, C = 6, l = 6, α = 50, β = 50.
Experiments | Table 3 explores the contribution of noncontiguous translational equivalences to phrase-based models (all the rules in Table 3 have no grammar tags, but a gap <***> is allowed in the last three rows).
Introduction | Current research in statistical machine translation (SMT) falls mostly into either the phrase-based or the syntax-based paradigm.
Introduction | Between them, the phrase-based approach (Marcu and Wong, 2002; Koehn et al., 2003; Och and Ney, 2004) allows local reordering and contiguous phrase translation.
Introduction | However, it is hard for phrase-based models to learn global reorderings and to deal with noncontiguous phrases. |
The Pisces decoder | Consequently, this distortional operation, like phrase-based models, is much more flexible in the order of the target constituents than the traditional syntax-based models which are limited by the syntactic structure. |
Abstract | Our work is based on a phrase-based SMT system. |
Abstract | Phrase-based Translation System |
Abstract | The translation process of phrase-based SMT can be briefly described in three steps: segment source sentence into a sequence of phrases, translate each |
Abstract | Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). |
Experiments | We compare three systems: a phrase-based system (Och and Ney, 2004), a hierarchical phrase-based system (Chiang, 2005), and our forest-to-string system with different binarization schemes. |
Experiments | In the phrase-based decoder, jump width is set to 8. |
Experiments | Besides standard features (Och and Ney, 2004), the phrase-based decoder also uses a Maximum Entropy phrasal reordering model (Zens and Ney, 2006). |
Experimental setup | Implementation In all experiments, we use the IBM Model 4 implementation from the GIZA++ toolkit (Och and Ney, 2000) for alignment, and the phrase-based and hierarchical models implemented in the Moses toolkit (Koehn et al., 2007) for rule extraction.
Introduction | Our contributions are as follows: We develop a semantic parser using off-the-shelf MT components, exploring phrase-based as well as hierarchical models. |
MT—based semantic parsing | We consider a phrase-based translation model (Koehn et al., 2003) and a hierarchical translation model (Chiang, 2005). |
MT—based semantic parsing | Rules for the phrase-based model consist of pairs of aligned source and target sequences, while hierarchical rules are SCFG productions containing at most two instances of a single nonterminal symbol. |
Related Work | The present work is also the first we are aware of which uses phrase-based rather than tree-based machine translation techniques to learn a semantic parser. |
Related Work | multilevel rules composed from smaller rules, a process similar to the one used for creating phrase tables in a phrase-based MT system. |
Results | We first compare the results for the two translation rule extraction models, phrase-based and hierarchical (“MT-phrase” and “MT-hier” respectively in Table 1).
Abstract | The two models are integrated into a state-of-the-art phrase-based machine translation system and evaluated on Chinese-to-English translation tasks with large-scale training data. |
Conclusions and Future Work | The two models have been integrated into a phrase-based SMT system and evaluated on Chinese-to-English translation tasks using large-scale training data. |
Integrating the Two Models into SMT | In this section, we elaborate how to integrate the two models into phrase-based SMT. |
Integrating the Two Models into SMT | In particular, we integrate the models into a phrase-based system which uses bracketing transduction grammars (BTG) (Wu, 1997) for phrasal translation (Xiong et al., 2006). |
Integrating the Two Models into SMT | It is straightforward to integrate the predicate translation model into phrase-based SMT (Koehn et al., |
Introduction | We integrate these two discriminative models into a state-of-the-art phrase-based system. |
Introduction | 1We define translation units as phrases in phrase-based SMT, and as translation rules in syntax-based SMT. |
Introduction | Specifically, the popular distortion or lexicalized reordering models in phrase-based SMT focus only on making good local prediction (i.e. |
Introduction | Even though the experiments in this paper employ SCFG-based SMT systems, we would like to point out that our models are applicable to other systems, including phrase-based SMT systems.
Maximal Orientation Span | 3We use hierarchical phrase-based translation system as a case in point, but the merit is generalizable to other systems. |
Maximal Orientation Span | Additionally, this illustration also shows a case where MOS acts as a cross-boundary context which effectively relaxes the context-free assumption of hierarchical phrase-based formalism. |
Related Work | Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007). |
Related Work | Our MOS concept is also closely related to the hierarchical reordering model (Galley and Manning, 2008) in phrase-based decoding, which computes the orientation o of b with respect to a multi-block unit that may go beyond b′.
Abstract | Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single reference setup. |
Conclusion and Future Work | Our best result improves the output of a phrase-based decoder by up to 2.0 BLEU on French-English translation, outperforming n-best rescoring by up to 1.1 BLEU and lattice rescoring by up to 0.4 BLEU. |
Decoder Integration | Typically, phrase-based decoders maintain a set of states representing partial and complete translation hypotheses that are scored by a set of features.
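Decoder Integration | A minimal sketch of such a decoder state, together with hypothesis recombination (all field and function names here are hypothetical, not any particular decoder's API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hypothesis:
    coverage: frozenset   # source positions already translated
    lm_context: tuple     # last n-1 target words (n-gram LM state)
    score: float          # model score accumulated so far
    output: tuple         # target words produced so far

    def recombination_key(self):
        # Hypotheses that agree on coverage and LM context receive
        # identical scores from all future feature evaluations, so
        # only the best-scoring one per key needs to be expanded.
        return (self.coverage, self.lm_context)

def recombine(hypotheses):
    best = {}
    for h in hypotheses:
        key = h.recombination_key()
        if key not in best or h.score > best[key].score:
            best[key] = h
    return list(best.values())

# Two hypotheses indistinguishable to all future features:
h1 = Hypothesis(frozenset({0, 1}), ("house",), -1.2, ("the", "house"))
h2 = Hypothesis(frozenset({0, 1}), ("house",), -2.5, ("one", "house"))
print(len(recombine([h1, h2])))  # → 1
```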
Expected BLEU Training | Formally, our phrase-based model is parameterized by M parameters Λ, where each λm ∈ Λ, m = 1, . . . , M.
Experiments | We use a phrase-based system similar to Moses (Koehn et al., 2007) based on a set of common features including maximum likelihood estimates pML(e|f) and pML(f|e), lexically weighted estimates pLW(e|f) and pLW(f|e), word and phrase penalties, a hierarchical reordering model (Galley and Manning, 2008), a linear distortion feature, and a modified Kneser-Ney language model trained on the target side of the parallel data.
Experiments | either lattices or the unique 100-best output of the phrase-based decoder, and reestimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Experiments | We use the same data both for training the phrase-based system as well as the language model but find that the resulting bias did not hurt end-to-end accuracy (Yu et al., 2013). |
Discussions | As the semantic phrase embedding can fully represent the phrase, we can go a step further in phrase-based SMT and feed the semantic phrase embeddings to a DNN in order to model the whole translation process (e.g.
Experiments | With the semantic phrase embeddings and the vector space transformation function, we apply the BRAE to measure the semantic similarity between a source phrase and its translation candidates in the phrase-based SMT. |
Experiments | We have implemented a phrase-based translation system with a maximum entropy based reordering model using the bracketing transduction grammar (Wu, 1997; Xiong et al., 2006). |
Experiments | Typically, four translation probabilities are adopted in phrase-based SMT: the phrase translation probabilities and the lexical weights in both directions.
Introduction | However, in conventional (phrase-based) SMT, phrases are the basic translation units.
Introduction | Instead, we focus on learning phrase embeddings from the view of semantic meaning, so that our phrase embedding can fully represent the phrase and best fit the phrase-based SMT. |
Abstract | The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. |
Experimental Setup | Training We obtained phrase-based salience scores using a supervised machine learning algorithm. |
Experimental Setup | The SVM was trained with the same features used to obtain phrase-based salience scores, but with sentence-level labels (labels (1) and (2) positive, (3) negative). |
Experimental Setup | Figure 3: ROUGE-1 and ROUGE-L results for the phrase-based ILP model and two baselines, with error bars showing 95% confidence levels.
Results | F-score is higher for the phrase-based system but not significantly. |
Results | The highlights created by the sentence ILP were considered significantly more verbose (0c < 0.05) than those created by the phrase-based system and the CNN abstractors. |
Results | Table 5 shows the output of the phrase-based system for the documents in Table 1. |
Hierarchical phrase-based translation | We take as our starting point David Chiang’s Hiero system, which generalizes phrase-based translation to substrings with gaps (Chiang, 2007). |
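Hierarchical phrase-based translation | A hierarchical rule with a gap can be thought of as a phrase pair with a substitutable slot; a toy illustration (the rule and example are invented, and the `X` placeholder stands for the single nonterminal):

```python
# Toy hierarchical rule with one gap (nonterminal X):
# French "ne X pas" paired with English "does not X".
def apply_rule(rule_src, rule_tgt, sub_src, sub_tgt):
    # Substitute a smaller translation pair into the gap on both sides.
    return rule_src.replace("X", sub_src), rule_tgt.replace("X", sub_tgt)

print(apply_rule("ne X pas", "does not X", "va", "go"))
# → ('ne va pas', 'does not go')
```

Because the gap can be filled recursively with other rules, such grammars capture reorderings and discontiguous patterns that contiguous phrase pairs cannot.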
Hierarchical phrase-based translation | As shown by Chiang (2007), a weighted grammar of this form can be collected and scored by simple extensions of standard methods for phrase-based translation and efficiently combined with a language model in a CKY decoder to achieve large improvements over a state-of-the-art phrase-based system. |
Hierarchical phrase-based translation | Although a variety of scores interpolated into the decision rule for phrase-based systems have been investigated over the years, only a handful have been discovered to be consistently useful. |
Introduction | Translation into languages with rich morphology presents special challenges for phrase-based methods. |
Introduction | Thus, Birch et al. (2008) find that translation quality achieved by a popular phrase-based system correlates significantly with a measure of target-side, but not source-side, morphological complexity.
Introduction | Recently, several studies (Bojar, 2007; Avramidis and Koehn, 2009; Ramanathan et al., 2009; Yeniterzi and Oflazer, 2010) proposed modeling target-side morphology in a phrase-based factored models framework (Koehn and Hoang, 2007).
Modeling unobserved target inflections | As a consequence of translating into a morphologically rich language, some inflected forms of target words are unobserved in training data and cannot be generated by the decoder under standard phrase-based approaches. |
Abstract | This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English to Finnish translation system). |
Conclusion and Future Work | We also demonstrate that for Finnish (and possibly other agglutinative languages), phrase-based MT benefits from allowing the translation model access to morphological segmentation yielding productive morphological phrases. |
Conclusion and Future Work | In order to help with replication of the results in this paper, we have run the various morphological analysis steps and created the necessary training, tuning and test data files needed in order to train, tune and test any phrase-based machine translation system with our data. |
Experimental Results | In all the experiments conducted in this paper, we used the Moses phrase-based translation system (Koehn et al., 2007), 2008 version.
Models 2.1 Baseline Models | We then trained the Moses phrase-based system (Koehn et al., 2007) on the segmented and marked text. |
Translation and Morphology | In this work, we propose to address the problem of morphological complexity in an English-to-Finnish MT task within a phrase-based translation framework. |
Abstract | In this paper, instead of designing new features based on intuition, linguistic knowledge and domain, we learn some new and effective features using the deep auto-encoder (DAE) paradigm for phrase-based translation model. |
Experiments and Results | We choose the Moses (Koehn et al., 2007) framework to implement our phrase-based machine system. |
Input Features for DNN Feature Learning | The phrase-based translation model (Koehn et al., 2003; Och and Ney, 2004) has demonstrated superior performance and been widely used in current SMT systems, and we employ our implementation on this translation model. |
Introduction | Instead of designing new features based on intuition, linguistic knowledge and domain, for the first time, Maskey and Zhou (2012) explored the possibility of inducing new features in an unsupervised fashion using deep belief nets (DBN) (Hinton et al., 2006) for hierarchical phrase-based translation.
Introduction | al., 2010), and speech spectrograms (Deng et al., 2010), we propose new feature learning using semi-supervised DAE for phrase-based translation model. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
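Semi-Supervised Deep Auto-encoder Features Learning for SMT | The log-linear model referred to here is the standard formulation of Och and Ney (2002): each candidate translation is scored by a weighted combination of feature functions h_m, and the DAE features are simply added as extra feature functions,

```latex
P(e \mid f) \;=\;
\frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m \, h_m(e, f)\right)}
     {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m \, h_m(e', f)\right)}
```

where the weights λm are tuned on held-out data (e.g., with MERT).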
Abstract | We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. |
Conclusion and Future Work | We have presented a topic similarity model which incorporates the rule-topic distributions on both the source and target side into traditional hierarchical phrase-based system. |
Experiments | We divide the rules into three types: phrase rules, which only contain terminals and are the same as the phrase pairs in phrase-based system; monotone rules, which contain non-terminals and produce monotone translations; reordering rules, which also contain non-terminals but change the order of translations. |
Introduction | Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution. |
Introduction | We augment the hierarchical phrase-based system by integrating the proposed topic similarity model as a new feature (Section 3.1). |
Introduction | Experiments on Chinese-English translation tasks (Section 6) show that, our method outperforms the baseline hierarchial phrase-based system by +0.9 BLEU points. |
Abstract | This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context-free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures.
Abstract | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Conclusion and Future Work | Experiments on Chinese-English translation show that the reordering approach can significantly improve a state-of-the-art hierarchical phrase-based translation system. |
Introduction | The popular distortion or lexicalized reordering models in phrase-based SMT make good local predictions by focusing on reordering at the word level, while the synchronous context-free grammars in hierarchical phrase-based (HPB) translation models are capable of handling nonlocal reordering at the translation phrase level.
Introduction | The general ideas, however, are applicable to other translation models, e.g., phrase-based model, as well. |
Related Work | Last, we also note that recent work on non-syntax-based reorderings in (flat) phrase-based models (Cherry, 2013; Feng et al., 2013) can also be potentially adopted to HPB models.
Abstract | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus. |
Conclusion | We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation. |
Conclusion | Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain. |
Experiments and Results | The pipeline uses GIZA++ Model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, Moses (Koehn et al., 2007) as the phrase-based decoder, and the SRI Language Modeling Toolkit to train language models with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998).
Experiments and Results | We use GIZA++ Model 4 for word alignment and Moses for phrase-based decoding.
Introduction | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step. |
Abstract | Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
Bag-of-Words Vector Space Model | In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005). |
Experiments | For the baseline, we train the translation model following (Chiang, 2005; Chiang, 2007); our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
Hierarchical phrase-based MT system | The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context free grammar (SCFG). |
Hierarchical phrase-based MT system | Empirically, this method has yielded better performance on language pairs such as Chinese-English than the phrase-based method because it permits phrases with gaps; it generalizes the normal phrase-based models in a way that allows long-distance reordering (Chiang, 2005; Chiang, 2007). |
Introduction | We chose a hierarchical phrase-based SMT system as our baseline; thus, the units involved in computation of sense similarities are hierarchical rules. |
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric. |
Experimental Setup and Results | The baseline English-to-Iraqi phrase-based SMT system was built as described in Section 3. |
Introduction | In this paper, we describe a novel topic-based adaptation technique for phrase-based statistical machine translation (SMT) of spoken conversations. |
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
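Introduction | A sketch of how such a single similarity feature might be computed, using cosine similarity between topic distributions (the vectors and function are illustrative assumptions, not the paper's exact formulation):

```python
import math

def cosine(p, q):
    # Similarity between two topic distributions of equal length.
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

# Topic distribution of the current conversation vs. that of the
# training conversation a candidate phrase pair originated in.
current = [0.7, 0.2, 0.1]
phrase_origin = [0.6, 0.3, 0.1]
feature_value = cosine(current, phrase_origin)
print(round(feature_value, 3))  # → 0.983
```

The resulting value is added as one more feature in the log-linear model, so phrase pairs from topically similar conversations receive a boost during decoding.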
Introduction | With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task. |
Experiments | The Czech noun cases which appear only in prepositional phrases were ignored, since they are covered by the phrase-based model. |
Introduction | Our method is based on factored phrase-based statistical machine translation models. |
Introduction | 1.1 Morphology in Phrase-based SMT |
Introduction | Meanwhile, in phrase-based SMT models, words are mapped in chunks.
Abstract | Statistical phrase-based translation learns translation rules from bilingual corpora, and has traditionally only used monolingual evidence to construct features that rescore existing translation candidates. |
Abstract | Our proposed approach significantly improves the performance of competitive phrase-based systems, leading to consistent improvements between 1 and 4 BLEU points on standard evaluation sets. |
Evaluation | The baseline is a state-of-the-art phrase-based system; we perform word alignment using a lexicalized hidden Markov model, and then the phrase table is extracted using the grow-diag-final heuristic (Koehn et al., 2003).
Generation & Propagation | 2.5 Phrase-based SMT Expansion |
Introduction | With large amounts of data, phrase-based translation systems (Koehn et al., 2003; Chiang, 2007) achieve state-of-the-art results in many ty-pologically diverse language pairs (Bojar et al., 2013). |
Abstract | This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space.
Conclusion | We present a head-driven hierarchical phrase-based (HD-HPB) translation model, which adopts head information (derived through unlabeled dependency analysis) in the definition of non-terminals to better differentiate among translation rules. |
Head-Driven HPB Translation Model | For rule extraction, we first identify initial phrase pairs on word-aligned sentence pairs by using the same criterion as most phrase-based translation models (Och and Ney, 2004) and Chiang’s HPB model (Chiang, 2005; Chiang, 2007). |
Introduction | Chiang’s hierarchical phrase-based (HPB) translation model utilizes synchronous context free grammar (SCFG) for translation derivation (Chiang, 2005; Chiang, 2007) and has been widely adopted in statistical machine translation (SMT). |
Introduction | Phrase-based Translation
Abstract | Comparable to the state-of-the-art phrase-based system Moses, using packed forests in tree-to-tree translation results in a significant absolute improvement of 3.6 BLEU points over using 1-best trees.
Conclusion | Our system also achieves comparable performance with the state-of-the-art phrase-based system Moses. |
Experiments | The absence of such non-syntactic mappings prevents tree-based tree-to-tree models from achieving comparable results to phrase-based models. |
Related Work | They replace 1-best trees with packed forests both in training and decoding and show superior translation quality over the state-of-the-art hierarchical phrase-based system.
A Probabilistic Model for Phrase Table Extraction | If θ takes the form of a scored phrase table, we can use traditional methods for phrase-based SMT to find P(e|f, θ) and concentrate on creating a model for P(θ | ⟨E, F⟩). We decompose this posterior probability using Bayes’ law into the corpus likelihood and parameter prior probabilities.
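The Bayes decomposition referred to above can be written explicitly. Assuming θ denotes the phrase table parameters and ⟨E, F⟩ the unaligned bilingual corpus (as the surrounding text suggests), the posterior factors as:

```latex
P(\theta \mid \langle E, F \rangle) \;\propto\;
  \underbrace{P(\langle E, F \rangle \mid \theta)}_{\text{corpus likelihood}}
  \;\underbrace{P(\theta)}_{\text{parameter prior}}
```

The likelihood term scores how well the parameters explain the parallel corpus, while the prior regularizes the phrase table, e.g. toward smaller or sparser parameter sets.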
Abstract | This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. |
Hierarchical ITG Model | As we confirm in the experiments in Section 7, using only minimal phrases leads to inferior translation results for phrase-based SMT.
Introduction | The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs. |
Introduction | By incorporating the syntactic annotations of parse trees from both or either side(s) of the bitext, they are believed to handle reordering better than their phrase-based counterparts.
Introduction | In contrast to conventional tree-to-tree approaches (Ding and Palmer, 2005; Quirk et al., 2005; Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009), which only make use of a single type of trees, our model is able to combine two types of trees, outperforming both phrase-based and tree-to-string systems. |
Introduction | Current tree-to-tree models (Xiong et al., 2007; Zhang et al., 2007; Liu et al., 2009) still have not outperformed the phrase-based system Moses (Koehn et al., 2007) significantly, even with the help of forests.
Related Work | This model shows a significant improvement over the state-of-the-art hierarchical phrase-based system (Chiang, 2005). |
Experiments | In SMT training, an in-house hierarchical phrase-based SMT decoder is implemented for our experiments. |
Introduction | For example, translation sense disambiguation approaches (Carpuat and Wu, 2005; Carpuat and Wu, 2007) are proposed for phrase-based SMT systems. |
Introduction | Meanwhile, for hierarchical phrase-based or syntax-based SMT systems, there is also much work involving rich contexts to guide rule selection (He et al., 2008; Liu et al., 2008; Marton and Resnik, 2008; Xiong et al., 2009). |
Related Work | Following this work, (Xiao et al., 2012) extended topic-specific lexicon translation models to hierarchical phrase-based translation models, where the topic information of synchronous rules was directly inferred with the help of document-level information. |
Bootstrapping Phrasal ITG from Word-based ITG | This section introduces a technique that bootstraps candidate phrase pairs for phrase-based ITG from word-based ITG Viterbi alignments. |
Experiments | Finally we ran 10 iterations of phrase-based ITG over the residual charts, using EM or VB, and extracted the Viterbi alignments. |
Summary of the Pipeline | From this alignment, phrase pairs are extracted in the usual manner, and a phrase-based translation system is trained. |
Variational Bayes for ITG | The drawback of maximum likelihood is obvious for phrase-based models. |
Abstract | We also provide new highly effective sentence selection methods that improve AL for phrase-based SMT in the multilingual and single language pair setting. |
Introduction | We introduce new highly effective sentence selection methods that improve phrase-based SMT in the multilingual and single language pair setting.
Sentence Selection: Multiple Language Pairs | For the single language pair setting, Haffari et al. (2009) present and compare several sentence selection methods for statistical phrase-based machine translation.
Sentence Selection: Single Language Pair | Phrases are basic units of translation in phrase-based SMT models. |
Abstract | Using the resulting alignments for phrase-based translation systems offers no clear insights w.r.t. |
Conclusion | Table 2: Evaluation of phrase-based translation from German to English with the obtained alignments (for 100,000 sentence pairs).
See e. g. the author’s course notes (in German), currently | In addition we evaluate the effect on phrase-based translation on one of the tasks. |
See e. g. the author’s course notes (in German), currently | We also check the effect of the various alignments (all produced by RegAligner) on translation performance for phrase-based translation, choosing German-to-English as the translation direction.
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model and a word-based lexicon model. |
Introduction | Within the phrase-based SMT framework there are mainly three stages where improved reordering could be integrated. In preprocessing, the source sentence is reordered by heuristics so that the word order of the source and target sentences is similar.
Introduction | In this way, syntax information can be incorporated into phrase-based SMT systems. |
Translation System Overview | In this paper, the phrase-based machine translation system |
Collaborative Decoding | Similar to a typical phrase-based decoder (Koehn, 2004), we associate each hypothesis with a coverage vector C to track translated source words in it. |
Collaborative Decoding | To serve as a general framework, however, this step is necessary for some state-of-the-art phrase-based decoders (Koehn, 2007; Och and Ney, 2004), because in these decoders hypotheses with different coverage vectors can coexist in the same bin, or hypotheses associated with the same coverage vector might appear in different bins.
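The coverage vector C mentioned above tracks which source words a hypothesis has already translated. A minimal sketch, in the spirit of a typical phrase-based decoder (Koehn, 2004); the class and field names here are illustrative, not taken from any particular decoder:

```python
class Hypothesis:
    """A phrase-based decoding hypothesis: a coverage bitvector over
    the source sentence, the target words produced so far, and an
    accumulated model score. (Hypothetical sketch.)"""
    def __init__(self, coverage, output, score):
        self.coverage = coverage  # tuple of bools, one per source word
        self.output = output      # target words generated so far
        self.score = score

    def extend(self, span, target_words, cost):
        """Append the translation of an uncovered source span [i, j)."""
        i, j = span
        assert not any(self.coverage[i:j]), "span already translated"
        new_cov = list(self.coverage)
        for k in range(i, j):
            new_cov[k] = True
        return Hypothesis(tuple(new_cov), self.output + target_words,
                          self.score + cost)

# empty hypothesis over a 3-word source sentence
h0 = Hypothesis((False, False, False), [], 0.0)
h1 = h0.extend((0, 2), ["das", "Haus"], -1.2)
```

Binning hypotheses by how many source words are covered (or by the coverage vector itself) is exactly where the decoder-specific differences described above arise.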
Experiments | The first one (SYS 1) is a re-implementation of Hiero, a hierarchical phrase-based decoder.
Background 2.1 Terminology | In MT, spurious ambiguity occurs both in regular phrase-based systems (e.g., Koehn et al. |
Background 2.1 Terminology | Figure 1: Segmentation ambiguity in phrase-based MT: two different segmentations lead to the same translation string.
Background 2.1 Terminology | It can be used to encode exponentially many hypotheses generated by a phrase-based MT system (e.g., Koehn et al. |
Integration of inflection models with MT systems | For the phrase-based system, we generated the annotations needed by first parsing the source sentence e, aligning the source and candidate translations with the word-alignment model used in training, and projecting the dependency tree to the target using the algorithm of Quirk et al. (2005).
Machine translation systems and data | We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems. |
Related work | In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations; this framework is implemented in the Moses system.
Conclusions | We integrate our method into a state-of-the-art phrase-based baseline translation system, i.e., Moses (Koehn et al., 2007), and show that the integrated system consistently improves the performance of the baseline system on various NIST machine translation test sets. |
Experimental Results | Using the toolkit Moses (Koehn et al., 2007), we built a phrase-based baseline system by following |
Experimental Results | This is analogous to the concept of “phrase” in phrase-based MT. |
Introduction | Long-distance dependencies are common, and this creates problems for both RBMT and SMT systems (especially for phrase-based ones). |
Results | Phrase-based SMT (systems 081, 082)
Introduction | With phrase-based clustering, “Land of Odds” is grouped with many names that are labeled as company names, which is a strong indication that it is a company name as well. |
Introduction | The disambiguation power of phrases is also evidenced by the improvements of phrase-based machine translation systems (Koehn et al.).
Introduction | We demonstrate the advantages of phrase-based clusters over word-based ones with experimental results from two distinct application domains: named entity recognition and query classification. |
Abstract | In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. |
Conclusions | This work introduces two extensions to the well-known beam search algorithm for phrase-based machine translation. |
Introduction | Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006), (Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011). |
Experiments | These inputs are simplified using our simplification system, namely the DRS-SM model and the phrase-based machine translation system (Section 3.2).
Experiments | These four sentences are directly sent to the phrase-based machine translation system to produce simplified sentences. |
Simplification Framework | The DRS associated with the final M-node D fin is then mapped to a simplified sentence s’fm which is further simplified using the phrase-based machine translation system to produce the final simplified sentence ssimple. |
Methods | In this section, we discuss how a lattice from a multi-stack phrase-based decoder such as Moses (Koehn et al., 2007) can be desegmented to enable word-level features. |
Methods | A phrase-based decoder produces its output from left to right, with each operation appending the translation of a source phrase to a growing target hypothesis. |
Methods | The search graph of a phrase-based decoder can be interpreted as a lattice, which can be interpreted as a finite state acceptor over target strings. |
Abstract | We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six languages pairs used in recent MT competitions. |
Phrase-based machine translation | The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
Word alignment results | Unfortunately, as was shown by Fraser and Marcu (2007), AER can have weak correlation with translation performance as measured by BLEU score (Papineni et al., 2002) when the alignments are used to train a phrase-based translation system.
Experiment | We use the first three syntax-based systems (TT2S, TTS2S, FT2S) and Moses (Koehn et al., 2007), the state-of-the-art phrase-based system, as our baseline systems. |
Forest-based tree sequence to string model | We use seven basic features that are analogous to the commonly used features in phrase-based systems (Koehn, 2003): 1) bidirectional rule mapping probabilities, 2) bidirectional lexical rule translation probabilities, 3) target language model, 4) number of rules used, and 5) number of target words.
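Feature sets like the one listed above are standardly combined in a log-linear model. As a generic formulation (not this particular system's exact equation), with feature functions h_i and weights λ_i tuned on held-out data, the decoder selects:

```latex
\hat{e} = \arg\max_{e} \; \sum_{i} \lambda_i \, h_i(e, f)
```

The bidirectional probabilities enter as log-probability features, while the rule and word counts act as length penalties that let the tuner control derivation size and output length.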
Related work | Motivated by the fact that non-syntactic phrases make nontrivial contribution to phrase-based SMT, the tree sequence-based translation model is proposed (Liu et al., 2007; Zhang et al., 2008a) that uses tree sequence as the basic translation unit, rather than using single subtree as in the STSG. |