Index of papers in Proc. ACL 2013 that mention
  • phrase-based
Cohn, Trevor and Haffari, Gholamreza
Abstract
Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora.
Abstract
This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Analysis
We have presented a novel method for learning a phrase-based model of translation directly from parallel data which we have framed as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Experiments
As a baseline, we train a phrase-based model using the Moses toolkit based on the word alignments obtained using GIZA++ in both directions and symmetrized using the grow-diag-final-and heuristic (Koehn et al., 2003).
Introduction
The phrase-based approach (Koehn et al., 2003) to machine translation (MT) has transformed MT from a narrow research topic into a truly useful technology to end users.
Introduction
Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence aligned parallel data, from
Introduction
Firstly, many phrase-based phenomena which do not decompose into word translations (e.g., idioms) will be missed, as the underlying word-based alignment model is unlikely to propose the correct alignments.
Related Work
A number of other approaches have been developed for learning phrase-based models from bilingual data, starting with Marcu and Wong (2002) who developed an extension to IBM model 1 to handle multi-word units.
phrase-based is mentioned in 11 sentences in this paper.
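The Experiments excerpt above relies on the grow-diag-final-and heuristic to symmetrize the two directional GIZA++ alignments. Below is a minimal Python sketch of that heuristic as described by Koehn et al. (2003); the function and variable names are illustrative, and the real Moses implementation differs in processing details.

    def grow_diag_final_and(src_len, tgt_len, s2t, t2s):
        """Symmetrize two directional word alignments.
        s2t, t2s: sets of (source_index, target_index) pairs."""
        neighbours = [(-1, 0), (0, -1), (1, 0), (0, 1),
                      (-1, -1), (-1, 1), (1, -1), (1, 1)]
        alignment = s2t & t2s                 # start from the intersection
        union = s2t | t2s                     # candidates come from the union

        def src_aligned(i): return any(a == i for a, _ in alignment)
        def tgt_aligned(j): return any(b == j for _, b in alignment)

        # grow-diag: keep adding union points that neighbour an existing point
        # and attach at least one still-unaligned word
        changed = True
        while changed:
            changed = False
            for (i, j) in sorted(alignment):
                for di, dj in neighbours:
                    ni, nj = i + di, j + dj
                    if (0 <= ni < src_len and 0 <= nj < tgt_len
                            and (ni, nj) in union and (ni, nj) not in alignment
                            and (not src_aligned(ni) or not tgt_aligned(nj))):
                        alignment.add((ni, nj))
                        changed = True

        # final-and: add any remaining directional point whose source AND
        # target words are both still unaligned
        for directional in (s2t, t2s):
            for (i, j) in sorted(directional):
                if ((i, j) not in alignment
                        and not src_aligned(i) and not tgt_aligned(j)):
                    alignment.add((i, j))
        return alignment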
Feng, Yang and Cohn, Trevor
Abstract
However phrase-based approaches are much less able to model sentence level effects between different phrase-pairs.
Experiments
However in this paper we limit our focus to inducing word alignments, i.e., by using the model to infer alignments which are then used in a standard phrase-based translation pipeline.
Experiments
We leave full decoding for later work, which we anticipate would further improve performance by exploiting gapping phrases and other phenomena that implicitly form part of our model but are not represented in the phrase-based decoder.
Introduction
Recent years have witnessed burgeoning development of statistical machine translation research, notably phrase-based (Koehn et al., 2003) and syntax-based approaches (Chiang, 2005; Galley et al., 2006; Liu et al., 2006).
Introduction
These approaches model sentence translation as a sequence of simple translation decisions, such as the application of a phrase translation in phrase-based methods or a grammar rule in syntax-based approaches.
Introduction
This conflicts with the intuition behind phrase-based MT, namely that translation decisions should be dependent on context
Model
We consider a process in which the target string is generated using a left-to-right order, similar to the decoding strategy used by phrase-based machine translation systems (Koehn et al., 2003).
Model
In contrast to phrase-based models, we use words as our basic translation unit, rather than multi-word phrases.
Related Work
This idea has been developed explicitly in a number of previous approaches, in grammar based (Chiang, 2005) and phrase-based systems (Galley and Manning, 2010).
phrase-based is mentioned in 16 sentences in this paper.
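The Model excerpts above describe generating the target string left to right from word-level translation decisions, with each decision conditioned on what came before. The toy sketch below illustrates that kind of first-order Markov generation process; it is not the authors' model, and both probability tables (jump_prob, translate_prob) are invented placeholders.

    import random

    def generate_left_to_right(source, translate_prob, jump_prob, max_len=20):
        """Toy first-order Markov generator: each step (1) picks the next
        source position to translate given the previous position, then
        (2) emits a target word for that source word.  'STOP' ends the
        sentence."""
        target, prev = [], -1                     # -1 marks the sentence start
        while len(target) < max_len:
            choices = list(range(len(source))) + ["STOP"]
            weights = [jump_prob.get((prev, c), 1e-6) for c in choices]
            pos = random.choices(choices, weights=weights)[0]
            if pos == "STOP":
                break
            words = translate_prob[source[pos]]   # dict: target word -> prob
            target.append(random.choices(list(words),
                                         weights=list(words.values()))[0])
            prev = pos
        return " ".join(target)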
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Abstract
This paper proposes new distortion models for phrase-based SMT.
Distortion Model for Phrase-Based SMT
A Moses-style phrase-based SMT generates target hypotheses sequentially from left to right.
Introduction
To address this problem, there has been a lot of research done into word reordering: lexical reordering model (Tillman, 2004), which is one of the distortion models, reordering constraint (Zens et al., 2004), pre-ordering (Xia and McCord, 2004), hierarchical phrase-based SMT (Chiang, 2007), and syntax-based SMT (Yamada and Knight, 2001).
Introduction
Phrase-based SMT (Koehn et al., 2007) is a widely used SMT method that does not use a parser.
Introduction
Phrase-based SMT mainly estimates word reordering using distortion models.
phrase-based is mentioned in 18 sentences in this paper.
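The excerpts above concern distortion models that score word reordering during left-to-right decoding. As a point of reference, here is a minimal sketch of the simplest such model, the linear distortion cost used in Moses-style phrase-based decoders (not the new models proposed in this paper): each phrase is charged by how far its source start position jumps from the end of the previously translated source phrase. The span values in the example are invented.

    def linear_distortion(source_spans):
        """source_spans: (start, end) source positions, inclusive, listed in
        the order the phrases are translated (target left-to-right order)."""
        cost, prev_end = 0, -1                    # before the first phrase
        for start, end in source_spans:
            cost += abs(start - (prev_end + 1))   # 0 for a monotone step
            prev_end = end
        return cost

    # Monotone order costs nothing; jumping around the source does not:
    print(linear_distortion([(0, 1), (2, 2), (3, 5)]))   # -> 0
    print(linear_distortion([(3, 5), (0, 1), (2, 2)]))   # -> 9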
Liu, Yang
Abstract
We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation.
Abstract
As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.
Introduction
Modern statistical machine translation approaches can be roughly divided into two broad categories: phrase-based and syntax-based.
Introduction
Phrase-based approaches treat phrase, which is usually a sequence of consecutive words, as the basic unit of translation (Koehn et al., 2003; Och and Ney, 2004).
Introduction
As phrases are capable of memorizing local context, phrase-based approaches excel at handling local word selection and reordering.
phrase-based is mentioned in 32 sentences in this paper.
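The Abstract above introduces a shift-reduce algorithm that builds a target dependency structure while phrases are emitted. Below is a generic arc-standard shift-reduce sketch for intuition only; the paper's actual transition system and features are not reproduced here, and the action names are illustrative.

    def shift_reduce(words, actions):
        """words: target words in emission order.
        actions: 'S' (shift the next word onto the stack),
                 'L' (reduce-left: the item below the top takes the top as head),
                 'R' (reduce-right: the top takes the item below it as head).
        Returns (dependent_index, head_index) arcs."""
        stack, buffer, arcs = [], list(range(len(words))), []
        for act in actions:
            if act == "S":
                stack.append(buffer.pop(0))
            elif act == "L":
                dep = stack.pop(-2)
                arcs.append((dep, stack[-1]))
            elif act == "R":
                dep = stack.pop()
                arcs.append((dep, stack[-1]))
        return arcs

    # "the cat sleeps" with actions S S L S L yields arcs [(0, 1), (1, 2)]:
    # "the" depends on "cat", "cat" depends on "sleeps", which stays the root.
    print(shift_reduce("the cat sleeps".split(), "SSLSL"))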
Nguyen, ThuyLinh and Vogel, Stephan
Abstract
Hiero translation models have two limitations compared to phrase-based models: 1) Limited hypothesis space; 2) No lexicalized reordering model.
Abstract
Phrasal-Hiero still has the same hypothesis space as the original Hiero but incorporates a phrase-based distance cost feature and lexicalized reordering features into the chart decoder.
Abstract
The work consists of two parts: 1) for each Hiero translation derivation, find its corresponding discontinuous phrase-based path.
Introduction
Phrase-based and tree-based translation models are the two main streams in state-of-the-art machine translation.
Introduction
Yet, tree-based translation often underperforms phrase-based translation in language pairs with short range reordering such as Arabic-English translation (Zollmann et al., 2008; Birch et al., 2009).
Introduction
(2003) for our phrase-based system and Chiang (2005) for our Hiero system.
phrase-based is mentioned in 69 sentences in this paper.
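The Abstract above refers to the lexicalized reordering features that Phrasal-Hiero adds back into the chart decoder. Below is a minimal sketch of the standard orientation classification behind such features (monotone, swap, discontinuous), determined by comparing each phrase's source span with the previous phrase's span; the span values in the example are invented.

    def orientation(prev_span, cur_span):
        """Spans are (start, end) source positions, inclusive."""
        if cur_span[0] == prev_span[1] + 1:
            return "monotone"        # continues right after the previous phrase
        if cur_span[1] == prev_span[0] - 1:
            return "swap"            # ends right before the previous phrase
        return "discontinuous"

    print(orientation((0, 1), (2, 3)))   # -> monotone
    print(orientation((2, 3), (0, 1)))   # -> swap
    print(orientation((0, 1), (4, 5)))   # -> discontinuous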
Zhang, Jiajun and Zong, Chengqing
Abstract
In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary.
Experiments
We construct two kinds of phrase-based models using Moses (Koehn et al., 2007): one uses out-of-domain data and the other uses in-domain data.
Introduction
Novel translation models, such as phrase-based models (Koehn et al., 2007), hierarchical phrase-based models (Chiang, 2007) and linguistically syntax-based models (Liu et al., 2006; Huang et al., 2006; Galley, 2006; Zhang et al., 2008; Chiang, 2010; Zhang et al., 2011; Zhai et al., 2011, 2012) have been proposed and achieved higher and higher translation performance.
Introduction
Finally, they used the learned translation model directly to translate unseen data (Ravi and Knight, 2011; Nuhn et al., 2012) or incorporated the learned bilingual lexicon as a new in-domain translation resource into the phrase-based model which is trained with out-of-domain data to improve the domain adaptation performance in machine translation (Dou and Knight, 2012).
Introduction
level translation rules and learn a phrase-based model from the monolingual corpora.
Phrase Pair Refinement and Parameterization
It is well known that in the phrase-based SMT there are four translation probabilities and the reordering probability for each phrase pair.
Phrase Pair Refinement and Parameterization
The translation probabilities in the traditional phrase-based SMT include bidirectional phrase translation probabilities and bidirectional lexical weights.
Probabilistic Bilingual Lexicon Acquisition
In order to assign probabilities to each entry, we apply the Corpus Translation Probability which is used in (Wu et al., 2008): given in-domain source language monolingual data, we translate this data with the phrase-based model trained on the out-of-domain News data, the in-domain lexicon and the in-domain target language monolingual data (for language model estimation).
Related Work
For the target-side monolingual data, they just use it to train a language model, and for the source-side monolingual data, they employ a baseline (word-based SMT or phrase-based SMT trained with a small-scale bitext) to first translate the source sentences, combining each source sentence and its target translation as a bilingual sentence pair, and then train a new phrase-based SMT with these pseudo sentence pairs.
phrase-based is mentioned in 11 sentences in this paper.
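The Phrase Pair Refinement excerpts above list the four standard translation scores attached to each phrase pair. As background, here is a minimal sketch of how the two phrase translation probabilities are estimated by relative frequency from phrase-pair counts; the counts and names below are invented, and the two lexical weights (not shown) are computed from word-level probabilities over the internal alignment of each pair.

    from collections import defaultdict

    def phrase_translation_probs(pair_counts):
        """pair_counts: dict mapping (src_phrase, tgt_phrase) -> count.
        Returns p(tgt|src) and p(src|tgt) as dicts keyed by the same pairs."""
        src_totals, tgt_totals = defaultdict(float), defaultdict(float)
        for (f, e), c in pair_counts.items():
            src_totals[f] += c
            tgt_totals[e] += c
        p_e_given_f = {(f, e): c / src_totals[f]
                       for (f, e), c in pair_counts.items()}
        p_f_given_e = {(f, e): c / tgt_totals[e]
                       for (f, e), c in pair_counts.items()}
        return p_e_given_f, p_f_given_e

    counts = {("maison", "house"): 8, ("maison", "home"): 2, ("chez", "home"): 3}
    p_ef, p_fe = phrase_translation_probs(counts)
    print(p_ef[("maison", "house")])   # -> 0.8
    print(p_fe[("chez", "home")])      # -> 0.6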
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Abstract
Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT.
Conclusion and Future Work
Unlike the previous pipeline approaches, which directly merge TM phrases into the final translation result, we integrate TM information of each source phrase into the phrase-based SMT at decoding.
Experiments
For the phrase-based SMT system, we adopted the Moses toolkit (Koehn et al., 2007).
Experiments
conducted using the Moses phrase-based decoder (Koehn et al., 2007).
Introduction
Statistical machine translation (SMT), especially the phrase-based model (Koehn et al., 2003), has developed very fast in the last decade.
Introduction
On a Chinese—English computer technical documents TM database, our experiments have shown that the proposed Model-III improves the translation quality significantly over either the pure phrase-based SMT or the TM systems when the fuzzy match score is above 0.4.
Problem Formulation
Compared with the standard phrase-based machine translation model, the translation problem is reformulated as follows (only based on the best TM, however, it is similar for multiple TM sentences):
Problem Formulation
Formula (3) is just the typical phrase-based SMT model, and the second factor P(Mk|Lk, 2:) (to be specified in Section 3) is the information derived from the TM sentence pair.
Problem Formulation
Therefore, we can still keep the original phrase-based SMT model and only pay attention to how to extract
phrase-based is mentioned in 9 sentences in this paper.
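The Introduction excerpt above conditions the reported gains on the fuzzy match score between the input sentence and the best TM source sentence. A common word-level definition of that score is one minus the normalised edit distance; the sketch below uses that definition, which may differ in detail from the one used in the paper, and the example sentences are illustrative.

    def fuzzy_match_score(input_words, tm_source_words):
        """1 - word-level edit distance, normalised by the longer sentence."""
        n, m = len(input_words), len(tm_source_words)
        dist = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            dist[i][0] = i
        for j in range(m + 1):
            dist[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                sub = 0 if input_words[i - 1] == tm_source_words[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                                 dist[i][j - 1] + 1,        # insertion
                                 dist[i - 1][j - 1] + sub)  # match / substitution
        return 1.0 - dist[n][m] / max(n, m)

    print(fuzzy_match_score("click the OK button".split(),
                            "click the Cancel button".split()))   # -> 0.75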
Andreas, Jacob and Vlachos, Andreas and Clark, Stephen
Experimental setup
Implementation In all experiments, we use the IBM Model 4 implementation from the GIZA++ toolkit (Och and Ney, 2000) for alignment, and the phrase-based and hierarchical models implemented in the Moses toolkit (Koehn et al., 2007) for rule extraction.
Introduction
Our contributions are as follows: We develop a semantic parser using off-the-shelf MT components, exploring phrase-based as well as hierarchical models.
MT-based semantic parsing
We consider a phrase-based translation model (Koehn et al., 2003) and a hierarchical translation model (Chiang, 2005).
MT-based semantic parsing
Rules for the phrase-based model consist of pairs of aligned source and target sequences, while hierarchical rules are SCFG productions containing at most two instances of a single nonterminal symbol.
Related Work
The present work is also the first we are aware of which uses phrase-based rather than tree-based machine translation techniques to learn a semantic parser.
Related Work
multilevel rules composed from smaller rules, a process similar to the one used for creating phrase tables in a phrase-based MT system.
Results
We first compare the results for the two translation rule extraction models, phrase-based and hierarchical (“MT-phrase” and “MT-hier” respectively in Table 1).
phrase-based is mentioned in 7 sentences in this paper.
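The excerpts above reuse the standard Moses pipeline for rule extraction. For the phrase-based case, the core step is extracting all phrase pairs consistent with the word alignment (Koehn et al., 2003); the sketch below shows that consistency check only, omitting the usual extension over unaligned boundary words, and the example data is invented.

    def extract_phrase_pairs(src, tgt, alignment, max_len=4):
        """src, tgt: token lists; alignment: set of (src_idx, tgt_idx) links.
        Returns (src_phrase, tgt_phrase) pairs whose spans are consistent,
        i.e. no alignment link leaves the box defined by the two spans."""
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                linked = [j for (i, j) in alignment if i1 <= i <= i2]
                if not linked:
                    continue                      # unaligned source span
                j1, j2 = min(linked), max(linked)
                if j2 - j1 + 1 > max_len:
                    continue                      # target side too long
                # reject if a target word in [j1, j2] links outside [i1, i2]
                if any(j1 <= j <= j2 and not i1 <= i <= i2
                       for (i, j) in alignment):
                    continue
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
        return pairs

    # e.g. with src = "das Haus".split(), tgt = "the house".split() and
    # alignment = {(0, 0), (1, 1)}, the pairs are ("das", "the"),
    # ("Haus", "house") and ("das Haus", "the house").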
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Introduction
We define translation units as phrases in phrase-based SMT, and as translation rules in syntax-based SMT.
Introduction
Specifically, the popular distortion or lexicalized reordering models in phrase-based SMT focus only on making good local prediction (i.e.
Introduction
Even though the experiments carried out in this paper employ SCFG-based SMT systems, we would like to point out that our models are applicable to other systems, including phrase-based SMT systems.
Maximal Orientation Span
We use hierarchical phrase-based translation system as a case in point, but the merit is generalizable to other systems.
Maximal Orientation Span
Additionally, this illustration also shows a case where MOS acts as a cross-boundary context which effectively relaxes the context-free assumption of hierarchical phrase-based formalism.
Related Work
Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007).
Related Work
Our MOS concept is also closely related to the hierarchical reordering model (Galley and Manning, 2008) in phrase-based decoding, which computes o of b with respect to a multi-block unit that may go beyond b’.
phrase-based is mentioned in 7 sentences in this paper.
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Corpus Data and Baseline SMT
Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
Experimental Setup and Results
The baseline English-to-Iraqi phrase-based SMT system was built as described in Section 3.
Introduction
In this paper, we describe a novel topic-based adaptation technique for phrase-based statistical machine translation (SMT) of spoken conversations.
Introduction
Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
Introduction
With this approach, we demonstrate significant improvements over a baseline phrase-based SMT system as measured by BLEU, TER and NIST scores on an English-to-Iraqi CSLT task.
phrase-based is mentioned in 5 sentences in this paper.
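The Introduction excerpts above describe a single similarity feature comparing the topic distribution of the current conversation with that of the training conversation a phrase pair came from. The sketch below uses cosine similarity as one common way to compare two topic distributions; the paper's exact similarity measure is not reproduced here, and the distributions in the example are invented.

    import math

    def topic_similarity(p, q):
        """p, q: topic distributions, equal-length sequences of probabilities."""
        dot = sum(a * b for a, b in zip(p, q))
        norm = (math.sqrt(sum(a * a for a in p))
                * math.sqrt(sum(b * b for b in q)))
        return dot / norm if norm else 0.0

    # A phrase pair drawn from a conversation on a similar topic scores higher:
    print(topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))   # close to 1
    print(topic_similarity([0.7, 0.2, 0.1], [0.1, 0.1, 0.8]))   # much lower

Such a score would enter the log-linear model as one extra feature, with its weight tuned alongside the standard phrase-based features.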
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Experiments
Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model and a word-based lexicon model.
Introduction
Within the phrase-based SMT framework there are mainly three stages where improved reordering could be integrated: in the preprocessing, the source sentence is reordered by heuristics, so that the word order of source and target sentences is similar.
Introduction
In this way, syntax information can be incorporated into phrase-based SMT systems.
Translation System Overview
In this paper, the phrase-based machine translation system
phrase-based is mentioned in 4 sentences in this paper.
Schoenemann, Thomas
Abstract
Using the resulting alignments for phrase-based translation systems offers no clear insights w.r.t.
Conclusion
Table 2: Evaluation of phrase-based translation from German to English with the obtained alignments (for 100,000 sentence pairs).
See e. g. the author’s course notes (in German), currently
In addition we evaluate the effect on phrase-based translation on one of the tasks.
See e. g. the author’s course notes (in German), currently
We also check the effect of the various alignments (all produced by RegAligner) on translation performance for phrase-based translation, randomly choosing translation from German to English.
phrase-based is mentioned in 4 sentences in this paper.
Braslavski, Pavel and Beloborodov, Alexander and Khalilov, Maxim and Sharoff, Serge
Introduction
Long-distance dependencies are common, and this creates problems for both RBMT and SMT systems (especially for phrase-based ones).
Results
081 Phrase-based SMT
Results
082 Phrase-based SMT
phrase-based is mentioned in 3 sentences in this paper.