Abstract | We make use of collocation probabilities, estimated from monolingual corpora, in two ways: improving word alignment for various kinds of SMT systems and improving the phrase table for phrase-based SMT. |
Abstract | Compared with the baseline systems, we achieve absolute improvements of 2.40 BLEU points on a phrase-based SMT system and 1.76 BLEU points on a parsing-based SMT system. |
Experiments on Phrase-Based SMT | We use the FBIS corpus to train the Chinese-to-English SMT systems. |
Experiments on Word Alignment | To train a Chinese-to-English SMT system, we need to perform both Chinese-to-English and |
Improving Phrase Table | A phrase-based SMT system automatically extracts bilingual phrase pairs from the word-aligned bilingual corpus. |
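To make the extraction step concrete, here is a minimal Python sketch of the widely used consistency criterion for phrase-pair extraction: a source span and a target span are paired only if at least one alignment link falls inside the pair and no link connects a word inside the pair to a word outside it. The function name `extract_phrases`, the `max_len` limit, and the toy pinyin example are illustrative assumptions. The sketch also skips the usual extension over unaligned boundary words, so it is not necessarily the exact procedure used in this paper.

```python
from typing import List, Set, Tuple

# Word alignment as a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrases(src: List[str], tgt: List[str], align: Alignment,
                    max_len: int = 3) -> List[Tuple[str, str]]:
    """Extract phrase pairs consistent with a word alignment (sketch).

    A pair (src[i1..i2], tgt[j1..j2]) is kept only if some link lies inside
    the box and no link joins a word inside the box to a word outside it.
    Unaligned boundary words are NOT expanded in this simplified version.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the current source span.
            tps = [j for (i, j) in align if i1 <= i <= i2]
            if not tps:
                continue
            j1, j2 = min(tps), max(tps)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word in [j1, j2] may link outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in align):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For example, `extract_phrases(["ta", "xihuan", "yinyue"], ["he", "likes", "music"], {(0, 0), (1, 1), (2, 2)})` returns six pairs, among them `("ta xihuan", "he likes")`. If two source words are both linked to a single target word, neither source word can be extracted on its own; only the two-word span is kept.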
Improving Phrase Table | These collocation probabilities are incorporated into the phrase-based SMT system as features. |
Improving Statistical Bilingual Word Alignment | This method ignores the correlation between words in the same alignment unit, so an alignment may include many unrelated words, which degrades the performance of SMT systems. |
Introduction | 1993) is the basis of most SMT systems. |
Introduction | Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve the phrase table for phrase-based SMT. |
Introduction | Then the phrase collocation probabilities are used as additional features in phrase-based SMT systems. |
Abstract | To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. |
Background | where Pr(e|f) is the probability that e is the translation of the given source string f. To model the posterior probability Pr(e|f), most state-of-the-art SMT systems use the log-linear model proposed by Och and Ney (2002), as follows, |
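For reference, the log-linear model of Och and Ney (2002) is usually written in the following standard form, with feature functions $h_m(f,e)$ and weights $\lambda_m$, $m = 1, \ldots, M$. This is the common textbook formulation; the notation and equation numbering in the original paper may differ. Because the denominator does not depend on $e$, the best translation is the one that maximizes the weighted feature sum.

$$
\Pr(e \mid f) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(f, e)\right)}{\sum_{e'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(f, e')\right)},
\qquad
\hat{e} = \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(f, e)
$$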
Background | In this paper, $u$ denotes a log-linear model that has $M$ fixed features $\{h_1(f,e), \ldots, h_M(f,e)\}$, $\lambda = \{\lambda_1, \ldots, \lambda_M\}$ denotes the $M$ parameters of $u$, and $u(\lambda)$ denotes an SMT system based on $u$ with parameters $\lambda$. |
Background | Suppose that there are $T$ available SMT systems $\{u_1(\lambda^*_1), \ldots, u_T(\lambda^*_T)\}$; the task of system combination is to build a new translation system $v(u_1(\lambda^*_1), \ldots, u_T(\lambda^*_T))$ from $\{u_1(\lambda^*_1), \ldots, u_T(\lambda^*_T)\}$. |
Introduction | With the emergence of various structurally different SMT systems, a growing number of studies focus on combining multiple SMT systems to achieve higher translation accuracy than any single translation system. |
Introduction | One of the key factors in SMT system combination is the diversity in the ensemble of translation outputs (Macherey and Och, 2007). |
Introduction | However, this requirement cannot be met in many cases, since we do not always have access to multiple SMT engines, owing to the high cost of developing and tuning SMT systems. |
Features | Given a source sentence $f$, let $\{e_n\}_{n=1}^{N}$ be the N-best list generated by an SMT system, and let $e_{n,i}$ be the $i$-th word in $e_n$. |
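One common way to turn an N-best list into word-level correctness features is the word posterior probability: the normalized score mass of the hypotheses that contain a given word. The sketch below computes a simple position-independent variant for every word of every hypothesis. The function name `word_posteriors`, the softmax normalization of the hypothesis log scores, and the toy data are assumptions for illustration, not necessarily the features defined in this paper.

```python
import math
from typing import List, Tuple


def word_posteriors(nbest: List[Tuple[List[str], float]]) -> List[List[float]]:
    """Position-independent word posterior probabilities over an N-best list.

    nbest: list of (tokens, log_score) pairs, one per hypothesis.
    Hypothesis scores are turned into posterior weights with a softmax;
    the posterior of a word is the total weight of the hypotheses that
    contain that word anywhere. Returns one list of posteriors per hypothesis.
    """
    max_s = max(s for _, s in nbest)
    weights = [math.exp(s - max_s) for _, s in nbest]  # shift for numerical stability
    z = sum(weights)
    probs = [w / z for w in weights]
    word_sets = [set(toks) for toks, _ in nbest]
    return [[sum(p for p, ws in zip(probs, word_sets) if w in ws) for w in toks]
            for toks, _ in nbest]
```

With two equally scored hypotheses "he likes music" and "he loves music", the words shared by both ("he", "music") receive a posterior of 1.0, while "likes" and "loves" each receive 0.5. A low posterior marks a word that the N-best list disagrees on, which is the kind of evidence an error detector can use.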
Introduction | Automatically distinguishing incorrect parts from correct parts is therefore highly desirable, not only for post-editing and interactive machine translation (Ueffing and Ney, 2007) but also for SMT itself: either by rescoring hypotheses in the N-best list using the probability of correctness calculated for each hypothesis (Zens and Ney, 2006) or by generating new hypotheses using N-best lists from one SMT system or multiple sys- |
Introduction | 1) Calculate features that express the correctness of words, based either on the SMT model (e.g. translation/language model) or on the SMT system output (e.g. |
Introduction | However, as discussed by Shi and Zhou (2005), using these features alone is not adequate, because the information they carry comes either from the inner components of SMT systems or from system outputs. |
Introduction | Further experiments in machine translation also suggest that the obtained subtree alignment can improve the performance of both phrase-based and syntax-based SMT systems. |
Substructure Spaces for BTKs | Most linguistically motivated syntax-based SMT systems require an automatic parser to perform rule induction. |
Substructure Spaces for BTKs | We explore the effectiveness of subtree alignment for both phrase-based and linguistically motivated syntax-based SMT systems. |
Substructure Spaces for BTKs | The findings suggest that, as long as non-syntactic phrases are still modeled, placing more emphasis on syntactic phrases can benefit both phrase-based and syntax-based SMT systems. |
Experimental Evaluation | The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model. |
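The eight baseline features are combined linearly in the log-linear model, with one weight per feature. A minimal sketch of scoring candidate translations and picking the best one under that model follows. The feature labels, the dictionary representation, and the assumption that probability features are already on a log scale (with the penalties given as counts) are illustrative; in practice the weights would be tuned on a development set.

```python
from typing import Dict, List

# Hypothetical labels for the eight baseline features described in the text.
FEATURES = [
    "phrase_tm_s2t", "phrase_tm_t2s",   # phrase translation probabilities (log), both directions
    "lex_s2t", "lex_t2s",               # word lexicon probabilities (log), both directions
    "phrase_penalty", "word_penalty",   # number of phrases / number of target words
    "lm",                               # language model score (log)
    "distortion",                       # simple distance-based reordering cost
]


def loglinear_score(h: Dict[str, float], lam: Dict[str, float]) -> float:
    """Weighted feature sum: sum over m of lambda_m * h_m(f, e)."""
    return sum(lam[name] * h[name] for name in FEATURES)


def best_hypothesis(cands: List[Dict[str, float]], lam: Dict[str, float]) -> int:
    """Index of the candidate translation with the highest log-linear score."""
    return max(range(len(cands)), key=lambda k: loglinear_score(cands[k], lam))
```

Keeping each feature as a separate named value, rather than folding them into one probability, is what lets the weights trade off, for instance, language model fluency against the word penalty.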
Introduction | A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003). |
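The segment-and-translate process can be illustrated with a simplified monotone decoder. It searches over all ways to split the source sentence into phrases, translates each phrase with a toy phrase table, and keeps the highest-scoring combination by dynamic programming. This is a sketch under strong assumptions, not a full phrase-based decoder: it allows no reordering and uses no language model, and the phrase table format, the function name `monotone_translate`, and the pinyin input are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical phrase table: source phrase -> list of (target phrase, log probability).
PhraseTable = Dict[str, List[Tuple[str, float]]]


def monotone_translate(src: List[str], table: PhraseTable,
                       max_len: int = 3) -> Tuple[str, float]:
    """Best monotone segmentation and phrase translation (no reordering, no LM).

    best[j] holds the highest log score for translating src[:j] together with
    the target phrases that achieve it. Returns ("", -inf) if no segmentation
    of the whole sentence is covered by the table.
    """
    n = len(src)
    best: List[Tuple[float, List[str]]] = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            if best[i][0] == -math.inf:
                continue  # src[:i] cannot be translated, so skip this split
            phrase = " ".join(src[i:j])
            for tgt, logp in table.get(phrase, []):
                score = best[i][0] + logp
                if score > best[j][0]:
                    best[j] = (score, best[i][1] + [tgt])  # new list; earlier cells untouched
    score, out = best[n]
    return " ".join(out), score
```

If the table contains a separate entry for the two-word phrase "xihuan yinyue", the decoder may prefer translating it as one phrase over translating the two words separately, whenever the single phrase probability exceeds the product of the word-level ones.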
Introduction | The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system. |
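The translation probabilities in such a table are commonly estimated by relative frequency over the extracted phrase pairs. The sketch below assumes that estimator (a common choice, not necessarily the one used in this work) and a hypothetical function name `phrase_table`. It returns both directions, phi(e|f) and phi(f|e), for every observed pair.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def phrase_table(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Relative-frequency phrase translation probabilities in both directions.

    pairs: extracted (source phrase f, target phrase e) occurrences.
    Returns {(f, e): (phi(e|f), phi(f|e))}, where
    phi(e|f) = count(f, e) / count(f) and phi(f|e) = count(f, e) / count(e).
    """
    pair_c = Counter(pairs)
    f_c: Counter = Counter()
    e_c: Counter = Counter()
    for (f, e), c in pair_c.items():
        f_c[f] += c  # marginal count of the source phrase
        e_c[e] += c  # marginal count of the target phrase
    return {(f, e): (c / f_c[f], c / e_c[e]) for (f, e), c in pair_c.items()}
```

For instance, if "xihuan" is extracted twice with "likes" and once with "enjoys", then phi(likes | xihuan) is 2/3; if "likes" also appears once with another source phrase, phi(xihuan | likes) is 2/3 as well.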