Abstract | We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems. |
Inflection prediction models | The dependency structure on the Russian side, indicated by solid arcs, is given by a treelet MT system (see Section 4.1), projected from the word dependency structure
Integration of inflection models with MT systems | The methods differ in the extent to which the factoring of the problem into two subproblems — predicting stems and predicting inflections — is reflected in the base MT systems.
Integration of inflection models with MT systems | In the first method, the MT system is trained to produce fully inflected target words and the inflection model can change the inflections. |
Introduction | (Finkel et al., 2006), and in some cases, to factor the translation problem so that the baseline MT system can take advantage of the reduction in sparsity by being able to work on word stems. |
Machine translation systems and data | This is a syntactically-informed MT system, designed following (Quirk et al., 2005).
Machine translation systems and data | For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation. |
Machine translation systems and data | All MT systems for a given language pair used the same datasets. |
Related work | In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations, which is implemented in the Moses system. |
Related work | This also makes the model portable and applicable to different types of MT systems.
Related work | In contrast, we focus on methods of integration of an inflection prediction model with an MT system, and on evaluation of the model’s impact on translation.
AL-SMT: Multilingual Setting | Our goal is to add a new language to this corpus, and at the same time to construct high quality MT systems from the existing languages (in the multilingual corpus) to the new language. |
Abstract | We introduce an active learning task of adding a new language to an existing multilingual set of parallel text and constructing high quality MT systems, from each language in the collection into this new target language.
Introduction | In this paper, we consider how to use active learning (AL) in order to add a new language to such a multilingual parallel corpus and at the same time we construct an MT system from each language in the original corpus into this new target language. |
Introduction | In this paper, we explore how multiple MT systems can be used to effectively pick instances that are more likely to improve training quality. |
Introduction | When we build multiple MT systems from multiple source languages to the new target language, each MT system can be seen as a different ‘view’ on the desired output translation. |
Abstract | Our best result improves over the best single MT system baseline by 1.0% BLEU and over a strong system selection baseline by 0.6% BLEU on a blind test set. |
Introduction | In this paper we study the use of sentence-level dialect identification together with various linguistic features in optimizing the selection of outputs of four different MT systems on input text that includes a mix of dialects. |
Introduction | Our best system selection approach improves over our best baseline single MT system by 1.0% absolute BLEU point on a blind test set. |
Related Work | Sawaf (2010) and Salloum and Habash (2013) used hybrid solutions that combine rule-based algorithms and resources such as lexicons and morphological analyzers with statistical models to map DA to MSA before using MSA-to-English MT systems.
Related Work | In this paper we use four MT systems that translate from DA to English in different ways. |
Related Work | Our fourth MT system uses ELISSA, the DA-to-MSA MT tool by Salloum and Habash (2013), to produce an MSA pivot. |
Adaptive MT Quality Estimation | The source side of the QE training data Sq is combined with the input document Sd for MT system training data subsampling. |
Adaptive MT Quality Estimation | Once the document-specific MT system is trained, we use it to translate both the input document and the source QE training data, obtaining the translation Td and |
Adaptive MT Quality Estimation | As the QE model is adaptively retrained for each document-specific MT system, its prediction is more accurate and consistent.
Document-specific MT System | Building a general MT system using all the parallel data not only produces a huge translation model (unless very aggressive pruning is applied), but also yields suboptimal performance on the given input document, due to the unwanted dominance of out-of-domain data.
Document-specific MT System | The document-specific system is built based on sub-sampling: from the parallel corpora we select sentence pairs that are the most similar to the sentences from the input document, then build the MT system with the sub-sampled sentence pairs. |
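The sub-sampling step in the line above can be sketched as follows. This is only a toy illustration: the n-gram-overlap similarity and the top-k cutoff are assumptions for exposition, not the cited system's actual selection criterion.

```python
def ngram_set(tokens, n=2):
    """All n-grams of the token list up to order n."""
    return {tuple(tokens[i:i + k])
            for k in range(1, n + 1)
            for i in range(len(tokens) - k + 1)}

def subsample(parallel_corpus, input_doc, top_k=2):
    """Keep the top_k sentence pairs whose source side shares the largest
    fraction of its n-grams with the input document."""
    doc_grams = set()
    for sent in input_doc:
        doc_grams |= ngram_set(sent.split())

    def score(pair):
        src_grams = ngram_set(pair[0].split())
        return len(src_grams & doc_grams) / max(len(src_grams), 1)

    return sorted(parallel_corpus, key=score, reverse=True)[:top_k]
```

A document-specific system would then be trained only on the selected pairs, keeping the model small and in-domain.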
Introduction | Depending on the difficulty of the input sentences (sentence length, OOV words, complex sentence structures and the coverage of the MT system’s training data), some translation outputs can be perfect, while others are ungrammatical, missing important words or even totally garbled. |
Introduction | This shortcoming is one of the main obstacles for the adoption of MT systems, especially in machine-assisted human translation: MT post-editing, where human translators have an option to edit MT proposals or translate from scratch.
Introduction | In section 3 we will introduce the document-specific MT system built for post-editing. |
Static MT Quality Estimation | However for the post-editing task, we argue that it could also be cast as a classification problem: MT system |
Static MT Quality Estimation | We build a document-specific MT system to translate this document, then ask a human translator to correct the translation output.
Clustering for Cross Lingual Sentiment Analysis | If a language is truly resource-scarce, it is most unlikely to have an MT system.
Clustering for Cross Lingual Sentiment Analysis | Given that sentiment analysis is a less resource intensive task compared to machine translation, the use of an MT system is hard to justify for performing |
Conclusion and Future Work | For CLSA, clusters linked together using unlabelled parallel corpora do away with the need of translating labelled corpora from one language to another using an intermediary MT system or bilingual dictionary. |
Conclusion and Future Work | Further, this approach was found to be useful in cases where there are no MT systems to perform CLSA and the language of analysis is truly resource scarce. |
Conclusion and Future Work | Thus, a wider implication of this study is that many widely spoken yet resource-scarce languages like Pashto, Sundanese, Hausa, Gujarati and Punjabi, which do not have an MT system, could now be analysed for sentiment.
Discussions | A note on CLSA for truly resource scarce languages: Note that there is no publicly available MT system for English to Marathi. |
Introduction | However, many languages which are truly resource scarce do not have an MT system, or their existing MT systems are not ripe to be used for CLSA (Balamurali et al., 2013).
Introduction | No MT systems or bilingual dictionaries are used for this study. |
Related Work | Given the subtle and varied ways in which sentiment can be expressed, themselves a product of the cultural diversity across languages, an MT system has to be of superior quality to capture them.
Abstract | We here extend lattice-based MERT and MBR algorithms to work with hypergraphs that encode a vast number of translations produced by MT systems based on Synchronous Context Free Grammars. |
Discussion | On hypergraphs produced by Hierarchical and Syntax Augmented MT systems, our MBR algorithm gives a 7X speedup relative to 1000-best MBR while giving comparable or even better performance.
Discussion | We believe that our efficient algorithms will make them more widely applicable in both SCFG-based and phrase-based MT systems.
Experiments | 6.2 MT System Description |
Experiments | Our phrase-based statistical MT system is similar to the alignment template system described in (Och and Ney, 2004; Tromble et al., 2008). |
Experiments | We also train two SCFG-based MT systems: a hierarchical phrase-based SMT (Chiang, 2007) system and a syntax augmented machine translation (SAMT) system using the approach described in Zollmann and Venugopal (2006).
Introduction | In this paper, we extend MERT and MBR decoding to work on hypergraphs produced by SCFG-based MT systems.
Minimum Bayes-Risk Decoding | MBR decoding for translation can be performed by reranking an N-best list of hypotheses generated by an MT system (Kumar and Byrne, 2004).
Minimum Bayes-Risk Decoding | We next extend the Lattice MBR decoding algorithm (Algorithm 3) to rescore hypergraphs produced by an SCFG-based MT system.
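N-best MBR reranking as described above can be sketched as follows. The unigram-overlap gain function is a toy stand-in assumption; the cited work uses BLEU-based gains.

```python
def gain(hyp, other):
    """Toy stand-in for BLEU: fraction of hyp tokens matched in other."""
    h, o = hyp.split(), other.split()
    matches = sum(min(h.count(w), o.count(w)) for w in set(h))
    return matches / max(len(h), 1)

def mbr_rerank(nbest):
    """nbest: list of (hypothesis, posterior) pairs.  Return the hypothesis
    maximizing expected gain under the model's posterior distribution."""
    def expected_gain(hyp):
        return sum(p * gain(hyp, other) for other, p in nbest)
    return max(nbest, key=lambda item: expected_gain(item[0]))[0]
```

Note that the chosen hypothesis need not be the one with the highest posterior; it is the one most "supported" by the rest of the list.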
Methods 5.1 5W Systems | All three annotators were native English speakers who were not system developers for any of the 5W systems that were being evaluated (to avoid biased grading, or assigning more blame to the MT system).
Methods 5.1 5W Systems | If the 5W system picked an incorrectly translated argument (e.g., “baked a moon” instead of “baked a cake”), then the error would be assigned to the MT system.
Results | Long-distance phrase movement is a common problem in Chinese-English MT, and many MT systems try to handle it (e.g., Wang et al.
Results | Since MT systems are tuned for word-based overlap measures (such as BLEU), verb deletion is penalized the same as, for example, determiner deletion.
5W System | In this section, we describe the individual systems that we evaluated, the combination strategy, the parsers that we tuned for the task, and the MT systems.
5W System | Finally, Chinese-align used the alignments of three separate MT systems to translate the 5Ws: a phrase-based system, a hierarchical phrase-based system, and a syntax augmented hierarchical phrase-based system.
5W System | Since the predicate is essential, it tried to detect when verbs were deleted in MT, and to back off to a different MT system.
Gaussian Process Regression | In our quality estimation experiments we consider as metadata the MT system which produced the translation, and the identity of the source sentence being translated. |
Introduction | In this paper we model the task of predicting the quality of sentence translations using datasets that have been annotated by several judges with different levels of expertise and reliability, containing translations from a variety of MT systems and on a range of different types of sentences. |
Multitask Quality Estimation 4.1 Experimental Setup | Partitioning the data by annotator (μA) gives the best baseline result, while there is less information from the MT system or sentence identity.
Multitask Quality Estimation 4.1 Experimental Setup | The multitask learning methods performed best when using the annotator identity as the task descriptor, and less well for the MT system and sentence pair, where they only slightly improved over the baseline. |
Quality Estimation | Examples of applications of QE include improving post-editing efficiency by filtering out low quality segments which would require more effort and time to correct than translating from scratch (Specia et al., 2009), selecting high quality segments to be published as they are, without post-editing (Soricut and Echihabi, 2010), selecting a translation from either an MT system or a translation memory for post-editing (He et al., 2010), selecting the best translation from multiple MT systems (Specia et al., 2010), and highlighting subsegments that need revision (Bach et al., 2011). |
Quality Estimation | It is often desirable to include alternative translations of source sentences produced by multiple MT systems, which requires multiple annotators for unbiased judgements, particularly for labels such as post-editing time (a translation seen a second time will require less editing effort).
Quality Estimation | It contains 299 English sentences translated into Spanish using two or more of eight MT systems randomly selected from all system submissions for WMT11 (Callison-Burch et al., 2011). |
Abstract | We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system . |
Experimental Results | In the Chinese-to-English MT experiments, we test two state-of-the-art MT systems.
Experimental Results | The MT systems are optimized with pairwise ranking optimization (Hopkins and May, 2011) to maximize BLEU (Papineni et al., 2002). |
Integrating Empty Categories in Machine Translation | We conducted some initial error analysis on our MT system output and found that most of the errors that are related to ECs are due to the missing *pro* and *PRO*. |
Integrating Empty Categories in Machine Translation | For example, for a hierarchical MT system, some phrase pairs and Hiero (Chiang, 2005) rules can be extracted with recovered *pro* and *PRO* at the Chinese side.
Integrating Empty Categories in Machine Translation | In this work we also take advantage of the augmented Chinese parse trees (with ECs projected to the surface) and extract a tree-to-string grammar (Liu et al., 2006) for a tree-to-string MT system.
Related Work | We directly take advantage of the augmented parse trees in the tree-to-string grammar, which could have larger impact on the MT system performance. |
Baseline MT | As our baseline, we apply a high-performing Chinese-English MT system (Zheng, 2008; Zheng et al., 2009) based on hierarchical phrase-based translation framework (Chiang, 2005). |
Experiments | For example, the baseline MT system mistakenly translated a person name “3% 21%? |
Experiments | For example, a sentence glossed as “Gao Meimei's strength really is formidable, I really admire her” was mistakenly translated into “Gao the strength of the America and the America also really strong , ah , really admire her” by the baseline MT system, because the person name 高美美 (Gao Meimei) was mistakenly segmented into three words: 高 (Gao), 美 (the America) and 美 (the America).
Experiments | Furthermore, we calculated three Pearson product-moment correlation coefficients between human judgment scores and name-aware BLEU scores of these two MT systems.
Introduction | A typical statistical MT system can only translate 60% of person names correctly (Ji et al., 2009).
Related Work | Two types of humble strategies were previously attempted to build name translation components which operate in tandem and loosely integrate into conventional statistical MT systems: |
Related Work | Preprocessing: identify names in the source texts and propose name translations to the MT system ; the name translation results can be simply but aggressively transferred from the source to the target side using word alignment, or added into phrase table in order to |
Related Work | Some statistical MT systems (e.g. |
Discussion | Our results validate the hypothesis that it is possible to adapt an ordinary MT system into a working semantic parser. |
Introduction | Indeed, successful semantic parsers often resemble MT systems in several important respects, including the use of word alignment models as a starting point for rule extraction (Wong and Mooney, 2006; Kwiatkowski et al., 2010) and the use of automata such as tree transducers (Jones et al., 2012) to encode the relationship between NL and MRL. |
MT—based semantic parsing | Language modeling In addition to translation rules learned from a parallel corpus, MT systems also rely on an n-gram language model for the target language, estimated from a (typically larger) monolingual corpus. |
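As a minimal sketch of the target-side n-gram language model mentioned above, here is a maximum-likelihood bigram model; real systems use higher-order models with smoothing, so this is illustrative only.

```python
from collections import defaultdict

def train_bigram_lm(corpus):
    """MLE bigram model with sentence-boundary markers.
    Returns lm(word, prev) = P(word | prev)."""
    bigram = defaultdict(int)
    context = defaultdict(int)
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            bigram[(prev, word)] += 1
            context[prev] += 1
    return lambda word, prev: (bigram[(prev, word)] / context[prev]
                               if context[prev] else 0.0)
```

In decoding, such a model scores candidate target strings (here, logical-form token sequences) for fluency, independently of the translation rules.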
Related Work | UBL, like an MT system (and unlike most of the other systems discussed in this section), extracts rules at multiple levels of granularity by means of this splitting and unification procedure. |
Related Work | multilevel rules composed from smaller rules, a process similar to the one used for creating phrase tables in a phrase-based MT system . |
Results | In the results shown in Table 1 we observe that on English GeoQuery data, the hierarchical translation model achieves scores competitive with the state of the art, and in every language one of the MT systems achieves accuracy at least as good as a purpose-built semantic parser. |
Results | While differences in implementation and factors like programming language choice make a direct comparison of times necessarily imprecise, we note that the MT system takes less than three minutes to train on the GeoQuery corpus, while the publicly-available implementations of tsVB and UBL require roughly twenty minutes and five hours respectively on a 2.1 GHz CPU. |
Features and Training | Before elaborating the details of how the actual graph is constructed, we would like to first introduce how the graph-based translation consensus can be used in an MT system.
Features and Training | When graph-based consensus is applied to an MT system, the graph will have nodes for training data, development (dev) data, and test data (details in Section 5).
Graph-based Translation Consensus | Our MT system with graph-based translation consensus adopts the conventional log-linear model. |
Introduction | The principle of consensus can be sketched as “a translation candidate is deemed more plausible if it is supported by other translation candidates.” The actual formulation of the principle depends on whether the translation candidate is a complete sentence or just a span of it, whether the candidate is the same as or similar to the supporting candidates, and whether the supporting candidates come from the same or different MT systems.
Introduction | Others extend consensus among translations from the same MT system to those from different MT systems.
Introduction | For the source (Chinese) span “fig 73 H T 57 3x91”, the MT system produced the correct translation for the second sentence, but it failed to do so for the first one.
Conclusions and future plans | This was the first attempt at making a proper quantitative and qualitative evaluation of the English→Russian MT systems.
Conclusions and future plans | We have made the corpus comprising the source sentences, their human translations, translations by participating MT systems and the human evaluation data publicly available.
Corpus preparation | We chose to retain the entire texts in the corpus rather than individual sentences, since some MT systems may use information beyond isolated sentences. |
Evaluation methodology | In our case the assessors were asked to make a pairwise comparison of two sentences translated by two different MT systems against a gold standard translation. |
Introduction | One of the main challenges in developing MT systems for Russian and for evaluating them is the need to deal with its free word order and complex morphology. |
Experiments | A Chinese-English and an English-Chinese MT system are trained on (C0, E0). |
Forward-Translation vs. Back-Translation | The only requirement is that the MT system needs to be bidirectional. |
Forward-Translation vs. Back-Translation | The procedure includes translating a text into certain foreign language with the MT system (Forward-Translation), and translating it back into the original language with the same system (Back-Translation). |
Forward-Translation vs. Back-Translation | Two possible reasons may explain this phenomenon: (1) in the first round of translation T0 → S1, some target word orders are preserved due to the reordering failure, and these preserved orders lead to a better result in the second round of translation; (2) the text generated by an MT system is more likely to be matched by the reversed but homologous MT system.
Related Work | (2010) captures the structures implicitly by training an MT system on (S0, S1) and “translates” the SMT input to an MT-favored expression.
Abstract | This paper presents PORT, a new MT evaluation metric which combines precision, recall and an ordering metric and which is primarily designed for tuning MT systems.
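The round-trip procedure (Forward-Translation then Back-Translation with the reversed system) can be illustrated with toy word-for-word systems; the lexicons below are hypothetical stand-ins for real MT engines.

```python
# Hypothetical toy lexicons standing in for the forward and reversed systems.
EN_TO_FR = {"the": "le", "cat": "chat", "sleeps": "dort"}
FR_TO_EN = {"le": "the", "chat": "cat", "dort": "sleeps"}

def translate(sentence, lexicon):
    # Unknown words are copied through, as many systems do with OOVs.
    return " ".join(lexicon.get(w, w) for w in sentence.split())

def round_trip(sentence):
    """Forward-Translation followed by Back-Translation with the reversed system."""
    forward = translate(sentence, EN_TO_FR)
    back = translate(forward, FR_TO_EN)
    return forward, back
```

Comparing `back` against the original sentence gives a rough signal of how much the forward pass preserved.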
Abstract | We compare PORT-tuned MT systems to BLEU-tuned baselines in five experimental conditions involving four language pairs. |
Conclusions | Most important, our results show that PORT-tuned MT systems yield better translations than BLEU-tuned systems on several language pairs, according both to automatic metrics and human evaluations. |
Introduction | First, there is no evidence that any other tuning metric yields better MT systems . |
Introduction | In this work, our goal is to devise a metric that, like BLEU, is computationally cheap and language-independent, but that yields better MT systems than BLEU when used for tuning. |
Machine Translation as a Decipherment Task | MT Systems: We build and compare different MT systems under two training scenarios: |
Machine Translation as a Decipherment Task | Evaluation: All the MT systems are run on the Spanish test data, and the quality of the resulting English translations is evaluated using two different measures: (1) normalized edit distance score (Navarro, 2001), and (2) BLEU (Papineni et
Machine Translation as a Decipherment Task | Results: Figure 3 compares the results of various MT systems (using parallel versus decipherment training) on the two test corpora in terms of edit distance scores (a lower score indicates closer match to the gold translation). |
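The normalized edit distance score used in this evaluation can be sketched at the word level; normalizing by the longer of the two lengths is one common variant and an assumption here (Navarro (2001) surveys several).

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two sentences."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (wa != wb)))   # substitution
        prev = cur
    return prev[-1]

def normalized_edit_distance(hyp, ref):
    """Edit distance scaled to [0, 1] by the longer sentence; lower is better."""
    denom = max(len(hyp.split()), len(ref.split()), 1)
    return edit_distance(hyp, ref) / denom
```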
Models 2.1 Baseline Models | Table 1 shows how morphemes are being used in the MT system.
Models 2.1 Baseline Models | The first phase of our morphology prediction model is to train an MT system that produces morphologically simplified word forms in the target language.
Models 2.1 Baseline Models | MT System Alignment: |
Background 2.1 Terminology | It can be used to encode exponentially many hypotheses generated by a phrase-based MT system (e.g., Koehn et al. (2003)) or a syntax-based MT system (e.g., Chiang (2007)).
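That a lattice or hypergraph of linear size can encode exponentially many hypotheses is easy to see by counting paths in a small DAG; the adjacency-list encoding below is an illustrative assumption.

```python
from functools import lru_cache

def make_path_counter(edges, end):
    """edges: {node: [successor, ...]} for a DAG.
    Returns paths(node) = number of distinct paths from node to end."""
    @lru_cache(maxsize=None)
    def paths(node):
        if node == end:
            return 1
        return sum(paths(nxt) for nxt in edges.get(node, []))
    return paths
```

A chain of k binary "diamonds" has O(k) nodes and edges but encodes 2^k distinct paths, i.e. 2^k hypotheses.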
Background 2.1 Terminology | To approximate the intractable decoding problem of (2), most MT systems (Koehn et al., 2003; Chiang, 2007) use a simple Viterbi approximation, |
Introduction | They recover additional latent variables—so-called nuisance variables—that are not of interest to the user. For example, though machine translation (MT) seeks to output a string, typical MT systems (Koehn et al., 2003; Chiang, 2007)
Variational Approximate Decoding | For each input sentence x, we assume that a baseline MT system generates a hypergraph HG(x) that compactly encodes the derivation set D(x) along with a score for each d ∈ D(x), which we interpret as p(y, d | x) (or proportional to it).
Abstract | Recent work has shown success in using neural network language models (NNLMs) as features in MT systems . |
Model Variations | 5 MT System
Model Variations | In this section, we describe the MT system used in our experiments. |
Model Variations | Because of this, the baseline BLEU scores are much higher than those of a typical MT system, especially a real-time, production engine which must support many language pairs.
Experiments | MT System: We develop a machine translation baseline as follows.
Experiments | As expected, the MT system slightly outperforms our models on most language pairs. |
A Class-based Model of Agreement | However, we conduct CRF inference in tandem with the translation decoding procedure (§3), creating an environment in which subsequent words of the observation are not available; the MT system has yet to generate the rest of the translation when the tagging features for a position are scored. |
Introduction | The MT system selects the correct verb stem, but with masculine inflection. |
Introduction | Agreement relations that cross statistical phrase boundaries are not explicitly modeled in most phrase-based MT systems (Avramidis and Koehn, 2008). |
Related Work | To our knowledge, Uszkoreit and Brants (2008) are the only recent authors to show an improvement in a state-of-the-art MT system using class-based LMs. |
Conclusion & Future Work | In this approach a number of MT systems are combined at decoding time in order to form an ensemble model. |
Conclusion & Future Work | We will also add capability of supporting syntax-based ensemble decoding and experiment how a phrase-based system can benefit from syntax information present in a syntax-aware MT system.
Experiments & Results 4.1 Experimental Setup | The first group are the baseline results on the phrase-based system discussed in Section 2 and the second group are those of our hierarchical MT system . |
Experiments & Results 4.1 Experimental Setup | Since the Hiero baseline results were substantially better than those of the phrase-based model, we also implemented the best-performing baseline, linear mixture, in our Hiero-style MT system, and in fact it achieves the highest BLEU score among all the baselines, as shown in Table 2.
Introduction | Discriminative optimization methods such as MERT (Och, 2003), MIRA (Crammer et al., 2006), PRO (Hopkins and May, 2011), and Downhill-Simplex (Nelder and Mead, 1965) have been influential in improving MT systems in recent years.
Introduction | We want to build an MT system that does well with respect to many aspects of translation quality.
Opportunities and Limitations | We introduce a new approach (PMO) for training MT systems on multiple metrics. |
Theory of Pareto Optimality 2.1 Definitions and Concepts | Here, the MT system’s Decode function, parameterized by weight vector w, takes in a foreign sentence f and returns a translated hypothesis h. The argmax operates in vector space, and our goal is to find w leading to hypotheses on the Pareto Frontier.
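For a finite set of hypotheses scored by several metrics, the Pareto Frontier referred to above can be computed directly; the metric vectors are assumed to be higher-is-better.

```python
def dominated(u, v):
    """True if v dominates u: v >= u in every metric and > in at least one."""
    return (all(vi >= ui for ui, vi in zip(u, v))
            and any(vi > ui for ui, vi in zip(u, v)))

def pareto_frontier(points):
    """Return the metric vectors not dominated by any other vector."""
    return [p for p in points
            if not any(dominated(p, q) for q in points if q != p)]
```

A point like (0.4, 0.4) is off the frontier whenever some other hypothesis beats it on both metrics at once.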
Abstract | As MT systems improve, the shortcomings of the n-gram based evaluation metrics are becoming more apparent. |
Abstract | State-of-the-art MT systems are often able to output fluent translations that are nearly grammatical and contain roughly the correct words, but still fail to express meaning that is close to the input.
Abstract | , 2006) is more adequacy-oriented, it is only employed in very large scale MT system evaluation instead of day-to-day research activities. |
Conclusion and Outlook | 2) and may find use in uncovering systematic shortcomings of MT systems.
Conclusion and Outlook | To some extent, of course, this problem holds as well for state-of-the-art MT systems.
Expt. 1: Predicting Absolute Scores | Each language consists of 1500-2800 sentence pairs produced by 7-15 MT systems.
Introduction | Figure 1: Entailment status between an MT system hypothesis and a reference translation for equivalent (top) and nonequivalent (bottom) translations.
Abstract | We propose and extensively evaluate a simple method for using alignment models to produce alignments better-suited for phrase-based MT systems, and show significant gains (as measured by BLEU score) in end-to-end translation systems for six language pairs used in recent MT competitions.
Adding agreement constraints | Most MT systems train an alignment model in each direction and then heuristically combine their predictions. |
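The heuristic combination mentioned here can be sketched as a simplified intersection-plus-grow procedure (a stand-in for the grow-diag family, not the exact Moses heuristic): start from the high-precision intersection of the two directional alignments, then add union links that cover an otherwise unaligned word. The 0-indexed link sets are illustrative assumptions.

```python
def symmetrize(src2tgt, tgt2src):
    """src2tgt: set of (source, target) links; tgt2src: set of (target, source)
    links.  Returns a symmetrized set of (source, target) links."""
    flipped = {(s, t) for t, s in tgt2src}
    inter = src2tgt & flipped      # high precision: both models agree
    union = src2tgt | flipped
    result = set(inter)
    # Grow: add a union link if its source or target word is still uncovered.
    for s, t in sorted(union - inter):
        if (all(ss != s for ss, _ in result)
                or all(tt != t for _, tt in result)):
            result.add((s, t))
    return result
```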
Conclusions | The nature of the complicated relationship between word alignments, the corresponding extracted phrases and the effects on the final MT system still begs for better explanations and metrics. |
Introduction | used, we can get not only improvements in alignment performance, but also in the performance of the MT system that uses those alignments. |
Evaluation | We see, first of all, that the MT system benefits from our approach in most cases. |
Evaluation | Apart from showing the effects of the skeleton-based model, we also studied the behavior of the MT system under the different settings of search space. |
Related Work | More importantly, we develop a complete approach to this issue and show its effectiveness in a state-of-the-art MT system.
Discussion and conclusion | An application of our idea outside the area of translation assistance is post-correction of the output of some MT systems that, as a last-resort heuristic, copy source words or phrases into their output, producing precisely the kind of input our system is trained on. |
Discussion and conclusion | Our classification-based approach may be able to resolve some of these cases operating as an add-on to a regular MT system, or as an independent post-correction system.
Evaluation | These scores should generally be much better than the typical MT system performances as only local changes are made to otherwise “perfect” L2 sentences. |
Adaptive Online Algorithms | SGD is sensitive to the learning rate η, which is difficult to set in an MT system that mixes frequent “dense” features (like the language model) with sparse features (e.g., for translation rules).
Experiments | We built Arabic-English and Chinese-English MT systems with Phrasal (Cer et al., 2010), a phrase-based system based on alignment templates (Och and Ney, 2004). |
Introduction | We introduce a new method for training feature-rich MT systems that is effective yet comparatively easy to implement. |
Experiments | The test set was translated by seven MT systems, and each translation has been manually judged for adequacy and fluency.
Experiments | In addition, the translation outputs of the MT systems are also manually ranked according to their translation quality. |
Experiments | The NIST 2008 English-Chinese MT task consists of 127 documents with 1,830 segments, each with four reference translations and eleven automatic MT system translations. |
Conclusion | We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today. |
Experiments | We tune the parameters of the MT system on a held-out development corpus of 1,172 parallel sentences, and test on a held-out parallel corpus of 746 parallel sentences.
Related Work | (2009) confirm and extend these results, showing BLEU improvement for a hierarchical phrase-based MT system on a small Chinese corpus. |
Discussion | This dependency LM can also be used in hierarchical MT systems using lexicalized CFG trees.
Experiments | filtered: a string-to-string MT system as in baseline.
Introduction | In section 4, we describe the implementation details of our MT system.