Abstract | In this paper, we propose to consider the translation quality of each sentence in the English-to-Chinese cross-language summarization process. |
Abstract | First, the translation quality of each English sentence in the document set is predicted with the SVM regression method, and then the quality score of each sentence is incorporated into the summarization process. |
Abstract | Finally, the English sentences with high translation quality and high informativeness are selected and translated to form the Chinese summary. |
Introduction | However, although machine translation techniques have advanced considerably, machine translation quality is still far from satisfactory, and in many cases the translated texts are hard to understand. |
Introduction | In order to address the above problem, we propose to consider the translation quality of the English sentences in the summarization process. |
Introduction | In particular, the translation quality of each English sentence is predicted with the SVM regression method, and the predicted MT quality score of each sentence is then incorporated into the sentence evaluation process. Finally, sentences that are both informative and easy to translate are selected and translated to form the Chinese summary. |
Related Work 2.1 Machine Translation Quality Prediction | In this study, we further predict the translation quality of an English sentence before the machine translation process, i.e., we do not leverage reference translation and the target sentence. |
The Proposed Approach | Each English sentence is associated with a score indicating its translation quality. |
The Proposed Approach | An English sentence with a high translation quality score is more likely to be selected into the original English summary, and such an English summary can be translated into a better Chinese summary. |
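A minimal sketch of how a predicted translation-quality score might be folded into sentence selection. The linear interpolation, the weight `lam`, and every name below are illustrative assumptions; the excerpts only state that the quality score is incorporated into sentence evaluation, not how.

```python
from typing import List, Tuple


def rank_sentences(
    sentences: List[Tuple[str, float, float]], lam: float = 0.5
) -> List[Tuple[str, float]]:
    """Rank (text, informativeness, predicted_mt_quality) triples.

    Both scores are assumed to be normalised to [0, 1]; `lam` is a
    hypothetical trade-off weight, not a value taken from the paper.
    """
    scored = [
        (text, (1.0 - lam) * info + lam * quality)
        for text, info, quality in sentences
    ]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With `lam = 0` the ranking falls back to informativeness alone, which makes the effect of the quality score easy to isolate.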
Abstract | Experimental results on spoken language translation show that this hybrid method significantly improves the translation quality, which outperforms the method using a source-target corpus of the same size. |
Experiments | Translation quality was evaluated using both the BLEU score proposed by Papineni et al. |
Experiments | In our experiments, only the 1-best Chinese or Spanish translation was used, since using n-best results did not greatly improve the translation quality. |
Experiments | From the translation results, it can be seen that the three methods achieved comparable translation quality on both ASR and CRR inputs, and that the translation results on CRR inputs are much better than those on ASR inputs because of the errors in the ASR inputs. |
Introduction | As a result, we can build a synthetic multilingual corpus, which can be used to improve the translation quality. |
Introduction | The idea of using RBMT systems to improve the translation quality of SMT systems has been explored in Hu et al. |
Introduction | Although previous studies proposed several pivot translation methods, there are no studies to combine different pivot methods for translation quality improvement. |
Using RBMT Systems for Pivot Translation | The translated test set can be added to the training data to further improve translation quality. |
Conclusion | The sense-based translation model is able to substantially improve translation quality in terms of both BLEU and NIST. |
Conclusion | To the best of our knowledge, this is the first attempt to empirically verify the positive impact of word senses on translation quality. |
Experiments | Do word senses automatically induced by the HDP-based WSI improve translation quality? |
Experiments | We evaluated translation quality with the case-insensitive BLEU-4 (Papineni et al., 2002) and NIST (Doddington, 2002). |
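As a rough illustration of what a case-insensitive BLEU-4 score computes, the sketch below implements corpus-level BLEU with clipped n-gram counts and a brevity penalty. It assumes a single reference per segment, whitespace tokenization, and no smoothing; it is not the official scoring script used in the cited experiments, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(hypotheses: List[str], references: List[str]) -> float:
    """Corpus-level, case-insensitive BLEU-4 with one reference per segment."""
    matched = [0] * 4
    total = [0] * 4
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.lower().split(), ref.lower().split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, 5):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram is credited at most as often as it
            # occurs in the reference.
            matched[n - 1] += sum((h_counts & r_counts).values())
            total[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matched) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matched, total)) / 4
    brevity_penalty = min(1.0, math.exp(1.0 - ref_len / hyp_len))
    return brevity_penalty * math.exp(log_precision)
```

Because no smoothing is applied, any corpus without a single matching 4-gram scores zero, which is why BLEU is normally reported at the corpus level rather than per sentence.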
Experiments | Our second group of experiments was carried out to investigate whether the sense-based translation model is able to improve translation quality by comparing the system enhanced with our sense-based translation model against the baseline. |
Introduction | They report that WSD degrades the translation quality of SMT. |
Introduction | With these word senses, we study in particular: 1) whether word senses can be directly integrated into SMT to improve translation quality and 2) whether the WSI-based model can outperform the reformulated WSD in the context of SMT. |
Introduction | Results show that automatically learned word senses are able to improve translation quality and the sense-based translation model is better than the previous reformulated WSD. |
Related Work | Our experiments show that such word senses are able to improve translation quality. |
A Class-based Model of Agreement | Segmentation is typically applied as a bitext preprocessing step, and there is a rich literature on the effect of different segmentation schemata on translation quality (Koehn and Knight, 2003; Habash and Sadat, 2006; El Kholy and Habash, 2012). |
Conclusion and Outlook | Our class-based agreement model improves translation quality by promoting local agreement, but with a minimal increase in decoding time and no additional storage requirements for the phrase table. |
Discussion of Translation Results | 2 shows translation quality results on newswire, while Tbl. |
Experiments | We first evaluate the Arabic segmenter and tagger components independently, then provide English-Arabic translation quality results. |
Experiments | 5.2 Translation Quality |
Experiments | We evaluated translation quality with BLEU-4 (Papineni et al., 2002) and computed statistical significance with the approximate randomization method of Riezler and Maxwell (2005). |
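The idea behind an approximate randomization test can be sketched as follows. This simplified version compares the mean of per-segment scores for two systems; a corpus-level metric such as BLEU would instead be recomputed from the shuffled outputs on every trial. The function name, the default number of trials, and the fixed seed are assumptions made for illustration.

```python
import random
from typing import Sequence


def approximate_randomization(
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    trials: int = 10000,
    seed: int = 0,
) -> float:
    """Two-sided p-value for the difference in mean segment score."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same segments")
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        sum_a = sum_b = 0.0
        for a, b in zip(scores_a, scores_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so each paired output is swapped with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            sum_a += a
            sum_b += b
        if abs(sum_a - sum_b) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate away from an exact zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A small returned value suggests that the observed difference is unlikely to arise from random relabelling of the two systems' outputs.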
Introduction | However, using lexical coverage experiments, we show that there is ample room for translation quality improvements through better selection of forms that already exist in the translation model. |
Abstract | A manual evaluation of an English-to-German translation task shows that the subcategorization information has a positive impact on translation quality through better prediction of case. |
Conclusion | We showed in a manual evaluation that the proposed features have a positive impact on translation quality. |
Experiments and evaluation | We also present a manual evaluation of our best system which shows that the new features improve translation quality. |
Experiments and evaluation | While the inflection prediction systems (1-4) are significantly better than the surface-form system (0), the different versions of the inflection systems are not distinguishable in terms of BLEU; however, our manual evaluation shows that the new features have a positive impact on translation quality. |
Introduction | Integrating semantic role information into SMT has been demonstrated by various researchers to improve translation quality (cf. |
Translation pipeline | This outcome, in particular that adding the lemma of the preposition to the PP node helps to improve translation quality, has been observed before in tree restructuring work for improving translation (Huang and Knight, 2006). |
Translation pipeline | Preliminary experiments showed that larger windows do not improve translation quality. |
Abstract | We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model. |
Discussion and Conclusion | However, adding such data in the sub-sampling process extracts more bilingual data for building the MT models, which slightly increases the model building time but improves the translation quality. |
Experiments | As seen in Table 4, we do not notice translation quality degradation. |
Introduction | Machine translation (MT) systems suffer from inconsistent and unstable translation quality. |
Introduction | Roukos et al. (2012) demonstrate that document-specific MT models significantly improve the translation quality. |
Static MT Quality Estimation | The average translation probability of the phrase translation pairs in the final translation, which provides the overall translation quality on the phrase level. |
Static MT Quality Estimation | The external features capture the syntactic structure of the source sentence, as well as the coverage of the training data with regard to the input sentence, which are good indicators of the translation quality. |
Abstract | We explain how to implement this extension efficiently for large-scale data (also released as a modification to GIZA++) and demonstrate, in experiments on Czech, Arabic, Chinese, and Urdu to English translation, significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU). |
Conclusion | The method is implemented as a modification to the open-source toolkit GIZA++, and we have shown that it significantly improves translation quality across four different language pairs. |
Experiments | As we will see below, we still obtained strong improvements in translation quality when hand-aligned data was unavailable. |
Experiments | We then tested the effect of word alignments on translation quality using the hierarchical phrase-based translation system Hiero (Chiang, 2007). |
Experiments | We ran some contrastive experiments to investigate the impact of hyperparameter tuning on translation quality. |
Introduction | In this paper, we propose a simple extension to the IBM/HMM models that is unsupervised like the IBM models, is as scalable as GIZA++ because it is implemented on top of GIZA++, and provides significant improvements in both alignment and translation quality. |
Introduction | Experiments on Czech-, Arabic-, Chinese- and Urdu-English translation (Section 3) demonstrate consistent significant improvements over IBM Model 4 in both word alignment (up to +6.7 F1) and translation quality (up to +1.4 BLEU). |
Abstract | Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. |
Abstract | We compare our approach with Moses and observe the same performance, but a substantially better tradeoff between translation quality and speed. |
Conclusions | We compare our decoder to Moses, reaching a similar highest BLEU score, but clearly outperforming it in terms of scalability with respect to the tradeoff ratio between translation quality and speed. |
Experimental Evaluation | It yields nearly the same top performance with an even better tradeoff between translation quality and speed. |
Introduction | phrase translation candidates has a positive effect on both translation quality and speed. |
Search Algorithm Extensions | A better pre-selection can be expected to improve translation quality. |
Abstract | Additionally, we remove low confidence alignment links from the word alignment of a bilingual training corpus, which increases the alignment F-score, improves Chinese-English and Arabic-English translation quality and significantly reduces the phrase translation table size. |
Introduction | In section 4 we show how to improve MaxEnt word alignment quality by removing low confidence alignment links, which also leads to improved translation quality as shown in section 5. |
Related Work | This is similar to the “loose phrases” described in (Ayan and Dorr, 2006a), which increased the number of correct phrase translations and improved the translation quality. |
Translation | We measure the translation quality with automatic metrics including BLEU (Papineni et al., 2001) and TER (Snover et al., 2006). |
Translation | The higher the BLEU score is, or the lower the TER score is, the better the translation quality is. |
Translation | For newswire, the translation quality is improved by 0.44 on the whole test set and 1.1 on the tail documents, as measured by (TER-BLEU)/2. |
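The combined measure (TER-BLEU)/2 mentioned above is simple arithmetic, sketched here for clarity. Since higher BLEU and lower TER both indicate better output, a lower combined value is better. The function name and the percentage-point scale of the inputs are assumptions.

```python
def ter_minus_bleu_half(ter: float, bleu: float) -> float:
    """Combined score (TER - BLEU) / 2; lower values indicate better output."""
    return (ter - bleu) / 2.0
```

For instance, moving from TER 50.0 and BLEU 30.0 to TER 49.5 and BLEU 30.5 lowers the combined score from 10.0 to 9.5, an improvement of 0.5. These figures are made up for illustration and are not results from the cited paper.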
Abstract | Recent discriminative algorithms that accommodate sparse features have produced smaller than expected translation quality gains in large systems. |
Abstract | Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features. |
Experiments | Tables 2 and 3 show that adding tuning examples improves translation quality . |
Experiments | (2012) showed significant translation quality gains by tuning on the bitext. |
Experiments | When tuned on bitext5k the translation quality gains are significant for bitext5k-test relative to tuning on MT05/6/8, which has multiple references. |
Introduction | We conduct large-scale translation quality experiments on Arabic-English and Chinese-English. |
Abstract | Our models improve translation quality over the single generic label approach of Chiang (2005) and perform on par with the syntactically motivated approach from Zollmann and Venugopal (2006) on the NIST large Chinese-to-English translation task. |
Conclusion and discussion | Evaluated on a Chinese-to-English translation task, our approach improves translation quality over a popular PSCFG baseline, the hierarchical model of Chiang (2005), and performs on par |
Experiments | We evaluate our approach by comparing translation quality, as evaluated by the IBM-BLEU (Papineni et al., 2002) metric on the NIST Chinese-to-English translation task using MT04 as development set to train the model parameters λ, and MT05, MT06 and MT08 as test sets. |
Experiments | In line with previous findings for syntax-augmented grammars (Zollmann and Vogel, 2010), the source-side-based grammar does not reach the translation quality of its target-based counterpart; however, the model still outperforms the hi- |
Experiments | (2008), the impact of these rules on translation quality is negligible. |
Introduction | Label-based approaches have resulted in improvements in translation quality over the single X label approach (Zollmann et al., 2008; Mi and Huang, 2008); however, all the works cited here rely on stochastic parsers that have been trained on manually created syntactic treebanks. |
A semantic span can include one or more eus. | In general, according to formula (3), the translation quality based on the log-linear model is related tightly with the features chosen. |
Experiments | According to the statistics in Table 1, we see that CSS is really widely distributed in the NIST and CWMT corpora, which implies that the translation quality may benefit substantially from the CSS information, if it is well considered in SMT. |
Experiments | To test the effectiveness of the proposed models, we have compared the translation quality of different integration strategies. |
Experiments | From the table, we cannot conclude that the EUC constraint will certainly promote translation quality , but the transfer model performs better with the constraint on most testing sets. |
Introduction | Our results show that this leads to better translation quality. |
Introduction | Our results show that the proposed phrase model training improves translation quality on the test set by 0.9 BLEU points over our baseline. |
Phrase Model Training | As DeNero et al. (2006) have reported improvements in translation quality by interpolation of phrase tables produced by the generative and the heuristic model, we adopt this method and also report results using log-linear interpolation of the estimated model with the original model. |
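Log-linear interpolation of two phrase tables can be sketched as a weighted sum of log-probabilities per phrase pair. The weight, the floor for phrase pairs missing from one table, and all names are assumptions for illustration; the excerpt does not specify these details, and the scores below are left unnormalised.

```python
import math
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]


def log_linear_interpolate(
    trained: Dict[PhrasePair, float],
    heuristic: Dict[PhrasePair, float],
    weight: float = 0.5,
    floor: float = 1e-9,
) -> Dict[PhrasePair, float]:
    """Return unnormalised scores p_trained^weight * p_heuristic^(1 - weight)."""
    combined = {}
    for pair in set(trained) | set(heuristic):
        # A pair absent from one table receives a small floor probability
        # (an assumption) so that the logarithm stays defined.
        p_t = max(trained.get(pair, 0.0), floor)
        p_h = max(heuristic.get(pair, 0.0), floor)
        combined[pair] = math.exp(
            weight * math.log(p_t) + (1.0 - weight) * math.log(p_h)
        )
    return combined
```

Setting `weight` to 1.0 recovers the trained table for every pair it contains, and 0.0 recovers the heuristic one.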
Related Work | The model shows improvements in translation quality over the single-word-based IBM Model 4 (Brown et al., 1993) on a subset of the Canadian Hansards corpus. |
Related Work | They observe that due to several constraints and pruning steps, the trained phrase table is much smaller than the heuristically extracted one, while preserving translation quality . |
AL-SMT: Multilingual Setting | The translation quality is measured by TQ for individual systems $M_{F^d \rightarrow E}$; it can be BLEU score or WER/PER (word error rate and position-independent WER), which induces a maximization or minimization problem, respectively. |
AL-SMT: Multilingual Setting | This process is continued iteratively until a certain level of translation quality is met (we use the BLEU score, WER and PER) (Papineni et al., 2002). |
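For readers unfamiliar with the two error rates named above, the sketch below computes WER as word-level edit distance over reference length and PER as the same idea with word order ignored. It assumes a single, non-empty reference and pre-tokenized input; the function names are illustrative.

```python
from collections import Counter
from typing import List


def wer(hyp: List[str], ref: List[str]) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word missing from hypothesis
                curr[j - 1] + 1,         # extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)


def per(hyp: List[str], ref: List[str]) -> float:
    """Position-independent error rate: like WER, but ignoring word order."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)
```

Both are error rates, so tuning towards them is a minimization problem, whereas tuning towards BLEU is a maximization problem.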
Introduction | We introduce a novel combined measure of translation quality for multiple target language outputs (the same content from multiple source languages). |
Introduction | However, if we start with only a small amount of initial parallel data for the new target language, then translation quality is very poor and requires a very large injection of human labeled data to be effective. |
Introduction | ting allows new features for active learning which we exploit to improve translation quality while reducing annotation effort. |
Abstract | These rules are employed to enrich the SMT inputs for translation quality improvement. |
Conclusion | The manual investigation of oral translation results indicates that the paraphrase rules capture four kinds of MT-favored transformations to ensure translation quality improvement. |
Discussion | investigate what kinds of transformation finally lead to the translation quality improvement. |
Forward-Translation vs. Back-Translation | Finally the translation quality of Back-Translation is evaluated by using the original source texts as references. |
Introduction | The translation quality of the SMT system is highly related to the coverage of translation models. |
Abstract | BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. |
Introduction | These methods are effective because they tune the system to maximize an automatic evaluation metric such as BLEU, which serves as a surrogate objective for translation quality. |
Introduction | Ideally, we want to tune towards an automatic metric that has perfect correlation with human judgments of translation quality . |
Introduction | Different evaluation metrics focus on different aspects of translation quality. |
Experiments | Pruning most of the phrase table without much impact on translation quality is very important for translation especially in environments where memory and time constraints are imposed. |
Experiments | We can see a common phenomenon in both of the algorithms: for the first few thresholds, the phrase table becomes smaller and smaller while the translation quality is not much decreased, but the performance jumps a lot at a certain threshold (16 for Significance pruning, 0.8 for BRAE-based one). |
Experiments | As shown in Table 2, no matter what n is, the BRAE model can significantly improve the translation quality in the overall test data. |
Introduction | The experiments show that up to 72% of the phrase table can be discarded without a significant decrease in translation quality, and that in decoding with phrasal semantic similarities an improvement of up to 1.7 BLEU points over the state-of-the-art baseline can be achieved. |
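Threshold-based phrase table pruning, as discussed in the excerpts above, can be sketched as a simple filter. The per-pair score is assumed to be some similarity or significance value where higher means more reliable; how that score is obtained is outside this sketch, and all names are hypothetical.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]


def prune_phrase_table(
    scores: Dict[PhrasePair, float], threshold: float
) -> Dict[PhrasePair, float]:
    """Keep only phrase pairs whose score reaches the threshold."""
    return {pair: s for pair, s in scores.items() if s >= threshold}


def discarded_fraction(
    original: Dict[PhrasePair, float], pruned: Dict[PhrasePair, float]
) -> float:
    """Fraction of the original (non-empty) table removed by pruning."""
    return 1.0 - len(pruned) / len(original)
```

Sweeping `threshold` and plotting `discarded_fraction` against a quality metric is one way to locate the point at which pruning starts to hurt.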
Abstract | Our experiments on two machine translation quality estimation datasets show uniform significant accuracy gains from multitask learning, and consistently outperform strong baselines. |
Conclusion | Our experiments showed how our approach outperformed competitive baselines on two machine translation quality regression problems, including the highly challenging problem of predicting post-editing time. |
Introduction | In addition to annotators’ own perceptions and expectations with respect to translation quality , a number of factors can affect their judgements on specific sentences. |
Introduction | We show in our experiments on two translation quality datasets that these multitask learning strategies are far superior to training individual per-task models or a single pooled model, and moreover that our multitask learning approach can achieve similar performance to these baselines using only a fraction of the training data. |
Abstract | Additionally, we also propose a new MT metric to appropriately evaluate the translation quality of informative words, by assigning different weights to different words according to their importance values in a document. |
Experiments | Furthermore, using external name translation table only did not improve translation quality in most test sets except for BOLT2. |
Experiments | Although the proposed model has significantly enhanced translation quality, some challenges remain. |
Name-aware MT Evaluation | In order to properly evaluate the translation quality of NAMT methods, we propose to modify the BLEU metric so that it can dynamically assign more weight to names during evaluation. |
Abstract | Experiments show that our approach helps to achieve significant improvements in translation quality. |
Experiment | The translation quality is evaluated by case-insensitive BLEU-4 with shortest length penalty. |
PAS-based Translation Framework | This harms the translation quality. |
Related Work | By incorporating the rich context information as features, they chose better rules for translation and yielded stable improvements in translation quality. |
Experiments and Results | The translation quality is evaluated by case-insensitive IBM BLEU-4 metric. |
Input Features for DNN Feature Learning | (2004) proposed a way of using term weight based models in a vector space as additional evidence for phrase pair translation quality. |
Introduction | Recently, many new features have been explored for SMT, such as syntactic features, sparse features, and reordering features, and significant performance gains have been obtained in terms of translation quality. |
Related Work | (2012) improved translation quality of n-gram translation model by using a bilingual neural LM, where translation probabilities are estimated using a continuous representation of translation units in lieu of standard discrete representations. |
Abstract | We investigate this technique in the context of English-to-Arabic and English-to-Finnish translation, showing significant improvements in translation quality over desegmentation of 1-best decoder outputs. |
Experimental Setup | We also experimented on $\log p(X \mid Y)$ as an additional feature, but observed no improvement in translation quality. |
Introduction | We demonstrate that significant improvements in translation quality can be achieved by training a linear model to re-rank this transformed translation space. |
Methods | This could improve translation quality , as it brings our training scenario closer to our test scenario (test BLEU is always measured on unsegmented references). |
End-to-End results | Note that name translation quality varies greatly between human translators, with error rates ranging from 8.2-15.0% (absolute). |
End-to-End results | To make sure our name transliterator does not degrade the overall translation quality , we evaluated our base SMT system with BLEU, as well as our transliteration-augmented SMT system. |
Introduction | Typically, translation quality is degraded rather than improved, for the following reasons: |
Introduction | A secondary goal is to make sure that our overall translation quality (as measured by BLEU) does not degrade as a result of the name-handling techniques we introduce. |
Experiments | We measured the overall translation quality with the help of 4-gram BLEU (Papineni et al., 2002), which was computed on tokenized and lower-cased data for both systems. |
Introduction | (2006) use syntactic annotations on the source language side and show significant improvements in translation quality . |
Introduction | (2009), and Chiang (2010), the integration of syntactic information on both sides tends to decrease translation quality because the systems become too restrictive. |
Introduction | The translation quality is automatically measured using BLEU scores, and we confirm the findings by providing linguistic evidence (see Section 5). |
Experiments | Is our topic similarity model able to improve translation quality in terms of BLEU? |
Experiments | This verifies that topic similarity model can improve the translation quality significantly. |
Introduction | To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality. |
Introduction | We further show that both the source-side and target-side topic distributions improve translation quality and their improvements are complementary to each other. |
Experiments | We evaluated the translation quality using the BLEU metric, as calculated by mteval-v11b.pl with its default setting except that we used case-insensitive matching of n-grams. |
Experiments | As a result, packed forests enable tree-to-tree models to capture more useful source-target mappings and therefore improve translation quality . |
Introduction | Studies reveal that the absence of such non-syntactic mappings will impair translation quality dramatically (Marcu et al., 2006; Liu et al., 2007; DeNeefe et al., 2007; Zhang et al., 2008). |
Related Work | They replace 1-best trees with packed forests both in training and decoding and show superior translation quality over the state-of-the-art hierarchical phrase-based system. |
Alternatives to Correlation-based Meta-evaluation | Figure 5: Maximum translation quality decreasing over similarly scored translation pairs. |
Conclusions | reliable for estimating the translation quality at the segment level, for predicting significant improvement between systems and for detecting poor and excellent translations. |
Introduction | The main goal of our work is to analyze to what extent deep linguistic features can contribute to the automatic evaluation of translation quality . |
Metrics and Test Beds | In all cases, translation quality is measured by comparing automatic translations against a set of human references. |
Abstract | Therefore, it is desirable to train all these parameters to maximize an objective that links directly to translation quality. |
Abstract | The training objective is an expected BLEU score, which is closely linked to translation quality. |
Abstract | The expected BLEU score is closely linked to translation quality and the regularization is essential when many parameters are trained at scale. |
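One common way to define an expected BLEU objective is as the expectation of sentence-level BLEU under a softmax distribution over the model scores of an n-best list. The sketch below follows that reading; the n-best formulation, the absence of a scaling factor, and the function name are assumptions rather than details confirmed by the excerpts, and the regularization term they mention is omitted.

```python
import math
from typing import Sequence


def expected_bleu(
    model_scores: Sequence[float], sentence_bleu: Sequence[float]
) -> float:
    """Expectation of sentence-level BLEU under a softmax over model scores."""
    top = max(model_scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in model_scores]
    z = sum(weights)
    return sum(w / z * b for w, b in zip(weights, sentence_bleu))
```

Because the expectation is a smooth function of the model scores, it can be optimised with gradient-based methods, unlike corpus BLEU of the single best hypothesis.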
Experiments | Table 4 shows translation quality for BLEU- and PORT-tuned systems, as assessed by automatic metrics. |
Introduction | Many of the metrics correlate better with human judgments of translation quality than BLEU, as shown in recent WMT Evaluation Task reports (Callison-Burch et |
Introduction | Results given below show that PORT correlates better with human judgments of translation quality than BLEU does, and sometimes outperforms METEOR in this respect, based on data from WMT (2008-2010). |
Abstract | We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. |
Experiments | We use BLEU (Papineni et al., 2002) and TER (Snover et al., 2006) to evaluate translation qualities. |
Introduction | We integrate the neighboring context of the subgraph in our transformation preference predictions, and this improves translation qualities further. |
Related Work | Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al., 2003) but found such restrictions failed to improve translation quality . |
Related Work | The translation quality significantly improved on a constrained task, and the perplexity improvements suggest that interpolating between n-gram and syntactic LMs may hold promise on larger data sets. |
Related Work | By increasing the beam size and distortion limit of the baseline system, future work may examine whether a baseline system with comparable runtimes can achieve comparable translation quality . |
Abstract | We apply our method to the domain adaptation task, and extensive experiments show that our proposed method can substantially improve the translation quality. |
Experiments | Our purpose is to induce phrase pairs to improve translation quality for domain adaptation. |
Experiments | some good methods to explore the potential of the given data to improve the translation quality. |
Experiments | The evaluation metric for the overall translation quality is case-insensitive BLEU4 (Papineni et al., 2002). |
Related Work | They reported extensive empirical analysis and improved word alignment accuracy as well as translation quality. |
Related Work | They estimated phrase-topic distributions in translation model adaptation and generated better translation quality. |
Conclusions | by interpolating them with less sparse ones, could in the future lead to an additional increase in translation quality. |
Experiments | These extra features assess translation quality past the synchronous grammar derivation and learning general reordering or word emission preferences for the language pair. |
Introduction | By advancing from structures which mimic linguistic syntax, to learning linguistically aware latent recursive structures targeting translation, we achieve significant improvements in translation quality for 4 different language pairs in comparison with a strong hierarchical translation baseline. |
Experiments | We evaluate the translation quality using the BLEU-4 metric (Papineni et al., 2002), which is calculated by the script mteval-v11b.pl with its default setting, which is case-insensitive matching of n-grams. |
Experiments | Those results suggest that restrictions on 625 rules won’t hurt the performance, but restrictions on 525 will hurt the translation quality badly. |
Experiments | This suggests that using dependency language model really improves the translation quality by less than 1 BLEU point. |
Abstract | The experimental results show that our method improves the performance of both word alignment and translation quality significantly. |
Experiments on Phrase-Based SMT | Here, we investigate three different collocation models for translation quality improvement. |
Experiments on Phrase-Based SMT | When the phrase collocation probabilities are incorporated into the SMT system, the translation quality is improved, achieving an absolute improvement of 0.85 BLEU score. |
Introduction | Our second, more fundamental, strategy replaces the use of loose surrogates of translation quality with a model that attempts to comprehensively assess meaning equivalence between references and MT hypotheses. |
Related Work | (2006) use the degree of overlap between the dependency trees of reference and hypothesis as a predictor of translation quality. |
Textual Entailment vs. MT Evaluation | We thus expect even noisy RTE features to be predictive for translation quality. |