Adaptive Quality Estimation for Machine Translation
Turchi, Marco and Anastasopoulos, Antonios and C. de Souza, José G. and Negri, Matteo

Article Structure


The automatic estimation of machine translation (MT) output quality is a hard task in which the selection of the appropriate algorithm and the most predictive features over reasonably sized training sets plays a crucial role.


After two decades of steady progress, research in statistical machine translation (SMT) has started to cross paths with the translation industry, with tangible mutual benefit.

Related work

QE is generally cast as a supervised machine learning task, where a model trained from a collection of (source, target, label) instances is used to predict labels for new, unseen test items (Specia et al., 2010).

Online QE for CAT environments

When operating with advanced CAT tools, translators are presented with suggestions (either matching fragments from a translation memory or automatic translations produced by an MT system) for each sentence of a source document.

Evaluation framework

To measure the adaptation capability of different QE models, we experiment with a range of conditions defined by variable degrees of similarity between training and test data.

Experiments with WMT12 data

The motivations for experiments with training and test data featuring homogeneous label distributions are twofold.

Experiments with CAT data

To experiment with adaptive QE in more realistic conditions, we used a CAT tool to collect two datasets of (source, target, post-edited target) English-Italian tuples. The source sentences in the datasets come from two documents from different domains, respectively legal (L) and information technology (IT).


In the CAT scenario, each translation job can be seen as a complex situation where the user (his personal style and background), the source document (the language and the domain) and the underlying technology (the translation memory and the MT engine that generate translation suggestions) contribute to make the task unique.


shared tasks

Appears in 6 sentences as: shared task (2) shared tasks (4)
In Adaptive Quality Estimation for Machine Translation
  1. In the last couple of years, research in the field received a strong boost from the shared tasks organized within the WMT workshop on SMT, which is also the framework of our first experiment in §5.
    Page 2, “Related work”
  2. For a comprehensive overview of the QE approaches proposed so far we refer the reader to the WMT12 and WMT13 QE shared task reports (Callison-Burch et al., 2012; Bojar et al., 2013).
    Page 2, “Related work”
  3. The tool, which implements a large number of features proposed by participants in the WMT QE shared tasks, has been modified to process one sentence at a time as requested for integration in a CAT environment;
    Page 3, “Online QE for CAT environments”
  4. One artificial setting (§5) obtained from the WMT12 QE shared task data, in which training/test instances are arranged to reflect homogeneous distributions of the HTER labels.
    Page 4, “Evaluation framework”
  5. To measure the adaptability of our model to a given test set we compute the Mean Absolute Error (MAE), a metric for regression problems also used in the WMT QE shared tasks.
    Page 5, “Evaluation framework”
  6. The results of previous WMT QE shared tasks have shown that these baseline features are particularly competitive in the regression task (with only few systems able to beat them at WMT12).
    Page 5, “Evaluation framework”
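
The MAE mentioned in sentence 5 is the average absolute difference between predicted and reference labels. A minimal sketch follows, assuming the labels are plain floats such as HTER scores; the function name and the example values are illustrative and do not come from the paper.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("need at least one instance")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)


# Illustrative values only (not taken from the paper's data).
predicted_hter = [0.2, 0.5, 0.9]
reference_hter = [0.1, 0.5, 0.5]
mae = mean_absolute_error(predicted_hter, reference_hter)
```

Lower values indicate predictions closer to the reference labels, which is why the excerpts report improvements as MAE reductions.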

See all papers in Proc. ACL 2014 that mention shared tasks.


feature set

Appears in 3 sentences as: feature set (3)
In Adaptive Quality Estimation for Machine Translation
  1. 4.2 Performance indicator and feature set
    Page 5, “Evaluation framework”
  2. As our focus is on the algorithmic aspect, in all experiments we use the same feature set, which consists of the seventeen features proposed in (Specia et al., 2009).
    Page 5, “Evaluation framework”
  3. This feature set, fully described in (Callison-Burch et al., 2012), takes into account the complexity of the source sentence (e.g. number of tokens, number of translations per source word) and the fluency of the target translation (e.g. language model probabilities).
    Page 5, “Evaluation framework”


statistically significant

Appears in 3 sentences as: statistically significant (3)
In Adaptive Quality Estimation for Machine Translation
  1. Results marked with the “*” symbol are NOT statistically significant compared to the corresponding batch model.
    Page 6, “Experiments with WMT12 data”
  2. The others are always statistically significant at p≤0.005, calculated with approximate randomization (Yeh, 2000).
    Page 6, “Experiments with WMT12 data”
  3. These results (MAE reductions are always statistically significant) suggest that, when dealing with datasets with very different label distributions, the evident limitations of batch methods are more easily overcome by learning from scratch from the feedback of a new post-editor.
    Page 7, “Experiments with CAT data”
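
The approximate randomization test cited in sentence 2 can be sketched as follows. This is a generic paired version applied to per-instance absolute errors, not the authors' script: the function name, the number of trials, the fixed seed, and the add-one smoothing of the p-value are all assumptions made for illustration.

```python
import random


def approximate_randomization(errors_a, errors_b, trials=10000, seed=0):
    """Estimate a two-sided p-value for the MAE difference of two systems.

    errors_a[i] and errors_b[i] are the absolute errors of the two systems
    on the same test instance i.
    """
    if len(errors_a) != len(errors_b) or not errors_a:
        raise ValueError("need two non-empty error lists of equal length")
    rng = random.Random(seed)
    n = len(errors_a)
    # Observed difference in MAE between the two systems.
    observed = abs(sum(a - b for a, b in zip(errors_a, errors_b))) / n
    at_least_as_extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(errors_a, errors_b):
            # Under the null hypothesis the system labels are exchangeable,
            # so swap them on each instance with probability 0.5.
            if rng.random() < 0.5:
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (trials + 1)
```

A difference would be reported as significant at the threshold quoted above when the returned value does not exceed 0.005.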


Support Vector

Appears in 3 sentences as: Support Vector (1) support vector (1) support vectors (1)
In Adaptive Quality Estimation for Machine Translation
  1. Aggressive Perceptron (Crammer et al., 2006), by comparing their performance with a batch learning strategy based on the Scikit-learn implementation of Support Vector Regression (SVR).
    Page 5, “Evaluation framework”
  2. If the point is identified as a support vector, the parameters of the model are updated.
    Page 5, “Evaluation framework”
  3. In contrast with OSVR, which keeps track of the most important points seen in the past (support vectors), the update of the weights is done without considering the previously processed i-1 instances.
    Page 5, “Evaluation framework”
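
Sentence 3 describes an online learner whose weight update depends only on the current instance. A minimal pure-Python sketch of a passive-aggressive regression update in the style of Crammer et al. (2006) is given below. It assumes the PA-I variant with an epsilon-insensitive loss; the class name `OnlinePARegressor`, the default hyperparameters, and the feature values are hypothetical, and this is not the implementation used in the paper.

```python
class OnlinePARegressor:
    """Linear regressor updated one instance at a time (PA-I variant assumed)."""

    def __init__(self, n_features, epsilon=0.1, C=1.0):
        self.w = [0.0] * n_features
        self.epsilon = epsilon  # width of the insensitive zone around the label
        self.C = C              # cap on the step size (aggressiveness)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Adjust the weights using only the current (x, y) pair."""
        prediction = self.predict(x)
        loss = max(0.0, abs(prediction - y) - self.epsilon)
        norm_sq = sum(xi * xi for xi in x)
        if loss == 0.0 or norm_sq == 0.0:
            return loss  # passive step: prediction is already close enough
        tau = min(self.C, loss / norm_sq)  # aggressive step, capped by C
        direction = 1.0 if y > prediction else -1.0
        self.w = [wi + direction * tau * xi for wi, xi in zip(self.w, x)]
        return loss


# Predict-then-update loop over a stream of (features, label) pairs.
stream = [([1.0, 0.0], 0.5), ([0.0, 1.0], 0.2), ([1.0, 1.0], 0.7)]
model = OnlinePARegressor(n_features=2, epsilon=0.0, C=10.0)
predictions = []
for features, label in stream:
    predictions.append(model.predict(features))  # estimate shown to the user
    model.update(features, label)                # label observed after post-editing
```

No past instances are stored: each update reads the current weights and the current pair only, which is the contrast with support-vector bookkeeping drawn in the excerpt.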
