Conclusion and Outlook | We have used an off-the-shelf RTE system to compute these features, and demonstrated that a regression model over these features can outperform an ensemble of traditional MT metrics in two experiments on different datasets. |
Expt. 1: Predicting Absolute Scores | We optimize the weights of our regression models on two languages and then predict the human scores on the third language.
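The leave-one-language-out setup described here can be sketched as a small script. This is a minimal illustration, not the paper's implementation: the feature matrices, language codes, and random data below are all hypothetical stand-ins for the real RTE feature scores and human judgments.

```python
import numpy as np

def fit_weights(X, y):
    # Least-squares fit of linear weights (plus a bias term) over feature scores.
    A = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Hypothetical per-language data: a feature matrix and human adequacy scores.
rng = np.random.default_rng(0)
data = {lang: (rng.normal(size=(20, 4)), rng.normal(size=20))
        for lang in ("ar", "cn", "ur")}

# Hold out one language; optimize the weights on the other two,
# then predict the human scores on the held-out language.
held_out = "ur"
X_train = np.vstack([data[l][0] for l in data if l != held_out])
y_train = np.concatenate([data[l][1] for l in data if l != held_out])
w = fit_weights(X_train, y_train)
preds = predict(w, data[held_out][0])
```

The cross-language split matters because it tests whether the learned feature weights transfer rather than overfit one language pair.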
Experimental Evaluation | They are small regression models as described in Section 2 over component scores of four widely used MT metrics. |
Experimental Evaluation | The regression models can simulate the behaviour of each component by setting the weights appropriately, but are strictly more powerful.
Experimental Evaluation | We therefore verified that the three nontrivial “baseline” regression models indeed confer a benefit over the default component combination scores: BLEU-1 (which outperformed BLEU-4 in the MetricsMATR 2008 evaluation), NIST-4, and TER (with all costs set to 1).
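The claim that a regression over metric components is strictly more powerful than the default combination can be demonstrated on toy data. In this sketch the component matrix and human scores are random placeholders; the key point is that the default equal-weight combination lies in the regression model's hypothesis space, so the fitted model can never do worse on the training data.

```python
import numpy as np

# Hypothetical component scores for 30 candidate translations: columns might be
# n-gram precisions, a brevity penalty, NIST information gain, TER costs, etc.
rng = np.random.default_rng(1)
components = rng.uniform(size=(30, 6))
human = rng.uniform(size=30)

# Default metric: a fixed equal-weight combination of its first four components.
default_score = components[:, :4].mean(axis=1)

# Regression baseline: learn one weight per component plus a bias. Because the
# default combination is itself one setting of these weights, the least-squares
# fit is guaranteed to match or beat it on the training data.
A = np.hstack([components, np.ones((30, 1))])
w, *_ = np.linalg.lstsq(A, human, rcond=None)
learned_score = A @ w
```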
Expt. 2: Predicting Pairwise Preferences | We also experimented with a logistic regression model that predicts binary preferences directly.
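A logistic regression that predicts binary preferences directly can be sketched in a few lines. This is an illustrative stand-in, assuming (as is common in pairwise ranking) that each training example is the feature difference between the two candidate translations of a pair; the data here is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    # Plain gradient descent on the logistic loss; X rows are
    # feature-difference vectors for each candidate pair.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(2)
true_w = np.array([1.0, -0.5, 2.0])         # hypothetical "true" preference weights
X = rng.normal(size=(200, 3))               # feature differences per pair
y = (X @ true_w > 0).astype(float)          # 1 if the first candidate is preferred

w = fit_logistic(X, y)
acc = ((sigmoid(X @ w) > 0.5) == y).mean()
```

Predicting preferences directly sidesteps the need to calibrate absolute scores across annotators, which is one motivation for this variant.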
Textual Entailment vs. MT Evaluation | This allows us to use an off-the-shelf RTE system to obtain features, and to combine them using a regression model as described in Section 2. |
Brain Imaging Experiments on Adjective-Noun Comprehension | In this analysis, we train a regression model to fit the activation profile for the 12 phrase stimuli.
Brain Imaging Experiments on Adjective-Noun Comprehension | The regression model examined to what extent the semantic feature vectors (explanatory variables) can account for the variation in neural activity (response variable) across the 12 stimuli.
Brain Imaging Experiments on Adjective-Noun Comprehension | All explanatory variables were entered into the regression model simultaneously.
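The analysis described above amounts to one multiple regression in which every semantic feature gets a coefficient at once. A minimal sketch, with hypothetical feature vectors and a synthetic activation profile standing in for the real imaging data:

```python
import numpy as np

rng = np.random.default_rng(3)
n_stimuli, n_features = 12, 5

# Hypothetical semantic feature vectors (explanatory variables) and the
# activation profile of one voxel across the 12 phrase stimuli (response).
features = rng.normal(size=(n_stimuli, n_features))
activation = rng.normal(size=n_stimuli)

# All explanatory variables enter simultaneously: a single multiple regression
# with one coefficient per semantic feature plus an intercept.
A = np.hstack([features, np.ones((n_stimuli, 1))])
beta, *_ = np.linalg.lstsq(A, activation, rcond=None)
fitted = A @ beta

# R^2: the fraction of variation in neural activity the features account for.
ss_res = ((activation - fitted) ** 2).sum()
ss_tot = ((activation - activation.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

With only 12 stimuli and several features, such a fit is prone to overfitting, which is why analyses of this kind typically report cross-validated rather than in-sample fit.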