Alternatives to Correlation-based Meta-evaluation | Precision of Significant Improvement Prediction
Alternatives to Correlation-based Meta-evaluation | Recall of Significant Improvement |
Alternatives to Correlation-based Meta-evaluation | We now investigate to what extent a significant system improvement according to the metric implies a significant improvement according to human assessors, and vice versa.
Abstract | Using publicly available Chinese and English query logs, we demonstrate for both languages that our ranking technique exploiting bilingual data leads to significant improvements over a state-of-the-art monolingual ranking algorithm. |
Conclusions and Future Work | Our pairwise ranking scheme based on bilingual document pairs can easily integrate all kinds of similarities into the existing framework and significantly improves both English and Chinese ranking performance. |
Experiments and Results | The significant improvements over the baseline (99% confidence) are bolded, with the p-values given in parentheses.
Experiments and Results | * indicates significant improvement over IR (no similarity). |
Experiments and Results | Relatively few significant improvements are achieved by heuristic H-1 (max score).
Introduction | For both languages, we achieve significant improvements over monolingual Ranking SVM (RSVM) baselines (Herbrich et al., 2000; Joachims, 2002), which exploit a variety of monolingual features.
Abstract | We also show that the monolingual translation probabilities obtained (i) are comparable to traditional semantic relatedness measures and (ii) significantly improve the results over the query likelihood and the vector-space model for answer finding. |
Conclusion and Future Work | Moreover, models based on translation probabilities yield significant improvement over baseline approaches for answer finding, especially when different types of training data are combined. |
Introduction | This extrinsic evaluation shows that our translation models significantly improve the results over the query likelihood and the vector-space model. |
Related Work | All in all, translation models have been shown to significantly improve the retrieval results over traditional baselines for document retrieval (Berger and Lafferty, 1999), question retrieval in Question & Answer archives (Jeon et al., 2005; Lee et al., 2008; Xue et al., 2008) and for sentence retrieval (Murdock and Croft, 2005). |
Abstract | Experimental results on spoken language translation show that this hybrid method significantly improves translation quality, outperforming the method that uses a source-target corpus of the same size.
Abstract | Experimental results indicate that our method achieves consistent and significant improvement over individual translation outputs. |
Conclusion | Experimental results indicate that this method can consistently and significantly improve translation quality over individual translation outputs. |
Introduction | This translation quality is higher than that of the translations produced by the system trained with a real Chinese-Spanish corpus; (3) our sentence-level translation selection method consistently and significantly improves the translation quality over individual translation outputs in all of our experiments.
Abstract | Experimental results on data sets for the NIST Chinese-to-English machine translation task show that the co-decoding method can bring significant improvements to all baseline decoders, and the outputs from co-decoding can be used to further improve the result of system combination.
Experiments | However, we did not observe any significant improvement for either combination scheme when the n-best size is larger than 20.
Introduction | We will present experimental results on the data sets of NIST Chinese-to-English machine translation task, and demonstrate that co-decoding can bring significant improvements to baseline systems. |
Recognition as a Generation Task | As can be seen in Table 6, using the latent variables significantly improved the performance (see DPLVM vs. CRF), and using the GI encoding improved the performance of both the DPLVM and the CRF. |
Results and Discussion | The results revealed that the latent variable model significantly improved the performance over the CRF model. |
Results and Discussion | Whereas the use of the latent variables still significantly improved the generation performance, using the GI encoding undermined the performance in this task.
Analysis | The constituent boundary matching feature (CBMF) is a very important feature, which by itself achieves a significant improvement over the baseline (up to 1.13 BLEU).
Experiments | Like (Marton and Resnik, 2008), we find that the XP+ feature obtains a significant improvement of 1.08 BLEU over the baseline. |
Introduction | Although experiments show that this constituent matching/violation counting feature achieves significant improvements on various language pairs, one issue is that matching the syntactic analysis cannot always guarantee a good translation, and violating the syntactic structure does not always induce a bad translation.
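The precision/recall meta-evaluation named at the top of this list treats a metric's significance calls as predictions of human-judged significance. The sketch below illustrates the computation; the function name, the boolean encoding of significance calls, and the example data are assumptions for illustration, not taken from any of the quoted papers.

```python
# Hedged sketch: precision and recall of significant-improvement prediction.
# Inputs are parallel lists of booleans, one entry per system comparison:
# metric_sig[i] = metric reports a significant improvement for pair i,
# human_sig[i]  = human assessment reports one. (Hypothetical encoding.)

def sig_prediction_scores(metric_sig, human_sig):
    """Return (precision, recall) of the metric's significance calls,
    using the human-judged calls as the gold standard."""
    tp = sum(m and h for m, h in zip(metric_sig, human_sig))          # both significant
    fp = sum(m and not h for m, h in zip(metric_sig, human_sig))      # metric only
    fn = sum(h and not m for m, h in zip(metric_sig, human_sig))      # human only
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up significance calls for four system pairs:
metric = [True, True, False, True]
human = [True, False, False, True]
print(sig_prediction_scores(metric, human))
```

High precision means the metric rarely claims a significant improvement that humans do not confirm; high recall means it rarely misses one that humans do find.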