Index of papers in Proc. ACL 2008 that mention "evaluation metrics"
Chan, Yee Seng and Ng, Hwee Tou. MAXSIM: A Maximum Similarity Metric for Machine Translation Evaluation.
Abstract
We propose an automatic machine translation (MT) evaluation metric that calculates a similarity score (based on precision and recall) of a pair of sentences.
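The precision/recall-based similarity described in the abstract can be illustrated with a small sketch. This is not the paper's actual MAXSIM computation (which matches richer linguistic items than surface unigrams); the unigram overlap and the helper name fmean_similarity below are illustrative assumptions.

```python
from collections import Counter

def fmean_similarity(candidate_tokens, reference_tokens):
    """Toy precision/recall similarity between two tokenized sentences.

    Illustrative only: counts overlapping unigrams and combines
    precision and recall into an F-measure, mirroring the general
    idea (not the exact formulation) described in the abstract.
    """
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    if not candidate_tokens or not reference_tokens or overlap == 0:
        return 0.0
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    return 2 * precision * recall / (precision + recall)

# Example: compare a system translation against a reference translation.
print(fmean_similarity("the cat sat on the mat".split(),
                       "the cat is on the mat".split()))
```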
Abstract
When evaluated on data from the ACL-07 MT workshop, our proposed metric achieves higher correlation with human judgements than all 11 automatic MT evaluation metrics that were evaluated during the workshop.
Introduction
Since human evaluation of MT output is time-consuming and expensive, having a robust and accurate automatic MT evaluation metric that correlates well with human judgement is invaluable.
Introduction
Among all the automatic MT evaluation metrics, BLEU (Papineni et al., 2002) is the most widely used.
Introduction
During the recent ACL-07 workshop on statistical MT (Callison-Burch et al., 2007), a total of 11 automatic MT evaluation metrics were evaluated for correlation with human judgement.
Metric Design Considerations
We first review some aspects of existing metrics and highlight issues that should be considered when designing an MT evaluation metric.
Metric Design Considerations
The ACL-07 MT workshop evaluated the translation quality of MT systems on various translation tasks, and also measured the correlation (with human judgement) of 11 automatic MT evaluation metrics.
Metric Design Considerations
In this paper, we present MAXSIM, a new automatic MT evaluation metric that computes a similarity score between corresponding items across a sentence pair, and uses a bipartite graph to obtain an optimal matching between item pairs.
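The bipartite-matching step mentioned here can be sketched with a standard assignment solver. This is a hedged illustration, not the paper's implementation: the similarity matrix values, the use of SciPy's linear_sum_assignment, and the function name optimal_item_matching are assumptions for demonstration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_item_matching(similarity_matrix):
    """Maximum-weight matching between candidate items (rows) and
    reference items (columns) of a bipartite graph.

    similarity_matrix[i][j] is the similarity between candidate item i
    and reference item j. Returns the matched index pairs and the total
    similarity of the optimal matching.
    """
    sim = np.asarray(similarity_matrix, dtype=float)
    rows, cols = linear_sum_assignment(sim, maximize=True)
    total = sim[rows, cols].sum()
    return list(zip(rows.tolist(), cols.tolist())), float(total)

# Example: three candidate items vs. three reference items.
pairs, score = optimal_item_matching([
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.3],
    [0.0, 0.4, 0.7],
])
print(pairs, score)  # -> [(0, 0), (1, 1), (2, 2)] 2.4
```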
The term "evaluation metrics" is mentioned in 11 sentences in this paper.