Abstract | As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. |
Abstract | However, more accurate, non-automatic, adequacy-oriented MT evaluation metrics such as HTER are highly labor-intensive, which bottlenecks the evaluation cycle. |
Abstract | We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semi-automated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. |
Abstract | A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. |
Introduction | However, a lack of standard datasets and automatic evaluation metrics has impeded progress in the field. |
Introduction | Second, we define a new evaluation metric, PINC (Paraphrase In N-gram Changes), that relies on simple BLEU-like n-gram comparisons to measure the degree of novelty of automatically generated paraphrases. |
Paraphrase Evaluation Metrics | A good paraphrase, according to our evaluation metric, has few n-gram overlaps with the source sentence but many n-gram overlaps with the reference sentences. |
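The novelty half of this idea can be sketched as a simple score: the fraction of candidate n-grams that do not appear in the source, averaged over n-gram orders. The following is a minimal illustrative sketch, assuming whitespace tokenization and averaging over n = 1..4; the function and variable names are our own, and the published PINC implementation may differ in detail.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def ngram_novelty(source, candidate, max_n=4):
    """Mean, over n = 1..max_n, of the fraction of candidate n-grams
    that do NOT occur in the source sentence (higher = more novel)."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:  # candidate shorter than n tokens
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0
```

Under this sketch, copying the source verbatim scores 0, a candidate sharing no n-grams with the source scores 1, and a full metric in this spirit would pair such a novelty score with a BLEU-style adequacy check against the reference sentences.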
Related Work | The more recently proposed metric PEM (Paraphrase Evaluation Metric) (Liu et al., 2010) produces a single score that captures the semantic adequacy, fluency, and lexical dissimilarity of candidate paraphrases, relying on bilingual data to learn semantic equivalences without using n-gram similarity between candidate and reference sentences. |