Abstract | Many machine translation (MT) evaluation metrics have been shown to correlate better with human judgment than BLEU. |
Abstract | This paper presents PORT, a new MT evaluation metric which combines precision, recall, and an ordering measure, and which is primarily designed for tuning MT systems.
BLEU and PORT | Several ordering measures have been integrated into MT evaluation metrics recently. |
Experiments | 3.1 PORT as an Evaluation Metric |
Experiments | We studied PORT as an evaluation metric on WMT data; the test sets include the WMT 2008, 2009, and 2010 all-to-English submissions, plus the 2009 and 2010 English-to-all submissions.
Experiments | This is because we designed PORT for tuning: we did not optimize its performance as an evaluation metric, but rather its performance for system tuning.
Introduction | Automatic evaluation metrics for machine translation (MT) quality are a key part of building statistical MT (SMT) systems. |
Introduction | MT Evaluation Metric for Tuning
Introduction | These methods perform repeated decoding runs with different system parameter values, which are tuned to optimize the value of the evaluation metric over a development set with reference translations. |
Introduction | These methods are effective because they tune the system to maximize an automatic evaluation metric, such as BLEU, which serves as a surrogate objective for translation quality.
Introduction | While many alternatives have been proposed, such a perfect evaluation metric remains elusive. |
Introduction | As a result, many MT evaluation campaigns now report multiple evaluation metrics (Callison-Burch et al., 2011; Paul, 2010).
Opportunities and Limitations | Leveraging the diverse perspectives of different evaluation metrics has the potential to improve overall quality. |
Related Work | It would be unfortunate if a good evaluation metric could not be used for tuning.
Conclusion | In this work, we devise a new MT evaluation metric in the family of TESLA (Translation Evaluation of Sentences with Linear-programming-based Analysis), called TESLA-CELAB (Character-level Evaluation for Languages with Ambiguous word Boundaries), to address the problem of fuzzy word boundaries in the Chinese language, although neither the phenomenon nor the method is unique to Chinese. |
Introduction | The Workshop on Statistical Machine Translation (WMT) hosts regular campaigns comparing different machine translation evaluation metrics (Callison-Burch et al., 2009; Callison-Burch et al., 2010; Callison-Burch et al., 2011). |
Introduction | The work compared various MT evaluation metrics (BLEU, NIST, METEOR, GTM, 1 − TER) with different segmentation schemes, and found that treating every single character as a token (character-level MT evaluation) gives the best correlation with human judgments.
The Algorithm | Notice that all n-grams are put in the same matching problem regardless of n, unlike in translation evaluation metrics designed for European languages. |
The Algorithm | This relationship is implicit in the matching problem for English translation evaluation metrics where words are well delimited. |
The Algorithm | Many prior translation evaluation metrics such as MAXSIM (Chan and Ng, 2008) and TESLA (Liu et al., 2010; Dahlmeier et al., 2011) use the F-0.8 measure as the final score: |
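The Algorithm | A minimal sketch of that score, assuming the weighted harmonic mean of precision and recall used by these metrics, which places more weight on recall than on precision:
The Algorithm | F_0.8 = (P × R) / (0.8 × P + 0.2 × R), where P and R denote the precision and recall of the n-gram matching.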
Abstract | We present a novel approach to the automatic acquisition of a VerbNet-like classification of French verbs which involves the use of (i) a neural clustering method which associates clusters with features, (ii) several supervised and unsupervised evaluation metrics, and (iii) various existing syntactic and semantic lexical resources.
Clustering Methods, Evaluation Metrics and Experimental Setup | 3.2 Evaluation metrics |
Clustering Methods, Evaluation Metrics and Experimental Setup | We use several evaluation metrics which bear on different properties of the clustering. |
Clustering Methods, Evaluation Metrics and Experimental Setup | As pointed out in (Lamirel et al., 2008; Attik et al., 2006), unsupervised evaluation metrics based on cluster labelling and feature maximisation can prove very useful for identifying the best clustering strategy. |
Features and Data | Moreover, for this data set, the unsupervised evaluation metrics (cf. |
Evaluation | Before describing the experiments and presenting the results, we first describe the evaluation metrics we use. |
Evaluation | 4.0.1 Evaluation Metrics |
Evaluation | We use two evaluation metrics to evaluate subgroup detection accuracy: Purity and Entropy.
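Evaluation | As a minimal sketch, assuming the standard clustering definitions (the paper's exact normalization may differ): let n_kj be the number of items of gold class j assigned to cluster k, n_k = Σ_j n_kj, and N the total number of items.
Evaluation | Then Purity = (1/N) Σ_k max_j n_kj and Entropy = Σ_k (n_k / N) · ( − Σ_j (n_kj / n_k) log (n_kj / n_k) ); higher purity and lower entropy indicate more accurate subgroup detection.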
Paraphrasing with a Dual SMT System | MERT integrates the automatic evaluation metric into the training process to achieve optimal end-to-end performance.
Paraphrasing with a Dual SMT System | In this objective (Equation 2), G is the automatic evaluation metric for paraphrasing.
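Paraphrasing with a Dual SMT System | A minimal sketch of such a tuning objective, assuming the standard MERT formulation (Och, 2003) rather than the exact form of Equation 2 in the source:
Paraphrasing with a Dual SMT System | λ̂ = argmax_λ Σ_s G(ê(f_s; λ), r_s), with ê(f_s; λ) = argmax_e λ · h(e, f_s), where f_s is a source sentence, r_s its reference paraphrase, and h(e, f_s) the feature vector of candidate e.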
Paraphrasing with a Dual SMT System | 2.2 Paraphrase Evaluation Metrics |