Index of papers in Proc. ACL 2013 that mention
  • evaluation metrics
Zhang, Congle and Baldwin, Tyler and Ho, Howard and Kimelfeld, Benny and Li, Yunyao
Conclusions
This evaluation metric allows for a deeper understanding of how certain normalization actions impact the output of the parser.
Evaluation
5.1 Evaluation Metrics
Evaluation
Therefore, we propose a new evaluation metric that directly equates normalization performance with the performance of a common downstream application—dependency parsing.
Introduction
Another potential problem with state-of-the-art normalization is the lack of appropriate evaluation metrics.
Introduction
For instance, it is unclear how performance measured by the typical normalization evaluation metrics of word error rate and BLEU score (Papineni et al., 2002) translates into performance on a parsing task, where a well placed punctuation mark may provide more substantial improvements than changing a nonstandard word form.
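For reference, word error rate is the token-level edit distance between the system output and the reference, divided by the reference length; a minimal sketch, not taken from any of the indexed papers:

    def word_error_rate(hyp, ref):
        """Token-level Levenshtein distance divided by reference length."""
        h, r = hyp.split(), ref.split()
        # prev[j]: edit distance between the first i-1 reference tokens
        # and the first j hypothesis tokens
        prev = list(range(len(h) + 1))
        for i in range(1, len(r) + 1):
            curr = [i] + [0] * len(h)
            for j in range(1, len(h) + 1):
                sub = prev[j - 1] + (r[i - 1] != h[j - 1])
                curr[j] = min(sub, prev[j] + 1, curr[j - 1] + 1)
            prev = curr
        return prev[len(h)] / len(r)

    # Two substitutions against a five-token reference -> WER = 0.4
    print(word_error_rate("c u at the party", "see you at the party"))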
Introduction
To address this problem, this work introduces an evaluation metric that ties normalization performance directly to the performance of a downstream dependency parser.
evaluation metrics is mentioned in 7 sentences in this paper.
Fournier, Chris
Abstract
This work proposes a new segmentation evaluation metric, named boundary similarity (B), an inter-coder agreement coefficient adaptation, and a confusion-matrix for segmentation that are all based upon an adaptation of the boundary edit distance in Fournier and Inkpen (2012).
Conclusions
In this work, a new segmentation evaluation metric, referred to as boundary similarity (B), is proposed as an unbiased metric, along with a boundary-edit-distance-based (BED-based) confusion matrix to compute predictably biased IR metrics such as precision and recall.
Conclusions
B also allows for an intuitive comparison of boundary pairs between segmentations, as opposed to the window counts of WD or the simplistic edit count normalization of S. When an unbiased segmentation evaluation metric is desired, this work recommends the usage of B and the use of an upper and lower bound to provide context.
Evaluation of Automatic Segmenters
An ideal segmentation evaluation metric should, in theory, place the three automatic segmenters between the upper and lower bounds in terms of performance if the metrics, and the segmenters, function properly.
Introduction
To select an automatic segmenter for a particular task, a variety of segmentation evaluation metrics have been proposed, including Pk (Beeferman and Berger, 1999, pp.
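For context, the window-based Pk metric that B is meant to improve upon can be sketched as follows; this is an illustrative implementation, not the author's code, taking segmentations as lists of segment sizes:

    def pk(reference, hypothesis, k=None):
        """Pk (Beeferman and Berger, 1999) for segmentations given as lists
        of segment sizes, e.g. [3, 5, 2] for a 10-unit text."""
        def labels(masses):
            # segment id of every unit in the text
            return [seg for seg, size in enumerate(masses) for _ in range(size)]
        ref, hyp = labels(reference), labels(hypothesis)
        assert len(ref) == len(hyp), "segmentations must cover the same text"
        if k is None:
            # conventional choice: half the mean reference segment length
            k = max(1, round(len(ref) / len(reference) / 2))
        disagreements = 0
        for i in range(len(ref) - k):
            same_in_ref = ref[i] == ref[i + k]
            same_in_hyp = hyp[i] == hyp[i + k]
            disagreements += same_in_ref != same_in_hyp
        return disagreements / (len(ref) - k)

    print(pk([5, 5], [5, 5]))   # identical segmentations -> 0.0
    print(pk([5, 5], [10]))     # hypothesis misses the boundary -> 0.25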
evaluation metrics is mentioned in 5 sentences in this paper.
Martschat, Sebastian
Conclusions and Future Work
• the evaluation metrics employed are to be questioned (certainly),
Evaluation
5.1 Data and Evaluation Metrics
Evaluation
We evaluate our system with the coreference resolution evaluation metrics that were used for the CoNLL shared tasks on coreference, which are MUC (Vilain et al., 1995), B3 (Bagga and Baldwin, 1998) and CEAFe (Luo, 2005).
Evaluation
We also report the unweighted average of the three scores, which was the official evaluation metric in the shared tasks.
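For concreteness, one of these three metrics, B3, scores each mention by the overlap between its gold and system clusters; a minimal sketch (not the official CoNLL scorer, and assuming both clusterings cover the same gold mention set):

    def b_cubed(gold_clusters, system_clusters):
        """B3 precision/recall/F1; clusterings are lists of mention sets,
        assumed to cover the same (gold) mention set."""
        gold_of = {m: c for c in gold_clusters for m in c}
        sys_of = {m: c for c in system_clusters for m in c}
        mentions = list(gold_of)
        # per-mention precision: share of the mention's system cluster that
        # is also in its gold cluster; recall is the symmetric quantity
        p = sum(len(gold_of[m] & sys_of[m]) / len(sys_of[m])
                for m in mentions) / len(mentions)
        r = sum(len(gold_of[m] & sys_of[m]) / len(gold_of[m])
                for m in mentions) / len(mentions)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    gold = [{"a", "b", "c"}, {"d", "e"}]
    system = [{"a", "b"}, {"c", "d", "e"}]
    print(b_cubed(gold, system))
    # The official CoNLL score is the unweighted mean of the MUC, B3
    # and CEAFe F1 values.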
evaluation metrics is mentioned in 4 sentences in this paper.
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Abstract
Experimental results show that our graph propagation method significantly improves performance over two strong baselines under intrinsic and extrinsic evaluation metrics.
Experiments & Results 4.1 Experimental Setup
Two intrinsic evaluation metrics that we use to evaluate the possible translations for OOVs are Mean Reciprocal Rank (MRR) (Voorhees, 1999) and Recall.
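As an illustration (not code from the paper), MRR averages the reciprocal rank of the first correct translation in each OOV's candidate list, and recall is the fraction of OOVs whose candidate list contains at least one correct translation:

    def mrr_and_recall(candidates, gold):
        """candidates: dict OOV -> ranked list of proposed translations.
        gold: dict OOV -> set of acceptable translations."""
        reciprocal_ranks, hits = [], 0
        for oov, ranked in candidates.items():
            ranks = [i for i, c in enumerate(ranked, start=1) if c in gold[oov]]
            reciprocal_ranks.append(1.0 / ranks[0] if ranks else 0.0)
            hits += bool(ranks)
        n = len(candidates)
        return sum(reciprocal_ranks) / n, hits / n

    cands = {"oov1": ["x", "y", "gold_a"], "oov2": ["gold_b", "z"], "oov3": ["w"]}
    gold = {"oov1": {"gold_a"}, "oov2": {"gold_b"}, "oov3": {"gold_c"}}
    print(mrr_and_recall(cands, gold))   # MRR = (1/3 + 1 + 0) / 3, recall = 2/3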
Experiments & Results 4.1 Experimental Setup
Intrinsic evaluation metrics are faster to apply and are used to optimize different hyper-parameters of the approach (e.g.
Experiments & Results 4.1 Experimental Setup
BLEU (Papineni et al., 2002) is still the de facto evaluation metric for machine translation and we use that to measure the quality of our proposed approaches for MT.
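For reference, BLEU combines clipped n-gram precisions with a brevity penalty; a simplified single-reference, sentence-level sketch (real evaluations use the standard multi-reference, corpus-level scripts):

    import math
    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def bleu(hypothesis, reference, max_n=4):
        """Simplified single-reference sentence-level BLEU, no smoothing."""
        hyp, ref = hypothesis.split(), reference.split()
        log_precisions = []
        for n in range(1, max_n + 1):
            hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
            clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            total = max(1, sum(hyp_ngrams.values()))
            if clipped == 0:
                return 0.0
            log_precisions.append(math.log(clipped / total))
        # brevity penalty for hypotheses shorter than the reference
        bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))
        return bp * math.exp(sum(log_precisions) / max_n)

    print(bleu("the cat sat on the red mat", "the cat sat on the mat"))   # ~0.64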
evaluation metrics is mentioned in 4 sentences in this paper.
Cai, Shu and Knight, Kevin
Conclusion and Future Work
We present an evaluation metric for whole-sentence semantic analysis, and show that it can be computed efficiently.
Introduction
In this work, we provide an evaluation metric that uses the degree of overlap between two whole-sentence semantic structures as the partial credit.
Semantic Overlap
Our evaluation metric measures precision, recall, and f-score of the triples in the second AMR against the triples in the first AMR, i.e., the amount of propositional overlap.
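A minimal illustration of such a triple-overlap score (the full metric additionally searches for the mapping between the two AMRs' variable names that maximizes the match, which this sketch assumes has already been fixed):

    def triple_overlap(gold_triples, test_triples):
        """Precision/recall/F of test AMR triples against gold AMR triples.
        Triples are (relation, head, dependent) tuples over aligned variables."""
        gold, test = set(gold_triples), set(test_triples)
        matched = len(gold & test)
        p = matched / len(test) if test else 0.0
        r = matched / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    gold = {("instance", "w", "want-01"), ("instance", "b", "boy"), ("ARG0", "w", "b")}
    test = {("instance", "w", "want-01"), ("instance", "b", "boy"), ("ARG1", "w", "b")}
    print(triple_overlap(gold, test))   # (2/3, 2/3, 2/3)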
evaluation metrics is mentioned in 3 sentences in this paper.
Lassalle, Emmanuel and Denis, Pascal
Abstract
Our experiments on the CoNLL-2012 Shared Task English datasets (gold mentions) indicate that our method is robust relative to different clustering strategies and evaluation metrics, showing large and consistent improvements over a single pairwise model using the same base features.
Experiments
5.3 Evaluation metrics
Introduction
As will be shown based on a variety of experiments on the CoNLL-2012 Shared Task English datasets, these improvements are consistent across different evaluation metrics and for the most part independent of the clustering decoder that was used.
evaluation metrics is mentioned in 3 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Conclusions and Future Work
We also proposed a new name-aware evaluation metric.
Introduction
Propose a new MT evaluation metric which can discriminate names and noninformative words (Section 4).
Name-aware MT Evaluation
Traditional MT evaluation metrics such as BLEU (Papineni et al., 2002) and Translation Edit Rate (TER) (Snover et al., 2006) weight all tokens equally.
evaluation metrics is mentioned in 3 sentences in this paper.
Pilehvar, Mohammad Taher and Jurgens, David and Navigli, Roberto
Experiment 1: Textual Similarity
Three evaluation metrics are provided by the organizers of the SemEval-2012 STS task, all of which are based on Pearson correlation r of human judgments with system outputs: (1) the correlation value for the concatenation of all five datasets (ALL), (2) a correlation value obtained on a concatenation of the outputs, separately normalized by least square (ALLnrm), and (3) the weighted average of Pearson correlations across datasets (Mean).
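As a rough illustration of how these three numbers are obtained (this is not the official SemEval scoring script; the dataset names and score lists below are made up, and Mean is weighted here by dataset size):

    import numpy as np

    def sts_scores(datasets):
        """datasets: dict name -> (gold, system), each a list of similarity scores."""
        def pearson(a, b):
            return np.corrcoef(a, b)[0, 1]
        gold_all = np.concatenate([np.asarray(g) for g, _ in datasets.values()])
        sys_all = np.concatenate([np.asarray(s) for _, s in datasets.values()])

        # ALL: correlation over the concatenation of all datasets
        all_r = pearson(gold_all, sys_all)

        # ALLnrm: least-squares rescale each dataset's outputs, then concatenate
        normed = []
        for g, s in datasets.values():
            a, b = np.polyfit(np.asarray(s), np.asarray(g), deg=1)
            normed.append(a * np.asarray(s) + b)
        allnrm_r = pearson(gold_all, np.concatenate(normed))

        # Mean: average of per-dataset correlations, weighted by dataset size
        sizes = np.array([len(g) for g, _ in datasets.values()])
        per_set = np.array([pearson(np.asarray(g), np.asarray(s))
                            for g, s in datasets.values()])
        mean_r = np.average(per_set, weights=sizes)
        return all_r, allnrm_r, mean_r

    data = {"MSRpar": ([5.0, 3.2, 1.1], [4.1, 2.9, 0.7]),
            "MSRvid": ([4.8, 0.5], [4.0, 1.0])}
    print(sts_scores(data))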
Experiment 1: Textual Similarity
Table 2 shows the scores obtained by ADW for the three evaluation metrics, as well as the Pearson correlation values obtained on each of the five test sets (rightmost columns).
Experiment 1: Textual Similarity
As can be seen from Table 2, our system (ADW) outperforms all 88 participating systems according to all the evaluation metrics.
evaluation metrics is mentioned in 3 sentences in this paper.
Zhang, Yuan and Barzilay, Regina and Globerson, Amir
Evaluation Setup
Evaluation Metrics: We use two evaluation metrics .
Experiment and Analysis
Moreover, increasing the number of coarse annotations used in training leads to further improvement on different evaluation metrics.
Experiment and Analysis
Figure 5 also illustrates slightly different characteristics of transfer performance between the two evaluation metrics.
evaluation metrics is mentioned in 3 sentences in this paper.
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Experiments
3.1 Data Set and Evaluation Metrics
Experiments
Evaluation Metrics: We evaluate the performance of question retrieval using the following metrics: Mean Average Precision (MAP) and Precision@N (P@N).
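A minimal sketch of these two measures (illustrative only, not the paper's evaluation code):

    def average_precision(ranked, relevant):
        """Sum of precision at each rank holding a relevant item,
        divided by the number of relevant items."""
        hits, precisions = 0, []
        for i, doc in enumerate(ranked, start=1):
            if doc in relevant:
                hits += 1
                precisions.append(hits / i)
        return sum(precisions) / len(relevant) if relevant else 0.0

    def mean_average_precision(runs):
        """runs: list of (ranked results, set of relevant items), one per query."""
        return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

    def precision_at_n(ranked, relevant, n):
        return sum(1 for doc in ranked[:n] if doc in relevant) / n

    run = (["q12", "q7", "q3", "q9"], {"q7", "q9"})
    print(average_precision(*run))     # (1/2 + 2/4) / 2 = 0.5
    print(precision_at_n(*run, n=2))   # 0.5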
Our Approach
where the feature vector is Φ(q, d) = (s_VSM(q, d), s(q_1, d_1), s(q_2, d_2), ..., s(q_P, d_P)) and θ is the corresponding weight vector; we optimize this parameter for our evaluation metrics directly using the Powell Search algorithm (Paul et al., 1992) via cross-validation.
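A hedged sketch of this kind of weight tuning, using SciPy's implementation of Powell's method to choose θ so that MAP on a development set is maximized; `feature_vectors` and `map_score` are hypothetical placeholders for the paper's features and the MAP evaluation above, not its actual code:

    import numpy as np
    from scipy.optimize import minimize

    def tune_weights(feature_vectors, dev_queries, map_score, n_features):
        """Pick a weight vector theta maximizing MAP on development data.
        feature_vectors(q, d) returns the feature vector for a query/candidate
        pair; map_score(rankings) evaluates MAP against dev relevance labels."""
        def negative_map(theta):
            rankings = {}
            for q, candidates in dev_queries.items():
                rankings[q] = sorted(
                    candidates,
                    key=lambda d: float(np.dot(theta, feature_vectors(q, d))),
                    reverse=True)
            return -map_score(rankings)   # minimize the negative of MAP

        result = minimize(negative_map, x0=np.ones(n_features), method="Powell")
        return result.x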
evaluation metrics is mentioned in 3 sentences in this paper.