Abstract | Then, we show that these measures can help improve a number of existing machine translation evaluation metrics, both at the segment level and at the system level.
Abstract | Rather than proposing a single new metric, we show that discourse information is complementary to the state-of-the-art evaluation metrics, and thus should be taken into account in the development of future richer evaluation metrics.
Experimental Setup | 4.1 MT Evaluation Metrics |
Introduction | We believe that the semantic and pragmatic information captured in the form of DTs (i) can help develop discourse-aware SMT systems that produce coherent translations, and (ii) can yield better MT evaluation metrics.
Introduction | In this paper, rather than proposing yet another MT evaluation metric, we show that discourse information is complementary to many existing evaluation metrics, and thus should not be ignored.
Introduction | We first design two discourse-aware similarity measures, which use DTs generated by a publicly-available discourse parser (Joty et al., 2012); then, we show that they can help improve a number of MT evaluation metrics at the segment and system levels in the context of the WMT11 and WMT12 metrics shared tasks (Callison-Burch et al., 2011; Callison-Burch et al., 2012).
Our Discourse-Based Measures | In order to develop a discourse-aware evaluation metric, we first generate discourse trees for the reference and the system-translated sentences using a discourse parser, and then we measure the similarity between the two discourse trees.
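As a hypothetical illustration of the second step, measuring similarity between two discourse trees, one simple option is to score the overlap of their labeled subtrees. This is only a sketch of the general idea; the paper's actual measures are defined over the parser's output and may differ substantially.

```python
# Hypothetical sketch: compare two discourse trees, represented as nested
# tuples like ('Elaboration', ('EDU', 'a'), ('EDU', 'b')), by the Dice
# overlap of their labeled subtrees. Not the paper's exact measure.

def subtrees(tree):
    """Yield every labeled subtree of a nested-tuple discourse tree."""
    yield tree
    for child in tree[1:]:
        if isinstance(child, tuple):
            yield from subtrees(child)

def dt_similarity(ref_tree, sys_tree):
    """Dice coefficient over the sets of subtrees of the two trees."""
    r_set = set(subtrees(ref_tree))
    s_set = set(subtrees(sys_tree))
    if not r_set and not s_set:
        return 1.0
    return 2 * len(r_set & s_set) / (len(r_set) + len(s_set))
```

Identical trees score 1.0; trees that share leaves but disagree on the relation label at the root score lower, which is the kind of gradient a similarity-based metric needs.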
Related Work | A common argument is that current automatic evaluation metrics such as BLEU are inadequate to capture discourse-related aspects of translation quality (Hardmeier and Federico, 2010; Meyer et al., 2012).
Related Work | Thus, there is consensus that discourse-informed MT evaluation metrics are needed in order to advance research in this direction. |
Related Work | The field of automatic evaluation metrics for MT is very active, and new metrics are continuously being proposed, especially in the context of the evaluation campaigns that run as part of the Workshops on Statistical Machine Translation (WMT 2008-2012) and the NIST Metrics for Machine Translation Challenge (MetricsMATR), among others.
Abstract | We introduce XMEANT, a new cross-lingual version of the semantic frame based MT evaluation metric MEANT, which can correlate even more closely with human adequacy judgments than monolingual MEANT while eliminating the need for expensive human references.
Introduction | It is well established that the MEANT family of metrics correlates better with human adequacy judgments than commonly used MT evaluation metrics (Lo and Wu, 2011a, 2012; Lo et al., 2012; Lo and Wu, 2013b; Machacek and Bojar, 2013). |
Introduction | We therefore propose XMEANT, a cross-lingual MT evaluation metric, that modifies MEANT using (1) simple translation probabilities (in our experiments,
Related Work | 2.1 MT evaluation metrics |
Results | Table 1 shows that for human adequacy judgments at the sentence level, the f-score based XMEANT (1) correlates significantly more closely than other commonly used monolingual automatic MT evaluation metrics, and (2) even correlates nearly as well as monolingual MEANT.
Abstract | With this in mind, it is striking that virtually all evaluations of syntactic annotation efforts use uncorrected parser evaluation metrics such as bracket F1 (for phrase structure) and accuracy scores (for dependencies). |
Abstract | To evaluate our metric we first present a number of synthetic experiments to better control the sources of noise and gauge the metric’s responses, before finally contrasting the behaviour of our chance-corrected metric with that of uncorrected parser evaluation metrics on real |
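Chance correction in the kappa/alpha family generally rescales observed agreement against the agreement expected by chance alone. The sketch below shows only this shared correction scheme, not the paper's specific chance-corrected metric, whose agreement and expectation terms are computed over syntactic annotations.

```python
def chance_corrected(observed, expected):
    """Generic chance correction used by the kappa/alpha family of
    agreement measures: how far observed agreement exceeds what chance
    alone would yield, scaled so 1.0 is perfect and 0.0 is chance level."""
    if expected >= 1.0:
        raise ValueError("expected agreement must be < 1")
    return (observed - expected) / (1.0 - expected)
```

This is why uncorrected scores like bracket F1 can look deceptively high: with a high chance baseline, the corrected value can be far lower than the raw agreement.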
Conclusion | In this task, inserting and deleting nodes is an integral part of the annotation; if two annotators insert or delete different nodes, the all-or-nothing requirement of identical yield makes LAS impossible to use as an evaluation metric in this setting.
Real-world corpora | In our evaluation, we will contrast labelled accuracy, the standard parser evaluation metric, and our three α metrics.
Synthetic experiments | 6 The de facto standard parser evaluation metric in dependency parsing.
Evaluation | 4.1 Evaluation Metrics |
Evaluation | Designing evaluation metrics for keyphrase extraction is by no means an easy task. |
Evaluation | To score the output of a keyphrase extraction system, the typical approach, which is also adopted by the SemEval-2010 shared task on keyphrase extraction, is (1) to create a mapping between the keyphrases in the gold standard and those in the system output using exact match, and then (2) to score the output using evaluation metrics such as precision (P), recall (R), and F-score (F).
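The two-step exact-match procedure described above can be sketched as follows. The lowercasing normalisation is an assumption for illustration; the shared-task description itself only specifies exact match.

```python
def keyphrase_prf(gold, predicted):
    """Exact-match keyphrase scoring: map system keyphrases onto the gold
    standard by string identity, then compute precision, recall, and F-score.
    Lowercasing is an illustrative normalisation choice, not part of the spec."""
    gold_set = {k.lower() for k in gold}
    pred_set = {k.lower() for k in predicted}
    matched = len(gold_set & pred_set)
    p = matched / len(pred_set) if pred_set else 0.0
    r = matched / len(gold_set) if gold_set else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Exact match is strict by design: a system keyphrase that differs from the gold phrase by a single token (e.g. a plural) counts as a miss, which is one reason designing such metrics is not easy.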
Experiments and Evaluations | We first describe our experimental settings and define evaluation metrics to evaluate induced soft clusterings of verb classes. |
Experiments and Evaluations | 4.2 Evaluation Metrics |
Experiments and Evaluations | This kind of normalization for soft clusterings was performed for other evaluation metrics as in Springorum et al. |
Experiments | 4.2 Evaluation Metrics |
Experiments | We will use conventional sequence labeling evaluation metrics such as sequence accuracy and character accuracy.
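These two conventional metrics can be computed as below; this sketch assumes each predicted label sequence is aligned with, and the same length as, its gold counterpart, which holds for character-level labeling.

```python
def sequence_accuracy(gold_seqs, pred_seqs):
    """Fraction of sequences whose entire label sequence is correct."""
    correct = sum(g == p for g, p in zip(gold_seqs, pred_seqs))
    return correct / len(gold_seqs)

def character_accuracy(gold_seqs, pred_seqs):
    """Fraction of per-character labels predicted correctly, assuming
    gold and predicted sequences are aligned and of equal length."""
    total = sum(len(g) for g in gold_seqs)
    correct = sum(gc == pc
                  for g, p in zip(gold_seqs, pred_seqs)
                  for gc, pc in zip(g, p))
    return correct / total
```

Sequence accuracy is the stricter of the two: a single wrong character label makes the whole sequence count as incorrect.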
Experiments | Other evaluation metrics are also proposed by Zheng et al. (2011a), but they are suitable only for their system, since our system uses a joint model.
Experiments | 3.1 Evaluation metrics |
Experiments | We use two evaluation metrics in our experiments. |
Experiments | Our segmenter achieves higher scores than MADA and MADA-ARZ on all datasets under both evaluation metrics.
Experiments | 4.1 Datasets and Evaluation Metrics |
Experiments | Evaluation Metrics: We evaluate the proposed method in terms of precision (P), recall (R), and F-measure (F).
Experiments | To take into account the correctly expanded terms for both positive and negative seeds, we use Accuracy as the evaluation metric, |
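A minimal sketch of such an accuracy computation, counting a positive seed as correct when it is expanded and a negative seed as correct when it is not; the function and variable names are illustrative, not taken from the paper.

```python
def expansion_accuracy(positive_seeds, negative_seeds, expanded_terms):
    """Accuracy over both seed polarities: a positive seed is correct if
    it appears among the expanded terms, a negative seed if it does not.
    Illustrative sketch; names are assumptions, not from the source."""
    correct = sum(t in expanded_terms for t in positive_seeds)
    correct += sum(t not in expanded_terms for t in negative_seeds)
    return correct / (len(positive_seeds) + len(negative_seeds))
```

Unlike precision and recall alone, this single number rewards the system both for expanding the right terms and for refusing to expand the wrong ones.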