Index of papers in Proc. ACL 2010 that mention
  • F-score
Liu, Shujie and Li, Chi-Ho and Zhou, Ming
Abstract
On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++.
Evaluation
An alternative criterion is the upper bound on alignment F-score, which essentially measures how many links in the annotated alignment can be kept in the ITG parse.
Evaluation
The calculation of F-score upper bound is done in a bottom-up way like ITG parsing.
Evaluation
The upper bound of alignment F-score can thus be calculated as well.
The DITG Models
The MERT module for DITG takes alignment F-score of a sentence pair as the performance measure.
The DITG Models
Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of DITG-produced alignment.
F-score is mentioned in 16 sentences in this paper.
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Introduction
Using an adapted supertagger with ambiguity levels tuned to match the baseline system, we were also able to increase F-score on labelled grammatical relations by 0.75%.
Results
Interestingly, while the decrease in supertag accuracy in the previous experiment did not translate into a decrease in F-score, the increase in tag accuracy here does translate into an increase in F-score.
Results
The increase in F-score has two sources.
Results
As Table 6 shows, this change translates into an improvement of up to 0.75% in F-score on Section
F-score is mentioned in 10 sentences in this paper.
Li, Linlin and Roth, Benjamin and Sporleder, Caroline
Experiments
We only compare the F-score, since all the compared systems have an attempted rate of 1.0,
Experiments
F-score (F1).
Experiments
System F-score
F-score is mentioned in 5 sentences in this paper.
Woodsend, Kristian and Lapata, Mirella
Experimental Setup
[Figure: recall, precision, and F-score under Rouge-1 and Rouge-L]
Results
F-score is higher for the phrase-based system but not significantly.
Results
The sentence ILP model outperforms the lead baseline with respect to recall but not precision or F-score.
Results
The phrase ILP achieves a significantly better F-score over the lead baseline with both ROUGE-1 and ROUGE-L.
F-score is mentioned in 5 sentences in this paper.
Regneri, Michaela and Koller, Alexander and Pinkal, Manfred
Evaluation
We calculated precision, recall, and f-score for our system, the baselines, and the upper bound as follows, where allsystem is the number of pairs labelled as paraphrase or happens-before by the system, allgold is the respective number of pairs in the gold standard, and correct is the number of pairs labelled correctly by the system.
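The computation described in this excerpt can be sketched as follows; the counts passed in are illustrative placeholders, not figures from the paper.

```python
# Precision, recall, and F-score from the three counts named above:
# precision = correct / allsystem, recall = correct / allgold, and
# F-score is the harmonic mean of the two.
def prf(correct, allsystem, allgold):
    precision = correct / allsystem
    recall = correct / allgold
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score

# Illustrative counts (hypothetical): 100 pairs proposed, 120 in gold,
# 80 labelled correctly.
p, r, f = prf(correct=80, allsystem=100, allgold=120)
```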
Evaluation
The f-score for the upper bound is in the column upper.
Evaluation
For the f-score values, we calculated the significance for the difference between our system and the baselines as well as the upper bound, using a resampling test (Edgington, 1986).
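A resampling test of the kind cited here (approximate randomization in the Edgington style) can be sketched as below; the function name, the per-item score lists, and the trial count are assumptions for illustration, not the authors' implementation.

```python
import random

def resampling_test(scores_a, scores_b, trials=10000, seed=0):
    """Approximate randomization: randomly swap the paired per-item
    scores of the two systems, and count how often the shuffled
    score difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    hits = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # swap this pair with prob. 1/2
                a, b = b, a
            diff += a - b
        if abs(diff) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # smoothed p-value

# Two hypothetical systems with very different per-item scores
# yield a small p-value; identical systems yield p = 1.0.
p_diff = resampling_test([0.9] * 20, [0.1] * 20)
p_same = resampling_test([0.5] * 20, [0.5] * 20)
```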
F-score is mentioned in 4 sentences in this paper.
Fowler, Timothy A. D. and Penn, Gerald
A Latent Variable CCG Parser
To determine statistical significance, we obtain p-values from Bikel’s randomized parsing evaluation comparator, modified for use with tagging accuracy, F-score and dependency accuracy.
A Latent Variable CCG Parser
In this section we evaluate the parsers using the traditional PARSEVAL measures which measure recall, precision and F-score on constituents in
A Latent Variable CCG Parser
The Petrov parser has better results by a statistically significant margin for both labeled and unlabeled recall and unlabeled F-score.
F-score is mentioned in 3 sentences in this paper.
Jiampojamarn, Sittichai and Kondrak, Grzegorz
Intrinsic evaluation
We report the alignment quality in terms of precision, recall and F-score .
Intrinsic evaluation
The F-score corresponding to perfect precision and the upper-bound recall is 94.75%.
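As a sanity check on the arithmetic in this excerpt: with perfect precision the harmonic mean reduces to F = 2R/(1 + R), so an F-score of 94.75% corresponds to an upper-bound recall of about 90.0%. A minimal sketch:

```python
def f_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

# With perfect precision, F = 2R / (1 + R); inverting, R = F / (2 - F).
f = 0.9475
upper_bound_recall = f / (2 - f)  # about 0.9002
```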
Intrinsic evaluation
Overall, the MM models obtain lower precision but higher recall and F-score than 1-1 models, which is to be expected as the gold standard is defined in terms of MM links.
F-score is mentioned in 3 sentences in this paper.