Index of papers in Proc. ACL 2008 that mention
  • F-score
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Evaluation
F-Score (F) is the harmonic mean of precision and recall; it is the most common non-referential detection metric.
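As a quick sanity check of this definition (a hypothetical illustration, not code from the paper): F-score is the harmonic mean 2PR/(P+R), which is always at most the geometric mean and penalizes imbalance between precision and recall.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r: F = 2pr / (p + r)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f_score(0.5, 0.5))  # 0.5 (equal P and R: F equals both)
print(f_score(0.9, 0.1))  # 0.18 (well below the geometric mean, 0.3)
```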
Results
Table 4 gives precision, recall, F-score, and accuracy on the Train/Test split.
Results
Note that while the LL system has high detection precision, it has very low recall, sharply reducing F-score.
Results
The MINIPL approach sacrifices some precision for much higher recall, but again has fairly low F-score.
F-score is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Vadas, David and Curran, James R.
Experiments
| PREC | RECALL | F-SCORE
Experiments
Table 2 shows that F-score has dropped by 0.61%.
Introduction
These features are targeted at improving the recovery of NP structure, increasing parser performance by 0.64% F-score.
F-score is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Johnson, Mark
Word segmentation with adaptor grammars
We evaluated the f-score of the recovered word constituents (Goldwater et al., 2006b).
Word segmentation with adaptor grammars
Table 1: Word segmentation f-score results for all models, as a function of DP concentration parameter α.
Word segmentation with adaptor grammars
With a = 1 and α = 10 we obtained a word segmentation f-score of 0.55.
F-score is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Experiments
tures in the updated version.5 However, our initial experiments show that, even with this much simpler feature set, our 50-best reranker performed equally well as theirs (both with an F-score of 91.4, see Tables 3 and 4).
Experiments
With only local features, our forest reranker achieves an F-score of 91.25, and with the addition of non-
Forest Reranking
where function F returns the F-score.
Forest Reranking
In case multiple candidates get the same highest F-score, we choose the parse with the highest log probability from the baseline parser to be the oracle parse (Collins, 2000).
Introduction
we achieved an F-score of 91.7, which is a 19% error reduction from the 1-best baseline, and outperforms both 50-best and 100-best reranking.
Supporting Forest Algorithms
4.1 Forest Oracle Recall that the Parseval F-score is the harmonic mean of labelled precision P and labelled recall R: F = 2PR / (P + R) = 2|y ∩ y*| / (|y| + |y*|)
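The bracket-set form of this formula can be sketched as follows (a hypothetical helper, not the paper's code; brackets are assumed to be (label, start, end) tuples):

```python
def parseval_f_score(test_brackets: set, gold_brackets: set) -> float:
    """Parseval F-score: 2|y ∩ y*| / (|y| + |y*|), which equals the
    harmonic mean 2PR/(P+R) of labelled precision and labelled recall."""
    matched = len(test_brackets & gold_brackets)
    total = len(test_brackets) + len(gold_brackets)
    return 2 * matched / total if total else 1.0

gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
test = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
print(parseval_f_score(test, gold))  # 2*2 matched / 6 brackets = 0.666...
```

Because the matched-bracket count and the two set sizes do not decompose over subtrees independently, maximizing this ratio is exactly why the optimal-F-score tree need not contain optimal-F-score subtrees, as the excerpt below notes.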
Supporting Forest Algorithms
In other words, the optimal F-score tree in a forest is not guaranteed to be composed of two optimal F-score subtrees.
Supporting Forest Algorithms
Shown in Pseudocode 4, we perform these computations in a bottom-up topological order, and finally at the root node TOP, we can compute the best global F-score by maximizing over different numbers of test brackets (line 7).
F-score is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Discussion
Table 8 summarises the results, showing that the error reduction rate (ERR) over the parsing F-score is up to 6.9%, which is remarkable given the relatively superficial strategy for incorporating sense information into the parser.
Experimental setting
We evaluate the parsers via labelled bracketing recall (R), precision (P) and F-score (F1).
Results
The SFU representation produces the best results for Bikel (F-score 0.010 above baseline), while for Charniak the best performance is obtained with word+SF (F-score 0.007 above baseline).
Results
Overall, Bikel obtains a superior F-score in all configurations.
Results
Again, the F-score for the semantic representations is better than the baseline in all cases.
F-score is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Abstract
Our NR classification evaluation strictly follows the ACL SemEval-07 Task 4 datasets and protocol, obtaining an f-score of 70.6, as opposed to 64.8 of the best previous work that did not use the manually provided WordNet sense disambiguation tags.
Results
In fact, our results (f-score 62.0, accuracy 64.5) are better than the averaged results (58.0, 61.1) of the group that did not utilize WN tags.
Results
Table 2 shows the HITS-based classification results (F-score and Accuracy) and the number of positively labeled clusters (C) for each relation.
Results
We have used the exact evaluation procedure described in (Turney, 2006), achieving a class f-score average of 60.1, as opposed to 54.6 in (Turney, 2005) and 51.2 in (Nastase et al., 2006).
F-score is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: