Index of papers in Proc. ACL 2008 that mention
  • F-score
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Evaluation
F-Score (F) is the harmonic mean of precision and recall; it is the most common non-referential detection metric.
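As a quick sanity check of this definition (a hypothetical illustration, not code from the paper): F-score is the harmonic mean 2PR/(P+R), which is always at most the geometric mean and penalizes imbalance between precision and recall.

```python
def f_score(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r: F = 2pr / (p + r)."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

print(f_score(0.5, 0.5))  # 0.5 (equal P and R: F equals both)
print(f_score(0.9, 0.1))  # 0.18 (well below the geometric mean, 0.3)
```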
Results
Table 4 gives precision, recall, F-score, and accuracy on the Train/Test split.
Results
Note that while the LL system has high detection precision, it has very low recall, sharply reducing F-score.
Results
The MINIPL approach sacrifices some precision for much higher recall, but again has fairly low F-score.
F-score is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Vadas, David and Curran, James R.
Experiments
| PREC | RECALL | F-SCORE
Experiments
Table 2 shows that F-score has dropped by 0.61%.
Introduction
These features are targeted at improving the recovery of NP structure, increasing parser performance by 0.64% F-score.
F-score is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Johnson, Mark
Word segmentation with adaptor grammars
We evaluated the f-score of the recovered word constituents (Goldwater et al., 2006b).
Word segmentation with adaptor grammars
Table 1: Word segmentation f-score results for all models, as a function of DP concentration parameter α.
Word segmentation with adaptor grammars
With a = 1 and α = 10 we obtained a word segmentation f-score of 0.55.
F-score is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Experiments
tures in the updated version.5 However, our initial experiments show that, even with this much simpler feature set, our 50-best reranker performed equally well as theirs (both with an F-score of 91.4, see Tables 3 and 4).
Experiments
With only local features, our forest reranker achieves an F-score of 91.25, and with the addition of non-
Forest Reranking
where function F returns the F-score.
Forest Reranking
In case multiple candidates get the same highest F-score, we choose the parse with the highest log probability from the baseline parser to be the oracle parse (Collins, 2000).
Introduction
we achieved an F-score of 91.7, which is a 19% error reduction from the 1-best baseline, and outperforms both 50-best and 100-best reranking.
Supporting Forest Algorithms
4.1 Forest Oracle Recall that the Parseval F-score is the harmonic mean of labelled precision P and labelled recall R: F = 2PR / (P + R) = 2|y ∩ y*| / (|y| + |y*|)
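The bracket-set form of this formula can be sketched as follows (a hypothetical helper, not the paper's code; brackets are assumed to be (label, start, end) tuples):

```python
def parseval_f_score(test_brackets: set, gold_brackets: set) -> float:
    """Parseval F-score: 2|y ∩ y*| / (|y| + |y*|), which equals the
    harmonic mean 2PR/(P+R) of labelled precision and labelled recall."""
    matched = len(test_brackets & gold_brackets)
    total = len(test_brackets) + len(gold_brackets)
    return 2 * matched / total if total else 1.0

gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5)}
test = {("NP", 0, 2), ("VP", 3, 5), ("S", 0, 5)}
print(parseval_f_score(test, gold))  # 2*2 matched / 6 brackets = 0.666...
```

Because the matched-bracket count and the two set sizes do not decompose over subtrees independently, maximizing this ratio is exactly why the optimal-F-score tree need not contain optimal-F-score subtrees, as the excerpt below notes.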
Supporting Forest Algorithms
In other words, the optimal F-score tree in a forest is not guaranteed to be composed of two optimal F-score subtrees.
Supporting Forest Algorithms
Shown in Pseudocode 4, we perform these computations in a bottom-up topological order, and finally at the root node TOP, we can compute the best global F-score by maximizing over different numbers of test brackets (line 7).
F-score is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Agirre, Eneko and Baldwin, Timothy and Martinez, David
Discussion
Table 8 summarises the results, showing that the error reduction rate (ERR) over the parsing F-score is up to 6.9%, which is remarkable given the relatively superficial strategy for incorporating sense information into the parser.
Experimental setting
We evaluate the parsers via labelled bracketing recall (R), precision (P) and F-score (F1).
Results
The SFU representation produces the best results for Bikel (F-score 0.010 above baseline), while for Charniak the best performance is obtained with word+SF (F-score 0.007 above baseline).
Results
Overall, Bikel obtains a superior F-score in all configurations.
Results
Again, the F-score for the semantic representations is better than the baseline in all cases.
F-score is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Davidov, Dmitry and Rappoport, Ari
Abstract
Our NR classification evaluation strictly follows the ACL SemEval-07 Task 4 datasets and protocol, obtaining an f-score of 70.6, as opposed to 64.8 of the best previous work that did not use the manually provided WordNet sense disambiguation tags.
Results
In fact, our results (f-score 62.0, accuracy 64.5) are better than the averaged results (58.0, 61.1) of the group that did not utilize WN tags.
Results
Table 2 shows the HITS-based classification results (F-score and Accuracy) and the number of positively labeled clusters (C) for each relation.
Results
We have used the exact evaluation procedure described in (Turney, 2006), achieving a class f-score average of 60.1, as opposed to 54.6 in (Turney, 2005) and 51.2 in (Nastase et al., 2006).
F-score is mentioned in 4 sentences in this paper.
Topics mentioned in this paper: