Evaluation | F-Score (F) is the harmonic mean of precision and recall; it is the most common non-referential detection metric. |
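For concreteness, here is a minimal Python sketch of that definition (the helper name and signature are mine, not from any of the papers quoted here): precision and recall are computed from raw true-positive, false-positive, and false-negative counts, and the F-score is their harmonic mean. The usage comment illustrates why a high-precision, low-recall system, like the LL detector discussed below, ends up with a low F-score.

```python
def precision_recall_fscore(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F-score) from raw counts.

    Illustrative helper only; the name and signature are not from the
    cited papers.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# A high-precision, low-recall detector still scores poorly on F:
# precision_recall_fscore(20, 2, 80) -> (0.909..., 0.2, 0.327...)
```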
Results | Table 4 gives precision, recall, F-score, and accuracy on the Train/Test split.
Results | Note that while the LL system has high detection precision, it has very low recall, sharply reducing its F-score.
Results | The MINIPL approach sacrifices some precision for much higher recall, but again has a fairly low F-score.
Experiments | | PREC | RECALL | F-SCORE |
Experiments | Table 2 shows that the F-score has dropped by 0.61%.
Introduction | These features are targeted at improving the recovery of NP structure, increasing parser performance by 0.64% F-score.
Word segmentation with adaptor grammars | We evaluated the f-score of the recovered word constituents (Goldwater et al., 2006b). |
Word segmentation with adaptor grammars | Table 1: Word segmentation f-score results for all models, as a function of DP concentration parameter α.
Word segmentation with adaptor grammars | With α = 1 and with α = 10 we obtained a word segmentation f-score of 0.55.
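To make the word-constituent evaluation concrete, here is a small sketch of the standard word-token F-score (helper names are mine; this is not the exact scoring script of Goldwater et al.): a predicted word counts as correct only when its character span, i.e. both of its boundaries, matches a gold word exactly.

```python
def word_spans(words):
    """Map a list of words onto (start, end) character spans in the
    underlying unsegmented string."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def segmentation_fscore(gold_words, pred_words):
    """Word-token F-score: a predicted word is correct only if its span
    (both boundaries) matches a gold word exactly."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(pred), correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# e.g. gold "the doggy" vs. two hypothetical segmentations:
# segmentation_fscore(["the", "doggy"], ["thedoggy"])        == 0.0
# segmentation_fscore(["the", "doggy"], ["the", "dog", "gy"]) == 0.4
```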
Experiments | However, our initial experiments show that, even with this much simpler feature set, our 50-best reranker performed as well as theirs (both with an F-score of 91.4; see Tables 3 and 4).
Experiments | With only local features, our forest reranker achieves an F-score of 91.25, and with the addition of non-local features it improves to 91.7.
Forest Reranking | The oracle parse is the candidate $y$ in the forest that maximises $F(y, y^{*})$, where the function $F$ returns the F-score.
Forest Reranking | In case multiple candidates get the same highest F-score, we choose the parse with the highest log probability from the baseline parser to be the oracle parse (Collins, 2000).
Introduction | We achieved an F-score of 91.7, which is a 19% error reduction from the 1-best baseline, and outperforms both 50-best and 100-best reranking.
Supporting Forest Algorithms | 4.1 Forest Oracle. Recall that the Parseval F-score is the harmonic mean of labelled precision $P$ and labelled recall $R$: $\frac{2PR}{P+R} = \frac{2\,|y \cap y^{*}|}{|y| + |y^{*}|}$.
Supporting Forest Algorithms | In other words, the optimal F-score tree in a forest is not guaranteed to be composed of two optimal F-score subtrees. |
Supporting Forest Algorithms | As shown in Pseudocode 4, we perform these computations in a bottom-up topological order, and finally at the root node TOP, we can compute the best global F-score by maximizing over different numbers of test brackets (line 7).
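The key point in the two sentences above is that the forest oracle cannot simply keep one best subtree per node; it has to keep the best number of matched brackets for every possible number of test brackets, and only apply the F-score formula once, at the root. The Python sketch below mirrors that idea under an assumed packed-forest interface (node.label, node.span, and node.edges as alternative child tuples); it is an illustration of the dynamic program, not the paper's Pseudocode 4, and it ignores the usual Parseval exclusion of preterminal brackets.

```python
def oracle_fscore(root, gold_brackets):
    """Best F-score of any single tree extractable from a packed forest.

    Assumed (hypothetical) forest interface:
      node.label, node.span -- the labelled bracket a non-leaf node contributes
      node.edges            -- list of alternative child-node tuples
                               (empty for leaves, which contribute no bracket)
    gold_brackets: set of (label, span) pairs from the gold-standard tree.
    """
    memo = {}

    def table(node):
        # {number of test brackets: max number of matched brackets}
        # over all subtrees rooted at this node.
        if id(node) in memo:
            return memo[id(node)]
        if not node.edges:
            memo[id(node)] = {0: 0}
            return memo[id(node)]
        hit = 1 if (node.label, node.span) in gold_brackets else 0
        best = {}
        for children in node.edges:            # each alternative derivation
            combo = {0: 0}
            for child in children:             # convolve the children's tables
                merged = {}
                for t1, m1 in combo.items():
                    for t2, m2 in table(child).items():
                        t, m = t1 + t2, m1 + m2
                        if m > merged.get(t, -1):
                            merged[t] = m
                combo = merged
            for t, m in combo.items():         # count this node's own bracket
                if m + hit > best.get(t + 1, -1):
                    best[t + 1] = m + hit
        memo[id(node)] = best
        return best

    # Analogue of the maximisation at the root (line 7): pick the number of
    # test brackets t that maximises F = 2*matched / (t + |gold|).
    gold_total = len(gold_brackets)
    return max(2.0 * m / (t + gold_total) for t, m in table(root).items())
```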
Discussion | Table 8 summarises the results, showing that the error reduction rate (ERR) over the parsing F-score is up to 6.9%, which is remarkable given the relatively superficial strategy for incorporating sense information into the parser.
Experimental setting | We evaluate the parsers via labelled bracketing recall (R), precision (P), and F-score (F).
Results | The SFU representation produces the best results for Bikel (F-score 0.010 above baseline), while for Charniak the best performance is obtained with word+SF (F-score 0.007 above baseline).
Results | Overall, Bikel obtains a superior F-score in all configurations. |
Results | Again, the F-score for the semantic representations is better than the baseline in all cases. |
Abstract | Our NR classification evaluation strictly follows the ACL SemEval-07 Task 4 datasets and protocol, obtaining an f-score of 70.6, as opposed to 64.8 for the best previous work that did not use the manually provided WordNet sense disambiguation tags.
Results | In fact, our results (f-score 62.0, accuracy 64.5) are better than the averaged results (58.0, 61.1) of the group that did not utilize WN tags.
Results | Table 2 shows the HITS-based classification results (F-score and Accuracy) and the number of positively labeled clusters (C) for each relation.
Results | We have used the exact evaluation procedure described in (Turney, 2006), achieving an average class f-score of 60.1, as opposed to 54.6 in (Turney, 2005) and 51.2 in (Nastase et al., 2006).
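For the classification results above, an "average class f-score" means computing an F-score per relation class and averaging over classes. The sketch below (names are mine; this is not the official SemEval-07 or Turney scorer) shows one straightforward macro-averaged variant of that computation.

```python
def macro_class_fscore(gold, pred):
    """Macro-averaged class F-score over parallel lists of gold and
    predicted labels (illustrative only, not the official scorers)."""
    scores = []
    for c in set(gold):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# macro_class_fscore(["A", "A", "B", "B"], ["A", "B", "B", "B"]) ≈ 0.733
```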