Index of papers in Proc. ACL 2010 that mention
  • F-measure
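All entries below refer to the standard F-measure: the (weighted) harmonic mean of precision and recall. A minimal reference sketch (the function name and `beta` default are illustrative, not from any indexed paper):

```python
def f_measure(precision, recall, beta=1.0):
    """F_beta: weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 used by most of the papers indexed here.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


print(f_measure(0.5, 1.0))  # 2PR / (P + R) = 1.0 / 1.5
```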
Spiegler, Sebastian and Flach, Peter A.
Conclusions
At an optimal decision threshold, however, both yielded a similar f-measure result.
Experiments and Results
F-measure
Experiments and Results
Since both algorithms show different behaviour with increasing experience and PROMODES-H yields a higher f-measure across all datasets, we will investigate in the next experiments how these differences manifest themselves at the boundary level.
F-measure is mentioned in 11 sentences in this paper.
Riesa, Jason and Marcu, Daniel
Abstract
Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
Conclusion
We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today.
Discriminative training
Note that Equation 2 is equivalent to maximizing the sum of the F-measure and model score of y:
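Equation 2 itself is not reproduced in this index. As a hedged sketch (all names here are illustrative, not from the paper), "maximizing the sum of the F-measure and model score of y" amounts to:

```python
def best_hypothesis(candidates, model_score, f_vs_gold):
    """Return the y maximizing F-measure(y) + model_score(y).

    `model_score` and `f_vs_gold` are illustrative callables; the paper's
    actual search is over alignment hypergraphs, not a flat candidate list.
    """
    return max(candidates, key=lambda y: f_vs_gold(y) + model_score(y))


# Toy usage: three candidate alignments with made-up scores.
scores = {"y1": 0.2, "y2": 0.5, "y3": 0.4}
f_gold = {"y1": 0.9, "y2": 0.3, "y3": 0.6}
print(best_hypothesis(scores, scores.get, f_gold.get))  # "y1" (0.2 + 0.9 = 1.1)
```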
Experiments
These plots show the current F-measure on the training set as time passes.
Experiments
F-measure
Experiments
The first three columns of Table 2 show the balanced F-measure, Precision, and Recall of our alignments versus the two GIZA++ Model-4 baselines.
Word Alignment as a Hypergraph
(1) that using the structure of 1-best English syntactic parse trees is a reasonable way to frame and drive our search, and (2) that F-measure approximately decomposes over hyperedges.
F-measure is mentioned in 8 sentences in this paper.
Reiter, Nils and Frank, Anette
Introduction
Neither recall nor f-measure is reported.
Introduction
The feature whose omission causes the biggest drop in f-measure is set aside as a strong feature.
Introduction
From the ranked list of features f1 to fn we evaluate increasingly extended feature sets f1..fi for i = 2..n. We select the feature set that yields the best balanced performance, at 45.7% precision and 53.6% f-measure.
F-measure is mentioned in 7 sentences in this paper.
Wu, Fei and Weld, Daniel S.
Conclusion
Compared with TextRunner, WOEPOS runs at the same speed but achieves an F-measure between 18% and 34% greater on three corpora; WOEparse achieves an F-measure between 72% and 91% higher than TextRunner's, but runs about 30X slower due to the time required for parsing.
Experiments
Figure 3: WOEPOS achieves an F-measure which is between 18% and 34% better than TextRunner's.
Experiments
Figure 4: WOEparse’s F-measure decreases more slowly with sentence length than WOEPOS and TextRunner, due to its better handling of difficult sentences using parser features.
Introduction
Compared with TextRunner (the state of the art) on three corpora, WOE yields between 72% and 91% improved F-measure — generalizing well beyond Wikipedia.
Wikipedia-based Open IE
As shown in the experiments on three corpora, WOEparse achieves an F-measure which is between 72% and 91% greater than TextRunner's.
Wikipedia-based Open IE
As shown in the experiments, WOEPOS achieves an F-measure between 18% and 34% better than TextRunner's on three corpora, mainly due to an increase in precision.
F-measure is mentioned in 6 sentences in this paper.
Elson, David and Dames, Nicholas and McKeown, Kathleen
Extracting Conversational Networks from Literature
Table 2: Precision, recall, and F-measure of three methods for detecting bilateral conversations in literary texts.
Extracting Conversational Networks from Literature
The precision and recall values shown for the baselines in Table 2 represent the highest performance we achieved by varying t between 0 and 1 (maximizing F-measure over 25).
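A hedged sketch of that tuning loop (names and scoring are illustrative; the baselines' actual conversation scores are not part of this index): sweep t over [0, 1] and keep the value with the best F-measure.

```python
def sweep_threshold(score, candidates, gold, steps=101):
    """Sweep decision threshold t over [0, 1]; return (t, F) maximizing F-measure.

    `score(c)` is an illustrative confidence function; an item is predicted
    positive when score(c) >= t.
    """
    best_t, best_f = 0.0, 0.0
    for i in range(steps):
        t = i / (steps - 1)
        predicted = {c for c in candidates if score(c) >= t}
        tp = len(predicted & gold)
        p = tp / len(predicted) if predicted else 0.0
        r = tp / len(gold) if gold else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy usage: items 1 and 2 are true conversations, 3 is noise.
conf = {1: 0.9, 2: 0.8, 3: 0.2}
print(sweep_threshold(conf.get, {1, 2, 3}, {1, 2}))
```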
Extracting Conversational Networks from Literature
Both baselines performed significantly worse in precision and F-measure than our quoted speech adjacency method for detecting conversations.
F-measure is mentioned in 3 sentences in this paper.
Han, Xianpei and Zhao, Jun
Experiments
F-Measure (F): the harmonic mean of purity and inverse purity.
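That definition — the WePS-style F of purity and inverse purity rather than of precision and recall — can be sketched directly (the function name is illustrative):

```python
def f_purity(purity, inverse_purity):
    """Harmonic mean of purity and inverse purity (WePS-style clustering F)."""
    if purity + inverse_purity == 0.0:
        return 0.0
    return 2.0 * purity * inverse_purity / (purity + inverse_purity)


print(f_purity(0.8, 0.6))  # 2 * 0.48 / 1.4
```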
Experiments
We use F-measure as the primary measure, just like WePS1 and WePS2.
Experiments
The F-Measure vs. 1 on three data sets
F-measure is mentioned in 3 sentences in this paper.
Sun, Jun and Zhang, Min and Tan, Chew Lim
Substructure Spaces for BTKs
The coefficients Oi for the composite kernel are tuned with respect to F-measure (F) on the development set of the HIT corpus.
Substructure Spaces for BTKs
Those thresholds are also tuned on the development set of the HIT corpus with respect to F-measure.
Substructure Spaces for BTKs
The evaluation is conducted by means of Precision (P), Recall (R) and F-measure (F).
F-measure is mentioned in 3 sentences in this paper.