Index of papers in Proc. ACL 2009 that mention
  • F-measure
Persing, Isaac and Ng, Vincent
Abstract
Experimental results show that our algorithm yields a relative error reduction of 6.3% in F-measure for the minority classes in comparison to a baseline that learns solely from the labeled data.
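The excerpt quotes a 6.3% relative error reduction without restating the formula; under the standard convention that the remaining error is 100 minus the F-measure, a minimal sketch (the concrete numbers below are illustrative, not the paper's):

```python
def relative_error_reduction(f_baseline: float, f_new: float) -> float:
    """Fraction of the baseline's remaining F-measure error (100 - F)
    that the new system eliminates."""
    return (f_new - f_baseline) / (100.0 - f_baseline)

# Hypothetical numbers: improving a baseline F of 35.4 to 39.5
# gives roughly a 6.3% relative error reduction.
print(f"{relative_error_reduction(35.4, 39.5):.3f}")  # ~0.063
```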
Baseline Approaches
Results are reported in terms of precision (P), recall (R), and F-measure (F), which are computed by aggregating over the 14 shapers as follows.
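The index omits the aggregation formulas that follow in the paper; this is a minimal sketch of micro-averaged P/R/F over per-class counts (micro-averaging is confirmed by the Evaluation excerpt below; the count-triple format is an assumption for illustration):

```python
def micro_prf(counts):
    """Micro-averaged precision, recall, and F-measure.

    `counts` is a list of (true_positives, guessed, relevant) triples,
    one per class (here, one per shaper).
    """
    tp = sum(c[0] for c in counts)
    guessed = sum(c[1] for c in counts)
    relevant = sum(c[2] for c in counts)
    p = tp / guessed if guessed else 0.0
    r = tp / relevant if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy usage with three hypothetical shapers:
print(micro_prf([(8, 10, 12), (3, 5, 6), (0, 2, 4)]))
```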
Baseline Approaches
Our second baseline is similar to the first, except that we tune the classification threshold (CT) to optimize F-measure.
Baseline Approaches
Using the development data, we tune the 14 CTs jointly to optimize overall F-measure.
Evaluation
Micro-averaged 5-fold cross-validation results of this baseline for all 14 shapers, and for just the 10 minority classes (given our focus on improving minority-class prediction), are expressed as percentages in terms of precision (P), recall (R), and F-measure (F) in the first row of Table 4.
Evaluation
As we can see, the baseline achieves an F-measure of 45.4 (14 shapers) and 35.4 (10 shapers).
Evaluation
Comparing these two results, the higher F-measure achieved using all 14 shapers can be attributed primarily to improvements in recall.
Introduction
In comparison to a supervised baseline approach where a classifier is acquired solely based on the training set, our bootstrapping approach yields a relative error reduction of 6.3% in F-measure for the minority classes.
Our Bootstrapping Algorithm
In particular, if the second baseline is used, we will tune CT and k jointly on the development data using the local search algorithm described previously, where we adjust the values of both CT and k for one of the 14 classifiers in each step of the search process to optimize the overall F-measure score.
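A hedged sketch of the coordinate-wise local search the excerpt describes: each step re-tunes the (CT, k) pair of a single classifier and keeps the change only if overall F-measure improves. The candidate grids and the `overall_f` scorer are illustrative assumptions, not the paper's exact procedure:

```python
import itertools

def local_search(params, overall_f, ct_grid, k_grid):
    """Greedy hill-climbing over per-classifier (CT, k) settings.

    `params`   : list of 14 (ct, k) pairs, one per classifier.
    `overall_f`: callable scoring a full parameter list on dev data.
    """
    best = overall_f(params)
    improved = True
    while improved:
        improved = False
        for i in range(len(params)):          # one classifier per step
            for ct, k in itertools.product(ct_grid, k_grid):
                trial = params[:i] + [(ct, k)] + params[i + 1:]
                score = overall_f(trial)
                if score > best:
                    best, params = score, trial
                    improved = True
    return params, best
```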
F-measure is mentioned in 16 sentences in this paper.
Kim, Jungi and Li, Jin-Ji and Lee, Jong-Hyeok
Experiment
For performance evaluations of opinion and polarity detection, we use precision, recall, and F-measure, the same measures used to report the official results at the NTCIR MOAT workshop.
Experiment
System parameters are optimized for F-measure using the NTCIR6 dataset with lenient evaluations.
Experiment
Model     Precision  Recall  F-Measure
BASELINE  0.305      0.866   0.451
VS        0.331      0.807   0.470
BM25      0.327      0.795   0.464
LM        0.325      0.794   0.461
LSA       0.315      0.806   0.453
PMI       0.342      0.603   0.436
DTP       0.322      0.778   0.455
VS-LSA    0.335      0.769   0.466
VS-PMI    0.311      0.833   0.453
VS-DTP    0.342      0.745   0.469
F-measure is mentioned in 9 sentences in this paper.
Ohno, Tomohiro and Murata, Masaki and Matsubara, Shigeki
Discussion
Here, we compared our method with baseline 3, whose F-measure was the highest among the four baselines described in Section 5.1.
Discussion
            recall (%)        precision (%)     F-measure
by human    89.82 (459/511)   89.82 (459/511)   89.82
our method  82.19 (420/511)   81.71 (420/514)   81.95
Discussion
In F-measure, our method achieved 91.24% (81.95/89.82) of the human annotator's result.
Experiment
Table header only: recall (%), precision (%), F-measure.
Experiment
On the other hand, the F-measure and the sentence accuracy of our method were 81.43% and 53.15%, respectively.
F-measure is mentioned in 7 sentences in this paper.
Snyder, Benjamin and Naseem, Tahira and Barzilay, Regina
Abstract
On average, across a variety of testing scenarios, our model achieves an 8.8-point absolute gain in F-measure.
Experimental setup
In both cases our implementation achieves F-measure in the range of 69-70% on WSJ10, broadly in line with the performance reported by Klein and Manning (2002).
Experimental setup
To evaluate both our model as well as the baseline, we use (unlabeled) bracket precision, recall, and F-measure (Klein and Manning, 2002).
Experimental setup
We also report the upper bound on F-measure for binary trees.
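A minimal sketch of unlabeled bracket scoring in the style of Klein and Manning (2002), where brackets are (start, end) spans; conventions such as whether the full-sentence span is counted vary, so treat this as illustrative:

```python
def bracket_prf(gold_spans, test_spans):
    """Unlabeled bracket precision, recall, and F-measure.

    Each argument is a set of (start, end) spans from one tree.
    """
    correct = len(gold_spans & test_spans)
    p = correct / len(test_spans) if test_spans else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: gold has 4 brackets; the candidate recovers 3 plus 1 wrong one.
gold = {(0, 5), (0, 2), (2, 5), (3, 5)}
test = {(0, 5), (0, 2), (3, 5), (1, 3)}
print(bracket_prf(gold, test))  # P = R = F = 0.75
```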
Introduction
On average, over all the testing scenarios that we studied, our model achieves an absolute increase in F-measure of 8.8 points, and a 19% reduction in error relative to a theoretical upper bound.
Results
On average, the bilingual model gains 10.2 percentage points in precision, 7.7 in recall, and 8.8 in F-measure.
Results
The Korean-English pairing results in substantial improvements for Korean and quite large improvements for English, for which the absolute gain reaches 28 points in F-measure .
F-measure is mentioned in 7 sentences in this paper.
Jiang, Wenbin and Huang, Liang and Liu, Qun
Conclusion and Future Works
It obtains a considerable F-measure increase, about 0.8 points for word segmentation and 1 point for Joint S&T, with corresponding error reductions of 30.2% and 14%.
Conclusion and Future Works
Moreover, this improvement brings a striking F-measure increase of about 0.8 points for Chinese parsing, corresponding to an error-propagation reduction of 38%.
Experiments
For word segmentation, the model after annotation adaptation (row 4 in the upper subtable) achieves an F-measure gain of 0.8 points over the baseline model, corresponding to an error reduction of 30.2%; for Joint S&T, the F-measure gain of the adapted model (row 4 in the lower subtable) is 1 point, corresponding to an error reduction of 14%.
Experiments
Note that if we input the gold-standard segmented test set into the parser, the F-measures under the two definitions are the same.
Experiments
The parsing F-measure corresponding to the gold-standard segmentation, 82.35, represents the "oracle" accuracy (i.e., upper bound) of parsing on top of automatic word segmentation.
F-measure is mentioned in 6 sentences in this paper.
Huang, Fei
Conclusion
selected among multiple alignments, and it obtained a 0.8-point F-measure improvement over the single best Chinese-English aligner.
Conclusion
The second is the alignment link confidence measure, which selects the most reliable links from multiple alignments and obtained 1.5 F-measure improvement.
Improved MaXEnt Aligner with Confidence-based Link Filtering
the highest F-measure among the three aligners, although the algorithm described below can be applied to any aligner.
Improved MaXEnt Aligner with Confidence-based Link Filtering
For CE alignment, removing low-confidence alignment links increased alignment precision by 5.5 points while decreasing recall by 1.8 points, and the overall alignment F-measure increased by 1.3 points.
Improved MaXEnt Aligner with Confidence-based Link Filtering
When looking into the alignment links removed during the link-filtering process, we found that 80% of the removed links (1,320 out of 1,661) are incorrect alignments. For A-E alignment, filtering increased precision by 3 points while reducing recall by 0.5 points, and the alignment F-measure increased by about 1.5 points absolute, a 10% relative reduction in alignment error rate.
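A hedged sketch of confidence-based link filtering as the excerpts describe it: links below a confidence threshold are dropped, trading a little recall for precision. The link representation and the threshold value are assumptions for illustration, not the paper's exact procedure:

```python
def filter_links(links, threshold):
    """Keep alignment links whose confidence is at or above `threshold`.

    `links` is an iterable of (src_index, tgt_index, confidence) triples;
    the threshold would be tuned on held-out data to maximize F-measure.
    """
    return [(s, t) for s, t, conf in links if conf >= threshold]

# Toy usage: two confident links survive, one low-confidence link is removed.
links = [(0, 0, 0.95), (1, 2, 0.40), (2, 1, 0.88)]
print(filter_links(links, 0.5))  # [(0, 0), (2, 1)]
```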
F-measure is mentioned in 5 sentences in this paper.
Somasundaran, Swapna and Wiebe, Janyce
Experiments
#Correct’ Recall m and F-measure #guessed #relevant
Experiments
Finally, both of the OpPr systems are better than both baselines in Accuracy as well as F-measure for all four debates.
Experiments
The F-measure improves, on average, by 25 percentage points over the OpTopic system, and by 17 percentage points over the OpPMI system.
F-measure is mentioned in 4 sentences in this paper.
Jeon, Je Hun and Liu, Yang
Experiments and results
The F-measure score using the initial training data is 0.69.
Experiments and results
Most of the previous work on prosodic event detection reported results in terms of classification accuracy instead of F-measure.
Experiments and results
Table 3: The results (F-measure) of prosodic event detection for supervised and co-training approaches.
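A hedged sketch of a generic co-training loop of the kind Table 3 compares against purely supervised training: two classifiers trained on different feature views label the unlabeled pool, and only each view's most confident predictions are added back to the training set. All names and the selection rule are illustrative assumptions:

```python
def co_train(labeled, unlabeled, train_view1, train_view2, rounds=10, top_k=50):
    """Generic co-training loop.

    `labeled`    : list of (example, label) pairs.
    `unlabeled`  : list of unlabeled examples.
    `train_viewN`: callables that fit a classifier on one feature view and
                   return a model whose .predict(x) -> (label, confidence).
    """
    for _ in range(rounds):
        model1, model2 = train_view1(labeled), train_view2(labeled)
        for model in (model1, model2):
            # Score the remaining pool; keep this view's most confident picks.
            scored = sorted(((model.predict(x), x) for x in unlabeled),
                            key=lambda pair: -pair[0][1])[:top_k]
            for (label, _confidence), x in scored:
                labeled.append((x, label))
                unlabeled.remove(x)
        if not unlabeled:
            break
    return train_view1(labeled), train_view2(labeled)
```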
F-measure is mentioned in 3 sentences in this paper.
Oh, Jong-Hoon and Uchimoto, Kiyotaka and Torisawa, Kentaro
Abstract
Experimental results show that our approach improved the F-measure by 3.6–10.3%.
Motivation
(2008), which was applied only to Japanese and achieved around 80% in F-measure.
Motivation
Experimental results showed that our method based on bilingual co-training improved the performance of monolingual hyponymy-relation acquisition by about 3.6–10.3% in F-measure.
F-measure is mentioned in 3 sentences in this paper.