Experiments | Table 1: ROUGE-1 recall (R) and F-measure (F) results (%) on DUC-04. |
Experiments | On the other hand, when using L1(S) with α < 1 (the value of α was determined on DUC-03 using a grid search), a ROUGE-1 F-measure score of 38.65% is obtained.
Experiments | ROUGE-1 F-measure (%) |
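Experiments | For reference, the ROUGE-1 scores above reduce to unigram-overlap precision, recall, and their harmonic mean. The sketch below is a simplified illustration with hypothetical tokenised inputs; unlike the official ROUGE toolkit, it does no stemming or stopword handling.

```python
from collections import Counter

def rouge1(candidate, reference):
    """Simplified ROUGE-1 from clipped unigram overlap."""
    cand, ref = Counter(candidate), Counter(reference)
    # Each unigram is matched at most as often as it appears in the reference.
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example with hypothetical candidate and reference summaries:
p, r, f = rouge1("the cat sat on the mat".split(),
                 "the cat lay on the mat".split())
print(f"ROUGE-1 P={p:.2f} R={r:.2f} F={f:.2f}")  # ~0.83 each here
```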
Abstract | We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems that were submitted. |
Conclusion | We evaluated it against the semi-supervised systems submitted to NEWS10 and achieved a high F-measure, outperforming most of them.
Conclusion | We also evaluated our method on parallel corpora and achieved a high F-measure.
Experiments | “Our” shows the F-measure of our filtered data against the gold standard using the supplied evaluation tool, “Systems” is the total number of participants in the subtask, and “Rank” is the rank we would have obtained if our system had participated. |
Experiments | We calculate the F-measure of our filtered transliteration pairs against the gold standard using the supplied evaluation tool.
Experiments | On the English/Russian data set, our system achieves an F-measure of 76%, which is low compared with the systems that participated in the shared task.
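Experiments | The NEWS evaluation tool itself is not reproduced here; the sketch below shows the standard set-based computation it implies, assuming a mined pair counts as correct iff it occurs in the gold standard (the example pairs are hypothetical).

```python
def prf(predicted_pairs, gold_pairs):
    """Precision, recall, and F-measure of mined pairs against a gold set."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    tp = len(predicted & gold)  # correctly mined pairs
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Hypothetical English/Russian (source, target) pairs:
pred = {("london", "лондон"), ("paris", "париж"), ("bank", "банк")}
gold = {("london", "лондон"), ("paris", "париж"), ("moscow", "москва")}
print(prf(pred, gold))  # each value ≈ 0.67 here
```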
Introduction | We achieve an F-measure of up to 92%, outperforming most of the semi-supervised systems.
Results | For the alignment detection, we report the precision, recall, and F-measure associated with correctly detecting matches. |
Results | The threshold weight learned from the bias feature strongly influences the point at which real scores change from non-matches to matches; given the learned threshold weight, we find an F-measure of 0.72, with precision (P) = 0.85 and recall (R) = 0.62.
Results | By manually varying the threshold, we find a maximum F-measure of 0.76, with P = 0.79 and R = 0.74.
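Results | The two operating points above follow directly from F = 2PR / (P + R); the sketch below checks that arithmetic and illustrates the kind of threshold sweep described, using hypothetical scores and labels.

```python
def prf_at_threshold(scores, labels, threshold):
    """P/R/F when items scoring >= threshold are predicted as matches."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def best_threshold(scores, labels):
    """Try every observed score as a threshold; return (best F, threshold)."""
    return max((prf_at_threshold(scores, labels, t)[2], t) for t in set(scores))

# Sanity check of F = 2PR / (P + R) on the two reported operating points:
for p, r in [(0.85, 0.62), (0.79, 0.74)]:
    print(round(2 * p * r / (p + r), 2))  # 0.72, then 0.76

# Hypothetical scores and labels for the sweep itself:
scores = [0.9, 0.8, 0.6, 0.4, 0.3]
labels = [True, True, False, True, False]
print(best_threshold(scores, labels))  # best F at threshold 0.4
```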
Experiments | Overall, we improve the labelled F-measure by almost 1.1% and the unlabelled F-measure by 0.6% over the baseline.
Experiments | Table 6: Parsing time in seconds per sentence (vs. F-measure) on section 00.
Oracle Parsing | The inverse relationship between model score and F-score shows that the supertagger restricts the parser to mostly good parses (under F-measure) that the model would otherwise disprefer.
Experiments | Using the same evaluation setting, our baseline RE system achieves a competitive F-measure of 71.4.
Experiments | The results show that by using syntactico-semantic structures, we obtain significant F-measure improvements of 8.3, 7.2, and 5.5 for binary, coarse-grained, and fine-grained relation predictions respectively. |
Experiments | The results show that by leveraging syntactico-semantic structures, we obtain significant F-measure improvements of 8.2, 4.6, and 3.6 for binary, coarse-grained, and fine-grained relation predictions respectively. |
Evaluation and Discussion | Performance is measured using precision, recall, and F-measure.
Evaluation and Discussion | We additionally used the weighted F-measure (wF) to aggregate the F-measures of the three categories.
Evaluation and Discussion | The overall average of the weighted F-measure across issues was 0.68, 0.59, and 0.48 for DrC, QbC, and Sim, respectively.
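Evaluation and Discussion | Assuming wF is the usual support-weighted average of per-category F-measures (the exact weighting is not stated here), a minimal sketch with hypothetical supports:

```python
def weighted_f(per_category):
    """Support-weighted average of per-category F-measures.

    per_category: list of (f_measure, support) pairs, one per category.
    """
    total = sum(n for _, n in per_category)
    return sum(f * n for f, n in per_category) / total

# Hypothetical per-category F-measures and supports for three categories:
print(weighted_f([(0.8, 50), (0.6, 30), (0.4, 20)]))  # ≈ 0.66
```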