Conclusions | At an optimal decision threshold, however, both yielded a similar f-measure result. |
Experiments and Results | Since both algorithms show different behaviour with increasing experience and PROMODES-H yields a higher f-measure across all datasets, we will investigate in the next experiments how these differences manifest themselves at the boundary level. |
Abstract | Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system. |
Conclusion | We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today. |
Discriminative training | Note that Equation 2 is equivalent to maximizing the sum of the F-measure and model score of y: |
Experiments | These plots show the current F-measure on the training set as time passes. |
Experiments | The first three columns of Table 2 show the balanced F-measure, Precision, and Recall of our alignments versus the two GIZA++ Model-4 baselines. |
Word Alignment as a Hypergraph | (1) that using the structure of 1-best English syntactic parse trees is a reasonable way to frame and drive our search, and (2) that F-measure approximately decomposes over hyperedges. |
Introduction | Neither recall nor f-measure is reported. |
Introduction | The feature whose omission causes the biggest drop in f-measure is set aside as a strong feature. |
Introduction | From the ranked list of features f1 to fn we evaluate increasingly extended feature sets f1..fi for i = 2..n. We select the feature set that yields the best balanced performance, at 45.7% precision and 53.6% f-measure. |
Conclusion | Comparing with TextRunner, WOEPOS runs at the same speed, but achieves an F-measure which is between 18% and 34% greater on three corpora; WOEparse achieves an F-measure which is between 72% and 91% higher than that of TextRunner, but runs about 30 times slower due to the time required for parsing. |
Experiments | Figure 3: WOEPOS achieves an F-measure which is between 18% and 34% better than TextRunner’s. |
Experiments | Figure 4: WOEparse’s F-measure decreases more slowly with sentence length than WOEPOS and TextRunner, due to its better handling of difficult sentences using parser features. |
Introduction | Compared with TextRunner (the state of the art) on three corpora, WOE yields an F-measure improvement of between 72% and 91%, generalizing well beyond Wikipedia. |
Wikipedia-based Open IE | As shown in the experiments on three corpora, WOEparse achieves an F-measure which is between 72% and 91% greater than TextRunner’s. |
Wikipedia-based Open IE | As shown in the experiments, WOEPOS achieves an F-measure between 18% and 34% higher than TextRunner’s on three corpora, mainly due to an increase in precision. |
Extracting Conversational Networks from Literature | Table 2: Precision, recall, and F-measure of three methods for detecting bilateral conversations in literary texts. |
Extracting Conversational Networks from Literature | The precision and recall values shown for the baselines in Table 2 represent the highest performance we achieved by varying t between 0 and 1 (maximizing F-measure over 25). |
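The baseline tuning described above, varying a threshold t between 0 and 1 and keeping the value that maximizes F-measure, can be sketched as follows. This is a hypothetical illustration, not the paper's code; `score_fn` and the toy precision/recall curve are invented for the example.

```python
# Hypothetical sketch of a threshold sweep: vary t over [0, 1] and keep
# the value that maximizes balanced F-measure. score_fn(t) is assumed to
# return (precision, recall) at threshold t; the toy curve below is
# illustrative only.

def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def best_threshold(score_fn, thresholds):
    """Return the (t, F) pair with the highest F-measure."""
    best_t, best_f = None, -1.0
    for t in thresholds:
        p, r = score_fn(t)
        f = f_measure(p, r)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f

# Toy example: precision rises and recall falls linearly with t.
t, f = best_threshold(lambda t: (t, 1.0 - t),
                      [i / 100 for i in range(101)])  # → (0.5, 0.5)
```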
Extracting Conversational Networks from Literature | Both baselines performed significantly worse in precision and F-measure than our quoted speech adjacency method for detecting conversations. |
Experiments | F-Measure (F): the harmonic mean of purity and inverse purity. |
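The definition above (F as the harmonic mean of purity and inverse purity) can be written as a one-line computation; the input values here are illustrative, not results from the evaluation.

```python
# Minimal sketch of the clustering F-measure defined above: the harmonic
# mean of purity and inverse purity. Inputs are illustrative values.

def harmonic_f(purity: float, inverse_purity: float) -> float:
    """Harmonic mean of purity and inverse purity."""
    if purity + inverse_purity == 0:
        return 0.0
    return 2 * purity * inverse_purity / (purity + inverse_purity)

f = harmonic_f(0.8, 0.6)  # ≈ 0.686
```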
Experiments | We use F-measure as the primary measure, following WePS1 and WePS2. |
Experiments | The F-Measure vs. 1 on three data sets |
Substructure Spaces for BTKs | The coefficients Oi for the composite kernel are tuned with respect to F-measure (F) on the development set of the HIT corpus. |
Substructure Spaces for BTKs | Those thresholds are also tuned on the development set of the HIT corpus with respect to F-measure. |
Substructure Spaces for BTKs | The evaluation is conducted by means of Precision (P), Recall (R) and F-measure (F). |
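The three measures named above are standard and can be computed from true-positive, false-positive, and false-negative counts. A minimal sketch, with illustrative counts rather than figures from the HIT corpus evaluation:

```python
# Hedged sketch of Precision (P), Recall (R), and F-measure (F) from
# raw counts; tp/fp/fn values below are invented for illustration.

def precision_recall_f(tp: int, fp: int, fn: int):
    """Return (P, R, F) computed from tp/fp/fn counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = precision_recall_f(tp=80, fp=20, fn=40)  # P=0.8, R≈0.667, F≈0.727
```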