Index of papers in Proc. ACL 2014 that mention
  • F1 score
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Experiments
Evaluation Metric: We evaluated the results with both Mean Average Precision (MAP) and F1 Score.
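The excerpt above only names the two metrics, so here is a minimal sketch of Mean Average Precision under its standard definition; the binary relevance flags and helper names are illustrative assumptions, not details taken from the paper.

    # Hedged sketch of Mean Average Precision (MAP) under the standard definition.
    # Binary relevance flags and the function names are illustrative assumptions.
    def average_precision(ranked_relevance):
        """ranked_relevance: 0/1 flags in ranked order, 1 = relevant result."""
        hits, precisions = 0, []
        for rank, rel in enumerate(ranked_relevance, start=1):
            if rel:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / hits if hits else 0.0

    def mean_average_precision(queries):
        """MAP is the mean of per-query average precision."""
        return sum(average_precision(q) for q in queries) / len(queries)

    # Example: relevant items at ranks {1, 3} for query 1 and rank {2} for query 2.
    print(mean_average_precision([[1, 0, 1], [0, 1, 0]]))  # about 0.667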
Experiments
Macro-averaged F1 Score (figure y-axis label)
Experiments
(b) Macro-Averaged F1 Score
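For reference alongside the figure caption above, macro-averaged F1 is, in its standard form, the unweighted mean of per-class F1 scores; this is a sketch of the usual definition, since the paper's exact averaging details are not visible in this index.

    \[
    \text{Macro-}F_1 \;=\; \frac{1}{|C|} \sum_{c \in C} \frac{2\,P_c R_c}{P_c + R_c}
    \]

where P_c and R_c are the precision and recall of class c.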
F1 score is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Wang, Chang and Fan, James
Experiments
The F1 scores reported here are the average of all 5 rounds.
Experiments
The tree kernel-based approach and linear regression achieved similar F1 scores, while linear SVM made a 5% improvement over them.
Experiments
By integrating unlabeled data, the manifold model under setting (1) made a 15% improvement over the linear regression model in F1 score, where the improvement was significant across all relations.
F1 score is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Yang, Bishan and Cardie, Claire
Experiments
Table 4: F1 scores for each sentiment category (positive, negative and neutral) for semi-supervised sentiment classification on the MD dataset
Experiments
Table 4 shows the results in terms of F1 scores for each sentiment category (positive, negative and neutral).
Experiments
We observe that the DOCORACLE baseline provides very strong F1 scores on the positive and negative categories especially in the Books and Music domains, but very poor F1 on the neutral category.
F1 score is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Anzaroot, Sam and Passos, Alexandre and Belanger, David and McCallum, Andrew
Citation Extraction Data
Table 1: Set of constraints learned and F1 scores.
Citation Extraction Data
This final feature improves the F1 score on the cleaned test set from 94.0 F1 to 94.44 F1, which we use as a baseline score.
Citation Extraction Data
We assess performance in terms of field-level F1 score, which is the harmonic mean of precision and recall for predicted segments.
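Since the excerpt defines field-level F1 as the harmonic mean of precision and recall over predicted segments, here is a minimal sketch assuming segments are exact-match (label, start, end) tuples; that representation is an illustrative assumption, not the paper's.

    # Field-level F1 sketch: harmonic mean of precision and recall over predicted
    # segments. The (label, start, end) tuple encoding is assumed for illustration.
    def field_f1(predicted, gold):
        predicted, gold = set(predicted), set(gold)
        correct = len(predicted & gold)
        precision = correct / len(predicted) if predicted else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0.0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    # Example: two of three predicted fields match the gold segmentation exactly.
    pred = {("author", 0, 4), ("title", 5, 12), ("year", 13, 14)}
    gold = {("author", 0, 4), ("title", 5, 11), ("year", 13, 14)}
    print(field_f1(pred, gold))  # precision = recall = 2/3, so F1 is about 0.667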
F1 score is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jansen, Peter and Surdeanu, Mihai and Clark, Peter
CR + LS + DMM + DPM 39.32* +24% 47.86* +20%
We adapted them to this dataset by weighting each answer by its overlap with gold answers, where overlap is measured as the highest F1 score between the candidate and a gold answer.
CR + LS + DMM + DPM 39.32* +24% 47.86* +20%
Thus, P@1 reduces to this F1 score for the top answer.
CR + LS + DMM + DPM 39.32* +24% 47.86* +20%
For example, if the best answer for a question appears at rank 2 with an F1 score of 0.3, the corresponding MRR score is 0.3 / 2.
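The three excerpts above describe overlap-weighted ranking metrics; the following is a hedged sketch that assumes token-level F1 over whitespace tokens as the overlap measure, with helper names chosen only for illustration.

    # Sketch of the overlap-weighted P@1 and MRR described above, assuming
    # candidate and gold answers are strings compared by token-level F1.
    from collections import Counter

    def token_f1(candidate, gold):
        cand, ref = Counter(candidate.split()), Counter(gold.split())
        overlap = sum((cand & ref).values())
        if overlap == 0:
            return 0.0
        precision = overlap / sum(cand.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    def answer_weight(candidate, gold_answers):
        # Each candidate is weighted by its highest F1 against any gold answer.
        return max(token_f1(candidate, g) for g in gold_answers)

    def p_at_1(ranked, gold_answers):
        # P@1 reduces to the F1 score of the top-ranked answer.
        return answer_weight(ranked[0], gold_answers)

    def weighted_rr(ranked, gold_answers):
        # If the best answer appears at rank 2 with F1 0.3, the score is 0.3 / 2.
        weights = [answer_weight(a, gold_answers) for a in ranked]
        best = max(range(len(weights)), key=weights.__getitem__)
        return weights[best] / (best + 1)

    # Example: the best-matching candidate sits at rank 2 with F1 1.0 -> 0.5.
    print(weighted_rr(["a wall", "the cell membrane"], ["the cell membrane"]))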
F1 score is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Qi and Ji, Heng
Experiments
The human F1 score on end-to-end relation extraction is only about 70%, which indicates it is a very challenging task.
Experiments
Furthermore, the F1 score of the inter-annotator agreement is 51.9%, which is only 2.4% above that of our proposed method.
Experiments
For entity mention extraction, our joint model achieved 79.7% on 5-fold cross-validation, which is comparable with the best F1 score of 79.2% reported by Florian et al. (2006) on a single fold.
F1 score is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Experiment
We can see that parsing F1 decreased by about 8.5 percentage points when using automatically assigned POS tags instead of gold-standard ones, which shows that the pipeline approach is greatly affected by the quality of its preliminary POS tagging step.
Experiment
Compared with the JointParsing system which does not employ any alignment strategy, the Padding system only achieved a slight improvement on parsing F1 score, but no improvement on POS tagging accuracy.
Experiment
In contrast, our StateAlign system achieved an improvement of 0.6% on parsing F1 score and 0.4% on POS tagging accuracy.
F1 score is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: