Experiment and Discussion | Table 1 shows the micro and macro averages of F1 scores. |
Experiment and Discussion | Moreover, the macro-averaged F1 scores clearly showed improvements resulting from using role groups. |
Experiment and Discussion | In Table 2, we show that the micro-averaged F1 score for roles with 10 or fewer instances improved by 15.46 points when all role groups were used. |
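To make the distinction between the two averages concrete, the following is a minimal sketch, assuming a single-label classification setting in which each instance has one gold role and one predicted role. The function names `f1` and `micro_macro_f1` and the role labels are hypothetical, and the evaluation setup behind the tables may differ.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there are no counts at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """Return (micro-averaged F1, macro-averaged F1) over all labels seen.

    Assumption: gold and pred are equal-length sequences of single labels.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1  # the gold label was missed
            fp[p] += 1  # the predicted label was wrong
    labels = set(gold) | set(pred)
    if not labels:
        return 0.0, 0.0
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro


# Hypothetical data: one frequent role and one rare role that is always missed.
gold = ["Agent", "Agent", "Agent", "Agent", "Instrument"]
pred = ["Agent", "Agent", "Agent", "Agent", "Agent"]
micro, macro = micro_macro_f1(gold, pred)
```

Because the macro average weights every label equally, a rare role that is never predicted correctly lowers it far more than it lowers the micro average, which is why gains on low-frequency roles tend to show up most clearly in macro-averaged scores.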
Coreference Subtask Analysis | The MUC scoring algorithm (Vilain et al., 1995) computes the F1 score (harmonic mean) of precision and recall based on the identification of unique coreference links. |
Coreference Subtask Analysis | Precision and recall for a set of documents are computed as the mean over all CEs in the documents, and the F1 score of this precision and recall is reported. |
Resolution Complexity | We then count the number of unique correct/incorrect links that the system introduced on top of the correct partial clustering and compute precision, recall, and F1 score. |
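The link-counting step can be illustrated with a minimal sketch, assuming links are unordered pairs of mention identifiers and that a system link is correct when the same pair appears among the gold links. This is a simplified pairwise comparison for illustration only, not the MUC scorer itself; the function name `link_prf` and the mention identifiers are hypothetical.

```python
def link_prf(gold_links, system_links):
    """Precision, recall, and F1 over unique, unordered coreference links.

    Assumption: each link is a pair of mention identifiers, and (a, b)
    is the same link as (b, a).
    """
    gold = {frozenset(link) for link in gold_links}
    system = {frozenset(link) for link in system_links}
    correct = len(gold & system)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical mentions m1..m4: one system link is correct, one is not.
gold_links = [("m1", "m2"), ("m2", "m3")]
system_links = [("m2", "m1"), ("m3", "m4")]
precision, recall, f1 = link_prf(gold_links, system_links)
```

Storing each link as a `frozenset` removes both duplicates and ordering, so only unique links are counted.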