Coreference Subtask Analysis | The MUC scoring algorithm (Vilain et al., 1995) computes the F1 score (harmonic mean of precision and recall) based on the identification of unique coreference links.
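Coreference Subtask Analysis | As a sketch of the standard link-based MUC definitions (the notation below is ours, not the paper's): writing S_i for the key entity chains and p(S_i) for the partition of S_i induced by the response chains,

\[ R = \frac{\sum_i \bigl(|S_i| - |p(S_i)|\bigr)}{\sum_i \bigl(|S_i| - 1\bigr)} \]

Precision is obtained symmetrically by exchanging the roles of key and response, and F1 = 2PR / (P + R).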
Coreference Subtask Analysis | The B³ algorithm (Bagga and Baldwin, 1998) computes a precision and recall score for each CE:
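Coreference Subtask Analysis | The per-CE scores follow the standard B³ definition; as a sketch in our notation, with K_m the gold chain containing CE m and R_m the system chain containing it:

\[ \mathrm{Precision}(m) = \frac{|K_m \cap R_m|}{|R_m|}, \qquad \mathrm{Recall}(m) = \frac{|K_m \cap R_m|}{|K_m|} \]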
Coreference Subtask Analysis | Precision and recall for a set of documents are computed as the mean over all CEs in the documents, and the F1 score of precision and recall is reported.
Dependency-based evaluation | Overlap between the candidate bag and the reference bag is calculated in the form of precision, recall, and the F-measure (with precision and recall equally weighted).
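Dependency-based evaluation | A minimal sketch of this bag-overlap computation, assuming each dependency is represented as a hashable tuple; the function name and representation are illustrative, not the paper's:

```python
from collections import Counter

def bag_overlap_f(candidate, reference):
    """Precision, recall, and equally weighted F-measure over bags
    (multisets). `candidate` and `reference` are iterables of hashable
    items, e.g. (head, relation, dependent) triples (an assumed
    representation)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # size of the multiset intersection
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```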
Discussion and future work | Much like the inverse relation of precision and recall, changes and additions that improve a metric's correlation with human scores for model summaries often weaken its correlation for system summaries, and vice versa.
Lexical-Functional Grammar and the LFG parser | (2004) obtains high precision and recall rates. |
Fitting the Model | We also measure the quality of the two observed grammars/dictionaries by computing their precision and recall against the grammar/dictionary we observe in the gold tagging. We find that the precision of the observed grammar increases from 0.73 (EM) to 0.94 (IP+EM).
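Fitting the Model | A minimal sketch of this comparison, assuming each observed grammar is treated as a set of entry types (what counts as an entry follows the paper's definition); the names are illustrative:

```python
def grammar_precision_recall(observed, gold):
    """Precision and recall of an observed grammar against the grammar
    observed in the gold tagging, with each grammar a set of entries."""
    observed, gold = set(observed), set(gold)
    correct = len(observed & gold)  # entries attested in both grammars
    p = correct / len(observed) if observed else 0.0
    r = correct / len(gold) if gold else 0.0
    return p, r
```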
Fitting the Model | Figure 6: Comparison of observed grammars from the model tagging vs. gold tagging in terms of precision and recall measures. |
Restarts and More Data | Figure 7: Comparison of observed dictionaries from the model tagging vs. gold tagging in terms of precision and recall measures. |