Introduction | Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parsers, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature.
Reranking with SVMs 4.1 Methods | precision and recall: labeled and unlabeled precision and recall for each parser’s best parse
Reranking with SVMs 4.1 Methods | per-label precision and recall (dep): precision and recall for each type of dependency obtained from each parser’s best parse (using zero if not defined, for lack of predicted or gold dependencies with a given label)
Reranking with SVMs 4.1 Methods | n-best precision and recall (nbest): labeled and unlabeled precision and recall for each parser’s top five parses, along with the same features for the most accurate of these parses
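The feature set above can be sketched in code. This is a minimal illustration, not the authors’ implementation: the `(head, label, dependent)` triple encoding, the function names, and the feature ordering are all assumptions; the “most accurate of the top five” features are omitted for brevity.

```python
def prf(pred, gold):
    """Precision and recall of a predicted dependency set vs. the gold set."""
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r

def per_label_prf(pred, gold, labels):
    """Per-label precision/recall, using zero when a label has no
    predicted or gold dependencies (as in the 'dep' features)."""
    feats = []
    for lab in labels:
        p_lab = {d for d in pred if d[1] == lab}
        g_lab = {d for d in gold if d[1] == lab}
        feats.extend(prf(p_lab, g_lab))
    return feats

def reranker_features(nbest, gold, labels, model_score):
    """Assemble one realization's feature vector: best-parse labeled and
    unlabeled P/R, per-label P/R, top-five P/R, and the normalized
    realizer model score."""
    feats = []
    best = nbest[0]
    feats.extend(prf(best, gold))                    # labeled P/R, best parse
    unlab = lambda deps: {(h, m) for h, _, m in deps}
    feats.extend(prf(unlab(best), unlab(gold)))      # unlabeled P/R
    feats.extend(per_label_prf(best, gold, labels))  # per-label 'dep' features
    for parse in nbest[:5]:                          # 'nbest' features
        feats.extend(prf(parse, gold))
    feats.append(model_score)
    return feats
```

In practice one such vector would be computed per parser and the vectors concatenated before training the SVM.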
Evaluation of lexical similarity in context | Figure 3 shows the influence of the threshold value used to select relevant pairs: the precision and recall of the pairs kept at each threshold, evaluated against the human annotation of relevance in context.
Evaluation of lexical similarity in context | If one wants to optimize the F-score (the harmonic mean of precision and recall) when extracting relevant pairs, the optimum is an F-score of .24 at a threshold of .22 on Lin’s score.
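The threshold selection described here amounts to sweeping candidate cutoffs on the similarity score and keeping the one that maximizes F-score. A minimal sketch, assuming boolean gold relevance labels (the function name is illustrative):

```python
def best_threshold(scores, labels):
    """Sweep candidate thresholds on a similarity score (e.g. Lin's score)
    and return the (threshold, F1) pair maximizing the F-score of the
    pairs kept at that threshold."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        kept = [(s >= t, y) for s, y in zip(scores, labels)]
        tp = sum(1 for k, y in kept if k and y)
        fp = sum(1 for k, y in kept if k and not y)
        fn = sum(1 for k, y in kept if not k and y)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best
```

Only the observed score values need to be tried as thresholds, since F-score can change only at those points.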
Experiments: predicting relevance in context | Figure 3: Precision and recall on relevant links with respect to a threshold on the similarity measure (Lin’s score) |
Experiments: predicting relevance in context | We have seen that the relevant/not-relevant classification is heavily imbalanced, biased towards the “not relevant” category (about 11%/89%), so we applied methods designed to counterbalance this, and we will focus on the precision and recall of the predicted relevant links.
Experiments: predicting relevance in context | Other popular methods (maximum entropy, SVM) yielded slightly lower combined F-scores, although their precision and recall vary more widely.
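One standard way to counterbalance such an 11%/89% skew is to weight each class inversely to its frequency (for instance via the `class_weight` option of an SVM implementation). A small sketch of the usual “balanced” heuristic; the function name is illustrative:

```python
from collections import Counter

def balanced_weights(labels):
    """Class weights inversely proportional to class frequency,
    using the common heuristic n / (n_classes * count(c)), so the
    minority 'relevant' class receives a proportionally larger weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}
```

With roughly 11% relevant examples, the relevant class gets a weight about eight times that of the not-relevant class, pushing the classifier away from always predicting the majority label.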
Conclusion | We also present results on the precision and recall tradeoffs inherent in this task. |
Experiments | We note that previous work achieves higher ancestor precision, while our approach achieves a more even balance between precision and recall.
Experiments | Of course, precision and recall should both ideally be high, even if some applications weigh one over the other. |
Introduction | We note that our approach falls at a different point in the space of performance tradeoffs from past work — by producing complete, highly articulated trees, we naturally see a more even balance between precision and recall, while past work generally focused on precision.¹
Introduction | ¹While different applications will value precision and recall differently, and past work was often intentionally precision-focused, it is certainly the case that an ideal solution would maximize both.
Comparative Evaluation | MENTA is the closest system to ours, obtaining slightly higher precision and recall.
Comparative Evaluation | Notably, however, MENTA outputs the first WordNet sense of entity for 13.17% of all the given answers, which, despite being correct and counted toward precision and recall, is uninformative.
Phase 1: Inducing the Page Taxonomy | Not only does our taxonomy show high precision and recall in extracting ambiguous hypernyms, it also disambiguates more than 3/4 of the hypernyms with high precision. |
Experiments | Figure 4: Labeled precision and recall relative to dependency length when the normal-form model is used.
Experiments | Table 1 shows the accuracies of all parsers on the development set, in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank. |
Experiments | To probe this further we compare labeled precision and recall relative to dependency length, as measured by the distance between the two words in a dependency, grouped into bins of 5 values. |
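The binned comparison described here can be sketched as follows. This is an illustrative reconstruction, not the paper’s code: dependencies are assumed to be `(head_index, label, dependent_index)` triples, and lengths 1–5 are assumed to share the first bin.

```python
from collections import defaultdict

def binned_prf(pred, gold, bin_size=5):
    """Labeled precision and recall per dependency-length bin.
    Length is the distance between the two words in a dependency,
    grouped into bins of `bin_size` values (lengths 1-5 -> bin 0, etc.)."""
    def by_bin(deps):
        bins = defaultdict(set)
        for h, lab, m in deps:
            bins[(abs(h - m) - 1) // bin_size].add((h, lab, m))
        return bins

    pb, gb = by_bin(pred), by_bin(gold)
    out = {}
    for b in set(pb) | set(gb):
        tp = len(pb[b] & gb[b])
        p = tp / len(pb[b]) if pb[b] else 0.0
        r = tp / len(gb[b]) if gb[b] else 0.0
        out[b] = (p, r)
    return out
```

Since a predicted dependency can only match a gold dependency with the same head and dependent indices, both always fall in the same bin, so per-bin true-positive counts are well defined.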