Index of papers in Proc. ACL that mention
  • precision and recall
Zeller, Britta and Šnajder, Jan and Padó, Sebastian
Building the Resource
We then revised the rules with the aim of increasing both precision and recall.
Evaluation
To obtain reliable estimates of both precision and recall, we decided to draw two different samples: (1) a sample of lemma pairs sampled from the induced derivational families, on which we estimate precision (P-sample), and (2) a sample of lemma pairs sampled from the set of possibly derivationally related lemma pairs, on which we estimate recall (R-sample).
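The two-sample estimation scheme in the excerpt above can be sketched as follows. This is an illustrative reconstruction, not code from the paper; `induced_pairs` and `gold_related` are hypothetical stand-ins for the induced families and an oracle for derivational relatedness.

```python
import random

def estimate_precision_recall(induced_pairs, gold_related,
                              p_sample_size, r_sample_size, seed=0):
    """Estimate precision on a sample of induced pairs (P-sample) and
    recall on a sample of gold-related pairs (R-sample).
    Both input sets are assumed non-empty."""
    rng = random.Random(seed)
    p_sample = rng.sample(sorted(induced_pairs),
                          min(p_sample_size, len(induced_pairs)))
    r_sample = rng.sample(sorted(gold_related),
                          min(r_sample_size, len(gold_related)))
    # Precision: fraction of sampled induced pairs that are truly related.
    precision = sum(pair in gold_related for pair in p_sample) / len(p_sample)
    # Recall: fraction of sampled related pairs the induction recovered.
    recall = sum(pair in induced_pairs for pair in r_sample) / len(r_sample)
    return precision, recall
```

Sampling the two quantities from different populations is what makes a single combined F1 hard to interpret, as a later excerpt notes.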
Evaluation
Table 5: Precision and recall on test samples
Introduction
We conduct a thorough evaluation of the induced derivational families regarding both precision and recall.
Results
We omit the F1 score because its use for precision and recall estimates from different samples is unclear.
Results
The string distance-based approaches achieve more balanced precision and recall scores.
Results
Note that for these methods, precision and recall can be traded off against each other by varying the number of clusters; we chose the number of clusters by optimizing the F1 score on the calibration and validation sets.
precision and recall is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Abstract
We attempt to tease apart the effects that this simple but effective modification has on alignment precision and recall tradeoffs, and how rare and common words are affected across several language pairs.
Word alignment results
Consequently, in addition to AER, we focus on precision and recall.
Word alignment results
Figure 3 shows the change in precision and recall with the amount of provided training data for the Hansards corpus.
Word alignment results
We see that agreement constraints improve both precision and recall when we
precision and recall is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Muller, Philippe and Fabre, Cécile and Adam, Clémentine
Evaluation of lexical similarity in context
Figure 3 shows the influence of the threshold value to select relevant pairs, when considering precision and recall of the pairs that are kept when choosing the threshold, evaluated against the human annotation of relevance in context.
Evaluation of lexical similarity in context
In case one wants to optimize the F-score (the harmonic mean of precision and recall) when extracting relevant pairs, we can see that the optimal point is at .24 for a threshold of .22 on Lin’s score.
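The threshold sweep described above can be sketched as follows (illustrative Python, not from the paper; `scored_pairs` maps word pairs to a Lin-style similarity score and `relevant` is the set of pairs annotated as relevant in context):

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def best_threshold(scored_pairs, relevant, thresholds):
    """Keep pairs whose score clears each candidate threshold and return
    the (F1, threshold) pair maximizing F1 against the gold annotation."""
    best = (-1.0, None)
    for t in thresholds:
        kept = {pair for pair, score in scored_pairs.items() if score >= t}
        if not kept or not relevant:
            continue
        p = len(kept & relevant) / len(kept)
        r = len(kept & relevant) / len(relevant)
        best = max(best, (f_score(p, r), t))
    return best
```

Raising the threshold trades recall for precision; the sweep simply picks the point where their harmonic mean peaks.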
Experiments: predicting relevance in context
Figure 3: Precision and recall on relevant links with respect to a threshold on the similarity measure (Lin’s score)
Experiments: predicting relevance in context
We have seen that the relevant/not relevant classification is very imbalanced, biased towards the “not relevant” category (about 11%/89%), so we applied methods dedicated to counterbalance this, and will focus on the precision and recall of the predicted relevant links.
Experiments: predicting relevance in context
Other popular methods (maximum entropy, SVM) have shown slightly inferior combined F-score, even though precision and recall might yield more important variations.
precision and recall is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Duan, Manjuan and White, Michael
Introduction
Therefore, to develop a more nuanced self-monitoring reranker that is more robust to such parsing mistakes, we trained an SVM using dependency precision and recall features for all three parses, their n-best parsing results, and per-label precision and recall for each type of dependency, together with the realizer’s normalized perceptron model score as a feature.
Reranking with SVMs 4.1 Methods
precision and recall: labeled and unlabeled precision and recall for each parser’s best parse
Reranking with SVMs 4.1 Methods
per-label precision and recall (dep): precision and recall for each type of dependency obtained from each parser’s best parse (using zero if not defined for lack of predicted or gold dependencies with a given label)
Reranking with SVMs 4.1 Methods
n-best precision and recall (nbest): labeled and unlabeled precision and recall for each parser’s top five parses, along with the same features for the most accurate of these parses
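The labeled and unlabeled dependency precision/recall features above can be computed as in this minimal sketch (illustrative Python, not the paper's code; dependencies are assumed to be (head, dependent, label) triples):

```python
def dep_prf(pred, gold, labeled=True):
    """Precision and recall over dependency sets.
    Labeled: compare (head, dependent, label) triples.
    Unlabeled: compare only (head, dependent) pairs."""
    strip = (lambda d: d) if labeled else (lambda d: d[:2])
    p_set = {strip(d) for d in pred}
    g_set = {strip(d) for d in gold}
    tp = len(p_set & g_set)
    precision = tp / len(p_set) if p_set else 0.0
    recall = tp / len(g_set) if g_set else 0.0
    return precision, recall
```

A pair like these (one labeled, one unlabeled) per parse would feed the SVM as two features.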
precision and recall is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Baldridge, Jason and Knight, Kevin
Conclusion
Table 6: Comparison of grammar/lexicon observed in the model tagging vs. gold tagging in terms of precision and recall measures for supertagging on CCG-TUT.
Experiments
Precision and recall of grammar and lexicon.
Experiments
Table 3: Comparison of grammar/lexicon observed in the model tagging vs. gold tagging in terms of precision and recall measures for supertagging on CCGbank data.
Experiments
We can obtain a more fine-grained understanding of how the models differ by considering the precision and recall values for the grammars and lexicons of the different models, given in Table 3.
precision and recall is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Abstract
Experimental results show our method can achieve high precision and recall in measure word generation.
Conclusion and Future Work
Experimental results show that our method not only achieves high precision and recall for generating measure words, but also improves the quality of English-to-Chinese SMT systems.
Experiments
Table 3 and Table 4 show the precision and recall of our measure word generation method.
Experiments
In addition to precision and recall, we also evaluate the Bleu score (Papineni et al., 2002) changes before and after applying our measure word generation method to the SMT output.
Experiments
Table 6 and Table 7 show the precision and recall when using different features.
precision and recall is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Park, Keun Chan and Jeong, Yoonjae and Myaeng, Sung Hyon
Conclusion and Future Work
For experience detection, the performance was very promising, close to 92% in precision and recall when all the features were used.
Experience Detection
We not only compared our results with the baseline in terms of precision and recall but also
Experience Detection
The performance for the best case with all the features included is very promising, close to 92% precision and recall.
Experience Detection
In order to see the effect of including individual features in the feature set, precision and recall were measured after eliminating a particular feature from the full set.
Lexicon Construction
Note that the precision and recall are macro-averaged values across the two classes, activity and state.
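Macro-averaging over the two classes works as in this sketch (illustrative Python, not from the paper; the per-class (TP, FP, FN) counts are hypothetical):

```python
def macro_prf(per_class_counts):
    """Macro-average precision and recall over classes, given
    per-class (tp, fp, fn) counts, e.g. for 'activity' and 'state'.
    Each class contributes equally, regardless of its frequency."""
    ps, rs = [], []
    for tp, fp, fn in per_class_counts.values():
        ps.append(tp / (tp + fp) if tp + fp else 0.0)
        rs.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ps) / len(ps), sum(rs) / len(rs)
```

Unlike micro-averaging, this does not let the larger class dominate the score.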
precision and recall is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Ling, Xiao and Zettlemoyer, Luke and Weld, Daniel S.
Experimental Setup
Aggregate Extraction Let A6 be the set of extracted relations for any of the systems; we compute aggregate precision and recall by comparing A6 with A.
Experimental Setup
We then report precision and recall for each system on this set of sampled sentences.
Experiments
Since the data contains an unbalanced number of instances of each relation, we also report precision and recall for each of the ten most frequent relations.
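A per-relation breakdown like the one described above can be sketched as follows (illustrative Python, not the paper's code; relation instances are assumed to be (relation, arg1, arg2) tuples):

```python
from collections import defaultdict

def per_relation_pr(extracted, gold):
    """Precision and recall per relation name, from sets of
    (relation, arg1, arg2) tuples."""
    by_rel = defaultdict(lambda: [set(), set()])
    for t in extracted:
        by_rel[t[0]][0].add(t)
    for t in gold:
        by_rel[t[0]][1].add(t)
    out = {}
    for rel, (e, g) in by_rel.items():
        tp = len(e & g)
        out[rel] = (tp / len(e) if e else 0.0,
                    tp / len(g) if g else 0.0)
    return out
```

Reporting the most frequent relations separately, as the excerpt describes, then reduces to sorting this dictionary by gold-instance count.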
Experiments
Table 1 presents this approximate precision and recall for MULTIR on each of the relations, along with statistics we computed to measure the quality of the weak supervision.
Introduction
use supervised learning of relation-specific examples, which can achieve high precision and recall.
Related Work
While they offer high precision and recall, these methods are unlikely to scale to the thousands of relations found in text on the Web.
precision and recall is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Burkett, David and de Melo, Gerard and Klein, Dan
Conclusion
We also present results on the precision and recall tradeoffs inherent in this task.
Experiments
We note that previous work achieves higher ancestor precision, while our approach achieves a more even balance between precision and recall .
Experiments
Of course, precision and recall should both ideally be high, even if some applications weigh one over the other.
Introduction
We note that our approach falls at a different point in the space of performance tradeoffs from past work — by producing complete, highly articulated trees, we naturally see a more even balance between precision and recall, while past work generally focused on precision.1
Introduction
1While different applications will value precision and recall differently, and past work was often intentionally precision-focused, it is certainly the case that an ideal solution would maximize both.
precision and recall is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Hoffmann, Raphael and Zhang, Congle and Weld, Daniel S.
Conclusion
Many researchers are trying to use IE to create large-scale knowledge bases from natural language text on the Web, but existing relation-specific techniques do not scale to the thousands of relations encoded in Web text — while relation-independent techniques suffer from lower precision and recall, and do not canonicalize the relations.
Extraction with Lexicons
We expect that lists with higher similarity are more likely to contain phrases which are related to our seeds; hence, by varying the similarity threshold one may produce lexicons representing different compromises between lexicon precision and recall .
Introduction
Open extraction is more scalable, but has lower precision and recall.
Related Work
Open IE, self-supervised learning of unlexicalized, relation-independent extractors (Banko et al., 2007), is a more scalable approach, but suffers from lower precision and recall, and doesn’t canonicalize the relations.
Related Work
The goal of set expansion techniques is to generate high precision sets of related items; hence, these techniques are evaluated based on lexicon precision and recall .
precision and recall is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
Abstract
In this work, the problem of extracting phrase translation is formulated as an information retrieval process implemented with a log-linear model aiming for a balanced precision and recall.
Conclusions
In this paper, the problem of extracting phrase translation is formulated as an information retrieval process implemented with a log-linear model aiming for a balanced precision and recall.
Discussions
The generic phrase training algorithm follows an information retrieval perspective as in (Venugopal et al., 2003) but aims to improve both precision and recall with the trainable log-linear model.
Discussions
It implies a balancing process between precision and recall.
Introduction
As in information retrieval, precision and recall issues need to be addressed with a right balance for building a phrase translation table.
precision and recall is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wu, Fei and Weld, Daniel S.
Abstract
This paper presents WOE, an open IE system which improves dramatically on TextRunner’s precision and recall.
Abstract
WOE can operate in two modes: when restricted to POS tag features, it runs as quickly as TextRunner, but when set to use dependency-parse features its precision and recall rise even higher.
Introduction
high precision and recall, they are limited by the availability of training data and are unlikely to scale to the thousands of relations found in text on the Web.
Introduction
WOE can operate in two modes: when restricted to shallow features like part-of-speech (POS) tags, it runs as quickly as Textrunner, but when set to use dependency-parse features its precision and recall rise even higher.
precision and recall is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kobdani, Hamidreza and Schuetze, Hinrich and Schiehlen, Michael and Kamp, Hans
Results and Discussion
Both precision and recall are improved with two exceptions: recall of B3 decreases from line 2 to 3 and from 15 to 16.
Results and Discussion
In contrast to F1, there is no consistent trend for precision and recall .
Results and Discussion
But this higher variability for precision and recall is to be expected since every system trades the two measures off differently.
precision and recall is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Habash, Nizar and Roth, Ryan
Results
We present the results in terms of F-score only for simplicity; we then conduct an error analysis that examines precision and recall .
Results
We consider the performance in terms of precision and recall in addition to F-score — see Table 7 (a).
Results
Overall, there is no major tradeoff between precision and recall across the different settings; although we can observe the following: (i) adding more training data helps precision more than recall (over three times more) — compare the last two columns in Table 7 (a); and (ii) the best setting has a slightly lower precision than all features, although a much better recall — compare columns 4 and 5 in Table 7 (a).
precision and recall is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
do not report (NR) separate values for precision and recall on this dataset.
Introduction
Differences in both precision and recall between the baseline and the other systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
Introduction
Differences in both precision and recall between the baseline and the Span-HMM systems are statistically significant at p < 0.01 using the two-tailed Fisher’s exact test.
precision and recall is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Stoyanov, Veselin and Gilbert, Nathan and Cardie, Claire and Riloff, Ellen
Coreference Subtask Analysis
The MUC scoring algorithm (Vilain et al., 1995) computes the F1 score (harmonic mean) of precision and recall based on the identification of unique coreference links.
Coreference Subtask Analysis
The B3 algorithm (Bagga and Baldwin, 1998) computes a precision and recall score for each CE:
Coreference Subtask Analysis
Precision and recall for a set of documents are computed as the mean over all CEs in the documents and the F1 score of precision and recall is reported.
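The per-mention B3 computation described in these excerpts can be illustrated as follows (a minimal sketch, not from the papers; chains are sets of mention ids, a mention absent from the response is treated as a singleton, and for simplicity both scores are averaged over the gold mentions):

```python
def b_cubed(response_chains, key_chains):
    """B3 sketch: for each mention, precision is the fraction of its
    response chain that is also in its key chain, and recall is the
    fraction of its key chain recovered; both are averaged over mentions."""
    def chain_of(mention, chains):
        for c in chains:
            if mention in c:
                return c
        return {mention}  # unclustered mention acts as a singleton
    mentions = set().union(*key_chains)
    ps, rs = [], []
    for m in mentions:
        r_chain = chain_of(m, response_chains)
        k_chain = chain_of(m, key_chains)
        overlap = len(r_chain & k_chain)
        ps.append(overlap / len(r_chain))
        rs.append(overlap / len(k_chain))
    return sum(ps) / len(ps), sum(rs) / len(rs)
```

The F1 of the two returned means is then reported, as the excerpt states.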
precision and recall is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Macherey, Klaus
Conclusion
The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments.
Experimental Results
The bidirectional model improves both precision and recall relative to all heuristic combination techniques, including grow-diag-final (Koehn et al., 2003).
Experimental Results
As our model only provides small improvements in alignment precision and recall for the union combiner, the magnitude of the BLEU improvement is not surprising.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Spiegler, Sebastian and Flach, Peter A.
Experiments and Results
It seems that the model gets quickly saturated in terms of incorporating new information and therefore precision and recall do not drastically change for increasing dataset sizes.
Experiments and Results
For this reason we broke down the summary measures of precision and recall into their original components: true/false positive (TP/FP) and negative (TN/FN) counts presented in the 2 × 2 contingency table of Figure 1.
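Recovering the summary measures from those contingency counts is straightforward (a minimal sketch, not the paper's code; note that TN plays no role in either measure):

```python
def pr_from_counts(tp, fp, fn):
    """Precision and recall from 2x2 contingency-table counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Working from the raw counts, as the excerpt does, makes it visible whether a change in the summary measures came from gained true positives or merely from fewer predictions.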
Experiments and Results
The optimal solution, obtained at a parameter value of 0.38, is more balanced between precision and recall and
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ritter, Alan and Mausam and Etzioni, Oren
Experiments
In figure 5 we compare the precision and recall of LDA-SP against the top two performing systems described by Pantel et al.
Experiments
We find that LDA-SP achieves both higher precision and recall than ISP.IIM-∨.
Experiments
Figure 5: Precision and recall on the inference filtering task.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ponvert, Elias and Baldridge, Jason and Erk, Katrin
CD
While the first level of constituent analysis has high precision and recall on NPs, the second level often does well finding prepositional phrases (PPs), especially in WSJ; see Table 7.
Phrasal punctuation revisited
The table shows absolute improvement (+) or decline (−) in precision and recall when phrasal punctuation is removed from the data.
Tasks and Benchmark
It measures precision and recall on constituents produced by a parser as compared to gold standard constituents.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Larkin, Samuel
BLEU and PORT
2.2.1 Precision and Recall
BLEU and PORT
To combine precision and recall, we tried four averaging methods: arithmetic (A), geometric (G), harmonic (H), and quadratic (Q) mean.
BLEU and PORT
We chose the quadratic mean to combine precision and recall, as follows:
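The four candidate combinations can be sketched as follows (illustrative Python, not the PORT implementation). For any p ≠ r they order as quadratic > arithmetic > geometric > harmonic, so the quadratic mean penalizes an imbalance between the two the least:

```python
import math

def combine_pr(p, r, method="Q"):
    """Combine precision and recall by arithmetic (A), geometric (G),
    harmonic (H), or quadratic (Q) mean."""
    if method == "A":
        return (p + r) / 2
    if method == "G":
        return math.sqrt(p * r)
    if method == "H":
        return 2 * p * r / (p + r) if p + r else 0.0
    if method == "Q":
        return math.sqrt((p * p + r * r) / 2)
    raise ValueError(method)
```

The familiar F1 is the harmonic case; the quadratic mean sits at the opposite end of this family.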
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Elson, David and Dames, Nicholas and McKeown, Kathleen
Extracting Conversational Networks from Literature
The precision and recall of our method for detecting conversations is shown in Table 2.
Extracting Conversational Networks from Literature
To calculate precision and recall for the two baseline social networks, we set a threshold t to derive a binary prediction from the continuous edge weights.
Extracting Conversational Networks from Literature
The precision and recall values shown for the baselines in Table 2 represent the highest performance we achieved by varying t between 0 and 1 (maximizing F-measure over t).
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Fitting the Model
We also measure the quality of the two observed grammars/dictionaries by computing their precision and recall against the grammar/dictionary we observe in the gold tagging.4 We find that precision of the observed grammar increases from 0.73 (EM) to 0.94 (IP+EM).
Fitting the Model
Figure 6: Comparison of observed grammars from the model tagging vs. gold tagging in terms of precision and recall measures.
Restarts and More Data
Figure 7: Comparison of observed dictionaries from the model tagging vs. gold tagging in terms of precision and recall measures.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Flati, Tiziano and Vannella, Daniele and Pasini, Tommaso and Navigli, Roberto
Comparative Evaluation
MENTA is the closest system to ours, obtaining slightly higher precision and recall .
Comparative Evaluation
Notably, however, MENTA outputs the first WordNet sense of entity for 13.17% of all the given answers, which, despite being correct and accounted for in precision and recall, is uninformative.
Phase 1: Inducing the Page Taxonomy
Not only does our taxonomy show high precision and recall in extracting ambiguous hypernyms, it also disambiguates more than 3/4 of the hypernyms with high precision.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Owczarzak, Karolina
Dependency-based evaluation
Overlap between the candidate bag and the reference bag is calculated in the form of precision, recall, and the f-measure (with precision and recall equally weighted).
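The bag-overlap computation described above can be sketched as follows (illustrative Python, not from the paper; the items stand in for dependency triples, and bag intersection takes the minimum count per item):

```python
from collections import Counter

def bag_prf(candidate, reference):
    """Precision, recall, and equally weighted f-measure from the
    overlap of two bags (multisets) of items."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum((cand & ref).values())  # min count per shared item
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

Using bags rather than sets means a triple occurring twice in the candidate but once in the reference is only credited once.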
Discussion and future work
Much like the inverse relation of precision and recall , changes and additions that improve a metric’s correlation with human scores for model summaries often weaken the correlation for system summaries, and vice versa.
Lexical-Functional Grammar and the LFG parser
(2004) obtains high precision and recall rates.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xu, Wenduan and Clark, Stephen and Zhang, Yue
Experiments
Figure 4: Labeled precision and recall relative to dependency length; the normal-form model is used.
Experiments
Table 1 shows the accuracies of all parsers on the development set, in terms of labeled precision and recall over the predicate-argument dependencies in CCGBank.
Experiments
To probe this further we compare labeled precision and recall relative to dependency length, as measured by the distance between the two words in a dependency, grouped into bins of 5 values.
precision and recall is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: