Index of papers in Proc. ACL 2013 that mention
  • F-measure
Darwish, Kareem
Cross-lingual Features
This led to an overall improvement in F-measure of 1.8 to 3.4 points (absolute) or 4.2% to 5.7% (relative).
Cross-lingual Features
…ture, transliteration mining slightly lowered precision (except for the TWEETS test set, where the drop in precision was significant) and increased recall, leading to an overall improvement in F-measure for all test sets.
Related Work
They reported 80%, 37%, and 47% F-measure for locations, organizations, and persons respectively on the ANERCORP dataset that they created and publicly released.
Related Work
They reported 87%, 46%, and 52% F-measure for locations, organizations, and persons respectively.
Related Work
Using POS tagging generally improved recall at the expense of precision, leading to overall improvements in F-measure.
F-measure is mentioned in 16 sentences in this paper.
Volkova, Svitlana and Wilson, Theresa and Yarowsky, David
Lexicon Bootstrapping
The set of parameters is optimized using a grid search on the development data, with F-measure for subjectivity classification as the objective.
Lexicon Evaluations
… and F-measure results.
Lexicon Evaluations
For polarity classification we get comparable F-measure but much higher recall for Lg compared to SWN.
Lexicon Evaluations
Figure 1: Precision (x-axis), recall (y-axis) and F-measure (in the table) for English: L?
F-measure is mentioned in 7 sentences in this paper.
Guinaudeau, Camille and Strube, Michael
Experiments
Since the model can give the same score for a permutation and the original document, we also compute F-measure, where recall = correct/total and precision = correct/decisions.
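As a worked illustration of these definitions (not the authors' code), here is a minimal sketch, assuming a hypothetical coherence scorer score(doc) and a list of (original, permutation) document pairs:

```python
def evaluate(pairs, score):
    """Permutation-ranking evaluation: the original document should
    outscore its permutation. Ties count against recall but are not
    counted as decisions, so they do not hurt precision."""
    total = len(pairs)
    decisions = 0   # comparisons where the model commits (no tie)
    correct = 0     # original document scored strictly higher
    for original, permuted in pairs:
        s_orig, s_perm = score(original), score(permuted)
        if s_orig != s_perm:
            decisions += 1
            if s_orig > s_perm:
                correct += 1
    recall = correct / total
    precision = correct / decisions if decisions else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```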
Experiments
For evaluation purposes, the accuracy still corresponds to the number of correct ratings divided by the number of comparisons, while the F-measure combines recall and precision.
Experiments
Moreover, in contrast to the first experiment, when accounting for the number of entities “shared” by two sentences (PW), values of accuracy and F-measure are lower.
F-measure is mentioned in 6 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Conclusions
Table 3: Classification performance in F-measure for semantically ambiguous words on the most frequently confused descriptive tags in the movie domain.
Experiments
[table header fragment: F-Measure]
Experiments
Figure 2: F-measure for semantic clustering performance.
Experiments
As expected, we see a drop in F-measure on all models on descriptive tags.
F-measure is mentioned in 5 sentences in this paper.
Chen, Ruey-Cheng
Concluding Remarks
Using MDL alone, one proposed method outperforms the original regularized compressor (Chen et al., 2012) by 2 percentage points in precision and by 1 point in F-measure.
Evaluation
Segmentation performance is measured using word-level precision (P), recall (R), and F-measure (F).
Evaluation
We found that, in all three settings, G2 outperforms the baseline by 1 to 2 percentage points in F-measure.
Evaluation
The best result achieved by G2 in our experiment is 81.7 word-level F-measure, although this was obtained in search setting (c), using a heuristic p value of 0.37.
F-measure is mentioned in 5 sentences in this paper.
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Abstract
This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner.
Conclusion
We also proposed a model that scores alignments given source and target sentence reorderings, improving a supervised alignment model by 2.6 points in f-Measure.
Results and Discussions
[table header fragment: Type | f-Measure (words)]
Results and Discussions
The f-Measure of this aligner is 78.1% (see row 1, column 2).
Results and Discussions
Method                       f-Measure   mBLEU
Base Correction model        78.1        55.1
Correction model, C(f′|a)    78.1        56.4
P(a|f′), C(f′|a)             80.7        57.6
F-measure is mentioned in 5 sentences in this paper.
Mohtarami, Mitra and Lan, Man and Tan, Chew Lim
Analysis and Discussions
[figure axis label: F-Measure (SO Prediction)]
Analysis and Discussions
[figure axis label: F-Measure (IQAPs Inference)]
Analysis and Discussions
[figure axis label: F-Measure]
Related Works
[figure axis label: F-Measure]
F-measure is mentioned in 4 sentences in this paper.
Aker, Ahmet and Paramita, Monica and Gaizauskas, Rob
Abstract
We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs.
Experiments 5.1 Data Sources
To test the classifier’s performance we evaluated it against a list of positive and negative examples of bilingual term pairs using the measures of precision, recall and F-measure.
Method
First, we evaluate the performance of the classifier on a held-out term-pair list from EUROVOC using the standard measures of recall, precision and F-measure .
F-measure is mentioned in 3 sentences in this paper.
Baldwin, Tyler and Li, Yunyao and Alexe, Bogdan and Stanoi, Ioana R.
Abstract
Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above the baseline.
Experimental Evaluation
Of the three individual modules, the n-gram and clustering methods achieve an F-measure of around 0.9, while the ontology-based module performs only modestly above the baseline.
Experimental Evaluation
The final system that employed all modules produced an F-measure of 0.960, a significant (p < 0.01) absolute increase of 15.4% over the baseline.
F-measure is mentioned in 3 sentences in this paper.
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
Experiments
Each learner uses a small amount of development data to tune a threshold on scores for predicting new-sense or not-a-new-sense, using macro F-measure as an objective.
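A hedged sketch of what such threshold tuning might look like: sweep candidate thresholds on development scores and keep the one that maximizes macro F-measure. Here `dev` maps each word type to (score, is_new_sense) pairs; all names are hypothetical, not taken from the paper.

```python
from statistics import mean

def f1(tp, fp, fn):
    # Standard F1 from true-positive, false-positive, false-negative counts.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f(dev, threshold):
    # Compute F1 per word type, then average across types (macro).
    per_type = []
    for pairs in dev.values():
        tp = sum(1 for s, gold in pairs if s >= threshold and gold)
        fp = sum(1 for s, gold in pairs if s >= threshold and not gold)
        fn = sum(1 for s, gold in pairs if s < threshold and gold)
        per_type.append(f1(tp, fp, fn))
    return mean(per_type)

def tune_threshold(dev, candidates):
    # Grid search: pick the threshold with the best macro F on dev data.
    return max(candidates, key=lambda t: macro_f(dev, t))
```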
Experiments
… are relatively weak for predicting new senses on EMEA data but stronger on Subs (TYPEONLY AUC performance is higher than both baselines) and even stronger on Science data (TYPEONLY AUC and f-measure performance is higher than both baselines as well as the ALLFEATURES model).
Experiments
Recall that the micro-level evaluation computes precision, recall, and f-measure for all word tokens of a given word type and then averages across word types.
F-measure is mentioned in 3 sentences in this paper.
Jiang, Wenbin and Sun, Meng and Lü, Yajuan and Yang, Yating and Liu, Qun
Experiments
The performance measure for word segmentation is balanced F-measure, F = 2PR/(P + R), a function of precision P and recall R, where P is the percentage of words in the segmentation output that are segmented correctly, and R is the percentage of gold-standard words that are correctly segmented.
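To make the formula concrete, here is a minimal sketch (not the authors' implementation) that computes balanced F-measure for a predicted segmentation against a gold segmentation of the same character sequence; the helper names are hypothetical.

```python
def spans(words):
    # Convert a word sequence to (start, end) character spans, so a word
    # counts as correct only if both its boundaries match the gold.
    out, pos = set(), 0
    for w in words:
        out.add((pos, pos + len(w)))
        pos += len(w)
    return out

def segmentation_f(predicted, gold):
    pred_spans, gold_spans = spans(predicted), spans(gold)
    correct = len(pred_spans & gold_spans)
    p = correct / len(pred_spans)   # fraction of predicted words that are correct
    r = correct / len(gold_spans)   # fraction of gold words recovered
    return 2 * p * r / (p + r) if p + r else 0.0

# Example: one of three predicted words matches one of two gold words,
# so P = 1/3, R = 1/2, and F = 0.4.
print(segmentation_f(["ab", "c", "de"], ["ab", "cde"]))
```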
Experiments
… Wikipedia brings an F-measure increment of 0.93 points.
Introduction
Experimental results show that the knowledge implied in the natural annotations can significantly improve the performance of a baseline segmenter trained on CTB 5.0, with an F-measure increment of 0.93 points on the CTB test set and an average increment of 1.53 points on 7 other domains.
F-measure is mentioned in 3 sentences in this paper.