Cross-lingual Features | This led to an overall improvement in F-measure of 1.8 to 3.4 points (absolute) or 4.2% to 5.7% (relative). |
Cross-lingual Features | ture, transliteration mining slightly lowered precision (except for the TWEETS test set, where the drop in precision was significant) and increased recall, leading to an overall improvement in F-measure for all test sets.
Related Work | They reported 80%, 37%, and 47% F-measure for locations, organizations, and persons respectively on the ANERCORP dataset that they created and publicly released. |
Related Work | They reported 87%, 46%, and 52% F-measure for locations, organizations, and persons respectively. |
Related Work | Using POS tagging generally improved recall at the expense of precision, leading to overall improvements in F-measure.
Lexicon Bootstrapping | The set of parameters is optimized using a grid search on the development data, with F-measure for subjectivity classification as the objective.
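The grid search described above could be sketched as follows. This is a minimal illustration, not the paper's implementation: the `evaluate` callback and the parameter names are hypothetical stand-ins for whatever produces dev-set true-positive, false-positive, and false-negative counts.

```python
from itertools import product

def f_measure(tp, fp, fn):
    """Balanced F-measure from true-positive, false-positive, and false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def grid_search(param_grid, evaluate):
    """Return the parameter setting with the highest development-set F-measure.

    `param_grid` maps parameter names to candidate value lists;
    `evaluate` maps a parameter dict to (tp, fp, fn) counts on the dev data.
    """
    best_params, best_f = None, -1.0
    names = sorted(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        f = f_measure(*evaluate(params))
        if f > best_f:
            best_params, best_f = params, f
    return best_params, best_f
```

The exhaustive product over candidate values is feasible only for a handful of parameters, which matches the small-grid setting typical of lexicon tuning.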
Lexicon Evaluations | and F-measure results. |
Lexicon Evaluations | For polarity classification we get comparable F-measure but much higher recall for Lg compared to SWN.
Lexicon Evaluations | Figure 1: Precision (x-axis), recall (y-axis) and F-measure (in the table) for English.
Experiments | Since the model can give the same score for a permutation and the original document, we also compute F-measure where recall is correct/total and precision equals correct/decisions. |
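The F-measure variant described above, where the model may abstain (so decisions ≤ total), can be written directly from the stated definitions of precision and recall:

```python
def permutation_f_measure(correct, decisions, total):
    """F-measure for the permutation experiment described above.

    The model can assign the same score to a permutation and the original
    document and so may not decide every case; hence
    recall = correct / total (all test cases), while
    precision = correct / decisions (cases actually decided).
    """
    precision = correct / decisions if decisions else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 80 correct out of 90 decisions over 100 pairs gives precision 0.889, recall 0.8, and F-measure about 0.842.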
Experiments | For evaluation purposes, the accuracy still corresponds to the number of correct ratings divided by the number of comparisons, while the F-measure combines recall and precision measures. |
Experiments | Moreover, in contrast to the first experiment, when accounting for the number of entities “shared” by two sentences (PW), values of accuracy and F-measure are lower. |
Conclusions | Table 3: Classification performance in F-measure for semantically ambiguous words on the most frequently confused descriptive tags in the movie domain. |
Experiments | Figure 2: F-measure for semantic clustering performance. |
Experiments | As expected, we see a drop in F-measure on all models on descriptive tags. |
Concluding Remarks | Using MDL alone, one proposed method outperforms the original regularized compressor (Chen et al., 2012) in precision by 2 percentage points and in F-measure by 1 percentage point.
Evaluation | Segmentation performance is measured using word-level precision (P), recall (R), and F-measure (F). |
Evaluation | We found that, in all three settings, G2 outperforms the baseline by 1 to 2 percentage points in F-measure.
Evaluation | The best performance achieved by G2 in our experiment is 81.7 in word-level F-measure, although this was obtained from search setting (c), using a heuristic p value of 0.37.
Abstract | This approach generates alignments that are 2.6 f-Measure points better than a baseline supervised aligner. |
Conclusion | We also proposed a model that scores alignments given source and target sentence reorderings, improving a supervised alignment model by 2.6 points in f-Measure.
Results and Discussions | [Table header: Type, f-Measure (words)]
Results and Discussions | The f-Measure of this aligner is 78.1% (see row 1, column 2). |
Results and Discussions | [Table: Method, f-Measure, mBLEU] Base correction model: 78.1, 55.1; Correction model, C(f′|a): 78.1, 56.4; P(a|f′), C(f′|a): 80.7, 57.6.
Analysis and Discussions | [Figure: F-Measure for SO Prediction]
Analysis and Discussions | [Figure: F-Measure for IQAPs Inference]
Abstract | We test our approach on a held-out test set from EUROVOC and perform precision, recall and f-measure evaluations for 20 European language pairs. |
Experiments 5.1 Data Sources | To test the classifier’s performance we evaluated it against a list of positive and negative examples of bilingual term pairs using the measures of precision, recall and F-measure.
Method | First, we evaluate the performance of the classifier on a held-out term-pair list from EUROVOC using the standard measures of recall, precision and F-measure.
Abstract | Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above the baseline.
Experimental Evaluation | Of the three individual modules, the n-gram and clustering methods achieve F-measure of around 0.9, while the ontology-based module performs only modestly above baseline. |
Experimental Evaluation | The final system that employed all modules produced an F-measure of 0.960, a significant (p < 0.01) absolute increase of 15.4% over the baseline. |
Experiments | Each learner uses a small amount of development data to tune a threshold on scores for predicting new-sense or not-a-new-sense, using macro F-measure as an objective. |
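The threshold-tuning step described above could be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: scores, labels, and the candidate thresholds are hypothetical, and macro F is averaged over the two classes (new-sense vs. not-a-new-sense).

```python
def binary_f(tp, fp, fn):
    """Balanced F-measure for one class from its tp/fp/fn counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f(scores, labels, threshold):
    """Macro-averaged F over the two classes (1 = new-sense, 0 = not-a-new-sense)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    per_class = []
    for cls in (0, 1):
        tp = sum(1 for p, y in zip(preds, labels) if p == cls and y == cls)
        fp = sum(1 for p, y in zip(preds, labels) if p == cls and y != cls)
        fn = sum(1 for p, y in zip(preds, labels) if p != cls and y == cls)
        per_class.append(binary_f(tp, fp, fn))
    return sum(per_class) / len(per_class)

def tune_threshold(scores, labels, candidates):
    """Pick the score threshold maximizing macro F on the development data."""
    return max(candidates, key=lambda t: macro_f(scores, labels, t))
```

Macro averaging gives the rare new-sense class equal weight with the majority class, which is why it is a sensible tuning objective here.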
Experiments | are relatively weak for predicting new senses on EMEA data but stronger on Subs (TYPEONLY AUC performance is higher than both baselines) and even stronger on Science data (TYPEONLY AUC and f-measure performance is higher than both baselines as well as the ALLFEATURES model).
Experiments | Recall that the microlevel evaluation computes precision, recall, and f-measure for all word tokens of a given word type and then averages across word types. |
Experiments | The performance measurement for word segmentation is balanced F-measure, F = 2PR/(P + R), a function of precision P and recall R, where P is the percentage of words in the segmentation results that are segmented correctly, and R is the percentage of gold-standard words that are correctly segmented.
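The word-level P/R/F evaluation above can be computed by comparing character-offset spans, the usual convention in segmentation bakeoffs: a predicted word counts as correct only if both its boundaries match a gold word. A minimal sketch (function and variable names are ours, not from the paper):

```python
def word_spans(words):
    """Convert a word sequence into a set of (start, end) character-offset spans."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans

def segmentation_prf(predicted, gold):
    """Word-level precision P, recall R, and balanced F = 2PR/(P + R).

    `predicted` and `gold` are word lists segmenting the same string;
    a predicted word is correct iff its character span appears in the gold
    segmentation.
    """
    pred_spans, gold_spans = word_spans(predicted), word_spans(gold)
    correct = len(pred_spans & gold_spans)
    p = correct / len(pred_spans) if pred_spans else 0.0
    r = correct / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, segmenting "thecatsat" as ["the", "ca", "tsat"] against gold ["the", "cat", "sat"] matches only one of three spans, giving P = R = F = 1/3.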
Experiments | Wikipedia brings an F-measure increment of 0.93 points.
Introduction | Experimental results show that the knowledge implied in the natural annotations can significantly improve the performance of a baseline segmenter trained on CTB 5.0, yielding an F-measure increment of 0.93 points on the CTB test set and an average increment of 1.53 points on 7 other domains.