Abstract | Our best approach achieves a roughly 15% absolute increase in F-score over a simple but reasonable baseline. |
Results | We present the results in terms of F-score only for simplicity; we then conduct an error analysis that examines precision and recall. |
Results | Feature Set          F-score   %Imp
         word                 43.85     —
         word+nw              43.86     0.0
         word+na              44.78     2.1
         word+lem             45.85     4.6
         word+pos             45.91     4.7
         word+nw+pos+lem+na   46.34     5.7
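The %Imp column appears to be the relative improvement over the word baseline rather than an absolute difference; a minimal check of that reading (the scores are copied from the table above, the helper function is ours):

```python
# Sketch: check that %Imp is the relative improvement over the "word" baseline.
# Scores are copied from the table above; rel_improvement is our own helper.

def rel_improvement(score: float, baseline: float) -> float:
    """Relative improvement over the baseline, in percent."""
    return (score - baseline) / baseline * 100.0

BASELINE = 43.85  # "word" feature set
SCORES = {
    "word+nw": 43.86,
    "word+na": 44.78,
    "word+lem": 45.85,
    "word+pos": 45.91,
    "word+nw+pos+lem+na": 46.34,
}

for features, f_score in SCORES.items():
    print(f"{features}: {rel_improvement(f_score, BASELINE):.1f}%")
# Matches the %Imp column: 0.0, 2.1, 4.6, 4.7, 5.7
```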
Discussions | For small seed sizes, the F-score of bilingual bootstrapping is consistently better than the F-score obtained by training only on the seed data without using any bootstrapping. |
Discussions | To further illustrate this, we take some sample points from the graph and compare the number of tagged words needed by BiBoot and OnlySeed to reach the same (or nearly the same) F-score.
Experimental Setup | [Figure 1: Comparison of BiBoot, MonoBoot, OnlySeed and WFS on Hindi Health data; plot title "Seed Size v/s F-score", x-axis: Seed Size (words), y-axis: F-score]
Results | a. BiBoot: This curve represents the F-score obtained after 10 iterations by using bilingual bootstrapping with different amounts of seed data (a generic bootstrapping loop is sketched after this list).
Results | b. MonoBoot: This curve represents the F-score obtained after 10 iterations by using monolingual bootstrapping with different amounts of seed data. |
Results | c. OnlySeed: This curve represents the F-score obtained by training on the seed data alone without using any bootstrapping. |
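The bootstrapping procedure itself is not spelled out in these fragments; the sketch below shows only the generic self-training loop that the curves suggest (train on the seed, tag untagged data, keep confident predictions, repeat for 10 iterations). The helpers train_classifier, tag and select_confident are hypothetical, and the cross-lingual projection that distinguishes BiBoot from MonoBoot is not shown.

```python
# Illustrative self-training-style bootstrapping loop (assumed structure only).
# train_classifier, tag and select_confident are hypothetical helpers; the
# cross-lingual projection step that makes BiBoot "bilingual" is omitted.

def bootstrap(seed_data, untagged_data, iterations=10, threshold=0.9):
    labeled = list(seed_data)
    for _ in range(iterations):
        model = train_classifier(labeled)                     # train on current pool
        predictions = tag(model, untagged_data)               # tag the untagged corpus
        confident = select_confident(predictions, threshold)  # keep high-confidence tags
        labeled.extend(confident)                             # grow the training pool
    return train_classifier(labeled)
```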
Conclusion and Future Work | For normalisation, we compared our method with two benchmark methods from the literature, and achieved the highest F-score and BLEU score by integrating dictionary lookup, word similarity and context support modelling.
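How the three signals are integrated is not shown in this excerpt; the following is one plausible reading, in which each normalisation candidate is scored by a weighted combination of dictionary membership, string similarity to the noisy token, and context support. The helper functions, weights and example data are our assumptions, not the paper's actual model.

```python
# Illustrative scoring of normalisation candidates by combining dictionary
# lookup, word similarity and context support. Weights, helpers and data are
# assumptions for illustration, not the paper's integration.
import difflib

def string_similarity(noisy: str, candidate: str) -> float:
    """Character-level similarity in [0, 1]."""
    return difflib.SequenceMatcher(None, noisy, candidate).ratio()

def score_candidate(noisy, candidate, dictionary, context_support):
    dict_score = 1.0 if candidate in dictionary else 0.0
    sim_score = string_similarity(noisy, candidate)
    ctx_score = context_support.get(candidate, 0.0)  # e.g. LM score of candidate in context
    return 0.3 * dict_score + 0.4 * sim_score + 0.3 * ctx_score  # hypothetical weights

dictionary = {"tomorrow", "tonight"}
context_support = {"tomorrow": 0.8, "tonight": 0.1}
candidates = ["tomorrow", "tonight"]
best = max(candidates, key=lambda c: score_candidate("2moro", c, dictionary, context_support))
print(best)  # tomorrow
```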
Experiments | We evaluate detection performance by token-level precision, recall and F-score (β = 1).
Experiments | For candidate selection, we once again evaluate using token-level precision, recall and F-score.
Experiments | Additionally, we evaluate using the BLEU score over the normalised form of each message, as the SMT method can lead to perturbations of the token stream, vexing standard precision, recall and F-score evaluation. |
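As a point of reference, here is a minimal sketch of the token-level precision, recall and F-score (β = 1) computation, under the simplifying assumption that gold and predicted labels are aligned one-to-one per token; this is exactly the assumption that the SMT method's token-stream perturbations break, which is why BLEU is used as a complement.

```python
# Minimal token-level precision / recall / F1 (beta = 1), assuming gold and
# predicted binary label sequences are aligned one-to-one (1 = detected /
# needs normalisation, 0 = otherwise).

def precision_recall_f1(gold, predicted):
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold      = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 1]
print(precision_recall_f1(gold, predicted))  # roughly (0.667, 0.667, 0.667)
```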
Oracle Parsing | To answer this question we computed oracle best and worst values for labelled dependency F-score using the algorithm of Huang (2008) on the hybrid model of Clark and Curran (2007), the best model of their C&C parser. |
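For context, labelled dependency F-score is a set-overlap measure over labelled dependencies; the sketch below shows the metric only, computed here over assumed (head, dependent, label) triples, and not the forest-oracle search of Huang (2008) that yields the best and worst achievable values.

```python
# Labelled dependency F-score as overlap between gold and predicted sets of
# (head, dependent, label) triples; the triple representation is an assumption
# for illustration. This is the metric only, not Huang (2008)'s oracle search.

def labelled_dep_f_score(gold_deps, predicted_deps):
    gold, pred = set(gold_deps), set(predicted_deps)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

gold = {("ate", "John", "nsubj"), ("ate", "apple", "dobj"), ("apple", "an", "det")}
pred = {("ate", "John", "nsubj"), ("ate", "apple", "iobj"), ("apple", "an", "det")}
print(round(labelled_dep_f_score(gold, pred), 3))  # 0.667
```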
Oracle Parsing | Digging deeper, we compared parser model score against Viterbi F-score and oracle F-score at a va-
Experimental Setup | For these reasons, we evaluate on both sentence-level and token-level precision, recall, and F-score.
Results | However, the best F-score corresponding to the optimal number of clusters is 42.2, still far below our model’s 66.0 F-score.
Results | Our results show a large gap in F-score between the sentence and token-level evaluations for both the USP baseline and our model. |