Abstract | We evaluate on the MUC-4 terrorism dataset and show that we induce template structure very similar to hand-created gold structure, and we extract role fillers with an F1 score of .40, approaching the performance of algorithms that require full knowledge of the templates. |
Discussion | We achieved results with comparable precision and an F1 score of .40, approaching the performance of prior algorithms that rely on handcrafted knowledge.
Information Extraction: Slot Filling | The bombing template performs best with an F1 score of .72. |
Specific Evaluation | Kidnap improves most significantly in F1 score (7 F1 points absolute), but the others only change slightly.
Standard Evaluation | The standard evaluation for this corpus is to report the F1 score for slot type accuracy, ignoring the template type. |
Standard Evaluation | F1 scores: Kidnap .53, Bomb .43, Arson .42, Attack .16 / .25
Results and discussion | Lin and Wu (2009) report an F1 score of 90.90 on the original split of the CoNLL data. |
Results and discussion | Our F1 scores above 92% can be explained by a combination of randomly partitioning the data and the fact that the four-class problem is easier than the five-class problem (LOC-ORG-PER-MISC-O).
Results and discussion | We use the t-test to compute significance on the two sets of five F1 scores from the two experiments that are being compared (two-tailed, p < .01 for t > 3.36). CoNLL scores that are significantly different from line c7 are marked with >|<.
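The significance test described above compares two sets of five F1 scores; a minimal pooled-variance two-sample t-test sketch, using hypothetical scores (not taken from the paper):

```python
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    # Pooled sample variance over na + nb - 2 degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

# Hypothetical F1 scores from five runs of each system (illustrative only):
run_a = [92.1, 92.4, 92.0, 92.3, 92.2]
run_b = [91.0, 91.2, 90.8, 91.1, 90.9]
t = two_sample_t(run_a, run_b)
# With 5 + 5 - 2 = 8 degrees of freedom, |t| > 3.36 means p < .01 (two-tailed)
print(abs(t) > 3.36)
```

The critical value 3.36 matches the paper's setup: a two-tailed test at p < .01 with 8 degrees of freedom.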
Experiments | At the highest recall point, MULTIR reaches 72.4% precision and 51.9% recall, for an F1 score of 60.5%. |
Experiments | On average across relations, precision increases 12 points but recall drops 26 points, for an overall reduction in F1 score from 60.5% to 40.3%. |
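The F1 figures in the two lines above follow directly from precision and recall, since F1 is their harmonic mean; a minimal sketch checking the reported MULTIR operating point:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# The operating point reported above: 72.4% precision, 51.9% recall
print(round(f1(0.724, 0.519) * 100, 1))  # 60.5
```

Note the asymmetry this creates: because the harmonic mean is dominated by the smaller value, a 26-point recall drop outweighs a 12-point precision gain, consistent with the fall from 60.5% to 40.3% F1.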
Related Work | (2010) describe a system similar to KYLIN, but which dynamically generates lexicons in order to handle sparse data, learning over 5000 Infobox relations with an average F1 score of 61%. |
Experiments | Table 2: F1 Scores on the Wikipedia Link Data. |
Experiments | We use N = 100 and 500; the B3 F1 scores obtained for each case are shown in Figure 7.
Introduction | On this dataset, our proposed model yields a B3 (Bagga and Baldwin, 1998) F1 score of 73.7%, improving over the baseline by 16% absolute (corresponding to 38% error reduction). |
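The B3 metric (Bagga and Baldwin, 1998) scores each mention by the overlap between its system cluster and its gold cluster; a minimal sketch on a toy clustering (the data here is illustrative, not from the paper):

```python
def b_cubed(system, gold):
    """B^3 precision, recall, and F1 for two clusterings of the same mentions.

    system, gold: lists of sets of mention ids.
    """
    sys_of = {m: c for c in system for m in c}   # mention -> its system cluster
    gold_of = {m: c for c in gold for m in c}    # mention -> its gold cluster
    mentions = list(gold_of)
    # Per-mention precision: overlap / system cluster size, averaged
    p = sum(len(sys_of[m] & gold_of[m]) / len(sys_of[m]) for m in mentions) / len(mentions)
    # Per-mention recall: overlap / gold cluster size, averaged
    r = sum(len(sys_of[m] & gold_of[m]) / len(gold_of[m]) for m in mentions) / len(mentions)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# Toy example: the system splits one gold cluster, so precision stays perfect
# while recall suffers for the split mentions.
gold = [{1, 2, 3}, {4, 5}]
system = [{1, 2}, {3}, {4, 5}]
p, r, f = b_cubed(system, gold)
```

Splitting clusters hurts only recall and merging them hurts only precision, which is why B3 is the standard choice for cross-document coreference evaluations like the one reported above.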