Abstract | On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and BLEU score over the baseline alignment system of GIZA++.
Evaluation | An alternative criterion is the upper bound on alignment F-score, which essentially measures how many links in the annotated alignment can be kept in an ITG parse.
Evaluation | The calculation of the F-score upper bound is done bottom-up, like ITG parsing.
Evaluation | The upper bound of the alignment F-score can thus be calculated as well.
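The preceding snippets describe computing the F-score upper bound bottom-up, in the manner of ITG parsing. The following is a hedged sketch of one way such a bound can be obtained, not the authors' implementation: a memoized bispan DP (written top-down for brevity) that finds the maximum number of gold links an ITG derivation can retain, assuming 1-1 gold links, and then converts that count into an F-score upper bound under perfect precision. The function name, the 1-1 restriction, and the brute-force split enumeration are illustrative assumptions.

```python
# Sketch: ITG upper bound on alignment F-score for 1-1 gold links.
from functools import lru_cache

def itg_fscore_upper_bound(src_len, tgt_len, gold_links):
    """gold_links: set of (i, j) word-index pairs, 0 <= i < src_len, 0 <= j < tgt_len."""
    gold = set(gold_links)

    @lru_cache(maxsize=None)
    def kept(i1, i2, j1, j2):
        # Terminal bispan: at most one word on each side (covers word-word,
        # word-null, and null-word cases).
        if i2 - i1 <= 1 and j2 - j1 <= 1:
            return 1 if (i2 - i1 == 1 and j2 - j1 == 1 and (i1, j1) in gold) else 0
        best = 0
        for s in range(i1, i2 + 1):
            for t in range(j1, j2 + 1):
                # Straight combination: [i1,s)x[j1,t) + [s,i2)x[t,j2);
                # skip splits where one child would equal the whole bispan.
                if not ((s == i1 and t == j1) or (s == i2 and t == j2)):
                    best = max(best, kept(i1, s, j1, t) + kept(s, i2, t, j2))
                # Inverted combination: [i1,s)x[t,j2) + [s,i2)x[j1,t).
                if not ((s == i1 and t == j2) or (s == i2 and t == j1)):
                    best = max(best, kept(i1, s, t, j2) + kept(s, i2, j1, t))
        return best

    k = kept(0, src_len, 0, tgt_len)
    # Perfect precision on the kept links; recall is bounded by k / |gold|.
    return 2.0 * k / (k + len(gold)) if gold else 1.0
```

For example, the classic "inside-out" permutation {(0,1), (1,3), (2,0), (3,2)} on a 4x4 sentence pair is not fully ITG-reachable; the DP keeps at most 3 of the 4 links, giving an upper bound of 2*3/(3+4), roughly 0.857.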
The DITG Models | The MERT module for DITG takes the alignment F-score of a sentence pair as the performance measure.
The DITG Models | Given an input sentence pair and the reference annotated alignment, MERT aims to maximize the F-score of the DITG-produced alignment.
Introduction | Using an adapted supertagger with ambiguity levels tuned to match the baseline system, we were also able to increase F-score on labelled grammatical relations by 0.75%. |
Results | Interestingly, while the decrease in supertag accuracy in the previous experiment did not translate into a decrease in F-score, the increase in tag accuracy here does translate into an increase in F-score.
Results | The increase in F-score has two sources. |
Results | As Table 6 shows, this change translates into an improvement of up to 0.75% in F-score on Section |
Experiments | We only compare the F-score, since all the compared systems have an attempted rate of 1.0,
Experiments | ), F-score (F1).
Experiments | System F-score |
Experimental Setup | [Figure: recall, precision, and F-score under ROUGE-1 and ROUGE-L]
Results | F-score is higher for the phrase-based system but not significantly. |
Results | The sentence ILP model outperforms the lead baseline with respect to recall but not precision or F-score . |
Results | The phrase ILP achieves a significantly better F-score than the lead baseline with both ROUGE-1 and ROUGE-L.
Evaluation | We calculated precision, recall, and f-score for our system, the baselines, and the upper bound as follows, with all_system being the number of pairs labelled as paraphrase or happens-before, all_gold the respective number of pairs in the gold standard, and correct the number of pairs labelled correctly by the system.
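For concreteness, the quantities named in that snippet combine in the standard way; the following is a restatement of the usual definitions, not the authors' code, with variable names mirroring the snippet's wording.

```python
# Standard precision/recall/F-score from the counts described above.
def prf(correct, all_system, all_gold):
    precision = correct / all_system if all_system else 0.0
    recall = correct / all_gold if all_gold else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score
```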
Evaluation | The f-score for the upper bound is in the column upper. |
Evaluation | For the f-score values, we calculated the significance of the difference between our system and the baselines, as well as the upper bound, using a resampling test (Edgington, 1986).
A Latent Variable CCG Parser | To determine statistical significance, we obtain p-values from Bikel’s randomized parsing evaluation comparator, modified for use with tagging accuracy, F-score and dependency accuracy.
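The resampling test cited above (Edgington, 1986) and Bikel's randomized comparator are both instances of randomized significance testing over per-sentence scores. Below is a hedged sketch of a generic paired approximate-randomization test, not the exact procedure of either tool; it averages per-sentence scores for simplicity, whereas corpus-level metrics such as F-score are usually recomputed from the swapped per-sentence counts instead.

```python
# Paired approximate-randomization significance test (generic sketch).
import random

def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b)) / len(scores_a)
    at_least_as_extreme = 0
    for _ in range(trials):
        shuffled_a, shuffled_b = [], []
        for a, b in zip(scores_a, scores_b):
            # Randomly swap the two systems' scores on each sentence.
            if rng.random() < 0.5:
                a, b = b, a
            shuffled_a.append(a)
            shuffled_b.append(b)
        diff = abs(sum(shuffled_a) - sum(shuffled_b)) / len(scores_a)
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing gives a slightly conservative p-value estimate.
    return (at_least_as_extreme + 1) / (trials + 1)
```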
A Latent Variable CCG Parser | In this section we evaluate the parsers using the traditional PARSEVAL measures which measure recall, precision and F-score on constituents in |
A Latent Variable CCG Parser | The Petrov parser has better results by a statistically significant margin for both labeled and unlabeled recall and unlabeled F-score.
Intrinsic evaluation | We report the alignment quality in terms of precision, recall and F-score.
Intrinsic evaluation | The F-score corresponding to perfect precision and the upper-bound recall is 94.75%. |
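As a quick arithmetic check of the figure just quoted (this follows purely from the F-score definition and is not an additional result from the paper): with perfect precision, the stated F-score determines the upper-bound recall,

$$F = \frac{2PR}{P + R} \;\overset{P=1}{=}\; \frac{2R}{1 + R} \quad\Longrightarrow\quad R = \frac{F}{2 - F} = \frac{0.9475}{1.0525} \approx 0.900.$$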
Intrinsic evaluation | Overall, the MM models obtain lower precision but higher recall and F-score than 1-1 models, which is to be expected as the gold standard is defined in terms of MM links. |