Evaluation | The collection of queries is a random sample of fully anonymized queries in English submitted by Web users in 2006. |
Evaluation | To test whether that is the case, a random sample of 200 class labels, out of the 2,614 labels found to be potentially useful specific concepts, is manually annotated as correct, subjectively correct or incorrect, as shown in Table 2. |
Evaluation | Rather than inspecting a random sample of classes, the evaluation validates the results against a reference set of 40 gold-standard classes that were manually assembled as part of previous work (Pasca, 2007). |
Experimental Results | We estimate the quality of paraphrases by annotating a random sample as correct/incorrect and calculating the accuracy. |
Experimental Results | We estimate the precision (P) of the extracted instances by annotating a random sample of instances as correct/incorrect. |
Experimental Results | We randomly sampled 50 instances of the “acquisition” and “birthplace” relations from the system and the baseline outputs. |
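
Several of the sentences above describe the same evaluation pattern: draw a random sample of system outputs, annotate each item as correct or incorrect, and report the fraction that is correct. The following is a minimal sketch of that pattern, not the procedure of any of the quoted papers. The function names `sample_for_annotation` and `estimate_precision`, the fixed seed, and the Wilson score interval are illustrative assumptions; the quoted sentences mention only the sample and the correct/incorrect annotation.

```python
import math
import random


def sample_for_annotation(items, k, seed=0):
    """Draw k items uniformly at random, without replacement, for manual annotation.

    The seed is fixed only to make the sample reproducible (an assumption).
    """
    pool = list(items)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def estimate_precision(labels, z=1.96):
    """Estimate precision from binary annotations (True = correct).

    Returns (point_estimate, lower, upper), where the bounds come from a
    Wilson score interval at the confidence level implied by z
    (z = 1.96 corresponds to roughly 95%). The interval is an addition
    for illustration; the source sentences report only the point estimate.
    """
    n = len(labels)
    if n == 0:
        raise ValueError("at least one annotated item is required")
    p = sum(1 for label in labels if label) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical data: 2,614 candidate labels, 200 of them sampled.
    candidates = [f"label_{i}" for i in range(2614)]
    sample = sample_for_annotation(candidates, 200)
    # Placeholder annotations; in practice these come from human judges.
    annotations = [True] * 180 + [False] * 20
    p, low, high = estimate_precision(annotations)
    print(f"precision = {p:.3f}, interval = [{low:.3f}, {high:.3f}]")
```

With small samples, such as 50 instances per relation, the interval is wide, so differences between a system and a baseline are best read alongside it.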