Evaluation

Human evaluation was performed by evaluators on Amazon's Mechanical Turk service, which has been shown to be effective for natural language annotation tasks by Snow et al. (2008).
For each of the 10 relations that appeared most frequently in our test data (according to our classifier), we took samples from the first 100 and first 1000 instances of this relation generated in each experiment, and sent these to Mechanical Turk for human evaluation.
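Concretely, this sampling step might look like the Python sketch below. The function name `sample_for_turk`, the per-relation `sample_size`, and the assumption that the classifier's predictions arrive sorted by confidence are illustrative choices on our part, not details specified above.

```python
import random
from collections import Counter

def sample_for_turk(predictions, num_relations=10, cutoffs=(100, 1000),
                    sample_size=50, seed=0):
    """Draw per-relation samples of predicted instances for annotation.

    `predictions` is a list of (relation, instance) pairs, assumed to be
    sorted by classifier confidence (most confident first). For each of
    the `num_relations` most frequently predicted relations, we sample
    from the first 100 and the first 1000 instances of that relation.
    """
    rng = random.Random(seed)
    # The relations that appear most frequently among the predictions.
    counts = Counter(rel for rel, _ in predictions)
    top_relations = [rel for rel, _ in counts.most_common(num_relations)]
    samples = {}
    for rel in top_relations:
        ranked = [inst for r, inst in predictions if r == rel]
        for cutoff in cutoffs:
            pool = ranked[:cutoff]
            # Sample without replacement; take everything if the pool is small.
            samples[(rel, cutoff)] = rng.sample(pool, min(sample_size, len(pool)))
    return samples
```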
Each predicted relation instance was labeled as true or false by between 1 and 3 labelers on Mechanical Turk.
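Given such redundant judgments, the precision of a sample can be estimated by aggregating the per-instance labels. The text above states only that each instance received between 1 and 3 labels; the majority-vote aggregation in the sketch below (with ties resolved to false) is our assumption for illustration.

```python
def precision_from_labels(labels):
    """Estimate precision from redundant true/false judgments.

    `labels` maps each predicted instance to the list of boolean
    judgments it received (between 1 and 3 per instance). An instance
    counts as correct when a strict majority of its labelers marked it
    true; this aggregation rule is an assumption, not from the paper.
    """
    if not labels:
        return 0.0
    correct = sum(1 for votes in labels.values() if 2 * sum(votes) > len(votes))
    return correct / len(labels)

# Example: one instance judged true by 2 of 3 labelers, one judged false
# by its single labeler -> estimated precision of 0.5.
print(precision_from_labels({
    "inst1": [True, True, False],
    "inst2": [False],
}))
```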