A Case Study | The dataset of Snow et al. (2008) includes 10 non-expert annotations for each of the 800 items in the RTE-1 test set, collected with Amazon’s Mechanical Turk. |
Introduction | In recent years, undertaking large-scale annotation projects with hundreds or thousands of annotators has become feasible thanks to online crowdsourcing methods such as Amazon’s Mechanical Turk and Games with a Purpose. |
Related Work | Similarly, crowdsourcing via microworking sites like Amazon’s Mechanical Turk has been used in several annotation experiments related to tasks such as affect analysis, event annotation, sense definition and word sense disambiguation (Snow et al., 2008; Rumshisky, 2011; Rumshisky et al., 2012), amongst others.12 |
Related Work | 12 See also the papers presented at the NAACL 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk (tinyurl. |
Abstract | We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement of more than 45% over the strongest baseline on all metrics. |
Conclusions | We assessed the performance of our algorithm via a Mechanical Turk experiment. |
Evaluation | To evaluate our algorithm’s performance, we designed a Mechanical Turk (MTurk) experiment in which human annotators assessed the quality of the questions that our algorithm generated for a sample of news articles. |
Introduction | To test the performance of our algorithm, we conducted a Mechanical Turk experiment that assessed the quality of suggested questions for news articles on celebrities. |
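A dataset of the shape described in the first row above (10 redundant crowd labels for each of the 800 RTE-1 pairs) is typically collapsed into one label per item before evaluation. The following is a minimal majority-vote sketch, assuming the labels are already in memory as a mapping from item id to label list; the item ids, label values, and toy data are hypothetical, not taken from the dataset itself.

```python
from collections import Counter

def majority_vote(annotations):
    """Collapse each item's list of crowd labels into a single label.

    `annotations` maps an item id to its list of labels, e.g. the
    10 non-expert TRUE/FALSE entailment judgments per RTE-1 pair.
    Ties are broken arbitrarily by Counter.most_common.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in annotations.items()}

def accuracy(predicted, gold):
    """Fraction of items whose aggregated label matches the gold label."""
    return sum(predicted[i] == gold[i] for i in gold) / len(gold)

# Hypothetical toy data: 2 of the 800 pairs, 10 labels each.
raw = {
    "pair_001": ["TRUE"] * 7 + ["FALSE"] * 3,
    "pair_002": ["FALSE"] * 6 + ["TRUE"] * 4,
}
gold = {"pair_001": "TRUE", "pair_002": "FALSE"}
print(accuracy(majority_vote(raw), gold))  # -> 1.0 on this toy sample
```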