Abstract | In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowd-source annotations. |
Experiments | We then sent these tweets to Amazon Mechanical Turk for annotation. |
Experiments | To evaluate our approach in real-world scenarios, instead of creating a high-quality annotated dataset and then introducing artificial noise, we followed the common practice of crowdsourcing and collected emotion annotations through Amazon Mechanical Turk (AMT). |
Experiments | Amazon Mechanical Turk Annotation: We posted the set of 100K tweets to workers on AMT for emotion annotation. |
Introduction | There are generally two ways to collect annotations for a dataset: through a few expert annotators, or through crowdsourcing services (e.g., Amazon’s Mechanical Turk). |
Introduction | We employ Amazon’s Mechanical Turk (AMT) to label the emotions of Twitter data, and apply the proposed methods to the AMT dataset with the goals of improving the annotation quality at low cost, as well as learning accurate emotion classifiers. |
Abstract | In an Amazon Mechanical Turk evaluation, users preferred SUMMA ten times as often as flat MDS and three times as often as timelines. |
Experiments | We hired Amazon Mechanical Turk (AMT) workers and assigned two topics to each worker. |
Introduction | We conducted an Amazon Mechanical Turk (AMT) evaluation where AMT workers compared the output of SUMMA to that of timelines and flat summaries. |
Introduction | In an Amazon Mechanical Turk (AMT) experiment (§4), we found that humans achieved an average accuracy of 61.3%: not that high, but better than chance, indicating that it is somewhat possible for humans to predict greater message spread from different deliveries of the same information. |
Introduction | We first ran a pilot study on Amazon Mechanical Turk (AMT) to determine whether humans can identify, based on wording differences alone, which of two topic- and author-controlled tweets is spread more widely. |
Introduction | We outperform the average human accuracy of 61% reported in our Amazon Mechanical Turk experiments (for a different data sample); “TAC+”+time fails to do so. |
Abstract | Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels. |
Experiments | The dataset was created by giving various Wikipedia articles to five Amazon Mechanical Turk workers to annotate. |
Introduction | Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk, as well as methods like distant supervision and co-training that automatically generate training data. |