Index of papers in Proc. ACL that mention
  • Mechanical Turk
Chen, David
Abstract
We show that by changing the grammar of the formal meaning representation language and training on additional data collected from Amazon’s Mechanical Turk, we can further improve the results.
Collecting Additional Data with Mechanical Turk
We validate this claim by collecting additional training data for the navigation domain using Mechanical Turk (Snow et al., 2008).
Collecting Additional Data with Mechanical Turk
Thus, we created two tasks on Mechanical Turk.
Collecting Additional Data with Mechanical Turk
The instructions used for the follower problems were mainly collected from our Mechanical Turk instructor task with some of the instructions coming from data collected by MacMahon (2007) that was not used by Chen and Mooney (2011).
Conclusion
In addition, we showed that changing the MRG and collecting additional training data from Mechanical Turk further improve the performance of the overall navigation system.
Experiments
In addition to SGOLL and SGOLL with the new MRG, we also look at augmenting each of the training splits with the data we collected using Mechanical Turk.
Experiments
Augmenting the training data with additional instructions and follower traces collected from Mechanical Turk produced the best results.
Experiments
Even after incorporating the new Mechanical Turk data into the training set, SGOLL still takes much less time to build a lexicon.
Introduction
To show that our algorithm can scale to larger datasets, we present results on collecting and training on additional data from Amazon’s Mechanical Turk.
Mechanical Turk is mentioned in 10 sentences in this paper.
Mrozinski, Joanna and Whittaker, Edward and Furui, Sadaoki
Conclusions
In this paper we have described a dynamic and inexpensive method of collecting a corpus of questions and answers using the Amazon Mechanical Turk framework.
Corpus description
The total cost of producing the corpus was about $350, consisting of $310 paid in workers rewards and $40 in Mechanical Turk fees, including all the trials conducted during the development of the final system.
Corpus description
However, the Mechanical Turk framework provided additional information for each assignment, for example the time workers spent on the task.
Discussion and future work
We believe that this unexpected result is due to the distributed nature of the worker pool in Mechanical Turk .
Introduction
We collected our corpus through the Amazon Mechanical Turk service (MTurk).
Setup
2.1 Mechanical Turk
Setup
Mechanical Turk is a Web-based service, offered by Amazon.com, Inc.
Mechanical Turk is mentioned in 7 sentences in this paper.
de Marneffe, Marie-Catherine and Manning, Christopher D. and Potts, Christopher
Abstract
To evaluate the methods, we collected examples of question-answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus and use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys ‘yes’ or ‘no’.
Corpus description
Table 2: Mean entropy values and standard deviation obtained in the Mechanical Turk experiment for each question-answer pair category.
Corpus description
To assess the degree to which each answer conveys ‘yes’ or ‘no’ in context, we use response distributions from Mechanical Turk workers.
Corpus description
Despite variant individual judgments, aggregate annotations done with Mechanical Turk have been shown to be reliable (Snow et al., 2008; Sheng et al., 2008; Munro et al., 2010).
Evaluation and results
To evaluate the techniques, we pool the Mechanical Turk ‘definite yes’ and ‘probable yes’ categories into a single category ‘Yes’, and we do the same for ‘definite no’ and ‘probable no’.
Mechanical Turk is mentioned in 6 sentences in this paper.
Tratz, Stephen and Hovy, Eduard
Evaluation
Due to the relatively high speed and low cost of Amazon’s Mechanical Turk service, we chose to use Mechanical Turkers as our annotators.
Evaluation
Using Mechanical Turk to obtain inter-annotator agreement figures has several drawbacks.
Taxonomy
We then embarked on a series of changes, testing each generation by annotation using Amazon’s Mechanical Turk service, a relatively quick and inexpensive online platform where requesters may publish tasks for anonymous online workers (Turkers) to perform.
Taxonomy
Mechanical Turk has been previously used in a variety of NLP research, including recent work on noun compounds by Nakov (2008) to collect short phrases for linking the nouns within noun compounds.
Taxonomy
For the Mechanical Turk annotation tests, we created five sets of 100 noun compounds from noun compounds automatically extracted from a random subset of New York Times articles written between 1987 and 2007 (Sandhaus, 2008).
Mechanical Turk is mentioned in 6 sentences in this paper.
Chen, David and Dolan, William
Conclusion
Deploying the framework on Mechanical Turk over a two-month period yielded 85K English descriptions for 2K videos, one of the largest paraphrase data resources publicly available.
Data Collection
We deployed the task on Amazon’s Mechanical Turk, with video segments selected from YouTube.
Data Collection
Figure 1: A screenshot of our annotation task as it was deployed on Mechanical Turk.
Data Collection
We deployed our data collection framework on Mechanical Turk over a two-month period from July to September in 2010, collecting 2,089 video segments and 85,550 English descriptions.
Experiments
We randomly selected a thousand sentences from our data and collected two paraphrases of each using Mechanical Turk .
Related Work
We designed our data collection framework for use on crowdsourcing platforms such as Amazon’s Mechanical Turk.
Mechanical Turk is mentioned in 6 sentences in this paper.
Nakashole, Ndapandula and Mitchell, Tom M.
Fact Candidates
3.1 Mechanical Turk Study
Fact Candidates
We deployed an annotation study on Amazon Mechanical Turk (MTurk), a crowdsourcing platform for tasks requiring human input.
Fact Candidates
For training and testing data, we used the labeled data from the Mechanical Turk study.
Introduction
A Mechanical Turk study we carried out revealed that there is a significant correlation between objectivity of language and trustworthiness of sources.
Introduction
To test this hypothesis, we designed a Mechanical Turk study.
Introduction
(3) Objectivity Classifier: Using labeled data from the Mechanical Turk study, we developed and trained an objectivity classifier which performed better than prior proposed lexicons from literature.
Mechanical Turk is mentioned in 6 sentences in this paper.
Martineau, Justin and Chen, Lu and Cheng, Doreen and Sheth, Amit
Abstract
In this paper we study a large, low quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowd-source annotations.
Experiments
We then sent these tweets to Amazon Mechanical Turk for annotation.
Experiments
In order to evaluate our approach in real-world scenarios, instead of creating a high quality annotated dataset and then introducing artificial noise, we followed the common practice of crowdsourcing, and collected emotion annotations through Amazon Mechanical Turk (AMT).
Experiments
Amazon Mechanical Turk Annotation: we posted the set of 100K tweets to the workers on AMT for emotion annotation.
Introduction
There are generally two ways to collect annotations of a dataset: through a few expert annotators, or through crowdsourcing services (e.g., Amazon’s Mechanical Turk).
Introduction
We employ Amazon’s Mechanical Turk (AMT) to label the emotions of Twitter data, and apply the proposed methods to the AMT dataset with the goals of improving the annotation quality at low cost, as well as learning accurate emotion classifiers.
Mechanical Turk is mentioned in 6 sentences in this paper.
Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris
Crowdsourcing Translation
This data set consists of 1,792 Urdu sentences from a variety of news and online sources, each paired with English translations provided by nonprofessional translators on Mechanical Turk.
Introduction
Rather than relying on volunteers or gamification, NLP research into crowdsourcing translation has focused on hiring workers on the Amazon Mechanical Turk (MTurk) platform (Callison-Burch, 2009).
Related work
Our setup uses anonymous crowd workers hired on Mechanical Turk, whose motivation to participate is financial.
Related work
Most NLP research into crowdsourcing has focused on Mechanical Turk, following pioneering work by Snow et al. (2008).
Related work
Although hiring professional translators to create bilingual training data for machine translation systems has been deemed infeasible, Mechanical Turk has provided a low cost way of creating large volumes of translations (Callison-Burch, 2009; Ambati and Vogel, 2010).
Mechanical Turk is mentioned in 5 sentences in this paper.
Hu, Yuening and Boyd-Graber, Jordan and Satinoff, Brianna
Discussion
We hope to engage these algorithms with more sophisticated users than those on Mechanical Turk to measure how these models can help them better explore and understand large, uncurated data sets.
Getting Humans in the Loop
We solicited approximately 200 judgments from Mechanical Turk, a popular crowdsourcing platform that has been used to gather linguistic annotations (Snow et al., 2008), measure topic quality (Chang et al., 2009), and supplement traditional inference techniques for topic models (Chang, 2010).
Getting Humans in the Loop
Figure 5 shows the interface used in the Mechanical Turk tests.
Getting Humans in the Loop
Figure 5: Interface for Mechanical Turk experiments.
Mechanical Turk is mentioned in 5 sentences in this paper.
Rokhlenko, Oleg and Szpektor, Idan
Abstract
We tested the suggestions generated by our algorithm via a Mechanical Turk experiment, which showed a significant improvement over the strongest baseline of more than 45% in all metrics.
Conclusions
We assessed the performance of our algorithm via a Mechanical Turk experiment.
Evaluation
To evaluate our algorithm’s performance, we designed a Mechanical Turk (MTurk) experiment in which human annotators assess the quality of the questions that our algorithm generates for a sample of news articles.
Introduction
To test the performance of our algorithm, we conducted a Mechanical Turk experiment that assessed the quality of suggested questions for news articles on celebrities.
Mechanical Turk is mentioned in 4 sentences in this paper.
Soni, Sandeep and Mitra, Tanushree and Gilbert, Eric and Eisenstein, Jacob
Annotation
We used Amazon Mechanical Turk (AMT) to collect ratings of claims.
Introduction
This dataset was annotated by Mechanical Turk workers who gave ratings for the factuality of the scoped claims in each Twitter message.
Modeling factuality judgments
While these findings must be interpreted with caution, they suggest that readers — at least, Mechanical Turk workers — use relatively little independent judgment to assess the validity of quoted text that they encounter on Twitter.
Related work
(2012) conduct an empirical evaluation of FactBank ratings from Mechanical Turk workers, finding a high degree of disagreement between raters.
Mechanical Turk is mentioned in 4 sentences in this paper.
Endriss, Ulle and Fernández, Raquel
A Case Study
The dataset of Snow et al. (2008) includes 10 non-expert annotations for each of the 800 items in the RTE1 test set, collected with Amazon’s Mechanical Turk.
Introduction
In recent years, the possibility to undertake large-scale annotation projects with hundreds or thousands of annotators has become a reality thanks to online crowdsourcing methods such as Amazon’s Mechanical Turk and Games with a Purpose.
Related Work
Similarly, crowdsourcing via microworking sites like Amazon’s Mechanical Turk has been used in several annotation experiments related to tasks such as affect analysis, event annotation, sense definition and word sense disambiguation (Snow et al., 2008; Rumshisky, 2011; Rumshisky et al., 2012), amongst others.
Related Work
See also the papers presented at the NAACL 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk.
Mechanical Turk is mentioned in 4 sentences in this paper.
Labutov, Igor and Lipson, Hod
Abstract
Using an artificial language vocabulary, we evaluate a set of algorithms for generating code-switched text automatically by presenting it to Mechanical Turk subjects and measuring recall in a sentence completion task.
Experiments
We carried out experiments on the effectiveness of our approach using the Amazon Mechanical Turk platform.
Model
For collecting data about which words are likely to be “predicted” given their content, we developed an Amazon Mechanical Turk task that presented turkers with excerpts of a short story (English translation of “The Man who Repented” by
Mechanical Turk is mentioned in 3 sentences in this paper.
Christensen, Janara and Soderland, Stephen and Bansal, Gagan and Mausam
Abstract
In an Amazon Mechanical Turk evaluation, users preferred SUMMA ten times as often as flat MDS and three times as often as timelines.
Experiments
We hired Amazon Mechanical Turk (AMT) workers and assigned two topics to each worker.
Introduction
We conducted an Amazon Mechanical Turk (AMT) evaluation where AMT workers compared the output of SUMMA to that of timelines and flat summaries.
Mechanical Turk is mentioned in 3 sentences in this paper.
Regneri, Michaela and Koller, Alexander and Pinkal, Manfred
Conclusion
We want to thank Dustin Smith for the OMICS data, Alexis Palmer for her support with Amazon Mechanical Turk , Nils Bendfeldt for the creation of all web forms and Ines Rehbein for her effort
Evaluation
We presented each pair to 5 non-experts, all US residents, via Mechanical Turk .
Related Work
In particular, the use of the Amazon Mechanical Turk , which we use here, has been evaluated and shown to be useful for language processing tasks (Snow et al., 2008).
Mechanical Turk is mentioned in 3 sentences in this paper.
Tan, Chenhao and Lee, Lillian and Pang, Bo
Introduction
In an Amazon Mechanical Turk (AMT) experiment (§4), we found that humans achieved an average accuracy of 61.3%: not that high, but better than chance, indicating that it is somewhat possible for humans to predict greater message spread from different deliveries of the same information.
Introduction
We first ran a pilot study on Amazon Mechanical Turk (AMT) to determine whether humans can identify, based on wording differences alone, which of two topic- and author- controlled tweets is spread more widely.
Introduction
We outperform the average human accuracy of 61% reported in our Amazon Mechanical Turk experiments (for a different data sample); fiTAC+ff+time fails to do so.
Mechanical Turk is mentioned in 3 sentences in this paper.
Tibshirani, Julie and Manning, Christopher D.
Abstract
Annotation errors can significantly hurt classifier performance, yet datasets are only growing noisier with the increased use of Amazon Mechanical Turk and techniques like distant supervision that automatically generate labels.
Experiments
The data was created by taking various Wikipedia articles and giving them to five Amazon Mechanical Turkers to annotate.
Introduction
Low-quality annotations have become even more common in recent years with the rise of Amazon Mechanical Turk , as well as methods like distant supervision and co-training that involve automatically generating training data.
Mechanical Turk is mentioned in 3 sentences in this paper.
Mintz, Mike and Bills, Steven and Snow, Rion and Jurafsky, Daniel
Evaluation
Human evaluation was performed by evaluators on Amazon’s Mechanical Turk service, shown to be effective for natural language annotation in Snow et al. (2008).
Evaluation
For each of the 10 relations that appeared most frequently in our test data (according to our classifier), we took samples from the first 100 and 1000 instances of this relation generated in each experiment, and sent these to Mechanical Turk for human evaluation.
Evaluation
Each predicted relation instance was labeled as true or false by between 1 and 3 labelers on Mechanical Turk .
Mechanical Turk is mentioned in 3 sentences in this paper.