Index of papers in Proc. ACL that mention
  • Turkers
de Marneffe, Marie-Catherine and Manning, Christopher D. and Potts, Christopher
Abstract
Our experimental results closely match the Turkers’ response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference.
Analysis and discussion
Figure 5: Correlation between agreement among Turkers and whether the system gets the correct answer.
Analysis and discussion
For each dialogue, we plot a circle at Turker response entropy and either 1 = correct inference or 0 = incorrect inference, except the points are jittered a little vertically to show where the mass of data lies.
Analysis and discussion
correlate almost perfectly with the Turkers’ responses.
Corpus description
Given a written dialogue between speakers A and B, Turkers were asked to judge what B’s answer conveys: ‘definite yes’, ‘probable yes’, ‘uncertain’, ‘probable no’, ‘definite no’.
Corpus description
For each dialogue, we got answers from 30 Turkers, and we took the dominant response as the correct one, though we make extensive use of the full response distributions in evaluating our approach. We also computed entropy values for the distribution of answers for each item.
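As a rough illustration of the per-item statistics quoted above, the sketch below computes a dominant response and the response entropy from a list of Turker labels; the 30-judgment distribution in the example is invented, not taken from the paper.

    import math
    from collections import Counter

    def summarize_responses(labels):
        """Return the dominant response and the entropy (in bits) of the label distribution."""
        counts = Counter(labels)
        dominant = counts.most_common(1)[0][0]
        total = len(labels)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        return dominant, entropy

    # Hypothetical distribution of 30 judgments on the five-point scale for one dialogue.
    labels = ["definite yes"] * 18 + ["probable yes"] * 9 + ["uncertain"] * 3
    dominant, h = summarize_responses(labels)
    print(dominant, round(h, 2))  # definite yes 1.3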
Corpus description
120 Turkers were involved (the median number of items done was 28 and the mean 56.5).
Evaluation and results
In the case of the scalar modifiers experiment, there were just two examples whose dominant response from the Turkers was ‘Uncertain’, so we have left that category out of the results.
Evaluation and results
We count an inference as successful if it matches the dominant Turker response category.
Turkers is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Tratz, Stephen and Hovy, Eduard
Evaluation
Due to the relatively high speed and low cost of Amazon’s Mechanical Turk service, we chose to use Mechanical Turkers as our annotators.
Evaluation
The first and most significant drawback is that it is impossible to force each Turker to label every data point without putting all the terms onto a single web page, which is highly impractical for a large taxonomy.
Evaluation
Some Turkers may label every compound, but most do not.
Taxonomy
We then embarked on a series of changes, testing each generation by annotation using Amazon’s Mechanical Turk service, a relatively quick and inexpensive online platform where requesters may publish tasks for anonymous online workers (Turkers) to perform.
Taxonomy
Turkers were asked to select one or, if they deemed it appropriate, two categories for each noun pair.
Taxonomy
In addition to influencing the category definitions, some taxonomy groupings were altered with the hope that this would improve inter-annotator agreement for cases where Turker disagreement was systematic.
Turkers is mentioned in 24 sentences in this paper.
Topics mentioned in this paper:
Labutov, Igor and Lipson, Hod
Experiments
Our experimental procedure was as follows: 162 turkers were partitioned into four groups, each corresponding to a treatment condition: OPT (N=34), HF (N=41), RANDOM (N=43), MAN (N=44).
Experiments
Font size correlates with the score given by judge turkers when evaluating guesses of other turkers who were presented with the same text, but with the word replaced by a blank.
Experiments
Turkers were solicited to participate in a study that involved “reading a short story with a twist” (title of HIT).
Model
For collecting data about which words are likely to be “predicted” given their content, we developed an Amazon Mechanical Turk task that presented turkers with excerpts of a short story (English translation of “The Man who Repented” by
Model
Turkers were required to type in their best guess, and the number of semantically similar guesses was counted by an average of 6 other turkers.
Model
Turkers that judged the semantic similarity of the guesses of other turkers achieved an average Cohen’s kappa agreement of 0.44, indicating fair to poor agreement.
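Cohen’s kappa, quoted above, is a standard chance-corrected agreement statistic; the following is a minimal two-judge sketch over nominal labels (the toy labels are invented, and this is not the authors’ code).

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        """Chance-corrected agreement between two judges over the same items."""
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Toy example: two judges deciding whether a guess is semantically similar.
    a = ["sim", "sim", "diff", "sim", "diff", "diff"]
    b = ["sim", "diff", "diff", "sim", "diff", "sim"]
    print(round(cohens_kappa(a, b), 2))  # 0.33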
Turkers is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Li, Jiwei and Ott, Myle and Cardie, Claire and Hovy, Eduard
Abstract
customer generated truthful reviews, Turker generated deceptive reviews and employee (domain-expert) generated deceptive reviews.
Dataset Construction
3.1 Turker set, using Mechanical Turk
Dataset Construction
Anyone with basic programming skills can create Human Intelligence Tasks (HITs) and access a marketplace of anonymous online workers (Turkers) willing to complete the tasks.
Dataset Construction
to create their dataset, such as restricting the task to Turkers located in the United States who maintain an approval rating of at least 90%.
Experiments
Specifically, we reframe it as an intra-domain multi-class classification task, where given the labeled training data from one domain, we learn a classifier to classify reviews according to their source, i.e., Employee, Turker and Customer.
Feature-based Additive Model
If we instead use SVM, for example, we would have to train classifiers one by one (due to the distinct features from different sources) to draw conclusions regarding the differences between Turker vs Expert vs truthful reviews, positive expert vs negative expert reviews, or reviews from different domains.
Feature-based Additive Model
y_source ∈ {employee, turker, customer}
Introduction
Despite the advantages of soliciting deceptive gold-standard material from Turkers (it is easy, large-scale, and affordable), it is unclear whether Turkers are representative of the general population that generate fake reviews, or in other words, Ott et al.’s data set may correspond to only one type of online deceptive opinion spam — fake reviews generated by people who have never been to offerings or experienced the entities.
Introduction
In contrast to existing work (Ott et al., 2011; Li et al., 2013b), our new gold standard includes three types of reviews: domain expert deceptive opinion spam (Employee), crowdsourced deceptive opinion spam (Turker), and truthful Customer reviews (Customer).
Related Work
created a gold-standard collection by employing Turkers to write fake reviews, and followup research was based on their data (Ott et al., 2012; Ott et al., 2013; Li et al., 2013b; Feng and Hirst, 2013).
Turkers is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris
Crowdsourcing Translation
52 different Turkers took part in the translation task, each translating 138 sentences on average.
Crowdsourcing Translation
In the editing task, 320 Turkers participated, averaging 56 sentences each.
Problem Formulation
We form two graphs: the first graph (GT) represents Turkers (translator/editor pairs) as nodes; the second graph (G0) represents candidate translated and edited sentences as nodes.
Problem Formulation
GT = (VT, ET) is a weighted undirected graph representing collaborations between Turkers.
Problem Formulation
The mutual reinforcement framework couples the two random walks on GT and G0 that rank candidates and Turkers in isolation.
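The excerpts above only outline the mutual reinforcement idea, so the following is a simplified sketch rather than the authors’ exact formulation: two row-normalized random walks, one over the Turker collaboration graph GT and one over the candidate graph G0, are coupled through a hypothetical Turker-to-candidate authorship matrix so that highly ranked Turkers boost their candidates and highly ranked candidates boost their Turkers.

    import numpy as np

    def coupled_rank(W_turker, W_cand, authored, lam=0.85, iters=50):
        """Jointly rank Turkers and candidates with two coupled random walks."""
        def row_norm(M):
            s = M.sum(axis=1, keepdims=True).astype(float)
            s[s == 0] = 1.0
            return M / s

        P_t = row_norm(W_turker)   # walk over Turker collaborations (GT)
        P_c = row_norm(W_cand)     # walk over candidate translations (G0)
        A = row_norm(authored)     # Turker -> candidate (who produced what)
        B = row_norm(authored.T)   # candidate -> Turker

        t = np.full(P_t.shape[0], 1.0 / P_t.shape[0])  # Turker scores
        c = np.full(P_c.shape[0], 1.0 / P_c.shape[0])  # candidate scores
        for _ in range(iters):
            t_new = lam * (t @ P_t) + (1 - lam) * (c @ B)
            c_new = lam * (c @ P_c) + (1 - lam) * (t @ A)
            t, c = t_new / t_new.sum(), c_new / c_new.sum()
        return t, c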
Related work
different Turkers for a collection of Urdu sentences that had been previously professionally translated by the Linguistic Data Consortium.
Related work
They also hired US-based Turkers to edit the translations, since the translators were largely based in Pakistan and exhibited errors that are characteristic of speakers of English as a second language.
Turkers is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Feng, Song and Kang, Jun Seok and Kuznetsova, Polina and Choi, Yejin
Experimental Results 11
About 300 unique Turkers participated in the evaluation tasks.
Experimental Results 11
Otherwise we treat them as ambiguous cases. Figure 3 shows a part of the AMT task, where Turkers are presented with questions that help judges determine the subtle connotative polarity of each word, then asked to rate the degree of connotation on a scale from -5 (most negative) to 5 (most positive).
Experimental Results 11
We allow Turkers to mark words that can be used with both positive and negative connotation, which results in about 7% of words being excluded from the gold standard set.
Turkers is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T.
Dataset Construction and Human Performance
Crowdsourcing services such as AMT have made large-scale data annotation and collection efforts financially affordable by granting anyone with basic programming skills access to a marketplace of anonymous online workers (known as Turkers) willing to complete small tasks.
Dataset Construction and Human Performance
To ensure that opinions are written by unique authors, we allow only a single submission per Turker .
Dataset Construction and Human Performance
We also restrict our task to Turkers who are located in the United States, and who maintain an approval rating of at least 90%.
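The location and approval-rating restrictions described above correspond to AMT’s built-in worker qualifications; a minimal sketch using boto3 is below, where the HIT title, reward, timings, and question file are placeholders rather than the authors’ actual setup.

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    # Built-in qualification types: worker locale and percent of approved assignments.
    qualifications = [
        {"QualificationTypeId": "00000000000000000071",  # Locale
         "Comparator": "EqualTo",
         "LocaleValues": [{"Country": "US"}]},
        {"QualificationTypeId": "000000000000000000L0",  # PercentAssignmentsApproved
         "Comparator": "GreaterThanOrEqualTo",
         "IntegerValues": [90]},
    ]

    mturk.create_hit(
        Title="Write a hotel review (placeholder)",
        Description="Placeholder task description",
        Reward="1.00",
        MaxAssignments=1,                       # only one assignment per HIT
        AssignmentDurationInSeconds=1800,
        LifetimeInSeconds=86400,
        Question=open("question.xml").read(),   # ExternalQuestion/HTMLQuestion XML (placeholder)
        QualificationRequirements=qualifications,
    )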
Turkers is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Recasens, Marta and Danescu-Niculescu-Mizil, Cristian and Jurafsky, Dan
Human Perception of Biased Language
Turkers were shown Wikipedia’s definition of a “biased statement” and two example sentences that illustrated the two types of bias, framing and epistemological.
Human Perception of Biased Language
Before the 10 sentences, turkers were asked to list the languages they spoke as well as their primary language in primary school.
Human Perception of Biased Language
On average, it took turkers about four minutes to complete each HIT.
Turkers is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Nakashole, Ndapandula and Tylenda, Tomasz and Weikum, Gerhard
Evaluation
To evaluate the quality of types assigned to emerging entities, we presented turkers with sentences from the news tagged with out-of-KB entities and the types inferred by the methods under test.
Evaluation
The turkers’ task was to assess the correctness of types assigned to an entity mention.
Evaluation
To make the task easy for the turkers to understand, we combined the extracted entity and type into a sentence.
Turkers is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Soni, Sandeep and Mitra, Tanushree and Gilbert, Eric and Eisenstein, Jacob
Annotation
To ensure quality control, we required the Turkers to have at least an 85% HIT approval rating and to reside in the United States, because the Twitter messages in our dataset were related to American politics.
Annotation
we obtained five independent ratings from Turkers satisfying the above qualifications.
Annotation
We also allowed a “Not Applicable” option to capture ratings where the Turkers did not have sufficient knowledge about the statement or where the statement was not really a claim.
Turkers is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Daxenberger, Johannes and Gurevych, Iryna
Corpus
It was important to find a reasonable number of corresponding edit-turn-pairs before the actual annotation could take place, as we needed a certain number of positive seeds to keep turkers from simply labeling pairs as non-corresponding all the time.
Corpus
The resulting 750 pairs have each been annotated by five turkers .
Corpus
The turkers were presented the turn text, the turn topic name, the edit in its context, and the edit comment (if present).
Turkers is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kang, Jun Seok and Feng, Song and Akoglu, Leman and Choi, Yejin
Evaluation 11: Human Evaluation on ConnotationWordNet
We first describe the labeling process for sense-level connotation: We selected 350 polysemous words and one of their senses, and each Turker was asked to rate the connotative polarity of a given word (or of a given sense) from -5 to 5, with 0 being neutral. For each word, we asked 5 Turkers to rate it and took the average of the 5 ratings as the connotative intensity score of the word.
Evaluation 11: Human Evaluation on ConnotationWordNet
Because senses in WordNet can be tricky to understand, care should be taken in designing the task so that the Turkers will focus only on the corresponding sense of a word.
Evaluation 11: Human Evaluation on ConnotationWordNet
As an incentive, each Turker was rewarded $0.07 per HIT, which consists of 10 words to label.
Turkers is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: