Index of papers in Proc. ACL 2014 that mention
  • Turkers
Labutov, Igor and Lipson, Hod
Experiments
Our experimental procedure was as follows: 162 turkers were partitioned into four groups, each corresponding to a treatment condition: OPT (N=34), HF (N=41), RANDOM (N=43), MAN (N=44).
Experiments
Font-size correlates with the score given by judge turkers in evaluating guesses of other turkers who were presented with the same text, but with the word replaced by a blank.
Experiments
Turkers were solicited to participate in a study that involved “reading a short story with a twist” (title of HIT).
Model
For collecting data about which words are likely to be “predicted” given their content, we developed an Amazon Mechanical Turk task that presented turkers with excerpts of a short story (English translation of “The Man who Repented” by
Model
Turkers were required to type in their best guess, and the number of semantically similar guesses was counted by an average of 6 other turkers.
Model
Turkers that judged the semantic similarity of the guesses of other turkers achieved an average Cohen’s kappa agreement of 0.44, indicating fair to poor agreement.
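For reference, Cohen's kappa as reported above can be computed directly from two judges' label sequences. A minimal Python sketch follows; the two judges' similarity labels are invented for illustration and are not the paper's data.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators over the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both judges labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each judge's marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(labels_a) | set(labels_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical "semantically similar" / "not similar" judgments from two judges.
judge1 = ["sim", "sim", "not", "sim", "not", "not", "sim", "not"]
judge2 = ["sim", "not", "not", "sim", "not", "sim", "sim", "not"]
print(round(cohens_kappa(judge1, judge2), 2))  # 0.5 for this toy example
```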
Turkers is mentioned in 12 sentences in this paper.
Li, Jiwei and Ott, Myle and Cardie, Claire and Hovy, Eduard
Abstract
customer generated truthful reviews, Turker generated deceptive reviews and employee (domain-expert) generated deceptive reviews.
Dataset Construction
3.1 Turker set, using Mechanical Turk
Dataset Construction
Anyone with basic programming skills can create Human Intelligence Tasks (HITs) and access a marketplace of anonymous online workers (Turkers) willing to complete the tasks.
Dataset Construction
to create their dataset, such as restricting the task to Turkers located in the United States who maintain an approval rating of at least 90%.
Experiments
Specifically, we reframe it as an intra-domain multi-class classification task, where given the labeled training data from one domain, we learn a classifier to classify reviews according to their source, i.e., Employee, Turker and Customer.
Feature-based Additive Model
If we instead used an SVM, for example, we would have to train classifiers one by one (due to the distinct features from different sources) to draw conclusions regarding the differences between Turker, Expert, and truthful reviews, between positive and negative expert reviews, or between reviews from different domains.
Feature-based Additive Model
y_source ∈ {employee, turker, customer}
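As a hedged illustration of this three-way source classification, the sketch below uses a generic scikit-learn pipeline rather than the paper's feature-based additive model; the reviews and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy reviews with a source label; the real task uses the Employee / Turker /
# Customer reviews from the constructed dataset.
reviews = [
    ("The room was spotless and the staff went out of their way to help.", "customer"),
    ("Absolutely the best hotel in the city, perfect in every way!", "turker"),
    ("Our award-winning spa and concierge team guarantee a flawless stay.", "employee"),
    ("The bed was comfortable but street noise kept us up at night.", "customer"),
]
texts, labels = zip(*reviews)

# A single multi-class classifier, in contrast to training separate SVMs
# for each pairwise comparison as discussed above.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["The staff was friendly and check-in was quick."]))
```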
Introduction
Despite the advantages of soliciting deceptive gold-standard material from Turkers (it is easy, large-scale, and affordable), it is unclear whether Turkers are representative of the general population that generates fake reviews; in other words, Ott et al.'s data set may correspond to only one type of online deceptive opinion spam: fake reviews generated by people who have never been to the offerings or experienced the entities.
Introduction
In contrast to existing work (Ott et al., 2011; Li et al., 2013b), our new gold standard includes three types of reviews: domain expert deceptive opinion spam (Employee), crowdsourced deceptive opinion spam (Turker), and truthful Customer reviews (Customer).
Related Work
created a gold-standard collection by employing Turkers to write fake reviews, and followup research was based on their data (Ott et al., 2012; Ott et al., 2013; Li et al., 2013b; Feng and Hirst, 2013).
Turkers is mentioned in 19 sentences in this paper.
Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris
Crowdsourcing Translation
52 different Turkers took part in the translation task, each translating 138 sentences on average.
Crowdsourcing Translation
In the editing task, 320 Turkers participated, averaging 56 sentences each.
Problem Formulation
We form two graphs: the first graph (GT) represents Turkers (translator/editor pairs) as nodes; the second graph (G0) represents candidate translated and edited sentences as nodes.
Problem Formulation
GT = (VT, ET) is a weighted undirected graph representing collaborations between Turkers.
Problem Formulation
The mutual reinforcement framework couples the two random walks on GT and G0 that rank candidates and Turkers in isolation.
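The coupling can be illustrated with a generic co-ranking sketch: two random walks, one on the Turker graph and one on the candidate graph, each mix their own step with scores propagated through a Turker-candidate authorship matrix. This is an assumed formulation for illustration only, not the authors' exact update rules.

```python
import numpy as np

def coupled_ranking(W_t, W_c, A, alpha=0.85, iters=50):
    """Mutual-reinforcement ranking sketch.

    W_t: Turker-Turker collaboration weights (n_t x n_t)
    W_c: candidate-candidate similarity weights (n_c x n_c)
    A:   Turker-candidate authorship weights (n_t x n_c)
    """
    def row_norm(M):
        s = M.sum(axis=1, keepdims=True).astype(float)
        s[s == 0] = 1.0
        return M / s

    Pt, Pc, Pa = row_norm(W_t), row_norm(W_c), row_norm(A)
    t = np.full(W_t.shape[0], 1.0 / W_t.shape[0])  # Turker scores
    c = np.full(W_c.shape[0], 1.0 / W_c.shape[0])  # candidate scores
    for _ in range(iters):
        # Each ranking takes a random-walk step on its own graph and mixes in
        # evidence propagated from the other graph.
        t = alpha * (t @ Pt) + (1 - alpha) * (c @ Pa.T)
        c = alpha * (c @ Pc) + (1 - alpha) * (t @ Pa)
        t, c = t / t.sum(), c / c.sum()
    return t, c

# Tiny example: 3 Turkers, 4 candidate translations.
W_t = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
W_c = np.ones((4, 4)) - np.eye(4)
A = np.array([[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
turker_scores, candidate_scores = coupled_ranking(W_t, W_c, A)
print(turker_scores, candidate_scores)
```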
Related work
different Turkers for a collection of Urdu sentences that had been previously professionally translated by the Linguistic Data Consortium.
Related work
They also hired US-based Turkers to edit the translations, since the translators were largely based in Pakistan and exhibited errors that are characteristic of speakers of English as a second language.
Turkers is mentioned in 25 sentences in this paper.
Soni, Sandeep and Mitra, Tanushree and Gilbert, Eric and Eisenstein, Jacob
Annotation
To ensure quality control we required the Turkers to have at least an 85% HIT approval rating and to reside in the United States, because the Twitter messages in our dataset were related to American politics.
Annotation
we obtained five independent ratings from Turkers satisfying the above qualifications.
Annotation
We also allowed a “Not Applicable” option to capture ratings where the Turkers did not have sufficient knowledge about the statement or the statement was not really a claim.
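For concreteness, here is a boto3 sketch of posting a HIT with those two qualification requirements and five assignments per item; the region, task URL, title, reward, and timing values are placeholders rather than details from the paper, and AWS credentials are assumed to be configured.

```python
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# ExternalQuestion pointing at a hypothetical page that hosts the rating form.
question_xml = """
<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/claim-rating-task</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>
"""

hit = mturk.create_hit(
    Title="Rate statements from political tweets",   # placeholder title
    Description="Judge claims extracted from Twitter messages.",
    Reward="0.10",                                    # placeholder reward
    MaxAssignments=5,                                 # five independent ratings
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=question_xml,
    QualificationRequirements=[
        {   # at least 85% HIT approval rating
            "QualificationTypeId": "000000000000000000L0",
            "Comparator": "GreaterThanOrEqualTo",
            "IntegerValues": [85],
        },
        {   # located in the United States
            "QualificationTypeId": "00000000000000000071",
            "Comparator": "EqualTo",
            "LocaleValues": [{"Country": "US"}],
        },
    ],
)
print(hit["HIT"]["HITId"])
```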
Turkers is mentioned in 5 sentences in this paper.
Daxenberger, Johannes and Gurevych, Iryna
Corpus
It was important to find a reasonable number of corresponding edit-turn-pairs before the actual annotation could take place, as we needed a certain number of positive seeds to keep turkers from simply labeling pairs as non-corresponding all the time.
Corpus
The resulting 750 pairs have each been annotated by five turkers.
Corpus
The turkers were presented the turn text, the turn topic name, the edit in its context, and the edit comment (if present).
Turkers is mentioned in 4 sentences in this paper.
Kang, Jun Seok and Feng, Song and Akoglu, Leman and Choi, Yejin
Evaluation 11: Human Evaluation on ConnotationWordNet
We first describe the labeling process of sense-level connotation: We selected 350 polysemous words and one of their senses, and each Turker was asked to rate the connotative polarity of a given word (or of a given sense), from -5 to 5, with 0 being neutral. For each word, we asked 5 Turkers to rate it and took the average of the 5 ratings as the connotative intensity score of the word.
Evaluation 11: Human Evaluation on ConnotationWordNet
Because senses in WordNet can be tricky to understand, care should be taken in designing the task so that the Turkers will focus only on the corresponding sense of a word.
Evaluation 11: Human Evaluation on ConnotationWordNet
As an incentive, each Turker was rewarded $0.07 per HIT, which consisted of 10 words to label.
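A minimal sketch of that per-word averaging step, with invented words and ratings:

```python
from statistics import mean

# Five hypothetical Turker ratings (-5 to 5) per word.
ratings = {
    "gift": [3, 4, 2, 5, 3],
    "accident": [-4, -3, -5, -2, -4],
    "outcome": [0, 1, -1, 0, 0],
}

# The average of the five ratings is taken as the connotative intensity score.
intensity = {word: mean(scores) for word, scores in ratings.items()}
print(intensity)  # e.g. {'gift': 3.4, 'accident': -3.6, 'outcome': 0}
```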
Turkers is mentioned in 3 sentences in this paper.