Abstract | Integrating work from psychology and computational linguistics, we develop and compare three approaches to detecting deceptive opinion spam, and ultimately develop a classifier that is nearly 90% accurate on our gold-standard opinion spam dataset. |
Conclusion and Future Work | In this work we have developed the first large-scale dataset containing gold-standard deceptive opinion spam. |
Dataset Construction and Human Performance | In this section, we report our efforts to gather (and validate with human judgments) the first publicly available opinion spam dataset with gold-standard deceptive opinions. |
Dataset Construction and Human Performance | To solicit gold-standard deceptive opinion spam using AMT, we create a pool of 400 Human-Intelligence Tasks (HITs) and allocate them evenly across our 20 chosen hotels. |
Introduction | Indeed, in the absence of gold-standard data, related studies (see Section 2) have been forced to utilize ad hoc procedures for evaluation. |
Introduction | In contrast, one contribution of the work presented here is the creation of the first large-scale, publicly available dataset for deceptive opinion spam research, containing 400 truthful and 400 gold-standard deceptive reviews. |
Related Work | Using product review data, and in the absence of gold-standard deceptive opinions, they train models using features based on the review text, reviewer, and product, to distinguish between duplicate opinions (considered deceptive spam) and non-duplicate opinions (considered truthful). |
Related Work | of gold-standard data, based on the distortion of popularity rankings. |
Related Work | Both of these heuristic evaluation approaches are unnecessary in our work, since we compare gold-standard deceptive and truthful opinions. |
Experiments | For each low-frequency code c, we hold out all training documents that include c in their gold-standard code set. |
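The held-out procedure described above can be sketched as follows; the document representation (dicts carrying a `codes` set) is an assumption for illustration, not the paper's actual data format.

```python
def held_out_split(documents, code):
    """Partition documents for a low-frequency code: any document whose
    gold-standard code set contains `code` is held out; the remainder
    stay in the training set."""
    held_out = [d for d in documents if code in d["codes"]]
    train = [d for d in documents if code not in d["codes"]]
    return train, held_out
```

This makes the held-out set disjoint from training, so the model never sees the low-frequency code at training time.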
Method | Labelling: Each candidate code is assigned a binary label (present or absent) based on whether it appears in the gold-standard code set. |
Method | process cannot introduce gold-standard codes that were not proposed by the dictionary. |
Method | The gold-standard code set for the document is used to infer a gold-standard label sequence for these codes (top right). |
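The binary labelling step described in this section can be sketched as below; the function and argument names are assumptions for illustration. Note that, as the text observes, a gold-standard code never proposed by the dictionary cannot appear among the candidates and so cannot be labelled.

```python
def label_candidates(candidate_codes, gold_codes):
    """Assign each dictionary-proposed candidate code a binary label:
    1 (present) if it appears in the document's gold-standard code set,
    0 (absent) otherwise."""
    gold = set(gold_codes)
    return [(c, int(c in gold)) for c in candidate_codes]
```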
Experiments | Finally, we obtain 12,245 tweets, forming the gold-standard data set. |
Experiments | The gold-standard data set is evenly split into two parts: one for training and the other for testing. |
Experiments | Precision measures the percentage of output labels that are correct, recall measures the percentage of labels in the gold-standard data set that are correctly recovered, and F1 is the harmonic mean of precision and recall. |
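These three metrics can be computed directly from the overlap between predicted and gold-standard label sets; a minimal sketch (treating labels as sets, which is an assumption for illustration):

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 over predicted vs. gold-standard labels."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # correctly predicted labels
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, predicting {a, b} against gold {a, c} yields precision 0.5, recall 0.5, and F1 0.5.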
CD | checked the recall of all brackets generated by CCL against gold-standard constituent chunks. |
CD | CCM scores are italicized as a reminder that CCM uses gold-standard POS sequences as input, so its results are not strictly comparable to the others. |
Introduction | Recent work (Headden III et al., 2009; Cohen and Smith, 2009; Hänig, 2010; Spitkovsky et al., 2010) has largely built on the dependency model with valence of Klein and Manning (2004), and is characterized by its reliance on gold-standard part-of-speech (POS) annotations: the models are trained on and evaluated using sequences of POS tags rather than raw tokens. |