Modeling Factuality Judgments in Social Media Text
Sandeep Soni, Tanushree Mitra, Eric Gilbert, and Jacob Eisenstein

Article Structure

Abstract

How do journalists mark quoted content as certain or uncertain, and how do readers interpret these signals?

Introduction

Contemporary journalism is increasingly conducted through social media services like Twitter (Lotan et al., 2011; Hermida et al., 2012).

Text data

We gathered a dataset of Twitter messages from 103 professional journalists and bloggers who work in the field of American politics. Tweets were gathered using Twitter’s streaming API, extracting the complete permissible timeline up to February 23, 2014.
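
The paper does not describe the collection pipeline beyond naming Twitter’s API, but a rough sketch of this kind of timeline gathering, assuming the tweepy library and v1.1-era credentials (the credential strings and helper name below are placeholders):

    import tweepy

    # Placeholder credentials; the authors' actual setup is not described.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
    api = tweepy.API(auth, wait_on_rate_limit=True)

    def fetch_timeline(screen_name, max_tweets=3200):
        """Page through a user's permissible timeline (the API caps it near 3200)."""
        return [status.text
                for status in tweepy.Cursor(api.user_timeline,
                                            screen_name=screen_name,
                                            count=200).items(max_tweets)]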

Annotation

We used Amazon Mechanical Turk (AMT) to collect ratings of claims.
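
The qualification requirements reported later (an 85% HIT approval rating and US residency; see the Turkers entries below) can be stated directly when posting a HIT. A minimal sketch using boto3’s MTurk client; the HIT metadata and question XML are hypothetical, and only the qualification values come from the paper:

    import boto3

    mturk = boto3.client("mturk", region_name="us-east-1")

    QUESTION_XML = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
      <HTMLContent><![CDATA[<p>How certain does the quoted claim seem?</p>]]></HTMLContent>
      <FrameHeight>450</FrameHeight>
    </HTMLQuestion>"""

    mturk.create_hit(
        Title="Rate the certainty of a quoted claim",   # hypothetical metadata
        Description="Read a tweet and rate the factuality of its quoted claim.",
        Reward="0.05",
        MaxAssignments=5,                  # five independent ratings per tweet
        AssignmentDurationInSeconds=600,
        LifetimeInSeconds=86400,
        Question=QUESTION_XML,
        QualificationRequirements=[
            {   # built-in approval-rate qualification: at least 85%
                "QualificationTypeId": "000000000000000000L0",
                "Comparator": "GreaterThanOrEqualTo",
                "IntegerValues": [85],
            },
            {   # built-in locale qualification: must reside in the US
                "QualificationTypeId": "00000000000000000071",
                "Comparator": "EqualTo",
                "LocaleValues": [{"Country": "US"}],
            },
        ],
    )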

Modeling factuality judgments

Having obtained a corpus of factuality ratings, we now model the factors that drive these ratings.
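
Concretely, the paper regresses the mean certainty rating of each tweet on features of the cue word, the source, the journalist, and the claim. A toy sketch of that setup with scikit-learn, using cue-word features only (the data and variable names are illustrative):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LinearRegression

    cues = ["claim", "say", "insist", "say"]   # cue word per tweet (toy data)
    mean_ratings = [2.1, 3.4, 2.0, 3.2]        # mean of five Turker ratings

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(cues)
    model = LinearRegression().fit(X, mean_ratings)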

Related work

Factuality and Veridicality: The creation of FactBank (Sauri and Pustejovsky, 2009) has enabled recent work on the factuality (or “veridicality”) of event mentions in text.

Conclusion

Perceptions of the factuality of quoted content are influenced by the cue words used to introduce them, while extra-linguistic factors, such as the source and the author, did not appear to be relevant in our experiments.

Topics

social media

Appears in 5 sentences as: social media (5)
In Modeling Factuality Judgments in Social Media Text
  1. Contemporary journalism is increasingly conducted through social media services like Twitter (Lotan et al., 2011; Hermida et al., 2012).
    Page 1, “Introduction”
  2. However, less is known about this phenomenon in social media — a domain whose endemic uncertainty makes proper treatment of factuality even more crucial (Morris et al., 2012).
    Page 1, “Introduction”
…research, which focuses on quoted statements in social media text.
    Page 5, “Related work”
Credibility in social media: Recent work in the area of computational social science focuses on understanding credibility cues on Twitter.
    Page 5, “Related work”
  5. The search for reliable signals of information credibility in social media has led to the construction of automatic classifiers to identify credible tweets (Castillo et al., 2011).
    Page 5, “Related work”
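
For illustration only, a classifier in this spirit can be built in a few lines of scikit-learn; the toy features and labels below are stand-ins, not the actual feature set of Castillo et al. (2011):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    tweets = ["BREAKING: sources say ...",
              "Official statement released this morning.",
              "u won't BELIEVE this!!!",
              "Confirmed by two independent reporters."]
    credible = [0, 1, 0, 1]   # toy labels: 1 = credible

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(tweets, credible)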

Turkers

Appears in 5 sentences as: Turkers (5)
In Modeling Factuality Judgments in Social Media Text
To ensure quality control, we required the Turkers to have at least an 85% HIT approval rating and to reside in the United States, because the Twitter messages in our dataset were related to American politics.
    Page 3, “Annotation”
We obtained five independent ratings from Turkers satisfying the above qualifications.
    Page 3, “Annotation”
We also allowed a “Not Applicable” option to capture cases where the Turkers did not have sufficient knowledge about the statement or where the statement was not really a claim.
    Page 3, “Annotation”
Figure 6 shows the set of instructions provided to the Turkers, and Figure 5 illustrates the annotation interface.
    Page 3, “Annotation”
  5. We excluded tweets for which three or more Turkers gave a rating of “Not Applicable,” leaving us with a dataset of 1170 tweets.
    Page 3, “Annotation”
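
Items 2, 3, and 5 above describe the aggregation pipeline: five ratings per tweet, with a tweet dropped when three or more raters chose “Not Applicable.” A minimal sketch, with illustrative data structures:

    def aggregate(ratings_by_tweet):
        """Map tweet id -> mean rating; drop tweets with 3+ "Not Applicable" (None)."""
        kept = {}
        for tweet_id, ratings in ratings_by_tweet.items():
            if sum(r is None for r in ratings) >= 3:   # exclusion rule from item 5
                continue
            valid = [r for r in ratings if r is not None]
            kept[tweet_id] = sum(valid) / len(valid)
        return kept

    print(aggregate({"t1": [3, 4, 3, None, 4],
                     "t2": [None, None, None, 2, 3]}))
    # {'t1': 3.5} -- t2 is excluded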

Linear regression

Appears in 4 sentences as: Linear regression (3), linear regressions (1)
In Modeling Factuality Judgments in Social Media Text
  1. Table 3: Linear regression error rates for each feature group.
    Page 4, “Modeling factuality judgments”
  2. We performed another set of linear regressions , again using the mean certainty rating as the dependent variable.
    Page 4, “Modeling factuality judgments”
Figure 7: Linear regression coefficients for frequently-occurring cue words.
    Page 5, “Modeling factuality judgments”
Figure 8: Linear regression coefficients for cue word groups.
    Page 5, “Modeling factuality judgments”
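
Figures 7 and 8 report per-cue and per-group coefficients. Under the same toy setup sketched earlier, the learned coefficients can be read off a fitted model like so (the data is again illustrative):

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LinearRegression

    cues = ["claim", "say", "insist", "deny", "say", "claim"]   # toy cue words
    ratings = [2.2, 3.1, 2.0, 1.8, 3.3, 2.4]                    # toy mean ratings

    vec = CountVectorizer()
    model = LinearRegression().fit(vec.fit_transform(cues), ratings)

    # List cue words from most to least certainty-raising.
    names = vec.get_feature_names_out()
    for idx in np.argsort(model.coef_)[::-1]:
        print(f"{names[idx]:>8s}  {model.coef_[idx]:+.3f}")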

Mechanical Turk

Appears in 4 sentences as: Mechanical Turk (4)
In Modeling Factuality Judgments in Social Media Text
  1. This dataset was annotated by Mechanical Turk workers who gave ratings for the factuality of the scoped claims in each Twitter message.
    Page 1, “Introduction”
  2. We used Amazon Mechanical Turk (AMT) to collect ratings of claims.
    Page 3, “Annotation”
  3. While these findings must be interpreted with caution, they suggest that readers — at least, Mechanical Turk workers — use relatively little independent judgment to assess the validity of quoted text that they encounter on Twitter.
    Page 4, “Modeling factuality judgments”
  4. (2012) conduct an empirical evaluation of FactBank ratings from Mechanical Turk workers, finding a high degree of disagreement between raters.
    Page 5, “Related work”
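
Disagreement of the kind item 4 reports can be quantified in several ways. A minimal sketch using pairwise exact agreement (chance-corrected measures such as Krippendorff’s alpha would be the more rigorous choice):

    from itertools import combinations

    def pairwise_agreement(ratings):
        """Fraction of rater pairs that gave an item the same label."""
        pairs = list(combinations(ratings, 2))
        return sum(a == b for a, b in pairs) / len(pairs)

    print(pairwise_agreement([3, 3, 4, 3, 5]))   # 0.3 for this toy item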

named entity

Appears in 3 sentences as: named entity (3)
In Modeling Factuality Judgments in Social Media Text
  1. We present a new dataset of Twitter messages that use FactBank predicates (e.g., claim, say, insist) to scope the claims of named entity sources.
    Page 1, “Introduction”
Finally, we restrict consideration to tweets in which the source contains a named entity or Twitter username.
    Page 2, “Text data”
Source: represented by the named entity or username in the source field (see Figure 4); Journalist: represented by their Twitter ID; Claim: represented by a bag-of-words vector from the claim field (Figure 4)
    Page 3, “Modeling factuality judgments”
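
The filter in item 2 above (keep a tweet only if its source span contains a named entity or an @username) can be sketched as below. spaCy is assumed here for NER; the paper does not name its NER tool, and the model name is a placeholder:

    import re
    import spacy

    nlp = spacy.load("en_core_web_sm")      # assumes this model is installed
    USERNAME = re.compile(r"@\w{1,15}")     # Twitter usernames are at most 15 chars

    def source_is_usable(source_text):
        """True if the source contains an @username or any named entity."""
        return bool(USERNAME.search(source_text)) or len(nlp(source_text).ents) > 0

    print(source_is_usable("@SenSanders"))       # True (username)
    print(source_is_usable("the White House"))   # likely True (named entity)
    print(source_is_usable("a friend of mine"))  # likely False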
