The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter
Tan, Chenhao and Lee, Lillian and Pang, Bo

Article Structure

Abstract

Consider a person trying to spread an important message on a social network.

Introduction

How does one make a message “successful”?

Topics

unigrams

Appears in 14 sentences as: unigram (9) unigrams (11)
  1. twitter unigram TTT * YES (54%); twitter bigram TTT * YES (52%); personal unigram MT * YES (52%); personal bigram — NO (48%)
    Page 6, “Introduction”
  2. We measure a tweet’s similarity to expectations by its score according to the relevant language model, $\frac{1}{|T|}\sum_{w \in T}\log(p(w))$, where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history. (A short illustrative sketch follows this list.)
    Page 6, “Introduction”
  3. headline unigram TT YES (53%); headline bigram TTTT * YES (52%)
    Page 6, “Introduction”
  4. We also consider tagged bag-of-words (“BOW”) features, which include all the unigram (word:POS pair) and bigram features that appear more than 10 times in the cross-validation data.
    Page 7, “Introduction”
  5. This yields 3,568 unigram features and 4,095 bigram features, for a total of 7,663 so-called 1,2-gram features.
    Page 7, “Introduction”
  6. Furthermore, note the superior performance of unigrams trained on TAC data vs fiTAC+ff+time — which is similar to our unigrams but trained on a larger but non-TAC dataset that included metadata.
    Page 8, “Introduction”
  7. Not surprisingly, the TAC-trained BOW features (unigram and 1,2-gram) show impressive predictive power in this task: many of our custom features can be captured by bag-of-words features, in a way.
    Page 8, “Introduction”
  8. The top three rows of Table 12 show the best custom and best and worst unigram features for our method; the bottom two rows show the best and worst unigrams for fiTAC+ff+time.
    Page 9, “Introduction”
  9. As for unigram features, not surprisingly, “rt” and “retweet” are top features for both our approach and fiTAC+ff+time.
    Page 9, “Introduction”
  10. However, the other unigrams for the two methods seem to be a bit different in spirit.
    Page 9, “Introduction”
  11. Some of the unigrams determined to be most poor only by our method appear to be both surprising and yet plausible in retrospect: “icymi” (abbreviation for “in case you missed it”) tends to indicate a direct repetition of older information, so people might prefer to retweet the earlier version; “thanks” and “sorry” could correspond to personal thank-yous and apologies not meant to be shared with a broader audience, and similarly @-mentioning another user may indicate a tweet intended only for that person.
    Page 9, “Introduction”
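
A minimal sketch of the language-model scoring described in item 2 above. This is illustrative only, not the authors' code: the add-one smoothing, the assumed vocabulary size, and every function name here are placeholders introduced for illustration; the 1/|T| averaging, the placeholder-token handling (footnote 16, quoted in the bigram section below), and the community/personal training corpora follow the quoted sentences.

```python
import math
from collections import Counter

# Placeholder tokens: ignored for unigram scoring, retained for bigram scoring
# (per footnote 16 as quoted below).
PLACEHOLDERS = {"[at]", "[hashtag]", "[url]"}

def train_unigram_model(corpus_tokens, vocab_size=1_000_000):
    """Maximum-likelihood unigram model with add-one smoothing.
    (The smoothing scheme and vocabulary size are simplifying assumptions.)"""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    return lambda w: (counts[w] + 1) / (total + vocab_size)

def lm_score(tweet_tokens, p, use_bigrams=False):
    """Average log-probability of a tweet under model p:
    (1/|T|) * sum over w in T of log p(w), where T is the tweet's unigrams or bigrams.
    For bigram scoring, pass a model p that accepts (w1, w2) pairs."""
    if use_bigrams:
        units = list(zip(tweet_tokens, tweet_tokens[1:]))  # placeholders retained
    else:
        units = [w for w in tweet_tokens if w not in PLACEHOLDERS]
    if not units:
        return float("-inf")
    return sum(math.log(p(u)) for u in units) / len(units)
```

In the paper, one such model is trained on the 558M unpaired tweets (the Twitter-community model) and one per author (the personal model); scoring a tweet under each gives the conformity-to-community and conformity-to-one's-own-past features summarized in Table 5.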

bigram

Appears in 7 sentences as: bigram (8)
  1. twitter unigram TTT * YES (54%); twitter bigram TTT * YES (52%); personal unigram MT * YES (52%); personal bigram — NO (48%)
    Page 6, “Introduction”
  2. We measure a tweet’s similarity to expectations by its score according to the relevant language model, $\frac{1}{|T|}\sum_{w \in T}\log(p(w))$, where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
    Page 6, “Introduction”
  3. (Footnote 16) The tokens [at], [hashtag], [url] were ignored in the unigram-model case to prevent their undue influence, but retained in the bigram model to capture longer-range usage (“combination”) patterns.
    Page 6, “Introduction”
  4. headline unigram TT YES (53%); headline bigram TTTT * YES (52%)
    Page 6, “Introduction”
  5. We also consider tagged bag-of-words (“BOW”) features, which include all the unigram (word:POS pair) and bigram features that appear more than 10 times in the cross-validation data. (A rough sketch of this feature construction follows this list.)
    Page 7, “Introduction”
  6. This yields 3,568 unigram features and 4,095 bigram features, for a total of 7,663 so-called 1,2-gram features.
    Page 7, “Introduction”
  7. Our approach, best 15 custom features: twitter bigram, length (chars), rt (the word), retweet (the word), verb, verb retweet score, personal unigram, proper noun, number, noun, positive words, please (the word), proper noun retweet score, indefinite articles (a, an), adjective; best 20 unigrams: rt, retweet, [num], breaking, is, win, never, ., people, need, official, officially, are, please, november, world, girl, !!
    Page 9, “Introduction”
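
A rough sketch of the tagged bag-of-words feature construction referenced in item 5 above. Only the word:POS pairing, the unigram-plus-bigram combination, and the more-than-10-occurrences threshold come from the quoted sentences; the (word, pos) input format, the lower-casing, the binary presence values, and all names here are assumptions made for illustration.

```python
from collections import Counter

def tagged_ngrams(tagged_tweet, n):
    """word:POS units for one tweet; n=1 gives unigram features, n=2 bigram features.
    `tagged_tweet` is a list of (word, pos) pairs, e.g. [("please", "VB"), ("retweet", "VB")]."""
    units = [f"{word.lower()}:{pos}" for word, pos in tagged_tweet]
    return [" ".join(units[i:i + n]) for i in range(len(units) - n + 1)]

def build_bow_vocabulary(tagged_tweets, min_count=10):
    """Keep the unigram and bigram features that appear more than `min_count` times
    in the cross-validation data (3,568 + 4,095 = 7,663 features in the paper)."""
    counts = Counter()
    for tweet in tagged_tweets:
        counts.update(tagged_ngrams(tweet, 1))
        counts.update(tagged_ngrams(tweet, 2))
    return {feat for feat, c in counts.items() if c > min_count}

def bow_features(tagged_tweet, vocabulary):
    """Binary presence features (as a dict) restricted to the retained vocabulary."""
    present = set(tagged_ngrams(tagged_tweet, 1)) | set(tagged_ngrams(tagged_tweet, 2))
    return {feat: 1 for feat in present & vocabulary}
```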

language model

Appears in 6 sentences as: language model (4) language models (4)
  1. This crawling process also yielded 632K TAC pairs whose only difference was spacing, and an additional 558M “unpaired” tweets; as shown later in this paper, we used these extra corpora for computing language models and other auxiliary information.
    Page 3, “Introduction”
  2. Table 5: Conformity to the community and one’s own past, measured via scores assigned by various language models.
    Page 6, “Introduction”
  3. We measure a tweet’s similarity to expectations by its score according to the relevant language model, $\frac{1}{|T|}\sum_{w \in T}\log(p(w))$, where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
    Page 6, “Introduction”
  4. scoring by a language model built from New York Times headlines.
    Page 6, “Introduction”
  5. We group the features introduced in §5.1 into 16 lexicon-based features (Tables 3, 8, 9, 10), 9 informativeness features (Table 4), 6 language model features (Tables 5, 6), 6 rt score features (Table 7), and 2 readability features (Table 11). (A small sketch of how these groups combine follows this list.)
    Page 7, “Introduction”
  6. Among custom features, we see that community and personal language models, informativeness, retweet scores, sentiment, and generality are represented.
    Page 9, “Introduction”
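
To make the grouping in item 5 above concrete, here is a small sketch of assembling the 39 custom features into a single vector. Only the group names and sizes come from the quoted sentence; the extractor interface and every name here are hypothetical.

```python
import numpy as np

# Group sizes as quoted above: 16 + 9 + 6 + 6 + 2 = 39 custom features.
FEATURE_GROUPS = {
    "lexicon": 16,          # Tables 3, 8, 9, 10
    "informativeness": 9,   # Table 4
    "language_model": 6,    # Tables 5, 6
    "rt_score": 6,          # Table 7
    "readability": 2,       # Table 11
}

def custom_feature_vector(tweet, extractors):
    """Concatenate per-group extractors into one 39-dimensional vector.
    `extractors` maps a group name to a function tweet -> list of floats
    (hypothetical; the actual feature definitions live in the paper's tables)."""
    parts = []
    for group, size in FEATURE_GROUPS.items():
        values = list(extractors[group](tweet))
        assert len(values) == size, f"{group}: expected {size} features, got {len(values)}"
        parts.extend(values)
    return np.asarray(parts, dtype=float)
```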

Amazon Mechanical Turk

Appears in 3 sentences as: Amazon Mechanical Turk (3)
  1. In an Amazon Mechanical Turk (AMT) experiment (§4), we found that humans achieved an average accuracy of 61.3%: not that high, but better than chance, indicating that it is somewhat possible for humans to predict greater message spread from different deliveries of the same information.
    Page 2, “Introduction”
  2. We first ran a pilot study on Amazon Mechanical Turk (AMT) to determine whether humans can identify, based on wording differences alone, which of two topic- and author-controlled tweets is spread more widely.
    Page 4, “Introduction”
  3. We outperform the average human accuracy of 61% reported in our Amazon Mechanical Turk experiments (for a different data sample); fiTAC+ff+time fails to do so.
    Page 8, “Introduction”
