Danescu-Niculescu-Mizil, Cristian and Cheng, Justin and Kleinberg, Jon and Lee, Lillian
Article Structure
Abstract
Understanding the ways in which information achieves widespread public awareness is a research question of significant interest.
Hello. My name is Inigo Montoya.
Understanding what items will be retained in the public consciousness, and why, is a question of fundamental interest in many domains, including marketing, politics, entertainment, and social media; as we all know, many items barely register, whereas others catch on and take hold in many people’s minds.
I’m ready for my closeup.
2.1 Data
Never send a human to do a machine’s job.
We now discuss experiments that investigate the hypotheses discussed in §1.
A long time ago, in a galaxy far, far away
How an item’s linguistic form affects the reaction it generates has been studied in several contexts, including evaluations of product reviews [9], political speeches [12], online posts [13], scientific papers [14], and retweeting of Twitter posts [36].
I think this is the beginning of a beautiful friendship.
Motivated by the broad question of what kinds of information achieve widespread public awareness, we studied the effect of phrasing on a quote’s memorability.
Topics
language model
Appears in 13 sentences as: language model (6), language models (4), language models’ (1), language” model (2), language” models (2)
In You Had Me at Hello: How Phrasing Affects Memorability
- First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts.
Page 2, “Hello. My name is Inigo Montoya.”
- In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes.
Page 3, “Hello. My name is Inigo Montoya.”
- In order to assess different levels of lexical and syntactic distinctiveness, we employ a total of six Laplace-smoothed language models: 1-gram, 2-gram, and 3-gram word LMs and 1-gram, 2-gram, and 3-gram part-of-speech LMs.
Page 5, “Never send a human to do a machine’s job.”
- As indicated in Table 3, for each of our lexical “common language” models, in about 60% of the quote pairs, the memorable quote is more distinctive.
Page 5, “Never send a human to do a machine’s job.”
- The language models’ vocabulary was that of the entire training corpus.
Page 5, “Never send a human to do a machine’s job.”
- Table 3: Distinctiveness: percentage of quote pairs in which the memorable quote is more distinctive than the non-memorable one according to the respective “common language” model.
Page 6, “Never send a human to do a machine’s job.”
- Specifically, we train one language model on memorable quotes and another on non-memorable quotes.
Page 6, “Never send a human to do a machine’s job.”
- (Non)memorable language models | Slogans   | Newswire
  lexical 1-gram                 | 56.15%**  | 33.77%***
  lexical 2-gram                 | 51.51%    | 25.15%***
  lexical 3-gram                 | 52.44%    | 28.89%***
  syntactic 1-gram               | 73.09%*** | 68.27%***
  syntactic 2-gram               | 64.04%*** | 50.21%
  syntactic 3-gram               | 62.88%*** | 55.09%***
Page 7, “Never send a human to do a machine’s job.”
- Table 5: Cross-domain concept of “memorable” language: percentage of slogans that have higher likelihood under the memorable language model than under the non-memorable one (for each of the six language models considered).
Page 7, “Never send a human to do a machine’s job.”
- Rightmost column: for reference, the percentage of newswire sentences that have higher likelihood under the memorable language model than under the non-memorable one.
Page 7, “Never send a human to do a machine’s job.”
- We also note that the higher likelihood of slogans under a “memorable language” model is not simply occurring for the trivial reason that this model predicts all other large bodies of text better.
Page 7, “Never send a human to do a machine’s job.”
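A minimal sketch of the distinctiveness scoring described above, assuming whitespace-tokenized input, `<s>`/`</s>` padding, and raw (not length-normalized) log-likelihoods; the class and function names are hypothetical, and the paper's exact padding and smoothing details may differ. It trains an add-one (Laplace) smoothed n-gram model whose vocabulary is taken from the training corpus, then computes a Table 3 style statistic: the percentage of (memorable, non-memorable) pairs in which the memorable quote has lower likelihood, i.e. is more distinctive. For the part-of-speech models, the same class would be trained on tag sequences instead of words (tagging is not shown).

```python
import math
from collections import Counter


class LaplaceNgramLM:
    """Add-one (Laplace) smoothed n-gram model over token sequences.

    Hypothetical helper: sequences are padded with <s> and </s>, and the
    vocabulary size V counts every distinct training token, padding included.
    """

    def __init__(self, sentences, n):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        vocab = set()
        for tokens in sentences:
            padded = self._pad(tokens)
            vocab.update(padded)
            for gram in self._ngrams(padded):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        self.vocab_size = len(vocab)

    def _pad(self, tokens):
        return ["<s>"] * (self.n - 1) + list(tokens) + ["</s>"]

    def _ngrams(self, padded):
        n = self.n
        for i in range(n - 1, len(padded)):
            yield tuple(padded[i - n + 1 : i + 1])

    def log_likelihood(self, tokens):
        """Sum of log P(w | context) with add-one smoothing."""
        total = 0.0
        for gram in self._ngrams(self._pad(tokens)):
            num = self.ngram_counts[gram] + 1
            den = self.context_counts[gram[:-1]] + self.vocab_size
            total += math.log(num / den)
        return total


def percent_memorable_more_distinctive(lm, pairs):
    """pairs: (memorable, non_memorable) token lists. Returns the percentage
    of pairs in which the memorable quote is less likely under lm."""
    wins = sum(
        1 for mem, non in pairs if lm.log_likelihood(mem) < lm.log_likelihood(non)
    )
    return 100.0 * wins / len(pairs)


# Toy stand-in for the newswire training data.
newswire = [["the", "market", "rose", "today"], ["the", "senate", "voted", "today"]]
bigram_lm = LaplaceNgramLM(newswire, n=2)
pairs = [(["you", "had", "me", "at", "hello"], ["the", "market", "rose"])]
score = percent_memorable_more_distinctive(bigram_lm, pairs)
```

In this toy case the memorable quote consists entirely of unseen bigrams, so it scores lower and `score` is 100.0; with real data the percentage is computed over all quote pairs for each of the six models.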
See all papers in Proc. ACL 2012 that mention language model.
bag-of-words
Appears in 5 sentences as: bag-of-words (5)
- Our first formulation of the prediction task uses a standard bag-of-words model.
Page 7, “Never send a human to do a machine’s job.”
- If there were no information in the textual content of a quote to determine whether it were memorable, then an SVM employing bag-of-words features should perform no better than chance.
Page 7, “Never send a human to do a machine’s job.”
- Even a relatively small number of distinctiveness features, on their own, improve significantly over the much larger bag-of-words model.
Page 7, “Never send a human to do a machine’s job.”
- Thus, the main conclusion from these prediction tasks is that abstracting notions such as distinctiveness and generality can produce relatively streamlined models that outperform much heavier-weight bag-of-words models, and can suggest steps toward approaching the performance of human judges who — very much unlike our system — have the full cultural context in which movies occur at their disposal.
Page 7, “Never send a human to do a machine’s job.”
- Accuracies statistically significantly greater than bag-of-words according to a two-tailed t-test are indicated with *(p<.05) and **(p<.01).
Page 8, “Never send a human to do a machine’s job.”
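The pairwise bag-of-words setup can be sketched as follows, under assumptions not stated in the text: quotes are lowercased token lists, and an ordered pair (A, B) is encoded as the word-count vector of A minus that of B, labeled +1 when A is the memorable quote. Adding each pair in both orders, (x, +1) and (-x, -1), keeps the labels balanced, so a classifier that finds no signal in the words stays at the 50% chance level mentioned above. The helper names are hypothetical; the resulting vectors could be fed to any linear SVM (for example scikit-learn's `LinearSVC`), which is not shown here.

```python
from collections import Counter


def build_vocab(quotes):
    """Sorted vocabulary over all training quotes (lists of tokens)."""
    return sorted({tok.lower() for q in quotes for tok in q})


def bow(tokens):
    """Bag-of-words counts for one quote, lowercased."""
    return Counter(tok.lower() for tok in tokens)


def pair_features(quote_a, quote_b, vocab):
    """Hypothetical pair encoding: bow(A) - bow(B) over a fixed vocabulary."""
    a, b = bow(quote_a), bow(quote_b)
    return [a[w] - b[w] for w in vocab]


def training_examples(pairs, vocab):
    """pairs: (memorable, non_memorable) token lists.
    Each pair yields both orderings so the two labels are balanced."""
    examples = []
    for mem, non in pairs:
        x = pair_features(mem, non, vocab)
        examples.append((x, 1))
        examples.append(([-v for v in x], -1))
    return examples


pairs = [(["You", "had", "me", "at", "hello"], ["I", "had", "a", "sandwich"])]
vocab = build_vocab([q for pair in pairs for q in pair])
examples = training_examples(pairs, vocab)
```

Encoding the difference of the two vectors, rather than concatenating them, makes swapping the quotes simply negate the features, which suits a linear classifier.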
part-of-speech
Appears in 4 sentences as: part-of-speech (5)
- Interestingly, this distinctiveness takes place at the level of words, but not at the level of other syntactic features: the part-of-speech composition of memorable quotes is in fact more likely with respect to newswire.
Page 2, “Hello. My name is Inigo Montoya.”
- Thus, we can think of memorable quotes as consisting, in an aggregate sense, of unusual word choices built on a scaffolding of common part-of-speech patterns.
Page 2, “Hello. My name is Inigo Montoya.”
- In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes.
Page 3, “Hello. My name is Inigo Montoya.”
- We then develop models using features based on the measures formulated earlier in this section: generality measures (the four listed in Table 4); distinctiveness measures (likelihood according to 1-, 2-, and 3-gram “common language” models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them); and similarity-to-slogans measures (likelihood according to 1-, 2-, and 3-gram slogan-language models at the lexical and part-of-speech level for each quote in the pair, their differences, and pairwise comparisons between them).
Page 7, “Never send a human to do a machine’s job.”
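The pair features listed in the last sentence above could be assembled roughly as sketched below. This assumes each model's log-likelihoods for the two quotes have already been computed, and reads "pairwise comparisons" as a 0/1 indicator of which quote scores lower; the model names and functions are hypothetical, and the generality measures from Table 4 are passed in as opaque numbers because they are not listed here.

```python
def pair_likelihood_features(scores):
    """scores maps a model name (e.g. 'lex-2gram', 'pos-3gram') to a tuple
    (ll_a, ll_b): log-likelihoods of quote A and quote B under that model.
    For each model, in sorted-name order, emit ll_a, ll_b, their difference,
    and a 0/1 indicator that A is less likely (more distinctive) than B."""
    feats = []
    for name in sorted(scores):
        ll_a, ll_b = scores[name]
        feats.extend([ll_a, ll_b, ll_a - ll_b, 1.0 if ll_a < ll_b else 0.0])
    return feats


def pair_feature_vector(common_scores, slogan_scores, generality_a, generality_b):
    """Concatenate generality, distinctiveness ("common language" models),
    and similarity-to-slogans features for one quote pair."""
    return (
        list(generality_a)
        + list(generality_b)
        + pair_likelihood_features(common_scores)
        + pair_likelihood_features(slogan_scores)
    )


# Placeholder values, one model per family, for illustration only.
vec = pair_feature_vector(
    {"lex-1gram": (-10.0, -8.0)},
    {"lex-1gram": (-7.0, -9.5)},
    [0.2, 1.0],
    [0.5, 0.0],
)
```

Sorting the model names fixes the feature order, so vectors built for different pairs line up column by column.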
human judgments
Appears in 3 sentences as: human judges (1), human judgments (2)
- None of these observations, however, serve as definitions, and indeed, we believe it desirable to not pre-commit to an abstract definition, but rather to adopt an operational formulation based on external human judgments .
Page 2, “Hello. My name is Inigo Montoya.”
- In designing our study, we focus on a domain in which (i) there is rich use of language, some of which has achieved deep cultural penetration; (ii) there already exist a large number of external human judgments — perhaps implicit, but in a form we can extract; and (iii) we can control for the setting in which the text was used.
Page 2, “Hello. My name is Inigo Montoya.”
- Thus, the main conclusion from these prediction tasks is that abstracting notions such as distinctiveness and generality can produce relatively streamlined models that outperform much heavier-weight bag-of-words models, and can suggest steps toward approaching the performance of human judges who — very much unlike our system — have the full cultural context in which movies occur at their disposal.
Page 7, “Never send a human to do a machine’s job.”
model trained
Appears in 3 sentences as: model trained (3), models trained (1)
- First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts.
Page 2, “Hello. My name is Inigo Montoya.”
- In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes.
Page 3, “Hello. My name is Inigo Montoya.”
- In particular, the newswire section of the Brown corpus is predicted better at the lexical level by the language model trained on non-memorable quotes.
Page 7, “Never send a human to do a machine’s job.”
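The cross-domain comparison behind Table 5 and the newswire check can be illustrated with a small sketch: train one model on memorable quotes and one on non-memorable quotes, then report the percentage of other texts (slogans, or newswire sentences) that score strictly higher under the first. To keep it short, this uses Laplace-smoothed unigram models with a vocabulary size shared between the two models; the shared vocabulary, the toy data, and all names are assumptions, not details from the paper.

```python
import math
from collections import Counter


def train_unigram(sentences, vocab_size):
    """Laplace-smoothed unigram model; vocab_size is shared across models."""
    counts = Counter(tok for s in sentences for tok in s)
    return counts, sum(counts.values()), vocab_size


def log_likelihood(model, tokens):
    counts, total, vocab_size = model
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens)


def percent_higher_under_first(model_a, model_b, texts):
    """Percentage of texts with strictly higher likelihood under model_a."""
    wins = sum(
        1 for t in texts if log_likelihood(model_a, t) > log_likelihood(model_b, t)
    )
    return 100.0 * wins / len(texts)


memorable = [["you", "had", "me", "at", "hello"]]
non_memorable = [["i", "went", "to", "the", "store"]]
shared_v = len({t for s in memorable + non_memorable for t in s})
mem_lm = train_unigram(memorable, shared_v)
non_lm = train_unigram(non_memorable, shared_v)
slogans = [["hello", "world"]]
pct = percent_higher_under_first(mem_lm, non_lm, slogans)
```

Running the same function on newswire sentences gives the reference column: if the memorable-quote model simply fit all text better, that percentage would also be high, which is the trivial explanation the text rules out.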
statistically significant
Appears in 3 sentences as: statistically significant (2), statistically significantly (1)
- For the null hypothesis of random guessing, these results are statistically significant, p < 2^-6 ≈ .016.
Page 4, “I’m ready for my closeup.”
- Table 2 shows that all the subjects performed (sometimes much) better than chance, and against the null hypothesis that all subjects are guessing randomly, the results are statistically significant, p < 2^-6 ≈ .016.
Page 4, “I’m ready for my closeup.”
- Accuracies statistically significantly greater than bag-of-words according to a two-tailed t-test are indicated with *(p<.05) and **(p<.01).
Page 8, “Never send a human to do a machine’s job.”
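The bound p < 2^-6 ≈ .016 is consistent with a one-sided sign test in which all of six independent comparisons came out above chance: under random guessing each does so with probability at most 1/2, so all six do with probability at most (1/2)^6 = 1/64. Treating the exponent as a count of six independent outcomes is our reading, not something stated here. The sketch below computes the exact sign-test tail with the standard library; the function name is hypothetical.

```python
from math import comb


def sign_test_p(successes, n):
    """One-sided sign test: P(X >= successes) for X ~ Binomial(n, 1/2)."""
    if not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n")
    return sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n


p_all_six = sign_test_p(6, 6)  # 1/64 = 0.015625, which rounds to .016
```

The same helper gives the tail probability when only some of the outcomes beat chance, e.g. `sign_test_p(5, 6)`.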