"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives

Texts and dialogues often express information indirectly.

An important challenge for natural language processing is learning not only basic linguistic meanings but also how those meanings are systematically enriched when expressed in context.

Indirect speech acts are studied by Clark (1979), Perrault and Allen (1980), Allen and Perrault (1980) and Asher and Lascarides (2003), who identify a wide range of factors that govern how speakers convey their intended messages and how hearers seek to uncover those messages from uncertain and conflicting signals.

Since indirect answers are likely to arise in interviews, we gathered instances of question-answer pairs involving gradable modifiers (which serve to evaluate the learning techniques) from online CNN interview transcripts of five shows aired between 2000 and 2008 (Anderson Cooper, Larry King Live, Late Edition, Lou Dobbs Tonight, The Situation Room).

In this section, we present the methods we propose for grounding the meanings of scalar modifiers.

Our primary goal is to evaluate how well we can learn the relevant scalar and entailment relationships from the Web.

Performance is extremely good on the "Adverb - same adjective" and "Negation - same adjective" cases because the 'Yes' answer is fairly direct for them (though adverbs like *basically* introduce an interesting level of uncertainty).

We set out to find techniques for grounding basic meanings from text and enriching those meanings based on information from the immediate linguistic context.

Appears in 13 sentences as: Turker (2) Turkers (8) Turkers’ (3)

In *"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives*

- Our experimental results closely match the Turkers' response data, demonstrating that meanings can be learned from Web data and that such meanings can drive pragmatic inference. (Page 1, "Abstract")
- Given a written dialogue between speakers A and B, Turkers were asked to judge what B's answer conveys: 'definite yes', 'probable yes', 'uncertain', 'probable no', 'definite no'. (Page 3, "Corpus description")
- For each dialogue, we got answers from 30 Turkers, and we took the dominant response as the correct one, though we make extensive use of the full response distributions in evaluating our approach. We also computed entropy values for the distribution of answers for each item. (Page 3, "Corpus description")
- 120 Turkers were involved (the median number of items done was 28 and the mean 56.5). (Page 3, "Corpus description")
- In the case of the scalar modifiers experiment, there were just two examples whose dominant response from the Turkers was 'Uncertain', so we have left that category out of the results. (Page 7, "Evaluation and results")
- We count an inference as successful if it matches the dominant Turker response category. (Page 7, "Evaluation and results")
- Figure 5: Correlation between agreement among Turkers and whether the system gets the correct answer. (Page 8, "Analysis and discussion")
- For each dialogue, we plot a circle at Turker response entropy and either 1 = correct inference or 0 = incorrect inference, except the points are jittered a little vertically to show where the mass of data lies. (Page 8, "Analysis and discussion")
- …correlate almost perfectly with the Turkers' responses. (Page 9, "Analysis and discussion")
- In the latter case only, the system output (uncertain) doesn't correlate with the Turkers' judgment (where the dominant answer is 'probable yes' with 15 responses, and 11 answers are 'uncertain'). (Page 9, "Analysis and discussion")
- If we restrict attention to the 66 examples on which the Turkers completely agreed about which of these three categories was intended (again pooling 'probable' and 'definite'), then the percentage of correct inferences rises to 89% (59 correct inferences). (Page 9, "Analysis and discussion")
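The dominant-response and entropy computations described in the excerpts above are not spelled out on this page; here is a minimal stdlib-only sketch, assuming each dialogue simply has a list of its 30 Turker labels (the data and the function name are illustrative):

```python
import math
from collections import Counter

def response_stats(labels):
    """Return the dominant response and the entropy (in bits) of the label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    dominant = counts.most_common(1)[0][0]
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return dominant, entropy

# Example: 30 hypothetical Turker judgments for one dialogue
labels = ["definite yes"] * 15 + ["probable yes"] * 10 + ["uncertain"] * 5
dominant, h = response_stats(labels)
print(dominant, round(h, 3))
```

Low entropy means the Turkers largely agreed on one category; a uniform split over all five categories would give the maximum of log2(5) ≈ 2.32 bits.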


Appears in 6 sentences as: Mechanical Turk (6)

- To evaluate the methods, we collected examples of question-answer pairs involving scalar modifiers from CNN transcripts and the Dialog Act corpus, and we use response distributions from Mechanical Turk workers to assess the degree to which each answer conveys 'yes' or 'no'. (Page 1, "Abstract")
- Table 2: Mean entropy values and standard deviation obtained in the Mechanical Turk experiment for each question-answer pair category. (Page 3, "Corpus description")
- To assess the degree to which each answer conveys 'yes' or 'no' in context, we use response distributions from Mechanical Turk workers. (Page 3, "Corpus description")
- Despite variation in individual judgments, aggregate annotations done with Mechanical Turk have been shown to be reliable (Snow et al., 2008; Sheng et al., 2008; Munro et al., 2010). (Page 3, "Corpus description")
- Figure 1: Design of the Mechanical Turk experiment. (Page 4, "Corpus description")
- To evaluate the techniques, we pool the Mechanical Turk 'definite yes' and 'probable yes' categories into a single category 'Yes', and we do the same for 'definite no' and 'probable no'. (Page 7, "Evaluation and results")
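The pooling step in the last excerpt can be sketched as a simple label mapping (the function name is ours; the five category strings follow the scheme used on this page):

```python
def pool(label):
    """Collapse the five-way Turker scheme into Yes / No / Uncertain."""
    if label in ("definite yes", "probable yes"):
        return "Yes"
    if label in ("definite no", "probable no"):
        return "No"
    return "Uncertain"

print([pool(l) for l in ["probable yes", "definite no", "uncertain"]])
```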


Appears in 5 sentences as: logistic regression (5)

- A logistic regression model can capture these facts. (Page 6, "Methods")
- We extract ages from the positive and negative snippets obtained, and we fit a logistic regression to these data. (Page 6, "Methods")
- The logistic regression thus has only one factor: the unit of measure (age in the case of *little kids*). (Page 6, "Methods")
- The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001). (Page 8, "Analysis and discussion")
- Figure 5 plots the relationship between the response entropy and the accuracy of our decision procedure, along with a fitted logistic regression model using response entropy to predict whether our system's inference was correct. (Page 9, "Analysis and discussion")
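The one-factor model in these excerpts (e.g. age predicting whether a snippet came from the positive or negative query set) is not given explicitly here; the following is a minimal stdlib-only sketch, not the authors' implementation, fitting P(positive | x) = 1 / (1 + e^-(b0 + b1*x)) by gradient ascent on the log-likelihood, with made-up ages and labels:

```python
import math

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit a one-factor logistic regression by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p        # gradient w.r.t. the intercept
            g1 += (y - p) * x  # gradient w.r.t. the slope
        b0 += lr * g0 / len(xs)
        b1 += lr * g1 / len(xs)
    return b0, b1

# Toy data: ages labeled 1 if drawn from "little kid" (positive) snippets, else 0
ages = [2, 3, 4, 5, 6, 8, 10, 12]
pos  = [1, 1, 1, 1, 0, 0, 0, 0]
b0, b1 = fit_logistic(ages, pos)
print(b0, b1)  # slope is negative: higher age, lower probability of "little kid"
```

In practice one would use a statistics package (e.g. statsmodels' `Logit` or scikit-learn's `LogisticRegression`) rather than hand-rolled gradient ascent; the sketch only shows the shape of the one-factor model.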


Appears in 3 sentences as: regression model (3)

- A logistic regression model can capture these facts. (Page 6, "Methods")
- The fitted logistic regression model (black line) has a statistically significant coefficient for response entropy (p < 0.001). (Page 8, "Analysis and discussion")
- Figure 5 plots the relationship between the response entropy and the accuracy of our decision procedure, along with a fitted logistic regression model using response entropy to predict whether our system's inference was correct. (Page 9, "Analysis and discussion")


Appears in 3 sentences as: WordNet (3)

- We also replace the negation and the adjective by the antonyms given in WordNet (using the first sense). (Page 6, "Methods")
- (2008) use WordNet to develop sentiment lexicons in which each word has a positive or negative value associated with it, representing its strength. (Page 7, "Analysis and discussion")
- The algorithm begins with seed sets of positive, negative, and neutral terms, and then uses the synonym and antonym structure of WordNet to expand those initial sets and refine the relative strength values. (Page 7, "Analysis and discussion")
