Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
Qazvinian, Vahed and Radev, Dragomir R.

Article Structure

Abstract

We analyze collective discourse, a collective human behavior in content generation, and show that it exhibits diversity, a property of general collective systems.

Introduction

In sociology, the term collective behavior is used to denote mass activities that are not centrally coordinated (Blumer, 1951).

Prior Work

In summarization, a number of previous methods have focused on diversity.

Data Annotation

The datasets used in our experiments represent two completely different categories: news headlines, and scientific citation sentences.

Diversity

We study the diversity of ways with which human summarizers talk about the same story or event and explain why such a diversity exists.

Diversity-based Ranking

In previous sections we showed that the diversity seen in human summaries can be attributed to different nuggets or phrases that represent the same factoid.

Conclusion and Future Work

Our experiments on two different categories of human-written summaries (headlines and citations) showed that much of the diversity seen in human summarization comes from different nuggets that may actually represent the same semantic information (i.e., factoids).

Acknowledgments

This work is supported by the National Science Foundation grant numbers IIS-0705832 and IIS-0968489.

Topics

distributional similarities

Appears in 7 sentences as: distributional similarities (3) Distributional Similarity (2) distributional similarity (2)
In Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
  1. Finally, we present a ranker that employs distributional similarities to build a network of words, and captures the diversity of perspectives by detecting communities in this network.
    Page 1, “Abstract”
  2. 5.1 Distributional Similarity
    Page 6, “Diversity-based Ranking”
  3. In order to capture the nuggets of equivalent semantic classes, we use a distributional similarity of
    Page 6, “Diversity-based Ranking”
  4. The method based on the distributional similarities of words outperforms other methods in the citations category.
    Page 8, “Diversity-based Ranking”
  5. Table 4 shows top 3 headlines from 3 rankers: word distributional similarity (WDS), C-LexRank, and MMR.
    Page 8, “Diversity-based Ranking”
  6. 3: red sox win world series; 1: red sox scale the rockies; MMR: 2: boston sweep colorado to win world series; 3: rookies respond in first crack at the big time (C-LR = C-LexRank; WDS = Word Distributional Similarity)
    Page 9, “Diversity-based Ranking”
  7. Finally, we proposed a ranking system that employs word distributional similarities to identify semantically equivalent words, and compared it with a wide
    Page 9, “Conclusion and Future Work”
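The ranker described in sentences 1 and 7 above builds a network of words from their distributional similarities and detects communities in that network. The exact similarity measure and community-detection algorithm are not given in these excerpts; the sketch below is a minimal illustration only, assuming cosine similarity over word co-occurrence vectors and connected components of a thresholded graph as a stand-in for communities.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def context_vectors(sentences):
    """Distributional profile of each word: counts of co-occurring words."""
    vecs = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        words = sent.lower().split()
        for w in words:
            for c in words:
                if c != w:
                    vecs[w][c] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    num = sum(u[k] * v[k] for k in set(u) & set(v))
    den = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def word_communities(sentences, threshold=0.5):
    """Link words whose distributional similarity exceeds a threshold,
    then return connected components as word 'communities'."""
    vecs = context_vectors(sentences)
    graph = defaultdict(set)
    for a, b in combinations(vecs, 2):
        if cosine(vecs[a], vecs[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    seen, communities = set(), []
    for w in vecs:
        if w in seen:
            continue
        component, stack = set(), [w]
        while stack:
            x = stack.pop()
            if x in component:
                continue
            component.add(x)
            stack.extend(graph[x] - component)
        seen |= component
        communities.append(component)
    return communities
```

Words that appear in similar contexts (e.g., different headlines about the same event) end up in the same component, which is the intuition behind using such groupings to capture semantically equivalent lexical choices.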


cosine similarity

Appears in 6 sentences as: cosine similarities (1) cosine similarity (5)
In Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
  1. Once a lexical similarity graph is built, they modify the graph based on cluster information and perform LexRank on the modified cosine similarity graph.
    Page 2, “Prior Work”
  2. The edges between corresponding nodes indicate that the cosine similarity between them is above a threshold (0.10, following (Erkan and Radev, 2004)).
    Page 8, “Diversity-based Ranking”
  3. Maximal Marginal Relevance (MMR) (Carbonell and Goldstein, 1998) uses the pairwise cosine similarity matrix and greedily chooses sentences that are the least similar to those already in the summary.
    Page 8, “Diversity-based Ranking”
  4. C-LexRank is a clustering-based model in which the cosine similarities of document pairs are used to build a network of documents.
    Page 8, “Diversity-based Ranking”
  5. C-LexRank focuses on finding communities of documents using their cosine similarity.
    Page 8, “Diversity-based Ranking”
  6. The reason is that different nuggets that represent the same factoid often have no words in common (e.g., “victory” and “glory”) and won’t be captured by a lexical measure like cosine similarity.
    Page 8, “Diversity-based Ranking”
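Sentence 3 above summarizes Maximal Marginal Relevance: greedily pick the sentence that balances relevance against its similarity to sentences already in the summary. A minimal sketch, assuming bag-of-words cosine similarity, a query-free variant where similarity to the collection centroid stands in for query relevance, and the standard MMR trade-off parameter λ (none of these specifics are taken from this paper):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two bag-of-words Counters."""
    num = sum(a[k] * b[k] for k in a.keys() & b.keys())
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def mmr_rank(sentences, k=3, lam=0.7):
    """Greedy MMR: at each step, take the sentence maximizing
    lam * relevance - (1 - lam) * redundancy with the summary so far."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    centroid = sum(vecs, Counter())  # whole-collection stand-in for the query
    chosen = []
    while len(chosen) < min(k, len(sentences)):
        best, best_score = None, float("-inf")
        for i, v in enumerate(vecs):
            if i in chosen:
                continue
            redundancy = max((cosine(v, vecs[j]) for j in chosen), default=0.0)
            score = lam * cosine(v, centroid) - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return [sentences[i] for i in chosen]
```

The redundancy penalty is what makes MMR diversity-seeking; as sentence 6 notes, though, a lexical measure like this cannot link paraphrases with no words in common.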


human annotators

Appears in 4 sentences as: human annotations (1) human annotators (3)
In Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
  1. Previously, Lin and Hovy (2002) had shown that information overlap judgment is a difficult task for human annotators.
    Page 4, “Data Annotation”
  2. For each n-gram, w, in a given headline, we check whether w is part of any nugget in either human's annotations.
    Page 4, “Data Annotation”
  3. Table 2 shows the unigram, bigram, and trigram-based average H between the two human annotators (Human1, Human2).
    Page 4, “Data Annotation”
  4. These results suggest that human annotators can reach substantial agreement when bigram and trigram nuggets are examined, and reasonable agreement for unigram nuggets.
    Page 4, “Data Annotation”
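The excerpts report an average agreement statistic H between the two annotators but do not define it here. As an illustration of how inter-annotator agreement on nugget membership can be measured, the sketch below computes Cohen's kappa over binary per-n-gram judgments; this is a common stand-in, not necessarily the paper's H.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary judgments
    (e.g., whether each n-gram belongs to some nugget)."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items the annotators label identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's positive-label rate.
    p_a1 = sum(labels_a) / n
    p_b1 = sum(labels_b) / n
    expected = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw overlap for agreement expected by chance, which matters when one label (e.g., "not a nugget") dominates, as it typically does for unigram judgments.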


social media

Appears in 3 sentences as: social media (3)
In Learning From Collective Human Behavior to Introduce Diversity in Lexical Choice
  1. In this paper, we focus on the computational analysis of collective discourse, a collective behavior seen in interactive content contribution and text summarization in online social media.
    Page 1, “Introduction”
  2. In social media, discourse (Grosz and Sidner, 1986) is often a collective reaction to an event.
    Page 1, “Introduction”
  3. Finally, recent research on analyzing online social media has shown a growing interest in mining news stories and headlines because of their broad applications, ranging from “meme” tracking and spike detection (Leskovec et al., 2009) to text summarization (Barzilay and McKeown, 2005).
    Page 2, “Prior Work”
