Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization
Nenkova, Ani and Louis, Annie

Article Structure

Abstract

Different summarization requirements could make the writing of a good summary more difficult or easier.

Introduction

In certain situations even the best automatic summarizers or professional writers can find it hard to write a good summary of a set of articles.

Preliminary analysis and distinctions: DUC 2001

Generic multi-document summarization was featured as a task at the Document Understanding Conference (DUC) in four years, 2001 through 2004.

Features

We implemented 14 features for our analysis of input set difficulty.

Feature selection

Table 4 shows the results from a one-sided t-test comparing the values of the various features for the easy and difficult input set classes.
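As a rough illustration of this kind of feature selection, here is a minimal sketch (not the paper's code) that runs a one-sided t-test on hypothetical feature values for the easy and difficult input classes; it assumes SciPy 1.6+ for the `alternative` argument, and the numbers are placeholders.

```python
import numpy as np
from scipy import stats

# Hypothetical per-input values of one feature (e.g. average cosine overlap)
# for inputs labeled easy vs. difficult -- purely illustrative numbers.
easy = np.array([0.42, 0.51, 0.47, 0.55, 0.49])
difficult = np.array([0.31, 0.28, 0.36, 0.33, 0.30])

# One-sided test of the hypothesis that the feature is higher for easy inputs.
t_stat, p_value = stats.ttest_ind(easy, difficult, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.4f}")
```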

Classification results

We used the 192 sets from multi-document summarization DUC evaluations in 2002 (55 generic sets), 2003 (30 generic summary sets and 7 viewpoint sets) and 2004 (50 generic and 50 biography sets) to train and test a logistic regression classifier.

Conclusions

We have addressed the question of what makes the writing of a summary for a multi-document input difficult.

Topics

Cosine similarity

Appears in 4 sentences as: Cosine similarity (2) cosine similarity (2)
In Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization
  1. Cosine similarity between the document vector representations is probably the easiest and most commonly used among the various similarity measures.
    Page 5, “Features”
  2. The cosine similarity between two (document representation) vectors v1 and v2 is given by cos(θ) = (v1 · v2) / (|v1| |v2|). A value of 0 indicates that the vectors are orthogonal and dissimilar, a value of 1 indicates perfectly similar documents in terms of the words contained in them.
    Page 5, “Features”
  3. To compute the cosine overlap features, we find the pairwise cosine similarity between each two documents in an input set and compute their average.
    Page 5, “Features”
  4. Cosine similarity measures the overlap between two documents based on all the words appearing in them.
    Page 7, “Features”


logistic regression

Appears in 3 sentences as: Logistic regression (1) logistic regression (2)
In Can You Summarize This? Identifying Correlates of Input Difficulty for Multi-Document Summarization
  1. We used the 192 sets from multi-document summarization DUC evaluations in 2002 (55 generic sets), 2003 (30 generic summary sets and 7 viewpoint sets) and 2004 (50 generic and 50 biography sets) to train and test a logistic regression classifier.
    Page 7, “Classification results”
  2. Table 6: Logistic regression classification results (accuracy, precision, recall and f-measure) for balanced data of 100-word summaries from DUC’02 through DUC’04.
    Page 8, “Classification results”
  3. Experiments with a logistic regression classifier based on the features further confirms that input cohesiveness is predictive of the difficulty it will pose to automatic summarizers.
    Page 8, “Conclusions”
