Generating Impact-Based Summaries for Scientific Literature
Mei, Qiaozhu and Zhai, ChengXiang

Article Structure

Abstract

In this paper, we present a study of a novel summarization problem, i.e., summarizing the impact of a scientific publication.

Introduction

The volume of scientific literature has been growing rapidly.

Impact Summarization

Following the existing work on topical summarization of scientific literature (Paice, 1981; Paice and Jones, 1993), we define an impact-based summary of a paper as a set of sentences extracted from a paper that can reflect the impact of the paper, where “impact” is roughly defined as the influence of the paper on research of similar or related topics as reflected in the citations of the paper.

Language Models for Impact Summarization

3.1 Impact language models

Estimation of Impact Language Models

Intuitively, the impact of a paper is mostly reflected in the citation context.

Experiments and Results

5.1 Experiment Design
5.1.1 Test set construction

Related Work

General text summarization, including single document summarization (Luhn, 1958; Goldstein et al., 1999) and multi-document summarization (Kraaij et al., 2001; Radev et al., 2003) has been well studied; our work is under the framework of extractive summarization (Luhn, 1958; McKeown and Radev, 1995; Goldstein et al., 1999; Kraaij et al., 2001), but our problem formulation differs from any existing formulation of the summarization problem.

Conclusions

We have defined and studied the novel problem of summarizing the impact of a research paper.

Topics

language model

Appears in 23 sentences as: language model (15) language model: (1) language modeling (3) language models (7)
In Generating Impact-Based Summaries for Scientific Literature
  1. We propose language modeling methods for solving this problem, and study how to incorporate features such as authority and proximity to accurately estimate the impact language model.
    Page 1, “Abstract”
  2. We propose language models to exploit both the citation context and original content of a paper to generate an impact-based summary.
    Page 2, “Introduction”
  3. We study how to incorporate features such as authority and proximity into the estimation of language models.
    Page 2, “Introduction”
  4. We propose and evaluate several different strategies for estimating the impact language model, which is key to impact summarization.
    Page 2, “Introduction”
  5. In Sections 2 and 3, we define the impact-based summarization problem and propose the general language modeling approach.
    Page 2, “Introduction”
  6. In Section 4, we present different strategies and features for estimating an impact language model, a key challenge in impact summarization.
    Page 2, “Introduction”
  7. To solve these challenges, in the next section, we propose to model impact with unigram language models and score sentences using
    Page 2, “Impact Summarization”
  8. We further propose methods for estimating the impact language model based on several features including the authority of citations, and the citation proximity.
    Page 3, “Impact Summarization”
  9. 3.1 Impact language models
    Page 3, “Language Models for Impact Summarization”
  10. We thus propose to represent such a virtual impact query with a unigram language model.
    Page 3, “Language Models for Impact Summarization”
  11. Such a model is expected to assign high probabilities to those words that can describe the impact of paper d, just as we expect a query language model in ad hoc retrieval to assign high probabilities to words that tend to occur in relevant documents (Ponte and Croft, 1998).
    Page 3, “Language Models for Impact Summarization”
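
The sentences above describe the approach only in outline: a unigram language model stands in for a virtual "impact query", it is estimated from both the paper's own content and its citation contexts, and sentences of the paper are scored against it. The following Python sketch illustrates that idea under several assumptions that are not taken from the paper: the per-citation weight is a hypothetical stand-in for features such as authority and proximity, the two sources are combined by simple linear interpolation (`mix`), additive smoothing is used, and a sentence is scored by its average word log-probability. All function names are illustrative.

```python
import math
from collections import Counter


def unigram_lm(weighted_texts, vocab, smoothing=0.01):
    """Estimate an additively smoothed unigram model from (text, weight) pairs."""
    counts = Counter()
    for text, weight in weighted_texts:
        for word in text.lower().split():
            counts[word] += weight
    total = sum(counts.values()) + smoothing * len(vocab)
    return {w: (counts[w] + smoothing) / total for w in vocab}


def impact_lm(paper_sentences, citation_contexts, mix=0.5):
    """Interpolate a content model with a weighted citation-context model.

    citation_contexts is a list of (text, weight) pairs; the weight is a
    hypothetical placeholder for features such as authority or proximity.
    """
    vocab = {w for s in paper_sentences for w in s.lower().split()}
    vocab |= {w for text, _ in citation_contexts for w in text.lower().split()}
    doc_lm = unigram_lm([(s, 1.0) for s in paper_sentences], vocab)
    cite_lm = unigram_lm(citation_contexts, vocab)
    return {w: (1 - mix) * doc_lm[w] + mix * cite_lm[w] for w in vocab}


def score_sentence(sentence, model, floor=1e-9):
    """Average log-probability of the sentence's words under the model.

    This scoring function is an assumption made for illustration only.
    """
    words = sentence.lower().split()
    if not words:
        return float("-inf")
    return sum(math.log(model.get(w, floor)) for w in words) / len(words)


def summarize(paper_sentences, citation_contexts, k=2, mix=0.5):
    """Return the k highest-scoring sentences of the paper."""
    model = impact_lm(paper_sentences, citation_contexts, mix)
    ranked = sorted(paper_sentences,
                    key=lambda s: score_sentence(s, model),
                    reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    paper = [
        "we propose a new smoothing method",
        "the weather was nice",
        "smoothing improves retrieval accuracy",
    ]
    citations = [
        ("their smoothing method improves retrieval", 2.0),
        ("smoothing is effective for retrieval", 1.0),
    ]
    for sentence in summarize(paper, citations, k=2):
        print(sentence)
```

Because the summary is extractive, only sentences of the paper itself are ranked; the citation contexts influence the model but are never selected as output.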

See all papers in Proc. ACL 2008 that mention language model.

gold standard

Appears in 5 sentences as: gold standard (4) gold standards (1)
In Generating Impact-Based Summaries for Scientific Literature
  1. The sentences that were decided as covering some influential content were then collected as the gold standard impact summary for the paper.
    Page 5, “Experiments and Results”
  2. We assume that the title of a paper will always be included in the summary, so we excluded the title both when constructing the gold standard and when generating a summary.
    Page 5, “Experiments and Results”
  3. The gold standard summaries have a minimum length of 5 sentences and a maximum length of 18 sentences; the median length is 9 sentences.
    Page 5, “Experiments and Results”
  4. These 14 impact-based summaries are used as gold standards for our experiments, based on which all summaries generated by the system are evaluated.
    Page 5, “Experiments and Results”
  5. Indeed, this can also explain why MEAD-Doc+Cite tends to perform worse than LEAD by ROUGE-L since if MEAD-Doc+Cite picks up sentences from the citation context rather than the original papers, it would not match as well with the gold standard as LEAD which selects sentences from the origi-
    Page 6, “Experiments and Results”
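
The last sentence above compares systems against the gold standard by ROUGE-L, which is based on the longest common subsequence (LCS) between a generated summary and a reference. The sketch below shows a simplified, token-level ROUGE-L F-measure for a single candidate and reference text. It is an illustration only, not the evaluation setup used in the paper: whitespace tokenization, lowercasing, and `beta=1.0` are assumptions, and the summary-level union-LCS variant is not implemented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Simplified ROUGE-L F-measure between two whitespace-tokenized texts."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    gold = "smoothing improves retrieval accuracy"
    generated = "smoothing methods improve retrieval accuracy"
    print(round(rouge_l(generated, gold), 3))
```

A summary that draws its sentences from citation contexts rather than from the original paper shares fewer in-order tokens with a gold standard built from the paper's own sentences, which lowers the LCS and therefore the score.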
