Using Supervised Bigram-based ILP for Extractive Summarization
Chen Li, Xian Qian, and Yang Liu

Article Structure

Abstract

In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework.

Introduction

Extractive summarization is a sentence selection problem: identifying important summary sentences from one or multiple documents.

Proposed Method

2.1 Bigram Gain Maximization by ILP

We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work.

Experiments

3.1 Data

Experiment and Analysis

4.1 Experimental Results

Related Work

We briefly describe some prior work on summarization in this section.

Conclusion and Future Work

In this paper, we leverage the ILP method as a core component in our summarization system.

Topics

bigrams

Appears in 114 sentences as: Bigram (6) bigram (57) bigrams (76) bigram’s (5)
In Using Supervised Bigram-based ILP for Extractive Summarization
  1. In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework.
    Page 1, “Abstract”
  2. For each bigram, a regression model is used to estimate its frequency in the reference summary.
    Page 1, “Abstract”
  3. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground-truth bigram frequency in the reference summary.
    Page 1, “Abstract”
  4. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains.
    Page 1, “Abstract”
  5. We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup.
    Page 1, “Abstract”
  6. They used bigrams as such language concepts.
    Page 1, “Introduction”
  7. Gillick and Favre (2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function.
    Page 1, “Introduction”
  8. In this paper, we propose to find a candidate summary such that the language concepts (e.g., bigrams) in this candidate summary and the reference summary can have the same frequency.
    Page 1, “Introduction”
  9. To estimate the bigram frequency in the summary, we propose to use a supervised regression model that is discriminatively trained using a variety of features.
    Page 2, “Introduction”
  10. We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work.
    Page 2, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  11. In addition, we expect that the bigram-oriented ILP is consistent with the ROUGE-2 measure widely used for summarization evaluation.
    Page 2, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
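Reading sentences 8–9 together, the core idea is that the candidate summary should reproduce the reference summary's bigram frequencies as closely as possible under a length budget. A plausible LaTeX rendering of that objective is given below, assuming $z(s) \in \{0,1\}$ selects sentence $s$, $n_{b,s}$ is the frequency of bigram $b$ in $s$, $N_{b,ref}$ is its (estimated) frequency in the reference summary, $\ell(s)$ is the sentence length, and $L$ is the length limit; only the sentence fragments above come from the paper, so the exact constraint set here is an assumption.

    \min_{z} \sum_{b} \left| N_{b,ref} - \sum_{s} z(s)\, n_{b,s} \right|
    \quad \text{s.t.} \quad \sum_{s} \ell(s)\, z(s) \le L, \qquad z(s) \in \{0,1\}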

ILP

Appears in 67 sentences as: ILP (73) ILP: (1) ILP’s (1)
In Using Supervised Bigram-based ILP for Extractive Summarization
  1. In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework.
    Page 1, “Abstract”
  2. During testing, the sentence selection problem is formulated as an ILP problem to maximize the bigram gains.
    Page 1, “Abstract”
  3. We demonstrate that our system consistently outperforms the previous ILP method on different TAC data sets, and performs competitively compared to the best results in the TAC evaluations.
    Page 1, “Abstract”
  4. We also conducted various analyses to show the impact of bigram selection, weight estimation, and ILP setup.
    Page 1, “Abstract”
  5. Many methods have been developed for this problem, including supervised approaches that use classifiers to predict summary sentences, graph-based approaches to rank the sentences, and recent global optimization methods such as integer linear programming (ILP) and submodular methods.
    Page 1, “Introduction”
  6. Gillick and Favre (2009) introduced the concept-based ILP for summarization.
    Page 1, “Introduction”
  7. This ILP method is formally represented as below (see Gillick and Favre (2009) for more details):
    Page 1, “Introduction”
  8. There are two important components in this concept-based ILP: one is how to select the concepts ($c_i$); the second is how to set up their weights ($w_i$).
    Page 1, “Introduction”
  9. In addition, in the previous concept-based ILP method, the constraints are with respect to the appearance of language concepts, hence it cannot distinguish the importance of different language concepts in the reference summary.
    Page 2, “Introduction”
  10. Our method can decide not only which language concepts to use in ILP, but also the frequency of these language concepts in the candidate summary.
    Page 2, “Introduction”
  11. Our experiments on several TAC summarization data sets demonstrate that this proposed method outperforms the previous ILP system and often the best-performing TAC system.
    Page 2, “Introduction”
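To make the concept-based ILP of sentences 6–8 concrete, here is a minimal sketch using the PuLP solver. It is not the authors' code; the toy sentences, concept weights, and length budget are hypothetical stand-ins (in the paper, concepts are bigrams and weights are document frequencies). The objective maximizes the total weight of covered concepts, with linking constraints so a concept counts as covered exactly when a selected sentence contains it.

    # Sketch of a Gillick-and-Favre-style concept-based ILP (hypothetical inputs).
    import pulp

    sentences = {0: "the cat sat on the mat", 1: "the dog ran", 2: "a cat and a dog"}
    lengths = {i: len(s.split()) for i, s in sentences.items()}
    weights = {"the cat": 2.0, "the dog": 1.0, "a cat": 1.0}  # concept -> w_i
    occ = {"the cat": [0], "the dog": [1], "a cat": [2]}      # concept -> sentences
    L = 8                                                     # length budget (words)

    prob = pulp.LpProblem("concept_ilp", pulp.LpMaximize)
    c = {b: pulp.LpVariable(f"c_{i}", cat="Binary") for i, b in enumerate(weights)}
    z = {s: pulp.LpVariable(f"z_{s}", cat="Binary") for s in sentences}

    # Objective: total weight of concepts appearing in the summary.
    prob += pulp.lpSum(weights[b] * c[b] for b in weights)
    # Summary length must respect the budget.
    prob += pulp.lpSum(lengths[s] * z[s] for s in sentences) <= L
    # Linking: c_b = 1 iff at least one selected sentence contains concept b.
    for b in weights:
        prob += pulp.lpSum(z[s] for s in occ[b]) >= c[b]
        for s in occ[b]:
            prob += z[s] <= c[b]

    prob.solve()
    summary = [sentences[s] for s in sentences if z[s].value() == 1]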

regression model

Appears in 16 sentences as: Regression Model (1) regression model (15)
In Using Supervised Bigram-based ILP for Extractive Summarization
  1. For each bigram, a regression model is used to estimate its frequency in the reference summary.
    Page 1, “Abstract”
  2. The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground-truth bigram frequency in the reference summary.
    Page 1, “Abstract”
  3. To estimate the bigram frequency in the summary, we propose to use a supervised regression model that is discriminatively trained using a variety of features.
    Page 2, “Introduction”
  4. 2.2 Regression Model for Bigram Frequency Estimation
    Page 2, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  5. We propose to use a regression model for this.
    Page 2, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  6. To train this regression model using the given reference abstractive summaries, rather than trying to minimize the squared error as typically done, we propose a new objective function.
    Page 3, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  7. Each bigram is represented using a set of features in the above regression model.
    Page 3, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  8. In our method, we first extract all the bigrams from the selected sentences and then estimate each bigram’s $N_{b,ref}$ using the regression model.
    Page 4, “Experiments”
  9. When training our bigram regression model, we use each of the 4 reference summaries separately, i.e., the bigram frequency is obtained from one reference summary.
    Page 4, “Experiments”
  10. We used the estimated value from the regression model; the ICSI system just uses the bigram’s document frequency in the original text as weight.
    Page 4, “Experiment and Analysis”
  11. # bigrams used in our regression model (i.e., in selected sentences): 2140.7
    Page 5, “Experiment and Analysis”
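Sentences 7–9 describe the estimation step that feeds the ILP: every bigram in the candidate sentences gets a predicted reference-summary frequency. The sketch below shows only the shape of that step with a plain linear model; the paper's actual feature set and its discriminative, KL-based training are richer, so both the features and the weights here are placeholders.

    # Illustrative bigram-frequency estimator (features and weights are placeholders).
    def bigram_features(bigram, doc_freq, position):
        """Hypothetical features: bias, document frequency, earliest position."""
        return [1.0, float(doc_freq.get(bigram, 0)), 1.0 / (1.0 + position.get(bigram, 0))]

    def estimate_n_ref(bigrams, doc_freq, position, w):
        """Predict each bigram's N_{b,ref} with a linear model, clipped at zero."""
        return {
            b: max(0.0, sum(wi * xi for wi, xi in zip(w, bigram_features(b, doc_freq, position))))
            for b in bigrams
        }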

objective function

Appears in 6 sentences as: objective function (6)
In Using Supervised Bigram-based ILP for Extractive Summarization
  1. Gillick and Favre (2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function.
    Page 1, “Introduction”
  2. where $c_b$ is an auxiliary variable we introduce that is equal to $|N_{b,ref} - \sum_s z(s)\, n_{b,s}|$, and $N_{b,ref}$ is a constant that can be dropped from the objective function.
    Page 2, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  3. To train this regression model using the given reference abstractive summaries, rather than trying to minimize the squared error as typically done, we propose a new objective function.
    Page 3, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  4. The objective function for training is thus to minimize the KL distance:
    Page 3, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  5. Finally, we replace $N_{b,ref}$ in Formula (15) with Eq. (14) and get the objective function below:
    Page 3, “Proposed Method 2.1 Bigram Gain Maximization by ILP”
  6. They used a modified objective function in order to consider whether the selected sentence is globally optimal.
    Page 8, “Related Work”
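Two formulas referenced above survive only as fragments, so here is a standard reconstruction, with the caveat that the exact indexing and the KL direction are assumptions. Sentence 2 corresponds to the usual trick for moving an absolute value out of an ILP objective via the auxiliary variable $c_b$, and sentence 4 compares normalized estimated ($\hat{N}_b$) and reference ($N_{b,ref}$) bigram frequencies:

    \min_{z,\,c} \sum_b c_b
    \quad \text{s.t.} \quad
    c_b \ge N_{b,ref} - \sum_s z(s)\, n_{b,s}, \qquad
    c_b \ge \sum_s z(s)\, n_{b,s} - N_{b,ref}

    D_{KL}(p \,\|\, q) = \sum_b p_b \log \frac{p_b}{q_b},
    \qquad
    p_b = \frac{N_{b,ref}}{\sum_{b'} N_{b',ref}}, \quad
    q_b = \frac{\hat{N}_b}{\sum_{b'} \hat{N}_{b'}}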
