Automatic Generation of Story Highlights
Woodsend, Kristian and Lapata, Mirella

Article Structure

Abstract

In this paper we present a joint content selection and compression model for single-document summarization.

Introduction

Summarization is the process of condensing a source text into a shorter version while preserving its information content.

Related work

Much effort in automatic summarization has been devoted to sentence extraction which is often formalized as a classification task (Kupiec et al., 1995).

The Task

Given a document, we aim to produce three or four short sentences covering its main topics, much like the “Story Highlights” accompanying the (online) CNN news articles.

Modeling

The objective of our model is to create the most informative story highlights possible, subject to constraints relating to sentence length, overall summary length, topic coverage, and grammaticality.

Experimental Setup

Training We obtained phrase-based salience scores using a supervised machine learning algorithm.

Results

We report results on the highlight generation task in Figure 3 with ROUGE-1 and ROUGE-L (error bars indicate the 95% confidence interval).

Conclusions

In this paper we proposed a joint content selection and compression model for single-document summarization.

Topics

ILP

Appears in 33 sentences as: ILP (39)
In Automatic Generation of Story Highlights
  1. We encode these constraints through the use of integer linear programming (ILP), a well-studied optimization framework that is able to search the entire solution space efficiently.
    Page 2, “Introduction”
  2. Martins and Smith (2009) formulate a joint sentence extraction and summarization model as an ILP.
    Page 2, “Related work”
  3. Headline generation models typically extract individual words from a document to produce a very short summary, whereas we extract phrases and ensure that they are combined into grammatical sentences through our ILP constraints.
    Page 3, “Related work”
  4. Our approach therefore uses an ILP formulation which will provide a globally optimal solution, and which can be efficiently solved using standard optimization tools.
    Page 4, “Modeling”
  5. These edges are important to our formulation, as they will be represented by binary decision variables in the ILP.
    Page 4, “Modeling”
  6. ILP model The merged phrase structure tree, such as shown in Figure 2(b), is the actual input to our model.
    Page 5, “Modeling”
  7. Constraint (1g) tells the ILP to create a highlight if one of its constituent phrases is chosen.
    Page 6, “Modeling”
  8. … solved an ILP for each document.
    Page 7, “Experimental Setup”
  9. The ILP model (see Equation (1)) was parametrized as follows: the maximum number of highlights N_S was 4, the overall limit on length L_T was 75 tokens, the length of each highlight was in the range of [8, 28] tokens, and the topic coverage set T contained the top 5 tf.idf words. [A simplified sketch of such an ILP follows this list.]
    Page 7, “Experimental Setup”
  10. These parameters were chosen to capture the properties seen in the majority of the training set; they were also relaxed enough to allow a feasible solution of the ILP model (with hard constraints) for all the documents in the test set.
    Page 7, “Experimental Setup”
  11. To solve the ILP model we used the ZIB Optimization Suite software (Achterberg, 2007; Koch, 2004; Wunderling, 1996).
    Page 7, “Experimental Setup”
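
To make the quoted formulation concrete, here is a minimal phrase-selection ILP in the spirit of the constraints listed above, written with the PuLP modeling library. The paper itself was solved with the ZIB Optimization Suite, and its full model also enforces grammaticality and per-highlight length bounds; the phrases, salience scores, and topic words below are invented for illustration only.

```python
# Minimal phrase-selection ILP in the spirit of Equation (1), using PuLP
# (the paper used the ZIB Optimization Suite; PuLP/CBC is a stand-in).
# Grammaticality and per-highlight length constraints are omitted here.
import pulp

# Illustrative inputs: (phrase text, salience score f_i, length l_i in tokens)
phrases = [
    ("the court rejected the appeal", 1.4, 5),
    ("in a unanimous ruling on Tuesday", 0.6, 6),
    ("lawyers said they would appeal again", 0.9, 6),
    ("the case drew international attention", 1.1, 5),
]
topics = {"court", "appeal"}   # stand-in for the top tf.idf words
L_T = 15                       # overall token budget (75 in the paper)

prob = pulp.LpProblem("highlights", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", range(len(phrases)), cat="Binary")

# Objective: total salience of the selected phrases.
prob += pulp.lpSum(f * x[i] for i, (_, f, _) in enumerate(phrases))

# Overall length limit (the paper also bounds each highlight to [8, 28] tokens).
prob += pulp.lpSum(l * x[i] for i, (_, _, l) in enumerate(phrases)) <= L_T

# Topic coverage: every topic word must appear in some selected phrase.
for t in topics:
    prob += pulp.lpSum(x[i] for i, (p, _, _) in enumerate(phrases)
                       if t in p.split()) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for i, (p, _, _) in enumerate(phrases):
    if x[i].value() == 1:
        print(p)
```

Because every decision variable is binary, a standard branch-and-bound solver returns a globally optimal selection, which is the property the quoted sentences emphasize over greedy extraction.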

sentence compression

Appears in 8 sentences as: Sentence compression (1) sentence compression (7)
In Automatic Generation of Story Highlights
  1. Sentence compression is often regarded as a promising first step towards ameliorating some of the problems associated with extractive summarization.
    Page 1, “Introduction”
  2. Interfacing extractive summarization with a sentence compression module could improve the conciseness of the generated summaries and render them more informative (Jing, 2000; Lin, 2003; Zajic et al., 2007).
    Page 1, “Introduction”
  3. Despite the bulk of work on sentence compression and summarization (see Clarke and Lapata 2008 and Mani 2001 for overviews) only a handful of approaches attempt to do both in a joint model (Daume III and Marcu, 2002; Daume III, 2006; Lin, 2003; Martins and Smith, 2009).
    Page 1, “Introduction”
  4. One reason for this might be the performance of sentence compression systems which falls short of attaining grammaticality levels of human output.
    Page 1, “Introduction”
  5. A few previous approaches have attempted to interface sentence compression with summarization.
    Page 2, “Related work”
  6. The latter optimizes an objective function consisting of two parts: an extraction component, essentially a non-greedy variant of maximal marginal relevance (McDonald, 2007), and a sentence compression component, a more compact reformulation of Clarke and Lapata (2008) based on the output of a dependency parser.
    Page 2, “Related work”
  7. There are no sentence length or grammaticality constraints, as there is no sentence compression.
    Page 7, “Experimental Setup”
  8. Furthermore, as a standalone sentence compression system it yields state-of-the-art performance, comparable to McDonald’s (2006) discriminative model and superior to Hedge Trimmer (Zajic et al., 2007), a less sophisticated deterministic system.
    Page 9, “Results”

phrase-based

Appears in 7 sentences as: phrase-based (7)
In Automatic Generation of Story Highlights
  1. The model operates over a phrase-based representation of the source document which we obtain by merging information from PCFG parse trees and dependency graphs. [A constituent-extraction sketch follows this list.]
    Page 1, “Abstract”
  2. Training We obtained phrase-based salience scores using a supervised machine learning algorithm.
    Page 6, “Experimental Setup”
  3. The SVM was trained with the same features used to obtain phrase-based salience scores, but with sentence-level labels (labels (1) and (2) positive, (3) negative).
    Page 7, “Experimental Setup”
  4. Figure 3: ROUGE-1 and ROUGE-L results for phrase-based ILP model and two baselines, with error bars showing 95% confidence levels.
    Page 8, “Experimental Setup”
  5. F-score is higher for the phrase-based system but not significantly.
    Page 8, “Results”
  6. The highlights created by the sentence ILP were considered significantly more verbose (α < 0.05) than those created by the phrase-based system and the CNN abstractors.
    Page 8, “Results”
  7. Table 5 shows the output of the phrase-based system for the documents in Table 1.
    Page 8, “Results”
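
The sentences above describe a representation built by merging PCFG parse trees with dependency graphs. The snippet below sketches only the constituency half of that idea, enumerating candidate phrases from a parse tree with NLTK; the example tree, the chosen candidate labels, and the omission of the dependency-graph merge are all simplifications rather than the authors' exact procedure.

```python
# Enumerating candidate phrases from a PCFG parse tree with NLTK.
# This shows only the constituency side of the representation; the paper
# additionally merges in dependency-graph information, omitted here.
from nltk import Tree

parse = Tree.fromstring(
    "(S (NP (DT The) (NN court)) "
    "(VP (VBD rejected) (NP (DT the) (NN appeal)) "
    "(PP (IN in) (NP (DT a) (JJ unanimous) (NN ruling)))))")

# Treat selected constituent types as candidate phrases, recording their
# token length (the l_i later used by the ILP's length constraints).
CANDIDATE_LABELS = {"NP", "VP", "PP"}
for subtree in parse.subtrees():
    if subtree.label() in CANDIDATE_LABELS:
        tokens = subtree.leaves()
        print(subtree.label(), len(tokens), " ".join(tokens))
```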

news articles

Appears in 6 sentences as: news article (1) news articles (5)
In Automatic Generation of Story Highlights
  1. If our goal is to summarize news articles, then we may be better off selecting the first n sentences of the document.
    Page 1, “Introduction”
  2. Examples of CNN news articles with human-authored highlights are shown in Table 1.
    Page 2, “Introduction”
  3. Given a document, we aim to produce three or four short sentences covering its main topics, much like the “Story Highlights” accompanying the (online) CNN news articles.
    Page 3, “The Task”
  4. The majority were news articles, but the set also contained a mixture of editorials, commentary, interviews and reviews.
    Page 4, “The Task”
  5. Highlights on a small screen device would presumably be shorter than highlights for news articles on the web.
    Page 5, “Modeling”
  6. Participants were presented with a news article and its corresponding highlights and were asked to rate the latter along three dimensions: informativeness (do the highlights represent the article’s main topics?) …
    Page 7, “Experimental Setup”

F-score

Appears in 5 sentences as: F-score
In Automatic Generation of Story Highlights
  1. [Figure 3 residue: bar chart comparing Recall, Precision and F-score under ROUGE-1 and ROUGE-L.]
    Page 8, “Experimental Setup”
  2. F-score is higher for the phrase-based system but not significantly.
    Page 8, “Results”
  3. The sentence ILP model outperforms the lead baseline with respect to recall but not precision or F-score.
    Page 8, “Results”
  4. The phrase ILP achieves a significantly better F-score over the lead baseline with both ROUGE-1 and ROUGE-L.
    Page 8, “Results”
  5. The phrase ILP model achieves a significantly better F-score (for both ROUGE-1 and ROUGE-2) over the lead baseline, the sentence ILP model, and Martins and Smith. [A toy ROUGE-1 computation follows this list.]
    Page 8, “Results”
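
For readers unfamiliar with the metric, the toy function below computes ROUGE-1 precision, recall, and F-score as clipped unigram overlap between a system highlight and a single reference. The official ROUGE toolkit used in the paper additionally handles stemming, multiple references, and the confidence intervals shown in Figure 3; this sketch only illustrates how the three numbers relate.

```python
# Toy ROUGE-1 computation: clipped unigram overlap between a system
# highlight and one reference highlight.
from collections import Counter

def rouge1(system: str, reference: str) -> tuple[float, float, float]:
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((sys_counts & ref_counts).values())  # clipped matches
    precision = overlap / max(sum(sys_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

print(rouge1("court rejects the appeal", "the court rejected the appeal"))
```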

SVM

Appears in 4 sentences as: SVM (4)
In Automatic Generation of Story Highlights
  1. We learned the feature weights with a linear SVM, using the software SVM-OOPS (Woodsend and Gondzio, 2009).
    Page 6, “Experimental Setup”
  2. For each phrase, features were extracted and salience scores calculated from the feature weights determined through SVM training.
    Page 7, “Experimental Setup”
  3. The distance from the SVM hyperplane represents the salience score. [A minimal scoring sketch follows this list.]
    Page 7, “Experimental Setup”
  4. The SVM was trained with the same features used to obtain phrase-based salience scores, but with sentence-level labels (labels (1) and (2) positive, (3) negative).
    Page 7, “Experimental Setup”
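
The sentences above describe scoring phrases by their signed distance from an SVM hyperplane, trained with SVM-OOPS using asymmetric misclassification penalties to handle the unbalanced data. The sketch below substitutes scikit-learn's LinearSVC with a class_weight dictionary as a rough stand-in for that asymmetry; the synthetic features, labels, and weights are illustrative only.

```python
# Salience scoring with a linear SVM. The paper used SVM-OOPS with separate
# penalties for positive and negative misclassifications; scikit-learn's
# class_weight option is used here as a rough stand-in for that asymmetry.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 66))   # 66 phrase features, as in the paper
# Synthetic, unbalanced labels: only a minority of phrases are "positive".
y_train = (X_train[:, 0] + 0.1 * rng.normal(size=200) > 1.0).astype(int)

svm = LinearSVC(class_weight={0: 1.0, 1: 5.0})  # penalize missed positives more
svm.fit(X_train, y_train)

# Signed distance from the separating hyperplane = the phrase salience f_i.
X_new = rng.normal(size=(4, 66))
salience = svm.decision_function(X_new)
print(salience)
```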

feature weights

Appears in 3 sentences as: feature weights (3)
In Automatic Generation of Story Highlights
  1. We learned the feature weights with a linear SVM, using the software SVM-OOPS (Woodsend and Gondzio, 2009).
    Page 6, “Experimental Setup”
  2. This tool gave us directly the feature weights as well as support vector values, and it allowed different penalties to be applied to positive and negative misclassifications, enabling us to compensate for the unbalanced data set.
    Page 6, “Experimental Setup”
  3. For each phrase, features were extracted and salience scores calculated from the feature weights determined through SVM training.
    Page 7, “Experimental Setup”

learning algorithm

Appears in 3 sentences as: learning algorithm (3)
In Automatic Generation of Story Highlights
  1. We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details).
    Page 5, “Modeling”
  2. Let f_i denote the salience score for phrase i, determined by the machine learning algorithm, and l_i is its length in tokens.
    Page 5, “Modeling”
  3. Training We obtained phrase-based salience scores using a supervised machine learning algorithm.
    Page 6, “Experimental Setup”

linear programming

Appears in 3 sentences as: linear program (1) linear programming (2)
In Automatic Generation of Story Highlights
  1. Using an integer linear programming formulation, the model learns to select and combine phrases subject to length, coverage and grammar constraints.
    Page 1, “Abstract”
  2. We encode these constraints through the use of integer linear programming (ILP), a well-studied optimization framework that is able to search the entire solution space efficiently.
    Page 2, “Introduction”
  3. Grammaticality, length and coverage requirements are encoded as constraints in an integer linear program.
    Page 9, “Conclusions”

machine learning

Appears in 3 sentences as: machine learning (3)
In Automatic Generation of Story Highlights
  1. We obtain these scores from the output of a supervised machine learning algorithm that predicts for each phrase whether it should be included in the highlights or not (see Section 5 for details).
    Page 5, “Modeling”
  2. Let f_i denote the salience score for phrase i, determined by the machine learning algorithm, and l_i is its length in tokens.
    Page 5, “Modeling”
  3. Training We obtained phrase-based salience scores using a supervised machine learning algorithm.
    Page 6, “Experimental Setup”

unigram

Appears in 3 sentences as: unigram (3)
In Automatic Generation of Story Highlights
  1. The mapping of sentence labels to phrase labels was unsupervised: if the phrase came from a sentence labeled (1), and there was a unigram overlap (excluding stop words) between the phrase and any of the original highlights, we marked this phrase with a positive label. [A sketch of this labeling heuristic follows this list.]
    Page 6, “Experimental Setup”
  2. Our feature set comprised surface features such as sentence and paragraph position information, POS tags, unigram and bigram overlap with the title, and whether high-scoring tf.idf words were present in the phrase (66 features in total).
    Page 6, “Experimental Setup”
  3. We report unigram overlap (ROUGE-1) as a means of assessing informativeness and the longest common subsequence (ROUGE-L) as a means of assessing fluency.
    Page 7, “Experimental Setup”
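
A minimal sketch of the label-mapping heuristic in sentence 1 above: a phrase drawn from a sentence labeled (1) is marked positive if it shares a non-stopword unigram with any human-authored highlight. The stopword set and whitespace tokenization below are illustrative assumptions, not the paper's exact choices.

```python
# Mapping sentence-level labels down to phrase-level training labels.
# The stopword list and tokenization are illustrative stand-ins.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and"}

def phrase_label(phrase: str, sentence_label: int, highlights: list[str]) -> int:
    if sentence_label != 1:
        return 0
    content = set(phrase.lower().split()) - STOPWORDS
    for highlight in highlights:
        if content & (set(highlight.lower().split()) - STOPWORDS):
            return 1
    return 0

highlights = ["Court rejects the appeal", "Lawyers plan to appeal again"]
print(phrase_label("the appeal", 1, highlights))          # 1: shares "appeal"
print(phrase_label("a unanimous ruling", 1, highlights))  # 0: no overlap
```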
