Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
Almeida, Miguel and Martins, André

Article Structure

Abstract

We present a dual decomposition framework for multi-document summarization, using a model that jointly extracts and compresses sentences.

Introduction

Automatic text summarization is a seminal problem in information retrieval and natural language processing (Luhn, 1958; Baxendale, 1958; Edmundson, 1969).

Extractive Summarization

In extractive summarization, we are given a set of sentences D := {s_1, …

Compressive Summarization

We now turn to compressive summarization, which does not limit the summary sentences to be verbatim extracts from the original documents; …

Multi-Task Learning

We next turn to the problem of learning the model from training data.

Experiments

5.1 Experimental setup

Conclusions

We presented a multitask learning framework for compressive summarization, leveraging data for related tasks in a principled manner.

Topics

ILP

Appears in 11 sentences as: ILP (10) ILPs (1)
  1. All approaches above are based on integer linear programming (ILP), suffering from slow runtimes when compared to extractive systems.
    Page 1, “Introduction”
  2. A second inconvenience of ILP-based approaches is that they do not exploit the modularity of the problem, since the declarative specification required by ILP solvers discards important structural information.
    Page 1, “Introduction”
  3. This can be converted into an ILP and addressed with off-the-shelf solvers (Gillick et al., 2008).
    Page 2, “Extractive Summarization”
  4. A drawback of this approach is that solving an ILP exactly is NP-hard.
    Page 2, “Extractive Summarization”
  5. In previous work, the optimization problem in Eq. 4 was converted into an ILP and fed to an off-the-shelf solver (Martins and Smith, 2009; Berg-Kirkpatrick et al., 2011; Woodsend and Lapata, 2012).
    Page 4, “Compressive Summarization”
  6. All these systems require ILP solvers.
    Page 7, “Experiments”
  7. We conducted another set of experiments to compare the runtime of our compressive summarizer based on AD3 with the runtimes achieved by GLPK, the ILP solver used by Berg-Kirkpatrick et al. (2011).
    Page 7, “Experiments”
  8.–9. Table fragment (runtime in seconds / ROUGE-2): ILP Exact 10.394 / 12.40; LP-Relax. 2.265 / 12.38; AD3-5000 0.952 / 12.38; AD3-1000 0.406 / 12.30; AD3-200 0.159 / 12.15; Extractive (ILP) 0.265 / 11.16.
    Page 8, "Experiments"
  10. …iterations of AD3 in {200, 1000, 5000}, and clocked the time spent by GLPK to solve the exact ILPs and their relaxations.
    Page 8, “Experiments”
  11. We see that our proposed configuration (AD3-1000) is orders of magnitude faster than the ILP solver, and 5 times faster than its relaxed variant, while keeping similar accuracy levels. The gain when the number of iterations in AD3 is increased to 5000 is small, given that the runtime is more…
    Page 8, “Experiments”
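
The excerpts above contrast off-the-shelf ILP solvers with the paper's dual decomposition approach (AD3). As a rough illustration of the general mechanism only (this is a plain subgradient variant, not AD3, and the two subproblems are toys rather than the paper's extraction and compression components; all scores, names, and the step-size rule are invented), the sketch below lets two subproblems that share a vector of binary indicators be solved separately and pushed toward agreement with Lagrange multipliers:

    import numpy as np

    def solve_independent(scores):
        """Subproblem A: no constraints, so pick each item iff its adjusted score is positive."""
        return (scores > 0).astype(int)

    def solve_budgeted(scores, k):
        """Subproblem B: pick at most k items; greedy by adjusted score is exact here."""
        y = np.zeros(len(scores), dtype=int)
        for i in np.argsort(-scores)[:k]:
            if scores[i] > 0:
                y[i] = 1
        return y

    def dual_decomposition(a, b, k, iters=100, eta0=1.0):
        """
        Toy combined problem: maximize (a + b) . y subject to sum(y) <= k,
        split into an unconstrained part (scores a) and a budgeted part
        (scores b) that must agree on y.
        """
        lam = np.zeros_like(a)                 # multipliers on the agreement constraint y_A = y_B
        for t in range(1, iters + 1):
            y_a = solve_independent(a + lam)   # subproblem A sees scores shifted by +lambda
            y_b = solve_budgeted(b - lam, k)   # subproblem B sees scores shifted by -lambda
            if np.array_equal(y_a, y_b):       # agreement gives a certificate of optimality
                return y_a
            lam -= (eta0 / t) * (y_a - y_b)    # subgradient step toward agreement
        return y_b                             # fall back to the budget-feasible solution

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        a, b = rng.normal(size=8), rng.normal(size=8)
        print(dual_decomposition(a, b, k=3))

Because each subproblem keeps its own structure, a specialized routine (a dynamic program, a small combinatorial solver) can be plugged in for each piece, which is exactly the modularity argument made against the monolithic ILP encoding in the second excerpt.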

sentence compression

Appears in 10 sentences as: sentence compression (9) sentence compressor (1)
  1. In addition, we propose a multitask learning framework to take advantage of existing data for extractive summarization and sentence compression.
    Page 1, “Abstract”
  2. For example, such solvers are unable to take advantage of efficient dynamic programming routines for sentence compression (McDonald, 2006).
    Page 1, “Introduction”
  3. We propose multitask learning (§4) as a principled way to train compressive summarizers, using auxiliary data for extractive summarization and sentence compression.
    Page 2, “Introduction”
  4. However, extending these models to allow for sentence compression (as will be detailed in §3) breaks the diminishing returns property, making submodular optimization no longer applicable.
    Page 2, “Extractive Summarization”
  5. …similar manner as described in §2, but with an additional component for the sentence compressor, and slight modifications in the other components.
    Page 4, “Compressive Summarization”
  6. In addition, we included hard constraints to prevent the deletion of certain arcs, following previous work in sentence compression (Clarke and Lapata, 2008).
    Page 5, “Compressive Summarization”
  7. The goal is to take advantage of existing data for related tasks, such as extractive summarization (task #2), and sentence compression (task #3).
    Page 5, “MultiTask Learning”
  8. For the sentence compression task, the parts correspond to arc-deletion features only.
    Page 6, “MultiTask Learning”
  9. …(2011), but we augmented the training data with extractive summarization and sentence compression datasets, to help train the…
    Page 7, “Experiments”
  10. For sentence compression, we adapted the Simple English Wikipedia dataset of Woodsend and Lapata (2011), containing aligned sentences for 15,000 articles from the English and Simple English Wikipedias.
    Page 7, “Experiments”
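
Excerpts 7–9 describe the multi-task setup: one model is trained on compressive summarization data pooled with auxiliary extractive summarization and sentence compression data, and the compression task touches only arc-deletion features. The sketch below shows that shared-weight mechanism with a structured-perceptron update; the excerpts do not specify the paper's actual learner, and the decoders and feature functions here are placeholders to be supplied per task (e.g. a relaxation-based decoder for the two summarization tasks and a dynamic program for sentence compression, as in the "dynamic programming" excerpts below):

    from collections import defaultdict

    def train_multitask(datasets, decoders, feature_fns, epochs=5):
        """
        Multi-task structured-perceptron sketch with a single shared weight vector.
        `datasets` maps a task id (e.g. "compressive", "extractive", "compression")
        to (input, gold_output) pairs; `decoders[task]` runs that task's inference;
        `feature_fns[task]` returns a sparse feature dict for an (input, output) pair.
        Tasks interact only through the features they share: a task whose parts fire
        only arc-deletion features will update only those weights.
        """
        w = defaultdict(float)
        for _ in range(epochs):
            for task, examples in datasets.items():
                decode, phi = decoders[task], feature_fns[task]
                for x, y_gold in examples:
                    y_hat = decode(x, w)                      # task-specific inference
                    if y_hat != y_gold:
                        for f, v in phi(x, y_gold).items():   # promote gold features
                            w[f] += v
                        for f, v in phi(x, y_hat).items():    # demote predicted features
                            w[f] -= v
        return w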

extractive systems

Appears in 8 sentences as: extractive system (2) extractive systems (6)
  1. Up to now, extractive systems have been the most popular in multi-document summarization.
    Page 1, “Introduction”
  2. However, extractive systems are rather limited in the summaries they can produce.
    Page 1, “Introduction”
  3. All approaches above are based on integer linear programming (ILP), suffering from slow runtimes when compared to extractive systems.
    Page 1, “Introduction”
  4. Experiments on TAC data (§5) yield state-of-the-art results, with runtimes similar to those of extractive systems.
    Page 2, “Introduction”
  5. The bottom rows show the results achieved by our implementation of a pure extractive system (similar to the learned extractive summarizer of Berg-Kirkpatrick et al., 2011); a system that post-combines extraction and compression components trained separately, as in Martins and Smith (2009); and our compressive summarizer trained as a single task, and in the multitask setting.
    Page 7, “Experiments”
  6. The ROUGE and Pyramid scores show that the compressive summarizers (when properly trained) yield considerable benefits in content coverage over extractive systems, confirming the results of Berg-Kirkpatrick et al. (2011).
    Page 7, “Experiments”
  7. Our ROUGE-2 score (12.30%) is, to our knowledge, the highest reported on the TAC-2008 dataset, with little harm in grammaticality with respect to an extractive system that preserves the original sentences.
    Page 7, “Experiments”
  8. Results show that the state of the art is improved in automatic and manual metrics, with speeds close to extractive systems .
    Page 8, “Conclusions”

dynamic programming

Appears in 5 sentences as: dynamic programming (5)
  1. For example, such solvers are unable to take advantage of efficient dynamic programming routines for sentence compression (McDonald, 2006).
    Page 1, “Introduction”
  2. …Eq. 8 efficiently with dynamic programming (using the Viterbi algorithm for trees); the total cost is linear in L_n.
    Page 4, “Compressive Summarization”
  3. …Eq. 8; this can be done in O(L_n) time with dynamic programming, as discussed in §3.2.
    Page 4, “Compressive Summarization”
  4. This can be computed exactly in time O(B · Σ_k L_{n_k}), through dynamic programming.
    Page 5, “Compressive Summarization”
  5. …Eq. 9 can be used for the maximization above: for tasks #1–#2, we solve a relaxation by running AD3 without rounding, and for task #3 we use dynamic programming; see Table 1.
    Page 6, “MultiTask Learning”
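
Excerpts 2–3 above say the per-sentence compression subproblem is solved exactly with a Viterbi-style pass over the dependency tree, in time linear in the sentence length. The following is a minimal sketch of such a tree DP under a common simplification that is assumed here rather than taken from the excerpts: deleting an arc removes the whole subtree below it, keeping an arc scores zero, and a set of protected arcs may never be deleted (in the spirit of the hard constraints mentioned under "sentence compression"). Function names and the toy input are invented:

    def best_compression(heads, delete_scores, protected=frozenset()):
        """
        Viterbi-style DP over a dependency tree for arc-deletion compression.
        heads[i] is the head of token i (None for the root); delete_scores[i] is the
        score gained by deleting the arc into token i (and with it token i's whole
        subtree).  Arcs whose child index is in `protected` may never be deleted.
        Returns (total_score, keep), where keep[i] says whether token i survives.
        """
        n = len(heads)
        children = [[] for _ in range(n)]
        roots = []
        for i, h in enumerate(heads):
            (roots if h is None else children[h]).append(i)

        keep = [True] * n

        def drop_subtree(i):
            if not keep[i]:
                return                       # this subtree was already removed
            keep[i] = False
            for c in children[i]:
                drop_subtree(c)

        def visit(i):
            """Best score of token i's subtree, assuming token i itself is kept."""
            total = 0.0
            for c in children[i]:
                keep_score = visit(c)        # best score if the arc into c is kept
                if c not in protected and delete_scores[c] > keep_score:
                    drop_subtree(c)          # deleting the arc removes c's whole subtree
                    total += delete_scores[c]
                else:
                    total += keep_score
            return total

        score = sum(visit(r) for r in roots)
        return score, keep

    if __name__ == "__main__":
        # toy tree: "saw" is the root; "I", "movie", "yesterday" attach to it; "a" attaches to "movie"
        heads = [1, None, 3, 1, 1]                      # tokens: I saw a movie yesterday
        delete_scores = [-1.0, 0.0, -0.5, -2.0, 0.8]
        print(best_compression(heads, delete_scores))   # drops only "yesterday"

Each token is visited once, so the pass is linear in the sentence length, which is what makes this subproblem cheap enough to embed inside the joint summarizer.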

optimization problem

Appears in 4 sentences as: optimization problem (2) optimization problem: (1) optimization problems (1)
  1. …models (maximizing relevance and penalizing redundancy) lead to submodular optimization problems (Lin and Bilmes, 2010), which are NP-hard but approximable through greedy algorithms; learning is possible with standard structured prediction algorithms (Sipos et al., 2012; Lin and Bilmes, 2012).
    Page 1, “Introduction”
  2. By designing a quality score function g : {0,1}^N → R, this can be cast as a global optimization problem with a knapsack constraint:
    Page 2, “Extractive Summarization”
  3. …Eq. 1, one obtains the following Boolean optimization problem:
    Page 2, “Extractive Summarization”
  4. In previous work, the optimization problem in Eq. 4 was converted into an ILP and fed to an off-the-shelf solver.
    Page 4, “Compressive Summarization”
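
Excerpts 2–4 describe casting extractive selection as a Boolean optimization problem with a knapsack constraint and, in previous work, converting it into an ILP for an off-the-shelf solver. Below is a minimal sketch of such an ILP in the style of the concept-coverage model of Gillick et al. (2008), written with the PuLP modelling library; the variable names, toy data, and the choice of PuLP are illustrative assumptions, not the paper's implementation:

    import pulp

    def extractive_ilp(sent_lengths, sent_concepts, concept_weights, budget):
        """Select sentences to maximize the weight of covered concepts under a length budget."""
        S, C = range(len(sent_lengths)), range(len(concept_weights))
        prob = pulp.LpProblem("extractive_summarization", pulp.LpMaximize)
        y = [pulp.LpVariable(f"y_{i}", cat="Binary") for i in S]   # sentence i selected
        z = [pulp.LpVariable(f"z_{j}", cat="Binary") for j in C]   # concept j covered

        # objective: total weight of covered concepts
        prob += pulp.lpSum(concept_weights[j] * z[j] for j in C)
        # knapsack constraint: selected sentences must fit the budget
        prob += pulp.lpSum(sent_lengths[i] * y[i] for i in S) <= budget
        # consistency: a concept is covered iff some selected sentence contains it
        for j in C:
            holders = [i for i in S if j in sent_concepts[i]]
            prob += z[j] <= pulp.lpSum(y[i] for i in holders)
            for i in holders:
                prob += z[j] >= y[i]

        prob.solve(pulp.PULP_CBC_CMD(msg=0))
        return [i for i in S if y[i].value() > 0.5]

    if __name__ == "__main__":
        # three toy sentences and four toy concepts (identified by index)
        print(extractive_ilp(sent_lengths=[5, 8, 6],
                             sent_concepts=[{0, 1}, {1, 2, 3}, {0, 3}],
                             concept_weights=[2.0, 1.0, 1.0, 3.0],
                             budget=12))

Solving such ILPs exactly is NP-hard in general, which is the runtime concern raised throughout the "ILP" excerpts above.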

score function

Appears in 4 sentences as: score function (4) score functions (2)
  1. By designing a quality score function g : {0,1}^N → R, this can be cast as a global optimization problem with a knapsack constraint:
    Page 2, “Extractive Summarization”
  2. Then, the following quality score function is defined:
    Page 2, “Extractive Summarization”
  3. Here, we follow the latter work, by combining a coverage score function g with sentence-level compression score functions h_1, …
    Page 3, “Compressive Summarization”
  4. For the compression score function, we follow Martins and Smith (2009) and decompose it as a sum of local score functions defined on dependency arcs:
    Page 4, “Compressive Summarization”
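
Excerpts 3–4 say the compressive model combines a coverage score g with per-sentence compression scores that decompose over dependency arcs. Schematically, and with all symbols assumed for illustration (the excerpts do not give the exact notation), the joint objective has the shape

    f(\mathbf{y}, \mathbf{z}) = g(\mathbf{y}, \mathbf{z}) + \sum_{n=1}^{N} h_n(\mathbf{z}_n),
    \qquad
    h_n(\mathbf{z}_n) = \sum_{\langle i, j \rangle \in \mathcal{A}_n} \sigma_n(i, j) \, \delta_n(i, j),

where y are the sentence-extraction variables, z_n the word-deletion variables of sentence n, A_n its set of dependency arcs, sigma_n(i, j) a learned score for deleting arc <i, j>, and delta_n(i, j) an indicator that the arc is deleted.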

bigram

Appears in 3 sentences as: bigram (2) bigrams (2)
  1. Previous work has modeled concepts as events (Filatova and Hatzivassiloglou, 2004), salient words (Lin and Bilmes, 2010), and word bigrams (Gillick et al., 2008).
    Page 2, “Extractive Summarization”
  2. …(2011), we used stemmed word bigrams as concepts, to which we associate the following concept features: indicators for document counts, features indicating if each of the words in the bigram is a stop-word, the earliest position in a document at which each concept occurs, as well as two- and three-way conjunctions of these features.
    Page 5, “Compressive Summarization”
  3. We generated oracle extracts by maximizing bigram recall with respect to the manual abstracts, as described in Berg-Kirkpatrick et al. (2011).
    Page 7, “Experiments”
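
The second excerpt lists the concept features: stemmed word bigrams with document-count indicators, stop-word flags for each word of the bigram, and the earliest position at which the concept occurs. A small self-contained sketch of collecting those statistics follows; the crude suffix-stripping stemmer, the tiny stop-word list, and all names are stand-ins (a real system would use a proper stemmer and also build the two- and three-way conjunction features the excerpt mentions):

    from collections import Counter, defaultdict

    STOPWORDS = {"the", "a", "an", "of", "in", "to", "and"}   # tiny illustrative list

    def stem(word):
        # stand-in for a real stemmer (e.g. Porter): crude suffix stripping only
        for suffix in ("ing", "ed", "s"):
            if word.endswith(suffix) and len(word) > len(suffix) + 2:
                return word[: -len(suffix)]
        return word

    def bigram_concepts(documents):
        """Collect stemmed bigram concepts with document counts, stop-word flags,
        and the earliest position at which each concept occurs in any document."""
        doc_count = Counter()
        earliest = defaultdict(lambda: float("inf"))
        stop_flags = {}
        for doc in documents:
            tokens = [t.lower() for t in doc]
            seen = set()
            for pos in range(len(tokens) - 1):
                w1, w2 = tokens[pos], tokens[pos + 1]
                concept = (stem(w1), stem(w2))
                stop_flags[concept] = (w1 in STOPWORDS, w2 in STOPWORDS)
                earliest[concept] = min(earliest[concept], pos)
                seen.add(concept)
            doc_count.update(seen)               # count each concept once per document
        return {c: {"doc_count": doc_count[c],
                    "stopword_flags": stop_flags[c],
                    "earliest_position": earliest[c]} for c in doc_count}

    if __name__ == "__main__":
        docs = [["The", "senators", "passed", "the", "bill"],
                ["Passing", "the", "bill", "took", "months"]]
        for concept, feats in bigram_concepts(docs).items():
            print(concept, feats)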
