Summarization Through Submodularity and Dispersion
Dasgupta, Anirban and Kumar, Ravi and Ravi, Sujith

Article Structure

Abstract

We propose a new optimization framework for summarization by generalizing the submodular framework of Lin and Bilmes (2011).

Introduction

Summarization is a classic text processing problem.

Related Work

Automatic summarization is a well-studied problem in the literature.

Framework

In this section we present the summarization framework.

Using the Framework

Next, we describe how the framework described in Section 3 can be applied to our tasks of interest, i.e., summarizing documents or user-generated content (in our case, comments).

Experiments

5.1 Data

Conclusions

We introduced a new general-purpose graph-based summarization framework that combines a submodular coverage function with a non-submodular dispersion function.

Topics

objective function

Appears in 16 sentences as: Objective function (2) objective function (14)
In Summarization Through Submodularity and Dispersion
  1. We start by describing a generic objective function that can be widely applied to several summarization scenarios.
    Page 2, “Framework”
  2. This objective function is the sum of a monotone submodular coverage function and a non-submodular dispersion function.
    Page 2, “Framework”
  3. We then describe a simple greedy algorithm for optimizing this objective function with provable approximation guarantees for three natural dispersion functions.
    Page 2, “Framework”
  4. generate a graph and instantiate our summarization objective function with specific components that capture the desiderata of a given summarization task.
    Page 5, “Using the Framework”
  5. We model this property in our objective function as follows.
    Page 5, “Using the Framework”
  6. We then add this component to our objective function as $w(S) = \sum_{u \in S} w(u)$.
    Page 5, “Using the Framework”
  7. The corresponding contribution to the objective function is $\sum_{B \in \mathcal{B}} |S \cap B|^{1/2}$.
    Page 5, “Using the Framework”
  8. The corresponding contribution to the objective function is $\sum_{D \in \mathcal{D}} |S \cap D|^{1/2}$.
    Page 5, “Using the Framework”
  9. of our system that approximates the submodular objective function proposed by Lin and Bilmes (2011). As shown in the results, our best system, which uses the $h_s$ dispersion function, achieves a better ROUGE-1 F-score than all other systems.
    Page 7, “Experiments”
  10. (3) We also analyze the contributions of individual components of the new objective function towards summarization performance by selectively setting certain parameters to 0.
    Page 7, “Experiments”
  11. However, since individual components within our objective function are parametrized, it is easy to tune them for a specific task or genre.
    Page 7, “Experiments”
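
Items 2, 3, 7, and 8 above describe the overall shape of the objective: a monotone submodular coverage term (e.g., square-root coverage of clusters) plus a non-submodular dispersion term, optimized with a simple greedy algorithm. The sketch below is only illustrative: the single cluster family, the per-sentence reward weights, the use of the sum dispersion $h_s$, and all function and variable names are assumptions chosen to mirror the snippets, not the paper's exact instantiation.

```python
import math
from itertools import combinations

def objective(S, clusters, weights, dist, lam=1.0):
    """Coverage + dispersion objective (illustrative form only).

    coverage(S) = sum over clusters B of |S ∩ B|^(1/2)   (monotone submodular)
    reward(S)   = sum over u in S of weights[u]           (cf. item 6 above)
    dispersion(S) = sum of pairwise distances within S    (the h_s measure)
    """
    coverage = sum(math.sqrt(len(S & B)) for B in clusters)
    reward = sum(weights[u] for u in S)
    dispersion = sum(dist[u][v] for u, v in combinations(sorted(S), 2))
    return coverage + reward + lam * dispersion

def greedy_summary(universe, k, clusters, weights, dist, lam=1.0):
    """Greedily add the sentence with the largest marginal gain until |S| = k."""
    S = set()
    while len(S) < k:
        best, best_gain = None, float("-inf")
        for u in universe - S:
            gain = (objective(S | {u}, clusters, weights, dist, lam)
                    - objective(S, clusters, weights, dist, lam))
            if gain > best_gain:
                best, best_gain = u, gain
        S.add(best)
    return S
```

Each marginal-gain evaluation touches the pairwise dispersion term, so it costs O(|S|) per candidate; the point of the paper's analysis is that this greedy choice still carries provable approximation guarantees even though the dispersion part is not submodular.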


news article

Appears in 7 sentences as: news article (5) news articles (2)
In Summarization Through Submodularity and Dispersion
  1. On the other hand, in the case of user-generated content (say, comments on a news article), even though the text is short, one is faced with a different set of problems: volume (popular articles generate more than 10,000 comments), noise (most comments are vacuous, linguistically deficient, and tangential to the article), and redundancy (similar views are expressed by multiple commenters).
    Page 1, “Introduction”
  2. We then conduct experiments on two corpora: the DUC 2004 corpus and a corpus of user comments on news articles.
    Page 2, “Introduction”
  3. Depending on the summarization application, $O$ can refer to the set of documents (e.g., newswire) related to a particular topic as in standard summarization; in other scenarios (e.g., user-generated content), it is a collection of comments associated with a news article or a blog post, etc.
    Page 2, “Framework”
  4. We extracted a set of news articles and corresponding user comments from Yahoo!
    Page 6, “Experiments”
  5. We then run our summarization algorithm on the instantiated graph to produce a summary for each news article.
    Page 6, “Experiments”
  6. In addition, each news article and corresponding set of comments were presented to three human annotators.
    Page 6, “Experiments”
  7. They were asked to select a subset of comments (at most 20 comments) that best represented a summary capturing the most popular as well as diverse set of views and opinions expressed by different users that are relevant to the given news article.
    Page 6, “Experiments”


dependency relations

Appears in 6 sentences as: dependency relation (2) dependency relations (4)
In Summarization Through Submodularity and Dispersion
  1. We first use a dependency parser (de Marneffe et al., 2006) to parse each sentence and extract the set of dependency relations associated with the sentence.
    Page 5, “Using the Framework”
  2. For example, the sentence “I adore tennis” is represented by the dependency relations (nsubj: adore, I) and (dobj: adore, tennis).
    Page 5, “Using the Framework”
  3. Each sentence represents a single node u in the graph (unless otherwise specified) and comprises the set of dependency relations (or n-grams) present in the sentence.
    Page 5, “Using the Framework”
  4. For each pair of nodes (u, v) in the graph, we compute the semantic similarity score (using WordNet) between every pair of dependency relations (rel: a, b) in u and v as: $s(u,v) = \sum_{i,j} WN(a_i, a_j) \times WN(b_i, b_j)$,
    Page 5, “Using the Framework”
  5. where rel is a relation type (e.g., nsubj) and a, b are the two arguments present in the dependency relation (b does not exist for some relations).
    Page 5, “Using the Framework”
  6. For each node u, we define the number of documents $|C_u|$, where $C_u \subseteq O$ is the set of documents from the collection such that at least one of the dependency relations $rel \in u$ appeared in a sentence within some document $c \in C_u$.
    Page 5, “Using the Framework”
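
Items 1, 4, and 5 above describe how sentence similarity is computed over dependency relations with WordNet. A rough sketch follows, using NLTK's WordNet interface; restricting the sum to relation pairs of the same type, taking the first synset of each word, and skipping the second factor when an argument is missing are simplifying assumptions, not details from the paper, and the function names are hypothetical.

```python
from nltk.corpus import wordnet as wn

def wn_sim(w1, w2):
    """WordNet path similarity between the first synsets of two words (0 if unknown)."""
    s1, s2 = wn.synsets(w1), wn.synsets(w2)
    if not s1 or not s2:
        return 0.0
    return s1[0].path_similarity(s2[0]) or 0.0

def sentence_sim(rels_u, rels_v):
    """s(u, v): sum over relation pairs of WN(a_i, a_j) * WN(b_i, b_j).

    Each relation is a triple (rel, a, b); b may be None for some relations.
    Only pairs with the same relation type are compared here (an assumption).
    """
    score = 0.0
    for rel_i, a_i, b_i in rels_u:
        for rel_j, a_j, b_j in rels_v:
            if rel_i != rel_j:
                continue
            s = wn_sim(a_i, a_j)
            if b_i is not None and b_j is not None:
                s *= wn_sim(b_i, b_j)
            score += s
    return score

# Example from the paper's illustration: "I adore tennis" vs. "Everyone likes tennis"
u = [("nsubj", "adore", "I"), ("dobj", "adore", "tennis")]
v = [("nsubj", "like", "everyone"), ("dobj", "like", "tennis")]
print(sentence_sim(u, v))
```

In practice one would match parts of speech when selecting synsets; the first-synset shortcut here is purely for brevity.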


WordNet

Appears in 6 sentences as: WordNet (6)
In Summarization Through Submodularity and Dispersion
  1. For each pair of nodes (u, v) in the graph, we compute the semantic similarity score (using WordNet) between every pair of dependency relations (rel: a, b) in u and v as: $s(u,v) = \sum_{i,j} WN(a_i, a_j) \times WN(b_i, b_j)$,
    Page 5, “Using the Framework”
  2. $WN(w_i, w_j)$ is defined as the WordNet similarity score between words $w_i$ and $w_j$.
    Page 5, “Using the Framework”
  3. There exist various semantic relatedness measures based on WordNet (Patwardhan and Pedersen, 2006).
    Page 5, “Using the Framework”
  4. In our experiments, for WN we pick one that is based on the path length between the two words in the WordNet graph.
    Page 5, “Using the Framework”
  5. This allows us to perform approximate matching of syntactic treelets obtained from the dependency parses using semantic (WordNet) similarity.
    Page 5, “Using the Framework”
  6. from dependency parse tree) along with computing similarity in semantic spaces (using WordNet) clearly produces an improvement in the summarization quality (+1.4 improvement in ROUGE-1 F-score).
    Page 7, “Experiments”


spanning tree

Appears in 5 sentences as: spanning tree (7)
In Summarization Through Submodularity and Dispersion
  1. We consider three natural dispersion functions on the sentences in a summary: sum of all-pair sentence dissimilarities, the weight of the minimum spanning tree on the sentences, and the minimum of all-pair sentence dissimilarities.
    Page 2, “Introduction”
  2. For two sets S and T, let P be the set of unordered pairs {u, v} where $u \in S$ and $v \in T$. Our focus is on the following dispersion functions: the sum measure $h_s(S, T) = \sum_{\{u,v\} \in P} d(u,v)$, the spanning tree measure $h_t(S, T)$ given by the cost of the minimum spanning tree of the set $S \cup T$, and the min measure $h_m(S, T) = \min_{\{u,v\} \in P} d(u,v)$.
    Page 3, “Framework”
  3. To prove (ii), let T be the tree obtained by adding all points of $O \setminus S$ directly to their respective closest points on the minimum spanning tree of S. T is a spanning tree, and hence a Steiner tree, for the points in the set $S \cup O$.
    Page 3, “Framework”
  4. The proof follows by noting that we get a spanning tree by connecting each $u_i$ to its closest point in $S_{i-1}$.
    Page 3, “Framework”
  5. The cost of this spanning tree is $\sum_{j \ge 2} d(u_j, S_{j-1})$, and this tree is also the result of the greedy algorithm run in an online fashion on the input sequence $\{u_1, \ldots\}$.
    Page 3, “Framework”
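
Item 2 above defines the three dispersion measures over the pairwise distances of a candidate set. Below is a small self-contained sketch of all three (using a plain Prim-style construction for the spanning-tree measure rather than any particular library); it is meant only to make the definitions concrete, and the function names are hypothetical.

```python
from itertools import combinations

def h_sum(points, d):
    """Sum measure h_s: sum of all-pair dissimilarities."""
    return sum(d(u, v) for u, v in combinations(points, 2))

def h_min(points, d):
    """Min measure h_m: minimum all-pair dissimilarity."""
    return min(d(u, v) for u, v in combinations(points, 2))

def h_tree(points, d):
    """Spanning-tree measure h_t: weight of a minimum spanning tree (Prim's algorithm)."""
    points = list(points)
    if len(points) < 2:
        return 0.0
    in_tree = {points[0]}
    total = 0.0
    while len(in_tree) < len(points):
        # cheapest edge crossing from the partial tree to the remaining points
        u, v = min(((a, b) for a in in_tree for b in points if b not in in_tree),
                   key=lambda p: d(p[0], p[1]))
        total += d(u, v)
        in_tree.add(v)
    return total
```

All three take the candidate set and a distance function d; in the paper's setting, d is the inter-sentence distance derived from the normalized edge weights (see the "edge weights" snippets below).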


dependency parse

Appears in 4 sentences as: dependency parse (2) dependency parser (1) dependency parses (1)
In Summarization Through Submodularity and Dispersion
  1. Instead, we model sentences using a structured representation, i.e., their syntactic structure given by dependency parse trees.
    Page 5, “Using the Framework”
  2. We first use a dependency parser (de Marneffe et al., 2006) to parse each sentence and extract the set of dependency relations associated with the sentence.
    Page 5, “Using the Framework”
  3. This allows us to perform approximate matching of syntactic treelets obtained from the dependency parses using semantic (WordNet) similarity.
    Page 5, “Using the Framework”
  4. from dependency parse tree) along with computing similarity in semantic spaces (using WordNet) clearly produces an improvement in the summarization quality (+1.4 improvement in ROUGE-1 F-score).
    Page 7, “Experiments”


F-score

Appears in 4 sentences as: F-score (4)
In Summarization Through Submodularity and Dispersion
  1. of our system that approximates the submodular objective function proposed by Lin and Bilmes (2011). As shown in the results, our best system, which uses the $h_s$ dispersion function, achieves a better ROUGE-1 F-score than all other systems.
    Page 7, “Experiments”
  2. (4) To understand the effect of utilizing syntactic structure and semantic similarity for constructing the summarization graph, we ran the experiments using just the unigrams and bigrams; we obtained a ROUGE-1 F-score of 37.1.
    Page 7, “Experiments”
  3. Note that Lin & Bilmes (2011) report a slightly higher ROUGE-1 score (F-score 38.90) on DUC 2004.
    Page 7, “Experiments”
  4. from dependency parse tree) along with computing similarity in semantic spaces (using WordNet) clearly produces an improvement in the summarization quality (+1.4 improvement in ROUGE-1 F-score).
    Page 7, “Experiments”
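
For reference, ROUGE-1 F-score is the harmonic mean of unigram precision and recall against a reference summary. The sketch below is deliberately bare-bones (single reference, no stemming, no stopword handling, none of the official toolkit's options), so it will not reproduce published ROUGE numbers exactly; it only illustrates the quantity the snippets above report.

```python
from collections import Counter

def rouge1_f(candidate_tokens, reference_tokens):
    """Unigram-overlap F-score: clipped overlap counts divided by candidate length
    (precision) and by reference length (recall), combined harmonically."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    if not candidate_tokens or not reference_tokens or overlap == 0:
        return 0.0
    p = overlap / len(candidate_tokens)
    r = overlap / len(reference_tokens)
    return 2 * p * r / (p + r)
```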


semantic similarity

Appears in 4 sentences as: semantic similarity (4)
In Summarization Through Submodularity and Dispersion
  1. We identify similar views/opinions by computing semantic similarity rather than using standard similarity measures (such as cosine similarity based on exact lexical matches between different nodes in the graph).
    Page 5, “Using the Framework”
  2. For each pair of nodes (u, v) in the graph, we compute the semantic similarity score (using WordNet) between every pair of dependency relations (rel: a, b) in u and v as: $s(u,v) = \sum_{i,j} WN(a_i, a_j) \times WN(b_i, b_j)$,
    Page 5, “Using the Framework”
  3. Using the syntactic structure along with semantic similarity helps us identify useful (valid) nuggets of information within comments (or documents), avoid redundancies, and identify similar views in a semantic space.
    Page 5, “Using the Framework”
  4. (4) To understand the effect of utilizing syntactic structure and semantic similarity for constructing the summarization graph, we ran the experiments using just the unigrams and bigrams; we obtained a ROUGE-1 F-score of 37.1.
    Page 7, “Experiments”


edge weights

Appears in 3 sentences as: edge weights (3)
In Summarization Through Submodularity and Dispersion
  1. Furthermore, the edge weights $s(u, v)$ represent pairwise similarity between sentences or comments (e.g., similarity between views expressed in different comments).
    Page 5, “Using the Framework”
  2. The edge weights are then used to define the inter-sentence distance metric d(u, v) for the different dispersion functions.
    Page 5, “Using the Framework”
  3. The edge weights are then normalized across all edges in the graph.
    Page 5, “Using the Framework”
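
Items 1–3 above say that the similarity edge weights are normalized over all edges in the graph and then used to define the distance metric d(u, v) for the dispersion functions. The snippets do not spell out the normalization or the similarity-to-distance transform, so the sketch below assumes max-normalization and d(u, v) = 1 - normalized similarity purely for illustration; the function name is hypothetical.

```python
def similarities_to_distances(sim):
    """sim maps (u, v) edge pairs to raw similarity scores s(u, v).

    Returns distances d(u, v) = 1 - s(u, v) / max_s (an illustrative choice,
    not necessarily the paper's exact transform)."""
    if not sim:
        return {}
    max_s = max(sim.values()) or 1.0
    return {edge: 1.0 - s / max_s for edge, s in sim.items()}
```

The resulting d is what feeds the dispersion measures sketched earlier under "spanning tree".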


graph-based

Appears in 3 sentences as: Graph-based (1) graph-based (2)
In Summarization Through Submodularity and Dispersion
  1. We propose a very general graph-based summarization framework that combines a submodular function with a non-submodular dispersion function.
    Page 2, “Introduction”
  2. Graph-based methods have been used for summarization (Ganesan et al., 2010), but in a different context—using paths in graphs to produce very short abstractive summaries.
    Page 2, “Related Work”
  3. We introduced a new general-purpose graph-based summarization framework that combines a submodular coverage function with a non-submodular dispersion function.
    Page 8, “Conclusions”


similarity score

Appears in 3 sentences as: similarity score (3)
In Summarization Through Submodularity and Dispersion
  1. For each pair of nodes (u, v) in the graph, we compute the semantic similarity score (using WordNet) between every pair of dependency relations (rel: a, b) in u and v as: $s(u,v) = \sum_{i,j} WN(a_i, a_j) \times WN(b_i, b_j)$,
    Page 5, “Using the Framework”
  2. $WN(w_i, w_j)$ is defined as the WordNet similarity score between words $w_i$ and $w_j$.
    Page 5, “Using the Framework”
  3. For example, the sentences “I adore tennis” and “Everyone likes tennis” convey the same view and should be assigned a higher similarity score as opposed to “I hate tennis”.
    Page 5, “Using the Framework”
