Query-Chain Focused Summarization
Tal Baumel, Raphael Cohen and Michael Elhadad

Article Structure

Abstract

Update summarization is a form of multi-document summarization where a document set must be summarized in the context of other documents assumed to be known.

Introduction

In the past 10 years, the general objective of text summarization has been refined into more specific tasks.

Query-Chain Summarization

In this work, we focus on the zoom-in aspect of the exploratory search process described above.

Previous Work

We first review the closely related tasks of Update Summarization and Query-Focused Summarization.

Dataset Collection

We now describe how we have constructed a dataset to evaluate QCFS algorithms, which we are publishing freely.

Algorithms

In this section, we first explain how we adapted the previously mentioned methods to the QCFS task, thus producing 3 strong baselines.

Evaluation

6.1 Evaluation Dataset

Conclusions

We presented a new summarization task tailored for the needs of exploratory search systems.

Topics

topic model

Appears in 12 sentences as: Topic Model (2), Topic model (1), topic model (8), topic models (1)
In Query-Chain Focused Summarization
  1. We present an algorithm for Query-Chain Summarization based on a new LDA topic model variant.
    Page 1, “Abstract”
  2. We introduce a new algorithm to address the task of Query-Chain Focused Summarization, based on a new LDA topic model variant, and present an evaluation which demonstrates it improves on these baselines.
    Page 2, “Introduction”
  3. As evidenced since (Daumé III and Marcu, 2006), Bayesian techniques have proven effective at this task: we construct a latent topic model on the basis of the document set and the query.
    Page 3, “Previous Work”
  4. This topic model effectively serves as a query expansion mechanism, which helps assess the relevance of individual sentences to the original query.
    Page 3, “Previous Work”
  5. In recent years, three major techniques have emerged to perform multi-document summarization: graph-based methods such as LexRank (Erkan and Radev, 2004) for multi-document summarization and Biased-LexRank (Otterbacher et al., 2008) for query-focused summarization; language-model methods such as KLSum (Haghighi and Vanderwende, 2009); and variants of KLSum based on topic models, such as BayeSum (Daumé III and Marcu, 2006) and TopicSum (Haghighi and Vanderwende, 2009).
    Page 3, “Previous Work”
  6. TopicSum uses an LDA-like topic model (Blei et al., 2003).
    Page 3, “Previous Work”
  7. We developed a novel Topic Model to identify words that are associated with the current query and not shared with the previous queries.
    Page 7, “Algorithms”
  8. Figure 3: Plate Model for Our Topic Model
    Page 7, “Algorithms”
  9. We implemented inference over this topic model using Gibbs Sampling (we distribute the code of the sampler together with our dataset).
    Page 7, “Algorithms”
  10. After the topic model is applied to the current query, we apply KLSum only to words that are assigned to the new content topic.
    Page 7, “Algorithms”
  11. When running this topic model on our dataset, we observe that the mean size of Dc was 978 words, with 375 unique words.
    Page 7, “Algorithms”
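
Items 7-10 above outline the method's pipeline: a Gibbs sampler assigns every word in the retrieved documents to a topic, and KLSum is then run only over the words assigned to the new-content topic. The paper distributes its own sampler with the dataset; the sketch below illustrates only the filtering step, with hypothetical topic labels and a precomputed assignment list standing in for the sampler's output.

```python
from collections import Counter

# Hypothetical topic labels: the paper's actual variant may define its
# topics differently; these only serve to illustrate the filtering step.
GENERAL, DOC_SPECIFIC, OLD_CONTENT, NEW_CONTENT = range(4)

def new_content_distribution(tokens, assignments):
    """Unigram distribution restricted to new-content words.

    tokens      -- word tokens of the retrieved document set
    assignments -- parallel list of topic labels, one per token, as
                   produced by a Gibbs sampler over the topic model
    """
    kept = [w for w, t in zip(tokens, assignments) if t == NEW_CONTENT]
    counts = Counter(kept)
    total = sum(counts.values()) or 1  # guard against an empty topic
    return {w: c / total for w, c in counts.items()}
```

KLSum then targets this filtered distribution rather than the raw document distribution, which is what steers the summary toward material that answers the current query but was not covered by earlier ones.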

unigram

Appears in 6 sentences as: Unigram (1), unigram (6)
In Query-Chain Focused Summarization
  1. KLSum adopts a language model approach to compute relevance: the documents in the input set are modeled as a distribution over words (the original algorithm uses a unigram distribution over the bag of words in documents D).
    Page 3, “Previous Work”
  2. KLSum is a sentence extraction algorithm: it searches for a subset of the sentences in D with a unigram distribution as similar as possible to that of the overall collection D, but with a limited length.
    Page 3, “Previous Work”
  3. After the words are classified, the algorithm uses a KLSum variant to find the summary that best matches the unigram distribution of topic specific words.
    Page 3, “Previous Work”
  4. When constructing a summary, we update the unigram distribution of the constructed summary so that it includes a smoothed distribution of the previous summaries in order to eliminate redundancy between the successive steps in the chain.
    Page 6, “Algorithms”
  5. For example, when we summarize the documents that were retrieved as a result to the first query, we calculate the unigram distribution in the same manner as we did in Focused KLSum; but for the second query, we calculate the unigram distribution as if all the sentences we selected for the previous summary were selected for the current query too, with a damping factor.
    Page 6, “Algorithms”
  6. In this variant, the Unigram Distribution estimate of word X is computed as:
    Page 6, “Algorithms”
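
Items 1-5 above describe greedy KLSum and its chained variant; item 6 refers to a formula not reproduced above. Under the description in item 5, a plausible form is P(x) = (c_cur(x) + d * c_prev(x)) / (N_cur + d * N_prev), where c_prev counts words in the previous summaries and d is the damping factor. The sketch below implements that reading; the variable names, the smoothing, and the greedy search loop are illustrative assumptions rather than the paper's exact procedure.

```python
import math
from collections import Counter

def unigram(counts, vocab, eps=1e-6):
    """Additively smoothed unigram distribution over a fixed vocabulary."""
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts.get(w, 0) + eps) / total for w in vocab}

def kl(p, q):
    """KL divergence D(p || q); both defined over the same vocabulary."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())

def kl_sum(doc_sentences, prev_summaries=(), damping=0.5, budget=250):
    """Greedy KLSum with the chained-variant damping from item 5.

    doc_sentences  -- sentences of the retrieved set, each a list of tokens
    prev_summaries -- sentences already emitted for earlier queries
    damping        -- weight of previous-summary counts (illustrative name)
    budget         -- summary length limit, in tokens
    """
    doc_tokens = [w for s in doc_sentences for w in s]
    vocab = set(doc_tokens)
    p_doc = unigram(Counter(doc_tokens), vocab)

    # Seed the summary's counts with damped counts of the previous
    # summaries, so re-selecting their content no longer improves the KL.
    counts = Counter()
    for s in prev_summaries:
        for w in s:
            if w in vocab:
                counts[w] += damping

    summary, length, pool = [], 0, list(doc_sentences)
    while pool and length < budget:
        best, best_kl = None, float("inf")
        for s in pool:
            trial = counts.copy()
            trial.update(w for w in s if w in vocab)
            div = kl(p_doc, unigram(trial, vocab))
            if div < best_kl:
                best, best_kl = s, div
        summary.append(best)
        counts.update(w for w in best if w in vocab)
        length += len(best)
        pool.remove(best)
    return summary
```

With no previous summaries this reduces to plain Focused KLSum (the first-query case in item 5); with a positive damping factor, words already covered by earlier summaries contribute less KL improvement, which suppresses redundancy across the chain.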

PageRank

Appears in 4 sentences as: PageRank (4)
In Query-Chain Focused Summarization
  1. After the graph is generated, the PageRank algorithm (Page et al., 1999) is used to determine the most central linguistic units in the graph.
    Page 3, “Previous Work”
  2. PageRank spreads the query similarity of a vertex to its close neighbors, so that sentences similar to other query-similar sentences are ranked higher.
    Page 3, “Previous Work”
  3. After creating the graph, PageRank is run to rank sentences.
    Page 8, “Algorithms”
  4. Finally, instead of PageRank , we used SimRank (Haveliwala, 2002) to identify the nodes most similar to the query node and not only the central sentences in the graph.
    Page 8, “Algorithms”
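
Items 1-2 above describe Biased-LexRank: sentences are nodes in a similarity graph, and a query-biased PageRank walk mixes jumps toward query-similar sentences with steps along similarity edges. The sketch below is a minimal power-iteration version; the bias weight, its default value, and the normalization choices follow the usual Biased-LexRank formulation (Otterbacher et al., 2008) rather than this paper's exact configuration.

```python
import numpy as np

def biased_lexrank(sim, query_sim, bias_weight=0.15, iters=100, tol=1e-6):
    """Rank sentences with a query-biased PageRank (Biased-LexRank sketch).

    sim         -- (n, n) matrix of inter-sentence similarities (e.g. cosine)
    query_sim   -- length-n vector of sentence-to-query similarities
    bias_weight -- probability of jumping to a query-similar sentence
                   instead of following a similarity edge (assumed value)
    """
    # Row-normalize similarities into a transition matrix: walk[i, j] is
    # the probability of stepping from sentence i to sentence j.
    walk = sim / sim.sum(axis=1, keepdims=True)
    bias = query_sim / query_sim.sum()

    scores = np.full(len(bias), 1.0 / len(bias))
    for _ in range(iters):
        new = bias_weight * bias + (1.0 - bias_weight) * (walk.T @ scores)
        if np.abs(new - scores).sum() < tol:
            break
        scores = new
    return scores
```

The top-scoring sentences are then extracted up to the length budget. The variant in item 4 replaces this centrality ranking with a SimRank-style measure of similarity to the query node.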
