A Hybrid Hierarchical Model for Multi-Document Summarization

Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization.

The extractive approach to multi-document summarization (MDS) produces a summary by selecting sentences from the original documents.

There are many studies on the principles governing multi-document summarization to produce coherent and semantically relevant summaries.

Our MDS system, the hybrid hierarchical summarizer HybHSum, is based on a hybrid learning approach that extracts sentences to generate a summary.

The sumHLDA constructs a hierarchical tree structure of candidate sentences (per document cluster) by positioning summary sentences on the tree.

Each candidate sentence $o_m$, $m = 1, \ldots, |O|$, is represented with a multidimensional vector of $q$ features $f_m = \{f_{m1}, \ldots, f_{mq}\}$.

In this section we describe a number of experiments using our hybrid model on 100 document clusters each containing 25 news articles from DUC2005-2006 tasks.

In this paper, we presented a hybrid model for multi-document summarization.

Appears in 12 sentences as: regression model (12)

In *A Hybrid Hierarchical Model for Multi-Document Summarization*

- In this paper, we formulate extractive summarization as a two-step learning problem building a generative model for pattern discovery and a regression model for inference. Page 1, “Abstract”
- Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Page 1, “Abstract”
- In this paper, we present a novel approach that formulates MDS as a prediction problem based on a two-step hybrid model: a generative model for hierarchical topic discovery and a regression model for inference. Page 1, “Introduction”
- We construct a hybrid learning algorithm by extracting salient features to characterize summary sentences, and implement a regression model for inference (Fig. 3). Page 1, “Introduction”
- Our aim is to find features that can best represent summary sentences, as described in § 5; implementation of a feasible inference method based on a regression model to enable scoring of sentences in test document clusters without retraining (which has not been investigated in generative summarization models), described in § 5.2. Page 2, “Introduction”
- Our approach differs from the early work in that we combine a generative hierarchical model and a regression model to score sentences in new documents, eliminating the need for building a generative model for new document clusters. Page 2, “Background and Motivation”
- We build a regression model using sentence scores as output and selected salient features as input variables described below: Page 5, “Regression Model”
- (4), we train a regression model. Page 6, “Regression Model”
- Once the SVR model is trained, we use it to predict the scores of $n_{test}$ sentences in test (unseen) document clusters, $O_{test} = \{o_1, \ldots, o_{|O_{test}|}\}$. Our HybHSum captures the sentence characteristics with a regression model using sentences in different document clusters. Page 6, “Regression Model”
- Later, we build a regression model with the same features as our HybHSum to create a summary. Page 7, “Experiments and Discussions”
- We keep the parameters and the features of the regression model of hierarchical HybHSum intact for consistency. Page 7, “Experiments and Discussions”
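
The excerpts above describe training a regressor on (feature vector, sentence score) pairs and then scoring sentences in unseen clusters without rebuilding the generative model. The following is a minimal sketch of that train-then-score flow only. It swaps in a plain least-squares linear regressor trained by gradient descent as a stand-in for the SVR mentioned in the paper, and the feature values, scores, and function names are all hypothetical.

```python
def predict(model, x):
    """Score one feature vector with a (weights, bias) linear model."""
    weights, bias = model
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_linear_regressor(features, scores, lr=0.1, epochs=2000):
    """Fit a least-squares linear model by batch gradient descent.

    Stand-in for SVR: this is NOT the paper's regressor, only an
    illustration of learning a mapping from features to sentence scores.
    """
    q = len(features[0])
    n = len(features)
    weights, bias = [0.0] * q, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * q, 0.0
        for x, y in zip(features, scores):
            err = predict((weights, bias), x) - y
            grad_b += err
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def select_summary(model, test_features, k):
    """Return indices of the k highest-scoring test sentences."""
    order = sorted(range(len(test_features)),
                   key=lambda i: predict(model, test_features[i]),
                   reverse=True)
    return order[:k]


# Hypothetical training data: two features per sentence, with scores that
# would come from the tree-based scoring step in the real system.
train_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_scores = [0.1, 0.7, 0.3, 0.9]
model = train_linear_regressor(train_features, train_scores)

# Sentences from an unseen cluster are scored without any retraining.
test_features = [[0.9, 0.1], [0.2, 0.3], [0.5, 0.9]]
picked = select_summary(model, test_features, k=2)
```

The point of the design is that only the feature extraction has to run on a new cluster; the trained model is reused as is.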

Appears in 11 sentences as: unigram (9) unigrams (5)

- * sparse unigram distributions ($sim_1$) at each topic $l$ on $c_{o_m}$: similarity between $p(w_{o_m,l} \mid z_{o_m} = l, c_{o_m}, v_l)$ and $p(w_{s_n,l} \mid z_{s_n} = l, c_{o_m}, v_l)$. Page 4, “Tree-Based Sentence Scoring”
- $sim_1$: We define two sparse (discrete) unigram distributions for candidate $o_m$ and summary $s_n$ at each node $l$ on a vocabulary identified with words generated by the topic at that node, $v_l \subset V$. Given $w_{o_m} = \{w_1, \ldots, w_{|o_m|}\}$, let $w_{o_m,l} \subset w_{o_m}$ be the set of words in $o_m$ that are generated from topic $z_{o_m}$ at level $l$ on path $c_{o_m}$. Page 4, “Tree-Based Sentence Scoring”
- The discrete unigram distribution $p_{o_m,l} = p(w_{o_m,l} \mid z_{o_m} = l, c_{o_m}, v_l)$ represents the probability over all words $v_l$ assigned to topic $z_{o_m}$ at level $l$, by sampling only for words in $w_{o_m,l}$. Page 4, “Tree-Based Sentence Scoring”
- Fig. 1.b depicts a sample path illustrating sparse unigram distributions of $o_m$ and $s_n$ at each level as well as their topic proportions, $p_{z_{o_m}}$ and $p_{z_{s_n}}$. Page 5, “Tree-Based Sentence Scoring”
- (I) nGram Meta-Features (NMF): For each document cluster $D$, we identify the most frequent (non-stop word) unigrams, i.e., $v_{freq} = \{w_i\}_{i=1}^{r} \subset V$, where $r$ is a model parameter giving the number of most frequent unigram features. Page 5, “Regression Model”
- We measure observed unigram probabilities for each $w_i \in v_{freq}$ with $p_D(w_i) = n_D(w_i) / \sum_{j=1}^{|V|} n_D(w_j)$, where $n_D(w_i)$ is the number of times $w_i$ appears in $D$ and $|V|$ is the total number of unigrams. Page 5, “Regression Model”
- To characterize this feature, we reuse the $r$ most frequent unigrams, i.e., $w_i \in v_{freq}$. Page 5, “Regression Model”
- Given sentence $o_m$, let $d$ be the document that $o_m$ belongs to, i.e., $o_m \in d$. We measure unigram probabilities for each $w_i$ by $p(w_i \in o_m) = n_d(w_i \in o_m) / n_D(w_i)$, where $n_d(w_i \in o_m)$ is the number of times $w_i$ appears in $d$ and $n_D(w_i)$ is the number of times $w_i$ appears in $D$. For any $i$th feature, the value is $f_{mi} = 0$ if the given sentence does not contain $w_i$; otherwise $f_{mi} = p(w_i \in o_m)$. Page 5, “Regression Model”
- We measure the average unigram probability of a sentence by $p(o_m) = \sum_{w \in o_m} \frac{1}{|o_m|} p_D(w)$, where $p_D$ is the observed unigram probability in the document collection $D$ and $|o_m|$ is the total number of words in $o_m$. Page 6, “Regression Model”
- We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams). Page 6, “Experiments and Discussions”
- Note that R-2 is a measure of bigram recall and sumHLDA of HybHSum2 is built on unigrams rather than bigrams. Page 8, “Experiments and Discussions”
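
The unigram features quoted above (cluster-level probability $p_D(w_i)$, the per-sentence feature $f_{mi}$, and the average unigram probability $p(o_m)$) can be sketched in a few lines. This is an illustrative reading of the formulas, not the authors' code. It assumes documents are lists of already tokenized sentences, that the denominator of $p_D$ sums over every token in the cluster, and that the function names and toy cluster are hypothetical.

```python
from collections import Counter


def top_unigrams(cluster_docs, r, stop_words=frozenset()):
    """Return (v_freq, n_D): the r most frequent non-stop-word unigrams
    in cluster D and the raw counts n_D(w) for every word in the cluster."""
    n_D = Counter(w for doc in cluster_docs for sent in doc for w in sent)
    ranked = [w for w, _ in n_D.most_common() if w not in stop_words]
    return ranked[:r], n_D


def observed_probability(n_D):
    """p_D(w) = n_D(w) / sum_j n_D(w_j), over all words in the cluster."""
    total = sum(n_D.values())
    return {w: count / total for w, count in n_D.items()}


def document_features(sentence, doc, v_freq, n_D):
    """f_mi = n_d(w_i) / n_D(w_i) if the sentence contains w_i, else 0."""
    n_d = Counter(w for sent in doc for w in sent)
    present = set(sentence)
    return [n_d[w] / n_D[w] if w in present else 0.0 for w in v_freq]


def average_unigram_probability(sentence, p_D):
    """p(o_m) = (1 / |o_m|) * sum of p_D(w) over the words of the sentence."""
    if not sentence:
        return 0.0
    return sum(p_D.get(w, 0.0) for w in sentence) / len(sentence)


# Hypothetical two-document cluster, each document a list of token lists.
doc1 = [["storm", "hits", "coast"], ["storm", "damage", "rises"]]
doc2 = [["coast", "guard", "responds"], ["storm", "weakens"]]
cluster = [doc1, doc2]

v_freq, n_D = top_unigrams(cluster, r=2)
p_D = observed_probability(n_D)
feats_a = document_features(doc1[0], doc1, v_freq, n_D)
feats_b = document_features(doc2[1], doc2, v_freq, n_D)
avg_a = average_unigram_probability(doc1[0], p_D)
```

In the toy cluster, "storm" occurs three times out of eleven tokens, so `p_D["storm"]` is 3/11, and the second sentence of `doc2` gets a zero for the "coast" feature because it does not contain that word.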

Appears in 10 sentences as: topic model (8) topic models (2)

- We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Page 1, “Abstract”
- We present a probabilistic topic model on sentence level building on hierarchical Latent Dirichlet Allocation (hLDA) (Blei et al., 2003a), which is a generalization of LDA (Blei et al., 2003b). Page 1, “Introduction”
- One of the challenges of using a previously trained topic model is that the new document might have a totally new vocabulary or may include many other specific topics, which may or may not exist in the trained model. Page 2, “Background and Motivation”
- A common method is to rebuild a topic model for new sets of documents (Haghighi and Vanderwende, 2009), which has proven to produce coherent summaries. Page 2, “Background and Motivation”
- We build a summary-focused hierarchical probabilistic topic model, sumHLDA, for each document cluster at sentence level, because it enables capturing expected topic distributions in given sentences directly from the model. Page 2, “Summary-Focused Hierarchical Model”
- ¹ Please refer to (Blei et al., 2003b) and (Blei et al., 2003a) for details and demonstrations of topic models. Page 2, “Summary-Focused Hierarchical Model”
- * HIERSUM: (Haghighi and Vanderwende, 2009) A generative summarization method based on topic models, which uses sentences as an additional level. Page 7, “Experiments and Discussions”
- * HybFSum (Hybrid Flat Summarizer): To investigate the performance of the hierarchical topic model, we build another hybrid model using flat LDA (Blei et al., 2003b). Page 7, “Experiments and Discussions”
- Compared to HybFSum built on LDA, both HybHSum1 and HybHSum2 yield better performance, indicating the effectiveness of using a hierarchical topic model in the summarization task. Page 8, “Experiments and Discussions”
- We demonstrated that implementation of a summary-focused hierarchical topic model to discover sentence structures, as well as construction of a discriminative method for inference, can benefit summarization quality on manual and automatic evaluation metrics. Page 9, “Conclusion”

Appears in 8 sentences as: LDA (8)

- We present a probabilistic topic model on sentence level building on hierarchical Latent Dirichlet Allocation (hLDA) (Blei et al., 2003a), which is a generalization of LDA (Blei et al., 2003b). Page 1, “Introduction”
- A hierarchical model is more appealing for summarization than a "flat" model, e.g., LDA (Blei et al., 2003b), in that one can discover "abstract" and "specific" topics. Page 2, “Background and Motivation”
- * HybFSum (Hybrid Flat Summarizer): To investigate the performance of the hierarchical topic model, we build another hybrid model using flat LDA (Blei et al., 2003b). Page 7, “Experiments and Discussions”
- In LDA each sentence is a superposition of all $K$ topics with sentence-specific weights; there is no hierarchical relation between topics. Page 7, “Experiments and Discussions”
- Instead of the new tree-based sentence scoring (§ 4), we present a similar method using topics from LDA on sentence level. Page 7, “Experiments and Discussions”
- Note that in LDA the topic-word distributions $\phi$ are over the entire vocabulary, and topic mixing proportions for sentences $\theta$ are over all the topics discovered from sentences in a document cluster. Page 7, “Experiments and Discussions”
- Hence, we define $sim_1$ and $sim_2$ measures for LDA using topic-word proportions $\phi$ (in place of discrete topic-word distributions from each level in Eq. 2) and topic mixing weights $\theta$ in sentences (in place of topic proportions in Eq. 3), respectively. Page 7, “Experiments and Discussions”
- Compared to HybFSum built on LDA, both HybHSum1 and HybHSum2 yield better performance, indicating the effectiveness of using a hierarchical topic model in the summarization task. Page 8, “Experiments and Discussions”

Appears in 6 sentences as: topic distribution (1) topic distributions (5)

- Such models can yield comparable or better performance on DUC and other evaluations, since representing documents as topic distributions rather than bags of words diminishes the effect of lexical variability. Page 1, “Introduction”
- Our focus is on identifying similarities of candidate sentences to summary sentences using a novel tree-based sentence scoring algorithm, concerning topic distributions at different levels of the discovered hierarchy as described in § 3 and § 4. Page 1, “Introduction”
- An alternative yet feasible solution, presented in this work, is building a model that can summarize new document clusters using characteristics of topic distributions of training documents. Page 2, “Background and Motivation”
- We discover hidden topic distributions of sentences in a given document cluster along with provided summary sentences based on hLDA described in (Blei et al., 2003a)¹. Page 2, “Summary-Focused Hierarchical Model”
- We build a summary-focused hierarchical probabilistic topic model, sumHLDA, for each document cluster at sentence level, because it enables capturing expected topic distributions in given sentences directly from the model. Page 2, “Summary-Focused Hierarchical Model”
- Each node is associated with a topic distribution over words. Page 3, “Summary-Focused Hierarchical Model”

Appears in 5 sentences as: bigram (4) bigrams (3)

- We similarly include bigram features in the experiments. Page 5, “Regression Model”
- We also include bigram extensions of DMF features. Page 5, “Regression Model”
- We use sentence bigram frequency, sentence rank in a document, and sentence size as additional features. Page 6, “Regression Model”
- We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams). Page 6, “Experiments and Discussions”
- Note that R-2 is a measure of bigram recall and sumHLDA of HybHSum2 is built on unigrams rather than bigrams. Page 8, “Experiments and Discussions”
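
The R-1 and R-2 metrics named above are n-gram recall scores against human reference summaries. The sketch below shows the basic clipped n-gram recall computation. It is a simplified illustration and not the official ROUGE toolkit: it assumes pre-tokenized input, applies no stemming or stop-word handling, and omits R-SU4. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, references, n):
    """Clipped n-gram recall of a candidate summary against references.

    Matches are clipped so a reference n-gram is credited at most as many
    times as it occurs in the candidate.
    """
    cand_counts = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        total += sum(ref_counts.values())
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0


# Hypothetical candidate and reference summaries.
candidate = "the storm hit the coast".split()
reference = "the storm damaged the coast".split()

r1 = rouge_n_recall(candidate, [reference], n=1)  # 4 of 5 reference unigrams
r2 = rouge_n_recall(candidate, [reference], n=2)  # 2 of 4 reference bigrams
```

The example also shows why a unigram-based model can score well on R-1 and less well on R-2: a single substituted word costs one unigram but two bigrams.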

Appears in 4 sentences as: statistical significance (2) statistically significant (2)

- Results in bold show statistical significance over baseline in corresponding metric. Page 7, “Experiments and Discussions”
- When stop words are used, HybHSum2 outperforms the state of the art by 2.5-7% except R-2 (with statistical significance). Page 8, “Experiments and Discussions”
- Results are statistically significant based on t-test. Page 8, “Experiments and Discussions”
- All results in Table 4 are statistically significant (based on a t-test at the 95% confidence level). Page 8, “Experiments and Discussions”
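
The excerpts above report significance from a t-test at the 95% level but do not say which variant was used. As one plausible reading, the sketch below computes a paired t statistic over per-cluster scores of two systems and compares it with a two-sided critical value. The paired design, the function name, and the scores are assumptions for illustration and are not results from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up per-cluster scores for a system and a baseline.
system = [0.42, 0.40, 0.45, 0.39, 0.44]
baseline = [0.40, 0.37, 0.41, 0.38, 0.40]

t_stat, dof = paired_t_statistic(system, baseline)

# Two-sided critical value of Student's t for 4 degrees of freedom at the
# 95% confidence level, taken from a standard t table.
CRITICAL_T_DF4 = 2.776
significant = abs(t_stat) > CRITICAL_T_DF4
```

With real evaluations there would be one pair per document cluster, and the critical value would be looked up for the corresponding degrees of freedom.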

Appears in 3 sentences as: generative model (3)

- In this paper, we formulate extractive summarization as a two-step learning problem building a generative model for pattern discovery and a regression model for inference. Page 1, “Abstract”
- In this paper, we present a novel approach that formulates MDS as a prediction problem based on a two-step hybrid model: a generative model for hierarchical topic discovery and a regression model for inference. Page 1, “Introduction”
- Our approach differs from the early work in that we combine a generative hierarchical model and a regression model to score sentences in new documents, eliminating the need for building a generative model for new document clusters. Page 2, “Background and Motivation”
