A Risk Minimization Framework for Extractive Speech Summarization
Lin, Shih-Hsiang and Chen, Berlin

Article Structure

Abstract

In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations.

Introduction

Automated summarization systems, which enable users to quickly digest the important information conveyed by either a single document or a cluster of documents, are indispensable for managing the rapidly growing amount of textual information and multimedia content (Mani and Maybury, 1999).

Background

Speech summarization can be conducted using either supervised or unsupervised methods (Furui et al., 2004; McKeown et al., 2005; Lin et al., 2008).

A risk minimization framework for extractive summarization

Extractive summarization can be viewed as a decision making process in which the summarizer attempts to select a representative subset of sentences or paragraphs from the original documents.

Proposed Methods

There are many ways to construct the three component models mentioned above, i.e., the sentence generative model P(D|Sj), the sentence prior model P(Sj), and the loss function L(Si,Sj).
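
Although the excerpts on this page do not reproduce the paper's selection rule (referred to as equation (8) in the text), a plausible reconstruction of the expected-risk criterion implied by these three component models is sketched below; treat it as an illustrative assumption rather than the paper's exact formula.

```latex
% Hedged reconstruction: choose the sentence with minimum expected risk, where the
% risk of picking S_i is its loss against every candidate S_j, weighted by how
% likely S_j is to have generated the (residual) document.
S^{*} \;=\; \arg\min_{S_i \in D} R(S_i \mid D)
      \;\approx\; \arg\min_{S_i \in D} \sum_{S_j \in D} L(S_i, S_j)\, P(D \mid S_j)\, P(S_j)
```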

Experimental setup 5.1 Data

The summarization dataset used in this research is a widely used broadcast news corpus collected by the Academia Sinica and the Public Television Service Foundation of Taiwan between November 2001 and April 2003 (Wang et al., 2005).

Experimental results and discussions 6.1 Baseline experiments

In the first set of experiments, we evaluate the baseline performance of the LM and BC summarizers (cf.

Conclusions and future work

We have proposed a risk minimization framework for extractive speech summarization, which enjoys several advantages.

Topics

loss function

Appears in 14 sentences as: Loss function (1), loss function (11), loss function: (1), loss functions (2)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. In addition, the introduction of various loss functions also provides the summarization framework with a flexible but systematic way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively.
    Page 1, “Abstract”
  2. Stated formally, a decision problem may consist of four basic elements: 1) an observation o from a random variable O, 2) a set of possible decisions (or actions) a ∈ A, 3) the state of nature θ ∈ Θ, and 4) a loss function L(ai, θ) which specifies the cost associated with a chosen decision ai, given that θ is the true state of nature.
    Page 3, “A risk minimization framework for extractive summarization”
  3. itself; (2) P(D|Sj) is the sentence generative probability that captures the degree of relevance of Sj to the residual document D; and (3) L(Si,Sj) is the loss function that characterizes the relationship between sentence Si and any other sentence Sj.
    Page 4, “A risk minimization framework for extractive summarization”
  4. There are many ways to construct the three component models mentioned above, i.e., the sentence generative model P(D|Sj), the sentence prior model P(Sj), and the loss function L(Si,Sj).
    Page 4, “Proposed Methods”
  5. 4.3 Loss function
    Page 5, “Proposed Methods”
  6. The loss function introduced in the proposed summarization framework measures the relationship between any pair of sentences.
    Page 5, “Proposed Methods”
  7. Consequently, the loss function can be built on the notion of the similarity measure.
    Page 5, “Proposed Methods”
  8. Sim(Si,Sj) = (Σ_t z_{i,t} z_{j,t}) / ( sqrt(Σ_t z_{i,t}^2) × sqrt(Σ_t z_{j,t}^2) ), summing over t = 1..T (10). The loss function is thus defined by L(Si,Sj) = 1 − Sim(Si,Sj) (11). (A small code sketch of this loss appears after this list.)
    Page 5, “Proposed Methods”
  9. Once the sentence generative model P(D|Sj), the sentence prior model P(Sj) and the loss function L(Si,Sj) have been properly estimated, the summary sentences can be selected iteratively by (8) according to a predefined target summarization ratio.
    Page 5, “Proposed Methods”
  10. To alleviate this problem, the concept of maximum marginal relevance (MMR) (Carbonell and Goldstein, 1998), which performs sentence selection iteratively by striking the balance between topic relevance and coverage, can be incorporated into the loss function:
    Page 5, “Proposed Methods”
  11. We start by considering a special case where a 0-1 loss function is used in (8), namely, the loss function will take value 0 if the two sentences are identical, and 1 otherwise.
    Page 5, “Proposed Methods”
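
As a complement to mentions 7-11, the following minimal Python sketch shows the cosine-similarity-based loss of equations (10)-(11) together with the degenerate 0-1 loss; the term-weight dictionaries and function names are illustrative assumptions, not the paper's code.

```python
import math

def cosine_similarity(z_i, z_j):
    """Cosine similarity between two term-weight vectors given as dicts (term -> weight)."""
    dot = sum(z_i[t] * z_j[t] for t in set(z_i) & set(z_j))
    norm_i = math.sqrt(sum(w * w for w in z_i.values()))
    norm_j = math.sqrt(sum(w * w for w in z_j.values()))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0
    return dot / (norm_i * norm_j)

def similarity_loss(z_i, z_j):
    """Eq. (11): L(Si, Sj) = 1 - Sim(Si, Sj)."""
    return 1.0 - cosine_similarity(z_i, z_j)

def zero_one_loss(sentence_i, sentence_j):
    """Degenerate 0-1 loss: 0 if the two sentences are identical, 1 otherwise."""
    return 0.0 if sentence_i == sentence_j else 1.0
```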

LM

Appears in 8 sentences as: LM (8)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. In order to estimate the sentence generative probability, we explore the language modeling (LM) approach, which has been introduced to a wide spectrum of IR tasks and has demonstrated good empirical success.
    Page 4, “Proposed Methods”
  2. In the LM approach, each sentence in a document can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009). (A minimal sketch of this model appears after this list.)
    Page 4, “Proposed Methods”
  3. In the first set of experiments, we evaluate the baseline performance of the LM and BC summarizers (cf.
    Page 6, “Experimental results and discussions 6.1 Baseline experiments”
  4. Second, the supervised summarizer (i.e., BC) outperforms the unsupervised summarizer (i.e., LM).
    Page 7, “Experimental results and discussions 6.1 Baseline experiments”
  5. One is that BC is trained with the handcrafted document-summary sentence labels in the development set while LM is instead conducted in a purely unsupervised manner.
    Page 7, “Experimental results and discussions 6.1 Baseline experiments”
  6. Another is that BC utilizes a rich set of features to characterize a given spoken sentence while LM is constructed solely on the basis of the lexical (unigram) information.
    Page 7, “Experimental results and discussions 6.1 Baseline experiments”
  7. (13)), which just show a simple combination of BC and LM.
    Page 7, “Experimental results and discussions 6.1 Baseline experiments”
  8. If we further compare the results achieved by MMR with those of BC and LM as shown in Table 3, we can find significant improvements both for the TD and SD cases.
    Page 8, “Experimental results and discussions 6.1 Baseline experiments”
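
Combining mentions 1 and 2 with the background-smoothing remark quoted under the “unigram” topic below, one common way to score the sentence generative probability P(D|Sj) is a smoothed unigram mixture, as in the hedged Python sketch below; the mixture weight alpha and the function name are assumptions, not values taken from the paper.

```python
import math
from collections import Counter

def sentence_generative_log_prob(document_terms, sentence_terms, background_model, alpha=0.5):
    """log P(D | Sj) under a smoothed unigram ("bag-of-words") sentence model.

    Each document word is scored by a mixture of the sentence's maximum-likelihood
    unigram estimate and a background unigram model estimated from a general
    collection; alpha is an assumed interpolation weight.
    """
    sentence_counts = Counter(sentence_terms)
    sentence_length = max(len(sentence_terms), 1)
    log_prob = 0.0
    for word, count in Counter(document_terms).items():
        p_sentence = sentence_counts[word] / sentence_length
        p_background = background_model.get(word, 1e-9)
        log_prob += count * math.log(alpha * p_sentence + (1.0 - alpha) * p_background)
    return log_prob
```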

CRF

Appears in 6 sentences as: CRF (6)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. To this end, several popular machine-learning methods could be utilized, such as the Bayesian classifier (BC) (Kupiec et al., 1999), Gaussian mixture model (GMM) (Fattah and Ren, 2009), hidden Markov model (HMM) (Conroy and O'Leary, 2001), support vector machine (SVM) (Kolcz et al., 2001), maximum entropy (ME) (Ferrier, 2001), and conditional random field (CRF) (Galley, 2006; Shen et al., 2007), to name a few.
    Page 2, “Background”
  2. Although such supervised summarizers are effective, most of them (except CRF) usually implicitly assume that sentences are independent of each other (the so-called “bag-of-sentences” assumption) and classify each sentence individually without leveraging the relationship among the sentences (Shen et al., 2007).
    Page 2, “Background”
  3. In the final set of experiments, we compare our proposed summarization methods with a few existing summarization methods that have been widely used in various summarization tasks, including LEAD, VSM, LexRank and CRF; the corresponding results are shown in Table 5.
    Page 8, “Experimental results and discussions 6.1 Baseline experiments”
  4. To our surprise, CRF does not provide superior results as compared to the other summarization methods.
    Page 8, “Experimental results and discussions 6.1 Baseline experiments”
  5. One possible explanation is that the structural evidence of the spoken documents in the test set is not strong enough for CRF to show its advantage of modeling the local structural information among sentences.
    Page 8, “Experimental results and discussions 6.1 Baseline experiments”
  6. [Fragment of the results table] TD 0.431 0.315 0.383 CRF
    Page 8, “Experimental results and discussions 6.1 Baseline experiments”

generative model

Appears in 5 sentences as: generative model (5)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. There are many ways to construct the three component models mentioned above, i.e., the sentence generative model P(D|Sj), the sentence prior model P(Sj), and the loss function L(Si,Sj).
    Page 4, “Proposed Methods”
  2. 4.1 Sentence generative model
    Page 4, “Proposed Methods”
  3. In the LM approach, each sentence in a document can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009).
    Page 4, “Proposed Methods”
  4. Interested readers may refer to (Zhai, 2008; Chen et al., 2009) for a thorough discussion on various ways to construct the sentence generative model.
    Page 4, “Proposed Methods”
  5. Once the sentence generative model P(D|Sj), the sentence prior model P(Sj) and the loss function L(Si,Sj) have been properly estimated, the summary sentences can be selected iteratively by (8) according to a predefined target summarization ratio.
    Page 5, “Proposed Methods”

unigram

Appears in 4 sentences as: unigram (4)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. In the LM approach, each sentence in a document can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called “bag-of-words” assumption) for generating the document (Chen et al., 2009).
    Page 4, “Proposed Methods”
  2. To mitigate this potential defect, a unigram probability estimated from a general collection, which models the general distribution of words in the target language, is often used to smooth the sentence model.
    Page 4, “Proposed Methods”
  3. They are, respectively, the ROUGE-1 (unigram) measure, the ROUGE-2 (bigram) measure and the ROUGE-L (longest common subsequence) measure (Lin, 2004).
    Page 6, “Experimental setup 5.1 Data”
  4. Another is that BC utilizes a rich set of features to characterize a given spoken sentence while LM is constructed solely on the basis of the lexical (unigram) information.
    Page 7, “Experimental results and discussions 6.1 Baseline experiments”

iteratively

Appears in 3 sentences as: iteratively (3)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. sentences of a given document can be iteratively chosen (i.e., one at each iteration) from the document until the aggregated summary reaches a predefined target summarization ratio.
    Page 4, “A risk minimization framework for extractive summarization”
  2. Once the sentence generative model P(D|Sj), the sentence prior model P(Sj) and the loss function L(Si,Sj) have been properly estimated, the summary sentences can be selected iteratively by (8) according to a predefined target summarization ratio. (A greedy-selection sketch appears after this list.)
    Page 5, “Proposed Methods”
  3. To alleviate this problem, the concept of maximum marginal relevance (MMR) (Carbonell and Goldstein, 1998), which performs sentence selection iteratively by striking the balance between topic relevance and coverage, can be incorporated into the loss function:
    Page 5, “Proposed Methods”
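
The three mentions above describe one-at-a-time selection up to a target summarization ratio, with an MMR-style loss balancing relevance against redundancy. The Python below is a hedged greedy sketch of that loop; the scoring interpolation, the weight lam, and the function names are illustrative assumptions rather than the paper's exact formulation.

```python
def select_summary(sentence_ids, relevance, pairwise_loss, target_ratio, lam=0.5):
    """Greedy, MMR-flavoured sentence selection.

    sentence_ids  : candidate sentence identifiers
    relevance[i]  : relevance of sentence i to the document (e.g., P(D|Si) * P(Si))
    pairwise_loss : function (i, j) -> loss, e.g., 1 - Sim(Si, Sj)
    target_ratio  : fraction of the document's sentences to keep
    lam           : assumed interpolation weight between relevance and non-redundancy
    """
    target = max(1, int(round(target_ratio * len(sentence_ids))))
    selected, remaining = [], list(sentence_ids)
    while remaining and len(selected) < target:
        def score(i):
            # Reward relevance; also reward being dissimilar (high loss) to the
            # sentences already in the summary, i.e., penalize redundancy.
            non_redundancy = min((pairwise_loss(i, j) for j in selected), default=0.0)
            return lam * relevance[i] + (1.0 - lam) * non_redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```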

structural features

Appears in 3 sentences as: Structural features (1), structural features (2)
In A Risk Minimization Framework for Extractive Speech Summarization
  1. A spoken sentence Si is characterized by a set of T indicative features Xi = {xi1, ..., xiT}, and they may include lexical features (Koumpis and Renals, 2000), structural features (Maskey and Hirschberg, 2003), acoustic features (Inoue et al., 2004), discourse features (Zhang et al., 2007) and relevance features (Lin et al., 2009).
    Page 2, “Background”
  2. Structural features
    Page 6, “Experimental setup 5.1 Data”
  3. The input to BC consists of a set of 28 indicative features used to characterize a spoken sentence, including the structural features, the lexical features, the acoustic features and the relevance feature. (A grouped-feature sketch appears after this list.)
    Page 6, “Experimental setup 5.1 Data”
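
The excerpts say only that the BC summarizer receives 28 indicative features per spoken sentence, grouped into structural, lexical, acoustic and relevance features. The dataclass below merely sketches such a grouped container for feeding a classifier; the individual feature names in the comments are hypothetical placeholders, since the excerpts do not enumerate them.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SpokenSentenceFeatures:
    """Grouped feature vector for one spoken sentence (feature names are hypothetical)."""
    structural: Dict[str, float] = field(default_factory=dict)  # e.g., sentence position
    lexical: Dict[str, float] = field(default_factory=dict)     # e.g., named-entity counts
    acoustic: Dict[str, float] = field(default_factory=dict)    # e.g., pitch / energy statistics
    relevance: Dict[str, float] = field(default_factory=dict)   # e.g., similarity to the document

    def as_vector(self) -> List[float]:
        """Flatten the grouped features into one numeric vector for a classifier such as BC."""
        vector: List[float] = []
        for group in (self.structural, self.lexical, self.acoustic, self.relevance):
            vector.extend(group[name] for name in sorted(group))
        return vector
```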
