Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
Cheung, Jackie Chi Kit and Penn, Gerald

Article Structure

Abstract

In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text.

Introduction

In automatic summarization, centrality has been one of the guiding principles for content selection in extractive systems.

Related Work

Domain-dependent template-based summarization systems have been an alternative to extractive systems which make use of rich knowledge about a domain and information extraction techniques to generate a summary, possibly using a natural language generation system (Radev and McKeown, 1998; White et al., 2001; McKeown et al., 2002).

Theoretical basis of our analysis

Many existing summarization evaluation methods rely on word or N-gram overlap measures, but these measures are not appropriate for our analysis.

Experiments

We conducted our experiments on the data and results of the TAC 2010 summarization workshop.

Conclusion

We have presented a series of studies to distinguish human-written informative summaries from the summaries produced by current systems.

Topics

in-domain

Appears in 11 sentences as: in-domain (11)
In Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
  1. Third, we consider how domain knowledge may be useful as a resource for an abstractive system, by showing that key parts of model summaries can be reconstructed from the source plus related in-domain documents.
    Page 2, “Introduction”
  2. They were originally proposed in the context of single-document summarization, where they were calculated using in-domain (relevant) vs. out-of-domain (irrelevant) text.
    Page 2, “Related Work”
  3. In multi-document summarization, the in-domain text has been replaced by the source text cluster (Conroy et al., 2006), thus they are now
    Page 2, “Related Work”
  4. Traditional systems that perform semantic inference do so from a set of known facts about the domain in the form of a knowledge base, but as we have seen, most extractive summarization systems do not make much use of in-domain corpora.
    Page 8, “Experiments”
  5. We examine adding in-domain text to the source text to see how this would affect coverage.
    Page 8, “Experiments”
  6. As shown in Table 6, the effect of adding more in-domain text on caseframe coverage is substantial, and noticeably more than using out-of-domain text.
    Page 8, “Experiments”
  7. The implication of this result is that it may be possible to generate better summaries by mining in-domain text for relevant caseframes.
    Page 8, “Experiments”
  8. Table 6: The effect on caseframe coverage of adding in-domain and out-of-domain documents.
    Page 8, “Experiments”
  9. The difference between adding in-domain and out-of-domain text is significant, p < 10⁻³ (Study 3).
    Page 8, “Experiments”
  10. However, our results are also positive in that we find that nearly all model summary caseframes can be found in the source text together with some in-domain documents.
    Page 8, “Conclusion”
  11. Domain inference, on the other hand, and a greater use of in-domain documents as a knowledge source for domain inference, are very promising indeed.
    Page 8, “Conclusion”
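
The coverage study quoted above (Study 3) lends itself to a short illustration. The sketch below is not the authors' code: it assumes caseframes have already been extracted as (governor, role) string pairs, and the helper and variable names are hypothetical.

```python
# Sketch of a Study-3-style comparison: how much of a model summary's caseframe
# vocabulary is covered by the source cluster alone, versus the source cluster
# augmented with additional documents. Caseframes are assumed to be already
# extracted as (governor, role) string pairs.

from typing import Iterable, Set, Tuple

Caseframe = Tuple[str, str]  # e.g. ("kill", "dobj")


def caseframe_coverage(summary_cfs: Set[Caseframe],
                       available_cfs: Set[Caseframe]) -> float:
    """Proportion of summary caseframes that also occur in the available text."""
    if not summary_cfs:
        return 0.0
    return len(summary_cfs & available_cfs) / len(summary_cfs)


def coverage_with_extra_docs(summary_cfs: Set[Caseframe],
                             source_cfs: Set[Caseframe],
                             extra_doc_cfs: Iterable[Set[Caseframe]]) -> float:
    """Coverage after pooling the source cluster with extra documents."""
    pool = set(source_cfs)
    for cfs in extra_doc_cfs:
        pool |= cfs
    return caseframe_coverage(summary_cfs, pool)


# Usage with hypothetical, pre-extracted caseframe sets:
#   base    = caseframe_coverage(model_cfs, source_cfs)
#   in_dom  = coverage_with_extra_docs(model_cfs, source_cfs, in_domain_doc_cfs)
#   out_dom = coverage_with_extra_docs(model_cfs, source_cfs, out_of_domain_doc_cfs)
# A larger gain for in_dom than for out_dom mirrors the effect reported in Table 6.
```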


extractive systems

Appears in 8 sentences as: extractive system (2) extractive systems (6)
In Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
  1. In automatic summarization, centrality has been one of the guiding principles for content selection in extractive systems.
    Page 1, “Introduction”
  2. Domain-dependent template-based summarization systems have been an alternative to extractive systems which make use of rich knowledge about a domain and information extraction techniques to generate a summary, possibly using a natural language generation system (Radev and McKeown, 1998; White et al., 2001; McKeown et al., 2002).
    Page 2, “Related Work”
  3. Several studies complement this paper by examining the best possible extractive system using current evaluation measures, such as ROUGE (Lin and Hovy, 2003; Conroy et al., 2006).
    Page 2, “Related Work”
  4. They find that the best possible extractive systems score as highly as or higher than human summarizers, but it is unclear whether this means the oracle summaries are actually as useful as human ones in an extrinsic setting.
    Page 2, “Related Work”
  5. In our study, we compared the characteristics of summaries generated by the eight human summarizers with those generated by the peer summarizers, which are basically extractive systems.
    Page 4, “Experiments”
  6. Purely extractive systems would thus be expected to score 1.0, as would systems that perform text compression by remov-
    Page 5, “Experiments”
  7. Peer 2 shows a relatively high level of aggregation despite being an extractive system.
    Page 5, “Experiments”
  8. These results present a fundamental limit to extractive systems, and also to text simplification and sentence fusion methods based solely on the source text.
    Page 7, “Experiments”
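
The aggregation statistic referred to above, on which purely extractive output scores 1.0, can be illustrated with a rough sketch. The greedy set-cover formulation below is an assumption rather than the paper's exact procedure, and the caseframe sets are taken as given.

```python
# Sketch of a sentence-aggregation score: for each summary sentence, count how
# many source sentences are needed to cover its caseframes; a purely extractive
# sentence needs exactly one. The greedy set-cover formulation is an assumption,
# not the paper's published procedure.

from typing import List, Set, Tuple

Caseframe = Tuple[str, str]


def sentences_needed(summary_sent_cfs: Set[Caseframe],
                     source_sent_cfs: List[Set[Caseframe]]) -> int:
    """Greedy count of source sentences covering one summary sentence's caseframes."""
    # Only caseframes attested somewhere in the source can be covered at all.
    uncovered = {cf for cf in summary_sent_cfs
                 if any(cf in s for s in source_sent_cfs)}
    count = 0
    while uncovered:
        best = max(source_sent_cfs, key=lambda s: len(s & uncovered))
        gained = best & uncovered
        if not gained:
            break
        uncovered -= gained
        count += 1
    return count


def aggregation_score(summary_cfs_per_sent: List[Set[Caseframe]],
                      source_sent_cfs: List[Set[Caseframe]]) -> float:
    """Average number of source sentences per summary sentence; 1.0 if purely extractive."""
    counts = [sentences_needed(s, source_sent_cfs) for s in summary_cfs_per_sent]
    counts = [c for c in counts if c > 0]
    return sum(counts) / len(counts) if counts else 0.0
```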


statistically significant

Appears in 5 sentences as: statistical significance (1) statistically significant (2) statistically significantly (2)
In Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
  1. The model average is statistically significantly different from all the other conditions, p < 10⁻⁷ (Study 1).
    Page 5, “Experiments”
  2. The averages of the tested conditions are shown in Table 2, and are statistically significant.
    Page 5, “Experiments”
  3. The differences in density between the human average and the non-baseline conditions are highly statistically significant, according to paired two-tailed Wilcoxon signed-rank tests for the statistic calculated for each topic cluster.
    Page 6, “Experiments”
  4. The statistical significance results are unchanged.
    Page 7, “Experiments”
  5. The model average is statistically significantly different from all the other conditions, p < 10⁻⁸ (Study 3).
    Page 8, “Experiments”
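
The testing procedure quoted above is a paired, two-tailed Wilcoxon signed-rank test over statistics computed per topic cluster. A minimal SciPy sketch, assuming the per-cluster values (e.g., caseframe density) for two conditions have already been computed and are aligned by cluster:

```python
# Minimal sketch of the paired, two-tailed Wilcoxon signed-rank test described
# above, applied to per-topic-cluster statistics. Input lists must be aligned
# by topic cluster.

from scipy.stats import wilcoxon


def compare_conditions(human_scores, system_scores):
    """Paired two-tailed Wilcoxon signed-rank test over topic clusters."""
    assert len(human_scores) == len(system_scores), "scores must be paired by cluster"
    statistic, p_value = wilcoxon(human_scores, system_scores,
                                  alternative="two-sided")
    return statistic, p_value


# Usage with hypothetical per-cluster values:
#   stat, p = compare_conditions(human_density_per_cluster, peer_density_per_cluster)
#   print(f"W = {stat:.1f}, p = {p:.2e}")
```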


semantic role

Appears in 4 sentences as: semantic role (3) semantic roles (1)
In Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
  1. Caseframes are shallow approximations of semantic roles which are well suited to characterizing a domain by its slots.
    Page 2, “Introduction”
  2. A caseframe is a shallow approximation of the semantic role structure of a proposition-bearing unit like a verb, and are
    Page 3, “Theoretical basis of our analysis”
  3. In particular, they are (gov, role) pairs, where gov is a proposition-bearing element, and role is an approximation of a semantic role with gov as its head (See Figure 1 for examples).
    Page 3, “Theoretical basis of our analysis”
  4. Caseframes do not consider the dependents of the semantic role approximations.
    Page 3, “Theoretical basis of our analysis”
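
Given the definition quoted above, caseframe extraction reduces to reading (gov, role) pairs off a dependency parse. The sketch below uses spaCy as a stand-in parser; the relation inventory and label names are illustrative assumptions rather than the paper's exact extraction rules.

```python
# Sketch of caseframe extraction as (gov, role) pairs from a dependency parse,
# using spaCy. The set of relations kept here is an illustrative assumption;
# the dependents themselves are deliberately ignored, as in the definition
# quoted above.

import spacy

nlp = spacy.load("en_core_web_sm")

CORE_RELATIONS = {"nsubj", "nsubjpass", "dobj", "iobj", "prep"}


def extract_caseframes(text):
    """Return the set of (governor lemma, relation) pairs found in the text."""
    caseframes = set()
    for token in nlp(text):
        if token.dep_ in CORE_RELATIONS:
            # The governor is the head of the relation; the dependent's identity
            # is not recorded.
            caseframes.add((token.head.lemma_, token.dep_))
    return caseframes


# Example: "The rebels killed the hostages." yields pairs such as
# ("kill", "nsubj") and ("kill", "dobj") with this relation set.
```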
