A Critical Reassessment of Evaluation Baselines for Speech Summarization
Penn, Gerald and Zhu, Xiaodan

Article Structure

Abstract

We assess the current state of the art in speech summarization, by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus.

Problem definition and related literature

Speech is arguably the most basic, most natural form of human communication.

Setting of the experiment

2.1 Provenance of the data

Results and analysis

3.1 Lecture corpus

Future Work

In light of these results, the most important challenge for future work is to formulate an experimental alternative to measuring against a subjectively classified gold standard, in which annotators are forced to commit to relative salience judgements with no attention to goal orientation and no requirement to synthesize the meanings of larger units of structure into a coherent message.

Topics

POS tags

Appears in 3 sentences as: POS tags (3)
In A Critical Reassessment of Evaluation Baselines for Speech Summarization
  1. A decision tree (C4.5, Release 8) is used to detect false starts, trained on the POS tags and trigger-word status of the first and last four words of sentences from a training set.
    Page 4, “Setting of the experiment”
  2. For (both WH- and Yes/No) question identification, another C4.5 classifier was trained on 2,000 manually annotated sentences using utterance length, POS bigram occurrences, and the POS tags and trigger-word status of the first and last five words of an utterance.
    Page 5, “Setting of the experiment”
  3. Taking ASR transcripts as input, we use the Brill tagger (Brill, 1995) to assign POS tags to each word.
    Page 5, “Setting of the experiment”
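The feature sets described in the three sentences above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `TRIGGER_WORDS` set, and the `PAD` filler are hypothetical, and the input is assumed to be already POS-tagged (for example by a Brill-style tagger). The resulting feature dictionaries would then be passed to a decision-tree learner, which is not shown here.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger words; the source does not list the ones actually used.
TRIGGER_WORDS = {"what", "who", "where", "when", "why", "how", "do", "does", "is", "are"}
# Hypothetical filler for utterances shorter than k words.
PAD = ("<pad>", "PAD")


def edge_features(tagged: List[Tuple[str, str]], k: int) -> Dict[str, object]:
    """POS tag and trigger-word status of the first and last k words (k >= 1).

    `tagged` is a list of (word, POS tag) pairs.
    """
    first = (tagged[:k] + [PAD] * k)[:k]      # right-pad short utterances
    last = ([PAD] * k + tagged[-k:])[-k:]     # left-pad short utterances
    feats: Dict[str, object] = {}
    for i, (word, tag) in enumerate(first):
        feats[f"first{i}_pos"] = tag
        feats[f"first{i}_trigger"] = word.lower() in TRIGGER_WORDS
    for i, (word, tag) in enumerate(last):
        feats[f"last{i}_pos"] = tag
        feats[f"last{i}_trigger"] = word.lower() in TRIGGER_WORDS
    return feats


def question_features(tagged: List[Tuple[str, str]], k: int = 5) -> Dict[str, object]:
    """Edge features plus utterance length and POS bigram counts."""
    feats = edge_features(tagged, k)
    feats["length"] = len(tagged)
    tags = [tag for _, tag in tagged]
    for a, b in zip(tags, tags[1:]):
        key = f"bigram_{a}_{b}"
        feats[key] = feats.get(key, 0) + 1
    return feats


utterance = [("do", "VBP"), ("you", "PRP"), ("agree", "VB")]
features = question_features(utterance, k=5)
```

Under these assumptions, false-start detection would call `edge_features(tagged, 4)` and question identification would call `question_features(tagged, 5)`, matching the window sizes quoted above.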

See all papers in Proc. ACL 2008 that mention POS tags.

state of the art

Appears in 3 sentences as: state of the art (3)
In A Critical Reassessment of Evaluation Baselines for Speech Summarization
  1. We assess the current state of the art in speech summarization, by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus.
    Page 1, “Abstract”
  2. The purpose of this paper is not so much to introduce a new way of summarizing speech, as to critically reappraise how well the current state of the art really works.
    Page 2, “Problem definition and related literature”
  3. These four results provide us with valuable insight into the current state of the art in speech summarization: it is not summarization, the aspiration to measure the relative merits of knowledge sources has masked the prominence of some very simple baselines, and the Zechner & Waibel pipe-ASR-output-into-text-summarizer model is still very competitive; what seems to matter more than having access to the raw spoken data is simply knowing that it is spoken data, so that the most relevant, still textually available features can be used.
    Page 3, “Problem definition and related literature”

statistically significant

Appears in 3 sentences as: statistical significance (1) statistically significant (2)
In A Critical Reassessment of Evaluation Baselines for Speech Summarization
  1. The best performance is achieved by using all of the features together, but the length baseline, which uses only those features in bold type from Figure 3, is very close (no statistically significant difference), as is MMR.
    Page 6, “Results and analysis”
  2. The difference with respect to either of these baselines is statistically significant within the popular 10-30% compression range, as is the classifier trained on all features but acoustic
    Page 6, “Results and analysis”
  3. It is entirely possible that, within this protocol, the baselines that have performed so well in our experiments, such as length or, in read news, position, will utterly fail, and that less traditional acoustic or spoken language features will genuinely, and with statistical significance, add value to a purely transcript-based text summarization system.
    Page 8, “Future Work”
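The excerpts above do not say which significance test was used. As an illustration only, one common way to compare two summarizers scored on the same items is a paired randomization (sign-flip) test. The sketch below assumes per-item scores are available for both systems; the function name, the number of trials, and the example scores are all hypothetical.

```python
import random
from typing import Sequence


def paired_randomization_test(
    a: Sequence[float], b: Sequence[float], trials: int = 10000, seed: int = 0
) -> float:
    """Two-sided p-value for the difference in total score between paired systems.

    Under the null hypothesis the two systems are interchangeable, so the sign
    of each per-item difference can be flipped at random.
    """
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for _ in range(trials):
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (trials + 1)


# Made-up per-item scores, for demonstration only.
same = paired_randomization_test([0.5, 0.6], [0.5, 0.6])
different = paired_randomization_test([0.9] * 12, [0.1] * 12)
```

With identical scores every shuffled total ties the observed one, so the p-value is 1.0; with a large, consistent gap across items the p-value becomes small.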
