Can You Repeat That? Using Word Repetition to Improve Spoken Term Detection
Wintrode, Jonathan and Khudanpur, Sanjeev

Article Structure

Abstract

We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models.

Introduction

The spoken term detection task arises as a key subtask in applying NLP techniques to spoken content.

Motivation

We seek a workable definition of broad document context beyond N-gram models that will improve term detection performance on an arbitrary set of queries.

Term and Document Frequency Statistics

To this point we have assumed an implicit property of low-frequency words which Church and Gale state concisely in their 1999 study of inverse document frequency:

Term Detection Re-scoring

We summarize our re-scoring of repeated words with the observation: given a correct detection, the likelihood of additional terms in the same document should increase.

Results

Now that we have tested word repetition-based re-scoring on a small Tagalog development set, we want to know if our approach, and particularly our α estimate, is sufficiently robust to apply broadly.

Conclusions

Leveraging the burstiness of content words, we have developed a simple technique to consistently boost term detection performance across languages.

Topics

co-occurrence

Appears in 10 sentences as: co-occurrence (10)
  1. Instead of taking a broad view of topic context in spoken documents, variability of word co-occurrence statistics across corpora leads us to focus instead on the phenomenon of word repetition within single documents.
    Page 1, “Abstract”
  2. In order to arrive at our eventual solution, we take the BABEL Tagalog corpus and analyze word co-occurrence and repetition statistics in detail.
    Page 2, “Introduction”
  3. Our observation of the variability in co-occurrence statistics between Tagalog training and development partitions leads us to narrow the scope of document context to same-word co-occurrences, i.e. word repetition.
    Page 2, “Introduction”
  4. Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
    Page 2, “Motivation”
  5. The difficulty in this approach arises from the variability in word co-occurrence statistics.
    Page 2, “Motivation”
  6. Unfortunately, estimates of co-occurrence from small corpora are not very consistent, and often over- or underestimate the co-occurrence probabilities needed for term detection.
    Page 2, “Motivation”
  7. Figure 1: Correlation between the co-occurrence counts in the training and held-out sets for a fixed keyword (term) and all its “context” words.
    Page 2, “Motivation”
  8. However, if the goal is to help a speech retrieval system detect content-rich (and presumably infrequent) keywords, then using word co-occurrence information (i.e.
    Page 3, “Motivation”
  9. In light of this finding, we will restrict the type of context we use for term detection to the co-occurrence of the term itself elsewhere within the document.
    Page 3, “Motivation”
  10. As with word co-occurrence, we consider if estimates of P_adapt(w) from training data are consistent when estimated on development data.
    Page 7, “Term Detection Re-scoring”
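
To make the consistency check in sentences 7 and 10 above concrete, the sketch below counts, for a fixed keyword, the number of documents each context word shares with it in two corpus partitions and computes the correlation between the two sets of counts (cf. Figure 1). This is our illustration, not code from the paper; all function and variable names are hypothetical.

```python
from collections import Counter

def cooccurrence_counts(docs, keyword):
    """For each context word, count the documents in which it
    co-occurs with `keyword` (document-level co-occurrence)."""
    counts = Counter()
    for doc in docs:                      # doc: list of word tokens
        types = set(doc)
        if keyword in types:
            counts.update(types - {keyword})
    return counts

def cooccurrence_correlation(train_docs, dev_docs, keyword):
    """Pearson correlation between training and held-out co-occurrence
    counts for the context words of `keyword` (cf. Figure 1)."""
    train = cooccurrence_counts(train_docs, keyword)
    dev = cooccurrence_counts(dev_docs, keyword)
    vocab = sorted(set(train) | set(dev))
    if not vocab:
        return 0.0
    xs = [train[w] for w in vocab]
    ys = [dev[w] for w in vocab]
    n = len(vocab)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```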

language models

Appears in 10 sentences as: language model (1) language models (9)
  1. We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models.
    Page 1, “Abstract”
  2. ASR systems traditionally use N-gram language models to incorporate prior knowledge of word occurrence patterns into prediction of the next word in the token stream.
    Page 1, “Introduction”
  3. Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
    Page 1, “Introduction”
  4. The strength of this phenomenon suggests it may be more viable for improving term detection than, say, topic-sensitive language models.
    Page 2, “Introduction”
  5. The re-scoring approach we present is closely related to adaptive or cache language models (Jelinek, 1997; Kuhn and De Mori, 1990; Kneser and Steinbiss, 1993).
    Page 3, “Motivation”
  6. The primary difference between this and previous work on similar language models is the narrower focus here on the term detection task, in which we consider each search term in isolation, rather than all words in the vocabulary.
    Page 3, “Motivation”
  7. A similar phenomenon is observed concerning adaptive language models (Church, 2000).
    Page 5, “Term and Document Frequency Statistics”
  8. In general, we can think of using word repetitions to re-score term detection as applying a limited form of adaptive or cache language model (Jelinek, 1997).
    Page 5, “Term and Document Frequency Statistics”
  9. In applying the burstiness quantity to term detection, we recall that the task requires us to locate a particular instance of a term, not estimate a count, hence the utility of N-gram language models predicting words in sequence.
    Page 5, “Term and Document Frequency Statistics”
  10. We train ASR acoustic and language models from the training corpus using the Kaldi speech recognition toolkit (Povey et al., 2011) following the default BABEL training and search recipe which is described in detail by Chen et al.
    Page 8, “Results”

N-gram

Appears in 9 sentences as: N-gram (9)
  1. We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models.
    Page 1, “Abstract”
  2. ASR systems traditionally use N-gram language models to incorporate prior knowledge of word occurrence patterns into prediction of the next word in the token stream.
    Page 1, “Introduction”
  3. N-gram models cannot, however, capture complex linguistic or topical phenomena that occur outside the typical 3-5 word scope of the model.
    Page 1, “Introduction”
  4. Confidence scores from an ASR system (which incorporate N-gram probabilities) are optimized to produce the most likely sequence of words, rather than to maximize the accuracy of individual word detections.
    Page 1, “Introduction”
  5. Looking at broader document context within a more limited task might allow us to escape the limits of N-gram performance.
    Page 1, “Introduction”
  6. We seek a workable definition of broad document context beyond N-gram models that will improve term detection performance on an arbitrary set of queries.
    Page 2, “Motivation”
  7. A number of efforts have been made to augment traditional N-gram models with latent topic information (Khudanpur and Wu, 1999; Florian and Yarowsky, 1999; Liu and Liu, 2008; Hsu and Glass, 2006; Naptali et al., 2012) including some of the early work on Probabilistic Latent Semantic Analysis by Hofmann (2001).
    Page 3, “Motivation”
  8. In applying the burstiness quantity to term detection, we recall that the task requires us to locate a particular instance of a term, not estimate a count, hence the utility of N-gram language models predicting words in sequence.
    Page 5, “Term and Document Frequency Statistics”
  9. Using word repetitions, we effectively use a broad document context outside of the typical 2-5 N-gram window.
    Page 9, “Conclusions”

unigram

Appears in 7 sentences as: Unigram (1) unigram (6) unigrams (1)
  1. Figure 4: Difference between observed and predicted IDF_w for Tagalog unigrams.
    Page 5, “Term and Document Frequency Statistics”
  2. 3.1 Unigram Probabilities
    Page 5, “Term and Document Frequency Statistics”
  3. We encounter the burstiness property of words again by looking at unigram occurrence probabilities.
    Page 5, “Term and Document Frequency Statistics”
  4. We compare the unconditional unigram probability (the probability that a given word token is w) with the conditional unigram probability, given the term has occurred once in the document.
    Page 5, “Term and Document Frequency Statistics”
  5. Figure 6: Difference between conditional and unconditional unigram probabilities for Tagalog
    Page 6, “Term and Document Frequency Statistics”
  6. $P(w \mid c_w > 0) = \frac{\sum_{D :\, w \in D} c_w(D)}{\sum_{D :\, w \in D} |D|}$ (4). Figure 6 shows the difference between conditional and unconditional unigram probabilities.
    Page 6, “Term and Document Frequency Statistics”
  7. Given that adaptation values are roughly an order of magnitude higher than the conditional unigram probabilities, in the next two sections we describe how we use adaptation to boost term detection scores.
    Page 6, “Term and Document Frequency Statistics”
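
The quantities discussed in sentences 3 through 7 can be computed directly from a tokenized document collection. The sketch below follows our reading of the definitions: the conditional probability restricts counting to documents that contain w (our reconstruction of Equation 4), and the adaptation probability follows Church (2000) as the chance that a word recurs given that it has occurred at least once. All names are hypothetical.

```python
def unigram_stats(docs, w):
    """Unconditional and conditional unigram probabilities of w, plus
    the adaptation probability; `docs` is a list of token lists."""
    total_tokens = sum(len(d) for d in docs)
    docs_with_w = [d for d in docs if w in d]
    df1 = len(docs_with_w)                                # docs with >= 1 occurrence
    df2 = sum(1 for d in docs_with_w if d.count(w) >= 2)  # docs with >= 2 occurrences

    # Unconditional: mass of w over the whole collection.
    p_uncond = sum(d.count(w) for d in docs) / total_tokens
    # Conditional: mass of w within documents where it occurs at least once
    # (our reconstruction of Equation 4).
    p_cond = (sum(d.count(w) for d in docs_with_w) /
              sum(len(d) for d in docs_with_w)) if docs_with_w else 0.0
    # Adaptation, following Church (2000): Pr(w recurs | w occurred) ~ df2/df1.
    p_adapt = df2 / df1 if df1 else 0.0
    return p_uncond, p_cond, p_adapt
```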

topic models

Appears in 5 sentences as: topic model (1) topic modeling (1) topic models (3)
  1. Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
    Page 2, “Motivation”
  2. Indeed there is work in the literature that shows that various topic models, latent or otherwise, can be useful for improving language models.
    Page 2, “Motivation”
  3. In the information retrieval community, clustering and latent topic models have yielded improvements over traditional vector space models.
    Page 3, “Motivation”
  4. information retrieval, and again, interpolate latent topic models with N-grams to improve retrieval performance.
    Page 4, “Motivation”
  5. In these efforts, the topic model information was helpful in boosting retrieval performance above the baseline vector space or N-gram models.
    Page 4, “Motivation”

confidence score

Appears in 3 sentences as: confidence score (2) Confidence scores (1)
  1. Confidence scores from an ASR system (which incorporate N-gram probabilities) are optimized to produce the most likely sequence of words, rather than to maximize the accuracy of individual word detections.
    Page 1, “Introduction”
  2. For each term t and document d we propose interpolating the ASR confidence score for a particular detection $t_d$ with the top-scoring hit in d, which we'll call $\hat{t}_d$.
    Page 6, “Term Detection Re-scoring”
  3. However, to verify that this approach is worth pursuing, we sweep a range of small α values, on the assumption that we still want to rely mostly on the ASR confidence score for term detection.
    Page 6, “Term Detection Re-scoring”
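
Sentences 2 and 3 above describe the re-scoring rule itself. The sketch below shows one plausible reading: interpolate each detection's confidence with the top-scoring hit for the same term in the same document, then sweep a few small α values. The exact interpolation in the paper may differ; all names here are hypothetical.

```python
def rescore(detections, alpha):
    """Interpolate each detection's ASR confidence with the top-scoring
    hit for the same (term, document) pair, i.e. the paper's t-hat_d.

    `detections`: list of (term, doc_id, score) tuples, score in [0, 1].
    """
    best = {}
    for term, doc, score in detections:
        key = (term, doc)
        best[key] = max(best.get(key, 0.0), score)
    return [(term, doc, (1 - alpha) * score + alpha * best[(term, doc)])
            for term, doc, score in detections]

# Toy example: the weaker second hit in doc1 is pulled toward the stronger one.
hits = [("bukas", "doc1", 0.80), ("bukas", "doc1", 0.35), ("bukas", "doc2", 0.50)]
for alpha in (0.05, 0.10, 0.15, 0.20):   # keep most weight on ASR confidence
    print(alpha, rescore(hits, alpha))
```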

contextual information

Appears in 3 sentences as: context information (1) contextual information (2)
  1. We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models.
    Page 1, “Abstract”
  2. We will show that by focusing on contextual information in the form of word repetition within documents, we obtain consistent improvement across five languages in the so-called Base Phase of the IARPA BABEL program.
    Page 1, “Introduction”
  3. Clearly topic or context information is relevant to a retrieval-type task, but we need a stable, consistent framework in which to apply it.
    Page 4, “Motivation”

N-grams

Appears in 3 sentences as: N-grams (3)
  1. Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
    Page 1, “Introduction”
  2. Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
    Page 2, “Motivation”
  3. information retrieval, and again, interpolate latent topic models with N-grams to improve retrieval performance.
    Page 4, “Motivation”

NIST

Appears in 3 sentences as: NIST (5)
  1. The BABEL task is modeled on the 2006 NIST Spoken Term Detection evaluation (NIST, 2006) but focuses on limited resource conditions.
    Page 2, “Introduction”
  2. The primary metric for the BABEL program, Actual Term Weighted Value (ATWV), is defined by NIST using a cost function of the false alarm probability P(FA) and the miss probability P(Miss), averaged over a set of queries (NIST, 2006).
    Page 7, “Term Detection Re-scoring”
  3. At our disposal, we have the five BABEL languages — Tagalog, Cantonese, Pashto, Turkish and Vietnamese — as well as the development data from the NIST 2006 English evaluation.
    Page 8, “Results”
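
For reference, ATWV as defined for the 2006 NIST evaluation is one minus the average, over query terms, of P(Miss) plus β·P(FA), with β = 999.9 and each second of searched speech treated as a potential false-alarm trial (NIST, 2006). The sketch below is our paraphrase of that definition; names are hypothetical.

```python
def atwv(results, speech_seconds, beta=999.9):
    """Actual Term Weighted Value.

    `results`: dict mapping term -> (n_correct, n_false_alarm, n_true),
    where n_true is the reference occurrence count of the term.
    `speech_seconds`: total duration of the searched speech; each second
    is treated as one potential false-alarm trial.
    """
    values = []
    for n_correct, n_fa, n_true in results.values():
        if n_true == 0:
            continue                      # terms absent from the reference are skipped
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_fa / (speech_seconds - n_true)
        values.append(1.0 - (p_miss + beta * p_fa))
    return sum(values) / len(values) if values else 0.0
```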
