Discriminative state tracking for spoken dialog systems
Metallinou, Angeliki and Bohus, Dan and Williams, Jason

Article Structure

Abstract

In spoken dialog systems, statistical state tracking aims to improve robustness to speech recognition errors by tracking a posterior distribution over hidden dialog states.

Introduction

Spoken dialog systems interact with users via natural language to help them achieve a goal.

Data and experimental design

We use data from the public deployment of two systems in the Spoken Dialog Challenge (Black et al.

Generative state tracking

Generative state tracking approaches leverage models that describe how SLU results are generated from a hidden dialog state, denoted g. The user’s true (unobserved) action u is conditioned on g and the system action a via a user action model P(u|g, a); the observed SLU result ũ is in turn conditioned on u via a model of how SLU results are generated, P(ũ|u). Given a prior distribution b(g) and a result ũ, an updated distribution b'(g) can be computed by summing over all hidden user actions u: b'(g) ∝ Σ_u P(ũ|u) P(u|g, a) b(g).
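The generative update described above can be sketched in a few lines. This is a minimal illustration, not the systems' implementation: the model tables are toy lambdas, and all names are hypothetical.

```python
# Sketch of the generative belief update: b'(g) is proportional to the sum
# over hidden user actions u of P(u_tilde|u) * P(u|g, a) * b(g), where
# u_tilde is the observed SLU result. Illustrative only; models are toys.

def generative_update(prior, slu_model, user_action_model, system_action,
                      u_tilde, user_actions):
    """Return the normalized updated belief b'(g) over dialog states g."""
    posterior = {}
    for g, b_g in prior.items():
        posterior[g] = sum(
            slu_model(u_tilde, u) * user_action_model(u, g, system_action) * b_g
            for u in user_actions
        )
    z = sum(posterior.values())
    # Guard against an all-zero posterior (degenerate models).
    return {g: p / z for g, p in posterior.items()} if z > 0 else dict(prior)
```

With a toy SLU confusion model (correct with probability 0.8) and a toy user action model (user states their goal with probability 0.9), observing "61D" shifts belief toward the 61D state even from a prior that favors 61C.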

Topics

confidence score

Appears in 9 sentences as: confidence score (5) confidence scores (3) confidence scoring (1)
In Discriminative state tracking for spoken dialog systems
  1. For dialog state tracking, most commercial systems use handcrafted heuristics, selecting the SLU result with the highest confidence score, and discarding alternatives.
    Page 2, “Introduction”
  2. The two systems differed in acoustic models, confidence scoring model, state tracking method and parameters, number of supported routes (8 vs 40, for DS1 and DS2 respectively), presence of minor bugs, and user population.
    Page 2, “Data and experimental design”
  3. As a baseline, we construct a handcrafted state tracking rule that follows a strategy common in commercial systems: it returns the SLU result with the maximum confidence score, ignoring all other hypotheses.
    Page 4, “Data and experimental design”
  4. For example, if the user says “no” to an explicit confirmation or “go back” to an implicit confirmation, they are asked the same question again, which gives an opportunity for a higher confidence score.
    Page 4, “Data and experimental design”
  5. The performance of this baseline (BASELINE in Table 3) is relatively strong because the top SLU result is by far the most likely to be correct, and because the confidence score was already trained with slot-specific speech data (Williams and Balakrishnan (2009), Williams (2012)).
    Page 4, “Data and experimental design”
  6. Base features consider information about the current turn, including rank of the current SLU result (current hypothesis), the SLU result confidence score(s) in the current N-best list, the difference between the current hypothesis score and the best hypothesis score in the current N-best list, etc.
    Page 5, “Generative state tracking”
  7. Those include the number of times an SLU result has been observed before, the number of times an SLU result has been observed before at a specific rank such as rank 1, the sum and average of confidence scores of SLU results across all past recognitions, the number of possible past user negations or confirmations of the current SLU result, etc.
    Page 5, “Generative state tracking”
  8. For example, from the current turn, we use the number of distinct SLU results, the entropy of the confidence scores, the best path score of the word confusion network, etc.
    Page 5, “Generative state tracking”
  9. We also include features that contain aggregate information about the sequence of all N-best lists up to the current turn, such as the mean and variance of N-best list lengths, the number of distinct SLU results observed so far, the entropy of their corresponding confidence scores, and others.
    Page 5, “Generative state tracking”
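The confidence-score features quoted above (distinct-hypothesis counts, score entropy, how often a hypothesis was seen before, and its mean past confidence) can be sketched as a small extractor. All function and key names here are illustrative assumptions, not the authors' code.

```python
import math

def confidence_features(nbest, history):
    """Turn-level and history features over SLU confidence scores (names hypothetical).

    nbest:   list of (hypothesis, confidence) for the current turn, best first.
    history: list of past N-best lists in the same format.
    """
    scores = [s for _, s in nbest]
    z = sum(scores) or 1.0
    probs = [s / z for s in scores]
    feats = {
        "n_distinct": len({h for h, _ in nbest}),
        "top_score": scores[0],
        # Entropy of the (normalized) confidence scores in the current list.
        "score_entropy": -sum(p * math.log(p) for p in probs if p > 0),
    }
    # History features: occurrences of the current top hypothesis in past
    # recognitions, and the mean confidence it received there.
    seen = [s for past in history for h, s in past if h == nbest[0][0]]
    feats["times_seen_before"] = len(seen)
    feats["mean_past_score"] = sum(seen) / len(seen) if seen else 0.0
    return feats
```

A feature vector like this would be computed per dialog state hypothesis and fed to the discriminative tracker.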

See all papers in Proc. ACL 2013 that mention confidence score.


feature set

Appears in 7 sentences as: feature set (5) feature sets (2)
In Discriminative state tracking for spoken dialog systems
  1. importance of different feature sets for this task, and measure the amount of data required to reliably train our model.
    Page 2, “Introduction”
  2. In DISCDYN1, we use the original feature set, ignoring the problem described above (so that the general features contribute no information), resulting in M + K weights.
    Page 7, “Generative state tracking”
  3. The analysis of various feature sets indicates that the ASR/SLU error correlation (confusion) features yield the largest improvement — c.f.
    Page 8, “Generative state tracking”
  4. feature set bc compared to b in Table 3.
    Page 8, “Generative state tracking”
  5. While Table 3 shows performance measures averaged across all turns, Table 4 breaks down performance measures by slot, using the full feature set bch and the realistic MISMATCH dataset.
    Page 8, “Generative state tracking”
  6. Table 4: Performance per slot on dataset MISMATCH using the full feature set bch.
    Page 9, “Generative state tracking”
  7. Results show that the proposed model exceeds both the accuracy and probability quality of all baselines when using the richest feature set, which includes information about common ASR confusions and dialog history.
    Page 9, “Generative state tracking”


maximum entropy

Appears in 7 sentences as: Maximum entropy (1) maximum entropy (6)
In Discriminative state tracking for spoken dialog systems
  1. In this work we will use maximum entropy models.
    Page 4, “Generative state tracking”
  2. 4.1 Maximum entropy models
    Page 5, “Generative state tracking”
  3. The maximum entropy framework (Berger et al.
    Page 5, “Generative state tracking”
  4. Although in practice, maximum entropy model constraints render weights for one class redundant.
    Page 6, “Generative state tracking”
  5. Figure 2: The DISCFIXED model is a traditional maximum entropy model for classification.
    Page 6, “Generative state tracking”
  6. However, unlike in traditional maximum entropy models such as the fixed-position model above, these feature functions are dynamically defined when presented with each turn.
    Page 7, “Generative state tracking”
  7. With feature functions defined this way, standard maximum entropy optimization is then applied to learn the corresponding set of M + K weights, denoted λ. Fig.
    Page 7, “Generative state tracking”
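The exponential form that runs through these excerpts, p(y|x) ∝ exp(Σ_i λ_i φ_i(x, y)), can be sketched for the fixed-label case. Function and variable names are assumptions; the dynamic per-turn feature definition the paper describes would redefine `feature_fns` for each turn's hypothesis set while sharing the same weights.

```python
import math

def maxent_prob(x, labels, feature_fns, weights):
    """Standard maximum entropy model: p(y|x) = exp(sum_i w_i * phi_i(x, y)) / Z(x).

    feature_fns: list of functions phi_i(x, y) jointly defined on features and labels.
    weights:     corresponding parameters lambda_i.
    """
    scores = {
        y: math.exp(sum(w * phi(x, y) for w, phi in zip(weights, feature_fns)))
        for y in labels
    }
    z = sum(scores.values())  # partition function Z(x)
    return {y: s / z for y, s in scores.items()}
```

With a single feature that fires only for one label, this reduces to a logistic sigmoid over that feature's weighted value.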


probability distribution

Appears in 5 sentences as: probability distribution (5)
In Discriminative state tracking for spoken dialog systems
  1. The task is to assign a probability distribution over the G dialog state hypotheses, plus a meta-hypothesis which indicates that none of the G hypotheses is correct.
    Page 1, “Introduction”
  2. Also note that the dialog state tracker is not predicting the contents of the dialog state hypotheses; the dialog state hypotheses contents are given by some external process, and the task is to predict a probability distribution over them, where the probability assigned to a hypothesis indicates the probability that it is correct.
    Page 1, “Introduction”
  3. Dialog state tracking can be seen as analogous to assigning a probability distribution over items on an ASR N-best list given speech input and the recognition output, including the contents of the N-best list.
    Page 1, “Introduction”
  4. Another analogous task is assigning a probability distribution over a set of URLs given a search query and the URLs.
    Page 2, “Introduction”
  5. (1996)) models the conditional probability distribution of the label y given features x, p(y|x), via an exponential model of the form:
    Page 5, “Generative state tracking”


generative models

Appears in 3 sentences as: generative model (1) generative models (2)
In Discriminative state tracking for spoken dialog systems
  1. (2010); Thomson and Young (2010)) use generative models that capture how the SLU results are generated from hidden dialog states.
    Page 2, “Introduction”
  2. As an illustration, in Figure 1, a generative model might fail to assign the highest score to the correct hypothesis (61C) after the second turn.
    Page 2, “Introduction”
  3. In contrast to generative models, discriminative approaches to dialog state tracking directly predict the correct state hypothesis by leveraging discriminatively trained conditional models of the form b(g) = P(g|f), where f are features extracted from various sources, e.g.
    Page 4, “Generative state tracking”


model parameters

Appears in 3 sentences as: model parameters (2) models parameters (1)
In Discriminative state tracking for spoken dialog systems
  1. where φ_i(x, y) are feature functions jointly defined on features and labels, and λ_i are the model parameters.
    Page 5, “Generative state tracking”
  2. This formulation also decouples the number of model parameters (i.e.
    Page 6, “Generative state tracking”
  3. Second, model parameters in DISCIND are trained independently of competing hypotheses.
    Page 8, “Generative state tracking”
