A Nonparametric Bayesian Approach to Acoustic Model Discovery
Lee, Chia-ying and Glass, James

Article Structure

Abstract

We investigate the problem of acoustic modeling in which prior language-specific knowledge and transcribed data are unavailable.

Introduction

Acoustic models are an indispensable component of speech recognizers.

Related Work

Unsupervised Sub-word Modeling: We follow the general guideline used in (Lee et al., 1988; Garcia and Gish, 2006; Chan and Lee, 2011) and approach the problem of unsupervised acoustic modeling by solving three sub-problems of the task: segmentation, clustering, and modeling each cluster.

Problem Formulation

The goal of our model, given a set of spoken utterances, is to jointly learn the following:

Model

We aim to discover and model a set of sub-word units that represent the spoken data.

Inference

We employ Gibbs sampling (Gelman et al., 2004) to approximate the posterior distribution of the hidden variables in our model.

Experimental Setup

To the best of our knowledge, there are no standard corpora for evaluating unsupervised methods for acoustic modeling.

Results

Fig.

Conclusion

We present a Bayesian unsupervised approach to the problem of acoustic modeling.

Topics

feature vectors

Appears in 12 sentences as: feature vector (5) feature vectors (7)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. Fig. 1 illustrates how the speech signal of a single word utterance banana is converted to a sequence of feature vectors x_1 to x_T.
    Page 3, “Problem Formulation”
  2. Segment (p_{j,k}) We define a segment to be composed of feature vectors between two boundary frames.
    Page 3, “Problem Formulation”
  3. Hidden State (s_t) Since we assume the observed data are generated by HMMs, each feature vector, x_t, has an associated hidden state index.
    Page 3, “Problem Formulation”
  4. Mixture ID (m_t) Similarly, each feature vector is assumed to be emitted by the state GMM it belongs to.
    Page 3, “Problem Formulation”
  5. Given the cluster label, choose a hidden state for each feature vector x_t in the segment.
    Page 4, “Model”
  6. Use the chosen Gaussian mixture to generate the observed feature vector
    Page 4, “Model”
  7. Fig. 2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model.
    Page 4, “Model”
  8. For example, the boundary variables deterministically construct the duration of each segment, d, which in turn sets the number of feature vectors that should be generated for a segment.
    Page 4, “Model”
  9. For each of the remaining feature vectors in
    Page 5, “Inference”
  10. Mixture ID (m_t) For each feature vector in a segment, given the cluster label c_{j,k} and the hidden state index s_t, the derivation of the conditional posterior probability of its mixture ID is straightforward (restated in standard notation after this list):
    Page 5, “Inference”
  11. where m_{c,s} is the set of mixture IDs of feature vectors that belong to state s of HMM c. The m-th entry of β′ is β + Σ_{m_t ∈ m_{c,s}} δ(m_t, m), where we use
    Page 5, “Inference”
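
The β′ update quoted in item 11 can be restated compactly. The following is a hedged reconstruction rather than a quote from the paper: utterance and segment indices are dropped, and the component parameters μ_{c,s,m} and Λ_{c,s,m} are assumed names for the mean and precision of mixture m in state s of HMM c.

```latex
% Hedged restatement of the mixture-ID update (not verbatim from the paper).
% beta    : symmetric Dirichlet hyperparameter on the mixture weights
% m_{c,s} : set of mixture IDs currently assigned to state s of HMM c
\[
  \beta'_m = \beta + \sum_{m' \in \mathbf{m}_{c,s}} \delta(m', m),
  \qquad
  P(m_t = m \mid \cdot) \;\propto\; \beta'_m \,
  \mathcal{N}\!\left(x_t ;\, \mu_{c,s,m}, \Lambda_{c,s,m}^{-1}\right).
\]
```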

hyperparameters

Appears in 7 sentences as: hyperparameter (1) Hyperparameters (1) hyperparameters (6)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. Fig. 2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model.
    Page 4, “Model”
  2. We use P(· | ·) to denote a conditional posterior probability given observed data, all the other variables, and hyperparameters for the model.
    Page 5, “Inference”
  3. The conjugate prior we use for the two variables is a normal-Gamma distribution with hyperparameters μ_0, κ_0, α_0, and β_0 (Murphy, 2007); the standard posterior updates are sketched after this list.
    Page 6, “Inference”
  4. Assume we use a symmetric Dirichlet distribution with a positive hyperparameter
    Page 6, “Inference”
  5. Table 1: The values of the hyperparameters of our model, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix of training data.
    Page 8, “Experimental Setup”
  6. Hyperparameters and Training Iterations: The values of the hyperparameters of our model are shown in Table 1, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix computed from training data.
    Page 8, “Experimental Setup”
  7. In the future, we plan to extend the model and infer the values of these hyperparameters from data directly.
    Page 8, “Results”
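
For items 3 and 6 above, the normal-Gamma conjugate update is the textbook result (Murphy, 2007). Below is a minimal sketch for a single scalar dimension with n observations and sample mean; the paper applies this with the prior values μ_0, κ_0, α_0, β_0 from its Table 1, so the formulas here are standard background rather than a quote.

```latex
% Standard normal-Gamma posterior update for a mean mu and precision lambda
% with prior NG(mu_0, kappa_0, alpha_0, beta_0) and observations x_1, ..., x_n.
\begin{aligned}
  \kappa_n &= \kappa_0 + n,
  &
  \mu_n &= \frac{\kappa_0\,\mu_0 + n\,\bar{x}}{\kappa_0 + n},
  \\
  \alpha_n &= \alpha_0 + \tfrac{n}{2},
  &
  \beta_n &= \beta_0 + \tfrac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2
            + \frac{\kappa_0\, n\,(\bar{x} - \mu_0)^2}{2(\kappa_0 + n)}.
\end{aligned}
```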

generative process

Appears in 5 sentences as: generative process (5)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. In the next section, we show the generative process our model uses to generate the observed data.
    Page 3, “Problem Formulation”
  2. Here, we describe the generative process our model uses to generate the observed utterances and present the corresponding graphical model (a toy sketch of this process follows the list).
    Page 4, “Model”
  3. For clarity, we assume that the values of the boundary variables b_t are given in the generative process.
    Page 4, “Model”
  4. The generative process indicates that our model ignores utterance boundaries and views the entire data as concatenated spoken segments.
    Page 4, “Model”
  5. The graphical model representing this generative process is shown in Fig. 2.
    Page 4, “Model”
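
The generative story above can be sketched in a few lines of toy code. This is an illustration under stated assumptions, not the authors' implementation: a Chinese-restaurant-process draw stands in for the Dirichlet-process prior on units, each unit is a 3-state left-to-right HMM with Gaussian-mixture emissions, the feature dimension and mixture count are made up, and segment boundaries are taken as given (item 3).

```python
# Toy sketch of the generative process (illustrative only; sizes are invented).
import numpy as np

rng = np.random.default_rng(1)
DIM, STATES, MIXES, ALPHA = 13, 3, 2, 1.0   # toy feature dim, HMM size, DP concentration

def new_hmm():
    """A toy left-to-right HMM with a small Gaussian mixture per state."""
    return {
        "trans": np.array([[0.6, 0.4, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]]),
        "weights": rng.dirichlet(np.ones(MIXES), size=STATES),
        "means": rng.normal(size=(STATES, MIXES, DIM)),
    }

def generate(num_segments, seg_len=5):
    hmms, counts, frames = [], [], []
    for _ in range(num_segments):
        # Chinese-restaurant-process step: reuse an existing unit or create a new one.
        probs = np.array(counts + [ALPHA], dtype=float)
        c = rng.choice(len(probs), p=probs / probs.sum())       # cluster label of the segment
        if c == len(hmms):
            hmms.append(new_hmm()); counts.append(0)
        counts[c] += 1
        hmm, s = hmms[c], 0                                      # start in the first state
        for _ in range(seg_len):
            m = rng.choice(MIXES, p=hmm["weights"][s])           # mixture ID for this frame
            frames.append(hmm["means"][s, m] + rng.normal(size=DIM))  # emit feature vector
            s = rng.choice(STATES, p=hmm["trans"][s])            # move to the next hidden state
    return np.array(frames)

data = generate(num_segments=4)   # concatenated spoken segments, utterance boundaries ignored
```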

Gibbs sampling

Appears in 5 sentences as: Gibbs sampler (2) Gibbs sampling (3)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. We implement the inference process using Gibbs sampling.
    Page 1, “Introduction”
  2. We employ Gibbs sampling (Gelman et al., 2004) to approximate the posterior distribution of the hidden variables in our model.
    Page 4, “Inference”
  3. To apply Gibbs sampling to our problem, we need to derive the conditional posterior distribution of each hidden variable of the model (a schematic sampling sweep is sketched after this list).
    Page 4, “Inference”
  4. … Eq. 2, the Gibbs sampler can draw a new value for c_{j,k} by sampling from the normalized distribution.
    Page 5, “Inference”
  5. Our Gibbs sampler hypothesizes b_t's value by sampling from the normalized distribution of Eq.
    Page 6, “Inference”
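
The control flow of one sampling sweep can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the cond_* arguments stand in for the derived conditional posteriors of the boundary, cluster-label, hidden-state, and mixture-ID variables, and the toy usage resamples one cluster label per frame rather than per segment.

```python
# Schematic Gibbs sweep over the model's latent variables (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def sample_from(unnormalized):
    """Draw an index from an unnormalized discrete distribution."""
    p = np.asarray(unnormalized, dtype=float)
    return rng.choice(len(p), p=p / p.sum())

def gibbs_sweep(x, b, c, s, m, cond_b, cond_c, cond_s, cond_m):
    """Resample each latent variable from its conditional given everything else."""
    for t in range(len(x)):                                   # boundary variables b_t
        b[t] = sample_from(cond_b(t, x, b, c, s, m))
    for j in range(len(c)):                                   # cluster labels
        c[j] = sample_from(cond_c(j, x, b, c, s, m))
    for t in range(len(x)):                                   # hidden states s_t
        s[t] = sample_from(cond_s(t, x, b, c, s, m))
    for t in range(len(x)):                                   # mixture IDs m_t
        m[t] = sample_from(cond_m(t, x, b, c, s, m))
    return b, c, s, m

# Toy usage with uniform stand-in conditionals, just to show the control flow.
T, K, S, M = 10, 3, 3, 2
x = rng.normal(size=(T, 13))
b, c = rng.integers(0, 2, size=T), rng.integers(0, K, size=T)
s, m = rng.integers(0, S, size=T), rng.integers(0, M, size=T)
uniform = lambda n: (lambda *_: np.ones(n))
gibbs_sweep(x, b, c, s, m, uniform(2), uniform(K), uniform(S), uniform(M))
```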

latent variables

Appears in 5 sentences as: latent variables (5)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. In contrast to the previous methods, we approach the problem by modeling the three sub-problems as well as the unknown set of sub-word units as latent variables in one nonparametric Bayesian model.
    Page 1, “Introduction”
  2. For the domain our problem is applied to, our model has to include more latent variables and is more complex.
    Page 2, “Related Work”
  3. We model the three subtasks as latent variables in our approach.
    Page 2, “Problem Formulation”
  4. In this section, we describe the observed data, latent variables, and auxiliary variables
    Page 2, “Problem Formulation”
  5. In the next section, we show how to infer the value of each of the latent variables in Fig. 2.
    Page 4, “Model”

F-score

Appears in 4 sentences as: F-score (4)
In A Nonparametric Bayesian Approach to Acoustic Model Discovery
  1. … state-of-the-art unsupervised method and improves the relative F-score by 18.8 points (Dusan and Rabiner, 2006).
    Page 2, “Introduction”
  2. We follow the suggestion of (Scharenborg et al., 2010) and use a 20-ms tolerance window to compute the recall, precision, and F-score of the segmentation our model proposed for TIMIT's training set (a scoring sketch follows this list).
    Page 7, “Experimental Setup”
  3. unit (%): Recall / Precision / F-score; Dusan (2006): 75.2 / 66.8 / 70.8; Qiao et al.
    Page 9, “Results”
  4. When compared to the baseline in which the number of phone boundaries in each utterance was also unknown (Dusan and Rabiner, 2006), our model outperforms it in both recall and precision, improving the relative F-score by 18.8%.
    Page 9, “Results”
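
A minimal sketch of the tolerance-window scoring mentioned in item 2 follows. It assumes a greedy one-to-one matching of hypothesized and reference boundaries within ±20 ms and is an illustration, not the exact procedure of Scharenborg et al. (2010).

```python
# Boundary precision/recall/F-score with a 20-ms tolerance window (illustrative sketch).
def boundary_prf(reference, hypothesis, tol=0.020):
    """reference, hypothesis: sorted lists of boundary times in seconds."""
    matched = [False] * len(reference)
    hits = 0
    for h in hypothesis:
        for i, r in enumerate(reference):
            if not matched[i] and abs(h - r) <= tol:
                matched[i] = True                 # each reference boundary counts once
                hits += 1
                break
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(hypothesis) if hypothesis else 0.0
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return recall, precision, f

print(boundary_prf([0.10, 0.25, 0.40], [0.11, 0.27, 0.55]))  # roughly (0.67, 0.67, 0.67)
```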
