A joint model of word segmentation and phonological variation for English word-final /t/-deletion
Börschinger, Benjamin and Johnson, Mark and Demuth, Katherine

Article Structure

Abstract

Word-final /t/-deletion refers to a common phenomenon in spoken English where words such as /wɛst/ "west" are pronounced as [wɛs] "wes" in certain contexts.

Introduction

Computational models of word segmentation try to solve one of the first problems language learners have to face: breaking an unsegmented stream of sound segments into individual words.

Background and related work

The work of Elsner et al.

The computational model

Our models build on the Unigram and the Bigram model introduced in Goldwater et al.

Experiments 4.1 The data

We are interested in how well our model handles /t/-deletion in real data.

Discussion

There are two interesting findings from our experiments.

Conclusion and outlook

We presented a joint model for word segmentation and the learning of phonological rule probabilities from a corpus of transcribed speech.

Topics

Bigram

Appears in 31 sentences as: Bigram (31) bigram (4)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. We find that Bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
    Page 1, “Abstract”
  2. We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms.
    Page 1, “Introduction”
  3. Our models build on the Unigram and the Bigram model introduced in Goldwater et al.
    Page 2, “The computational model”
  3. Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from L_{U_{i,j-1}}).
    Page 2, “The computational model”
  5. Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
    Page 3, “The computational model”
  6. The generative process mimics the intuitively plausible idea of generating underlying forms from some kind of syntactic model (here, a Bigram language model) and then mapping the underlying form to an observed surface-form through the application of a phonological rule component, here represented by the collection of rule probabilities ρ_c.
    Page 3, “The computational model”
  7. Figure 2: Mathematical description of our joint Bigram model.
    Page 3, “The computational model”
  8. We refer the reader to their paper for additional details such as how to calculate the Bigram probabilities in Figure 4.
    Page 4, “The computational model”
  9. Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
    Page 6, “Experiments 4.1 The data”
  10. For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops
    Page 6, “Experiments 4.1 The data”
  11. Note how the Unigram model always suffers in the LEARN-p condition whereas the Bigram model’s performance is actually best for LEARN-p in the left-right setting.
    Page 6, “Experiments 4.1 The data”

Unigram

Appears in 22 sentences as: Unigram (22) unigram (1)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms.
    Page 1, “Introduction”
  2. Our models build on the Unigram and the Bigram model introduced in Goldwater et al.
    Page 2, “The computational model”
  3. Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from L_{U_{i,j-1}}).
    Page 2, “The computational model”
  4. Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
    Page 6, “Experiments 4.1 The data”
  5. For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops
    Page 6, “Experiments 4.1 The data”
  6. Unigram
    Page 6, “Experiments 4.1 The data”
  7. Note how the Unigram model always suffers in the LEARN-p condition whereas the Bigram model’s performance is actually best for LEARN-p in the left-right setting.
    Page 6, “Experiments 4.1 The data”
  8. Estimated /t/-deletion probabilities across contexts (rows: empirical, Unigram, Bigram; a sketch of computing such empirical rates follows this list):
     empirical  0.62  0.42  0.36  0.23  0.15  0.07
     Unigram    0.41  0.33  0.17  0.07  0.05  0.00
     Bigram     0.70  0.58  0.43  0.17  0.13  0.06
    Page 6, “Experiments 4.1 The data”
  9. Note how the Unigram model severely underestimates and the Bigram model slightly overestimates the probabilities.
    Page 6, “Experiments 4.1 The data”
  10. In fact, comparing the inferred probabilities to the “ground truth” indicates that the Bigram model estimates the true probabilities more accurately than the Unigram model, as illustrated in Table 3 for the left-right setting.
    Page 6, “Experiments 4.1 The data”
  11. The Bigram model somewhat overestimates the probability for all postconsonantal contexts but the Unigram model severely underestimates the probability of /t/-deletion across all contexts.
    Page 6, “Experiments 4.1 The data”
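
Item 8 above compares empirical deletion rates with the rates the two models infer. As a minimal illustration of what the "empirical" row means, the following Python sketch computes context-specific /t/-deletion rates from aligned underlying/surface word pairs. The context coding (class of the preceding segment and of the following segment) and all names are hypothetical assumptions, not the paper's implementation.

# Minimal sketch: empirical context-specific /t/-deletion rates from
# aligned (underlying, surface) token pairs. Toy segment inventory.
from collections import defaultdict

VOWELS = set("aeiou")  # toy inventory; the real data uses phonemes

def seg_class(seg):
    """Collapse a segment into a coarse class: consonant, vowel, or pause."""
    if seg is None:
        return "#"          # utterance boundary / pause
    return "V" if seg in VOWELS else "C"

def deletion_probs(pairs):
    """pairs: (underlying, surface, following_segment) tuples, where
    following_segment is the first segment of the next word (None at a pause)."""
    deleted = defaultdict(int)
    total = defaultdict(int)
    for underlying, surface, following in pairs:
        if not underlying.endswith("t"):
            continue                      # rule only applies to final /t/
        context = (seg_class(underlying[-2]), seg_class(following))
        total[context] += 1
        if not surface.endswith("t"):     # the /t/ was deleted on the surface
            deleted[context] += 1
    return {c: deleted[c] / total[c] for c in total}

# toy usage: "west stuff" -> "wes stuff", "left it" -> "left it"
print(deletion_probs([("west", "wes", "s"), ("left", "left", "i")]))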

word segmentation

Appears in 19 sentences as: Word segmentation (1) word segmentation (19) word segmention (1)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. Current computational models of unsupervised word segmentation usually assume idealized input that is devoid of these kinds of variation.
    Page 1, “Abstract”
  2. We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model.
    Page 1, “Abstract”
  3. We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s.
    Page 1, “Abstract”
  4. Computational models of word segmentation try to solve one of the first problems language learners have to face: breaking an unsegmented stream of sound segments into individual words.
    Page 1, “Introduction”
  5. This permits us to develop a joint generative model for both word segmentation and variation which we plan to extend to handle more phenomena in future work.
    Page 2, “Background and related work”
  6. They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data.
    Page 2, “Background and related work”
  7. Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
    Page 3, “The computational model”
  8. Bayesian word segmentation models try to compactly represent the observed data in terms of a small set of units (word types) and a short analysis (a small number of word tokens).
    Page 4, “The computational model”
  9. This allows us to investigate the strength of the statistical signal for the deletion rule without confounding it with the word segmentation performance, and to see how the different contextual settings uniform, right and left-right handle the data.
    Page 6, “Experiments 4.1 The data”
  10. Table 5: /t/-recovery F-scores when performing joint word segmentation in the left-right setting, averaged over two runs (standard errors less than 2%).
    Page 7, “Experiments 4.1 The data”
  11. Finally, we are also interested to learn how well we can do word segmentation and underlying /t/-recovery jointly.
    Page 7, “Experiments 4.1 The data”

joint model

Appears in 7 sentences as: joint model (7)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. We extend a nonparametric model of word segmentation by adding phonological rules that map from underlying forms to surface forms to produce a mathematically well-defined joint model as a first step towards handling variation and segmentation in a single model.
    Page 1, “Abstract”
  2. We analyse how our model handles /t/-deletion on a large corpus of transcribed speech, and show that the joint model can perform word segmentation and recover underlying /t/s.
    Page 1, “Abstract”
  3. However, as they point out, combining the segmentation and the variation model into one joint model is not straightforward and usual inference procedures are infeasible, which requires the use of several heuristics.
    Page 2, “Background and related work”
  4. They do not aim for a joint model that also handles word segmentation, however, and rather than training their model on an actual corpus, they evaluate on constructed lists of examples, mimicking frequencies of real data.
    Page 2, “Background and related work”
  5. Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
    Page 3, “The computational model”
  6. (2009) segmentation models, exact inference is infeasible for our joint model.
    Page 4, “The computational model”
  7. We presented a joint model for word segmentation and the learning of phonological rule probabilities from a corpus of transcribed speech.
    Page 8, “Conclusion and outlook”

F-score

Appears in 5 sentences as: F-score (6)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. We evaluate the model in terms of F-score, the harmonic mean of recall (the fraction of underlying /t/s the model correctly recovered) and precision (the fraction of underlying /t/s the model predicted that were correct).
    Page 6, “Experiments 4.1 The data” (see the sketch after this list)
  2. Looking at the segmentation performance this isn’t too surprising: the Unigram model’s poorer token F-score, the standard measure of segmentation performance on a word token level, suggests that it misses many more boundaries than the Bigram model to begin with and, consequently, can’t recover any potential underlying /t/s at these boundaries.
    Page 7, “Experiments 4.1 The data”
  3. The generally worse performance of handling variation as measured by /t/-recovery F-score when performing joint segmentation is consistent with the finding of Elsner et al.
    Page 7, “Experiments 4.1 The data”
  4. We find that our Bigram model reaches 77% /t/-recovery F-score when run with knowledge of true word-boundaries and when it can make use of both the preceding and the following phonological context, and that unlike the Unigram model it is able to learn the probability of /t/-deletion in different contexts.
    Page 8, “Conclusion and outlook”
  5. When performing joint word segmentation on the Buckeye corpus, our Bigram model reaches around 55% F-score for recovering deleted /t/s with a word segmentation F-score of around 72%, which is 2% better than running a Bigram model that does not model /t/-deletion.
    Page 8, “Conclusion and outlook”
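
Item 1 above defines the /t/-recovery F-score as the harmonic mean of recall and precision. The following minimal Python sketch computes it from a set of predicted and a set of gold underlying-/t/ positions; the function name and the set-based representation are illustrative assumptions, not the paper's evaluation code.

def t_recovery_fscore(predicted, gold):
    """predicted, gold: sets of positions at which an underlying /t/ is
    posited / actually present. Returns (precision, recall, F-score)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall > 0 else 0.0)
    return precision, recall, f

# toy usage: the model posits underlying /t/ at positions {3, 7, 9}, gold is {3, 9, 12}
print(t_recovery_fscore({3, 7, 9}, {3, 9, 12}))  # precision, recall, F all 2/3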

segmentation model

Appears in 5 sentences as: segmentation model (4) segmentation models (2)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. Bayesian word segmentation models try to compactly represent the observed data in terms of a small set of units (word types) and a short analysis (a small number of word tokens).
    Page 4, “The computational model”
  2. (2009) segmentation models, exact inference is infeasible for our joint model.
    Page 4, “The computational model”
  3. To give an impression of the impact of /t/-deletion, we also report numbers for running only the segmentation model on the Buckeye data with no deleted /t/s and on the data with deleted /t/s.
    Page 7, “Experiments 4.1 The data”
  4. Also note that in the GOLD-p condition, our joint Bigram model performs almost as well on data with /t/-deletions as the word segmentation model on data that includes no variation at all.
    Page 7, “Experiments 4.1 The data”
  5. NO-p are scores for running just the word segmentation model with no /t/-deletion rule on the data that includes /t/-deletion, NO-VAR for running just the word segmentation model on the data with no /t/-deletions.
    Page 8, “Discussion”

generative model

Appears in 4 sentences as: generative model (3) generative models (1)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. They propose a pipeline architecture involving two separate generative models , one for word-segmentation and one for phonological variation.
    Page 2, “Background and related work”
  2. This permits us to develop a joint generative model for both word segmentation and variation which we plan to extend to handle more phenomena in future work.
    Page 2, “Background and related work”
  3. One of the advantages of an explicitly defined generative model such as ours is that it is straightforward to gradually extend it by adding more cues, as we point out in the discussion.
    Page 4, “The computational model”
  4. A major advantage of our generative model is the ease and transparency with which its assumptions can be modified and extended.
    Page 8, “Conclusion and outlook”

generative process

Appears in 3 sentences as: generative process (3)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. This generative process is repeated for each utterance i, leading to multiple utterances of the form U_{i,1}, …
    Page 2, “The computational model”
  2. The generative process mimics the intuitively plausible idea of generating underlying forms from some kind of syntactic model (here, a Bigram language model) and then mapping the underlying form to an observed surface-form through the application of a phonological rule component, here represented by the collection of rule probabilities ρ_c.
    Page 3, “The computational model” (see the sketch after this list)
  3. Using a more realistic generative process for the underlying forms, for example an Adaptor Grammar (Johnson et al., 2007), could address this shortcoming in future work without changing the overall architecture of the model although novel inference algorithms might be required.
    Page 8, “Discussion”
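
Item 2 above summarizes the generative story: underlying word forms are generated by a Bigram language model and then mapped to surface forms by a phonological rule that deletes a word-final /t/ with a context-dependent probability ρ_c. The Python sketch below follows that story with a fixed toy bigram table standing in for the paper's nonparametric bigram model; all word forms, probabilities, and names are made-up assumptions for illustration.

import random

# Toy bigram transition probabilities over underlying word forms; "$" marks
# the utterance boundary. The paper uses a Goldwater-style nonparametric
# bigram model instead of a fixed table like this one.
BIGRAM = {
    "$":    {"we": 0.5, "went": 0.5},
    "we":   {"went": 0.6, "west": 0.4},
    "went": {"west": 0.5, "$": 0.5},
    "west": {"$": 1.0},
}

# Context-dependent /t/-deletion probabilities rho_c, indexed by the class of
# the preceding segment and of the following segment (C / V / # for pause).
RHO = {("C", "#"): 0.6, ("C", "C"): 0.7, ("C", "V"): 0.2,
       ("V", "#"): 0.1, ("V", "C"): 0.3, ("V", "V"): 0.05}

def seg_class(seg):
    return "#" if seg is None else ("V" if seg in "aeiou" else "C")

def sample(dist):
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item

def generate_utterance():
    # 1. Generate underlying forms from the bigram language model.
    underlying, prev = [], "$"
    while True:
        w = sample(BIGRAM[prev])
        if w == "$":
            break
        underlying.append(w)
        prev = w
    # 2. Map each underlying form to a surface form via /t/-deletion.
    surface = []
    for i, w in enumerate(underlying):
        nxt = underlying[i + 1][0] if i + 1 < len(underlying) else None
        if w.endswith("t") and random.random() < RHO[(seg_class(w[-2]), seg_class(nxt))]:
            w = w[:-1]                    # delete the word-final /t/
        surface.append(w)
    return underlying, surface

print(generate_utterance())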

Gibbs sampling

Appears in 3 sentences as: Gibbs sampler (1) Gibbs sampling (2)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. … for unknown n. A major insight in Goldwater’s work is that rather than sampling over the latent variables in the model directly (the number of which we don’t even know), we can instead perform Gibbs sampling over a set of boundary variables b_1, …
    Page 4, “The computational model” (see the sketch after this list)
  2. Figure 5: The relation between the observed sequence of segments (bottom), the boundary variables b_1, …, b_{|W|-1} the Gibbs sampler operates over (in squares), the latent sequence of surface forms and the latent sequence of underlying forms.
    Page 5, “The computational model”
  3. To test our Gibbs sampling inference procedure, we ran it on artificial data generated according to the model itself.
    Page 6, “Experiments 4.1 The data”
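
Item 1 above describes the key inference idea: instead of sampling the latent words directly, sample binary boundary variables b_1, …, b_{|W|-1} between adjacent segments. The Python sketch below applies that idea to a single utterance, resampling one boundary at a time conditioned on all others (the Gibbs step); the CRP-style unigram score is a toy stand-in for the paper's actual predictive distribution, and all names are assumptions for illustration.

import math, random

def crp_unigram_logprob(words, alpha=1.0, p_char=1/27):
    """Toy Chinese-restaurant-process unigram score for a word sequence:
    each token either repeats an earlier type or is a new type generated
    character by character. A stand-in for the paper's actual model."""
    counts, logp = {}, 0.0
    for i, w in enumerate(words):
        new = alpha * (p_char ** (len(w) + 1))      # base-distribution mass
        logp += math.log((counts.get(w, 0) + new) / (i + alpha))
        counts[w] = counts.get(w, 0) + 1
    return logp

def segment(chars, boundaries):
    """Split the segment string according to binary boundary variables
    b_1..b_{|W|-1} (1 = word boundary after that position)."""
    words, start = [], 0
    for pos, b in enumerate(boundaries, start=1):
        if b:
            words.append(chars[start:pos])
            start = pos
    words.append(chars[start:])
    return words

def gibbs_sweep(chars, boundaries):
    """One Gibbs sweep: resample each boundary variable given all others."""
    for j in range(len(boundaries)):
        logps = []
        for value in (0, 1):
            boundaries[j] = value
            logps.append(crp_unigram_logprob(segment(chars, boundaries)))
        # sample b_j proportionally to the two (unnormalized) probabilities
        p1 = 1.0 / (1.0 + math.exp(logps[0] - logps[1]))
        boundaries[j] = 1 if random.random() < p1 else 0
    return boundaries

chars = "wewentwest"
b = [random.randint(0, 1) for _ in range(len(chars) - 1)]
for _ in range(50):
    b = gibbs_sweep(chars, b)
print(segment(chars, b))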

graphical model

Appears in 3 sentences as: graphical model (3)
In A joint model of word segmentation and phonological variation for English word-final /t/-deletion
  1. Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from L_{U_{i,j-1}}).
    Page 2, “The computational model”
  2. Figure 2 gives the mathematical description of the graphical model and Table 1 provides a key to the variables of our model.
    Page 2, “The computational model”
  3. Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
    Page 3, “The computational model”
