Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
Micha Elsner, Sharon Goldwater, and Jacob Eisenstein

Article Structure

Abstract

During early language acquisition, infants must learn both a lexicon and a model of phonetics that explains how lexical items can vary in pronunciation—for instance “the” might be realized as [ði] or [ðə].

Introduction

Infants acquiring their first language confront two difficult cognitive problems: building a lexicon of word forms, and learning basic phonetics and phonology.

Related work

Our work is inspired by the lexical-phonetic model of Feldman et al.

Lexical-phonetic model

Our lexical-phonetic model is defined using the standard noisy channel framework: first a sequence of intended word tokens is generated using a language model, and then each token is transformed by a probabilistic finite-state transducer to produce the observed surface sequence.
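This generative story can be sketched as follows. The sketch is purely illustrative, not the authors' implementation: the toy lexicon, bigram probabilities, and pronunciation variants are all invented, and a simple substitution table stands in for the probabilistic finite-state transducer.

```python
import random

# Toy bigram language model over intended words (probabilities invented).
bigrams = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"dog": 0.7, "cat": 0.3},
    "a":   {"dog": 0.5, "cat": 0.5},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}

# Stand-in for the noisy channel: each intended word may surface as one of
# several (hypothetical) phonetic variants.
variants = {"the": ["ði", "ðə"], "a": ["ə", "ey"], "dog": ["dɔg"], "cat": ["kæt"]}

def sample_utterance(rng):
    """First generate intended words from the bigram LM, then transform
    each token independently into an observed surface form."""
    intended, word = [], "<s>"
    while True:
        choices = bigrams[word]
        word = rng.choices(list(choices), weights=list(choices.values()))[0]
        if word == "</s>":
            break
        intended.append(word)
    surface = [rng.choice(variants[w]) for w in intended]
    return intended, surface
```

Only the surface sequence is observed by the learner; the intended sequence must be inferred.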

Inference

Global optimization of the model posterior is difficult; instead we use Viterbi EM (Spitkovsky et al., 2010; Allahverdyan and Galstyan, 2011).
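Schematically, Viterbi (hard) EM alternates a hard E-step, which assigns each observation its single best hidden value, with an M-step that re-estimates parameters from those assignments. The sketch below illustrates the loop on a toy clustering task; the surface forms and the edit-distance objective are invented for the example and are not the paper's actual inference procedure.

```python
from collections import Counter, defaultdict

def edit_distance(a, b):
    """Levenshtein distance by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def viterbi_em(surface_forms, intended_forms, iterations=5):
    """Hard E-step: assign each surface form to its single best intended
    form.  M-step: re-estimate each intended form from its cluster."""
    for _ in range(iterations):
        clusters = defaultdict(list)
        for s in surface_forms:
            best = min(intended_forms, key=lambda w: edit_distance(s, w))
            clusters[best].append(s)
        intended_forms = [Counter(c).most_common(1)[0][0] for c in clusters.values()]
    return sorted(intended_forms)
```

Unlike full EM, no posterior distribution over intended forms is maintained; each surface token commits to one analysis per iteration.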

Experiments

5.1 Dataset

Conclusion

We have presented a noisy-channel model that simultaneously learns a lexicon, a bigram language model, and a model of phonetic variation, while using only the noisy surface forms as training data.

Topics

language model

Appears in 12 sentences as: language model (11) language modeling (1)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features.
    Page 1, “Abstract”
  2. Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model).
    Page 2, “Introduction”
  3. Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability.
    Page 2, “Introduction”
  4. But unlike speech recognition, we have no (intended-form, surface-form) training pairs to train the phonetic model, nor even a dictionary of intended-form strings to train the language model.
    Page 2, “Introduction”
  5. Instead, we initialize the noise model using feature weights based on universal linguistic principles (e.g., a surface phone is likely to share articulatory features with the intended phone) and use a bootstrapping process to iteratively infer the intended forms and retrain the language model and noise model.
    Page 2, “Introduction”
  6. In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model.
    Page 2, “Related work”
  7. Our lexical-phonetic model is defined using the standard noisy channel framework: first a sequence of intended word tokens is generated using a language model, and then each token is transformed by a probabilistic finite-state transducer to produce the observed surface sequence.
    Page 3, “Lexical-phonetic model”
  8. The language modeling term relating to the intended string again factors into multiple components.
    Page 5, “Inference”
  9. Because neither the transducer nor the language model are perfect models of the true distribution, they can have incompatible dynamic ranges.
    Page 5, “Inference”
  10. The transducer scores can be cached since they depend only on surface forms, but the language model scores cannot.
    Page 6, “Inference”
  11. Nonetheless, it represents phonetic variability more realistically than the Bernstein-Ratner-Brent corpus, while still maintaining the lexical characteristics of infant-directed speech (as compared to the Buckeye corpus, with its much larger vocabulary and more complex language model).
    Page 7, “Experiments”
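The scoring details in excerpts 9 and 10 above can be sketched as a weighted combination of the two model scores, with transducer scores memoized because they depend only on the string pair. The channel penalty, bigram table, and weight value below are placeholders, not the paper's parameters.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def transducer_score(intended, surface):
    """Toy channel score: penalize mismatched or extra segments.  Cacheable
    because it depends only on the string pair, not on sentence context."""
    mismatch = sum(a != b for a, b in zip(intended, surface))
    return -1.5 * (mismatch + abs(len(intended) - len(surface)))

def lm_score(prev_word, word, bigram_logprobs):
    """Context-dependent bigram score; recomputed for every hypothesis."""
    return bigram_logprobs.get((prev_word, word), math.log(1e-6))

def hypothesis_score(prev_word, intended, surface, bigram_logprobs, lm_weight=0.5):
    # The weight rebalances the incompatible dynamic ranges of the two
    # imperfect models (excerpt 9).
    return (lm_weight * lm_score(prev_word, intended, bigram_logprobs)
            + transducer_score(intended, surface))
```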

See all papers in Proc. ACL 2012 that mention language model.


gold standard

Appears in 6 sentences as: gold standard (6)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. We test the model using a corpus of child-directed speech with realistic phonetic variation and either gold standard or automatically induced word boundaries.
    Page 1, “Abstract”
  2. Our main contribution is a joint lexical-phonetic model that infers intended forms from segmented surface forms; we test the system using input with either gold standard word boundaries or boundaries induced by an existing unsupervised segmentation model (Goldwater et al., 2009).
    Page 2, “Introduction”
  3. In our first set of experiments we evaluate how well our system clusters together surface forms derived from the same intended form, assuming gold standard word boundaries.
    Page 7, “Experiments”
  4. We do not evaluate the induced intended forms directly against the gold standard intended forms—we want to evaluate cluster memberships and not labels.
    Page 7, “Experiments”
  5. Instead we compute a one-to-one mapping between our induced lexical items and the gold standard, maximizing the agreement between the two (Haghighi and Klein, 2006).
    Page 7, “Experiments”
  6. Whether trained using gold standard or automatically induced word boundaries, the model recovers lexical items more effectively than a system that assumes no phonetic variability; moreover, the use of word-level context is key to the model’s success.
    Page 8, “Conclusion”
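The one-to-one evaluation in excerpt 5 above can be sketched as follows: each induced cluster is mapped to at most one gold label so that total overlap is maximized. A brute-force search over label assignments suffices for this toy example (the paper's mapping criterion, but not its search procedure; the clusters below are invented).

```python
from itertools import permutations

def one_to_one_accuracy(induced_clusters, gold_labels):
    """induced_clusters: list of sets of token ids.
    gold_labels: dict mapping token id -> gold lexical item.
    Returns the best accuracy under a one-to-one cluster-to-label mapping."""
    labels = sorted(set(gold_labels.values()))
    # Pad with dummy labels so every cluster can receive one.
    while len(labels) < len(induced_clusters):
        labels.append(None)
    best = 0
    for assignment in permutations(labels, len(induced_clusters)):
        correct = sum(
            sum(1 for item in cluster if gold_labels[item] == label)
            for cluster, label in zip(induced_clusters, assignment)
        )
        best = max(best, correct)
    return best / len(gold_labels)
```

This evaluates cluster memberships rather than labels, as excerpt 4 requires. For realistic numbers of clusters the brute-force search would be replaced by the Hungarian algorithm.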


word segmentation

Appears in 5 sentences as: word segmentation (5)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. For example, many models of word segmentation implicitly or explicitly build a lexicon while segmenting the input stream of phonemes into word tokens; in nearly all cases the phonemic input is created from an orthographic transcription using a phonemic dictionary, thus abstracting away from any phonetic variability (Brent, 1999; Venkataraman, 2001; Swingley, 2005; Goldwater et al., 2009, among others).
    Page 1, “Introduction”
  2. A final line of related work is on word segmentation.
    Page 2, “Related work”
  3. In addition to the models mentioned in Section 1, which use phonemic input, a few models of word segmentation have been tested using phonetic input (Fleck, 2008; Rytting, 2007; Daland and Pierrehumbert, 2010).
    Page 2, “Related work”
  4. This version, which contains 9790 utterances (33399 tokens, 1321 types), is now standard for word segmentation, but contains no phonetic variability.
    Page 6, “Experiments”
  5. As a simple extension of our model to the case of unknown word boundaries, we interleave it with an existing model of word segmentation, dpseg (Goldwater et al., 2009).
    Page 8, “Experiments”


word-level

Appears in 5 sentences as: word-level (5)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model).
    Page 2, “Introduction”
  2. In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model.
    Page 2, “Related work”
  3. Here, we use a naturalistic corpus, demonstrating that lexical-phonetic learning is possible in this more general setting and that word-level context information is important for doing so.
    Page 2, “Related work”
  4. It is the first model of lexical-phonetic acquisition to include word-level context and to be tested on an infant-directed corpus with realistic phonetic variability.
    Page 8, “Conclusion”
  5. Whether trained using gold standard or automatically induced word boundaries, the model recovers lexical items more effectively than a system that assumes no phonetic variability; moreover, the use of word-level context is key to the model’s success.
    Page 8, “Conclusion”


bigram

Appears in 4 sentences as: bigram (4)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model).
    Page 2, “Introduction”
  2. Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability.
    Page 2, “Introduction”
  3. In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model.
    Page 2, “Related work”
  4. We have presented a noisy-channel model that simultaneously learns a lexicon, a bigram language model, and a model of phonetic variation, while using only the noisy surface forms as training data.
    Page 8, “Conclusion”


log-linear

Appears in 4 sentences as: log-linear (4)
In Bootstrapping a Unified Model of Lexical and Phonetic Acquisition
  1. We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features.
    Page 1, “Abstract”
  2. Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability.
    Page 2, “Introduction”
  3. (2008), we parameterize these distributions with a log-linear model.
    Page 4, “Lexical-phonetic model”
  4. In modern phonetics and phonology, these generalizations are usually expressed as Optimality Theory constraints; log-linear models such as ours have previously been used to implement stochastic OT.
    Page 4, “Lexical-phonetic model”
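A log-linear model of pronunciation variability of the kind described above can be sketched as a maxent distribution over surface phones given an intended phone, with features rewarding shared articulatory properties and faithfulness. The feature inventory and weights below are hypothetical, not the paper's.

```python
import math

# Hypothetical articulatory feature sets for a few phones (illustrative
# only; not the feature inventory used in the paper).
FEATURES = {
    "t": {"coronal", "stop", "voiceless"},
    "d": {"coronal", "stop", "voiced"},
    "k": {"dorsal", "stop", "voiceless"},
}

def phone_logprob(surface, intended, weights):
    """Log-linear distribution over surface phones given an intended phone:
    p(s | i) is proportional to exp(score), where the score sums weights on
    the articulatory features that s and i share, plus a faithfulness
    weight when the phone surfaces unchanged."""
    def score(s):
        total = sum(weights.get(f, 0.0) for f in FEATURES[s] & FEATURES[intended])
        if s == intended:
            total += weights.get("faithful", 0.0)
        return total
    log_z = math.log(sum(math.exp(score(s)) for s in FEATURES))
    return score(surface) - log_z
```

With a positive faithfulness weight, the faithful realization is most probable, and phones sharing more features with the intended phone outrank those sharing fewer, which is the "universal linguistic principles" initialization described in the Introduction.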

