Modelling function words improves unsupervised word segmentation
Mark Johnson, Anne Christophe, Emmanuel Dupoux and Katherine Demuth

Article Structure

Abstract

Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model to allow it to learn sequences of monosyllabic “function words” at the beginnings and endings of collocations of (possibly multisyllabic) words.

Introduction

Over the past two decades psychologists have investigated the role that function words might play in human language acquisition.

Word segmentation with Adaptor Grammars

Perhaps the simplest word segmentation model is the unigram model, where utterances are modeled as sequences of words, and where each word is a sequence of segments (Brent, 1999; Goldwater et al., 2009).

Word segmentation results

This section presents results of running our Adaptor Grammar models on subsets of the Bernstein-Ratner (1987) corpus of child-directed English.

Are “function words” on the left or right periphery?

We have shown that a model that expects function words on the left periphery performs more accurate word segmentation on English, where function words do indeed typically occur on the left periphery. This leaves open the question: how could a learner determine whether function words generally appear on the left or the right periphery of phrases in the language they are learning?

Conclusions and future work

This paper showed that the word segmentation accuracy of a state-of-the-art Adaptor Grammar model is significantly improved by extending it so that it explicitly models some properties of function words.

Topics

word segmentation

Appears in 24 sentences as: Word segmentation (1) word segmentation (23) word segmentations (2)
In Modelling function words improves unsupervised word segmentation
  1. Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model to allow it to learn sequences of monosyllabic “function words” at the beginnings and endings of collocations of (possibly multisyllabic) words.
    Page 1, “Abstract”
  2. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to a model identical except that it does not special-case “function words”, setting a new state-of-the-art of 92.4% token f-score.
    Page 1, “Abstract”
  3. We do this by comparing two computational models of word segmentation which differ solely in the way that they model function words.
    Page 1, “Introduction”
  4. (1996) and Brent (1999), our word segmentation models identify word boundaries from unsegmented sequences of phonemes corresponding to utterances, effectively performing unsupervised learning of a lexicon.
    Page 1, “Introduction”
  5. a word segmentation model should segment this as ju wɑnt tu si ðə bʊk, which is the IPA representation of “you want to see the book”.
    Page 1, “Introduction”
  6. While absolute accuracy is not directly relevant to the main point of the paper, we note that the models that learn generalisations about function words perform unsupervised word segmentation at 92.5% token f-score on the standard Bernstein-Ratner (1987) corpus, which improves the previous state-of-the-art by more than 4%.
    Page 1, “Introduction”
  7. Section 2 describes the specific word segmentation models studied in this paper, and the way we extended them to capture certain properties of function words.
    Page 2, “Introduction”
  8. The word segmentation experiments are presented in section 3, and section 4 discusses how a learner could determine whether function words occur on the left-periphery or the right-periphery in the language they are learning.
    Page 2, “Introduction”
  9. As section 2 explains in more detail, word segmentation is such a case: words are composed of syllables and belong to phrases or collocations, and modelling this structure improves word segmentation accuracy.
    Page 3, “Introduction”
  10. We use the MCMC procedure here since this has been successfully applied to word segmentation problems in previous work (Johnson, 2008).
    Page 4, “Introduction”
  11. Perhaps the simplest word segmentation model is the unigram model, where utterances are modeled as sequences of words, and where each word is a sequence of segments (Brent, 1999; Goldwater et al., 2009).
    Page 4, “Word segmentation with Adaptor Grammars”
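
As a concrete illustration of the unigram model in snippet 11, here is a minimal sketch of how such a model segments an unsegmented phoneme string: because word probabilities are independent, the best segmentation can be found with a simple dynamic program. The toy lexicon and its probabilities are assumptions for the example only; the actual models learn the lexicon, and inference is by MCMC sampling rather than exact decoding.

```python
import math

def best_segmentation(phonemes, word_logprob):
    """Unigram segmentation by dynamic programming: the utterance
    probability is a product of independent word probabilities, so the
    best segmentation of each prefix decomposes over shorter prefixes."""
    n = len(phonemes)
    best = [(-math.inf, 0)] * (n + 1)  # best[i] = (log prob, backpointer)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            score = best[start][0] + word_logprob(phonemes[start:end])
            if score > best[end][0]:
                best[end] = (score, start)
    words, end = [], n  # trace back through the word boundaries
    while end > 0:
        start = best[end][1]
        words.append(phonemes[start:end])
        end = start
    return words[::-1]

# Hypothetical toy lexicon (the real models learn these probabilities):
lex = {"ju": 0.2, "wɑnt": 0.1, "tu": 0.2, "si": 0.1, "ðə": 0.2, "bʊk": 0.1}
logp = lambda w: math.log(lex.get(w, 1e-6))
print(best_segmentation("juwɑnttusiðəbʊk", logp))
# ['ju', 'wɑnt', 'tu', 'si', 'ðə', 'bʊk']
```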

content words

Appears in 11 sentences as: content word (3) content words (9) “content words” (1) “content word” (1)
In Modelling function words improves unsupervised word segmentation
  1. Their experiments suggest that function words play a special role in the acquisition process: children learn function words before they learn the vast bulk of the associated content words, and they use function words to help identify content words.
    Page 1, “Introduction”
  2. Traditional descriptive linguistics distinguishes function words, such as determiners and prepositions, from content words, such as nouns and verbs, corresponding roughly to the distinction between functional categories and lexical categories of modern generative linguistics (Fromkin, 2001).
    Page 2, “Introduction”
  3. Function words differ from content words in at least the following ways:
    Page 2, “Introduction”
  4. 1. there are usually far fewer function word types than content word types in a language
    Page 2, “Introduction”
  5. 2. function word types typically have much higher token frequency than content word types
    Page 2, “Introduction”
  6. 5. each function word class is associated with specific content word classes (e.g., determiners with nouns)
    Page 2, “Introduction”
  7. 6. semantically, content words denote sets of objects or events, while function words denote more complex relationships over the entities denoted by content words
    Page 2, “Introduction”
  8. 7. historically, the rate of innovation of function words is much lower than the rate of innovation of content words (i.e., function words are typically “closed class”, while content words are “open class”)
    Page 2, “Introduction”
  9. Crucially for our purpose, infants of this age were shown to exploit frequent function words to segment neighboring content words (Shi and Lepage, 2008; Hallé et al., 2008).
    Page 2, “Introduction”
  10. This means that “function words” are memoised independently of the “content words” that Word expands to; i.e., the model learns distinct “function word” and “content word” vocabularies.
    Page 5, “Word segmentation with Adaptor Grammars”
  11. Thus, the present model, initially aimed at segmenting words from continuous speech, shows three interesting characteristics that are also exhibited by human infants: it distinguishes between function words and content words (Shi and Werker, 2001), it allows learners to acquire at least some of the function words of their language (e.g., Shi et al., 2006); and furthermore, it may also allow them to start grouping together function words according to their category (Cauvet et al., 2014; Shi and Melançon, 2010).
    Page 7, “Word segmentation results”
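
A rough sketch of the “distinct vocabularies” point in snippet 10: the model memoises “function words” and “content words” in separate caches, so reuse probabilities in one vocabulary never depend on counts in the other. The CRP-style caches over strings below are a simplification; the actual model memoises subtrees under separate adapted nonterminals with Pitman-Yor adaptors.

```python
from collections import Counter
import random

def crp_draw(cache, new_word_fn, alpha=1.0):
    """Draw from one CRP-style cache: reuse a cached word with probability
    proportional to its count, otherwise generate a fresh word from
    new_word_fn. Each cache is updated independently, so counts in the
    function-word vocabulary never affect the content-word vocabulary."""
    n = sum(cache.values())
    if n > 0 and random.random() < n / (n + alpha):
        word = random.choices(list(cache), weights=list(cache.values()))[0]
    else:
        word = new_word_fn()
    cache[word] += 1
    return word

# Two independently memoised vocabularies:
function_vocab, content_vocab = Counter(), Counter()
crp_draw(function_vocab, lambda: "ðə")   # hypothetical new-word generators
crp_draw(content_vocab, lambda: "bʊk")
```

Because reuse depends only on a cache's own counts, frequent monosyllabic function words quickly dominate their vocabulary without inflating content-word probabilities.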

f-score

Appears in 7 sentences as: f-score (8)
In Modelling function words improves unsupervised word segmentation
  1. This modification improves unsupervised word segmentation on the standard Bernstein-Ratner (1987) corpus of child-directed English by more than 4% token f-score compared to a model identical except that it does not special-case “function words”, setting a new state-of-the-art of 92.4% token f-score.
    Page 1, “Abstract”
  2. While absolute accuracy is not directly relevant to the main point of the paper, we note that the models that learn generalisations about function words perform unsupervised word segmentation at 92.5% token f-score on the standard Bernstein-Ratner (1987) corpus, which improves the previous state-of-the-art by more than 4%.
    Page 1, “Introduction”
  3. that achieves the best token f-score expects function words to appear at the left edge of phrases.
    Page 2, “Introduction”
  4. The starting point and baseline for our extension is the adaptor grammar with syllable structure phonotactic constraints and three levels of collocational structure (5-21), as prior work has found that this yields the highest word segmentation token f-score (Johnson and Goldwater, 2009).
    Page 5, “Word segmentation with Adaptor Grammars”
  5.                      f-score   precision   recall
     Baseline              0.872     0.918      0.956
     + left FWs            0.924     0.935      0.990
     + left + right FWs    0.912     0.957      0.953
    Page 6, “Word segmentation results”
  6. Figure 2 presents the standard token and lexicon (i.e., type) f-score evaluations for word segmentations proposed by these models (Brent, 1999), and Table 1 summarises the token and lexicon f-scores for the major models discussed in this paper.
    Page 6, “Word segmentation results”
  7. It is interesting to note that adding “function words” improves token f-score by more than 4%, corresponding to a 40% reduction in overall error rate.
    Page 6, “Word segmentation results”
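
To make these numbers concrete, here is a minimal sketch of token f-score (standard definitions; not the authors' evaluation code), together with the arithmetic behind the “40% reduction in overall error rate” in snippet 7:

```python
def token_spans(words):
    """Turn a segmentation (list of words) into character-offset spans;
    a token counts as correct only if its exact span appears in the gold."""
    spans, start = set(), 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans

def token_prf(proposed, gold):
    """Token precision, recall and f-score for one utterance pair."""
    p, g = token_spans(proposed), token_spans(gold)
    correct = len(p & g)
    prec, rec = correct / len(p), correct / len(g)
    return prec, rec, 2 * prec * rec / (prec + rec)

print(token_prf(["ju", "wɑnttu", "si", "ðə", "bʊk"],
                ["ju", "wɑnt", "tu", "si", "ðə", "bʊk"]))
# (0.8, 0.667, 0.727): 4 of 5 proposed tokens match 4 of 6 gold tokens

# Reported gain: token f-score 0.872 -> 0.924 means the error rate falls
# from 0.128 to 0.076, i.e. (0.128 - 0.076) / 0.128 ≈ 0.41, the roughly
# 40% relative reduction cited above.
```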

segmentation models

Appears in 7 sentences as: segmentation model (3) segmentation models (4)
In Modelling function words improves unsupervised word segmentation
  1. Inspired by experimental psychological findings suggesting that function words play a special role in word learning, we make a simple modification to an Adaptor Grammar-based Bayesian word segmentation model to allow it to learn sequences of monosyllabic “function words” at the beginnings and endings of collocations of (possibly multisyllabic) words.
    Page 1, “Abstract”
  2. (1996) and Brent (1999), our word segmentation models identify word boundaries from unsegmented sequences of phonemes corresponding to utterances, effectively performing unsupervised learning of a lexicon.
    Page 1, “Introduction”
  3. a word segmentation model should segment this as ju wɑnt tu si ðə bʊk, which is the IPA representation of “you want to see the book”.
    Page 1, “Introduction”
  4. Section 2 describes the specific word segmentation models studied in this paper, and the way we extended them to capture certain properties of function words.
    Page 2, “Introduction”
  5. Perhaps the simplest word segmentation model is the unigram model, where utterances are modeled as sequences of words, and where each word is a sequence of segments (Brent, 1999; Goldwater et al., 2009).
    Page 4, “Word segmentation with Adaptor Grammars”
  6. The next two subsections review the Adaptor Grammar word segmentation models presented in Johnson (2008) and Johnson and Goldwater (2009): section 2.1 reviews how phonotactic syllable-structure constraints can be expressed with Adaptor Grammars, while section 2.2 reviews how phrase-like units called “collocations” capture inter-word dependencies.
    Page 4, “Word segmentation with Adaptor Grammars”
  7. (2009) point out the detrimental effect that inter-word dependencies can have on word segmentation models that assume that the words of an utterance are independently generated.
    Page 5, “Word segmentation with Adaptor Grammars”

language acquisition

Appears in 6 sentences as: language acquisition (6)
In Modelling function words improves unsupervised word segmentation
  1. Over the past two decades psychologists have investigated the role that function words might play in human language acquisition.
    Page 1, “Introduction”
  2. The goal of this paper is to determine whether computational models of human language acquisition can provide support for the hypothesis that
    Page 1, “Introduction”
  3. function words are treated specially in human language acquisition.
    Page 1, “Introduction”
  4. Properties 1–4 suggest that function words might play a special role in language acquisition because they are especially easy to identify, while property 5 suggests that they might be useful for identifying lexical categories.
    Page 2, “Introduction”
  5. In addition, it is plausible that function words play a crucial role in children’s acquisition of more complex syntactic phenomena (Christophe et al., 2008; Demuth and McCullough, 2009), so it is interesting to investigate the roles they might play in computational models of language acquisition.
    Page 2, “Introduction”
  6. Of course this work only scratches the surface in terms of investigating the role of function words in language acquisition.
    Page 9, “Conclusions and future work”

subtrees

Appears in 5 sentences as: subtrees (5)
In Modelling function words improves unsupervised word segmentation
  1. They define distributions over the trees specified by a context-free grammar, but unlike probabilistic context-free grammars, they “learn” distributions over the possible subtrees of a user-specified set of “adapted” nonterminals.
    Page 2, “Introduction”
  2. set of parameters, if the set of possible subtrees of the adapted nonterminals is infinite).
    Page 3, “Introduction”
  3. Informally, Adaptor Grammars can be viewed as caching entire subtrees of the adapted nonterminals.
    Page 3, “Introduction”
  4. This “rich-get-richer” behaviour causes the distribution of subtrees to follow a power law (the power is specified by the a parameter of the PYP).
    Page 4, “Introduction”
  5. Because Word is an adapted nonterminal, the adaptor grammar memoises Word subtrees, which corresponds to learning the phone sequences for the words of the language.
    Page 4, “Word segmentation with Adaptor Grammars”
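
A minimal sketch of the Pitman-Yor “rich-get-richer” dynamics behind this caching, under simplifying assumptions: integers stand in for the memoised subtrees, and the base distribution is trivialised to always produce a fresh item.

```python
import random

def pyp_draws(n, a=0.5, b=1.0):
    """Draw n items from a Pitman-Yor process with discount a and
    concentration b. After i draws with K distinct items, a new item is
    created with probability (b + a*K) / (b + i); otherwise item k is
    reused with probability proportional to (counts[k] - a)."""
    counts = []  # counts[k] = times cached item k has been drawn
    for i in range(n):
        if random.random() < (b + a * len(counts)) / (b + i):
            counts.append(1)  # brand-new cached item
        else:
            r = random.uniform(0, i - a * len(counts))
            for k, c in enumerate(counts):
                r -= c - a
                if r <= 0:
                    counts[k] += 1  # rich-get-richer reuse
                    break
            else:
                counts[-1] += 1  # guard against floating-point rounding
    return counts
```

Sorting the counts from a large run (e.g. pyp_draws(100000, a=0.8)) yields an approximately power-law frequency profile whose tail is governed by the discount a.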

segmentations

Appears in 4 sentences as: segmentations (4)
In Modelling function words improves unsupervised word segmentation
  1. 800 sample segmentations of each utterance.
    Page 6, “Word segmentation results”
  2. The most frequent segmentation in these 800 sample segmentations is the one we score in the evaluations below.
    Page 6, “Word segmentation results”
  3. Here we evaluate the word segmentations found by the “function word” Adaptor Grammar model described in section 2.3 and compare it to the baseline grammar with collocations and phonotactics from Johnson and Goldwater (2009).
    Page 6, “Word segmentation results”
  4. Figure 2 presents the standard token and lexicon (i.e., type) f-score evaluations for word segmentations proposed by these models (Brent, 1999), and Table 1 summarises the token and lexicon f-scores for the major models discussed in this paper.
    Page 6, “Word segmentation results”
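
The “most frequent segmentation” selection in snippet 2 above is straightforward to sketch; the sample format here (one tuple of words per sample) is an assumption for illustration, not the authors' data structure.

```python
from collections import Counter

def modal_segmentation(samples):
    """Return the segmentation occurring most often among the posterior
    samples collected for a single utterance."""
    return Counter(samples).most_common(1)[0][0]

# e.g. 800 samples for one utterance, faked here with three candidates:
samples = ([("ju", "wɑnt", "tu", "si", "ðə", "bʊk")] * 500
           + [("ju", "wɑnttu", "si", "ðə", "bʊk")] * 200
           + [("juwɑnt", "tu", "si", "ðə", "bʊk")] * 100)
print(modal_segmentation(samples))
# ('ju', 'wɑnt', 'tu', 'si', 'ðə', 'bʊk')
```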
