Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
Johnson, Mark

Article Structure

Abstract

Adaptor grammars (Johnson et al., 2007b) are a nonparametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees.

Introduction

How humans acquire language is arguably the central issue in the scientific study of language.

From PCFGs to Adaptor Grammars

This section introduces adaptor grammars as an extension of PCFGs; for a more detailed exposition see Johnson et al.

Word segmentation with adaptor grammars

We now turn to linguistic applications of adaptor grammars, specifically, to models of unsupervised word segmentation.

Conclusion and future work

This paper has shown how adaptor grammars can be used to study a variety of different linguistic hypotheses.

Topics

unigram

Appears in 25 sentences as: Unigram (3) unigram (27)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. Figure 1: The unigram word adaptor grammar, which uses a unigram model to generate a sequence of words, where each word is a sequence of phonemes.
    Page 4, “Word segmentation with adaptor grammars”
  2. 3.1 Unigram word adaptor grammar
    Page 4, “Word segmentation with adaptor grammars”
  3. Johnson et al. (2007a) presented an adaptor grammar that defines a unigram model of word segmentation and showed that it performs as well as the unigram DP word segmentation model presented by Goldwater et al. (2006a).
    Page 4, “Word segmentation with adaptor grammars”
  4. The adaptor grammar that encodes a unigram word segmentation model is shown in Figure 1.
    Page 4, “Word segmentation with adaptor grammars”
  5. Figure 2: The unigram word adaptor grammar of Figure 1 where regular expressions are expanded using new unadapted right-branching nonterminals.
    Page 4, “Word segmentation with adaptor grammars”
  6. Figure 3: A parse of the phonemic representation of “you want to see the book” produced by the unigram word adaptor grammar of Figure 1.
    Page 4, “Word segmentation with adaptor grammars”
  7. The unigram word adaptor grammar generates parses such as the one shown in Figure 3.
    Page 4, “Word segmentation with adaptor grammars”
  8. As noted by Goldwater et al. (2007), a unigram word segmentation model tends to undersegment and misanalyse collocations as individual words.
    Page 4, “Word segmentation with adaptor grammars”
  9. This is presumably because the unigram model has no way to capture dependencies between words in collocations except to make the collocation into a single word.
    Page 4, “Word segmentation with adaptor grammars”
  10. 3.2 Unigram morphology adaptor grammar
    Page 4, “Word segmentation with adaptor grammars”
  11. Figure 4: The unigram morphology adaptor grammar, which generates each Sentence as a sequence of Words, and each Word as a Stem optionally followed by a Suffix.
    Page 5, “Word segmentation with adaptor grammars”
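
A minimal sketch of the unigram word grammar quoted in items 1-7, written here as a Python rule table (the names UNIGRAM_RULES and ADAPTED are illustrative, not from the paper). The regular-expression rules of Figure 1 (Sentence -> Word+, Word -> Phoneme+) are expanded with right-branching helper nonterminals as in Figure 2:

    # Rules of the unigram word adaptor grammar (Figures 1 and 2),
    # with the regular expressions expanded right-branchingly.
    UNIGRAM_RULES = {
        "Sentence": [["Word", "Sentence"], ["Word"]],        # Sentence -> Word+
        "Word":     [["Phonemes"]],
        "Phonemes": [["Phoneme", "Phonemes"], ["Phoneme"]],  # Word -> Phoneme+
        # Phoneme -> one rule per phoneme in the corpus alphabet (omitted here).
    }

    # Word is the adapted nonterminal: whole Word subtrees are cached and
    # reused, so the model in effect learns a lexicon of word-sized units.
    ADAPTED = {"Word"}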


word segmentation

Appears in 23 sentences as: Word segmentation (1) word segmentation (27)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. We show that simultaneously learning syllable structure and collocations improves word segmentation accuracy compared to models that learn these independently.
    Page 1, “Introduction”
  2. This paper applies adaptor grammars to word segmentation and morphological acquisition.
    Page 2, “Introduction”
  3. We now turn to linguistic applications of adaptor grammars, specifically, to models of unsupervised word segmentation.
    Page 3, “Word segmentation with adaptor grammars”
  4. Table 1: Word segmentation f-score results for all models, as a function of DP concentration parameter α.
    Page 4, “Word segmentation with adaptor grammars”
  5. Table 1 summarizes the word segmentation f-scores for all models described in this paper.
    Page 4, “Word segmentation with adaptor grammars”
  6. Johnson et al. (2007a) presented an adaptor grammar that defines a unigram model of word segmentation and showed that it performs as well as the unigram DP word segmentation model presented by Goldwater et al. (2006a).
    Page 4, “Word segmentation with adaptor grammars”
  7. The adaptor grammar that encodes a unigram word segmentation model is shown in Figure 1.
    Page 4, “Word segmentation with adaptor grammars”
  8. With α = 1 and α = 10 we obtained a word segmentation f-score of 0.55.
    Page 4, “Word segmentation with adaptor grammars”
  9. As noted by Goldwater et al. (2007), a unigram word segmentation model tends to undersegment and misanalyse collocations as individual words.
    Page 4, “Word segmentation with adaptor grammars”
  10. This section investigates whether learning morphology together with word segmentation improves word segmentation accuracy.
    Page 4, “Word segmentation with adaptor grammars”
  11. Here we combine that adaptor grammar with the unigram word segmentation grammar to produce the adaptor grammar shown in Figure 4, which is designed to simultaneously learn both word segmentation and morphology.
    Page 5, “Word segmentation with adaptor grammars”
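
Item 11 above combines the unigram segmentation grammar with a simple morphology grammar. A sketch of that rule set, following the description of Figure 4 quoted in the previous section (the right-branching helpers and the choice of adapted nonterminals are assumptions in this sketch):

    # Sketch of the unigram morphology adaptor grammar (Figure 4):
    # a Sentence is a sequence of Words, and a Word is a Stem
    # optionally followed by a Suffix.
    MORPHOLOGY_RULES = {
        "Sentence": [["Word", "Sentence"], ["Word"]],   # Sentence -> Word+
        "Word":     [["Stem", "Suffix"], ["Stem"]],     # Word -> Stem (Suffix)
        "Stem":     [["Phonemes"]],
        "Suffix":   [["Phonemes"]],
        "Phonemes": [["Phoneme", "Phonemes"], ["Phoneme"]],
    }

    # Assumed here: Word, Stem and Suffix are all adapted, so recurring
    # words, stems and suffixes are cached as reusable units.
    ADAPTED_MORPHOLOGY = {"Word", "Stem", "Suffix"}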


subtrees

Appears in 12 sentences as: subtrees (15) subtree’s (1)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. Adaptor grammars (Johnson et al., 2007b) are a nonparametric Bayesian extension of Probabilistic Context-Free Grammars (PCFGs) which in effect learn the probabilities of entire subtrees.
    Page 1, “Abstract”
  2. Second, we can generalize over arbitrary subtrees rather than local trees in much the way done in DOP or tree substitution grammar (Bod, 1998; Joshi, 2003), which leads to adaptor grammars.
    Page 2, “Introduction”
  3. Informally, the units of generalization of adaptor grammars are entire subtrees, rather than just local trees, as in PCFGs.
    Page 2, “Introduction”
  4. Just as in tree substitution grammars, each of these subtrees behaves as a new context-free rule that expands the subtree’s root node to its leaves, but unlike a tree substitution grammar, in which the subtrees are specified in advance, in an adaptor grammar the subtrees, as well as their probabilities, are learnt from the training data.
    Page 2, “Introduction”
  5. In order to make parsing and inference tractable we require the leaves of these subtrees to be terminals, as explained in section 2.
    Page 2, “Introduction”
  6. Thus adaptor grammars are simple models of structure learning, where the subtrees that constitute the units of generalization are in effect new context-free rules learnt during the inference process.
    Page 2, “Introduction”
  7. An adaptor grammar can be viewed as a kind of PCFG in which each subtree of each adapted nonterminal A ∈ M is a potential rule, with its own probability, so an adaptor grammar is nonparametric if there are infinitely many possible adapted subtrees.
    Page 2, “From PCFGs to Adaptor Grammars”
  8. But any finite set of sample parses for any finite corpus can only involve a finite number of such subtrees, so the corresponding PCFG approximation only involves a finite number of rules, which permits us to build MCMC samplers for adaptor grammars.
    Page 2, “From PCFGs to Adaptor Grammars”
  9. Adaptor grammars relax this independence assumption by in effect learning the probability of the subtrees rooted in a specified subset M of the nonterminals known as the adapted nonterminals.
    Page 3, “From PCFGs to Adaptor Grammars”
  10. This DP “adapts” A’s PCFG distribution by moving mass from the infrequently to the frequently occurring subtrees.
    Page 3, “From PCFGs to Adaptor Grammars”
  11. Depending on the run, between 1,100 and 1,400 subtrees (i.e., new rules) were found for Word.
    Page 4, “Word segmentation with adaptor grammars”
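
The “adaptation” described in items 7-10 amounts to a Chinese-restaurant-process cache per adapted nonterminal: a previously generated subtree is reused with probability proportional to its count, and a new subtree is drawn from the underlying PCFG with probability proportional to the concentration parameter α. A minimal sketch in Python, assuming a caller-supplied base_sample() that draws a subtree from the PCFG (the function and its name are illustrative, not the authors' implementation):

    import random
    from collections import Counter

    def sample_adapted(cache, alpha, base_sample):
        """One draw from a CRP-adapted nonterminal.

        With probability n / (n + alpha) reuse a cached subtree, chosen in
        proportion to how often it has been generated before; otherwise draw
        a fresh subtree from the base PCFG distribution.  This is how mass
        moves from infrequently to frequently occurring subtrees.
        """
        n = sum(cache.values())
        if n > 0 and random.random() < n / (n + alpha):
            # reuse: pick a cached subtree with probability count / n
            r = random.randrange(n)
            for tree, count in cache.items():
                r -= count
                if r < 0:
                    chosen = tree
                    break
        else:
            chosen = base_sample()  # new subtree from the underlying PCFG
        cache[chosen] += 1          # remember it for future reuse
        return chosen

    # Example: cache = Counter(); sample_adapted(cache, 1.0, lambda: ("Word", "d", "o", "g"))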


f-score

Appears in 9 sentences as: f-score (10)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. We evaluated the f-score of the recovered word constituents (Goldwater et al., 2006b).
    Page 3, “Word segmentation with adaptor grammars”
  2. Table 1: Word segmentation f-score results for all models, as a function of DP concentration parameter α.
    Page 4, “Word segmentation with adaptor grammars”
  3. With α = 1 and α = 10 we obtained a word segmentation f-score of 0.55.
    Page 4, “Word segmentation with adaptor grammars”
  4. This grammar recovers words with an f-score of only 0.46 with α = 1 or α = 10, which is considerably less accurate than the unigram model of section 3.1.
    Page 5, “Word segmentation with adaptor grammars”
  5. The unigram syllable grammar achieves a word segmentation f-score of 0.52 at α = 1, which is also lower than the unigram word grammar achieves.
    Page 6, “Word segmentation with adaptor grammars”
  6. With the DP concentration parameters α = 1000 we obtained an f-score of 0.76, which is approximately the same as the results reported by Goldwater et al.
    Page 6, “Word segmentation with adaptor grammars”
  7. This grammar achieves a word segmentation f-score of 0.73 at α = 100, which is much better than the unigram morphology grammar of section 3.2, but not as good as the collocation word grammar of the previous section.
    Page 7, “Word segmentation with adaptor grammars”
  8. “tell”), but they lower the word f-score considerably.
    Page 7, “Word segmentation with adaptor grammars”
  9. This grammar achieves a word segmentation f-score of 0.78 at α = 100, which is the highest f-score of any of the grammars investigated in this paper, including the collocation word grammar, which models collocations but not syllables.
    Page 7, “Word segmentation with adaptor grammars”
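
The f-scores quoted above are computed over recovered word constituents. A minimal sketch of that evaluation, under the assumption that a predicted word counts as correct only when its character span exactly matches a gold word (the function name and span convention are illustrative):

    def word_fscore(gold_segmentations, predicted_segmentations):
        """Token f-score for word segmentation.

        Each argument is a list of utterances, each utterance a list of words.
        A predicted word is correct only if both of its boundaries match a
        gold word, i.e. its character span is identical to a gold span.
        """
        def spans(words):
            out, start = set(), 0
            for w in words:
                out.add((start, start + len(w)))
                start += len(w)
            return out

        correct = guessed = gold = 0
        for g, p in zip(gold_segmentations, predicted_segmentations):
            gold_spans, pred_spans = spans(g), spans(p)
            correct += len(gold_spans & pred_spans)
            guessed += len(pred_spans)
            gold += len(gold_spans)
        precision, recall = correct / guessed, correct / gold
        return 2 * precision * recall / (precision + recall)

    # Example: gold "you want to" vs. a prediction that undersegments "youwant":
    # word_fscore([["you", "want", "to"]], [["youwant", "to"]])  -> 0.4 (precision 1/2, recall 1/3)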


segmentation model

Appears in 6 sentences as: segmentation model (6)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. Johnson et al. (2007a) presented an adaptor grammar that defines a unigram model of word segmentation and showed that it performs as well as the unigram DP word segmentation model presented by Goldwater et al. (2006a).
    Page 4, “Word segmentation with adaptor grammars”
  2. The adaptor grammar that encodes a unigram word segmentation model is shown in Figure 1.
    Page 4, “Word segmentation with adaptor grammars”
  3. As noted by Goldwater et al. (2007), a unigram word segmentation model tends to undersegment and misanalyse collocations as individual words.
    Page 4, “Word segmentation with adaptor grammars”
  4. It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance.
    Page 6, “Word segmentation with adaptor grammars”
  5. This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model.
    Page 6, “Word segmentation with adaptor grammars”
  6. We showed how adaptor grammars can implement a previously investigated model of unsupervised word segmentation, the unigram word segmentation model.
    Page 8, “Conclusion and future work”


hyperparameters

Appears in 4 sentences as: hyperparameters (4)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. We tied the Dirichlet Process concentration parameters α, and performed runs with α = 1, 10, 100 and 1000; apart from this, no attempt was made to optimize the hyperparameters.
    Page 4, “Word segmentation with adaptor grammars”
  2. It may be possible to correct this by “tuning” the grammar’s hyperparameters, but we did not attempt this here.
    Page 5, “Word segmentation with adaptor grammars”
  3. In this paper all of the hyperparameters αA were tied and varied simultaneously, but it is desirable to learn these from data as well.
    Page 8, “Conclusion and future work”
  4. Just before the camera-ready version of this paper was due we developed a method for estimating the hyperparameters by putting a vague Gamma hyper-prior on each αA and sampling using Metropolis-Hastings with a sequence of increasingly narrow Gamma proposal distributions, producing results for each model that are as good or better than the best ones reported in Table 1.
    Page 8, “Conclusion and future work”
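
The procedure described in item 4 (a vague Gamma hyper-prior on each αA, sampled with Metropolis-Hastings) might be sketched as below; the proposal scheme and the log_likelihood(alpha) argument are assumptions for illustration, not the authors' exact implementation:

    import math
    import random

    def mh_sample_alpha(alpha, log_likelihood, prior_shape=0.001, prior_rate=0.001,
                        proposal_shape=10.0, n_steps=10):
        """Metropolis-Hastings updates for a DP concentration parameter alpha.

        Prior: vague Gamma(prior_shape, prior_rate) on alpha.
        Proposal: a Gamma distribution whose mean is the current value; a
        larger proposal_shape gives a narrower proposal.  log_likelihood(alpha)
        must return the log probability of the current analyses given alpha.
        """
        def log_gamma_pdf(x, shape, rate):
            # log density of Gamma(shape, rate) at x
            return (shape * math.log(rate) - math.lgamma(shape)
                    + (shape - 1.0) * math.log(x) - rate * x)

        for _ in range(n_steps):
            # propose alpha' ~ Gamma(proposal_shape, scale=alpha/proposal_shape),
            # whose mean is the current alpha
            proposal = random.gammavariate(proposal_shape, alpha / proposal_shape)
            log_accept = (log_likelihood(proposal) - log_likelihood(alpha)
                          + log_gamma_pdf(proposal, prior_shape, prior_rate)
                          - log_gamma_pdf(alpha, prior_shape, prior_rate)
                          # Hastings correction for the asymmetric proposal
                          + log_gamma_pdf(alpha, proposal_shape, proposal_shape / proposal)
                          - log_gamma_pdf(proposal, proposal_shape, proposal_shape / alpha))
            if random.random() < math.exp(min(0.0, log_accept)):
                alpha = proposal
        return alpha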


bigram

Appears in 3 sentences as: bigram (4)
In Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure
  1. It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance.
    Page 6, “Word segmentation with adaptor grammars”
  2. This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model.
    Page 6, “Word segmentation with adaptor grammars”
  3. We then investigated adaptor grammars that incorporate one additional kind of information, and found that modeling collocations provides the greatest improvement in word segmentation accuracy, resulting in a model that seems to capture many of the same interword dependencies as the bigram model of Goldwater et al.
    Page 8, “Conclusion and future work”
