The effect of non-tightness on Bayesian estimation of PCFGs
Cohen, Shay B. and Johnson, Mark

Article Structure

Abstract

Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one).
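In symbols (a minimal formalization of the property described above; the notation Z(θ), T_G and P_θ(t) is assumed here for illustration rather than quoted from the paper):

    % T_G: the set of trees generated by G; P_theta(t): the product of the
    % probabilities of the rules used in tree t
    Z(\theta) \;=\; \sum_{t \in T_G} P_\theta(t) \;\le\; 1,
    \qquad \text{and } G \text{ is tight iff } Z(\theta) = 1 .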

Introduction

Probabilistic Context-Free Grammars (PCFGs) play a special role in computational linguistics because they are perhaps the simplest probabilistic models of hierarchical structures.

PCFGs and tightness

Let G = (T, N, S, R) be a Context-Free Grammar in Chomsky normal form with no useless productions, where T is a finite set of terminal symbols,
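A concrete illustration of how non-tightness arises (the standard textbook example, not necessarily the grammar used in the paper): take a grammar with the two rules S → S S with probability p and S → a with probability 1 − p. The probability q that a derivation from S terminates satisfies

    q \;=\; (1 - p) + p\,q^{2}
    \quad\Longrightarrow\quad
    q \;=\; \min\!\Bigl(1,\; \tfrac{1-p}{p}\Bigr),

so the grammar is tight exactly when p ≤ 1/2; for p > 1/2 the probabilities of the finite trees sum to (1 − p)/p < 1.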

Bayesian inference for PCFGs

The goal of Bayesian inference for PCFGs is to infer a posterior distribution over the rule probability vectors θ given observed data D. This posterior distribution is obtained by combining the likelihood P(D | θ) with a prior distribution P(θ) over θ using Bayes' rule.
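Written out (a restatement of the sentence above, with θ the rule probabilities, D the data and α the prior's hyperparameters):

    P(\theta \mid D, \alpha)
    \;=\; \frac{P(D \mid \theta)\, P(\theta \mid \alpha)}{P(D \mid \alpha)}
    \;\propto\; P(D \mid \theta)\, P(\theta \mid \alpha) .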

Topics

hyperparameters

Appears in 9 sentences as: hyperparameters (9)
In The effect of non-tightness on Bayesian estimation of PCFGs
  1. Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ⁰.
    Page 5, “Bayesian inference for PCFGs”
  2. Result: A vector of parameters θ repeat draw θ from a product of Dirichlets with hyperparameters α + f(t)
    Page 5, “Bayesian inference for PCFGs”
  3. Input: Grammar G, vector of trees t, vector of hyperparameters α, previous rule parameters θ⁰.
    Page 5, “Bayesian inference for PCFGs”
  4. Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ⁰.
    Page 6, “Bayesian inference for PCFGs”
  5. Result: A vector of parameters θ draw θ from a product of Dirichlets with hyperparameters α + f(t) return θ Algorithm 3: An algorithm for generating samples
    Page 6, “Bayesian inference for PCFGs”
  6. Proposition 2 Let P(θ | α) be a prior with hyperparameters α over the parameters of G such that P is conjugate to the grammar likelihood.
    Page 6, “Bayesian inference for PCFGs”
  7. Algorithm 1 samples from a product of Dirichlets distribution with hyperparameters α + f(t) repeatedly, each time checking and rejecting the sample until we obtain a tight PCFG. [A sketch of this loop follows this list.]
    Page 6, “Bayesian inference for PCFGs”
  8. The more mass the Dirichlet distribution with hyperparameters α + f(t) puts on non-tight PCFGs, the more rejections will happen.
    Page 6, “Bayesian inference for PCFGs”
  9. Input: Grammar G, vector of hyperparameters α, vector of strings w = (w1, ...
    Page 7, “Bayesian inference for PCFGs”
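Items 7–8 describe a rejection sampler: draw θ from a product of Dirichlets with hyperparameters α + f(t) and resample until the resulting PCFG is tight. The Python sketch below illustrates that loop; the function names, the grammar/data layout, and the use of the spectral radius of the nonterminal expectation matrix as the tightness test are assumptions made here for illustration, not the paper's code.

    import numpy as np

    def sample_rule_probs(alpha_plus_f, rng):
        # Draw theta from a product of Dirichlets: one Dirichlet per
        # left-hand-side nonterminal A, with parameters alpha_r + f_r(t)
        # for each rule A -> beta (alpha_plus_f[A] aligned with rules[A]).
        return {A: rng.dirichlet(params) for A, params in alpha_plus_f.items()}

    def is_tight(rules, theta, nonterminals):
        # Tightness test used in this sketch (an assumption): build the
        # expectation matrix M, where M[A, B] is the expected number of B
        # symbols produced by one rewrite of A, and accept when its spectral
        # radius is at most 1; a radius greater than 1 means the branching
        # process is supercritical, i.e. the PCFG is not tight.
        idx = {A: i for i, A in enumerate(nonterminals)}
        M = np.zeros((len(nonterminals), len(nonterminals)))
        for A, expansions in rules.items():
            for k, rhs in enumerate(expansions):      # rhs: tuple of symbols
                for sym in rhs:
                    if sym in idx:
                        M[idx[A], idx[sym]] += theta[A][k]
        return np.abs(np.linalg.eigvals(M)).max() <= 1.0 + 1e-12

    def rejection_sample_tight(rules, nonterminals, alpha_plus_f, rng=None):
        # Algorithm-1-style loop: resample theta until the PCFG is tight.
        # The more mass the Dirichlet with parameters alpha + f(t) puts on
        # non-tight PCFGs, the more rejections this loop performs (item 8).
        rng = rng or np.random.default_rng()
        while True:
            theta = sample_rule_probs(alpha_plus_f, rng)
            if is_tight(rules, theta, nonterminals):
                return theta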


parse trees

Appears in 8 sentences as: parse trees (8)
In The effect of non-tightness on Bayesian estimation of PCFGs
  1. Cognitively it is implausible that children can perceive the parse trees of the language they are learning, but it is more reasonable to assume that they can obtain the terminal strings or yield of these trees.
    Page 1, “Introduction”
  2. (2007) proposed MCMC samplers for the posterior distribution over rule probabilities and the parse trees of the training data strings.
    Page 1, “Introduction”
  3. estimates are always tight for both the supervised case (where the input consists of parse trees) and the unsupervised case (where the input consists of yields or terminal strings).
    Page 2, “Introduction”
  4. In the supervised setting the data D consists of a corpus of parse trees D = (t1, ...
    Page 3, “Bayesian inference for PCFGs”
  5. Ignoring issues of tightness for the moment and setting P(t | θ) = μθ(t), this means that in the supervised setting the posterior distribution P(θ | t, α) given a set of parse trees t = (t1, ... [the conjugate form of this posterior is written out after this list]
    Page 5, “Bayesian inference for PCFGs”
  6. The algorithms we give here are based on their Gibbs sampler, which in each iteration first samples parse trees
    Page 6, “Bayesian inference for PCFGs”
  7. We compute the posterior distribution over parse trees for the string w1.
    Page 7, “Bayesian inference for PCFGs”
  8. The grammar generates three parse trees for w1, namely:
    Page 7, “Bayesian inference for PCFGs”
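Items 4–5, read together with the hyperparameters snippets above, give the usual conjugate form of the supervised posterior (reconstructed here from the snippets; f_r(t) is the number of times rule r is used in the trees t):

    P(\theta \mid t, \alpha)
    \;\propto\; \prod_{r \in R} \theta_r^{f_r(t)} \cdot \prod_{r \in R} \theta_r^{\alpha_r - 1}
    \;=\; \prod_{r \in R} \theta_r^{f_r(t) + \alpha_r - 1},

i.e., a product of Dirichlets with hyperparameters α + f(t), which is exactly the distribution the sampling snippets draw from; as item 5 notes, this form ignores tightness.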


Gibbs sampler

Appears in 5 sentences as: Gibbs sampler (5)
In The effect of non-tightness on Bayesian estimation of PCFGs
  1. We show how to modify the Gibbs sampler described by Johnson et al.
    Page 2, “Introduction”
  2. Perhaps surprisingly, we show that the Gibbs sampler as defined by Johnson et al.
    Page 2, “Introduction”
  3. The algorithms we give here are based on their Gibbs sampler, which in each iteration first samples parse trees [a schematic version of this loop follows this list]
    Page 6, “Bayesian inference for PCFGs”
  4. 1–3) for P(θ | t, α) into the generic Gibbs sampler framework of Johnson et al.
    Page 7, “Bayesian inference for PCFGs”
  5. Figure 1 plots the density of F1 scores (compared to the gold standard) resulting from the Gibbs sampler, using all three approaches.
    Page 8, “Bayesian inference for PCFGs”
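Items 3–4 describe a Gibbs sampler that alternates between sampling parse trees for the training strings and resampling the rule probabilities. The loop below is a schematic Python sketch of that alternation; sample_parse_trees and rule_counts are hypothetical helpers (a real implementation would sample a tree per string with the inside algorithm), and it shows only the basic sampler, without the tightness handling that distinguishes the paper's three approaches.

    import numpy as np

    def gibbs_pcfg(strings, nonterminals, alpha, n_iters,
                   sample_parse_trees, rule_counts, rng=None):
        # alpha[A]: Dirichlet hyperparameters for the rules with left-hand side A
        # sample_parse_trees(strings, theta) -> list of trees   (hypothetical helper)
        # rule_counts(trees) -> {A: count vector f(t) aligned with alpha[A]}  (hypothetical)
        rng = rng or np.random.default_rng()
        # initialise theta by drawing from the prior, a product of Dirichlets
        theta = {A: rng.dirichlet(alpha[A]) for A in nonterminals}
        samples = []
        for _ in range(n_iters):
            # Step 1: sample parse trees t for the strings given the current theta
            trees = sample_parse_trees(strings, theta)
            # Step 2: resample theta from a product of Dirichlets with
            # hyperparameters alpha + f(t); a rejection step for tightness
            # (see the sketch after the hyperparameters list) could be added here
            f = rule_counts(trees)
            theta = {A: rng.dirichlet(np.asarray(alpha[A]) + f[A])
                     for A in nonterminals}
            samples.append(theta)
        return samples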


grammar induction

Appears in 4 sentences as: grammar induction (4)
In The effect of non-tightness on Bayesian estimation of PCFGs
  1. 9 Empirical effects of the three approaches in unsupervised grammar induction
    Page 8, “Bayesian inference for PCFGs”
  2. In this section we present experiments using the three samplers just described in an unsupervised grammar induction problem.
    Page 8, “Bayesian inference for PCFGs”
  3. Our goal here is not to improve the state-of-the-art in unsupervised grammar induction, but to try to measure empirical differences in the estimates produced by the three different approaches to tightness just described.
    Page 8, “Bayesian inference for PCFGs”
  4. We described samplers for the supervised and unsupervised settings for each of these approaches, and applied them to an unsupervised grammar induction problem.
    Page 9, “Bayesian inference for PCFGs”

