Spectral Learning of Latent-Variable PCFGs
Shay B. Cohen, Karl Stratos, Michael Collins, Dean P. Foster, and Lyle Ungar

Article Structure

Abstract

We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006).

Introduction

Statistical models with hidden or latent variables are of great importance in natural language processing, speech, and many other fields.

Related Work

For work on L-PCFGs using the EM algorithm, see Petrov et al. (2006).

Notation

Given a matrix $A$ or a vector $v$, we write $A^\top$ or $v^\top$ for the associated transpose.

L-PCFGs: Basic Definitions

This section gives a definition of the L-PCFG formalism used in this paper.

Tensor Form of the Inside-Outside Algorithm

Given an L-PCFG, two calculations are central:
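
In the paper, the two calculations in question are computing the probability of a skeletal tree and computing the marginals $\mu(a, i, j)$ used for parsing. As a rough illustration of the tensor-form inside pass that underlies both, here is a minimal Python sketch: C, c_inf, and c1 stand in for the paper's parameters $C^{a \to b c}$, $c^{a \to x}_\infty$, and $c^1_a$, while the tuple tree encoding, shapes, and random values are assumptions made for this example, not the paper's actual algorithm.

    import numpy as np

    # Minimal sketch of a tensor-form inside pass over one skeletal tree,
    # with the latent states marginalized out by tensor contractions.
    # C, c_inf, and c1 are illustrative stand-ins for the paper's
    # C^{a->bc}, c^{a->x}_inf, and c^1_a; the tree encoding and the
    # random "parameters" are invented for this example.

    m = 8  # number of latent states per nonterminal
    rng = np.random.default_rng(0)

    C = {("S", "NP", "VP"): rng.random((m, m, m))}  # one tensor per binary rule
    c_inf = {("NP", "dogs"): rng.random(m),         # one vector per lexical rule
             ("VP", "bark"): rng.random(m)}
    c1 = {"S": rng.random(m)}                       # one root vector per nonterminal

    def inside(node):
        """Return the inside vector of a node, summing over latent states."""
        if len(node) == 2:                  # leaf: lexical rule a -> x
            return c_inf[node]
        a, left, right = node               # internal node: binary rule a -> b c
        y1, y2 = inside(left), inside(right)
        # [result]_h = sum_{i,j} C[(a,b,c)]_{h,i,j} * y1_i * y2_j
        return np.einsum("hij,i,j->h", C[(a, left[0], right[0])], y1, y2)

    tree = ("S", ("NP", "dogs"), ("VP", "bark"))
    p = float(c1["S"] @ inside(tree))       # score of the whole skeletal tree
    print(p)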

Estimating the Tensor Model

A crucial result is that it is possible to directly estimate parameters $C^{a \to b c}$, $c^{a \to x}_\infty$, and $c^1_a$ that satisfy the conditions in Theorem 1, from a training sample consisting of s-trees (i.e., trees where hidden variables are unobserved).

Deriving Empirical Estimates

Figure 4 shows an algorithm that derives estimates of the quantities in Eqs 2, 3, and 4.
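
All of these estimates share one method-of-moments shape: a population expectation is replaced by a sample average over node instances in the training trees. Schematically (a hedged paraphrase, not the paper's exact equations),

    $\hat{E}[f(T, O) \mid A = a] \;=\; \frac{1}{M_a} \sum_{i \,:\, a_i = a} f(t_i, o_i),$

where the sum ranges over the $M_a$ training-tree nodes labeled $a$, each contributing its inside tree $t_i$ and outside tree $o_i$.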

Discussion

There are several potential applications of the method.

Proofs

This section gives proofs of Theorems 1 and 2.

Topics

learning algorithm

Appears in 5 sentences as: learning algorithm (3) learning algorithms (2)
In Spectral Learning of Latent-Variable PCFGs
  1. We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006).
    Page 1, “Abstract”
  2. Recent work has introduced polynomial-time learning algorithms (and consistent estimation methods) for two important cases of hidden-variable models: Gaussian mixture models (Dasgupta, 1999; Vempala and Wang, 2004) and hidden Markov models (Hsu et al., 2009).
    Page 1, “Introduction”
  3. Parikh et al. (2011) consider spectral learning algorithms of tree-structured directed Bayes nets.
    Page 2, “Related Work”
  4. We now state a PAC-style theorem for the learning algorithm.
    Page 6, “Deriving Empirical Estimates”
  5. Figure 4: The spectral learning algorithm.
    Page 7, “Proofs”

feature vectors

Appears in 3 sentences as: feature vector (1) feature vectors (2)
In Spectral Learning of Latent-Variable PCFGs
  1. We assume a function $\psi$ that maps outside trees $o$ to feature vectors $\psi(o) \in \mathbb{R}^{d'}$.
    Page 5, “Estimating the Tensor Model”
  2. For example, the feature vector might track the rule directly above the node in question, the word following the node in question, and so on.
    Page 5, “Estimating the Tensor Model”
  3. We also assume a function $\phi$ that maps inside trees $t$ to feature vectors $\phi(t) \in \mathbb{R}^{d}$.
    Page 5, “Estimating the Tensor Model”
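
The sentences quoted above describe the feature functions only abstractly. As a toy illustration, here is one possible indicator-style map from inside trees to $\mathbb{R}^d$, using the tuple tree encoding from the earlier sketch; the feature templates, the hashing trick, and the dimension d are assumptions for this example, not the paper's feature set.

    import numpy as np
    from zlib import crc32

    d = 1024  # feature dimension (assumed)

    def first_word(node):
        """Leftmost word dominated by an inside tree (helper, invented here)."""
        return node[1] if len(node) == 2 else first_word(node[1])

    def phi(node):
        """Map an inside tree to an indicator feature vector in R^d."""
        if len(node) == 2:                          # lexical rule a -> x
            feats = [f"rule={node[0]}->{node[1]}"]
        else:                                       # binary rule a -> b c
            feats = [f"rule={node[0]}->{node[1][0]} {node[2][0]}",
                     f"first-word={first_word(node)}"]
        v = np.zeros(d)
        for f in feats:
            v[crc32(f.encode()) % d] = 1.0          # hashed indicator feature
        return v

    print(phi(("S", ("NP", "dogs"), ("VP", "bark"))).sum())  # 2.0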

SVD

Appears in 3 sentences as: SVD (3)
In Spectral Learning of Latent-Variable PCFGs
  1. These algorithms use spectral methods: that is, algorithms based on eigenvector decompositions of linear systems, in particular singular value decomposition (SVD).
    Page 1, “Introduction”
  2. The first step is to take an SVD of the training examples, followed by a projection of the training examples down to a low-dimensional space.
    Page 1, “Introduction”
  3. The following lemma justifies the use of an SVD calculation as one method for finding values for $U^a$ and $V^a$ that satisfy condition 2:
    Page 5, “Estimating the Tensor Model”
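
Taken together, the quoted steps amount to: form an empirical inside/outside cross-moment for each nonterminal, take its SVD, and keep the top-$m$ singular vectors as projection matrices. A minimal sketch under those assumptions, with random matrices standing in for real $\phi$/$\psi$ feature vectors and the names U_a, V_a mirroring the lemma quoted above:

    import numpy as np

    m = 8                                # target latent dimension
    rng = np.random.default_rng(0)
    Phi = rng.random((1000, 50))         # rows: inside-tree features phi(t)
    Psi = rng.random((1000, 40))         # rows: outside-tree features psi(o)

    # Empirical cross-moment E_hat[phi(T) psi(O)^T] for one nonterminal a.
    Omega_hat = Phi.T @ Psi / Phi.shape[0]

    # SVD, then keep the top-m left/right singular vectors as projections.
    U, s, Vt = np.linalg.svd(Omega_hat)
    U_a, V_a = U[:, :m], Vt[:m].T

    Y = Phi @ U_a   # projected inside features,  shape (1000, m)
    Z = Psi @ V_a   # projected outside features, shape (1000, m)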
