Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
Hiroyuki Shindo, Yusuke Miyao, Akinori Fujino and Masaaki Nagata

Article Structure

Abstract

We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing.

Introduction

Syntactic parsing has played a central role in natural language processing.

Background and Related Work

Our SR-TSG work is built upon recent work on Bayesian TSG induction from parse trees (Post and Gildea, 2009; Cohn et al., 2010).

Symbol-Refined Tree Substitution Grammars

In this section, we propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing.

Inference

We use Markov Chain Monte Carlo (MCMC) sampling to infer the SR-TSG derivations from parse trees.

Experiment

5.1 Settings
5.1.1 Data Preparation

Conclusion

We have presented an SR-TSG, which is an extension of the conventional TSG model where each symbol of tree fragments can be automatically subcategorized to address the problem of the conditional independence assumptions of a TSG.

Topics

parse trees

Appears in 16 sentences as: parse tree (6) parse trees (12)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Our SR-TSG work is built upon recent work on Bayesian TSG induction from parse trees (Post and Gildea, 2009; Cohn et al., 2010).
    Page 2, “Background and Related Work”
  2. A derivation is a process of forming a parse tree.
    Page 2, “Background and Related Work”
  3. Figure 1a shows an example parse tree and Figure 1b shows its example TSG derivation.
    Page 2, “Background and Related Work”
  4. Since different derivations may produce the same parse tree, recent work on TSG induction (Post and Gildea, 2009; Cohn et al., 2010) employs a probabilistic model of a TSG and predicts derivations from observed parse trees in an unsupervised way.
    Page 2, “Background and Related Work”
  5. The posterior distribution over elementary trees given a parse tree t can be computed by using Bayes’ rule:
    Page 2, “Background and Related Work” (see the worked equation after this list)
  6. Therefore, the task of TSG induction from parse trees turns out to consist of modeling the prior distribution p(e).
    Page 2, “Background and Related Work”
  7. As with previous work on TSG induction, our task is the induction of SR-TSG derivations from a corpus of parse trees in an unsupervised fashion.
    Page 3, “Symbol-Refined Tree Substitution Grammars”
  8. That is, we wish to infer the symbol subcategories of every node and substitution site (i.e., nodes where substitution occurs) from parse trees.
    Page 3, “Symbol-Refined Tree Substitution Grammars”
  9. We use Markov Chain Monte Carlo (MCMC) sampling to infer the SR-TSG derivations from parse trees.
    Page 4, “Inference”
  10. We first infer latent symbol subcategories for every symbol in the parse trees, and then infer latent substitution sites stepwise.
    Page 5, “Inference”
  11. After that, we unfix that assumption and infer latent substitution sites given symbol-refined parse trees.
    Page 5, “Inference”
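
The relationship quoted in items 5 and 6 can be written out explicitly. As a worked sketch in generic notation (e denotes a derivation, i.e., a sequence of elementary trees, and t the observed parse tree; this follows the standard Bayesian TSG setup of the cited work rather than the paper's exact notation), Bayes' rule gives

    p(e \mid t) = \frac{p(t \mid e)\, p(e)}{p(t)} \propto p(t \mid e)\, p(e)

Because a derivation deterministically yields exactly one parse tree, p(t | e) is 1 when e produces t and 0 otherwise, so the posterior over derivations of t is just the prior p(e) restricted to derivations that yield t. This is why TSG induction from parse trees reduces to modeling the prior distribution p(e).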

latent variables

Appears in 7 sentences as: latent variables (7)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. The inference of the SR-TSG derivations corresponds to inferring two kinds of latent variables: latent symbol subcategories and latent substitution sites.
    Page 4, “Inference” (see the sketch after this list)
  2. This stepwise learning is simple and efficient in practice, but we believe that the joint learning of both latent variables is possible, and we will deal with this in future work.
    Page 5, “Inference”
  3. This sampler simultaneously updates blocks of latent variables associated with a sentence, and thus it can find MAP solutions efficiently.
    Page 5, “Inference”
  4. The tree-level sampler also simultaneously updates blocks of latent variables associated with the type of SR-TSG rules, and thus it can find MAP solutions efficiently.
    Page 5, “Inference”
  5. From this viewpoint, TSG utilizes surrounding symbols (NNP of NPNNP in the above example) as latent variables with which to capture context information.
    Page 8, “Experiment”
  6. as latent variables and the search space is larger than that of a TSG when the symbol refinement model allows for more than two subcategories for each symbol.
    Page 8, “Experiment”
  7. Our experimental results confirm that jointly modeling both latent variables using our SR-TSG assists accurate parsing.
    Page 8, “Experiment”
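
As a concrete illustration of item 1, the sketch below shows an assumed tree-node representation (Python; not the paper's code): each node carries a latent symbol subcategory and a latent binary flag marking whether it is a substitution site. The stepwise learning of item 2 then amounts to first resampling the subcategory fields with the flags held fixed, and afterwards resampling the flags with the subcategories held fixed.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Node:
        symbol: str                        # coarse treebank symbol, e.g. "NP"
        subcategory: int = 0               # latent symbol subcategory: NP-0, NP-1, ...
        is_substitution_site: bool = True  # latent flag: fragment boundary vs. fragment-internal node
        children: List["Node"] = field(default_factory=list)

        def refined_symbol(self) -> str:
            return f"{self.symbol}-{self.subcategory}"

    # Example: an NP node refined to subcategory 1 that lies inside a larger tree fragment.
    np_node = Node("NP", subcategory=1, is_substitution_site=False)
    print(np_node.refined_symbol())        # prints "NP-1"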

treebank

Appears in 7 sentences as: Treebank (3) treebank (4)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
    Page 1, “Abstract”
  2. Probabilistic context-free grammar (PCFG) underlies many statistical parsers; however, it is well known that the PCFG rules extracted from treebank data via maximum likelihood estimation do not perform well due to unrealistic context freedom assumptions (Klein and Manning, 2003).
    Page 1, “Introduction”
  3. Symbol refinement is a successful approach for weakening context freedom assumptions by dividing coarse treebank symbols (e.g.
    Page 1, “Introduction”
  4. Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
    Page 2, “Introduction”
  5. We ran experiments on the Wall Street Journal (WSJ) portion of the English Penn Treebank data set (Marcus et al., 1993), using a standard data split (sections 2-21 for training, 22 for development and 23 for testing).
    Page 6, “Experiment”
  6. The treebank data is right-binarized (Matsuzaki et al., 2005) to construct grammars with only unary and binary productions.
    Page 6, “Experiment” (see the sketch after this list)
  7. This result suggests that the conventional TSG model trained from the vanilla treebank is insufficient to resolve
    Page 6, “Experiment”
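
Item 6 mentions right-binarization of the treebank before grammar construction. The sketch below shows one common way to right-binarize a tree represented as nested (label, children) tuples; the intermediate-symbol naming (label + "|") is an assumption for illustration, not necessarily the convention used in the paper.

    def right_binarize(label, children):
        # Binarize the children first, then this node.
        children = [right_binarize(*c) if isinstance(c, tuple) else c for c in children]
        if len(children) <= 2:
            return (label, children)
        # Keep the leftmost child and collect the rest under right-branching
        # intermediate nodes, so that every production is unary or binary.
        return (label, [children[0], _collect_right(label + "|", children[1:])])

    def _collect_right(intermediate, children):
        if len(children) <= 2:
            return (intermediate, children)
        return (intermediate, [children[0], _collect_right(intermediate, children[1:])])

    # Example: S -> NP VP PP .  becomes  S -> NP S| ;  S| -> VP S| ;  S| -> PP .
    tree = ("S", [("NP", ["we"]), ("VP", ["run"]), ("PP", ["fast"]), (".", ["."])])
    print(right_binarize(*tree))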

F1 score

Appears in 6 sentences as: F1 score (4) F1 scores (2)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
    Page 2, “Introduction”
  2. We used EVALB to compute the F1 score.
    Page 6, “Experiment” (see the formula after this list)
  3. Table 2 shows the F1 scores of the CFG, TSG and SR-TSG parsers for small and full training sets.
    Page 6, “Experiment”
  4. Table 3 shows the F1 scores of an SR-TSG and conventional parsers with the full training set.
    Page 7, “Experiment”
  5. Our SR-TSG (single) parser achieved an F1 score of 91.1%, which is a 6.4 point improvement over the conventional Bayesian TSG parser reported by Cohn et al. (2010).
    Page 8, “Experiment”
  6. The combination model, SR-TSG (multiple), achieved an F1 score of 92.4%, which is a state-of-the-art result for the WSJ parsing task.
    Page 8, “Experiment”
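
For reference, the F1 score in item 2 is the standard labeled-bracketing measure computed by EVALB. With C the number of correctly labeled constituents in the parser output, P the total number of constituents the parser proposes, and G the number of constituents in the gold tree:

    \text{precision} = \frac{C}{P}, \qquad \text{recall} = \frac{C}{G}, \qquad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}

For example, a parse that recovers 9 of 10 gold constituents while proposing 12 has precision 0.75, recall 0.9, and F1 = 2(0.75)(0.9)/1.65 ≈ 0.818.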

reranking

Appears in 6 sentences as: reranking (6) reranks (1)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
    Page 1, “Abstract”
  2. Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
    Page 2, “Introduction”
  3. It should be noted that discriminative reranking parsers such as those of Charniak and Johnson (2005) and Huang (2008) are constructed on a generative parser.
    Page 8, “Experiment”
  4. The reranking parser takes the k-best lists of candidate trees or a packed forest produced by a baseline parser (usually a generative model), and then reranks the candidates using arbitrary features.
    Page 8, “Experiment” (see the sketch after this list)
  5. Hence, we can expect that combining our SR-TSG model with a discriminative reranking parser would provide better performance than SR-TSG alone.
    Page 8, “Experiment”
  6. Compared with discriminative reranking parsers, combining multiple grammars by using the product model provides the advantage that it does not require any additional training.
    Page 8, “Experiment”
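
Item 4 describes the generic k-best reranking recipe. The sketch below is a minimal illustration of that recipe; the feature names, weights, and candidate trees are invented for the example and are not the feature sets of Charniak and Johnson (2005) or Huang (2008).

    def rerank(kbest, weights, extract_features):
        """kbest: list of (tree, base_log_prob) pairs from a baseline generative parser."""
        def score(tree, base_log_prob):
            feats = extract_features(tree)
            feats["base_log_prob"] = base_log_prob   # the baseline score is itself a feature
            return sum(weights.get(name, 0.0) * value for name, value in feats.items())
        return max(kbest, key=lambda candidate: score(*candidate))[0]

    # Toy usage with invented features and weights.
    weights = {"base_log_prob": 1.0, "n_np": 0.3}
    extract = lambda tree: {"n_np": float(tree.count("(NP"))}
    kbest = [("(S (NP we) (VP run))", -4.2), ("(S (VP run) (NP we))", -3.9)]
    print(rerank(kbest, weights, extract))

Combining multiple grammars with a product model (item 6), by contrast, multiplies the scores the component grammars assign to each candidate and so needs no additional feature weights to be trained.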

Gibbs sampler

Appears in 5 sentences as: Gibbs sampler (4) Gibbs sampling (1)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. In each splitting step, we use two types of blocked MCMC algorithms: the sentence-level blocked Metropolis-Hastings (MH) sampler and the tree-level blocked Gibbs sampler, while Petrov et al. (2006) use a different MLE-based model and the EM algorithm.
    Page 5, “Inference”
  2. The tree-level blocked Gibbs sampler focuses on the type of SR-TSG rules and simultaneously updates
    Page 5, “Inference”
  3. After the inference of symbol subcategories, we use Gibbs sampling to infer the substitution sites of parse trees as described in (Cohn and Lapata, 2009; Post and Gildea, 2009).
    Page 5, “Inference”
  4. For each iteration, the Gibbs sampler works by sampling the value of each binary variable in random order.
    Page 5, “Inference” (see the sketch after this list)
  5. After that, to infer the substitution sites, we initialized the model with the final sample from a run on the small training set, and used the Gibbs sampler for 2000 iterations.
    Page 6, “Experiment”
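
Item 4 describes one Gibbs sweep over the binary substitution-site variables. The sketch below is a minimal, model-agnostic version of such a sweep; the log_posterior callback stands in for the actual SR-TSG probability model and is an assumption of this sketch.

    import math
    import random

    def gibbs_sweep(nodes, log_posterior):
        """nodes: objects with a boolean .is_substitution_site attribute.
        log_posterior(nodes): log probability of the current complete assignment."""
        for node in random.sample(nodes, len(nodes)):   # visit the variables in random order
            log_p = {}
            for value in (True, False):
                node.is_substitution_site = value
                log_p[value] = log_posterior(nodes)
            # Resample the binary variable from its conditional distribution.
            m = max(log_p.values())
            p_true = math.exp(log_p[True] - m) / (math.exp(log_p[True] - m) + math.exp(log_p[False] - m))
            node.is_substitution_site = random.random() < p_true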

syntactic parsing

Appears in 5 sentences as: Syntactic parsing (1) syntactic parsing (4)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing.
    Page 1, “Abstract”
  2. Syntactic parsing has played a central role in natural language processing.
    Page 1, “Introduction”
  3. tree fragments and symbol refinement work complementarily for syntactic parsing.
    Page 2, “Introduction”
  4. In this paper, we propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing.
    Page 2, “Introduction”
  5. In this section, we propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing.
    Page 3, “Symbol-Refined Tree Substitution Grammars”

sentence-level

Appears in 4 sentences as: sentence-level (4)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. In each splitting step, we use two types of blocked MCMC algorithms: the sentence-level blocked Metropolis-Hastings (MH) sampler and the tree-level blocked Gibbs sampler, while Petrov et al. (2006) use a different MLE-based model and the EM algorithm.
    Page 5, “Inference”
  2. Our sampler iterates sentence-level sampling and tree-level sampling alternately.
    Page 5, “Inference”
  3. The sentence-level MH sampler is a recently proposed algorithm for grammar induction (Johnson et al., 2007b; Cohn et al., 2010).
    Page 5, “Inference” (see the sketch after this list)
  4. We proposed a novel backoff modeling of an SR-TSG based on the hierarchical Pitman-Yor Process and sentence-level and tree-level blocked MCMC sampling for training our model.
    Page 8, “Conclusion”
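
Items 1 and 3 refer to the sentence-level blocked Metropolis-Hastings sampler, which the training procedure alternates with tree-level sampling (item 2). The sketch below shows the generic accept/reject step such a blocked update performs for one sentence; the propose, log_p, and log_q callbacks are placeholders, not the paper's actual proposal distribution.

    import math
    import random

    def mh_step(current, propose, log_p, log_q):
        """One blocked MH update of a whole sentence's derivation.
        propose(current) -> candidate derivation; log_p(x) -> target log probability;
        log_q(x, given) -> log probability of proposing x when the chain is at `given`."""
        candidate = propose(current)
        log_accept = (log_p(candidate) - log_p(current)
                      + log_q(current, candidate) - log_q(candidate, current))
        if math.log(random.random()) < min(0.0, log_accept):
            return candidate   # accept: the sentence's derivation is replaced as one block
        return current         # reject: keep the current derivation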

development set

Appears in 3 sentences as: development set (3)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. We estimated the optimal values of the stopping probabilities s by using the development set.
    Page 6, “Experiment”
  2. In all our experiments, we conducted ten independent runs to train our model, and selected the one that performed best on the development set in terms of parsing accuracy.
    Page 6, “Experiment” (see the sketch after this list)
  3. development set (≤ 100).
    Page 7, “Experiment”
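
Item 2 amounts to a simple best-of-ten model selection. A minimal sketch follows, where train_run and dev_f1 are hypothetical callables standing in for training one sampler run and scoring its output on the development set.

    def select_best_model(train_run, dev_f1, n_runs=10):
        """Train n_runs independent samplers and keep the one with the highest dev-set score."""
        runs = [train_run(seed) for seed in range(n_runs)]
        return max(runs, key=dev_f1)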

grammar induction

Appears in 3 sentences as: grammar induction (3)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. morphology analysis, word segmentation (Johnson and Goldwater, 2009), and dependency grammar induction (Cohen et al., 2010), rather than constituent syntax parsing.
    Page 3, “Background and Related Work”
  2. The sentence-level MH sampler is a recently proposed algorithm for grammar induction (Johnson et al., 2007b; Cohn et al., 2010).
    Page 5, “Inference”
  3. Future work will involve examining the SR-TSG model for different languages and for unsupervised grammar induction.
    Page 8, “Conclusion”

hyperparameters

Appears in 3 sentences as: Hyperparameter (1) hyperparameters (2)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. 4.3 Hyperparameter Estimation
    Page 5, “Inference”
  2. We treat hyperparameters {d, θ} as random variables and update their values for every MCMC iteration.
    Page 5, “Inference”
  3. We place a prior on the hyperparameters as follows: d ∼ Beta(1, 1), θ ∼ Gamma(1, 1).
    Page 5, “Inference”
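
The priors in item 3 are easy to state concretely: Beta(1, 1) is the uniform distribution on (0, 1) and Gamma(1, 1) is the Exponential(1) distribution. The sketch below draws initial values from these priors with Python's standard library; how d and θ are then resampled at every MCMC iteration (item 2) is left abstract, since the excerpts do not pin down the update rule (slice sampling is a common choice for PYP hyperparameters).

    import random

    def draw_hyperparameters_from_prior():
        d = random.betavariate(1.0, 1.0)       # discount d ~ Beta(1, 1): uniform on (0, 1)
        theta = random.gammavariate(1.0, 1.0)  # concentration theta ~ Gamma(1, 1): Exponential(1)
        return d, theta

    # In the sampler, d and theta would then be resampled once per iteration given the rest of the state.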

Penn Treebank

Appears in 3 sentences as: Penn Treebank (3)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.
    Page 1, “Abstract”
  2. Our SR-TSG parser achieves an F1 score of 92.4% in the WSJ English Penn Treebank parsing task, which is a 7.7 point improvement over a conventional Bayesian TSG parser, and superior to state-of-the-art discriminative reranking parsers.
    Page 2, “Introduction”
  3. We ran experiments on the Wall Street Journal (WSJ) portion of the English Penn Treebank data set (Marcus et al., 1993), using a standard data split (sections 2-21 for training, 22 for development and 23 for testing).
    Page 6, “Experiment”
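
Item 3's split is the standard WSJ configuration. As a small reference sketch (the section identifiers are the two-digit directory names of the WSJ portion of the Penn Treebank):

    WSJ_SPLIT = {
        "train": [f"{s:02d}" for s in range(2, 22)],  # sections 02-21
        "dev":   ["22"],                              # section 22
        "test":  ["23"],                              # section 23
    }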

probabilistic model

Appears in 3 sentences as: Probabilistic Model (1) probabilistic model (2)
In Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
  1. Since different derivations may produce the same parse tree, recent work on TSG induction (Post and Gildea, 2009; Cohn et al., 2010) employs a probabilistic model of a TSG and predicts derivations from observed parse trees in an unsupervised way.
    Page 2, “Background and Related Work”
  2. 3.1 Probabilistic Model
    Page 3, “Symbol-Refined Tree Substitution Grammars”
  3. We define a probabilistic model of an SR-TSG based on the Pitman-Yor Process (PYP) (Pitman and Yor, 1997), namely a sort of nonparametric Bayesian model.
    Page 3, “Symbol-Refined Tree Substitution Grammars”
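
Item 3 defines the distribution over elementary trees with a Pitman-Yor Process. For orientation, the standard PYP (Chinese restaurant process) predictive probability of generating elementary tree e, after n trees with the same root symbol have been generated, is

    p(e \mid \mathbf{e}) = \frac{n_e - d \cdot t_e}{n + \theta} + \frac{\theta + d \cdot T}{n + \theta}\, P_0(e)

where n_e is how often e has been generated so far, t_e is the number of tables labeled with e, T is the total number of tables, d and θ are the discount and concentration hyperparameters, and P_0 is the base (backoff) distribution. This is the generic PYP form used in Bayesian TSG models such as Cohn et al. (2010); the SR-TSG described in the conclusion stacks such processes into a hierarchical backoff model over symbol refinements.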
