Latent Variable Models of Selectional Preference
Ó Séaghdha, Diarmuid

Article Structure

Abstract

This paper describes the application of so-called topic models to selectional preference induction.

Introduction

Language researchers have long been aware that many words place semantic restrictions on the words with which they can co-occur in a syntactic relationship.

Related work

2.1 Selectional preference learning

Three selectional preference models

3.1 Notation

Experimental setup

In the document modelling literature, probabilistic topic models are often evaluated on the likelihood they assign to unseen documents; however, it has been shown that higher log likelihood scores do not necessarily correlate with more semantically coherent induced topics (Chang et al., 2009).

Results

5.1 Induced semantic classes

Conclusions and future work

This paper has demonstrated how Bayesian techniques originally developed for modelling the topical structure of documents can be adapted to learn probabilistic models of selectional preference.

Topics

LDA

Appears in 29 sentences as: LDA (29)
In Latent Variable Models of Selectional Preference
  1. These include the Latent Dirichlet Allocation (LDA) model of Blei et al.
    Page 2, “Related work”
  2. (2007) integrate a model of random walks on the WordNet graph into an LDA topic model to build an unsupervised word sense disambiguation system.
    Page 2, “Related work”
  3. and Lapata (2009) adapt the basic LDA model for application to unsupervised word sense induction; in this context, the topics learned by the model are assumed to correspond to distinct senses of a particular lemma.
    Page 3, “Related work”
  4. The model architecture they propose, LinkLDA, falls somewhere between our LDA and DUAL-LDA models.
    Page 3, “Related work”
  5. As noted above, LDA was originally introduced to model sets of documents in terms of topics, or clusters of terms, that they share in varying proportions.
    Page 3, “Three selectional preference models”
  6. The high-level “generative story” for the LDA selectional preference model is as follows:
    Page 3, “Three selectional preference models”
  7. (2009) for LDA.
    Page 4, “Three selectional preference models”
  8. Unlike some topic models such as HDP (Teh et al., 2006), LDA is parametric: the number of topics Z must be set by the user in advance.
    Page 4, “Three selectional preference models”
  9. (2009) demonstrate that LDA is relatively insensitive to larger-than-necessary choices of Z when the Dirichlet parameters α are optimised as part of model estimation.
    Page 4, “Three selectional preference models”
  10. The basic LDA model can be extended to capture this kind of predicate-argument interaction; the generative story for the resulting ROOTH-LDA model is as follows:
    Page 4, “Three selectional preference models”
  11. Unlike LDA, ROOTH-LDA does model the joint probability P(n, v|r) of a predicate and argument co-occurring.
    Page 4, “Three selectional preference models”
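The "generative story" quoted above can be made concrete with a short ancestral-sampling sketch of the LDA-style model: each predicate has a Dirichlet-distributed mixture θ over Z latent classes, each class z has a Dirichlet-distributed distribution Φ_z over argument types, and an observed argument is produced by drawing a class and then an argument. The vocabulary, class count and hyperparameter values below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only -- not the paper's settings.
Z = 3                                   # number of latent classes (fixed in advance for LDA)
args = ["dog", "lunch", "girl", "car"]  # toy argument vocabulary
alpha, beta = 0.5, 0.1                  # symmetric Dirichlet hyperparameters

# Per-predicate distribution over classes, per-class distribution over arguments.
theta_v = rng.dirichlet(alpha * np.ones(Z))        # theta_v ~ Dir(alpha)
phi = rng.dirichlet(beta * np.ones(len(args)), Z)  # each row Phi_z ~ Dir(beta)

def generate_argument():
    """One observation: draw a class z for the predicate, then an argument n."""
    z = rng.choice(Z, p=theta_v)
    n = rng.choice(len(args), p=phi[z])
    return args[n]

sample = [generate_argument() for _ in range(5)]
print(sample)
```

Because Φ_z is shared across predicates, rare predicates borrow statistical strength from frequent ones that favour the same classes.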

See all papers in Proc. ACL 2010 that mention LDA.

See all papers in Proc. ACL that mention LDA.

Back to top.

topic models

Appears in 15 sentences as: topic model (3) Topic modelling (1) topic models (9) “topic models” (2)
In Latent Variable Models of Selectional Preference
  1. This paper describes the application of so-called topic models to selectional preference induction.
    Page 1, “Abstract”
  2. This paper takes up tools (“topic models”) that have been proven successful in modelling document-word co-occurrences and adapts them to the task of selectional preference learning.
    Page 1, “Introduction”
  3. Section 2 surveys prior work on selectional preference modelling and on semantic applications of topic models.
    Page 1, “Introduction”
  4. 2.2 Topic modelling
    Page 2, “Related work”
  5. In the field of document modelling, a class of methods known as “topic models” has become a de facto standard for identifying semantic structure in documents.
    Page 2, “Related work”
  6. As a result of intensive research in recent years, the behaviour of topic models is well-understood and computationally efficient implementations have been developed.
    Page 2, “Related work”
  7. (2007) integrate a model of random walks on the WordNet graph into an LDA topic model to build an unsupervised word sense disambiguation system.
    Page 2, “Related work”
  8. (2007) demonstrate that topic models learned from document-word co-occurrences are good predictors of semantic association judgements by humans.
    Page 3, “Related work”
  9. (2010) have also investigated the use of topic models for selectional preference learning.
    Page 3, “Related work”
  10. Unlike some topic models such as HDP (Teh et al., 2006), LDA is parametric: the number of topics Z must be set by the user in advance.
    Page 4, “Three selectional preference models”
  11. In the document modelling literature, probabilistic topic models are often evaluated on the likelihood they assign to unseen documents; however, it has been shown that higher log likelihood scores do not necessarily correlate with more semantically coherent induced topics (Chang et al., 2009).
    Page 5, “Experimental setup”


latent variables

Appears in 11 sentences as: Latent variable (1) latent variable (5) latent variables (6)
In Latent Variable Models of Selectional Preference
  1. In Rooth et al.’s model each observed predicate-argument pair is probabilistically generated from a latent variable, which is itself generated from an underlying distribution on variables.
    Page 2, “Related work”
  2. The use of latent variables, which correspond to coherent clusters of predicate-argument interactions, allows probabilities to be assigned to predicate-argument pairs which have not previously been observed by the model.
    Page 2, “Related work”
  3. The work presented in this paper is inspired by Rooth et al.’s latent variable approach, most directly in the model described in Section 3.3.
    Page 2, “Related work”
  4. Formally seen, these are hierarchical Bayesian models which induce a set of latent variables or topics that are shared across documents.
    Page 2, “Related work”
  5. Each model has at least one vocabulary of Z arbitrarily labelled latent variables.
    Page 3, “Three selectional preference models”
  6. f_zn is the number of observations where the latent variable z has been associated with the argument type n, f_zv is the number of observations where z has been associated with the predicate type v, and f_zr is the number of observations where z has been associated with the relation r.
    Page 3, “Three selectional preference models”
  7. In Rooth et al.’s (1999) selectional preference model, a latent variable is responsible for generating both the predicate and argument types of an observation.
    Page 4, “Three selectional preference models”
  8. Each observation is generated by two latent variables rather than one, which potentially allows the model to learn more flexible interactions between arguments and predicates:
    Page 5, “Three selectional preference models”
  9. Latent variable models that use EM for inference can be very sensitive to the number of latent variables chosen.
    Page 8, “Results”
  10. The models presented here derive their predictions by modelling predicate-argument plausibility through the intermediary of latent variables.
    Page 8, “Conclusions and future work”
  11. We also anticipate that latent variable models will prove effective for learning selectional preferences of semantic predicates (e.g., FrameNet roles) where direct estimation from a large corpus is not a viable option.
    Page 9, “Conclusions and future work”
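The count statistics quoted above (f_zn, f_zv) are exactly what a collapsed Gibbs sampler needs. As a hedged sketch, the standard collapsed-LDA proportionality for resampling the class of a single predicate-argument observation might look as follows; the count tables are toy values invented for illustration, and a real sampler would first decrement the counts contributed by the observation being resampled.

```python
import numpy as np

# Toy count tables (invented for illustration): f_zv[z, v] counts how often
# class z was assigned to an observation with predicate v; f_zn[z, n] does the
# same for argument type n. f_z[z] is the total number of assignments to z.
f_zv = np.array([[4, 1], [0, 3]], dtype=float)        # 2 classes x 2 predicates
f_zn = np.array([[5, 0, 2], [1, 4, 1]], dtype=float)  # 2 classes x 3 arguments
f_z = f_zn.sum(axis=1)

alpha, beta = 0.5, 0.1
n_args = f_zn.shape[1]

def class_posterior(v, n):
    """P(z | v, n, counts) via the standard collapsed-Gibbs proportionality:
    (f_zv + alpha) * (f_zn + beta) / (f_z + n_args * beta), renormalised."""
    p = (f_zv[:, v] + alpha) * (f_zn[:, n] + beta) / (f_z + n_args * beta)
    return p / p.sum()

p = class_posterior(v=0, n=0)
print(p)  # class 0 should dominate: it co-occurs heavily with both v=0 and n=0
```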


hyperparameters

Appears in 4 sentences as: hyperparameter (1) hyperparameters (3)
In Latent Variable Models of Selectional Preference
  1. Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv.
    Page 4, “Three selectional preference models”
  2. Unless stated otherwise, all results are based on runs of 1,000 iterations with 100 classes, with a 200-iteration burn-in period after which hyperparameters were reestimated every 50 iterations. The probabilities estimated by the models (P(n|v, r) for LDA and P(n, v|r) for ROOTH- and DUAL-LDA) were sampled every 50 iterations post-burn-in and averaged over three runs to smooth out variance.
    Page 5, “Experimental setup”
  3. (2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation.
    Page 8, “Results”
  4. In fact, we do not find that performance becomes significantly less robust when hyperparameter reestimation is deactivated; correlation scores simply drop by a small amount (1-2 points), irrespective of the Z chosen.
    Page 8, “Results”
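The schedule described in quote 2 can be summarised as a simple driver loop. This is a skeleton only: `gibbs_sweep`, `reestimate_hyperparameters` and `snapshot_probabilities` are hypothetical placeholders standing in for the model-specific steps, not functions from any real library.

```python
# Sampling schedule from the quoted setup: 1,000 iterations, 200-iteration
# burn-in, hyperparameter reestimation and probability snapshots every 50
# iterations after burn-in. The three callables are hypothetical placeholders.
N_ITER, BURN_IN, INTERVAL = 1000, 200, 50

def run_schedule(gibbs_sweep, reestimate_hyperparameters, snapshot_probabilities):
    snapshots = []
    for it in range(1, N_ITER + 1):
        gibbs_sweep()                          # one full Gibbs sweep
        if it > BURN_IN and it % INTERVAL == 0:
            reestimate_hyperparameters()       # post-burn-in only
            snapshots.append(snapshot_probabilities())
    # averaging the post-burn-in snapshots (and, in the paper, three
    # independent runs) smooths out sampling variance
    return sum(snapshots) / len(snapshots)
```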


co-occurrence

Appears in 3 sentences as: co-occurrence (3)
In Latent Variable Models of Selectional Preference
  1. Further differences are that information about predicate-argument co-occurrence is only shared within a given interaction class rather than across the whole dataset and that the distribution over classes is not specific to the predicate v but rather to the relation r.
    Page 4, “Three selectional preference models”
  2. Following their description, we use a 2,000-dimensional space of syntactic co-occurrence features appropriate to the relation being predicted, weight features with the G2 transformation and compute similarity with the cosine measure.
    Page 6, “Experimental setup”
  3. 30 predicates were selected for each relation; each predicate was matched with three arguments from different co-occurrence bands in the BNC, e.g., naughty-girl (high frequency), naughty-dog (medium) and naughty-lunch (low).
    Page 6, “Results”
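The similarity computation in quote 2 reduces to cosine over weighted feature vectors. A minimal sketch follows, with toy 5-dimensional vectors standing in for the paper's 2,000-dimensional G2-weighted syntactic co-occurrence features; the vectors and words are invented for illustration.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two (weighted) co-occurrence vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

# Toy feature vectors: each dimension stands for one syntactic co-occurrence
# feature (the paper uses 2,000 G2-weighted dimensions; these are invented).
girl = np.array([3.0, 0.0, 1.0, 2.0, 0.0])
dog = np.array([2.0, 1.0, 0.0, 2.0, 0.0])
lunch = np.array([0.0, 0.0, 0.0, 0.0, 4.0])

# "girl" and "dog" share syntactic contexts, so they come out more similar
# to each other than either is to "lunch".
print(cosine(girl, dog) > cosine(girl, lunch))
```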


generative model

Appears in 3 sentences as: generative model (3)
In Latent Variable Models of Selectional Preference
  1. Advantages of these models include a well-defined generative model that handles sparse data well, the ability to jointly induce semantic classes and predicate-specific distributions over those classes, and the enhanced statistical strength achieved by sharing knowledge across predicates.
    Page 1, “Introduction”
  2. For frequent predicate-argument pairs (Seen datasets), Web counts are clearly better; however, the BNC counts are unambiguously superior to LDA and ROOTH-LDA (whose predictions are based entirely on the generative model even for observed items) for the Seen verb-object data only.
    Page 7, “Results”
  3. Another potential direction for system improvement would be an integration of our generative model with Bergsma et al.’s (2008) discriminative model; this could be done in a number of ways, including using the induced classes of a topic model as features for a discriminative classifier or using the discriminative classifier to produce additional high-quality training data from noisy unparsed text.
    Page 9, “Conclusions and future work”


Gibbs sampling

Appears in 3 sentences as: Gibbs sampling (3)
In Latent Variable Models of Selectional Preference
  1. The combination of a well-defined probabilistic model and Gibbs sampling procedure for estimation guarantees (eventual) convergence and the avoidance of degenerate solutions.
    Page 2, “Related work”
  2. Following Griffiths and Steyvers (2004), we estimate the model by Gibbs sampling.
    Page 4, “Three selectional preference models”
  3. As suggested by the similarity between (4) and (2), the ROOTH-LDA model can be estimated by an LDA-like Gibbs sampling procedure.
    Page 4, “Three selectional preference models”


models trained

Appears in 3 sentences as: models trained (3)
In Latent Variable Models of Selectional Preference
  1. Table 1 shows sample semantic classes induced by models trained on the corpus of BNC verb-object co-occurrences.
    Page 6, “Results”
  2. unsurprising given that they are structurally similar models trained on the same data.
    Page 8, “Results”
  3. It would be interesting to do a full comparison that controls for size and type of corpus data; in the meantime, we can report that the LDA and ROOTH-LDA models trained on verb-object observations in the BNC (about 4 times smaller than AQUAINT) also achieve a perfect score on the Holmes et al.
    Page 8, “Results”


probabilistic model

Appears in 3 sentences as: probabilistic model (1) probabilistic models (1) probability model (1)
In Latent Variable Models of Selectional Preference
  1. The combination of a well-defined probabilistic model and Gibbs sampling procedure for estimation guarantees (eventual) convergence and the avoidance of degenerate solutions.
    Page 2, “Related work”
  2. Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv.
    Page 4, “Three selectional preference models”
  3. This paper has demonstrated how Bayesian techniques originally developed for modelling the topical structure of documents can be adapted to learn probabilistic models of selectional preference.
    Page 8, “Conclusions and future work”


WordNet

Appears in 3 sentences as: WordNet (3)
In Latent Variable Models of Selectional Preference
  1. Many approaches to the problem use lexical taxonomies such as WordNet to identify the semantic classes that typically fill a particular argument slot for a predicate (Resnik, 1993; Clark and Weir, 2002; Schulte im Walde et al., 2008).
    Page 2, “Related work”
  2. (2007) integrate a model of random walks on the WordNet graph into an LDA topic model to build an unsupervised word sense disambiguation system.
    Page 2, “Related work”
  3. Reisinger and Pasca (2009) use LDA-like models to map automatically acquired attribute sets onto the WordNet hierarchy.
    Page 3, “Related work”
