A Latent Dirichlet Allocation Method for Selectional Preferences
Ritter, Alan and Mausam and Etzioni, Oren

Article Structure

Abstract

The computation of selectional preferences, the admissible argument values for a relation, is a well-known NLP task with broad applicability.

Introduction

Selectional Preferences encode the set of admissible argument values for a relation.

Previous Work

Previous work on selectional preferences can be broken into four categories: class-based approaches (Resnik, 1996; Li and Abe, 1998; Clark and Weir, 2002; Pantel et al., 2007), similarity based approaches (Dagan et al., 1999; Erk, 2007), discriminative (Bergsma et al., 2008), and generative probabilistic models (Rooth et al., 1999).

Topic Models for Selectional Prefs.

We present a series of topic models for the task of computing selectional preferences.

Experiments

We perform three main experiments to assess the quality of the preferences obtained using topic models.

Conclusions and Future Work

We have presented an application of topic modeling to the problem of automatically computing selectional preferences.

Topics

topic models

Appears in 12 sentences as: topic modeling (2) Topic Models (1) Topic models (1) topic models (8)
In A Latent Dirichlet Allocation Method for Selectional Preferences
  1. In this paper we describe a novel approach to computing selectional preferences by making use of unsupervised topic models.
    Page 1, “Introduction”
  2. Unsupervised topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003) and its variants, are characterized by a set of hidden topics, which represent the underlying semantic structure of a document collection.
    Page 1, “Introduction”
  3. Thus, topic models are a natural fit for modeling our relation data.
    Page 1, “Introduction”
  4. Topic models such as LDA (Blei et al., 2003) and its variants have recently begun to see use in many NLP applications such as summarization (Daume III and Marcu, 2006), document alignment and segmentation (Chen et al., 2009), and inferring class-attribute hierarchies (Reisinger and Pasca, 2009).
    Page 3, “Previous Work”
  5. We present a series of topic models for the task of computing selectional preferences.
    Page 3, “Topic Models for Selectional Prefs.”
  6. Readers familiar with topic modeling terminology can understand our approach as follows: we treat each relation as a document whose contents consist of a bag of words corresponding to all the noun phrases observed as arguments of the relation in our corpus.
    Page 3, “Topic Models for Selectional Prefs.”
  7. 3.5 Advantages of Topic Models
    Page 5, “Topic Models for Selectional Prefs.”
  8. There are several advantages to using topic models for our task.
    Page 5, “Topic Models for Selectional Prefs.”
  9. We perform three main experiments to assess the quality of the preferences obtained using topic models.
    Page 5, “Experiments”
  10. We use this experiment to compare the various topic models as well as the best model with the known state of the art approaches to selectional preferences.
    Page 5, “Experiments”
  11. Figure 3 plots the precision-recall curve for the pseudo-disambiguation experiment comparing the three different topic models.
    Page 6, “Experiments”
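The snippets above describe the core modeling move: each relation is treated as a "document" whose words are the noun-phrase arguments observed with it, with separate bags per argument position. A minimal sketch of that preprocessing step (the function name and the example tuples are hypothetical, not from the paper):

```python
from collections import Counter, defaultdict

def relations_to_documents(tuples):
    """Group extracted (arg1, relation, arg2) tuples so that each relation
    becomes a 'document': a pair of bags of words, one Counter of noun
    phrases per argument position, as in the paper's LDA-SP setup."""
    docs = defaultdict(lambda: (Counter(), Counter()))
    for a1, rel, a2 in tuples:
        bag1, bag2 = docs[rel]
        bag1[a1] += 1
        bag2[a2] += 1
    return dict(docs)

# Hypothetical extracted tuples for illustration:
tuples = [
    ("Einstein", "was born in", "Ulm"),
    ("Obama", "was born in", "Hawaii"),
    ("Einstein", "developed", "relativity"),
]
docs = relations_to_documents(tuples)
```

These per-relation bags are what a topic model would then consume in place of ordinary documents.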


WordNet

Appears in 10 sentences as: WordNet (8) Wordnet (1) WordNet’s (1)
  1. Resnik (1996) presented the earliest work in this area, describing an information-theoretic approach that inferred selectional preferences based on the WordNet hypernym hierarchy.
    Page 1, “Introduction”
  2. This avoids problems like WordNet’s poor coverage of proper nouns and is shown to improve performance.
    Page 1, “Introduction”
  3. WordNet), or automatically generated (Pantel, 2003).
    Page 2, “Previous Work”
  4. Finally, we report on the quality of a large database of Wordnet-based preferences obtained after manually associating our topics with Wordnet classes (Section 4.4).
    Page 5, “Experiments”
  5. We then automatically filtered out any rules which contained a negation, or for which the antecedent and consequent contained a pair of antonyms found in WordNet (this left us with 85 rules).
    Page 8, “Experiments”
  6. Table 3: Top 10 and bottom 10 inference rules ranked by LDA-SP after automatically filtering out negations and antonyms (using WordNet).
    Page 8, “Experiments”
  7. Table 3 shows the top 10 and bottom 10 rules out of the 26,000 ranked by KL Divergence after automatically filtering antonyms (using WordNet) and negations.
    Page 8, “Experiments”
  8. Since we already have a set of topics, our task reduces to mapping the inferred topics to an equivalent class in a taxonomy (e.g., WordNet).
    Page 9, “Experiments”
  9. In particular, we applied a semiautomatic scheme to map topics to WordNet .
    Page 9, “Experiments”
  10. We first applied Resnik’s approach to automatically shortlist a few candidate WordNet classes for each topic.
    Page 9, “Experiments”
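The last two snippets describe semiautomatically mapping topics to WordNet by first using Resnik's approach to shortlist candidate classes for each topic. A rough sketch of the counting idea behind such a shortlist, with a toy hypernym table standing in for the WordNet hierarchy (the function name and scoring are assumptions, not the paper's exact method):

```python
from collections import Counter

def shortlist_classes(topic_words, hypernyms, top_n=3):
    """Count how often each candidate class appears as an ancestor of a
    topic's high-probability words, and keep the most frequent classes.
    `hypernyms` maps a word to its ancestor classes; in the paper this
    information comes from the WordNet hierarchy, here it is a toy dict."""
    counts = Counter(c for w in topic_words for c in hypernyms.get(w, ()))
    return [c for c, _ in counts.most_common(top_n)]
```

A human annotator would then pick the best class from the shortlist, matching the semiautomatic scheme described above.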


LDA

Appears in 8 sentences as: LDA (8)
  1. Unsupervised topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003) and its variants, are characterized by a set of hidden topics, which represent the underlying semantic structure of a document collection.
    Page 1, “Introduction”
  2. In particular, our system, called LDA-SP, uses LinkLDA (Erosheva et al., 2004), an extension of LDA that simultaneously models two sets of distributions for each topic.
    Page 1, “Introduction”
  3. Topic models such as LDA (Blei et al., 2003) and its variants have recently begun to see use in many NLP applications such as summarization (Daume III and Marcu, 2006), document alignment and segmentation (Chen et al., 2009), and inferring class-attribute hierarchies (Reisinger and Pasca, 2009).
    Page 3, “Previous Work”
  4. Van Durme and Gildea (2009) proposed applying LDA to general knowledge templates extracted using the KNEXT system (Schubert and Tong, 2003).
    Page 3, “Previous Work”
  5. We first describe the straightforward application of LDA to modeling our corpus of extracted relations.
    Page 3, “Topic Models for Selectional Prefs.”
  6. In this case two separate LDA models are used to model a1 and a2 independently.
    Page 3, “Topic Models for Selectional Prefs.”
  7. Formally, LDA generates each argument in the corpus of relations as follows:
    Page 3, “Topic Models for Selectional Prefs.”
  8. As a more tightly coupled alternative, we first propose JointLDA, whose graphical model is depicted in Figure 1. The key difference in JointLDA (versus LDA) is that instead of one, it maintains two sets of topics (latent distributions over words) denoted by θ and γ, one for classes of each argument.
    Page 4, “Topic Models for Selectional Prefs.”
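Item 7 above introduces the LDA generative story for arguments: draw a hidden topic z from the relation's topic distribution, then draw a noun-phrase argument from that topic's distribution over words. A minimal sketch of one such draw (the function name and the dict-based representation of distributions are ours):

```python
import random

def generate_argument(theta_r, topic_word_dists):
    """One draw from the LDA generative story: pick a topic z from the
    relation's multinomial over topics theta_r, then pick a noun-phrase
    argument from that topic's multinomial over words."""
    z = random.choices(range(len(theta_r)), weights=theta_r, k=1)[0]
    words, probs = zip(*topic_word_dists[z].items())
    w = random.choices(words, weights=probs, k=1)[0]
    return z, w
```

With degenerate (one-hot) distributions the draw is deterministic, which makes the two-stage structure easy to inspect.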


Gibbs sampling

Appears in 5 sentences as: Gibbs samples (1) Gibbs Sampling (1) Gibbs sampling (3)
  1. Additionally we perform full Bayesian inference using collapsed Gibbs sampling, in which parameters are integrated out (Griffiths and Steyvers, 2004).
    Page 2, “Previous Work”
  2. For all the models we use collapsed Gibbs sampling for inference, in which each of the hidden variables (e.g., the per-argument topic assignments z1,i and z2,i in LinkLDA) is sampled sequentially conditioned on a full assignment to all others, integrating out the parameters (Griffiths and Steyvers, 2004).
    Page 4, “Topic Models for Selectional Prefs.”
  3. In addition, there are several scalability enhancements such as SparseLDA (Yao et al., 2009), and an approximation of the Gibbs Sampling procedure can be efficiently parallelized (Newman et al., 2009).
    Page 5, “Topic Models for Selectional Prefs.”
  4. Next we used collapsed Gibbs sampling to infer a distribution over topics, θr, for each of the relations in the primary corpus (based solely on tuples in the training set) using the topics from the generalization corpus.
    Page 6, “Experiments”
  5. To evaluate how well our topic-class associations carry over to unseen relations we used the same random sample of 100 relations from the pseudo-disambiguation experiment. For each argument of each relation we picked the top two topics according to frequency in the 5 Gibbs samples.
    Page 9, “Experiments”
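Collapsed Gibbs sampling, as described in the snippets above, resamples each hidden topic assignment conditioned on all the others, with the multinomial parameters integrated out. A minimal sampler for plain LDA in this style (a sketch of the general technique from Griffiths and Steyvers 2004, not the paper's LinkLDA sampler; hyperparameter values are illustrative):

```python
import random

def collapsed_gibbs_lda(docs, n_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for plain LDA: each topic assignment z is
    resampled from its conditional distribution given all other assignments,
    with theta and the topic-word multinomials integrated out."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]       # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                        # topic totals
    # Random initialization of all assignments.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            t = rng.randrange(n_topics)
            zd.append(t)
            ndk[d][t] += 1; nkw[t][widx[w]] += 1; nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], widx[w]
                # Remove the current assignment from the counts...
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                # ...then sample z from its collapsed conditional.
                weights = [(ndk[d][k] + alpha) * (nkw[k][wi] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights, k=1)[0]
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1
    return z, ndk
```

The returned doc-topic counts `ndk` can be smoothed with `alpha` and normalized to estimate each document's (here, each relation's) topic distribution.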


noun phrases

Appears in 4 sentences as: noun phrases (4)
  1. Our task is to compute, for each argument ai of each relation r, a set of usual argument values (noun phrases) that it takes.
    Page 3, “Topic Models for Selectional Prefs.”
  2. Readers familiar with topic modeling terminology can understand our approach as follows: we treat each relation as a document whose contents consist of a bag of words corresponding to all the noun phrases observed as arguments of the relation in our corpus.
    Page 3, “Topic Models for Selectional Prefs.”
  3. This resulted in a vocabulary of about 32,000 noun phrases, and a set of about 2.4 million tuples in our generalization corpus.
    Page 5, “Experiments”
  4. For each of the 500 observed tuples in the test-set we generated a pseudo-negative tuple by randomly sampling two noun phrases from the distribution of NPs in both corpora.
    Page 6, “Experiments”
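Snippet 4 describes building pseudo-negative tuples for the pseudo-disambiguation task by replacing both arguments with noun phrases sampled from the corpus-wide NP distribution. A sketch (the names are ours; passing a pool that lists NPs with repetition makes a uniform choice follow the empirical distribution):

```python
import random

def pseudo_negative(tuple_, np_pool, rng):
    """Build a pseudo-negative by keeping the relation string but replacing
    both arguments with noun phrases drawn at random from np_pool, a list
    of NPs with corpus frequencies reflected by repetition."""
    a1 = rng.choice(np_pool)
    a2 = rng.choice(np_pool)
    return (a1, tuple_[1], a2)
```

A model that has learned good selectional preferences should assign the observed tuple a higher score than its pseudo-negative counterpart.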


probabilistic model

Appears in 4 sentences as: probabilistic model (2) probabilistic models (2)
  1. Additionally, because LDA-SP is based on a formal probabilistic model, it has the advantage that it can naturally be applied in many scenarios.
    Page 2, “Introduction”
  2. Previous work on selectional preferences can be broken into four categories: class-based approaches (Resnik, 1996; Li and Abe, 1998; Clark and Weir, 2002; Pantel et al., 2007), similarity based approaches (Dagan et al., 1999; Erk, 2007), discriminative (Bergsma et al., 2008), and generative probabilistic models (Rooth et al., 1999).
    Page 2, “Previous Work”
  3. Our solution fits into the general category of generative probabilistic models, which model each relation/argument combination as being generated by a latent class variable.
    Page 2, “Previous Work”
  4. Because LDA-SP generates a complete probabilistic model for our relation data, its results are easily applicable to many other tasks such as identifying similar relations, ranking inference rules, etc.
    Page 9, “Conclusions and Future Work”


randomly sampled

Appears in 4 sentences as: random sample (1) randomly sampled (2) randomly sampling (1)
  1. For each of the 500 observed tuples in the test-set we generated a pseudo-negative tuple by randomly sampling two noun phrases from the distribution of NPs in both corpora.
    Page 6, “Experiments”
  2. (2007) we randomly sampled 100 inference rules.
    Page 8, “Experiments”
  3. We randomly sampled 300 of these inferences to hand-label.
    Page 8, “Experiments”
  4. To evaluate how well our topic-class associations carry over to unseen relations we used the same random sample of 100 relations from the pseudo-disambiguation experiment. For each argument of each relation we picked the top two topics according to frequency in the 5 Gibbs samples.
    Page 9, “Experiments”


dependency paths

Appears in 3 sentences as: dependency paths (3)
  1. We therefore mapped the DIRT inference rules (Lin and Pantel, 2001) (which consist of pairs of dependency paths) to TEXTRUNNER relations as follows.
    Page 8, “Experiments”
  2. From the parses we extracted all dependency paths between nouns that contain only words present in the TEXTRUNNER relation string.
    Page 8, “Experiments”
  3. These dependency paths were then matched against each pair in the DIRT database, and all pairs of associated relations were collected producing about 26,000 inference rules.
    Page 8, “Experiments”
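The mapping above keeps only dependency paths whose words all appear in the TEXTRUNNER relation string. A rough sketch of that filtering step (the data structures are assumptions; in the paper the paths come from parses of actual sentences):

```python
def paths_matching_relation(dependency_paths, relation_string):
    """Keep dependency paths (here, lists of word tokens) whose alphabetic
    words all occur in the given TEXTRUNNER relation string, sketching the
    DIRT-to-TEXTRUNNER mapping step described in the text."""
    rel_words = set(relation_string.lower().split())
    keep = []
    for path in dependency_paths:
        words = {w.lower() for w in path if w.isalpha()}
        if words <= rel_words:
            keep.append(path)
    return keep
```

The surviving paths would then be matched against the DIRT database to collect pairs of associated relations.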


generative model

Appears in 3 sentences as: generative model (2) generative models (1)
  1. On the other hand, generative models produce complete probability distributions of the data, and hence can be integrated with other systems and tasks in a more principled manner (see Sections 4.2.2 and 4.3.1).
    Page 2, “Previous Work”
  2. In the generative model for our data, each relation r has a corresponding multinomial over topics θr, drawn from a Dirichlet.
    Page 3, “Topic Models for Selectional Prefs.”
  3. These similarity measures were shown to outperform the generative model of Rooth et al.
    Page 5, “Experiments”


jointly model

Appears in 3 sentences as: jointly model (1) jointly modeling (1) jointly models (1)
  1. It also focuses on jointly modeling the generation of both predicate and argument, and evaluation is performed on a set of human-plausibility judgments, obtaining impressive results against Keller and Lapata’s (2003) Web hit-count based system.
    Page 3, “Previous Work”
  2. One weakness of IndependentLDA is that it doesn’t jointly model a1 and a2 together.
    Page 3, “Topic Models for Selectional Prefs.”
  3. On the one hand, JointLDA jointly models the generation of both arguments in an extracted tuple.
    Page 4, “Topic Models for Selectional Prefs.”


precision and recall

Appears in 3 sentences as: Precision and recall (1) precision and recall (2)
  1. In figure 5 we compare the precision and recall of LDA-SP against the top two performing systems described by Pantel et al.
    Page 8, “Experiments”
  2. We find that LDA-SP achieves both higher precision and recall than ISP.IIM-∨.
    Page 8, “Experiments”
  3. Figure 5: Precision and recall on the inference filtering task.
    Page 8, “Experiments”
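The precision and recall reported for the inference-filtering task reduce to standard set comparisons between the rules a system accepts and the hand-labeled correct ones. A minimal helper (the names are ours):

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set of accepted inferences
    against a gold set of hand-labeled correct ones."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

Sweeping a score threshold over the ranked rules and recomputing these values at each cutoff traces out a precision-recall curve like the one in Figure 5.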


state of the art

Appears in 3 sentences as: state of the art (3)
  1. Our experiments demonstrate that LDA-SP significantly outperforms state of the art approaches, obtaining an 85% increase in recall at precision 0.9 on the standard pseudo-disambiguation task.
    Page 2, “Introduction”
  2. We use this experiment to compare the various topic models as well as the best model with the known state of the art approaches to selectional preferences.
    Page 5, “Experiments”
  3. We first compare the three LDA-based approaches to each other and two state of the art similarity based systems (Erk, 2007) (using mutual information and Jaccard similarity respectively).
    Page 5, “Experiments”


topic distributions

Appears in 3 sentences as: topic distribution (1) topic distributions (2)
  1. By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power.
    Page 1, “Abstract”
  2. To encourage learning related topic pairs between arguments, we employ a sparse prior over the per-relation topic distributions.
    Page 4, “Topic Models for Selectional Prefs.”
  3. Finally we note that, once a topic distribution has been learned over a set of training relations, one can efficiently apply inference to unseen relations (Yao et al., 2009).
    Page 5, “Topic Models for Selectional Prefs.”
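Once per-relation topic distributions θr have been inferred, comparing two relations (e.g., to rank inference rules by KL Divergence, as in the Table 3 snippets under "WordNet") is a small computation over those vectors. A sketch, assuming dense topic vectors and a small epsilon for smoothing (the epsilon handling is ours, not the paper's):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two topic distributions represented as dense
    probability vectors; lower divergence between the argument distributions
    of two relations suggests a more plausible inference rule."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))
```

Ranking candidate rules by this divergence and thresholding it yields the filtered rule lists discussed in the Experiments section.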
