Index of papers in Proc. ACL 2010 that mention
  • topic models
Johnson, Mark
Abstract
Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees.
Abstract
The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well.
Abstract
The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG (adaptor grammar).
Introduction
so Bayesian inference for PCFGs can be used to learn LDA topic models as well.
Introduction
However, once this link is established it suggests a variety of extensions to the LDA topic models, two of which we explore in this paper.
Introduction
The first involves extending the LDA topic model so that it generates collocations (sequences of words) rather than individual words.
LDA topic models as PCFGs
Figure 2: A tree generated by the CFG encoding an LDA topic model.
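To make the encoding concrete, here is a minimal sketch of one way an LDA topic model can be written as a PCFG; the rule schema and the stop probability s below are an assumed illustration in generic notation, not the paper's exact grammar:

  Sentence → Doc_d                                  (one rule per document d)
  Doc_d    → Topic_t Doc_d     with probability (1 - s) · θ_{d,t}
  Doc_d    → Topic_t           with probability s · θ_{d,t}
  Topic_t  → w                 with probability φ_{t,w}

Each word is generated by choosing a topic nonterminal according to the document's topic distribution θ_d and rewriting it to a word according to the topic's word distribution φ_t, which mirrors the LDA generative process; Bayesian PCFG inference over such a grammar therefore recovers the LDA parameters.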
Latent Dirichlet Allocation Models
Figure 1: A graphical model “plate” representation of an LDA topic model.
topic models is mentioned in 26 sentences in this paper.
Li, Linlin and Roth, Benjamin and Sporleder, Caroline
Abstract
We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables.
Introduction
Recently, several researchers have experimented with topic models (Brody and Lapata, 2009; Boyd-Graber et al., 2007; Boyd-Graber and Blei, 2007; Cai et al., 2007) for sense disambiguation and induction.
Introduction
Topic models are generative probabilistic models of text corpora in which each document is modelled as a mixture over (latent) topics, which are in turn represented by a distribution over words.
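For orientation, the generative story behind standard LDA (Blei et al., 2003), which the excerpt above summarises, is usually written as follows (generic notation, not taken from this particular paper):

  for each topic t:                φ_t ∼ Dirichlet(β)                   (the topic's distribution over words)
  for each document d:             θ_d ∼ Dirichlet(α)                   (the document's distribution over topics)
  for each word position n in d:   z_{d,n} ∼ Multinomial(θ_d)           (draw a topic)
                                   w_{d,n} ∼ Multinomial(φ_{z_{d,n}})   (draw a word from that topic)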
Introduction
Previous approaches using topic models for sense disambiguation either embed topic features in a supervised model (Cai et al., 2007) or rely heavily on the structure of hierarchical lexicons such as WordNet (Boyd-Graber et al., 2007).
Related Work
Recently, a number of systems have been proposed that make use of topic models for sense disambiguation.
Related Work
They compute topic models from a large unlabelled corpus and include them as features in a supervised system.
Related Work
Boyd-Graber and Blei (2007) propose an unsupervised approach that integrates McCarthy et al.’s (2004) method for finding predominant word senses into a topic modelling framework.
The Sense Disambiguation Model
3.1 Topic Model
The Sense Disambiguation Model
As pointed out by Hofmann (1999), the starting point of topic models is to decompose the conditional word-document probability distribution p(w|d) into two different distributions: the word-topic distribution p(w|z), and the topic-document distribution p(z|d) (see Equation 1).
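The decomposition referred to as Equation 1 is the standard topic-model mixture; in the notation of the excerpt it reads:

  p(w|d) = Σ_z p(w|z) p(z|d)

that is, the probability of a word in a document is obtained by summing, over all latent topics z, the probability of the word given the topic times the probability of the topic given the document.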
topic models is mentioned in 15 sentences in this paper.
Ó Séaghdha, Diarmuid
Abstract
This paper describes the application of so-called topic models to selectional preference induction.
Experimental setup
In the document modelling literature, probabilistic topic models are often evaluated on the likelihood they assign to unseen documents; however, it has been shown that higher log likelihood scores do not necessarily correlate with more semantically coherent induced topics (Chang et al., 2009).
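The held-out likelihood evaluation mentioned here is most often reported as per-word perplexity; a standard formulation (generic, not taken from this paper) is

  perplexity(D_test) = exp( - Σ_d log p(w_d) / Σ_d N_d )

where the sums run over the held-out documents and N_d is the number of tokens in document d. Lower perplexity means higher held-out likelihood, which, as the excerpt notes, need not imply more coherent topics.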
Introduction
This paper takes up tools (“topic models”) that have been proven successful in modelling document-word co-occurrences and adapts them to the task of selectional preference learning.
Introduction
Section 2 surveys prior work on selectional preference modelling and on semantic applications of topic models.
Related work
2.2 Topic modelling
Related work
In the field of document modelling, a class of methods known as “topic models” has become a de facto standard for identifying semantic structure in documents.
Related work
As a result of intensive research in recent years, the behaviour of topic models is well-understood and computa-tionally efficient implementations have been developed.
Three selectional preference models
Unlike some topic models such as HDP (Teh et al., 2006), LDA is parametric: the number of topics Z must be set by the user in advance.
topic models is mentioned in 15 sentences in this paper.
Ritter, Alan and Mausam and Etzioni, Oren
Experiments
We perform three main experiments to assess the quality of the preferences obtained using topic models.
Experiments
We use this experiment to compare the various topic models as well as the best model with the known state-of-the-art approaches to selectional preferences.
Experiments
Figure 3 plots the precision-recall curve for the pseudo-disambiguation experiment comparing the three different topic models.
Introduction
In this paper we describe a novel approach to computing selectional preferences by making use of unsupervised topic models .
Introduction
Unsupervised topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003) and its variants, are characterized by a set of hidden topics, which represent the underlying semantic structure of a document collection.
Introduction
Thus, topic models are a natural fit for modeling our relation data.
Previous Work
Topic models such as LDA (Blei et al., 2003) and its variants have recently begun to see use in many NLP applications such as summarization (Daume III and Marcu, 2006), document alignment and segmentation (Chen et al., 2009), and inferring class-attribute hierarchies (Reisinger and Pasca, 2009).
Topic Models for Selectional Prefs.
We present a series of topic models for the task of computing selectional preferences.
Topic Models for Selectional Prefs.
Readers familiar with topic modeling terminology can understand our approach as follows: we treat each relation as a document whose contents consist of a bag of words corresponding to all the noun phrases observed as arguments of the relation in our corpus.
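As an illustration of this relation-as-document construction, below is a minimal sketch using the gensim LDA implementation; the relations, argument lists, and variable names are hypothetical toy data, not the authors' corpus or pipeline:

  # Each relation becomes a "document" whose tokens are the noun-phrase
  # arguments observed with that relation (toy data for illustration only).
  from gensim import corpora, models

  relation_args = {
      "acquired": ["google", "youtube", "microsoft", "skype", "company"],
      "headquartered_in": ["seattle", "california", "london", "dublin"],
      "wrote": ["novel", "article", "report", "essay", "book"],
  }

  docs = list(relation_args.values())
  dictionary = corpora.Dictionary(docs)                # vocabulary of argument noun phrases
  corpus = [dictionary.doc2bow(doc) for doc in docs]   # bag of arguments per relation

  # Each induced topic is a distribution over argument noun phrases; each relation
  # gets a distribution over topics, i.e. its inferred selectional preference.
  lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)
  for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
      print(topic_id, [w for w, _ in words])

Once relations are recast as documents over argument tokens, any off-the-shelf topic model can be applied unchanged, which is the point the excerpt makes.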
Topic Models for Selectional Prefs.
3.5 Advantages of Topic Models
topic models is mentioned in 12 sentences in this paper.
Zhang, Duo and Mei, Qiaozhu and Zhai, ChengXiang
Abstract
Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way.
Abstract
One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other.
Abstract
In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages.
Introduction
As a robust unsupervised way to perform shallow latent semantic analysis of topics in text, probabilistic topic models (Hofmann, 1999a; Blei et al., 2003b) have recently attracted much attention.
Introduction
Although many topic models have been proposed and shown to be useful (see Section 2 for more detailed discussion of related work), most of them share a common deficiency: they are designed to work only for monolingual text data and would not work well for extracting cross-lingual latent topics, i.e.
Introduction
In this paper, we propose a novel topic model, called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) model, which can be used to mine shared latent topics from unaligned text data in different languages.
Related Work
Many topic models have been proposed, and the two basic models are the Probabilistic Latent Semantic Analysis (PLSA) model (Hofmann, 1999a) and the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003b).
Related Work
They and their extensions have been successfully applied to many problems, including hierarchical topic extraction (Hofmann, 1999b; Blei et al., 2003a; Li and McCallum, 2006), author-topic modeling (Steyvers et al., 2004), contextual topic analysis (Mei and Zhai, 2006), dynamic and correlated topic models (Blei and Lafferty, 2005; Blei and Lafferty, 2006), and opinion analysis (Mei et al., 2007; Branavan et al., 2008).
Related Work
Some previous work on multilingual topic models assumes documents in multiple languages are aligned either at the document level, sentence level or by time stamps (Mimno et al., 2009; Zhao and Xing, 2006; Kim and Khudanpur, 2004; Ni et al., 2009; Wang et al., 2007).
topic models is mentioned in 17 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Abstract
We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model.
Background and Motivation
One of the challenges of using a previously trained topic model is that the new document might have a totally new vocabulary or may include many other specific topics, which may or may not exist in the trained model.
Background and Motivation
A common method is to rebuild a topic model for new sets of documents (Haghighi and Vanderwende, 2009), which has proven to produce coherent summaries.
Conclusion
We demonstrated that implementation of a summary focused hierarchical topic model to discover sentence structures as well as construction of a discriminative method for inference can benefit summarization quality on manual and automatic evaluation metrics.
Experiments and Discussions
* HIERSUM: (Haghighi and Vanderwende, 2009) A generative summarization method based on topic models, which uses sentences as an additional level.
Experiments and Discussions
* HbeSum (Hybrid Flat Summarizer): To investigate the performance of the hierarchical topic model, we build another hybrid model using flat LDA (Blei et al., 2003b).
Experiments and Discussions
Compared to the HbeSum built on LDA, both HybHSum1&2 yield better performance, indicating the effectiveness of using a hierarchical topic model in the summarization task.
Introduction
We present a probabilistic topic model on sentence level building on hierarchical Latent Dirichlet Allocation (hLDA) (Blei et al., 2003a), which is a generalization of LDA (Blei et al., 2003b).
Summary-Focused Hierarchical Model
We build a summary-focused hierarchical probabilistic topic model, sumHLDA, for each document cluster at sentence level, because it enables capturing expected topic distributions in given sentences directly from the model.
Summary-Focused Hierarchical Model
Please refer to (Blei et al., 2003b) and (Blei et al., 2003a) for details and demonstrations of topic models.
topic models is mentioned in 10 sentences in this paper.
Feng, Yansong and Lapata, Mirella
Experimental Setup
The underlying topic model was trained with 1,000 topics using only content words (i.e., nouns, verbs, and adjectives) that appeared
Extractive Caption Generation
Probabilistic Similarity Recall that the backbone of our image annotation model is a topic model with images and documents represented as a probability distribution over latent topics.
Image Annotation
The basic idea underlying LDA, and topic models in general, is that each document is composed of a probability distribution over topics, where each topic represents a probability distribution over words.
Results
As can be seen, the probabilistic models (KL and JS divergence) outperform word overlap and cosine similarity (all differences are statistically significant, p < 0.01). They make use of the same topic model as the image annotation model, and are thus able to select sentences that cover common content.
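For reference, the two probabilistic similarity measures mentioned here have their standard definitions (generic, not quoted from the paper), applied to topic distributions p and q, e.g. of an image and a candidate sentence:

  KL(p || q) = Σ_i p_i log(p_i / q_i)
  JS(p, q)   = ½ KL(p || m) + ½ KL(q || m),   where m = ½ (p + q)

JS divergence is symmetric and bounded, which is one reason it is often preferred when neither distribution should be treated as the reference.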
topic models is mentioned in 4 sentences in this paper.