Index of papers in Proc. ACL 2010 that mention

**topic models**

Abstract | Latent Dirichlet Allocation (LDA) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. |

Abstract | The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer topic models as well.

Abstract | The first replaces the unigram component of LDA topic models with multi-word sequences or collocations generated by an AG. |

Introduction | so Bayesian inference for PCFGs can be used to learn LDA topic models as well. |

Introduction | However, once this link is established it suggests a variety of extensions to the LDA topic models, two of which we explore in this paper.

Introduction | The first involves extending the LDA topic model so that it generates collocations (sequences of words) rather than individual words. |

LDA topic models as PCFGs | Figure 2: A tree generated by the CFG encoding an LDA topic model.

Latent Dirichlet Allocation Models | Figure 1: A graphical model “plate” representation of an LDA topic model.

topic models is mentioned in 26 sentences in this paper.

Topics mentioned in this paper:

- LDA (51)
- topic models (26)
- probabilistic models (5)

Abstract | We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. |

Introduction | Recently, several researchers have experimented with topic models (Brody and Lapata, 2009; Boyd-Graber et al., 2007; Boyd-Graber and Blei, 2007; Cai et al., 2007) for sense disambiguation and induction. |

Introduction | Topic models are generative probabilistic models of text corpora in which each document is modelled as a mixture over (latent) topics, which are in turn represented by a distribution over words. |
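The generative view in the excerpt above (each document a mixture over latent topics, each topic a distribution over words) can be sketched in a few lines. This is a minimal illustrative simulation of the LDA generative process; the toy vocabulary, sizes, and hyperparameters are invented for the example and come from none of the indexed papers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all values illustrative).
vocab = ["model", "topic", "word", "sense", "corpus", "tree"]
n_topics, n_docs, doc_len = 2, 3, 8
alpha, beta = 0.5, 0.5  # Dirichlet hyperparameters

# Each topic is a distribution over the vocabulary.
topic_word = rng.dirichlet([beta] * len(vocab), size=n_topics)

docs = []
for _ in range(n_docs):
    # Each document draws its own mixture over latent topics.
    theta = rng.dirichlet([alpha] * n_topics)
    words = []
    for _ in range(doc_len):
        z = rng.choice(n_topics, p=theta)             # draw a topic
        w = rng.choice(len(vocab), p=topic_word[z])   # draw a word from it
        words.append(vocab[w])
    docs.append(words)

print(docs)
```

Inference in LDA runs this story in reverse: given only the documents, it recovers plausible topic-word and document-topic distributions.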

Introduction | Previous approaches using topic models for sense disambiguation either embed topic features in a supervised model (Cai et al., 2007) or rely heavily on the structure of hierarchical lexicons such as WordNet (Boyd-Graber et al., 2007). |

Related Work | Recently, a number of systems have been proposed that make use of topic models for sense disambiguation. |

Related Work | They compute topic models from a large unlabelled corpus and include them as features in a supervised system. |

Related Work | Boyd-Graber and Blei (2007) propose an unsupervised approach that integrates McCarthy et al.’s (2004) method for finding predominant word senses into a topic modelling framework. |

The Sense Disambiguation Model | 3.1 Topic Model |

The Sense Disambiguation Model | As pointed out by Hofmann (1999), the starting point of topic models is to decompose the conditional word-document probability distribution p(w|d) into two different distributions: the word-topic distribution p(w|z), and the topic-document distribution p(z|d) (see Equation 1). |
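The decomposition from Hofmann (1999) described in this excerpt (its Equation 1) can be written out explicitly:

```latex
% Conditional word-document probability factored through latent topics z:
p(w \mid d) = \sum_{z} p(w \mid z)\, p(z \mid d)
```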

topic models is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

- sense disambiguation (21)
- WordNet (16)
- topic models (15)

Abstract | This paper describes the application of so-called topic models to selectional preference induction. |

Experimental setup | In the document modelling literature, probabilistic topic models are often evaluated on the likelihood they assign to unseen documents; however, it has been shown that higher log likelihood scores do not necessarily correlate with more semantically coherent induced topics (Chang etal., 2009). |

Introduction | This paper takes up tools (“topic models”) that have proven successful in modelling document-word co-occurrences and adapts them to the task of selectional preference learning.

Introduction | Section 2 surveys prior work on selectional preference modelling and on semantic applications of topic models.

Related work | 2.2 Topic modelling |

Related work | In the field of document modelling, a class of methods known as “topic models” have become a de facto standard for identifying semantic structure in documents. |

Related work | As a result of intensive research in recent years, the behaviour of topic models is well-understood and computationally efficient implementations have been developed.

Three selectional preference models | Unlike some topic models such as HDP (Teh et al., 2006), LDA is parametric: the number of topics Z must be set by the user in advance.
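The parametric nature noted in this excerpt shows up directly in standard toolkits: the number of topics Z is a constructor argument fixed before fitting. A minimal sketch using scikit-learn's `LatentDirichletAllocation` on an invented toy document-term matrix (not the paper's data or code):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
# Toy document-term count matrix: 20 documents, 30 vocabulary items.
X = rng.integers(0, 5, size=(20, 30))

Z = 4  # number of topics: user-set in advance, unlike nonparametric HDP
lda = LatentDirichletAllocation(n_components=Z, random_state=0)
doc_topic = lda.fit_transform(X)

print(doc_topic.shape)        # (20, 4): one topic mixture per document
print(lda.components_.shape)  # (4, 30): one word distribution per topic
```

Choosing Z is therefore a model-selection problem in LDA, whereas HDP-style models infer the number of topics from the data.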

topic models is mentioned in 15 sentences in this paper.

Topics mentioned in this paper:

- LDA (29)
- topic models (15)
- latent variables (11)

Experiments | We perform three main experiments to assess the quality of the preferences obtained using topic models.

Experiments | We use this experiment to compare the various topic models, as well as the best model against known state-of-the-art approaches to selectional preferences.

Experiments | Figure 3 plots the precision-recall curve for the pseudo-disambiguation experiment comparing the three different topic models.

Introduction | In this paper we describe a novel approach to computing selectional preferences by making use of unsupervised topic models . |

Introduction | Unsupervised topic models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003) and its variants are characterized by a set of hidden topics, which represent the underlying semantic structure of a document collection.

Introduction | Thus, topic models are a natural fit for modeling our relation data. |

Previous Work | Topic models such as LDA (Blei et al., 2003) and its variants have recently begun to see use in many NLP applications such as summarization (Daume III and Marcu, 2006), document alignment and segmentation (Chen et al., 2009), and inferring class-attribute hierarchies (Reisinger and Pasca, 2009). |

Topic Models for Selectional Prefs. | We present a series of topic models for the task of computing selectional preferences. |

Topic Models for Selectional Prefs. | Readers familiar with topic modeling terminology can understand our approach as follows: we treat each relation as a document whose contents consist of a bag of words corresponding to all the noun phrases observed as arguments of the relation in our corpus.

Topic Models for Selectional Prefs. | 3.5 Advantages of Topic Models |

topic models is mentioned in 12 sentences in this paper.

Topics mentioned in this paper:

- topic models (12)
- WordNet (10)
- LDA (8)

Abstract | Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. |

Abstract | One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other.

Abstract | In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages. |

Introduction | As a robust unsupervised way to perform shallow latent semantic analysis of topics in text, probabilistic topic models (Hofmann, 1999a; Blei et al., 2003b) have recently attracted much attention. |

Introduction | Although many topic models have been proposed and shown to be useful (see Section 2 for more detailed discussion of related work), most of them share a common deficiency: they are designed to work only for monolingual text data and would not work well for extracting cross-lingual latent topics, i.e. |

Introduction | In this paper, we propose a novel topic model, called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) model, which can be used to mine shared latent topics from unaligned text data in different languages.

Related Work | Many topic models have been proposed, and the two basic models are the Probabilistic Latent Semantic Analysis (PLSA) model (Hofmann, 1999a) and the Latent Dirichlet Allocation (LDA) model (Blei et al., 2003b). |

Related Work | They and their extensions have been successfully applied to many problems, including hierarchical topic extraction (Hofmann, 1999b; Blei et al., 2003a; Li and McCallum, 2006), author-topic modeling (Steyvers et al., 2004), contextual topic analysis (Mei and Zhai, 2006), dynamic and correlated topic models (Blei and Lafferty, 2005; Blei and Lafferty, 2006), and opinion analysis (Mei et al., 2007; Branavan et al., 2008). |

Related Work | Some previous work on multilingual topic models assume documents in multiple languages are aligned either at the document level, sentence level or by time stamps (Mimno et al., 2009; Zhao and Xing, 2006; Kim and Khudanpur, 2004; Ni et al., 2009; Wang et al., 2007). |

topic models is mentioned in 17 sentences in this paper.

Topics mentioned in this paper:

- cross-lingual (37)
- topic models (17)
- Latent Semantic (7)

Abstract | We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model.

Background and Motivation | One of the challenges of using a previously trained topic model is that the new document might have a totally new vocabulary or may include many other specific topics, which may or may not exist in the trained model. |

Background and Motivation | A common method is to rebuild a topic model for new sets of documents (Haghighi and Vanderwende, 2009), which has proven to produce coherent summaries. |

Conclusion | We demonstrated that implementing a summary-focused hierarchical topic model to discover sentence structures, together with constructing a discriminative method for inference, can benefit summarization quality on manual and automatic evaluation metrics.

Experiments and Discussions | * HIERSUM: (Haghighi and Vanderwende, 2009) A generative summarization method based on topic models, which uses sentences as an additional level.

Experiments and Discussions | * HybFSum (Hybrid Flat Summarizer): To investigate the performance of the hierarchical topic model, we build another hybrid model using flat LDA (Blei et al., 2003b).

Experiments and Discussions | Compared to the HybFSum built on LDA, both HybHSum1&2 yield better performance, indicating the effectiveness of using a hierarchical topic model in the summarization task.

Introduction | We present a probabilistic topic model on sentence level building on hierarchical Latent Dirichlet Allocation (hLDA) (Blei et al., 2003a), which is a generalization of LDA (Blei et al., 2003b). |

Summary-Focused Hierarchical Model | We build a summary-focused hierarchical probabilistic topic model, sumHLDA, for each document cluster at sentence level, because it enables capturing expected topic distributions in given sentences directly from the model.

Summary-Focused Hierarchical Model | Please refer to (Blei et al., 2003b) and (Blei et al., 2003a) for details and demonstrations of topic models.

topic models is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

- regression model (12)
- unigram (11)
- topic model (10)

Experimental Setup | The underlying topic model was trained with 1,000 topics using only content words (i.e., nouns, verbs, and adjectives) that appeared |

Extractive Caption Generation | Probabilistic Similarity Recall that the backbone of our image annotation model is a topic model with images and documents represented as a probability distribution over latent topics. |

Image Annotation | The basic idea underlying LDA, and topic models in general, is that each document is composed of a probability distribution over topics, where each topic represents a probability distribution over words. |

Results | As can be seen, the probabilistic models (KL and JS divergence) outperform word overlap and cosine similarity (all differences are statistically significant, p < 0.01). They make use of the same topic model as the image annotation model, and are thus able to select sentences that cover common content.

topic models is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- phrase-based (15)
- TER (10)
- extractive system (6)