Index of papers in Proc. ACL 2013 that mention
  • topic models
Deveaud, Romain and SanJuan, Eric and Bellot, Patrice
Abstract
Current topic modeling approaches for Information Retrieval do not allow query-oriented latent topics to be modeled explicitly.
Abstract
We propose a model-based feedback approach that learns Latent Dirichlet Allocation topic models on the top-ranked pseudo-relevant feedback, and we measure the semantic coherence of those topics.
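A minimal sketch of the kind of pipeline this abstract describes, with scikit-learn standing in for the authors' implementation: fit LDA on the top-ranked pseudo-relevant feedback documents and score each topic with a simple UMass-style coherence computed over those same documents. The topic count, number of top words, and coherence variant are illustrative assumptions, not details from the paper.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def feedback_topic_coherence(feedback_docs, n_topics=5, top_n=10):
        """Fit LDA on pseudo-relevant feedback docs; return (top words, coherence) per topic."""
        vec = CountVectorizer(stop_words="english")
        X = vec.fit_transform(feedback_docs)                  # doc-term counts
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
        vocab = np.asarray(vec.get_feature_names_out())
        present = (X > 0).toarray()                           # binary doc-term incidence
        results = []
        for topic in lda.components_:                         # one word distribution per topic
            top = np.argsort(topic)[::-1][:top_n]
            score = 0.0
            for i, wi in enumerate(top):
                for wj in top[:i]:                            # wj is ranked above wi
                    co = np.logical_and(present[:, wi], present[:, wj]).sum()
                    score += np.log((co + 1.0) / present[:, wj].sum())
            results.append((vocab[top].tolist(), score))
        return results                                        # higher score = more coherent topic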
Introduction
Based on the words used within a document, topic models learn topic level relations by assuming that the document covers a small set of concepts.
Introduction
This is one of the reasons for the intensive use of topic models (and especially LDA) in current research in Natural Language Processing (NLP) related areas.
Introduction
From that perspective, topic models seem attractive in the sense that they can provide a descriptive and intuitive representation of concepts.
Topic-Driven Relevance Models
Instead of viewing θ as a set of document language models that are likely to contain topical information about the query, we take a probabilistic topic modeling approach.
Topic-Driven Relevance Models
Figure 1: Semantic coherence of the topic models as a function of the number N of feedback documents.
topic models is mentioned in 20 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Abstract
We present a topic model based approach to answer these questions.
Abstract
Since agreement and disagreement expressions are usually multi-word phrases, we propose to employ a ranking method to identify highly relevant phrases prior to topic modeling.
Empirical Evaluation
For quantitative evaluation, topic models are often compared using perplexity.
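As a hedged illustration of the comparison this sentence refers to (not the authors' setup), two LDA models can be ranked by perplexity on held-out documents, here using scikit-learn; the split and topic counts are arbitrary choices for the sketch.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.model_selection import train_test_split

    def compare_by_perplexity(texts, topic_counts=(10, 30)):
        train, held_out = train_test_split(texts, test_size=0.2, random_state=0)
        vec = CountVectorizer(stop_words="english")
        X_train, X_held = vec.fit_transform(train), vec.transform(held_out)
        for k in topic_counts:
            lda = LatentDirichletAllocation(n_components=k, random_state=0).fit(X_train)
            print(k, lda.perplexity(X_held))      # lower held-out perplexity is better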
Introduction
In our earlier work (Mukherjee and Liu, 2012a), we proposed three topic models to mine contention points, which also extract AD-expressions.
Introduction
In this paper, we further improve the work by coupling an information retrieval method to rank good candidate phrases with topic modeling in order to discover more accurate AD-expressions.
Phrase Ranking based on Relevance
Topics in most topic models like LDA are usually unigram distributions.
Related Work
Topic models: Our work is also related to topic modeling and joint modeling of topics and other information as we jointly model several aspects of discussions/debates.
Related Work
Topic models like pLSA (Hofmann, 1999) and LDA (Blei et al., 2003) have proved to be very successful in mining topics from large text collections.
Related Work
There have been various extensions to multi-grain (Titov and McDonald, 2008), labeled (Ramage et al., 2009), and sequential (Du et al., 2010) topic models.
topic models is mentioned in 12 sentences in this paper.
Sarioglu, Efsun and Yadav, Kabir and Choi, Hyeong-Ah
Abstract
In addition to regular text classification, we utilized topic modeling of the entire dataset in various ways.
Abstract
Topic modeling of the corpora provides interpretable themes that exist in these reports.
Abstract
A binary topic model was also built as an unsupervised classification approach with the assumption that each topic corresponds to a class.
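A minimal sketch of that binary-topic idea, assuming scikit-learn as a stand-in implementation: fit LDA with two topics on the reports and label each document by its dominant topic. How each topic is mapped to a clinical class is left open here (e.g., resolved by inspecting top words); that mapping step is an assumption of this illustration, not a detail from the paper.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def binary_topic_labels(reports):
        vec = CountVectorizer(stop_words="english")
        X = vec.fit_transform(reports)
        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        return np.argmax(lda.transform(X), axis=1)    # unsupervised 0/1 "class" per report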
Background
2.2 Topic Modeling
Background
Topic modeling is an unsupervised learning algorithm that can automatically discover themes of a document collection.
Background
Either sampling methods such as Gibbs Sampling (Griffiths and Steyvers, 2004) or optimization methods such as variational Bayes approximation (Asuncion et al., 2009) can be used to train a topic model based on LDA.
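For concreteness, here is a compact collapsed Gibbs sampler for LDA in plain NumPy, one of the two training routes the sentence above mentions; the hyperparameters and fixed iteration count are simplified assumptions for the sketch.

    import numpy as np

    def lda_gibbs(docs, V, K=10, alpha=0.1, beta=0.01, iters=200, seed=0):
        """docs: list of lists of word ids in [0, V). Returns doc-topic and topic-word counts."""
        rng = np.random.default_rng(seed)
        ndk = np.zeros((len(docs), K))            # document-topic counts
        nkw = np.zeros((K, V))                    # topic-word counts
        nk = np.zeros(K)                          # topic totals
        z = []                                    # topic assignment for every token
        for d, doc in enumerate(docs):
            zd = rng.integers(K, size=len(doc))
            z.append(zd)
            for w, k in zip(doc, zd):
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        for _ in range(iters):
            for d, doc in enumerate(docs):
                for n, w in enumerate(doc):
                    k = z[d][n]                   # remove the token's current assignment
                    ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                    p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                    k = rng.choice(K, p=p / p.sum())
                    z[d][n] = k                   # resample and restore counts
                    ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        return ndk, nkw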
Introduction
In this study, we developed several topic modeling based classification systems for clinical reports.
Introduction
Topic modeling is an unsupervised technique that can automatically identify themes from a given set of documents and find topic distributions of each document.
Introduction
Therefore, topic model output of patient reports could contain very useful clinical information.
Related Work
For text classification, topic modeling techniques have been utilized in various ways.
Related Work
In our study, we removed the most frequent and infrequent words to have a manageable vocabulary size but we did not utilize topic model output for this purpose.
topic models is mentioned in 23 sentences in this paper.
Zhu, Jun and Zheng, Xun and Zhang, Bo
Abstract
Supervised topic models with a logistic likelihood have two issues that potentially limit their practical use: 1) response variables are usually over-weighted by document word counts; and 2) existing variational inference methods make strict mean-field assumptions.
Introduction
First, we present a general framework of Bayesian logistic supervised topic models with a regularization parameter to better balance response variables and words.
Introduction
Second, to solve the intractable posterior inference problem of the generalized Bayesian logistic supervised topic models, we present a simple Gibbs sampling algorithm by exploring the ideas of data augmentation (Tanner and Wong, 1987; van Dyk and Meng, 2001; Holmes and Held, 2006).
Introduction
More specifically, we extend Polson’s method for Bayesian logistic regression (Polson et al., 2012) to the generalized logistic supervised topic models, which are much more challenging.
Logistic Supervised Topic Models
We now present the generalized Bayesian logistic supervised topic models.
Logistic Supervised Topic Models
A logistic supervised topic model consists of two parts: an LDA model (Blei et al., 2003) for describing the words W = {w_d}_{d=1}^D, where w_d = {w_dn}_{n=1}^{N_d} denotes the words within document d, and a logistic classifier for considering the supervising signal y = {y_d}_{d=1}^D.
Logistic Supervised Topic Models
Logistic classifier: To consider binary supervising information, a logistic supervised topic model (e.g., sLDA) builds a logistic classifier using the topic representations as input features
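The paper trains this model jointly (with the Gibbs sampler described in its Introduction); purely as a rough stand-in for the idea in the sentence above, one can fit LDA first and feed the inferred per-document topic proportions into a logistic classifier. Every component choice below is an illustrative assumption.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def supervised_topic_classifier(train_texts, y_train, n_topics=20):
        # counts -> topic proportions -> logistic classifier over topic features
        model = make_pipeline(
            CountVectorizer(stop_words="english"),
            LatentDirichletAllocation(n_components=n_topics, random_state=0),
            LogisticRegression(max_iter=1000),
        )
        return model.fit(train_texts, y_train)    # model.predict(new_texts) gives binary labels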
topic models is mentioned in 21 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Abstract
It can efficiently handle semantic ambiguity by extending standard topic models with two new features.
Experiments
Since MTR provides a mixture of properties adapted from earlier models, we present performance benchmarks on tag clustering using: (i) LDA; (ii) Hidden Markov Topic Model HMTM (Gruber et al., 2005); and, (iii) w-LDA (Petterson et al., 2010) that uses word features as priors in LDA.
Experiments
Each topic model uses Gibbs sampling for inference and parameter learning.
Experiments
For fair comparison, each benchmark topic model is provided with prior information on word-semantic tag distributions based on the labeled training data; hence, each of the K latent topics is assigned to one of the K semantic tags at the beginning of Gibbs sampling.
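A hedged sketch of that initialization step (the function name and data layout are assumptions, not the paper's code): tokens that carry a labeled semantic tag start with that tag's index as their topic assignment, so topic k is aligned with tag k from the first Gibbs sweep; unlabeled tokens start at random.

    import numpy as np

    def tag_informed_init(tag_sequences, K, seed=0):
        """tag_sequences: per document, a list with a tag id in [0, K) or None for each token."""
        rng = np.random.default_rng(seed)
        return [np.array([t if t is not None else rng.integers(K) for t in doc_tags])
                for doc_tags in tag_sequences]     # initial z assignments for a Gibbs sampler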
Introduction
Our first contribution is a new probabilistic topic model, Markov Topic Regression (MTR), which uses rich features to capture the degree of association between words and semantic tags.
Related Work and Motivation
Standard topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters words together that appear in a similar global context.
Related Work and Motivation
Recent topic models consider word sequence information in documents (Griffiths et al., 2005; Moon et al., 2010).
Related Work and Motivation
Thus, we build a semantically rich topic model, MTR, using word context features as side information.
topic models is mentioned in 10 sentences in this paper.
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Abstract
Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training conversations, which is used to bias translation choices towards the current context.
Corpus Data and Baseline SMT
We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models.
Corpus Data and Baseline SMT
We use the English side of these conversations for training LDA topic models .
Incremental Topic-Based Adaptation
4.1 Topic modeling with LDA
Incremental Topic-Based Adaptation
The full conversation history is available for training the topic models and estimating topic distributions in the training set.
Incremental Topic-Based Adaptation
We use Mallet (McCallum, 2002) for training topic models and inferring topic distributions.
Introduction
We begin by building a monolingual latent Dirichlet allocation (LDA) topic model on the training conversations (each conversation corresponds to a “document” in the LDA paradigm).
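The paper itself uses Mallet for this step; as an illustrative sketch only (scikit-learn standing in, with arbitrary parameter choices), each conversation is treated as one document, and a test conversation is scored against every training conversation by cosine similarity over inferred topic mixtures, which is the kind of similarity measure used to bias translation choices.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def conversation_similarities(train_convs, test_conv, n_topics=50):
        vec = CountVectorizer(stop_words="english")
        X_train = vec.fit_transform(train_convs)              # one row per training conversation
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X_train)
        theta_train = lda.transform(X_train)                  # topic mixture per conversation
        theta_test = lda.transform(vec.transform([test_conv]))[0]
        return theta_train @ theta_test / (
            np.linalg.norm(theta_train, axis=1) * np.linalg.norm(theta_test) + 1e-12)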
Relation to Prior Work
To avoid the need for hard decisions about domain membership, some have used topic modeling to improve SMT performance, e.g., using latent semantic analysis (Tam et al., 2007) or ‘biTAM’ (Zhao and Xing, 2006).
Relation to Prior Work
(2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document.
Relation to Prior Work
While our proposed approach also employs monolingual LDA topic models, it deviates from the above methods in the following important ways.
topic models is mentioned in 10 sentences in this paper.
O'Connor, Brendan and Stewart, Brandon M. and Smith, Noah A.
Abstract
Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information—temporal and dyad dependence—to infer latent event classes.
Experiments
This is an example of individual lexical features outperforming a topic model for a predictive task, because the topic model’s dimension reduction obscures important indicators from individual words.
Experiments
Similarly, Gerrish and Blei (2011) found that word-based regression outperformed a customized topic model when predicting Congressional bill passage, and Eisen-
Experiments
In the latter, a problem-specific topic model did best.
Introduction
We use syntactic preprocessing and a logistic normal topic model, including latent temporal smoothing on the political context prior.
Model
Thus the language model is very similar to a topic model’s generation of token topics and wordtypes.
Model
This simple logistic normal prior is, in terms of topic models, analogous to the asymmetric Dirichlet prior version of LDA in Wallach et al.
topic models is mentioned in 7 sentences in this paper.
Lu, Xiaoming and Xie, Lei and Leung, Cheung-Chi and Ma, Bin and Li, Haizhou
Experimental setup
We separated this corpus into three non-overlapping sets: a training set of 500 programs for parameter estimation in topic modeling and LE, a development set of 133 programs for empirical tuning and a test set of 400 programs for performance evaluation.
Experimental setup
When evaluating the effects of different size of the training set, the number of latent topics in topic modeling process was set to 64.
Experimental setup
When evaluating the effects of different number of latent topics in topic modeling computation, we fixed the size of the training set to 500 news programs and changed the number of latent topics from 16 to 256.
Introduction
To deal with these problems, some topic model techniques, which provide conceptual-level matching, have been introduced to the text and story segmentation task (Hearst, 1997).
topic models is mentioned in 5 sentences in this paper.
Reschke, Kevin and Vogel, Adam and Jurafsky, Dan
Conclusion
Using topic models to discover subtypes of businesses, a domain-specific sentiment lexicon, and a number of new techniques for increasing precision in sentiment aspect extraction yields attributes that give a rich representation of the restaurant domain.
Evaluation
‘Top-level’ repeatedly queries the user’s top-level category preferences, ‘Subtopic’ additionally uses our topic modeling subcategories, and ‘All’ uses these plus the aspects extracted from reviews.
Generating Questions from Reviews
Using these topic models, we assign a business
Generating Questions from Reviews
We use the Topic Modeling Toolkit implementation: http://nlp.stanford.edu/software/tmt
Introduction
The framework makes use of techniques from topic modeling and sentiment-based aspect extraction to identify fine-grained attributes for each business.
topic models is mentioned in 5 sentences in this paper.
Zhu, Zede and Li, Miao and Chen, Lei and Yang, Zhenxin
Building comparable corpora
generate the bilingual topic model from the
Introduction
Preiss (2012) transformed the source-language topic model to the target language and classified the probability distribution of topics in the same language; the shortcoming is that errors in model translation seriously hamper the quality of the comparable corpora.
Introduction
Mimno et al. (2009) adapted a monolingual topic model to a bilingual topic model in which the documents of a concept unit in different languages were assumed to share an identical topic distribution.
Introduction
Bilingual topic models are widely adopted to mine translation equivalents from multilingual documents (Mimno et al., 2009; Ivan et al., 2011).
topic models is mentioned in 5 sentences in this paper.
Li, Fangtao and Gao, Yang and Zhou, Shuchang and Si, Xiance and Dai, Decheng
Experiments
The larger question-thread dataset is employed for feature learning, such as translation model and topic model training.
Proposed Features
Topic Model
Proposed Features
To reduce the false negatives of word mismatch in the vector model, we also use topic models to extend matching to the semantic topic level.
Proposed Features
A topic model, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), considers a collection of documents with K latent topics, where K is much smaller than the number of words.
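A hedged sketch of how such a model extends matching to the topic level (the function, parameters, and use of scikit-learn are illustrative assumptions, not the paper's implementation): project the question and a candidate answer into the K-dimensional topic space learned from a document collection and compare them there, so related texts can match even when their surface words differ.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def topic_level_match(question, answer, collection, n_topics=50):
        vec = CountVectorizer(stop_words="english")
        X = vec.fit_transform(collection)
        lda = LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X)
        q = lda.transform(vec.transform([question]))[0]       # K-dimensional topic vector
        a = lda.transform(vec.transform([answer]))[0]
        return float(q @ a / (np.linalg.norm(q) * np.linalg.norm(a) + 1e-12))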
topic models is mentioned in 4 sentences in this paper.
Melamud, Oren and Berant, Jonathan and Dagan, Ido and Goldberger, Jacob and Szpektor, Idan
Discussion and Future Work
While most works on context-insensitive predicate inference rules, such as DIRT (Lin and Pantel, 2001), are based on word-level similarity measures, almost all prior models addressing context-sensitive predicate inference rules are based on topic models (except for (Pantel et al., 2007), which was outperformed by later models).
Discussion and Future Work
In addition, (Dinu and Lapata, 2010a) adapted the predicate inference topic model from (Dinu and Lapata, 2010b) to compute lexical similarity in context.
Introduction
This way, we calculate similarity over vectors in the original word space, while biasing them towards the given context via a topic model.
Two-level Context-sensitive Inference
Our model follows the general DIRT scheme while extending it to handle context-sensitive scoring of rule applications, addressing the scenario dealt with by the context-sensitive topic models.
topic models is mentioned in 4 sentences in this paper.
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
RSP: A Random Walk Model for SP
Since RSP falls into the unsupervised distributional approach, we compare it with previous similarity-based methods and unsupervised generative topic models.
RSP: A Random Walk Model for SP
Ó Séaghdha (2010) applies topic models for SP induction with three variations: LDA, Rooth-LDA, and Dual-LDA; Ritter et al.
RSP: A Random Walk Model for SP
We use the Matlab Topic Modeling Toolbox for the inference of latent topics.
Related Work 2.1 WordNet-based Approach
Recently, more sophisticated methods have been developed for SP based on topic models, where the latent variables (topics) take the place of semantic classes and distributional clusterings (Ó Séaghdha, 2010; Ritter et al., 2010).
topic models is mentioned in 4 sentences in this paper.
Baldwin, Tyler and Li, Yunyao and Alexe, Bogdan and Stanoi, Ioana R.
Abstract
To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling.
Term Ambiguity Detection (TAD)
To do so, we utilized the popular Latent Dirichlet Allocation (LDA) (Blei et al., 2003) topic modeling method.
Term Ambiguity Detection (TAD)
Following standard procedure, stopwords and infrequent words were removed before topic modeling was performed.
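A small illustrative snippet of that preprocessing, assuming scikit-learn and an arbitrary frequency threshold: stopwords are dropped via a stop list and infrequent words via a minimum document frequency before LDA is fit.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    def fit_lda_after_pruning(docs, n_topics=10, min_df=5):
        # stoplist removal + cutoff for words seen in fewer than min_df documents
        vec = CountVectorizer(stop_words="english", min_df=min_df)
        X = vec.fit_transform(docs)
        return LatentDirichletAllocation(n_components=n_topics, random_state=0).fit(X), vec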
topic models is mentioned in 3 sentences in this paper.
Huang, Hongzhao and Wen, Zhen and Yu, Dian and Ji, Heng and Sun, Yizhou and Han, Jiawei and Li, He
Experiments
We also attempted using a topic modeling approach to detect target candidates.
Experiments
As shown in Table 9 (K is the number of predefined topics), PLSA is not quite effective, mainly because traditional topic modeling approaches do not perform well on short texts from social media.
Target Candidate Identification
For comparison, we also attempted a topic modeling approach to detect target candidates, as shown in Section 5.3.
topic models is mentioned in 3 sentences in this paper.
Ravi, Sujith
Bayesian MT Decipherment via Hash Sampling
Firstly, we would like to include as many features as possible to represent the source/target words in our framework besides simple bag-of-words context similarity (for example, left-context, right-context, and other general-purpose features based on topic models, etc.).
Feature-based representation for Source and Target
Similarly, we can add other features based on topic models, orthography (Haghighi et al., 2008), temporal (Klementiev et al., 2012), etc.
Feature-based representation for Source and Target
We note that the new sampling framework is easily extensible to many additional feature types (for example, monolingual topic model features, etc.)
topic models is mentioned in 3 sentences in this paper.