Index of papers in Proc. ACL 2014 that mention
  • topic models
Xiong, Deyi and Zhang, Min
Abstract
We build a broad-coverage sense tagger based on a nonparametric Bayesian topic model that automatically learns sense clusters for words in the source language.
Introduction
We use a nonparametric Bayesian topic model based WSI to infer word senses for source words in our training, development and test set.
Related Work
For ease of comparison, we roughly divide them into 4 categories: 1) WSD for SMT, 2) topic-based WSI, 3) topic model for SMT and 4) lexical selection.
Related Work
Brody and Lapata (2009)’s work is the first attempt to approach WSI via topic modeling.
Related Work
They adapt LDA to word sense induction by building one topic model per word type.
WSI-Based Broad-Coverage Sense Tagger
Recently, we have also witnessed that WSI is cast as a topic modeling problem where the sense clusters of a word type are considered as underlying topics (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012).
WSI-Based Broad-Coverage Sense Tagger
We follow this line to tailor a topic modeling framework to induce word senses for our large-scale training data.
WSI-Based Broad-Coverage Sense Tagger
We can induce topics on this corpus for each pseudo document via topic modeling approaches.
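As a concrete illustration of the pseudo-document setup described above (one topic model per word type, with topics read as sense clusters), here is a minimal sketch using gensim's LDA. The window size, topic count, target word, and toy sentences are illustrative assumptions, not the configuration used in the paper.

```python
from gensim import corpora, models

def induce_senses_for_word(sentences, target, window=5, num_senses=3):
    """Build one pseudo-document per occurrence of `target` from its context
    window, then fit a per-word-type LDA whose topics act as induced senses
    (a toy sketch, not the paper's pipeline)."""
    pseudo_docs = []
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
                pseudo_docs.append(context)
    dictionary = corpora.Dictionary(pseudo_docs)
    corpus = [dictionary.doc2bow(d) for d in pseudo_docs]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_senses,
                          passes=20)
    return lda, dictionary

# Hypothetical sentences containing the ambiguous word "bank".
sentences = [["the", "bank", "raised", "interest", "rates"],
             ["we", "walked", "along", "the", "river", "bank"],
             ["the", "bank", "approved", "the", "loan"]]
lda, dictionary = induce_senses_for_word(sentences, "bank")
print(lda.show_topics(num_topics=3, num_words=4))
```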
topic models is mentioned in 17 sentences in this paper.
Baumel, Tal and Cohen, Raphael and Elhadad, Michael
Abstract
We present an algorithm for Query-Chain Summarization based on a new LDA topic model variant.
Algorithms
We developed a novel Topic Model to identify words that are associated to the current query and not shared with the previous queries.
Algorithms
Figure 3 Plate Model for Our Topic Model
Algorithms
We implemented inference over this topic model using Gibbs Sampling (we distribute the code of the sampler together with our dataset).
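The sampler released with the paper targets the query-chain variant; for orientation only, the sketch below is a generic collapsed Gibbs sampler for plain LDA, with arbitrarily chosen hyperparameters and iteration counts, and it makes no attempt to model the query-specific word distributions.

```python
import numpy as np

def gibbs_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA.
    docs: list of lists of word ids in [0, V)."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))          # document-topic counts
    nkw = np.zeros((K, V))                  # topic-word counts
    nk = np.zeros(K)                        # topic totals
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):          # initialise counts
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                 # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k                 # add back with the new topic
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    phi = (nkw + beta) / (nk[:, None] + V * beta)    # topic-word distributions
    theta = (ndk + alpha) / (ndk.sum(1, keepdims=True) + K * alpha)
    return phi, theta
```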
Introduction
We introduce a new algorithm to address the task of Query-Chain Focused Summarization, based on a new LDA topic model variant, and present an evaluation which demonstrates it improves on these baselines.
Previous Work
As evidenced since (Daume and Marcu, 2006), Bayesian techniques have proven effective at this task: we construct a latent topic model on the basis of the document set and the query.
Previous Work
This topic model effectively serves as a query expansion mechanism, which helps assess the relevance of individual sentences to the original query.
Previous Work
In recent years, three major techniques have emerged to perform multi-document summarization: graph-based methods such as LexRank (Erkan and Radev, 2004) for multi document summarization and Biased-LexRank (Otterbacher et al., 2008) for query focused summarization, language model methods such as KLSum (Haghighi and Vanderwende, 2009) and variants of KLSum based on topic models such as BayesSum (Daume and Marcu, 2006) and TopicSum (Haghighi and Vanderwende, 2009).
topic models is mentioned in 12 sentences in this paper.
Chen, Zhiyuan and Mukherjee, Arjun and Liu, Bing
Abstract
Topic modeling is a popular method for the task.
Abstract
However, unsupervised topic models often generate incoherent aspects.
Abstract
Such knowledge can then be used by a topic model to discover more coherent aspects.
Introduction
Recently, topic models have been extensively applied to aspect extraction because they can perform both subtasks at the same time while other
Introduction
Traditional topic models such as LDA (Blei et al., 2003) and pLSA (Hofmann, 1999) are unsupervised methods for extracting latent topics in text documents.
Introduction
However, researchers have shown that fully unsupervised models often produce incoherent topics because the objective functions of topic models do not always correlate well with human judgments (Chang et al., 2009).
Related Work
To extract and group aspects simultaneously, topic models have been applied by researchers (Branavan et al., 2008, Brody and Elhadad, 2010, Chen et al., 2013b, Fang and Huang, 2012, He et al., 2011, Jo and Oh, 2011, Kim et al., 2013, Lazaridou et al., 2013, Li et al., 2011, Lin and He, 2009, Lu et al., 2009, Lu et al., 2012, Lu and Zhai, 2008, Mei et al., 2007, Moghaddam and Ester, 2013, Mukherjee and Liu, 2012, Sauper and Barzilay, 2013, Titov and McDonald, 2008, Wang et al., 2010, Zhao et al., 2010).
topic models is mentioned in 26 sentences in this paper.
Peng, Nanyun and Wang, Yiming and Dredze, Mark
Abstract
Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages.
Code-Switching
By collecting the entire conversation into a single document we provide the topic model with additional content.
Code-Switching
To train a polylingual topic model on social media, we make two modifications to the model of Mimno et al.
Code-Switching
First, polylingual topic models require parallel or comparable corpora in which each document has an assigned language.
Introduction
Topic models (Blei et al., 2003) have become standard tools for analyzing document collections, and topic analyses are quite common for social media (Paul and Dredze, 2011; Zhao et al., 2011; Hong and Davison, 2010; Ramage et al., 2010; Eisenstein et al., 2010).
Introduction
A good candidate for multilingual topic analyses are polylingual topic models (Mimno et al., 2009), which learn topics for multiple languages, creating tuples of language specific distributions over monolingual vocabularies for each topic.
Introduction
Polylingual topic models enable cross language analysis by grouping documents by topic regardless of language.
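A minimal generative sketch of a polylingual topic model in the spirit of Mimno et al. (2009), as described in the excerpts above: every tuple of comparable documents shares one topic distribution, while each topic keeps a separate word distribution per language. The function and parameter names are invented for illustration.

```python
import numpy as np

def generate_polylingual_corpus(n_tuples, vocab_sizes, K, doc_len=50,
                                alpha=0.1, beta=0.01, seed=0):
    """Generative sketch of a polylingual topic model: each tuple of
    comparable documents shares one topic distribution theta, and every
    topic owns a separate word distribution per language."""
    rng = np.random.default_rng(seed)
    # one topic-word distribution per (language, topic) pair
    phi = [rng.dirichlet(np.full(V, beta), size=K) for V in vocab_sizes]
    corpus = []
    for _ in range(n_tuples):
        theta = rng.dirichlet(np.full(K, alpha))      # shared across languages
        tuple_docs = []
        for lang, V in enumerate(vocab_sizes):
            z = rng.choice(K, size=doc_len, p=theta)  # topic per token
            words = [rng.choice(V, p=phi[lang][k]) for k in z]
            tuple_docs.append(words)
        corpus.append(tuple_docs)
    return corpus, phi

corpus, phi = generate_polylingual_corpus(n_tuples=5, vocab_sizes=[1000, 800], K=10)
```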
topic models is mentioned in 13 sentences in this paper.
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan
Abstract
We examine Arora et al.’s anchor words algorithm for topic modeling and develop new, regularized algorithms that not only mathematically resemble Gaussian and Dirichlet priors but also improve the interpretability of topic models.
Adding Regularization
While the distribution over topics is typically Dirichlet, Dirichlet distributions have been replaced by logistic normals in topic modeling applications (Blei and Lafferty, 2005) and for probabilistic grammars of language (Cohen and Smith, 2009).
Anchor Words: Scalable Topic Models
In this section, we briefly review the anchor method and place it in the context of topic model inference.
Anchor Words: Scalable Topic Models
Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models
Like other topic modeling algorithms, the output of the anchor method is the topic word distributions A with size V × K, where K is the total number of topics desired, a parameter of the algorithm.
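A rough sketch of the ingredients mentioned above, assuming a dense document-term count matrix: it builds the row-normalised word co-occurrence matrix and greedily selects anchor rows by Gram-Schmidt, in the spirit of the anchor method but without the random projection or the recovery step that actually produces the V × K matrix A.

```python
import numpy as np

def cooccurrence(doc_term):
    """Row-normalised word co-occurrence matrix from a dense document-term
    count matrix (D x V).  A simplification: real implementations correct
    for within-document sampling effects."""
    Q = doc_term.T @ doc_term            # V x V co-occurrence counts
    np.fill_diagonal(Q, 0)
    row_sums = Q.sum(axis=1, keepdims=True)
    return Q / np.maximum(row_sums, 1)

def greedy_anchors(Q_bar, K):
    """Pick K anchor word indices: repeatedly choose the row of Q_bar
    furthest from the span of the rows already chosen (Gram-Schmidt)."""
    anchors = []
    R = Q_bar.astype(float).copy()
    for _ in range(K):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        anchors.append(i)
        v = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ v, v)       # remove the new direction
    return anchors

rng = np.random.default_rng(0)
doc_term = rng.integers(0, 5, size=(200, 50))   # 200 toy docs, 50 word types
Q_bar = cooccurrence(doc_term)
print(greedy_anchors(Q_bar, K=5))               # indices of 5 anchor words
```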
Introduction
Topic models are of practical and theoretical interest.
Introduction
Modern topic models are formulated as a latent variable model.
Introduction
Unlike an HMM, topic models assume that each document is an admixture of these hidden components called topics.
topic models is mentioned in 19 sentences in this paper.
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Abstract
Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.
Abstract
We propose new polylingual tree-based topic models to extract domain knowledge that considers both source and target languages and derive three different inference schemes.
Introduction
Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections.
Introduction
Topic models discover—without any supervision—the primary themes presented in a dataset: the namesake topics.
Introduction
Topic models have two primary applications: to aid human exploration of corpora (Chang et al., 2009) or serve as a low-dimensional representation for downstream applications.
topic models is mentioned in 47 sentences in this paper.
Kobayashi, Hayato
Abstract
Based on the assumption that a corpus follows Zipf’s law, we derive tradeoff formulae of the perplexity of k-gram models and topic models with respect to the size of the reduced vocabulary.
Experiments
We used ZipfMix only for the experiments on topic models.
Experiments
Therefore, we can use TheoryAve as a heuristic function for estimating the perplexity of topic models.
Introduction
Removing low-frequency words from a corpus (often called cutoff) is a common practice to save on the computational costs involved in learning language models and topic models.
Introduction
In the case of topic models, the intuition is that low-frequency words do not make a large contribution to the statistics of the models.
Introduction
Actually, when we try to roughly analyze a corpus with topic models, a reduced corpus is enough for the purpose (Steyvers and Griffiths, 2007).
Perplexity on Reduced Corpora
3.3 Perplexity of Topic Models
Perplexity on Reduced Corpora
In this section, we consider the perplexity of the widely used topic model, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), by using the notation given in (Griffiths and Steyvers, 2004).
Preliminaries
Perplexity is a widely used evaluation measure of k-gram models and topic models.
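For reference, a minimal sketch of the standard held-out perplexity computation for a topic model, assuming estimated document-topic (theta) and topic-word (phi) matrices; this is the generic definition, not the tradeoff formulae derived in the paper.

```python
import numpy as np

def topic_model_perplexity(docs, theta, phi):
    """Held-out perplexity of a topic model: exp of the negative average
    per-token log-likelihood, where p(w | d) = sum_k theta[d, k] * phi[k, w].
    docs: list of lists of word ids; theta: D x K; phi: K x V."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        p_w = theta[d] @ phi          # length-V vector of word probabilities
        log_lik += np.sum(np.log(p_w[doc]))
        n_tokens += len(doc)
    return float(np.exp(-log_lik / n_tokens))
```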
topic models is mentioned in 12 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Introduction
Topic modeling is a useful mechanism for discovering and characterizing various semantic concepts embedded in a collection of documents.
Introduction
In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007).
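A toy sketch of the off-the-shelf usage described above, with gensim's LdaModel standing in for the toolkit (HTMM is not covered); the documents, sentence, and topic count are invented, and the IR-based context enrichment step described later is omitted.

```python
from gensim import corpora, models

# Hypothetical toy collection; real use would train on a large monolingual corpus.
docs = [["trade", "tariff", "export", "market"],
        ["match", "goal", "league", "season"],
        ["tariff", "import", "trade", "deal"],
        ["goal", "season", "coach", "league"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=20,
                      random_state=0)

# Infer a topic distribution for a single sentence from its content words;
# words outside the dictionary are simply dropped by doc2bow.
sentence = ["export", "tariff", "agreement"]
bow = dictionary.doc2bow(sentence)
print(lda.get_document_topics(bow, minimum_probability=0.0))
```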
Introduction
Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected.
Related Work
Topic modeling was first leveraged to improve SMT performance in (Zhao and Xing, 2006; Zhao and Xing, 2007).
Related Work
Another direction of approaches leveraged topic modeling techniques for domain adaptation.
Related Work
Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM.
topic models is mentioned in 7 sentences in this paper.
Lau, Jey Han and Cook, Paul and McCarthy, Diana and Gella, Spandana and Baldwin, Timothy
Background and Related Work
Recent work on finding novel senses has tended to focus on comparing diachronic corpora (Sagi et al., 2009; Cook and Stevenson, 2010; Gulordava and Baroni, 2011) and has also considered topic models (Lau et al., 2012).
Introduction
In this paper, we propose a method which uses topic models to estimate word sense distributions.
Introduction
Topic models have been used for WSD in a number of studies (Boyd-Graber et al., 2007; Li et al., 2010; Lau et al., 2012; Preiss and Stevenson, 2013; Cai et al., 2007; Knopp et al., 2013), but our work extends significantly on this earlier work in focusing on the acquisition of prior word sense distributions (and predominant senses).
Introduction
(2004b), the use of topic models makes this possible, using topics as a proxy for sense (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012).
Methodology
(2006)), a nonparametric variant of a Latent Dirichlet Allocation topic model (Blei et al., 2003) where the model automatically optimises the number of topics in a fully-unsupervised fashion over the training data.
Methodology
To learn the senses of a target lemma, we train a single topic model per target lemma.
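A minimal sketch of the per-lemma setup described above, using gensim's HDP implementation as the nonparametric topic model so the number of sense clusters need not be fixed in advance; the target lemma and context windows are toy examples, not the paper's data or preprocessing.

```python
from gensim import corpora, models

def induce_senses(usages):
    """Train one nonparametric (HDP) topic model for a single target lemma.
    `usages` is a list of tokenised context windows, one per occurrence of
    the lemma; the induced topics act as sense clusters."""
    dictionary = corpora.Dictionary(usages)
    corpus = [dictionary.doc2bow(u) for u in usages]
    hdp = models.HdpModel(corpus, id2word=dictionary)
    return hdp, dictionary

# Hypothetical usages of the lemma "bank".
usages = [["deposit", "money", "account", "loan"],
          ["river", "water", "shore", "fishing"],
          ["loan", "interest", "account", "credit"]]
hdp, dictionary = induce_senses(usages)
print(hdp.print_topics(num_topics=3, num_words=4))
```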
topic models is mentioned in 6 sentences in this paper.
Wintrode, Jonathan and Khudanpur, Sanjeev
Motivation
Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation
Indeed there is work in the literature that shows that various topic models, latent or otherwise, can be useful for improving lan-
Motivation
In the information retrieval community, clustering and latent topic models have yielded improvements over traditional vector space models.
topic models is mentioned in 5 sentences in this paper.
Zhai, Ke and Williams, Jason D
Abstract
Our methods synthesize hidden Markov models (for underlying state) and topic models (to connect words to states).
Introduction
In this paper, we retain the underlying HMM, but assume words are emitted using topic models (TM), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA).
Latent Structure in Dialogues
In other words, instead of generating words via an LM, we generate words from a topic model (TM), where each state maps to a mixture of topics.
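A simplified generative sketch of the idea described above: a hidden Markov model over dialogue states in which each state emits words through a mixture of topics. The parameterisation and variable names are illustrative and do not reproduce the paper's TM-HMM model exactly.

```python
import numpy as np

def generate_dialogue(n_turns, trans, state_topic_mix, topic_word,
                      words_per_turn=8, seed=0):
    """HMM whose states emit words through a topic model: each hidden state
    maps to a mixture over topics, and each topic is a distribution over
    the vocabulary."""
    rng = np.random.default_rng(seed)
    S, K = state_topic_mix.shape
    V = topic_word.shape[1]
    state = rng.integers(S)
    turns = []
    for _ in range(n_turns):
        topics = rng.choice(K, size=words_per_turn, p=state_topic_mix[state])
        words = [rng.choice(V, p=topic_word[k]) for k in topics]
        turns.append((state, words))
        state = rng.choice(S, p=trans[state])    # move to the next dialogue state
    return turns

trans = np.array([[0.8, 0.2], [0.3, 0.7]])            # 2 dialogue states
state_topic_mix = np.array([[0.9, 0.1], [0.2, 0.8]])  # state-to-topic mixtures
topic_word = np.array([[0.5, 0.3, 0.1, 0.1],          # topic 0 over 4 words
                       [0.1, 0.1, 0.3, 0.5]])         # topic 1 over 4 words
print(generate_dialogue(3, trans, state_topic_mix, topic_word))
```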
Latent Structure in Dialogues
Note that a TM-HMMS model with state-specific topic models (instead of state-specific language models) would be subsumed by TM-HMM, since one topic could be used as the background topic in TM-HMMS.
topic models is mentioned in 4 sentences in this paper.
Frank, Stella and Feldman, Naomi H. and Goldwater, Sharon
Background and overview of models
To demonstrate the benefit of situational information, we develop the Topic-Lexical-Distributional (TLD) model, which extends the LD model by assuming that words appear in situations analogous to documents in a topic model.
Background and overview of models
(2012) found that topics learned from similar transcript data using a topic model were strongly correlated with immediate activities and contexts.
Background and overview of models
training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the ‘documents’ in LDA) that serve as our distinct situations h. As noted above, the learned document-topic distributions θ are treated as observed variables in the TLD model to represent the situational context.
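A small sketch of the preprocessing step described above, assuming gensim's LdaModel: the transcript is cut into fixed-size sections that play the role of LDA documents, and the resulting per-section topic distributions are returned so they can be treated as observed context. The section size and topic count are arbitrary placeholders, not the paper's settings.

```python
from gensim import corpora, models

def situation_topics(utterances, section_size=50, num_topics=10):
    """Split a transcript (a list of token lists) into small sections, train
    LDA over the sections, and return each section's topic distribution."""
    sections = [sum(utterances[i:i + section_size], [])
                for i in range(0, len(utterances), section_size)]
    dictionary = corpora.Dictionary(sections)
    corpus = [dictionary.doc2bow(s) for s in sections]
    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=num_topics,
                          passes=10)
    theta = [lda.get_document_topics(bow, minimum_probability=0.0)
             for bow in corpus]
    return theta
```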
Introduction
However, in our simulations we approximate the environmental information by running a topic model (Blei et al., 2003) over a corpus of child-directed speech to infer a topic distribution for each situation.
topic models is mentioned in 4 sentences in this paper.
Cohen, Shay B. and Collins, Michael
Introduction
(2012) in the context of learning topic models.
Related Work
Recently a number of researchers have developed provably correct algorithms for parameter estimation in latent variable models such as hidden Markov models, topic models, directed graphical models with latent variables, and so on (Hsu et al., 2009; Bailly et al., 2010; Siddiqi et al., 2010; Parikh et al., 2011; Balle et al., 2011; Arora et al., 2013; Dhillon et al., 2012; Anandkumar et al., 2012; Arora et al., 2012; Arora et al., 2013).
Related Work
Our work is most directly related to the algorithm for parameter estimation in topic models described by Arora et al.
The Learning Algorithm for L-PCFGs
(2012) in the context of learning topic models.
topic models is mentioned in 4 sentences in this paper.
Li, Jiwei and Ott, Myle and Cardie, Claire and Hovy, Eduard
Feature-based Additive Model
where we assume y_sentiment and y_domain are given for each document d. Note that we assume conditional independence between features and words given y, similar to other topic models (Blei et al., 2003).
Related Work
(2011), which can be viewed as a combination of topic models (Blei et al., 2003) and generalized additive models (Hastie and Tibshirani, 1990).
Related Work
Unlike other derivatives of topic models, SAGE drops the Dirichlet-multinomial assumption and adopts a Laplacian prior, triggering sparsity in topic-word distribution.
topic models is mentioned in 3 sentences in this paper.
Liu, Kang and Xu, Liheng and Zhao, Jun
Experiments
Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments, which are more precise than Hai which merely uses co-occurrence information to indicate such relations among words.
Related Work
In terms of considering semantic relations among words, our method is related to several approaches based on topic models (Zhao et al., 2010; Moghaddam and Ester, 2011; Moghaddam and Ester, 2012a; Moghaddam and Ester, 2012b; Mukherjee and Liu, 2012).
The Proposed Method
After topic modeling, we obtain the probability of the candidates (wt and wo) to topic z, i.e.
topic models is mentioned in 3 sentences in this paper.
Hingmire, Swapnil and Chakraborti, Sutanu
Introduction
LDA is an unsupervised probabilistic topic model and it is widely used to discover latent semantic structure of a document collection by modeling words in the documents.
Introduction
In this approach, a topic model on a given set of unlabeled training documents is constructed using LDA, then an annotator assigns a class label to some topics based on their most probable words.
Introduction
These labeled topics are used to create a new topic model such that in the new model topics are better aligned to class labels.
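A simplified sketch of the labeling idea described above: train LDA on unlabeled documents, show each topic's top words to an annotator, and classify a new document by the class attached to its most probable topic. The documents, topic count, and topic-to-class mapping are hypothetical, and the paper's step of re-estimating a new topic model from the labeled topics is omitted.

```python
from gensim import corpora, models

# Toy unlabeled training documents (hypothetical).
docs = [["goal", "match", "league", "coach"],
        ["election", "vote", "party", "senate"],
        ["season", "match", "striker", "goal"],
        ["party", "policy", "vote", "minister"]]
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=30,
                      random_state=0)

# Step 1: show each topic's most probable words so an annotator can label it.
for k in range(2):
    print(k, lda.show_topic(k, topn=4))

# Step 2: a hypothetical annotator-provided mapping from topic id to class.
topic_to_class = {0: "sports", 1: "politics"}

# Step 3: label a new document with the class of its most probable topic.
bow = dictionary.doc2bow(["vote", "election", "policy"])
best_topic = max(lda.get_document_topics(bow), key=lambda kv: kv[1])[0]
print(topic_to_class[best_topic])
```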
topic models is mentioned in 3 sentences in this paper.
Andrews, Nicholas and Eisner, Jason and Dredze, Mark
Generative Model of Coreference
The entire corpus, including these entities, is generated according to standard topic model assumptions; we first generate a topic distribution for a document, then sample topics and words for the document (Blei et al., 2003).
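The generative story just quoted is the standard LDA one; here is a minimal sketch of it, with an invented topic-word matrix phi and arbitrary hyperparameters.

```python
import numpy as np

def generate_document(phi, alpha=0.1, doc_len=30, seed=0):
    """Standard topic-model generative story: draw a document's topic
    distribution theta, then for each token draw a topic and then a word.
    phi: K x V matrix of topic-word distributions."""
    rng = np.random.default_rng(seed)
    K, V = phi.shape
    theta = rng.dirichlet(np.full(K, alpha))       # document-topic distribution
    z = rng.choice(K, size=doc_len, p=theta)       # topic for each token
    words = [rng.choice(V, p=phi[k]) for k in z]
    return theta, z, words

phi = np.array([[0.7, 0.2, 0.05, 0.05],
                [0.05, 0.05, 0.2, 0.7]])
print(generate_document(phi))
```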
Inference by Block Gibbs Sampling
The ψ factors in (5) approximate the topic model’s prior distribution over z: each factor is proportional to the probability that a Gibbs sampling step for an ordinary topic model would choose this value of x.z.
Introduction
Our novel approach features: §4.1 A topical model of which entities from previ-
topic models is mentioned in 3 sentences in this paper.
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Background and Related Work
(2013a) used topic models to combine type-level predicate inference rules with token-level information from their arguments in a specific context.
Our Proposal: A Latent LC Approach
We further incorporate features based on a Latent Dirichlet Allocation (LDA) topic model (Blei et al., 2003).
Our Proposal: A Latent LC Approach
Several recent works have underscored the usefulness of using topic models to model a predicate’s selectional preferences (Ritter et al., 2010; Dinu and Lapata, 2010; Seaghdha, 2010; Lewis and Steedman, 2013; Melamud et al., 2013a).
topic models is mentioned in 3 sentences in this paper.