Abstract | We build a broad-coverage sense tagger based on a nonparametric Bayesian topic model that automatically learns sense clusters for words in the source language. |
Introduction | We use WSI based on a nonparametric Bayesian topic model to infer word senses for source words in our training, development and test sets.
Related Work | For ease of comparison, we roughly divide them into 4 categories: 1) WSD for SMT, 2) topic-based WSI, 3) topic models for SMT and 4) lexical selection.
Related Work | The work of Brody and Lapata (2009) is the first attempt to approach WSI via topic modeling.
Related Work | They adapt LDA to word sense induction by building one topic model per word type. |
WSI-Based Broad-Coverage Sense Tagger | Recently, we have also witnessed that WSI is cast as a topic modeling problem where the sense clusters of a word type are considered as underlying topics (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012). |
WSI-Based Broad-Coverage Sense Tagger | We follow this line to tailor a topic modeling framework to induce word senses for our large-scale training data. |
WSI-Based Broad-Coverage Sense Tagger | We can induce topics on this corpus for each pseudo document via topic modeling approaches. |
Abstract | We present an algorithm for Query-Chain Summarization based on a new LDA topic model variant.
Algorithms | We developed a novel topic model to identify words that are associated with the current query and not shared with the previous queries.
Algorithms | Figure 3: Plate model for our topic model.
Algorithms | We implemented inference over this topic model using Gibbs Sampling (we distribute the code of the sampler together with our dataset). |
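The authors' query-aware sampler is distributed with their dataset; as a generic illustration of the inference machinery, here is a minimal collapsed Gibbs sampler for vanilla LDA (all names and the toy corpus are our own, and this is a sketch of standard LDA inference, not the paper's variant):

```python
import random

def gibbs_lda(docs, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for vanilla LDA.

    docs: list of documents, each a list of integer word ids.
    Returns the final topic assignment z for every token.
    """
    rng = random.Random(seed)
    V = max(w for d in docs for w in d) + 1
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # topic totals
    z = [[rng.randrange(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # collapsed full conditional p(z_i = t | rest)
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                     for t in range(K)]
                r = rng.random() * sum(p)
                k = 0
                while r > p[k]:
                    r -= p[k]
                    k += 1
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

docs = [[0, 0, 1, 1], [2, 2, 3, 3]]
z = gibbs_lda(docs, K=2)
```

The query-chain variant would modify the full conditional to downweight words already explained by previous queries, but the count-bookkeeping skeleton is the same.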
Introduction | We introduce a new algorithm to address the task of Query-Chain Focused Summarization, based on a new LDA topic model variant, and present an evaluation which demonstrates it improves on these baselines. |
Previous Work | As evidenced since (Daume and Marcu, 2006), Bayesian techniques have proven effective at this task: we construct a latent topic model on the basis of the document set and the query. |
Previous Work | This topic model effectively serves as a query expansion mechanism, which helps assess the relevance of individual sentences to the original query.
Previous Work | In recent years, three major techniques have emerged to perform multi-document summarization: graph-based methods such as LexRank (Erkan and Radev, 2004) for multi-document summarization and Biased-LexRank (Otterbacher et al., 2008) for query-focused summarization; language model methods such as KLSum (Haghighi and Vanderwende, 2009); and variants of KLSum based on topic models such as BayesSum (Daume and Marcu, 2006) and TopicSum (Haghighi and Vanderwende, 2009).
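KLSum's greedy criterion can be sketched in a few lines: repeatedly add the sentence that brings the summary's word distribution closest (in KL divergence) to the document distribution. The function names and toy inputs below are our own, and this version scores raw relative frequencies rather than the smoothed estimates used in practice:

```python
import math
from collections import Counter

def kl_divergence(p, q, vocab, eps=1e-9):
    """KL(p || q) over a fixed vocabulary, with epsilon smoothing."""
    return sum(p.get(w, 0) * math.log((p.get(w, 0) + eps) / (q.get(w, 0) + eps))
               for w in vocab)

def klsum(doc_words, sentences, length=2):
    """Greedy KLSum-style selection: repeatedly add the sentence that
    minimizes KL(document distribution || summary distribution)."""
    total = len(doc_words)
    p = {w: c / total for w, c in Counter(doc_words).items()}
    vocab = set(doc_words)
    summary = []
    while len(summary) < length and len(summary) < len(sentences):
        best, best_kl = None, float("inf")
        for s in sentences:
            if s in summary:
                continue
            cand = [w for sent in summary + [s] for w in sent.split()]
            q = {w: c / len(cand) for w, c in Counter(cand).items()}
            d = kl_divergence(p, q, vocab)
            if d < best_kl:
                best, best_kl = s, d
        summary.append(best)
    return summary

picked = klsum("a a b".split(), ["a a", "b c", "c c"], length=1)
```

BayesSum and TopicSum keep this selection criterion but replace the raw document distribution with one estimated by a topic model.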
Abstract | Topic modeling is a popular method for the task. |
Abstract | However, unsupervised topic models often generate incoherent aspects. |
Abstract | Such knowledge can then be used by a topic model to discover more coherent aspects. |
Introduction | Recently, topic models have been extensively applied to aspect extraction because they can perform both subtasks at the same time, while other methods typically address them separately.
Introduction | Traditional topic models such as LDA (Blei et al., 2003) and pLSA (Hofmann, 1999) are unsupervised methods for extracting latent topics in text documents. |
Introduction | However, researchers have shown that fully unsupervised models often produce incoherent topics because the objective functions of topic models do not always correlate well with human judgments (Chang et al., 2009). |
Related Work | To extract and group aspects simultaneously, topic models have been applied by researchers (Branavan et al., 2008; Brody and Elhadad, 2010; Chen et al., 2013b; Fang and Huang, 2012; He et al., 2011; Jo and Oh, 2011; Kim et al., 2013; Lazaridou et al., 2013; Li et al., 2011; Lin and He, 2009; Lu et al., 2009; Lu et al., 2012; Lu and Zhai, 2008; Mei et al., 2007; Moghaddam and Ester, 2013; Mukherjee and Liu, 2012; Sauper and Barzilay, 2013; Titov and McDonald, 2008; Wang et al., 2010; Zhao et al., 2010).
Abstract | Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. |
Code-Switching | By collecting the entire conversation into a single document we provide the topic model with additional content. |
Code-Switching | To train a polylingual topic model on social media, we make two modifications to the model of Mimno et al. |
Code-Switching | First, polylingual topic models require parallel or comparable corpora in which each document has an assigned language. |
Introduction | Topic models (Blei et al., 2003) have become standard tools for analyzing document collections, and topic analyses are quite common for social media (Paul and Dredze, 2011; Zhao et al., 2011; Hong and Davison, 2010; Ramage et al., 2010; Eisenstein et al., 2010). |
Introduction | Good candidates for multilingual topic analyses are polylingual topic models (Mimno et al., 2009), which learn topics for multiple languages, creating tuples of language-specific distributions over monolingual vocabularies for each topic.
Introduction | Polylingual topic models enable cross language analysis by grouping documents by topic regardless of language. |
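The per-language structure can be illustrated concretely (the topic distributions below are invented toy values, and `assign_topic` is a hypothetical scoring helper, not the model's actual inference): each topic is a tuple of language-specific word distributions, so a code-switched document can be scored under every topic regardless of which language each token is in.

```python
import math

# Each topic k is a tuple of language-specific word distributions over
# monolingual vocabularies (toy values for illustration).
topics = {
    0: {"en": {"river": 0.6, "water": 0.4}, "es": {"rio": 0.7, "agua": 0.3}},
    1: {"en": {"money": 0.5, "loan": 0.5}, "es": {"dinero": 0.6, "banco": 0.4}},
}

def assign_topic(doc, topics, smooth=1e-6):
    """Score a (possibly code-switched) document under each topic using the
    per-language distributions; documents group by topic across languages."""
    def score(k):
        return sum(math.log(topics[k][lang].get(w, smooth)) for lang, w in doc)
    return max(topics, key=score)

doc = [("en", "river"), ("es", "agua")]
best = assign_topic(doc, topics)
```

Because the English and Spanish distributions of a topic are aligned, the mixed-language document above is grouped with other "river" documents in either language.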
Abstract | We examine Arora et al.’s anchor words algorithm for topic modeling and develop new, regularized algorithms that not only mathematically resemble Gaussian and Dirichlet priors but also improve the interpretability of topic models.
Adding Regularization | While the distribution over topics is typically Dirichlet, Dirichlet distributions have been replaced by logistic normals in topic modeling applications (Blei and Lafferty, 2005) and for probabilistic grammars of language (Cohen and Smith, 2009). |
Anchor Words: Scalable Topic Models | In this section, we briefly review the anchor method and place it in the context of topic model inference. |
Anchor Words: Scalable Topic Models | Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models | Like other topic modeling algorithms, the output of the anchor method is the topic-word distribution matrix A of size V × K, where K is the total number of topics desired, a parameter of the algorithm.
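The anchor-selection step can be sketched as a Gram-Schmidt farthest-point search over the rows of the row-normalized word co-occurrence matrix Q, in the style of the FastAnchorWords algorithm. This is our own stdlib-only sketch, not the authors' implementation, and it omits the random projection used at scale:

```python
def norm(v):
    return sum(x * x for x in v) ** 0.5

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def find_anchors(Q, K):
    """Greedily pick K rows of the row-normalized co-occurrence matrix Q,
    each time choosing the row farthest from the span of the anchors
    chosen so far (Gram-Schmidt orthogonalization)."""
    Qn = [[x / sum(row) for x in row] for row in Q]
    residual = [row[:] for row in Qn]
    anchors = []
    for _ in range(K):
        i = max(range(len(residual)), key=lambda r: norm(residual[r]))
        anchors.append(i)
        b = residual[i]
        nb = norm(b)
        b = [x / nb for x in b]
        # remove the component along the new anchor from every row
        for r in range(len(residual)):
            c = dot(residual[r], b)
            residual[r] = [x - c * y for x, y in zip(residual[r], b)]
    return anchors

Q = [[10, 0, 0, 0], [9, 1, 0, 0], [0, 0, 10, 0], [0, 0, 9, 1]]
anchors = find_anchors(Q, 2)
```

Once the K anchor words are fixed, recovering the V × K matrix A reduces to a convex problem per word, which is what makes the method scalable.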
Introduction | Topic models are of practical and theoretical interest. |
Introduction | Modern topic models are formulated as latent variable models.
Introduction | Unlike an HMM, topic models assume that each document is an admixture of these hidden components called topics.
Abstract | Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.
Abstract | We propose new polylingual tree-based topic models that extract domain knowledge by considering both source and target languages, and derive three different inference schemes.
Introduction | Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections. |
Introduction | Topic models discover—without any supervision—the primary themes presented in a dataset: the namesake topics. |
Introduction | Topic models have two primary applications: to aid human exploration of corpora (Chang et al., 2009) or serve as a low-dimensional representation for downstream applications. |
Abstract | Based on the assumption that a corpus follows Zipf’s law, we derive tradeoff formulae for the perplexity of k-gram models and topic models with respect to the size of the reduced vocabulary.
Experiments | We used ZipfMix only for the experiments on topic models.
Experiments | Therefore, we can use TheoryAve as a heuristic function for estimating the perplexity of topic models.
Introduction | Removing low-frequency words from a corpus (often called cutoff) is a common practice to save on the computational costs involved in learning language models and topic models.
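The cutoff itself is a one-pass operation: count word frequencies, keep words at or above a threshold, and rewrite the corpus over the reduced vocabulary. A minimal sketch (names and toy corpus are our own):

```python
from collections import Counter

def apply_cutoff(docs, min_count=2):
    """Remove low-frequency words (a 'cutoff') before training language or
    topic models, shrinking the vocabulary at some cost in fidelity."""
    counts = Counter(w for d in docs for w in d)
    kept = {w for w, c in counts.items() if c >= min_count}
    return [[w for w in d if w in kept] for d in docs], kept

docs = [["topic", "model", "topic"], ["model", "rare"]]
reduced, vocab = apply_cutoff(docs, min_count=2)
# "rare" occurs once and is dropped from the reduced corpus
```

Under Zipf's law, most of the vocabulary sits in this low-frequency tail, which is why even an aggressive threshold removes many types while discarding relatively few tokens.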
Introduction | In the case of topic models, the intuition is that low-frequency words do not make a large contribution to the statistics of the models.
Introduction | In practice, when we only need a rough analysis of a corpus with topic models, a reduced corpus is sufficient for the purpose (Steyvers and Griffiths, 2007).
Perplexity on Reduced Corpora | 3.3 Perplexity of Topic Models
Perplexity on Reduced Corpora | In this section, we consider the perplexity of the widely used topic model, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), using the notation given in (Griffiths and Steyvers, 2004).
Preliminaries | Perplexity is a widely used evaluation measure of k-gram models and topic models.
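Concretely, perplexity is the exponentiated negative mean per-token log-likelihood, exp(-(1/N) Σᵢ log p(wᵢ)); a uniform model over a vocabulary of size V scores exactly V. A minimal sketch (the function name is our own):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(-(1/N) * sum_i log p(w_i)), the standard intrinsic
    evaluation measure for both k-gram models and topic models."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# A uniform model over 4 words assigns log(1/4) to every token,
# so its perplexity is exactly 4 regardless of corpus length.
lp = [math.log(0.25)] * 10
```

For LDA, the per-token probabilities are computed by marginalizing over topics, p(w | d) = Σₖ θ_{dk} φ_{kw}, before being fed into the same formula.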
Introduction | Topic modeling is a useful mechanism for discovering and characterizing various semantic concepts embedded in a collection of documents. |
Introduction | In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007). |
Introduction | Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected.
Related Work | Topic modeling was first leveraged to improve SMT performance in (Zhao and Xing, 2006; Zhao and Xing, 2007). |
Related Work | Another direction of approaches leveraged topic modeling techniques for domain adaptation. |
Related Work | Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM. |
Background and Related Work | Recent work on finding novel senses has tended to focus on comparing diachronic corpora (Sagi et al., 2009; Cook and Stevenson, 2010; Gulordava and Baroni, 2011) and has also considered topic models (Lau et al., 2012).
Introduction | In this paper, we propose a method which uses topic models to estimate word sense distributions. |
Introduction | Topic models have been used for WSD in a number of studies (Boyd-Graber et al., 2007; Li et al., 2010; Lau et al., 2012; Preiss and Stevenson, 2013; Cai et al., 2007; Knopp et al., 2013), but our work extends significantly beyond this earlier work by focusing on the acquisition of prior word sense distributions (and predominant senses).
Introduction | (2004b), the use of topic models makes this possible, using topics as a proxy for sense (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012). |
Methodology | (2006)), a nonparametric variant of a Latent Dirichlet Allocation topic model (Blei et al., 2003) where the model automatically optimises the number of topics in a fully-unsupervised fashion over the training data.
Methodology | To learn the senses of a target lemma, we train a single topic model per target lemma. |
Motivation | Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation | Indeed there is work in the literature that shows that various topic models, latent or otherwise, can be useful for improving language models.
Motivation | In the information retrieval community, clustering and latent topic models have yielded improvements over traditional vector space models. |
Abstract | Our methods synthesize hidden Markov models (for underlying state) and topic models (to connect words to states). |
Introduction | In this paper, we retain the underlying HMM, but assume words are emitted using topic models (TM), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA). |
Latent Structure in Dialogues | In other words, instead of generating words via an LM, we generate words from a topic model (TM), where each state maps to a mixture of topics.
Latent Structure in Dialogues | Note that a TM-HMMS model with state-specific topic models (instead of state-specific language models) would be subsumed by TM-HMM, since one topic could be used as the background topic in TM-HMMS.
Background and overview of models | To demonstrate the benefit of situational information, we develop the Topic-Lexical-Distributional (TLD) model, which extends the LD model by assuming that words appear in situations analogous to documents in a topic model.
Background and overview of models | (2012) found that topics learned from similar transcript data using a topic model were strongly correlated with immediate activities and contexts. |
Background and overview of models | training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the ‘documents’ in LDA) that serve as our distinct situations h. As noted above, the learned document-topic distributions θ are treated as observed variables in the TLD model to represent the situational context.
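The preprocessing step of carving transcripts into small fixed-size sections, each acting as one LDA "document" standing in for a situation, is simple to sketch (function name and toy utterances are our own):

```python
def split_into_situations(transcript, size=4):
    """Divide a transcript (a list of utterances) into small consecutive
    sections; each section is one LDA 'document' standing in for a
    distinct situation h."""
    return [transcript[i:i + size] for i in range(0, len(transcript), size)]

utterances = ["look at the doggy", "nice doggy", "time for lunch",
              "eat your peas", "more peas", "good girl"]
situations = split_into_situations(utterances, size=3)
# two situations of three utterances each
```

Each section's learned document-topic distribution θ then serves as the observed situational-context variable in the TLD model.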
Introduction | However, in our simulations we approximate the environmental information by running a topic model (Blei et al., 2003) over a corpus of child-directed speech to infer a topic distribution for each situation. |
Introduction | (2012) in the context of learning topic models . |
Related Work | Recently a number of researchers have developed provably correct algorithms for parameter estimation in latent variable models such as hidden Markov models, topic models, directed graphical models with latent variables, and so on (Hsu et al., 2009; Bailly et al., 2010; Siddiqi et al., 2010; Parikh et al., 2011; Balle et al., 2011; Dhillon et al., 2012; Anandkumar et al., 2012; Arora et al., 2012; Arora et al., 2013).
Related Work | Our work is most directly related to the algorithm for parameter estimation in topic models described by Arora et al. |
The Learning Algorithm for L-PCFGS | (2012) in the context of learning topic models . |
Feature-based Additive Model | where we assume y^sentiment and y^domain are given for each document d. Note that we assume conditional independence between features and words given y, similar to other topic models (Blei et al., 2003).
Related Work | (2011), which can be viewed as a combination of topic models (Blei et al., 2003) and generalized additive models (Hastie and Tibshirani, 1990).
Related Work | Unlike other derivatives of topic models, SAGE drops the Dirichlet-multinomial assumption and adopts a Laplacian prior, inducing sparsity in the topic-word distributions.
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments; these are more precise than Hai, which merely uses co-occurrence information to indicate such relations among words.
Related Work | In terms of considering semantic relations among words, our method is related to several approaches based on topic models (Zhao et al., 2010; Moghaddam and Ester, 2011; Moghaddam and Ester, 2012a; Moghaddam and Ester, 2012b; Mukherjee and Liu, 2012).
The Proposed Method | After topic modeling, we obtain the probability of the candidate words belonging to topic z, i.e.
Introduction | LDA is an unsupervised probabilistic topic model that is widely used to discover the latent semantic structure of a document collection by modeling the words in its documents.
Introduction | In this approach, a topic model on a given set of unlabeled training documents is constructed using LDA, then an annotator assigns a class label to some topics based on their most probable words. |
Introduction | These labeled topics are used to create a new topic model such that in the new model topics are better aligned to class labels. |
Generative Model of Coreference | The entire corpus, including these entities, is generated according to standard topic model assumptions; we first generate a topic distribution for a document, then sample topics and words for the document (Blei et al., 2003). |
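The standard topic-model generative story referenced here can be written out directly: draw a document's topic distribution θ from a Dirichlet prior, then for each token draw a topic z from θ and a word w from that topic's word distribution. The sketch below uses invented names and toy distributions (Dirichlet sampling via normalized Gamma draws, since only the stdlib is assumed):

```python
import random

def generate_document(phi, alpha, length, seed=0):
    """Sample one document under the standard topic-model generative story:
    theta ~ Dirichlet(alpha); for each token, z ~ Cat(theta), w ~ Cat(phi[z]).

    phi: per-topic word distributions; alpha: Dirichlet hyperparameters.
    Returns a list of (topic, word-id) pairs.
    """
    rng = random.Random(seed)
    # theta ~ Dirichlet(alpha), via normalized independent Gamma draws
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    theta = [x / sum(g) for x in g]

    def draw(dist, items):
        # inverse-CDF sampling from a discrete distribution
        r = rng.random()
        for p, it in zip(dist, items):
            r -= p
            if r <= 0:
                return it
        return items[-1]

    doc = []
    for _ in range(length):
        z = draw(theta, list(range(len(theta))))
        w = draw(phi[z], list(range(len(phi[z]))))
        doc.append((z, w))
    return doc

doc = generate_document([[0.5, 0.5], [1.0, 0.0]], alpha=[1.0, 1.0], length=5)
```

The coreference model in question plugs entity mentions into this same document-level story rather than changing the story itself.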
Inference by Block Gibbs Sampling | The Ψ factors in (5) approximate the topic model’s prior distribution over z: each factor is proportional to the probability that a Gibbs sampling step for an ordinary topic model would choose this value of z.
Introduction | Our novel approach features: §4.1 A topical model of which entities from previ- |
Background and Related Work | (2013a) used topic models to combine type-level predicate inference rules with token-level information from their arguments in a specific context. |
Our Proposal: A Latent LC Approach | We further incorporate features based on a Latent Dirichlet Allocation (LDA) topic model (Blei et al., 2003). |
Our Proposal: A Latent LC Approach | Several recent works have underscored the usefulness of using topic models to model a predicate’s selectional preferences (Ritter et al., 2010; Dinu and Lapata, 2010; Seaghdha, 2010; Lewis and Steedman, 2013; Melamud et al., 2013a). |