Abstract | We build a broad-coverage sense tagger based on a nonparametric Bayesian topic model that automatically learns sense clusters for words in the source language. |
Introduction | We use WSI based on a nonparametric Bayesian topic model to infer word senses for source words in our training, development and test sets.
Related Work | For ease of comparison, we roughly divide them into 4 categories: 1) WSD for SMT, 2) topic-based WSI, 3) topic models for SMT and 4) lexical selection.
Related Work | The work of Brody and Lapata (2009) is the first attempt to approach WSI via topic modeling.
Related Work | They adapt LDA to word sense induction by building one topic model per word type. |
WSI-Based Broad-Coverage Sense Tagger | Recently, we have also witnessed that WSI is cast as a topic modeling problem where the sense clusters of a word type are considered as underlying topics (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012). |
WSI-Based Broad-Coverage Sense Tagger | We follow this line to tailor a topic modeling framework to induce word senses for our large-scale training data. |
WSI-Based Broad-Coverage Sense Tagger | We can induce topics on this corpus for each pseudo document via topic modeling approaches. |
Abstract | We present an algorithm for Query-Chain Summarization based on a new LDA topic model variant.
Algorithms | We developed a novel topic model to identify words that are associated with the current query and not shared with the previous queries.
Algorithms | Figure 3: Plate model for our topic model.
Algorithms | We implemented inference over this topic model using Gibbs Sampling (we distribute the code of the sampler together with our dataset). |
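The authors' query-aware sampler is distributed with their dataset; as a generic illustration of the inference machinery, here is a minimal collapsed Gibbs sampler for vanilla LDA (all names and the toy corpus are our own, and this is a sketch of standard LDA inference, not the paper's variant):

```python
import random

def gibbs_lda(docs, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Minimal collapsed Gibbs sampler for vanilla LDA.

    docs: list of documents, each a list of integer word ids.
    Returns the final topic assignment z for every token.
    """
    rng = random.Random(seed)
    V = max(w for d in docs for w in d) + 1
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # topic totals
    z = [[rng.randrange(K) for _ in d] for d in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # collapsed full conditional p(z_i = t | rest)
                p = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                     for t in range(K)]
                r = rng.random() * sum(p)
                k = 0
                while r > p[k]:
                    r -= p[k]
                    k += 1
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

docs = [[0, 0, 1, 1], [2, 2, 3, 3]]
z = gibbs_lda(docs, K=2)
```

The query-chain variant would modify the full conditional to downweight words already explained by previous queries, but the count-bookkeeping skeleton is the same.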
Introduction | We introduce a new algorithm to address the task of Query-Chain Focused Summarization, based on a new LDA topic model variant, and present an evaluation which demonstrates it improves on these baselines. |
Previous Work | As evidenced since (Daume and Marcu, 2006), Bayesian techniques have proven effective at this task: we construct a latent topic model on the basis of the document set and the query. |
Previous Work | This topic model effectively serves as a query expansion mechanism, which helps assess the relevance of individual sentences to the original query.
Previous Work | In recent years, three major techniques have emerged to perform multi-document summarization: graph-based methods such as LexRank (Erkan and Radev, 2004) for multi-document summarization and Biased-LexRank (Otterbacher et al., 2008) for query-focused summarization; language model methods such as KLSum (Haghighi and Vanderwende, 2009); and variants of KLSum based on topic models such as BayesSum (Daume and Marcu, 2006) and TopicSum (Haghighi and Vanderwende, 2009).
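KLSum's greedy criterion can be sketched in a few lines: repeatedly add the sentence that brings the summary's word distribution closest (in KL divergence) to the document distribution. The function names and toy inputs below are our own, and this version scores raw relative frequencies rather than the smoothed estimates used in practice:

```python
import math
from collections import Counter

def kl_divergence(p, q, vocab, eps=1e-9):
    """KL(p || q) over a fixed vocabulary, with epsilon smoothing."""
    return sum(p.get(w, 0) * math.log((p.get(w, 0) + eps) / (q.get(w, 0) + eps))
               for w in vocab)

def klsum(doc_words, sentences, length=2):
    """Greedy KLSum-style selection: repeatedly add the sentence that
    minimizes KL(document distribution || summary distribution)."""
    total = len(doc_words)
    p = {w: c / total for w, c in Counter(doc_words).items()}
    vocab = set(doc_words)
    summary = []
    while len(summary) < length and len(summary) < len(sentences):
        best, best_kl = None, float("inf")
        for s in sentences:
            if s in summary:
                continue
            cand = [w for sent in summary + [s] for w in sent.split()]
            q = {w: c / len(cand) for w, c in Counter(cand).items()}
            d = kl_divergence(p, q, vocab)
            if d < best_kl:
                best, best_kl = s, d
        summary.append(best)
    return summary

picked = klsum("a a b".split(), ["a a", "b c", "c c"], length=1)
```

BayesSum and TopicSum keep this selection criterion but replace the raw document distribution with one estimated by a topic model.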
Abstract | Topic modeling is a popular method for the task. |
Abstract | However, unsupervised topic models often generate incoherent aspects. |
Abstract | Such knowledge can then be used by a topic model to discover more coherent aspects. |
Introduction | Recently, topic models have been extensively applied to aspect extraction because they can perform both subtasks at the same time, while other methods typically address them separately.
Introduction | Traditional topic models such as LDA (Blei et al., 2003) and pLSA (Hofmann, 1999) are unsupervised methods for extracting latent topics in text documents. |
Introduction | However, researchers have shown that fully unsupervised models often produce incoherent topics because the objective functions of topic models do not always correlate well with human judgments (Chang et al., 2009). |
Related Work | To extract and group aspects simultaneously, topic models have been applied by researchers (Branavan et al., 2008; Brody and Elhadad, 2010; Chen et al., 2013b; Fang and Huang, 2012; He et al., 2011; Jo and Oh, 2011; Kim et al., 2013; Lazaridou et al., 2013; Li et al., 2011; Lin and He, 2009; Lu et al., 2009; Lu et al., 2012; Lu and Zhai, 2008; Mei et al., 2007; Moghaddam and Ester, 2013; Mukherjee and Liu, 2012; Sauper and Barzilay, 2013; Titov and McDonald, 2008; Wang et al., 2010; Zhao et al., 2010).
Abstract | Code-switched documents are common in social media, providing evidence for polylingual topic models to infer aligned topics across languages. |
Code-Switching | By collecting the entire conversation into a single document we provide the topic model with additional content. |
Code-Switching | To train a polylingual topic model on social media, we make two modifications to the model of Mimno et al. |
Code-Switching | First, polylingual topic models require parallel or comparable corpora in which each document has an assigned language. |
Introduction | Topic models (Blei et al., 2003) have become standard tools for analyzing document collections, and topic analyses are quite common for social media (Paul and Dredze, 2011; Zhao et al., 2011; Hong and Davison, 2010; Ramage et al., 2010; Eisenstein et al., 2010). |
Introduction | Good candidates for multilingual topic analyses are polylingual topic models (Mimno et al., 2009), which learn topics for multiple languages, creating tuples of language-specific distributions over monolingual vocabularies for each topic.
Introduction | Polylingual topic models enable cross language analysis by grouping documents by topic regardless of language. |
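The per-language structure can be illustrated concretely (the topic distributions below are invented toy values, and `assign_topic` is a hypothetical scoring helper, not the model's actual inference): each topic is a tuple of language-specific word distributions, so a code-switched document can be scored under every topic regardless of which language each token is in.

```python
import math

# Each topic k is a tuple of language-specific word distributions over
# monolingual vocabularies (toy values for illustration).
topics = {
    0: {"en": {"river": 0.6, "water": 0.4}, "es": {"rio": 0.7, "agua": 0.3}},
    1: {"en": {"money": 0.5, "loan": 0.5}, "es": {"dinero": 0.6, "banco": 0.4}},
}

def assign_topic(doc, topics, smooth=1e-6):
    """Score a (possibly code-switched) document under each topic using the
    per-language distributions; documents group by topic across languages."""
    def score(k):
        return sum(math.log(topics[k][lang].get(w, smooth)) for lang, w in doc)
    return max(topics, key=score)

doc = [("en", "river"), ("es", "agua")]
best = assign_topic(doc, topics)
```

Because the English and Spanish distributions of a topic are aligned, the mixed-language document above is grouped with other "river" documents in either language.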
Abstract | We examine Arora et al.’s anchor words algorithm for topic modeling and develop new, regularized algorithms that not only mathematically resemble Gaussian and Dirichlet priors but also improve the interpretability of topic models.
Adding Regularization | While the distribution over topics is typically Dirichlet, Dirichlet distributions have been replaced by logistic normals in topic modeling applications (Blei and Lafferty, 2005) and for probabilistic grammars of language (Cohen and Smith, 2009). |
Anchor Words: Scalable Topic Models | In this section, we briefly review the anchor method and place it in the context of topic model inference. |
Anchor Words: Scalable Topic Models | Rethinking Data: Word Co-occurrence. Inference in topic models can be viewed as a black box: given a set of documents, discover the topics that best explain the data.
Anchor Words: Scalable Topic Models | Like other topic modeling algorithms, the output of the anchor method is the topic-word distribution matrix A of size V × K, where K is the total number of topics desired, a parameter of the algorithm.
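The anchor-selection step can be sketched as a Gram-Schmidt farthest-point search over the rows of the row-normalized word co-occurrence matrix Q, in the style of the FastAnchorWords algorithm. This is our own stdlib-only sketch, not the authors' implementation, and it omits the random projection used at scale:

```python
def norm(v):
    return sum(x * x for x in v) ** 0.5

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def find_anchors(Q, K):
    """Greedily pick K rows of the row-normalized co-occurrence matrix Q,
    each time choosing the row farthest from the span of the anchors
    chosen so far (Gram-Schmidt orthogonalization)."""
    Qn = [[x / sum(row) for x in row] for row in Q]
    residual = [row[:] for row in Qn]
    anchors = []
    for _ in range(K):
        i = max(range(len(residual)), key=lambda r: norm(residual[r]))
        anchors.append(i)
        b = residual[i]
        nb = norm(b)
        b = [x / nb for x in b]
        # remove the component along the new anchor from every row
        for r in range(len(residual)):
            c = dot(residual[r], b)
            residual[r] = [x - c * y for x, y in zip(residual[r], b)]
    return anchors

Q = [[10, 0, 0, 0], [9, 1, 0, 0], [0, 0, 10, 0], [0, 0, 9, 1]]
anchors = find_anchors(Q, 2)
```

Once the K anchor words are fixed, recovering the V × K matrix A reduces to a convex problem per word, which is what makes the method scalable.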
Introduction | Topic models are of practical and theoretical interest. |
Introduction | Modern topic models are formulated as latent variable models.
Introduction | Unlike an HMM, topic models assume that each document is an admixture of these hidden components called topics.
Abstract | Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.
Abstract | We propose new polylingual tree-based topic models that extract domain knowledge by considering both source and target languages, and derive three different inference schemes.
Introduction | Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections. |
Introduction | Topic models discover—without any supervision—the primary themes presented in a dataset: the namesake topics. |
Introduction | Topic models have two primary applications: to aid human exploration of corpora (Chang et al., 2009) or serve as a low-dimensional representation for downstream applications. |
Abstract | Based on the assumption that a corpus follows Zipf’s law, we derive tradeoff formulae for the perplexity of k-gram models and topic models with respect to the size of the reduced vocabulary.
Experiments | We used ZipfMix only for the experiments on topic models.
Experiments | Therefore, we can use TheoryAve as a heuristic function for estimating the perplexity of topic models.
Introduction | Removing low-frequency words from a corpus (often called cutoff) is a common practice to save on the computational costs involved in learning language models and topic models.
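The cutoff itself is a one-pass operation: count word frequencies, keep words at or above a threshold, and rewrite the corpus over the reduced vocabulary. A minimal sketch (names and toy corpus are our own):

```python
from collections import Counter

def apply_cutoff(docs, min_count=2):
    """Remove low-frequency words (a 'cutoff') before training language or
    topic models, shrinking the vocabulary at some cost in fidelity."""
    counts = Counter(w for d in docs for w in d)
    kept = {w for w, c in counts.items() if c >= min_count}
    return [[w for w in d if w in kept] for d in docs], kept

docs = [["topic", "model", "topic"], ["model", "rare"]]
reduced, vocab = apply_cutoff(docs, min_count=2)
# "rare" occurs once and is dropped from the reduced corpus
```

Under Zipf's law, most of the vocabulary sits in this low-frequency tail, which is why even an aggressive threshold removes many types while discarding relatively few tokens.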
Introduction | In the case of topic models, the intuition is that low-frequency words do not make a large contribution to the statistics of the models.
Introduction | In practice, when we only need a rough analysis of a corpus with topic models, a reduced corpus is sufficient for the purpose (Steyvers and Griffiths, 2007).
Perplexity on Reduced Corpora | 3.3 Perplexity of Topic Models
Perplexity on Reduced Corpora | In this section, we consider the perplexity of the widely used topic model, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), using the notation given in (Griffiths and Steyvers, 2004).
Preliminaries | Perplexity is a widely used evaluation measure of k-gram models and topic models.
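Concretely, perplexity is the exponentiated negative mean per-token log-likelihood, exp(-(1/N) Σᵢ log p(wᵢ)); a uniform model over a vocabulary of size V scores exactly V. A minimal sketch (the function name is our own):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(-(1/N) * sum_i log p(w_i)), the standard intrinsic
    evaluation measure for both k-gram models and topic models."""
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# A uniform model over 4 words assigns log(1/4) to every token,
# so its perplexity is exactly 4 regardless of corpus length.
lp = [math.log(0.25)] * 10
```

For LDA, the per-token probabilities are computed by marginalizing over topics, p(w | d) = Σₖ θ_{dk} φ_{kw}, before being fed into the same formula.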
Introduction | Topic modeling is a useful mechanism for discovering and characterizing various semantic concepts embedded in a collection of documents. |
Introduction | In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007). |
Introduction | Since the information within the sentence is insufficient for topic modeling, we first enrich sentence contexts via Information Retrieval (IR) methods using content words in the sentence as queries, so that topic-related monolingual documents can be collected.
Related Work | Topic modeling was first leveraged to improve SMT performance in (Zhao and Xing, 2006; Zhao and Xing, 2007). |
Related Work | Another direction of approaches leveraged topic modeling techniques for domain adaptation. |
Related Work | Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM. |
Background and Related Work | Recent work on finding novel senses has tended to focus on comparing diachronic corpora (Sagi et al., 2009; Cook and Stevenson, 2010; Gulordava and Baroni, 2011) and has also considered topic models (Lau et al., 2012).
Introduction | In this paper, we propose a method which uses topic models to estimate word sense distributions. |
Introduction | Topic models have been used for WSD in a number of studies (Boyd-Graber et al., 2007; Li et al., 2010; Lau et al., 2012; Preiss and Stevenson, 2013; Cai et al., 2007; Knopp et al., 2013), but our work extends significantly beyond this earlier work by focusing on the acquisition of prior word sense distributions (and predominant senses).
Introduction | (2004b), the use of topic models makes this possible, using topics as a proxy for sense (Brody and Lapata, 2009; Yao and Durme, 2011; Lau et al., 2012). |
Methodology | (2006)), a nonparametric variant of a Latent Dirichlet Allocation topic model (Blei et al., 2003) where the model automatically optimises the number of topics in a fully-unsupervised fashion over the training data.
Methodology | To learn the senses of a target lemma, we train a single topic model per target lemma. |
Motivation | Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation | Indeed there is work in the literature that shows that various topic models, latent or otherwise, can be useful for improving language models.
Motivation | In the information retrieval community, clustering and latent topic models have yielded improvements over traditional vector space models. |
Abstract | Our methods synthesize hidden Markov models (for underlying state) and topic models (to connect words to states). |
Introduction | In this paper, we retain the underlying HMM, but assume words are emitted using topic models (TM), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA). |
Latent Structure in Dialogues | In other words, instead of generating words via an LM, we generate words from a topic model (TM), where each state maps to a mixture of topics.
Latent Structure in Dialogues | Note that a TM-HMMS model with state-specific topic models (instead of state-specific language models) would be subsumed by TM-HMM, since one topic could be used as the background topic in TM-HMMS.
Background and overview of models | To demonstrate the benefit of situational information, we develop the Topic-Lexical-Distributional (TLD) model, which extends the LD model by assuming that words appear in situations analogous to documents in a topic model.
Background and overview of models | (2012) found that topics learned from similar transcript data using a topic model were strongly correlated with immediate activities and contexts. |
Background and overview of models | training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the ‘documents’ in LDA) that serve as our distinct situations h. As noted above, the learned document-topic distributions θ are treated as observed variables in the TLD model to represent the situational context.
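The preprocessing step of carving transcripts into small fixed-size sections, each acting as one LDA "document" standing in for a situation, is simple to sketch (function name and toy utterances are our own):

```python
def split_into_situations(transcript, size=4):
    """Divide a transcript (a list of utterances) into small consecutive
    sections; each section is one LDA 'document' standing in for a
    distinct situation h."""
    return [transcript[i:i + size] for i in range(0, len(transcript), size)]

utterances = ["look at the doggy", "nice doggy", "time for lunch",
              "eat your peas", "more peas", "good girl"]
situations = split_into_situations(utterances, size=3)
# two situations of three utterances each
```

Each section's learned document-topic distribution θ then serves as the observed situational-context variable in the TLD model.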
Introduction | However, in our simulations we approximate the environmental information by running a topic model (Blei et al., 2003) over a corpus of child-directed speech to infer a topic distribution for each situation. |
Introduction | (2012) in the context of learning topic models . |
Related Work | Recently a number of researchers have developed provably correct algorithms for parameter estimation in latent variable models such as hidden Markov models, topic models, directed graphical models with latent variables, and so on (Hsu et al., 2009; Bailly et al., 2010; Siddiqi et al., 2010; Parikh et al., 2011; Balle et al., 2011; Dhillon et al., 2012; Anandkumar et al., 2012; Arora et al., 2012; Arora et al., 2013).
Related Work | Our work is most directly related to the algorithm for parameter estimation in topic models described by Arora et al. |
The Learning Algorithm for L-PCFGS | (2012) in the context of learning topic models . |
Feature-based Additive Model | where we assume y^sentiment and y^domain are given for each document d. Note that we assume conditional independence between features and words given y, similar to other topic models (Blei et al., 2003).
Related Work | (2011), which can be viewed as a combination of topic models (Blei et al., 2003) and generalized additive models (Hastie and Tibshirani, 1990).
Related Work | Unlike other derivatives of topic models, SAGE drops the Dirichlet-multinomial assumption and adopts a Laplacian prior, inducing sparsity in the topic-word distributions.
Experiments | Second, our method captures semantic relations using topic modeling and captures opinion relations through word alignments; these are more precise than Hai, which merely uses co-occurrence information to indicate such relations among words.
Related Work | In terms of considering semantic relations among words, our method is related to several approaches based on topic models (Zhao et al., 2010; Moghaddam and Ester, 2011; Moghaddam and Ester, 2012a; Moghaddam and Ester, 2012b; Mukherjee and Liu, 2012).
The Proposed Method | After topic modeling, we obtain the probability of the candidate words belonging to topic z, i.e.
Introduction | LDA is an unsupervised probabilistic topic model that is widely used to discover the latent semantic structure of a document collection by modeling the words in its documents.
Introduction | In this approach, a topic model on a given set of unlabeled training documents is constructed using LDA, then an annotator assigns a class label to some topics based on their most probable words. |
Introduction | These labeled topics are used to create a new topic model such that in the new model topics are better aligned to class labels. |
Generative Model of Coreference | The entire corpus, including these entities, is generated according to standard topic model assumptions; we first generate a topic distribution for a document, then sample topics and words for the document (Blei et al., 2003). |
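The standard topic-model generative story referenced here can be written out directly: draw a document's topic distribution θ from a Dirichlet prior, then for each token draw a topic z from θ and a word w from that topic's word distribution. The sketch below uses invented names and toy distributions (Dirichlet sampling via normalized Gamma draws, since only the stdlib is assumed):

```python
import random

def generate_document(phi, alpha, length, seed=0):
    """Sample one document under the standard topic-model generative story:
    theta ~ Dirichlet(alpha); for each token, z ~ Cat(theta), w ~ Cat(phi[z]).

    phi: per-topic word distributions; alpha: Dirichlet hyperparameters.
    Returns a list of (topic, word-id) pairs.
    """
    rng = random.Random(seed)
    # theta ~ Dirichlet(alpha), via normalized independent Gamma draws
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    theta = [x / sum(g) for x in g]

    def draw(dist, items):
        # inverse-CDF sampling from a discrete distribution
        r = rng.random()
        for p, it in zip(dist, items):
            r -= p
            if r <= 0:
                return it
        return items[-1]

    doc = []
    for _ in range(length):
        z = draw(theta, list(range(len(theta))))
        w = draw(phi[z], list(range(len(phi[z]))))
        doc.append((z, w))
    return doc

doc = generate_document([[0.5, 0.5], [1.0, 0.0]], alpha=[1.0, 1.0], length=5)
```

The coreference model in question plugs entity mentions into this same document-level story rather than changing the story itself.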
Inference by Block Gibbs Sampling | The Ψ factors in (5) approximate the topic model’s prior distribution over z: each factor is proportional to the probability that a Gibbs sampling step for an ordinary topic model would choose this value of z.
Introduction | Our novel approach features: §4.1 A topical model of which entities from previ- |
Background and Related Work | (2013a) used topic models to combine type-level predicate inference rules with token-level information from their arguments in a specific context. |
Our Proposal: A Latent LC Approach | We further incorporate features based on a Latent Dirichlet Allocation (LDA) topic model (Blei et al., 2003). |
Our Proposal: A Latent LC Approach | Several recent works have underscored the usefulness of using topic models to model a predicate’s selectional preferences (Ritter et al., 2010; Dinu and Lapata, 2010; Seaghdha, 2010; Lewis and Steedman, 2013; Melamud et al., 2013a). |