BBC News Database | Specifically, we use Latent Dirichlet Allocation ( LDA ) as our topic model (Blei et al., 2003). |
BBC News Database | LDA |
BBC News Database | Given a collection of documents and a set of latent variables (i.e., the number of topics), the LDA model estimates the probability of topics per document and the probability of words per topic. |
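A minimal sketch of the two quantities this extract describes, p(topic | document) and p(word | topic), recovered from a single made-up topic-assignment state with Dirichlet smoothing. The counts, vocabulary, and hyperparameters are all illustrative, not from any of the cited papers:

```python
# Sketch: reading off LDA's two outputs from topic-assignment counts.
# All data below is invented for illustration.
from collections import Counter

K = 2                       # number of topics (the latent-variable count fixed in advance)
alpha, beta = 0.1, 0.01     # Dirichlet smoothing hyperparameters
vocab = ["goal", "match", "bank", "loan"]
# (doc_id, word, topic) assignments, e.g. from one Gibbs-sampling state
assignments = [(0, "goal", 0), (0, "match", 0), (0, "loan", 1),
               (1, "bank", 1), (1, "loan", 1), (1, "match", 0)]

def doc_topic(d):
    """p(topic | document d), smoothed by alpha."""
    c = Counter(z for (doc, _, z) in assignments if doc == d)
    n = sum(c.values())
    return [(c[k] + alpha) / (n + K * alpha) for k in range(K)]

def topic_word(k):
    """p(word | topic k), smoothed by beta."""
    c = Counter(w for (_, w, z) in assignments if z == k)
    n = sum(c.values())
    return {w: (c[w] + beta) / (n + len(vocab) * beta) for w in vocab}

theta0 = doc_topic(0)   # p(topic | doc 0): mostly topic 0
phi1 = topic_word(1)    # p(word | topic 1): mass on "bank"/"loan"
```

Off-the-shelf LDA tools produce these same two distributions after inference; only the estimation of the assignments differs.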
Related Work | More sophisticated graphical models (Blei and Jordan, 2003) have also been employed including Gaussian Mixture Models (GMM) and Latent Dirichlet Allocation ( LDA ). |
Abstract | Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training conversations, which is used to bias translation choices towards the current context. |
Corpus Data and Baseline SMT | We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models. |
Corpus Data and Baseline SMT | We use the English side of these conversations for training LDA topic models. |
Incremental Topic-Based Adaptation | 4.1 Topic modeling with LDA |
Incremental Topic-Based Adaptation | We use latent Dirichlet allocation, or LDA , (Blei et al., 2003) to obtain a topic distribution over conversations. |
Incremental Topic-Based Adaptation | For each conversation d_i in the training collection (1,600 conversations), LDA infers a topic distribution Q_{d_i} = p(z_k|d_i) for all latent topics z_k ∈ {1, ..., K}, where K is the number of topics.
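The similarity step these extracts describe, comparing a test conversation's inferred topic distribution against each training conversation's, can be sketched as follows. The three-topic distributions and the choice of cosine similarity are illustrative assumptions, not the papers' exact measure:

```python
# Sketch: pick the training conversation whose topic distribution is
# most similar to the test conversation's. Toy distributions throughout.
import math

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))

# Q_{d_i} = p(z_k | d_i) for two hypothetical training conversations
train = {"conv_1": [0.7, 0.2, 0.1], "conv_2": [0.1, 0.1, 0.8]}
test_q = [0.6, 0.3, 0.1]    # inferred distribution for the test conversation

scores = {d: cosine(test_q, q) for d, q in train.items()}
best = max(scores, key=scores.get)   # most similar training conversation
```

The resulting scores are what would bias translation choices toward the current context.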
Introduction | We begin by building a monolingual latent Dirichlet allocation (LDA) topic model on the training conversations (each conversation corresponds to a “document” in the LDA paradigm). |
Relation to Prior Work | (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document. |
Relation to Prior Work | While our proposed approach also employs monolingual LDA topic models, it deviates from the above methods in the following important ways. |
Background: Topic Model | Both Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) and Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999) are types of topic models. |
Background: Topic Model | LDA is the most common topic model currently in use, therefore we exploit it for mining topics in this paper. |
Background: Topic Model | Here, we first give a brief description of LDA . |
Estimation | Unlike the document-topic distribution, which can be directly learned by LDA tools, the rule-topic distribution must be estimated to meet our requirements.
Estimation | distribution of every document inferred by the LDA tool.
Estimation | The topic assignments are output by the LDA tool.
Topic Similarity Model | The k-th dimension P(z = k|d) gives the probability of topic k given document d. Unlike the rule-topic distribution, the document-topic distribution can be directly inferred by an off-the-shelf LDA tool.
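One natural way to estimate a rule-topic distribution from document-topic distributions, sketched with invented counts: weight each document's p(z|d) by how often the rule was extracted from it, then normalize. The weighting scheme is an assumption for illustration, not necessarily the paper's exact estimator:

```python
# Sketch: rule-topic distribution p(z|r) from document-topic distributions.
# Counts and distributions are made up for illustration.
doc_topic = {"d1": [0.8, 0.2], "d2": [0.3, 0.7]}   # p(z|d) from an LDA tool
rule_counts = {"d1": 3, "d2": 1}                    # occurrences of rule r per document

K = 2
weighted = [sum(rule_counts[d] * doc_topic[d][k] for d in doc_topic)
            for k in range(K)]
total = sum(weighted)
rule_topic = [w / total for w in weighted]          # p(z|r), sums to 1
```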
Abstract | The latent topic distribution estimated by Latent Dirichlet Allocation ( LDA ) is used to represent each text block. |
Abstract | We evaluate two approaches employing LDA and probabilistic latent semantic analysis (PLSA) distributions respectively. |
Introduction | To deal with this issue, Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) has been proposed. |
Introduction | LDA has been proved to be effective in many segmentation tasks (Arora and Ravindran, 2008; Hall et al., 2008; Sun et al., 2008; Riedl and Biemann, 2012; Chien and Chueh, 2012). |
Our Proposed Approach | In this paper, we propose to apply LE on the LDA topic distributions, each of which is estimated from a text block. |
Our Proposed Approach | 2.1 Latent Dirichlet Allocation Latent Dirichlet allocation ( LDA ) (Blei et al., 2003) is a generative probabilistic model of a corpus. |
Our Proposed Approach | In LDA , given a corpus D = {d1, d2, ...
Background and Model Setting | Several more recent works utilize a Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) framework. |
Background and Model Setting | We note that a similar LDA model construction was employed also in (Séaghdha, 2010), for estimating predicate-argument likelihood. |
Background and Model Setting | First, an LDA model is constructed, as follows. |
Introduction | allocation ( LDA ) model.
Introduction | Rather than computing a single context-insensitive rule score, we compute a distinct word-level similarity score for each topic in an LDA model. |
Two-level Context-sensitive Inference | Based on all pseudo-documents we learn an LDA model and obtain its associated probability distributions. |
Two-level Context-sensitive Inference | At learning time, we compute for each candidate rule a separate, topic-biased, similarity score per each of the topics in the LDA model. |
Experiments and Results | The performance of WTMF on CDR is compared with (a) an Information Retrieval model (IR) that is based on surface word matching, (b) an n-gram model (N-gram) that captures phrase overlaps by returning the number of overlapping n-grams as the similarity score of two sentences, (c) LSA that uses the svds() function in Matlab, and (d) LDA that uses Gibbs Sampling for inference (Griffiths and Steyvers, 2004).
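The n-gram baseline's similarity score described here can be sketched in a few lines; the use of bigrams and the example sentences are arbitrary choices for illustration:

```python
# Sketch: sentence similarity as the number of overlapping n-grams
# (shown for bigrams). Example sentences are invented.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_score(s1, s2, n=2):
    """Similarity = count of shared n-grams between the two sentences."""
    return len(ngrams(s1.split(), n) & ngrams(s2.split(), n))

score = overlap_score("the cat sat on the mat", "a cat sat on a mat")
# shared bigrams: ("cat","sat") and ("sat","on")
```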
Experiments and Results | WTMF is also compared with all existing reported SS results on the LI06 and MSR04 data sets, as well as LDA that is trained on the same data as WTMF.
Experiments and Results | To eliminate randomness in statistical models (WTMF and LDA ), all the reported results are averaged over 10 runs. |
Introduction | Latent variable models, such as Latent Semantic Analysis [LSA] (Landauer et al., 1998), Probabilistic Latent Semantic Analysis [PLSA] (Hofmann, 1999), Latent Dirichlet Allocation [ LDA ] (Blei et al., 2003) can solve the two issues naturally by modeling the semantics of words and sentences simultaneously in the low-dimensional latent space. |
Limitations of Topic Models and LSA for Modeling Sentences | Therefore, PLSA finds a topic distribution for each concept definition that maximizes the log likelihood of the corpus X ( LDA has a similar form): |
Abstract | Our experiments on a large Twitter dataset show that there are more meaningful and unique bursty topics in the top-ranked results returned by our model than an LDA baseline and two degenerate variations of our model. |
Experiments | of tweets (or words in the case of the LDA model) assigned to the topics and take the top-30 bursty topics from each model.
Experiments | In the case of the LDA model, only 23 bursty topics were detected.
Introduction | To discover topics, we can certainly apply standard topic models such as LDA (Blei et al., 2003), but with standard LDA temporal information is lost during topic discovery. |
Introduction | We find that compared with bursty topics discovered by standard LDA and by two degenerate variations of our model, bursty topics discovered by our model are more accurate and less redundant within the top-ranked results. |
Method | In standard LDA , a document contains a mixture of topics, represented by a topic distribution, and each word has a hidden topic label. |
Method | We also consider a standard LDA model in our experiments, where each word is associated with a hidden topic. |
Method | Just like standard LDA , our topic model itself finds a set of topics represented by φ_c but does not directly generate bursty topics.
Experiments | Latent Dirichlet Allocation ( LDA ; Blei et al., 2003) We use the method described in section 2 for inducing word representations from the topic matrix. |
Experiments | To train the 50-topic LDA model we use code released by Blei et al.
Experiments | We use the same 5,000 term vocabulary for LDA as is used for training word vector models. |
Our Model | This component does not require labeled data, and shares its foundation with probabilistic topic models such as LDA . |
Our Model | Equation 1 resembles the probabilistic model of LDA (Blei et al., 2003), which models documents as mixtures of latent topics. |
Our Model | Because of the log-linear formulation of the conditional distribution, θ is a vector in R^β and not restricted to the unit simplex as it is in LDA.
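The contrast being drawn can be sketched concretely: LDA's θ lies on the probability simplex, while a log-linear model's θ is an unconstrained real vector that only induces a distribution through normalization. Dimensions and values below are toy:

```python
# Sketch: an unconstrained log-linear parameter vector versus its
# softmax-normalized image on the unit simplex.
import math

theta_loglinear = [2.0, -1.0, 0.5]     # unconstrained real vector (toy values)

def softmax(v):
    m = max(v)                          # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

on_simplex = softmax(theta_loglinear)   # nonnegative entries summing to 1,
                                        # like an LDA topic proportion
```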
Related work | Latent Dirichlet Allocation ( LDA ; (Blei et al., 2003)) is a probabilistic document model that assumes each document is a mixture of latent topics. |
Related work | However, because the emphasis in LDA is on modeling topics, not word meanings, there is no guarantee that the row (word) vectors are sensible as points in a k-dimensional space. |
Related work | Indeed, we show in section 4 that using LDA in this way does not deliver robust word vectors. |
A Gibbs Sampling Algorithm | LDA (Griffiths and Steyvers, 2004). |
A Gibbs Sampling Algorithm | where C^¬n indicates that term n is excluded from the corresponding document or topic, and Λ^¬n is the discriminant function value without word n. We can see that the first term is from the LDA model for observed word counts and the second term is from
Experiments | We compare the generalized logistic supervised LDA using Gibbs sampling (denoted by gSLDA) with various competitors, including the standard sLDA using variational mean-field methods (denoted by vSLDA) (Wang et al., 2009), the MedLDA model using variational mean-field methods (denoted by vMedLDA) (Zhu et al., 2012), and the MedLDA model using collapsed Gibbs sampling algorithms (denoted by gMedLDA) (Jiang et al., 2012). |
Introduction | As widely adopted in supervised latent Dirichlet allocation (sLDA) models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words. |
Introduction | Though powerful, one issue that could limit the use of existing logistic supervised LDA models is that they treat the document-level response variable as one additional word via a normalized likelihood model. |
Introduction | For Bayesian LDA models, we can also explore the conjugacy of the Dirichlet-Multinomial prior-likelihood pairs to collapse out the Dirichlet variables (i.e., topics and mixing proportions) to do collapsed Gibbs sampling, which can have better mixing rates (Griffiths and Steyvers, 2004). |
Logistic Supervised Topic Models | A logistic supervised topic model consists of two parts: an LDA model (Blei et al., 2003) for describing the words W = {w_d}_{d=1}^{D}, where w_d = {w_dn}_{n=1}^{N_d} denotes the words within document d, and a logistic classifier for considering the supervising signal y = {y_d}_{d=1}^{D}.
Logistic Supervised Topic Models | LDA: LDA is a hierarchical Bayesian model that posits each document as an admixture of K topics, where each topic Φ_k is a multinomial distribution over a V-word vocabulary.
Logistic Supervised Topic Models | For fully-Bayesian LDA , the topics are random samples from a Dirichlet prior, Φ_k ∼ Dir(β).
Abstract | In this work, we develop a framework for allowing users to iteratively refine the topics discovered by models such as latent Dirichlet allocation ( LDA ) by adding constraints that enforce that sets of words must appear together in the same topic. |
Constraints Shape Topics | As discussed above, LDA views topics as distributions over words, and each document expresses an admixture of these topics. |
Constraints Shape Topics | For “vanilla” LDA (no constraints), these are symmetric Dirichlet distributions. |
Constraints Shape Topics | Because LDA assumes a document's tokens are interchangeable, it treats the document as a bag-of-words, ignoring potential relations between words.
Introduction | Probabilistic topic models, as exemplified by probabilistic latent semantic indexing (Hofmann, 1999) and latent Dirichlet allocation ( LDA ) (Blei et al., 2003) are unsupervised statistical techniques to discover the thematic topics that permeate a large corpus of text documents. |
Putting Knowledge in Topic Models | At a high level, topic models such as LDA take as input a number of topics K and a corpus. |
Putting Knowledge in Topic Models | In LDA both of these outputs are multinomial distributions; typically they are presented to users in summary form by listing the elements with highest probability. |
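The summary-form presentation mentioned here, listing each topic's highest-probability elements, can be sketched directly; the topic-word probabilities are invented for illustration:

```python
# Sketch: summarize each topic by its top-k highest-probability words.
# Topic-word distributions below are toy values.
def top_words(topic_word_dist, k=3):
    return [w for w, _ in sorted(topic_word_dist.items(),
                                 key=lambda kv: kv[1], reverse=True)[:k]]

topics = [
    {"game": 0.30, "team": 0.25, "season": 0.20, "bank": 0.01},
    {"bank": 0.35, "loan": 0.30, "rate": 0.15, "team": 0.02},
]
summaries = [top_words(t) for t in topics]
# topic 0 summarized as its sports words, topic 1 as its finance words
```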
Introduction | was extended from the latent Dirichlet allocation ( LDA ) model (?) |
Joint Sentiment-Topic (JST) Model | It is worth pointing out that the JST model with a single topic becomes the standard LDA model with only three sentiment topics.
Joint Sentiment-Topic (JST) Model | that the JST model with word polarity priors incorporated performs significantly better than the LDA model without incorporating such prior information.
Joint Sentiment-Topic (JST) Model | For comparison purposes, we also run the LDA model and augmented the BOW features with the
AKL: Using the Learned Knowledge | To compute this distribution, instead of considering how well z_i matches with w_i only (as in LDA ), we also consider two other factors:
Experiments | This section evaluates and compares the proposed AKL model with three baseline models: LDA , MC-LDA, and GK-LDA.
Introduction | Traditional topic models such as LDA (Blei et al., 2003) and pLSA (Hofmann, 1999) are unsupervised methods for extracting latent topics in text documents. |
Introduction | We thus propose to first use LDA to learn topics/aspects from each individual domain and then discover the shared aspects (or topics) and aspect terms among a subset of domains. |
Introduction | We propose a method to solve this problem, which also results in a new topic model, called AKL (Automated Knowledge LDA ), whose inference can exploit the automatically learned prior knowledge and handle the issues of incorrect knowledge to produce superior aspects. |
Learning Quality Knowledge | This section details Step 1 in the overall algorithm, which has three sub-steps: running LDA (or AKL) on each domain corpus, clustering the resulting topics, and mining frequent patterns from the topics in each cluster. |
Learning Quality Knowledge | Since running LDA is simple, we will not discuss it further. |
Learning Quality Knowledge | After running LDA (or AKL) on each domain corpus, a set of topics is obtained. |
Overall Algorithm | Lines 3 and 5 run LDA on each review domain corpus D_i ∈ D_L to generate a set of aspects/topics A_i (lines 2, 4, and 6-9 will be discussed below).
Overall Algorithm | Scalability: the proposed algorithm is naturally scalable as both LDA and AKL run on each domain independently. |
Abstract | In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation ( LDA ) topics. |
Background 3.1 LDA | LDA is an unsupervised probabilistic generative model for collections of discrete data such as text documents. |
Background 3.1 LDA | The generative process of LDA can be described as follows: |
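A runnable sketch of this generative process, using only the standard library, with toy corpus sizes and Dirichlet draws obtained from normalized Gamma variates (all sizes and hyperparameters are illustrative):

```python
# Sketch of LDA's generative process:
#   for each topic k: draw phi_k ~ Dirichlet(beta)   (topic-word dist.)
#   for each document: draw theta ~ Dirichlet(alpha) (doc-topic dist.)
#     for each word: draw topic z ~ theta, then word w ~ phi_z
import random

random.seed(0)
K, V, alpha, beta = 3, 8, 0.5, 0.1   # toy sizes and hyperparameters

def dirichlet(conc, dim):
    """Symmetric Dirichlet sample via normalized Gamma variates."""
    g = [random.gammavariate(conc, 1.0) for _ in range(dim)]
    s = sum(g)
    return [x / s for x in g]

def categorical(p):
    r, acc = random.random(), 0.0
    for i, pi in enumerate(p):
        acc += pi
        if r < acc:
            return i
    return len(p) - 1

phi = [dirichlet(beta, V) for _ in range(K)]   # topic-word distributions

def generate_document(n_words):
    theta = dirichlet(alpha, K)                # document-topic distribution
    doc = []
    for _ in range(n_words):
        z = categorical(theta)                 # choose a topic
        doc.append(categorical(phi[z]))        # choose a word from that topic
    return doc

doc = generate_document(20)                    # one synthetic document (word ids)
```

Posterior inference then runs this story in reverse: given only the words, recover plausible θ, φ, and topic assignments.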
Background 3.1 LDA | The key problem in LDA is posterior inference. |
Introduction | In this paper, we propose a text classification algorithm based on Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) which does not need labeled documents. |
Introduction | LDA is an unsupervised probabilistic topic model and it is widely used to discover latent semantic structure of a document collection by modeling words in the documents. |
Introduction | (Blei et al., 2003) used LDA topics as features in text classification, but they use labeled documents while learning a classifier. |
Related Work | As LDA topics are semantically more meaningful than individual words and can be acquired easily, our approach overcomes limitations of the semi-supervised methods discussed above. |
Related work | These include the Latent Dirichlet Allocation ( LDA ) model of Blei et al. |
Related work | (2007) integrate a model of random walks on the WordNet graph into an LDA topic model to build an unsupervised word sense disambiguation system. |
Related work | and Lapata (2009) adapt the basic LDA model for application to unsupervised word sense induction; in this context, the topics learned by the model are assumed to correspond to distinct senses of a particular lemma. |
Three selectional preference models | As noted above, LDA was originally introduced to model sets of documents in terms of topics, or clusters of terms, that they share in varying proportions. |
Three selectional preference models | The high-level “generative story” for the LDA selectional preference model is as follows: |
Three selectional preference models | (2009) for LDA . |
Integrating Semantic Constraint into Surprisal | The factor A(wn, h) is essentially based on a comparison between the vector representing the current word wn and the vector representing the prior history h. Varying the method for constructing word vectors (e.g., using LDA or a simpler semantic space model) and for combining them into a representation of the prior context h (e.g., using additive or multiplicative functions) produces distinct models of semantic composition.
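The two composition functions being varied can be sketched as follows. The toy vectors and the use of cosine for the word-history comparison A(wn, h) are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch: additive vs. multiplicative composition of context word
# vectors into a history vector h, compared to the current word's vector.
import math

def additive(vectors):
    return [sum(d) for d in zip(*vectors)]

def multiplicative(vectors):
    out = vectors[0][:]
    for v in vectors[1:]:
        out = [a * b for a, b in zip(out, v)]
    return out

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

history = [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]]   # toy vectors for prior words
w_n = [0.25, 0.65, 0.10]                        # toy vector for the current word
sim_add = cosine(additive(history), w_n)
sim_mul = cosine(multiplicative(history), w_n)
```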
Method | We also trained the LDA model on BLLIP, using the Gibbs sampling procedure discussed in Griffiths et al.
Models of Processing Difficulty | LDA is a probabilistic topic model offering an alternative to spatial semantic representations. |
Models of Processing Difficulty | Whereas in LSA words are represented as points in a multidimensional space, LDA represents words using topics. |
Results | SSS: Additive −.03820***, Multiplicative −.00895***; LDA: Additive −.02500
Results | Table 2: Coefficients of LME models including simple semantic space (SSS) or Latent Dirichlet Allocation ( LDA ) as factors; ***p < .001 |
Results | Besides, replicating Pynte et al.’s (2008) finding, we were also interested in assessing whether the underlying semantic representation (simple semantic space or LDA ) and composition function (additive versus multiplicative) modulate reading times differentially. |
Discussion | The first example shows both LDA and ptLDA improve the baseline. |
Experiments | Topic Models Configuration We compare our polylingual tree-based topic model (ptLDA) against tree-based topic models (tLDA), polylingual topic models (pLDA) and vanilla topic models ( LDA ).3 We also examine different inference algorithms—Gibbs sampling (gibbs), variational inference (variational) and hybrid approach (variational-hybrid)—on the effects of SMT performance. |
Experiments | We refer to the SMT model without domain adaptation as baseline.5 LDA marginally improves machine translation (less than half a BLEU point). |
Experiments | 3For Gibbs sampling, we use implementations available in Hu and Boyd-Graber (2012) for tLDA; and Mallet (McCallum, 2002) for LDA and pLDA. |
Inference | p(z_dn = k, y_dn = s | z_¬dn, y_¬dn, α, β) ∝ 1[ω_s = w_dn] · (N_k|d + α) / Σ_k′ (N_k′|d + α) · Π_(i→j)∈s (N_i→j|k + β_i→j) / Σ_j′ (N_i→j′|k + β_i→j′)
Introduction | Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA ), are one of the most popular statistical frameworks for navigating large unannotated document collections. |
Polylingual Tree-based Topic Models | Generative Process As in LDA , each word token is associated with a topic. |
Polylingual Tree-based Topic Models | With these correlated in topics in hand, the generation of documents are very similar to LDA . |
Topic Models for Machine Translation | While vanilla topic models ( LDA ) can only be applied to monolingual data, there are a number of topic models for parallel corpora: Zhao and Xing (2006) assume aligned word pairs share same topics; Mimno et al. |
Abstract | Latent Dirichlet Allocation ( LDA ) models are used as “topic models” to produce a low-dimensional representation of documents, while Probabilistic Context-Free Grammars (PCFGs) define distributions over trees. |
Abstract | The paper begins by showing that LDA topic models can be viewed as a special kind of PCFG, so Bayesian inference for PCFGs can be used to infer Topic Models as well. |
Abstract | Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. |
Introduction | Specifically, we show that an LDA model can be expressed as a certain kind of PCFG, |
Introduction | so Bayesian inference for PCFGs can be used to learn LDA topic models as well. |
Introduction | The importance of this observation is primarily theoretical, as current Bayesian inference algorithms for PCFGs are less efficient than those for LDA inference. |
Abstract | We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multilingual corpus analysis. |
Abstract | We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA , and learns semantically coherent aligned topics as judged by human annotators.
Code-Switching | We call the resulting model Code-Switched LDA (csLDA). |
Code-Switching | 3.1 Inference Inference for csLDA follows directly from LDA . |
Code-Switching | Instead, we constructed a baseline from LDA run on the entire dataset (no |
Experiments | LDA: Our approach with LDA as the topic model. |
Experiments | The implementation of LDA is based on Blei's code of variational EM for LDA.
Experiments | Table 4 shows the average query processing time and result quality of the LDA approach, by varying the frequency threshold h. Similar results are observed for the pLSI approach.
Our Approach | And at the same time, one document could be related to multiple topics in some topic models (e.g., pLSI and LDA ). |
Our Approach | Here we use LDA as an example to |
Our Approach | According to the assumption of LDA and our concept mapping in Table 3, a RASC (“document”) is viewed as a mixture of hidden semantic classes (“topics”). |
Topic Models | LDA (Blei et al., 2003): In LDA , the topic mixture is drawn from a conjugate Dirichlet prior that remains the same for all documents (Figure 1).
Topic Models | Figure 1. Graphical model representation of LDA , from Blei et al.
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | The underlying mechanism for our annotation procedure is LDA (Blei et al., 2003b), a fully Bayesian extension of probabilistic Latent Semantic Analysis (Hofmann, 1999). |
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | Given D labeled attribute sets w_d, d ∈ D, LDA infers an unstructured set of T latent annotated concepts over which attribute sets decompose as mixtures.2 The latent annotated concepts represent semantically coherent groups of attributes expressed in the data, as shown in Example 1.
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | The generative model for LDA is given by |
Introduction | In this paper, we show that both of these goals can be realized jointly using a probabilistic topic model, namely hierarchical Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003b). |
Introduction | There are three main advantages to using a topic model as the annotation procedure: (1) Unlike hierarchical clustering (Duda et al., 2000), the attribute distribution at a concept node is not composed of the distributions of its children; attributes found specific to the concept Painter would not need to appear in the distribution of attributes for Person, making the internal distributions at each concept more meaningful as attributes specific to that concept; (2) Since LDA is fully Bayesian, its model semantics allow additional prior information to be included, unlike standard models such as Latent Semantic Analysis (Hofmann, 1999), improving annotation precision; (3) Attributes with multiple related meanings (i.e., polysemous attributes) are modeled implicitly: if an attribute (e.g., “style”) occurs in two separate input classes (e.g., poets and car models), then that attribute might attach at two different concepts in the ontology, which is better than attaching it at their most specific common ancestor (Whole) if that ancestor is too general to be useful. |
Introduction | We evaluate three variants: (1) a fixed structure approach where each flat class is attached to WN using a simple string-matching heuristic, and concept nodes are annotated using LDA; (2) an extension of LDA allowing for sense selection in addition to annotation; and (3) an approach employing a nonparametric prior over tree structures capable of inferring arbitrary ontologies.
Ontology Annotation | Figure 2: Graphical models for the LDA variants (LDA, Fixed Structure LDA, nCRP); shaded nodes indicate observed quantities.
Ontology Annotation | We propose a set of Bayesian generative models based on LDA that take as input labeled attribute sets generated using an extraction procedure such as the above and organize the attributes in WN according to their level of generality. |
Learning | For the LDA regularizer, L = R × K. For the Brown cluster regularizer, L = V - 1.
Structured Regularizers for Text | 4.3 LDA Regularizer |
Structured Regularizers for Text | We do this by inferring topics in the training corpus by estimating the latent Dirichlet allocation ( LDA ) model (Blei et al., 2003).
Structured Regularizers for Text | Note that LDA is an unsupervised method, so we can infer topical structures from any collection of documents that are considered related to the target corpus (e.g., training documents, text from the web, etc.).
Experiments | 1(f) plots the perplexity of LDA models with 20 topics learned from Reuters, 20news, Enwiki, Zipfl, and Ziprix versus the size of reduced vocabulary on a log-log graph.
Experiments | Table 2: Computational time and memory size for LDA learning on the original corpus, (1/ 10)-reduced corpus, and (1/20)-reduced corpus of Reuters. |
Experiments | Finally, let us examine the computational costs for LDA learning. |
Perplexity on Reduced Corpora | In this section, we consider the perplexity of the widely used topic model, Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003), by using the notation given in (Griffiths and Steyvers, 2004).
Perplexity on Reduced Corpora | LDA is a probabilistic language model that generates a corpus as a mixture of hidden topics, and it allows us to infer two parameters: the document-topic distribution θ that represents the mixture rate of topics in each document, and the topic-word distribution φ that represents the occurrence rate of words in each topic.
Perplexity on Reduced Corpora | The assumption placed on φ may not be reasonable in the case of θ, because we can easily think of a document with only one topic, and we usually use a small number T of topics for LDA , e.g., T = 20.
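The perplexity being measured can be sketched directly from the two inferred parameter sets, with per-word likelihood p(w|d) = Σ_k θ_dk φ_kw; all values below are toy numbers, not from the cited corpora:

```python
# Sketch: corpus perplexity from inferred theta (doc-topic) and
# phi (topic-word) parameters. Toy values throughout.
import math

theta = [[0.9, 0.1], [0.2, 0.8]]            # document-topic distributions
phi = [[0.5, 0.4, 0.1], [0.1, 0.2, 0.7]]    # topic-word distributions
docs = [[0, 1, 0], [2, 2, 1]]               # word ids per document

log_lik, n_words = 0.0, 0
for d, doc in enumerate(docs):
    for w in doc:
        # marginal word probability under the mixture of topics
        p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
        log_lik += math.log(p)
        n_words += 1

perplexity = math.exp(-log_lik / n_words)   # lower is better
```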
Bilingual LDA Model | 2.1 Standard LDA |
Bilingual LDA Model | The LDA model (Blei et al., 2003) represents the latent topic distribution of a document by a Dirichlet distribution with a K-dimensional implicit random variable, which is transformed into a complete generative model when a Dirichlet prior β is exerted on the topic-word distributions (Griffiths et al., 2004) (shown in Fig.
Bilingual LDA Model | Figure 1: Standard LDA model |
Building comparable corpora | Based on the bilingual LDA model, building comparable corpora includes several steps to |
Introduction | Based on Bilingual LDA Model |
Introduction | The paper concretely includes: 1) Introduce the Bilingual LDA (Latent Dirichlet Allocation) model which builds comparable corpora and improves the efficiency of matching similar documents; 2) Design a novel method of TFIDF (Topic Frequency-Inverse Document Frequency) to enhance the distinguishing ability of topics from different documents; 3) Propose a tailored |
Experiments | Since MTR provides a mixture of properties adapted from earlier models, we present performance benchmarks on tag clustering using: (i) LDA; (ii) the Hidden Markov Topic Model, HMTM (Gruber et al., 2005); and (iii) w-LDA (Petterson et al., 2010), which uses word features as priors in LDA .
Experiments | [Figure: tag-clustering results for LDA, w-LDA, HMTM, and MTR]
Experiments | models: LDA , HMTM, w-LDA.
Markov Topic Regression - MTR | LDA assumes that the latent topics of documents are sampled independently from one of K topics. |
Related Work and Motivation | Standard topic models, such as Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters words together that appear in a similar global context. |
Related Work and Motivation | In LDA , common words tend to dominate all topics causing related words to end up in different topics. |
Related Work and Motivation | In (Petterson et al., 2010), the vector-based features of words are used as prior information in LDA so that the words that are synonyms end up in same topic. |
Abstract | In this paper, we propose a novel Emotion-aware LDA (EaLDA) model to build a domain-specific lexicon for predefined emotions that include anger, disgust, fear, joy, sadness, and surprise.
Algorithm | In this section, we rigorously define the emotion-aware LDA model and its learning algorithm. |
Algorithm | Like the standard LDA model, EaLDA is a generative model. |
Algorithm | The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β.
Conclusions and Future Work | In this paper, we have presented a novel emotion-aware LDA model that is able to quickly build a fine-grained domain-specific emotion lexicon for languages without many manually constructed resources. |
Conclusions and Future Work | The proposed EaLDA model extends the standard LDA model by accepting a set of domain-independent emotion words as prior knowledge, and guiding to group semantically related words into the same emotion category. |
Introduction | The proposed EaLDA model extends the standard Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) model by employing a small set of seeds to guide the model generating topics. |
Related Work | Our approach relates most closely to the method proposed by Xie and Li (2012) for the construction of lexicon annotated for polarity based on LDA model. |
Introduction | Latent Semantic Indexing (Deerwester et al., 1990) (LSI), probabilistic Latent Semantic Analysis (Hofmann, 2001) (pLSA) and Latent Dirichlet Allocation (Blei et al., 2003) ( LDA ) are the most famous approaches that tried to tackle this problem throughout the years. |
Introduction | This is one of the reasons of the intensive use of topic models (and especially LDA ) in current research in Natural Language Processing (NLP) related areas. |
Introduction | The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results. |
Topic-Driven Relevance Models | We specifically focus on Latent Dirichlet Allocation ( LDA ), since it is currently one of the most representative. |
Topic-Driven Relevance Models | In LDA , each topic multinomial distribution φ_k is generated by a conjugate Dirichlet prior with parameter β, while each document multinomial distribution θ_d is generated by a conjugate Dirichlet prior with parameter α.
Topic-Driven Relevance Models | TDRM relies on two important parameters: the number of topics K that we want to learn, and the number of feedback documents N from which LDA learns the topics. |
Introduction | Latent variable models, such as latent Dirichlet allocation ( LDA ) (Blei et al., 2003) and probabilistic latent semantic analysis (PLSA) (Hofmann, 1999), have been used in the past to facilitate social science research. |
Introduction | SAGE (Eisenstein et al., 2011a), a recently proposed sparse additive generative model of language, addresses many of the drawbacks of LDA . |
Introduction | Another advantage, from a social science perspective, is that SAGE can be derived from a standard logit random-utility model of judicial opinion writing, in contrast to LDA . |
Related Work | Related research efforts include using the LDA model for topic modeling in historical newspapers (Yang et al., 2011), a rule-based approach to extract verbs in historical Swedish texts (Pettersson and Nivre, 2011), and a system for semantic tagging of historical Dutch archives (Cybulska and Vossen, 2011).
Related Work | (2010) study the effect of the context of interaction in blogs using a standard LDA model. |
The Sparse Mixed-Effects Model | To address the over-parameterization, lack of expressiveness and robustness issues in LDA , the SAGE (Eisenstein et al., 2011a) framework draws a |
The Sparse Mixed-Effects Model | In this SME model, we still have the same Dirichlet prior α, the latent topic proportion θ, and the latent topic variable z as the original LDA model.
The Sparse Mixed-Effects Model | In contrast to traditional multinomial distribution of words in LDA models, we approximate the conditional word distribution in the document d as the |
Experiments | It is difficult to compare our model to other unsupervised systems such as MG-LDA or LDA .
Related Work | Recently, Blei and McAuliffe (2008) proposed an approach for joint sentiment and topic modeling that can be viewed as a supervised LDA (sLDA) model that tries to infer topics appropriate for use in a given classification or regression problem. |
The Model | 2.1 Multi-Grain LDA |
The Model | The Multi-Grain Latent Dirichlet Allocation model (MG-LDA) is an extension of Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003). |
The Model | As demonstrated in Titov and McDonald (2008), the topics produced by LDA do not correspond to ratable aspects of entities.
Background and Motivation | A hierarchical model is particularly more appealing for summarization than a "flat" model, e.g., LDA (Blei et al., 2003b), in that one can discover "abstract" and "specific" topics.
Experiments and Discussions | * HbeSum (Hybrid Flat Summarizer): To investigate the performance of the hierarchical topic model, we build another hybrid model using flat LDA (Blei et al., 2003b).
Experiments and Discussions | In LDA , each sentence is a superposition of all K topics with sentence-specific weights; there is no hierarchical relation between topics.
Experiments and Discussions | Instead of the new tree-based sentence scoring (§ 4), we present a similar method using topics from LDA at the sentence level.
Introduction | We present a probabilistic topic model on sentence level building on hierarchical Latent Dirichlet Allocation (hLDA) (Blei et al., 2003a), which is a generalization of LDA (Blei et al., 2003b). |
Introduction | Unsupervised topic models, such as latent Dirichlet allocation ( LDA ) (Blei et al., 2003) and its variants are characterized by a set of hidden topics, which represent the underlying semantic structure of a document collection. |
Introduction | In particular, our system, called LDA-SP, uses LinkLDA (Erosheva et al., 2004), an extension of LDA that simultaneously models two sets of distributions for each topic. |
Previous Work | Topic models such as LDA (Blei et al., 2003) and its variants have recently begun to see use in many NLP applications such as summarization (Daume III and Marcu, 2006), document alignment and segmentation (Chen et al., 2009), and inferring class-attribute hierarchies (Reisinger and Pasca, 2009). |
Previous Work | Van Durme and Gildea (2009) proposed applying LDA to general knowledge templates extracted using the KNEXT system (Schubert and Tong, 2003). |
Topic Models for Selectional Prefs. | We first describe the straightforward application of LDA to modeling our corpus of extracted relations. |
Topic Models for Selectional Prefs. | In this case two separate LDA models are used to model a1 and a2 independently. |
Topic Models for Selectional Prefs. | Formally, LDA generates each argument in the corpus of relations as follows: |
Experiments | For LDA-θ and LDA-wvec, we run Gibbs Sampling based LDA for 2000 iterations and average the model over the last 10 iterations.
Experiments | For LDA we tune the hyperparameters α (Dirichlet prior for the topic distribution of a document) and β (Dirichlet prior for the word distribution given a topic).
Introduction | WTMF is a state-of-the-art unsupervised model that was tested on two short text similarity datasets, (Li et al., 2006) and (Agirre et al., 2012), where it outperforms Latent Semantic Analysis [LSA] (Landauer et al., 1998) and Latent Dirichlet Allocation [ LDA ] (Blei et al., 2003) by a large margin.
Introduction | We employ it as a strong baseline in this task as it exploits and effectively models the missing words in a tweet, in practice adding thousands more features for the tweet; by contrast, LDA , for example, only leverages the observed words (14 features) to infer the latent vector for a tweet.
Related Work | (2010) also use hashtags to improve the latent representation of tweets in an LDA framework, Labeled-LDA (Ramage et al., 2009), treating each hashtag as a label.
Related Work | Similar to the experiments presented in this paper, the result of using Labeled-LDA alone is worse than the IR model, due to the sparseness in the induced LDA latent vector. |
Related Work | (2011) apply an LDA-based model to clustering by incorporating URL-referred documents.
Comparable Question Mining | Input: A news article. Output: A sorted list of comparable questions.
1: Identify all target named entities (NEs) in the article
2: Infer the distribution of LDA topics for the article
3: For each comparable relation R in the database, compute its relevance score as the similarity between the topic distributions of R and the article
4: Rank all the relations according to their relevance score and pick the top M as relevant
5: for each relevant relation R in the order of relevance ranking do
6:   Filter out all the target NEs that do not pass the single-entity classifier for R
7:   Generate all possible NE pairs from those that passed the single classifier
8:   Filter out all the generated NE pairs that do not pass the entity-pair classifier for R
9:   Pick the top N pairs with positive classification score to be qualified for generation
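The relevance-ranking step (steps 3-4) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the relation names, topic vectors, and the use of cosine similarity between topic distributions are all assumptions.

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

def rank_relations(article_topics, relation_topics, top_m):
    """Steps 3-4: score each relation by topic similarity to the article,
    rank, and keep the top M relations."""
    scored = [(cosine(article_topics, topics), rel)
              for rel, topics in relation_topics.items()]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [rel for _, rel in scored[:top_m]]

# Hypothetical LDA topic distributions over K = 3 topics.
article = [0.7, 0.2, 0.1]
relations = {"vs_phones": [0.6, 0.3, 0.1], "vs_cities": [0.1, 0.1, 0.8]}
print(rank_relations(article, relations, top_m=1))  # ['vs_phones']
```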
Evaluation | The reason for this mistake is that many named entities appear as frequent terms in LDA topics, and thus mentioning many names that belong to a single topic drives LDA to assign this topic a high probability. |
Online Question Generation | Specifically, we utilize Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) to infer latent topics in texts. |
Online Question Generation | To train an LDA model, we constructed for each comparable relation a pseudo-document consisting of all questions that contain this relation in our corpus (the supporting questions). |
Online Question Generation | An additional product of the LDA training process is a topic distribution for each relation’s pseudo-document, which we consider as the relation’s context profile. |
Related Work | Instead, we are interested in a higher level topical similarity to the input article, for which LDA topics were shown to help (Celikyilmaz et al., 2010). |
Experiments | To this end we interpret the descriptors as words in documents, and train a standard LDA model based on these documents.
Experiments | We also train a standard LDA model to obtain the theme of a sentence. |
Experiments | The LDA model assigns each word to a topic. |
Our Approach | In our experiments, we use the meta-descriptors of a document as side information and train a standard LDA model to find the theme of a document. |
Our Approach | This model is a minor variation on standard LDA and the difference is that instead of drawing an observation from a hidden topic variable, we draw multiple observations from a hidden topic variable. |
Introduction | In this paper, we retain the underlying HMM, but assume words are emitted using topic models (TM), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA ). |
Introduction | LDA assumes each word in an utterance is drawn from one of a set of latent topics, where each topic is a multinomial distribution over the vocabulary. |
Introduction | This paper is organized as follows: Section 2 introduces two task-oriented domains and corpora; Section 3 details three new unsupervised generative models which combine HMMs and LDA , along with efficient inference schemes; Section 4 evaluates our models qualitatively and quantitatively; and Section 5 concludes.
Latent Structure in Dialogues | We assume θ's and φ's are drawn from corresponding Dirichlet priors, as in LDA .
Latent Structure in Dialogues | All probabilities can be computed using collapsed Gibbs sampler for LDA (Griffiths |
Latent Structure in Dialogues | Again, we impose Dirichlet priors on the distributions over topics θ's and the distributions over words φ's, as in LDA .
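The collapsed Gibbs sampler these sentences refer to can be sketched for plain LDA as follows: θ's and φ's are integrated out, and each word's topic assignment is resampled from its full conditional given the count tables. The toy corpus, hyperparameters, and function name are illustrative assumptions.

```python
import random
random.seed(1)

def gibbs_lda(docs, K, alpha, beta, iters):
    """Collapsed Gibbs sampling for LDA; returns the doc-topic count table."""
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    wid = {w: i for i, w in enumerate(vocab)}
    # Count tables: doc-topic, topic-word, topic totals, and assignments z.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []
    for d, doc in enumerate(docs):                 # random initialization
        zs = []
        for w in doc:
            k = random.randrange(K)
            ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
            zs.append(k)
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                        # remove current assignment
                ndk[d][k] -= 1; nkw[k][wid[w]] -= 1; nk[k] -= 1
                # Full conditional p(z = j | rest), up to a constant.
                weights = [(ndk[d][j] + alpha) *
                           (nkw[j][wid[w]] + beta) / (nk[j] + V * beta)
                           for j in range(K)]
                k = random.choices(range(K), weights=weights)[0]
                ndk[d][k] += 1; nkw[k][wid[w]] += 1; nk[k] += 1
                z[d][i] = k
    return ndk

docs = [["gene", "dna", "gene"], ["ball", "goal", "ball"]]
print(gibbs_lda(docs, K=2, alpha=0.1, beta=0.01, iters=50))
```

The doc-topic counts returned here would normally be smoothed with α and normalized to recover θ for each document.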
Background | Several techniques can be used for this purpose such as Latent Semantic Analysis (LSA) (Deerwester et al., 1990), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), and Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003). |
Background | LDA , first defined by Blei et al. (2003), defines a topic as a distribution over a fixed vocabulary, where each document can exhibit the topics with different proportions.
Background | For each document, LDA generates the words in a two-step process: |
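The two-step per-word process referred to above can be sketched as follows: for each word position, first draw a topic z from the document's topic proportions θ, then draw the word from that topic's word distribution φz. The toy vocabulary and parameter values are illustrative assumptions.

```python
import random
random.seed(0)

def generate_document(theta, phi, vocab, n_words):
    """LDA's two-step process: for each word, (1) draw a topic z from the
    document's topic distribution theta, then (2) draw a word from that
    topic's word distribution phi[z]."""
    words = []
    for _ in range(n_words):
        z = random.choices(range(len(theta)), weights=theta)[0]   # step 1
        w = random.choices(vocab, weights=phi[z])[0]              # step 2
        words.append(w)
    return words

# Toy parameters: K = 2 topics over a 4-word vocabulary.
vocab = ["gene", "dna", "ball", "goal"]
theta = [0.9, 0.1]                        # this document is mostly topic 0
phi = [[0.5, 0.5, 0.0, 0.0],              # topic 0: biology words
       [0.0, 0.0, 0.5, 0.5]]              # topic 1: sports words
print(generate_document(theta, phi, vocab, n_words=6))
```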
Experiments | LDA was chosen to generate the topic models of clinical reports due to its being a generative probabilistic model for documents and its robustness to overfitting.
Conclusion | Compared with the macro topics of documents inferred by LDA from the bag of words of whole documents, the word senses inferred by the HDP-based WSI can be considered micro topics.
Related Work | They adapt LDA to word sense induction by building one topic model per word type. |
WSI-Based Broad-Coverage Sense Tagger | We first describe WSI, especially WSI based on the Hierarchical Dirichlet Process (HDP) (Teh et al., 2004), a nonparametric version of Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003). |
WSI-Based Broad-Coverage Sense Tagger | The conventional topic distribution θj for the j-th pseudo-document is taken as the distribution over senses for the given word type W. The LDA generative process for sense induction is as follows: 1) for each pseudo-document Dj, draw a per-document sense distribution θj from a Dirichlet distribution Dir(α); 2) for each item wj,i in the pseudo-document Dj, 2.1) draw a sense cluster sj,i ~ Multinomial(θj); and 2.2) draw a word wj,i ~ Multinomial(φsj,i), where φsj,i is the distribution of sense sj,i over words, drawn from a Dirichlet distribution Dir(β).
WSI-Based Broad-Coverage Sense Tagger | As LDA requires manually specifying the number of senses (topics), a better idea is to let the training data automatically determine the number of senses for each word type.
Background and overview of models | training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the 'documents' in LDA ) that serve as our distinct situations h. As noted above, the learned document-topic distributions θ are treated as observed variables in the TLD model to represent the situational context.
Background and overview of models | The topic-word distributions learned by LDA are discarded, since these are based on the (correct and unambiguous) words in the transcript, whereas the TLD model is presented with phonetically ambiguous versions of these word tokens and must learn to disambiguate them and associate them with topics. |
Conclusion | Regardless of the specific way in which infants encode semantic information, our method of adding this information by using LDA topics from transcript data was shown to be effective. |
Experiments | The input to the TLD model includes a distribution over topics for each situation, which we infer in advance from the full Brent corpus (not only the C1 subset) using LDA . |
Inference: Gibbs Sampling | The first factor, the prior probability of topic k in document h, is given by θhk, obtained from the LDA .
Topic-Lexical-Distributional Model | There are a fixed number of lower level topic-lexicons; these are matched to the number of topics in the LDA model used to infer the topic distributions (see Section 6.4). |
Baselines | Here, the topics are extracted from all the documents in the *SEM 2012 shared task using the LDA Gibbs Sampling algorithm (Griffiths, 2002). |
Baselines | In the topic-driven word-based graph model, the first layer denotes the relatedness among content words as captured in the above word-based graph model, and the second layer denotes the topic distribution, with the dashed lines between these two layers indicating the word-topic model returned by LDA .
Baselines | where Rel(wi, rm) is the weight of word wi in topic rm, calculated by the LDA Gibbs Sampling algorithm.
Introduction | In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003) or Hidden Topic Markov Model (HTMM) (Gruber et al., 2007). |
Introduction | Although we can easily apply LDA at the |
Introduction | Additionally, our model can be discriminatively trained with a large number of training instances, without expensive sampling methods such as in LDA or HTMM; thus it is more practicable and scalable.
Related Work | Experiments show that their approach not only achieved better translation performance but also provided a faster decoding speed compared with previous lexicon-based LDA methods. |
Related Work | Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM. |
Learning Templates from Raw Text | We consider two unsupervised algorithms: Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003), and agglomerative clustering based on word distance. |
Learning Templates from Raw Text | 4.1.1 LDA for Unknown Data |
Learning Templates from Raw Text | LDA is a probabilistic model that treats documents as mixtures of topics. |
Experimental Setup | To compute the LDA features, we use the online variational Bayes algorithm of Hoffman et al. (2010) as implemented in the Gensim software package (Rehurek and Sojka, 2010).
Experimental Setup | More inclusive is the feature set NO-LDA, which includes all features except the LDA features. |
Experimental Setup | Experiments with this set were performed in order to isolate the effect of the LDA features. |
Our Proposal: A Latent LC Approach | We further incorporate features based on a Latent Dirichlet Allocation ( LDA ) topic model (Blei et al., 2003). |
Our Proposal: A Latent LC Approach | We populate the pseudo-documents of an LC with its arguments according to R. We then train an LDA model with 25 topics over these documents. |
Experiment | Furthermore, the hyper-parameters for the topic probability distribution and the word probability distribution in LDA are α=0.5 and β=0.5, respectively.
Experiment | Here, when clustering the documents based on the topic probability distribution from LDA , the topic distribution over documents θ changes in every estimation.
Introduction | 3) Information used for classification — we use latent information estimated by latent Dirichlet allocation ( LDA ) (Blei et al., 2003) to classify documents, and compare the results of the cases using both surface and latent information. |
Techniques for text classification | After obtaining a collection of refined documents for classification, we adopt LDA to estimate the latent topic probability distributions over the target documents and use them for clustering.
Techniques for text classification | As for the refined document obtained in step 2, the latent topics are estimated by means of LDA . |
Semi-Supervised SimHash | Clearly, Equation (12) is analogous to Linear Discriminant Analysis ( LDA ) (Duda et al., 2000) except for the following difference: 1) measurement.
Semi-Supervised SimHash | S3H uses similarity while LDA uses distance.
Semi-Supervised SimHash | As a result, the objective function of S3H is just the reciprocal of LDA's .
Abstract | We also applied Latent Dirichlet Allocation ( LDA ; Blei et al., 2003) to learn a distribution over latent topics in the extracted data, as this is a popular exploratory data analysis method. |
Abstract | In LDA a topic is a unigram distribution over words, and each document is modeled as a distribution over topics. |
Abstract | Some of the topics that LDA finds correspond closely with specific domains, such as topics 1 (blingee . |
Attribute-based Semantic Models | (2009) present an extension of LDA (Blei et al., 2003) where words in documents and their associated attributes are treated as observed variables that are explained by a generative process. |
Attribute-based Semantic Models | Inducing these attribute-topic components from D with the extended LDA model gives two sets of parameters: word probabilities given components PW(wi|X = xc) for wi, i = 1, ..., n, and attribute probabilities given components PA(ak|X = xc) for ak, k = 1, ..., F. For example, most of the probability mass of a component x would be reserved for the words shirt, coat, dress and the attributes has_1_piece, has_seams, made_of_material and so on.
Related Work | Their model is essentially Latent Dirichlet Allocation ( LDA , Blei et al., 2003) trained on a corpus of multimodal documents (i.e., BBC news articles and their associated images). |
Results | Unseen are concepts covered by LDA but unknown to the attribute classifiers (N = 388). |
Model Description | Our analysis of the document text is based on probabilistic topic models such as LDA (Blei et al., 2003). |
Model Description | In the LDA framework, each word is generated from a language model that is indexed by the word’s topic assignment. |
Model Description | Thus, rather than identifying a single topic for a document, LDA identifies a distribution over topics. |
Related Work | This approach is inspired by methods in the topic modeling literature, such as Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003), where topics are treated as hidden variables that govern the distribution of words in a text. |
Experiments | Topic modeling was performed with Mallet (McCallum, 2002), a standard implementation of LDA , using a Chinese stoplist and setting the per-document Dirichlet parameter α = 0.01.
Model Description | , K} over each document, using Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003). |
Model Description | For this case, we also propose a local LDA model (LTM), which treats each sentence as a separate document. |
RSP: A Random Walk Model for SP | LDA-SP: Another kind of sophisticated unsupervised approach for SP is latent variable models based on Latent Dirichlet Allocation ( LDA ).
RSP: A Random Walk Model for SP | Ó Séaghdha (2010) applies topic models to SP induction with three variations: LDA , Rooth-LDA, and Dual-LDA; Ritter et al.
RSP: A Random Walk Model for SP | In this work, we compare with Ó Séaghdha's original LDA approach to SP.
Related Work | (2007), for example, use LDA to capture global context. |
Related Work | (2007) enhance the basic LDA algorithm by incorporating WordNet senses as an additional latent variable. |
The Sense Disambiguation Model | LDA is a Bayesian version of this framework with Dirichlet hyper-parameters (Blei et al., 2003). |
Image Annotation | Latent Dirichlet Allocation ( LDA , Blei et al. |
Image Annotation | The basic idea underlying LDA , and topic models in general, is that each document is composed of a probability distribution over topics, where each topic represents a probability distribution over words. |
Image Annotation | Examples include PLSA-based approaches to image annotation (e.g., Monay and Gatica-Perez 2007) and correspondence LDA (Blei and Jordan, 2003). |
Related Work 2.1 Market Prediction and Social Media | One of the basic and most widely used models is Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003). |
Related Work 2.1 Market Prediction and Social Media | LDA can learn a predefined number of topics and has been widely applied in its extended forms in sentiment analysis and many other tasks (Mei et al., 2007; Branavan et al., 2008; Lin and He, 2009; Zhao et al., 2010; Wang et al., 2010; Brody and Elhadad, 2010; Jo and Oh, 2011; Moghaddam and Ester, 2011; Sauper et al., 2011; Mukherjee and Liu, 2012; He et al., 2012). |
Related Work 2.1 Market Prediction and Social Media | The Dirichlet Processes Mixture (DPM) model is a nonparametric extension of LDA (Teh et al., 2006), which can estimate the number of topics inherent in the data itself. |
Linguistic Mapping | In this work we follow closely the Author-Topic (AT) model (Steyvers et al., 2004), which is a generalization of Latent Dirichlet Allocation ( LDA ) (Blei et al., 2003).
Linguistic Mapping | LDA is a technique that was developed to model the distribution of topics discussed in a large corpus of documents. |
Linguistic Mapping | The AT model generalizes LDA , saying that the mixture of topics is not dependent on the document itself, but rather on the authors who wrote it. |
Experiments | DF-LDA adds constraints to LDA . |
Proposed Seeded Models | The standard LDA and existing aspect and sentiment models (ASMs) are mostly governed by the phenomenon called "higher-order co-occurrence" (Heinrich, 2009), i.e., based on how often terms co-occur in different contexts.
Related Work | Existing works are based on two basic models, pLSA (Hofmann, 1999) and LDA (Blei et al., 2003). |