AKL: Using the Learned Knowledge | To compute this distribution, instead of considering how well z_i matches with w_i only (as in LDA), we also consider two other factors:
Experiments | This section evaluates and compares the proposed AKL model with three baseline models: LDA, MC-LDA, and GK-LDA.
Introduction | Traditional topic models such as LDA (Blei et al., 2003) and pLSA (Hofmann, 1999) are unsupervised methods for extracting latent topics in text documents. |
Introduction | We thus propose to first use LDA to learn topics/aspects from each individual domain and then discover the shared aspects (or topics) and aspect terms among a subset of domains. |
Introduction | We propose a method to solve this problem, which also results in a new topic model, called AKL (Automated Knowledge LDA), whose inference can exploit the automatically learned prior knowledge and handle the issues of incorrect knowledge to produce superior aspects.
Learning Quality Knowledge | This section details Step 1 in the overall algorithm, which has three sub-steps: running LDA (or AKL) on each domain corpus, clustering the resulting topics, and mining frequent patterns from the topics in each cluster. |
Learning Quality Knowledge | Since running LDA is simple, we will not discuss it further. |
Learning Quality Knowledge | After running LDA (or AKL) on each domain corpus, a set of topics is obtained. |
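The clustering and frequent-pattern sub-steps above can be sketched as follows. This is a minimal toy illustration with hypothetical helper names, topics represented as sets of top words, greedy Jaccard-similarity clustering, and pair-level frequent itemsets; the paper's actual clustering and pattern-mining choices may differ:

```python
from collections import Counter
from itertools import combinations

def cluster_topics(topics, threshold=0.3):
    """Greedily cluster topics (each a set of top words) by Jaccard similarity."""
    clusters = []
    for t in topics:
        for c in clusters:
            rep = c[0]  # compare against the cluster's first topic
            if len(t & rep) / len(t | rep) >= threshold:
                c.append(t)
                break
        else:
            clusters.append([t])
    return clusters

def frequent_pairs(cluster, min_support=2):
    """Mine word pairs appearing in at least min_support topics of a cluster."""
    counts = Counter()
    for topic in cluster:
        counts.update(combinations(sorted(topic), 2))
    return {pair for pair, n in counts.items() if n >= min_support}

# Toy topics from two hypothetical review domains (top words only).
topics = [{"battery", "charge", "life"}, {"battery", "charge", "power"},
          {"display", "resolution", "screen"}]
clusters = cluster_topics(topics)
knowledge = [frequent_pairs(c) for c in clusters]  # e.g. {("battery", "charge")}
```

Word pairs that recur across topics of several domains are exactly the kind of shared aspect knowledge the algorithm feeds back into inference.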
Overall Algorithm | Lines 3 and 5 run LDA on each review domain corpus D_i ∈ D to generate a set of aspects/topics A_i (lines 2, 4, and 6-9 will be discussed below).
Overall Algorithm | Scalability: the proposed algorithm is naturally scalable as both LDA and AKL run on each domain independently. |
Abstract | In this paper, we propose a weakly supervised algorithm in which supervision comes in the form of labeling of Latent Dirichlet Allocation (LDA) topics.
Background 3.1 LDA | LDA is an unsupervised probabilistic generative model for collections of discrete data such as text documents. |
Background 3.1 LDA | The generative process of LDA can be described as follows: |
Background 3.1 LDA | The key problem in LDA is posterior inference. |
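The generative process just mentioned can be sketched in a few lines. A minimal simulation assuming symmetric Dirichlet priors and toy dimensions (not tied to any particular corpus):

```python
import numpy as np

rng = np.random.default_rng(0)
V, K, D, N = 6, 2, 3, 8          # vocab size, topics, documents, tokens per doc
alpha, beta = 0.5, 0.1           # symmetric Dirichlet hyperparameters

phi = rng.dirichlet([beta] * V, size=K)   # topic-word distributions, one per topic

docs = []
for _ in range(D):
    theta = rng.dirichlet([alpha] * K)             # document-topic distribution
    z = rng.choice(K, size=N, p=theta)             # a topic for each token
    w = [int(rng.choice(V, p=phi[k])) for k in z]  # a word from that topic
    docs.append(w)
```

Posterior inference then reverses this process: given only the observed words, it recovers the latent theta, phi, and topic assignments z.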
Introduction | In this paper, we propose a text classification algorithm based on Latent Dirichlet Allocation (LDA) (Blei et al., 2003) which does not need labeled documents.
Introduction | LDA is an unsupervised probabilistic topic model and it is widely used to discover latent semantic structure of a document collection by modeling words in the documents. |
Introduction | (Blei et al., 2003) used LDA topics as features in text classification, but they use labeled documents while learning a classifier. |
Related Work | As LDA topics are semantically more meaningful than individual words and can be acquired easily, our approach overcomes limitations of the semi-supervised methods discussed above. |
Discussion | The first example shows both LDA and ptLDA improve the baseline. |
Experiments | Topic Models Configuration We compare our polylingual tree-based topic model (ptLDA) against tree-based topic models (tLDA), polylingual topic models (pLDA), and vanilla topic models (LDA).3 We also examine the effects on SMT performance of different inference algorithms: Gibbs sampling (gibbs), variational inference (variational), and a hybrid approach (variational-hybrid).
Experiments | We refer to the SMT model without domain adaptation as baseline.5 LDA marginally improves machine translation (less than half a BLEU point). |
Experiments | 3For Gibbs sampling, we use implementations available in Hu and Boyd-Graber (2012) for tLDA; and Mallet (McCallum, 2002) for LDA and pLDA. |
Inference | $p(z_{dn}=k, y_{dn}=s \mid \mathbf{Z}_{-dn}, \mathbf{Y}_{-dn}, \alpha, \beta) \propto \mathbb{1}\left[s \leadsto w_{dn}\right] \cdot \frac{N_{k|d}+\alpha}{\sum_{k'}\left(N_{k'|d}+\alpha\right)} \cdot \prod_{(i \to j) \in s} \frac{N_{i \to j|k}+\beta_{i \to j}}{\sum_{j'}\left(N_{i \to j'|k}+\beta_{i \to j'}\right)}$
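In the flat-tree special case, where every word sits directly under the root, a sampling update of this form reduces to the familiar collapsed Gibbs step for LDA. A sketch of that reduced computation on hypothetical count statistics (the counts and hyperparameters below are illustrative, not from the paper):

```python
import numpy as np

def topic_posterior(n_kd, n_wk, n_k, alpha, beta, V):
    """Normalized collapsed-Gibbs probabilities p(z = k | rest) for one token:
    (N_{k|d} + alpha) * (N_{w|k} + beta) / (N_k + V * beta), renormalized."""
    p = (n_kd + alpha) * (n_wk + beta) / (n_k + V * beta)
    return p / p.sum()

# Toy counts for K = 2 topics and a vocabulary of V = 4 words.
n_kd = np.array([3.0, 1.0])   # topic counts in the current document
n_wk = np.array([2.0, 0.0])   # counts of the current word under each topic
n_k = np.array([10.0, 6.0])   # total token counts per topic
p = topic_posterior(n_kd, n_wk, n_k, alpha=0.5, beta=0.1, V=4)
```

The first factor favors topics already frequent in the document; the second favors topics under which the word itself is frequent.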
Introduction | Probabilistic topic models (Blei and Lafferty, 2009), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA), are one of the most popular statistical frameworks for navigating large unannotated document collections.
Polylingual Tree-based Topic Models | Generative Process As in LDA , each word token is associated with a topic. |
Polylingual Tree-based Topic Models | With these correlated topics in hand, the generation of documents is very similar to LDA.
Topic Models for Machine Translation | While vanilla topic models (LDA) can only be applied to monolingual data, there are a number of topic models for parallel corpora: Zhao and Xing (2006) assume aligned word pairs share the same topics; Mimno et al.
Abstract | We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multilingual corpus analysis. |
Abstract | We experiment on two code-switching corpora (English-Spanish Twitter data and English-Chinese Weibo data) and show that csLDA improves perplexity over LDA, and learns semantically coherent aligned topics as judged by human annotators.
Code-Switching | We call the resulting model Code-Switched LDA (csLDA). |
Code-Switching | 3.1 Inference Inference for csLDA follows directly from LDA . |
Code-Switching | Instead, we constructed a baseline from LDA run on the entire dataset (no |
Learning | For the LDA regularizer, L = R × K. For the Brown cluster regularizer, L = V − 1.
Structured Regularizers for Text | 4.3 LDA Regularizer |
Structured Regularizers for Text | We do this by inferring topics in the training corpus with the latent Dirichlet allocation (LDA) model (Blei et al., 2003).
Structured Regularizers for Text | Note that LDA is an unsupervised method, so we can infer topical structures from any collection of documents that are considered related to the target corpus (e.g., training documents, text from the web, etc.).
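The LDA regularizer described here groups feature weights by topic, and a common way to handle such groups during optimization is a group-lasso proximal step that shrinks or zeroes a whole group at once. The sketch below is a generic group-lasso proximal operator under that assumption, not the paper's exact optimization procedure:

```python
import math

def group_prox(v, lam):
    """Proximal step for one group of weights under a group-lasso penalty:
    shrink the group's L2 norm by lam, zeroing the group if its norm <= lam."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= lam:
        return [0.0] * len(v)
    scale = 1.0 - lam / norm
    return [scale * x for x in v]

# Hypothetical weights of the words assigned to one LDA topic group.
g = group_prox([3.0, 4.0], lam=1.0)      # norm 5, so each weight scales by 0.8
small = group_prox([0.3, 0.4], lam=1.0)  # norm 0.5 <= 1, whole group zeroed
```

Zeroing entire topic groups is what lets the regularizer discard whole semantic clusters of features rather than individual words.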
Experiments | Figure 1(f) plots the perplexity of LDA models with 20 topics learned from Reuters, 20news, Enwiki, Zipfl, and Ziprix versus the size of the reduced vocabulary on a log-log graph.
Experiments | Table 2: Computational time and memory size for LDA learning on the original corpus, (1/10)-reduced corpus, and (1/20)-reduced corpus of Reuters.
Experiments | Finally, let us examine the computational costs for LDA learning. |
Perplexity on Reduced Corpora | In this section, we consider the perplexity of the widely used topic model, Latent Dirichlet Allocation (LDA) (Blei et al., 2003), using the notation given in (Griffiths and Steyvers, 2004).
Perplexity on Reduced Corpora | LDA is a probabilistic language model that generates a corpus as a mixture of hidden topics, and it allows us to infer two parameters: the document-topic distribution θ that represents the mixture rate of topics in each document, and the topic-word distribution φ that represents the occurrence rate of words in each topic.
Perplexity on Reduced Corpora | The assumption placed on φ may not be reasonable in the case of θ, because we can easily think of a document with only one topic, and we usually use a small number T of topics for LDA, e.g., T = 20.
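For reference, the perplexity measured here is the exponentiated negative mean per-token log-likelihood under θ and φ. A minimal sketch with hypothetical toy parameters; a model that is uniform over a 4-word vocabulary gives perplexity 4:

```python
import numpy as np

def perplexity(docs, theta, phi):
    """Perplexity of documents under LDA parameters theta (D x K) and
    phi (K x V): exp of the negative mean per-token log-likelihood."""
    ll, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            ll += np.log(theta[d] @ phi[:, w])  # p(w | d) = sum_k theta_dk phi_kw
            n_tokens += 1
    return float(np.exp(-ll / n_tokens))

theta = np.full((1, 2), 0.5)    # one document, two equally likely topics
phi = np.full((2, 4), 0.25)     # each topic uniform over four words
ppl = perplexity([[0, 1, 2, 3]], theta, phi)   # -> 4.0
```

Lower perplexity means the model assigns higher probability to the held-out tokens.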
Abstract | In this paper, we propose a novel Emotion-aware LDA (EaLDA) model to build a domain-specific lexicon for predefined emotions: anger, disgust, fear, joy, sadness, and surprise.
Algorithm | In this section, we rigorously define the emotion-aware LDA model and its learning algorithm. |
Algorithm | Like the standard LDA model, EaLDA is a generative model. |
Algorithm | The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β.
Conclusions and Future Work | In this paper, we have presented a novel emotion-aware LDA model that is able to quickly build a fine-grained domain-specific emotion lexicon for languages without many manually constructed resources. |
Conclusions and Future Work | The proposed EaLDA model extends the standard LDA model by accepting a set of domain-independent emotion words as prior knowledge and guiding the model to group semantically related words into the same emotion category.
Introduction | The proposed EaLDA model extends the standard Latent Dirichlet Allocation (LDA) (Blei et al., 2003) model by employing a small set of seeds to guide the model in generating topics.
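One simple way to realize this kind of seed guidance, used by several knowledge-based LDA variants, is an asymmetric Dirichlet prior that gives seed words extra pseudo-count mass in their emotion topic. The sketch below illustrates that idea with hypothetical names and values; it is not EaLDA's exact formulation:

```python
def emotion_prior(vocab, seeds, base=0.01, boost=1.0):
    """Asymmetric Dirichlet prior over the vocabulary for one emotion topic:
    seed (domain-independent) emotion words get a boosted pseudo-count."""
    return [base + (boost if w in seeds else 0.0) for w in vocab]

vocab = ["happy", "glad", "table", "sad"]
prior = emotion_prior(vocab, seeds={"happy", "glad"})  # prior for a joy topic
```

During inference, the boosted pseudo-counts pull the emotion topic toward its seeds, and co-occurring domain words then attach to the same topic.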
Related Work | Our approach relates most closely to the method proposed by Xie and Li (2012) for the construction of lexicon annotated for polarity based on LDA model. |
Background and overview of models | training an LDA topic model (Blei et al., 2003) on a superset of the child-directed transcript data we use for lexical-phonetic learning, dividing the transcripts into small sections (the ‘documents’ in LDA) that serve as our distinct situations h. As noted above, the learned document-topic distributions θ are treated as observed variables in the TLD model to represent the situational context.
Background and overview of models | The topic-word distributions learned by LDA are discarded, since these are based on the (correct and unambiguous) words in the transcript, whereas the TLD model is presented with phonetically ambiguous versions of these word tokens and must learn to disambiguate them and associate them with topics. |
Conclusion | Regardless of the specific way in which infants encode semantic information, our method of adding this information by using LDA topics from transcript data was shown to be effective. |
Experiments | The input to the TLD model includes a distribution over topics for each situation, which we infer in advance from the full Brent corpus (not only the C1 subset) using LDA . |
Inference: Gibbs Sampling | The first factor, the prior probability of topic k in document h, is given by θ_{hk}, obtained from the LDA.
Topic-Lexical-Distributional Model | There are a fixed number of lower level topic-lexicons; these are matched to the number of topics in the LDA model used to infer the topic distributions (see Section 6.4). |
Conclusion | Compared with the macro topics of documents that LDA infers from the bag of words of whole documents, the word senses inferred by the HDP-based WSI can be considered micro topics.
Related Work | They adapt LDA to word sense induction by building one topic model per word type. |
WSI-Based Broad-Coverage Sense Tagger | We first describe WSI, especially WSI based on the Hierarchical Dirichlet Process (HDP) (Teh et al., 2004), a nonparametric version of Latent Dirichlet Allocation (LDA) (Blei et al., 2003).
WSI-Based Broad-Coverage Sense Tagger | The conventional topic distribution θ_j for the j-th pseudo-document is taken as the distribution over senses for the given word type W. The LDA generative process for sense induction is as follows: 1) for each pseudo-document D_j, draw a per-document sense distribution θ_j from a Dirichlet distribution Dir(α); 2) for each item w_{ji} in the pseudo-document D_j, 2.1) draw a sense cluster s_{ji} ~ Multinomial(θ_j); and 2.2) draw a word w_{ji} ~ Multinomial(φ_{s_{ji}}), where φ_{s_{ji}} is the distribution of sense s_{ji} over words, drawn from a Dirichlet distribution Dir(β).
WSI-Based Broad-Coverage Sense Tagger | As LDA needs the number of senses (topics) to be specified manually, a better idea is to let the training data automatically determine the number of senses for each word type.
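The pseudo-documents mentioned above are built per word type: each occurrence of the target word contributes one bag of its context words. A minimal sketch of that construction, with a hypothetical function name and a toy two-sentence corpus:

```python
def pseudo_documents(corpus, target, window=2):
    """One pseudo-document (bag of context words) per occurrence of target."""
    docs = []
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == target:
                ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
                docs.append(ctx)
    return docs

corpus = [["the", "bank", "raised", "rates"],
          ["the", "river", "bank", "flooded"]]
docs = pseudo_documents(corpus, "bank")
# -> [["the", "raised", "rates"], ["the", "river", "flooded"]]
```

Running a topic model over these pseudo-documents then clusters the occurrences into senses (financial vs. river, in this toy case).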
Introduction | In this paper, we retain the underlying HMM, but assume words are emitted using topic models (TM), exemplified by latent Dirichlet allocation (Blei et al., 2003, LDA).
Introduction | LDA assumes each word in an utterance is drawn from one of a set of latent topics, where each topic is a multinomial distribution over the vocabulary. |
Introduction | This paper is organized as follows: Section 2 introduces two task-oriented domains and corpora; Section 3 details three new unsupervised generative models which combine HMMs and LDA, with efficient inference schemes; Section 4 evaluates our models qualitatively and quantitatively; and Section 5 concludes.
Latent Structure in Dialogues | We assume θ's and φ's are drawn from corresponding Dirichlet priors, as in LDA.
Latent Structure in Dialogues | All probabilities can be computed using collapsed Gibbs sampler for LDA (Griffiths |
Latent Structure in Dialogues | Again, we impose Dirichlet priors on the distributions over topics θ and the distributions over words φ, as in LDA.
Experimental Setup | To compute the LDA features, we use the online variational Bayes algorithm of (Hoffman et al., 2010) as implemented in the Gensim software package (Rehurek and Sojka, 2010). |
Experimental Setup | More inclusive is the feature set NO-LDA, which includes all features except the LDA features. |
Experimental Setup | Experiments with this set were performed in order to isolate the effect of the LDA features. |
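Toolkits such as Gensim return a document's topic distribution as a sparse list of (topic_id, probability) pairs, and turning that into the fixed-length vector a classifier expects is a small step worth making explicit. A sketch with a hypothetical sparse output (the values below are made up):

```python
def lda_feature_vector(sparse_topics, num_topics):
    """Densify a sparse (topic_id, probability) list into a fixed-length
    feature vector, with zeros for topics below the toolkit's threshold."""
    vec = [0.0] * num_topics
    for k, p in sparse_topics:
        vec[k] = p
    return vec

# Hypothetical sparse topic distribution for one document, 5 topics in total.
features = lda_feature_vector([(1, 0.7), (3, 0.25)], num_topics=5)
# -> [0.0, 0.7, 0.0, 0.25, 0.0]
```

The resulting dense vector can be concatenated with the other feature sets, or used alone to isolate the contribution of the LDA features.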
Our Proposal: A Latent LC Approach | We further incorporate features based on a Latent Dirichlet Allocation (LDA) topic model (Blei et al., 2003).
Our Proposal: A Latent LC Approach | We populate the pseudo-documents of an LC with its arguments according to R. We then train an LDA model with 25 topics over these documents. |
Introduction | In this way, the topic of a sentence can be inferred with document-level information using off-the-shelf topic modeling toolkits such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or the Hidden Topic Markov Model (HTMM) (Gruber et al., 2007).
Introduction | Although we can easily apply LDA at the |
Introduction | Additionally, our model can be discriminatively trained with a large number of training instances, without expensive sampling methods such as those in LDA or HTMM, making it more practical and scalable.
Related Work | Experiments show that their approach not only achieved better translation performance but also provided a faster decoding speed compared with previous lexicon-based LDA methods. |
Related Work | Generally, most previous research has leveraged conventional topic modeling techniques such as LDA or HTMM. |
Baselines | Here, the topics are extracted from all the documents in the *SEM 2012 shared task using the LDA Gibbs Sampling algorithm (Griffiths, 2002). |
Baselines | In the topic-driven word-based graph model, the first layer denotes the relatedness among content words as captured in the above word-based graph model, and the second layer denotes the topic distribution, with the dashed lines between these two layers indicating the word-topic model returned by LDA.
Baselines | where Rel(w_i, t_m) is the weight of word w_i in topic t_m calculated by the LDA Gibbs Sampling algorithm.
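A common estimator for such a topic-word weight from Gibbs sampling counts is the smoothed relative frequency of the word under the topic; the sketch below assumes that standard form with illustrative counts, not the paper's exact definition:

```python
def rel(n_wt, n_t, beta, V):
    """Weight of word w in topic t from Gibbs counts:
    (N_{w|t} + beta) / (N_t + V * beta), the smoothed estimate of phi_{t,w}."""
    return (n_wt + beta) / (n_t + V * beta)

# Hypothetical counts: the word occurred 5 times under a topic with 20 tokens,
# with smoothing beta = 0.1 over a vocabulary of V = 10 words.
weight = rel(n_wt=5, n_t=20, beta=0.1, V=10)   # 5.1 / 21.0
```

The beta smoothing keeps the weight nonzero for words never assigned to the topic, which matters when these weights feed a graph model.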