Abstract | Our approach employs a monolingual LDA topic model to derive a similarity measure between the test conversation and the set of training conversations, which is used to bias translation choices towards the current context. |
Corpus Data and Baseline SMT | We use the DARPA TransTac English-Iraqi parallel two-way spoken dialogue collection to train both translation and LDA topic models. |
Corpus Data and Baseline SMT | We use the English side of these conversations for training LDA topic models. |
Incremental Topic-Based Adaptation | 4.1 Topic modeling with LDA |
Incremental Topic-Based Adaptation | We use latent Dirichlet allocation, or LDA (Blei et al., 2003), to obtain a topic distribution over conversations. |
Incremental Topic-Based Adaptation | For each conversation d_i in the training collection (1,600 conversations), LDA infers a topic distribution θ_{d_i} = p(z_k | d_i) over all latent topics z_k, k ∈ {1, ..., K}, where K is the number of topics. |
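Incremental Topic-Based Adaptation | A minimal sketch of this step, using gensim and toy conversations in place of the 1,600-conversation TransTac collection; the cosine comparison at the end is an illustrative stand-in for whatever similarity measure the system actually uses: |
```python
from gensim import corpora, models, matutils

# Toy stand-ins for the training conversations (each a token list).
train_convs = [
    ["patrol", "checkpoint", "vehicle", "search"],
    ["clinic", "doctor", "medicine", "patient"],
    ["water", "supply", "pump", "village"],
]
test_conv = ["vehicle", "checkpoint", "search", "search"]

dictionary = corpora.Dictionary(train_convs)
bows = [dictionary.doc2bow(c) for c in train_convs]
lda = models.LdaModel(bows, id2word=dictionary, num_topics=2,
                      passes=10, random_state=0)

def topic_dist(tokens):
    # Full K-dimensional distribution p(z_k | d) for one conversation.
    bow = dictionary.doc2bow(tokens)
    return lda.get_document_topics(bow, minimum_probability=0.0)

test_theta = topic_dist(test_conv)
for i, conv in enumerate(train_convs):
    sim = matutils.cossim(topic_dist(conv), test_theta)
    print(f"train conversation {i}: similarity {sim:.3f}")
```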
Introduction | We begin by building a monolingual latent Dirichlet allocation (LDA) topic model on the training conversations (each conversation corresponds to a “document” in the LDA paradigm). |
Relation to Prior Work | (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document. |
Relation to Prior Work | While our proposed approach also employs monolingual LDA topic models, it deviates from the above methods in the following important ways. |
Abstract | The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block. |
Abstract | We evaluate two approaches employing LDA and probabilistic latent semantic analysis (PLSA) distributions respectively. |
Introduction | To deal with this issue, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has been proposed. |
Introduction | LDA has proved effective in many segmentation tasks (Arora and Ravindran, 2008; Hall et al., 2008; Sun et al., 2008; Riedl and Biemann, 2012; Chien and Chueh, 2012). |
Our Proposed Approach | In this paper, we propose to apply LE to the LDA topic distributions, each of which is estimated from a text block. |
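Our Proposed Approach | A minimal sketch of this idea, assuming scikit-learn's spectral embedding as the Laplacian Eigenmaps (LE) step and a made-up block-topic matrix: |
```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

# Toy block-topic matrix: one row per text block, columns are p(z_k | block)
# as if inferred by LDA (the values are made up for illustration).
theta = np.array([
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.10, 0.70, 0.20],
])

# Laplacian Eigenmaps over the blocks: blocks that are close in topic space
# receive nearby 1-d embeddings, which a segmenter can then threshold.
le = SpectralEmbedding(n_components=1, affinity="rbf", random_state=0)
print(le.fit_transform(theta).ravel())
```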
Our Proposed Approach | 2.1 Latent Dirichlet Allocation. Latent Dirichlet allocation (LDA) (Blei et al., 2003) is a generative probabilistic model of a corpus. |
Our Proposed Approach | In LDA, given a corpus D = {d1, d2, ..., dM}, each document is modeled as a mixture over latent topics. |
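Our Proposed Approach | The generative story just described can be written down directly; a minimal numpy sketch with toy sizes (all parameter values are illustrative): |
```python
import numpy as np

rng = np.random.default_rng(0)
K, V, alpha, beta = 3, 8, 0.1, 0.01  # topics, vocabulary size, Dirichlet priors

# Topic-word distributions phi_k ~ Dir(beta), one per topic.
phi = rng.dirichlet([beta] * V, size=K)

def generate_document(n_words):
    # Step 1: draw the document's topic mixture theta_d ~ Dir(alpha).
    theta = rng.dirichlet([alpha] * K)
    words = []
    for _ in range(n_words):
        # Step 2: for each word, draw a topic z ~ Mult(theta),
        # then a word w ~ Mult(phi_z).
        z = rng.choice(K, p=theta)
        words.append(int(rng.choice(V, p=phi[z])))
    return words

print(generate_document(10))
```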
Background and Model Setting | Several more recent works utilize a Latent Dirichlet Allocation (LDA) (Blei et al., 2003) framework. |
Background and Model Setting | We note that a similar LDA model construction was employed also in (Ó Séaghdha, 2010), for estimating predicate-argument likelihood. |
Background and Model Setting | First, an LDA model is constructed, as follows. |
Introduction | ...latent Dirichlet allocation (LDA) model. |
Introduction | Rather than computing a single context-insensitive rule score, we compute a distinct word-level similarity score for each topic in an LDA model. |
Two-level Context-sensitive Inference | Based on all pseudo-documents we learn an LDA model and obtain its associated probability distributions. |
Two-level Context-sensitive Inference | At learning time, we compute for each candidate rule a separate, topic-biased similarity score for each of the topics in the LDA model. |
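Two-level Context-sensitive Inference | The excerpt does not give the exact scoring formula, so the following sketch shows only one plausible reading: reweight each argument by its per-topic posterior and take one cosine similarity per topic (all vectors are invented): |
```python
import numpy as np

rng = np.random.default_rng(0)
K, n_args = 4, 5  # number of LDA topics, number of shared argument words

# Toy argument-word weight vectors for the two sides of a candidate rule
# (e.g., "X acquire Y" -> "X buy Y"); all numbers are invented.
lhs = rng.random(n_args)
rhs = rng.random(n_args)

# p(z_k | argument word) as an LDA model might assign it (also invented).
topic_post = rng.dirichlet(np.ones(K), size=n_args)

def topic_biased_scores(u, v, post):
    """One similarity score per topic: reweight each argument by its
    affinity to topic k, then take the cosine of the reweighted vectors."""
    scores = []
    for k in range(K):
        uk, vk = u * post[:, k], v * post[:, k]
        denom = np.linalg.norm(uk) * np.linalg.norm(vk)
        scores.append(float(uk @ vk / denom) if denom else 0.0)
    return scores

print([round(s, 3) for s in topic_biased_scores(lhs, rhs, topic_post)])
```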
A Gibbs Sampling Algorithm | LDA (Griffiths and Steyvers, 2004). |
A Gibbs Sampling Algorithm | where C^{¬n} indicates that term n is excluded from the corresponding document or topic; γ = 1/N_d; and Λ_d^{¬n} is the discriminant function value without word n. We can see that the first term is from the LDA model for observed word counts and the second term is from the logistic classifier for the supervising signal. |
Experiments | We compare the generalized logistic supervised LDA using Gibbs sampling (denoted by gSLDA) with various competitors, including the standard sLDA using variational mean-field methods (denoted by vSLDA) (Wang et al., 2009), the MedLDA model using variational mean-field methods (denoted by vMedLDA) (Zhu et al., 2012), and the MedLDA model using collapsed Gibbs sampling algorithms (denoted by gMedLDA) (Jiang et al., 2012). |
Introduction | As widely adopted in supervised latent Dirichlet allocation (sLDA) models (Blei and McAuliffe, 2010; Wang et al., 2009), one way to improve the predictive power of LDA is to define a likelihood model for the widely available document-level response variables, in addition to the likelihood model for document words. |
Introduction | Though powerful, one issue that could limit the use of existing logistic supervised LDA models is that they treat the document-level response variable as one additional word via a normalized likelihood model. |
Introduction | For Bayesian LDA models, we can also explore the conjugacy of the Dirichlet-Multinomial prior-likelihood pairs to collapse out the Dirichlet variables (i.e., topics and mixing proportions) to do collapsed Gibbs sampling, which can have better mixing rates (Griffiths and Steyvers, 2004). |
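Introduction | For reference, a compact sketch of collapsed Gibbs sampling for plain unsupervised LDA (not the supervised variant developed here), with toy word-id documents and illustrative priors: |
```python
import numpy as np

rng = np.random.default_rng(0)
docs = [[0, 1, 2, 1], [2, 3, 3, 1], [0, 0, 2, 4]]  # toy documents as word ids
K, V, alpha, beta = 2, 5, 0.1, 0.01

# Count tables with the Dirichlet variables integrated out:
# doc-topic counts, topic-word counts, topic totals, plus current assignments.
ndk = np.zeros((len(docs), K))
nkw = np.zeros((K, V))
nk = np.zeros(K)
z = [[int(rng.integers(K)) for _ in d] for d in docs]
for d, doc in enumerate(docs):
    for n, w in enumerate(doc):
        k = z[d][n]
        ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

for _ in range(200):  # Gibbs sweeps
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]  # remove the current assignment from the counts
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # Collapsed conditional p(z_dn = k | z_-dn, w).
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = int(rng.choice(K, p=p / p.sum()))
            z[d][n] = k
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

# Posterior mean estimate of the topic-word distributions.
print(np.round((nkw + beta) / (nkw.sum(1, keepdims=True) + V * beta), 2))
```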
Logistic Supervised Topic Models | A logistic supervised topic model consists of two parts: an LDA model (Blei et al., 2003) for describing the words W = {w_d}_{d=1}^D, where w_d = {w_dn}_{n=1}^{N_d} denotes the words within document d, and a logistic classifier for considering the supervising signal y = {y_d}_{d=1}^D. |
Logistic Supervised Topic Models | LDA: LDA is a hierarchical Bayesian model that posits each document as an admixture of K topics, where each topic Φ_k is a multinomial distribution over a V-word vocabulary. |
Logistic Supervised Topic Models | For fully-Bayesian LDA, the topics are random samples drawn from a Dirichlet prior, Φ_k ~ Dir(β). |
Experiments | Since MTR provides a mixture of properties adapted from earlier models, we present performance benchmarks on tag clustering using: (i) LDA; (ii) the Hidden Markov Topic Model HMTM (Gruber et al., 2005); and (iii) w-LDA (Petterson et al., 2010), which uses word features as priors in LDA. |
Experiments | [Figure: tag-clustering performance of the compared models LDA, w-LDA, HMTM, and MTR.] |
Markov Topic Regression - MTR | LDA assumes that the latent topics of documents are sampled independently from one of K topics. |
Related Work and Motivation | Standard topic models, such as Latent Dirichlet Allocation (LDA) (Blei et al., 2003), use a bag-of-words approach, which disregards word order and clusters together words that appear in a similar global context. |
Related Work and Motivation | In LDA, common words tend to dominate all topics, causing related words to end up in different topics. |
Related Work and Motivation | In (Petterson et al., 2010), the vector-based features of words are used as prior information in LDA so that words that are synonyms end up in the same topic. |
Bilingual LDA Model | 2.1 Standard LDA |
Bilingual LDA Model | The LDA model (Blei et al., 2003) represents the latent topics of a document by a Dirichlet distribution over a K-dimensional implicit random variable; it becomes a complete generative model when a Dirichlet prior β is also placed over the topic-word distributions (Griffiths et al., 2004) (shown in Fig. 1). |
Bilingual LDA Model | Figure 1: Standard LDA model |
Building comparable corpora | Based on the bilingual LDA model, building comparable corpora involves several steps. |
Introduction | Based on Bilingual LDA Model |
Introduction | Concretely, the paper: 1) introduces the Bilingual LDA (Latent Dirichlet Allocation) model, which builds comparable corpora and improves the efficiency of matching similar documents; 2) designs a novel TFIDF (Topic Frequency-Inverse Document Frequency) method to enhance the ability to distinguish the topics of different documents; 3) proposes a tailored |
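Introduction | The topic-level TFIDF of item 2 is not spelled out in this excerpt, so the following is only one plausible reading of a Topic Frequency-Inverse Document Frequency weighting, with invented numbers and an arbitrary presence threshold: |
```python
import numpy as np

# Toy doc-topic matrix from (bilingual) LDA: theta[d, k] = p(z_k | d).
theta = np.array([
    [0.70, 0.20, 0.10],
    [0.15, 0.75, 0.10],
    [0.60, 0.10, 0.30],
])
D = theta.shape[0]

# Hypothetical topic-frequency / inverse-document-frequency weighting:
# TF is p(z_k | d); a topic counts as "present" in a document when its
# probability clears a threshold, so topics prominent everywhere are damped.
present = theta > 0.25          # threshold is an arbitrary choice here
idf = np.log(D / (1 + present.sum(axis=0)))
print(np.round(theta * idf, 3))
```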
Introduction | Latent Semantic Indexing (LSI) (Deerwester et al., 1990), probabilistic Latent Semantic Analysis (pLSA) (Hofmann, 2001), and Latent Dirichlet Allocation (LDA) (Blei et al., 2003) are the most prominent approaches that have tried to tackle this problem over the years. |
Introduction | This is one of the reasons for the intensive use of topic models (and especially LDA) in current research in Natural Language Processing (NLP) and related areas. |
Introduction | The approach by Wei and Croft (2006) was the first to leverage LDA topics to improve the estimate of document language models and achieved good empirical results. |
Topic-Driven Relevance Models | We specifically focus on Latent Dirichlet Allocation (LDA), since it is currently one of the most representative topic models. |
Topic-Driven Relevance Models | In LDA, each topic multinomial distribution φ_k is generated by a conjugate Dirichlet prior with parameter β, while each document multinomial distribution θ_d is generated by a conjugate Dirichlet prior with parameter α. |
Topic-Driven Relevance Models | TDRM relies on two important parameters: the number of topics K that we want to learn, and the number of feedback documents N from which LDA learns the topics. |
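Topic-Driven Relevance Models | A minimal sketch of how K and N come together, with toy feedback documents and gensim standing in for the paper's LDA implementation: |
```python
from gensim import corpora, models

# Toy pseudo-relevance-feedback documents standing in for retrieved results.
feedback_docs = [
    ["oil", "price", "barrel", "market"],
    ["oil", "export", "pipeline", "market"],
    ["election", "vote", "party", "poll"],
]
K, N = 2, 3  # the two TDRM parameters: topics to learn, feedback docs to use

top_n = feedback_docs[:N]
dictionary = corpora.Dictionary(top_n)
bows = [dictionary.doc2bow(d) for d in top_n]
lda = models.LdaModel(bows, id2word=dictionary, num_topics=K,
                      passes=20, random_state=0)
for k in range(K):
    print(k, lda.show_topic(k, topn=4))
```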
Experiments | For LDA-θ and LDA-wvec, we run Gibbs-sampling-based LDA for 2000 iterations and average the model over the last 10 iterations. |
Experiments | For LDA we tune the hyperparameters α (Dirichlet prior for the topic distribution of a document) and β (Dirichlet prior for the word distribution given a topic). |
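Experiments | A sketch of such tuning with gensim, assuming a small grid and using the model's per-word log-perplexity bound as a stand-in for whatever development metric is actually optimized (gensim exposes β under the name eta): |
```python
from itertools import product
from gensim import corpora, models

docs = [["oil", "price", "barrel"], ["vote", "party", "poll"],
        ["oil", "market", "export"], ["election", "vote", "poll"]]
dictionary = corpora.Dictionary(docs)
bows = [dictionary.doc2bow(d) for d in docs]

best = None
for alpha, beta in product([0.1, 0.5, 1.0], [0.01, 0.1]):
    lda = models.LdaModel(bows, id2word=dictionary, num_topics=2,
                          alpha=alpha, eta=beta,  # gensim names beta "eta"
                          passes=20, random_state=0)
    score = lda.log_perplexity(bows)  # per-word bound; higher is better
    if best is None or score > best[0]:
        best = (round(score, 3), alpha, beta)
print("best (bound, alpha, beta):", best)
```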
Introduction | WTMF is a state-of-the-art unsupervised model, tested on two short text similarity datasets ((Li et al., 2006) and (Agirre et al., 2012)), that outperforms Latent Semantic Analysis [LSA] (Landauer et al., 1998) and Latent Dirichlet Allocation [LDA] (Blei et al., 2003) by a large margin. |
Introduction | We employ it as a strong baseline in this task because it exploits and effectively models the missing words in a tweet, in practice adding thousands of extra features for the tweet; by contrast, LDA, for example, leverages only the observed words (14 features) to infer the latent vector for a tweet. |
Related Work | (2010) also use hashtags to improve the latent representation of tweets in an LDA framework, Labeled-LDA (Ramage et al., 2009), treating each hashtag as a label. |
Related Work | Similar to the experiments presented in this paper, the result of using Labeled-LDA alone is worse than the IR model, due to the sparseness in the induced LDA latent vector. |
Related Work | (2011) apply an LDA-based model to clustering by incorporating URL-referred documents. |
Comparable Question Mining | Input: a news article. Output: a sorted list of comparable questions.
1: Identify all target named entities (NEs) in the article.
2: Infer the distribution of LDA topics for the article.
3: For each comparable relation R in the database, compute its relevance score as the similarity between the topic distributions of R and the article.
4: Rank all the relations according to their relevance score and pick the top M as relevant.
5: for each relevant relation R, in the order of relevance ranking, do
6:   Filter out all the target NEs that do not pass the single-entity classifier for R.
7:   Generate all possible NE pairs from those that passed the single classifier.
8:   Filter out all the generated NE pairs that do not pass the entity-pair classifier for R.
9:   Pick the top N pairs with positive classification score to be qualified for generation. |
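Comparable Question Mining | Steps 2-4 hinge on comparing topic distributions; a minimal sketch, assuming Jensen-Shannon divergence as the (unspecified) similarity and invented, strictly positive distributions: |
```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions
    (assumes strictly positive entries, as in the toy data below)."""
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Step 2 output for the article, plus one topic distribution per relation.
article_theta = np.array([0.6, 0.3, 0.1])
relation_thetas = {"rivalry": np.array([0.5, 0.4, 0.1]),
                   "merger": np.array([0.1, 0.2, 0.7])}

# Steps 3-4: score every relation by topic similarity, keep the top M.
M = 1
scores = {r: 1.0 - js_divergence(article_theta, t)
          for r, t in relation_thetas.items()}
relevant = sorted(scores, key=scores.get, reverse=True)[:M]
print(scores, "->", relevant)
```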
Evaluation | The reason for this mistake is that many named entities appear as frequent terms in LDA topics, and thus mentioning many names that belong to a single topic drives LDA to assign this topic a high probability. |
Online Question Generation | Specifically, we utilize Latent Dirichlet Allocation (LDA) (Blei et al., 2003) to infer latent topics in texts. |
Online Question Generation | To train an LDA model, we constructed for each comparable relation a pseudo-document consisting of all questions that contain this relation in our corpus (the supporting questions). |
Online Question Generation | An additional product of the LDA training process is a topic distribution for each relation’s pseudo-document, which we consider as the relation’s context profile. |
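Online Question Generation | A minimal sketch of this training setup with gensim; the relation names and supporting questions are invented: |
```python
from collections import defaultdict
from gensim import corpora, models

# Invented supporting questions, each tagged with the relation it contains.
questions = [("rivalry", ["who", "wins", "celtics", "or", "lakers"]),
             ("rivalry", ["celtics", "or", "lakers", "tonight"]),
             ("merger", ["will", "firm", "a", "acquire", "firm", "b"])]

# One pseudo-document per relation: all its supporting questions concatenated.
pseudo_docs = defaultdict(list)
for rel, toks in questions:
    pseudo_docs[rel].extend(toks)

rels = sorted(pseudo_docs)
dictionary = corpora.Dictionary(pseudo_docs[r] for r in rels)
bows = [dictionary.doc2bow(pseudo_docs[r]) for r in rels]
lda = models.LdaModel(bows, id2word=dictionary, num_topics=2,
                      passes=20, random_state=0)

# Each pseudo-document's topic distribution is the relation's context profile.
for r, bow in zip(rels, bows):
    print(r, lda.get_document_topics(bow, minimum_probability=0.0))
```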
Related Work | Instead, we are interested in a higher level topical similarity to the input article, for which LDA topics were shown to help (Celikyilmaz et al., 2010). |
Background | Several techniques can be used for this purpose, such as Latent Semantic Analysis (LSA) (Deerwester et al., 1990), Probabilistic Latent Semantic Analysis (PLSA) (Hofmann, 1999), and Latent Dirichlet Allocation (LDA) (Blei et al., 2003). |
Background | LDA, first defined by Blei et al. (2003), treats a topic as a distribution over a fixed vocabulary, where each document can exhibit the topics in different proportions. |
Background | For each document, LDA generates the words in a two-step process: |
Experiments | LDA was chosen to generate the topic models of clinical reports because it is a generative probabilistic model for documents and is robust to overfitting. |
Experiment | Furthermore, the hyper-parameters for the topic probability distribution and the word probability distribution in LDA are α = 0.5 and β = 0.5, respectively. |
Experiment | Here, when clustering the documents based on the topic probability distributions from LDA, the topic distribution over documents θ changes with every estimation. |
Introduction | 3) Information used for classification — we use latent information estimated by latent Dirichlet allocation (LDA) (Blei et al., 2003) to classify documents, and compare the results of the cases using both surface and latent information. |
Techniques for text classification | After obtaining a collection of refined documents for classification, we adopt LDA to estimate the latent topic probability distributions over the target documents and use them for clustering. |
Techniques for text classification | As for the refined documents obtained in step 2, the latent topics are estimated by means of LDA. |
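Techniques for text classification | A minimal sketch of clustering documents by their estimated topic distributions; k-means and the toy θ matrix are illustrative choices, since the excerpt does not name the clustering algorithm: |
```python
import numpy as np
from sklearn.cluster import KMeans

# Toy doc-topic distributions theta[d, k] = p(z_k | d) estimated by LDA.
theta = np.array([
    [0.85, 0.10, 0.05],
    [0.80, 0.15, 0.05],
    [0.05, 0.90, 0.05],
    [0.10, 0.80, 0.10],
])

# Cluster the documents directly in topic space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(theta)
print(labels)
```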
Attribute-based Semantic Models | (2009) present an extension of LDA (Blei et al., 2003) where words in documents and their associated attributes are treated as observed variables that are explained by a generative process. |
Attribute-based Semantic Models | Inducing these attribute-topic components from the corpus with the extended LDA model gives two sets of parameters: word probabilities given components P_W(w_i | X = x_c) for w_i, i = 1, ..., n, and attribute probabilities given components P_A(a_k | X = x_c) for a_k, k = 1, ..., F. For example, most of the probability mass of a component x would be reserved for the words shirt, coat, dress and the attributes has_1_piece, has_seams, made_of_material, and so on. |
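Attribute-based Semantic Models | A sketch of how these two parameter sets might be combined, with invented probabilities and an assumed component prior P(x_c), which the excerpt does not give: |
```python
import numpy as np

# Invented parameters in the shape the excerpt describes: one row per
# attribute-topic component x_c.
words = ["shirt", "coat", "dress"]
attrs = ["has_1_piece", "has_seams", "made_of_material"]
P_W = np.array([[0.5, 0.3, 0.2],   # P_W(w_i | X = x_c)
                [0.1, 0.2, 0.7]])
P_A = np.array([[0.4, 0.4, 0.2],   # P_A(a_k | X = x_c)
                [0.2, 0.3, 0.5]])
P_X = np.array([0.6, 0.4])         # component prior (assumed, not given)

def attribute_dist(word):
    # P(a | w) by marginalizing over components:
    # P(a | w) is proportional to sum_c P_A(a | x_c) P_W(w | x_c) P(x_c).
    i = words.index(word)
    p = (P_A * (P_W[:, i] * P_X)[:, None]).sum(axis=0)
    return dict(zip(attrs, np.round(p / p.sum(), 3)))

print(attribute_dist("shirt"))
```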
Related Work | Their model is essentially Latent Dirichlet Allocation (LDA, Blei et al., 2003) trained on a corpus of multimodal documents (i.e., BBC news articles and their associated images). |
Results | Unseen are concepts covered by LDA but unknown to the attribute classifiers (N = 388). |
Abstract | We also applied Latent Dirichlet Allocation (LDA; Blei et al., 2003) to learn a distribution over latent topics in the extracted data, as this is a popular exploratory data analysis method. |
Abstract | In LDA a topic is a unigram distribution over words, and each document is modeled as a distribution over topics. |
Abstract | Some of the topics that LDA finds correspond closely with specific domains, such as topic 1 (blingee ...). |
Related Work 2.1 Market Prediction and Social Media | One of the basic and most widely used models is Latent Dirichlet Allocation (LDA) (Blei et al., 2003). |
Related Work 2.1 Market Prediction and Social Media | LDA can learn a predefined number of topics and has been widely applied in its extended forms in sentiment analysis and many other tasks (Mei et al., 2007; Branavan et al., 2008; Lin and He, 2009; Zhao et al., 2010; Wang et al., 2010; Brody and Elhadad, 2010; Jo and Oh, 2011; Moghaddam and Ester, 2011; Sauper et al., 2011; Mukherjee and Liu, 2012; He et al., 2012). |
Related Work 2.1 Market Prediction and Social Media | The Dirichlet Process Mixture (DPM) model is a nonparametric extension of LDA (Teh et al., 2006), which can estimate the number of topics inherent in the data itself. |
RSP: A Random Walk Model for SP | LDA-SP: Another kind of sophisticated unsupervised approach to SP is the family of latent variable models based on Latent Dirichlet Allocation (LDA). |
RSP: A Random Walk Model for SP | Ó Séaghdha (2010) applies topic models to SP induction with three variations: LDA, Rooth-LDA, and Dual-LDA; Ritter et al. |
RSP: A Random Walk Model for SP | In this work, we compare with Ó Séaghdha's original LDA approach to SP. |