Abstract | We present an efficient approach for broadcast news story segmentation using a manifold learning algorithm on latent topic distributions.
Abstract | The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block. |
Abstract | We employ Laplacian Eigenmaps (LE) to project the latent topic distributions into low-dimensional semantic representations while preserving the intrinsic local geometric structure. |
Experimental setup | • PLSA-DP: PLSA topic distributions were used to compute sentence cohesive strength.
Introduction | To further improve the segmentation performance, using latent topic distributions and LE instead of term frequencies to represent text blocks is studied in this paper. |
Our Proposed Approach | In this paper, we propose to apply LE on the LDA topic distributions, each of which is estimated from a text block.
Our Proposed Approach | β is a K × V matrix, which defines the latent topic distributions over terms.
Our Proposed Approach | Given the ASR transcripts of N text blocks, we apply the LDA algorithm to compute the corresponding latent topic distributions X = [x_1, x_2, ..., x_N].
Abstract | A significant novelty of our adaptation technique is its incremental nature; we continuously update the topic distribution on the evolving test conversation as new utterances become available. |
Corpus Data and Baseline SMT | The training, development and test sets were partitioned at the conversation level, so that we could model a topic distribution for entire conversations, both during training and during tuning and testing. |
Incremental Topic-Based Adaptation | The topic distribution is incrementally updated as the conversation history grows, and we recompute the topic similarity between the current conversation and the training conversations for each new source utterance. |
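A minimal sketch of this incremental loop, with a toy word-to-topic lookup standing in for real LDA inference (all names, tables, and numbers here are illustrative assumptions, not from the paper):

```python
import math

def cosine(p, q):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0

class IncrementalTopicTracker:
    """Keeps per-topic token counts for the growing conversation history."""
    def __init__(self, num_topics, word2topic):
        self.counts = [0] * num_topics
        self.word2topic = word2topic  # toy stand-in for LDA topic inference

    def add_utterance(self, words):
        # Incremental update: only the new utterance's counts are added.
        for w in words:
            k = self.word2topic.get(w)
            if k is not None:
                self.counts[k] += 1

    def distribution(self):
        total = sum(self.counts) or 1
        return [c / total for c in self.counts]

# hypothetical word-to-topic table and training-conversation distribution
w2t = {"flight": 0, "hotel": 0, "goal": 1, "match": 1}
tracker = IncrementalTopicTracker(2, w2t)
tracker.add_utterance(["book", "a", "flight"])
tracker.add_utterance(["and", "a", "hotel"])   # distribution updated as history grows
train_conv = [0.9, 0.1]
sim = cosine(tracker.distribution(), train_conv)
```

After each new utterance, `sim` would be recomputed against every training conversation to re-rank them.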
Incremental Topic-Based Adaptation | We use latent Dirichlet allocation, or LDA (Blei et al., 2003), to obtain a topic distribution over conversations.
Incremental Topic-Based Adaptation | For each conversation d_i in the training collection (1,600 conversations), LDA infers a topic distribution θ_{d_i} = p(z_k|d_i) for all latent topics z_k, k ∈ {1, ..., K}, where K is the number of topics.
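The LDA inference itself is out of scope here, but the quantity θ_{d_i} = p(z_k|d_i) can be sketched as the smoothed fraction of tokens in d_i assigned to each topic, as in standard collapsed Gibbs estimates (α and the counts below are illustrative assumptions, not values from the paper):

```python
# Minimal sketch: recover a document's topic distribution theta_d = p(z_k | d)
# from per-topic token counts, with symmetric Dirichlet smoothing alpha.
def topic_distribution(topic_counts, alpha=0.1):
    """topic_counts[k] = number of tokens in d assigned to topic k."""
    K = len(topic_counts)
    total = sum(topic_counts)
    return [(n + alpha) / (total + K * alpha) for n in topic_counts]

# a toy 10-token document over K = 3 topics
theta = topic_distribution([8, 0, 2])
```

The smoothing keeps every topic's probability strictly positive, which matters when these vectors are later compared with KL-style divergences.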
Introduction | At runtime, this model is used to infer a topic distribution over the evolving test conversation up to and including the current utterance. |
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
Introduction | The topic distribution for the test conversation is updated incrementally for each new utterance as the available history grows. |
Relation to Prior Work | (2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document. |
Relation to Prior Work | Our proposed approach infers topic distributions incrementally as the conversation progresses. |
Relation to Prior Work | Second, we do not directly augment the translation table with the inferred topic distribution.
Introduction | To capture this intuition, one solution is to assume that posts published within the same short time window follow the same topic distribution.
Introduction | Our model is based on the following two assumptions: (1) If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent. |
Introduction | (2) If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time. |
Method | In standard LDA, a document contains a mixture of topics, represented by a topic distribution, and each word has a hidden topic label.
Method | As we will see very soon, this treatment also allows us to model topic distributions at time window level and user level. |
Method | To model this observation, we assume that there is a global topic distribution θ^t for each time point t. Presumably θ^t has a high probability for a topic that is popular in the microblog-sphere at time t.
Related Work | (2004) expand topic distributions from document-level to user-level in order to capture users’ specific interests. |
Abstract | We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features. |
Introduction | In our case, by building a topic distribution for the source side of the training data, we abstract the notion of domain to include automatically derived subcorpora with probabilistic membership. |
Introduction | This topic model infers the topic distribution of a test set and biases sentence translations to appropriate topics. |
Model Description | Given a parallel training corpus T composed of documents d_i, we build a source-side topic model over T, which provides a topic distribution p(z_n|d_i) for z_n ∈ {1, ..., K}.
Model Description | Then, we assign p(z_n|d_i) to be the topic distribution for every sentence x_j ∈ d_i, thus enforcing topic sharing across sentence pairs in the same document instead of treating them as unrelated.
Model Description | Computing the topic distribution over a document and assigning it to the sentences serves to tie the sentences together in the document context. |
Abstract | We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. |
Background: Topic Model | The first one relates to the document-topic distribution, which records the topic distribution of each document. |
Introduction | Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution.
Introduction | • Given a document to be translated, we calculate the topic similarity between a rule and the document based on their topic distributions.
Introduction | • We estimate the topic distribution for a rule based on both the source and target side topic models (Section 4.1).
Topic Similarity Model | Therefore, we use the topic distribution to describe the relatedness of rules to topics.
Topic Similarity Model | Figure 1 shows four synchronous rules (Chiang, 2007) with topic distributions, some of which contain nonterminals.
Topic Similarity Model | We can see that, although the source parts of rules (b) and (c) are identical, their topic distributions are quite different.
Background and Model Setting | Next, an LDA model is learned from the set of all pseudo-documents, extracted for all predicates. The learning process results in the construction of K latent topics, where each topic t specifies a distribution over all words, denoted by p(w|t), and a topic distribution for each pseudo-document d, denoted by p(t|d).
Background and Model Setting | Within the LDA model we can derive the a-posteriori topic distribution conditioned on a particular word within a document, denoted by p(t|d,w) ∝ p(w|t) · p(t|d).
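This a-posteriori combination is a one-line normalization over topics; a minimal sketch with toy probabilities (the numbers are illustrative assumptions):

```python
# p(t | d, w) ∝ p(w | t) * p(t | d), normalized over the K topics.
def posterior_topic_given_word(p_w_given_t, p_t_given_d):
    unnorm = [pw * pt for pw, pt in zip(p_w_given_t, p_t_given_d)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# toy two-topic example: the word is much more likely under topic 1,
# so the posterior shifts toward topic 1 despite the document prior.
post = posterior_topic_given_word([0.02, 0.10], [0.7, 0.3])
```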
Background and Model Setting | Dinu and Lapata (2010b) presented a slightly different similarity measure for topic distributions that performed better in their setting as well as in a related later paper on context-sensitive scoring of lexical similarity (Dinu and Lapata, 2010a). |
Discussion and Future Work | Then, given a specific candidate rule application, the LDA model is used to infer the topic distribution relevant to the context specified by the given arguments. |
Discussion and Future Work | Finally, the context-sensitive rule application score is computed as a weighted average of the per-topic word-level similarity scores, which are weighted according to the inferred topic distribution.
Discussion and Future Work | Finally, they train a classifier to translate a given target word based on these tables and the inferred topic distribution of the given document in which the target word appears. |
Introduction | Then, similarity is measured between the two topic distribution vectors corresponding to the two sides of the rule in the given context, yielding a context-sensitive score for each particular rule application. |
Introduction | Then, when applying a rule in a given context, these different scores are weighted together based on the specific topic distribution under the given context.
Latent Structure in Dialogues | where α, β, γ are symmetric Dirichlet priors on the state-wise topic distributions θ_s, the topic-wise word distributions φ_t, and the state transition multinomials, respectively.
Latent Structure in Dialogues | Therefore, the topic distribution is often stable throughout the entire dialogue, and does not vary from turn to turn. |
Latent Structure in Dialogues | To express this, words in the TM-HMMS model are generated either from a dialogue-specific topic distribution, or from a state-specific language model. A distribution over sources is sampled once at the beginning of each dialogue and selects the expected fraction of words generated from different sources.
Background and overview of models | From an acquisition perspective, the observed topic distribution represents the child’s knowledge of the context of the interaction: she can distinguish bathtime from dinnertime, and is able to recognize that some topics appear in certain contexts (e.g., animals on walks, vegetables at dinnertime) and not in others (few vegetables appear at bathtime).
Background and overview of models | Conversely, potential minimal pairs that occur in situations with similar topic distributions are more likely to belong to the same topic and thus the same lexeme. |
Background and overview of models | Although we assume that children infer topic distributions from the nonlinguistic environment, we will use transcripts from CHILDES to create the word/phone learning input for our model. |
Introduction | However, in our simulations we approximate the environmental information by running a topic model (Blei et al., 2003) over a corpus of child-directed speech to infer a topic distribution for each situation. |
Introduction | These topic distributions are then used as input to our model to represent situational contexts. |
Topic-Lexical-Distributional Model | There are a fixed number of lower level topic-lexicons; these are matched to the number of topics in the LDA model used to infer the topic distributions (see Section 6.4). |
Conclusion | The basic idea of these models is to compare the topic distribution of a target instance with the candidate sense paraphrases and choose the most probable one. |
Experiments | We then compare the topic distributions of literal and nonliteral senses. |
Experiments | As the topic distributions of nouns and verbs exhibit different properties, topic comparisons across parts-of-speech do not make sense.
Experiments | We make the topic distributions comparable by making sure each type of paraphrase contains the same sets of parts-of-speech.
Introduction | In this paper, we propose a novel framework which is fairly resource-poor in that it requires only 1) a large unlabelled corpus from which to estimate the topic distributions, and 2) paraphrases for the possible target senses.
Related Work | In addition to generating a topic from the document’s topic distribution and sampling a word from that topic, the enhanced model also generates a distributional neighbour for the chosen word and then assigns a sense based on the word, its neighbour and the topic. |
The Sense Disambiguation Model | A similar topic distribution to that of the individual words ‘norm’ or ‘trouble’ would be strong supporting evidence of the corresponding idiomatic reading.
Experiments | We use the inferred topic distribution θ as a latent vector to represent the tweet/news.
Experiments | The problem with LDA-θ is that the inferred topic distribution latent vector is very sparse, with only a few nonzero values, resulting in many tweet/news pairs receiving a high similarity value as long as they are in the same topic domain.
Experiments | Hence, following (Guo and Diab, 2012b), we first compute the latent vector of a word (the topic distribution per word), then average the word latent vectors weighted by TF-IDF values to represent the short text, which yields much better results.
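A minimal sketch of this TF-IDF-weighted averaging, with hypothetical word topic vectors and weights (not the authors' actual data):

```python
# Represent a short text as the TF-IDF-weighted average of its words'
# per-word topic vectors p(z | w). Out-of-vocabulary words are skipped.
def weighted_average(topic_vectors, tfidf):
    K = len(next(iter(topic_vectors.values())))
    acc, total = [0.0] * K, 0.0
    for w, weight in tfidf.items():
        vec = topic_vectors.get(w)
        if vec is None:
            continue
        total += weight
        for k in range(K):
            acc[k] += weight * vec[k]
    return [a / total for a in acc] if total else acc

# hypothetical per-word topic vectors and TF-IDF weights
vecs = {"earthquake": [0.9, 0.1], "relief": [0.6, 0.4]}
tw = {"earthquake": 2.0, "relief": 1.0}
text_vec = weighted_average(vecs, tw)
```

Because every word contributes a dense vector, the averaged representation avoids the sparsity problem noted for LDA-θ above.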
Related Work | The semantics of long documents are transferred to the topic distribution of tweets. |
Background and Motivation | An alternative yet feasible solution, presented in this work, is building a model that can summarize new document clusters using characteristics of topic distributions of training documents. |
Introduction | Such models can yield comparable or better performance on DUC and other evaluations, since representing documents as topic distributions rather than bags of words diminishes the effect of lexical variability. |
Introduction | Our focus is on identifying similarities of candidate sentences to summary sentences using a novel tree-based sentence scoring algorithm, considering topic distributions at different levels of the discovered hierarchy, as described in § 3 and § 4.
Summary-Focused Hierarchical Model | We discover hidden topic distributions of sentences in a given document cluster, along with provided summary sentences, based on hLDA as described in (Blei et al., 2003a).
Summary-Focused Hierarchical Model | We build a summary-focused hierarchical probabilistic topic model, sumHLDA, for each document cluster at sentence level, because it enables capturing expected topic distributions in given sentences directly from the model. |
Summary-Focused Hierarchical Model | Each node is associated with a topic distribution over words. |
Abstract | Representing reports according to their topic distributions is more compact than bag-of-words representation and can be processed faster than raw text in subsequent automated processes. |
Experiments | Topic modeling of reports produces a topic distribution for each report which can be used to represent them as topic vectors. |
Experiments | With this approach, a representative topic vector for each class was composed by averaging their corresponding topic distributions in the training dataset. |
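The per-class averaging just described can be sketched as follows (the topic vectors are toy values, not the report data):

```python
# Build a representative topic vector for a class by averaging the topic
# distributions of its training reports, component-wise.
def class_centroid(topic_vectors):
    K = len(topic_vectors[0])
    n = len(topic_vectors)
    return [sum(v[k] for v in topic_vectors) / n for k in range(K)]

# toy: two training reports of one class, each over K = 2 topics
centroid = class_centroid([[0.8, 0.2], [0.6, 0.4]])
```

A test report would then be assigned to the class whose centroid is most similar to its own topic vector.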
Introduction | Topic modeling is an unsupervised technique that can automatically identify themes from a given set of documents and find topic distributions of each document. |
Introduction | Representing reports according to their topic distributions is more compact and can be processed faster than raw text in subsequent automated processing. |
Polylingual Tree-based Topic Models | Polylingual topic models (Mimno et al., 2009) assume that the aligned documents in different languages share the same topic distribution and each language has a unique topic distribution over its word types. |
Topic Models for Machine Translation | If each topic defines an SMT domain, the document’s topic distribution is a soft domain assignment for that document.
Topic Models for Machine Translation | The topics come from source documents only and create topic-specific lexical weights from the per-document topic distribution p(k | d).
Topic Models for Machine Translation | For a test document d, the document topic distribution p(k | d) is inferred based on the topics learned from training data.
Getting Humans in the Loop | In general, the more words in the constraint, the more likely it was to noticeably affect the topic distribution.
Simulation Experiment | Specifically, for a corpus with M classes, we use the per-document topic distribution as a feature vector in a supervised classifier.
Simulation Experiment | Next, we perform one of the strategies for state ablation, add additional iterations of Gibbs sampling, use the newly obtained topic distribution of each document as the feature vector, and perform classification on the test/train split.
Simulation Experiment | Doc ablation gives more freedom for the constraints to overcome the inertia of the old topic distribution and move towards a new one influenced by the constraints. |
Experiments | The PLDA toolkit (Liu et al., 2011) is used to infer topic distributions, which takes 34.5 hours to finish.
Experiments | In contrast, with our neural network based approach, the learned topic distributions of “deliver X” or “distribute X” are more similar to the input sentence than that of “send X”, as shown in Figure 4.
Related Work | (2007) used bilingual LSA to learn latent topic distributions across different languages and enforce one-to-one topic correspondence during model training. |
Topic Similarity Model with Neural Network | Since different sentences may have very similar topic distributions, we select negative instances that are dissimilar to the positive instances based on the following criteria:
Abstract | We present Code-Switched LDA (csLDA), which infers language specific topic distributions based on code-switched documents to facilitate multilingual corpus analysis. |
Code-Switching | Our solution is to automatically identify aligned polylingual topics after learning by examining a topic’s distribution across code-switched documents. |
Code-Switching | • Draw a topic distribution θ_d ~ Dir(α); • draw a language distribution ψ_d ~ Dir(γ); • for each token i ∈ d: draw a topic z_i ~ θ_d, draw a language l_i ~ ψ_d, and draw a word w_i ~ φ^{l_i}_{z_i}. For monolingual documents, we fix l_i to the LID tag for all tokens.
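A hedged sketch of this generative story in code, with a toy two-language, one-topic vocabulary standing in for the learned φ distributions, and fixed toy values in place of the Dirichlet draws (all names and numbers are illustrative):

```python
import random

# Per document: theta (topic distribution) and psi (language distribution)
# are given; per token: draw topic z ~ theta, language l ~ psi, then a word
# from the language-specific topic-word distribution phi[l][z].
def generate_document(theta, psi, phi, length, rng):
    doc = []
    for _ in range(length):
        z = rng.choices(range(len(theta)), weights=theta)[0]
        l = rng.choices(range(len(psi)), weights=psi)[0]
        vocab, weights = zip(*phi[l][z].items())
        doc.append(rng.choices(vocab, weights=weights)[0])
    return doc

rng = random.Random(0)
# toy phi: one topic, two languages (0 = English, 1 = Spanish)
phi = {0: [{"game": 0.9, "vote": 0.1}], 1: [{"juego": 0.9, "voto": 0.1}]}
doc = generate_document([1.0], [0.5, 0.5], phi, 5, rng)
```

For a monolingual document, the language draw would simply be replaced by the fixed LID tag, as the snippet states.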
Code-Switching | Since csLDA duplicates topic distributions (Z × L), we used twice as many topics for LDA.
Model Description | For this reason, we also construct a document-specific topic distribution φ.
Model Description | The auxiliary variable c indicates whether a given word’s topic is drawn from the set of keyphrase clusters, or from this topic distribution.
Posterior Sampling | The third term is the dependence of the word topics z_{d,n} on the topic distribution η_d.
Posterior Sampling | The word topics z are sampled according to the keyphrase topic distribution η_d, the document topic distribution φ_d, the words w, and the auxiliary variables c:
Abstract | We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. |
Introduction | It is rather necessary to determine reliable negative instances with a topic distribution similar to that of the positive instances in order to factor out the sampling bias.
Selection of Reliable Training Instances | In order to factor out the article topics as a major characteristic for distinguishing flawed articles from the set of outliers, reliable negative instances A_rel have to be sampled from the restricted topic set A_topic that contains articles with a topic distribution similar to the flawed articles in A_f (see Fig.
Tweet Recommendation Framework | Given n topics, we obtain a topic distribution matrix D using Latent Dirichlet Allocation (Blei et al., 2003). |
Tweet Recommendation Framework | Consider a user with a topic preference vector t and topic distribution matrix D. We calculate the response probability r for all tweets for this user as:
Tweet Recommendation Framework | Given the response probabilities [r_1, ..., r_w]_{1×w}, where w < |V_M|, for a given user and the topic distribution matrix D, our task is to estimate the topic preference vector t. We do this using maximum-likelihood estimation.
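The actual equation relating r, D, and t is cut off in this extract; as a hedged sketch, one consistent reading is that each tweet's response probability is the inner product of its row of D with the preference vector t:

```python
# Assumed reading (not the paper's verified equation): r_i = sum_k D[i][k] * t[k],
# i.e., tweets whose topic distribution matches the user's preferences score higher.
def response_probabilities(D, t):
    return [sum(d_k * t_k for d_k, t_k in zip(row, t)) for row in D]

D = [[0.9, 0.1], [0.2, 0.8]]   # toy topic distributions for two tweets
t = [0.7, 0.3]                 # toy topic preference vector
r = response_probabilities(D, t)
```

Under this reading, estimating t from observed responses amounts to fitting the preference weights that best explain which tweets the user engaged with.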
Experiments | • Base-MCM: Our first version injects an informative prior for the domain, dialog act, and slot topic distributions, using information extracted only from labeled training utterances, injected as prior constraints (corpus n-gram base measures) during topic assignments.
MultiLayer Context Model - MCM | In the topic model literature, such constraints are sometimes used to deterministically allocate topic assignments to known labels (Labeled Topic Modeling (Ramage et al., 2009)) or in terms of pre-learnt topics encoded as prior knowledge on topic distributions in documents (Reisinger and Pasca, 2009). |
MultiLayer Context Model - MCM | • Web n-Gram Context Base Measure: As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions.
Abstract | By simultaneously inferring latent topics and topic distributions over relations, LDA-SP combines the benefits of previous approaches: like traditional class-based approaches, it produces human-interpretable classes describing each relation’s preferences, but it is competitive with non-class-based methods in predictive power. |
Topic Models for Selectional Prefs. | ing related topic pairs between arguments, we employ a sparse prior over the per-relation topic distributions.
Topic Models for Selectional Prefs. | Finally we note that, once a topic distribution has been learned over a set of training relations, one can efficiently apply inference to unseen relations (Yao et al., 2009). |
Building comparable corpora | given bilingual corpora, predict the topic distribution θ_{m,k} of the new documents, calculate the
Building comparable corpora | P(Z) as prior topic distribution is assumed a |
Introduction | (2009) adapted a monolingual topic model to a bilingual topic model in which the documents of a concept unit in different languages were assumed to share an identical topic distribution.
Extractive Caption Generation | The similarity between an image and a sentence can be broadly measured by the extent to which they share the same topic distributions (Steyvers and Griffiths, 2007).
Extractive Caption Generation | D(p, q) = Σ_{j=1..K} p_j log2(p_j / q_j)  (4), where p and q are shorthand for the image topic distribution P_{dMix} and the sentence topic distribution P_{Sd}, respectively.
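A sketch of this comparison, assuming Eq. (4) is the standard KL divergence between the image topic distribution p and the sentence topic distribution q (the distributions below are toy values):

```python
import math

# KL divergence D(p, q) = sum_j p_j * log2(p_j / q_j).
# Zero-probability components of p contribute nothing (0 * log 0 := 0);
# q is assumed strictly positive, e.g., via Dirichlet smoothing.
def kl_divergence(p, q):
    return sum(pj * math.log2(pj / qj) for pj, qj in zip(p, q) if pj > 0)

d = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Since KL is asymmetric, which distribution plays the role of p versus q matters; symmetrized variants (as in the snippet further below on opinion targets) average the two directions.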
Image Annotation | The image annotation model takes the topic distributions into account when finding the most likely keywords for an image and its associated document. |
The Proposed Method | Thus, we employ an LDA variation (Mukherjee and Liu, 2012), an extension of (Zhao et al., 2010), to discover the topic distribution over words, which samples all words into two separate observations: opinion targets and opinion words.
The Proposed Method | This is because we are only interested in the topic distribution of opinion targets/words, regardless of other useless words, such as conjunctions and prepositions.
The Proposed Method | p(z|v_t) and p(z|v_o), the topic distributions. Then, a symmetric Kullback–Leibler divergence, as in Eq. 5, is used to calculate the semantic associations between any two homogeneous candidates.
Anchor Words: Scalable Topic Models | The k-th column of A will be the topic distribution over all words for topic k, and A_{w,k} is the probability of observing type w given topic k.
Anchor Words: Scalable Topic Models | We use Bayes' rule to recover the topic distribution p(z = k | w).
Regularization Improves Topic Models | Held-out likelihood cannot be computed with existing anchor algorithms, so we use the topic distributions learned from anchor as input to a reference variational inference implementation (Blei et al., 2003) to compute HL. |
Baselines | In particular, a word-based graph model is proposed to represent the explicit relatedness among words in a discourse from the lexical perspective, while a topic-driven word-based model is proposed to enrich the implicit relatedness between words, by adding one more layer to the word-based graph model in representing the global topic distribution of the whole dataset. |
Baselines | In order to reduce the gap, we propose a topic-driven word-based model by adding one more layer to refine the word-based graph model over the global topic distribution, as shown in Figure 2.
Baselines | In the topic-driven word-based graph model, the first layer denotes the relatedness among content words as captured in the above word-based graph model, and the second layer denotes the topic distribution, with the dashed lines between these two layers indicating the word-topic model returned by LDA.