Index of papers in Proc. ACL 2013 that mention
  • topic distribution
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Abstract
A significant novelty of our adaptation technique is its incremental nature; we continuously update the topic distribution on the evolving test conversation as new utterances become available.
Corpus Data and Baseline SMT
The training, development and test sets were partitioned at the conversation level, so that we could model a topic distribution for entire conversations, both during training and during tuning and testing.
Incremental Topic-Based Adaptation
The topic distribution is incrementally updated as the conversation history grows, and we recompute the topic similarity between the current conversation and the training conversations for each new source utterance.
Incremental Topic-Based Adaptation
We use latent Dirichlet allocation, or LDA (Blei et al., 2003), to obtain a topic distribution over conversations.
Incremental Topic-Based Adaptation
For each conversation di in the training collection (1,600 conversations), LDA infers a topic distribution θdi = p(zk|di) for all latent topics zk ∈ {1, ..., K}, where K is the number of topics.
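A minimal sketch of this training-side step, assuming gensim's LdaModel and a toy tokenized corpus standing in for the 1,600 training conversations (neither is from the paper):

```python
# Sketch only: infer a per-conversation topic distribution theta_d = p(z_k | d)
# with gensim's LDA. The toy corpus and K below are placeholders, not the paper's data.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

K = 20  # assumed number of latent topics

train_conversations = [
    ["flight", "booking", "ticket", "price"],
    ["hotel", "room", "reservation", "night"],
    # ... one token list per training conversation
]

dictionary = Dictionary(train_conversations)
bows = [dictionary.doc2bow(conv) for conv in train_conversations]
lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=K,
               passes=10, random_state=0)

def topic_distribution(tokens):
    """Return theta_d as a dense length-K list of p(z_k | d)."""
    bow = dictionary.doc2bow(tokens)
    dense = [0.0] * K
    for k, p in lda.get_document_topics(bow, minimum_probability=0.0):
        dense[k] = p
    return dense

train_thetas = [topic_distribution(conv) for conv in train_conversations]
```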
Introduction
At runtime, this model is used to infer a topic distribution over the evolving test conversation up to and including the current utterance.
Introduction
Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
Introduction
The topic distribution for the test conversation is updated incrementally for each new utterance as the available history grows.
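A minimal sketch of the incremental loop described above, reusing the `topic_distribution` and `train_thetas` names from the previous sketch; the cosine similarity measure is an assumption, not necessarily the measure used in the paper:

```python
# Sketch only: as each new source utterance arrives, re-infer the topic
# distribution over the whole conversation history and compare it against
# every training conversation. These scores would feed the single similarity
# feature added to the SMT log-linear model.
import numpy as np

def cosine(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def adapt_incrementally(utterances, infer_theta, train_thetas):
    history = []
    for utt in utterances:                 # utterances arrive one at a time
        history.extend(utt)                # grow the conversation history
        theta_test = infer_theta(history)  # re-infer p(z_k | conversation so far)
        sims = [cosine(theta_test, theta_tr) for theta_tr in train_thetas]
        yield theta_test, sims
```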
Relation to Prior Work
(2012), who both describe adaptation techniques where monolingual LDA topic models are used to obtain a topic distribution over the training data, followed by dynamic adaptation of the phrase table based on the inferred topic of the test document.
Relation to Prior Work
Our proposed approach infers topic distributions incrementally as the conversation progresses.
Relation to Prior Work
Second, we do not directly augment the translation table with the inferred topic distribution.
topic distribution is mentioned in 19 sentences in this paper.
Lu, Xiaoming and Xie, Lei and Leung, Cheung-Chi and Ma, Bin and Li, Haizhou
Abstract
We present an efficient approach for broadcast news story segmentation using a manifold learning algorithm on latent topic distributions.
Abstract
The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block.
Abstract
We employ Laplacian Eigenmaps (LE) to project the latent topic distributions into low-dimensional semantic representations while preserving the intrinsic local geometric structure.
Experimental setup
PLSA-DP: PLSA topic distributions were used to compute sentence cohesive strength.
Introduction
To further improve the segmentation performance, using latent topic distributions and LE instead of term frequencies to represent text blocks is studied in this paper.
Our Proposed Approach
In this paper, we propose to apply LE on the LDA topic distributions, each of which is estimated from a text block.
Our Proposed Approach
β is a K × V matrix, which defines the latent topic distributions over terms.
Our Proposed Approach
Given the ASR transcripts of N text blocks, we apply the LDA algorithm to compute the corresponding latent topic distributions X = [x1, x2, ..., xN].
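A minimal sketch of this projection step, assuming scikit-learn's SpectralEmbedding as an implementation of Laplacian Eigenmaps and a random matrix standing in for the N × K LDA topic distributions:

```python
# Sketch only: project latent topic distributions X (N text blocks x K topics)
# into a low-dimensional space with Laplacian Eigenmaps via scikit-learn's
# SpectralEmbedding. The random X is a placeholder for the actual LDA output.
import numpy as np
from sklearn.manifold import SpectralEmbedding

N, K, d = 200, 64, 10                    # assumed: 200 text blocks, 64 topics, 10 output dims
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(K), size=N)    # each row is a topic distribution x_i

le = SpectralEmbedding(n_components=d, n_neighbors=12, affinity="nearest_neighbors")
Y = le.fit_transform(X)                  # N x d low-dimensional semantic representations
```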
topic distribution is mentioned in 16 sentences in this paper.
Melamud, Oren and Berant, Jonathan and Dagan, Ido and Goldberger, Jacob and Szpektor, Idan
Background and Model Setting
Next, an LDA model is learned from the set of all pseudo-documents, extracted for all predicates. The learning process results in the construction of K latent topics, where each topic t specifies a distribution over all words, denoted by p(w|t), and a topic distribution for each pseudo-document d, denoted by p(t|d).
Background and Model Setting
Within the LDA model we can derive the a-posteriori topic distribution conditioned on a particular word within a document, denoted by p(t|d,w) ∝ p(w|t) · p(t|d).
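A worked sketch of this a-posteriori computation with NumPy; the two input vectors are hypothetical placeholders for quantities an LDA model would provide:

```python
# Sketch only: p(t | d, w) is proportional to p(w | t) * p(t | d), normalized over topics.
import numpy as np

def topic_posterior_given_word(p_w_given_t, p_t_given_d):
    """p_w_given_t: length-K vector of p(w|t); p_t_given_d: length-K vector of p(t|d)."""
    unnorm = np.asarray(p_w_given_t) * np.asarray(p_t_given_d)
    return unnorm / unnorm.sum()

# toy example with K = 3 topics
print(topic_posterior_given_word([0.01, 0.20, 0.05], [0.5, 0.3, 0.2]))
```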
Background and Model Setting
Dinu and Lapata (2010b) presented a slightly different similarity measure for topic distributions that performed better in their setting as well as in a related later paper on context-sensitive scoring of lexical similarity (Dinu and Lapata, 2010a).
Discussion and Future Work
Then, given a specific candidate rule application, the LDA model is used to infer the topic distribution relevant to the context specified by the given arguments.
Discussion and Future Work
Finally, the context-sensitive rule application score is computed as a weighted average of the per-topic word-level similarity scores, which are weighed according to the inferred topic distribution.
Discussion and Future Work
Finally, they train a classifier to translate a given target word based on these tables and the inferred topic distribution of the given document in which the target word appears.
Introduction
Then, similarity is measured between the two topic distribution vectors corresponding to the two sides of the rule in the given context, yielding a context-sensitive score for each particular rule application.
Introduction
Then, when applying a rule in a given context, these different scores are weighed together based on the specific topic distribution under the given context.
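A minimal sketch of this weighting step; the per-topic scores and topic distribution below are illustrative values, not from the paper:

```python
# Sketch only: per-topic word-level similarity scores are combined with weights
# taken from the topic distribution inferred for the given context.
import numpy as np

def context_sensitive_score(per_topic_scores, topic_dist):
    """per_topic_scores[k]: similarity under topic k; topic_dist[k]: inferred p(t_k | context)."""
    return float(np.dot(per_topic_scores, topic_dist))

print(context_sensitive_score([0.8, 0.1, 0.4], [0.6, 0.1, 0.3]))  # -> 0.61
```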
topic distribution is mentioned in 10 sentences in this paper.
Guo, Weiwei and Li, Hao and Ji, Heng and Diab, Mona
Experiments
We use the inferred topic distribution θ as a latent vector to represent the tweet/news.
Experiments
The problem with LDA-θ is that the inferred topic distribution latent vector is very sparse, with only a few nonzero values, resulting in many tweet/news pairs receiving a high similarity value as long as they are in the same topic domain.
Experiments
Hence, following (Guo and Diab, 2012b), we first compute the latent vector of a word (a topic distribution per word), then average the word latent vectors weighted by TF-IDF values to represent the short text, which yields much better results.
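A minimal sketch of this TF-IDF-weighted averaging; `word_topic` and `idf` are hypothetical lookup tables, not the paper's actual model:

```python
# Sketch only: map each word to its per-word topic distribution and average the
# word vectors with TF-IDF weights to represent a short text.
import numpy as np
from collections import Counter

def short_text_vector(tokens, word_topic, idf, K):
    """tokens: word list; word_topic[w]: length-K topic distribution for w; idf[w]: IDF weight."""
    tf = Counter(tokens)
    vec, total = np.zeros(K), 0.0
    for w, count in tf.items():
        if w in word_topic:
            weight = count * idf.get(w, 1.0)   # TF-IDF weight
            vec += weight * np.asarray(word_topic[w])
            total += weight
    return vec / total if total > 0 else vec
```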
Related Work
The semantics of long documents are transferred to the topic distribution of tweets.
topic distribution is mentioned in 7 sentences in this paper.
Sarioglu, Efsun and Yadav, Kabir and Choi, Hyeong-Ah
Abstract
Representing reports according to their topic distributions is more compact than a bag-of-words representation and can be processed faster than raw text in subsequent automated processes.
Experiments
Topic modeling of reports produces a topic distribution for each report which can be used to represent them as topic vectors.
Experiments
With this approach, a representative topic vector for each class was composed by averaging their corresponding topic distributions in the training dataset.
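A minimal sketch of the centroid construction and a nearest-centroid assignment; the cosine similarity decision rule is an assumption, not necessarily the classifier used in the paper:

```python
# Sketch only: average the topic vectors of each class in the training set,
# then assign a new report to the class whose centroid is most similar.
import numpy as np

def class_centroids(topic_vectors, labels):
    """topic_vectors: (N, K) array of per-report topic distributions; labels: length-N list."""
    topic_vectors = np.asarray(topic_vectors)
    return {c: topic_vectors[[i for i, y in enumerate(labels) if y == c]].mean(axis=0)
            for c in set(labels)}

def classify(report_vector, centroids):
    report_vector = np.asarray(report_vector)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(centroids, key=lambda c: cos(report_vector, centroids[c]))
```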
Introduction
Topic modeling is an unsupervised technique that can automatically identify themes from a given set of documents and find topic distributions of each document.
Introduction
Representing reports according to their topic distributions is more compact and can be processed faster than raw text in subsequent automated processing.
topic distribution is mentioned in 5 sentences in this paper.
Ferschke, Oliver and Gurevych, Iryna and Rittberger, Marc
Abstract
We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles.
Introduction
It is rather necessary to determine reliable negative instances with a topic distribution similar to that of the set of positive instances in order to factor out the sampling bias.
Selection of Reliable Training Instances
In order to factor out the article topics as a major characteristic for distinguishing flawed articles from the set of outliers, reliable negative instances have to be sampled from the restricted topic set A_topic, which contains articles with a topic distribution similar to the flawed articles in A_f.
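A minimal sketch of this filtering idea, using Jensen-Shannon distance to the mean topic distribution of the flawed articles; the distance measure and threshold are assumptions, not the paper's exact procedure:

```python
# Sketch only: keep candidate negative instances whose topic distribution is
# close to the positive (flawed) articles, measured against their mean topic distribution.
import numpy as np
from scipy.spatial.distance import jensenshannon

def reliable_negatives(candidate_thetas, flawed_thetas, threshold=0.2):
    """candidate_thetas, flawed_thetas: lists of length-K topic distributions."""
    reference = np.mean(np.asarray(flawed_thetas), axis=0)
    return [theta for theta in candidate_thetas
            if jensenshannon(theta, reference) <= threshold]
```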
topic distribution is mentioned in 3 sentences in this paper.
Zhu, Zede and Li, Miao and Chen, Lei and Yang, Zhenxin
Building comparable corpora
given bilingual corpora, predict the topic distribution θm,k of the new documents, calculate the
Building comparable corpora
P(Z) as prior topic distribution is assumed a
Introduction
(2009) adapted a monolingual topic model to a bilingual topic model in which the documents of a concept unit in different languages were assumed to share an identical topic distribution.
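A minimal sketch of how a shared topic space could be used to score cross-lingual document pairs; the cosine measure is an assumption, not necessarily the paper's method:

```python
# Sketch only: score a candidate cross-lingual document pair by the similarity of
# their topic distributions in the shared bilingual topic space.
import numpy as np

def comparability(theta_src, theta_tgt):
    theta_src, theta_tgt = np.asarray(theta_src), np.asarray(theta_tgt)
    return float(theta_src @ theta_tgt /
                 (np.linalg.norm(theta_src) * np.linalg.norm(theta_tgt) + 1e-12))

# documents with near-identical topic distributions score close to 1.0
print(comparability([0.7, 0.2, 0.1], [0.68, 0.22, 0.10]))
```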
topic distribution is mentioned in 3 sentences in this paper.