Introduction | To capture this intuition, one solution is to assume that posts published within the same short time window follow the same topic distribution. |
Introduction | Our model is based on the following two assumptions: (1) If a post is about a global event, it is likely to follow a global topic distribution that is time-dependent. |
Introduction | (2) If a post is about a personal topic, it is likely to follow a personal topic distribution that is more or less stable over time. |
Method | In standard LDA, a document contains a mixture of topics, represented by a topic distribution , and each word has a hidden topic label. |
Method | As we will see shortly, this treatment also allows us to model topic distributions at the time-window level and the user level. |
Method | To model this observation, we assume that there is a global topic distribution θ^t for each time point t. Presumably θ^t has a high probability for a topic that is popular in the microblog-sphere at time t. |
Related Work | (2004) expand topic distributions from document-level to user-level in order to capture users’ specific interests. |
Abstract | We use these topic distributions to compute topic-dependent lexical weighting probabilities and directly incorporate them into our translation model as features. |
Introduction | In our case, by building a topic distribution for the source side of the training data, we abstract the notion of domain to include automatically derived subcorpora with probabilistic membership. |
Introduction | This topic model infers the topic distribution of a test set and biases sentence translations to appropriate topics. |
Model Description | Given a parallel training corpus T composed of documents d_i, we build a source-side topic model over T, which provides a topic distribution p(z_n | d_i) for z_n ∈ {1, …, K}. |
Model Description | Then, we assign p(z_n | d_i) to be the topic distribution for every sentence s_j ∈ d_i, thus enforcing topic sharing across sentence pairs in the same document instead of treating them as unrelated. |
Model Description | Computing the topic distribution over a document and assigning it to the sentences serves to tie the sentences together in the document context. |
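A minimal sketch of this document-level tying, using scikit-learn's LDA as a stand-in for the paper's topic model (the toy documents and the crude sentence split are illustrative assumptions, not the paper's setup): each document's inferred distribution p(z | d_i) is copied down to every sentence it contains.

```python
# Sketch (not the paper's implementation): infer a document-level topic
# distribution with scikit-learn's LDA, then share it across the
# document's sentences so they stay tied to their document context.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the bank raised interest rates on loans",
    "the river bank flooded after heavy rain",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(documents)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # row i is p(z | d_i), normalized

# Every sentence s_j in document d_i inherits p(z | d_i).
sentence_topics = {}
for i, doc in enumerate(documents):
    for sentence in doc.split("."):  # crude sentence split for the sketch
        if sentence.strip():
            sentence_topics[sentence.strip()] = doc_topics[i]
```

Because the assignment is per document rather than per sentence, two sentences from the same document can never receive conflicting topic distributions.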
Abstract | We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. |
Background: Topic Model | The first one relates to the document-topic distribution, which records the topic distribution of each document. |
Introduction | Consequently, we propose a topic similarity model for hierarchical phrase-based translation (Chiang, 2007), where each synchronous rule is associated with a topic distribution. |
Introduction | • Given a document to be translated, we calculate the topic similarity between a rule and the document based on their topic distributions. |
Introduction | • We estimate the topic distribution for a rule based on both the source and target side topic models (Section 4.1). |
Topic Similarity Model | Therefore, we use topic distribution to describe the relatedness of rules to topics. |
Topic Similarity Model | Figure 1 shows four synchronous rules (Chiang, 2007) with topic distributions, some of which contain nonterminals. |
Topic Similarity Model | We can see that, although the source parts of rules (b) and (c) are identical, their topic distributions are quite different. |
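The comparison of a rule's topic distribution with a document's can be sketched as follows. The paper defines its own similarity measure; cosine similarity and the toy distributions below are illustrative assumptions used only to show why two rules with the same source side can score differently against the same document.

```python
import numpy as np

def topic_similarity(rule_dist, doc_dist):
    """Cosine similarity between two topic distributions.

    Illustrative only: the paper defines its own measure; cosine is
    one common way to compare distribution vectors.
    """
    rule_dist = np.asarray(rule_dist, dtype=float)
    doc_dist = np.asarray(doc_dist, dtype=float)
    return float(rule_dist @ doc_dist
                 / (np.linalg.norm(rule_dist) * np.linalg.norm(doc_dist)))

# Hypothetical distributions: rules (b) and (c) share a source side but
# concentrate on different topics, so they rank differently for one document.
doc = [0.7, 0.2, 0.1]
rule_b = [0.8, 0.1, 0.1]
rule_c = [0.1, 0.1, 0.8]
assert topic_similarity(rule_b, doc) > topic_similarity(rule_c, doc)
```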
Experiments | * Base-MCM: Our first version injects an informative prior for the domain, dialog act, and slot topic distributions, using information extracted only from labeled training utterances and injected as prior constraints (corpus n-gram base measures) during topic assignments. |
MultiLayer Context Model - MCM | In the topic model literature, such constraints are sometimes used to deterministically allocate topic assignments to known labels (Labeled Topic Modeling (Ramage et al., 2009)) or in terms of pre-learnt topics encoded as prior knowledge on topic distributions in documents (Reisinger and Pasca, 2009). |
MultiLayer Context Model - MCM | * Web n-Gram Context Base Measure: As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions. |
Tweet Recommendation Framework | Given n topics, we obtain a topic distribution matrix D using Latent Dirichlet Allocation (Blei et al., 2003). |
Tweet Recommendation Framework | Consider a user with a topic preference vector t and topic distribution matrix D. We calculate the response probability r for all tweets for this user as: |
Tweet Recommendation Framework | ,rw]1xw, where w<|VM| for a given user and the topic distribution matrix D, our task is estimate the topic preference vector t. We do this using maximum-likelihood |