Abstract | The contribution of the paper is twofold: 1. we introduce the Linking-Tweets-to-News task, as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research, which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the correlations between short texts (text-to-text information). |
Conclusion | We formalize the linking task as a short text modeling problem, and design Twitter/news-specific features to extract text-to-text relations, which are incorporated into a latent variable model.
Experiments | As a latent variable model, it is able to capture global topics (+1.89% ATOP over LDA-wvec); moreover, by explicitly modeling missing words, the existence of a word is also encoded in the latent vector (+2.31% TOP10 and −0.011% RR over the IR model).
Experiments | The only evidence the latent variable models rely on is lexical items (WTMF-G extracts additional text-to-text correlations by word matching).
Introduction | Latent variable models are powerful in that they go beyond the surface word level and map short texts into low-dimensional dense vectors (Socher et al., 2011; Guo and Diab, 2012b).
Introduction | Accordingly, we apply a latent variable model, namely, the Weighted Textual Matrix Factorization (WTMF) (Guo and Diab, 2012b; Guo and Diab, 2012c), to both the tweets and the news articles.
Introduction | Our proposed latent variable model not only models text-to-word information, but is also aware of text-to-text information (illustrated in Figure 1): two linked texts should have similar latent vectors; accordingly, the semantic picture of a tweet is completed by receiving semantics from its related tweets.
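The WTMF model referenced above factorizes a weighted word-by-text matrix, giving unobserved (missing) words a small weight rather than ignoring them. Below is a minimal numpy sketch of that idea, using alternating least squares with down-weighted missing cells; it is an illustration of the technique, not the authors' implementation, and the hyperparameter values (w_m, lam, k) are purely illustrative.

```python
import numpy as np

def wtmf(X, k=3, w_m=0.01, lam=20.0, iters=5):
    """Weighted matrix factorization sketch: X is a (words x texts) matrix
    where zero cells are treated as 'missing' and down-weighted by w_m.
    Returns P (k x words) and Q (k x texts) with X ~= P.T @ Q."""
    n_w, n_t = X.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(k, n_w))
    Q = rng.normal(scale=0.1, size=(k, n_t))
    W = np.where(X != 0, 1.0, w_m)        # weight: 1 for observed, w_m for missing
    I = lam * np.eye(k)
    for _ in range(iters):
        for j in range(n_t):              # update each text latent vector
            Wj = W[:, j]
            Q[:, j] = np.linalg.solve((P * Wj) @ P.T + I, (P * Wj) @ X[:, j])
        for i in range(n_w):              # update each word latent vector
            Wi = W[i, :]
            P[:, i] = np.linalg.solve((Q * Wi) @ Q.T + I, (Q * Wi) @ X[i, :])
    return P, Q
```

Texts sharing vocabulary end up with nearby latent vectors; WTMF-G would additionally pull the vectors of graph-linked texts together, which this sketch omits.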
Conclusions and Future Work | The closed-form online update for our relative margin solution accounts for surrogate references and latent variables.
Introduction | Unfortunately, not all advances in machine learning are easy to apply to structured prediction problems such as SMT; the latter often involve latent variables and surrogate references, resulting in loss functions that have not been well explored in machine learning (McAllester and Keshet, 2011; Gimpel and Smith, 2012).
Introduction | The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.
Introduction | First, we introduce RMM (§3.1) and propose a latent structured relative margin objective which incorporates cost-augmented hypothesis selection and latent variables.
Learning in SMT | While many derivations d ∈ D(x) can produce a given translation, we are only able to observe y; thus we model d as a latent variable.
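Marginalizing over latent derivations can be sketched concretely: score each derivation with a log-linear model and sum the probability mass of all derivations that yield the same translation string. The feature names and weights below are purely illustrative, not from the paper.

```python
import math
from collections import defaultdict

def translation_probs(derivations, weights):
    """Each derivation is (features: dict, yield: str). p(d|x) is log-linear
    over derivation features; p(y|x) sums over the latent derivations d
    whose yield is y, i.e. p(y|x) = sum_{d: yield(d)=y} p(d|x)."""
    scored = [(y, math.exp(sum(weights.get(f, 0.0) * v for f, v in feats.items())))
              for feats, y in derivations]
    Z = sum(s for _, s in scored)         # partition function over derivations
    p = defaultdict(float)
    for y, s in scored:
        p[y] += s / Z                     # marginalize out the derivation
    return dict(p)
```

A translation reachable by several derivations accumulates their combined mass, which is precisely why the derivation must be treated as latent rather than observed.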
Abstract | Many models in NLP involve latent variables, such as unknown parses, tags, or alignments.
Projections | Given a relaxed joint solution to the parameters and the latent variables, one must be able to project it to a nearby feasible one, by projecting either the fractional parameters or the fractional latent variables into the feasible space and then solving exactly for the other. |
Related Work | The goal of this work was to better understand and address the non-convexity of maximum-likelihood training with latent variables, especially parses.
Related Work | For supervised parsing, spectral learning has been used to learn latent variable PCFGs (Cohen et al., 2012) and hidden-state dependency grammars (Luque et al., 2012).
The Constrained Optimization Task | The feature counts are constrained to be derived from the latent variables (e.g., parses), which are unknown discrete structures that must be encoded with integer variables. |
A Gibbs Sampling Algorithm | Our algorithm represents a first attempt to extend Polson’s approach (Polson et al., 2012) to deal with highly nontrivial Bayesian latent variable models. |
Experiments | It is not trivial to develop a Gibbs sampling algorithm using a similar data augmentation idea, due to the presence of latent variables and the nonlinearity of the soft-max function.
Introduction | ing due to the presence of nontrivial latent variables.
Logistic Supervised Topic Models | But the presence of latent variables poses additional challenges in carrying out a formal theoretical analysis of these surrogate losses (Lin, 2001) in the topic model setting. |
Logistic Supervised Topic Models | Moreover, the latent variables Z make the inference problem harder than that of Bayesian logistic regression models (Chen et al., 1999; Meyer and Laud, 2002; Polson et al., 2012). |
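The data augmentation referenced here (Polson et al., 2012) is the Pólya-Gamma scheme, which makes the logistic likelihood conditionally Gaussian so that Gibbs sampling has closed-form conditionals. The sketch below applies it to plain Bayesian logistic regression, the simpler setting without topic-model latent variables Z; the truncated-sum PG sampler and the prior variance tau2 are illustrative approximations, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pg(b, c, K=100):
    """Approximate Polya-Gamma PG(b, c) draw via the truncated
    infinite-sum-of-gammas representation (Polson et al., 2012)."""
    k = np.arange(1, K + 1)
    g = rng.gamma(b, 1.0, size=K)
    return np.sum(g / ((k - 0.5) ** 2 + c ** 2 / (4 * np.pi ** 2))) / (2 * np.pi ** 2)

def gibbs_logistic(X, y, n_iter=100, tau2=10.0):
    """Gibbs sampler for Bayesian logistic regression with a N(0, tau2*I)
    prior on beta: given omega, the conditional for beta is Gaussian."""
    n, d = X.shape
    beta = np.zeros(d)
    kappa = y - 0.5                       # Polya-Gamma 'kappa' statistic
    samples = []
    for _ in range(n_iter):
        omega = np.array([sample_pg(1.0, xi @ beta) for xi in X])
        V = np.linalg.inv(X.T @ (X * omega[:, None]) + np.eye(d) / tau2)
        m = V @ (X.T @ kappa)
        beta = rng.multivariate_normal(m, V)
        samples.append(beta)
    return np.array(samples)
```

In the supervised topic model, the same omega augmentation is interleaved with resampling the topic assignments Z, which is exactly where the extra difficulty noted above arises.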
Distributional Semantic Hidden Markov Models | This model can be thought of as an HMM with two layers of latent variables, representing events and slots in the domain.
Distributional Semantic Hidden Markov Models | Event Variables At the top level, a categorical latent variable E_t with N_E possible states represents the event that is described by clause t.
Distributional Semantic Hidden Markov Models | Slot Variables Categorical latent variables with N_S possible states represent the slot that an argument fills, and are conditioned on the event variable in the clause, E_t (i.e., P_S(S_{t,a} | E_t), for the a-th slot variable).
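The two-layer structure described above can be sketched as a generative sampler: events E_1..E_T form a Markov chain over clauses, and each slot variable S_{t,a} within clause t is drawn conditioned on E_t. This shows only the latent skeleton; the emission distributions and inference are omitted, and the parameter names (trans, slot_given_event, init) are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_document(n_clauses, n_args, trans, slot_given_event, init):
    """Sample the latent skeleton of a document: an event chain E_1..E_T
    (trans[e] is P(E_t | E_{t-1}=e), init is P(E_1)), and per clause t,
    n_args slot draws from P_S(S | E_t) = slot_given_event[E_t]."""
    events, slots = [], []
    e = rng.choice(len(init), p=init)             # initial event state
    for t in range(n_clauses):
        if t > 0:
            e = rng.choice(trans.shape[1], p=trans[e])  # Markov transition
        events.append(int(e))
        slots.append([int(rng.choice(slot_given_event.shape[1],
                                     p=slot_given_event[e]))
                      for _ in range(n_args)])
    return events, slots
```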
Related Work | Distributions that generate the latent variables and hyperparameters are omitted for clarity. |
RSP: A Random Walk Model for SP | LDA-SP: Another class of sophisticated unsupervised approaches to SP consists of latent variable models based on Latent Dirichlet Allocation (LDA).
Related Work 2.1 WordNet-based Approach | Recently, more sophisticated methods have been developed for SP based on topic models, where the latent variables (topics) take the place of semantic classes and distributional clusterings (Ó Séaghdha, 2010; Ritter et al., 2010).
Related Work 2.1 WordNet-based Approach | Without introducing semantic classes or latent variables, Keller and Lapata (2003) use the web to obtain frequencies for smoothing unseen bigrams.