BBC News Database | Unlike other unsupervised approaches where a set of latent variables is introduced, each defining a joint distribution on the space of keywords and image features, the relevance model captures the joint probability of images and annotated words directly, without requiring an intermediate clustering stage.
BBC News Database | achieve competitive performance with latent variable models. |
BBC News Database | Each annotated image in the training set is treated as a latent variable.
Related Work | Another way of capturing co-occurrence information is to introduce latent variables linking image features with words. |
Abbreviator with Nonlocal Information | 2.1 A Latent Variable Abbreviator |
Abbreviator with Nonlocal Information | To implicitly incorporate nonlocal information, we propose discriminative probabilistic latent variable models (DPLVMs) (Morency et al., 2007; Petrov and Klein, 2008) for abbreviating terms. |
Abbreviator with Nonlocal Information | The DPLVM is a natural extension of the CRF model (see Figure 2), which is a special case of the DPLVM, with only one latent variable assigned for each label. |
Abstract | First, in order to incorporate nonlocal information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model, or alternatively, the label encoding approach with global information. |
Introduction | Variables x, y, and h represent observation, label, and latent variables, respectively.
Introduction | discriminative probabilistic latent variable model (DPLVM) in which nonlocal information is modeled by latent variables.
Experiments | The entropy of g on a single latent variable z is defined to be H(g, z) = −Σ_{c∈C} P(c|z) log₂ P(c|z), where C is the class
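That entropy definition can be sketched directly in code (the function and variable names here are ours, not from the paper):

```python
import math

def entropy_given_z(cond_probs):
    # H(g, z) = -sum over classes c in C of P(c|z) * log2 P(c|z);
    # cond_probs lists P(c|z) for each class c.
    h = -sum(p * math.log2(p) for p in cond_probs if p > 0.0)
    return h if h > 0.0 else 0.0

# A uniform distribution over two classes carries 1 bit of entropy;
# a deterministic one carries none.
print(entropy_given_z([0.5, 0.5]))  # 1.0
print(entropy_given_z([1.0, 0.0]))  # 0.0
```

A low entropy means the latent variable z is strongly predictive of a single class.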
Image Clustering with Annotated Auxiliary Data | In order to unify those two separate PLSA models, these two steps are done simultaneously with common latent variables used as a bridge linking them. |
Image Clustering with Annotated Auxiliary Data | Through these common latent variables, which are now constrained by both target image data and auxiliary annotation data, a better clustering result is expected for the target data.
Image Clustering with Annotated Auxiliary Data | Let Z = be the latent variable set in our aPLSA model. |
Conclusions and future work | The models presented here derive their predictions by modelling predicate-argument plausibility through the intermediary of latent variables.
Conclusions and future work | We also anticipate that latent variable models will prove effective for learning selectional preferences of semantic predicates (e. g., FrameNet roles) where direct estimation from a large corpus is not a viable option. |
Related work | In Rooth et al.’s model each observed predicate-argument pair is probabilistically generated from a latent variable, which is itself generated from an underlying distribution on variables.
Related work | The use of latent variables, which correspond to coherent clusters of predicate-argument interactions, allows probabilities to be assigned to predicate-argument pairs which have not previously been observed by the model.
Related work | The work presented in this paper is inspired by Rooth et al.’s latent variable approach, most directly in the model described in Section 3.3. |
Results | Latent variable models that use EM for inference can be very sensitive to the number of latent variables chosen. |
Three selectional preference models | Each model has at least one vocabulary of Z arbitrarily labelled latent variables.
Three selectional preference models | f_zn is the number of observations where the latent variable z has been associated with the argument type n, f_zv is the number of observations where z has been associated with the predicate type v, and f_zr is the number of observations where z has been associated with the relation r.
Three selectional preference models | In Rooth et al.’s (1999) selectional preference model, a latent variable is responsible for generating both the predicate and argument types of an observation. |
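A minimal sketch of that generative structure (the toy numbers and names are ours): the latent class z generates the predicate and argument independently, so p(v, n) = Σ_z p(z) p(v|z) p(n|z), and unseen pairs still receive probability mass.

```python
def joint_prob(pz, pv_given_z, pn_given_z, v, n):
    # Rooth-style latent variable model: the latent class z generates
    # both the predicate v and the argument n independently, so
    # p(v, n) = sum_z p(z) * p(v|z) * p(n|z).
    return sum(pz[z] * pv_given_z[z].get(v, 0.0) * pn_given_z[z].get(n, 0.0)
               for z in pz)

# Two latent classes; "eat"/"apple" co-occur mostly via class 0.
pz = {0: 0.6, 1: 0.4}
pv = {0: {"eat": 0.9, "read": 0.1}, 1: {"eat": 0.1, "read": 0.9}}
pn = {0: {"apple": 0.8, "book": 0.2}, 1: {"apple": 0.1, "book": 0.9}}
print(joint_prob(pz, pv, pn, "eat", "apple"))  # 0.436
```

Even the pair ("read", "apple"), never forced to co-occur by the tables, gets nonzero probability through the shared latent classes.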
Abstract | We propose a latent variable model to enhance historical analysis of large corpora. |
Introduction | Latent variable models, such as latent Dirichlet allocation (LDA) (Blei et al., 2003) and probabilistic latent semantic analysis (PLSA) (Hofmann, 1999), have been used in the past to facilitate social science research. |
Introduction | To do this we augment SAGE with two sparse latent variables that model the region and time of a document, as well as a third sparse latent variable that captures the interactions among the region, time and topic latent variables.
Related Work | For example, SVM does not have latent variables to model the subtle differences and interactions of features from different domains (e.g. |
Related Work | (2010) use a latent variable model to predict geolocation information of Twitter users, and investigate geographic variations of language use. |
The Sparse Mixed-Effects Model | It also incorporates latent variables τ to model the variance for each sparse deviation η.
The Sparse Mixed-Effects Model | The three major sparse deviation latent variables are η^(T), η^(R) and η^(Q)
The Sparse Mixed-Effects Model | All of the three latent variables are condi- |
Abstract | One way to tackle this problem is to train a generative model with latent variables on the mixture of data from the source and target domains. |
Abstract | Such a model would cluster features in both domains and ensure that at least some of the latent variables are predictive of the label on the source domain. |
Abstract | We introduce a constraint enforcing that marginal distributions of each cluster (i.e., each latent variable) do not vary significantly across domains.
Introduction | We use generative latent variable models (LVMs) learned on all the available data: unlabeled data for both domains and the labeled data for the source domain.
Introduction | The latent variables encode regularities observed on unlabeled data from both domains, and they are learned to be predictive of the labels on the source domain. |
Introduction | The danger of this semi-supervised approach in the domain-adaptation setting is that some of the latent variables will correspond to clusters of features specific only to the source domain, and consequently, the classifier relying on this latent variable will be badly affected when tested on the target domain. |
The Latent Variable Model | vectors of latent variables, to abstract away from handcrafted features.
The Latent Variable Model | The model assumes that the features and the latent variable vector are generated jointly from a globally-normalized model and then the label y is generated from a conditional distribution dependent on z.
Abstract | We present a translation model which models derivations as a latent variable , in both training and decoding, and is fully discriminative and globally optimised. |
Challenges for Discriminative SMT | Instead we model the translation distribution with a latent variable for the derivation, which we marginalise out in training and decoding. |
Discriminative Synchronous Transduction | As the training data only provides source and target sentences, the derivations are modelled as a latent variable.
Discriminative Synchronous Transduction | Our findings echo those observed for latent variable log-linear models successfully used in monolingual parsing (Clark and Curran, 2007; Petrov et al., 2007). |
Discriminative Synchronous Transduction | This method has been demonstrated to be effective for (non-convex) log-linear models with latent variables (Clark and Curran, 2004; Petrov et al., 2007). |
Evaluation | Derivational ambiguity Table 1 shows the impact of accounting for derivational ambiguity in training and decoding. There are two options for training: we could use our latent variable model and optimise the probability of all derivations of the reference translation, or choose a single derivation that yields the reference and optimise its probability alone.
Evaluation | Max-translation decoding for the model trained on single derivations has only a small positive effect, while for the latent variable model the impact is much larger.
Introduction | Second, within this framework, we model the derivation, d, as a latent variable, p(e, d|f), which is marginalised out in training and decoding.
Related Work | Sparsity for low-order contexts has recently spurred interest in using latent variables to represent distributions over contexts in language models. |
Related Work | Several authors investigate neural network models that learn not just one latent state, but rather a vector of latent variables , to represent each word in a language model (Bengio et al., 2003; Emami et al., 2003; Morin and Bengio, 2005). |
Smoothing Natural Language Sequences | 2.3 Latent Variable Language Model Representation |
Smoothing Natural Language Sequences | Latent variable language models (LVLMs) can be used to produce just such a distributional representation. |
Smoothing Natural Language Sequences | We use Hidden Markov Models (HMMs) as the main example in the discussion and as the LVLMs in our experiments, but the smoothing technique can be generalized to other forms of LVLMs, such as factorial HMMs and latent variable maximum entropy models (Ghahramani and Jordan, 1997; Smith and Eisner, 2005). |
Abstract | We associate each sentence with an undirected latent tree graphical model, which is a tree consisting of both observed variables (corresponding to the words in the sentence) and an additional set of latent variables that are unobserved in the data. |
Abstract | However, due to the presence of latent variables, structure learning of latent trees is substantially more complicated than in observed models.
Abstract | The latent variables can incorporate various linguistic properties, such as head information, valence of dependency being generated, and so on. |
Abstract | The contribution of the paper is twofold: 1. we introduce the Linking-Tweets-to-News task as well as a dataset of linked tweet-news pairs, which can benefit many NLP applications; 2. in contrast to previous research which focuses on lexical features within the short texts (text-to-word information), we propose a graph-based latent variable model that models the inter short text correlations (text-to-text information).
Conclusion | We formalize the linking task as a short text modeling problem, and extract Twitter/news-specific features to capture text-to-text relations, which are incorporated into a latent variable model.
Experiments | As a latent variable model, it is able to capture global topics (+1.89% ATOP over LDA-wvec); moreover, by explicitly modeling missing words, the existence of a word is also encoded in the latent vector (+2.31% TOP10 and -0.011% RR over the IR model).
Experiments | The only evidence the latent variable models rely on is lexical items (WTMF-G extracts additional text-to-text correlation by word matching).
Introduction | Latent variable models are powerful by going beyond the surface word level and mapping short texts into a low dimensional dense vector (Socher et al., 2011; Guo and Diab, 2012b). |
Introduction | Accordingly, we apply a latent variable model, namely, the Weighted Textual Matrix Factorization [WTMF] (Guo and Diab, 2012b; Guo and Diab, 2012c) to both the tweets and the news articles. |
Introduction | Our proposed latent variable model not only models text-to-word information, but also is aware of the text-to-text information (illustrated in Figure 1): two linked texts should have similar latent vectors, accordingly the semantic picture of a tweet is completed by receiving semantics from its related tweets. |
Experiment | From this viewpoint, TSG utilizes surrounding symbols (NNP of NPNNP in the above example) as latent variables with which to capture context information. |
Experiment | as latent variables and the search space is larger than that of a TSG when the symbol refinement model allows for more than two subcategories for each symbol. |
Experiment | Our experimental results confirm that jointly modeling both latent variables using our SR-TSG assists accurate parsing.
Inference | The inference of the SR-TSG derivations corresponds to inferring two kinds of latent variables: latent symbol subcategories and latent substitution
Inference | This stepwise learning is simple and efficient in practice, but we believe that the joint learning of both latent variables is possible, and we will deal with this in future work. |
Inference | This sampler simultaneously updates blocks of latent variables associated with a sentence, thus it can find MAP solutions efficiently. |
Abstract | In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. |
Abstract | Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. |
Experiments and Results | All the latent variable models (LSA, LDA, WTMF) are built on the same corpus: WN+Wik+Brown (393,666 sentences and 4,262,026 words).
Experiments and Results | In these latent variable models, there are several essential parameters: the weight of missing words w_m, and the dimension K. Figures 2 and 3 analyze the impact of these parameters on ATOP_test.
Introduction | Latent variable models, such as Latent Semantic Analysis [LSA] (Landauer et al., 1998), Probabilistic Latent Semantic Analysis [PLSA] (Hofmann, 1999), Latent Dirichlet Allocation [LDA] (Blei et al., 2003) can solve the two issues naturally by modeling the semantics of words and sentences simultaneously in the low-dimensional latent space. |
Introduction | After analyzing the way traditional latent variable models (LSA, PLSA/LDA) handle missing words, we decide to model sentences using a weighted matrix factorization approach (Srebro and Jaakkola, 2003), which allows us to treat observed words and missing words differently.
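The core idea can be sketched with a rank-1 weighted alternating least squares toy (our own illustration, not the authors' WTMF code): cells for missing words receive a small weight instead of being fit as fully observed zeros.

```python
def weighted_mf_rank1(X, W, iters=50):
    # Rank-1 weighted matrix factorization by alternating least squares:
    # minimise sum_ij W[i][j] * (X[i][j] - u[i] * v[j])^2.
    # Missing words get a small weight in W rather than full weight.
    n, m = len(X), len(X[0])
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        for i in range(n):  # closed-form update for each u[i]
            num = sum(W[i][j] * X[i][j] * v[j] for j in range(m))
            den = sum(W[i][j] * v[j] ** 2 for j in range(m)) or 1.0
            u[i] = num / den
        for j in range(m):  # closed-form update for each v[j]
            num = sum(W[i][j] * X[i][j] * u[i] for i in range(n))
            den = sum(W[i][j] * u[i] ** 2 for i in range(n)) or 1.0
            v[j] = num / den
    return u, v

# Observed cells (weight 1.0) dominate; missing cells (weight 0.01) barely pull.
u, v = weighted_mf_rank1([[1, 0], [1, 0]], [[1.0, 0.01], [1.0, 0.01]])
print(round(u[0] * v[0], 6))  # reconstruction of the observed cell: 1.0
```

With a high weight on missing cells the factorization would instead be dragged toward predicting zeros everywhere, which is exactly the failure mode the paper argues against.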
Limitations of Topic Models and LSA for Modeling Sentences | Usually latent variable models aim to find a latent semantic profile for a sentence that is most relevant to the observed words. |
Introduction | While most of the state-of-the-art CWS systems use semi-Markov conditional random fields or latent variable conditional random fields, we simply use a single first-order conditional random field (CRF) for the joint modeling.
Introduction | The semi-Markov CRFs and latent variable CRFs relax the Markov assumption of CRFs to express more complicated dependencies, and therefore to achieve higher disambiguation power. |
Related Work | To achieve high accuracy, most of the state-of-the-art systems are heavy probabilistic systems using semi-Markov assumptions or latent variables (Andrew, 2006; Sun et al., 2009b). |
Related Work | For example, one of the state-of-the-art CWS systems is the latent variable conditional random field (Sun et al., 2008; Sun and Tsujii, 2009) system presented in Sun et al.
Related Work | Those semi-Markov perceptron systems are moderately faster than the heavy probabilistic systems using semi-Markov conditional random fields or latent variable conditional random fields. |
Conclusion | These regularizations could improve spectral algorithms for latent variable models, improving the performance for other NLP tasks such as latent variable PCFGs (Cohen et al., 2013) and HMMs (Anandkumar et al., 2012), combining the flexibility and robustness offered by priors with the speed and accuracy of new, scalable algorithms.
Introduction | Theoretically, their latent variable formulation has served as a foundation for more robust models of other linguistic phenomena (Brody and Lapata, 2009). |
Introduction | Modern topic models are formulated as a latent variable model. |
Introduction | Typical solutions use MCMC (Griffiths and Steyvers, 2004) or variational EM (Blei et al., 2003), which can be viewed as local optimization: searching for the latent variables that maximize the data likelihood. |
Background and Related Work | Our work proposes a uniform treatment to MWPs of varying degrees of compositionality, and avoids defining MWPs explicitly by modelling their LCs as latent variables.
Introduction | We present a novel approach to the task that models the selection and relative weighting of the predicate’s LCs using latent variables . |
Our Proposal: A Latent LC Approach | We address the task with a latent variable log-linear model, representing the LCs of the predicates. |
Our Proposal: A Latent LC Approach | We choose this model for its generality, conceptual simplicity, and because it allows us to easily incorporate various feature sets and sets of latent variables.
Our Proposal: A Latent LC Approach | The introduction of latent variables into the log-linear model leads to a non-convex objective function. |
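The source of the non-convexity can be made concrete with a small sketch (function and feature names are ours): the score of an output marginalises over latent assignments h, and the resulting log-sum-exp of linear scores is not convex in the weights in general.

```python
import math

def log_marginal_score(weights, feature_fn, x, y, latent_values):
    # Latent variable log-linear model: score(x, y) =
    #   log sum_h exp(w . f(x, y, h)),
    # marginalising the latent variable h inside the log. This sum of
    # exponentials inside a log is what makes the objective non-convex.
    scores = [sum(weights.get(feat, 0.0) * val
                  for feat, val in feature_fn(x, y, h).items())
              for h in latent_values]
    m = max(scores)  # stabilised log-sum-exp
    return m + math.log(sum(math.exp(s - m) for s in scores))

# Toy feature function: a single bias feature regardless of h.
def f(x, y, h):
    return {"bias": 1.0}

print(log_marginal_score({"bias": 0.0}, f, None, None, [0, 1]))  # log(2)
```

With a single latent value the model collapses back to an ordinary (convex) log-linear model, which matches the snippet's point that the latent variables are what introduce the non-convexity.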
Inference | To find the latent variables that best explain observed data, we use Gibbs sampling, a widely used Markov chain Monte Carlo inference technique (Neal, 2000; Resnik and Hardisty, 2010). |
Inference | The state space is the latent variables for topic indices assigned to all tokens, z, and topic shifts assigned to turns, l = {l_{d,t}}.
Inference | We marginalize over all other latent variables.
Modeling Multiparty Discussions | Instead, we endow each turn with a binary latent variable l_{d,t}, called the topic shift.
Modeling Multiparty Discussions | This latent variable signifies whether the speaker changed the topic of the conversation. |
Related and Future Work | as a distinct latent variable (Wang and McCallum, 2006; Eisenstein et al., 2010).
Guided DS | We introduce a set of latent variables h_i which model human ground truth for each mention in the i-th bag and take precedence over the current model assignment z_i.
Guided DS | z_ij ∈ R ∪ NR: a latent variable that denotes the relation of the j-th mention in the i-th bag
Guided DS | h_ij ∈ R ∪ NR: a latent variable that denotes the refined relation of the mention x_ij
Introduction | (2012), we generalize the labeled data through feature selection and model this additional information directly in the latent variable approaches. |
The Challenge | Instead we propose to perform feature selection to generalize human labeled data into training guidelines, and integrate them into a latent variable model.
Conclusions and Future Work | The closed-form online update for our relative margin solution accounts for surrogate references and latent variables.
Introduction | Unfortunately, not all advances in machine learning are easy to apply to structured prediction problems such as SMT; the latter often involve latent variables and surrogate references, resulting in loss functions that have not been well explored in machine learning (McAllester and Keshet, 2011; Gimpel and Smith, 2012).
Introduction | The contributions of this paper include (1) introduction of a loss function for structured RMM in the SMT setting, with surrogate reference translations and latent variables; (2) an online gradient-based solver, RM, with a closed-form parameter update to optimize the relative margin loss; and (3) an efficient implementation that integrates well with the open source cdec SMT system (Dyer et al., 2010). In addition, (4) as our solution is not dependent on any specific QP solver, it can be easily incorporated into practically any gradient-based learning algorithm.
Introduction | First, we introduce RMM (§3.1) and propose a latent structured relative margin objective which incorporates cost-augmented hypothesis selection and latent variables . |
Learning in SMT | While many derivations d ∈ D(x) can produce a given translation, we are only able to observe y; thus we model d as a latent variable.
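The marginalisation over latent derivations can be sketched as follows (the toy derivations and names are ours). Summing p(e, d|f) over all derivations d that yield e can prefer a translation whose individual derivations are each weaker than the single best derivation, which is the max-translation versus max-derivation distinction.

```python
def max_translation(derivations):
    # Each entry is (derivation id, translation e, probability p(e, d|f)).
    # The derivation d is latent: p(e|f) = sum over d yielding e of p(e, d|f).
    totals = {}
    for d, e, p in derivations:
        totals[e] = totals.get(e, 0.0) + p
    return max(totals, key=totals.get), totals

# Two derivations of "the house" jointly outweigh the single best
# derivation d3, so max-translation and max-derivation disagree.
ds = [("d1", "the house", 0.3), ("d2", "the house", 0.3), ("d3", "a house", 0.4)]
best, totals = max_translation(ds)
print(best)  # the house (total 0.6 vs. 0.4)
```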
Abstract | Many models in NLP involve latent variables, such as unknown parses, tags, or alignments.
Projections | Given a relaxed joint solution to the parameters and the latent variables, one must be able to project it to a nearby feasible one, by projecting either the fractional parameters or the fractional latent variables into the feasible space and then solving exactly for the other. |
Related Work | The goal of this work was to better understand and address the non-convexity of maximum-likelihood training with latent variables, especially parses.
Related Work | For supervised parsing, spectral learning has been used to learn latent variable PCFGs (Cohen et al., 2012) and hidden-state dependency grammars (Luque et al., 2012).
The Constrained Optimization Task | The feature counts are constrained to be derived from the latent variables (e.g., parses), which are unknown discrete structures that must be encoded with integer variables. |
Inference | Inference of probabilistic models discovers the posterior distribution over latent variables . |
Inference | For a collection of D documents, each of which contains N_d words, the latent variables of ptLDA are: transition distributions π_{k,i} for every topic k and internal node i in the prior tree structure; multinomial distributions over topics θ_d for every document d; and topic assignments z_dn and path y_dn for the nth word w_dn in document d. The joint distribution of polylingual tree-based topic models is
Inference | approximate posterior inference to discover the latent variables that best explain our data.
A Gibbs Sampling Algorithm | Our algorithm represents a first attempt to extend Polson’s approach (Polson et al., 2012) to deal with highly nontrivial Bayesian latent variable models. |
Experiments | nontrivial to develop a Gibbs sampling algorithm using a similar data augmentation idea, due to the presence of latent variables and the nonlinearity of the soft-max function.
Introduction | ing due to the presence of nontrivial latent variables.
Logistic Supervised Topic Models | But the presence of latent variables poses additional challenges in carrying out a formal theoretical analysis of these surrogate losses (Lin, 2001) in the topic model setting. |
Logistic Supervised Topic Models | Moreover, the latent variables Z make the inference problem harder than that of Bayesian logistic regression models (Chen et al., 1999; Meyer and Laud, 2002; Polson et al., 2012). |
Introduction | In contrast to the previous methods, we approach the problem by modeling the three sub-problems as well as the unknown set of sub-word units as latent variables in one nonparametric Bayesian model. |
Model | In the next section, we show how to infer the value of each of the latent variables in Fig. |
Problem Formulation | We model the three subtasks as latent variables in our approach. |
Problem Formulation | In this section, we describe the observed data, latent variables, and auxiliary variables
Related Work | For the domain our problem is applied to, our model has to include more latent variables and is more complex. |
Background | (2008) present a latent variable model that describes the relationship between translation and derivation clearly. |
Background | Although originally proposed for supporting large sets of nonindependent and overlapping features, the latent variable model is actually a more general form of the conventional linear model (Och and Ney, 2002).
Background | Accordingly, decoding for the latent variable model can be formalized as |
Related Work | They show that max-translation decoding outperforms max-derivation decoding for the latent variable model. |
Distributional Semantic Hidden Markov Models | This model can be thought of as an HMM with two layers of latent variables , representing events and slots in the domain. |
Distributional Semantic Hidden Markov Models | Event Variables At the top level, a categorical latent variable E_t with N_E possible states represents the event that is described by clause t.
Distributional Semantic Hidden Markov Models | Slot Variables Categorical latent variables with N_S possible states represent the slot that an argument fills, and are conditioned on the event variable in the clause, E_t (i.e., P_S(S_{t,a}|E_t) for the a-th slot variable).
Related Work | Distributions that generate the latent variables and hyperparameters are omitted for clarity. |
Background | Petrov and Klein (2007a) derive coarse grammars in a more statistically principled way, although the technique is closely tied to their latent variable grammar representation. |
Experimental Setup | Alternative decoding methods, such as marginalizing over the latent variables in the grammar or MaxRule decoding (Petrov and Klein, 2007a) are certainly possible in our framework, but it is unknown how effective these methods will be given the heavily pruned na- |
Introduction | Grammar transformation techniques such as linguistically inspired nonterminal annotations (Johnson, 1998; Klein and Manning, 2003b) and latent variable grammars (Matsuzaki et al., 2005; Petrov et al., 2006) have increased the grammar size |G| from a few thousand rules to several million in an explicitly enumerable grammar, or even more in an implicit grammar. |
Introduction | Rather, the beam-width prediction model is trained to learn the rank of constituents in the maximum likelihood trees. We will illustrate this by presenting results using a latent-variable grammar, for which there is no “true” reference latent variable parse.
Inference | In order to do so, we need to integrate out all the other latent variables in our model. |
Inference | To do so tractably, we use Gibbs sampling to draw each latent variable conditioned on our current sample of the others. |
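A generic sketch of that Gibbs sweep (the function names and the toy conditional are ours): each latent variable is redrawn from its conditional distribution given the current sample of all the others.

```python
import random

def gibbs_sweeps(init, conditionals, rounds, seed=0):
    # One Gibbs round redraws every latent variable from its conditional
    # distribution given the current values of the other variables.
    rng = random.Random(seed)
    state = dict(init)
    samples = []
    for _ in range(rounds):
        for var, cond in conditionals.items():
            probs = cond(state)  # {value: P(var = value | rest of state)}
            vals = list(probs)
            state[var] = rng.choices(vals, weights=[probs[v] for v in vals])[0]
        samples.append(dict(state))
    return samples

# Toy conditional: z is a fair coin independent of the rest of the state.
conds = {"z": lambda s: {0: 0.5, 1: 0.5}}
samples = gibbs_sweeps({"z": 0}, conds, 200)
```

In a real model each conditional would depend on the rest of the state, and, as the snippet notes, a finite number of rounds may still fail to explore the full latent space.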
Inference | Even with a large number of sampling rounds, it is difficult to fully explore the latent variable space for complex unsupervised models. |
Introduction | An HMM is a generative probabilistic model that generates each word x_i in the corpus conditioned on a latent variable Y_i.
Introduction | Each Y_i in the model takes on integral values from 1 to K, and each one is generated by the latent variable for the preceding word, Y_{i−1}.
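That generative process can be sketched as forward sampling (the function and the toy distributions are ours): draw each state from the transition row of the previous state, then draw a word from that state's emission distribution.

```python
import random

def generate_hmm(trans, emit, length, start=1, seed=0):
    # HMM forward generation: each latent state Y_i is drawn conditioned
    # on Y_{i-1}, then the word x_i is drawn from the emission
    # distribution of Y_i.
    rng = random.Random(seed)
    y, states, words = start, [], []
    for _ in range(length):
        row = trans[y]
        y = rng.choices(list(row), weights=list(row.values()))[0]
        states.append(y)
        em = emit[y]
        words.append(rng.choices(list(em), weights=list(em.values()))[0])
    return states, words

# Two latent states with deterministic emissions, so the latent state
# is fully recoverable from the generated word.
trans = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}
emit = {1: {"a": 1.0}, 2: {"b": 1.0}}
states, words = generate_hmm(trans, emit, 10)
```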
Introduction | In response, we introduce latent variable models of word spans, or sequences of words. |
Introduction | In both cases, the available labeled equations (either the seed set, or the full set) are abstracted to provide the model’s equation templates, while the slot filling and alignment decisions are latent variables whose settings are estimated by directly optimizing the marginal data log-likelihood. |
Mapping Word Problems to Equations | In this way, the distribution over derivations y is modeled as a latent variable.
Related Work | In our approach, systems of equations are relatively easy to specify, providing a type of template structure, and the alignment of the slots in these templates to the text is modeled primarily with latent variables during learning. |
Extending the Model | By adding additional transitions, we can constrain the latent variables further. |
Introduction | (2011) proposed an approach that uses co-occurrence patterns to find entity type candidates, and then learns their applicability to relation arguments by using them as latent variables in a first-order HMM. |
Model | Thus all common nouns are possible types, and can be used as latent variables in an HMM. |
Minimum Bayes risk parsing | MBR parsing has proven especially useful in latent variable grammars. |
Minimum Bayes risk parsing | Petrov and Klein (2007) showed that MBR trees substantially improved performance over Viterbi parses for latent variable grammars, earning up to 1.5 F1.
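The contrast with Viterbi selection can be sketched abstractly (the toy candidates and gain function are ours): MBR picks the candidate with the highest expected gain under the model distribution rather than the single most probable candidate.

```python
def mbr_decode(candidates, gain):
    # candidates: {output y: model probability p(y)}.
    # Minimum Bayes risk picks argmax_y sum_y' p(y') * gain(y, y').
    return max(candidates,
               key=lambda y: sum(p * gain(y, yp) for yp, p in candidates.items()))

def gain(y, yp):
    # Toy similarity: identical outputs score 1; "B" and "C" are
    # near-duplicates of each other; "A" resembles nothing else.
    if y == yp:
        return 1.0
    return 0.9 if {y, yp} == {"B", "C"} else 0.0

probs = {"A": 0.40, "B": 0.35, "C": 0.25}
print(mbr_decode(probs, gain))  # B: expected gain 0.575 beats A's 0.40
```

Viterbi would choose "A" (the single most probable output), while MBR chooses "B" because probability mass on similar candidates is pooled, which mirrors how MBR benefits latent variable grammars where many refined derivations back the same tree.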
Sparsity and CPUs | For instance, in a latent variable parser, the coarse grammar would have symbols like NP, VP, etc., and the fine pass would have refined symbols N P0, N P1, VP4, and so on. |
Introduction | This matrix form has clear relevance to latent variable models. |
Related Work | Recently a number of researchers have developed provably correct algorithms for parameter estimation in latent variable models such as hidden Markov models, topic models, directed graphical models with latent variables , and so on (Hsu et al., 2009; Bailly et al., 2010; Siddiqi et al., 2010; Parikh et al., 2011; Balle et al., 2011; Arora et al., 2013; Dhillon et al., 2012; Anandkumar et al., 2012; Arora et al., 2012; Arora et al., 2013). |
The Learning Algorithm for L-PCFGS | The training set does not include values for the latent variables; this is the main challenge in learning.
Intervention Prediction Models | p_i, r and φ(t) are observed and h_i are the latent variables.
Intervention Prediction Models | In the first step, it determines the latent variable assignments for positive examples. |
Intervention Prediction Models | Once this process converges for negative examples, the algorithm reassigns values to the latent variables for positive examples, and proceeds to the second step. |
Model overview | Many existing paraphrase models introduce latent variables to describe the derivation of c from x, e.g., with transformations (Heilman and Smith, 2010; Stern and Dagan, 2011) or alignments (Haghighi et al., 2005; Das and Smith, 2009; Chang et al., 2010).
Model overview | However, we opt for a simpler paraphrase model without latent variables in the interest of efficiency. |
Paraphrasing | The NLP paraphrase literature is vast and ranges from simple methods employing surface features (Wan et al., 2006), through vector space models (Socher et al., 2011), to latent variable models (Das and Smith, 2009; Wang and Manning, 2010; Stern and Dagan, 2011). |
Experiments | As a Baseline, we also evaluate all hypotheses on a model with no latent variables whatsoever, which instead measures similarity as the average JS divergence between the empirical word distributions over each role type.
Experiments | Table 1 presents the results of this comparison; for all models with latent variables , we report the average of 5 sampling runs with different random initializations. |
Model | Observed variables are shaded, latent variables are clear, and collapsed variables are dotted. |
RSP: A Random Walk Model for SP | LDA-SP: Another kind of sophisticated unsupervised approaches for SP are latent variable models based on Latent Dirichlet Allocation (LDA). |
Related Work 2.1 WordNet-based Approach | Recently, more sophisticated methods for SP have been developed based on topic models, where the latent variables (topics) take the place of semantic classes and distributional clusterings (Seaghdha, 2010; Ritter et al., 2010).
Related Work 2.1 WordNet-based Approach | Without introducing semantic classes and latent variables, Keller and Lapata (2003) use the web to obtain frequencies for unseen bigrams.
Adding Linguistic Knowledge to the Monte-Carlo Framework | Game only 17.3 5.3 ± 2.7; Sentence relevance 46.7 2.8 ± 3.5; Full model 53.7 5.9 ± 3.5; Random text 40.3 4.3 ± 3.4; Latent variable 26.1 3.7 ± 3.1
Adding Linguistic Knowledge to the Monte-Carlo Framework | Method, % Wins, Standard Error: Game only 45.7 ± 7.0; Latent variable 62.2 ± 6.9; Full model 78.8 ± 5.8
Adding Linguistic Knowledge to the Monte-Carlo Framework | The second baseline, latent variable, extends the linear action-value function Q(s, a) of the game only baseline with a set of latent variables — i.e., it is a four layer neural network, where the second layer’s units are activated only based on game information. |
Constraints Shape Topics | In topic modeling, collapsed Gibbs sampling (Griffiths and Steyvers, 2004) is a standard procedure for obtaining a Markov chain over the latent variables in the model. |
Constraints Shape Topics | Typically, these only change based on assignments of latent variables in the sampler; in Section 4 we describe how changes in the model’s structure (in addition to the latent state) can be reflected in these count statistics. |
Interactively adding constraints | In the more general case, when words lack a unique path in the constraint tree, an additional latent variable specifies which possible paths in the constraint tree produced the word; this would have to be sampled. |
Joint Translation Model | structural part and their associated probabilities define a model p(σ) over the latent variable σ determining the recursive, reordering and phrase-pair segmenting structure of translation, as in Figure 4.
Learning Translation Structure | It works iteratively on a partition of the training data, climbing the likelihood of the training data while cross-validating the latent variable values, considering for every training data point only those which can be produced by models built from the rest of the data excluding the current part.
Related Work | The rich linguistically motivated latent variable learnt by our method delivers translation performance that compares favourably to a state-of-the-art system. |
Models | To identify refinements without labeled data, we propose a generative model of reviews (or more generally documents) with latent variables . |
Models | Finally, although we motivated including the review-level latent variable y as a way to improve segment-level prediction of z, note that predictions of y are useful in and of themselves.
Models | over latent variables using the sum-product algorithm (Koller and Friedman, 2009). |