Abstract | One way to tackle this problem is to train a generative model with latent variables on the mixture of data from the source and target domains.
Abstract | Such a model would cluster features in both domains and ensure that at least some of the latent variables are predictive of the label on the source domain. |
Abstract | We introduce a constraint enforcing that the marginal distribution of each cluster (i.e., each latent variable) does not vary significantly across domains.
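The abstract does not spell out how the cross-domain constraint is measured. A minimal sketch of one plausible form is below: it compares the per-domain marginal distributions over latent clusters and penalizes their divergence. The function name, the use of posteriors averaged over examples, and the choice of symmetric KL are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def marginal_mismatch_penalty(q_src, q_tgt):
    """Symmetric KL between per-domain marginals over latent clusters.

    q_src, q_tgt: arrays of shape (n_examples, n_clusters) holding each
    example's posterior over the latent clusters in the source / target
    domain. The penalty is ~0 when the two marginals agree.
    """
    p = q_src.mean(axis=0)  # marginal cluster distribution, source domain
    q = q_tgt.mean(axis=0)  # marginal cluster distribution, target domain
    eps = 1e-12             # guard against log(0)
    kl_pq = np.sum(p * np.log((p + eps) / (q + eps)))
    kl_qp = np.sum(q * np.log((q + eps) / (p + eps)))
    return kl_pq + kl_qp
```

Added to a likelihood objective with some weight, a term like this discourages latent clusters that are populated in one domain but empty in the other — exactly the failure mode the abstract describes.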
Introduction | We use generative latent variable models (LVMs) learned on all the available data: unlabeled data from both domains and labeled data from the source domain.
Introduction | The latent variables encode regularities observed on unlabeled data from both domains, and they are learned to be predictive of the labels on the source domain. |
Introduction | The danger of this semi-supervised approach in the domain-adaptation setting is that some of the latent variables will correspond to clusters of features specific only to the source domain; consequently, a classifier relying on these latent variables will be badly affected when tested on the target domain.
The Latent Variable Model | vectors of latent variables, to abstract away from handcrafted features.
The Latent Variable Model | The model assumes that the features and the latent variable vector are generated jointly from a globally-normalized model and then the label y is generated from a conditional distribution dependent on z.
Background | Petrov and Klein (2007a) derive coarse grammars in a more statistically principled way, although the technique is closely tied to their latent variable grammar representation. |
Experimental Setup | Alternative decoding methods, such as marginalizing over the latent variables in the grammar or MaxRule decoding (Petrov and Klein, 2007a), are certainly possible in our framework, but it is unknown how effective these methods will be given the heavily pruned na-
Introduction | Grammar transformation techniques such as linguistically inspired nonterminal annotations (Johnson, 1998; Klein and Manning, 2003b) and latent variable grammars (Matsuzaki et al., 2005; Petrov et al., 2006) have increased the grammar size |G| from a few thousand rules to several million in an explicitly enumerable grammar, or even more in an implicit grammar. |
Introduction | Rather, the beam-width prediction model is trained to learn the rank of constituents in the maximum likelihood trees. We will illustrate this by presenting results using a latent-variable grammar, for which there is no “true” reference latent variable parse.
Adding Linguistic Knowledge to the Monte-Carlo Framework | Game only: 17.3 / 5.3 (± 2.7); Sentence relevance: 46.7 / 2.8 (± 3.5); Full model: 53.7 / 5.9 (± 3.5); Random text: 40.3 / 4.3 (± 3.4); Latent variable: 26.1 / 3.7 (± 3.1)
Adding Linguistic Knowledge to the Monte-Carlo Framework | Method, % Wins (± Standard Error): Game only 45.7 (± 7.0); Latent variable 62.2 (± 6.9); Full model 78.8 (± 5.8)
Adding Linguistic Knowledge to the Monte-Carlo Framework | The second baseline, latent variable, extends the linear action-value function Q(s, a) of the game only baseline with a set of latent variables — i.e., it is a four-layer neural network whose second layer’s units are activated based only on game information.
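The architecture described above can be sketched roughly as follows: the hidden (latent) layer is computed from game-state features alone, while any remaining features enter downstream. The function names, shapes, ReLU nonlinearity, and the way text features are combined at the output are all assumptions for illustration; the paper's excerpt does not fix these details.

```python
import numpy as np

def q_value(game_feats, text_feats, params):
    """Q(s, a) with a latent hidden layer driven only by game features.

    params = (W1, w_out, w_text):
      W1     : (n_hidden, n_game) weights into the latent layer
      w_out  : (n_hidden,) weights from the latent layer to the output
      w_text : (n_text,) linear weights for the remaining features
    """
    W1, w_out, w_text = params
    h = np.maximum(0.0, W1 @ game_feats)   # latent layer: game info only
    return float(w_out @ h + w_text @ text_feats)
```

The key property matching the description is that `h` never sees `text_feats`, so the latent units summarize game state independently of any textual input.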
Constraints Shape Topics | In topic modeling, collapsed Gibbs sampling (Griffiths and Steyvers, 2004) is a standard procedure for obtaining a Markov chain over the latent variables in the model. |
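For concreteness, here is a compact sketch of the standard collapsed Gibbs sampler for vanilla LDA referenced above (Griffiths and Steyvers, 2004): the topic–word and document–topic count statistics are maintained incrementally, and each token's topic is resampled from its conditional given all other assignments. Hyperparameter values, the function signature, and the fixed random seed are illustrative choices, not from the source.

```python
import numpy as np

def collapsed_gibbs_lda(docs, n_topics, n_vocab, n_iters=50, alpha=0.1, beta=0.01):
    """One chain of collapsed Gibbs sampling for vanilla LDA.

    docs: list of lists of word ids. Returns the final topic assignments
    and the count statistics maintained by the sampler.
    """
    rng = np.random.default_rng(0)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, n_vocab))     # topic-word counts
    n_k = np.zeros(n_topics)                 # per-topic totals
    z = [rng.integers(n_topics, size=len(d)) for d in docs]
    for d, doc in enumerate(docs):           # initialize counts
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # conditional for z_{d,i} given all other assignments
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + n_vocab * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # add new assignment back
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

This also makes the next excerpt's point concrete: the count arrays (`n_dk`, `n_kw`, `n_k`) normally change only when a token's latent assignment changes, so any structural change to the model must likewise be reflected in these statistics.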
Constraints Shape Topics | Typically, these only change based on assignments of latent variables in the sampler; in Section 4 we describe how changes in the model’s structure (in addition to the latent state) can be reflected in these count statistics. |
Interactively adding constraints | In the more general case, when words lack a unique path in the constraint tree, an additional latent variable specifies which possible paths in the constraint tree produced the word; this would have to be sampled. |
Joint Translation Model | structural part and their associated probabilities define a model p(θ) over the latent variable θ determining the recursive, reordering and phrase-pair segmenting structure of translation, as in Figure 4.
Learning Translation Structure | It works iteratively on a partition of the training data, climbing the likelihood of the training data while cross-validating the latent variable values: for every training data point, it considers only those values which can be produced by models built from the rest of the data, excluding the part containing that point.
Related Work | The rich linguistically motivated latent variable learnt by our method delivers translation performance that compares favourably to a state-of-the-art system. |