Abstract | In evaluations on various highly inflected languages, this joint model outperforms both a baseline tagger in morphological disambiguation and a pipeline parser in head selection. |
Baselines | To ensure a meaningful comparison with the joint model, our two baselines are both implemented in the same graphical model framework and trained with the same machine-learning algorithm. |
Baselines | Roughly speaking, they divide up the variables and factors of the joint model and train them separately. |
Baselines | The tagger is a graphical model with the WORD and TAG variables, connected by the local factors TAG-UNIGRAM, TAG-BIGRAM, and TAG-CONSISTENCY, all used in the joint model (§3). |
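A minimal sketch, assuming a log-linear parameterization, of how a factor graph over WORD and TAG variables might score one tag assignment. The factor names TAG-UNIGRAM, TAG-BIGRAM, and TAG-CONSISTENCY come from the extract above; the weight dictionary and feature keys are hypothetical, not the authors' implementation.

```python
from typing import Dict, List, Tuple

def score_tagging(words: List[str], tags: List[str],
                  weights: Dict[Tuple, float]) -> float:
    """Sum the log-potentials of the local factors for one tag assignment."""
    total = 0.0
    for i, (word, tag) in enumerate(zip(words, tags)):
        total += weights.get(("TAG-UNIGRAM", tag), 0.0)            # tag prior
        total += weights.get(("TAG-CONSISTENCY", word, tag), 0.0)  # word/tag fit
        if i > 0:
            total += weights.get(("TAG-BIGRAM", tags[i - 1], tag), 0.0)
    return total
```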
Experimental Results | We compare the performance of the pipeline model (§4) and the joint model (§3) on morphological disambiguation and unlabeled dependency parsing. |
Experimental Setup | The output of the joint model is the assignment to the TAG and LINK variables. |
Experimental Setup | In principle, the joint model should consider every possible combination of morphological attributes for every word. |
Introduction | After a description of previous work (§2), the joint model (§3) will be contrasted with the baseline pipeline model (§4). |
Joint Model | If fully implemented in our joint model, these features would necessitate two separate families of link factors: O(n³m³) factors for the POS trigrams and O(n²m⁴) factors for the POS 4-grams. |
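To make the asymptotics concrete, here is a back-of-the-envelope factor count for a hypothetical sentence length n and tag-set size m; the numbers are illustrative, not from the paper.

```python
n, m = 20, 10  # hypothetical sentence length and candidate tags per word

pos_trigram_factors = n**3 * m**3   # O(n^3 m^3) link factors
pos_4gram_factors   = n**2 * m**4   # O(n^2 m^4) link factors

print(pos_trigram_factors)  # 8000000
print(pos_4gram_factors)    # 4000000
```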
Previous Work | Goldberg and Tsarfaty (2008) propose a generative joint model . |
A Joint Model with Unlabeled Parallel Text | 3.2 The Joint Model |
A Joint Model with Unlabeled Parallel Text | Since previous work (Banea et al., 2008, 2010; Wan, 2009) has shown that automatically translating the labeled data from the source language into the target language can be useful, we can further incorporate such translated labeled data into the joint model by adding the following component to Equation 6: |
Conclusion | In this paper, we study bilingual sentiment classification and propose a joint model that simultaneously learns better monolingual sentiment classifiers for both languages by exploiting an unlabeled parallel corpus together with the labeled data available in each language. |
Experimental Setup 4.1 Data Sets and Preprocessing | In our experiments, the proposed joint model is compared with the following baseline methods. |
Introduction | In Section 3, the proposed joint model is described. |
Related Work | Another notable approach is the work of Boyd-Graber and Resnik (2010), which presents a generative model (supervised multilingual latent Dirichlet allocation) that jointly models topics that are consistent across languages and employs them to better predict sentiment ratings. |
Results and Analysis | We first compare the proposed joint model (Joint) with the baselines in Table 2. |
Results and Analysis | Overall, the unlabeled parallel data improves classification accuracy for both languages when using our proposed joint model and Co-SVM. |
Results and Analysis | The joint model makes better use of the unlabeled parallel data than Co-SVM or TSVMs, presumably because it jointly optimizes the two monolingual models via soft (probabilistic) assignments of the unlabeled instances to classes in each iteration, rather than the hard assignments used by Co-SVM and TSVMs. |
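A minimal sketch of the soft-versus-hard distinction drawn here, assuming softmax-normalized classifier scores; the function names and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def soft_assignments(scores: np.ndarray) -> np.ndarray:
    """Probabilistic class assignments (joint-model style): row-wise softmax."""
    e = np.exp(scores - scores.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def hard_assignments(scores: np.ndarray) -> np.ndarray:
    """One-hot argmax assignments (Co-SVM / TSVM style)."""
    onehot = np.zeros_like(scores)
    onehot[np.arange(len(scores)), scores.argmax(axis=1)] = 1.0
    return onehot
```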
Abstract | We learn a joint model of sentence extraction and compression for multi-document summarization. |
Efficient Prediction | By solving the following ILP, we can compute the arg max required for prediction in the joint model: |
Experiments | Figure 4: Example summaries produced by our learned joint model of extraction and compression. |
Joint Model | Learning weights for Objective 2, where Y(x) is the set of compressive summaries and C(y) is the set of broken edges that produce subtree deletions, gives our LEARNED COMPRESSIVE system, our joint model of extraction and compression. |
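The actual ILP belongs to the paper; the following is only a hedged sketch of the general pattern it instantiates, selecting binary indicators over candidate units under a length budget with the PuLP library. The unit names, scores, and budget are invented for illustration, and the real objective additionally scores the subtree-deletion (edge-cut) decisions in C(y).

```python
import pulp  # pip install pulp

# Hypothetical candidate units: name -> (length in words, model score).
units = {"u1": (12, 3.0), "u2": (8, 2.1), "u3": (15, 2.8)}
budget = 25  # maximum summary length in words

prob = pulp.LpProblem("joint_extract_compress", pulp.LpMaximize)
x = {u: pulp.LpVariable(f"x_{u}", cat="Binary") for u in units}

prob += pulp.lpSum(units[u][1] * x[u] for u in units)            # total score
prob += pulp.lpSum(units[u][0] * x[u] for u in units) <= budget  # length cap

prob.solve()
chosen = [u for u in units if x[u].value() == 1]
```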
Joint Translation Model | Phrase pairs are emitted jointly, and the overall probabilistic SCFG is a joint model over parallel strings. |
Joint Translation Model | By splitting the joint model into a hierarchical structure model and a lexical emission model, we can estimate the two models separately. |
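In symbols, the split described above amounts to a factorization along the following lines, where T is a derivation of the parallel string pair (e, f); this is a sketch in made-up notation, not the paper's.

```latex
\[
  p(T, \mathbf{e}, \mathbf{f})
  = p_{\text{struct}}(T)
    \prod_{r \in T} p_{\text{emit}}\!\left(\langle e_r, f_r \rangle \mid r\right)
\]
```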
Related Work | We show that a translation system based on such a joint model can perform competitively with conditional probability models when it is augmented with a rich latent hierarchical structure trained so as to avoid overfitting. |