A Joint Model with Unlabeled Parallel Text | sentiment) bilingual (in L1 and L2) parallel data U that are defined as follows. |
A Joint Model with Unlabeled Parallel Text | where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U. |
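The two-term objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each language has a binary logistic-regression (MaxEnt) classifier over sparse feature dictionaries. It also assumes that the unlabeled term marginalizes a single sentiment label shared by both sides of each parallel pair. All function names are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def p_pos(w, x):
    # P(y = 1 | x; w) for a logistic model; w and x are {feature: value} dicts.
    return sigmoid(sum(w.get(f, 0.0) * v for f, v in x.items()))

def joint_log_likelihood(w1, w2, D1, D2, U):
    """Hypothetical joint objective: labeled term for v in {1, 2} plus unlabeled term over U."""
    ll = 0.0
    # First term: labeled examples (x, y), y in {1, 0}, in each language.
    for w, D in ((w1, D1), (w2, D2)):
        for x, y in D:
            p = p_pos(w, x)
            ll += math.log(p if y == 1 else 1.0 - p)
    # Second term (assumed form): each parallel pair (x1, x2) shares one
    # unobserved label, which is summed out.
    for x1, x2 in U:
        p1, p2 = p_pos(w1, x1), p_pos(w2, x2)
        ll += math.log(p1 * p2 + (1.0 - p1) * (1.0 - p2))
    return ll
```

Under this assumed form, a parallel pair adds little to the likelihood unless the two classifiers agree on its polarity. This is how the unlabeled data couples the two models.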
A Joint Model with Unlabeled Parallel Text | However, there could be considerable noise in real-world parallel data, i.e. |
Abstract | We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data. |
Experimental Setup 4.1 Data Sets and Preprocessing | We also try to remove neutral sentences from the parallel data since they can introduce noise into our model, which deals only with positive and negative examples. |
Experimental Setup 4.1 Data Sets and Preprocessing | Co-Training with SVMs (Co-SVM): This method applies SVM-based co-training given both the labeled training data and the unlabeled parallel data following Wan (2009). |
Introduction | We furthermore find that improvements, albeit smaller, are obtained when the parallel data is replaced with a pseudo-parallel (i.e. |
Results and Analysis | By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting. |
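The sketch below shows how the accuracy gains relate to the error-reduction figures. It assumes the usual definition, in which relative error reduction is the accuracy gain divided by the baseline's error rate. Inverting that definition gives the MaxEnt error rate implied by each reported pair. These are back-of-envelope values subject to rounding, not numbers reported in the text.

```python
def error_reduction(baseline_acc, new_acc):
    """Relative error reduction; accuracies are given in percent."""
    baseline_err = 100.0 - baseline_acc
    return (new_acc - baseline_acc) / baseline_err

def implied_baseline_error(gain, reduction):
    """Invert the definition: baseline error (%) = accuracy gain / relative reduction."""
    return gain / reduction

# (accuracy gain in points, relative error reduction) as reported above.
reported = {
    ("English", "setting 1"): (8.12, 0.3327),
    ("Chinese", "setting 1"): (3.44, 0.1692),
    ("English", "setting 2"): (5.07, 0.1967),
    ("Chinese", "setting 2"): (3.87, 0.194),
}

for key, (gain, red) in reported.items():
    err = implied_baseline_error(gain, red)
    print(key, f"implied MaxEnt error ~ {err:.1f}%, accuracy ~ {100 - err:.1f}%")
```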
Introduction | Of course, for many language pairs and domains, parallel data is not available. |
Machine Translation as a Decipherment Task | We now turn to the problem of MT without parallel data. |
Machine Translation as a Decipherment Task | Next, we present two novel decipherment approaches for MT training without parallel data. |
Machine Translation as a Decipherment Task | Bayesian Decipherment: We introduce a novel method for estimating IBM Model 3 parameters without parallel data, using Bayesian learning. |
Word Substitution Decipherment | Before we tackle machine translation without parallel data, we first solve a simpler problem: word substitution decipherment. |
Conclusion | thank Amarnag Subramanya for helping us with the implementation of label propagation and Shankar Kumar for access to the parallel data. |
Experiments and Results | The parallel data came from the Europarl corpus (Koehn, 2005) and the ODS United Nations dataset (UN, 2006). |
Experiments and Results | Taking the intersection of languages in these resources, and selecting languages with large amounts of parallel data, yields the following set of eight Indo-European languages: Danish, Dutch, German, Greek, Italian, Portuguese, Spanish and Swedish. |
Experiments and Results | Projection: Our third baseline incorporates bilingual information by projecting POS tags directly across alignments in the parallel data. |
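Direct projection can be illustrated with the minimal sketch below. It is not the baseline's actual code. It assumes that word alignments are given as (source_index, target_index) pairs. It also assumes that when several source tokens align to one target token, the tag of the lowest-indexed source token is kept. That tie-break is an arbitrary choice made for illustration. Unaligned target tokens receive no tag.

```python
from typing import List, Optional, Sequence, Tuple

def project_tags(src_tags: Sequence[str],
                 alignment: Sequence[Tuple[int, int]],
                 tgt_len: int) -> List[Optional[str]]:
    """Copy each source token's POS tag to the target token it is aligned to."""
    tgt_tags: List[Optional[str]] = [None] * tgt_len
    # Sorting by (source, target) index means the lowest source index
    # is visited first for any target token.
    for i, j in sorted(alignment):
        if tgt_tags[j] is None:  # many-to-one: keep the first tag seen
            tgt_tags[j] = src_tags[i]
    return tgt_tags  # unaligned target tokens stay None

# Usage: a three-word source sentence aligned to a four-word target sentence.
print(project_tags(["DET", "NOUN", "VERB"], [(0, 0), (2, 1), (1, 2)], 4))
```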
Introduction | To bridge this gap, we consider a practically motivated scenario, in which we want to leverage existing resources from a resource-rich language (like English) when building tools for resource-poor foreign languages. We assume that absolutely no labeled training data is available for the foreign language of interest, but that we have access to parallel data with a resource-rich language. |
Conclusion | We introduced a data collection framework that produces highly parallel data by asking different annotators to describe the same video segments. |
Discussions and Future Work | While our data collection framework yields useful parallel data, it also has some limitations. |
Discussions and Future Work | By pairing up descriptions of the same video in different languages, we obtain parallel data without requiring any bilingual skills. |
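The pairing step can be sketched as shown here. This is a hypothetical illustration, not the authors' pipeline. It assumes that each description arrives as a (video_id, language, text) record. Every description of a video in one language is paired with every description of the same video in the other language. Videos that lack descriptions in either language contribute no pairs.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, List, Tuple

def pair_descriptions(records: Iterable[Tuple[str, str, str]],
                      lang_a: str, lang_b: str) -> List[Tuple[str, str]]:
    """Pair each lang_a description of a video with each lang_b description of it."""
    # Group description texts by video, then by language.
    by_video: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for video_id, lang, text in records:
        by_video[video_id][lang].append(text)
    pairs: List[Tuple[str, str]] = []
    for langs in by_video.values():
        # .get avoids creating empty entries; a missing language yields no pairs.
        pairs.extend(product(langs.get(lang_a, []), langs.get(lang_b, [])))
    return pairs
```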
Experiments | We quantified the utility of our highly parallel data by computing the correlation between BLEU and human ratings when different numbers of references were available. |
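The sketch below covers only the correlation step. It assumes Pearson correlation; the text does not say which correlation measure was used. It also assumes a hypothetical data layout: bleu[k][i] is the BLEU score of output i computed with k references, and human[i] is the human rating of the same output. Computing BLEU itself is out of scope here.

```python
import math
from typing import Dict, List, Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # undefined if either sample is constant

def correlation_by_num_refs(bleu: Dict[int, List[float]],
                            human: List[float]) -> Dict[int, float]:
    """Map each number of references k to corr(BLEU with k refs, human ratings)."""
    return {k: pearson(scores, human) for k, scores in sorted(bleu.items())}
```

Under this sketch, a higher correlation for larger k would indicate that additional references make BLEU track human judgments more closely.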