Abstract | In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data. |
Abstract | By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly. |
Experiment | CLMM includes two hyper-parameters (λs and λt) controlling the contribution of unlabeled parallel data. |
Experiment | 4.5 The Influence of Unlabeled Parallel Data |
Experiment | In this subsection, we investigate how the size of the unlabeled parallel data affects sentiment classification. |
Introduction | Instead of relying on unreliable machine-translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language. |
Introduction | CLMM is a generative model that treats the source language and target language words in parallel data as generated simultaneously by a set of mixture components. |
Introduction | This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language. |
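Introduction | To make this generative story concrete, a minimal sketch of such a mixture likelihood for a parallel sentence pair $(w^s, w^t)$ is given below; the conditional independence of words given the component and the exact parameterization are assumptions of this sketch, not necessarily the paper's formulation. |

$$
P(w^s, w^t) \;=\; \sum_{c} P(c)\, \prod_{i} P\big(w^s_i \mid c\big)\, \prod_{j} P\big(w^t_j \mid c\big),
$$

Introduction | where $c$ ranges over the mixture components (e.g., sentiment polarities) and the word-emission distributions for both languages are fit by maximum likelihood on the unlabeled parallel data, e.g., with EM. |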
Abstract | We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. |
Abstract | When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on nonparallel sentences. |
Conclusions | We show that an agreement signal extracted from parallel data provides indirect supervision capable of substantially improving a state-of-the-art model for semantic role induction. |
Introduction | The goal of this work is to show that parallel data is useful in unsupervised induction of shallow semantic representations. |
Multilingual Extension | As we argued in Section 1, our goal is to penalize disagreement between the semantic structures predicted for each language on parallel data. |
Multilingual Extension | Intuitively, when two arguments are aligned in parallel data, we expect them to be labeled with the same semantic role in both languages. |
Multilingual Extension | Specifically, we augment the joint probability with a penalty term computed on parallel data: |
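Multilingual Extension | The equation itself did not survive extraction; a generic form consistent with the surrounding description is sketched below, where $\Delta$ and $\gamma$ are notation introduced for this sketch rather than taken from the paper. |

$$
P\big(y^{(1)}, y^{(2)}\big) \;\propto\; P\big(y^{(1)}\big)\, P\big(y^{(2)}\big)\, \exp\!\big\{-\gamma\, \Delta\big(y^{(1)}, y^{(2)}\big)\big\},
$$

Multilingual Extension | where $y^{(1)}$ and $y^{(2)}$ are the role labelings of the two aligned sentences, $\Delta$ counts aligned argument pairs assigned different roles, and $\gamma \geq 0$ controls the strength of the agreement signal. |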
Abstract | Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose. |
Inferring a learning curve from mostly monolingual data | However, when a configuration of four initial points is used for the same amount of "seed" parallel data, it outperforms both configurations with three initial points. |
Inferring a learning curve from mostly monolingual data | The ability to predict the amount of parallel data required to achieve a given level of quality is very valuable in planning business deployments of statistical machine translation; yet, we are not aware of any rigorous proposal for addressing this need. |
Introduction | Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific business purpose. |
Introduction | This prediction, or more generally the prediction of the learning curve of an SMT system as a function of available in-domain parallel data, is the objective of this paper. |
Introduction | The results show that without any parallel data the expected translation accuracy at 75K segments can be predicted within an error of 6 BLEU points (Table 4), while using a seed training corpus of 10K segments narrows this error to within 1.5 points (Table 6). |
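Introduction | As an illustration of this kind of learning-curve extrapolation, the sketch below fits a simple power-law curve to a handful of seed measurements and extrapolates to 75K segments; the functional form, the seed points, and all numbers are assumptions for illustration, not the paper's model or results. |

```python
# Sketch: extrapolating an SMT learning curve from a few seed measurements.
# The power-law form BLEU(n) = a - b * n**(-c) and all numbers below are
# illustrative assumptions, not the paper's model or data.
import numpy as np
from scipy.optimize import curve_fit

def bleu_curve(n, a, b, c):
    """Power-law learning curve: approaches the ceiling a as data size n grows."""
    return a - b * n ** (-c)

# Hypothetical seed points: (training size in segments, measured BLEU).
sizes = np.array([1_000, 2_500, 5_000, 10_000], dtype=float)
bleu = np.array([18.2, 21.5, 24.1, 26.3])

params, _ = curve_fit(bleu_curve, sizes, bleu, p0=(40.0, 100.0, 0.3), maxfev=10_000)

# Extrapolate the fitted curve to a 75K-segment corpus.
print(f"predicted BLEU at 75K segments: {bleu_curve(75_000, *params):.1f}")
```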
Experimental Evaluation | We also compare the results on these corpora to a system trained on parallel data. |
Experimental Evaluation | Och (2002) reports results of 48.2 BLEU for a single-word based translation system and 56.1 BLEU using the alignment template approach, both trained on parallel data. |
Related Work | Unsupervised training of statistical translation systems without parallel data and related problems have been addressed before. |
Related Work | Close to the methods described in this work, Ravi and Knight (2011) treat training and translation without parallel data as a deciphering problem. |
Related Work | They perform experiments on a Spanish-English task with vocabulary sizes of about 500 words and achieve a performance of around 20 BLEU compared to 70 BLEU obtained by a system that was trained on parallel data. |
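Related Work | The decipherment view can be summarized by the following objective, a standard formulation in this line of work (the notation here is generic): the channel parameters $\theta$ are chosen to maximize the likelihood of the observed foreign text $f$, marginalizing over latent English "plaintexts" $e$ scored by a language model. |

$$
\hat{\theta} \;=\; \arg\max_{\theta} \prod_{f} \, \sum_{e} P_{\mathrm{LM}}(e)\, P_{\theta}(f \mid e)
$$

Related Work | Training then typically proceeds with EM or an approximation of it over this marginal likelihood, and translation amounts to decoding the most probable $e$ under the learned channel and the language model. |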