Index of papers in Proc. ACL 2011 that mention
  • parallel data
Lu, Bin and Tan, Chenhao and Cardie, Claire and Tsou, Benjamin K.
A Joint Model with Unlabeled Parallel Text
… unlabeled (with respect to sentiment) bilingual (in L1 and L2) parallel data U that are defined as follows.
A Joint Model with Unlabeled Parallel Text
where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of the labeled data D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
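The equation this excerpt annotates is not reproduced in the index. A plausible reconstruction of the joint log-likelihood it describes, sketched from the surrounding description rather than quoted from the paper, with per-language classifiers p(y | x; θ_v) and one shared latent sentiment label y per parallel pair, is:

\[
\mathcal{L}(\theta_1, \theta_2) = \sum_{v \in \{1,2\}} \sum_{(x, y) \in D_v} \log p(y \mid x; \theta_v) \;+\; \sum_{(x^{(1)}, x^{(2)}) \in U} \log \sum_{y} p(y \mid x^{(1)}; \theta_1)\, p(y \mid x^{(2)}; \theta_2)
\]

The second term is what couples the two monolingual models: each unlabeled parallel pair is assumed to express the same (unobserved) sentiment in both languages.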
A Joint Model with Unlabeled Parallel Text
However, there could be considerable noise in real-world parallel data, i.e. …
Abstract
We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
Experimental Setup: 4.1 Data Sets and Preprocessing
We also try to remove neutral sentences from the parallel data since they can introduce noise into our model, which deals only with positive and negative examples.
Experimental Setup: 4.1 Data Sets and Preprocessing
Co-Training with SVMs (Co-SVM): This method applies SVM-based co-training given both the labeled training data and the unlabeled parallel data following Wan (2009).
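Wan (2009)-style co-training treats the two languages as two views of the same sentence: each view's classifier labels the unlabeled parallel pairs it is most confident about, and those pairs join the training pool with their predicted labels. A minimal sketch with scikit-learn (feature extraction, growth size, and round count are illustrative assumptions, not the paper's settings):

    # Illustrative sketch of SVM-based co-training over two language views.
    # Each unlabeled parallel pair gives an L1 view and an L2 view of the
    # same (unknown) sentiment label; labels are assumed to be ints (e.g. +1/-1).
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC

    def co_train(lab_1, lab_2, y, unlab_1, unlab_2, rounds=10, grow=5):
        X1 = TfidfVectorizer().fit_transform(lab_1 + unlab_1)
        X2 = TfidfVectorizer().fit_transform(lab_2 + unlab_2)
        labeled = list(range(len(lab_1)))            # rows with known labels
        pool = list(range(len(lab_1), X1.shape[0]))  # unlabeled parallel pairs
        y = list(y)
        for _ in range(rounds):
            clf1 = LinearSVC().fit(X1[labeled], y)
            clf2 = LinearSVC().fit(X2[labeled], y)
            if not pool:
                break
            # Each view promotes the pairs it is most confident about (largest
            # SVM margin); a fuller implementation balances classes and lets
            # each view teach only the other view's classifier.
            for clf, X in ((clf1, X1), (clf2, X2)):
                margins = np.abs(clf.decision_function(X[pool]))
                picks = np.argsort(-margins)[:grow]
                for p in sorted(picks, reverse=True):
                    row = pool.pop(p)
                    labeled.append(row)
                    y.append(int(clf.predict(X[row])[0]))
        return clf1, clf2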
Introduction
We furthermore find that improvements, albeit smaller, are obtained when the parallel data is replaced with a pseudo-parallel (i.e., automatically translated) corpus.
Results and Analysis
By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
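As a consistency check on these figures, assuming the standard definition of relative error reduction as absolute accuracy gain divided by baseline error:

\[
\text{error reduction} = \frac{\Delta \text{acc}}{\text{err}_{\text{baseline}}} \quad\Rightarrow\quad \text{err}_{\text{baseline}} = \frac{8.12\%}{0.3327} \approx 24.4\%
\]

i.e., the English MaxEnt baseline in the first setting sits near 75.6% accuracy.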
parallel data is mentioned in 25 sentences in this paper.
Ravi, Sujith and Knight, Kevin
Introduction
Of course, for many language pairs and domains, parallel data is not available.
Machine Translation as a Decipherment Task
We now turn to the problem of MT without parallel data.
Machine Translation as a Decipherment Task
Next, we present two novel decipherment approaches for MT training without parallel data.
Machine Translation as a Decipherment Task
Bayesian Decipherment: We introduce a novel method for estimating IBM Model 3 parameters without parallel data, using Bayesian learning.
Word Substitution Decipherment
Before we tackle machine translation without parallel data, we first solve a simpler problem—word substitution decipherment.
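Both decipherment approaches fit a translation (or substitution) channel with no observed input-output pairs. In the standard noisy-channel decipherment formulation (a sketch consistent with these excerpts, not an equation quoted from the paper), the channel parameters θ are chosen to maximize the marginal likelihood of the observed ciphertext alone:

\[
\hat{\theta} = \arg\max_{\theta} \prod_{c} \sum_{e} P(e)\, P_{\theta}(c \mid e)
\]

where P(e) is a fixed language model over candidate plaintexts e, the sum ranges over all plaintexts the model allows, and P_θ(c | e) is the channel; EM and the paper's Bayesian method are two ways of carrying out this estimation.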
parallel data is mentioned in 12 sentences in this paper.
Das, Dipanjan and Petrov, Slav
Conclusion
We thank Amarnag Subramanya for helping us with the implementation of label propagation and Shankar Kumar for access to the parallel data.
Experiments and Results
The parallel data came from the Europarl corpus (Koehn, 2005) and the ODS United Nations dataset (UN, 2006).
Experiments and Results
Taking the intersection of languages in these resources, and selecting languages with large amounts of parallel data , yields the following set of eight Indo-European languages: Danish, Dutch, German, Greek, Italian, Portuguese, Spanish and Swedish.
Experiments and Results
Projection: Our third baseline incorporates bilingual information by projecting POS tags directly across alignments in the parallel data.
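The projection baseline is simple to state: tag the English side with a supervised tagger, word-align the parallel text, and copy each English tag across its alignment link. A minimal sketch (the data layout is an assumption; the paper aggregates projected tags into type-level distributions rather than trusting single links):

    # Illustrative sketch of direct POS projection across word alignments.
    # english_tags: list of (word, tag) from a supervised English tagger.
    # foreign: list of foreign words for the same sentence.
    # alignments: list of (en_index, fr_index) links from a word aligner.
    def project_tags(english_tags, foreign, alignments):
        projected = [None] * len(foreign)   # None = no aligned English word
        for en_i, fr_i in alignments:
            _, tag = english_tags[en_i]
            # If several English words align to one foreign word, the last
            # link wins in this sketch; real systems vote or filter to 1-to-1.
            projected[fr_i] = tag
        return list(zip(foreign, projected))

    # Example: "the house" -> "das Haus" with links (0,0), (1,1)
    print(project_tags([("the", "DET"), ("house", "NOUN")],
                       ["das", "Haus"], [(0, 0), (1, 1)]))
    # [('das', 'DET'), ('Haus', 'NOUN')]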
Introduction
To bridge this gap, we consider a practically motivated scenario, in which we want to leverage existing resources from a resource-rich language (like English) when building tools for resource-poor foreign languages. We assume that absolutely no labeled training data is available for the foreign language of interest, but that we have access to parallel data with a resource-rich language.
parallel data is mentioned in 5 sentences in this paper.
Chen, David and Dolan, William
Conclusion
We introduced a data collection framework that produces highly parallel data by asking different annotators to describe the same video segments.
Discussions and Future Work
While our data collection framework yields useful parallel data , it also has some limitations.
Discussions and Future Work
By pairing up descriptions of the same video in different languages, we obtain parallel data without requiring any bilingual skills.
Experiments
We quantified the utility of our highly parallel data by computing the correlation between BLEU and human ratings when different numbers of references were available.
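Measuring the utility of extra references amounts to scoring each system output against its first k references and correlating the scores with human judgments as k grows. A minimal sketch with NLTK and SciPy (the function and data layout are assumptions, not the authors' code):

    # Illustrative sketch: correlation between multi-reference BLEU and
    # human ratings as the number of available references k varies.
    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
    from scipy.stats import pearsonr

    def bleu_human_correlation(hypotheses, reference_sets, human_scores, k):
        """hypotheses: list of token lists; reference_sets: list of lists
        of token lists (all references per sentence); human_scores: one
        rating per hypothesis. Uses only the first k references."""
        smooth = SmoothingFunction().method1
        bleus = [sentence_bleu(refs[:k], hyp, smoothing_function=smooth)
                 for hyp, refs in zip(hypotheses, reference_sets)]
        r, _ = pearsonr(bleus, human_scores)
        return r

    # Sweeping k shows how correlation changes with more parallel
    # descriptions per video:
    # for k in range(1, 11):
    #     print(k, bleu_human_correlation(hyps, refsets, ratings, k))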
parallel data is mentioned in 4 sentences in this paper.