Index of papers in Proc. ACL 2011 that mention
  • parallel corpus
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Experiments
The Wikipedia InterLanguage Links shared task data contains a much larger proportion of transliterations than a parallel corpus.
Experiments
For English/Arabic, we use a freely available parallel corpus from the United Nations (UN) (Eisele and Chen, 2010).
Experiments
English/Hindi parallel corpus.
Extraction of Transliteration Pairs
Initially, we extract a list of word pairs from a word-aligned parallel corpus using GIZA++.
Extraction of Transliteration Pairs
Initially, the parallel corpus is word-aligned using GIZA++ (Och and Ney, 2003), and the alignments are refined using the grow-diag-final-and heuristic (Koehn et al., 2003).
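For reference, here is a minimal Python sketch of the grow-diag-final-and refinement heuristic cited above (Koehn et al., 2003). This is an illustrative reconstruction, not the authors' code; the representation of alignments as sets of (source_index, target_index) pairs is an assumption.

    def grow_diag_final_and(e2f, f2e):
        """Symmetrize two directional word alignments (Koehn et al., 2003).

        e2f, f2e: sets of (source_index, target_index) links from the two
        directional GIZA++ runs, mapped into a shared indexing.
        """
        union = e2f | f2e
        alignment = e2f & f2e
        aligned_src = {i for i, _ in alignment}
        aligned_tgt = {j for _, j in alignment}
        neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                     (-1, -1), (-1, 1), (1, -1), (1, 1)]
        # grow-diag: repeatedly add union links adjacent to current links
        # that touch an as-yet-unaligned word
        changed = True
        while changed:
            changed = False
            for i, j in sorted(alignment):
                for di, dj in neighbors:
                    cand = (i + di, j + dj)
                    if (cand in union and cand not in alignment
                            and (cand[0] not in aligned_src
                                 or cand[1] not in aligned_tgt)):
                        alignment.add(cand)
                        aligned_src.add(cand[0])
                        aligned_tgt.add(cand[1])
                        changed = True
        # final-and: add remaining directional links whose source and
        # target words are both still unaligned
        for direction in (e2f, f2e):
            for i, j in sorted(direction):
                if i not in aligned_src and j not in aligned_tgt:
                    alignment.add((i, j))
                    aligned_src.add(i)
                    aligned_tgt.add(j)
        return alignment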
Extraction of Transliteration Pairs
The reason is that the parallel corpus contains inflectional variants of the same word.
Introduction
In this paper, we show that it is possible to extract transliteration pairs from a parallel corpus using an unsupervised method.
Introduction
The NEWS10 data sets are extracted from Wikipedia InterLanguage Links (WIL), which consist of parallel phrases, whereas a parallel corpus consists of parallel sentences.
Models
The training data is a list of word pairs (a source word and its presumed transliteration) extracted from a word-aligned parallel corpus.
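A hedged sketch of how such word pairs might be collected from a symmetrized alignment. The bitext/alignment data format and the restriction to unambiguous 1-to-1 links are assumptions for illustration, not details taken from the paper.

    from collections import Counter

    def extract_word_pairs(bitext, alignments):
        """Collect (source word, target word) pairs from 1-to-1 links.

        bitext: list of (src_tokens, tgt_tokens) sentence pairs.
        alignments: per-sentence sets of (src_index, tgt_index) links,
        e.g. the symmetrized GIZA++ output (format is an assumption).
        """
        pairs = Counter()
        for (src, tgt), links in zip(bitext, alignments):
            src_fanout = Counter(i for i, _ in links)
            tgt_fanout = Counter(j for _, j in links)
            for i, j in links:
                # keep only unambiguous 1-to-1 links as candidates
                if src_fanout[i] == 1 and tgt_fanout[j] == 1:
                    pairs[(src[i], tgt[j])] += 1
        return pairs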
parallel corpus is mentioned in 13 sentences in this paper.
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
We also consider the case where a parallel corpus is not available: to obtain a pseudo-parallel corpus U (i.e.
Conclusion
In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.
Experimental Setup 4.1 Data Sets and Preprocessing
For the unlabeled parallel text, we use the ISI Chinese-English parallel corpus (Munteanu and Marcu, 2005), which was extracted automatically from news articles published by Xinhua News Agency in the Chinese Gigaword (2nd Edition) and English Gigaword (2nd Edition) collections.
Experimental Setup 4.1 Data Sets and Preprocessing
We choose the most confidently predicted 10,000 positive and 10,000 negative pairs to constitute the unlabeled parallel corpus U for each data setting.
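One illustrative way to realize this selection step, assuming a classifier that assigns each candidate pair a confidence score in [0, 1] of being positive; the scoring model itself is outside this sketch and is not specified by the snippet above.

    def select_pseudo_parallel(pairs, scores, n_pos=10_000, n_neg=10_000):
        """Pick the most confidently predicted positive/negative pairs.

        pairs:  candidate (source_doc, target_doc) pairs.
        scores: model confidence in [0, 1] that a pair is positive.
        Assumes len(pairs) >= n_pos + n_neg so the two sets are disjoint.
        """
        ranked = sorted(zip(pairs, scores), key=lambda x: x[1])
        negatives = [p for p, _ in ranked[:n_neg]]    # lowest scores
        positives = [p for p, _ in ranked[-n_pos:]]   # highest scores
        return positives, negatives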
Introduction
Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
Related Work
(2007), for example, generate subjectivity analysis resources in a new language from English sentiment resources by leveraging a bilingual dictionary or a parallel corpus.
Results and Analysis
In our experiments, the methods are tested in the two data settings with the corresponding unlabeled parallel corpus, as mentioned in Section 4.
parallel corpus is mentioned in 7 sentences in this paper.
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Experimental Evaluation
We use the first 100k sentences of the parallel corpus for the TM, and the whole parallel corpus for the LM.
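A trivial sketch of this data split, assuming the corpus is held as a list of (source, target) sentence pairs; the representation is an assumption for illustration.

    def split_for_tm_and_lm(parallel_corpus, tm_size=100_000):
        """First `tm_size` sentence pairs for the translation model;
        the target side of the full corpus for the language model."""
        tm_data = parallel_corpus[:tm_size]
        lm_data = [tgt for _, tgt in parallel_corpus]
        return tm_data, lm_data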
Experimental Evaluation
Finally, we varied the size of the parallel corpus for the Japanese-English task from 50k to 400k sentences.
Experimental Evaluation
[Figure: x-axis "Parallel Corpus Size" with ticks 50k, 100k, 200k, 400k]
parallel corpus is mentioned in 4 sentences in this paper.
Subotin, Michael
Conclusion
This paper has introduced a scalable exponential phrase model for target languages with complex morphology that can be trained on the full parallel corpus.
Conclusion
The results suggest that the model should be especially useful for languages with sparser resources, but that performance improvements can be obtained even for a very large parallel corpus.
Exponential phrase models with shared features
We obtain the counts of instances and features from the standard heuristics used to extract the grammar from a word-aligned parallel corpus.
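For reference, a simplified Python sketch of the standard consistent-phrase-pair extraction heuristic (Koehn et al., 2003) alluded to above. The extension over unaligned boundary words is omitted, so this is an approximation rather than the exact procedure used in the paper.

    def extract_phrases(src, tgt, alignment, max_len=7):
        """Extract phrase pairs consistent with a word alignment.

        alignment: set of (src_index, tgt_index) links for one sentence
        pair (representation is an assumption).
        """
        pairs = []
        for i1 in range(len(src)):
            for i2 in range(i1, min(i1 + max_len, len(src))):
                inside = [(i, j) for i, j in alignment if i1 <= i <= i2]
                if not inside:
                    continue
                j1 = min(j for _, j in inside)
                j2 = max(j for _, j in inside)
                if j2 - j1 + 1 > max_len:
                    continue
                # consistency: the target span may not be linked to any
                # source word outside [i1, i2]
                if any(j1 <= j <= j2 and not i1 <= i <= i2
                       for i, j in alignment):
                    continue
                pairs.append((tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])))
        return pairs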
Features
2Note that this model is estimated from the fall parallel corpus , rather than a held-out development set.
parallel corpus is mentioned in 4 sentences in this paper.
Das, Dipanjan and Petrov, Slav
Approach Overview
Central to our approach (see Algorithm 1) is a bilingual similarity graph built from a sentence-aligned parallel corpus.
Graph Construction
The graph vertices are extracted from the different sides of a parallel corpus (De, Df) and an additional unlabeled monolingual foreign corpus Ff, which will be used later for training.
Graph Construction
Since our graph is built from a parallel corpus, we can use standard word alignment techniques to align the English sentences De
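An illustrative word-level sketch of such a graph construction, connecting foreign vertices to English vertices with weights proportional to alignment counts. Note that the published model builds the foreign side of the graph over trigram contexts with richer similarity features; this simplification does not reproduce that.

    from collections import defaultdict

    def build_bilingual_graph(bitext, alignments):
        """bitext: list of (en_tokens, fr_tokens) sentence pairs;
        alignments: per-sentence sets of (en_index, fr_index) links
        (representation is an assumption)."""
        edge_counts = defaultdict(float)
        for (en, fr), links in zip(bitext, alignments):
            for i, j in links:
                edge_counts[(fr[j], en[i])] += 1.0
        # normalize per foreign vertex so outgoing weights sum to 1
        totals = defaultdict(float)
        for (f, _), c in edge_counts.items():
            totals[f] += c
        return {(f, e): c / totals[f]
                for (f, e), c in edge_counts.items()}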
parallel corpus is mentioned in 3 sentences in this paper.