Index of papers in Proc. ACL 2012 that mention
  • parallel corpus
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Abstract
We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) an additional small parallel corpus is also available.
Inferring a learning curve from mostly monolingual data
In this section we address scenario S1: we have access to a source-language monolingual collection (from which portions to be manually translated could be sampled) and a target-language in-domain monolingual corpus, to supplement the target side of a parallel corpus while training a language model.
Inferring a learning curve from mostly monolingual data
5 Extrapolating a learning curve fitted on a small parallel corpus
Inferring a learning curve from mostly monolingual data
Given a small “seed” parallel corpus, the translation system can be used to train small in-domain models and the evaluation score can be measured at a few initial sample sizes {(x_1, y_1), (x_2, y_2), ..., (x_p, y_p)}.
Introduction
In the first scenario (S1), the SMT developer is given only monolingual source and target samples from the relevant domain, and a small test parallel corpus.
Introduction
In the second scenario (S2), an additional small seed parallel corpus is given that can be used to train small in-domain models and measure (with some variance) the evaluation score at a few points on the initial portion of the learning curve.
Selecting a parametric family of curves
For a certain bilingual test dataset d, we consider a set of observations O_d = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where y_i is the performance on d (measured using BLEU (Papineni et al., 2002)) of a translation model trained on a parallel corpus of size x_i.
Selecting a parametric family of curves
The corpus size x_i is measured in terms of the number of segments (sentences) present in the parallel corpus.
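The snippets above describe fitting an evaluation-score learning curve at a few small seed-corpus sizes and extrapolating it to larger amounts of parallel data. Below is a minimal sketch of that idea, assuming a power-law curve family and invented (size, BLEU) observations; the paper itself selects among parametric families of curves, so this particular family and all data values are assumptions for illustration only.

```python
# Minimal sketch: fit a learning curve to a few (corpus size, BLEU)
# observations and extrapolate to larger sizes. The power-law family is an
# assumption; the paper compares several parametric families.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, alpha, beta):
    # y = c - alpha * x^(-beta): the score approaches c as the corpus grows.
    return c - alpha * np.power(x, -beta)

# Hypothetical observations: BLEU of models trained on small seed-corpus
# samples (sizes in number of parallel segments).
sizes = np.array([1000.0, 2000.0, 5000.0, 10000.0])
bleu = np.array([12.1, 14.8, 18.2, 20.5])

params, _ = curve_fit(power_law, sizes, bleu, p0=[40.0, 100.0, 0.3], maxfev=10000)

# Extrapolate the fitted curve to larger (hypothetical) corpus sizes.
for target in (50000, 200000, 1000000):
    print(target, round(float(power_law(target, *params)), 2))
```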
parallel corpus is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Cross-Lingual Mixture Model for Sentiment Classification
English), unlabeled parallel corpus U of the source language and the target language, and optional labeled data D_T in the target language T. Aligning with previous work (Wan, 2008; Wan, 2009), we only consider a binary sentiment classification scheme (positive or negative) in this paper, but the proposed method can be used in other classification schemes with minor modifications.
Cross-Lingual Mixture Model for Sentiment Classification
The basic idea underlying CLMM is to enlarge the vocabulary by learning sentiment words from the parallel corpus.
Cross-Lingual Mixture Model for Sentiment Classification
More formally, CLMM defines a generative mixture model for generating a parallel corpus.
Introduction
By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; and (2) transfer polarity label information between the source language and the target language using a parallel corpus.
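CLMM itself is a generative mixture model estimated over the unlabeled parallel corpus. As a much simpler illustration of the intuition quoted above (transferring polarity across aligned words to enlarge the sentiment vocabulary), the sketch below projects labels from a tiny source sentiment lexicon through alignment links; it is not the CLMM model, and the lexicon entries, tokens, and alignments are all made up.

```python
# Simplified, count-based illustration (not the CLMM generative model) of how
# word alignments in an unlabeled parallel corpus can transfer polarity from
# known source-language sentiment words to target-language words.
from collections import Counter, defaultdict

# Hypothetical inputs: a source sentiment lexicon and word-aligned sentence pairs.
source_lexicon = {"good": "positive", "great": "positive", "bad": "negative"}
aligned_pairs = [
    # (source tokens, target tokens, alignment links as (src_idx, tgt_idx))
    (["the", "movie", "is", "great"], ["dian", "ying", "hen", "bang"], [(1, 0), (1, 1), (3, 3)]),
    (["bad", "service"], ["fu", "wu", "cha"], [(0, 2), (1, 0), (1, 1)]),
]

votes = defaultdict(Counter)
for src, tgt, links in aligned_pairs:
    for i, j in links:
        polarity = source_lexicon.get(src[i])
        if polarity:
            votes[tgt[j]][polarity] += 1

# Each target word inherits the polarity it is most often aligned with.
projected = {word: counts.most_common(1)[0][0] for word, counts in votes.items()}
print(projected)  # e.g. {'bang': 'positive', 'cha': 'negative'}
```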
Related Work
the authors use an unlabeled parallel corpus instead of machine translation engines.
Related Work
They propose a method of training two classifiers based on maximum entropy formulation to maximize their prediction agreement on the parallel corpus.
parallel corpus is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
5.1.2 Parallel Corpus
Experiments
Our parallel corpus is crawled from the web, containing news articles, technical documents, blog entries etc.
Experiments
We use Giza++ (Och and Ney, 2003) to generate the word alignment for the parallel corpus.
Ranking Model Training
In this section, we first describe how to extract reordering examples from the parallel corpus; then we show our features for the ranking function; finally, we discuss how to train the model from the extracted examples.
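As a rough illustration of what extracting reordering examples from a word-aligned parallel corpus can involve, the sketch below labels pairs of aligned source words as "straight" or "inverted" according to whether their aligned target positions preserve the source order. This word-pair version is only illustrative and is not the paper's extraction procedure; the sentence and alignment are invented.

```python
# Sketch: derive straight/inverted reordering examples for pairs of aligned
# source words from a single word-aligned sentence pair.

def reordering_examples(src_tokens, alignment):
    """alignment maps a source index to the target position it aligns to."""
    examples = []
    aligned = sorted(alignment)  # source indices that have an alignment
    for a, b in zip(aligned, aligned[1:]):
        label = "straight" if alignment[a] <= alignment[b] else "inverted"
        examples.append((src_tokens[a], src_tokens[b], label))
    return examples

# Hypothetical sentence and alignment (source index -> target position).
src = ["he", "yesterday", "bought", "a", "book"]
align = {0: 0, 1: 3, 2: 1, 3: 4, 4: 5}
print(reordering_examples(src, align))
# [('he', 'yesterday', 'straight'), ('yesterday', 'bought', 'inverted'), ...]
```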
parallel corpus is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kim, Seokhwan and Lee, Gary Geunbae
Abstract
This paper proposes a novel graph-based projection approach and demonstrates its merits by using a Korean relation extraction system based on a projected dataset from an English-Korean parallel corpus.
Cross-lingual Annotation Projection for Relation Extraction
To accomplish that goal, the method automatically creates a set of annotated text for f_t, utilizing a well-made extractor f_s for a resource-rich source language L_s and a parallel corpus of L_s and L_t.
Graph Construction
where count(u_s, u_t) is the number of alignments between u_s and u_t across the whole parallel corpus.
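The quantity count(u_s, u_t) can be accumulated directly from per-sentence word alignments. Below is a minimal sketch assuming alignments in the common "i-j" (Pharaoh) format; the sentences and links are invented, and any normalization of these counts into graph edge weights is left out.

```python
# Sketch: count how often each (source word, target word) pair is aligned
# across a parallel corpus, given alignments in "i-j" format.
from collections import Counter

def alignment_counts(src_lines, tgt_lines, align_lines):
    counts = Counter()
    for src, tgt, align in zip(src_lines, tgt_lines, align_lines):
        s, t = src.split(), tgt.split()
        for link in align.split():
            i, j = map(int, link.split("-"))
            counts[(s[i], t[j])] += 1
    return counts

# Hypothetical one-sentence "corpus" with romanized Korean placeholder tokens.
src_lines = ["the parliament approved the law"]
tgt_lines = ["uihoe ga beoban eul seungin"]
align_lines = ["1-0 4-2 2-4"]
print(alignment_counts(src_lines, tgt_lines, align_lines))
```

In the projection graph, such raw counts would typically be turned into normalized edge weights between source-word and target-word vertices; that step is a design choice left out of this sketch.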
Implementation
We used an English-Korean parallel corpus [1] that contains 266,892 bi-sentence pairs in English and Korean.
Implementation
[1] The parallel corpus collected is available on our website: http://isoft.postech.ac.kr/~megaup/acl/datasets
Implementation
The English sentence annotations in the parallel corpus were then propagated into the corresponding Korean sentences.
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Abstract
Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side.
Extraction of Paraphrase Rules
We train a source-to-target PBMT system (SYS_ST) and a target-to-source PBMT system (SYS_TS) on the parallel corpus.
Forward-Translation vs. Back-Translation
Note that all the texts of S0, S1, S2, T0 and T1 are sentence aligned because the initial parallel corpus (S0, T0) is aligned at the sentence level.
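Since S0 and the back-translation S1 of the target side are sentence aligned, the comparison described above can, at its simplest, collect differing spans between corresponding sentences as candidate paraphrase pairs. The difflib-based sketch below shows that comparison step only; it is not the paper's rule-extraction algorithm, and the example sentences are invented.

```python
# Sketch: compare a source sentence (from S0) with its back-translated
# counterpart (from S1) and collect differing spans as paraphrase candidates.
from difflib import SequenceMatcher

def candidate_paraphrases(s0_sentence, s1_sentence):
    a, b = s0_sentence.split(), s1_sentence.split()
    pairs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs

# Hypothetical sentence pair.
s0 = "the committee will take a decision next week"
s1 = "the committee will decide next week"
print(candidate_paraphrases(s0, s1))  # [('take a decision', 'decide')]
```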
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: