Abstract | We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) an additional small amount of parallel data is also available.
Inferring a learning curve from mostly monolingual data | In this section we address scenario S1: we have access to a source-language monolingual collection (from which portions to be manually translated could be sampled) and a target-language in-domain monolingual corpus, to supplement the target side of a parallel corpus while training a language model.
Inferring a learning curve from mostly monolingual data | 5 Extrapolating a learning curve fitted on a small parallel corpus |
Inferring a learning curve from mostly monolingual data | Given a small “seed” parallel corpus, the translation system can be used to train small in-domain models, and the evaluation score can be measured at a few initial sample sizes $\{(x_1, y_1), (x_2, y_2), \ldots, (x_p, y_p)\}$.
Introduction | In the first scenario (S1), the SMT developer is given only monolingual source and target samples from the relevant domain, and a small test parallel corpus.
Introduction | In the second scenario (S2), an additional small seed parallel corpus is given that can be used to train small in-domain models and measure (with some variance) the evaluation score at a few points on the initial portion of the learning curve. |
Selecting a parametric family of curves | For a certain bilingual test dataset $d$, we consider a set of observations $O_d = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y_i$ is the performance on $d$ (measured using BLEU (Papineni et al., 2002)) of a translation model trained on a parallel corpus of size $x_i$.
Selecting a parametric family of curves | The corpus size $x_i$ is measured in terms of the number of segments (sentences) present in the parallel corpus.
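Selecting a parametric family of curves | As a concrete illustration, the sketch below fits one plausible family, a three-parameter power law $y = c - a x^{-b}$, to a few (corpus size, BLEU) observations and extrapolates to larger corpus sizes; the choice of family and the sample numbers here are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: fit a power-law learning curve y = c - a * x**(-b)
# to a few (corpus size, BLEU) points and extrapolate.
# The curve family and the sample data below are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, a, b):
    # BLEU approaches the asymptote c as the corpus size x grows.
    return c - a * np.power(x, -b)

# Hypothetical observations: corpus sizes (in segments) and BLEU scores.
sizes = np.array([10_000, 20_000, 40_000, 80_000], dtype=float)
bleu = np.array([18.2, 21.5, 24.1, 26.0])

params, _ = curve_fit(power_law, sizes, bleu, p0=[35.0, 100.0, 0.3], maxfev=10_000)

for x in [160_000, 320_000, 640_000]:
    print(f"predicted BLEU at {x} segments: {power_law(x, *params):.1f}")
```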
Cross-Lingual Mixture Model for Sentiment Classification | English), an unlabeled parallel corpus U of the source language and the target language, and optional labeled data DT in the target language T. Following previous work (Wan, 2008; Wan, 2009), we consider only a binary sentiment classification scheme (positive or negative) in this paper, but the proposed method can be used in other classification schemes with minor modifications.
Cross-Lingual Mixture Model for Sentiment Classification | The basic idea underlying CLMM is to enlarge the vocabulary by learning sentiment words from the parallel corpus.
Cross-Lingual Mixture Model for Sentiment Classification | More formally, CLMM defines a generative mixture model for generating a parallel corpus.
Introduction | By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; and (2) transfer polarity label information between the source language and the target language using a parallel corpus.
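Introduction | The full CLMM is a generative mixture model trained over the parallel corpus; the sketch below shows only the simpler intuition behind point (2), transferring polarity from a source sentiment lexicon to target words through word-alignment counts. All names and the toy data are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of the polarity-transfer intuition (not the full CLMM):
# give each target word the alignment-count-weighted vote of the
# source sentiment words it aligns to. Toy data; all values assumed.
from collections import defaultdict

# Source lexicon: word -> +1 (positive) or -1 (negative).
source_lexicon = {"good": 1, "great": 1, "bad": -1, "terrible": -1}

# (source_word, target_word, alignment_count) triples, e.g. aggregated
# from word alignments over the unlabeled parallel corpus U.
aligned_counts = [
    ("good", "hao", 12),
    ("great", "hao", 3),
    ("bad", "huai", 9),
    ("good", "bucuo", 4),
]

votes = defaultdict(float)
for src, tgt, count in aligned_counts:
    if src in source_lexicon:
        votes[tgt] += source_lexicon[src] * count

target_lexicon = {w: (1 if v > 0 else -1) for w, v in votes.items() if v != 0}
print(target_lexicon)  # {'hao': 1, 'huai': -1, 'bucuo': 1}
```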
Related Work | The authors use an unlabeled parallel corpus instead of machine translation engines.
Related Work | They propose a method of training two classifiers, based on a maximum entropy formulation, to maximize their prediction agreement on the parallel corpus.
Experiments | 5.1.2 Parallel Corpus |
Experiments | Our parallel corpus is crawled from the web and contains news articles, technical documents, blog entries, etc.
Experiments | We use GIZA++ (Och and Ney, 2003) to generate the word alignment for the parallel corpus.
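Experiments | GIZA++ alignments are commonly symmetrized into the Pharaoh "i-j" format, one line per sentence pair; a minimal reader for that format is sketched below (the format assumption and the toy lines are ours, not details from the paper).

```python
# Minimal sketch: parse word alignments in the common Pharaoh "i-j" format,
# which GIZA++ output is often symmetrized into (one line per sentence pair).
def parse_alignment_line(line):
    """Parse one Pharaoh-format line, e.g. "0-0 1-2 2-1", into (i, j) links."""
    return {(int(i), int(j)) for i, j in (tok.split("-") for tok in line.split())}

# Toy lines standing in for a symmetrized alignment file.
lines = ["0-0 1-2 2-1", "0-0 1-1"]
for line in lines:
    print(sorted(parse_alignment_line(line)))
```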
Ranking Model Training | In this section, we first describe how to extract reordering examples from the parallel corpus; then we present the features for the ranking function; finally, we discuss how to train the model from the extracted examples.
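Ranking Model Training | As a rough word-level illustration of the extraction step, two adjacent source positions whose aligned target positions preserve their order can be labeled a monotone example, and ones that invert it a swapped example; the sketch below implements only that simplification (the paper's actual extraction operates on richer units and constraints).

```python
# Simplified sketch: label adjacent source positions as monotone/swapped
# from a word alignment (word-level only; the paper's actual extraction
# is richer). Alignment is a dict source_index -> target_index; toy data.
def extract_reordering_examples(alignment):
    examples = []
    positions = sorted(alignment)
    for left, right in zip(positions, positions[1:]):
        if right != left + 1:
            continue  # only adjacent source words
        label = "monotone" if alignment[left] < alignment[right] else "swapped"
        examples.append((left, right, label))
    return examples

# Source word 0 -> target 0, 1 -> 2, 2 -> 1: one monotone, one swapped pair.
print(extract_reordering_examples({0: 0, 1: 2, 2: 1}))
```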
Abstract | This paper proposes a novel graph-based projection approach and demonstrates its merits with a Korean relation extraction system based on a dataset projected from an English-Korean parallel corpus.
Cross-lingual Annotation Projection for Relation Extraction | To accomplish that goal, the method automatically creates a set of annotated text for $f_t$, utilizing a well-made extractor $f_s$ for a resource-rich source language $L_s$ and a parallel corpus of $L_s$ and $L_t$.
Graph Construction | where $\mathrm{count}(u_s, u_t)$ is the number of alignments between $u_s$ and $u_t$ across the whole parallel corpus.
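Graph Construction | These counts can be accumulated in a single pass over the word-aligned corpus; the sketch below assumes each sentence pair is stored with its alignment links (the data layout and toy examples are assumptions).

```python
# Minimal sketch: accumulate count(u_s, u_t) = number of alignment links
# between source word u_s and target word u_t over the whole corpus.
# Each sentence pair is (source_tokens, target_tokens, links); toy data.
from collections import Counter

corpus = [
    (["the", "house"], ["das", "haus"], [(0, 0), (1, 1)]),
    (["a", "house"], ["ein", "haus"], [(0, 0), (1, 1)]),
]

count = Counter()
for src_tokens, tgt_tokens, links in corpus:
    for i, j in links:
        count[(src_tokens[i], tgt_tokens[j])] += 1

print(count[("house", "haus")])  # 2
```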
Implementation | We used an English-Korean parallel corpus¹ that contains 266,892 sentence pairs.
Implementation | ¹The parallel corpus collected is available on our website: http://isoft.postech.ac.kr/~megaup/acl/datasets http://reverb.cs.washington.edu/
Implementation | The English sentence annotations in the parallel corpus were then propagated into the corresponding Korean sentences. |
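Implementation | At the token level, this propagation can be pictured as carrying a source-side label across alignment links; the sketch below shows that step in isolation, as a simplification of the paper's graph-based projection (all data and names are assumed).

```python
# Minimal sketch: propagate source-side entity labels to the target side
# through word-alignment links (a simplification of the paper's
# graph-based projection; all data here is assumed).
def project_labels(source_labels, links, target_length):
    """source_labels: label or None per source token; links: (i, j) pairs."""
    target_labels = [None] * target_length
    for i, j in links:
        if source_labels[i] is not None:
            target_labels[j] = source_labels[i]
    return target_labels

src = [None, "PERSON", None, "ORG"]   # hypothetical annotated English tokens
links = [(1, 0), (3, 2)]              # source index -> target index
print(project_labels(src, links, target_length=4))
# ['PERSON', None, 'ORG', None]
```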
Abstract | Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side. |
Extraction of Paraphrase Rules | We train a source-to-target PBMT system (SYS_ST) and a target-to-source PBMT system (SYS_TS) on the parallel corpus.
Forward-Translation vs. Back-Translation | Note that all the texts of S0, S1, S2, T0 and T1 are sentence-aligned, because the initial parallel corpus (S0, T0) is aligned at the sentence level.
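Forward-Translation vs. Back-Translation | Because S0 and its back-translation S1 remain sentence-aligned, candidate paraphrase pairs can be read off wherever the two versions of a sentence diverge; the sketch below extracts the minimal differing span per sentence, a crude stand-in for the paper's rule extraction (toy data, assumed representation).

```python
# Crude sketch: for each sentence, compare the original source S0 with its
# back-translation S1 and keep the minimal differing spans as a candidate
# paraphrase pair (a stand-in for the paper's rule extraction; toy data).
def candidate_paraphrase(s0_tokens, s1_tokens):
    lo = 0
    while lo < min(len(s0_tokens), len(s1_tokens)) and s0_tokens[lo] == s1_tokens[lo]:
        lo += 1
    hi0, hi1 = len(s0_tokens), len(s1_tokens)
    while hi0 > lo and hi1 > lo and s0_tokens[hi0 - 1] == s1_tokens[hi1 - 1]:
        hi0 -= 1
        hi1 -= 1
    return (s0_tokens[lo:hi0], s1_tokens[lo:hi1])

s0 = "he gave up the plan".split()
s1 = "he abandoned the plan".split()
print(candidate_paraphrase(s0, s1))  # (['gave', 'up'], ['abandoned'])
```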