Index of papers in Proc. ACL 2014 that mention
  • parallel corpus
van Gompel, Maarten and van den Bosch, Antal
Data preparation
We start with a parallel corpus that is tokenised for both L1 and L2.
Data preparation
The parallel corpus is randomly sampled into two large and equally-sized parts.
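A minimal sketch of such a split (illustrative only, not taken from the paper; it assumes the corpus is held as a list of (L1, L2) sentence pairs):

    import random

    def split_parallel_corpus(sentence_pairs, seed=0):
        # Shuffle the (l1, l2) pairs and cut them into two equally-sized halves.
        pairs = list(sentence_pairs)
        random.Random(seed).shuffle(pairs)
        mid = len(pairs) // 2
        return pairs[:mid], pairs[mid:]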
Data preparation
1. using phrase-translation table T and parallel corpus split 8
Experiments & Results
The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005) from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
parallel corpus is mentioned in 7 sentences in this paper.
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Experiments
Dataset and SMT Pipeline We use the NIST MT Chinese-English parallel corpus (NIST), excluding non-UN and non-HK Hansards portions as our training dataset.
Polylingual Tree-based Topic Models
In addition, we extract the word alignments from aligned sentences in a parallel corpus.
Topic Models for Machine Translation
For a parallel corpus of aligned source and target sentences (F, E), a phrase f ∈ F is translated to a phrase e ∈ E according to a distribution p_w(e | f). One popular method to estimate the probability
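One standard way to estimate such a distribution is by relative frequency over the phrase pairs extracted from the word-aligned corpus. A minimal sketch (illustrative only; phrase extraction is assumed to have been done already):

    from collections import Counter, defaultdict

    def relative_frequency(phrase_pairs):
        # phrase_pairs: iterable of (source_phrase, target_phrase) tuples
        # returns p(e | f) estimated as count(f, e) / count(f)
        counts = defaultdict(Counter)
        for f, e in phrase_pairs:
            counts[f][e] += 1
        return {f: {e: c / sum(targets.values()) for e, c in targets.items()}
                for f, targets in counts.items()}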
Topic Models for Machine Translation
Our contributions are topics that capture multilingual information and thus better capture the domains in the parallel corpus.
parallel corpus is mentioned in 4 sentences in this paper.
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
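A common instantiation of this idea scores every general-domain sentence pair with a language model trained on the in-domain data and keeps the best-scoring pairs. The sketch below is illustrative only; score_lm is a stand-in for whatever in-domain language model is used, and the paper's exact selection criterion may differ:

    def select_sentence_pairs(general_pairs, score_lm, top_k):
        # Rank general-domain (source, target) pairs by an in-domain LM score
        # of the source side and keep the top_k highest-scoring pairs.
        ranked = sorted(general_pairs, key=lambda pair: score_lm(pair[0]), reverse=True)
        return ranked[:top_k]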
Experiments
The reason is that a large-scale parallel corpus maintains more bilingual knowledge and language phenomena, while a small in-domain corpus suffers from data sparseness, which degrades translation performance.
Experiments
Results of the systems trained on only a subset of the general-domain parallel corpus.
Introduction
For this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus.
parallel corpus is mentioned in 4 sentences in this paper.
Berant, Jonathan and Liang, Percy
Introduction
We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them.
Introduction
(2013) presented a QA system that maps questions onto simple queries against Open IE extractions, by learning paraphrases from a large monolingual parallel corpus, and performing a single paraphrasing step.
Model overview
Our framework accommodates any paraphrasing method, and in this paper we propose an association model that learns to associate natural language phrases that co-occur frequently in a monolingual parallel corpus, combined with a vector space model, which learns to score the similarity between vector representations of natural language utterances (Section 5).
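As an illustration of the co-occurrence idea (not the authors' exact association model), phrase associations can be scored from counts of phrase pairs aligned in the monolingual parallel corpus, for example with pointwise mutual information:

    import math
    from collections import Counter

    def pmi_associations(aligned_phrase_pairs):
        # aligned_phrase_pairs: iterable of (phrase1, phrase2) pairs observed as aligned
        pairs = list(aligned_phrase_pairs)
        pair_counts = Counter(pairs)
        left = Counter(p for p, _ in pairs)
        right = Counter(q for _, q in pairs)
        n = sum(pair_counts.values())
        return {(p, q): math.log((c / n) / ((left[p] / n) * (right[q] / n)))
                for (p, q), c in pair_counts.items()}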
parallel corpus is mentioned in 3 sentences in this paper.
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Instead of using a parallel corpus, labeled and unlabeled instances in one language are translated into ones in the other language and all instances in both languages are then fed into a bilingual active learning engine as pseudo parallel corpora.
Abstract
Instead of using a parallel corpus which should have entity/relation alignment information and is thus difficult to obtain, this paper employs an off-the-shelf machine translator to translate both labeled and unlabeled instances from one language into the other language, forming pseudo parallel corpora.
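A minimal sketch of forming such pseudo parallel corpora; translate is a hypothetical stand-in for the off-the-shelf machine translator, not a specific API:

    def build_pseudo_parallel(instances, translate, src_lang, tgt_lang):
        # Pair every instance with its machine translation into the other language.
        return [(instance, translate(instance, src_lang, tgt_lang)) for instance in instances]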
Abstract
Our lexicon is derived from the FBIS parallel corpus (#LDC2003E14), which is widely used in machine translation between English and Chinese.
parallel corpus is mentioned in 3 sentences in this paper.
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
Our goal is to obtain translation distributions for source phrases that are not present in the phrase table extracted from the parallel corpus.
Generation & Propagation
The label space is thus the phrasal translation inventory, and like the source side it can also be represented in terms of a graph, initially consisting of target phrase nodes from the parallel corpus.
Generation & Propagation
Thus, the target phrase inventory from the parallel corpus may be inadequate for unlabeled instances.
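One simplified way to read this setup (an illustrative sketch, not the paper's actual algorithm): phrases covered by the phrase table keep their translation distributions, and uncovered phrases receive a distribution by repeatedly averaging the distributions of their neighbours in a similarity graph:

    from collections import defaultdict

    def propagate(labeled, neighbours, iterations=10):
        # labeled: {phrase: {target_phrase: prob}} for phrases found in the phrase table
        # neighbours: {phrase: [(neighbour_phrase, edge_weight), ...]} for every phrase
        dist = dict(labeled)
        for _ in range(iterations):
            updated = dict(labeled)  # labeled nodes keep their original distributions
            for node, edges in neighbours.items():
                if node in labeled:
                    continue
                acc, total = defaultdict(float), 0.0
                for nb, weight in edges:
                    nb_dist = dist.get(nb)
                    if not nb_dist:
                        continue
                    for target, prob in nb_dist.items():
                        acc[target] += weight * prob
                    total += weight
                if total > 0:
                    updated[node] = {t: v / total for t, v in acc.items()}
            dist = updated
        return dist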
parallel corpus is mentioned in 3 sentences in this paper.