Index of papers in Proc. ACL that mention
  • parallel corpus
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Cross-Lingual Mixture Model for Sentiment Classification
English), unlabeled parallel corpus U of the source language and the target language, and optional labeled data DT in the target language T. Aligning with previous work (Wan, 2008; Wan, 2009), we consider only the binary sentiment classification scheme (positive or negative) in this paper, but the proposed method can be used in other classification schemes with minor modifications.
Cross-Lingual Mixture Model for Sentiment Classification
The basic idea underlying CLMM is to enlarge the vocabulary by learning sentiment words from the parallel corpus.
Cross-Lingual Mixture Model for Sentiment Classification
More formally, CLMM defines a generative mixture model for generating a parallel corpus.
Introduction
By “synchronizing” the generation of words in the source language and the target language in a parallel corpus, the proposed model can (1) improve vocabulary coverage by learning sentiment words from the unlabeled parallel corpus; (2) transfer polarity label information between the source language and target language using a parallel corpus.
Related Work
the authors use an unlabeled parallel corpus instead of machine translation engines.
Related Work
They propose a method of training two classifiers based on maximum entropy formulation to maximize their prediction agreement on the parallel corpus.
parallel corpus is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Abstract
We consider two scenarios: 1) monolingual samples in the source and target languages are available, and 2) an additional small parallel corpus is also available.
Inferring a learning curve from mostly monolingual data
In this section we address scenario S1: we have access to a source-language monolingual collection (from which portions to be manually translated could be sampled) and a target-language in-domain monolingual corpus, to supplement the target side of a parallel corpus while training a language model.
Inferring a learning curve from mostly monolingual data
5 Extrapolating a learning curve fitted on a small parallel corpus
Inferring a learning curve from mostly monolingual data
Given a small “seed” parallel corpus, the translation system can be used to train small in-domain models and the evaluation score can be measured at a few initial sample sizes {(x1, y1), (x2, y2), ..., (xp, yp)}.
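To make this curve-fitting step concrete, here is a minimal sketch that fits a learning curve to a few seed (corpus size, BLEU) points and extrapolates. The three-parameter power-law family y = c - a*x^(-b) is an assumption for illustration (the paper compares several parametric families), and all numbers are made up.

```python
# Minimal sketch: fit a learning curve to seed (corpus size, BLEU) points
# and extrapolate. The power-law family and the data are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, a, b):
    # BLEU approaches the asymptote c as the corpus size x grows
    return c - a * np.power(x, -b)

# hypothetical seed observations measured with small in-domain models
sizes = np.array([10_000.0, 20_000.0, 40_000.0, 80_000.0])
bleu = np.array([18.2, 20.1, 21.7, 23.0])

params, _ = curve_fit(power_law, sizes, bleu, p0=(30.0, 100.0, 0.3), maxfev=10_000)
for x in (160_000, 320_000, 640_000):
    print(f"predicted BLEU at {x} sentences: {power_law(x, *params):.1f}")
```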
Introduction
In the first scenario (S1), the SMT developer is given only monolingual source and target samples from the relevant domain, and a small test parallel corpus.
Introduction
In the second scenario (S2), an additional small seed parallel corpus is given that can be used to train small in-domain models and measure (with some variance) the evaluation score at a few points on the initial portion of the learning curve.
Selecting a parametric family of curves
For a certain bilingual test dataset d, we consider a set of observations Od = {(x1, y1), (x2, y2), ..., (xn, yn)}, where yi is the performance on d (measured using BLEU (Papineni et al., 2002)) of a translation model trained on a parallel corpus of size xi.
Selecting a parametric family of curves
The corpus size xi is measured in terms of the number of segments (sentences) present in the parallel corpus.
parallel corpus is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Experiments
The Wikipedia InterLanguage Links shared task data contains a much larger proportion of transliterations than a parallel corpus .
Experiments
For English/Arabic, we use a freely available parallel corpus from the United Nations (UN) (Eisele and Chen, 2010).
Experiments
English/Hindi parallel corpus.
Extraction of Transliteration Pairs
Initially, we extract a list of word pairs from a word-aligned parallel corpus using GIZA++.
Extraction of Transliteration Pairs
Initially, the parallel corpus is word-aligned using GIZA++ (Och and Ney, 2003), and the alignments are refined using the grow-diag-final-and heuristic (Koehn et al., 2003).
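For readers unfamiliar with this symmetrization step, below is a compact sketch of the grow-diag-final-and heuristic over the two directional alignment sets, following the recipe of Koehn et al. (2003); implementations differ in small details (e.g., iteration order), so treat this as illustrative rather than the exact code used here.

```python
# Sketch of grow-diag-final-and symmetrization of two directional word
# alignments. s2t and t2s are sets of (i, j) links from the
# source->target and target->source GIZA++ runs.
def grow_diag_final_and(src_len, tgt_len, s2t, t2s):
    neighbors = [(-1, 0), (0, -1), (1, 0), (0, 1),
                 (-1, -1), (-1, 1), (1, -1), (1, 1)]  # incl. diagonals
    alignment = s2t & t2s            # start from the intersection
    union = s2t | t2s

    def src_aligned(i): return any(a == i for a, _ in alignment)
    def tgt_aligned(j): return any(b == j for _, b in alignment)

    # grow-diag: repeatedly add union links adjacent to current links
    # that attach a still-unaligned word on either side
    added = True
    while added:
        added = False
        for (i, j) in sorted(alignment):
            for di, dj in neighbors:
                ni, nj = i + di, j + dj
                if 0 <= ni < src_len and 0 <= nj < tgt_len \
                        and (ni, nj) in union and (ni, nj) not in alignment \
                        and (not src_aligned(ni) or not tgt_aligned(nj)):
                    alignment.add((ni, nj))
                    added = True

    # final-and: add remaining union links whose words are both unaligned
    for (i, j) in sorted(union - alignment):
        if not src_aligned(i) and not tgt_aligned(j):
            alignment.add((i, j))
    return alignment

# toy example: 3-word sentence pair, two directional alignments
print(grow_diag_final_and(3, 3, {(0, 0), (1, 1)}, {(0, 0), (2, 2)}))
```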
Extraction of Transliteration Pairs
The reason is that the parallel corpus contains inflectional variants of the same word.
Introduction
In this paper, we show that it is possible to extract transliteration pairs from a parallel corpus using an unsupervised method.
Introduction
The NEWS10 data sets are extracted from Wikipedia InterLanguage Links (WIL), which consist of parallel phrases, whereas a parallel corpus consists of parallel sentences.
Models
The training data is a list of word pairs (a source word and its presumed transliteration) extracted from a word-aligned parallel corpus.
parallel corpus is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Conclusion
Future work includes studying the effect of the size of the parallel corpus on the induced OOV translations.
Conclusion
Increasing the size of the parallel corpus, on the one hand, reduces the number of OOVs.
Experiments & Results 4.1 Experimental Setup
We word-aligned the dev/test sets by concatenating them to a large parallel corpus and running GIZA++ on the whole set.
Experiments & Results 4.1 Experimental Setup
appearing more than once in the parallel corpus and being assigned to multiple different phrases), we take the average of reciprocal ranks for each of them.
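A minimal sketch of the evaluation described here, averaging reciprocal ranks when an OOV is assigned multiple reference phrases; all names and data are illustrative.

```python
# Sketch: mean reciprocal rank where an OOV occurring with multiple
# reference phrases gets the average of the reciprocal ranks, one per
# reference phrase.
def reciprocal_rank(ranked_candidates, reference):
    for rank, cand in enumerate(ranked_candidates, start=1):
        if cand == reference:
            return 1.0 / rank
    return 0.0

def avg_rr(ranked_candidates, references):
    # one reciprocal rank per reference phrase, then averaged
    return sum(reciprocal_rank(ranked_candidates, r) for r in references) / len(references)

oovs = {
    "oov-1": (["cat", "feline", "kitty"], {"cat", "kitty"}),
    "oov-2": (["house", "home"], {"home"}),
}
mrr = sum(avg_rr(c, refs) for c, refs in oovs.values()) / len(oovs)
print(f"MRR = {mrr:.3f}")  # oov-1: (1/1 + 1/3)/2, oov-2: 1/2
```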
Experiments & Results 4.1 Experimental Setup
The generated candidate translations for the OOVs can be added to the phrase-table created using the parallel corpus to increase the coverage of the phrase-table.
Graph-based Lexicon Induction
Given a (possibly small) amount of parallel data between the source and target languages, and large monolingual data in the source language, we construct a graph over all phrase types in the monolingual text and the source side of the parallel corpus, and connect phrases that have similar meanings (i.e.
Graph-based Lexicon Induction
There are three types of vertices in the graph: i) labeled nodes which appear in the parallel corpus and for which we have the target-side
Graph-based Lexicon Induction
The labels are translations and their probabilities (more specifically, p(e|f)) from the phrase-table extracted from the parallel corpus.
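As a toy illustration of this idea, the sketch below propagates phrase-table distributions p(e|f) from labeled to unlabeled nodes by similarity-weighted averaging; this is a generic label-propagation step under assumed names and data, not necessarily the authors' exact update rule.

```python
# Toy label propagation over a phrase graph: labeled nodes keep their
# phrase-table distributions p(e|f); unlabeled nodes take the
# similarity-weighted average of their neighbors' distributions.
from collections import defaultdict

def propagate(edges, labels, iterations=10):
    # edges: {node: [(neighbor, weight), ...]}; labels: {node: {target: prob}}
    dist = {n: dict(d) for n, d in labels.items()}
    for _ in range(iterations):
        updates = {}
        for node, nbrs in edges.items():
            if node in labels:                 # labeled nodes are clamped
                continue
            acc, total = defaultdict(float), 0.0
            for nbr, w in nbrs:
                for t, p in dist.get(nbr, {}).items():
                    acc[t] += w * p
                total += w
            if total > 0:
                updates[node] = {t: p / total for t, p in acc.items()}
        dist.update(updates)
    return dist

edges = {"oov-phrase": [("known-a", 0.7), ("known-b", 0.3)]}
labels = {"known-a": {"trans-1": 1.0}, "known-b": {"trans-2": 1.0}}
print(propagate(edges, labels)["oov-phrase"])   # {'trans-1': 0.7, 'trans-2': 0.3}
```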
parallel corpus is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Experiments
5.1.2 Parallel Corpus
Experiments
Our parallel corpus is crawled from the web, containing news articles, technical documents, blog entries etc.
Experiments
We use Giza++ (Och and Ney, 2003) to generate the word alignment for the parallel corpus .
Ranking Model Training
In this section, we first describe how to extract reordering examples from the parallel corpus; then we show our features for the ranking function; finally, we discuss how to train the model from the extracted examples.
parallel corpus is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Data preparation
We start with a parallel corpus that is tokenised for both L1 and L2.
Data preparation
The parallel corpus is randomly sampled into two large and equally-sized parts.
Data preparation
1. using phrase-translation table T and parallel corpus split S
Experiments & Results
The data for our experiments were drawn from the Europarl parallel corpus (Koehn, 2005), from which we extracted two sets of 200,000 sentence pairs each for several language pairs.
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kim, Seokhwan and Lee, Gary Geunbae
Abstract
This paper proposes a novel graph-based projection approach and demonstrates its merits by using a Korean relation extraction system based on a dataset projected from an English-Korean parallel corpus.
Cross-lingual Annotation Projection for Relation Extraction
To accomplish that goal, the method automatically creates a set of annotated text for ft, utilizing a well-made extractor fs for a resource-rich source language Ls and a parallel corpus of Ls and Lt.
Graph Construction
where count(us, ut) is the number of alignments between us and ut across the whole parallel corpus.
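A small sketch of accumulating such alignment counts over a word-aligned bitext and normalizing them; the conditional normalization shown, and the toy data, are assumptions for illustration.

```python
# Sketch: accumulate count(us, ut) over a word-aligned parallel corpus and
# normalize into a conditional weight per source word.
from collections import Counter

def alignment_counts(bitext):
    # bitext: iterable of (src_tokens, tgt_tokens, links), links = {(i, j)}
    counts = Counter()
    for src, tgt, links in bitext:
        for i, j in links:
            counts[(src[i], tgt[j])] += 1
    return counts

bitext = [(["good", "morning"], ["bon", "matin"], {(0, 0), (1, 1)}),
          (["good", "night"], ["bonne", "nuit"], {(0, 0), (1, 1)})]
counts = alignment_counts(bitext)
totals = Counter()
for (us, _), c in counts.items():
    totals[us] += c
weights = {(us, ut): c / totals[us] for (us, ut), c in counts.items()}
print(weights[("good", "bon")])   # 0.5: "good" aligns to "bon" and "bonne"
```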
Implementation
We used an English-Korean parallel corpus [1] that contains 266,892 bi-sentence pairs in English and Korean.
Implementation
[1] The parallel corpus collected is available on our website: http://isoft.postech.ac.kr/~megaup/acl/datasets
Implementation
The English sentence annotations in the parallel corpus were then propagated into the corresponding Korean sentences.
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
We also consider the case where a parallel corpus is not available: to obtain a pseudo-parallel corpus U (i.e.
Conclusion
In this paper, we study bilingual sentiment classification and propose a joint model to simultaneously learn better monolingual sentiment classifiers for each language by exploiting an unlabeled parallel corpus together with the labeled data available for each language.
Experimental Setup 4.1 Data Sets and Preprocessing
For the unlabeled parallel text, we use the ISI Chinese-English parallel corpus (Munteanu and Marcu, 2005), which was extracted automatically from news articles published by Xinhua News Agency in the Chinese Gigaword (2nd Edition) and English Gigaword (2nd Edition) collections.
Experimental Setup 4.1 Data Sets and Preprocessing
We choose the most confidently predicted 10,000 positive and 10,000 negative pairs to constitute the unlabeled parallel corpus U for each data setting.
Introduction
Given the labeled data in each language, we propose an approach that exploits an unlabeled parallel corpus with the following
Related Work
(2007), for example, generate subjectivity analysis resources in a new language from English sentiment resources by leveraging a bilingual dictionary or a parallel corpus.
Results and Analysis
In our experiments, the methods are tested in the two data settings with the corresponding unlabeled parallel corpus, as mentioned in Section 4.6. We use
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
AL-SMT: Multilingual Setting
Consider a multilingual parallel corpus, such as EuroParl, which contains parallel sentences for several languages.
Experiments
We preprocessed the EuroParl corpus (http://www.statmt.org/europarl) (Koehn, 2005) and built a multilingual parallel corpus with 653,513 sentences, excluding the Q4/2000 portion of the data (2000-10 to 2000-12), which is reserved as the test set.
Experiments
The test set consists of 2,000 multi-language sentences and comes from the multilingual parallel corpus built from Q4/2000 portion of the data.
Introduction
The main source of training data for statistical machine translation (SMT) models is a parallel corpus .
Introduction
In many cases, the same information is available in multiple languages simultaneously as a multilingual parallel corpus, e.g., European Parliament (EuroParl) and UN.
Introduction
In this paper, we consider how to use active learning (AL) in order to add a new language to such a multilingual parallel corpus and at the same time we construct an MT system from each language in the original corpus into this new target language.
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Bernhard, Delphine and Gurevych, Iryna
Parallel Datasets
Question-Answer Pairs (WAQA) In this setting, question-answer pairs are considered as a parallel corpus .
Parallel Datasets
(2008) has shown that the best results are obtained by pooling the question-answer pairs {(q, a)1, ..., (q, a)n} and the answer-question pairs {(a, q)1, ..., (a, q)n} for training, so that we obtain the following parallel corpus: {(q, a)1, ..., (q, a)n} ∪ {(a, q)1, ..., (a, q)n}. Overall, this corpus contains 1,227,362 parallel pairs and will be referred to as WAQA (WikiAnswers Question-Answers) in the rest of the paper.
Parallel Datasets
Question Reformulations (WAQ) In this setting, question and question reformulation pairs are considered as a parallel corpus, e.g.
Related Work
Murdock and Croft (2005) created a first parallel corpus of synonym pairs extracted from WordNet, and an additional parallel corpus of English words translating to the same Arabic term in a parallel English-Arabic corpus.
parallel corpus is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Liu, Shujie and Li, Mu and Zhou, Ming and Yu, Nenghai
Experiments and Results
Our parallel corpus contains about 26 million unique sentence pairs in total, mined from the web.
Experiments and Results
The result is not surprising considering our parallel corpus is quite large, and similar observations have been made in previous work such as (DeNero and Macherey, 2011): better alignment quality does not necessarily lead to a better end-to-end result.
Training
As we do not have a large manually word-aligned corpus, we use traditional word alignment models such as HMM and IBM Model 4 to generate word alignments on a large parallel corpus.
Training
Our vocabularies Vs and Vt contain the most frequent 100,000 words from each side of the parallel corpus, and all other words are treated as unknown words.
Training
As there is no clear stopping criterion, we simply run the stochastic optimizer through the parallel corpus for N iterations.
parallel corpus is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Experimental setup
We use a parallel corpus of 3.9M words consisting of 1.7M words from the NIST MT-08 training data set and 2.2M words extracted from parallel news stories on the
Experimental setup
The parallel corpus is used for building our phrase-based machine translation system and to add training data for our reordering model.
Experimental setup
For our English language model, we use the Gigaword English corpus in addition to the English side of our parallel corpus .
Generating reference reordering from parallel sentences
This model allows us to combine features from the original reordering model along with information coming from the alignments to find source reorderings given a parallel corpus and alignments.
Related work
(DeNero and Uszkoreit, 2011; Visweswariah et al., 2011; Neubig et al., 2012) focus on the use of manual word alignments to learn preordering models and in both cases no benefit was obtained by using the parallel corpus in addition to manual word alignments.
Results and Discussions
Table 3: mBLEU with different methods to generate reordering model training data from a machine aligned parallel corpus in addition to manual word alignments.
parallel corpus is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kuznetsova, Polina and Ordonez, Vicente and Berg, Alexander and Berg, Tamara and Choi, Yejin
Abstract
Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus with respect to a concrete application of image caption transfer.
Code was provided by Deng et al. (2012).
We evaluate the usefulness of our new image-text parallel corpus for automatic generation of image descriptions.
Code was provided by Deng et al. (2012).
Therefore, we also report scores based on semantic matching, which gives partial credit to word pairs based on their lexical similarity. The best-performing approach with semantic matching is VISUAL (with LM = Image corpus), improving BLEU, Precision, and F-score substantially over those of ORIG, demonstrating the extrinsic utility of our newly generated image-text parallel corpus in comparison to the original database.
Conclusion
We have introduced the task of image caption generalization as a means to reduce noise in the parallel corpus of images and text.
Introduction
Evaluation results show both the intrinsic quality of the generalized captions and the extrinsic utility of the new image-text parallel corpus.
Introduction
The new parallel corpus will be made publicly available.
parallel corpus is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Prettenhofer, Peter and Stein, Benno
Cross-Language Text Classification
For instance, cross-language latent semantic indexing (Dumais et al., 1997) and cross-language explicit semantic analysis (Potthast et al., 2008) estimate θ using a parallel corpus.
Introduction
The approach uses unlabeled documents from both languages along with a small number (100 - 500) of translated words, instead of employing a parallel corpus or an extensive bilingual dictionary.
Related Work
Dumais et al. (1997) is considered seminal work in CLIR: they propose a method which induces semantic correspondences between two languages by performing latent semantic analysis (LSA) on a parallel corpus.
Related Work
The major limitation of these approaches is their computational complexity and, in particular, the dependence on a parallel corpus, which is hard to obtain, especially for less resource-rich languages.
Related Work
Gliozzo and Strapparava (2005) circumvent the dependence on a parallel corpus by using so-called multilingual domain models, which can be acquired from comparable corpora in an unsupervised manner.
parallel corpus is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Abstract
A simple statistical machine translation method, word-by-word decoding, which requires not a parallel corpus but only a bilingual lexicon, is adopted for the treebank translation.
Introduction
In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if no parallel corpus can be found that properly covers the source and target treebanks.
Introduction
However, dependency parsing focuses on the relations of word pairs, which allows us to use dictionary-based translation without assuming that a parallel corpus is available; the training stage of translation may then be skipped, and decoding will be quite fast in this case.
The Related Work
The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach has the merit of simplicity, as only a bilingual lexicon is required.
Treebank Translation and Dependency Transformation
Since we use a lexicon rather than a parallel corpus to estimate the translation probabilities, we simply assign uniform probabilities to all translation options.
parallel corpus is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Snyder, Benjamin and Barzilay, Regina
Model
High-level Generative Story We have a parallel corpus of several thousand short phrases in the two languages E and F.
Model
Once A, E, and F have been drawn, we model our parallel corpus of short phrases as a series of independent draws from a phrase-pair generation model.
Related Work
Given a parallel corpus, the annotations are projected from this source language to its counterpart, and the resulting annotations are used for supervised training in the target language.
Related Work
While their approach does not require a parallel corpus, it does assume the availability of annotations in one language.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Experiments
The reason is that a large-scale parallel corpus retains more bilingual knowledge and linguistic phenomena, while a small in-domain corpus suffers from data sparseness, which degrades translation performance.
Experiments
Results of the systems trained on only a subset of the general-domain parallel corpus .
Introduction
For this, an effective approach is to automatically select and expand domain-specific sentence pairs from a large-scale general-domain parallel corpus.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Experiments
Dataset and SMT Pipeline We use the NIST MT Chinese-English parallel corpus (NIST), excluding non-UN and non-HK Hansards portions, as our training dataset.
Polylingual Tree-based Topic Models
In addition, we extract the word alignments from aligned sentences in a parallel corpus .
Topic Models for Machine Translation
For a parallel corpus of aligned source and target sentences (F, E), a phrase f ∈ F is translated to a phrase e ∈ E according to a distribution pw(e|f). One popular method to estimate the probability
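The snippet breaks off here; one standard estimator in the SMT literature is relative frequency over extracted phrase-pair counts, sketched below. This is illustrative and not necessarily the method the paper goes on to describe; all counts are made up.

```python
# Sketch: relative-frequency estimation of p(e|f) from extracted
# phrase-pair counts, a standard SMT estimator.
from collections import Counter

pair_counts = Counter({("maison", "house"): 8, ("maison", "home"): 2})
f_totals = Counter()
for (f, _), c in pair_counts.items():
    f_totals[f] += c

def p_e_given_f(e, f):
    # p(e|f) = count(f, e) / sum over e' of count(f, e')
    return pair_counts[(f, e)] / f_totals[f] if f_totals[f] else 0.0

print(p_e_given_f("house", "maison"))   # 0.8
```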
Topic Models for Machine Translation
Our contribution is topics that capture multilingual information and thus better capture the domains in the parallel corpus.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Subotin, Michael
Conclusion
This paper has introduced a scalable exponential phrase model for target languages with complex morphology that can be trained on the full parallel corpus .
Conclusion
The results suggest that the model should be especially useful for languages with sparser resources, but that performance improvements can be obtained even for a very large parallel corpus .
Exponential phrase models with shared features
We obtain the counts of instances and features from the standard heuristics used to extract the grammar from a word-aligned parallel corpus.
Features
[2] Note that this model is estimated from the full parallel corpus, rather than a held-out development set.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Abstract
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus.
Experiments and Results
A 5-gram language model is trained on the English side of the parallel corpus.
Experiments and Results
SSP puts overly strong alignment constraints on the parallel corpus, which impacts performance dramatically.
Introduction
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and ends with a model-weight tuning step.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Experimental Evaluation
We use the first 100k sentences of the parallel corpus for the TM, and the whole parallel corpus for the LM.
Experimental Evaluation
Finally, we varied the size of the parallel corpus for the Japanese-English task from 50k to 400k sentences.
parallel corpus is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Faruqui, Manaal and Dyer, Chris
Experiments
Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different.
Word Clustering
For concreteness, A(x, y) will be the number of times that x is aligned to y in a word-aligned parallel corpus.
Word Clustering
We compare two different clusterings of a two-sentence Arabic-English parallel corpus (the English half of the corpus contains the same sentence, twice, while the Arabic half has two variants with the same meaning).
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kozhevnikov, Mikhail and Titov, Ivan
Conclusion
Annotation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, whereas cross-lingual model transfer can be performed using only a bilingual dictionary.
Evaluation
Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion.
Model Transfer
The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus, by identifying, for each word in the target language,
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Experiments
In contrast, because the alignment entropy doesn’t depend on the gold standard, one can easily report the alignment performance on any unaligned parallel corpus .
Introduction
Since the models are trained on an aligned parallel corpus , the resulting statistical models can only be as good as the alignment of the corpus.
Transliteration alignment techniques
The alignment can be performed via Expectation-Maximization (EM) by starting with a random initial alignment and calculating the affinity matrix count(ei, cj) over the whole parallel corpus, where element (i, j) is the number of times character ei was aligned to cj.
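A compact sketch of such an EM loop follows; an IBM-Model-1-style expected-count update is assumed for the E-step (the paper's exact model may differ), and the data are toy Latin-alphabet pairs.

```python
# Sketch: EM for character alignment. The E-step collects expected counts
# into the affinity matrix count(e, c); the M-step renormalizes into p(c|e).
from collections import defaultdict

def em_align(pairs, iterations=5):
    prob = defaultdict(lambda: 1.0)            # flat start
    for _ in range(iterations):
        count = defaultdict(float)             # affinity matrix count(e, c)
        total = defaultdict(float)
        for e_chars, c_chars in pairs:
            for c in c_chars:
                z = sum(prob[(e, c)] for e in e_chars)
                for e in e_chars:              # fractional (expected) counts
                    count[(e, c)] += prob[(e, c)] / z
        for (e, c), v in count.items():
            total[e] += v
        prob = defaultdict(float,
                           {(e, c): v / total[e] for (e, c), v in count.items()})
    return prob

pairs = [("anna", "ana"), ("nina", "nina")]    # toy transliteration pairs
prob = em_align(pairs)
print(max(prob.items(), key=lambda kv: kv[1]))
```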
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Clustering for Cross Lingual Sentiment Analysis
As a viable alternative, cluster linkages could be learned from a bilingual parallel corpus and these linkages can be used to bridge the language gap for CLSA.
Experimental Setup
The English-Hindi parallel corpus contains 45,992 sentences and the English-Marathi parallel corpus contains 47,881 sentences.
Introduction
To perform CLSA, this study leverages an unlabelled parallel corpus to generate the word alignments.
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Experiments
The EC parallel corpus in our experiments was constructed using several LDC bilingual corpora.
Experiments
In our experiment, we implemented DIRT and extracted paraphrase patterns from the English part of our bilingual parallel corpus.
Proposed Method
An English-Chinese (EC) bilingual parallel corpus is employed for training.
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Abstract
Without using extra paraphrase resources, we acquire the rules by comparing the source side of the parallel corpus with the target-to-source translations of the target side.
Extraction of Paraphrase Rules
We train a source-to-target PBMT system (SYS_ST) and a target-to-source PBMT system (SYS_TS) on the parallel corpus .
Forward-Translation vs. Back-Translation
Note that all the texts of S0, S1, S2, T0 and T1 are sentence-aligned because the initial parallel corpus (S0, T0) is aligned at the sentence level.
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berant, Jonathan and Liang, Percy
Introduction
We use two complementary paraphrase models: an association model based on aligned phrase pairs extracted from a monolingual parallel corpus, and a vector space model, which represents each utterance as a vector and learns a similarity score between them.
Introduction
(2013) presented a QA system that maps questions onto simple queries against Open IE extractions, by learning paraphrases from a large monolingual parallel corpus , and performing a single paraphrasing step.
Model overview
Our framework accommodates any paraphrasing method, and in this paper we propose an association model that learns to associate natural language phrases that co-occur frequently in a monolingual parallel corpus, combined with a vector space model, which learns to score the similarity between vector representations of natural language utterances (Section 5).
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Instead of using a parallel corpus, labeled and unlabeled instances in one language are translated into ones in the other language, and all instances in both languages are then fed into a bilingual active learning engine as pseudo parallel corpora.
Abstract
Instead of using a parallel corpus, which should have entity/relation alignment information and is thus difficult to obtain, this paper employs an off-the-shelf machine translator to translate both labeled and unlabeled instances from one language into the other, forming pseudo parallel corpora.
Abstract
Our lexicon is derived from the FBIS parallel corpus (#LDC2003E14), which is widely used in machine translation between English and Chinese.
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
Our goal is to obtain translation distributions for source phrases that are not present in the phrase table extracted from the parallel corpus .
Generation & Propagation
The label space is thus the phrasal translation inventory, and like the source side it can also be represented in terms of a graph, initially consisting of target phrase nodes from the parallel corpus .
Generation & Propagation
Thus, the target phrase inventory from the parallel corpus may be inadequate for unlabeled instances.
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Das, Dipanjan and Petrov, Slav
Approach Overview
Central to our approach (see Algorithm 1) is a bilingual similarity graph built from a sentence-aligned parallel corpus .
Graph Construction
The graph vertices are extracted from the different sides of a parallel corpus (De, Df) and an additional unlabeled monolingual foreign corpus Ff, which will be used later for training.
Graph Construction
Since our graph is built from a parallel corpus, we can use standard word alignment techniques to align the English sentences De
parallel corpus is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: