Index of papers in Proc. ACL 2014 that mention
  • parallel data
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Abstract
In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic-relevant monolingual data.
Background: Deep Learning
Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase.
Experiments
In the pre-training phase, the two sides of the parallel data are fed into two separate neural networks for DAE training, where the network parameters W and b are randomly initialized.
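As a minimal sketch of the DAE training step described above, with randomly initialized W and b as in the excerpt (the tied weights, corruption rate, layer sizes, learning rate, and cross-entropy loss are illustrative assumptions, not details from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_dae(n_in, n_hid):
    # W and b are randomly initialized, as in the excerpt.
    return {"W": rng.normal(0.0, 0.01, (n_hid, n_in)),
            "b": np.zeros(n_hid),
            "b_out": np.zeros(n_in)}

def dae_step(p, x, lr=0.1, corrupt=0.3):
    """One gradient step of a tied-weight denoising autoencoder."""
    x_tilde = x * (rng.random(x.shape) > corrupt)   # corrupt the input
    h = sigmoid(p["W"] @ x_tilde + p["b"])          # encode
    x_hat = sigmoid(p["W"].T @ h + p["b_out"])      # reconstruct
    # Cross-entropy reconstruction error against the clean input.
    loss = -np.sum(x * np.log(x_hat + 1e-12)
                   + (1 - x) * np.log(1 - x_hat + 1e-12))
    d_out = x_hat - x                               # gradient at the output
    d_h = (p["W"] @ d_out) * h * (1.0 - h)          # gradient at the hidden layer
    p["W"] -= lr * (np.outer(d_h, x_tilde) + np.outer(h, d_out))
    p["b"] -= lr * d_h
    p["b_out"] -= lr * d_out
    return loss

# Toy usage: sparse binary "sentence vectors".
params = init_dae(n_in=100, n_hid=50)
for _ in range(200):
    x = (rng.random(100) < 0.1).astype(float)
    dae_step(params, x)
```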
Experiments
The parallel data we use is released by LDC.
Experiments
Translation models are trained over the parallel data that is automatically word-aligned.
Introduction
One property these approaches have in common is that they only utilize parallel data where document boundaries are explicitly given.
Introduction
However, this condition does not always hold, since there is a considerable amount of parallel data that has no document boundaries.
Introduction
This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible.
Topic Similarity Model with Neural Network
learn topic representations using sentence-level parallel data.
Topic Similarity Model with Neural Network
3.2 Fine-tuning with parallel data
Topic Similarity Model with Neural Network
Consequently, the whole neural network can be fine-tuned towards the supervised criterion with the help of parallel data.
parallel data is mentioned in 15 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Data and Tools
The parallel data come from the Europarl corpus version 7 (Koehn, 2005) and the Kaist Corpus.
Data and Tools
The parallel data for these three languages are also from the Europarl corpus version 7.
Data and Tools
POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to pro-
Experiments
Japanese and Indonesian are excluded, as no practicable parallel data are available.
Introduction
In this paper, we consider a practically motivated scenario, in which we want to build statistical parsers for resource-poor target languages using existing resources from a resource-rich source language (like English). We assume that there are absolutely no labeled training data for the target language, but that we have access to parallel data with a resource-rich language and to a sufficient amount of labeled training data for building an accurate parser for the resource-rich language.
Introduction
… (2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used.
Introduction
We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
Our Approach
Another advantage of the learning framework is that it combines the likelihood on parallel data with the confidence on unlabeled data, so that both parallel text and unlabeled text can be utilized in our approach.
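The excerpts name this combined objective but not its form; one plausible shape, where the trade-off weight γ and the confidence term C(x; λ) are assumptions rather than the paper's exact definitions, is:

```latex
\lambda^{*} = \arg\max_{\lambda}\;
  \sum_{(x_i,\, a_i) \in P} \log p_{\lambda}(y_i \mid x_i)
  \;+\; \gamma \sum_{x_j \in U} C(x_j;\, \lambda)
```

Here the first sum is the likelihood on the parallel data (guided through the alignments a_i) and the second is the confidence on the unlabeled data.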
Our Approach
In our scenario, we have a set of aligned parallel data P = {(x_i^E, x_i, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^E, x_i), and a set of unlabeled sentences of the target language U = {x_j}. We also have a trained English parsing model p_λE. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
Our Approach
We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_λE(y | x) via parallel data with word alignments:
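The equation this excerpt introduces is cut off in the index. Purely as a hedged illustration of scoring a target-language edge through word alignments (the function, its backoff constant, and the max-over-aligned-pairs rule are assumptions, not the paper's definition):

```python
def transfer_edge_score(src_scores, alignment, head_t, dep_t, floor=1e-6):
    """Score a target-language dependency edge (head_t -> dep_t) by
    looking up English edges whose endpoints align to the two tokens.

    src_scores: dict mapping (head_e, dep_e) index pairs to the English
                model's edge probability.
    alignment:  set of (e_index, t_index) word-alignment links.
    """
    heads_e = [e for (e, t) in alignment if t == head_t]
    deps_e = [e for (e, t) in alignment if t == dep_t]
    scores = [src_scores.get((h, d), 0.0) for h in heads_e for d in deps_e]
    # Back off to a small constant when no aligned English edge exists.
    return max(scores, default=floor)
```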
parallel data is mentioned in 22 sentences in this paper.
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation
We also examine how our approach can learn from noisy parallel data compared to the traditional SMT system.
Evaluation
We used this set in two ways: either to augment the parallel data presented in Table 2, or to augment the non-comparable monolingual data in Table 3 for graph construction.
Generation & Propagation
If a source phrase is found in the baseline phrase table it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label, and is sub-
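As a sketch of the labeled-phrase case just described, the label is simply the normalized count distribution over target phrases co-occurring with the source phrase in the extracted phrase pairs (the helper below is illustrative, not code from the paper):

```python
from collections import Counter, defaultdict

def phrase_labels(phrase_pairs):
    """Build the empirical distribution p(target | source) from
    extracted phrase pairs, one tuple per occurrence in the data."""
    counts = defaultdict(Counter)
    for src, tgt in phrase_pairs:
        counts[src][tgt] += 1
    return {src: {tgt: c / sum(tgts.values()) for tgt, c in tgts.items()}
            for src, tgts in counts.items()}

# Example: a source phrase seen three times with two translations.
labels = phrase_labels([("maison", "house"), ("maison", "house"),
                        ("maison", "home")])
# labels["maison"] == {"house": 2/3, "home": 1/3}
```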
Introduction
However, the limiting factor in the success of these techniques is parallel data availability.
Introduction
While parallel data is generally scarce, monolingual resources exist in abundance and are being created at accelerating rates.
Introduction
Can we use monolingual data to augment the phrasal translations acquired from parallel data?
parallel data is mentioned in 13 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Abstract
Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences.
Abstract
Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.
Approach
The idea is that, given enough parallel data, a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences.
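A minimal sketch of this intuition, in the spirit of the margin-based objective the abstract describes (the additive sentence composition and the margin value are simplifications, not the paper's exact model):

```python
import numpy as np

def compose(word_vecs):
    """Toy sentence representation: the sum of its word embeddings."""
    return np.sum(word_vecs, axis=0)

def margin_loss(src_sent, tgt_sent, noise_sent, m=1.0):
    """Pull parallel sentence embeddings together while keeping a
    non-parallel (noise) sentence at least margin m further away."""
    a, b = compose(src_sent), compose(tgt_sent)
    n = compose(noise_sent)
    d_pos = np.sum((a - b) ** 2)   # distance to the true translation
    d_neg = np.sum((a - n) ** 2)   # distance to the noise sentence
    return max(0.0, m + d_pos - d_neg)

# Toy usage: three 4-word "sentences" with 50-dimensional embeddings.
rng = np.random.default_rng(0)
s, t, n = rng.normal(size=(3, 4, 50))
print(margin_loss(s, t, n))
```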
Conclusion
To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models.
Overview
A key difference between our approach and those listed above is that we only require sentence-aligned parallel data in our otherwise unsupervised learning function.
Overview
Parallel data in multiple languages provides an
Related Work
However, there exists a corpus of prior work on learning multilingual embeddings or on using parallel data to transfer linguistic information across languages.
Related Work
… (2012), our baseline in §5.2, use a form of multi-agent learning on word-aligned parallel data to transfer embeddings from one language to another.
parallel data is mentioned in 8 sentences in this paper.
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Experiments
Additionally, we adopt GIZA++ to obtain the word alignment of the in-domain parallel data and to build the word translation probability table.
Experiments
We adopt five methods for extracting domain-relevant parallel data from the general-domain corpus.
Experiments
When the top 600k sentence pairs are selected from the general-domain corpus to train machine translation systems, those systems outperform the General-domain baseline trained on all 16 million parallel sentence pairs.
Training Data Selection Methods
These methods are based on a language model and a translation model, both trained on the small in-domain parallel data.
Training Data Selection Methods
t(e_j|f_i) is the translation probability of word e_j conditioned on word f_i, and is estimated from the small in-domain parallel data.
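A hedged sketch of how such a word translation table can score candidate sentence pairs for data selection, in IBM Model 1 style (the length normalization and probability floor are assumptions; the paper's exact scoring function may differ):

```python
import math

def pair_score(src_words, tgt_words, t_table, floor=1e-9):
    """Length-normalized log score of a sentence pair under a word
    translation table t(e|f), IBM Model 1 style."""
    total = 0.0
    for e in tgt_words:
        # Average the translation probability of e over all source words.
        p = sum(t_table.get((e, f), 0.0) for f in src_words) / len(src_words)
        total += math.log(max(p, floor))
    return total / len(tgt_words)
```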
parallel data is mentioned in 5 sentences in this paper.
Auli, Michael and Gao, Jianfeng
Experiments
We use a phrase-based system similar to Moses (Koehn et al., 2007) based on a set of common features, including maximum likelihood estimates p_ML(e|f) and p_ML(f|e), lexically weighted estimates p_LW(e|f) and p_LW(f|e), word and phrase penalties, a hierarchical reordering model (Galley and Manning, 2008), a linear distortion feature, and a modified Kneser-Ney language model trained on the target side of the parallel data.
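Of the features listed, the lexically weighted estimate is the least self-explanatory. A minimal sketch of the standard Koehn-style computation (handling of unaligned target words via NULL is omitted, and the helper itself is illustrative):

```python
def lex_weight(e_phrase, f_phrase, align, w):
    """Lexically weighted estimate p_LW(e|f): for each target word,
    average the word translation probabilities w(e_i|f_j) over its
    aligned source words, then multiply over target positions.

    align: set of (i, j) links from e_phrase[i] to f_phrase[j].
    w:     dict mapping (e, f) word pairs to probabilities.
    """
    prob = 1.0
    for i, e in enumerate(e_phrase):
        links = [j for (i2, j) in align if i2 == i]
        if links:
            prob *= sum(w.get((e, f_phrase[j]), 0.0) for j in links) / len(links)
        # Unaligned target words would use w(e | NULL); omitted here.
    return prob
```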
Experiments
Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings.
Experiments
All neural network models are trained on the news portion of the parallel data, corresponding to 136K sentences, which we found to be most useful in initial experiments.
parallel data is mentioned in 3 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
We align the parallel data with GIZA++ (Och and Ney, 2003) and decode using Moses (Koehn et al., 2007).
Experimental Setup
A KN-smoothed 5-gram language model is trained on the target side of the parallel data with SRILM (Stolcke, 2002).
Related Work
With word-boundary-aware phrase extraction, a phrase pair containing all of “with his blue car” must have been seen in the parallel data to translate the phrase correctly at test time.
parallel data is mentioned in 3 sentences in this paper.
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
Introduction
For statistical machine translation (MT), which relies on the existence of parallel data , translating from nonstandard dialects is a challenge.
Machine Translation Experiments
DAT (in the fourth column) is the DA part of the 5M-word DA-En parallel data, processed with the DA-MSA MT system.
Related Work
Two approaches have emerged to alleviate the problem of DA-English parallel data scarcity: using MSA as a bridge language (Sawaf, 2010; Salloum and Habash, 2011; Salloum and Habash, 2013; Sajjad et al., 2013), and using crowdsourcing to acquire parallel data (Zbib et al., 2012).
parallel data is mentioned in 3 sentences in this paper.