Index of papers in Proc. ACL that mention
  • parallel data
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation
We also examine how our approach can learn from noisy parallel data compared to the traditional SMT system.
Evaluation
We used this set in two ways: either to augment the parallel data presented in Table 2, or to augment the non-comparable monolingual data in Table 3 for graph construction.
Generation & Propagation
If a source phrase is found in the baseline phrase table it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label…
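For illustration, a minimal sketch of how such a label can be estimated: the empirical conditional distribution over target phrases, computed from phrase pairs extracted from word-aligned parallel data. The function name and toy data below are ours, not the authors' code.

```python
from collections import Counter, defaultdict

def empirical_labels(phrase_pairs):
    """Estimate p(target | source) from counted phrase-pair extractions.

    phrase_pairs: iterable of (source_phrase, target_phrase) tuples,
    e.g. read off a word-aligned parallel corpus.
    """
    counts = defaultdict(Counter)
    for src, tgt in phrase_pairs:
        counts[src][tgt] += 1
    labels = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        labels[src] = {tgt: c / total for tgt, c in tgt_counts.items()}
    return labels

# A source phrase seen in the baseline phrase table gets a label:
labels = empirical_labels([("maison", "house"), ("maison", "house"),
                           ("maison", "home")])
print(labels["maison"])  # {'house': 0.67, 'home': 0.33} up to rounding
```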
Introduction
However, the limiting factor in the success of these techniques is parallel data availability.
Introduction
While parallel data is generally scarce, monolingual resources exist in abundance and are being created at accelerating rates.
Introduction
Can we use monolingual data to augment the phrasal translations acquired from parallel data?
parallel data is mentioned in 13 sentences in this paper.
Ma, Xuezhe and Xia, Fei
Data and Tools
The parallel data come from the Europarl corpus version 7 (Koehn, 2005) and the Kaist Corpus.
Data and Tools
The parallel data for these three languages are also from the Europarl corpus version 7.
Data and Tools
POS tags are not available for parallel data in the Europarl and Kaist corpus, so we need to…
Experiments
Japanese and Indonesian are excluded as no practicable parallel data are available.
Introduction
In this paper, we consider a practically motivated scenario, in which we want to build statistical parsers for resource-poor target languages, using existing resources from a resource-rich source language (like English). We assume that there are absolutely no labeled training data for the target language, but we have access to parallel data with a resource-rich language and a sufficient amount of labeled training data to build an accurate parser for the resource-rich language.
Introduction
(2011) proposed an approach for unsupervised dependency parsing with nonparallel multilingual guidance from one or more helper languages, in which parallel data is not used.
Introduction
We train probabilistic parsing models for resource-poor languages by maximizing a combination of likelihood on parallel data and confidence on unlabeled data.
Our Approach
Another advantage of the learning framework is that it combines both the likelihood on parallel data and confidence on unlabeled data, so that both parallel text and unlabeled data can be utilized in our approach.
Our Approach
In our scenario, we have a set of aligned parallel data P = {(x_i^s, x_i^t, a_i)}, where a_i is the word alignment for the pair of source-target sentences (x_i^s, x_i^t), and a set of unlabeled sentences of the target language U = {x_j^t}. We also have a trained English parsing model p_λE. Then the K in equation (7) can be divided into two cases, according to whether x_i belongs to the parallel data set P or the unlabeled data set U.
Our Approach
We define the transferring distribution by defining the transferring weight utilizing the English parsing model p_λE via parallel data with word alignments:
parallel data is mentioned in 22 sentences in this paper.
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
…unlabeled (with respect to sentiment) bilingual (in L1 and L2) parallel data U that are defined as follows.
A Joint Model with Unlabeled Parallel Text
where v ∈ {1, 2} denotes L1 or L2; the first term on the right-hand side is the likelihood of labeled data for both D1 and D2; and the second term is the likelihood of the unlabeled parallel data U.
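As a rough sketch, an objective of this shape can be written as follows (notation ours; the paper's exact formula may differ). The first sum is the labeled-data likelihood for the two languages, the second the likelihood of unlabeled parallel pairs under a shared latent sentiment label y:

```latex
% Joint objective sketch: labeled likelihood per language plus the
% likelihood of unlabeled parallel pairs under a shared latent label y.
\mathcal{L}(\theta_1, \theta_2) =
  \sum_{v \in \{1,2\}} \sum_{(d, y_d) \in D_v} \log p_{\theta_v}(y_d \mid d)
  + \sum_{(s_1, s_2) \in U} \log \sum_{y} p_{\theta_1}(y \mid s_1) \, p_{\theta_2}(y \mid s_2)
```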
A Joint Model with Unlabeled Parallel Text
However, there could be considerable noise in real-world parallel data…
Abstract
We present a novel approach for joint bilingual sentiment classification at the sentence level that augments available labeled data in each language with unlabeled parallel data.
Experimental Setup 4.1 Data Sets and Preprocessing
We also try to remove neutral sentences from the parallel data since they can introduce noise into our model, which deals only with positive and negative examples.
Experimental Setup 4.1 Data Sets and Preprocessing
Co-Training with SVMs (Co-SVM): This method applies SVM-based co-training given both the labeled training data and the unlabeled parallel data following Wan (2009).
Introduction
We furthermore find that improvements, albeit smaller, are obtained when the parallel data is replaced with a pseudo-parallel (i.e., automatically translated) corpus.
Results and Analysis
By making use of the unlabeled parallel data, our proposed approach improves the accuracy, compared to MaxEnt, by 8.12% (or 33.27% error reduction) on English and 3.44% (or 16.92% error reduction) on Chinese in the first setting, and by 5.07% (or 19.67% error reduction) on English and 3.87% (or 19.4% error reduction) on Chinese in the second setting.
parallel data is mentioned in 25 sentences in this paper.
Ravi, Sujith and Knight, Kevin
Introduction
Of course, for many language pairs and domains, parallel data is not available.
Machine Translation as a Decipherment Task
We now turn to the problem of MT without parallel data.
Machine Translation as a Decipherment Task
Next, we present two novel decipherment approaches for MT training without parallel data .
Machine Translation as a Decipherment Task
Bayesian Decipherment: We introduce a novel method for estimating IBM Model 3 parameters without parallel data, using Bayesian learning.
Word Substitution Decipherment
Before we tackle machine translation without parallel data, we first solve a simpler problem: word substitution decipherment.
parallel data is mentioned in 12 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Abstract
In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic relevant monolingual data.
Background: Deep Learning
Inspired by previous successful research, we first learn sentence representations using topic-related monolingual texts in the pre-training phase, and then optimize the bilingual similarity by leveraging sentence-level parallel data in the fine-tuning phase.
Experiments
In the pre-training phase, all parallel data is fed into two neural networks respectively for DAE training, where network parameters W and b are randomly initialized.
Experiments
The parallel data we use is released by LDC.
Experiments
Translation models are trained over the parallel data that is automatically word-aligned.
Introduction
One typical property of these approaches in common is that they only utilize parallel data where document boundaries are explicitly given.
Introduction
However, this situation does not always hold, since there is a considerable amount of parallel data which does not have document boundaries.
Introduction
This underlying topic space is learned from sentence-level parallel data in order to share topic information across the source and target languages as much as possible.
Topic Similarity Model with Neural Network
…learn topic representations using sentence-level parallel data.
Topic Similarity Model with Neural Network
3.2 Fine-tuning with parallel data
Topic Similarity Model with Neural Network
Consequently, the whole neural network can be fine-tuned towards the supervised criteria with the help of parallel data.
parallel data is mentioned in 15 sentences in this paper.
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Abstract
In this paper, we propose a generative cross-lingual mixture model (CLMM) to leverage unlabeled bilingual parallel data.
Abstract
By fitting parameters to maximize the likelihood of the bilingual parallel data, the proposed model learns previously unseen sentiment words from the large bilingual parallel data and improves vocabulary coverage significantly.
Experiment
CLMM includes two hyper-parameters (λs and λt) controlling the contribution of unlabeled parallel data.
Experiment
4.5 The Influence of Unlabeled Parallel Data
Experiment
We investigate how the size of the unlabeled parallel data affects the sentiment classification in this subsection.
Introduction
Instead of relying on the unreliable machine translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language.
Introduction
CLMM is a generative model that treats the source language and target language words in parallel data as generated simultaneously by a set of mixture components.
Introduction
This paper makes two contributions: (1) we propose a model to effectively leverage large bilingual parallel data for improving vocabulary coverage; and (2) the proposed model is applicable in both settings of cross-lingual sentiment classification, irrespective of the availability of labeled data in the target language.
parallel data is mentioned in 12 sentences in this paper.
Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam
Abstract
…of language pairs, large amounts of parallel data…
Abstract
However, for most language pairs and domains there is little to no curated parallel data available.
Abstract
Hence discovery of parallel data is an important first step for translation between most of the world’s languages.
parallel data is mentioned in 17 sentences in this paper.
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Abstract
As a supplement to existing parallel training data, our automatically extracted parallel data yields substantial translation quality improvements in translating microblog text and modest improvements in translating edited news commentary.
Introduction
Section 2 describes the related work in parallel data extraction.
Introduction
Section 3 presents our model to extract parallel data within the same document.
Parallel Data Extraction
We will now describe our method to extract parallel data from Microblogs.
Parallel Data Extraction
…these are also considered for the extraction of parallel data.
Parallel Segment Retrieval
Prior work on finding parallel data attempts to reason about the probability that pairs of documents (x, y) are parallel.
Parallel Segment Retrieval
…given a document X = x_1, …, x_n consisting of n tokens, we need to determine whether there is parallel data in X, and if so, where the parallel segments are and what their languages are.
Parallel Segment Retrieval
The main problem we address is to find the parallel data when the boundaries of the parallel segments are not defined explicitly.
Related Work
Automatic collection of parallel data is a well-studied problem.
Related Work
We aim to propose a method that acquires large amounts of parallel data for free.
parallel data is mentioned in 28 sentences in this paper.
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
Experiments
…of the full parallel text; we do not use the English side of the parallel data for actually building systems.
Experiments
One disadvantage to the previous method for evaluating the SENSESPOTTING task is that it requires parallel data in a new domain.
Experiments
Suppose we have no parallel data in the new domain at all, yet still want to attack the SENSESPOTTING task.
Introduction
We operate under the framework of phrase sense disambiguation (Carpuat and Wu, 2007), in which we automatically align parallel data in an old domain to generate an initial old-domain sense inventory.
New Sense Indicators
Table 2: Basic characteristics of the parallel data.
Related Work
In contrast, the SENSESPOTTING task consists of detecting when senses are unknown in parallel data.
Task Definition
From an applied perspective, the assumption of a small amount of parallel data in the new domain is reasonable: if we want an MT system for a new domain, we will likely have some data for system tuning and evaluation.
parallel data is mentioned in 10 sentences in this paper.
Sajjad, Hassan and Darwish, Kareem and Belinkov, Yonatan
Abstract
Further, adapting large MSA/English parallel data increases the lexical coverage, reduces OOVs to 0.7%, and leads to an absolute BLEU improvement of 2.73 points.
Conclusion
…adapted parallel data showed an improvement of 1.87 BLEU points over our best baseline.
Introduction
Later, we applied an adaptation method to incorporate MSA/English parallel data.
Introduction
— We built a phrasal Machine Translation (MT) system on adapted Egyptian/English parallel data, which outperformed a non-adapted baseline by 1.87 BLEU points.
Introduction
— We used phrase-table merging (Nakov and Ng, 2009) to utilize MSA/English parallel data with the available in-domain parallel data .
Previous Work
This can be done by either translating between the related languages using word-level translation, character level transformations, and language specific rules (Durrani et al., 2010; Hajic et al., 2000; Nakov and Tiedemann, 2012), or by concatenating the parallel data for both languages (Nakov and Ng, 2009).
Previous Work
These translation methods generally require parallel data, for which hardly any exists between dialects and MSA.
Previous Work
Their best Egyptian/English system was trained on dialect/English parallel data .
parallel data is mentioned in 9 sentences in this paper.
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Conclusion
We presented a novel approach for inducing oov translations from a monolingual corpus on the source side and parallel data using graph propagation.
Experiments & Results 4.1 Experimental Setup
From the dev and test sets, we extract all source words that do not appear in the phrase-table constructed from the parallel data.
Experiments & Results 4.1 Experimental Setup
Similarly, the values of the original four probability features in the phrase-table for the new entries are set to 1. The entire training pipeline is as follows: (i) a phrase table is constructed using parallel data as usual, (ii) oovs for dev and test sets are extracted, (iii) oovs are translated using graph propagation, (iv) oovs and translations are added to the phrase table, introducing a new feature type, (v) the new phrase table is tuned (with a LM) using MERT (Och, 2003) on the dev set.
Experiments & Results 4.1 Experimental Setup
The correctness of this gold standard is limited by the size of the parallel data used as well as the quality of the word alignment toolkit, and is not 100% precise.
Graph-based Lexicon Induction
Given a (possibly small) amount of parallel data between the source and target languages, and large monolingual data in the source language, we construct a graph over all phrase types in the monolingual text and the source side of the parallel corpus, and connect phrases that have similar meanings…
Graph-based Lexicon Induction
When a relatively small amount of parallel data is used, unlabeled nodes outnumber labeled ones, and many of them lie on the paths between an oov node and labeled ones.
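For illustration, a minimal sketch of textbook iterative label propagation over such a graph (the papers in this index use more elaborate propagation algorithms; the similarity matrix and labels below are toy inventions of ours):

```python
import numpy as np

def propagate_labels(W, labels, n_iter=50):
    """Simple iterative label propagation over a phrase graph.

    W      : (n, n) symmetric matrix of similarity edge weights
    labels : dict {node_index: distribution over translation candidates},
             the empirical distributions of the labeled (non-oov) nodes
    Returns an (n, k) matrix of propagated distributions.
    """
    n = W.shape[0]
    k = len(next(iter(labels.values())))
    Y = np.zeros((n, k))
    for i, dist in labels.items():
        Y[i] = dist
    deg = W.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0                  # isolated nodes keep zero labels
    P = W / deg                          # row-normalised transition matrix
    for _ in range(n_iter):
        Y = P @ Y                        # each node averages its neighbours
        for i, dist in labels.items():   # clamp the labeled nodes
            Y[i] = dist
    return Y

# Toy graph: labeled nodes 0 and 2, one unlabeled (oov) node 1 between them.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Y = propagate_labels(W, {0: [1.0, 0.0], 2: [0.0, 1.0]})
print(Y[1])  # the oov node ends up with a mixture of its neighbours' labels
```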
Introduction
Increasing the size of the parallel data can reduce the number of oovs.
Introduction
Pivot language techniques tackle this problem by taking advantage of available parallel data between the source language and a third language.
Introduction
…(2009) in which a graph is constructed from source language monolingual text and the source-side of the available parallel data.
parallel data is mentioned in 9 sentences in this paper.
Hermann, Karl Moritz and Blunsom, Phil
Abstract
Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences.
Abstract
Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data.
Approach
The idea is that, given enough parallel data, a shared representation of two parallel sentences would be forced to capture the common elements between these two sentences.
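A minimal sketch of a noise-contrastive objective of this kind, assuming additive composition of word vectors into sentence vectors (toy vectors and margin are ours; the model's actual composition functions and training details differ):

```python
import numpy as np

def sentence_vec(word_vecs):
    # Additive composition: a sentence embedding is the sum of its word vectors.
    return np.sum(word_vecs, axis=0)

def noise_contrastive_hinge(src, tgt, noise, margin=1.0):
    """Hinge loss pulling a parallel sentence pair together and pushing a
    random non-parallel (noise) pair apart; a sketch of the kind of
    objective described, not the paper's exact formulation.
    """
    e_pos = np.sum((src - tgt) ** 2)    # energy of the parallel pair
    e_neg = np.sum((src - noise) ** 2)  # energy of the noise pair
    return max(0.0, margin + e_pos - e_neg)

rng = np.random.default_rng(0)
en = sentence_vec(rng.normal(size=(5, 16)))   # toy English sentence, 5 words
de = en + rng.normal(scale=0.1, size=16)      # near-identical "parallel" vector
noise = sentence_vec(rng.normal(size=(7, 16)))
print(noise_contrastive_hinge(en, de, noise))  # 0.0: the aligned pair is far closer
```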
Conclusion
To summarize, we have presented a novel method for learning multilingual word embeddings using parallel data in conjunction with a multilingual objective function for compositional vector models.
Overview
A key difference between our approach and those listed above is that we only require sentence-aligned parallel data in our otherwise unsupervised learning function.
Overview
Parallel data in multiple languages provides an…
Related Work
However, there exists a corpus of prior work on learning multilingual embeddings or on using parallel data to transfer linguistic information across languages.
Related Work
…(2012), our baseline in §5.2, use a form of multi-agent learning on word-aligned parallel data to transfer embeddings from one language to another.
parallel data is mentioned in 8 sentences in this paper.
Titov, Ivan and Klementiev, Alexandre
Abstract
We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations.
Abstract
When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on nonparallel sentences.
Conclusions
We show that an agreement signal extracted from parallel data provides indirect supervision capable of substantially improving a state-of-the-art model for semantic role induction.
Introduction
The goal of this work is to show that parallel data is useful in unsupervised induction of shallow semantic representations.
Multilingual Extension
As we argued in Section 1, our goal is to penalize for disagreement in semantic structures predicted for each language on parallel data .
Multilingual Extension
Intuitively, when two arguments are aligned in parallel data , we expect them to be labeled with the same semantic role in both languages.
Multilingual Extension
Specifically, we augment the joint probability with a penalty term computed on parallel data:
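The quoted sentence ends where the formula begins; purely as a generic illustration (our notation, not the paper's actual penalty term), such an objective could take the shape:

```latex
% Generic agreement-penalised objective: per-language likelihoods minus a
% penalty for aligned argument pairs (a, a') whose induced roles r(.) differ.
\log p_{\theta}(\mathbf{y}^{(1)}) + \log p_{\theta}(\mathbf{y}^{(2)})
  - \lambda \sum_{(a, a') \in \mathcal{A}} \mathbf{1}[\, r(a) \neq r(a') \,]
```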
parallel data is mentioned in 8 sentences in this paper.
Kozhevnikov, Mikhail and Titov, Ivan
Background and Motivation
The approaches in this third group often use parallel data to bridge the gap between languages.
Background and Motivation
However, they are very sensitive to the quality of parallel data, as well as the accuracy of a source-language model on it.
Background and Motivation
This approach yields an SRL model for a new language at a very low cost, effectively requiring only a source language model and parallel data .
Conclusion
It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages.
Conclusion
…annotation projection approaches require sentence- and word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary.
Evaluation
We use parallel data to construct a bilingual dictionary used in word mapping, as well as in the projection baseline.
Related Work
The basic idea behind model transfer is similar to that of cross-lingual annotation projection, as we can see from the way parallel data is used in, for example, McDonald et al.
parallel data is mentioned in 7 sentences in this paper.
Snyder, Benjamin and Naseem, Tahira and Barzilay, Regina
Abstract
We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters.
Experimental setup
Though the model is trained using parallel data, during testing it has access only to monolingual data.
Experimental setup
This setup ensures that we are testing our model’s ability to learn better parameters at training time, rather than its ability to exploit parallel data at test time.
Introduction
We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters.
Related Work
More recently, there has been a body of work attempting to improve parsing performance by exploiting syntactically annotated parallel data .
Related Work
In one strand of this work, annotations are assumed only in a resource-rich language and are projected onto a resource-poor language using the parallel data (Hwa et al., 2005; Xi and Hwa, 2005).
Related Work
In another strand of work, syntactic annotations are assumed on both sides of the parallel data, and a model is trained to exploit the parallel data at test time as well (Smith and Smith, 2004; Burkett and Klein, 2008).
parallel data is mentioned in 7 sentences in this paper.
Nuhn, Malte and Mauser, Arne and Ney, Hermann
Experimental Evaluation
We also compare the results on these corpora to a system trained on parallel data.
Experimental Evaluation
Och (2002) reports results of 48.2 BLEU for a single-word based translation system and 56.1 BLEU using the alignment template approach, both trained on parallel data.
Related Work
Unsupervised training of statistical translation systems without parallel data and related problems have been addressed before.
Related Work
Close to the methods described in this work, Ravi and Knight (2011) treat training and translation without parallel data as a deciphering problem.
Related Work
They perform experiments on a Spanish/English task with vocabulary sizes of about 500 words and achieve a performance of around 20 BLEU compared to 70 BLEU obtained by a system that was trained on parallel data.
parallel data is mentioned in 6 sentences in this paper.
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Abstract
Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose.
Inferring a learning curve from mostly monolingual data
However, when a configuration of four initial points is used for the same amount of "seed" parallel data, it outperforms both the configurations with three initial points.
Inferring a learning curve from mostly monolingual data
The ability to predict the amount of parallel data required to achieve a given level of quality is very valuable in planning business deployments of statistical machine translation; yet, we are not aware of any rigorous proposal for addressing this need.
Introduction
Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific business purpose.
Introduction
This prediction, or more generally the prediction of the learning curve of an SMT system as a function of available in-domain parallel data, is the objective of this paper.
Introduction
They show that without any parallel data we can predict the expected translation accuracy at 75K segments within an error of 6 BLEU points (Table 4), while using a seed training corpus of 10K segments narrows this error to within 1.5 points (Table 6).
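A hedged sketch of how such a learning curve can be extrapolated from a few seed measurements, in the spirit of the 75K-segment prediction quoted above. The power-law family is one common choice (the paper evaluates several families), and all numbers below are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(x, c, a, b):
    # Saturating learning curve: BLEU approaches c as data size x grows.
    return c - a * np.power(x, -b)

sizes = np.array([2_500, 5_000, 10_000])  # seed corpus sizes (segments)
bleus = np.array([14.0, 17.5, 20.3])      # measured BLEU at each size (invented)
params, _ = curve_fit(power_law, sizes, bleus, p0=[35.0, 500.0, 0.5])
print(power_law(75_000, *params))         # extrapolated BLEU at 75K segments
```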
parallel data is mentioned in 6 sentences in this paper.
Das, Dipanjan and Petrov, Slav
Conclusion
…thank Amarnag Subramanya for helping us with the implementation of label propagation and Shankar Kumar for access to the parallel data.
Experiments and Results
The parallel data came from the Europarl corpus (Koehn, 2005) and the ODS United Nations dataset (UN, 2006).
Experiments and Results
Taking the intersection of languages in these resources, and selecting languages with large amounts of parallel data, yields the following set of eight Indo-European languages: Danish, Dutch, German, Greek, Italian, Portuguese, Spanish and Swedish.
Experiments and Results
• Projection: Our third baseline incorporates bilingual information by projecting POS tags directly across alignments in the parallel data.
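For illustration, direct projection can be as simple as copying tags across alignment links; a minimal sketch (names and toy data ours), ignoring the filtering and one-to-many handling a real baseline needs:

```python
def project_pos(source_tags, alignment):
    """Direct POS projection across word alignments.

    source_tags : list of tags for the source (e.g. English) sentence
    alignment   : list of (source_index, target_index) alignment links
    Returns {target_index: projected_tag}; unaligned target words get nothing.
    """
    projected = {}
    for s, t in alignment:
        projected[t] = source_tags[s]  # last link wins on many-to-one
    return projected

# "the red house" -> "das rote Haus", aligned one-to-one
print(project_pos(["DET", "ADJ", "NOUN"], [(0, 0), (1, 1), (2, 2)]))
# {0: 'DET', 1: 'ADJ', 2: 'NOUN'}
```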
Introduction
To bridge this gap, we consider a practically motivated scenario, in which we want to leverage existing resources from a resource-rich language (like English) when building tools for resource-poor foreign languages. We assume that absolutely no labeled training data is available for the foreign language of interest, but that we have access to parallel data with a resource-rich language.
parallel data is mentioned in 5 sentences in this paper.
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Experiments
Additionally, we adopt GIZA++ to get the word alignment of in-domain parallel data and form the word translation probability table.
Experiments
We adopt five methods for extracting domain-relevant parallel data from general-domain corpus.
Experiments
When the top 600k sentence pairs are picked out from the general-domain corpus to train machine translation systems, the systems perform better than the General-domain baseline trained on 16 million parallel sentence pairs.
Training Data Selection Methods
These methods are based on language model and translation model, which are trained on small in-domain parallel data .
Training Data Selection Methods
t(e_j|f_i) is the translation probability of word e_j conditioned on word f_i, and is estimated from the small in-domain parallel data.
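For illustration, a score of this kind can be computed with an IBM Model 1-style probability over such a table t; a minimal sketch (the paper's exact scoring function may differ, and the table entries below are toy values):

```python
import math

def model1_logprob(src_words, tgt_words, t):
    """IBM Model 1-style log probability of a target sentence given a source
    sentence, using a word translation table t[(e, f)] estimated from small
    in-domain parallel data. A NULL source word is included, as in Model 1;
    unseen pairs get a small floor probability.
    """
    src = ["<null>"] + src_words
    logp = 0.0
    for e in tgt_words:
        p = sum(t.get((e, f), 1e-9) for f in src) / len(src)
        logp += math.log(p)
    return logp

# Rank general-domain sentence pairs by translation-model score:
t = {("house", "maison"): 0.8, ("the", "la"): 0.6}
print(model1_logprob(["la", "maison"], ["the", "house"], t))
```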
parallel data is mentioned in 5 sentences in this paper.
Faruqui, Manaal and Dyer, Chris
Conclusions
Our model can be extended for clustering any number of given languages together in a joint framework, and incorporate both monolingual and parallel data .
Experiments
Monolingual Clustering: For every language pair, we train German word clusters on the monolingual German data from the parallel data.
Experiments
Recall that A(x, y) is the count of the alignment links between x and y observed in the parallel data, and A(x) and A(y) are the respective marginal counts.
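For illustration, one standard association score derivable from these counts is pointwise mutual information; a minimal sketch, shown only to make the quantities concrete, not as the paper's objective:

```python
import math

def alignment_pmi(A_xy, A_x, A_y, N):
    """Pointwise mutual information of an aligned word pair, from the
    alignment link count A(x, y), the marginals A(x) and A(y), and the
    total number of links N.
    """
    return math.log((A_xy * N) / (A_x * A_y))

print(alignment_pmi(A_xy=50, A_x=100, A_y=80, N=10_000))  # ~4.14
```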
Introduction
Since the objective consists of terms representing the entropy of monolingual data (for each language) and of parallel bilingual data, it is particularly attractive for the usual situation in which there is much more monolingual data available than parallel data.
parallel data is mentioned in 4 sentences in this paper.
Chen, David and Dolan, William
Conclusion
We introduced a data collection framework that produces highly parallel data by asking different annotators to describe the same video segments.
Discussions and Future Work
While our data collection framework yields useful parallel data, it also has some limitations.
Discussions and Future Work
By pairing up descriptions of the same video in different languages, we obtain parallel data without requiring any bilingual skills.
Experiments
We quantified the utility of our highly parallel data by computing the correlation between BLEU and human ratings when different numbers of references were available.
parallel data is mentioned in 4 sentences in this paper.
Li, Zhifei and Yarowsky, David
Introduction
To be able to translate a Chinese abbreviation that is unseen in available parallel corpora, one may annotate more parallel data.
Unsupervised Translation Induction for Chinese Abbreviations
This is particularly interesting since we normally have enormous monolingual data, but a small amount of parallel data.
Unsupervised Translation Induction for Chinese Abbreviations
For example, in the translation task between Chinese and English, both the Chinese and English Gigaword have billions of words, but the parallel data has only about 30 million words.
parallel data is mentioned in 3 sentences in this paper.
Ravi, Sujith
Experiments and Results
We also report the first BLEU results on such a large-scale MT task under truly nonparallel settings (without using any parallel data or seed lexicon).
Experiments and Results
The results are encouraging and demonstrate the ability of the method to scale to large settings while performing efficient inference with complex models, which we believe will be especially useful for future MT applications in scenarios where parallel data is hard to obtain.
Introduction
But obtaining parallel data is an expensive process and not available for all language pairs…
parallel data is mentioned in 3 sentences in this paper.
Darwish, Kareem
Cross-lingual Features
The sentences were drawn from the UN parallel data along with a variety of parallel news data from LDC and the GALE project.
Related Work
If cross-lingual resources such as parallel data are available, increased training data, better resources, or superior features can be used to improve the processing…
Related Work
They did so by training a bilingual model and then generating more training data from unlabeled parallel data .
parallel data is mentioned in 3 sentences in this paper.
Cohn, Trevor and Haffari, Gholamreza
Abstract
This paper presents a novel method for inducing phrase-based translation units directly from parallel data, which we frame as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Analysis
We have presented a novel method for learning a phrase-based model of translation directly from parallel data which we have framed as learning an inverse transduction grammar (ITG) using a recursive Bayesian prior.
Introduction
Word-based translation models (Brown et al., 1993) remain central to phrase-based model training, where they are used to infer word-level alignments from sentence-aligned parallel data…
parallel data is mentioned in 3 sentences in this paper.
Auli, Michael and Gao, Jianfeng
Experiments
We use a phrase-based system similar to Moses (Koehn et al., 2007) based on a set of common features including maximum likelihood estimates pML(e|f) and pML(f|e), lexically weighted estimates pLW(e|f) and pLW(f|e), word and phrase penalties, a hierarchical reordering model (Galley and Manning, 2008), a linear distortion feature, and a modified Kneser-Ney language model trained on the target side of the parallel data.
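The lexically weighted estimates mentioned here conventionally follow the definition of Koehn et al. (2003); assuming that convention (with unaligned target words scored against NULL), it reads:

```latex
% Standard lexical weighting (Koehn et al., 2003): each target word e_i is
% scored by the average of its word-translation probabilities w(e_i | f_j)
% over the source words f_j it is aligned to.
p_{LW}(\bar{e} \mid \bar{f}, a) =
  \prod_{i=1}^{|\bar{e}|} \frac{1}{|\{ j : (i,j) \in a \}|}
  \sum_{(i,j) \in a} w(e_i \mid f_j)
```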
Experiments
Translation models are estimated on 102M words of parallel data for French-English, and 99M words for German-English; about 6.5M words for each language pair are newswire, the remainder are parliamentary proceedings.
Experiments
All neural network models are trained on the news portion of the parallel data, corresponding to 136K sentences, which we found to be most useful in initial experiments.
parallel data is mentioned in 3 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
We align the parallel data with GIZA++ (Och et al., 2003) and decode using Moses (Koehn et al., 2007).
Experimental Setup
A KN-smoothed 5-gram language model is trained on the target side of the parallel data with SRILM (Stolcke, 2002).
Related Work
With word-boundary-aware phrase extraction, a phrase pair containing all of “with his blue car” must have been seen in the parallel data to translate the phrase correctly at test time.
parallel data is mentioned in 3 sentences in this paper.
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
Introduction
For statistical machine translation (MT), which relies on the existence of parallel data , translating from nonstandard dialects is a challenge.
Machine Translation Experiments
DAT (in the fourth column) is the DA part of the 5M word DA-En parallel data processed with the DA-MSA MT system.
Related Work
Two approaches have emerged to alleviate the problem of DA-English parallel data scarcity: using MSA as a bridge language (Sawaf, 2010; Salloum and Habash, 2011; Salloum and Habash, 2013; Sajjad et al., 2013), and using crowd sourcing to acquire parallel data (Zbib et al., 2012).
parallel data is mentioned in 3 sentences in this paper.