Index of papers in Proc. ACL that mention
  • machine translation
Xiang, Bing and Luo, Xiaoqiang and Zhou, Bowen
Abstract
ECs are ubiquitous in languages like Chinese, but they are tacitly ignored in most machine translation (MT) work because of their elusive nature.
Abstract
In this paper we present a comprehensive treatment of ECs by first recovering them with a structured MaxEnt model with a rich set of syntactic and lexical features, and then incorporating the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features.
Chinese Empty Category Prediction
In our opinion, recovering ECs from machine parse trees is more meaningful since that is what one would encounter when developing a downstream application such as machine translation.
Conclusions and Future Work
We also applied the predicted ECs to a large-scale Chinese-to-English machine translation task and achieved significant improvement over two strong MT baselines.
Integrating Empty Categories in Machine Translation
In this section, we explore multiple approaches of utilizing recovered ECs in machine translation.
Introduction
One of the key challenges in statistical machine translation (SMT) is to effectively model inherent differences between the source and the target language.
Introduction
Empty Categories for Machine Translation
Introduction
• Measure the effect of ECs on automatic word alignment for machine translation after integrating recovered ECs into the MT data;
Related Work
There exists only a handful of previous work on applying ECs explicitly to machine translation so far.
Related Work
First, in addition to the preprocessing of training data and inserting recovered empty categories, we implement sparse features to further boost the performance, and tune the feature weights directly towards maximizing the machine translation metric.
machine translation is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Abstract
This paper presents collaborative decoding (co-decoding), a new method to improve machine translation accuracy by leveraging translation consensus between multiple machine translation decoders.
Abstract
Different from system combination and MBR decoding, which post-process the n-best lists or word lattice of machine translation decoders, in our method multiple machine translation decoders collaborate by exchanging partial translation results.
Abstract
Experimental results on data sets for the NIST Chinese-to-English machine translation task show that the co-decoding method can bring significant improvements to all baseline decoders, and the outputs from co-decoding can be used to further improve the result of system combination.
Collaborative Decoding
Because it is usually not feasible to enumerate the entire hypothesis space for machine translation, we approximate H_k(f) with n-best hypotheses by convention.
Introduction
Recent research has shown substantial improvements can be achieved by utilizing consensus statistics obtained from outputs of multiple machine translation systems.
Introduction
Typically, the resulting systems take outputs of individual machine translation systems as
Introduction
A common property of all the work mentioned above is that the combination models work on the basis of n-best translation lists (full hypotheses) of existing machine translation systems.
machine translation is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Foster, George and Kuhn, Roland
Abstract
We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs.
Abstract
Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.
Conclusions and Future Work
similarity for terms from parallel corpora and applied it to statistical machine translation.
Conclusions and Future Work
We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation.
Experiments
We evaluate the algorithm of bilingual sense similarity via machine translation.
Experiments
For the baseline, we train the translation model by following (Chiang, 2005; Chiang, 2007) and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java.
Introduction
Statistical Machine Translation
Introduction
Is it useful for multilingual applications, such as statistical machine translation (SMT)?
Introduction
The source and target sides of the rules with (*) at the end are not semantically equivalent; it seems likely that measuring the semantic similarity between the source and target sides of rules from their context might be helpful to machine translation.
machine translation is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Echizen-ya, Hiroshi and Araki, Kenji
Abstract
As described in this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking.
Abstract
Evaluation experiments were conducted to calculate the correlation between human judgments and the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR-7.
Experiments
These English output sentences are sentences that the 12 machine translation systems in NTCIR-7 translated from 100 Japanese sentences.
Experiments
Table 1 presents the types of the 12 machine translation systems.
Experiments
12 machine translation systems in respective automatic evaluation methods, and “All” are the correlation coefficients using the scores of 1,200 output sentences obtained using the 12 machine translation systems.
Introduction
High-quality automatic evaluation has become increasingly important as various machine translation systems have been developed.
Introduction
Evaluation experiments using MT outputs obtained by 12 machine translation systems in NTCIR-7 (Fujii et al., 2008) demonstrate that the scores obtained using our system yield the highest correlation with the human judgments among the automatic evaluation methods in both sentence-level adequacy and fluency.
Introduction
Results confirmed that our method using noun-phrase chunking is effective for automatic evaluation for machine translation.
machine translation is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Liu, Fang and Liu, Yang and He, Shizhu and Zhao, Jun
Abstract
Our proposed method employs statistical machine translation to improve question retrieval and enriches the question representation with the translated words from other languages via matrix factorization.
Introduction
The idea of improving question retrieval with statistical machine translation is based on the following two observations:
Introduction
However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages increases the dimensionality and makes the question representation even more sparse; (2) statistical machine translation may introduce noise, which can harm the performance of question retrieval.
Introduction
To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization.
Our Approach
This paper aims to leverage statistical machine translation to enrich the question representation.
Our Approach
Statistical machine translation (e.g., Google Translate) can utilize contextual information during the question translation, so it can solve the word ambiguity and word mismatch problems to some extent.
Our Approach
However, there are two problems with this enrichment: (1) enriching the original questions with the translated words from other languages makes the question representation even more sparse; (2) statistical machine translation may introduce noise. To solve these two problems, we propose to leverage statistical machine translation to improve question retrieval via matrix factorization.
machine translation is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min
Abstract
In this paper, we propose a sense-based translation model to integrate word senses into statistical machine translation.
Abstract
Our method is significantly different from previous word sense disambiguation reformulated for machine translation in that the latter neglects word senses in nature.
Conclusion
We have presented a sense-based translation model that integrates word senses into machine translation.
Conclusion
• Word senses automatically induced by the HDP-based WSI on large-scale training data are very useful for machine translation.
Experiments
This suggests that automatically induced word senses alone are indeed useful for machine translation.
Introduction
In the context of machine translation, such different meanings normally produce different target translations.
Introduction
Therefore a natural assumption is that word sense disambiguation (WSD) may contribute to statistical machine translation (SMT) by providing appropriate word senses for target translation selection with context features (Carpuat and Wu, 2005).
Introduction
for Statistical Machine Translation
Related Work
Xiong and Zhang (2013) employ a sentence-level topic model to capture coherence for document-level machine translation.
Related Work
The difference between our work and these previous studies on topic model for SMT lies in that we adopt topic-based WSI to obtain word senses rather than generic topics and integrate induced word senses into machine translation.
WSI-Based Broad-Coverage Sense Tagger
We want to extend this hypothesis to machine translation by building a sense-based translation model upon the HDP-based word sense induction: words with the same meanings tend to be translated in the same way.
machine translation is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Abstract
Most existing work relies on machine translation engines to directly adapt labeled data from the source language to the target language.
Abstract
This approach suffers from the limited coverage of vocabulary in the machine translation results.
Experiment
Instead of using the corresponding machine translation of Chinese unlabeled sentences, we use the parallel English sentences of the Chinese unlabeled sentences.
Introduction
One direct approach to leveraging the labeled data in English is to use machine translation engines as a black box to translate the labeled data from English to the target language (e.g.
Introduction
Second, machine translation may change the sentiment polarity of the original text.
Introduction
Instead of relying on the unreliable machine translated labeled data, CLMM leverages bilingual parallel data to bridge the language gap between the source language and the target language.
Related Work
Most existing work relies on machine translation engines to directly adapt labeled data from the source language to target language.
Related Work
Their bilingual view is also constructed by using machine translation engines to translate original documents.
Related Work
Prettenhofer and Stein (2011) use machine translation engines in a different way.
machine translation is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
Uszkoreit, Jakob and Brants, Thorsten
Abstract
We show that combining them with word-based n-gram models in the log-linear model of a state-of-the-art statistical machine translation system leads to improvements in translation quality as indicated by the BLEU score.
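For readers who want the underlying formula, the generic Brown-style class-based factorization (a textbook sketch; the paper's predictive class-based variant may differ in detail) replaces word histories with class histories:

\[ P(w_i \mid w_{i-1}) \approx P\big(c(w_i) \mid c(w_{i-1})\big)\, P\big(w_i \mid c(w_i)\big) \]

where c(w) maps each word to its automatically induced class; the class-based and word-based models then enter the log-linear translation model as separate features.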
Experiments
We use the distributed training and application infrastructure described in (Brants et al., 2007) with modifications to allow the training of predictive class-based models and their application in the decoder of the machine translation system.
Experiments
Instead we report BLEU scores (Papineni et al., 2002) of the machine translation system using different combinations of word- and class-based models for translation tasks from English to Arabic and Arabic to English.
Experiments
A fourth data set, en_web, was used together with the other three data sets to train the large word-based model used in the second machine translation experiment.
Introduction
Clustering for Large Scale Class-Based Language Modeling in Machine Translation
Introduction
However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed with mixed success (Raab, 2006).
Introduction
We then show that using partially class-based language models trained using the resulting classifications together with word-based language models in a state-of-the-art statistical machine translation system yields improvements despite the very large size of the word-based models used.
machine translation is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Wan, Xiaojun and Li, Huiying and Xiao, Jianguo
Abstract
Existing methods simply use machine translation for document translation or summary translation.
Abstract
However, current machine translation services are far from satisfactory, which results in that the quality of the cross-language summary is usually very poor, both in readability and content.
Introduction
A straightforward way for cross-language document summarization is to translate the summary from the source language to the target language by using machine translation services.
Introduction
However, though machine translation techniques have advanced a lot, the machine translation quality is far from satisfactory, and in many cases the translated texts are hard to understand.
Introduction
An empirical evaluation is conducted to evaluate the performance of machine translation quality prediction, and a user study is performed to evaluate the cross-language summary quality.
Related Work 2.1 Machine Translation Quality Prediction
Machine translation evaluation aims to assess the correctness and quality of the translation.
Related Work 2.1 Machine Translation Quality Prediction
Chae and Nenkova (2009) use surface syntactic features to assess the fluency of machine translation results.
Related Work 2.1 Machine Translation Quality Prediction
In this study, we further predict the translation quality of an English sentence before the machine translation process, i.e., we do not leverage reference translation and the target sentence.
machine translation is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Narayan, Shashi and Gardent, Claire
Abstract
We present a hybrid approach to sentence simplification which combines deep semantics and monolingual machine translation to derive simple sentences from complex ones.
Introduction
It is useful as a preprocessing step for a variety of NLP systems such as parsers and machine translation systems (Chandrasekar et al., 1996), summarisation (Knight and Marcu, 2000), sentence fusion (Filippova and Strube, 2008) and semantic
Introduction
Machine Translation systems have been adapted to translate complex sentences into simple ones (Zhu et al., 2010; Wubben et al., 2012; Coster and Kauchak, 2011).
Introduction
First, it combines a model encoding probabilities for splitting and deletion with a monolingual machine translation module which handles reordering and substitution.
Related Work
Zhu et al. (2010) constructed a parallel corpus (PWKP) of 108,016/114,924 complex/simple sentences by aligning sentences from EWKP and SWKP and used the resulting bitext to train a simplification model inspired by syntax-based machine translation (Yamada and Knight, 2001).
Related Work
To account for deletions, reordering and substitution, Coster and Kauchak (2011) trained a phrase based machine translation system on the PWKP corpus while modifying the word alignment output by GIZA++ in Moses to allow for null phrasal alignments.
Related Work
Wubben et al. (2012) use Moses and the PWKP data to train a phrase based machine translation system augmented with a post-hoc reranking procedure designed to rank the outputs based on their dissimilarity from the source.
Simplification Framework
We also depart from Coster and Kauchak (2011) who rely on null phrasal alignments for deletion during phrase based machine translation.
Simplification Framework
Second, the simplified sentence(s) s’ is further simplified to s using a phrase based machine translation system (PBMT+LM).
Simplification Framework
where the probabilities p(s'|DC), p(s'|s) and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model, respectively.
machine translation is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Abstract
Topic models, an unsupervised technique for inferring translation domains, improve machine translation quality.
Experiments
We evaluate our new topic model, ptLDA, and existing topic models—LDA, pLDA, and tLDA—on their ability to induce domains for machine translation and the resulting performance of the translations on standard machine translation metrics.
Introduction
In particular, we use topic models to aid statistical machine translation (Koehn, 2009, SMT).
Introduction
Modern machine translation systems use millions of examples of translations to learn translation rules.
Introduction
As we review in Section 2, topic models are a promising solution for automatically discovering domains in machine translation corpora.
Polylingual Tree-based Topic Models
We compare these models’ machine translation performance in Section 5.
Topic Models for Machine Translation
2.1 Statistical Machine Translation
Topic Models for Machine Translation
Statistical machine translation casts machine translation as a probabilistic process (Koehn, 2009).
Topic Models for Machine Translation
(2012) ignore a wealth of information that could improve topic models and help machine translation .
machine translation is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Ganchev, Kuzman and Graça, João V. and Taskar, Ben
Abstract
Automatic word alignment is a key step in training statistical machine translation systems.
Abstract
Despite much recent work on word alignment methods, alignment accuracy increases often produce little or no improvements in machine translation quality.
Conclusions
To our knowledge this is the first extensive evaluation where improvements in alignment accuracy lead to improvements in machine translation performance.
Introduction
The typical pipeline for a machine translation (MT) system starts with a parallel sentence-aligned corpus and proceeds to align the words in every sentence pair.
Introduction
Our contribution is a large scale evaluation of this methodology for word alignments, an investigation of how the produced alignments differ and how they can be used to consistently improve machine translation performance (as measured by BLEU score) across many languages on training corpora with up to a hundred thousand sentences.
Introduction
Section 5 explores how the new alignments lead to consistent and significant improvement in a state-of-the-art phrase-based machine translation system by using posterior decoding rather than Viterbi decoding.
Phrase-based machine translation
In particular we fix a state of the art machine translation system and measure its performance when we vary the supplied word alignments.
Phrase-based machine translation
The baseline system uses GIZA model 4 alignments and the open source Moses phrase-based machine translation toolkit, and performed close to the best at the competition last year.
Phrase-based machine translation
In addition to the Hansards corpus and the Europarl English-Spanish corpus, we used four other corpora for the machine translation experiments.
Word alignment results
A natural question is how to tune the threshold in order to improve machine translation quality.
Word alignment results
In the next section we evaluate and compare the effects of the different alignments in a phrase based machine translation system.
machine translation is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Visweswariah, Karthik and Khapra, Mitesh M. and Ramanathan, Ananthakrishnan
Abstract
Preordering of a source language sentence to match target word order has proved to be useful for improving machine translation systems.
Abstract
Previous work has shown that a reordering model can be learned from high quality manual word alignments to improve machine translation performance.
Abstract
In this paper, we focus on further improving the performance of the reordering model (and thereby machine translation) by using a larger corpus of sentence aligned data for which manual word alignments are not available but automatic machine generated alignments are available.
Experimental setup
Additionally, we evaluate the effect of reordering on our final systems for machine translation measured using BLEU.
Experimental setup
The parallel corpus is used for building our phrase-based machine translation system and to add training data for our reordering model.
Experimental setup
For our machine translation experiments, we used a standard phrase based system (Al-Onaizan and Papineni, 2006) with a lexicalized distortion model with a window size of +/-4 words.
Introduction
Dealing with word order differences between source and target languages presents a significant challenge for machine translation systems.
Introduction
in machine translation output that is not fluent and is often very hard to understand.
Introduction
This results in a 1.8 BLEU point gain in machine translation performance on an Urdu-English machine translation task over a preordering model trained using only manual word alignments.
Results and Discussions
We see a significant gain of 1.8 BLEU points in machine translation by going beyond manual word alignments using the best reordering model reported in Table 3.
machine translation is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Koehn, Philipp and Tsoukala, Chara and Saint-Amand, Herve
Introduction
As machine translation enters the workflow of professional translators, the exact nature of this human-computer interaction is currently an open challenge.
Introduction
Instead of tasking translators to post-edit the output of machine translation systems, a more interactive approach may be more fruitful.
Introduction
The standard approach to this problem uses the search graph of the machine translation system.
Properties of Core Algorithm
We predict translations that were crafted by manual post-editing of machine translation output.
Properties of Core Algorithm
We also use the search graphs of the system that produced the original machine translation output.
Properties of Core Algorithm
In the project’s first field trial, professional translators corrected machine translations of news stories from a competitive English-Spanish machine translation system (Koehn and Haddow, 2012).
Refinements
Analysis of the data suggests that gains mainly come from large length mismatches between user translation and machine translation, even in the case of first pass searches.
Refinements
For instance, if the user prefix differs only in casing from the machine translation (say, University instead of university), then we may still want to treat that as a word match in our algorithm.
Related Work
The interactive machine translation paradigm was first explored in the TransType and TransType2 projects (Langlais et al., 2000a; Foster et al., 2002; Bender et al., 2005; Barrachina et al., 2009).
Word Completion
When the machine translation system decides for college over university, but the user types the letter u, it should change its prediction.
machine translation is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Qian, Longhua and Hui, Haotian and Hu, Ya'nan and Zhou, Guodong and Zhu, Qiaoming
Abstract
Instead of using a parallel corpus which should have entity/relation alignment information and is thus difficult to obtain, this paper employs an off-the-shelf machine translator to translate both labeled and unlabeled instances from one language into the other language, forming pseudo parallel corpora.
Abstract
Based on a small number of labeled instances and a large number of unlabeled instances in both languages, our method differs from theirs in that we adopt a bilingual active learning paradigm via machine translation and improve the performance for both languages simultaneously.
Abstract
machine translation, which make use of multilingual corpora to decrease human annotation efforts by selecting highly informative sentences for a newly added language in multilingual parallel corpora.
machine translation is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith
Abstract
In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only nonparallel corpora.
Bayesian MT Decipherment via Hash Sampling
We use a similar technique to theirs but a different approximate distribution for the proposal, one that is better suited for machine translation models and without some of the additional overhead required for computing certain terms in the original formulation.
Decipherment Model for Machine Translation
We now describe the decipherment problem formulation for machine translation.
Decipherment Model for Machine Translation
Contrary to standard machine translation training scenarios, here we have to estimate the translation model P9( f |e) parameters using only monolingual data.
Decipherment Model for Machine Translation
Translation Model: Machine translation is a much more complex task than solving other decipherment tasks such as word substitution ciphers (Ravi and Knight, 2011b; Dou and Knight, 2012).
Discussion and Future Work
for unsupervised machine translation which can help further improve the performance in addition to accelerating the sampling process.
Experiments and Results
To evaluate translation quality, we use BLEU score (Papineni et al., 2002), a standard evaluation measure used in machine translation .
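For reference, BLEU here is the standard corpus-level metric of Papineni et al. (2002); its usual definition (a textbook sketch, not anything specific to this paper) is

\[ \mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big( \sum_{n=1}^{4} \tfrac{1}{4} \log p_n \Big), \qquad \mathrm{BP} = \min\big(1,\, e^{\,1 - r/c}\big), \]

where p_n is the modified n-gram precision, c is the candidate length and r is the reference length.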
Introduction
Statistical machine translation (SMT) systems these days are built using large amounts of bilingual parallel corpora.
Introduction
Recently, this topic has been receiving increasing attention from researchers and new methods have been proposed to train statistical machine translation models using only monolingual data in the source and target language.
machine translation is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Guzmán, Francisco and Joty, Shafiq and Màrquez, Lluís and Nakov, Preslav
Abstract
We present experiments in using discourse structure for improving machine translation evaluation.
Abstract
Then, we show that these measures can help improve a number of existing machine translation evaluation metrics both at the segment- and at the system-level.
Experimental Results
In this section, we explore how discourse information can be used to improve machine translation evaluation metrics.
Experimental Results
Overall, from the experimental results in this section, we can conclude that discourse structure is an important information source to be taken into account in the automatic evaluation of machine translation output.
Introduction
From its foundations, Statistical Machine Translation (SMT) had two defining characteristics: first, translation was modeled as a generative process at the sentence-level.
Introduction
This is demonstrated by the establishment of a recent workshop dedicated to Discourse in Machine Translation (Webber et al., 2013), collocated with the 2013 annual meeting of the Association of Computational Linguistics.
Introduction
The area of discourse analysis for SMT is still nascent and, to the best of our knowledge, no previous research has attempted to use rhetorical structure for SMT or machine translation evaluation.
Related Work
Addressing discourse-level phenomena in machine translation is relatively new as a research direction.
Related Work
The field of automatic evaluation metrics for MT is very active, and new metrics are continuously being proposed, especially in the context of the evaluation campaigns that run as part of the Workshops on Statistical Machine Translation (WMT 2008-2012), and NIST Metrics for Machine Translation Challenge (MetricsMATR), among others.
machine translation is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Abstract
We also show fluency improvements in a preliminary machine translation experiment.
Experiments
We report machine translation reranking results in Section 5.4.
Experiments
(2004) and Cherry and Quirk (2008) both use the 1-best output of a machine translation system.
Experiments
5.3.3 Machine Translation Classification
Introduction
N-gram language models are a central component of all speech recognition and machine translation systems, and a great deal of research centers around refining models (Chen and Goodman, 1998), efficient storage (Pauls and Klein, 2011; Heafield, 2011), and integration into decoders (Koehn, 2004; Chiang, 2005).
Introduction
We also show fluency improvements in a preliminary machine translation reranking experiment.
Scoring a Sentence
For machine translation, a model that builds target-side constituency parses, such as that of Galley et al.
machine translation is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Abstract
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus.
Conclusion
We have presented pseudo-word as a novel machine translational unit for phrase-based machine translation.
Conclusion
Experimental results of Chinese-to-English translation task show that, in phrase-based machine translation model, pseudo-word performs significantly better than word in both spoken language translation domain and news domain.
Experiments and Results
We conduct experiments on Chinese-to-English machine translation .
Introduction
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus generated from word-based models (Brown et al., 1993), proceeds with step of induction of phrase table (Koehn et al., 2003) or synchronous grammar (Chiang, 2007) and with model weights tuning step.
Introduction
Some researchers have explored coarse-grained translational units for machine translation.
Introduction
(2008) used a Bayesian semi-supervised method that combines a Chinese word segmentation model and a Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation.
machine translation is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Wan, Xiaojun
Abstract
Machine translation services are used for eliminating the language gap between the training set and test set, and English features and Chinese features are considered as two independent views of the classification problem.
Conclusion and Future Work
2) The feature distributions of the translated text and the natural text in the same language are still different due to the inaccuracy of the machine translation service.
Introduction
First, machine translation services are used to translate English training reviews into Chinese reviews and also translate Chinese test reviews and additional unlabeled reviews into English reviews.
Related Work 2.1 Sentiment Classification
(2004) use the technique of deep language analysis for machine translation to extract sentiment units in text documents.
The Co-Training Approach
The labeled English reviews are translated into labeled Chinese reviews, and the unlabeled Chinese reviews are translated into unlabeled English reviews, by using machine translation services.
The Co-Training Approach
Fortunately, machine translation techniques have been well developed in the NLP field, though the translation performance is far from satisfactory.
The Co-Training Approach
A few commercial machine translation services can be publicly accessed, e.g.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Sun, Hong and Zhou, Ming
Discussion
As the method depends heavily on machine translation, a natural question arises as to the impact of using different pivots or SMT systems.
Discussion
The first part of iBLEU, which is the traditional BLEU score, helps to ensure the quality of the machine translation results.
Experiments and Results
We use 2003 NIST Open Machine Translation Evaluation data (NIST 2003) as development data (containing 919 sentences) for MERT and test the performance on NIST 2008 data set (containing 1357 sentences).
Introduction
Paraphrasing technology has been applied in many NLP applications, such as machine translation (MT), question answering (QA), and natural language generation (NLG).
Introduction
As paraphrasing can be viewed as a translation process between the original expression (as input) and the paraphrase results (as output), both in the same language, statistical machine translation (SMT) has been used for this task.
Introduction
the noise introduced by machine translation, Zhao et al.
Paraphrasing with a Dual SMT System
We focus on sentence level paraphrasing and leverage homogeneous machine translation systems for this task bi-directionally.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zhao, Hai and Song, Yan and Kit, Chunyu and Zhou, Guodong
Abstract
A simple statistical machine translation method, word-by-word decoding, which requires only a bilingual lexicon rather than a parallel corpus, is adopted for the treebank translation.
Conclusion and Future Work
A simple statistical machine translation technique, word-by-word decoding, where only a bilingual lexicon is necessary, is used to translate the source treebank.
Introduction
Machine translation has been shown to be one of the most expensive language processing tasks, as a great deal of time and space is required to perform this task.
Introduction
In addition, a standard statistical machine translation method based on a parallel corpus will not work effectively if it is not able to find a parallel corpus that properly covers the source and target treebanks.
The Related Work
In our method, a machine translation method is applied to tackle the gold-standard treebank, while all the previous works focus on the unlabeled data.
The Related Work
The proposed parser, using features from monolingual and mutual constraints, helped its log-linear model to achieve better performance for both the monolingual parsers and the machine translation system.
The Related Work
The second is that a parallel corpus is required for their work and a strict statistical machine translation procedure was performed, while our approach holds a merit of simplicity as only a bilingual lexicon is required.
Treebank Translation and Dependency Transformation
A word-by-word statistical machine translation strategy is adopted to translate words attached with the respective dependency information from the source language to the target one.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Abstract
Statistical machine translation is often faced with the problem of combining training data from many diverse sources into a single translation model which then has to translate sentences in a new domain.
Abstract
Our experimental results show that ensemble decoding outperforms various strong baselines including mixture models, the current state-of-the-art for domain adaptation in machine translation.
Introduction
Statistical machine translation (SMT) systems require large parallel corpora in order to be able to obtain a reasonable translation quality.
Introduction
Models in Statistical Machine Translation
Introduction
As a result, language model adaptation has been well studied in various work (Clarkson and Robinson, 1997; Seymore and Rosenfeld, 1997; Bacchiani and Roark, 2003; Eck et al., 2004) both for speech recognition and for machine translation .
Related Work 5.1 Domain Adaptation
(2010) propose a similar method for machine translation that uses features to capture degrees of generality.
Related Work 5.1 Domain Adaptation
Unlike previous work on instance weighting in machine translation, they use phrase-level instances instead of sentences.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Abney, Steven and Bird, Steven
Building the Corpus
In addition, the overall measure of success—induction of machine translation systems from limited resources—pushes the state of the art (Kumar et al., 2007).
Building the Corpus
Variation will arise as a consequence, but we believe that it will be no worse than the variability in input that current machine translation training methods routinely deal with, and will not greatly injure the utility of the Corpus.
Conclusion
We need leaner methods for building machine translation systems; new algorithms for cross-linguistic bootstrapping via multiple paths; more effective techniques for leveraging human effort in labeling data; scalable ways to get bilingual text for unwritten languages; and large scale social engineering to make it all happen quickly.
Human Language Project
Although we strive for maximum generality, we also propose a specific driving “use case,” namely, machine translation (MT), (Hutchins and Somers, 1992; Koehn, 2010).
Human Language Project
That is, we view machine translation as an approximation to language understanding.
Human Language Project
Taking sentences in a reference language as the meaning representation, we arrive back at machine translation as the measure of success.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Prettenhofer, Peter and Stein, Benno
Conclusion
The results show that CL-SCL is competitive with state-of-the-art machine translation technology while requiring fewer resources.
Experiments
We chose a machine translation baseline to compare CL-SCL to another cross-language method.
Experiments
Statistical machine translation technology offers a straightforward solution to the problem of cross-language text classification and has been used in a number of cross-language sentiment classification studies (Hiroshi et al., 2004; Bautin et al., 2008; Wan, 2009).
Experiments
This difference can be explained by the fact that machine translation works better for European than for Asian languages such as Japanese.
Introduction
For the application of the source-trained classifier under the target language T, different approaches are current practice: machine translation of unlabeled documents from T to S, dictionary-based translation of unlabeled
Introduction
The approach solves the classification problem directly, instead of resorting to a more general and potentially much harder problem such as machine translation .
Related Work
Recent work in cross-language text classification focuses on the use of automatic machine translation technology.
Related Work
Most of these methods involve two steps: (1) translation of the documents into the source or the target language, and (2) dimensionality reduction or semi-supervised learning to reduce the noise introduced by the machine translation.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Data selection has been demonstrated to be an effective approach to addressing the lack of high-quality bitext for statistical machine translation in the domain of interest.
Conclusion
our methods into the domain adaptation task of statistical machine translation at the model level.
Experiments
We use the NiuTrans toolkit which adopts GIZA++ (Och and Ney, 2003) and MERT (Och, 2003) to train and tune the machine translation system.
Experiments
This tool scores the outputs on several criteria, while the case-insensitive BLEU-4 (Papineni et al., 2002) is used as the evaluation metric for the machine translation system.
Experiments
When the top 600k sentence pairs are selected from the general-domain corpus to train machine translation systems, the systems perform better than the General-domain baseline trained on 16 million parallel sentence pairs.
Introduction
Statistical machine translation depends heavily on large scale parallel corpora.
Introduction
However, domain-specific machine translation has few parallel corpora for translation model training in the domain of interest.
Training Data Selection Methods
The translation model is a key component in statistical machine translation.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Yan, Rui and Gao, Mingkun and Pavlick, Ellie and Callison-Burch, Chris
Abstract
Crowdsourcing is a viable mechanism for creating training data for machine translation .
Conclusion
In addition to its benefits of cost and scalability, crowdsourcing provides access to languages that currently fall outside the scope of statistical machine translation research.
Evaluation
A state-of-the-art machine translation system (the syntax-based variant of Joshua) achieves a score of 26.91, which is reported in (Zaidan and Callison-Burch, 2011).
Introduction
Statistical machine translation (SMT) systems are trained using bilingual sentence-aligned parallel corpora.
Related work
These have focused on an iterative collaboration between monolingual speakers of the two languages, facilitated with a machine translation system.
Related work
In our setup the poor translations are produced by bilingual individuals who are weak in the target language, and in their experiments the translations are the output of a machine translation system. Another significant difference is that the HCI studies assume cooperative participants.
Related work
A variety of HCI and NLP studies have confirmed the efficacy of monolingual or bilingual individuals post-editing machine translation output (Callison-Burch, 2005; Koehn, 2010; Green et al., 2013).
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Popat, Kashyap and A.R, Balamurali and Bhattacharyya, Pushpak and Haffari, Gholamreza
Abstract
A similar idea is applied to Cross Lingual Sentiment Analysis (CLSA), and it is shown that reduction in data sparsity (after translation or bilingual-mapping) produces higher accuracy than Machine Translation based CLSA and sense based CLSA.
Clustering for Cross Lingual Sentiment Analysis
Existing approaches for CLSA depend on an intermediary machine translation system to bridge the language gap (Hiroshi et al., 2004; Banea et al., 2008).
Clustering for Cross Lingual Sentiment Analysis
Machine translation is very resource intensive.
Clustering for Cross Lingual Sentiment Analysis
Given that sentiment analysis is a less resource intensive task compared to machine translation, the use of an MT system is hard to justify for performing
Discussions
This could degrade the accuracy of the machine translation itself, limiting the performance of an MT based CLSA system.
Introduction
When used as an additional feature with word-based language models, it has been shown to improve system performance in tasks such as machine translation (Uszkoreit and Brants, 2008; Stymne, 2012), speech recognition (Martin et al., 1995; Samuelsson and Reichl, 1999), dependency parsing (Koo et al., 2008; Haffari et al., 2011; Zhang and Nivre, 2011; Tratz and Hovy, 2011) and NER (Miller et al., 2004; Faruqui and Pado, 2010; Turian et al., 2010; Täckström et al., 2012).
Introduction
Popular approaches for Cross-Lingual Sentiment Analysis (CLSA) (Wan, 2009; Duh et al., 2011) depend on Machine Translation (MT) for converting the labeled data from one language to the other (Hiroshi et al., 2004; Banea et al., 2008; Wan, 2009).
Related Work
Most often these methods depend on an intermediary machine translation system (Wan, 2009; Brooke et al., 2009) or a bilingual dictionary (Ghorbel and Jacot, 2011; Lu et al., 2011) to bridge the language gap.
machine translation is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Abstract
Automatic error detection is desired in the postprocessing to improve machine translation quality.
Abstract
We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features.
Conclusions and Future Work
Therefore our approach can be used for other machine translation systems, such as rule-based or example-based systems, which generally do not produce N-best lists.
Introduction
Translation hypotheses generated by a statistical machine translation (SMT) system always contain both correct parts (e.g.
Introduction
Automatically distinguishing incorrect parts from correct parts is therefore very desirable not only for post-editing and interactive machine translation (Ueffing and Ney, 2007) but also for SMT itself: either by rescoring hypotheses in the N-best list using the probability of correctness calculated for each hypothesis (Zens and Ney, 2006) or by generating new hypotheses using N-best lists from one SMT system or multiple systems
Related Work
In this section, we present an overview of confidence estimation (CE) for machine translation at the word level.
SMT System
To obtain machine-generated translation hypotheses for our error detection, we use a state-of-the-art phrase-based machine translation system MOSES (Koehn et al., 2003; Koehn et al., 2007).
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Sun, Jun and Zhang, Min and Tan, Chew Lim
Abstract
We further apply the subtree alignment in machine translation with two methods.
Introduction
Syntax-based Statistical Machine Translation (SMT) systems allow the translation process to be performed in a more grammatically informed way, which provides decent reordering capability.
Introduction
In multilingual tasks such as machine translation, tree kernels are seldom applied.
Introduction
Further experiments in machine translation also suggest that the obtained subtree alignment can improve the performance of both phrase-based and syntax-based SMT systems.
Substructure Spaces for BTKs
Due to the above issues, we annotate a new data set to apply the subtree alignment in machine translation.
Substructure Spaces for BTKs
7 Experiments on Machine Translation
Substructure Spaces for BTKs
However, utilizing syntactic translational equivalences alone for machine translation loses the capability of modeling non-syntactic phrases (Koehn et al., 2003).
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Liu, Lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
Most statistical machine translation (SMT) systems are modeled using a log-linear framework.
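As background, the log-linear framework referred to here is the standard Och-and-Ney-style formulation; a generic sketch (textbook notation, not copied from this paper) is

\[ \hat{e} = \arg\max_{e} \; \sum_{m=1}^{M} \lambda_m \, h_m(e, f), \]

where the h_m are feature functions such as translation and language model scores and the λ_m are their tuned weights.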
Introduction
However, neural network based machine translation is far from easy.
Introduction
Actually, existing works empirically show that some nonlocal features, especially language model, contribute greatly to machine translation.
Introduction
According to the above analysis, we propose a variant of a neural network model for machine translation , and we call it Additive Neural Networks or AdNN for short.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Navigli, Roberto and Ponzetto, Simone Paolo
Abstract
In addition, Machine Translation is applied to enrich the resource with lexical information for all languages.
BabelNet
using (a) the human-generated translations provided in Wikipedia (the so-called inter-language links), as well as (b) a machine translation system to translate occurrences of the concepts within sense-tagged corpora, namely SemCor (Miller et al., 1993) — a corpus annotated with WordNet senses — and Wikipedia itself (Section 3.3).
Conclusions
Further, we contribute a large set of sense occurrences harvested from Wikipedia and SemCor, a corpus that we input to a state-of-the-art machine translation system to fill in the gap between resource-rich languages — such as English — and resource-poorer ones.
Experiment 2: Translation Evaluation
both from Wikipedia and the machine translation system.
Experiment 2: Translation Evaluation
In contrast, good translations were produced using our machine translation method when enough sentences were available.
Introduction
poor languages with the aid of Machine Translation.
Methodology
An initial prototype used a statistical machine translation system based on Moses (Koehn et al., 2007) and trained on Europarl (Koehn, 2005).
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Abstract
The minimum Bayes risk (MBR) decoding objective improves BLEU scores for machine translation output relative to the standard Viterbi objective of maximizing model score.
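As background, the contrast drawn here is between the Viterbi and MBR decision rules; in their conventional textbook form (notation illustrative, not taken from the paper) they read

\[ \hat{e}_{\mathrm{Viterbi}} = \arg\max_{e} P(e \mid f), \qquad \hat{e}_{\mathrm{MBR}} = \arg\max_{e'} \sum_{e} P(e \mid f)\, \mathrm{sim}(e', e), \]

where sim(·,·) is a gain function such as sentence-level BLEU and the sum runs over an n-best list or translation forest.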
Abstract
We evaluate our procedure on translation forests from two large-scale, state-of-the-art hierarchical machine translation systems.
Computing Feature Expectations
Exploiting forests has proven a fruitful avenue of research in both parsing (Huang, 2008) and machine translation (Mi et al., 2008).
Consensus Decoding Algorithms
Modern statistical machine translation systems take as input some f and score each derivation e according to a linear model of features: Σ_i λ_i θ_i(f, e).
Consensus Decoding Algorithms
Most similarity measures of interest for machine translation are not linear, and so Algorithm 2 does not apply.
Experimental Results
We evaluate these consensus decoding techniques on two different full-scale state-of-the-art hierarchical machine translation systems.
Introduction
In statistical machine translation, output translations are evaluated by their similarity to human reference translations, where similarity is most often measured by BLEU (Papineni et al., 2002).
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Talbot, David and Brants, Thorsten
Abstract
We demonstrate the space-savings of the scheme via machine translation experiments within a distributed language modeling framework.
Experimental Setup
4.3 Machine Translation
Experiments
5.3 Machine Translation
Introduction
Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition and many other areas.
Introduction
Efficiency is paramount in applications such as machine translation which make huge numbers of LM requests per sentence.
Introduction
This paper focuses on machine translation.
Scaling Language Models
In statistical machine translation (SMT), LMs are used to score candidate translations in the target language.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Galley, Michel and Manning, Christopher D.
Dependency parsing experiments
For the MT setting, texts are all lower case, and tokenization was changed to improve machine translation (e.g., most hyphenated words were split).
Dependency parsing for machine translation
Dependency models have recently gained considerable interest in many NLP applications, including machine translation (Ding and Palmer, 2005; Quirk et al., 2005; Shen et al., 2008).
Introduction
Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target language fluency and adequacy. However, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996).
Introduction
may sometimes appear too computationally expensive for high-end statistical machine translation, there are many alternative parsing algorithms that have seldom been explored in the machine translation literature.
Introduction
In this paper, we show how to exploit syntactic dependency structure for better machine translation, under the constraint that the depen-
Related work
Perhaps due to the high computational cost of synchronous CFG decoding, there have been various attempts to exploit syntactic knowledge and hierarchical structure in other machine translation experiments that do not require chart parsing.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Abstract
Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task specific metric, such as BLEU in machine translation .
Abstract
Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single reference setup.
Introduction
Neural network-based language and translation models have achieved impressive accuracy improvements on statistical machine translation tasks (Allauzen et al., 2011; Le et al., 2012b; Schwenk et al., 2012; Vaswani et al., 2013; Gao et al., 2014).
Introduction
In this paper we focus on recurrent neural network architectures which have recently advanced the state of the art in language modeling (Mikolov et al., 2010; Mikolov et al., 2011; Sundermeyer et al., 2013) with several subsequent applications in machine translation (Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Hu et al., 2014).
Introduction
In practice, neural network models for machine translation are usually trained by maximizing the likelihood of the training data, either via a cross-entropy objective (Mikolov et al., 2010; Schwenk
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Beaufort, Richard and Roekhaut, Sophie and Cougnon, Louise-Amélie and Fairon, Cédrick
Abstract
This paper presents a method that shares similarities with both spell checking and machine translation approaches.
Conclusion and perspectives
With the intention to avoid wrong modifications of special tokens and to handle word boundaries as easily as possible, we designed a method that shares similarities with both spell checking and machine translation .
Introduction
Evaluated in French, our method shares similarities with both spell checking and machine translation .
Related work
(2008b), SMS normalization, up to now, has been handled through three well-known NLP metaphors: spell checking, machine translation and automatic speech recognition.
Related work
The machine translation metaphor, which is historically the first proposed (Bangalore et al., 2002; Aw et al., 2006), considers the process of normalizing SMS as a translation task from a source language (the SMS) to a target language (its standard written form).
Related work
(2006) proposed a statistical machine translation model working at the phrase-level, by splitting sentences into their k most probable phrases.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Feng, Yang and Cohn, Trevor
Abstract
Most modern machine translation systems use phrase pairs as translation units, allowing for accurate modelling of phrase-internal translation and reordering.
Experiments
We used the Moses machine translation decoder (Koehn et al., 2007), using the default features and decoding settings.
Experiments
Table 3: Machine translation performance in BLEU% on the IWSLT 2005 Chinese-English test set.
Introduction
Recent years have witnessed burgeoning development of statistical machine translation research, notably phrase-based (Koehn et al., 2003) and syntax-based approaches (Chiang, 2005; Galley et al., 2006; Liu et al., 2006).
Model
We consider a process in which the target string is generated using a left-to-right order, similar to the decoding strategy used by phrase-based machine translation systems (Koehn et al., 2003).
Related Work
Word based models have a long history in machine translation , starting with the venerable IBM translation models (Brown et al., 1993) and the hidden Markov model (Vogel et al., 1996).
Related Work
More recently, a number of authors have proposed Markov models for machine translation .
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
Abstract
However, in most current statistical machine translation (SMT) systems, the outputs of compound-complex sentences still lack proper transitional expressions.
Conclusion
of machine translation .
Experiments
5 http://www.speech.sri.com/projects/srilm/ 6 The China Workshop on Machine Translation
Introduction
During the last decade, great progress has been made on statistical machine translation (SMT) models.
Related Work
In (Xiong et al., 2013a), three different features were designed to capture the lexical cohesion for document-level machine translation .
Related Work
(Xiong et al., 2013b) incorporated lexical-chain-based models (Morris and Hirst, 1991) into machine translation .
Related Work
(Meyer and Popescu-Belis, 2012) used sense-labeled discourse connectives for machine translation from English to French.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Yeniterzi, Reyyan and Oflazer, Kemal
Conclusions
We have presented a novel way to incorporate source syntactic structure in English-to-Turkish phrase-based machine translation by parsing the source sentences and then encoding many local and nonlocal source syntactic structures as additional complex tag factors.
Introduction
Statistical machine translation into a morphologically complex language such as Turkish, Finnish or Arabic, involves the generation of target words with the proper morphology, in addition to properly ordering the target words.
Introduction
We assume that the reader is familiar with the basics of phrase-based statistical machine translation (Koehn et al., 2003) and factored statistical machine translation (Koehn and Hoang, 2007).
Related Work
Statistical Machine Translation into a morphologically rich language is a challenging problem in that, on the target side, the decoder needs to generate both the right sequence of constituents and the right sequence of morphemes for each word.
Related Work
Using morphology in statistical machine translation has been addressed by many researchers for translation from or into morphologically rich(er) languages.
Related Work
Goldwater and McClosky (2005) use morphological analysis on the Czech side to get improvements in Czech-to-English statistical machine translation .
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen
Abstract
This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing.
Introduction
Early work in statistical machine translation viewed translation as a noisy channel process comprised of a translation model, which functioned to posit adequate translations of source language words, and a target language model, which guided the fluency of generated target language strings (Brown et al.,
Introduction
Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
Related Work
Recent work has shown that parsing-based machine translation using syntax-augmented (Zollmann and Venugopal, 2006) hierarchical translation grammars with rich nonterminal sets can demonstrate substantial gains over hierarchical grammars for certain language pairs (Baker et al., 2009).
Related Work
speech recognition and statistical machine translation focus on the use of n-grams, which provide a simple finite-state model approximation of the target language.
Related Work
priate algorithmic fit for incorporating syntax into phrase-based statistical machine translation , since both process sentences in an incremental left-to-right fashion.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Andreas, Jacob and Vlachos, Andreas and Clark, Stephen
Abstract
Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser.
Abstract
These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation .
Conclusions
We have presented a semantic parser which uses techniques from machine translation to learn mappings from natural language to variable-free meaning representations.
Discussion
For this reason, we argue for the use of a machine translation baseline as a point of comparison for new methods.
Introduction
At least superficially, SP is simply a machine translation (MT) task: we transform an NL utterance in one language into a statement of another (unnatural) meaning representation language (MRL).
Related Work
tsVB also uses a piece of standard MT machinery, specifically tree transducers, which have been profitably employed for syntax-based machine translation (Maletti, 2010).
Related Work
The present work is also the first we are aware of which uses phrase-based rather than tree-based machine translation techniques to learn a semantic parser.
machine translation is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Corpora
While the corpus is aimed at machine translation tasks, we use the keywords associated with each talk to build a subsidiary corpus for multilingual document classification as follows.
Experiments
A similar idea exists in machine translation where English is frequently used to pivot between other languages (Cohn and Lapata, 2007).
Experiments
MT System We develop a machine translation baseline as follows.
Experiments
We train a machine translation tool on the parallel training data, using the development data of each language pair to optimize the translation system.
Related Work
It was demonstrated that this approach can be applied to improve tasks related to machine translation .
Related Work
(2013) also learned bilingual embeddings for machine translation .
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Jiajun and Zong, Chengqing
Abstract
Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains.
Introduction
During the last decade, statistical machine translation has made great progress.
Introduction
Recently, more and more researchers concentrated on taking full advantage of the monolingual corpora in both source and target languages, and proposed methods for bilingual lexicon induction from nonparallel data (Rapp, 1995, 1999; Koehn and Knight, 2002; Haghighi et al., 2008; Daume III and Jagarlamudi, 2011) and proposed unsupervised statistical machine translation (bilingual lexicon is a byproduct) with only monolingual corpora (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012).
Introduction
The unsupervised statistical machine translation method (Ravi and Knight, 2011; Nuhn et al., 2012; Dou and Knight, 2012) viewed the translation task as a decipherment problem and designed a generative model with the objective function to maximize the likelihood of the source language monolingual data.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Chen, David and Dolan, William
Abstract
A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years.
Discussions and Future Work
In addition to paraphrasing, our data collection framework could also be used to produce useful data for machine translation and computer vision.
Introduction
Machine paraphrasing has many applications for natural language processing tasks, including machine translation (MT), MT evaluation, summary evaluation, question answering, and natural language generation.
Introduction
Despite the similarities between paraphrasing and translation, several major differences have prevented researchers from simply following standards that have been established for machine translation .
Introduction
Professional translators produce large volumes of bilingual data according to a more or less consistent specification, indirectly fueling work on machine translation algorithms.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hermjakob, Ulf and Knight, Kevin and Daumé III, Hal
Abstract
We present a method to transliterate names in the framework of end-to-end statistical machine translation .
Discussion
We have shown that a state-of-the-art statistical machine translation system can benefit from a dedicated transliteration module to improve the transla-
Discussion
Improved named entity translation accuracy as measured by the NEWA metric in general, and a reduction in dropped names in particular is clearly valuable to the human reader of machine translated documents as well as for systems using machine translation for further information processing.
End-to-End results
Finally, here are end-to-end machine translation results for three sentences, with and without the transliteration module, along with a human reference translation.
Introduction
State-of-the-art statistical machine translation (SMT) is bad at translating names that are not very common, particularly across languages with different character sets and sound systems.
Introduction
This evaluation involves a mixture of entity identification and translation concerns; for example, the scoring system asks for coreference determination, which may or may not be of interest for improving machine translation output.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Chang and Ng, Hwee Tou
Abstract
In this work, we introduce the TESLA-CELAB metric (Translation Evaluation of Sentences with Linear-programming-based Analysis — Character-level Evaluation for Languages with Ambiguous word Boundaries) for automatic machine translation evaluation.
Introduction
Since the introduction of BLEU (Papineni et al., 2002), automatic machine translation (MT) evaluation has received a lot of research interest.
Introduction
The Workshop on Statistical Machine Translation (WMT) hosts regular campaigns comparing different machine translation evaluation metrics (Callison-Burch et al., 2009; Callison-Burch et al., 2010; Callison-Burch et al., 2011).
Introduction
The research on automatic machine translation evaluation is important for a number of reasons.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Abstract
Conventional statistical machine translation (SMT) systems do not perform well on measure word generation due to data sparseness and the potential long distance dependency between measure words and their corresponding head words.
Abstract
Our model works as a postprocessing procedure over output of statistical machine translation systems, and can work with any SMT system.
Experiments
We also compared our method with a well-known rule-based machine translation system, SYSTRAN.
Introduction
According to our survey on the measure word distribution in the Chinese Penn Treebank and the test datasets distributed by Linguistic Data Consortium (LDC) for Chinese-to-English machine translation evaluation, the average occurrence is 0.505 and 0.319 measure
Introduction
Therefore, in the English-to-Chinese machine translation task we need to take additional efforts to generate the missing measure words in Chinese.
Introduction
In most statistical machine translation (SMT) models (Och et al., 2004; Koehn et al., 2003; Chiang, 2005), some measure words can be generated without modification or additional processing.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Abstract
This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT).
Conclusion
Statistical Machine Translation .
Conclusion
The Mathematics of Statistical Machine Translation : Parameter Estimation.
Conclusion
Statistical Significance Tests for Machine Translation Evaluation.
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zaslavskiy, Mikhail and Dymetman, Marc and Cancedda, Nicola
Abstract
An efficient decoding algorithm is a crucial element of any statistical machine translation system.
Introduction
Phrase-based systems (Koehn et al., 2003) are probably the most widespread class of Statistical Machine Translation systems, and arguably one of the most successful.
Phrase-based Decoding as TSP
h - mt - i - s : this machine translation is strange; h - c - t - i - a : this curious translation is automatic; h - t - s - i - a : this translation strange is automatic
Phrase-based Decoding as TSP
For example, in Figure 3, the cost of this - machine translation - is - strange can only take into account the conditional probability of the word strange relative to the word is, but not relative to the words translation and is.
Phrase-based Decoding as TSP
machine translation .
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
Abstract
In this paper we study the use of sentence-level dialect identification in optimizing machine translation system selection when translating mixed dialect input.
Abstract
We test our approach on Arabic, a prototypical diglossic language; and we optimize the combination of four different machine translation systems.
Introduction
For statistical machine translation (MT), which relies on the existence of parallel data, translating from nonstandard dialects is a challenge.
Machine Translation Experiments
We use the open-source Moses toolkit (Koehn et al., 2007) to build four Arabic-English phrase-based statistical machine translation systems (SMT).
Related Work
Arabic Dialect Machine Translation .
Related Work
System Selection and Combination in Machine Translation .
machine translation is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hopkins, Mark and May, Jonathan
Abstract
We then use this framework to compare several analytical models on data from the Workshop on Machine Translation (WMT).
Experiments
I.e., machine translation specialists.
From Rankings to Relative Ability
One could argue that it specifies a space of machine translation specialists, but likely these individuals are thought to be a representative sample of a broader community.
The WMT Translation Competition
Every year, the Workshop on Machine Translation (WMT) conducts a competition between machine translation systems.
The WMT Translation Competition
For each track, the organizers also assemble a panel of judges, typically machine translation specialists. The role of a judge is to repeatedly rank five different translations of the same source text.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Introduction
The systematic word order difference between two languages poses a challenge for current statistical machine translation (SMT) systems.
Introduction
:15 for Statistical Machine Translation
Introduction
The remainder of this paper is organized as follows: Section 2 introduces the basis of this research: the principle of statistical machine translation .
Translation System Overview
In statistical machine translation , we are given a source language sentence f_1^J = f_1 ... f_J.
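To make this notation concrete, the textbook noisy-channel decision rule that such a setup leads to can be written as (a standard formulation shown for orientation, not necessarily the exact one used in the paper):

$$\hat{e}_1^I = \operatorname*{arg\,max}_{e_1^I} \; p(e_1^I \mid f_1^J) = \operatorname*{arg\,max}_{e_1^I} \; p(e_1^I)\, p(f_1^J \mid e_1^I),$$

where $e_1^I$ ranges over target-language sentences, $p(e_1^I)$ is the language model and $p(f_1^J \mid e_1^I)$ the translation model.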
Translation System Overview
In this paper, the phrase-based machine translation system
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Haffari, Gholamreza
Abstract
Modern phrase-based machine translation systems make extensive use of word-based translation models for inducing alignments from parallel corpora.
Experiments
The time complexity of our inference algorithm is O(n^6), which can be prohibitive for large-scale machine translation tasks.
Introduction
The phrase-based approach (Koehn et al., 2003) to machine translation (MT) has transformed MT from a narrow research topic into a truly useful technology to end users.
Related Work
In the context of machine translation , ITG has been explored for statistical word alignment in both unsupervised (Zhang and Gildea, 2005; Cherry and Lin, 2007; Zhang et al., 2008; Pauls et al., 2010) and supervised (Haghighi et al., 2009; Cherry and Lin, 2006) settings, and for decoding (Petrov et al., 2008).
Related Work
As mentioned above, ours is not the first work attempting to generalise adaptor grammars for machine translation ; (Neubig et al., 2011) also developed a similar approach based around ITG using a Pitman-Yor Process prior.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Abstract
Large-scale discriminative machine translation promises to further the state-of-the-art, but has failed to deliver convincing gains over current heuristic frequency count systems.
Challenges for Discriminative SMT
These results could — and should — be applied to other models, discriminative and generative, phrase- and syntax-based, to further progress the state-of-the-art in machine translation .
Discussion and Further Work
Finally, while in this paper we have focussed on the science of discriminative machine translation , we believe that with suitable engineering this model will advance the state-of-the-art.
Evaluation
The development and test data was taken from the 2006 NAACL and 2007 ACL workshops on machine translation , also filtered for sentence length. Tuning of the regularisation parameter and MERT training of the benchmark models was performed on dev2006, while the test set was the concatenation of devtest2006, test2006 and test2007, amounting to 315 development and 1164 test sentences.
Introduction
Statistical machine translation (SMT) has seen a resurgence in popularity in recent years, with progress being driven by a move to phrase-based and syntax-inspired approaches.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Smith, Jason R. and Saint-Amand, Herve and Plamada, Magdalena and Koehn, Philipp and Callison-Burch, Chris and Lopez, Adam
Abstract
Parallel text is the fuel that drives modern machine translation systems.
Abstract
Furthermore, 22% of the true positives are potentially machine translations (judging by the quality), whereas in 13% of the cases one of the sentences contains additional content not ex-
Abstract
4 Machine Translation Experiments
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zollmann, Andreas and Vogel, Stephan
Conclusion and discussion
In this work we proposed methods of labeling phrase pairs to create automatically learned PSCFG rules for machine translation .
Introduction
The Probabilistic Synchronous Context Free Grammar (PSCFG) formalism suggests an intuitive approach to model the long-distance and lexically sensitive reordering phenomena that often occur across language pairs considered for statistical machine translation .
Introduction
SCFG Rules for Machine Translation
Introduction
Towards the ultimate goal of building end-to-end machine translation systems without any human annotations, we also experiment with automatically inferred word classes using distributional clustering (Kneser and Ney, 1993).
Related work
(2006) present a reordering model for machine translation , and make use of clustered phrase pairs to cope with data sparseness in the model.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Abstract
The large-scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Experimental results
We have applied our composite 5-gram/2-SLM+2-gram/4-SLM+5-gram/PLSA language model, trained on a 1.3-billion-word corpus, to the task of re-ranking the N-best list in statistical machine translation .
Experimental results
Chiang (2007) studied the performance of machine translation on Hiero: the BLEU score is 33.31% when an n-gram model is used to re-rank the N-best list, but it rises to a significantly higher 37.09% when the n-gram model is embedded directly into Hiero’s one-pass decoder, because there is not much diversity in the N-best list.
Introduction
The Markov chain (n-gram) source models, which predict each word on the basis of the previous n-1 words, have been the workhorses of state-of-the-art speech recognizers and machine translators that help to resolve acoustic or foreign language ambiguities by placing higher probability on more likely original underlying word strings.
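For orientation, the n-gram approximation alluded to here factorizes the probability of a word string as (standard definition, independent of this particular paper):

$$p(w_1^N) \approx \prod_{i=1}^{N} p\bigl(w_i \mid w_{i-n+1}^{\,i-1}\bigr).$$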
Introduction
As the machine translation (MT) working groups stated on page 3 of their final report (Lavie et al., 2006), “These approaches have resulted in small improvements in MT quality, but have not fundamentally solved the problem.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chen, Yanqing and Skiena, Steven
Knowledge Graph Construction
• Machine Translation - We script the Google translation API to get even more semantic links.
Knowledge Graph Construction
In total, machine translation provides 53.2% of the total links and establishes connections between 3.5 million vertices.
Related Work
The ready availability of machine translation to and from English has prompted efforts to employ translation for sentiment analysis (Bautin et al., 2008).
Related Work
(2008) demonstrate that machine translation can perform quite well when extending subjectivity analysis to a multilingual environment, which motivates replicating their work on lexicon-based sentiment analysis.
Related Work
(2013) combine machine translation and word representation to generate bilingual language resources.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Braune, Fabienne and Seemann, Nina and Quernheim, Daniel and Maletti, Andreas
Conclusion and Future Work
We demonstrated that our ℓMBOT-based machine translation system beats a standard tree-to-tree system (Moses tree-to-tree) on the WMT 2009 English-to-German translation task.
Conclusion and Future Work
To achieve this we implemented the formal model as described in Section 2 inside the Moses machine translation toolkit.
Introduction
Besides phrase-based machine translation systems (Koehn et al., 2003), syntax-based systems have become widely used because of their ability to handle nonlocal reordering.
Introduction
In this contribution, we report on our novel statistical machine translation system that uses an ℓMBOT-based translation model.
Theoretical Model
In this section, we present the theoretical generative model used in our approach to syntax-based machine translation .
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Abstract
This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs.
Conclusion
Machine translation systems using phrase tables learned directly by the proposed model were able to achieve accuracy competitive with the traditional pipeline of word alignment and heuristic phrase extraction, the first such result for an unsupervised model.
Experimental Evaluation
The data for French, German, and Spanish are from the 2010 Workshop on Statistical Machine Translation (Callison-Burch et al., 2010).
Introduction
The training of translation models for phrase-based statistical machine translation (SMT) systems (Koehn et al., 2003) takes unaligned bilingual training data as input, and outputs a scored table of phrase pairs.
Introduction
Using this model, we perform machine translation experiments over four language pairs.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lu, Bin and Tan, Chenhao and Cardie, Claire and K. Tsou, Benjamin
A Joint Model with Unlabeled Parallel Text
sentences in one language with their corresponding automatic translations), we use an automatic machine translation system (e.g.
Abstract
Experiments on multiple data sets show that the proposed approach (1) outperforms the monolingual baselines, significantly improving the accuracy for both languages by 3.44%-8.12%; (2) outperforms two standard approaches for leveraging unlabeled data; and (3) produces (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Conclusion
Moreover, the proposed approach continues to produce (albeit smaller) performance gains when employing pseudo-parallel data from machine translation engines.
Related Work
(2008; 2010) instead automatically translate the English resources using automatic machine translation engines for subjectivity classification.
Results and Analysis
As discussed in Section 3.4, we generate pseudo-parallel data by translating the monolingual sentences in each setting using Google’s machine translation system.
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Macherey, Klaus
Abstract
Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final.
Conclusion
We also look forward to discovering the best way to take advantage of these new alignments in downstream applications like machine translation , supervised word alignment, bilingual parsing (Burkett et al., 2010), part-of-speech tag induction (Naseem et al., 2009), or cross-lingual model projection (Smith and Eisner, 2009; Das and Petrov, 2011).
Experimental Results
Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006).
Experimental Results
Finally, we evaluated our bidirectional model in a large-scale end-to-end phrase-based machine translation system from Chinese to English, based on the alignment template approach (Och and Ney, 2004).
Introduction
Machine translation systems typically combine the predictions of two directional models, one which aligns f to e and the other e to f (Och et al., 1999).
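As a minimal sketch of what combining two directional alignments can look like (a simplified intersection/union heuristic written purely for illustration; it is neither the grow-diag-final procedure nor the bidirectional model discussed in the paper):

# Hedged sketch: combine f->e and e->f word alignments.
def symmetrize(f2e, e2f):
    """f2e, e2f: sets of (src_idx, tgt_idx) links from the two directional models."""
    intersection = f2e & e2f          # links both models agree on (high precision)
    union = f2e | e2f                 # links proposed by either model (high recall)
    alignment = set(intersection)
    aligned_src = {i for i, _ in alignment}
    aligned_tgt = {j for _, j in alignment}
    # Greedily add union links that touch a word not yet aligned.
    for (i, j) in sorted(union - intersection):
        if i not in aligned_src or j not in aligned_tgt:
            alignment.add((i, j))
            aligned_src.add(i)
            aligned_tgt.add(j)
    return alignment

# Toy example with two small directional alignments.
f2e = {(0, 0), (1, 2), (2, 1)}
e2f = {(0, 0), (1, 2), (3, 3)}
print(sorted(symmetrize(f2e, e2f)))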
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Data preparation
This is done using the scripts provided by the Statistical Machine Translation system Moses (Koehn et al., 2007).
Evaluation
In addition to these, the system’s output can be compared against the L2 reference translation(s) using established Machine Translation evaluation metrics.
Introduction
Whereas machine translation generally concerns the translation of whole sentences or texts from one language to the other, this study focusses on the translation of native language (henceforth L1) words and phrases, i.e.
Introduction
the role of the translation model in Statistical Machine Translation (SMT).
System
It has also been used in machine translation studies in which local source context is used to classify source phrases into target phrases, rather than looking them up in a phrase table (Stroppa et al., 2007; Haque et al., 2011).
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Haffari, Gholamreza and Sarkar, Anoop
Abstract
Statistical machine translation (SMT) models require bilingual corpora for training, and these corpora are often multilingual with parallel text in multiple languages simultaneously.
Introduction
The main source of training data for statistical machine translation (SMT) models is a parallel corpus.
Introduction
In our case, the multiple tasks are individual machine translation tasks for several language pairs.
Introduction
11 Statistical Machine Translation*
Sentence Selection: Multiple Language Pairs
For the single language pair setting, (Haffari et al., 2009) presents and compares several sentence selection methods for statistical phrase-based machine translation .
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Yarowsky, David
Abstract
Due to the richness of Chinese abbreviations, many of them may not appear in available parallel corpora, in which case current machine translation systems simply treat them as unknown words and leave them untranslated.
Conclusions
We integrate our method into a state-of-the-art phrase-based baseline translation system, i.e., Moses (Koehn et al., 2007), and show that the integrated system consistently improves the performance of the baseline system on various NIST machine translation test sets.
Related Work
Though automatically extracting the relations between full-form Chinese phrases and their abbreviations is an interesting and important task for many natural language processing applications (e.g., machine translation , question answering, information retrieval, and so on), not much work is available in the literature.
Related Work
None of the above work has addressed the Chinese abbreviation issue in the context of a machine translation task, which is the primary goal in this paper.
Related Work
To the best of our knowledge, our work is the first to systematically model Chinese abbreviation expansion to improve machine translation .
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Gao, Wei and Blitzer, John and Zhou, Ming and Wong, Kam-Fai
Experiments and Results
In order to compute cross-lingual document similarities based on machine translation
Features and Similarities
The cross-lingual similarities are evaluated using different translation mechanisms, e.g., dictionary-based translation or machine translation , or even without any translation at all.
Features and Similarities
Similarity Based on Machine Translation (MT): For machine translation , the cross-lingual measure actually becomes a monolingual similarity between one document and another’s translation.
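A minimal sketch of the idea described here, assuming a hypothetical translate_to_english function standing in for whatever MT system is actually used (the tokenization and similarity choices below are illustrative, not the paper's):

from collections import Counter
import math

def cosine(bag1, bag2):
    # Cosine similarity between two bag-of-words Counters.
    common = set(bag1) & set(bag2)
    dot = sum(bag1[t] * bag2[t] for t in common)
    norm = math.sqrt(sum(v * v for v in bag1.values())) * \
           math.sqrt(sum(v * v for v in bag2.values()))
    return dot / norm if norm else 0.0

def mt_similarity(english_doc, foreign_doc, translate_to_english):
    # Translate the foreign document, then compare monolingually.
    translated = translate_to_english(foreign_doc)   # hypothetical MT call
    return cosine(Counter(english_doc.lower().split()),
                  Counter(translated.lower().split()))

# Toy usage with an identity "translator" standing in for a real MT system.
print(mt_similarity("machine translation systems", "machine translation systems",
                    lambda text: text))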
Introduction
As we will see, machine translation can provide important predictive information in our setting, but we do not wish to display machine-translated output to the user.
Introduction
We approach our problem by learning a ranking function for bilingual queries — queries that are easily translated (e.g., with machine translation ) and appear in the query logs of two languages (e.g., English and Chinese).
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Abstract
Unsupervised word segmentation (UWS) can provide domain-adaptive segmentation for statistical machine translation (SMT) without annotated data, and bilingual UWS can even optimize segmentation for alignment.
Complexity Analysis
The first bilingual corpus, OpenMT06, was used in the NIST open machine translation 2006 Evaluation.
Complexity Analysis
PatentMT9 is from the shared task of NTCIR-9 patent machine translation .
Complexity Analysis
For the bilingual tasks, the publicly available system of Moses (Koehn et al., 2007) with default settings is employed to perform machine translation , and BLEU (Papineni et al., 2002) was used to evaluate the quality.
Introduction
For example, in machine translation , there are various parallel corpora such as
machine translation is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Abstract
We improve the quality of statistical machine translation (SMT) by applying models that predict word forms from their stems using extensive morphological and syntactic information from both the source and target languages.
Introduction
One of the outstanding problems for further improving machine translation (MT) systems is the difficulty of dividing the MT problem into sub-problems and tackling each subproblem in isolation to improve the overall quality of MT.
Introduction
This paper describes a successful attempt to integrate a subcomponent for generating word inflections into a statistical machine translation (SMT)
Machine translation systems and data
We integrated the inflection prediction model with two types of machine translation systems: systems that make use of syntax and surface phrase-based systems.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Maletti, Andreas
Introduction
A (formal) translation model is at the core of every machine translation system.
Introduction
In contrast, in the field of syntax-based machine translation , the translation models have full access to the syntax of the sentences and can base their decisions on it.
Introduction
In this contribution, we restrict MBOT to a form that is particularly relevant in machine translation .
The model
(2008) argue that STSG have sufficient expressive power for syntax-based machine translation , but Zhang et al.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cohn, Trevor and Specia, Lucia
Abstract
Our experiments on two machine translation quality estimation datasets show uniform significant accuracy gains from multitask learning, and consistently outperform strong baselines.
Conclusion
Our experiments showed how our approach outperformed competitive baselines on two machine translation quality regression problems, including the highly challenging problem of predicting post-editing time.
Conclusion
Models of individual annotators could be used to train machine translation systems to optimise an annotator-specific quality measure, or in active learning for corpus annotation, where the model can suggest the most appropriate instances for each annotator or the best annotator for a given instance.
Introduction
This is the case, for example, of annotations on the quality of sentences generated using machine translation (MT) systems, which are often used to build quality estimation models (Blatz et al., 2004; Specia et al., 2009) — our application of interest.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei
Introduction
Data-driven approaches have been quite active in recent machine translation (MT) research.
Related Work
In the machine translation area, most research on confidence measures focuses on the confidence of MT output: how accurate a translated sentence is.
Related Work
(Ueffing et al., 2003) presented several word-level confidence measures for machine translation based on word posterior probabilities.
Translation
We evaluate the improved alignment on several Chinese-English and Arabic-English machine translation tasks.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Green, Spence and Wang, Sida and Cer, Daniel and Manning, Christopher D.
Abstract
We present a fast and scalable online method for tuning statistical machine translation models with large feature sets.
Abstract
Equally important is our analysis, which suggests techniques for mitigating overfitting and domain mismatch, and applies to other recent discriminative methods for machine translation .
Adaptive Online Algorithms
Machine translation is an unusual machine learning setting because multiple correct translations exist and decoding is comparatively expensive.
Introduction
Adaptation of discriminative learning methods for these types of features to statistical machine translation (MT) systems, which have historically used idiosyncratic learning techniques for a few dense features, has been an active research area for the past half-decade.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Eliciting Addressee’s Emotion
Following (Ritter et al., 2011), we apply the statistical machine translation model for generating a response to a given utterance.
Eliciting Addressee’s Emotion
Similar to ordinary machine translation systems, the model is learned from pairs of an utterance and a response by using off-the-shelf tools for machine translation .
Eliciting Addressee’s Emotion
Unlike machine translation , we do not use reordering models, because the positions of phrases are not considered to correlate strongly with the appropriateness of responses (Ritter et al., 2011).
Related Work
The linear interpolation of translation and/or language models is a widely-used technique for adapting machine translation systems to new domains (Sennrich, 2012).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kauchak, David
Conclusions and Future Work
In machine translation , improved language models have resulted in significant improvements in translation performance (Brants et al., 2007).
Introduction
In some problem domains, such as machine translation , the translation is between two distinct languages and the language model can only be trained on data in the output language.
Related Work
Adaptation techniques have been shown to improve language modeling performance based on perplexity (Rosenfeld, 1996) and in application areas such as speech transcription (Bacchiani and Roark, 2003) and machine translation (Zhao et al., 2004), though no previous research has examined the lan-
Related Work
Many recent statistical simplification techniques build upon models from machine translation and utilize a simple language model during simplification/decoding, both in English (Zhu et al., 2010; Woodsend and Lapata, 2011; Coster and Kauchak, 2011a; Wubben et al., 2012) and in other languages (Specia, 2010).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ligozat, Anne-Laure
Experiments
well handled by all machine translation systems 2.
Introduction
We thus investigated the possibility of using machine translation to create a parallel corpus, as has been done for spoken
Introduction
The idea is that using machine translation would enable us to have a large training corpus, either by using the English one and translating the test corpus, or by translating the training corpus.
Introduction
One of the questions posed was whether the quality of present machine translation systems would make it possible to learn the classification properly.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ling, Wang and Xiang, Guang and Dyer, Chris and Black, Alan and Trancoso, Isabel
Conclusion
We show that a considerable amount of parallel sentence pairs can be crawled from microblogs and these can be used to improve Machine Translation by updating our translation tables with translations of newer terms.
Experiments
5.2 Machine Translation Experiments
Experiments
We report on machine translation experiments using our harvested data in two domains: edited news and microblogs.
Introduction
Machine translation suffers acutely from the domain-mismatch problem caused by microblog text.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Abstract
Minimum Error Rate Training (MERT) and Minimum Bayes-Risk (MBR) decoding are used in most current state-of-the-art Statistical Machine Translation (SMT) systems.
Experiments
We also train two SCFG-based MT systems: a hierarchical phrase-based SMT (Chiang, 2007) system and a syntax-augmented machine translation (SAMT) system using the approach described in Zollmann and Venugopal (2006).
Introduction
Statistical Machine Translation (SMT) systems have improved considerably by directly using the error criterion in both training and decoding.
Minimum Error Rate Training
In the context of statistical machine translation , the optimization procedure was first described in Och (2003) for N-best lists and later extended to phrase-lattices in Macherey et al.
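In its generic form (a textbook sketch rather than the exact statement in the paper), MERT selects the feature weights that minimize a corpus-level error on a development set:

$$\hat{\lambda} = \operatorname*{arg\,min}_{\lambda} \sum_{s=1}^{S} E\bigl(\hat{e}(f_s; \lambda),\, r_s\bigr),$$

where $\hat{e}(f_s; \lambda)$ is the decoder's best hypothesis for source sentence $f_s$ under weights $\lambda$, $r_s$ is its reference, and $E$ is the error criterion (e.g., 1 - BLEU).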
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nastase, Vivi and Strapparava, Carlo
Cross Language Text Categorization
Most CLTC methods rely heavily on machine translation (MT).
Cross Language Text Categorization
(2011) also use machine translation , but enhance the processing through domain adaptation by feature weighting, assuming that the training data in one language and the test data in the other come from different domains, or can exhibit different linguistic phenomena due to linguistic and cultural differences.
Cross Language Text Categorization
As we have seen in the literature review, machine translation and bilingual dictionaries can be used to cast these dimensions from the source language L5 to the target language Lt.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Shen, Libin and Xu, Jinxi and Weischedel, Ralph
Abstract
In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation .
Conclusions and Future Work
In this paper, we propose a novel string-to-dependency algorithm for statistical machine translation .
Introduction
In recent years, hierarchical methods have been successfully applied to Statistical Machine Translation (Graehl and Knight, 2004; Chiang, 2005; Ding and Palmer, 2005; Quirk et al., 2005).
Introduction
1.1 Hierarchical Machine Translation
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nesson, Rebecca and Satta, Giorgio and Shieber, Stuart M.
Abstract
Synchronous Tree-Adjoining Grammar (STAG) is a promising formalism for syntax-aware machine translation and simultaneous computation of natural-language syntax and semantics.
Introduction
Recently, the desire to incorporate syntax-awareness into machine translation systems has generated interest in
Introduction
Without efficient algorithms for processing it, its potential for use in machine translation and TAG semantics systems is limited.
Synchronous Tree-Adjoining Grammar
In order for STAG to be used in machine translation and other natural-language processing tasks it must be possible to process it efficiently.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Abstract
Out-of-vocabulary (oov) words or phrases still remain a challenge in statistical machine translation especially when a limited amount of parallel text is available for training or when there is a domain shift from training data to test data.
Collocational Lexicon Induction
This approach has also been used in machine translation to find in-vocabulary paraphrases for oov words on the source side and find a way to translate them.
Experiments & Results 4.1 Experimental Setup
BLEU (Papineni et al., 2002) is still the de facto evaluation metric for machine translation and we use that to measure the quality of our proposed approaches for MT.
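For readers unfamiliar with the metric, BLEU combines modified n-gram precisions $p_n$ with a brevity penalty (the standard definition of Papineni et al. (2002), quoted here only as background):

$$\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Bigl(\sum_{n=1}^{4} w_n \log p_n\Bigr), \qquad \mathrm{BP} = \min\bigl(1,\; e^{\,1 - r/c}\bigr),$$

where $r$ is the total reference length and $c$ the total candidate length.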
Introduction
Out-of-vocabulary (oov) words or phrases still remain a challenge in statistical machine translation .
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Avramidis, Eleftherios and Koehn, Philipp
Factored Model
The factored statistical machine translation model uses a log-linear approach to combine its several components, including the language model, the reordering model, the translation models and the generation models.
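The log-linear combination mentioned here has the familiar general form (shown as background; the feature functions and weights are schematic, not the paper's specific components):

$$p(e \mid f) = \frac{\exp\bigl(\sum_{m=1}^{M} \lambda_m h_m(e, f)\bigr)}{\sum_{e'} \exp\bigl(\sum_{m=1}^{M} \lambda_m h_m(e', f)\bigr)},$$

where each $h_m$ is one component model (language model, reordering, translation, generation) and $\lambda_m$ its weight.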
Introduction
Traditional statistical machine translation methods are based on mapping on the lexical level, which takes place in a local window of a few words.
Introduction
Our method is based on factored phrase-based statistical machine translation models.
Introduction
Traditional statistical machine translation models deal with these problems in two ways:
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Chenguang and Duan, Nan and Zhou, Ming and Zhang, Ming
Experiment
The bilingual corpus includes 5.1M sentence pairs from the NIST 2008 constrained track of Chinese-to-English machine translation task.
Experiment
Development data are generated based on the English references of NIST 2008 constrained track of Chinese-to-English machine translation task.
Introduction
Compared to these works, our paraphrasing engine alters queries in a similar way to statistical machine translation , with systematic tuning and decoding components.
Paraphrasing for Web Search
Similar to statistical machine translation (SMT), given an input query Q, our paraphrasing engine generates paraphrase candidates based on a linear model.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Abstract
Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT.
Conclusion and Future Work
Last, some related approaches (Smith and Clark, 2009; Phillips, 2011) combine SMT and example-based machine translation (EBMT) (Nagao, 1984).
Introduction
Statistical machine translation (SMT), especially the phrase-based model (Koehn et al., 2003), has developed very fast in the last decade.
Problem Formulation
Compared with the standard phrase-based machine translation model, the translation problem is reformulated as follows (only based on the best TM, however, it is similar for multiple TM sentences):
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Conclusion
This work was funded by the DFG Research Project Distributional Approaches to Semantic Relatedness (Marion Weller), the DFG Heisenberg Fellowship SCHU-25 80/ 1-1 (Sabine Schulte im Walde), as well as by the Deutsche Forschungsge-meinschaft grant Models of Morphosyntax for Statistical Machine Translation (Alexander Fraser).
Experiments and evaluation
English/German data released for the 2009 ACL Workshop on Machine Translation shared task.
Previous work
as a hierarchical machine translation system using a string-to-tree setup.
Previous work
(2012) evaluated user reactions to different error types in machine translation and came to the result that morphological
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Congle and Baldwin, Tyler and Ho, Howard and Kimelfeld, Benny and Li, Yunyao
Introduction
It is an important processing step for a wide range of Natural Language Processing (NLP) tasks such as text-to-speech synthesis, speech recognition, information extraction, parsing, and machine translation (Sproat et al., 2001).
Related Work
Research on SMS and Twitter normalization has been roughly categorized as drawing inspiration from three other areas of NLP (Kobus et al., 2008): machine translation , spell checking, and automatic speech recognition.
Related Work
The statistical machine translation (SMT) metaphor was the first proposed to handle the text normalization problem (Aw et al., 2006).
Related Work
(2008) undertook a hybrid approach that pulls inspiration from both the machine translation and speech recognition metaphors.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Abstract
In this paper, we demonstrate that accurate machine translation is possible without the concept of “words,” treating MT as a problem of transformation between character strings.
Introduction
Traditionally, the task of statistical machine translation (SMT) is defined as translating a source sen-
Introduction
boundaries, all machine translation systems perform at least some precursory form of tokenization, splitting punctuation and words to prevent the sparsity that would occur if punctuated and non-punctuated words were treated as different entities.
Introduction
In this paper, we propose improvements to the alignment process tailored to character-based machine translation , and demonstrate that it is, in fact, possible to achieve translation accuracies that ap-
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Abstract
In this work, we tackle the task of machine translation (MT) without parallel training data.
Introduction
Bilingual corpora are a staple of statistical machine translation (SMT) research.
Machine Translation as a Decipherment Task
From a decipherment perspective, machine translation is a much more complex task than word substitution decipherment and poses several technical challenges: (1) scalability due to large corpora sizes and huge translation tables, (2) nondeterminism in translation mappings (a word can have multiple translations), (3) reordering of words
Word Substitution Decipherment
Before we tackle machine translation without parallel data, we first solve a simpler problem—word substitution decipherment.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Abstract
Automated methods for identifying whether sentences are grammatical have various potential applications (e.g., machine translation , automated essay scoring, computer-assisted language learning).
Introduction
Such a system could be used, for example, to check or to rank outputs from systems for text summarization, natural language generation, or machine translation .
Introduction
While some applications (e.g., grammar checking) rely on such fine-grained predictions, others might be better addressed by sentence-level grammaticality judgments (e.g., machine translation evaluation).
Introduction
ity of machine translation outputs (Gamon et al., 2005; Parton et al., 2011), such as the MT Quality Estimation Shared Tasks (Bojar et al., 2013, §6), but relatively little on evaluating the grammaticality of naturally occurring text.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Clifton, Ann and Sarkar, Anoop
Abstract
This paper extends the training and tuning regime for phrase-based statistical machine translation to obtain fluent translations into morphologically complex languages (we build an English to Finnish translation system).
Conclusion and Future Work
In order to help with replication of the results in this paper, we have run the various morphological analysis steps and created the necessary training, tuning and test data files needed in order to train, tune and test any phrase-based machine translation system with our data.
Conclusion and Future Work
We would particularly like to thank the developers of the open-source Moses machine translation toolkit and the Omorfi morphological analyzer for Finnish which we used for our experiments.
Translation and Morphology
Languages with rich morphological systems present significant hurdles for statistical machine translation (SMT), most notably data sparsity, source-target asymmetry, and problems with automatic evaluation.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010).
Introduction
Unlike previous approaches to joint modeling (Le et al., 2012), our feature can be easily integrated into any statistical machine translation (SMT) decoder, which leads to substantially larger improvements than k-best rescoring only.
Model Variations
We have described a novel formulation for a neural network-based machine translation joint model, along with several simple variations of this model.
Model Variations
One of the biggest goals of this work is to quell any remaining doubts about the utility of neural networks in machine translation .
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Abstract
Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific purpose.
Inferring a learning curve from mostly monolingual data
The ability to predict the amount of parallel data required to achieve a given level of quality is very valuable in planning business deployments of statistical machine translation ; yet, we are not aware of any rigorous proposal for addressing this need.
Introduction
Parallel data in the domain of interest is the key resource when training a statistical machine translation (SMT) system for a specific business purpose.
Related Work
(2008), the authors examined corpus features that contribute most to the machine translation performance.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Abstract
In this paper, we address the issue for learning better translation consensus in machine translation (MT) research, and explore the search of translation consensus from similar, rather than the same, source sentences or their spans.
Abstract
Experimental results show that, our method can significantly improve machine translation performance on both IWSLT and NIST data, compared with a state-of-the-art baseline.
Conclusion and Future Work
To calculate consensus statistics, we develop a novel structured label propagation method for structured learning problems, such as machine translation .
Experiments and Results
G-Re-Rank-GC and G-Decode-GC improve the performance of machine translation over the baseline.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Riesa, Jason and Marcu, Daniel
Abstract
Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
Conclusion
We have opened up the word alignment task to advances in hypergraph algorithms currently used in parsing and machine translation decoding.
Experiments
For each set of translation rules, we train a machine translation system and decode a held-out test corpus for which we report results below.
Introduction
Automatic word alignment is generally accepted as a first step in training any statistical machine translation system.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pitler, Emily and Louis, Annie and Nenkova, Ani
Indicators of linguistic quality
For this reason, LMs are widely used in applications such as generation and machine translation to guide the production of sentences.
Indicators of linguistic quality
These features are weakly but significantly correlated with the fluency of machine translated sentences.
Indicators of linguistic quality
Soricut and Marcu (2006) make an analogy to machine translation : two words are likely to be translations of each other if they often appear in parallel sentences; in texts, two words are likely to signal local coherence if they often appear in adjacent sentences.
Results and discussion
For example, at the 2008 ACL Workshop on Statistical Machine Translation, all fifteen automatic evaluation metrics, including variants of BLEU scores, achieved between 42% and 56% pairwise accuracy with human judgments at the sentence level (Callison-Burch et al., 2008).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
Abstract
Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics.
Abstract
Instead of difficult and expensive annotation, we build a gold-standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation .
Introduction
When machine translation (MT) systems are applied in a new domain, many errors are a result of: (1) previously unseen (OOV) source language words, or (2) source language words that appear with a new sense and which require new translations.
Related Work
Work on active learning for machine translation has focused on collecting translations for longer unknown segments (e.g., Bloodgood and Callison-Burch (2010)).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nuhn, Malte and Mauser, Arne and Ney, Hermann
Abstract
In this paper we show how to train statistical machine translation systems on real-life tasks using only nonparallel monolingual data from two languages.
Conclusion
We presented a method for learning statistical machine translation models from nonparallel data.
Conclusion
This work serves as a big step towards large-scale unsupervised training for statistical machine translation systems.
Introduction
In this work, we will develop, describe, and evaluate methods for large vocabulary unsupervised learning of machine translation models suitable for real-world tasks.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Riezler, Stefan and Simianer, Patrick and Haas, Carolin
Abstract
We propose a novel learning approach for statistical machine translation (SMT) that allows extracting supervision signals for structured learning from an extrinsic response to a translation input.
Introduction
In this paper, we propose a novel approach for learning and evaluation in statistical machine translation (SMT) that borrows ideas from response-based learning for grounded semantic parsing.
Introduction
We suggest that in a similar way the preservation of meaning in machine translation should be defined in the context of an interaction in an extrinsic task.
Related Work
Interactive scenarios have been used for evaluation purposes of translation systems for nearly 50 years, especially using human reading comprehension testing (Pfafflin, 1965; Fuji, 1999; Jones et al., 2005), and more recently, using face-to-face conversation mediated via machine translation (Sakamoto et al., 2013).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Rastrow, Ariya and Dredze, Mark and Khudanpur, Sanjeev
Abstract
Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation.
Conclusion
Our results make long-span syntactic LMs practical for real-time ASR, and can potentially impact machine translation decoding as well.
Introduction
Language models (LM) are crucial components in tasks that require the generation of coherent natural language text, such as automatic speech recognition (ASR) and machine translation (MT).
Related Work
This independence means our tools are useful for other tasks, such as machine translation .
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhao, Shiqi and Lan, Xiang and Liu, Ting and Li, Sheng
Conclusions and Future Work
(1) It is the first statistical model specially designed for paraphrase generation, which is based on an analysis of the differences between paraphrase generation and other research areas, especially machine translation.
Experimental Setup
In our experiments, the development set contains 200 sentences and the test set contains 500 sentences, both of which are randomly selected from the human translations of 2008 NIST Open Machine Translation Evaluation: Chinese to English Task.
Introduction
PG shows its importance in many areas, such as question expansion in question answering (QA) (Duboue and Chu-Carroll, 2006), text polishing in natural language generation (NLG) (Iordanskaja et al., 1991), text simplification in computer-aided reading (Carroll et al., 1999), and sentence similarity computation in the automatic evaluation of machine translation (MT) (Kauchak and Barzilay, 2006) and summarization (Zhou et al., 2006).
Statistical Paraphrase Generation
• Sentence similarity computation: Given a reference sentence s’, this application aims to paraphrase s into t, so that t is more similar (closer in wording) to s’ than s is. This application is important for the automatic evaluation of machine translation and summarization, since we can paraphrase the human translations/summaries to make them more similar to the system outputs, which can improve the accuracy of the evaluation (Kauchak and Barzilay, 2006).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Fan and Zhao, Jun and Liu, Kang
Abstract
The experimental results show that the proposed method outperforms the baseline statistical machine translation system by 30.42%.
Experiments
Then the phrase-based machine translation system MOSES is adopted to translate the 503 Chinese NEs in the testing set into English.
Experiments
First, word order determination is difficult in statistical machine translation (SMT), while search engines are insensitive to this problem.
Introduction
The task of Named Entity (NE) translation is to translate a named entity from the source language to the target language, which plays an important role in machine translation and cross-language information retrieval (CLIR).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Abstract
This paper revisits the pivot language approach for machine translation.
Introduction
Current statistical machine translation (SMT) systems rely on large parallel and monolingual training corpora to produce translations of relatively high quality.
Introduction
In order to fill up this data gap, we make use of rule-based machine translation (RBMT) systems to translate the pivot sentences in the source-pivot or pivot-target
Translation Selection
We regard sentence-level translation selection as a machine translation (MT) evaluation problem and formalize this problem with a regression learning model.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Li, Haizhou
Abstract
Predicate-argument structure contains rich semantic information of which statistical machine translation hasn’t taken full advantage.
Abstract
In this paper, we propose two discriminative, feature-based models to exploit predicate-argument structures for statistical machine translation : 1) a predicate translation model and 2) an argument reordering model.
Abstract
The two models are integrated into a state-of-the-art phrase-based machine translation system and evaluated on Chinese-to-English translation tasks with large-scale training data.
Introduction
Recent years have witnessed increasing efforts towards integrating predicate-argument structures into statistical machine translation (SMT) (Wu and Fung, 2009b; Liu and Gildea, 2010).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Abstract
Long distance word reordering is a major challenge in statistical machine translation research.
Abstract
We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.
Introduction
Modeling word reordering between source and target sentences has been a research focus since the emergence of statistical machine translation.
Introduction
We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and experimental results show that our approach can bring significant improvements to the baseline phrase-based SMT system in both pre-ordering and integrated decoding settings.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Abend, Omri and Rappoport, Ari
Conclusion
We are currently attempting to construct a parser for UCCA and to apply it to several semantic tasks, notably English-French machine translation.
Introduction
One example is machine translation to target languages that do not express this structural distinction (e.g., both (a) and (b) would be translated to the same German sentence “John duschte”).
Related Work
A different strand of work addresses the construction of an interlingual representation, often with a motivation of applying it to machine translation .
UCCA’s Benefits to Semantic Tasks
Aside from machine translation, a great variety of semantic tasks can benefit from a scheme that is relatively insensitive to syntactic variation.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Introduction
Automatic word alignment is an important task for statistical machine translation .
Related Work
Recently, FFNNs have been applied successfully to several tasks, such as speech recognition (Dahl et al., 2012), statistical machine translation (Le et al., 2012; Vaswani et al., 2013), and other popular natural language processing tasks (Collobert and Weston, 2008; Collobert et al., 2011).
Training
5.4 Machine Translation Results
Training
Our experiments have shown that the proposed model outperforms the FFNN-based model (Yang et al., 2013) for word alignment and machine translation, and that the agreement constraint improves alignment performance.
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang and Mi, Haitao and Feng, Yang and Liu, Qun
Background
Statistical machine translation is a decision problem in which we need to decide on the best target sentence matching a source sentence.
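As a reminder, this decision problem is conventionally written (in the standard textbook formulation, not a detail specific to this paper) as choosing the target sentence e that maximizes the posterior given the source sentence f:

    \hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} P(f \mid e)\, P(e)

where P(f | e) is the translation model and P(e) the language model.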
Introduction
System combination aims to find consensus translations among different machine translation systems.
Related Work
In machine translation , confusion-network based combination techniques (e.g., (Rosti et al., 2007; He et al., 2008)) have achieved the state-of-the-art performance in MT evaluations.
Related Work
Hypergraphs have been successfully used in parsing (Klein and Manning, 2001; Huang and Chiang, 2005; Huang, 2008) and machine translation (Huang and Chiang, 2007; Mi et al., 2008; Mi and Huang, 2008).
machine translation is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Abstract
Statistical Machine Translation (SMT) usually utilizes contextual information to disambiguate translation candidates.
Experiments
We evaluate the performance of our neural network-based topic similarity model on a Chinese-to-English machine translation task.
Introduction
Making translation decisions is a difficult task in many Statistical Machine Translation (SMT) systems.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yıldız, Olcay Taner and Solak, Ercan and Görgün, Onur and Ehsani, Razieh
Abstract
In this paper, we report our preliminary efforts in building an English-Turkish parallel treebank corpus for statistical machine translation .
Introduction
For example, the EuroParl corpus (Koehn, 2002), one of the biggest parallel corpora in statistical machine translation, contains 22 languages (but not Turkish).
Introduction
In this study, we report our preliminary efforts in constructing an English-Turkish parallel treebank corpus for statistical machine translation .
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Xu, Jian-Ming and Ittycheriah, Abraham and Roukos, Salim
Abstract
We present an adaptive translation quality estimation (QE) method to predict the human-targeted translation error rate (HTER) for a document-specific machine translation model.
Introduction
Machine translation (MT) systems suffer from inconsistent and unstable translation quality.
Related Work
There has been a long history of study in confidence estimation of machine translation.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cai, Jingsheng and Utiyama, Masao and Sumita, Eiichiro and Zhang, Yujie
Abstract
In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
Introduction
This is especially important on the point of the system combination of PBSMT systems, because the diversity of outputs from machine translation systems is important for system combination (Cer et al., 2013).
Introduction
By using both our rules and Wang et al.’s rules, one can obtain diverse machine translation results because the pre-ordering results of these two rule sets are generally different.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Abstract
This study investigates building a better Chinese word segmentation model for statistical machine translation.
Abstract
The experiments on a Chinese-to-English machine translation task reveal that the proposed model can bring positive segmentation effects to translation quality.
Introduction
Empirical work shows that word segmentation can be beneficial to Chinese-to-English statistical machine translation (SMT) (Xu et al., 2005; Chang et al., 2008; Zhao et al., 2013).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yamangil, Elif and Shieber, Stuart M.
Introduction
Such induction of tree mappings has application in a variety of natural-language-processing tasks including machine translation, paraphrase, and sentence compression.
Introduction
Such models have been used as generative solutions to several other segmentation problems, ranging from word segmentation (Goldwater et al., 2006), to parsing (Cohn et al., 2009; Post and Gildea, 2009) and machine translation (DeNero et al., 2008; Cohn and Blunsom, 2009; Liu and Gildea, 2009).
Introduction
possibility of searching over the infinite space of grammars (and, in machine translation, possible word alignments), thus sidestepping the narrowness problem outlined above as well.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cherry, Colin
Introduction
Statistical machine translation (SMT) is complicated by the fact that words can move during translation.
Introduction
We use the term “syntactic cohesion” throughout this paper to mean what has previously been referred to as “phrasal cohesion”, because the nonlinguistic sense of “phrase” has become so common in machine translation literature.
Introduction
Phrase-based decoding (Koehn et al., 2003) is a dominant formalism in statistical machine translation.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Liang
Conclusion
We believe this general framework could also be applied to other problems involving forests or lattices, such as sequence labeling and machine translation .
Forest Reranking
For nonlocal features, we adapt cube pruning from forest rescoring (Chiang, 2007; Huang and Chiang, 2007), since the situation here is analogous to machine translation decoding with integrated language models: we can view the scores of unit nonlocal features as the language model cost, computed on-the-fly when combining sub-constituents.
Introduction
Discriminative reranking has become a popular technique for many NLP problems, in particular, parsing (Collins, 2000) and machine translation (Shen et al., 2005).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Quirk, Chris and Moore, Robert C. and Gildea, Daniel
Experiments
These experiments also indicate that a very sparse prior is needed for machine translation tasks.
Introduction
Most state-of-the-art statistical machine translation systems are based on large phrase tables extracted from parallel text using word-level alignments.
Introduction
While this approach has been very successful, poor word-level alignments are nonetheless a common source of error in machine translation systems.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Gildea, Daniel
Abstract
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
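The following is a minimal, hypothetical Python sketch of the general coarse-to-fine idea behind such a multi-pass setup; it works on a flat list of candidate translations rather than the parse forests used in the paper, and the toy log-probability tables and function names are invented for illustration.

    from collections import defaultdict

    # Toy log-probability tables; unseen n-grams get a low default score.
    BIGRAM = defaultdict(lambda: -5.0, {("the", "cat"): -0.5, ("cat", "sat"): -0.7})
    TRIGRAM = defaultdict(lambda: -6.0, {("the", "cat", "sat"): -0.4})

    def bigram_score(words):
        # Cheap first-pass score: sum of bigram log-probabilities.
        return sum(BIGRAM[(w1, w2)] for w1, w2 in zip(words, words[1:]))

    def trigram_score(words):
        # More expensive second-pass score: sum of trigram log-probabilities.
        return sum(TRIGRAM[tuple(words[i:i + 3])] for i in range(len(words) - 2))

    def two_pass_rescore(candidates, k=2):
        # Pass 1: prune with the bigram model; Pass 2: pick the best survivor
        # under the trigram model.
        survivors = sorted(candidates, key=bigram_score, reverse=True)[:k]
        return max(survivors, key=trigram_score)

    print(two_pass_rescore([["the", "cat", "sat"],
                            ["cat", "the", "sat"],
                            ["sat", "cat", "the"]]))

The paper's contribution is doing this pruning over a packed parse forest rather than an explicit candidate list, but the division of labor between a cheap and an expensive language model is the same.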
Introduction
Statistical machine translation systems based on synchronous grammars have recently shown great promise, but one stumbling block to their widespread adoption is that the decoding, or search, problem during translation is more computationally demanding than in phrase-based systems.
Introduction
We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Amigó, Enrique and Giménez, Jesús and Gonzalo, Julio and Verdejo, Felisa
Introduction
Automatic evaluation methods based on similarity to human references have substantially accelerated the development cycle of many NLP tasks, such as Machine Translation , Automatic Summarization, Sentence Compression and Language Generation.
Introduction
In the context of Machine Translation, a considerable effort has also been made to include deeper linguistic information in automatic evaluation metrics, both syntactic and semantic (see Section 2 for details).
Previous Work on Machine Translation Meta-Evaluation
As automatic evaluation metrics for machine translation have been proposed, different meta-evaluation frameworks have gradually been introduced.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wang, Haifeng and Guo, Yuqing and Liu, Ting
Log-linear Models
The BLEU score, a method originally proposed to automatically evaluate machine translation quality (Papineni et al., 2002), has been widely used as a metric to evaluate general-purpose sentence generation (Langkilde, 2002; White et al., 2007; Guo et al.
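For reference (this is the standard formulation from Papineni et al., 2002, not something specific to this paper), BLEU combines modified n-gram precisions p_n with a brevity penalty BP:

    \mathrm{BLEU} = \mathrm{BP} \cdot \exp\!\left( \sum_{n=1}^{N} w_n \log p_n \right), \qquad
    \mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases}

where c is the total length of the candidate output, r the effective reference length, and the weights w_n are typically uniform with N = 4.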
Log-linear Models
The BLEU scoring script is supplied by the NIST Open Machine Translation Evaluation at ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v11b.pl
Log-linear Models
Minimum Error Rate Training (MERT), which is popular in statistical machine translation (Och, 2003).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jiang, Long and Yang, Shiquan and Zhou, Ming and Liu, Xiaohua and Zhu, Qingsheng
Abstract
Mining bilingual data (including bilingual sentences and terms) from the Web can benefit many NLP applications, such as machine translation and cross language information retrieval.
Conclusions
We also want to evaluate the usefulness of our mined data for machine translation or other applications.
Introduction
Bilingual data (including bilingual sentences and bilingual terms) are critical resources for building many applications, such as machine translation (Brown, 1993) and cross language information retrieval (Nie et al., 1999).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pado, Sebastian and Galley, Michel and Jurafsky, Dan and Manning, Christopher D.
Abstract
Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres.
Expt. 2: Predicting Pairwise Preferences
This experiment uses the 2006-2008 corpora of the Workshop on Statistical Machine Translation (WMT). It consists of data from EUROPARL (Koehn, 2005) and various news commentaries, with five source languages (French, German, Spanish, Czech, and Hungarian).
Introduction
Constant evaluation is vital to the progress of machine translation (MT).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Parton, Kristen and McKeown, Kathleen R. and Coyne, Bob and Diab, Mona T. and Grishman, Ralph and Hakkani-Tür, Dilek and Harper, Mary and Ji, Heng and Ma, Wei Yun and Meyers, Adam and Stolbach, Sara and Sun, Ang and Tur, Gokhan and Xu, Wei and Yaman, Sibel
Abstract
Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT).
Introduction
• How much does machine translation (MT) degrade the performance of cross-lingual 5W systems, as compared to monolingual performance?
The Chinese-English 5W Task
In this task, both machine translation (MT) and 5W extraction must succeed in order to produce correct answers.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation .
Conclusion
We have presented a novel way to integrate transliterations into machine translation.
Conclusion
Transliteration can be very effective in machine translation for more than just translating OOV words.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Introduction
Linguistically syntax-based statistical machine translation models have made promising progress in recent years.
Related Work
The concept of packed forest has been used in machine translation for several years.
Related Work
(2008) and Mi and Huang (2008) use forests to direct translation and extract rules, rather than the 1-best tree, in order to weaken the influence of parsing errors; this is also the first time forests have been used directly in machine translation.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Abstract
Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data.
Experimental Evaluation
We conducted our experiments on the German-English data published for the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
Introduction
Europarl task from the ACL 2008 Workshop on Statistical Machine Translation (WMT08).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Abstract
In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination.
Abstract
We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system.
Introduction
Recent research on Statistical Machine Translation (SMT) has achieved substantial progress.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Introduction
Word segmentation and part-of-speech (POS) tagging are two critical and necessary initial procedures with respect to the majority of high-level Chinese language processing tasks such as syntax parsing, information extraction and machine translation.
Related Work
(2008) described a Bayesian semi-supervised CWS model by considering the segmentation as the hidden variable in machine translation.
Related Work
Unlike this model, the proposed approach is targeted at a general model, instead of one oriented to the machine translation task.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Crescenzi, Pierluigi and Gildea, Daniel and Marino, Andrea and Rossi, Gianluca and Satta, Giorgio
Concluding remarks
Grammar factorization for synchronous models is an important component of current machine translation systems (Zhang et al., 2006), and algorithms for factorization have been studied by Gildea et al.
Concluding remarks
These algorithms do not result in what we refer to as head-driven strategies, although, as machine translation systems improve, lexicalized rules may become important in this setting as well.
Introduction
Similar questions have arisen in the context of machine translation , as the SCFGs used to model translation are also instances of LCFRSs, as already mentioned.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lo, Chi-kiu and Wu, Dekai
Abstract
As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent.
Abstract
We argue that BLEU (Papineni et al., 2002) and other automatic n-gram-based MT evaluation metrics do not adequately capture the similarity in meaning between the machine translation and the reference translation, which, ultimately, is essential for MT output to be useful.
Abstract
Is the most essential semantic information being captured by machine translation systems?
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Nederhof, Mark-Jan and Satta, Giorgio
Discussion
Prefix probabilities and right prefix probabilities for PSCFGs can be exploited to compute probability distributions for the next word or part-of-speech in left-to-right incremental translation of speech, or alternatively as a predictive tool in applications of interactive machine translation, of the kind described by Foster et al.
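As background (this is the standard definition of a prefix probability rather than a detail of this paper), the prefix probability of a string w_1 ... w_k under a probabilistic grammar is the total probability of all complete strings that begin with it, and it directly yields the next-word distribution used in such predictive applications:

    \mathrm{P}_{\text{prefix}}(w_1 \cdots w_k) = \sum_{v \in \Sigma^{*}} P(w_1 \cdots w_k\, v), \qquad
    P(w_{k+1} \mid w_1 \cdots w_k) = \frac{\mathrm{P}_{\text{prefix}}(w_1 \cdots w_{k+1})}{\mathrm{P}_{\text{prefix}}(w_1 \cdots w_k)}.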
Introduction
Within the area of statistical machine translation , there has been a growing interest in so-called syntax-based translation models, that is, models that define mappings between languages through hierarchical sentence structures.
Prefix probabilities
One should add that, in real world machine translation applications, it has been observed that recognition (and computation of inside probabilities) for SCFGs can typically be carried out in low-degree polynomial time, and the worst cases mentioned above are not observed with real data.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhao, Bing and Lee, Young-Suk and Luo, Xiaoqiang and Li, Liu
Elementary Trees to String Grammar
3 significantly enriches the reordering power of syntax-based machine translation.
Introduction
Most syntax-based machine translation models with synchronous context-free grammar (SCFG) have been relying on off-the-shelf monolingual parse structures to learn the translation equivalences for string-to-tree, tree-to-string or tree-to-tree grammars.
Introduction
However, state-of-the-art monolingual parsers are not necessarily well suited for machine translation in terms of both labels and chunks/brackets.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cancedda, Nicola
Abstract
Some Statistical Machine Translation systems never see the light because the owner of the appropriate training data cannot release them, and the potential user of the system cannot disclose what should be translated.
Introduction
It is generally taken for granted that whoever is deploying a Statistical Machine Translation (SMT) system has unrestricted rights to access and use the parallel data required for its training.
Related work
private access to a phrase table or other resources for the purpose of performing statistical machine translation.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
Discriminative training is an active area in statistical machine translation (SMT) (e.g., Och et al., 2002, 2003; Liang et al., 2006; Blunsom et al., 2008; Chiang et al., 2009; Foster et al., 2010; Xiao et al.
Abstract
5.3 Experiments on the IWSLT2011 benchmark
As the second evaluation task, we apply our new method described in this paper to the 2011 IWSLT Chinese-to-English machine translation benchmark (Federico et al., 2011).
Abstract
Third, the new objective function and new optimization technique are successfully applied to two important machine translation tasks, with implementation issues resolved (e.g., training schedule and hyper-parameter tuning, etc.).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Konstas, Ioannis and Lapata, Mirella
Experimental Design
In machine translation, Huang (2008) provides a soft algorithm that finds the forest oracle, i.e., the parse among the reranked candidates with the highest Parseval F-score.
Problem Formulation
In machine translation , a decoder that implements forest rescoring (Huang and Chiang, 2007) uses the language model as an external criterion of the goodness of sub-translations on account of their grammaticality.
Related Work
Discriminative reranking has been employed in many NLP tasks such as syntactic parsing (Charniak and Johnson, 2005; Huang, 2008), machine translation (Shen et al., 2004; Li and Khudanpur, 2009) and semantic parsing (Ge and Mooney, 2006).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Abstract
Previous work using topic models for statistical machine translation (SMT) explores topic information at the word level.
Introduction
To exploit topic information for statistical machine translation (SMT), researchers have proposed various topic-specific lexicon translation models (Zhao and Xing, 2006; Zhao and Xing, 2007; Tam et al., 2007) to improve translation quality.
Related Work
In addition to the topic-specific lexicon translation method mentioned in the previous sections, researchers also explore topic models for machine translation in other ways.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Eidelman, Vladimir and Marton, Yuval and Resnik, Philip
Abstract
However, these solutions are impractical in complex structured prediction problems such as statistical machine translation .
Conclusions and Future Work
Finally, although motivated by statistical machine translation , RM is a gradient-based method that can easily be applied to other problems.
Introduction
The desire to incorporate high-dimensional sparse feature representations into statistical machine translation (SMT) models has driven recent research away from Minimum Error Rate Training (MERT) (Och, 2003), and toward other discriminative methods that can optimize more features.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Goto, Isao and Utiyama, Masao and Sumita, Eiichiro and Tamura, Akihiro and Kurohashi, Sadao
Experiment
We used the patent data for the Japanese to English and Chinese to English translation subtasks from the NTCIR-9 Patent Machine Translation Task (Goto et al., 2011).
Introduction
Estimating appropriate word order in a target language is one of the most difficult problems for statistical machine translation (SMT).
Introduction
Experiments confirmed the effectiveness of our method for Japanese-English and Chinese-English translation, using NTCIR-9 Patent Machine Translation Task data sets (Goto et al., 2011).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Abstract
We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into the MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding a name phrase table, and applying name-translation-driven decoding.
Introduction
A key bottleneck of high-quality cross-lingual information access lies in the performance of Machine Translation (MT).
Related Work
In contrast, our name pair mining approach described in this paper does not require any machine translation or transliteration features.
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Related Work
Ortiz-Martinez et al. (2010) delay the computation of translation model features for the purpose of interactive machine translation with online training.
Translation Model Architecture
One application where this could be desirable is interactive machine translation, where one could work with a mix of compact, static tables and tables designed to be incrementally trainable.
Translation Model Architecture
2 data sets are out-of-domain, made available by the 2012 Workshop on Statistical Machine Translation (Callison-Burch et al., 2012).
machine translation is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: