Abstract | We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. |
Abstract | Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. |
Conclusions and Future Work | similarity for terms from parallel corpora and applied it to statistical machine translation. |
Conclusions and Future Work | We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation. |
Experiments | We evaluate the bilingual sense similarity algorithm via machine translation. |
Experiments | For the baseline, we train the translation model following Chiang (2005; 2007), and our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java. |
Introduction | Statistical Machine Translation |
Introduction | Is it useful for multilingual applications, such as statistical machine translation (SMT)? |
Introduction | The source and target sides of the rules marked with (*) are not semantically equivalent; it seems likely that measuring the semantic similarity between the source and target sides of rules from their context could help machine translation. |
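As a rough illustration of comparing the two sides of a rule by their contexts, the sketch below computes cosine similarity between two sparse context vectors. It assumes the source-side and target-side context vectors have already been mapped into a shared feature space (for example through a lexical translation table); that projection step, the feature names, and the function name `cosine_similarity` are illustrative assumptions, not the algorithm described in the paper.

```python
import math
from typing import Dict

def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as feature -> weight."""
    # Only features present in both vectors contribute to the dot product.
    dot = sum(weight * v[feat] for feat, weight in u.items() if feat in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty context carries no evidence either way
    return dot / (norm_u * norm_v)

# Hypothetical context vectors for the two sides of one rule, assumed to be
# projected into a shared feature space already.
source_side = {"finance": 2.0, "loan": 1.0}
target_side = {"finance": 1.0, "loan": 1.0, "river": 1.0}
print(cosine_similarity(source_side, target_side))
```

A score near 1 would suggest the two sides are used in similar contexts; rules whose sides score low are candidates for the kind of non-equivalent pairs marked with (*).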
Abstract | In this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. |
Abstract | Evaluation experiments were conducted to calculate the correlation between human judgments and the scores produced by automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR-7. |
Experiments | These English output sentences were produced by the 12 machine translation systems in NTCIR-7 when translating 100 Japanese sentences. |
Experiments | Table 1 presents types of the 12 machine translation systems. |
Experiments | 12 machine translation systems in respective automatic evaluation methods, and “All” are the correlation coefficients using the scores of 1,200 output sentences obtained using the 12 machine translation systems. |
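The per-system and "All" figures are correlation coefficients between metric scores and human judgments. A minimal sketch of one common choice, the Pearson coefficient, follows; the source does not say here which coefficient was used, and the scores below are made up for illustration.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between metric scores xs and human judgments ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)

# Hypothetical per-sentence scores for two systems. Per-system coefficients use
# each system's sentences; an "All" coefficient pools every sentence.
metric = {"sys1": [0.31, 0.55, 0.42], "sys2": [0.12, 0.48, 0.30]}
human = {"sys1": [3.0, 4.0, 3.0], "sys2": [1.0, 4.0, 2.0]}
for name in metric:
    print(name, round(pearson(metric[name], human[name]), 3))
pooled_metric = [s for name in metric for s in metric[name]]
pooled_human = [h for name in human for h in human[name]]
print("All", round(pearson(pooled_metric, pooled_human), 3))
```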
Introduction | High-quality automatic evaluation has become increasingly important as various machine translation systems have been developed. |
Introduction | Evaluation experiments using MT outputs obtained by 12 machine translation systems in NTCIR-7 (Fujii et al., 2008) demonstrate that the scores obtained using our system yield the highest correlation with the human judgments among the automatic evaluation methods in both sentence-level adequacy and fluency. |
Introduction | Results confirmed that our method using noun-phrase chunking is effective for automatic evaluation of machine translation. |
Abstract | Existing methods simply use machine translation for document translation or summary translation. |
Abstract | However, current machine translation services are far from satisfactory, so the quality of the cross-language summary is usually very poor in both readability and content. |
Introduction | A straightforward way for cross-language document summarization is to translate the summary from the source language to the target language by using machine translation services. |
Introduction | However, although machine translation techniques have advanced considerably, machine translation quality is still far from satisfactory, and in many cases the translated texts are hard to understand. |
Introduction | An empirical evaluation is conducted to assess the performance of machine translation quality prediction, and a user study is performed to evaluate the cross-language summary quality. |
Related Work 2.1 Machine Translation Quality Prediction | Machine translation evaluation aims to assess the correctness and quality of the translation. |
Related Work 2.1 Machine Translation Quality Prediction | Chae and Nenkova (2009) use surface syntactic features to assess the fluency of machine translation results. |
Related Work 2.1 Machine Translation Quality Prediction | In this study, we further predict the translation quality of an English sentence before the machine translation process, i.e., we use neither a reference translation nor the target sentence. |
Abstract | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. |
Conclusion | We have presented the pseudo-word as a novel translational unit for phrase-based machine translation. |
Conclusion | Experimental results on the Chinese-to-English translation task show that, in the phrase-based machine translation model, pseudo-words perform significantly better than words in both the spoken language translation domain and the news domain. |
Experiments and Results | We conduct experiments on Chinese-to-English machine translation. |
Introduction | The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus generated by word-based models (Brown et al., 1993), proceeds with the induction of a phrase table (Koehn et al., 2003) or a synchronous grammar (Chiang, 2007), and then with a model weight tuning step. |
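The phrase-table induction step can be illustrated with the standard consistency criterion for extracting phrase pairs from a word alignment (Koehn et al., 2003). This is a simplified sketch: it does not extend phrases over unaligned boundary words, and the function name and the `max_len` limit are assumptions for illustration.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)

def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Set[Link], max_len: int = 3) -> List[Tuple[str, str]]:
    """Extract phrase pairs consistent with a word alignment (simplified)."""
    pairs = []
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Consistency: no word inside the target span may be aligned
            # to a source word outside the source span.
            if any(t_start <= t <= t_end and not s_start <= s <= s_end
                   for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s_start:s_end + 1]),
                          " ".join(tgt[t_start:t_end + 1])))
    return pairs

print(extract_phrase_pairs(["das", "Haus"], ["the", "house"], {(0, 0), (1, 1)}))
# [('das', 'the'), ('das Haus', 'the house'), ('Haus', 'house')]
```

Counting these pairs over the whole word-aligned corpus is what feeds the phrase-table probabilities.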
Introduction | Some researchers have explored coarse-grained translational units for machine translation. |
Introduction | (2008) used a Bayesian semi-supervised method that combines a Chinese word segmentation model and a Chinese-to-English translation model to derive a Chinese segmentation suitable for machine translation. |
Building the Corpus | In addition, the overall measure of success—induction of machine translation systems from limited resources—pushes the state of the art (Kumar et al., 2007). |
Building the Corpus | Variation will arise as a consequence, but we believe that it will be no worse than the variability in input that current machine translation training methods routinely deal with, and will not greatly injure the utility of the Corpus. |
Conclusion | We need leaner methods for building machine translation systems; new algorithms for cross-linguistic bootstrapping via multiple paths; more effective techniques for leveraging human effort in labeling data; scalable ways to get bilingual text for unwritten languages; and large scale social engineering to make it all happen quickly. |
Human Language Project | Although we strive for maximum generality, we also propose a specific driving “use case,” namely machine translation (MT) (Hutchins and Somers, 1992; Koehn, 2010). |
Human Language Project | That is, we view machine translation as an approximation to language understanding. |
Human Language Project | Taking sentences in a reference language as the meaning representation, we arrive back at machine translation as the measure of success. |
Conclusion | The results show that CL-SCL is competitive with state-of-the-art machine translation technology while requiring fewer resources. |
Experiments | We chose a machine translation baseline to compare CL-SCL to another cross-language method. |
Experiments | Statistical machine translation technology offers a straightforward solution to the problem of cross-language text classification and has been used in a number of cross-language sentiment classification studies (Hiroshi et al., 2004; Bautin et al., 2008; Wan, 2009). |
Experiments | This difference can be explained by the fact that machine translation works better for European languages than for Asian languages such as Japanese. |
Introduction | For the application of f_S under language T, different approaches are current practice: machine translation of unlabeled documents from T to S, dictionary-based translation of unlabeled |
Introduction | The approach solves the classification problem directly, instead of resorting to a more general and potentially much harder problem such as machine translation. |
Related Work | Recent work in cross-language text classification focuses on the use of automatic machine translation technology. |
Related Work | Most of these methods involve two steps: (1) translation of the documents into the source or the target language, and (2) dimensionality reduction or semi-supervised learning to reduce the noise introduced by machine translation. |
Abstract | This paper presents a method that shares similarities with both spell checking and machine translation approaches. |
Conclusion and perspectives | With the intention to avoid wrong modifications of special tokens and to handle word boundaries as easily as possible, we designed a method that shares similarities with both spell checking and machine translation. |
Introduction | Evaluated in French, our method shares similarities with both spell checking and machine translation. |
Related work | (2008b), SMS normalization, up to now, has been handled through three well-known NLP metaphors: spell checking, machine translation and automatic speech recognition. |
Related work | The machine translation metaphor, which is historically the first proposed (Bangalore et al., 2002; Aw et al., 2006), considers the process of normalizing SMS as a translation task from a source language (the SMS) to a target language (its standard written form). |
Related work | (2006) proposed a statistical machine translation model working at the phrase-level, by splitting sentences into their k most probable phrases. |
Conclusions | We have presented a novel way to incorporate source syntactic structure in English-to-Turkish phrase-based machine translation by parsing the source sentences and then encoding many local and nonlocal source syntactic structures as additional complex tag factors. |
Introduction | Statistical machine translation into a morphologically complex language, such as Turkish, Finnish, or Arabic, involves the generation of target words with the proper morphology, in addition to properly ordering the target words. |
Introduction | We assume that the reader is familiar with the basics of phrase-based statistical machine translation (Koehn et al., 2003) and factored statistical machine translation (Koehn and Hoang, 2007). |
Related Work | Statistical Machine Translation into a morphologically rich language is a challenging problem in that, on the target side, the decoder needs to generate both the right sequence of constituents and the right sequence of morphemes for each word. |
Related Work | Using morphology in statistical machine translation has been addressed by many researchers for translation from or into morphologically rich(er) languages. |
Related Work | Goldwater and McClosky (2005) use morphological analysis on the Czech side to get improvements in Czech-to-English statistical machine translation. |
Abstract | Automatic error detection is desirable as a post-processing step to improve machine translation quality. |
Abstract | We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. |
Conclusions and Future Work | Therefore, our approach can be used for other machine translation systems, such as rule-based or example-based systems, which generally do not produce N-best lists. |
Introduction | Translation hypotheses generated by a statistical machine translation (SMT) system always contain both correct parts (e.g. |
Introduction | Automatically distinguishing incorrect parts from correct parts is therefore very desirable not only for post-editing and interactive machine translation (Ueffing and Ney, 2007) but also for SMT itself: either by rescoring hypotheses in the N-best list using the probability of correctness calculated for each hypothesis (Zens and Ney, 2006) or by generating new hypotheses using N -best lists from one SMT system or multiple sys- |
Related Work | In this section, we present an overview of confidence estimation (CE) for machine translation at the word level. |
SMT System | To obtain machine-generated translation hypotheses for our error detection, we use a state-of-the-art phrase-based machine translation system MOSES (Koehn et al., 2003; Koehn et al., 2007). |
Abstract | We further apply the subtree alignment in machine translation with two methods. |
Introduction | Syntax-based Statistical Machine Translation (SMT) systems allow the translation process to be guided more closely by grammar, which provides decent reordering capability. |
Introduction | In multilingual tasks such as machine translation, tree kernels are seldom applied. |
Introduction | Further experiments in machine translation also suggest that the obtained subtree alignment can improve the performance of both phrase and syntax based SMT systems. |
Substructure Spaces for BTKs | Due to the above issues, we annotate a new data set to apply the subtree alignment in machine translation. |
Substructure Spaces for BTKs | 7 Experiments on Machine Translation |
Substructure Spaces for BTKs | However, utilizing syntactic translational equivalences alone for machine translation loses the capability of modeling non-syntactic phrases (Koehn et al., 2003). |
Abstract | In addition, Machine Translation is applied to enrich the resource with lexical information for all languages. |
BabelNet | using (a) the human-generated translations provided in Wikipedia (the so-called inter-language links), as well as (b) a machine translation system to translate occurrences of the concepts within sense-tagged corpora, namely SemCor (Miller et al., 1993) — a corpus annotated with WordNet senses — and Wikipedia itself (Section 3.3). |
Conclusions | Further, we contribute a large set of sense occurrences harvested from Wikipedia and SemCor, a corpus that we input to a state-of-the-art machine translation system to fill in the gap between resource-rich languages — such as English — and resource-poorer ones. |
Experiment 2: Translation Evaluation | both from Wikipedia and the machine translation system. |
Experiment 2: Translation Evaluation | In contrast, good translations were produced using our machine translation method when enough sentences were available. |
Introduction | poor languages with the aid of Machine Translation. |
Methodology | An initial prototype used a statistical machine translation system based on Moses (Koehn et al., 2007) and trained on Europarl (Koehn, 2005). |
Abstract | This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). |
Conclusion | Statistical Machine Translation. |
Conclusion | The Mathematics of Statistical Machine Translation: Parameter Estimation. |
Conclusion | Statistical Significance Tests for Machine Translation Evaluation. |
Indicators of linguistic quality | For this reason, LMs are widely used in applications such as generation and machine translation to guide the production of sentences. |
Indicators of linguistic quality | These features are weakly but significantly correlated with the fluency of machine translated sentences. |
Indicators of linguistic quality | Soricut and Marcu (2006) make an analogy to machine translation: two words are likely to be translations of each other if they often appear in parallel sentences; in texts, two words are likely to signal local coherence if they often appear in adjacent sentences. |
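The counting side of this analogy can be sketched as follows: adjacent sentence pairs are treated like parallel sentence pairs, and word pairs spanning them are counted the way co-occurrences are counted in parallel text. The function name and the choice to count each word pair at most once per sentence pair are assumptions; the source states only the analogy, not the model built on these counts.

```python
from collections import Counter
from typing import List

def adjacent_pair_counts(sentences: List[List[str]]) -> Counter:
    """Count (w1, w2) pairs with w1 in one sentence and w2 in the next sentence."""
    counts: Counter = Counter()
    for prev, nxt in zip(sentences, sentences[1:]):
        # Each distinct pair is counted at most once per adjacent sentence pair.
        for w1 in set(prev):
            for w2 in set(nxt):
                counts[(w1, w2)] += 1
    return counts

text = [["the", "bank", "raised", "rates"], ["loans", "became", "expensive"]]
print(adjacent_pair_counts(text)[("rates", "loans")])  # 1
```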
Results and discussion | For example, at the 2008 ACL Workshop on Statistical Machine Translation, all fifteen automatic evaluation metrics, including variants of BLEU scores, achieved between 42% and 56% pairwise accuracy with human judgments at the sentence level (Callison-Burch et al., 2008). |
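Pairwise accuracy can be read as the fraction of sentence pairs that a metric orders the same way as the human judgments. The sketch below assumes that pairs tied in the human judgment are skipped and that a metric tie counts as a disagreement; the workshop's exact tie handling may differ, and the function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence

def pairwise_accuracy(metric: Sequence[float], human: Sequence[float]) -> float:
    """Fraction of human-distinguished sentence pairs the metric orders the same way."""
    agree = total = 0
    for i, j in combinations(range(len(human)), 2):
        if human[i] == human[j]:
            continue  # assumption: pairs tied by humans are not scored
        total += 1
        same_direction = (metric[i] > metric[j]) == (human[i] > human[j])
        if metric[i] != metric[j] and same_direction:
            agree += 1  # a metric tie never counts as agreement (assumption)
    return agree / total if total else 0.0

print(pairwise_accuracy([0.2, 0.4, 0.5], [1, 3, 2]))
```

Under this reading, 50% is what a metric ordering pairs at random would score on average, which is why the reported 42% to 56% range is weak.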
Abstract | Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system. |
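The alignment F-measure can be computed over sets of alignment links. The sketch assumes a single gold link set, without the sure/possible distinction used by alignment error rate, so it may not match the paper's exact scoring; the function name is hypothetical.

```python
from typing import Set, Tuple

Link = Tuple[int, int]  # (source position, target position)

def alignment_prf(predicted: Set[Link], gold: Set[Link]) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure over alignment links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

p, r, f = alignment_prf({(0, 0), (1, 1), (2, 3)}, {(0, 0), (1, 1), (2, 2), (3, 3)})
print(f"P={p:.3f} R={r:.3f} F={f:.3f}")  # P=0.667 R=0.500 F=0.571
```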
Conclusion | We have opened up the word alignment task to advances in hypergraph algorithms currently used in parsing and machine translation decoding. |
Experiments | For each set of translation rules, we train a machine translation system and decode a held-out test corpus for which we report results below. |
Introduction | Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. |
Introduction | Linguistically syntax-based statistical machine translation models have made promising progress in recent years. |
Related Work | The concept of packed forest has been used in machine translation for several years. |
Related Work | (2008) and Mi and Huang (2008) use forests rather than the 1-best tree to direct translation and extract rules, in order to weaken the influence of parsing errors; this was also the first time forests were used directly in machine translation. |
Abstract | Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. |
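For reference, the pure counting estimate that such approaches go beyond is the relative frequency of extracted phrase pairs, φ(e | f) = count(f, e) / Σ_e′ count(f, e′). A minimal sketch, with a hypothetical function name and toy phrase pairs:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

def phrase_translation_probs(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Relative-frequency estimate phi(e | f) = count(f, e) / sum over e' of count(f, e')."""
    pair_counts = Counter(pairs)
    source_counts: Counter = Counter()
    for (f, _e), c in pair_counts.items():
        source_counts[f] += c
    return {(f, e): c / source_counts[f] for (f, e), c in pair_counts.items()}

extracted = [("das Haus", "the house"), ("das Haus", "the house"),
             ("das Haus", "the home"), ("Haus", "house")]
for (f, e), p in phrase_translation_probs(extracted).items():
    print(f"phi({e} | {f}) = {p:.3f}")
```

Because every extracted pair is counted regardless of how plausible its word alignment was, these estimates inherit alignment errors, which is one motivation for learning the probabilities instead.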
Experimental Evaluation | We conducted our experiments on the German-English data published for the ACL 2008 Workshop on Statistical Machine Translation (WMT08). |
Introduction | Europarl task from the ACL 2008 Workshop on Statistical Machine Translation (WMT08). |
Abstract | In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. |
Abstract | We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. |
Introduction | Recent research on Statistical Machine Translation (SMT) has achieved substantial progress. |
Introduction | Such induction of tree mappings has application in a variety of natural-language-processing tasks including machine translation, paraphrase, and sentence compression. |
Introduction | Such models have been used as generative solutions to several other segmentation problems, ranging from word segmentation (Goldwater et al., 2006), to parsing (Cohn et al., 2009; Post and Gildea, 2009) and machine translation (DeNero et al., 2008; Cohn and Blunsom, 2009; Liu and Gildea, 2009). |
Introduction | possibility of searching over the infinite space of grammars (and, in machine translation, possible word alignments), thus sidestepping the narrowness problem outlined above as well. |
Abstract | We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation. |
Conclusion | We have presented a novel way to integrate transliterations into machine translation. |
Conclusion | transliteration can be very effective in machine translation for more than just translating OOV words. |