Index of papers in Proc. ACL that mention
  • SMT system
Hermjakob, Ulf and Knight, Kevin and Daumé III, Hal
Integration with SMT
We use the following method to integrate our transliterator into the overall SMT system:
Integration with SMT
In a tuning step, the Minimum Error Rate Training component of our SMT system iteratively adjusts the set of rule weights, including the weight associated with the transliteration feature, such that the English translations are optimized with respect to a set of known reference translations according to the BLEU translation metric.
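The weight-tuning loop described above can be sketched, in highly simplified form, as a search over a single feature weight: for each candidate weight, rerank the hypotheses and score the resulting 1-best output. Here `score_fn` stands in for BLEU against the references, and the `(base_score, feature_value)` hypothesis representation is a hypothetical simplification, not the paper's actual setup.

```python
def tune_weight(candidates, score_fn, grid):
    """Pick the feature weight from `grid` that maximizes score_fn.

    candidates: one list of (base_score, feature_value) hypotheses per
    sentence; for each weight we rerank and score the 1-best choices.
    """
    best_w, best_s = None, float("-inf")
    for w in grid:
        # rerank: pick the hypothesis with the highest combined model score
        onebest = [max(hyps, key=lambda h: h[0] + w * h[1]) for hyps in candidates]
        s = score_fn(onebest)
        if s > best_s:
            best_w, best_s = w, s
    return best_w
```

Real MERT searches all weights jointly along line segments; this grid search over one weight only conveys the rerank-and-score idea.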
Integration with SMT
At runtime, the transliterations then compete with the translations generated by the general SMT system.
Introduction
The SMT system drops most names in this example.
Introduction
The simplest way to integrate name handling into SMT is: (1) run a named-entity identification system on the source sentence, (2) transliterate identified entities with a special-purpose transliteration component, and (3) run the SMT system on the source sentence, as usual, but when looking up phrasal translations for the words identified in step 1, instead use the transliterations from step 2.
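The three-step scheme above can be sketched as follows; `identify_names`, `transliterate`, and `translate_phrase` are hypothetical stand-ins for the named-entity identifier, the transliteration component, and the SMT phrase lookup, and the sketch works word by word rather than over phrases for brevity.

```python
def translate_with_names(source_tokens, identify_names, transliterate, translate_phrase):
    """Three-step integration: (1) find named entities, (2) transliterate
    them, (3) translate everything else with the ordinary phrase lookup."""
    names = identify_names(source_tokens)          # step 1: set of token positions
    out = []
    for i, tok in enumerate(source_tokens):
        if i in names:
            out.append(transliterate(tok))         # step 2: special-purpose component
        else:
            out.append(translate_phrase(tok))      # step 3: usual phrasal translation
    return out
```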
Introduction
The base SMT system may translate a commonly-occurring name just fine, due to the bitext it was trained on, while the transliteration component can easily supply a worse answer.
SMT system is mentioned in 16 sentences in this paper.
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Abstract
In this paper, we propose a statistical model to generate appropriate measure words of nouns for an English-to-Chinese SMT system.
Abstract
Our model works as a postprocessing procedure over output of statistical machine translation systems, and can work with any SMT system.
Introduction
English-Chinese SMT Systems
Introduction
However, as we will show below, existing SMT systems do not deal well with the measure word generation in general due to data sparseness and long distance dependencies between measure words and their corresponding head words.
Introduction
Due to the limited size of bilingual corpora, many measure words, as well as the collocations between a measure and its head word, cannot be well covered by the phrase translation table in an SMT system.
Our Method
For those having English translations, such as “米” (meter) and “吨” (ton), we just use the translation produced by the SMT system itself.
Our Method
The model is applied to SMT system outputs as a postprocessing procedure.
Our Method
Based on contextual information contained in both input source sentence and SMT system’s output translation, a measure word candidate set M is constructed.
SMT system is mentioned in 23 sentences in this paper.
Wang, Kun and Zong, Chengqing and Su, Keh-Yih
Abstract
Furthermore, integrated Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction in comparison with the pure SMT system.
Conclusion and Future Work
The experiments show that the proposed Model-III outperforms both the TM and the SMT systems significantly (p < 0.05) in either BLEU or TER when fuzzy match score is above 0.4.
Conclusion and Future Work
Compared with the pure SMT system, Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction on a Chinese-English TM database.
Experiments
For the phrase-based SMT system, we adopted the Moses toolkit (Koehn et al., 2007).
Experiments
We first extract 95% of the bilingual sentences as a new training corpus to train an SMT system.
Experiments
Scores marked by “*” are significantly better (p < 0.05) than both the TM and the SMT systems.
Introduction
Especially, there is no guarantee that an SMT system can produce translations in a consistent manner (Ma et al., 2011).
Introduction
Afterwards, they merge the relevant translations of matched segments into the source sentence, and then force the SMT system to only translate those unmatched segments at decoding.
Introduction
Compared with the pure SMT system, the proposed integrated Model-III achieves 3.48 BLEU points improvement and 2.62 TER points reduction overall.
SMT system is mentioned in 11 sentences in this paper.
Sun, Hong and Zhou, Ming
Abstract
Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results.
Abstract
In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation.
Abstract
In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems.
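A metric of this kind can be sketched as an adequacy term minus a penalty for self-similarity to the input, traded off by a mixing weight alpha. This is a rough sketch of the idea, not necessarily the paper's exact formulation; `bleu` here is any caller-supplied sentence-level BLEU function.

```python
def ibleu(candidate, references, source, bleu, alpha=0.9):
    """Adequacy term rewards overlap with the references; the diversity
    term penalizes copying the source. alpha trades the two off."""
    adequacy = bleu(candidate, references)
    self_similarity = bleu(candidate, [source])
    return alpha * adequacy - (1.0 - alpha) * self_similarity
```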
Introduction
Thus researchers leverage bilingual parallel data for this task and apply two SMT systems (dual SMT system) to translate the original sentences into another pivot language and then translate them back into the original language.
Introduction
Context features are added into the SMT system to improve translation correctness against polysemy.
Introduction
Previous work employs two separately trained SMT systems whose parameters are tuned for the SMT scheme and therefore cannot directly optimize the paraphrase purposes, for example, optimizing the diversity against the input.
Paraphrasing with a Dual SMT System
Generating sentential paraphrase with the SMT system is done by first translating a source sentence into another pivot language, and then back into the source.
Paraphrasing with a Dual SMT System
Here, we call these two procedures a dual SMT system .
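The round trip through a pivot language can be illustrated with toy one-word phrase tables standing in for the two SMT systems; the lexicons below are purely illustrative.

```python
# Toy one-word phrase tables standing in for the two SMT systems.
EN_TO_FR = {"big": "grand", "large": "grand", "house": "maison"}
FR_TO_EN = {"grand": "large", "maison": "house"}

def paraphrase(tokens):
    """Dual SMT round trip: source -> pivot, then pivot -> source."""
    pivot = [EN_TO_FR.get(t, t) for t in tokens]   # first SMT system
    return [FR_TO_EN.get(t, t) for t in pivot]     # second SMT system
```

Because "big" and "large" collapse onto the same pivot word, the round trip yields a paraphrase rather than the identity.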
Paraphrasing with a Dual SMT System
2.1 Joint Inference of Dual SMT System
SMT system is mentioned in 18 sentences in this paper.
Wu, Hua and Wang, Haifeng
Abstract
Then we employ a hybrid method combining RBMT and SMT systems to fill up the data gap for pivot translation, where the source-pivot and pivot-target corpora are independent.
Experiments
5.3 Results by Using SMT Systems
Experiments
Table 3: CRR/ASR translation results by using SMT systems
Experiments
5.4 Results by Using both RBMT and SMT Systems
Introduction
Unfortunately, large quantities of parallel data are not readily available for some language pairs, therefore limiting the potential use of current SMT systems.
Introduction
Experimental results show that (1) the performances of the three pivot methods are comparable when only SMT systems are used.
Pivot Methods for Phrase-based SMT
Where L is the number of features used in SMT systems.
Pivot Methods for Phrase-based SMT
This can be achieved by translating the pivot sentences in source-pivot corpus to target sentences with the pivot-target SMT system .
Pivot Methods for Phrase-based SMT
The other is to obtain source translations for the target sentences in the pivot-target corpus using the pivot-source SMT system .
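One common way to realize pivot translation from the two tables mentioned here is phrase-table triangulation: marginalize over pivot phrases to induce a source-target table. This is a sketch under that assumption; the probability tables below are hypothetical.

```python
from collections import defaultdict

def triangulate(src_to_pivot, pivot_to_tgt):
    """Induce a source-target phrase table by marginalizing over pivot
    phrases: p(t|s) = sum_p p(p|s) * p(t|p)."""
    src_to_tgt = defaultdict(float)
    for (s, p), p_ps in src_to_pivot.items():
        for (p2, t), p_tp in pivot_to_tgt.items():
            if p == p2:
                src_to_tgt[(s, t)] += p_ps * p_tp
    return dict(src_to_tgt)
```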
Using RBMT Systems for Pivot Translation
Since it is easier to obtain monolingual corpora than bilingual corpora, we use RBMT systems to translate the available monolingual corpora to obtain synthetic bilingual corpora, which are added to the training data to improve the performance of SMT systems.
SMT system is mentioned in 15 sentences in this paper.
Liu, Zhanyi and Wang, Haifeng and Wu, Hua and Li, Sheng
Abstract
We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT.
Abstract
As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.
Experiments on Phrase-Based SMT
We use FBIS corpus to train the Chinese-to-English SMT systems.
Experiments on Word Alignment
To train a Chinese-to-English SMT system, we need to perform both Chinese-to-English and
Improving Phrase Table
A phrase-based SMT system automatically extracts bilingual phrase pairs from the word-aligned bilingual corpus.
Improving Phrase Table
These collocation probabilities are incorporated into the phrase-based SMT system as features.
Improving Statistical Bilingual Word Alignment
This method ignores the correlation of the words in the same alignment unit, so an alignment may include many unrelated words, which influences the performance of SMT systems.
Introduction
1993) is the base of most SMT systems.
Introduction
Then the collocation information is employed to improve Bilingual Word Alignment (BWA) for various kinds of SMT systems and to improve phrase table for phrase-based SMT.
Introduction
Then the phrase collocation probabilities are used as additional features in phrase-based SMT systems.
SMT system is mentioned in 19 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Abstract
To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work.
Background
where Pr(e|f) is the probability that e is the translation of the given source string f. To model the posterior probability Pr(e|f), most of the state-of-the-art SMT systems utilize the log-linear model proposed by Och and Ney (2002), as follows,
Background
In this paper, u denotes a log-linear model that has M fixed features {h1(f,e), ..., hM(f,e)}, λ = {λ1, ..., λM} denotes the M parameters of u, and u(λ) denotes an SMT system based on u with parameters λ.
Background
Suppose that there are T available SMT systems {u1(λ1*), ..., uT(λT*)}; the task of system combination is to build a new translation system v(u1(λ1*), ..., uT(λT*)) from {u1(λ1*), ..., uT(λT*)}.
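The log-linear formulation referred to above (Och and Ney, 2002) can be written out, together with its decision rule, as:

```latex
\Pr(e \mid f) \approx p_{\lambda}(e \mid f)
  = \frac{\exp\!\Big(\sum_{m=1}^{M} \lambda_m h_m(f,e)\Big)}
         {\sum_{e'} \exp\!\Big(\sum_{m=1}^{M} \lambda_m h_m(f,e')\Big)},
\qquad
\hat{e} = \operatorname*{argmax}_{e} \sum_{m=1}^{M} \lambda_m h_m(f,e)
```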
Introduction
With the emergence of various structurally different SMT systems, more and more studies are focused on combining multiple SMT systems for achieving higher translation accuracy rather than using a single translation system.
Introduction
One of the key factors in SMT system combination is the diversity in the ensemble of translation outputs (Macherey and Och, 2007).
Introduction
However, this requirement cannot be met in many cases, since we do not always have access to multiple SMT engines due to the high cost of developing and tuning SMT systems.
SMT system is mentioned in 24 sentences in this paper.
Riezler, Stefan and Simianer, Patrick and Haas, Carolin
Experiments
The bilingual SMT system used in our experiments is the state-of-the-art SCFG decoder CDEC (Dyer et al., 2010).
Experiments
We trained the SMT system on the English-German parallel web data provided in the COMMON CRAWL (Smith et al., 2013) dataset.
Experiments
Method 1 is the baseline system, consisting of the CDEC SMT system trained on the COMMON CRAWL data as described above.
Grounding SMT in Semantic Parsing
Given a manual German translation of the English query as source sentence, the SMT system produces an English target translation.
Introduction
This avoids the problem of un-reachability of independently generated reference translations by the SMT system .
Introduction
in which SMT systems can be trained and evaluated.
Related Work
(2012) propose a setup where an SMT system feeds into cross-language information retrieval, and receives feedback from the performance of translated queries with respect to cross-language retrieval performance.
Related Work
However, despite offering direct and reliable prediction of translation quality, the cost and lack of reusability has confined task-based evaluations involving humans to testing scenarios, but prevented a use for interactive training of SMT systems as in our work.
Response-based Online Learning
Firstly, update rules that require computing a feature representation for the reference translation are suboptimal in SMT, because often human-generated reference translations cannot be generated by the SMT system.
Response-based Online Learning
Such “un-reachable” gold-standard translations need to be replaced by “surrogate” gold-standard translations that are close to the human-generated translations and still lie within the reach of the SMT system.
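Picking such a surrogate can be sketched as choosing, from the system's reachable n-best list, the hypothesis closest to the human reference; the `similarity` function is a stand-in for, e.g., sentence-level BLEU.

```python
def surrogate_gold(nbest, reference, similarity):
    """Replace an unreachable reference by the reachable hypothesis
    that is most similar to it."""
    return max(nbest, key=lambda hyp: similarity(hyp, reference))
```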
SMT system is mentioned in 10 sentences in this paper.
Li, Zhifei and Yarowsky, David
Introduction
While the research in statistical machine translation (SMT) has made significant progress, most SMT systems (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) rely on parallel corpora to extract translation entries.
Introduction
The richness and complexity of Chinese abbreviations impose challenges to SMT systems.
Introduction
In particular, many Chinese abbreviations may not appear in available parallel corpora, in which case current SMT systems treat them as unknown words and leave them untranslated.
Unsupervised Translation Induction for Chinese Abbreviations
Moreover, our approach utilizes both Chinese and English monolingual data to help MT, while most SMT systems utilize only the English monolingual data to build a language model.
SMT system is mentioned in 9 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Conclusion and Future Work
Experimental results show that our approach is promising for SMT systems to learn a better translation model.
Experiments
This proves that bilingually induced topic representation with neural network helps the SMT system disambiguate translation candidates.
Introduction
For example, translation sense disambiguation approaches (Carpuat and Wu, 2005; Carpuat and Wu, 2007) are proposed for phrase-based SMT systems.
Introduction
Meanwhile, for hierarchical phrase-based or syntax-based SMT systems, there is also much work involving rich contexts to guide rule selection (He et al., 2008; Liu et al., 2008; Marton and Resnik, 2008; Xiong et al., 2009).
Introduction
Although these methods are effective and proven successful in many SMT systems , they only leverage within-
Topic Similarity Model with Neural Network
For the SMT system, the best translation candidate ê is given by:
SMT system is mentioned in 8 sentences in this paper.
Yang, Nan and Li, Mu and Zhang, Dongdong and Yu, Nenghai
Abstract
We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and show that it can significantly outperform the baseline phrase-based SMT system.
Conclusion and Future Work
We also expect to explore better ways to integrate the ranking reordering model into the SMT system instead of a simple penalty scheme.
Experiments
Lexicon features generally continue to improve the RankingSVM accuracy and reduce CLN on training data, but they do not bring further improvement for SMT systems beyond the top 100 most frequent words.
Integration into SMT system
There are two ways to integrate the ranking reordering model into a phrase-based SMT system: the pre-reorder method, and the decoding time constraint method.
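The pre-reorder route can be sketched as permuting the source sentence by a predicted target-side order before handing it to the standard SMT system; the per-token ranking scores are assumed to come from the trained ranking model, and the names below are illustrative.

```python
def prereorder(tokens, rank_scores):
    """Pre-reorder method: permute the source sentence into predicted
    target order before decoding with a standard phrase-based SMT system.
    rank_scores[i] is the predicted target-side position score for
    tokens[i] (lower = earlier)."""
    order = sorted(range(len(tokens)), key=lambda i: rank_scores[i])
    return [tokens[i] for i in order]
```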
Introduction
This is usually done in a preprocessing step, and then followed by a standard phrase-based SMT system that takes the reordered source sentence as input to finish the translation.
Introduction
The ranking model can not only be used in a pre-reordering based SMT system, but also be integrated into a phrase-based decoder serving as additional distortion features.
Introduction
We evaluated our approach on large-scale Japanese-English and English-Japanese machine translation tasks, and experimental results show that our approach can bring significant improvements to the baseline phrase-based SMT system in both pre-ordering and integrated decoding settings.
SMT system is mentioned in 8 sentences in this paper.
Xiong, Deyi and Zhang, Min and Li, Haizhou
Features
Given a source sentence f, let {en} (n = 1, ..., N) be the N-best list generated by an SMT system, and let en,i be the i-th word in en.
Introduction
Automatically distinguishing incorrect parts from correct parts is therefore very desirable not only for post-editing and interactive machine translation (Ueffing and Ney, 2007) but also for SMT itself: either by rescoring hypotheses in the N-best list using the probability of correctness calculated for each hypothesis (Zens and Ney, 2006) or by generating new hypotheses using N-best lists from one SMT system or multiple sys-
Introduction
1) Calculate features that express the correctness of words either based on the SMT model (e.g. translation/language model) or based on SMT system output (e.g.
Introduction
However, it is not adequate to just use these features as discussed in (Shi and Zhou, 2005) because the information that they carry is either from the inner components of SMT systems or from system outputs.
SMT system is mentioned in 7 sentences in this paper.
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Abstract
Our inflection generation models are trained independently of the SMT system .
Abstract
We investigate different ways of combining the inflection prediction component with the SMT system by training the base MT system on fully inflected forms or on word stems.
Abstract
We applied our inflection generation models in translating English into two morphologically complex languages, Russian and Arabic, and show that our model improves the quality of SMT over both phrasal and syntax-based SMT systems according to BLEU and human judgements.
Conclusion and future work
We have shown that an independent model of morphology generation can be successfully integrated with an SMT system, making improvements in both phrasal and syntax-based MT.
Introduction
We also demonstrate that our independently trained models are portable, showing that they can improve both syntactic and phrasal SMT systems .
Related work
In recent work, Koehn and Hoang (2007) proposed a general framework for including morphological features in a phrase-based SMT system by factoring the representation of words into a vector of morphological features and allowing a phrase-based MT system to work on any of the factored representations, which is implemented in the Moses system.
SMT system is mentioned in 6 sentences in this paper.
Setiawan, Hendra and Zhou, Bowen and Xiang, Bing and Shen, Libin
Conclusion
We describe four versions of the model and implement an algorithm to integrate our proposed model into a syntax-based SMT system.
Decoding
Integrating the TNO Model into syntax-based SMT systems is nontrivial, especially with the MOS modeling.
Introduction
To show the effectiveness of our model, we integrate our TNO model into a state-of-the-art syntax-based SMT system, which uses synchronous context-free grammar (SCFG) rules to jointly model reordering and lexical translation.
Introduction
We show the efficacy of our proposal in a large-scale Chinese-to-English translation task where the introduction of our TNO model provides a significant gain over a state-of-the-art string-to-dependency SMT system (Shen et al., 2008) that we enhance with additional state-of-the-art features.
Introduction
Even though the experimental results carried out in this paper employ SCFG-based SMT systems, we would like to point out that our model is applicable to other systems including phrase-based SMT systems.
Model Decomposition and Variants
Each of these factors will act as an additional feature in the log-linear framework of our SMT system .
SMT system is mentioned in 6 sentences in this paper.
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro and Takamura, Hiroya and Okumura, Manabu
Abstract
Evaluations of Japanese-to-English translation on the NTCIR-9 data show that our induced Japanese POS tags for dependency trees improve the performance of a forest-to-string SMT system.
Experiment
We evaluated our bilingual infinite tree model for POS induction using an in-house developed syntax-based forest-to-string SMT system.
Experiment
Under the Moses phrase-based SMT system (Koehn et al., 2007) with the default settings, we achieved a 26.80% BLEU score.
Introduction
If we could discriminate POS tags for two cases, we might improve the performance of a Japanese-to-English SMT system.
Introduction
Experiments are carried out on the NTCIR-9 Japanese-to-English task using a binarized forest-to-string SMT system with dependency trees as its source side.
SMT system is mentioned in 5 sentences in this paper.
Xiong, Deyi and Zhang, Min
Decoding with Sense-Based Translation Model
Figure 2: Architecture of SMT system with the sense-based translation model.
Decoding with Sense-Based Translation Model
Figure 2 shows the architecture of the SMT system enhanced with the sense-based translation model.
Experiments
Our baseline system is a state-of-the-art SMT system which adapts Bracketing Transduction Grammars (Wu, 1997) to phrasal translation and equips itself with a maximum entropy based reordering model (Xiong et al., 2006).
Introduction
These glosses, used as the sense predictions of their WSD system, are integrated into a word-based SMT system either to substitute for translation candidates of their translation model or to postedit the output of their SMT system.
Introduction
We integrate the proposed sense-based translation model into a state-of-the-art SMT system and conduct experiments on Chinese-to-English translation using large-scale training data.
SMT system is mentioned in 5 sentences in this paper.
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Discussion
MERT and MBR decoding are popular techniques for incorporating the final evaluation metric into the development of SMT systems .
Introduction
These two techniques were originally developed for N-best lists of translation hypotheses and recently extended to translation lattices (Macherey et al., 2008; Tromble et al., 2008) generated by a phrase-based SMT system (Och and Ney, 2004).
Introduction
SMT systems based on synchronous context free grammars (SCFG) (Chiang, 2007; Zollmann and Venugopal, 2006; Galley et al., 2006) have recently been shown to give competitive performance relative to phrase-based SMT.
Translation Hypergraphs
A translation lattice compactly encodes a large number of hypotheses produced by a phrase-based SMT system.
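The compactness claim is easy to see: a lattice with a handful of edges can encode exponentially many hypotheses. A small sketch that counts the paths (hypotheses) encoded by a lattice given as an adjacency map; the node labels are hypothetical.

```python
def count_paths(edges, start, end):
    """Number of distinct hypotheses (paths) a lattice encodes, by
    memoized dynamic programming. `edges` maps node -> successor nodes."""
    memo = {}
    def paths_from(node):
        if node == end:
            return 1
        if node not in memo:
            memo[node] = sum(paths_from(n) for n in edges.get(node, []))
        return memo[node]
    return paths_from(start)
```

Two chained binary choices (seven edges) already encode four full hypotheses; each extra choice point doubles the count.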
Translation Hypergraphs
The corresponding representation for an SMT system based on SCFGs (e.g.
SMT system is mentioned in 5 sentences in this paper.
Sun, Jun and Zhang, Min and Tan, Chew Lim
Introduction
Further experiments in machine translation also suggest that the obtained subtree alignment can improve the performance of both phrase and syntax based SMT systems.
Substructure Spaces for BTKs
Most linguistically motivated syntax based SMT systems require an automatic parser to perform the rule induction.
Substructure Spaces for BTKs
We explore the effectiveness of subtree alignment for both phrase based and linguistically motivated syntax based SMT systems.
Substructure Spaces for BTKs
The findings suggest that with the modeling of non-syntactic phrases maintained, more emphasis on syntactic phrases can benefit both the phrase and syntax based SMT systems.
SMT system is mentioned in 4 sentences in this paper.
Xiong, Deyi and Zhang, Min and Li, Haizhou
Conclusions and Future Work
The two models have been integrated into a phrase-based SMT system and evaluated on Chinese-to-English translation tasks using large-scale training data.
Introduction
Unfortunately they are usually neither correctly translated nor translated at all in many SMT systems according to the error study by Wu and Fung (2009a).
Introduction
This suggests that conventional lexical and phrasal translation models adopted in those SMT systems are not sufficient to correctly translate predicates in source sentences.
Related Work
(2011) incorporate source language semantic role labels into a tree-to-string SMT system.
SMT system is mentioned in 4 sentences in this paper.
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Ensemble Decoding
As in the Hiero SMT system (Chiang, 2005), the cells which span up to a certain length (i.e.
Introduction
Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model.
Related Work 5.1 Domain Adaptation
In a similar approach, Koehn and Schroeder (2007) use a feature of the factored translation model framework in Moses SMT system (Koehn and Schroeder, 2007) to use multiple alternative decoding paths.
Related Work 5.1 Domain Adaptation
The Moses SMT system implements (Koehn and Schroeder,
SMT system is mentioned in 4 sentences in this paper.
Tu, Mei and Zhou, Yu and Zong, Chengqing
Conclusion
In the future, we will extend our methods to other translation models, such as the syntax-based model, to study how to further improve the performance of SMT systems .
Introduction
In Section 3, we present our models and show how to integrate the models into an SMT system .
Related Work
They added the labels assigned to connectives as an additional input to an SMT system, but their experimental results show that the improvements under the evaluation metric of BLEU were not significant.
Related Work
To the best of our knowledge, our work is the first attempt to exploit the source functional relationship to generate the target transitional expressions for grammatical cohesion, and we have successfully incorporated the proposed models into an SMT system with significant improvement of BLEU metrics.
SMT system is mentioned in 4 sentences in this paper.
He, Xiaodong and Deng, Li
Abstract
(2009) improved a syntactic SMT system by adding as many as ten thousand syntactic features, and used Margin Infused Relaxed Algorithm (MIRA) to train the feature weights.
Abstract
Our work is based on a phrase-based SMT system .
Abstract
In a phrase-based SMT system , the total number of parameters of phrase and lexicon translation models, which we aim to learn discriminatively, is very large (see Table 1).
SMT system is mentioned in 4 sentences in this paper.
Schamoni, Shigehiko and Hieber, Felix and Sokolov, Artem and Riezler, Stefan
Experiments
In order to simulate pass-through behavior of out-of-vocabulary terms in SMT systems, additional features accounting for source and target term identity were added to DK and BM models.
Introduction
This approach is advantageous if large amounts of in-domain sentence-parallel data are available to train SMT systems , but relevance rankings to train retrieval models are not.
Related Work
In a direct translation approach (DT), a state-of-the-art SMT system is used to produce a single best translation that is used as search query in the target language.
Related Work
For example, Google’s CLIR approach combines their state-of-the-art SMT system with their proprietary search engine (Chin et al., 2008).
SMT system is mentioned in 4 sentences in this paper.
He, Wei and Wu, Hua and Wang, Haifeng and Liu, Ting
Discussion
Removing redundant words: Mostly, translating redundant words may confuse the SMT system and would be unnecessary.
Introduction
The translation quality of the SMT system is highly related to the coverage of translation models.
Introduction
This problem is more serious for online SMT systems in real-world applications.
Introduction
the input sentences of the SMT system using automatically extracted paraphrase rules which can capture structures on sentence level in addition to paraphrases on the word or phrase level.
SMT system is mentioned in 4 sentences in this paper.
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Experiments
To optimize the SMT system, we tune the parameters on NIST MT06, and report results on three test sets: MT02, MT03 and MT05.
Introduction
In contrast, machine translation uses inherently multilingual data: an SMT system must translate a phrase or sentence from a source language to a different target language, so existing applications of topic models (Eidelman et al., 2012) are wilfully ignoring available information on the target side that could aid domain discovery.
Topic Models for Machine Translation
Cross-Domain SMT An SMT system is usually trained on documents with the same genre (e.g., sports, business) from a similar style (e.g., newswire, blog-posts).
Topic Models for Machine Translation
Domain Adaptation for SMT Training an SMT system using diverse data requires domain adaptation.
SMT system is mentioned in 4 sentences in this paper.
Cancedda, Nicola
Conclusions
Some SMT systems never get deployed because of legitimate and incompatible concerns of the prospective users and of the training data owners.
Conclusions
This same method can be easily extended to other resources used by SMT systems, and indeed even beyond SMT itself, whenever similar constraints on data access exist.
Experiments
We validated our simple implementation using a phrase table of 38,488,777 lines created with the Moses toolkit (Koehn et al., 2007) phrase-based SMT system, corresponding to 15,764,069 entries
Introduction
At the same time, the prospective user of the SMT system that could be derived from such TM might be subject to confidentiality constraints on the text stream needing translation, so that sending out text to translate to an SMT system deployed by the owner of the PT is not an option.
SMT system is mentioned in 4 sentences in this paper.
Guzmán, Francisco and Joty, Shafiq and Màrquez, Llu'is and Nakov, Preslav
Introduction
Although modern SMT systems have switched to a discriminative log-linear framework, which allows for additional sources as features, it is generally hard to incorporate dependencies beyond a small window of adjacent words, thus making it difficult to use linguistically-rich models.
Introduction
We believe that the semantic and pragmatic information captured in the form of DTs (i) can help develop discourse-aware SMT systems that produce coherent translations, and (ii) can yield better MT evaluation metrics.
Introduction
While in this work we focus on the latter, we think that the former is also within reach, and that SMT systems would benefit from preserving the coherence relations in the source language when generating target-language translations.
SMT system is mentioned in 3 sentences in this paper.
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro
Training
In the translation tasks, we used the Moses phrase-based SMT systems (Koehn et al., 2007).
Training
In addition, for a detailed comparison, we evaluated the SMT system where the IBM Model 4 was trained from all the training data (IBM4all).
Training
Consequently, the SMT system using RNNu+c trained from a small part of training data can achieve comparable performance to that using IBM 4 trained from all training data, which is shown in Table 3.
SMT system is mentioned in 3 sentences in this paper.
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
Machine Translation Experiments
We used MT08 and EgyDevV3 to tune SMT systems while we divided the remaining sets among classifier training data (5,562 sentences), dev (1,802 sentences) and blind test (1,804 sentences) sets to ensure each of these new sets has a variety of dialects and genres (weblog and newswire).
Machine Translation Experiments
This MSA-pivoting system uses Salloum and Habash (2013)’s DA-MSA MT system followed by an Arabic-English SMT system which is trained on both corpora augmented with the DA-English where the DA side is preprocessed with the same DA-MSA MT system then tokenized with MADA-ARZ.
Machine Translation Experiments
Test sets are similarly preprocessed before decoding with the SMT system.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Methods
We approach this problem by augmenting an SMT system built over target segments with features that reflect the desegmented target words.
Methods
In this section, we describe our various strategies for desegmenting the SMT system’s output space, along with the features that we add to take advantage of this desegmented view.
Related Work
Other approaches train an SMT system to predict lemmas instead of surface forms, and then inflect the SMT output as a postprocessing step (Minkov et al., 2007; Clifton and Sarkar, 2011; Fraser et al., 2012; El Kholy and Habash, 2012b).
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Yang and Mi, Haitao and Feng, Yang and Liu, Qun
Abstract
Current SMT systems usually decode with single translation models and cannot benefit from the strengths of other models in the decoding phase.
Background
Most SMT systems approximate the summation over all possible derivations by using the 1-best derivation for efficiency.
Background
By now, most current SMT systems, adopting either max-derivation decoding or max-translation decoding, have used only single models in the decoding phase.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
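The distinction drawn in the excerpts above can be illustrated concretely: several derivations may yield the same translation string, so the best single derivation and the best translation (summing over its derivations) can differ. A toy sketch with made-up probabilities:

```python
from collections import defaultdict

# Toy derivations: (translation string, derivation probability).
# Two distinct derivations happen to produce the same string "it rains".
derivations = [
    ("it rains", 0.30),
    ("it rains", 0.25),
    ("it is raining", 0.40),
]

# Max-derivation decoding: take the translation of the single best derivation.
max_deriv = max(derivations, key=lambda d: d[1])[0]

# Max-translation decoding: sum probabilities over derivations per string.
totals = defaultdict(float)
for text, p in derivations:
    totals[text] += p
max_trans = max(totals, key=totals.get)
```

Here max-derivation picks "it is raining" (0.40), while max-translation picks "it rains" (0.30 + 0.25 = 0.55), showing why the 1-best-derivation approximation can change the output.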
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Introduction
The contribution of this paper is to improve the prediction of case in our SMT system by implementing and combining two alternative routes to integrate subcategorization information from the syntax-semantic interface: (i) We regard the translation as a function of the source language input, and project the syntactic functions of the English nouns to their German translations in the
Translation pipeline
Table 2 illustrates the different steps of the inflection process: the markup (number and gender on nouns) in the stemmed output of the SMT system is part of the input to the respective feature prediction.
Using subcategorization information
In contrast, the SMT system often produces more isomorphic translations, which is helpful for annotating source-side features on the target language.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Experimental Evaluation
The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
Introduction
A phrase-based SMT system takes a source sentence and produces a translation by segmenting the sentence into phrases and translating those phrases separately (Koehn et al., 2003).
Introduction
The phrase translation table, which contains the bilingual phrase pairs and the corresponding translation probabilities, is one of the main components of an SMT system.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
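The segment-and-translate step described in the excerpts above can be sketched in a few lines. This is a deliberately simplified monotone greedy matcher over a hypothetical phrase table; real decoders search over segmentations and reorderings and score them with the log-linear model:

```python
# Hypothetical phrase table: source phrase tuple -> target string.
phrase_table = {
    ("der", "kleine"): "the little",
    ("hund",): "dog",
    ("der",): "the",
    ("kleine",): "small",
}

def translate(words, table, max_len=2):
    """Monotone, greedy phrase-based translation: longest match first."""
    out, i = [], 0
    while i < len(words):
        for n in range(max_len, 0, -1):      # prefer longer phrases
            src = tuple(words[i:i + n])
            if src in table:
                out.append(table[src])
                i += len(src)
                break
        else:
            out.append(words[i])             # pass unknown words through
            i += 1
    return " ".join(out)
```

For example, `translate(["der", "kleine", "hund"], phrase_table)` yields `"the little dog"`, using the two-word phrase pair rather than translating word by word.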
Kolachina, Prasanth and Cancedda, Nicola and Dymetman, Marc and Venkatapathy, Sriram
Inferring a learning curve from mostly monolingual data
To enable this work, we trained a multitude of instances of the same phrase-based SMT system on 30 distinct combinations of language pair and domain, each with fourteen distinct training sets of increasing size, and tested these instances on multiple in-domain datasets, generating 96 learning curves.
Introduction
This prediction, or more generally the prediction of the learning curve of an SMT system as a function of available in-domain parallel data, is the objective of this paper.
Introduction
An extensive study across six parametric function families, empirically establishing that a certain three-parameter power-law family is well suited for modeling learning curves for the Moses SMT system when the evaluation score is BLEU.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
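The three-parameter power-law family mentioned above can be sketched with a simple fit. The sample data and the particular functional form y = c - a * x^(-b) are illustrative assumptions (the paper's exact family and fitting procedure may differ); the fit here is a coarse grid search to keep the example dependency-free:

```python
import itertools

# Synthetic learning-curve data: training-set size vs. observed BLEU (made up).
sizes = [10, 20, 40, 80, 160]
bleu = [20.0, 24.0, 27.0, 29.2, 30.8]

def curve(x, a, b, c):
    """One plausible three-parameter power law: BLEU rises toward ceiling c."""
    return c - a * x ** (-b)

def sse(params):
    """Sum of squared errors of the curve against the observed points."""
    a, b, c = params
    return sum((curve(x, a, b, c) - y) ** 2 for x, y in zip(sizes, bleu))

# Coarse grid over (a, b, c); real work would use nonlinear least squares.
grid = itertools.product(
    [20, 40, 60, 80],      # a: size of the deficit at x = 1
    [0.3, 0.5, 0.7],       # b: rate of diminishing returns
    [31, 33, 35],          # c: asymptotic BLEU ceiling
)
a, b, c = min(grid, key=sse)
predicted = curve(320, a, b, c)   # extrapolate to a larger corpus
```

Once fitted, the curve answers the question the paper poses: how much BLEU to expect from additional in-domain parallel data.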
Xiao, Xinyan and Xiong, Deyi and Zhang, Min and Liu, Qun and Lin, Shouxun
Introduction
However, state-of-the-art SMT systems translate sentences using sequences of synchronous rules or phrases, instead of translating word by word.
Topic Similarity Model
The Hellinger distance is used to calculate the distance between distributions and is popular in topic modeling (Blei and Lafferty, 2007). By topic similarity, we aim to encourage or penalize the application of a rule for a given document according to their topic distributions, which then helps the SMT system make better translation decisions.
Topic Similarity Model
By incorporating the topic sensitivity model with the topic similarity model, we enable our SMT system to balance the selection of these two types of rules.
SMT system is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
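The Hellinger distance used in the topic similarity model above has a standard closed form; the toy topic distributions below are illustrative, not taken from the paper:

```python
import math

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    ) / math.sqrt(2)

rule_topics = [0.7, 0.2, 0.1]   # topic distribution of a translation rule
doc_topics = [0.6, 0.3, 0.1]    # topic distribution of the document

# A rule whose topics match the document's gets a similarity near 1,
# encouraging the decoder to apply it; a mismatched rule is penalized.
similarity = 1.0 - hellinger(rule_topics, doc_topics)
```

The distance is 0 for identical distributions and 1 for distributions with disjoint support, which makes `1 - distance` a natural bounded similarity feature for the log-linear model.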