Baselines | 2.1 Log-Linear Mixture |
Baselines | Log-linear translation model (TM) mixtures are of the form: p(ē|f̄) ∝ ∏_m p_m(ē|f̄)^λ_m |
Baselines | where m ranges over IN and OUT, p_m(ē|f̄) is an estimate from a component phrase table, and each λ_m is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003). |
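The weighted product above can be sketched in a few lines of Python. The probabilities and weights below are invented for illustration, not values from any of the cited systems:

```python
import math

def log_linear_mixture(component_probs, weights):
    """Combine component phrase-table estimates p_m(e|f) with weights lam_m:
    score(e|f) = prod_m p_m(e|f) ** lam_m (an unnormalized weighted product)."""
    return math.prod(p ** lam for p, lam in zip(component_probs, weights))

# IN and OUT phrase tables give different estimates for the same phrase pair
p_in, p_out = 0.4, 0.1
score = log_linear_mixture([p_in, p_out], [0.7, 0.3])
```

Note the result is only proportional to a probability; in a real system the λ_m would come from MERT rather than being fixed by hand.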
Ensemble Decoding | In the typical log-linear model for SMT, the posterior |
Ensemble Decoding | log-linear mixture). |
Ensemble Decoding | Since log-linear model scores are not normalized to form probability distributions, the scores that different models assign to each phrase-pair may not be on the same scale. |
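One common remedy, sketched below under the assumption that each model scores the same set of candidate phrase pairs, is to softmax-normalize each model's scores into a probability distribution before combining them. The scores here are made-up numbers for illustration:

```python
import math

def normalize_scores(scores):
    """Softmax: map a model's unnormalized scores for a candidate set onto a
    probability distribution, so different models become comparable."""
    m = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Two models score the same three candidate phrase pairs on different scales
model_a = [2.0, 1.0, 0.5]
model_b = [200.0, 100.0, 50.0]
pa = normalize_scores(model_a)
pb = normalize_scores(model_b)
```

After normalization both models' outputs sum to one and can be mixed directly, although the softmax sharpens model_b's distribution far more because its raw score gaps are larger.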
Experiments & Results 4.1 Experimental Setup | It was filtered to retain the top 20 translations for each source phrase using the TM part of the current log-linear model. |
Introduction | In addition to the basic approach of concatenation of in-domain and out-of-domain data, we also trained a log-linear mixture model (Foster and Kuhn, 2007) |
Related Work 5.1 Domain Adaptation | Two well-known examples of such methods are linear mixtures and log-linear mixtures (Koehn and Schroeder, 2007; Civera and Juan, 2007; Foster and Kuhn, 2007), which were used as baselines and discussed in Section 2. |
Abstract | We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. |
Introduction | Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability. |
Lexical-phonetic model | (2008), we parameterize these distributions with a log-linear model. |
Lexical-phonetic model | In modern phonetics and phonology, these generalizations are usually expressed as Optimality Theory constraints; log-linear models such as ours have previously been used to implement stochas- |
Conclusion and Future Work | The consensus statistics are integrated into the conventional log-linear model as features. |
Experiments and Results | Instead of using graph-based consensus confidence as features in the log-linear model, we perform structured label propagation (Struct-LP) to re-rank the n-best list directly, and the similarity measures for source sentences and translation candidates are symmetrical sentence level BLEU (equation (10)). |
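A symmetrical sentence-level similarity of the kind described can be sketched as the average of smoothed BLEU computed in both directions. This is a generic add-one-smoothed sentence BLEU, not necessarily the exact form of the paper's equation (10):

```python
import math
from collections import Counter

def sent_bleu(hyp, ref, n=4):
    """Smoothed sentence-level BLEU with add-one smoothing on n-gram precisions."""
    hyp, ref = hyp.split(), ref.split()
    log_prec = 0.0
    for k in range(1, n + 1):
        h = Counter(tuple(hyp[i:i + k]) for i in range(len(hyp) - k + 1))
        r = Counter(tuple(ref[i:i + k]) for i in range(len(ref) - k + 1))
        match = sum(min(c, r[g]) for g, c in h.items())   # clipped n-gram matches
        total = sum(h.values())
        log_prec += math.log((match + 1) / (total + 1)) / n
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))  # brevity penalty
    return bp * math.exp(log_prec)

def sym_bleu(a, b):
    """Symmetrical similarity: average of sentence BLEU in both directions."""
    return 0.5 * (sent_bleu(a, b) + sent_bleu(b, a))
```

Symmetrizing matters because BLEU is direction-dependent: hypothesis and reference play different roles, but a graph similarity measure should not.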
Features and Training | Therefore, we can alternatively update graph-based consensus features and feature weights in the log-linear model. |
Graph-based Translation Consensus | Our MT system with graph-based translation consensus adopts the conventional log-linear model. |
Abstract | Och (2003) proposed using a log-linear model to incorporate multiple features for translation, and introduced a minimum error rate training (MERT) method to train the feature weights to optimize a desirable translation metric. |
Abstract | While the log-linear model itself is discriminative, the phrase and lexicon translation features, which are among the most important components of SMT, are derived from either generative models or heuristics (Koehn et al., 2003; Brown et al., 1993). |
Abstract | In that work, multiple features, most of which are derived from generative models, are incorporated into a log-linear model, and their relative weights are tuned discriminatively on a small tuning set. |
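The idea of tuning feature weights against a metric on a small tuning set can be illustrated with a deliberately simplified stand-in: exhaustive search over a tiny weight grid. Och's actual MERT uses an efficient exact line search per feature; the n-best lists, feature values, and accuracy metric below are all invented for the sketch:

```python
import itertools

def tune_weights(nbest, metric, grid):
    """Toy stand-in for MERT: search a small weight grid and keep the weights
    whose 1-best selections maximize the tuning metric on the tuning set."""
    n_feats = len(nbest[0][0][1])
    best_w, best_score = None, float("-inf")
    for w in itertools.product(grid, repeat=n_feats):
        # pick the highest-scoring candidate per sentence under weights w
        one_best = [max(cands,
                        key=lambda c: sum(wi * fi for wi, fi in zip(w, c[1])))[0]
                    for cands in nbest]
        score = metric(one_best)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Each tuning sentence has an n-best list of (translation, feature vector) pairs
nbest = [
    [("the cat", (0.9, 0.2)), ("a cat", (0.5, 0.9))],
    [("it runs", (0.1, 0.9)), ("he runs", (0.6, 0.6))],
]
refs = ["a cat", "he runs"]
accuracy = lambda hyps: sum(h == r for h, r in zip(hyps, refs))
w, score = tune_weights(nbest, accuracy, grid=(0.0, 0.5, 1.0))
```

In real MERT the metric would be corpus BLEU rather than exact-match accuracy, and the search exploits the piecewise-linear structure of the model score instead of a grid.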
Data and task | (2004) to train a log-linear model for projection. |
Data and task | Compared to the joint log-linear model of Burkett et al. |
Introduction | (2010a), our model can incorporate both monolingual and bilingual features in a log-linear framework. |
Learning and Inference | These features are incorporated as a log-linear distribution. |
Learning and Inference | For each word in a mention, we introduced 12 binary features f for our featurized log-linear distribution (§3.1.2). |
Model | This uses a log-linear distribution with partition function Z. |
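A log-linear distribution with an explicit partition function Z can be sketched as follows; the candidate labels, feature vectors, and weights are hypothetical, chosen only to show the normalization:

```python
import math

def log_linear_prob(features, weights, candidates):
    """p(y) = exp(w . f(y)) / Z, where the partition function Z sums
    exp(w . f(y')) over all candidate outcomes y'."""
    def dot(f):
        return sum(w * x for w, x in zip(weights, f))
    z = sum(math.exp(dot(features[c])) for c in candidates)   # partition function
    return {c: math.exp(dot(features[c])) / z for c in candidates}

# Hypothetical binary feature vectors for two candidate labels
feats = {"A": (1.0, 0.0), "B": (0.0, 1.0)}
probs = log_linear_prob(feats, weights=(2.0, 1.0), candidates=["A", "B"])
```

Computing Z requires enumerating (or summing over) the full candidate set, which is exactly why unnormalized log-linear scores, as in SMT decoding, are often used instead when that set is large.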