Abstract | Most statistical machine translation (SMT) systems are modeled using a log-linear framework. |
Abstract | Although the log-linear model has achieved success in SMT, it still suffers from some limitations: (1) its features are required to be linear with respect to the model; (2) features cannot be further interpreted to reach their full potential.
Abstract | To address these limitations, we propose additive neural networks, allowing SMT to go beyond the log-linear translation model.
Introduction | Recently, great progress has been achieved in SMT, especially since Och and Ney (2002) proposed the log-linear model: almost all the state-of-the-art SMT systems are based on the log-linear model. |
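Introduction | For reference, the log-linear model of Och and Ney (2002) scores a candidate translation $e$ of a source sentence $f$ with $M$ weighted feature functions; the standard form (a well-known formulation, not a quotation from any one of the papers excerpted here) is:

$$p(e \mid f) = \frac{\exp\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}{\sum_{e'} \exp\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)}$$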
Introduction | However successful the log-linear model has been in SMT, it still has some shortcomings.
Introduction | Compared with the log-linear model, the additive neural network has more expressive power and can interpret and represent features more deeply through hidden units.
A Joint Model for Two Formalisms | As is standard in such settings, the distribution will be log-linear in a set of features of these parses. |
A Joint Model for Two Formalisms | Instead, we assume that the distribution over $y_{\mathrm{CFG}}$ is a log-linear model with parameters $\theta_{\mathrm{CFG}}$ (i.e., a sub-vector of $\theta$), namely:
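A Joint Model for Two Formalisms | A minimal sketch of the elided formula, assuming the standard log-linear form with a feature vector $f(y_{\mathrm{CFG}})$ (a reconstruction, not necessarily the authors' exact notation):

$$p(y_{\mathrm{CFG}}; \theta_{\mathrm{CFG}}) = \frac{\exp\left(\theta_{\mathrm{CFG}}^{\top} f(y_{\mathrm{CFG}})\right)}{\sum_{y'} \exp\left(\theta_{\mathrm{CFG}}^{\top} f(y')\right)}$$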
Evaluation Setup | In this setup, the model reduces to a normal log-linear model for the target formalism. |
Experiment and Analysis | It is not surprising that Cahill’s model outperforms our log-linear model, because it relies heavily on handcrafted rules optimized for the dataset.
Features | Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree. |
Markov Topic Regression - MTR | A log-linear model with parameters $\lambda_k \in \mathbb{R}^M$ is
Markov Topic Regression - MTR | trained to predict $\beta_{ik}$ for each $w_i$ of a tag $s_k$: $\beta_{ik} = \exp(f(w_i; \lambda_k))$ (2), where the log-linear function $f$ is: $f(w_i; \lambda_k) = \lambda_k^{\top} x_i$ (3).
Markov Topic Regression - MTR | labeled data, based on the log-linear model in Eq. (2).
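Markov Topic Regression - MTR | A minimal sketch of Eqs. (2)-(3) as reconstructed above, assuming each tag's log-linear model scores a word feature vector with its own weights (all names here are hypothetical):

```python
import numpy as np

def predict_beta(x_i, lam_k):
    """Eqs. (2)-(3): beta_ik = exp(f(w_i; lambda_k)) with f a linear
    function of the word's features, so beta is always positive."""
    return np.exp(x_i @ lam_k)

# toy example: 4 word features, one tag's weight vector
x_i = np.array([1.0, 0.5, 0.0, 2.0])
lam_k = np.array([0.2, -0.1, 0.4, 0.05])
print(predict_beta(x_i, lam_k))  # exp(0.25)
```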
Semi-Supervised Semantic Labeling | The $x$ is used as the input matrix of the $k$th log-linear model (corresponding to the $k$th semantic tag (topic)) to infer the $\beta$ hyper-parameter of MTR in Eq. (2).
Inference | We then report the corresponding chains $C(a)$ as the system output. For learning, the gradient takes the standard form of the gradient of a log-linear model, a difference of expected feature counts under the gold annotation and under no annotation.
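Inference | Concretely, for a log-linear model $p_\theta(y) \propto \exp(\theta^{\top} f(y))$, the log-likelihood gradient described above is:

$$\nabla_\theta \mathcal{L} = \mathbb{E}_{p_\theta(y \mid \text{gold})}\left[f(y)\right] - \mathbb{E}_{p_\theta(y)}\left[f(y)\right]$$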
Introduction | We use a log-linear model that can be expressed as a factor graph. |
Models | Each unary factor $A_i$ has a log-linear form with features examining mention $i$, its selected antecedent $a_i$, and the document context $x$.
Models | The final log-linear model is given by the following formula: |
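Models | Assuming the factor-graph structure described above, the elided formula would standardly take the form (a reconstruction, not necessarily the authors' exact notation):

$$p(a \mid x) \propto \prod_i A_i(a_i) = \exp\left(\sum_i \theta^{\top} f(i, a_i, x)\right)$$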
Comparative Study | (4) The orientation probability is modeled in a log-linear framework using a set of $N$ feature functions $h_n(\cdot)$, $n = 1, \ldots, N$.
Comparative Study | Finally, in the log-linear framework (Equation 2), a new jump model is added that uses the reordered source sentence to calculate the cost.
Tagging-style Reordering Model | The number of source words with inconsistent labels serves as a penalty, which is then added to the log-linear framework as a new feature.
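Tagging-style Reordering Model | A minimal sketch of this penalty feature, assuming we are given the tagger's predicted labels and the labels implied by a translation hypothesis (function and variable names are hypothetical):

```python
def inconsistency_penalty(predicted_labels, hypothesis_labels):
    """Count source words whose reordering label in the hypothesis
    disagrees with the tagger's prediction; this count enters the
    log-linear framework as one additional feature."""
    return sum(p != h for p, h in zip(predicted_labels, hypothesis_labels))

# toy example: 5 source words with binary reordering labels
print(inconsistency_penalty([0, 1, 1, 0, 0], [0, 1, 0, 0, 1]))  # -> 2
```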
Translation System Overview | We model $\Pr(e_1^I \mid f_1^J)$ directly using a log-linear combination of several models (Och and Ney, 2002).
Translation Model Architecture | Our translation model is embedded in a log-linear model as is common for SMT, and treated as a single translation model in this log-linear combination. |
Translation Model Architecture | Log-linear weights are optimized using MERT (Och and Ney, 2003). |
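Translation Model Architecture | A toy illustration of the weight-tuning loop, not Och's actual line-search MERT: pick the log-linear weight vector that maximizes a translation metric over fixed n-best lists (all names are hypothetical, and the metric scores stand in for BLEU):

```python
import itertools
import numpy as np

def tune_weights(nbest_feats, nbest_metric, grid=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Grid-search log-linear weights: rescore each n-best list under every
    candidate weight vector, pick its 1-best hypothesis, and keep the weights
    whose 1-best choices achieve the highest total metric score.

    nbest_feats:  one (n, M) array of feature values per sentence
    nbest_metric: one length-n array of metric scores per sentence
    """
    num_feats = nbest_feats[0].shape[1]
    best_w, best_score = None, -np.inf
    for w in itertools.product(grid, repeat=num_feats):
        w = np.asarray(w)
        score = sum(m[np.argmax(f @ w)] for f, m in zip(nbest_feats, nbest_metric))
        if score > best_score:
            best_w, best_score = w, score
    return best_w

# toy example: 2 sentences, 3 hypotheses each, 2 features
feats = [np.array([[1.0, 0.2], [0.5, 0.9], [0.1, 0.1]]),
         np.array([[0.3, 0.3], [0.8, 0.1], [0.2, 0.7]])]
metric = [np.array([0.4, 0.9, 0.1]), np.array([0.5, 0.2, 0.8])]
print(tune_weights(feats, metric))
```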
Translation Model Architecture | Future work could involve merging our translation model framework with the online adaptation of other models, or the log-linear weights. |
Learning | The noise model that $\phi_c$ parameterizes is a local log-linear model, so we follow the approach of Berg-Kirkpatrick et al. (2010).
Model | [equation: $\operatorname{logistic}\big(\sum_{k'=1}^{K} \cdots\big)$, a logistic function of a weighted sum of terms] The fact that the parameterization is log-linear will ensure that, during the unsupervised learning process, updating the shape parameters $\phi_c$ is simple and feasible.
Results and Analysis | Following Berg-Kirkpatrick et al. (2010), we use a regularization term in the optimization of the log-linear model parameters $\phi_c$ during the M-step.
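Results and Analysis | A sketch of the regularized M-step objective described here, assuming an L2 penalty as in Berg-Kirkpatrick et al. (2010) and the logistic parameterization above (all names are hypothetical):

```python
import numpy as np

def m_step_objective(phi_c, feats, on_counts, off_counts, l2=1.0):
    """Expected log-likelihood of the local log-linear (logistic) model
    under the E-step posteriors, minus an L2 regularizer on phi_c.

    feats:      (n, d) feature matrix
    on_counts:  expected counts of the 'on' outcome per row
    off_counts: expected counts of the 'off' outcome per row
    """
    p = 1.0 / (1.0 + np.exp(-(feats @ phi_c)))  # logistic parameterization
    ell = on_counts * np.log(p) + off_counts * np.log(1.0 - p)
    return ell.sum() - l2 * np.dot(phi_c, phi_c)
```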
Experiments | One is the log-linear combination of TMs trained on each subcorpus (Koehn and Schroeder, 2007), with the weights of each model tuned under minimum error rate training using MIRA.
Introduction | Research on mixture models has considered both linear and log-linear mixtures. |
Introduction | (Koehn and Schroeder, 2007), instead, opted for combining the sub-models directly in the SMT log-linear framework. |
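Introduction | A minimal sketch contrasting the two combination styles: a linear mixture averages sub-model probabilities, while the log-linear combination of Koehn and Schroeder (2007) treats each sub-model's log-probability as its own weighted feature (the weights here are illustrative):

```python
import math

def linear_mixture(probs, coeffs):
    """Linear mixture: weighted average of sub-model probabilities."""
    return sum(c * p for c, p in zip(coeffs, probs))

def loglinear_combination(probs, weights):
    """Log-linear combination: weighted sum of log-probabilities, i.e.
    each sub-model becomes one more feature of the SMT model."""
    return sum(w * math.log(p) for w, p in zip(weights, probs))

p_sub = [0.20, 0.05]  # e.g. in-domain vs. out-of-domain TM estimates
print(linear_mixture(p_sub, [0.7, 0.3]))         # 0.155
print(loglinear_combination(p_sub, [0.7, 0.3]))  # ~ -2.025
```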
Corpus Data and Baseline SMT | Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric. |
Incremental Topic-Based Adaptation | We add this feature to the log-linear translation model with its own weight, which is tuned with MERT. |
Introduction | Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model. |
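Introduction | A sketch of such a similarity feature, assuming the topic distributions are compared with cosine similarity (the excerpt does not specify the exact similarity function):

```python
import numpy as np

def topic_similarity_feature(phrase_topics, conv_topics):
    """Single feature added to the log-linear model: similarity between the
    topic distribution of the training conversation a phrase pair came from
    and that of the current conversation."""
    num = float(np.dot(phrase_topics, conv_topics))
    den = float(np.linalg.norm(phrase_topics) * np.linalg.norm(conv_topics))
    return num / den if den > 0.0 else 0.0

# toy example: 3-topic distributions
print(topic_similarity_feature(np.array([0.7, 0.2, 0.1]),
                               np.array([0.6, 0.3, 0.1])))
```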
Building Dialog Trees from Instructions | Given a single instruction $i$ with category $a_u$, we use a log-linear model to represent the distribution.
Understanding Initial Queries | We employ a log-linear model and try to maximize the initial dialog state distribution over the space of all nodes in a dialog network.
Understanding Query Refinements | Dialog State Update Model: We use a log-linear model to maximize a dialog state distribution over the space of all nodes in a dialog network:
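Understanding Query Refinements | Both state models above share the standard log-linear form: a softmax over all nodes $s$ of the dialog network given the query $q$ (a reconstruction of the elided formula, assuming node-query features $f(s, q)$):

$$p(s \mid q; \theta) = \frac{\exp\left(\theta^{\top} f(s, q)\right)}{\sum_{s' \in \mathcal{S}} \exp\left(\theta^{\top} f(s', q)\right)}$$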
Experimental Setup | But instead of using just the PMI scores of bilingual NE pairs, as in our work, they employed a feature-rich log-linear model to capture bilingual correlations. |
Experimental Setup | Parameters in their log-linear model require training with bilingually annotated data, which is not readily available. |
Related Work | (2010a) presented a supervised learning method for performing joint parsing and word alignment using log-linear models over parse trees and an ITG model over alignment. |