Argument Identification | where pθ is a log-linear model normalized over the set Ry, with features described in Table 1.
Argument Identification | Inference Although our learning mechanism uses a local log-linear model, we perform inference globally on a per-frame basis by applying hard structural constraints.
Discussion | However, since the input representation is shared across all frames, every other training example from all the lexical units affects the optimal estimate, since they all modify the joint parameter matrix M. By contrast, in the log-linear models each label has its own set of parameters, and they interact only via the normalization constant. |
Discussion | They also use a log-linear model, but they incorporate a latent variable that uses WordNet (Fellbaum, 1998) to get lexical-semantic relationships and smooths over frames for ambiguous lexical units.
Discussion | Another difference is that when training the log-linear model, they normalize over all frames, while we normalize over the allowed frames for the current lexical unit.
Experiments | The baselines use a log-linear model that models the following probability at training time: |
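The excerpt cuts off before the equation it introduces; the generic form such a log-linear baseline takes (the symbol names below are assumptions, not taken from the excerpt) is:

```latex
p(y \mid x; \theta) \;=\;
\frac{\exp\{\theta \cdot f(x, y)\}}
     {\sum_{y' \in \mathcal{Y}} \exp\{\theta \cdot f(x, y')\}}
```

where f(x, y) is a feature vector over input x and label y, θ is the learned weight vector, and the sum normalizes over the label set 𝒴.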
Experiments | For comparison with our model from §3, which we call WSABIE EMBEDDING, we implemented two baselines with the log-linear model.
Experiments | So the second baseline has the same input representation as WSABIE EMBEDDING but uses a log-linear model instead of WSABIE. |
Introduction | Word embeddings are used as the input to learn a translation confidence score, which is combined with commonly used features in the conventional log-linear model.
Our Model | The differences between our model and the conventional log-linear model include:
Phrase Pair Embedding | Instead of integrating the sparse features directly into the log-linear model, we use them as the input to learn a phrase pair embedding.
Phrase Pair Embedding | To train the neural network, we add the confidence scores to the conventional log-linear model as features. |
Related Work | Together with other commonly used features, the translation confidence score is integrated into a conventional log-linear model.
A semantic span can include one or more eus. | Following Och and Ney (2002), our model is framed as a log-linear model: |
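The excerpt ends at the colon; the Och and Ney (2002) log-linear framework it refers to combines M feature functions h_m with weights λ_m (notation below follows that paper's standard form, not this excerpt):

```latex
P(e \mid f) \;=\;
\frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
     {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)}
```

At decoding time the denominator is constant, so the decoder simply selects the translation maximizing the weighted feature sum Σ_m λ_m h_m(e, f).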
A semantic span can include one or more eus. | courage the decoder to generate transitional words and phrases; the score is utilized as an additional feature h_k(e_s, f_t) in the log-linear model.
A semantic span can include one or more eus. | In general, according to formula (3), the translation quality of the log-linear model is tightly related to the chosen features.
Conclusion | Our contributions can be summarized as: 1) the new translation rules are more discriminative and sensitive to cohesive information by converting the source string into a CSS-based tagged-flattened string; 2) the new additional features embedded in the log-linear model can encourage the decoder to produce transitional expressions.
Our Proposal: A Latent LC Approach | We address the task with a latent variable log-linear model, representing the LCs of the predicates.
Our Proposal: A Latent LC Approach | The introduction of latent variables into the log-linear model leads to a non-convex objective function. |
Our Proposal: A Latent LC Approach | Once h has been fixed, the model collapses to a convex log-linear model . |
Experiments | We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012). |
Introduction | We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task. |
Topic Similarity Model with Neural Network | The similarity scores are integrated into the standard log-linear model for making translation decisions. |
Related Work | (2013) went beyond the log-linear model for SMT and proposed a novel additive neural network based translation model, which overcomes some of the shortcomings of the log-linear model: linearity and the lack of deep interpretation and representation in features.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
Experimental Setup | The decoder’s log-linear model includes a standard feature set. |
Experimental Setup | The decoder’s log-linear model is tuned with MERT (Och, 2003). |
Experimental Setup | Both the decoder’s log-linear model and the re-ranking models are trained on the same development set. |
Batch Models | We use log-linear models over reasonable alternatives such as perceptron or SVM, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009) including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013).
Batch Models | The corresponding log-linear model is defined as: |
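The excerpt ends before the definition it announces. A minimal runnable sketch of a generic log-linear (softmax-normalized) model of the kind these excerpts describe; the weight matrix and feature vector below are illustrative assumptions, not values from any of the papers:

```python
import numpy as np

def log_linear_probs(features, weights):
    """Conditional label distribution of a log-linear model:
    p(y | x) = exp(w_y . f(x)) / sum_y' exp(w_y' . f(x))."""
    scores = weights @ features            # one unnormalized score per label
    scores = scores - scores.max()         # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()   # normalize over the label set

# Toy example: 3 labels, 2 features (numbers are illustrative only).
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
f = np.array([2.0, 1.0])
p = log_linear_probs(f, W)
```

Subtracting the maximum score before exponentiating leaves the distribution unchanged (it cancels in the normalization) while avoiding overflow for large feature scores.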
Experimental Setup | We experiment with log-linear models defined in Eq. |