Argument Identification | where p_θ is a log-linear model normalized over the set R_y, with features described in Table 1.
Argument Identification | Inference: Although our learning mechanism uses a local log-linear model, we perform inference globally on a per-frame basis by applying hard structural constraints.
Discussion | We believe the WSABIE EMBEDDING model performs better than the LOG-LINEAR EMBEDDING baseline (which uses the same input representation) because the former allows examples with different labels and confusion sets to share information: all labels live in the same label space, and a single projection matrix, shared across examples, maps the input features into that space.
Discussion | Consequently, the WSABIE EMBEDDING model can share more information between different examples in the training data than the LOG-LINEAR EMBEDDING model. |
Discussion | Since the LOG-LINEAR WORDS model always performs better than the LOG-LINEAR EMBEDDING model, we conclude that the primary benefit does not come from the input embedding representation.
Experiments | The baselines use a log-linear model that models the following probability at training time: |
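A plausible form for that probability, normalized over the confusion set of candidate labels (the symbols ψ, g, and Y_x below are generic names, not the paper's notation):

$$ p(y \mid x) = \frac{\exp\{\psi \cdot g(x, y)\}}{\sum_{y' \in \mathcal{Y}_x} \exp\{\psi \cdot g(x, y')\}} $$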
Experiments | For comparison with our model from §3, which we call WSABIE EMBEDDING, we implemented two baselines with the log-linear model. |
Experiments | We call this baseline LOG-LINEAR WORDS. |
Expected BLEU Training | We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003). |
Expected BLEU Training | We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}):
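In the usual Och (2003) formulation this amounts to appending the summed log RNN score as one more weighted feature; a sketch, where the feature index M+1 and the decomposition are our labeling:

$$ p(e \mid f) \propto \exp\Big( \sum_{m=1}^{M} \lambda_m h_m(e, f) + \lambda_{M+1} \sum_{t} \log s_\theta(w_t) \Big) $$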
Experiments | Log-linear weights are tuned with MERT. |
Experiments | Log-linear weights are estimated on the 2009 data set comprising 2525 sentences. |
Experiments | either lattices or the unique 100-best output of the phrase-based decoder, and reestimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Introduction | The expected BLEU objective provides an efficient way of achieving this for machine translation (Rosti et al., 2010; Rosti et al., 2011; He and Deng, 2012; Gao and He, 2013; Gao et al., 2014) instead of solely relying on traditional optimizers such as Minimum Error Rate Training (MERT) that only adjust the weighting of entire component models within the log-linear framework of machine translation (§3). |
Data | While the character clustering stage is essentially performing proper noun coreference resolution, approximately 74% of references to characters in books come in the form of pronouns. To resolve this more difficult class at the scale of an entire book, we train a log-linear discriminative classifier only on the task of resolving pronominal anaphora (i.e., ignoring generic noun phrases such as the paint or the rascal).
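As an illustration of what such a log-linear pronoun classifier might look like, here is a minimal sketch using scikit-learn's logistic regression (a maximum-entropy classifier); the feature names and the candidate-pair setup are our assumptions, not the paper's:

```python
# Minimal sketch: a log-linear (logistic regression) classifier scoring
# candidate antecedents of a pronoun. Features are illustrative assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# (features of a pronoun/candidate-antecedent pair, is-correct-antecedent label)
train_pairs = [
    ({"same_gender": 1, "dist_in_words": 3, "antecedent_is_subject": 1}, 1),
    ({"same_gender": 0, "dist_in_words": 40, "antecedent_is_subject": 0}, 0),
]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train_pairs])
y = [label for _, label in train_pairs]

clf = LogisticRegression().fit(X, y)

# At prediction time, score every candidate antecedent for a pronoun
# and pick the highest-probability one.
probs = clf.predict_proba(
    vec.transform([{"same_gender": 1, "dist_in_words": 5,
                    "antecedent_is_subject": 0}])
)[:, 1]
```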
Data | To manage the degrees of freedom in the model described in §4, we perform dimensionality reduction on the vocabulary by learning word embeddings with a log-linear continuous skip-gram language model (Mikolov et al., 2013) on the entire collection of 15,099 books.
Experiments | A Basic persona model, which ablates author information but retains the same log-linear architecture; here, the η vector is of size P + 1 and does not model author effects.
Model | In order to separate out the effects that a character’s persona has on the words that are associated with them (as opposed to other factors, such as time period, genre, or author), we adopt a hierarchical Bayesian approach in which the words we observe are generated conditional on a combination of different effects captured in a log-linear (or “maximum entropy”) distribution. |
Model | This SAGE model can be understood as a log-linear distribution with three kinds of features (metadata, persona, and background).
Model | Notation: P = number of personas (hyperparameter); D = number of documents; C_d = number of characters in document d; W_{d,c} = number of (cluster, role) tuples for character c; m_d = metadata for document d (ranges over M authors); θ_d = document d's distribution over personas; p_{d,c} = character c's persona; j = an index for a ⟨r, w⟩ tuple in the data; w_j = word cluster ID for tuple j; r_j = role for tuple j ∈ {agent, patient, poss, pred}; η = coefficients for the log-linear language model; μ, λ = Laplace mean and scale (for regularizing η); α = Dirichlet concentration parameter.
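A sketch of the resulting word distribution, consistent with the three feature kinds named above (the additive decomposition and the background symbol b are our assumptions):

$$ p(w \mid r, p_{d,c}, m_d) \propto \exp\big\{ b_w + \eta^{(\text{meta})}_{m_d,\, w} + \eta^{(\text{persona})}_{p_{d,c},\, r,\, w} \big\} $$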
Abstract | We present a method to jointly learn features and weights directly from distributional data in a log-linear framework. |
Abstract | The model uses an Indian Buffet Process prior to learn the feature values used in the log-linear method, and is the first algorithm for learning phonological constraints without presupposing constraint structure. |
Introduction | These constraint-driven decisions can be modeled with a log-linear system. |
Introduction | We consider this question by examining the dominant framework in modern phonology, Optimality Theory (Prince and Smolensky, 1993, OT), implemented in a log-linear framework, MaxEnt OT (Goldwater and Johnson, 2003), with output forms' probabilities based on a weighted sum of constraint violations.
Phonology and Optimality Theory 2.1 OT structure | In IBPOT, we use the log-linear EVAL developed by Goldwater and Johnson (2003) in their MaxEnt OT system.
The IBPOT Model | The weight vector w provides weights for both F and M. Probabilities of output forms are given by a log-linear function:
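In MaxEnt OT style, with f_i(y) counting candidate y's violations of constraint i (the negative sign convention is one common choice, assumed here):

$$ p(y \mid x) = \frac{\exp\{-\sum_i w_i f_i(y)\}}{\sum_{y' \in \mathcal{Y}(x)} \exp\{-\sum_i w_i f_i(y')\}} $$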
Experiments | We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012). |
Introduction | We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task. |
Topic Similarity Model with Neural Network | The similarity scores are integrated into the standard log-linear model for making translation decisions. |
Topic Similarity Model with Neural Network | We incorporate the learned topic similarity scores into the standard log-linear framework for SMT. |
Topic Similarity Model with Neural Network | In addition to traditional SMT features, we add new topic-related features into the standard log-linear framework. |
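Concretely, the decoder's score gains one weighted term per topic feature; a sketch with a single similarity feature (the names sim, z_f, z_e are illustrative):

$$ \text{score}(e, f) = \sum_{m} \lambda_m h_m(e, f) + \lambda_{\text{sim}} \, \text{sim}(\mathbf{z}_f, \mathbf{z}_e) $$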
Introduction | Word embeddings are used as input to learn a translation confidence score, which is combined with commonly used features in the conventional log-linear model.
Our Model | The differences between our model and the conventional log-linear model include:
Phrase Pair Embedding | Instead of integrating the sparse features directly into the log-linear model, we use them as the input to learn a phrase pair embedding. |
Phrase Pair Embedding | To train the neural network, we add the confidence scores to the conventional log-linear model as features. |
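As an illustration of how such a confidence score might be produced before being added as a log-linear feature, here is a minimal sketch; the architecture (one tanh hidden layer, sigmoid output, layer sizes) is our assumption, not the paper's network:

```python
import numpy as np

def confidence_score(phrase_pair_embedding, W1, b1, w2, b2):
    """Feed-forward scorer: one tanh hidden layer, sigmoid output."""
    h = np.tanh(W1 @ phrase_pair_embedding + b1)   # hidden representation
    return 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # confidence in (0, 1)

# Toy usage: a 20-dim phrase-pair embedding, 10 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=20)
W1, b1 = rng.normal(size=(10, 20)), np.zeros(10)
w2, b2 = rng.normal(size=10), 0.0
score = confidence_score(x, W1, b1, w2, b2)  # used as an extra log-linear feature
```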
Related Work | Together with other commonly used features, the translation confidence score is integrated into a conventional log-linear model. |
Experimental Setup | Determining h for each predicate yields a regular log-linear binary classification model. |
Our Proposal: A Latent LC Approach | We address the task with a latent variable log-linear model, representing the LCs of the predicates. |
Our Proposal: A Latent LC Approach | The introduction of latent variables into the log-linear model leads to a non-convex objective function. |
Our Proposal: A Latent LC Approach | Once h has been fixed, the model collapses to a convex log-linear model. |
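Written out, the latent-variable model marginalizes over h, and the log-likelihood is non-convex because of the sum inside the log; fixing h removes that sum and restores a convex, standard log-linear objective (the notation θ, f is generic):

$$ p(y \mid x) = \frac{\sum_{h} \exp\{\theta \cdot f(x, h, y)\}}{\sum_{y'} \sum_{h'} \exp\{\theta \cdot f(x, h', y')\}} $$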
A semantic span can include one or more eus. | Following Och and Ney (2002), our model is framed as a log-linear model: |
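The standard Och and Ney form, with feature functions h_k and weights λ_k:

$$ p(e \mid f) = \frac{\exp\big( \sum_{k=1}^{K} \lambda_k h_k(e, f) \big)}{\sum_{e'} \exp\big( \sum_{k=1}^{K} \lambda_k h_k(e', f) \big)} $$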
A semantic span can include one or more eus. | encourage the decoder to generate transitional words and phrases; the score is utilized as an additional feature h_k(e_s, f_t) in the log-linear model.
A semantic span can include one or more eus. | In general, according to formula (3), the translation quality of the log-linear model depends closely on the chosen features.
Conclusion | Our contributions can be summarized as follows: 1) the new translation rules are more discriminative and sensitive to cohesive information, obtained by converting the source string into a CSS-based tagged-flattened string; 2) the new features embedded in the log-linear model encourage the decoder to produce transitional expressions.
Batch Models | Our goal is to assign each user of interest v_i to a category based on f. Here we focus on a binary assignment into the categories Democratic (D) or Republican (R), using a log-linear model.
Batch Models | We use log-linear models over reasonable alternatives such as perceptron or SVM, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009), including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013).
Batch Models | The corresponding log-linear model is defined as: |
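A plausible instantiation of that definition over the two categories (θ and f are generic names, not the paper's notation):

$$ p(y \mid v_i) = \frac{\exp\{\theta \cdot f(v_i, y)\}}{\sum_{y' \in \{D, R\}} \exp\{\theta \cdot f(v_i, y')\}} $$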
Experimental Setup | We experiment with log-linear models defined in Eq. |
Detailed generative story | This is a conditional log-linear model parameterized by φ, where φ_k ∼ N(0, σ²).
Detailed generative story | When the input symbol is the special end-of-string symbol #, the only allowed edits are insertion and substitution. We define the edit probability using a locally normalized log-linear model:
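A locally normalized model of this kind typically takes the following shape, where c is the conditioning context and E(c) the set of edits allowed in that context (notation ours):

$$ p_\phi(e \mid c) = \frac{\exp\{\phi \cdot f(e, c)\}}{\sum_{e' \in \mathcal{E}(c)} \exp\{\phi \cdot f(e', c)\}} $$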
Experiments | We leave other hyperparameters fixed: 16 latent topics, and Gaussian priors N(0, 1) on all log-linear parameters.
Method | Mikolov et al. (2013a) propose two log-linear models, namely the Skip-gram and CBOW models, to efficiently induce word embeddings.
Method | The Skip-gram model adopts log-linear classifiers to predict context words given the current word w(t) as input. |
Method | Then, log-linear classifiers are employed, taking the embedding as input to predict w(t)'s context words within a certain range, e.g.
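For concreteness, here is a minimal sketch of training Skip-gram embeddings with the gensim library; the library choice, toy corpus, and hyperparameters are illustrative assumptions:

```python
# Minimal sketch: Skip-gram word embeddings via gensim.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "chase", "cats"],
]  # in practice: the full tokenized corpus

model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context range around w(t)
    sg=1,             # sg=1 selects Skip-gram; sg=0 would select CBOW
    min_count=1,
)
vector = model.wv["cat"]  # the learned embedding for "cat"
```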
Related Work | Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding. |
Unified Linguistic Reordering Models | For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq. |
Unified Linguistic Reordering Models | For the semantic reordering models, we also add two new features into the log-linear translation model. |
Related Work | (2013) went beyond the log-linear model for SMT and proposed a novel additive neural network translation model, which overcomes some of the shortcomings of the log-linear model: linearity and the lack of deep interpretation and representation in its features.
Semi-Supervised Deep Auto-encoder Features Learning for SMT | Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model. |
Semi-Supervised Deep Auto-encoder Features Learning for SMT | To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
Abstract | Our approach defines a log-linear model over latent extraction predicates, which select lists of entities from the web page. |
Approach | Given a query x and a web page w, we define a log-linear distribution over all extraction predicates z ∈ Z(w) as
Approach | To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z.
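Combining the two definitions gives the distribution directly:

$$ p_\theta(z \mid x, w) = \frac{\exp\{\theta \cdot \phi(x, w, z)\}}{\sum_{z' \in Z(w)} \exp\{\theta \cdot \phi(x, w, z')\}} $$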
Experimental Setup | The decoder’s log-linear model includes a standard feature set. |
Experimental Setup | The decoder’s log-linear model is tuned with MERT (Och, 2003). |
Experimental Setup | Both the decoder’s log-linear model and the re-ranking models are trained on the same development set. |