Index of papers in Proc. ACL 2014 that mention
  • log-linear
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Argument Identification
where p_θ is a log-linear model normalized over the set R_y, with features described in Table 1.
Argument Identification
Inference Although our learning mechanism uses a local log-linear model, we perform inference globally on a per-frame basis by applying hard structural constraints.
Discussion
We believe that the WSABIE EMBEDDING model performs better than the LOG-LINEAR EMBEDDING baseline (that uses the same input representation) because the former setting allows examples with different labels and confusion sets to share information; this is due to the fact that all labels live in the same label space, and a single projection matrix is shared across the examples to map the input features to this space.
Discussion
Consequently, the WSABIE EMBEDDING model can share more information between different examples in the training data than the LOG-LINEAR EMBEDDING model.
Discussion
Since the LOG-LINEAR WORDS model always performs better than the LOG-LINEAR EMBEDDING model, we conclude that the primary benefit does not come from the input embedding representation.
Experiments
The baselines use a log-linear model that models the following probability at training time:
Experiments
For comparison with our model from §3, which we call WSABIE EMBEDDING, we implemented two baselines with the log-linear model.
Experiments
We call this baseline LOG-LINEAR WORDS.
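As a concrete illustration of the kind of baseline described here, the following is a minimal sketch of a log-linear classifier normalized over a per-example candidate set (for instance a frame confusion set or a role set); the function and feature names are illustrative, not the authors' code.

```python
import math

def log_linear_prob(theta, feats_fn, x, candidates, y):
    """p(y | x) under a log-linear model normalized over a per-example
    candidate set, i.e. p(y | x) proportional to exp(theta . g(x, y)).

    theta      -- dict mapping feature name -> weight
    feats_fn   -- feature function g(x, y) returning a dict of feature values
    candidates -- labels the model is normalized over for this example
    y          -- the label whose probability we want (must be in candidates)
    """
    def score(label):
        return sum(theta.get(f, 0.0) * v for f, v in feats_fn(x, label).items())

    scores = {label: score(label) for label in candidates}
    m = max(scores.values())                      # subtract max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[y] - m) / z
```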
log-linear is mentioned in 19 sentences in this paper.
Auli, Michael and Gao, Jianfeng
Expected BLEU Training
We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003).
Expected BLEU Training
We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t−1}, h_{t−1}):
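A schematic sketch of this integration, with placeholder names rather than the authors' implementation: the translation score is a weighted sum of feature functions, and the recurrent neural network language model enters as one more weighted term.

```python
def log_linear_score(hyp, src, weights, features, rnn_lm_logprob=None, rnn_weight=0.0):
    """Standard log-linear translation score: a weighted sum of feature
    functions h_m(e, f) (Och, 2003), optionally extended with the
    log-probability of an extra language model as one more feature."""
    score = sum(weights[name] * h(hyp, src) for name, h in features.items())
    if rnn_lm_logprob is not None:
        score += rnn_weight * rnn_lm_logprob(hyp)  # the added RNN LM feature
    return score
```

The weight on the added feature is then tuned together with the others, e.g. by MERT on n-best lists, as the Experiments excerpts below describe.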
Experiments
Log-linear weights are tuned with MERT.
Experiments
Log-linear weights are estimated on the 2009 data set comprising 2525 sentences.
Experiments
Either lattices or the unique 100-best output of the phrase-based decoder are used, and the log-linear weights are reestimated by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Introduction
The expected BLEU objective provides an efficient way of achieving this for machine translation (Rosti et al., 2010; Rosti et al., 2011; He and Deng, 2012; Gao and He, 2013; Gao et al., 2014) instead of solely relying on traditional optimizers such as Minimum Error Rate Training (MERT) that only adjust the weighting of entire component models within the log-linear framework of machine translation (§3).
log-linear is mentioned in 6 sentences in this paper.
Bamman, David and Underwood, Ted and Smith, Noah A.
Data
While the character clustering stage is essentially performing proper noun coreference resolution, approximately 74% of references to characters in books come in the form of pronouns. To resolve this more difficult class at the scale of an entire book, we train a log-linear discriminative classifier only on the task of resolving pronominal anaphora (i.e., ignoring generic noun phrases such as “the paint” or “the rascal”).
Data
To manage the degrees of freedom in the model described in §4, we perform dimensionality reduction on the vocabulary by learning word embeddings with a log-linear continuous skip-gram language model (Mikolov et al., 2013) on the entire collection of 15,099 books.
Experiments
A Basic persona model, which ablates author information but retains the same log-linear architecture; here, the η-vector is of size P + 1 and does not model author effects.
Model
In order to separate out the effects that a character’s persona has on the words that are associated with them (as opposed to other factors, such as time period, genre, or author), we adopt a hierarchical Bayesian approach in which the words we observe are generated conditional on a combination of different effects captured in a log-linear (or “maximum entropy”) distribution.
Model
This SAGE model can be understood as a log-linear distribution with three kinds of features (metadata, persona, and background).
Model
Notation:
P: Number of personas (hyperparameter)
D: Number of documents
C_d: Number of characters in document d
W_{d,c}: Number of (cluster, role) tuples for character c
m_d: Metadata for document d (ranges over M authors)
θ_d: Document d’s distribution over personas
p_{d,c}: Character c’s persona
j: An index for a ⟨r, w⟩ tuple in the data
w_j: Word cluster ID for tuple j
r_j: Role for tuple j ∈ {agent, patient, poss, pred}
η: Coefficients for the log-linear language model
μ, λ: Laplace mean and scale (for regularizing η)
α: Dirichlet concentration parameter
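A compact sketch of the log-linear (“maximum entropy”) word distribution this notation implies: the log-odds of each word cluster are a sum of a background vector and deviation vectors for the persona and the document metadata, followed by a softmax. Variable names are illustrative, not the paper's code.

```python
import numpy as np

def sage_word_distribution(background, persona_dev, metadata_dev):
    """Log-linear distribution over a vocabulary of word clusters.
    Each argument is a length-V vector of log-odds contributions
    (background, persona effect, metadata/author effect)."""
    logits = background + persona_dev + metadata_dev
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```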
log-linear is mentioned in 6 sentences in this paper.
Doyle, Gabriel and Bicknell, Klinton and Levy, Roger
Abstract
We present a method to jointly learn features and weights directly from distributional data in a log-linear framework.
Abstract
The model uses an Indian Buffet Process prior to learn the feature values used in the log-linear method, and is the first algorithm for learning phonological constraints without presupposing constraint structure.
Introduction
These constraint-driven decisions can be modeled with a log-linear system.
Introduction
We consider this question by examining the dominant framework in modern phonology, Optimality Theory (Prince and Smolensky, 1993, OT), implemented in a log-linear framework, MaxEnt OT (Goldwater and Johnson, 2003), with output forms’ probabilities based on a weighted sum of constraint violations.
Phonology and Optimality Theory 2.1 OT structure
In IBPOT, we use the log-linear EVAL developed by Goldwater and Johnson (2003) in their MaxEnt OT system.
The IBPOT Model
The weight vector w provides weights for both F and M. Probabilities of output forms are given by a log-linear function:
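A minimal sketch of such a log-linear EVAL in the MaxEnt OT style, assuming a matrix of constraint-violation counts for one input's candidate outputs and with the usual sign convention folded into the weights; the names are illustrative, not the paper's code.

```python
import numpy as np

def output_form_probs(violations, w):
    """violations -- (num_candidates, num_constraints) violation counts
                     for one input's candidate output forms
    w          -- constraint weight vector (F and M constraints alike)
    Returns the log-linear probability of each candidate output form."""
    harmony = violations @ w          # weighted sum of violations per candidate
    harmony -= harmony.max()          # numerical stability
    p = np.exp(harmony)
    return p / p.sum()
```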
log-linear is mentioned in 6 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012).
Introduction
We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
Topic Similarity Model with Neural Network
The similarity scores are integrated into the standard log-linear model for making translation decisions.
Topic Similarity Model with Neural Network
We incorporate the learned topic similarity scores into the standard log-linear framework for SMT.
Topic Similarity Model with Neural Network
In addition to traditional SMT features, we add new topic-related features into the standard log-linear framework.
log-linear is mentioned in 5 sentences in this paper.
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Introduction
Word embedding is used as the input to learn a translation confidence score, which is combined with commonly used features in the conventional log-linear model.
Our Model
The differences between our model and the conventional log-linear model include:
Phrase Pair Embedding
Instead of integrating the sparse features directly into the log-linear model, we use them as the input to learn a phrase pair embedding.
Phrase Pair Embedding
To train the neural network, we add the confidence scores to the conventional log-linear model as features.
Related Work
Together with other commonly used features, the translation confidence score is integrated into a conventional log-linear model.
log-linear is mentioned in 5 sentences in this paper.
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Experimental Setup
Determining h for each predicate yields a regular log-linear binary classification model.
Our Proposal: A Latent LC Approach
We address the task with a latent variable log-linear model, representing the LCs of the predicates.
Our Proposal: A Latent LC Approach
The introduction of latent variables into the log-linear model leads to a non-convex objective function.
Our Proposal: A Latent LC Approach
Once h has been fixed, the model collapses to a convex log-linear model.
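The shape of such a latent-variable log-linear model, in a hedged sketch with illustrative names: the latent assignment h is summed out, which is what makes the objective non-convex, while fixing h leaves an ordinary convex log-linear model.

```python
import math

def marginal_label_prob(theta, feats_fn, x, y, labels, latent_values):
    """p(y | x) = sum_h p(y, h | x), where p(y, h | x) is proportional
    to exp(theta . f(x, h, y))."""
    def score(label, h):
        return sum(theta.get(k, 0.0) * v for k, v in feats_fn(x, h, label).items())

    all_scores = [(label, score(label, h)) for label in labels for h in latent_values]
    m = max(s for _, s in all_scores)                 # numerical stability
    z = sum(math.exp(s - m) for _, s in all_scores)
    num = sum(math.exp(s - m) for label, s in all_scores if label == y)
    return num / z
```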
log-linear is mentioned in 4 sentences in this paper.
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
Following Och and Ney (2002), our model is framed as a log-linear model:
A semantic span can include one or more eus.
This score encourages the decoder to generate transitional words and phrases; it is utilized as an additional feature h_k(e_s, f_t) in the log-linear model.
A semantic span can include one or more eus.
In general, according to formula (3), the translation quality under the log-linear model is tightly related to the features chosen.
Conclusion
Our contributions can be summarized as: 1) the new translation rules are more discriminative and sensitive to cohesive information by converting the source string into a CSS-based tagged-flattened string; 2) the new additional features embedded in the log-linear model can encourage the decoder to produce transitional expressions.
log-linear is mentioned in 4 sentences in this paper.
Volkova, Svitlana and Coppersmith, Glen and Van Durme, Benjamin
Batch Models
Our goal is to assign each user of interest to a category based on f. Here we focus on a binary assignment into the categories Democratic D or Republican R. The log-linear
Batch Models
We use log-linear models over reasonable alternatives such as perceptron or SVM, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009) including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013).
Batch Models
The corresponding log-linear model is defined as:
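A generic binary log-linear (logistic) classifier of this form, written in illustrative notation rather than the paper's own, is p(R | f(v)) = exp(θ · f(v)) / (1 + exp(θ · f(v))), with p(D | f(v)) = 1 − p(R | f(v)).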
Experimental Setup
We experiment with log-linear models defined in Eq.
log-linear is mentioned in 4 sentences in this paper.
Andrews, Nicholas and Eisner, Jason and Dredze, Mark
Detailed generative story
This is a conditional log-linear model parameterized by φ, where φ_k ∼ N(0, σ_k²).
Detailed generative story
When the current symbol is the special end-of-string symbol #, the only allowed edits are insertion and substitution. We define the edit probability using a locally normalized log-linear model:
Experiments
We leave other hyperparameters fixed: 16 latent topics, and Gaussian priors N(0, 1) on all log-linear parameters.
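A Gaussian prior N(0, σ²) on a log-linear parameter contributes −φ_k²/(2σ²) to the log-posterior, so at the MAP estimate the N(0, 1) priors here act as an L2 penalty on the log-linear weights with coefficient 1/2.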
log-linear is mentioned in 3 sentences in this paper.
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Method
Mikolov et al. (2013a) propose two log-linear models, namely the Skip-gram and CBOW models, to efficiently induce word embeddings.
Method
The Skip-gram model adopts log-linear classifiers to predict context words given the current word w(t) as input.
Method
Then, log-linear classifiers are employed, taking the embedding as input and predicting w(t)’s context words within a certain range, e.g.
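A minimal sketch of the log-linear classifier inside Skip-gram: the probability of a context word given the current word is a softmax over dot products of input and output embeddings (real implementations replace the full softmax with hierarchical softmax or negative sampling). Matrix names are illustrative.

```python
import numpy as np

def skipgram_context_prob(word_idx, context_idx, W_in, W_out):
    """p(context word | current word) under the Skip-gram log-linear model.
    W_in  -- (V, d) input (word) embedding matrix
    W_out -- (V, d) output (context) embedding matrix"""
    logits = W_out @ W_in[word_idx]   # score of every vocabulary word as context
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return (p / p.sum())[context_idx]
```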
log-linear is mentioned in 3 sentences in this paper.
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Related Work
Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding.
Unified Linguistic Reordering Models
For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq.
Unified Linguistic Reordering Models
For the semantic reordering models, we also add two new features into the log-linear translation model.
log-linear is mentioned in 3 sentences in this paper.
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Related Work
(2013) went beyond the log-linear model for SMT and proposed a novel additive neural network-based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
log-linear is mentioned in 3 sentences in this paper.
Pasupat, Panupong and Liang, Percy
Abstract
Our approach defines a log-linear model over latent extraction predicates, which select lists of entities from the web page.
Approach
Given a query x and a web page w, we define a log-linear distribution over all extraction predicates z ∈ Z(w) as
Approach
To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z.
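In the usual notation (a generic form, not copied from the paper), such a distribution is p_θ(z | x, w) = exp(θ · φ(x, w, z)) / Σ_{z′ ∈ Z(w)} exp(θ · φ(x, w, z′)).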
log-linear is mentioned in 3 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
The decoder’s log-linear model includes a standard feature set.
Experimental Setup
The decoder’s log-linear model is tuned with MERT (Och, 2003).
Experimental Setup
Both the decoder’s log-linear model and the re-ranking models are trained on the same development set.
log-linear is mentioned in 3 sentences in this paper.