Index of papers in Proc. ACL that mention
  • log-linear model
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Abstract
We propose a log-linear model to compute the paraphrase likelihood of two patterns and exploit feature functions based on maximum likelihood estimation (MLE) and lexical weighting (LW).
Conclusion
We use a log-linear model to compute the paraphrase likelihood and exploit feature functions based on MLE and LW.
Conclusion
In addition, the log-linear model with the proposed feature functions significantly outperforms the conventional models.
Experiments
4.1 Evaluation of the Log-linear Model
Experiments
As previously mentioned, in the log-linear model of this paper, we use both MLE-based and LW-based feature functions.
Experiments
In this section, we evaluate the log-linear model (LL-Model) and compare it with the MLE-based model (MLE-Model) presented by Bannard and Callison-Burch (2005).
Introduction
parsing and English-foreign language word alignment, (2) aligned patterns induction, which produces English patterns along with the aligned pivot patterns in the foreign language, (3) paraphrase patterns extraction, in which paraphrase patterns are extracted based on a log-linear model.
Introduction
Secondly, we propose a log-linear model for computing the paraphrase likelihood.
Introduction
Besides, the log-linear model is more effective than the conventional model presented in (Bannard and Callison-Burch, 2005).
Proposed Method
In order to exploit more and richer information to estimate the paraphrase likelihood, we propose a log-linear model:
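The formula itself did not survive extraction; a generic sketch of such a pattern-level log-linear score (with paraphrase patterns e1 and e2, feature functions h_i, and weights λ_i — notation assumed here, not the paper's) is
\[ \mathrm{score}(e_1, e_2) = \sum_{i=1}^{N} \lambda_i \, h_i(e_1, e_2), \]
where the h_i would include the MLE-based and LW-based feature functions described above.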
Proposed Method
In this paper, 4 feature functions are used in our log-linear model, which include:
log-linear model is mentioned in 12 sentences in this paper.
Liu, Lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
Although the log-linear model achieves success in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential.
Introduction
Recently, great progress has been achieved in SMT, especially since Och and Ney (2002) proposed the log-linear model: almost all the state-of-the-art SMT systems are based on the log-linear model.
Introduction
Regardless of how successful the log-linear model is in SMT, it still has some shortcomings.
Introduction
Compared with the log-linear model, it has more powerful expressive abilities and can deeply interpret and represent features with hidden units in neural networks.
log-linear model is mentioned in 27 sentences in this paper.
He, Wei and Wang, Haifeng and Guo, Yuqing and Liu, Ting
Abstract
This paper describes log-linear models for a general-purpose sentence realizer based on dependency structures.
Abstract
Then the best linearizations compatible with the relative order are selected by log-linear models.
Abstract
The log-linear models incorporate three types of feature functions, including dependency relations, surface words and headwords.
Introduction
The other is a log-linear model with different syntactic and semantic features (Velldal and Oepen, 2005; Nakanishi et al., 2005; Cahill et al., 2007).
Introduction
Compared with the n-gram model, the log-linear model is more powerful in that it is easy to integrate a variety of features and to tune feature weights to maximize the probability.
Introduction
This paper presents a general-purpose realizer based on log-linear models for directly linearizing dependency relations given dependency structures.
Log-linear Models
We use log-linear models for selecting the sequence with the highest probability from all the possible linearizations of a subtree.
Log-linear Models
4.1 The Log-linear Model
Log-linear Models
Log-linear models employ a set of feature functions to describe properties of the data, and a set of learned weights to determine the contribution of each feature.
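As an illustration of this description, a minimal sketch in Python (feature names, weights, and candidates are invented for the example; this is not the realizer's code):

def loglinear_score(features, weights):
    # Weighted sum of feature values: sum_i w_i * f_i
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_linearization(candidates, weights):
    # The softmax normalizer is constant across candidates of the same subtree,
    # so the argmax of the raw log-linear score picks the highest-probability one.
    return max(candidates, key=lambda feats: loglinear_score(feats, weights))

# Hypothetical feature vectors for two linearizations of the same subtree.
candidates = [
    {"subj_before_head": 1.0, "surface_bigram=the_cat": 1.0},
    {"subj_after_head": 1.0, "surface_bigram=cat_the": 1.0},
]
weights = {"subj_before_head": 1.2, "subj_after_head": -0.7,
           "surface_bigram=the_cat": 0.5, "surface_bigram=cat_the": -0.3}
print(best_linearization(candidates, weights))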
log-linear model is mentioned in 21 sentences in this paper.
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Abstract
Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
Introduction
Log-linear models (a.k.a. maximum entropy models) are one of the most widely used probabilistic models in the field of natural language processing (NLP).
Introduction
Log-linear models have a major advantage over other
Introduction
Kazama and Tsujii (2003) describe a method for training an L1-regularized log-linear model with a bound-constrained version of the BFGS algorithm (Nocedal, 1980).
Log-Linear Models
In this section, we briefly describe log-linear models used in NLP tasks and L1 regularization.
Log-Linear Models
A log-linear model defines the following probabilistic distribution over possible structure y for input x:
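The distribution itself is missing from the excerpt; the standard form such a model takes (up to the paper's exact notation, with feature functions f_i and weights w_i) is
\[ p(y \mid x) = \frac{\exp\big(\sum_i w_i f_i(x, y)\big)}{\sum_{y'} \exp\big(\sum_i w_i f_i(x, y')\big)}. \]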
Log-Linear Models
The weights of the features in a log-linear model are optimized in such a way that they maximize the regularized conditional log-likelihood of the training data:
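The objective referred to, written in a standard form (not necessarily the paper's notation), is the L1-regularized conditional log-likelihood
\[ \mathcal{L}(w) = \sum_{j} \log p(y_j \mid x_j; w) \; - \; C \sum_i |w_i|, \]
where C controls the strength of the L1 penalty that drives many weights to exactly zero, yielding the compact models mentioned in the abstract.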
log-linear model is mentioned in 12 sentences in this paper.
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Argument Identification
where p_θ is a log-linear model normalized over the set R_y, with features described in Table 1.
Argument Identification
Inference Although our learning mechanism uses a local log-linear model, we perform inference globally on a per-frame basis by applying hard structural constraints.
Discussion
However, since the input representation is shared across all frames, every other training example from all the lexical units affects the optimal estimate, since they all modify the joint parameter matrix M. By contrast, in the log-linear models each label has its own set of parameters, and they interact only via the normalization constant.
Discussion
They also use a log-linear model, but they incorporate a latent variable that uses WordNet (Fellbaum, 1998) to get lexical-semantic relationships and smooths over frames for ambiguous lexical units.
Discussion
Another difference is that when training the log-linear model , they normalize over all frames, while we normalize over the allowed frames for the current lexical unit.
Experiments
The baselines use a log-linear model that models the following probability at training time:
Experiments
For comparison with our model from §3, which we call WSABIE EMBEDDING, we implemented two baselines with the log-linear model.
Experiments
So the second baseline has the same input representation as WSABIE EMBEDDING but uses a log-linear model instead of WSABIE.
log-linear model is mentioned in 8 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
Note that under the log-linear model, applying a threshold for filtering is equivalent to comparing the “likelihood” ratio.
Conclusions
In this paper, the problem of extracting phrase translation is formulated as an information retrieval process implemented with a log-linear model aiming for a balanced precision and recall.
Discussions
The generic phrase training algorithm follows an information retrieval perspective as in (Venugopal et al., 2003) but aims to improve both precision and recall with the trainable log-linear model.
Discussions
Under the general framework, one can put as many features as possible together under the log-linear model to evaluate the quality of a phrase and a phrase pair.
Experimental Results
Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh (Koehn et al., 2003).
Experimental Results
Like other log-linear model based decoders, active features in our translation engine include translation models in two directions, lexicon weights in two directions, language model, lexicalized distortion models, sentence length penalty and other heuristics.
Experimental Results
Since the translation engine implements a log-linear model , the discriminative training of feature weights in the decoder should be embedded in the whole end-to-end system jointly with the discriminative phrase table training process.
log-linear model is mentioned in 7 sentences in this paper.
Cahill, Aoife and Riester, Arndt
Abstract
We build a log-linear model that incorporates these asymmetries for ranking German string realisations from input LFG F-structures.
Generation Ranking
(2007), a log-linear model based on the Lexical Functional Grammar (LFG) Framework (Kaplan and Bresnan, 1982).
Generation Ranking
(2007) describe a log-linear model that uses linguistically motivated features and improves over a simple trigram language model baseline.
Generation Ranking
We take this log-linear model as our starting point.
Generation Ranking Experiments
We tune the parameters of the log-linear model on a small development set of 63 sentences, and carry out the final evaluation on 261 unseen sentences.
Generation Ranking Experiments
We evaluate the string chosen by the log-linear model against the original treebank string in terms of exact match and BLEU score (Papineni et al.,
log-linear model is mentioned in 6 sentences in this paper.
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Discriminative Synchronous Transduction
3.1 A global log-linear model
Discriminative Synchronous Transduction
Our findings echo those observed for latent variable log-linear models successfully used in monolingual parsing (Clark and Curran, 2007; Petrov et al., 2007).
Discriminative Synchronous Transduction
This method has been demonstrated to be effective for (non-convex) log-linear models with latent variables (Clark and Curran, 2004; Petrov et al., 2007).
Introduction
First, we develop a log-linear model of translation which is globally trained on a significant number of parallel sentences.
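The excerpts suggest a latent-variable log-linear model; a generic sketch of that form (assuming latent derivations d and a feature vector φ — an assumption, not the paper's exact notation) is
\[ p(e \mid s) = \sum_{d} p(e, d \mid s), \qquad p(e, d \mid s) \propto \exp\big(\theta \cdot \phi(s, d, e)\big), \]
so training marginalizes over the derivations that yield each translation.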
log-linear model is mentioned in 5 sentences in this paper.
Das, Dipanjan and Smith, Noah A.
Product of Experts
For these bigram or trigram overlap features, a similar log-linear model has to be normalized with a partition function, which considers the (unnormalized) scores of all possible target sentences, given the source sentence.
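Concretely, for a source sentence s the partition function in question sums unnormalized log-linear scores over every possible target sentence t (generic notation, not the paper's):
\[ Z(s) = \sum_{t} \exp\big(\theta \cdot f(s, t)\big), \]
so exact normalization must consider all candidate target sentences.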
QG for Paraphrase Modeling
We use log-linear models three times: for the configuration, the lexical semantics class, and the word.
QG for Paraphrase Modeling
(2007), we employ a 14-feature log-linear model over all logically possible combinations of the 14 WordNet relations (Miller, 1995). Similarly to Eq.
QG for Paraphrase Modeling
14, we normalize this log-linear model based on the set of relations that are nonempty in WordNet for the given word.
log-linear model is mentioned in 5 sentences in this paper.
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Introduction
Word embedding is used as the input to learn a translation confidence score, which is combined with commonly used features in the conventional log-linear model.
Our Model
The difference between our model and the conventional log-linear model includes:
Phrase Pair Embedding
Instead of integrating the sparse features directly into the log-linear model, we use them as the input to learn a phrase pair embedding.
Phrase Pair Embedding
To train the neural network, we add the confidence scores to the conventional log-linear model as features.
Related Work
Together with other commonly used features, the translation confidence score is integrated into a conventional log-linear model .
log-linear model is mentioned in 5 sentences in this paper.
Zhang, Yuan and Barzilay, Regina and Globerson, Amir
A Joint Model for Two Formalisms
Instead, we assume that the distribution over y_CFG is a log-linear model with parameters θ_CFG (i.e., a sub-vector of θ), namely:
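A sketch of the form this takes (with features f over the CFG derivation; the notation θ_CFG is assumed here, not the paper's) is
\[ p(y_{\mathrm{CFG}} \mid x) \propto \exp\big(\theta_{\mathrm{CFG}} \cdot f(x, y_{\mathrm{CFG}})\big). \]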
Evaluation Setup
In this setup, the model reduces to a normal log-linear model for the target formalism.
Experiment and Analysis
It’s not surprising that Cahill’s model outperforms our log-linear model because it relies heavily on handcrafted rules optimized for the dataset.
Features
Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree.
log-linear model is mentioned in 4 sentences in this paper.
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Baselines
where m ranges over IN and OUT, p_m(ē|f) is an estimate from a component phrase table, and each λ_m is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
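In such a top-level log-linear combination the component estimates enter as log-valued features, i.e. (a sketch, not the paper's exact formulation)
\[ p(\bar{e} \mid f) \propto \prod_{m \in \{\mathrm{IN},\,\mathrm{OUT}\}} p_m(\bar{e} \mid f)^{\lambda_m}, \]
so each λ_m scales the log of its component phrase-table estimate.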
Ensemble Decoding
In the typical log-linear model SMT, the posterior
Ensemble Decoding
Since in log-linear models, the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase-pair may not be in the same scale.
Experiments & Results 4.1 Experimental Setup
It was filtered to retain the top 20 translations for each source phrase using the TM part of the current log-linear model.
log-linear model is mentioned in 4 sentences in this paper.
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Conclusion and Future Work
The consensus statistics are integrated into the conventional log-linear model as features.
Experiments and Results
Instead of using graph-based consensus confidence as features in the log-linear model , we perform structured label propagation (Struct-LP) to re-rank the n-best list directly, and the similarity measures for source sentences and translation candidates are symmetrical sentence level BLEU (equation (10)).
Features and Training
Therefore, we can alternatively update graph-based consensus features and feature weights in the log-linear model .
Graph-based Translation Consensus
Our MT system with graph-based translation consensus adopts the conventional log-linear model .
log-linear model is mentioned in 4 sentences in this paper.
Elsner, Micha and Goldwater, Sharon and Eisenstein, Jacob
Abstract
We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features.
Introduction
Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability.
Lexical-phonetic model
(2008), we parameterize these distributions with a log-linear model.
Lexical-phonetic model
In modern phonetics and phonology, these generalizations are usually expressed as Optimality Theory constraints; log-linear models such as ours have previously been used to implement stochastic
log-linear model is mentioned in 4 sentences in this paper.
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
Following Och and Ney (2002), our model is framed as a log-linear model:
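The formula is missing from the excerpt; in the standard Och and Ney (2002) framing, candidates are scored by a weighted sum of feature functions and the decoder selects the argmax:
\[ \hat{e} = \arg\max_{e} \sum_{k=1}^{K} \lambda_k h_k(e, f), \]
which is where the additional transitional-expression feature mentioned in the next excerpt would enter as one more h_k.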
A semantic span can include one or more eus.
encourage the decoder to generate transitional words and phrases; the score is utilized as an additional feature h_k(e_s, f_t) in the log-linear model.
A semantic span can include one or more eus.
In general, according to formula (3), the translation quality based on the log-linear model is tightly related to the chosen features.
Conclusion
Our contributions can be summarized as: 1) the new translation rules are more discriminative and sensitive to cohesive information by converting the source string into a CSS-based tagged-flattened string; 2) the new additional features embedded in the log-linear model can encourage the decoder to produce transitional expressions.
log-linear model is mentioned in 4 sentences in this paper.
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Related Work
(2013) went beyond the log-linear model for SMT and proposed a novel additive neural network-based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
log-linear model is mentioned in 3 sentences in this paper.
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
The decoder’s log-linear model includes a standard feature set.
Experimental Setup
The decoder’s log-linear model is tuned with MERT (Och, 2003).
Experimental Setup
Both the decoder’s log-linear model and the re-ranking models are trained on the same development set.
log-linear model is mentioned in 3 sentences in this paper.
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012).
Introduction
We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
Topic Similarity Model with Neural Network
The similarity scores are integrated into the standard log-linear model for making translation decisions.
log-linear model is mentioned in 3 sentences in this paper.
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Our Proposal: A Latent LC Approach
We address the task with a latent variable log-linear model, representing the LCs of the predicates.
Our Proposal: A Latent LC Approach
The introduction of latent variables into the log-linear model leads to a non-convex objective function.
Our Proposal: A Latent LC Approach
Once h has been fixed, the model collapses to a convex log-linear model .
log-linear model is mentioned in 3 sentences in this paper.
Volkova, Svitlana and Coppersmith, Glen and Van Durme, Benjamin
Batch Models
We use log-linear models over reasonable alternatives such as perceptron or SVM, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009), including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013).
Batch Models
The corresponding log-linear model is defined as:
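The definition itself is not in the excerpt; for binary user attributes (an assumption here), a typical instantiation is the logistic form
\[ p(y = 1 \mid x) = \frac{1}{1 + \exp\big(-\theta \cdot f(x)\big)}, \]
with f(x) the feature vector and θ the learned weights.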
Experimental Setup
We experiment with log-linear models defined in Eq.
log-linear model is mentioned in 3 sentences in this paper.
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Experimental Setup
But instead of using just the PMI scores of bilingual NE pairs, as in our work, they employed a feature-rich log-linear model to capture bilingual correlations.
Experimental Setup
Parameters in their log-linear model require training with bilingually annotated data, which is not readily available.
Related Work
(2010a) presented a supervised learning method for performing joint parsing and word alignment using log-linear models over parse trees and an ITG model over alignment.
log-linear model is mentioned in 3 sentences in this paper.
Volkova, Svitlana and Choudhury, Pallavi and Quirk, Chris and Dolan, Bill and Zettlemoyer, Luke
Building Dialog Trees from Instructions
Given a single instruction i with its category, we use a log-linear model to represent the distribution
Understanding Initial Queries
We employ a log-linear model and try to maximize initial dialog state distribution over the space of all nodes in a dialog network:
Understanding Query Refinements
Dialog State Update Model We use a log-linear model to maximize a dialog state distribution over the space of all nodes in a dialog network:
log-linear model is mentioned in 3 sentences in this paper.
Durrett, Greg and Hall, David and Klein, Dan
Inference
We then report the corresponding chains as the system output. For learning, the gradient takes the standard form of the gradient of a log-linear model, a difference of expected feature counts under the gold annotation and under no annotation.
Introduction
We use a log-linear model that can be expressed as a factor graph.
Models
The final log-linear model is given by the following formula:
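The formula did not survive extraction; since the model is described as a factor graph, a generic sketch of such a factored log-linear model (assumed notation, with factors c and factor-specific features f_c) is
\[ p(y \mid x) \propto \prod_{c} \exp\big(\theta \cdot f_c(y_c, x)\big). \]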
log-linear model is mentioned in 3 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Markov Topic Regression - MTR
log-linear models with parameters λ_i ∈ R^M, is
Markov Topic Regression - MTR
labeled data, 712?, based on the log-linear model in Eq.
Semi-Supervised Semantic Labeling
The x is used as the input matrix of the k-th log-linear model (corresponding to the k-th semantic tag (topic)) to infer the β hyper-parameter of MTR in Eq.
log-linear model is mentioned in 3 sentences in this paper.
He, Xiaodong and Deng, Li
Abstract
Och (2003) proposed using a log-linear model to incorporate multiple features for translation, and proposed a minimum error rate training (MERT) method to train the feature weights to optimize a desirable translation metric.
Abstract
While the log-linear model itself is discriminative, the phrase and lexicon translation features, which are among the most important components of SMT, are derived from either generative models or heuristics (Koehn et al., 2003, Brown et al., 1993).
Abstract
In that work, multiple features, most of which are derived from generative models, are incorporated into a log-linear model, and their relative weights are tuned discriminatively on a small tuning set.
log-linear model is mentioned in 3 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
where Pr(e|f) is the probability that e is the translation of the given source string f. To model the posterior probability Pr(e|f), most of the state-of-the-art SMT systems utilize the log-linear model proposed by Och and Ney (2002), as follows,
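The formula is cut off in the excerpt; the standard log-linear posterior it refers to has the form
\[ \Pr(e \mid f) = \frac{\exp\big(\sum_{m=1}^{M} \lambda_m h_m(f, e)\big)}{\sum_{e'} \exp\big(\sum_{m=1}^{M} \lambda_m h_m(f, e')\big)}, \]
matching the features h_m(f, e) and weights λ_m defined in the next excerpt.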
Background
In this paper, u denotes a log-linear model that has M fixed features {h_1(f,e), ..., h_M(f,e)}, λ = {λ_1, ..., λ_M} denotes the M parameters of u, and u(λ) denotes an SMT system based on u with parameters λ.
Background
In this paper, we use the term training set to emphasize the training of the log-linear model.
log-linear model is mentioned in 3 sentences in this paper.
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
The requirement for a log-linear model aims to provide a natural way to integrate the new co-decoding features.
Collaborative Decoding
Referring to the log-linear model formulation, the translation posterior P(e'|d_k) can be computed as:
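The equation is missing from the excerpt; a sketch of how such a posterior is commonly obtained (an assumption about the exact form) is to normalize decoder d_k's log-linear score s_k over its own hypothesis set H_k:
\[ P(e' \mid d_k) = \frac{\exp\big(s_k(e')\big)}{\sum_{e'' \in H_k} \exp\big(s_k(e'')\big)}. \]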
Conclusion
In this paper, we present a framework of collaborative decoding, in which multiple MT decoders are coordinated to search for better translations by re-ranking partial hypotheses using augmented log-linear models with translation consensus-based features.
log-linear model is mentioned in 3 sentences in this paper.
DeNero, John and Chiang, David and Knight, Kevin
Consensus Decoding Algorithms
The distribution P(e|f) can be induced from a translation system’s features and weights by exponentiating with base b to form a log-linear model:
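The formula is cut off; given that b is described below as the tuned base, the induced distribution presumably takes the form (a sketch, not the paper's exact equation)
\[ P(e \mid f) \propto b^{\sum_m \lambda_m h_m(e, f)}, \]
so larger values of b concentrate probability mass on the highest-scoring translations.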
Experimental Results
The log-linear model weights were trained using MIRA, a margin-based optimization procedure that accommodates many features (Crammer and Singer, 2003; Chiang et al., 2008).
Experimental Results
We tuned b, the base of the log-linear model , to optimize consensus decoding performance.
log-linear model is mentioned in 3 sentences in this paper.
Cherry, Colin
Cohesive Decoding
This count becomes a feature in the decoder’s log-linear model, the weight of which is trained with MERT.
Experiments
Weights for the log-linear model are set using MERT, as implemented by Venugopal and Vogel (2005).
Experiments
Since adding features to the decoder’s log-linear model is straightforward, we also experiment with a combined system that uses both the cohesion constraint and a lexical reordering model.
log-linear model is mentioned in 3 sentences in this paper.