Index of papers in Proc. ACL that mention
  • log-linear
Zhao, Shiqi and Wang, Haifeng and Liu, Ting and Li, Sheng
Abstract
We propose a log-linear model to compute the paraphrase likelihood of two patterns and exploit feature functions based on maximum likelihood estimation (MLE) and lexical weighting (LW).
Conclusion
We use a log-linear model to compute the paraphrase likelihood and exploit feature functions based on MLE and LW.
Conclusion
In addition, the log-linear model with the proposed feature functions significantly outperforms the conventional models.
Experiments
4.1 Evaluation of the Log-linear Model
Experiments
As previously mentioned, in the log-linear model of this paper, we use both MLE based and LW based feature functions.
Experiments
In this section, we evaluate the log-linear model (LL-Model) and compare it with the MLE based model (MLE-Model) presented by Bannard and Callison-Burch (2005).
Introduction
parsing and English-foreign language word alignment, (2) aligned patterns induction, which produces English patterns along with the aligned pivot patterns in the foreign language, (3) paraphrase patterns extraction, in which paraphrase patterns are extracted based on a log-linear model.
Introduction
Secondly, we propose a log-linear model for computing the paraphrase likelihood.
Introduction
Besides, the log-linear model is more effective than the conventional model presented in (Bannard and Callison-Burch, 2005).
Proposed Method
In order to exploit more and richer information to estimate the paraphrase likelihood, we propose a log-linear model:
Proposed Method
In this paper, 4 feature functions are used in our log-linear model, which include:
log-linear is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Liu, Lemao and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Abstract
Most statistical machine translation (SMT) systems are modeled using a log-linear framework.
Abstract
Although the log-linear model achieves success in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential.
Abstract
additive neural networks, for SMT to go beyond the log-linear translation model.
Introduction
Recently, great progress has been achieved in SMT, especially since Och and Ney (2002) proposed the log-linear model: almost all the state-of-the-art SMT systems are based on the log-linear model.
Introduction
Regardless of how successful the log-linear model is in SMT, it still has some shortcomings.
Introduction
Compared with the log-linear model, it has more powerful expressive abilities and can deeply interpret and represent features with hidden units in neural networks.
log-linear is mentioned in 34 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wang, Haifeng and Guo, Yuqing and Liu, Ting
Abstract
This paper describes log-linear models for a general-purpose sentence realizer based on dependency structures.
Abstract
Then the best linearizations compatible with the relative order are selected by log-linear models.
Abstract
The log-linear models incorporate three types of feature functions, including dependency relations, surface words and headwords.
Introduction
The other is log-linear model with different syntactic and semantic features (Velldal and Oepen, 2005; Nakanishi et al., 2005; Cahill et al., 2007).
Introduction
Compared with n-gram model, log-linear model is more powerful in that it is easy to integrate a variety of features, and to tune feature weights to maximize the probability.
Introduction
This paper presents a general-purpose realizer based on log-linear models for directly linearizing dependency relations given dependency structures.
Log-linear Models
We use log-linear models for selecting the sequence with the highest probability from all the possible linearizations of a subtree.
Log-linear Models
4.1 The Log-linear Model
Log-linear Models
Log-linear models employ a set of feature functions to describe properties of the data, and a set of learned weights to determine the contribution of each feature.
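A generic sketch of such a model (standard notation, not the paper's own formula): a candidate linearization y of an input subtree x is scored as
\[
p(y \mid x) = \frac{\exp\big(\sum_k \lambda_k f_k(x, y)\big)}{\sum_{y'} \exp\big(\sum_k \lambda_k f_k(x, y')\big)},
\]
where the f_k are feature functions (here capturing dependency relations, surface words and headwords) and the λ_k are the learned weights; the realizer keeps the highest-probability linearization.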
log-linear is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Das, Dipanjan and Weston, Jason and Ganchev, Kuzman
Argument Identification
where p_θ is a log-linear model normalized over the set R_y, with features described in Table 1.
Argument Identification
Inference Although our learning mechanism uses a local log-linear model, we perform inference globally on a per-frame basis by applying hard structural constraints.
Discussion
We believe that the WSABIE EMBEDDING model performs better than the LOG-LINEAR EMBEDDING baseline (that uses the same input representation) because the former setting allows examples with different labels and confusion sets to share information; this is due to the fact that all labels live in the same label space, and a single projection matrix is shared across the examples to map the input features to this space.
Discussion
Consequently, the WSABIE EMBEDDING model can share more information between different examples in the training data than the LOG-LINEAR EMBEDDING model.
Discussion
Since the LOG-LINEAR WORDS model always performs better than the LOG-LINEAR EMBEDDING model, we conclude that the primary benefit does not come from the input embedding representation.
Experiments
The baselines use a log-linear model that models the following probability at training time:
Experiments
For comparison with our model from §3, which we call WSABIE EMBEDDING, we implemented two baselines with the log-linear model.
Experiments
We call this baseline LOG-LINEAR WORDS.
log-linear is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Tsuruoka, Yoshimasa and Tsujii, Jun'ichi and Ananiadou, Sophia
Abstract
Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized log-linear models.
Introduction
Log-linear models (a.k.a. maximum entropy models) are one of the most widely-used probabilistic models in the field of natural language processing (NLP).
Introduction
Log-linear models have a major advantage over other
Introduction
Kazama and Tsujii (2003) describe a method for training a L1-regularized log-linear model with a bound constrained version of the BFGS algorithm (Nocedal, 1980).
Log-Linear Models
In this section, we briefly describe log-linear models used in NLP tasks and L1 regularization.
Log-Linear Models
A log-linear model defines the following probabilistic distribution over possible structure y for input x:
Log-Linear Models
The weights of the features in a log-linear model are optimized in such a way that they maximize the regularized conditional log-likelihood of the training data:
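Filling in the two dropped formulas with standard notation (a sketch, not quoted from the paper; C denotes the L1 regularization strength):
\[
p_{\mathbf{w}}(y \mid x) = \frac{\exp\big(\sum_i w_i f_i(x, y)\big)}{\sum_{y'} \exp\big(\sum_i w_i f_i(x, y')\big)},
\qquad
\mathbf{w}^{*} = \arg\max_{\mathbf{w}} \sum_j \log p_{\mathbf{w}}(y_j \mid x_j) \;-\; C \sum_i |w_i|.
\]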
log-linear is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Cahill, Aoife and Riester, Arndt
Abstract
We investigate the influence of information status (IS) on constituent order in German, and integrate our findings into a log-linear surface realisation ranking model.
Abstract
We build a log-linear model that incorporates these asymmetries for ranking German string reali-sations from input LFG F-structures.
Conclusions
By calculating strong asymmetries between pairs of IS labels, and establishing the most frequent syntactic characteristics of these asymmetries, we designed a new set of features for a log-linear ranking model.
Generation Ranking
(2007), a log-linear model based on the Lexical Functional Grammar (LFG) Framework (Kaplan and Bresnan, 1982).
Generation Ranking
(2007) describe a log-linear model that uses linguistically motivated features and improves over a simple trigram language model baseline.
Generation Ranking
We take this log-linear model as our starting point.
Generation Ranking Experiments
These are all automatically removed from the list of features to give a total of 130 new features for the log-linear ranking model.
Generation Ranking Experiments
We train the log-linear ranking model on 7759 F-structures from the TIGER treebank.
Generation Ranking Experiments
We tune the parameters of the log-linear model on a small development set of 63 sentences, and carry out the final evaluation on 261 unseen sentences.
log-linear is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Experimental Evaluation
The features are combined in a log-linear way.
Experimental Evaluation
In Table 5 we can see that the performance of the heuristic phrase model can be increased by 0.6 BLEU on TEST by filtering the phrase table to contain the same phrases as the count model and reoptimizing the log-linear model weights.
Experimental Evaluation
Log-linear interpolation of the count model with the heuristic yields a further increase, showing an improvement of 1.3 BLEU on DEV and 1.4 BLEU on TEST over the baseline.
Introduction
The translation process is implemented as a weighted log-linear combination of several models h_m(e_1^I, s_1^K, f_1^J) including the logarithm of the phrase probability in source-to-target as well as in target-to-source direction.
Phrase Model Training
The log-linear interpolations p_int(f|e) of the phrase translation probabilities are estimated as
Phrase Model Training
As a generalization of the fixed interpolation of the two phrase tables we also experimented with adding the two trained phrase probabilities as additional features to the log-linear framework.
Phrase Model Training
With good log-linear feature weights, feature-wise combination should perform at least as well as fixed interpolation.
log-linear is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Baselines
2.1 Log-Linear Mixture
Baselines
Log-linear translation model (TM) mixtures are of the form:
Baselines
where m ranges over IN and OUT, p_m(e|f) is an estimate from a component phrase table, and each λ_m is a weight in the top-level log-linear model, set so as to maximize dev-set BLEU using minimum error rate training (Och, 2003).
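The mixture form referred to in these snippets can be sketched (illustrative notation) as
\[
p(\bar{e} \mid \bar{f}) \;\propto\; \prod_{m \in \{\mathrm{IN}, \mathrm{OUT}\}} p_m(\bar{e} \mid \bar{f})^{\lambda_m},
\]
i.e. the component phrase-table estimates are combined multiplicatively, with the weights λ_m tuned for dev-set BLEU.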
Ensemble Decoding
In the typical log-linear model SMT, the posterior
Ensemble Decoding
log-linear mixture).
Ensemble Decoding
Since in log-linear models, the model scores are not normalized to form probability distributions, the scores that different models assign to each phrase-pair may not be in the same scale.
Experiments & Results 4.1 Experimental Setup
It was filtered to retain the top 20 translations for each source phrase using the TM part of the current log-linear model.
Introduction
In addition to the basic approach of concatenation of in-domain and out-of-domain data, we also trained a log-linear mixture model (Foster and Kuhn, 2007)
Related Work 5.1 Domain Adaptation
Two famous examples of such methods are linear mixtures and log-linear mixtures (Koehn and Schroeder, 2007; Civera and Juan, 2007; Foster and Kuhn, 2007) which were used as baselines and discussed in Section 2.
log-linear is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Discriminative Synchronous Transduction
3.1 A global log-linear model
Discriminative Synchronous Transduction
Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence.
Discriminative Synchronous Transduction
Our findings echo those observed for latent variable log-linear models successfully used in monolingual parsing (Clark and Curran, 2007; Petrov et al., 2007).
Discussion and Further Work
Such approaches have been shown to be effective in log-linear word-alignment models where only a small supervised corpus is available (Blunsom and Cohn, 2006).
Introduction
First, we develop a log-linear model of translation which is globally trained on a significant number of parallel sentences.
log-linear is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
Note that under the log-linear model, applying threshold for filtering is equivalent to comparing the “likelihood” ratio.
Conclusions
In this paper, the problem of extracting phrase translation is formulated as an information retrieval process implemented with a log-linear model aiming for a balanced precision and recall.
Discussions
The generic phrase training algorithm follows an information retrieval perspective as in (Venugopal et al., 2003) but aims to improve both precision and recall with the trainable log-linear model.
Discussions
Under the general framework, one can put as many features as possible together under the log-linear model to evaluate the quality of a phrase and a phrase pair.
Experimental Results
Our decoder is a phrase-based multi-stack implementation of the log-linear model similar to Pharaoh (Koehn et al., 2003).
Experimental Results
Like other log-linear model based decoders, active features in our translation engine include translation models in two directions, lexicon weights in two directions, language model, lexicalized distortion models, sentence length penalty and other heuristics.
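To illustrate how such feature functions are combined log-linearly at decoding time, here is a minimal Python sketch (hypothetical feature names and values, not the authors' decoder):

import math

def log_linear_score(features, weights):
    # Log-linear combination: score = sum_m lambda_m * h_m
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical log-feature values for one translation hypothesis.
features = {
    "log_tm_src2tgt": math.log(0.02),    # phrase translation, source-to-target
    "log_tm_tgt2src": math.log(0.015),   # phrase translation, target-to-source
    "log_lex_src2tgt": math.log(0.03),   # lexical weight, source-to-target
    "log_lex_tgt2src": math.log(0.025),  # lexical weight, target-to-source
    "log_lm": math.log(1e-6),            # language model
    "distortion": -4.0,                  # lexicalized distortion cost
    "length_penalty": 12.0,              # sentence length penalty
}
weights = {name: 1.0 for name in features}  # in practice tuned discriminatively (e.g., MERT)
print(log_linear_score(features, weights))

The decoder keeps the hypothesis with the highest such score; discriminative training adjusts the weights.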
Experimental Results
Since the translation engine implements a log-linear model, the discriminative training of feature weights in the decoder should be embedded in the whole end-to-end system jointly with the discriminative phrase table training process.
log-linear is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Das, Dipanjan and Smith, Noah A.
Product of Experts
These features have to be included in estimating pkn-d, which has log-linear component models (Eq.
Product of Experts
For these bigram or trigram overlap features, a similar log-linear model has to be normalized with a partition function, which considers the (unnormalized) scores of all possible target sentences, given the source sentence.
QG for Paraphrase Modeling
5 We use log-linear models three times: for the configuration, the lexical semantics class, and the word.
QG for Paraphrase Modeling
(2007), we employ a 14-feature log-linear model over all logically possible combinations of the 14 WordNet relations (Miller, 1995). Similarly to Eq.
QG for Paraphrase Modeling
14, we normalize this log-linear model based on the set of relations that are nonempty in WordNet for the given word.
log-linear is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Doyle, Gabriel and Bicknell, Klinton and Levy, Roger
Abstract
We present a method to jointly learn features and weights directly from distributional data in a log-linear framework.
Abstract
The model uses an Indian Buffet Process prior to learn the feature values used in the log-linear method, and is the first algorithm for learning phonological constraints without presupposing constraint structure.
Introduction
These constraint-driven decisions can be modeled with a log-linear system.
Introduction
We consider this question by examining the dominant framework in modern phonology, Optimality Theory (Prince and Smolensky, 1993, OT), implemented in a log-linear framework, MaxEnt OT (Goldwater and Johnson, 2003), with output forms’ probabilities based on a weighted sum of
Phonology and Optimality Theory 2.1 OT structure
In IBPOT, we use the log-linear EVAL developed by Goldwater and Johnson (2003) in their MaxEnt OT system.
The IBPOT Model
The weight vector w provides weight for both F and M. Probabilities of output forms are given by a log-linear function:
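In a MaxEnt OT setting of this kind, the dropped formula conventionally looks like (a sketch with one common sign convention, not the paper's exact equation):
\[
P(y \mid x) = \frac{\exp\big(-\sum_k w_k\, v_k(x, y)\big)}{\sum_{y'} \exp\big(-\sum_k w_k\, v_k(x, y')\big)},
\]
where v_k(x, y) counts the violations of constraint (feature) k, drawn here from F and M, and w_k is its weight.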
log-linear is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Underwood, Ted and Smith, Noah A.
Data
While the character clustering stage is essentially performing proper noun coreference resolution, approximately 74% of references to characters in books come in the form of pronouns. To resolve this more difficult class at the scale of an entire book, we train a log-linear discriminative classifier only on the task of resolving pronominal anaphora (i.e., ignoring generic noun phrases such as the paint or the rascal).
Data
To manage the degrees of freedom in the model described in §4, we perform dimensionality reduction on the vocabulary by learning word embed-dings with a log-linear continuous skip-gram language model (Mikolov et al., 2013) on the entire collection of 15,099 books.
Experiments
A Basic persona model, which ablates author information but retains the same log-linear architecture; here, the n-vector is of size P + 1 and does not model author effects.
Model
In order to separate out the effects that a character’s persona has on the words that are associated with them (as opposed to other factors, such as time period, genre, or author), we adopt a hierarchical Bayesian approach in which the words we observe are generated conditional on a combination of different effects captured in a log-linear (or “maximum entropy”) distribution.
Model
This SAGE model can be understood as a log-linear distribution with three kinds of features (metadata, persona, and back-
Model
P: number of personas (hyperparameter); D: number of documents; C_d: number of characters in document d; W_{d,c}: number of (cluster, role) tuples for character c; m_d: metadata for document d (ranges over M authors); θ_d: document d's distribution over personas; p_{d,c}: character c's persona; j: an index for an ⟨r, w⟩ tuple in the data; w_j: word cluster ID for tuple j; r_j: role for tuple j ∈ {agent, patient, poss, pred}; η: coefficients for the log-linear language model; μ, λ: Laplace mean and scale (for regularizing η); α: Dirichlet concentration parameter
log-linear is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Expected BLEU Training
We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003).
Expected BLEU Training
We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}):
Experiments
Log-linear weights are tuned with MERT.
Experiments
Log-linear weights are estimated on the 2009 data set comprising 2525 sentences.
Experiments
ther lattices or the unique 100-best output of the phrase-based decoder and reestimate the log-linear weights by running a further iteration of MERT on the n-best list of the development set, augmented by scores corresponding to the neural network models.
Introduction
The expected BLEU objective provides an efficient way of achieving this for machine translation (Rosti et al., 2010; Rosti et al., 2011; He and Deng, 2012; Gao and He, 2013; Gao et al., 2014) instead of solely relying on traditional optimizers such as Minimum Error Rate Training (MERT) that only adjust the weighting of entire component models within the log-linear framework of machine translation (§3).
log-linear is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
We evaluate the performance of adding new topic-related features to the log-linear model and compare the translation accuracy with the method in (Xiao et al., 2012).
Introduction
We integrate topic similarity features in the log-linear model and evaluate the performance on the NIST Chinese-to-English translation task.
Topic Similarity Model with Neural Network
The similarity scores are integrated into the standard log-linear model for making translation decisions.
Topic Similarity Model with Neural Network
We incorporate the learned topic similarity scores into the standard log-linear framework for SMT.
Topic Similarity Model with Neural Network
In addition to traditional SMT features, we add new topic-related features into the standard log-linear framework.
log-linear is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zhang, Yuan and Barzilay, Regina and Globerson, Amir
A Joint Model for Two Formalisms
As is standard in such settings, the distribution will be log-linear in a set of features of these parses.
A Joint Model for Two Formalisms
Instead, we assume that the distribution over y_CFG is a log-linear model with parameters θ_CFG (i.e., a sub-vector of θ), namely:
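A generic sketch of such a sub-model (illustrative notation; the paper's own features differ):
\[
p(y_{\mathrm{CFG}} \mid x;\, \theta_{\mathrm{CFG}}) = \frac{\exp\big(\theta_{\mathrm{CFG}} \cdot \phi(x, y_{\mathrm{CFG}})\big)}{\sum_{y'} \exp\big(\theta_{\mathrm{CFG}} \cdot \phi(x, y')\big)}.
\]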
Evaluation Setup
In this setup, the model reduces to a normal log-linear model for the target formalism.
Experiment and Analysis
It’s not surprising that Cahill’s model outperforms our log-linear model because it relies heavily on handcrafted rules optimized for the dataset.
Features
Feature functions in log-linear models are designed to capture the characteristics of each derivation in the tree.
log-linear is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Introduction
Word embedding is used as the input to learn translation confidence score, which is combined with commonly used features in the conventional log-linear model.
Our Model
The difference between our model and the conventional log-linear model includes:
Phrase Pair Embedding
Instead of integrating the sparse features directly into the log-linear model, we use them as the input to learn a phrase pair embedding.
Phrase Pair Embedding
To train the neural network, we add the confidence scores to the conventional log-linear model as features.
Related Work
Together with other commonly used features, the translation confidence score is integrated into a conventional log-linear model.
log-linear is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Abend, Omri and Cohen, Shay B. and Steedman, Mark
Experimental Setup
Determining h for each predicate yields a regular log-linear binary classification model.
Our Proposal: A Latent LC Approach
We address the task with a latent variable log-linear model, representing the LCs of the predicates.
Our Proposal: A Latent LC Approach
The introduction of latent variables into the log-linear model leads to a non-convex objective function.
Our Proposal: A Latent LC Approach
Once h has been fixed, the model collapses to a convex log-linear model.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Markov Topic Regression - MTR
log-linear models with parameters, λ^(k) ∈ R^M, is
Markov Topic Regression - MTR
trained to predict β_k(w_l), for each word w_l of a tag s_k: β_k(w_l) = exp(f(w_l; λ^(k))) (2), where the log-linear function f is given in Eq. (3).
Markov Topic Regression - MTR
labeled data, based on the log-linear model in Eq.
Semi-Supervised Semantic Labeling
The x is used as the input matrix of the k-th log-linear model (corresponding to the k-th semantic tag (topic)) to infer the β hyper-parameter of MTR in Eq.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Sennrich, Rico and Schwenk, Holger and Aransa, Walid
Translation Model Architecture
Our translation model is embedded in a log-linear model as is common for SMT, and treated as a single translation model in this log-linear combination.
Translation Model Architecture
Log-linear weights are optimized using MERT (Och and Ney, 2003).
Translation Model Architecture
Future work could involve merging our translation model framework with the online adaptation of other models, or the log-linear weights.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Feng, Minwei and Peter, Jan-Thorsten and Ney, Hermann
Comparative Study
(4) The orientation probability is modeled in a log-linear framework using a set of N feature functions h_n(·), n = 1, ..., N.
Comparative Study
Finally, in the log-linear framework (Equation 2) a new jump model is added which uses the reordered source sentence to calculate the cost.
Tagging-style Reordering Model
The number of source words that have inconsistent labels is the penalty and is then added into the log-linear framework as a new feature.
Translation System Overview
We model Pr(e_1^I | f_1^J) directly using a log-linear combination of several models (Och and Ney,
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Durrett, Greg and Hall, David and Klein, Dan
Inference
We then report the corresponding chains C(a) as the system output. For learning, the gradient takes the standard form of the gradient of a log-linear model, a difference of expected feature counts under the gold annotation and under no annotation.
Introduction
We use a log-linear model that can be expressed as a factor graph.
Models
Each unary factor A_i has a log-linear form with features examining mention i, its selected antecedent a_i, and the document context x.
Models
The final log-linear model is given by the following formula:
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
Following Och and Ney (2002), our model is framed as a log-linear model:
A semantic span can include one or more eus.
courage the decoder to generate transitional words and phrases; the score is utilized as an additional feature h_k(e_s, f_t) in the log-linear model.
A semantic span can include one or more eus.
In general, according to formula (3), the translation quality based on the log-linear model is related tightly with the features chosen.
Conclusion
Our contributions can be summarized as: 1) the new translation rules are more discriminative and sensitive to cohesive information by converting the source string into a CSS-based tagged-flattened string; 2) the new additional features embedded in the log-linear model can encourage the decoder to produce transitional expressions.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Volkova, Svitlana and Coppersmith, Glen and Van Durme, Benjamin
Batch Models
Our goal is to assign to a category each user of interest v_i based on f. Here we focus on a binary assignment into the categories Democratic D or Republican R. The log-linear
Batch Models
We use log-linear models over reasonable alternatives such as perceptron or SVM, following the practice of a wide range of previous work in related areas (Smith, 2004; Liu et al., 2005; Poon et al., 2009) including text classification in social media (Van Durme, 2012b; Yang and Eisenstein, 2013).
Batch Models
The corresponding log-linear model is defined as:
Experimental Setup
We experiment with log-linear models defined in Eq.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Chen, Harr and Zettlemoyer, Luke and Barzilay, Regina
A Log-Linear Model for Actions
Given a state s = (E, d, j, W), the space of possible next actions is defined by enumerating sub-spans of unused words in the current sentence (i.e., subspans of the jth sentence of d not in W), and the possible commands and parameters in environment state E. We model the policy distribution p(a|s; θ) over this action space in a log-linear fashion (Della Pietra et al., 1997; Lafferty et al., 2001), giving us the flexibility to incorporate a diverse range of features.
Abstract
We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection.
Introduction
Our policy is modeled in a log-linear fashion, allowing us to incorporate features of both the instruction text and the environment.
Reinforcement Learning
which is the derivative of a log-linear distribution.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Conclusion and Future Work
The consensus statistics are integrated into the conventional log-linear model as features.
Experiments and Results
Instead of using graph-based consensus confidence as features in the log-linear model, we perform structured label propagation (Struct-LP) to re-rank the n-best list directly, and the similarity measures for source sentences and translation candidates are symmetrical sentence level BLEU (equation (10)).
Features and Training
Therefore, we can alternatively update graph-based consensus features and feature weights in the log-linear model.
Graph-based Translation Consensus
Our MT system with graph-based translation consensus adopts the conventional log-linear model.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Elsner, Micha and Goldwater, Sharon and Eisenstein, Jacob
Abstract
We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features.
Introduction
Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability.
Lexical-phonetic model
(2008), we parameterize these distributions with a log-linear model.
Lexical-phonetic model
In modern phonetics and phonology, these generalizations are usually expressed as Optimality Theory constraints; log-linear models such as ours have previously been used to implement stochas-
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Dependency Parsing
Given the training set {(x_i, y_i)}_{i=1}^N, parameter estimation for log-linear models generally revolves around optimization of a regularized conditional
Dependency Parsing
In this paper we use the dual exponentiated gradient (EG) descent, which is a particularly effective optimization algorithm for log-linear models (Collins et al., 2008).
Experiments
Some previous studies also found a log-linear relationship between unlabeled data (Suzuki and Isozaki, 2008; Suzuki et al., 2009; Bergsma et al., 2010; Pitler et al., 2010).
Web-Derived Selectional Preference Features
Log-linear dependency parsing model is sensitive to inappropriately scaled feature.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
In our work, any Maximum A Posteriori (MAP) SMT model with log-linear formulation (Och, 2002) can be a qualified candidate for a baseline model.
Collaborative Decoding
The requirement for a log-linear model aims to provide a natural way to integrate the new co-decoding features.
Collaborative Decoding
Referring to the log-linear model formulation, the translation posterior P(e'|dk) can be computed as:
Conclusion
In this paper, we present a framework of collaborative decoding, in which multiple MT decoders are coordinated to search for better translations by re-ranking partial hypotheses using augmented log-linear models with translation consensus -based features.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min and Aw, Aiti and Li, Haizhou
Analysis
We want to further study the happenings after we integrate the constraint feature (our SDB model and Marton and Resnik’s XP+) into the log-linear translation model.
Introduction
These constituent matching/violation counts are used as a feature in the decoder’s log-linear model and their weights are tuned via minimal error rate training (MERT) (Och, 2003).
Introduction
Similar to previous methods, our SDB model is integrated into the decoder’s log-linear model as a feature so that we can inherit the idea of soft constraints.
The Syntax-Driven Bracketing Model 3.1 The Model
new feature into the log-linear translation model: P_SDB(b|T, ·). This feature is computed by the SDB model described in equation (3) or equation (4), which estimates a probability that a source span is to be translated as a unit within particular syntactic contexts.
log-linear is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Das, Dipanjan and Petrov, Slav
PCS Induction
The feature-based model replaces the emission distribution with a log-linear model, such that:
PCS Induction
This locally normalized log-linear model can look at various aspects of the observation x, incorporating overlapping features of the observation.
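The locally normalized emission referred to here has the standard feature-HMM shape (a sketch in generic notation):
\[
p(x \mid y;\, \theta) = \frac{\exp\big(\theta \cdot \mathbf{f}(x, y)\big)}{\sum_{x'} \exp\big(\theta \cdot \mathbf{f}(x', y)\big)},
\]
where x is the observed word, y the hidden tag, and f(x, y) a vector of (possibly overlapping) observation features.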
PCS Induction
We adopted this state-of-the-art model because it makes it easy to experiment with various ways of incorporating our novel constraint feature into the log-linear emission model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Cherry, Colin
Cohesive Decoding
This count becomes a feature in the decoder’s log-linear model, the weight of which is trained with MERT.
Experiments
Weights for the log-linear model are set using MERT, as implemented by Venugopal and Vogel (2005).
Experiments
Since adding features to the decoder’s log-linear model is straightforward, we also experiment with a combined system that uses both the cohesion constraint and a lexical reordering model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Experimental Setup
The decoder’s log-linear model includes a standard feature set.
Experimental Setup
The decoder’s log-linear model is tuned with MERT (Och, 2003).
Experimental Setup
Both the decoder’s log-linear model and the re-ranking models are trained on the same development set.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pasupat, Panupong and Liang, Percy
Abstract
Our approach defines a log-linear model over latent extraction predicates, which select lists of entities from the web page.
Approach
Given a query x and a web page w, we define a log-linear distribution over all extraction predicates z ∈ Z(w) as
Approach
To construct the log-linear model, we define a feature vector φ(x, w, z) for each query x, web page w, and extraction predicate z.
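With those ingredients, the dropped formula can be sketched in standard notation (the paper's exact parameterization may differ):
\[
p_{\theta}(z \mid x, w) = \frac{\exp\big(\theta \cdot \phi(x, w, z)\big)}{\sum_{z' \in Z(w)} \exp\big(\theta \cdot \phi(x, w, z')\big)}.
\]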
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lu, Shixiang and Chen, Zhenbiao and Xu, Bo
Related Work
(2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings suffered by the log-linear model: linearity and the lack of deep interpretation and representation in features.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
Each translation rule in the phrase-based translation model has a set number of features that are combined in the log-linear model (Och and Ney, 2002), and our semi-supervised DAE features can also be combined in this model.
Semi-Supervised Deep Auto-encoder Features Learning for SMT
To combine these learned features (DBN and DAE features) into the log-linear model, we need to eliminate the impact of the nonlinear learning mechanism.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Consensus Decoding Algorithms
The distribution P(e|f) can be induced from a translation system’s features and weights by exponentiating with base b to form a log-linear model:
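With feature functions h_m, weights λ_m and base b, the induced distribution has the shape (illustrative reconstruction of the dropped formula):
\[
P(e \mid f) = \frac{b^{\sum_m \lambda_m h_m(e, f)}}{\sum_{e'} b^{\sum_m \lambda_m h_m(e', f)}}.
\]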
Experimental Results
The log-linear model weights were trained using MIRA, a margin-based optimization procedure that accommodates many features (Crammer and Singer, 2003; Chiang et al., 2008).
Experimental Results
We tuned b, the base of the log-linear model, to optimize consensus decoding performance.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Li, Junhui and Marton, Yuval and Resnik, Philip and Daumé III, Hal
Related Work
Both are close to our work; however, our model generates reordering features that are integrated into the log-linear translation model during decoding.
Unified Linguistic Reordering Models
For models with syntactic reordering, we add two new features (i.e., one for the leftmost reordering model and the other for the rightmost reordering model) into the log-linear translation model in Eq.
Unified Linguistic Reordering Models
For the semantic reordering models, we also add two new features into the log-linear translation model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fu, Ruiji and Guo, Jiang and Qin, Bing and Che, Wanxiang and Wang, Haifeng and Liu, Ting
Method
(2013a) propose two log-linear models, namely the Skip-gram and CBOW model, to efficiently induce word embeddings.
Method
The Skip-gram model adopts log-linear classifiers to predict context words given the current word w(t) as input.
Method
Then, log-linear classifiers are employed, taking the embedding as input and predict w(t)’s context words within a certain range, e.g.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Zettlemoyer, Luke and Barzilay, Regina
Algorithm
Specifically, we modify the log-linear policy p(a|s; q, θ) by adding lookahead features φ(s, a, q) which complement the local features used in the previous model.
Background
A Log-Linear Parameterization The policy
Background
function used for action selection is defined as a log-linear distribution over actions, proportional to exp(θ · φ(s, a)).
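Written out with its normalizer (standard notation; only the numerator survives in the snippet above):
\[
p(a \mid s;\, \theta) = \frac{e^{\theta \cdot \phi(s, a)}}{\sum_{a'} e^{\theta \cdot \phi(s, a')}}.
\]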
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
where Pr(e|f) is the probability that e is the translation of the given source string f. To model the posterior probability Pr(e|f), most of the state-of-the-art SMT systems utilize the log-linear model proposed by Och and Ney (2002), as follows,
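The Och and Ney (2002) model that "as follows" introduces is conventionally written as (generic sketch; the features h_m and weights λ_m are the ones defined in the next snippet):
\[
\Pr(e \mid f) \approx p_{\lambda}(e \mid f) = \frac{\exp\big(\sum_{m=1}^{M} \lambda_m h_m(f, e)\big)}{\sum_{e'} \exp\big(\sum_{m=1}^{M} \lambda_m h_m(f, e')\big)}.
\]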
Background
In this paper, u denotes a log-linear model that has M fixed features {h_1(f,e), ..., h_M(f,e)}, λ = {λ_1, ..., λ_M} denotes the M parameters of u, and u(λ) denotes an SMT system based on u with parameters λ.
Background
In this paper, we use the term training set to emphasize the training of log-linear model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Learning
The noise model that φ_c parameterizes is a local log-linear model, so we follow the approach of Berg-Kirkpatrick et al.
Model
The fact that the parameterization is log-linear will ensure that, during the unsupervised learning process, updating the shape parameters φ_c is simple and feasible.
Results and Analysis
(2010), we use a regularization term in the optimization of the log-linear model parameters φ_c during the M-step.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Andrews, Nicholas and Eisner, Jason and Dredze, Mark
Detailed generative story
This is a conditional log-linear model parameterized by φ, where φ_k ∼ N(0, σ_k²).
Detailed generative story
When the input character is the special end-of-string symbol #, the only allowed edits are insertion and substitution. We define the edit probability using a locally normalized log-linear model:
Experiments
We leave other hyperparameters fixed: 16 latent topics, and Gaussian priors N(0, 1) on all log-linear parameters.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mylonakis, Markos and Sima'an, Khalil
Conclusions
Future work directions include investigating the impact of hierarchical phrases for our models as well as any gains from additional features in the log-linear decoding model.
Experiments
The induced joint translation model can be used to recover arg max_e p(e|f), as it is equal to arg max_e p(e, f). We employ the induced probabilistic HR-SCFG G as the backbone of a log-linear, feature-based translation model, with the derivation probability p(D) under the grammar estimate being
Experiments
We train the feature weights under MERT and decode with the resulting log-linear model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Mengqiu and Che, Wanxiang and Manning, Christopher D.
Experimental Setup
But instead of using just the PMI scores of bilingual NE pairs, as in our work, they employed a feature-rich log-linear model to capture bilingual correlations.
Experimental Setup
Parameters in their log-linear model require training with bilingually annotated data, which is not readily available.
Related Work
(2010a) presented a supervised learning method for performing joint parsing and word alignment using log-linear models over parse trees and an ITG model over alignment.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Volkova, Svitlana and Choudhury, Pallavi and Quirk, Chris and Dolan, Bill and Zettlemoyer, Luke
Building Dialog Trees from Instructions
Given a single instruction i with category a, we use a log-linear model to represent the distri-
Understanding Initial Queries
We employ a log-linear model and try to maximize initial dialog state distribution over the space of all nodes in a dialog network:
Understanding Query Refinements
Dialog State Update Model We use a log-linear model to maximize a dialog state distribution over the space of all nodes in a dialog network:
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
He, Xiaodong and Deng, Li
Abstract
Och (2003) proposed using a log-linear model to incorporate multiple features for translation, and proposed a minimum error rate training (MERT) method to train the feature weights to optimize a desirable translation metric.
Abstract
While the log-linear model itself is discriminative, the phrase and lexicon translation features, which are among the most important components of SMT, are derived from either generative models or heuristics (Koehn et al., 2003, Brown et al., 1993).
Abstract
In that work, multiple features, most of them are derived from generative models, are incorporated into a log-linear model, and the relative weights of them are tuned discriminatively on a small tuning set.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hewavitharana, Sanjika and Mehay, Dennis and Ananthakrishnan, Sankaranarayanan and Natarajan, Prem
Corpus Data and Baseline SMT
Our phrase-based decoder is similar to Moses (Koehn et al., 2007) and uses the phrase pairs and target LM to perform beam search stack decoding based on a standard log-linear model, the parameters of which were tuned with MERT (Och, 2003) on a held-out development set (3,534 sentence pairs, 45K words) using BLEU as the tuning metric.
Incremental Topic-Based Adaptation
We add this feature to the log-linear translation model with its own weight, which is tuned with MERT.
Introduction
Translation phrase pairs that originate in training conversations whose topic distribution is similar to that of the current conversation are given preference through a single similarity feature, which augments the standard phrase-based SMT log-linear model.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kim, Sungchul and Toutanova, Kristina and Yu, Hwanjo
Data and task
(2004) to train a log-linear model for projection.
Data and task
Compared to the joint log-linear model of Burkett et al.
Introduction
(2010a), our model can incorporate both monolingual and bilingual features in a log-linear framework.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Foster, George
Experiments
One is the log-linear combination of TMs trained on each subcorpus (Koehn and Schroeder, 2007), with weights of each model tuned under minimal error rate training using MIRA.
Introduction
Research on mixture models has considered both linear and log-linear mixtures.
Introduction
(Koehn and Schroeder, 2007), instead, opted for combining the sub-models directly in the SMT log-linear framework.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yogatama, Dani and Sim, Yanchuan and Smith, Noah A.
Learning and Inference
These features are incorporated as a log-linear dis-
Learning and Inference
For each word in a mention 10, we introduced 12 binary features f for our featurized log-linear distribution (§3.1.2).
Model
This uses a log-linear distribution with partition function Z.
log-linear is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: