Index of papers in Proc. ACL that mention
  • model parameters
Gormley, Matthew R. and Eisner, Jason
Abstract
Finding the optimal model parameters is then usually a difficult nonconvex optimization problem.
Abstract
We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints.
Introduction
The node branches on a single model parameter θ_m to partition its subspace.
Introduction
A variety of ways to find better local optima have been explored, including heuristic initialization of the model parameters (Spitkovsky et al., 2010a), random restarts (Smith, 2006), and annealing (Smith and Eisner, 2006; Smith, 2006).
Introduction
search with certificates of ε-optimality for both the corpus parse and the model parameters.
The Constrained Optimization Task
The nonlinear constraints ensure that the model parameters are true log-probabilities.
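These constraints have a standard form; a minimal sketch (assuming θ_m is the log-probability of the m-th event and c indexes conditional distributions; the notation is assumed, not quoted):

```latex
% Each conditional distribution's log-probabilities must
% exponentiate and sum to one.
\sum_{m \in c} e^{\theta_m} = 1 \quad \text{for each } c, \qquad \theta_m \le 0
```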
The Constrained Optimization Task
Notation: feature / model parameter index; sentence index; conditional distribution index; number of model parameters.
model parameters is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Experimental results
For the 44 and 230 million token corpora, all sentences are automatically parsed and used to initialize model parameters, while for the 1.3 billion token corpus, we parse the sentences from a portion of the corpus that contains 230 million tokens, then use them to initialize model parameters.
Experimental results
Nevertheless, experimental results show that this approach is effective in providing initial values for the model parameters.
Training algorithm
The objective of maximum likelihood estimation is to maximize the likelihood ℒ(D, p) with respect to the model parameters p.
Training algorithm
and denote 𝒯_N as the collection of N-best list parse trees for sentences over the entire corpus D under model parameters p.
Training algorithm
…estimate model parameters.
model parameters is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Introduction
From these corpora, we estimate translation model parameters: word-to-word translation tables, fertilities, distortion parameters, phrase tables, syntactic transformations, etc.
Introduction
A language model P(e) is typically used in SMT decoding (Koehn, 2009), but here P(e) actually plays a central role in training translation model parameters.
Machine Translation as a Decipherment Task
During decipherment training, our objective is to estimate the model parameters θ in order to maximize the probability of the foreign corpus f. From Equation 4 we have:
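The sentence breaks off where Equation 4 appeared; a hedged reconstruction of a typical decipherment objective (e ranges over hypothesized English strings; this is an assumption, not the paper's verbatim equation):

```latex
% Sketch: choose theta to maximize the likelihood of the foreign
% corpus, marginalizing over latent English candidates e.
\hat{\theta} = \arg\max_{\theta} \prod_{f} \sum_{e} P(e)\, P_{\theta}(f \mid e)
```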
Machine Translation as a Decipherment Task
For Bayesian MT decipherment, we set a high prior value on the language model (10⁴) and use sparse priors for the IBM 3 model parameters t, n, d, p (0.01, 0.01, 0.01, 0.01).
Word Substitution Decipherment
During decipherment, our goal is to estimate the channel model parameters θ.
Word Substitution Decipherment
These methods are attractive for their ability to manage uncertainty about model parameters and allow one to incorporate prior knowledge during inference.
model parameters is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Finkel, Jenny Rose and Manning, Christopher D.
Base Models
…be the value of feature i for subtree r over sentence s, and let E_θ[f_i|s] be the expected value of feature i in sentence s, based on the current model parameters θ.
Hierarchical Joint Learning
After training has been completed, we retain only the joint model’s parameters.
Hierarchical Joint Learning
The first summation in this equation computes the log-likelihood of each model, using the data and parameters which correspond to that model, and the prior likelihood of that model’s parameters, based on a Gaussian prior centered around the top-level, non-model-specific parameters θ*, and with model-specific variance σ_m.
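A minimal sketch of such a hierarchical objective (assuming per-model data D_m and parameters θ_m tied to shared top-level parameters θ*; the paper's exact form may differ):

```latex
% Per-model likelihoods plus Gaussian priors tying each model's
% parameters to the shared top-level parameters.
\mathcal{L} = \sum_{m} \Big( \log p(D_m \mid \theta_m)
  - \frac{\lVert \theta_m - \theta^* \rVert^2}{2\sigma_m^2} \Big)
  - \frac{\lVert \theta^* \rVert^2}{2\sigma_*^2}
```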
Hierarchical Joint Learning
We need to compute partial derivatives in order to optimize the model parameters.
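A hedged sketch of the familiar log-linear gradient these quantities feed into (observed minus expected feature values, plus the hierarchical prior term sketched above; the paper's exact objective may differ):

```latex
% Per-sentence gradient: empirical feature value, minus model
% expectation, minus the pull toward the top-level prior mean.
\frac{\partial \mathcal{L}}{\partial \theta_i}
  = f_i(r, s) - \mathbb{E}_{\theta}[f_i \mid s]
  - \frac{\theta_i - \theta_i^*}{\sigma_m^2}
```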
model parameters is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Kushman, Nate and Lei, Tao and Barzilay, Regina
In our dataset only 11% of Candidate Relations are valid.
Initialization: Model parameters θ_x = 0 and θ_c = 0.
In our dataset only 11% of Candidate Relations are valid.
As before, θ_x is the vector of model parameters, and φ_x is the feature function.
In our dataset only 11% of Candidate Relations are valid.
Therefore, during learning, we need to find the model parameters that maximize expected future reward (Sutton and Barto, 1998).
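A hedged Python sketch of the policy-gradient idea the excerpt refers to (the feature function and sizes are hypothetical, not from the paper):

```python
import numpy as np

NUM_ACTIONS = 4  # hypothetical action count

def features(state, action):
    """Hypothetical feature function phi(s, a), standing in for the
    paper's actual features over states and actions."""
    vec = np.zeros(2 * NUM_ACTIONS)
    vec[action] = 1.0
    vec[NUM_ACTIONS + action] = state  # state treated as a scalar feature
    return vec

def policy_gradient_step(theta, episodes, lr=0.01):
    """One REINFORCE-style update: raise the log-probability of the
    actions taken in proportion to the reward that followed them."""
    grad = np.zeros_like(theta)
    for states, actions, reward in episodes:
        for s, a in zip(states, actions):
            scores = np.array([theta @ features(s, b) for b in range(NUM_ACTIONS)])
            probs = np.exp(scores - scores.max())   # softmax policy
            probs /= probs.sum()
            expected = sum(p * features(s, b) for b, p in enumerate(probs))
            grad += reward * (features(s, a) - expected)
    return theta + lr * grad
```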
Model
Update the model parameters, using the low-level planner’s success or failure as the source of supervision.
Model
where θ_c is the vector of model parameters.
model parameters is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Arnold, Andrew and Nallapati, Ramesh and Cohen, William W.
Models considered 2.1 Basic Conditional Random Fields
The model parameters λ_d, then, form the parameters of the leaves of this hierarchy.
Models considered 2.1 Basic Conditional Random Fields
The first terms in Eq. (3) represent the likelihood of data in each domain given their corresponding model parameters; the second line represents the likelihood of each model parameter in each domain given the hyper-parameter of its parent in the tree hierarchy of features; and the last term goes over the entire tree 𝒯 except the leaf nodes.
Models considered 2.1 Basic Conditional Random Fields
We perform a MAP estimation for each model parameter as well as the hyper-parameters.
model parameters is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Andrews, Nicholas and Eisner, Jason and Dredze, Mark
Detailed generative story
This is a conditional log-linear model parameterized by φ, where φ_k ∼ N(0, σ_k²).
Overview and Related Work
For learning, we iteratively adjust our model’s parameters to better explain our samples.
Overview and Related Work
(2012) we use topics as the contexts, but learn mention topics jointly with other model parameters.
Parameter Estimation
E-step: Collect samples by MCMC simulation as in §5, given current model parameters θ and φ.
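A hedged Python sketch of the Monte Carlo EM loop this excerpt describes (the sampler and the fitter are caller-supplied placeholders, not the paper's API):

```python
def monte_carlo_em(theta, phi, data, mcmc_sample, fit_parameters,
                   iterations=20, num_samples=100):
    """Monte Carlo EM: alternate between sampling latent structure
    (E-step) and re-fitting parameters on those samples (M-step).
    `mcmc_sample` and `fit_parameters` stand in for the paper's
    samplers and optimizers (hypothetical names)."""
    for _ in range(iterations):
        # E-step: draw latent-variable samples by MCMC under the
        # current parameters theta, phi.
        samples = [mcmc_sample(data, theta, phi) for _ in range(num_samples)]
        # M-step: choose parameters maximizing the sampled
        # complete-data log-likelihood.
        theta, phi = fit_parameters(samples, data)
    return theta, phi
```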
model parameters is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Long, Fan and Barzilay, Regina and Rinard, Martin
Model
We assume the generative model operates by first generating the model parameters from a set of Dirichlet distributions.
Model
• Generating Model Parameters: For every pair of feature type f and phrase tag z, draw a multinomial distribution parameter θ_z^f from a Dirichlet prior P(θ_z^f).
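A hedged sketch of this generative step (the dimensions and concentration value are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
num_feature_types, num_tags, vocab = 3, 5, 10  # illustrative sizes
alpha = 0.1  # symmetric Dirichlet concentration (assumed)

# For every (feature type f, phrase tag z) pair, draw a multinomial
# parameter vector from a Dirichlet prior.
theta = {
    (f, z): rng.dirichlet(alpha * np.ones(vocab))
    for f in range(num_feature_types)
    for z in range(num_tags)
}
```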
Model
Learning the Model: During inference, we want to estimate the hidden specification trees t given the observed natural language specifications w, after integrating the model parameters out, i.e.
model parameters is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Søgaard, Anders
Robust perceptron learning
Antagonistic adversaries choose transformations informed by the current model parameters w, but random adversaries randomly select transformations from a predefined set of possible transformations, e.g.
Robust perceptron learning
In an online setting feature bagging can be modelled as a game between a learner and an adversary, in which (a) the adversary can only choose between deleting transformations, (b) the adversary cannot see model parameters when choosing a transformation, and in which (c) the adversary only moves in between passes over the data.
Robust perceptron learning
LRA is an adversarial game in which the two players are unaware of the other player’s current move, and in particular, where the adversary does not see model parameters and only randomly corrupts the data points.
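A hedged Python sketch of the random-adversary game described in these excerpts (all names are illustrative; this is not the paper's exact algorithm):

```python
import numpy as np

def adversarial_perceptron(examples, dim, passes=5, drop_rate=0.1, seed=0):
    """Perceptron learning against a random adversary: between passes,
    the adversary (blind to the current weights w) randomly deletes
    features, and the learner updates on the corrupted inputs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(passes):
        # Adversary's move: pick features to delete without seeing w.
        keep = rng.random(dim) >= drop_rate
        for x, y in examples:               # x: np.ndarray, y in {-1, +1}
            x_corrupted = x * keep          # corrupted view of the input
            if y * (w @ x_corrupted) <= 0:  # mistake-driven update
                w += y * x_corrupted
    return w
```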
model parameters is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hui and Chiang, David
Introduction
For example, in Expectation-Maximization (Dempster et al., 1977), the Expectation (E) step computes the posterior distribution over possible completions of the data, and the Maximization (M) step reestimates the model parameters as
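The quoted sentence ends just before the update; as a hedged sketch, the generic M-step it introduces is (standard EM, not necessarily the paper's exact notation):

```latex
% Generic M-step: re-estimate parameters against the E-step posterior
% over completions z of the observed data x.
\theta^{(t+1)} = \arg\max_{\theta} \;
  \mathbb{E}_{z \sim P(z \mid x,\, \theta^{(t)})}
  \big[ \log P(x, z \mid \theta) \big]
```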
Word Alignment
Different models parameterize this probability distribution in different ways.
Word Alignment
It also contains most of the model’s parameters and is where overfitting occurs most.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Riezler, Stefan and Simianer, Patrick and Haas, Carolin
Experiments
This can explain the success of response-based learning: Lexical and structural variants of reference translations can be used to boost model parameters towards translations with positive feedback, while the same translations might be considered as negative examples in standard structured learning.
Introduction
Here, learning proceeds by “trying out” translation hypotheses, receiving a response from interacting in the task, and converting this response into a supervision signal for updating model parameters.
Response-based Online Learning
(2010) or Goldwasser and Roth (2013) describe a response-driven learning framework for the area of semantic parsing: here a meaning representation is “tried out” by iteratively generating system outputs, receiving feedback from world interaction, and updating the model parameters.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Model Training
…is the plausibility score for the best translation candidate given the model parameters W and V.
Phrase Pair Embedding
Table 1: The relationship between the size of training data and the number of model parameters.
Phrase Pair Embedding
Table 1 shows the relationship between the size of training data and the number of model parameters.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lei, Tao and Xin, Yu and Zhang, Yuan and Barzilay, Regina and Jaakkola, Tommi
Experimental Setup
We should note that since our model parameter A is represented and learned in the low-rank form, we only have to store and maintain the low-rank projections Uφ_h, Vφ_m and Wφ_{h,m} rather than explicitly calculate the feature tensor φ_h ⊗ φ_m ⊗ φ_{h,m}.
Problem Formulation
We will directly learn a low-rank tensor A (because r is small) in this form as one of our model parameters.
Problem Formulation
where θ ∈ ℝ^L, U ∈ ℝ^{r×n}, V ∈ ℝ^{r×n}, and W ∈ ℝ^{r×d} are the model parameters to be learned.
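A hedged sketch of the low-rank arc score these excerpts describe (assuming head, modifier, and arc feature vectors φ_h, φ_m, φ_{h,m}; indexing conventions may differ from the paper):

```latex
% A traditional linear term plus r rank-1 interactions of the
% projected head, modifier, and arc features.
s(h \rightarrow m) = \theta \cdot \phi_{h,m}
  + \sum_{j=1}^{r} [U\phi_h]_j \, [V\phi_m]_j \, [W\phi_{h,m}]_j
```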
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
Unsupervised Grammar Induction: Our first method for grammar induction is fully unsupervised Viterbi EM training of the Dependency Model with Valence (DMV) (Klein and Manning, 2004), with uniform initialization of the model parameters.
Related Work
(2010a) show that Viterbi (hard) EM training of the DMV with simple uniform initialization of the model parameters yields higher accuracy models than standard soft-EM
Related Work
In Viterbi EM, the E—step finds the maximum likelihood corpus parse given the current model parameters .
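A hedged sketch of the Viterbi (hard) EM loop described here (the parser and the re-estimator are caller-supplied placeholders, not the paper's implementation):

```python
def viterbi_em(corpus, theta, best_parse, reestimate, iterations=10):
    """Hard EM: the E-step commits to the single best parse of each
    sentence under the current parameters; the M-step re-estimates
    parameters from the counts of those parses."""
    for _ in range(iterations):
        # E-step: maximum-likelihood corpus parse given theta.
        parses = [best_parse(sentence, theta) for sentence in corpus]
        # M-step: maximum-likelihood parameters given the parses.
        theta = reestimate(parses)
    return theta
```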
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith
Decipherment Model for Machine Translation
During decipherment training, our objective is to estimate the model parameters in order to maximize the probability of the source text f as suggested by Ravi and Knight (2011b).
Decipherment Model for Machine Translation
Instead, we propose a new Bayesian inference framework to estimate the translation model parameters.
Introduction
The parallel corpora are used to estimate translation model parameters involving word-to-word translation tables, fertilities, distortion, phrase translations, syntactic transformations, etc.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Metallinou, Angeliki and Bohus, Dan and Williams, Jason
Generative state tracking
where φ_i(x, y) are feature functions jointly defined on features and labels, and λ_i are the model parameters.
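For reference, the standard log-linear form such a model takes (a generic sketch, not the paper's verbatim equation):

```latex
% Generic maximum-entropy / log-linear model over labels y given x.
P(y \mid x) = \frac{\exp\big( \sum_i \lambda_i \, \phi_i(x, y) \big)}
                   {\sum_{y'} \exp\big( \sum_i \lambda_i \, \phi_i(x, y') \big)}
```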
Generative state tracking
This formulation also decouples the number of model parameters (i.e.
Generative state tracking
Second, model parameters in DISCIND are trained independently of competing hypotheses.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Berg-Kirkpatrick, Taylor and Durrett, Greg and Klein, Dan
Model
The Bernoulli parameter of a pixel inside a glyph bounding box depends on the pixel’s location inside the box (as well as on d_i and v_i, but for simplicity of exposition, we temporarily suppress this dependence) and on the model parameters governing glyph shape (for each character type c, the parameter matrix φ_c specifies the shape of the character’s glyph.)
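A hedged sketch of this parameterization (sizes are illustrative, and the paper's glyph model also conditions on the offsets and inking suppressed in the excerpt):

```python
import numpy as np

H, W = 10, 8              # glyph bounding-box size (illustrative)
phi_c = np.zeros((H, W))  # shape parameters for character type c

def pixel_prob(row, col):
    """Bernoulli parameter for the pixel at (row, col): a logistic
    function of that location's shape parameter in phi_c."""
    return 1.0 / (1.0 + np.exp(-phi_c[row, col]))
```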
Results and Analysis
(2010), we use a regularization term in the optimization of the log-linear model parameters φ_c during the M-step.
Results and Analysis
Figure 8: The central glyph is a representation of the initial model parameters for the glyph shape for g, and surrounding this are the learned parameters for documents from various years.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Decipherment
These methods are attractive for their ability to manage uncertainty about model parameters and allow one to incorporate prior knowledge during inference.
Decipherment
Our goal is to estimate the channel model parameters θ in order to maximize the probability of the observed ciphertext c:
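As with the decipherment objective above, the equation after the colon was lost in extraction; a generic hedged form (assuming hypothesized plaintexts p):

```latex
\hat{\theta} = \arg\max_{\theta} \prod_{c} \sum_{p} P(p)\, P_{\theta}(c \mid p)
```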
Decipherment
The base distribution P0 represents prior knowledge about the model parameter distributions.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Pantel, Patrick and Fuxman, Ariel
Association Model
Basic Interpolation: This smoothing model, P_interp(e|q), linearly combines our foreground and background models using a model parameter α:
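The combination after the colon was elided; the standard form of such an interpolation (a sketch, with the foreground and background model names P_fg and P_bg assumed):

```latex
P_{\text{interp}}(e \mid q) = \alpha \, P_{\text{fg}}(e \mid q)
  + (1 - \alpha) \, P_{\text{bg}}(e \mid q)
```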
Association Model
Section 5.2 outlines our procedure for learning the model parameters for both P_interp(e|q) and
Experimental Results
5.2.1 Model Parameters
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K and Silver, David and Barzilay, Regina
Adding Linguistic Knowledge to the Monte-Carlo Framework
Since our model is a nonlinear approximation of the underlying action-value function of the game, we learn model parameters by applying nonlinear regression to the observed final utilities from the simulated roll-outs.
Adding Linguistic Knowledge to the Monte-Carlo Framework
The resulting update to model parameters θ is of the form:
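The update itself was elided after the colon; a hedged sketch of a typical stochastic update for nonlinear regression toward the observed utility R (the value function Q(s, a; θ) and learning rate α are assumed names):

```latex
% Gradient step reducing squared error between the predicted value
% and the Monte-Carlo return R.
\theta \leftarrow \theta + \alpha \,
  \big( R - Q(s, a; \theta) \big) \, \nabla_{\theta} Q(s, a; \theta)
```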
Adding Linguistic Knowledge to the Monte-Carlo Framework
We use the same experimental settings across all methods, and all model parameters are initialized to zero.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and Kozhevnikov, Mikhail
A Model of Semantics
We select the model parameters θ by maximizing the marginal likelihood of the data, where the data D is given in the form of groups w =
Empirical Evaluation
When estimating the model parameters, we followed the training regime prescribed in (Liang et al., 2009).
Inference with NonContradictory Documents
In the supervised case, where a and m are observable, estimation of the generative model parameters is generally straightforward.
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Challenges for Discriminative SMT
This itself provides robustness to noisy data, in addition to the explicit regularisation from a prior over the model parameters.
Discriminative Synchronous Transduction
Here k ranges over the model’s features, and Λ = {λ_k} are the model parameters (weights for their corresponding features).
Discriminative Synchronous Transduction
Each L-BFGS iteration requires the objective value and its gradient with respect to the model parameters .
model parameters is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: