Index of papers in Proc. ACL 2013 that mention
  • hyperparameters
Cohn, Trevor and Specia, Lucia
Gaussian Process Regression
Specifically, we can derive the gradient of the (log) marginal likelihood with respect to the model hyperparameters (i.e., the kernel and noise parameters)
Gaussian Process Regression
Note that in general the marginal likelihood is non-convex in the hyperparameter values, and consequently the solutions may only be locally optimal.
Gaussian Process Regression
Here we bootstrap the learning of complex models with many hyperparameters by initialising …
Multitask Quality Estimation 4.1 Experimental Setup
GP: All GP models were implemented using the GPML Matlab toolbox. Hyperparameter optimisation was performed using conjugate gradient ascent of the log marginal likelihood function, with up to 100 iterations.
Multitask Quality Estimation 4.1 Experimental Setup
The simpler models were initialised with all hyperparameters set to one, while more complex models were initialised using the …
hyperparameters is mentioned in 14 sentences in this paper.
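A minimal Python sketch of the optimisation described in the excerpts above, assuming an RBF kernel with signal variance, lengthscale and noise variance as the hyperparameters and toy data in place of the quality-estimation features; it maximises the log marginal likelihood by running conjugate-gradient descent on its negative, capped at 100 iterations as in the excerpt. This is an illustration of the technique, not the GPML-based setup the paper used.

    import numpy as np
    from scipy.optimize import minimize

    def neg_log_marginal_likelihood(log_hyp, X, y):
        # log_hyp = [log signal variance, log lengthscale, log noise variance]
        sig2, ell, noise2 = np.exp(log_hyp)
        sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
        K = sig2 * np.exp(-0.5 * sq_dists / ell ** 2) + noise2 * np.eye(len(X))
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^-1 y
        # 0.5 y' K^-1 y + 0.5 log|K| + (n/2) log(2 pi)
        return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(X) * np.log(2 * np.pi)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))                      # toy feature vectors
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=50)   # toy response (e.g. quality scores)
    res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3), args=(X, y),
                   method="CG", options={"maxiter": 100})
    print("learned hyperparameters:", np.exp(res.x))

As the excerpts note, the marginal likelihood is non-convex in the hyperparameters, so different initialisations can reach different local optima.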
Cohen, Shay B. and Johnson, Mark
Bayesian inference for PCFGs
Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ0.
Bayesian inference for PCFGs
Result: A vector of parameters θ. repeat: draw θ from a product of Dirichlet distributions with hyperparameters α + f(t)
Bayesian inference for PCFGs
Input: Grammar G, vector of trees t, vector of hyperparameters α, previous rule parameters θ.
hyperparameters is mentioned in 9 sentences in this paper.
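The sampling step quoted above draws θ from a product of Dirichlets whose parameters are the hyperparameters α plus the rule counts f(t). A minimal Python sketch of that single step, with one Dirichlet per nonterminal and hypothetical toy counts; it is not the authors' full sampler.

    import numpy as np

    def resample_rule_probs(rule_counts, alpha, rng):
        # Draw theta from a product of Dirichlets with parameters alpha + f(t),
        # one Dirichlet per nonterminal of the grammar G.
        return {nt: rng.dirichlet(alpha[nt] + counts)
                for nt, counts in rule_counts.items()}

    rng = np.random.default_rng(0)
    # Hypothetical toy grammar: S has 2 rules, NP has 3 rules.
    counts = {"S": np.array([12.0, 3.0]), "NP": np.array([5.0, 0.0, 7.0])}   # f(t)
    alpha  = {"S": np.array([1.0, 1.0]),  "NP": np.array([1.0, 1.0, 1.0])}   # hyperparameters
    print(resample_rule_probs(counts, alpha, rng))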
Kim, Young-Bum and Snyder, Benjamin
Analysis
Next we examine the transition Dirichlet hyperparameters learned by our model.
Analysis
As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions.
Analysis
Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained …
Experiments
The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1).
Inference
The second term is the tag transition predictive distribution given Dirichlet hyperparameters, yielding a familiar Polya urn scheme form.
Inference
Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given Dirichlet hyperparameters.
Inference
To sample the Dirichlet hyperparameter for cluster k and transition t → t′, we need to compute:
hyperparameters is mentioned in 8 sentences in this paper.
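Several excerpts above refer to the predictive distribution of transitions or emissions given Dirichlet hyperparameters, which collapses to a Polya urn form. A small Python sketch of that collapsed marginal for one sequence of draws summarised by a count vector; the asymmetric hyperparameter values in the example are hypothetical.

    import numpy as np
    from scipy.special import gammaln

    def dirichlet_multinomial_logprob(counts, alpha):
        # Collapsed (Polya urn) log probability of a particular sequence of draws
        # with the given counts, under a Dirichlet prior with hyperparameters alpha:
        #   Gamma(sum a) / Gamma(sum a + sum n) * prod_k Gamma(a_k + n_k) / Gamma(a_k)
        counts, alpha = np.asarray(counts, float), np.asarray(alpha, float)
        return (gammaln(alpha.sum()) - gammaln(alpha.sum() + counts.sum())
                + np.sum(gammaln(alpha + counts) - gammaln(alpha)))

    # An asymmetric prior (as in the learned hyperparameters discussed above)
    # strongly favours some transitions over others.
    print(dirichlet_multinomial_logprob([8, 1, 0], [2.0, 0.3, 0.3]))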
Cheung, Jackie Chi Kit and Penn, Gerald
Distributional Semantic Hidden Markov Models
We follow the “neutral” setting of hyperparameters given by Ormoneit and Tresp (1995), so that the MAP estimate for the covariance matrix for (event or slot) state i becomes:
Distributional Semantic Hidden Markov Models
where j indexes all the relevant semantic vectors x_j in the training set, r_ij is the posterior responsibility of state i for vector x_j, and β is the remaining hyperparameter that we tune to adjust the amount of regularization.
Distributional Semantic Hidden Markov Models
We tune the hyperparameters (N_E, N_S, δ, β, k) and the number of EM iterations by twofold cross-validation.
Guided Summarization Slot Induction
We trained a DSHMM separately for each of the five domains with different semantic models, tuning hyperparameters by twofold cross-validation.
Related Work
Distributions that generate the latent variables and hyperparameters are omitted for clarity.
hyperparameters is mentioned in 6 sentences in this paper.
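The excerpts above describe a MAP covariance estimate per (event or slot) state, with responsibilities r_ij weighting the semantic vectors x_j and a hyperparameter β controlling the amount of regularization. Below is a generic Python sketch of a responsibility-weighted covariance with β-controlled shrinkage toward the identity; the array names are assumptions and this is not necessarily the exact Ormoneit and Tresp (1995) estimator used in the paper.

    import numpy as np

    def regularized_state_covariance(X, resp_i, beta):
        # X: semantic vectors x_j (rows); resp_i: responsibilities r_ij of state i.
        w = resp_i / resp_i.sum()
        mu = w @ X                                 # responsibility-weighted mean
        diff = X - mu
        cov = (w[:, None] * diff).T @ diff         # weighted sample covariance
        # Shrink toward the identity; beta trades off data fit vs. regularization.
        return (resp_i.sum() * cov + beta * np.eye(X.shape[1])) / (resp_i.sum() + beta)

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    resp = rng.random(200)
    print(regularized_state_covariance(X, resp, beta=2.0).shape)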
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro and Takamura, Hiroya and Okumura, Manabu
Bilingual Infinite Tree Model
Our procedure alternates between sampling each of the following variables: the auxiliary variables u, the state assignments z, the transition probabilities π, the shared DP parameters β, and the hyperparameters α0 and γ.
Bilingual Infinite Tree Model
α0 is parameterized by a gamma hyperprior with hyperparameters α_a and α_b.
Bilingual Infinite Tree Model
γ is parameterized by a gamma hyperprior with hyperparameters γ_a and γ_b.
Experiment
In sampling α0 and γ, the hyperparameters α_a, α_b, γ_a, and γ_b are set to 2, 1, 1, and 1, respectively, which is the same setting as in Gael et al.
Experiment
The development test data is used to set up hyperparameters, i.e., to terminate tuning iterations.
hyperparameters is mentioned in 5 sentences in this paper.
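In the excerpts above, the concentration hyperparameters α0 and γ are themselves given gamma hyperpriors (with parameters α_a, α_b and γ_a, γ_b) and resampled during inference. The Python sketch below shows one generic way to resample such a parameter, a log-space Metropolis-Hastings step under a Gamma(shape, rate) hyperprior; the paper's sampler may use a different scheme, and log_likelihood is a hypothetical stand-in.

    import numpy as np
    from scipy.stats import gamma

    def mh_resample_concentration(cur, log_likelihood, shape, rate, rng, step=0.1):
        # One Metropolis-Hastings update of a concentration parameter with a
        # Gamma(shape, rate) hyperprior, using a multiplicative random-walk proposal.
        prop = cur * np.exp(step * rng.normal())
        log_ratio = (log_likelihood(prop) - log_likelihood(cur)
                     + gamma.logpdf(prop, shape, scale=1.0 / rate)
                     - gamma.logpdf(cur, shape, scale=1.0 / rate)
                     + np.log(prop) - np.log(cur))   # proposal asymmetry correction
        return prop if np.log(rng.random()) < log_ratio else cur

    rng = np.random.default_rng(0)
    loglik = lambda a: -0.5 * (a - 2.0) ** 2         # hypothetical stand-in likelihood
    alpha0 = 1.0
    for _ in range(200):
        alpha0 = mh_resample_concentration(alpha0, loglik, shape=2.0, rate=1.0, rng=rng)
    print(alpha0)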
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Hierarchical Phrase Table Combination
All the parameters θ_j and hyperparameters d_j and s_j are obtained by learning on the j-th domain.
Hierarchical Phrase Table Combination
Retuning the hyperparameters when cascading another domain may improve the performance of the combination weight, but we will leave it for future work.
Phrase Pair Extraction with Unsupervised Phrasal ITGs
3. d and s are the discount and strength hyperparameters.
Related Work
However, their methods usually require a number of hyperparameters, such as mini-batch size, step size, or human judgment to determine the quality of phrases, and still rely on a heuristic phrase extraction method in each phrase table update.
hyperparameters is mentioned in 4 sentences in this paper.
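The discount (d) and strength (s) hyperparameters above are the two parameters of a Pitman-Yor process, with each domain's phrase distribution backing off to the one below it in the hierarchy. A textbook Python sketch of the Pitman-Yor (Chinese restaurant process) predictive probability with a back-off base distribution; the argument names are hypothetical and this is not the paper's exact estimator.

    def pitman_yor_predictive(count_w, tables_w, total_count, total_tables,
                              d, s, base_prob):
        # P(w) = (c_w - d*t_w)/(s + c) + (s + d*t)/(s + c) * P_base(w),
        # where c/t are per-phrase and total counts/table counts, d is the discount
        # and s the strength; P_base is e.g. the previous domain's distribution.
        return (max(count_w - d * tables_w, 0.0)
                + (s + d * total_tables) * base_prob) / (s + total_count)

    print(pitman_yor_predictive(count_w=5, tables_w=2, total_count=100,
                                total_tables=30, d=0.5, s=1.0, base_prob=0.01))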
Bamman, David and O'Connor, Brendan and Smith, Noah A.
Models
  • P: Number of personas (hyperparameter)
  • K: Number of word topics (hyperparameter)
  • D: Number of movie plot summaries
  • E: Number of characters in movie d
  • W: Number of (role, word) tuples used by character e
  • φ_k: Topic k's distribution over V words
Models
Next, let a persona p be defined as a set of three multinomials ψ_p over these K topics, one for each typed role r, each drawn from a Dirichlet with a role-specific hyperparameter ν_r.
Models
In other words, the probability that character e embodies persona k is proportional to the number of other characters in the plot summary who also embody that persona (plus the Dirichlet hyperparameter α) times the contribution of each observed word w_j for that character, given its current topic assignment z_j.
hyperparameters is mentioned in 3 sentences in this paper.
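The last excerpt above states that the probability of a character embodying a persona is proportional to the number of other characters already assigned that persona (plus the Dirichlet hyperparameter) times the likelihood of the character's observed words. A small Python sketch of that proportionality with hypothetical array names; the per-word likelihood factors are taken as given rather than recomputed from topic counts.

    import numpy as np

    def persona_posterior(persona_counts, alpha, word_likelihoods):
        # persona_counts[k]: other characters in the summary with persona k;
        # word_likelihoods[k]: product over the character's words of their
        # probability under persona k (given current topic assignments z_j).
        weights = (persona_counts + alpha) * word_likelihoods
        return weights / weights.sum()

    counts = np.array([3.0, 0.0, 1.0])
    likelihoods = np.array([0.02, 0.05, 0.01])
    print(persona_posterior(counts, alpha=0.5, word_likelihoods=likelihoods))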
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Method
μ and λ are two hyperparameters whose values are discussed in Section 5.
Method
Based on the development data, the hyperparameters of our model were tuned among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and λ ∈ {0.1, 0.3, 0.5, 0.8}; for the CRFs training, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Method
With the chosen set of hyperparameters, the test data was used to measure the final performance.
hyperparameters is mentioned in 3 sentences in this paper.
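The excerpts above describe tuning μ, λ and α by grid search on development data and then measuring final performance once on the test set. A minimal Python sketch of that tuning loop over the listed grids; evaluate_on_dev is a hypothetical stand-in for training and scoring a model on the development data.

    import itertools

    def evaluate_on_dev(mu, lam, alpha):
        # Hypothetical stand-in: train with these hyperparameters and return a dev score.
        return -(abs(mu - 0.5) + abs(lam - 0.3) + abs(alpha - 0.7))

    grid = itertools.product([0.2, 0.5, 0.8],             # mu    (graph propagation)
                             [0.1, 0.3, 0.5, 0.8],        # lambda
                             [0.1, 0.3, 0.5, 0.7, 0.9])   # alpha (CRF training)
    best = max(grid, key=lambda cfg: evaluate_on_dev(*cfg))
    print("chosen hyperparameters (mu, lambda, alpha):", best)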