Index of papers in Proc. ACL 2014 that mention
  • hyperparameter
Frank, Stella and Feldman, Naomi H. and Goldwater, Sharon
Experiments
Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do).
Experiments
Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling.
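A minimal sketch of this experimental protocol, assuming a model object that exposes one Gibbs sweep over the latent variables plus a hyperparameter-resampling step; `make_model`, `resample_assignments`, `resample_hyperparameters`, and `state` are illustrative names, not the paper's API:

```python
import random

def run_condition(make_model, n_runs=5, n_iters=1500):
    """Run one (model, vowel speakers, consonant set) condition n_runs times,
    interleaving Gibbs sweeps with hyperparameter resampling (hypothetical API)."""
    results = []
    for run in range(n_runs):
        random.seed(run)                      # a fresh, reproducible chain per run
        model = make_model()
        for _ in range(n_iters):
            model.resample_assignments()      # one Gibbs sweep over latent variables
            model.resample_hyperparameters()  # e.g. resample Dirichlet concentrations
        results.append(model.state())         # final sample from this chain
    return results
```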
Inference: Gibbs Sampling
Square nodes depict hyperparameters.
Inference: Gibbs Sampling
Λ is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).
Inference: Gibbs Sampling
5.3 Hyperparameters
hyperparameter is mentioned in 7 sentences in this paper.
Yang, Min and Zhu, Dingju and Chow, Kam-Pui
Algorithm
scalars γ(e) and γ(n) are hyperparameters of the
Algorithm
The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β(n).
Algorithm
They are generated from Dirichlet priors Dir(α(e)) and Dir(α(n)), with α(e) and α(n) being hyperparameters.
Experiments
We first settle the implementation details for the EaLDA model, specifying the hyperparameters chosen for the experiment.
Experiments
We set topic numbers M = 6 and K = 4, and hyperparameters α = 0.75, α(e) = α(n) = 0.45, β(n) = 0.5.
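Reading through the recovered symbols, a minimal numpy sketch of what these settings mean for the standard-LDA-style part of the generative process; the vocabulary size, and the assumption that M counts the non-emotion topics, are illustrative guesses rather than details from the paper:

```python
import numpy as np

# Settings quoted from the snippet above.
M, K = 6, 4            # topic numbers
alpha = 0.75           # Dirichlet hyperparameter over topic proportions
alpha_e = alpha_n = 0.45
beta_n = 0.5           # scalar prior for non-emotion topic-word distributions

V = 10_000             # vocabulary size (illustrative)
rng = np.random.default_rng(0)

# Standard-LDA draw: each row is one (assumed non-emotion) topic's
# distribution over the vocabulary, phi ~ Dir(beta_n, ..., beta_n).
phi_n = rng.dirichlet(np.full(V, beta_n), size=M)
```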
hyperparameter is mentioned in 6 sentences in this paper.
Yogatama, Dani and Smith, Noah A.
Experiments
Table 6 shows examples of zero and nonzero topics for the dev.-tuned hyperparameter values.
Group Lasso
where λglas is a hyperparameter tuned on development data, and λg is a group-specific weight.
Notation
Both methods disprefer weights of large magnitude; smaller (relative) magnitude means a feature (here, a word) has a smaller effect on the prediction, and zero means a feature has no effect. The hyperparameter λ in each case is typically tuned on a development dataset.
Structured Regularizers for Text
As a result, besides λglas, we have an additional hyperparameter, denoted by λlas.
Structured Regularizers for Text
Since the lasso-like penalty does not occur naturally in a non-tree-structured regularizer, we add an additional lasso penalty for each word type (with hyperparameter λlas) to also encourage weights of irrelevant words to go to zero.
Structured Regularizers for Text
Similar to the parse tree regularizer, for the lasso-like penalty on each word, we tune one group weight for all word types on development data with a hyperparameter λlas.
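Taken together, these snippets describe a penalty of the form λglas Σg λg‖wg‖2 plus a per-word lasso term λlas‖w‖1. A minimal sketch of that combined regularizer, with the group structure left as a placeholder:

```python
import numpy as np

def structured_penalty(w, groups, lam_glas, lam_g, lam_las):
    """lam_glas * sum_g lam_g[g] * ||w[groups[g]]||_2  +  lam_las * ||w||_1.

    w        : weight vector
    groups   : list of index arrays, one per group (placeholder structure)
    lam_glas : group-lasso hyperparameter, tuned on development data
    lam_g    : per-group weights
    lam_las  : additional lasso hyperparameter for word-type weights
    """
    group_term = sum(lg * np.linalg.norm(w[idx])
                     for lg, idx in zip(lam_g, groups))
    return lam_glas * group_term + lam_las * np.abs(w).sum()
```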
hyperparameter is mentioned in 6 sentences in this paper.
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Experiments
There are four hyperparameters in our model, to be tuned using the development data (devMT) among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and ρ ∈ {0.1, 0.3, 0.5, 0.8}; for the PR learning, λ ∈ {0 ≤ λ ≤ 1} and α ∈ {0 ≤ α ≤ 1}, where the step is 0.1.
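A sketch of the grid search this implies; `evaluate_on_dev` is a placeholder for training/decoding with one setting and scoring it on devMT:

```python
import itertools
import numpy as np

# Grids quoted from the snippet; lambda and alpha range over [0, 1] in steps of 0.1.
mu_grid     = [0.2, 0.5, 0.8]
rho_grid    = [0.1, 0.3, 0.5, 0.8]
lambda_grid = np.arange(0.0, 1.01, 0.1)
alpha_grid  = np.arange(0.0, 1.01, 0.1)

def grid_search(evaluate_on_dev):
    """Exhaustive search over the four hyperparameters."""
    best_score, best_setting = float("-inf"), None
    for mu, rho, lam, alpha in itertools.product(
            mu_grid, rho_grid, lambda_grid, alpha_grid):
        score = evaluate_on_dev(mu=mu, rho=rho, lam=lam, alpha=alpha)
        if score > best_score:
            best_score, best_setting = score, (mu, rho, lam, alpha)
    return best_setting, best_score
```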
Experiments
The optimal hyperparameter values were found to be: STS-NO-GP (α = 0.8 and η = 0.6) and STS-GP-PL (μ = 0.5, ρ = 0.3, α = 0.8 and η = 0.6).
Experiments
The optimal hyperparameter values were found to be: VES-NO-GP (α = 0.7) and VES-GP-PL (μ = 0.5, ρ = 0.3 and α = 0.7).
Methodology
The hyperparameter λ is used to control the impact of the penalty term.
hyperparameter is mentioned in 4 sentences in this paper.
Bamman, David and Underwood, Ted and Smith, Noah A.
Model
The generative story runs as follows (Figure 2 depicts the full graphical model): Let there be M unique authors in the data, P latent personas (a hyperparameter to be set), and V words in the vocabulary (in the general setting these may be word types; in our data the vocabulary is the set of 1,000 unique cluster IDs).
Model
This is proportional to the number of other characters in document d who also (currently) have that persona (plus the Dirichlet hyperparameter, which acts as a smoother) times the probability (under pd,c = z) of all of the words
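In code, that proportional update looks roughly as follows; the per-persona word log-likelihood is a placeholder for the rest of the model:

```python
import numpy as np

def persona_posterior(other_counts, alpha, word_loglik):
    """Unnormalized Gibbs posterior over personas for one character.

    other_counts : per-persona counts of the *other* characters in
                   document d currently assigned to each persona
    alpha        : Dirichlet hyperparameter (the smoother in the snippet)
    word_loglik  : per-persona log-likelihood of this character's words
                   (placeholder for the rest of the model)
    """
    log_p = np.log(other_counts + alpha) + word_loglik
    log_p -= log_p.max()          # stabilize before exponentiating
    p = np.exp(log_p)
    return p / p.sum()            # normalized distribution to sample from
```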
Model
P: Number of personas (hyperparameter)
D: Number of documents
cd: Number of characters in document d
Wd,c: Number of (cluster, role) tuples for character c
md: Metadata for document d (ranges over M authors)
θd: Document d's distribution over personas
pd,c: Character c's persona
j: An index for a ⟨r, w⟩ tuple in the data
wj: Word cluster ID for tuple j
rj: Role for tuple j ∈ {agent, patient, poss, pred}
η: Coefficients for the log-linear language model
μ, λ: Laplace mean and scale (for regularizing η)
α: Dirichlet concentration parameter
hyperparameter is mentioned in 3 sentences in this paper.
Peng, Nanyun and Wang, Yiming and Dredze, Mark
Code-Switching
We use asymmetric Dirichlet priors (Wallach et al., 2009), and let the optimization process learn the hyperparameters.
Code-Switching
We optimize the hyperparameters α, β, γ and δ by interleaving sampling iterations with a Newton-Raphson update to obtain the MLE estimate for the hyperparameters.
Code-Switching
Where H is the Hessian matrix and ∂L/∂α is the gradient of the likelihood function with respect to the hyperparameter being optimized.
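A minimal sketch of one such Newton-Raphson update, with the gradient and Hessian of the likelihood supplied as placeholder callables:

```python
import numpy as np

def newton_raphson_step(hyper, grad_fn, hess_fn):
    """One update hyper <- hyper - H^{-1} g for a vector of hyperparameters;
    grad_fn and hess_fn are placeholders for the likelihood's gradient and
    Hessian with respect to the hyperparameters."""
    g = grad_fn(hyper)
    H = hess_fn(hyper)
    return hyper - np.linalg.solve(H, g)  # solving is stabler than inverting H

def optimize_hyperparameters(hyper, grad_fn, hess_fn, n_steps=10, tol=1e-6):
    """Iterate to an approximate MLE, as interleaved with sampling iterations."""
    for _ in range(n_steps):
        new = newton_raphson_step(hyper, grad_fn, hess_fn)
        if np.max(np.abs(new - hyper)) < tol:
            return new
        hyper = new
    return hyper
```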
hyperparameter is mentioned in 3 sentences in this paper.