Experiments | Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do). |
Experiments | Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling. |
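Experiments | As a minimal sketch of that experimental loop (the `gibbs_sweep`, `resample_hypers`, and `init_state` callables below are illustrative placeholders, not the paper's code):

```python
import random

N_RUNS, N_ITERS = 5, 1500  # five runs per condition, 1500 Gibbs iterations each

def run_condition(gibbs_sweep, resample_hypers, init_state, seed):
    """One run: interleave Gibbs sweeps over the latent variables with
    resampling of the hyperparameters themselves."""
    rng = random.Random(seed)
    state = init_state(rng)
    for _ in range(N_ITERS):
        state = gibbs_sweep(state, rng)       # resample latent assignments
        state = resample_hypers(state, rng)   # e.g. sample concentration parameters
    return state

# Five independent runs of one (model, vowel speakers, consonant set) condition.
states = [run_condition(lambda s, r: s, lambda s, r: s, lambda r: {}, seed)
          for seed in range(N_RUNS)]
```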
Inference: Gibbs Sampling | Square nodes depict hyperparameters.
Inference: Gibbs Sampling | λ is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).
Inference: Gibbs Sampling | 5.3 Hyperparameters |
Algorithm | scalars β^(e) and β^(n) are hyperparameters of the
Algorithm | The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β^(n).
Algorithm | They are generated from Dirichlet priors Dir(α^(e)) and Dir(α^(n)), with α^(e) and α^(n) being hyperparameters.
Experiments | We first settle the implementation details for the EaLDA model, specifying the hyperparameters chosen for the experiment.
Experiments | We set the topic numbers M = 6 and K = 4, and the hyperparameters α = 0.75, α^(e) = α^(n) = 0.45, and β^(n) = 0.5.
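Experiments | To make these settings concrete, a sketch of how such scalar Dirichlet hyperparameters enter the generative draws (symbol names follow the reconstruction above; the vocabulary size V is a placeholder):

```python
import numpy as np

M, K = 6, 4                # non-emotion and emotion topic numbers, as set above
V = 5000                   # vocabulary size (placeholder)
alpha = 0.75               # document-level Dirichlet hyperparameter
alpha_e = alpha_n = 0.45   # priors on the emotion / non-emotion topic mixtures
beta_n = 0.5               # scalar prior for non-emotion topic word distributions

rng = np.random.default_rng(0)
# A scalar hyperparameter is broadcast to a symmetric Dirichlet parameter vector.
theta_e = rng.dirichlet(np.full(K, alpha_e))       # mixture over emotion topics
theta_n = rng.dirichlet(np.full(M, alpha_n))       # mixture over non-emotion topics
phi_n = rng.dirichlet(np.full(V, beta_n), size=M)  # word dists, standard LDA style
```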
Experiments | Table 6 shows examples of zero and nonzero topics for the dev.-tuned hyperparameter values. |
Group Lasso | where λ_glas is a hyperparameter tuned on development data, and λ_g is a group-specific weight.
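Group Lasso | A minimal sketch of the group-lasso penalty this describes, Ω_glas(w) = λ_glas Σ_g λ_g ||w_g||_2, with groups given as index lists (an illustration, not the paper's implementation):

```python
import numpy as np

def group_lasso_penalty(w, groups, lam_glas, group_weights):
    """lam_glas * sum over groups g of lam_g * ||w_g||_2."""
    return lam_glas * sum(
        lam_g * np.linalg.norm(w[idx])  # L2 norm of the group's weights
        for idx, lam_g in zip(groups, group_weights)
    )

w = np.array([0.5, -1.2, 0.0, 0.3])
print(group_lasso_penalty(w, groups=[[0, 1], [2, 3]],
                          lam_glas=0.1, group_weights=[1.0, 1.0]))
```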
Notation | Both methods disprefer weights of large magnitude; smaller (relative) magnitude means a feature (here, a word) has a smaller effect on the prediction, and zero means a feature has no effect. The hyperparameter λ in each case is typically tuned on a development dataset.
Structured Regularizers for Text | As a result, besides λ_glas, we have an additional hyperparameter, denoted by λ_las.
Structured Regularizers for Text | Since the lasso-like penalty does not occur naturally in a non-tree-structured regularizer, we add an additional lasso penalty for each word type (with hyperparameter λ_las) to also encourage weights of irrelevant words to go to zero.
Structured Regularizers for Text | Similar to the parse-tree regularizer, for the lasso-like penalty on each word, we tune one group weight for all word types on development data with a hyperparameter λ_las.
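Structured Regularizers for Text | Under that description, the full regularizer adds a per-word lasso term to the group term, Ω(w) = λ_glas Σ_g λ_g ||w_g||_2 + λ_las ||w||_1; a sketch under those assumptions:

```python
import numpy as np

def structured_penalty(w, groups, lam_glas, group_weights, lam_las):
    """Group-lasso term plus a per-word-type lasso term."""
    group_term = lam_glas * sum(
        lam_g * np.linalg.norm(w[idx])
        for idx, lam_g in zip(groups, group_weights)
    )
    lasso_term = lam_las * np.abs(w).sum()  # drives irrelevant word weights to zero
    return group_term + lasso_term
```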
Experiments | There are four hyperparameters in our model, tuned using the development data (devMT) over the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and ρ ∈ {0.1, 0.3, 0.5, 0.8}; for the PR learning, α ∈ {0 ≤ α ≤ 1} and η ∈ {0 ≤ η ≤ 1}, where the step is 0.1.
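Experiments | A generic sketch of the grid search this implies; `score_on_dev` below is a stand-in for training the model and scoring it on devMT:

```python
import itertools

def score_on_dev(mu, rho, alpha, eta):
    # Stand-in scorer, built so its optimum matches the values reported below.
    return -(mu - 0.5)**2 - (rho - 0.3)**2 - (alpha - 0.8)**2 - (eta - 0.6)**2

mu_grid = [0.2, 0.5, 0.8]
rho_grid = [0.1, 0.3, 0.5, 0.8]
alpha_grid = [round(0.1 * i, 1) for i in range(11)]  # 0 <= alpha <= 1, step 0.1
eta_grid = [round(0.1 * i, 1) for i in range(11)]    # 0 <= eta <= 1, step 0.1

best = max(itertools.product(mu_grid, rho_grid, alpha_grid, eta_grid),
           key=lambda cfg: score_on_dev(*cfg))
print(best)  # (0.5, 0.3, 0.8, 0.6) with this stand-in scorer
```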
Experiments | The optimal hyperparameter values were found to be: STS-NO-GP (α = 0.8 and η = 0.6) and STS-GP-PL (μ = 0.5, ρ = 0.3, α = 0.8 and η = 0.6).
Experiments | The optimal hyperparameter values were found to be: VES-NO-GP (α = 0.7) and VES-GP-PL (μ = 0.5, ρ = 0.3 and α = 0.7).
Methodology | The hyperparameter λ is used to control the impact of the penalty term.
Model | The generative story runs as follows (Figure 2 depicts the full graphical model): Let there be M unique authors in the data, P latent personas (a hyperparameter to be set), and V words in the vocabulary (in the general setting these may be word types; in our data the vocabulary is the set of 1,000 unique cluster IDs). |
Model | This is proportional to the number of other characters in document d who also (currently) have that persona (plus the Dirichlet hyperparameter, which acts as a smoother), times the probability (under p_{d,c} = z) of all of the words
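Model | A sketch of that collapsed-Gibbs conditional, assuming count arrays maintained during sampling (names are illustrative, not the paper's):

```python
import numpy as np

def persona_conditional(doc_persona_counts, word_lik, alpha):
    """P(p_{d,c} = z | ...) up to normalization: (count of other characters
    in document d with persona z + alpha) * P(character's words | z)."""
    scores = (doc_persona_counts + alpha) * word_lik
    return scores / scores.sum()

counts = np.array([3, 0, 1])             # other characters' current personas in d
word_lik = np.array([1e-5, 4e-6, 2e-5])  # per-persona likelihood of this character's words
probs = persona_conditional(counts, word_lik, alpha=0.1)
z_new = np.random.default_rng(0).choice(len(probs), p=probs)  # resample the persona
```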
Model | Notation:
P        Number of personas (hyperparameter)
D        Number of documents
C_d      Number of characters in document d
W_{d,c}  Number of (cluster, role) tuples for character c
m_d      Metadata for document d (ranges over M authors)
θ_d      Document d's distribution over personas
p_{d,c}  Character c's persona
j        An index for a ⟨r, w⟩ tuple in the data
w_j      Word cluster ID for tuple j
r_j      Role for tuple j ∈ {agent, patient, poss, pred}
η        Coefficients for the log-linear language model
μ, λ     Laplace mean and scale (for regularizing η)
α        Dirichlet concentration parameter
Code-Switching | We use asymmetric Dirichlet priors (Wallach et al., 2009), and let the optimization process learn the hyperparameters.
Code-Switching | We optimize the hyperparameters α, β, γ, and δ by interleaving sampling iterations with a Newton-Raphson update to obtain the MLE estimate for the hyperparameters.
Code-Switching | where H is the Hessian matrix and ∂L/∂α is the gradient of the likelihood function with respect to the hyperparameter α being optimized.
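Code-Switching | In generic form each step is α ← α − H^{-1} (∂L/∂α); a minimal scalar sketch with user-supplied gradient and Hessian functions (placeholders, not the paper's code):

```python
def newton_raphson_step(alpha, grad_fn, hess_fn):
    """One update: alpha - H^{-1} * dL/dalpha (scalar case)."""
    return alpha - grad_fn(alpha) / hess_fn(alpha)

def optimize_hyperparameter(alpha, grad_fn, hess_fn, tol=1e-6, max_iter=100):
    """Iterate Newton-Raphson updates to a (local) MLE of the hyperparameter."""
    for _ in range(max_iter):
        new_alpha = newton_raphson_step(alpha, grad_fn, hess_fn)
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha

# Toy check: maximize L(alpha) = -(alpha - 2)^2, whose optimum is alpha = 2.
print(optimize_hyperparameter(0.5, lambda a: -2 * (a - 2), lambda a: -2.0))
```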