Experiments | Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do). |
Experiments | Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling. |
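Experiments | As a minimal sketch of that experimental loop (the `gibbs_sweep`, `resample_hypers`, and `init_state` callables below are illustrative placeholders, not the paper's code):

```python
import random

N_RUNS, N_ITERS = 5, 1500  # five runs per condition, 1500 Gibbs iterations each

def run_condition(gibbs_sweep, resample_hypers, init_state, seed):
    """One run: interleave Gibbs sweeps over the latent variables with
    resampling of the hyperparameters themselves."""
    rng = random.Random(seed)
    state = init_state(rng)
    for _ in range(N_ITERS):
        state = gibbs_sweep(state, rng)       # resample latent assignments
        state = resample_hypers(state, rng)   # e.g. sample concentration parameters
    return state

# Five independent runs of one (model, vowel speakers, consonant set) condition.
states = [run_condition(lambda s, r: s, lambda s, r: s, lambda r: {}, seed)
          for seed in range(N_RUNS)]
```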
Inference: Gibbs Sampling | Square nodes depict hyperparameters.
Inference: Gibbs Sampling | λ is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).
Inference: Gibbs Sampling | 5.3 Hyperparameters |
Algorithm | scalars β^(e) and β^(n) are hyperparameters of the
Algorithm | The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β^(n).
Algorithm | They are generated from Dirichlet priors Dir(α^(e)) and Dir(α^(n)), with α^(e) and α^(n) being hyperparameters.
Experiments | We first settle the implementation details for the EaLDA model, specifying the hyperparameters chosen for the experiment.
Experiments | We set the topic numbers M = 6 and K = 4, and the hyperparameters α = 0.75, α^(e) = α^(n) = 0.45, and β^(n) = 0.5.
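Experiments | To make these settings concrete, a sketch of how such scalar Dirichlet hyperparameters enter the generative draws (symbol names follow the reconstruction above; the vocabulary size V is a placeholder):

```python
import numpy as np

M, K = 6, 4                # non-emotion and emotion topic numbers, as set above
V = 5000                   # vocabulary size (placeholder)
alpha = 0.75               # document-level Dirichlet hyperparameter
alpha_e = alpha_n = 0.45   # priors on the emotion / non-emotion topic mixtures
beta_n = 0.5               # scalar prior for non-emotion topic word distributions

rng = np.random.default_rng(0)
# A scalar hyperparameter is broadcast to a symmetric Dirichlet parameter vector.
theta_e = rng.dirichlet(np.full(K, alpha_e))       # mixture over emotion topics
theta_n = rng.dirichlet(np.full(M, alpha_n))       # mixture over non-emotion topics
phi_n = rng.dirichlet(np.full(V, beta_n), size=M)  # word dists, standard LDA style
```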
Experiments | Table 6 shows examples of zero and nonzero topics for the dev.-tuned hyperparameter values. |
Group Lasso | where λ_glas is a hyperparameter tuned on development data, and λ_g is a group-specific weight.
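Group Lasso | A minimal sketch of the group-lasso penalty this describes, Ω_glas(w) = λ_glas Σ_g λ_g ||w_g||_2, with groups given as index lists (an illustration, not the paper's implementation):

```python
import numpy as np

def group_lasso_penalty(w, groups, lam_glas, group_weights):
    """lam_glas * sum over groups g of lam_g * ||w_g||_2."""
    return lam_glas * sum(
        lam_g * np.linalg.norm(w[idx])  # L2 norm of the group's weights
        for idx, lam_g in zip(groups, group_weights)
    )

w = np.array([0.5, -1.2, 0.0, 0.3])
print(group_lasso_penalty(w, groups=[[0, 1], [2, 3]],
                          lam_glas=0.1, group_weights=[1.0, 1.0]))
```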
Notation | Both methods disprefer weights of large magnitude; smaller (relative) magnitude means a feature (here, a word) has a smaller effect on the prediction, and zero means a feature has no effect. The hyperparameter λ in each case is typically tuned on a development dataset.
Structured Regularizers for Text | As a result, besides λ_glas, we have an additional hyperparameter, denoted by λ_las.
Structured Regularizers for Text | Since the lasso-like penalty does not occur naturally in a non-tree-structured regularizer, we add an additional lasso penalty for each word type (with hyperparameter λ_las) to also encourage weights of irrelevant words to go to zero.
Structured Regularizers for Text | Similar to the parse-tree regularizer, for the lasso-like penalty on each word, we tune one group weight for all word types on development data with a hyperparameter λ_las.
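Structured Regularizers for Text | Under that description, the full regularizer adds a per-word lasso term to the group term, Ω(w) = λ_glas Σ_g λ_g ||w_g||_2 + λ_las ||w||_1; a sketch under those assumptions:

```python
import numpy as np

def structured_penalty(w, groups, lam_glas, group_weights, lam_las):
    """Group-lasso term plus a per-word-type lasso term."""
    group_term = lam_glas * sum(
        lam_g * np.linalg.norm(w[idx])
        for idx, lam_g in zip(groups, group_weights)
    )
    lasso_term = lam_las * np.abs(w).sum()  # drives irrelevant word weights to zero
    return group_term + lasso_term
```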
Experiments | There are four hyperparameters in our model, tuned using the development data (devMT) over the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and ρ ∈ {0.1, 0.3, 0.5, 0.8}; for the PR learning, α ∈ {0 ≤ α ≤ 1} and η ∈ {0 ≤ η ≤ 1}, where the step is 0.1.
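Experiments | A generic sketch of the grid search this implies; `score_on_dev` below is a stand-in for training the model and scoring it on devMT:

```python
import itertools

def score_on_dev(mu, rho, alpha, eta):
    # Stand-in scorer, built so its optimum matches the values reported below.
    return -(mu - 0.5)**2 - (rho - 0.3)**2 - (alpha - 0.8)**2 - (eta - 0.6)**2

mu_grid = [0.2, 0.5, 0.8]
rho_grid = [0.1, 0.3, 0.5, 0.8]
alpha_grid = [round(0.1 * i, 1) for i in range(11)]  # 0 <= alpha <= 1, step 0.1
eta_grid = [round(0.1 * i, 1) for i in range(11)]    # 0 <= eta <= 1, step 0.1

best = max(itertools.product(mu_grid, rho_grid, alpha_grid, eta_grid),
           key=lambda cfg: score_on_dev(*cfg))
print(best)  # (0.5, 0.3, 0.8, 0.6) with this stand-in scorer
```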
Experiments | The optimal hyperparameter values were found to be: STS-NO-GP (α = 0.8 and η = 0.6) and STS-GP-PL (μ = 0.5, ρ = 0.3, α = 0.8 and η = 0.6).
Experiments | The optimal hyperparameter values were found to be: VES-NO-GP (α = 0.7) and VES-GP-PL (μ = 0.5, ρ = 0.3 and α = 0.7).
Methodology | The hyperparameter λ is used to control the impact of the penalty term.
Model | The generative story runs as follows (Figure 2 depicts the full graphical model): Let there be M unique authors in the data, P latent personas (a hyperparameter to be set), and V words in the vocabulary (in the general setting these may be word types; in our data the vocabulary is the set of 1,000 unique cluster IDs). |
Model | This is proportional to the number of other characters in document d who also (currently) have that persona (plus the Dirichlet hyperparameter, which acts as a smoother), times the probability (under p_{d,c} = z) of all of the words
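Model | A sketch of that collapsed-Gibbs conditional, assuming count arrays maintained during sampling (names are illustrative, not the paper's):

```python
import numpy as np

def persona_conditional(doc_persona_counts, word_lik, alpha):
    """P(p_{d,c} = z | ...) up to normalization: (count of other characters
    in document d with persona z + alpha) * P(character's words | z)."""
    scores = (doc_persona_counts + alpha) * word_lik
    return scores / scores.sum()

counts = np.array([3, 0, 1])             # other characters' current personas in d
word_lik = np.array([1e-5, 4e-6, 2e-5])  # per-persona likelihood of this character's words
probs = persona_conditional(counts, word_lik, alpha=0.1)
z_new = np.random.default_rng(0).choice(len(probs), p=probs)  # resample the persona
```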
Model | Notation:
P        Number of personas (hyperparameter)
D        Number of documents
C_d      Number of characters in document d
W_{d,c}  Number of (cluster, role) tuples for character c
m_d      Metadata for document d (ranges over M authors)
θ_d      Document d's distribution over personas
p_{d,c}  Character c's persona
j        An index for a ⟨r, w⟩ tuple in the data
w_j      Word cluster ID for tuple j
r_j      Role for tuple j ∈ {agent, patient, poss, pred}
η        Coefficients for the log-linear language model
μ, λ     Laplace mean and scale (for regularizing η)
α        Dirichlet concentration parameter
Code-Switching | We use asymmetric Dirichlet priors (Wallach et al., 2009), and let the optimization process learn the hyperparameters.
Code-Switching | We optimize the hyperparameters α, β, γ, and δ by interleaving sampling iterations with a Newton-Raphson update to obtain the MLE estimate for the hyperparameters.
Code-Switching | where H is the Hessian matrix and ∂L/∂α is the gradient of the likelihood function with respect to the hyperparameter α being optimized.
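Code-Switching | In generic form each step is α ← α − H^{-1} (∂L/∂α); a minimal scalar sketch with user-supplied gradient and Hessian functions (placeholders, not the paper's code):

```python
def newton_raphson_step(alpha, grad_fn, hess_fn):
    """One update: alpha - H^{-1} * dL/dalpha (scalar case)."""
    return alpha - grad_fn(alpha) / hess_fn(alpha)

def optimize_hyperparameter(alpha, grad_fn, hess_fn, tol=1e-6, max_iter=100):
    """Iterate Newton-Raphson updates to a (local) MLE of the hyperparameter."""
    for _ in range(max_iter):
        new_alpha = newton_raphson_step(alpha, grad_fn, hess_fn)
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    return alpha

# Toy check: maximize L(alpha) = -(alpha - 2)^2, whose optimum is alpha = 2.
print(optimize_hyperparameter(0.5, lambda a: -2 * (a - 2), lambda a: -2.0))
```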