Background | The Dirichlet distribution is parametrized by hyperparameters α_k (> 0). |
Background | where C(k) is the frequency of choice k in the data D. For example, C(k) = C(w_i, f_k) in the estimation of p(·). This is very simple: we just need to add the observed counts to the hyperparameters. |
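The count-adding update described above can be sketched in a few lines; the choice labels and prior values below are hypothetical, and the function name is illustrative:

```python
from collections import Counter

def dirichlet_posterior(alpha, counts):
    """Posterior hyperparameters of a Dirichlet prior: add the observed
    count C(k) of each choice k to its prior hyperparameter alpha_k."""
    return {k: alpha[k] + counts.get(k, 0) for k in alpha}

alpha = {"a": 1.0, "b": 1.0, "c": 1.0}   # symmetric prior (illustrative)
counts = Counter(["a", "a", "b"])        # C(a)=2, C(b)=1, C(c)=0
post = dirichlet_posterior(alpha, counts)
# post == {"a": 3.0, "b": 2.0, "c": 1.0}
```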
Experiments | We randomly chose 200 sets each for sets “A” and “B.” Set “A” is a development set to tune the value of the hyperparameters and |
Experiments | As for BCb, we assumed that all of the hyperparameters had the same value, i.e., α_k = α. |
Experiments | Because tuning hyperparameters involves the possibility of overfitting, its robustness should be assessed. |
Method | Note that with the Dirichlet prior, α′_k = α_k + C(w_1, f_k) and β′_k = β_k + C(w_2, f_k), where α_k and β_k are the hyperparameters of the priors of w_1 and w_2, respectively. |
Method | To put it all together, we can obtain a new Bayesian similarity measure on words, which can be calculated only from the hyperparameters for the Dirichlet prior, α and β, and the observed counts C(w_i, f_k). |
Experiments | Special emphasis is put on corpus construction, determination of upper bounds and baselines, and a sensitivity analysis of important hyperparameters. |
Experiments | SGD receives two hyperparameters as input: the number of iterations T, and the regularization parameter λ. |
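An SGD solver taking exactly these two hyperparameters can be sketched as follows. This is a generic Pegasos-style update for an L2-regularized hinge loss, a plausible stand-in rather than the paper's exact solver:

```python
import numpy as np

def sgd_train(X, y, T, lam, seed=0):
    """Plain SGD for an L2-regularized hinge-loss (SVM-style) linear model.

    T and lam mirror the two hyperparameters named above; the update rule
    itself is an illustrative Pegasos-style sketch."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(X))        # pick a random training example
        eta = 1.0 / (lam * t)           # common 1/(lambda * t) step size
        margin = y[i] * (X[i] @ w)
        # Subgradient of lam/2 * ||w||^2 + hinge(y_i, x_i . w):
        grad = lam * w - (y[i] * X[i] if margin < 1 else 0.0)
        w -= eta * grad
    return w

# Toy usage on a trivially separable problem (data are made up):
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
w = sgd_train(X, y, T=50, lam=0.1)
```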
Experiments | Recall that CL-SCL receives three hyperparameters as input: the number of pivots m, the dimensionality of the cross-lingual representation k, |
Introduction | Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation. |
Supervised evaluation tasks | After choosing hyperparameters to maximize the dev F1, we would retrain the model using these hyperparameters on the full 8936-sentence training set, and evaluate on test. |
Supervised evaluation tasks | One hyperparameter was the ℓ2-regularization σ, which for most models was optimal at 2 or 3.2. |
Supervised evaluation tasks | The word embeddings also required a scaling hyperparameter, as described in Section 7.2. |
Unlabeled Data | We can scale the embeddings by a hyperparameter, to control their standard deviation. |
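A minimal sketch of such scaling, assuming the hyperparameter is interpreted as a target standard deviation for the embedding matrix; the function name and values are illustrative:

```python
import numpy as np

def scale_embeddings(E, target_std):
    """Rescale an embedding matrix so its entries have a chosen standard
    deviation, controlled by the scaling hyperparameter target_std."""
    return E * (target_std / E.std())

# Made-up embeddings: 1000 words, 50 dimensions, std ~4.0 before scaling.
rng = np.random.default_rng(0)
E = rng.normal(0.0, 4.0, size=(1000, 50))
E_scaled = scale_embeddings(E, target_std=0.1)
```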
Experimental setup | Unless stated otherwise, all results are based on runs of 1,000 iterations with 100 classes, with a 200-iteration burn-in period after which hyperparameters were reestimated every 50 iterations. The probabilities estimated by the models (P(n|v, r) for LDA and P(n, v|r) for ROOTH- and DUAL-LDA) were sampled every 50 iterations post-burn-in and averaged over three runs to smooth out variance. |
Results | (2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation. |
Results | In fact, we do not find that performance becomes significantly less robust when hyperparameter reestimation is deactivated; correlation scores simply drop by a small amount (1–2 points), irrespective of the Z chosen. |
Three selectional preference models | Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv. |
Evaluation | All hyperparameters α_c, β_c were held constant at α, β for simplicity and were fit using grid search over α ∈ [10⁻⁶, 10⁶], β ∈ [10⁻³, 0.5]. |
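The grid search over α and β described above can be sketched as follows; the logarithmic grid resolution and the scoring function are hypothetical stand-ins for the actual dev-set objective:

```python
import numpy as np
from itertools import product

def grid_search(score, alphas, betas):
    """Return the (alpha, beta) pair maximizing score(alpha, beta)."""
    return max(product(alphas, betas), key=lambda ab: score(*ab))

# Grids spanning the ranges quoted above (grid density is illustrative):
alphas = np.logspace(-6, 6, 13)               # 1e-6 ... 1e6
betas = np.logspace(-3, np.log10(0.5), 7)     # 1e-3 ... 0.5

# Hypothetical objective that happens to prefer alpha = 1 and small beta:
best = grid_search(lambda a, b: -(np.log10(a) ** 2) - b, alphas, betas)
```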
Evaluation | Hyperparameters were handled the same way as for GS. |
The STSG Model | The hyperparameters α_c can be incorporated into the generative model as random variables; however, we opt to fix these at various constants to investigate different levels of sparsity. |
The STSG Model | Assuming fixed hyperparameters α = {α_c} and β = {β_c}, our inference problem is to find the posterior distribution of the derivation sequences |
Inference | Recall that ν_e is a hyperparameter for the Dirichlet prior on G_0 and depends on the value of the corresponding indicator variable λ_e. |
Inference | Recall that each sparsity indicator λ_e determines the value of the corresponding hyperparameter ν_e of the Dirichlet prior for the character-edit base distribution G_0. |
Model | The prior on the base distribution G_0 is a Dirichlet distribution with hyperparameters ν, i.e., G_0 ∼ Dirichlet(ν). |