Index of papers in Proc. ACL that mention
  • hyperparameters
Cohn, Trevor and Specia, Lucia
Gaussian Process Regression
Specifically, we can derive the gradient of the (log) marginal likelihood with respect to the model hyperparameters (i.e., the kernel and noise parameters)
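The quoted experiments use the GPML Matlab toolbox for this; purely as a hedged illustration of the underlying identity ∂/∂θ log p(y|X) = ½ tr((αα⊤ − K⁻¹) ∂K/∂θ) with α = K⁻¹y, here is a minimal NumPy sketch for the noise-variance hyperparameter (all names are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(X, lengthscale, signal_var):
    """Squared-exponential covariance between all pairs of rows of X."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2.0 * X @ X.T
    return signal_var * np.exp(-0.5 * sq / lengthscale**2)

def log_marglik_and_noise_grad(X, y, lengthscale, signal_var, noise_var):
    n = len(y)
    K = rbf_kernel(X, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))          # K^{-1} y
    log_ml = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(n)))
    # dK/d(noise_var) is the identity, so the trace identity simplifies:
    grad_noise = 0.5 * np.trace(np.outer(alpha, alpha) - K_inv)
    return log_ml, grad_noise
```

A gradient-based optimiser (e.g., the conjugate gradient ascent mentioned below) can then climb the log marginal likelihood in the hyperparameters.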
Gaussian Process Regression
Note that in general the marginal likelihood is non-convex in the hyperparameter values, and consequently the solutions may only be locally optimal.
Gaussian Process Regression
Here we bootstrap the learning of complex models with many hyperparameters by initialising
Multitask Quality Estimation 4.1 Experimental Setup
GP: All GP models were implemented using the GPML Matlab toolbox. Hyperparameter optimisation was performed using conjugate gradient ascent of the log marginal likelihood function, with up to 100 iterations.
Multitask Quality Estimation 4.1 Experimental Setup
The simpler models were initialised with all hyperparameters set to one, while more complex models were initialised using the
hyperparameters is mentioned in 14 sentences in this paper.
Vaswani, Ashish and Huang, Liang and Chiang, David
Conclusion
Even though we have used a small set of gold-standard alignments to tune our hyperparameters, we found that performance was fairly robust to variation in the hyperparameters, and translation performance was good even when gold-standard alignments were unavailable.
Experiments
We have implemented our algorithm as an open-source extension to GIZA++. Usage of the extension is identical to standard GIZA++, except that the user can switch the ℓ0 prior on or off, and adjust the hyperparameters α and β.
Experiments
We set the hyperparameters α and β by tuning on gold-standard word alignments (to maximize F1) when possible.
Experiments
The fact that we had to use hand-aligned data to tune the hyperparameters α and β means that our method is no longer completely unsupervised.
Method
The hyperparameter β controls the tightness of the approximation, as illustrated in Figure 1.
hyperparameters is mentioned in 9 sentences in this paper.
Cohen, Shay B. and Johnson, Mark
Bayesian inference for PCFGs
Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ_0.
Bayesian inference for PCFGs
Result: A vector of parameters θ. Repeat: draw θ from a product of Dirichlets with hyperparameters α + f(t)
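As a hedged sketch of this sampling step (toy dimensions and counts, not the paper's grammar): for each nonterminal, new rule probabilities are drawn from a Dirichlet whose parameters are the hyperparameters α plus the rule counts f(t):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 1.0, 1.0])    # Dirichlet hyperparameters for one nonterminal's rules
f_t = np.array([12, 3, 0])           # rule counts f(t) collected from the current trees
theta = rng.dirichlet(alpha + f_t)   # posterior draw of the rule probabilities
```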
Bayesian inference for PCFGs
Input: Grammar G, vector of trees t, vector of hyperparameters α, previous rule parameters θ.
hyperparameters is mentioned in 9 sentences in this paper.
Kim, Young-Bum and Snyder, Benjamin
Analysis
Next we examine the transition Dirichlet hyperparameters learned by our model.
Analysis
As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions.
Analysis
Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained
Experiments
The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1).
Inference
The second term is the tag transition predictive distribution given Dirichlet hyperparameters, yielding a familiar Polya urn scheme form.
Inference
Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given Dirichlet hyperparameters.
Inference
To sample the Dirichlet hyperparameter for cluster k and transition t → t′, we need to compute:
hyperparameters is mentioned in 8 sentences in this paper.
Kazama, Jun'ichi and De Saeger, Stijn and Kuroda, Kow and Murata, Masaki and Torisawa, Kentaro
Background
The Dirichlet distribution is parametrized by hyperparameters α_k (> 0).
Background
where C(k) is the frequency of choice k in data D; for example, C(k) = C(w_i, f_k) in the estimation of p(f_k | w_i). This is very simple: we just need to add the observed counts to the hyperparameters.
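A minimal sketch of that update, with illustrative numbers: under a Dirichlet prior the posterior parameters are simply the hyperparameters plus the observed counts C(k):

```python
import numpy as np

alpha = np.full(4, 0.5)               # Dirichlet hyperparameters alpha_k
counts = np.array([3, 0, 7, 1])       # observed frequencies C(k) in the data D
posterior = alpha + counts            # conjugacy: posterior is Dirichlet(alpha + counts)
p_hat = posterior / posterior.sum()   # posterior-mean estimate of p(k)
```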
Experiments
We randomly chose 200 sets each for sets “A” and “B.” Set “A” is a development set to tune the value of the hyperparameters and
Experiments
As for BCb, we assumed that all of the hyperparameters had the same value, i.e., α_k = α.
Experiments
Because tuning hyperparameters involves the possibility of overfitting, its robustness should be assessed.
Method
Note that with the Dirichlet prior, α'_k = α_k + C(w_1, f_k) and β'_k = β_k + C(w_2, f_k), where α_k and β_k are the hyperparameters of the priors of w_1 and w_2, respectively.
Method
To put it all together, we can obtain a new Bayesian similarity measure on words, which can be calculated only from the hyperparameters of the Dirichlet priors, α and β, and the observed counts C(w_i, f_k).
hyperparameters is mentioned in 7 sentences in this paper.
Frank, Stella and Feldman, Naomi H. and Goldwater, Sharon
Experiments
Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do).
Experiments
Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling.
Inference: Gibbs Sampling
Square nodes depict hyperparameters.
Inference: Gibbs Sampling
λ is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).
Inference: Gibbs Sampling
5.3 Hyperparameters
hyperparameters is mentioned in 7 sentences in this paper.
Lee, Chia-ying and Glass, James
Experimental Setup
Table 1: The values of the hyperparameters of our model, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix of training data.
Experimental Setup
Hyperparameters and Training Iterations The values of the hyperparameters of our model are shown in Table 1, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix computed from training data.
Inference
We use P(· | ···) to denote a conditional posterior probability given observed data, all the other variables, and the hyperparameters of the model.
Inference
The conjugate prior we use for the two variables is a normal-Gamma distribution with hyperparameters μ_0, κ_0, α_0 and β_0 (Murphy, 2007).
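For reference, a minimal sketch of the standard normal-Gamma conjugate update (the closed form tabulated in Murphy, 2007); the variable names are mine, with (mu0, kappa0, a0, b0) playing the role of the hyperparameters above:

```python
import numpy as np

def normal_gamma_posterior(x, mu0, kappa0, a0, b0):
    """Conjugate update for the unknown mean and precision of a Gaussian."""
    n, xbar = len(x), np.mean(x)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    a_n = a0 + n / 2.0
    b_n = (b0 + 0.5 * np.sum((x - xbar) ** 2)
           + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, a_n, b_n
```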
Inference
Assume we use a symmetric Dirichlet distribution with a positive hyperparameter
Model
2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model.
Results
In the future, we plan to extend the model and infer the values of these hyperparameters from data directly.
hyperparameters is mentioned in 7 sentences in this paper.
Yogatama, Dani and Smith, Noah A.
Experiments
Table 6 shows examples of zero and nonzero topics for the dev.-tuned hyperparameter values.
Group Lasso
where λ_glas is a hyperparameter tuned on development data, and λ_g is a group-specific weight.
Notation
Both methods disprefer weights of large magnitude; smaller (relative) magnitude means a feature (here, a word) has a smaller effect on the prediction, and zero means a feature has no effect. The hyperparameter λ in each case is typically tuned on a development dataset.
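As an illustrative sketch of such development-set tuning (a plain ℓ1-penalised classifier via scikit-learn, not this paper's structured regularizers; X_train, y_train, X_dev, y_dev are assumed to be given):

```python
from sklearn.linear_model import LogisticRegression

best_lam, best_acc = None, -1.0
for lam in (0.01, 0.1, 1.0, 10.0):      # candidate values of the hyperparameter
    clf = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear")
    clf.fit(X_train, y_train)           # assumed training split
    acc = clf.score(X_dev, y_dev)       # assumed development split
    if acc > best_acc:
        best_lam, best_acc = lam, acc
```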
Structured Regularizers for Text
As a result, besides λ_glas, we have an additional hyperparameter, denoted λ_las.
Structured Regularizers for Text
Since the lasso-like penalty does not occur naturally in a non-tree-structured regularizer, we add an additional lasso penalty for each word type (with hyperparameter λ_las) to also encourage the weights of irrelevant words to go to zero.
Structured Regularizers for Text
Similar to the parse tree regularizer, for the lasso-like penalty on each word, we tune one group weight for all word types on development data with a hyperparameter λ_las.
hyperparameters is mentioned in 6 sentences in this paper.
Yang, Min and Zhu, Dingju and Chow, Kam-Pui
Algorithm
scalars γ^(e) and ν^(e) are hyperparameters of the
Algorithm
The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β^(n).
Algorithm
They are generated from Dirichlet priors Dir(α^(e)) and Dir(α^(n)), with α^(e) and α^(n) being hyperparameters.
Experiments
We first settle the implementation details for the EaLDA model, specifying the hyperparameters that we choose for the experiment.
Experiments
We set the topic numbers M = 6 and K = 4, and the hyperparameters α = 0.75, α^(e) = α^(n) = 0.45, γ^(n) = 0.5.
hyperparameters is mentioned in 6 sentences in this paper.
Takamatsu, Shingo and Sato, Issei and Nakagawa, Hiroshi
Experiments
The averages of the hyperparameters of PROP were 0.84 ± 0.05 for λ and 0.85 ± 0.10 for the threshold.
Experiments
Proposed Model (PROP): Using the training data, we determined the two hyperparameters, λ and the threshold used to round φ_rs to 1 or 0, so that they maximized the F value.
Experiments
hand, our model learns parameters such as θ_r for each relation, and thus the hyperparameter of our model does not directly affect its performance.
Generative Model
In this section, we consider relation r, since parameters are conditionally independent if relation r and the hyperparameter are given.
Generative Model
λ is the hyperparameter and m_st is a constant.
Generative Model
where 0 ≤ λ ≤ 1 is the hyperparameter that controls how strongly φ_rs is affected by the main labeling process explained in the previous subsection.
hyperparameters is mentioned in 6 sentences in this paper.
Cheung, Jackie Chi Kit and Penn, Gerald
Distributional Semantic Hidden Markov Models
We follow the “neutral” setting of hyperparameters given by Ormoneit and Tresp (1995), so that the MAP estimate for the covariance matrix for (event or slot) state i becomes:
Distributional Semantic Hidden Markov Models
where j indexes all the relevant semantic vectors x_j in the training set, r_ij is the posterior responsibility of state i for vector x_j, and β is the remaining hyperparameter that we tune to adjust the amount of regularization.
Distributional Semantic Hidden Markov Models
We tune the hyperparameters (N_E, N_S, δ, β, k) and the number of EM iterations by twofold cross-validation.
Guided Summarization Slot Induction
We trained a DSHMM separately for each of the five domains with different semantic models, tuning hyperparameters by twofold cross-validation.
Related Work
Distributions that generate the latent variables and hyperparameters are omitted for clarity.
hyperparameters is mentioned in 6 sentences in this paper.
Das, Dipanjan and Petrov, Slav
Conclusion
Because we are interested in applying our techniques to languages for which no labeled resources are available, we paid particular attention to minimizing the number of free parameters, and used the same hyperparameters for all language pairs.
Experiments and Results
We paid particular attention to minimizing the number of free parameters, and used the same hyperparameters for all language pairs, rather than attempting language-specific tuning.
Experiments and Results
While we tried to minimize the number of free parameters in our model, there are a few hyperparameters that need to be set.
Experiments and Results
Fortunately, performance was stable across various values, and we were able to use the same hyperparameters for all languages.
POS Projection
…, |V_f|) are the label distributions over the foreign language vertices, and μ and ν are hyperparameters that we discuss in §6.4.
hyperparameters is mentioned in 6 sentences in this paper.
Hu, Yuening and Boyd-Graber, Jordan and Satinoff, Brianna
Constraints Shape Topics
In this model, α, β, and η are Dirichlet hyperparameters set by the user; their role is explained below.
Constraints Shape Topics
where n_{d,k} is the number of times topic k is used in document d, n_{k,w_{d,n}} is the number of times the type w_{d,n} is assigned to topic k, α and β are the hyperparameters of the two Dirichlet distributions, and B is the number of top-level branches (this is the vocabulary size for vanilla LDA).
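A hedged sketch of the vanilla-LDA version of this sampling weight, using the count arrays as reconstructed above (array names are illustrative):

```python
import numpy as np

def topic_weights(d, w, n_dk, n_kw, n_k, alpha, beta, V):
    """Unnormalized collapsed-Gibbs weights: (n_dk + alpha) * (n_kw + beta) / (n_k + V*beta)."""
    return (n_dk[d, :] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
```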
Constraints Shape Topics
In order to make the constraints effective, we set the constraint word-distribution hyperparameter η to be much larger than the hyperparameter β for the distribution over constraints and vocabulary.
Simulation Experiment
The hyperparameters for all experiments are α = 0.1, β = 0.01, and η = 100.
hyperparameters is mentioned in 6 sentences in this paper.
Tamura, Akihiro and Watanabe, Taro and Sumita, Eiichiro and Takamura, Hiroya and Okumura, Manabu
Bilingual Infinite Tree Model
Our procedure alternates between sampling each of the following variables: the auxiliary variables u, the state assignments z, the transition probabilities π, the shared DP parameters β, and the hyperparameters α_0 and γ.
Bilingual Infinite Tree Model
α_0 is parameterized by a gamma hyperprior with hyperparameters α_a and α_b.
Bilingual Infinite Tree Model
γ is parameterized by a gamma hyperprior with hyperparameters γ_a and γ_b.
Experiment
In sampling α_0 and γ, the hyperparameters α_a, α_b, γ_a, and γ_b are set to 2, 1, 1, and 1, respectively, which is the same setting as in Van Gael et al.
Experiment
The development test data is used to set the hyperparameters, i.e., to terminate tuning iterations.
hyperparameters is mentioned in 5 sentences in this paper.
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Supervised evaluation tasks
After choosing hyperparameters to maximize the dev F1, we would retrain the model using these hyperparameters on the full 8936-sentence training set, and evaluate on test.
Supervised evaluation tasks
One hyperparameter was l2-regularization sigma, which for most models was optimal at 2 or 3.2.
Supervised evaluation tasks
The word embeddings also required a scaling hyperparameter, as described in Section 7.2.
Unlabeled Data
We can scale the embeddings by a hyperparameter to control their standard deviation.
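A minimal sketch of that scaling (E is an assumed embedding matrix; target_std plays the role of the scaling hyperparameter):

```python
import numpy as np

def scale_embeddings(E, target_std):
    """Rescale embeddings so their entries have standard deviation target_std."""
    return E * (target_std / E.std())
```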
hyperparameters is mentioned in 5 sentences in this paper.
Prettenhofer, Peter and Stein, Benno
Experiments
Special emphasis is put on corpus construction, determination of upper bounds and baselines, and a sensitivity analysis of important hyperparameters.
Experiments
SGD receives two hyperparameters as input: the number of iterations T, and the regularization parameter λ.
Experiments
Recall that CL-SCL receives three hyperparameters as input: the number of pivots m, the dimensionality of the cross-lingual representation k,
Introduction
Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation.
hyperparameters is mentioned in 5 sentences in this paper.
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Experiments
Since our implementation is based on Unicode and learns all hyperparameters from the data, we also confirmed that NPYLM segments the Arabic Gigawords equally well.
Inference
Sample hyperparameters of Θ
Inference
… to estimate λ from the data for a given language and word type. Here, Γ(x) is the Gamma function and a, b are the hyperparameters chosen to give a nearly uniform prior distribution.
Pitman-Yor process and n-gram models
θ, d are hyperparameters that can be learned as Gamma and Beta posteriors, respectively, given the data.
hyperparameters is mentioned in 4 sentences in this paper.
Zhu, Conghui and Watanabe, Taro and Sumita, Eiichiro and Zhao, Tiejun
Hierarchical Phrase Table Combination
All the parameters θ_j and hyperparameters d_j and s_j are obtained by learning on the j-th domain.
Hierarchical Phrase Table Combination
Retuning the hyperparameters when cascading another domain may improve the performance of the combination weight, but we leave this for future work.
Phrase Pair Extraction with Unsupervised Phrasal ITGs
3. d and s are the discount and strength hyperparameters.
Related Work
However, their methods usually require a number of hyperparameters, such as mini-batch size, step size, or human judgments to determine the quality of phrases, and still rely on a heuristic phrase extraction method in each phrase table update.
hyperparameters is mentioned in 4 sentences in this paper.
Zeng, Xiaodong and Chao, Lidia S. and Wong, Derek F. and Trancoso, Isabel and Tian, Liang
Experiments
There are four hyperparameters in our model to be tuned using the development data (devMT), among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and ρ ∈ {0.1, 0.3, 0.5, 0.8}; for the PR learning, λ ∈ {0 ≤ λ ≤ 1} and α ∈ {0 ≤ α ≤ 1}, where the step is 0.1.
Experiments
The optimal hyperparameter values were found to be: STS-NO-GP (α = 0.8 and η = 0.6) and STS-GP-PL (μ = 0.5, ρ = 0.3, α = 0.8 and η = 0.6).
Experiments
The optimal hyperparameter values were found to be: VES-NO-GP (α = 0.7) and VES-GP-PL (μ = 0.5, ρ = 0.3 and α = 0.7).
Methodology
The hyperparameter λ is used to control the impact of the penalty term.
hyperparameters is mentioned in 4 sentences in this paper.
Bansal, Mohit and Klein, Dan
Experiments
We develop our features and tune their hyperparameter values on the ACE04 development set and then use these on the ACE04 test set. On the ACE05 and ACE05-ALL datasets, we directly transfer our Web features and their hyperparameter values from the ACE04 dev-set, without any retuning.
Semantics via Web Features
To capture this effect, we create a feature that indicates whether there is a match in the top k seeds of the two headwords (where k is a hyperparameter to tune).
Semantics via Web Features
We first collect the POS tags (using length-2 character prefixes to indicate coarse parts of speech) of the seeds matched in the top k′ seed lists of the two headwords, where k′ is another hyperparameter to tune.
Semantics via Web Features
We tune a separate bin-size hyperparameter for each of these three features.
hyperparameters is mentioned in 4 sentences in this paper.
Yamangil, Elif and Shieber, Stuart M.
Evaluation
All hyperparameters α_c, β_c were held constant at α, β for simplicity and were fit using grid search over α ∈ [10^-6, 10^6], β ∈ [10^-3, 0.5].
Evaluation
Hyperparameters were handled the same way as for GS.
The STSG Model
The hyperparameters α_c can be incorporated into the generative model as random variables; however, we opt to fix these at various constants to investigate different levels of sparsity.
The STSG Model
Assuming fixed hyperparameters α = {α_c} and β = {β_c}, our inference problem is to find the posterior distribution of the derivation sequences.
hyperparameters is mentioned in 4 sentences in this paper.
Ó Séaghdha, Diarmuid
Experimental setup
Unless stated otherwise, all results are based on runs of 1,000 iterations with 100 classes, with a 200-iteration burn-in period after which hyperparameters were re-estimated every 50 iterations. The probabilities estimated by the models (P(n|v, r) for LDA and P(n, v|r) for ROOTH- and DUAL-LDA) were sampled every 50 iterations post-burn-in and averaged over three runs to smooth out variance.
Results
(2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation.
Results
In fact, we do not find that performance becomes significantly less robust when hyperparameter re-estimation is deactivated; correlation scores simply drop by a small amount (1–2 points), irrespective of the Z chosen.
Three selectional preference models
Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv.
hyperparameters is mentioned in 4 sentences in this paper.
Johnson, Mark
Conclusion and future work
In this paper all of the hyperparameters α_A were tied and varied simultaneously, but it is desirable to learn these from data as well.
Conclusion and future work
Just before the camera-ready version of this paper was due, we developed a method for estimating the hyperparameters by putting a vague Gamma hyperprior on each α_A and sampling using Metropolis-Hastings with a sequence of increasingly narrow Gamma proposal distributions, producing results for each model that are as good as or better than the best ones reported in Table 1.
Word segmentation with adaptor grammars
We tied the Dirichlet Process concentration parameters α and performed runs with α = 1, 10, 100 and 1000; apart from this, no attempt was made to optimize the hyperparameters.
Word segmentation with adaptor grammars
It may be possible to correct this by “tuning” the grammar’s hyperparameters, but we did not attempt this here.
hyperparameters is mentioned in 4 sentences in this paper.
Bamman, David and O'Connor, Brendan and Smith, Noah A.
Models
P: number of personas (hyperparameter); K: number of word topics (hyperparameter); D: number of movie plot summaries; E: number of characters in movie d; W: number of (role, word) tuples used by character e; φ_k: topic k’s distribution over V words.
Models
Next, let a persona p be defined as a set of three multinomials ψ_p over these K topics, one for each typed role r, each drawn from a Dirichlet with a role-specific hyperparameter ν_r.
Models
In other words, the probability that character e embodies persona k is proportional to the number of other characters in the plot summary who also embody that persona (plus the Dirichlet hyperparameter α_k), times the contribution of each observed word w_j for that character, given its current topic assignment z_j.
hyperparameters is mentioned in 3 sentences in this paper.
Shindo, Hiroyuki and Miyao, Yusuke and Fujino, Akinori and Nagata, Masaaki
Inference
4.3 Hyperparameter Estimation
Inference
We treat the hyperparameters {d, θ} as random variables and update their values at every MCMC iteration.
Inference
We place a prior on the hyperparameters as follows: d ∼ Beta(1, 1), θ ∼ Gamma(1, 1).
hyperparameters is mentioned in 3 sentences in this paper.
Nguyen, Viet-An and Boyd-Graber, Jordan and Resnik, Philip
Evaluating Topic Shift Tendency
2008 Elections To obtain a posterior estimate of π (Figure 3), we create 10 chains with hyperparameters sampled from the uniform distribution U(0, 1) and averaged π over the 10 chains (as described in Section 5).
Inference
Marginal counts are represented with · and ∗ represents all hyperparameters.
Topic Segmentation Experiments
Initial hyperparameter values are sampled from U(0, 1) to favor sparsity; statistics are collected after 500 burn-in iterations with a lag of 25 iterations over a total of 5,000 iterations; and slice sampling (Neal, 2003) optimizes the hyperparameters.
hyperparameters is mentioned in 3 sentences in this paper.
Chen, Harr and Benson, Edward and Naseem, Tahira and Barzilay, Regina
Experimental Setup
Training Regimes and Hyperparameters For each run of our model we perform three random restarts to convergence and select the posterior with lowest final free energy.
Experimental Setup
Dirichlet hyperparameters are set to 0.1.
Model
Fixed hyperparameters are subscripted with zero.
hyperparameters is mentioned in 3 sentences in this paper.
Blunsom, Phil and Cohn, Trevor
The PYP-HMM
The arrangement of customers at tables defines a clustering which exhibits a power-law behavior controlled by the hyperparameters a and b.
The PYP-HMM
Sampling hyperparameters We treat the hyperparameters {(a^x, b^x), x ∈ (U, B, T, E, C)} as random variables in our model and infer their values.
The PYP-HMM
The result of this hyperparameter inference is that there are no user-tunable parameters in the model, an important feature that we believe helps explain its consistently high performance across test settings.
hyperparameters is mentioned in 3 sentences in this paper.
Zeng, Xiaodong and Wong, Derek F. and Chao, Lidia S. and Trancoso, Isabel
Method
μ and λ are two hyperparameters whose values are discussed in Section 5.
Method
Based on the development data, the hyperparameters of our model were tuned among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and λ ∈ {0.1, 0.3, 0.5, 0.8}; for the CRF training, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Method
With the chosen set of hyperparameters, the test data was used to measure the final performance.
hyperparameters is mentioned in 3 sentences in this paper.
Bamman, David and Underwood, Ted and Smith, Noah A.
Model
The generative story runs as follows (Figure 2 depicts the full graphical model): Let there be M unique authors in the data, P latent personas (a hyperparameter to be set), and V words in the vocabulary (in the general setting these may be word types; in our data the vocabulary is the set of 1,000 unique cluster IDs).
Model
This is proportional to the number of other characters in document d who also (currently) have that persona (plus the Dirichlet hyperparameter, which acts as a smoother), times the probability (under p_{d,c} = z) of all of the words
Model
P: number of personas (hyperparameter); D: number of documents; C_d: number of characters in document d; W_{d,c}: number of (cluster, role) tuples for character c; m_d: metadata for document d (ranges over M authors); θ_d: document d’s distribution over personas; p_{d,c}: character c’s persona; j: an index for a ⟨r, w⟩ tuple in the data; w_j: word cluster ID for tuple j; r_j: role for tuple j ∈ {agent, patient, poss, pred}; η: coefficients for the log-linear language model; μ, λ: Laplace mean and scale (for regularizing η); α: Dirichlet concentration parameter.
hyperparameters is mentioned in 3 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina and Knight, Kevin
Inference
Recall that ν_e is a hyperparameter for the Dirichlet prior on G_0 and depends on the value of the corresponding indicator variable λ_e.
Inference
Recall that each sparsity indicator λ_e determines the value of the corresponding hyperparameter ν_e of the Dirichlet prior for the character-edit base distribution G_0.
Model
The prior on the base distribution G_0 is a Dirichlet distribution with hyperparameters ν, i.e., G_0 ∼ Dirichlet(ν).
hyperparameters is mentioned in 3 sentences in this paper.
Peng, Nanyun and Wang, Yiming and Dredze, Mark
Code-Switching
We use asymmetric Dirichlet priors (Wallach et al., 2009), and let the optimization process learn the hyperparameters .
Code-Switching
We optimize the hyperparameters α, β, γ and δ by interleaving sampling iterations with a Newton-Raphson update to obtain the MLE estimate for the hyperparameters.
Code-Switching
where H is the Hessian matrix and ∂L/∂α is the gradient of the likelihood function with respect to the hyperparameter being optimized.
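Schematically (a hedged sketch, not the paper’s exact update), one interleaved Newton-Raphson step on a hyperparameter vector looks like:

```python
import numpy as np

def newton_raphson_step(alpha, grad, hessian):
    """alpha <- alpha - H^{-1} g: one Newton-Raphson update of the hyperparameters."""
    return alpha - np.linalg.solve(hessian, grad)
```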
hyperparameters is mentioned in 3 sentences in this paper.
Reisinger, Joseph and Pasca, Marius
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation
where α and η are hyperparameters smoothing the per-attribute-set distribution over concepts and the per-concept attribute distribution, respectively (see Figure 2 for the graphical model).
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation
The hyperparameter γ controls the probability of branching via the per-node Dirichlet Process, and L is the fixed tree depth.
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation
Hyperparameters were α = 0.1, η = 0.1, γ = 1.0.
hyperparameters is mentioned in 3 sentences in this paper.