Gaussian Process Regression | Specifically, we can derive the gradient of the (log) marginal likelihood with respect to the model hyperparameters (i.e., the kernel and noise hyperparameters).
Gaussian Process Regression | Note that in general the marginal likelihood is non-convex in the hyperparameter values, and consequently the solutions may only be locally optimal. |
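For concreteness, here is a minimal sketch of this kind of gradient-based hyperparameter optimisation for GP regression, assuming an RBF kernel with lengthscale, signal variance, and noise variance as the hyperparameters. The names and the use of SciPy (rather than the GPML toolbox mentioned below) are illustrative choices, not the papers' setup; given the non-convexity noted above, multiple restarts are advisable.

```python
# Sketch: GP hyperparameter optimisation by gradient ascent on the
# log marginal likelihood (shown as minimisation of its negative).
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_theta, X, y):
    """Negative log marginal likelihood and gradient w.r.t. log-hyperparameters."""
    ell, sf2, sn2 = np.exp(log_theta)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Kf = sf2 * np.exp(-0.5 * sq / ell ** 2)          # RBF kernel matrix
    K = Kf + sn2 * np.eye(len(y))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y
    nll = 0.5 * y @ alpha + np.sum(np.log(np.diag(L))) \
          + 0.5 * len(y) * np.log(2 * np.pi)
    # d(log lik)/d theta_j = 0.5 * tr((alpha alpha^T - K^{-1}) dK/d theta_j)
    Kinv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(len(y))))
    W = np.outer(alpha, alpha) - Kinv
    dK = [Kf * sq / ell ** 2,            # w.r.t. log lengthscale
          Kf,                            # w.r.t. log signal variance
          sn2 * np.eye(len(y))]          # w.r.t. log noise variance
    grad = np.array([-0.5 * np.sum(W * dKj) for dKj in dK])
    return nll, grad

X, y = np.random.randn(50, 3), np.random.randn(50)   # placeholder data
res = minimize(neg_log_marginal_likelihood, np.zeros(3), args=(X, y), jac=True)
```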
Gaussian Process Regression | Here we bootstrap the learning of complex models with many hyperparameters by initialising them with the learned hyperparameters of simpler models.
Multitask Quality Estimation 4.1 Experimental Setup | GP: All GP models were implemented using the GPML Matlab toolbox. Hyperparameter optimisation was performed using conjugate gradient ascent of the log marginal likelihood function, with up to 100 iterations.
Multitask Quality Estimation 4.1 Experimental Setup | The simpler models were initialised with all hyperparameters set to one, while more complex models were initialised using the learned hyperparameters of the simpler models.
Conclusion | Even though we have used a small set of gold-standard alignments to tune our hyperparameters, we found that performance was fairly robust to variation in the hyperparameters, and translation performance was good even when gold-standard alignments were unavailable.
Experiments | We have implemented our algorithm as an open-source extension to GIZA++. Usage of the extension is identical to standard GIZA++, except that the user can switch the ℓ0 prior on or off, and adjust the hyperparameters α and β.
Experiments | We set the hyperparameters α and β by tuning on gold-standard word alignments (to maximize F1) when possible.
Experiments | The fact that we had to use hand-aligned data to tune the hyperparameters α and β means that our method is no longer completely unsupervised.
Method | The hyperparameter β controls the tightness of the approximation, as illustrated in Figure 1.
Bayesian inference for PCFGs | Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ_0.
Bayesian inference for PCFGs | Result: A vector of parameters θ. repeat: draw θ from a product of Dirichlets with hyperparameters α + f(t)
Bayesian inference for PCFGs | Input: Grammar G, vector of trees t, vector of hyperparameters α, previous rule parameters θ.
Analysis | Next we examine the transition Dirichlet hyperparameters learned by our model. |
Analysis | As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions. |
Analysis | Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained |
Experiments | The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1). |
Inference | The second term is the tag transition predictive distribution given Dirichlet hyperparameters, yielding a familiar Pólya urn scheme form.
Inference | Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given Dirichlet hyperparameters.
Inference | To sample the Dirichlet hyperparameter for cluster k and transition t → t′, we need to compute:
Background | The Dirichlet distribution is parametrized by hyperparameters α_k (> 0).
Background | where C(k) is the frequency of choice k in data D. For example, C(k) = C(w_i, f_k) in the estimation of p(f_k | w_i). This is very simple: we just need to add the observed counts to the hyperparameters.
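As a minimal illustration of this conjugate Dirichlet-multinomial update (the numbers are made up):

```python
# Sketch: the Dirichlet posterior is obtained by adding observed counts
# to the prior hyperparameters.
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])   # prior hyperparameters alpha_k
counts = np.array([5, 0, 2])        # observed counts C(k) in data D
posterior = alpha + counts          # Dirichlet posterior hyperparameters
# Posterior-mean probability of each choice k:
p_k = posterior / posterior.sum()   # -> [0.6, 0.1, 0.3]
```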
Experiments | We randomly chose 200 sets each for sets “A” and “B.” Set “A” is a development set to tune the value of the hyperparameters and |
Experiments | As for BCb, we assumed that all of the hyperparameters had the same value, i.e., α_k = α.
Experiments | Because tuning hyperparameters carries a risk of overfitting, the robustness of the tuned values should be assessed.
Method | Note that with the Dirichlet prior, α′_k = α_k + C(w_1, f_k) and β′_k = β_k + C(w_2, f_k), where α_k and β_k are the hyperparameters of the priors of w_1 and w_2, respectively.
Method | To put it all together, we can obtain a new Bayesian similarity measure on words, which can be calculated only from the hyperparameters for the Dirichlet prior, α and β, and the observed counts C(w_i, f_k).
Experiments | Hyperparameters are inferred, which leads to a dominant topic that includes mainly light verbs (have, let, see, do). |
Experiments | Each condition (model, vowel speakers, consonant set) is run five times, using 1500 iterations of Gibbs sampling with hyperparameter sampling. |
Inference: Gibbs Sampling | Square nodes depict hyperparameters.
Inference: Gibbs Sampling | Λ is the set of hyperparameters used by H_L when generating lexical items (see Section 3.2).
Inference: Gibbs Sampling | 5.3 Hyperparameters |
Experimental Setup | Table 1: The values of the hyperparameters of our model, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix of the training data.
Experimental Setup | Hyperparameters and Training Iterations: The values of the hyperparameters of our model are shown in Table 1, where μ_d and λ_d are the d-th entry of the mean and the diagonal of the inverse covariance matrix computed from the training data.
Inference | We use P(· | ···) to denote a conditional posterior probability given observed data, all the other variables, and the hyperparameters of the model.
Inference | The conjugate prior we use for the two variables is a normal-Gamma distribution with hyperparameters μ_0, κ_0, α_0 and β_0 (Murphy, 2007).
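For reference, a minimal sketch of the conjugate normal-Gamma posterior update, following the standard formulas collected in Murphy (2007); the function name and inputs are illustrative, not the paper's code:

```python
# Sketch: posterior hyperparameters of a normal-Gamma prior on the
# mean and precision of a Gaussian, given observations x.
import numpy as np

def normal_gamma_posterior(x, mu0, kappa0, alpha0, beta0):
    n, xbar = len(x), np.mean(x)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * np.sum((x - xbar) ** 2)
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return mu_n, kappa_n, alpha_n, beta_n
```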
Inference | Assume we use a symmetric Dirichlet distribution with a positive hyperparameter |
Model | Figure 2 shows the graphical model, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model.
Results | In the future, we plan to extend the model and infer the values of these hyperparameters directly from the data.
Experiments | Table 6 shows examples of zero and nonzero topics for the dev.-tuned hyperparameter values. |
Group Lasso | where λ_glas is a hyperparameter tuned on development data, and λ_g is a group-specific weight.
Notation | Both methods disprefer weights of large magnitude; smaller (relative) magnitude means a feature (here, a word) has a smaller effect on the prediction, and zero means a feature has no effect. The hyperparameter λ in each case is typically tuned on a development dataset.
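A minimal sketch of this dev-set tuning loop, using scikit-learn's logistic regression as a stand-in classifier (its C parameter is the inverse of λ); the dataset variables are placeholders:

```python
# Sketch: choose the regularization strength lambda that maximizes
# accuracy on a held-out development set.
from sklearn.linear_model import LogisticRegression

def tune_lambda(X_train, y_train, X_dev, y_dev, lambdas=(0.01, 0.1, 1.0, 10.0)):
    best_lam, best_acc = None, -1.0
    for lam in lambdas:
        model = LogisticRegression(penalty="l2", C=1.0 / lam, max_iter=1000)
        model.fit(X_train, y_train)
        acc = model.score(X_dev, y_dev)   # dev-set accuracy
        if acc > best_acc:
            best_lam, best_acc = lam, acc
    return best_lam
```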
Structured Regularizers for Text | As a result, besides λ_glas, we have an additional hyperparameter, denoted by λ_las.
Structured Regularizers for Text | Since the lasso-like penalty does not occur naturally in a non-tree-structured regularizer, we add an additional lasso penalty for each word type (with hyperparameter λ_las) to also encourage weights of irrelevant words to go to zero.
Structured Regularizers for Text | Similar to the parse tree regularizer, for the lasso-like penalty on each word, we tune one group weight for all word types on development data with a hyperparameter λ_las.
Algorithm | scalars γ^(e) and γ^(n) are hyperparameters of the
Algorithm | The generative process of word distributions for non-emotion topics follows the standard LDA definition with a scalar hyperparameter β^(n).
Algorithm | They are generated from Dirichlet priors Dir(α^(e)) and Dir(α^(n)), with α^(e) and α^(n) being hyperparameters.
Experiments | We first fix the implementation details for the EaLDA model, specifying the hyperparameters that we choose for the experiment.
Experiments | We set the topic numbers M = 6 and K = 4, and the hyperparameters α = 0.75, α^(e) = α^(n) = 0.45, and β^(n) = 0.5.
Experiments | The averages of the hyperparameters of PROP were 0.84 ± 0.05 for λ and 0.85 ± 0.10 for the threshold.
Experiments | Proposed Model (PROP): Using the training data, we determined the two hyperparameters, λ and the threshold for rounding φ_rs to 1 or 0, so that they maximized the F value.
Experiments | On the other hand, our model learns parameters such as α_r for each relation, and thus the hyperparameter of our model does not directly affect its performance.
Generative Model | In this section, we consider relation r, since parameters are conditionally independent if relation r and the hyperparameter are given.
Generative Model | λ is the hyperparameter and m_st is constant.
Generative Model | where 0 ≤ λ ≤ 1 is the hyperparameter that controls how strongly b_rs is affected by the main labeling process explained in the previous subsection.
Distributional Semantic Hidden Markov Models | We follow the “neutral” setting of hyperparameters given by Ormoneit and Tresp (1995), so that the MAP estimate for the covariance matrix for (event or slot) state i becomes:
Distributional Semantic Hidden Markov Models | where j indexes all the relevant semantic vectors x_j in the training set, r_ij is the posterior responsibility of state i for vector x_j, and β is the remaining hyperparameter that we tune to adjust the amount of regularization.
Distributional Semantic Hidden Markov Models | We tune the hyperparameters (N_E, N_S, δ, β, k) and the number of EM iterations by twofold cross-validation.
Guided Summarization Slot Induction | We trained a DSHMM separately for each of the five domains with different semantic models, tuning hyperparameters by twofold cross-validation. |
Related Work | Distributions that generate the latent variables and hyperparameters are omitted for clarity. |
Conclusion | Because we are interested in applying our techniques to languages for which no labeled resources are available, we paid particular attention to minimizing the number of free parameters and used the same hyperparameters for all language pairs.
Experiments and Results | We paid particular attention to minimizing the number of free parameters, and used the same hyperparameters for all language pairs, rather than attempting language-specific tuning.
Experiments and Results | While we tried to minimize the number of free parameters in our model, there are a few hyperparameters that need to be set. |
Experiments and Results | Fortunately, performance was stable across various values, and we were able to use the same hyperparameters for all languages. |
PCS Projection | where q_i (i = 1, …, |V_f|) are the label distributions over the foreign language vertices and μ and ν are hyperparameters that we discuss in §6.4.
Constraints Shape Topics | In this model, α, β, and η are Dirichlet hyperparameters set by the user; their role is explained below.
Constraints Shape Topics | where T_{d,k} is the number of times topic k is used in document d, P_{k,w_{d,n}} is the number of times the type w_{d,n} is assigned to topic k, α and β are the hyperparameters of the two Dirichlet distributions, and B is the number of top-level branches (this is the vocabulary size for vanilla LDA).
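These counts are exactly those needed for a collapsed Gibbs update. The sketch below shows the vanilla-LDA form of the sampling distribution (the constrained variant adds the η-weighted constraint distributions); array names follow the notation above, and the implementation details are assumptions for illustration:

```python
# Sketch: collapsed Gibbs sampling distribution for LDA.
# T is the D x K doc-topic count matrix and P the K x V topic-word count
# matrix, both with the current token's assignment removed.
import numpy as np

def topic_posterior(d, w, T, P, alpha, beta, B):
    """Unnormalized p(z = k | rest) for word type w in document d."""
    return (T[d] + alpha) * (P[:, w] + beta) / (P.sum(axis=1) + B * beta)

def sample_topic(d, w, T, P, alpha, beta, B, rng=np.random.default_rng()):
    p = topic_posterior(d, w, T, P, alpha, beta, B)
    return rng.choice(len(p), p=p / p.sum())
```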
Constraints Shape Topics | In order to make the constraints effective, we set the constraint word-distribution hyperparameter η to be much larger than the hyperparameter for the distribution over constraints and vocabulary, β.
Simulation Experiment | The hyperparameters for all experiments are α = 0.1, β = 0.01, and η = 100.
Bilingual Infinite Tree Model | Our procedure alternates between sampling each of the following variables: the auxiliary variables u, the state assignments z, the transition probabilities π, the shared DP parameters β, and the hyperparameters α_0 and γ.
Bilingual Infinite Tree Model | α_0 is parameterized by a gamma hyperprior with hyperparameters α_a and α_b.
Bilingual Infinite Tree Model | γ is parameterized by a gamma hyperprior with hyperparameters γ_a and γ_b.
Experiment | In sampling α_0 and γ, the hyperparameters α_a, α_b, γ_a, and γ_b are set to 2, 1, 1, and 1, respectively, which is the same setting as in Van Gael et al.
Experiment | The development test data is used to set the hyperparameters, i.e., to decide when to terminate the tuning iterations.
Supervised evaluation tasks | After choosing hyperparameters to maximize the dev F1, we would retrain the model using these hyperparameters on the full 8,936-sentence training set, and evaluate on test.
Supervised evaluation tasks | One hyperparameter was the ℓ2-regularization sigma, which for most models was optimal at 2 or 3.2.
Supervised evaluation tasks | The word embeddings also required a scaling hyperparameter, as described in Section 7.2.
Unlabeled Data | We can scale the embeddings by a hyperparameter to control their standard deviation.
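A minimal sketch of this scaling (names and values are illustrative):

```python
# Sketch: rescale pretrained embeddings so their per-entry standard
# deviation matches a tunable target.
import numpy as np

def scale_embeddings(E, target_std):
    """Scale embedding matrix E so that its entries have std target_std."""
    return E * (target_std / E.std())

E = np.random.randn(10000, 50)         # placeholder pretrained embeddings
E_scaled = scale_embeddings(E, 0.1)    # the target std is the tuned hyperparameter
```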
Experiments | Special emphasis is put on corpus construction, determination of upper bounds and baselines, and a sensitivity analysis of important hyperparameters.
Experiments | SGD receives two hyperparameters as input: the number of iterations T, and the regularization parameter λ.
Experiments | Recall that CL-SCL receives three hyperparameters as input: the number of pivots m, the dimensionality of the cross-lingual representation k, |
Introduction | Third, an in-depth analysis with respect to important hyperparameters such as the ratio of labeled and unlabeled documents, the number of pivots, and the optimum dimensionality of the cross-lingual representation. |
Experiments | Since our implementation is based on Unicode and learns all hyperparameters from the data, we also confirmed that NPYLM segments the Arabic Gigaword corpus equally well.
Inference | Sample hyperparameters of Θ
Inference | We place a Gamma prior Ga(λ; a, b) on λ to estimate it from the data for a given language and word type. Here, Γ(x) is the Gamma function and a, b are the hyperparameters, chosen to give a nearly uniform prior distribution.
Pitman-Yor process and n-gram models | θ, d are hyperparameters that can be learned as Gamma and Beta posteriors, respectively, given the data.
Hierarchical Phrase Table Combination | All the parameters θ_j and hyperparameters d_j and s_j are obtained by learning on the j-th domain.
Hierarchical Phrase Table Combination | Retuning the hyperparameters when cascading another domain may improve the performance of the combination weight, but we leave this for future work.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | 3. d and s are the discount and strength hyperparameters.
Related Work | However, their methods usually require a number of hyperparameters, such as mini-batch size, step size, or human judgment to determine the quality of phrases, and still rely on a heuristic phrase extraction method in each phrase table update.
Experiments | There are four hyperparameters in our model to be tuned using the development data (devMT), among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and ρ ∈ {0.1, 0.3, 0.5, 0.8}; for the PR learning, α ∈ {0 ≤ α ≤ 1} and η ∈ {0 ≤ η ≤ 1}, where the step is 0.1.
Experiments | The optimal hyperparameter values were found to be: STS-NO-GP (α = 0.8 and η = 0.6) and STS-GP-PL (μ = 0.5, ρ = 0.3, α = 0.8 and η = 0.6).
Experiments | The optimal hyperparameter values were found to be: VES-NO-GP (α = 0.7) and VES-GP-PL (μ = 0.5, ρ = 0.3 and α = 0.7).
Methodology | The hyperparameter λ controls the impact of the penalty term.
Experiments | We develop our features and tune their hyperparameter values on the ACE04 development set and then use these on the ACE04 test set. On the ACE05 and ACE05-ALL datasets, we directly transfer our Web features and their hyperparameter values from the ACE04 dev-set, without any retuning.
Semantics via Web Features | To capture this effect, we create a feature that indicates whether there is a match in the top k seeds of the two headwords (where k is a hyperparameter to tune).
Semantics via Web Features | We first collect the POS tags (using length-2 character prefixes to indicate coarse parts of speech) of the seeds matched in the top k′ seed lists of the two headwords, where k′ is another hyperparameter to tune.
Semantics via Web Features | We tune a separate bin-size hyperparameter for each of these three features. |
Evaluation | All hyperparameters α_c, β_c were held constant at α, β for simplicity and were fit using grid search over α ∈ [10^-6, 10^6], β ∈ [10^-3, 0.5].
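A minimal sketch of such a grid search, assuming log-spaced grids over the stated ranges; `evaluate` is a placeholder for the held-out objective, not a function from the paper:

```python
# Sketch: exhaustive grid search over log-spaced hyperparameter grids.
import itertools
import numpy as np

alphas = np.logspace(-6, 6, num=13)              # alpha in [1e-6, 1e6]
betas = np.logspace(-3, np.log10(0.5), num=10)   # beta in [1e-3, 0.5]

def grid_search(evaluate):
    best = max(itertools.product(alphas, betas),
               key=lambda ab: evaluate(alpha=ab[0], beta=ab[1]))
    return best   # the (alpha, beta) pair with the highest score
```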
Evaluation | Hyperparameters were handled the same way as for GS. |
The STSG Model | The hyperparameters α_c can be incorporated into the generative model as random variables; however, we opt to fix these at various constants to investigate different levels of sparsity.
The STSG Model | Assuming fixed hyperparameters α = {α_c} and β = {β_c}, our inference problem is to find the posterior distribution of the derivation sequences
Experimental setup | Unless stated otherwise, all results are based on runs of 1,000 iterations with 100 classes, with a 200-iteration burn-in period after which hyperparameters were reestimated every 50 iterations. The probabilities estimated by the models (P(n|v, r) for LDA and P(n, v|r) for ROOTH- and DUAL-LDA) were sampled every 50 iterations post-burn-in and averaged over three runs to smooth out variance.
Results | (2009) demonstrate that LDA is relatively insensitive to the choice of topic vocabulary size Z when the α and β hyperparameters are optimised appropriately during estimation.
Results | In fact, we do not find that performance becomes significantly less robust when hyperparameter reestimation is deactivated; correlation scores simply drop by a small amount (1-2 points), irrespective of the Z chosen.
Three selectional preference models | Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv.
Conclusion and future work | In this paper all of the hyperparameters α_A were tied and varied simultaneously, but it is desirable to learn these from data as well.
Conclusion and future work | Just before the camera-ready version of this paper was due, we developed a method for estimating the hyperparameters by putting a vague Gamma hyperprior on each α_A and sampling using Metropolis-Hastings with a sequence of increasingly narrow Gamma proposal distributions, producing results for each model that are as good as or better than the best ones reported in Table 1.
Word segmentation with adaptor grammars | We tied the Dirichlet Process concentration parameters α, and performed runs with α = 1, 10, 100, and 1000; apart from this, no attempt was made to optimize the hyperparameters.
Word segmentation with adaptor grammars | It may be possible to correct this by “tuning” the grammar’s hyperparameters, but we did not attempt this here.
Models | P: number of personas (hyperparameter); K: number of word topics (hyperparameter); D: number of movie plot summaries; E: number of characters in movie d; W: number of (role, word) tuples used by character e; φ_k: topic k’s distribution over V words.
Models | Next, let a persona p be defined as a set of three multinomials ψ_p over these K topics, one for each typed role r, each drawn from a Dirichlet with a role-specific hyperparameter (ν_r).
Models | In other words, the probability that character e embodies persona k is proportional to the number of other characters in the plot summary who also embody that persona (plus the Dirichlet hyperparameter α_k) times the contribution of each observed word w_j for that character, given its current topic assignment z_j.
Inference | 4.3 Hyperparameter Estimation |
Inference | We treat the hyperparameters {d, θ} as random variables and update their values in every MCMC iteration.
Inference | We place a prior on the hyperparameters as follows: d ~ Beta(1, 1), θ ~ Gamma(1, 1).
Evaluating Topic Shift Tendency | 2008 Elections: To obtain a posterior estimate of π (Figure 3), we created 10 chains with hyperparameters sampled from the uniform distribution U(0, 1) and averaged π over the 10 chains (as described in Section 5).
Inference | Marginal counts are represented with · and * represents all hyperparameters.
Topic Segmentation Experiments | Initial hyperparameter values are sampled from U(0, 1) to favor sparsity; statistics are collected after 500 burn-in iterations with a lag of 25 iterations over a total of 5000 iterations; and slice sampling (Neal, 2003) optimizes hyperparameters.
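For reference, a minimal sketch of univariate slice sampling in the stepping-out/shrinkage form of Neal (2003), as commonly used to resample a single hyperparameter between Gibbs iterations; `log_post` is a placeholder for the hyperparameter's log posterior (it should return -inf outside the support):

```python
# Sketch: one univariate slice-sampling update for a hyperparameter x0.
import math
import random

def slice_sample(x0, log_post, w=1.0, max_steps=100):
    log_y = log_post(x0) + math.log(random.random())   # auxiliary slice level
    # Step out to find an interval [left, right] that brackets the slice.
    left = x0 - w * random.random()
    right = left + w
    while log_post(left) > log_y and max_steps > 0:
        left -= w; max_steps -= 1
    while log_post(right) > log_y and max_steps > 0:
        right += w; max_steps -= 1
    # Shrink the interval until a point inside the slice is drawn.
    while True:
        x1 = random.uniform(left, right)
        if log_post(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```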
Experimental Setup | Training Regimes and Hyperparameters For each run of our model we perform three random restarts to convergence and select the posterior with lowest final free energy. |
Experimental Setup | Dirichlet hyperparameters are set to 0.1. |
Model | Fixed hyperparameters are subscripted with zero. |
The PYP-HMM | The arrangement of customers at tables defines a clustering which exhibits a power-law behavior controlled by the hyperparameters a and b. |
The PYP-HMM | Sampling hyperparameters: We treat the hyperparameters {(a^x, b^x) : x ∈ (U, B, T, E, O)} as random variables in our model and infer their values.
The PYP-HMM | The result of this hyperparameter inference is that there are no user-tunable parameters in the model, an important feature that we believe helps explain its consistently high performance across test settings.
Method | μ and λ are two hyperparameters whose values are discussed in Section 5.
Method | Based on the development data, the hyperparameters of our model were tuned among the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and λ ∈ {0.1, 0.3, 0.5, 0.8}; for the CRF training, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Method | With the chosen set of hyperparameters, the test data was used to measure the final performance.
Model | The generative story runs as follows (Figure 2 depicts the full graphical model): Let there be M unique authors in the data, P latent personas (a hyperparameter to be set), and V words in the vocabulary (in the general setting these may be word types; in our data the vocabulary is the set of 1,000 unique cluster IDs). |
Model | This is proportional to the number of other characters in document d who also (currently) have that persona (plus the Dirichlet hyperparameter, which acts as a smoother) times the probability (under p_{d,c} = z) of all of the words
Model | P: number of personas (hyperparameter); D: number of documents; C_d: number of characters in document d; W_{d,c}: number of (cluster, role) tuples for character c; m_d: metadata for document d (ranges over M authors); θ_d: document d’s distribution over personas; p_{d,c}: character c’s persona; j: an index for a ⟨r, w⟩ tuple in the data; w_j: word cluster ID for tuple j; r_j: role for tuple j ∈ {agent, patient, poss, pred}; η: coefficients for the log-linear language model; μ, λ: Laplace mean and scale (for regularizing η); α: Dirichlet concentration parameter.
Inference | Recall that ν_e is a hyperparameter for the Dirichlet prior on G_0 and depends on the value of the corresponding indicator variable λ_e.
Inference | Recall that each sparsity indicator λ_e determines the value of the corresponding hyperparameter ν_e of the Dirichlet prior for the character-edit base distribution G_0.
Model | The prior on the base distribution G_0 is a Dirichlet distribution with hyperparameters ν, i.e., g_0 ~ Dirichlet(ν).
Code-Switching | We use asymmetric Dirichlet priors (Wallach et al., 2009), and let the optimization process learn the hyperparameters . |
Code-Switching | We optimize the hyperparameters α, β, γ, and δ by interleaving sampling iterations with a Newton-Raphson update to obtain the MLE estimate of the hyperparameters.
Code-Switching | where H is the Hessian matrix and ∂L/∂α is the gradient of the likelihood function with respect to the hyperparameter being optimized.
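A minimal sketch of such a Newton-Raphson update, shown here for the MLE of a symmetric Dirichlet hyperparameter α fit to observed probability vectors; this generic objective is an assumption for illustration, not the paper's exact likelihood:

```python
# Sketch: Newton-Raphson MLE of a symmetric Dirichlet concentration alpha.
# P is an N x K matrix whose rows are observed probability vectors; the
# gradient and Hessian below follow from the Dirichlet log-likelihood.
import numpy as np
from scipy.special import digamma, polygamma

def fit_symmetric_dirichlet(P, alpha=1.0, iters=50, tol=1e-8):
    N, K = P.shape
    S = np.log(P).sum()                           # sufficient statistic
    for _ in range(iters):
        grad = N * K * (digamma(K * alpha) - digamma(alpha)) + S
        hess = N * K * (K * polygamma(1, K * alpha) - polygamma(1, alpha))
        alpha_new = max(alpha - grad / hess, 1e-10)   # Newton step, kept positive
        if abs(alpha_new - alpha) < tol:
            return alpha_new
        alpha = alpha_new
    return alpha
```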
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | where α and η are hyperparameters smoothing the per-attribute-set distribution over concepts and the per-concept attribute distribution, respectively (see Figure 2 for the graphical model).
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | The hyperparameter γ controls the probability of branching via the per-node Dirichlet Process, and L is the fixed tree depth.
Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | Hyperparameters were α = 0.1, η = 0.1, γ = 1.0.