Gaussian Process Regression | Specifically, we can derive the gradient of the (log) marginal likelihood with respect to the model hyperparameters (i.e., σ, σ_n, θ, etc.).
Gaussian Process Regression | Note that in general the marginal likelihood is non-convex in the hyperparameter values, and consequently the solutions may only be locally optimal. |
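As a concrete illustration of these gradients, the following minimal numpy sketch computes the log marginal likelihood of a GP with an RBF kernel, together with its gradient with respect to the log hyperparameters, via the standard identity d(lml)/dθ = ½ tr((ααᵀ − K⁻¹) ∂K/∂θ) with α = K⁻¹y. The kernel parameterisation and data are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def lml_and_grad(X, y, log_ell, log_sf, log_sn):
    """Log marginal likelihood of a GP with an RBF kernel, and its
    gradient w.r.t. the log hyperparameters: lengthscale ell,
    signal std sf, noise std sn."""
    n = len(y)
    ell, sf2, sn2 = np.exp(log_ell), np.exp(2 * log_sf), np.exp(2 * log_sn)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Kf = sf2 * np.exp(-0.5 * sq / ell ** 2)
    K = Kf + sn2 * np.eye(n)
    Kinv = np.linalg.inv(K)
    alpha = Kinv @ y
    lml = (-0.5 * y @ alpha - 0.5 * np.linalg.slogdet(K)[1]
           - 0.5 * n * np.log(2 * np.pi))
    # d(lml)/d(theta) = 0.5 * tr((alpha alpha^T - K^{-1}) dK/dtheta)
    inner = np.outer(alpha, alpha) - Kinv
    dK = [Kf * sq / ell ** 2,        # dK/d(log ell)
          2.0 * Kf,                  # dK/d(log sf)
          2.0 * sn2 * np.eye(n)]     # dK/d(log sn)
    grad = np.array([0.5 * np.trace(inner @ d) for d in dK])
    return lml, grad

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
print(lml_and_grad(X, y, 0.0, 0.0, -1.0))
```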
Gaussian Process Regression | Here we bootstrap the learning of complex models with many hyperparameters by initialising them with the hyperparameter values learned for simpler models.
Multitask Quality Estimation 4.1 Experimental Setup | GP: All GP models were implemented using the GPML Matlab toolbox. Hyperparameter optimisation was performed using conjugate gradient ascent of the log marginal likelihood function, with up to 100 iterations.
Multitask Quality Estimation 4.1 Experimental Setup | The simpler models were initialised with all hyperparameters set to one, while more complex models were initialised using the learned hyperparameters of the simpler models.
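This setup could be reproduced in spirit (though not with the paper's GPML Matlab code) using scikit-learn and scipy, as in the sketch below; the data, kernel choice, and the all-ones initialisation are illustrative assumptions. Note that scikit-learn stores kernel hyperparameters in log space, so "all hyperparameters set to one" corresponds to a zero vector.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel, ConstantKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # toy QE feature vectors
y = X @ np.array([0.5, -1.0, 0.2]) + 0.1 * rng.normal(size=50)

kernel = ConstantKernel() * RBF(length_scale=np.ones(3)) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, optimizer=None).fit(X, y)

def neg_lml(log_theta):
    """Negative log marginal likelihood and its gradient."""
    val, grad = gp.log_marginal_likelihood(log_theta, eval_gradient=True)
    return -val, -grad

# "All hyperparameters set to one": log(1) = 0 in every dimension.
theta0 = np.zeros(len(gp.kernel_.theta))
res = minimize(neg_lml, theta0, jac=True, method="CG",
               options={"maxiter": 100})
print("optimised log-hyperparameters:", res.x)
```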
Bayesian inference for PCFGs | Input: Grammar G, vector of trees t, vector of hyperparameters α, previous parameters θ⁰.
Bayesian inference for PCFGs | Result: A vector of parameters θ. repeat: draw θ from products of Dirichlets with hyperparameters α + f(t)
Bayesian inference for PCFGs | Input: Grammar G, vector of trees t, vector of hyperparameters α, previous rule parameters θ⁰.
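The θ-resampling step of such a sampler is simple to sketch: for each nonterminal, draw its rule probabilities from a Dirichlet whose parameters are the hyperparameters α plus the rule counts f(t) in the current trees. The grammar and counts below are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy grammar: rules grouped by left-hand-side nonterminal.
rules = {"S": ["S -> NP VP", "S -> VP"],
         "NP": ["NP -> DT NN", "NP -> NN"]}
alpha = 1.0                              # symmetric Dirichlet hyperparameter
f = {"S -> NP VP": 7, "S -> VP": 3,      # rule counts f(t) from the trees
     "NP -> DT NN": 5, "NP -> NN": 5}

# One Gibbs step for theta: for each nonterminal, draw its rule
# probabilities from Dirichlet(alpha + f(t)).
theta = {}
for lhs, rs in rules.items():
    draw = rng.dirichlet([alpha + f[r] for r in rs])
    theta.update(zip(rs, draw))
print(theta)
```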
Analysis | Next we examine the transition Dirichlet hyperparameters learned by our model. |
Analysis | As we can see, the learned hyperparameters yield highly asymmetric priors over transition distributions. |
Analysis | Figure 4 shows MAP transition Dirichlet hyperparameters of the CLUST model, when trained |
Experiments | The simplest version, SYMM, disregards all information from other languages, using simple symmetric hyperparameters on the transition and emission Dirichlet priors (all hyperparameters set to 1). |
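The contrast between the symmetric all-ones prior and a learned asymmetric prior is easy to see in a quick sketch (the tagset size and hyperparameter values below are invented for illustration): transition rows drawn from the asymmetric prior concentrate mass on a few successor tags, while rows from the symmetric prior are near-uniform on average.

```python
import numpy as np

rng = np.random.default_rng(1)

K = 5                                          # toy tagset size
symm = np.ones(K)                              # SYMM: all hyperparameters = 1
asymm = np.array([4.0, 0.2, 0.1, 0.1, 0.05])   # illustrative learned values

print(rng.dirichlet(symm, 3).round(2))         # near-uniform rows
print(rng.dirichlet(asymm, 3).round(2))        # rows peaked on a few tags
```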
Inference | The second term is the tag transition predictive distribution given the Dirichlet hyperparameters, yielding a familiar Pólya urn scheme form.
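In that Pólya urn form, the predictive probability of a transition is its smoothed relative frequency: counts plus the Dirichlet hyperparameters, renormalised. A minimal sketch with toy counts:

```python
import numpy as np

def transition_predictive(counts_t, alpha):
    """Predictive probability of each transition t -> t' given counts
    and Dirichlet hyperparameters (the Polya urn form: previously seen
    transitions are reinforced in proportion to their counts)."""
    counts_t = np.asarray(counts_t, float)
    alpha = np.asarray(alpha, float)
    return (counts_t + alpha) / (counts_t.sum() + alpha.sum())

print(transition_predictive([12, 3, 0, 1], np.full(4, 0.5)))
```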
Inference | Finally, we tackle the third term, Equation 7, corresponding to the predictive distribution of emission observations given the Dirichlet hyperparameters.
Inference | To sample the Dirichlet hyperparameter for cluster k and transition t → t′, we need to compute:
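The extracted equation is missing here, but the quantity such samplers evaluate is typically a Dirichlet-multinomial marginal likelihood. The sketch below computes that standard form and uses it inside a Metropolis-Hastings step with a log-normal proposal; the scheme and the flat hyperprior are assumptions, not necessarily the paper's exact sampler.

```python
import numpy as np
from scipy.special import gammaln

def log_dirmult(counts, alpha):
    """Log Dirichlet-multinomial marginal likelihood of one row of
    transition counts under Dirichlet hyperparameters alpha."""
    counts = np.asarray(counts, float)
    alpha = np.asarray(alpha, float)
    return (gammaln(alpha.sum()) - gammaln(counts.sum() + alpha.sum())
            + (gammaln(counts + alpha) - gammaln(alpha)).sum())

rng = np.random.default_rng(0)
counts = np.array([12.0, 3.0, 0.0, 1.0])   # toy counts for t -> t'
alpha = np.ones(4)

# One MH update of alpha[1] with a multiplicative log-normal proposal;
# log(prop/old) is the Hastings correction for that proposal.
prop = alpha.copy()
prop[1] *= np.exp(0.3 * rng.normal())
log_acc = (log_dirmult(counts, prop) - log_dirmult(counts, alpha)
           + np.log(prop[1] / alpha[1]))
if np.log(rng.random()) < log_acc:
    alpha = prop
```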
Distributional Semantic Hidden Markov Models | We follow the “neutral” setting of hyperparameters given by Ormoneit and Tresp (1995), so that the MAP estimate for the covariance matrix for (event or slot) state i becomes:
Distributional Semantic Hidden Markov Models | where j indexes all the relevant semantic vectors x_j in the training set, r_ij is the posterior responsibility of state i for vector x_j, and β is the remaining hyperparameter that we tune to adjust the amount of regularization.
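The equation itself did not survive extraction; as a sketch of the quantity being described, the following computes a responsibility-weighted scatter matrix regularized toward βI, a common ridge-style form of such MAP estimates (assumed here, not necessarily Ormoneit and Tresp's exact estimator).

```python
import numpy as np

def map_covariance(X, resp_i, beta):
    """Regularized MAP covariance for one state: the responsibility-
    weighted scatter of the semantic vectors x_j, shrunk toward
    beta * I. The exact normalization is an assumption."""
    w = resp_i / resp_i.sum()
    mu = w @ X                               # responsibility-weighted mean
    diff = X - mu
    scatter = (resp_i[:, None] * diff).T @ diff
    return (scatter + beta * np.eye(X.shape[1])) / (resp_i.sum() + 1.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # toy semantic vectors x_j
r = rng.random(200)              # posterior responsibilities r_ij
print(map_covariance(X, r, beta=0.1).shape)
```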
Distributional Semantic Hidden Markov Models | We tune the hyperparameters (N_E, N_S, δ, β, k) and the number of EM iterations by twofold cross-validation.
Guided Summarization Slot Induction | We trained a DSHMM separately for each of the five domains with different semantic models, tuning hyperparameters by twofold cross-validation. |
Related Work | Distributions that generate the latent variables and hyperparameters are omitted for clarity. |
Bilingual Infinite Tree Model | Our procedure alternates between sampling each of the following variables: the auxiliary variables u, the state assignments z, the transition probabilities π, the shared DP parameters β, and the hyperparameters α₀ and γ.
Bilingual Infinite Tree Model | α₀ is parameterized by a gamma hyperprior with hyperparameters α_a and α_b.
Bilingual Infinite Tree Model | γ is parameterized by a gamma hyperprior with hyperparameters γ_a and γ_b.
Experiment | In sampling α₀ and γ, the hyperparameters α_a, α_b, γ_a, and γ_b are set to 2, 1, 1, and 1, respectively, which is the same setting as in Gael et al.
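Resampling a DP concentration parameter under such a gamma hyperprior is commonly done with the auxiliary-variable scheme of Teh et al. (2006); the sketch below assumes that scheme and toy counts, since the paper's exact sampler is not shown here.

```python
import numpy as np

def resample_concentration(alpha, a, b, n_j, m, rng):
    """One auxiliary-variable update for a DP concentration parameter
    with a Gamma(a, b) hyperprior (Teh et al., 2006). n_j: customers
    per group; m: total number of tables."""
    w = rng.beta(alpha + 1.0, n_j)                   # auxiliary w_j
    s = rng.random(len(n_j)) < n_j / (n_j + alpha)   # auxiliary s_j
    return rng.gamma(a + m - s.sum(), 1.0 / (b - np.log(w).sum()))

rng = np.random.default_rng(0)
# (a, b) = (2, 1) for alpha0, as in the setting above.
alpha0 = resample_concentration(alpha=1.0, a=2.0, b=1.0,
                                n_j=np.array([30.0, 25.0, 45.0]),
                                m=8, rng=rng)
print(alpha0)
```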
Experiment | The development test data is used to set the hyperparameters, i.e., to terminate the tuning iterations.
Hierarchical Phrase Table Combination | All the parameters θ_j and hyperparameters d_j and s_j are obtained by learning on the j-th domain.
Hierarchical Phrase Table Combination | Retuning the hyperparameters when cascading another domain might improve the performance of the combination weights, but we leave this for future work.
Phrase Pair Extraction with Unsupervised Phrasal ITGs | 3. d and s are the discount and strength hyperparameters.
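A discount–strength pair (d, s) is most familiar from the Pitman-Yor Chinese-restaurant predictive probability; the sketch below shows that standard form (an assumption about how d and s enter the model here, with toy counts).

```python
def pitman_yor_predictive(c_w, t_w, c, t, d, s, p_base):
    """Chinese-restaurant predictive probability under a Pitman-Yor
    process with discount d and strength s. c_w/t_w: count and table
    count for this phrase pair; c/t: totals; p_base: base-measure
    probability of the phrase pair."""
    return (max(c_w - d * t_w, 0.0) + (s + d * t) * p_base) / (c + s)

print(pitman_yor_predictive(c_w=5, t_w=2, c=100, t=30,
                            d=0.5, s=1.0, p_base=1e-4))
```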
Related Work | However, their methods usually require a number of hyperparameters, such as the mini-batch size, step size, or human judgments to determine the quality of phrases, and they still rely on a heuristic phrase extraction step in each phrase table update.
Models | P: number of personas (hyperparameter); K: number of word topics (hyperparameter); D: number of movie plot summaries; E: number of characters in movie d; W: number of (role, word) tuples used by character e; φ_k: topic k's distribution over V words.
Models | Next, let a persona p be defined as a set of three multinomials ψ_p over these K topics, one for each typed role r, each drawn from a Dirichlet with a role-specific hyperparameter ν_r.
Models | In other words, the probability that character e embodies persona k is proportional to the number of other characters in the plot summary who also embody that persona (plus the Dirichlet hyperparameter α_k) times the contribution of each observed word w_j for that character, given its current topic assignment z_j.
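That update can be sketched as a single collapsed-Gibbs-style draw; the persona count vector and per-persona word likelihoods below are toy placeholders for the quantities just described.

```python
import numpy as np

rng = np.random.default_rng(0)

P = 4
alpha = np.full(P, 0.5)                 # Dirichlet hyperparameter alpha_k
n_doc = np.array([3, 0, 1, 2])          # other characters with persona k
# word_lik[k]: product over the character's words w_j of the probability
# of topic assignment z_j under persona k (toy values).
word_lik = np.array([0.02, 0.10, 0.01, 0.05])

probs = (n_doc + alpha) * word_lik      # unnormalised posterior
probs /= probs.sum()
persona = rng.choice(P, p=probs)        # resampled persona assignment
print(persona, probs.round(3))
```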
Method | μ and λ are two hyperparameters whose values are discussed in Section 5.
Method | Based on the development data, the hyperparameters of our model were tuned over the following settings: for the graph propagation, μ ∈ {0.2, 0.5, 0.8} and λ ∈ {0.1, 0.3, 0.5, 0.8}; for the CRF training, α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}.
Method | With the chosen set of hyperparameters, the test data was used to measure the final performance.
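This tuning procedure amounts to a grid search over those value sets on the development data; in the sketch below, dev_score is a hypothetical stand-in for training the model with one setting and scoring it on the development set.

```python
from itertools import product

def dev_score(mu, lam, alpha):
    """Hypothetical stand-in: train with (mu, lambda, alpha) and return
    the development-set score (here a toy function with a known peak)."""
    return -(mu - 0.5) ** 2 - (lam - 0.3) ** 2 - (alpha - 0.7) ** 2

grid = product([0.2, 0.5, 0.8],             # mu (graph propagation)
               [0.1, 0.3, 0.5, 0.8],        # lambda (graph propagation)
               [0.1, 0.3, 0.5, 0.7, 0.9])   # alpha (CRF training)
best = max(grid, key=lambda cfg: dev_score(*cfg))
print("chosen (mu, lambda, alpha):", best)
```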