Conclusion | Even though we have used a small set of gold-standard alignments to tune our hyperparameters, we found that performance was fairly robust to variation in the hyperparameters, and translation performance was good even when gold-standard alignments were unavailable. |
Experiments | We have implemented our algorithm as an open-source extension to GIZA++. Usage of the extension is identical to standard GIZA++, except that the user can switch the ℓ0 prior on or off and adjust the hyperparameters α and β. |
Experiments | We set the hyperparameters α and β by tuning on gold-standard word alignments (to maximize F1) when possible. |
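Tuning α and β to maximize F1 against gold alignments amounts to a grid search over candidate values. A minimal sketch, assuming a hypothetical `align_fn(alpha, beta)` that returns a set of (source, target) alignment links (the helper names here are illustrative, not part of the GIZA++ extension):

```python
from itertools import product

def alignment_f1(predicted, gold):
    """F1 between two sets of (source_idx, target_idx) alignment links."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)  # correctly predicted links
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def grid_search(align_fn, gold, alphas, betas):
    """Return the (alpha, beta) pair whose alignments maximize F1."""
    return max(product(alphas, betas),
               key=lambda ab: alignment_f1(align_fn(*ab), gold))
```

Exhaustive search is feasible here because only two hyperparameters are tuned and each candidate requires a single alignment pass.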
Experiments | The fact that we had to use hand-aligned data to tune the hyperparameters α and β means that our method is no longer completely unsupervised. |
Method | The hyperparameter β controls the tightness of the approximation, as illustrated in Figure 1. |
Experimental Setup | Table 1: The values of the hyperparameters of our model, where μ_d and λ_d are the dth entry of the mean and the diagonal of the inverse covariance matrix of the training data. |
Experimental Setup | Hyperparameters and Training Iterations: The values of the hyperparameters of our model are shown in Table 1, where μ_d and λ_d are the dth entry of the mean and the diagonal of the inverse covariance matrix computed from the training data. |
Inference | We use P(· | −) to denote a conditional posterior probability given the observed data, all the other variables, and the hyperparameters of the model. |
Inference | The conjugate prior we use for the two variables is a normal-Gamma distribution with hyperparameters μ_0, κ_0, α_0, and β_0 (Murphy, 2007). |
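The normal-Gamma posterior has a closed form; the sketch below follows the standard update equations (Murphy, 2007) for a Gaussian with unknown mean and precision, with variable names chosen for illustration:

```python
def normal_gamma_posterior(data, mu0, kappa0, alpha0, beta0):
    """Posterior hyperparameters of a normal-Gamma prior after observing
    `data`, assuming a Gaussian likelihood with unknown mean and precision."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    mu_n = (kappa0 * mu0 + n * xbar) / (kappa0 + n)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2
    beta_n = (beta0 + 0.5 * ss
              + kappa0 * n * (xbar - mu0) ** 2 / (2 * (kappa0 + n)))
    return mu_n, kappa_n, alpha_n, beta_n
```

Conjugacy is what makes the collapsed Gibbs-style inference tractable: the posterior stays in the normal-Gamma family, so no numerical integration is needed.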
Inference | Assume we use a symmetric Dirichlet distribution with a positive hyperparameter |
Model | 2, where the shaded circle denotes the observed feature vectors, and the squares denote the hyperparameters of the priors used in our model. |
Results | In the future, we plan to extend the model and infer the values of these hyperparameters from the data directly. |
Experiments | The averages of the hyperparameters of PROP were 0.84 ± 0.05 for λ and 0.85 ± 0.10 for the threshold. |
Experiments | Proposed Model (PROP): Using the training data, we determined the two hyperparameters, λ and the threshold used to round b_rs to 1 or 0, so that they maximized the F value. |
Experiments | hand, our model learns parameters such as θ_r for each relation, and thus the hyperparameter of our model does not directly affect its performance. |
Generative Model | In this section, we consider relation r, since the parameters are conditionally independent given relation r and the hyperparameter. |
Generative Model | λ is the hyperparameter and m_st is a constant. |
Generative Model | where 0 ≤ λ ≤ 1 is the hyperparameter that controls how strongly b_rs is affected by the main labeling process explained in the previous subsection. |
Experiments | We develop our features and tune their hyperparameter values on the ACE04 development set and then use these on the ACE04 test set. On the ACE05 and ACE05-ALL datasets, we directly transfer our Web features and their hyperparameter values from the ACE04 dev set, without any retuning. |
Semantics via Web Features | To capture this effect, we create a feature that indicates whether there is a match in the top k seeds of the two headwords (where k is a hyperparameter to tune). |
Semantics via Web Features | We first collect the POS tags (using length-2 character prefixes to indicate coarse parts of speech) of the seeds matched in the top k′ seed lists of the two headwords, where k′ is another hyperparameter to tune. |
Semantics via Web Features | We tune a separate bin-size hyperparameter for each of these three features. |
Evaluating Topic Shift Tendency | 2008 Elections: To obtain a posterior estimate of π (Figure 3), we create 10 chains with hyperparameters sampled from the uniform distribution U(0, 1) and average π over the 10 chains (as described in Section 5). |
Inference | Marginal counts are represented with · and ∗ represents all hyperparameters. |
Topic Segmentation Experiments | Initial hyperparameter values are sampled from U(0, 1) to favor sparsity; statistics are collected after 500 burn-in iterations with a lag of 25 iterations over a total of 5000 iterations; and slice sampling (Neal, 2003) optimizes the hyperparameters. |
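Slice sampling a single hyperparameter can be sketched with Neal's (2003) stepping-out and shrinkage procedure. The window width `w` and step limit below are illustrative choices, not the paper's settings; `log_density` stands for the (unnormalized) log posterior of the hyperparameter:

```python
import math
import random

def slice_sample(x0, log_density, w=1.0, max_steps=50, rng=random):
    """One univariate slice-sampling update (Neal, 2003): draw a slice
    level, step out to bracket the slice, then shrink until accepted."""
    log_y = log_density(x0) + math.log(rng.random())  # auxiliary slice level
    # Stepping out: expand an initial window of width w around x0.
    left = x0 - w * rng.random()
    right = left + w
    for _ in range(max_steps):
        if log_density(left) <= log_y:
            break
        left -= w
    for _ in range(max_steps):
        if log_density(right) <= log_y:
            break
        right += w
    # Shrinkage: sample uniformly in the bracket, shrink on rejection.
    while True:
        x1 = left + (right - left) * rng.random()
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1
```

Slice sampling needs no tuned proposal distribution and no rejection rate monitoring, which is why it is a popular choice for resampling a handful of hyperparameters inside a Gibbs sampler.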
Inference | 4.3 Hyperparameter Estimation |
Inference | We treat the hyperparameters {d, θ} as random variables and update their values in every MCMC iteration. |
Inference | We place priors on the hyperparameters as follows: d ∼ Beta(1, 1), θ ∼ Gamma(1, 1). |
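One common way to update such a hyperparameter each MCMC iteration is a random-walk Metropolis step against its prior. The sketch below assumes a Gamma(1, 1) prior on θ (log-prior = −θ up to a constant) and a user-supplied log-likelihood; it is illustrative, not the authors' exact sampler:

```python
import math
import random

def mh_update(theta, log_likelihood, rng=random, step=0.1):
    """One random-walk Metropolis step for a positive hyperparameter theta
    with a Gamma(1, 1) prior, whose log-density is -theta (+ const)."""
    proposal = theta + step * (rng.random() - 0.5) * 2  # uniform in +/- step
    if proposal <= 0:
        return theta  # outside the support: reject immediately
    log_accept = ((log_likelihood(proposal) - proposal)
                  - (log_likelihood(theta) - theta))
    if math.log(rng.random()) < log_accept:
        return proposal
    return theta
```

A Beta(1, 1) prior (uniform on [0, 1]) is handled the same way, with the support check replaced by `0 < proposal < 1` and the log-prior term dropped, since it is constant on the support.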