Index of papers in Proc. ACL that mention

**Gibbs sampling**

Abstract | In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. |

Experiments | Since our algorithm converges rather fast, we ran the Gibbs sampler of trigram NPYLM for 200 iterations to obtain the results in Table 1. |

Experiments | In all cases we removed all whitespaces to yield raw character strings for inference, and set L = 4 for Chinese and L = 8 for Japanese to run the Gibbs sampler for 400 iterations. |

Experiments | Notice that analyzing test data is not easy for the character-wise Gibbs sampler of previous work. |

Inference | To find the hidden word segmentation w of a string $s = c_1 \cdots c_N$, which is equivalent to the vector of binary hidden variables $z = z_1 \cdots z_N$, the simplest approach is to build a Gibbs sampler that randomly selects a character $c_i$ and draws a binary decision $z_i$ as to whether there is a word boundary, and then updates the language model according to the new segmentation (Goldwater et al., 2006; Xu et al., 2008). |

Inference | 4.1 Blocked Gibbs sampler |

Inference | Instead, we propose a sentence-wise Gibbs sampler of word segmentation using efficient dynamic programming, as shown in Figure 3. |

Introduction | However, they are still naïve with respect to word spellings, and the inference is very slow owing to inefficient Gibbs sampling. |

Introduction | Section 4 describes an efficient blocked Gibbs sampler that leverages dynamic programming for inference. |

Gibbs sampling is mentioned in 11 sentences in this paper.

Topics mentioned in this paper:

- word segmentation (25)
- language model (18)
- n-gram (13)

A Gibbs Sampling Algorithm | Now, we present a simple and efficient Gibbs sampling algorithm for the generalized Bayesian logistic supervised topic models. |

A Gibbs Sampling Algorithm | 3.2 Inference with Collapsed Gibbs Sampling |

A Gibbs Sampling Algorithm | Although we can do Gibbs sampling to infer the complete posterior distribution $q(\eta, \lambda, \Theta, Z, \Phi)$ and thus $q(\eta, \Theta, Z, \Phi)$ by ignoring $\lambda$, the mixing rate would be slow due to the large sample space. |

Abstract | We address these issues by: 1) introducing a regularization constant to better balance the two parts based on an optimization formulation of Bayesian inference; and 2) developing a simple Gibbs sampling algorithm by introducing auxiliary Polya-Gamma variables and collapsing out Dirichlet variables. |

Introduction | Second, to solve the intractable posterior inference problem of the generalized Bayesian logistic supervised topic models, we present a simple Gibbs sampling algorithm by exploring the ideas of data augmentation (Tanner and Wong, 1987; van Dyk and Meng, 2001; Holmes and Held, 2006). |

Introduction | Then, we develop a simple and efficient Gibbs sampling algorithm with analytic conditional distributions without Metropolis-Hastings accept/reject steps. |

Introduction | For Bayesian LDA models, we can also explore the conjugacy of the Dirichlet-Multinomial prior-likelihood pairs to collapse out the Dirichlet variables (i.e., topics and mixing proportions) to do collapsed Gibbs sampling, which can have better mixing rates (Griffiths and Steyvers, 2004). |

Gibbs sampling is mentioned in 21 sentences in this paper.

Topics mentioned in this paper:

- Gibbs sampling (21)
- topic models (21)
- LDA (14)

Background 3.1 LDA | In this paper we perform approximate posterior inference using collapsed Gibbs sampling (Griffiths and Steyvers, 2004). |

Background 3.1 LDA | The Gibbs sampling equation used to update the assignment of a topic $t$ to the word $w \in W$ at the position $n$ in document $d$, conditioned on $\alpha_t$, $\beta_w$, is: |

Background 3.1 LDA | We use a subscript $d, \neg n$ to denote that the current token, $z_{d,n}$, is ignored in the Gibbs sampling update. |

Experimental Evaluation | 1. Infer T number of topics on D for LDA using collapsed Gibbs sampling. |

Experimental Evaluation | Update M D using collapsed Gibbs sampling update in Equation 1. |

Experimental Evaluation | Infer |C| number of topics on the sprinkled document corpus D using collapsed Gibbs sampling update. |

Topic Sprinkling in LDA | We then update the new LDA model using collapsed Gibbs sampling. |

Topic Sprinkling in LDA | We then infer a set of |C| number of topics on the sprinkled dataset using collapsed Gibbs sampling, where C is the set of class labels of the training documents. |

Topic Sprinkling in LDA | We modify collapsed Gibbs sampling update in Equation 1 to carry class label information while inferring topics. |

Gibbs sampling is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

- text classification (19)
- LDA (17)
- Gibbs sampling (10)

Experiments | For each data set, Gibbs sampling was performed on the training set in each direction (source-to-target and target-to-source), initialized using GIZA++. We used the grow heuristic to combine the GIZA++ alignments in both directions (Koehn et al., 2003), which we then intersect with the predictions of GIZA++ in the relevant translation direction. |

Experiments | The two Gibbs samplers were “burned in” for the first 1000 iterations, after which we ran a further 500 iterations selecting every 50th sample. |
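The burn-in and thinning schedule quoted above can be illustrated with a minimal sketch. The function name `kept_iterations` and the 1-based counting convention are assumptions made for illustration; the excerpt does not say how the paper indexes its iterations.

```python
def kept_iterations(burn_in, further, thin):
    """Return the 1-based iteration numbers whose samples are retained.

    Assumed convention: the first `burn_in` iterations are discarded, then
    every `thin`-th iteration of the following `further` iterations is kept.
    """
    return [it for it in range(burn_in + 1, burn_in + further + 1)
            if (it - burn_in) % thin == 0]

# Schedule from the excerpt: 1000 burn-in iterations, 500 further, every 50th.
kept = kept_iterations(burn_in=1000, further=500, thin=50)
print(len(kept), kept[0], kept[-1])
```

Under this convention the schedule retains 10 samples, the first at iteration 1050 and the last at iteration 1500.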

Experiments | Because the data set is small, we performed Gibbs sampling on a single processor. |

Gibbs Sampling | To train the model, we use Gibbs sampling, a Markov Chain Monte Carlo (MCMC) technique for posterior inference. |

Gibbs Sampling | Our Gibbs sampler operates by sampling an update to the alignment of each target word in the corpus. |

Gibbs Sampling | (2009a) of using multiple processors to perform approximate Gibbs sampling which they showed achieved equivalent performance to the exact Gibbs sampler. |

Model | This makes the approach more suitable for learning alignments, e.g., to account for word fertilities (see §3.3), while also permitting inference using Gibbs sampling (§4). |

Gibbs sampling is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- phrase-based (16)
- BLEU (13)
- Gibbs sampling (9)

Constraints Shape Topics | 3.1 Gibbs Sampling for Topic Models |

Constraints Shape Topics | In topic modeling, collapsed Gibbs sampling (Griffiths and Steyvers, 2004) is a standard procedure for obtaining a Markov chain over the latent variables in the model. |

Constraints Shape Topics | Given M documents the state of a Gibbs sampler for LDA consists of topic assignments for each token in the corpus and is represented as $Z = \{z_{1,1}, \ldots, z_{1,N_1}, z_{2,1}, \ldots, z_{M,N_M}\}$. |

Discussion | As presented here, the technique for incorporating constraints is closely tied to inference with Gibbs sampling. |

Interactively adding constraints | In the implementation of a Gibbs sampler, unassignment is done by setting a token’s topic assignment to an invalid topic (e.g., -1, as we use here) and decrementing any counts associated with that word. |
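The unassignment step described in this excerpt can be sketched as follows. This is a minimal illustration, not the paper's implementation: the count tables (`doc_topic`, `topic_word`, `topic_total`) and the function name `unassign` are hypothetical, chosen to match the usual bookkeeping of a collapsed Gibbs sampler for LDA.

```python
UNASSIGNED = -1  # invalid topic id used to mark an unassigned token

def unassign(d, n, docs, z, doc_topic, topic_word, topic_total):
    """Mark token n of document d as unassigned and remove it from all counts."""
    k = z[d][n]
    if k == UNASSIGNED:
        return  # nothing to undo
    w = docs[d][n]
    doc_topic[d][k] -= 1    # tokens in document d assigned to topic k
    topic_word[k][w] -= 1   # times word w is assigned to topic k
    topic_total[k] -= 1     # total tokens assigned to topic k
    z[d][n] = UNASSIGNED

# Toy state (hypothetical): one document, two topics, vocabulary {0, 1}.
docs = [[0, 1, 1]]
z = [[0, 1, 1]]
doc_topic = [[1, 2]]
topic_word = [[1, 0], [0, 2]]
topic_total = [1, 2]

unassign(0, 1, docs, z, doc_topic, topic_word, topic_total)
print(z, doc_topic, topic_word, topic_total)
```

After the call the token contributes to no count, so a later sampling step can treat it as if it had never been assigned a topic.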

Simulation Experiment | Next, we perform one of the strategies for state ablation, add additional iterations of Gibbs sampling, use the newly obtained topic distribution of each document as the feature vector, and perform classification on the test / train split. |

Simulation Experiment | Each is averaged over five different chains using 10 additional iterations of Gibbs sampling per round (other numbers of iterations are discussed in Section 6.4). |

Simulation Experiment | Figure 4 shows the effect of using different numbers of Gibbs sampling iterations after changing a constraint. |

Gibbs sampling is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

- topic models (24)
- LDA (21)
- Gibbs sampling (9)

AKL: Using the Learned Knowledge | Most importantly, due to the use of the new form of knowledge, AKL’s inference mechanism (Gibbs sampler) is entirely different from that of MC-LDA (Section 5.2), which results in superior performances (Section 6). |

AKL: Using the Learned Knowledge | In short, our modeling contributions are (1) the capability of handling more expressive knowledge in the form of clusters, (2) a novel Gibbs sampler to deal with inappropriate knowledge. |

AKL: Using the Learned Knowledge | 5.2 The Gibbs Sampler |

Gibbs sampling is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- LDA (26)
- topic models (26)
- Gibbs sampler (8)

Bayesian MT Decipherment via Hash Sampling | Doing standard collapsed Gibbs sampling in this scenario would be very slow and intractable. |

Bayesian MT Decipherment via Hash Sampling | To do collapsed Gibbs sampling under this model, we would perform the following steps during sampling: |

Bayesian MT Decipherment via Hash Sampling | So, during decipherment training a standard collapsed Gibbs sampler will waste most of its time on expensive computations that will be discarded in the end anyways. |

Conclusion | To summarize, our method is significantly faster than previous methods based on EM or Bayesian with standard Gibbs sampling and obtains better results than any previously published methods for the same task. |

Decipherment Model for Machine Translation | In spite of using Bayesian inference which is typically slow in practice (with standard Gibbs sampling ), we show later that our method is scalable and permits decipherment training using more complex translation models (with several additional parameters). |

Decipherment Model for Machine Translation | with this problem by using a fast, efficient sampler based on hashing that allows us to speed up the Bayesian inference significantly whereas standard Gibbs sampling would be extremely slow. |

Experiments and Results | The table also demonstrates the significant speedup achieved by the hash sampler over a standard Gibbs sampler for the same model (~85 times faster when using a 2-gram LM). |

Feature-based representation for Source and Target | Additionally, performing Bayesian inference with such a complex model using standard Gibbs sampling can be very slow in practice. |

Gibbs sampling is mentioned in 8 sentences in this paper.

Topics mentioned in this paper:

- translation model (30)
- BLEU (14)
- BLEU score (10)

Experimental Setup | Therefore, the first-order distribution is not well-defined and we only employ Gibbs sampling for simplicity. |

Introduction | Our first strategy is akin to Gibbs sampling and samples a new head for each word in the sentence, modifying one arc at a time. |

Results | iteration of this sampler makes multiple changes to the tree, in contrast to a single-edge change of Gibbs sampler. |

Sampling-Based Dependency Parsing with Global Features | 3.2.1 Gibbs Sampling |

Sampling-Based Dependency Parsing with Global Features | One shortcoming of the Gibbs sampler is that it only changes one variable (arc) at a time. |

Sampling-Based Dependency Parsing with Global Features | Note that blocked Gibbs sampling would be exponential in K, and is thus very slow already at K = 4. |

Gibbs sampling is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- scoring function (29)
- POS tags (19)
- reranker (16)

Background | However this work approximated the derivation of the Gibbs sampler (omitting the interdependence between events when sampling from a collapsed model), resulting in a model which underperformed Brown et al. |

Experiments | We have omitted the results for the HMM-LM as experimentation showed that the local Gibbs sampler became hopelessly stuck, failing to |

The PYP-HMM | In order to induce a tagging under this model we use Gibbs sampling, a Markov chain Monte Carlo (MCMC) technique for drawing samples from the posterior distribution over the tag sequences given observed word sequences. |

The PYP-HMM | We present two different sampling strategies: First, a simple Gibbs sampler which randomly samples an update to a single tag given all other tags; and second, a type-level sampler which updates all tags for a given word under a |

The PYP-HMM | Gibbs samplers Both our Gibbs samplers perform the same calculation of conditional tag distributions, and involve first decrementing all trigrams and emissions affected by a sampling action, and then reintroducing the trigrams one at a time, conditioning their probabilities on the updated counts and table configurations as we progress. |

Gibbs sampling is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- language model (16)
- bigram (13)
- unigram (9)

Experiments | Each topic model uses Gibbs sampling for inference and parameter learning. |

Experiments | For testing we iterated the Gibbs sampler using the trained model for 10 iterations on the testing data. |

Experiments | For fair comparison, each benchmark topic model is provided with prior information on word-semantic tag distributions based on the labeled training data, hence, each K latent topic is assigned to one of K semantic tags at the beginning of Gibbs sampling. |

Markov Topic Regression - MTR | We use blocked Gibbs sampling, in which the topic assignments 3k, and hyper-parameters {6,3521 are alternately sampled at each Gibbs sampling lag period 9 given all other variables. |

Markov Topic Regression - MTR | At each 9 lag period of the Gibbs sampling, K |

Markov Topic Regression - MTR | At the start of the Gibbs sampling, we designate the |

Gibbs sampling is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

- CRF (32)
- unlabeled data (10)
- topic model (10)

Model | In practice, we never deal with such distributions directly, but rather integrate over them during Gibbs sampling. |

Model | We achieve these aims by performing Gibbs sampling. |

Model | Sampling We follow (Neal, 1998) in the derivation of our blocked and collapsed Gibbs sampler. |

Gibbs sampling is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- cross-lingual (7)
- segmentations (7)
- Gibbs sampling (6)

Abstract | We present a block Gibbs sampler for posterior inference and an empirical evaluation on several datasets. |

Inference by Block Gibbs Sampling | We use a block Gibbs sampler, which from an initial state (190,21), zo) repeats these steps: 1. |

Inference by Block Gibbs Sampling | The topics of context words are assumed exchangeable, and so we re-sample them using Gibbs sampling (Griffiths and Steyvers, 2004). |

Inference by Block Gibbs Sampling | Unfortunately, this is prohibitively expensive for the (nonexchangeable) topics of the named mentions c. A Gibbs sampler would have to choose a new value for cc.z with probability proportional to the resulting joint probability of the full sample. |

Gibbs sampling is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- coreference (21)
- Gibbs sampler (6)
- coreference resolution (5)

Learning and Inference | In the E-step, we perform collapsed Gibbs sampling to obtain distributions over row and column indices for every mention, given the current value of the hyperparameters. |

Learning and Inference | Also, our model has interdependencies among column indices of a mention. Standard Gibbs sampling procedure breaks down these dependencies. |

Learning and Inference | This kind of blocked Gibbs sampling was proposed by Jensen et al. |

Gibbs sampling is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- named entity (14)
- entity mentions (8)
- Gibbs sampling (6)

The Model | Following Titov and McDonald (2008) we use a collapsed Gibbs sampling algorithm that was derived for the MG-LDA model based on the Gibbs sampling method proposed for LDA in (Griffiths and Steyvers, 2004). |

The Model | Gibbs sampling is an example of a Markov Chain Monte Carlo algorithm (Geman and Geman, 1984). |

The Model | In Gibbs sampling, variables are sequentially sampled from their distributions conditioned on all other variables in the model. |
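The sequential conditional updates described in this excerpt can be illustrated on a toy target. The sketch below is not taken from any of the indexed papers: it assumes a standard bivariate normal with correlation `rho`, for which each full conditional is the univariate normal N(rho * other, 1 - rho^2). The function names are hypothetical.

```python
import random

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Draw (x, y) pairs from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho * rho) ** 0.5  # std. dev. of x | y and of y | x
    x, y = 0.0, 0.0                     # arbitrary starting state
    samples = []
    for it in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # sample x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # sample y from p(y | x)
        if it >= burn_in:                # discard burn-in sweeps
            samples.append((x, y))
    return samples

def correlation(pairs):
    """Empirical Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

samples = gibbs_bivariate_normal(rho=0.8, n_samples=5000)
print(round(correlation(samples), 2))
```

The empirical correlation of the retained samples should be close to the target `rho`, which is a simple sanity check that the chain is sampling the intended distribution.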

Gibbs sampling is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

- topic model (11)
- LDA (8)
- Gibbs sampling (6)

Experiments | Next we used collapsed Gibbs sampling to infer a distribution over topics, $\theta_r$, for each of the relations in the primary corpus (based solely on tuples in the training set) using the topics from the generalization corpus. |

Experiments | To evaluate how well our topic-class associations carry over to unseen relations we used the same random sample of 100 relations from the pseudo-disambiguation experiment. For each argument of each relation we picked the top two topics according to frequency in the 5 Gibbs samples. |

Previous Work | Additionally we perform full Bayesian inference using collapsed Gibbs sampling, in which parameters are integrated out (Griffiths and Steyvers, 2004). |

Topic Models for Selectional Prefs. | For all the models we use collapsed Gibbs sampling for inference in which each of the hidden variables (e.g., 27.,“ and 273,32 in LinkLDA) are sampled sequentially conditioned on a full-assignment to all others, integrating out the parameters (Griffiths and Steyvers, 2004). |

Topic Models for Selectional Prefs. | In addition, there are several scalability enhancements such as SparseLDA (Yao et al., 2009), and an approximation of the Gibbs Sampling procedure can be efficiently parallelized (Newman et al., 2009). |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- topic models (12)
- WordNet (10)
- LDA (8)

Inference | We employ Gibbs sampling (Gelman et al., 2004) to approximate the posterior distribution of the hidden variables in our model. |

Inference | To apply Gibbs sampling to our problem, we need to derive the conditional posterior distributions of each hidden variable of the model. |

Inference | 2, the Gibbs sampler can draw a new value for CM by sampling from the normalized distribution. |

Introduction | We implement the inference process using Gibbs sampling. |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- feature vectors (12)
- hyperparameters (7)
- generative process (5)

Experimental Setup 4.1 Data Analysis | Per-Node Distribution: In stDA and ssLDA, attribute rankings can be constructed directly for each WN concept $c$, by computing the likelihood of attribute $w$ attaching to $c$, $\mathcal{L}(c|w) = p(w|c)$, averaged over all Gibbs samples (discarding a fixed number of samples for burn-in). |

Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | This distribution can be approximated efficiently using Gibbs sampling. |

Hierarchical Topic Models 3.1 Latent Dirichlet Allocation | An efficient Gibbs sampling procedure is given in (Blei et al., 2003a). |

Results | Precision was manually evaluated relative to 23 concepts chosen for broad coverage. Table 1 shows precision at n and the Mean Average Precision (MAP); in all LDA-based models, the Bayes average posterior is taken over all Gibbs samples

Results | Inset plots show log-likelihood of each Gibbs sample, indicating convergence except in the case of nCRP. |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- LDA (31)
- WORDNET (6)
- Gibbs samples (5)

Experiment | After that, to infer the substitution sites, we initialized the model with the final sample from a run on the small training set, and used the Gibbs sampler for 2000 iterations. |

Inference | In each splitting step, we use two types of blocked MCMC algorithm: the sentence-level blocked Metropolis-Hastings (MH) sampler and the tree-level blocked Gibbs sampler, while (Petrov et al., 2006) use a different MLE-based model and the EM algorithm. |

Inference | The tree-level blocked Gibbs sampler focuses on the type of SR-TSG rules and simultaneously up- |

Inference | After the inference of symbol subcategories, we use Gibbs sampling to infer the substitution sites of parse trees as described in (Cohn and Lapata, 2009; Post and Gildea, 2009). |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- parse trees (16)
- latent variables (7)
- treebank (7)

Experiments | Posteriors are saved and averaged from 11 Gibbs samples (every 100 iterations from 9,000 to 10,000) for analysis. |

Experiments | where n refers to the averaged Gibbs samples’ counts of event tuples having frame k and a particular verb path, and N is the number of token comparisons (i.e. |

Inference | After randomly initializing all $\eta_{k,s,r,t}$, inference is performed by a blocked Gibbs sampler, alternating resamplings for three major groups of variables: the language model $(z, \phi)$, context model $(\alpha, \gamma, \beta, p)$, and the $\eta, \theta$ variables, which bottleneck between the submodels. |

Inference | find that experimenting with different models is easier in the Gibbs sampling framework. |

Inference | While Gibbs sampling for logistic normal priors is possible using auxiliary variable methods (Mimno et al., 2008; Holmes and Held, 2006; Polson et al., 2012), it can be slow to converge. |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- topic model (7)
- dependency path (6)
- Gibbs sampling (5)

Bayesian inference for PCFGs | The algorithms we give here are based on their Gibbs sampler, which in each iteration first samples parse trees

Bayesian inference for PCFGs | 1—3) for P(@ | 13,04) into the generic Gibbs sampler framework of Johnson et al. |

Bayesian inference for PCFGs | Figure 1 plots the density of F1 scores (compared to the gold standard) resulting from the Gibbs sampler, using all three approaches. |

Introduction | We show how to modify the Gibbs sampler described by Johnson et al. |

Introduction | Perhaps surprisingly, we show that Gibbs sampler as defined by Johnson et al. |

Gibbs sampling is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

- hyperparameters (9)
- parse trees (8)
- Gibbs sampler (5)

Experimental setup | This model uses the same inference procedure as our bilingual model (Gibbs sampling). |

Experimental setup | We also reimplemented the original EM version of CCM and found virtually no difference in performance when using EM or Gibbs sampling . |

Model | We use Gibbs sampling (Hastings, 1970) to draw trees for each sentence conditioned on those drawn for |

Model | This use of a tractable proposal distribution and acceptance ratio is known as the Metropolis-Hastings algorithm and it preserves the convergence guarantee of the Gibbs sampler (Hastings, 1970). |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- word-level (14)
- parallel sentences (12)
- part-of-speech (10)

Bilingual Infinite Tree Model | (2007) presented a sampling algorithm for the infinite tree model, which is based on the Gibbs sampling in the direct assignment representation for iHMM (Teh et al., 2006). |

Bilingual Infinite Tree Model | In Gibbs sampling, individual hidden state variables are resampled conditioned on all other variables. |

Bilingual Infinite Tree Model | Beam sampling does not suffer from slow convergence as in Gibbs sampling by sampling the whole state variables at once. |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- POS tags (51)
- joint model (17)
- dependency parser (11)

Experiments | our Gibbs sampling inference method for the type-based HMM, even in the absence of multilingual priors. |

Inference | 4.2 Gibbs Sampling |

Inference | To sample values (tg,z, a, [3) from their posterior (the integrand of Equation 1), we use Gibbs sampling, a Monte Carlo technique that constructs a Markov chain over a high-dimensional sample space by iteratively sampling each variable conditioned on the currently drawn sample values for the others, starting from a random initialization. |

Results | Simply using our Gibbs sampler with symmetric priors boosts the performance up to 96%. |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- hyperparameters (8)
- CLUST (6)
- bigram (4)

24. Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression

Abstract | We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. |

Evaluation | We compared the Gibbs sampling compressor (GS) against a version of maximum a posteriori EM (with Dirichlet parameter greater than 1) and a discriminative STSG based on SVM training (Cohn and Lapata, 2008) (SVM). |

The STSG Model | 3.2 Posterior inference via Gibbs sampling |

The STSG Model | We use Gibbs sampling (Geman and Geman, 1984), a Markov chain Monte Carlo (MCMC) method, to sample from the posterior (3). |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- sentence compression (15)
- SVM (10)
- overfitting (7)

Experiments | For Gibbs sampling, we use implementations available in Hu and Boyd-Graber (2012) for tLDA; and Mallet (McCallum, 2002) for LDA and pLDA. |

Inference | We use a collapsed Gibbs sampler for tree-based topic models to sample the path $y_{dn}$ and topic assignment $z_{dn}$ for word $w_{dn}$, |

Inference | For topic $z$ and path $y$, instead of variational updates, we use a Gibbs sampler within a document. |

Inference | This equation embodies how this is a hybrid algorithm: the first term resembles the Gibbs sampling term encoding how much a document prefers a topic, while the second term encodes the expectation under the variational distribution of how much a path is preferred by this topic, |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- topic models (47)
- machine translation (20)
- LDA (18)

Baselines | Here, the topics are extracted from all the documents in the *SEM 2012 shared task using the LDA Gibbs Sampling algorithm (Griffiths, 2002). |

Baselines | where Rel(w,, rm) is the weight of word w in topic rm calculated by the LDA Gibbs Sampling algorithm. |

Baselines | Topic Modeler: For estimating transition probability Pt(i,m), we employ GibbsLDA++, an LDA model using Gibbs Sampling technique for parameter estimation and inference. |

Gibbs sampling is mentioned in 4 sentences in this paper.

Topics mentioned in this paper:

- graph model (35)
- shared task (15)
- PageRank (7)

27. Empirical Study of Unsupervised Chinese Word Segmentation Methods for SMT on Large-scale Corpora

Introduction | (2010) used the local best alignment to increase the speed of the Gibbs sampling in training but the impact on accuracy was not explored. |

Introduction | To this end, we model bilingual UWS under a similar framework with monolingual UWS in order to improve efficiency, and replace Gibbs sampling with expectation maximization (EM) in training. |

Methods | EF/{;}(P(.7-"k/|.7-")) = P(J:k'|f, M) in a similar manner to the marginalization in the Gibbs sampling process which we are replacing; |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- segmenters (11)
- BLEU (9)
- bigram (7)

Approach | For constraints with higher-order structures, we use Gibbs Sampling (Geman and Geman, 1984) to approximate the expectations. |

Approach | For documents where the higher-order constraints apply, we use the same Gibbs sampler as described above to infer the most likely label assignment, otherwise, we use the Viterbi algorithm. |

Experiments | For approximation inference with higher-order constraints, we perform 2000 Gibbs sampling iterations where the first 1000 iterations are burn-in iterations. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- sentence-level (27)
- CRF (22)
- sentiment classification (20)

Experiments | We run the Gibbs samplers for 1000 iterations and update all hyper-parameters using slice sampling (Neal, 2003; Wallach, 2008) every 10 iterations. |

Latent Structure in Dialogues | We also assume symmetric Dirichlet priors on all multinomial distributions and apply collapsed Gibbs sampling. |

Latent Structure in Dialogues | All probabilities can be computed using collapsed Gibbs sampler for LDA (Griffiths |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- topic distribution (7)
- language model (6)
- LDA (6)

Experimental Setup | To improve the model’s convergence rate, we perform two initialization steps for the Gibbs sampler. |

Experimental Setup | Inference The final point estimate used for testing is an average (for continuous variables) or a mode (for discrete variables) over the last 1,000 Gibbs sampling iterations. |
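The point-estimate rule in this excerpt (average for continuous variables, mode for discrete ones) can be sketched as below. The function name `point_estimate` and the toy sample lists are hypothetical; the excerpt does not describe how the paper stores its samples, so this only illustrates the rule itself.

```python
from collections import Counter
from statistics import mean

def point_estimate(samples, discrete):
    """Summarize the retained Gibbs samples of one variable.

    Discrete variables are summarized by their most frequent value (mode),
    continuous variables by their arithmetic mean.
    """
    if discrete:
        return Counter(samples).most_common(1)[0][0]
    return mean(samples)

# Toy retained samples for one discrete and one continuous variable.
tag_samples = [2, 1, 2, 2, 3]
weight_samples = [1.0, 2.0, 3.0]

print(point_estimate(tag_samples, discrete=True))
print(point_estimate(weight_samples, discrete=False))
```

In practice the sample lists would hold the values drawn during the last retained iterations, for example the final 1,000 mentioned in the excerpt.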

Posterior Sampling | We employ Gibbs sampling, previously used in NLP by Finkel et al. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- topic model (14)
- gold standard (6)
- language model (6)

Experiments | All experiments are run with 50 iterations of Gibbs sampling to collect samples for the personas p, alternating with maximization steps for $\eta$. |

Model | Rather than adopting a fully Bayesian approach (e.g., sampling all variables), we infer these values using stochastic EM, alternating between collapsed Gibbs sampling for each p and maximizing with respect to $\eta$. |

Model | We assume the reader is familiar with collapsed Gibbs sampling as used in latent-variable NLP models. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- coreference (6)
- log-linear (6)
- coreference resolution (4)

Abstract | Recent approaches instead use more principled approximate inference techniques such as Gibbs sampling for parameter estimation. |

Evaluation | The next line shows the fertility HMM with approximate posterior computation from Gibbs sampling but with final alignment selected by the Viterbi algorithm. |

HMM alignment | estimate the posterior distribution using Markov chain Monte Carlo methods such as Gibbs sampling (Zhao and Gildea, 2010). |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- Viterbi (7)
- word alignment (7)
- dynamic programming (4)

Related work | The combination of a well-defined probabilistic model and Gibbs sampling procedure for estimation guarantee (eventual) convergence and the avoidance of degenerate solutions. |

Three selectional preference models | Following Griffiths and Steyvers (2004), we estimate the model by Gibbs sampling. |

Three selectional preference models | As suggested by the similarity between (4) and (2), the ROOTH-LDA model can be estimated by an LDA-like Gibbs sampling procedure. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- LDA (29)
- topic models (15)
- latent variables (11)

Related Work 2.1 Market Prediction and Social Media | We use collapsed Gibbs sampling (Bishop, 2006) for model inference. |

Related Work 2.1 Market Prediction and Social Media | Only non-opinion words in tweets are used for Gibbs sampling. |

Related Work 2.1 Market Prediction and Social Media | The actual topic priors for topic links are governed by the four cases of the Gibbs Sampler. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- Gibbs sampling (3)
- LDA (3)
- sentiment analysis (3)

Machine Translation as a Decipherment Task | Sampling IBM Model 3: We use point-wise Gibbs sampling to estimate the IBM Model 3 parameters. |

Word Substitution Decipherment | channel. We perform inference using point-wise Gibbs sampling (Geman and Geman, 1984). |

Word Substitution Decipherment | Parallelized Gibbs sampling: Secondly, we parallelize our sampling step using a Map-Reduce framework. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- translation model (15)
- LM (14)
- parallel data (12)

Experiments | We have also found that the Gibbs sampler does not always converge to a similar grammar. |

Inference | We employ the Gibbs sampling algorithm (Gilks et al., 1996). |

Introduction | Section 5 describes the inference algorithm based on Gibbs sampling. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- treebanks (24)
- POS tags (21)
- dependency tree (10)

Final Experiments | For our models, we ran Gibbs samplers for 2000 iterations for each configuration throwing out first 500 samples as burn-in. |

Two-Tiered Topic Model - TTM | We use Gibbs sampling which allows a combination of estimates from several local maxima of the posterior distribution. |

Two-Tiered Topic Model - TTM | We obtain DS during Gibbs sampling (in §4.1), which indicates a saliency score of each sentence s_j ∈ S, j = 1..S_D: |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- topic models (14)
- generative model (3)
- generative process (3)
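The burn-in procedure quoted in the entry above (2000 iterations, with the first 500 samples discarded) can be illustrated with a minimal sketch. This is not the Two-Tiered Topic Model sampler: the target here is a toy bivariate standard normal with correlation `rho`, chosen only because its full conditionals are known in closed form. The function name `gibbs_bivariate_normal` and all parameter names are assumptions made for illustration.

```python
import random


def gibbs_bivariate_normal(n_iter=2000, burn_in=500, rho=0.5, seed=0):
    """Toy Gibbs sampler for a standard bivariate normal with correlation rho.

    Hypothetical illustration only: it alternates draws from the two full
    conditionals x | y ~ N(rho * y, 1 - rho^2) and y | x ~ N(rho * x, 1 - rho^2),
    and discards the first `burn_in` iterations.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0                       # arbitrary starting state
    cond_sd = (1.0 - rho * rho) ** 0.5    # std. dev. of each full conditional
    kept = []
    for it in range(n_iter):
        x = rng.gauss(rho * y, cond_sd)   # resample x given the current y
        y = rng.gauss(rho * x, cond_sd)   # resample y given the new x
        if it >= burn_in:                 # keep only post-burn-in samples
            kept.append((x, y))
    return kept


samples = gibbs_bivariate_normal()
mean_x = sum(x for x, _ in samples) / len(samples)
print(len(samples), mean_x)
```

With the default settings, 1500 samples are retained; averaging over them approximates expectations under the target distribution.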

38. A joint model of word segmentation and phonological variation for English word-final /t/-deletion

Experiments 4.1 The data | To test our Gibbs sampling inference procedure, we ran it on artificial data generated according to the model itself. |

The computational model | A major insight in Goldwater’s work is that rather than sampling over the latent variables in the model directly (the number of which we don’t even know), we can instead perform Gibbs sampling over a set of boundary variables (b_1, . |

The computational model | Figure 5: The relation between the observed sequence of segments (bottom), the boundary variables b_1, ..., b_{|W|-1} the Gibbs sampler operates over (in squares), the latent sequence of surface forms and the latent sequence of underlying forms. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- Bigram (31)
- Unigram (22)
- word segmentation (19)

Introduction | The previously proposed JST model uses the sentiment prior information in the Gibbs sampling inference step that a sentiment label will only be sampled if the current word token has no prior sentiment as defined in a sentiment lexicon. |

Joint Sentiment-Topic (JST) Model | Gibbs sampling was used to estimate the posterior distribution by sequentially sampling each variable of interest, z_t and l_t here, from the distribution over |

Joint Sentiment-Topic (JST) Model | In our experiment, α was updated every 25 iterations during the Gibbs sampling procedure. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- sentiment classification (20)
- domain adaptation (15)
- LDA (12)

Experiments | The variations in the results are due to the random initialization of the Gibbs sampler. |

Proposed Seeded Models | We employ collapsed Gibbs sampling (Griffiths and Steyvers, 2004) for posterior inference. |

Related Work | (2011) relied on user feedback during Gibbs sampling iterations. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- topic models (8)
- proposed models (7)
- jointly model (4)

Experiments | Each model was run for 500 iterations of Gibbs sampling. |

Method | We use collapsed Gibbs sampling to obtain samples of the hidden variable assignment and to estimate the model parameters from these samples. |

Method | Due to space limit, we only show the derived Gibbs sampling formulas as follows. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- LDA (13)
- topic model (13)
- topic distribution (11)

Experiments | For Base-MCM and WebPrior-MCM, we run Gibbs sampler for 2000 iterations with the first 500 samples as burn-in. |

MultiLayer Context Model - MCM | Thus, we use Markov Chain Monte Carlo (MCMC) method, specifically Gibbs sampling, to model the posterior distribution P_MCM(D_u, A_ud, S_ujd | α_u^D, α_u^A, α_uj^S, β) by obtaining samples (D_u, A_ud, S_ujd) drawn from this distribution. |

MultiLayer Context Model - MCM | During Gibbs sampling, we keep track of the frequency of draws of domain, dialog act and slot indicating n-grams w_j, in M^D, M^A and M^S matrices, respectively. |

Gibbs sampling is mentioned in 3 sentences in this paper.

Topics mentioned in this paper:

- joint model (8)
- labeled data (8)
- n-grams (6)