Abstract | Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. |
Conclusion | We have shown that contextualized distributional semantic vectors can be successfully integrated into a generative probabilistic model for domain modelling, as demonstrated by improvements in slot induction and multi-document summarization. |
Introduction | Generative probabilistic models have been one popular approach to content modelling. |
Introduction | In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling. |
Introduction | First, they provide domain-general representations of word meaning that cannot be reliably estimated from the small target-domain corpora on which probabilistic models are trained. |
Related Work | Cheung et al. (2013) propose PROFINDER, a probabilistic model for frame induction inspired by content models. |
Related Work | Our work is similar in that we assume much of the same structure within a domain and consequently in the model as well (Section 3), but whereas PROFINDER focuses on finding the “correct” number of frames, events, and slots with a nonparametric method, this work focuses on integrating global knowledge in the form of distributional semantics into a probabilistic model. |
Related Work | Combining distributional information and probabilistic models has actually been explored in previous work. |
Abstract | We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. |
Abstract | We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design. |
Applications | In this section we present a probabilistic model for an OSTAG grammar in PCFG form that can be used in such algorithms, and show that many parameters of this PCFG can be pooled or set equal to one and ignored. |
Introduction | Using a context-free language model with proper phrase bracketing, the connection between the words pretzels and thirsty must be recorded with three separate patterns, which can lead to poor generalizability and unreliable sparse frequency estimates in probabilistic models. |
Introduction | Using an automatically induced Tree-Substitution Grammar (TSG), we heuristically extract an OSTAG and estimate its parameters from data using various reduced probabilistic models of adjunction. |
TAG and Variants | A simple probabilistic model for a TSG is a set of multinomials, one for each nonterminal in N corresponding to its possible substitutions in R. A more flexible model allows a potentially infinite number of substitution rules using a Dirichlet Process (Cohn et al., 2009; Cohn and Blunsom, 2010). |
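A hedged sketch of the two parameterizations just described (the symbols $\theta_A$, $\alpha$, and $P_0$ are illustrative, not taken from the source): in the finite case an elementary tree $e$ rooted at nonterminal $A$ is drawn as $e \sim \mathrm{Multinomial}(\theta_A)$, one multinomial per nonterminal; in the nonparametric variant the distribution itself is drawn, $G_A \sim \mathrm{DP}(\alpha, P_0(\cdot \mid A))$ and then $e \sim G_A$, so the set of substitution rules for $A$ is unbounded.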
TAG and Variants | As such, probabilistic modeling for TAG in its original form is uncommon. |
TAG and Variants | Several probabilistic models have been proposed for TIG. |
Transformation to CFG | To avoid double-counting derivations, which can adversely affect probabilistic modeling, type (3) and type (4) rules in which the side with the unapplied symbol is a nonterminal leaf can be omitted. |
Abstract | We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem. |
Abstract | We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system. |
Conclusion | We found that the joint probability model performs almost as well as the conditional probability model, but that it was harder to make it work well. |
Final Results | We refer to the results as MxHy, where x denotes the model number (1 for the conditional probability model and 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model. |
Introduction | Section 3 introduces two probabilistic models, based on conditional and joint probability distributions, for integrating translations and transliterations into a translation model. |
Our Approach | 3.1 Model-1: Conditional Probability Model |
Our Approach | Because our overall model is a conditional probability model, joint probabilities are marginalized using character-based prior probabilities: |
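As a minimal sketch of such a construction (this is not the omitted equation; the symbols $e$ and $f$ are assumed for illustration): a joint transliteration probability $p(e, f)$ can be turned into a conditional one by normalizing with a marginal obtained from a character-based prior, $p(e \mid f) = p(e, f) / p(f)$, with, e.g., $p(f) = \prod_j p(f_j \mid f_{j-1})$ over the characters of $f$.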
Our Approach | 3.2 Model-2: Joint Probability Model |
Abstract | Since most current user simulations deploy probability models to mimic human user behaviors, how to set up user action probabilities in these models is a key problem to solve. |
Evaluation Measures | However, since our simulation model is a probabilistic model, the model will take an action stochastically after the same tutor turn. |
Introduction | most of these current user simulation techniques use probabilistic models to generate user actions, how to set up the probabilities in the simulations is another important problem to solve. |
Introduction | For the trained user simulations, we examine two sets of probabilities trained from user corpora of different sizes, since the amount of training data will impact the quality of the trained probability models. |
Related Work | Most current simulation models are probabilistic models in which the models simulate user actions based on dialog context features (Schatzmann et al., 2006). |
Related Work | They first cluster dialog contexts based on selected features and then build conditional probability models for each cluster. |
Related Work | In our study, we build a conditional probability model which will be described in detail in Section 3.2.1. |
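As a hedged illustration of such a conditional probability model (the notation is assumed, not taken from the source): for each context cluster $c$, the user-action distribution can be estimated by relative frequency over the training corpus, $p(a \mid c) = \mathrm{count}(c, a) / \sum_{a'} \mathrm{count}(c, a')$, and the simulated user samples its next action from this distribution at each turn.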
Conclusion | In this paper, we have presented a probabilistic model of paraphrase incorporating syntax, lexical semantics, and hidden loose alignments between two sentences’ trees. |
Experimental Evaluation | It is quite promising that a linguistically-motivated probabilistic model comes so close to a string-similarity baseline, without incorporating string-local phrases. |
Introduction | This syntactic framework represents a major departure from useful and popular surface similarity features, and the latter are difficult to incorporate into our probabilistic model. |
Introduction | We introduce our probabilistic model in §2. |
Probabilistic Model | For the present, consider it a specially-defined probabilistic model that generates sentences with a specific property, like “paraphrases s,” when c = 1.) Given s, G_c generates the other sentence in the pair, s’. |
QG for Paraphrase Modeling | It is never used for parsing or for generation; it is only used as a component in the generative probability model presented in §2 (Eq. |
Abstract | We employ two algorithms which come from a family of generative probabilistic models . |
Conclusions | We introduced two algorithms for word decomposition which are based on generative probabilistic models . |
Introduction | The purpose of this paper is to analyze the underlying probabilistic models and the types of errors committed by each one. |
Probabilistic generative model | PROMODES stands for PRObabilistic MOdel for different DEgrees of Supervision. |
Probabilistic generative model | For this reason, we have extended the model, which led to PROMODES-H, a higher-order probabilistic model. |
Related work | Moreover, our probabilistic models seem to resemble Hidden Markov Models (HMMs) by having certain states and transitions. |
Abstract | Probabilistic models of sentence comprehension are increasingly relevant to questions concerning human language processing. |
Conclusions | Although our main results above underscore the usefulness of probabilistic modeling, this observation emphasizes the importance of finding a tenable link between probabilities and behaviours. |
Introduction | For example, probabilistic models shed light on so-called locality effects: contrast the non-probabilistic hypothesis that dependants which are far away from their head always cause processing difficulty for readers, due to the cost of storing the intervening material in memory (Gibson, 1998), with the probabilistic prediction that there are cases when faraway dependants facilitate processing, because readers have more time to predict the head (Levy, 2008). |
Introduction | So far, probabilistic models of sentence processing have been largely limited to syntactic factors. |
Introduction | In the literature on probabilistic modeling, though, the bulk of this work is focused on lexical semantics (e.g. |
Abstract | This paper establishes a connection between two apparently very different kinds of probabilistic models . |
Abstract | Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models. |
Conclusion | This paper establishes a connection between two very different kinds of probabilistic models: LDA models of the kind used for topic modelling, and PCFGs, which are a standard model of hierarchical structure in language. |
Introduction | This paper establishes a theoretical connection between two very different kinds of probabilistic models : Probabilistic Context-Free Grammars (PCFGs) and a class of models known as Latent Dirichlet Allocation (Blei et al., 2003; Griffiths and Steyvers, 2004) models that have been used for a variety of tasks in machine learning. |
Latent Dirichlet Allocation Models | An LDA model is an explicit generative probabilistic model of a collection of documents. |
Composite language model | A PLSA model (Hofmann, 2001) is a generative probabilistic model of word-document co-occurrences using the bag-of-words assumption, described as follows: (i) choose a document d with probability p(d); (ii) SEMANTIZER: select a semantic class g with probability p(g|d); and (iii) WORD-PREDICTOR: pick a word w with probability p(w|g). |
Composite language model | Since only one pair (d, w) is observed at a time, the joint probability model is a mixture of log-linear models with the expression p(d, w) = p(d) Σ_g p(w|g) p(g|d). Typically, the number of documents and the vocabulary size are much larger than the number of latent semantic classes. |
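A routine consequence of the generative story above, spelled out here for clarity (not a quotation of the source): under the bag-of-words assumption, the likelihood of a document $d$ with words $w_1, \ldots, w_N$ is $p(d, w_1, \ldots, w_N) = p(d) \prod_{n=1}^{N} \sum_g p(g \mid d)\, p(w_n \mid g)$.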
Training algorithm | The TAGGER and CONSTRUCTOR are conditional probabilistic models of the type p(u | z_1, ..., z_n), where u, z_1, ..., z_n belong to a mixed set of words, POS tags, NT tags, and CONSTRUCTOR actions (u only), and z_1, ..., z_n form a linear Markov chain. |
Training algorithm | The WORD-PREDICTOR is, however, a conditional probabilistic model p(w | w_{-n+1}^{-1}, h_{-m}^{-1}, g), where there are three kinds of context, w_{-n+1}^{-1}, h_{-m}^{-1}, and g, each of which forms a linear Markov chain. |
Abstract | In this work, deterministic constraints are decoded before the application of probabilistic models , therefore lookahead features are made available during Viterbi decoding. |
Abstract | Since these deterministic constraints are applied before the decoding of probabilistic models , reliably high precision of their predictions is crucial. |
Abstract | However, when tagset BMES is used, the learned constraints don’t always make reliable predictions, and the overall precision is not high enough to constrain a probabilistic model . |
Conclusions | We presented an improved probabilistic model for canonicalizing named entities into a table. |
Introduction | Here, we use a probabilistic model to infer a struc- |
Model | These choices highlight that the design of a probabilistic model can draw from both Bayesian and discriminative tools. |
Related Work | Our model is focused on the problem of canonicalizing mention strings into their parts, though its r variables (which map mentions to rows) could be interpreted as (within-document and cross-document) coreference resolution, which has been tackled using a range of probabilistic models (Li et al., 2004; Haghighi and Klein, 2007; Poon and Domingos, 2008; Singh et al., 2011). |
Introduction | The semantic component of our model learns word vectors via an unsupervised probabilistic model of documents. |
Our Model | To capture semantic similarities among words, we derive a probabilistic model of documents which learns word representations. |
Our Model | We build a probabilistic model of a document using a continuous mixture distribution over words indexed by a multidimensional random variable θ. |
Our Model | Equation 1 resembles the probabilistic model of LDA (Blei et al., 2003), which models documents as mixtures of latent topics. |
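As a hedged sketch of such a continuous-mixture document model (the log-linear form below is an assumption for illustration, not a quotation of the paper's Equation 1): $p(d) = \int p(\theta) \prod_{w \in d} p(w \mid \theta)\, d\theta$, where, for instance, $p(w \mid \theta) = \exp(\theta^{\top} \phi_w + b_w) / \sum_{w'} \exp(\theta^{\top} \phi_{w'} + b_{w'})$ ties the word distribution to learned word vectors $\phi_w$.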
Parsing Model | This g_0 is factored and taken to be a deterministic constant, and is therefore unimportant as a probability model. |
Parsing Model | This leads us to a specification of the reduce and shift probability models . |
Parsing Model | These models can be thought of as picking out a ftd first, finding the matching case, then applying the probability model that matches. |
Conclusions and Future Work | Because LDA-SP generates a complete probabilistic model for our relation data, its results are easily applicable to many other tasks such as identifying similar relations, ranking inference rules, etc. |
Introduction | Additionally, because LDA-SP is based on a formal probabilistic model , it has the advantage that it can naturally be applied in many scenarios. |
Previous Work | Previous work on selectional preferences can be broken into four categories: class-based approaches (Resnik, 1996; Li and Abe, 1998; Clark and Weir, 2002; Pantel et al., 2007), similarity-based approaches (Dagan et al., 1999; Erk, 2007), discriminative approaches (Bergsma et al., 2008), and generative probabilistic models (Rooth et al., 1999). |
Previous Work | Our solution fits into the general category of generative probabilistic models, which model each relation/argument combination as being generated by a latent class variable. |
Abstractive Caption Generation | Word-based Model Our first abstractive model builds on and extends a well-known probabilistic model of headline generation (Banko et al., 2000). |
Image Annotation | Our experiments made use of the probabilistic model presented in Feng and Lapata (2010). |
Image Annotation | Any probabilistic model with broadly similar properties could serve our purpose. |
Results | As can be seen, the probabilistic models (KL and JS divergence) outperform word overlap and cosine similarity (all differences are statistically significant, p < 0.01). They make use of the same topic model as the image annotation model, and are thus able to select sentences that cover common content. |
Abstract | This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. |
Hierarchical ITG Model | By doing so, we are able to do away with heuristic phrase extraction, creating a fully probabilistic model for phrase probabilities that still yields competitive results. |
Related Work | A generative probabilistic model where longer units are built through the binary combination of shorter units was proposed by de Marcken (1996) for monolingual word segmentation using the minimum description length (MDL) framework. |
Experimental evaluation | In the case of the probabilistic model, all models were 3-gram models. |
Experimental evaluation | With 100 training documents per user the mean token error rate is reduced by up to 40% relative by the probabilistic model . |
Experimental evaluation | On the other hand the probabilistic model suffers from a slightly higher deletion rate due to being overzealous in this regard. |
Background and Related Work | Since different derivations may produce the same parse tree, recent work on TSG induction (Post and Gildea, 2009; Cohn et al., 2010) employs a probabilistic model of a TSG and predicts derivations from observed parse trees in an unsupervised way. |
Symbol-Refined Tree Substitution Grammars | 3.1 Probabilistic Model |
Symbol-Refined Tree Substitution Grammars | We define a probabilistic model of an SR-TSG based on the Pitman-Yor Process (PYP) (Pitman and Yor, 1997), a nonparametric Bayesian model. |
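A hedged sketch of the PYP construction involved (symbols are illustrative, not taken from the source): for each symbol-refined nonterminal $A$, a distribution over elementary trees is drawn as $G_A \sim \mathrm{PYP}(d_A, \theta_A, P_0(\cdot \mid A))$, with discount $d_A$, concentration $\theta_A$, and base distribution $P_0$ over tree fragments, and each observed fragment rooted at $A$ is then drawn from $G_A$.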
Conclusions | Therefore, apart from the acoustic and language models used in conventional ASR, HVR also combines the haptic model as well as the PLI model to yield an integrated probabilistic model . |
Integration of Knowledge Sources | Therefore, these quantities can be obtained from the respective probabilistic models. |
Introduction | This framework allows coherent probabilistic models of different knowledge sources to be tightly integrated. |
Conclusions and future work | This paper has demonstrated how Bayesian techniques originally developed for modelling the topical structure of documents can be adapted to learn probabilistic models of selectional preference. |
Related work | The combination of a well-defined probabilistic model and a Gibbs sampling procedure for estimation guarantees (eventual) convergence and the avoidance of degenerate solutions. |
Three selectional preference models | Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_zn and f_zv. |
Experimental Settings | DOM-INIT W1: Noisy probabilistic model, described below. |
Experimental Settings | We used the noisy Robocup dataset to initialize DOM-INIT, a noisy probabilistic model, constructed by taking statistics over the noisy Robocup data and computing p(y|X). |
Knowledge Transfer Experiments | Domain-independent information is learned from the situated domain, and the domain-specific information available for Robocup is the simple probabilistic model (DOM-INIT). |
Abstract | Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints. |
Evaluation | The second method is PEARL with no ILP (denoted No ILP), only using the probabilistic model . |
Introduction | For cleaning out false hypotheses among the type candidates for a new entity, we devised probabilistic models and an integer linear program that considers incompatibilities and correlations among entity types. |
Abstract | We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. |
Introduction | Our framework is based on Determinantal Point Processes (DPPs, (Kulesza, 2012; Kulesza and Taskar, 2012c)), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets. |
The Unified Framework | Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks. |
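For concreteness, the standard L-ensemble formulation (a well-known fact about DPPs, not a quotation of the source): given a positive semidefinite kernel $L$ over the $N$ items, the probability of selecting subset $Y$ is $P(Y) \propto \det(L_Y)$, where $L_Y$ is the submatrix of $L$ indexed by $Y$; decomposing $L_{ij} = q_i\, \phi_i^{\top} \phi_j\, q_j$ makes explicit the per-item quality terms $q_i$ and diversity features $\phi_i$ that favour high-quality, diverse subsets.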
Conclusion | variants, an important aim of probabilistic modeling for word alignment. |
The models IBM-3, IBM-4 and IBM-5 | ∏_{k=1}^{n} k denotes the factorial of n. The main difference between IBM-3, IBM-4 and IBM-5 is the choice of probability model in step 3 b), called a distortion model. |
The models IBM-3, IBM-4 and IBM-5 | In total, the IBM-3 has the following probability model: |
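The equation referenced above is not reproduced here; for orientation, the standard IBM-3 formulation of Brown et al. (1993) takes the form
$P(f, a \mid e) = \binom{m - \phi_0}{\phi_0}\, p_0^{m - 2\phi_0}\, p_1^{\phi_0} \prod_{i=1}^{l} \phi_i!\, n(\phi_i \mid e_i) \prod_{j=1}^{m} t(f_j \mid e_{a_j}) \prod_{j:\, a_j \neq 0} d(j \mid a_j, l, m)$,
combining fertility probabilities $n$, translation probabilities $t$, and the distortion model $d$ mentioned above, with $l$ and $m$ the lengths of $e$ and $f$.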
Image Clustering with Annotated Auxiliary Data | So we design a new PLSA model by joining the probabilistic model in Equation (1) and the probabilistic model in Equation (4) into a unified model, as shown in Figure 3. |
Related Works | Probabilistic latent semantic analysis (PLSA) is a widely used probabilistic model (Hofmann, 1999), and could be considered as a probabilistic implementation of latent semantic analysis (LSA) (Deerwester et al., 1990). |
Related Works | An extension to PLSA was proposed in (Cohn and Hofmann, 2000), which incorporated the hyperlink connectivity in the PLSA model by using a joint probabilistic model for connectivity and content. |
Boosting-style algorithm | This can be used, for example, in the case where the experts are derived from probabilistic models. |
Introduction | Furthermore, these methods typically assume the use of probabilistic models , which is not a requirement in our learning scenario. |
Introduction | Other ensembles of probabilistic models have also been considered in text and speech processing by forming a product of probabilistic models via the intersection of lattices (Mohri et al., 2008), or a straightforward combination of the posteriors from probabilistic grammars trained using EM with different starting points (Petrov, 2010), or some other rather intricate techniques in speech recognition (Fiscus, 1997). |
Introduction | In this paper we use a basic neural network architecture and a lexicalized probability model to create a powerful MT decoding feature. |
Model Variations | Formally, the probability model is: |
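The equation itself is not reproduced here; as a hedged paraphrase of the usual NNJM decomposition, the lexicalized probability model factors as $P(T \mid S) \approx \prod_{i=1}^{|T|} P(t_i \mid t_{i-n+1}, \ldots, t_{i-1}, \mathcal{S}_i)$, where $\mathcal{S}_i$ is a fixed-size window of source words around the source word affiliated with target word $t_i$.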
Neural Network Joint Model (NNJ M) | far too sparse for standard probability models such as Kneser-Ney back-off (Kneser and Ney, 1995) or Maximum Entropy (Rosenfeld, 1996). |
Abstract | However, these new methods lack the rich priors associated with probabilistic models. |
Adding Regularization | For example, if we are seeking the MLE of a probabilistic model parameterized by θ, p(x|θ), adding a regularization term r(θ) = Σ_k θ_k² |
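For context, a standard fact that connects this to the priors mentioned above (not a quotation of the source): minimizing $-\log p(x \mid \theta) + \lambda \sum_k \theta_k^2$ is equivalent to MAP estimation of $\theta$ under a zero-mean Gaussian prior, which is the sense in which such regularizers recover some of the rich priors associated with probabilistic models.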
Regularization Improves Topic Models | This is the typical evaluation for probabilistic models . |