Index of papers in Proc. ACL that mention
  • probabilistic model
Cheung, Jackie Chi Kit and Penn, Gerald
Abstract
Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain.
Conclusion
We have shown that contextualized distributional semantic vectors can be successfully integrated into a generative probabilistic model for domain modelling, as demonstrated by improvements in slot induction and multi-document summarization.
Introduction
Generative probabilistic models have been one popular approach to content modelling.
Introduction
In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling.
Introduction
First, they provide domain-general representations of word meaning that cannot be reliably estimated from the small target-domain corpora on which probabilistic models are trained.
Related Work
(2013) propose PROFINDER, a probabilistic model for frame induction inspired by content models.
Related Work
Our work is similar in that we assume much of the same structure within a domain and consequently in the model as well (Section 3), but whereas PROFINDER focuses on finding the “correct” number of frames, events, and slots with a nonparametric method, this work focuses on integrating global knowledge in the form of distributional semantics into a probabilistic model.
Related Work
Combining distributional information and probabilistic models has actually been explored in previous work.
probabilistic model is mentioned in 12 sentences in this paper.
Swanson, Ben and Yamangil, Elif and Charniak, Eugene and Shieber, Stuart
Abstract
We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters.
Abstract
We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design.
Applications
In this section we present a probabilistic model for an OSTAG grammar in PCFG form that can be used in such algorithms, and show that many parameters of this PCFG can be pooled or set equal to one and ignored.
Introduction
Using a context-free language model with proper phrase bracketing, the connection between the words pretzels and thirsty must be recorded with three separate patterns, which can lead to poor generalizability and unreliable sparse frequency estimates in probabilistic models.
Introduction
Using an automatically induced Tree-Substitution Grammar (TSG), we heuristically extract an OSTAG and estimate its parameters from data using various reduced probabilistic models of adjunction.
TAG and Variants
A simple probabilistic model for a TSG is a set of multinomials, one for each nonterminal in N corresponding to its possible substitutions in R. A more flexible model allows a potentially infinite number of substitution rules using a Dirichlet Process (Cohn et al., 2009; Cohn and Blunsom, 2010).
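As a rough illustration of the per-nonterminal multinomial view described above, here is a minimal Python sketch; the grammar fragment, rule strings, and probabilities are invented for illustration and are not taken from the paper.

```python
# A TSG viewed as one multinomial per nonterminal over its substitution
# rules (elementary trees). All rules and probabilities are made up.
tsg = {
    "NP": {"(NP (DT the) NN)": 0.6, "(NP PRP)": 0.4},
    "VP": {"(VP (VBD saw) NP)": 0.7, "(VP VBD)": 0.3},
}

def rule_prob(nonterminal, rule):
    """P(rule | nonterminal) under that nonterminal's multinomial."""
    return tsg[nonterminal].get(rule, 0.0)

# A derivation's probability is the product of its substitution probabilities.
derivation = [("NP", "(NP (DT the) NN)"), ("VP", "(VP VBD)")]
p = 1.0
for nt, rule in derivation:
    p *= rule_prob(nt, rule)
print(p)  # 0.6 * 0.3 ≈ 0.18
```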
TAG and Variants
As such, probabilistic modeling for TAG in its original form is uncommon.
TAG and Variants
Several probabilistic models have been proposed for TIG.
Transformation to CFG
To avoid double-counting derivations, which can adversely affect probabilistic modeling, type (3) and type (4) rules in which the side with the unapplied symbol is a nonterminal leaf can be omitted.
probabilistic model is mentioned in 10 sentences in this paper.
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem.
Abstract
We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Conclusion
We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well.
Final Results
We refer to the results as M_xy, where x denotes the model number (1 for the conditional probability model and 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
Introduction
Section 3 introduces two probabilistic models for integrating translations and transliterations into a translation model which are based on conditional and joint probability distributions.
Our Approach
3.1 Model-1: Conditional Probability Model
Our Approach
Because our overall model is a conditional probability model, joint probabilities are marginalized using character-based prior probabilities:
Our Approach
3.2 Model-2: Joint Probability Model
probabilistic model is mentioned in 8 sentences in this paper.
Ai, Hua and Litman, Diane
Abstract
Since most current user simulations deploy probability models to mimic human user behaviors, how to set up user action probabilities in these models is a key problem to solve.
Evaluation Measures
However, since our simulation model is a probabilistic model, the model will take an action stochastically after the same tutor turn.
Introduction
Since most of these current user simulation techniques use probabilistic models to generate user actions, how to set up the probabilities in the simulations is another important problem to solve.
Introduction
For the trained user simulations, we examine two sets of probabilities trained from user corpora of different sizes, since the amount of training data will impact the quality of the trained probability models.
Related Work
Most current simulation models are probabilistic models in which the models simulate user actions based on dialog context features (Schatzmann et al., 2006).
Related Work
They first cluster dialog contexts based on selected features and then build conditional probability models for each cluster.
Related Work
In our study, we build a conditional probability model which will be described in detail in Section 3.2.1.
probabilistic model is mentioned in 7 sentences in this paper.
Das, Dipanjan and Smith, Noah A.
Conclusion
In this paper, we have presented a probabilistic model of paraphrase incorporating syntax, lexical semantics, and hidden loose alignments between two sentences’ trees.
Experimental Evaluation
It is quite promising that a linguistically-motivated probabilistic model comes so close to a string-similarity baseline, without incorporating string-local phrases.
Introduction
This syntactic framework represents a major departure from useful and popular surface similarity features, and the latter are difficult to incorporate into our probabilistic model.
Introduction
We introduce our probabilistic model in §2.
Probabilistic Model
For the present, consider it a specially-defined probabilistic model that generates sentences with a specific property, like “paraphrases s,” when θ = p.) Given s, G_θ generates the other sentence in the pair, s′.
QG for Paraphrase Modeling
It is never used for parsing or for generation; it is only used as a component in the generative probability model presented in §2 (Eq.
probabilistic model is mentioned in 6 sentences in this paper.
Spiegler, Sebastian and Flach, Peter A.
Abstract
We employ two algorithms which come from a family of generative probabilistic models.
Conclusions
We introduced two algorithms for word decomposition which are based on generative probabilistic models.
Introduction
The purpose of this paper is an analysis of the underlying probabilistic models and the types of errors committed by each one.
Probabilistic generative model
PROMODES stands for PRObabilistic Model for different DEgrees of Supervision.
Probabilistic generative model
For this reason, we have extended the model which led to PROMODES-H, a higher-order probabilistic model.
Related work
Moreover, our probabilistic models seem to resemble Hidden Markov Models (HMMs) by having certain states and transitions.
probabilistic model is mentioned in 6 sentences in this paper.
Dubey, Amit
Abstract
Probabilistic models of sentence comprehension are increasingly relevant to questions concerning human language processing.
Conclusions
Although our main results above underscore the usefulness of probabilistic modeling, this observation emphasizes the importance of finding a tenable link between probabilities and behaviours.
Introduction
For example, probabilistic models shed light on so-called locality effects: contrast the non-probabilistic hypothesis that dependants which are far away from their head always cause processing difficulty for readers due to the cost of storing the intervening material in memory (Gibson, 1998), compared to the probabilistic prediction that there are cases when faraway dependants facilitate processing, because readers have more time to predict the head (Levy, 2008).
Introduction
So far, probabilistic models of sentence processing have been largely limited to syntactic factors.
Introduction
In the literature on probabilistic modeling, though, the bulk of this work is focused on lexical semantics (e.g.
probabilistic model is mentioned in 5 sentences in this paper.
Johnson, Mark
Abstract
This paper establishes a connection between two apparently very different kinds of probabilistic models.
Abstract
Exploiting the close relationship between LDA and PCFGs just described, we propose two novel probabilistic models that combine insights from LDA and AG models.
Conclusion
This paper establishes a connection between two very different kinds of probabilistic models; LDA models of the kind used for topic modelling, and PCFGs, which are a standard model of hierarchical structure in language.
Introduction
This paper establishes a theoretical connection between two very different kinds of probabilistic models: Probabilistic Context-Free Grammars (PCFGs) and a class of models known as Latent Dirichlet Allocation (Blei et al., 2003; Griffiths and Steyvers, 2004) models that have been used for a variety of tasks in machine learning.
Latent Dirichlet Allocation Models
An LDA model is an explicit generative probabilistic model of a collection of documents.
probabilistic model is mentioned in 5 sentences in this paper.
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Composite language model
A PLSA model (Hofmann, 2001) is a generative probabilistic model of word-document co-occurrences using the bag-of-words assumption described as follows: (i) choose a document d with probability p(d); (ii) SEMANTIZER: select a semantic class g with probability p(g|d); and (iii) WORD-PREDICTOR: pick a word w with probability p(w|g).
Composite language model
Since only one pair of (d, w) is being observed, as a result, the joint probability model is a mixture of log-linear model with the expression p(d, w) = p(d) Σ_g p(w|g) p(g|d). Typically, the number of documents and vocabulary size are much larger than the size of latent semantic class variables.
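To make the mixture in the expression above concrete, here is a minimal Python sketch of the PLSA joint probability p(d, w) = p(d) Σ_g p(w|g) p(g|d); the documents, classes, and probability values are toy numbers, not estimates from the paper.

```python
# Toy PLSA-style joint probability p(d, w) = p(d) * sum_g p(w|g) * p(g|d).
# All numbers below are illustrative only.
p_d = {"d1": 0.5, "d2": 0.5}                        # document prior p(d)
p_g_given_d = {"d1": {"g1": 0.8, "g2": 0.2},        # SEMANTIZER: p(g|d)
               "d2": {"g1": 0.3, "g2": 0.7}}
p_w_given_g = {"g1": {"model": 0.1, "data": 0.05},  # WORD-PREDICTOR: p(w|g)
               "g2": {"model": 0.02, "data": 0.2}}

def joint(d, w):
    """Marginalize over the latent semantic class g."""
    return p_d[d] * sum(p_g_given_d[d][g] * p_w_given_g[g].get(w, 0.0)
                        for g in p_g_given_d[d])

print(joint("d1", "model"))  # 0.5 * (0.8*0.1 + 0.2*0.02) ≈ 0.042
```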
Training algorithm
The TAGGER and CONSTRUCTOR are conditional probabilistic models of the type p(u|z_1, …, z_n) where u, z_1, …, z_n belong to a mixed set of words, POS tags, NT tags, CONSTRUCTOR actions (u only), and z_1, …, z_n form a linear Markov chain.
Training algorithm
The WORD-PREDICTOR is, however, a conditional probabilistic model p(w|w_{-n+1}^{-1}, h_{-m+1}^{-1}, g) where there are three kinds of context, w_{-n+1}^{-1}, h_{-m+1}^{-1} and g, each of which forms a linear Markov chain.
probabilistic model is mentioned in 4 sentences in this paper.
Zhao, Qiuye and Marcus, Mitch
Abstract
In this work, deterministic constraints are decoded before the application of probabilistic models; therefore, lookahead features are made available during Viterbi decoding.
Abstract
Since these deterministic constraints are applied before the decoding of probabilistic models, reliably high precision of their predictions is crucial.
Abstract
However, when tagset BMES is used, the learned constraints don’t always make reliable predictions, and the overall precision is not high enough to constrain a probabilistic model.
probabilistic model is mentioned in 4 sentences in this paper.
Yogatama, Dani and Sim, Yanchuan and Smith, Noah A.
Conclusions
We presented an improved probabilistic model for canonicalizing named entities into a table.
Introduction
Here, we use a probabilistic model to infer a struc-
Model
These choices highlight that the design of a probabilistic model can draw from both Bayesian and discriminative tools.
Related Work
Our model is focused on the problem of canonicalizing mention strings into their parts, though its r variables (which map mentions to rows) could be interpreted as (within-document and cross-document) coreference resolution, which has been tackled using a range of probabilistic models (Li et al., 2004; Haghighi and Klein, 2007; Poon and Domingos, 2008; Singh et al., 2011).
probabilistic model is mentioned in 4 sentences in this paper.
Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher
Introduction
The semantic component of our model learns word vectors via an unsupervised probabilistic model of documents.
Our Model
To capture semantic similarities among words, we derive a probabilistic model of documents which learns word representations.
Our Model
We build a probabilistic model of a document using a continuous mixture distribution over words indexed by a multidimensional random variable θ.
Our Model
Equation 1 resembles the probabilistic model of LDA (Blei et al., 2003), which models documents as mixtures of latent topics.
probabilistic model is mentioned in 4 sentences in this paper.
Wu, Stephen and Bachrach, Asaf and Cardenas, Carlos and Schuler, William
Parsing Model
This go is factored and taken to be a deterministic constant, and is therefore unimportant as a probability model.
Parsing Model
This leads us to a specification of the reduce and shift probability models.
Parsing Model
These models can be thought of as picking out an f_t^d first, finding the matching case, and then applying the probability model that matches.
probabilistic model is mentioned in 4 sentences in this paper.
Ritter, Alan and Mausam and Etzioni, Oren
Conclusions and Future Work
Because LDA-SP generates a complete probabilistic model for our relation data, its results are easily applicable to many other tasks such as identifying similar relations, ranking inference rules, etc.
Introduction
Additionally, because LDA-SP is based on a formal probabilistic model, it has the advantage that it can naturally be applied in many scenarios.
Previous Work
Previous work on selectional preferences can be broken into four categories: class-based approaches (Resnik, 1996; Li and Abe, 1998; Clark and Weir, 2002; Pantel et al., 2007), similarity based approaches (Dagan et al., 1999; Erk, 2007), discriminative (Bergsma et al., 2008), and generative probabilistic models (Rooth et al., 1999).
Previous Work
Our solution fits into the general category of generative probabilistic models, which model each relation/argument combination as being generated by a latent class variable.
probabilistic model is mentioned in 4 sentences in this paper.
Feng, Yansong and Lapata, Mirella
Abstractive Caption Generation
Word-based Model Our first abstractive model builds on and extends a well-known probabilistic model of headline generation (Banko et al., 2000).
Image Annotation
Our experiments made use of the probabilistic model presented in Feng and Lapata (2010).
Image Annotation
Any probabilistic model with broadly similar properties could serve our purpose.
Results
As can be seen, the probabilistic models (KL and JS divergence) outperform word overlap and cosine similarity (all differences are statistically significant, p < 0.01). They make use of the same topic model as the image annotation model, and are thus able to select sentences that cover common content.
probabilistic model is mentioned in 4 sentences in this paper.
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Abstract
This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs.
Hierarchical ITG Model
By doing so, we are able to do away with heuristic phrase extraction, creating a fully probabilistic model for phrase probabilities that still yields competitive results.
Related Work
A generative probabilistic model where longer units are built through the binary combination of shorter units was proposed by de Marcken (1996) for monolingual word segmentation using the minimum description length (MDL) framework.
probabilistic model is mentioned in 3 sentences in this paper.
Bisani, Maximilian and Vozila, Paul and Divay, Olivier and Adams, Jeff
Experimental evaluation
In the case of the probabilistic model, all models were 3-gram models.
Experimental evaluation
With 100 training documents per user, the mean token error rate is reduced by up to 40% relative by the probabilistic model.
Experimental evaluation
On the other hand, the probabilistic model suffers from a slightly higher deletion rate due to being overzealous in this regard.
probabilistic model is mentioned in 3 sentences in this paper.
Shindo, Hiroyuki and Miyao, Yusuke and Fujino, Akinori and Nagata, Masaaki
Background and Related Work
Since different derivations may produce the same parse tree, recent work on TSG induction (Post and Gildea, 2009; Cohn et al., 2010) employs a probabilistic model of a TSG and predicts derivations from observed parse trees in an unsupervised way.
Symbol-Refined Tree Substitution Grammars
3.1 Probabilistic Model
Symbol-Refined Tree Substitution Grammars
We define a probabilistic model of an SR-TSG based on the Pitman-Yor Process (PYP) (Pitman and Yor, 1997), namely a sort of nonparametric Bayesian model.
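For readers unfamiliar with the PYP, here is a minimal sketch of its standard Chinese-restaurant-style predictive probabilities (textbook form with discount d and concentration θ); this is not code from the paper, and the SR-TSG model adds further structure on top of the basic process.

```python
# Pitman-Yor process predictive probabilities in Chinese-restaurant form.
# counts: customers at each existing table; d: discount in [0, 1);
# theta: concentration. Standard formulas, not paper-specific.
def pyp_predictive(counts, d, theta):
    n = sum(counts)                        # total customers seated so far
    t = len(counts)                        # number of occupied tables
    p_existing = [(c - d) / (theta + n) for c in counts]
    p_new = (theta + d * t) / (theta + n)  # probability of opening a new table
    return p_existing, p_new

existing, new = pyp_predictive([3, 1], d=0.5, theta=1.0)
print(existing, new)  # [0.5, 0.1] and 0.4; together they sum to 1.0
```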
probabilistic model is mentioned in 3 sentences in this paper.
Sim, Khe Chai
Conclusions
Therefore, apart from the acoustic and language models used in conventional ASR, HVR also combines the haptic model as well as the PLI model to yield an integrated probabilistic model.
Integration of Knowledge Sources
Therefore, fl, 5, and 7:1 can be obtained from the respective probabilistic models.
Introduction
This framework allows coherent probabilistic models of different knowledge sources to be tightly integrated.
probabilistic model is mentioned in 3 sentences in this paper.
Ó Séaghdha, Diarmuid
Conclusions and future work
This paper has demonstrated how Bayesian techniques originally developed for modelling the topical structure of documents can be adapted to learn probabilistic models of selectional preference.
Related work
The combination of a well-defined probabilistic model and Gibbs sampling procedure for estimation guarantee (eventual) convergence and the avoidance of degenerate solutions.
Three selectional preference models
Given a dataset of predicate-argument combinations and values for the hyperparameters α and β, the probability model is determined by the class assignment counts f_{z,n} and f_{z,v}.
probabilistic model is mentioned in 3 sentences in this paper.
Goldwasser, Dan and Roth, Dan
Experimental Settings
DOM-INIT W1: Noisy probabilistic model, described below.
Experimental Settings
We used the noisy Robocup dataset to initialize DOM-INIT, a noisy probabilistic model, constructed by taking statistics over the noisy Robocup data and computing p(y|X).
Knowledge Transfer Experiments
Domain-independent information is learned from the situated domain and domain-specific information (Robocup) available is the simple probabilistic model (DOM-INIT).
probabilistic model is mentioned in 3 sentences in this paper.
Nakashole, Ndapandula and Tylenda, Tomasz and Weikum, Gerhard
Abstract
Our method is based on a probabilistic model that feeds weights into integer linear programs that leverage type signatures of relational phrases and type correlation or disjointness constraints.
Evaluation
The second method is PEARL with no ILP (denoted No ILP), only using the probabilistic model.
Introduction
For cleaning out false hypotheses among the type candidates for a new entity, we devised probabilistic models and an integer linear program that considers incompatibilities and correlations among entity types.
probabilistic model is mentioned in 3 sentences in this paper.
Reichart, Roi and Korhonen, Anna
Abstract
We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering.
Introduction
Our framework is based on Determinantal Point Processes (DPPs, (Kulesza, 2012; Kulesza and Taskar, 2012c)), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets.
The Unified Framework
Determinantal point processes (DPPs) are elegant probabilistic models of repulsion that offer efficient and exact algorithms for sampling, marginalization, conditioning, and other inference tasks.
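A minimal sketch of the determinant computation that gives DPPs this behaviour, assuming the common L-ensemble parameterization (the kernel values below are invented for illustration, not taken from the papers):

```python
# L-ensemble DPP: P(S) is proportional to det(L_S), the determinant of the
# kernel restricted to subset S. Similar items make L_S near-singular, so
# diverse subsets receive higher probability mass. Kernel values are made up.
import numpy as np

L = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.1],
              [0.1, 0.1, 1.0]])

def unnormalized_dpp_prob(L, subset):
    idx = np.array(subset)
    return np.linalg.det(L[np.ix_(idx, idx)])

print(unnormalized_dpp_prob(L, [0, 1]))  # ~0.19: two very similar items
print(unnormalized_dpp_prob(L, [0, 2]))  # ~0.99: a diverse pair scores higher
```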
probabilistic model is mentioned in 3 sentences in this paper.
Schoenemann, Thomas
Conclusion
variants, an important aim of probabilistic modeling for word alignment.
The models IBM-3, IBM-4 and IBM-5
∏_{k=1}^{n} k denotes the factorial of n. The main difference between IBM-3, IBM-4 and IBM-5 is the choice of probability model in step 3 b), called a distortion model.
The models IBM-3, IBM-4 and IBM-5
In total, the IBM-3 has the following probability model:
probabilistic model is mentioned in 3 sentences in this paper.
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Image Clustering with Annotated Auxiliary Data
So we design a new PLSA model by joining the probabilistic model in Equation (1) and the probabilistic model in Equation (4) into a unified model, as shown in Figure 3.
Related Works
Probabilistic latent semantic analysis (PLSA) is a widely used probabilistic model (Hofmann, 1999), and could be considered as a probabilistic implementation of latent semantic analysis (LSA) (Deerwester et al., 1990).
Related Works
An extension to PLSA was proposed in (Cohn and Hofmann, 2000), which incorporated the hyperlink connectivity in the PLSA model by using a joint probabilistic model for connectivity and content.
probabilistic model is mentioned in 3 sentences in this paper.
Cortes, Corinna and Kuznetsov, Vitaly and Mohri, Mehryar
Boosting-style algorithm
This can be used for example in the case where the experts are derived from probabilistic models.
Introduction
Furthermore, these methods typically assume the use of probabilistic models, which is not a requirement in our learning scenario.
Introduction
Other ensembles of probabilistic models have also been considered in text and speech processing by forming a product of probabilistic models via the intersection of lattices (Mohri et al., 2008), or a straightforward combination of the posteriors from probabilistic grammars trained using EM with different starting points (Petrov, 2010), or some other rather intricate techniques in speech recognition (Fiscus, 1997).
probabilistic model is mentioned in 3 sentences in this paper.
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Introduction
In this paper we use a basic neural network architecture and a lexicalized probability model to create a powerful MT decoding feature.
Model Variations
Formally, the probability model is:
Neural Network Joint Model (NNJ M)
far too sparse for standard probability models such as Kneser-Ney back-off (Kneser and Ney, 1995) or Maximum Entropy (Rosenfeld, 1996).
probabilistic model is mentioned in 3 sentences in this paper.
Nguyen, Thang and Hu, Yuening and Boyd-Graber, Jordan
Abstract
However, these new methods lack the rich priors associated with probabilistic models.
Adding Regularization
For example, if we are seeking the MLE of a probabilistic model parameterized by θ, p(x|θ), adding a regularization term r(θ) = Σ_k θ_k^2
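As a small illustration of the point above (a sketch, not the paper's code): adding the squared-norm penalty to a negative log-likelihood corresponds, up to constants, to placing a zero-mean Gaussian prior on θ and doing MAP estimation. The objective below is a toy Gaussian likelihood chosen only for illustration.

```python
# Regularized MLE: minimize NLL(theta) + lam * sum_k theta_k^2.
# The penalty matches a zero-mean Gaussian prior on theta (up to constants),
# so the regularized optimum is a MAP estimate.
import numpy as np

def nll(theta, x):
    # toy negative log-likelihood: unit-variance Gaussian with mean vector theta
    return 0.5 * np.sum((x - theta) ** 2)

def regularized_objective(theta, x, lam):
    return nll(theta, x) + lam * np.sum(theta ** 2)

x = np.array([1.0, 2.0, 3.0])
theta = np.array([1.5, 1.5, 1.5])
print(regularized_objective(theta, x, lam=0.1))  # NLL plus the L2 penalty
```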
Regularization Improves Topic Models
This is the typical evaluation for probabilistic models.
probabilistic model is mentioned in 3 sentences in this paper.