Index of papers in Proc. ACL that mention
  • probability distribution
Nederhof, Mark-Jan and Satta, Giorgio
Definitions
Intuitively, properness ensures that where a pair of nonterminals in two synchronous strings can be rewritten, there is a probability distribution over the applicable rules.
Definitions
We say a PSCFG is consistent if p_G defines a probability distribution over the translation, or formally:
Discussion
Prefix probabilities and right prefix probabilities for PSCFGs can be exploited to compute probability distributions for the next word or part-of-speech in left-to-right incremental translation of speech, or alternatively as a predictive tool in applications of interactive machine translation, of the kind described by Foster et al.
Effective PSCFG parsing
The translation and the associated probability distribution in the resulting grammar will be the same as those in the source grammar.
Effective PSCFG parsing
Again, in the resulting grammar the translation and the associated probability distribution will be the same as those in the source grammar.
Introduction
Prefix probabilities can be used to compute probability distributions for the next word or part-of-speech.
Introduction
Prefix probabilities and right prefix probabilities for PSCFGs can be exploited to compute probability distributions for the next word or part-of-speech in left-to-right incremental translation, essentially in the same way as described by Jelinek and Lafferty (1991) for probabilistic context-free grammars, as discussed later in this paper.
Prefix probabilities
The next step will be to transform G_prefix into a third grammar G′_prefix by eliminating epsilon rules and unit rules from the underlying SCFG, and preserving the probability distribution over pairs
probability distribution is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
Both parallel and monolingual corpora are used to obtain these probability distributions over target phrases.
Generation & Propagation
If a source phrase is found in the baseline phrase table it is called a labeled phrase: its conditional empirical probability distribution over target phrases (estimated from the parallel data) is used as the label, and is sub-
Generation & Propagation
We then propagate by deriving a probability distribution over these target phrases using graph propagation techniques.
Introduction
We then limit the set of translation options for each unlabeled source phrase (§2.3), and using a structured graph propagation algorithm, where translation information is propagated from labeled to unlabeled phrases proportional to both source and target phrase similarities, we estimate probability distributions over translations for
probability distribution is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ogura, Yukari and Kobayashi, Ichiro
Experiment
Furthermore, the hyper-parameters for the topic probability distribution and the word probability distribution in LDA are α=0.5 and β=0.5, respectively.
Experiment
Here, in the case of clustering the documents based on the topic probability distribution by LDA, the topic distribution over documents θ is changed in every estimation.
Experiment
To measure the latent similarity among documents, we construct topic vectors with the topic probability distribution and then adopt the Jensen-Shannon divergence to measure it; on the other hand, in the case of using document vectors we adopt cosine similarity.
Techniques for text classification
After obtaining a collection of refined documents for classification, we adopt LDA to estimate the latent topic probability distributions over the target documents and use them for clustering.
Techniques for text classification
In this study, we use the topic probability distribution over documents to make a topic vector for each document, and then calculate the similarity among documents.
Techniques for text classification
Here, N is the number of all words in the target documents, w_{m,n} is the n-th word in the m-th document; θ is the topic probability distribution for the documents, and φ is the word probability distribution for every topic.
probability distribution is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ceylan, Hakan and Kim, Yookyung
Language Identification
For each language, we collect the n-gram counts (for n = 1 to n = 7, also using the word beginning and ending spaces) from the vocabulary of the training corpus, and then generate a probability distribution from these counts.
Language Identification
From these counts, we obtained a probability distribution for all the words in our vocabulary.
Language Identification
In Table 3, we present the top 10 results of the probability distributions obtained from the vocabulary of English, Finnish, and German corpora.
Related Work
(Sibun and Reynar, 1996) used Relative Entropy by first generating n-gram probability distributions for both training and test data, and then measuring the distance between the two probability distributions by using the Kullback-Leibler Distance.
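A minimal sketch of this Relative Entropy approach, with made-up strings in place of real training and test corpora; the function names are illustrative, not from Sibun and Reynar (1996):

from collections import Counter
import math

def ngram_distribution(text, n=2):
    """Relative-frequency distribution over character n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def kl_divergence(p, q, epsilon=1e-12):
    """Kullback-Leibler distance D(p || q) = sum_x p(x) log(p(x) / q(x))."""
    return sum(px * math.log(px / q.get(x, epsilon)) for x, px in p.items())

train = ngram_distribution("the quick brown fox jumps over the lazy dog")
test = ngram_distribution("the quiet brown cat naps near the mossy log")
print(kl_divergence(test, train))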
probability distribution is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
Experiments
While the SENSESPOTTING task has MT utility in suggesting which new domain words demand a new translation, the MOSTFREQSENSECHANGE task has utility in suggesting which words demand a new translation probability distribution when shifting to a new domain.
New Sense Indicators
Second, given a source word s, we use this classifier to compute the probability distribution of target translations (p(t|s)).
New Sense Indicators
Subsequently, we use this probability distribution to define new features for the SENSESPOTTING task.
New Sense Indicators
Entropy is the entropy of the probability distribution: −∑_t p(t|s) log p(t|s).
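A minimal sketch of this entropy feature with invented translation probabilities for a hypothetical source word:

import math

def entropy(p_t_given_s):
    """Entropy -sum_t p(t|s) log p(t|s) of a distribution {translation: prob}."""
    return -sum(p * math.log(p) for p in p_t_given_s.values() if p > 0)

# A peaked distribution has low entropy; a flat one has high entropy.
print(entropy({"maison": 0.9, "domicile": 0.1}))                      # ~0.33
print(entropy({"maison": 0.25, "domicile": 0.25,
               "foyer": 0.25, "logement": 0.25}))                     # ~1.39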
probability distribution is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Spiegler, Sebastian and Flach, Peter A.
Probabilistic generative model
If a generative model is fully parameterised, it can be reversed to find the underlying word decomposition by forming the conditional probability distribution Pr(Y|X)
Probabilistic generative model
The first component of the equation above is the probability distribution over non-/boundaries Pr(b_ji).
Probabilistic generative model
We assume that a boundary in i is inserted independently of the other boundaries (zero-order) and of the graphemic representation of the word, but is conditioned on the length of the word m_j, which means that the probability distribution is in fact Pr(b_ji|m_j).
probability distribution is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Hingmire, Swapnil and Chakraborti, Sutanu
Background 3.1 LDA
Draw a word: w_{d,n} ∼ Multinomial(φ_{z_{d,n}}), where T is the number of topics, φ_t is the word probabilities for topic t, θ_d is the topic probability distribution, z_{d,n} is the topic assignment and w_{d,n} is the word assignment for the n-th word position in document d, respectively.
Experimental Evaluation
(a) Infer a probability distribution θ_d over class labels using M_D using Equation 3.
Introduction
We use the labeled topics to find the probability distribution of each training document over the class labels.
Topic Sprinkling in LDA
We use this new model to infer the probability distribution of each unlabeled training document over the class labels.
Topic Sprinkling in LDA
While classifying a test document, its probability distribution over class labels is inferred using TS-LDA model and it is classified to its most probable class label.
probability distribution is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Metallinou, Angeliki and Bohus, Dan and Williams, Jason
Generative state tracking
(1996)) models the conditional probability distribution of the label y given features x, p(y|x), via an exponential model of the form:
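The excerpt is cut off before the equation; for reference, a standard maximum-entropy (log-linear) form of the kind it refers to, written here from the usual Berger et al. (1996) formulation rather than copied from this paper, is

p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big),

where the f_i are feature functions and the λ_i their weights.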
Introduction
The task is to assign a probability distribution over the G dialog state hypotheses, plus a meta-hypothesis which indicates that none of the G hypotheses is correct.
Introduction
Also note that the dialog state tracker is not predicting the contents of the dialog state hypotheses; the dialog state hypotheses contents are given by some external process, and the task is to predict a probability distribution over them, where the probability assigned to a hypothesis indicates the probability that it is correct.
Introduction
Dialog state tracking can be seen as analogous to assigning a probability distribution over items on an ASR N-best list given speech input and the recognition output, including the contents of the N-best list.
probability distribution is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Goldberg, Yoav and Tsarfaty, Reut
A Generative PCFG Model
(1996) who consider the kind of probabilities a generative parser should get from a PoS tagger, and conclude that these should be P(w|t) “and nothing fancier”. In our setting, therefore, the lattice is not used to induce a probability distribution on a linear context; rather, it is used as a common denominator of state-indexation of all segmentation possibilities of a surface form.
A Generative PCFG Model
We smooth Pr_f(p → (s, p)) for rare and OOV segments (s ∈ l, l ∈ L, s unseen) using a “per-tag” probability distribution over rare segments, which we estimate using relative frequency estimates for once-occurring segments.
Discussion and Conclusion
The overall performance of our joint framework demonstrates that a probability distribution obtained over mere syntactic contexts using a Treebank grammar and a data-driven lexicon outperforms upper bounds proposed by previous joint disambiguation systems and achieves segmentation and parsing results on a par with state-of-the-art standalone applications results.
Model Preliminaries
Given that weights on all outgoing arcs sum up to one, weights induce a probability distribution on the lattice paths.
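As a toy illustration (not from the paper) of why locally normalized arc weights induce a probability distribution over lattice paths, the sketch below enumerates the paths of a tiny two-step lattice with invented segment labels and weights and checks that their probabilities sum to one:

# Weights on the outgoing arcs of every state sum to one, so the product
# of arc weights along a path is a probability and all paths sum to one.
lattice = {
    "start": [("seg_a", 0.7, "mid"), ("seg_b", 0.3, "mid")],
    "mid":   [("seg_c", 0.4, "end"), ("seg_d", 0.6, "end")],
    "end":   [],
}

def path_probabilities(state, prob=1.0, path=()):
    """Enumerate all complete paths from `state` with their probabilities."""
    if not lattice[state]:
        yield path, prob
        return
    for label, weight, nxt in lattice[state]:
        yield from path_probabilities(nxt, prob * weight, path + (label,))

paths = list(path_probabilities("start"))
for labels, p in paths:
    print(" ".join(labels), p)
print("total:", sum(p for _, p in paths))  # ~1.0 up to floating point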
probability distribution is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Klein, Dan
Learning
Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006).
Message Approximation
An alternative approach might be to simply treat messages as unnormalized probability distributions, and to minimize the KL divergence be-
Message Approximation
tween some approximating message and the true message. However, messages are not always probability distributions and — because the number of possible strings is in principle infinite — they need not sum to a finite number. Instead, we propose to minimize the KL divergence between the “expected” marginal distribution and the approximated “expected” marginal distribution:
Message Approximation
The procedure for calculating these statistics is described in Li and Eisner (2009), which amounts to using an expectation semiring (Eisner, 2001) to compute expected transitions in τ ∘ π* under the probability distribution τ ∘ μ.
probability distribution is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Silberer, Carina and Ferrari, Vittorio and Lapata, Mirella
Attribute-based Classification
For each image i_w ∈ I_w of concept w, we output an F-dimensional vector containing prediction scores score_a(i_w) for attributes a = 1, ..., F. We transform these attribute vectors into a single vector p_w ∈ [0,1]^{1×F} by computing the centroid of all vectors for concept w. The vector is normalized to obtain a probability distribution over attributes given w:
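A minimal sketch of the centroid-and-normalize step described above, with invented prediction scores (the real attribute classifiers and score ranges are not reproduced here):

import numpy as np

# Rows: images i_w of one concept w; columns: F attribute prediction scores.
scores = np.array([
    [0.9, 0.1, 0.4],
    [0.8, 0.2, 0.2],
    [0.7, 0.3, 0.3],
])

centroid = scores.mean(axis=0)     # one F-dimensional vector for concept w
p_w = centroid / centroid.sum()    # normalize into a distribution over attributes
print(p_w, p_w.sum())              # the entries sum to 1.0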
Attribute-based Semantic Models
Let P ∈ [0,1]^{N×F} denote a visual matrix, representing a probability distribution over visual attributes for each word.
Experimental Setup
We can thus compute the probability distribution over associates for each cue.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hui and Chiang, David
Count distributions
In the E step of EM, we compute a probability distribution (according to the current model) over all possible completions of the observed data, and the expected counts of all types, which may be fractional.
Word Alignment
The IBM models and related models define probability distributions p(a, f | e, θ), which model how likely a French sentence f is to be generated from an English sentence e with word alignment a.
Word Alignment
Different models parameterize this probability distribution in different ways.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Fleischman, Michael and Roy, Deb
Introduction
In the second phase, a conditional probability distribution is estimated that describes the probability that a word was uttered given such event representations.
Linguistic Mapping
We model this relationship, much like traditional language models, using conditional probability distributions.
Linguistic Mapping
The model assumes that every document is made up of a mixture of topics, and that each word in a document is generated from a probability distribution associated with one of those topics.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kang, Jun Seok and Feng, Song and Akoglu, Leman and Choi, Yejin
Pairwise Markov Random Fields and Loopy Belief Propagation
and x to observed ones X (variables with known labels, if any), our objective function is associated with the following joint probability distribution
Pairwise Markov Random Fields and Loopy Belief Propagation
A message m_{i→j} is sent from node i to node j and captures the belief of i about j, which is the probability distribution over the labels of j; i.e.
Pairwise Markov Random Fields and Loopy Belief Propagation
what i “thinks” j’s label is, given the current label of i and the type of the edge that connects i and j. Beliefs refer to marginal probability distributions of nodes over labels; for example, b_i(y_i) denotes the belief of node i having label y_i.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kaufmann, Tobias and Pfister, Beat
Introduction
important reason for the success of these models is the fact that they are lexicalized: the probability distributions are also conditioned on the actual words occurring in the utterance, and not only on their parts of speech.
Language Model 2.1 The General Approach
P was modeled by means of a dedicated probability distribution for each conditioning tag.
Language Model 2.1 The General Approach
The resulting probability distributions were trained on the German TIGER treebank which consists of about 50000 sentences of newspaper text.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Beltagy, Islam and Erk, Katrin and Mooney, Raymond
Background
MLNs define a probability distribution over possible worlds, where a world’s probability increases exponentially with the total weight of the logical clauses that it satisfies.
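For reference, the standard Markov logic network distribution over possible worlds that this excerpt describes (following Richardson and Domingos; not quoted from this paper) is

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),

where n_i(x) is the number of true groundings of formula i in world x, w_i is its weight, and Z is the normalizing constant.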
Background
Given a set of weighted logical formulas, PSL builds a graphical model defining a probability distribution over the continuous space of values of the random variables in the model.
Background
Using distance to satisfaction, PSL defines a probability distribution over all possible interpretations I of all ground atoms.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Baroni, Marco and Dinu, Georgiana and Kruszewski, Germán
Distributional semantic models
The word2vec toolkit implements two efficient alternatives to the standard computation of the output word probability distributions by a softmax classifier.
Distributional semantic models
Hierarchical softmax is a computationally efficient way to estimate the overall probability distribution using an output layer that is proportional to log(unigram.perplexity(W)) instead of W (for W the vocabulary size).
Introduction
Allocation (LDA) models (Blei et al., 2003; Griffiths et al., 2007), where parameters are set to optimize the joint probability distribution of words and documents.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhu, Zede and Li, Miao and Chen, Lei and Yang, Zhenxin
Bilingual LDA Model
denotes the vocabulary probability distribution in the topic k; M denotes the document number; θ_m
Bilingual LDA Model
denotes the topic probability distribution in the document m; N_m denotes the length of m
Introduction
Preiss (2012) transformed the source language topical model to the target language and classified the probability distribution of topics in the same language; the shortcoming is that the effect of model translation seriously hampers the quality of the comparable corpora.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tanigaki, Koichi and Shiba, Mitsuteru and Munaka, Tatsuji and Sagisaka, Yoshinori
Simultaneous Optimization of All-words WSD
Shadow thickness and surface height represent the composite probability distribution of all twelve kernels.
Simultaneous Optimization of All-words WSD
The cluster centers are located at the means of the hypotheses, including miscellaneous unintended alternatives; thus the estimated probability distribution is, roughly speaking, offset toward the center of WordNet, which is not what we want.
Smoothing Model
Figure 1: Proposed probability distribution model for context-to-sense mapping space.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kummerfeld, Jonathan K. and Roesner, Jessika and Dawborn, Tim and Haggerty, James and Curran, James R. and Clark, Stephen
Background
The C&C supertagger is similar to the Ratnaparkhi (1996) tagger, using features based on words and POS tags in a five-word window surrounding the target word, and defining a local probability distribution over supertags for each word in the sentence, given the previous two supertags.
Background
Alternatively the Forward-Backward algorithm can be used to efficiently sum over all sequences, giving a probability distribution over supertags for each word which is conditional only on the input sentence.
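As an illustration only (not the C&C supertagger itself), the sketch below runs the Forward-Backward algorithm over a toy tag set with invented transition and emission scores to obtain, for each word, a distribution over tags conditioned on the whole input:

import numpy as np

tags = ["A", "B"]
emit = np.array([[0.7, 0.2, 0.6],    # emit[t][i]: score of tag t at position i
                 [0.3, 0.8, 0.4]])
trans = np.array([[0.6, 0.4],        # trans[prev][cur]: transition score
                  [0.5, 0.5]])

T, N = emit.shape[1], len(tags)
alpha = np.zeros((T, N))
beta = np.zeros((T, N))
alpha[0] = emit[:, 0]
for i in range(1, T):                          # forward pass
    alpha[i] = emit[:, i] * (alpha[i - 1] @ trans)
beta[-1] = 1.0
for i in range(T - 2, -1, -1):                 # backward pass
    beta[i] = trans @ (emit[:, i + 1] * beta[i + 1])

marginals = alpha * beta
marginals /= marginals.sum(axis=1, keepdims=True)
print(marginals)   # one distribution over tags per word position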
Results
Note that these are all alternative methods for estimating the local log-linear probability distributions used by the Ratnaparkhi-style tagger.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bhat, Suma and Sproat, Richard
Novel Estimator of Vocabulary size
sequence drawn according to a probability distribution P from a large, but finite, vocabulary Ω.
Novel Estimator of Vocabulary size
Our main interest is in probability distributions P
Novel Estimator of Vocabulary size
In particular, the authors consider a sequence of vocabulary sets and probability distributions, indexed by the observation size n. Specifically, the observation (X_1, .
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Introduction
We investigate the use of distributional representations, which model the probability distribution of a word’s context, as techniques for finding smoothed representations of word sequences.
Smoothing Natural Language Sequences
If V is the vocabulary, or the set of word types, and X is a sequence of random variables over V, the left and right context of X_i = v may each be represented as a probability distribution over V: P(X_{i-1} | X_i = v) and P(X_{i+1} | X_i = v), respectively.
Smoothing Natural Language Sequences
We then normalize each vector to form a probability distribution.
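A toy sketch of this distributional representation, with a made-up corpus and target word: count the left and right neighbours of a word type and normalize each count vector into a distribution, i.e. estimates of P(X_{i-1} | X_i = v) and P(X_{i+1} | X_i = v):

from collections import Counter

corpus = "the cat sat on the mat while the cat slept".split()
v = "cat"

left = Counter(corpus[i - 1] for i in range(1, len(corpus)) if corpus[i] == v)
right = Counter(corpus[i + 1] for i in range(len(corpus) - 1) if corpus[i] == v)

def normalize(counts):
    """Turn a count vector into a probability distribution."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(normalize(left))    # {'the': 1.0}
print(normalize(right))   # {'sat': 0.5, 'slept': 0.5}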
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Foster, George
Vector space model adaptation
Thus, we get the probability distribution of a phrase pair or the phrase pairs in the dev data across all subcorpora:
Vector space model adaptation
To further improve the similarity score, we apply absolute discounting smoothing when calculating the probability distributions p_i(f, e).
Vector space model adaptation
We carry out the same smoothing for the probability distributions p_i(dev).
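A rough sketch of absolute discounting applied to the counts of one phrase pair across subcorpora; the counts, the discount value, and the uniform redistribution of the freed mass are assumptions for illustration, not necessarily the paper's exact scheme:

def absolute_discount(counts, d=0.5):
    """Subtract d from each positive count, spread the freed mass uniformly
    over all bins, and renormalize into a probability distribution."""
    freed = d * sum(1 for c in counts.values() if c > 0)
    n_bins = len(counts)
    smoothed = {k: max(c - d, 0.0) + freed / n_bins for k, c in counts.items()}
    z = sum(smoothed.values())
    return {k: v / z for k, v in smoothed.items()}

counts_per_subcorpus = {"news": 7, "europarl": 2, "web": 0}
print(absolute_discount(counts_per_subcorpus))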
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wang, Ziqi and Xu, Gu and Li, Hang and Zhang, Ming
Abstract
The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word.
Introduction
The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction given the misspelled word.
Model for Candidate Generation
We define the conditional probability distribution of w_c and R(w_m, w_c) given w_m as the following log linear model:
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Schütze, Hinrich
Experimental Setup
Table 2: Key to probability distributions
Experimental Setup
Table 2 is a key to the probability distributions we use.
Introduction
Language models, probability distributions over strings of words, are fundamental to many applications in natural language processing.
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kazama, Jun'ichi and De Saeger, Stijn and Kuroda, Kow and Murata, Masaki and Torisawa, Kentaro
Background
Estimating a conditional probability distribution φ_k = p(· | w_k) as a context profile for each w_k falls into this case.
Background
When the context profiles are probability distributions, we usually utilize the measures on probability distributions such as the Jensen-Shannon (JS) divergence to calculate similarities (Dagan et al., 1994; Dagan et al., 1997).
Background
The BC is also a similarity measure on probability distributions and is suitable for our purposes as we describe in the next section.
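A minimal sketch, with toy distributions, of the two measures mentioned in these excerpts, the Jensen-Shannon divergence and the Bhattacharyya coefficient:

import math

def kl(p, q):
    return sum(p[x] * math.log(p[x] / q[x]) for x in p if p[x] > 0)

def jensen_shannon(p, q):
    """JS(p, q) = 0.5 KL(p || m) + 0.5 KL(q || m) with m the average of p and q."""
    keys = set(p) | set(q)
    p = {x: p.get(x, 0.0) for x in keys}
    q = {x: q.get(x, 0.0) for x in keys}
    m = {x: 0.5 * (p[x] + q[x]) for x in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bhattacharyya(p, q):
    """BC(p, q) = sum_x sqrt(p(x) q(x)); equals 1 for identical distributions."""
    return sum(math.sqrt(p.get(x, 0.0) * q.get(x, 0.0)) for x in set(p) | set(q))

p = {"eat": 0.5, "drink": 0.3, "read": 0.2}
q = {"eat": 0.4, "drink": 0.4, "cook": 0.2}
print(jensen_shannon(p, q), bhattacharyya(p, q))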
probability distribution is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: