Index of papers in Proc. ACL that mention
  • conditional probability
Chambers, Nathanael and Jurafsky, Daniel
Discussion
Training Size: Training data improves the conditional probability baseline, but does not help the smoothing model.
Discussion
We optimized argument cutoffs for each training size, but the model still appears to suffer from additional noise that the conditional probability baseline does not.
Discussion
High Precision Baseline: Our conditional probability baseline is very precise.
Experiments
The conditional probability is reported as Baseline.
Models
We propose a conditional probability baseline:
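The formula itself is not carried over into this index; a conditional probability baseline of this kind is usually estimated by relative frequency, roughly as follows (the symbols pred and arg are illustrative, not the paper's own notation):
\[ P(\mathit{arg} \mid \mathit{pred}) = \frac{C(\mathit{pred},\, \mathit{arg})}{C(\mathit{pred})} \]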
Models
(1999) showed that corpus frequency and conditional probability correlate with human decisions of adjective-noun plausibility, and Dagan et al. (1999) appear to propose a very similar baseline for verb-noun selectional preferences, but the paper evaluates unseen data, and so the conditional probability model is not studied.
Results
The conditional probability Baseline falls from 91.5 to 79.5, a 12% absolute drop from completely random to neighboring frequency.
Results
Accuracy is the same as recall when the model does not guess between pseudo words that have the same conditional probabilities.
conditional probability is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Li, Linlin and Roth, Benjamin and Sporleder, Caroline
Abstract
This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context.
Abstract
We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables.
Introduction
We approach the sense disambiguation task by choosing the best sense based on the conditional probability of sense paraphrases given a context.
Introduction
We propose three models which are suitable for different situations: Model I requires knowledge of the prior distribution over senses and directly maximizes the conditional probability of a sense given the context; Model II maximizes this conditional probability by maximizing the cosine value of two topic-document vectors (one for the sense and one for the context).
The Sense Disambiguation Model
Assigning the correct sense s to a target word w occurring in a context c involves finding the sense which maximizes the conditional probability of senses given a context:
The Sense Disambiguation Model
This conditional probability is decomposed by incorporating a hidden variable, topic z, introduced by the topic model.
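The decomposition is not reproduced in this index; with a latent topic z, the two conditional probabilities mentioned above would most plausibly combine as follows (notation assumed, not taken from the paper):
\[ P(s \mid c) = \sum_{z} P(s \mid z)\, P(z \mid c) \]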
The Sense Disambiguation Model
Model I directly maximizes the conditional probability of the sense given the context, where the sense is modeled as a ‘paraphrase document’ ds and the context as a ‘context document’ dc.
conditional probability is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Pinyin Input Method Model
The edge weight is the negative logarithm of the conditional probability P(Sj+1,k | Si,j) that a syllable Si,j is followed by Sj+1,k, which is given by a bigram language model of pinyin syllables:
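The formula following this sentence is cut off in the extraction; under the description above, it presumably has the negative-log bigram form below, where Si,j stands for the preceding syllable (the index notation is an assumption):
\[ w(S_{i,j},\, S_{j+1,k}) = -\log P(S_{j+1,k} \mid S_{i,j}) \]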
Pinyin Input Method Model
Similar to G8, the edges are from one syllable to all syllables next to it, and the edge weights are the conditional probabilities between them.
Pinyin Input Method Model
Thus the conditional probability between characters does not make much sense.
Related Works
They solved the typo correction problem by decomposing the conditional probability P(H|P) of Chinese character sequence H given pinyin sequence P into a language model P(wi|wi−1) and a typing model. The typing model, which was estimated on real user input data, was used for typo correction.
conditional probability is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Abstract
We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Conclusion
We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well.
Final Results
We refer to the results as Mxy, where x denotes the model number (1 for the conditional probability model and 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
Our Approach
3.1 Model-1: Conditional Probability Model
Our Approach
Our model estimates the conditional probability by interpolating a word-based model and a character-based (transliteration) model.
Our Approach
Because our overall model is a conditional probability model, joint-probabilities are marginalized using character-based prior probabilities:
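Neither equation survives extraction; under the description above, the interpolation and the joint-to-conditional conversion would take roughly the following form, where p_word is the word-based model, p_translit the character-based transliteration model, λ the interpolation weight, and p_char a character-based prior (all symbols assumed for illustration):
\[ p(t \mid s) = \lambda\, p_{\mathrm{word}}(t \mid s) + (1 - \lambda)\, p_{\mathrm{translit}}(t \mid s), \qquad p_{\mathrm{translit}}(t \mid s) = \frac{p_{\mathrm{translit}}(s, t)}{p_{\mathrm{char}}(s)} \]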
conditional probability is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wang, Ziqi and Xu, Gu and Li, Hang and Zhang, Ming
Abstract
The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word.
Introduction
The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction given the misspelled word.
Model for Candidate Generation
We define the conditional probability distribution of wc and R(wm, wc) given wm as the following log linear model:
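The model itself is not shown in the index; a standard log-linear form consistent with the description, with feature vector Φ and weight vector λ (both names assumed), would be:
\[ P\big(w_c,\, R(w_m, w_c) \mid w_m\big) = \frac{\exp\big(\lambda \cdot \Phi(w_m, w_c, R)\big)}{\sum_{(w', R')} \exp\big(\lambda \cdot \Phi(w_m, w', R')\big)} \]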
Model for Candidate Generation
wc ∈ V is a correction of wm. The objective of training would be to maximize the conditional probability P(wc, R(wm, wc) | wm) over the training data.
Model for Candidate Generation
In this paper, we assume that the transformation that actually generates the correction among all the possible transformations is the one that can give the maximum conditional probability; exactly the same criterion is also used for fast prediction.
conditional probability is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zaslavskiy, Mikhail and Dymetman, Marc and Cancedda, Nicola
Introduction
They use aligned sequences of words, called biphrases, as building blocks for translations, and score alternative candidate translations for the same source sentence based on a log-linear model of the conditional probability of target sentences given the source sentence:
Introduction
alignment a, where the alignment is a representation of the sequence of biphrases that were used in order to build T from S; the λk's are weights and ZS is a normalization factor that guarantees that p is a proper conditional probability distribution over the pairs (T, A).
Introduction
These typically include forward and reverse phrase conditional probability features log p(f|s) as well as log p(s|f), where s is the source side of the biphrase and f the target side, and the so-called “phrase penalty” and “word penalty” features, which count the number of phrases and words in the alignment.
Phrase-based Decoding as TSP
For example, in the example of Figure 3, the cost of this - machine translation - is - strange can only take into account the conditional probability of the word strange relative to the word is, but not relative to the words translation and is.
Related work
its translation has a larger conditional probability) or because the associated heuristic is less tight, hence more optimistic.
conditional probability is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Discriminative Synchronous Transduction
Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence.
Discriminative Synchronous Transduction
The conditional probability of a derivation, d, for a target translation, e, conditioned on the source, f, is given by:
Discriminative Synchronous Transduction
Given (1), the conditional probability of a target translation given the source is the sum over all of its derivations:
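The two equations referred to above are not reproduced in this index; in a standard globally normalised log-linear formulation they would read roughly as follows, with feature functions H_k, weights λ_k, and partition function Z_Λ(f) (notation assumed):
\[ p_{\Lambda}(d, e \mid f) = \frac{\exp\big(\sum_{k} \lambda_k H_k(d, e, f)\big)}{Z_{\Lambda}(f)}, \qquad p_{\Lambda}(e \mid f) = \sum_{d \in \Delta(e, f)} p_{\Lambda}(d, e \mid f) \]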
Evaluation
This is illustrated in Table 2, which shows the conditional probabilities for rules, obtained by locally normalising the rule feature weights for a simple grammar extracted from the ambiguous pair of sentences presented in DeNero et al.
Evaluation
The first column of conditional probabilities corresponds to a maximum likelihood estimate, i.e., without regularisation.
conditional probability is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
Therefore, theoretically, the conditional probability of a target translation es conditioned on the source CSS-based tree ft is given by P(es | ft).
A semantic span can include one or more eus.
The conditional probabilities of the new translation rules are calculated following (Chiang, 2005).
A semantic span can include one or more eus.
The transfer model can be represented as a conditional probability P(w | CSS). By deriving each node of the CSS, we can obtain a factored formula in which P(w | CSS) decomposes into a product over the CSS nodes.
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Mori, Shinsuke and Kawahara, Tatsuya
Alignment Methods
These models are by nature directional, attempting to find the alignments that maximize the conditional probability of the target sentence, P(e|f, a). For computational reasons, the IBM models are restricted to aligning each word on the target side to a single word on the source side.
Substring Prior Probabilities
In addition, we heuristically prune values for which the conditional probabilities P(e| f) or P(f|e) are less than some fixed value, which we set to 0.1 for the reported experiments.
Substring Prior Probabilities
To determine how to combine c(e), c(f), and c(e, f) into prior probabilities, we performed preliminary experiments testing methods proposed by previous research including plain co-occurrence counts, the Dice coefficient, and chi-squared statistics (Cromieres, 2006), as well as a new method of defining substring pair probabilities to be proportional to bidirectional conditional probabilities
Substring Prior Probabilities
Z = Σ_{e,f : c(e,f) > d} Pcooc(e|f) Pcooc(f|e). The experiments showed that the bidirectional conditional probability method gave significantly better results than all other methods, so we adopt this for the remainder of our experiments.
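A minimal sketch of the bidirectional conditional probability prior described above, computed from substring-pair co-occurrence counts; the function name, the dictionary-based representation, and the discount threshold d are illustrative assumptions, not the authors' implementation:
```python
from collections import defaultdict

def bidirectional_prior(pair_counts, d=0):
    """pair_counts: dict mapping (e, f) substring pairs to co-occurrence counts c(e, f)."""
    c_e = defaultdict(float)  # marginal counts c(e)
    c_f = defaultdict(float)  # marginal counts c(f)
    for (e, f), c in pair_counts.items():
        c_e[e] += c
        c_f[f] += c

    # Unnormalised scores Pcooc(e|f) * Pcooc(f|e), restricted to pairs with c(e, f) > d
    scores = {}
    for (e, f), c in pair_counts.items():
        if c > d:
            scores[(e, f)] = (c / c_f[f]) * (c / c_e[e])

    if not scores:
        return {}
    z = sum(scores.values())  # normaliser Z
    return {pair: s / z for pair, s in scores.items()}
```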
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Meng, Xinfan and Wei, Furu and Liu, Xiaohua and Zhou, Ming and Xu, Ge and Wang, Houfeng
Cross-Lingual Mixture Model for Sentiment Classification
The parameters to be estimated include the conditional probabilities of word to class, P(ws|c) and P(wt|c), and the word projection probabilities.
Cross-Lingual Mixture Model for Sentiment Classification
The obtained word-class conditional probability P (wt |c) can then be used to classify text in the target languages using Bayes Theorem and the Naive Bayes
Cross-Lingual Mixture Model for Sentiment Classification
Instead of estimating the word projection probabilities (P(ws|wt) and P(wt|ws)) and the conditional probabilities of word to class (P(wt|c) and P(ws|c)) simultaneously in the training procedure, we estimate them separately, since the word projection probability stays invariant when estimating the other parameters.
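As context for how such word-class conditional probabilities are applied, here is a generic Naive Bayes decision rule in log space; the data structures and names are illustrative and not tied to the cross-lingual mixture model itself:
```python
import math

def naive_bayes_classify(words, class_prior, word_given_class):
    """class_prior: dict of P(c); word_given_class: dict of dicts giving P(w|c)."""
    best_class, best_score = None, float("-inf")
    for c, prior in class_prior.items():
        score = math.log(prior)
        for w in words:
            # Skip words unseen for this class (a real system would smooth instead).
            p = word_given_class[c].get(w)
            if p is not None:
                score += math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```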
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Levy, Roger
Garden-path disambiguation under surprisal
The SURPRISAL THEORY of incremental sentence-processing difficulty (Hale, 2001; Levy, 2008a) posits that the cognitive effort required to process a given word wi of a sentence in its context is given by the simple information-theoretic measure of the log of the inverse of the word’s conditional probability (also called its “surprisal” or “Shannon information content”) in its intra-sentential context w1...i−1 and extra-sentential context Ctxt:
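Written out, the measure described in this sentence is simply the negative log conditional probability of the word in its combined context:
\[ \mathrm{surprisal}(w_i) = \log \frac{1}{P(w_i \mid w_{1 \ldots i-1}, \mathrm{Ctxt})} = -\log P(w_i \mid w_{1 \ldots i-1}, \mathrm{Ctxt}) \]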
Garden-path disambiguation under surprisal
Letting Tj range over the possible incremental syntactic analyses of the words w1...j preceding fell, under surprisal the conditional probability of the disambiguating continuation fell can be approximated as
Garden-pathing and input uncertainty
In a surprisal model with clean, veridical input, Fodor’s conclusion is exactly what is predicted: separating a verb from its direct object with a comma effectively never happens in edited, published written English, so the conditional probability of the NP analysis should be close to zero. When uncertainty about surface input is introduced, however (due to visual noise, imperfect memory representations, and/or beliefs about possible speaker error), analyses come into play in which some parts of the true string are treated as if they were absent.
Model instantiation and predictions
Computing the surprisal incurred by the disambiguating element given an uncertain-input representation of the sentence involves a standard application of the definition of conditional probability (Hale, 2001):
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Dong and Dixon, Paul and Furui, Sadaoki
Conclusions and future work
The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S).
Conclusions and future work
The CRF conversion, given a segmentation Ai, provides a list of transliteration outputs T1, T2, ..., TM, with conditional probabilities P(T1|S, Ai), P(T2|S, Ai), ..., P(TM|S, Ai).
Experiments
From the two-step CRF model we get the conditional probability PCRF(T|S) and from the JSCM we get the joint probability P(S, T).
Experiments
The conditional probability PJSCM(T|S) can be calculated as follows:
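The calculation that follows in the paper is not shown here; given that the JSCM supplies the joint probability P(S, T), the conversion presumably has the usual form:
\[ P_{\mathrm{JSCM}}(T \mid S) = \frac{P(S, T)}{P(S)} = \frac{P(S, T)}{\sum_{T'} P(S, T')} \]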
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Method
The parser produces prefix probabilities for each word of a sentence which we converted to conditional probabilities by dividing each current probability by the previous one.
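A minimal sketch of the conversion described above from per-word prefix probabilities to conditional probabilities (and hence surprisals); the list-based interface is an assumption for illustration:
```python
import math

def prefix_to_conditional(prefix_probs):
    """prefix_probs[k] = P(w_1 ... w_{k+1}); returns P(w_{k+1} | w_1 ... w_k) for each word."""
    conditionals = []
    previous = 1.0  # the empty prefix has probability 1
    for p in prefix_probs:
        conditionals.append(p / previous)
        previous = p
    return conditionals

def surprisals(prefix_probs):
    """Surprisal of each word: negative log of its conditional probability."""
    return [-math.log(c) for c in prefix_to_conditional(prefix_probs)]
```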
Models of Processing Difficulty
Backward transitional probability is essentially the conditional probability of a word given its immediately preceding word, P(wk|wk_1).
Models of Processing Difficulty
Analogously, forward probability is the conditional probability of the current word given the next word, P(wk|wk+1).
Models of Processing Difficulty
This measure of processing cost for an input word, wk+1, given the previous context, W1...Wk, can be expressed straightforwardly in terms of its conditional probability as:
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pervouchine, Vladimir and Li, Haizhou and Lin, Bo
Experiments
The joint probability is expressed as a chain product of a series of conditional probabilities of token pairs: P({ei}, {cj}) = ∏k P((ek, ck) | (ek−1, ck−1)), k = 1, ..., K.
Experiments
The conditional probabilities for token pairs are estimated from the aligned training corpus.
Transliteration alignment techniques
From the affinity matrix conditional probabilities P(ei|cj) can be estimated as
Transliteration alignment techniques
We estimate the conditional probability of Chinese phoneme cpk, after observing English character e, as
conditional probability is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lucas, Michael and Downey, Doug
Problem Definition
The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates.
Problem Definition
MNB-FM improves the conditional probability estimates in MNB and, surprisingly, we found that it can often improve these estimates for words that do not even occur in the training set.
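For reference, the conditional probability estimate that standard MNB forms from labelled data alone (and that MNB-FM then adjusts using frequency marginals from unlabeled data) is the usual smoothed relative frequency; the add-one smoothing shown here is an assumption:
\[ \hat{P}(w \mid c) = \frac{N_{wc} + 1}{\sum_{w'} N_{w'c} + |V|} \]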
Problem Definition
For each word in the vocabulary, we compared each classifier’s conditional probability ratios, i.e.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Liu, Kang and Xu, Liheng and Zhao, Jun
Opinion Target Extraction Methodology
Then the conditional probabilities between a potential opinion target wt and a potential opinion word wo can be estimated as
Opinion Target Extraction Methodology
P(wt|wo) denotes the conditional probability between the two words.
Opinion Target Extraction Methodology
At the same time, we can obtain the conditional probability P(wo|wt).
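A minimal sketch of how such conditional probabilities can be estimated in both directions from co-occurrence counts; the count-based estimator is an illustrative assumption rather than the paper's exact estimation procedure:
```python
def cooccurrence_conditionals(pair_counts):
    """pair_counts: dict mapping (target_word, opinion_word) to co-occurrence counts."""
    target_totals, opinion_totals = {}, {}
    for (wt, wo), c in pair_counts.items():
        target_totals[wt] = target_totals.get(wt, 0) + c
        opinion_totals[wo] = opinion_totals.get(wo, 0) + c

    # P(wt|wo) and P(wo|wt) as simple relative frequencies
    p_t_given_o = {(wt, wo): c / opinion_totals[wo] for (wt, wo), c in pair_counts.items()}
    p_o_given_t = {(wt, wo): c / target_totals[wt] for (wt, wo), c in pair_counts.items()}
    return p_t_given_o, p_o_given_t
```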
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tian, Zhenhua and Xiang, Hengheng and Liu, Ziqi and Zheng, Qinghua
RSP: A Random Walk Model for SP
We suppose this could bring at least two benefits: 1) a proper measure on the preferences can make the discovery of nearby predicates with similar preferences more accurate; 2) during propagation, we propagate the scored preferences rather than the raw counts or conditional probabilities, which could be more appropriate and better agree with the nature of SP smoothing.
RSP: A Random Walk Model for SP
Some introduce conditional probability (CP) p(a|q) for the decision of preference judgements (Chambers and Jurafsky, 2010; Erk et al., 2010; Seaghdha, 2010).
RSP: A Random Walk Model for SP
In this mode, we always set Pr(q, a) to the conditional probability p(a|q) for the propagation function, regardless of which Ψ is used for the distance function.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Flati, Tiziano and Navigli, Roberto
Experiment 1: Oxford Lexical Predicates
For each lexical predicate we calculated the conditional probability of each semantic class using Formula 1, resulting in a ranking of semantic classes.
Large-Scale Harvesting of Semantic Predicates
Since not all classes are equally relevant to the lexical predicate π, we estimate the conditional probability of each class c ∈ C given π on the basis of the number of sentences which contain an argument in that class.
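The formula (referred to as Formula 1 above) is not reproduced in this index; based on the description, the estimate is presumably a sentence-count ratio of roughly this form (notation assumed):
\[ P(c \mid \pi) = \frac{|\mathrm{sentences}(\pi, c)|}{\sum_{c' \in C} |\mathrm{sentences}(\pi, c')|} \]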
Large-Scale Harvesting of Semantic Predicates
To enable wide coverage we estimate a second conditional probability based on the distributional semantic profile of each class.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhu, Zede and Li, Miao and Chen, Lei and Yang, Zhenxin
Building comparable corpora
3.2 Conditional Probability
Building comparable corpora
The similarity between mS and mT is defined as the Conditional Probability (CP) of documents
Introduction
method of conditional probability to calculate document similarity; 4) Address a language-independent study which isn’t limited to a particular data source in any language.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bendersky, Michael and Croft, W. Bruce and Smith, David A.
Independent Query Annotations
The most straightforward way to estimate the conditional probabilities in Eq.
Independent Query Annotations
Given a short, often ungrammatical query, it is hard to accurately estimate the conditional probability in Eq.
Independent Query Annotations
Furthermore, to make the estimation of the conditional probability p(zQ|r) feasible, it is assumed that the symbols in the annotation sequence are independent, given a sentence r.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Neubig, Graham and Watanabe, Taro and Sumita, Eiichiro and Mori, Shinsuke and Kawahara, Tatsuya
Phrase Extraction
Similarly to the heuristic phrase tables, we use conditional probabilities Pt(f|e) and Pt(e|f), lexical weighting probabilities, and a phrase penalty.
Phrase Extraction
Here, instead of using maximum likelihood, we calculate conditional probabilities directly from Pt probabilities:
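The calculation that follows in the paper is omitted here; computing conditional probabilities directly from the model's joint phrase probabilities presumably amounts to the usual marginalisation:
\[ P_t(e \mid f) = \frac{P_t(e, f)}{\sum_{e'} P_t(e', f)}, \qquad P_t(f \mid e) = \frac{P_t(e, f)}{\sum_{f'} P_t(e, f')} \]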
Phrase Extraction
In MOD, we do this by taking the average of the joint probability and span probability features, and recalculating the conditional probabilities from the averaged joint probabilities.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Chambers, Nathanael and Jurafsky, Dan
Information Extraction: Slot Filling
A document is labeled for a template if two different conditions are met: (1) it contains at least one trigger phrase, and (2) its average per-token conditional probability meets a strict threshold.
Information Extraction: Slot Filling
Both conditions require a definition of the conditional probability of a template given a token.
Information Extraction: Slot Filling
This is not the usual conditional probability definition, as the IR corpora are of different sizes.
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Liu, Qun
Decoding
The conditional probability P(o | TC) decomposes into the product of rule probabilities:
Decoding
where the first three are conditional probabilities based on fractional counts of rules defined in Section 3.4, and the last two are lexical probabilities.
Rule Extraction
We use fractional counts to compute three conditional probabilities for each rule, which will be used in the next section:
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Klein, Dan
Inference of Cognate Assignments
The edge affinities between these nodes are the conditional probabilities of each word w belonging to each cognate group g:
Learning
Because we enforce that a word in language d must be dead if its parent word in language a is dead, we just need to learn the conditional probabilities p(Sd = dead|Sa = alive).
Learning
Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006).
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Yang, Qiang and Chen, Yuqiang and Xue, Gui-Rong and Dai, Wenyuan and Yu, Yong
Image Clustering with Annotated Auxiliary Data
Then we apply the EM algorithm (Dempster et al., 1977) to estimate the conditional probabilities P(f|z) and P(z|v) with respect to each dependence in Figure 3 as follows.
Image Clustering with Annotated Auxiliary Data
• M-Step: re-estimates the conditional probabilities P(zk|vi) and P(zk|wl):
Image Clustering with Annotated Auxiliary Data
and the conditional probability P(fj|zk), which is a mixture portion of the posterior probability of the latent variables
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tomanek, Katrin and Hahn, Udo
Active Learning for Sequence Labeling
We apply CRFs as our base learner throughout this paper and employ a utility function which is based on the conditional probability of the most likely label sequence y* for an observation sequence x (cf.
Conditional Random Fields for Sequence Labeling
See Equation (1) for the conditional probability P(y|x), with Z(x) calculated as in Equation (6).
Conditional Random Fields for Sequence Labeling
The marginal and conditional probabilities are used by our AL approaches as confidence estimators.
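For reference, the conditional probability that a linear-chain CRF assigns to a label sequence, which these confidence estimators build on, has the standard form below; the feature functions f_k and weights λ_k are generic notation, not necessarily the paper's:
\[ P(y \mid x) = \frac{1}{Z(x)} \exp\Big(\sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t)\Big) \]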
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Szarvas, Gy"orgy
Methods
Being a stopword in our case, and having no relevance at all to speculative assertions, it has a class conditional probability of P(spec|it) = 74.67% on the seed sets.
Methods
We ranked the features c by frequency and by their class conditional probability P(spec|c).
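A minimal sketch of the class conditional probability used for ranking here, estimated as the fraction of a feature's occurrences that fall in speculative sentences; the function and argument names are illustrative assumptions:
```python
def class_conditional_probability(feature, spec_sentences, all_sentences):
    """Estimate P(spec | feature) as count(feature in speculative sentences) / count(feature overall).

    Each sentence is assumed to be a collection of tokens; spec_sentences is a subset of all_sentences.
    """
    in_spec = sum(1 for s in spec_sentences if feature in s)
    total = sum(1 for s in all_sentences if feature in s)
    return in_spec / total if total else 0.0
```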
Methods
Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features).
conditional probability is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: