Discussion | Training Size: Training data improves the conditional probability baseline, but does not help the smoothing model. |
Discussion | We optimized argument cutoffs for each training size, but the model still appears to suffer from additional noise that the conditional probability baseline does not. |
Discussion | High Precision Baseline: Our conditional probability baseline is very precise. |
Experiments | The conditional probability is reported as Baseline. |
Models | We propose a conditional probability baseline: |
Models | (1999) showed that corpus frequency and conditional probability correlate with human judgments of adjective-noun plausibility, and Dagan et al. (1999) appear to propose a very similar baseline for verb-noun selectional preferences; however, that paper evaluates unseen data, so the conditional probability model itself is not studied.
Results | The conditional probability Baseline falls from 91.5 to 79.5, a 12-point absolute drop from completely random to neighboring-frequency pseudo words.
Results | Accuracy is the same as recall when the model does not guess between pseudo words that have the same conditional probabilities.
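The relative-frequency form of such a conditional probability baseline can be sketched as follows; the verb-noun pair data and the function name are illustrative assumptions, not the evaluated system:

```python
from collections import Counter

def conditional_probability(pairs):
    """Estimate P(noun | verb) as count(verb, noun) / count(verb)."""
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    return {(v, n): c / verb_counts[v] for (v, n), c in pair_counts.items()}

# Toy data: "eat apple" is seen twice, "eat idea" once.
probs = conditional_probability([("eat", "apple"), ("eat", "apple"), ("eat", "idea")])
```

Under this estimate the baseline simply prefers the more frequently co-occurring argument, which helps explain why it is hard to beat on seen data.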
Abstract | This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. |
Abstract | We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. |
Introduction | We approach the sense disambiguation task by choosing the best sense based on the conditional probability of sense paraphrases given a context. |
Introduction | We propose three models which are suitable for different situations: Model I requires knowledge of the prior distribution over senses and directly maximizes the conditional probability of a sense given the context; Model II maximizes this conditional probability by maximizing the cosine value of two topic-document vectors (one for the sense and one for the context).
The Sense Disambiguation Model | Assigning the correct sense s to a target word w occurring in a context c involves finding the sense which maximizes the conditional probability of senses given a context:
The Sense Disambiguation Model | This conditional probability is decomposed by incorporating a hidden variable, topic z, introduced by the topic model.
The Sense Disambiguation Model | Model I directly maximizes the conditional probability of the sense given the context, where the sense is modeled as a ‘paraphrase document’ ds and the context as a ‘context document’ dc. |
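The decomposition behind this model, P(s|c) = sum over z of P(s|z) P(z|c), amounts to a matrix-vector product over the latent topics; the numbers below are toy values, not trained topic-model parameters:

```python
import numpy as np

def sense_given_context(p_s_given_z, p_z_given_c):
    """P(s|c) = sum_z P(s|z) P(z|c): marginalize out the latent topic z."""
    return p_s_given_z @ p_z_given_c  # (senses, topics) @ (topics,) -> (senses,)

p_s_given_z = np.array([[0.9, 0.2],   # P(s|z) for sense 0 over two topics
                        [0.1, 0.8]])  # P(s|z) for sense 1
p_z_given_c = np.array([0.7, 0.3])    # topic mixture of the context document
scores = sense_given_context(p_s_given_z, p_z_given_c)
best_sense = int(np.argmax(scores))
```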
Pinyin Input Method Model | The edge weight is the negative logarithm of the conditional probability P(Sj+1,k | Sj,m) that a syllable Sj,m is followed by Sj+1,k, which is given by a bigram language model of pinyin syllables:
Pinyin Input Method Model | Similar to G8 , the edges are from one syllable to all syllables next to it and edge weights are the conditional probabilities between them. |
Pinyin Input Method Model | Thus the conditional probability between characters does not make much sense. |
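Using -log P as the edge weight turns finding the most probable syllable sequence into a shortest-path problem; a minimal sketch with a made-up bigram table (the syllables and probabilities are assumptions):

```python
import math

# Hypothetical bigram probabilities P(next syllable | previous syllable).
bigram = {("ni", "hao"): 0.6, ("ni", "hai"): 0.4}

def edge_weight(prev, nxt):
    """Edge weight = -log P(next | prev); lower weight = more probable edge."""
    return -math.log(bigram[(prev, nxt)])
```

Summing these weights along a path and minimizing is equivalent to multiplying the conditional probabilities and maximizing.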
Related Works | They solved the typo correction problem by decomposing the conditional probability P(H|P) of Chinese character sequence H given pinyin sequence P into a language model P(wi|wi-1) and a typing model. The typing model, estimated on real user input data, was used for typo correction.
Abstract | We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Conclusion | We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well. |
Final Results | We refer to the results as Mx-hy, where x denotes the model number (1 for the conditional probability model and 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
Our Approach | 3.1 Model-1 : Conditional Probability Model |
Our Approach | Our model estimates the conditional probability by interpolating a word-based model and a character-based (transliteration) model. |
Our Approach | Because our overall model is a conditional probability model, joint-probabilities are marginalized using character-based prior probabilities: |
Abstract | The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word. |
Introduction | The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction given the misspelled word. |
Model for Candidate Generation | We define the conditional probability distribution of wc and R(wm, wc) given wm as the following log linear model:
Model for Candidate Generation | wc ∈ V is a correction of wm. The objective of training is to maximize the conditional probability P(wc, R(wm, wc) | wm) over the training data.
Model for Candidate Generation | In this paper, we assume that the transformation that actually generates the correction among all the possible transformations is the one that gives the maximum conditional probability; exactly the same criterion is also used for fast prediction.
Introduction | They use aligned sequences of words, called biphrases, as building blocks for translations, and score alternative candidate translations for the same source sentence based on a log-linear model of the conditional probability of target sentences given the source sentence: |
Introduction | alignment a, where the alignment is a representation of the sequence of biphrases that were used in order to build T from S; the λk's are weights and ZS is a normalization factor that guarantees that p is a proper conditional probability distribution over the pairs (T, A).
Introduction | These typically include forward and reverse phrase conditional probability features log p(f̃|s̃) and log p(s̃|f̃), where s̃ is the source side of the biphrase and f̃ the target side, and the so-called "phrase penalty" and "word penalty" features, which count the number of phrases and words in the alignment.
Phrase-based Decoding as TSP | For example, in the example of Figure 3, the cost of "this - machine translation - is - strange" can only take into account the conditional probability of the word strange relative to the word is, but not relative to the words translation and is.
Related work | its translation has larger conditional probability) or because the associated heuristic is less tight, hence more optimistic.
Discriminative Synchronous Transduction | Our log-linear translation model defines a conditional probability distribution over the target translations of a given source sentence. |
Discriminative Synchronous Transduction | The conditional probability of a derivation, d, for a target translation, e, conditioned on the source, f, is given by: |
Discriminative Synchronous Transduction | Given (1), the conditional probability of a target translation given the source is the sum over all of its derivations: |
Evaluation | This is illustrated in Table 2, which shows the conditional probabilities for rules, obtained by locally normalising the rule feature weights for a simple grammar extracted from the ambiguous pair of sentences presented in DeNero et al.
Evaluation | The first column of conditional probabilities corresponds to a maximum likelihood estimate, i.e., without regularisation. |
A semantic span can include one or more eus. | Therefore, theoretically, the conditional probability of a target translation es conditioned on the source CSS-based tree ft is given by P(es | ft),
A semantic span can include one or more eus. | The conditional probabilities of the new translation rules are calculated following (Chiang, 2005). |
A semantic span can include one or more eus. | The transfer model can be represented as a conditional probability: P(w | CSS) (4). By deriving each node of the CSS, we can obtain a factored formula: P(w | CSS) = ∏i P(wi | CSSi).
Alignment Methods | These models are by nature directional, attempting to find the alignments that maximize the conditional probability of the target sentence, P(e1I | f1J, a1K). For computational reasons, the IBM models are restricted to aligning each word on the target side to a single word on the source side.
Substring Prior Probabilities | In addition, we heuristically prune values for which the conditional probabilities P(e| f) or P(f|e) are less than some fixed value, which we set to 0.1 for the reported experiments. |
Substring Prior Probabilities | To determine how to combine c(e), c(f), and c(e, f) into prior probabilities, we performed preliminary experiments testing methods proposed by previous research, including plain co-occurrence counts, the Dice coefficient, and chi-squared statistics (Cromieres, 2006), as well as a new method of defining substring pair probabilities to be proportional to bidirectional conditional probabilities
Substring Prior Probabilities | Z = Σ{e,f: c(e,f)>d} Pcooc(e|f) Pcooc(f|e). The experiments showed that the bidirectional conditional probability method gave significantly better results than all other methods, so we adopt this for the remainder of our experiments.
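A sketch of this bidirectional prior, assuming Pcooc is the relative-frequency estimate c(e,f)/c(f) (and symmetrically c(e,f)/c(e)), with the count threshold d and the normalizer Z:

```python
def bidirectional_scores(counts, d=0):
    """Prior proportional to Pcooc(e|f) * Pcooc(f|e) for pairs with c(e,f) > d."""
    e_tot, f_tot = {}, {}
    for (e, f), c in counts.items():
        e_tot[e] = e_tot.get(e, 0) + c
        f_tot[f] = f_tot.get(f, 0) + c
    raw = {(e, f): (c / f_tot[f]) * (c / e_tot[e])
           for (e, f), c in counts.items() if c > d}
    z = sum(raw.values())  # the normalizer Z
    return {pair: score / z for pair, score in raw.items()}

scores = bidirectional_scores({("a", "x"): 2, ("a", "y"): 1, ("b", "y"): 1})
```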
Cross-Lingual Mixture Model for Sentiment Classification | The parameters to be estimated include conditional probabilities of word to class, P(ws|c) and P(wt|c), and word projection
Cross-Lingual Mixture Model for Sentiment Classification | The obtained word-class conditional probability P(wt|c) can then be used to classify text in the target languages using Bayes' theorem and the Naive Bayes classifier.
Cross-Lingual Mixture Model for Sentiment Classification | Instead of estimating word projection probability (P(ws|wt) and P(wt|ws)) and conditional probability of word to class (P(wt|c) and P(ws|c)) simultaneously in the training procedure, we estimate them separately, since the word projection probability stays invariant when estimating other parameters.
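Classification with learned word-class conditionals then follows the usual Naive Bayes rule; a minimal sketch with toy sentiment parameters (the probabilities and the smoothing floor for unseen words are assumptions):

```python
import math

def naive_bayes_classify(doc, prior, cond):
    """argmax_c log P(c) + sum_w log P(w|c), with a small floor for unseen words."""
    scores = {c: math.log(p) + sum(math.log(cond[c].get(w, 1e-9)) for w in doc)
              for c, p in prior.items()}
    return max(scores, key=scores.get)

prior = {"pos": 0.5, "neg": 0.5}
cond = {"pos": {"good": 0.4, "bad": 0.05},
        "neg": {"good": 0.05, "bad": 0.4}}
```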
Garden-path disambiguation under surprisal | The SURPRISAL THEORY of incremental sentence-processing difficulty (Hale, 2001; Levy, 2008a) posits that the cognitive effort required to process a given word wi of a sentence in its context is given by the simple information-theoretic measure of the log of the inverse of the word's conditional probability (also called its "surprisal" or "Shannon information content") in its intra-sentential context w1...wi-1 and extra-sentential context Ctxt:
Garden-path disambiguation under surprisal | Letting Tj range over the possible incremental syntactic analyses of the words w1...wj preceding fell, under surprisal the conditional probability of the disambiguating continuation fell can be approximated as
Garden-pathing and input uncertainty | In a surprisal model with clean, veridical input, Fodor’s conclusion is exactly what is predicted: separating a verb from its direct object with a comma effectively never happens in edited, published written English, so the conditional probability of the NP analysis should be close to zero.2 When uncertainty about surface input is introduced, however—due to visual noise, imperfect memory representations, and/or beliefs about possible speaker error—analyses come into play in which some parts of the true string are treated as if they were absent. |
Model instantiation and predictions | Computing the surprisal incurred by the disambiguating element given an uncertain-input representation of the sentence involves a standard application of the definition of conditional probability (Hale, 2001): |
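The surprisal measure itself is just the log of the inverse conditional probability; a one-line sketch (base 2, giving bits, is an expository choice here):

```python
import math

def surprisal(p):
    """Surprisal of a word = log2(1 / P(word | context)), in bits."""
    return math.log2(1.0 / p)

# A word with conditional probability 0.5 in its context carries 1 bit of surprisal.
```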
Conclusions and future work | The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S).
Conclusions and future work | The CRF conversion, given a segmentation Ai, provides a list of transliteration outputs T1, T2, ..., TM, with conditional probabilities P(T1|S, Ai), P(T2|S, Ai), ..., P(TM|S, Ai).
Experiments | From the two-step CRF model we get the conditional probability PCRF(T|S) and from the JSCM we get the joint probability P(S, T).
Experiments | The conditional probability PJSCM(T|S) can be calculated as follows:
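Turning a joint probability like the JSCM's P(S, T) into the conditional P(T|S) is a normalization over the candidate outputs for a fixed source; a sketch with a toy joint table (the entries are illustrative):

```python
def conditional_from_joint(joint, source):
    """P(T|S) = P(S, T) / sum over T' of P(S, T'), among candidates for one source."""
    cands = {t: p for (s, t), p in joint.items() if s == source}
    total = sum(cands.values())
    return {t: p / total for t, p in cands.items()}

probs = conditional_from_joint({("s", "t1"): 0.02, ("s", "t2"): 0.06}, "s")
```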
Method | The parser produces prefix probabilities for each word of a sentence which we converted to conditional probabilities by dividing each current probability by the previous one. |
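That prefix-to-conditional conversion can be sketched directly: each conditional is the current prefix probability divided by the previous one (the toy prefix values are illustrative):

```python
def conditionals_from_prefix(prefix_probs):
    """P(w_k | w_1..w_{k-1}) = prefix(w_1..w_k) / prefix(w_1..w_{k-1})."""
    conds, prev = [], 1.0
    for p in prefix_probs:
        conds.append(p / prev)
        prev = p
    return conds

# Prefix probabilities for a three-word sentence.
conds = conditionals_from_prefix([0.5, 0.1, 0.02])
```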
Models of Processing Difficulty | Backward transitional probability is essentially the conditional probability of a word given its immediately preceding word, P(wk|wk-1).
Models of Processing Difficulty | Analogously, forward probability is the conditional probability of the current word given the next word, P(wk|wk+1). |
Models of Processing Difficulty | This measure of processing cost for an input word, wk+1, given the previous context, w1...wk, can be expressed straightforwardly in terms of its conditional probability as:
Experiments | The joint probability is expressed as a chain product of a series of conditional probabilities of token pairs: P({ei}, {cj}) = ∏k=1..K P((ek, ck) | (ek-1, ck-1)).
Experiments | The conditional probabilities for token pairs are estimated from the aligned training corpus. |
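The chain-product formulation can be sketched as a sum of log conditionals over adjacent token pairs; the start symbol and the toy probabilities below are assumptions:

```python
import math

def chain_log_prob(pairs, cond):
    """Log-probability of a token-pair sequence as a chain of bigram
    conditionals P((e_k, c_k) | (e_{k-1}, c_{k-1}))."""
    prev = ("<s>", "<s>")  # assumed start-of-sequence pair
    total = 0.0
    for pair in pairs:
        total += math.log(cond[(prev, pair)])
        prev = pair
    return total
```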
Transliteration alignment techniques | From the affinity matrix conditional probabilities P(ei|cj) can be estimated as |
Transliteration alignment techniques | We estimate the conditional probability of Chinese phoneme cpk, after observing English character e, as
Problem Definition | The above example illustrates how MNB-FM can leverage frequency marginal statistics computed over unlabeled data to improve MNB’s conditional probability estimates. |
Problem Definition | MNB-FM improves the conditional probability estimates in MNB and, surprisingly, we found that it can often improve these estimates for words that do not even occur in the training set. |
Problem Definition | For each word in the vocabulary, we compared each classifier’s conditional probability ratios, i.e. |
Opinion Target Extraction Methodology | Then the conditional probabilities between a potential opinion target wt and a potential opinion word wo can be estimated.
Opinion Target Extraction Methodology | P(wt|wo) denotes the conditional probability between the two words.
Opinion Target Extraction Methodology | At the same time, we can obtain conditional probability P (w0|wt). |
RSP: A Random Walk Model for SP | We suppose this could bring at least two benefits: 1) a proper measure on the preferences can make the discovery of nearby predicates with similar preferences more accurate; 2) during propagation, we propagate the scored preferences rather than the raw counts or conditional probabilities, which is more appropriate and agrees with the nature of selectional preference smoothing.
RSP: A Random Walk Model for SP | Some introduce conditional probability (CP) p(a|q) for making preference judgements (Chambers and Jurafsky, 2010; Erk et al., 2010; Seaghdha, 2010).
RSP: A Random Walk Model for SP | In this mode, we always set Pr(q, a) to the conditional probability p(a|q) for the propagation function, regardless of which Ψ is used for the distance function.
Experiment 1: Oxford Lexical Predicates | For each lexical predicate we calculated the conditional probability of each semantic class using Formula 1, resulting in a ranking of semantic classes. |
Large-Scale Harvesting of Semantic Predicates | Since not all classes are equally relevant to the lexical predicate π, we estimate the conditional probability of each class c ∈ C given π on the basis of the number of sentences which contain an argument in that class.
Large-Scale Harvesting of Semantic Predicates | To enable wide coverage we estimate a second conditional probability based on the distributional semantic profile of each class. |
Building comparable corpora | 3.2 Conditional Probability |
Building comparable corpora | The similarity between mS and mT is defined as the Conditional Probability (CP) of documents
Introduction | method of conditional probability to calculate document similarity; 4) Address a language-independent study which isn’t limited to a particular data source in any language. |
Independent Query Annotations | The most straightforward way to estimate the conditional probabilities in Eq. |
Independent Query Annotations | Given a short, often ungrammatical query, it is hard to accurately estimate the conditional probability.
Independent Query Annotations | Furthermore, to make the estimation of the conditional probability p(zQ|r) feasible, it is assumed that the symbols ci in the annotation sequence are independent, given a sentence r.
Phrase Extraction | Similarly to the heuristic phrase tables, we use conditional probabilities Pt(f|e) and Pt(e|f), lexical weighting probabilities, and a phrase penalty.
Phrase Extraction | Here, instead of using maximum likelihood, we calculate conditional probabilities directly from Pt probabilities: |
Phrase Extraction | In MOD, we do this by taking the average of the joint probability and span probability features, and recalculating the conditional probabilities from the averaged joint probabilities. |
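Recalculating the conditionals from averaged joint probabilities is a marginalization over each side of the phrase pair; a sketch with a toy joint table (the entries are illustrative):

```python
def conditionals_from_joint_table(joint):
    """Derive Pt(f|e) and Pt(e|f) from a joint phrase-pair table Pt(e, f)."""
    e_marg, f_marg = {}, {}
    for (e, f), p in joint.items():
        e_marg[e] = e_marg.get(e, 0.0) + p
        f_marg[f] = f_marg.get(f, 0.0) + p
    f_given_e = {(e, f): p / e_marg[e] for (e, f), p in joint.items()}
    e_given_f = {(e, f): p / f_marg[f] for (e, f), p in joint.items()}
    return f_given_e, e_given_f

f_given_e, e_given_f = conditionals_from_joint_table(
    {("e1", "f1"): 0.2, ("e1", "f2"): 0.2, ("e2", "f1"): 0.6})
```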
Information Extraction: Slot Filling | A document is labeled for a template if two different conditions are met: (1) it contains at least one trigger phrase, and (2) its average per-token conditional probability meets a strict threshold. |
Information Extraction: Slot Filling | Both conditions require a definition of the conditional probability of a template given a token. |
Information Extraction: Slot Filling | This is not the usual conditional probability definition, as IR corpora are of different sizes.
Decoding | The conditional probability P(o | TC) decomposes into the product of rule probabilities:
Decoding | where the first three are conditional probabilities based on fractional counts of rules defined in Section 3.4, and the last two are lexical probabilities. |
Rule Extraction | We use fractional counts to compute three conditional probabilities for each rule, which will be used in the next section: |
Inference of Cognate Assignments | The edge affinities alg between these nodes are the conditional probabilities of each word wl belonging to each cognate group g:
Learning | Because we enforce that a word in language d must be dead if its parent word in language a is dead, we just need to learn the conditional probabilities p(Sd = dead|Sa = alive). |
Learning | Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006). |
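The normalization step can be sketched as dividing each expected count by the total leaving its conditioning symbol; the flat dict representation here is an expository simplification of a real transducer:

```python
def normalize_transducer(counts):
    """Normalize expected counts so arcs leaving each input symbol sum to 1,
    yielding a conditional probability distribution P(output | input)."""
    totals = {}
    for (x, y), c in counts.items():
        totals[x] = totals.get(x, 0.0) + c
    return {(x, y): c / totals[x] for (x, y), c in counts.items()}

probs = normalize_transducer({("a", "x"): 3.0, ("a", "y"): 1.0, ("b", "x"): 2.0})
```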
Image Clustering with Annotated Auxiliary Data | Then we apply the EM algorithm (Dempster et al., 1977) to estimate the conditional probabilities P(f|z), and P(z|v) with respect to each dependence in Figure 3 as follows. |
Image Clustering with Annotated Auxiliary Data | M-Step: re-estimates conditional probabilities P(zk|vi) and P(zk|wl):
Image Clustering with Annotated Auxiliary Data | and conditional probability P(fj|zk), which is a mixture portion of the posterior probability of latent variables
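An M-step of this kind re-estimates each conditional by accumulating posterior-weighted expected counts and renormalizing; a NumPy sketch in which the array shapes and the function name are assumptions, not the paper's implementation:

```python
import numpy as np

def m_step_p_z_given_v(n_vf, p_z_given_vf):
    """Re-estimate P(z|v) proportional to sum_f n(v, f) * P(z|v, f),
    then normalize over z so each row is a conditional distribution."""
    unnorm = np.einsum("vf,vfz->vz", n_vf, p_z_given_vf)
    return unnorm / unnorm.sum(axis=1, keepdims=True)

n_vf = np.array([[2.0, 1.0]])             # co-occurrence counts n(v, f)
p_z_given_vf = np.array([[[0.5, 0.5],     # posterior responsibilities P(z|v,f)
                          [1.0, 0.0]]])
p_z_given_v = m_step_p_z_given_v(n_vf, p_z_given_vf)
```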
Active Learning for Sequence Labeling | We apply CRFs as our base learner throughout this paper and employ a utility function which is based on the conditional probability of the most likely label sequence y* for an observation sequence x (cf.
Conditional Random Fields for Sequence Labeling | See Equation (1) for the conditional probability PX(y*), with ZX calculated as in Equation (6).
Conditional Random Fields for Sequence Labeling | The marginal and conditional probabilities are used by our AL approaches as confidence estimators. |
Methods | Being a stopword in our case, and having no relevance at all to speculative assertions, it has a class conditional probability of P(spec|it) = 74.67% on the seed sets. |
Methods | We ranked the features c by frequency and by their class conditional probability P(spec|c).
Methods | Maximum Entropy Models (Berger et al., 1996) seek to maximise the conditional probability of classes, given certain observations (features). |