Discussion | Training Size: Additional training data improves the conditional probability baseline, but does not help the smoothing model.
Discussion | We optimized argument cutoffs for each training size, but the model still appears to suffer from additional noise that the conditional probability baseline does not. |
Discussion | High Precision Baseline: Our conditional probability baseline is very precise. |
Experiments | The conditional probability is reported as Baseline. |
Models | We propose a conditional probability baseline: |
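Models | A minimal sketch of such a baseline, assuming verb-argument pairs and maximum-likelihood counts (the symbols v and a are illustrative, not the paper's notation): P(a|v) = count(v, a) / count(v), preferring the candidate argument with the higher conditional probability.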
Models | (1999) showed that corpus frequency and conditional probability correlate with human judgments of adjective-noun plausibility, and Dagan et al. (1999) appear to propose a very similar baseline for verb-noun selectional preferences, but that paper evaluates on unseen data, and so the conditional probability model is not studied.
Results | The conditional probability Baseline falls from 91.5 to 79.5, a 12-point absolute drop, when moving from completely random to neighboring-frequency pseudo words.
Results | Accuracy is the same as recall when the model does not guess between pseudo words that have the same conditional probabilities.
Abstract | This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. |
Abstract | We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. |
Introduction | We approach the sense disambiguation task by choosing the best sense based on the conditional probability of sense paraphrases given a context. |
Introduction | We propose three models which are suitable for different situations: Model I requires knowledge of the prior distribution over senses and directly maximizes the conditional probability of a sense given the context; Model II maximizes this conditional probability by maximizing the cosine value of two topic-document vectors (one for the sense and one for the context).
The Sense Disambiguation Model | Assigning the correct sense s to a target word w occurring in a context c involves finding the sense which maximizes the conditional probability of senses given a context:
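The Sense Disambiguation Model | A sketch of the intended objective under the notation above, with the exact conditioning treated as an assumption: s* = argmax_s P(s|c).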
The Sense Disambiguation Model | This conditional probability is decomposed by incorporating a hidden variable, topic z, introduced by the topic model.
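The Sense Disambiguation Model | A plausible form of this decomposition, assuming the topic z is marginalized out as in standard topic models: P(s|c) = sum_z P(s|z) P(z|c).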
The Sense Disambiguation Model | Model I directly maximizes the conditional probability of the sense given the context, where the sense is modeled as a ‘paraphrase document’ ds and the context as a ‘context document’ dc. |
Abstract | We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system.
Conclusion | We found that the joint probability model performs almost as well as the conditional probability model but that it was more complex to make it work well. |
Final Results | We refer to the results as Mxhy, where x denotes the model number (1 for the conditional probability model and 2 for the joint probability model) and y denotes a heuristic or a combination of heuristics applied to that model.
Our Approach | 3.1 Model-1: Conditional Probability Model
Our Approach | Our model estimates the conditional probability by interpolating a word-based model and a character-based (transliteration) model. |
Our Approach | Because our overall model is a conditional probability model, joint probabilities are marginalized using character-based prior probabilities:
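Our Approach | A sketch of how these pieces could fit together, with source f, target e, and interpolation weight lambda all taken as illustrative symbols: P(e|f) = lambda * Pword(e|f) + (1 - lambda) * Pchar(e, f) / Pchar(f), where dividing by the character-based prior Pchar(f) turns the joint probability into a conditional one.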
Method | The parser produces prefix probabilities for each word of a sentence, which we converted to conditional probabilities by dividing each word's prefix probability by the previous one.
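Method | Concretely, writing Pprefix for the parser's prefix probability, this conversion is P(wk|w1...wk-1) = Pprefix(w1...wk) / Pprefix(w1...wk-1).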
Models of Processing Difficulty | Backward transitional probability is essentially the conditional probability of a word given its immediately preceding word, P(wk|wk-1).
Models of Processing Difficulty | Analogously, forward probability is the conditional probability of the current word given the next word, P(wk|wk+1). |
Models of Processing Difficulty | This measure of processing cost for an input word, wk+1, given the previous context, w1...wk, can be expressed straightforwardly in terms of its conditional probability as:
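Models of Processing Difficulty | Presumably this is the standard surprisal formulation, given here as a sketch: cost(wk+1) = -log P(wk+1|w1...wk).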
Conclusions and future work | The CRF segmentation provides a list of segmentations A: A1, A2, ..., AN, with conditional probabilities P(A1|S), P(A2|S), ..., P(AN|S).
Conclusions and future work | The CRF conversion, given a segmentation Ai, provides a list of transliteration outputs T1, T2, ..., TM, with conditional probabilities P(T1|S, Ai), P(T2|S, Ai), ..., P(TM|S, Ai).
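Conclusions and future work | One natural way to combine the two CRF steps, offered as a sketch rather than the authors' exact formulation, is to marginalize over segmentations: P(T|S) = sum_i P(T|S, Ai) P(Ai|S).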
Experiments | From the two-step CRF model we get the conditional probability PCRF(T|S) and from the JSCM we get the joint probability P(S, T).
Experiments | The conditional probability PJSCM(T|S) can be calculated as follows:
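Experiments | A sketch of this conversion, assuming P(S) is obtained by summing the joint probability over candidate transliterations T': PJSCM(T|S) = P(S, T) / P(S) = P(S, T) / sum_T' P(S, T').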
Inference of Cognate Assignments | The edge affinities between these nodes are the conditional probabilities of each word w belonging to each cognate group g:
Learning | Because we enforce that a word in language d must be dead if its parent word in language a is dead, we just need to learn the conditional probabilities p(Sd = dead|Sa = alive). |
Learning | Given the expected counts, we now need to normalize them to ensure that the transducer represents a conditional probability distribution (Eisner, 2002; Oncina and Sebban, 2006). |
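Learning | As a sketch, if c(x, y) denotes the expected count of input symbol x mapping to output symbol y, the normalized conditional probability is p(y|x) = c(x, y) / sum_y' c(x, y'), so that the outgoing probabilities for each input symbol sum to one.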
Decoding | The conditional probability P(o|TC) is decomposed into the product of rule probabilities:
Decoding | where the first three are conditional probabilities based on fractional counts of rules defined in Section 3.4, and the last two are lexical probabilities. |
Rule Extraction | We use fractional counts to compute three conditional probabilities for each rule, which will be used in the next section: |
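Rule Extraction | As an illustration (not necessarily the paper's exact three definitions), one such probability could condition a rule on its source side: p(r | source(r)) = c(r) / sum of c(r') over rules r' with the same source side, where c(.) is the fractional count.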