Abstract | The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction conditioned on the misspelled word. |
Introduction | The log linear model is defined as a conditional probability distribution of a corrected word and a rule set for the correction given the misspelled word. |
Model for Candidate Generation | We define the conditional probability distribution of $w_e$ and $R(w_m, w_e)$ given $w_m$ as the following log linear model:
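Model for Candidate Generation | The equation itself is not reproduced in this excerpt; a generic log-linear form consistent with the description, assuming a weight vector $\boldsymbol{\theta}$ and a feature function $\boldsymbol{\phi}$ (neither is named in the excerpt), would be:

\[
P(w_e, R(w_m, w_e) \mid w_m) = \frac{\exp\bigl(\boldsymbol{\theta} \cdot \boldsymbol{\phi}(w_m, w_e, R(w_m, w_e))\bigr)}{\sum_{w' \in V} \sum_{R'} \exp\bigl(\boldsymbol{\theta} \cdot \boldsymbol{\phi}(w_m, w', R')\bigr)}
\]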
Model for Candidate Generation | Here, $w_e \in V$ is a correction of $w_m$. The objective of training would be to maximize the conditional probability $P(w_e, R(w_m, w_e) \mid w_m)$ over the training data.
Model for Candidate Generation | In this paper, we assume that the transformation that actually generates the correction, among all the possible transformations, is the one that gives the maximum conditional probability; exactly the same criterion is also used for fast prediction.
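Model for Candidate Generation | The prediction criterion can be illustrated with a small sketch: score every candidate (correction, rule set) pair under the linear model and return the argmax. The feature extractor phi, the weight vector theta, and the candidate enumerator enumerate_candidates below are hypothetical stand-ins, not the paper's implementation; since the partition function is shared by all candidates, the argmax over raw scores equals the argmax over conditional probabilities.

```python
import math

def predict(w_m, theta, phi, enumerate_candidates):
    """Return the (correction, rule set) pair with the highest log-linear score.

    theta: dict mapping feature names to weights (hypothetical).
    phi:   function (w_m, w_e, rules) -> dict of feature values (hypothetical).
    enumerate_candidates: function w_m -> iterable of (w_e, rules) pairs (hypothetical).
    """
    best, best_score = None, -math.inf
    for w_e, rules in enumerate_candidates(w_m):
        # theta . phi(w_m, w_e, R): dot product of weights and feature values
        score = sum(theta.get(f, 0.0) * v for f, v in phi(w_m, w_e, rules).items())
        if score > best_score:
            best, best_score = (w_e, rules), score
    return best
```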
Garden-path disambiguation under surprisal | The SURPRISAL THEORY of incremental sentence-processing difficulty (Hale, 2001; Levy, 2008a) posits that the cognitive effort required to process a given word $w_i$ of a sentence in its context is given by the simple information-theoretic measure of the log of the inverse of the word's conditional probability (also called its "surprisal" or "Shannon information content") in its intra-sentential context $w_1, \ldots, w_{i-1}$ and extra-sentential context Ctxt:
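Garden-path disambiguation under surprisal | Written out, the measure described above is the standard surprisal definition:

\[
\mathrm{surprisal}(w_i) = \log \frac{1}{P(w_i \mid w_1, \ldots, w_{i-1}, \mathrm{Ctxt})} = -\log P(w_i \mid w_1, \ldots, w_{i-1}, \mathrm{Ctxt})
\]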
Garden-path disambiguation under surprisal | Letting $T_j$ range over the possible incremental syntactic analyses of the words $w_1, \ldots, w_{i-1}$ preceding fell, under surprisal the conditional probability of the disambiguating continuation fell can be approximated as
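Garden-path disambiguation under surprisal | The approximation itself is not reproduced in the excerpt; a natural rendering that marginalizes over the candidate analyses would be:

\[
P(\text{fell} \mid w_1, \ldots, w_{i-1}, \mathrm{Ctxt}) \approx \sum_{j} P(T_j \mid w_1, \ldots, w_{i-1}, \mathrm{Ctxt}) \, P(\text{fell} \mid T_j)
\]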
Garden-pathing and input uncertainty | In a surprisal model with clean, veridical input, Fodor’s conclusion is exactly what is predicted: separating a verb from its direct object with a comma effectively never happens in edited, published written English, so the conditional probability of the NP analysis should be close to zero. When uncertainty about surface input is introduced, however—due to visual noise, imperfect memory representations, and/or beliefs about possible speaker error—analyses come into play in which some parts of the true string are treated as if they were absent.
Model instantiation and predictions | Computing the surprisal incurred by the disambiguating element given an uncertain-input representation of the sentence involves a standard application of the definition of conditional probability (Hale, 2001): |
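Model instantiation and predictions | The definition being applied is the usual ratio form of conditional probability; with $I$ standing in for the uncertain-input representation (a symbol introduced here for illustration), it can be written as:

\[
P(w_i \mid w_1, \ldots, w_{i-1}, I) = \frac{P(w_1, \ldots, w_{i-1}, w_i \mid I)}{P(w_1, \ldots, w_{i-1} \mid I)}
\]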
Independent Query Annotations | The most straightforward way to estimate the conditional probabilities in Eq. |
Independent Query Annotations | Given a short, often ungrammatical query, it is hard to accurately estimate this conditional probability.
Independent Query Annotations | Furthermore, to make the estimation of the conditional probability $p(z_Q \mid r)$ feasible, it is assumed that the symbols $C_i$ in the annotation sequence are independent, given a sentence $r$.
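Independent Query Annotations | Under this independence assumption the sequence probability factorizes over the annotation symbols; keeping the excerpt's notation, the factorization would be:

\[
p(z_Q \mid r) = \prod_{i} p(C_i \mid r)
\]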
Information Extraction: Slot Filling | A document is labeled for a template if two different conditions are met: (1) it contains at least one trigger phrase, and (2) its average per-token conditional probability meets a strict threshold. |
Information Extraction: Slot Filling | Both conditions require a definition of the conditional probability of a template given a token. |
Information Extraction: Slot Filling | This is not the usual conditional probability definition, since the IR corpora differ in size.
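Information Extraction: Slot Filling | A minimal sketch of the labeling rule described above, assuming a hypothetical per-token scorer p_template_given_token and a hypothetical threshold TAU; the trigger-phrase list and the probability estimator are illustrative stand-ins, not the paper's definitions.

```python
TAU = 0.8  # hypothetical strict threshold on the average per-token probability

def label_document(tokens, trigger_phrases, p_template_given_token):
    """Label a document for a template iff (1) it contains at least one
    trigger phrase and (2) the average per-token conditional probability
    of the template meets the threshold TAU."""
    if not tokens:
        return False
    text = " ".join(tokens)
    has_trigger = any(phrase in text for phrase in trigger_phrases)
    if not has_trigger:
        return False
    avg_prob = sum(p_template_given_token(t) for t in tokens) / len(tokens)
    return avg_prob >= TAU
```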
Phrase Extraction | Similarly to the heuristic phrase tables, we use conditional probabilities $P_t(f \mid e)$ and $P_t(e \mid f)$, lexical weighting probabilities, and a phrase penalty.
Phrase Extraction | Here, instead of using maximum likelihood, we calculate conditional probabilities directly from the $P_t$ probabilities:
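Phrase Extraction | The calculation referred to is not shown in the excerpt; one standard way to obtain the conditionals directly from the joint $P_t$ probabilities would be to normalize over the joint table:

\[
P_t(f \mid e) = \frac{P_t(e, f)}{\sum_{f'} P_t(e, f')}, \qquad P_t(e \mid f) = \frac{P_t(e, f)}{\sum_{e'} P_t(e', f)}
\]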
Phrase Extraction | In MOD, we do this by taking the average of the joint probability and span probability features, and recalculating the conditional probabilities from the averaged joint probabilities. |
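Phrase Extraction | A minimal sketch of that combination step, assuming each phrase table is a dictionary mapping (e, f) pairs to joint probabilities; the function names are illustrative and not the MOD implementation.

```python
from collections import defaultdict

def average_joint(tables):
    """Average the joint probabilities P_t(e, f) across several phrase tables."""
    avg = defaultdict(float)
    for table in tables:
        for pair, p in table.items():
            avg[pair] += p / len(tables)
    return dict(avg)

def conditionals_from_joint(joint):
    """Recompute P(f|e) from averaged joint probabilities:
    divide P(e, f) by the total joint mass assigned to e."""
    totals = defaultdict(float)
    for (e, f), p in joint.items():
        totals[e] += p
    return {(e, f): p / totals[e] for (e, f), p in joint.items() if totals[e] > 0}
```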