Inference | While previous work used p(k|Θ) = (1 − p($))^(k−1) p($), this is only true for unigrams.
Inference | the number of tables t_w for w in word unigrams.
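The geometric length distribution cited above can be sketched numerically. This is a minimal illustration, not the papers' code; `p_end` stands in for the end-of-word probability p($), and the function name is ours:

```python
# Under a character model that emits an end-of-word symbol "$" with fixed
# probability p_end, word length k is implicitly geometric:
#   p(k) = (1 - p_end)**(k-1) * p_end
def geometric_length_prob(k: int, p_end: float) -> float:
    """Probability that a word has exactly k characters when each
    character position ends the word with probability p_end."""
    assert k >= 1 and 0.0 < p_end <= 1.0
    return (1.0 - p_end) ** (k - 1) * p_end

# The distribution sums to 1 over lengths k = 1, 2, ...
total = sum(geometric_length_prob(k, 0.3) for k in range(1, 200))
```

Because each character independently "fails to end" the word with probability 1 − p($), this length model decays exponentially, which is the implicit assumption the quoted sentence points out.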
Nested Pitman-Yor Language Model | Thus far we have assumed that the unigram G1 is already given, but of course it should also be generated as G1 ~ PY(G0, d, θ).
Nested Pitman-Yor Language Model | ²Note that this is different from unigrams, which are a posterior distribution given the data.
Nested Pitman-Yor Language Model | When a word w is generated from its parent at the unigram node, it means that w
Pitman-Yor process and n-gram models | Suppose we have a unigram word distribution G1 = {p(w)}, where w ranges over each word in the lexicon.
Pitman-Yor process and n-gram models | In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts, seated over the tables 1 … t_hw.
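The restaurant bookkeeping described above can be sketched with generic Pitman-Yor Chinese-restaurant seating rules. This is an illustrative sketch of the standard scheme (discount d, strength θ), not the authors' implementation; the function name is ours:

```python
import random

# Each context h is a restaurant; an arriving customer (one n-gram count)
# joins existing table k with probability proportional to (c_k - d), or
# opens a new table with probability proportional to (theta + d * t),
# where t is the current number of tables.
def seat_customer(tables, d, theta, rng=random.random):
    """tables: list of per-table customer counts (mutated in place).
    Returns the index of the table the new customer joins."""
    t = len(tables)
    weights = [c - d for c in tables] + [theta + d * t]
    u = rng() * sum(weights)
    for i, w in enumerate(weights):
        u -= w
        if u <= 0:
            break
    if i == t:          # the "new table" outcome
        tables.append(1)
    else:
        tables[i] += 1
    return i

random.seed(0)
tables = []
for _ in range(200):
    seat_customer(tables, d=0.5, theta=1.0)
# every customer is seated at exactly one table
```

With d = 0 this reduces to the ordinary Dirichlet-process restaurant; the discount d > 0 is what produces the power-law table growth Pitman-Yor models exploit.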
Data and Task | Notice the high lexical overlap between the two sentences (unigram overlap of 100% in one direction and 72% in the other).
Data and Task | 19 is another true paraphrase pair with much lower lexical overlap (unigram overlap of 50% in one direction and 30% in the other).
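The directional overlap statistic quoted above can be sketched as follows. This is our own illustration; the papers do not specify whether overlap is computed over tokens or types, so this sketch assumes unique word types:

```python
# Directional unigram overlap: the fraction of one sentence's word types
# that also appear in the other. The measure is asymmetric, so a sentence
# pair is summarized by both directions.
def unigram_overlap(src: str, ref: str) -> float:
    """Proportion of src's unique tokens that also occur in ref."""
    src_types = set(src.lower().split())
    ref_types = set(ref.lower().split())
    if not src_types:
        return 0.0
    return len(src_types & ref_types) / len(src_types)

a = "the cat sat on the mat"
b = "the cat is on a mat"
forward = unigram_overlap(a, b)   # fraction of a's types found in b
backward = unigram_overlap(b, a)  # fraction of b's types found in a
```

The asymmetry is why the text reports two numbers per pair (e.g. 100% in one direction but 72% in the other): the shorter or more repetitive sentence can be fully covered while the longer one is not.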
Experimental Evaluation | (2006), using features calculated directly from S1 and S2 without recourse to any hidden structure: proportion of word unigram matches, proportion of lemmatized unigram matches, BLEU score (Papineni et al., 2001), BLEU score on lemmatized tokens, F measure (Turian et al., 2003), difference of sentence length, and proportion of dependency relation overlap.
Experimental Evaluation | ¹⁰This is accomplished by eliminating lines 12 and 13 from the definition of p_m and redefining p_word to be the unigram word distribution estimated from the Gigaword corpus, as in G0, without the help of WordNet.
QG for Paraphrase Modeling | (15) Here a_w is the Good-Turing unigram probability estimate of a word w from the Gigaword corpus (Graff, 2003).
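The Good-Turing idea behind the a_w estimate can be sketched as below. This is a toy version of the basic Turing formula, not the paper's estimator; real implementations (e.g. Simple Good-Turing) smooth the count-of-counts N_r before applying it:

```python
from collections import Counter

# Basic Good-Turing: a word seen r times gets the adjusted count
#   r* = (r + 1) * N_{r+1} / N_r
# where N_r is the number of distinct words seen exactly r times.
def good_turing_unigram(counts: Counter) -> dict:
    n = sum(counts.values())                 # total tokens observed
    freq_of_freq = Counter(counts.values())  # N_r for each r
    probs = {}
    for w, r in counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # fall back to the raw count when N_{r+1} is zero (unsmoothed N_r)
        r_star = (r + 1) * n_r1 / n_r if n_r1 > 0 else r
        probs[w] = r_star / n
    return probs
```

The effect is to discount observed counts and reserve mass (N_1 / n in total) for unseen words, which is why it is a natural choice for a background unigram distribution over a large corpus like Gigaword.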
QG for Paraphrase Modeling | As noted, the distributions pm, the word unigram weights in Eq. |
Corpus Details | As our reference algorithm, we used the current state-of-the-art system developed by Boulis and Ostendorf (2005) using unigram and bigram features in an SVM framework.
Corpus Details | For each conversation side, a training example was created using unigram and bigram features with tf-idf weighting, as done in standard text classification approaches. |
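The feature construction described above can be sketched as follows. The helper names are ours, and this uses raw term frequency with a plain log idf as a stand-in for whatever tf-idf variant the system actually used:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf_features(documents):
    """documents: list of token lists (one per conversation side).
    Returns one {feature: tf-idf weight} dict per document, over
    unigram and bigram features."""
    tf = [Counter(ngrams(d, 1) + ngrams(d, 2)) for d in documents]
    df = Counter()                      # document frequency per feature
    for counts in tf:
        df.update(set(counts))
    n_docs = len(documents)
    return [
        {g: c * math.log(n_docs / df[g]) for g, c in counts.items()}
        for counts in tf
    ]
```

Each resulting sparse vector would then be handed to the SVM as one training example, exactly in the spirit of standard text-classification pipelines.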
Corpus Details | Also, the named entity “Mike” shows up as a discriminative unigram; this may be due to the self-introductions at the beginning of the conversations and “Mike” being a common male name.
Training method | Table 2: Unigram features. |
Training method | We broadly classify features into two categories: unigram and bigram features. |
Training method | Unigram features: Table 2 shows our unigram features. |
Computing Feature Expectations | where h(t) is the unigram prefix of bigram t. |
Consensus Decoding Algorithms | where T1 is the set of unigrams in the language, and δ(e, t) is an indicator function that equals 1 if t appears in e and 0 otherwise.
Consensus Decoding Algorithms | Figure 1: For the linear similarity measure U(e; e′), which computes unigram precision, the MBR translation can be found by iterating either over sentence pairs (Algorithm 1) or over features (Algorithm 2).
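The sentence-pair iteration (Algorithm 1 style) can be sketched over an n-best list. This is our own toy illustration of MBR with a unigram-precision similarity, not the authors' code, and the function names are ours:

```python
from collections import Counter

def unigram_precision(hyp, ref):
    """Fraction of hyp's tokens matched in ref, with clipped counts."""
    if not hyp:
        return 0.0
    ref_counts = Counter(ref)
    matched = sum(min(c, ref_counts[t]) for t, c in Counter(hyp).items())
    return matched / len(hyp)

def mbr_decode(candidates, posteriors):
    """Pick the candidate maximizing expected similarity against all
    candidates, weighted by their posterior probabilities.
    candidates: list of token lists; posteriors: matching probabilities."""
    def expected_sim(e):
        return sum(p * unigram_precision(e, e2)
                   for e2, p in zip(candidates, posteriors))
    return max(candidates, key=expected_sim)
```

Iterating over sentence pairs costs O(k²) similarity evaluations for k candidates; the feature-iteration view (Algorithm 2 in the figure) rearranges the same expectation as a sum over unigram features instead.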
Experimental Results | As shown in Table 2a, decoding with a single variational n-gram model (VM) as per (14) improves on the Viterbi baseline (except in the case of a unigram VM), though the gains are often not statistically significant.
Experimental Results | The interpolation between a VM and a word penalty feature (“wp”) improves over the unigram |
Experimental Results | This is necessarily true, but it is interesting to see that most of the improvement is obtained just by moving from a unigram to a bigram model. |
Log-Linear Models | The features used in this experiment were unigrams and bigrams of neighboring words, and unigrams, bigrams, and trigrams of neighboring POS tags.
Log-Linear Models | For the features, we used unigrams of neighboring chunk tags, substrings (shorter than 10 characters) of the current word, and the shape of the word (e.g., “IL-2” is converted into “AA-#”), on top of the features used in the text chunking experiments.
Log-Linear Models | For the features, we used unigrams and bigrams of neighboring words, prefixes and suffixes of the current word, and some characteristics of the word. |
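The kinds of features listed above can be sketched as a small extractor. The feature-string format and helper names are ours, and the prefix/suffix length cap of 3 is an illustrative choice, not taken from the papers:

```python
# Coarse word shape: uppercase -> "A", lowercase -> "a", digit -> "#",
# other characters kept as-is (so "IL-2" becomes "AA-#").
def word_shape(word: str) -> str:
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("#")
        else:
            out.append(ch)
    return "".join(out)

def token_features(tokens, i):
    """Neighboring-word unigrams/bigrams, affixes, and shape for token i."""
    feats = [f"w0={tokens[i]}", f"shape={word_shape(tokens[i])}"]
    if i > 0:
        feats.append(f"w-1={tokens[i-1]}")
        feats.append(f"w-1w0={tokens[i-1]}_{tokens[i]}")
    if i + 1 < len(tokens):
        feats.append(f"w+1={tokens[i+1]}")
    # prefixes and suffixes of the current word, up to length 3
    for k in range(1, min(3, len(tokens[i])) + 1):
        feats.append(f"pre={tokens[i][:k]}")
        feats.append(f"suf={tokens[i][-k:]}")
    return feats
```

Each string would become one binary indicator in the log-linear model's feature vector.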
Empirical Evaluation 4.1 Evaluation Setup | In the above experiments, all features (unigram + bigram) are used.
The Co-Training Approach | The English or Chinese features used in this study include both unigrams and bigrams⁵, and the feature weight is simply set to term frequency⁶.
The Co-Training Approach | ⁵For Chinese text, a unigram refers to a Chinese word and a bigram refers to two adjacent Chinese words.