Distribution Prediction | For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence, as follows. |
Distribution Prediction | Using a standard stop word list, we filter out frequent non-content unigrams and select the remainder as unigram features to represent a sentence. |
Distribution Prediction | Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks. |
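Distribution Prediction | A minimal sketch of this feature extraction, assuming a generic stop-word list and underscore-joined bigrams (both are illustrative choices, not the paper's exact setup):

```python
from collections import Counter

# Hypothetical stop-word list; the paper uses a standard list not reproduced here.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "this"}

def sentence_features(tokens):
    """Represent a sentence by content-word unigrams plus all bigrams.

    Bigrams are kept intact (even over stop words) so that patterns
    like ("not", "good") can capture negation.
    """
    unigrams = [t for t in tokens if t.lower() not in STOP_WORDS]
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return Counter(unigrams + bigrams)

print(sentence_features("this movie is not good".split()))
```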
Abstract | The evaluation of computer-generated text is a notoriously difficult problem; however, the quality of image descriptions has typically been measured using unigram BLEU and human judgements. |
Abstract | We estimate the correlation of unigram and Smoothed BLEU, TER, ROUGE-SU4, and Meteor against human judgements on two data sets. |
Abstract | The main finding is that unigram BLEU has a weak correlation, and Meteor has the strongest correlation with human judgements. |
Introduction | The main finding of our analysis is that TER and unigram BLEU are weakly correlated with human judgements. |
Methodology | Unigram BLEU without a brevity penalty has been reported by Kulkarni et al. (2011). |
Methodology | We perform a sentence-level analysis, setting n = 1 with no brevity penalty to get the unigram BLEU measure, or n = 4 with the brevity penalty to get the Smoothed BLEU measure. |
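Methodology | The two measures can be sketched as follows; NLTK always applies a brevity penalty, so unigram BLEU without it is computed directly as clipped unigram precision, and the choice of smoothing method below is an assumption:

```python
from collections import Counter
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "a dog sits on the grass".split()
hypothesis = "a dog is on the grass".split()

# Unigram BLEU without brevity penalty = clipped (modified) unigram precision.
ref_counts, hyp_counts = Counter(reference), Counter(hypothesis)
clipped = sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())
unigram_bleu = clipped / len(hypothesis)

# n = 4 with smoothing; the exact smoothing method is an assumption.
smoothed_bleu = sentence_bleu(
    [reference], hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method4,
)
print(unigram_bleu, smoothed_bleu)
```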
Methodology | We set d_skip = 4 and award partial credit for unigram-only matches, a configuration otherwise known as ROUGE-SU4. |
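Methodology | A sketch of the skip-bigram-plus-unigram counting behind ROUGE-SU4; the official script adds details (stemming, stop-word handling, multi-reference averaging) omitted here, and counting intervening words as the skip distance is an assumption:

```python
from collections import Counter

def su_grams(tokens, d_skip=4):
    """Unigrams plus skip-bigrams with at most d_skip intervening words."""
    grams = Counter((t,) for t in tokens)  # unigram "partial credit"
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 2 + d_skip]:
            grams[(a, b)] += 1
    return grams

def rouge_su4(reference, hypothesis):
    """F-measure over clipped skip-bigram + unigram overlap."""
    ref, hyp = su_grams(reference), su_grams(hypothesis)
    overlap = sum(min(c, ref[g]) for g, c in hyp.items())
    recall = overlap / sum(ref.values())
    precision = overlap / sum(hyp.values())
    return 2 * precision * recall / (precision + recall) if overlap else 0.0

print(rouge_su4("a dog sits on the grass".split(),
                "a dog is on the grass".split()))
```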
Introduction | In Section 3, we theoretically derive the tradeoff formulae of the cutoff for unigram models, k-gram models, and topic models, each of which represents its perplexity with respect to a reduced vocabulary, under the assumption that the corpus follows Zipf’s law. |
Perplexity on Reduced Corpora | 3.1 Perplexity of Unigram Models |
Perplexity on Reduced Corpora | Let us consider the perplexity of a unigram model learned from a reduced corpus. |
Perplexity on Reduced Corpora | In unigram models, a predictive distribution p' on a reduced corpus w' can simply be calculated as p'(w') = f(w')/N'. |
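Perplexity on Reduced Corpora | A minimal sketch of this estimate and the resulting perplexity, assuming the reduced corpus is given as a token list:

```python
import math
from collections import Counter

def unigram_perplexity(corpus_tokens):
    """Perplexity of the MLE unigram model p'(w') = f(w')/N' on its own corpus."""
    counts = Counter(corpus_tokens)
    n = len(corpus_tokens)
    log_prob = sum(c * math.log(c / n) for c in counts.values())
    return math.exp(-log_prob / n)

# Reduced corpus: rare words assumed already removed by the vocabulary cutoff.
print(unigram_perplexity("the cat saw the dog and the cat ran".split()))
```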
Related Work | We learn embeddings for unigrams, bigrams, and trigrams separately, with the same neural network and the same parameter settings. |
Related Work | The contexts of a unigram (bigram/trigram) are the surrounding unigrams (bigrams/trigrams), respectively. |
Related Work | [25] employs the embeddings of unigrams, bigrams, and trigrams separately, and conducts a matrix-vector operation on the sequence represented by columns in each lookup table. |
Introduction | Table rows: twitter unigram TTT * YES (54%); twitter bigram TTT * YES (52%); personal unigram MT * YES (52%); personal bigram — NO (48%) |
Introduction | We measure a tweet's similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w ∈ T} log p(w), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author's tweet history. |
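Introduction | A sketch of this scoring for the unigram case; the add-one smoothing is an assumption to keep unseen words from producing log(0), and the bigram model would sum over bigrams instead:

```python
import math
from collections import Counter

def avg_log_prob(tokens, lm_counts, total):
    """Score (1/|T|) * sum_{w in T} log p(w) under a unigram LM.

    lm_counts/total come from background data (e.g., a large tweet corpus);
    add-one smoothing here is an assumption, used to avoid log(0).
    """
    vocab = len(lm_counts)
    return sum(math.log((lm_counts[w] + 1) / (total + vocab))
               for w in tokens) / len(tokens)

background = Counter("the cat sat on the mat the dog sat".split())
print(avg_log_prob("the dog sat".split(), background, sum(background.values())))
```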
Introduction | Table rows: headline unigram TT YES (53%); headline bigram TTTT * YES (52%) |
Experiments | The baselines NB and BiNB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. |
Experiments | SVM is a support vector machine with unigram and bigram features. |
Experiments | Table row: unigram, POS, head chunks: 91.0 |
Introduction | On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. |
Experiments | We train the OVR classifier on three sets of features: LIWC, Unigram, and POS. |
Experiments | In particular, the three-class classifier is around 65% accurate at distinguishing between Employee, Customer, and Turker for each of the domains using Unigram features, significantly higher than random guessing. |
Experiments | The best performance is achieved with Unigram features, which consistently outperform LIWC and POS features in both the three-class and two-class settings in the hotel domain. |
Introduction | For the examples in Table 1, we trained a linear SVM classifier on Ott's Chicago-hotel dataset with unigram features and tested it on a couple of different domains (the details of data acquisition are described in Section 3). |
Introduction | Table 1: SVM performance on several datasets for a classifier trained on Chicago hotel reviews using Unigram features. |
Smoothing on count distributions | p’(w | u’) For the example above, the estimates for the unigram model p’(w) are p'(cat) = x 0.489 p'(dog) = x 0.511. |
Smoothing on count distributions | For the example above, the count distributions used for the unigram distribution would be: p(c(cat) = 0) = 0.14, p(c(cat) = 1) = 0.86; p(c(dog) = 0) = 0.1, p(c(dog) = 1) = 0.9. |
Smoothing on integral counts | Absolute discounting chooses p'(w | u') to be the maximum-likelihood unigram distribution; under KN smoothing (Kneser and Ney, 1995), it is chosen to make p in (2) satisfy the following constraint for all (n − 1)-grams u'w: |
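Smoothing on integral counts | The constraint yields the familiar continuation distribution, in which a word's probability is proportional to the number of distinct contexts it follows; a sketch for the bigram case (standard KN on integral counts, not the count-distribution variant above):

```python
from collections import Counter, defaultdict

def kn_unigram(bigram_counts):
    """Kneser-Ney lower-order distribution: p(w) proportional to the number
    of distinct contexts u with c(uw) > 0, rather than to raw frequency."""
    contexts = defaultdict(set)
    for (u, w), c in bigram_counts.items():
        if c > 0:
            contexts[w].add(u)
    total = sum(len(s) for s in contexts.values())
    return {w: len(s) / total for w, s in contexts.items()}

tokens = "the cat sat on the mat and the cat ran".split()
bigrams = Counter(zip(tokens, tokens[1:]))
print(kn_unigram(bigrams))
```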
Word Alignment | Following common practice in language modeling, we use the unigram distribution p( f ) as the lower-order distribution. |
Word Alignment | As shown in Table 1, for KN smoothing, interpolation with the unigram distribution performs the best, while for WB smoothing, interestingly, interpolation with the uniform distribution performs the best. |
Word Alignment | In WB smoothing, p’( f) is the empirical unigram distribution. |
Term and Document Frequency Statistics | Figure 4: Difference between observed and predicted IDF_w for Tagalog unigrams. |
Term and Document Frequency Statistics | 3.1 Unigram Probabilities |
Term and Document Frequency Statistics | We encounter the burstiness property of words again by looking at unigram occurrence probabilities. |
Complexity Analysis | For the monolingual bigram model, the number of states in the HMM is U times that of the monolingual unigram model, since the states at a specific position of F depend not only on the length of the current word but also on the length of the preceding word. |
Complexity Analysis | Thus its complexity is U² times the unigram model's complexity: |
Complexity Analysis | Table row: (unigram) 0.729 0.804 3 s 50 s; Prop. |
Methods | This section uses a unigram model for ease of description, but the method can be extended to n-gram models. |
Evaluation | In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with the performance of LP (Eq. |
Evaluation | Using unigrams ("SLP 1-gram") actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams. |
Evaluation | It is relatively straightforward to combine both unigrams and bigrams in one source graph, but for experimental clarity we did not mix these phrase lengths. |
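Evaluation | A generic sketch of the label propagation (LP) update over such a source graph; the clamping scheme and the normalization here are illustrative, not necessarily the paper's exact formulation:

```python
import numpy as np

def label_propagation(W, Y0, seed_mask, iters=50):
    """Iteratively propagate label distributions over a phrase graph.

    W: (n, n) nonnegative similarity matrix (rows assumed to have nonzero sum);
    Y0: (n, k) initial label distributions, zero rows for unlabeled nodes;
    seed_mask: boolean array marking seed nodes whose labels are clamped.
    """
    P = W / W.sum(axis=1, keepdims=True)      # row-normalize into transitions
    Y = Y0.copy()
    for _ in range(iters):
        Y = P @ Y
        Y[seed_mask] = Y0[seed_mask]          # clamp the seed nodes
    return Y / np.maximum(Y.sum(axis=1, keepdims=True), 1e-12)

# Three phrase nodes, two candidate translations; node 0 is a seed.
W = np.array([[0.0, 1.0, 0.5], [1.0, 0.0, 1.0], [0.5, 1.0, 0.0]])
Y0 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
print(label_propagation(W, Y0, np.array([True, False, False])))
```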
Generation & Propagation | Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings. |
Introduction | Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher-order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text. |
Related Work | Recent improvements to BLI (Tamura et al., 2012; Irvine and Callison-Burch, 2013b) have had a graph-based flavor, presenting label propagation-based approaches that use a seed lexicon, but evaluation is once again done on top-1 or top-3 accuracy, and the focus is on unigrams. |
Related Work | Razmara et al. (2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrate on OOVs, i.e., unigrams, rather than on enriching the entire translation model. |
Algorithms | When constructing a summary, we update the unigram distribution of the constructed summary so that it includes a smoothed distribution of the previous summaries in order to eliminate redundancy between the successive steps in the chain. |
Algorithms | For example, when we summarize the documents that were retrieved as a result to the first query, we calculate the unigram distribution in the same manner as we did in Focused KLSum; but for the second query, we calculate the unigram distribution as if all the sentences we selected for the previous summary were selected for the current query too, with a damping factor. |
Algorithms | In this variant, the Unigram Distribution estimate of word X is computed as: |
Previous Work | KLSum adopts a language model approach to compute relevance: the documents in the input set are modeled as a distribution over words (the original algorithm uses a unigram distribution over the bag of words in documents D). |
Previous Work | KLSum is a sentence extraction algorithm: it searches for a subset of the sentences in D with a unigram distribution as similar as possible to that of the overall collection D, but with a limited length. |
Previous Work | After the words are classified, the algorithm uses a KLSum variant to find the summary that best matches the unigram distribution of topic specific words. |
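Previous Work | A minimal sketch of KLSum's greedy search under these definitions; the damping over previous summaries described earlier would be folded into the target distribution doc_dist:

```python
import math
from collections import Counter

def unigram_dist(tokens):
    """MLE unigram distribution over a bag of words."""
    counts = Counter(tokens)
    n = sum(counts.values())
    return {w: c / n for w, c in counts.items()}

def kl(p, q, vocab, eps=1e-9):
    """KL(p || q); eps is an assumption to avoid log(0) for unseen words."""
    return sum(p.get(w, 0) * math.log((p.get(w, 0) + eps) / (q.get(w, 0) + eps))
               for w in vocab)

def klsum(sentences, max_words):
    """Greedily add the sentence whose inclusion keeps the summary's unigram
    distribution closest (in KL) to the whole collection's distribution."""
    doc_dist = unigram_dist([w for s in sentences for w in s])
    vocab = set(doc_dist)
    summary, pool = [], list(sentences)
    while pool and sum(map(len, summary)) < max_words:
        best = min(pool, key=lambda s: kl(
            doc_dist, unigram_dist([w for t in summary + [s] for w in t]), vocab))
        summary.append(best)
        pool.remove(best)
    return summary

docs = [s.split() for s in ["the cat sat", "the dog ran far away",
                            "the cat ran", "a bird sang"]]
print(klsum(docs, 6))
```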
Approach | o_i is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents or in more than 98% of the documents. |
Approach | models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram). |
Evaluation | We derived unigram tf-idf vectors for each section in each of 50 randomly sampled policies per category. |
Experiment | The implementation uses unigram features and cosine similarity. |
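Experiment | A sketch of unigram tf-idf vectors scored with cosine similarity, using scikit-learn on toy sections in place of the sampled policies:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy policy sections; the real data would be the sampled policy sections.
sections = ["we collect your email address and location",
            "location data may be shared with partners",
            "you can delete your account at any time"]

# Unigram tf-idf vectors, one row per section.
vectors = TfidfVectorizer(ngram_range=(1, 1)).fit_transform(sections)

# Pairwise cosine similarities between sections.
print(cosine_similarity(vectors))
```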
Experiment | Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010). To more closely match our models, LDA is given access to the same unigram and bigram tokens. |
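Experiment | This baseline maps directly onto scikit-learn, whose LDA implements the online variational Bayes of Hoffman et al. (2010); the toy corpus below is illustrative:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["we collect your email address", "location data may be shared",
        "you can delete your account", "we collect location data"]

# Unigram and bigram tokens, matching the feature space described above.
X = CountVectorizer(ngram_range=(1, 2)).fit_transform(docs)

# Ten topics, online variational Bayes inference.
lda = LatentDirichletAllocation(n_components=10, learning_method="online",
                                random_state=0).fit(X)
print(lda.transform(X).shape)  # (n_docs, 10) topic proportions
```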
Predicting Direction of Power | Table: Baseline (Always Superior): 52.54; Baseline (Word Unigrams + Bigrams): 68.56; THRNew: 55.90; THRPR: 54.30; DIAPR: 54.05; THRPR + THRNew: 61.49; DIAPR + THRPR + THRNew: 62.47; LEX: 70.74; LEX + DIAPR + THRPR: 67.44; LEX + DIAPR + THRPR + THRNew: 68.56; BEST (= LEX + THRNew): 73.03; BEST (using p1 features only): 72.08; BEST (using IMt features only): 72.11; BEST (using Mt only): 71.27; BEST (no indicator variables): 72.44 |
Predicting Direction of Power | We found the best setting to be using both unigrams and bigrams for all three types of n-grams, by tuning on our dev set. |
Predicting Direction of Power | We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%. |
Conclusions and Future Work | We model and harness lexical correlations using translation models, together with unigram language models that characterize reply posts, and formulate a clustering-based EM approach for solution identification. |
Introduction | We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively. |
Our Approach | Consider a unigram language model S that models the lexical characteristics of solution posts, and a translation model T that models the lexical correlation between problems and solutions. |
Our Approach | Consider the post and reply vocabularies to be of sizes A and B respectively; then, the translation model would have A x B variables, whereas the unigram language model has only B variables. |
Approaches | We consider both template unigrams and bigrams, combining two templates in sequence. |
Approaches | Constructing all feature template unigrams and bigrams would yield an unwieldy number of features. |
Experiments | Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1). |
Experiments | However, the original unigram Björkelund features (B_default), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB). |
Features | We also count the number of unigrams the two transcripts have in common and the length, absolute and relative, of the longest unigram overlap. |
Features | In addition, we look at the number of characters and unigrams and the audio duration of each query, with the intuition that the length of a query may be correlated with its likelihood of being retried (or a retry). |
Prediction task | T-tests between the two categories showed that, for all edit distance features (character, word, reduced, and phonetic; raw and normalized), retry query pairs are significantly more similar. Similarly, the number of unigrams the two queries have in common is significantly higher for retries. |
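Prediction task | A sketch of the character-level variant of these features; the feature names and the query pair below are hypothetical:

```python
def edit_distance(a, b):
    """Character-level Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def retry_features(q1, q2):
    """Hypothetical feature dict for a query pair (names are illustrative)."""
    d = edit_distance(q1, q2)
    common = len(set(q1.split()) & set(q2.split()))
    return {"edit_raw": d,
            "edit_norm": d / max(len(q1), len(q2)),
            "common_unigrams": common,
            "len_chars_q1": len(q1),
            "len_unigrams_q1": len(q1.split())}

print(retry_features("weather in sf", "weather in san francisco"))
```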
Experiments | The vocabulary V includes all unigrams after down-casing. |
Experiments | In total, there are 16250 unique unigrams in V. |
Experiments | The simplification for visualization is that we consider only the top 1000 most frequent unigrams in the RST-DT training set. |
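Experiments | A minimal sketch of constructing V and the top-1000 list, assuming a hypothetical path to the tokenized training text:

```python
from collections import Counter

# Hypothetical path to the tokenized RST-DT training text.
tokens = [w.lower() for w in open("rst_dt_train.txt").read().split()]

vocab = set(tokens)                                  # all down-cased unigrams
top_1000 = [w for w, _ in Counter(tokens).most_common(1000)]
print(len(vocab), top_1000[:5])
```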