Index of papers in Proc. ACL 2014 that mention
  • unigram
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
Distribution Prediction
Using a standard stop word list, we filter out frequent non-content unigrams and select the remainder as unigram features to represent a sentence.
Distribution Prediction
Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
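A minimal sketch of the unigram-plus-bigram sentence representation this excerpt describes (illustrative only: the tiny stop list and the helper sentence_features are ours, not the authors'); it shows how a bigram such as "not good" survives even though the unigram "not" is filtered as a stop word:

    # Toy stop list standing in for a standard stop word list.
    STOP_WORDS = {"the", "a", "an", "is", "was", "not", "this", "of"}

    def sentence_features(sentence):
        tokens = sentence.lower().split()
        # Unigram features: non-stop-word tokens only.
        unigrams = [t for t in tokens if t not in STOP_WORDS]
        # Bigram features: all adjacent token pairs, so negations are kept.
        bigrams = [f"{u}_{v}" for u, v in zip(tokens, tokens[1:])]
        return unigrams + bigrams

    print(sentence_features("This movie is not good"))
    # ['movie', 'good', 'this_movie', 'movie_is', 'is_not', 'not_good']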
unigram is mentioned in 15 sentences in this paper.
Elliott, Desmond and Keller, Frank
Abstract
The evaluation of computer-generated text is a notoriously difficult problem; however, the quality of image descriptions has typically been measured using unigram BLEU and human judgements.
Abstract
We estimate the correlation of unigram and Smoothed BLEU, TER, ROUGE-SU4, and Meteor against human judgements on two data sets.
Abstract
The main finding is that unigram BLEU has a weak correlation, and Meteor has the strongest correlation with human judgements.
Introduction
The main finding of our analysis is that TER and unigram BLEU are weakly correlated with human judgements.
Methodology
Unigram BLEU without a brevity penalty has been reported by Kulkarni et al.
Methodology
(2011) to perform a sentence-level analysis, setting n = 1 and no brevity penalty to get the unigram BLEU measure, or n = 4 with the brevity penalty to get the Smoothed BLEU measure.
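Sentence-level unigram BLEU with no brevity penalty reduces to clipped unigram precision; a rough sketch of that computation (ours, not the paper's evaluation code; the smoothed n = 4 variant is not shown):

    from collections import Counter

    def unigram_bleu(candidate, references):
        """Clipped unigram precision, i.e. BLEU with n = 1 and no brevity penalty."""
        cand = Counter(candidate.lower().split())
        # For each word, clip its candidate count at its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for w, c in Counter(ref.lower().split()).items():
                max_ref[w] = max(max_ref[w], c)
        clipped = sum(min(c, max_ref[w]) for w, c in cand.items())
        return clipped / max(1, sum(cand.values()))

    print(unigram_bleu("a dog runs in the park",
                       ["a dog is running in a park"]))  # 4/6 ≈ 0.67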
Methodology
We set d_skip = 4 and award partial credit for unigram-only matches, otherwise known as ROUGE-SU4.
unigram is mentioned in 21 sentences in this paper.
Kobayashi, Hayato
Introduction
In Section 3, we theoretically derive the tradeoff formulae of the cutoff for unigram models, k-gram models, and topic models, each of which represents its perplexity with respect to a reduced vocabulary, under the assumption that the corpus follows Zipf’s law.
Perplexity on Reduced Corpora
3.1 Perplexity of Unigram Models
Perplexity on Reduced Corpora
Let us consider the perplexity of a unigram model learned from a reduced corpus.
Perplexity on Reduced Corpora
In unigram models, a predictive distribution p′ on a reduced corpus w′ can be simply calculated as p′(w′) = f(w′)/N′.
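A quick sketch (ours, not from the paper) of this maximum-likelihood unigram model and its perplexity on a given token sequence:

    import math
    from collections import Counter

    def unigram_perplexity(corpus_tokens):
        # p'(w) = f(w) / N', the maximum-likelihood unigram estimate.
        freqs = Counter(corpus_tokens)
        n = len(corpus_tokens)
        log_likelihood = sum(f * math.log2(f / n) for f in freqs.values())
        # Perplexity = 2 ** (negative average log2 probability per token).
        return 2 ** (-log_likelihood / n)

    print(unigram_perplexity("a b a c a b".split()))  # ≈ 2.75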
unigram is mentioned in 13 sentences in this paper.
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
We learn embeddings for unigrams, bigrams and trigrams separately with the same neural network and the same parameter settings.
Related Work
The contexts of a unigram (bigram/trigram) are the surrounding unigrams (bigrams/trigrams), respectively.
Related Work
25$ employs the embedding of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation of ac on the sequence represented by columns in each lookup table.
unigram is mentioned in 13 sentences in this paper.
Tan, Chenhao and Lee, Lillian and Pang, Bo
Introduction
twitter unigram    TTT *   YES (54%)
twitter bigram     TTT *   YES (52%)
personal unigram   MT *    YES (52%)
personal bigram    —       NO (48%)
Introduction
We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w∈T} log p(w), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
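The scoring rule in this excerpt is the average log-probability of a tweet's terms under a fixed language model; a toy unigram version follows (the probabilities and the unknown-word floor are placeholders, not values from the paper):

    import math

    def lm_score(tokens, unigram_probs, unk_prob=1e-6):
        # (1/|T|) * sum over tokens of log p(token); unk_prob is an assumed
        # floor for words unseen in the model.
        logs = [math.log(unigram_probs.get(t, unk_prob)) for t in tokens]
        return sum(logs) / len(logs)

    toy_model = {"the": 0.05, "cat": 0.001, "sat": 0.0005}
    print(lm_score("the cat sat".split(), toy_model))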
Introduction
headline unigram   TT       YES (53%)
headline bigram    TTTT *   YES (52%)
unigram is mentioned in 14 sentences in this paper.
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Experiments
The baselines NB and BINB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features.
Experiments
SVM is a support vector machine with unigram and bigram features.
Experiments
unigram, POS, head chunks    91.0
Introduction
On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al.
unigram is mentioned in 9 sentences in this paper.
Li, Jiwei and Ott, Myle and Cardie, Claire and Hovy, Eduard
Experiments
We train the OVR classifier on three sets of features: LIWC, Unigram, and POS.
Experiments
In particular, the three-class classifier is around 65% accurate at distinguishing between Employee, Customer, and Turker for each of the domains using Unigram, significantly higher than random guessing.
Experiments
Best performance is achieved with Unigram features, consistently outperforming LIWC and POS features in both three-class and two-class settings in the hotel domain.
Introduction
In the examples in Table 1, we trained a linear SVM classifier on Ott’s Chicago-hotel dataset on unigram features and tested it on a couple of different domains (the details of data acquisition are illustrated in Section 3).
Introduction
Table 1: SVM performance on datasets for a classifier trained on Chicago hotel reviews using Unigram features.
unigram is mentioned in 9 sentences in this paper.
Zhang, Hui and Chiang, David
Smoothing on count distributions
For the example above, the estimates for the unigram model p′(w) are p′(cat) ≈ 0.489 and p′(dog) ≈ 0.511.
Smoothing on count distributions
For the example above, the count distributions used for the unigram distribution would be:

                    r = 0   r = 1
    p(c(cat) = r)   0.14    0.86
    p(c(dog) = r)   0.1     0.9
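As a sanity check (ours, not the paper's code), normalizing the expected counts implied by these count distributions reproduces the unigram estimates quoted above, p′(cat) ≈ 0.489 and p′(dog) ≈ 0.511:

    count_dists = {"cat": {0: 0.14, 1: 0.86}, "dog": {0: 0.1, 1: 0.9}}
    # Expected count of each word: sum over r of r * p(c(word) = r).
    expected = {w: sum(r * p for r, p in dist.items())
                for w, dist in count_dists.items()}
    total = sum(expected.values())
    print({w: round(e / total, 3) for w, e in expected.items()})
    # {'cat': 0.489, 'dog': 0.511}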
Smoothing on integral counts
Absolute discounting chooses p′(w | u′) to be the maximum-likelihood unigram distribution; under KN smoothing (Kneser and Ney, 1995), it is chosen to make p in (2) satisfy the following constraint for all (n − 1)-grams u′w:
Word Alignment
Following common practice in language modeling, we use the unigram distribution p( f ) as the lower-order distribution.
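For reference, a generic absolute-discounting sketch in which the mass freed by discounting is redistributed according to a unigram lower-order distribution (an assumed textbook form on integral counts; the paper's actual contribution, smoothing on fractional/expected counts, is not implemented here):

    from collections import Counter, defaultdict

    def absolute_discount_lm(bigram_pairs, D=0.75):
        ctx_counts = defaultdict(Counter)
        unigrams = Counter()
        for u, w in bigram_pairs:
            ctx_counts[u][w] += 1
            unigrams[w] += 1
        total = sum(unigrams.values())
        p_uni = {w: c / total for w, c in unigrams.items()}

        def prob(w, u):
            c_u = sum(ctx_counts[u].values())
            if c_u == 0:
                return p_uni.get(w, 0.0)  # back off entirely to the unigram model
            discounted = max(ctx_counts[u][w] - D, 0) / c_u
            backoff_mass = D * len(ctx_counts[u]) / c_u  # mass freed by discounting
            return discounted + backoff_mass * p_uni.get(w, 0.0)

        return prob

    prob = absolute_discount_lm([("the", "cat"), ("the", "dog"), ("a", "cat")])
    print(prob("cat", "the"))  # 0.625 with D = 0.75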
Word Alignment
As shown in Table 1, for KN smoothing, interpolation with the unigram distribution performs the best, while for WB smoothing, interestingly, interpolation with the uniform distribution performs the best.
Word Alignment
In WB smoothing, p’( f) is the empirical unigram distribution.
unigram is mentioned in 7 sentences in this paper.
Wintrode, Jonathan and Khudanpur, Sanjeev
Term and Document Frequency Statistics
Figure 4: Difference between observed and predicted IDF_w for Tagalog unigrams.
Term and Document Frequency Statistics
3.1 Unigram Probabilities
Term and Document Frequency Statistics
We encounter the burstiness property of words again by looking at unigram occurrence probabilities.
unigram is mentioned in 7 sentences in this paper.
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Complexity Analysis
For the monolingual bigram model, the number of states in the HMM is U times that of the monolingual unigram model, as the states at a specific position of F are related not only to the length of the current word but also to the length of the word before it.
Complexity Analysis
Thus its complexity is U² times the unigram model’s complexity:
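For instance, with a maximum word length of U = 4, the bigram segmentation model has 4 times as many HMM states as the unigram model and roughly 4² = 16 times its computational cost.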
Complexity Analysis
(unigram) 0.729 0.804 3 s 50 s Prop.
Methods
This section uses a unigram model for description convenience, but the method can be extended to n-gram models.
unigram is mentioned in 7 sentences in this paper.
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with performance of LP (Eq.
Evaluation
Using unigrams (“SLP 1-gram”) actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams.
Evaluation
It is relatively straightforward to combine both unigrams and bigrams in one source graph, but for experimental clarity we did not mix these phrase lengths.
Generation & Propagation
Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings.
Introduction
Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher-order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text.
Related Work
Recent improvements to BLI (Tamura et al., 2012; Irvine and Callison-Burch, 2013b) have contained a graph-based flavor by presenting label propagation-based approaches using a seed lexicon, but evaluation is once again done on top-1 or top-3 accuracy, and the focus is on unigrams.
Related Work
(2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrated on OOVs, i.e., unigrams, and not on enriching the entire translation model.
unigram is mentioned in 7 sentences in this paper.
Baumel, Tal and Cohen, Raphael and Elhadad, Michael
Algorithms
When constructing a summary, we update the unigram distribution of the constructed summary so that it includes a smoothed distribution of the previous summaries in order to eliminate redundancy between the successive steps in the chain.
Algorithms
For example, when we summarize the documents that were retrieved as a result of the first query, we calculate the unigram distribution in the same manner as we did in Focused KLSum; but for the second query, we calculate the unigram distribution as if all the sentences we selected for the previous summary were selected for the current query too, with a damping factor.
Algorithms
In this variant, the Unigram Distribution estimate of word X is computed as:
Previous Work
KLSum adopts a language model approach to compute relevance: the documents in the input set are modeled as a distribution over words (the original algorithm uses a unigram distribution over the bag of words in documents D).
Previous Work
KLSum is a sentence extraction algorithm: it searches for a subset of the sentences in D with a unigram distribution as similar as possible to that of the overall collection D, but with a limited length.
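A rough greedy sketch of this kind of KLSum-style selection (a simplified reading of the description, not the authors' implementation; the smoothing constant and word budget below are arbitrary): repeatedly add the sentence that brings the summary's unigram distribution closest, in KL divergence, to that of the whole collection.

    import math
    from collections import Counter

    def unigram_dist(tokens, vocab, alpha=1e-3):
        # Smoothed unigram distribution over a fixed vocabulary.
        counts = Counter(tokens)
        total = sum(counts.values()) + alpha * len(vocab)
        return {w: (counts[w] + alpha) / total for w in vocab}

    def kl(p, q):
        return sum(p[w] * math.log(p[w] / q[w]) for w in p)

    def klsum(sentences, max_words=20):
        doc_tokens = [t for s in sentences for t in s.split()]
        vocab = set(doc_tokens)
        p_doc = unigram_dist(doc_tokens, vocab)
        summary, summary_tokens = [], []
        while sum(len(s.split()) for s in summary) < max_words:
            remaining = [s for s in sentences if s not in summary]
            if not remaining:
                break
            # Greedily pick the sentence minimizing KL(collection || summary + sentence).
            best = min(remaining,
                       key=lambda s: kl(p_doc, unigram_dist(summary_tokens + s.split(), vocab)))
            summary.append(best)
            summary_tokens += best.split()
        return summary

    print(klsum(["the cat sat on the mat",
                 "dogs bark loudly at night",
                 "the mat was soft"]))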
Previous Work
After the words are classified, the algorithm uses a KLSum variant to find the summary that best matches the unigram distribution of topic specific words.
unigram is mentioned in 6 sentences in this paper.
Ramanath, Rohan and Liu, Fei and Sadeh, Norman and Smith, Noah A.
Approach
o_i is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents or in more than 98% of the documents.
Approach
models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram).
Evaluation
We derived unigram tf-idf vectors for each section in each of 50 randomly sampled policies per category.
Experiment
The implementation uses unigram features and cosine similarity.
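The unigram tf-idf and cosine-similarity baselines mentioned in these excerpts can be sketched with scikit-learn (an assumed setup, not the paper's exact configuration; the example sections are invented):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    sections = [
        "we collect your email address and usage data",
        "information we collect includes your email and address",
        "you may opt out of marketing communications at any time",
    ]
    # Unigram tf-idf vectors, one per policy section.
    vectors = TfidfVectorizer(ngram_range=(1, 1)).fit_transform(sections)
    # Cosine similarity of the first section to the other two.
    print(cosine_similarity(vectors[0], vectors[1:]))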
Experiment
Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010).7 To more closely match our models, LDA is given access to the same unigram and bigram tokens.
unigram is mentioned in 5 sentences in this paper.
Prabhakaran, Vinodkumar and Rambow, Owen
Predicting Direction of Power
Feature set                           Accuracy (%)
Baseline (Always Superior)            52.54
Baseline (Word Unigrams + Bigrams)    68.56
THRNew                                55.90
THRPR                                 54.30
DIAPR                                 54.05
THRPR + THRNew                        61.49
DIAPR + THRPR + THRNew                62.47
LEX                                   70.74
LEX + DIAPR + THRPR                   67.44
LEX + DIAPR + THRPR + THRNew          68.56
BEST (= LEX + THRNew)                 73.03
BEST (Using p1 features only)         72.08
BEST (Using IMt features only)        72.11
BEST (Using Mt only)                  71.27
BEST (No Indicator Variables)         72.44
Predicting Direction of Power
We found the best setting to be using both unigrams and bigrams for all three types of n-grams, by tuning on our dev set.
Predicting Direction of Power
We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%.
unigram is mentioned in 4 sentences in this paper.
P, Deepak and Visweswariah, Karthik
Conclusions and Future Work
We model and harness lexical correlations using translation models, in the company of unigram language models that are used to characterize reply posts, and formulate a clustering-based EM approach for solution identification.
Introduction
We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively.
Our Approach
Consider a unigram language model S that models the lexical characteristics of solution posts, and a translation model T that models the lexical correlation between problems and solutions.
Our Approach
Consider the post and reply vocabularies to be of sizes A and B respectively; then, the translation model would have A × B variables, whereas the unigram language model has only B variables.
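As a rough illustration, with post and reply vocabularies of A = B = 10,000 word types, the translation model has A × B = 10^8 variables while the unigram language model has only B = 10^4, hence the need to regularize the translation model noted above.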
unigram is mentioned in 4 sentences in this paper.
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
We consider both template unigrams and bigrams, combining two templates in sequence.
Approaches
Constructing all feature template unigrams and bigrams would yield an unwieldy number of features.
Experiments
Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1).
Experiments
However, the original unigram Björkelund features (Bdeflmemh), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB).
unigram is mentioned in 4 sentences in this paper.
Levitan, Rivka and Elson, David
Features
We also count the number of unigrams the two transcripts have in common and the length, absolute and relative, of the longest unigram overlap.
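A toy version (ours, not the paper's feature extractor) of two of these overlap features, using Python's difflib for the longest contiguous unigram overlap:

    from difflib import SequenceMatcher

    def overlap_features(query1, query2):
        t1, t2 = query1.split(), query2.split()
        shared = len(set(t1) & set(t2))  # unigrams the two queries have in common
        match = SequenceMatcher(None, t1, t2).find_longest_match(0, len(t1), 0, len(t2))
        return {"shared_unigrams": shared,
                "longest_overlap": match.size,                         # absolute length
                "longest_overlap_rel": match.size / max(len(t1), len(t2))}  # relative length

    print(overlap_features("weather in new york", "weather new york city"))
    # {'shared_unigrams': 3, 'longest_overlap': 2, 'longest_overlap_rel': 0.5}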
Features
In addition, we look at the number of characters and unigrams and the audio duration of each query, with the intuition that the length of a query may be correlated with its likelihood of being retried (or a retry).
Prediction task
T-tests between the two categories showed that all edit distance features (character, word, reduced, and phonetic; raw and normalized) are significantly more similar between retry query pairs. Similarly, the number of unigrams the two queries have in common is significantly higher for retries.
unigram is mentioned in 3 sentences in this paper.
Ji, Yangfeng and Eisenstein, Jacob
Experiments
The vocabulary V includes all unigrams after down-casing.
Experiments
In total, there are 16250 unique unigrams in V.
Experiments
The simplification for visualization is that we consider only the top 1000 frequent unigrams in the RST-DT training set.
unigram is mentioned in 3 sentences in this paper.