Distribution Prediction | For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
Distribution Prediction | Using a standard stop word list, we filter out frequent non-content unigrams and select the remainder as unigram features to represent a sentence. |
Distribution Prediction | Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
Previous Work | They learned unigram language models (LMs) for specific time periods and scored articles with log-likelihood ratio scores. |
Previous Work | Kanhabua and Norvag (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags, collocations, and tf-idf scores. |
Previous Work | As above, they learned unigram LMs, but instead measured the KL-divergence between a document and a time period’s LM. |
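The KL-divergence scoring mentioned above can be sketched in a few lines. This is an illustrative toy (the function names, add-alpha smoothing choice, and miniature corpora are mine, not from the cited papers): a document is scored against a time period by the divergence between their smoothed unigram language models.

```python
import math
from collections import Counter

def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram LM over a fixed shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def kl_divergence(p, q):
    """KL(p || q) in nats; assumes p and q share the same support."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p if p[w] > 0)

doc = "the treaty was signed in geneva".split()
period = "the treaty was signed the war ended in geneva".split()
vocab = set(doc) | set(period)
p_doc = unigram_lm(doc, vocab)
p_period = unigram_lm(period, vocab)
score = kl_divergence(p_doc, p_period)  # lower = document fits the period better
```

A document would be assigned to the time period whose LM minimizes this divergence.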
Timestamp Classifiers | The unigrams w are lowercased tokens. |
Timestamp Classifiers | model as the Unigram NLLR. |
Timestamp Classifiers | Followup work by Kanhabua and Norvag (2008) applied two filtering techniques to the unigrams in the model: |
Experimental Setup | 256,873 unique unigrams and 4,494,222 unique bigrams. |
Experimental Setup | We cluster unigrams (i = 1) and bigrams (i = 2).
Experimental Setup | For all experiments, |l31| = |l32| (except in cases where |l32| exceeds the number of unigrams; see below).
Models | The parameters d′, d′′, and d′′′ are the discounts for unigrams, bigrams, and trigrams, respectively, as defined by Chen and Goodman (1996, p. 20, (26)).
Models | 232) is the set of unigram (resp. |
Models | We cluster bigram histories and unigram histories separately and write p_B(w3 | w1 w2) for the bigram cluster model and p_B(w3 | w2) for the unigram cluster model.
Related work | symbol | denotation: Σ_w (sum over all unigrams w)
Abstract | We show how to model the task of inferring which objects are being talked about (and which words refer to which objects) as standard grammatical inference, and describe PCFG-based unigram models and adaptor grammar-based collocation models for the task. |
Introduction | The unigram model we describe below corresponds most closely to the Frank |
Introduction | 2.1 Topic models and the unigram PCFG |
Introduction | This leads to our first model, the unigram grammar, which is a PCFG.1 |
Experiments 4.1 The data | Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
Experiments 4.1 The data | For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops |
Experiments 4.1 The data | Unigram |
Introduction | We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms. |
The computational model | Our models build on the Unigram and the Bigram model introduced in Goldwater et al. |
The computational model | Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j}s directly from L rather than from L_{U_{i,j−1}}).
A Motivating Example | (unigrams) development, civilization
Feature Expansion | , w_N}, where the elements w_i are either unigrams or bigrams that appear in the review d. We then represent a review d by a real-valued term-frequency vector d ∈ R^N, where the value of the j-th element d_j is set to the total number of occurrences of the unigram or bigram w_j in the review d. To find the suitable candidates to expand a vector d for the review d, we define a ranking score score(u_i, d) for each base entry in the thesaurus as follows:
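The term-frequency vector described here is easy to make concrete. A minimal sketch (the feature index, tokenization, and example review are invented for illustration; the paper's actual feature set comes from its training corpus):

```python
from collections import Counter

def lexical_elements(review_tokens):
    """Unigrams plus adjacent-word bigrams extracted from a review."""
    unigrams = list(review_tokens)
    bigrams = [f"{a} {b}" for a, b in zip(review_tokens, review_tokens[1:])]
    return unigrams + bigrams

def tf_vector(review_tokens, feature_index):
    """d_j = total number of occurrences of feature w_j in the review."""
    counts = Counter(lexical_elements(review_tokens))
    return [counts[w] for w in feature_index]

review = "very good camera very good lens".split()
index = ["good", "very good", "lens", "bad"]   # hypothetical feature index
vec = tf_vector(review, index)
```

Each position of `vec` is aligned with one unigram or bigram in `index`, matching the d_j definition above.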
Feature Expansion | Moreover, we weight the relatedness scores for each word wj by its normalized term-frequency to emphasize the salient unigrams and bigrams in a review. |
Feature Expansion | This is particularly important because we would like to score base entries ui considering all the unigrams and bigrams that appear in a review d, instead of considering each unigram or bigram individually. |
Introduction | a unigram or a bigram of word lemma) in a review using a feature vector. |
Sentiment Sensitive Thesaurus | We select unigrams and bigrams from each sentence. |
Sentiment Sensitive Thesaurus | For the remainder of this paper, we will refer to unigrams and bigrams collectively as lexical elements. |
Sentiment Sensitive Thesaurus | Previous work on sentiment classification has shown that both unigrams and bigrams are useful for training a sentiment classifier (Blitzer et al., 2007). |
Experiments | Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams, and Anchored Unigrams, with and without learning the parametric edit distances.
Learning | To find this maximizer for any given parameter setting, we need to find a marginal distribution over the edges connecting any two languages a and d. With this distribution, we calculate the expected “alignment unigrams.” That is, for each pair of phonemes x and y (or the empty phoneme ε), we need to find the quantity:
Message Approximation | In the context of transducers, previous authors have focused on a combination of n-best lists and unigram back-off models (Dreyer and Eisner, 2009), a schematic diagram of which is in Figure 2(d). |
Message Approximation | Figure 2: Various topologies for approximating messages: (a) a unigram model, (b) a bigram model, (c) the anchored unigram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009).
Message Approximation | Another is to choose τ(w) to be a unigram language model over the language in question with a geometric probability over lengths.
Experiments and Discussions | We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams).
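The R-1 and R-2 recall measures mentioned here can be sketched directly (a simplified single-reference version without ROUGE's stemming, stop-word options, or skip-bigrams; the example sentences are invented):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n_recall(candidate, reference, n):
    """Clipped n-gram recall of a candidate summary against one reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

ref = "the cat sat on the mat".split()
cand = "the cat lay on the mat".split()
r1 = rouge_n_recall(cand, ref, 1)  # R-1: unigram recall
r2 = rouge_n_recall(cand, ref, 2)  # R-2: bigram recall
```

As the excerpts note, R-2 is stricter than R-1: a single substituted word breaks two bigrams but only one unigram.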
Experiments and Discussions | Note that R-2 is a measure of bigram recall, and the sumHLDA of HybHSum2 is built on unigrams rather than bigrams.
Regression Model | (I) nGram Meta-Features (NMF): For each document cluster D, we identify the most frequent (non-stop-word) unigrams, i.e., v_freq = {w_i}_{i=1}^r ⊂ V, where r is a model parameter giving the number of most frequent unigram features.
Regression Model | We measure observed unigram probabilities for each w_i ∈ v_freq with p_D(w_i) = n_D(w_i) / Σ_{j=1}^{|V|} n_D(w_j), where n_D(w_i) is the number of times w_i appears in D and |V| is the total number of unigrams.
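The p_D(w_i) computation above is a straightforward normalized count restricted to the r most frequent unigrams. A small sketch (function name, stop-word handling, and toy cluster are illustrative):

```python
from collections import Counter

def frequent_unigram_probs(cluster_docs, r, stopwords=frozenset()):
    """p_D(w_i) = n_D(w_i) / sum_j n_D(w_j), reported for the r most
    frequent non-stop-word unigrams in document cluster D."""
    counts = Counter(w for doc in cluster_docs for w in doc
                     if w not in stopwords)
    total = sum(counts.values())      # denominator runs over all unigrams in D
    v_freq = [w for w, _ in counts.most_common(r)]
    return {w: counts[w] / total for w in v_freq}

docs = [["budget", "vote", "budget"], ["vote", "tax", "budget"]]
probs = frequent_unigram_probs(docs, r=2)
```

Note the denominator sums over the whole vocabulary of D, so the returned values do not sum to 1 in general.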
Regression Model | To characterize this feature, we reuse the r most frequent unigrams, i.e., w_i ∈ v_freq.
Tree-Based Sentence Scoring | * sparse unigram distributions (sim_l) at each topic l on c_om: similarity between p(w_om,l | z_om = l, c_om, v_l) and p(w_sn,l | z_sn = l, c_om, v_l)
Tree-Based Sentence Scoring | — sim_l: We define two sparse (discrete) unigram distributions for candidate o_m and summary s_n at each node l on a vocabulary identified with words generated by the topic at that node, v_l ⊂ V. Given w_om = {w_1, ..., w_|om|}, let w_om,l ⊂ w_om be the set of words in o_m that are generated from topic z_om at level l on path c_om.
Tree-Based Sentence Scoring | The discrete unigram distribution p_om,l = p(w_om,l | z_om = l, c_om, v_l) represents the probability over all words v_l assigned to topic z_om at level l, by sampling only for words in w_om,l.
Abstract | The evaluation of computer-generated text is a notoriously difficult problem; however, the quality of image descriptions has typically been measured using unigram BLEU and human judgements.
Abstract | We estimate the correlation of unigram and Smoothed BLEU, TER, ROUGE-SU4, and Meteor against human judgements on two data sets. |
Abstract | The main finding is that unigram BLEU has a weak correlation, and Meteor has the strongest correlation with human judgements. |
Introduction | The main finding of our analysis is that TER and unigram BLEU are weakly correlated
Methodology | Unigram BLEU without a brevity penalty has been reported by Kulkarni et al.
Methodology | (2011) to perform a sentence-level analysis, setting n = 1 and no brevity penalty to get the unigram BLEU measure, or n = 4 with the brevity penalty to get the Smoothed BLEU measure. |
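The n = 1, no-brevity-penalty setting described here reduces BLEU to clipped unigram precision, which can be sketched in a few lines (a simplified single-reference version; the example sentences are invented, and real BLEU implementations add smoothing and multi-reference clipping):

```python
from collections import Counter

def unigram_bleu(candidate, reference):
    """Clipped unigram precision; no brevity penalty, per the n = 1 setting."""
    cand = Counter(candidate)
    ref = Counter(reference)
    clipped = sum(min(c, ref[w]) for w, c in cand.items())
    return clipped / max(len(candidate), 1)

ref = "a dog sleeps on the sofa".split()
cand = "a dog is on the sofa".split()
score = unigram_bleu(cand, ref)
```

Clipping means a candidate cannot gain credit by repeating a reference word more often than it appears in the reference.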
Methodology | We set dskip = 4 and award partial credit for unigram only matches, otherwise known as ROUGE-SU4. |
Introduction | In Section 3, we theoretically derive the tradeoff formulae of the cutoff for unigram models, k-gram models, and topic models, each of which represents its perplexity with respect to a reduced vocabulary, under the assumption that the corpus follows Zipf’s law. |
Perplexity on Reduced Corpora | 3.1 Perplexity of Unigram Models |
Perplexity on Reduced Corpora | Let us consider the perplexity of a unigram model learned from a reduced corpus. |
Perplexity on Reduced Corpora | In unigram models, a predictive distribution p′ on a reduced corpus w′ can simply be calculated as p′(w′) = f(w′)/N′.
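The perplexity of this maximum-likelihood unigram model is easy to compute explicitly (a minimal sketch with an invented toy corpus; the paper's analysis concerns Zipf-distributed corpora, which this does not reproduce):

```python
import math
from collections import Counter

def unigram_perplexity(corpus):
    """Perplexity of the ML unigram model p'(w) = f(w)/N,
    evaluated on the same corpus it was estimated from."""
    counts = Counter(corpus)
    n = len(corpus)
    log_prob = sum(c * math.log(c / n) for c in counts.values())
    return math.exp(-log_prob / n)

reduced = ["a", "a", "b", "b"]      # uniform over two word types
pp = unigram_perplexity(reduced)    # perplexity 2 for a uniform 2-type corpus
```

A corpus uniform over k types has perplexity exactly k, which is a handy sanity check when studying how vocabulary reduction changes perplexity.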
Automatic Evaluation Metrics | Then, unigram matching is performed on the remaining words that are not matched using paraphrases. |
Automatic Evaluation Metrics | Based on the matches, ParaEval will then elect to use either unigram precision or unigram recall as its score for the sentence pair. |
Automatic Evaluation Metrics | Based on the number of word or unigram matches and the amount of string fragmentation represented by the alignment, METEOR calculates a score for the pair of strings. |
Experiments | Consistent with findings in the literature (Cui et al., 2006; Dave et al., 2003; Gamon and Aue, 2005), on the large corpus of movie review texts, the in-domain-trained system based solely on unigrams had lower accuracy than the similar system trained on bigrams. |
Experiments | On sentences, however, we have observed an inverse pattern: unigrams performed better than bigrams and trigrams. |
Experiments | Due to the lower frequency of higher-order n-grams (as opposed to unigrams), higher-order n-gram language models are sparser, which increases the probability of missing a particular sentiment marker in a sentence (Table 3).
Factors Affecting System Performance | System runs with unigrams, bigrams, and trigrams as features and with different training set sizes are presented.
Lexicon-Based Approach | One of the limitations of general lexicons and dictionaries, such as WordNet (Fellbaum, 1998), as training sets for sentiment tagging systems is that they contain only definitions of individual words and, hence, only unigrams could be effectively learned from dictionary entries. |
Lexicon-Based Approach | Since the structure of WordNet glosses is fairly different from that of other types of corpora, we developed a system that used the list of human-annotated adjectives from (Hatzivassiloglou and McKeown, 1997) as a seed list and then learned additional unigrams |
Word segmentation with adaptor grammars | Figure 1: The unigram word adaptor grammar, which uses a unigram model to generate a sequence of words, where each word is a sequence of phonemes.
Word segmentation with adaptor grammars | 3.1 Unigram word adaptor grammar |
Word segmentation with adaptor grammars | (2007a) presented an adaptor grammar that defines a unigram model of word segmentation and showed that it performs as well as the unigram DP word segmentation model presented by Goldwater et al. (2006a).
Introduction | twitter unigram TTT * YES (54%) twitter bigram TTT * YES (52%) personal unigram MT * YES (52%) personal bigram — NO (48%)
Introduction | We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w∈T} log p(w), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
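The length-normalized log-probability score described here can be sketched as follows (a toy add-one-smoothed unigram model over an invented "community" corpus; the paper's 558M-tweet models are obviously far larger):

```python
import math
from collections import Counter

def train_unigram_lm(corpus_tokens, alpha=1.0):
    """Add-alpha smoothed unigram LM; unseen words share one <unk> bucket."""
    counts = Counter(corpus_tokens)
    vocab_size = len(counts) + 1              # +1 for the <unk> bucket
    total = len(corpus_tokens) + alpha * vocab_size
    return lambda w: (counts[w] + alpha) / total

def lm_score(tokens, lm):
    """(1/|T|) * sum_{w in T} log p(w): average log-probability of the tweet."""
    return sum(math.log(lm(w)) for w in tokens) / len(tokens)

community = "good morning everyone good luck today".split()
lm = train_unigram_lm(community)
typical = lm_score("good morning".split(), lm)
atypical = lm_score("quantum chromodynamics".split(), lm)
```

A tweet made of community-typical words scores higher (less negative) than one made of words the model has never seen.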
Introduction | headline unigram TT YES (53%) headline bigram TTTT * YES (52%) |
Related Work | We learn embeddings for unigrams, bigrams, and trigrams separately with the same neural network and the same parameter settings.
Related Work | The contexts of unigram (bigram/trigram) are the surrounding unigrams (bigrams/trigrams), respectively. |
Related Work | 25$ employs the embeddings of unigrams, bigrams, and trigrams separately and conducts the matrix-vector operation of ac on the sequence represented by columns in each lookup table.
Model | For notational convenience, we use terms to denote both words (unigrams) and phrases (n-grams).
Phrase Ranking based on Relevance | Topics in most topic models like LDA are usually unigram distributions. |
Phrase Ranking based on Relevance | For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution. |
Phrase Ranking based on Relevance | Yet another thread of research post-processes the discovered topical unigrams to form multi-word phrases using likelihood scores (Blei and Lafferty, 2009). |
Background | This work differs from previous Bayesian models in that we explicitly model a complex backoff path using a hierarchical prior, such that our model jointly infers distributions over tag trigrams, bigrams, and unigrams, as well as whole words and their character-level representation.
Experiments | Note that the bigram PYP-HMM outperforms the closely related BHMM (the main difference being that we smooth tag bigrams with unigrams).
The PYP-HMM | The trigram transition distribution, Tij, is drawn from a hierarchical PYP prior which backs off to a bigram Bj and then a unigram U distribution, |
The PYP-HMM | This allows the modelling of trigram tag sequences, while smoothing these estimates with their corresponding bigram and unigram distributions. |
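The hierarchical PYP backoff itself needs a full sampler, but the idea of smoothing trigram tag estimates with bigram and unigram distributions can be illustrated with plain linear interpolation. This is a stand-in sketch, not the paper's model; the class name, fixed weights, and toy tag sequence are all invented:

```python
from collections import Counter

class InterpolatedTagModel:
    """Toy interpolation of trigram, bigram, and unigram tag estimates
    (a simple stand-in for the hierarchical PYP backoff)."""
    def __init__(self, tag_seqs, l3=0.6, l2=0.3, l1=0.1):
        self.l3, self.l2, self.l1 = l3, l2, l1
        self.uni, self.bi, self.tri = Counter(), Counter(), Counter()
        for tags in tag_seqs:
            self.uni.update(tags)
            self.bi.update(zip(tags, tags[1:]))
            self.tri.update(zip(tags, tags[1:], tags[2:]))
        self.n = sum(self.uni.values())

    def prob(self, t, hist):
        u, v = hist   # the two preceding tags
        p1 = self.uni[t] / self.n
        p2 = self.bi[(v, t)] / self.uni[v] if self.uni[v] else p1
        p3 = self.tri[(u, v, t)] / self.bi[(u, v)] if self.bi[(u, v)] else p2
        return self.l3 * p3 + self.l2 * p2 + self.l1 * p1

m = InterpolatedTagModel([["DT", "NN", "VB", "DT", "NN"]])
p = m.prob("NN", ("VB", "DT"))
```

Unseen trigram histories fall back to the bigram estimate, and unseen bigram histories to the unigram estimate, mirroring the backoff path described above; the PYP prior replaces the fixed weights with learned, count-dependent discounting.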
The PYP-HMM | That is, each table at one level is equivalent to a customer at the next deeper level, creating the invariants K_{hw} = n_{uw}, where u = t_{l−1} indicates the unigram backoff context of h. The recursion terminates at the lowest level, where the base distribution is static.
Experiments | We train the OVR classifier on three sets of features: LIWC, Unigram, and POS.
Experiments | In particular, the three-class classifier is around 65% accurate at distinguishing between Employee, Customer, and Turker for each of the domains using Unigram features, significantly higher than random guessing.
Experiments | Best performance is achieved with Unigram features, consistently outperforming LIWC and POS features in both three-class and two-class settings in the hotel domain.
Introduction | In the examples in Table l, we trained a linear SVM classifier on Ott’s Chicago-hotel dataset on unigram features and tested it on a couple of different domains (the details of data acquisition are illustrated in Section 3). |
Introduction | Table 1: SVM performance on datasets for a classifier trained on Chicago hotel review based on Unigram feature. |
Definitions | Given a ciphertext f_1^N, we define the unigram count N_f of f ∈ V_f as
Definitions | Similarly, we define language model matrices S for the unigram and the bigram case. |
Definitions | The unigram language model Sf is defined as |
Introduction | In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP). |
Conclusion | However, oovs can be considered as n-grams (phrases) instead of unigrams . |
Conclusion | In this scenario, we also can look for paraphrases and translations for phrases containing oovs and add them to the phrase-table as new translations along with the translations for unigram oovs. |
Experiments & Results 4.1 Experimental Setup | Table 4: Intrinsic results of different types of graphs when using unigram nodes on Europarl. |
Experiments & Results 4.1 Experimental Setup |
Type        Node      MRR %   RCL %
Bipartite   unigram   5.2     12.5
            bigram    6.8     15.7
Tripartite  unigram   5.9     12.6
            bigram    6.9     15.9
Baseline    bigram    3.9     7.7
Experiments & Results 4.1 Experimental Setup | Table 5: Results on using unigram or bigram nodes. |
Experiments | The baselines NB and BINB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. |
Experiments | SVM is a support vector machine with unigram and bigram features. |
Experiments | unigram, POS, head chunks 91.0
Introduction | On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al.
Reranking Features | Long-range Unigram.
Reranking Features | in the parse tree: f(L2 → left) = 1 and f(L4 → turn) = 1. Two-level Long-range Unigram.
Reranking Features | Unigram.
Abstract | Previous work in traditional text classification and its variants — such as sentiment analysis — has achieved successful results by using the bag-of-words representation; that is, by treating text as a collection of words with no interdependencies, training a classifier on a large feature set of word unigrams which appear in the corpus. |
Abstract | Few of these tactics would be effectively encapsulated by word unigrams . |
Abstract | Many would be better modeled by POS tag unigrams (with no word information) or by longer n-grams consisting of either words, POS tags, or a combination of the two. |
Complexity Analysis | For the monolingual bigram model, the number of states in the HMM is U times that of the monolingual unigram model, as the states at a specific position of F are related not only to the length of the current word but also to the length of the word before it.
Complexity Analysis | Thus its complexity is U^2 times the unigram model’s complexity:
Complexity Analysis | (unigram) 0.729 0.804 3 s 50 s Prop.
Methods | This section uses a unigram model for description convenience, but the method can be extended to n-gram models. |
Term and Document Frequency Statistics | Figure 4: Difference between observed and predicted IDFw for Tagalog unigrams.
Term and Document Frequency Statistics | 3.1 Unigram Probabilities |
Term and Document Frequency Statistics | We encounter the burstiness property of words again by looking at unigram occurrence probabilities. |
Smoothing on count distributions | For the example above, the estimates for the unigram model p′(w) are p′(cat) ≈ 0.489 and p′(dog) ≈ 0.511.
Smoothing on count distributions | For the example above, the count distributions used for the unigram distribution would be:
                 r = 0   r = 1
p(c(cat) = r)    0.14    0.86
p(c(dog) = r)    0.10    0.90
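The cat/dog figures in these excerpts follow from taking expected counts under the count distributions and normalizing. A minimal sketch reproducing the arithmetic (the function names are mine; the numbers are the ones given above):

```python
def expected_count(count_dist):
    """E[c(w)] under a distribution over integral counts {r: p(c(w) = r)}."""
    return sum(r * p for r, p in count_dist.items())

count_dists = {
    "cat": {0: 0.14, 1: 0.86},
    "dog": {0: 0.10, 1: 0.90},
}
expected = {w: expected_count(d) for w, d in count_dists.items()}
total = sum(expected.values())                      # 0.86 + 0.90 = 1.76
p_unigram = {w: e / total for w, e in expected.items()}
```

Normalizing the expected counts 0.86 and 0.90 by their sum 1.76 yields the estimates quoted above.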
Smoothing on integral counts | Absolute discounting chooses p′(w | u′) to be the maximum-likelihood unigram distribution; under KN smoothing (Kneser and Ney, 1995), it is chosen to make p in (2) satisfy the following constraint for all (n − 1)-grams u′w:
Word Alignment | Following common practice in language modeling, we use the unigram distribution p( f ) as the lower-order distribution. |
Word Alignment | As shown in Table 1, for KN smoothing, interpolation with the unigram distribution performs the best, while for WB smoothing, interestingly, interpolation with the uniform distribution performs the best.
Word Alignment | In WB smoothing, p’( f) is the empirical unigram distribution. |
Evaluation | In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with performance of LP (Eq. |
Evaluation | Using unigrams (“SLP l-gram”) actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams. |
Evaluation | It is relatively straightforward to combine both unigrams and bigrams in one source graph, but for experimental clarity we did not mix these phrase lengths.
Generation & Propagation | Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings. |
Introduction | Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams , since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text. |
Related Work | Recent improvements to BLI (Tamura et al., 2012; Irvine and Callison-Burch, 2013b) have contained a graph-based flavor by presenting label propagation-based approaches using a seed lexicon, but evaluation is once again done on top-1 or top-3 accuracy, and the focus is on unigrams . |
Related Work | (2013) and Irvine and Callison-Burch (2013a) conduct a more extensive evaluation of their graph-based BLI techniques, where the emphasis and end-to-end BLEU evaluations concentrated on OOVs, i.e., unigrams , and not on enriching the entire translation model. |
Inference | While previous work used p(k|θ) = (1 − p($))^{k−1} p($), this is only true for unigrams.
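The geometric length distribution in this formula is simple to write out: a word of length k is k − 1 non-terminating characters followed by the end symbol $. A small sketch (the function name and the value of p($) are illustrative):

```python
def geometric_length_pmf(k, p_end):
    """p(k) = (1 - p($))^(k-1) * p($): probability of word length k when the
    end symbol $ is emitted with probability p($) after each character."""
    return (1.0 - p_end) ** (k - 1) * p_end

p_end = 0.25
pmf = [geometric_length_pmf(k, p_end) for k in range(1, 200)]
```

The mass decays geometrically in k and sums to 1 over all positive lengths, which is exactly why it only holds for unigram (single-word) generation.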
Inference | the number of tables tew for w in word unigrams . |
Nested Pitman-Yor Language Model | Thus far we have assumed that the unigram G1 is already given, but of course it should also be generated as G1 ∼ PY(G0, d, θ).
Nested Pitman-Yor Language Model | Note that this is different from unigrams, which are a posterior distribution given the data.
Nested Pitman-Yor Language Model | When a word w is generated from its parent at the unigram node, it means that w
Pitman-Yor process and n-gram models | Suppose we have a unigram word distribution G1 = {p(·)}, where · ranges over each word in the lexicon.
Pitman-Yor process and n-gram models | In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts seated over the tables 1 ··· t_hw.
Approach | (a) Word unigrams (b) Word bigrams |
Approach | (a) PoS unigrams (b) PoS bigrams (c) PoS trigrams |
Approach | Word unigrams and bigrams are lower-cased and used in their inflected forms. |
Previous work | The Bayesian Essay Test Scoring sYstem (BETSY) (Rudner and Liang, 2002) uses multinomial or Bernoulli Naive Bayes models to classify texts into different classes (e.g. pass/fail, grades A–F) based on content and style features such as word unigrams and bigrams, sentence length, number of verbs, noun–verb pairs, etc.
Validity tests | (a) word unigrams within a sentence (b) word bigrams within a sentence (c) word trigrams within a sentence |
Conclusions | Our finding that token unigram features are capable of solving the task accurately agrees with the results of previous works on hedge classification ((Light et al., 2004), (Med-
Methods | For trigrams, bigrams, and unigrams (processed separately), we calculated a new class-conditional probability for each feature c, discarding those observations of c in speculative instances where c was not among the two highest ranked candidates.
Results | About half of these were the kind of phrases that had no unigram components of themselves in the feature set, so these could be regarded as meaningful standalone features. |
Results | Our model using just unigram features achieved a BEP(spec) score of 78.68% and an Fβ=1(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and Fβ=1(spec) scores were 5.23% and 4.97%, respectively).
Results | Our experiments revealed that in radiology reports, which mainly concentrate on listing the identified diseases and symptoms (facts) and the physician’s impressions (speculative parts), detecting hedge instances can be performed accurately using unigram features. |
Data and Task | Notice the high lexical overlap between the two sentences (unigram overlap of 100% in one direction and 72% in the other).
Data and Task | 19 is another true paraphrase pair with much lower lexical overlap (unigram overlap of 50% in one direction and 30% in the other).
Experimental Evaluation | (2006), using features calculated directly from s1 and s2 without recourse to any hidden structure: proportion of word unigram matches, proportion of lemmatized unigram matches, BLEU score (Papineni et al., 2001), BLEU score on lemmatized tokens, F measure (Turian et al., 2003), difference of sentence length, and proportion of dependency relation overlap.
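The directional unigram-overlap feature used in these excerpts can be sketched directly (type-level overlap over an invented sentence pair; the papers do not specify whether they count tokens or types, so this is one plausible reading):

```python
def unigram_overlap(s1, s2):
    """Proportion of s1's word types that also appear in s2 (directional)."""
    t1, t2 = set(s1), set(s2)
    return len(t1 & t2) / len(t1) if t1 else 0.0

a = "the senate passed the bill on friday".split()
b = "the bill was passed by the senate".split()
forward = unigram_overlap(a, b)    # overlap of a's words with b
backward = unigram_overlap(b, a)   # overlap of b's words with a
```

Because the measure is directional, the two values generally differ, which is why the excerpts above report overlap "in one direction" and "in the other".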
Experimental Evaluation | This is accomplished by eliminating lines 12 and 13 from the definition of pm and redefining pword to be the unigram word distribution estimated from the Gigaword corpus, as in G0, without the help of WordNet.
QG for Paraphrase Modeling | (15) Here a_w is the Good-Turing unigram probability estimate of a word w from the Gigaword corpus (Graff, 2003).
QG for Paraphrase Modeling | As noted, the distributions pm, the word unigram weights in Eq. |
System Architecture | To derive word features, our system first automatically collects a list of word unigrams and bigrams from the training data.
System Architecture | To avoid overfitting, we only collect the word unigrams and bigrams whose frequency is larger than 2 in the training set. |
System Architecture | This list of word unigrams and bigrams is then used as a unigram dictionary and a bigram dictionary to generate word-based unigram and bigram features.
Experiments | This is because the weights of unigram to trigram features in a log-linear CRF model are balanced as a consequence of maximization.
Experiments | A unigram feature might end up with lower weight because another trigram containing this unigram gets a higher weight. |
Experiments | Then we would have missed this feature if we only used top unigram features. |
Method | Unigram QA Model The QA system uses up to trigram features (Table 1 shows examples of unigram and bigram features). |
Method | We drop this strict constraint (which may need further smoothing) and only use unigram features, not by simply extracting “good” unigram features from the trained model, but by retraining the model with only unigram features. |
Algorithms | When constructing a summary, we update the unigram distribution of the constructed summary so that it includes a smoothed distribution of the previous summaries in order to eliminate redundancy between the successive steps in the chain. |
Algorithms | For example, when we summarize the documents that were retrieved as a result to the first query, we calculate the unigram distribution in the same manner as we did in Focused KLSum; but for the second query, we calculate the unigram distribution as if all the sentences we selected for the previous summary were selected for the current query too, with a damping factor. |
Algorithms | In this variant, the Unigram Distribution estimate of word X is computed as: |
Previous Work | KLSum adopts a language model approach to compute relevance: the documents in the input set are modeled as a distribution over words (the original algorithm uses a unigram distribution over the bag of words in documents D). |
Previous Work | KLSum is a sentence extraction algorithm: it searches for a subset of the sentences in D with a unigram distribution as similar as possible to that of the overall collection D, but with a limited length. |
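The KLSum selection described here is usually implemented greedily: at each step, add the sentence that brings the summary's unigram distribution closest to the document distribution. A compact sketch (greedy search with epsilon smoothing and a toy sentence set, all invented for illustration; the original uses a word-budget constraint over a real document collection):

```python
import math
from collections import Counter

def dist(tokens):
    c = Counter(tokens)
    n = len(tokens)
    return {w: k / n for w, k in c.items()}

def kl(p, q, vocab, eps=1e-9):
    """KL(p || q) with epsilon smoothing for words missing from q."""
    return sum(p.get(w, 0) * math.log((p.get(w, 0) + eps) / (q.get(w, 0) + eps))
               for w in vocab)

def klsum(sentences, max_words):
    """Greedily add the sentence whose inclusion minimizes KL(doc || summary)."""
    doc = dist([w for s in sentences for w in s])
    vocab = set(doc)
    summary, pool = [], list(sentences)
    while pool and sum(map(len, summary)) < max_words:
        best = min(pool, key=lambda s: kl(
            doc, dist([w for t in summary + [s] for w in t]), vocab))
        summary.append(best)
        pool.remove(best)
    return summary

sents = [["stocks", "fell", "sharply"],
         ["stocks", "fell"],
         ["weather", "was", "mild"]]
chosen = klsum(sents, max_words=3)
```

The sentence covering the most document probability mass is picked first, which is the behavior the redundancy-aware variants above then modify by folding previous summaries into the target distribution.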
Previous Work | After the words are classified, the algorithm uses a KLSum variant to find the summary that best matches the unigram distribution of topic specific words. |
Evaluation | The remaining 93 unlabeled games are used to train unigram, bigram, and trigram grounded language models.
Evaluation | Only unigrams, bigrams, and trigrams that are not proper names, appear more than three times, and are not composed only of stop words were used.
Evaluation | with traditional unigram, bigram, and trigram language models generated from a combination of the closed captioning transcripts of all training games and data from the Switchboard corpus (see below).
Linguistic Mapping | In the discussion that follows, we describe a method for estimating unigram grounded language models.
Approach | 0,; is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents and in more than 98% of the documents. |
Approach | models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram).
Evaluation | We derived unigram tfidf vectors for each section in each of 50 randomly sampled policies per category. |
Experiment | The implementation uses unigram features and cosine similarity. |
Experiment | Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010). To more closely match our models, LDA is given access to the same unigram and bigram tokens.
Experiments | Besides unigram and bigram, the most effective textual feature is URL. |
Proposed Features | 3.1.1 Unigrams and Bigrams The most common type of feature for text classification
Proposed Features | feature selection method χ² (Yang and Pedersen, 1997) to select the top 200 unigrams and bigrams as features.
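The χ² selection used here scores each candidate unigram or bigram by its association with the class label via a 2×2 contingency table. A minimal sketch (the contingency counts and terms are invented; real pipelines compute the tables from document frequencies):

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for one term/class table:
    n11 = in-class docs containing the term, n10 = in-class docs without it,
    n01 = out-of-class docs with the term, n00 = out-of-class docs without it."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return num / den if den else 0.0

def top_k_features(tables, k):
    """Rank candidate unigrams/bigrams by chi-square and keep the top k."""
    ranked = sorted(tables, key=lambda t: chi_square(*tables[t]), reverse=True)
    return ranked[:k]

tables = {
    "guarantee": (40, 10, 5, 45),   # skewed toward one class
    "the":       (25, 25, 25, 25),  # independent of the class
}
best = top_k_features(tables, 1)
```

Terms distributed independently of the class score 0 and are dropped; selecting the 200 highest-scoring terms mirrors the setup described above.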
Proposed Features | The top ten unigrams related to deceptive answers are shown on Table 1. |
Experimental results | Note that unigrams in the models are never pruned, hence all models assign probabilities over an identical vocabulary and perplexity is comparable across models. |
Marginal distribution constraints | Thus the unigram distribution is with respect to the bigram model, the bigram model is with respect to the trigram model, and so forth. |
Model constraint algorithm | Thus we process each history length in descending order, finishing with the unigram state. |
Model constraint algorithm | This can be seen particularly clearly at the unigram state, which has an arc for every unigram (the size of the vocabulary): for every bigram state (also on the order of the vocabulary), in the naive algorithm we must look for every possible arc.
Experiments | In particular, we use the unigrams of the current and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, all-number, punctuation, and tag bigrams for the POS, CoNLL 2000, and CoNLL 2003 datasets.
Experiments | For the supertag dataset, we use the same features for the word inputs, and unigrams and bigrams for the gold POS inputs.
Problem formulation | To simplify the discussion, we divide the features into two groups: unigram label features and bigram label features.
Problem formulation | Unigram features are of the form f_k(y_t, x_t); they involve the current label and arbitrary feature patterns from the input sequence.
Extensions of SemPOS | For the purposes of the combination, we compute BLEU only on unigrams up to four-grams (denoted BLEU1, ..., BLEU4) but including the brevity penalty as usual.
Extensions of SemPOS | This is also confirmed by the observation that using BLEU alone is rather unreliable for Czech and BLEU-l (which judges unigrams only) is even worse. |
Problems of BLEU | Fortunately, there are relatively few false positives in n-gram based metrics: 6.3% of unigrams and far fewer higher n-grams. |
Problems of BLEU | This amounts to 34% of running unigrams , giving enough space to differ in human judgments and still remain unscored. |
Approaches | We consider both template unigrams and bigrams, combining two templates in sequence. |
Approaches | Constructing all feature template unigrams and bigrams would yield an unwieldy number of features. |
Experiments | Our primary feature set IGC consists of 127 template unigrams that emphasize coarse properties (i.e., properties 7, 9, and 11 in Table 1). |
Experiments | However, the original unigram Bjorkelund features (Bdeflmemh), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB).
Conclusions and Future Work | We model and harness lexical correlations using translation models, together with unigram language models that characterize reply posts, and formulate a clustering-based EM approach for solution identification.
Introduction | We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively. |
Our Approach | Consider a unigram language model S that models the lexical characteristics of solution posts, and a translation model T that models the lexical correlation between problems and solutions.
Our Approach | Consider the post and reply vocabularies to be of sizes A and B respectively; then, the translation model would have A × B variables, whereas the unigram language model has only B variables.
Predicting Direction of Power | Baseline (Always Superior) 52.54; Baseline (Word Unigrams + Bigrams) 68.56; THRNew 55.90; THRPR 54.30; DIAPR 54.05; THRPR + THRNew 61.49; DIAPR + THRPR + THRNew 62.47; LEX 70.74; LEX + DIAPR + THRPR 67.44; LEX + DIAPR + THRPR + THRNew 68.56; BEST (= LEX + THRNew) 73.03; BEST (using p1 features only) 72.08; BEST (using IMt features only) 72.11; BEST (using Mt only) 71.27; BEST (no indicator variables) 72.44
Predicting Direction of Power | We found the best setting to be using both unigrams and bigrams for all three types of n-grams, by tuning on our dev set.
Predicting Direction of Power | We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%. |
Training method | Table 2: Unigram features. |
Training method | We broadly classify features into two categories: unigram and bigram features. |
Training method | Unigram features: Table 2 shows our unigram features. |
Corpus Details | As our reference algorithm, we used the current state-of-the-art system developed by Boulis and Ostendorf (2005), which uses unigram and bigram features in an SVM framework.
Corpus Details | For each conversation side, a training example was created using unigram and bigram features with tf-idf weighting, as done in standard text classification approaches. |
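The tf-idf-weighted unigram and bigram representation described above can be sketched as follows; the helper names and the raw-tf-times-log-idf weighting are illustrative assumptions, not the exact scheme of Boulis and Ostendorf (2005):

```python
import math
from collections import Counter

def unigrams_bigrams(tokens):
    """Unigram and bigram features for one conversation side."""
    return list(tokens) + ["_".join(b) for b in zip(tokens, tokens[1:])]

def tfidf_vectors(docs):
    """tf-idf-weighted unigram+bigram vectors, one dict per document."""
    bags = [Counter(unigrams_bigrams(d)) for d in docs]
    n = len(docs)
    df = Counter(f for bag in bags for f in bag)  # document frequency
    return [{f: tf * math.log(n / df[f]) for f, tf in bag.items()}
            for bag in bags]

vecs = tfidf_vectors([["good", "movie"], ["bad", "movie"]])
```

A feature occurring in every document (here "movie") gets weight zero, while discriminative features keep positive weight.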
Corpus Details | Also, the named entity "Mike" shows up as a discriminative unigram; this may be due to the self-introductions at the beginning of the conversations and "Mike" being a common male name.
Conditional Random Fields | Using only unigram features {f_{y,x} : (y, x) ∈ Y × X} results in a model equivalent to a simple bag-of-tokens position-by-position logistic regression model.
Conditional Random Fields | The same idea can be used when the set {μ_{y,x_{t+1}} : y ∈ Y} of unigram features is sparse.
Conditional Random Fields | The features used in Nettalk experiments take the form f_{y,w} (unigram) and f_{y',y,w} (bigram), where w is an n-gram of letters.
Experimental results and discussions 6.1 Baseline experiments | Another is that BC utilizes a rich set of features to characterize a given spoken sentence, while LM is constructed solely on the basis of lexical (unigram) information.
Experimental setup 5.1 Data | They are, respectively, the ROUGE-1 (unigram) measure, the ROUGE-2 (bigram) measure and the ROUGE-L (longest common subsequence) measure (Lin, 2004).
Proposed Methods | In the LM approach, each sentence in a document can be simply regarded as a probabilistic generative model consisting of a unigram distribution (the so-called "bag-of-words" assumption) for generating the document (Chen et al., 2009).
Proposed Methods | To mitigate this potential defect, a unigram probability estimated from a general collection, which models the general distribution of words in the target language, is often used to smooth the sentence model. |
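Smoothing a sentence's unigram model with a background (general-collection) distribution is often done by linear interpolation; a minimal sketch, assuming Jelinek-Mercer-style interpolation with weight `lam` (the paper may use a different scheme):

```python
from collections import Counter

def smoothed_sentence_model(sentence, background_counts, lam=0.7):
    """Interpolate a sentence's ML unigram model with a background
    (general-collection) unigram model estimated from counts."""
    counts = Counter(sentence)
    slen = len(sentence)
    btotal = sum(background_counts.values())

    def p(w):
        # lam weights the sentence estimate; (1 - lam) the background.
        return lam * counts[w] / slen + (1 - lam) * background_counts[w] / btotal

    return p

bg = Counter({"the": 50, "cat": 5, "dog": 45})
p = smoothed_sentence_model(["the", "cat"], bg, lam=0.5)
```

Words unseen in the sentence (e.g. "dog") still get nonzero probability from the background model, which is exactly the defect the smoothing mitigates.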
Cue Discovery for Content Selection | . . . , x_m} consists of m unigram features representing the observed vocabulary used in our corpus.
Experimental Results | We use a binary unigram feature space, and we perform 7-fold cross-validation.
Prediction | One challenge of this approach is our underlying unigram feature space: tree-based algorithms are generally poor classifiers for the high-dimensionality, low-information features in a lexical feature space (Han et al., 2001).
Prediction | splits than would unigrams alone. |
Experiments | We use the Lemur toolkit (Ogilvie and Callan, 2001), version 4.11, as the basic retrieval tool, and select its default unigram LM approach based on KL-divergence with Dirichlet-prior smoothing as our basic retrieval approach.
The Language Modeling Approach to IR | The most commonly used language model in IR is the unigram model, in which terms are assumed to be independent of each other. |
The Language Modeling Approach to IR | In the rest of this paper, "language model" will refer to the unigram language model.
The Language Modeling Approach to IR | With the unigram model, the negative KL-divergence between the model θ_q of query q and the model θ_d of document d is calculated as follows:
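Since the equation itself is not reproduced here, the following sketch shows one standard way to score a document by the negative KL-divergence between a maximum-likelihood query model and a Dirichlet-smoothed document model; the function name and the `mu` default are illustrative assumptions:

```python
import math
from collections import Counter

def neg_kl_score(query, doc, collection, mu=2000):
    """Negative KL-divergence between the ML query model and a
    Dirichlet-smoothed document model (a Lemur-style setup)."""
    q, d, c = Counter(query), Counter(doc), Counter(collection)
    clen, dlen, qlen = sum(c.values()), len(doc), len(query)
    score = 0.0
    for t, n in q.items():
        pq = n / qlen                                  # p(t | theta_q)
        pd = (d[t] + mu * c[t] / clen) / (dlen + mu)   # p(t | theta_d)
        score += pq * (math.log(pd) - math.log(pq))    # -KL contribution
    return score
```

Higher (less negative) scores indicate documents whose smoothed model is closer to the query model.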
New Sense Indicators | As such, we compute unigram log probabilities (via smoothed relative frequencies) of each word under consideration in the old domain and the new domain. |
New Sense Indicators | However, we do not simply want to capture unusual words, but words that are unlikely in context, so we also need to look at the respective unigram log probabilities: ℓ^uni_old and ℓ^uni_new.
New Sense Indicators | From these four values, we compute corpus-level (and therefore type-based) statistics of the new-domain n-gram log probability (ℓ^ng_new), the difference between the n-gram probabilities in each domain (ℓ^ng_new − ℓ^ng_old), the difference between the n-gram and unigram probabilities in the new domain (ℓ^ng_new − ℓ^uni_new), and finally the combined difference: (ℓ^ng_new − ℓ^uni_new) − (ℓ^ng_old − ℓ^uni_old).
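Under the reconstruction that ℓ^ng and ℓ^uni denote n-gram and unigram log probabilities in the new and old domains (the original symbols were garbled in extraction), the four statistics can be computed as:

```python
def sense_change_indicators(lp_ng_new, lp_ng_old, lp_uni_new, lp_uni_old):
    """The four type-level statistics: new-domain n-gram log prob,
    cross-domain difference, in-domain context difference, and the
    combined difference. Symbol mapping is a reconstruction."""
    return {
        "ng_new": lp_ng_new,
        "domain_diff": lp_ng_new - lp_ng_old,
        "context_diff_new": lp_ng_new - lp_uni_new,
        "combined": (lp_ng_new - lp_uni_new) - (lp_ng_old - lp_uni_old),
    }

stats = sense_change_indicators(-2.0, -1.0, -3.0, -4.0)
```

A large negative "combined" value flags a word whose context fit dropped in the new domain relative to the old one.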
Introduction | where we explicitly distinguish the unigram feature function φ^u_k and the bigram feature function φ^b_k. Comparing the forms of the two functions, we can see that our discussion of HMMs can be extended to perceptrons by substituting Σ_k w_k φ^u_k(x, y_n) and Σ_k w_k φ^b_k(x, y_{n−1}, y_n) for log p(x_n|y_n) and log p(y_n|y_{n−1}).
Introduction | For unigram features, we compute the maximum, max_y Σ_k w_k φ^u_k(x, y), as a preprocessing step in
Introduction | In POS tagging, we used unigrams of the current word and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, and tag bigrams.
Why Does Unsimplified Data Help? | This is particularly important for unigrams (i.e. |
Why Does Unsimplified Data Help? | Table 3 shows the percentage of unigrams, bigrams and trigrams from the two test sets that are found in the simple and normal training data.
Why Does Unsimplified Data Help? | Even at the unigram level, the normal data contained significantly more of the test set unigrams than the simple data. |
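Such coverage numbers (the percentage of test-set n-grams found in the training data) can be computed as in this sketch, which counts n-gram tokens rather than types (an assumption; either convention is common):

```python
def ngram_coverage(test_tokens, train_tokens, n):
    """Percentage of test-set n-gram tokens that also occur in training."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    train = set(grams(train_tokens))
    test = grams(test_tokens)
    return 100.0 * sum(g in train for g in test) / len(test)
```

Coverage typically drops sharply from unigrams to bigrams to trigrams, which is the effect the passage above discusses.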
Features | We also count the number of unigrams the two transcripts have in common and the length, absolute and relative, of the longest unigram overlap. |
Features | In addition, we look at the number of characters and unigrams and the audio duration of each query, with the intuition that the length of a query may be correlated with its likelihood of being retried (or a retry). |
Prediction task | T-tests between the two categories showed that all edit distance features (character, word, reduced, and phonetic; raw and normalized) are significantly more similar between retry query pairs. Similarly, the number of unigrams the two queries have in common is significantly higher for retries.
Experiments | The DISCRIMINATIVE baseline for this task is a standard maximum entropy discriminative binary classifier over unigrams.
Model | Global Distributions: At the global level, we draw several unigram distributions: a global background distribution θ_B and an attribute distribution θ_a for each attribute a.
Model | Product Level: For the ith product, we draw property unigram distributions θ_{i,1}, . . .
Experiments | We used a simple measure for isolating the syntactic likelihood of a sentence: we take the log-probability under our model and subtract the log-probability under a unigram model, then normalize by the length of the sentence. This measure, which we call the syntactic log-odds ratio (SLR), is a crude way of "subtracting out" the semantic component of the generative probability, so that sentences that use rare words are not penalized for doing so.
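A minimal sketch of the SLR computation, assuming the unigram model is a maximum-likelihood estimate from corpus counts (the exact estimator is not specified in the passage):

```python
import math
from collections import Counter

def unigram_logprob(tokens, counts):
    """Log-probability of a token sequence under an ML unigram model."""
    total = sum(counts.values())
    return sum(math.log(counts[t] / total) for t in tokens)

def slr(model_logprob, tokens, counts):
    """Syntactic log-odds ratio: model log-prob minus unigram log-prob,
    normalized by sentence length."""
    return (model_logprob - unigram_logprob(tokens, counts)) / len(tokens)
```

Because the unigram term is subtracted, a sentence full of rare words is not penalized for their rarity, only for how unlikely its structure makes them.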
Experiments | (2004) also report using a parser probability normalized by the unigram probability (but not length), and did not find it effective. |
Treelet Language Modeling | p(w|P, R, r′, w_1, w_2) to p(w|P, R, r′, w_1) and then to p(w|P, R, r′). From there, we back off to p(w|P, R), where R is the sibling immediately to the right of P, then to a raw PCFG p(w|P), and finally to a unigram distribution.
Structure-based Stacking | • Character unigrams: c_k (i − l ≤ k ≤ i + l) • Character bigrams: c_k c_{k+1} (i − l ≤ k < i + l)
Structure-based Stacking | • Character-label unigrams: c_k^{pd} (i − l_{pd} ≤ k ≤ i + l_{pd})
Structure-based Stacking | • Unigram features: C(s_k) (i − l_0 ≤ k ≤ i + l_0), T_ctb(s_k) (i − l_ctb ≤ k
Computing Feature Expectations | where h(t) is the unigram prefix of bigram t. |
Consensus Decoding Algorithms | where T_1 is the set of unigrams in the language, and δ(e, t) is an indicator function that equals 1 if t appears in e and 0 otherwise.
Consensus Decoding Algorithms | Figure 1: For the linear similarity measure U(e; e′), which computes unigram precision, the MBR translation can be found by iterating either over sentence pairs (Algorithm 1) or over features (Algorithm 2).
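A sketch of Algorithm-1-style MBR decoding with unigram precision as the similarity measure; the candidate list stands in for the n-best list, and the probabilities are assumed normalized:

```python
from collections import Counter

def unigram_precision(e, e_prime):
    """U(e; e'): clipped fraction of tokens of e that also appear in e'."""
    ce, cp = Counter(e), Counter(e_prime)
    return sum(min(n, cp[t]) for t, n in ce.items()) / len(e)

def mbr_decode(candidates, probs):
    """Iterate over sentence pairs: score each candidate by its expected
    unigram precision against the candidate distribution, keep the best."""
    def expected_sim(e):
        return sum(p * unigram_precision(e, ep)
                   for ep, p in zip(candidates, probs))
    return max(candidates, key=expected_sim)

best = mbr_decode([["a", "b"], ["a", "c"], ["a", "b"]], [0.4, 0.2, 0.4])
```

The pairwise loop is quadratic in the number of candidates, which is what iterating over features (Algorithm 2) avoids.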
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection | We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.
Experimental Setup | The mapping of sentence labels to phrase labels was unsupervised: if the phrase came from a sentence labeled (1), and there was a unigram overlap (excluding stop words) between the phrase and any of the original highlights, we marked this phrase with a positive label. |
Experimental Setup | Our feature set comprised surface features such as sentence and paragraph position information, POS tags, unigram and bigram overlap with the title, and whether high-scoring tf.idf words were present in the phrase (66 features in total). |
Experimental Setup | We report unigram overlap (ROUGE-1) as a means of assessing informativeness and the longest common subsequence (ROUGE-L) as a means of assessing fluency.
Experimental Results | As shown in Table 2a, decoding with a single variational n-gram model (VM) as per (14) improves the Viterbi baseline (except the case with a unigram VM), though often not statistically significant. |
Experimental Results | The interpolation between a VM and a word penalty feature (“wp”) improves over the unigram |
Experimental Results | This is necessarily true, but it is interesting to see that most of the improvement is obtained just by moving from a unigram to a bigram model. |
Inference | where n(t) and n(t, t′) are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n′(t) and n′(t, t′) are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^(n) denotes the ascending factorial: a(a + 1) · · · (a + n − 1).
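The ascending factorial used in these collapsed Gibbs sampling counts is straightforward to compute:

```python
def ascending_factorial(a, n):
    """a^(n) = a (a + 1) ... (a + n - 1); by convention a^(0) = 1."""
    result = 1
    for i in range(n):
        result *= a + i
    return result
```

For example, 3^(2) = 3 · 4 = 12, and the empty product gives a^(0) = 1.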
Inference | where n(w) is the unigram count of character w, and n(t′) is the unigram count of tag t′, over all character tokens (including w).
Inference | where n(j, k, t) and n(j, k, t, t′) are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram t and bigram (t, t′), respectively.
Log-Linear Models | The features used in this experiment were unigrams and bigrams of neighboring words, and unigrams, bigrams and trigrams of neighboring POS tags.
Log-Linear Models | For the features, we used unigrams of neighboring chunk tags, substrings (shorter than 10 characters) of the current word, and the shape of the word (e.g., "IL-2" is converted into "AA-#"), on top of the features used in the text chunking experiments.
Log-Linear Models | For the features, we used unigrams and bigrams of neighboring words, prefixes and suffixes of the current word, and some characteristics of the word. |
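Window-based unigram and bigram word features plus prefixes and suffixes of the current word, as described in these experiments, can be sketched as follows; the template names and window size are illustrative assumptions:

```python
def token_features(tokens, i, window=2):
    """String-valued feature templates for position i: word unigrams and
    bigrams in a window, plus prefixes/suffixes of the current word."""
    feats = []
    hi = min(len(tokens), i + window + 1)
    for k in range(max(0, i - window), hi):
        feats.append(f"w[{k - i}]={tokens[k]}")
        if k + 1 < hi:  # bigram of two adjacent in-window words
            feats.append(f"w[{k - i}]w[{k - i + 1}]={tokens[k]}_{tokens[k + 1]}")
    w = tokens[i]
    for n in (1, 2, 3):  # character prefixes/suffixes of the current word
        if len(w) >= n:
            feats.append(f"pre{n}={w[:n]}")
            feats.append(f"suf{n}={w[-n:]}")
    return feats

feats = token_features(["the", "cat", "sat"], 1, window=1)
```

Each template string becomes one binary feature in the log-linear model.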
Experiments | The vocabulary V includes all unigrams after down-casing. |
Experiments | In total, there are 16250 unique unigrams in V. |
Experiments | fication for visualization is that we consider only the top 1000 most frequent unigrams in the RST-DT training set.
Task A: Polarity Classification | We studied the influence of unigrams, bigrams, and a combination of the two, and saw that the best-performing feature set consists of the combination of unigrams and bigrams.
Task A: Polarity Classification | In this paper, we will refer from now on to n-grams as the combination of unigrams and bigrams. |
Task B: Valence Prediction | Those include n-grams (unigrams, bigrams, and the combination of the two) and LIWC scores.
Empirical Evaluation 4.1 Evaluation Setup | In the above experiments, all features (unigram + bigram) are used.
The Co-Training Approach | The English or Chinese features used in this study include both unigrams and bigrams5 and the feature weight is simply set to term frequency6. |
The Co-Training Approach | 5 For Chinese text, a unigram refers to a Chinese word and a bigram refers to two adjacent Chinese words. |
Experiments | As the backbone of our string-to-dependency system, we train 3-gram models for left and right dependencies and a unigram model for heads, using the target side of the bilingual training data.
Introduction | In this way, we hope to upgrade the unigram formulation of existing reordering models to a higher order formulation. |
Related Work | Our TNO model is closely related to the Unigram Orientation Model (UOM) (Tillman, 2004), which is the de facto reordering model of phrase-based SMT (Koehn et al., 2007). |
Multimodal Sentiment Analysis | We use a bag-of-words representation of the video transcriptions of each utterance to derive unigram counts, which are then used as linguistic features. |
Multimodal Sentiment Analysis | The remaining words represent the unigram features, which are then associated with a value corresponding to the frequency of the unigram inside each utterance transcription. |
Multimodal Sentiment Analysis | These simple weighted unigram features have been successfully used in the past to build sentiment classifiers on text, and in conjunction with Support Vector Machines (SVM) have been shown to lead to state-of-the-art performance (Maas et al., 2011). |
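The frequency-valued unigram features described above can be sketched as follows; the stop-word handling and lowercasing are illustrative assumptions about the preprocessing:

```python
from collections import Counter

def unigram_features(transcription, stopwords=frozenset()):
    """Frequency-valued unigram features for one utterance transcription,
    after lowercasing and stop-word removal."""
    tokens = [t for t in transcription.lower().split() if t not in stopwords]
    return dict(Counter(tokens))

feats = unigram_features("The movie was great great", stopwords={"the", "was"})
```

Each remaining word maps to its frequency in the utterance; these sparse vectors are then fed to an SVM, as in the setup described above.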