Previous Work | They learned unigram language models (LMs) for specific time periods and scored articles against them with log-likelihood ratios.
Previous Work | Kanhabua and Nørvåg (2008; 2009) extended this approach with the same model, but expanded its unigrams with POS tags, collocations, and tf-idf scores.
Previous Work | As above, they learned unigram LMs, but instead measured the KL-divergence between a document and a time period’s LM. |
Timestamp Classifiers | The unigrams w are lowercased tokens. |
Timestamp Classifiers | We refer to this model as the Unigram NLLR.
Timestamp Classifiers | Follow-up work by Kanhabua and Nørvåg (2008) applied two filtering techniques to the unigrams in the model.
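Timestamp Classifiers | A minimal sketch of this scoring scheme, using one common form of the normalized log-likelihood ratio; the helper names and the add-one smoothing choice are ours, not the original implementation:

```python
from collections import Counter
import math

def unigram_lm(tokens, vocab):
    """Add-one-smoothed unigram LM (illustrative smoothing choice)."""
    counts = Counter(tokens)
    total = len(tokens) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}

def nllr(doc_tokens, period_lm, corpus_lm):
    """Normalized log-likelihood ratio of a document against one time
    period's LM, measured relative to the general corpus LM."""
    doc_counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return sum((c / n) * math.log(period_lm[w] / corpus_lm[w])
               for w, c in doc_counts.items()
               if w in period_lm and w in corpus_lm)

# To timestamp a document, score it against every period's LM and
# take the period with the highest NLLR:
# best = max(periods, key=lambda t: nllr(doc, lms[t], corpus_lm))
```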
Abstract | We show how to model the task of inferring which objects are being talked about (and which words refer to which objects) as standard grammatical inference, and describe PCFG-based unigram models and adaptor grammar-based collocation models for the task. |
Introduction | The unigram model we describe below corresponds most closely to the Frank et al. (2009) model.
Introduction | 2.1 Topic models and the unigram PCFG |
Introduction | This leads to our first model, the unigram grammar, which is a PCFG.
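Introduction | As a concrete illustration of casting the problem as grammatical inference, here is a toy unigram PCFG in NLTK: the choice of topic nonterminal conditions a bag of words, and the Viterbi parse reveals the inferred topic. The grammar, topics, and probabilities are invented for the example and are not the grammar from the paper.

```python
import nltk

# Toy unigram grammar: pick a topic, then emit words one at a time
# from that topic's unigram distribution (rules/probabilities invented).
toy = nltk.PCFG.fromstring("""
    S    -> Dog  [0.5]
    S    -> Pig  [0.5]
    Dog  -> DogW      [0.4]
    Dog  -> DogW Dog  [0.6]
    Pig  -> PigW      [0.4]
    Pig  -> PigW Pig  [0.6]
    DogW -> 'dog' [0.7]
    DogW -> 'the' [0.3]
    PigW -> 'pig' [0.7]
    PigW -> 'the' [0.3]
""")

parser = nltk.ViterbiParser(toy)
for tree in parser.parse("the dog".split()):
    print(tree)  # the most probable parse identifies the topic
```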
System Architecture | To derive word features, our system first automatically collects a list of word unigrams and bigrams from the training data.
System Architecture | To avoid overfitting, we only collect the word unigrams and bigrams whose frequency is greater than 2 in the training set.
System Architecture | This list of word unigrams and bigrams is then used as a unigram dictionary and a bigram dictionary to generate word-based unigram and bigram features.
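System Architecture | A minimal sketch of this dictionary construction and feature generation, assuming whitespace-tokenized training sentences; the function and feature names are illustrative:

```python
from collections import Counter

def build_ngram_dicts(train_sents, min_freq=3):
    """Collect word unigrams/bigrams whose training frequency is
    greater than 2, to serve as feature dictionaries."""
    uni, bi = Counter(), Counter()
    for sent in train_sents:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    uni_dict = {w for w, c in uni.items() if c >= min_freq}
    bi_dict = {b for b, c in bi.items() if c >= min_freq}
    return uni_dict, bi_dict

def ngram_features(sent, uni_dict, bi_dict):
    """Binary features: which dictionary n-grams occur in the sentence."""
    feats = {f"uni={w}" for w in sent if w in uni_dict}
    feats |= {f"bi={a}_{b}" for a, b in zip(sent, sent[1:])
              if (a, b) in bi_dict}
    return feats
```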
Experiments | In particular, we use the unigrams of the current word and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, all-number, punctuation, and tag bigrams for the POS, CoNLL 2000, and CoNLL 2003 datasets.
Experiments | For the supertagging dataset, we use the same features for the word inputs, and unigrams and bigrams for the gold POS inputs.
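Experiments | A sketch of such feature templates for a single position; the window size of ±2, the affix lengths, and the feature naming are assumptions, and label-side tag bigrams are left to the sequence model:

```python
def token_features(words, i):
    """Feature templates at position i: word unigrams in a window,
    a word bigram, affixes, and shape cues (window size assumed)."""
    feats = []
    for d in range(-2, 3):                       # unigrams of neighbors
        if 0 <= i + d < len(words):
            feats.append(f"w[{d}]={words[i + d]}")
    if i > 0:                                    # word bigram
        feats.append(f"w[-1:0]={words[i - 1]}_{words[i]}")
    w = words[i]
    feats += [f"pre{k}={w[:k]}" for k in (1, 2, 3) if len(w) >= k]
    feats += [f"suf{k}={w[-k:]}" for k in (1, 2, 3) if len(w) >= k]
    if w[0].isupper():
        feats.append("cap")
    if w.isdigit():
        feats.append("all-number")
    if all(not ch.isalnum() for ch in w):
        feats.append("punctuation")
    return feats
```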
Problem formulation | To simplify the discussion, we divide the features into two groups: unigram label features and bigram label features.
Problem formulation | Unigram features are of the form f_k(y_t, x_t); they are concerned with the current label and arbitrary feature patterns from the input sequence.
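Problem formulation | Writing g_j for the bigram label features and μ_j for their weights (notation ours), the two groups enter the standard linear-chain CRF as:

```latex
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
\exp\Bigl( \sum_{t}\sum_{k} \lambda_k f_k(y_t, x_t)
         + \sum_{t}\sum_{j} \mu_j g_j(y_{t-1}, y_t, x_t) \Bigr)
```

Problem formulation | Here Z(x) is the partition function that normalizes over all label sequences.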
Experiments | We use the Lemur toolkit (Ogilvie and Callan, 2001), version 4.11, as the basic retrieval tool, and select Lemur's default unigram LM approach, based on KL-divergence with Dirichlet-prior smoothing, as our basic retrieval method.
The Language Modeling Approach to IR | The most commonly used language model in IR is the unigram model, in which terms are assumed to be independent of each other. |
The Language Modeling Approach to IR | In the rest of this paper, language model will refer to the unigram language model. |
The Language Modeling Approach to IR | With the unigram model, the negative KL-divergence between the model θ_q of query q and the model θ_d of document d is calculated as follows:
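The Language Modeling Approach to IR | In the standard formulation this is:

```latex
-D(\theta_q \,\|\, \theta_d)
  = \sum_{w \in V} p(w \mid \theta_q) \log p(w \mid \theta_d)
  - \sum_{w \in V} p(w \mid \theta_q) \log p(w \mid \theta_q)
```

The Language Modeling Approach to IR | The second sum depends only on the query, so ranking documents by this score reduces to ranking by the first (cross-entropy) term alone.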
Experiments | We used a simple measure for isolating the syntactic likelihood of a sentence: we take the log-probability under our model and subtract the log-probability under a unigram model, then normalize by the length of the sentence. This measure, which we call the syntactic log-odds ratio (SLR), is a crude way of “subtracting out” the semantic component of the generative probability, so that sentences that use rare words are not penalized for doing so.
Experiments | (2004) also report using a parser probability normalized by the unigram probability (but not length), and did not find it effective. |
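Experiments | A minimal sketch of the SLR computation as defined above; the unsmoothed MLE unigram baseline is an assumption of the sketch:

```python
import math

def unigram_logprob(tokens, unigram_counts, total_count):
    """Log-probability of a sentence under an MLE unigram model
    (smoothing omitted; assumes every token was seen in training)."""
    return sum(math.log(unigram_counts[w] / total_count) for w in tokens)

def slr(tokens, model_logprob, unigram_counts, total_count):
    """Syntactic log-odds ratio: subtract the unigram log-probability
    from the syntactic model's, then normalize by sentence length."""
    base = unigram_logprob(tokens, unigram_counts, total_count)
    return (model_logprob - base) / len(tokens)
```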
Treelet Language Modeling | We back off from p(w|P, R, r', w_1, w_2) to p(w|P, R, r', w_1) and then p(w|P, R, r'). From there, we back off to p(w|P, R), where R is the sibling immediately to the right of P, then to a raw PCFG p(w|P), and finally to a unigram distribution.
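Treelet Language Modeling | A generic sketch of a backoff chain like this one, with contexts ordered richest to coarsest; discounting and interpolation weights are omitted, so this illustrates the control flow rather than the paper's estimator:

```python
def backoff_prob(w, contexts, tables, unigram):
    """Walk the conditioning contexts from richest to coarsest and
    return the first available estimate for w; `tables` maps each
    context tuple to a dict of word probabilities. Fall back to the
    unigram distribution if no context has an estimate."""
    for ctx in contexts:
        probs = tables.get(ctx)
        if probs and w in probs:
            return probs[w]
    return unigram.get(w, 0.0)

# e.g. contexts = [(P, R, rp, w1, w2), (P, R, rp, w1),
#                  (P, R, rp), (P, R), (P,)]
```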
Structure-based Stacking | • Character unigrams: c_k (i − l ≤ k ≤ i + l)
Structure-based Stacking | • Character bigrams: c_k c_{k+1} (i − l ≤ k < i + l)
Structure-based Stacking | • Character label unigrams: c_k^ppd (i − l^ppd ≤ k ≤ i + l^ppd)
Structure-based Stacking | • Unigram features: C(s_k) (i − l_0 ≤ k ≤ i + l_0), T_ctb(s_k) (i − l_1 ≤ k ≤ i + l_1)