Index of papers in Proc. ACL 2011 that mention
  • bigram
Berg-Kirkpatrick, Taylor and Gillick, Dan and Klein, Dan
Joint Model
While past extractive methods have assigned value to individual sentences and then explicitly represented the notion of redundancy (Carbonell and Goldstein, 1998), recent methods show greater success by using a simpler notion of coverage: bigrams
Joint Model
Note that there is intentionally a bigram missing from (a).
Joint Model
contribute content, and redundancy is implicitly encoded in the fact that redundant sentences cover fewer bigrams (Nenkova and Vanderwende, 2005; Gillick and Favre, 2009).
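To make the coverage idea concrete, here is a minimal sketch (not the authors' ILP formulation) of scoring a summary by the distinct bigrams it covers; because each bigram counts only once, a redundant sentence adds little value. The bigram values here are plain occurrence counts, chosen only for illustration.

```python
# Minimal sketch (not the paper's ILP): score a candidate summary by the total
# value of the *distinct* bigrams it covers. A redundant sentence adds few new
# bigrams, so redundancy is penalized without an explicit redundancy term.
from collections import Counter

def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

def coverage_score(summary_sentences, bigram_values):
    covered = set()
    for sent in summary_sentences:
        covered |= bigrams(sent.lower().split())
    return sum(bigram_values.get(b, 0.0) for b in covered)

# Toy source document; bigram values are simple occurrence counts here.
doc = ["the senate passed the bill",
       "the senate passed the bill today",
       "the house rejected the measure"]
values = Counter(b for s in doc for b in bigrams(s.lower().split()))

diverse   = [doc[0], doc[2]]
redundant = [doc[0], doc[1]]
print(coverage_score(diverse, values), coverage_score(redundant, values))  # 12 vs 9
```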
Structured Learning
We use bigram recall as our loss function (see Section 3.3).
Structured Learning
Luckily, our choice of loss function, bigram recall, factors over bigrams.
Structured Learning
We simply modify each bigram value vb to include bigram b's contribution to the total loss.
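Because bigram recall counts the gold bigrams a candidate fails to cover, the loss decomposes over bigrams, so loss-augmented decoding only requires adjusting each bigram's value. A hedged sketch, with my own variable names rather than the paper's:

```python
# Hedged sketch of loss-augmented bigram values: bigram-recall loss counts the
# gold bigrams a candidate summary fails to cover, so it decomposes over
# bigrams and the same decoder can be reused with adjusted values.
def loss_augmented_values(bigram_values, gold_bigrams, normalize=True):
    penalty = 1.0 / max(len(gold_bigrams), 1) if normalize else 1.0
    augmented = dict(bigram_values)
    for b in gold_bigrams:
        # Covering gold bigram b lowers the bigram-recall loss by `penalty`,
        # hence the subtraction (loss-augmented decoding maximizes score + loss).
        augmented[b] = augmented.get(b, 0.0) - penalty
    return augmented
```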
bigram is mentioned in 36 sentences in this paper.
Blunsom, Phil and Cohn, Trevor
Background
This work differs from previous Bayesian models in that we explicitly model a complex backoff path using a hierarchical prior, such that our model jointly infers distributions over tag trigrams, bigrams and unigrams and whole words and their character-level representation.
Experiments
mkcls (Och, 1999) 73.7 65.6
MLE 1HMM-LM (Clark, 2003)* 71.2 65.5
BHMM (GG07) 63.2 56.2
PR (Ganchev et al., 2010)* 62.5 54.8
Trigram PYP-HMM 69.8 62.6
Trigram PYP-1HMM 76.0 68.0
Trigram PYP-1HMM-LM 77.5 69.7
Bigram PYP-HMM 66.9 59.2
Bigram PYP-1HMM 72.9 65.9
Trigram DP-HMM 68.1 60.0
Trigram DP-1HMM 76.0 68.0
Trigram DP-1HMM-LM 76.8 69.8
Experiments
If we restrict the model to bigrams we see a considerable drop in performance.
The PYP-HMM
The trigram transition distribution, Tij, is drawn from a hierarchical PYP prior which backs off to a bigram Bj and then a unigram U distribution,
The PYP-HMM
This allows the modelling of trigram tag sequences, while smoothing these estimates with their corresponding bigram and unigram distributions.
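A minimal sketch of the backoff structure just described, substituting plain interpolated absolute-discount smoothing for the paper's hierarchical PYP inference; the discount d and strength theta are illustrative constants, not inferred hyperparameters.

```python
# Minimal sketch of the trigram -> bigram -> unigram backoff structure, using
# interpolated absolute-discount smoothing in place of hierarchical PYP
# inference; d (discount) and theta (strength) are illustrative.
from collections import defaultdict

class BackoffTagModel:
    def __init__(self, num_tags, d=0.5, theta=1.0):
        self.num_tags, self.d, self.theta = num_tags, d, theta
        self.tri = defaultdict(lambda: defaultdict(int))
        self.bi = defaultdict(lambda: defaultdict(int))
        self.uni = defaultdict(int)

    def observe(self, t1, t2, t3):
        self.tri[(t1, t2)][t3] += 1
        self.bi[t2][t3] += 1
        self.uni[t3] += 1

    def _interp(self, counts, t, backoff):
        # Discounted relative frequency plus reserved mass times the backoff prob.
        n, types = sum(counts.values()), len(counts)
        discounted = max(counts.get(t, 0) - self.d, 0.0) / (n + self.theta)
        return discounted + (self.theta + self.d * types) / (n + self.theta) * backoff

    def prob(self, t1, t2, t3):
        p_uni = self._interp(self.uni, t3, 1.0 / self.num_tags)
        p_bi = self._interp(self.bi[t2], t3, p_uni)
        return self._interp(self.tri[(t1, t2)], t3, p_bi)
```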
The PYP-HMM
We formulate the character-level language model as a bigram model over the character sequence comprising each word,
bigram is mentioned in 13 sentences in this paper.
Bollegala, Danushka and Weir, David and Carroll, John
A Motivating Example
(bigrams) survey+development, development+civilization
Feature Expansion
, wN}, where the elements wi are either unigrams or bigrams that appear in the review d. We then represent a review d by a real-valued term-frequency vector d ∈ R^N, where the value of the j-th element dj is set to the total number of occurrences of the unigram or bigram wj in the review d. To find suitable candidates to expand a vector d for the review d, we define a ranking score score(ui, d) for each base entry in the thesaurus as follows:
Feature Expansion
Moreover, we weight the relatedness scores for each word wj by its normalized term-frequency to emphasize the salient unigrams and bigrams in a review.
Feature Expansion
This is particularly important because we would like to score base entries ui considering all the unigrams and bigrams that appear in a review d, instead of considering each unigram or bigram individually.
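A hedged sketch of the expansion scoring described in these Feature Expansion snippets: each thesaurus base entry is ranked by its relatedness to the review's lexical elements, weighted by their normalized term frequencies. Here `relatedness` and the top-k cutoff are stand-ins, not the paper's exact definitions.

```python
# Hedged sketch: a review is a term-frequency vector over its unigrams/bigrams,
# and each thesaurus base entry u is scored by relatedness to the review's
# elements weighted by normalized term frequency. `relatedness` stands in for
# the paper's sentiment-sensitive relatedness measure.
from collections import Counter

def tf_vector(review_elements):
    return Counter(review_elements)          # lexical element -> count in the review

def score(u, tf, relatedness):
    total = sum(tf.values())
    return sum((count / total) * relatedness(u, w) for w, count in tf.items())

def expand(tf, base_entries, relatedness, k=3):
    ranked = sorted(base_entries, key=lambda u: score(u, tf, relatedness), reverse=True)
    return ranked[:k]                        # top-k candidates appended to the feature vector
```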
Introduction
a unigram or a bigram of word lemma) in a review using a feature vector.
Sentiment Sensitive Thesaurus
We select unigrams and bigrams from each sentence.
Sentiment Sensitive Thesaurus
For the remainder of this paper, we will refer to unigrams and bigrams collectively as lexical elements.
Sentiment Sensitive Thesaurus
Previous work on sentiment classification has shown that both unigrams and bigrams are useful for training a sentiment classifier (Blitzer et al., 2007).
bigram is mentioned in 13 sentences in this paper.
Schütze, Hinrich
Experimental Setup
256,873 unique unigrams and 4,494,222 unique bigrams.
Experimental Setup
We cluster unigrams (i = 1) and bigrams (i = 2).
Experimental Setup
SRILM does not directly support bigram clustering.
Models
The parameters d′, d′′, and d′′′ are the discounts for unigrams, bigrams, and trigrams, respectively, as defined by Chen and Goodman (1996, p. 20, (26)).
Models
bigram) histories that is covered by the clusters.
Models
We cluster bigram histories and unigram histories separately and write pB(w3|w1w2) for the bigram cluster model and pB(w3|w2) for the unigram cluster model.
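A minimal sketch of such history-cluster models, assuming the cluster maps for bigram and unigram histories are given (history -> cluster id, e.g., produced by a clustering tool) and using unsmoothed relative frequencies for brevity; the paper interpolates these with standard n-gram models.

```python
# Minimal sketch: the next word is predicted from the *cluster* of its history,
# so counts are shared across histories in the same cluster.
from collections import defaultdict

class HistoryClusterLM:
    def __init__(self, bigram_clusters, unigram_clusters):
        self.bc, self.uc = bigram_clusters, unigram_clusters   # history -> cluster id
        self.bc_counts = defaultdict(lambda: defaultdict(int))
        self.uc_counts = defaultdict(lambda: defaultdict(int))

    def observe(self, w1, w2, w3):
        self.bc_counts[self.bc[(w1, w2)]][w3] += 1
        self.uc_counts[self.uc[w2]][w3] += 1

    def p_bigram_cluster(self, w3, w1, w2):          # analogue of pB(w3|w1w2)
        c = self.bc_counts[self.bc[(w1, w2)]]
        return c[w3] / sum(c.values()) if c else 0.0

    def p_unigram_cluster(self, w3, w2):             # analogue of pB(w3|w2)
        c = self.uc_counts[self.uc[w2]]
        return c[w3] / sum(c.values()) if c else 0.0
```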
bigram is mentioned in 25 sentences in this paper.
Rush, Alexander M. and Collins, Michael
A Simple Lagrangian Relaxation Algorithm
We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption:
A Simple Lagrangian Relaxation Algorithm
over the original (non-intersected) hypergraph, with leaf-node weights adjusted by the Lagrange multipliers. (3) If the output derivation from step 2 has the same set of bigrams as those from step 1, then we have an exact solution to the problem.
A Simple Lagrangian Relaxation Algorithm
C1 states that each leaf in a derivation has exactly one incoming bigram, and that each leaf not in the derivation has 0 incoming bigrams; C2 states that each leaf in a derivation has exactly one outgoing bigram, and that each leaf not in the derivation has 0 outgoing bigrams.
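A heavily simplified sketch of the relaxation loop implied by these steps; `solve_bigram_lm` and `solve_hypergraph` are stand-ins for the two subproblems, each assumed to return the set of bigrams it used, and the per-bigram multipliers abstract away the paper's actual C1/C2 constraint structure.

```python
# Sketch only: decode each subproblem under the current multipliers, stop with
# a certificate of exactness when they agree on the bigram set, otherwise take
# a subgradient step on the disagreement.
from collections import defaultdict

def lagrangian_relaxation(solve_bigram_lm, solve_hypergraph, iterations=100, step=1.0):
    u = defaultdict(float)                 # one multiplier per bigram (illustrative)
    y2 = set()
    for t in range(iterations):
        y1 = solve_bigram_lm(u)            # step 1: best bigram sequence under u
        y2 = solve_hypergraph(u)           # step 2: best derivation, weights adjusted by u
        if y1 == y2:
            return y2, True                # same bigram set -> exact solution
        rate = step / (t + 1)
        for b in y1 | y2:                  # subgradient on the disagreement
            u[b] -= rate * ((b in y1) - (b in y2))
    return y2, False                       # no certificate within the iteration budget
```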
Background: Hypergraphs
Throughout this paper we make the following assumption when using a bigram language model:
Background: Hypergraphs
Assumption 3.1 (Bigram start/end assumption).
The Full Algorithm
The set P of trigram paths plays an analogous role to the set B of bigrams in our previous algorithm.
bigram is mentioned in 9 sentences in this paper.
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
Abstract
please ^VB and 9 is the number of bigrams in T, excluding sentence-initial and final markers.
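A hedged sketch of such mixed word/tag n-gram features, where each position in the n-gram may surface either as the word itself or as its PoS tag (the "^" prefix is an assumed tag marker, matching the snippets above).

```python
# Sketch: generate every mixed word/tag realization of each n-gram window.
from itertools import product

def mixed_ngrams(tagged_tokens, n=2):
    feats = set()
    for i in range(len(tagged_tokens) - n + 1):
        window = tagged_tokens[i:i + n]
        options = [(word, "^" + tag) for word, tag in window]
        for combo in product(*options):
            feats.add(" ".join(combo))
    return feats

print(sorted(mixed_ngrams([("please", "VB"), ("advise", "VB")])))
# ['^VB ^VB', '^VB advise', 'please ^VB', 'please advise']
```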
Abstract
Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words-based classifier.
bigram is mentioned in 7 sentences in this paper.
Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T.
Automated Approaches to Deceptive Opinion Spam Detection
Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
Automated Approaches to Deceptive Opinion Spam Detection
We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection
We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.
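A minimal sketch of the cumulative n-gram feature sets described above (BIGRAMS+ subsumes UNIGRAMS, TRIGRAMS+ subsumes BIGRAMS+); naive whitespace tokenization is assumed in place of the paper's preprocessing.

```python
# Sketch: build lowercased, unstemmed n-gram features up to the requested order.
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_set(text, order):
    tokens = text.lower().split()
    feats = []
    for n in range(1, order + 1):   # order=1: UNIGRAMS, 2: BIGRAMS+, 3: TRIGRAMS+
        feats.extend(ngrams(tokens, n))
    return feats

print(feature_set("The room was very clean", order=2))
```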
Conclusion and Future Work
Specifically, our findings suggest the importance of considering both the context (e.g., BIGRAMS+) and motivations underlying a deception, rather than strictly adhering to a universal set of deception cues (e.g., LIWC).
Results and Discussion
This suggests that a universal set of keyword-based deception cues (e.g., LIWC) is not the best approach to detecting deception, and a context-sensitive approach (e.g., BIGRAMS+) might be necessary to achieve state-of-the-art deception detection performance.
Results and Discussion
Additional work is required, but these findings further suggest the importance of moving beyond a universal set of deceptive language features (e.g., LIWC) by considering both the contextual (e.g., BIGRAMS+) and motivational parameters underlying a deception as well.
bigram is mentioned in 7 sentences in this paper.
Liu, Jenny and Haghighi, Aria
Experiments
We also kept NPs with only 1 modifier to be used for generating <modifier, head noun> bigram counts at training time.
Experiments
For example, the NP “the beautiful blue Macedonian vase” generates the following bigrams: <beautiful blue>, <blue Macedonian>, and <beautiful Macedonian>, along with the 3-gram <beautiful blue Macedonian>.
Experiments
In addition, we store a table that keeps track of bigram counts for <M, H>, where H is the head noun of an NP and M is the modifier closest to it.
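A hedged sketch of how such counts could be collected from a single NP; the exact bookkeeping (for instance, whether the full modifier sequence is stored for two-modifier NPs) is my assumption, not the paper's.

```python
# Sketch: every ordered pair of modifiers contributes a bigram, the full
# modifier sequence contributes an n-gram, and the modifier closest to the
# head noun contributes a <modifier, head noun> bigram.
from itertools import combinations
from collections import Counter

def np_counts(modifiers, head_noun):
    pair_counts = Counter(combinations(modifiers, 2))    # order-preserving modifier pairs
    seq_counts = Counter([tuple(modifiers)]) if len(modifiers) >= 3 else Counter()
    head_counts = Counter([(modifiers[-1], head_noun)])  # closest modifier + head noun
    return pair_counts, seq_counts, head_counts

pairs, seqs, heads = np_counts(["beautiful", "blue", "Macedonian"], "vase")
# pairs: <beautiful blue>, <beautiful Macedonian>, <blue Macedonian>
# seqs:  the 3-gram <beautiful blue Macedonian>
# heads: <Macedonian, vase>
```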
Related Work
Shaw and Hatzivassiloglou also use a transitivity method to fill out parts of the Count table where bigrams are not actually seen in the training data but their counts can be inferred from other entries in the table; they additionally use a clustering method to group together modifiers with similar positional preferences.
Related Work
Shaw and Hatzivassiloglou report a highest accuracy of 94.93% and a lowest accuracy of 65.93%, but since their methods depend heavily on bigram counts in the training corpus, they are also limited in how informed their decisions can be if modifiers in the test data are not present at training time.
bigram is mentioned in 6 sentences in this paper.
Yannakoudakis, Helen and Briscoe, Ted and Medlock, Ben
Approach
(a) Word unigrams (b) Word bigrams
Approach
(a) PoS unigrams (b) PoS bigrams (c) PoS trigrams
Approach
Word unigrams and bigrams are lower-cased and used in their inflected forms.
Previous work
The Bayesian Essay Test Scoring sYstem (BETSY) (Rudner and Liang, 2002) uses multinomial or Bernoulli Naive Bayes models to classify texts into different classes (e.g., pass/fail, grades A-F) based on content and style features such as word unigrams and bigrams, sentence length, number of verbs, noun-verb pairs, etc.
Validity tests
(a) word unigrams within a sentence (b) word bigrams within a sentence (c) word trigrams within a sentence
bigram is mentioned in 6 sentences in this paper.
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Experiments
Web page hits for word pairs and trigrams are obtained using a simple heuristic query to the search engine Google. Inflected queries are performed by expanding a bigram or trigram into all its morphological forms.
Experiments
Although Google hits are noisier, they have much larger coverage of bigrams and trigrams.
Experiments
This means that if the number of pages indexed by Google doubles, then so do the bigram and trigram frequencies.
Related Work
Keller and Lapata (2003) evaluated the utility of using web search engine statistics for unseen bigrams.
bigram is mentioned in 4 sentences in this paper.
Clifton, Ann and Sarkar, Anoop
Models 2.1 Baseline Models
After CRF-based recovery of the suffix tag sequence, we use a bigram language model trained on a fully segmented version of the training data to recover the original vowels.
Models 2.1 Baseline Models
We used bigrams only, because the suffix vowel harmony alternation depends only upon the preceding phonemes in the word from which it was segmented.
Models 2.1 Baseline Models
original training data: koskevaa mietintöä käsitellään
segmentation: koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n (train bigram language model with mapping A = {a, ä})
map final suffix to abstract tag-set: koske+ +va+ +A mietintö+ +A käsi+ +te+ +llä+ +ä+ +n (train CRF model to predict the final suffix)
peeling of final suffix: koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+ (train SMT model on this transformation of training data)
(a) Training
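A minimal sketch of the vowel-recovery step, assuming the bigram language model operates over morpheme segments and simply picks the candidate surface vowel most frequent after the preceding segment; the training data and candidate set below are toy examples, not the paper's.

```python
# Sketch: resolve an abstract suffix tag (e.g., "A" surfacing as "a" or "ä")
# with a bigram model over segments, conditioning on the preceding segment.
from collections import defaultdict

class SegmentBigramLM:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, segmented_sentences):
        for segments in segmented_sentences:
            for prev, cur in zip(segments, segments[1:]):
                self.counts[prev][cur] += 1

    def resolve(self, prev_segment, candidates):
        # Pick the candidate surface form most often seen after prev_segment.
        return max(candidates, key=lambda c: self.counts[prev_segment][c])

lm = SegmentBigramLM()
lm.train([["koske+", "+va+", "+a"], ["mietintö+", "+ä"]])
print(lm.resolve("+va+", ["+a", "+ä"]))   # -> "+a" (back-vowel harmony after "va")
```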
bigram is mentioned in 3 sentences in this paper.