Joint Model | While past extractive methods have assigned value to individual sentences and then explicitly represented the notion of redundancy (Carbonell and Goldstein, 1998), recent methods show greater success by using a simpler notion of coverage: bigrams |
Joint Model | Note that there is intentionally a bigram missing from (a). |
Joint Model | contribute content, and redundancy is implicitly encoded in the fact that redundant sentences cover fewer bigrams (Nenkova and Vanderwende, 2005; Gillick and Favre, 2009). |
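The coverage idea in these excerpts can be sketched as a greedy selector: each candidate sentence's gain is the weight of the document bigrams it would newly cover, so a redundant sentence earns nothing extra and is simply never picked. This is an illustrative sketch in the spirit of Gillick and Favre (2009), not their exact ILP formulation; the function names and weighting scheme are our own.

```python
def bigrams(sentence):
    """Return the set of word bigrams in a whitespace-tokenized sentence."""
    toks = sentence.split()
    return {(a, b) for a, b in zip(toks, toks[1:])}

def greedy_summary(sentences, weights, budget):
    """Greedily add the sentence contributing the most uncovered bigram
    weight, subject to a word budget. Redundancy needs no explicit
    penalty: a sentence whose bigrams are already covered adds no new
    weight and is never selected."""
    covered, chosen, length = set(), [], 0
    while True:
        best, best_gain = None, 0.0
        for i, s in enumerate(sentences):
            if i in chosen or length + len(s.split()) > budget:
                continue
            gain = sum(weights.get(b, 0.0) for b in bigrams(s) - covered)
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:
            return [sentences[i] for i in chosen]
        chosen.append(best)
        covered |= bigrams(sentences[best])
        length += len(sentences[best].split())
```

With bigram weights taken from document frequencies, a sentence that duplicates an already-chosen one has zero gain and is skipped automatically.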
Structured Learning | We use bigram recall as our loss function (see Section 3.3). |
Structured Learning | Luckily, our choice of loss function, bigram recall, factors over bigrams.
Structured Learning | We simply modify each bigram's value to include bigram b's contribution to the total loss.
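A minimal sketch of how a loss that factors over bigrams can be folded into per-bigram values; the function names and the exact bookkeeping here are ours, not the paper's.

```python
def bigram_recall(pred_bigrams, ref_bigrams):
    """Fraction of reference bigrams recovered by the predicted summary."""
    if not ref_bigrams:
        return 1.0
    return len(pred_bigrams & ref_bigrams) / len(ref_bigrams)

def loss_augmented_values(values, ref_bigrams):
    """Fold each bigram's share of the loss into its model value.

    Because bigram recall decomposes over bigrams, covering one
    reference bigram earns back exactly 1/|ref| of the recall loss,
    so that share can be added directly to the bigram's value.
    """
    bonus = 1.0 / len(ref_bigrams)
    return {b: v + (bonus if b in ref_bigrams else 0.0)
            for b, v in values.items()}
```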
Background | This work differs from previous Bayesian models in that we explicitly model a complex backoff path using a hierarchical prior, such that our model jointly infers distributions over tag trigrams, bigrams, and unigrams, as well as over whole words and their character-level representations.
Experiments | Results (two evaluation metrics per model): mkcls (Och, 1999) 73.7 / 65.6; MLE 1HMM-LM (Clark, 2003)* 71.2 / 65.5; BHMM (GG07) 63.2 / 56.2; PR (Ganchev et al., 2010)* 62.5 / 54.8; Trigram PYP-HMM 69.8 / 62.6; Trigram PYP-1HMM 76.0 / 68.0; Trigram PYP-1HMM-LM 77.5 / 69.7; Bigram PYP-HMM 66.9 / 59.2; Bigram PYP-1HMM 72.9 / 65.9; Trigram DP-HMM 68.1 / 60.0; Trigram DP-1HMM 76.0 / 68.0; Trigram DP-1HMM-LM 76.8 / 69.8
Experiments | If we restrict the model to bigrams we see a considerable drop in performance. |
The PYP-HMM | The trigram transition distribution, Tij, is drawn from a hierarchical PYP prior which backs off to a bigram Bj and then a unigram U distribution, |
The PYP-HMM | This allows the modelling of trigram tag sequences, while smoothing these estimates with their corresponding bigram and unigram distributions. |
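The trigram-to-bigram-to-unigram backoff path can be illustrated with fixed-discount interpolation. Note this is only a crude stand-in for the hierarchical PYP prior described above, which infers its discount and concentration parameters rather than fixing them; all names here are ours.

```python
from collections import Counter

def interpolated_trigram(tags, d=0.5):
    """Trigram tag model that backs off to bigram and unigram
    estimates via absolute discounting, mirroring the backoff
    path T (trigram) -> B (bigram) -> U (unigram)."""
    uni = Counter(tags)
    bi = Counter(zip(tags, tags[1:]))
    tri = Counter(zip(tags, tags[1:], tags[2:]))
    n = len(tags)

    def p_uni(t):
        return uni[t] / n  # MLE at the bottom of the backoff path

    def p_bi(t, prev):
        hist = sum(c for (a, _), c in bi.items() if a == prev)
        if hist == 0:
            return p_uni(t)
        types = sum(1 for (a, _) in bi if a == prev)
        return (max(bi[(prev, t)] - d, 0) + d * types * p_uni(t)) / hist

    def p_tri(t, p2, p1):
        hist = sum(c for k, c in tri.items() if k[:2] == (p2, p1))
        if hist == 0:
            return p_bi(t, p1)
        types = sum(1 for k in tri if k[:2] == (p2, p1))
        return (max(tri[(p2, p1, t)] - d, 0)
                + d * types * p_bi(t, p1)) / hist

    return p_tri
```

The discounted mass at each level is redistributed according to the next-coarser distribution, which is exactly the smoothing role the bigram and unigram distributions play in the excerpt.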
The PYP-HMM | We formulate the character-level language model as a bigram model over the character sequence comprising word w_i,
A Motivating Example | (bigrams) survey+development, development+civilization
Feature Expansion | ..., w_N}, where the elements w_i are either unigrams or bigrams that appear in the review d. We then represent a review d by a real-valued term-frequency vector d ∈ R^N, where the value of the j-th element d_j is set to the total number of occurrences of the unigram or bigram w_j in the review d. To find suitable candidates to expand a vector d for the review d, we define a ranking score score(u_i, d) for each base entry in the thesaurus as follows:
Feature Expansion | Moreover, we weight the relatedness scores for each word wj by its normalized term-frequency to emphasize the salient unigrams and bigrams in a review. |
Feature Expansion | This is particularly important because we would like to score base entries ui considering all the unigrams and bigrams that appear in a review d, instead of considering each unigram or bigram individually. |
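One plausible instantiation of the ranking score described in these excerpts, relatedness of the base entry to each lexical element, weighted by the element's normalized term frequency and summed over the whole review, might look as follows. The exact formula in the paper may differ; the names here are ours.

```python
def score(base_entry, review_tf, relatedness):
    """Rank a thesaurus base entry u_i against a review.

    review_tf maps each unigram/bigram in the review to its term
    frequency; relatedness maps (base_entry, element) pairs to a
    relatedness value. Weighting by normalized term frequency
    emphasizes salient elements, and summing over all elements scores
    the entry against the review as a whole rather than element by
    element."""
    total = sum(review_tf.values())
    if total == 0:
        return 0.0
    return sum((tf / total) * relatedness.get((base_entry, w), 0.0)
               for w, tf in review_tf.items())
```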
Introduction | a unigram or a bigram of word lemmas) in a review using a feature vector.
Sentiment Sensitive Thesaurus | We select unigrams and bigrams from each sentence. |
Sentiment Sensitive Thesaurus | For the remainder of this paper, we will refer to unigrams and bigrams collectively as lexical elements. |
Sentiment Sensitive Thesaurus | Previous work on sentiment classification has shown that both unigrams and bigrams are useful for training a sentiment classifier (Blitzer et al., 2007). |
Experimental Setup | 256,873 unique unigrams and 4,494,222 unique bigrams.
Experimental Setup | We cluster unigrams (i = 1) and bigrams (i = 2).
Experimental Setup | SRILM does not directly support bigram clustering. |
Models | The parameters d′, d′′, and d′′′ are the discounts for unigrams, bigrams, and trigrams, respectively, as defined by Chen and Goodman (1996, p. 20, Eq. (26)).
Models | bigram) histories that is covered by the clusters.
Models | We cluster bigram histories and unigram histories separately and write pB(w3|w1w2) for the bigram cluster model and pB(w3|w2) for the unigram cluster model.
A Simple Lagrangian Relaxation Algorithm | We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption: |
A Simple Lagrangian Relaxation Algorithm | over the original (non-intersected) hypergraph, with leaf-node weights augmented by the Lagrange multipliers and the best-bigram scores from step 1. (3) If the output derivation from step 2 has the same set of bigrams as those from step 1, then we have an exact solution to the problem.
A Simple Lagrangian Relaxation Algorithm | C1 states that each leaf in a derivation has exactly one incoming bigram, and that each leaf not in the derivation has 0 incoming bigrams; C2 states that each leaf in a derivation has exactly one outgoing bigram, and that each leaf not in the derivation has 0 outgoing bigrams.
Background: Hypergraphs | Throughout this paper we make the following assumption when using a bigram language model: |
Background: Hypergraphs | Assumption 3.1 (Bigram start/end assumption).
The Full Algorithm | The set P of trigram paths plays an analogous role to the set B of bigrams in our previous algorithm.
Abstract | To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or Atag): |
Abstract | please AVB, and g is the number of bigrams in T, excluding sentence-initial and sentence-final markers.
Abstract | Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words based classifier.
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection | We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.
Conclusion and Future Work | Specifically, our findings suggest the importance of considering both the context (e.g., BIGRAMS+) and motivations underlying a deception, rather than strictly adhering to a universal set of deception cues (e.g., LIWC).
Results and Discussion | This suggests that a universal set of keyword-based deception cues (e.g., LIWC) is not the best approach to detecting deception, and a context-sensitive approach (e.g., BIGRAMS+) might be necessary to achieve state-of-the-art deception detection performance.
Results and Discussion | Additional work is required, but these findings further suggest the importance of moving beyond a universal set of deceptive language features (e.g., LIWC) by considering both the contextual (e.g., BIGRAMS+) and motivational parameters underlying a deception as well.
Experiments | We also kept NPs with only 1 modifier to be used for generating <modifier, head noun> bigram counts at training time.
Experiments | For example, the NP “the beautiful blue Macedonian vase” generates the following bigrams: <beautiful blue>, <blue Macedonian>, and <beautiful Macedonian>, along with the 3-gram <beautiful blue Macedonian>.
Experiments | In addition, we also store a table that keeps track of bigram counts for < M, H >, where H is the head noun of an NP and M is the modifier closest to it. |
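The bigram bookkeeping described above might be sketched as follows. The helper names are ours; the pair generation simply mirrors the worked example for "the beautiful blue Macedonian vase", where every ordered modifier pair (not only adjacent ones) is emitted.

```python
from itertools import combinations
from collections import Counter

def modifier_ngrams(modifiers):
    """From an NP's ordered modifier list, emit every ordered modifier
    pair plus the full modifier sequence, e.g. for
    ["beautiful", "blue", "Macedonian"]: <beautiful blue>,
    <beautiful Macedonian>, <blue Macedonian>, and the 3-gram."""
    pairs = list(combinations(modifiers, 2))
    return pairs, tuple(modifiers)

def closest_modifier_bigrams(nps):
    """Count <M, H> pairs, where H is an NP's head noun and M is the
    modifier closest to it (the last one before the head)."""
    counts = Counter()
    for modifiers, head in nps:
        if modifiers:
            counts[(modifiers[-1], head)] += 1
    return counts
```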
Related Work | Shaw and Hatzivassiloglou also use a transitivity method to fill out parts of the Count table where bigrams are not actually seen in the training data but their counts can be inferred from other entries in the table, and they use a clustering method to group together modifiers with similar positional preferences. |
Related Work | Shaw and Hatzivassiloglou report a highest accuracy of 94.93% and a lowest accuracy of 65.93%, but since their methods depend heavily on bigram counts in the training corpus, they are also limited in how informed their decisions can be if modifiers in the test data are not present at training time. |
Approach | (a) Word unigrams (b) Word bigrams |
Approach | (a) PoS unigrams (b) PoS bigrams (c) PoS trigrams |
Approach | Word unigrams and bigrams are lower-cased and used in their inflected forms. |
Previous work | The Bayesian Essay Test Scoring sYstem (BETSY) (Rudner and Liang, 2002) uses multinomial or Bernoulli Naive Bayes models to classify texts into different classes (e.g. pass/fail, grades A–F) based on content and style features such as word unigrams and bigrams, sentence length, number of verbs, noun–verb pairs etc.
Validity tests | (a) word unigrams within a sentence (b) word bigrams within a sentence (c) word trigrams within a sentence |
Experiments | Web page hits for word pairs and trigrams are obtained using a simple heuristic query to the search engine Google. Inflected queries are performed by expanding a bigram or trigram into all its morphological forms.
Experiments | Although Google hit counts are noisier, they have much larger coverage of bigrams and trigrams.
Experiments | This means that if the number of pages indexed by Google doubles, then so do the bigram and trigram frequencies.
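The inflected-query expansion can be sketched as a product over per-lemma surface forms, summing hit counts across all variants. The `hits` callable here stands in for the search-engine page-count query and is hypothetical, as is the inflection table.

```python
from itertools import product

def inflected_hits(ngram, inflections, hits):
    """Sum web hit counts over all morphological variants of an n-gram.

    `inflections` maps a lemma to its surface forms (lemmas absent from
    the table are used as-is); `hits` is a stand-in for querying a
    search engine for a phrase's page count."""
    total = 0
    for combo in product(*(inflections.get(w, [w]) for w in ngram)):
        total += hits(" ".join(combo))
    return total
```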
Related Work | Keller and Lapata (2003) evaluated the utility of using web search engine statistics for unseen bigrams.
Models 2.1 Baseline Models | After CRF-based recovery of the suffix tag sequence, we use a bigram language model trained on a fully segmented version of the training data to recover the original vowels.
Models 2.1 Baseline Models | We used bigrams only, because the suffix vowel harmony alternation depends only upon the preceding phonemes in the word from which it was segmented. |
Models 2.1 Baseline Models | original training data: koskevaa mietintöä käsitellään
segmentation: koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n (train bigram language model with mapping A = {a, ä})
map final suffix to abstract tag-set: koske+ +va+ +A mietintö+ +A käsi+ +te+ +llä+ +ä+ +n (train CRF model to predict the final suffix)
peeling off final suffix: koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+ (train SMT model on this transformation of training data)
(a) Training
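Since the suffix vowel alternation is governed by harmony with the preceding phonemes, even a rule-based toy can illustrate recovering the surface vowel for the abstract tag A. This is only an illustrative stand-in for the paper's bigram language model, and the handling of neutral vowels is simplified.

```python
BACK, FRONT = set("aou"), set("äöy")

def realize_suffix_vowel(stem):
    """Pick the surface vowel (a vs. ä) for an abstract suffix vowel A
    by Finnish vowel harmony: the last back or front vowel of the
    preceding material decides, skipping neutral vowels (i, e)."""
    for ch in reversed(stem.lower()):
        if ch in BACK:
            return "a"
        if ch in FRONT:
            return "ä"
    return "ä"  # stems with only neutral vowels take front harmony
```

For the training example above, a back-vowel stem like "koskeva" selects "a" (koskevaa), while a front-vowel stem like "mietintö" selects "ä" (mietintöä).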