Abstract | We find that bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
Experiments 4.1 The data | Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
Experiments 4.1 The data | For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops.
Experiments 4.1 The data | Note how the Unigram model always suffers in the LEARN-p condition, whereas the Bigram model’s performance is actually best for LEARN-p in the left-right setting.
Introduction | We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms. |
The computational model | Our models build on the Unigram and the Bigram model introduced in Goldwater et al. |
The computational model | Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j} directly from L rather than from U_{i,j-1}).
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
Abstract | In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard. |
Definitions | similarly define the bigram count N_{ff'} of f, f' ∈ V_f as
Definitions | (a) N_{ff'} are integer counts ≥ 0 of bigrams found in the ciphertext f_1^N.
Definitions | (b) Given the first and last token of the cipher, f_1 and f_N, the bigram counts involving the sentence boundary token $ need to fulfill
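The bigram counts over a ciphertext with sentence-boundary padding can be sketched in a few lines. This is a minimal illustration, not the paper's implementation; the function name `bigram_counts` and the convention of padding both ends with `$` are assumptions.

```python
from collections import Counter

def bigram_counts(cipher, boundary="$"):
    # Count N_ff' over the ciphertext f_1..f_N, padding both ends
    # with the sentence-boundary token $ so boundary bigrams exist.
    padded = [boundary] + list(cipher) + [boundary]
    return Counter(zip(padded, padded[1:]))

counts = bigram_counts(["a", "b", "b", "a"])
```

With this padding, exactly one bigram starts at `$` and exactly one ends at `$`, which is the kind of boundary constraint condition (b) states.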
Introduction | Section 5 shows the connection between the quadratic assignment problem and decipherment using a bigram language model. |
Abstract | In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework. |
Abstract | For each bigram , a regression model is used to estimate its frequency in the reference summary. |
Abstract | The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. |
Introduction | They used bigrams as such language concepts. |
Introduction | Gillick and Favre (Gillick and Favre, 2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function. |
Introduction | In this paper, we propose to find a candidate summary such that the language concepts (e.g., bigrams) in this candidate summary and the reference summary can have the same frequency.
Proposed Method 2.1 Bigram Gain Maximization by ILP | We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work. |
Proposed Method 2.1 Bigram Gain Maximization by ILP | In addition, we expect that the bigram-oriented ILP is consistent with the ROUGE-2 measure widely used for summarization evaluation.
Abstract | Evaluated on the WSJ corpus, bigram and trigram model perplexities were reduced by up to 23.5% and 14.0%, respectively.
Abstract | Compared to the distant bigram , we show that word-pairs can be more effectively modeled in terms of both distance and occurrence. |
Language Modeling with TD and TO | The prior, which is usually implemented as a unigram model, can also be replaced with a higher-order n-gram model, for instance, the bigram model:
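A maximum-likelihood bigram model of the kind that could serve as such a prior can be sketched as follows; `train_bigram_model` is an illustrative name and the sketch omits the smoothing any practical model would need:

```python
from collections import Counter

def train_bigram_model(tokens):
    # Maximum-likelihood bigram probabilities P(w_i | w_{i-1}).
    history = Counter(tokens[:-1])
    pairs = Counter(zip(tokens, tokens[1:]))
    return {(h, w): c / history[h] for (h, w), c in pairs.items()}

probs = train_bigram_model(["the", "cat", "sat", "the", "cat", "ran"])
```

Each entry is the count of the bigram divided by the count of its history word, e.g. P(sat | cat) = 1/2 here.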
Perplexity Evaluation | As seen from the table, for lower-order n-gram models, the complementary information captured by the TD and TO components reduced the perplexity by up to 23.5% and 14.0%, for bigram and trigram models, respectively.
Perplexity Evaluation | better modeling of word-pairs compared to the distant bigram model.
Perplexity Evaluation | Here we compare the perplexity of both the distance-k bigram model and the distance-k TD model (for values of k ranging from two to ten), when combined with a standard bigram model.
Related Work | The distant bigram model (Huang et al., 1993; Simon et al., 2007) disassembles the n-gram into (n-1) word-pairs, such that each pair is modeled by a distance-k bigram model, where 1 ≤ k ≤ n-1.
Related Work | Each distance-k bigram model predicts the target-word based on the occurrence of a history-word located k positions behind. |
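The distance-k prediction just described can be sketched as a small MLE model; `distance_k_bigram` is an illustrative name, and as above the sketch is unsmoothed:

```python
from collections import Counter

def distance_k_bigram(tokens, k):
    # MLE distance-k bigram model: P(w_i | w_{i-k}), the target word
    # conditioned on the history word located k positions behind.
    pairs = Counter(zip(tokens[:-k], tokens[k:]))
    history = Counter(tokens[:-k])
    return {(h, w): c / history[h] for (h, w), c in pairs.items()}

model = distance_k_bigram(["a", "b", "c", "a", "b", "d"], 2)
```

For k = 1 this reduces to the ordinary bigram model; larger k skips over intervening words.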
Experiments & Results 4.1 Experimental Setup | The measures are evaluated by fixing the window size to 4 and the maximum candidate paraphrase length to 2 (i.e., bigrams).
Experiments & Results 4.1 Experimental Setup | [Figure legend: unigram, bigram, trigram, and quadgram curves]
Experiments & Results 4.1 Experimental Setup | Bipartite: unigram MRR 5.2%, RCL 12.5%; bigram MRR 6.8%, RCL 15.7%. Tripartite: unigram MRR 5.9%, RCL 12.6%; bigram MRR 6.9%, RCL 15.9%. Baseline: bigram MRR 3.9%, RCL 7.7%.
Methodology | In response to these difficulties in differentiating linguistic registers, we compute two different PMI scores for character-based bigrams from two large corpora representing news and microblogs as features. |
Methodology | In addition, we also convert all the character-based bigrams into Pinyin-based bigrams (ignoring tones) and compute the Pinyin-level PMI in the same way.
Methodology | These features capture inconsistent use of a bigram across the two domains, which helps to distinguish informal words.
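Per-corpus character-bigram PMI of the kind used here can be sketched as follows; `char_bigram_pmi` is an illustrative name, probabilities are plain MLE estimates, and the difference of the two corpus scores would form the domain-inconsistency feature:

```python
import math
from collections import Counter

def char_bigram_pmi(text):
    # PMI(c1, c2) = log2( p(c1 c2) / (p(c1) p(c2)) ), with all
    # probabilities estimated by maximum likelihood from one corpus.
    uni = Counter(text)
    bi = Counter(zip(text, text[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    pmi = {}
    for (a, b), c in bi.items():
        pmi[(a, b)] = math.log2((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
    return pmi

news_pmi = char_bigram_pmi("abab")
```

Computing the same score on a microblog corpus and subtracting would flag bigrams whose association strength differs between registers.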
Change in Description Length | Suppose that the original sequence W_{i-1} is N words long, the selected word types x and y occur k and l times, respectively, and altogether the x-y bigram occurs m times in W_{i-1}.
Change in Description Length | In the new sequence W_i, each of the m bigrams is replaced with an unseen word z = xy.
Regularized Compression | Hence, a new sequence W_i is created in the i-th iteration by merging all the occurrences of some selected bigram (x, y) in the original sequence W_{i-1}.
Regularized Compression | Note that f(x, y) is the bigram frequency, |W_{i-1}| the sequence length of W_{i-1}, and ΔH(W_{i-1}, m) = H(W_i) - H(W_{i-1}) is the difference between the empirical Shannon entropy measured on W_i and W_{i-1}, using maximum likelihood estimates.
Regularized Compression | In the new sequence W_i, each occurrence of the x-y bigram is replaced with a new (conceptually unseen) word z.
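A single merge step of this kind can be sketched as a left-to-right, non-overlapping replacement; `merge_bigram` is an illustrative name, and the left-to-right tie-breaking for overlapping occurrences is an assumption:

```python
def merge_bigram(seq, x, y, z):
    # One compression step: replace every left-to-right,
    # non-overlapping occurrence of the bigram (x, y) with z.
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == x and seq[i + 1] == y:
            out.append(z)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

merged = merge_bigram(["a", "x", "y", "b", "x", "y"], "x", "y", "xy")
```

If the bigram occurs m times, the new sequence is m tokens shorter, which is exactly the length change the description-length analysis above reasons about.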
Evaluation framework | The second clique contains query bigrams that match document bigrams in 2-word ordered windows (‘#1’), λ_2 = 0.1.
Evaluation framework | The third clique uses the same bigrams as clique 2 with an 8-word unordered window (‘#uw8’), λ_3 = 0.05.
Introduction | Integration of the identified catenae in queries also improves IR effectiveness compared to a highly effective baseline that uses sequential bigrams with no linguistic knowledge. |
Baseline parser | bigrams s0ws1w, s0ws1c, s0cs1w, s0cs1c, s0wq0w, s0wq0t, s0cq0w, s0cq0t, q0wq1w, q0wq1t, q0tq1w, q0tq1t, s1wq0w, s1wq0t, s1cq0w, s1cq0t
Semi-supervised Parsing with Large Data | From the dependency trees, we extract bigram lexical dependencies (w1, w2, L/R), where the symbol L (R) means that w1 (w2) is the head of w2 (w1).
Semi-supervised Parsing with Large Data | (2009), we assign categories to bigram and trigram items separately according to their frequency counts. |
Semi-supervised Parsing with Large Data | Hereafter, we refer to the bigram and trigram lexical dependency lists as BLD and TLD, respectively. |
Reranking Features | Bigram . |
Reranking Features | Indicates whether a given bigram of nonterminals/terminals occurs for a given parent nonterminal: f(L1 → L2 : L3) = 1.
Reranking Features | Grandparent Bigram . |
Analysis | with a bigram HMM with four language clusters. |
Inference | where n(t) and n(t, t') are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n'(t) and n'(t, t') are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^{(n)} denotes the ascending factorial: a(a + 1) ··· (a + n - 1).
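The ascending factorial a(a + 1) ··· (a + n - 1) that appears in these count ratios is simple to compute directly; this is a minimal sketch with an illustrative function name:

```python
def ascending_factorial(a, n):
    # a^(n) = a (a + 1) ... (a + n - 1), with a^(0) = 1 by convention.
    result = 1
    for i in range(n):
        result *= a + i
    return result
```

Note that for a = 1 it coincides with n!, and in sampler implementations it is usually evaluated in log space to avoid overflow.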
Inference | where n(j, k, t) and n(j, k, t, t') are the numbers of languages currently assigned to cluster k which have more than j occurrences of the unigram (t) and the bigram (t, t'), respectively.
Model | We note that in practice, we implemented a trigram version of the model, but we present the bigram version here for notational clarity.
Introduction | Knowing that the back-transliterated unigram “blacki” and bigram “blacki shred” are unlikely in English can promote the correct WS, “blackish red”.
Use of Language Model | As the English LM, we used Google Web 1T 5-gram Version 1 (Brants and Franz, 2006), limiting it to unigrams occurring more than 2000 times and bigrams occurring more than 500 times. |
Word Segmentation Model | We limit the features to word unigram and bigram features, i.e., φ(y) = Σ_{i=1}^{n} [φ_1(w_i) + φ_2(w_{i-1}, w_i)] for y = w_1 ... w_n.
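The unigram-plus-bigram feature map for a segmentation can be sketched as count features; `segmentation_features` and the `("uni", ...)`/`("bi", ...)` key scheme are illustrative choices, not the paper's representation:

```python
from collections import Counter

def segmentation_features(words):
    # phi(y) for a segmentation y = w_1..w_n: counts of word
    # unigrams (phi_1) and adjacent word bigrams (phi_2).
    feats = Counter()
    for i, w in enumerate(words):
        feats[("uni", w)] += 1
        if i > 0:
            feats[("bi", words[i - 1], w)] += 1
    return feats

phi = segmentation_features(["black", "ish", "red"])
```

A linear model would then score a candidate segmentation as the dot product of this sparse vector with a weight vector.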
Task A: Polarity Classification | We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best performing feature set consists of the combination of unigrams and bigrams . |
Task A: Polarity Classification | In this paper, we will refer from now on to n-grams as the combination of unigrams and bigrams . |
Task B: Valence Prediction | Those include n-grams (unigrams, bigrams, and a combination of the two) and LIWC scores.
Experiments | Besides unigrams and bigrams, the most effective textual feature is URL.
Proposed Features | 3.1.1 Unigrams and Bigrams The most common type of feature for text classification
Proposed Features | feature selection method χ² (Yang and Pedersen, 1997) to select the top 200 unigrams and bigrams as features.
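χ² feature selection over n-gram features can be sketched from a 2x2 term/class contingency table; `chi_square` and `top_k` are illustrative names, and the table cell convention is an assumption:

```python
def chi_square(n11, n10, n01, n00):
    # Chi-square score for a 2x2 term/class contingency table:
    # n11 = in-class docs containing the term, n10 = out-of-class
    # docs containing it, n01/n00 likewise for docs without it.
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return num / den if den else 0.0

def top_k(scores, k):
    # Keep the k highest-scoring n-gram features.
    return [t for t, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:k]]
```

Scoring every unigram and bigram this way and keeping the top 200 mirrors the selection step described in the snippet.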
Phrase Ranking based on Relevance | This thread of research models bigrams by encoding them into the generative process. |
Phrase Ranking based on Relevance | For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution. |
Phrase Ranking based on Relevance | In (Tomokiyo and Hurst, 2003), a language model approach is used for bigram phrase extraction. |
Compressive Summarization | (2011), we used stemmed word bigrams as concepts, to which we associate the following concept features (Φ^cov): indicators for document counts, features indicating if each of the words in the bigram is a stop-word, the earliest position in a document each concept occurs, as well as two- and three-way conjunctions of these features.
Experiments | We generated oracle extracts by maximizing bigram recall with respect to the manual abstracts, as described in Berg-Kirkpatrick et al. |
Extractive Summarization | Previous work has modeled concepts as events (Filatova and Hatzivassiloglou, 2004), salient words (Lin and Bilmes, 2010), and word bigrams (Gillick et al., 2008).
Introduction | For example, if a bigram parameter is modified due to the presence of some set of trigrams, and then some or all of those trigrams are pruned from the model, the bigram associated with the modified parameter will be unlikely to have an overall expected frequency equal to its observed frequency anymore. |
Marginal distribution constraints | Thus the unigram distribution is with respect to the bigram model, the bigram model is with respect to the trigram model, and so forth. |
Model constraint algorithm | This can be seen particularly clearly at the unigram state, which has an arc for every unigram (the size of the vocabulary): for every bigram state (also on the order of the vocabulary), in the naive algorithm we must look for every possible arc.
Results | The results in Table 5 use the official ROUGE software with standard options and report ROUGE-2 (R-2) (measures bigram overlap) and ROUGE-SU4 (R-SU4) (measures unigram and skip-bigram separated by up to four words).
The Framework | unigram/bigram/skip-bigram (at most four words apart) overlap; unigram/bigram TF/TF-IDF similarity
The Framework | 2011; Ouyang et al., 2011), we use the ROUGE-2 score, which measures bigram overlap between a sentence and the abstracts, as the objective for regression. |
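A simplified, single-reference version of the ROUGE-2 bigram-overlap score used as the regression target can be sketched as follows; `rouge2_recall` is an illustrative name, and real ROUGE additionally handles stemming, stop-words, and multiple references:

```python
from collections import Counter

def rouge2_recall(candidate, reference):
    # Simplified single-reference ROUGE-2 recall: clipped bigram
    # overlap divided by the number of reference bigrams.
    cand = Counter(zip(candidate, candidate[1:]))
    ref = Counter(zip(reference, reference[1:]))
    overlap = sum(min(c, ref[b]) for b, c in cand.items())
    return overlap / max(sum(ref.values()), 1)

score = rouge2_recall(["the", "cat", "sat"], ["the", "cat", "ran"])
```

Here one of the two reference bigrams is matched, giving recall 0.5.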
Using subcategorization information | The model has access to the basic features stem and tag, as well as the new features based on subcategorization information (explained below), using unigrams within a window of up to four positions to the right and the left of the current position, as well as bigrams and trigrams for stems and tags (current item + left and/or right item).
Using subcategorization information | In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple, |
Using subcategorization information | By providing the parts of the tuple as unigrams, bigrams or trigrams to the CRF, all relevant information is available: verb, noun and the probabilities for the potential functions of the noun in the sentence. |