Index of papers in Proc. ACL 2013 that mention
  • bigram
Börschinger, Benjamin and Johnson, Mark and Demuth, Katherine
Abstract
We find that bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.
Experiments 4.1 The data
Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context.
Experiments 4.1 The data
For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops
Experiments 4.1 The data
Note how the Unigram model always suffers in the LEARN-p condition whereas the Bigram model’s performance is actually best for LEARN-p in the left-right setting.
Introduction
We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms.
The computational model
Our models build on the Unigram and the Bigram model introduced in Goldwater et al.
The computational model
Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the $U_{i,j}$s directly from $L$ rather than from $L_{U_{i,j-1}}$).
The computational model
Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation.
bigram is mentioned in 31 sentences in this paper.
Nuhn, Malte and Ney, Hermann
Abstract
In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.
Definitions
similarly define the bigram count $N_{ff'}$ of $f, f' \in V_f$ as
Definitions
(a) $N_{ff'}$ are integer counts $\geq 0$ of bigrams found in the ciphertext $f_1^N$.
Definitions
(b) Given the first and last token of the cipher, $f_1$ and $f_N$, the bigram counts involving the sentence boundary token $\$$ need to fulfill
Introduction
Section 5 shows the connection between the quadratic assignment problem and decipherment using a bigram language model.
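The bigram counts $N_{ff'}$ defined above can be illustrated with a minimal Python sketch, assuming a ciphertext given as a string of symbols that is padded once on each side with the boundary token "$". The function name `bigram_counts` is hypothetical and does not come from Nuhn and Ney; the sketch only shows how the counts, including those involving the boundary token, arise.

```python
from collections import Counter

def bigram_counts(cipher, boundary="$"):
    """Count adjacent symbol pairs N[(f, f')] in a ciphertext that is
    padded on both sides with a sentence-boundary symbol (an assumption)."""
    padded = [boundary] + list(cipher) + [boundary]
    return Counter(zip(padded, padded[1:]))

counts = bigram_counts("abab")
print(counts[("a", "b")])                      # 2
print(counts[("$", "a")], counts[("b", "$")])  # 1 1
```

In this padded form exactly one bigram starts with the boundary token, $(\$, f_1)$, and exactly one ends with it, $(f_N, \$)$; the paper's full condition (b) is not reproduced here.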
bigram is mentioned in 14 sentences in this paper.
Li, Chen and Qian, Xian and Liu, Yang
Abstract
In this paper, we propose a bigram based supervised method for extractive document summarization in the integer linear programming (ILP) framework.
Abstract
For each bigram, a regression model is used to estimate its frequency in the reference summary.
Abstract
The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary.
Introduction
They used bigrams as such language concepts.
Introduction
Gillick and Favre (Gillick and Favre, 2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function.
Introduction
In this paper, we propose to find a candidate summary such that the language concepts (e.g., bigrams) in this candidate summary and the reference summary can have the same frequency.
Proposed Method 2.1 Bigram Gain Maximization by ILP
We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work.
Proposed Method 2.1 Bigram Gain Maximization by ILP
In addition, we expect that the bigram oriented ILP is consistent with the ROUGE-2 measure widely used for summarization evaluation.
bigram is mentioned in 114 sentences in this paper.
Chong, Tze Yuang and E. Banchs, Rafael and Chng, Eng Siong and Li, Haizhou
Abstract
Evaluated on the WSJ corpus, the perplexities of the bigram and trigram models were reduced by up to 23.5% and 14.0%, respectively.
Abstract
Compared to the distant bigram , we show that word-pairs can be more effectively modeled in terms of both distance and occurrence.
Language Modeling with TD and TO
The prior, which is usually implemented as a unigram model, can also be replaced with a higher-order n-gram model, for instance, the bigram model:
Perplexity Evaluation
As seen from the table, for lower-order n-gram models, the complementary information captured by the TD and TO components reduced the perplexity by up to 23.5% and 14.0% for the bigram and trigram models, respectively.
Perplexity Evaluation
ter modeling of word-pairs compared to the distant bigram model.
Perplexity Evaluation
Here we compare the perplexity of both the distance-k bigram model and the distance-k TD model (for values of k ranging from two to ten), when combined with a standard bigram model.
Related Work
The distant bigram model (Huang et al., 1993; Simon et al., 2007) disassembles the n-gram into $(n-1)$ word-pairs, such that each pair is modeled by a distance-k bigram model, where $1 \le k \le n-1$.
Related Work
Each distance-k bigram model predicts the target-word based on the occurrence of a history-word located k positions behind.
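As a sketch of how an n-gram is disassembled into distance-k word-pairs, the following Python extracts the pairs for a given k and estimates an unsmoothed maximum-likelihood distance-k bigram model. The unsmoothed estimate and the function names are assumptions for illustration; the cited models would normally smooth or interpolate these estimates.

```python
from collections import Counter

def distance_k_pairs(tokens, k):
    """(history word, target word) pairs where the history word sits
    exactly k positions before the target (k = 1 is an ordinary bigram)."""
    return [(tokens[i - k], tokens[i]) for i in range(k, len(tokens))]

def distance_k_mle(corpus, k):
    """Unsmoothed maximum-likelihood estimate of P_k(target | history)."""
    pair_counts, hist_counts = Counter(), Counter()
    for sentence in corpus:
        for h, w in distance_k_pairs(sentence, k):
            pair_counts[(h, w)] += 1
            hist_counts[h] += 1
    return {(h, w): c / hist_counts[h] for (h, w), c in pair_counts.items()}

# A 4-gram (history w1 w2 w3, target w4) yields n - 1 = 3 pairs,
# one per distance k = 1, 2, 3.
ngram = ["w1", "w2", "w3", "w4"]
print([distance_k_pairs(ngram, k)[-1] for k in range(1, len(ngram))])
# [('w3', 'w4'), ('w2', 'w4'), ('w1', 'w4')]
```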
bigram is mentioned in 17 sentences in this paper.
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Experiments & Results 4.1 Experimental Setup
The measures are evaluated by fixing the window size to 4 and the maximum candidate paraphrase length to 2 (e.g., bigram).
Experiments & Results 4.1 Experimental Setup
[Figure legend: unigram, bigram, trigram, quadgram]
Experiments & Results 4.1 Experimental Setup
Type        Node     MRR %   RCL %
Bipartite   unigram  5.2     12.5
Bipartite   bigram   6.8     15.7
Tripartite  unigram  5.9     12.6
Tripartite  bigram   6.9     15.9
Baseline    bigram   3.9     7.7
bigram is mentioned in 6 sentences in this paper.
Wang, Aobo and Kan, Min-Yen
Methodology
In response to these difficulties in differentiating linguistic registers, we compute two different PMI scores for character-based bigrams from two large corpora representing news and microblogs as features.
Methodology
In addition, we also convert all the character-based bigrams into Pinyin-based bigrams (ignoring tones) and compute the Pinyin-level PMI in the same way.
Methodology
These features capture inconsistent use of a bigram across the two domains, which helps to distinguish informal words.
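A minimal sketch of a bigram PMI score, assuming PMI is computed between the two units of each adjacent pair with unsmoothed maximum-likelihood probabilities; the exact estimation used by Wang and Kan is not given in these excerpts. The same function accepts strings (character bigrams) or lists of syllables (for example Pinyin), so it could be run once per corpus to obtain separate news and microblog scores. The function name is hypothetical.

```python
import math
from collections import Counter

def bigram_pmi(corpus):
    """PMI of every adjacent unit pair in a corpus:
    log2( p(u1 u2) / (p(u1) * p(u2)) ), using unsmoothed MLE estimates.
    Each item in `corpus` is a sequence of units: a string of characters,
    or a list of syllables."""
    uni, bi = Counter(), Counter()
    for seq in corpus:
        uni.update(seq)
        bi.update(zip(seq, seq[1:]))
    n_uni, n_bi = sum(uni.values()), sum(bi.values())
    return {
        (a, b): math.log2((c / n_bi) / ((uni[a] / n_uni) * (uni[b] / n_uni)))
        for (a, b), c in bi.items()
    }

pmi = bigram_pmi(["abab"])
print(round(pmi[("a", "b")], 3))  # log2((2/3) / (1/2 * 1/2)) = 1.415
```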
bigram is mentioned in 6 sentences in this paper.
Chen, Ruey-Cheng
Change in Description Length
Suppose that the original sequence $W_{i-1}$ is $N$ words long, the selected word types $x$ and $y$ occur $k$ and $l$ times, respectively, and altogether the bigram $x$-$y$ occurs $m$ times in $W_{i-1}$.
Change in Description Length
In the new sequence $W_i$, each of the $m$ bigrams is replaced with an unseen word $z = xy$.
Regularized Compression
Hence, a new sequence $W_i$ is created in the $i$-th iteration by merging all the occurrences of some selected bigram $(x, y)$ in the original sequence $W_{i-1}$.
Regularized Compression
Note that $f(x, y)$ is the bigram frequency, $|W_{i-1}|$ the sequence length of $W_{i-1}$, and $\Delta H(W_{i-1}, W_i) = H(W_i) - H(W_{i-1})$ is the difference between the empirical Shannon entropies measured on $W_i$ and $W_{i-1}$, using maximum likelihood estimates.
Regularized Compression
In the new sequence $W_i$, each occurrence of the $x$-$y$ bigram is replaced with a new (conceptually unseen) word $z$.
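The merge step and the entropy difference $\Delta H$ described above can be sketched in Python as follows. Representing the new word $z$ as the string concatenation of $x$ and $y$, and merging overlapping occurrences greedily from left to right, are assumptions; the sketch does not check that $z$ is actually unseen in $W_{i-1}$. The function names are hypothetical.

```python
import math
from collections import Counter

def merge_bigram(seq, x, y):
    """Replace every occurrence of the adjacent pair (x, y) with a single new
    token z = x + y, scanning left to right (greedy when x == y)."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == x and seq[i + 1] == y:
            out.append(x + y)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def entropy(seq):
    """Empirical Shannon entropy (bits) of the token distribution, MLE."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())

w_prev = list("abcab")
w_new = merge_bigram(w_prev, "a", "b")
print(w_new)                       # ['ab', 'c', 'ab']
print(len(w_prev) - len(w_new))    # 2, i.e. m
delta_h = entropy(w_new) - entropy(w_prev)  # H(W_i) - H(W_{i-1})
```

When $x \ne y$, occurrences of the pair cannot overlap, so the sequence shrinks by exactly $m$ tokens and $|W_i| = N - m$.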
bigram is mentioned in 5 sentences in this paper.
Maxwell, K. Tamsin and Oberlander, Jon and Croft, W. Bruce
Evaluation framework
The second clique contains query bigrams that match document bigrams in 2-word ordered windows (‘#1’), $\lambda_2 = 0.1$.
Evaluation framework
The third clique uses the same bigrams as clique 2 with an 8-word unordered window (‘#uw8’), $\lambda_3 = 0.05$.
Introduction
Integration of the identified catenae in queries also improves IR effectiveness compared to a highly effective baseline that uses sequential bigrams with no linguistic knowledge.
bigram is mentioned in 5 sentences in this paper.
Zhu, Muhua and Zhang, Yue and Chen, Wenliang and Zhang, Min and Zhu, Jingbo
Baseline parser
bigrams s0ws1w, s0ws1c, s0cs1w, s0cs1c, s0wq0w, s0wq0t, s0cq0w, s0cq0t, q0wq1w, q0wq1t, q0tq1w, q0tq1t, s1wq0w, s1wq0t, s1cq0w, s1cq0t
Semi-supervised Parsing with Large Data
From the dependency trees, we extract bigram lexical dependencies $(w_1, w_2, L/R)$, where the symbol L (R) means that $w_1$ ($w_2$) is the head of $w_2$ ($w_1$).
Semi-supervised Parsing with Large Data
(2009), we assign categories to bigram and trigram items separately according to their frequency counts.
Semi-supervised Parsing with Large Data
Hereafter, we refer to the bigram and trigram lexical dependency lists as BLD and TLD, respectively.
bigram is mentioned in 5 sentences in this paper.
Kim, Joohyun and Mooney, Raymond
Reranking Features
Bigram.
Reranking Features
Indicates whether a given bigram of nonterminals/terminals occurs for a given parent nonterminal: $f(L_1 \rightarrow L_2 : L_3) = 1$.
Reranking Features
Grandparent Bigram.
bigram is mentioned in 4 sentences in this paper.
Kim, Young-Bum and Snyder, Benjamin
Analysis
with a bigram HMM with four language clusters.
Inference
where $n(t)$ and $n(t, t')$ are, respectively, unigram and bigram tag counts excluding those containing character $w$. Conversely, $n'(t)$ and $n'(t, t')$ are, respectively, unigram and bigram tag counts only including those containing character $w$. The notation $a^{[n]}$ denotes the ascending factorial: $a^{[n]} = a(a+1)\cdots(a+n-1)$.
Inference
where $n(j, k, t)$ and $n(j, k, t, t')$ are the numbers of languages currently assigned to cluster $k$ which have more than $j$ occurrences of unigram $(t)$ and bigram $(t, t')$, respectively.
Model
We note that in practice, we implemented a trigram version of the model, but we present the bigram version here for notational clarity.
bigram is mentioned in 4 sentences in this paper.
Hagiwara, Masato and Sekine, Satoshi
Introduction
Knowing that the back-transliterated unigram “blacki” and bigram “blacki shred” are unlikely in English can promote the correct WS, “blackish red”.
Use of Language Model
As the English LM, we used Google Web 1T 5-gram Version 1 (Brants and Franz, 2006), limiting it to unigrams occurring more than 2000 times and bigrams occurring more than 500 times.
Word Segmentation Model
We limit the features to word unigram and bigram features, i.e., $\phi(y) = \sum_i [\phi_1(w_i) + \phi_2(w_{i-1}, w_i)]$ for $y = w_1 \ldots w_n$.
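The feature function $\phi(y)$ can be sketched as a sparse count vector that is scored linearly. The start symbol standing in for $w_0$, the feature-key layout, the function names, and the weights in the usage lines are all hypothetical; the LM-based features mentioned elsewhere in the paper are not modeled.

```python
from collections import Counter

def segmentation_features(words, bos="<s>"):
    """phi(y) = sum_i [phi1(w_i) + phi2(w_{i-1}, w_i)] as sparse counts.
    `bos` is a hypothetical start symbol standing in for w_0."""
    feats = Counter()
    prev = bos
    for w in words:
        feats[("uni", w)] += 1       # phi1(w_i): word unigram
        feats[("bi", prev, w)] += 1  # phi2(w_{i-1}, w_i): word bigram
        prev = w
    return feats

def score(weights, words):
    """Linear score w . phi(y) for one candidate segmentation y."""
    return sum(weights.get(f, 0.0) * v
               for f, v in segmentation_features(words).items())

# Hypothetical weights comparing two candidate segmentations.
weights = {("bi", "blackish", "red"): 1.0, ("uni", "shred"): -0.5}
candidates = [["blackish", "red"], ["blacki", "shred"]]
best = max(candidates, key=lambda y: score(weights, y))
print(best)  # ['blackish', 'red']
```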
bigram is mentioned in 3 sentences in this paper.
Kozareva, Zornitsa
Task A: Polarity Classification
We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best-performing feature set consists of the combination of unigrams and bigrams.
Task A: Polarity Classification
In this paper, we will from now on refer to n-grams as the combination of unigrams and bigrams.
Task B: Valence Prediction
Those include n-grams (unigrams, bigrams and the combination of the two), LIWC scores.
bigram is mentioned in 3 sentences in this paper.
Li, Fangtao and Gao, Yang and Zhou, Shuchang and Si, Xiance and Dai, Decheng
Experiments
Besides unigrams and bigrams, the most effective textual feature is URL.
Proposed Features
3.1.1 Unigrams and Bigrams: The most common type of feature for text classification
Proposed Features
feature selection method χ² (Yang and Pedersen, 1997) to select the top 200 unigrams and bigrams as features.
bigram is mentioned in 3 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Phrase Ranking based on Relevance
This thread of research models bigrams by encoding them into the generative process.
Phrase Ranking based on Relevance
For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution.
Phrase Ranking based on Relevance
In (Tomokiyo and Hurst, 2003), a language model approach is used for bigram phrase extraction.
bigram is mentioned in 3 sentences in this paper.
Almeida, Miguel and Martins, Andre
Compressive Summarization
(2011), we used stemmed word bigrams as concepts, to which we associate the following concept features ($\Phi_{\mathrm{cov}}$): indicators for document counts, features indicating if each of the words in the bigram is a stop-word, the earliest position in a document each concept occurs, as well as two- and three-way conjunctions of these features.
Experiments
We generated oracle extracts by maximizing bigram recall with respect to the manual abstracts, as described in Berg-Kirkpatrick et al.
Extractive Summarization
Previous work has modeled concepts as events (Filatova and Hatzivassiloglou, 2004), salient words (Lin and Bilmes, 2010), and word bigrams (Gillick et al., 2008).
bigram is mentioned in 3 sentences in this paper.
Roark, Brian and Allauzen, Cyril and Riley, Michael
Introduction
For example, if a bigram parameter is modified due to the presence of some set of trigrams, and then some or all of those trigrams are pruned from the model, the bigram associated with the modified parameter will be unlikely to have an overall expected frequency equal to its observed frequency anymore.
Marginal distribution constraints
Thus the unigram distribution is with respect to the bigram model, the bigram model is with respect to the trigram model, and so forth.
Model constraint algorithm
This can be particularly clearly seen at the unigram state, which has an arc for every unigram (the size of the vocabulary): for every bigram state (also order of the vocabulary), in the naive algorithm we must look for every possible arc.
bigram is mentioned in 3 sentences in this paper.
Wang, Lu and Raghavan, Hema and Castelli, Vittorio and Florian, Radu and Cardie, Claire
Results
The results in Table 5 use the official ROUGE software with standard options and report ROUGE-2 (R-2) (measures bigram overlap) and ROUGE-SU4 (R-SU4) (measures unigram and skip-bigram separated by up to four words).
The Framework
unigram/bigram/skip-bigram (at most four words apart) overlap
unigram/bigram TF/TF-IDF similarity
The Framework
2011; Ouyang et al., 2011), we use the ROUGE-2 score, which measures bigram overlap between a sentence and the abstracts, as the objective for regression.
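Since several excerpts above use ROUGE-2 or bigram recall as an objective, here is a simplified sketch of ROUGE-2 recall in Python: clipped bigram matches divided by the number of reference bigrams, summed over references. It omits the options of the official ROUGE software (stemming, stop-word removal, jackknifing), so its scores are not comparable with reported results; the function names are hypothetical.

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, references):
    """Simplified ROUGE-2 recall: reference bigrams matched by the candidate
    (counts clipped) divided by the total number of reference bigrams."""
    cand = bigrams(candidate)
    matched = total = 0
    for ref in references:
        ref_bg = bigrams(ref)
        matched += sum(min(c, cand[b]) for b, c in ref_bg.items())
        total += sum(ref_bg.values())
    return matched / total if total else 0.0

cand = "the cat sat on the mat".split()
ref = "the cat was on the mat".split()
print(rouge2_recall(cand, [ref]))  # 0.6 (3 of 5 reference bigrams matched)
```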
bigram is mentioned in 3 sentences in this paper.
Weller, Marion and Fraser, Alexander and Schulte im Walde, Sabine
Using subcategorization information
The model has access to the basic features stem and tag, as well as the new features based on subcategorization information (explained below), using unigrams within a window of up to four positions to the right and the left of the current position, as well as bigrams and trigrams for stems and tags (current item + left and/or right item).
Using subcategorization information
In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple,
Using subcategorization information
By providing the parts of the tuple as unigrams, bigrams or trigrams to the CRF, all relevant information is available: verb, noun and the probabilities for the potential functions of the noun in the sentence.
bigram is mentioned in 3 sentences in this paper.