Index of papers in Proc. ACL 2014 that mention
  • bigram
Bollegala, Danushka and Weir, David and Carroll, John
Distribution Prediction
For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
Distribution Prediction
Next, we generate bigrams of word lemmas and remove any bigrams that consist only of stop words.
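A minimal sketch of this feature extraction step (not the authors' code; the stop-word list and the joining convention are illustrative assumptions):

    STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is"}

    def cooccurrence_features(lemmas, target):
        """Unigram and lemma-bigram features that co-occur with `target` in a sentence."""
        features = [l for l in lemmas if l != target]     # unigram co-occurrences
        for left, right in zip(lemmas, lemmas[1:]):        # adjacent lemma bigrams
            if left in STOP_WORDS and right in STOP_WORDS:
                continue                                   # drop bigrams made only of stop words
            features.append(left + "_" + right)
        return features

    print(cooccurrence_features(["the", "dog", "run", "in", "the", "park"], "dog"))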
Distribution Prediction
Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks.
bigram is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Bhat, Suma and Xue, Huichao and Yoon, Su-Youn
Models for Measuring Grammatical Competence
Then, regarding POS bigrams as terms, they construct POS-based vector space models for each score-class (there are four score classes denoting levels of proficiency as will be explained in Section 5.2), thus yielding four score-specific vector-space models (VSMs).
Models for Measuring Grammatical Competence
cos4: the cosine similarity score between the test response and the vector of POS bigrams for the highest score class (level 4); and,
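As a rough illustration of how such a feature could be computed (a sketch under assumed count-based weighting, not the system's actual implementation):

    import math
    from collections import Counter

    def pos_bigrams(tags):
        return [a + "_" + b for a, b in zip(tags, tags[1:])]

    def cosine(u, v):
        dot = sum(u[t] * v[t] for t in set(u) & set(v))
        norm_u = math.sqrt(sum(x * x for x in u.values()))
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    # Toy vectors: a test response vs. an aggregate vector for the highest score class.
    response = Counter(pos_bigrams(["PRP", "VBP", "DT", "JJ", "NN"]))
    level4 = Counter({"PRP_VBP": 120, "VBP_DT": 80, "DT_JJ": 60, "JJ_NN": 150, "NN_IN": 90})
    print(cosine(response, level4))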
Models for Measuring Grammatical Competence
First, the VSM-based method is likely to overestimate the contribution of the POS bigrams when highly correlated bigrams occur as terms in the VSM.
Related Work
In order to avoid the problems encountered with deep analysis-based measures, Yoon and Bhat (2012) explored a shallow analysis-based approach, based on the assumption that the level of grammar sophistication at each proficiency level is reflected in the distribution of part-of-speech (POS) tag bigrams.
Shallow-analysis approach to measuring syntactic complexity
The measures of syntactic complexity in this approach are POS bigrams and are not obtained by a deep analysis (syntactic parsing) of the structure of the sentence.
Shallow-analysis approach to measuring syntactic complexity
In a shallow-analysis approach to measuring syntactic complexity, we rely on the distribution of POS bigrams at every proficiency level.
Shallow-analysis approach to measuring syntactic complexity
Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced).
bigram is mentioned in 22 sentences in this paper.
Topics mentioned in this paper:
Pei, Wenzhe and Ge, Tao and Chang, Baobao
Experiment
Model                         PKU    MSRA
Best05 (Chen et al., 2005)    95.0   96.0
Best05 (Tseng et al., 2005)   95.0   96.4
(Zhang et al., 2006)          95.1   97.1
(Zhang and Clark, 2007)       94.5   97.2
(Sun et al., 2009)            95.2   97.3
(Sun et al., 2012)            95.4   97.4
(Zhang et al., 2013)          96.1   97.4
MMTNN                         94.0   94.9
MMTNN + bigram                95.2   97.2
Experiment
A very common feature in Chinese word segmentation is the character bigram feature.
Experiment
Formally, at the i-th character of a sentence c[1:n], the bigram features are c_k c_{k+1} (i - 3 < k < i + 2).
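A small sketch of that feature window (padding symbols and index handling are illustrative assumptions):

    def character_bigram_features(chars, i, pad="#"):
        # Pad two characters on each side so the window is defined at the boundaries.
        padded = [pad, pad] + list(chars) + [pad, pad]
        j = i + 2  # position of character i in the padded sequence
        # Bigrams c_k c_{k+1} for i-3 < k < i+2, i.e. k in {i-2, i-1, i, i+1}.
        return [padded[k] + padded[k + 1] for k in range(j - 2, j + 2)]

    print(character_bigram_features("自然语言处理", 2))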
Introduction
Therefore, we integrate additional simple character bigram features into our model and the result shows that our model can achieve a competitive performance that other systems hardly achieve unless they use more complex task-specific features.
Related Work
Most previous systems address this task by using linear statistical models with carefully designed features such as bigram features, punctuation information (Li and Sun, 2009) and statistical information (Sun and Xu, 2011).
bigram is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Thadani, Kapil
Multi-Structure Sentence Compression
Following this, §2.3 discusses a dynamic program to find maximum weight bigram subsequences from the input sentence, while §2.4 covers LP relaxation-based approaches for approximating solutions to the problem of finding a maximum-weight subtree in a graph of potential output dependencies.
Multi-Structure Sentence Compression
In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows.
Multi-Structure Sentence Compression
where θ_tok, θ_ngr and θ_dep are feature-based scoring functions for tokens, bigrams and dependencies respectively.
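The factored objective implied by these two excerpts can be sketched as follows (a reconstruction from the surrounding text, with assumed token indicator variables x_i; not copied from the paper):

    \[
    \mathrm{score}(C) \;=\; \sum_i x_i\,\theta_{\mathrm{tok}}(t_i)
    \;+\; \sum_{i,j} y_{ij}\,\theta_{\mathrm{ngr}}(t_i, t_j)
    \;+\; \sum_{i,j} z_{ij}\,\theta_{\mathrm{dep}}(t_i \rightarrow t_j)
    \]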
bigram is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with performance of LP (Eq.
Evaluation
Table 4 presents the results of these variations; overall, by taking into account generated candidates appropriately and using bigrams (“SLP 2-gram”), we obtained a 1.13 BLEU gain on the test set.
Evaluation
Using unigrams (“SLP 1-gram”) actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams.
Generation & Propagation
Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings.
Generation & Propagation
We only consider target phrases whose source phrase is a bigram, but it is worth noting that the target phrases are of variable length.
Generation & Propagation
To generate new translation candidates using the baseline system, we decode each unlabeled source bigram to generate its m-best translations.
bigram is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Gormley, Matthew R. and Mitchell, Margaret and Van Durme, Benjamin and Dredze, Mark
Approaches
The clusters are formed by a greedy hierarchical clustering algorithm that finds an assignment of words to classes by maximizing the likelihood of the training data under a latent-class bigram model.
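For reference, a standard latent-class bigram factorization consistent with this description (Brown-clustering style; the notation C(w) for the class of word w is ours, not necessarily the paper's):

    \[
    P(w_1, \dots, w_n) \;=\; \prod_{i=1}^{n} p\bigl(C(w_i) \mid C(w_{i-1})\bigr)\, p\bigl(w_i \mid C(w_i)\bigr)
    \]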
Approaches
First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate’s POS tag and an argument’s word) are important for state-of-the-art performance (Zhao et al., 2009; Bjorkelund et al., 2009).
Approaches
We consider both template unigrams and bigrams, combining two templates in sequence.
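A toy sketch of the distinction (the template names and the conjunction format are illustrative assumptions, not the paper's feature set):

    def template_unigrams(templates, instance):
        return [name + "=" + extract(instance) for name, extract in templates]

    def template_bigrams(templates, instance):
        # A template bigram conjoins the values of two templates in sequence.
        return [n1 + "=" + f1(instance) + "|" + n2 + "=" + f2(instance)
                for (n1, f1), (n2, f2) in zip(templates, templates[1:])]

    templates = [("pred.pos", lambda x: x["pred_pos"]),
                 ("arg.word", lambda x: x["arg_word"]),
                 ("path", lambda x: x["path"])]
    instance = {"pred_pos": "VBD", "arg_word": "board", "path": "NP^S_VP"}
    print(template_unigrams(templates, instance))
    print(template_bigrams(templates, instance))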
Experiments
Each of 1G0 and 1GB also includes 32 template bigrams selected by information gain on 1000 sentences; we select a different set of template bigrams for each dataset.
Experiments
However, the original unigram Bjorkelund features (Bdeflmemh), which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (1GB).
Experiments
In Czech, we disallowed template bigrams involving path-grams.
bigram is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen
Evaluation
To our surprise, the Fixed Affix model does a slightly better job of reducing out-of-vocabulary words than the Bigram Affix model.
Evaluation
[Table fragment: affix-model evaluation scores, including WoTr 24.21 and a truncated TRR value for the Bigram Affix Model.]
Morphology-based Vocabulary Expansion
We use two different models of morphology expansion in this paper: Fixed Affix model and Bigram Affix model.
Morphology-based Vocabulary Expansion
3.2.2 Bigram Affix Expansion Model
Morphology-based Vocabulary Expansion
In the Bigram Affix model, we do the same for the stem as in the Fixed Affix model, but for prefixes and suffixes, we create a bigram language model in the finite state machine.
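A minimal sketch of the underlying idea, assuming the bigram language model is estimated from observed affix sequences (this is not the authors' finite-state implementation, and the affixes below are made up):

    from collections import defaultdict

    def affix_bigram_model(affix_sequences):
        """Estimate P(next affix | previous affix) from observed affix sequences."""
        counts = defaultdict(lambda: defaultdict(int))
        for seq in affix_sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            for prev, cur in zip(padded, padded[1:]):
                counts[prev][cur] += 1
        return {prev: {cur: c / sum(nxt.values()) for cur, c in nxt.items()}
                for prev, nxt in counts.items()}

    suffix_sequences = [["ha", "ye"], ["ha"], ["ye", "am"], ["ha", "ye", "am"]]
    model = affix_bigram_model(suffix_sequences)
    print(model["ha"])  # distribution over suffixes that follow "ha"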
bigram is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Experiments
The baselines NB and BINB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features.
Experiments
SVM is a support vector machine with unigram and bigram features.
Experiments
[Table fragment: classifier comparison listing a unigram, bigram and trigram feature set (92.6) and a MAXENT classifier with POS, chunk, NE and supertag features.]
Introduction
On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al.
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Wang, Xiaolin and Utiyama, Masao and Finch, Andrew and Sumita, Eiichiro
Complexity Analysis
For the monolingual bigram model, the number of states in the HMM is U times that of the monolingual unigram model, as the states at a specific position of F are related not only to the length of the current word, but also to the length of the word before it.
Complexity Analysis
[Table fragment: accuracy and training-time comparison, including NPY(bigram) 0.750 / 0.802 / 17 m, NPY(trigram) 0.757 / 0.807, HDP(bigram) 0.723 / 10 h, and a Fitness baseline (0.667); the row for the proposed method is truncated in extraction.]
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
We learn embeddings for unigrams, bigrams and trigrams separately with the same neural network and the same parameter setting.
Related Work
25$ employs the embedding of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation of ac on the sequence represented by columns in each lookup table.
Related Work
L_uni, L_bi and L_tri are the lookup tables of the unigram, bigram and trigram embeddings, respectively.
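A toy sketch of maintaining separate lookup tables for the three n-gram orders (dimensions, vocabularies and random initialization are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def lookup_table(vocab, dim=50):
        return {term: rng.normal(size=dim) for term in vocab}

    L_uni = lookup_table({"not", "good", "movie"})
    L_bi = lookup_table({"not_good", "good_movie"})
    L_tri = lookup_table({"not_good_movie"})

    tokens = ["not", "good", "movie"]
    uni_vecs = [L_uni[t] for t in tokens]
    bi_vecs = [L_bi["_".join(p)] for p in zip(tokens, tokens[1:]) if "_".join(p) in L_bi]
    tri_vecs = [L_tri["_".join(t)] for t in zip(tokens, tokens[1:], tokens[2:]) if "_".join(t) in L_tri]
    print(len(uni_vecs), len(bi_vecs), len(tri_vecs))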
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Tan, Chenhao and Lee, Lillian and Pang, Bo
Introduction
twitter unigram    TTT *   YES (54%)
twitter bigram     TTT *   YES (52%)
personal unigram   MT *    YES (52%)
personal bigram    —       NO (48%)
Introduction
We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w∈T} log p(w), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model) [16]. We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
Introduction
[16] The tokens [at], [hashtag], [url] were ignored in the unigram-model case to prevent their undue influence, but retained in the bigram model to capture longer-range usage (“combination”) patterns.
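A small sketch of that scoring rule, i.e. the average log-probability of a tweet's terms under a reference model (the probabilities below are toy stand-ins for the Twitter-community and personal models):

    import math

    def avg_logprob(terms, lm, unk_prob=1e-8):
        """Average log-probability of the terms under a (unigram or bigram) model."""
        return sum(math.log(lm.get(t, unk_prob)) for t in terms) / len(terms)

    community_lm = {"good": 0.01, "morning": 0.005, "everyone": 0.002}
    tweet = ["good", "morning", "everyone"]
    print(avg_logprob(tweet, community_lm))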
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hui and Chiang, David
Count distributions
For example, suppose that our data consists of the following bigrams, with their weights:
Word Alignment
That is, during the E step, we calculate the distribution of C(e, f) for each e and f, and during the M step, we train a language model on bigrams e f using expected KN smoothing (that is, with u = e and w = f).
Word Alignment
(The latter case is equivalent to a backoff language model, where, since all bigrams are known, the lower-order model is never used.)
Word Alignment
This is much less of a problem in KN smoothing, where p’ is estimated from bigram types rather than bigram tokens.
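For reference, the Kneser–Ney lower-order distribution p′ alluded to here is the textbook continuation probability, estimated from bigram types rather than tokens (generic notation, not necessarily the paper's):

    \[
    p'(w) \;=\; \frac{\left|\{\, u : c(u, w) > 0 \,\}\right|}{\left|\{\, (u', w') : c(u', w') > 0 \,\}\right|}
    \]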
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Wang, Zhiguo and Xue, Nianwen
Joint POS Tagging and Parsing with Nonlocal Features
[Table fragment: inventory of nonlocal feature templates following Collins and Koo (2005) and Charniak and Johnson (2005), including Rules, CoPar, HeadTree, Bigrams, CoLenPar, Grandparent Bigrams, Heavy, Lexical Bigrams, and Neighbours.]
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Ramanath, Rohan and Liu, Fei and Sadeh, Norman and Smith, Noah A.
Approach
In our formulation, each hidden state corresponds to an issue or topic, characterized by a distribution over words and bigrams appearing in privacy policy sections addressing that issue.
Approach
Each observation is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents or in more than 98% of the documents.
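A sketch of that vocabulary filter (treating the 5% and 98% cut-offs as inclusive is an assumption, and the toy documents are made up):

    from collections import Counter

    def build_term_vocab(documents, low=0.05, high=0.98):
        """Keep unigrams and bigrams whose document frequency lies between low and high."""
        doc_freq = Counter()
        for tokens in documents:
            terms = set(tokens) | {a + "_" + b for a, b in zip(tokens, tokens[1:])}
            doc_freq.update(terms)
        n = len(documents)
        return {t for t, df in doc_freq.items() if low * n <= df <= high * n}

    docs = [["we", "collect", "your", "data"],
            ["we", "share", "your", "data"],
            ["contact", "us", "anytime"]]
    print(sorted(build_term_vocab(docs)))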
Approach
models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram).
Experiment
Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010).7 To more closely match our models, LDA is given access to the same unigram and bigram tokens.
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Prabhakaran, Vinodkumar and Rambow, Owen
Predicting Direction of Power
Feature set                           Accuracy (%)
Baseline (Always Superior)            52.54
Baseline (Word Unigrams + Bigrams)    68.56
THRNew                                55.90
THRPR                                 54.30
DIAPR                                 54.05
THRPR + THRNew                        61.49
DIAPR + THRPR + THRNew                62.47
LEX                                   70.74
LEX + DIAPR + THRPR                   67.44
LEX + DIAPR + THRPR + THRNew          68.56
BEST (= LEX + THRNew)                 73.03
BEST (Using p1 features only)         72.08
BEST (Using IMt features only)        72.11
BEST (Using Mt only)                  71.27
BEST (No Indicator Variables)         72.44
Predicting Direction of Power
We found the best setting to be using both unigrams and bigrams for all three types of n-grams, by tuning on our dev set.
Predicting Direction of Power
We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%.
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Kobayashi, Hayato
Experiments
In the case of bigrams, the perplexities of Theory2 are almost the same as those of Zipf2 when the size of the reduced vocabulary is large.
Perplexity on Reduced Corpora
This model seems to be stupid, since we can easily notice that the bigram “is is” is quite frequent, and the two bigrams “is a” and “a is” have the same frequency.
Perplexity on Reduced Corpora
For example, the decay function g_2 of bigrams is as follows:
Perplexity on Reduced Corpora
They pointed out that the exponent of bigrams is about 0.66, and that of 5-grams is about 0.59 in the Wall Street Journal corpus (WSJ 87).
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Experiments
All three smoothing methods for bigram and trigram LMs are examined, both using back-off models
Pinyin Input Method Model
The edge weight is the negative logarithm of the conditional probability P(S_{j+1,k} | S_{j,i}) that a syllable S_{j,i} is followed by S_{j+1,k}, which is given by a bigram language model of pinyin syllables:
Pinyin Input Method Model
W_E(V_{j,i} → V_{j+1,k}) = -log P(V_{j+1,k} | V_{j,i}). Although the model is formulated on a first-order HMM, i.e., the LM used for the transition probability is a bigram one, it is easy to extend the model to take advantage of a higher-order n-gram LM by tracking a longer history while traversing the graph.
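A toy sketch of decoding over such a graph with bigram edge weights -log P(next | prev) (the lattice, syllables and probabilities below are made up; the actual system builds the graph from pinyin segmentation):

    import math

    # lattice[j] holds the candidate syllables at position j.
    lattice = [["<s>"], ["xian", "xi"], ["zai", "an"], ["</s>"]]
    bigram_p = {("<s>", "xian"): 0.6, ("<s>", "xi"): 0.4,
                ("xian", "zai"): 0.7, ("xian", "an"): 0.1,
                ("xi", "an"): 0.8, ("xi", "zai"): 0.1,
                ("zai", "</s>"): 1.0, ("an", "</s>"): 1.0}

    def decode(lattice, bigram_p, floor=1e-6):
        """Dynamic program over the lattice minimizing summed -log P(cur | prev)."""
        best = {"<s>": (0.0, ["<s>"])}
        for position in lattice[1:]:
            new_best = {}
            for cur in position:
                candidates = []
                for prev, (cost, path) in best.items():
                    weight = -math.log(bigram_p.get((prev, cur), floor))
                    candidates.append((cost + weight, path + [cur]))
                new_best[cur] = min(candidates)
            best = new_best
        return best["</s>"]

    print(decode(lattice, bigram_p))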
bigram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Iyyer, Mohit and Enns, Peter and Boyd-Graber, Jordan and Resnik, Philip
Datasets
Their features come from the Linguistic Inquiry and Word Count lexicon (LIWC) (Pennebaker et al., 2001), as well as from lists of “sticky bigrams” (Brown et al., 1992) strongly associated with one party or another (e.g., “illegal aliens” implies conservative, “universal healthcare” implies liberal).
Datasets
We first extract the subset of sentences that contain any words in the LIWC categories of Negative Emotion, Positive Emotion, Causation, Anger, and Kill verbs. After computing a list of the top 100 sticky bigrams for each category, ranked by log-likelihood ratio, and selecting another subset from the original data that included only sentences containing at least one sticky bigram, we take the union of the two subsets.
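As an illustration of ranking sticky bigrams by log-likelihood ratio, here is a Dunning-style G² sketch over a bigram's 2x2 contingency table (the paper's per-category ranking may use a different statistic; the toy text is made up):

    import math
    from collections import Counter

    def g_squared(k11, k12, k21, k22):
        """Dunning's log-likelihood ratio statistic for a 2x2 contingency table."""
        total = k11 + k12 + k21 + k22
        def term(obs, row, col):
            expected = row * col / total
            return obs * math.log(obs / expected) if obs > 0 else 0.0
        return 2 * (term(k11, k11 + k12, k11 + k21) + term(k12, k11 + k12, k12 + k22)
                    + term(k21, k21 + k22, k11 + k21) + term(k22, k21 + k22, k12 + k22))

    def sticky_bigrams(tokens, top_n=5):
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n = len(tokens) - 1
        scored = []
        for (a, b), k11 in bigrams.items():
            k12 = unigrams[a] - k11   # a followed by something other than b
            k21 = unigrams[b] - k11   # b preceded by something other than a
            k22 = n - k11 - k12 - k21
            scored.append((g_squared(k11, k12, k21, k22), a + " " + b))
        return sorted(scored, reverse=True)[:top_n]

    text = ("illegal aliens cross the border while illegal aliens remain a debate "
            "about illegal aliens").split()
    print(sticky_bigrams(text))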
Related Work
They use an HMM-based model, defining the states as a set of fine-grained political ideologies, and rely on a closed set of lexical bigram features associated with each ideology, inferred from a manually labeled ideological books corpus.
bigram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: