Distribution Prediction | For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows.
Distribution Prediction | Next, we generate bigrams of word lemmas and remove any bigrams that consist only of stop words.
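The filtering step above can be sketched as follows (a minimal illustration; the stop-word list and the `lemma_bigrams` helper are hypothetical, and lemmatization is assumed to have been done upstream):

```python
# Hypothetical stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "of", "not", "and"}

def lemma_bigrams(lemmas):
    """Generate bigrams of consecutive word lemmas, removing any
    bigram whose two members are both stop words."""
    pairs = zip(lemmas, lemmas[1:])
    return [(x, y) for x, y in pairs
            if not (x in STOP_WORDS and y in STOP_WORDS)]

# ("is", "not") is dropped because both lemmas are stop words.
print(lemma_bigrams(["the", "movie", "is", "not", "good"]))
# → [('the', 'movie'), ('movie', 'is'), ('not', 'good')]
```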
Distribution Prediction | Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks. |
Models for Measuring Grammatical Competence | Then, regarding POS bigrams as terms, they construct POS-based vector space models for each score-class (there are four score classes denoting levels of proficiency as will be explained in Section 5.2), thus yielding four score-specific vector-space models (VSMs). |
Models for Measuring Grammatical Competence | 0.0034: the cosine similarity score between the test response and the vector of POS bigrams for the highest score class (level 4); and,
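The per-class similarity scoring described above can be sketched as follows (a minimal illustration with sparse count vectors; the tag sequences and the helper functions are assumptions, not the actual VSM construction):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def pos_bigram_vector(tags):
    """Count vector of POS-tag bigrams for one tag sequence."""
    return Counter(zip(tags, tags[1:]))

# Hypothetical score-class model built by pooling level-4 responses,
# compared against one test response.
level4_model = pos_bigram_vector(["DT", "NN", "VBZ", "DT", "JJ", "NN"])
response = pos_bigram_vector(["DT", "NN", "VBZ", "JJ"])
sim = cosine(response, level4_model)
```

In the full setup, one such vector would be built per score class and the response scored against each of the four models.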
Models for Measuring Grammatical Competence | First, the VSM-based method is likely to overestimate the contribution of the POS bigrams when highly correlated bigrams occur as terms in the VSM. |
Related Work | In order to avoid the problems encountered with deep analysis-based measures, Yoon and Bhat (2012) explored a shallow analysis-based approach, based on the assumption that the level of grammar sophistication at each proficiency level is reflected in the distribution of part-of-speech (POS) tag bigrams.
Shallow-analysis approach to measuring syntactic complexity | The measures of syntactic complexity in this approach are POS bigrams and are not obtained by a deep analysis (syntactic parsing) of the structure of the sentence. |
Shallow-analysis approach to measuring syntactic complexity | In a shallow-analysis approach to measuring syntactic complexity, we rely on the distribution of POS bigrams at every proficiency level.
Shallow-analysis approach to measuring syntactic complexity | Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced). |
Experiment | Comparison on the PKU and MSRA datasets:
  Model                        PKU   MSRA
  Best05 (Chen et al., 2005)   95.0  96.0
  Best05 (Tseng et al., 2005)  95.0  96.4
  (Zhang et al., 2006)         95.1  97.1
  (Zhang and Clark, 2007)      94.5  97.2
  (Sun et al., 2009)           95.2  97.3
  (Sun et al., 2012)           95.4  97.4
  (Zhang et al., 2013)         96.1  97.4
  MMTNN                        94.0  94.9
  MMTNN + bigram               95.2  97.2
Experiment | A very common feature in Chinese word segmentation is the character bigram feature. |
Experiment | Formally, at the i-th character of a sentence c[1:n], the bigram features are c_k c_{k+1} (i-3 < k < i+2).
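Under this definition the feature window can be extracted as in the following sketch (the boundary-padding symbols are an assumption; the excerpt does not specify how out-of-range positions are handled):

```python
def char_bigram_features(chars, i):
    """Character bigram features c_k c_{k+1} for i-3 < k < i+2,
    i.e. the four bigrams covering a window around position i.
    Out-of-range positions are padded with boundary symbols
    (the symbols themselves are an assumption)."""
    padded = ["<B>"] * 2 + list(chars) + ["<E>"] * 2
    j = i + 2  # index of character i in the padded sequence
    return [padded[k] + padded[k + 1] for k in range(j - 2, j + 2)]

print(char_bigram_features("abc", 1))  # → ['<B>a', 'ab', 'bc', 'c<E>']
```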
Introduction | Therefore, we integrate additional simple character bigram features into our model and the result shows that our model can achieve a competitive performance that other systems hardly achieve unless they use more complex task-specific features. |
Related Work | Most previous systems address this task by using linear statistical models with carefully designed features such as bigram features, punctuation information (Li and Sun, 2009) and statistical information (Sun and Xu, 2011). |
Multi-Structure Sentence Compression | Following this, §2.3 discusses a dynamic program to find maximum weight bigram subsequences from the input sentence, while §2.4 covers LP relaxation-based approaches for approximating solutions to the problem of finding a maximum-weight subtree in a graph of potential output dependencies. |
Multi-Structure Sentence Compression | In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows.
Multi-Structure Sentence Compression | where θ_tok, θ_ngr and θ_dep are feature-based scoring functions for tokens, bigrams and dependencies respectively.
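The factorization can be sketched schematically (the constant scorers in the example are placeholders for the learned feature-based functions; the function name is hypothetical):

```python
def compression_score(tokens, bigrams, deps, theta_tok, theta_ngr, theta_dep):
    """Score of a candidate compression, factored as a sum of
    feature-based scores over its tokens, its contiguous bigrams,
    and its dependency arcs."""
    return (sum(theta_tok(t) for t in tokens)
            + sum(theta_ngr(b) for b in bigrams)
            + sum(theta_dep(d) for d in deps))

# With placeholder scorers that return 1.0, the score is simply the
# total count of tokens, bigrams and dependency arcs: 3 + 2 + 1 = 6.0.
one = lambda x: 1.0
print(compression_score(["a", "b", "c"], [("a", "b"), ("b", "c")],
                        [("b", "a")], one, one, one))  # → 6.0
```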
Evaluation | In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with performance of LP (Eq. |
Evaluation | Table 4 presents the results of these variations; overall, by taking into account generated candidates appropriately and using bigrams (“SLP 2-gram”), we obtained a 1.13 BLEU gain on the test set. |
Evaluation | Using unigrams (“SLP 1-gram”) actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams.
Generation & Propagation | Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings. |
Generation & Propagation | We only consider target phrases whose source phrase is a bigram, but it is worth noting that the target phrases are of variable length.
Generation & Propagation | To generate new translation candidates using the baseline system, we decode each unlabeled source bigram to generate its m-best translations. |
Approaches | The clusters are formed by a greedy hierarchical clustering algorithm that finds an assignment of words to classes by maximizing the likelihood of the training data under a latent-class bigram model.
Approaches | First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate’s POS tag and an argument’s word) are important for state-of-the-art performance (Zhao et al., 2009; Bjorkelund et al., 2009).
Approaches | We consider both template unigrams and bigrams, combining two templates in sequence.
Experiments | Each of IGO and IGB also includes 32 template bigrams selected by information gain on 1000 sentences—we select a different set of template bigrams for each dataset.
Experiments | However, the original unigram Bjorkelund features, which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB).
Experiments | In Czech, we disallowed template bigrams involving path-grams. |
Evaluation | To our surprise, the Fixed Affix model does a slightly better job in reducing out of vocabulary than the Bigram Affix model. |
Evaluation | [Truncated results excerpt: WoTr 24.21; Bigram Affix Model, TRR 25.—]
Morphology-based Vocabulary Expansion | We use two different models of morphology expansion in this paper: the Fixed Affix model and the Bigram Affix model.
Morphology-based Vocabulary Expansion | 3.2.2 Bigram Affix Expansion Model |
Morphology-based Vocabulary Expansion | In the Bigram Affix model, we do the same for the stem as in the Fixed Affix model, but for prefixes and suffixes, we create a bigram language model in the finite state machine. |
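The affix bigram model can be sketched as a maximum-likelihood bigram distribution over affix sequences, which would then be compiled into the arcs of the finite-state machine (a hypothetical sketch; the actual FSM construction and any smoothing are not shown in the excerpt, and the function name is an assumption):

```python
from collections import Counter

def train_affix_bigram_lm(suffix_sequences):
    """MLE bigram model over affix sequences. Each input sequence is
    the ordered list of suffixes observed on one word; <s> and </s>
    mark the start and end of the suffix string."""
    counts = Counter()   # bigram counts
    context = Counter()  # unigram (history) counts
    for seq in suffix_sequences:
        seq = ["<s>"] + list(seq) + ["</s>"]
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
            context[a] += 1
    return {ab: c / context[ab[0]] for ab, c in counts.items()}

# Tiny example: after "ler", the suffix "i" follows half the time.
lm = train_affix_bigram_lm([["ler", "i"], ["ler"]])
```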
Experiments | The baselines NB and BINB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. |
Experiments | SVM is a support vector machine with unigram and bigram features. |
Experiments | [Table excerpt] MAXENT (unigram, bigram, trigram): 92.6; MAXENT (POS, chunks, NE, supertags): —
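The unigram-plus-bigram feature extraction used by these baselines can be sketched as follows (a minimal illustration; the function name is hypothetical, and real systems typically use counts or tf-idf weighting rather than a flat feature list):

```python
def ngram_features(tokens):
    """Bag of unigram and bigram features for a classifier such as
    Naive Bayes or an SVM; bigrams are joined with an underscore."""
    unigrams = list(tokens)
    bigrams = [a + "_" + b for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams

print(ngram_features(["not", "good"]))  # → ['not', 'good', 'not_good']
```

Bigram features like `not_good` are what let such classifiers capture negation patterns that unigrams alone miss.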
Introduction | On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. |
Complexity Analysis | For the monolingual bigram model, the number of states in the HMM is U times more than that of the monolingual unigram model, as the states at a specific position of F are not only related to the length of the current word, but also related to the length of the word before it.
Complexity Analysis | [Results excerpt]
  NPY(bigram)   0.750  0.802  17 m
  NPY(trigram)  0.757  0.807  —
  HDP(bigram)   0.723  —      10 h
  Fitness       —      0.667  —
  Prop. …
Related Work | We learn embeddings for unigrams, bigrams and trigrams separately with the same neural network and the same parameter settings.
Related Work | It employs the embeddings of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation on the sequence represented by columns in each lookup table.
Related Work | L_uni, L_bi and L_tri are the lookup tables of the unigram, bigram and trigram embeddings, respectively.
Introduction | twitter unigram: TTT * YES (54%); twitter bigram: TTT * YES (52%); personal unigram: MT * YES (52%); personal bigram: — NO (48%)
Introduction | We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w∈T} log p(w), where T refers to either all the unigrams (unigram model) or all and only the bigrams (bigram model).16 We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
Introduction | 16The tokens [at], [hashtag], [url] were ignored in the unigram-model case to prevent their undue influence, but retained in the bigram model to capture longer-range usage (“combination”) patterns. |
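The average-log-probability score can be sketched as follows (a minimal unigram-model version; the floor probability for unseen tokens is an assumption, since the excerpt does not specify smoothing):

```python
import math

def lm_score(tokens, unigram_probs, floor=1e-8):
    """Average log-probability of a tweet under a unigram language
    model, (1/|T|) * sum over w in T of log p(w). Unseen tokens get
    a small floor probability (an assumption made here)."""
    return sum(math.log(unigram_probs.get(w, floor))
               for w in tokens) / len(tokens)

# A tweet made entirely of high-probability words scores higher
# (closer to zero) than one containing unexpected words.
model = {"good": 0.5, "morning": 0.5}
print(lm_score(["good", "morning"], model) >
      lm_score(["good", "xyzzy"], model))  # → True
```

The bigram-model variant is identical except that T ranges over the tweet's bigrams and the model maps bigrams to probabilities.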
Count distributions | For example, suppose that our data consists of the following bigrams, with their weights:
Word Alignment | That is, during the E step, we calculate the distribution of C(e, f) for each e and f, and during the M step, we train a language model on bigrams e f using expected KN smoothing (that is, with u = e and w = f). |
Word Alignment | (The latter case is equivalent to a backoff language model, where, since all bigrams are known, the lower-order model is never used.) |
Word Alignment | This is much less of a problem in KN smoothing, where p’ is estimated from bigram types rather than bigram tokens. |
Joint POS Tagging and Parsing with Nonlocal Features | Nonlocal feature templates, following (Collins and Koo, 2005) and (Charniak and Johnson, 2005): Rules, CoPar, HeadTree, Bigrams, CoLenPar, Grandparent Bigrams, Heavy, Lexical Bigrams, Neighbours.
Approach | In our formulation, each hidden state corresponds to an issue or topic, characterized by a distribution over words and bigrams appearing in privacy policy sections addressing that issue. |
Approach | Each section is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents or in more than 98% of the documents.
Approach | models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram).
Experiment | Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010).7 To more closely match our models, LDA is given access to the same unigram and bigram tokens. |
Predicting Direction of Power | Accuracy (%):
  Baseline (Always Superior)           52.54
  Baseline (Word Unigrams + Bigrams)   68.56
  THRNew                               55.90
  THRPR                                54.30
  DIAPR                                54.05
  THRPR + THRNew                       61.49
  DIAPR + THRPR + THRNew               62.47
  LEX                                  70.74
  LEX + DIAPR + THRPR                  67.44
  LEX + DIAPR + THRPR + THRNew         68.56
  BEST (= LEX + THRNew)                73.03
  BEST (Using p1 features only)        72.08
  BEST (Using IMt features only)       72.11
  BEST (Using Mt only)                 71.27
  BEST (No Indicator Variables)        72.44
Predicting Direction of Power | We found the best setting to be using both unigrams and bigrams for all three types of ngrams, by tuning on our dev set.
Predicting Direction of Power | We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%. |
Experiments | In the case of bigrams, the perplexities of TheoryZ are almost the same as those of Zipf2 when the size of the reduced vocabulary is large.
Perplexity on Reduced Corpora | This model seems to be stupid, since we can easily notice that the bigram “is is” is quite frequent, and the two bigrams “is a” and “a is” have the same frequency.
Perplexity on Reduced Corpora | For example, the decay function g2 of bigrams is as follows:
Perplexity on Reduced Corpora | They pointed out that the exponent of bigrams is about 0.66, and that of 5-grams is about 0.59 in the Wall Street Journal corpus (WSJ 87). |
Experiments | All of the three smoothing methods for bigram and trigram LMs are examined both using back-off models
Pinyin Input Method Model | The edge weight is the negative logarithm of the conditional probability P(S_{j+1,k} | S_{j,i}) that a syllable S_{j,i} is followed by S_{j+1,k}, which is given by a bigram language model of pinyin syllables:
Pinyin Input Method Model | W_E(V_{j,i} → V_{j+1,k}) = −log P(V_{j+1,k} | V_{j,i}). Although the model is formulated as a first-order HMM, i.e., the LM used for the transition probability is a bigram one, it is easy to extend the model to take advantage of a higher-order n-gram LM, by tracking a longer history while traversing the graph.
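Decoding over such a syllable graph can be sketched as a Viterbi-style shortest path (a hypothetical implementation; the sentence-start symbol and the fallback probability in the usage example are assumptions):

```python
import math

def best_syllable_path(graph, bigram_prob):
    """Find the maximum-probability syllable sequence through a
    pinyin graph. graph is a list of candidate-syllable lists, one
    per position; bigram_prob maps a (previous, current) pair to
    P(current | previous). The edge weight is -log P(cur | prev),
    so the minimum-cost path is the most probable sequence."""
    dist = {s: -math.log(bigram_prob(("<s>", s))) for s in graph[0]}
    back = [{s: None for s in graph[0]}]
    for layer in graph[1:]:
        new_dist, new_back = {}, {}
        for v in layer:
            u = min(dist, key=lambda p: dist[p] - math.log(bigram_prob((p, v))))
            new_dist[v] = dist[u] - math.log(bigram_prob((u, v)))
            new_back[v] = u
        dist, back = new_dist, back + [new_back]
    node = min(dist, key=dist.get)  # best final syllable
    path = [node]
    for layer_back in reversed(back[1:]):
        path.append(layer_back[path[-1]])
    return list(reversed(path))

# Hypothetical toy model: "guo" is more likely than "gou" after "zhong".
probs = {("<s>", "zhong"): 1.0, ("zhong", "guo"): 0.9, ("zhong", "gou"): 0.1}
print(best_syllable_path([["zhong"], ["guo", "gou"]],
                         lambda ab: probs.get(ab, 1e-6)))  # → ['zhong', 'guo']
```

Extending to a higher-order n-gram LM, as the excerpt notes, amounts to making each state carry a longer history than a single previous syllable.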
Datasets | Their features come from the Linguistic Inquiry and Word Count lexicon (LIWC) (Pennebaker et al., 2001), as well as from lists of “sticky bigrams” (Brown et al., 1992) strongly associated with one party or another (e.g., “illegal aliens” implies conservative, “universal healthcare” implies liberal).
Datasets | We first extract the subset of sentences that contain any words in the LIWC categories of Negative Emotion, Positive Emotion, Causation, Anger, and Kill verbs.3 After computing a list of the top 100 sticky bigrams for each category, ranked by log-likelihood ratio, and selecting another subset from the original data that includes only sentences containing at least one sticky bigram, we take the union of the two subsets.
Related Work | They use an HMM-based model, defining the states as a set of fine-grained political ideologies, and rely on a closed set of lexical bigram features associated with each ideology, inferred from a manually labeled ideological books corpus. |