Abstract | In this paper, we propose a bigram-based supervised method for extractive document summarization in the integer linear programming (ILP) framework. |
Abstract | For each bigram, a regression model is used to estimate its frequency in the reference summary. |
Abstract | The regression model uses a variety of indicative features and is trained discriminatively to minimize the distance between the estimated and the ground truth bigram frequency in the reference summary. |
Introduction | They used bigrams as such language concepts. |
Introduction | Gillick and Favre (2009) used bigrams as concepts, which are selected from a subset of the sentences, and their document frequency as the weight in the objective function. |
Introduction | In this paper, we propose to find a candidate summary such that the language concepts (e.g., bigrams) in this candidate summary and the reference summary can have the same frequency. |
Proposed Method 2.1 Bigram Gain Maximization by ILP | We choose bigrams as the language concepts in our proposed method since they have been successfully used in previous work. |
Proposed Method 2.1 Bigram Gain Maximization by ILP | In addition, we expect that the bigram oriented ILP is consistent with the ROUGE-2 measure widely used for summarization evaluation. |
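The bigram-recall quantity underlying ROUGE-2 can be sketched in a few lines. This is a toy illustration of the measure the snippet refers to, not the official evaluation toolkit:

```python
from collections import Counter

def bigrams(tokens):
    """Multiset of adjacent word pairs in a token list."""
    return Counter(zip(tokens, tokens[1:]))

def bigram_recall(candidate, reference):
    """Clipped bigram recall, the quantity ROUGE-2 is built on."""
    cand, ref = bigrams(candidate), bigrams(reference)
    overlap = sum(min(cand[b], ref[b]) for b in ref)
    return overlap / max(1, sum(ref.values()))

cand = "the cat sat on the mat".split()
ref = "the cat lay on the mat".split()
print(bigram_recall(cand, ref))  # 3 of the 5 reference bigrams matched
```

A summarizer that maximizes estimated bigram gain is thus directly optimizing toward this overlap.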
Abstract | Evaluated on the WSJ corpus, bigram and trigram model perplexities were reduced by up to 23.5% and 14.0%, respectively. |
Abstract | Compared to the distant bigram, we show that word-pairs can be more effectively modeled in terms of both distance and occurrence. |
Language Modeling with TD and TO | The prior, which is usually implemented as a unigram model, can also be replaced with a higher order n-gram model as, for instance, the bigram model: |
Perplexity Evaluation | As seen from the table, for lower-order n-gram models, the complementary information captured by the TD and TO components reduced the perplexity up to 23.5% and 14.0%, for bigram and trigram models, respectively. |
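Perplexity, the quantity being reduced here, is the exponential of the negative average per-token log-probability. A minimal sketch with a toy uniform model (not the paper's TD/TO components):

```python
from math import exp, log

def perplexity(tokens, logprob):
    """Per-token perplexity under a bigram log-probability model;
    lower is better. The 23.5% / 14.0% figures in the text are
    relative drops in this quantity."""
    toks = ["<s>"] + list(tokens) + ["</s>"]
    lp = sum(logprob(a, b) for a, b in zip(toks, toks[1:]))
    return exp(-lp / (len(toks) - 1))

# Toy model: every continuation equally likely over a 10-word vocabulary,
# so the perplexity should come out as exactly the vocabulary size.
uniform = lambda a, b: log(1 / 10)
print(round(perplexity("a b c".split(), uniform)))  # → 10
```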
Perplexity Evaluation | better modeling of word-pairs compared to the distant bigram model. |
Perplexity Evaluation | Here we compare the perplexity of both the distance-k bigram model and the distance-k TD model (for values of k ranging from two to ten), when combined with a standard bigram model. |
Related Work | The distant bigram model (Huang et al., 1993; Simon et al., |
Related Work | 2007) disassembles the n-gram into (n−1) word-pairs, such that each pair is modeled by a distance-k bigram model, where 1 ≤ k ≤ n − 1. |
Related Work | Each distance-k bigram model predicts the target-word based on the occurrence of a history-word located k positions behind. |
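The distance-k prediction described above can be sketched directly: gather counts of (history, target) pairs exactly k positions apart, then take the MLE conditional. A toy illustration, not the cited systems:

```python
from collections import Counter

def distance_k_counts(tokens, k):
    """Count (history, target) pairs where the history word sits
    exactly k positions before the target word."""
    return Counter((tokens[i - k], tokens[i]) for i in range(k, len(tokens)))

def distance_k_prob(counts, history, target):
    """MLE estimate of P_k(target | history) from distance-k counts."""
    total = sum(c for (h, _), c in counts.items() if h == history)
    return counts[(history, target)] / total if total else 0.0

toks = "a b a b a c".split()
c2 = distance_k_counts(toks, 2)   # pairs two positions apart
print(c2[("a", "a")])             # 'a' recurs at distance 2 twice here
```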
Abstract | We find that Bigram dependencies are important for performing well on real data and for learning appropriate deletion probabilities for different contexts.1 |
Experiments 4.1 The data | Best performance for both the Unigram and the Bigram model in the GOLD-p condition is achieved under the left-right setting, in line with the standard analyses of /t/-deletion as primarily being determined by the preceding and the following context. |
Experiments 4.1 The data | For the LEARN-p condition, the Bigram model still performs best in the left-right setting but the Unigram model’s performance drops |
Experiments 4.1 The data | Note how the Unigram model always suffers in the LEARN-p condition whereas the Bigram model’s performance is actually best for LEARN-p in the left-right setting. |
Introduction | We find that models that capture bigram dependencies between underlying forms provide considerably more accurate estimates of those probabilities than corresponding unigram or “bag of words” models of underlying forms. |
The computational model | Our models build on the Unigram and the Bigram model introduced in Goldwater et al. |
The computational model | Figure 1 shows the graphical model for our joint Bigram model (the Unigram case is trivially recovered by generating the U_{i,j} directly from L rather than from L_{U_{i,j−1}}). |
The computational model | Figure 1: The graphical model for our joint model of word-final /t/-deletion and Bigram word segmentation. |
Models for Measuring Grammatical Competence | Then, regarding POS bigrams as terms, they construct POS-based vector space models for each score-class (there are four score classes denoting levels of proficiency as will be explained in Section 5.2), thus yielding four score-specific vector-space models (VSMs). |
Models for Measuring Grammatical Competence | • cos4: the cosine similarity score between the test response and the vector of POS bigrams for the highest score class (level 4); and, |
Models for Measuring Grammatical Competence | First, the VSM-based method is likely to overestimate the contribution of the POS bigrams when highly correlated bigrams occur as terms in the VSM. |
Related Work | In order to avoid the problems encountered with deep analysis-based measures, Yoon and Bhat (2012) explored a shallow analysis-based approach, based on the assumption that the level of grammar sophistication at each proficiency level is reflected in the distribution of part-of-speech (POS) tag bigrams. |
Shallow-analysis approach to measuring syntactic complexity | The measures of syntactic complexity in this approach are POS bigrams and are not obtained by a deep analysis (syntactic parsing) of the structure of the sentence. |
Shallow-analysis approach to measuring syntactic complexity | In a shallow-analysis approach to measuring syntactic complexity, we rely on the distribution of POS bigrams at every proficiency level. |
Shallow-analysis approach to measuring syntactic complexity | Consider the two sentence fragments below taken from actual responses (the bigrams of interest and their associated POS tags are boldfaced). |
Distribution Prediction | For this purpose, we represent a word w using unigrams and bigrams that co-occur with w in a sentence as follows. |
Distribution Prediction | Next, we generate bigrams of word lemmas and remove any bigrams that consist only of stop words. |
Distribution Prediction | Bigram features capture negations more accurately than unigrams, and have been found to be useful for sentiment classification tasks. |
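The point about negation can be made concrete: with unigram features alone, "not good" decomposes into "not" and "good", but a bigram feature keeps the negation attached. A toy sketch, not the paper's feature extractor:

```python
def unigram_bigram_features(tokens):
    """Unigram and bigram string features; a bigram such as 'not_good'
    keeps the negation attached to the word it modifies."""
    unigrams = list(tokens)
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams

feats = unigram_bigram_features("this is not good".split())
print("not_good" in feats)  # the negated sentiment survives as one feature
```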
Experimental Setup | 256,873 unique unigrams and 4,494,222 unique bigrams. |
Experimental Setup | We cluster unigrams (i = 1) and bigrams (i = 2). |
Experimental Setup | SRILM does not directly support bigram clustering. |
Models | The parameters d’, d”, and d’” are the discounts for unigrams, bigrams and trigrams, respectively, as defined by Chen and Goodman (1996, p. 20, (26)). |
Models | bigram) histories that is covered by the clusters. |
Models | We cluster bigram histories and unigram histories separately and write pB(w3|w1w2) for the bigram cluster model and pB(w3|w2) for the unigram cluster model. |
Experiment | Model PKU MSRA: Best05 (Chen et al., 2005) 95.0 96.0; Best05 (Tseng et al., 2005) 95.0 96.4; (Zhang et al., 2006) 95.1 97.1; (Zhang and Clark, 2007) 94.5 97.2; (Sun et al., 2009) 95.2 97.3; (Sun et al., 2012) 95.4 97.4; (Zhang et al., 2013) 96.1 97.4; MMTNN 94.0 94.9; MMTNN + bigram 95.2 97.2 |
Experiment | A very common feature in Chinese word segmentation is the character bigram feature. |
Experiment | Formally, at the i-th character of a sentence c_{1:n}, the bigram features are c_k c_{k+1} (i − 3 < k < i + 2). |
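The window above means k ranges over {i−2, i−1, i, i+1}. A minimal sketch of this feature extraction, with padding at sentence edges (the pad symbol is my own assumption, not from the paper):

```python
def char_bigram_features(chars, i, pad="#"):
    """Character bigram features c_k c_{k+1} for k in {i-2, i-1, i, i+1},
    mirroring the window in the text; out-of-range positions are padded."""
    def at(j):
        return chars[j] if 0 <= j < len(chars) else pad
    return [at(k) + at(k + 1) for k in range(i - 2, i + 2)]

print(char_bigram_features(list("ABCDE"), 2))  # → ['AB', 'BC', 'CD', 'DE']
```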
Introduction | Therefore, we integrate additional simple character bigram features into our model and the result shows that our model can achieve a competitive performance that other systems hardly achieve unless they use more complex task-specific features. |
Related Work | Most previous systems address this task by using linear statistical models with carefully designed features such as bigram features, punctuation information (Li and Sun, 2009) and statistical information (Sun and Xu, 2011). |
A Motivating Example | (bigrams) survey+development, development+civilization |
Feature Expansion | , w_N}, where the elements w_i are either unigrams or bigrams that appear in the review d. We then represent a review d by a real-valued term-frequency vector d ∈ R^N, where the value of the j-th element d_j is set to the total number of occurrences of the unigram or bigram w_j in the review d. To find suitable candidates to expand a vector d for the review d, we define a ranking score score(u_i, d) for each base entry in the thesaurus as follows: |
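The term-frequency representation described here is easy to sketch as a sparse vector keyed by lexical elements (a toy stand-in for the paper's d ∈ R^N, using my own tokenization):

```python
from collections import Counter

def review_vector(tokens):
    """Term-frequency vector over the unigrams and bigrams of a review,
    as a sparse dict; the value for element w_j is its count in the review."""
    elements = list(tokens) + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return dict(Counter(elements))

d = review_vector("an excellent and excellent film".split())
print(d["excellent"])  # unigram count in the review
```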
Feature Expansion | Moreover, we weight the relatedness scores for each word wj by its normalized term-frequency to emphasize the salient unigrams and bigrams in a review. |
Feature Expansion | This is particularly important because we would like to score base entries u_i considering all the unigrams and bigrams that appear in a review d, instead of considering each unigram or bigram individually. |
Introduction | a unigram or a bigram of word lemma) in a review using a feature vector. |
Sentiment Sensitive Thesaurus | We select unigrams and bigrams from each sentence. |
Sentiment Sensitive Thesaurus | For the remainder of this paper, we will refer to unigrams and bigrams collectively as lexical elements. |
Sentiment Sensitive Thesaurus | Previous work on sentiment classification has shown that both unigrams and bigrams are useful for training a sentiment classifier (Blitzer et al., 2007). |
Background | This work differs from previous Bayesian models in that we explicitly model a complex backoff path using a hierarchical prior, such that our model jointly infers distributions over tag trigrams, bigrams and unigrams and whole words and their character level representation. |
Experiments | mkcls (Och, 1999) 73.7 65.6; MLE 1HMM-LM (Clark, 2003)* 71.2 65.5; BHMM (GG07) 63.2 56.2; PR (Ganchev et al., 2010)* 62.5 54.8; Trigram PYP-HMM 69.8 62.6; Trigram PYP-1HMM 76.0 68.0; Trigram PYP-1HMM-LM 77.5 69.7; Bigram PYP-HMM 66.9 59.2; Bigram PYP-1HMM 72.9 65.9; Trigram DP-HMM 68.1 60.0; Trigram DP-1HMM 76.0 68.0; Trigram DP-1HMM-LM 76.8 69.8 |
Experiments | If we restrict the model to bigrams we see a considerable drop in performance. |
The PYP-HMM | The trigram transition distribution, T_{ij}, is drawn from a hierarchical PYP prior which backs off to a bigram B_j and then a unigram U distribution, |
The PYP-HMM | This allows the modelling of trigram tag sequences, while smoothing these estimates with their corresponding bigram and unigram distributions. |
The PYP-HMM | We formulate the character-level language model as a bigram model over the character sequence comprising word w, |
Joint Model | While past extractive methods have assigned value to individual sentences and then explicitly represented the notion of redundancy (Carbonell and Goldstein, 1998), recent methods show greater success by using a simpler notion of coverage: bigrams |
Joint Model | Note that there is intentionally a bigram missing from (a). |
Joint Model | contribute content, and redundancy is implicitly encoded in the fact that redundant sentences cover fewer bigrams (Nenkova and Vanderwende, 2005; Gillick and Favre, 2009). |
Structured Learning | We use bigram recall as our loss function (see Section 3.3). |
Structured Learning | Luckily, our choice of loss function, bigram recall, factors over bigrams. |
Structured Learning | We simply modify each bigram value v_b to include bigram b’s contribution to the total loss. |
Abstract | In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard. |
Definitions | similarly define the bigram count N_{ff'} of f, f' ∈ V_f as |
Definitions | (a) N_{ff'} are integer counts ≥ 0 of bigrams found in the ciphertext f_1^N. |
Definitions | (b) Given the first and last token of the cipher f1 and fN, the bigram counts involving the sentence boundary token $ need to fulfill |
Introduction | Section 5 shows the connection between the quadratic assignment problem and decipherment using a bigram language model. |
Introduction | Most methods have employed some variant of Expectation Maximization (EM) to learn parameters for a bigram |
Introduction | Ravi and Knight (2009) achieved the best results thus far (92.3% word token accuracy) via a Minimum Description Length approach using an integer program (IP) that finds a minimal bigram grammar that obeys the tag dictionary constraints and covers the observed data. |
Minimized models for supertagging | The 1241 distinct supertags in the tagset result in 1.5 million tag bigram entries in the model and the dictionary contains almost 3.5 million word/tag pairs that are relevant to the test data. |
Minimized models for supertagging | The set of 45 POS tags for the same data yields 2025 tag bigrams and 8910 dictionary entries. |
Minimized models for supertagging | Our objective is to find the smallest supertag grammar (of tag bigram types) that explains the entire text while obeying the lexicon’s constraints. |
Ad hoc rule detection | 3.4 Bigram anomalies; 3.4.1 Motivation |
Ad hoc rule detection | The bigram method examines relationships between adjacent sisters, complementing the whole rule method by focusing on local properties. |
Ad hoc rule detection | But only the final elements have anomalous bigrams: HD:ID IR:IR, IR:IR AN:RO, and AN:RO JR:IR all never occur. |
Additional information | This rule is entirely correct, yet the XX:XX position has low whole rule and bigram scores. |
Approach | First, the bigram method abstracts a rule to its bigrams. |
Evaluation | For example, the bigram method with a threshold of 39 leads to finding 283 errors (455 × .622). |
Evaluation | The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with correspondingly greater precision. |
Introduction and Motivation | We propose to flag erroneous parse rules, using information which reflects different grammatical properties: POS lookup, bigram information, and full rule comparisons. |
Fitting the Model | We still give EM the full word/tag dictionary, but now we constrain its initial grammar model to the 459 tag bigrams identified by IP. |
Fitting the Model | In addition to removing many bad tag bigrams from the grammar, IP minimization also removes some of the good ones, leading to lower recall (EM = 0.87, IP+EM = 0.57). |
Fitting the Model | During EM training, the smaller grammar with fewer bad tag bigrams helps to restrict the dictionary model from making too many bad choices that EM made earlier. |
Small Models | That is, there exists a tag sequence that contains 459 distinct tag bigrams, and no other tag sequence contains fewer. |
Small Models | We also create variables for every possible tag bigram and word/tag dictionary entry. |
What goes wrong with EM? | We investigate the Viterbi tag sequence generated by EM training and count how many distinct tag bigrams there are in that sequence. |
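Counting the distinct tag bigrams in a decoded sequence, the grammar-size statistic this line refers to, is a one-liner. A toy sketch, not the paper's pipeline:

```python
def distinct_tag_bigrams(tags):
    """Number of distinct adjacent tag pairs in a predicted tag sequence;
    the statistic the minimal-grammar objective drives down."""
    return len(set(zip(tags, tags[1:])))

# (DT,NN) occurs twice but is counted once, so this yields 3.
print(distinct_tag_bigrams(["DT", "NN", "VB", "DT", "NN"]))  # → 3
```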
Evaluation | In our first set of experiments, we looked at the impact of choosing bigrams over unigrams as our basic unit of representation, along with performance of LP (Eq. |
Evaluation | Table 4 presents the results of these variations; overall, by taking into account generated candidates appropriately and using bigrams (“SLP 2-gram”), we obtained a 1.13 BLEU gain on the test set. |
Evaluation | Using unigrams (“SLP 1-gram”) actually does worse than the baseline, indicating the importance of focusing on translations for sparser bigrams. |
Generation & Propagation | Although our technique applies to phrases of any length, in this work we concentrate on unigram and bigram phrases, which provides substantial computational cost savings. |
Generation & Propagation | We only consider target phrases whose source phrase is a bigram, but it is worth noting that the target phrases are of variable length. |
Generation & Propagation | To generate new translation candidates using the baseline system, we decode each unlabeled source bigram to generate its m-best translations. |
Multi-Structure Sentence Compression | Following this, §2.3 discusses a dynamic program to find maximum weight bigram subsequences from the input sentence, while §2.4 covers LP relaxation-based approaches for approximating solutions to the problem of finding a maximum-weight subtree in a graph of potential output dependencies. |
Multi-Structure Sentence Compression | C. In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows. |
Multi-Structure Sentence Compression | where θ_tok, θ_ngr and θ_dep are feature-based scoring functions for tokens, bigrams and dependencies respectively. |
Abstract | The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model). |
Bigram Expansion Model for Alteration Selection | The query context is modeled by a bigram language model as in (Peng et al. |
Bigram Expansion Model for Alteration Selection | In this work, we used a bigram language model to calculate the probability of each path. |
Introduction | The query context is modeled by a bigram language model. |
Introduction | which is the most coherent with the bigram model. |
Introduction | We call this model Bigram Expansion. |
Related Work | 2007), a bigram language model is used to determine the alteration of the head word that best fits the query. |
Related Work | In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates. |
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Abstract | The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass. |
Decoding to Maximize BLEU | The outside-pass Algorithm 1 for bigram decoding can be generalized to the trigram case. |
Experiments | Hyperedges, BLEU: Bigram Pass 167K, 21.77; Trigram Pass UNI−BO +629.7K = 796.7K, 23.56; BO+BB +2.7K = 169. |
Introduction | First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based |
Introduction | With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder. |
Multi-pass LM-Integrated Decoding | More specifically, a bigram decoding pass is executed forward and backward to figure out the probability of each state. |
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram integrated model is a finer-grained model than a bigram model, and in general we can do an (n−1)-gram decoding as a predictive pass for the following n-gram pass. |
Multi-pass LM-Integrated Decoding | If we can afford a bigram decoding pass, the outside cost from a bigram model is conceivably a |
Evaluation | The results are shown in figure 1 for the whole daughters scoring method and in figure 2 for the bigram method. |
Evaluation | Figure 2: Bigram ungeneralizability (dev.) |
Rule dissimilarity and generalizability | To do this, we can examine the weakest parts of each rule and compare those across the corpus, to see which anomalous patterns emerge; we do this in the Bigram scoring section below. |
Rule dissimilarity and generalizability | Bigram scoring The other method of detecting ad hoc rules calculates reliability scores by focusing specifically on what the classes do not have in common. |
Rule dissimilarity and generalizability | We abstract to bigrams, including added START and END tags, as longer sequences risk missing generalizations; e.g., unary rules would have no comparable rules. |
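The abstraction described here, padding a rule's daughter sequence with START and END and taking adjacent pairs, can be sketched directly (a toy illustration, not the authors' implementation):

```python
def rule_bigrams(daughters):
    """Abstract a rule's daughter sequence to its bigrams, with added
    START and END tags so peripheral positions are also checked."""
    seq = ["START"] + list(daughters) + ["END"]
    return list(zip(seq, seq[1:]))

print(rule_bigrams(["SB", "HD", "OC"]))
# [('START', 'SB'), ('SB', 'HD'), ('HD', 'OC'), ('OC', 'END')]
```

Because even a unary rule yields the two bigrams (START, X) and (X, END), every rule has something comparable across the corpus.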
Distributed Clustering | With each word w in one of these sets, all words v preceding w in the corpus are stored with the respective bigram count N(v, w). |
Distributed Clustering | While the greedy non-distributed exchange algorithm is guaranteed to converge as each exchange increases the log likelihood of the assumed bigram model, this is not necessarily true for the distributed exchange algorithm. |
Exchange Clustering | Beginning with an initial clustering, the algorithm greedily maximizes the log likelihood of a two-sided class bigram or trigram model as described in Eq. |
Exchange Clustering | With N_c^pre and N_c^suc denoting the average number of clusters preceding and succeeding another cluster, B denoting the number of distinct bigrams in the training corpus, and I denoting the number of iterations, the worst case complexity of the algorithm is in: |
Exchange Clustering | When using large corpora with large numbers of bigrams the number of required updates can increase towards the quadratic upper bound as N_c^pre and N_c^suc approach N_C. |
Predictive Exchange Clustering | Modifying the exchange algorithm in order to optimize the log likelihood of a predictive class bigram model, leads to substantial performance improvements, similar to those previously reported for another type of one-sided class model in (Whittaker and Woodland, 2001). |
Predictive Exchange Clustering | We use a predictive class bigram model as given in Eq. |
Predictive Exchange Clustering | Then the following optimization criterion can be derived, with F(C) being the log likelihood function of the predictive class bigram model given a clustering C: |
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | where in our experiments, we set N = 3, representing calculation of unigram, bigram, and trigram scores. |
Metric Design Considerations | To determine the number match_bi of bigram matches, a system bigram (ls_i ps_i, ls_{i+1} ps_{i+1}) matches a reference bigram (lr_i pr_i, lr_{i+1} pr_{i+1}). |
Experiments | Bigram and trigram performances are similar for Chinese, but trigram performs better for Japanese. |
Experiments | In fact, although the difference in perplexity per character is not so large, the perplexity per word is radically reduced: 439.8 (bigram) to 190.1 (trigram). |
Inference | Furthermore, it has an inherent limitation that it cannot deal with n-grams larger than bigrams, because it uses only local statistics between directly contiguous words for word segmentation. |
Inference | It has an additional advantage in that we can accommodate higher-order relationships than bigrams , particularly trigrams, for word segmentation. |
Inference | For this purpose, we maintain a forward variable α[t] in the bigram case. |
Introduction | Crucially, since they rely on sampling a word boundary between two neighboring words, they can leverage only up to bigram word dependencies. |
Pitman-Yor process and n-gram models | The bigram distribution G2 = { |
Experiments | We follow prior work and use sets of bigrams within words. |
Experiments | In our case, during bipartite matching the set X is the set of bigrams in the language being re-permuted, and Y is the union of bigrams in the other languages. |
Experiments | Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances. |
Message Approximation | Figure 2: Various topologies for approximating topologies: (a) a unigram model, (b) a bigram model, (c) the anchored unigram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009). |
Message Approximation | The first is a plain unigram model, the second is a bigram model, and the third is an anchored unigram topology: a position-specific unigram model for each position up to some maximum length. |
Message Approximation | The second topology we consider is the bigram topology, illustrated in Figure 2(b). |
A Simple Lagrangian Relaxation Algorithm | We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption: |
A Simple Lagrangian Relaxation Algorithm | over the original (non-intersected) hypergraph, with leaf nodes having weights θ_v + … (3) If the output derivation from step 2 has the same set of bigrams as those from step 1, then we have an exact solution to the problem. |
A Simple Lagrangian Relaxation Algorithm | C1 states that each leaf in a derivation has exactly one incoming bigram, and that each leaf not in the derivation has 0 incoming bigrams; C2 states that each leaf in a derivation has exactly one outgoing bigram, and that each leaf not in the derivation has 0 outgoing bigrams. |
Background: Hypergraphs | Throughout this paper we make the following assumption when using a bigram language model: |
Background: Hypergraphs | Assumption 3.1 (Bigram start/end assumption). |
The Full Algorithm | The set P of trigram paths plays an analogous role to the set B of bigrams in our previous algorithm. |
Training method | We broadly classify features into two categories: unigram and bigram features. |
Training method | TB0 ⟨TE(w−1)⟩; TB1 ⟨TE(w−1), p0⟩; TB2 ⟨TE(w−1), p−1, p0⟩; TB3 ⟨TE(w−1), TB(w0)⟩; TB4 ⟨TE(w−1), TB(w0), p0⟩; TB5 ⟨TE(w−1), p−1, TB(w0)⟩; TB6 ⟨TE(w−1), p−1, TB(w0), p0⟩; CB0 ⟨p−1, p0⟩ otherwise. Table 3: Bigram features. |
Training method | Bigram features: Table 3 shows our bigram features. |
A Syntax Free Sequence-oriented Sentence Compression Method | 1 if l(y_j) = l(y_{j−1}) + 1; λ_PLM · Bigram(w_{l(y_j)}, w_{l(y_{j−1})}) otherwise (5) |
A Syntax Free Sequence-oriented Sentence Compression Method | Here, 0 ≤ λ_PLM ≤ 1, and Bigram(·) indicates word bigram probability. |
A Syntax Free Sequence-oriented Sentence Compression Method | The first line of equation (5) agrees with Jing’s observation on sentence alignment tasks (Jing and McKeown, 1999); that is, most (or almost all) bigrams in a compressed sentence appear in the original sentence as they are. |
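Jing's observation, that almost all bigrams of a good compression occur contiguously in the source, can be checked with a small containment ratio. A toy sketch, not the authors' system:

```python
def bigram_set(tokens):
    return set(zip(tokens, tokens[1:]))

def contiguous_bigram_ratio(compressed, source):
    """Fraction of the compression's bigrams that already occur as
    contiguous bigrams in the source sentence."""
    comp = bigram_set(compressed)
    return len(comp & bigram_set(source)) / max(1, len(comp))

src = "the old dog barked at the mailman".split()
cmp_ = "the dog barked at the mailman".split()
print(contiguous_bigram_ratio(cmp_, src))  # only ('the','dog') is new
```

Equation (5) rewards exactly this: copied adjacent pairs score 1, while newly formed pairs fall back to a discounted bigram probability.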
Experimental Evaluation | For example, label ‘w/o IPTW + Dep’ employs IDF term weighting as one function and word bigram, part-of-speech bigram and dependency probability between words as the other function in equation (1). |
Results and Discussion | Replacing PLM with the bigram language model (w/o PLM) degrades the performance significantly. |
Results and Discussion | Most bigrams in a compressed sentence followed those in the source sentence. |
Results and Discussion | PLM is similar to dependency probability in that both features emphasize word pairs that occurred as bigrams in the source sentence. |
Simulation 1 | Our reader’s language model was an unsmoothed bigram model created using a vocabulary set con- |
Simulation 1 | From this vocabulary, we constructed a bigram model using the counts from every bigram in the BNC for which both words were in vocabulary (about 222,000 bigrams). |
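Building a bigram model from counts like this is the standard MLE construction: P(w2 | w1) = N(w1, w2) / N(w1). A minimal unsmoothed sketch over a toy corpus (a stand-in for BNC-scale counts, not the authors' model):

```python
from collections import Counter

def train_bigram_model(sentences):
    """Unsmoothed MLE bigram model: P(w2 | w1) = N(w1, w2) / N(w1),
    with sentence-boundary padding."""
    pair, hist = Counter(), Counter()
    for sent in sentences:
        toks = ["<s>"] + sent + ["</s>"]
        for a, b in zip(toks, toks[1:]):
            pair[(a, b)] += 1
            hist[a] += 1
    return lambda w1, w2: pair[(w1, w2)] / hist[w1] if hist[w1] else 0.0

p = train_bigram_model([["the", "cat"], ["the", "dog"]])
print(p("the", "cat"))  # 'the' continues to 'cat' in 1 of 2 cases
```

The resulting conditional distributions are exactly what gets compiled into the weighted FSA (in the log semiring) mentioned in the next snippet.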
Simulation 1 | Specifically, we constructed the model’s initial belief state (i.e., the distribution over sentences given by its language model) by directly translating the bigram model into a wFSA in the log semiring. |
Simulation 2 | Instead, we begin with the same set of bigrams used in Sim. |
Simulation 2 | 1 — i.e., those that contain two in-vocabulary words — and trim this set by removing rare bigrams that occur less than 200 times in the BNC (except that we do not trim any bigrams that occur in our test corpus). |
Simulation 2 | This reduces our set of bigrams to about 19,000. |
System Architecture | To derive word features, first of all, our system automatically collects a list of word unigrams and bigrams from the training data. |
System Architecture | To avoid overfitting, we only collect the word unigrams and bigrams whose frequency is larger than 2 in the training set. |
System Architecture | This list of word unigrams and bigrams is then used as a unigram-dictionary and a bigram-dictionary to generate word-based unigram and bigram features. |
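Dictionary-based feature firing of this kind can be sketched in a few lines. The feature-name strings are my own illustrative convention, not the system's:

```python
def dictionary_ngram_features(tokens, uni_dict, bi_dict):
    """Fire a feature for every word (or adjacent word pair) that appears
    in the unigram / bigram dictionary collected from training data."""
    feats = [f"UNI={w}" for w in tokens if w in uni_dict]
    feats += [f"BI={a}_{b}" for a, b in zip(tokens, tokens[1:])
              if (a, b) in bi_dict]
    return feats

uni = {"good", "movie"}
bi = {("good", "movie")}
print(dictionary_ngram_features("a good movie".split(), uni, bi))
# ['UNI=good', 'UNI=movie', 'BI=good_movie']
```

Restricting the dictionaries to n-grams seen more than twice in training, as the previous snippet describes, is what keeps this feature space from overfitting.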
Evaluation | To our surprise, the Fixed Affix model does a slightly better job in reducing out of vocabulary than the Bigram Affix model. |
Evaluation | (table excerpt) WoTr 24.21; Bigram Affix Model TRR 25. |
Morphology-based Vocabulary Expansion | We use two different models of morphology expansion in this paper: Fixed Affix model and Bigram Affix model. |
Morphology-based Vocabulary Expansion | 3.2.2 Bigram Affix Expansion Model |
Morphology-based Vocabulary Expansion | In the Bigram Affix model, we do the same for the stem as in the Fixed Affix model, but for prefixes and suffixes, we create a bigram language model in the finite state machine. |
Experiments | Bigram based reordering. |
Experiments | First we consider a bigram language model and the algorithms try to find the reordering that maximizes the LM score. |
Experiments | This means that, when using a bigram language model, it is often possible to reorder the words of a randomly permuted reference sentence in such a way that the LM score of the reordered sentence is larger than the LM of the reference. |
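The observation above, that a permutation can out-score the reference under a bigram LM, is easy to verify by exhaustive search on short sentences. A toy sketch with a hand-made probability table (my own illustration, not the paper's experiment):

```python
from itertools import permutations
from math import log

def lm_score(tokens, logprob):
    """Sum of bigram log-probabilities over a padded token sequence."""
    toks = ["<s>"] + list(tokens) + ["</s>"]
    return sum(logprob(a, b) for a, b in zip(toks, toks[1:]))

def best_reordering(tokens, logprob):
    """Exhaustive search (fine for short sentences) for the permutation
    with the highest bigram LM score."""
    return max(permutations(tokens), key=lambda p: lm_score(p, logprob))

# Toy bigram table; unseen pairs get a small floor probability.
probs = {("<s>", "the"): 0.9, ("the", "cat"): 0.9,
         ("cat", "sat"): 0.9, ("sat", "</s>"): 0.9}
lp = lambda a, b: log(probs.get((a, b), 1e-4))
print(best_reordering(["sat", "cat", "the"], lp))  # → ('the', 'cat', 'sat')
```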
Phrase-based Decoding as TSP | • The language model cost of producing the target words of b′ right after the target words of b; with a bigram language model, this cost can be precomputed directly from b and b′. |
Phrase-based Decoding as TSP | This restriction to bigram models will be removed in Section 4.1. |
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Approaches | The clusters are formed by a greedy hierarchical clustering algorithm that finds an assignment of words to classes by maximizing the likelihood of the training data under a latent-class bigram model. |
Approaches | First, for SRL, it has been observed that feature bigrams (the concatenation of simple features such as a predicate’s POS tag and an argument’s word) are important for state-of-the-art performance (Zhao et al., 2009; Bjorkelund et al., 2009). |
Approaches | We consider both template unigrams and bigrams, combining two templates in sequence. |
Experiments | Each of IG0 and IGB also includes 32 template bigrams selected by information gain on 1000 sentences; we select a different set of template bigrams for each dataset. |
Experiments | However, the original unigram Bjorkelund features, which were tuned for a high-resource model, obtain higher F1 than our information gain set using the same features in unigram and bigram templates (IGB). |
Experiments | In Czech, we disallowed template bigrams involving path-grams. |
Experiments | Consistent with findings in the literature (Cui et al., 2006; Dave et al., 2003; Gamon and Aue, 2005), on the large corpus of movie review texts, the in-domain-trained system based solely on unigrams had lower accuracy than the similar system trained on bigrams. |
Experiments | But the trigrams fared slightly worse than bigrams . |
Experiments | On sentences, however, we have observed an inverse pattern: unigrams performed better than bigrams and trigrams. |
Factors Affecting System Performance | System runs with unigrams, bigrams, and trigrams as features and with different training set sizes are presented. |
Integrating the Corpus-based and Dictionary-based Approaches | In the ensemble of classifiers, they used a combination of nine SVM-based classifiers deployed to learn unigrams, bigrams , and trigrams on three different domains, while the fourth domain was used as an evaluation set. |
Abstract | We consider bigram and trigram templates for generating potentially deterministic constraints. |
Abstract | A bigram constraint includes one contextual word (w−1 or w+1) or the corresponding morph feature; and a trigram constraint includes both contextual words or their morph features.
Abstract | Precision/recall/F1: bigram 0.993/0.841/0.911; trigram 0.996/0.608/0.755
Abstract | To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or Atag): |
Abstract | please AVB and 9 is the number of bigrams in T, excluding sentence initial and final markers. |
Abstract | Unigrams and Bigrams: As a different sort of baseline, we considered the results of a bag-of-words based classifier.
Experiments | The baselines NB and BINB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. |
Experiments | SVM is a support vector machine with unigram and bigram features. |
Experiments | unigram, bigram, trigram 92.6 MAXENT POS, chunks, NE, supertags
Introduction | On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. |
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Automated Approaches to Deceptive Opinion Spam Detection | We use SVMlight (Joachims, 1999) to train our linear SVM models on all three approaches and feature sets described above, namely POS, LIWC, UNIGRAMS, BIGRAMS+, and TRIGRAMS+.
Conclusion and Future Work | Specifically, our findings suggest the importance of considering both the context (e.g., BIGRAMS+) and motivations underlying a deception, rather than strictly adhering to a universal set of deception cues (e.g., LIWC).
Results and Discussion | This suggests that a universal set of keyword-based deception cues (e.g., LIWC) is not the best approach to detecting deception, and a context-sensitive approach (e.g., BIGRAMS+) might be necessary to achieve state-of-the-art deception detection performance.
Results and Discussion | Additional work is required, but these findings further suggest the importance of moving beyond a universal set of deceptive language features (e.g., LIWC) by considering both the contextual (e.g., BIGRAMS+) and motivational parameters underlying a deception as well.
Complexity Analysis | For the monolingual bigram model, the number of states in the HMM is U times more than that of the monolingual unigram model, as the states at specific position of F are not only related to the length of the current word, but also related to the length of the word before it. |
Complexity Analysis | NPY(bigram)a 0.750 0.802 17 m; NPY(trigram)a 0.757 0.807
Complexity Analysis | HDP(bigram)b 0.723 — 10 h; Fitnessc — 0.667 — —; Prop.
Related Work | We learn embeddings for unigrams, bigrams and trigrams separately with the same neural network and the same parameter settings.
Related Work | 25$ employs the embedding of unigrams, bigrams and trigrams separately and conducts the matrix-vector operation of ac on the sequence represented by columns in each lookup table. |
Related Work | L_uni, L_bi and L_tri are the lookup tables of the unigram, bigram and trigram embeddings, respectively.
Introduction | twitter unigram TTT * YES (54%); twitter bigram TTT * YES (52%); personal unigram MT * YES (52%); personal bigram — NO (48%)
Introduction | We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{w∈T} log p(w), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model).16 We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
Introduction | [16] The tokens [at], [hashtag], [url] were ignored in the unigram-model case to prevent their undue influence, but retained in the bigram model to capture longer-range usage (“combination”) patterns.
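The tweet-scoring scheme described above (a tweet's mean log-probability under a community or personal language model) can be sketched as follows; the add-alpha smoothing and the `<s>` start symbol are assumptions of this sketch, not details given in the excerpt:

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=0.01):
    """Add-alpha smoothed bigram model from tokenized texts.
    (The paper's actual smoothing scheme is not specified here.)"""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(unigrams)
    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return prob

def tweet_score(tokens, prob):
    """Mean log-probability (1/|T|) * sum of log p(w | w_prev) over the tweet."""
    toks = ["<s>"] + tokens
    logps = [math.log(prob(p, w)) for p, w in zip(toks, toks[1:])]
    return sum(logps) / len(logps)
```

A tweet that matches the training distribution scores closer to zero (higher) than one that does not.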
Experiments | Bigrams: an implementation of the bigram classifier for soft pattern matching proposed by Cui et al.
Experiments | The probability is calculated as a mixture of bigram and |
Experiments | WCL-1 99.88 42.09 59.22 76.06; WCL-3 98.81 60.74 75.23 83.48; Star patterns 86.74 66.14 75.05 81.84; Bigrams 66.70 82.70 73.84 75.80
Conditional Random Fields | In the sequel, we distinguish between two types of feature functions: unigram features f_{y,x}, associated with parameters μ_{y,x}, and bigram features f_{y′,y,x}, associated with parameters λ_{y′,y,x}.
Conditional Random Fields | On the other hand, bigram features {f_{y′,y,x}}_{(y′,y,x)∈Y²×X} are helpful in modelling dependencies between successive labels.
Conditional Random Fields | Assume the set of bigram features {λ_{y′,y,x_{t+1}}}_{(y′,y)∈Y²} is sparse with only r(x_{t+1}) ≪ |Y|² non-null values, and define the |Y| × |Y| sparse matrix
Experiments | We also kept NPs with only 1 modifier to be used for generating <modifier, head noun> bigram counts at training time.
Experiments | For example, the NP “the beautiful blue Macedonian vase” generates the following bigrams: <beautiful blue>, <blue Macedonian>, and <beautiful Macedonian>, along with the 3-gram <beautiful blue Macedonian>.
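Reading the example above as "every ordered pair of prenominal modifiers yields a bigram", a minimal sketch (the function name is hypothetical):

```python
from itertools import combinations

def modifier_ngrams(modifiers):
    """From an NP's prenominal modifiers (in surface order), generate all
    ordered modifier pairs as bigrams, plus the full modifier sequence.
    Illustrative reading of the worked example in the text."""
    bigrams = list(combinations(modifiers, 2))  # pairs keep surface order
    return bigrams, tuple(modifiers)
```

For "the beautiful blue Macedonian vase", this yields the three modifier bigrams and the 3-gram from the example.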
Experiments | In addition, we also store a table that keeps track of bigram counts for < M, H >, where H is the head noun of an NP and M is the modifier closest to it. |
Related Work | Shaw and Hatzivassiloglou also use a transitivity method to fill out parts of the Count table where bigrams are not actually seen in the training data but their counts can be inferred from other entries in the table, and they use a clustering method to group together modifiers with similar positional preferences. |
Related Work | Shaw and Hatzivassiloglou report a highest accuracy of 94.93% and a lowest accuracy of 65.93%, but since their methods depend heavily on bigram counts in the training corpus, they are also limited in how informed their decisions can be if modifiers in the test data are not present at training time. |
Experiments & Results 4.1 Experimental Setup | The measures are evaluated by fixing the window size to 4 and maximum candidate paraphrase length to 2 (e.g., bigram).
Experiments & Results 4.1 Experimental Setup | [Figure legend: unigram, bigram, trigram, quadgram]
Experiments & Results 4.1 Experimental Setup | Type/Node, MRR %, RCL %: Bipartite unigram 5.2, 12.5; Bipartite bigram 6.8, 15.7; Tripartite unigram 5.9, 12.6; Tripartite bigram 6.9, 15.9; Baseline bigram 3.9, 7.7
Approach | (a) Word unigrams (b) Word bigrams |
Approach | (a) PoS unigrams (b) PoS bigrams (c) PoS trigrams |
Approach | Word unigrams and bigrams are lower-cased and used in their inflected forms. |
Previous work | The Bayesian Essay Test Scoring sYstem (BETSY) (Rudner and Liang, 2002) uses multinomial or Bernoulli Naive Bayes models to classify texts into different classes (e.g., pass/fail, grades A–F) based on content and style features such as word unigrams and bigrams, sentence length, number of verbs, noun-verb pairs etc.
Validity tests | (a) word unigrams within a sentence (b) word bigrams within a sentence (c) word trigrams within a sentence |
Methodology | In response to these difficulties in differentiating linguistic registers, we compute two different PMI scores for character-based bigrams from two large corpora representing news and microblogs as features. |
Methodology | In addition, we also convert all the character-based bigrams into Pinyin-based bigrams (ignoring tones5) and compute the Pinyin-level PMI in the same way. |
Methodology | These features capture inconsistent use of the bigram across the two domains, which helps to distinguish informal words.
Introduction | where we explicitly distinguish the unigram feature function φᵁ and bigram feature function φᴮ. Comparing the form of the two functions, we can see that our discussion on HMMs can be extended to perceptrons by substituting Σ_k w_k φᵁ_k(x, y_n) and Σ_k w_k φᴮ_k(x, y_{n−1}, y_n) for log p(x_n|y_n) and log p(y_n|y_{n−1}).
Introduction | For bigram features, we compute its upper bound offline. |
Introduction | The simplest case is that the bigram features are independent of the token sequence :13. |
Evaluation framework | The second clique contains query bigrams that match |
Evaluation framework | document bigrams in 2-word ordered windows (‘#1’), λ2 = 0.1.
Evaluation framework | The third clique uses the same bigrams as clique 2 with an 8-word unordered window (‘#uw8’), λ3 = 0.05.
Introduction | Integration of the identified catenae in queries also improves IR effectiveness compared to a highly effective baseline that uses sequential bigrams with no linguistic knowledge. |
Change in Description Length | Suppose that the original sequence W_{i−1} is N words long, the selected word types x and y occur k and l times, respectively, and the x-y bigram occurs m times in W_{i−1}.
Change in Description Length | In the new sequence W_i, each of the m bigrams is replaced with an unseen word z = xy.
Regularized Compression | Hence, a new sequence W_i is created in the i-th iteration by merging all the occurrences of some selected bigram (x, y) in the original sequence W_{i−1}.
Regularized Compression | Note that f(x, y) is the bigram frequency, |W_{i−1}| the sequence length of W_{i−1}, and ΔH(W_{i−1}, x, y) = H(W_i) − H(W_{i−1}) is the difference between the empirical Shannon entropy measured on W_i and W_{i−1}, using maximum likelihood estimates.
Regularized Compression | In the new sequence W_i, each occurrence of the x-y bigram is replaced with a new (conceptually unseen) word z.
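The merge step described in this section (replacing every occurrence of a selected bigram in the previous sequence with an unseen word to form the new sequence) can be sketched as below; the greedy left-to-right scan is an assumption of the sketch:

```python
def merge_bigram(seq, x, y, z):
    """Create W_i from W_{i-1} by replacing every occurrence of the
    bigram (x, y) with the new word z (greedy left-to-right scan)."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == x and seq[i + 1] == y:
            out.append(z)   # the bigram collapses into one token
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out
```

Each merge shortens the sequence by one token per bigram occurrence, which is what drives the compression objective.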
Experimental Results | Moreover, a bigram (i.e., “2gram”) achieves the best BLEU scores among the four different orders of VMs. |
Experimental Results | This is necessarily true, but it is interesting to see that most of the improvement is obtained just by moving from a unigram to a bigram model. |
Experimental Results | Indeed, although Table 3 shows that better approximations can be obtained by using higher-order models, the best BLEU score in Tables 2a and 2c was obtained by the bigram model. |
Variational Approximate Decoding | However, because q* only approximates p, y* of (13) may be locally appropriate but globally inadequate as a translation of x. Observe, e.g., that an n-gram model q* will tend to favor short strings y, regardless of the length of x. Suppose x = le chat chasse la souris (“the cat chases the mouse”) and q* is a bigram approximation to p(y | x). Presumably q*(the | START), q*(mouse | the), and q*(END | mouse) are all large in q*. So the most probable string y* under q* may be simply “the mouse,” which is short and has a high probability but fails to cover x.
Baseline parser | bigrams s0w s1w, s0w s1c, s0c s1w, s0c s1c, s0w q0w, s0w q0t, s0c q0w, s0c q0t, q0w q1w, q0w q1t, q0t q1w, q0t q1t, s1w q0w, s1w q0t, s1c q0w, s1c q0t
Semi-supervised Parsing with Large Data | [2] From the dependency trees, we extract bigram lexical dependencies (w1, w2, L/R) where the symbol L (R) means that w1 (w2) is the head of w2 (w1).
Semi-supervised Parsing with Large Data | (2009), we assign categories to bigram and trigram items separately according to their frequency counts. |
Semi-supervised Parsing with Large Data | Hereafter, we refer to the bigram and trigram lexical dependency lists as BLD and TLD, respectively. |
Experiments | Phrase extraction and indexing: We evaluate our proposed method on two different types of phrases: syntactic head-modifier pairs (syntactic phrases) and simple bigram phrases (statistical phrases). |
Experiments | Since our method is not limited to a particular type of phrases, we have also conducted experiments on statistical phrases (bigrams) with a reduced set of features directly applicable: RMO, RSO, PD5, DF, and CPP; the features requiring linguistic preprocessing (e.g., PPT) are not used, because it is unrealistic to use them under a bigram-based retrieval setting.
Experiments | [5] In most cases, the distance between words in a bigram is 1, but sometimes, it could be more than 1 because of the effect of stopword removal.
Experiments and Discussions | We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams).
Experiments and Discussions | Note that R-2 is a measure of bigram recall and the sumHLDA of HybHSum2 is built on unigrams rather than bigrams.
Regression Model | We similarly include bigram features in the experiments. |
Regression Model | We also include bigram extensions of DMF features. |
Regression Model | We use sentence bigram frequency, sentence rank in a document, and sentence size as additional features.
Capturing Paradigmatic Relations via Word Clustering | The quality is defined based on a class-based bigram language model as follows. |
Capturing Paradigmatic Relations via Word Clustering | The objective function is maximizing the likelihood ∏_{i=1}^{n} P(w_i | w_1, ..., w_{i−1}) of the training data given a partially class-based bigram model of the form
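A class-based bigram model of the kind referenced above typically factors as P(w_i | w_{i−1}) = P(C(w_i) | C(w_{i−1})) · P(w_i | C(w_i)); a minimal unsmoothed sketch (all names and the count conventions are hypothetical):

```python
def class_bigram_prob(w_prev, w, cls, class_bigram, class_count, word_count):
    """Class-based bigram probability in the form
    P(w | w_prev) = P(C(w) | C(w_prev)) * P(w | C(w)).
    `cls` maps words to class ids; the counts are plain MLE token
    counts (a real system would smooth them)."""
    c_prev, c = cls[w_prev], cls[w]
    p_trans = class_bigram.get((c_prev, c), 0) / class_count[c_prev]  # class transition
    p_emit = word_count[w] / class_count[c]                           # word emission
    return p_trans * p_emit
```

The factorization is what makes the clustering objective tractable: only class-transition counts and per-class word counts are needed.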
State-of-the-Art | Word bigrams: w−2_w−1, w−1_w, w_w+1, w+1_w+2. In order to better handle unknown words, we extract morphological features: character n-gram prefixes and suffixes for n up to 3.
State-of-the-Art | (2009) introduced a bigram HMM model with latent variables (Bigram HMM-LA in the table) for Chinese tagging.
State-of-the-Art | Trigram HMM (Huang et al., 2009) 93.99%; Bigram HMM-LA (Huang et al., 2009) 94.53%; Our tagger 94.69%
Introduction | The extension of the feature representation used by previous works with bigrams and trigrams and an evaluation of the benefit of using longer keywords in hedge classification.
Methods | For trigrams, bigrams and unigrams (processed separately) we calculated a new class-conditional probability for each feature c, discarding those observations of c in speculative instances where c was not among the two highest ranked candidates.
Results | These keywords were many times used in a mathematical context (referring to probabilities) and thus expressed no speculative meaning, while such uses were not represented in the FlyBase articles (otherwise bigram or trigram features could have captured these non-speculative uses). |
Results | One third of the features used by our advanced model were either bigrams or trigrams. |
Results | Our model using just unigram features achieved a BEP(spec) score of 78.68% and Fβ=1(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and Fβ=1(spec) scores were 5.23% and 4.97%, respectively).
A Class-based Model of Agreement | The features are indicators for (character, position, label) triples for a five-character window and bigram label transition indicators.
A Class-based Model of Agreement | Bigram transition features gbt encode local agreement relations. |
A Class-based Model of Agreement | We trained a simple add-1 smoothed bigram language model over gold class sequences in the same treebank training data: |
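An add-1 smoothed bigram model over class sequences, as mentioned above, can be sketched as follows (the `<s>` start symbol and the vocabulary convention are assumptions of this sketch):

```python
from collections import Counter

class AddOneBigramLM:
    """Add-1 (Laplace) smoothed bigram model, a minimal sketch of the
    kind of class-sequence model described in the excerpt."""
    def __init__(self, sequences):
        self.uni, self.bi = Counter(), Counter()
        for seq in sequences:
            seq = ["<s>"] + list(seq)
            self.uni.update(seq)
            self.bi.update(zip(seq, seq[1:]))
        self.V = len(self.uni)  # vocabulary size including <s>

    def prob(self, prev, cur):
        # add-1: (c(prev, cur) + 1) / (c(prev) + V)
        return (self.bi[(prev, cur)] + 1) / (self.uni[prev] + self.V)
```

Add-1 smoothing guarantees every transition a nonzero probability, which matters for short gold class sequences.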
MWE-dedicated Features | We use word unigrams and bigrams in order to capture multiwords present in the training section and to extract lexical cues to discover new MWEs. |
MWE-dedicated Features | For instance, the bigram coup de is often the prefix of compounds such as coup de pied (kick), coup de foudre (love at first sight), coup de main (help). |
MWE-dedicated Features | We use part-of-speech unigrams and bigrams in order to capture MWEs with irregular syntactic structures that might indicate the id-iomacity of a word sequence. |
Evaluation | The remaining 93 unlabeled games are used to train unigram, bigram, and trigram grounded language models.
Evaluation | Only unigrams, bigrams, and trigrams that are not proper names, appear more than three times, and are not composed only of stop words were used.
Evaluation | with traditional unigram, bigram, and trigram language models generated from a combination of the closed captioning transcripts of all training games and data from the switchboard corpus (see below).
Linguistic Mapping | Estimating bigram and trigram models can be done by collecting counts over word pairs or triples, and normalizing the resulting conditional distributions.
Approach | In our formulation, each hidden state corresponds to an issue or topic, characterized by a distribution over words and bigrams appearing in privacy policy sections addressing that issue. |
Approach | 0,; is generated by repeatedly sampling from a distribution over terms that includes all unigrams and bigrams except those that occur in fewer than 5% of the documents and in more than 98% of the documents. |
Approach | models (e.g., a bigram may be generated by as many as three draws from the emission distribution: once for each unigram it contains and once for the bigram).
Experiment | Our second baseline is latent Dirichlet allocation (LDA; Blei et al., 2003), with ten topics and online variational Bayes for inference (Hoffman et al., 2010).7 To more closely match our models, LDA is given access to the same unigram and bigram tokens. |
Experiments | In the case of bigrams, the perplexities of Theory2 are almost the same as that of Zipf2 when the size of the reduced vocabulary is large.
Perplexity on Reduced Corpora | This model seems to be stupid, since we can easily notice that the bigram “is is” is quite frequent, and the two bigrams “is a” and “a is” have the same frequency. |
Perplexity on Reduced Corpora | For example, the decay function g2 of bigrams is as follows:
Perplexity on Reduced Corpora | They pointed out that the exponent of bigrams is about 0.66, and that of 5-grams is about 0.59 in the Wall Street Journal corpus (WSJ 87). |
Predicting Direction of Power | Baseline (Always Superior) 52.54; Baseline (Word Unigrams + Bigrams) 68.56; THRNew 55.90; THRPR 54.30; DIAPR 54.05; THRPR + THRNew 61.49; DIAPR + THRPR + THRNew 62.47; LEX 70.74; LEX + DIAPR + THRPR 67.44; LEX + DIAPR + THRPR + THRNew 68.56; BEST (= LEX + THRNew) 73.03; BEST (Using p1 features only) 72.08; BEST (Using IMt features only) 72.11; BEST (Using Mt only) 71.27; BEST (No Indicator Variables) 72.44
Predicting Direction of Power | We found the best setting to be using both unigrams and bigrams for all three types of ngrams, by tuning on our dev set.
Predicting Direction of Power | We also use a stronger baseline using word unigrams and bigrams as features, which obtained an accuracy of 68.6%. |
Conclusion | We have presented a noisy-channel model that simultaneously learns a lexicon, a bigram language model, and a model of phonetic variation, while using only the noisy surface forms as training data. |
Introduction | Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model). |
Introduction | Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability. |
Related work | In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model. |
Joint POS Tagging and Parsing with Nonlocal Features | (Collins and Koo, 2005) (Charniak and Johnson, 2005) Rules CoPar HeadTree Bigrams CoLenPar |
Joint POS Tagging and Parsing with Nonlocal Features | Grandparent Bigrams Heavy |
Joint POS Tagging and Parsing with Nonlocal Features | Lexical Bigrams Neighbours |
Count distributions | For example, suppose that our data consists of the following bigrams , with their weights: |
Word Alignment | That is, during the E step, we calculate the distribution of C(e, f) for each e and f, and during the M step, we train a language model on bigrams e f using expected KN smoothing (that is, with u = e and w = f). |
Word Alignment | (The latter case is equivalent to a backoff language model, where, since all bigrams are known, the lower-order model is never used.) |
Word Alignment | This is much less of a problem in KN smoothing, where p’ is estimated from bigram types rather than bigram tokens. |
Learning what to transliterate | From the stat section we collect statistics as to how often every word, bigram or trigram occurs, and what distribution of name/non-name patterns these ngrams have. |
Learning what to transliterate | The name distribution bigram |
Learning what to transliterate | stat corpus bitext, the first word is marked up as a non-name (“0”) and the second as a name (“1”), which strongly suggests that in such a bigram context, aljzyre better be translated as island or peninsula, and not be transliterated as Al-Jazeera.
Transliterator | The same consonant skeleton indexing process is applied to name bigrams (47,700,548 unique with 167,398,054 skeletons) and trigrams (46,543,712 unique with 165,536,451 skeletons). |
Analysis | with a bigram HMM with four language clusters. |
Inference | where n(t) and n(t, t′) are, respectively, unigram and bigram tag counts excluding those containing character w. Conversely, n′(t) and n′(t, t′) are, respectively, unigram and bigram tag counts only including those containing character w. The notation a^{(n)} denotes the ascending factorial: a(a + 1) ⋯ (a + n − 1).
Inference | where n(j, k, t) and n(j, k, t, t′) are the numbers of languages currently assigned to cluster k which have more than j occurrences of unigram (t) and bigram (t, t′), respectively.
Model | We note that in practice, we implemented a trigram version of the model,2 but we present the bigram version here for notational clarity. |
Reranking Features | Bigram.
Reranking Features | Indicates whether a given bigram of nonterminal/terminals occurs for a given parent nonterminal: f(L1 → L2 : L3) = 1.
Reranking Features | Grandparent Bigram.
Approach | Bigrams (B) - the text is represented as a bag of bigrams (larger n-grams did not help). |
Approach | Generalized bigrams (Bg) - same as above, but the words are generalized to their WNSS. |
Experiments | The first chosen feature is the translation probability computed between the Bg question and answer representations (bigrams with words generalized to their WNSS tags).
Experiments | This is caused by the fact that the BM25 formula is less forgiving with errors of the NLP processors (due to the high idf scores assigned to bigrams and dependencies), and the WNSS tagger is the least robust component in our pipeline. |
Computing Feature Expectations | Hyper-edges (boxes) are annotated with normalized transition probabilities, as well as the bigrams produced by each rule application. |
Computing Feature Expectations | The expected count of the bigram “man with” is the sum of posterior probabilities of the two hyper-edges that produce it. |
Computing Feature Expectations | 5For example, decoding under a variational approximation to the model’s posterior that decomposes over bigram probabilities is equivalent to fast consensus decoding with |
Discussion | In the figure, phone bigram TF-IDF is labeled p2; phonetic alignment with dynamic programming is labeled DP.
Experiments | The TF-IDF features used in the experiments are based on phone bigrams.
Feature functions | In practice, we only consider n-grams of a certain order (e.g., bigrams).
Feature functions | Then for the bigram /l iy/, we have TF_{/l iy/} = 1/5 (one out of five bigrams in the word), and IDF_{/l iy/} = log(2/1) (one word out of two in the dictionary).
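The worked example above (TF = 1/5 for one occurrence among five bigrams, IDF = log(2/1) for one dictionary word out of two) corresponds to a sketch like this; the function names and exact TF/IDF variants are assumptions:

```python
import math

def phone_bigrams(pron):
    """Adjacent phone pairs in a pronunciation (a list of phone strings)."""
    return list(zip(pron, pron[1:]))

def tf_idf(bigram, pron, dictionary):
    """TF-IDF for a phone bigram: TF is the bigram's relative frequency
    in the word's pronunciation; IDF = log(N / df) over dictionary entries."""
    bgs = phone_bigrams(pron)
    tf = bgs.count(bigram) / len(bgs)
    df = sum(1 for p in dictionary if bigram in phone_bigrams(p))
    idf = math.log(len(dictionary) / df)
    return tf * idf
```

A pronunciation with five bigrams, one of which is the target, in a two-word dictionary where only that word contains it, reproduces the (1/5) · log(2) value from the example.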
Clustering-based word representations | The Brown algorithm is a hierarchical clustering algorithm which clusters words to maximize the mutual information of bigrams (Brown et al., 1992). |
Clustering-based word representations | So it is a class-based bigram language model. |
Clustering-based word representations | One downside of Brown clustering is that it is based solely on bigram statistics, and does not consider word usage in a wider context. |
Experiments | Web page hits for word pairs and trigrams are obtained using a simple heuristic query to the search engine Google.11 Inflected queries are performed by expanding a bigram or trigram into all its morphological forms. |
Experiments | Although Google hits is noisier, it has very much larger coverage of bigrams or trigrams. |
Experiments | This means that if pages indexed by Google doubles, then so do the bigrams or trigrams frequencies. |
Related Work | Keller and Lapata (2003) evaluated the utility of using web search engine statistics for unseen bigrams.
Experiments | All of the three smoothing methods for bigram and trigram LMs are examined both using back-off models
Pinyin Input Method Model | The edge weight is the negative logarithm of the conditional probability P(S_{j+1,k} | S_{j,i}) that a syllable S_{j,i} is followed by S_{j+1,k}, which is given by a bigram language model of pinyin syllables:
Pinyin Input Method Model | W_E(V_{j,i} → V_{j+1,k}) = −log P(V_{j+1,k} | V_{j,i}). Although the model is formulated on a first-order HMM, i.e., the LM used for the transition probability is a bigram one, it is easy to extend the model to take advantage of a higher-order n-gram LM, by tracking a longer history while traversing the graph.
Syllabification with Structured SVMs | For example, the bigram bl frequently occurs within a single English syllable, while the bigram lb generally straddles two syllables. |
Syllabification with Structured SVMs | Thus, in addition to the single-letter features outlined above, we also include in our representation any bigrams, trigrams, four-grams, and five-grams that fit inside our context window.
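The feature extraction described above (all n-grams up to length five that fit inside the context window) can be sketched as below; the window-centering convention is an assumption:

```python
def window_ngram_features(word, i, size=5, max_n=5):
    """All letter n-grams (n = 1..max_n) that fit inside a context
    window of `size` letters centered on position i."""
    lo = max(0, i - size // 2)
    hi = min(len(word), i + size // 2 + 1)
    window = word[lo:hi]
    feats = []
    for n in range(1, max_n + 1):
        for j in range(len(window) - n + 1):
            feats.append(window[j:j + n])   # substring of length n
    return feats
```

For example, a 3-letter window yields the single letters plus the two bigrams it contains.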
Syllabification with Structured SVMs | As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set. |
Experiments | Local instances: Rule 10,851; Word 20,328; WordEdges 454,101; CoLenPar 22; Bigram<> 10,292; Trigram<> 24,677; HeadMod<> 12,047; DistMod<> 16,017. Nonlocal instances: ParentRule 18,019; WProj 27,417; Heads 70,013; HeadTree 67,836; Heavy 1,401; NGramTree 67,559; RightBranch 2. Total feature instances: 800,582.
Experiments | We also restricted NGramTree to be on bigrams only. |
Forest Reranking | Figure 2(d) returns the minimum tree fragment spanning a bigram, in this case “saw” and “the”, and should thus be computed at the smallest common ancestor of the two, which is the VP node in this example.
Conclusion and future work | We then investigated adaptor grammars that incorporate one additional kind of information, and found that modeling collocations provides the greatest improvement in word segmentation accuracy, resulting in a model that seems to capture many of the same interword dependencies as the bigram model of Goldwater et al. |
Word segmentation with adaptor grammars | It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance. |
Word segmentation with adaptor grammars | This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model. |
Background 2.1 Dependency parsing | The algorithm then repeatedly merges the pair of clusters which causes the smallest decrease in the likelihood of the text corpus, according to a class-based bigram language model defined on the word clusters. |
Conclusions | To begin, recall that the Brown clustering algorithm is based on a bigram language model. |
Feature design | (2005a), and consists of indicator functions for combinations of words and parts of speech for the head and modifier of each dependency, as well as certain contextual tokens.1 Our second-order baseline features are the same as those of Carreras (2007) and include indicators for triples of part-of-speech tags for sibling interactions and grandparent interactions, as well as additional bigram features based on pairs of words involved in these higher-order interactions.
Experiments | Excluding q_{t−1} = q_t bigrams (leading to 0.32M frames from 2.39M frames in “all”) offers a glimpse of expected performance differences were duration modeling to be included in the models.
Limitations and Desiderata | To produce Figures 1 and 2, a small fraction of probability mass was reserved for unseen bigram transitions (as opposed to backing off to unigram probabilities). |
The Extended-Degree-of-Overlap Model | The EDO model mitigates R-specificity because it models each bigram (q_{t−1}, q_t) = (S_i, S_j) as the modified bigram (m, [o_{ij}, n_j]), involving three scalars, each of which is a sum, a commutative (and therefore rotation-invariant) operation.
Automatic Metaphor Recognition | They use the hyponymy relation in WordNet and word bigram counts to predict metaphors at a sentence level.
Automatic Metaphor Recognition | Hereby they calculate bigram probabilities of verb-noun and adjective-noun pairs (including the hyponyms/hypernyms of the noun in question). |
Automatic Metaphor Recognition | However, by using bigram counts over verb-noun pairs, Krishnakumaran and Zhu (2007) lose a great deal of information compared to a system extracting verb-object relations from parsed text.
Models 2.1 Baseline Models | After CRF based recovery of the suffix tag sequence, we use a bigram language model trained on a full segmented version on the training data to recover the original vowels. |
Models 2.1 Baseline Models | We used bigrams only, because the suffix vowel harmony alternation depends only upon the preceding phonemes in the word from which it was segmented. |
Models 2.1 Baseline Models | original training data: koskevaa mietintöä käsitellään; segmentation: koske+ +va+ +a mietintö+ +ä käsi+ +te+ +llä+ +ä+ +n (train bigram language model with mapping A = {a, ä}); map final suffix to abstract tag-set: koske+ +va+ +A mietintö+ +A käsi+ +te+ +llä+ +ä+ +n (train CRF model to predict the final suffix); peeling off final suffix: koske+ +va+ mietintö+ käsi+ +te+ +llä+ +ä+ (train SMT model on this transformation of training data) (a) Training
Experiments | In particular, we use the unigrams of the current and its neighboring words, word bigrams, prefixes and suffixes of the current word, capitalization, all-number, punctuation, and tag bigrams for POS, CoNLL2000 and CoNLL 2003 datasets. |
Experiments | For supertag dataset, we use the same features for the word inputs, and the unigrams and bigrams for gold POS inputs. |
Problem formulation | Bigram features are of the form f_k(y_t, y_{t−1}, x_t), which are concerned with both the previous and the current labels.
Experiments | Besides unigram and bigram, the most effective textual feature is URL.
Proposed Features | 3.1.1 Unigrams and Bigrams: The most common type of feature for text classification
Proposed Features | feature selection method χ² (Yang and Pedersen, 1997) to select the top 200 unigrams and bigrams as features.
Datasets | Their features come from the Linguistic Inquiry and Word Count lexicon (LIWC) (Pennebaker et al., 2001), as well as from lists of "sticky bigrams" (Brown et al., 1992) strongly associated with one party or another (e.g., "illegal aliens" implies conservative, "universal healthcare" implies liberal).
Datasets | We first extract the subset of sentences that contain any words in the LIWC categories of Negative Emotion, Positive Emotion, Causation, Anger, and Kill verbs. After computing a list of the top 100 sticky bigrams for each category, ranked by log-likelihood ratio, and selecting another subset from the original data that includes only sentences containing at least one sticky bigram, we take the union of the two subsets.
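Ranking bigrams by log-likelihood ratio is commonly done with Dunning's G² statistic over a 2×2 contingency table; a minimal sketch (the `llr` helper and its count conventions are illustrative, not the paper's code):

```python
import math

def llr(c12, c1, c2, n):
    # Dunning's G^2 log-likelihood ratio for a bigram (w1, w2):
    # c12 = count(w1 w2), c1 = count(w1 *), c2 = count(* w2),
    # n = total number of bigram tokens.
    obs = [c12, c1 - c12, c2 - c12, n - c1 - c2 + c12]
    rows, cols = [c1, n - c1], [c2, n - c2]
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = obs[2 * i + j]
            e = rows[i] * cols[j] / n  # expected count under independence
            if o > 0:
                g += o * math.log(o / e)
    return 2.0 * g
```

When the observed counts match the independence assumption the score is zero; strongly associated ("sticky") word pairs get large scores.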
Related Work | They use an HMM-based model, defining the states as a set of fine-grained political ideologies, and rely on a closed set of lexical bigram features associated with each ideology, inferred from a manually labeled ideological books corpus. |
Experimental Design | Consecutive Word/Bigram/Trigram This feature family targets adjacent repetitions of the same word, bigram or trigram, e.g., 'show me the show me the'.
Problem Formulation | The weight of this rule is the bigram probability of two records conditioned on their type, multiplied with a normalization factor λ.
Problem Formulation | Rule (6) defines the expansion of field F to a sequence of (binarized) words W, with a weight equal to the bigram probability of the current word given the previous word, the current record, and field. |
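As a rough sketch of such a weight, an unsmoothed MLE bigram probability can be read off raw counts; `bigram_prob` and the toy count table are hypothetical, standing in for the conditional distributions estimated from the corpus:

```python
def bigram_prob(counts, prev, curr):
    # Unsmoothed MLE estimate of p(curr | prev) from raw bigram counts.
    # `counts` maps (prev, curr) pairs to frequencies (illustrative data).
    total = sum(c for (p, _), c in counts.items() if p == prev)
    return counts.get((prev, curr), 0) / total if total else 0.0

counts = {("a", "b"): 3, ("a", "c"): 1}
```

In the grammar above the same idea is applied at two levels: bigrams over record types, and bigrams over words conditioned additionally on the current record and field.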
Structure-based Stacking | • Character unigrams: c_k (i − l ≤ k ≤ i + l)
Structure-based Stacking | • Character bigrams: c_k c_{k+1} (i − l ≤ k < i + l)
Structure-based Stacking | • Character label bigrams: c_k^{ppd} c_{k+1}^{ppd} (i − l_ppd ≤ k < i + l_ppd)
Structure-based Stacking | • Bigram features: C(s_k)C(s_{k+1}) (i − l_C ≤ k < i + l_C), T_ctb(s_k)T_ctb(s_{k+1}) (i − l_ctb ≤ k < i + l_ctb), T_ppd(s_k)T_ppd(s_{k+1}) (i − l_ppd ≤ k < i + l_ppd)
Compressive Summarization | (2011), we used stemmed word bigrams as concepts, to which we associate the following concept features (Φ_cov): indicators for document counts, features indicating whether each of the words in the bigram is a stop-word, the earliest position in a document at which each concept occurs, as well as two- and three-way conjunctions of these features.
Experiments | We generated oracle extracts by maximizing bigram recall with respect to the manual abstracts, as described in Berg-Kirkpatrick et al. |
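The exact oracle is typically solved as an ILP, but a greedy approximation conveys the idea; the sketch below (with hypothetical `greedy_oracle` and toy data) adds, at each step, the sentence that covers the most not-yet-covered reference bigrams:

```python
def bigram_set(tokens):
    return set(zip(tokens, tokens[1:]))

def greedy_oracle(sentences, abstract, max_sents=2):
    # Greedily add the sentence with the largest gain in newly covered
    # reference bigrams; stop when no sentence adds coverage.
    ref = bigram_set(abstract.split())
    chosen, covered = [], set()
    for _ in range(max_sents):
        best, gain = None, 0
        for s in sentences:
            if s in chosen:
                continue
            g = len((bigram_set(s.split()) & ref) - covered)
            if g > gain:
                best, gain = s, g
        if best is None:
            break
        chosen.append(best)
        covered |= bigram_set(best.split()) & ref
    return chosen

sents = ["the cat sat on the mat", "dogs bark loudly", "the mat was red"]
```

Unlike the ILP, the greedy version is not guaranteed to maximize bigram recall, but it is a common and fast baseline for building oracle extracts.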
Extractive Summarization | Previous work has modeled concepts as events (Filatova and Hatzivassiloglou, 2004), salient words (Lin and Bilmes, 2010), and word bigrams (Gillick et al., 2008).
Using subcategorization information | The model has access to the basic features stem and tag, as well as the new features based on subcategorization information (explained below), using unigrams within a window of up to four positions to the right and the left of the current position, as well as bigrams and trigrams for stems and tags (current item + left and/or right item).
Using subcategorization information | In addition to the probability/frequency of the respective functions, we also provide the CRF with bigrams containing the two parts of the tuple, |
Using subcategorization information | By providing the parts of the tuple as unigrams, bigrams or trigrams to the CRF, all relevant information is available: verb, noun and the probabilities for the potential functions of the noun in the sentence. |
Introduction | Knowing that the back-transliterated unigram "blacki" and bigram "blacki shred" are unlikely in English can promote the correct WS for the source word meaning "blackish red".
Use of Language Model | As the English LM, we used Google Web 1T 5-gram Version 1 (Brants and Franz, 2006), limiting it to unigrams occurring more than 2000 times and bigrams occurring more than 500 times. |
Word Segmentation Model | We limit the features to word unigram and bigram features, i.e., φ(y) = Σ_{i=1}^{n} [φ_1(w_i) + φ_2(w_{i−1}, w_i)] for y = w_1 ... w_n.
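A minimal sketch of such a unigram-plus-bigram scoring function, with hypothetical feature-weight dictionaries standing in for the learned parameters:

```python
def score_segmentation(words, uni_w, bi_w):
    # phi(y) = sum_i [phi_1(w_i) + phi_2(w_{i-1}, w_i)], with feature
    # weights looked up in dictionaries (0.0 for unseen features).
    score = sum(uni_w.get(w, 0.0) for w in words)
    score += sum(bi_w.get(p, 0.0) for p in zip(words, words[1:]))
    return score

# Hypothetical weights: the plausible segmentation outscores the bad one.
uni_w = {"black": 1.0, "red": 1.0, "blacki": -0.5}
bi_w = {("black", "red"): 0.5}
```

Decoding then searches over candidate segmentations y for the one maximizing this score.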
Results | The results in Table 5 use the official ROUGE software with standard options and report ROUGE-2 (R-2) (measures bigram overlap) and ROUGE-SU4 (R-SU4) (measures unigram and skip-bigram separated by up to four words).
The Framework | unigram/bigram/skip-bigram (at most four words apart) overlap; unigram/bigram TF/TF-IDF similarity
The Framework | 2011; Ouyang et al., 2011), we use the ROUGE-2 score, which measures bigram overlap between a sentence and the abstracts, as the objective for regression. |
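The core of ROUGE-2 is clipped bigram recall; a simplified sketch (the official toolkit additionally handles stemming, stopword options, and multiple references):

```python
from collections import Counter

def rouge2_recall(candidate, reference):
    # Clipped bigram recall: fraction of reference bigrams (with
    # multiplicity) that also appear in the candidate.
    def bigram_counts(s):
        t = s.lower().split()
        return Counter(zip(t, t[1:]))
    cand, ref = bigram_counts(candidate), bigram_counts(reference)
    overlap = sum(min(c, ref[b]) for b, c in cand.items() if b in ref)
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

Using this per-sentence score as a regression target lets a model learn to rank sentences by how much abstract-like bigram content they carry.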
Introduction | For example, if a bigram parameter is modified due to the presence of some set of trigrams, and then some or all of those trigrams are pruned from the model, the bigram associated with the modified parameter will be unlikely to have an overall expected frequency equal to its observed frequency anymore. |
Marginal distribution constraints | Thus the unigram distribution is with respect to the bigram model, the bigram model is with respect to the trigram model, and so forth. |
Model constraint algorithm | This can be seen particularly clearly at the unigram state, which has an arc for every unigram (on the order of the vocabulary size): for every bigram state (also on the order of the vocabulary size), the naive algorithm must examine every possible arc.
Phrase Ranking based on Relevance | This thread of research models bigrams by encoding them into the generative process. |
Phrase Ranking based on Relevance | For each word, a topic is sampled first, then its status as a unigram or bigram is sampled, and finally the word is sampled from a topic-specific unigram or bigram distribution. |
Phrase Ranking based on Relevance | In (Tomokiyo and Hurst, 2003), a language model approach is used for bigram phrase extraction. |
Task A: Polarity Classification | We studied the influence of unigrams, bigrams and a combination of the two, and saw that the best performing feature set consists of the combination of unigrams and bigrams.
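A minimal sketch of building such a combined unigram-plus-bigram feature bag (the `ngram_features` helper is hypothetical):

```python
from collections import Counter

def ngram_features(text):
    # Bag of unigrams plus bigrams: the combination reported as the
    # best performing feature set in the experiments described above.
    toks = text.lower().split()
    feats = Counter(toks)
    feats.update(" ".join(g) for g in zip(toks, toks[1:]))
    return feats
```

The resulting counts can be fed directly to any standard linear classifier.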
Task A: Polarity Classification | In this paper, we will refer from now on to n-grams as the combination of unigrams and bigrams . |
Task B: Valence Prediction | Those include n-grams (unigrams, bigrams and the combination of the two) and LIWC scores.