Abstract | The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model).
Bigram Expansion Model for Alteration Selection | The query context is modeled by a bigram language model as in (Peng et al. |
Bigram Expansion Model for Alteration Selection | In this work, we used a bigram language model to calculate the probability of each path.
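The path-probability computation described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation; the add-alpha smoothing, the sentence markers, and the function names are assumptions.

```python
import math
from collections import Counter

def train_bigram_lm(corpus, alpha=0.1):
    """Train an add-alpha smoothed bigram LM from tokenized queries."""
    unigrams, bigrams = Counter(), Counter()
    for query in corpus:
        tokens = ["<s>"] + query + ["</s>"]
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = len(unigrams) + 1  # +1 reserves mass for unseen words
    def prob(prev, word):
        return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * vocab)
    return prob

def path_logprob(prob, query):
    """Score one alteration path (a fully altered query) under the bigram LM."""
    tokens = ["<s>"] + query + ["</s>"]
    return sum(math.log(prob(p, w)) for p, w in zip(tokens[:-1], tokens[1:]))
```

The alteration path with the highest log probability would then be selected as the expansion most coherent with the query context.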
Introduction | The query context is modeled by a bigram language model. |
Introduction | which is the most coherent with the bigram model. |
Introduction | We call this model Bigram Expansion. |
Related Work | 2007), a bigram language model is used to determine the alteration of the head word that best fits the query. |
Related Work | In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates. |
Metric Design Considerations | Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores. |
Metric Design Considerations | where in our experiments, we set N = 3, representing calculation of unigram, bigram, and trigram scores.
Metric Design Considerations | To determine the number match_bi of bigram matches, a system bigram (l_i^s p_i^s, l_{i+1}^s p_{i+1}^s) matches a reference bigram (l_m^r p_m^r, l_{m+1}^r p_{m+1}^r).
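The clipped n-gram matching and Fmean computation implied above can be sketched as follows. Tokens are taken to be (lemma, POS) pairs so that an n-gram matches only when every pair matches; the balanced harmonic mean used here is an assumption, since the excerpt does not give the exact Fmean weighting.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fmean(system, reference, n):
    """Clipped n-gram match F-score between system and reference sentences.
    Tokens are (lemma, POS) pairs; an n-gram matches only when all its
    lemma/POS pairs match, mirroring the bigram definition above."""
    sys_counts = Counter(ngrams(system, n))
    ref_counts = Counter(ngrams(reference, n))
    # Clip each n-gram's matches by its count in the reference.
    matches = sum(min(c, ref_counts[g]) for g, c in sys_counts.items())
    if matches == 0:
        return 0.0
    precision = matches / (len(system) - n + 1)
    recall = matches / (len(reference) - n + 1)
    return 2 * precision * recall / (precision + recall)
```

With N = 3, this score would be computed for n = 1, 2, 3 and the results combined.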
Evaluation | The results are shown in figure 1 for the whole-daughters scoring method and in figure 2 for the bigram method.
Evaluation | Figure 2: Bigram ungeneralizability (dev.)
Rule dissimilarity and generalizability | To do this, we can examine the weakest parts of each rule and compare those across the corpus, to see which anomalous patterns emerge; we do this in the Bigram scoring section below. |
Rule dissimilarity and generalizability | Bigram scoring The other method of detecting ad hoc rules calculates reliability scores by focusing specifically on what the classes do not have in common. |
Rule dissimilarity and generalizability | We abstract to bigrams, including added START and END tags, as longer sequences risk missing generalizations; e.g., unary rules would have no comparable rules.
Distributed Clustering | With each word w in one of these sets, all words v preceding w in the corpus are stored with the respective bigram count N(v, w).
Distributed Clustering | While the greedy non-distributed exchange algorithm is guaranteed to converge as each exchange increases the log likelihood of the assumed bigram model, this is not necessarily true for the distributed exchange algorithm. |
Exchange Clustering | Beginning with an initial clustering, the algorithm greedily maximizes the log likelihood of a two-sided class bigram or trigram model as described in Eq. |
Exchange Clustering | With N^pre and N^suc denoting the average number of clusters preceding and succeeding another cluster, B denoting the number of distinct bigrams in the training corpus, and I denoting the number of iterations, the worst-case complexity of the algorithm is in:
Exchange Clustering | When using large corpora with large numbers of bigrams, the number of required updates can increase towards the quadratic upper bound as N^pre and N^suc approach N_C.
Predictive Exchange Clustering | Modifying the exchange algorithm in order to optimize the log likelihood of a predictive class bigram model leads to substantial performance improvements, similar to those previously reported for another type of one-sided class model in (Whittaker and Woodland, 2001).
Predictive Exchange Clustering | We use a predictive class bigram model as given in Eq. |
Predictive Exchange Clustering | Then the following optimization criterion can be derived, with F(C) being the log likelihood function of the predictive class bigram model given a clustering C: |
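The optimization criterion F(C) can be illustrated with the following sketch, which computes the log likelihood of a predictive class bigram model p(w_i | w_{i-1}) = p(c(w_i) | w_{i-1}) · p(w_i | c(w_i)) under a given clustering. Unsmoothed maximum-likelihood estimates are an assumption here.

```python
import math
from collections import Counter

def predictive_loglik(corpus, cluster):
    """Log likelihood F(C) of a predictive class bigram model
    p(w_i | w_{i-1}) = p(c(w_i) | w_{i-1}) * p(w_i | c(w_i))
    for a clustering `cluster` (dict: word -> class id).
    A sketch of the objective with MLE estimates and no smoothing."""
    word_class = Counter()   # N(v, c): word v followed by a word of class c
    word_count = Counter()   # N(w) over predicted positions
    class_count = Counter()  # N(c) over predicted positions
    for prev, word in zip(corpus[:-1], corpus[1:]):
        word_class[(prev, cluster[word])] += 1
        word_count[word] += 1
        class_count[cluster[word]] += 1
    prev_count = Counter(corpus[:-1])
    ll = 0.0
    for (v, c), n in word_class.items():
        ll += n * math.log(n / prev_count[v])       # sum N(v,c) log p(c|v)
    for w, n in word_count.items():
        ll += n * math.log(n / class_count[cluster[w]])  # sum N(w) log p(w|c(w))
    return ll
```

The exchange algorithm would move a word between clusters whenever doing so increases this quantity; since the model is one-sided, an exchange touches fewer counts than in the two-sided case, which is presumably the source of the reported speedup.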
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Abstract | The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass. |
Decoding to Maximize BLEU | The outside-pass Algorithm 1 for bigram decoding can be generalized to the trigram case. |
Experiments | Hyperedges and BLEU per pass: Bigram Pass: 167K, 21.77. Trigram Pass (UNI): —, —. Trigram Pass (BO): 167K + 629.7K = 796.7K, 23.56. Trigram Pass (BO+BB): 167K + 2.7K = 169.
Introduction | First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based |
Introduction | With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder. |
Multi-pass LM-Integrated Decoding | More specifically, a bigram decoding pass is executed forward and backward to figure out the probability of each state. |
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n−1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding | If we can afford a bigram decoding pass, the outside cost from a bigram model is conceivably a |
Experiments | Consistent with findings in the literature (Cui et al., 2006; Dave et al., 2003; Gamon and Aue, 2005), on the large corpus of movie review texts, the in-domain-trained system based solely on unigrams had lower accuracy than the similar system trained on bigrams . |
Experiments | But the trigrams fared slightly worse than bigrams . |
Experiments | On sentences, however, we have observed an inverse pattern: unigrams performed better than bigrams and trigrams. |
Factors Affecting System Performance | System runs with unigrams, bigrams, and trigrams as features and with different training set sizes are presented.
Integrating the Corpus-based and Dictionary-based Approaches | In the ensemble of classifiers, they used a combination of nine SVM-based classifiers deployed to learn unigrams, bigrams, and trigrams on three different domains, while the fourth domain was used as an evaluation set.
Evaluation | The remaining 93 unlabeled games are used to train unigram, bigram, and trigram grounded language models.
Evaluation | Only unigrams, bigrams, and trigrams that are not proper names, appear more than three times, and are not composed only of stop words were used.
Evaluation | with traditional unigram, bigram, and trigram language models generated from a combination of the closed-captioning transcripts of all training games and data from the Switchboard corpus (see below).
Linguistic Mapping | Estimating bigram and trigram models can be done by counting word pairs or triples and normalizing the resulting conditional distributions.
Introduction | 0 The extension of the feature representation used by previous works with bigrams and trigrams and an evaluation of the benefit of using longer keywords in hedge classification. |
Methods | For trigrams, bigrams, and unigrams — processed separately — we calculated a new class-conditional probability for each feature c, discarding those observations of c in speculative instances where c was not among the two highest ranked candidates.
Results | These keywords were often used in a mathematical context (referring to probabilities) and thus expressed no speculative meaning, while such uses were not represented in the FlyBase articles (otherwise bigram or trigram features could have captured these non-speculative uses).
Results | One third of the features used by our advanced model were either bigrams or trigrams. |
Results | Our model using just unigram features achieved a BEP(spec) score of 78.68% and an F_{β=1}(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and F_{β=1}(spec) scores were 5.23% and 4.97%, respectively).
Learning what to transliterate | From the stat section we collect statistics as to how often every word, bigram, or trigram occurs, and what distribution of name/non-name patterns these n-grams have.
Learning what to transliterate | The name distribution bigram |
Learning what to transliterate | stat corpus bitext, the first word is marked up as a non-name ("0") and the second as a name ("1"), which strongly suggests that in such a bigram context, aljzyre is better translated as island or peninsula, and not transliterated as Al-Jazeera.
Transliterator | The same consonant skeleton indexing process is applied to name bigrams (47,700,548 unique with 167,398,054 skeletons) and trigrams (46,543,712 unique with 165,536,451 skeletons). |
Approach | Bigrams (B) - the text is represented as a bag of bigrams (larger n-grams did not help). |
Approach | Generalized bigrams (Bg) - same as above, but the words are generalized to their WNSS. |
Experiments | The first chosen feature is the translation probability computed between the Bg question and answer representations (bigrams with words generalized to their WNSS tags).
Experiments | This is caused by the fact that the BM25 formula is less forgiving with errors of the NLP processors (due to the high idf scores assigned to bigrams and dependencies), and the WNSS tagger is the least robust component in our pipeline. |
Syllabification with Structured SVMs | For example, the bigram bl frequently occurs within a single English syllable, while the bigram lb generally straddles two syllables. |
Syllabification with Structured SVMs | Thus, in addition to the single-letter features outlined above, we also include in our representation any bigrams, trigrams, four-grams, and five-grams that fit inside our context window.
Syllabification with Structured SVMs | As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set. |
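The n-gram feature extraction within a context window can be sketched as follows. The padding character, the window size, and the offset-based feature naming are assumptions for illustration, not the authors' exact representation.

```python
def window_ngram_features(word, pos, size=2, max_n=5):
    """Extract all n-grams (1..max_n) that fit inside a context window
    around character position `pos`, as (offset, n-gram) features.
    Out-of-word positions are padded with '#' (an assumption)."""
    padded = "#" * size + word + "#" * size
    center = pos + size
    window = padded[center - size : center + size + 1]
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(window) - n + 1):
            # Offset is relative to the window start, shifted so 0 = center.
            feats.append((i - size, window[i:i + n]))
    return feats
```

For example, around the 'l' of "blue" with a one-character window, the features include (-1, "bl"), letting the learner capture that bl tends to occur inside a single syllable while lb tends to straddle a boundary.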
Experiments | Feature instance counts. Local: Rule 10,851; Word 20,328; WordEdges 454,101; CoLenPar 22; Bigram<> 10,292; Trigram<> 24,677; HeadMod<> 12,047; DistMod<> 16,017. Non-local: ParentRule 18,019; WProj 27,417; Heads 70,013; HeadTree 67,836; Heavy 1,401; NGramTree 67,559; RightBranch 2. Total feature instances: 800,582.
Experiments | We also restricted NGramTree to be on bigrams only. |
Forest Reranking | 2 (d) returns the minimum tree fragment spanning a bigram, in this case “saw” and “the”, and should thus be computed at the smallest common ancestor of the two, which is the VP node in this example.
Conclusion and future work | We then investigated adaptor grammars that incorporate one additional kind of information, and found that modeling collocations provides the greatest improvement in word segmentation accuracy, resulting in a model that seems to capture many of the same interword dependencies as the bigram model of Goldwater et al. |
Word segmentation with adaptor grammars | It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance. |
Word segmentation with adaptor grammars | This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model. |
Background 2.1 Dependency parsing | The algorithm then repeatedly merges the pair of clusters which causes the smallest decrease in the likelihood of the text corpus, according to a class-based bigram language model defined on the word clusters. |
Conclusions | To begin, recall that the Brown clustering algorithm is based on a bigram language model. |
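The greedy merging step under a two-sided class bigram model p(w_i | w_{i-1}) = p(c_i | c_{i-1}) · p(w_i | c_i) can be illustrated by brute force. Real Brown clustering updates counts incrementally rather than recomputing the full likelihood for every candidate merge, so this sketch is illustrative only.

```python
import math
from collections import Counter

def class_bigram_loglik(corpus, cluster):
    """Log likelihood of a two-sided class bigram model
    p(w_i | w_{i-1}) = p(c_i | c_{i-1}) * p(w_i | c_i),
    with unsmoothed MLE estimates (an assumption)."""
    cc = Counter((cluster[a], cluster[b]) for a, b in zip(corpus, corpus[1:]))
    c_prev = Counter(cluster[w] for w in corpus[:-1])
    wc = Counter(corpus[1:])
    c_next = Counter(cluster[w] for w in corpus[1:])
    ll = sum(n * math.log(n / c_prev[c1]) for (c1, c2), n in cc.items())
    ll += sum(n * math.log(n / c_next[cluster[w]]) for w, n in wc.items())
    return ll

def best_merge(corpus, cluster):
    """One greedy step: merge the pair of classes whose merge causes the
    smallest decrease in corpus likelihood (brute-force over all pairs)."""
    classes = sorted(set(cluster.values()))
    best = None
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            merged = {w: (a if c == b else c) for w, c in cluster.items()}
            ll = class_bigram_loglik(corpus, merged)
            if best is None or ll > best[0]:
                best = (ll, merged)
    return best[1]
```

Distributionally interchangeable words (words with similar left and right class contexts) cost the least likelihood to merge, so they end up clustered together first.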
Feature design | (2005a), and consists of indicator functions for combinations of words and parts of speech for the head and modifier of each dependency, as well as certain contextual tokens.1 Our second-order baseline features are the same as those of Carreras (2007) and include indicators for triples of part-of-speech tags for sibling interactions and grandparent interactions, as well as additional bigram features based on pairs of words involved in these higher-order interactions.