Index of papers in Proc. ACL 2008 that mention
  • bigram
Cao, Guihong and Robertson, Stephen and Nie, Jian-Yun
Abstract
The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model).
Bigram Expansion Model for Alteration Selection
The query context is modeled by a bigram language model as in (Peng et al., 2007).
Bigram Expansion Model for Alteration Selection
In this work, we used a bigram language model to calculate the probability of each path.
Introduction
The query context is modeled by a bigram language model.
Introduction
which is the most coherent with the bigram model.
Introduction
We call this model Bigram Expansion.
Related Work
In (Peng et al., 2007), a bigram language model is used to determine the alteration of the head word that best fits the query.
Related Work
In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates.
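The excerpts above describe selecting among query alterations by how coherent each candidate path is with a bigram language model of the query context. A minimal sketch of that kind of path scoring; the function names, interpolation weight, and dictionary-backed probability tables are illustrative assumptions, not the authors' implementation:

```python
import math

def bigram_logprob(path, bigram_prob, unigram_prob, lam=0.9):
    # Interpolated bigram LM score of one candidate query path (a list of terms).
    logp = 0.0
    prev = "<s>"
    for w in path:
        p = lam * bigram_prob.get((prev, w), 0.0) + (1 - lam) * unigram_prob.get(w, 1e-9)
        logp += math.log(p)
        prev = w
    return logp

def best_alteration_path(candidate_paths, bigram_prob, unigram_prob):
    # Pick the expansion path that is most coherent with the bigram model.
    return max(candidate_paths, key=lambda p: bigram_logprob(p, bigram_prob, unigram_prob))
```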
bigram is mentioned in 25 sentences in this paper.
Chan, Yee Seng and Ng, Hwee Tou
Metric Design Considerations
Similarly, we also match the bigrams and trigrams of the sentence pair and calculate their corresponding Fmean scores.
Metric Design Considerations
where in our experiments, we set N = 3, representing calculation of unigram, bigram, and trigram scores.
Metric Design Considerations
To determine the number match_bi of bigram matches, a system bigram (l_si p_si, l_si+1 p_si+1) matches a reference bigram (l_ri p_ri, l_ri+1 p_ri+1) …
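A hedged sketch of the n-gram matching and F-mean computation described in these excerpts, for a single n; the clipped counts and alpha-weighted harmonic mean are illustrative simplifications (the paper matches lemma/POS pairs and uses its own weighting):

```python
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_fmean(sys_tokens, ref_tokens, n, alpha=0.5):
    # Clipped n-gram match count, then an F-mean of precision and recall.
    sys_ng, ref_ng = Counter(ngrams(sys_tokens, n)), Counter(ngrams(ref_tokens, n))
    match = sum((sys_ng & ref_ng).values())
    if match == 0:
        return 0.0
    prec = match / sum(sys_ng.values())
    rec = match / sum(ref_ng.values())
    return prec * rec / (alpha * prec + (1 - alpha) * rec)
```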
bigram is mentioned in 16 sentences in this paper.
Dickinson, Markus
Evaluation
The results are shown in figure 1 for the whole daughters scoring method and in figure 2 for the bigram method.
Evaluation
Figure 2: Bigram ungeneralizability (devo.)
Rule dissimilarity and generalizability
To do this, we can examine the weakest parts of each rule and compare those across the corpus, to see which anomalous patterns emerge; we do this in the Bigram scoring section below.
Rule dissimilarity and generalizability
Bigram scoring: The other method of detecting ad hoc rules calculates reliability scores by focusing specifically on what the classes do not have in common.
Rule dissimilarity and generalizability
We abstract to bigrams, including added START and END tags, as longer sequences risk missing generalizations; e.g., unary rules would have no comparable rules.
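A small illustration of the abstraction step described above: a rule's daughter sequence is reduced to its bigrams, padded with START and END tags; the names are illustrative:

```python
def daughter_bigrams(daughters):
    # Abstract a rule's daughter sequence to bigrams with added START/END tags.
    seq = ["START"] + list(daughters) + ["END"]
    return list(zip(seq, seq[1:]))

# daughter_bigrams(["NP", "VP"]) -> [("START", "NP"), ("NP", "VP"), ("VP", "END")]
```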
bigram is mentioned in 20 sentences in this paper.
Uszkoreit, Jakob and Brants, Thorsten
Distributed Clustering
With each word w in one of these sets, all words v preceding w in the corpus are stored with the respective bigram count N(v, w).
Distributed Clustering
While the greedy non-distributed exchange algorithm is guaranteed to converge as each exchange increases the log likelihood of the assumed bigram model, this is not necessarily true for the distributed exchange algorithm.
Exchange Clustering
Beginning with an initial clustering, the algorithm greedily maximizes the log likelihood of a two-sided class bigram or trigram model as described in Eq.
Exchange Clustering
With N^c_pre and N^c_suc denoting the average number of clusters preceding and succeeding another cluster, B denoting the number of distinct bigrams in the training corpus, and I denoting the number of iterations, the worst case complexity of the algorithm is in:
Exchange Clustering
When using large corpora with large numbers of bigrams, the number of required updates can increase towards the quadratic upper bound as N^c_pre and N^c_suc approach N_c.
Predictive Exchange Clustering
Modifying the exchange algorithm in order to optimize the log likelihood of a predictive class bigram model leads to substantial performance improvements, similar to those previously reported for another type of one-sided class model in (Whittaker and Woodland, 2001).
Predictive Exchange Clustering
We use a predictive class bigram model as given in Eq.
Predictive Exchange Clustering
Then the following optimization criterion can be derived, with F(C) being the log likelihood function of the predictive class bigram model given a clustering C:
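The display equation itself is not reproduced in this excerpt. Assuming maximum-likelihood estimates for p(c(w)|v) and p(w|c(w)), the log likelihood of a predictive class bigram model reduces, up to terms that do not depend on the clustering, to the sum over (v, c) of N(v, c) log N(v, c) minus the sum over c of N(c) log N(c). A sketch of evaluating that criterion; the plain dictionaries are illustrative and stand in for the paper's distributed data structures:

```python
import math
from collections import defaultdict

def predictive_class_bigram_ll(bigram_counts, clustering):
    # Clustering-dependent part of the log likelihood of a predictive class
    # bigram model p(w|v) = p(c(w)|v) * p(w|c(w)), with MLE estimates.
    # bigram_counts: {(v, w): N(v, w)}, clustering: {word: cluster}.
    n_vc = defaultdict(int)   # N(v, c): predecessor word v followed by cluster c
    n_c = defaultdict(int)    # N(c): cluster count as successor
    for (v, w), n in bigram_counts.items():
        c = clustering[w]
        n_vc[(v, c)] += n
        n_c[c] += n
    return (sum(n * math.log(n) for n in n_vc.values())
            - sum(n * math.log(n) for n in n_c.values()))
```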
bigram is mentioned in 12 sentences in this paper.
Zhang, Hao and Gildea, Daniel
Abstract
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
Abstract
The trigram pass closes most of the performance gap between a bigram decoder and a much slower trigram decoder, but takes time that is insignificant in comparison to the bigram pass.
Decoding to Maximize BLEU
The outside-pass Algorithm 1 for bigram decoding can be generalized to the trigram case.
Experiments
                Hyperedges           BLEU
Bigram Pass     167K                 21.77
Trigram Pass
  UNI           —                    —
  BO            +629.7K = 796.7K     23.56
  BO+BB         +2.7K = 169.…
Introduction
First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based …
Introduction
With this heuristic, we achieve the same BLEU scores and model cost as a trigram decoder with essentially the same speed as a bigram decoder.
Multi-pass LM-Integrated Decoding
More specifically, a bigram decoding pass is executed forward and backward to figure out the probability of each state.
Multi-pass LM-Integrated Decoding
We take the same view as in speech recognition that a trigram integrated model is a finer-grained model than a bigram model, and in general we can do an (n-1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding
If we can afford a bigram decoding pass, the outside cost from a bigram model is conceivably a …
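The forward and backward bigram passes above amount to computing Viterbi outside costs over the bigram-pass forest; these can then serve as heuristic estimates when bigram states are expanded into trigram states. A minimal sketch over a generic hypergraph; the data layout (bottom-up node list, incoming hyperedges with costs) is an assumption, not the paper's decoder:

```python
def viterbi_outside(nodes_topo, incoming_edges, inside):
    # Top-down (outside) pass over a forest, given Viterbi inside costs from a
    # completed bigram pass.  Costs are negative log probabilities.
    outside = {v: float("inf") for v in nodes_topo}
    outside[nodes_topo[-1]] = 0.0              # root is last in bottom-up order
    for head in reversed(nodes_topo):          # visit nodes top-down
        for edge_cost, tails in incoming_edges[head]:
            for i, t in enumerate(tails):
                sibling_inside = sum(inside[s] for j, s in enumerate(tails) if j != i)
                cand = outside[head] + edge_cost + sibling_inside
                if cand < outside[t]:
                    outside[t] = cand
    return outside
```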
bigram is mentioned in 15 sentences in this paper.
Andreevskaia, Alina and Bergler, Sabine
Experiments
Consistent with findings in the literature (Cui et al., 2006; Dave et al., 2003; Gamon and Aue, 2005), on the large corpus of movie review texts, the in-domain-trained system based solely on unigrams had lower accuracy than the similar system trained on bigrams.
Experiments
But the trigrams fared slightly worse than bigrams .
Experiments
On sentences, however, we have observed an inverse pattern: unigrams performed better than bigrams and trigrams.
Factors Affecting System Performance
System runs with unigrams, bigrams, and trigrams as features and with different training set sizes are presented.
Integrating the Corpus-based and Dictionary-based Approaches
In the ensemble of classifiers, they used a combination of nine SVM-based classifiers deployed to learn unigrams, bigrams, and trigrams on three different domains, while the fourth domain was used as an evaluation set.
bigram is mentioned in 8 sentences in this paper.
Fleischman, Michael and Roy, Deb
Evaluation
The remaining 93 unlabeled games are used to train unigram, bigram, and trigram grounded language models.
Evaluation
Only unigrams, bigrams, and trigrams that are not proper names, appear greater than three times, and are not composed only of stop words were used.
Evaluation
with traditional unigram, bigram, and trigram language models generated from a combination of the closed captioning transcripts of all training games and data from the Switchboard corpus (see below).
Linguistic Mapping
Estimating bigram and trigram models can be done by processing on word pairs or triples, and performing normalization on the resulting conditional distributions.
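A minimal sketch of that estimation step for the bigram case: count word pairs, then normalize per history to obtain conditional distributions. The names and sentence-boundary symbols are illustrative; this is not the paper's grounded-language-model training code:

```python
from collections import Counter, defaultdict

def estimate_bigram_model(sentences):
    # MLE bigram model: count word pairs, then normalize per history word.
    pair_counts, history_counts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            pair_counts[(w1, w2)] += 1
            history_counts[w1] += 1
    model = defaultdict(dict)
    for (w1, w2), n in pair_counts.items():
        model[w1][w2] = n / history_counts[w1]   # P(w2 | w1)
    return model
```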
bigram is mentioned in 5 sentences in this paper.
Szarvas, György
Introduction
• The extension of the feature representation used by previous works with bigrams and trigrams and an evaluation of the benefit of using longer keywords in hedge classification.
Methods
For trigrams, bigrams and unigrams (processed separately) we calculated a new class-conditional probability for each feature c, discarding those observations of c in speculative instances where c was not among the two highest ranked candidates.
Results
These keywords were many times used in a mathematical context (referring to probabilities) and thus expressed no speculative meaning, while such uses were not represented in the FlyBase articles (otherwise bigram or trigram features could have captured these non-speculative uses).
Results
One third of the features used by our advanced model were either bigrams or trigrams.
Results
Our model using just unigram features achieved a BEP(spec) score of 78.68% and an Fβ=1(spec) score of 80.23%, which means that using bigram and trigram hedge cues here significantly improved the performance (the differences in BEP(spec) and Fβ=1(spec) scores were 5.23% and 4.97%, respectively).
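A hedged sketch of the class-conditional probabilities behind these keyword features, P(speculative | cue), computed from labeled instances; the input format is an assumption, and the paper's additional step of discarding weak observations and re-estimating is omitted:

```python
from collections import Counter

def class_conditional_probs(instances):
    # instances: iterable of (features, is_speculative) pairs, where features is a
    # collection of unigram/bigram/trigram cues observed in the sentence.
    spec_counts, total_counts = Counter(), Counter()
    for features, is_spec in instances:
        for f in set(features):
            total_counts[f] += 1
            if is_spec:
                spec_counts[f] += 1
    return {f: spec_counts[f] / total_counts[f] for f in total_counts}
```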
bigram is mentioned in 5 sentences in this paper.
Hermjakob, Ulf and Knight, Kevin and Daumé III, Hal
Learning what to transliterate
From the stat section we collect statistics as to how often every word, bigram or trigram occurs, and what distribution of name/non-name patterns these ngrams have.
Learning what to transliterate
The name distribution bigram
Learning what to transliterate
stat corpus bitext, the first word is marked up as a non-name ("0") and the second as a name ("1"), which strongly suggests that in such a bigram context, aljzyre better be translated as island or peninsula, and not be transliterated as Al-Jazeera.
Transliterator
The same consonant skeleton indexing process is applied to name bigrams (47,700,548 unique with 167,398,054 skeletons) and trigrams (46,543,712 unique with 165,536,451 skeletons).
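A minimal sketch of the statistics gathering described above: for every unigram, bigram, and trigram, tally the distribution of name/non-name patterns (e.g. "01" for a non-name followed by a name). The (token, is_name) input format is an illustrative assumption:

```python
from collections import Counter, defaultdict

def name_pattern_stats(tagged_sentences, max_n=3):
    # tagged_sentences: lists of (token, is_name) pairs from the stat corpus.
    stats = defaultdict(Counter)
    for sent in tagged_sentences:
        for n in range(1, max_n + 1):
            for i in range(len(sent) - n + 1):
                window = sent[i:i + n]
                ngram = tuple(tok for tok, _ in window)
                pattern = "".join("1" if is_name else "0" for _, is_name in window)
                stats[ngram][pattern] += 1
    return stats
```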
bigram is mentioned in 4 sentences in this paper.
Surdeanu, Mihai and Ciaramita, Massimiliano and Zaragoza, Hugo
Approach
Bigrams (B) - the text is represented as a bag of bigrams (larger n-grams did not help).
Approach
Generalized bigrams (Bg) - same as above, but the words are generalized to their WNSS.
Experiments
The first chosen feature is the translation probability computed between the Bg question and answer representations (bigrams with words generalized to their WNSS tags).
Experiments
This is caused by the fact that the BM25 formula is less forgiving with errors of the NLP processors (due to the high idf scores assigned to bigrams and dependencies), and the WNSS tagger is the least robust component in our pipeline.
bigram is mentioned in 4 sentences in this paper.
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin
Syllabification with Structured SVMs
For example, the bigram bl frequently occurs within a single English syllable, while the bigram lb generally straddles two syllables.
Syllabification with Structured SVMs
Thus, in addition to the single-letter features outlined above, we also include in our representation any bigrams, trigrams, four-grams, and five-grams that fit inside our context window.
Syllabification with Structured SVMs
As is apparent from Figure 2, we see a substantial improvement by adding bigrams to our feature set.
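A hedged sketch of extracting unigram-through-five-gram features from a fixed context window around one letter position, in the spirit of the excerpts above; the window size, padding symbol, and feature-string format are assumptions, not the authors' feature templates:

```python
def window_ngram_features(word, pos, window=5, max_n=5):
    # N-gram features (n = 1..max_n) drawn from a window around position `pos`.
    padded = "#" * window + word + "#" * window      # '#' marks word boundaries
    centre = pos + window                            # index of the focus letter
    context = padded[centre - window:centre + window + 1]
    feats = []
    for n in range(1, max_n + 1):
        for i in range(len(context) - n + 1):
            feats.append("%dgram[%d]=%s" % (n, i - window, context[i:i + n]))
    return feats
```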
bigram is mentioned in 3 sentences in this paper.
Huang, Liang
Experiments
Local         instances    NonLocal       instances
Rule          10,851       ParentRule     18,019
Word          20,328       WProj          27,417
WordEdges     454,101      Heads          70,013
CoLenPar      22           HeadTree       67,836
Bigram<>      10,292       Heavy          1,401
Trigram<>     24,677       NGramTree      67,559
HeadMod<>     12,047       RightBranch    2
DistMod<>     16,017
Total Feature Instances: 800,582
Experiments
We also restricted NGramTree to be on bigrams only.
Forest Reranking
2(d) returns the minimum tree fragment spanning a bigram, in this case “saw” and “the”, and should thus be computed at the smallest common ancestor of the two, which is the VP node in this example.
bigram is mentioned in 3 sentences in this paper.
Johnson, Mark
Conclusion and future work
We then investigated adaptor grammars that incorporate one additional kind of information, and found that modeling collocations provides the greatest improvement in word segmentation accuracy, resulting in a model that seems to capture many of the same interword dependencies as the bigram model of Goldwater et al.
Word segmentation with adaptor grammars
It is not possible to write an adaptor grammar that directly implements Goldwater’s bigram word segmentation model because an adaptor grammar has one DP per adapted nonterminal (so the number of DPs is fixed in advance) while Goldwater’s bigram model has one DP per word type, and the number of word types is not known in advance.
Word segmentation with adaptor grammars
This suggests that the collocation word adaptor grammar can capture inter-word dependencies similar to those that improve the performance of Goldwater’s bigram segmentation model.
bigram is mentioned in 3 sentences in this paper.
Koo, Terry and Carreras, Xavier and Collins, Michael
Background 2.1 Dependency parsing
The algorithm then repeatedly merges the pair of clusters which causes the smallest decrease in the likelihood of the text corpus, according to a class-based bigram language model defined on the word clusters.
Conclusions
To begin, recall that the Brown clustering algorithm is based on a bigram language model.
Feature design
(2005a), and consists of indicator functions for combinations of words and parts of speech for the head and modifier of each dependency, as well as certain contextual tokens. Our second-order baseline features are the same as those of Carreras (2007) and include indicators for triples of part-of-speech tags for sibling interactions and grandparent interactions, as well as additional bigram features based on pairs of words involved in these higher-order interactions.
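For reference, the class-based bigram language model that Brown-style clustering optimizes factors each transition as p(w_i | w_{i-1}) = p(C(w_i) | C(w_{i-1})) * p(w_i | C(w_i)). A minimal sketch of scoring text under such a model; the probability tables and start symbol are hypothetical:

```python
import math

def class_bigram_logprob(tokens, word2class, p_class_trans, p_word_given_class):
    # Class-based bigram LM: p(w_i | w_{i-1}) = p(C(w_i) | C(w_{i-1})) * p(w_i | C(w_i)).
    # word2class, p_class_trans, p_word_given_class are hypothetical lookup tables.
    logp = 0.0
    prev_c = word2class["<s>"]
    for w in tokens:
        c = word2class[w]
        logp += math.log(p_class_trans[(prev_c, c)]) + math.log(p_word_given_class[(w, c)])
        prev_c = c
    return logp
```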
bigram is mentioned in 3 sentences in this paper.