Index of papers in Proc. ACL 2009 that mention
  • bigram
Ravi, Sujith and Knight, Kevin
Fitting the Model
We still give EM the full word/tag dictionary, but now we constrain its initial grammar model to the 459 tag bigrams identified by IP.
Fitting the Model
In addition to removing many bad tag bigrams from the grammar, IP minimization also removes some of the good ones, leading to lower recall (EM = 0.87, IP+EM = 0.57).
Fitting the Model
During EM training, the smaller grammar with fewer bad tag bigrams helps to restrict the dictionary model from making too many bad choices that EM made earlier.
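To make the constraint concrete, here is a minimal Python sketch (not the authors' code; the function and tag names are hypothetical) of restricting an HMM tagger's transition table to an IP-selected set of tag bigrams before EM training, so that EM can only redistribute mass among the allowed transitions:

```python
# Minimal sketch: build a uniform initial transition model whose support is
# exactly the set of tag bigrams chosen by the IP step. Transitions outside
# the allowed set get no parameter, so EM keeps them at zero.

def restrict_transitions(tags, allowed_bigrams):
    trans = {}
    for t1 in tags:
        allowed_next = [t2 for t2 in tags if (t1, t2) in allowed_bigrams]
        for t2 in allowed_next:
            trans[(t1, t2)] = 1.0 / len(allowed_next)
    return trans

# Toy example: only 3 of the 9 possible bigrams over {DT, NN, VB} survive IP.
allowed = {("DT", "NN"), ("NN", "VB"), ("VB", "DT")}
init = restrict_transitions(["DT", "NN", "VB"], allowed)
print(init)  # EM training would start from (and stay inside) this support
```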
Small Models
That is, there exists a tag sequence that contains 459 distinct tag bigrams, and no other tag sequence contains fewer.
Small Models
We also create variables for every possible tag bigram and word/tag dictionary entry.
What goes wrong with EM?
We investigate the Viterbi tag sequence generated by EM training and count how many distinct tag bigrams there are in that sequence.
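A tiny illustration of this diagnostic (the tag sequence below is made up):

```python
# Count how many distinct tag bigrams appear in a Viterbi tagging.
viterbi_tags = ["DT", "NN", "VB", "DT", "NN", "IN", "DT", "NN"]
distinct_bigrams = {(a, b) for a, b in zip(viterbi_tags, viterbi_tags[1:])}
print(len(distinct_bigrams), sorted(distinct_bigrams))
# 5 distinct bigrams here; the paper applies the same count corpus-wide.
```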
bigram is mentioned in 11 sentences in this paper.
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Experiments
Bigram and trigram performances are similar for Chinese, but trigram performs better for Japanese.
Experiments
In fact, although the difference in perplexity per character is not so large, the perplexity per word is radically reduced: 439.8 (bigram) to 190.1 (trigram).
Inference
Furthermore, it has an inherent limitation that it cannot deal with larger than bigrams, because it uses only local statistics between directly contiguous words for word segmentation.
Inference
It has an additional advantage in that we can accommodate higher-order relationships than bigrams, particularly trigrams, for word segmentation.
Inference
For this purpose, we maintain a forward variable α[t] in the bigram case.
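As a rough sketch of such a forward recursion in the bigram case (simplified, with a placeholder word bigram probability standing in for the paper's hierarchical Pitman-Yor model; all names here are illustrative):

```python
# alpha[t][k]: probability of the character prefix up to position t with the
# last word being the k characters ending at t, marginalizing over the length
# j of the previous word (j = 0 means "sentence start").

def forward(chars, word_bigram_prob, max_len=4):
    n = len(chars)
    alpha = [[0.0] * (max_len + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0  # empty prefix
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            w = "".join(chars[t - k:t])
            total = 0.0
            for j in range(0, min(max_len, t - k) + 1):
                if alpha[t - k][j] == 0.0:
                    continue
                prev = "".join(chars[t - k - j:t - k]) if j > 0 else "<s>"
                total += word_bigram_prob(w, prev) * alpha[t - k][j]
            alpha[t][k] = total
    return alpha

# Dummy usage with a uniform stand-in for the word bigram probability.
alphas = forward("abcd", lambda w, prev: 0.1)
```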
Introduction
Crucially, since they rely on sampling a word boundary between two neighboring words, they can leverage only up to bigram word dependencies.
Pitman-Yor process and n-gram models
The bigram distribution G2 = {
bigram is mentioned in 10 sentences in this paper.
Hirao, Tsutomu and Suzuki, Jun and Isozaki, Hideki
A Syntax Free Sequence-oriented Sentence Compression Method
Equation (5) evaluates to 1 if I(y_j) = I(y_{j-1}) + 1, and to λ_PLM · Bigram(w_{I(y_j)}, w_{I(y_{j-1})}) otherwise.
A Syntax Free Sequence-oriented Sentence Compression Method
Here, 0 ≤ λ_PLM ≤ 1, and Bigram(·) indicates the word bigram probability.
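A hedged sketch of the feature in equation (5) (illustrative names, not the authors' implementation): the feature is 1 when the two selected words were adjacent in the source sentence, and a discounted word bigram probability otherwise.

```python
def plm_feature(idx_prev, idx_cur, words, bigram_prob, lam=0.5):
    """idx_prev, idx_cur: source-sentence positions of y_{j-1} and y_j.
    lam plays the role of lambda_PLM (0 <= lam <= 1); 0.5 is just a placeholder."""
    if idx_cur == idx_prev + 1:
        # The two words were adjacent in the original sentence.
        return 1.0
    # Otherwise, back off to the word bigram probability of the selected pair.
    return lam * bigram_prob(words[idx_prev], words[idx_cur])
```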
A Syntax Free Sequence-oriented Sentence Compression Method
The first line of equation (5) agrees with Jing’s observation on sentence alignment tasks (Jing and McKeown, 1999); that is, most (or almost all) bigrams in a compressed sentence appear in the original sentence as they are.
Experimental Evaluation
For example, the label ‘w/o IPTW + Dep’ employs IDF term weighting as one function in equation (1), and word bigram, part-of-speech bigram, and dependency probability between words as the other.
Results and Discussion
Replacing PLM with the bigram language model (w/o PLM) degrades the performance significantly.
Results and Discussion
Most bigrams in a compressed sentence followed those in the source sentence.
Results and Discussion
PLM is similar to dependency probability in that both features emphasize word pairs that occurred as bigrams in the source sentence.
bigram is mentioned in 9 sentences in this paper.
Kruengkrai, Canasai and Uchimoto, Kiyotaka and Kazama, Jun'ichi and Wang, Yiou and Torisawa, Kentaro and Isahara, Hitoshi
Training method
We broadly classify features into two categories: unigram and bigram features.
Training method
Table 3 (Bigram features): TB0 <TE(w-1)>; TB1 <TE(w-1), p0>; TB2 <TE(w-1), p-1, p0>; TB3 <TE(w-1), TB(w0)>; TB4 <TE(w-1), TB(w0), p0>; TB5 <TE(w-1), p-1, TB(w0)>; TB6 <TE(w-1), p-1, TB(w0), p0>; CB0 <p-1, p0> otherwise.
Training method
Bigram features: Table 3 shows our bigram features.
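An illustrative sketch (not the authors' code) of instantiating the Table 3 templates from the end tag of the previous word TE(w-1), the begin tag of the current word TB(w0), and the POS tags p-1 and p0:

```python
def bigram_features(te_prev, tb_cur, p_prev, p_cur):
    # One tuple per template TB0-TB6 from Table 3.
    return [
        ("TB0", te_prev),
        ("TB1", te_prev, p_cur),
        ("TB2", te_prev, p_prev, p_cur),
        ("TB3", te_prev, tb_cur),
        ("TB4", te_prev, tb_cur, p_cur),
        ("TB5", te_prev, p_prev, tb_cur),
        ("TB6", te_prev, p_prev, tb_cur, p_cur),
    ]

print(bigram_features("E-NN", "B-VV", "NN", "VV"))
# Per Table 3, the CB0 template <p-1, p0> applies in the remaining ("otherwise") case.
```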
bigram is mentioned in 9 sentences in this paper.
Zaslavskiy, Mikhail and Dymetman, Marc and Cancedda, Nicola
Experiments
Bigram based reordering.
Experiments
First we consider a bigram Language Model and the algorithms try to find the reordering that maximizes the LM score.
Experiments
This means that, when using a bigram language model, it is often possible to reorder the words of a randomly permuted reference sentence in such a way that the LM score of the reordered sentence is larger than the LM score of the reference.
Phrase-based Decoding as TSP
The language model cost of producing the target words of b′ right after the target words of b; with a bigram language model, this cost can be precomputed directly from b and b′.
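A small sketch (illustrative names, not the paper's implementation) of why the bigram assumption makes this transition cost precomputable from the phrases alone: only the last target word of b matters when scoring b′.

```python
import math

def transition_lm_cost(b_target_words, b_prime_target_words, bigram_prob):
    """Negative log-probability of b's successor phrase b' under a bigram LM,
    conditioned only on the boundary word of b."""
    cost = 0.0
    prev = b_target_words[-1]  # only the boundary word of b is needed
    for w in b_prime_target_words:
        cost += -math.log(bigram_prob(w, prev))
        prev = w
    return cost

# With an n-gram LM (n > 2), the cost would also depend on earlier words of b,
# which is why the construction is generalized in Section 4.1.
```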
Phrase-based Decoding as TSP
This restriction to bigram models will be removed in Section 4.1.
Phrase-based Decoding as TSP
4.1 From Bigram to N-gram LM
bigram is mentioned in 8 sentences in this paper.
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Experimental Results
Moreover, a bigram (i.e., “2gram”) achieves the best BLEU scores among the four different orders of VMs.
Experimental Results
This is necessarily true, but it is interesting to see that most of the improvement is obtained just by moving from a unigram to a bigram model.
Experimental Results
Indeed, although Table 3 shows that better approximations can be obtained by using higher-order models, the best BLEU score in Tables 2a and 2c was obtained by the bigram model.
Variational Approximate Decoding
However, because q* only approximates p, y* of (13) may be locally appropriate but globally inadequate as a translation of x. Observe, e.g., that an n-gram model q* will tend to favor short strings y, regardless of the length of x. Suppose x = le chat chasse la souris (“the cat chases the mouse”) and q* is a bigram approximation to p(y | x). Presumably q*(the | START), q*(mouse | the), and q*(END | mouse) are all large in q*. So the most probable string y* under q* may be simply “the mouse,” which is short and has a high probability but fails to cover x.
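A toy numeric illustration of this length bias (the probabilities below are invented, not taken from the paper): under a bigram model, a short string that chains a few high-probability bigrams can outscore a longer, more faithful translation.

```python
# Made-up bigram probabilities for the running "chat/souris" example.
q = {("START", "the"): 0.6, ("the", "mouse"): 0.4, ("mouse", "END"): 0.5,
     ("the", "cat"): 0.4, ("cat", "chases"): 0.5, ("chases", "the"): 0.6}

def score(words):
    p = 1.0
    prev = "START"
    for w in words + ["END"]:
        p *= q.get((prev, w), 1e-4)
        prev = w
    return p

print(score(["the", "mouse"]))                          # 0.12
print(score(["the", "cat", "chases", "the", "mouse"]))  # 0.0144
```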
bigram is mentioned in 5 sentences in this paper.
Song, Young-In and Lee, Jung-Tae and Rim, Hae-Chang
Experiments
Phrase extraction and indexing: We evaluate our proposed method on two different types of phrases: syntactic head-modifier pairs (syntactic phrases) and simple bigram phrases (statistical phrases).
Experiments
Since our method is not limited to a particular type of phrases, we have also conducted experiments on statistical phrases (bigrams) with a reduced set of directly applicable features: RMO, RSO, PD5, DF, and CPP; the features requiring linguistic preprocessing (e.g., PPT) are not used, because it is unrealistic to use them under a bigram-based retrieval setting.
Experiments
In most cases, the distance between words in a bigram is 1, but sometimes it can be more than 1 because of the effect of stopword removal.
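A small sketch (with a hypothetical stopword list) of how the within-bigram distance can exceed 1 once stopwords are removed before pairing adjacent content words:

```python
STOPWORDS = {"of", "the", "a", "in"}  # illustrative list, not the paper's

def bigrams_with_distance(tokens):
    # Keep content words with their original positions, then pair neighbors.
    content = [(i, w) for i, w in enumerate(tokens) if w.lower() not in STOPWORDS]
    return [((w1, w2), j - i) for (i, w1), (j, w2) in zip(content, content[1:])]

print(bigrams_with_distance(["quality", "of", "retrieval", "results"]))
# [(('quality', 'retrieval'), 2), (('retrieval', 'results'), 1)]
```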
bigram is mentioned in 5 sentences in this paper.
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
Hyper-edges (boxes) are annotated with normalized transition probabilities, as well as the bigrams produced by each rule application.
Computing Feature Expectations
The expected count of the bigram “man with” is the sum of posterior probabilities of the two hyper-edges that produce it.
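A hedged sketch of this expectation with toy posteriors (not the paper's forest): the expected count of a target bigram is the sum, over the hyperedges whose rule application produces that bigram, of each hyperedge's posterior probability.

```python
from collections import defaultdict

# Each entry: (posterior probability of the hyperedge, bigrams it produces).
hyperedges = [
    (0.7, [("man", "with")]),
    (0.3, [("man", "with"), ("with", "telescope")]),
    (0.3, [("with", "a")]),
]

expected_counts = defaultdict(float)
for posterior, bigrams in hyperedges:
    for bg in bigrams:
        expected_counts[bg] += posterior

print(dict(expected_counts))
# {('man', 'with'): 1.0, ('with', 'telescope'): 0.3, ('with', 'a'): 0.3}
```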
Computing Feature Expectations
For example, decoding under a variational approximation to the model’s posterior that decomposes over bigram probabilities is equivalent to fast consensus decoding with
bigram is mentioned in 4 sentences in this paper.