Index of papers in Proc. ACL 2010 that mention
  • bigram
Dickinson, Markus
Ad hoc rule detection
3.4 Bigram anomalies 3.4.1 Motivation
Ad hoc rule detection
The bigram method examines relationships between adjacent sisters, complementing the whole rule method by focusing on local properties.
Ad hoc rule detection
But only the final elements have anomalous bigrams: HD:ID IR:IR, IR:IR AN:RO, and AN:RO JR:IR all never occur.
Additional information
This rule is entirely correct, yet the XX:XX position has low whole rule and bigram scores.
Approach
First, the bigram method abstracts a rule to its bigrams.
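As an editorial illustration of this abstraction step (a minimal sketch, not Dickinson's implementation; the rule representation and the START/END padding are assumptions), a rule's sequence of sister categories can be decomposed into adjacent bigrams as follows:

```python
# Hedged sketch: abstract a rule's sister sequence into adjacent-sister bigrams.
# The START/END padding and the label format are illustrative assumptions.
def rule_to_bigrams(sisters):
    padded = ["<START>"] + list(sisters) + ["<END>"]
    return list(zip(padded, padded[1:]))

# Example with labels of the kind quoted above
print(rule_to_bigrams(["AN:RO", "JR:IR", "HD:ID"]))
# [('<START>', 'AN:RO'), ('AN:RO', 'JR:IR'), ('JR:IR', 'HD:ID'), ('HD:ID', '<END>')]
```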
Evaluation
For example, the bigram method with a threshold of 39 leads to finding 283 errors (455 × .622).
Evaluation
The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with corresponding greater pre-
Introduction and Motivation
We propose to flag erroneous parse rules, using information which reflects different grammatical properties: POS lookup, bigram information, and full rule comparisons.
bigram is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Baldridge, Jason and Knight, Kevin
Introduction
Most methods have employed some variant of Expectation Maximization (EM) to learn parameters for a bigram
Introduction
Ravi and Knight (2009) achieved the best results thus far (92.3% word token accuracy) via a Minimum Description Length approach using an integer program (IP) that finds a minimal bigram grammar that obeys the tag dictionary constraints and covers the observed data.
Minimized models for supertagging
The 1241 distinct supertags in the tagset result in 1.5 million tag bigram entries in the model and the dictionary contains almost 3.5 million word/tag pairs that are relevant to the test data.
Minimized models for supertagging
The set of 45 P08 tags for the same data yields 2025 tag bigrams and 8910 dictionary entries.
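As a sanity check on these figures: a tag set of size T yields T × T possible tag-bigram types, so the 45 P08 tags give 45 × 45 = 2025 bigrams, while the 1241 supertags give 1241 × 1241 ≈ 1.54 million, consistent with the 1.5 million entries quoted above.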
Minimized models for supertagging
Our objective is to find the smallest supertag grammar (of tag bigram types) that explains the entire text while obeying the lexicon’s constraints.
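As a toy illustration of this objective (a hedged sketch, not the authors' IP formulation; the three-word sentence, the tag dictionary, and the use of the pulp solver are assumptions), the search for the smallest set of tag-bigram types that covers a text under a tag dictionary can be posed as a small integer program:

```python
# Hedged sketch: choose one tag per token from a tag dictionary while minimizing
# the number of distinct tag-bigram types used (toy data, not the paper's setup).
import pulp

sentence = ["the", "dog", "barks"]
tag_dict = {"the": {"DT"}, "dog": {"NN", "VB"}, "barks": {"VBZ", "NNS"}}

prob = pulp.LpProblem("minimal_bigram_grammar", pulp.LpMinimize)

# x[i][t] = 1 iff token i is assigned tag t
x = {i: {t: pulp.LpVariable(f"x_{i}_{t}", cat="Binary") for t in tag_dict[w]}
     for i, w in enumerate(sentence)}
# g[(t1, t2)] = 1 iff tag-bigram type (t1, t2) is licensed by the grammar
candidates = {(t1, t2)
              for i in range(len(sentence) - 1)
              for t1 in tag_dict[sentence[i]]
              for t2 in tag_dict[sentence[i + 1]]}
g = {b: pulp.LpVariable(f"g_{b[0]}_{b[1]}", cat="Binary") for b in candidates}

prob += pulp.lpSum(g.values())            # objective: as few bigram types as possible
for i, w in enumerate(sentence):          # every token gets exactly one dictionary tag
    prob += pulp.lpSum(x[i].values()) == 1
for i in range(len(sentence) - 1):        # a used bigram type must be licensed
    for t1 in tag_dict[sentence[i]]:
        for t2 in tag_dict[sentence[i + 1]]:
            prob += x[i][t1] + x[i + 1][t2] - 1 <= g[(t1, t2)]

prob.solve()
print([max(x[i], key=lambda t: x[i][t].value()) for i in range(len(sentence))])
```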
bigram is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Hall, David and Klein, Dan
Experiments
We follow prior work and use sets of bigrams within words.
Experiments
In our case, during bipartite matching the set X is the set of bigrams in the language being re-permuted, and Y is the union of bigrams in the other languages.
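As a hedged sketch of such a matching step (the words, the character-overlap cost, and the use of SciPy's assignment solver are illustrative assumptions, not the authors' model), the bigrams of one word form can be aligned against the union of bigrams from its counterparts in other languages:

```python
# Hedged sketch: minimum-cost bipartite matching between two sets of character bigrams.
import numpy as np
from scipy.optimize import linear_sum_assignment

def bigrams(word):
    return ["".join(p) for p in zip(word, word[1:])]

X = bigrams("grupo")                                          # language being re-permuted
Y = sorted(set(bigrams("gruppo")) | set(bigrams("groupe")))   # union over other languages

# Toy cost: 0 for identical bigrams, 1 if they share a character, 2 otherwise.
cost = np.array([[0 if x == y else (1 if set(x) & set(y) else 2) for y in Y] for x in X])

rows, cols = linear_sum_assignment(cost)   # Hungarian-style assignment
for r, c in zip(rows, cols):
    print(X[r], "->", Y[c], "cost", cost[r, c])
```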
Experiments
Besides the heuristic baseline, we tried our model-based approach using Unigrams, Bigrams and Anchored Unigrams, with and without learning the parametric edit distances.
Message Approximation
Figure 2: Various topologies for approximating topologies: (a) a unigram model, (b) a bigram model, (c) the anchored unigram model, and (d) the n-best plus backoff model used in Dreyer and Eisner (2009).
Message Approximation
The first is a plain unigram model, the second is a bigram model, and the third is an anchored unigram topology: a position-specific unigram model for each position up to some maximum length.
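For concreteness, a minimal sketch of the anchored-unigram idea (plain position-wise counters over toy words, not the paper's automaton machinery):

```python
# Hedged sketch: an anchored unigram model keeps a separate character distribution
# for each position, up to some maximum length.
from collections import Counter

words = ["cat", "cart", "car", "cab"]
MAX_LEN = 5

position_counts = [Counter() for _ in range(MAX_LEN)]
for w in words:
    for i, ch in enumerate(w[:MAX_LEN]):
        position_counts[i][ch] += 1

def p_char_at(ch, i):
    total = sum(position_counts[i].values())
    return position_counts[i][ch] / total if total else 0.0

print(p_char_at("a", 1))   # every toy word has 'a' in position 1, so 1.0
```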
Message Approximation
The second topology we consider is the bigram topology, illustrated in Figure 2(b).
bigram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bicknell, Klinton and Levy, Roger
Simulation 1
Our reader’s language model was an unsmoothed bigram model created using a vocabulary set con-
Simulation 1
From this vocabulary, we constructed a bigram model using the counts from every bigram in the BNC for which both words were in vocabulary (about 222,000 bigrams ).
Simulation 1
Specifically, we constructed the model’s initial belief state (i.e., the distribution over sentences given by its language model) by directly translating the bigram model into a wFSA in the log semiring.
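A minimal sketch of that kind of translation (a plain dict-based automaton is assumed here rather than any particular wFSA toolkit): each context word becomes a state, and each in-vocabulary bigram becomes an arc weighted by its negative log probability.

```python
# Hedged sketch: turn bigram counts into a weighted FSA in the (negative) log semiring.
import math
from collections import defaultdict

bigram_counts = {("the", "cat"): 20, ("the", "dog"): 10, ("cat", "sat"): 5}

totals = defaultdict(int)                  # unsmoothed p(w2 | w1) from raw counts
for (w1, _), c in bigram_counts.items():
    totals[w1] += c

wfsa = defaultdict(dict)                   # {state: {next_word: arc_weight}}
for (w1, w2), c in bigram_counts.items():
    wfsa[w1][w2] = -math.log(c / totals[w1])   # log-semiring arc weight

def path_weight(words):
    """Total path weight, i.e. the negative log probability of the word sequence."""
    return sum(wfsa[w1][w2] for w1, w2 in zip(words, words[1:]))

print(path_weight(["the", "cat", "sat"]))  # -log p(cat|the) - log p(sat|cat)
```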
Simulation 2
Instead, we begin with the same set of bigrams used in Sim.
Simulation 2
1 — i.e., those that contain two in-vocabulary words — and trim this set by removing rare bigrams that occur less than 200 times in the BNC (except that we do not trim any bigrams that occur in our test corpus).
Simulation 2
This reduces our set of bigrams to about 19,000.
bigram is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Lavergne, Thomas and Cappé, Olivier and Yvon, François
Conditional Random Fields
In the sequel, we distinguish between two types of feature functions: unigram features $f_{y,x}$, associated with parameters $\mu_{y,x}$, and bigram features $f_{y',y,x}$, associated with parameters $\lambda_{y',y,x}$.
Conditional Random Fields
On the other hand, bigram features $\{f_{y',y,x}\}_{(y',y,x)\in\mathcal{Y}^2\times\mathcal{X}}$ are helpful in modelling dependencies between successive labels.
Conditional Random Fields
Assume the set of bigram features $\{\lambda_{y',y,x_{t+1}}\}_{(y',y)\in\mathcal{Y}^2}$ is sparse, with only $r(x_{t+1}) \ll |\mathcal{Y}|^2$ non-null values, and define the $|\mathcal{Y}| \times |\mathcal{Y}|$ sparse matrix
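A hedged sketch of how such a sparse label-pair matrix might be assembled (SciPy's COO format and the toy label set are illustrative choices, not the paper's implementation):

```python
# Hedged sketch: a |Y| x |Y| sparse matrix of exponentiated bigram weights for the
# current observation, when only a few lambda_{y',y,x_{t+1}} values are non-null.
import numpy as np
from scipy.sparse import coo_matrix

labels = ["B", "I", "O"]
idx = {y: i for i, y in enumerate(labels)}

# Sparse bigram weights for x_{t+1}: most (y', y) pairs are absent.
lam = {("O", "B"): 1.2, ("B", "I"): 0.7, ("I", "I"): 0.3}

rows = [idx[yp] for (yp, y) in lam]
cols = [idx[y] for (yp, y) in lam]
vals = [np.exp(v) for v in lam.values()]

M = coo_matrix((vals, (rows, cols)), shape=(len(labels), len(labels)))
print(M.toarray())
```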
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Navigli, Roberto and Velardi, Paola
Experiments
• Bigrams: an implementation of the bigram classifier for soft pattern matching proposed by Cui et al.
Experiments
The probability is calculated as a mixture of bigram and
Experiments
WCL-1: 99.88  42.09  59.22  76.06
WCL-3: 98.81  60.74  75.23  83.48
Star patterns: 86.74  66.14  75.05  81.84
Bigrams: 66.70  82.70  73.84  75.80
bigram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kaji, Nobuhiro and Fujiwara, Yasuhiro and Yoshinaga, Naoki and Kitsuregawa, Masaru
Introduction
where we explicitly distinguish the unigram feature function $\phi^u_k$ and bigram feature function $\phi^b_k$. Comparing the form of the two functions, we can see that our discussion on HMMs can be extended to perceptrons by substituting $\sum_k w^u_k \phi^u_k(x, y_n)$ and $\sum_k w^b_k \phi^b_k(x, y_{n-1}, y_n)$ for $\log p(x_n \mid y_n)$ and $\log p(y_n \mid y_{n-1})$.
Introduction
For bigram features, we compute its upper bound offline.
Introduction
The simplest case is that the bigram features are independent of the token sequence :13.
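A hedged sketch of how such an offline bound could be tabulated when the bigram features ignore the token sequence (the feature weights below are invented, and the zero default assumes absent features contribute nothing):

```python
# Hedged sketch: when bigram features depend only on the label pair (y_prev, y),
# the best possible bigram score into each label can be precomputed once, offline.
bigram_weights = {
    ("B", "I"): 1.5, ("I", "I"): 0.4, ("O", "B"): 0.9, ("I", "O"): -0.2,
}
labels = {y for pair in bigram_weights for y in pair}

# upper_bound[y] bounds the transition contribution into y without enumerating y_prev.
upper_bound = {
    y: max((w for (yp, yy), w in bigram_weights.items() if yy == y), default=0.0)
    for y in labels
}
print(upper_bound)
```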
bigram is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Experiments and Discussions
We use R-1 (recall against unigrams), R-2 (recall against bigrams), and R-SU4 (recall against skip-4 bigrams).
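For concreteness, a small sketch of bigram recall in the spirit of R-2 (simplified: a single reference, no stemming or stopword handling, and clipping by reference counts is an assumption):

```python
# Hedged sketch: R-2-style bigram recall of a candidate summary against one reference.
from collections import Counter

def bigram_counts(tokens):
    return Counter(zip(tokens, tokens[1:]))

def bigram_recall(candidate, reference):
    cand, ref = bigram_counts(candidate), bigram_counts(reference)
    overlap = sum(min(c, ref[b]) for b, c in cand.items())
    return overlap / max(sum(ref.values()), 1)

reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
print(bigram_recall(candidate, reference))   # 3 of 5 reference bigrams matched -> 0.6
```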
Experiments and Discussions
Note that R-2 is a measure of bigram recall and sumHLDA of HybHSum2 is built on unigrams rather than bigrams.
Regression Model
We similarly include bigram features in the experiments.
Regression Model
We also include bigram extensions of DMF features.
Regression Model
We use sentence bigram frequency, sentence rank in a document, and sentence size as additional fea-
bigram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Clustering-based word representations
The Brown algorithm is a hierarchical clustering algorithm which clusters words to maximize the mutual information of bigrams (Brown et al., 1992).
Clustering-based word representations
So it is a class-based bigram language model.
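A minimal sketch of the class-based bigram factorization referred to here (the toy corpus and the fixed word-to-class map are assumptions; the Brown algorithm itself additionally searches for the clustering that maximizes the mutual information of adjacent classes):

```python
# Hedged sketch: class-based bigram model, p(w2 | w1) = p(c2 | c1) * p(w2 | c2).
from collections import Counter

corpus = "the cat sat the dog sat the cat ran".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}

classes = [word2class[w] for w in corpus]
class_bigrams = Counter(zip(classes, classes[1:]))
class_counts = Counter(classes)
word_counts = Counter(corpus)

def p_bigram(w1, w2):
    c1, c2 = word2class[w1], word2class[w2]
    return (class_bigrams[(c1, c2)] / class_counts[c1]) * (word_counts[w2] / class_counts[c2])

print(p_bigram("the", "cat"))
```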
Clustering-based word representations
One downside of Brown clustering is that it is based solely on bigram statistics, and does not consider word usage in a wider context.
bigram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Laskowski, Kornel
Experiments
Excluding $q_{t-1} = q_t$ bigrams (leading to 0.32M frames from 2.39M frames in "all") offers a glimpse of expected performance differences were duration modeling to be included in the models.
Limitations and Desiderata
To produce Figures 1 and 2, a small fraction of probability mass was reserved for unseen bigram transitions (as opposed to backing off to unigram probabilities).
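A hedged sketch of that kind of smoothing (the states, counts, and reserved-mass constant are illustrative values): each state's observed transitions are rescaled so that a small fixed share of probability is kept for transitions never seen in training, rather than backing off to unigram estimates.

```python
# Hedged sketch: reserve a small fraction of probability mass for unseen bigram
# transitions out of each state, instead of backing off to unigram probabilities.
states = ["S0", "S1", "S2"]
bigram_counts = {("S0", "S0"): 8, ("S0", "S1"): 2, ("S1", "S2"): 4}
EPSILON = 0.01   # total mass reserved per state for unseen transitions (assumed value)

trans = {}
for prev in states:
    seen = {nxt: c for (p, nxt), c in bigram_counts.items() if p == prev}
    unseen = [nxt for nxt in states if nxt not in seen]
    total = sum(seen.values())
    row = {}
    if not seen:
        for nxt in states:                # nothing observed: spread mass uniformly
            row[nxt] = 1.0 / len(states)
    else:
        reserved = EPSILON if unseen else 0.0
        for nxt, c in seen.items():       # seen transitions keep (1 - reserved) of the mass
            row[nxt] = (1.0 - reserved) * c / total
        for nxt in unseen:                # unseen transitions split the reserved mass
            row[nxt] = reserved / len(unseen)
    trans[prev] = row

print(trans["S0"])
```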
The Extended-Degree-of-Overlap Model
The EDO model mitigates R-specificity because it models each bigram $(q_{t-1}, q_t) = (S_i, S_j)$ as the modified bigram $(n_i, [o_{ij}, n_j])$, involving three scalars each of which is a sum, a commutative (and therefore rotation-invariant) operation.
bigram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Shutova, Ekaterina
Automatic Metaphor Recognition
They use the hyponymy relation in WordNet and word bigram counts to predict metaphors at a sentence level.
Automatic Metaphor Recognition
In doing so, they calculate bigram probabilities of verb-noun and adjective-noun pairs (including the hyponyms/hypernyms of the noun in question).
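A hedged sketch of that style of lookup (the bigram-count table is fabricated for illustration, and NLTK's WordNet interface stands in for whatever resources the cited system used):

```python
# Hedged sketch: score a verb-noun pair by bigram counts of the pair and of the
# noun's WordNet hypernyms/hyponyms. Requires the NLTK WordNet data to be installed.
from nltk.corpus import wordnet as wn

bigram_counts = {("devour", "book"): 1, ("devour", "publication"): 3}   # toy counts

def related_nouns(noun):
    related = {noun}
    for syn in wn.synsets(noun, pos=wn.NOUN):
        for rel in syn.hypernyms() + syn.hyponyms():
            related.update(l.replace("_", " ") for l in rel.lemma_names())
    return related

def pair_score(verb, noun):
    return sum(bigram_counts.get((verb, n), 0) for n in related_nouns(noun))

print(pair_score("devour", "book"))   # counts for "book" plus its hypernym "publication"
```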
Automatic Metaphor Recognition
However, by using bigram counts over verb-noun pairs, Krishnakumaran and Zhu (2007) lose a great deal of information compared to a system extracting verb-object relations from parsed text.
bigram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: