Index of papers in Proc. ACL 2009 that mention
  • n-grams
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
Unigrams are produced by lexical rules, while higher-order n-grams can be produced either directly by lexical rules, or by combining constituents.
Computing Feature Expectations
The n-gram language model score of e similarly decomposes over the h in e that produce n-grams.
Computing Feature Expectations
The linear similarity measure takes the following form, where Tn is the set of n-grams:
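Since the formula itself is not reproduced in this excerpt, here is a minimal sketch of one plausible linear similarity of this shape: a weighted sum, over n-gram orders, of a hypothesis's n-gram counts matched against expected counts taken from the translation forest. The weight names and the expected_counts input are assumptions for illustration, not the paper's exact definition.

```python
from collections import Counter

def ngrams(tokens, n):
    """All order-n n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def linear_similarity(hyp_tokens, expected_counts, weights):
    """Weighted sum over n-gram orders of matches between the hypothesis's
    n-gram counts and expected counts E[c(e, t)] from a translation forest.

    hyp_tokens      -- hypothesis as a list of tokens
    expected_counts -- dict: n-gram tuple -> expected count (assumed given)
    weights         -- dict: order n -> weight theta_n (assumed parameterization)
    """
    score = 0.0
    for n, theta_n in weights.items():
        for t, c in Counter(ngrams(hyp_tokens, n)).items():
            score += theta_n * c * expected_counts.get(t, 0.0)
    return score
```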
Experimental Results
Above, we compare the precision, relative to reference translations, of sets of n-grams chosen in two ways.
Experimental Results
The left bar is the precision of the n-grams in e*.
Experimental Results
The right bar is the precision of n-grams with E[c(e, t)] > ρ. To justify this comparison, we chose ρ so that both methods of choosing n-grams gave the same n-gram recall: the fraction of n-grams in reference translations that also appeared in e* or had E[c(e, t)] > ρ.
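A minimal sketch of the comparison just described, assuming the two sets are (a) the n-grams of a single best translation e* and (b) all n-grams whose expected count exceeds a threshold ρ; the helper names are illustrative.

```python
def ngram_set(tokens, max_n=4):
    """All n-grams of orders 1..max_n occurring in a token sequence."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def precision_recall(chosen, reference_ngrams):
    """Precision and recall of a chosen n-gram set against reference n-grams."""
    hits = len(chosen & reference_ngrams)
    precision = hits / len(chosen) if chosen else 0.0
    recall = hits / len(reference_ngrams) if reference_ngrams else 0.0
    return precision, recall

# The two ways of choosing n-grams compared above (illustrative variable names):
#   viterbi_set  = ngram_set(e_star_tokens)
#   expected_set = {t for t, c in expected_counts.items() if c > rho}
# rho would be tuned so that both sets reach the same recall before their
# precision is compared.
```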
n-grams is mentioned in 12 sentences in this paper.
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
MERT for MBR Parameter Optimization
This linear function contains N + 1 parameters θ0, θ1, ..., θN, where N is the maximum order of the n-grams involved.
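For context, the linear function referred to here is a Tromble et al. (2008)-style gain of the form G(e, e') = θ0·|e'| + Σ_{n=1..N} θn·#n(e, e'). The sketch below is a hedged reading of that shape; clipped co-occurrence counts stand in for the paper's exact match statistic #n, and all names are illustrative.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Counts of order-n n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def linear_gain(e_tokens, e_prime_tokens, theta):
    """Linear gain theta[0]*|e'| + sum_n theta[n] * match_n(e, e') with
    N + 1 parameters theta[0..N]; match_n is a clipped co-occurrence count,
    used here as a stand-in for the paper's exact definition."""
    gain = theta[0] * len(e_prime_tokens)
    for n in range(1, len(theta)):
        e_c = ngram_counts(e_tokens, n)
        ep_c = ngram_counts(e_prime_tokens, n)
        gain += theta[n] * sum(min(c, ep_c[t]) for t, c in e_c.items())
    return gain
```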
Minimum Bayes-Risk Decoding
First, the set of n-grams is extracted from the lattice.
Minimum Bayes-Risk Decoding
For a moderately large lattice, there can be several thousand n-grams, and the procedure becomes expensive.
Minimum Bayes-Risk Decoding
For each node t in the lattice, we maintain a quantity Score(w, t) for each n-gram w that lies on a path from the source node to t. Score(w, t) is the highest posterior probability among all edges on the paths that terminate on t and contain n-gram w. The forward pass requires computing the n-grams introduced by each edge; to do this, we propagate n-grams (up to maximum order − 1) terminating on each node.
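A simplified sketch of the forward pass described above, assuming a topologically sorted word lattice with per-edge posteriors. To stay short it records, for each n-gram w reaching node t, only the posterior of the edge that completes w, rather than the full Score(w, t) bookkeeping of the paper; all names are illustrative.

```python
from collections import defaultdict

def ngram_forward_pass(nodes, edges, edge_posterior, max_order=4):
    """Enumerate the n-grams introduced by each lattice edge.

    nodes          -- node ids in topological order (nodes[0] is the source)
    edges          -- list of (u, v, word) triples
    edge_posterior -- dict: (u, v, word) -> posterior probability of that edge
    Returns score[(w, t)]: the best posterior of an edge that completes
    n-gram w on a path ending at node t (a simplification of Score(w, t)).
    """
    out = defaultdict(list)
    for e in edges:
        out[e[0]].append(e)

    suffix = defaultdict(set)   # n-grams of order < max_order ending at each node
    score = defaultdict(float)

    for u in nodes:
        for e in out[u]:
            _, v, word = e
            p = edge_posterior[e]
            # contexts ending at u; the empty context yields the unigram (word,)
            for ctx in suffix[u] | {()}:
                w = ctx + (word,)            # n-gram introduced by this edge
                if score[(w, v)] < p:
                    score[(w, v)] = p
                if len(w) < max_order:       # propagate as context for later edges
                    suffix[v].add(w)
    return dict(score)
```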
n-grams is mentioned in 6 sentences in this paper.
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
Here we do not discriminate among different lexical n-grams and are only concerned with aggregating statistics over all n-grams of the same order.
Collaborative Decoding
Gn+(e, e') is the n-gram agreement measure function, which counts the number of occurrences in e' of n-grams in e.
Collaborative Decoding
So the corresponding feature value will be the expected number of occurrences in Hk(f) of all n-grams in e:
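A minimal sketch of this feature, assuming the other decoder's hypothesis space Hk(f) is approximated by an n-best list with normalized posterior probabilities; function and variable names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """All order-n n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def expected_agreement(e_tokens, hyp_list, n):
    """Expected number of occurrences, over the other decoder's hypotheses,
    of the order-n n-grams of e.

    hyp_list -- list of (tokens, posterior) pairs standing in for Hk(f)
    """
    hyp_counts = [(Counter(ngrams(toks, n)), post) for toks, post in hyp_list]
    value = 0.0
    for t in ngrams(e_tokens, n):
        for counts, posterior in hyp_counts:
            value += posterior * counts[t]
    return value
```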
Discussion
Our method uses agreement information of n-grams, and consensus features are integrated into decoding models.
Experiments
In Table 5 we show in another dimension the impact of consensus-based features by restricting the maximum order of n-grams used to compute agreement statistics.
Experiments
One reason could be that the data sparsity for high-order n-grams leads to overfitting on development data.
n-grams is mentioned in 6 sentences in this paper.
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Variational Approximate Decoding
whose edges correspond to n-grams (weighted with negative log-probabilities) and whose vertices correspond to (n − 1)-grams.
Variational Approximate Decoding
This may be regarded as favoring n-grams that are likely to appear in the reference translation (because they are likely in the derivation forest).
Variational Approximate Decoding
However, in order to score well on the BLEU metric for MT evaluation (Papineni et al., 2001), which gives partial credit, we would also like to favor lower-order n-grams that are likely to appear in the reference, even if this means picking some less-likely high-order n-grams.
Variational vs. Min-Risk Decoding
Now, let us divide N, which contains n-gram types of different n, into several subsets Wn, each of which contains only the n-grams with a given length n. We can now rewrite (19) as follows,
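A hedged sketch of building such an n-gram variational approximation. The paper computes expected n-gram counts over the whole derivation forest by dynamic programming; here a posterior-weighted k-best list stands in for those expectations to keep the example short, and the boundary symbols and function names are assumptions.

```python
from collections import defaultdict

def fit_variational_ngram(kbest, n=3):
    """Estimate q(w | history) = E[c(history + w)] / E[c(history)] from
    posterior-weighted hypotheses (a k-best stand-in for forest expectations).

    kbest -- list of (tokens, posterior) pairs
    Returns a dict mapping n-gram tuples to conditional probabilities.
    """
    num = defaultdict(float)   # expected counts of order-n n-grams
    den = defaultdict(float)   # expected counts of (n-1)-gram histories
    for tokens, posterior in kbest:
        padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
        for i in range(n - 1, len(padded)):
            history = tuple(padded[i - n + 1:i])
            num[history + (padded[i],)] += posterior
            den[history] += posterior
    return {w: c / den[w[:-1]] for w, c in num.items()}
```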
n-grams is mentioned in 4 sentences in this paper.
Ceylan, Hakan and Kim, Yookyung
Introduction
[Figure legend: Rank Order using words; Monte Carlo using N-grams; Monte Carlo using words; Relative Entropy using N-grams; Relative Entropy using words]
Related Work
The n-gram-based approaches are based on the counts of character or byte n-grams, which are sequences of n characters or bytes, extracted from a corpus for each reference language.
Related Work
(Dunning, 1994) proposed a system that uses Markov Chains of byte n-grams with Bayesian Decision Rules to minimize the probability of error.
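A minimal sketch of the character n-gram profiling these approaches rely on, paired with a relative entropy score like the one named in the figure legend above; the smoothing floor and function names are assumptions, not any paper's exact method.

```python
import math
from collections import Counter

def char_ngram_profile(text, n=3):
    """Relative frequencies of character n-grams in a piece of text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def relative_entropy(doc_profile, lang_profile, floor=1e-9):
    """KL divergence of the document profile from a reference-language profile,
    with a small floor standing in for proper smoothing of unseen n-grams."""
    return sum(p * math.log(p / lang_profile.get(g, floor))
               for g, p in doc_profile.items())

# Illustrative use: choose the reference language whose corpus profile
# minimizes the divergence from the document's profile.
# best = min(lang_profiles, key=lambda lang: relative_entropy(
#     char_ngram_profile(doc_text), lang_profiles[lang]))
```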
n-grams is mentioned in 3 sentences in this paper.
Song, Young-In and Lee, Jung-Tae and Rim, Hae-Chang
Introduction
(Miller et al., 1999; Song and Croft, 1999) explore the use of n-grams in retrieval models.
Previous Work
Subsequently, various types of phrases, such as sequential n-grams (Mitra et al., 1997), head-modifier pairs extracted from syntactic structures (Lewis and Croft, 1990; Zhai, 1997; Dillon and Gray, 1983; Strzalkowski et al., 1994), and proximity-based phrases (Turpin and Moffat, 1999), were examined with conventional retrieval models (e.g., the vector space model).
Previous Work
(Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of the language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases.
n-grams is mentioned in 3 sentences in this paper.