Index of papers in Proc. ACL 2008 that mention
  • n-gram
Chan, Yee Seng and Ng, Hwee Tou
Automatic Evaluation Metrics
number of n-gram matches of the system translation against one or more reference translations.
Automatic Evaluation Metrics
Generally, more n-gram matches result in a higher BLEU score.
Automatic Evaluation Metrics
When determining the matches to calculate precision, BLEU uses a modified, or clipped n-gram precision.
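As a rough illustration of the clipped n-gram precision described above, here is a minimal Python sketch; the function names and the toy example are ours, and full BLEU additionally combines several n-gram orders and applies a brevity penalty.

from collections import Counter

def ngrams(tokens, n):
    # All n-grams (as tuples) in a token sequence.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    # Modified (clipped) n-gram precision: each candidate n-gram is
    # credited at most as many times as it occurs in any one reference.
    cand_counts = Counter(ngrams(candidate, n))
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return clipped / total if total else 0.0

# Toy example: the three occurrences of "the" are clipped to the two in the reference.
candidate = "the the the cat".split()
reference = "the cat sat on the mat".split()
print(clipped_precision(candidate, [reference], 1))  # 3/4 = 0.75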
Metric Design Considerations
4.1 Using N-gram Information
Metric Design Considerations
Lemma and POS match Representing each n-gram by its sequence of lemma and POS-tag pairs, we first try to perform an exact match in both lemma and POS-tag.
Metric Design Considerations
In all our n-gram matching, each n-gram in the system translation can only match at most one n-gram in the reference translation.
n-gram is mentioned in 14 sentences in this paper.
Talbot, David and Brants, Thorsten
Abstract
The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy-pruning.
Introduction
LMs are usually implemented as n-gram models parameterized for each distinct sequence of up to n words observed in the training corpus.
Introduction
Recent work (Talbot and Osborne, 2007b) has demonstrated that randomized encodings can be used to represent n-gram counts for LMs with significant space-savings, circumventing information-theoretic constraints on lossless data structures by allowing errors with some small probability.
Introduction
Lookup is very efficient: the values of 3 cells in a large array are combined with the fingerprint of an n-gram.
Perfect Hash-based Language Models
The model can erroneously return a value for an n-gram that was never actually stored, but will always return the correct value for an n-gram that is in the model.
Perfect Hash-based Language Models
Each n-gram x_i is drawn from some set of possible n-grams U and its associated value from a corresponding set of possible values V.
Perfect Hash-based Language Models
We do not store the n-grams and their probabilities directly but rather encode a fingerprint of each n-gram f(x_i) together with its associated value in such a way that the value can be retrieved when the model is queried with the n-gram x_i.
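The lookup and error behaviour described in these excerpts can be illustrated with a small Bloomier-filter-style sketch. This is a simplified toy, not the paper's actual construction: the class and helper names, the array sizing, the peeling-based build, and the 16-bit fingerprint are all assumptions made for illustration.

import collections
import hashlib

def _cells_and_fp(key, m, seed):
    # Three distinct cell indices in [0, m) plus a 16-bit fingerprint,
    # all derived from a salted hash of the n-gram (illustrative only).
    cells, counter = [], 0
    while len(cells) < 3:
        digest = hashlib.sha256(f"{seed}:{counter}:{key}".encode()).digest()
        for i in range(0, 32, 4):
            c = int.from_bytes(digest[i:i + 4], "big") % m
            if c not in cells:
                cells.append(c)
                if len(cells) == 3:
                    break
        counter += 1
    fp = int.from_bytes(hashlib.sha256(f"fp:{seed}:{key}".encode()).digest()[:2], "big")
    return cells, fp

class TinyRandomizedStore:
    # Bloomier-filter-style key/value store: a query combines three array
    # cells with the key's fingerprint.  Stored n-grams always get their
    # value back; unseen n-grams get an essentially random value instead
    # of a lookup failure, which is the error the excerpts describe.
    def __init__(self, items):
        self.m = max(8, int(1.3 * len(items)) + 3)
        for seed in range(200):          # retry hashing until peeling works
            if self._build(items, seed):
                self.seed = seed
                return
        raise RuntimeError("construction failed; enlarge the array")

    def _build(self, items, seed):
        meta = {k: _cells_and_fp(k, self.m, seed) for k in items}
        touching = collections.defaultdict(set)
        for k, (cells, _) in meta.items():
            for c in cells:
                touching[c].add(k)
        # Peel keys that own a cell no other remaining key touches.
        stack, removed = [], set()
        queue = [c for c, keys in touching.items() if len(keys) == 1]
        while queue:
            c = queue.pop()
            if len(touching[c]) != 1:
                continue
            (k,) = touching[c]
            stack.append((k, c))
            removed.add(k)
            for c2 in meta[k][0]:
                touching[c2].discard(k)
                if len(touching[c2]) == 1:
                    queue.append(c2)
        if len(removed) != len(items):
            return False
        # Assign cells in reverse peel order so that, for every stored key,
        # the XOR of its three cells equals fingerprint XOR value.
        self.table = [0] * self.m
        for k, free in reversed(stack):
            cells, fp = meta[k]
            acc = fp ^ items[k]
            for c in cells:
                if c != free:
                    acc ^= self.table[c]
            self.table[free] = acc
        return True

    def get(self, key):
        cells, fp = _cells_and_fp(key, self.m, self.seed)
        value = fp
        for c in cells:
            value ^= self.table[c]
        return value

# Toy usage: store small quantized values (e.g. bucketed log-probabilities).
lm = {"the cat": 7, "cat sat": 3, "sat on": 5, "on the": 6, "the mat": 4}
store = TinyRandomizedStore(lm)
print(store.get("the cat"))      # always 7 for a stored n-gram
print(store.get("purple cow"))   # arbitrary value: a possible false hit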
Scaling Language Models
These are typically n-gram models that approximate the probability of a word sequence by assuming each token to be independent of all but n-1 preceding tokens.
Scaling Language Models
Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space-dependency on both the n-gram order and the vocabulary size.
Scaling Language Models
While the approach results in significant space savings, working with corpus statistics, rather than n-gram probabilities directly, is computationally less efficient (particularly in a distributed setting) and introduces a dependency on the smoothing scheme used.
n-gram is mentioned in 36 sentences in this paper.
Zhang, Hao and Gildea, Daniel
Abstract
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
Decoding to Maximize BLEU
BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU.
Introduction
This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
Language Model Integrated Decoding for SCFG
We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
Language Model Integrated Decoding for SCFG
Without an n-gram language model, decoding using SCFG is not much different from CFG parsing.
Language Model Integrated Decoding for SCFG
However, when we want to integrate an n-gram language model into the search, our goal is searching for the derivation whose total sum of weights of productions and n-gram log probabilities is maximized.
Multi-pass LM-Integrated Decoding
We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n-1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding
We can make the approximation even closer by taking local higher-order outside n-gram information for a state X[i, j, u_{1..n-1}, v_{1..n-1}] into account.
Multi-pass LM-Integrated Decoding
where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
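Rendered in standard inside-outside notation (our rendering of the figure of merit this sentence describes, not a formula copied from the paper), the priority of a state x on the agenda is the sum of the two costs:

    \mathrm{merit}(x) \;=\; \beta(x) + \alpha(x)

where \beta(x) is the Viterbi inside cost of x and \alpha(x) is the Viterbi outside cost estimated from the lower-order pass.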
n-gram is mentioned in 12 sentences in this paper.
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Evaluation
We smooth by adding 40 to all counts, equal to the minimum count in the n-gram data.
Introduction
For example, in our n-gram collection (Section 3.4), “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern.
Methodology
We gather pattern fillers from a large collection of n-gram frequencies.
Methodology
We do the same processing to our n-gram corpus.
Methodology
Also, other pronouns in the pattern are allowed to match a corresponding pronoun in an n-gram, regardless of differences in inflection and class.
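A rough Python sketch of the smoothed filler-count comparison described in these excerpts; the pattern notation and helper names are ours, while the two counts and the add-40 smoothing are the figures quoted above.

# Hypothetical slice of a large n-gram count collection.
NGRAM_COUNTS = {
    "make it in advance": 442,
    "make them in advance": 449,
}

SMOOTHING = 40  # add the collection's minimum count to every filler count

def filler_counts(pattern, fillers):
    # Smoothed counts for each candidate filler of a pattern whose
    # pronoun slot is written as "_".
    return {f: NGRAM_COUNTS.get(pattern.replace("_", f), 0) + SMOOTHING
            for f in fillers}

print(filler_counts("make _ in advance", ["it", "them"]))
# {'it': 482, 'them': 489}: roughly equal counts for "it" and "them"
# suggest a referential rather than pleonastic pattern.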
n-gram is mentioned in 9 sentences in this paper.
Espinosa, Dominic and White, Michael and Mehay, Dennis
Background
makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class.
Background
In the anytime mode, a best-first search is performed with a configurable time limit: the scores assigned by the n-gram model determine the order of the edges on the agenda, and thus have an impact on realization speed.
Background
the one that covers the most elementary predications in the input logical form, with ties broken according to the n-gram score.
The Approach
Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model.
n-gram is mentioned in 6 sentences in this paper.
Lee, John and Seneff, Stephanie
Abstract
To improve recall, irregularities in parse trees caused by verb form errors are taken into account; to improve precision, n-gram counts are utilized to filter proposed corrections.
Experiments
For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM
Experiments
Some kind of confidence measure on the n-gram counts might be appropriate for reducing such false alarms.
Introduction
To improve recall, irregularities in parse trees caused by verb form errors are considered; to improve precision, n-gram counts are utilized to filter proposed corrections.
Research Issues
We propose using n-gram counts as a filter to counter this kind of overgeneralization.
Research Issues
A second goal is to show that n-gram counts can effectively serve as a filter, in order to increase precision.
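A minimal sketch of such an n-gram-count filter, assuming a count lookup and a simple accept-if-more-frequent criterion; the excerpt above is cut off before the paper's exact condition, so both the lookup and the comparison below are illustrative assumptions.

def ngram_count(ngram):
    # Hypothetical lookup into a large n-gram count collection;
    # unseen n-grams get a count of zero.
    toy_counts = {
        "has go to": 40,
        "has gone to": 250_000,
    }
    return toy_counts.get(ngram, 0)

def accept_correction(original_ngram, corrected_ngram):
    # Allow a proposed verb-form correction only if the corrected
    # n-gram is attested more strongly than the original one.
    return ngram_count(corrected_ngram) > ngram_count(original_ngram)

print(accept_correction("has go to", "has gone to"))  # True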
n-gram is mentioned in 6 sentences in this paper.
Uszkoreit, Jakob and Brants, Thorsten
Class-Based Language Modeling
Generalizing this leads to arbitrary order class-based n-gram models of the form:
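The equation following this sentence is not included in these excerpts; the textbook form of an order-n class-based model it refers to (our rendering, which may differ in notation from the paper) is

    P(w_i \mid w_{i-n+1}^{i-1}) \;\approx\; P(w_i \mid c_i)\, P(c_i \mid c_{i-n+1}^{i-1})

where c_i denotes the class of word w_i.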
Introduction
In the case of n-gram language models this is done by factoring the probability:
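The factorization referred to here is the standard n-gram chain-rule approximation (again our rendering, not the paper's exact equation):

    P(w_1^N) \;=\; \prod_{i=1}^{N} P(w_i \mid w_1^{i-1}) \;\approx\; \prod_{i=1}^{N} P(w_i \mid w_{i-n+1}^{i-1})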
Introduction
do not differ in the last n-1 words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities P(w_i | w_{i-n+1}^{i-1}).
Introduction
However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed with mixed success (Raab, 2006).
n-gram is mentioned in 6 sentences in this paper.
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Experiments
In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, HS denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction
In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation.
Our Method
For target features, the n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure word
Our Method
Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position.
Source features: MW-HW collocation, surrounding words, source head word, POS tags.
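A rough sketch of the windowed language-model feature listed above; the window handling and the ngram_prob interface are illustrative assumptions rather than the paper's implementation.

import math

def lm_window_score(window_tokens, ngram_prob, n=3):
    # Sum of log n-gram probabilities over the target-side window around
    # a candidate measure word; ngram_prob(context, word) is an assumed
    # language-model interface returning P(word | context).
    total = 0.0
    for i, word in enumerate(window_tokens):
        context = tuple(window_tokens[max(0, i - n + 1):i])
        total += math.log(ngram_prob(context, word))
    return total

# Toy usage with a uniform dummy model.
print(lm_window_score("three cups of tea".split(), lambda ctx, w: 0.1))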
n-gram is mentioned in 4 sentences in this paper.
Andreevskaia, Alina and Bergler, Sabine
Experiments
Due to lower frequency of higher-order n-grams (as opposed to unigrams), higher-order n-gram language models are more sparse, which increases the probability of missing a particular sentiment marker in a sentence (Table 3).
Experiments
training sets are required to overcome this higher n-gram sparseness in sentence-level annotation.
Experiments
results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
n-gram is mentioned in 3 sentences in this paper.
Avramidis, Eleftherios and Koehn, Philipp
Experiments
The NIST metric clearly shows a significant improvement, because it mostly measures difficult n-gram matches (e.g. due to the long-distance rules we have been dealing with).
Experiments
In n-gram based metrics, the scores for all words are equally weighted, so mistakes on crucial sentence constituents may be penalized the same as errors on redundant or meaningless words (Callison-Burch et al., 2006).
Introduction
Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
n-gram is mentioned in 3 sentences in this paper.
Bartlett, Susan and Kondrak, Grzegorz and Cherry, Colin
Related Work
Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system.
Syllabification with Structured SVMs
In addition to these primary n-gram features, we experimented with linguistically-derived features.
Syllabification with Structured SVMs
We believe that this is caused by the ability of the SVM to learn such generalizations from the n-gram features alone.
n-gram is mentioned in 3 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
Features
Trying to find phrase translations for any possible n-gram is not a good idea for two reasons.
Features
We will define a confidence metric to estimate how reliably the model can align an n-gram on one side to a phrase on the other side given a parallel sentence.
Features
Now we turn to monolingual resources to evaluate the quality of an n-gram being a good phrase.
n-gram is mentioned in 3 sentences in this paper.
Titov, Ivan and McDonald, Ryan
The Model
In this model the distribution of the overall sentiment rating y_ov is based on all the n-gram features of a review text.
The Model
Then the distribution of y_a, for every rated aspect a, can be computed from the distribution of y_ov and from any n-gram feature where at least one word in the n-gram is assigned to the associated aspect topic (r = loc, z = a).
The Model
b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{f,y} and J^{a}_{f,y} are common weights and aspect-specific weights for n-gram feature f, and p^{a}_{f} is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
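One way to write the predictor this excerpt describes (a reconstruction in our notation from the description above, not an equation copied from the paper) is

    P(y_a = y) \;\propto\; \exp\Big( b_y + \sum_{f} p^{a}_{f} \,\big( J_{f,y} + J^{a}_{f,y} \big) \Big)

with b_y the bias term, J_{f,y} and J^{a}_{f,y} the common and aspect-specific weights for n-gram feature f, and p^{a}_{f} the fraction of words in feature f assigned to the aspect topic a.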
n-gram is mentioned in 3 sentences in this paper.