Index of papers in Proc. ACL 2008 that mention
  • n-grams
Talbot, David and Brants, Thorsten
Abstract
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Introduction
Parameters that are stored in the model are retrieved without error; however, false positives may occur whereby n-grams not in the model are incorrectly ‘found’ when requested.
Introduction
We encode fingerprints (random hashes) of n-grams together with their associated probabilities using a perfect hash function generated at random (Majewski et al., 1996).
Perfect Hash-based Language Models
We assume the n-grams and their associated parameter values have been precomputed and stored on disk.
Perfect Hash-based Language Models
We then encode the model in an array such that each n-gram’s value can be retrieved.
Perfect Hash-based Language Models
The model uses randomization to map n-grams to fingerprints and to generate a perfect hash function that associates n-grams with their values.
Scaling Language Models
In language modeling, the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models
Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space dependency on both the n-gram order and the vocabulary size.
Scaling Language Models
However, if we are willing to accept that occasionally our model will be unable to distinguish between distinct n-grams, then it is possible to store
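The excerpts above describe storing only short fingerprints of n-grams, so stored parameters come back exactly while unseen n-grams can occasionally collide with a stored fingerprint. A minimal Python sketch of that trade-off, using an ordinary dict keyed by fingerprint in place of the paper's randomly generated perfect hash function (the hash width, helper names, and toy parameters are assumptions for illustration):

import hashlib

FINGERPRINT_BITS = 16  # deliberately small so false positives are plausible (assumption)

def fingerprint(ngram):
    """Map an n-gram (tuple of tokens) to a short random hash."""
    digest = hashlib.md5(" ".join(ngram).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % (1 << FINGERPRINT_BITS)

class FingerprintLM:
    """Stores fingerprint -> parameter instead of the n-grams themselves.
    A real perfect hash function would index an array without storing keys;
    a dict is used here purely for illustration."""
    def __init__(self, ngram_params):
        self.table = {fingerprint(ng): p for ng, p in ngram_params.items()}

    def lookup(self, ngram):
        # Stored n-grams are retrieved without error; an unseen n-gram may
        # collide with a stored fingerprint and be incorrectly 'found'.
        return self.table.get(fingerprint(ngram))

lm = FingerprintLM({("the", "cat"): -1.2, ("cat", "sat"): -0.7})
print(lm.lookup(("the", "cat")))   # -1.2, retrieved without error
print(lm.lookup(("dog", "ran")))   # usually None; occasionally a false positive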
n-grams is mentioned in 46 sentences in this paper.
Topics mentioned in this paper:
Chan, Yee Seng and Ng, Hwee Tou
Abstract
This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc.
Introduction
In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations.
Introduction
Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other.
Metric Design Considerations
We note, however, that matches between items (such as words, n-grams, etc.)
Metric Design Considerations
In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair.
Metric Design Considerations
Lemma match: For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match.
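The two-stage matching in these excerpts (exact n-gram matches first, then a relaxed pass that matches on lemmas and assigns a partial similarity score) could be sketched as follows; the toy lemma table, the 0.5 partial score, and the greedy pairing are illustrative assumptions rather than the MAXSIM procedure itself:

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy lemmatizer; a real metric would use a proper lemmatizer (assumption).
LEMMAS = {"cats": "cat", "sat": "sit", "sitting": "sit"}
def lemma(tok):
    return LEMMAS.get(tok, tok)

def match_ngrams(sys_tokens, ref_tokens, n):
    """Greedy illustration of two-pass matching: exact first, then by lemma."""
    sys_ngrams = ngrams(sys_tokens, n)
    ref_ngrams = ngrams(ref_tokens, n)
    matched = []
    # Pass 1: exact matches.
    for g in list(sys_ngrams):
        if g in ref_ngrams:
            matched.append((g, g, 1.0))
            sys_ngrams.remove(g)
            ref_ngrams.remove(g)
    # Pass 2: relax the criterion to lemma equality for the remainder.
    for g in list(sys_ngrams):
        for r in list(ref_ngrams):
            if tuple(map(lemma, g)) == tuple(map(lemma, r)):
                matched.append((g, r, 0.5))  # partial-credit score (placeholder)
                sys_ngrams.remove(g)
                ref_ngrams.remove(r)
                break
    return matched

print(match_ngrams("the cats sat".split(), "the cat sat".split(), 1))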
n-grams is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Methodology
The maximum size of a context pattern depends on the size of n-grams available in the data.
Methodology
In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
Methodology
Shorter n-grams were not found to improve performance on development data and hence are not extracted.
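One way to read these excerpts is that every context pattern spanning the target word is enumerated, with the pattern length capped at five because the n-gram data stops at 5-grams. A small sketch under that reading (the hole token, the minimum length, and the helper name are assumptions):

def context_patterns(tokens, target_idx, max_n=5, min_n=2):
    """Enumerate n-gram patterns that span the target word, replacing the
    target with a hole; pattern length is capped at max_n (5-gram data)."""
    patterns = []
    for n in range(min_n, max_n + 1):
        # Every window of length n that covers the target position.
        for start in range(target_idx - n + 1, target_idx + 1):
            if start < 0 or start + n > len(tokens):
                continue
            window = list(tokens[start:start + n])
            window[target_idx - start] = "_"  # hole for the word in question
            patterns.append(tuple(window))
    return patterns

toks = "drink a cup of coffee every morning".split()
for p in context_patterns(toks, toks.index("of")):
    print(" ".join(p))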
n-grams is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Lee, John and Seneff, Stephanie
Data 5.1 Development Data
For disambiguation with n-grams (see §3.3), we made use of the WEB 1T 5-GRAM corpus.
Data 5.1 Development Data
Prepared by Google Inc., it contains English n-grams, up to 5-grams, with their observed frequency counts from a large number of web pages.
Experiments
Table 6: The n-grams used for filtering, with examples of sentences which they are intended to differentiate.
Experiments
6.2 Disambiguation with N-grams
Experiments
For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM
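A hedged sketch of an n-gram count filter of this kind, permitting a proposed correction only when the corrected n-gram is well attested in a large count collection; the toy counts, the threshold, and the comparison against the original n-gram are assumptions, not the paper's exact criterion:

# Toy stand-in for Web 1T-style counts (assumption: a dict of n-gram -> count).
NGRAM_COUNTS = {
    ("interested", "in", "joining"): 120_000,
    ("interested", "to", "join"): 900,
}

def allow_correction(original_ngram, corrected_ngram, counts, min_ratio=10.0):
    """Permit the proposed correction only when the corrected n-gram is
    sufficiently better attested than the original (threshold is a placeholder)."""
    orig = counts.get(tuple(original_ngram), 0)
    corr = counts.get(tuple(corrected_ngram), 0)
    return corr > 0 and corr >= min_ratio * max(orig, 1)

print(allow_correction(("interested", "to", "join"),
                       ("interested", "in", "joining"),
                       NGRAM_COUNTS))  # True under these toy counts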
n-grams is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
Usually all n-grams up to a predefined length limit are considered as candidate phrases.
Features
First, due to data sparsity and/or the alignment model’s capability, there would exist n-grams that cannot be aligned well, for instance, n-grams that are part of a paraphrase translation or metaphorical expression.
Features
Extracting candidate translations for such kinds of n-grams for the sake of improving coverage (recall) might hurt translation quality (precision).
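The candidate-generation step mentioned in the first excerpt, enumerating all n-grams of a sentence up to a predefined length limit, might look roughly like this (the limit value is an assumption):

def candidate_phrases(tokens, max_len=5):
    """All contiguous n-grams up to max_len, the usual candidate phrase set."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]

print(candidate_phrases("we saw the red house".split(), max_len=3))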
n-grams is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Titov, Ivan and McDonald, Ryan
The Model
The first score is computed on the basis of all the n-grams, but using a common set of weights independent of the aspect a.
The Model
Another score is computed only using n-grams associated with the related topic, but an aspect-specific set of weights is used in this computation.
The Model
b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J^a_{y,f} are common weights and aspect-specific weights for n-gram feature f, and p^a_f is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
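The score described above combines a bias term, weights shared across aspects, and aspect-specific weights scaled by the fraction of each n-gram's words assigned to the aspect topic. A numerical sketch of that combination (feature tuples, weights, and fractions are invented for illustration):

def aspect_score(features, bias, common_w, aspect_w, aspect_fraction):
    """Score = bias + sum_f common_w[f] + sum_f aspect_w[f] * fraction of words
    in n-gram feature f assigned to the aspect topic (placeholder inputs)."""
    score = bias
    for f in features:
        score += common_w.get(f, 0.0)
        score += aspect_w.get(f, 0.0) * aspect_fraction.get(f, 0.0)
    return score

feats = [("great",), ("room", "service")]
print(aspect_score(feats,
                   bias=-0.1,
                   common_w={("great",): 0.8},
                   aspect_w={("room", "service"): 1.2},
                   aspect_fraction={("room", "service"): 0.5}))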
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zhang, Hao and Gildea, Daniel
Introduction
Use of longer n-grams improves translation results, but exacerbates this interaction.
Introduction
We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition.
Multi-pass LM-Integrated Decoding
where δ(i, j) is the set of candidate target language words outside the span of (i, j), and β is the product of the upper bounds for the two on-the-border n-grams.
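A minimal sketch of such a bound: for each border of the span, take the best language-model probability achievable with any candidate outside word, and multiply the two; the bigram table, the backoff floor, and the function name are assumptions, not the paper's decoder:

def border_bound(lm_prob, left_border_word, right_border_word, outside_words):
    """Upper bound: product of the best achievable probabilities for the two
    n-grams that straddle the span borders, maximized over candidate outside
    words (toy bigram table; a small default stands in for backoff)."""
    left_bound = max(lm_prob.get((w, left_border_word), 1e-4) for w in outside_words)
    right_bound = max(lm_prob.get((right_border_word, w), 1e-4) for w in outside_words)
    return left_bound * right_bound

probs = {("the", "red"): 0.2, ("house", "is"): 0.3}
print(border_bound(probs, "red", "house", {"the", "is", "a"}))  # 0.2 * 0.3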
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: