Abstract | We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Introduction | Parameters that are stored in the model are retrieved without error; however, false positives may occur whereby n-grams not in the model are incorrectly ‘found’ when requested. |
Introduction | We encode fingerprints (random hashes) of n-grams together with their associated probabilities using a perfect hash function generated at random (Majewski et al., 1996). |
Perfect Hash-based Language Models | We assume the n-grams and their associated parameter values have been precomputed and stored on disk. |
Perfect Hash-based Language Models | We then encode the model in an array such that each n-gram’s value can be retrieved. |
Perfect Hash-based Language Models | The model uses randomization to map n-grams to fingerprints and to generate a perfect hash function that associates n-grams with their values. |
Scaling Language Models | In language modeling the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | However, if we are willing to accept that occasionally our model will be unable to distinguish between distinct n-grams, then it is possible to store the model in space that does not depend on either the n-gram order or the vocabulary size.
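To make the fingerprint idea concrete, here is a minimal Python sketch, not the paper's construction: values are stored alongside truncated random hashes of the n-grams, the Majewski-style perfect hash is stood in for by a precomputed slot table, and a lookup of an unseen n-gram is usually rejected by the fingerprint check but can occasionally collide and be falsely 'found'.

```python
import hashlib

FP_BITS = 12  # fingerprint width; deliberately small so false positives are plausible

def fingerprint(ngram):
    """Random hash ('fingerprint') of an n-gram, truncated to FP_BITS bits."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << FP_BITS) - 1)

class FingerprintLM:
    """Array of (fingerprint, value) cells addressed by a perfect hash.
    The perfect hash is stood in for here by a slot table built at
    construction time; a real model evaluates it without storing the keys."""

    def __init__(self, ngram_values):
        self.cells = []
        self._slot = {}  # stand-in for the perfect hash function
        for i, (ngram, value) in enumerate(ngram_values.items()):
            self._slot[ngram] = i
            self.cells.append((fingerprint(ngram), value))

    def lookup(self, ngram):
        """Stored n-grams are retrieved without error; unseen n-grams are
        usually rejected, but may collide on the fingerprint (false positive)."""
        i = self._slot.get(ngram, hash(ngram) % len(self.cells))
        fp, value = self.cells[i]
        return value if fp == fingerprint(ngram) else None

lm = FingerprintLM({"the cat sat": -1.2, "cat sat on": -0.9})
print(lm.lookup("the cat sat"))  # -1.2
print(lm.lookup("dog sat on"))   # usually None, rarely a spurious value
```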
Abstract | This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc.
Introduction | In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations. |
Introduction | Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other. |
Metric Design Considerations | We note, however, that matches between items (such as words, n-grams, etc.)
Metric Design Considerations | In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair. |
Metric Design Considerations | Lemma match For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match. |
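As an illustration of the exact-then-lemma matching described above, here is a rough Python sketch; the greedy pairing and the toy lemmatizer are assumptions for illustration, not MAXSIM's actual algorithm (which also handles synonyms and partial similarity scores).

```python
def match_ngrams(sys_ngrams, ref_ngrams, lemmatize):
    """Two-pass greedy matching: surface-form matches first, then matches on
    lemmas for the n-grams that remain unmatched."""
    sys_pool, ref_pool = list(sys_ngrams), list(ref_ngrams)
    matches = []

    # Pass 1: exact surface-form matches.
    for s in list(sys_pool):
        if s in ref_pool:
            matches.append((s, s))
            sys_pool.remove(s)
            ref_pool.remove(s)

    # Pass 2: relax the criterion to lemma identity for the leftovers.
    for s in list(sys_pool):
        s_lem = tuple(lemmatize(w) for w in s)
        for r in ref_pool:
            if s_lem == tuple(lemmatize(w) for w in r):
                matches.append((s, r))
                ref_pool.remove(r)
                break
    return matches

LEMMA = {"cats": "cat", "sat": "sit", "sits": "sit"}
bigrams = lambda ws: list(zip(ws, ws[1:]))
print(match_ngrams(bigrams("the cats sat".split()),
                   bigrams("the cat sits".split()),
                   lambda w: LEMMA.get(w, w)))
```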
Methodology | The maximum size of a context pattern depends on the size of n-grams available in the data. |
Methodology | In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
Methodology | Shorter n-grams were not found to improve performance on development data and hence are not extracted. |
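One way to picture such maximum-size context patterns: for a target token, collect every five-word window of the sentence that covers it. The sketch below illustrates only that reading; the paper's exact pattern definition (for instance, whether the target itself is replaced by a slot) may differ.

```python
def context_patterns(tokens, i, size=5):
    """All size-word windows of `tokens` that cover position i (the target).
    With 5-grams being the longest counts available, size is capped at five."""
    patterns = []
    for start in range(max(0, i - size + 1), min(i, len(tokens) - size) + 1):
        patterns.append(tuple(tokens[start:start + size]))
    return patterns

sent = "he was arrested for gun possession last year".split()
for p in context_patterns(sent, sent.index("possession")):
    print(" ".join(p))
```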
Data 5.1 Development Data | For disambiguation with n-grams (see §3.3), we made use of the WEB 1T 5-GRAM corpus.
Data 5.1 Development Data | Prepared by Google Inc., it contains English n-grams, up to 5-grams, with their observed frequency counts from a large number of web pages.
Experiments | Table 6: The n-grams used for filtering, with examples of sentences which they are intended to differentiate. |
Experiments | 6.2 Disambiguation with N-grams |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM |
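A sketch of this kind of count-based filtering is below; since the excerpt cuts off before the exact criterion, the lookup table, threshold, and comparison are placeholders rather than the paper's actual test.

```python
def passes_ngram_filter(candidate_ngram, count_of, threshold=1):
    """Allow a proposed correction only if its n-gram is attested (with count
    at least `threshold`) in a large n-gram collection such as Web 1T.
    `count_of` maps an n-gram tuple to its observed frequency; the threshold
    and comparison here are placeholders, not the paper's exact criterion."""
    return count_of.get(candidate_ngram, 0) >= threshold

# Toy counts standing in for Web 1T 5-gram frequencies.
counts = {("arrested", "for", "gun", "possession"): 1200,
          ("arrested", "for", "gun", "possessions"): 0}
print(passes_ngram_filter(("arrested", "for", "gun", "possession"), counts))   # True
print(passes_ngram_filter(("arrested", "for", "gun", "possessions"), counts))  # False
```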
A Generic Phrase Training Procedure | Usually all n-grams up to a predefined length limit are considered as candidate phrases. |
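For illustration, enumerating every n-gram up to a length limit as a candidate phrase can be sketched as follows; the limit of four and the whitespace tokenization are arbitrary choices here, not the paper's settings.

```python
def candidate_phrases(tokens, max_len=4):
    """All n-grams of the sentence up to max_len words, as candidate phrases."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]

print(candidate_phrases("kick the bucket".split(), max_len=3))
```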
Features | First, due to data sparsity and/or the limited capability of the alignment model, there will exist n-grams that cannot be aligned well, for instance n-grams that are part of a paraphrase translation or a metaphorical expression.
Features | Extracting candidate translations for such n-grams for the sake of improving coverage (recall) might hurt translation quality (precision).
The Model | The first score is computed on the basis of all the n-grams, but using a common set of weights independent of the aspect a.
The Model | Another score is computed only using n-grams associated with the related topic, but an aspect-specific set of weights is used in this computation. |
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J^a_{y,f} are the common weights and aspect-specific weights for n-gram feature f, and p_{f,a} is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
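Read this way, the aspect prediction combines a bias, a common-weight score over all n-gram features, and an aspect-specific score in which each feature is discounted by p_{f,a}. The sketch below is one plausible rendering of that description; the function and variable names, and the exact linear form, are assumptions rather than the paper's equation.

```python
def aspect_score(features, y, a, b, J, J_asp, p):
    """Unnormalized score for rating y of aspect a: bias b[y], plus a score
    over all n-gram features f with common weights J[y, f], plus a score with
    aspect-specific weights J_asp[y, a, f], each feature discounted by
    p[f, a], the fraction of its words assigned to the aspect topic."""
    score = b[y]
    for f, x in features.items():
        score += J.get((y, f), 0.0) * x
        score += p.get((f, a), 0.0) * J_asp.get((y, a, f), 0.0) * x
    return score

feats = {("great", "location"): 1, ("bad",): 1}
b = {+1: 0.0, -1: 0.0}
J = {(+1, ("great", "location")): 0.8, (-1, ("bad",)): 0.9}
J_asp = {(+1, "location", ("great", "location")): 1.1}
p = {(("great", "location"), "location"): 1.0, (("bad",), "location"): 0.0}
print(aspect_score(feats, +1, "location", b, J, J_asp, p))  # 0.8 + 1.0 * 1.1 = 1.9
```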
Introduction | Use of longer n-grams improves translation results, but exacerbates this interaction. |
Introduction | We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition. |
Multi-pass LM-Integrated Decoding | where δ(i, j) is the set of candidate target-language words outside the span of (i, j), and B is the product of the upper bounds for the two on-the-border n-grams.
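A simplified sketch of that product of border upper bounds follows; the names and the particular bound are assumptions for illustration, not the paper's exact definition. For each border of the span, take the best upper bound over the candidate words that could sit just outside it, and multiply the two.

```python
def outside_bound(border_candidates, ngram_upper_bound):
    """Optimistic outside estimate for a span: the product, over its two
    borders, of the best upper bound among the candidate boundary words."""
    B = 1.0
    for candidates in border_candidates:  # (left-border words, right-border words)
        B *= max(ngram_upper_bound(w) for w in candidates)
    return B

upper = {"the": 0.4, "a": 0.3, "dog": 0.1}
print(outside_bound((["the", "a"], ["dog"]), lambda w: upper.get(w, 1e-4)))  # 0.4 * 0.1
```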