Automatic Evaluation Metrics | number of n-gram matches of the system translation against one or more reference translations. |
Automatic Evaluation Metrics | Generally, more n-gram matches result in a higher BLEU score. |
Automatic Evaluation Metrics | When determining the matches to calculate precision, BLEU uses a modified, or clipped n-gram precision. |
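The clipped n-gram precision described above can be sketched as follows. This is a minimal illustration only (it omits BLEU's brevity penalty and the geometric mean over n-gram orders); all function names are ours, not from the papers quoted here.

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Modified (clipped) n-gram precision as used in BLEU.

    Each candidate n-gram may match at most as many times as it
    occurs in any single reference (the "clip"), so repeating a
    common word cannot inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # Maximum reference count for each n-gram across all references.
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())
```

For example, the degenerate candidate "the the the" against the reference "the cat" gets a clipped unigram precision of 1/3 rather than 3/3, since "the" occurs only once in the reference.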
Metric Design Considerations | 4.1 Using N-gram Information |
Metric Design Considerations | Lemma and POS match Representing each n-gram by its sequence of lemma and POS-tag pairs, we first try to perform an exact match in both lemma and POS-tag. |
Metric Design Considerations | In all our n-gram matching, each n-gram in the system translation can only match at most one n-gram in the reference translation. |

Abstract | The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy-pruning. |
Introduction | LMs are usually implemented as n-gram models parameterized for each distinct sequence of up to n words observed in the training corpus. |
Introduction | Recent work (Talbot and Osborne, 2007b) has demonstrated that randomized encodings can be used to represent n-gram counts for LMs with significant space savings, circumventing information-theoretic constraints on lossless data structures by allowing errors with some small probability.
Introduction | Lookup is very efficient: the values of 3 cells in a large array are combined with the fingerprint of an n-gram.
Perfect Hash-based Language Models | The model can erroneously return a value for an n-gram that was never actually stored, but will always return the correct value for an n-gram that is in the model. |
Perfect Hash-based Language Models | Each n-gram x_i is drawn from some set of possible n-grams U and its associated value from a corresponding set of possible values V.
Perfect Hash-based Language Models | We do not store the n-grams and their probabilities directly but rather encode a fingerprint of each n-gram, f(x_i), together with its associated value in such a way that the value can be retrieved when the model is queried with the n-gram x_i.
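The one-sided error behaviour can be illustrated with a toy sketch. This is a deliberate simplification of the scheme described above: it indexes a plain dict by a short fingerprint rather than combining 3 array cells, and all names are hypothetical. Stored n-grams always return their value; an unseen n-gram usually misses, but with probability roughly 2^-bits its fingerprint collides with a stored one and a wrong value is returned.

```python
import hashlib

class FingerprintTable:
    """Toy fingerprint-keyed n-gram table (simplified sketch)."""

    def __init__(self, bits=16):
        self.bits = bits
        self.table = {}

    def _fingerprint(self, ngram):
        # Short fingerprint of the n-gram; collisions cause the
        # small false-positive probability described in the text.
        digest = hashlib.sha1(" ".join(ngram).encode()).digest()
        return int.from_bytes(digest[:8], "big") % (1 << self.bits)

    def store(self, ngram, value):
        self.table[self._fingerprint(ngram)] = value

    def lookup(self, ngram):
        # Correct value for stored n-grams; None (or, rarely, a
        # colliding n-gram's value) for unstored ones.
        return self.table.get(self._fingerprint(ngram))
```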
Scaling Language Models | These are typically n-gram models that approximate the probability of a word sequence by assuming each token to be independent of all but n − 1 preceding tokens.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space-dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | While the approach results in significant space savings, working with corpus statistics, rather than n-gram probabilities directly, is computationally less efficient (particularly in a distributed setting) and introduces a dependency on the smoothing scheme used. |
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Decoding to Maximize BLEU | BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU. |
Introduction | This complexity arises from the interaction of the tree-based translation model with an n-gram language model. |
Language Model Integrated Decoding for SCFG | We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space. |
Language Model Integrated Decoding for SCFG | Without an n-gram language model, decoding using SCFG is not much different from CFG parsing. |
Language Model Integrated Decoding for SCFG | However, when we want to integrate an n-gram language model into the search, our goal is to search for the derivation whose total sum of production weights and n-gram log probabilities is maximized.
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n−1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding | We can make the approximation even closer by taking local higher-order outside n-gram information for a state X[i, j, u_1…u_{n−1}, v_1…v_{n−1}] into account.
Multi-pass LM-Integrated Decoding | where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
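The agenda ordering described here can be sketched as an A*-style best-first loop. In this sketch, `beta` and `alpha` are assumed to be precomputed dictionaries of Viterbi inside and outside costs (negative log probabilities, so lower is better); all names are illustrative, not the decoder's actual API.

```python
import heapq

def best_first_explore(states, beta, alpha):
    """Yield states in order of inside cost + outside cost.

    Prioritizing by beta[s] + alpha[s] explores the states whose
    best complete derivation through them is cheapest first, as in
    A*-style agenda search.
    """
    agenda = [(beta[s] + alpha[s], s) for s in states]
    heapq.heapify(agenda)
    while agenda:
        priority, s = heapq.heappop(agenda)
        yield s, priority
```

With `beta = {'A': 1.0, 'B': 0.5}` and `alpha = {'A': 0.2, 'B': 1.0}`, state A (total 1.2) is explored before B (total 1.5), even though B has the cheaper inside cost.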
Evaluation | We smooth by adding 40 to all counts, equal to the minimum count in the n-gram data. |
Introduction | For example, in our n-gram collection (Section 3.4), “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern. |
Methodology | We gather pattern fillers from a large collection of n-gram frequencies. |
Methodology | We do the same processing to our n-gram corpus. |
Methodology | Also, other pronouns in the pattern are allowed to match a corresponding pronoun in an n-gram, regardless of differences in inflection and class.
Background | makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class. |
Background | In the anytime mode, a best-first search is performed with a configurable time limit: the scores assigned by the n-gram model determine the order of the edges on the agenda, and thus have an impact on realization speed.
Background | the one that covers the most elementary predications in the input logical form, with ties broken according to the n-gram score. |
The Approach | Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model. |
Abstract | To improve recall, irregularities in parse trees caused by verb form errors are taken into account; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM corpus reached a threshold.
Experiments | Some kind of confidence measure on the n-gram counts might be appropriate for reducing such false alarms. |
Introduction | To improve recall, irregularities in parse trees caused by verb form errors are considered; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Research Issues | We propose using n-gram counts as a filter to counter this kind of overgeneralization. |
Research Issues | A second goal is to show that n-gram counts can effectively serve as a filter in order to increase precision.
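The count-based filtering idea in the lines above can be sketched as follows. The counts table, the threshold value, and all names here are illustrative stand-ins for a web-scale n-gram collection, not the papers' actual resources or settings.

```python
def filter_corrections(corrections, ngram_counts, threshold=10):
    """Keep only proposed corrections whose n-gram is frequent enough.

    A correction is allowed only when its n-gram count in the
    (hypothetical) collection reaches the threshold, countering
    overgeneration of implausible rewrites.
    """
    return [c for c in corrections
            if ngram_counts.get(c, 0) >= threshold]

# Illustrative counts: the well-formed trigram is frequent,
# the ill-formed one is rare.
counts = {("has", "been", "done"): 5000, ("has", "being", "done"): 2}
proposed = [("has", "been", "done"), ("has", "being", "done")]
print(filter_corrections(proposed, counts))  # keeps only the frequent trigram
```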
Class-Based Language Modeling | Generalizing this leads to arbitrary order class-based n-gram models of the form: |
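The formula following "of the form:" did not survive extraction. As a hedged reconstruction, the standard class-based n-gram factorization that such sentences typically introduce (in the style of Brown et al.'s class models, assuming each word w_i belongs to class c_i) is:

```latex
P(w_i \mid w_{i-n+1}^{i-1}) \approx P(w_i \mid c_i)\, P(c_i \mid c_{i-n+1}^{i-1})
```

That is, the word is emitted from its class, and the class sequence itself is modeled with an n-gram over classes.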
Introduction | In the case of n-gram language models this is done by factoring the probability: |
Introduction | do not differ in the last n − 1 words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities P(w_i | w_{i−n+1}^{i−1}).
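A minimal maximum-likelihood sketch makes the sparsity point concrete: with no smoothing, any history unseen in training yields no estimate at all. The corpus and names below are illustrative.

```python
from collections import Counter

def train_ngram_lm(corpus, n=2):
    """Maximum-likelihood n-gram model: P(w | history) as a
    relative frequency over the training corpus."""
    context = Counter()  # counts of (n-1)-word histories
    joint = Counter()    # counts of history + word
    for sent in corpus:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(tokens)):
            hist = tuple(tokens[i - n + 1:i])
            joint[hist + (tokens[i],)] += 1
            context[hist] += 1

    def prob(word, history):
        hist = tuple(history)
        if context[hist] == 0:
            return 0.0  # unseen history: pure MLE has nothing to say
        return joint[hist + (word,)] / context[hist]

    return prob
```

Trained on the two-sentence corpus `[["the", "cat"], ["the", "dog"]]`, the bigram model gives P(cat | the) = 0.5, but P(cat | a) = 0 because the history "a" was never observed; this is exactly the unreliability the sentence above describes.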
Introduction | However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed with mixed success (Raab, 2006). |
Experiments | In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, HS denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction | In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation. |
Our Method | For target features, the n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure word.
Our Method | Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position, POS tags. Source features: MW-HW collocation, surrounding words, source head word.
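The windowed language model score described above can be sketched as follows, assuming a generic conditional model `prob(word, history)`. Restricting the loop to only the words around the candidate measure word is left out for brevity; names are illustrative.

```python
import math

def window_lm_score(tokens, prob, n=2):
    """Sum of log n-gram probabilities over a token window.

    `prob(word, history)` is any conditional n-gram model; the
    window here is simply the whole token list.
    """
    score = 0.0
    for i in range(n - 1, len(tokens)):
        p = prob(tokens[i], tuple(tokens[i - n + 1:i]))
        score += math.log(p) if p > 0 else float("-inf")
    return score
```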
Experiments | Due to the lower frequency of higher-order n-grams (as opposed to unigrams), higher-order n-gram language models are more sparse, which increases the probability of missing a particular sentiment marker in a sentence (Table 3).
Experiments | training sets are required to overcome this higher n-gram sparseness in sentence-level annotation. |
Experiments | results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
Experiments | The NIST metric clearly shows a significant improvement, because it mostly measures difficult n-gram matches (e.g. due to the long-distance rules we have been dealing with).
Experiments | In n-gram based metrics, the scores for all words are equally weighted, so mistakes on crucial sentence constituents may be penalized the same as errors on redundant or meaningless words (Callison-Burch et al., 2006). |
Introduction | Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
Related Work | Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system. |
Syllabification with Structured SVMs | In addition to these primary n-gram features, we experimented with linguistically-derived features. |
Syllabification with Structured SVMs | We believe that this is caused by the ability of the SVM to learn such generalizations from the n-gram features alone. |
Features | Trying to find phrase translations for any possible n-gram is not a good idea for two reasons. |
Features | We will define a confidence metric to estimate how reliably the model can align an n-gram in one side to a phrase on the other side given a parallel sentence. |
Features | Now we turn to monolingual resources to evaluate the quality of an n-gram being a good phrase. |
The Model | In this model the distribution of the overall sentiment rating y_ov is based on all the n-gram features of a review text.
The Model | Then the distribution of y_a, for every rated aspect a, can be computed from the distribution of y_ov and from any n-gram feature where at least one word in the n-gram is assigned to the associated aspect topic (r = loc, z = a).
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J^a_{y,f} are the common weights and aspect-specific weights for n-gram feature f, and p_{a,f} is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).