Abstract | However, n-gram based metrics are still the dominant approach today.
Alternatives to Correlation-based Meta-evaluation | Therefore, we have focused on other aspects of metric reliability that have revealed differences between n-gram and linguistic based metrics: |
Alternatives to Correlation-based Meta-evaluation | All n-gram based metrics achieve SIP and SIR values between 0.8 and 0.9. |
Alternatives to Correlation-based Meta-evaluation | This result suggests that n-gram based metrics are reasonably reliable for this purpose. |
Correlation with Human Judgements | Let us first analyze the correlation with human judgements for linguistic vs. n-gram based metrics. |
Correlation with Human Judgements | Linguistic metrics are represented by grey plots, and black plots represent metrics based on n-gram overlap. |
Correlation with Human Judgements | Therefore, we need additional meta-evaluation criteria in order to clarify the behavior of linguistic metrics as compared to n-gram based metrics. |
Introduction | However, the most commonly used metrics are still based on n-gram matching. |
Introduction | For that purpose, we compare — using four different test beds — the performance of 16 n-gram based metrics, 48 linguistic metrics and one combined metric from the state of the art. |
Previous Work on Machine Translation Meta-Evaluation | All of those metrics were based on n-gram overlap.
Language Identification | We implement a statistical model using a character based n-gram feature. |
Language Identification | For each language, we collect the n-gram counts (for n = 1 to n = 7, also using the word-beginning and word-ending spaces) from the vocabulary of the training corpus, and then generate a probability distribution from these counts.
Language Identification | In other words, this time we used a word-based n-gram method, only with n = 1. |
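The character-level collection step described above can be sketched as follows. This is an illustrative reconstruction, assuming a single normalized distribution over all orders; the function name and normalization choice are not taken from the paper.

```python
from collections import Counter

def char_ngram_profile(vocabulary, n_max=7):
    """Collect character n-gram counts (n = 1..n_max) from a vocabulary,
    padding each word with word-beginning and word-ending spaces, then
    normalize the counts into a probability distribution."""
    counts = Counter()
    for word in vocabulary:
        padded = f" {word} "  # boundary spaces participate in the n-grams
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

profile = char_ngram_profile(["the", "them"])
```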
Related Work | Most of the work carried out to date on the written language identification problem consists of supervised approaches that are trained on a list of words or n-gram models for each reference language. |
Related Work | The n-gram based approaches are based on the counts of character or byte n-grams, which are sequences of n characters or bytes, extracted from a corpus for each reference language. |
Related Work | Classification models that use n-gram features have been proposed.
Computing Feature Expectations | In this section, we consider BLEU in particular, for which the relevant features gb(e) are n-gram counts up to length n = 4. |
Computing Feature Expectations | The nodes are states in the decoding process that include the span (i, j) of the sentence to be translated, the grammar symbol s over that span, and the left and right context words of the translation relevant for computing n-gram language model scores. Each hyper-edge h represents the application of a synchronous rule r that combines nodes corresponding to non-terminals in
Computing Feature Expectations | Each n-gram that appears in a translation e is associated with some h in its derivation: the h corresponding to the rule that produces the n-gram.
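The BLEU feature vector mentioned above, n-gram counts up to length 4, can be extracted with a short helper. A minimal sketch; the function name is an assumption:

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    """Count all n-grams up to length max_n -- the n-gram count
    features relevant for BLEU (max_n = 4 matches the setting above)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

c = ngram_counts("the cat sat on the mat".split())
```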
Consensus Decoding Algorithms | In this expression, BLEU(e; e') references e' only via its n-gram count features c(e', t). (The length penalty is also a function of n-gram counts.)
Introduction | The contributions of this paper include a linear-time algorithm for MBR using linear similarities, a linear-time alternative to MBR using nonlinear similarity measures, and a forest-based extension to this procedure for similarities based on n-gram counts. |
Introduction | One is the n-gram model over different units, such as word-level bigram/trigram models (Bangalore and Rambow, 2000; Langkilde, 2000), or factored language models integrated with syntactic tags (White et al.
Introduction | (2008) develop a general-purpose realizer couched in the framework of Lexical Functional Grammar based on simple n-gram models. |
Introduction | (2009) present a dependency-spanning tree algorithm for word ordering, which first builds dependency trees to decide linear precedence between heads and modifiers then uses an n-gram language model to order siblings. |
Log-linear Models | We linearize the dependency relations by computing n-gram models, similar to traditional word-based language models, except using the names of dependency relations instead of words. |
Log-linear Models | The dependency relation model calculates the probability of dependency relation n-gram P(DR) according to Eq.(3). |
Log-linear Models | = HP(DRk I Dle—jn k=1 Word Model: We integrate an n-gram word model into the log-linear model for capturing the relation between adjacent words. |
Introduction | Lattice MBR decoding uses a linear approximation to the BLEU score (Papineni et al., 2001); the weights in this linear loss are set heuristically by assuming that n-gram precisions decay exponentially with n. However, this may not be optimal in practice.
Minimum Bayes-Risk Decoding | They approximated log(BLEU) score by a linear function of n-gram matches and candidate length. |
Minimum Bayes-Risk Decoding | where w is an n-gram present in either E or E', and θ_0, θ_1, ..., θ_N are weights which are determined empirically, where N is the maximum n-gram order.
Minimum Bayes-Risk Decoding | Next, the posterior probability of each n-gram is computed. |
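The linear approximation of log(BLEU) from n-gram matches and length, combined with the n-gram posteriors, can be sketched as below. This is a loose reconstruction in the spirit of Tromble et al. (2008); variable names and the data layout are assumptions.

```python
def linear_bleu_gain(hyp_ngrams, hyp_length, ngram_posterior, theta):
    """Linear gain: theta[0] times candidate length, plus, for every
    n-gram w of the candidate, theta[len(w)] times its count times the
    posterior probability of w in the evidence space."""
    gain = theta[0] * hyp_length
    for w, count in hyp_ngrams.items():
        gain += theta[len(w)] * count * ngram_posterior.get(w, 0.0)
    return gain

# Illustrative numbers only
g = linear_bleu_gain({("a",): 2}, 2, {("a",): 0.5}, {0: -0.1, 1: 0.3})
```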
Abstract | Using an iterative decoding approach, n-gram agreement statistics between translations of multiple decoders are employed to re-rank both full and partial hypotheses explored in decoding.
Collaborative Decoding | To compute the consensus measures, we further decompose each G_l(e, e') into n-gram matching statistics between e and e'.
Collaborative Decoding | For each n-gram of order n, we introduce a pair of complementary consensus measure functions Gn+(e, e') and Gn−(e, e') described as follows:
Collaborative Decoding | Gn+(e, e') is the n-gram agreement measure function, which counts the number of occurrences in e' of n-grams in e.
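A direct sketch of the agreement measure as worded above (the function name is ours; the disagreement measure Gn− would be defined analogously from the n-grams of e absent from e'):

```python
def ngram_agreement(e, e_prime, n):
    """Gn+(e, e'): for each n-gram occurring in e, count its number
    of occurrences in e', and sum the counts."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    target = grams(e_prime)
    return sum(target.count(g) for g in grams(e))

score = ngram_agreement("a b".split(), "a b a b".split(), 2)
```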
Experiments | All baseline decoders are extended with n-gram consensus-based co-decoding features to construct member decoders.
Experiments | Table 4 shows the comparison results of a two-system co-decoding using different settings of n-gram agreement and disagreement features. |
Experiments | It is clearly shown that both n-gram agreement and disagreement types of features are helpful, and using them together is the best choice. |
Abstract | Our particular variational distributions are parameterized as n-gram models. |
Abstract | We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). |
Introduction | In practice, we approximate with several different variational families Q, corresponding to n-gram (Markov) models of different orders. |
Variational Approximate Decoding | Since each q(y) is a distribution over output strings, a natural choice for Q is the family of n-gram models. |
Variational Approximate Decoding | where W is a set of n-gram types. |
Variational Approximate Decoding | Each w ∈ W is an n-gram, which occurs c_w(y) times in the string y, and w may be divided into an (n − 1)-gram prefix (the history) and a 1-gram suffix r(w) (the rightmost or current word).
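The history/suffix decomposition just described is trivial to state in code (a sketch; representing n-grams as tuples is our assumption):

```python
def split_ngram(w):
    """Divide an n-gram tuple w into its (n - 1)-gram prefix (the
    history) and its 1-gram suffix r(w), the rightmost/current word."""
    return w[:-1], w[-1]

history, current = split_ngram(("the", "black", "cat"))
```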
Abstract | Our model can also be considered as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any “word” indications.
Experiments | NPY(n) means n-gram NPYLM. |
Inference | character n-gram model explicit as Θ, we can set
Inference | where p(c_1 ⋯ c_k, k | Θ) is an n-gram probability given by (6), and p(k | Θ) is a probability that a word of length k will be generated from Θ.
Introduction | In this paper, we extend this work to propose a more efficient and accurate unsupervised word segmentation that will optimize the performance of the word n-gram Pitman-Yor (i.e. |
Introduction | Furthermore, it can be viewed as a method for building a high-performance n-gram language model directly from character strings of arbitrary language. |
Introduction | By embedding a character n-gram model in a word n-gram model from a Bayesian perspective, Section 3 introduces a novel language model for word segmentation, which we call the Nested Pitman-Yor language model.
Nested Pitman-Yor Language Model | In contrast, in this paper we use a simple but more elaborate model, that is, a character n-gram language model that also employs HPYLM. |
Nested Pitman-Yor Language Model | To avoid dependency on n-gram order n, we actually used the ∞-gram language model (Mochihashi and Sumita, 2007), a variable-order HPYLM, for characters.
Pitman-Yor process and n-gram models | In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts seated over the tables 1 ⋯ t_{hw}.
Pitman-Yor process and n-gram models | As a result, the n-gram probability of this hierarchical Pitman—Yor language model (HPYLM) is recursively computed as |
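The recursive computation referred to here is, in the standard HPYLM notation, usually written as follows. This is a reconstruction from the common formulation in the literature, not copied from this excerpt:

```latex
p(w \mid h) \;=\; \frac{c(w \mid h) - d \, t_{hw}}{\theta + c(h)}
\;+\; \frac{\theta + d \, t_{h}}{\theta + c(h)} \, p(w \mid h'),
```

where h' is the context h shortened by its earliest word, d and θ are the discount and strength parameters, t_{hw} is the number of tables serving w in restaurant h, and t_h = Σ_w t_{hw}.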
A Syntax Free Sequence-oriented Sentence Compression Method | Many studies on sentence compression employ the n-gram language model to evaluate the linguistic likelihood of a compressed sentence. |
A Syntax Free Sequence-oriented Sentence Compression Method | The n-gram distribution of short sentences may differ from that of long sentences.
A Syntax Free Sequence-oriented Sentence Compression Method | Therefore, the n-gram probability sometimes disagrees with our intuition in terms of sentence compression. |
Experimental Evaluation | We developed the n-gram language model from a 9-year set of Mainichi Newspaper articles.
Results and Discussion | This result shows that the n-gram language model is improper for sentence compression because the n-gram probability is computed by using a corpus that includes both short and long sentences. |
Future Work | This powerful method, which was proposed in (Kam and Kopec, 1996; Popat et al., 2001) in the context of a finite-state model (but not of TSP), can be easily extended to N-gram situations, and typically converges in a small number of iterations. |
Introduction | Typical nonlocal features include one or more n-gram language models as well as a distortion feature, measuring by how much the order of biphrases in the candidate translation deviates from their order in the source sentence. |
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Phrase-based Decoding as TSP | If we want to extend the power of the model to general n-gram language models, and in particular to the 3-gram |
Phrase-based Decoding as TSP | The problem becomes even worse if we extend the compiling-out method to n-gram language models with n > 3. |
Related Work | Smoothing in NLP usually refers to the problem of smoothing n-gram models. |
Related Work | Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
Related Work | While n-gram models have traditionally dominated in language modeling, two recent efforts de- |
Experimental Evaluation | BLEUR includes the following 18 sentence-level scores: BLEU-n and n-gram precision scores (1 ≤ n ≤ 4); BLEU brevity penalty (BP); BLEU score divided by BP.
Experimental Evaluation | To counteract BLEU’s brittleness at the sentence level, we also smooth BLEU-n and n-gram precision as in Lin and Och (2004). |
Experimental Evaluation | NIST-n scores (1 ≤ n ≤ 10) and information-weighted n-gram precision scores (1 ≤ n ≤ 4); NIST brevity penalty (BP); and NIST score divided by BP.
Introduction | BLEU and NIST measure MT quality by using the strong correlation between human judgments and the degree of n-gram overlap between a system hypothesis translation and one or more reference translations. |
Introduction | Although paraphrase identification is defined in semantic terms, it is usually solved using statistical classifiers based on shallow lexical, n-gram, and syntactic “overlap” features.
Introduction | We use a product of experts (Hinton, 2002) to bring together a logistic regression classifier built from n-gram overlap features and our syntactic model. |
Product of Experts | The features are of the form precision_n (number of n-gram matches divided by the number of n-grams in s_1), recall_n (number of n-gram matches divided by the number of n-grams in s_2), and F_n (harmonic mean of the previous two features), where 1 ≤ n ≤ 3.
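The overlap features just defined can be computed per n-gram order as below. A minimal sketch; clipped matching (taking the minimum count on each side) is a common convention we assume here:

```python
def overlap_features(s1, s2, n):
    """precision_n, recall_n, and their harmonic mean F_n for one
    n-gram order, from clipped n-gram matches between s1 and s2."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    g1, g2 = grams(s1), grams(s2)
    matches = sum(min(g1.count(g), g2.count(g)) for g in set(g1))
    p = matches / len(g1) if g1 else 0.0
    r = matches / len(g2) if g2 else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = overlap_features("a b c".split(), "a b d".split(), 1)
```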
Overall architecture | The unseen rate of n-grams varies according to the simulated user.
Overall architecture | Notice that simulated users C, E, and H generate higher rates of unseen n-gram patterns over all word-error settings.
Related work | N-gram based approaches (Eckert et al., 1997; Levin et al., 2000) and other approaches (Scheffler and Young, 2001; Pietquin and Dutoit, 2006; Schatzmann et al., 2007) have been introduced.
Translation Selection | We use smoothed sentence-level BLEU score to replace the human assessments, where we use additive smoothing to avoid zero BLEU scores when we calculate the n-gram precisions. |
Translation Selection | 1-4 n-gram precisions against pseudo references (1 ≤ n ≤ 4)
Translation Selection | 15-19 n-gram precisions against a target corpus (1 ≤ n ≤ 5)
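The additively smoothed sentence-level BLEU mentioned above can be sketched as follows. The smoothing constant, clipping, and brevity penalty here follow common practice and are assumptions, not necessarily the paper's exact recipe:

```python
import math
from collections import Counter

def smoothed_sentence_bleu(hyp, ref, max_n=4, alpha=1.0):
    """Sentence-level BLEU with additive (add-alpha) smoothing of the
    n-gram precisions, so an unmatched higher-order n-gram does not
    zero out the whole score."""
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(sum(h.values()), 1)
        precisions.append((match + alpha) / (total + alpha))
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = smoothed_sentence_bleu("a b c d".split(), "a b c d".split())
```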