Index of papers in Proc. ACL 2009 that mention
  • n-gram
Amigó, Enrique and Giménez, Jesús and Gonzalo, Julio and Verdejo, Felisa
Abstract
However, n-gram based metrics are still today the dominant approach.
Alternatives to Correlation-based Meta-evaluation
Therefore, we have focused on other aspects of metric reliability that have revealed differences between n-gram and linguistic based metrics:
Alternatives to Correlation-based Meta-evaluation
All n-gram based metrics achieve SIP and SIR values between 0.8 and 0.9.
Alternatives to Correlation-based Meta-evaluation
This result suggests that n-gram based metrics are reasonably reliable for this purpose.
Correlation with Human Judgements
Let us first analyze the correlation with human judgements for linguistic vs. n-gram based metrics.
Correlation with Human Judgements
Linguistic metrics are represented by grey plots, and black plots represent metrics based on n-gram overlap.
Correlation with Human Judgements
Therefore, we need additional meta-evaluation criteria in order to clarify the behavior of linguistic metrics as compared to n-gram based metrics.
Introduction
However, the most commonly used metrics are still based on n-gram matching.
Introduction
For that purpose, we compare — using four different test beds — the performance of 16 n-gram based metrics, 48 linguistic metrics and one combined metric from the state of the art.
Previous Work on Machine Translation Meta-Evaluation
All of those metrics were based on n-gram overlap.
n-gram is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Ceylan, Hakan and Kim, Yookyung
Language Identification
We implement a statistical model using a character based n-gram feature.
Language Identification
For each language, we collect the n-gram counts (for n = 1 to n = 7, also using the word beginning and ending spaces) from the vocabulary of the training corpus, and then generate a probability distribution from these counts.
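As an illustration of this kind of character-based model, here is a minimal Python sketch (illustrative code, not the authors' implementation) that collects boundary-padded character n-gram counts from a vocabulary and normalizes each order into a probability distribution; the function name and toy vocabulary are hypothetical.

```python
from collections import Counter, defaultdict

def char_ngram_model(words, max_n=7):
    """Collect character n-gram counts (n = 1..max_n) from a vocabulary,
    padding each word with a leading and trailing space as boundary markers,
    then normalize the counts of each order into a probability distribution."""
    counts = defaultdict(Counter)
    for word in words:
        padded = f" {word} "              # word-beginning and word-ending spaces
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[n][padded[i:i + n]] += 1
    probs = {}
    for n, c in counts.items():
        total = sum(c.values())
        probs[n] = {gram: cnt / total for gram, cnt in c.items()}
    return probs

# Toy "vocabulary" for one reference language (purely illustrative)
model = char_ngram_model(["merhaba", "nasilsin"], max_n=3)
print(sorted(model[2].items(), key=lambda kv: -kv[1])[:5])
```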
Language Identification
In other words, this time we used a word-based n-gram method, only with n = 1.
Related Work
Most of the work carried out to date on the written language identification problem consists of supervised approaches that are trained on a list of words or n-gram models for each reference language.
Related Work
The n-gram based approaches are based on the counts of character or byte n-grams, which are sequences of n characters or bytes, extracted from a corpus for each reference language.
Related Work
Classification models that use the n-gram features have been proposed.
n-gram is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
In this section, we consider BLEU in particular, for which the relevant features gb(e) are n-gram counts up to length n = 4.
Computing Feature Expectations
The nodes are states in the decoding process that include the span (i, j) of the sentence to be translated, the grammar symbol over that span, and the left and right context words of the translation relevant for computing n-gram language model scores. Each hyper-edge h represents the application of a synchronous rule r that combines nodes corresponding to non-terminals in
Computing Feature Expectations
Each n-gram that appears in a translation e is associated with some h in its derivation: the h corresponding to the rule that produces the n-gram.
Consensus Decoding Algorithms
In this expression, BLEU(e; e') references e' only via its n-gram count features C(e', t).
Introduction
The contributions of this paper include a linear-time algorithm for MBR using linear similarities, a linear-time alternative to MBR using nonlinear similarity measures, and a forest-based extension to this procedure for similarities based on n-gram counts.
n-gram is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
He, Wei and Wang, Haifeng and Guo, Yuqing and Liu, Ting
Introduction
One is n-gram model over different units, such as word-level bigram/trigram models (Bangalore and Rambow, 2000; Langkilde, 2000), or factored language models integrated with syntactic tags (White et al.
Introduction
(2008) develop a general-purpose realizer couched in the framework of Lexical Functional Grammar based on simple n-gram models.
Introduction
(2009) present a dependency-spanning tree algorithm for word ordering, which first builds dependency trees to decide linear precedence between heads and modifiers then uses an n-gram language model to order siblings.
Log-linear Models
We linearize the dependency relations by computing n-gram models, similar to traditional word-based language models, except using the names of dependency relations instead of words.
Log-linear Models
The dependency relation model calculates the probability of dependency relation n-gram P(DR) according to Eq.(3).
Log-linear Models
P(DR) = \prod_{k=1}^{K} P\big(DR_k \mid DR_{k-n+1}^{\,k-1}\big)
Word Model: We integrate an n-gram word model into the log-linear model for capturing the relation between adjacent words.
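A minimal sketch of such a relation n-gram model, assuming a bigram (n = 2) version of Eq. (3) and hypothetical relation labels; this is illustrative code, not the authors' realizer.

```python
from collections import Counter

def train_relation_bigram(sequences):
    """MLE bigram model over dependency relation names, analogous to a
    word-based language model but with relation labels as the vocabulary."""
    bigrams, unigrams = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq) + ["</s>"]
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    return lambda prev, cur: bigrams[(prev, cur)] / unigrams[prev] if unigrams[prev] else 0.0

def sequence_prob(model, seq):
    """Probability of a relation sequence under the bigram model."""
    prob = 1.0
    padded = ["<s>"] + list(seq) + ["</s>"]
    for prev, cur in zip(padded, padded[1:]):
        prob *= model(prev, cur)
    return prob

p = train_relation_bigram([["SBJ", "ROOT", "OBJ"], ["SBJ", "ROOT", "ADV", "OBJ"]])
print(sequence_prob(p, ["SBJ", "ROOT", "OBJ"]))
```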
n-gram is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
Introduction
Lattice MBR decoding uses a linear approximation to the BLEU score (Papineni et al., 2001); the weights in this linear loss are set heuristically by assuming that n-gram precisions decay exponentially with n. However, this may not be optimal in practice.
Minimum Bayes-Risk Decoding
They approximated log(BLEU) score by a linear function of n-gram matches and candidate length.
Minimum Bayes-Risk Decoding
where w is an n-gram present in either E or E', and θ_0, θ_1, ..., θ_N are weights which are determined empirically, where N is the maximum n-gram order.
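A hedged sketch of a linear gain of this shape: a length term weighted by θ_0 plus per-order weights θ_n on matched n-gram counts. Names and data structures are illustrative, not the paper's implementation.

```python
def linear_bleu_gain(hyp_len, hyp_ngrams, ref_ngrams, theta0, thetas):
    """Sketch of a linear approximation to log(BLEU): theta0 * |E'| plus a
    weighted sum over n-grams w of count_w(E') * delta_w(E), with one weight
    per n-gram order. N-grams are tuples of tokens; all names are illustrative."""
    gain = theta0 * hyp_len
    for w, count in hyp_ngrams.items():     # count = number of times w occurs in E'
        if w in ref_ngrams:                 # delta_w(E) = 1 if w also occurs in E
            gain += thetas[len(w)] * count
    return gain
```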
Minimum Bayes-Risk Decoding
Next, the posterior probability of each n-gram is computed.
n-gram is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Abstract
Using an iterative decoding approach, n-gram agreement statistics between translations of multiple decoders are employed to re-rank both full and partial hypotheses explored in decoding.
Collaborative Decoding
To compute the consensus measures, we further decompose each of them into n-gram matching statistics between e and e'.
Collaborative Decoding
For each n-gram of order n, we introduce a pair of complementary consensus measure functions G_n^+(e, e') and G_n^-(e, e') described as follows:
Collaborative Decoding
G_n^+(e, e') is the n-gram agreement measure function, which counts the number of occurrences in e' of n-grams in e.
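An illustrative sketch of an agreement count in the spirit of G_n^+(e, e'); the function is hypothetical and operates on plain token lists rather than the decoders' internal hypotheses.

```python
def ngram_agreement(e, e_prime, n):
    """For every n-gram type occurring in hypothesis e, count how many of the
    n-gram tokens of hypothesis e' match it. Both hypotheses are token lists."""
    e_ngrams = {tuple(e[i:i + n]) for i in range(len(e) - n + 1)}
    e_prime_ngrams = [tuple(e_prime[i:i + n]) for i in range(len(e_prime) - n + 1)]
    return sum(1 for g in e_prime_ngrams if g in e_ngrams)

print(ngram_agreement("the cat sat on the mat".split(),
                      "a cat sat on a mat".split(), n=2))
```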
Experiments
All baseline decoders are extended with n-gram-consensus-based co-decoding features to construct member decoders.
Experiments
Table 4 shows the comparison results of a two-system co-decoding using different settings of n-gram agreement and disagreement features.
Experiments
It is clearly shown that both n-gram agreement and disagreement types of features are helpful, and using them together is the best choice.
n-gram is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Abstract
Our particular variational distributions are parameterized as n-gram models.
Abstract
We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008).
Introduction
In practice, we approximate with several different variational families Q, corresponding to n-gram (Markov) models of different orders.
Variational Approximate Decoding
Since each q(y) is a distribution over output strings, a natural choice for Q is the family of n-gram models.
Variational Approximate Decoding
where W is a set of n-gram types.
Variational Approximate Decoding
Each w ∈ W is an n-gram, which occurs c_w(y) times in the string y, and w may be divided into an (n−1)-gram prefix (the history) and a 1-gram suffix r(w) (the rightmost or current word).
n-gram is mentioned in 32 sentences in this paper.
Topics mentioned in this paper:
Mochihashi, Daichi and Yamada, Takeshi and Ueda, Naonori
Abstract
Our model is also considered as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any “word” indications.
Experiments
NPY(n) means n-gram NPYLM.
Inference
character n-gram model explicit as Θ, we can set
Inference
where p(c_1 ⋯ c_k, k | Θ) is an n-gram probability given by (6), and p(k | Θ) is a probability that a word of length k will be generated from Θ.
Introduction
In this paper, we extend this work to propose a more efficient and accurate unsupervised word segmentation that will optimize the performance of the word n-gram Pitman-Yor (i.e.
Introduction
Furthermore, it can be viewed as a method for building a high-performance n-gram language model directly from character strings of arbitrary language.
Introduction
By embedding a character n-gram in word n-gram from a Bayesian perspective, Section 3 introduces a novel language model for word segmentation, which we call the Nested Pitman-Yor language model.
Nested Pitman-Yor Language Model
In contrast, in this paper we use a simple but more elaborate model, that is, a character n-gram language model that also employs HPYLM.
Nested Pitman-Yor Language Model
To avoid dependency on n-gram order n, we actually used the ∞-gram language model (Mochihashi and Sumita, 2007), a variable order HPYLM, for characters.
Pitman-Yor process and n-gram models
In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts seated over the tables 1 ⋯ t_hw.
Pitman-Yor process and n-gram models
As a result, the n-gram probability of this hierarchical Pitman—Yor language model (HPYLM) is recursively computed as
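For reference, the recursive HPYLM n-gram probability is commonly written in roughly the following form (a sketch in the notation of Teh (2006), not necessarily the paper's exact equation):

```latex
p(w \mid h) \;=\; \frac{c_{hw} - d\, t_{hw}}{\theta + c_{h\cdot}}
             \;+\; \frac{\theta + d\, t_{h\cdot}}{\theta + c_{h\cdot}}\; p(w \mid h')
```

Here c_hw and t_hw are the customer and table counts for word w in context h, c_h· and t_h· their totals, d and θ the discount and strength parameters, and h' is the context h shortened by one word.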
n-gram is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Hirao, Tsutomu and Suzuki, Jun and Isozaki, Hideki
A Syntax Free Sequence-oriented Sentence Compression Method
Many studies on sentence compression employ the n-gram language model to evaluate the linguistic likelihood of a compressed sentence.
A Syntax Free Sequence-oriented Sentence Compression Method
The n-gram distribution of short sentences may differ from that of long sentences.
A Syntax Free Sequence-oriented Sentence Compression Method
Therefore, the n-gram probability sometimes disagrees with our intuition in terms of sentence compression.
Experimental Evaluation
We developed the n-gram language model from a 9-year set of Mainichi Newspaper articles.
Results and Discussion
This result shows that the n-gram language model is improper for sentence compression because the n-gram probability is computed by using a corpus that includes both short and long sentences.
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Zaslavskiy, Mikhail and Dymetman, Marc and Cancedda, Nicola
Future Work
This powerful method, which was proposed in (Kam and Kopec, 1996; Popat et al., 2001) in the context of a finite-state model (but not of TSP), can be easily extended to N-gram situations, and typically converges in a small number of iterations.
Introduction
Typical nonlocal features include one or more n-gram language models as well as a distortion feature, measuring by how much the order of biphrases in the candidate translation deviates from their order in the source sentence.
Phrase-based Decoding as TSP
4.1 From Bigram to N-gram LM
Phrase-based Decoding as TSP
If we want to extend the power of the model to general n-gram language models, and in particular to the 3-gram
Phrase-based Decoding as TSP
The problem becomes even worse if we extend the compiling-out method to n-gram language models with n > 3.
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Huang, Fei and Yates, Alexander
Related Work
Smoothing in NLP usually refers to the problem of smoothing n-gram models.
Related Work
Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
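As a simpler illustration of the same idea of blending estimates from different orders, here is a hedged Python sketch of Jelinek-Mercer-style interpolation (not modified Kneser-Ney itself); all names and the lambda values are illustrative.

```python
def interpolated_trigram(w1, w2, w3, tri, bi, uni, total, lambdas=(0.6, 0.3, 0.1)):
    """Blend trigram, bigram, and unigram relative-frequency estimates so that
    unseen higher-order n-grams still receive probability mass.
    tri, bi, uni are count dictionaries keyed by tuples/words; total is the
    unigram count total. All names are illustrative."""
    l3, l2, l1 = lambdas
    p3 = tri.get((w1, w2, w3), 0) / bi[(w1, w2)] if bi.get((w1, w2)) else 0.0
    p2 = bi.get((w2, w3), 0) / uni[w2] if uni.get(w2) else 0.0
    p1 = uni.get(w3, 0) / total if total else 0.0
    return l3 * p3 + l2 * p2 + l1 * p1
```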
Related Work
While n-gram models have traditionally dominated in language modeling, two recent efforts de-
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Pado, Sebastian and Galley, Michel and Jurafsky, Dan and Manning, Christopher D.
Experimental Evaluation
BLEUR includes the following 18 sentence-level scores: BLEU-n and n-gram precision scores (1 ≤ n ≤ 4); BLEU brevity penalty (BP); BLEU score divided by BP.
Experimental Evaluation
To counteract BLEU’s brittleness at the sentence level, we also smooth BLEU-n and n-gram precision as in Lin and Och (2004).
Experimental Evaluation
NIST-n scores (1 ≤ n ≤ 10) and information-weighted n-gram precision scores (1 ≤ n ≤ 4); NIST brevity penalty (BP); and NIST score divided by BP.
Introduction
BLEU and NIST measure MT quality by using the strong correlation between human judgments and the degree of n-gram overlap between a system hypothesis translation and one or more reference translations.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Das, Dipanjan and Smith, Noah A.
Introduction
Although paraphrase identification is defined in semantic terms, it is usually solved using statistical classifiers based on shallow lexical, n-gram, and syntactic “overlap” features.
Introduction
We use a product of experts (Hinton, 2002) to bring together a logistic regression classifier built from n-gram overlap features and our syntactic model.
Product of Experts
The features are of the form precision_n (number of n-gram matches divided by the number of n-grams in S1), recall_n (number of n-gram matches divided by the number of n-grams in S2) and F_n (harmonic mean of the previous two features), where 1 ≤ n ≤ 3.
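An illustrative Python sketch of such overlap features; the function and variable names are hypothetical, and the clipped-match counting is one reasonable choice rather than necessarily the paper's.

```python
from collections import Counter

def overlap_features(s1, s2, max_n=3):
    """For each n up to max_n, compute precision_n (matches / #n-grams in s1),
    recall_n (matches / #n-grams in s2), and F_n (their harmonic mean).
    s1 and s2 are token lists."""
    feats = {}
    for n in range(1, max_n + 1):
        g1 = Counter(tuple(s1[i:i + n]) for i in range(len(s1) - n + 1))
        g2 = Counter(tuple(s2[i:i + n]) for i in range(len(s2) - n + 1))
        matches = sum((g1 & g2).values())        # clipped n-gram matches
        p = matches / max(sum(g1.values()), 1)
        r = matches / max(sum(g2.values()), 1)
        f = 2 * p * r / (p + r) if p + r else 0.0
        feats[f"precision_{n}"], feats[f"recall_{n}"], feats[f"F_{n}"] = p, r, f
    return feats
```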
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jung, Sangkeun and Lee, Cheongjae and Kim, Kyungduk and Lee, Gary Geunbae
Overall architecture
The unseen rate of n-gram varies according to the simulated user.
Overall architecture
Notice that simulated users C, E, and H generate higher rates of unseen n-gram patterns across all word error settings.
Related work
N-gram based approaches (Eckert et al., 1997; Levin et al., 2000) and other approaches (Scheffler and Young, 2001; Pietquin and Dutoit, 2006; Schatzmann et al., 2007) are introduced.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wu, Hua and Wang, Haifeng
Translation Selection
We use smoothed sentence-level BLEU score to replace the human assessments, where we use additive smoothing to avoid zero BLEU scores when we calculate the n-gram precisions.
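A hedged sketch of a sentence-level BLEU with additive smoothing of the n-gram precisions; the smoothing constant and brevity-penalty handling here are illustrative choices, not necessarily the paper's exact scheme.

```python
import math
from collections import Counter

def smoothed_sentence_bleu(hyp, ref, max_n=4, alpha=1.0):
    """Sentence-level BLEU where each n-gram precision is additively smoothed
    (add-alpha) so that a zero match count does not zero out the whole score.
    hyp and ref are token lists."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matches = sum((h & r).values())
        total = max(sum(h.values()), 1)
        log_prec += math.log((matches + alpha) / (total + alpha))
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))   # brevity penalty
    return bp * math.exp(log_prec / max_n)
```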
Translation Selection
1-4: n-gram precisions against pseudo references (1 ≤ n ≤ 4)
Translation Selection
15-19: n-gram precision against a target corpus (1 ≤ n ≤ 5)
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: