Abstract | However, n-gram based metrics are still the dominant approach today.
Alternatives to Correlation-based Meta-evaluation | Therefore, we have focused on other aspects of metric reliability that have revealed differences between n-gram and linguistic based metrics: |
Alternatives to Correlation-based Meta-evaluation | All n-gram based metrics achieve SIP and SIR values between 0.8 and 0.9. |
Alternatives to Correlation-based Meta-evaluation | This result suggests that n-gram based metrics are reasonably reliable for this purpose. |
Correlation with Human Judgements | Let us first analyze the correlation with human judgements for linguistic vs. n-gram based metrics. |
Correlation with Human Judgements | Linguistic metrics are represented by grey plots, and black plots represent metrics based on n-gram overlap. |
Correlation with Human Judgements | Therefore, we need additional meta-evaluation criteria in order to clarify the behavior of linguistic metrics as compared to n-gram based metrics. |
Introduction | However, the most commonly used metrics are still based on n-gram matching. |
Introduction | For that purpose, we compare — using four different test beds — the performance of 16 n-gram based metrics, 48 linguistic metrics and one combined metric from the state of the art. |
Previous Work on Machine Translation Meta-Evaluation | All of those metrics were based on n-gram overlap.
Language Identification | We implement a statistical model using a character based n-gram feature. |
Language Identification | For each language, we collect the n-gram counts (for n = 1 to n = 7, also using the word-beginning and word-ending spaces) from the vocabulary of the training corpus, and then generate a probability distribution from these counts.
Language Identification | In other words, this time we used a word-based n-gram method, only with n = 1. |
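The character-level collection step described above can be sketched as follows. This is an illustrative reconstruction, assuming a single normalized distribution over all orders; the function name and normalization choice are not taken from the paper.

```python
from collections import Counter

def char_ngram_profile(vocabulary, n_max=7):
    """Collect character n-gram counts (n = 1..n_max) from a vocabulary,
    padding each word with word-beginning and word-ending spaces, then
    normalize the counts into a probability distribution."""
    counts = Counter()
    for word in vocabulary:
        padded = f" {word} "  # boundary spaces participate in the n-grams
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}

profile = char_ngram_profile(["the", "them"])
```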
Related Work | Most of the work carried out to date on the written language identification problem consists of supervised approaches that are trained on a list of words or n-gram models for each reference language. |
Related Work | The n-gram based approaches are based on the counts of character or byte n-grams, which are sequences of n characters or bytes, extracted from a corpus for each reference language. |
Related Work | Classification models that use n-gram features have been proposed.
Computing Feature Expectations | In this section, we consider BLEU in particular, for which the relevant features gb(e) are n-gram counts up to length n = 4. |
Computing Feature Expectations | The nodes are states in the decoding process that include the span (i, j) of the sentence to be translated, the grammar symbol s over that span, and the left and right context words of the translation relevant for computing n-gram language model scores. Each hyper-edge h represents the application of a synchronous rule r that combines nodes corresponding to non-terminals in
Computing Feature Expectations | Each n-gram that appears in a translation e is associated with some h in its derivation: the h corresponding to the rule that produces the n-gram.
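The BLEU feature vector mentioned above, n-gram counts up to length 4, can be extracted with a short helper. A minimal sketch; the function name is an assumption:

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    """Count all n-grams up to length max_n -- the n-gram count
    features relevant for BLEU (max_n = 4 matches the setting above)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

c = ngram_counts("the cat sat on the mat".split())
```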
Consensus Decoding Algorithms | In this expression, BLEU(e; e') references e' only via its n-gram count features c(e', t). (The length penalty is also a function of n-gram counts.)
Introduction | The contributions of this paper include a linear-time algorithm for MBR using linear similarities, a linear-time alternative to MBR using nonlinear similarity measures, and a forest-based extension to this procedure for similarities based on n-gram counts. |
Introduction | One is the n-gram model over different units, such as word-level bigram/trigram models (Bangalore and Rambow, 2000; Langkilde, 2000), or factored language models integrated with syntactic tags (White et al.
Introduction | (2008) develop a general-purpose realizer couched in the framework of Lexical Functional Grammar based on simple n-gram models. |
Introduction | (2009) present a dependency-spanning tree algorithm for word ordering, which first builds dependency trees to decide linear precedence between heads and modifiers then uses an n-gram language model to order siblings. |
Log-linear Models | We linearize the dependency relations by computing n-gram models, similar to traditional word-based language models, except using the names of dependency relations instead of words. |
Log-linear Models | The dependency relation model calculates the probability of dependency relation n-gram P(DR) according to Eq.(3). |
Log-linear Models | = HP(DRk I Dle—jn k=1 Word Model: We integrate an n-gram word model into the log-linear model for capturing the relation between adjacent words. |
Introduction | Lattice MBR decoding uses a linear approximation to the BLEU score (Papineni et al., 2001); the weights in this linear loss are set heuristically by assuming that n-gram precisions decay exponentially with n. However, this may not be optimal in practice.
Minimum Bayes-Risk Decoding | They approximated log(BLEU) score by a linear function of n-gram matches and candidate length. |
Minimum Bayes-Risk Decoding | where w is an n-gram present in either E or E', and θ_0, θ_1, ..., θ_N are weights which are determined empirically, where N is the maximum n-gram order.
Minimum Bayes-Risk Decoding | Next, the posterior probability of each n-gram is computed. |
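The linear approximation of log(BLEU) from n-gram matches and length, combined with the n-gram posteriors, can be sketched as below. This is a loose reconstruction in the spirit of Tromble et al. (2008); variable names and the data layout are assumptions.

```python
def linear_bleu_gain(hyp_ngrams, hyp_length, ngram_posterior, theta):
    """Linear gain: theta[0] times candidate length, plus, for every
    n-gram w of the candidate, theta[len(w)] times its count times the
    posterior probability of w in the evidence space."""
    gain = theta[0] * hyp_length
    for w, count in hyp_ngrams.items():
        gain += theta[len(w)] * count * ngram_posterior.get(w, 0.0)
    return gain

# Illustrative numbers only
g = linear_bleu_gain({("a",): 2}, 2, {("a",): 0.5}, {0: -0.1, 1: 0.3})
```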
Abstract | Using an iterative decoding approach, n-gram agreement statistics between translations of multiple decoders are employed to re-rank both full and partial hypotheses explored in decoding.
Collaborative Decoding | To compute the consensus measures, we further decompose each G_l(e, e') into n-gram matching statistics between e and e'.
Collaborative Decoding | For each n-gram of order n, we introduce a pair of complementary consensus measure functions Gn+(e, e') and Gn−(e, e') described as follows:
Collaborative Decoding | Gn+(e, e') is the n-gram agreement measure function, which counts the number of occurrences in e' of n-grams in e.
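A direct sketch of the agreement measure as worded above (the function name is ours; the disagreement measure Gn− would be defined analogously from the n-grams of e absent from e'):

```python
def ngram_agreement(e, e_prime, n):
    """Gn+(e, e'): for each n-gram occurring in e, count its number
    of occurrences in e', and sum the counts."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    target = grams(e_prime)
    return sum(target.count(g) for g in grams(e))

score = ngram_agreement("a b".split(), "a b a b".split(), 2)
```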
Experiments | All baseline decoders are extended with n-gram consensus-based co-decoding features to construct member decoders.
Experiments | Table 4 shows the comparison results of a two-system co-decoding using different settings of n-gram agreement and disagreement features. |
Experiments | It is clearly shown that both n-gram agreement and disagreement types of features are helpful, and using them together is the best choice. |
Abstract | Our particular variational distributions are parameterized as n-gram models. |
Abstract | We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). |
Introduction | In practice, we approximate with several different variational families Q, corresponding to n-gram (Markov) models of different orders. |
Variational Approximate Decoding | Since each q(y) is a distribution over output strings, a natural choice for Q is the family of n-gram models. |
Variational Approximate Decoding | where W is a set of n-gram types. |
Variational Approximate Decoding | Each w ∈ W is an n-gram, which occurs c_w(y) times in the string y, and w may be divided into an (n − 1)-gram prefix (the history) and a 1-gram suffix r(w) (the rightmost or current word).
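The history/suffix decomposition just described is trivial to state in code (a sketch; representing n-grams as tuples is our assumption):

```python
def split_ngram(w):
    """Divide an n-gram tuple w into its (n - 1)-gram prefix (the
    history) and its 1-gram suffix r(w), the rightmost/current word."""
    return w[:-1], w[-1]

history, current = split_ngram(("the", "black", "cat"))
```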
Abstract | Our model can also be considered as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any “word” indications.
Experiments | NPY(n) means n-gram NPYLM. |
Inference | character n-gram model explicit as Θ, we can set
Inference | where p(c_1 ⋯ c_k, k | Θ) is an n-gram probability given by (6), and p(k | Θ) is a probability that a word of length k will be generated from Θ.
Introduction | In this paper, we extend this work to propose a more efficient and accurate unsupervised word segmentation that will optimize the performance of the word n-gram Pitman-Yor (i.e. |
Introduction | Furthermore, it can be viewed as a method for building a high-performance n-gram language model directly from character strings of arbitrary language. |
Introduction | By embedding a character n-gram model in a word n-gram model from a Bayesian perspective, Section 3 introduces a novel language model for word segmentation, which we call the Nested Pitman-Yor language model.
Nested Pitman-Yor Language Model | In contrast, in this paper we use a simple but more elaborate model, that is, a character n-gram language model that also employs HPYLM. |
Nested Pitman-Yor Language Model | To avoid dependency on n-gram order n, we actually used the ∞-gram language model (Mochihashi and Sumita, 2007), a variable-order HPYLM, for characters.
Pitman-Yor process and n-gram models | In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts seated over the tables 1 ⋯ t_{hw}.
Pitman-Yor process and n-gram models | As a result, the n-gram probability of this hierarchical Pitman—Yor language model (HPYLM) is recursively computed as |
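The recursive computation referred to here is, in the standard HPYLM notation, usually written as follows. This is a reconstruction from the common formulation in the literature, not copied from this excerpt:

```latex
p(w \mid h) \;=\; \frac{c(w \mid h) - d \, t_{hw}}{\theta + c(h)}
\;+\; \frac{\theta + d \, t_{h}}{\theta + c(h)} \, p(w \mid h'),
```

where h' is the context h shortened by its earliest word, d and θ are the discount and strength parameters, t_{hw} is the number of tables serving w in restaurant h, and t_h = Σ_w t_{hw}.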
A Syntax Free Sequence-oriented Sentence Compression Method | Many studies on sentence compression employ the n-gram language model to evaluate the linguistic likelihood of a compressed sentence. |
A Syntax Free Sequence-oriented Sentence Compression Method | The n-gram distribution of short sentences may differ from that of long sentences.
A Syntax Free Sequence-oriented Sentence Compression Method | Therefore, the n-gram probability sometimes disagrees with our intuition in terms of sentence compression. |
Experimental Evaluation | We developed the n-gram language model from a 9-year set of Mainichi Newspaper articles.
Results and Discussion | This result shows that the n-gram language model is improper for sentence compression because the n-gram probability is computed by using a corpus that includes both short and long sentences. |
Future Work | This powerful method, which was proposed in (Kam and Kopec, 1996; Popat et al., 2001) in the context of a finite-state model (but not of TSP), can be easily extended to N-gram situations, and typically converges in a small number of iterations. |
Introduction | Typical nonlocal features include one or more n-gram language models as well as a distortion feature, measuring by how much the order of biphrases in the candidate translation deviates from their order in the source sentence. |
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Phrase-based Decoding as TSP | If we want to extend the power of the model to general n-gram language models, and in particular to the 3-gram |
Phrase-based Decoding as TSP | The problem becomes even worse if we extend the compiling-out method to n-gram language models with n > 3. |
Related Work | Smoothing in NLP usually refers to the problem of smoothing n-gram models. |
Related Work | Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bigram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity.
Related Work | While n-gram models have traditionally dominated in language modeling, two recent efforts de- |
Experimental Evaluation | BLEUR includes the following 18 sentence-level scores: BLEU-n and n-gram precision scores (1 ≤ n ≤ 4); BLEU brevity penalty (BP); BLEU score divided by BP.
Experimental Evaluation | To counteract BLEU’s brittleness at the sentence level, we also smooth BLEU-n and n-gram precision as in Lin and Och (2004). |
Experimental Evaluation | NIST-n scores (1 ≤ n ≤ 10) and information-weighted n-gram precision scores (1 ≤ n ≤ 4); NIST brevity penalty (BP); and NIST score divided by BP.
Introduction | BLEU and NIST measure MT quality by using the strong correlation between human judgments and the degree of n-gram overlap between a system hypothesis translation and one or more reference translations. |
Introduction | Although paraphrase identification is defined in semantic terms, it is usually solved using statistical classifiers based on shallow lexical, n-gram, and syntactic “overlap” features.
Introduction | We use a product of experts (Hinton, 2002) to bring together a logistic regression classifier built from n-gram overlap features and our syntactic model. |
Product of Experts | The features are of the form precision_n (number of n-gram matches divided by the number of n-grams in s_1), recall_n (number of n-gram matches divided by the number of n-grams in s_2), and F_n (harmonic mean of the previous two features), where 1 ≤ n ≤ 3.
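The overlap features just defined can be computed per n-gram order as below. A minimal sketch; clipped matching (taking the minimum count on each side) is a common convention we assume here:

```python
def overlap_features(s1, s2, n):
    """precision_n, recall_n, and their harmonic mean F_n for one
    n-gram order, from clipped n-gram matches between s1 and s2."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    g1, g2 = grams(s1), grams(s2)
    matches = sum(min(g1.count(g), g2.count(g)) for g in set(g1))
    p = matches / len(g1) if g1 else 0.0
    r = matches / len(g2) if g2 else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

p, r, f = overlap_features("a b c".split(), "a b d".split(), 1)
```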
Overall architecture | The unseen rate of n-grams varies according to the simulated user.
Overall architecture | Notice that simulated users C, E, and H generate higher rates of unseen n-gram patterns over all word-error settings.
Related work | N-gram based approaches (Eckert et al., 1997; Levin et al., 2000) and other approaches (Scheffler and Young, 2001; Pietquin and Dutoit, 2006; Schatzmann et al., 2007) have been introduced.
Translation Selection | We use smoothed sentence-level BLEU score to replace the human assessments, where we use additive smoothing to avoid zero BLEU scores when we calculate the n-gram precisions. |
Translation Selection | 1-4 n-gram precisions against pseudo references (1 ≤ n ≤ 4)
Translation Selection | 15-19 n-gram precisions against a target corpus (1 ≤ n ≤ 5)
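The additively smoothed sentence-level BLEU mentioned above can be sketched as follows. The smoothing constant, clipping, and brevity penalty here follow common practice and are assumptions, not necessarily the paper's exact recipe:

```python
import math
from collections import Counter

def smoothed_sentence_bleu(hyp, ref, max_n=4, alpha=1.0):
    """Sentence-level BLEU with additive (add-alpha) smoothing of the
    n-gram precisions, so an unmatched higher-order n-gram does not
    zero out the whole score."""
    precisions = []
    for n in range(1, max_n + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        total = max(sum(h.values()), 1)
        precisions.append((match + alpha) / (total + alpha))
    bp = min(1.0, math.exp(1 - len(ref) / max(len(hyp), 1)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

score = smoothed_sentence_bleu("a b c d".split(), "a b c d".split())
```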