Automatic Evaluation Metrics | number of n-gram matches of the system translation against one or more reference translations. |
Automatic Evaluation Metrics | Generally, more n-gram matches result in a higher BLEU score. |
Automatic Evaluation Metrics | When determining the matches to calculate precision, BLEU uses a modified, or clipped n-gram precision. |
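The clipped n-gram precision described above can be sketched as follows. This is a minimal illustration only (it omits BLEU's brevity penalty and the geometric mean over n-gram orders); all function names are ours, not from the papers quoted here.

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def clipped_precision(candidate, references, n):
    """Modified (clipped) n-gram precision as used in BLEU.

    Each candidate n-gram may match at most as many times as it
    occurs in any single reference (the "clip"), so repeating a
    common word cannot inflate the score.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # Maximum reference count for each n-gram across all references.
    max_ref = Counter()
    for ref in references:
        for g, c in Counter(ngrams(ref, n)).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
    return clipped / sum(cand_counts.values())
```

For example, the degenerate candidate "the the the" against the reference "the cat" gets a clipped unigram precision of 1/3 rather than 3/3, since "the" occurs only once in the reference.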
Metric Design Considerations | 4.1 Using N-gram Information |
Metric Design Considerations | Lemma and POS match Representing each n-gram by its sequence of lemma and POS-tag pairs, we first try to perform an exact match in both lemma and POS-tag. |
Metric Design Considerations | In all our n-gram matching, each n-gram in the system translation can only match at most one n-gram in the reference translation. |

Abstract | The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy-pruning. |
Introduction | LMs are usually implemented as n-gram models parameterized for each distinct sequence of up to n words observed in the training corpus. |
Introduction | Recent work (Talbot and Osborne, 2007b) has demonstrated that randomized encodings can be used to represent n-gram counts for LMs with significant space savings, circumventing information-theoretic constraints on lossless data structures by allowing errors with some small probability.
Introduction | Lookup is very efficient: the values of 3 cells in a large array are combined with the fingerprint of an n-gram.
Perfect Hash-based Language Models | The model can erroneously return a value for an n-gram that was never actually stored, but will always return the correct value for an n-gram that is in the model. |
Perfect Hash-based Language Models | Each n-gram x_i is drawn from some set of possible n-grams U and its associated value from a corresponding set of possible values V.
Perfect Hash-based Language Models | We do not store the n-grams and their probabilities directly but rather encode a fingerprint of each n-gram, f(x_i), together with its associated value in such a way that the value can be retrieved when the model is queried with the n-gram x_i.
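The one-sided error behaviour can be illustrated with a toy sketch. This is a deliberate simplification of the scheme described above: it indexes a plain dict by a short fingerprint rather than combining 3 array cells, and all names are hypothetical. Stored n-grams always return their value; an unseen n-gram usually misses, but with probability roughly 2^-bits its fingerprint collides with a stored one and a wrong value is returned.

```python
import hashlib

class FingerprintTable:
    """Toy fingerprint-keyed n-gram table (simplified sketch)."""

    def __init__(self, bits=16):
        self.bits = bits
        self.table = {}

    def _fingerprint(self, ngram):
        # Short fingerprint of the n-gram; collisions cause the
        # small false-positive probability described in the text.
        digest = hashlib.sha1(" ".join(ngram).encode()).digest()
        return int.from_bytes(digest[:8], "big") % (1 << self.bits)

    def store(self, ngram, value):
        self.table[self._fingerprint(ngram)] = value

    def lookup(self, ngram):
        # Correct value for stored n-grams; None (or, rarely, a
        # colliding n-gram's value) for unstored ones.
        return self.table.get(self._fingerprint(ngram))
```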
Scaling Language Models | These are typically n-gram models that approximate the probability of a word sequence by assuming each token to be independent of all but n − 1 preceding tokens.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space-dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | While the approach results in significant space savings, working with corpus statistics, rather than n-gram probabilities directly, is computationally less efficient (particularly in a distributed setting) and introduces a dependency on the smoothing scheme used. |
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Decoding to Maximize BLEU | BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU. |
Introduction | This complexity arises from the interaction of the tree-based translation model with an n-gram language model. |
Language Model Integrated Decoding for SCFG | We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space. |
Language Model Integrated Decoding for SCFG | Without an n-gram language model, decoding using SCFG is not much different from CFG parsing. |
Language Model Integrated Decoding for SCFG | However, when we want to integrate an n-gram language model into the search, our goal is to search for the derivation whose total sum of production weights and n-gram log probabilities is maximized.
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n−1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding | We can make the approximation even closer by taking local higher-order outside n-gram information for a state X[i, j, u_1…u_{n−1}, v_1…v_{n−1}] into account.
Multi-pass LM-Integrated Decoding | where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
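The agenda ordering described here can be sketched as an A*-style best-first loop. In this sketch, `beta` and `alpha` are assumed to be precomputed dictionaries of Viterbi inside and outside costs (negative log probabilities, so lower is better); all names are illustrative, not the decoder's actual API.

```python
import heapq

def best_first_explore(states, beta, alpha):
    """Yield states in order of inside cost + outside cost.

    Prioritizing by beta[s] + alpha[s] explores the states whose
    best complete derivation through them is cheapest first, as in
    A*-style agenda search.
    """
    agenda = [(beta[s] + alpha[s], s) for s in states]
    heapq.heapify(agenda)
    while agenda:
        priority, s = heapq.heappop(agenda)
        yield s, priority
```

With `beta = {'A': 1.0, 'B': 0.5}` and `alpha = {'A': 0.2, 'B': 1.0}`, state A (total 1.2) is explored before B (total 1.5), even though B has the cheaper inside cost.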
Evaluation | We smooth by adding 40 to all counts, equal to the minimum count in the n-gram data. |
Introduction | For example, in our n-gram collection (Section 3.4), “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern. |
Methodology | We gather pattern fillers from a large collection of n-gram frequencies. |
Methodology | We do the same processing to our n-gram corpus. |
Methodology | Also, other pronouns in the pattern are allowed to match a corresponding pronoun in an n-gram, regardless of differences in inflection and class.
Background | makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class. |
Background | In the anytime mode, a best-first search is performed with a configurable time limit: the scores assigned by the n-gram model determine the order of the edges on the agenda, and thus have an impact on realization speed.
Background | the one that covers the most elementary predications in the input logical form, with ties broken according to the n-gram score. |
The Approach | Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model. |
Abstract | To improve recall, irregularities in parse trees caused by verb form errors are taken into account; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM corpus reached a threshold.
Experiments | Some kind of confidence measure on the n-gram counts might be appropriate for reducing such false alarms. |
Introduction | To improve recall, irregularities in parse trees caused by verb form errors are considered; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Research Issues | We propose using n-gram counts as a filter to counter this kind of overgeneralization. |
Research Issues | A second goal is to show that n-gram counts can effectively serve as a filter in order to increase precision.
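The count-based filtering idea in the lines above can be sketched as follows. The counts table, the threshold value, and all names here are illustrative stand-ins for a web-scale n-gram collection, not the papers' actual resources or settings.

```python
def filter_corrections(corrections, ngram_counts, threshold=10):
    """Keep only proposed corrections whose n-gram is frequent enough.

    A correction is allowed only when its n-gram count in the
    (hypothetical) collection reaches the threshold, countering
    overgeneration of implausible rewrites.
    """
    return [c for c in corrections
            if ngram_counts.get(c, 0) >= threshold]

# Illustrative counts: the well-formed trigram is frequent,
# the ill-formed one is rare.
counts = {("has", "been", "done"): 5000, ("has", "being", "done"): 2}
proposed = [("has", "been", "done"), ("has", "being", "done")]
print(filter_corrections(proposed, counts))  # keeps only the frequent trigram
```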
Class-Based Language Modeling | Generalizing this leads to arbitrary order class-based n-gram models of the form: |
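The formula following "of the form:" did not survive extraction. As a hedged reconstruction, the standard class-based n-gram factorization that such sentences typically introduce (in the style of Brown et al.'s class models, assuming each word w_i belongs to class c_i) is:

```latex
P(w_i \mid w_{i-n+1}^{i-1}) \approx P(w_i \mid c_i)\, P(c_i \mid c_{i-n+1}^{i-1})
```

That is, the word is emitted from its class, and the class sequence itself is modeled with an n-gram over classes.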
Introduction | In the case of n-gram language models this is done by factoring the probability: |
Introduction | do not differ in the last n − 1 words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities P(w_i | w_{i−n+1}^{i−1}).
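A minimal maximum-likelihood sketch makes the sparsity point concrete: with no smoothing, any history unseen in training yields no estimate at all. The corpus and names below are illustrative.

```python
from collections import Counter

def train_ngram_lm(corpus, n=2):
    """Maximum-likelihood n-gram model: P(w | history) as a
    relative frequency over the training corpus."""
    context = Counter()  # counts of (n-1)-word histories
    joint = Counter()    # counts of history + word
    for sent in corpus:
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(tokens)):
            hist = tuple(tokens[i - n + 1:i])
            joint[hist + (tokens[i],)] += 1
            context[hist] += 1

    def prob(word, history):
        hist = tuple(history)
        if context[hist] == 0:
            return 0.0  # unseen history: pure MLE has nothing to say
        return joint[hist + (word,)] / context[hist]

    return prob
```

Trained on the two-sentence corpus `[["the", "cat"], ["the", "dog"]]`, the bigram model gives P(cat | the) = 0.5, but P(cat | a) = 0 because the history "a" was never observed; this is exactly the unreliability the sentence above describes.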
Introduction | However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed with mixed success (Raab, 2006). |
Experiments | In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, HS denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction | In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation. |
Our Method | For target features, the n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure word.
Our Method | Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position, POS tags. Source features: MW-HW collocation, surrounding words, source head word.
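The windowed language model score described above can be sketched as follows, assuming a generic conditional model `prob(word, history)`. Restricting the loop to only the words around the candidate measure word is left out for brevity; names are illustrative.

```python
import math

def window_lm_score(tokens, prob, n=2):
    """Sum of log n-gram probabilities over a token window.

    `prob(word, history)` is any conditional n-gram model; the
    window here is simply the whole token list.
    """
    score = 0.0
    for i in range(n - 1, len(tokens)):
        p = prob(tokens[i], tuple(tokens[i - n + 1:i]))
        score += math.log(p) if p > 0 else float("-inf")
    return score
```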
Experiments | Due to the lower frequency of higher-order n-grams (as opposed to unigrams), higher-order n-gram language models are more sparse, which increases the probability of missing a particular sentiment marker in a sentence (Table 3).
Experiments | training sets are required to overcome this higher n-gram sparseness in sentence-level annotation. |
Experiments | results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
Experiments | The NIST metric clearly shows a significant improvement, because it mostly measures difficult n-gram matches (e.g. due to the long-distance rules we have been dealing with).
Experiments | In n-gram based metrics, the scores for all words are equally weighted, so mistakes on crucial sentence constituents may be penalized the same as errors on redundant or meaningless words (Callison-Burch et al., 2006). |
Introduction | Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
Related Work | Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system. |
Syllabification with Structured SVMs | In addition to these primary n-gram features, we experimented with linguistically-derived features. |
Syllabification with Structured SVMs | We believe that this is caused by the ability of the SVM to learn such generalizations from the n-gram features alone. |
Features | Trying to find phrase translations for any possible n-gram is not a good idea for two reasons. |
Features | We will define a confidence metric to estimate how reliably the model can align an n-gram in one side to a phrase on the other side given a parallel sentence. |
Features | Now we turn to monolingual resources to evaluate the quality of an n-gram being a good phrase. |
The Model | In this model the distribution of the overall sentiment rating y_ov is based on all the n-gram features of a review text.
The Model | Then the distribution of y_a, for every rated aspect a, can be computed from the distribution of y_ov and from any n-gram feature where at least one word in the n-gram is assigned to the associated aspect topic (r = loc, z = a).
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J^a_{y,f} are the common weights and aspect-specific weights for n-gram feature f, and p_{a,f} is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).