Abstract | We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Introduction | Parameters that are stored in the model are retrieved without error; however, false positives may occur whereby n-grams not in the model are incorrectly ‘found’ when requested. |
Introduction | We encode fingerprints (random hashes) of n-grams together with their associated probabilities using a perfect hash function generated at random (Majewski et al., 1996). |
Perfect Hash-based Language Models | We assume the n-grams and their associated parameter values have been precomputed and stored on disk. |
Perfect Hash-based Language Models | We then encode the model in an array such that each n-gram’s value can be retrieved. |
Perfect Hash-based Language Models | The model uses randomization to map n-grams to fingerprints and to generate a perfect hash function that associates n-grams with their values. |
Scaling Language Models | In language modeling the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | However, if we are willing to accept that occasionally our model will be unable to distinguish between distinct n-grams, then it is possible to store the model in space that does not depend on either the n-gram order or the vocabulary size.
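To make the fingerprint idea concrete, here is a minimal Python sketch, not the paper's construction: values are stored alongside truncated random hashes of the n-grams, the Majewski-style perfect hash is stood in for by a precomputed slot table, and a lookup of an unseen n-gram is usually rejected by the fingerprint check but can occasionally collide and be falsely 'found'.

```python
import hashlib

FP_BITS = 12  # fingerprint width; deliberately small so false positives are plausible

def fingerprint(ngram):
    """Random hash ('fingerprint') of an n-gram, truncated to FP_BITS bits."""
    digest = hashlib.md5(ngram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") & ((1 << FP_BITS) - 1)

class FingerprintLM:
    """Array of (fingerprint, value) cells addressed by a perfect hash.
    The perfect hash is stood in for here by a slot table built at
    construction time; a real model evaluates it without storing the keys."""

    def __init__(self, ngram_values):
        self.cells = []
        self._slot = {}  # stand-in for the perfect hash function
        for i, (ngram, value) in enumerate(ngram_values.items()):
            self._slot[ngram] = i
            self.cells.append((fingerprint(ngram), value))

    def lookup(self, ngram):
        """Stored n-grams are retrieved without error; unseen n-grams are
        usually rejected, but may collide on the fingerprint (false positive)."""
        i = self._slot.get(ngram, hash(ngram) % len(self.cells))
        fp, value = self.cells[i]
        return value if fp == fingerprint(ngram) else None

lm = FingerprintLM({"the cat sat": -1.2, "cat sat on": -0.9})
print(lm.lookup("the cat sat"))  # -1.2
print(lm.lookup("dog sat on"))   # usually None, rarely a spurious value
```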
Abstract | This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams, dependency relations, etc.
Introduction | In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations. |
Introduction | Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other. |
Metric Design Considerations | We note, however, that matches between items (such as words, n-grams, etc.)
Metric Design Considerations | In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair. |
Metric Design Considerations | Lemma match For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match. |
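As an illustration of the exact-then-lemma matching described above, here is a rough Python sketch; the greedy pairing and the toy lemmatizer are assumptions for illustration, not MAXSIM's actual algorithm (which also handles synonyms and partial similarity scores).

```python
def match_ngrams(sys_ngrams, ref_ngrams, lemmatize):
    """Two-pass greedy matching: surface-form matches first, then matches on
    lemmas for the n-grams that remain unmatched."""
    sys_pool, ref_pool = list(sys_ngrams), list(ref_ngrams)
    matches = []

    # Pass 1: exact surface-form matches.
    for s in list(sys_pool):
        if s in ref_pool:
            matches.append((s, s))
            sys_pool.remove(s)
            ref_pool.remove(s)

    # Pass 2: relax the criterion to lemma identity for the leftovers.
    for s in list(sys_pool):
        s_lem = tuple(lemmatize(w) for w in s)
        for r in ref_pool:
            if s_lem == tuple(lemmatize(w) for w in r):
                matches.append((s, r))
                ref_pool.remove(r)
                break
    return matches

LEMMA = {"cats": "cat", "sat": "sit", "sits": "sit"}
bigrams = lambda ws: list(zip(ws, ws[1:]))
print(match_ngrams(bigrams("the cats sat".split()),
                   bigrams("the cat sits".split()),
                   lambda w: LEMMA.get(w, w)))
```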
Methodology | The maximum size of a context pattern depends on the size of n-grams available in the data. |
Methodology | In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
Methodology | Shorter n-grams were not found to improve performance on development data and hence are not extracted. |
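One way to picture such maximum-size context patterns: for a target token, collect every five-word window of the sentence that covers it. The sketch below illustrates only that reading; the paper's exact pattern definition (for instance, whether the target itself is replaced by a slot) may differ.

```python
def context_patterns(tokens, i, size=5):
    """All size-word windows of `tokens` that cover position i (the target).
    With 5-grams being the longest counts available, size is capped at five."""
    patterns = []
    for start in range(max(0, i - size + 1), min(i, len(tokens) - size) + 1):
        patterns.append(tuple(tokens[start:start + size]))
    return patterns

sent = "he was arrested for gun possession last year".split()
for p in context_patterns(sent, sent.index("possession")):
    print(" ".join(p))
```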
Data 5.1 Development Data | For disambiguation with n-grams (see §3.3), we made use of the WEB 1T 5-GRAM corpus.
Data 5.1 Development Data | Prepared by Google Inc., it contains English n-grams, up to 5-grams, with their observed frequency counts from a large number of web pages.
Experiments | Table 6: The n-grams used for filtering, with examples of sentences which they are intended to differentiate. |
Experiments | 6.2 Disambiguation with N-grams |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM |
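A sketch of this kind of count-based filtering is below; since the excerpt cuts off before the exact criterion, the lookup table, threshold, and comparison are placeholders rather than the paper's actual test.

```python
def passes_ngram_filter(candidate_ngram, count_of, threshold=1):
    """Allow a proposed correction only if its n-gram is attested (with count
    at least `threshold`) in a large n-gram collection such as Web 1T.
    `count_of` maps an n-gram tuple to its observed frequency; the threshold
    and comparison here are placeholders, not the paper's exact criterion."""
    return count_of.get(candidate_ngram, 0) >= threshold

# Toy counts standing in for Web 1T 5-gram frequencies.
counts = {("arrested", "for", "gun", "possession"): 1200,
          ("arrested", "for", "gun", "possessions"): 0}
print(passes_ngram_filter(("arrested", "for", "gun", "possession"), counts))   # True
print(passes_ngram_filter(("arrested", "for", "gun", "possessions"), counts))  # False
```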
A Generic Phrase Training Procedure | Usually all n-grams up to a predefined length limit are considered as candidate phrases. |
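For illustration, enumerating every n-gram up to a length limit as a candidate phrase can be sketched as follows; the limit of four and the whitespace tokenization are arbitrary choices here, not the paper's settings.

```python
def candidate_phrases(tokens, max_len=4):
    """All n-grams of the sentence up to max_len words, as candidate phrases."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_len + 1)
            for i in range(len(tokens) - n + 1)]

print(candidate_phrases("kick the bucket".split(), max_len=3))
```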
Features | First, due to data sparsity and/or the limited capability of the alignment model, there will exist n-grams that cannot be aligned well, for instance n-grams that are part of a paraphrase translation or a metaphorical expression.
Features | Extracting candidate translations for such n-grams for the sake of improving coverage (recall) might hurt translation quality (precision).
The Model | The first score is computed on the basis of all the n-grams, but using a common set of weights independent of the aspect a.
The Model | Another score is computed only using n-grams associated with the related topic, but an aspect-specific set of weights is used in this computation. |
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J^a_{y,f} are the common weights and aspect-specific weights for n-gram feature f, and p_{f,a} is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
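Read this way, the aspect prediction combines a bias, a common-weight score over all n-gram features, and an aspect-specific score in which each feature is discounted by p_{f,a}. The sketch below is one plausible rendering of that description; the function and variable names, and the exact linear form, are assumptions rather than the paper's equation.

```python
def aspect_score(features, y, a, b, J, J_asp, p):
    """Unnormalized score for rating y of aspect a: bias b[y], plus a score
    over all n-gram features f with common weights J[y, f], plus a score with
    aspect-specific weights J_asp[y, a, f], each feature discounted by
    p[f, a], the fraction of its words assigned to the aspect topic."""
    score = b[y]
    for f, x in features.items():
        score += J.get((y, f), 0.0) * x
        score += p.get((f, a), 0.0) * J_asp.get((y, a, f), 0.0) * x
    return score

feats = {("great", "location"): 1, ("bad",): 1}
b = {+1: 0.0, -1: 0.0}
J = {(+1, ("great", "location")): 0.8, (-1, ("bad",)): 0.9}
J_asp = {(+1, "location", ("great", "location")): 1.1}
p = {(("great", "location"), "location"): 1.0, (("bad",), "location"): 0.0}
print(aspect_score(feats, +1, "location", b, J, J_asp, p))  # 0.8 + 1.0 * 1.1 = 1.9
```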
Introduction | Use of longer n-grams improves translation results, but exacerbates this interaction. |
Introduction | We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition. |
Multi-pass LM-Integrated Decoding | where δ(i, j) is the set of candidate target-language words outside the span of (i, j), and B is the product of the upper bounds for the two on-the-border n-grams.
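A simplified sketch of that product of border upper bounds follows; the names and the particular bound are assumptions for illustration, not the paper's exact definition. For each border of the span, take the best upper bound over the candidate words that could sit just outside it, and multiply the two.

```python
def outside_bound(border_candidates, ngram_upper_bound):
    """Optimistic outside estimate for a span: the product, over its two
    borders, of the best upper bound among the candidate boundary words."""
    B = 1.0
    for candidates in border_candidates:  # (left-border words, right-border words)
        B *= max(ngram_upper_bound(w) for w in candidates)
    return B

upper = {"the": 0.4, "a": 0.3, "dog": 0.1}
print(outside_bound((["the", "a"], ["dog"]), lambda w: upper.get(w, 1e-4)))  # 0.4 * 0.1
```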