Extensions of SemPOS | Surprisingly, BLEU-2 performed better than any other n-gram order, for reasons that have yet to be examined. |
Problems of BLEU | Total n-grams 35,531 33,891 32,251 30,611 |
Problems of BLEU | Table 1: n-grams confirmed by the reference and containing error flags. |
Problems of BLEU | The suspicious cases are n-grams confirmed by the reference but still containing an error flag (false positives) and n-grams not confirmed by the reference despite containing no error flag (false negatives). |
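The false-positive / false-negative split described above reduces to two binary attributes per n-gram: whether the reference confirms it, and whether it carries an error flag. A minimal sketch (function name and labels are illustrative, not from the paper):

```python
def classify_ngram(confirmed_by_reference, has_error_flag):
    """Label the suspicious cases described in the error analysis.

    confirmed_by_reference: the n-gram matches the reference translation.
    has_error_flag: a human annotator flagged an error inside the n-gram.
    """
    if confirmed_by_reference and has_error_flag:
        # Reference confirms it, yet it contains a flagged error.
        return "false positive"
    if not confirmed_by_reference and not has_error_flag:
        # No error flag, yet the reference does not confirm it.
        return "false negative"
    return "consistent"
```

The two remaining combinations (confirmed without a flag, unconfirmed with a flag) are the unsuspicious, consistent cases.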
Experiments | For the web baseline (reported as Google), we stemmed all words in the Google n-grams and counted every verb v and noun n that appear in Gigaword. |
How Frequent is Unseen Data? | The dotted line uses Google n-grams as training. |
Models | We also avoided over-counting co-occurrences in lower-order n-grams that appear again in 4- or 5-grams. |
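One simple way to avoid counting a lower-order n-gram again when it is embedded in a counted 4- or 5-gram is to credit each token position only to the longest n-gram that covers it. A greedy sketch under that assumption (the paper's exact bookkeeping may differ):

```python
from collections import Counter

def maximal_ngram_counts(tokens, max_n=5):
    """Count n-grams (1..max_n), crediting each position only to the
    longest n-gram covering it, so lower-order n-grams embedded in a
    counted 4- or 5-gram are not counted a second time (illustrative)."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    i = 0
    while i < len(tokens):
        n = min(max_n, len(tokens) - i)
        counts[n][tuple(tokens[i:i + n])] += 1
        i += n  # skip past the counted span: its sub-n-grams are not re-counted
    return counts
```

On a 7-token string with max_n=5 this yields one 5-gram and one 2-gram, and no separately counted unigrams.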
Abstract | We adopt a representation of concepts alternative to n-grams and propose two concept—scoring functions based on semantic overlap. |
The summarization framework | To represent sentences and answers we adopted an alternative to classical n-grams that could be termed a bag-of-BEs. |
The summarization framework | Unlike n-grams, they vary in length and depend on parsing techniques, named entity detection, part-of-speech tagging, and the resolution of syntactic forms such as hyponyms, pronouns, pertainyms, abbreviations and synonyms. |
Automated Classification | To provide information related to term usage to the classifier, we extracted trigram and 4-gram features from the Web 1T Corpus (Brants and Franz, 2006), a large collection of n-grams and their counts created from approximately one trillion words of Web text. |
Automated Classification | Only n-grams containing lowercase words were used. |
Automated Classification | Only n-grams containing both terms (including plural forms) were extracted. |
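The two filters above (lowercase-only n-grams, and n-grams containing both terms including plural forms) can be sketched as a feature extractor over Web-1T-style count data; the function, the naive `+ "s"` pluralization, and the aggregation into a single total are illustrative assumptions, not the authors' exact pipeline:

```python
def usage_count_feature(ngram_counts, term_a, term_b):
    """Aggregate counts of n-grams that (i) contain only lowercase
    tokens and (ii) contain both terms, allowing a naive plural form.
    ngram_counts: dict mapping n-gram strings to corpus counts."""
    forms_a = {term_a, term_a + "s"}  # naive pluralization (assumption)
    forms_b = {term_b, term_b + "s"}
    total = 0
    for ngram, count in ngram_counts.items():
        tokens = ngram.split()
        if not all(t.islower() for t in tokens):
            continue  # filter: lowercase words only
        if forms_a & set(tokens) and forms_b & set(tokens):
            total += count  # filter: both terms (incl. plurals) present
    return total
```

N-grams with any capitalized token are discarded, matching the lowercase-only restriction.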
Background | where g_n(s) is the multi-set of all n-grams in a string s. In this definition, n-grams in e_i and {r_ij} are weighted by D_t(i). |
Background | If the i-th training sample has a larger weight, its n-grams contribute more to the overall score WBLEU(E, R). |
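The weighting scheme can be sketched as clipped n-gram matching per sample, scaled by that sample's weight D_t(i); this is a minimal sketch of the weighted matching inside WBLEU(E, R), not the full metric (brevity penalty and geometric mean over orders omitted):

```python
from collections import Counter

def weighted_match_count(hypotheses, references, weights, n=2):
    """Sum of clipped n-gram matches per sample, each scaled by the
    sample weight D_t(i): heavily weighted samples contribute more."""
    def ngrams(tokens):
        return Counter(zip(*[tokens[j:] for j in range(n)]))
    total = 0.0
    for hyp, ref, w in zip(hypotheses, references, weights):
        hyp_counts, ref_counts = ngrams(hyp), ngrams(ref)
        # clipped matching, as in standard BLEU n-gram precision
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total += w * matched
    return total
```

Doubling a sample's weight doubles its contribution, which is exactly the behavior the sentence above describes.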
Background | In this method, an n-gram cache is used to store the most frequently and recently accessed n-grams. |
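A minimal sketch of such a cache, assuming a plain LRU eviction policy (the cited method may additionally track access frequency):

```python
from collections import OrderedDict

class NgramCache:
    """LRU cache for n-gram lookups: keeps the most recently accessed
    n-grams and evicts the least recently used entry on overflow."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.store = OrderedDict()  # insertion order tracks recency

    def get(self, ngram):
        if ngram in self.store:
            self.store.move_to_end(ngram)  # mark as most recently used
            return self.store[ngram]
        return None  # cache miss: caller falls back to the full model

    def put(self, ngram, value):
        self.store[ngram] = value
        self.store.move_to_end(ngram)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
```

In a language-model setting, `value` would typically be a cached probability or count, so repeated lookups of frequent n-grams skip the expensive backing store.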