Index of papers in Proc. ACL 2011 that mention
  • n-grams
Bansal, Mohit and Klein, Dan
Introduction
Instead, we consider how to efficiently mine the Google n-grams corpus.
Introduction
This work uses a large web-scale corpus (Google n-grams) to compute features for the full parsing task.
Web-count Features
The approach of Lauer (1995), for example, would be to take an ambiguous noun sequence like hydrogen ion exchange and compare the various counts (or associated conditional probabilities) of n-grams like hydrogen ion and hydrogen exchange.
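A minimal sketch (not Lauer's or the paper's implementation) of the comparison described above: bracket an ambiguous noun compound by comparing the counts of the two competing sub-bigrams. The counts below are invented placeholders, not real Google n-gram frequencies.

```python
def bracket_noun_compound(w1, w2, w3, count):
    """Return 'left' for ((w1 w2) w3) or 'right' for (w1 (w2 w3))."""
    left_evidence = count((w1, w2))    # e.g. "hydrogen ion"
    right_evidence = count((w1, w3))   # e.g. "hydrogen exchange"
    return "left" if left_evidence >= right_evidence else "right"

toy_counts = {("hydrogen", "ion"): 45000, ("hydrogen", "exchange"): 7000}
print(bracket_noun_compound("hydrogen", "ion", "exchange",
                            lambda ng: toy_counts.get(ng, 0)))   # -> left
```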
Web-count Features
(2010) use web-scale n-grams to compute similar association statistics for longer sequences of nouns.
Web-count Features
These paraphrase features hint at the correct attachment decision by looking for web n-grams with special contexts that reveal syntax superficially.
Working with Web n-Grams
Rather than working through a search API (or scraper), we use an offline web corpus, the Google n-gram corpus (Brants and Franz, 2006), which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams
In particular, we only use counts of n-grams of the form x * y where the gap length g ≤ 3.
Working with Web n-Grams
Next, we exploit a simple trie-based hashing algorithm to answer all of them efficiently in one pass over the n-grams corpus.
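For intuition, here is a minimal single-pass sketch of the batching idea; the paper's trie-based hashing is replaced by a simple linear match over the query list, and a tab-separated "n-gram<TAB>count" file format is assumed.

```python
from collections import defaultdict

def answer_queries(ngram_file, queries):
    """Accumulate a count for each wildcard query ('*' = any single word)."""
    by_len = defaultdict(list)
    for q in queries:
        by_len[len(q)].append(tuple(q))
    totals = defaultdict(int)
    with open(ngram_file, encoding="utf-8") as f:
        for line in f:
            ngram_str, count_str = line.rstrip("\n").split("\t")
            ngram = tuple(ngram_str.split(" "))
            for q in by_len.get(len(ngram), []):
                if all(qw in ("*", w) for qw, w in zip(q, ngram)):
                    totals[q] += int(count_str)
    return totals

# e.g. counts of "hydrogen <gap> exchange" with a one-word gap:
# answer_queries("3gms.txt", [("hydrogen", "*", "exchange")])
```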
n-grams is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Our approach first identifies statistically salient phrases of words and parts of speech — known as n-grams — in training texts generated in conditions where the social power
Abstract
Then, we apply machine learning to train classifiers with groups of these n-grams as features.
Abstract
Unlike their works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, and other measures of statistical salience in training data.
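A rough, hypothetical sketch of this kind of pipeline: frequency stands in for the paper's statistical-salience measures, and only word n-grams are shown.

```python
from collections import Counter

def word_ngrams(tokens, n_max=3):
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def build_vocab(texts, top_k=1000):
    counts = Counter(g for t in texts for g in word_ngrams(t.split()))
    return [g for g, _ in counts.most_common(top_k)]

def featurize(text, vocab):
    grams = Counter(word_ngrams(text.split()))
    return [grams[g] for g in vocab]   # count vector for a classifier

train = ["please send the report now", "would you mind sending the report"]
vocab = build_vocab(train, top_k=20)
print(featurize("send the report now", vocab))
```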
n-grams is mentioned in 29 sentences in this paper.
Topics mentioned in this paper:
Escalante, Hugo Jair and Solorio, Thamar and Montes-y-Gomez, Manuel
Abstract
This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA).
Abstract
In this work we explore the suitability of LHs over n-grams at the character-level for AA.
Abstract
We report experimental results in AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state-of-the-art approaches.
Introduction
In particular, we consider local histograms over n-grams at the character-level obtained via the locally-weighted bag of words (LOWBOW) framework (Lebanon et al., 2007).
Introduction
Results confirm that local histograms of character n-grams are more helpful for AA than the usual global histograms of words or character n-grams (Luyckx and Daelemans, 2010); our results are superior to those reported in related works.
Introduction
We also show that local histograms over character n-grams are more helpful than local histograms over words, as originally proposed by Lebanon et al. (2007).
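A simplified sketch of local histograms over character n-grams in the spirit of LOWBOW: character 3-gram counts are accumulated with a Gaussian weight centred at a few positions in the document instead of one global histogram. The kernel width, number of locations, and 3-gram choice are illustrative, not the paper's settings.

```python
import math
from collections import defaultdict

def char_ngrams(text, n=3):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def local_histograms(text, n=3, locations=(0.25, 0.5, 0.75), sigma=0.15):
    grams = char_ngrams(text, n)
    histograms = []
    for mu in locations:
        hist = defaultdict(float)
        for i, g in enumerate(grams):
            pos = i / max(len(grams) - 1, 1)          # position in [0, 1]
            w = math.exp(-((pos - mu) ** 2) / (2 * sigma ** 2))
            hist[g] += w
        histograms.append(dict(hist))
    return histograms

hists = local_histograms("the quick brown fox jumps over the lazy dog")
print(len(hists), sorted(hists[1].items(), key=lambda kv: -kv[1])[:3])
```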
Related Work
Some researchers have gone a step further and have attempted to capture sequential information by using n-grams at the word-level (Peng et al., 2004) or by discovering maximal frequent word sequences (Coyotl-Morales et al., 2006).
Related Work
Unfortunately, because of computational limitations, the latter methods cannot discover enough sequential information from documents (e.g., word n-grams are often restricted to n ∈ {1, 2, 3}, while full sequential information would be obtained with n ∈ {1, …
Related Work
Stamatatos and coworkers have studied the impact of feature selection, with character n-grams, in AA (Houvardas and Stamatatos, 2006; Stamatatos, 2006a), ensemble learning with character n-grams (Stamatatos, 2006b) and novel classification techniques based
n-grams is mentioned in 26 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Abstract
Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques.
Introduction
The largest language models (LMs) can contain as many as several hundred billion n-grams (Brants et al., 2007), so storage is a challenge.
Introduction
Overall, we are able to store the 4 billion n-grams of the Google Web1T (Brants and Franz, 2006) corpus
Language Model Implementations
Lookup is linear in the length of the key and logarithmic in the number of n-grams.
Preliminaries
This corpus, which is on the large end of corpora typically employed in language modeling, is a collection of nearly 4 billion n-grams extracted from over a trillion tokens of English text, and has a vocabulary of about 13.5 million words.
Preliminaries
Tries represent collections of n-grams using a tree.
Preliminaries
Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection.
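The lookup cost quoted earlier (linear in key length, logarithmic in the number of n-grams) can be seen in a toy trie with sorted children. This plain-Python sketch is not the paper's compressed, array-based encoding.

```python
from bisect import bisect_left

class TrieNode:
    def __init__(self):
        self.words = []      # sorted child words
        self.children = []   # child nodes, parallel to self.words
        self.count = None    # count if an n-gram ends here

def insert(root, ngram, count):
    node = root
    for w in ngram:
        i = bisect_left(node.words, w)
        if i == len(node.words) or node.words[i] != w:
            node.words.insert(i, w)
            node.children.insert(i, TrieNode())
        node = node.children[i]
    node.count = count

def lookup(root, ngram):
    node = root
    for w in ngram:                      # one step per word in the key
        i = bisect_left(node.words, w)   # binary search among children
        if i == len(node.words) or node.words[i] != w:
            return None
        node = node.children[i]
    return node.count

root = TrieNode()
insert(root, ("the", "quick", "brown"), 12345)
print(lookup(root, ("the", "quick", "brown")))   # 12345
```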
n-grams is mentioned in 20 sentences in this paper.
Topics mentioned in this paper:
Habash, Nizar and Roth, Ryan
Experimental Settings
nw N-grams   Normword 1/2/3-gram probabilities
lem N-grams  Lemma 1/2/3-gram probabilities
pos N-grams  POS 1/2/3-gram probabilities
Experimental Settings
served for word N-grams which did not appear in the models).
Experimental Settings
Like the N-grams, this number is binned; in this case there are 11 bins, with 10 spread evenly over the [0,1) range, and an extra bin for values of exactly 1 (i.e., when the word appears in every hypothesis in the set).
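A small sketch of that binning scheme, assuming feature values in [0, 1]: ten equal-width bins over [0, 1) plus a dedicated bin for values of exactly 1.

```python
def bin_feature(value):
    """Map a value in [0, 1] to one of 11 bins (0-9 for [0, 1), 10 for 1.0)."""
    if value == 1.0:
        return 10
    return int(value * 10)   # 0.0-0.099... -> 0, ..., 0.9-0.99... -> 9

print(bin_feature(0.0), bin_feature(0.37), bin_feature(0.999), bin_feature(1.0))
# -> 0 3 9 10
```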
Results
First, Table 4 shows the effect of adding nw N-grams of successively higher orders to the word baseline.
Results
word+nw 1-gram                 49.51  12.9
word+nw 1-gram+nw 2-gram       59.26  35.2
word+nw N-grams                59.33  35.3
+pos                           58.50  33.4
+pos N-grams                   57.35  30.8
+lem+lem N-grams               59.63  36.0
+lem+lem N-grams+na            59.93  36.7
+lem+lem N-grams+na+nw         59.77  36.3
+lem                           60.92  38.9
+lem+na                        60.47  37.9
+lem+lem N-grams               60.44  37.9
Results
Here, the best performer is the model which utilizes the word, nw N-grams,
n-grams is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Conclusion
number of parameters (unique N-grams).
Experiments
[Figure: x-axis shows the number of unique N-grams, from 1e4 to 1e9]
Introduction
Another is a web-scale N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006); we call it Google V1 in this paper.
Related Work
The former uses the web-scale data explicitly to create more data for training the model; while the latter explores the web-scale N-grams data (Lin et al., 2010) for compound bracketing disambiguation.
Related Work
However, we explore the web-scale data for dependency parsing; the performance improves log-linearly with the number of parameters (unique N-grams).
Web-Derived Selectional Preference Features
N-grams appearing 40 times or more are retained.
Web-Derived Selectional Preference Features
In this paper, the selectional preferences have the same meaning as N-grams, which model the word-to-word relationships rather than only considering predicate-argument relationships.
Web-Derived Selectional Preference Features
All n-grams with lower counts are discarded.
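Minimal illustration of this cutoff; the dictionary of counts is a toy stand-in for the corpus.

```python
THRESHOLD = 40   # Google V1 frequency cutoff for n-grams

def apply_cutoff(ngram_counts, threshold=THRESHOLD):
    """Keep only n-grams whose corpus count reaches the threshold."""
    return {g: c for g, c in ngram_counts.items() if c >= threshold}

print(apply_cutoff({("web", "scale"): 120, ("rare", "pair"): 7}))
# -> {('web', 'scale'): 120}
```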
n-grams is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Abstract
The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Conclusion
As far as we know, this is the first work to build a complex large-scale distributed language model with a principled approach that is more powerful than n-grams when both are trained on a very large corpus with up to a billion tokens.
Experimental results
The composite n-gram/m-SLM/PLSA model gives significant perplexity reductions over baseline n-grams (n = 3, 4, 5) and m-SLMs (m = 2, 3, 4).
Experimental results
Also, in the same study (Charniak, 2003), they found that the outputs produced using the n-grams received higher scores from BLEU; ours did not.
Introduction
We conduct comprehensive experiments on corpora with 44 million tokens, 230 million tokens, and 1.3 billion tokens and compare perplexity results with n-grams (n = 3, 4, 5 respectively) on these three corpora; we obtain drastic perplexity reductions.
Training algorithm
Where #(g, W_l, G_l, d) is the count of semantic content g in semantic annotation string G_l of the l-th sentence W_l in document d, #(w_{-n+1}^{-1} w h_{-m}^{-1} g, W_l, T_l, G_l, d) is the count of an n-gram, its m most recent exposed headwords, and semantic content g in parse T_l and semantic annotation string G_l of the l-th sentence W_l in document d, #(t t h h tag, W_l, T_l, d) is the count
Training algorithm
The topic of large scale distributed language models is relatively new, and existing works are restricted to n-grams only (Brants et al., 2007; Emami et al., 2007; Zhang et al., 2006).
n-grams is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Kiritchenko, Svetlana and Cherry, Colin
Experiments
The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams, dictionary matches, ConText output, and symptom/disease se-
Method
The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by
Method
We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete.
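A hedged sketch of such padded code-history features; the feature-string format, the BOD symbol, and the placeholder codes are illustrative choices, not the paper's exact representation.

```python
BOD = "<BOD>"   # beginning-of-document padding token

def transition_features(present_codes, max_n=10):
    """Indicator features over n-grams of the codes labelled present so far."""
    feats = []
    history = [BOD] * (max_n - 1) + list(present_codes)
    for n in range(1, max_n + 1):
        ngram = history[len(history) - n:]
        feats.append("prev_codes_%d=%s" % (n, "|".join(ngram)))
    return feats

print(transition_features(["C1", "C2"], max_n=4))
# ['prev_codes_1=C2', 'prev_codes_2=C1|C2',
#  'prev_codes_3=<BOD>|C1|C2', 'prev_codes_4=<BOD>|<BOD>|C1|C2']
```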
Method
Note that the n-gram features are only code-specific, as they are not connected to any specific trigger term.
Related work
Statistical systems trained on only text-derived features (such as n-grams) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007).
n-grams is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Schütze, Hinrich
Experimental Setup
An important parameter of the class-based model is the size of the base set, i.e., the total number of n-grams (or rather i-grams) to be clustered.
Polynomial discounting
We could add a constant to d, but one of the basic premises of the KN model, derived from the assumption that n-gram marginals should be equal to relative frequencies, is that the discount is larger for more frequent n-grams, although in many implementations of KN only the cases c = 1, c = 2, and c ≥ 3 are distinguished.
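For concreteness, a sketch of the three-case, count-dependent discount used in modified Kneser-Ney implementations; the numeric discounts below are placeholders rather than estimated values.

```python
def kn_discount(count, d1=0.5, d2=0.75, d3plus=0.9):
    """Count-dependent discount: the three cases c = 1, c = 2, c >= 3.

    In practice the discounts are estimated from the count-of-counts, e.g.
    D1 = 1 - 2*Y*n2/n1 with Y = n1 / (n1 + 2*n2); the defaults here are
    placeholders.
    """
    if count == 0:
        return 0.0
    if count == 1:
        return d1
    if count == 2:
        return d2
    return d3plus

for c in (0, 1, 2, 5):
    print(c, max(c - kn_discount(c), 0.0))   # discounted counts
```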
Related work
probability of a class on n-grams of lexical items (as opposed to classes) (Whittaker and Woodland, 2001; Emami and Jelinek, 2005; Uszkoreit and Brants, 2008).
Related work
Models that condition classes on lexical n-grams could be extended in a way similar to what we propose here.
Related work
Our use of classes of lexical n-grams for n > 1 has several precedents in the literature (Suhm and Waibel, 1994; Kuo and Reichl, 1999; Deligne and Sagisaka, 2000; Justo and Torres, 2009).
n-grams is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chen, David and Dolan, William
Paraphrase Evaluation Metrics
We introduce a new scoring metric PINC that measures how many n-grams differ between the two sentences.
Paraphrase Evaluation Metrics
where n-gram_s and n-gram_c are the lists of n-grams in the source and candidate sentences, respectively.
Paraphrase Evaluation Metrics
The PINC score computes the percentage of n-grams that appear in the candidate sentence but not in the source sentence.
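A direct sketch of that computation: for each n up to N, take the fraction of the candidate's n-grams that also appear in the source, and average one minus that fraction over n. Set overlap is used here; how the paper's exact definition handles duplicate n-grams is not shown in this excerpt.

```python
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            continue
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

print(pinc("a man is playing a guitar", "someone strums a guitar"))
```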
n-grams is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Jenny and Haghighi, Aria
Experiments
To collect the statistics, we take each NP in the training data and consider all possible 2-grams through 5-grams that are present in the NP’s modifier sequence, allowing for nonconsecutive n-grams.
Model
Previous counting approaches can be expressed as a real-valued feature that, given all n-grams generated by a permutation of modifiers, returns the count of all these n-grams in the original training data.
Model
We might also expect permutations that contain n-grams previously seen in the training data to be more natural sounding than other permutations that generate n-grams that have not been seen before.
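A sketch of extracting those nonconsecutive n-grams, here taken to mean all order-preserving subsequences of length 2-5 of the modifier sequence, which can then be counted over the training data.

```python
from itertools import combinations

def nonconsecutive_ngrams(modifiers, min_n=2, max_n=5):
    out = []
    for n in range(min_n, min(max_n, len(modifiers)) + 1):
        out.extend(combinations(modifiers, n))   # preserves original order
    return out

print(nonconsecutive_ngrams(["big", "old", "red", "wooden"]))
```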
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: