Introduction | Instead, we consider how to efficiently mine the Google n-grams corpus. |
Introduction | This work uses a large web-scale corpus (Google n-grams) to compute features for the full parsing task. |
Web-count Features | The approach of Lauer (1995), for example, would be to take an ambiguous noun sequence like hydrogen ion exchange and compare the various counts (or associated conditional probabilities) of n-grams like hydrogen ion and hydrogen exchange. |
Web-count Features | (2010) use web-scale n-grams to compute similar association statistics for longer sequences of nouns. |
Web-count Features | These paraphrase features hint at the correct attachment decision by looking for web n-grams in special contexts whose surface form reveals the underlying syntax. |
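To make the count comparison concrete, here is a minimal sketch, assuming a hypothetical `count` lookup into an n-gram table, of the Lauer-style test from the example above: bracket "hydrogen ion exchange" by comparing the counts of "hydrogen ion" and "hydrogen exchange".

```python
def bracket(w1, w2, w3, count):
    """Bracket the compound (w1 w2 w3): 'left' = ((w1 w2) w3),
    'right' = (w1 (w2 w3))."""
    left_evidence = count((w1, w2))    # e.g. count of "hydrogen ion"
    right_evidence = count((w1, w3))   # e.g. count of "hydrogen exchange"
    return "left" if left_evidence >= right_evidence else "right"

counts = {("hydrogen", "ion"): 255, ("hydrogen", "exchange"): 90}  # toy counts
print(bracket("hydrogen", "ion", "exchange", lambda g: counts.get(g, 0)))  # left
```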
Working with Web n-Grams | Rather than working through a search API (or scraper), we use an offline web corpus, the Google n-gram corpus (Brants and Franz, 2006), which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences. |
Working with Web n-Grams | In particular, we only use counts of n-grams of the form x * y, where * is a wildcard gap whose length g satisfies g <= 3. |
Working with Web n-Grams | Next, we exploit a simple trie-based hashing algorithm to answer all of these queries efficiently in one pass over the n-grams corpus. |
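A minimal sketch of the one-pass batch lookup, assuming a Web1T-style file with one "w1 ... wn<TAB>count" record per line; a plain hash table stands in here for the trie described above, but the single-pass structure is the same.

```python
def batch_counts(queries, ngram_file):
    """queries: iterable of token tuples; returns {query: count}."""
    wanted = {tuple(q): 0 for q in queries}   # all queries hashed up front
    with open(ngram_file, encoding="utf-8") as f:
        for line in f:                        # single pass over the corpus
            ngram, sep, count = line.rpartition("\t")
            if not sep:
                continue                      # skip malformed lines
            key = tuple(ngram.split())
            if key in wanted:                 # O(1) membership test per record
                wanted[key] = int(count)
    return wanted
```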
Abstract | Our approach first identifies statistically salient phrases of words and parts of speech (known as n-grams) in training texts generated in conditions where the social power relationship between the participants is known. |
Abstract | Then, we apply machine learning to train classifiers with groups of these n-grams as features. |
Abstract | Unlike those works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, as well as other measures of statistical salience in the training data. |
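As an illustration only (not the authors' pipeline), here is a minimal scikit-learn sketch of training a classifier on word n-gram frequency features; POS-tag n-grams could be added by running a second vectorizer over tag sequences. The toy texts and labels are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["please send me the report today", "would you mind sending it over"]
labels = [1, 0]  # hypothetical social-power labels for the toy examples

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),  # word 1/2/3-gram frequency features
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["send the report now"]))
```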
Abstract | This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA). |
Abstract | In this work we explore the suitability of LHs over n-grams at the character-level for AA. |
Abstract | We report experimental results on AA data sets confirming that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state-of-the-art approaches. |
Introduction | In particular, we consider local histograms over n-grams at the character-level obtained via the locally-weighted bag of words (LOWBOW) framework (Lebanon et al., 2007). |
Introduction | Results confirm that local histograms of character n-grams are more helpful for AA than the usual global histograms of words or character n-grams (Luyckx and Daelemans, 2010); our results are superior to those reported in related works. |
Introduction | We also show that local histograms over character n-grams are more helpful than local histograms over words, as originally proposed by Lebanon et al. (2007). |
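A minimal sketch of the local-histogram idea, assuming Gaussian position weighting in the spirit of the LOWBOW framework; the parameters k, sigma, and the kernel choice here are illustrative assumptions, not the paper's settings.

```python
import math
from collections import Counter

def local_histograms(text, n=3, k=4, sigma=0.2):
    """Return k histograms of character n-grams, each weighted by a
    Gaussian kernel centered at a different normalized document position."""
    num = max(len(text) - n + 1, 1)
    centers = [(i + 0.5) / k for i in range(k)]
    hists = []
    for mu in centers:
        h = Counter()
        for pos in range(num):
            t = pos / num                                    # position in [0, 1)
            w = math.exp(-((t - mu) ** 2) / (2 * sigma**2))  # kernel weight
            h[text[pos:pos + n]] += w
        hists.append(h)
    return hists  # k position-sensitive histograms instead of one global one

hists = local_histograms("the quick brown fox jumps over the lazy dog")
```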
Related Work | Some researchers have gone a step further and have attempted to capture sequential information by using n-grams at the word-level (Peng et al., 2004) or by discovering maximal frequent word sequences (Coyotl-Morales et al., 2006). |
Related Work | Unfortunately, because of computational limitations, the latter methods cannot discover enough sequential information from documents (e.g., word n-grams are often restricted to n ∈ {1, 2, 3}, while full sequential information would require n ∈ {1, ..., L}, with L the length of the document). |
Related Work | Stamatatos and coworkers have studied the impact of feature selection with character n-grams in AA (Houvardas and Stamatatos, 2006; Stamatatos, 2006a), ensemble learning with character n-grams (Stamatatos, 2006b), and novel classification techniques based on character n-grams. |
Abstract | Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. |
Introduction | The largest language models (LMs) can contain as many as several hundred billion n-grams (Brants et al., 2007), so storage is a challenge. |
Introduction | Overall, we are able to store the 4 billion n-grams of the Google Web1T (Brants and Franz, 2006) corpus in 23 bits per n-gram. |
Language Model Implementations | Lookup is linear in the length of the key and logarithmic in the number of n-grams. |
Preliminaries | This corpus, which is on the large end of corpora typically employed in language modeling, is a collection of nearly 4 billion n-grams extracted from over a trillion tokens of English text, and has a vocabulary of about 13.5 million words. |
Preliminaries | Tries represent collections of n-grams using a tree. |
Preliminaries | Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection. |
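A minimal sketch of one level of such a trie, stored as a sorted array and searched with binary search; the record layout and names are illustrative assumptions, not the paper's actual encoding.

```python
import bisect

class TrieLevel:
    """One level of an n-gram trie stored as a sorted array of
    (context_id, word_id) keys. Looking up a full n-gram walks one level
    per word, so lookup is linear in key length and logarithmic (per
    level) in the number of stored n-grams."""

    def __init__(self, records):
        # records must be pre-sorted by (context_id, word_id)
        self.keys = [(ctx, w) for ctx, w, _ in records]
        self.counts = [c for _, _, c in records]

    def lookup(self, context_id, word_id):
        i = bisect.bisect_left(self.keys, (context_id, word_id))  # O(log N)
        if i < len(self.keys) and self.keys[i] == (context_id, word_id):
            return i, self.counts[i]  # i acts as the node id for the next level
        return None
```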
Experimental Settings | nw N-grams: Normword (normalized word) 1/2/3-gram probabilities; lem N-grams: lemma 1/2/3-gram probabilities; pos N-grams: POS 1/2/3-gram probabilities |
Experimental Settings | (an extra bin is reserved for word N-grams which did not appear in the models). |
Experimental Settings | Like the N-grams, this number is binned; in this case there are 11 bins, with 10 spread evenly over the [0, 1) range, and an extra bin for values of exactly 1 (i.e., when the word appears in every hypothesis in the set). |
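A direct sketch of the binning rule just described: 10 even bins over [0, 1) plus an 11th bin reserved for values of exactly 1.

```python
def bin_value(x):
    """Map a value in [0, 1] to one of 11 bins."""
    if x == 1.0:
        return 10        # extra bin: the word appears in every hypothesis
    return int(x * 10)   # bins 0-9, each covering a 0.1-wide slice of [0, 1)
```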
Results | First, Table 4 shows the effect of adding nw N-grams of successively higher orders to the word baseline. |
Results | word+nw 1-gram: 49.51, 12.9; word+nw 1-gram+nw 2-gram: 59.26, 35.2; word+nw N-grams: 59.33, 35.3; +pos: 58.50, 33.4; +pos N-grams: 57.35, 30.8; +lem+lem N-grams: 59.63, 36.0; +lem+lem N-grams+na: 59.93, 36.7; +lem+lem N-grams+na+nw: 59.77, 36.3; +lem: 60.92, 38.9; +lem+na: 60.47, 37.9; +lem+lem N-grams: 60.44, 37.9 |
Results | Here, the best performer is the model which utilizes the word, nw N-gram, and lemma features. |
Conclusion | The performance improves log-linearly with the number of parameters (unique N-grams). |
Experiments | [Figure: results as a function of the Number of Unique N-grams; x-axis ranges from 1e4 to 1e9 (log scale).] |
Introduction | Another is a web-scale N-gram corpus containing N-grams of length 1-5 (Brants and Franz, 2006), which we call Google V1 in this paper. |
Related Work | The former uses the web-scale data explicitly to create more data for training the model; while the latter explores the web-scale N-grams data (Lin et al., 2010) for compound bracketing disambiguation. |
Related Work | However, we explore the web-scale data for dependency parsing, where performance improves log-linearly with the number of parameters (unique N-grams). |
Web-Derived Selectional Preference Features | N-grams appearing at least 40 times are kept. |
Web-Derived Selectional Preference Features | In this paper, selectional preferences carry the same meaning as N-grams: they model word-to-word relationships, rather than only predicate-argument relationships. |
Web-Derived Selectional Preference Features | All n-grams with lower counts are discarded. |
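As an illustration of how such web counts can feed a parser, here is a hedged sketch in which the selectional preference feature for a candidate head-modifier pair is its binned log-count; `web_count` is a hypothetical lookup into the threshold-filtered n-gram table.

```python
import math

def preference_feature(head, modifier, web_count):
    """Return an indicator-feature name for a candidate dependency arc."""
    c = web_count((head, modifier))
    if c == 0:
        return "pair-unseen"
    return "pair-logcount=%d" % int(math.log10(c))  # bin by order of magnitude
```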
Abstract | The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality, measured by the BLEU score and "readability", when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system. |
Conclusion | As far as we know, this is the first work to build a complex large-scale distributed language model, using a principled approach, that is more powerful than n-grams when both are trained on a very large corpus of up to a billion tokens. |
Experimental results | The composite n-gram/m-SLM/PLSA model gives significant perplexity reductions over baseline n-grams (n = 3, 4, 5) and m-SLMs (m = 2, 3, 4). |
Experimental results | Also, in the same study (Charniak, 2003), they found that the outputs produced using the n-grams received higher BLEU scores; ours did not. |
Introduction | We conduct comprehensive experiments on corpora with 44 million, 230 million, and 1.3 billion tokens, and compare perplexity results with n-grams (n = 3, 4, 5 respectively) on these three corpora, obtaining drastic perplexity reductions. |
Training algorithm | where $\#(g, W^l, G^l, d)$ is the count of semantic content $g$ in the semantic annotation string $G^l$ of the $l$th sentence $W^l$ in document $d$, $\#(w_{-n+1}^{-1} w\, h_{-m}^{-1}\, g, W^l, T^l, G^l, d)$ is the count of the n-gram $w_{-n+1}^{-1} w$ together with its $m$ most recent exposed headwords $h_{-m}^{-1}$ and semantic content $g$ in the parse $T^l$ and semantic annotation string $G^l$ of the $l$th sentence $W^l$ in document $d$, and $\#(t\, h\, \mathrm{tag}, W^l, T^l, d)$ is the corresponding count of a tag with its headword context in the parse $T^l$. |
Training algorithm | The topic of large scale distributed language models is relatively new, and existing works are restricted to n-grams only (Brants et al., 2007; Emami et al., 2007; Zhang et al., 2006). |
Experiments | The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams , dictionary matches, ConText output, and symptom/disease se- |
Method | The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by |
Method | We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete. |
Method | Note that the n-gram features are only code-specific, as they are not connected to any specific trigger term. |
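A small sketch of these transition features under the scheme described above: indicators over the n most recently labelled present codes, padded with beginning-of-document tokens when fewer than n codes exist (and, per the text, with no end-of-document tag). The feature-name format is an assumption.

```python
def transition_features(present_codes, max_n=10):
    """Indicator feature strings over n-grams of already-present codes."""
    feats = []
    for n in range(1, max_n + 1):
        hist = present_codes[-n:]
        if len(hist) < n:                       # pad short histories
            hist = ["<BOD>"] * (n - len(hist)) + hist
        feats.append("codes=" + "_".join(hist))
    return feats

# e.g. transition_features(["428.0", "518.81"], max_n=3)
# -> ['codes=518.81', 'codes=428.0_518.81', 'codes=<BOD>_428.0_518.81']
```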
Related work | Statistical systems trained on only text-derived features (such as n-grams ) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007). |
Experimental Setup | An important parameter of the class-based model is the size of the base set, i.e., the total number of n-grams (or rather i-grams) to be clustered. |
Polynomial discounting | We could add a constant to d, but one of the basic premises of the KN model, derived from the assumption that n-gram marginals should be equal to relative frequencies, is that the discount is larger for more frequent n-grams, although in many implementations of KN only the cases c = 1, c = 2, and c ≥ 3 are distinguished. |
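For reference, a sketch of the standard modified Kneser-Ney discounts (Chen and Goodman, 1998) that distinguish exactly these three cases; this is the usual baseline, not the polynomial discounting proposed here. Below, $n_k$ denotes the number of n-grams occurring exactly $k$ times.

```latex
% Modified Kneser-Ney discounts with the cases c = 1, c = 2, c >= 3:
\[
D(c) =
\begin{cases}
0      & c = 0 \\
D_1    & c = 1 \\
D_2    & c = 2 \\
D_{3+} & c \ge 3
\end{cases}
\qquad
Y = \frac{n_1}{n_1 + 2 n_2}, \quad
D_1 = 1 - 2Y\frac{n_2}{n_1}, \quad
D_2 = 2 - 3Y\frac{n_3}{n_2}, \quad
D_{3+} = 3 - 4Y\frac{n_4}{n_3}
\]
```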
Related work | There is prior work conditioning the probability of a class on n-grams of lexical items (as opposed to classes) (Whittaker and Woodland, 2001; Emami and Jelinek, 2005; Uszkoreit and Brants, 2008). |
Related work | Models that condition classes on lexical n-grams could be extended in a way similar to what we propose here. |
Related work | Our use of classes of lexical n-grams for n > 1 has several precedents in the literature (Suhm and Waibel, 1994; Kuo and Reichl, 1999; Deligne and Sagisaka, 2000; Justo and Torres, 2009). |
Paraphrase Evaluation Metrics | We introduce a new scoring metric, PINC, which measures how many n-grams differ between the two sentences. |
Paraphrase Evaluation Metrics | $\mathrm{PINC}(s, c) = \frac{1}{N}\sum_{n=1}^{N}\left(1 - \frac{|\text{n-gram}_s \cap \text{n-gram}_c|}{|\text{n-gram}_c|}\right)$, where $N$ is the maximum n-gram size considered and n-gram$_s$ and n-gram$_c$ are the lists of n-grams in the source and candidate sentences, respectively. |
Paraphrase Evaluation Metrics | The PINC score computes the percentage of n-grams that appear in the candidate sentence but not in the source sentence. |
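A direct implementation sketch of PINC as just described; setting the maximum n-gram size to 4 here is an assumption.

```python
def pinc(source_tokens, candidate_tokens, max_n=4):
    """Average, over n = 1..max_n, of the fraction of candidate n-grams
    that do NOT appear in the source."""
    scores = []
    for n in range(1, max_n + 1):
        src = {tuple(source_tokens[i:i + n])
               for i in range(len(source_tokens) - n + 1)}
        cand = [tuple(candidate_tokens[i:i + n])
                for i in range(len(candidate_tokens) - n + 1)]
        if not cand:
            continue  # candidate shorter than n
        kept = sum(1 for g in cand if g in src) / len(cand)
        scores.append(1.0 - kept)
    return sum(scores) / len(scores) if scores else 0.0

# Identical sentences score 0; a fully rewritten sentence scores close to 1.
```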
Experiments | To collect the statistics, we take each NP in the training data and consider all possible 2-grams through 5-grams present in the NP's modifier sequence, allowing for non-consecutive n-grams. |
Model | Previous counting approaches can be expressed as a real-valued feature that, given all n-grams generated by a permutation of modifiers, returns the count of all these n-grams in the original training data. |
Model | We might also expect permutations that contain n-grams previously seen in the training data to be more natural sounding than other permutations that generate n-grams that have not been seen before. |
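A sketch of this counting feature, assuming the statistics described above (all 2- to 5-token subsequences, consecutive or not, of each training NP's modifier sequence); the toy training data is illustrative.

```python
from collections import Counter
from itertools import combinations

def subsequences(mods, lo=2, hi=5):
    """All order-preserving 2- to 5-token subsequences (gaps allowed)."""
    for n in range(lo, min(hi, len(mods)) + 1):
        for idx in combinations(range(len(mods)), n):
            yield tuple(mods[i] for i in idx)

train_counts = Counter()
for mods in [["big", "old", "red"]]:   # toy training modifier sequences
    train_counts.update(subsequences(mods))

def count_feature(permutation):
    """Real-valued feature: total training count of the permutation's
    (possibly non-consecutive) n-grams."""
    return sum(train_counts[g] for g in subsequences(list(permutation)))

print(count_feature(["big", "old", "red"]))  # seen ordering scores higher
print(count_feature(["red", "old", "big"]))  # unseen ordering scores 0
```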