Error Classification | Additionally, since there are numerous word n-grams , some infrequent ones may just by chance only occur in positive training set instances, causing the learner to think they indicate the positive class when they do not. |
Error Classification | For each essay, Aw+_i counts the number of word n-grams we believe indicate that an essay is a positive example of error e_i, and Aw−_i counts the number of word n-grams we believe indicate an essay is not an example of e_i. |
Error Classification | Aw+ n-grams for the Missing Details error tend to include phrases like “there is something” or “this statement”, while Aw− n-grams are often words taken directly from an essay’s prompt.
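The Aw+/Aw− counts described above can be sketched as follows (a minimal illustration; the function name and the indicative word lists are hypothetical, not the authors' code):

```python
def ngrams(tokens, n):
    """All word n-grams of order n as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_indicative(essay_tokens, positive_set, negative_set, max_n=3):
    """Count n-grams believed to indicate (Aw+) or contra-indicate (Aw-)
    a given error class, over orders 1..max_n."""
    aw_pos = aw_neg = 0
    for n in range(1, max_n + 1):
        for g in ngrams(essay_tokens, n):
            if g in positive_set:
                aw_pos += 1
            elif g in negative_set:
                aw_neg += 1
    return aw_pos, aw_neg
```

The two counts then serve directly as per-essay feature values.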
Evaluation | We see that the thesis clarity score-predicting variation of the Baseline system, which employs as features only word n-grams and random indexing features, predicts the wrong score 65.8% of the time.
Model | Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary. |
Model | Instead of using all n-grams, a relevance based ranking method is proposed to select a subset of highly relevant n-grams for model building (details in §4). |
Model | For notational convenience, we use terms to denote both words (unigrams) and phrases ( n-grams ). |
Phrase Ranking based on Relevance | We now detail our method of preprocessing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building. |
Phrase Ranking based on Relevance | A large number of irrelevant n-grams slow inference. |
Phrase Ranking based on Relevance | This method, however, is computationally expensive and is limited when applied to arbitrary-length n-grams.
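The paper's own relevance measure is detailed in its §4; as a stand-in, the selection step can be sketched with a simple log frequency-ratio score against a background collection (the smoothing constant here is an arbitrary assumption):

```python
import math

def rank_ngrams_by_relevance(ngram_counts_domain, ngram_counts_background, top_k):
    """Rank candidate n-grams by a simple relevance score: the log ratio of
    their relative frequency in the target collection vs. a background
    collection, keeping only the top_k most relevant."""
    total_d = sum(ngram_counts_domain.values())
    total_b = sum(ngram_counts_background.values())

    def score(g):
        p_d = ngram_counts_domain[g] / total_d
        # add-one smoothing so unseen background n-grams do not divide by zero
        p_b = (ngram_counts_background.get(g, 0) + 1) / (total_b + len(ngram_counts_background) + 1)
        return math.log(p_d / p_b)

    return sorted(ngram_counts_domain, key=score, reverse=True)[:top_k]
```

Only the selected subset is then used for model building, which keeps inference fast.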
Abstract | Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams . |
Introduction | At the word level, 96% of the simple words are found in the normal corpus, and even for n-grams as large as 5, more than half of the n-grams can be found in the normal text.
Introduction | This extra information may help with data sparsity, providing better estimates for rare and unseen n-grams . |
Introduction | On the other hand, there is still only modest overlap between the sentences for longer n-grams , particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical. |
Why Does Unsimplified Data Help? | 6.1 More n-grams |
Why Does Unsimplified Data Help? | Table 3: Proportion of n-grams in the test sets that occur in the simple and normal training data sets. |
Why Does Unsimplified Data Help? | We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency. |
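The coverage proportions referred to in Table 3 can be computed with a short routine like the following (an illustrative sketch, not the authors' evaluation code):

```python
def ngram_coverage(test_sents, train_sents, n):
    """Proportion of n-gram tokens in the test sentences that were seen
    at least once in the training sentences."""
    def grams(sents):
        out = []
        for s in sents:
            toks = s.split()
            out += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        return out

    seen = set(grams(train_sents))
    test = grams(test_sents)
    if not test:
        return 0.0
    return sum(g in seen for g in test) / len(test)
```

Running this for n = 1..5 against both the simple and the normal training sets reproduces the kind of comparison the table reports.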
Abstract | We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Introduction | Parameters that are stored in the model are retrieved without error; however, false positives may occur whereby n-grams not in the model are incorrectly ‘found’ when requested. |
Introduction | We encode fingerprints (random hashes) of n-grams together with their associated probabilities using a perfect hash function generated at random (Majewski et al., 1996). |
Perfect Hash-based Language Models | We assume the n-grams and their associated parameter values have been precomputed and stored on disk. |
Perfect Hash-based Language Models | We then encode the model in an array such that each n-gram’s value can be retrieved. |
Perfect Hash-based Language Models | The model uses randomization to map n-grams to fingerprints and to generate a perfect hash function that associates n-grams with their values. |
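The behavior described above, where stored n-grams are retrieved exactly but unseen n-grams may be falsely 'found', can be sketched as follows. This is illustrative only: a Python dict keyed by fingerprint stands in for the space-efficient perfect hash function the paper actually generates.

```python
import hashlib

class FingerprintLM:
    """Sketch of a fingerprint-based store: each n-gram is reduced to a
    small random fingerprint, and only (fingerprint -> value) is kept.
    Stored n-grams are retrieved without error; an unseen n-gram may
    collide with a stored fingerprint and yield a false positive."""

    def __init__(self, bits=16):
        self.bits = bits
        self.table = {}

    def _fp(self, ngram):
        h = hashlib.md5(" ".join(ngram).encode()).digest()
        return int.from_bytes(h[:4], "big") % (1 << self.bits)

    def put(self, ngram, value):
        self.table[self._fp(ngram)] = value

    def get(self, ngram):
        return self.table.get(self._fp(ngram))  # may be a false positive
```

Shrinking `bits` trades space against a higher false-positive rate, which is exactly the trade-off the randomized model exposes.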
Scaling Language Models | In language modeling, the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | However, if we are willing to accept that occasionally our model will be unable to distinguish between distinct n-grams , then it is possible to store |
Introduction | Standard techniques store the observed n-grams and derive probabilities of unobserved n-grams via their longest observed suffix and “backoff” costs associated with the prefix histories of the unobserved suffixes. |
Introduction | Hence the size of the model grows with the number of observed n-grams , which is very large for typical training corpora. |
Introduction | These data structures permit efficient querying for specific n-grams in a model that has been stored in a fraction of the space required to store the full, exact model, though with some probability of false positives. |
Preliminaries | w_i in the training corpus; F is a regularized probability estimate that provides some probability mass for unobserved n-grams; and α(h)
Preliminaries | N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored. |
Preliminaries | Probabilities for the rest of the n-grams are calculated through the “otherwise” semantics in the equation above. |
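The "otherwise" backoff semantics can be sketched as a recursive lookup (a minimal illustration over toy dictionaries, not a full ARPA-style implementation):

```python
def backoff_logprob(ngram, logprobs, backoffs):
    """Standard backoff semantics: if the full n-gram is stored, return its
    log-probability; otherwise back off to the shortened n-gram, adding the
    stored backoff weight of the dropped history (0 if that history is
    itself unstored)."""
    if ngram in logprobs:
        return logprobs[ngram]
    if len(ngram) == 1:
        return float("-inf")  # out-of-vocabulary word
    history, rest = ngram[:-1], ngram[1:]
    return backoffs.get(history, 0.0) + backoff_logprob(rest, logprobs, backoffs)
```

Only observed n-grams and backoff weights need explicit storage; everything else is derived by this recursion.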
Discussion and Future Work | In the current formulation of TESLA-CELAB, two n-grams X and Y are either synonyms which completely match each other, or are completely unrelated. |
Experiments | Compared to BLEU, TESLA allows more sophisticated weighting of n-grams and measures of word similarity including synonym relations. |
Experiments | The covered n-gram matching rule is then able to award tricky n-grams such as TE, Ti, /1\ [E], 1/13 [IE5 and i9}. |
Motivation | For example, between ¥_l—?fi_$ and ¥_5l?, higher-order n-grams such as and still have no match, and will be penalized accordingly, even though ¥_l—?fi_5lk and ¥_5l? |
Motivation | N-grams such as which cross natural word boundaries and are meaningless by themselves can be particularly tricky. |
The Algorithm | Two n-grams are connected if they are identical, or if they are identified as synonyms by Cilin. |
The Algorithm | Notice that all n-grams are put in the same matching problem regardless of n, unlike in translation evaluation metrics designed for European languages. |
The Algorithm | This enables us to designate n-grams with different values of n as synonyms, such as (n = 2) and 5!k (n = 1). |
Abstract | Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques. |
Introduction | The largest language models (LMs) can contain as many as several hundred billion n-grams (Brants et al., 2007), so storage is a challenge. |
Introduction | Overall, we are able to store the 4 billion n-grams of the Google Web 1T (Brants and Franz, 2006) corpus
Language Model Implementations | Lookup is linear in the length of the key and logarithmic in the number of n-grams . |
Preliminaries | This corpus, which is on the large end of corpora typically employed in language modeling, is a collection of nearly 4 billion n-grams extracted from over a trillion tokens of English text, and has a vocabulary of about 13.5 million words. |
Preliminaries | Tries represent collections of n-grams using a tree. |
Preliminaries | Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection. |
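A minimal trie of this kind, with one word per node and values on terminal nodes, looks like the following sketch (prefix sharing is what gives the representation its compactness):

```python
class NgramTrie:
    """Each node encodes a word; a path from the root spells an n-gram.
    A value (e.g. a count or log-probability) sits on the terminal node."""

    def __init__(self):
        self.children = {}
        self.value = None

    def insert(self, ngram, value):
        node = self
        for w in ngram:
            node = node.children.setdefault(w, NgramTrie())
        node.value = value

    def lookup(self, ngram):
        node = self
        for w in ngram:
            node = node.children.get(w)
            if node is None:
                return None
        return node.value
```

N-grams sharing a prefix, such as ("the", "cat") and ("the", "dog"), share the node for "the", so the tree stores each common history only once.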
Computing Feature Expectations | Unigrams are produced by lexical rules, while higher-order n-grams can be produced either directly by lexical rules, or by combining constituents.
Computing Feature Expectations | The n-gram language model score of e similarly decomposes over the h in e that produce n-grams . |
Computing Feature Expectations | The linear similarity measure takes the following form, where Tn is the set of n-grams: |
Experimental Results | Above, we compare the precision, relative to reference translations, of sets of n-grams chosen in two ways. |
Experimental Results | The left bar is the precision of the n-grams in e*.
Experimental Results | The right bar is the precision of n-grams with E[c(e, t)] > ρ. To justify this comparison, we chose ρ so that both methods of choosing n-grams gave the same n-gram recall: the fraction of n-grams in reference translations that also appeared in e* or had E[c(e, t)] > ρ.
Abstract | This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA). |
Abstract | In this work we explore the suitability of LHs over n-grams at the character-level for AA. |
Abstract | We report experimental results in AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state of the art approaches. |
Introduction | In particular, we consider local histograms over n-grams at the character-level obtained via the locally-weighted bag of words (LOWBOW) framework (Lebanon et al., 2007). |
Introduction | Results confirm that local histograms of character n-grams are more helpful for AA than the usual global histograms of words or character n-grams (Luyckx and Daelemans, 2010); our results are superior to those reported in related works. |
Introduction | We also show that local histograms over character n-grams are more helpful than local histograms over words, as originally proposed by (Lebanon et al., 2007). |
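The contrast between global and local histograms can be sketched as follows. This is a simplification: LOWBOW uses smooth positional kernels, whereas the sketch below uses hard document segments.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def local_histograms(text, n=3, segments=4):
    """Return the usual global histogram of character n-grams together
    with one histogram per document segment, preserving where in the
    document each n-gram occurs (a crude stand-in for LOWBOW's
    kernel-smoothed local histograms)."""
    step = max(1, len(text) // segments)
    locals_ = [Counter(char_ngrams(text[i:i + step], n))
               for i in range(0, len(text), step)]
    return Counter(char_ngrams(text, n)), locals_
```

Concatenating the per-segment histograms yields a feature vector that, unlike the global histogram, is sensitive to an author's positional habits.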
Related Work | Some researchers have gone a step further and have attempted to capture sequential information by using n-grams at the word-level (Peng et al., 2004) or by discovering maximal frequent word sequences (Coyotl-Morales et al., 2006). |
Related Work | Unfortunately, because of computational limitations, the latter methods cannot discover enough sequential information from documents (e.g., word n-grams are often restricted to n ∈ {1, 2, 3}, while full sequential information would be obtained with n ∈ {1, .
Related Work | Stamatatos and coworkers have studied the impact of feature selection, with character n-grams, in AA (Houvardas and Stamatatos, 2006; Stamatatos, 2006a), ensemble learning with character n-grams (Stamatatos, 2006b) and novel classification techniques based |
Abstract | Our approach first identifies statistically salient phrases of words and parts of speech — known as n-grams — in training texts generated in conditions where the social power |
Abstract | Then, we apply machine learning to train classifiers with groups of these n-grams as features. |
Abstract | Unlike their works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, and other measures of statistical salience in training data. |
Introduction | Instead, we consider how to efficiently mine the Google n-grams corpus. |
Introduction | This work uses a large web-scale corpus (Google n-grams ) to compute features for the full parsing task. |
Web-count Features | The approach of Lauer (1995), for example, would be to take an ambiguous noun sequence like hydrogen ion exchange and compare the various counts (or associated conditional probabilities) of n-grams like hydrogen ion and hydrogen exchange. |
Web-count Features | (2010) use web-scale n-grams to compute similar association statistics for longer sequences of nouns. |
Web-count Features | These paraphrase features hint at the correct attachment decision by looking for web n-grams with special contexts that reveal syntax superficially. |
Working with Web n-Grams | Rather than working through a search API (or scraper), we use an offline web corpus — the Google n-gram corpus (Brants and Franz, 2006) — which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams | In particular, we only use counts of n-grams of the form x ⋆ y, where the gap length is ≤ 3.
Working with Web n-Grams | Next, we exploit a simple but efficient trie-based hashing algorithm to efficiently answer all of them in one pass over the n-grams corpus. |
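The batch strategy of answering all queries in a single pass over the corpus can be sketched as below; a plain set stands in for the trie-based hashing, and the tab-separated `<ngram>\t<count>` line format is an assumption about how the corpus is stored.

```python
def count_queries_one_pass(queries, corpus_lines):
    """Answer many n-gram count queries in one streaming pass over a huge
    '<ngram>\t<count>' corpus, instead of one corpus lookup per query."""
    wanted = set(queries)
    counts = {q: 0 for q in wanted}
    for line in corpus_lines:          # stream; nothing else is kept in memory
        ngram, _, cnt = line.rpartition("\t")
        if ngram in wanted:
            counts[ngram] += int(cnt)
    return counts
```

Collecting every query up front amortizes the single expensive scan over all of them, which is what makes web-scale counts practical offline.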
Background | These generally consist of a projection layer that maps words, sub-word units or n-grams to high dimensional embeddings; the latter are then combined component-wise with an operation such as summation. |
Background | The trained weights in the filter m correspond to a linguistic feature detector that learns to recognise a specific class of n-grams . |
Background | These n-grams have size n ≤ m, where m is the width of the filter.
Experiments | tures based on long n-grams and to hierarchically combine these features is highly beneficial. |
Experiments | In the first layer, the sequence is a continuous n-gram from the input sentence; in higher layers, sequences can be made of multiple separate n-grams . |
Experiments | The feature detectors learn to recognise not just single n-grams, but patterns within n-grams that have syntactic, semantic or structural significance. |
Introduction | Since individual sentences are rarely observed or not observed at all, one must represent a sentence in terms of features that depend on the words and short n-grams in the sentence that are frequently observed. |
Introduction | by which the features of the sentence are extracted from the features of the words or n-grams . |
Properties of the Sentence Model | The filters m of the wide convolution in the first layer can learn to recognise specific n-grams that have size less or equal to the filter width m; as we see in the experiments, m in the first layer is often set to a relatively large value |
Properties of the Sentence Model | The subsequence of n-grams extracted by the generalised pooling operation induces in-variance to absolute positions, but maintains their order and relative positions. |
Properties of the Sentence Model | This gives the RNN excellent performance at language modelling, but it is suboptimal for remembering at once the n-grams further back in the input sentence. |
Abstract | This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams , dependency relations, etc. |
Introduction | In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations. |
Introduction | Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other. |
Metric Design Considerations | We note, however, that matches between items (such as words, n-grams , etc.) |
Metric Design Considerations | In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair. |
Metric Design Considerations | Lemma match For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match. |
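The staged relaxation, exact matches first and lemma matches for whatever remains, can be sketched as follows (the `lemma` function is a hypothetical lemmatizer passed in by the caller; this is not the metric's actual code):

```python
def match_ngrams(sys_ngrams, ref_ngrams, lemma):
    """Two-stage one-to-one matching: exact n-gram matches first, then a
    relaxed pass over the still-unmatched n-grams comparing lemma
    sequences. Returns the total number of matches."""
    ref_left = list(ref_ngrams)
    matched = 0
    for stage in (lambda g: g, lambda g: tuple(lemma(w) for w in g)):
        still = []
        for g in sys_ngrams:
            key = stage(g)
            hit = next((r for r in ref_left if stage(r) == key), None)
            if hit is not None:
                ref_left.remove(hit)   # each reference n-gram matches once
                matched += 1
            else:
                still.append(g)
        sys_ngrams = still
    return matched
```

Ordering the stages from strict to relaxed ensures an exact match is never stolen by a looser one.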
Experimental Settings | nw N-grams: Norm-word 1/2/3-gram probabilities; lem N-grams: Lemma 1/2/3-gram probabilities; pos N-grams: POS 1/2/3-gram probabilities
Experimental Settings | served for word N-grams which did not appear in the models). |
Experimental Settings | Like the N-grams , this number is binned; in this case there are 11 bins, with 10 spread evenly over the [0,1) range, and an extra bin for values of exactly 1 (i.e., when the word appears in every hypothesis in the set). |
Results | First, Table 4 shows the effect of adding nw N-grams of successively higher orders to the word baseline. |
Results | word+nw 1-gram 49.51 12.9; word+nw 1-gram+nw 2-gram 59.26 35.2; word+nw N-grams 59.33 35.3; +pos 58.50 33.4; +pos N-grams 57.35 30.8; +lem+lem N-grams 59.63 36.0; +lem+lem N-grams+na 59.93 36.7; +lem+lem N-grams+na+nw 59.77 36.3; +lem 60.92 38.9; +lem+na 60.47 37.9; +lem+lem N-grams 60.44 37.9
Results | Here, the best performer is the model which utilizes the word, nw N-grams, |
Abstract | Our model incorporates heterogeneous relational evidence about both hypernymy and siblinghood, captured by semantic features based on patterns and statistics from Web n-grams and Wikipedia abstracts. |
Analysis | Here, our Web n-grams dataset (which only contains frequent n-grams) and Wikipedia abstracts do not suffice and we would need to add richer Web data for such world knowledge to be reflected in the features.
Experiments | Feature sources: The n-gram semantic features are extracted from the Google n-grams corpus (Brants and Franz, 2006), a large collection of English n-grams (for n = 1 to 5) and their frequencies computed from almost 1 trillion tokens (95 billion sentences) of Web text. |
Experiments | For this, we use a hash-trie on term pairs (similar to that of Bansal and Klein (2011)), and scan once through the n-gram (or abstract) set, skipping many n-grams (or abstracts) based on fast checks of missing unigrams, exceeding length, suffix mismatches, etc.
Features | The Web n-grams corpus has broad coverage but is limited to up to 5-grams, so it may not contain pattern-based evidence for various longer multi-word terms and pairs.
Features | Similar to the Web n-grams case, we also fire Wikipedia-based pattern order features. |
Introduction | The belief propagation approach allows us to efficiently and effectively incorporate heterogeneous relational evidence via hypernymy and siblinghood (e.g., coordination) cues, which we capture by semantic features based on simple surface patterns and statistics from Web n-grams and Wikipedia abstracts. |
Related Work | 8All the patterns and counts for our Web and Wikipedia edge and sibling features described above are extracted after stemming the words in the terms, the n-grams , and the abstracts (using the Porter stemmer). |
Conclusion | number of parameters (unique N-grams).
Experiments | Figure: x-axis shows the number of unique N-grams (1e4 to 1e9).
Introduction | Another is a web-scale N-gram corpus, an N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006); we call it Google V1 in this paper.
Related Work | The former uses the web-scale data explicitly to create more data for training the model; while the latter explores the web-scale N-grams data (Lin et al., 2010) for compound bracketing disambiguation. |
Related Work | However, we explore the web-scale data for dependency parsing; the performance improves log-linearly with the number of parameters (unique N-grams).
Web-Derived Selectional Preference Features | N-grams appearing 40 |
Web-Derived Selectional Preference Features | In this paper, the selectional preferences have the same meaning as N-grams, which model word-to-word relationships, rather than only considering predicate-argument relationships.
Web-Derived Selectional Preference Features | All n-grams with lower counts are discarded. |
Extensions of SemPOS | Surprisingly, BLEU-2 performed better than any other n-gram order, for reasons that have yet to be examined.
Problems of BLEU | Total n-grams 35,531 33,891 32,251 30,611 |
Problems of BLEU | Table 1: n-grams confirmed by the reference and containing error flags. |
Problems of BLEU | The suspicious cases are n-grams confirmed by the reference but still containing a flag (false positives) and n-grams not confirmed despite containing no error flag (false negatives). |
Abstract | The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Conclusion | As far as we know, this is the first work of building a complex large scale distributed language model with a principled approach that is more powerful than n-grams when both trained on a very large corpus with up to a billion tokens. |
Experimental results | The composite n-gram/m-SLM/PLSA model gives significant perplexity reductions over baseline n-grams, n = 3, 4, 5, and m-SLMs, m = 2, 3, 4.
Experimental results | Also, in the same study in (Charniak, 2003), they found that the outputs produced using the n-grams received higher scores from BLEU; ours did not.
Introduction | We conduct comprehensive experiments on corpora with 44 million tokens, 230 million tokens, and 1.3 billion tokens, and compare perplexity results with n-grams (n = 3, 4, 5 respectively) on these three corpora; we obtain drastic perplexity reductions.
Training algorithm | where #(g, W_l, G_l, d) is the count of semantic content g in the semantic annotation string G_l of the l-th sentence W_l in document d, #(w_{i−n+1}^{i} h_{−m}^{−1} g, W_l, T_l, G_l, d) is the count of an n-gram together with its m most recent exposed headwords and semantic content g in the parse T_l and semantic annotation string G_l of the l-th sentence W_l in document d, and #(t h_tag, W_l, T_l, d) is the count
Training algorithm | The topic of large scale distributed language models is relatively new, and existing works are restricted to n-grams only (Brants et al., 2007; Emami et al., 2007; Zhang et al., 2006). |
Eliciting Addressee’s Emotion | 7We have excluded n-grams that matched the emotional expressions used in Section 2 to avoid overfitting. |
Predicting Addressee’s Emotion | We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion | The extracted n-grams could indicate a certain action that elicits a specific emotion (e.g., ‘have a fever’ in Table 2), or a style or tone of speaking (e.g., ‘Sorry’).
Predicting Addressee’s Emotion | Likewise, we extract word n-grams from the addressee’s utterance. |
Data 5.1 Development Data | For disambiguation with n-grams (see §3.3), we made use of the WEB 1T 5-GRAM corpus.
Data 5.1 Development Data | Prepared by Google Inc., it contains English n-grams , up to 5-grams, with their observed frequency counts from a large number of web pages. |
Experiments | Table 6: The n-grams used for filtering, with examples of sentences which they are intended to differentiate. |
Experiments | 6.2 Disambiguation with N-grams |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM |
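A count-based filter of this kind can be sketched as follows. The margin parameter and the string-keyed count table are assumptions for illustration; the actual thresholds come from the WEB 1T 5-GRAM counts and development data.

```python
def accept_correction(original, corrected, ngram_counts, margin=1.0):
    """Allow a proposed correction only when its n-gram is attested in a
    large web count table (a stand-in for the Web 1T 5-gram corpus) and
    is at least `margin` times as frequent as the original n-gram."""
    c_new = ngram_counts.get(corrected, 0)
    c_old = ngram_counts.get(original, 0)
    return c_new > 0 and c_new >= margin * c_old
```

Raising `margin` trades recall for precision, which is the usual way to tame categories with many false positives.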
Collaborative Decoding | Here we do not discriminate among different lexical n-grams and are only concerned with statistics aggregation of all n-grams of the same order. |
Collaborative Decoding | Gn+(e, e′) is the n-gram agreement measure function, which counts the number of occurrences in e′ of n-grams in e.
Collaborative Decoding | So the corresponding feature value will be the expected number of occurrences in H_k(f) of all n-grams in e:
Discussion | Our method uses agreement information of n-grams , and consensus features are integrated into decoding models. |
Experiments | In Table 5 we show in another dimension the impact of consensus-based features by restricting the maximum order of n-grams used to compute agreement statistics. |
Experiments | One reason could be that the data sparsity for high-order n-grams leads to overfitting on development data.
MERT for MBR Parameter Optimization | This linear function contains N + 1 parameters θ_0, θ_1, ..., θ_N, where N is the maximum order of the n-grams involved.
Minimum Bayes-Risk Decoding | First, the set of n-grams is extracted from the lattice. |
Minimum Bayes-Risk Decoding | For a moderately large lattice, there can be several thousands of n-grams and the procedure becomes expensive. |
Minimum Bayes-Risk Decoding | For each node t in the lattice, we maintain a quantity Score(w, t) for each n-gram w that lies on a path from the source node to t. Score(w, t) is the highest posterior probability among all edges on the paths that terminate on t and contain n-gram w. The forward pass requires computing the n-grams introduced by each edge; to do this, we propagate n-grams (up to maximum order − 1) terminating on each node.
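The idea of scoring hypotheses by posterior-expected n-gram counts can be sketched over an N-best list (the paper operates on lattices; an N-best list is a simplification, and the posteriors below are assumed given):

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    """Counts of all n-grams up to order max_n."""
    out = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out[tuple(tokens[i:i + n])] += 1
    return out

def mbr_decode(nbest):
    """MBR over (hypothesis, posterior) pairs: compute posterior-expected
    n-gram counts, then pick the hypothesis whose own n-grams accumulate
    the most expected count."""
    expected = Counter()
    for hyp, post in nbest:
        for g, c in ngram_counts(hyp.split()).items():
            expected[g] += post * c

    def gain(hyp):
        return sum(expected[g] * c for g, c in ngram_counts(hyp.split()).items())

    return max(nbest, key=lambda hp: gain(hp[0]))[0]
```

The expensive part at lattice scale is exactly the n-gram extraction this sketch does naively, which is what the forward-pass propagation above avoids.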
Causal Relations for Why-QA | Table 4: Causal relation features: n in n-grams is n ∈ {2, 3}, and n-grams in an effect part are distinguished from those in a cause part.
Causal Relations for Why-QA | The n-grams of tf_1 and tf_2 are restricted to those containing at least one content word in a question.
Causal Relations for Why-QA | For example, word 3-gram “this/cause/QW” is extracted from This causes tsunamis in A2 for “Why is a tsunami generated?” Further, we create a word class version of word n-grams by converting the words in these word n-grams into their corresponding word class using the semantic word classes (500 classes for 5.5 million nouns) from our previous work (Oh et al., 2012). |
Related Work | These previous studies took basically bag-of-words approaches and used the semantic knowledge to identify certain semantic associations using terms and n-grams . |
System Architecture | employed three types of features for training the re-ranker: morphosyntactic features ( n-grams of morphemes and syntactic dependency chains), semantic word class features (semantic word classes obtained by automatic word clustering (Kazama and Torisawa, 2008)) and sentiment polarity features (word and phrase polarities). |
Data and Approach Overview | We represent each utterance u as a vector w_u of N_u word n-grams (segments), w_uj, each of which is chosen from a vocabulary W of fixed size V. We use entity lists obtained from web sources (explained next) to identify segments in the corpus.
Data and Approach Overview | Web n-Grams (G). |
Experiments | Our vocabulary consists of n-grams and segments (phrases) in utterances that are extracted using web n-grams and entity lists of §3.
MultiLayer Context Model - MCM | * Web n-Gram Context Base Measure (2%): As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions. |
MultiLayer Context Model - MCM | In (1) we assume that entities (E) are more indicative of the domain compared to other n-grams (G) and should be more dominant in sampling decision for domain topics. |
MultiLayer Context Model - MCM | During Gibbs sampling, we keep track of the frequency of draws of domain, dialog act and slot indicating n-grams wj, in M D, M A and MS matrices, respectively. |
Methodology | The maximum size of a context pattern depends on the size of n-grams available in the data. |
Methodology | In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
Methodology | Shorter n-grams were not found to improve performance on development data and hence are not extracted. |
Experiments | The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams , dictionary matches, ConText output, and symptom/disease se- |
Method | The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by |
Method | We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete. |
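The padded transition features can be sketched as follows (the `<s>` padding token and the `|` join are illustrative choices, not the authors' exact encoding):

```python
def transition_features(history, max_n=3, bos="<s>"):
    """Indicator features over n-grams of the codes labelled so far,
    padded with beginning-of-document tokens when the history is shorter
    than n.  Per the finding above, no end-of-document tag is added."""
    padded = [bos] * (max_n - 1) + list(history)
    feats = []
    for n in range(1, max_n + 1):
        feats.append("|".join(padded[len(padded) - n:]))
    return feats
```

Each returned string is one binary indicator feature over the suffix of present codes.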
Method | Note that n-gram features are only code-specific, as they are not connected to any specific trigger term.
Related work | Statistical systems trained on only text-derived features (such as n-grams ) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007). |
A Generic Phrase Training Procedure | Usually all n-grams up to a predefined length limit are considered as candidate phrases. |
Features | First, due to data sparsity and/or the alignment model’s capability, there would exist n-grams that cannot be aligned well, for instance, n-grams that are part of a paraphrase translation or metaphorical expression.
Features | Extracting candidate translations for such kind of n-grams for the sake of improving coverage (recall) might hurt translation quality (precision). |
Experimental Setup | An important parameter of the class-based model is size of the base set, i.e., the total number of n-grams (or rather i-grams) to be clustered. |
Polynomial discounting | We could add a constant to d, but one of the basic premises of the KN model, derived from the assumption that n-gram marginals should be equal to relative frequencies, is that the discount is larger for more frequent n-grams, although in many implementations of KN only the cases c = 1, c = 2, and c ≥ 3 are distinguished.
Related work | probability of a class on n-grams of lexical items (as opposed to classes) (Whittaker and Woodland, 2001; Emami and Jelinek, 2005; Uszkoreit and Brants, 2008).
Related work | Models that condition classes on lexical n-grams could be extended in a way similar to what we propose here. |
Related work | Our use of classes of lexical n-grams for n > 1 has several precedents in the literature (Suhm and Waibel, 1994; Kuo and Reichl, 1999; Deligne and Sagisaka, 2000; Justo and Torres, 2009). |
Experiments | We removed n-grams that appeared less than five times8 in each subcorpus in the language models. |
Implications for Work in Related Domains | The experimental results show that n-grams containing articles are predictive for identifying native languages. |
Implications for Work in Related Domains | Importantly, all n-grams containing articles should be used in the classifier unlike the previous methods that are based only on n-grams containing article errors. |
Implications for Work in Related Domains | Besides, no articles should be explicitly coded in n-grams for taking the overuse/underuse of articles into consideration. |
Methods | In this language model, content words in n-grams are replaced with their corresponding POS tags. |
Experimental Evaluation | CNG is a profile-based method which represents the author as the N most frequent character n-grams of all his/her training texts.
Proposed Tri-Training Algorithm | The features in the character view are the character n-grams of a document. |
Proposed Tri-Training Algorithm | Character n-grams are simple and easily available for any natural language. |
Proposed Tri-Training Algorithm | We use four content-independent structures including n-grams of POS tags (n = 1..3) and rewrite rules (Kim et al., 2011). |
Related Work | Example features include function words (Argamon et al., 2007), richness features (Gamon 2004), punctuation frequencies (Graham et al., 2005), character (Grieve, 2007), word (Burrows, 1992) and POS n-grams (Gamon, 2004; Hirst and Feiguina, 2007), rewrite rules (Halteren et al., 1996), and similarities (Qian and Liu, 2013). |
Experiments | Table 1: Question classification precision for both levels of the hierarchy (features = word n-grams , classifier = libsvm) |
Experiments | Using word n-grams , monolingual English classification obtains .798 correct classification for the fine grained classes, and .90 for the coarse grained classes, results which are very close to those obtained by (Zhang and Lee, 2003). |
Experiments | Table 2: Question classification precision for both levels of the hierarchy (features = word n-grams with abbreviations, classifier = libsvm) |
Evaluation | n-grams represents a simple 5-gram baseline that is similar to Oh and Rudnicky (2000)’s system.
Evaluation | CRF global 3.65 / 3.64 / 3.65; CRF local 3.10* / 3.19* / 3.13*; CLASSiC 3.53* / 3.59 / 3.48*; n-grams 3.01* / 3.09* / 3.32*
Evaluation | This difference is significant for all categories compared with CRF (local) and n-grams (using a 1-sided Mann Whitney U-test, p < 0.001). |
Introduction | In addition, we compare our system with alternative surface realisation methods from the literature, namely, a rank and boost approach and n-grams . |
Conclusion | However, OOVs can be considered as n-grams (phrases) instead of unigrams.
Experiments & Results 4.1 Experimental Setup | We did not use trigrams or larger n-grams in our experiments. |
Graph-based Lexicon Induction | However, constructing such a graph and doing graph propagation on it is computationally very expensive for large n-grams.
Graph-based Lexicon Induction | These phrases are n-grams up to a certain order, which can result in millions of nodes.
BLEU and PORT | translation hypothesis to compute the numbers of the reference n-grams . |
Experiments | Both BLEU and PORT perform matching of n-grams up to n = 4. |
Experiments | In all tuning experiments, both BLEU and PORT performed lower case matching of n-grams up to n = 4. |
Experiments | The BLEU-tuned and Qmean-tuned systems generate similar numbers of matching n-grams, but Qmean-tuned systems produce fewer n-grams (thus, shorter translations). |
Problem Report and Aid Message Recognizers | MSA1 Morpheme n-grams, syntactic dependency n-grams in the tweet and morpheme n-grams before and after the nucleus template.
Problem Report and Aid Message Recognizers | MSA2 Character n-grams of the nucleus template to capture conjugation and modality variations. |
Problem Report and Aid Message Recognizers | MSA3 Morpheme and part-of-speech n-grams within the bunsetsu containing the nucleus template to capture conjugation and modality variations. |
Divergent (Re)Categorization | To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams . |
Divergent (Re)Categorization | Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006). |
Divergent (Re)Categorization | So for each property P suggested by Google n-grams for a lexical concept C, we generate a like-simile for verbal behaviors such as swaggering and an as-as-simile for adjectives such as lonesome. |
Summary and Conclusions | Using the Google n-grams as a source of tacit grouping constructions, we have created a comprehensive lookup table that provides Rex similarity scores for the most common (if often implicit) comparisons. |
Experiments | This is mainly due to the additional minimum support constraint we added which discards many noisy lexical entries from infrequently seen n-grams . |
Online Lexicon Learning Algorithm | and the corresponding navigation plan, we first segment the instruction into word tokens and construct n-grams from them. |
Online Lexicon Learning Algorithm | From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g.
Online Lexicon Learning Algorithm | We also update the counts of how many examples we have encountered so far and counts of the n-grams w and subgraphs g.
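The counting steps described in the two sentences above can be sketched as follows. The instruction tokens and the subgraph identifier here are hypothetical stand-ins for real navigation-plan subgraphs:

```python
from collections import Counter

def ngrams_up_to(tokens, max_n):
    """All word n-grams of the instruction, n = 1..max_n."""
    return [tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]

# Counters mirroring the description above (illustrative names).
cooc = Counter()          # (n-gram, subgraph) co-occurrence counts
ngram_count = Counter()   # n-gram occurrence counts
graph_count = Counter()   # subgraph occurrence counts
examples_seen = 0

def update(tokens, subgraphs, max_n=2):
    """Process one (instruction, navigation plan) training example."""
    global examples_seen
    examples_seen += 1
    grams = set(ngrams_up_to(tokens, max_n))
    for w in grams:
        ngram_count[w] += 1
        for g in subgraphs:
            cooc[(w, g)] += 1
    for g in subgraphs:
        graph_count[g] += 1

update(["turn", "left"], ["TURN(LEFT)"])
print(cooc[(("turn", "left"), "TURN(LEFT)")])
```

From such counts one can later score how strongly each n-gram is associated with each subgraph.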
Semantics via Web Features | As the source of Web information, we use the Google n-grams corpus (Brants and Franz, 2006) which contains English n-grams (n = 1 to 5) and their Web frequency counts, derived from nearly 1 trillion word tokens and 95 billion sentences. |
Semantics via Web Features | Using the n-grams corpus (for n = 1 to 5), we collect co-occurrence Web-counts by allowing a varying number of wildcards between h1 and h2 in the query.
Semantics via Web Features | 2These clusters are derived from the V2 Google n-grams corpus.
Experiments | Following evaluations in machine translation as well as previous work in sentence compression (Unno et al., 2006; Clarke and Lapata, 2008; Martins and Smith, 2009; Napoles et al., 2011b; Thadani and McKeown, 2013), we evaluate system performance using F1 metrics over n-grams and dependency edges produced by parsing system output with RASP (Briscoe et al., 2006) and the Stanford parser. |
Experiments | We report results over the following systems grouped into three categories of models: tokens + n-grams , tokens + dependencies, and joint models. |
Introduction | Joint methods have also been proposed that invoke integer linear programming (ILP) formulations to simultaneously consider multiple structural inference problems—both over n-grams and input dependencies (Martins and Smith, 2009) or n-grams and all possible dependencies (Thadani and McKeown, 2013). |
Multi-Structure Sentence Compression | C. In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows.
Our Approach | The features we use are character n-grams around mismatches. |
Our Approach | i) n-grams around gaps, i.e., we account only for insertions and deletions; |
Our Approach | ii) n-grams around any type of mismatch, i.e., we account for all three types of mismatches. |
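A rough sketch of the two feature variants listed above, assuming the two strings have already been aligned with '-' marking gaps. The window placement around each mismatch is one plausible choice, not necessarily the authors' exact one:

```python
def ngrams_around_mismatches(a, b, n=3, gaps_only=False):
    """Character n-grams centered on mismatch positions of two
    pre-aligned strings ('-' marks an insertion/deletion gap).
    gaps_only=True gives variant (i): insertions/deletions only;
    gaps_only=False gives variant (ii): all mismatch types."""
    assert len(a) == len(b), "inputs must be pre-aligned"
    feats = []
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            continue
        if gaps_only and '-' not in (x, y):
            continue  # skip substitutions in variant (i)
        lo = max(0, i - n // 2)
        feats.append((a[lo:lo + n], b[lo:lo + n]))
    return feats

# "color" vs "colour", aligned with one gap:
print(ngrams_around_mismatches("colo-r", "colour"))
```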
Paraphrase Evaluation Metrics | We introduce a new scoring metric PINC that measures how many n-grams differ between the two sentences. |
Paraphrase Evaluation Metrics | where n-gram_S and n-gram_C are the lists of n-grams in the source and candidate sentences, respectively.
Paraphrase Evaluation Metrics | The PINC score computes the percentage of n-grams that appear in the candidate sentence but not in the source sentence. |
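A small self-contained sketch of the PINC computation as just described; whitespace tokenization and a maximum n of 4 are assumptions here:

```python
def ngrams(tokens, n):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def pinc(source, candidate, max_n=4):
    """PINC: average over n of the fraction of candidate n-grams
    that do NOT appear in the source (higher = more novel wording)."""
    src, cand = source.split(), candidate.split()
    scores = []
    for n in range(1, max_n + 1):
        cand_ngrams = ngrams(cand, n)
        if not cand_ngrams:
            break  # candidate shorter than n
        overlap = len(cand_ngrams & ngrams(src, n))
        scores.append(1.0 - overlap / len(cand_ngrams))
    return sum(scores) / len(scores) if scores else 0.0

print(pinc("the cat sat", "the cat sat"))   # identical wording
print(pinc("the cat sat", "a dog ran by"))  # fully novel wording
```

An identical sentence scores 0 and a lexically disjoint one scores 1, which matches the intent of rewarding paraphrases that change the surface form.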
Variational Approximate Decoding | whose edges correspond to n-grams (weighted with negative log-probabilities) and whose vertices correspond to (n − 1)-grams.
Variational Approximate Decoding | This may be regarded as favoring n-grams that are likely to appear in the reference translation (because they are likely in the derivation forest). |
Variational Approximate Decoding | However, in order to score well on the BLEU metric for MT evaluation (Papineni et al., 2001), which gives partial credit, we would also like to favor lower-order n-grams that are likely to appear in the reference, even if this means picking some less-likely high-order n-grams . |
Variational vs. Min-Risk Decoding | Now, let us divide N, which contains n-gram types of different n, into several subsets Wn, each of which contains only the n-grams with a given length n. We can now rewrite (19) as follows, |
Generation & Propagation | For the unlabeled phrases, the set of possible target translations could be extremely large (e.g., all target language n-grams ). |
Generation & Propagation | A naïve way to achieve this goal would be to extract all n-grams, from n = 1 to a maximum n-gram order, from the monolingual data, but this strategy would lead to a combinatorial explosion in the number of target phrases.
Generation & Propagation | This set of candidate phrases is filtered to include only n-grams occurring in the target monolingual corpus, and helps to prune passed-through OOV words and invalid translations. |
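The filtering step just described can be sketched as a set-membership test against the n-grams of the target monolingual corpus; the toy German sentences and whitespace tokenization are assumptions for illustration:

```python
def ngrams_in_corpus(corpus_sentences, max_n=3):
    """Set of all n-grams (n <= max_n) seen in the monolingual corpus."""
    seen = set()
    for sent in corpus_sentences:
        toks = sent.split()
        for n in range(1, max_n + 1):
            for i in range(len(toks) - n + 1):
                seen.add(tuple(toks[i:i + n]))
    return seen

def filter_candidates(candidates, corpus_sentences, max_n=3):
    """Keep only candidate phrases attested in the monolingual corpus."""
    seen = ngrams_in_corpus(corpus_sentences, max_n)
    return [c for c in candidates if tuple(c.split()) in seen]

mono = ["das ist gut", "es ist sehr gut"]
print(filter_candidates(["ist gut", "gut das", "sehr gut"], mono))
```

Unattested candidates such as "gut das" are pruned, which is how passed-through OOV words and invalid translations get discarded.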
Introduction | Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text. |
Abstract | Contrasting the predictive ability of statistics derived from 6 different corpora, we find intuitive results showing that, e.g., a British corpus over-predicts the speed with which an American will react to the words ward and duke, and that the Google n-grams over-predicts familiarity with technology terms. |
Fitting Behavioral Data 2.1 Data | 2Surprisingly, fife was determined to be one of the words with the largest frequency asymmetry between Switchboard and the Google n-grams corpus. |
Introduction | Specifically, we predict human data from three widely used psycholinguistic experimental paradigms—lexical decision, word naming, and picture naming—using unigram frequency estimates from Google n-grams (Brants and Franz, 2006), Switchboard (Godfrey et al., 1992), spoken and written English portions of CELEX (Baayen et al., 1995), and spoken and written portions of the British National Corpus (BNC Consortium, 2007). |
Introduction | For example, Google n-grams overestimates the ease with which humans will process words related to the web (tech, code, search, site), while the Switchboard corpus—a collection of informal telephone conversations between strangers—overestimates how quickly humans will react to colloquialisms (heck, dam) and backchannels (wow, right). |
Experiments | 4.4 Analysis of Different Effects of Different N-grams |
Experiments | To evaluate the effects of different n-grams for our proposed transfer model, we compared the uni-/bi-/tri-gram transfer models in SMT, and illustrate the results in Fig- |
Experiments | Translation quality obtained with different n-gram orders for the transfer model.
Conclusions | In particular, we plan to extend the use of n-grams to larger contexts and consider more fine-grained tuning of other constraints, too. |
Lexical Constraints for Humorous Word Substitution | Implementation Local coherence is implemented using n-grams . |
Lexical Constraints for Humorous Word Substitution | To estimate the level of expectation triggered by a left-context, we rely on a vast collection of n-grams, the 2012 Google Books n-grams collection (Michel et al., 2011), and compute the cohesion of each n-gram by comparing their expected frequency (assuming word independence) to their observed number of occurrences.
Introduction | In many natural language systems, single words and n-grams are usefully described by their distributional similarities (Brown et al., 1992), among many others. |
Introduction | Many n-grams will never be seen during training, especially when n is large.
Introduction | In this work, we present a new solution to learn features and phrase representations even for very long, unseen n-grams . |
Introduction | Recent approaches to this task have been based on slot-filling (Yang et al., 2011; Elliott and Keller, 2013), combining web-scale n-grams (Li et al., 2011), syntactic tree substitution (Mitchell et al., 2012), and description-by-retrieval (Farhadi et al., 2010; Ordonez et al., 2011; Hodosh et al., 2013). |
Methodology | p_n measures the effective overlap by calculating the proportion between the maximum number of n-grams co-occurring between a candidate and a reference and the total number of n-grams in the candidate text.
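A minimal sketch of the clipped n-gram precision p_n described above, for a single whitespace-tokenized candidate; this is only the per-order precision, not the full BLEU score with its brevity penalty:

```python
from collections import Counter

def modified_precision(candidate, references, n):
    """p_n: clipped count of candidate n-grams co-occurring with any
    reference, divided by the total number of candidate n-grams."""
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = counts(candidate.split())
    if not cand:
        return 0.0
    # For each n-gram, the maximum count observed in any one reference.
    max_ref = Counter()
    for ref in references:
        for g, c in counts(ref.split()).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())

print(modified_precision("the cat the cat", ["the cat sat"], 1))
```

Clipping prevents a candidate from earning credit by repeating a matched n-gram more often than any reference contains it.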
Methodology | (2012); to the best of our knowledge, the only image description work to use higher-order n-grams with BLEU is Elliott and Keller (2013). |
Introduction | Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
Motivation | Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation | information retrieval, and again, interpolate latent topic models with N-grams to improve retrieval performance. |
Experiments | To collect the statistics, we take each NP in the training data and consider all possible 2-grams through 5-grams that are present in the NP’s modifier sequence, allowing for nonconsecutive n-grams.
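Enumerating order-preserving but possibly nonconsecutive n-grams of a modifier sequence is exactly what itertools.combinations yields; a short sketch (the modifier words are illustrative):

```python
from itertools import combinations

def nonconsecutive_ngrams(modifiers, min_n=2, max_n=5):
    """All order-preserving (possibly non-contiguous) n-grams,
    min_n <= n <= max_n, drawn from a modifier sequence."""
    out = []
    for n in range(min_n, min(max_n, len(modifiers)) + 1):
        out.extend(combinations(modifiers, n))  # preserves input order
    return out

print(nonconsecutive_ngrams(["big", "old", "red"]))
```

Note that ("big", "red") is produced even though "old" intervenes, which is what "nonconsecutive" buys over ordinary contiguous n-grams.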
Model | Previous counting approaches can be expressed as a real-valued feature that, given all n-grams generated by a permutation of modifiers, returns the count of all these n-grams in the original training data. |
Model | We might also expect permutations that contain n-grams previously seen in the training data to be more natural sounding than other permutations that generate n-grams that have not been seen before. |
The Model | The first score is computed on the basis of all the n-grams , but using a common set of weights independent of the aspect a. |
The Model | Another score is computed only using n-grams associated with the related topic, but an aspect-specific set of weights is used in this computation. |
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y); f iterates through all the n-grams; J_{f,y} and J^a_{f,y} are common weights and aspect-specific weights for n-gram feature f; p^a_f is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
Introduction | Use of longer n-grams improves translation results, but exacerbates this interaction. |
Introduction | We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition. |
Multi-pass LM-Integrated Decoding | where S(i, j) is the set of candidate target-language words outside the span of (i, j), and B is the product of the upper bounds for the two on-the-border n-grams.
Introduction | [Figure legend: Rank; curves for Monte Carlo using N-grams, Monte Carlo using words, Relative Entropy using N-grams, and Relative Entropy using words]
Related Work | The n-gram based approaches are based on the counts of character or byte n-grams , which are sequences of n characters or bytes, extracted from a corpus for each reference language. |
Related Work | (Dunning, 1994) proposed a system that uses Markov Chains of byte n-grams with Bayesian Decision Rules to minimize the probability error. |
Introduction | (Miller et al., 1999; Song and Croft, 1999) explore the use of n-grams in retrieval models.
Previous Work | Subsequently, various types of phrases, such as sequential n-grams (Mitra et al., 1997), head-modifier pairs extracted from syntactic structures (Lewis and Croft, 1990; Zhai, 1997; Dillon and Gray, 1983; Strzalkowski et al., 1994), and proximity-based phrases (Turpin and Moffat, 1999), were examined with conventional retrieval models (e.g., the vector space model).
Previous Work | (Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases. |
Experiments | For the web baseline (reported as Google), we stemmed all words in the Google n-grams and counted every verb v and noun n that appear in Gigaword.
How Frequent is Unseen Data? | The dotted line uses Google n-grams as training. |
Models | We also avoided over-counting co-occurrences in lower-order n-grams that appear again in 4- or 5-grams.
Abstract | We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap.
The summarization framework | To represent sentences and answers we adopted an alternative approach to classical n-grams that could be defined as bag-of-BEs.
The summarization framework | Different from n-grams, they vary in length and depend on parsing techniques, named entity detection, part-of-speech tagging and resolution of syntactic forms such as hyponyms, pronouns, pertainyms, abbreviations and synonyms.
Automated Classification | To provide information related to term usage to the classifier, we extracted trigram and 4-gram features from the Web 1T Corpus (Brants and Franz, 2006), a large collection of n-grams and their counts created from approximately one trillion words of Web text.
Automated Classification | Only n-grams containing lowercase words were used. |
Automated Classification | Only n-grams containing both terms (including plural forms) were extracted. |
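The two filtering constraints stated above (lowercase words only; both terms present, including plural forms) can be sketched as below. Treating a plural as simply term + "s" is a deliberate simplification for illustration, not necessarily the authors' rule:

```python
def keep_ngram(ngram_tokens, term_a, term_b):
    """Keep an n-gram only if (i) every word is lowercase, and
    (ii) both terms occur in it, counting naive plural forms."""
    if not all(w.islower() for w in ngram_tokens):
        return False
    def present(term):
        return any(w == term or w == term + "s" for w in ngram_tokens)
    return present(term_a) and present(term_b)

print(keep_ngram(["dogs", "chase", "cats"], "dog", "cat"))  # kept
print(keep_ngram(["Dogs", "chase", "cats"], "dog", "cat"))  # rejected: uppercase
```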
Background | where g_n(s) is the multi-set of all n-grams in a string s. In this definition, n-grams in e_i and {r_ij} are weighted by D_t(i).
Background | If the i-th training sample has a larger weight, the corresponding n-grams will contribute more to the overall score WBLEU(E, R).
Background | In this method, an n-gram cache is used to store the most frequently and recently accessed n-grams.
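One plausible realization of such a cache is a small LRU store; this sketch is illustrative, with the compute callback standing in for whatever n-gram statistic is being cached:

```python
from collections import OrderedDict

class NgramCache:
    """Small cache keeping the most recently accessed n-grams
    (a simple LRU stand-in for the cache described above)."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, ngram, compute):
        if ngram in self.store:
            self.store.move_to_end(ngram)   # mark as recently used
            return self.store[ngram]
        value = compute(ngram)              # e.g., an n-gram match count
        self.store[ngram] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return value

cache = NgramCache(capacity=2)
cache.get(("the", "cat"), len)
cache.get(("a", "dog"), len)
cache.get(("the", "cat"), len)   # hit: ("the", "cat") becomes most recent
cache.get(("new", "one"), len)   # evicts ("a", "dog")
print(list(cache.store))
```

Frequently re-scored n-grams thus stay resident while rarely touched ones are evicted, which is the point of the cache in repeated BLEU-style scoring.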
Experiment 1: Textual Similarity | Character n-grams, which were also used as one of our additional features.
Experiment 1: Textual Similarity | Another interesting point is the high scores achieved by the Character n-grams feature.
Experiment 1: Textual Similarity | Dataset (Mpar / Mvid / SMTe): DW 0.448 / 0.820 / 0.660; ADW-MF 0.485 / 0.842 / 0.721; Explicit Semantic Analysis 0.427 / 0.781 / 0.619; Pairwise Word Similarity 0.564 / 0.835 / 0.527; Distributional Thesaurus 0.494 / 0.481 / 0.365; Character n-grams 0.658 / 0.771 / 0.554
Evaluation | Table 3: MWE identification with CRF: base are the features corresponding to token properties and word n-grams . |
MWE-dedicated Features | Word n-grams . |
MWE-dedicated Features | POS n-grams . |
Experimental Design | Field bigrams/trigrams Analogously to the lexical features mentioned above, we introduce a series of nonlocal features that capture field n-grams , given a specific record. |
Related Work | Local and nonlocal information (e.g., word n-grams , long- |
Results | The 1-BEST system has some grammaticality issues, which we avoid by defining features over lexical n-grams and repeated words.
Conclusion and Future Work | In this paper, we only tried Dice coefficient of n-grams and symmetrical sentence level BLEU as similarity measures. |
Features and Training | Tl(e,e') is the propagating probability in equation (8), with the similarity measure Sim(e,e') defined as the Dice coefficient over the set of all n-grams in e and those in e'. |
Features and Training | where N Grn(x) is the set of n-grams in string x, and Dice (A, B) is the Dice coefficient over sets A and B: |
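With NGr_n(x) pooled over n = 1..2 for brevity, the Dice coefficient over n-gram sets described in the two lines above can be sketched as:

```python
def ngram_set(tokens, max_n=2):
    """NGr_n(x) for n = 1..max_n, pooled into one set."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def dice(a_tokens, b_tokens, max_n=2):
    """Dice(A, B) = 2|A ∩ B| / (|A| + |B|) over the n-gram sets."""
    A, B = ngram_set(a_tokens, max_n), ngram_set(b_tokens, max_n)
    if not A and not B:
        return 0.0
    return 2 * len(A & B) / (len(A) + len(B))

print(dice("the cat sat".split(), "the cat ran".split()))
```

Two identical strings score 1 and disjoint strings score 0, making this a convenient symmetric similarity Sim(e, e') for the propagation step above.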
Introduction | While traditional LMs use word n-grams , where the n — 1 previous words predict the next word, newer models integrate long-span information in making decisions. |
Introduction | For example, incorporating long-distance dependencies and syntactic structure can help the LM better predict words by complementing the predictive power of n-grams (Chelba and Jelinek, 2000; Collins et al., 2005; Filimonov and Harper, 2009; Kuo et al., 2009). |
Syntactic Language Models | Structured language modeling incorporates syntactic parse trees to identify the head words in a hypothesis for modeling dependencies beyond n-grams . |
Experiments | Feature templates such as rule n-grams and rule shapes only work if iterative mixing (algorithm 3) or feature selection (algorithm 4) are used. |
Introduction | Such features include rule ids, rule-local n-grams , or types of rule shapes. |
Local Features for Synchronous CFGs | Rule n-grams: These features identify n-grams of consecutive items in a rule. |
Experiments | In addition, we extracted web n-grams and entity lists (see §3) from movie related web sites, and online blogs and reviews. |
Experiments | We extract prior distributions for entities and n-grams to calculate entity list 77 and word-tag [3 priors (see §3.1). |
Markov Topic Regression - MTR | We built a language model using SRILM (Stolcke, 2002) on domain-specific sources such as top wiki pages and blogs on online movie reviews, etc., to obtain the probabilities of domain-specific n-grams, up to 3-grams.
Perplexity Evaluation | In this experiment, we assessed the effectiveness of the TD and TO components in reducing the n-gram model’s perplexity.
Perplexity Evaluation | Given the inability of n-grams to model long history contexts, the TD and TO components are still effective in helping to enhance the prediction.
Related Work | There is also work on skipping irrelevant history words in order to reveal more informative n-grams (Siu & Ostendorf 2000, Guthrie et al.