Index of papers in Proc. ACL 2013 that mention
  • n-grams
Roark, Brian and Allauzen, Cyril and Riley, Michael
Introduction
Standard techniques store the observed n-grams and derive probabilities of unobserved n-grams via their longest observed suffix and “backoff” costs associated with the prefix histories of the unobserved suffixes.
Introduction
Hence the size of the model grows with the number of observed n-grams, which is very large for typical training corpora.
Introduction
These data structures permit efficient querying for specific n-grams in a model that has been stored in a fraction of the space required to store the full, exact model, though with some probability of false positives.
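One family of such structures is the Bloom filter, which stores set membership in a compact bit array: a stored n-gram is always found, but an absent one is occasionally reported as present. Below is a minimal Python sketch of that idea, offered as illustration only; the compact encodings in the language-modeling literature are considerably more refined, and all names here are hypothetical.

```python
import hashlib

class BloomFilter:
    """Compact approximate set: no false negatives, some false positives."""

    def __init__(self, num_bits, num_hashes):
        self.bits = bytearray(num_bits // 8 + 1)
        self.num_bits = num_bits
        self.num_hashes = num_hashes

    def _positions(self, item):
        # Derive num_hashes bit positions from salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True if every probed bit is set; may err on absent items.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(item))

# bf = BloomFilter(num_bits=1_000_000, num_hashes=3)
# bf.add("saw the cat"); "saw the cat" in bf   # True, in a fraction of the space
```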
Preliminaries
where c(h w_i) is the count of the n-gram h w_i in the training corpus; P̄ is a regularized probability estimate that provides some probability mass for unobserved n-grams; and α(h) is the backoff weight associated with the history h.
Preliminaries
N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored.
Preliminaries
Probabilities for the rest of the n-grams are calculated through the “otherwise” semantics in the equation above.
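The "otherwise" semantics amounts to a recursive lookup: use the stored estimate if the n-gram was observed, else pay the history's backoff weight and recurse on the longest suffix. A minimal Python sketch, assuming hypothetical dictionaries prob (observed n-grams to smoothed probabilities) and alpha (histories to backoff weights); this is illustrative, not the authors' implementation:

```python
def ngram_prob(ngram, prob, alpha):
    """Backoff probability of ngram[-1] given the history ngram[:-1]."""
    key = tuple(ngram)
    if key in prob:                  # observed n-gram: use stored estimate
        return prob[key]
    if len(key) == 1:                # unseen unigram: tiny floor probability
        return 1e-10
    # "otherwise": backoff weight of the full history times the
    # probability under the longest observed suffix.
    return alpha.get(key[:-1], 1.0) * ngram_prob(key[1:], prob, alpha)

# prob = {("the",): .05, ("cat",): .01, ("the", "cat"): .2}
# alpha = {("big",): .4}
# ngram_prob(("big", "cat"), prob, alpha)   # .4 * P(cat) = .004
```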
n-grams is mentioned in 26 sentences in this paper.
Kauchak, David
Abstract
Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
Introduction
At the word level, 96% of the simple words are found in the normal corpus, and even for n-grams as large as 5, more than half of the n-grams can be found in the normal text.
Introduction
This extra information may help with data sparsity, providing better estimates for rare and unseen n-grams.
Introduction
On the other hand, there is still only modest overlap between the sentences for longer n-grams, particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical.
Why Does Unsimplified Data Help?
6.1 More n-grams
Why Does Unsimplified Data Help?
Table 3: Proportion of n-grams in the test sets that occur in the simple and normal training data sets.
Why Does Unsimplified Data Help?
We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency.
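Coverage figures like those in Table 3 reduce to set intersection over n-gram types; a sketch of the computation (the function names are ours, not the paper's):

```python
def ngram_types(sentences, n):
    """Set of distinct n-gram tuples over tokenized sentences."""
    return {tuple(s[i:i + n]) for s in sentences
            for i in range(len(s) - n + 1)}

def coverage(test_sents, train_sents, n):
    """Proportion of test n-gram types also seen in training."""
    test = ngram_types(test_sents, n)
    train = ngram_types(train_sents, n)
    return len(test & train) / max(len(test), 1)
```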
n-grams is mentioned in 17 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Model
As in most generative models for text, a post (document) is viewed as a bag of n-grams, and each n-gram (word/phrase) takes one value from a predefined vocabulary.
Model
Instead of using all n-grams, a relevance-based ranking method is proposed to select a subset of highly relevant n-grams for model building (details in §4).
Model
For notational convenience, we use terms to denote both words (unigrams) and phrases (n-grams).
Phrase Ranking based on Relevance
We now detail our method of preprocessing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building.
Phrase Ranking based on Relevance
A large number of irrelevant n-grams slows down inference.
Phrase Ranking based on Relevance
This method, however, is computationally expensive and has a limitation for arbitrary-length n-grams.
n-grams is mentioned in 18 sentences in this paper.
Persing, Isaac and Ng, Vincent
Error Classification
Additionally, since there are numerous word n-grams, some infrequent ones may, just by chance, occur only in positive training instances, causing the learner to believe they indicate the positive class when they do not.
Error Classification
For each essay, Aw+_i counts the number of word n-grams we believe indicate that an essay is a positive example of error class e_i, and Aw−_i counts the number of word n-grams we believe indicate an essay is not an example of e_i.
Error Classification
Aw+ n-grams for the Missing Details error tend to include phrases like “there is something” or “this statement”, while Aw− n-grams are often words taken directly from an essay’s prompt.
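Turning the learned indicator lists into per-essay feature values is a simple counting pass; a sketch with hypothetical sets pos_ngrams and neg_ngrams standing in for the Aw+ and Aw− lists:

```python
def indicator_counts(tokens, pos_ngrams, neg_ngrams, max_n=3):
    """Count indicator n-gram hits in one essay (the Aw+/Aw- values)."""
    pos = neg = 0
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            pos += gram in pos_ngrams
            neg += gram in neg_ngrams
    return pos, neg
```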
Evaluation
We see that the thesis clarity score-predicting variation of the Baseline system, which employs as features only word n-grams and random indexing features, predicts the wrong score 65.8% of the time.
n-grams is mentioned in 11 sentences in this paper.
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Eliciting Addressee’s Emotion
We have excluded n-grams that matched the emotional expressions used in Section 2 to avoid overfitting.
Predicting Addressee’s Emotion
We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion
The extracted n-grams could indicate a certain action that elicits a specific emotion (e.g., ‘have a fever’ in Table 2), or a style or tone of speaking (e.g., ‘Sorry’).
Predicting Addressee’s Emotion
Likewise, we extract word n-grams from the addressee’s utterance.
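A sketch of the binary feature extraction described in these excerpts, with a source tag separating response n-grams from addressee-utterance n-grams (the tag names are hypothetical):

```python
def binary_ngram_features(tokens, source, max_n=3):
    """Presence features for all n-grams with n <= 3 in one text."""
    feats = set()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats.add((source, tuple(tokens[i:i + n])))
    return feats

# binary_ngram_features(["have", "a", "fever"], "resp")
# -> {("resp", ("have",)), ..., ("resp", ("have", "a", "fever"))}
```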
n-grams is mentioned in 7 sentences in this paper.
Oh, Jong-Hoon and Torisawa, Kentaro and Hashimoto, Chikara and Sano, Motoki and De Saeger, Stijn and Ohtake, Kiyonori
Causal Relations for Why-QA
Table 4: Causal relation features: n in n-grams is n ∈ {2, 3}, and n-grams in an effect part are distinguished from those in a cause part.
Causal Relations for Why-QA
The n-grams of tf1 and tf2 are restricted to those containing at least one content word in a question.
Causal Relations for Why-QA
For example, the word 3-gram “this/cause/QW” is extracted from This causes tsunamis in A2 for “Why is a tsunami generated?” Further, we create a word-class version of these word n-grams by converting the words into their corresponding word classes, using the semantic word classes (500 classes for 5.5 million nouns) from our previous work (Oh et al., 2012).
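The word-class variant is a substitution pass over the extracted n-grams; a sketch assuming a hypothetical word2class mapping in place of the paper's 500 induced classes:

```python
def class_ngram(word_ngram, word2class):
    """Replace each word with its semantic class where one exists."""
    return tuple(word2class.get(w, w) for w in word_ngram)

# With word2class = {"tsunamis": "C_DISASTER"} (hypothetical class name):
# class_ngram(("this", "cause", "tsunamis"), word2class)
# -> ("this", "cause", "C_DISASTER")
```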
Related Work
These previous studies basically took bag-of-words approaches and used semantic knowledge to identify certain semantic associations using terms and n-grams.
System Architecture
employed three types of features for training the re-ranker: morphosyntactic features (n-grams of morphemes and syntactic dependency chains), semantic word class features (semantic word classes obtained by automatic word clustering (Kazama and Torisawa, 2008)), and sentiment polarity features (word and phrase polarities).
n-grams is mentioned in 6 sentences in this paper.
Nagata, Ryo and Whittaker, Edward
Experiments
We removed n-grams that appeared fewer than five times in each subcorpus in the language models.
Implications for Work in Related Domains
The experimental results show that n-grams containing articles are predictive for identifying native languages.
Implications for Work in Related Domains
Importantly, all n-grams containing articles should be used in the classifier, unlike the previous methods, which are based only on n-grams containing article errors.
Implications for Work in Related Domains
Besides, the absence of an article should be explicitly coded in n-grams to take the overuse/underuse of articles into consideration.
Methods
In this language model, content words in n-grams are replaced with their corresponding POS tags.
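A sketch of building such abstracted n-grams, folding in the count threshold from the experiments above; the content-word tagset and the threshold of five are illustrative assumptions:

```python
from collections import Counter

CONTENT_TAGS = {"NN", "NNS", "VB", "VBD", "JJ", "RB"}  # illustrative subset

def pos_abstracted_ngrams(tagged_sents, n=3, min_count=5):
    """n-grams with content words replaced by POS tags, pruned by count."""
    counts = Counter()
    for sent in tagged_sents:            # sent: list of (word, tag) pairs
        tokens = [tag if tag in CONTENT_TAGS else word.lower()
                  for word, tag in sent]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return {g: c for g, c in counts.items() if c >= min_count}
```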
n-grams is mentioned in 5 sentences in this paper.
Veale, Tony and Li, Guofu
Divergent (Re)Categorization
To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams.
Divergent (Re)Categorization
Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006).
Divergent (Re)Categorization
So for each property P suggested by Google n-grams for a lexical concept C, we generate a like-simile for verbal behaviors such as swaggering and an as-as-simile for adjectives such as lonesome.
Summary and Conclusions
Using the Google n-grams as a source of tacit grouping constructions, we have created a comprehensive lookup table that provides Rex similarity scores for the most common (if often implicit) comparisons.
n-grams is mentioned in 4 sentences in this paper.
Varga, István and Sano, Motoki and Torisawa, Kentaro and Hashimoto, Chikara and Ohtake, Kiyonori and Kawai, Takao and Oh, Jong-Hoon and De Saeger, Stijn
Problem Report and Aid Message Recognizers
MSA1 Morpheme n-grams, syntactic dependency n-grams in the tweet, and morpheme n-grams before and after the nucleus template.
Problem Report and Aid Message Recognizers
MSA2 Character n-grams of the nucleus template to capture conjugation and modality variations.
Problem Report and Aid Message Recognizers
MSA3 Morpheme and part-of-speech n-grams within the bunsetsu containing the nucleus template to capture conjugation and modality variations.
n-grams is mentioned in 4 sentences in this paper.
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Conclusion
However, OOVs can be considered as n-grams (phrases) instead of unigrams.
Experiments & Results 4.1 Experimental Setup
We did not use trigrams or larger n-grams in our experiments.
Graph-based Lexicon Induction
However, constructing such a graph and running graph propagation on it is computationally very expensive for large n-grams.
Graph-based Lexicon Induction
These phrases are n-grams up to a certain order, which can result in millions of nodes.
n-grams is mentioned in 4 sentences in this paper.
Ligozat, Anne-Laure
Experiments
Table 1: Question classification precision for both levels of the hierarchy (features = word n-grams, classifier = libsvm)
Experiments
Using word n-grams, monolingual English classification obtains .798 correct classification for the fine-grained classes, and .90 for the coarse-grained classes, results which are very close to those obtained by Zhang and Lee (2003).
Experiments
Table 2: Question classification precision for both levels of the hierarchy (features = word n-grams with abbreviations, classifier = libsvm)
n-grams is mentioned in 4 sentences in this paper.
Dethlefs, Nina and Hastie, Helen and Cuayáhuitl, Heriberto and Lemon, Oliver
Evaluation
• n-grams represents a simple 5-gram baseline that is similar to Oh and Rudnicky (2000)’s system.
Evaluation
CRF global  3.65   3.64   3.65
CRF local   3.10*  3.19*  3.13*
CLASSiC     3.53*  3.59   3.48*
n-grams     3.01*  3.09*  3.32*
Evaluation
This difference is significant for all categories compared with CRF (local) and n-grams (using a one-sided Mann-Whitney U test, p < 0.001).
Introduction
In addition, we compare our system with alternative surface realisation methods from the literature, namely, a rank and boost approach and n-grams .
n-grams is mentioned in 4 sentences in this paper.
Pilehvar, Mohammad Taher and Jurgens, David and Navigli, Roberto
Experiment 1: Textual Similarity
• Character n-grams, which were also used as one of our additional features.
Experiment 1: Textual Similarity
Another interesting point is the high scores achieved by the Character n-grams
Experiment 1: Textual Similarity
Dataset                      Mpar   Mvid   SMTe
DW                           0.448  0.820  0.660
ADW-MF                       0.485  0.842  0.721
Explicit Semantic Analysis   0.427  0.781  0.619
Pairwise Word Similarity     0.564  0.835  0.527
Distributional Thesaurus     0.494  0.481  0.365
Character n-grams            0.658  0.771  0.554
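A common way to score textual similarity with character n-grams is cosine similarity over n-gram count profiles; the sketch below shows that baseline idea (the paper's exact feature set may differ):

```python
import math
from collections import Counter

def char_ngram_profile(text, n=3):
    """Counts of overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(c * b[g] for g, c in a.items() if g in b)
    norm = math.sqrt(sum(c * c for c in a.values())) \
         * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

# cosine(char_ngram_profile("a tsunami hit"),
#        char_ngram_profile("the tsunami struck"))
```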
n-grams is mentioned in 3 sentences in this paper.
Socher, Richard and Bauer, John and Manning, Christopher D. and Andrew Y., Ng
Introduction
In many natural language systems, single words and n-grams are usefully described by their distributional similarities (Brown et al., 1992), among many others.
Introduction
Many n-grams will never be seen during training, especially when n is large.
Introduction
In this work, we present a new solution to learn features and phrase representations even for very long, unseen n-grams .
n-grams is mentioned in 3 sentences in this paper.
Valitutti, Alessandro and Toivonen, Hannu and Doucet, Antoine and Toivanen, Jukka M.
Conclusions
In particular, we plan to extend the use of n-grams to larger contexts and consider more fine-grained tuning of other constraints, too.
Lexical Constraints for Humorous Word Substitution
Implementation: Local coherence is implemented using n-grams.
Lexical Constraints for Humorous Word Substitution
To estimate the level of expectation triggered by a left-context, we rely on a vast collection of n-grams, the 2012 Google Books n-grams collection (Michel et al., 2011), and compute the cohesion of each n-gram by comparing its expected frequency (assuming word independence) to its observed number of occurrences.
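Under the word-independence assumption, an n-gram's expected count in a corpus of N tokens is N times the product of its words' unigram probabilities, so cohesion can be scored as observed over expected count. A sketch with hypothetical count tables:

```python
def cohesion(ngram, ngram_counts, unigram_prob, total_tokens):
    """Observed n-gram count divided by the count expected under
    word independence (higher means a more cohesive collocation)."""
    expected = total_tokens
    for w in ngram:
        expected *= unigram_prob.get(w, 0.0)   # independence assumption
    observed = ngram_counts.get(tuple(ngram), 0)
    return observed / expected if expected else 0.0
```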
n-grams is mentioned in 3 sentences in this paper.
Chong, Tze Yuang and E. Banchs, Rafael and Chng, Eng Siong and Li, Haizhou
Perplexity Evaluation
In this experiment, we assessed the effectiveness of the TD and TO components in reducing the n-gram’s perplexity.
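For reference, the perplexity being reduced is the standard quantity (general definition, not specific to this paper):

```latex
\mathrm{PPL}(w_1^N) = \exp\Big(-\frac{1}{N}\sum_{i=1}^{N}\log P\big(w_i \mid w_{i-n+1}^{\,i-1}\big)\Big)
```

Lower perplexity means the model assigns higher probability to the held-out text.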
Perplexity Evaluation
Because n-grams are incapable of modeling long history-contexts, the TD and TO components are still effective in helping to enhance the prediction.
Related Work
There are also works on skipping irrelevant history words in order to reveal more informative n-grams (Siu & Ostendorf 2000, Guthrie et al.
n-grams is mentioned in 3 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Experiments
In addition, we extracted web n-grams and entity lists (see §3) from movie-related web sites, and online blogs and reviews.
Experiments
We extract prior distributions for entities and n-grams to calculate entity list η and word-tag β priors (see §3.1).
Markov Topic Regression - MTR
We built a language model using SRILM (Stolcke, 2002) on domain-specific sources, such as top wiki pages and blogs on online movie reviews, etc., to obtain the probabilities of domain-specific n-grams, up to 3-grams.
n-grams is mentioned in 3 sentences in this paper.