Index of papers in Proc. ACL that mention
  • n-grams
Persing, Isaac and Ng, Vincent
Error Classification
Additionally, since there are numerous word n-grams , some infrequent ones may just by chance only occur in positive training set instances, causing the learner to think they indicate the positive class when they do not.
Error Classification
For each essay, Aw+_i counts the number of word n-grams we believe indicate that an essay is a positive example of e_i, and Aw−_i counts the number of word n-grams we believe indicate an essay is not an example of e_i.
Error Classification
Aw+ n-grams for the Missing Details error tend to include phrases like “there is something” or “this statement”, while Aw− n-grams are often words taken directly from an essay’s prompt.
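As a toy illustration of these counts (not the authors' implementation), assuming the sets of Aw+ and Aw− word n-grams have already been selected:

    # Illustrative sketch only: count how many word n-grams in an essay fall in
    # hypothetical precomputed sets of positive-indicative (Aw+) and
    # negative-indicative (Aw-) n-grams.
    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def indicator_counts(essay_tokens, positive_set, negative_set, max_n=3):
        aw_plus = aw_minus = 0
        for n in range(1, max_n + 1):
            for gram in ngrams(essay_tokens, n):
                if gram in positive_set:
                    aw_plus += 1
                elif gram in negative_set:
                    aw_minus += 1
        return aw_plus, aw_minus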
Evaluation
We see that the thesis clarity score predicting variation of the Baseline system, which employs as features only word n-grams and random indexing features, predicts the wrong score 65.8% of the time.
n-grams is mentioned in 11 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Model
Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary.
Model
Instead of using all n-grams, a relevance based ranking method is proposed to select a subset of highly relevant n-grams for model building (details in §4).
Model
For notational convenience, we use terms to denote both words (unigrams) and phrases ( n-grams ).
Phrase Ranking based on Relevance
We now detail our method of preprocessing n-grams (phrases) based on relevance to select a subset of highly relevant n-grams for model building.
Phrase Ranking based on Relevance
A large number of irrelevant n-grams slow inference.
Phrase Ranking based on Relevance
This method, however, is expensive computationally and has a limitation for arbitrary length n-grams .
n-grams is mentioned in 18 sentences in this paper.
Kauchak, David
Abstract
Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams .
Introduction
At the word level 96% of the simple words are found in the normal corpus and even for n-grams as large as 5, more than half of the n-grams can be found in the normal text.
Introduction
This extra information may help with data sparsity, providing better estimates for rare and unseen n-grams .
Introduction
On the other hand, there is still only modest overlap between the sentences for longer n-grams , particularly given that the corpus is sentence-aligned and that 27% of the sentence pairs in this aligned data set are identical.
Why Does Unsimplified Data Help?
6.1 More n-grams
Why Does Unsimplified Data Help?
Table 3: Proportion of n-grams in the test sets that occur in the simple and normal training data sets.
Why Does Unsimplified Data Help?
We hypothesize that the key benefit of additional normal data is access to more n-gram counts and therefore better probability estimation, particularly for n-grams in the simple corpus that are unseen or have low frequency.
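A minimal sketch of how such n-gram coverage (as in Table 3 above) could be measured; this is only an illustration, not the paper's code:

    # Illustrative sketch: the proportion of distinct test-set n-grams of a
    # given order n that also occur in a training corpus.
    def ngram_types(sentences, n):
        grams = set()
        for sentence in sentences:
            tokens = sentence.split()
            grams.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        return grams

    def coverage(test_sentences, train_sentences, n):
        test_grams = ngram_types(test_sentences, n)
        if not test_grams:
            return 0.0
        train_grams = ngram_types(train_sentences, n)
        return len(test_grams & train_grams) / len(test_grams)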
n-grams is mentioned in 17 sentences in this paper.
Talbot, David and Brants, Thorsten
Abstract
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Introduction
Parameters that are stored in the model are retrieved without error; however, false positives may occur whereby n-grams not in the model are incorrectly ‘found’ when requested.
Introduction
We encode fingerprints (random hashes) of n-grams together with their associated probabilities using a perfect hash function generated at random (Majewski et al., 1996).
Perfect Hash-based Language Models
We assume the n-grams and their associated parameter values have been precomputed and stored on disk.
Perfect Hash-based Language Models
We then encode the model in an array such that each n-gram’s value can be retrieved.
Perfect Hash-based Language Models
The model uses randomization to map n-grams to fingerprints and to generate a perfect hash function that associates n-grams with their values.
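A toy sketch of the fingerprinting idea, not the paper's perfect-hash construction; it only illustrates why unseen n-grams can produce false positives while stored parameters stay compact:

    # Illustrative sketch: only a short hash of each n-gram is kept alongside
    # its value, so an unseen n-gram can occasionally collide with a stored
    # fingerprint and be "found" (a false positive). A true perfect hash also
    # guarantees collision-free placement of the stored keys, which this toy
    # dict does not.
    import hashlib

    def fingerprint(ngram, bits=24):
        digest = hashlib.md5(" ".join(ngram).encode("utf-8")).hexdigest()
        return int(digest, 16) & ((1 << bits) - 1)

    def build_table(ngram_values):
        # ngram_values: dict mapping n-gram tuples to (quantized) parameters
        return {fingerprint(g): v for g, v in ngram_values.items()}

    def lookup(table, ngram, default=None):
        return table.get(fingerprint(ngram), default)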
Scaling Language Models
In language modeling the universe under consideration is the set of all possible n-grams of length n for given vocabulary.
Scaling Language Models
Although n-grams observed in natural language corpora are not randomly distributed within this universe no lossless data structure that we are aware of can circumvent this space-dependency on both the n-gram order and the vocabulary size.
Scaling Language Models
However, if we are willing to accept that occasionally our model will be unable to distinguish between distinct n-grams , then it is possible to store
n-grams is mentioned in 46 sentences in this paper.
Roark, Brian and Allauzen, Cyril and Riley, Michael
Introduction
Standard techniques store the observed n-grams and derive probabilities of unobserved n-grams via their longest observed suffix and “backoff” costs associated with the prefix histories of the unobserved suffixes.
Introduction
Hence the size of the model grows with the number of observed n-grams , which is very large for typical training corpora.
Introduction
These data structures permit efficient querying for specific n-grams in a model that has been stored in a fraction of the space required to store the full, exact model, though with some probability of false positives.
Preliminaries
w_i in the training corpus; F is a regularized probability estimate that provides some probability mass for unobserved n-grams; and α(h)
Preliminaries
N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored.
Preliminaries
Probabilities for the rest of the n-grams are calculated through the “otherwise” semantics in the equation above.
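A minimal sketch of this backoff semantics, assuming the stored log-probabilities and backoff weights are held in plain dictionaries (an illustration, not the authors' representation):

    # Illustrative sketch of backoff lookup: use the longest stored suffix of
    # the n-gram, adding the backoff cost of each history that had to be
    # dropped. logprob and backoff map stored n-gram tuples to log-domain values.
    def backoff_logprob(ngram, logprob, backoff):
        if ngram in logprob:
            return logprob[ngram]
        if len(ngram) == 1:
            return float("-inf")  # out-of-vocabulary word in this toy version
        history = ngram[:-1]
        return backoff.get(history, 0.0) + backoff_logprob(ngram[1:], logprob, backoff)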
n-grams is mentioned in 26 sentences in this paper.
Liu, Chang and Ng, Hwee Tou
Discussion and Future Work
In the current formulation of TESLA-CELAB, two n-grams X and Y are either synonyms which completely match each other, or are completely unrelated.
Experiments
Compared to BLEU, TESLA allows more sophisticated weighting of n-grams and measures of word similarity including synonym relations.
Experiments
The covered n-gram matching rule is then able to award tricky n-grams such as TE, Ti, /1\ [E], 1/13 [IE5 and i9}.
Motivation
For example, between ¥_l—?fi_$ and ¥_5l?, higher-order n-grams such as and still have no match, and will be penalized accordingly, even though ¥_l—?fi_5lk and ¥_5l?
Motivation
N-grams such as which cross natural word boundaries and are meaningless by themselves can be particularly tricky.
The Algorithm
Two n-grams are connected if they are identical, or if they are identified as synonyms by Cilin.
The Algorithm
Notice that all n-grams are put in the same matching problem regardless of n, unlike in translation evaluation metrics designed for European languages.
The Algorithm
This enables us to designate n-grams with different values of n as synonyms, such as (n = 2) and 5!k (n = 1).
n-grams is mentioned in 12 sentences in this paper.
Pauls, Adam and Klein, Dan
Abstract
Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques.
Introduction
The largest language models (LMs) can contain as many as several hundred billion n-grams (Brants et al., 2007), so storage is a challenge.
Introduction
Overall, we are able to store the 4 billion n-grams of the Google Web1T (Brants and Franz, 2006) cor-
Language Model Implementations
Lookup is linear in the length of the key and logarithmic in the number of n-grams .
Preliminaries
This corpus, which is on the large end of corpora typically employed in language modeling, is a collection of nearly 4 billion n-grams extracted from over a trillion tokens of English text, and has a vocabulary of about 13.5 million words.
Preliminaries
Tries represent collections of n-grams using a tree.
Preliminaries
Each node in the tree encodes a word, and paths in the tree correspond to n-grams in the collection.
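A minimal trie sketch along these lines (an illustration, not the paper's compressed encoding):

    # Illustrative sketch of a trie over n-grams: each node maps a word to a
    # child, and the path from the root to a node spells out an n-gram whose
    # count (or probability) is stored at that node.
    class TrieNode:
        def __init__(self):
            self.children = {}
            self.value = None

    def trie_insert(root, ngram, value):
        node = root
        for word in ngram:
            node = node.children.setdefault(word, TrieNode())
        node.value = value

    def trie_lookup(root, ngram):
        node = root
        for word in ngram:
            node = node.children.get(word)
            if node is None:
                return None
        return node.value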
n-grams is mentioned in 20 sentences in this paper.
DeNero, John and Chiang, David and Knight, Kevin
Computing Feature Expectations
Unigrams are produced by lexical rules, while higher-order n-grams can be produced either directly by lexical rules, or by combining constituents.
Computing Feature Expectations
The n-gram language model score of e similarly decomposes over the h in e that produce n-grams .
Computing Feature Expectations
The linear similarity measure takes the following form, where Tn is the set of n-grams:
Experimental Results
Above, we compare the precision, relative to reference translations, of sets of n-grams chosen in two ways.
Experimental Results
The left bar is the precision of the n-grams in e*.
Experimental Results
The right bar is the precision of n-grams with E[c(e, t)] > ρ. To justify this comparison, we chose ρ so that both methods of choosing n-grams gave the same n-gram recall: the fraction of n-grams in reference translations that also appeared in e* or had E[c(e, t)] > ρ.
n-grams is mentioned in 12 sentences in this paper.
Escalante, Hugo Jair and Solorio, Thamar and Montes-y-Gomez, Manuel
Abstract
This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA).
Abstract
In this work we explore the suitability of LHs over n-grams at the character-level for AA.
Abstract
We report experimental results in AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state of the art approaches.
Introduction
In particular, we consider local histograms over n-grams at the character-level obtained via the locally-weighted bag of words (LOWBOW) framework (Lebanon et al., 2007).
Introduction
Results confirm that local histograms of character n-grams are more helpful for AA than the usual global histograms of words or character n-grams (Luyckx and Daelemans, 2010); our results are superior to those reported in related works.
Introduction
We also show that local histograms over character n-grams are more helpful than local histograms over words, as originally proposed by (Lebanon et al., 2007).
Related Work
Some researchers have gone a step further and have attempted to capture sequential information by using n-grams at the word-level (Peng et al., 2004) or by discovering maximal frequent word sequences (Coyotl-Morales et al., 2006).
Related Work
Unfortunately, because of computational limitations, the latter methods cannot discover enough sequential information from documents (e.g., word n-grams are often restricted to n ∈ {1, 2, 3}, while full sequential information would be obtained with n ∈ {1, …
Related Work
Stamatatos and coworkers have studied the impact of feature selection, with character n-grams, in AA (Houvardas and Stamatatos, 2006; Stamatatos, 2006a), ensemble learning with character n-grams (Stamatatos, 2006b) and novel classification techniques based
n-grams is mentioned in 26 sentences in this paper.
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Our approach first identifies statistically salient phrases of words and parts of speech — known as n-grams — in training texts generated in conditions where the social power
Abstract
Then, we apply machine learning to train classifiers with groups of these n-grams as features.
Abstract
Unlike their works, our text classification techniques take into account the frequency of occurrence of word n-grams and part-of-speech (POS) tag sequences, and other measures of statistical salience in training data.
n-grams is mentioned in 29 sentences in this paper.
Bansal, Mohit and Klein, Dan
Introduction
Instead, we consider how to efficiently mine the Google n-grams corpus.
Introduction
This work uses a large web-scale corpus (Google n-grams ) to compute features for the full parsing task.
Web-count Features
The approach of Lauer (1995), for example, would be to take an ambiguous noun sequence like hydrogen ion exchange and compare the various counts (or associated conditional probabilities) of n-grams like hydrogen ion and hydrogen exchange.
Web-count Features
(2010) use web-scale n-grams to compute similar association statistics for longer sequences of nouns.
Web-count Features
These paraphrase features hint at the correct attachment decision by looking for web n-grams with special contexts that reveal syntax superficially.
Working with Web n-Grams
Rather than working through a search API (or scraper), we use an offline web corpus — the Google n-gram corpus (Brants and Franz, 2006) — which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams
In particular, we only use counts of n-grams of the form x * y where the gap length is ≤ 3.
Working with Web n-Grams
Next, we exploit a simple but efficient trie-based hashing algorithm to efficiently answer all of them in one pass over the n-grams corpus.
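The one-pass idea can be illustrated with a much simpler sketch than the authors' trie-based hashing: collect the queries first, then stream over an assumed tab-separated n-gram/count file:

    # Illustrative sketch (not the authors' code): gather all count queries up
    # front, then answer them in a single streaming pass over a large
    # "n-gram<TAB>count" file such as the Google n-gram corpus distribution.
    def batch_counts(queries, ngram_file_path):
        wanted = {q: 0 for q in queries}  # n-gram string -> count
        with open(ngram_file_path, encoding="utf-8") as handle:
            for line in handle:
                ngram, _, count = line.rstrip("\n").rpartition("\t")
                if ngram in wanted:
                    wanted[ngram] = int(count)
        return wanted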
n-grams is mentioned in 12 sentences in this paper.
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Background
These generally consist of a projection layer that maps words, sub-word units or n-grams to high dimensional embeddings; the latter are then combined component-wise with an operation such as summation.
Background
The trained weights in the filter m correspond to a linguistic feature detector that learns to recognise a specific class of n-grams .
Background
These n-grams have size n ≤ m, where m is the width of the filter.
Experiments
tures based on long n-grams and to hierarchically combine these features is highly beneficial.
Experiments
In the first layer, the sequence is a continuous n-gram from the input sentence; in higher layers, sequences can be made of multiple separate n-grams .
Experiments
The feature detectors learn to recognise not just single n-grams, but patterns within n-grams that have syntactic, semantic or structural significance.
Introduction
Since individual sentences are rarely observed or not observed at all, one must represent a sentence in terms of features that depend on the words and short n-grams in the sentence that are frequently observed.
Introduction
by which the features of the sentence are extracted from the features of the words or n-grams .
Properties of the Sentence Model
The filters m of the wide convolution in the first layer can learn to recognise specific n-grams that have size less or equal to the filter width m; as we see in the experiments, m in the first layer is often set to a relatively large value
Properties of the Sentence Model
The subsequence of n-grams extracted by the generalised pooling operation induces in-variance to absolute positions, but maintains their order and relative positions.
Properties of the Sentence Model
This gives the RNN excellent performance at language modelling, but it is suboptimal for remembering at once the n-grams further back in the input sentence.
n-grams is mentioned in 11 sentences in this paper.
Chan, Yee Seng and Ng, Hwee Tou
Abstract
This general framework allows us to use arbitrary similarity functions between items, and to incorporate different information in our comparison, such as n-grams , dependency relations, etc.
Introduction
In this paper, we propose a new automatic MT evaluation metric, MAXSIM, that compares a pair of system-reference sentences by extracting n-grams and dependency relations.
Introduction
Recognizing that different concepts can be expressed in a variety of ways, we allow matching across synonyms and also compute a score between two matching items (such as between two n-grams or between two dependency relations), which indicates their degree of similarity with each other.
Metric Design Considerations
We note, however, that matches between items (such as words, n-grams , etc.)
Metric Design Considerations
In this subsection, we describe in detail how we match the n-grams of a system-reference sentence pair.
Metric Design Considerations
Lemma match For the remaining set of n-grams that are not yet matched, we now relax our matching criteria by allowing a match if their corresponding lemmas match.
n-grams is mentioned in 9 sentences in this paper.
Habash, Nizar and Roth, Ryan
Experimental Settings
nw   N-grams   Normword 1/2/3-gram probabilities
lem  N-grams   Lemma 1/2/3-gram probabilities
pos  N-grams   POS 1/2/3-gram probabilities
Experimental Settings
served for word N-grams which did not appear in the models).
Experimental Settings
Like the N-grams , this number is binned; in this case there are 11 bins, with 10 spread evenly over the [0,1) range, and an extra bin for values of exactly 1 (i.e., when the word appears in every hypothesis in the set).
Results
First, Table 4 shows the effect of adding nw N-grams of successively higher orders to the word baseline.
Results
word+nw 1-gram              49.51  12.9
word+nw 1-gram+nw 2-gram    59.26  35.2
word+nw N-grams             59.33  35.3
+pos                        58.50  33.4
+pos N-grams                57.35  30.8
+lem+lem N-grams            59.63  36.0
+lem+lem N-grams+na         59.93  36.7
+lem+lem N-grams+na+nw      59.77  36.3
+lem                        60.92  38.9
+lem+na                     60.47  37.9
+lem+lem N-grams            60.44  37.9
Results
Here, the best performer is the model which utilizes the word, nw N-grams,
n-grams is mentioned in 9 sentences in this paper.
Bansal, Mohit and Burkett, David and de Melo, Gerard and Klein, Dan
Abstract
Our model incorporates heterogeneous relational evidence about both hypernymy and siblinghood, captured by semantic features based on patterns and statistics from Web n-grams and Wikipedia abstracts.
Analysis
Here, our Web n-grams dataset (which only contains frequent n-grams) and Wikipedia abstracts do not suffice and we would need to add richer Web data for such world knowledge to be reflected in the features.
Experiments
Feature sources: The n-gram semantic features are extracted from the Google n-grams corpus (Brants and Franz, 2006), a large collection of English n-grams (for n = 1 to 5) and their frequencies computed from almost 1 trillion tokens (95 billion sentences) of Web text.
Experiments
For this, we use a hash-trie on term pairs (similar to that of Bansal and Klein (2011)), and scan once through the n-gram (or abstract) set, skipping many n-grams (or abstracts) based on fast checks of missing unigrams, exceeding length, suffix mismatches, etc.
Features
The Web n-grams corpus has broad coverage but is limited to up to 5-grams, so it may not contain pattem-based evidence for various longer multi-word terms and pairs.
Features
Similar to the Web n-grams case, we also fire Wikipedia-based pattern order features.
Introduction
The belief propagation approach allows us to efficiently and effectively incorporate heterogeneous relational evidence via hypernymy and siblinghood (e.g., coordination) cues, which we capture by semantic features based on simple surface patterns and statistics from Web n-grams and Wikipedia abstracts.
Related Work
8All the patterns and counts for our Web and Wikipedia edge and sibling features described above are extracted after stemming the words in the terms, the n-grams , and the abstracts (using the Porter stemmer).
n-grams is mentioned in 8 sentences in this paper.
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Conclusion
number of parameters (unique N-grams).
Experiments
(Figure: number of unique N-grams, from 1e4 to 1e9, on the horizontal axis.)
Introduction
Another is a web-scale N-gram corpus, which is an N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006); we call it Google V1 in this paper.
Related Work
The former uses the web-scale data explicitly to create more data for training the model; while the latter explores the web-scale N-grams data (Lin et al., 2010) for compound bracketing disambiguation.
Related Work
However, we explore the web-scale data for dependency parsing, the performance improves log-linearly with the number of parameters (unique N-grams ).
Web-Derived Selectional Preference Features
N-grams appearing 40
Web-Derived Selectional Preference Features
In this paper, the selectional preferences have the same meaning with N-grams , which model the word-to-word relationships, rather than only considering the predicates and arguments relationships.
Web-Derived Selectional Preference Features
All n-grams with lower counts are discarded.
n-grams is mentioned in 8 sentences in this paper.
Bojar, Ondřej and Kos, Kamil and Mareċek, David
Extensions of SemPOS
Surprisingly BLEU-2 performed better than any other n-grams for reasons that have yet to be examined.
Problems of BLEU
Total n-grams 35,531 33,891 32,251 30,611
Problems of BLEU
Table 1: n-grams confirmed by the reference and containing error flags.
Problems of BLEU
The suspicious cases are n-grams confirmed by the reference but still containing a flag (false positives) and n-grams not confirmed despite containing no error flag (false negatives).
n-grams is mentioned in 7 sentences in this paper.
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Abstract
The large scale distributed composite language model gives drastic perplexity reduction over n-grams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the-art parsing-based machine translation system.
Conclusion
As far as we know, this is the first work of building a complex large scale distributed language model with a principled approach that is more powerful than n-grams when both trained on a very large corpus with up to a billion tokens.
Experimental results
The composite n-gram/m-SLM/PLSA model gives significant perplexity reductions over baseline n-grams, n = 3, 4, 5, and m-SLMs, m = 2, 3, 4.
Experimental results
Also, in the same study in (Charniak, 2003), they found that the outputs produced using the n-grams received higher scores from BLEU; ours did not.
Introduction
We conduct comprehensive experiments on corpora with 44 million tokens, 230 million tokens, and 1.3 billion tokens and compare perplexity results with n-grams (n=3,4,5 respectively) on these three corpora, we obtain drastic perplexity reductions.
Training algorithm
where #(g, Wl, Gl, d) is the count of semantic content g in semantic annotation string Gl of the l-th sentence Wl in document d, #(w_{i-n+1}^{i-1} w_i h_{-m}^{-1} g, Wl, Tl, Gl, d) is the count of n-grams, its m most recent exposed headwords and semantic content g in parse Tl and semantic annotation string Gl of the l-th sentence Wl in document d, #(tthhtag, Wl, Tl, d) is the count
Training algorithm
The topic of large scale distributed language models is relatively new, and existing works are restricted to n-grams only (Brants et al., 2007; Emami et al., 2007; Zhang et al., 2006).
n-grams is mentioned in 7 sentences in this paper.
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Eliciting Addressee’s Emotion
7We have excluded n-grams that matched the emotional expressions used in Section 2 to avoid overfitting.
Predicting Addressee’s Emotion
We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion
The extracted n-grams could indicate a certain action that elicits a specific emotion (e. g., ‘have a fever’ in Table 2), or a style or tone of speaking (e. g., ‘Sorry’).
Predicting Addressee’s Emotion
Likewise, we extract word n-grams from the addressee’s utterance.
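A minimal sketch of such binary word n-gram feature extraction (n ≤ 3), as an illustration only:

    # Illustrative sketch: binary word n-gram features (n <= 3) extracted from
    # a tokenized response or utterance.
    def binary_ngram_features(tokens, max_n=3):
        features = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                features.add("ngram=" + "_".join(tokens[i:i + n]))
        return features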
n-grams is mentioned in 7 sentences in this paper.
Lee, John and Seneff, Stephanie
Data 5.1 Development Data
For disambiguation with n-grams (see §3.3), we made use of the WEB 1T 5-GRAM corpus.
Data 5.1 Development Data
Prepared by Google Inc., it contains English n-grams , up to 5-grams, with their observed frequency counts from a large number of web pages.
Experiments
Table 6: The n-grams used for filtering, with examples of sentences which they are intended to differentiate.
Experiments
6.2 Disambiguation with N-grams
Experiments
For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM
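The excerpt is cut off before the exact criterion, so the following is only an assumed illustration of such a count-based filter, with a hypothetical comparison rule:

    # Assumed illustration only (the paper's exact threshold is not shown in
    # the excerpt above): keep a proposed correction only if the n-gram it
    # creates is at least as frequent in a web n-gram count table as the
    # n-gram it replaces.
    def passes_ngram_filter(original_ngram, corrected_ngram, web_counts):
        return web_counts.get(corrected_ngram, 0) >= web_counts.get(original_ngram, 0)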
n-grams is mentioned in 6 sentences in this paper.
Li, Mu and Duan, Nan and Zhang, Dongdong and Li, Chi-Ho and Zhou, Ming
Collaborative Decoding
Here we do not discriminate among different lexical n-grams and are only concerned with statistics aggregation of all n-grams of the same order.
Collaborative Decoding
G_n+(e, e′) is the n-gram agreement measure function which counts the number of occurrences in e′ of n-grams in e.
Collaborative Decoding
So the corresponding feature value will be the expected number of occurrences in H_k(f) of all n-grams in e:
Discussion
Our method uses agreement information of n-grams , and consensus features are integrated into decoding models.
Experiments
In Table 5 we show in another dimension the impact of consensus-based features by restricting the maximum order of n-grams used to compute agreement statistics.
Experiments
One reason could be that the data sparsity for high-order n-grams leads to over fitting on development data.
n-grams is mentioned in 6 sentences in this paper.
Kumar, Shankar and Macherey, Wolfgang and Dyer, Chris and Och, Franz
MERT for MBR Parameter Optimization
This linear function contains N + 1 parameters θ_0, θ_1, ..., θ_N, where N is the maximum order of the n-grams involved.
Minimum Bayes-Risk Decoding
First, the set of n-grams is extracted from the lattice.
Minimum Bayes-Risk Decoding
For a moderately large lattice, there can be several thousands of n-grams and the procedure becomes expensive.
Minimum Bayes-Risk Decoding
For each node t in the lattice, we maintain a quantity Score(w, t) for each n-gram w that lies on a path from the source node to t. Score(w, t) is the highest posterior probability among all edges on the paths that terminate on t and contain n-gram w. The forward pass requires computing the n-grams introduced by each edge; to do this, we propagate n-grams (up to maximum order − 1) terminating on each node.
n-grams is mentioned in 6 sentences in this paper.
Oh, Jong-Hoon and Torisawa, Kentaro and Hashimoto, Chikara and Sano, Motoki and De Saeger, Stijn and Ohtake, Kiyonori
Causal Relations for Why-QA
Table 4: Causal relation features: n in n-grams is n = {2, 3} and n-grams in an effect part are distinguished from those in a cause part.
Causal Relations for Why-QA
The n-grams of tf_1 and tf_2 are restricted to those containing at least one content word in a question.
Causal Relations for Why-QA
For example, word 3-gram “this/cause/QW” is extracted from This causes tsunamis in A2 for “Why is a tsunami generated?” Further, we create a word class version of word n-grams by converting the words in these word n-grams into their corresponding word class using the semantic word classes (500 classes for 5.5 million nouns) from our previous work (Oh et al., 2012).
Related Work
These previous studies took basically bag-of-words approaches and used the semantic knowledge to identify certain semantic associations using terms and n-grams .
System Architecture
employed three types of features for training the re-ranker: morphosyntactic features ( n-grams of morphemes and syntactic dependency chains), semantic word class features (semantic word classes obtained by automatic word clustering (Kazama and Torisawa, 2008)) and sentiment polarity features (word and phrase polarities).
n-grams is mentioned in 6 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Data and Approach Overview
We represent each utterance u as a vector w_u of N_u word n-grams (segments), w_uj, each of which is chosen from a vocabulary W of fixed size V. We use entity lists obtained from web sources (explained next) to identify segments in the corpus.
Data and Approach Overview
Web n-Grams (G).
Experiments
Our vocabulary consists of n—grams and segments (phrases) in utterances that are extracted using web n-grams and entity lists of §3.
MultiLayer Context Model - MCM
* Web n-Gram Context Base Measure (2%): As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions.
MultiLayer Context Model - MCM
In (1) we assume that entities (E) are more indicative of the domain compared to other n-grams (G) and should be more dominant in sampling decision for domain topics.
MultiLayer Context Model - MCM
During Gibbs sampling, we keep track of the frequency of draws of domain, dialog act and slot indicating n-grams w_j, in M_D, M_A and M_S matrices, respectively.
n-grams is mentioned in 6 sentences in this paper.
Bergsma, Shane and Lin, Dekang and Goebel, Randy
Methodology
The maximum size of a context pattern depends on the size of n-grams available in the data.
Methodology
In our n-gram collection (Section 3.4), the lengths of the n-grams range from unigrams to 5-grams, so our maximum pattern size is five.
Methodology
Shorter n-grams were not found to improve performance on development data and hence are not extracted.
n-grams is mentioned in 6 sentences in this paper.
Kiritchenko, Svetlana and Cherry, Colin
Experiments
The classifiers use a feature set designed to mimic our LT-HMM as closely as possible, including n-grams , dictionary matches, ConText output, and symptom/disease se-
Method
The transition features are modeled as simple indicators over n-grams of present codes, for values of n up to 10, the largest number of codes proposed by
Method
We found it useful to pad our n-grams with “beginning of document” tokens for sequences when fewer than n codes have been labelled as present, but found it harmful to include an end-of-document tag once labelling is complete.
Method
Note that n-grams features are only code-specific, as they are not connected to any specific trigger term.
Related work
Statistical systems trained on only text-derived features (such as n-grams ) did not show good performance due to a wide variety of medical language and a relatively small training set (Goldstein et al., 2007).
n-grams is mentioned in 5 sentences in this paper.
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
Usually all n-grams up to a predefined length limit are considered as candidate phrases.
Features
First, due to data sparsity and/or alignment model’s capability, there would exist n-grams that cannot be aligned
Features
well, for instance, n-grams that are part of a paraphrase translation or metaphorical expression.
Features
Extracting candidate translations for such kind of n-grams for the sake of improving coverage (recall) might hurt translation quality (precision).
n-grams is mentioned in 5 sentences in this paper.
Schütze, Hinrich
Experimental Setup
An important parameter of the class-based model is size of the base set, i.e., the total number of n-grams (or rather i-grams) to be clustered.
Polynomial discounting
We could add a constant to d, but one of the basic premises of the KN model, derived from the assumption that n-gram marginals should be equal to relative frequencies, is that the discount is larger for more frequent n-grams, although in many implementations of KN only the cases c = 1, c = 2, and c ≥ 3 are distinguished.
Related work
ability of a class on n-grams of lexical items (as opposed to classes) (Whittaker and Woodland, 2001; Emami and Jelinek, 2005; Uszkoreit and Brants, 2008).
Related work
Models that condition classes on lexical n-grams could be extended in a way similar to what we propose here.
Related work
Our use of classes of lexical n-grams for n > 1 has several precedents in the literature (Suhm and Waibel, 1994; Kuo and Reichl, 1999; Deligne and Sagisaka, 2000; Justo and Torres, 2009).
n-grams is mentioned in 5 sentences in this paper.
Nagata, Ryo and Whittaker, Edward
Experiments
We removed n-grams that appeared less than five times in each subcorpus in the language models.
Implications for Work in Related Domains
The experimental results show that n-grams containing articles are predictive for identifying native languages.
Implications for Work in Related Domains
Importantly, all n-grams containing articles should be used in the classifier unlike the previous methods that are based only on n-grams containing article errors.
Implications for Work in Related Domains
Besides, no articles should be explicitly coded in n-grams for taking the overuse/underuse of articles into consideration.
Methods
In this language model, content words in n-grams are replaced with their corresponding POS tags.
n-grams is mentioned in 5 sentences in this paper.
Qian, Tieyun and Liu, Bing and Chen, Li and Peng, Zhiyong
Experimental Evaluation
CNG is a profile-based method which represents the author as the N most frequent character n-grams of all his/her training texts.
Proposed Tri-Training Algorithm
The features in the character view are the character n-grams of a document.
Proposed Tri-Training Algorithm
Character n-grams are simple and easily available for any natural language.
Proposed Tri-Training Algorithm
We use four content-independent structures including n-grams of POS tags (n = 1..3) and rewrite rules (Kim et al., 2011).
Related Work
Example features include function words (Argamon et al., 2007), richness features (Gamon 2004), punctuation frequencies (Graham et al., 2005), character (Grieve, 2007), word (Burrows, 1992) and POS n-grams (Gamon, 2004; Hirst and Feiguina, 2007), rewrite rules (Halteren et al., 1996), and similarities (Qian and Liu, 2013).
n-grams is mentioned in 5 sentences in this paper.
Ligozat, Anne-Laure
Experiments
Table 1: Question classification precision for both levels of the hierarchy (features = word n-grams , classifier = libsvm)
Experiments
Using word n-grams , monolingual English classification obtains .798 correct classification for the fine grained classes, and .90 for the coarse grained classes, results which are very close to those obtained by (Zhang and Lee, 2003).
Experiments
Table 2: Question classification precision for both levels of the hierarchy (features = word n-grams with abbreviations, classifier = libsvm)
n-grams is mentioned in 4 sentences in this paper.
Dethlefs, Nina and Hastie, Helen and Cuayáhuitl, Heriberto and Lemon, Oliver
Evaluation
• n-grams represents a simple 5-gram baseline that is similar to Oh and Rudnicky (2000)’s system.
Evaluation
CRF global   3.65   3.64   3.65
CRF local    3.10*  3.19*  3.13*
CLASSiC      3.53*  3.59   3.48*
n-grams      3.01*  3.09*  3.32*
Evaluation
This difference is significant for all categories compared with CRF (local) and n-grams (using a 1-sided Mann Whitney U-test, p < 0.001).
Introduction
In addition, we compare our system with alternative surface realisation methods from the literature, namely, a rank and boost approach and n-grams .
n-grams is mentioned in 4 sentences in this paper.
Razmara, Majid and Siahbani, Maryam and Haffari, Reza and Sarkar, Anoop
Conclusion
However, OOVs can be considered as n-grams (phrases) instead of unigrams.
Experiments & Results 4.1 Experimental Setup
We did not use trigrams or larger n-grams in our experiments.
Graph-based Lexicon Induction
However, constructing such graph and doing graph propagation on it is computationally very expensive for large n-grams .
Graph-based Lexicon Induction
These phrases are n-grams up to a certain value, which can result in millions of nodes.
n-grams is mentioned in 4 sentences in this paper.
Chen, Boxing and Kuhn, Roland and Larkin, Samuel
BLEU and PORT
translation hypothesis to compute the numbers of the reference n-grams .
Experiments
Both BLEU and PORT perform matching of n-grams up to n = 4.
Experiments
In all tuning experiments, both BLEU and PORT performed lower case matching of n-grams up to n = 4.
Experiments
The BLEU-tuned and Qmean-tuned systems generate similar numbers of matching n-grams, but Qmean-tuned systems produce fewer n-grams (thus, shorter translations).
n-grams is mentioned in 4 sentences in this paper.
Varga, István and Sano, Motoki and Torisawa, Kentaro and Hashimoto, Chikara and Ohtake, Kiyonori and Kawai, Takao and Oh, Jong-Hoon and De Saeger, Stijn
Problem Report and Aid Message Recognizers
MSA1 Morpheme n-grams, syntactic dependency n-grams in the tweet and morpheme n-grams before and after the nucleus template.
Problem Report and Aid Message Recognizers
MSA2 Character n-grams of the nucleus template to capture conjugation and modality variations.
Problem Report and Aid Message Recognizers
MSA3 Morpheme and part-of-speech n-grams within the bunsetsu containing the nucleus template to capture conjugation and modality variations.
n-grams is mentioned in 4 sentences in this paper.
Veale, Tony and Li, Guofu
Divergent (Re)Categorization
To tap into a richer source of concept properties than WordNet’s glosses, we can use web n-grams .
Divergent (Re)Categorization
Consider these descriptions of a cowboy from the Google n-grams (Brants & Franz, 2006).
Divergent (Re)Categorization
So for each property P suggested by Google n-grams for a lexical concept C, we generate a like-simile for verbal behaviors such as swaggering and an as-as-simile for adjectives such as lonesome.
Summary and Conclusions
Using the Google n-grams as a source of tacit grouping constructions, we have created a comprehensive lookup table that provides Rex similarity scores for the most common (if often implicit) comparisons.
n-grams is mentioned in 4 sentences in this paper.
Chen, David
Experiments
This is mainly due to the additional minimum support constraint we added which discards many noisy lexical entries from infrequently seen n-grams .
Online Lexicon Learning Algorithm
and the corresponding navigation plan, we first segment the instruction into word tokens and construct n-grams from them.
Online Lexicon Learning Algorithm
From the corresponding navigation plan, we find all connected subgraphs of size less than or equal to m. We then update the co-occurrence counts between all the n-grams w and all the connected subgraphs g.
Online Lexicon Learning Algorithm
We also update the counts of how many examples we have encountered so far and counts of the n-grams w and subgraphs g.
n-grams is mentioned in 4 sentences in this paper.
Bansal, Mohit and Klein, Dan
Semantics via Web Features
As the source of Web information, we use the Google n-grams corpus (Brants and Franz, 2006) which contains English n-grams (n = 1 to 5) and their Web frequency counts, derived from nearly 1 trillion word tokens and 95 billion sentences.
Semantics via Web Features
Using the n-grams corpus (for n = 1 to 5), we collect co-occurrence Web-counts by allowing a varying number of wildcards between h1 and h2 in the query.
Semantics via Web Features
These clusters are derived from the V2 Google n-grams corpus.
n-grams is mentioned in 4 sentences in this paper.
Thadani, Kapil
Experiments
Following evaluations in machine translation as well as previous work in sentence compression (Unno et al., 2006; Clarke and Lapata, 2008; Martins and Smith, 2009; Napoles et al., 2011b; Thadani and McKeown, 2013), we evaluate system performance using F1 metrics over n-grams and dependency edges produced by parsing system output with RASP (Briscoe et al., 2006) and the Stanford parser.
Experiments
We report results over the following systems grouped into three categories of models: tokens + n-grams , tokens + dependencies, and joint models.
Introduction
Joint methods have also been proposed that invoke integer linear programming (ILP) formulations to simultaneously consider multiple structural inference problems—both over n-grams and input dependencies (Martins and Smith, 2009) or n-grams and all possible dependencies (Thadani and McKeown, 2013).
Multi-Structure Sentence Compression
C. In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows.
n-grams is mentioned in 4 sentences in this paper.
Ciobanu, Alina Maria and Dinu, Liviu P.
Our Approach
The features we use are character n-grams around mismatches.
Our Approach
i) n-grams around gaps, i.e., we account only for insertions and deletions;
Our Approach
ii) n-grams around any type of mismatch, i.e., we account for all three types of mismatches.
n-grams is mentioned in 4 sentences in this paper.
Chen, David and Dolan, William
Paraphrase Evaluation Metrics
We introduce a new scoring metric PINC that measures how many n-grams differ between the two sentences.
Paraphrase Evaluation Metrics
n-gram_S and n-gram_C are the lists of n-grams in the
Paraphrase Evaluation Metrics
The PINC score computes the percentage of n-grams that appear in the candidate sentence but not in the source sentence.
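A small PINC-style sketch (an illustration of the idea as described, not necessarily the exact released scorer):

    # Illustrative PINC-style sketch: for each n up to max_n, the fraction of
    # candidate n-grams that do not appear in the source sentence, averaged
    # over n (higher means more lexical novelty).
    def ngram_list(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def pinc(source_tokens, candidate_tokens, max_n=4):
        per_order = []
        for n in range(1, max_n + 1):
            candidate_grams = ngram_list(candidate_tokens, n)
            if not candidate_grams:
                continue
            source_grams = set(ngram_list(source_tokens, n))
            novel = sum(1 for g in candidate_grams if g not in source_grams)
            per_order.append(novel / len(candidate_grams))
        return sum(per_order) / len(per_order) if per_order else 0.0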
n-grams is mentioned in 4 sentences in this paper.
Li, Zhifei and Eisner, Jason and Khudanpur, Sanjeev
Variational Approximate Decoding
whose edges correspond to n-grams (weighted with negative log-probabilities) and whose vertices correspond to (n − 1)-grams.
Variational Approximate Decoding
This may be regarded as favoring n-grams that are likely to appear in the reference translation (because they are likely in the derivation forest).
Variational Approximate Decoding
However, in order to score well on the BLEU metric for MT evaluation (Papineni et al., 2001), which gives partial credit, we would also like to favor lower-order n-grams that are likely to appear in the reference, even if this means picking some less-likely high-order n-grams .
Variational vs. Min-Risk Decoding
Now, let us divide N, which contains n-gram types of different n, into several subsets Wn, each of which contains only the n-grams with a given length n. We can now rewrite (19) as follows,
n-grams is mentioned in 4 sentences in this paper.
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
For the unlabeled phrases, the set of possible target translations could be extremely large (e.g., all target language n-grams ).
Generation & Propagation
A naïve way to achieve this goal would be to extract all n-grams, from n = 1 to a maximum n-gram order, from the monolingual data, but this strategy would lead to a combinatorial explosion in the number of target phrases.
Generation & Propagation
This set of candidate phrases is filtered to include only n-grams occurring in the target monolingual corpus, and helps to prune passed-through OOV words and invalid translations.
Introduction
Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text.
n-grams is mentioned in 4 sentences in this paper.
Fine, Alex B. and Frank, Austin F. and Jaeger, T. Florian and Van Durme, Benjamin
Abstract
Contrasting the predictive ability of statistics derived from 6 different corpora, we find intuitive results showing that, e.g., a British corpus over-predicts the speed with which an American will react to the words ward and duke, and that the Google n-grams over-predicts familiarity with technology terms.
Fitting Behavioral Data 2.1 Data
2Surprisingly, fife was determined to be one of the words with the largest frequency asymmetry between Switchboard and the Google n-grams corpus.
Introduction
Specifically, we predict human data from three widely used psycholinguistic experimental paradigms—lexical decision, word naming, and picture naming—using unigram frequency estimates from Google n-grams (Brants and Franz, 2006), Switchboard (Godfrey et al., 1992), spoken and written English portions of CELEX (Baayen et al., 1995), and spoken and written portions of the British National Corpus (BNC Consortium, 2007).
Introduction
For example, Google n-grams overestimates the ease with which humans will process words related to the web (tech, code, search, site), while the Switchboard corpus—a collection of informal telephone conversations between strangers—overestimates how quickly humans will react to colloquialisms (heck, dam) and backchannels (wow, right).
n-grams is mentioned in 4 sentences in this paper.
Tu, Mei and Zhou, Yu and Zong, Chengqing
Experiments
4.4 Analysis of Different Effects of Different N-grams
Experiments
To evaluate the effects of different n-grams for our proposed transfer model, we compared the uni-/bi-/tri-gram transfer models in SMT, and illustrate the results in Fig-
Experiments
Different translation qualities along with different n-grams for transfer model.
n-grams is mentioned in 3 sentences in this paper.
Valitutti, Alessandro and Toivonen, Hannu and Doucet, Antoine and Toivanen, Jukka M.
Conclusions
In particular, we plan to extend the use of n-grams to larger contexts and consider more fine-grained tuning of other constraints, too.
Lexical Constraints for Humorous Word Substitution
Implementation Local coherence is implemented using n-grams .
Lexical Constraints for Humorous Word Substitution
To estimate the level of expectation triggered by a left-context, we rely on a vast collection of n-grams, the 2012 Google Books n-grams collection (Michel et al., 2011), and compute the cohesion of each n-gram by comparing its expected frequency (assuming word independence) to its observed number of occurrences.
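A minimal sketch of such a cohesion measure, assuming precomputed unigram and n-gram counts (an illustration, not the authors' implementation):

    # Illustrative sketch of n-gram cohesion: the observed count of an n-gram
    # divided by the count expected if its words occurred independently
    # (corpus size times the product of the unigram probabilities).
    def cohesion(ngram, observed_count, unigram_counts, total_tokens):
        expected = float(total_tokens)
        for word in ngram:
            expected *= unigram_counts.get(word, 0) / total_tokens
        return observed_count / expected if expected > 0 else 0.0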
n-grams is mentioned in 3 sentences in this paper.
Socher, Richard and Bauer, John and Manning, Christopher D. and Andrew Y., Ng
Introduction
In many natural language systems, single words and n-grams are usefully described by their distributional similarities (Brown et al., 1992), among many others.
Introduction
n-grams will never be seen during training, especially when n is large.
Introduction
In this work, we present a new solution to learn features and phrase representations even for very long, unseen n-grams .
n-grams is mentioned in 3 sentences in this paper.
Elliott, Desmond and Keller, Frank
Introduction
Recent approaches to this task have been based on slot-filling (Yang et al., 2011; Elliott and Keller, 2013), combining web-scale n-grams (Li et al., 2011), syntactic tree substitution (Mitchell et al., 2012), and description-by-retrieval (Farhadi et al., 2010; Ordonez et al., 2011; Hodosh et al., 2013).
Methodology
p_n measures the effective overlap by calculating the proportion of the maximum number of n-grams co-occurring between a candidate and a reference and the total number of n-grams in the candidate text.
Methodology
(2012); to the best of our knowledge, the only image description work to use higher-order n-grams with BLEU is Elliott and Keller (2013).
n-grams is mentioned in 3 sentences in this paper.
Wintrode, Jonathan and Khudanpur, Sanjeev
Introduction
Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
Motivation
Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation
information retrieval, and again, interpolate latent topic models with N-grams to improve retrieval performance.
n-grams is mentioned in 3 sentences in this paper.
Liu, Jenny and Haghighi, Aria
Experiments
To collect the statistics, we take each NP in the training data and consider all possible 2-gms through 5-gms that are present in the NP’s modifier sequence, allowing for nonconsecutive n-grams .
Model
Previous counting approaches can be expressed as a real-valued feature that, given all n-grams generated by a permutation of modifiers, returns the count of all these n-grams in the original training data.
Model
We might also expect permutations that contain n-grams previously seen in the training data to be more natural sounding than other permutations that generate n-grams that have not been seen before.
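A toy sketch of such a count feature, using only the contiguous n-grams of the candidate ordering (the excerpts above note that nonconsecutive n-grams are also collected in training; that refinement is omitted here):

    # Illustrative sketch of a count feature for a candidate modifier ordering:
    # sum the training-data counts of the n-grams (2 <= n <= 5) that the
    # permutation generates.
    def count_feature(permutation, training_ngram_counts, min_n=2, max_n=5):
        total = 0
        for n in range(min_n, max_n + 1):
            for i in range(len(permutation) - n + 1):
                total += training_ngram_counts.get(tuple(permutation[i:i + n]), 0)
        return total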
n-grams is mentioned in 3 sentences in this paper.
Titov, Ivan and McDonald, Ryan
The Model
The first score is computed on the basis of all the n-grams , but using a common set of weights independent of the aspect a.
The Model
Another score is computed only using n-grams associated with the related topic, but an aspect-specific set of weights is used in this computation.
The Model
b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{y,f} and J_{a,y,f} are common weights and aspect-specific weights for n-gram feature f, and p_{a,f} is equal to a fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
n-grams is mentioned in 3 sentences in this paper.
Zhang, Hao and Gildea, Daniel
Introduction
Use of longer n-grams improves translation results, but exacerbates this interaction.
Introduction
We examine the question of whether, given the reordering inherent in the machine translation problem, lower order n-grams will provide as valuable a search heuristic as they do for speech recognition.
Multi-pass LM-Integrated Decoding
where S(i, j) is the set of candidate target language words outside the span of (i, j), and B · B is the product of the upper bounds for the two on-the-border n-grams.
n-grams is mentioned in 3 sentences in this paper.
Ceylan, Hakan and Kim, Yookyung
Introduction
(Figure legend: Monte Carlo and Relative Entropy methods, using N-grams and using words.)
Related Work
The n-gram based approaches are based on the counts of character or byte n-grams , which are sequences of n characters or bytes, extracted from a corpus for each reference language.
Related Work
(Dunning, 1994) proposed a system that uses Markov Chains of byte n-grams with Bayesian Decision Rules to minimize the probability error.
n-grams is mentioned in 3 sentences in this paper.
Song, Young-In and Lee, Jung-Tae and Rim, Hae-Chang
Introduction
(Miller et al., 1999; Song and Croft, 1999) explore the use of n-grams in retrieval models.
Previous Work
Subsequently, various types of phrases, such as sequential n-grams (Mitra et al., 1997), head-modifier pairs extracted from syntactic structures (Lewis and Croft, 1990; Zhai, 1997; Dillon and Gray, 1983; Strzalkowski et al., 1994), proximity-based phrases (Turpin and Moffat, 1999), were examined with conventional retrieval models (e.g., vector space model).
Previous Work
(Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases.
n-grams is mentioned in 3 sentences in this paper.
Chambers, Nathanael and Jurafsky, Daniel
Experiments
For the web baseline (reported as Google), we stemmed all words in the Google n-grams and counted every verb v and noun n that appear in Gigaword.
How Frequent is Unseen Data?
The dotted line uses Google n-grams as training.
Models
We also avoided over-counting co-occurrences in lower order n-grams that appear again in 4- or 5-grams.
n-grams is mentioned in 3 sentences in this paper.
Tomasoni, Mattia and Huang, Minlie
Abstract
We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap.
The summarization framework
To represent sentences and answers we adopted an alternative approach to classical n-grams that could be defined bag-of-BEs.
The summarization framework
Different from n-grams, they are variant in length and depend on parsing techniques, named entity detection, part-of-speech tagging and resolution of syntactic forms such as hyponyms, pronouns, pertainyms, abbreviation and synonyms.
n-grams is mentioned in 3 sentences in this paper.
Tratz, Stephen and Hovy, Eduard
Automated Classification
To provide information related to term usage to the classifier, we extracted trigram and 4-gram features from the Web 1T Corpus (Brants and Franz, 2006), a large collection of n-grams and their counts created from approximately one trillion words of Web text.
Automated Classification
Only n-grams containing lowercase words were used.
Automated Classification
Only n-grams containing both terms (including plural forms) were extracted.
n-grams is mentioned in 3 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
where g_n(s) is the multi-set of all n-grams in a string s. In this definition, n-grams in e_i and {r_ij} are weighted by D_t(i).
Background
If the i-th training sample has a larger weight, the corresponding n-grams will have more contributions to the overall score WBLEU(E,R) .
Background
In this method, an n-gram cache is used to store the most frequently and recently accessed n-grams.
n-grams is mentioned in 3 sentences in this paper.
Pilehvar, Mohammad Taher and Jurgens, David and Navigli, Roberto
Experiment 1: Textual Similarity
0 Character n-grams which were also used as one of our additional features.
Experiment 1: Textual Similarity
Another interesting point is the high scores achieved by the Character n-grams
Experiment 1: Textual Similarity
Dataset                      Mpar   Mvid   SMTe
DW                           0.448  0.820  0.660
ADW-MF                       0.485  0.842  0.721
Explicit Semantic Analysis   0.427  0.781  0.619
Pairwise Word Similarity     0.564  0.835  0.527
Distributional Thesaurus     0.494  0.481  0.365
Character n-grams            0.658  0.771  0.554
n-grams is mentioned in 3 sentences in this paper.
Constant, Matthieu and Sigogne, Anthony and Watrin, Patrick
Evaluation
Table 3: MWE identification with CRF: base are the features corresponding to token properties and word n-grams .
MWE-dedicated Features
Word n-grams .
MWE-dedicated Features
POS n-grams .
n-grams is mentioned in 3 sentences in this paper.
Konstas, Ioannis and Lapata, Mirella
Experimental Design
Field bigrams/trigrams Analogously to the lexical features mentioned above, we introduce a series of nonlocal features that capture field n-grams , given a specific record.
Related Work
Local and nonlocal information (e.g., word n-grams , long-
Results
The 1-BEST system has some grammaticality issues, which we avoid by defining features over lexical n-grams and repeated words.
n-grams is mentioned in 3 sentences in this paper.
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Conclusion and Future Work
In this paper, we only tried Dice coefficient of n-grams and symmetrical sentence level BLEU as similarity measures.
Features and Training
Tl(e,e') is the propagating probability in equation (8), with the similarity measure Sim(e,e') defined as the Dice coefficient over the set of all n-grams in e and those in e'.
Features and Training
where NGr_n(x) is the set of n-grams in string x, and Dice(A, B) is the Dice coefficient over sets A and B:
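A minimal sketch of this Dice-over-n-grams similarity, for a single n (an illustration only):

    # Illustrative sketch: Dice coefficient over the n-gram sets of two
    # strings, Dice(A, B) = 2|A ∩ B| / (|A| + |B|).
    def ngram_set(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def dice_ngram_similarity(tokens_a, tokens_b, n=2):
        a, b = ngram_set(tokens_a, n), ngram_set(tokens_b, n)
        if not a and not b:
            return 1.0
        return 2 * len(a & b) / (len(a) + len(b))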
n-grams is mentioned in 3 sentences in this paper.
Rastrow, Ariya and Dredze, Mark and Khudanpur, Sanjeev
Introduction
While traditional LMs use word n-grams, where the n − 1 previous words predict the next word, newer models integrate long-span information in making decisions.
Introduction
For example, incorporating long-distance dependencies and syntactic structure can help the LM better predict words by complementing the predictive power of n-grams (Chelba and Jelinek, 2000; Collins et al., 2005; Filimonov and Harper, 2009; Kuo et al., 2009).
Syntactic Language Models
Structured language modeling incorporates syntactic parse trees to identify the head words in a hypothesis for modeling dependencies beyond n-grams .
n-grams is mentioned in 3 sentences in this paper.
Simianer, Patrick and Riezler, Stefan and Dyer, Chris
Experiments
Feature templates such as rule n-grams and rule shapes only work if iterative mixing (algorithm 3) or feature selection (algorithm 4) are used.
Introduction
Such features include rule ids, rule-local n-grams , or types of rule shapes.
Local Features for Synchronous CFGs
Rule n-grams: These features identify n-grams of consecutive items in a rule.
n-grams is mentioned in 3 sentences in this paper.
Celikyilmaz, Asli and Hakkani-Tur, Dilek and Tur, Gokhan and Sarikaya, Ruhi
Experiments
In addition, we extracted web n-grams and entity lists (see §3) from movie related web sites, and online blogs and reviews.
Experiments
We extract prior distributions for entities and n-grams to calculate entity list 77 and word-tag [3 priors (see §3.1).
Markov Topic Regression - MTR
We built a language model using SRILM (Stolcke, 2002) on the domain specific sources such as top wiki pages and blogs on online movie reviews, etc., to obtain the probabilities of domain-specific n-grams, up to 3-grams.
n-grams is mentioned in 3 sentences in this paper.
Chong, Tze Yuang and E. Banchs, Rafael and Chng, Eng Siong and Li, Haizhou
Perplexity Evaluation
In this experiment, we assessed the effectiveness of the TD and TO components in reducing the n-gram’s perplexity.
Perplexity Evaluation
Due to the incapability of n-grams to model long history-contexts, the TD and TO components are still effective in helping to enhance the prediction.
Related Work
There are also works on skipping irrelevant history-words in order to reveal more informative n-grams (Siu & Ostendorf 2000, Guthrie et al.
n-grams is mentioned in 3 sentences in this paper.