Index of papers in Proc. ACL 2014 that mention
  • n-grams
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Background
These generally consist of a projection layer that maps words, sub-word units or n-grams to high dimensional embeddings; the latter are then combined component-wise with an operation such as summation.
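The projection-then-summation composition described above can be sketched in a few lines. This is a minimal illustrative example, not the paper's implementation; the toy embedding table and function name are assumptions (real embeddings would be trained weights).

```python
def embed_and_sum(tokens, embeddings, dim=4):
    # Projection layer: look up each token's embedding, then
    # combine the vectors component-wise by summation.
    total = [0.0] * dim
    for t in tokens:
        vec = embeddings.get(t, [0.0] * dim)
        total = [a + b for a, b in zip(total, vec)]
    return total

# Toy embedding table (in practice these weights are trained).
emb = {"the": [1.0, 0.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.5, 0.0]}
print(embed_and_sum(["the", "cat"], emb))  # [1.0, 1.0, 0.5, 0.0]
```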
Background
The trained weights in the filter m correspond to a linguistic feature detector that learns to recognise a specific class of n-grams.
Background
These n-grams have size n ≤ m, where m is the width of the filter.
Experiments
features based on long n-grams and to hierarchically combine these features is highly beneficial.
Experiments
In the first layer, the sequence is a continuous n-gram from the input sentence; in higher layers, sequences can be made of multiple separate n-grams.
Experiments
The feature detectors learn to recognise not just single n-grams, but patterns within n-grams that have syntactic, semantic or structural significance.
Introduction
Since individual sentences are rarely observed or not observed at all, one must represent a sentence in terms of features that depend on the words and short n-grams in the sentence that are frequently observed.
Introduction
by which the features of the sentence are extracted from the features of the words or n-grams.
Properties of the Sentence Model
The filters m of the wide convolution in the first layer can learn to recognise specific n-grams that have size less than or equal to the filter width m; as we see in the experiments, m in the first layer is often set to a relatively large value.
Properties of the Sentence Model
The subsequence of n-grams extracted by the generalised pooling operation induces invariance to absolute positions, but maintains their order and relative positions.
Properties of the Sentence Model
This gives the RNN excellent performance at language modelling, but it is suboptimal for remembering at once the n-grams further back in the input sentence.
n-grams is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Burkett, David and de Melo, Gerard and Klein, Dan
Abstract
Our model incorporates heterogeneous relational evidence about both hypernymy and siblinghood, captured by semantic features based on patterns and statistics from Web n-grams and Wikipedia abstracts.
Analysis
Here, our Web n-grams dataset (which only contains frequent n-grams) and Wikipedia abstracts do not suffice and we would need to add richer Web data for such world knowledge to be reflected in the features.
Experiments
Feature sources: The n-gram semantic features are extracted from the Google n-grams corpus (Brants and Franz, 2006), a large collection of English n-grams (for n = 1 to 5) and their frequencies computed from almost 1 trillion tokens (95 billion sentences) of Web text.
Experiments
For this, we use a hash-trie on term pairs (similar to that of Bansal and Klein (2011)), and scan once through the n-gram (or abstract) set, skipping many n-grams (or abstracts) based on fast checks of missing unigrams, exceeding length, suffix mismatches, etc.
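The fast-skip scan might look roughly as follows, in much-simplified form: a plain Python set stands in for the hash-trie, and only the missing-unigram check is shown; the function name and example data are illustrative, not from the paper.

```python
def scan_ngrams(ngrams, term_pairs):
    """Collect n-grams that could match any (term1, term2) pair.

    Simplified analogue of the hash-trie scan: a cheap unigram
    check prunes most candidates before the full pair match.
    """
    needed_unigrams = {w for pair in term_pairs for w in pair}
    hits = []
    for gram in ngrams:
        words = set(gram.split())
        if not (words & needed_unigrams):
            continue  # fast skip: no relevant unigram at all
        for t1, t2 in term_pairs:
            if t1 in words and t2 in words:
                hits.append(gram)
                break
    return hits

grams = ["dog such as poodle", "cat and mouse", "red car"]
print(scan_ngrams(grams, [("dog", "poodle")]))  # ['dog such as poodle']
```

A single pass over the n-gram set suffices because every term pair is checked against each surviving candidate.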
Features
The Web n-grams corpus has broad coverage but is limited to up to 5-grams, so it may not contain pattern-based evidence for various longer multi-word terms and pairs.
Features
Similar to the Web n-grams case, we also fire Wikipedia-based pattern order features.
Introduction
The belief propagation approach allows us to efficiently and effectively incorporate heterogeneous relational evidence via hypernymy and siblinghood (e.g., coordination) cues, which we capture by semantic features based on simple surface patterns and statistics from Web n-grams and Wikipedia abstracts.
Related Work
8All the patterns and counts for our Web and Wikipedia edge and sibling features described above are extracted after stemming the words in the terms, the n-grams, and the abstracts (using the Porter stemmer).
n-grams is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Qian, Tieyun and Liu, Bing and Chen, Li and Peng, Zhiyong
Experimental Evaluation
CNG is a profile-based method which represents the author as the N most frequent character n-grams of all his/her training texts.
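A character n-gram profile of this kind can be sketched as below; this is a generic illustration of the idea (function name, parameters, and example text are assumptions, not the CNG authors' code).

```python
from collections import Counter

def cng_profile(texts, n=3, top_n=5):
    # Profile-based representation: the author's profile is the
    # top_n most frequent character n-grams over all training texts.
    counts = Counter()
    for text in texts:
        counts.update(text[i:i + n] for i in range(len(text) - n + 1))
    return [gram for gram, _ in counts.most_common(top_n)]

profile = cng_profile(["the cat sat on the mat"], n=3, top_n=3)
print(profile)  # ['the', 'he ', 'at ']
```

Because `Counter.most_common` sorts stably, ties are broken by first occurrence, which keeps profiles deterministic.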
Proposed Tri-Training Algorithm
The features in the character view are the character n-grams of a document.
Proposed Tri-Training Algorithm
Character n-grams are simple and easily available for any natural language.
Proposed Tri-Training Algorithm
We use four content-independent structures including n-grams of POS tags (n = 1..3) and rewrite rules (Kim et al., 2011).
Related Work
Example features include function words (Argamon et al., 2007), richness features (Gamon 2004), punctuation frequencies (Graham et al., 2005), character (Grieve, 2007), word (Burrows, 1992) and POS n-grams (Gamon, 2004; Hirst and Feiguina, 2007), rewrite rules (Halteren et al., 1996), and similarities (Qian and Liu, 2013).
n-grams is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Ciobanu, Alina Maria and Dinu, Liviu P.
Our Approach
The features we use are character n-grams around mismatches.
Our Approach
i) n-grams around gaps, i.e., we account only for insertions and deletions;
Our Approach
ii) n-grams around any type of mismatch, i.e., we account for all three types of mismatches.
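The two feature variants above could be sketched as follows, assuming pre-aligned strings where '-' marks a gap; the function name and the aligned word pair are hypothetical illustrations, not taken from the paper.

```python
def mismatch_ngrams(aligned_src, aligned_tgt, n=1, gaps_only=False):
    """Extract character n-gram features around mismatches.

    Assumes the two strings are already aligned (equal length,
    '-' marks an insertion/deletion gap). For each mismatched
    column, emit the surrounding window from both strings.
    If gaps_only is True, only insertion/deletion columns count
    (variant i); otherwise all mismatches count (variant ii).
    """
    feats = []
    for i, (a, b) in enumerate(zip(aligned_src, aligned_tgt)):
        if a == b:
            continue
        if gaps_only and "-" not in (a, b):
            continue
        lo, hi = max(0, i - n), i + n + 1
        feats.append((aligned_src[lo:hi], aligned_tgt[lo:hi]))
    return feats

# Hypothetical aligned cognate pair with one substitution mismatch.
print(mismatch_ngrams("lapte", "latte", n=1))  # [('apt', 'att')]
```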
n-grams is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Fine, Alex B. and Frank, Austin F. and Jaeger, T. Florian and Van Durme, Benjamin
Abstract
Contrasting the predictive ability of statistics derived from 6 different corpora, we find intuitive results showing that, e.g., a British corpus over-predicts the speed with which an American will react to the words ward and duke, and that the Google n-grams over-predicts familiarity with technology terms.
Fitting Behavioral Data 2.1 Data
2Surprisingly, fife was determined to be one of the words with the largest frequency asymmetry between Switchboard and the Google n-grams corpus.
Introduction
Specifically, we predict human data from three widely used psycholinguistic experimental paradigms—lexical decision, word naming, and picture naming—using unigram frequency estimates from Google n-grams (Brants and Franz, 2006), Switchboard (Godfrey et al., 1992), spoken and written English portions of CELEX (Baayen et al., 1995), and spoken and written portions of the British National Corpus (BNC Consortium, 2007).
Introduction
For example, Google n-grams overestimates the ease with which humans will process words related to the web (tech, code, search, site), while the Switchboard corpus—a collection of informal telephone conversations between strangers—overestimates how quickly humans will react to colloquialisms (heck, dam) and backchannels (wow, right).
n-grams is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Generation & Propagation
For the unlabeled phrases, the set of possible target translations could be extremely large (e.g., all target language n-grams).
Generation & Propagation
A naïve way to achieve this goal would be to extract all n-grams, from n = 1 to a maximum n-gram order, from the monolingual data, but this strategy would lead to a combinatorial explosion in the number of target phrases.
Generation & Propagation
This set of candidate phrases is filtered to include only n-grams occurring in the target monolingual corpus, and helps to prune passed-through OOV words and invalid translations.
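The extract-then-filter step can be sketched as follows; this is an illustrative toy (function names and example data are assumptions), not the paper's pipeline.

```python
def extract_ngrams(tokens, max_n):
    # All contiguous n-grams from n = 1 up to max_n.
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def filter_candidates(candidates, corpus_sentences, max_n):
    # Keep only candidates attested in the target monolingual
    # corpus, pruning passed-through OOVs and invalid translations.
    attested = set()
    for sent in corpus_sentences:
        attested |= extract_ngrams(sent.split(), max_n)
    return candidates & attested

cands = extract_ngrams("la casa verde".split(), 2)
corpus = ["la casa es grande", "el coche verde"]
print(sorted(filter_candidates(cands, corpus, 2)))
# [('casa',), ('la',), ('la', 'casa'), ('verde',)]
```

Intersecting with the attested set is what keeps the candidate space from exploding combinatorially.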
Introduction
Unlike previous work (Irvine and Callison-Burch, 2013a; Razmara et al., 2013), we use higher order n-grams instead of restricting to unigrams, since our approach goes beyond OOV mitigation and can enrich the entire translation model by using evidence from monolingual text.
n-grams is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Thadani, Kapil
Experiments
Following evaluations in machine translation as well as previous work in sentence compression (Unno et al., 2006; Clarke and Lapata, 2008; Martins and Smith, 2009; Napoles et al., 2011b; Thadani and McKeown, 2013), we evaluate system performance using F1 metrics over n-grams and dependency edges produced by parsing system output with RASP (Briscoe et al., 2006) and the Stanford parser.
Experiments
We report results over the following systems grouped into three categories of models: tokens + n-grams, tokens + dependencies, and joint models.
Introduction
Joint methods have also been proposed that invoke integer linear programming (ILP) formulations to simultaneously consider multiple structural inference problems—both over n-grams and input dependencies (Martins and Smith, 2009) or n-grams and all possible dependencies (Thadani and McKeown, 2013).
Multi-Structure Sentence Compression
C. In addition, we define bigram indicator variables y_ij ∈ {0, 1} to represent whether a particular order-preserving bigram ⟨t_i, t_j⟩ from S is present as a contiguous bigram in C, as well as dependency indicator variables z_ij ∈ {0, 1} corresponding to whether the dependency arc t_i → t_j is present in the dependency parse of C. The score for a given compression C can now be defined to factor over its tokens, n-grams and dependencies as follows.
n-grams is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Elliott, Desmond and Keller, Frank
Introduction
Recent approaches to this task have been based on slot-filling (Yang et al., 2011; Elliott and Keller, 2013), combining web-scale n-grams (Li et al., 2011), syntactic tree substitution (Mitchell et al., 2012), and description-by-retrieval (Farhadi et al., 2010; Ordonez et al., 2011; Hodosh et al., 2013).
Methodology
pn measures the effective overlap by calculating the proportion of the maximum number of n-grams co-occurring between a candidate and a reference to the total number of n-grams in the candidate text.
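The modified n-gram precision p_n used in BLEU can be sketched as below; this is a standard single-reference illustration (names are assumed), not the authors' exact evaluation code.

```python
from collections import Counter

def modified_precision(candidate, reference, n):
    # p_n: candidate n-gram counts clipped at the reference counts,
    # divided by the total number of n-grams in the candidate.
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand, ref = ngrams(candidate.split()), ngrams(reference.split())
    overlap = sum(min(c, ref[g]) for g, c in cand.items())
    return overlap / max(1, sum(cand.values()))

print(modified_precision("a dog sits", "a dog sits on the mat", 1))  # 1.0
```

Clipping at the reference count prevents a candidate from inflating its score by repeating an n-gram that the reference contains only once.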
Methodology
(2012); to the best of our knowledge, the only image description work to use higher-order n-grams with BLEU is Elliott and Keller (2013).
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
Experiments
4.4 Analysis of Different Effects of Different N-grams
Experiments
To evaluate the effects of different n-grams for our proposed transfer model, we compared the uni-/bi-/tri-gram transfer models in SMT, and illustrate the results in Fig-
Experiments
Different translation qualities along with different n-grams for transfer model.
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Wintrode, Jonathan and Khudanpur, Sanjeev
Introduction
Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
Motivation
Given the rise of unsupervised latent topic modeling with Latent Dirichlet Allocation (Blei et al., 2003) and similar latent variable approaches for discovering meaningful word co-occurrence patterns in large text corpora, we ought to be able to leverage these topic contexts instead of merely N-grams.
Motivation
information retrieval, and again, interpolate latent topic models with N-grams to improve retrieval performance.
n-grams is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: