Index of papers in Proc. ACL 2011 that mention
  • n-gram
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Introduction
Another is a web-scale N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006); we call it Google V1 in this paper.
Web-Derived Selectional Preference Features
One is the web, where N-gram counts are approximated by Google hits.
Web-Derived Selectional Preference Features
This N-gram corpus records how often each unique sequence of words occurs.
Web-Derived Selectional Preference Features
times or more (1 in 25 billion) are kept, and appear in the n-gram tables.
n-gram is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Each n-gram is a sequence of words, POS tags, or a combination of words and POS tags.
Abstract
To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
Abstract
The first n-gram of the set, please ^VB, would match please^RB bring^VB from the text.
n-gram is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Composite language model
The n-gram language model is essentially a word predictor: given its entire document history, it predicts the next word w_{k+1} based on the last n-1 words with probability p(w_{k+1} | w_{k-n+2}^{k}), where w_{k-n+2}^{k} = w_{k-n+2}, ..., w_k.
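To make the predictor concrete, here is a minimal count-based sketch of such an n-gram model (unsmoothed maximum-likelihood estimates; the toy corpus and function names are illustrative assumptions, not the composite model described in the paper):

```python
from collections import Counter

def train_ngram_counts(tokens, n):
    """Collect counts of n-grams and of their (n-1)-word histories."""
    ngram_counts, history_counts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngram_counts[ngram] += 1
        history_counts[ngram[:-1]] += 1
    return ngram_counts, history_counts

def p_next(word, history, ngram_counts, history_counts):
    """Maximum-likelihood estimate of p(word | last n-1 words)."""
    history = tuple(history)
    if history_counts[history] == 0:
        return 0.0
    return ngram_counts[history + (word,)] / history_counts[history]

# toy usage: a trigram model (n = 3) predicting from the last two words
tokens = "the cat sat on the mat the cat sat on the rug".split()
ngrams, hists = train_ngram_counts(tokens, 3)
print(p_next("on", ["cat", "sat"], ngrams, hists))  # 1.0 in this toy corpus
```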
Composite language model
The SLM (Chelba and Jelinek, 1998; Chelba and Jelinek, 2000) uses syntactic information beyond the regular n-gram models to capture sentence-level long-range dependencies.
Composite language model
When combining n-gram, m-order SLM and
Experimental results
The m-SLM performs competitively with its counterpart n-gram (n=m+1) on a large-scale corpus.
Introduction
The Markov chain (n-gram) source models, which predict each word on the basis of the previous n-1 words, have been the workhorses of state-of-the-art speech recognizers and machine translators that help to resolve acoustic or foreign language ambiguities by placing higher probability on more likely original underlying word strings.
Introduction
are efficient at encoding local word interactions, the n-gram model clearly ignores the rich syntactic and semantic structures that constrain natural languages.
Introduction
(2006) integrated n-gram , structured language model (SLM) (Chelba and Jelinek, 2000) and probabilistic latent semantic analysis (PLSA) (Hofmann, 2001) under the directed MRF framework (Wang et al., 2005) and studied the stochastic properties for the composite language model.
Training algorithm
Similar to SLM (Chelba and Jelinek, 2000), we adopt an N-best list approximate EM re-estimation with modular modifications to seamlessly incorporate the effect of n-gram and PLSA components.
Training algorithm
This implies that when computing the language model probability of a sentence in a client, all servers need to be contacted for each n-gram request.
Training algorithm
(2007) follows a standard MapReduce paradigm (Dean and Ghemawat, 2004): the corpus is first divided and loaded into a number of clients, and n-gram counts are collected at each client; the n-gram counts are then mapped and stored in a number of servers, resulting in exactly one server being contacted per n-gram when computing the language model probability of a sentence.
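A hedged sketch of the partitioning idea summarized above: if each n-gram is hashed to a single server, a probability query for that n-gram contacts exactly one server. The shard count, hash function, and names below are illustrative, not the cited system's implementation:

```python
import hashlib
from collections import Counter

NUM_SERVERS = 4  # illustrative; real deployments use many more shards

def server_for(ngram):
    """Deterministically map an n-gram to exactly one server shard."""
    key = " ".join(ngram).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SERVERS

def shard_counts(client_counts):
    """'Map' step: route each client's local n-gram counts to its shard."""
    shards = [Counter() for _ in range(NUM_SERVERS)]
    for counts in client_counts:
        for ngram, c in counts.items():
            shards[server_for(ngram)][ngram] += c
    return shards

# toy usage: two clients counted bigrams over their slices of the corpus
client_counts = [Counter({("the", "cat"): 3, ("cat", "sat"): 1}),
                 Counter({("the", "cat"): 2, ("on", "the"): 4})]
shards = shard_counts(client_counts)
# a later query for ("the", "cat") contacts only server_for(("the", "cat"))
print(shards[server_for(("the", "cat"))][("the", "cat")])  # 5
```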
n-gram is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen
Introduction
Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
Introduction
Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output.
Related Work
(2007) use supertag n-gram LMs.
Related Work
An n-gram language model history is also maintained at each node in the translation lattice.
Related Work
The search space is further trimmed with hypothesis recombination, which collapses lattice nodes that share a common coverage vector and n-gram state.
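The recombination step described above can be sketched as keeping only the best-scoring hypothesis for each (coverage vector, n-gram state) pair; the dictionary-based hypotheses and scores below are illustrative assumptions, not the cited decoder's data structures:

```python
def recombine(hypotheses):
    """Keep only the best hypothesis per (coverage vector, n-gram state)."""
    best = {}
    for hyp in hypotheses:
        # coverage: set of covered source positions; lm_state: last n-1 target words
        key = (frozenset(hyp["coverage"]), tuple(hyp["lm_state"]))
        if key not in best or hyp["score"] > best[key]["score"]:
            best[key] = hyp
    return list(best.values())

# toy usage: two hypotheses share coverage and LM state, so one is pruned
hyps = [
    {"coverage": {0, 1}, "lm_state": ["the", "cat"], "score": -2.1},
    {"coverage": {0, 1}, "lm_state": ["the", "cat"], "score": -3.4},
    {"coverage": {0, 1}, "lm_state": ["a", "cat"],   "score": -2.5},
]
print(len(recombine(hyps)))  # 2
```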
n-gram is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Liu, Jenny and Haghighi, Aria
Abstract
We compare our error rates to the state-of-the-art and to a strong Google n-gram count baseline.
Abstract
We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59.1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
Experiments
We keep a table mapping each unique n-gram to the number of times it has been seen in the training data.
Experiments
4.2 Google n-gram Baseline
Experiments
The Google n-gram corpus is a collection of n-gram counts drawn from public webpages with a total of one trillion tokens — around 1 billion each of unique 3-grams, 4-grams, and 5-grams, and around 300 million unique bigrams.
Results
MAXENT also outperforms the GOOGLE N-GRAM baseline for almost all test corpora and sequence lengths.
Results
For the Switchboard test corpus token and type accuracies, the GOOGLE N-GRAM baseline is more accurate than MAXENT for sequences of length 2 and overall, but the accuracy of MAXENT is competitive with that of GOOGLE N-GRAM.
Results
MAXENT also attains a maximum error reduction of 68.4% for the WSJ test corpus and an average error reduction of 41.8% when compared to GOOGLE N-GRAM.
n-gram is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Abstract
Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques.
Introduction
and Raj, 2001; Hsu and Glass, 2008), our methods are conceptually based on tabular trie encodings wherein each n-gram key is stored as the concatenation of one word (here, the last) and an offset encoding the remaining words (here, the context).
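A small, uncompressed sketch of this context encoding (not the paper's bit-packed implementation): each n-gram order gets its own sorted array, and a key is stored as the offset of its (n-1)-word context in the lower-order array plus its last word. The class name and the assumption that all lower-order prefixes are present are illustrative:

```python
import bisect
from collections import Counter

class ContextTrie:
    """Toy context-encoded trie: an n-gram key is stored as the pair
    (offset of its context in the lower-order table, last word)."""

    def __init__(self, ngram_counts, max_order):
        # tables[k] holds sorted rows (context_offset, last_word, count) for order k
        self.tables = [[] for _ in range(max_order + 1)]
        for order in range(1, max_order + 1):
            rows = [(self._offset(ng[:-1]), ng[-1], c)
                    for ng, c in ngram_counts.items() if len(ng) == order]
            self.tables[order] = sorted(rows)

    def _offset(self, ngram):
        """Row index ('offset') of an n-gram in its order's table; 0 for the empty context."""
        if not ngram:
            return 0
        table = self.tables[len(ngram)]
        key = (self._offset(ngram[:-1]), ngram[-1])
        i = bisect.bisect_left(table, key + (0,))
        if i < len(table) and table[i][:2] == key:
            return i
        raise KeyError(ngram)

    def count(self, ngram):
        """Look up the stored count of an n-gram via its context encoding."""
        return self.tables[len(ngram)][self._offset(ngram)][2]

# toy usage: lower-order prefixes must be present for the encoding to resolve
counts = Counter({("the",): 5, ("cat",): 3, ("the", "cat"): 2})
trie = ContextTrie(counts, max_order=2)
print(trie.count(("the", "cat")))  # 2
```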
Language Model Implementations
For an n-gram language model, we can apply this implementation with a slight modification: we need n sorted arrays, one for each n-gram order.
Language Model Implementations
Because our keys are sorted according to their context-encoded representation, we cannot straightforwardly answer queries about an n-gram w without first determining its context encoding.
Preliminaries
Our goal in this paper is to provide data structures that map n-gram keys to values, i.e.
Preliminaries
However, because of the sheer number of keys and values needed for n-gram language modeling, generic implementations do not work efficiently “out of the box.” In this section, we will review existing techniques for encoding the keys and values of an n-gram language model, taking care to account for every bit of memory required by each implementation.
Preliminaries
In the Web1T corpus, the most frequent n-gram occurs about 95 billion times.
n-gram is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Conclusion
Unlike previous approaches, our method combines information from letter n-gram language models and word dictionaries and provides a robust decipherment model.
Decipherment
Combining letter n-gram language models with word dictionaries: Many existing probabilistic approaches use statistical letter n-gram language models of English to assign P(p) probabilities to plaintext hypotheses during decipherment.
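A hedged sketch of how a letter n-gram LM can assign P(p) to plaintext hypotheses (add-one smoothing, the 27-symbol alphabet, and the toy training text are illustrative assumptions; the paper's LM is trained on far more data):

```python
import math
from collections import Counter

def train_letter_lm(text, n=3):
    """Count letter n-grams and their (n-1)-letter histories."""
    grams, hists = Counter(), Counter()
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        grams[g] += 1
        hists[g[:-1]] += 1
    return grams, hists

def logprob(plaintext, grams, hists, n=3, alpha=1.0, vocab=27):
    """Add-alpha smoothed log P(plaintext) under the letter n-gram LM."""
    lp = 0.0
    for i in range(n - 1, len(plaintext)):
        g = plaintext[i - n + 1:i + 1]
        lp += math.log((grams[g] + alpha) / (hists[g[:-1]] + alpha * vocab))
    return lp

# toy usage: English-like hypotheses should score higher than gibberish
grams, hists = train_letter_lm("the cat sat on the mat and the man sat on the hat")
print(logprob("the man", grams, hists) > logprob("xqz vkw", grams, hists))  # True
```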
Decipherment
We set the interpolation weights for the word and n-gram LM as (0.9, 0.1).
Decipherment
We train the letter n-gram LM on 50 million words of English text available from the Linguistic Data Consortium.
Experiments and Results
EM Method using letter n-gram LMs following the approach of Knight et al.
Experiments and Results
Letter n-gram versus Word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in +75% improvement in decipherment accuracy.
Introduction
(2006) use the Expectation Maximization (EM) algorithm (Dempster et al., 1977) to search for the best probabilistic key using letter n-gram models.
Introduction
Ravi and Knight (2008) formulate decipherment as an integer programming problem and provide an exact method to solve simple substitution ciphers by using letter n-gram models along with deterministic key constraints.
Introduction
Our new method combines information from word dictionaries along with letter n-gram models, providing a robust decipherment model which offsets the disadvantages faced by previous approaches.
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Analysis
The mid-words m which rank highly are those where the occurrence of h m a as an n-gram is a good indicator that a attaches to h (m of course does not have to actually occur in the sentence).
Analysis
However, we additionally find other cues, most notably that if the N IN sequence occurs following a capitalized determiner, it tends to indicate a nominal attachment (in the n-gram, the preposition cannot attach leftward to anything else because of the beginning of the sentence).
Analysis
These features essentially say that if two heads w1 and w2 occur in the direct coordination n-gram w1 and w2, then they are good heads to coordinate (coordination unfortunately looks the same as complementation or modification to a basic dependency model).
Introduction
(2010), which use Web-scale n-gram counts for multi-way noun bracketing decisions, though that work considers only sequences of nouns and uses only affinity-based web features.
Working with Web n-Grams
Lapata and Keller (2004) use the number of page hits as the web-count of the queried n-gram (which is problematic according to Kilgarriff (2007)).
Working with Web n-Grams
Rather than working through a search API (or scraper), we use an offline web corpus — the Google n-gram corpus (Brants and Franz, 2006) — which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams
Our system requires the counts from a large collection of these n-gram queries (around 4.5 million).
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Machine Translation as a Decipherment Task
For P (e), we use a word n-gram LM trained on monolingual English data.
Machine Translation as a Decipherment Task
Whole-segment Language Models: When using word n-gram models of English for decipherment, we find that some of the foreign sentences are decoded into sequences (such as “THANK YOU TALKING ABOUT ?”) that are not good English.
Machine Translation as a Decipherment Task
This stems from the fact that n-gram LMs have no global information about what constitutes a valid English segment.
Word Substitution Decipherment
We model P(e) using a statistical word n-gram English language model (LM).
Word Substitution Decipherment
Our method holds several other advantages over the EM approach: (1) inference using smart sampling strategies permits efficient training, allowing us to scale to large data/vocabulary sizes, (2) incremental scoring of derivations during sampling allows efficient inference even when we use higher-order n-gram LMs, (3) there are no memory bottlenecks since the full channel model and derivation lattice are never instantiated during training, and (4) prior specification allows us to learn skewed distributions that are useful here, since word substitution ciphers exhibit 1-to-1 correspondence between plaintext and cipher types.
Word Substitution Decipherment
build an English word n-gram LM, which is used in the decipherment process.
n-gram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T.
Automated Approaches to Deceptive Opinion Spam Detection
In contrast to the other strategies just discussed, our text categorization approach to deception detection allows us to model both content and context with n-gram features.
Automated Approaches to Deceptive Opinion Spam Detection
Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
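A hedged sketch of how such subsuming feature sets (UNIGRAMS within BIGRAMS+ within TRIGRAMS+) might be extracted from a lowercased, unstemmed review; the whitespace tokenization and function names are illustrative assumptions:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_sets(text):
    """UNIGRAMS, BIGRAMS+ (uni+bi), TRIGRAMS+ (uni+bi+tri); lowercased, unstemmed."""
    tokens = text.lower().split()
    unigrams = set(ngrams(tokens, 1))
    bigrams_plus = unigrams | set(ngrams(tokens, 2))
    trigrams_plus = bigrams_plus | set(ngrams(tokens, 3))
    return unigrams, bigrams_plus, trigrams_plus

# toy usage: each larger feature set subsumes the previous one
uni, bi_plus, tri_plus = feature_sets("The room was very clean")
print(uni <= bi_plus <= tri_plus)  # True
```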
Automated Approaches to Deceptive Opinion Spam Detection
We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Introduction
Notably, a combined classifier with both n-gram and psychological deception features achieves nearly 90% cross-validated accuracy on this task.
Results and Discussion
Surprisingly, models trained only on UNIGRAMS (the simplest n-gram feature set) outperform all non-text-categorization approaches, and models trained on BIGRAMS+ perform even better (one-tailed sign test p = 0.07).
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Escalante, Hugo Jair and Solorio, Thamar and Montes-y-Gomez, Manuel
Experiments and Results
For our character n-gram experiments, we obtained LOWBOW representations for character 3-grams (only n-grams of size n = 3 were used) considering the 2,500 most common n-grams.
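A hedged sketch of the character 3-gram vocabulary selection described above (keeping only the most common n-grams); it omits the LOWBOW local-histogram machinery itself, and the document list and function names are illustrative:

```python
from collections import Counter

def top_char_ngrams(documents, n=3, k=2500):
    """Select the k most common character n-grams across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc[i:i + n] for i in range(len(doc) - n + 1))
    return [gram for gram, _ in counts.most_common(k)]

def char_ngram_vector(doc, vocab, n=3):
    """Represent a document as counts over the selected n-gram vocabulary."""
    counts = Counter(doc[i:i + n] for i in range(len(doc) - n + 1))
    return [counts[g] for g in vocab]

# toy usage with a tiny vocabulary instead of the 2,500 most common 3-grams
docs = ["the cat sat", "the dog sat"]
vocab = top_char_ngrams(docs, k=5)
print(char_ngram_vector("the cat", vocab))
```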
Experiments and Results
Also, n-gram information is more dense in documents than word-level information.
Related Work
(2003) propose the use of language models at the n-gram character-level for AA, whereas Keselj et al.
Related Work
on characters at the n-gram level (Plakias and Stamatatos, 2008a).
Related Work
Acceptable performance in AA has been reported with character n-gram representations.
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chen, David and Dolan, William
Abstract
The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates.
Paraphrase Evaluation Metrics
Thus, no n-gram overlaps are required to determine the semantic adequacy of the paraphrase candidates.
Paraphrase Evaluation Metrics
In essence, it is the inverse of BLEU since we want to minimize the number of n-gram overlaps between the two sentences.
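One natural way to make this "inverse of BLEU" concrete is to average, over n-gram orders up to the maximum N, one minus the candidate's n-gram precision against the source sentence; this is a hedged illustration, not necessarily the paper's exact metric:

```latex
\mathrm{Dissim}(c, s) \;=\; \frac{1}{N} \sum_{n=1}^{N}
  \left( 1 - \frac{\lvert \mathrm{ngrams}_n(c) \cap \mathrm{ngrams}_n(s) \rvert}
                  {\lvert \mathrm{ngrams}_n(c) \rvert} \right)
```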
Paraphrase Evaluation Metrics
where N is the maximum n-gram considered and n-
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lo, Chi-kiu and Wu, Dekai
Abstract
As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent.
Abstract
N-gram based metrics assume that “good” translations tend to share the same lexical choices as the reference translations.
Abstract
As MT systems improve, the shortcomings of the n-gram based evaluation metrics are becoming more apparent.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bendersky, Michael and Croft, W. Bruce and Smith, David A.
Experiments
The SEG-I method requires access to a large web n-gram corpus (Brants and Franz, 2006).
Experiments
where S_Q is the set of all possible query segmentations, S is a possible segmentation, s is a segment in S, and count(s) is the frequency of s in the web n-gram corpus.
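A hedged example of a count-based segmentation score assembled from the quantities defined above; the product form and the argmax are illustrative assumptions, not necessarily the paper's exact estimator:

```latex
\hat{S} \;=\; \operatorname*{arg\,max}_{S \in \mathcal{S}_Q} \; \prod_{s \in S} \mathrm{count}(s)
```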
Experiments
(2009), and include, among others, n-gram frequencies in a sample of a query log, web corpus and Wikipedia titles.
Independent Query Annotations
(2010), an estimate of p(C_i | r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lin, Ziheng and Ng, Hwee Tou and Kan, Min-Yen
Conclusion
In our approach, n-gram sub-sequences of transitions per term in the discourse role matrix then constitute the more fine-grained evidence used in our model to distinguish coherence from incoherence.
Using Discourse Relations
tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text.
Using Discourse Relations
In our pilot work where we implemented such a basic model with n-gram features for relation transitions, the performance was very poor.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lin, Hui and Bilmes, Jeff
Submodularity in Summarization
By definition (Lin, 2004), ROUGE-N is the n-gram recall between a candidate summary and a set of reference summaries.
Submodularity in Summarization
Precisely, let S be the candidate summary (a set of sentences extracted from the ground set V), C_e : 2^V → Z_+ be the number of times n-gram e occurs in summary S, and R_i be the set of n-grams contained in the reference summary i (suppose we have K reference summaries, i.e., i = 1, ..., K).
Submodularity in Summarization
where r_{e,i} is the number of times n-gram e occurs in reference summary i.
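Assembling the quantities defined in these excerpts, the multi-reference ROUGE-N recall being described takes the following form; this is a hedged reconstruction, and the paper's exact expression may differ in detail:

```latex
\mathrm{ROUGE\text{-}N}(S) \;=\;
  \frac{\sum_{i=1}^{K} \sum_{e \in R_i} \min\bigl(C_e(S),\, r_{e,i}\bigr)}
       {\sum_{i=1}^{K} \sum_{e \in R_i} r_{e,i}}
```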
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Rush, Alexander M. and Collins, Michael
Background: Hypergraphs
The second step is to integrate an n-gram language model with this hypergraph.
Introduction
The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. The hypergraph represents a large set of possible translations, and is created by applying a synchronous grammar to the source language string.
Introduction
The language model is then used to score the translations encoded by the hypergraph. Decoding with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Models
The N-gram approximation of the joint probability can be defined in terms of multigrams q_i as:
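A hedged sketch of what such an N-gram approximation over a multigram sequence q_1, ..., q_K typically looks like; the exact conditioning used in the paper may differ:

```latex
p(q_1, \ldots, q_K) \;\approx\; \prod_{i=1}^{K} p\bigl(q_i \mid q_{i-N+1}, \ldots, q_{i-1}\bigr)
```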
Models
N-gram models of order > 1 did not work well because these models tended to learn noise (information from non-transliteration pairs) in the training data.
Previous Research
(2010) submitted another system based on a standard n-gram kernel which ranked first for the English/Hindi and English/Tamil tasks. For the English/Arabic task, the transliteration mining system of Noeman and Madkour (2010) was best.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zarriess, Sina and Cahill, Aoife and Kuhn, Jonas
Experimental Setup
Standard n-gram features are also used as features. The feature model is built as follows: for every lemma in the f-structure, we extract a set of morphological properties (definiteness, person, pronominal status, etc.)
Experimental Setup
use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams.
Related Work
The first widely known data-driven approach to surface realisation, or tactical generation (Langkilde and Knight, 1998), used language-model n-gram statistics on a word lattice of candidate realisations to guide a ranker.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: