Index of papers in Proc. ACL 2011 that mention
  • n-gram
Zhou, Guangyou and Zhao, Jun and Liu, Kang and Cai, Li
Introduction
Another is a web-scale N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006); we call it Google V1 in this paper.
Web-Derived Selectional Preference Features
One is the web, where N-gram counts are approximated by Google hits.
Web-Derived Selectional Preference Features
This N-gram corpus records how often each unique sequence of words occurs.
Web-Derived Selectional Preference Features
times or more (1 in 25 billion) are kept, and appear in the n-gram tables.
n-gram is mentioned in 27 sentences in this paper.
Topics mentioned in this paper:
Bramsen, Philip and Escobar-Molano, Martha and Patel, Ami and Alonso, Rafael
Abstract
Each n-gram is a sequence of words, POS tags, or a combination of words and POS tags.
Abstract
To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram either has the form word or ^tag):
Abstract
The first n-gram of the set, please ^VB, would match please^RB bring^VB from the text.
n-gram is mentioned in 21 sentences in this paper.
Topics mentioned in this paper:
Tan, Ming and Zhou, Wenli and Zheng, Lei and Wang, Shaojun
Composite language model
The n-gram language model is essentially a word predictor: given its entire document history, it predicts the next word w_{k+1} based on the last n-1 words with probability p(w_{k+1} | w_{k-n+2}^{k}), where w_{k-n+2}^{k} = w_{k-n+2}, ..., w_k.
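To make the predictor concrete, here is a minimal count-based sketch of such an n-gram model (unsmoothed maximum-likelihood estimates; the toy corpus and function names are illustrative assumptions, not the composite model described in the paper):

```python
from collections import Counter

def train_ngram_counts(tokens, n):
    """Collect counts of n-grams and of their (n-1)-word histories."""
    ngram_counts, history_counts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        ngram = tuple(tokens[i:i + n])
        ngram_counts[ngram] += 1
        history_counts[ngram[:-1]] += 1
    return ngram_counts, history_counts

def p_next(word, history, ngram_counts, history_counts):
    """Maximum-likelihood estimate of p(word | last n-1 words)."""
    history = tuple(history)
    if history_counts[history] == 0:
        return 0.0
    return ngram_counts[history + (word,)] / history_counts[history]

# toy usage: a trigram model (n = 3) predicting from the last two words
tokens = "the cat sat on the mat the cat sat on the rug".split()
ngrams, hists = train_ngram_counts(tokens, 3)
print(p_next("on", ["cat", "sat"], ngrams, hists))  # 1.0 in this toy corpus
```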
Composite language model
The SLM (Chelba and Jelinek, 1998; Chelba and Jelinek, 2000) uses syntactic information beyond the regular n-gram models to capture sentence-level long-range dependencies.
Composite language model
When combining n-gram, m-order SLM and
Experimental results
The m-SLM performs competitively with its counterpart n-gram (n=m+1) on a large-scale corpus.
Introduction
The Markov chain (n-gram) source models, which predict each word on the basis of the previous n-1 words, have been the workhorses of state-of-the-art speech recognizers and machine translators that help to resolve acoustic or foreign language ambiguities by placing higher probability on more likely original underlying word strings.
Introduction
are efficient at encoding local word interactions, the n-gram model clearly ignores the rich syntactic and semantic structures that constrain natural languages.
Introduction
(2006) integrated n-gram , structured language model (SLM) (Chelba and Jelinek, 2000) and probabilistic latent semantic analysis (PLSA) (Hofmann, 2001) under the directed MRF framework (Wang et al., 2005) and studied the stochastic properties for the composite language model.
Training algorithm
Similar to SLM (Chelba and Jelinek, 2000), we adopt an N-best list approximate EM re-estimation with modular modifications to seamlessly incorporate the effect of n-gram and PLSA components.
Training algorithm
This implies that when computing the language model probability of a sentence in a client, all servers need to be contacted for each n-gram request.
Training algorithm
(2007) follows a standard MapReduce paradigm (Dean and Ghemawat, 2004): the corpus is first divided and loaded into a number of clients, and n-gram counts are collected at each client; the n-gram counts are then mapped and stored in a number of servers, resulting in exactly one server being contacted per n-gram when computing the language model probability of a sentence.
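A hedged sketch of the partitioning idea summarized above: if each n-gram is hashed to a single server, a probability query for that n-gram contacts exactly one server. The shard count, hash function, and names below are illustrative, not the cited system's implementation:

```python
import hashlib
from collections import Counter

NUM_SERVERS = 4  # illustrative; real deployments use many more shards

def server_for(ngram):
    """Deterministically map an n-gram to exactly one server shard."""
    key = " ".join(ngram).encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % NUM_SERVERS

def shard_counts(client_counts):
    """'Map' step: route each client's local n-gram counts to its shard."""
    shards = [Counter() for _ in range(NUM_SERVERS)]
    for counts in client_counts:
        for ngram, c in counts.items():
            shards[server_for(ngram)][ngram] += c
    return shards

# toy usage: two clients counted bigrams over their slices of the corpus
client_counts = [Counter({("the", "cat"): 3, ("cat", "sat"): 1}),
                 Counter({("the", "cat"): 2, ("on", "the"): 4})]
shards = shard_counts(client_counts)
# a later query for ("the", "cat") contacts only server_for(("the", "cat"))
print(shards[server_for(("the", "cat"))][("the", "cat")])  # 5
```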
n-gram is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
Schwartz, Lane and Callison-Burch, Chris and Schuler, William and Wu, Stephen
Introduction
Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models.
Introduction
Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output.
Related Work
(2007) use supertag n-gram LMs.
Related Work
An n-gram language model history is also maintained at each node in the translation lattice.
Related Work
The search space is further trimmed with hypothesis recombination, which collapses lattice nodes that share a common coverage vector and n-gram state.
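The recombination step described above can be sketched as keeping only the best-scoring hypothesis for each (coverage vector, n-gram state) pair; the dictionary-based hypotheses and scores below are illustrative assumptions, not the cited decoder's data structures:

```python
def recombine(hypotheses):
    """Keep only the best hypothesis per (coverage vector, n-gram state)."""
    best = {}
    for hyp in hypotheses:
        # coverage: set of covered source positions; lm_state: last n-1 target words
        key = (frozenset(hyp["coverage"]), tuple(hyp["lm_state"]))
        if key not in best or hyp["score"] > best[key]["score"]:
            best[key] = hyp
    return list(best.values())

# toy usage: two hypotheses share coverage and LM state, so one is pruned
hyps = [
    {"coverage": {0, 1}, "lm_state": ["the", "cat"], "score": -2.1},
    {"coverage": {0, 1}, "lm_state": ["the", "cat"], "score": -3.4},
    {"coverage": {0, 1}, "lm_state": ["a", "cat"],   "score": -2.5},
]
print(len(recombine(hyps)))  # 2
```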
n-gram is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Liu, Jenny and Haghighi, Aria
Abstract
We compare our error rates to the state-of-the-art and to a strong Google n-gram count baseline.
Abstract
We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59.1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline.
Experiments
We keep a table mapping each unique n-gram to the number of times it has been seen in the training data.
Experiments
4.2 Google n-gram Baseline
Experiments
The Google n-gram corpus is a collection of n-gram counts drawn from public webpages with a total of one trillion tokens — around 1 billion each of unique 3-grams, 4-grams, and 5-grams, and around 300 million unique bigrams.
Results
MAXENT also outperforms the GOOGLE N-GRAM baseline for almost all test corpora and sequence lengths.
Results
For the Switchboard test corpus token and type accuracies, the GOOGLE N-GRAM baseline is more accurate than MAXENT for sequences of length 2 and overall, but the accuracy of MAXENT is competitive with that of GOOGLE N-GRAM.
Results
MAXENT also attains a maximum error reduction of 68.4% for the WSJ test corpus and an average error reduction of 41.8% when compared to GOOGLE N-GRAM.
n-gram is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Pauls, Adam and Klein, Dan
Abstract
Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques.
Introduction
and Raj, 2001; Hsu and Glass, 2008), our methods are conceptually based on tabular trie encodings wherein each n-gram key is stored as the concatenation of one word (here, the last) and an offset encoding the remaining words (here, the context).
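A small, uncompressed sketch of this context encoding (not the paper's bit-packed implementation): each n-gram order gets its own sorted array, and a key is stored as the offset of its (n-1)-word context in the lower-order array plus its last word. The class name and the assumption that all lower-order prefixes are present are illustrative:

```python
import bisect
from collections import Counter

class ContextTrie:
    """Toy context-encoded trie: an n-gram key is stored as the pair
    (offset of its context in the lower-order table, last word)."""

    def __init__(self, ngram_counts, max_order):
        # tables[k] holds sorted rows (context_offset, last_word, count) for order k
        self.tables = [[] for _ in range(max_order + 1)]
        for order in range(1, max_order + 1):
            rows = [(self._offset(ng[:-1]), ng[-1], c)
                    for ng, c in ngram_counts.items() if len(ng) == order]
            self.tables[order] = sorted(rows)

    def _offset(self, ngram):
        """Row index ('offset') of an n-gram in its order's table; 0 for the empty context."""
        if not ngram:
            return 0
        table = self.tables[len(ngram)]
        key = (self._offset(ngram[:-1]), ngram[-1])
        i = bisect.bisect_left(table, key + (0,))
        if i < len(table) and table[i][:2] == key:
            return i
        raise KeyError(ngram)

    def count(self, ngram):
        """Look up the stored count of an n-gram via its context encoding."""
        return self.tables[len(ngram)][self._offset(ngram)][2]

# toy usage: lower-order prefixes must be present for the encoding to resolve
counts = Counter({("the",): 5, ("cat",): 3, ("the", "cat"): 2})
trie = ContextTrie(counts, max_order=2)
print(trie.count(("the", "cat")))  # 2
```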
Language Model Implementations
For an n-gram language model, we can apply this implementation with a slight modification: we need n sorted arrays, one for each n-gram order.
Language Model Implementations
Because our keys are sorted according to their context-encoded representation, we cannot straightforwardly answer queries about an n-gram w without first determining its context encoding.
Preliminaries
Our goal in this paper is to provide data structures that map n-gram keys to values, i.e.
Preliminaries
However, because of the sheer number of keys and values needed for n-gram language modeling, generic implementations do not work efficiently “out of the box.” In this section, we will review existing techniques for encoding the keys and values of an n-gram language model, taking care to account for every bit of memory required by each implementation.
Preliminaries
In the Web1T corpus, the most frequent n-gram occurs about 95 billion times.
n-gram is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Conclusion
Unlike previous approaches, our method combines information from letter n-gram language models and word dictionaries and provides a robust decipherment model.
Decipherment
Combining letter n-gram language models with word dictionaries: Many existing probabilistic approaches use statistical letter n-gram language models of English to assign P(p) probabilities to plaintext hypotheses during decipherment.
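A hedged sketch of how a letter n-gram LM can assign P(p) to plaintext hypotheses (add-one smoothing, the 27-symbol alphabet, and the toy training text are illustrative assumptions; the paper's LM is trained on far more data):

```python
import math
from collections import Counter

def train_letter_lm(text, n=3):
    """Count letter n-grams and their (n-1)-letter histories."""
    grams, hists = Counter(), Counter()
    for i in range(len(text) - n + 1):
        g = text[i:i + n]
        grams[g] += 1
        hists[g[:-1]] += 1
    return grams, hists

def logprob(plaintext, grams, hists, n=3, alpha=1.0, vocab=27):
    """Add-alpha smoothed log P(plaintext) under the letter n-gram LM."""
    lp = 0.0
    for i in range(n - 1, len(plaintext)):
        g = plaintext[i - n + 1:i + 1]
        lp += math.log((grams[g] + alpha) / (hists[g[:-1]] + alpha * vocab))
    return lp

# toy usage: English-like hypotheses should score higher than gibberish
grams, hists = train_letter_lm("the cat sat on the mat and the man sat on the hat")
print(logprob("the man", grams, hists) > logprob("xqz vkw", grams, hists))  # True
```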
Decipherment
We set the interpolation weights for the word and n-gram LM as (0.9, 0.1).
Decipherment
We train the letter n-gram LM on 50 million words of English text available from the Linguistic Data Consortium.
Experiments and Results
EM Method using letter n-gram LMs following the approach of Knight et al.
Experiments and Results
Letter n-gram versus Word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in +75% improvement in decipherment accuracy.
Introduction
(2006) use the Expectation Maximization (EM) algorithm (Dempster et al., 1977) to search for the best probabilistic key using letter n-gram models.
Introduction
Ravi and Knight (2008) formulate decipherment as an integer programming problem and provide an exact method to solve simple substitution ciphers by using letter n-gram models along with deterministic key constraints.
Introduction
Our new method combines information from word dictionaries along with letter n-gram models, providing a robust decipherment model which offsets the disadvantages faced by previous approaches.
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Analysis
The mid-words m which rank highly are those where the occurrence of h m a as an n-gram is a good indicator that a attaches to h (m of course does not have to actually occur in the sentence).
Analysis
However, we additionally find other cues, most notably that if the N IN sequence occurs following a capitalized determiner, it tends to indicate a nominal attachment (in the n-gram, the preposition cannot attach leftward to anything else because of the beginning of the sentence).
Analysis
These features essentially say that if two heads w1 and w2 occur in the direct coordination n-gram w1 and w2, then they are good heads to coordinate (coordination unfortunately looks the same as complementation or modification to a basic dependency model).
Introduction
(2010), which use Web-scale n-gram counts for multi-way noun bracketing decisions, though that work considers only sequences of nouns and uses only affinity-based web features.
Working with Web n-Grams
Lapata and Keller (2004) use the number of page hits as the web-count of the queried n-gram (which is problematic according to Kilgarriff (2007)).
Working with Web n-Grams
Rather than working through a search API (or scraper), we use an offline web corpus — the Google n-gram corpus (Brants and Franz, 2006) — which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams
Our system requires the counts from a large collection of these n-gram queries (around 4.5 million).
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Ravi, Sujith and Knight, Kevin
Machine Translation as a Decipherment Task
For P (e), we use a word n-gram LM trained on monolingual English data.
Machine Translation as a Decipherment Task
Whole-segment Language Models: When using word n-gram models of English for decipherment, we find that some of the foreign sentences are decoded into sequences (such as “THANK YOU TALKING ABOUT ?”) that are not good English.
Machine Translation as a Decipherment Task
This stems from the fact that n-gram LMs have no global information about what constitutes a valid English segment.
Word Substitution Decipherment
We model P(e) using a statistical word n-gram English language model (LM).
Word Substitution Decipherment
Our method holds several other advantages over the EM approach: (1) inference using smart sampling strategies permits efficient training, allowing us to scale to large data/vocabulary sizes, (2) incremental scoring of derivations during sampling allows efficient inference even when we use higher-order n-gram LMs, (3) there are no memory bottlenecks since the full channel model and derivation lattice are never instantiated during training, and (4) prior specification allows us to learn skewed distributions that are useful here, since word substitution ciphers exhibit 1-to-1 correspondence between plaintext and cipher types.
Word Substitution Decipherment
build an English word n-gram LM, which is used in the decipherment process.
n-gram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Ott, Myle and Choi, Yejin and Cardie, Claire and Hancock, Jeffrey T.
Automated Approaches to Deceptive Opinion Spam Detection
In contrast to the other strategies just discussed, our text categorization approach to deception detection allows us to model both content and context with n-gram features.
Automated Approaches to Deceptive Opinion Spam Detection
Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set.
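A hedged sketch of how such subsuming feature sets (UNIGRAMS within BIGRAMS+ within TRIGRAMS+) might be extracted from a lowercased, unstemmed review; the whitespace tokenization and function names are illustrative assumptions:

```python
def ngrams(tokens, n):
    """All contiguous n-grams of the token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def feature_sets(text):
    """UNIGRAMS, BIGRAMS+ (uni+bi), TRIGRAMS+ (uni+bi+tri); lowercased, unstemmed."""
    tokens = text.lower().split()
    unigrams = set(ngrams(tokens, 1))
    bigrams_plus = unigrams | set(ngrams(tokens, 2))
    trigrams_plus = bigrams_plus | set(ngrams(tokens, 3))
    return unigrams, bigrams_plus, trigrams_plus

# toy usage: each larger feature set subsumes the previous one
uni, bi_plus, tri_plus = feature_sets("The room was very clean")
print(uni <= bi_plus <= tri_plus)  # True
```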
Automated Approaches to Deceptive Opinion Spam Detection
We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996).
Introduction
Notably, a combined classifier with both n-gram and psychological deception features achieves nearly 90% cross-validated accuracy on this task.
Results and Discussion
Surprisingly, models trained only on UNIGRAMS (the simplest n-gram feature set) outperform all non-text-categorization approaches, and models trained on BIGRAMS+ perform even better (one-tailed sign test p = 0.07).
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Escalante, Hugo Jair and Solorio, Thamar and Montes-y-Gomez, Manuel
Experiments and Results
For our character n-gram experiments, we obtained LOWBOW representations for character 3-grams (only n-grams of size n = 3 were used) considering the 2,500 most common n-grams.
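A hedged sketch of the character 3-gram vocabulary selection described above (keeping only the most common n-grams); it omits the LOWBOW local-histogram machinery itself, and the document list and function names are illustrative:

```python
from collections import Counter

def top_char_ngrams(documents, n=3, k=2500):
    """Select the k most common character n-grams across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(doc[i:i + n] for i in range(len(doc) - n + 1))
    return [gram for gram, _ in counts.most_common(k)]

def char_ngram_vector(doc, vocab, n=3):
    """Represent a document as counts over the selected n-gram vocabulary."""
    counts = Counter(doc[i:i + n] for i in range(len(doc) - n + 1))
    return [counts[g] for g in vocab]

# toy usage with a tiny vocabulary instead of the 2,500 most common 3-grams
docs = ["the cat sat", "the dog sat"]
vocab = top_char_ngrams(docs, k=5)
print(char_ngram_vector("the cat", vocab))
```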
Experiments and Results
Also, n-gram information is more dense in documents than word-level information.
Related Work
(2003) propose the use of language models at the n-gram character-level for AA, whereas Keselj et al.
Related Work
on characters at the n-gram level (Plakias and Stamatatos, 2008a).
Related Work
Acceptable performance in AA has been reported with character n-gram representations.
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chen, David and Dolan, William
Abstract
The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates.
Paraphrase Evaluation Metrics
Thus, no n-gram overlaps are required to determine the semantic adequacy of the paraphrase candidates.
Paraphrase Evaluation Metrics
In essence, it is the inverse of BLEU since we want to minimize the number of n-gram overlaps between the two sentences.
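One natural way to make this "inverse of BLEU" concrete is to average, over n-gram orders up to the maximum N, one minus the candidate's n-gram precision against the source sentence; this is a hedged illustration, not necessarily the paper's exact metric:

```latex
\mathrm{Dissim}(c, s) \;=\; \frac{1}{N} \sum_{n=1}^{N}
  \left( 1 - \frac{\lvert \mathrm{ngrams}_n(c) \cap \mathrm{ngrams}_n(s) \rvert}
                  {\lvert \mathrm{ngrams}_n(c) \rvert} \right)
```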
Paraphrase Evaluation Metrics
where N is the maximum n-gram considered and n-
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Lo, Chi-kiu and Wu, Dekai
Abstract
As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent.
Abstract
N-gram based metrics assume that “good” translations tend to share the same lexical choices as the reference translations.
Abstract
As MT systems improve, the shortcomings of the n-gram based evaluation metrics are becoming more apparent.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bendersky, Michael and Croft, W. Bruce and Smith, David A.
Experiments
The SEG-I method requires access to a large web n-gram corpus (Brants and Franz, 2006).
Experiments
where S_Q is the set of all possible query segmentations, S is a possible segmentation, s is a segment in S, and count(s) is the frequency of s in the web n-gram corpus.
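A hedged example of a count-based segmentation score assembled from the quantities defined above; the product form and the argmax are illustrative assumptions, not necessarily the paper's exact estimator:

```latex
\hat{S} \;=\; \operatorname*{arg\,max}_{S \in \mathcal{S}_Q} \; \prod_{s \in S} \mathrm{count}(s)
```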
Experiments
(2009), and include, among others, n-gram frequencies in a sample of a query log, web corpus and Wikipedia titles.
Independent Query Annotations
(2010), an estimate of p(C_i | r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Lin, Ziheng and Ng, Hwee Tou and Kan, Min-Yen
Conclusion
In our approach, n-gram sub-sequences of transitions per term in the discourse role matrix then constitute the more fine-grained evidence used in our model to distinguish coherence from incoherence.
Using Discourse Relations
tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text.
Using Discourse Relations
In our pilot work where we implemented such a basic model with n-gram features for relation transitions, the performance was very poor.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Lin, Hui and Bilmes, Jeff
Submodularity in Summarization
By definition (Lin, 2004), ROUGE-N is the n-gram recall between a candidate summary and a set of reference summaries.
Submodularity in Summarization
Precisely, let S be the candidate summary (a set of sentences extracted from the ground set V), C_e : 2^V → Z_+ be the number of times n-gram e occurs in summary S, and R_i be the set of n-grams contained in the reference summary i (suppose we have K reference summaries, i.e., i = 1, ..., K).
Submodularity in Summarization
where r_{e,i} is the number of times n-gram e occurs in reference summary i.
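Assembling the quantities defined in these excerpts, the multi-reference ROUGE-N recall being described takes the following form; this is a hedged reconstruction, and the paper's exact expression may differ in detail:

```latex
\mathrm{ROUGE\text{-}N}(S) \;=\;
  \frac{\sum_{i=1}^{K} \sum_{e \in R_i} \min\bigl(C_e(S),\, r_{e,i}\bigr)}
       {\sum_{i=1}^{K} \sum_{e \in R_i} r_{e,i}}
```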
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Rush, Alexander M. and Collins, Michael
Background: Hypergraphs
The second step is to integrate an n-gram language model with this hypergraph.
Introduction
The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. The hypergraph represents a large set of possible translations, and is created by applying a synchronous grammar to the source language string.
Introduction
The language model is then used to score the translations encoded by the hypergraph. Decoding with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Models
The N-gram approximation of the joint probability can be defined in terms of multigrams q_i as:
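A hedged sketch of what such an N-gram approximation over a multigram sequence q_1, ..., q_K typically looks like; the exact conditioning used in the paper may differ:

```latex
p(q_1, \ldots, q_K) \;\approx\; \prod_{i=1}^{K} p\bigl(q_i \mid q_{i-N+1}, \ldots, q_{i-1}\bigr)
```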
Models
N-gram models of order > 1 did not work well because these models tended to learn noise (information from non-transliteration pairs) in the training data.
Previous Research
(2010) submitted another system based on a standard n-gram kernel which ranked first for the English/Hindi and English/Tamil tasks. For the English/Arabic task, the transliteration mining system of Noeman and Madkour (2010) was best.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zarriess, Sina and Cahill, Aoife and Kuhn, Jonas
Experimental Setup
Standard n-gram features are also used as features. The feature model is built as follows: for every lemma in the f-structure, we extract a set of morphological properties (definiteness, person, pronominal status, etc.)
Experimental Setup
use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams.
Related Work
The first widely known data-driven approach to surface realisation, or tactical generation (Langkilde and Knight, 1998), used language-model n-gram statistics on a word lattice of candidate realisations to guide a ranker.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: