Abstract | However, n-gram based metrics are still the dominant approach today.
Alternatives to Correlation-based Meta-evaluation | Therefore, we have focused on other aspects of metric reliability that have revealed differences between n-gram and linguistic based metrics: |
Alternatives to Correlation-based Meta-evaluation | All n-gram based metrics achieve SIP and SIR values between 0.8 and 0.9. |
Alternatives to Correlation-based Meta-evaluation | This result suggests that n-gram based metrics are reasonably reliable for this purpose. |
Correlation with Human Judgements | Let us first analyze the correlation with human judgements for linguistic vs. n-gram based metrics. |
Correlation with Human Judgements | Linguistic metrics are represented by grey plots, and black plots represent metrics based on n-gram overlap. |
Correlation with Human Judgements | Therefore, we need additional meta-evaluation criteria in order to clarify the behavior of linguistic metrics as compared to n-gram based metrics. |
Introduction | However, the most commonly used metrics are still based on n-gram matching. |
Introduction | For that purpose, we compare — using four different test beds — the performance of 16 n-gram based metrics, 48 linguistic metrics and one combined metric from the state of the art. |
Previous Work on Machine Translation Meta-Evaluation | All of those metrics were based on n-gram overlap.
Composite language model | The n-gram language model is essentially a word predictor that, given the document history, predicts the next word w_{k+1} based on the last n − 1 words with probability p(w_{k+1} | w_{k−n+2}^{k}), where w_{k−n+2}^{k} = w_{k−n+2}, ..., w_{k}.
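That conditional probability can be sketched with plain maximum-likelihood counts (a toy illustration, not the composite model's actual estimator; `train_ngram` and `prob` are our own helper names):

```python
from collections import Counter

def train_ngram(tokens, n):
    """Count n-gram and (n-1)-gram (history) occurrences for MLE estimation."""
    counts = Counter()
    history_counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        counts[gram] += 1
        history_counts[gram[:-1]] += 1
    return counts, history_counts

def prob(counts, history_counts, history, word):
    """MLE estimate of p(word | last n-1 words); zero for unseen histories."""
    h = tuple(history)
    if history_counts[h] == 0:
        return 0.0
    return counts[h + (word,)] / history_counts[h]

corpus = "the cat sat on the mat the cat ran".split()
counts, hist = train_ngram(corpus, 2)
# p(cat | the) = 2/3: "the" starts three bigrams, two of them "the cat"
```

Real n-gram LMs replace the raw MLE ratio with a smoothed estimate, but the history/count bookkeeping is the same.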
Composite language model | The SLM (Chelba and Jelinek, 1998; Chelba and Jelinek, 2000) uses syntactic information beyond the regular n-gram models to capture sentence level long range dependencies.
Composite language model | When combining n-gram, m-order SLM and
Experimental results | The m-SLM performs competitively with its counterpart n-gram (n=m+1) on large scale corpus. |
Introduction | The Markov chain (n-gram) source models, which predict each word on the basis of the previous n − 1 words, have been the workhorses of state-of-the-art speech recognizers and machine translators that help to resolve acoustic or foreign language ambiguities by placing higher probability on more likely original underlying word strings.
Introduction | While efficient at encoding local word interactions, the n-gram model clearly ignores the rich syntactic and semantic structures that constrain natural languages.
Introduction | (2006) integrated n-gram, structured language model (SLM) (Chelba and Jelinek, 2000) and probabilistic latent semantic analysis (PLSA) (Hofmann, 2001) under the directed MRF framework (Wang et al., 2005) and studied the stochastic properties for the composite language model.
Training algorithm | Similar to SLM (Chelba and Jelinek, 2000), we adopt an N-best list approximate EM re-estimation with modular modifications to seamlessly incorporate the effect of n-gram and PLSA components.
Training algorithm | This implies that when computing the language model probability of a sentence in a client, all servers need to be contacted for each n-gram request. |
Training algorithm | (2007) follows a standard MapReduce paradigm (Dean and Ghemawat, 2004): the corpus is first divided and loaded into a number of clients, and n-gram counts are collected at each client; the n-gram counts are then mapped and stored in a number of servers, resulting in exactly one server being contacted per n-gram when computing the language model probability of a sentence.
Introduction | Another is a web-scale N-gram corpus with N-grams of length 1-5 (Brants and Franz, 2006), which we call Google V1 in this paper.
Web-Derived Selectional Preference Features | One is the web, where N-gram counts are approximated by Google hits.
Web-Derived Selectional Preference Features | This N-gram corpus records how often each unique sequence of words occurs. |
Web-Derived Selectional Preference Features | times or more (1 in 25 billion) are kept, and appear in the n-gram tables. |
Abstract | Our model can also be viewed as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any “word” indications.
Experiments | NPY(n) means n-gram NPYLM. |
Inference | character n-gram model explicit as Θ, we can set
Inference | where p(c1 · · · ck, k | Θ) is an n-gram probability given by (6), and p(k | Θ) is the probability that a word of length k will be generated from Θ.
Introduction | In this paper, we extend this work to propose a more efficient and accurate unsupervised word segmentation that will optimize the performance of the word n-gram Pitman-Yor (i.e. |
Introduction | Furthermore, it can be viewed as a method for building a high-performance n-gram language model directly from character strings of arbitrary language. |
Introduction | By embedding a character n-gram model within a word n-gram model from a Bayesian perspective, Section 3 introduces a novel language model for word segmentation, which we call the Nested Pitman-Yor language model.
Nested Pitman-Yor Language Model | In contrast, in this paper we use a simple but more elaborate model, that is, a character n-gram language model that also employs HPYLM. |
Nested Pitman-Yor Language Model | To avoid dependency on the n-gram order n, we actually used the ∞-gram language model (Mochihashi and Sumita, 2007), a variable-order HPYLM, for characters.
Pitman-Yor process and n-gram models | In this representation, each n-gram context h (including the null context ε for unigrams) is a Chinese restaurant whose customers are the n-gram counts seated over the tables 1 · · · t_hw.
Pitman-Yor process and n-gram models | As a result, the n-gram probability of this hierarchical Pitman-Yor language model (HPYLM) is recursively computed as
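The recursion referenced above, written in the standard notation of the hierarchical Pitman-Yor language model (a sketch following Teh (2006), where d and θ are the discount and strength parameters, c the customer counts, t the table counts, and h' the backoff context obtained by dropping the earliest word of h; the paper's own symbols may differ):

```latex
p(w \mid h) \;=\; \frac{c_{hw} - d\, t_{hw}}{\theta + c_{h\cdot}}
\;+\; \frac{\theta + d\, t_{h\cdot}}{\theta + c_{h\cdot}}\; p(w \mid h')
```

with c_{h·} = Σ_w c_{hw} and t_{h·} = Σ_w t_{hw}, so the model interpolates observed counts with the shorter-context distribution p(w | h').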
Abstract | Our particular variational distributions are parameterized as n-gram models. |
Abstract | We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). |
Introduction | In practice, we approximate with several different variational families Q, corresponding to n-gram (Markov) models of different orders. |
Variational Approximate Decoding | Since each q(y) is a distribution over output strings, a natural choice for Q is the family of n-gram models. |
Variational Approximate Decoding | where W is a set of n-gram types. |
Variational Approximate Decoding | Each w ∈ W is an n-gram, which occurs c_w(y) times in the string y, and w may be divided into an (n − 1)-gram prefix (the history) and a 1-gram suffix r(w) (the rightmost or current word).
Abstract | Using an iterative decoding approach, n-gram agreement statistics between translations of multiple decoders are employed to re-rank both full and partial hypotheses explored in decoding.
Collaborative Decoding | To compute the consensus measures, we further decompose each one into n-gram matching statistics between e and e'.
Collaborative Decoding | For each n-gram order n, we introduce a pair of complementary consensus measure functions Gn+(e, e') and Gn−(e, e'), described as follows:
Collaborative Decoding | Gn+(e, e') is the n-gram agreement measure function, which counts the number of occurrences in e' of n-grams in e.
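As a sketch, Gn+ can be implemented by counting how often n-grams of e occur in e' (illustrative helper names; the paper's exact definition may clip or normalize the counts differently):

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def agreement(e, e_prime, n):
    """Count occurrences in e' of n-grams that also appear in e (Gn+ sketch)."""
    grams_e = ngrams(e, n)
    grams_ep = ngrams(e_prime, n)
    return sum(c for g, c in grams_ep.items() if g in grams_e)
```

A disagreement measure Gn− can then count the remaining n-grams of e' that never appear in e.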
Experiments | All baseline decoders are extended with n-gram consensus-based co-decoding features to construct member decoders.
Experiments | Table 4 shows the comparison results of a two-system co-decoding using different settings of n-gram agreement and disagreement features. |
Experiments | It is clearly shown that both n-gram agreement and disagreement types of features are helpful, and using them together is the best choice. |
Introduction | Lattice MBR decoding uses a linear approximation to the BLEU score (Papineni et al., 2001); the weights in this linear loss are set heuristically by assuming that n-gram precisions decay exponentially with n. However, this may not be optimal in practice.
Minimum Bayes-Risk Decoding | They approximated log(BLEU) score by a linear function of n-gram matches and candidate length. |
Minimum Bayes-Risk Decoding | where w is an n-gram present in either E or E', and θ0, θ1, ..., θN are weights which are determined empirically, where N is the maximum n-gram order.
Minimum Bayes-Risk Decoding | Next, the posterior probability of each n-gram is computed. |
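From a k-best list, that posterior can be sketched as the total posterior mass of the hypotheses containing each n-gram (an illustrative reading; the exact lattice-based estimator in the paper may differ):

```python
from collections import defaultdict

def ngram_posteriors(hypotheses, n):
    """Posterior of each n-gram: summed posterior probability of hypotheses
    that contain it at least once. `hypotheses` is a list of
    (tokens, posterior_probability) pairs (names are ours)."""
    post = defaultdict(float)
    for tokens, p in hypotheses:
        seen = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for g in seen:
            post[g] += p
    return dict(post)
```

These posteriors then weight the n-gram match terms in the linearized BLEU gain.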
Introduction | Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models. |
Introduction | Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output. |
Related Work | (2007) use supertag n-gram LMs. |
Related Work | An n-gram language model history is also maintained at each node in the translation lattice. |
Related Work | The search space is further trimmed with hypothesis recombination, which collapses lattice nodes that share a common coverage vector and n-gram state. |
Abstract | Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries. |
Discussion and Conclusion | Our evaluations show that such an approach yields summaries which score more highly than an approach which uses a simpler representation of an object type model in the form of an n-gram language model.
Introduction | a corpus of descriptions of churches, a corpus of bridge descriptions, and so on) and reported results showing that incorporating such n-gram language models as a feature in a feature-based extractive summarizer improves the quality of automatically generated summaries. |
Introduction | The main weakness of n-gram language models is that they only capture very local information about short term sequences and cannot model long-distance dependencies between terms.
Introduction | If this information is expressed as in the first line of Table 1, n-gram language models are likely to
Representing conceptual models 2.1 Object type corpora | We derive n-gram language and dependency pattern models using object type corpora made available to us by Aker and Gaizauskas. |
Representing conceptual models 2.1 Object type corpora | 2.2 N-gram language models |
Representing conceptual models 2.1 Object type corpora | they calculate the probability that a sentence is generated based on an n-gram language model.
Summarizer | • LMSim3: The similarity of a sentence S to an n-gram language model LM (the probability that the sentence S is generated by LM).
Decoding with the NNJM | Because our NNJM is fundamentally an n-gram NNLM with additional source context, it can easily be integrated into any SMT decoder.
Decoding with the NNJM | When performing hierarchical decoding with an n-gram LM, the leftmost and rightmost n − 1 words from each constituent must be stored in the state space.
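A minimal sketch of that per-constituent n-gram LM state (hypothetical helper; real decoders also handle sentence-boundary and elided-middle markers):

```python
def lm_state(tokens, n):
    """LM state of a partial constituent: its leftmost and rightmost n-1
    words, which are all a parent constituent needs in order to score the
    boundary n-grams formed when children are concatenated."""
    k = n - 1
    if len(tokens) <= k:
        # Short spans: the whole span is both boundary contexts.
        return (tuple(tokens), tuple(tokens))
    return (tuple(tokens[:k]), tuple(tokens[-k:]))
```

Hypotheses sharing the same state (and coverage) can then be recombined, since no future n-gram score can distinguish them.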
Decoding with the NNJM | We also train a separate lower-order n-gram model, which is necessary to compute estimate scores during hierarchical decoding.
Introduction | Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010). |
Introduction | Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
Model Variations | Specifically, this means that we don’t use dependency-based rule extraction, and our decoder only contains the following MT features: (1) rule probabilities, (2) n-gram Kneser-Ney LM, (3) lexical smoothing, (4) target word count, (5) concat rule penalty.
Model Variations | This does not include the cost of n-gram creation or cached lookups, which amount to ~0.03 seconds per source word in our current implementation.14 However, the n-grams created for the NNJM can be shared with the Kneser-Ney LM, which reduces the cost of that feature.
Model Variations | 14In our decoder, roughly 95% of NNJM n-gram lookups within the same sentence are duplicates. |
Neural Network Joint Model (NNJM) | Formally, our model approximates the probability of target hypothesis T conditioned on source sentence S. We follow the standard n-gram LM decomposition of the target, where each target word ti is conditioned on the previous n − 1 target words.
Neural Network Joint Model (NNJM) | If our neural network has only one hidden layer and is self-normalized, the only remaining computation is 512 calls to tanh() and a single 513-dimensional dot product for the final output score.6 Thus, only ~3500 arithmetic operations are required per n-gram lookup, compared to ~2.8M for the self-normalized NNJM without pre-computation, and ~35M for the standard NNJM.7
Neural Network Joint Model (NNJM) | “lookups/sec” is the number of unique n-gram probabilities that can be computed per second.
Introduction | One is the n-gram model over different units, such as word-level bigram/trigram models (Bangalore and Rambow, 2000; Langkilde, 2000), or factored language models integrated with syntactic tags (White et al.
Introduction | (2008) develop a general-purpose realizer couched in the framework of Lexical Functional Grammar based on simple n-gram models. |
Introduction | (2009) present a dependency-spanning tree algorithm for word ordering, which first builds dependency trees to decide linear precedence between heads and modifiers then uses an n-gram language model to order siblings. |
Log-linear Models | We linearize the dependency relations by computing n-gram models, similar to traditional word-based language models, except using the names of dependency relations instead of words. |
Log-linear Models | The dependency relation model calculates the probability of dependency relation n-gram P(DR) according to Eq.(3). |
Log-linear Models | = ∏_{k=1}^{K} P(DR_k | DR_{k−n+1}, …, DR_{k−1}). Word Model: We integrate an n-gram word model into the log-linear model for capturing the relation between adjacent words.
Computing Feature Expectations | In this section, we consider BLEU in particular, for which the relevant features φ(e) are n-gram counts up to length n = 4.
Computing Feature Expectations | The nodes are states in the decoding process that include the span (i, j) of the sentence to be translated, the grammar symbol s over that span, and the left and right context words of the translation relevant for computing n-gram language model scores.3 Each hyper-edge h represents the application of a synchronous rule r that combines nodes corresponding to non-terminals in
Computing Feature Expectations | Each n-gram that appears in a translation e is associated with some h in its derivation: the h corresponding to the rule that produces the n-gram.
Consensus Decoding Algorithms | In this expression, BLEU(e; e') references e' only via its n-gram count features c(e', t).2 2The length penalty is also a function of n-gram counts.
Introduction | The contributions of this paper include a linear-time algorithm for MBR using linear similarities, a linear-time alternative to MBR using nonlinear similarity measures, and a forest-based extension to this procedure for similarities based on n-gram counts. |
Abstract | We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. |
Abstract | We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. |
Introduction | At the same time, because n-gram language models only condition on a local window of linear word-level context, they are poor models of long-range syntactic dependencies. |
Introduction | Although several lines of work have proposed generative syntactic language models that improve on n-gram models for moderate amounts of data (Chelba, 1997; Xu et al., 2002; Charniak, 2001; Hall, 2004; Roark, |
Introduction | Our model can be trained simply by collecting counts and using the same smoothing techniques normally applied to n-gram models (Kneser and Ney, 1995), enabling us to apply techniques developed for scaling n-gram models out of the box (Brants et al., 2007; Pauls and Klein, 2011).
Treelet Language Modeling | The common denominator of most n-gram language models is that they assign probabilities roughly according to empirical frequencies for observed n-grams, but fall back to distributions conditioned on smaller contexts for unobserved n-grams, as shown in Figure 1(a).
Treelet Language Modeling | As in the n-gram case, we would like to pick h to be large enough to capture relevant dependencies, but small enough that we can obtain meaningful estimates from data. |
Treelet Language Modeling | Although it is tempting to think that we can replace the left-to-right generation of n-gram models with the purely top-down generation of typical PCFGs, in practice, words are often highly predictive of the words that follow them — indeed, n-gram models would be terrible language models if this were not the case. |
Language Identification | We implement a statistical model using a character based n-gram feature. |
Language Identification | For each language, we collect the n-gram counts (for n = 1 to n = 7, also using the word beginning and ending spaces) from the vocabulary of the training corpus, and then generate a probability distribution from these counts.
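A toy sketch of such a character-n-gram profile and a scoring rule (assumptions: n up to 3 rather than 7, add-one smoothing, and a simple unigram-of-n-grams scorer; the names `char_ngram_profile` and `score` are ours, not the paper's):

```python
from collections import Counter
import math

def char_ngram_profile(text, n_max=3):
    """Character n-gram counts (n = 1..n_max) over a space-padded string,
    so word-boundary n-grams are captured as described above."""
    padded = f" {text} "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

def score(text, profile):
    """Log-probability-style score of text under a language profile, with
    add-one smoothing (an illustrative scoring rule, not the paper's)."""
    total = sum(profile.values())
    test = char_ngram_profile(text)
    return sum(c * math.log((profile[g] + 1) / (total + len(test)))
               for g, c in test.items())
```

Given one profile per language, the input is assigned to the language whose profile yields the highest score.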
Language Identification | In other words, this time we used a word-based n-gram method, only with n = 1. |
Related Work | Most of the work carried out to date on the written language identification problem consists of supervised approaches that are trained on a list of words or n-gram models for each reference language. |
Related Work | The n-gram based approaches are based on the counts of character or byte n-grams, which are sequences of n characters or bytes, extracted from a corpus for each reference language. |
Related Work | classification models that use the n-gram features have been proposed. |
Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model. |
Decoding to Maximize BLEU | BLEU is based on n-gram precision, and since each synchronous constituent in the tree adds a new 4-gram to the translation at the point where its children are concatenated, the additional pass approximately maximizes BLEU. |
Introduction | This complexity arises from the interaction of the tree-based translation model with an n-gram language model. |
Language Model Integrated Decoding for SCFG | We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space. |
Language Model Integrated Decoding for SCFG | Without an n-gram language model, decoding using SCFG is not much different from CFG parsing. |
Language Model Integrated Decoding for SCFG | However, when we want to integrate an n-gram language model into the search, our goal is searching for the derivation whose total sum of weights of productions and n-gram log probabilities is maximized. |
Multi-pass LM-Integrated Decoding | We take the same view as in speech recognition that a trigram-integrated model is a finer-grained model than a bigram model, and in general we can do an (n − 1)-gram decoding as a predictive pass for the following n-gram pass.
Multi-pass LM-Integrated Decoding | We can make the approximation even closer by taking local higher-order outside n-gram information for a state X[i, j, u_{1···n−1}, v_{1···n−1}] into account.
Multi-pass LM-Integrated Decoding | where β is the Viterbi inside cost and α is the Viterbi outside cost, to globally prioritize the n-gram integrated states on the agenda for exploration.
Abstract | We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of the well-known Kneser-Ney (1995) smoothing.
Abstract | We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. |
Introduction | Smoothed n-gram language models are the de facto standard statistical models of language for a wide range of natural language applications, including speech recognition and machine translation.
Introduction | Such models are trained on large text corpora, by counting the frequency of n-gram collocations, then normalizing and smoothing (regularizing) the resulting multinomial distributions. |
Introduction | Briefly, the smoothing method reestimates lower-order n-gram parameters in order to avoid overestimating the likelihood of n-grams that already have ample probability mass allocated as part of higher-order n-grams.
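The idea behind that lower-order reestimation can be sketched with Kneser-Ney-style continuation counts, where a lower-order statistic counts distinct contexts rather than raw occurrences (an illustrative fragment, not the paper's algorithm):

```python
def continuation_counts(tokens):
    """Kneser-Ney-style unigram statistic: for each word, count the number of
    DISTINCT preceding words it follows, rather than its raw frequency. A word
    that is frequent but only ever follows one context (e.g. "francisco"
    after "san") gets a small continuation count."""
    contexts = {}
    for prev, word in zip(tokens, tokens[1:]):
        contexts.setdefault(word, set()).add(prev)
    return {w: len(cs) for w, cs in contexts.items()}
```

Normalizing these counts gives the lower-order distribution used when a higher-order n-gram is unseen.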
Preliminaries | N-gram language models are typically presented mathematically in terms of words w, the strings (histories) h that precede them, and the suffixes of the histories (backoffs) h' that are used in the smoothing recursion.
Preliminaries | where c(hw) is the count of the n-gram sequence hw.
Preliminaries | N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored. |
Background | The computation of ψ(e, H(v)) is based on a linear combination of a set of n-gram consensus-based features.
Background | For each order of n-gram, hn+(e, H(v)) and hn−(e, H(v)) are defined to measure the n-gram agreement and disagreement between e and other translation candidates in H(v), respectively.
Background | If p orders of n-gram are used in computing ψ(e, H(v)), the total number of features in the system combination will be T + 2 × p (T model-score-based features defined in Equation 8 and 2 × p consensus-based features defined in Equation 9).
Abstract | The scheme can represent any standard n-gram model and is easily combined with existing model reduction techniques such as entropy-pruning. |
Introduction | LMs are usually implemented as n-gram models parameterized for each distinct sequence of up to n words observed in the training corpus. |
Introduction | Recent work (Talbot and Osborne, 2007b) has demonstrated that randomized encodings can be used to represent n-gram counts for LMs with significant space savings, circumventing information-theoretic constraints on lossless data structures by allowing errors with some small probability.
Introduction | Lookup is very efficient: the values of 3 cells in a large array are combined with the fingerprint of an n-gram.
Perfect Hash-based Language Models | The model can erroneously return a value for an n-gram that was never actually stored, but will always return the correct value for an n-gram that is in the model. |
Perfect Hash-based Language Models | Each n-gram x_i is drawn from some set of possible n-grams U and its associated value from a corresponding set of possible values V.
Perfect Hash-based Language Models | We do not store the n-grams and their probabilities directly but rather encode a fingerprint f(x_i) of each n-gram together with its associated value in such a way that the value can be retrieved when the model is queried with the n-gram x_i.
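A simplified sketch of the fingerprint idea, using a plain dict instead of the 3-cell array encoding (class and helper names are ours): stored n-grams always get their correct value back, while unseen n-grams may, with small probability, collide with a stored fingerprint and receive a spurious value, matching the error behavior described above.

```python
import hashlib

def fingerprint(ngram, bits=32):
    """Short deterministic fingerprint of an n-gram (md5 truncated to `bits`)."""
    digest = hashlib.md5(" ".join(ngram).encode("utf-8")).hexdigest()
    return int(digest, 16) % (1 << bits)

class FingerprintMap:
    """Maps fingerprints (not full n-grams) to values, trading a small
    false-positive probability for not storing the keys themselves."""
    def __init__(self):
        self.table = {}

    def put(self, ngram, value):
        self.table[fingerprint(ngram)] = value

    def get(self, ngram):
        # Correct for stored keys; may return a colliding entry for unseen keys.
        return self.table.get(fingerprint(ngram))
```

The actual scheme goes further, XOR-combining array cells so that even the fingerprint-to-value table needs no explicit keys, but the correctness/false-positive trade-off is the same.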
Scaling Language Models | These are typically n-gram models that approximate the probability of a word sequence by assuming each token to be independent of all but n − 1 preceding tokens.
Scaling Language Models | Although n-grams observed in natural language corpora are not randomly distributed within this universe, no lossless data structure that we are aware of can circumvent this space dependency on both the n-gram order and the vocabulary size.
Scaling Language Models | While the approach results in significant space savings, working with corpus statistics, rather than n-gram probabilities directly, is computationally less efficient (particularly in a distributed setting) and introduces a dependency on the smoothing scheme used. |
Abstract | We attempt to extract this information from history-contexts of up to ten words in size, and find that it complements the n-gram model well, which inherently suffers from data scarcity in learning long history-contexts.
Introduction | The commonly used n-gram model (Bahl et al.
Introduction | Although n-gram models are simple and effective, modeling long history-contexts leads to severe data scarcity problems.
Language Modeling with TD and TO | The prior, which is usually implemented as a unigram model, can also be replaced with a higher-order n-gram model as, for instance, the bigram model:
Language Modeling with TD and TO | Replacing the unigram model with a higher-order n-gram model is important to compensate for the damage incurred by the conditional independence assumption made earlier.
Motivation of the Proposed Approach | In the n-gram model, for example, these two attributes are jointly taken into account in the ordered word-sequence. |
Motivation of the Proposed Approach | Consequently, the n-gram model can only be effectively implemented within a short history-context (e. g. of size of three or four). |
Motivation of the Proposed Approach | However, intermediate distances beyond the n-gram model limits can be very useful and should not be discarded. |
Related Work | 2007) disassembles the n-gram into (n − 1) word pairs, such that each pair is modeled by a distance-k bigram model, where 1 ≤ k ≤ n − 1.
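The distance-k decomposition can be sketched as follows (illustrative helper name; here k is the offset between the paired words, so k = 1 recovers the ordinary bigram):

```python
from collections import Counter

def distance_k_bigrams(tokens, k):
    """Collect word pairs whose positions differ by exactly k, i.e. pairs
    separated by k-1 intervening words, as used by a distance-k bigram model."""
    return Counter((tokens[i], tokens[i + k]) for i in range(len(tokens) - k))
```

Counting these pairs for each k from 1 to n − 1 yields the (n − 1) pairwise models into which the n-gram is disassembled.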
Error Classification | While we employ seven types of features (see Sections 4.2 and 4.3), only the word n-gram features are subject to feature selection.2 Specifically, we employ |
Error Classification | the top n_i n-gram features as selected according to information gain computed over the training data (see Yang and Pedersen (1997) for details).
Error Classification | Aggregated word n-gram features. |
Evaluation | Our Baseline system, which only uses word n-gram and random indexing features, seems to perform uniformly poorly across both micro and macro F-scores (F and F; see row 1). |
Score Prediction | 6Before tuning the feature selection parameter, we have to sort the list of n-gram features occurring in the training set.
Automatic Evaluation Metrics | number of n-gram matches of the system translation against one or more reference translations. |
Automatic Evaluation Metrics | Generally, more n-gram matches result in a higher BLEU score. |
Automatic Evaluation Metrics | When determining the matches to calculate precision, BLEU uses a modified, or clipped, n-gram precision.
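The clipping rule can be sketched directly from the BLEU definition (a minimal implementation of modified n-gram precision; the helper name is ours): each candidate n-gram is credited at most as many times as its maximum count in any single reference.

```python
from collections import Counter

def clipped_precision(candidate, references, n):
    """BLEU's modified n-gram precision for one candidate against references."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand = ngrams(candidate)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for g, c in ngrams(ref).items():
            max_ref[g] = max(max_ref[g], c)
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / sum(cand.values())
```

For the classic degenerate candidate "the the the the" against "the cat is on the mat", the unigram "the" is clipped to its reference count of 2, giving precision 2/4 rather than 4/4.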
Metric Design Considerations | 4.1 Using N-gram Information |
Metric Design Considerations | Lemma and POS match: Representing each n-gram by its sequence of lemma and POS-tag pairs, we first try to perform an exact match in both lemma and POS tag.
Metric Design Considerations | In all our n-gram matching, each n-gram in the system translation can only match at most one n-gram in the reference translation. |
Abstract | Our most compact representation can store all 4 billion n-grams and associated counts for the Google n-gram corpus in 23 bits per n-gram, the most compact lossless representation to date, and even more compact than recent lossy compression techniques.
Introduction | and Raj, 2001; Hsu and Glass, 2008), our methods are conceptually based on tabular trie encodings wherein each n-gram key is stored as the concatenation of one word (here, the last) and an offset encoding the remaining words (here, the context). |
Language Model Implementations | For an n-gram language model, we can apply this implementation with a slight modification: we need n sorted arrays, one for each n-gram order. |
Language Model Implementations | Because our keys are sorted according to their context-encoded representation, we cannot straightforwardly answer queries about an n-gram w without first determining its context encoding. |
Preliminaries | Our goal in this paper is to provide data structures that map n-gram keys to values, i.e. |
Preliminaries | However, because of the sheer number of keys and values needed for n-gram language modeling, generic implementations do not work efficiently “out of the box.” In this section, we will review existing techniques for encoding the keys and values of an n-gram language model, taking care to account for every bit of memory required by each implementation. |
Preliminaries | In the Web1T corpus, the most frequent n-gram occurs about 95 billion times.
Abstract | Each n-gram is a sequence of words, POS tags, or a combination of words and POS tags.
Abstract | To illustrate, consider the following feature set, a bigram and a trigram (each term in the n-gram has either the form word or ^tag):
Abstract | The first n-gram of the set, please ^VB, would match please^RB bring^VB from the text.
Abstract | We compare our error rates to the state-of-the-art and to a strong Google n-gram count baseline. |
Abstract | We attain a maximum error reduction of 69.8% and average error reduction across all test sets of 59.1% compared to the state-of-the-art and a maximum error reduction of 68.4% and average error reduction across all test sets of 41.8% compared to our Google n-gram count baseline. |
Experiments | We keep a table mapping each unique n-gram to the number of times it has been seen in the training data. |
Experiments | 4.2 Google n-gram Baseline |
Experiments | The Google n-gram corpus is a collection of n-gram counts drawn from public webpages with a total of one trillion tokens: around 1 billion each of unique 3-grams, 4-grams, and 5-grams, and around 300,000 unique bigrams.
Results | MAXENT also outperforms the GOOGLE N-GRAM baseline for almost all test corpora and sequence lengths. |
Results | For the Switchboard test corpus token and type accuracies, the GOOGLE N-GRAM baseline is more accurate than MAXENT for sequences of length 2 and overall, but the accuracy of MAXENT is competitive with that of GOOGLE N-GRAM.
Results | MAXENT also attains a maximum error reduction of 68.4% for the WSJ test corpus and an average error reduction of 41.8% when compared to GOOGLE N-GRAM . |
Task A: Polarity Classification | 4.4 N-gram Evaluation and Results |
Task A: Polarity Classification | N-gram features are widely used in a variety of classification tasks, therefore we also use them in our polarity classification task. |
Task A: Polarity Classification | Figure 2 shows a study of the influence of the different information sources and their combination with n-gram features for English. |
Task B: Valence Prediction | The Farsi and Russian regression models are based only on n-gram features, while the English and Spanish regression models have both n-gram and LIWC features. |
Conclusion | Unlike previous approaches, our method combines information from letter n-gram language models and word dictionaries and provides a robust decipherment model. |
Decipherment | Combining letter n-gram language models with word dictionaries: Many existing probabilistic approaches use statistical letter n-gram language models of English to assign P (p) probabilities to plaintext hypotheses during decipherment. |
Decipherment | 3We set the interpolation weights for the word and n-gram LM as (0.9, 0.1). |
Decipherment | We train the letter n-gram LM on 50 million words of English text available from the Linguistic Data Consortium. |
Experiments and Results | EM Method using letter n-gram LMs following the approach of Knight et al. |
Experiments and Results | • Letter n-gram versus Word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in a +75% improvement in decipherment accuracy.
Introduction | (2006) use the Expectation Maximization (EM) algorithm (Dempster et al., 1977) to search for the best probabilistic key using letter n-gram models. |
Introduction | Ravi and Knight (2008) formulate decipherment as an integer programming problem and provide an exact method to solve simple substitution ciphers by using letter n-gram models along with deterministic key constraints. |
Introduction | • Our new method combines information from word dictionaries along with letter n-gram models, providing a robust decipherment model which offsets the disadvantages faced by previous approaches.
Analysis | The mid-words m which rank highly are those where the occurrence of hma as an n-gram is a good indicator that a attaches to h (m of course does not have to actually occur in the sentence). |
Analysis | However, we additionally find other cues, most notably that if the N IN sequence occurs following a capitalized determiner, it tends to indicate a nominal attachment (in the n-gram, the preposition cannot attach leftward to anything else because of the beginning of the sentence).
Analysis | These features essentially say that if two heads w1 and w2 occur in the direct coordination n-gram "w1 and w2", then they are good heads to coordinate (coordination unfortunately looks the same as complementation or modification to a basic dependency model).
Introduction | (2010), which use Web-scale n-gram counts for multi-way noun bracketing decisions, though that work considers only sequences of nouns and uses only affinity-based web features. |
Working with Web n-Grams | Lapata and Keller (2004) uses the number of page hits as the web-count of the queried n-gram (which is problematic according to Kilgarriff (2007)). |
Working with Web n-Grams | Rather than working through a search API (or scraper), we use an offline web corpus, the Google n-gram corpus (Brants and Franz, 2006), which contains English n-grams (n = 1 to 5) and their observed frequency counts, generated from nearly 1 trillion word tokens and 95 billion sentences.
Working with Web n-Grams | Our system requires the counts from a large collection of these n-gram queries (around 4.5 million). |
Background | To learn the meaning of an n-gram w, Chen and Mooney first collect all navigation plans g that co-occur with w. This forms the initial candidate meaning set for w.
Conclusion | In contrast to the previous approach that computed common subgraphs between different contexts in which an n-gram appeared, we instead focus on small, connected subgraphs and introduce an algorithm, SGOLL, that is an order of magnitude faster. |
Online Lexicon Learning Algorithm | Even though they use beam-search to limit the size of the candidate set, if the initial candidate meaning set for an n-gram is large, it can take a long time to take just one pass through the list of all candidates.
Online Lexicon Learning Algorithm | function Update(training example (e_i, g_i)): for each n-gram w that appears in e_i do
Online Lexicon Learning Algorithm | 14: Increase the count of examples, each n-gram w, and each subgraph g; 15: end function
Discussion and Future Work | The TESLA-M metric allows each n-gram to have a weight, which is primarily used to discount function words. |
Experiments | Based on these synonyms, TESLA-CELAB is able to award less trivial n-gram matches.
Experiments | The covered n-gram matching rule is then able to award tricky n-grams.
Motivation | We formulate the n-gram matching process as a real-valued linear programming problem, which can be solved efficiently. |
The Algorithm | The basic n-gram matching problem is shown in Figure 2. |
The Algorithm | We observe that once an n-gram has been matched, all of its sub-n-grams should be considered matched as well. We call this the covered n-gram matching rule.
The Algorithm | However, we cannot simply perform covered n-gram matching as a post processing step. |
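As a rough illustration of the covered n-gram matching rule (using a hypothetical helper name; the actual TESLA-CELAB formulation is a linear program, not this enumeration):

```python
def covered_ngrams(ngram):
    """Once an n-gram is matched, every contiguous sub-n-gram counts as matched too."""
    n = len(ngram)
    return {tuple(ngram[i:j]) for i in range(n) for j in range(i + 1, n + 1)}

covered = covered_ngrams(("a", "b", "c"))
# the match covers ("a",), ("b", "c"), ..., and ("a", "b", "c") itself: 6 sub-n-grams
```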
Abstract | We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models. |
Conclusions | Using word repetitions, we effectively use a broad document context outside of the typical 2-5 N-gram window. |
Introduction | ASR systems traditionally use N-gram language models to incorporate prior knowledge of word occurrence patterns into prediction of the next word in the token stream. |
Introduction | N-gram models cannot, however, capture complex linguistic or topical phenomena that occur outside the typical 3-5 word scope of the model. |
Introduction | Confidence scores from an ASR system (which incorporate N-gram probabilities) are optimized in order to produce the most likely sequence of words rather than the accuracy of individual word detections. |
Motivation | We seek a workable definition of broad document context beyond N-gram models that will improve term detection performance on an arbitrary set of queries. |
Motivation | A number of efforts have been made to augment traditional N-gram models with latent topic information (Khudanpur and Wu, 1999; Florian and Yarowsky, 1999; Liu and Liu, 2008; Hsu and Glass, 2006; Naptali et al., 2012) including some of the early work on Probabilistic Latent Semantic Analysis by Hofmann (2001). |
Term and Document Frequency Statistics | In applying the burstiness quantity to term detection, we recall that the task requires us to locate a particular instance of a term, not estimate a count, hence the utility of N-gram language models predicting words in sequence. |
Evaluation | We smooth by adding 40 to all counts, equal to the minimum count in the n-gram data. |
Introduction | For example, in our n-gram collection (Section 3.4), “make it in advance” and “make them in advance” occur roughly the same number of times (442 vs. 449), indicating a referential pattern. |
Methodology | We gather pattern fillers from a large collection of n-gram frequencies. |
Methodology | We do the same processing to our n-gram corpus. |
Methodology | Also, other pronouns in the pattern are allowed to match a corresponding pronoun in an n-gram, regardless of differences in inflection and class.
Experiments | For each data set, we investigate an extensive set of combinations of hyper-parameters: the n-gram window (l,r) in {(1, 1), (2,1), (1,2), (2,2)}; the hidden layer size in {200, 300, 400}; the learning rate in {0.1, 0.01, 0.001}. |
Experiments | 5.3.2 Word and N-gram Representation |
Experiments | By contrast, using n-gram representations improves the performance on both oov and non-oov. |
Learning from Web Text | The basic idea is to share word representations across different positions in the input n-gram while using position-dependent weights to distinguish between different word orders. |
Learning from Web Text | Let V^(j) represent the j-th visible variable of the WRRBM, which is a vector of length |V|. Then V^(j) = w_k means that the j-th word in the n-gram is w_k.
Neural Network for POS Disambiguation | The input for this module is the word n-gram (w_1, …, w_n).
Neural Network for POS Disambiguation | representations of the input n-gram.
Related Work | While those approaches mainly explore token-level representations (word or character embeddings), using WRRBM is able to utilize both word and n-gram representations. |
Feature functions | We use p_1^n to denote the n-gram substring p_1 … p_n.
Feature functions | The two substrings ā and b̄ are said to be equal if they have the same length and a_i = b_i for 1 ≤ i ≤ n. For a given sub-word unit n-gram u ∈ P^n, we use the shorthand u ∈ p̄ to mean that we can find u in p̄; i.e., there exists an index i such that p_{i:i+n−1} = u. We use |p̄| to denote the length of the sequence p̄.
Feature functions | Similarly to (Zweig et al., 2010), we adapt TF and IDF by treating a sequence of sub-word units as a “document” and n-gram sub-sequences as “words.” In this analogy, we use sub-sequences in surface pronunciations to “search” for baseforms in the dictionary. |
Introduction | Table 1 shows the n-gram overlap proportions in a sentence-aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a). The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data?
Introduction | n-gram size: 1 / 2 / 3 / 4 / 5; simple in normal: 0.96 / 0.80 / 0.68 / 0.61 / 0.55; normal in simple: 0.87 / 0.68 / 0.58 / 0.51 / 0.46
Why Does Unsimplified Data Help? | trained using a smoothed version of the maximum likelihood estimate for an n-gram.
Why Does Unsimplified Data Help? | count(bc), where count(·) is the number of times the n-gram occurs in the training corpus.
Why Does Unsimplified Data Help? | For interpolated and backoff n-gram models, these counts are smoothed based on the probabilities of lower-order n-gram models, which are in turn calculated based on counts from the corpus.
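The unsmoothed maximum likelihood estimate described above amounts to a ratio of counts; this sketch (toy data, hypothetical function name) shows the trigram case before any interpolation or backoff is applied:

```python
from collections import Counter

def mle_prob(counts_n, counts_nminus1, ngram):
    """Unsmoothed MLE: count of the full n-gram over count of its (n-1)-gram history."""
    num = counts_n.get(ngram, 0)
    den = counts_nminus1.get(ngram[:-1], 0)
    return num / den if den else 0.0

tokens = "a b a b a c".split()
tri = Counter(tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2))
bi = Counter(tuple(tokens[i:i + 2]) for i in range(len(tokens) - 1))
p = mle_prob(tri, bi, ("a", "b", "a"))  # count(a b a) / count(a b) = 2 / 2
```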
Experiments | The NBoW performs similarly to the non-neural n-gram based classifiers. |
Experiments | Besides the RecNN that uses an external parser to produce structural features for the model, the other models use n-gram based or neural features that do not require external resources or additional annotations. |
Experiments | We see a significant increase in the performance of the DCNN with respect to the non-neural n-gram based classifiers; in the presence of large amounts of training data these classifiers constitute particularly strong baselines. |
Introduction | Convolving the same filter with the n-gram at every position in the sentence allows the features to be extracted independently of their position in the sentence. |
Properties of the Sentence Model | 4.1 Word and n-Gram Order |
Properties of the Sentence Model | For most applications and in order to learn fine-grained feature detectors, it is beneficial for a model to be able to discriminate whether a specific n-gram occurs in the input. |
Properties of the Sentence Model | 2.3, the Max-TDNN is sensitive to word order, but max pooling only picks out a single n-gram feature in each row of the sentence matrix. |
Dependency language model | The standard N-gram based language model predicts the next word based on the N − 1 immediately preceding words.
Dependency language model | However, the traditional N-gram language model cannot capture long-distance word relations.
Dependency language model | The N-gram DLM predicts the next child of a head based on the N − 1 immediately preceding children and the head itself.
Experiments | Then, we studied the effect of adding different N-gram DLMs to MSTl. |
Introduction | The N-gram DLM has the ability to predict the next child based on the N − 1 immediately preceding children and their head (Shen et al., 2008).
Introduction | The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model. |
Machine Translation as a Decipherment Task | For P (e), we use a word n-gram LM trained on monolingual English data. |
Machine Translation as a Decipherment Task | Whole-segment Language Models: When using word n-gram models of English for decipherment, we find that some of the foreign sentences are decoded into sequences (such as "THANK YOU TALKING ABOUT ?") that are not good English.
Machine Translation as a Decipherment Task | This stems from the fact that n-gram LMs have no global information about what constitutes a valid English segment. |
Word Substitution Decipherment | We model P(e) using a statistical word n-gram English language model (LM). |
Word Substitution Decipherment | Our method holds several other advantages over the EM approach: (1) inference using smart sampling strategies permits efficient training, allowing us to scale to large data/vocabulary sizes, (2) incremental scoring of derivations during sampling allows efficient inference even when we use higher-order n-gram LMs, (3) there are no memory bottlenecks since the full channel model and derivation lattice are never instantiated during training, and (4) prior specification allows us to learn skewed distributions that are useful here, since word substitution ciphers exhibit a 1-to-1 correspondence between plaintext and cipher types.
Word Substitution Decipherment | build an English word n-gram LM, which is used in the decipherment process. |
Experimental Evaluation | N-gram suggests no non-referential instances
Experimental Evaluation | Effectiveness To understand the contribution of the n-gram (NG), ontology (ON), and clustering (CL) based modules, we ran each separately, as well as every possible combination. |
Experimental Evaluation | Of the three individual modules, the n-gram and clustering methods achieve F-measure of around 0.9, while the ontology-based module performs only modestly above baseline. |
Term Ambiguity Detection (TAD) | This module examines n-gram data from a large text collection. |
Term Ambiguity Detection (TAD) | The rationale behind the n-gram module is based on the understanding that terms appearing in non-named entity contexts are likely to be non-referential, and terms that can be non-referential are ambiguous. |
Term Ambiguity Detection (TAD) | Since we wish for the ambiguity detection determination to be fast, we develop our method to make this judgment solely on the n-gram probability, without the need to examine each individual usage context. |
Smoothing on count distributions | On integral counts, this is simple: we generate, for each n-gram type vu′w, an (n − 1)-gram token u′w, for a total of n_1+(•u′w) tokens.
Smoothing on count distributions | Analogously, on count distributions, for each n-gram type vu′w, we generate an (n − 1)-gram token u′w with probability p(c(vu′w) > 0).
Smoothing on count distributions | Using the dynamic program in Section 3.2, computing the distributions for each r is linear in the number of n-gram types, and we only need to compute the distributions up to r = 2 (or r = 4 for modified KN), and store them for r = 0 (or up to r = 2 for modified KN). |
Smoothing on integral counts | Let uw stand for an n-gram, where u stands for the (n − 1) context words and w, the predicted word.
Smoothing on integral counts | where n_1+(u•) = |{w : c(uw) > 0}| is the number of word types observed after context u, and q_u(w) specifies how to distribute the subtracted discounts among unseen n-gram types.
Smoothing on integral counts | The probability of an n-gram token uw using the other tokens as training data is |
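The continuation count n_1+(u•) used in Kneser-Ney style smoothing can be computed directly from the n-gram table; a minimal sketch with made-up counts and a hypothetical function name:

```python
from collections import Counter, defaultdict

def continuation_types(ngram_counts):
    """n_1+(u.): number of distinct word types w observed after each context u."""
    types = defaultdict(set)
    for ngram, count in ngram_counts.items():
        if count > 0:
            types[ngram[:-1]].add(ngram[-1])
    return {u: len(ws) for u, ws in types.items()}

bigrams = Counter({("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2})
n1plus = continuation_types(bigrams)
# n_1+(a.) == 2: both b and c have been observed after context a
```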
Abstract | Sentence compression has been shown to benefit from joint inference involving both n-gram and dependency-factored objectives but this typically requires expensive integer programming. |
Experiments | • ILP-Dep: A version of the joint ILP of Thadani and McKeown (2013) without n-gram variables and corresponding features.
Experiments | Starting with the n-gram approaches, the performance of 3-LM leads us to observe that the gains of supervised learning far outweigh the utility of higher-order n-gram factorization, which is also responsible for a significant increase in wall-clock time.
Experiments | We were surprised by the strong performance of the dependency-based inference techniques, which yielded results that approached the joint model in both n-gram and parse-based measures. |
Introduction | Following an assumption often used in compression systems, the compressed output in this corpus is constructed by dropping tokens from the input sentence without any paraphrasing or reordering. A number of diverse approaches have been proposed for deletion-based sentence compression, including techniques that assemble the output text under an n-gram factorization over the input text (McDonald, 2006; Clarke and Lapata, 2008) or an arc factorization over input dependency parses (Filippova and Strube, 2008; Galanis and Androutsopoulos, 2010; Filippova and Altun, 2013).
Introduction | In this work, we develop approximate inference strategies to the joint approach of Thadani and McKeown (2013) which trade the optimality guarantees of exact ILP for faster inference by separately solving the n-gram and dependency subproblems and using Lagrange multipliers to enforce consistency between their solutions. |
Multi-Structure Sentence Compression | Let α(y) ∈ {0, 1}^n denote the incidence vector of tokens contained in the n-gram sequence y and β(z) ∈ {0, 1}^n denote the incidence vector of words contained in the dependency tree z.
Class-Based Language Modeling | Generalizing this leads to arbitrary order class-based n-gram models of the form: |
Introduction | In the case of n-gram language models this is done by factoring the probability: |
Introduction | do not differ in the last n − 1 words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities P(w_i | w_{i−n+1}^{i−1}).
Introduction | However, in the area of statistical machine translation, especially in the context of large training corpora, fewer experiments with class-based n-gram models have been performed with mixed success (Raab, 2006). |
Abstract | To improve recall, irregularities in parse trees caused by verb form errors are taken into account; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Experiments | For those categories with a high rate of false positives (all except BASEmd, BASEdO and FINITE), we utilized n-grams as filters, allowing a correction only when its n-gram count in the WEB 1T 5-GRAM |
Experiments | Some kind of confidence measure on the n-gram counts might be appropriate for reducing such false alarms. |
Introduction | To improve recall, irregularities in parse trees caused by verb form errors are considered; to improve precision, n-gram counts are utilized to filter proposed corrections. |
Research Issues | We propose using n-gram counts as a filter to counter this kind of overgeneralization. |
Research Issues | A second goal is to show that n-gram counts can effectively serve as a filter in order to increase precision.
Baseline MT | The LM used for decoding is a log-linear combination of four word n-gram LMs which are built on different English corpora.
Name-aware MT Evaluation | where w_n is a set of positive weights summing to one, usually set uniformly as w_n = 1/N; c is the length of the system translation and r is the length of the reference translation; and p_n is the modified n-gram precision, defined as: p_n = Σ_C Σ_{n-gram∈C} Count_clip(n-gram) / Σ_C Σ_{n-gram∈C} Count(n-gram)
Name-aware MT Evaluation | As in BLEU metric, we first count the maximum number of times an n-gram occurs in any single reference translation. |
Name-aware MT Evaluation | The weight of an n-gram in reference translation is the sum of weights of all tokens it contains. |
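The clipping step can be sketched as follows: candidate n-gram counts are capped at the maximum count observed in any single reference before computing the precision (the function name is illustrative):

```python
from collections import Counter

def modified_precision(candidate, references, n):
    """BLEU-style modified precision with reference-clipped n-gram counts."""
    def ngrams(toks):
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand = ngrams(candidate)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return clipped / total if total else 0.0

# "the the the" vs. reference "the cat": "the" is clipped to 1, giving 1/3
p1 = modified_precision(["the", "the", "the"], [["the", "cat"]], 1)
```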
How Frequent is Unseen Data? | The third line across the bottom of the figure is the number of unseen pairs using Google n-gram data as proxy argument counts. |
How Frequent is Unseen Data? | Creating argument counts from n-gram counts is described in detail below in section 5.2. |
Models | Using the Google n-gram corpus, we recorded all verb-noun co-occurrences, defined by appearing in any order in the same n-gram, up to and including 5-grams.
Models | For instance, the test pair (throwsubject, ball) is considered seen if there exists an n-gram such that throw and ball are both included. |
Models | C(v_d, n) as the number of times v and n (ignoring d) appear in the same n-gram.
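Under this definition, a pair counts as seen whenever both words fall inside one n-gram; a toy sketch with invented counts and a hypothetical function name (the real model reads the Google n-gram corpus):

```python
def cooccurrence_count(ngram_counts, verb, noun):
    """Total count of n-grams containing both the verb and the noun, in any order."""
    return sum(count for ngram, count in ngram_counts.items()
               if verb in ngram and noun in ngram)

toy_counts = {("throw", "the", "ball"): 10,
              ("ball", "was", "thrown"): 5,   # "thrown" != "throw", so not counted
              ("throw", "a", "party"): 7}
seen = cooccurrence_count(toy_counts, "throw", "ball") > 0
```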
Results | The Google n-gram backoff model is almost as good as backing off to the Erk smoothing model. |
Background | makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class. |
Background | In the anytime mode, a best-first search is performed with a configurable time limit: the scores assigned by the n-gram model determine the order of the edges on the agenda, and thus have an impact on realization speed.
Background | the one that covers the most elementary predications in the input logical form, with ties broken according to the n-gram score. |
The Approach | Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model. |
Feature Construction | Although the Omni-word feature can be seen as a subset of the n-gram feature, it is not the same: n-gram features are more fragmented.
Experiments and Results | For our character n-gram experiments, we obtained LOWBOW representations for character 3-grams (only n-grams of size n = 3 were used) considering the 2,500 most common n-grams.
Experiments and Results | Also, n-gram information is more dense in documents than word-level information. |
Related Work | (2003) propose the use of language models at the n-gram character-level for AA, whereas Keselj et al. |
Related Work | on characters at the n-gram level (Plakias and Stamatatos, 2008a). |
Related Work | Acceptable performance in AA has been reported with character n-gram representations. |
Data and Approach Overview | [Figure: entity list prior and web n-gram context prior, derived from web query logs and domain-specific entities, serve as priors on the dialog act parameters]
Experiments | * Base-MCM: Our first version injects an informative prior for domain, dialog act, and slot topic distributions using information extracted only from labeled training utterances, injected as prior constraints (the corpus n-gram base measure) during topic assignments.
MultiLayer Context Model - MCM | * Web n-Gram Context Base Measure: As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions.
MultiLayer Context Model - MCM | * Corpus n-Gram Base Measure: Similar to the other measures, MCM also encodes n-gram constraints as word-frequency features extracted from labeled utterances.
New Sense Indicators | N-gram Probability Features The goal of the Type:NgramProb feature is to capture the fact that “unusual contexts” might imply new senses. |
New Sense Indicators | To capture this, we can look at the log probability of the word under consideration given its N-gram context, both according to an old-domain language model (call this ℓ^old) and a new-domain language model (ℓ^new).
New Sense Indicators | From these four values, we compute corpus-level (and therefore type-based) statistics of the new-domain n-gram log probability (ℓ_ngram^new), the difference between the n-gram probabilities in each domain (ℓ_ngram^new − ℓ_ngram^old), the difference between the n-gram and unigram probabilities in the new domain (ℓ_ngram^new − ℓ_unigram^new), and finally the combined difference: ℓ_ngram^new − ℓ_unigram^new + ℓ_unigram^old − ℓ_ngram^old.
Future Work | This powerful method, which was proposed in (Kam and Kopec, 1996; Popat et al., 2001) in the context of a finite-state model (but not of TSP), can be easily extended to N-gram situations, and typically converges in a small number of iterations. |
Introduction | Typical nonlocal features include one or more n-gram language models as well as a distortion feature, measuring by how much the order of biphrases in the candidate translation deviates from their order in the source sentence. |
Phrase-based Decoding as TSP | 4.1 From Bigram to N-gram LM |
Phrase-based Decoding as TSP | If we want to extend the power of the model to general n-gram language models, and in particular to the 3-gram |
Phrase-based Decoding as TSP | The problem becomes even worse if we extend the compiling-out method to n-gram language models with n > 3. |
Distributed representations | For each training update, we read an n-gram x = (w_1, …, w_n).
Distributed representations | We also create a corrupted or noise n-gram x̃.
Distributed representations | • We corrupt the last word of each n-gram.
Abstract | As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. |
Introduction | In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right. |
Introduction | Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007). |
Introduction | 3. efficient integration of n-gram language model: as translation grows left-to-right in our algorithm, integrating n-gram language models is straightforward. |
Automated Approaches to Deceptive Opinion Spam Detection | In contrast to the other strategies just discussed, our text categorization approach to deception detection allows us to model both content and context with n-gram features. |
Automated Approaches to Deceptive Opinion Spam Detection | Specifically, we consider the following three n-gram feature sets, with the corresponding features lowercased and unstemmed: UNIGRAMS, BIGRAMS+, TRIGRAMS+, where the superscript + indicates that the feature set subsumes the preceding feature set. |
Automated Approaches to Deceptive Opinion Spam Detection | We consider all three n-gram feature sets, namely UNIGRAMS, BIGRAMS+, and TRIGRAMS+, with corresponding language models smoothed using the interpolated Kneser-Ney method (Chen and Goodman, 1996). |
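The subsumption convention (BIGRAMS+ includes all unigrams, TRIGRAMS+ additionally includes all bigrams) can be sketched like this; the function name and toy sentence are illustrative:

```python
from collections import Counter

def cumulative_ngram_sets(tokens, max_n):
    """Feature sets where each order subsumes all lower orders (the '+' convention)."""
    feats = Counter()
    sets = {}
    for n in range(1, max_n + 1):
        feats += Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        sets[n] = Counter(feats)  # snapshot: order n plus everything below it
    return sets

sets = cumulative_ngram_sets("a b c".split(), 3)
# sets[2] (BIGRAMS+) holds 3 unigrams and 2 bigrams, i.e. 5 feature types
```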
Introduction | Notably, a combined classifier with both n-gram and psychological deception features achieves nearly 90% cross-validated accuracy on this task. |
Results and Discussion | Surprisingly, models trained only on UNIGRAMS (the simplest n-gram feature set) outperform all non-text-categorization approaches, and models trained on BIGRAMS+ perform even better (one-tailed sign test p = 0.07).
Conclusion | A novel technique was also proposed to rank n-gram phrases where relevance based ranking was used in conjunction with a semi-supervised generative model. |
Empirical Evaluation | The reduced dataset consists of 1,095,586 tokens (after n-gram preprocessing in §4) and 40,102 posts, with an average of 27 posts or interactions per pair.
Model | Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary. |
Phrase Ranking based on Relevance | While this is reasonable, a significant n-gram with high likelihood score may not necessarily be relevant to the problem domain. |
Phrase Ranking based on Relevance | There is nothing wrong with this per se, because the statistical tests only judge the significance of an n-gram, but a significant n-gram may not necessarily be relevant in a given problem domain.
Abstract | The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. |
Paraphrase Evaluation Metrics | Thus, no n-gram overlaps are required to determine the semantic adequacy of the paraphrase candidates. |
Paraphrase Evaluation Metrics | In essence, it is the inverse of BLEU since we want to minimize the number of n-gram overlaps between the two sentences. |
Paraphrase Evaluation Metrics | where N is the maximum n-gram considered and n- |
A Syntax Free Sequence-oriented Sentence Compression Method | Many studies on sentence compression employ the n-gram language model to evaluate the linguistic likelihood of a compressed sentence. |
A Syntax Free Sequence-oriented Sentence Compression Method | The n-gram distribution of short sentences may differ from that of long sentences.
A Syntax Free Sequence-oriented Sentence Compression Method | Therefore, the n-gram probability sometimes disagrees with our intuition in terms of sentence compression. |
Experimental Evaluation | We developed the n-gram language model from a 9 year set of Mainichi Newspaper articles. |
Results and Discussion | This result shows that the n-gram language model is improper for sentence compression because the n-gram probability is computed by using a corpus that includes both short and long sentences. |
Experiments | In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, HS denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in the translation, Slex denotes surrounding word features in the source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction | In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation. |
Our Method | For target features, the n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure word.
Our Method | Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position. Source features: MW-HW collocation, surrounding words, source head word, POS tags.
Experiments | RESPONSE The n-gram and emotion features induced from the response. |
Experiments | The n-gram and emotion features induced from the response and the addressee’s utterance. |
Predicting Addressee’s Emotion | We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion | The extracted n-grams activate another set of binary n-gram features. |
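Binary presence features over all n-grams up to length 3 can be sketched as a set of tuples (the function name is hypothetical):

```python
def binary_ngram_features(tokens, max_n=3):
    """All n-grams with n <= max_n as presence (on/off) features."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

feats = binary_ngram_features("i am happy".split())
# 3 unigrams + 2 bigrams + 1 trigram = 6 binary features
```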
Decipherment Model for Machine Translation | For P(e), we use a word n-gram language model (LM) trained on monolingual target text. |
Decipherment Model for Machine Translation | Generate a target (e.g., English) string e = e_1 … e_n, with probability P(e) according to an n-gram language model.
Feature-based representation for Source and Target | For instance, context features for word w may include other words (or phrases) that appear in the immediate context (n-gram window) surrounding w in the monolingual corpus.
Feature-based representation for Source and Target | The feature construction process is described in more detail below. Target Language: We represent each word (or phrase) e_i with the following contextual features along with their counts: (a) f_−context: every (word n-gram, position) pair immediately preceding e_i in the monolingual corpus (n=1, position=−1), (b) similar features f_+context to model the context following e_i, and (c) we also throw in generic context features f_context without position information: every word that co-occurs with e_i in the same sentence.
Experiments and Results | The performance of WTMF on CDR is compared with (a) an Information Retrieval model (IR) that is based on surface word matching, (b) an n-gram model (N-gram) that captures phrase overlaps by returning the number of overlapping n-grams as the similarity score of two sentences, (c) LSA that uses the svds() function in Matlab, and (d) LDA that uses Gibbs sampling for inference (Griffiths and Steyvers, 2004).
Experiments and Results | The similarity of two sentences is computed by cosine similarity (except for N-gram).
Experiments and Results | We mainly compare the performance of the IR, N-gram, LSA, LDA, and WTMF models.
Architecture of BRAINSUP | N-gram likelihood. |
Architecture of BRAINSUP | This is simply the likelihood of a sentence estimated by an n-gram language model, to enforce the generation of well-formed word sequences. |
Architecture of BRAINSUP | When a solution is not complete, in the computation we include only the sequences of contiguous words (i.e., not interrupted by empty slots) having length greater than or equal to the order of the n-gram model. |
Evaluation | The four combinations of features are: base: Target-word scorer + N-gram likelihood + Dependency likelihood + Variety scorer + Unusual-words scorer + Semantic cohesion; base+D: all the scorers in base + Domain relatedness; base+D+C: all the scorers in base+D + Chromatic connotation; base+D+E: all the scorers in base+D + Emotional connotation; base+D+P: all the scorers in base+D + Phonetic features. |
Experimental Evaluation | BLEUR includes the following 18 sentence-level scores: BLEU-n and n-gram precision scores (1 ≤ n ≤ 4); BLEU brevity penalty (BP); BLEU score divided by BP.
Experimental Evaluation | To counteract BLEU’s brittleness at the sentence level, we also smooth BLEU-n and n-gram precision as in Lin and Och (2004). |
Experimental Evaluation | NIST-n scores (1 ≤ n ≤ 10) and information-weighted n-gram precision scores (1 ≤ n ≤ 4); NIST brevity penalty (BP); and NIST score divided by BP.
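The sentence-level smoothing of n-gram precision mentioned above can be illustrated with a simple add-one variant in the spirit of Lin and Och (2004); the exact smoothing formula used in the paper may differ:

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def smoothed_precision(hyp, ref, n):
    """Clipped n-gram precision; add-one smoothing for n > 1 keeps a
    single missing higher-order match from zeroing the sentence score."""
    h, r = ngram_counts(hyp.split(), n), ngram_counts(ref.split(), n)
    matches = sum(min(c, r[g]) for g, c in h.items())
    total = sum(h.values())
    if n == 1:
        return matches / total if total else 0.0
    return (matches + 1) / (total + 1)
```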
Introduction | BLEU and NIST measure MT quality by using the strong correlation between human judgments and the degree of n-gram overlap between a system hypothesis translation and one or more reference translations. |
Abstract | In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores). |
Discussion and Conclusions | While Post found that such a system can effectively distinguish grammatical news text sentences from sentences generated by a language model, measuring the grammaticality of real sentences from language learners seems to require a wider variety of features, including n-gram counts, language model scores, etc.
Experiments | n-gram frequencies from Gigaword and whether the link parser can fully parse the sentence. |
System Description | 3.2.2 n-gram Count and Language Model Features |
Abstract | As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. |
Abstract | N-gram based metrics assume that “good” translations tend to share the same lexical choices as the reference translations.
Abstract | As MT systems improve, the shortcomings of the n-gram based evaluation metrics are becoming more apparent. |
Experiments | The SEG-I method requires access to a large web n-gram corpus (Brants and Franz, 2006).
Experiments | where S_Q is the set of all possible query segmentations, S is a possible segmentation, s is a segment in S, and count(s) is the frequency of s in the web n-gram corpus.
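A minimal sketch of this count-based segmentation scoring, with hypothetical counts standing in for the web n-gram corpus; real scorers typically normalize or weight counts rather than multiplying them raw:

```python
from functools import reduce

# Hypothetical counts standing in for a web n-gram corpus.
COUNTS = {"new york": 20_000_000, "times square": 5_000_000,
          "york times": 1_000_000}

def segmentations(words):
    """All ways to split a word list into contiguous segments."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = " ".join(words[:i])
        for rest in segmentations(words[i:]):
            yield [head] + rest

def best_segmentation(query):
    """Pick the segmentation whose segments' corpus counts have the
    largest product (unseen segments get count 0, i.e. score 0)."""
    def score(seg):
        return reduce(lambda acc, s: acc * COUNTS.get(s, 0), seg, 1)
    return max(segmentations(query.split()), key=score)
```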
Experiments | (2009), and include, among others, n-gram frequencies in a sample of a query log, web corpus and Wikipedia titles. |
Independent Query Annotations | (2010), an estimate of p(Ci|r) is a smoothed estimator that combines the information from the retrieved sentence r with the information about unigrams (for capitalization and POS tagging) and bigrams (for segmentation) from a large n-gram corpus (Brants and Franz, 2006).
Model | trailing word in the n-gram ). |
Model | Intuitively, an update to the parameter associated with wi occurs after the learner observes word wi in a context (this may be an n-gram, an entire sentence or paragraph containing wi, but we will restrict our attention to fixed-length n-grams).
Model | For this task, we focus only on the following context features for predicting the “predictability” of words: n-gram probability, vector-space similarity score, coreferring mentions. |
Abstract | On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. |
Experimental Evaluation | being approximately 15 to 20 times faster than their n-gram based approach. |
Experimental Evaluation | To summarize: Our method is significantly faster than n-gram LM based approaches and obtains better results than any previously published method. |
Translation Model | Stochastically generate the target sentence according to an n-gram language model. |
Decoder Integration | One exception is the n-gram language model which requires the preceding n − 1 words as well.
Decoder Integration | To solve this problem, we follow previous work on lattice rescoring with recurrent networks that maintained the usual n-gram context but kept a beam of hidden layer configurations at each state (Auli et al., 2013). |
Decoder Integration | This approximation has been effective for lattice rescoring, since the translations represented by each state are in fact very similar: They share both the same source words as well as the same n-gram context which is likely to result in similar recurrent histories that can be safely pruned. |
Introduction | Decoding with feed-forward architectures is straightforward, since predictions are based on a fixed size input, similar to n-gram language models. |
BLEU and PORT | First, define n-gram precision p(n) and recall r(n): |
BLEU and PORT | where Pg(N) is the geometric average of n-gram precisions
BLEU and PORT | The average precision and average recall used in PORT (unlike those used in BLEU) are the arithmetic average of n-gram precisions Pa(N) and recalls Ra(N): |
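The contrast between BLEU's geometric averaging and PORT's arithmetic averaging of n-gram precisions can be made concrete (illustrative precision values only):

```python
import math

def geometric_avg(precisions):
    """BLEU-style geometric average; conventionally 0 if any precision is 0."""
    if min(precisions) == 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))

def arithmetic_avg(precisions):
    """PORT-style arithmetic average: robust to a single zero precision."""
    return sum(precisions) / len(precisions)

ps = [0.8, 0.4, 0.2, 0.1]  # hypothetical p(1)..p(4)
```

The arithmetic average stays positive even when a higher-order precision is zero, one reason sentence-level metrics avoid pure geometric averaging.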
Experiments | As usual, French-English is the outlier: the two outputs here are typically so similar that BLEU and Qmean tuning yield very similar n-gram statistics. |
Related Work | Smoothing in NLP usually refers to the problem of smoothing n-gram models. |
Related Work | Sophisticated smoothing techniques like modified Kneser-Ney and Katz smoothing (Chen and Goodman, 1996) smooth together the predictions of unigram, bi-gram, trigram, and potentially higher n-gram sequences to obtain accurate probability estimates in the face of data sparsity. |
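The interpolation idea behind such smoothing can be sketched with a fixed-weight bigram/unigram mixture; note this is far simpler than modified Kneser-Ney, which uses count-dependent discounts and adjusted lower-order distributions:

```python
from collections import Counter

def train_interpolated_lm(corpus, lam=0.7):
    """Toy interpolated model:
    p(w | prev) = lam * p_MLE(w | prev) + (1 - lam) * p_MLE(w)."""
    tokens = corpus.split()
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def p(w, prev):
        p_uni = uni[w] / total
        p_bi = bi[(prev, w)] / uni[prev] if uni[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni
    return p

p = train_interpolated_lm("the cat sat the cat ran")
```

Even for an unseen bigram the mixture backs off to the unigram estimate, so no probability is exactly zero for observed words.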
Related Work | While n-gram models have traditionally dominated in language modeling, two recent efforts de- |
Features and Training | Here simple n-gram similarity is used for the sake of efficiency. |
Features and Training | Like GC , there are four features with respect to the value of n in n-gram similarity measure. |
Graph Construction | Solid lines are edges connecting nodes with sufficient source side n-gram similarity, such as the one between "E A M N" and "E A B C". |
Introduction | Collaborative decoding (Li et al., 2009) scores the translation of a source span by its n-gram similarity to the translations by other systems. |
Methods | In order to annotate lattice edges with an n-gram LM, every path coming into a node must end with the same sequence of (n − 1) tokens.
Methods | Programmatic composition of a lattice with an n-gram LM acceptor is a well understood problem. |
Methods | With each node corresponding to a single LM context, annotation of outgoing edges with n-gram LM scores is straightforward. |
Experiments | Feature sources: The n-gram semantic features are extracted from the Google n-grams corpus (Brants and Franz, 2006), a large collection of English n-grams (for n = 1 to 5) and their frequencies computed from almost 1 trillion tokens (95 billion sentences) of Web text. |
Features | 3.2 Semantic Features 3.2.1 Web n-gram Features Patterns and counts: Hypernymy for a term pair |
Features | Hence, we fire Web n-gram pattern features and Wikipedia presence, distance, and pattern features, similar to those described above, on each potential sibling term pair.7 The main difference here from the edge factors is that the sibling factors are symmetric (in the sense that S_{j,k} is redundant to S_{k,j}) and hence the patterns are undirected.
Hard cases and annotation errors | the longest sequence of words (n-gram) in a corpus that has been observed with a token being tagged differently in another occurrence of the same n-gram in the same corpus.
Hard cases and annotation errors | For each variation n-gram that we found in WSJ-00, i.e., a word in various contexts and the possible tags associated with it, we present annotators with the cross product of contexts and tags.
Hard cases and annotation errors | The figure shows, for instance, that the variation n-gram regarding ADP-ADV is the second most frequent one (dark gray), and approximately 70% of ADP-ADV disagreements are linguistically hard cases (light gray). |
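A minimal sketch of extracting variation n-grams from a tagged corpus; the toy corpus and tags below are invented, and the real method additionally records which token position varies:

```python
from collections import defaultdict

def variation_ngrams(tagged, n):
    """Word n-grams observed with different tag assignments across
    occurrences: candidate annotation errors or genuinely hard cases."""
    seen = defaultdict(set)  # word n-gram -> set of tag n-grams
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    for i in range(len(words) - n + 1):
        seen[tuple(words[i:i + n])].add(tuple(tags[i:i + n]))
    return {gram for gram, variants in seen.items() if len(variants) > 1}

corpus = [("I", "PRON"), ("saw", "VERB"), ("her", "PRON"),
          ("duck", "NOUN"), (".", "PUNCT"),
          ("I", "PRON"), ("saw", "VERB"), ("her", "PRON"),
          ("duck", "VERB"), (".", "PUNCT")]
```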
A Skeleton-based Approach to MT 2.1 Skeleton Identification | For language modeling, lm is the standard n-gram language model adopted in the baseline system. |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | In such a way of string representation, the skeletal language model can be implemented as a standard n-gram language model, that is, a string probability is calculated by a product of a sequence of n-gram probabilities (involving normal words and X). |
A Skeleton-based Approach to MT 2.1 Skeleton Identification | The skeletal language model is then trained on these generalized strings in a standard way of n-gram language modeling. |
Models | The N-gram approximation of the joint probability can be defined in terms of multigrams qi as: |
Models | N-gram models of order > 1 did not work well because these models tended to learn noise (information from non-transliteration pairs) in the training data. |
Previous Research | (2010) submitted another system based on a standard n-gram kernel which ranked first for the English/Hindi and English/Tamil tasks.6 For the English/Arabic task, the transliteration mining system of Noeman and Madkour (2010) was best. |
Discriminative Reranking for OCR | Word LM features (“LM-word”) include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, …
Discriminative Reranking for OCR | Semantic coherence feature (“SemCoh”) is motivated by the fact that semantic information can be very useful in modeling the fluency of phrases, and can augment the information provided by n-gram LMs. |
Introduction | The BBN Byblos OCR system (Natajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output. |
Evaluation | Further examination of the differences between the two systems yielded that most of the improvements are due to better bigrams and trigrams, as indicated by the breakdown of the BLEU score precision per n-gram , and primarily leverages higher quality generated candidates from the baseline system. |
Evaluation | Furthermore, despite completely unaligned, non-comparable monolingual text on the Urdu and English sides, and a very large language model, we can still achieve gains in excess of 1.2 BLEU points (“SLP”) in a difficult evaluation scenario, which shows that the technique adds a genuine translation improvement over and above naïve memorization of n-gram sequences.
Generation & Propagation | A naïve way to achieve this goal would be to extract all n-grams, from n = 1 to a maximum n-gram order, from the monolingual data, but this strategy would lead to a combinatorial explosion in the number of target phrases.
Submodularity in Summarization | By definition (Lin, 2004), ROUGE-N is the n-gram recall between a candidate summary and a set of reference summaries. |
Submodularity in Summarization | Precisely, let S be the candidate summary (a set of sentences extracted from the ground set V), c_e : 2^V → Z+ be the number of times n-gram e occurs in summary S, and R_i be the set of n-grams contained in the reference summary i (suppose we have K reference summaries, i.e., i = 1, ..., K).
Submodularity in Summarization | where r_{e,i} is the number of times n-gram e occurs in reference summary i.
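ROUGE-N recall as defined above (clipped n-gram matches summed over the K reference summaries, divided by the total number of reference n-grams) can be sketched as:

```python
from collections import Counter

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=2):
    """ROUGE-N recall: clipped n-gram matches over total reference
    n-grams, accumulated across all reference summaries."""
    cand = ngram_counts(candidate.split(), n)
    matches = total = 0
    for ref in references:
        rc = ngram_counts(ref.split(), n)
        matches += sum(min(cand[g], c) for g, c in rc.items())
        total += sum(rc.values())
    return matches / total if total else 0.0
```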
Experiments | The NIST metric clearly shows a significant improvement, because it mostly measures difficult n-gram matches (e.g., due to the long-distance rules we have been dealing with).
Experiments | In n-gram based metrics, the scores for all words are equally weighted, so mistakes on crucial sentence constituents may be penalized the same as errors on redundant or meaningless words (Callison-Burch et al., 2006). |
Introduction | Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
Related Work | Chen (2003) uses an n-gram model and Viterbi decoder as a syllabifier, and then applies it as a preprocessing step in his maximum-entropy-based English L2P system. |
Syllabification with Structured SVMs | In addition to these primary n-gram features, we experimented with linguistically-derived features. |
Syllabification with Structured SVMs | We believe that this is caused by the ability of the SVM to learn such generalizations from the n-gram features alone. |
Features | Trying to find phrase translations for any possible n-gram is not a good idea for two reasons. |
Features | We will define a confidence metric to estimate how reliably the model can align an n-gram on one side to a phrase on the other side given a parallel sentence.
Features | Now we turn to monolingual resources to evaluate the quality of an n-gram being a good phrase. |
The Model | In this model the distribution of the overall sentiment rating y_ov is based on all the n-gram features of a review text.
The Model | Then the distribution of y_a, for every rated aspect a, can be computed from the distribution of y_ov and from any n-gram feature where at least one word in the n-gram is assigned to the associated aspect topic (r = loc, z = a).
The Model | b_y is the bias term which regulates the prior distribution P(y_a = y), f iterates through all the n-grams, J_{f,y} and J^a_{f,y} are common weights and aspect-specific weights for n-gram feature f. p^a_f is equal to the fraction of words in n-gram feature f assigned to the aspect topic (r = loc, z = a).
Introduction | Although paraphrase identification is defined in semantic terms, it is usually solved using statistical classifiers based on shallow lexical, n-gram , and syntactic “overlap” features. |
Introduction | We use a product of experts (Hinton, 2002) to bring together a logistic regression classifier built from n-gram overlap features and our syntactic model. |
Product of Experts | The features are of the form precision_n (number of n-gram matches divided by the number of n-grams in S1), recall_n (number of n-gram matches divided by the number of n-grams in S2) and F_n (harmonic mean of the previous two features), where 1 ≤ n ≤ 3.
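A sketch of these overlap features; counting distinct matches here is a simplification, since the original features may handle duplicate n-grams differently:

```python
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def overlap_features(s1, s2, max_n=3):
    """precision_n, recall_n, and F_n for n = 1..max_n over two sentences."""
    feats = {}
    t1, t2 = s1.split(), s2.split()
    for n in range(1, max_n + 1):
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        matches = len(set(g1) & set(g2))
        p = matches / len(g1) if g1 else 0.0
        r = matches / len(g2) if g2 else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        feats.update({f"precision{n}": p, f"recall{n}": r, f"F{n}": f})
    return feats

feats = overlap_features("the cat sat", "the cat ran")
```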
Overall architecture | The unseen rate of n-grams varies according to the simulated user.
Overall architecture | Notice that simulated users C, E and H generate higher rates of unseen n-gram patterns over all word error settings.
Related work | N-gram based approaches (Eckert et al., 1997, Levin et al., 2000) and other approaches (Scheffler and Young, 2001, Pietquin and Dutoit, 2006, Schatzmann et al., 2007) are introduced. |
Translation Selection | We use smoothed sentence-level BLEU score to replace the human assessments, where we use additive smoothing to avoid zero BLEU scores when we calculate the n-gram precisions. |
Translation Selection | 1-4 n-gram precisions against pseudo references (1 ≤ n ≤ 4)
Translation Selection | 15-19 n-gram precisions against a target corpus (1 ≤ n ≤ 5)
Introduction | Aside from including dependency and n-gram relations in the scoring, we also apply and evaluate SemPOS for English. |
Problems of BLEU | Table 1 estimates the overall magnitude of this issue: For 1-grams to 4-grams in 1640 instances (different MT outputs and different annotators) of 200 sentences with manually flagged errors4, we count how often the n-gram is confirmed by the reference and how often it contains an error flag. |
Problems of BLEU | Fortunately, there are relatively few false positives in n-gram based metrics: 6.3% of unigrams and far fewer higher n-grams. |
Automated Classification | Web 1T N-gram Features |
Automated Classification | Table 3 describes the extracted n-gram features. |
Automated Classification | The influence of the Web 1T n-gram features was somewhat mixed.
MBR-based Answering Re-ranking | 0 answer-level n-gram correlation feature: |
MBR-based Answering Re-ranking | where w denotes an n-gram in A, and #w(A_k) denotes the number of times that w occurs in A_k.
MBR-based Answering Re-ranking | o passage-level n-gram correlation feature: |
Conclusion | In our approach, n-gram sub-sequences of transitions per term in the discourse role matrix then constitute the more fine-grained evidence used in our model to distinguish coherence from incoherence. |
Using Discourse Relations | tion of the n-gram discourse relation transition sequences in gold standard coherent text, and a similar one for incoherent text. |
Using Discourse Relations | In our pilot work where we implemented such a basic model with n-gram features for relation transitions, the performance was very poor. |
Background: Hypergraphs | The second step is to integrate an n-gram language model with this hypergraph. |
Introduction | The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model.1 The hypergraph represents a large set of possible translations, and is created by applying a synchronous grammar to the source language string. |
Introduction | The language model is then used to rescore the translations encoded by the hypergraph. Decoding with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process.
Experiments | Due to lower frequency of higher-order n-grams (as opposed to unigrams), higher-order n-gram language models are more sparse, which increases the probability of missing a particular sentiment marker in a sentence (Table 33). |
Experiments | training sets are required to overcome this higher n-gram sparseness in sentence-level annotation. |
Experiments | results depends on the genre and size of the n-gram: on product reviews, all results are statistically significant at the α = 0.025 level; on movie reviews, the difference between Naïve Bayes and SVM is statistically significant at α = 0.01 but the significance diminishes as the size of the n-gram increases; on news, only bigrams produce a statistically significant (α = 0.01) difference between the two machine learning methods, while on blogs the difference between SVMs and Naïve Bayes is most pronounced when unigrams are used (α = 0.025).
Experimental Setup | Standard n-gram features are also used as features.4 The feature model is built as follows: for every lemma in the f-structure, we extract a set of morphological properties (definiteness, person, pronominal status etc. |
Experimental Setup | use several standard measures: a) exact match: how often does the model select the original corpus sentence, b) BLEU: n-gram overlap between top-ranked and original sentence, c) NIST: modification of BLEU giving more weight to less frequent n-grams. |
Related Work | The first widely known data-driven approach to surface realisation, or tactical generation (Langkilde and Knight, 1998), used language-model n-gram statistics on a word lattice of candidate realisations to guide a ranker.
Abstract | To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way. |
Introduction | In order to harness the information on the Web without presupposing a deep understanding of all Web text, we instead turn to a diverse collection of Web n-gram counts (Brants and Franz, 2006) which, in aggregate, contain diffuse and indirect, but often robust, cues to reference. |
Semantics via Web Features | These clusters come from distributional K -Means clustering (with K = 1000) on phrases, using the n-gram context as features. |
Conclusion & Future Work | In addition, we can extend our approach by applying some of the techniques used in other system combination approaches such as consensus decoding, using n-gram features, tuning using forest-based MERT, among other possible extensions. |
Related Work 5.1 Domain Adaptation | In other words, it requires all component models to fully decode each sentence, compute n-gram expectations from each component model and calculate posterior probabilities over translation derivations. |
Related Work 5.1 Domain Adaptation | Finally, main techniques used in this work are orthogonal to our approach such as Minimum Bayes Risk decoding, using n-gram features and tuning using MERT. |
Sentence Completion via Language Modeling | 3.1 Backoff N-gram Language Model |
Sentence Completion via Language Modeling | 3.2 Maximum Entropy Class-Based N-gram Language Model |
Sentence Completion via Latent Semantic Analysis | 4.3 A LSA N-gram Language Model |
Learning Class Attributes | We extract prevalent common nouns for males and females by selecting only those nouns that (a) occur more than 200 times in the dataset, (b) mostly occur with male or female pronouns, and (c) occur as lowercase more often than uppercase in a web-scale N-gram corpus (Lin et al., 2010). |
Learning Class Attributes | We obtain the best of both worlds by matching our precise pattern against a version of the Google N-gram Corpus that includes the part-of-speech tag distributions for every N-gram (Lin et al., 2010). |
Twitter Gender Prediction | We include n-gram features with the original capitalization pattern and separate features with the n-grams lower-cased.