Index of papers in Proc. ACL 2013 that mention
  • n-gram
Chong, Tze Yuang and Banchs, Rafael E. and Chng, Eng Siong and Li, Haizhou
Abstract
We attempt to extract this information from history-contexts of up to ten words in size, and found that it complements the n-gram model well, since the n-gram model inherently suffers from data scarcity in learning long history-contexts.
Introduction
The commonly used n-gram model (Bahl et al.
Introduction
Although n-gram models are simple and effective, modeling long history-contexts leads to severe data scarcity problems.
Language Modeling with TD and TO
The prior, which is usually implemented as a unigram model, can be also replaced with a higher order n-gram model as, for instance, the bigram model:
Language Modeling with TD and TO
Replacing the unigram model with a higher order n-gram model is important to compensate for the damage incurred by the conditional independence assumption made earlier.
Motivation of the Proposed Approach
In the n-gram model, for example, these two attributes are jointly taken into account in the ordered word-sequence.
Motivation of the Proposed Approach
Consequently, the n-gram model can only be effectively implemented within a short history-context (e.g., of size three or four).
Motivation of the Proposed Approach
However, intermediate distances beyond the n-gram model limits can be very useful and should not be discarded.
Related Work
2007) disassembles the n-gram into (n−1) word-pairs, such that each pair is modeled by a distance-k bigram model, where 1 ≤ k ≤ n − 1.
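As an illustration of this decomposition (a sketch, not the authors' formulation), an n-gram history can be split into distance-tagged word pairs that a distance-k bigram model could then score:

    def distance_k_pairs(ngram):
        """Split an n-gram (w_1 ... w_n) into (n-1) word pairs (w_i, w_n),
        each tagged with its distance k = n - i, where 1 <= k <= n - 1."""
        *history, target = ngram
        n = len(ngram)
        return [(history[i], target, n - 1 - i) for i in range(n - 1)]

    # Example: a 4-gram yields three distance-tagged pairs.
    print(distance_k_pairs(("the", "cat", "sat", "down")))
    # [('the', 'down', 3), ('cat', 'down', 2), ('sat', 'down', 1)]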
n-gram is mentioned in 20 sentences in this paper.
Persing, Isaac and Ng, Vincent
Error Classification
While we employ seven types of features (see Sections 4.2 and 4.3), only the word n-gram features are subject to feature selection. Specifically, we employ
Error Classification
the top n_i n-gram features as selected according to information gain computed over the training data (see Yang and Pedersen (1997) for details).
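For concreteness, a minimal sketch of ranking binary n-gram features by information gain and keeping the top n_i of them; the exact selection criterion and tie-breaking used in the paper may differ.

    import math
    from collections import Counter

    def entropy(label_counts):
        total = sum(label_counts)
        return -sum((c / total) * math.log2(c / total) for c in label_counts if c > 0)

    def information_gain(docs, labels, feature):
        """IG of a binary n-gram feature: H(label) - H(label | feature present/absent)."""
        present = [y for d, y in zip(docs, labels) if feature in d]
        absent = [y for d, y in zip(docs, labels) if feature not in d]
        h_label = entropy(Counter(labels).values())
        h_cond = sum(len(part) / len(labels) * entropy(Counter(part).values())
                     for part in (present, absent) if part)
        return h_label - h_cond

    def top_ngram_features(docs, labels, k):
        """docs: one set of n-gram features per training document."""
        vocab = set().union(*docs)
        return sorted(vocab, key=lambda f: information_gain(docs, labels, f), reverse=True)[:k]

    # Toy usage: four documents from two error classes.
    docs = [{"the", "the essay"}, {"a", "a thesis"}, {"the", "the thesis"}, {"a", "an essay"}]
    labels = ["good", "bad", "good", "bad"]
    print(top_ngram_features(docs, labels, 2))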
Error Classification
Aggregated word n-gram features.
Evaluation
Our Baseline system, which only uses word n-gram and random indexing features, seems to perform uniformly poorly across both micro and macro F-scores (see row 1).
Score Prediction
Before tuning the feature selection parameter, we have to sort the list of n-gram features occurring in the training set.
n-gram is mentioned in 14 sentences in this paper.
Roark, Brian and Allauzen, Cyril and Riley, Michael
Abstract
We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing.
Abstract
We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods.
Introduction
Smoothed n-gram language models are the de facto standard statistical models of language for a wide range of natural language applications, including speech recognition and machine translation.
Introduction
Such models are trained on large text corpora, by counting the frequency of n-gram collocations, then normalizing and smoothing (regularizing) the resulting multinomial distributions.
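As a rough illustration of the count-and-normalize step described here (before any smoothing), a minimal sketch of relative-frequency n-gram estimation; the sentence markers and data are illustrative only.

    from collections import Counter

    def mle_ngram_model(sentences, n):
        """Relative-frequency (unsmoothed) n-gram probabilities P(word | history)."""
        ngram_counts, history_counts = Counter(), Counter()
        for sent in sentences:
            tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
            for i in range(len(tokens) - n + 1):
                history, word = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
                ngram_counts[history + (word,)] += 1
                history_counts[history] += 1
        return {g: c / history_counts[g[:-1]] for g, c in ngram_counts.items()}

    corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
    bigram_probs = mle_ngram_model(corpus, 2)
    print(bigram_probs[("the", "cat")])  # 0.5: "cat" follows "the" in one of two cases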
Introduction
Briefly, the smoothing method reestimates lower-order n-gram parameters in order to avoid overestimating the likelihood of n-grams that already have ample probability mass allocated as part of higher-order n-grams.
Preliminaries
N-gram language models are typically presented mathematically in terms of words w, the strings (histories) h that precede them, and the suffixes of the histories (backoffs) h′ that are used in the smoothing recursion.
Preliminaries
where c(hw) is the count of the n-gram sequence hw.
Preliminaries
N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored.
n-gram is mentioned in 42 sentences in this paper.
Kozareva, Zornitsa
Task A: Polarity Classification
4.4 N-gram Evaluation and Results
Task A: Polarity Classification
N-gram features are widely used in a variety of classification tasks; therefore, we also use them in our polarity classification task.
Task A: Polarity Classification
Figure 2 shows a study of the influence of the different information sources and their combination with n-gram features for English.
Task B: Valence Prediction
The Farsi and Russian regression models are based only on n-gram features, while the English and Spanish regression models have both n-gram and LIWC features.
n-gram is mentioned in 9 sentences in this paper.
Kauchak, David
Introduction
Table 1 shows the n-gram overlap proportions in a sentence-aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a). The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data?
Introduction
n-gram size:        1     2     3     4     5
simple in normal  0.96  0.80  0.68  0.61  0.55
normal in simple  0.87  0.68  0.58  0.51  0.46
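A minimal sketch of how overlap proportions of this kind could be computed; whether the paper counts n-gram types or tokens is not specified in these excerpts, so the token-based counting below is an assumption.

    def ngrams(tokens, n):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    def overlap_proportion(side_a, side_b, n):
        """Fraction of n-gram tokens on side_a that also occur anywhere on side_b."""
        b_set = {g for sent in side_b for g in ngrams(sent, n)}
        a_list = [g for sent in side_a for g in ngrams(sent, n)]
        return sum(g in b_set for g in a_list) / len(a_list) if a_list else 0.0

    simple = [["the", "cat", "sat", "down"]]
    normal = [["the", "cat", "sat", "on", "the", "mat"]]
    for n in range(1, 4):
        print(n, round(overlap_proportion(simple, normal, n), 2))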
Why Does Unsimplified Data Help?
trained using a smoothed version of the maximum likelihood estimate for an n-gram.
Why Does Unsimplified Data Help?
count(bc), where count(·) is the number of times the n-gram occurs in the training corpus.
Why Does Unsimplified Data Help?
For interpolated and backoff n-gram models, these counts are smoothed based on the probabilities of lower order n-gram models, which are in turn calculated based on counts from the corpus.
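For concreteness, a toy sketch of a count-based model that backs off to lower-order estimates when a higher-order n-gram is unseen; the fixed backoff factor and add-one floor are illustrative simplifications, not the interpolated or discount-based smoothing the excerpt refers to.

    from collections import Counter

    class BackoffLM:
        """Toy count-based LM that backs off to lower-order n-grams."""

        def __init__(self, sentences, order=3, alpha=0.4):
            self.order, self.alpha = order, alpha
            self.counts = Counter()
            for sent in sentences:
                tokens = ["<s>"] * (order - 1) + sent + ["</s>"]
                for n in range(1, order + 1):
                    for i in range(len(tokens) - n + 1):
                        self.counts[tuple(tokens[i:i + n])] += 1
            self.total_unigrams = sum(c for g, c in self.counts.items() if len(g) == 1)

        def prob(self, word, history=()):
            history = tuple(history)[-(self.order - 1):] if self.order > 1 else ()
            if history:
                num = self.counts[history + (word,)]
                den = self.counts[history]
                if num > 0 and den > 0:
                    return num / den                      # seen: relative frequency
                return self.alpha * self.prob(word, history[1:])  # unseen: back off
            return (self.counts[(word,)] + 1) / (self.total_unigrams + 1)  # crude floor

    lm = BackoffLM([["the", "cat", "sat"], ["the", "dog", "sat"]], order=2)
    print(lm.prob("cat", ["the"]))   # seen bigram
    print(lm.prob("mat", ["the"]))   # unseen bigram: backed-off unigram estimate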
n-gram is mentioned in 8 sentences in this paper.
Baldwin, Tyler and Li, Yunyao and Alexe, Bogdan and Stanoi, Ioana R.
Experimental Evaluation
N-gram suggests no non-referential instances
Experimental Evaluation
Effectiveness To understand the contribution of the n-gram (NG), ontology (ON), and clustering (CL) based modules, we ran each separately, as well as every possible combination.
Experimental Evaluation
Of the three individual modules, the n-gram and clustering methods achieve F-measure of around 0.9, while the ontology-based module performs only modestly above baseline.
Term Ambiguity Detection (TAD)
This module examines n-gram data from a large text collection.
Term Ambiguity Detection (TAD)
The rationale behind the n-gram module is based on the understanding that terms appearing in non-named entity contexts are likely to be non-referential, and terms that can be non-referential are ambiguous.
Term Ambiguity Detection (TAD)
Since we want the ambiguity determination to be fast, we develop our method to make this judgment solely from the n-gram probability, without the need to examine each individual usage context.
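Purely as an illustration of a probability-only ambiguity judgment (the actual decision rule, counts, and threshold used by the module are not given in these excerpts), a sketch with hypothetical n-gram counts:

    # Hypothetical precomputed n-gram counts: term -> (lowercase count, capitalized count).
    NGRAM_COUNTS = {
        "apple": (120_000, 480_000),   # made-up numbers for illustration
        "microsoft": (1_500, 950_000),
    }

    def is_ambiguous(term, lowercase_ratio_threshold=0.05):
        """Flag a term as ambiguous if it appears lowercased (i.e., in likely
        non-named-entity contexts) often enough, judged from n-gram counts alone."""
        lower, upper = NGRAM_COUNTS.get(term.lower(), (0, 0))
        total = lower + upper
        return total > 0 and lower / total >= lowercase_ratio_threshold

    print(is_ambiguous("Apple"))      # True under these made-up counts
    print(is_ambiguous("Microsoft"))  # False under these made-up counts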
n-gram is mentioned in 7 sentences in this paper.
Li, Haibo and Zheng, Jing and Ji, Heng and Li, Qi and Wang, Wen
Baseline MT
The LM used for decoding is a log-linear combination of four word n-gram LMs which are built on different English
Name-aware MT Evaluation
where w_n is a set of positive weights summing to one, usually set uniformly as w_n = 1/N; c is the length of the system translation and r is the length of the reference translation; and p_n is the modified n-gram precision defined as: p_n = Σ_C Σ_{n-gram ∈ C} Count_clip(n-gram) / Σ_{C'} Σ_{n-gram' ∈ C'} Count(n-gram')
Name-aware MT Evaluation
As in the BLEU metric, we first count the maximum number of times an n-gram occurs in any single reference translation.
Name-aware MT Evaluation
The weight of an n-gram in reference translation is the sum of weights of all tokens it contains.
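A minimal sketch of the clipped ("modified") n-gram precision these excerpts describe, for one hypothesis against multiple references, following the standard BLEU definition rather than the paper's name-aware weighting.

    from collections import Counter

    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def modified_precision(hypothesis, references, n):
        """Clip each hypothesis n-gram count by its maximum count in any single reference."""
        hyp_counts = ngrams(hypothesis, n)
        max_ref_counts = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref_counts[g] = max(max_ref_counts[g], c)
        clipped = sum(min(c, max_ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        return clipped / total if total else 0.0

    hyp = "the the the cat".split()
    refs = ["the cat sat".split(), "there is a cat".split()]
    print(modified_precision(hyp, refs, 1))  # 0.5: "the" clipped to 1, "cat" matches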
n-gram is mentioned in 6 sentences in this paper.
Carpuat, Marine and Daume III, Hal and Henry, Katharine and Irvine, Ann and Jagarlamudi, Jagadeesh and Rudinger, Rachel
New Sense Indicators
N-gram Probability Features The goal of the Type:NgramProb feature is to capture the fact that “unusual contexts” might imply new senses.
New Sense Indicators
To capture this, we can look at the log probability of the word under consideration given its N-gram context, both according to an old-domain language model (call this ℓ^old) and a new-domain language
New Sense Indicators
From these four values, we compute corpus-level (and therefore type-based) statistics of the new-domain n-gram log probability (ℓ_n^new), the difference between the n-gram probabilities in each domain (ℓ_n^new − ℓ_n^old), the difference between the n-gram and unigram probabilities in the new domain (ℓ_n^new − ℓ_1^new), and finally the combined difference: ℓ_n^new − ℓ_n^old + ℓ_1^old − ℓ_1^new.
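To make the notation concrete, a sketch of the four type-level quantities under the ℓ naming used above (a reconstruction; the paper's exact symbols and grouping may differ), where ℓ_n and ℓ_1 are n-gram and unigram log probabilities under the new- and old-domain language models.

    def new_sense_statistics(l_n_new, l_n_old, l_1_new, l_1_old):
        """Four type-level statistics from n-gram (l_n) and unigram (l_1) log
        probabilities under new- and old-domain language models."""
        return {
            "new_ngram_logprob": l_n_new,
            "domain_difference": l_n_new - l_n_old,
            "ngram_vs_unigram_new": l_n_new - l_1_new,
            "combined_difference": l_n_new - l_n_old + l_1_old - l_1_new,
        }

    # Toy values (log probabilities are negative; made-up numbers).
    print(new_sense_statistics(l_n_new=-7.2, l_n_old=-4.1, l_1_new=-9.0, l_1_old=-8.8))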
n-gram is mentioned in 5 sentences in this paper.
Liu, Yang
Abstract
As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models.
Introduction
In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right.
Introduction
Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007).
Introduction
3. efficient integration of n-gram language model: as translation grows left-to-right in our algorithm, integrating n-gram language models is straightforward.
n-gram is mentioned in 5 sentences in this paper.
Mukherjee, Arjun and Liu, Bing
Conclusion
A novel technique was also proposed to rank n-gram phrases, where relevance-based ranking was used in conjunction with a semi-supervised generative model.
Empirical Evaluation
The reduced dataset consists of 1,095,586 tokens (after n-gram preprocessing in §4) and 40,102 posts, with an average of 27 posts or interactions per pair.
Model
Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary.
Phrase Ranking based on Relevance
While this is reasonable, a significant n-gram with high likelihood score may not necessarily be relevant to the problem domain.
Phrase Ranking based on Relevance
There is nothing wrong with this per se, because the statistical tests only judge the significance of an n-gram; a significant n-gram, however, may not necessarily be relevant in a given problem domain.
n-gram is mentioned in 5 sentences in this paper.
Hasegawa, Takayuki and Kaji, Nobuhiro and Yoshinaga, Naoki and Toyoda, Masashi
Experiments
RESPONSE The n-gram and emotion features induced from the response.
Experiments
The n-gram and emotion features induced from the response and the addressee’s utterance.
Predicting Addressee’s Emotion
We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion
The extracted n-grams activate another set of binary n-gram features.
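A minimal sketch of inducing binary n-gram features (n ≤ 3) from a tokenized utterance, in the spirit of these excerpts; the feature-name prefix is an invented convention.

    def binary_ngram_features(tokens, max_n=3, prefix="resp"):
        """Return the set of binary n-gram features (n <= max_n) fired by an utterance."""
        features = set()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                features.add(f"{prefix}_ngram={'_'.join(tokens[i:i + n])}")
        return features

    response = "that sounds great".split()
    print(sorted(binary_ngram_features(response)))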
n-gram is mentioned in 4 sentences in this paper.
Özbal, Gözde and Pighin, Daniele and Strapparava, Carlo
Architecture of BRAINSUP
N-gram likelihood.
Architecture of BRAINSUP
This is simply the likelihood of a sentence estimated by an n-gram language model, to enforce the generation of well-formed word sequences.
Architecture of BRAINSUP
When a solution is not complete, we include in the computation only the sequences of contiguous words (i.e., not interrupted by empty slots) whose length is greater than or equal to the order of the n-gram model.
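A sketch of scoring only the contiguous filled spans of a partial solution, as the excerpt describes; the span scorer is a stand-in callable, not a specific language-model API.

    def partial_ngram_loglik(slots, score_span, order=3):
        """Sum n-gram log-likelihoods over contiguous filled spans of a partial solution.

        slots: list of words, with None marking still-empty slots.
        score_span: callable returning a log-likelihood for a list of words
                    (a stand-in for whatever n-gram LM is actually used).
        Only spans at least as long as the n-gram order contribute.
        """
        total, span = 0.0, []
        for slot in slots + [None]:          # trailing None flushes the last span
            if slot is not None:
                span.append(slot)
            else:
                if len(span) >= order:
                    total += score_span(span)
                span = []
        return total

    # Toy scorer standing in for an n-gram LM: returns a made-up log-likelihood.
    toy_score = lambda words: -0.5 * len(words)
    print(partial_ngram_loglik(["the", "cat", None, "on", "the", "mat"], toy_score, order=3))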
Evaluation
The feature combinations are:
  • base: Target-word scorer + N-gram likelihood + Dependency likelihood + Variety scorer + Unusual-words scorer + Semantic cohesion
  • base+D: all the scorers in base + Domain relatedness
  • base+D+C: all the scorers in base+D + Chromatic connotation
  • base+D+E: all the scorers in base+D + Emotional connotation
  • base+D+P: all the scorers in base+D + Phonetic features
n-gram is mentioned in 4 sentences in this paper.
Ravi, Sujith
Decipherment Model for Machine Translation
For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
Decipherment Model for Machine Translation
Generate a target (e.g., English) string e = e_1 … e_l with probability P(e) according to an n-gram language model.
Feature-based representation for Source and Target
For instance, context features for word w may include other words (or phrases) that appear in the immediate context (n-gram window) surrounding w in the monolingual corpus.
Feature-based representation for Source and Target
The feature construction process is described in more detail below: Target Language: We represent each word (or phrase) e_i with the following contextual features along with their counts: (a) f−context: every (word n-gram, position) pair immediately preceding e_i in the monolingual corpus (n=1, position=−1), (b) similar features f+context to model the context following e_i, and (c) we also throw in generic context features f±context without position information: every word that co-occurs with e_i in the same sentence.
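A sketch of the kind of positional context features described here, for the n=1, position=±1 case plus generic same-sentence co-occurrence; the feature naming is invented for illustration.

    from collections import Counter

    def context_features(sentences, target):
        """Collect (preceding word, position=-1), (following word, position=+1), and
        generic same-sentence co-occurrence features for a target word, with counts."""
        feats = Counter()
        for sent in sentences:
            for i, w in enumerate(sent):
                if w != target:
                    continue
                if i > 0:
                    feats[("-context", sent[i - 1], -1)] += 1
                if i + 1 < len(sent):
                    feats[("+context", sent[i + 1], +1)] += 1
                for j, other in enumerate(sent):
                    if j != i:
                        feats[("co-occur", other)] += 1
        return feats

    corpus = [["the", "red", "car", "stopped"], ["a", "red", "apple", "fell"]]
    print(context_features(corpus, "red"))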
n-gram is mentioned in 4 sentences in this paper.
Bergsma, Shane and Van Durme, Benjamin
Learning Class Attributes
We extract prevalent common nouns for males and females by selecting only those nouns that (a) occur more than 200 times in the dataset, (b) mostly occur with male or female pronouns, and (c) occur as lowercase more often than uppercase in a web-scale N-gram corpus (Lin et al., 2010).
Learning Class Attributes
We obtain the best of both worlds by matching our precise pattern against a version of the Google N-gram Corpus that includes the part-of-speech tag distributions for every N-gram (Lin et al., 2010).
Twitter Gender Prediction
We include n-gram features with the original capitalization pattern and separate features with the n-grams lowercased.
n-gram is mentioned in 3 sentences in this paper.
Duan, Nan
MBR-based Answering Re-ranking
• answer-level n-gram correlation feature:
MBR-based Answering Re-ranking
where w denotes an n-gram in A, and #_w(·) denotes the number of times that w occurs in
MBR-based Answering Re-ranking
• passage-level n-gram correlation feature:
n-gram is mentioned in 3 sentences in this paper.
Tomeh, Nadi and Habash, Nizar and Roth, Ryan and Farra, Noura and Dasigi, Pradeep and Diab, Mona
Discriminative Reranking for OCR
Word LM features (“LM-word”) include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, …
Discriminative Reranking for OCR
Semantic coherence feature (“SemCoh”) is motivated by the fact that semantic information can be very useful in modeling the fluency of phrases, and can augment the information provided by n-gram LMs.
Introduction
The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
n-gram is mentioned in 3 sentences in this paper.