Abstract | We attempt to extract this information from history-contexts of up to ten words in size, and find that it complements the n-gram model well, since the latter inherently suffers from data scarcity when learning long history-contexts.
Introduction | The commonly used n-gram model (Bahl et al.
Introduction | Although n-gram models are simple and effective, modeling long history-contexts leads to severe data scarcity problems.
Language Modeling with TD and TO | The prior, which is usually implemented as a unigram model, can also be replaced with a higher-order n-gram model, for instance the bigram model:
Language Modeling with TD and TO | Replacing the unigram model with a higher-order n-gram model is important to compensate for the damage incurred by the conditional independence assumption made earlier.
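As a sketch of this replacement, the prior over a word can interpolate a bigram estimate with a unigram fallback. The interpolation weight `lam` and the helper names below are illustrative assumptions, not the paper's formulation:

```python
from collections import Counter

def train_counts(corpus):
    """Collect unigram and bigram counts from tokenized sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    return uni, bi

def bigram_prior(w, prev, uni, bi, total, lam=0.7):
    """Bigram prior interpolated with a unigram fallback.

    `lam` is a hypothetical interpolation weight; the paper does not
    specify how the higher-order prior is smoothed.
    """
    p_uni = uni[w] / total
    p_bi = bi[(prev, w)] / uni[prev] if uni[prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni
```

Interpolation (rather than pure backoff) is used here only to keep the sketch short.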
Motivation of the Proposed Approach | In the n-gram model, for example, these two attributes are jointly taken into account in the ordered word-sequence. |
Motivation of the Proposed Approach | Consequently, the n-gram model can only be effectively implemented within a short history-context (e.g., of size three or four).
Motivation of the Proposed Approach | However, intermediate distances beyond the limits of the n-gram model can be very useful and should not be discarded.
Related Work | 2007) disassembles the n-gram into (n−1) word-pairs, such that each pair is modeled by a distance-k bigram model, where 1 ≤ k ≤ n − 1.
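A minimal sketch of this decomposition, collecting the pair counts from which the distance-k bigram models would be estimated (the count-collection scheme is an assumption for illustration):

```python
from collections import Counter

def distance_k_pairs(sentence, n):
    """Decompose each n-gram window into (n-1) word pairs, one per
    distance k = 1 .. n-1, each feeding a distance-k bigram model."""
    pairs = {k: Counter() for k in range(1, n)}
    for i, w in enumerate(sentence):
        for k in range(1, n):
            if i + k < len(sentence):
                pairs[k][(w, sentence[i + k])] += 1
    return pairs
```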
Error Classification | While we employ seven types of features (see Sections 4.2 and 4.3), only the word n-gram features are subject to feature selection.2 Specifically, we employ |
Error Classification | the top n_i n-gram features, as selected according to information gain computed over the training data (see Yang and Pedersen (1997) for details).
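The selection step can be sketched with the standard information-gain criterion of Yang and Pedersen (1997); representing documents as sets of n-grams is an assumption made for brevity:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(present, labels):
    """Information gain of a binary feature over class labels."""
    n = len(labels)
    on = [y for f, y in zip(present, labels) if f]
    off = [y for f, y in zip(present, labels) if not f]
    cond = sum(len(p) / n * entropy(p) for p in (on, off) if p)
    return entropy(labels) - cond

def top_ngram_features(docs, labels, k):
    """Keep the top-k features by information gain; `docs` are sets of
    n-grams (an assumed representation)."""
    vocab = sorted({g for d in docs for g in d})
    return sorted(vocab,
                  key=lambda g: -info_gain([g in d for d in docs], labels))[:k]
```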
Error Classification | Aggregated word n-gram features. |
Evaluation | Our Baseline system, which only uses word n-gram and random indexing features, performs uniformly poorly across both micro and macro F-scores (see row 1).
Score Prediction | 6Before tuning the feature selection parameter, we have to sort the list of n-gram features occurring in the training set.
Abstract | We present an algorithm for re-estimating parameters of backoff n-gram language models so as to preserve given marginal distributions, along the lines of well-known Kneser-Ney (1995) smoothing. |
Abstract | We present experimental results for heavily pruned backoff n-gram models, and demonstrate perplexity and word error rate reductions when used with various baseline smoothing methods. |
Introduction | Smoothed n-gram language models are the de facto standard statistical models of language for a wide range of natural language applications, including speech recognition and machine translation.
Introduction | Such models are trained on large text corpora, by counting the frequency of n-gram collocations, then normalizing and smoothing (regularizing) the resulting multinomial distributions. |
Introduction | Briefly, the smoothing method re-estimates lower-order n-gram parameters in order to avoid overestimating the likelihood of n-grams that already have ample probability mass allocated as part of higher-order n-grams.
Preliminaries | N-gram language models are typically presented mathematically in terms of words w, the strings (histories) h that precede them, and the suffixes of the histories (backoffs) h' that are used in the smoothing recursion.
Preliminaries | where c(·) is the count of the n-gram sequence.
Preliminaries | N-gram language models allow for a sparse representation, so that only a subset of the possible n-grams must be explicitly stored. |
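A minimal sketch of such a sparse representation, assuming ARPA-style log10 probabilities and backoff weights stored only for observed entries (the class layout is illustrative, not a toolkit's actual API):

```python
class BackoffLM:
    """Sparse backoff n-gram model: only observed n-grams and backoff
    weights are stored; unseen n-grams recurse to the lower order."""

    def __init__(self, logprob, backoff):
        self.logprob = logprob   # {n-gram tuple: log10 probability}
        self.backoff = backoff   # {history tuple: log10 backoff weight}

    def score(self, ngram):
        if ngram in self.logprob:
            return self.logprob[ngram]
        if len(ngram) == 1:
            return float("-inf")  # a real toolkit would map this to <unk>
        # back off: pay the history's backoff weight, drop the oldest word
        return self.backoff.get(ngram[:-1], 0.0) + self.score(ngram[1:])
```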
Task A: Polarity Classification | 4.4 N-gram Evaluation and Results |
Task A: Polarity Classification | N-gram features are widely used in a variety of classification tasks, therefore we also use them in our polarity classification task. |
Task A: Polarity Classification | Figure 2 shows a study of the influence of the different information sources and their combination with n-gram features for English. |
Task B: Valence Prediction | The Farsi and Russian regression models are based only on n-gram features, while the English and Spanish regression models have both n-gram and LIWC features. |
Introduction | Table 1 shows the n-gram overlap proportions in a sentence aligned data set of 137K sentence pairs from aligning Simple English Wikipedia and English Wikipedia articles (Coster and Kauchak, 2011a).1 The data highlights two conflicting views: does the benefit of additional data outweigh the problem of the source of the data? |
Introduction | n-gram size:      1     2     3     4     5
simple in normal    0.96  0.80  0.68  0.61  0.55
normal in simple    0.87  0.68  0.58  0.51  0.46
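Overlap proportions of this kind can be computed with a small script; the type-level (set-based) definition below is an assumption, since the paper does not give the exact counting procedure:

```python
def ngram_overlap(source, target, n):
    """Proportion of n-gram types in `source` sentences that also appear
    in `target` sentences."""
    def grams(sents):
        return {tuple(s[i:i + n]) for s in sents for i in range(len(s) - n + 1)}
    src = grams(source)
    return len(src & grams(target)) / len(src) if src else 0.0
```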
Why Does Unsimplified Data Help? | trained using a smoothed version of the maximum likelihood estimate for an n-gram . |
Why Does Unsimplified Data Help? | p(c | b) = count(bc) / count(b), where count(·) is the number of times the n-gram occurs in the training corpus.
Why Does Unsimplified Data Help? | For interpolated and backoff n-gram models, these counts are smoothed based on the probabilities of lower-order n-gram models, which are in turn calculated based on counts from the corpus.
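The interpolated case can be sketched recursively; the weight `lam` and the add-one floor at the unigram level are illustrative choices, not the smoothing the paper actually uses:

```python
def interpolated_prob(ngram, counts, vocab_size, lam=0.8):
    """Interpolated n-gram estimate: the maximum-likelihood estimate at
    order n is mixed with the recursively smoothed order-(n-1) estimate."""
    if len(ngram) == 1:
        # unigram floor with add-one smoothing (illustrative choice)
        total = sum(c for g, c in counts.items() if len(g) == 1)
        return (counts.get(ngram, 0) + 1) / (total + vocab_size)
    hist_count = counts.get(ngram[:-1], 0)
    ml = counts.get(ngram, 0) / hist_count if hist_count else 0.0
    return lam * ml + (1 - lam) * interpolated_prob(ngram[1:], counts,
                                                    vocab_size, lam)
```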
Experimental Evaluation | N-gram suggests no non-referential instances
Experimental Evaluation | Effectiveness To understand the contribution of the n-gram (NG), ontology (ON), and clustering (CL) based modules, we ran each separately, as well as every possible combination. |
Experimental Evaluation | Of the three individual modules, the n-gram and clustering methods achieve an F-measure of around 0.9, while the ontology-based module performs only modestly above the baseline.
Term Ambiguity Detection (TAD) | This module examines n-gram data from a large text collection. |
Term Ambiguity Detection (TAD) | The rationale behind the n-gram module is based on the understanding that terms appearing in non-named entity contexts are likely to be non-referential, and terms that can be non-referential are ambiguous. |
Term Ambiguity Detection (TAD) | Since we wish for the ambiguity detection determination to be fast, we develop our method to make this judgment solely on the n-gram probability, without the need to examine each individual usage context. |
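A hypothetical version of that judgment, reduced to a single ratio test over aggregate counts; the names and the threshold are assumptions for illustration, and the paper's actual criterion may differ:

```python
def is_ambiguous(term_count, non_ne_count, threshold=0.1):
    """Flag a term as ambiguous when the share of its n-gram occurrences
    in non-named-entity contexts (e.g. lowercase contexts) exceeds a
    threshold, without inspecting individual usage contexts."""
    if term_count == 0:
        return False
    return non_ne_count / term_count >= threshold
```

The point of the design is speed: only two precomputed counts per term are consulted at query time.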
Baseline MT | The LM used for decoding is a log-linear combination of four word n-gram LMs which are built on different English |
Name-aware MT Evaluation | where wn is a set of positive weights summing to one, usually set uniformly as wn = 1/N; c is the length of the system translation and r is the length of the reference translation; and pn is the modified n-gram precision, defined as: pn = Σ_C Σ_{n-gram ∈ C} Count_clip(n-gram) / Σ_C Σ_{n-gram ∈ C} Count(n-gram)
Name-aware MT Evaluation | As in the BLEU metric, we first count the maximum number of times an n-gram occurs in any single reference translation.
Name-aware MT Evaluation | The weight of an n-gram in reference translation is the sum of weights of all tokens it contains. |
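The clipped counting just described is the core of BLEU's modified n-gram precision, which can be sketched as:

```python
from collections import Counter

def modified_precision(candidate, references, n):
    """BLEU modified n-gram precision: each candidate n-gram count is
    clipped by its maximum count in any single reference."""
    def grams(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = grams(candidate)
    max_ref = Counter()
    for ref in references:
        for g, c in grams(ref).items():
            max_ref[g] = max(max_ref[g], c)
    total = sum(cand.values())
    clipped = sum(min(c, max_ref[g]) for g, c in cand.items())
    return clipped / total if total else 0.0
```

The name-aware variant additionally weights n-grams (by the token weights mentioned above) rather than counting them uniformly.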
New Sense Indicators | N-gram Probability Features The goal of the Type:NgramProb feature is to capture the fact that “unusual contexts” might imply new senses. |
New Sense Indicators | To capture this, we can look at the log probability of the word under consideration given its n-gram context, both according to an old-domain language model (call this ℓ_old) and a new-domain language model (call this ℓ_new).
New Sense Indicators | From these four values, we compute corpus-level (and therefore type-based) statistics of the new-domain n-gram log probability (ℓ_ng^new), the difference between the n-gram probabilities in each domain (ℓ_ng^new − ℓ_ng^old), the difference between the n-gram and unigram probabilities in the new domain (ℓ_ng^new − ℓ_ug^new), and finally the combined difference: ℓ_ng^new − ℓ_ug^new + ℓ_ug^old − ℓ_ng^old.
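The four statistics can be assembled as follows, under the assumption that the four token-level values are the n-gram and unigram log probabilities in the new and old domains; the function and key names are illustrative:

```python
def new_sense_stats(lp_ng_new, lp_ng_old, lp_ug_new, lp_ug_old):
    """Type-level feature values from n-gram/unigram log probabilities
    in the new and old domains (a reconstruction of the notation)."""
    return {
        "ng_new": lp_ng_new,                              # new-domain n-gram
        "domain_diff": lp_ng_new - lp_ng_old,             # across domains
        "unigram_diff": lp_ng_new - lp_ug_new,            # n-gram vs unigram
        "combined": lp_ng_new - lp_ug_new + lp_ug_old - lp_ng_old,
    }
```

The combined term is a difference-in-differences: how much more (or less) the context helps in the new domain than it did in the old one.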
Abstract | As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. |
Introduction | In addition, it is straightforward to integrate n-gram language models into phrase-based decoders in which translation always grows left-to-right. |
Introduction | Unfortunately, as syntax-based decoders often generate target-language words in a bottom-up way using the CKY algorithm, integrating n-gram language models becomes more expensive because they have to maintain target boundary words at both ends of a partial translation (Chiang, 2007; Huang and Chiang, 2007). |
Introduction | 3. efficient integration of n-gram language model: as translation grows left-to-right in our algorithm, integrating n-gram language models is straightforward. |
Conclusion | A novel technique was also proposed to rank n-gram phrases, in which relevance-based ranking was used in conjunction with a semi-supervised generative model.
Empirical Evaluation | The reduced dataset consists of 1095586 tokens (after n-gram preprocessing in §4), 40102 posts with an average of 27 posts or interactions per pair. |
Model | Like most generative models for text, a post (document) is viewed as a bag of n-grams and each n-gram (word/phrase) takes one value from a predefined vocabulary. |
Phrase Ranking based on Relevance | While this is reasonable, a significant n-gram with high likelihood score may not necessarily be relevant to the problem domain. |
Phrase Ranking based on Relevance | There is nothing wrong with this per se, because the statistical tests only judge the significance of an n-gram, but a significant n-gram may not necessarily be relevant in a given problem domain.
Experiments | RESPONSE The n-gram and emotion features induced from the response. |
Experiments | The n-gram and emotion features induced from the response and the addressee’s utterance. |
Predicting Addressee’s Emotion | We extract all the n-grams (n ≤ 3) in the response to induce (binary) n-gram features.
Predicting Addressee’s Emotion | The extracted n-grams activate another set of binary n-gram features. |
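Extracting the binary features for n ≤ 3 can be sketched as follows (representing features as n-gram tuples in a set is an assumed encoding):

```python
def binary_ngram_features(tokens, max_n=3):
    """All n-grams with n <= max_n from an utterance, as a set of binary
    features (presence/absence only, no counts)."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}
```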
Architecture of BRAINSUP | N-gram likelihood. |
Architecture of BRAINSUP | This is simply the likelihood of a sentence estimated by an n-gram language model, to enforce the generation of well-formed word sequences. |
Architecture of BRAINSUP | When a solution is not complete, we include in the computation only the sequences of contiguous words (i.e., not interrupted by empty slots) whose length is greater than or equal to the order of the n-gram model.
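A sketch of that restriction, where `None` marks an empty slot and `score_ngram` stands in for an n-gram LM lookup (an assumed callback):

```python
def partial_logprob(slots, score_ngram, order):
    """Score only contiguous filled spans whose length is at least the
    model order; empty slots (None) break the sequence."""
    total, run = 0.0, []
    for w in list(slots) + [None]:        # sentinel flushes the last run
        if w is None:
            if len(run) >= order:
                for i in range(len(run) - order + 1):
                    total += score_ngram(tuple(run[i:i + order]))
            run = []
        else:
            run.append(w)
    return total
```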
Evaluation | The four combinations of features are:
base: Target-word scorer + N-gram likelihood + Dependency likelihood + Variety scorer + Unusual-words scorer + Semantic cohesion;
base+D: all the scorers in base + Domain relatedness;
base+D+C: all the scorers in base+D + Chromatic connotation;
base+D+E: all the scorers in base+D + Emotional connotation;
base+D+P: all the scorers in base+D + Phonetic features.
Decipherment Model for Machine Translation | For P(e), we use a word n-gram language model (LM) trained on monolingual target text. |
Decipherment Model for Machine Translation | Generate a target (e.g., English) string e = e1...en with probability P(e) according to an n-gram language model.
Feature-based representation for Source and Target | For instance, context features for word w may include other words (or phrases) that appear in the immediate context (n-gram window) surrounding w in the monolingual corpus.
Feature-based representation for Source and Target | The feature construction process is described in more detail below: Target Language: We represent each word (or phrase) ei with the following contextual features along with their counts: (a) f−context: every (word n-gram, position) pair immediately preceding ei in the monolingual corpus (n=1, position=−1); (b) similar features f+context to model the context following ei; and (c) we also throw in generic context features fcontext without position information: every word that co-occurs with ei in the same sentence.
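A sketch of the three feature families for a single position; the tuple encodings and function name are illustrative assumptions:

```python
def context_features(tokens, i, n=1):
    """Position-marked preceding/following n-gram features plus
    position-free co-occurrence features for tokens[i]."""
    feats = []
    if i - n >= 0:                                   # f-context
        feats.append(("prev", tuple(tokens[i - n:i]), -1))
    if i + 1 + n <= len(tokens):                     # f+context
        feats.append(("next", tuple(tokens[i + 1:i + 1 + n]), 1))
    # generic co-occurrence features, no position information
    feats += [("cooc", w) for j, w in enumerate(tokens) if j != i]
    return feats
```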
Learning Class Attributes | We extract prevalent common nouns for males and females by selecting only those nouns that (a) occur more than 200 times in the dataset, (b) mostly occur with male or female pronouns, and (c) occur as lowercase more often than uppercase in a web-scale N-gram corpus (Lin et al., 2010). |
Learning Class Attributes | We obtain the best of both worlds by matching our precise pattern against a version of the Google N-gram Corpus that includes the part-of-speech tag distributions for every N-gram (Lin et al., 2010). |
Twitter Gender Prediction | We include n-gram features with the original capitalization pattern and separate features with the n-grams lower-cased.
MBR-based Answering Re-ranking | • answer-level n-gram correlation feature:
MBR-based Answering Re-ranking | where w denotes an n-gram in A, and #w(Ak) denotes the number of times that w occurs in
MBR-based Answering Re-ranking | • passage-level n-gram correlation feature:
Discriminative Reranking for OCR | Word LM features (“LM-word”) include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, .
Discriminative Reranking for OCR | Semantic coherence feature (“SemCoh”) is motivated by the fact that semantic information can be very useful in modeling the fluency of phrases, and can augment the information provided by n-gram LMs. |
Introduction | The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.