Index of papers in Proc. ACL 2012 that mention
  • n-gram
Pauls, Adam and Klein, Dan
Abstract
We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context.
Abstract
We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours.
Introduction
At the same time, because n-gram language models only condition on a local window of linear word-level context, they are poor models of long-range syntactic dependencies.
Introduction
Although several lines of work have proposed generative syntactic language models that improve on n-gram models for moderate amounts of data (Chelba, 1997; Xu et al., 2002; Charniak, 2001; Hall, 2004; Roark,
Introduction
Our model can be trained simply by collecting counts and using the same smoothing techniques normally applied to n-gram models (Kneser and Ney, 1995), enabling us to apply techniques developed for scaling n-gram models out of the box (Brants et al., 2007; Pauls and Klein, 2011).
Treelet Language Modeling
The common denominator of most n-gram language models is that they assign probabilities roughly according to empirical frequencies for observed n-grams, but fall back to distributions conditioned on smaller contexts for unobserved n-grams, as shown in Figure 1(a).
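The fallback idea described in this excerpt can be made concrete with a small sketch. The snippet below uses a simple stupid-backoff-style discount rather than the Kneser-Ney smoothing the paper actually cites; the function names, the discount factor alpha, and the toy corpus are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

def count_ngrams(tokens, max_n):
    """Collect raw counts for every n-gram order 1..max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def backoff_score(counts, context, word, total_unigrams, alpha=0.4):
    """Relative-frequency score for word given context, falling back to
    shorter contexts whenever the full n-gram was never observed."""
    ngram = tuple(context) + (word,)
    if counts[ngram] > 0:
        denom = counts[tuple(context)] if context else total_unigrams
        return counts[ngram] / denom
    if not context:
        return 1e-9                      # unseen word: small floor value
    # back off: drop the oldest context word and apply a fixed discount
    return alpha * backoff_score(counts, context[1:], word, total_unigrams, alpha)

tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, 3)
total = sum(c for k, c in counts.items() if len(k) == 1)
print(backoff_score(counts, ("the",), "cat", total))        # observed bigram
print(backoff_score(counts, ("sat", "the"), "mat", total))  # backs off to ("the", "mat")
```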
Treelet Language Modeling
As in the n-gram case, we would like to pick h to be large enough to capture relevant dependencies, but small enough that we can obtain meaningful estimates from data.
Treelet Language Modeling
Although it is tempting to think that we can replace the left-to-right generation of n-gram models with the purely top-down generation of typical PCFGs, in practice, words are often highly predictive of the words that follow them — indeed, n-gram models would be terrible language models if this were not the case.
n-gram is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Chen, David
Background
To learn the meaning of an n-gram w, Chen and Mooney first collect all navigation plans g that co-occur with w. This forms the initial candidate meaning set for w.
Conclusion
In contrast to the previous approach that computed common subgraphs between different contexts in which an n-gram appeared, we instead focus on small, connected subgraphs and introduce an algorithm, SGOLL, that is an order of magnitude faster.
Online Lexicon Learning Algorithm
Even though they use beam-search to limit the size of the candidate set, if the initial candidate meaning set for an n-gram is large, it can take a long time to take just one pass through the list of all candidates.
Online Lexicon Learning Algorithm
function Update(training example (e, p)): for n-gram w that appears in e do
Online Lexicon Learning Algorithm
Increase the count of examples, each n-gram w, and each subgraph g; end function
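The two pseudocode fragments above describe a count-updating step over (instruction, plan) training examples. The sketch below is a minimal reconstruction of that step, assuming n-grams are enumerated from the instruction and the plan's small connected subgraphs are already available; the identifiers (example_count, cooccur_count, max_n) and the string representation of subgraphs are illustrative, not the paper's own.

```python
from collections import defaultdict

# Illustrative count tables for an SGOLL-style online update.
example_count = 0
ngram_count = defaultdict(int)                          # count of each n-gram w
cooccur_count = defaultdict(lambda: defaultdict(int))   # counts of (w, subgraph g) pairs

def ngrams(words, max_n=4):
    """All n-grams of length 1..max_n in an instruction."""
    return [tuple(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)]

def update(instruction_words, plan_subgraphs):
    """One online pass over a training example: bump the counts that are
    later used to score candidate (n-gram, subgraph) pairings."""
    global example_count
    example_count += 1
    for w in ngrams(instruction_words):
        ngram_count[w] += 1
        for g in plan_subgraphs:        # small connected subgraphs of the plan
            cooccur_count[w][g] += 1

update("turn and go to the sofa".split(), ["Turn(LEFT)", "Travel(sofa)"])
```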
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Liu, Chang and Ng, Hwee Tou
Discussion and Future Work
The TESLA-M metric allows each n-gram to have a weight, which is primarily used to discount function words.
Experiments
Based on these synonyms, TESLA-CELAB is able to award less trivial n-gram matches.
Experiments
The covered n-gram matching rule is then able to award tricky n-grams.
Motivation
We formulate the n-gram matching process as a real-valued linear programming problem, which can be solved efficiently.
The Algorithm
The basic n-gram matching problem is shown in Figure 2.
The Algorithm
We observe that once an n-gram has been matched, all its sub-n-grams should be considered matched as well; we call this the covered n-gram matching rule.
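To make the covered n-gram matching rule concrete, the sketch below simply enumerates every sub-n-gram that would be credited once a larger n-gram is matched. It only illustrates the rule itself: as the following excerpt notes, TESLA-CELAB folds this constraint into the linear program rather than applying it as post-processing, and the example tokens here are invented.

```python
def sub_ngrams(ngram):
    """All contiguous sub-n-grams (including single tokens) of a matched n-gram."""
    n = len(ngram)
    return {tuple(ngram[i:j]) for i in range(n) for j in range(i + 1, n + 1)}

# If ("machine", "translation", "system") is matched, the covered rule also
# credits ("machine",), ("translation", "system"), and so on.
print(sorted(sub_ngrams(("machine", "translation", "system"))))
```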
The Algorithm
However, we cannot simply perform covered n-gram matching as a post processing step.
n-gram is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Tang, Hao and Keshet, Joseph and Livescu, Karen
Feature functions
We use $p_1^n$ to denote the n-gram substring $p_1 \ldots p_n$.
Feature functions
The two substrings $\bar{a}$ and $\bar{b}$ are said to be equal if they have the same length and $a_i = b_i$ for $1 \le i \le n$. For a given sub-word unit n-gram $\bar{u} \in \mathcal{P}^n$, we use the shorthand $\bar{u} \in \bar{p}$ to mean that we can find $\bar{u}$ in $\bar{p}$, i.e., there exists an index $i$ such that $p_i \ldots p_{i+n-1} = \bar{u}$. We use $|\bar{p}|$ to denote the length of the sequence $\bar{p}$.
Feature functions
Similarly to (Zweig et al., 2010), we adapt TF and IDF by treating a sequence of sub-word units as a “document” and n-gram sub-sequences as “words.” In this analogy, we use sub-sequences in surface pronunciations to “search” for baseforms in the dictionary.
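A rough sketch of that adaptation: each dictionary baseform (a sequence of sub-word units) is treated as a "document" and its n-grams as "words", and the query's n-grams "search" for the best-matching baseform. Letters stand in for sub-word units, and the bigram order, scoring formula, and toy dictionary below are assumptions for illustration only.

```python
import math
from collections import Counter

def ngrams(units, n):
    return [tuple(units[i:i + n]) for i in range(len(units) - n + 1)]

def tf_idf_scores(query_units, dictionary_baseforms, n=2):
    """TF-IDF match score of the query against each baseform, with baseforms
    as 'documents' and their n-grams as 'words'."""
    docs = [Counter(ngrams(b, n)) for b in dictionary_baseforms]
    num_docs = len(docs)
    df = Counter()
    for d in docs:
        df.update(d.keys())              # document frequency of each n-gram
    query = Counter(ngrams(query_units, n))
    scores = []
    for d in docs:
        score = 0.0
        for g, q_tf in query.items():
            if g in d:
                idf = math.log(num_docs / df[g])
                score += q_tf * d[g] * idf
        scores.append(score)
    return scores

baseforms = [list("probably"), list("probable"), list("problem")]
print(tf_idf_scores(list("probly"), baseforms, n=2))
```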
n-gram is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Chen, Wenliang and Zhang, Min and Li, Haizhou
Dependency language model
The standard N-gram based language model predicts the next word based on the N-1 immediate previous words.
Dependency language model
However, the traditional N-gram language model cannot capture long-distance word relations.
Dependency language model
The N-gram DLM predicts the next child of a head based on the N-1 immediate previous children and the head itself.
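Read literally, this prediction rule corresponds to a factorization over each head's child sequence, something like the following sketch (which ignores details such as the separate treatment of left and right children):

$$P\bigl(c_1, \ldots, c_m \mid h\bigr) \approx \prod_{k=1}^{m} P\bigl(c_k \mid c_{k-N+1}, \ldots, c_{k-1}, h\bigr)$$

where $c_1, \ldots, c_m$ are the children of head $h$ in the order they are generated.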
Experiments
Then, we studied the effect of adding different N-gram DLMs to MSTl.
Introduction
The N-gram DLM has the ability to predict the next child based on the N-1 immediate previous children and their head (Shen et al., 2008).
Introduction
The DLM-based features can capture the N-gram information of the parent-children structures for the parsing model.
n-gram is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Celikyilmaz, Asli and Hakkani-Tur, Dilek
Data and Approach Overview
[Figure: Entity List Prior and Web N-Gram Context Prior]
Data and Approach Overview
[Figure: n-grams from web query logs used as priors on domain-specific entity and act parameters]
Experiments
* Base-MCM: Our first version injects an informative prior for domain, dialog act, and slot topic distributions using information extracted only from labeled training utterances, injected as prior constraints (the corpus n-gram base measure) during topic assignments.
MultiLayer Context Model - MCM
* Web n-Gram Context Base Measure: As explained in §3, we use the web n-grams as additional information for calculating the base measures of the Dirichlet topic distributions.
MultiLayer Context Model - MCM
* Corpus n-Gram Base Measure: Similar to other measures, MCM also encodes n-gram constraints as word-frequency features extracted from labeled utterances.
n-gram is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Chen, Boxing and Kuhn, Roland and Larkin, Samuel
BLEU and PORT
First, define n-gram precision p(n) and recall r(n):
BLEU and PORT
where P_g(N) is the geometric average of n-gram precisions
BLEU and PORT
The average precision and average recall used in PORT (unlike those used in BLEU) are the arithmetic averages of n-gram precisions P_a(N) and recalls R_a(N):
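Spelled out, the two kinds of averages over the n-gram precisions p(n) and recalls r(n) mentioned in these excerpts are as follows (the notation is assumed from the excerpts, with N the maximum n-gram order):

$$P_g(N) = \Bigl(\prod_{n=1}^{N} p(n)\Bigr)^{1/N}, \qquad P_a(N) = \frac{1}{N}\sum_{n=1}^{N} p(n), \qquad R_a(N) = \frac{1}{N}\sum_{n=1}^{N} r(n)$$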
Experiments
As usual, French-English is the outlier: the two outputs here are typically so similar that BLEU and Qmean tuning yield very similar n-gram statistics.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Guo, Weiwei and Diab, Mona
Experiments and Results
The performance of WTMF on CDR is compared with (a) an Information Retrieval model (IR) based on surface word matching, (b) an n-gram model (N-gram) that captures phrase overlap by returning the number of overlapping n-grams as the similarity score of two sentences, (c) LSA, which uses the svds() function in Matlab, and (d) LDA, which uses Gibbs sampling for inference (Griffiths and Steyvers, 2004).
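A minimal sketch of such an n-gram-overlap baseline is given below; counting distinct shared n-grams up to order 3 is an assumption for illustration, not necessarily the exact setting used in the comparison.

```python
def ngram_overlap(sent1, sent2, max_n=3):
    """Similarity as the number of n-grams shared by two tokenized sentences,
    summed over orders 1..max_n."""
    def grams(tokens, n):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return sum(len(grams(sent1, n) & grams(sent2, n)) for n in range(1, max_n + 1))

print(ngram_overlap("the cat sat on the mat".split(),
                    "a cat sat on a mat".split()))
```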
Experiments and Results
The similarity of two sentences is computed by cosine similarity (except N-gram).
Experiments and Results
We mainly compare the performance of IR, N-gram, LSA, LDA, and WTMF models.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Li, Chi-Ho and Li, Mu and Zhou, Ming
Features and Training
Here simple n-gram similarity is used for the sake of efficiency.
Features and Training
As with GC, there are four features with respect to the value of n in the n-gram similarity measure.
Graph Construction
Solid lines are edges connecting nodes with sufficient source side n-gram similarity, such as the one between "E A M N" and "E A B C".
Introduction
Collaborative decoding (Li et al., 2009) scores the translation of a source span by its n-gram similarity to the translations by other systems.
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Nuhn, Malte and Mauser, Arne and Ney, Hermann
Abstract
On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model.
Experimental Evaluation
being approximately 15 to 20 times faster than their n-gram based approach.
Experimental Evaluation
To summarize: Our method is significantly faster than n-gram LM based approaches and obtains better results than any previously published method.
Translation Model
Stochastically generate the target sentence according to an n-gram language model.
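Generating a target sentence stochastically from an n-gram language model just means sampling one word at a time from the conditional distribution given the previous history. The toy sketch below does this for a bigram model; the dictionary format, the <s>/</s> markers, and the probabilities are illustrative assumptions, not the paper's deciphering model.

```python
import random

def sample_sentence(bigram_probs, max_len=20, start="<s>", stop="</s>"):
    """Draw a sentence word by word from a bigram language model, given as a
    dict mapping a history word to a {next_word: probability} dict."""
    sentence, prev = [], start
    for _ in range(max_len):
        next_words, probs = zip(*bigram_probs[prev].items())
        word = random.choices(next_words, weights=probs)[0]
        if word == stop:
            break
        sentence.append(word)
        prev = word
    return sentence

toy_lm = {"<s>":   {"hello": 0.6, "hi": 0.4},
          "hello": {"world": 0.7, "</s>": 0.3},
          "hi":    {"there": 1.0},
          "world": {"</s>": 1.0},
          "there": {"</s>": 1.0}}
print(sample_sentence(toy_lm))
```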
n-gram is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Bansal, Mohit and Klein, Dan
Abstract
To address semantic ambiguities in coreference resolution, we use Web n-gram features that capture a range of world knowledge in a diffuse but robust way.
Introduction
In order to harness the information on the Web without presupposing a deep understanding of all Web text, we instead turn to a diverse collection of Web n-gram counts (Brants and Franz, 2006) which, in aggregate, contain diffuse and indirect, but often robust, cues to reference.
Semantics via Web Features
These clusters come from distributional K-Means clustering (with K = 1000) on phrases, using the n-gram context as features.
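The n-gram context features behind such a clustering can be sketched as sparse count vectors of neighboring words. The window size, single-word "phrases", and toy corpus below are simplifying assumptions, and the K-Means step itself (K = 1000 in the excerpt) is omitted.

```python
from collections import Counter, defaultdict

def context_features(corpus_sentences, targets, window=1):
    """Represent each target phrase by counts of the words appearing in a small
    window around its occurrences; these sparse vectors are what a K-Means
    step would then cluster."""
    feats = defaultdict(Counter)
    for sent in corpus_sentences:
        for i, w in enumerate(sent):
            if w in targets:
                left = sent[max(0, i - window):i]
                right = sent[i + 1:i + 1 + window]
                feats[w].update(left + right)
    return feats

corpus = [["the", "president", "spoke", "today"],
          ["the", "senator", "spoke", "briefly"]]
print(context_features(corpus, {"president", "senator"}))
```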
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Razmara, Majid and Foster, George and Sankaran, Baskaran and Sarkar, Anoop
Conclusion & Future Work
In addition, we can extend our approach by applying some of the techniques used in other system combination approaches such as consensus decoding, using n-gram features, tuning using forest-based MERT, among other possible extensions.
Related Work 5.1 Domain Adaptation
In other words, it requires all component models to fully decode each sentence, compute n-gram expectations from each component model and calculate posterior probabilities over translation derivations.
Related Work 5.1 Domain Adaptation
Finally, the main techniques used in this work, such as Minimum Bayes Risk decoding, the use of n-gram features, and tuning using MERT, are orthogonal to our approach.
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Zweig, Geoffrey and Platt, John C. and Meek, Christopher and Burges, Christopher J.C. and Yessenalina, Ainur and Liu, Qiang
Sentence Completion via Language Modeling
3.1 Backoff N-gram Language Model
Sentence Completion via Language Modeling
3.2 Maximum Entropy Class-Based N-gram Language Model
Sentence Completion via Latent Semantic Analysis
4.3 A LSA N-gram Language Model
n-gram is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: