Index of papers in Proc. ACL 2010 that mention
  • language model
Bicknell, Klinton and Levy, Roger
Explaining between-word regressions
This simple example just illustrates the point that if a reader is combining noisy visual information with a language model, then confidence in previous regions will sometimes fall.
Models of eye movements in reading
Unfortunately, however, the Mr. Chips model simplifies the problem of reading in a number of ways: First, it uses a unigram model as its language model, and thus fails to use any information in the linguistic context to help with word identification.
Models of eye movements in reading
Specifically, our model identifies the words in a sentence by performing Bayesian inference combining noisy input from a realistic visual model with a language model that takes context into account.
Reading as Bayesian inference
Specifically, the model begins reading with a prior distribution over possible identities of a sentence given by its language model.
Reading as Bayesian inference
The model’s prior distribution over the identity of the sentence given the language model is updated to a posterior distribution taking into account both the language model and the visual input obtained thus far.
Reading as Bayesian inference
Given the visual input and a language model, inferences about the identity of the sentence w can be made by standard Bayesian inference, where the prior is given by the language model and the likelihood is a function of the total visual input obtained from the first to the ith timestep, I_i.
Simulation 1
5.1.2 Language model
Simulation 1
Our reader’s language model was an unsmoothed bigram model created using a vocabulary set con-
Simulation 1
Specifically, we constructed the model’s initial belief state (i.e., the distribution over sentences given by its language model) by directly translating the bigram model into a wFSA in the log semiring.
Simulation 2
6.1.3 Language model
language model is mentioned in 15 sentences in this paper.
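A minimal sketch of the inference these snippets describe (the notation here is illustrative rather than the paper's own): the reader's posterior over sentence identities combines the language-model prior with the likelihood of the visual input accumulated through timestep i,

    P(\mathbf{w} \mid I_{1:i}) \;\propto\; P_{\mathrm{LM}}(\mathbf{w}) \, P(I_{1:i} \mid \mathbf{w})

Confidence in an earlier region can then fall whenever new visual input shifts this posterior toward competing identities of words already read, which is how the first snippet motivates between-word regressions.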
Corlett, Eric and Penn, Gerald
Experiment
In the event that a trigram or bigram would be found in the plaintext that was not counted in the language model, add-one smoothing was used.
Experiment
Our character-level language model used was developed from the first 1.5 million characters of the Wall Street Journal section of the Penn Tree-bank corpus.
Introduction
If the text from which a language model is trained is of a different genre than the plaintext of a cipher, the unigraph letter frequencies may differ substantially from those of the language model, and so frequency counting will be misleading.
Introduction
Such inefficiency indicates that integer programming may simply be the wrong tool for the job, possibly because language model probabilities computed from empirical data are not smoothly distributed enough over the space in which a cutting-plane method would attempt to compute a linear relaxation of this problem.
Introduction
This difference in difficulty, while real, is not inherent, but rather an artefact of the character-level n-gram language models that they (and we) use, in which preponderant evidence of differences in short character sequences is necessary for the model to clearly favour one letter-substitution mapping over another.
Terminology
Every possible full solution to a cipher C will produce a plaintext string with some associated language model probability, and we will consider the best possible solution to be the one that gives the highest probability.
Terminology
For the sake of concreteness, we will assume here that the language model is a character-level trigram model.
The Algorithm
Backpointers are necessary to reference one of the two language model probabilities.
The Algorithm
Cells that would produce inconsistencies are left at zero, and these as well as cells that the language model assigns zero to can only produce zero entries in later columns.
The Algorithm
The n_p × n_p cells of every column i do not depend on each other, only on the cells of the previous two columns i − 1 and i − 2, as well as the language model.
language model is mentioned in 17 sentences in this paper.
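A minimal Python sketch of the kind of scoring these snippets rely on: the log-probability of a candidate plaintext under a character-level trigram model with add-one smoothing. The function names, count structures, and training text are illustrative assumptions, not the authors' code.

    import math
    from collections import Counter

    def train_char_trigram(text):
        """Collect character trigram and bigram counts plus the character vocabulary."""
        tri = Counter(text[i:i + 3] for i in range(len(text) - 2))
        bi = Counter(text[i:i + 2] for i in range(len(text) - 1))
        return tri, bi, set(text)

    def logprob(plaintext, tri, bi, vocab):
        """Add-one smoothed trigram log-probability of a candidate plaintext."""
        v = len(vocab)
        lp = 0.0
        for i in range(2, len(plaintext)):
            trigram = plaintext[i - 2:i + 1]
            bigram = plaintext[i - 2:i]
            lp += math.log((tri[trigram] + 1) / (bi[bigram] + v))
        return lp

In the sense of the Terminology snippets, the best full solution to a cipher is then the key whose decoded plaintext maximizes this score.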
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Conclusion
Removing the power of the higher-order language model and longer maximum phrase length, which are inherent in pseudo-words, shows that pseudo-words still improve translation performance significantly over unary words.
Experiments and Results
The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, uses Moses (Koehn et al., 2007) as the phrase-based decoder, and uses the SRI Language Modeling Toolkit to train the language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998).
Experiments and Results
A 5-gram language model is trained on the English side of the parallel corpus.
Experiments and Results
The Xinhua portion of the English Gigaword3 corpus is used together with the English side of the large corpus to train a 4-gram language model.
Introduction
Further experiments removing the power of the higher-order language model and longer maximum phrase length, which are inherent in pseudo-words, show that pseudo-words still improve translation performance significantly over unary words.
language model is mentioned in 11 sentences in this paper.
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Our Approach
3.1.1 Language Model
Our Approach
The language model (LM)
Our Approach
The parameters of the language model are learned from a monolingual Urdu corpus.
language model is mentioned in 18 sentences in this paper.
Pitler, Emily and Louis, Annie and Nenkova, Ani
Indicators of linguistic quality
3.1 Word choice: language models
Indicators of linguistic quality
Language models (LM) are a way of computing how familiar a text is to readers using the distribution of words from a large background corpus.
Indicators of linguistic quality
We built unigram, bigram, and trigram language models with Good-Turing smoothing over the New York Times (NYT) section of the English Gigaword corpus (over 900 million words).
Results and discussion
Coh-Metrix, which has been proposed as a comprehensive characterization of text, does not perform as well as the language model and the entity coherence classes, which contain considerably fewer features related to only one aspect of text.
Results and discussion
It is apparent from the results that continuity, entity coherence, sentence fluency and language models are the most powerful classes of features that should be used in automation of evaluation and against which novel predictors of text quality should be compared.
Results and discussion
For example, the language model features, which are the second best class for the system-level, do not fare as well at the input-level.
language model is mentioned in 12 sentences in this paper.
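A small sketch of how language models are typically turned into the word-choice features described above: score a text by its average per-word log-probability under a background n-gram model. The bigram table, fallback probability, and function name below are hypothetical; the paper itself used Good-Turing smoothed unigram, bigram, and trigram models estimated on the NYT portion of Gigaword.

    import math

    def avg_logprob(tokens, bigram_prob, unk=1e-7):
        """Average per-word log-probability under a background bigram model.

        bigram_prob: dict mapping (previous_word, word) -> probability (assumed precomputed).
        unk: small fallback probability for unseen bigrams (an assumption of this sketch).
        """
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            total += math.log(bigram_prob.get((prev, word), unk))
        return total / max(len(tokens) - 1, 1)

Higher (less negative) values indicate word sequences that are more familiar relative to the background corpus, which is the notion of familiarity the Indicators snippet appeals to.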
Aker, Ahmet and Gaizauskas, Robert
Abstract
Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries.
Introduction
They also experimented with representing such conceptual models using n-gram language models derived from corpora consisting of collections of descriptions of instances of specific object types (e.g.
Introduction
a corpus of descriptions of churches, a corpus of bridge descriptions, and so on) and reported results showing that incorporating such n-gram language models as a feature in a feature-based extractive summarizer improves the quality of automatically generated summaries.
Introduction
The main weakness of n-gram language models is that they only capture very local information about short term sequences and cannot model long distance dependencies between terms.
Representing conceptual models 2.1 Object type corpora
2.2 N-gram language models
Representing conceptual models 2.1 Object type corpora
Aker and Gaizauskas (2009) experimented with uni-gram and bi-gram language models to capture the features commonly used when describing an object type and used these to bias the sentence selection of the summarizer towards the sentences that contain these features.
Representing conceptual models 2.1 Object type corpora
As in Song and Croft (1999) they used their language models in a gener-
language model is mentioned in 16 sentences in this paper.
Mi, Haitao and Liu, Qun
Abstract
We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality.
Decoding
where the first two terms are the translation and language model probabilities, e(o) is the target string (English sentence) for derivation o, the third and fourth terms are the dependency language model probabilities on the target side computed with words and POS tags separately, D_e(o) is the target dependency tree of o, the fifth term is the parsing probability of the source-side tree T_c(o) ∈ F_c, ill(o) is the penalty for the number of ill-formed dependency structures in o, and the last two terms are the derivation and translation length penalties, respectively.
Decoding
For each node, we use the cube pruning technique (Chiang, 2007; Huang and Chiang, 2007) to produce partial hypotheses and compute all the feature scores including the dependency language model score (Section 4.1).
Decoding
4.1 Dependency Language Model Computing
Experiments
We also store the POS tag information for each word in dependency trees, and compute two different dependency language models for words and POS tags in dependency tree separately.
Experiments
We use SRI Language Modeling Toolkit (Stolcke, 2002) to train a 4-gram language model with Kneser-Ney smoothing on the first 1/3 of the Xinhua portion of Giga-word corpus.
Experiments
This suggests that using dependency language model really improves the translation quality by less than 1 BLEU point.
language model is mentioned in 15 sentences in this paper.
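As background, the Decoding snippet above enumerates the components of what is, in form, the standard weighted-product (log-linear) combination used in such decoders; schematically, with feature functions φ_k and weights λ_k ranging over exactly the features listed there,

    \mathrm{score}(o) \;=\; \prod_{k} \phi_k(o)^{\lambda_k}, \qquad o^{*} \;=\; \arg\max_{o} \, \mathrm{score}(o)

so the dependency language model enters simply as two more weighted factors, one over words and one over POS tags.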
May, Jonathan and Knight, Kevin and Vogler, Heiko
Decoding Experiments
We add an English syntax language model £ to the cascade of transducers just described to better simulate an actual machine translation decoding task.
Decoding Experiments
The language model is cast as an identity WTT and thus fits naturally into the experimental framework.
Decoding Experiments
In our experiments we try several different language models to demonstrate varying performance of the application algorithms.
language model is mentioned in 11 sentences in this paper.
Beaufort, Richard and Roekhaut, Sophie and Cougnon, Louise-Amélie and Fairon, Cédrick
Conclusion and perspectives
It would also be interesting to test the impact of another lexical language model, learned on non-SMS sentences.
Evaluation
The language model used in the evaluation is a 3-gram.
Evaluation
(2008a), who showed on a French corpus comparable to ours that, while using a larger language model is always rewarded, the improvement quickly decreases with every higher order and is already quite small between 2-gram and 3-gram.
Overview of the system
In our system, all lexicons, language models and sets of rules are compiled into finite-state machines (FSMs) and combined with the input text by composition (0).
Overview of the system
Third, a combination of the lattice of solutions with a language model, and the choice of the best sequence of lexical units.
Related work
A language model is then applied on the word lattice, and the most probable word sequence is finally chosen by applying a best-path algorithm on the lattice.
The normalization models
All tokens Tj of S are concatenated together and composed with the lexical language model LM.
The normalization models
4.6 The language model
The normalization models
Our language model is an n-gram of lexical forms, smoothed by linear interpolation (Chen and Goodman, 1998), estimated on the normalized part of our training corpus and compiled into a weighted FST LMw.
language model is mentioned in 10 sentences in this paper.
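For reference, the linear-interpolation smoothing mentioned in the last snippet (Chen and Goodman, 1998) has the general recursive form, for a context h and its shortened context h',

    P_{\mathrm{interp}}(w \mid h) \;=\; \lambda_h \, P_{\mathrm{ML}}(w \mid h) \;+\; (1 - \lambda_h) \, P_{\mathrm{interp}}(w \mid h')

with the weights λ_h estimated on held-out data; the smoothed n-gram model is then what gets compiled into the weighted FST LMw used in the compositions described above.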
Wu, Stephen and Bachrach, Asaf and Cardenas, Carlos and Schuler, William
Discussion
This is particularly true when the sentence structure is defined in a language model that is psycholinguistically plausible (here, bounded-memory right-corner form).
Discussion
This accords with an understated result of Boston et al.’s eye-tracking study (2008a): a richer language model predicts eye movements during reading better than an oversimplified one.
Discussion
Frank (2009) similarly reports improvements in the reading-time predictiveness of unlexicalized surprisal when using a language model that is more plausible than PCFGs.
Introduction
Ideally, a psychologically-plausible language model would produce a surprisal that would correlate better with linguistic complexity.
Introduction
Therefore, the specification of how to encode a syntactic language model is of utmost importance to the quality of the metric.
Introduction
The purpose of this paper is to determine whether the language model defined by the HHMM parser can also predict reading times; it would be strange if a psychologically plausible model did not also produce viable complexity metrics.
Parsing Model
Both of these metrics fall out naturally from the time-series representation of the language model.
Parsing Model
With the understanding of what operations need to occur, a formal definition of the language model is in order.
language model is mentioned in 10 sentences in this paper.
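As background for these snippets, the complexity metric at issue is surprisal, which any incremental language model defines for the word at position t given the words before it:

    \mathrm{surprisal}(w_t) \;=\; -\log P(w_t \mid w_1, \ldots, w_{t-1})

The claim in the Discussion snippets is that a psycholinguistically plausible model of this conditional distribution, here the bounded-memory right-corner HHMM, yields surprisal values that predict reading times better than an oversimplified one.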
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Abstract
In this paper we analyze reading times in terms of a single predictive measure which integrates a model of semantic composition with an incremental parser and a language model .
Integrating Semantic Constraint into Surprisal
While surprisal is a theoretically well-motivated measure, formalizing the idea of linguistic processing being highly predictive in terms of probabilistic language models, the measurement of semantic constraint in terms of vector similarities lacks a clear motivation.
Integrating Semantic Constraint into Surprisal
This can be achieved by turning a vector model of semantic similarity into a probabilistic language model .
Integrating Semantic Constraint into Surprisal
There are in fact a number of approaches to deriving language models from distributional models of semantics (e.g., Bellegarda 2000; Coccaro and Jurafsky 1998; Gildea and Hofmann 1999).
Models of Processing Difficulty
The basic idea is that the processing costs relating to the expectations of the language processor can be expressed in terms of the probabilities assigned by some form of language model to the input.
Models of Processing Difficulty
Surprisal could be also defined using a vanilla language model that does not take any structural or grammatical information into account (Frank 2009).
language model is mentioned in 9 sentences in this paper.
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Clustering-based word representations
So it is a class-based bigram language model .
Clustering-based word representations
Deschacht and Moens (2009) use a latent-variable language model to improve semantic role labeling.
Distributed representations
Word embeddings are typically induced using neural language models , which use neural networks as the underlying predictive model (Bengio, 2008).
Distributed representations
Historically, training and testing of neural language models has been slow, scaling as the size of the vocabulary for each model computation (Bengio et al., 2001; Bengio et al., 2003).
Distributed representations
Collobert and Weston (2008) presented a neural language model that could be trained over billions of words, because the gradient of the loss was computed stochastically over a small sample of possible outputs, in a spirit similar to Bengio and Sénecal (2003).
Introduction
Neural language models (Bengio et al., 2001; Schwenk & Gauvain, 2002; Mnih & Hinton, 2007; Collobert & Weston, 2008), on the other hand, induce dense real-valued low-dimensional word embeddings.
Introduction
(See Bengio (2008) for a more complete list of references on neural language models.)
Unlabeled Data
These auxiliary tasks are sometimes specific to the supervised task, and sometimes general language modeling tasks like “predict the missing word”.
language model is mentioned in 8 sentences in this paper.
Huang, Fei and Yates, Alexander
Abstract
We leverage recently-developed techniques for learning representations of text using latent-variable language models , and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
Introduction
Using latent-variable language models , we learn representations of texts that provide novel kinds of features to our supervised learning algorithms.
Introduction
The next section provides background information on learning representations for NLP tasks using latent-variable language models .
Introduction
2 Open-Domain Representations Using Latent-Variable Language Models
language model is mentioned in 7 sentences in this paper.
Yeniterzi, Reyyan and Oflazer, Kemal
Conclusions
The problem is essentially one of generating multiple candidate sentences with the unattached function words ambiguously positioned (say in a lattice) and then using a second language model to rerank these sentences and select the target sentence.
Experimental Setup and Results
Furthermore, in factored models, we can employ different language models for different factors.
Experimental Setup and Results
We believe that the use of multiple language models (some much less sparse than the surface LM) in the factored baseline is the main reason for the improvement.
Experimental Setup and Results
3.2.3 Experiments with higher-order language models
Introduction
The main reason given for these problems was that the same statistical translation, reordering and language modeling mechanisms were being employed to both determine the morphological structure of the words and, at the same time, get the global order of the words correct.
language model is mentioned in 7 sentences in this paper.
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
Related Work
First, a pre-defined confusion set is used to generate candidate corrections; then a scoring model, such as a trigram language model or naïve Bayes classifier, is used to rank the candidates according to their context (e.g., Golding and Roth, 1996; Mangu and Brill, 1997; Church et al., 2007).
Related Work
(2009) present a query speller system in which both the error model and the language model are trained using Web data.
Related Work
Typically, a language model (source model) is used to capture contextual information, while an error model (channel model) is considered to be context free in that it does not take into account any contextual information in modeling word transformation probabilities.
The Baseline Speller System
where the error model P(Q|C) models the transformation probability from C to Q, and the language model P(C) models how likely C is a correctly spelled query.
The Baseline Speller System
The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing.
The Baseline Speller System
Since we define the logarithm of the probabilities of the language model and the error model (i.e., the edit distance function) as features, the ranker can be viewed as a more general framework, subsuming the source channel model as a special case.
language model is mentioned in 6 sentences in this paper.
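Written out, the source-channel model the Baseline Speller System snippets describe selects the correction C* for a query Q as

    C^{*} \;=\; \arg\max_{C} P(C \mid Q) \;=\; \arg\max_{C} P(Q \mid C) \, P(C)

with P(C) given by the backoff bigram language model trained on query logs and P(Q|C) by the error model; as the last snippet notes, treating the two log-probabilities as ranker features subsumes this source-channel model as a special case.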
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
A language model is not used in this case, as the system is constrained to the given target sentence and thus the language model score has no effect on the alignment.
Alignment
To deal with this problem, instead of simple phrase length restriction, we propose to apply the leaving-one-out method, which is also used for language modeling techniques (Kneser and Ney, 1995).
Experimental Evaluation
The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
Experimental Evaluation
We used a 4-gram language model with modified Kneser-Ney discounting for all experiments.
Introduction
The phrase model is combined with a language model, word lexicon models, word and phrase penalties, and many others.
Related Work
They report improvements over a phrase-based model that uses an inverse phrase model and a language model .
language model is mentioned in 6 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
Since all the member systems share the same data resources, such as language model and translation table, we only need to keep one copy of the required resources in memory.
Background
Another method to speed up the system is to accelerate the n-gram language model with n-gram caching techniques.
Background
If the required n-gram hits the cache, the corresponding n-gram probability is returned from the cached copy rather than by re-fetching the original data in the language model.
language model is mentioned in 6 sentences in this paper.
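A minimal Python sketch of the n-gram caching idea in the Background snippets; the class name, interface, and size bound are illustrative assumptions, not the authors' implementation.

    class CachedLM:
        """Wrap a language model with a simple n-gram probability cache."""

        def __init__(self, lm, max_size=1_000_000):
            self.lm = lm          # underlying model exposing a .prob(ngram) method (assumed interface)
            self.cache = {}
            self.max_size = max_size

        def prob(self, ngram):
            ngram = tuple(ngram)
            if ngram in self.cache:        # cache hit: return the stored probability
                return self.cache[ngram]
            p = self.lm.prob(ngram)        # cache miss: fetch from the original model data
            if len(self.cache) < self.max_size:
                self.cache[ngram] = p
            return p

Because decoding queries the same n-grams repeatedly, returning the cached copy instead of re-fetching from the original model is what yields the speed-up the snippets describe.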
Laskowski, Kornel
Conceptual Framework
In language modeling practice, one finds the likelihood P(w | Θ) of a word sequence w of a given length under a model Θ to be an inconvenient measure for comparison.
Discussion
This makes it suitable for comparison of conversational genres, in much the same way as are general language models of words.
Discussion
Accordingly, as for language models, density estimation in future turn-taking models may be im-
Introduction
The current work attempts to address this problem by proposing a simple framework, which, at least conceptually, borrows quite heavily from the standard language modeling paradigm.
language model is mentioned in 4 sentences in this paper.
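As general background to the Conceptual Framework snippet (this is standard language-modeling practice, not a formula taken from the paper): the likelihood of a sequence shrinks with its length, so language modeling normally compares models using a length-normalized quantity such as perplexity,

    \mathrm{PPL}(\mathbf{w}; \Theta) \;=\; P(\mathbf{w} \mid \Theta)^{-1/|\mathbf{w}|}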
Feng, Yansong and Lapata, Mirella
Abstractive Caption Generation
Specifically, we use an adaptive language model (Kneser et al., 1997) that modifies an
Abstractive Caption Generation
where P(w_i ∈ C | w_i ∈ D) is the probability of w_i appearing in the caption given that it appears in the document D, and P_adap(w_i | w_{i-1}, w_{i-2}) is the language model adapted with probabilities from our image annotation model:
Experimental Setup
The scaling parameter β for the adaptive language model was also tuned on the development set using the range [0.5, 0.9].
language model is mentioned in 3 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina and Knight, Kevin
Evaluation Tasks and Results
To produce baseline cognate identification predictions, we calculate the probability of each latent Hebrew letter sequence predicted by the HMM, and compare it to a uniform character-level Ugaritic language model (as done by our model, to avoid automatically assigning higher cognate probability to shorter Ugaritic words).
Inference
We also calculate this probability using a uniform unigram character-level language model (it thus depends only on the number of characters in u_i).
Model
Otherwise, a lone word u_i is generated according to a uniform character-level language model.
language model is mentioned in 3 sentences in this paper.
Wang, Jia and Li, Qing and Chen, Yuanzhu Peter and Lin, Zhangxi
Conclusion and Future Work
By combining such information with traditional statistical language models , it is capable of suggesting relevant articles that meet the dynamic nature of a discussion in social media.
Experimental Evaluation
The second one, LM, is based on statistical language models for relevant information retrieval (Ponte and Croft, 1998).
Experimental Evaluation
It builds a probabilistic language model for each article, and ranks them by query likelihood.
language model is mentioned in 3 sentences in this paper.
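A brief Python sketch of query-likelihood ranking in the spirit of Ponte and Croft (1998), as referenced above; the Jelinek-Mercer smoothing, parameter value, and function name are illustrative assumptions rather than details from the paper.

    import math
    from collections import Counter

    def query_likelihood(query_tokens, doc_tokens, coll_prob, lam=0.8):
        """Log-likelihood of the query under a smoothed unigram model of one article.

        coll_prob: collection-level unigram probabilities (assumed precomputed).
        lam: interpolation weight between article and collection models (an assumption).
        """
        counts = Counter(doc_tokens)
        dlen = len(doc_tokens)
        score = 0.0
        for w in query_tokens:
            p_doc = counts[w] / dlen if dlen else 0.0
            p_coll = coll_prob.get(w, 1e-9)
            score += math.log(lam * p_doc + (1 - lam) * p_coll)
        return score

Articles are then ranked for a given discussion-derived query by this score, which is the query-likelihood ranking the Experimental Evaluation snippets mention.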
Chen, Boxing and Foster, George and Kuhn, Roland
Experiments
We trained two language models: the first one is a 4-gram LM which is estimated on the target side of the texts used in the large data condition.
Experiments
Both language models are used for both tasks.
Experiments
Only the target-language half of the parallel training data are used to train the language model in this task.
language model is mentioned in 3 sentences in this paper.
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Experiments
Here, the first item is the language model (LM) probability, where τ(d) is the target string of derivation d; the second item is the translation length penalty; and the third item is the translation score, which is decomposed into a product of feature values of rules:
Experiments
SRI Language Modeling Toolkit (Stolcke, 2002) was employed to train 5-gram English and Japanese LMs on the training set.
Related Work
By introducing supertags into the target language side, i.e., the target language model and the target side of the phrase table, significant improvement was achieved for Arabic-to-English translation.
language model is mentioned in 3 sentences in this paper.
Xiong, Deyi and Zhang, Min and Li, Haizhou
Features
To some extent, these two features have a similar function to a target language model or a POS-based target language model.
Related Work
(2009) study several confidence features based on mutual information between words, and on n-gram and backward n-gram language models, for word-level and sentence-level CE.
SMT System
We build a four-gram language model using the SRILM toolkit (Stolcke, 2002), which is trained
language model is mentioned in 3 sentences in this paper.