Index of papers in Proc. ACL 2008 that mention
  • language model
Zhang, Hao and Gildea, Daniel
Abstract
We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
Introduction
This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
Introduction
First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based
Introduction
The general bigram-to-trigram technique is common in speech recognition (Murveit et al., 1993), where lattices from a bigram-based decoder are re-scored with a trigram language model.
Language Model Integrated Decoding for SCFG
We begin by introducing Synchronous Context Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
Language Model Integrated Decoding for SCFG
Without an n-gram language model, decoding using SCFG is not much different from CFG parsing.
Language Model Integrated Decoding for SCFG
However, when we want to integrate an n-gram language model into the search, our goal is to search for the derivation whose total sum of production weights and n-gram log probabilities is maximized.
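As a worked form of that objective (the notation below is assumed, not quoted from the paper), the decoder looks for the derivation whose rule weights and target n-gram log probabilities sum to the highest score:

```latex
\hat{D} \;=\; \arg\max_{D}\; \Bigl(\, \sum_{r \in D} \log w(r)
      \;+\; \sum_{i} \log P\bigl(w_i \mid w_{i-n+1}^{\,i-1}\bigr) \Bigr),
```

where $w(r)$ is the weight of SCFG production $r$ and the second sum runs over the n-grams of the target-side yield of $D$.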
Multi-pass LM-Integrated Decoding
very good estimate of the outside cost using a trigram model since a bigram language model and a trigram language model must be strongly correlated.
Multi-pass LM-Integrated Decoding
We propagate the outside cost of the parent to its children by combining it with the inside cost of the other children and the interaction cost, i.e., the language model cost between the focused child and the other children.
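Spelled out in assumed notation, the outside estimate handed down to a child item combines the parent's outside cost, the inside costs of the sibling items, and the language model interaction cost incurred where child and siblings meet:

```latex
\beta(c) \;\approx\; \beta(p) \;+\; \sum_{c' \neq c} \alpha(c') \;+\; \mathrm{LM}(c, c'),
```

where $\alpha(\cdot)$ is an inside (Viterbi) cost, $\beta(\cdot)$ an outside estimate, and $\mathrm{LM}(c, c')$ the boundary n-gram cost between the focused child and its siblings.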
Multi-pass LM-Integrated Decoding
(2007) also take a two-pass decoding approach, with the first pass leaving the language model boundary words out of the dynamic programming state, such that only one hypothesis is retained for each span and grammar symbol.
language model is mentioned in 12 sentences in this paper.
Topics mentioned in this paper:
Uszkoreit, Jakob and Brants, Thorsten
Abstract
In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes.
Abstract
The resulting clusterings are then used in training partially class-based language models.
Experiments
We trained a number of predictive class-based language models on different Arabic and English corpora using clusterings trained on the complete data of the same corpus.
Experiments
We use each predictive class-based language model as well as a word-based model as separate feature functions in the log-linear combination in Eq.
Experiments
The word-based language model used by the system in these experiments is a 5-gram model also trained on the en target data set.
Introduction
A statistical language model assigns a probability $P(w)$ to any given string of words $w_1^m = w_1, \ldots, w_m$.
Introduction
In the case of n-gram language models this is done by factoring the probability:
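The factorization in question is the standard chain rule with the n-gram (Markov) approximation, sketched here in assumed notation:

```latex
P(w_1^m) \;=\; \prod_{i=1}^{m} P\bigl(w_i \mid w_1^{\,i-1}\bigr)
        \;\approx\; \prod_{i=1}^{m} P\bigl(w_i \mid w_{i-n+1}^{\,i-1}\bigr).
```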
Introduction
do not differ in the last $n-1$ words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities $P(w_i \mid w_{i-n+1}^{i-1})$.
language model is mentioned in 15 sentences in this paper.
Topics mentioned in this paper:
Duan, Huizhong and Cao, Yunbo and Lin, Chin-Yew and Yu, Yong
Abstract
Then we model question topic and question focus in a language modeling framework for search.
Abstract
Experimental results indicate that our approach of identifying question topic and question focus for search significantly outperforms the baseline methods such as Vector Space Model (VSM) and Language Model for Information Retrieval (LMIR).
Introduction
vector space model, Okapi, language model, and translation-based model, within the setting of question search (Jeon et al., 2005b).
Introduction
On the basis of this, we then propose to model question topic and question focus in a language modeling framework for search.
Our Approach to Question Search
model question topic and question focus in a language modeling framework for search.
Our Approach to Question Search
We employ the framework of language modeling (for information retrieval) to develop our approach to question search.
Our Approach to Question Search
In the language modeling approach to information retrieval, the relevance of a targeted question $\hat{q}$ to a queried question $q$ is given by the probability $p(q \mid \hat{q})$ of generating the queried question $q$
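A common instantiation of this framework, given as a sketch rather than the paper's exact estimator, factors the generation probability over query terms with a smoothed model of the targeted question:

```latex
p(q \mid \hat{q}) \;=\; \prod_{w \in q} \Bigl( (1-\lambda)\, p_{ml}(w \mid \hat{q}) \;+\; \lambda\, p(w \mid C) \Bigr),
```

where $p_{ml}$ is the maximum-likelihood estimate from $\hat{q}$, $C$ is the background collection, and $\lambda$ a smoothing weight (Jelinek-Mercer smoothing is assumed here).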
language model is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Fleischman, Michael and Roy, Deb
Abstract
Grounded language models represent the relationship between words and the nonlinguistic context in which they are said.
Abstract
Results show that grounded language models improve perplexity and word error rate over text-based language models, and further, support video information retrieval better than human-generated speech transcriptions.
Introduction
The method is based on the use of grounded language models to represent
Introduction
Grounded language models are based on research from cognitive science on grounded models of meaning.
Introduction
This paper extends previous work on grounded models of meaning by learning a grounded language model from naturalistic data collected from broadcast video of Major League Baseball games.
Linguistic Mapping
We model this relationship, much like traditional language models, using conditional probability distributions.
Linguistic Mapping
Unlike traditional language models, however, our grounded language models condition the probability of a word not only on the word(s) uttered before it, but also on the temporal pattern features that describe the nonlinguistic context in which it was uttered.
language model is mentioned in 46 sentences in this paper.
Topics mentioned in this paper:
Kaufmann, Tobias and Pfister, Beat
Abstract
We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree.
Abstract
The language model is applied by means of an N-best rescoring step, which allows us to directly measure the performance gains relative to the baseline system without rescoring.
Introduction
Other linguistically inspired language models like Chelba and Jelinek (2000) and Roark (2001) have been applied to continuous speech recognition.
Introduction
In the first place, we want our language model to reliably distinguish between grammatical and ungrammatical phrases.
Introduction
However, their grammar-based language model did not make use of a probabilistic component, and it was applied to a rather simple recognition task (dictation texts for pupils read and recorded under good acoustic conditions, no out-of-vocabulary words).
Language Model 2.1 The General Approach
The language model weight $\lambda$ and the word insertion penalty $ip$ lead to a better performance in practice, but they have no theoretical justification.
Language Model 2.1 The General Approach
Our grammar-based language model is incorporated into the above expression as an additional probability $P_{gram}(W)$, weighted by a parameter $\mu$:
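A minimal sketch of such a combined rescoring criterion, with the acoustic term $P_{ac}(X \mid W)$ and the exact arrangement assumed rather than taken from the paper:

```latex
\hat{W} \;=\; \arg\max_{W}\; \log P_{ac}(X \mid W) \;+\; \lambda \log P_{lm}(W) \;+\; ip \cdot |W| \;+\; \mu \log P_{gram}(W).
```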
Language Model 2.1 The General Approach
A major problem of grammar-based approaches to language modeling is how to deal with out-of-grammar utterances.
language model is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Talbot, David and Brants, Thorsten
Abstract
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
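To make the idea concrete, here is a toy Python sketch of fingerprint-based storage: only a short fingerprint of each n-gram and its quantized value are kept, so lookups of unseen n-grams occasionally return a false positive. The class and its layout are illustrative assumptions; the paper's actual scheme additionally relies on a perfect hash function, which this sketch does not implement.

```python
import hashlib

class FingerprintLM:
    """Toy fingerprint-keyed n-gram table (illustrative sketch, not the paper's scheme)."""

    def __init__(self, num_slots, fp_bits=12):
        self.num_slots = num_slots
        self.fp_mask = (1 << fp_bits) - 1
        self.slots = [None] * num_slots   # each slot holds (fingerprint, quantized value)

    def _address(self, ngram):
        digest = int(hashlib.md5(" ".join(ngram).encode("utf-8")).hexdigest(), 16)
        return digest % self.num_slots, (digest >> 64) & self.fp_mask

    def add(self, ngram, quantized_logprob):
        slot, fp = self._address(ngram)
        # A perfect hash function would guarantee distinct slots for stored n-grams;
        # this toy table simply overwrites on collision.
        self.slots[slot] = (fp, quantized_logprob)

    def lookup(self, ngram):
        slot, fp = self._address(ngram)
        entry = self.slots[slot]
        if entry is not None and entry[0] == fp:
            return entry[1]   # false positives occur with probability about 2**-fp_bits
        return None           # treated as unseen: caller backs off to a shorter n-gram
```

A call such as `lm.lookup(("the", "cat", "sat"))` then returns either the stored quantized value or None, at a fraction of the space of storing the n-grams themselves.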
Abstract
We demonstrate the space-savings of the scheme via machine translation experiments within a distributed language modeling framework.
Experimental Setup
We deploy the randomized LM in a distributed framework which allows it to scale more easily by distributing it across multiple language model servers.
Introduction
Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition and many other areas.
Introduction
With large monolingual corpora available in major languages, making use of all the available data is now a fundamental challenge in language modeling.
Introduction
have considered alternative parameterizations such as class-based models (Brown et al., 1992), model reduction techniques such as entropy-based pruning (Stolcke, 1998), novel representation schemes such as suffix arrays (Emami et al., 2007), Golomb Coding (Church et al., 2007) and distributed language models that scale more readily (Brants et al., 2007).
Scaling Language Models
In language modeling the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models
Recent work (Talbot and Osborne, 2007b) has used lossy encodings based on Bloom filters (Bloom, 1970) to represent logarithmically quantized corpus statistics for language modeling.
language model is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Mei, Qiaozhu and Zhai, ChengXiang
Abstract
We propose language modeling methods for solving this problem, and study how to incorporate features such as authority and proximity to accurately estimate the impact language model.
Impact Summarization
To solve these challenges, in the next section, we propose to model impact with unigram language models and score sentences using
Impact Summarization
We further propose methods for estimating the impact language model based on several features including the authority of citations, and the citation proximity.
Introduction
We propose language models to exploit both the citation context and original content of a paper to generate an impact-based summary.
Introduction
We study how to incorporate features such as authority and proximity into the estimation of language models.
Introduction
We propose and evaluate several different strategies for estimating the impact language model, which is key to impact summarization.
Language Models for Impact Summarization
3.1 Impact language models
Language Models for Impact Summarization
We thus propose to represent such a virtual impact query with a unigram language model.
Language Models for Impact Summarization
Such a model is expected to assign high probabilities to those words that can describe the impact of paper d, just as we expect a query language model in ad hoc retrieval to assign high probabilities to words that tend to occur in relevant documents (Ponte and Croft, 1998).
language model is mentioned in 23 sentences in this paper.
Topics mentioned in this paper:
Shen, Libin and Xu, Jinxi and Weischedel, Ralph
Abstract
With this new framework, we employ a target dependency language model during decoding to exploit long distance word relations, which are unavailable with a traditional n-gram language model.
Dependency Language Model
$w_h$-as-head represents $w_h$ used as the head, and it is different from $w_h$ in the dependency language model.
Dependency Language Model
In order to calculate the dependency language model score, or depLM score for short, on the fly for
Discussion
(2003) described a two-step string-to-CFG-tree translation model which employed a syntax-based language model to select the best translation from a target parse forest built in the first step.
Implementation Details
Language model score.
Implementation Details
Dependency language model score.
Introduction
language model during decoding, in order to exploit long-distance word relations which are unavailable with a traditional n-gram language model on target strings.
Introduction
Section 3 illustrates the use of dependency language models.
String-to-Dependency Translation
Formal definitions also allow us to easily extend the framework to incorporate a dependency language model in decoding.
String-to-Dependency Translation
Supposing we use a traditional trigram language model in decoding, we need to specify the leftmost two words and the rightmost two words in a state.
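The bookkeeping this describes can be sketched in Python (hypothetical helper names, not the authors' code): each span hypothesis records only its two leftmost and two rightmost target words, which is all a trigram LM needs when two adjacent hypotheses are glued together.

```python
def combine(left, right, lm_logprob):
    """Glue two adjacent span hypotheses under a trigram LM (illustrative sketch).

    A hypothesis is (left_boundary, right_boundary, score), where each boundary
    is a tuple of up to two target words.  lm_logprob(word, history) is assumed
    to return log P(word | history) for a history of up to two words.
    """
    l_left, l_right, l_score = left
    r_left, r_right, r_score = right

    # Only the boundary words interact: the first word(s) of the right span are
    # re-scored in the context of the last two words of the left span.  A real
    # decoder would also subtract any provisional estimate previously charged
    # for these words.
    interaction = 0.0
    history = list(l_right)
    for word in r_left:
        interaction += lm_logprob(word, tuple(history[-2:]))
        history.append(word)

    return (l_left, r_right, l_score + r_score + interaction)
```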
String-to-Dependency Translation
In the next section, we will explain how to extend categories and states to exploit a dependency language model during decoding.
language model is mentioned in 11 sentences in this paper.
Topics mentioned in this paper:
Cao, Guihong and Robertson, Stephen and Nie, Jian-Yun
Abstract
The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model).
Bigram Expansion Model for Alteration Selection
The query context is modeled by a bigram language model as in (Peng et al.
Bigram Expansion Model for Alteration Selection
In this work, we used a bigram language model to calculate the probability of each path.
Bigram Expansion Model for Alteration Selection
$P(e_1, e_2, \ldots, e_i, \ldots, e_n) = P(e_1) \prod_{k=2}^{n} P(e_k \mid e_{k-1})$ (2). $P(e_k \mid e_{k-1})$ is estimated with a back-off bigram language model (Goodman, 2001).
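For concreteness, a back-off bigram score for one alteration path can be computed as in the sketch below; the probability tables and back-off weights are assumed inputs rather than anything specified in the paper.

```python
def backoff_bigram_logprob(path, bigram_lp, unigram_lp, backoff_wt):
    """Log probability of a term sequence under a back-off bigram model (sketch).

    bigram_lp[(a, b)] : log P(b | a) for bigrams seen in training
    unigram_lp[b]     : log P(b)
    backoff_wt[a]     : back-off weight (log scale) for history a
    """
    score = unigram_lp[path[0]]
    for prev, cur in zip(path, path[1:]):
        if (prev, cur) in bigram_lp:
            score += bigram_lp[(prev, cur)]
        else:
            score += backoff_wt.get(prev, 0.0) + unigram_lp[cur]
    return score
```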
Conclusion
In the first method proposed, the Bigram Expansion model, query context is modeled by a bigram language model.
Introduction
The query context is modeled by a bigram language model.
Related Work
2007), a bigram language model is used to determine the alteration of the head word that best fits the query.
Related Work
In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates.
language model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Deng, Yonggang and Xu, Jia and Gao, Yuqing
A Generic Phrase Training Procedure
Each normalized feature score derived from word alignment models or language models will be log-linearly combined to generate the final score.
Discussions
We propose several information metrics derived from posterior distribution, language model and word alignments as feature functions.
Experimental Results
Like other log-linear model based decoders, active features in our translation engine include translation models in two directions, lexicon weights in two directions, language model, lexicalized distortion models, sentence length penalty and other heuristics.
Experimental Results
The language model is a statistical trigram model estimated with Modified Kneser-Ney smoothing (Chen and Goodman, 1996) using only English sentences in the parallel training data.
Features
All these features are data-driven and defined based on models, such as statistical word alignment model or language model .
Features
We apply a language model (LM) to describe the predictive uncertainty (PU) between words in two directions.
Features
Given a history $w_1^{i-1}$, a language model specifies a conditional distribution of the future word being predicted to follow the history.
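One natural reading of "predictive uncertainty" is the entropy of that conditional distribution; the sketch below assumes this reading and a hypothetical `next_word_dist` accessor, and is not the paper's definition.

```python
import math

def predictive_uncertainty(history, next_word_dist):
    """Entropy of the next-word distribution given a history (assumed definition).

    next_word_dist(history) is assumed to return a dict mapping each candidate
    next word to P(word | history).
    """
    dist = next_word_dist(history)
    return -sum(p * math.log(p) for p in dist.values() if p > 0.0)
```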
language model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Espinosa, Dominic and White, Michael and Mehay, Dennis
Background
OpenCCG implements a symbolic-statistical chart realization algorithm (Kay, 1996; Carroll et al., 1999; White, 2006b) combining (1) a theoretically grounded approach to syntax and semantic composition with (2) factored language models (Bilmes and Kirchhoff, 2003) for making choices among the options left open by the grammar.
Background
makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class.
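Schematically (in assumed notation), a factored language model treats each token as a bundle of factors and conditions on the factor bundles of the preceding words:

```latex
P\bigl(f_i^{1:K} \,\bigm|\, f_{i-1}^{1:K},\, f_{i-2}^{1:K}\bigr),
```

with generalized back-off deciding which parent factors to drop when counts are sparse (Bilmes and Kirchhoff, 2003).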
Background
2.3 Factored Language Models
Introduction
Assigned categories are instantiated in OpenCCG’s chart realizer where, together with a treebank-derived syntactic grammar (Hockenmaier and Steedman, 2007) and a factored language model (Bilmes and Kirchhoff, 2003), they constrain the English word-strings that are chosen to express the LF.
The Approach
Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model.
The Approach
As shown in Table 1, with the large grammar derived from the training sections, many fewer complete realizations are found (before timing out) using the factored language model than are possible, as indicated by the results of using the oracle model.
language model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Toutanova, Kristina and Suzuki, Hisami and Ruopp, Achim
Inflection prediction models
We stemmed the reference translations, predicted the inflection for each stem, and measured the accuracy of prediction, using a set of sentences that were not part of the training data (1K sentences were used for Arabic and 5K for Russian). Our model performs significantly better than both the random and trigram language model baselines, and achieves an accuracy of over 91%, which suggests that the model is effective when its input is clean in its stem choice and order.
Integration of inflection models with MT systems
Given such a list of candidate stem sequences, the base MT model together with the inflection model and a language model choose a translation Y* as follows:
Integration of inflection models with MT systems
$P_{LM}$ is the joint probability of the sequence of inflected words according to a trigram language model (LM).
Integration of inflection models with MT systems
In addition, stemming the target sentences reduces the sparsity in the translation tables and language model , and is likely to impact positively the performance of an MT system in terms of its ability to recover correct sequences of stems in the target.
Introduction
(Goldwater and McClosky, 2005), while the application of a target language model has almost solely been responsible for addressing the second aspect.
Machine translation systems and data
(2003), a trigram target language model, two order models, word count, phrase count, and average phrase size functions.
Machine translation systems and data
The features include log-probabilities according to inverted and direct channel models estimated by relative frequency, lexical weighting channel models, a trigram target language model, distortion, word count and phrase count.
Machine translation systems and data
For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
language model is mentioned in 8 sentences in this paper.
Topics mentioned in this paper:
Zhang, Dongdong and Li, Mu and Duan, Nan and Li, Chi-Ho and Zhou, Ming
Experiments
In the experiments, the language model is a Chinese 5-gram language model trained with the Chinese part of the LDC parallel corpus and the Xin-hua part of the Chinese Gigaword corpus with about 27 million words.
Experiments
In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, Hs denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction
Moreover, Chinese measure words often have a long distance dependency to their head words, which makes the language model ineffective in selecting the correct measure words from the measure word candidate set.
Introduction
In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation.
Model Training and Application 3.1 Training
We used the SRI Language Modeling Toolkit (Stolcke, 2002) to train a five-gram model with modified Kneser-Ney smoothing (Chen and Goodman, 1998).
Our Method
For target features, n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure
Our Method
Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position, POS tags. Source features: MW-HW collocation, surrounding words, source head word.
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Blunsom, Phil and Cohn, Trevor and Osborne, Miles
Discriminative Synchronous Transduction
similar to the methods for decoding with a SCFG intersected with an n-gram language model, which require language model contexts to be stored in each chart cell.
Discussion and Further Work
To do so would require integrating a language model feature into the max-translation decoding algorithm.
Evaluation
The feature set includes: a trigram language model (lm) trained
Evaluation
To compare our model directly with these systems we would need to incorporate additional features and a language model, work which we have left for a later date.
Evaluation
The relative scores confirm that our model, with its minimalist feature set, achieves comparable performance to the standard feature set without the language model .
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Branavan, S.R.K. and Chen, Harr and Eisenstein, Jacob and Barzilay, Regina
Introduction
Each property indexes a language model , thus allowing documents that incorporate the same
Model Description
Keyphrases are drawn from a set of clusters; words in the documents are drawn from language models indexed by a set of topics, where the topics correspond to the keyphrase clusters.
Model Description
language models of each topic
Model Description
In the LDA framework, each word is generated from a language model that is indexed by the word’s topic assignment.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Olsson, J. Scott and Oard, Douglas W.
Experiments
The resulting interview transcripts have a reported mean word error rate (WER) of approximately 25% on held out data, which was obtained by priming the language model with meta-data available from preinterview questionnaires.
Experiments
We use a mixture of the training transcripts and various newswire sources for our language model training.
Experiments
We did not attempt to prime the language model for particular interviewees or otherwise utilize any interview metadata.
Introduction
Limitations in signal processing, acoustic modeling, pronunciation, vocabulary, and language modeling can be accommodated in several ways, each of which make different tradeoffs and thus induce different
Previous Work
In the extreme case, the term may simply be out of vocabulary, although this may occur for various other reasons (e.g., poor language modeling or pronunciation dictionaries).
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Fleck, Margaret M.
Previous work
Language modelling methods build word n-gram models, like those used in speech recognition.
Previous work
3.2 Language modelling methods
Previous work
So far, language modelling methods have been more effective.
The new approach
This corresponds roughly to a unigram language model.
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Avramidis, Eleftherios and Koehn, Philipp
Experiments
For testing the factored translation systems, we used Moses (Koehn et al., 2007), along with a 5-gram SRILM language model (Stolcke, 2002).
Factored Model
The factored statistical machine translation model uses a log-linear approach, in order to combine the several components, including the language model, the reordering model, the translation models and the generation models.
Introduction
The basic SMT approach uses the target language model as a feature in the argument maximisation function.
Introduction
This language model is trained on grammatically correct text, and would therefore give a good probability for word sequences that are likely to occur in a sentence, while it would penalise ungrammatical or badly ordered formations.
Introduction
Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Mi, Haitao and Huang, Liang and Liu, Qun
Experiments
language model and $e^{\lambda|s|}$ is the length penalty term
Experiments
We also use the SRI Language Modeling Toolkit (Stolcke, 2002) to train a trigram language model with Kneser-Ney smoothing on the English side of the bitext.
Experiments
Besides the trigram language model trained on the English side of these bitext, we also use another trigram model trained on the first 1/3 of the Xinhua portion of Gigaword corpus.
Forest-based translation
The decoder performs two tasks on the translation forest: 1-best search with integrated language model (LM), and k-best search with LM to be used in minimum error rate training.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Li, Zhifei and Yarowsky, David
Experimental Results
To handle different directions of translation between Chinese and English, we built two trigram language models with modified Kneser-Ney smoothing (Chen and Goodman, 1998) using the SRILM toolkit (Stolcke, 2002).
Experimental Results
Feature (Baseline / AAMT): language model 0.137 / 0.133; phrase translation 0.066 / 0.023; lexical translation 0.061 / 0.078; reverse phrase translation 0.059 / 0.103; reverse lexical translation 0.
Unsupervised Translation Induction for Chinese Abbreviations
Moreover, our approach utilizes both Chinese and English monolingual data to help MT, while most SMT systems utilize only the English monolingual data to build a language model.
Unsupervised Translation Induction for Chinese Abbreviations
However, since most of statistical translation models (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) are symmetrical, it is relatively easy to train a translation system to translate from English to Chinese, except that we need to train a Chinese language model from the Chinese monolingual data.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Yang, Fan and Zhao, Jun and Zou, Bo and Liu, Kang and Liu, Feifan
Statistical Transliteration Model
The language model P(e) is trained from English texts.
Statistical Transliteration Model
generative probability of an English syllable language model.
Statistical Transliteration Model
2) The language model in backward transliteration describes the relationship of syllables in words.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cherry, Colin
Discussion
So long as the vocabulary present in our phrase table and language model supports a literal translation, cohesion tends to produce an improvement.
Discussion
In the baseline translation, the language model encourages the system to move the negation away from “exist” and toward “reduce.” The result is a tragic reversal of meaning in the translation.
Introduction
order, forcing the decoder to rely heavily on its language model.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: