Abstract | We take a multi-pass approach to machine translation decoding when using synchronous context-free grammars as the translation model and n-gram language models: the first pass uses a bigram language model, and the resulting parse forest is used in the second pass to guide search with a trigram language model.
Introduction | This complexity arises from the interaction of the tree-based translation model with an n-gram language model.
Introduction | First, we present a two-pass decoding algorithm, in which the first pass explores states resulting from an integrated bigram language model, and the second pass expands these states into trigram-based
Introduction | The general bigram-to-trigram technique is common in speech recognition (Murveit et al., 1993), where lattices from a bigram-based decoder are re-scored with a trigram language model.
Language Model Integrated Decoding for SCFG | We begin by introducing Synchronous Context-Free Grammars and their decoding algorithms when an n-gram language model is integrated into the grammatical search space.
Language Model Integrated Decoding for SCFG | Without an n-gram language model, decoding using SCFG is not much different from CFG parsing.
Language Model Integrated Decoding for SCFG | However, when we want to integrate an n-gram language model into the search, our goal is to find the derivation that maximizes the total sum of production weights and n-gram log probabilities.
Multi-pass LM-Integrated Decoding | very good estimate of the outside cost using a trigram model since a bigram language model and a trigram language model must be strongly correlated. |
Multi-pass LM-Integrated Decoding | We propagate the outside cost of the parent to its children by combining with the inside cost of the other children and the interaction cost, i.e., the language model cost between the focused child and the other children. |
Multi-pass LM-Integrated Decoding | (2007) also take a two-pass decoding approach, with the first pass leaving the language model boundary words out of the dynamic programming state, such that only one hypothesis is retained for each span and grammar symbol. |
Abstract | In statistical language modeling, one technique to reduce the problematic effects of data sparsity is to partition the vocabulary into equivalence classes.
Abstract | The resulting clusterings are then used in training partially class-based language models.
Experiments | We trained a number of predictive class-based language models on different Arabic and English corpora using clusterings trained on the complete data of the same corpus. |
Experiments | We use each predictive class-based language model as well as a word-based model as separate feature functions in the log-linear combination in Eq. |
Experiments | The word-based language model used by the system in these experiments is a 5-gram model also trained on the enlarged data set.
Introduction | A statistical language model assigns a probability P(w) to any given string of words w = w1, ..., wm.
Introduction | In the case of n-gram language models this is done by factoring the probability: |
Introduction | do not differ in the last n-1 words, one problem n-gram language models suffer from is that the training data is too sparse to reliably estimate all conditional probabilities P(wi | wi-n+1, ..., wi-1).
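The factorization and the sparsity problem it runs into can be sketched with a toy maximum-likelihood bigram model; the two-sentence corpus and all probabilities below are invented for illustration:

```python
from collections import Counter

def train_bigram_mle(corpus):
    """Maximum-likelihood bigram estimates from a toy corpus (whitespace tokens)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # history counts
        bigrams.update(zip(tokens, tokens[1:]))
    def p(w, h):
        return bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0
    return p

def sentence_prob(sent, p):
    """P(w1..wm) factored into a product of conditionals P(wi | wi-1)."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for h, w in zip(tokens, tokens[1:]):
        prob *= p(w, h)
    return prob

p = train_bigram_mle(["the cat sat", "the dog sat"])
print(sentence_prob("the cat sat", p))  # 0.5
print(sentence_prob("the dog ran", p))  # 0.0 -- one unseen bigram zeroes the estimate
```

A single unseen bigram zeroes the whole product, which is exactly the sparsity problem that smoothing and class-based models address.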
Abstract | Then we model question topic and question focus in a language modeling framework for search. |
Abstract | Experimental results indicate that our approach of identifying question topic and question focus for search significantly outperforms the baseline methods such as Vector Space Model (VSM) and Language Model for Information Retrieval (LMIR). |
Introduction | vector space model, Okapi, language model, and translation-based model, within the setting of question search (Jeon et al., 2005b).
Introduction | On the basis of this, we then propose to model question topic and question focus in a language modeling framework for search. |
Our Approach to Question Search | model question topic and question focus in a language modeling framework for search. |
Our Approach to Question Search | We employ the framework of language modeling (for information retrieval) to develop our approach to question search. |
Our Approach to Question Search | In the language modeling approach to information retrieval, the relevance of a targeted question q̂ to a queried question q is given by the probability p(q|q̂) of generating the queried question q
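As a rough sketch of this query-likelihood view (the Jelinek-Mercer smoothing and the mixture weight are my assumptions, not details from the excerpt):

```python
from collections import Counter

def query_likelihood(query, candidate, collection, lam=0.5):
    """p(q | candidate): product over query words of a smoothed unigram model of
    the candidate question, Jelinek-Mercer smoothed against the collection model
    (the smoothing scheme and weight lam are assumptions for illustration)."""
    d, c = Counter(candidate), Counter(collection)
    dlen, clen = sum(d.values()), sum(c.values())
    score = 1.0
    for w in query:
        score *= lam * d[w] / dlen + (1 - lam) * c[w] / clen
    return score

q1 = "how to install python".split()
q2 = "how to cook rice".split()
collection = q1 + q2          # toy archive: just the two candidate questions
query = ["install", "python"]
print(query_likelihood(query, q1, collection) > query_likelihood(query, q2, collection))  # True
```

Candidates are ranked by how likely their language model is to generate the queried question, so q1, which shares the query words, outranks q2.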
Abstract | Grounded language models represent the relationship between words and the nonlinguistic context in which they are said. |
Abstract | Results show that grounded language models improve perplexity and word error rate over text-based language models, and further, support video information retrieval better than human-generated speech transcriptions.
Introduction | The method is based on the use of grounded language models to represent
Introduction | Grounded language models are based on research from cognitive science on grounded models of meaning. |
Introduction | This paper extends previous work on grounded models of meaning by learning a grounded language model from naturalistic data collected from broadcast video of Major League Baseball games. |
Linguistic Mapping | We model this relationship, much like traditional language models, using conditional probability distributions.
Linguistic Mapping | Unlike traditional language models, however, our grounded language models condition the probability of a word not only on the word(s) uttered before it, but also on the temporal pattern features that describe the nonlinguistic context in which it was uttered. |
Abstract | We propose a language model based on a precise, linguistically motivated grammar (a handcrafted Head-driven Phrase Structure Grammar) and a statistical model estimating the probability of a parse tree. |
Abstract | The language model is applied by means of an N-best rescoring step, which allows us to directly measure the performance gains relative to the baseline system without rescoring.
Introduction | Other linguistically inspired language models like Chelba and Jelinek (2000) and Roark (2001) have been applied to continuous speech recognition.
Introduction | In the first place, we want our language model to reliably distinguish between grammatical and ungrammatical phrases. |
Introduction | However, their grammar-based language model did not make use of a probabilistic component, and it was applied to a rather simple recognition task (dictation texts for pupils read and recorded under good acoustic conditions, no out-of-vocabulary words). |
Language Model 2.1 The General Approach | The language model weight λ and the word insertion penalty ip lead to a better performance in practice, but they have no theoretical justification.
Language Model 2.1 The General Approach | Our grammar-based language model is incorporated into the above expression as an additional probability Pgram(W), weighted by a parameter μ:
Language Model 2.1 The General Approach | A major problem of grammar-based approaches to language modeling is how to deal with out-of-grammar utterances. |
Abstract | We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or other parameters.
Abstract | We demonstrate the space-savings of the scheme via machine translation experiments within a distributed language modeling framework. |
Experimental Setup | We deploy the randomized LM in a distributed framework which allows it to scale more easily by distributing it across multiple language model servers. |
Introduction | Language models (LMs) are a core component in statistical machine translation, speech recognition, optical character recognition and many other areas. |
Introduction | With large monolingual corpora available in major languages, making use of all the available data is now a fundamental challenge in language modeling.
Introduction | have considered alternative parameterizations such as class-based models (Brown et al., 1992), model reduction techniques such as entropy-based pruning (Stolcke, 1998), novel representation schemes such as suffix arrays (Emami et al., 2007), Golomb Coding (Church et al., 2007) and distributed language models that scale more readily (Brants et al., 2007).
Scaling Language Models | In language modeling the universe under consideration is the set of all possible n-grams of length n for a given vocabulary.
Scaling Language Models | Recent work (Talbot and Osborne, 2007b) has used lossy encodings based on Bloom filters (Bloom, 1970) to represent logarithmically quantized corpus statistics for language modeling.
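The log-quantized Bloom-filter encoding can be sketched as follows; the filter parameters, hash scheme, and base-2 quantization are illustrative choices, not the cited paper's exact construction:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter; sizes and hash scheme are illustrative choices."""
    def __init__(self, m_bits=1024, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)
    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m
    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)
    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def store(bf, ngram, count):
    """Log-quantize the count (base 2) and insert it in unary as (ngram, j) pairs."""
    q = count.bit_length()                # 1 + floor(log2 count) for count >= 1
    for j in range(1, q + 1):
        bf.add((ngram, j))

def lookup(bf, ngram):
    """Largest j whose (ngram, j) pair tests positive; errors are one-sided
    (a false positive can only overestimate, never underestimate)."""
    j = 0
    while (ngram, j + 1) in bf:
        j += 1
    return 0 if j == 0 else 2 ** (j - 1)  # lower bound of the quantization bucket

bf = BloomFilter()
store(bf, "the cat", 5)                   # quantized to q = 3
print(lookup(bf, "the cat"))              # 4, barring an (unlikely) false positive
```

Because membership errors are one-sided, a retrieved count can only be inflated, never deflated, and the space per n-gram stays far below that of an explicit hash table.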
Abstract | We propose language modeling methods for solving this problem, and study how to incorporate features such as authority and proximity to accurately estimate the impact language model.
Impact Summarization | To solve these challenges, in the next section, we propose to model impact with unigram language models and score sentences using
Impact Summarization | We further propose methods for estimating the impact language model based on several features including the authority of citations, and the citation proximity. |
Introduction | We propose language models to exploit both the citation context and original content of a paper to generate an impact-based summary. |
Introduction | We study how to incorporate features such as authority and proximity into the estimation of language models.
Introduction | We propose and evaluate several different strategies for estimating the impact language model, which is key to impact summarization.
Language Models for Impact Summarization | 3.1 Impact language models |
Language Models for Impact Summarization | We thus propose to represent such a virtual impact query with a unigram language model.
Language Models for Impact Summarization | Such a model is expected to assign high probabilities to those words that can describe the impact of paper d, just as we expect a query language model in ad hoc retrieval to assign high probabilities to words that tend to occur in relevant documents (Ponte and Croft, 1998). |
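A minimal sketch of such an impact model and its use for sentence scoring (the citation contexts, add-one smoothing, and score averaging are assumptions for illustration):

```python
import math
from collections import Counter

def impact_lm(citation_contexts):
    """Unigram impact model for a paper, estimated from its citation contexts.
    Add-one smoothing keeps unseen words from zeroing a sentence (an assumption)."""
    counts = Counter(w for ctx in citation_contexts for w in ctx.split())
    total, vocab = sum(counts.values()), len(counts)
    return lambda w: (counts[w] + 1) / (total + vocab + 1)

def sentence_score(sentence, p):
    """Average log-likelihood of a candidate sentence under the impact model."""
    words = sentence.split()
    return sum(math.log(p(w)) for w in words) / len(words)

contexts = ["this seminal algorithm", "algorithm widely adopted"]
p = impact_lm(contexts)
print(sentence_score("the algorithm", p) > sentence_score("the weather", p))  # True
```

Words that recur in citation contexts get high probability, so sentences describing the paper's impact outscore off-topic ones, mirroring the query-model analogy to ad hoc retrieval.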
Abstract | With this new framework, we employ a target dependency language model during decoding to exploit long-distance word relations, which are unavailable with a traditional n-gram language model.
Dependency Language Model | wh-as-head represents wh used as the head, and it is different from wh in the dependency language model.
Dependency Language Model | In order to calculate the dependency language model score, or depLM score for short, on the fly for |
Discussion | (2003) described a two-step string-to-CFG-tree translation model which employed a syntax-based language model to select the best translation from a target parse forest built in the first step.
Implementation Details | Language model score.
Implementation Details | Dependency language model score.
Introduction | language model during decoding, in order to exploit long-distance word relations which are unavailable with a traditional n-gram language model on target strings.
Introduction | Section 3 illustrates the use of dependency language models.
String-to-Dependency Translation | Formal definitions also allow us to easily extend the framework to incorporate a dependency language model in decoding. |
String-to-Dependency Translation | Supposing we use a traditional trigram language model in decoding, we need to specify the leftmost two words and the rightmost two words in a state. |
String-to-Dependency Translation | In the next section, we will explain how to extend categories and states to exploit a dependency language model during decoding. |
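The boundary-word state for the trigram case described above can be sketched as follows; the hypothesis class, toy LM table, and floor probability are all assumptions:

```python
import math

def trigram_logp(w1, w2, w3, table):
    """Toy trigram log-probability with a uniform floor (table and floor are assumptions)."""
    return table.get((w1, w2, w3), math.log(1e-4))

class Hyp:
    """A hypothesis in the dynamic program: its score plus the boundary words a
    trigram LM can still interact with (leftmost two and rightmost two)."""
    def __init__(self, left, right, score):
        self.left, self.right, self.score = tuple(left), tuple(right), score

def new_hyp(words, score):
    # assumes the span already covers at least two words
    return Hyp(words[:2], words[-2:], score)

def combine(a, b, table):
    """Concatenate a+b: only the trigrams crossing the seam are new, so the
    state needs nothing beyond the stored boundary words."""
    seam = a.right + b.left
    score = a.score + b.score
    for i in range(len(seam) - 2):
        score += trigram_logp(*seam[i:i + 3], table)
    return Hyp(a.left, b.right, score)

# hypothetical LM entries for illustration
table = {("the", "black", "cat"): math.log(0.5), ("black", "cat", "sat"): math.log(0.25)}
merged = combine(new_hyp(["the", "black"], 0.0), new_hyp(["cat", "sat"], 0.0), table)
print(merged.left, merged.right)  # ('the', 'black') ('cat', 'sat')
```

Only the seam trigrams are scored at combination time, which is why the dynamic-programming state can forget everything between the two boundaries.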
Abstract | The selection is made according to the appropriateness of the alteration to the query context (using a bigram language model), or according to its expected impact on the retrieval effectiveness (using a regression model).
Bigram Expansion Model for Alteration Selection | The query context is modeled by a bigram language model as in (Peng et al. |
Bigram Expansion Model for Alteration Selection | In this work, we used a bigram language model to calculate the probability of each path.
Bigram Expansion Model for Alteration Selection | P(e1, e2, ..., ei, ..., en) = P(e1) ∏_{k=2}^{n} P(ek | ek-1)   (2). P(ek | ek-1) is estimated with a back-off bigram language model (Goodman, 2001).
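Eq. (2) with a simple back-off can be sketched as follows; the back-off constant alpha is a toy stand-in for properly normalized back-off weights:

```python
def backoff_bigram_p(w, prev, bigrams, unigrams, total, alpha=0.4):
    """P(w | prev): relative frequency if the bigram was seen, otherwise a
    back-off to the scaled unigram probability (alpha is a toy constant, a
    simplification of properly normalized back-off weights)."""
    if (prev, w) in bigrams:
        return bigrams[(prev, w)] / unigrams[prev]
    return alpha * unigrams.get(w, 0) / total

def path_prob(path, bigrams, unigrams, total):
    """Eq. (2): P(e1, ..., en) = P(e1) * prod_{k=2..n} P(ek | ek-1)."""
    p = unigrams[path[0]] / total
    for prev, w in zip(path, path[1:]):
        p *= backoff_bigram_p(w, prev, bigrams, unigrams, total)
    return p

# toy counts standing in for real query-log statistics
unigrams = {"web": 2, "search": 2, "engine": 1}
bigrams = {("web", "search"): 2, ("search", "engine"): 1}
print(path_prob(["web", "search", "engine"], bigrams, unigrams, 5))  # 0.2
```

Paths whose adjacent alterations were observed together score far higher than paths that must back off, which is the basis for preferring in-context alterations.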
Conclusion | In the first method proposed, the Bigram Expansion model, query context is modeled by a bigram language model.
Introduction | The query context is modeled by a bigram language model.
Related Work | 2007), a bigram language model is used to determine the alteration of the head word that best fits the query. |
Related Work | In this paper, one of the proposed methods will also use a bigram language model of the query to determine the appropriate alteration candidates. |
A Generic Phrase Training Procedure | Each normalized feature score derived from word alignment models or language models will be log-linearly combined to generate the final score. |
Discussions | We propose several information metrics derived from posterior distribution, language model and word alignments as feature functions. |
Experimental Results | Like other log-linear model based decoders, active features in our translation engine include translation models in two directions, lexicon weights in two directions, language model, lexicalized distortion models, sentence length penalty and other heuristics.
Experimental Results | The language model is a statistical trigram model estimated with modified Kneser-Ney smoothing (Chen and Goodman, 1996) using only English sentences in the parallel training data.
Features | All these features are data-driven and defined based on models, such as statistical word alignment model or language model . |
Features | We apply a language model (LM) to describe the predictive uncertainty (PU) between words in two directions. |
Features | Given a history w1, ..., wi-1, a language model specifies a conditional distribution of the future word being predicted to follow the history.
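One way to make this predictive uncertainty concrete is to measure the entropy of the conditional distribution in each direction; the toy corpus and MLE estimates below are illustrative only:

```python
import math
from collections import Counter

def conditional_entropy(history, pairs):
    """Entropy (bits) of P(next | history): the LM's predictive uncertainty
    after seeing `history` (toy MLE estimates from raw pair counts)."""
    dist = Counter(nxt for h, nxt in pairs if h == history)
    n = sum(dist.values())
    return sum((c / n) * math.log2(n / c) for c in dist.values())

tokens = "the cat sat on the mat".split()
forward = list(zip(tokens, tokens[1:]))   # predict the word that follows
backward = [(b, a) for a, b in forward]   # predict the word that precedes

print(conditional_entropy("the", forward))   # 1.0 bit: "the" is followed by cat or mat
print(conditional_entropy("the", backward))  # 0.0: here "the" is only ever preceded by "on"
```

High entropy marks positions where the LM is uncertain about the next (or previous) word, which is the kind of two-directional signal the feature describes.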
Background | OpenCCG implements a symbolic-statistical chart realization algorithm (Kay, 1996; Carroll et al., 1999; White, 2006b) combining (1) a theoretically grounded approach to syntax and semantic composition with (2) factored language models (Bilmes and Kirchhoff, 2003) for making choices among the options left open by the grammar.
Background | makes use of n-gram language models over words represented as vectors of factors, including surface form, part of speech, supertag and semantic class. |
Background | 2.3 Factored Language Models |
Introduction | Assigned categories are instantiated in OpenCCG’s chart realizer where, together with a treebank-derived syntactic grammar (Hockenmaier and Steedman, 2007) and a factored language model (Bilmes and Kirchhoff, 2003), they constrain the English word-strings that are chosen to express the LF. |
The Approach | Table 1: Percentage of complete realizations using an oracle n-gram model versus the best performing factored language model . |
The Approach | As shown in Table 1, with the large grammar derived from the training sections, many fewer complete realizations are found (before timing out) using the factored language model than are possible, as indicated by the results of using the oracle model.
Inflection prediction models | We stemmed the reference translations, predicted the inflection for each stem, and measured the accuracy of prediction, using a set of sentences that were not part of the training data (1K sentences were used for Arabic and 5K for Russian).2 Our model performs significantly better than both the random and trigram language model baselines, and achieves an accuracy of over 91%, which suggests that the model is effective when its input is clean in its stem choice and order. |
Integration of inflection models with MT systems | Given such a list of candidate stem sequences, the base MT model together with the inflection model and a language model choose a translation Y* as follows: |
Integration of inflection models with MT systems | PLM(Y) is the joint probability of the sequence of inflected words according to a trigram language model (LM).
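The combined choice of Y* can be sketched as a weighted log-linear argmax; the candidate strings, probabilities, and uniform weights below are invented for illustration:

```python
import math

def choose_translation(candidates, scores, weights=(1.0, 1.0, 1.0)):
    """Y* = argmax_Y  w1*log P_MT(Y) + w2*log P_infl(Y) + w3*log P_LM(Y).
    `scores` maps a candidate to a (p_mt, p_infl, p_lm) triple; all values
    here are toy numbers, not outputs of real models."""
    def total(y):
        return sum(w * math.log(p) for w, p in zip(weights, scores[y]))
    return max(candidates, key=total)

scores = {
    "he walks": (0.6, 0.7, 0.5),
    "he walk":  (0.6, 0.1, 0.2),   # same stem sequence, worse inflection and LM scores
}
print(choose_translation(list(scores), scores))  # he walks
```

With identical base MT scores, the inflection model and the LM jointly break the tie in favor of the correctly inflected surface form.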
Integration of inflection models with MT systems | In addition, stemming the target sentences reduces the sparsity in the translation tables and language model, and is likely to positively impact the performance of an MT system in terms of its ability to recover correct sequences of stems in the target.
Introduction | (Goldwater and McClosky, 2005), while the application of a target language model has almost solely been responsible for addressing the second aspect. |
Machine translation systems and data | (2003), a trigram target language model , two order models, word count, phrase count, and average phrase size functions. |
Machine translation systems and data | The features include log-probabilities according to inverted and direct channel models estimated by relative frequency, lexical weighting channel models, a trigram target language model , distortion, word count and phrase count. |
Machine translation systems and data | For each language pair, we used a set of parallel sentences (train) for training the MT system sub-models (e.g., phrase tables, language model), a set of parallel sentences (lambda) for training the combination weights with max-BLEU training, a set of parallel sentences (dev) for training a small number of combination parameters for our integration methods (see Section 5), and a set of parallel sentences (test) for final evaluation.
Experiments | In the experiments, the language model is a Chinese 5-gram language model trained with the Chinese part of the LDC parallel corpus and the Xinhua part of the Chinese Gigaword corpus with about 27 million words.
Experiments | In the tables, Lm denotes the n-gram language model feature, Tmh denotes the feature of collocation between target head words and the candidate measure word, Smh denotes the feature of collocation between source head words and the candidate measure word, HS denotes the feature of source head word selection, Punc denotes the feature of target punctuation position, Tlex denotes surrounding word features in translation, Slex denotes surrounding word features in source sentence, and Pos denotes the Part-Of-Speech feature.
Introduction | Moreover, Chinese measure words often have a long-distance dependency to their head words, which makes the language model ineffective in selecting the correct measure words from the measure word candidate set.
Introduction | In this case, an n-gram language model with n<15 cannot capture the MW-HW collocation. |
Model Training and Application 3.1 Training | We used the SRI Language Modeling Toolkit (Stolcke, 2002) to train a five-gram model with modified Kneser-Ney smoothing (Chen and Goodman, 1998). |
Our Method | For target features, n-gram language model score is defined as the sum of log n-gram probabilities within the target window after the measure |
Our Method | Target features: n-gram language model score, MW-HW collocation, surrounding words, punctuation position. Source features: MW-HW collocation, surrounding words, source head word, POS tags.
Discriminative Synchronous Transduction | similar to the methods for decoding with an SCFG intersected with an n-gram language model, which require language model contexts to be stored in each chart cell.
Discussion and Further Work | To do so would require integrating a language model feature into the max-translation decoding algorithm. |
Evaluation | The feature set includes: a trigram language model (lm) trained |
Evaluation | To compare our model directly with these systems we would need to incorporate additional features and a language model, work which we have left for a later date.
Evaluation | The relative scores confirm that our model, with its minimalist feature set, achieves comparable performance to the standard feature set without the language model . |
Introduction | Each property indexes a language model, thus allowing documents that incorporate the same
Model Description | Keyphrases are drawn from a set of clusters; words in the documents are drawn from language models indexed by a set of topics, where the topics correspond to the keyphrase clusters. |
Model Description | — language models of each topic |
Model Description | In the LDA framework, each word is generated from a language model that is indexed by the word’s topic assignment. |
Experiments | The resulting interview transcripts have a reported mean word error rate (WER) of approximately 25% on held out data, which was obtained by priming the language model with meta-data available from preinterview questionnaires. |
Experiments | We use a mixture of the training transcripts and various newswire sources for our language model training. |
Experiments | We did not attempt to prime the language model for particular interviewees or otherwise utilize any interview metadata. |
Introduction | Limitations in signal processing, acoustic modeling, pronunciation, vocabulary, and language modeling can be accommodated in several ways, each of which make different tradeoffs and thus induce different |
Previous Work | In the extreme case, the term may simply be out of vocabulary, although this may occur for various other reasons (e.g., poor language modeling or pronunciation dictionaries).
Previous work | Language modelling methods build word n-gram models, like those used in speech recognition.
Previous work | 3.2 Language modelling methods |
Previous work | So far, language modelling methods have been more effective. |
The new approach | This corresponds roughly to a unigram language model . |
Experiments | For testing the factored translation systems, we used Moses (Koehn et al., 2007), along with a 5-gram SRILM language model (Stolcke, 2002). |
Factored Model | The factored statistical machine translation model uses a log-linear approach to combine the several components, including the language model, the reordering model, the translation models and the generation models.
Introduction | The basic SMT approach uses the target language model as a feature in the argument maximisation function.
Introduction | This language model is trained on grammatically correct text, and would therefore give a good probability for word sequences that are likely to occur in a sentence, while it would penalise ungrammatical or badly ordered formations. |
Introduction | Thus, with respect to these methods, there is a problem when agreement needs to be applied on part of a sentence whose length exceeds the order of the target n-gram language model and the size of the chunks that are translated (see Figure 1 for an example).
Experiments | language model, and the remaining factor is the length penalty term
Experiments | We also use the SRI Language Modeling Toolkit (Stolcke, 2002) to train a trigram language model with Kneser-Ney smoothing on the English side of the bitext. |
Experiments | Besides the trigram language model trained on the English side of these bitexts, we also use another trigram model trained on the first 1/3 of the Xinhua portion of the Gigaword corpus.
Forest-based translation | The decoder performs two tasks on the translation forest: 1-best search with integrated language model (LM), and k-best search with LM to be used in minimum error rate training.
Experimental Results | To handle different directions of translation between Chinese and English, we built two trigram language models with modified Kneser-Ney smoothing (Chen and Goodman, 1998) using the SRILM toolkit (Stolcke, 2002). |
Experimental Results | Feature (Baseline / AAMT): language model 0.137 / 0.133; phrase translation 0.066 / 0.023; lexical translation 0.061 / 0.078; reverse phrase translation 0.059 / 0.103; reverse lexical translation 0.
Unsupervised Translation Induction for Chinese Abbreviations | Moreover, our approach utilizes both Chinese and English monolingual data to help MT, while most SMT systems utilize only the English monolingual data to build a language model.
Unsupervised Translation Induction for Chinese Abbreviations | However, since most statistical translation models (Koehn et al., 2003; Chiang, 2007; Galley et al., 2006) are symmetrical, it is relatively easy to train a translation system to translate from English to Chinese, except that we need to train a Chinese language model from the Chinese monolingual data.
Statistical Transliteration Model | The language model P(e) is trained from English texts. |
Statistical Transliteration Model | generative probability of an English syllable language model.
Statistical Transliteration Model | 2) The language model in backward transliteration describes the relationship of syllables in words. |
Discussion | So long as the vocabulary present in our phrase table and language model supports a literal translation, cohesion tends to produce an improvement. |
Discussion | In the baseline translation, the language model encourages the system to move the negation away from “exist” and toward “reduce.” The result is a tragic reversal of meaning in the translation. |
Introduction | order, forcing the decoder to rely heavily on its language model . |