Index of papers in Proc. ACL 2014 that mention
  • language model
Zhang, Hui and Chiang, David
Abstract
We rederive all the steps of KN smoothing to operate on count distributions instead of integral counts, and apply it to two tasks where KN smoothing was not applicable before: one in language model adaptation, and the other in word alignment.
Introduction
Such cases have been noted for language modeling (Goodman, 2001; Goodman, 2004), domain adaptation (Tam and Schultz, 2008), grapheme-to-phoneme conversion (Bisani and Ney, 2008), and phrase-based translation (Andres-Ferrer, 2010; Wuebker et al., 2012).
Introduction
One is language model domain adaptation, and the other is word alignment using the IBM models (Brown et al., 1993).
Language model adaptation
N -gram language models are widely used in applications like machine translation and speech recognition to select fluent output sentences.
Language model adaptation
Here, we propose to assign each sentence a probability to indicate how likely it is to belong to the domain of interest, and train a language model using expected KN smoothing.
Language model adaptation
They first train two language models, p_in on a set of in-domain data, and p_out on a set of general-domain data.
Related Work
This method subtracts D directly from the fractional counts, zeroing out counts that are smaller than D. The discount D must be set by minimizing an error metric on held-out data using a line search (Tam, p. c.) or Powell’s method (Bisani and Ney, 2008), requiring repeated estimation and evaluation of the language model .
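As a rough illustration of the fractional-count discounting described above, the following sketch (my own, not the paper's code) subtracts a discount D from expected counts, zeroes out counts smaller than D, and returns the mass reserved for a back-off distribution; D itself would still have to be tuned on held-out data.

    def absolute_discount(expected_counts, D):
        """Absolute discounting applied directly to fractional (expected) counts.

        expected_counts: dict mapping an event (e.g., an n-gram) to a fractional count.
        D: the discount; counts smaller than D are zeroed out.
        Returns the discounted counts and the total mass reserved for back-off.
        """
        discounted = {}
        reserved_mass = 0.0
        for event, count in expected_counts.items():
            if count > D:
                discounted[event] = count - D
                reserved_mass += D
            else:
                # counts smaller than D are zeroed out entirely
                reserved_mass += count
        return discounted, reserved_mass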
Smoothing on integral counts
Before presenting our method, we review KN smoothing on integer counts as applied to language models , although, as we will demonstrate in Section 7, KN smoothing is applicable to other tasks as well.
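For reference, interpolated Kneser-Ney smoothing on integer counts, in its standard bigram form (textbook notation, not copied from the paper), is:

    P_{KN}(w_i \mid w_{i-1})
      = \frac{\max\{c(w_{i-1} w_i) - D,\ 0\}}{c(w_{i-1})}
      + \frac{D \cdot N_{1+}(w_{i-1}\,\cdot)}{c(w_{i-1})} \cdot \frac{N_{1+}(\cdot\, w_i)}{N_{1+}(\cdot\,\cdot)}

where N_{1+}(w_{i-1} ·) is the number of distinct words observed after w_{i-1} and N_{1+}(· w_i) is the number of distinct contexts in which w_i occurs; replacing the integer counts c(·) with expected counts is exactly what becomes non-trivial in the fractional-count settings listed in the introduction.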
language model is mentioned in 17 sentences in this paper.
Topics mentioned in this paper:
Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin
Abstract
Most current data selection methods solely use language models trained on a small scale in-domain data to select domain-relevant sentence pairs from general-domain parallel corpus.
Abstract
By contrast, we argue that the relevance between a sentence pair and target domain can be better evaluated by the combination of language model and translation model.
Introduction
Current data selection methods mostly use language models trained on small scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
Introduction
To overcome the problem, we first propose the method combining translation model with language model in data selection.
Introduction
The language model measures the domain-specific generation probability of sentences, and is used to select domain-relevant sentences on both the source and target sides.
Related Work
The existing data selection methods are mostly based on language models.
Related Work
(2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models .
Related Work
(2011) improved the perplexity-based approach and proposed bilingual cross-entropy difference as a ranking function with in- and general-domain language models.
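For context, the bilingual cross-entropy difference mentioned above is standardly written as (notation mine):

    [H_{I,src}(s) - H_{G,src}(s)] + [H_{I,tgt}(t) - H_{G,tgt}(t)]

where H_M(·) is the per-word cross-entropy of a sentence under language model M, I denotes the in-domain models and G the general-domain models; sentence pairs with lower scores are ranked as more domain-relevant.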
language model is mentioned in 31 sentences in this paper.
Topics mentioned in this paper:
Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming
Abstract
RZNN is a combination of a recursive neural network and a recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, as in recurrent neural networks, so that the language model and translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so as to generate the translation candidates in a bottom-up manner.
Experiments and Results
The language model is a 5-gram language model trained with the target sentences in the training data.
Introduction
Recurrent neural networks are leveraged to learn language models, and they keep the history information circulating inside the network for an arbitrarily long time (Mikolov et al., 2010).
Introduction
DNNs have also been introduced into Statistical Machine Translation (SMT) to learn several components or features of the conventional framework, including word alignment, language modelling, translation modelling and distortion modelling.
Introduction
In recursive neural networks, all the representations of nodes are generated based on their child nodes, and it is difficult to integrate additional global information, such as language model and distortion model.
Our Model
Recurrent neural networks are usually used for sequence processing, such as language modelling (Mikolov et al., 2010).
Our Model
Commonly used sequence processing methods, such as the Hidden Markov Model (HMM) and the n-gram language model, use only a limited history for prediction.
Our Model
In an HMM, the previous state is used as the history, and for an n-gram language model (for example, n = 3), the history is the previous two words.
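Written out, the n-gram (Markov) approximation behind this statement is:

    P(w_t \mid w_1, \dots, w_{t-1}) \approx P(w_t \mid w_{t-n+1}, \dots, w_{t-1}),
    \quad \text{e.g. for } n = 3: \; P(w_t \mid w_{t-2}, w_{t-1})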
Related Work
(2013) extend the recurrent neural network language model in order to use both source- and target-side information to score translation candidates.
language model is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Auli, Michael and Gao, Jianfeng
Abstract
Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task specific metric, such as BLEU in machine translation.
Abstract
We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion.
Expected BLEU Training
We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003).
Expected BLEU Training
We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) = s(w_t | w_1 ... w_{t-1}, h_{t-1}):
Expected BLEU Training
which computes a sentence-level language model score as the sum of individual word scores.
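In the notation of the preceding excerpt, that sentence-level score is just the sum of the per-word scores:

    s_\theta(w_1 \dots w_T) = \sum_{t=1}^{T} s_\theta(w_t) = \sum_{t=1}^{T} s(w_t \mid w_1 \dots w_{t-1}, h_{t-1})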
Introduction
In this paper we focus on recurrent neural network architectures which have recently advanced the state of the art in language modeling (Mikolov et al., 2010; Mikolov et al., 2011; Sundermeyer et al., 2013) with several subsequent applications in machine translation (Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Hu et al., 2014).
Introduction
(2013) who demonstrated that feed-forward network-based language models are more accurate in first-pass decoding than in rescoring.
Introduction
Decoding with feed-forward architectures is straightforward, since predictions are based on a fixed size input, similar to n-gram language models .
Recurrent Neural Network LMs
Our model has a similar structure to the recurrent neural network language model of Mikolov et al.
language model is mentioned in 16 sentences in this paper.
Topics mentioned in this paper:
Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris
Evaluation
In §3.3, we then examined the effect of using a very large 5-gram language model trained on 7.5 billion English tokens to understand the nature of the improvements in §3.2.
Evaluation
The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation
The 13 baseline features (2 lexical, 2 phrasal, 5 HRM, and 1 language model , word penalty, phrase length feature and distortion penalty feature) were tuned using MERT (Och, 2003), which is also used to tune the 4 feature weights introduced by the secondary phrase table (2 lexical and 2 phrasal, other features being shared between the two tables).
Generation & Propagation
These candidates are scored using stem-level translation probabilities, morpheme-level lexical weighting probabilities, and a language model , and only the top 30 candidates are included.
Introduction
We evaluated the proposed approach on both Arabic-English and Urdu-English under a range of scenarios (§3), varying the amount and type of monolingual corpora used, and obtained improvements between 1 and 4 BLEU points, even when using very large language models .
language model is mentioned in 18 sentences in this paper.
Topics mentioned in this paper:
P, Deepak and Visweswariah, Karthik
Abstract
We use translation models and language models to exploit lexical correlations and solution post character respectively.
Introduction
The cornerstone of our technique is the usage of a hitherto unexplored textual feature, lexical correlations between problems and solutions, that is exploited along with language model based characterization of solution posts.
Introduction
We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively.
Our Approach
Consider a unigram language model that models the lexical characteristics of solution posts, and a translation model that models the lexical correlation between problems and solutions.
Our Approach
In short, each solution word is assumed to be generated from the language model or the translation model (conditioned on the problem words) with a probability of λ and 1 − λ respectively, thus accounting for the correlation assumption.
Our Approach
Of the solution words above, generic words such as try and should could probably be explained by (i.e., sampled from) the solution language model , whereas disconnect and rejoin could be correlated well with surf and wifi and hence are more likely to be supported better by the translation model.
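A minimal sketch of the per-word mixture described above (the helper functions lm_prob and tm_prob, the value of lam, and the averaging over problem words are illustrative assumptions, not the paper's estimators):

    def solution_word_likelihood(word, problem_words, lm_prob, tm_prob, lam=0.5):
        """Likelihood of one solution word under the LM/TM mixture.

        The word is generated by the solution language model with probability
        lam, or by the translation model conditioned on the problem words with
        probability 1 - lam.
        """
        p_translate = sum(tm_prob(word, p) for p in problem_words) / len(problem_words)
        return lam * lm_prob(word) + (1.0 - lam) * p_translate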
Related Work
We will use translation and language models in our method for solution identification.
language model is mentioned in 14 sentences in this paper.
Topics mentioned in this paper:
van Gompel, Maarten and van den Bosch, Antal
Abstract
We study the feasibility of exploiting cross-lingual context to obtain high-quality translation suggestions that improve over statistical language modelling and word-sense disambiguation baselines.
Baselines
A second baseline was constructed by weighing the probabilities from the translation table directly with the L2 language model described earlier.
Baselines
target language modelling) which is also cus-
Introduction
The main research question in this research is how to disambiguate an L1 word or phrase to its L2 translation based on an L2 context, and whether such cross-lingual contextual approaches provide added value compared to baseline models that are not context informed or compared to standard language models .
System
3.1 Language Model
System
We also implement a statistical language model as an optional component of our classifier-based system and also as a baseline to compare our system to.
System
The language model is a trigram-based back-off language model with Kneser-Ney smoothing, computed using SRILM (Stolcke, 2002) and trained on the same training data as the translation model.
language model is mentioned in 25 sentences in this paper.
Topics mentioned in this paper:
Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang
A Skeleton-based Approach to MT 2.1 Skeleton Identification
In this work both the skeleton translation model g_skel(d) and the full translation model g_full(d) resemble the usual forms used in phrase-based MT, i.e., the model score is computed as a linear combination of a group of phrase-based features and language models.
A Skeleton-based Approach to MT 2.1 Skeleton Identification
Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
A Skeleton-based Approach to MT 2.1 Skeleton Identification
lm(d) and w_lm are the score and weight of the language model, respectively.
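The formula the excerpt refers to is cut off in this index; a generic log-linear form consistent with the surrounding description (not necessarily the paper's exact equation) would be:

    g(d; m, lm, w) = \sum_i w_i \cdot f_i(d; m) + w_{lm} \cdot lm(d)

with f_i the phrase-based features of the derivation d and w_i their weights.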
Evaluation
A 5-gram language model was trained on the Xinhua portion of the English Gigaword corpus in addition to the target-side of the bilingual data.
Introduction
• We develop a skeletal language model to describe the possibility of the translation skeleton and handle some of the long-distance word dependencies.
language model is mentioned in 19 sentences in this paper.
Topics mentioned in this paper:
Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz
Abstract
Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM).
Methods
This trivially allows for an unsegmented language model and never makes desegmentation errors.
Methods
Doing so enables the inclusion of an unsegmented target language model , and with a small amount of bookkeeping, it also allows the inclusion of features related to the operations performed during desegmentation (see Section 3.4).
Methods
We now have a desegmented lattice, but it has not been annotated with an unsegmented (word-level) language model .
Related Work
Bojar (2007) incorporates such analyses into a factored model, to either include a language model over target morphological tags, or model the generation of morphological features.
Related Work
They introduce an additional desegmentation technique that augments the table-based approach with an unsegmented language model .
Related Work
Oflazer and Durgar El-Kahlout (2007) desegment 1000-best lists for English-to-Turkish translation to enable scoring with an unsegmented language model .
language model is mentioned in 13 sentences in this paper.
Topics mentioned in this paper:
Wintrode, Jonathan and Khudanpur, Sanjeev
Abstract
We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models .
Introduction
ASR systems traditionally use N-gram language models to incorporate prior knowledge of word occurrence patterns into prediction of the next word in the token stream.
Introduction
Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
Introduction
The strength of this phenomenon suggests it may be more viable for improving term-detection than, say, topic-sensitive language models .
Motivation
The re-scoring approach we present is closely related to adaptive or cache language models (Jelinek, 1997; Kuhn and De Mori, 1990; Kneser and Steinbiss, 1993).
Motivation
The primary difference between this and previous work on similar language models is the narrower focus here on the term detection task, in which we consider each search term in isolation, rather than all words in the vocabulary.
Results
We train ASR acoustic and language models from the training corpus using the Kaldi speech recognition toolkit (Povey et al., 2011) following the default BABEL training and search recipe which is described in detail by Chen et al.
Term and Document Frequency Statistics
A similar phenomenon is observed concerning adaptive language models (Church, 2000).
Term and Document Frequency Statistics
In general, we can think of using word repetitions to re-score term detection as applying a limited form of adaptive or cache language model (Jelinek, 1997).
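A sketch of the cache interpolation these citations refer to, in its usual textbook form (λ and the cache estimate are generic, not taken from this paper):

    P_{adapt}(w \mid h) = \lambda \, P_{ngram}(w \mid h) + (1 - \lambda) \, P_{cache}(w)

where P_cache(w) is the relative frequency of w in the recently observed history, so recently seen (repeated) words are boosted.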
Term and Document Frequency Statistics
In applying the burstiness quantity to term detection, we recall that the task requires us to locate a particular instance of a term, not estimate a count, hence the utility of N-gram language models predicting words in sequence.
language model is mentioned in 10 sentences in this paper.
Topics mentioned in this paper:
Fine, Alex B. and Frank, Austin F. and Jaeger, T. Florian and Van Durme, Benjamin
Abstract
We consider the prediction of three human behavioral measures — lexical decision, word naming, and picture naming —through the lens of domain bias in language modeling .
Abstract
This study aims to provoke increased consideration of the human language model by NLP practitioners: biases are not limited to differences between corpora (i.e.
Discussion
Our analyses reveal that 6 commonly used corpora fail to reflect the human language model in various ways related to dialect, modality, and other properties of each corpus.
Discussion
Our results point to a type of bias in commonly used language models that has been previously overlooked.
Discussion
Just as language models have been used to predict reading grade-level of documents (Collins-Thompson and Callan, 2004), human language models could be
Introduction
Computational linguists build statistical language models for aiding in natural language processing (NLP) tasks.
Introduction
In the current study, we exploit errors of the latter variety—failure of a language model to predict human performance—to investigate bias across several frequently used corpora in computational linguistics.
Introduction
Human Language Model
language model is mentioned in 9 sentences in this paper.
Topics mentioned in this paper:
Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing
Related Work
(2012) adopt the tweets with emoticons to smooth the language model and Hu et al.
Related Work
With the revival of interest in deep learning (Bengio et al., 2013), incorporating the continuous representation of a word as features has been proving effective in a variety of NLP tasks, such as parsing (Socher et al., 2013a), language modeling (Bengio et al., 2003; Mnih and Hinton, 2009) and NER (Turian et al., 2010).
Related Work
The training objective is that the original ngram is expected to obtain a higher language model score than the corrupted ngram by a margin of 1.
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Cohen, Shay B. and Collins, Michael
Abstract
Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
Experiments on Parsing
8 Experiments on the Saul and Pereira (1997) Model for Language Modeling
Experiments on Parsing
We now describe a second set of experiments, on the Saul and Pereira (1997) model for language modeling .
Experiments on Parsing
We performed the language modeling experiments for a number of reasons.
Introduction
We describe experiments on learning of L-PCFGs, and also on learning of the latent-variable language model of Saul and Pereira (1997).
language model is mentioned in 7 sentences in this paper.
Topics mentioned in this paper:
Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel
Abstract
In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores).
Discussion and Conclusions
While Post found that such a system can effectively distinguish grammatical news text sentences from sentences generated by a language model, measuring the grammaticality of real sentences from language learners seems to require a wider variety of features, including n-gram counts, language model scores, etc.
Experiments
To create further baselines for comparison, we selected the following features that represent ways one might approximate grammaticality if a comprehensive model was unavailable: whether the link parser can fully parse the sentence (complete_link), the Gigaword language model score (gigaword_avglogprob), and the number of misspelled tokens (nummisspelled).
System Description
3.2.2 n-gram Count and Language Model Features
System Description
The model computes the following features from a 5-gram language model trained on the same three sections of English Gigaword using the SRILM toolkit (Stolcke, 2002):
System Description
Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).
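A small sketch of computing feature values of this kind (the scorer logprob_fn, the vocabulary set, and the OOV penalty are hypothetical stand-ins for a trained LM, not the system's actual components):

    def lm_features(tokens, logprob_fn, vocab, oov_logprob=-20.0):
        """Average per-token log-probability and out-of-vocabulary count."""
        num_oov = sum(1 for t in tokens if t not in vocab)
        logps = [logprob_fn(t) if t in vocab else oov_logprob for t in tokens]
        avg_logprob = sum(logps) / max(len(logps), 1)
        return {"avg_logprob": avg_logprob, "num_oov": num_oov}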
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John
Abstract
Recent work has shown success in using neural network language models (NNLMs) as features in MT systems.
Introduction
Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010).
Introduction
Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
Model Variations
In particular, we can reverse the translation direction of the languages, as well as the direction of the language model .
Model Variations
• 5-gram Kneser-Ney LM
• Recurrent neural network language model (RNNLM) (Mikolov et al., 2010)
Neural Network Joint Model (NNJM)
Fortunately, neural network language models are able to elegantly scale up and take advantage of arbitrarily large context sizes.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Tan, Chenhao and Lee, Lillian and Pang, Bo
Introduction
This crawling process also yielded 632K TAC pairs whose only difference was spacing, and an additional 558M “unpaired” tweets; as shown later in this paper, we used these extra corpora for computing language models and other auxiliary information.
Introduction
Table 5: Conformity to the community and one’s own past, measured via scores assigned by various language models .
Introduction
We measure a tweet’s similarity to expectations by its score according to the relevant language model, (1/|T|) Σ_{x∈T} log p(x), where T refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
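A minimal sketch of the unigram version of this score (add-alpha smoothing and the toy data are illustrative assumptions, not the paper's estimator):

    import math
    from collections import Counter

    def avg_logprob(tokens, counts, total, alpha=1.0):
        """(1/|T|) * sum over tokens of log p(token) under a unigram model
        estimated from `counts` (token -> count) with add-alpha smoothing."""
        vocab_size = len(counts)
        logps = [math.log((counts.get(t, 0) + alpha) / (total + alpha * vocab_size))
                 for t in tokens]
        return sum(logps) / len(logps)

    # Community model from a toy collection of "training" tweets:
    training = [["good", "morning", "twitter"], ["good", "night", "all"]]
    counts = Counter(t for tweet in training for t in tweet)
    score = avg_logprob(["good", "morning", "all"], counts, sum(counts.values()))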
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Kobayashi, Hayato
Experiments
Although we did not examine the accuracy of real tasks in this paper, there is an interesting report that the word error rate of language models follows a power law with respect to perplexity (Klakow and Peters, 2002).
Introduction
Removing low-frequency words from a corpus (often called cutoff) is a common practice to save on the computational costs involved in learning language models and topic models.
Introduction
In the case of language models, we often have to remove low-frequency words because of a lack of computational resources, since the feature space of k-grams tends to be so large that we sometimes need cutoffs even in a distributed environment (Brants et al., 2007).
Perplexity on Reduced Corpora
Constant restoring is similar to the additive smoothing defined by p̃(w) ∝ p′(w) + λ, which is used to solve the zero-frequency problem of language models (Chen and Goodman, 1996).
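Normalizing that additively smoothed score gives the usual form (standard presentation, with V the vocabulary):

    \tilde{p}(w) = \frac{p'(w) + \lambda}{\sum_{w' \in V} (p'(w') + \lambda)}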
Perplexity on Reduced Corpora
This means that we can determine the rough sparseness of k-grams and adjust some of the parameters, such as the gram size k, in learning statistical language models.
Perplexity on Reduced Corpora
LDA is a probabilistic language model that generates a corpus as a mixture of hidden topics, and it allows us to infer two parameters: the document-topic distribution θ that represents the mixture rate of topics in each document, and the topic-word distribution φ that represents the occurrence rate of words in each topic.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Zhai, Ke and Williams, Jason D
Experiments
As illustrated by the highlighted states in Figure 6, the LM-HMM model conflates interactions that commonly occur at the beginning and end of a dialogue, i.e., “acknowledge agent” and “resolve problem”, since their underlying language models are likely to produce similar probability distributions over words.
Experiments
By incorporating topic information, our proposed models (e.g., TM-HMMSS in Figure 5) are able to enforce the state transitions towards more frequent flow patterns, which further helps to overcome the weakness of the language model.
Latent Structure in Dialogues
The simplest formulation we consider is an HMM where each state contains a unigram language model (LM), proposed by Chotimongkol (2008) for task-oriented dialogue and originally
Latent Structure in Dialogues
3: For each word in utterance n, first choose a word source r according to τ, and then depending on r, generate a word w either from the session-wide topic distribution θ or the language model specified by the state s_n.
Latent Structure in Dialogues
Note that a TM-HMMS model with state-specific topic models (instead of state-specific language models) would be subsumed by TM-HMM, since one topic could be used as the background topic in TM-HMMS.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Narayan, Shashi and Gardent, Claire
Related Work
It is combined with a language model to improve grammaticality and the decoder translates sentences into sim-
Simplification Framework
In addition, the language model we integrate in the SMT module helps ensure better fluency and grammaticality.
Simplification Framework
Finally, the translation and language models ensure that published, describing and boson are simplified to wrote, explaining and elementary particle respectively; and that the phrase “In 1964” is moved from the beginning of the sentence to its end.
Simplification Framework
Our simplification framework consists of a probabilistic model for splitting and dropping, which we call the DRS simplification model (DRS-SM); a phrase-based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.
language model is mentioned in 6 sentences in this paper.
Topics mentioned in this paper:
Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen
Introduction
The best-performing systems for these applications today rely on training on large amounts of data: in the case of ASR, the data is aligned audio and transcription, plus large unannotated data for the language modeling ; in the case of OCR, it is transcribed optical data; in the case of MT, it is aligned bitexts.
Introduction
For ASR and OCR, which can compose words from smaller units (phones or graphically recognized letters), an expanded target language vocabulary can be directly exploited without the need for changing the technology at all: the new words need to be inserted into the relevant resources (lexicon, language model ) etc, with appropriately estimated probabilities.
Introduction
The expanded word combinations can be used to extend the language models used for MT to bias against incoherent hypothesized new sequences of segmented words.
Morphology-based Vocabulary Expansion
In the Bigram Affix model, we do the same for the stem as in the Fixed Affix model, but for prefixes and suffixes, we create a bigram language model in the finite state machine.
Morphology-based Vocabulary Expansion
We reweight the weights in the WFST model (Fixed or Bigram) by composing it with a letter trigraph language model (WoTr).
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona
MT System Selection
These features rely on language models , MSA and Egyptian morphological analyzers and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary.
MT System Selection
two language models : MSA and Egyptian.
MT System Selection
The second set of features uses perplexity against language models built from the source-side of the training data of each of the four
Machine Translation Experiments
The language model for our systems is trained on English Gigaword (Graff and Cieri, 2003).
Machine Translation Experiments
We use SRILM Toolkit (Stolcke, 2002) to build a 5-gram language model with modified
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Tu, Mei and Zhou, Yu and Zong, Chengqing
A semantic span can include one or more eus.
Most translation systems adopt the features from a translation model, a language model , and sometimes a reordering model.
A semantic span can include one or more eus.
The process of training this transfer model and smoothing is similar to the process of training a language model .
A semantic span can include one or more eus.
formula (6) are estimated in the same way as a factored language model , which has the advantage of easily incorporating various linguistic information.
Experiments
A 5-gram language model is trained with SRILM on the combination of the Xinhua portion of the English Gigaword corpus and the English part of FBIS.
Experiments
probabilities, the BTG reordering features, and the language model feature.
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan
Conclusion
Further improvement is possible by incorporating topic models deeper in the decoding process and adding domain knowledge to the language model .
Discussion
6.3 Improving Language Models
Discussion
Topic models capture document-level properties of language, but a critical component of machine translation systems is the language model , which provides local constraints and preferences.
Discussion
Domain adaptation for language models (Bellegarda, 2004; Wood and Teh, 2009) is an important avenue for improving machine translation.
Experiments
We train a modified Kneser-Ney trigram language model on English (Chen and Goodman, 1996).
language model is mentioned in 5 sentences in this paper.
Topics mentioned in this paper:
Xiong, Deyi and Zhang, Min
Conclusion
Additionally, we also want to induce sense clusters for words in the target language so that we can build a sense-based language model and integrate it into SMT.
Decoding with Sense-Based Translation Model
error rate training (MERT) (Och, 2003) together with other models such as the language model .
Experiments
We trained a 5-gram language model on the Xinhua section of the English Gigaword corpus (306 million words) using the SRILM toolkit (Stolcke, 2002) with the modified Kneser-Ney smoothing (Chen and Goodman, 1996).
Related Work
(2007) also explore a bilingual topic model for translation and language model adaptation.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Levitan, Rivka and Elson, David
Features
We look at the language model (LM) score and the number of alternate pronunciations of the first query, predicting that a misrecognized query will have a lower LM score and more alternate pronunciations.
Prediction task
In addition, the language model likelihood for the first query was, as expected, significantly lower for retries.
Related Work
Retry cases are identified with joint language modeling across multiple transcripts, with the intuition that retry pairs tend to be closely related or exact duplicates.
Related Work
While we follow this work in our usage of joint language modeling , our application encompasses open domain voice searches and voice actions (such as placing calls), so we cannot use simplifying domain assumptions.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Jia, Zhongye and Zhao, Hai
Experiments
SRILM (Stolcke, 2002) is adopted for language model training and KenLM (Heafield, 2011; Heafield et al., 2013) for language model query.
Pinyin Input Method Model
The edge weight is the negative logarithm of the conditional probability P(S_{j+1,k} | S_{j,i}) that a syllable S_{j,i} is followed by S_{j+1,k}, which is given by a bigram language model of pinyin syllables:
Related Works
They solved the typo correction problem by decomposing the conditional probability P(H|P) of a Chinese character sequence H given a pinyin sequence P into a language model P(w_i|w_{i-1}) and a typing model. The typing model, which was estimated on real user input data, was used for typo correction.
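Spelled out in the usual noisy-channel form, that decomposition is (the typing-model factor P(p_i | w_i) is the standard choice here; the excerpt leaves its exact form implicit):

    P(H \mid P) \propto P(P \mid H) \, P(H) \approx \prod_i P(p_i \mid w_i) \, P(w_i \mid w_{i-1})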
Related Works
Various approaches were made for the task including language model (LM) based methods (Chen et al., 2013), ME model (Han and Chang, 2013), CRF (Wang et al., 2013d; Wang et al., 2013a), SMT (Chiu et al., 2013; Liu et al., 2013), and graph model (Jia et al., 2013), etc.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Hasan, Kazi Saidul and Ng, Vincent
Keyphrase Extraction Approaches
3.3.4 Language Modeling
Keyphrase Extraction Approaches
These feature values are estimated using language models (LMs) trained on a foreground corpus and a background corpus.
Keyphrase Extraction Approaches
In sum, LMA uses a language model rather than heuristics to identify phrases, and relies on the language model trained on the background corpus to determine how “unique” a candidate keyphrase is to the domain represented by the foreground corpus.
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun
Experiments
An in-house language modeling toolkit is used to train the 5-gram language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995).
Experiments
The English monolingual data used for language modeling is the same as in Table 1.
Related Work
They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
Topic Similarity Model with Neural Network
Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).
language model is mentioned in 4 sentences in this paper.
Topics mentioned in this paper:
Labutov, Igor and Lipson, Hod
Model
Broadly, as the learner progresses from one sentence to the next, exposing herself to more novel words, the updated parameters of the language model in turn guide the selection of new “switch-points” for replacing source words with the target foreign words.
Model
Generally, this value may come directly from the surprisal quantity given by a language model , or may incorporate additional features that are found informative in predicting the constraint on the word.
Related Work
Building on their work, Adel et al. (2012) employ additional features and a recurrent network language model for modeling code-switching in conversational speech.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil
Background
The RNN is primarily used as a language model , but may also be viewed as a sentence model with a linear structure.
Introduction
Besides comprising powerful classifiers as part of their architecture, neural sentence models can be used to condition a neural language model to generate sentences word by word (Schwenk, 2012; Mikolov and Zweig, 2012; Kalchbrenner and Blunsom, 2013a).
Properties of the Sentence Model
This gives the RNN excellent performance at language modelling , but it is suboptimal for remembering at once the n-grams further back in the input sentence.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Jansen, Peter and Surdeanu, Mihai and Clark, Peter
Models and Features
In particular, we use the recurrent neural network language model (RNNLM) of Mikolov et al.
Models and Features
Like any language model , a RNNLM estimates the probability of observing a word given the preceding context, but, in this process, it learns word embeddings into a latent, conceptual space with a fixed number of dimensions.
Related Work
(2013) recently addressed the problem of answer sentence selection and demonstrated that LS models, including recurrent neural network language models (RNNLM), have a higher contribution to overall performance than exploiting syntactic analysis.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Hermann, Karl Moritz and Blunsom, Phil
Introduction
Successful applications of such models include language modelling (Bengio et al., 2003), paraphrase detection (Erk and Pado, 2008), and dialogue analysis (Kalchbrenner and Blunsom, 2013).
Related Work
Neural language models are another popular approach for inducing distributed word representations (Bengio et al., 2003).
Related Work
They have received a lot of attention in recent years (Collobert and Weston, 2008; Mnih and Hinton, 2009; Mikolov et al., 2010, inter alia) and have achieved state of the art performance in language modelling .
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Gyawali, Bikash and Gardent, Claire
Conclusion
In the current version of the generator, the output is ranked using a simple language model trained on the GENIA corpus.
Generating from the KBGen Knowledge-Base
To rank the generator output, we train a language model on the GENIA corpus, a corpus of 2,000 MEDLINE abstracts about biology containing more than 400,000 words (Kim et al., 2003), and use this model to rank the generated sentences by decreasing probability.
Related Work
They intersect the grammar with a language model to improve fluency; use a weighted hypergraph to pack the derivations; and find the best derivation tree using the Viterbi algorithm.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Baroni, Marco and Dinu, Georgiana and Kruszewski, Germán
Abstract
Context-predicting models (more commonly known as embeddings or neural language models ) are the new kids on the distributional semantics block.
Introduction
This is in part due to the fact that context-predicting vectors were first developed as an approach to language modeling and/or as a way to initialize feature vectors in neural-network-based “deep learning” NLP architectures, so their effectiveness as semantic representations was initially seen as little more than an interesting side effect.
Introduction
Predictive DSMs are also called neural language models , because their supervised context prediction training is performed with neural networks, or, more cryptically, “embeddings”.
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper:
Bamman, David and Underwood, Ted and Smith, Noah A.
Data
To manage the degrees of freedom in the model described in §4, we perform dimensionality reduction on the vocabulary by learning word embeddings with a log-linear continuous skip-gram language model (Mikolov et al., 2013) on the entire collection of 15,099 books.
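As an illustration only, skip-gram embeddings of this kind can be trained with gensim's Word2Vec (assuming gensim >= 4; the corpus, dimensionality, and other settings below are placeholders, not the paper's configuration):

    from gensim.models import Word2Vec

    # Toy tokenized "books"; the paper trains on 15,099 full books.
    corpus = [["call", "me", "ishmael"], ["it", "was", "a", "dark", "and", "stormy", "night"]]

    # sg=1 selects the skip-gram objective (Mikolov et al., 2013).
    model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1, sg=1)

    vector = model.wv["ishmael"]  # embedding used in place of the raw vocabulary item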
Model
Maximum entropy approaches to language modeling have been used since Rosenfeld (1996) to incorporate long-distance information, such as previously-mentioned trigger words, into n-gram language models .
Model
Number of personas (hyperparameter)
D: Number of documents
C_d: Number of characters in document d
W_{d,c}: Number of (cluster, role) tuples for character c
m_d: Metadata for document d (ranges over M authors)
θ_d: Document d's distribution over personas
p_{d,c}: Character c's persona
j: An index for a <r, w> tuple in the data
w_j: Word cluster ID for tuple j
r_j: Role for tuple j ∈ {agent, patient, poss, pred}
η: Coefficients for the log-linear language model
μ, λ: Laplace mean and scale (for regularizing η)
α: Dirichlet concentration parameter
language model is mentioned in 3 sentences in this paper.
Topics mentioned in this paper: