Index of papers in Proc. ACL 2014 that mention

language model

Seen in text as:

language model (204)
language models (69)
language modeling (21)
Language Model (8)
language modelling (6)
Language Modeling (3)

Seen in 296 sentences in 35 papers.

1. Kneser-Ney Smoothing on Expected Counts

Zhang, Hui and Chiang, David

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We rederive all the steps of KN smoothing to operate on count distributions instead of integral counts, and apply it to two tasks where KN smoothing was not applicable before: one in language model adaptation, and the other in word alignment.
Introduction	Such cases have been noted for language modeling (Goodman, 2001; Goodman, 2004), domain adaptation (Tam and Schultz, 2008), grapheme-to-phoneme conversion (Bisani and Ney, 2008), and phrase-based translation (Andres-Ferrer, 2010; Wuebker et al., 2012).
Introduction	One is language model domain adaptation, and the other is word alignment using the IBM models (Brown et al., 1993).
Language model adaptation	N-gram language models are widely used in applications like machine translation and speech recognition to select fluent output sentences.
Language model adaptation	Here, we propose to assign each sentence a probability to indicate how likely it is to belong to the domain of interest, and train a language model using expected KN smoothing.
Language model adaptation	They first train two language models, $p_{\mathrm{in}}$ on a set of in-domain data, and $p_{\mathrm{out}}$ on a set of general-domain data.
Related Work	This method subtracts D directly from the fractional counts, zeroing out counts that are smaller than D. The discount D must be set by minimizing an error metric on held-out data using a line search (Tam, p. c.) or Powell’s method (Bisani and Ney, 2008), requiring repeated estimation and evaluation of the language model.
Smoothing on integral counts	Before presenting our method, we review KN smoothing on integer counts as applied to language models, although, as we will demonstrate in Section 7, KN smoothing is applicable to other tasks as well.
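The move from integral counts to count distributions can be illustrated with a short sketch. It assumes, as in the adaptation setting above, that an n-gram's count is a sum of independent Bernoulli events (one per occurrence, each kept with the probability that its sentence is in-domain); the function names are hypothetical, and this is not the authors' code. The distribution is computed by dynamic programming, and the discount is then taken in expectation, E[c - D(c)] = E[c] - Σ_r P(c = r)·D_r with D(0) = 0, which follows from linearity of expectation.

```python
from typing import Dict, List


def count_distribution(probs: List[float]) -> List[float]:
    """dist[r] = P(count == r) for a sum of independent Bernoulli events
    with the given probabilities (a Poisson-binomial distribution)."""
    dist = [1.0]  # with no events, the count is 0 with probability 1
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for r, q in enumerate(dist):
            new[r] += q * (1.0 - p)  # this occurrence is not counted
            new[r + 1] += q * p      # this occurrence is counted
        dist = new
    return dist


def expected_kn_stats(probs: List[float]) -> Dict[str, float]:
    """Expected count plus P(c = 1), P(c = 2) and P(c >= 3), the quantities
    a modified-KN style discount (D1, D2, D3+) depends on."""
    dist = count_distribution(probs)
    padded = dist + [0.0, 0.0, 0.0]  # guard against very short distributions
    return {
        "E[c]": sum(r * q for r, q in enumerate(dist)),
        "p1": padded[1],
        "p2": padded[2],
        "p3+": sum(dist[3:]),
    }


def expected_discounted_count(probs: List[float], d1: float, d2: float, d3: float) -> float:
    """E[c - D(c)] where D(1) = d1, D(2) = d2, D(c >= 3) = d3 and D(0) = 0."""
    s = expected_kn_stats(probs)
    return s["E[c]"] - (s["p1"] * d1 + s["p2"] * d2 + s["p3+"] * d3)


stats = expected_kn_stats([0.9, 0.5])  # E[c] is about 1.4, p1 about 0.5, p2 about 0.45
```

With every probability equal to 1 the distribution collapses to a point mass, so `expected_discounted_count([1.0, 1.0, 1.0], d1, d2, d3)` reduces to ordinary discounting of the integer count 3.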

language model is mentioned in 17 sentences in this paper.

Topics mentioned in this paper:

2. Effective Selection of Translation Model Training Data

Liu, Le and Hong, Yu and Liu, Hao and Wang, Xing and Yao, Jianmin

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Most current data selection methods use only language models trained on small-scale in-domain data to select domain-relevant sentence pairs from a general-domain parallel corpus.
Abstract	By contrast, we argue that the relevance between a sentence pair and the target domain can be better evaluated by combining a language model and a translation model.
Introduction	Current data selection methods mostly use language models trained on small-scale in-domain data to measure domain relevance and select domain-relevant parallel sentence pairs to expand training corpora.
Introduction	To overcome this problem, we first propose a method that combines a translation model with a language model for data selection.
Introduction	The language model measures the domain-specific generation probability of sentences and is used to select domain-relevant sentences on both the source and target sides.
Related Work	The existing data selection methods are mostly based on language models.
Related Work	(2010) ranked the sentence pairs in the general-domain corpus according to the perplexity scores of sentences, which are computed with respect to in-domain language models.
Related Work	(2011) improved the perplexity-based approach and proposed bilingual cross-entropy difference as a ranking function with in-domain and general-domain language models.
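The perplexity- and cross-entropy-based ranking described above can be sketched as follows. To stay self-contained, the sketch substitutes add-one-smoothed unigram models for the n-gram language models these methods actually use; `UnigramLM`, `ce_difference` and `bilingual_ce_difference` are hypothetical names, and the toy corpora are invented for illustration. Lower scores mark sentences that look more like the in-domain data than the general-domain data.

```python
import math
from collections import Counter
from typing import Iterable, List


class UnigramLM:
    """Add-one smoothed unigram model (a stand-in for a real n-gram LM)."""

    def __init__(self, sentences: Iterable[List[str]], vocab_size: int):
        self.counts = Counter(w for s in sentences for w in s)
        self.total = sum(self.counts.values())
        self.vocab_size = vocab_size

    def prob(self, word: str) -> float:
        return (self.counts[word] + 1) / (self.total + self.vocab_size)

    def cross_entropy(self, sentence: List[str]) -> float:
        """Per-word cross-entropy in bits; perplexity = 2 ** cross_entropy."""
        return -sum(math.log2(self.prob(w)) for w in sentence) / len(sentence)


def ce_difference(sentence: List[str], lm_in: UnigramLM, lm_out: UnigramLM) -> float:
    """H_in(s) - H_out(s): negative when s looks more in-domain than general-domain."""
    return lm_in.cross_entropy(sentence) - lm_out.cross_entropy(sentence)


def bilingual_ce_difference(src, tgt, src_in, src_out, tgt_in, tgt_out) -> float:
    """Sum of the source-side and target-side cross-entropy differences."""
    return ce_difference(src, src_in, src_out) + ce_difference(tgt, tgt_in, tgt_out)


in_domain = [["the", "patient", "was", "treated"], ["the", "dose", "was", "increased"]]
general = [["the", "match", "was", "won"], ["the", "market", "was", "down"]]
vocab = {w for s in in_domain + general for w in s}
lm_in = UnigramLM(in_domain, len(vocab) + 1)   # +1 leaves one slot for unseen words
lm_out = UnigramLM(general, len(vocab) + 1)
candidates = [["the", "market", "match"], ["the", "patient", "dose"]]
ranked = sorted(candidates, key=lambda s: ce_difference(s, lm_in, lm_out))
# ranked[0] is ["the", "patient", "dose"]
```

Selection would then keep the top of `ranked` up to some budget; the cutoff is a tuning choice not shown here.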

language model is mentioned in 31 sentences in this paper.

Topics mentioned in this paper:

3. A Recursive Recurrent Neural Network for Statistical Machine Translation

Liu, Shujie and Yang, Nan and Li, Mu and Zhou, Ming

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	R2NN is a combination of a recursive neural network and a recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, like recurrent neural networks, so that the language model and translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so as to generate the translation candidates in a bottom-up manner.
Experiments and Results	The language model is a 5-gram language model trained with the target sentences in the training data.
Introduction	Recurrent neural networks are leveraged to learn language models, and they keep the history information circularly inside the network for an arbitrarily long time (Mikolov et al., 2010).
Introduction	DNNs have also been introduced to Statistical Machine Translation (SMT) to learn several components or features of the conventional framework, including word alignment, language modelling, translation modelling and distortion modelling.
Introduction	In recursive neural networks, all the representations of nodes are generated based on their child nodes, and it is difficult to integrate additional global information, such as the language model and distortion model.
Our Model	Recurrent neural networks are usually used for sequence processing, such as language modeling (Mikolov et al., 2010).
Our Model	Commonly used sequence processing methods, such as the Hidden Markov Model (HMM) and the n-gram language model, only use a limited history for prediction.
Our Model	In an HMM, the previous state is used as the history, and for an n-gram language model (for example, n = 3), the history is the previous two words.
Related Work	(2013) extend the recurrent neural network language model in order to use both source- and target-side information to score translation candidates.
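The limited-history point (for n = 3, the history is the previous two words) can be made concrete with a maximum-likelihood trigram sketch. The `<s>`/`</s>` padding and the helper names are assumptions for illustration; no smoothing is applied, so unseen events get probability 0.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

History = Tuple[str, ...]


def ngram_events(sentence: List[str], n: int = 3) -> List[Tuple[History, str]]:
    """Pair each predicted word with its (n-1)-word history, padding with <s>."""
    padded = ["<s>"] * (n - 1) + sentence + ["</s>"]
    return [(tuple(padded[i - n + 1:i]), padded[i]) for i in range(n - 1, len(padded))]


def train_mle(corpus: List[List[str]], n: int = 3) -> Callable[[Sequence[str], str], float]:
    """Return p(word | history) estimated by relative frequency."""
    history_counts: Counter = Counter()
    event_counts: Counter = Counter()
    for sentence in corpus:
        for history, word in ngram_events(sentence, n):
            history_counts[history] += 1
            event_counts[(history, word)] += 1

    def prob(history: Sequence[str], word: str) -> float:
        h = tuple(history)
        if history_counts[h] == 0:  # unseen history: no estimate without smoothing
            return 0.0
        return event_counts[(h, word)] / history_counts[h]

    return prob


p = train_mle([["a", "b", "c"], ["a", "b", "d"]])
print(p(("a", "b"), "c"))  # 0.5: the history ("a", "b") is followed by "c" once out of twice
```

Whatever happened before the last two words is invisible to this model, which is the restriction a recurrent network is meant to lift.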

language model is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

4. Decoder Integration and Expected BLEU Training for Recurrent Neural Network Language Models

Auli, Michael and Gao, Jianfeng

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task-specific metric, such as BLEU in machine translation.
Abstract	We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion.
Expected BLEU Training	We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003).
Expected BLEU Training	We summarize the weights of the recurrent neural network language model as $\theta = \{U, W, V\}$ and add the model as an additional feature to the log-linear translation model using the simplified notation $s_\theta(w_t) = s(w_t \mid w_1 \ldots w_{t-1}, h_{t-1})$:
Expected BLEU Training	which computes a sentence-level language model score as the sum of individual word scores.
Introduction	In this paper we focus on recurrent neural network architectures which have recently advanced the state of the art in language modeling (Mikolov et al., 2010; Mikolov et al., 2011; Sundermeyer et al., 2013) with several subsequent applications in machine translation (Auli et al., 2013; Kalchbrenner and Blunsom, 2013; Hu et al., 2014).
Introduction	(2013) who demonstrated that feed-forward network-based language models are more accurate in first-pass decoding than in rescoring.
Introduction	Decoding with feed-forward architectures is straightforward, since predictions are based on a fixed-size input, similar to n-gram language models.
Recurrent Neural Network LMs	Our model has a similar structure to the recurrent neural network language model of Mikolov et al.
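A sentence-level score computed as the sum of per-word scores can be sketched for an Elman-style recurrent network. The sketch assumes U holds input weights, W recurrent weights and V output weights, with a sigmoid hidden layer and a softmax output; these roles, the start token id, and the function names are assumptions, not the authors' implementation, and expected BLEU training is not shown. It uses NumPy.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max()  # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


def rnn_sentence_score(word_ids, U, W, V, start_id=0) -> float:
    """Sum over t of log p(w_t | w_1..w_{t-1}, h_{t-1}).

    Assumed shapes: U is (hidden, vocab), W is (hidden, hidden), V is (vocab, hidden).
    """
    hidden = np.zeros(W.shape[0])
    prev = start_id  # id of a sentence-start token (an assumption)
    score = 0.0
    for w in word_ids:
        x = np.zeros(U.shape[1])
        x[prev] = 1.0                                          # one-hot previous word
        hidden = 1.0 / (1.0 + np.exp(-(U @ x + W @ hidden)))   # sigmoid recurrence
        score += float(log_softmax(V @ hidden)[w])             # log-prob of the actual word
        prev = w
    return score


# All-zero weights give a uniform softmax, so each of the 3 words scores log(1/4).
zero_score = rnn_sentence_score([1, 2, 3], np.zeros((3, 4)), np.zeros((3, 3)), np.zeros((4, 3)))
```

In a log-linear decoder this value would be one feature among many, weighted like the others.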

language model is mentioned in 16 sentences in this paper.

Topics mentioned in this paper:

5. Graph-based Semi-Supervised Learning of Translation Models from Monolingual Data

Saluja, Avneesh and Hassan, Hany and Toutanova, Kristina and Quirk, Chris

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Evaluation	In §3.3, we then examined the effect of using a very large 5-gram language model trained on 7.5 billion English tokens to understand the nature of the improvements in §3.2.
Evaluation	The Urdu to English evaluation in §3.4 focuses on how noisy parallel data and completely monolingual (i.e., not even comparable) text can be used for a realistic low-resource language pair, and is evaluated with the larger language model only.
Evaluation	The 13 baseline features (2 lexical, 2 phrasal, 5 HRM, and 1 language model, word penalty, phrase length feature and distortion penalty feature) were tuned using MERT (Och, 2003), which is also used to tune the 4 feature weights introduced by the secondary phrase table (2 lexical and 2 phrasal, other features being shared between the two tables).
Generation & Propagation	These candidates are scored using stem-level translation probabilities, morpheme-level lexical weighting probabilities, and a language model, and only the top 30 candidates are included.
Introduction	We evaluated the proposed approach on both Arabic-English and Urdu-English under a range of scenarios (§3), varying the amount and type of monolingual corpora used, and obtained improvements between 1 and 4 BLEU points, even when using very large language models.

language model is mentioned in 18 sentences in this paper.

Topics mentioned in this paper:

language model (18)
BLEU (15)
bigram (13)

6. Unsupervised Solution Post Identification from Discussion Forums

P, Deepak and Visweswariah, Karthik

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We use translation models and language models to exploit lexical correlations and solution post character respectively.
Introduction	The cornerstone of our technique is the usage of a hitherto unexplored textual feature, lexical correlations between problems and solutions, that is exploited along with language model based characterization of solution posts.
Introduction	We model the lexical correlation and solution post character using regularized translation models and unigram language models respectively.
Our Approach	Consider a unigram language model $\mathcal{S}$ that models the lexical characteristics of solution posts, and a translation model $\mathcal{T}$ that models the lexical correlation between problems and solutions.
Our Approach	In short, each solution word is assumed to be generated from the language model or the translation model (conditioned on the problem words) with probability $\lambda$ and $1 - \lambda$, respectively, thus accounting for the correlation assumption.
Our Approach	Of the solution words above, generic words such as try and should could probably be explained by (i.e., sampled from) the solution language model, whereas disconnect and rejoin could be correlated well with surf and wifi and hence are more likely to be supported better by the translation model.
Related Work	We will use translation and language models in our method for solution identification.
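The two-source generation of solution words can be sketched as a mixture with weight λ. The way the translation component conditions on the problem (a uniform average of t(word | q) over problem words, in the style of IBM Model 1), the dictionary layout, the function names, and all probabilities below are assumptions chosen for illustration.

```python
import math
from typing import Dict, List

Translation = Dict[str, Dict[str, float]]  # t[problem_word][solution_word]


def solution_word_prob(word: str, problem: List[str], unigram: Dict[str, float],
                       translation: Translation, lam: float) -> float:
    """lam * P_LM(word) + (1 - lam) * P_TM(word | problem)."""
    p_lm = unigram.get(word, 0.0)
    p_tm = 0.0
    if problem:
        # Assumption: average t(word | q) uniformly over the problem words.
        p_tm = sum(translation.get(q, {}).get(word, 0.0) for q in problem) / len(problem)
    return lam * p_lm + (1.0 - lam) * p_tm


def solution_log_likelihood(solution: List[str], problem: List[str], unigram: Dict[str, float],
                            translation: Translation, lam: float) -> float:
    return sum(math.log(solution_word_prob(w, problem, unigram, translation, lam))
               for w in solution)


# Toy numbers only.
unigram = {"try": 0.3, "should": 0.3, "disconnect": 0.05, "rejoin": 0.05}
translation = {"wifi": {"disconnect": 0.4, "rejoin": 0.4, "try": 0.2},
               "surf": {"disconnect": 0.5, "rejoin": 0.5}}
p_disconnect = solution_word_prob("disconnect", ["surf", "wifi"], unigram, translation, 0.5)  # about 0.25
```

Here "disconnect" draws most of its probability from the translation component, while "should" relies entirely on the language model, mirroring the example in the text.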

language model is mentioned in 14 sentences in this paper.

Topics mentioned in this paper:

7. Translation Assistance by Translation of L1 Fragments in an L2 Context

van Gompel, Maarten and van den Bosch, Antal

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We study the feasibility of exploiting cross-lingual context to obtain high-quality translation suggestions that improve over statistical language modelling and word-sense disambiguation baselines.
Baselines	A second baseline was constructed by weighting the probabilities from the translation table directly with the L2 language model described earlier.
Baselines	target language modelling) which is also cus-
Introduction	The main research question in this research is how to disambiguate an L1 word or phrase to its L2 translation based on an L2 context, and whether such cross-lingual contextual approaches provide added value compared to baseline models that are not context informed or compared to standard language models.
System	3.1 Language Model
System	We also implement a statistical language model as an optional component of our classifier-based system and also as a baseline to compare our system to.
System	The language model is a trigram-based back-off language model with Kneser-Ney smoothing, computed using SRILM (Stolcke, 2002) and trained on the same training data as the translation model.

language model is mentioned in 25 sentences in this paper.

Topics mentioned in this paper:

8. A Hybrid Approach to Skeleton-based Translation

Xiao, Tong and Zhu, Jingbo and Zhang, Chunliang

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A Skeleton-based Approach to MT 2.1 Skeleton Identification	In this work, both the skeleton translation model $g_{skel}(d)$ and the full translation model $g_{full}(d)$ resemble the usual forms used in phrase-based MT, i.e., the model score is computed by a linear combination of a group of phrase-based features and language models.
A Skeleton-based Approach to MT 2.1 Skeleton Identification	Given a translation model m, a language model lm and a vector of feature weights w, the model score of a derivation d is computed by
A Skeleton-based Approach to MT 2.1 Skeleton Identification	$lm(d)$ and $w_{lm}$ are the score and weight of the language model, respectively.
Evaluation	A 5-gram language model was trained on the Xinhua portion of the English Gigaword corpus in addition to the target side of the bilingual data.
Introduction	We develop a skeletal language model to describe the possibility of a translation skeleton and handle some of the long-distance word dependencies.
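A model score computed as a linear combination of feature scores can be illustrated with a weighted sum in which the language model contributes $w_{lm} \cdot lm(d)$. The feature names, values and weights below are toy assumptions; in practice the features are log-domain scores and the weights are tuned on held-out data.

```python
from typing import Dict


def derivation_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * feature value, e.g. w_lm * lm(d) for the language model term."""
    return sum(weights[name] * value for name, value in features.items())


features = {"phrase_translation": -2.0, "lm": -5.0, "word_penalty": 3.0}
weights = {"phrase_translation": 1.0, "lm": 0.5, "word_penalty": -0.1}
score = derivation_score(features, weights)  # approximately -4.8
```

Because the score is linear, comparing two derivations only requires their feature vectors and one shared weight vector.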

language model is mentioned in 19 sentences in this paper.

Topics mentioned in this paper:

9. Lattice Desegmentation for Statistical Machine Translation

Salameh, Mohammad and Cherry, Colin and Kondrak, Grzegorz

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Our novel lattice desegmentation algorithm effectively combines both segmented and desegmented views of the target language for a large subspace of possible translation outputs, which allows for inclusion of features related to the desegmentation process, as well as an unsegmented language model (LM).
Methods	This trivially allows for an unsegmented language model and never makes desegmentation errors.
Methods	Doing so enables the inclusion of an unsegmented target language model, and with a small amount of bookkeeping, it also allows the inclusion of features related to the operations performed during desegmentation (see Section 3.4).
Methods	We now have a desegmented lattice, but it has not been annotated with an unsegmented (word-level) language model.
Related Work	Bojar (2007) incorporates such analyses into a factored model, to either include a language model over target morphological tags, or model the generation of morphological features.
Related Work	They introduce an additional desegmentation technique that augments the table-based approach with an unsegmented language model.
Related Work	Oflazer and Durgar El-Kahlout (2007) desegment 1000-best lists for English-to-Turkish translation to enable scoring with an unsegmented language model.

language model is mentioned in 13 sentences in this paper.

Topics mentioned in this paper:

LM (16)
language model (13)
BLEU (13)

10. Can You Repeat That? Using Word Repetition to Improve Spoken Term Detection

Wintrode, Jonathan and Khudanpur, Sanjeev

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We aim to improve spoken term detection performance by incorporating contextual information beyond traditional N-gram language models.
Introduction	ASR systems traditionally use N-gram language models to incorporate prior knowledge of word occurrence patterns into prediction of the next word in the token stream.
Introduction	Yet, though many language models more sophisticated than N-grams have been proposed, N-grams are empirically hard to beat in terms of WER.
Introduction	The strength of this phenomenon suggests it may be more viable for improving term detection than, say, topic-sensitive language models.
Motivation	The re-scoring approach we present is closely related to adaptive or cache language models (Jelinek, 1997; Kuhn and De Mori, 1990; Kneser and Steinbiss, 1993).
Motivation	The primary difference between this and previous work on similar language models is the narrower focus here on the term detection task, in which we consider each search term in isolation, rather than all words in the vocabulary.
Results	We train ASR acoustic and language models from the training corpus using the Kaldi speech recognition toolkit (Povey et al., 2011) following the default BABEL training and search recipe which is described in detail by Chen et al.
Term and Document Frequency Statistics	A similar phenomenon is observed concerning adaptive language models (Church, 2000).
Term and Document Frequency Statistics	In general, we can think of using word repetitions to re-score term detection as applying a limited form of adaptive or cache language model (Jelinek, 1997).
Term and Document Frequency Statistics	In applying the burstiness quantity to term detection, we recall that the task requires us to locate a particular instance of a term, not estimate a count, hence the utility of N-gram language models predicting words in sequence.
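Adaptive or cache language models interpolate a static model with a distribution estimated from recently seen words. The following is a minimal sketch of such a generic cache model, not the authors' term-detection re-scoring method; the class name, the interpolation weight `lam`, and the window size are assumptions.

```python
from collections import Counter, deque
from typing import Callable


class CacheLM:
    """P(w) = (1 - lam) * P_base(w) + lam * P_cache(w) over a sliding window."""

    def __init__(self, base_prob: Callable[[str], float], lam: float = 0.1, window: int = 200):
        self.base_prob = base_prob  # static model, e.g. an N-gram LM's unigram marginal
        self.lam = lam
        self.history: deque = deque(maxlen=window)
        self.cache: Counter = Counter()

    def prob(self, word: str) -> float:
        if not self.history:  # empty cache: fall back to the static model
            return self.base_prob(word)
        cache_p = self.cache[word] / len(self.history)
        return (1.0 - self.lam) * self.base_prob(word) + self.lam * cache_p

    def observe(self, word: str) -> None:
        if len(self.history) == self.history.maxlen:
            self.cache[self.history[0]] -= 1  # the oldest word is about to drop out
        self.history.append(word)
        self.cache[word] += 1


lm = CacheLM(lambda w: 0.25, lam=0.5, window=2)
lm.observe("a")
print(lm.prob("a"))  # 0.625: half the mass comes from the cache, where "a" is the only word
```

A repeated word gains probability after each occurrence, which is the burstiness effect that word-repetition re-scoring exploits in a more targeted way.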

language model is mentioned in 10 sentences in this paper.

Topics mentioned in this paper:

11. Biases in Predicting the Human Language Model

Fine, Alex B. and Frank, Austin F. and Jaeger, T. Florian and Van Durme, Benjamin

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	We consider the prediction of three human behavioral measures (lexical decision, word naming, and picture naming) through the lens of domain bias in language modeling.
Abstract	This study aims to provoke increased consideration of the human language model by NLP practitioners: biases are not limited to differences between corpora (i.e.
Discussion	Our analyses reveal that 6 commonly used corpora fail to reflect the human language model in various ways related to dialect, modality, and other properties of each corpus.
Discussion	Our results point to a type of bias in commonly used language models that has been previously overlooked.
Discussion	Just as language models have been used to predict reading grade-level of documents (Collins-Thompson and Callan, 2004), human language models could be
Introduction	Computational linguists build statistical language models for aiding in natural language processing (NLP) tasks.
Introduction	In the current study, we exploit errors of the latter variety—failure of a language model to predict human performance—to investigate bias across several frequently used corpora in computational linguistics.
Introduction	: Human Language Model

language model is mentioned in 9 sentences in this paper.

Topics mentioned in this paper:

language model (9)
n-grams (4)

12. Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification

Tang, Duyu and Wei, Furu and Yang, Nan and Zhou, Ming and Liu, Ting and Qin, Bing

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Related Work	(2012) adopt the tweets with emoticons to smooth the language model and Hu et al.
Related Work	With the revival of interest in deep learning (Bengio et al., 2013), incorporating the continuous representation of a word as features has proven effective in a variety of NLP tasks, such as parsing (Socher et al., 2013a), language modeling (Bengio et al., 2003; Mnih and Hinton, 2009) and NER (Turian et al., 2010).
Related Work	The training objective is that the original n-gram is expected to obtain a higher language model score than the corrupted n-gram by a margin of 1.
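The margin objective above (the original n-gram should outscore a corrupted one by at least 1) corresponds to a hinge loss. A minimal sketch follows; corrupting the n-gram by replacing its middle word with a different random vocabulary word is an assumption about the corruption scheme, the scoring function itself is left abstract, and the function names are hypothetical.

```python
import random
from typing import List


def hinge_ranking_loss(score_original: float, score_corrupted: float, margin: float = 1.0) -> float:
    """max(0, margin - f(original) + f(corrupted)); zero once the gap reaches the margin."""
    return max(0.0, margin - score_original + score_corrupted)


def corrupt(ngram: List[str], vocab: List[str], rng: random.Random) -> List[str]:
    """Replace the middle word with a different, randomly chosen vocabulary word."""
    mid = len(ngram) // 2
    candidates = [w for w in vocab if w != ngram[mid]]
    corrupted = list(ngram)
    corrupted[mid] = rng.choice(candidates)
    return corrupted


print(hinge_ranking_loss(3.0, 1.5))  # 0.0: the margin is already satisfied
print(hinge_ranking_loss(2.0, 1.5))  # 0.5: the gap is only 0.5, so the loss is positive
```

Only pairs that violate the margin produce a gradient, so training concentrates on corrupted n-grams the model still scores too highly.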

language model is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

13. A Provably Correct Learning Algorithm for Latent-Variable PCFGs

Cohen, Shay B. and Collins, Michael

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Experiments on parsing and a language modeling problem show that the algorithm is efficient and effective in practice.
Experiments on Parsing	8 Experiments on the Saul and Pereira (1997) Model for Language Modeling
Experiments on Parsing	We now describe a second set of experiments, on the Saul and Pereira (1997) model for language modeling.
Experiments on Parsing	We performed the language modeling experiments for a number of reasons.
Introduction	We describe experiments on learning of L-PCFGs, and also on learning of the latent-variable language model of Saul and Pereira (1997).

language model is mentioned in 7 sentences in this paper.

Topics mentioned in this paper:

14. Predicting Grammaticality on an Ordinal Scale

Heilman, Michael and Cahill, Aoife and Madnani, Nitin and Lopez, Melissa and Mulholland, Matthew and Tetreault, Joel

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	In this work, we construct a statistical model of grammaticality using various linguistic features (e.g., misspelling counts, parser outputs, n-gram language model scores).
Discussion and Conclusions	While Post found that such a system can effectively distinguish grammatical news text sentences from sentences generated by a language model, measuring the grammaticality of real sentences from language learners seems to require a wider variety of features, including n-gram counts, language model scores, etc.
Experiments	To create further baselines for comparison, we selected the following features that represent ways one might approximate grammaticality if a comprehensive model was unavailable: whether the link parser can fully parse the sentence (complete_link), the Gigaword language model score (gigaword_avglogprob), and the number of misspelled tokens (nummisspelled).
System Description	3.2.2 n-gram Count and Language Model Features
System Description	The model computes the following features from a 5-gram language model trained on the same three sections of English Gigaword using the SRILM toolkit (Stolcke, 2002):
System Description	Finally, the system computes the average log-probability and number of out-of-vocabulary words from a language model trained on a collection of essays written by nonnative English speakers (“nonnative LM”).

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

15. Fast and Robust Neural Network Joint Models for Statistical Machine Translation

Devlin, Jacob and Zbib, Rabih and Huang, Zhongqiang and Lamar, Thomas and Schwartz, Richard and Makhoul, John

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Recent work has shown success in using neural network language models (NNLMs) as features in MT systems.
Introduction	Initially, these models were primarily used to create n-gram neural network language models (NNLMs) for speech recognition and machine translation (Bengio et al., 2003; Schwenk, 2010).
Introduction	Specifically, we introduce a novel formulation for a neural network joint model (NNJM), which augments an n-gram target language model with an m-word source window.
Model Variations	In particular, we can reverse the translation direction of the languages, as well as the direction of the language model.
Model Variations	5-gram Kneser-Ney LM; Recurrent neural network language model (RNNLM) (Mikolov et al., 2010)
Neural Network Joint Model (NNJM)	Fortunately, neural network language models are able to elegantly scale up and take advantage of arbitrarily large context sizes.

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

16. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter

Tan, Chenhao and Lee, Lillian and Pang, Bo

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	This crawling process also yielded 632K TAC pairs whose only difference was spacing, and an additional 558M “unpaired” tweets; as shown later in this paper, we used these extra corpora for computing language models and other auxiliary information.
Introduction	Table 5: Conformity to the community and one’s own past, measured via scores assigned by various language models.
Introduction	We measure a tweet’s similarity to expectations by its score according to the relevant language model, $\frac{1}{|T|}\sum_{x \in T} \log p(x)$, where $T$ refers to either all the unigrams (unigram model) or all and only bigrams (bigram model). We trained a Twitter-community language model from our 558M unpaired tweets, and personal language models from each author’s tweet history.
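A tweet score of the form $\frac{1}{|T|}\sum_{x \in T} \log p(x)$ is an average log-probability over the tweet's unigrams or bigrams. In the sketch below, `prob` is a placeholder for whichever community or personal model is used; how p(x) is estimated for bigrams is not specified here, and the helper names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def average_log_prob(tokens: List[str], prob: Callable[[Tuple[str, ...]], float], n: int = 1) -> float:
    """(1/|T|) * sum over x in T of log p(x), where T holds the tweet's n-grams."""
    items = ngrams(tokens, n)
    if not items:
        raise ValueError("tweet has no n-grams of this order")
    return sum(math.log(prob(x)) for x in items) / len(items)


# With a constant toy model, the average equals log(0.5) regardless of length.
toy_score = average_log_prob(["a", "b", "c"], lambda x: 0.5, n=2)
```

Averaging rather than summing keeps scores comparable across tweets of different lengths.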

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

17. Perplexity on Reduced Corpora

Kobayashi, Hayato

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Although we did not examine the accuracy of real tasks in this paper, there is an interesting report that the word error rate of language models follows a power law with respect to perplexity (Klakow and Peters, 2002).
Introduction	Removing low-frequency words from a corpus (often called cutoff) is a common practice to save on the computational costs involved in learning language models and topic models.
Introduction	In the case of language models, we often have to remove low-frequency words because of a lack of computational resources, since the feature space of k-grams tends to be so large that we sometimes need cutoffs even in a distributed environment (Brants et al., 2007).
Perplexity on Reduced Corpora	Constant restoring is similar to the additive smoothing defined by $\hat{p}(w) \propto p'(w) + \lambda$, which is used to solve the zero-frequency problem of language models (Chen and Goodman, 1996).
Perplexity on Reduced Corpora	[…] This means that we can determine the rough sparseness of k-grams and adjust some of the parameters, such as the gram size k, in learning statistical language models.
Perplexity on Reduced Corpora	LDA is a probabilistic language model that generates a corpus as a mixture of hidden topics, and it allows us to infer two parameters: the document-topic distribution $\theta$ that represents the mixture rate of topics in each document, and the topic-word distribution $\phi$ that represents the occurrence rate of words in each topic.
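Additive smoothing of the form $\hat{p}(w) \propto p'(w) + \lambda$ gives every vocabulary word, including words removed by a cutoff, non-zero probability, which keeps perplexity finite. The sketch below normalizes over an explicit vocabulary; the dictionary representation and the function names are assumptions, and it is not the paper's constant-restoring procedure itself.

```python
import math
from typing import Dict, List


def additive_restore(reduced: Dict[str, float], vocab: List[str], lam: float) -> Dict[str, float]:
    """p_hat(w) proportional to p'(w) + lam over the full vocabulary."""
    unnormalized = {w: reduced.get(w, 0.0) + lam for w in vocab}  # removed words get lam
    z = sum(unnormalized.values())
    return {w: v / z for w, v in unnormalized.items()}


def perplexity(tokens: List[str], model: Dict[str, float]) -> float:
    """exp of the average negative log-probability of the tokens."""
    return math.exp(-sum(math.log(model[w]) for w in tokens) / len(tokens))


# "c" was cut off from the reduced model, so p'(c) = 0 before restoring.
restored = additive_restore({"a": 0.75, "b": 0.25}, ["a", "b", "c"], 0.25)
# restored is {"a": 4/7, "b": 2/7, "c": 1/7}
```

Without the λ term, any test token such as "c" would have probability 0 and the perplexity would be infinite.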

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

unigram (13)
topic models (12)
LDA (10)

18. Discovering Latent Structure in Task-Oriented Dialogues

Zhai, Ke and Williams, Jason D

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	Illustrated by the highlighted states in 6, the LM-HMM model conflates interactions that commonly occur at the beginning and end of a dialogue (i.e., “acknowledge agent” and “resolve problem”), since their underlying language models are likely to produce similar probability distributions over words.
Experiments	By incorporating topic information, our proposed models (e.g., TM-HMMSS in Figure 5) are able to enforce the state transitions towards more frequent flow patterns, which further helps to overcome the weakness of the language models.
Latent Structure in Dialogues	The simplest formulation we consider is an HMM where each state contains a unigram language model (LM), proposed by Chotimongkol (2008) for task-oriented dialogue and originally
Latent Structure in Dialogues	3: For each word in utterance n, first choose a word source $r$ according to $\tau$, and then, depending on $r$, generate a word $w$ either from the session-wide topic distribution $\theta$ or the language model specified by the state $s_n$.
Latent Structure in Dialogues	Note that a TM-HMMS model with state-specific topic models (instead of state-specific language models) would be subsumed by TM-HMM, since one topic could be used as the background topic in TM-HMMS.

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

19. Hybrid Simplification using Deep Semantics and Machine Translation

Narayan, Shashi and Gardent, Claire

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Related Work	It is combined with a language model to improve grammaticality and the decoder translates sentences into sim-
Simplification Framework	In addition, the language model we integrate in the SMT module helps ensure better fluency and grammaticality.
Simplification Framework	Finally, the translation and language models ensure that published, describing and boson are simplified to wrote, explaining and elementary particle, respectively; and that the phrase “In 1964” is moved from the beginning of the sentence to its end.
Simplification Framework	Our simplification framework consists of a probabilistic model for splitting and dropping which we call DRS simplification model (DRS-SM); a phrase based translation model for substitution and reordering (PBMT); and a language model learned on Simple English Wikipedia (LM) for fluency and grammaticality.

language model is mentioned in 6 sentences in this paper.

Topics mentioned in this paper:

20. Unsupervised Morphology-Based Vocabulary Expansion

Rasooli, Mohammad Sadegh and Lippincott, Thomas and Habash, Nizar and Rambow, Owen

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	The best-performing systems for these applications today rely on training on large amounts of data: in the case of ASR, the data is aligned audio and transcription, plus large unannotated data for language modeling; in the case of OCR, it is transcribed optical data; in the case of MT, it is aligned bitexts.
Introduction	For ASR and OCR, which can compose words from smaller units (phones or graphically recognized letters), an expanded target language vocabulary can be directly exploited without the need for changing the technology at all: the new words need to be inserted into the relevant resources (lexicon, language model, etc.) with appropriately estimated probabilities.
Introduction	The expanded word combinations can be used to extend the language models used for MT to bias against incoherent hypothesized new sequences of segmented words.
Morphology-based Vocabulary Expansion	In the Bigram Affix model, we do the same for the stem as in the Fixed Affix model, but for prefixes and suffixes, we create a bigram language model in the finite state machine.
Morphology-based Vocabulary Expansion	We reweight the weights in the WFST model (Fixed or Bigram) by composing it with a letter trigraph language model (WoTr).

language model is mentioned in 5 sentences in this paper.

Topics mentioned in this paper:

21. Sentence Level Dialect Identification for Machine Translation System Selection

Salloum, Wael and Elfardy, Heba and Alamir-Salloum, Linda and Habash, Nizar and Diab, Mona

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

MT System Selection	These features rely on language models, MSA and Egyptian morphological analyzers, and a Highly Dialectal Egyptian lexicon to decide whether each word is MSA, Egyptian, Both, or Out of Vocabulary.
MT System Selection	two language models: MSA and Egyptian.
MT System Selection	The second set of features uses perplexity against language models built from the source-side of the training data of each of the four
Machine Translation Experiments	The language model for our systems is trained on English Gigaword (Graff and Cieri, 2003).
Machine Translation Experiments	We use SRILM Toolkit (Stolcke, 2002) to build a 5-gram language model with modified

language model is mentioned in 5 sentences in this paper.
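
The perplexity-based features mentioned above can be sketched as follows: score a sentence under one language model per language variety and use each perplexity as a feature. This is an illustration under stated assumptions, not the authors' setup: it uses add-one smoothed word bigram models trained on hypothetical two-sentence corpora (`variety_A`, `variety_B`) rather than SRILM models trained on real data.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one smoothed word bigram LM; the smoothing choice is an assumption for this sketch."""

    def __init__(self, sentences):
        self.hist, self.bi = Counter(), Counter()
        predicted = set()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            predicted.update(toks[1:])
            for a, b in zip(toks, toks[1:]):
                self.hist[a] += 1
                self.bi[(a, b)] += 1
        self.v = len(predicted) + 1  # +1 reserves probability mass for unseen words

    def perplexity(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        pairs = list(zip(toks, toks[1:]))
        log_p = sum(math.log((self.bi[p] + 1) / (self.hist[p[0]] + self.v)) for p in pairs)
        return math.exp(-log_p / len(pairs))


def perplexity_features(sentence, lms):
    """One feature per language model: the sentence's perplexity under that model."""
    return {name: lm.perplexity(sentence) for name, lm in lms.items()}


# Hypothetical toy corpora standing in for two language varieties.
lms = {
    "variety_A": BigramLM(["the cat sat", "the cat ran"]),
    "variety_B": BigramLM(["a dog barked", "a dog ran"]),
}
feats = perplexity_features("the cat ran", lms)
print(min(feats, key=feats.get))  # variety_A (lowest perplexity)
```

A downstream classifier would consume the whole feature dictionary; taking the minimum here only shows which model fits the sentence best.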

22. Enhancing Grammatical Cohesion: Generating Transitional Expressions for SMT

Tu, Mei and Zhou, Yu and Zong, Chengqing

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

A semantic span can include one or more eus.	Most translation systems adopt the features from a translation model, a language model, and sometimes a reordering model.
A semantic span can include one or more eus.	The process of training this transfer model and smoothing is similar to the process of training a language model.
A semantic span can include one or more eus.	formula (6) are estimated in the same way as a factored language model, which has the advantage of easily incorporating various linguistic information.
Experiments	A 5-gram language model is trained with SRILM on the Xinhua portion of the English Gigaword corpus combined with the English part of FBIS.
Experiments	probabilities, the BTG reordering features, and the language model feature.

language model is mentioned in 5 sentences in this paper.

23. Polylingual Tree-Based Topic Models for Translation Domain Adaptation

Hu, Yuening and Zhai, Ke and Eidelman, Vladimir and Boyd-Graber, Jordan

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	Further improvement is possible by incorporating topic models deeper in the decoding process and adding domain knowledge to the language model.
Discussion	6.3 Improving Language Models
Discussion	Topic models capture document-level properties of language, but a critical component of machine translation systems is the language model, which provides local constraints and preferences.
Discussion	Domain adaptation for language models (Bellegarda, 2004; Wood and Teh, 2009) is an important avenue for improving machine translation.
Experiments	We train a modified Kneser-Ney trigram language model on English (Chen and Goodman, 1996).

language model is mentioned in 5 sentences in this paper.

24. A Sense-Based Translation Model for Statistical Machine Translation

Xiong, Deyi and Zhang, Min

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	Additionally, we also want to induce sense clusters for words in the target language so that we can build a sense-based language model and integrate it into SMT.
Decoding with Sense-Based Translation Model	error rate training (MERT) (Och, 2003) together with other models such as the language model.
Experiments	We trained a 5-gram language model on the Xinhua section of the English Gigaword corpus (306 million words) using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing (Chen and Goodman, 1996).
Related Work	(2007) also explore a bilingual topic model for translation and language model adaptation.

language model is mentioned in 4 sentences in this paper.
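
Several excerpts rely on modified Kneser-Ney smoothing (Chen and Goodman, 1996). As a hedged illustration of the core idea, the sketch below implements plain interpolated Kneser-Ney for bigrams with a single absolute discount $D$; the modified variant uses three discounts that depend on the count, which this sketch omits. The discount value 0.75 and the toy corpus are assumptions.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Interpolated Kneser-Ney bigram model with one absolute discount (not the modified variant)."""
    bigrams = Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        bigrams.update(zip(toks, toks[1:]))

    history_count = Counter()      # c(v): bigram tokens starting with v
    followers = defaultdict(set)   # distinct w after v, giving N1+(v .)
    preceders = defaultdict(set)   # distinct v before w, giving N1+(. w)
    for (v, w), c in bigrams.items():
        history_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_types = len(bigrams)         # N1+(. .): number of distinct bigram types

    def prob(w, v):
        # Continuation probability: how many distinct contexts w completes.
        p_cont = len(preceders.get(w, ())) / n_types
        c_v = history_count[v]
        if c_v == 0:
            return p_cont  # unseen history: use the lower-order distribution alone
        discounted = max(bigrams[(v, w)] - discount, 0) / c_v
        backoff_weight = discount * len(followers[v]) / c_v
        return discounted + backoff_weight * p_cont

    return prob


corpus = ["the cat sat", "the dog sat", "a cat ran"]
p = train_kn_bigram(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}
print(round(sum(p(w, "the") for w in vocab), 6))  # 1.0
```

The continuation probability rewards words that follow many distinct histories rather than words that are merely frequent. Words never seen in training receive probability 0 in this sketch; a practical model needs separate treatment of unknown words.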

25. Detecting Retries of Voice Search Queries

Levitan, Rivka and Elson, David

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Features	We look at the language model (LM) score and the number of alternate pronunciations of the first query, predicting that a misrecognized query will have a lower LM score and more alternate pronunciations.
Prediction task	In addition, the language model likelihood for the first query was, as expected, significantly lower for retries.
Related Work	Retry cases are identified with joint language modeling across multiple transcripts, with the intuition that retry pairs tend to be closely related or exact duplicates.
Related Work	While we follow this work in our usage of joint language modeling, our application encompasses open domain voice searches and voice actions (such as placing calls), so we cannot use simplifying domain assumptions.

language model is mentioned in 4 sentences in this paper.

26. A Joint Graph Model for Pinyin-to-Chinese Conversion with Typo Correction

Jia, Zhongye and Zhao, Hai

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	SRILM (Stolcke, 2002) is adopted for language model training and KenLM (Heafield, 2011; Heafield et al., 2013) for language model query.
Pinyin Input Method Model	The edge weight is the negative logarithm of the conditional probability $P(S_{j+1,k} \mid S_{i,j})$ that a syllable $S_{i,j}$ is followed by $S_{j+1,k}$, which is given by a bigram language model of pinyin syllables.
Related Works	They solved the typo correction problem by decomposing the conditional probability $P(H \mid P)$ of Chinese character sequence $H$ given pinyin sequence $P$ into a language model $P(w_i \mid w_{i-1})$ and a typing model. The typing model, estimated on real user input data, was used for typo correction.
Related Works	Various approaches have been proposed for the task, including language model (LM) based methods (Chen et al., 2013), ME model (Han and Chang, 2013), CRF (Wang et al., 2013d; Wang et al., 2013a), SMT (Chiu et al., 2013; Liu et al., 2013), and graph model (Jia et al., 2013).

language model is mentioned in 4 sentences in this paper.
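
The edge weight described above, the negative log of a syllable bigram probability, can be computed as in the following minimal sketch. The add-one smoothing, the toy syllable sequences, and the function name are assumptions; the authors' graph construction and estimation details are not reproduced here.

```python
import math
from collections import Counter


def syllable_bigram_weights(training_sequences):
    """Return a function giving -log P(next | prev) under an add-one smoothed syllable bigram LM."""
    history, bigram = Counter(), Counter()
    vocab = set()
    for seq in training_sequences:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            history[prev] += 1
            bigram[(prev, nxt)] += 1
    v = len(vocab)

    def edge_weight(prev, nxt):
        p = (bigram[(prev, nxt)] + 1) / (history[prev] + v)
        return -math.log(p)

    return edge_weight


weight = syllable_bigram_weights([["ni", "hao"], ["ni", "hao", "ma"], ["hao", "de"]])
print(weight("ni", "hao") < weight("ni", "de"))  # True: a frequent continuation gets a cheaper edge
```

Because the weights are negative log probabilities, the lowest-cost path through such a graph corresponds to the most probable syllable sequence under the bigram model.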

27. Automatic Keyphrase Extraction: A Survey of the State of the Art

Hasan, Kazi Saidul and Ng, Vincent

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Keyphrase Extraction Approaches	3.3.4 Language Modeling
Keyphrase Extraction Approaches	These feature values are estimated using language models (LMs) trained on a foreground corpus and a background corpus.
Keyphrase Extraction Approaches	In sum, LMA uses a language model rather than heuristics to identify phrases, and relies on the language model trained on the background corpus to determine how “unique” a candidate keyphrase is to the domain represented by the foreground corpus.

language model is mentioned in 4 sentences in this paper.
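
The foreground/background comparison above can be made concrete with pointwise KL divergence, which we assume here as the scoring function (it is the formulation of the language-model approach of Tomokiyo and Hurst, 2003, but the excerpt itself does not state LMA's exact features). The sketch handles single-word candidates only, uses add-one smoothed unigram models, and runs on toy token lists.

```python
import math
from collections import Counter


def unigram_lm(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution over a shared vocabulary (smoothing is assumed)."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def informativeness(foreground, background):
    """Pointwise KL divergence of each word's foreground probability from its background probability."""
    vocab = set(foreground) | set(background)
    p_fg = unigram_lm(foreground, vocab)
    p_bg = unigram_lm(background, vocab)
    # Positive when a word is more probable in the foreground (domain) corpus than in the background.
    return {w: p_fg[w] * math.log(p_fg[w] / p_bg[w]) for w in vocab}


fg = "kneser ney smoothing improves the language model".split()
bg = "the model of the city and the people of the model".split()
scores = informativeness(fg, bg)
print(scores["smoothing"] > scores["the"])  # True
```

Domain-specific words such as "smoothing" score positively, while words that are common in the background corpus, such as "the", score negatively.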

28. Learning Topic Representation for SMT with Neural Networks

Cui, Lei and Zhang, Dongdong and Liu, Shujie and Chen, Qiming and Li, Mu and Zhou, Ming and Yang, Muyun

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Experiments	An in-house language modeling toolkit is used to train the 5-gram language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995).
Experiments	The English monolingual data used for language modeling is the same as in Table 1.
Related Work	They incorporated the bilingual topic information into language model adaptation and lexicon translation model adaptation, achieving significant improvements in the large-scale evaluation.
Topic Similarity Model with Neural Network	Standard features: Translation model, including translation probabilities and lexical weights for both directions (4 features), 5-gram language model (1 feature), word count (1 feature), phrase count (1 feature), NULL penalty (1 feature), number of hierarchical rules used (1 feature).

language model is mentioned in 4 sentences in this paper.
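
As the standard feature list above suggests, the 5-gram language model enters the decoder as one weighted feature in a log-linear model. The sketch below scores hypotheses as a weighted sum of feature values; the weights and feature values are hypothetical, and weight tuning (for example by MERT) is not shown.

```python
import math


def loglinear_score(features, weights):
    """Model score of one hypothesis: sum_i lambda_i * h_i (weights assumed already tuned)."""
    return sum(weights[name] * value for name, value in features.items())


weights = {"tm": 0.3, "lm": 0.5, "word_count": -0.1}  # hypothetical tuned weights
hyps = {
    "hyp_fluent": {"tm": math.log(0.2), "lm": math.log(0.01), "word_count": 5},
    "hyp_disfluent": {"tm": math.log(0.3), "lm": math.log(0.0001), "word_count": 5},
}
best = max(hyps, key=lambda h: loglinear_score(hyps[h], weights))
print(best)  # hyp_fluent
```

Here the disfluent hypothesis has the better translation-model score, but its much lower language model probability outweighs that advantage.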

29. Generating Code-switched Text for Lexical Learning

Labutov, Igor and Lipson, Hod

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Model	Broadly, as the learner progresses from one sentence to the next, exposing herself to more novel words, the updated parameters of the language model in turn guide the selection of new “switch-points” for replacing source words with the target foreign words.
Model	Generally, this value may come directly from the surprisal quantity given by a language model, or may incorporate additional features that are found informative in predicting the constraint on the word.
Related Work	Building on their work, Adel et al. (2012) employ additional features and a recurrent network language model for modeling code-switching in conversational speech.

language model is mentioned in 3 sentences in this paper.
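
The surprisal quantity mentioned above is the negative log probability of a word given its context. A minimal sketch, assuming an add-one smoothed word bigram model and a toy corpus (the function names are hypothetical, and the paper's own model is not reproduced):

```python
import math
from collections import Counter


def make_surprisal(sentences):
    """Surprisal in bits, -log2 P(word | previous word), from an add-one smoothed bigram model."""
    history, bigram, vocab = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.split()
        vocab.update(toks[1:])
        for a, b in zip(toks, toks[1:]):
            history[a] += 1
            bigram[(a, b)] += 1
    v = len(vocab) + 1  # one extra slot for unseen words

    def surprisal(word, prev):
        return -math.log2((bigram[(prev, word)] + 1) / (history[prev] + v))

    return surprisal


surp = make_surprisal(["the cat sat", "the cat ran", "the dog ran"])
print(surp("cat", "the") < surp("dog", "the"))  # True: the more predictable word carries less surprisal
```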

30. A Convolutional Neural Network for Modelling Sentences

Kalchbrenner, Nal and Grefenstette, Edward and Blunsom, Phil

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Background	The RNN is primarily used as a language model, but may also be viewed as a sentence model with a linear structure.
Introduction	Besides comprising powerful classifiers as part of their architecture, neural sentence models can be used to condition a neural language model to generate sentences word by word (Schwenk, 2012; Mikolov and Zweig, 2012; Kalchbrenner and Blunsom, 2013a).
Properties of the Sentence Model	This gives the RNN excellent performance at language modelling, but it is suboptimal for remembering at once the n-grams further back in the input sentence.

language model is mentioned in 3 sentences in this paper.

31. Discourse Complements Lexical Semantics for Non-factoid Answer Reranking

Jansen, Peter and Surdeanu, Mihai and Clark, Peter

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Models and Features	In particular, we use the recurrent neural network language model (RNNLM) of Mikolov et al.
Models and Features	Like any language model, an RNNLM estimates the probability of observing a word given the preceding context, but, in this process, it learns word embeddings in a latent, conceptual space with a fixed number of dimensions.
Related Work	(2013) recently addressed the problem of answer sentence selection and demonstrated that LS models, including recurrent neural network language models (RNNLM), have a higher contribution to overall performance than exploiting syntactic analysis.

language model is mentioned in 3 sentences in this paper.

32. Multilingual Models for Compositional Distributed Semantics

Hermann, Karl Moritz and Blunsom, Phil

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Introduction	Successful applications of such models include language modelling (Bengio et al., 2003), paraphrase detection (Erk and Pado, 2008), and dialogue analysis (Kalchbrenner and Blunsom, 2013).
Related Work	Neural language models are another popular approach for inducing distributed word representations (Bengio et al., 2003).
Related Work	They have received a lot of attention in recent years (Collobert and Weston, 2008; Mnih and Hinton, 2009; Mikolov et al., 2010, inter alia) and have achieved state-of-the-art performance in language modelling.

language model is mentioned in 3 sentences in this paper.

33. Surface Realisation from Knowledge-Bases

Gyawali, Bikash and Gardent, Claire

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Conclusion	In the current version of the generator, the output is ranked using a simple language model trained on the GENIA corpus.
Generating from the KBGen Knowledge-Base	To rank the generator output, we train a language model on the GENIA corpus, a corpus of 2,000 MEDLINE abstracts about biology containing more than 400,000 words (Kim et al., 2003), and use this model to rank the generated sentences by decreasing probability.
Related Work	They intersect the grammar with a language model to improve fluency, use a weighted hypergraph to pack the derivations, and find the best derivation tree using the Viterbi algorithm.

language model is mentioned in 3 sentences in this paper.
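
Ranking generator output by language model probability, as described above, can be sketched as follows. The add-one smoothed bigram model and the two-sentence training set are hypothetical stand-ins for the GENIA-trained model, not the authors' configuration.

```python
import math
from collections import Counter


def train_bigram_logprob(sentences):
    """Return a function computing log P(sentence) under an add-one smoothed word bigram LM."""
    history, bigram, predicted = Counter(), Counter(), set()
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        predicted.update(toks[1:])
        for a, b in zip(toks, toks[1:]):
            history[a] += 1
            bigram[(a, b)] += 1
    v = len(predicted) + 1  # one extra slot for unseen words

    def logprob(sentence):
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log((bigram[(a, b)] + 1) / (history[a] + v)) for a, b in zip(toks, toks[1:]))

    return logprob


def rank_outputs(candidates, logprob):
    """Order candidate realisations by decreasing language model probability."""
    return sorted(candidates, key=logprob, reverse=True)


lm = train_bigram_logprob(["the protein binds the receptor", "the enzyme binds the substrate"])
ranked = rank_outputs(["receptor the binds protein the", "the protein binds the receptor"], lm)
print(ranked)  # ['the protein binds the receptor', 'receptor the binds protein the']
```

Because raw probability decreases with every additional word, candidates of different lengths are often compared with a length-normalized score instead; the two candidates here have the same length.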

34. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors

Baroni, Marco and Dinu, Georgiana and Kruszewski, Germán

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Abstract	Context-predicting models (more commonly known as embeddings or neural language models) are the new kids on the distributional semantics block.
Introduction	This is in part due to the fact that context-predicting vectors were first developed as an approach to language modeling and/or as a way to initialize feature vectors in neural-network-based “deep learning” NLP architectures, so their effectiveness as semantic representations was initially seen as little more than an interesting side effect.
Introduction	Predictive DSMs are also called neural language models, because their supervised context prediction training is performed with neural networks, or, more cryptically, “embeddings”.

language model is mentioned in 3 sentences in this paper.

35. A Bayesian Mixed Effects Model of Literary Character

Bamman, David and Underwood, Ted and Smith, Noah A.

In Proc. ACL 2014, part of Proceedings of the Annual Meeting of the Association for Computational Linguistics.

Data	To manage the degrees of freedom in the model described in §4, we perform dimensionality reduction on the vocabulary by learning word embeddings with a log-linear continuous skip-gram language model (Mikolov et al., 2013) on the entire collection of 15,099 books.
Model	Maximum entropy approaches to language modeling have been used since Rosenfeld (1996) to incorporate long-distance information, such as previously-mentioned trigger words, into n-gram language models.
Model	Notation: number of personas (hyperparameter); $D$: number of documents; $C_d$: number of characters in document $d$; $W_{d,c}$: number of (cluster, role) tuples for character $c$; $m_d$: metadata for document $d$ (ranges over $M$ authors); $\theta_d$: document $d$'s distribution over personas; $p_{d,c}$: character $c$'s persona; $j$: an index for an $\langle r, w \rangle$ tuple in the data; $w_j$: word cluster ID for tuple $j$; $r_j$: role for tuple $j$, with $r_j \in \{\text{agent}, \text{patient}, \text{poss}, \text{pred}\}$; $\eta$: coefficients for the log-linear language model; $\mu, \lambda$: Laplace mean and scale (for regularizing $\eta$); $\alpha$: Dirichlet concentration parameter.

language model is mentioned in 3 sentences in this paper.
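
The maximum entropy language models mentioned above combine n-gram and long-distance trigger features in a log-linear form, $P(w \mid h) = \exp(\sum_k \lambda_k f_k(h, w)) / Z(h)$. The sketch below uses hand-set, hypothetical weights and a four-word vocabulary purely for illustration; a real model would learn the weights from data.

```python
import math

VOCAB = ["bank", "river", "money", "water"]

# Hypothetical feature weights; a trained maximum entropy LM would estimate these.
BIGRAM_WEIGHTS = {("the", "bank"): 1.0, ("the", "river"): 0.5}
TRIGGER_WEIGHTS = {("loan", "money"): 1.5, ("fishing", "water"): 1.5, ("loan", "bank"): 0.8}


def score(word, history):
    """Sum of active feature weights: a bigram feature on the previous word plus
    trigger features that fire if the trigger appeared anywhere earlier in the history."""
    s = BIGRAM_WEIGHTS.get((history[-1], word), 0.0)
    for trigger in set(history):
        s += TRIGGER_WEIGHTS.get((trigger, word), 0.0)
    return s


def prob(word, history):
    """Log-linear (softmax) normalisation over the vocabulary."""
    z = sum(math.exp(score(w, history)) for w in VOCAB)
    return math.exp(score(word, history)) / z


h1 = "i need a loan from the".split()
h2 = "we went fishing by the".split()
print(prob("bank", h1) > prob("bank", h2))  # True
```

Both histories end in "the", so a plain bigram model would assign "bank" the same probability in each; the trigger feature on "loan" is what separates them.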
