Abstract | In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search.
Abstract | The dependency language model is built on a large amount of additional auto-parsed data produced by a baseline parser.
Abstract | Based on the dependency language model, we define a set of features for the parsing model.
Dependency language model | Language models play a very important role in statistical machine translation (SMT).
Dependency language model | The standard N-gram language model predicts the next word based on the N-1 immediately preceding words.
Dependency language model | However, the traditional N-gram language model cannot capture long-distance word relations.
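To make the fixed window concrete, here is a minimal sketch (ours, not the papers') of a count-based N-gram model; note that prob() can only ever see the last N-1 words, which is exactly why longer-distance relations are invisible to it:

    from collections import defaultdict

    class NGramLM:
        """Maximum-likelihood N-gram model; names are illustrative."""
        def __init__(self, n):
            self.n = n
            self.ctx_counts = defaultdict(int)    # counts of (n-1)-word contexts
            self.ngram_counts = defaultdict(int)  # counts of full n-grams

        def train(self, sentences):
            for words in sentences:
                padded = ["<s>"] * (self.n - 1) + words + ["</s>"]
                for i in range(self.n - 1, len(padded)):
                    ctx = tuple(padded[i - self.n + 1:i])
                    self.ctx_counts[ctx] += 1
                    self.ngram_counts[ctx + (padded[i],)] += 1

        def prob(self, word, context):
            # Only the last n-1 words matter: anything farther back is ignored.
            ctx = tuple(context[-(self.n - 1):])
            if self.ctx_counts[ctx] == 0:
                return 0.0
            return self.ngram_counts[ctx + (word,)] / self.ctx_counts[ctx]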
Introduction | In this paper, we solve this issue by enriching the feature representations for a graph-based model using a dependency language model (DLM) (Shen et al., 2008). |
Introduction | We utilize the dependency language model to enhance the graph-based parsing model.
Parsing with dependency language model | In this section, we propose a parsing model which includes the dependency language model by extending the model of McDonald et al. |
Abstract | We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis.
Introduction | To investigate the usefulness of local information, we evaluated n-gram language model scores, from both a conventional model with Good-Turing smoothing and a recently proposed maximum-entropy class-based n-gram model (Chen, 2009a; Chen, 2009b).
Introduction | Also in the language modeling vein, but with potentially global context, we evaluate the use of a recurrent neural network language model.
Introduction | In all the language modeling approaches, a model is used to compute a sentence probability for each of the potential completions.
Related Work | The KU system uses just an N-gram language model to do this ranking.
Related Work | The UNT system uses a large variety of information sources, and a language model score receives the highest weight. |
Sentence Completion via Language Modeling | Perhaps the most straightforward approach to solving the sentence completion task is to form the complete sentence with each option in turn, and to evaluate its likelihood under a language model.
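A hedged sketch of that scoring loop (the placeholder convention and the lm.logprob interface are illustrative assumptions, not the authors' code):

    def complete(sentence_template, options, lm):
        """Pick the option whose completed sentence the LM scores highest."""
        best, best_logp = None, float("-inf")
        for option in options:
            sentence = sentence_template.replace("___", option)
            logp = lm.logprob(sentence.split())  # assumed LM interface
            if logp > best_logp:
                best, best_logp = option, logp
        return best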
Sentence Completion via Language Modeling | In this section, we describe the suite of state-of-the-art language modeling techniques for which we will present results.
Sentence Completion via Language Modeling | 3.1 Backoff N-gram Language Model |
A Probabilistic Formulation for HVR | where P(W) can be modelled by the word-based n-gram language model (Chen and Goodman, 1996) commonly used in automatic speech recognition.
A Probabilistic Formulation for HVR | - Language model score: P(W)
A Probabilistic Formulation for HVR | Note that the acoustic model and language model scores are already used in conventional ASR.
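In the standard formulation (general ASR background, not specific to this paper), these two scores enter the decoder through the Bayes decision rule, with P(W) factored by an n-gram model:

$$ W^{*} = \arg\max_{W} P(O \mid W)\, P(W), \qquad P(W) \approx \prod_{i} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1}) $$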
Abstract | In addition to the acoustic and language models used in automatic speech recognition systems, HVR uses the haptic and partial lexical models as additional knowledge sources to reduce the recognition search space and suppress confusions. |
Experimental Results | These sentences contain a variety of given names, surnames and city names so that confusions cannot be easily resolved using a language model.
Experimental Results | The ASR system used in all the experiments reported in this paper consists of a set of HMM-based triphone acoustic models and an n-gram language model.
Experimental Results | A bigram language model with a vocabulary size of 200 words was used for testing. |
Haptic Voice Recognition (HVR) | In conventional ASR, acoustically similar word sequences are typically resolved implicitly using a language model where contexts of neighboring words are used for disambiguation. |
Integration of Knowledge Sources | where A, G, P and H denote the WFST representations of the acoustic model, language model, PLI model and haptic model, respectively.
Integration of Knowledge Sources | (2002) has shown that Hidden Markov Models (HMMs) and n-gram language models can be viewed as WFSTs. |
Introduction | In addition to the acoustic model and language model used in ASR, a haptic model and a partial lexical model are also introduced to facilitate the integration of more sophisticated haptic events, such as keystrokes, into HVR.
Abstract | We propose a simple generative, syntactic language model that conditions on overlapping windows of tree context (or treelets) in the same way that n-gram language models condition on overlapping windows of linear context. |
Abstract | We estimate the parameters of our model by collecting counts from automatically parsed text using standard n-gram language model estimation techniques, allowing us to train a model on over one billion tokens of data using a single machine in a matter of hours. |
Introduction | N-gram language models are a central component of all speech recognition and machine translation systems, and a great deal of research centers around refining models (Chen and Goodman, 1998), efficient storage (Pauls and Klein, 2011; Heafield, 2011), and integration into decoders (Koehn, 2004; Chiang, 2005).
Introduction | At the same time, because n-gram language models only condition on a local window of linear word-level context, they are poor models of long-range syntactic dependencies. |
Introduction | Although several lines of work have proposed generative syntactic language models that improve on n-gram models for moderate amounts of data (Chelba, 1997; Xu et al., 2002; Charniak, 2001; Hall, 2004; Roark, |
Treelet Language Modeling | The common denominator of most n-gram language models is that they assign probabilities roughly according to empirical frequencies for observed n-grams, but fall back to distributions conditioned on smaller contexts for unobserved n-grams, as shown in Figure 1(a).
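A minimal sketch of the fall-back idea, using the simple stupid-backoff scheme (Brants et al., 2007) for concreteness; the models discussed here use more refined discounting:

    def stupid_backoff(ngram, counts, alpha=0.4):
        """Relative frequency of the observed n-gram; otherwise back off
        to the (n-1)-gram suffix, discounted by alpha.
        `counts` maps word tuples of every order to integer counts."""
        if len(ngram) == 1:
            total = sum(c for ng, c in counts.items() if len(ng) == 1)
            return counts.get(ngram, 0) / max(total, 1)
        ctx = ngram[:-1]
        if counts.get(ngram, 0) > 0 and counts.get(ctx, 0) > 0:
            return counts[ngram] / counts[ctx]
        return alpha * stupid_backoff(ngram[1:], counts, alpha)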
Treelet Language Modeling | We are not the first to use back-off-based smoothing for syntactic language modeling; such techniques have been applied to models that condition on headword contexts (Charniak, 2001; Roark, 2004; Zhang, 2009).
Hello. My name is Inigo Montoya. | First, we show a concrete sense in which memorable quotes are indeed distinctive: with respect to lexical language models trained on the newswire portions of the Brown corpus [21], memorable quotes have significantly lower likelihood than their non-memorable counterparts. |
Hello. My name is Inigo Montoya. | In particular, we analyze a corpus of advertising slogans, and we show that these slogans have significantly greater likelihood at both the word level and the part-of-speech level with respect to a language model trained on memorable movie quotes, compared to a corresponding language model trained on non-memorable movie quotes. |
Never send a human to do a machine’s job. | In order to assess different levels of lexical and syntactic distinctiveness, we employ a total of six Laplace-smoothed language models: 1-gram, 2-gram, and 3-gram word LMs and 1-gram, 2-gram and 3-gram part-of-speech LMs.
Never send a human to do a machine’s job. | As indicated in Table 3, for each of our lexical “common language” models, in about 60% of the quote pairs, the memorable quote is more distinctive.
Never send a human to do a machine’s job. | The language models’ vocabulary was that of the entire training corpus. |
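A hedged sketch of the comparison implied here (the LM interface is an illustrative assumption): the quote that the common-language model finds less likely, per token, counts as the more distinctive one.

    def more_distinctive(quote_a, quote_b, common_lm):
        """Return the quote with lower average log-likelihood under the
        'common language' model, i.e., the more distinctive one."""
        def avg_logp(tokens):
            return common_lm.logprob(tokens) / len(tokens)  # assumed interface
        return quote_a if avg_logp(quote_a) < avg_logp(quote_b) else quote_b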
Abstract | We present a Bayesian model that clusters together phonetic variants of the same lexical item while learning both a language model over lexical items and a log-linear model of pronunciation variability based on articulatory features. |
Experiments | Nonetheless, it represents phonetic variability more realistically than the Bernstein-Ratner-Brent corpus, while still maintaining the lexical characteristics of infant-directed speech (as compared to the Buckeye corpus, with its much larger vocabulary and more complex language model).
Inference | The language modeling term relating to the intended string again factors into multiple components. |
Inference | Because neither the transducer nor the language model is a perfect model of the true distribution, they can have incompatible dynamic ranges.
Inference | The transducer scores can be cached since they depend only on surface forms, but the language model scores cannot.
Introduction | Previous models with similar goals have learned from an artificial corpus with a small vocabulary (Driesen et al., 2009; Rasanen, 2011) or have modeled variability only in vowels (Feldman et al., 2009); to our knowledge, this paper is the first to use a naturalistic infant-directed corpus while modeling variability in all segments, and to incorporate word-level context (a bigram language model ). |
Introduction | Our model is conceptually similar to those used in speech recognition and other applications: we assume the intended tokens are generated from a bigram language model and then distorted by a noisy channel, in particular a log-linear model of phonetic variability. |
Introduction | But unlike speech recognition, we have no (intended-form, surface-form) training pairs to train the phonetic model, nor even a dictionary of intended-form strings to train the language model . |
Lexical-phonetic model | Our lexical-phonetic model is defined using the standard noisy channel framework: first a sequence of intended word tokens is generated using a language model , and then each token is transformed by a probabilistic finite-state transducer to produce the observed surface sequence. |
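Schematically (our notation, and gliding over segmentation details), this noisy-channel story factors as a bigram language model times a per-token channel term:

$$ P(s, w) \;=\; \prod_i P_{\mathrm{LM}}(w_i \mid w_{i-1})\; P_{\mathrm{chan}}(s_i \mid w_i) $$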
Related work | In contrast, our model uses a symbolic representation for sounds, but models variability in all segment types and incorporates a bigram word-level language model . |
Abstract | We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
Conclusion | Our new multi-prototype neural language model outperforms previous neural models and competitive baselines on this new dataset. |
Experiments | Table 3 shows our results compared to previous methods, including C&W’s language model and the hierarchical log-bilinear (HLBL) model (Mnih and Hinton, 2008), which is a probabilistic, linear neural model. |
Global Context-Aware Neural Language Model | Note that Collobert and Weston (2008)’s language model corresponds to the network using only local context. |
Introduction | We introduce a new neural-network-based language model that distinguishes and uses both local and global context via a joint training objective. |
Introduction | We show that our multi-prototype model improves upon the single-prototype version and outperforms other neural language models and baselines on this dataset. |
Related Work | Neural language models (Bengio et al., 2003; Mnih and Hinton, 2007; Collobert and Weston, 2008; Schwenk and Gauvain, 2002; Emami et al., 2003) have been shown to be very powerful at language modeling, a task where models are asked to accurately predict the next word given previously seen words.
Related Work | Schwenk and Gauvain (2002) tried to incorporate larger context by combining partial parses of past word sequences and a neural language model.
Related Work | They used up to 3 previous head words and showed increased performance on language modeling.
Abstract | Using the senses predicted for words in documents, we propose a novel approach that incorporates word senses into the language modeling approach to IR and also exploits the integration of synonym relations.
Incorporating Senses into Language Modeling Approaches | The next problem is to incorporate the sense information into the language modeling approach. |
Incorporating Senses into Language Modeling Approaches | Given a query q and a document d in text collection C, we want to re-estimate the language models by making use of the sense information assigned to them.
Incorporating Senses into Language Modeling Approaches | With this language model, the probability of a query term in a document is increased by the synonyms of its senses; the more of its synonym senses appear in a document, the higher the probability.
Introduction | We incorporate word senses into the language modeling (LM) approach to IR (Ponte and Croft, 1998), and utilize sense synonym relations to further improve the performance. |
The Language Modeling Approach to IR | 3.1 The language modeling approach |
The Language Modeling Approach to IR | In the language modeling approach to IR, language models are constructed for each query q and each document d in a text collection C. The documents in C are ranked by their distance to a given query q according to the language models.
The Language Modeling Approach to IR | The most commonly used language model in IR is the unigram model, in which terms are assumed to be independent of each other. |
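A minimal sketch of unigram query-likelihood ranking; Jelinek-Mercer smoothing is shown as one common choice, not necessarily this paper's estimator (query, doc and collection are token lists):

    import math

    def query_likelihood(query, doc, collection, lam=0.5):
        """Score log P(q|d) under a unigram model, mixing the document's
        relative frequency with the collection's (Jelinek-Mercer)."""
        score = 0.0
        for term in query:
            p_doc = doc.count(term) / max(len(doc), 1)
            p_col = collection.count(term) / max(len(collection), 1)
            score += math.log(lam * p_doc + (1 - lam) * p_col + 1e-12)
        return score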
Abstract | Long-span features, such as syntax, can improve language models for tasks such as speech recognition and machine translation. |
Abstract | However, these language models can be difficult to use in practice because of the time required to generate features for rescoring a large hypothesis set. |
Abstract | When using these improved tools in a language model for speech recognition, we obtain significant speed improvements with both N-best and hill climbing rescoring, and show that up-training leads to WER reduction.
Conclusion | The computational complexity of accurate syntactic processing can make structured language models impractical for applications such as ASR that require scoring hundreds of hypotheses per input. |
Incorporating Syntactic Structures | These are then passed to the language model along with the word sequence for scoring. |
Introduction | Language models (LMs) are crucial components in tasks that require the generation of coherent natural language text, such as automatic speech recognition (ASR) and machine translation (MT).
Related Work | The lattice parser, therefore, is itself a language model.
Syntactic Language Models | There have been several approaches to including syntactic information in both generative and discriminative language models.
Syntactic Language Models | Structured language modeling incorporates syntactic parse trees to identify the head words in a hypothesis for modeling dependencies beyond n-grams. |
Syntactic Language Models | Our Language Model.
Conclusion & Future Work | Future work includes extending this approach to use multiple translation models with multiple language models in ensemble decoding. |
Experiments & Results 4.1 Experimental Setup | For the mixture baselines, we used a standard one-pass phrase-based system (Koehn et al., 2003), Portage (Sadat et al., 2005), with the following 7 features: relative-frequency and lexical translation model (TM) probabilities in both directions; word-displacement distortion model; language model (LM) and word count. |
Experiments & Results 4.1 Experimental Setup | Fixing the language model allows us to compare various translation model combination techniques. |
Introduction | Common techniques for model adaptation adapt two main components of contemporary state-of-the-art SMT systems: the language model and the translation model. |
Introduction | However, language model adaptation is a more straightforward problem compared to translation model adaptation, because various measures such as perplexity of adapted language models can be easily computed on data in the target domain.
Related Work 5.1 Domain Adaptation | They use language model perplexities from IN to select relevant sentences from OUT.
Calculation of Cross-Entropy | The first term is the cross-entropy of X_i for L_i multiplied by |X_i|. Various methods for computing cross-entropy have been proposed, and these can be roughly classified into two types, based on universal coding and on language modeling.
Calculation of Cross-Entropy | For example, Benedetto et al. (2002) and Cilibrasi and Vitanyi (2005) used the universal coding approach, whereas Teahan and Harper (2001) and Sibun and Reynar (1996) were based on language modeling, using PPM and Kullback-Leibler divergence, respectively.
Calculation of Cross-Entropy | As a representative method for calculating the cross-entropy through statistical language modeling, we adopt prediction by partial matching (PPM), a language-based encoding method devised by Cleary and Witten (1984).
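Concretely, the cross-entropy estimated by such a model is the average per-character negative log-probability it assigns to the text (standard definition, our notation):

$$ H(X, L) \;=\; -\frac{1}{|X|} \sum_{i=1}^{|X|} \log_2 P_L(x_i \mid x_1, \ldots, x_{i-1}) $$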
In the experiments reported here, n is set to 5 throughout. | The term |X| H(X, L) gives the description length of the remaining characters under the language model for L.
Introduction | They used statistical language modeling and heuristics to detect foreign words and tested the case of English embedded in German texts. |
Problem Formulation | In our setting, we assume that a small amount (up to kilobytes) of monolingual plain text sample data is available for every language, e.g., the Universal Declaration of Human Rights, which serves to generate the language model used for language identification. |
Problem Formulation | The description length function calculates the description length of a text segment X_i through the use of a language model for L_i.
Problem Formulation | Here, the first term corresponds to the code length of the text chunk X_i given a language model for L_i, which in fact corresponds to the cross-entropy of X_i for L_i multiplied by |X_i|. The remaining terms give the code lengths of the parameters used to describe the length of the first term: the second term corresponds to the segment location; the third term, to the identified language; and the fourth term, to the language model of language L_i.
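A plausible rendering of this four-term decomposition (our notation, not necessarily the paper's exact formula):

$$ \mathrm{dl}(X_i, L_i) \;=\; |X_i|\, H(X_i, L_i) \;+\; \log |X| \;+\; \log |\mathcal{L}| \;+\; \mathrm{dl}(L_i) $$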
A Class-based Model of Agreement | However, in MT, we seek a measure of sentence quality q(e) that is comparable across different hypotheses on the beam (much like the n-gram language model score).
A Class-based Model of Agreement | We trained a simple add-1 smoothed bigram language model over gold class sequences in the same treebank training data: |
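The standard add-1 (Laplace) smoothed bigram estimate over a class vocabulary C has the form:

$$ P(c_i \mid c_{i-1}) \;=\; \frac{\mathrm{count}(c_{i-1}, c_i) + 1}{\mathrm{count}(c_{i-1}) + |C|} $$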
Experiments | Our distributed 4-gram language model was trained on 600 million words of Arabic text, also collected from many sources including the Web (Brants et al., 2007).
Inference during Translation Decoding | With a trigram language model , the state might be the last two words of the translation prefix. |
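A hedged sketch (ours) of the recombination this state definition enables: hypotheses that agree on their last two words share a trigram-LM state, so only the best-scoring one per state needs to stay on the beam.

    def recombine(hypotheses):
        """Keep the best-scoring hypothesis per trigram-LM state
        (the last two words of the translation prefix).
        Each hypothesis is a (prefix_words, score) pair."""
        best = {}
        for words, score in hypotheses:
            state = tuple(words[-2:])  # trigram LM state
            if state not in best or score > best[state][1]:
                best[state] = (words, score)
        return list(best.values())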
Introduction | Intuition might suggest that the standard n-gram language model (LM) is sufficient to handle agreement phenomena.
Related Work | Monz (2011) recently investigated parameter estimation for POS-based language models , but his classes did not include inflectional features. |
Related Work | One exception was the quadratic-time dependency language model presented by Galley and Manning (2009). |
Abstract | In this work we present two extensions to the well-known dynamic programming beam search in phrase-based statistical machine translation (SMT), aiming at increased efficiency of decoding by minimizing the number of language model computations and hypothesis expansions. |
Abstract | Our results show that language model based pre-sorting yields a small improvement in translation quality and a speedup by a factor of 2. |
Experimental Evaluation | The English language model is a 4-gram LM created with the SRILM toolkit (Stolcke, 2002) on all bilingual and parts of the provided monolingual data. |
Introduction | Research efforts to increase search efficiency for phrase-based MT (Koehn et al., 2003) have explored several directions, ranging from generalizing the stack decoding algorithm (Ortiz et al., 2006) to additional early pruning techniques (Delaney et al., 2006), (Moore and Quirk, 2007) and more efficient language model (LM) querying (Heafield, 2011). |
Introduction | with Language Model LookAhead
Search Algorithm Extensions | 2.2 Language Model LookAhead |
Experiments | 3-gram (news-commentary) and 5-gram (Europarl) language models are trained on the data described in Table 1, using the SRILM toolkit (Stolcke, 2002) and binarized for efficient querying using kenlm (Heafield, 2011).
Experiments | For the 5-gram language models, we replaced every word in the LM training data that did not appear in the English part of the parallel training data with <unk>, to build an open-vocabulary language model.
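A minimal sketch of that replacement (file and variable names are illustrative):

    def map_oov_to_unk(lm_tokens, parallel_vocab):
        """Replace tokens unseen on the parallel data's English side with
        <unk>, so the LM reserves mass for out-of-vocabulary words."""
        return [tok if tok in parallel_vocab else "<unk>" for tok in lm_tokens]

    # vocab = set of words from the English side of the parallel training data
    # mapped = [map_oov_to_unk(line.split(), vocab) for line in open("lm_train.txt")]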
Experiments | Absolute improvements would be possible, e.g., by using larger language models or by adding news data to the ep training set when evaluating on crawl test sets (see, e.g., Dyer et al.
Introduction | The standard SMT training pipeline combines scores from large count-based translation models and language models with a few other features and tunes these using the well-understood line-search technique for error minimization of Och (2003). |
Introduction | The modeler’s goals might be to identify complex properties of translations, or to counter errors of pre-trained translation models and language models by explicitly down-weighting translations that exhibit certain undesired properties. |
Experiments and Results | Unigram NLLR and Filtered NLLR are the language model implementations of previous work as described in Section 3.1. |
Previous Work | They learned unigram language models (LMs) for specific time periods and scored articles with log-likelihood ratio scores. |
Timestamp Classifiers | 3.1 Language Models |
Timestamp Classifiers | We apply Dirichlet-smoothing to the language models (as in de Jong et al.
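For reference, Dirichlet smoothing interpolates the document estimate with the collection model in proportion to document length (standard form, our notation):

$$ P(w \mid d) \;=\; \frac{c(w, d) + \mu\, P(w \mid C)}{|d| + \mu} $$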
Timestamp Classifiers | The above language modeling and MaxEnt approaches are token-based classifiers that one could apply to any topic classification domain. |
Abstract | On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model.
Introduction | Combining Language Models and |
Training Algorithm and Implementation | As described in Section 4, the overall procedure is divided into two alternating steps: after initialization we first perform EM training of the translation model for 20-30 iterations using a 2-gram or 3-gram language model in the target language.
Training Algorithm and Implementation | The generative story described in Section 3 is implemented as a cascade of permutation, insertion, lexicon, deletion and language model finite-state transducers using OpenFST (Allauzen et al., 2007).
Translation Model | Stochastically generate the target sentence according to an n-gram language model . |
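Schematically (our notation, not the paper's), the cascade composes one machine per step of this generative story:

$$ T \;=\; \Pi \circ I \circ L \circ D \circ G $$

where \Pi is the permutation transducer, I the insertion transducer, L the lexicon, D the deletion transducer, and G an acceptor encoding the n-gram language model.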
Evaluation | Therefore, we implemented a ranking mechanism that used a hybrid scoring method, giving equal weight to the language model and the normalized phonetic similarity.
System Description | To check the likelihood and well-formedness of the new string after the replacement, we learn a 3-gram language model with absolute smoothing.
System Description | For learning the language model, we only consider the words in the CMU pronunciation dictionary that also exist in WordNet.
System Description | We remove the words containing at least one trigram that is very unlikely according to the language model.
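A hedged sketch of this filter (the threshold value and the LM interface are illustrative assumptions):

    def plausible(tokens, lm, threshold=-8.0):
        """Keep a candidate only if none of its trigrams is extremely
        unlikely under the smoothed 3-gram model."""
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(2, len(padded)):
            logp = lm.logprob(padded[i], padded[i - 2:i])  # assumed interface
            if logp < threshold:
                return False
        return True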
Experiments and Results | The features we used are those commonly used in a standard BTG decoder, such as translation probabilities, lexical weights, language model, word penalty and distortion probabilities.
Experiments and Results | The language model is a 5-gram language model trained on the target sentences in the training data.
Experiments and Results | The language model is a 5-gram language model trained on the Giga-Word corpus plus the English sentences in the training data.
Features and Training | We also use other fundamental features, such as translation probabilities, lexical weights, distortion probability, word penalty, and language model probability. |
Experiments | The monolingual data for training the English language model includes the Xinhua portion of the GIGAWORD corpus, which contains 238M English words.
Experiments | A 4-gram language model was trained on the monolingual data by the SRILM toolkit (Stolcke, 2002).
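For illustration, a 4-gram model of this kind can be built with SRILM's ngram-count; the smoothing flags and file names below are common choices, not stated in the paper:

    import subprocess

    subprocess.run([
        "ngram-count", "-order", "4",
        "-text", "mono.tok.txt",       # tokenized monolingual text (illustrative name)
        "-lm", "lm.4gram.arpa",        # output ARPA-format model (illustrative name)
        "-kndiscount", "-interpolate"  # modified Kneser-Ney smoothing (assumption)
    ], check=True)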
Related Work | Researchers have also introduced topic models for cross-lingual language model adaptation (Tam et al., 2007; Ruiz and Federico, 2011).
Related Work | Based on the bilingual topic model, they apply the source-side topic weights to the target-side topic model, and adapt the n-gram language model of the target side.
Experiments | We used SRILM for the training of language models (5-gram in all the experiments).
Experiments | We trained a Chinese language model for the EC translation on the Chinese part of the bi-text. |
Experiments | For the English language model of CE translation, an extra corpus named Tanaka was used in addition to the English part of the bilingual corpora.
Experimental Design | Decoding is performed approximately via cube pruning (Chiang, 2007), by integrating a trigram language model extracted from the training set (see Konstas and Lapata (2012) for details).
Experimental Design | Lexical Features These features encourage grammatical coherence and inform lexical selection over and above the limited horizon of the language model captured by Rules (6)-(9).
Problem Formulation | In machine translation, a decoder that implements forest rescoring (Huang and Chiang, 2007) uses the language model as an external criterion of the goodness of sub-translations with respect to their grammaticality.
Inferring a learning curve from mostly monolingual data | In this section we address scenario S1: we have access to a source-language monolingual collection (from which portions to be manually translated could be sampled) and a target-language in-domain monolingual corpus, to supplement the target side of a parallel corpus while training a language model.
Inferring a learning curve from mostly monolingual data | (b) perplexity of language models of order 2 to 5 derived from the monolingual source corpus computed on the source side of the test corpus. |
Inferring a learning curve from mostly monolingual data | The Lasso regression model selected four features from the entire feature set: i) size of the test set (sentences & tokens); ii) perplexity of the language model (order 5) on the test set; iii) type-token ratio of the target monolingual corpus.
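A hedged sketch of fitting such a predictor with scikit-learn's Lasso; the feature values below are made-up placeholders and the target metric is an assumption, shown only to make the L1-based feature selection concrete:

    import numpy as np
    from sklearn.linear_model import Lasso

    # One row per configuration; columns such as test-set size in sentences
    # and tokens, order-5 LM perplexity on the test set, and the target
    # corpus type-token ratio (all values here are illustrative).
    X = np.array([[2000, 48000, 310.5, 0.071],
                  [1500, 36500, 290.2, 0.068],
                  [2500, 61000, 355.0, 0.075]])
    y = np.array([21.4, 22.9, 20.1])  # e.g., BLEU of the trained system (assumed)

    model = Lasso(alpha=0.1).fit(X, y)  # L1 penalty drives some weights to zero
    print(model.coef_)                  # non-zero weights mark the selected features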