Abstract | This paper applies MST parsing to MT, and describes how it can be integrated into a phrase-based decoder to compute dependency language model scores. |
Abstract | Our results show that augmenting a state-of-the-art phrase-based system with this dependency language model leads to significant improvements in TER (0.92%) and BLEU (0.45%) scores on five NIST Chinese-English evaluation test sets. |
Dependency parsing for machine translation | While it seems that loopy graphs are undesirable when the goal is to obtain a syntactic analysis, that is not necessarily the case when one just needs a language modeling score. |
Introduction | Hierarchical approaches to machine translation have proven increasingly successful in recent years (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008), and often outperform phrase-based systems (Och and Ney, 2004; Koehn et al., 2003) on target-language fluency and adequacy; however, their benefits generally come with high computational costs, particularly when chart parsing, such as CKY, is integrated with language models of high orders (Wu, 1996). |
Introduction | Indeed, researchers have shown that gigantic language models are key to state-of-the-art performance (Brants et al., 2007), and the ability of phrase-based decoders to handle large-size, high-order language models with no consequence on asymptotic running time during decoding presents a compelling advantage over CKY decoders, whose time complexity grows prohibitively large with higher-order language models. |
Introduction | Most interestingly, the time complexity of non-projective dependency parsing remains quadratic as the order of the language model increases. |
Machine translation experiments | We use the standard features implemented almost exactly as in Moses: four translation features (phrase-based translation probabilities and lexically-weighted probabilities), word penalty, phrase penalty, linear distortion, and language model score. |
Machine translation experiments | In order to train a competitive baseline given our computational resources, we built a large 5-gram language model using the Xinhua and AFP sections of the Gigaword corpus (LDC2007T40) in addition to the target side of the parallel data. |
Machine translation experiments | The language model was smoothed with the modified Kneser-Ney algorithm as implemented in (Stolcke, 2002), and we only kept 4-grams and 5-grams that occurred at least three times in the training data. |
Experiments | First, we consider a bigram language model, and the algorithms try to find the reordering that maximizes the LM score. |
Experiments | Then, we consider a trigram-based language model, and the algorithms again try to maximize the LM score. |
Experiments | This means that, when using a bigram language model, it is often possible to reorder the words of a randomly permuted reference sentence in such a way that the LM score of the reordered sentence is larger than the LM score of the reference. |
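Experiments | To make the observation above concrete, here is a minimal sketch (toy, hand-specified bigram probabilities and hypothetical helper names; not the experimental setup used in the paper) of scoring every permutation of a short sentence under a bigram language model and keeping the best-scoring one:

```python
import itertools

# Toy bigram log-probabilities; in a real experiment these come from a trained LM.
BIGRAM_LOGPROB = {
    ("<s>", "the"): -0.5, ("the", "dog"): -1.0, ("dog", "barked"): -1.2,
    ("barked", "</s>"): -0.7, ("the", "barked"): -3.0, ("dog", "the"): -2.5,
}
UNSEEN = -6.0  # crude constant for bigrams not listed above

def bigram_score(words):
    """Sum of bigram log-probabilities with sentence-boundary symbols."""
    padded = ["<s>"] + list(words) + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN) for pair in zip(padded, padded[1:]))

def best_reordering(words):
    """Permutation of `words` with the highest bigram LM score."""
    return max(itertools.permutations(words), key=bigram_score)

reference = ["the", "dog", "barked"]
print(best_reordering(reference), bigram_score(reference))
```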
Introduction | Typical nonlocal features include one or more n-gram language models as well as a distortion feature, measuring by how much the order of biphrases in the candidate translation deviates from their order in the source sentence. |
Phrase-based Decoding as TSP | The language model cost of producing the target words of b’ right after the target words of b; with a bigram language model, this cost can be precomputed directly from b and b’. |
Phrase-based Decoding as TSP | Successful phrase-based systems typically employ language models of order higher than two. |
Phrase-based Decoding as TSP | If we want to extend the power of the model to general n-gram language models, and in particular to the 3-gram
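Phrase-based Decoding as TSP | A minimal sketch (hypothetical names, not the authors' implementation) of the precomputation mentioned above: with a bigram language model, the LM cost of emitting the target words of b' right after b depends only on the last target word of b, so it can be attached to the edge between the two biphrases before search starts:

```python
def edge_lm_cost(b_target, b_prime_target, bigram_logprob, unseen=-6.0):
    """Negative log-probability of the target words of b' emitted right after
    the target words of b, under a bigram model given as a dict of log-probs."""
    cost, prev = 0.0, b_target[-1]   # only the last word of b matters
    for word in b_prime_target:
        cost -= bigram_logprob.get((prev, word), unseen)
        prev = word
    return cost

def precompute_edges(biphrases, bigram_logprob):
    """TSP-style edge weights: one precomputed LM cost per ordered biphrase pair.
    biphrases is a list of target-word tuples."""
    return {(i, j): edge_lm_cost(bi, bj, bigram_logprob)
            for i, bi in enumerate(biphrases)
            for j, bj in enumerate(biphrases) if i != j}
```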
A Syntax Free Sequence-oriented Sentence Compression Method | As an alternative to syntactic parsing, we propose two novel features, intra-sentence positional term weighting (IPTW) and the patched language model (PLM) for our syntax-free sentence compressor. |
A Syntax Free Sequence-oriented Sentence Compression Method | 3.2.2 Patched Language Model |
A Syntax Free Sequence-oriented Sentence Compression Method | Many studies on sentence compression employ the n-gram language model to evaluate the linguistic likelihood of a compressed sentence. |
Abstract | As an alternative to syntactic parsing, we propose a novel term weighting technique based on the positional information within the original sentence and a novel language model that combines statistics from the original sentence and a general corpus. |
Conclusions | As an alternative to the syntactic parser, we proposed two novel features, intra-sentence positional term weighting (IPTW) and the patched language model (PLM), and showed their effectiveness by conducting automatic and human evaluations. |
Experimental Evaluation | We developed the n-gram language model from a 9-year set of Mainichi Newspaper articles. |
Introduction | To maintain the subject-predicate relationship in the compressed sentence and retain fluency without using syntactic parsers, we propose two novel features: intra-sentence positional term weighting (IPTW) and the patched language model (PLM). |
Introduction | PLM is a form of summarization-oriented fluency statistics derived from the original sentence and the general language model. |
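Introduction | The exact PLM formulation is not reproduced in this excerpt; the sketch below (hypothetical names, and a simple linear interpolation weight `lam` that is an assumption, not the authors' formula) only illustrates the general idea of patching a general-corpus bigram model with statistics drawn from the original sentence:

```python
from collections import Counter

def sentence_bigram_prob(prev, word, sentence_tokens):
    """Relative-frequency bigram estimate taken from the original sentence alone."""
    bigrams = Counter(zip(sentence_tokens, sentence_tokens[1:]))
    contexts = Counter(sentence_tokens[:-1])
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

def patched_prob(prev, word, sentence_tokens, corpus_bigram_prob, lam=0.5):
    """Hypothetical 'patched' estimate: mix the sentence-level statistics with a
    general-corpus bigram model (corpus_bigram_prob is a callable p(word | prev))."""
    return (lam * sentence_bigram_prob(prev, word, sentence_tokens)
            + (1.0 - lam) * corpus_bigram_prob(prev, word))
```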
Results and Discussion | Replacing PLM with the bigram language model (w/o PLM) degrades the performance significantly. |
Results and Discussion | This result shows that the n-gram language model is ill-suited to sentence compression because the n-gram probability is computed from a corpus that includes both short and long sentences. |
Abstract | Our model is a nested hierarchical Pitman-Yor language model, in which a Pitman-Yor spelling model is embedded in the word model. |
Abstract | Our model is also considered as a way to construct an accurate word n-gram language model directly from characters of arbitrary language, without any “word” indications. |
Introduction | Bayesian Kneser-Ney language model, with an accurate character ∞-gram Pitman-Yor spelling model embedded in word models. |
Introduction | Furthermore, it can be viewed as a method for building a high-performance n-gram language model directly from character strings of arbitrary language. |
Introduction | we briefly describe a language model based on the Pitman-Yor process (Teh, 2006b), which is a generalization of the Dirichlet process used in previous research. |
Nested Pitman-Yor Language Model | In contrast, in this paper we use a simple but more elaborate model, that is, a character n-gram language model that also employs HPYLM. |
Nested Pitman-Yor Language Model | Figure 2: Chinese restaurant representation of our Nested Pitman-Yor Language Model (NPYLM). |
Pitman-Yor process and n-gram models | To compute the probability p(w|s) in (1), we adopt a Bayesian language model recently proposed by Teh (2006b) and Goldwater et al. (2005), based on the Pitman-Yor process, a generalization of the Dirichlet process. |
Pitman-Yor process and n-gram models | As a result, the n-gram probability of this hierarchical Pitman-Yor language model (HPYLM) is recursively computed as
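Pitman-Yor process and n-gram models | The recursion itself is cut off in this excerpt; the standard HPYLM predictive probability (Teh, 2006b), presumably the equation (4) referred to in the next sentence, has the form below, where h is the context, h' the context shortened by one word, c(w|h) the customer count, t_hw the table count, and d, θ the discount and strength parameters:

```latex
p(w \mid h) \;=\; \frac{c(w \mid h) - d\, t_{hw}}{\theta + c(h)}
\;+\; \frac{\theta + d\, t_{h}}{\theta + c(h)}\; p(w \mid h'),
\qquad c(h) = \sum_{w} c(w \mid h), \quad t_{h} = \sum_{w} t_{hw}.
```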
Pitman-Yor process and n-gram models | When we set t_hw ≡ 1, (4) recovers Kneser-Ney smoothing: thus the HPYLM is a Bayesian Kneser-Ney language model as well as an extension of the hierarchical Dirichlet process (HDP) used in Goldwater et al. |
Experimental Setup | The language model is trained using a 9 GB English corpus. |
Statistical Paraphrase Generation | Our SPG model contains three sub-models: a paraphrase model, a language model, and a usability model, which control the adequacy, fluency, and usability of the paraphrases, respectively. |
Statistical Paraphrase Generation | Language Model: We use a trigram language model in this work. |
Statistical Paraphrase Generation | The language model based score for the paraphrase t is computed as: |
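Statistical Paraphrase Generation | The equation is cut off in this excerpt; for a trigram language model the score takes the usual form (writing t = t_1 … t_J for the target paraphrase, with boundary padding implied):

```latex
p_{lm}(t) \;=\; \prod_{j=1}^{J} p\bigl(t_j \mid t_{j-2},\, t_{j-1}\bigr)
```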
Experimental Results | We also used a 5-gram language model with modified Kneser-Ney smoothing (Chen and Goodman, 1998), trained on a data set consisting of 130M words from the English Gigaword corpus (LDC2007T07) and the English side of the parallel corpora. |
Experimental Results | We use GIZA++ (Och and Ney, 2000), a suffix-array (Lopez, 2007), SRILM (Stolcke, 2002), and risk-based deterministic annealing (Smith and Eisner, 2006) to obtain word alignments, translation models, language models, and the optimal weights for combining these models, respectively. |
Variational Approximate Decoding | Of course, this last point also means that our computation becomes intractable as n → ∞. However, if p(y | x) is defined by a hypergraph HG(x) whose structure explicitly incorporates an m-gram language model, both training and decoding will be efficient when m ≥ n. We will give algorithms for this case that are linear in the size of HG(x). |
Variational Approximate Decoding | A reviewer asks about the interaction with backed-off language models. |
Variational Approximate Decoding | We sketch a method that works for any language model given by a weighted FSA, L. The variational family Q can be specified by any deterministic weighted FSA, Q, with parameterized weights. |
Classification Results | The language model features were completely useless for distinguishing contingencies from |
Features for sense prediction of implicit discourse relations | For each sense, we created unigram and bigram language models over the implicit examples in the training set. |
Features for sense prediction of implicit discourse relations | We compute each example’s probability according to each of these language models. |
Features for sense prediction of implicit discourse relations | of the spans’ likelihoods according to the various language models. |
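Features for sense prediction of implicit discourse relations | A minimal sketch (hypothetical names; add-one smoothing is an assumption made here, not necessarily the authors' choice) of how such per-sense bigram language models can be turned into features, with one log-likelihood feature per sense:

```python
import math
from collections import Counter

def train_bigram_lm(token_lists):
    """Return a scoring function for an add-one-smoothed bigram LM trained on
    the spans of a single sense."""
    bigrams, unigrams, vocab = Counter(), Counter(), set()
    for tokens in token_lists:
        padded = ["<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        unigrams.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
    V = len(vocab)
    def logprob(tokens):
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
                   for a, b in zip(padded, padded[1:]))
    return logprob

def lm_features(span_tokens, sense_lms):
    """One feature per sense: the span's log-likelihood under that sense's LM."""
    return {"lm_" + sense: lm(span_tokens) for sense, lm in sense_lms.items()}
```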
Related Work | In the setting of language modeling approaches to query expansion, the local analysis idea has been instantiated by estimating additional query language models (Lafferty and Zhai, 2003; Tao and Zhai, 2006) or relevance models (Lavrenko and Croft, 2001) from a set of feedback documents. |
Related Work | (2005) also try to uncover multiple aspects of a query, and to that end they provide an iterative “pseudo-query” generation technique, using cluster-based language models. |
Related Work | Diaz and Metzler (2006) were the first to give a systematic account of query expansion using an external corpus in a language modeling setting, to improve the estimation of relevance models. |
Retrieval Framework | We work in the setting of generative language models. |
Retrieval Framework | Within the language modeling approach, one builds a language model from each document, and ranks documents based on the probability of the document model generating the query. |
Retrieval Framework | The particulars of the language modeling approach have been discussed extensively in the literature (see, e.g., Balog et al. |
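Retrieval Framework | A minimal sketch (hypothetical names; Jelinek-Mercer interpolation is used only for brevity) of the query-likelihood idea just described: estimate a smoothed unigram model from each document and rank documents by how probable their model makes the query:

```python
import math
from collections import Counter

def doc_language_model(doc_tokens, collection_counts, collection_len, lam=0.8):
    """Unigram document model interpolated with the collection model."""
    counts, dlen = Counter(doc_tokens), len(doc_tokens)
    def prob(word):
        p_doc = counts[word] / dlen if dlen else 0.0
        p_col = collection_counts[word] / collection_len
        return max(lam * p_doc + (1.0 - lam) * p_col, 1e-12)  # floor for unseen words
    return prob

def query_likelihood(query_tokens, doc_model):
    """Log-probability of the query under one document's language model."""
    return sum(math.log(doc_model(w)) for w in query_tokens)

def rank(query_tokens, docs, collection_counts, collection_len):
    """docs: {doc_id: token list}. Returns doc ids sorted best-first."""
    scores = {d: query_likelihood(query_tokens,
                                  doc_language_model(toks, collection_counts, collection_len))
              for d, toks in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)
```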
Related Work | Sparsity for low-order contexts has recently spurred interest in using latent variables to represent distributions over contexts in language models. |
Related Work | While n-gram models have traditionally dominated in language modeling, two recent efforts de-
Related Work | Several authors investigate neural network models that learn not just one latent state, but rather a vector of latent variables, to represent each word in a language model (Bengio et al., 2003; Emami et al., 2003; Morin and Bengio, 2005). |
Smoothing Natural Language Sequences | 2.3 Latent Variable Language Model Representation |
Smoothing Natural Language Sequences | Latent variable language models (LVLMs) can be used to produce just such a distributional representation. |
Computing Feature Expectations | The nodes are states in the decoding process that include the span (i, j) of the sentence to be translated, the grammar symbol s over that span, and the left and right context words of the translation relevant for computing n-gram language model scores. Each hyper-edge h represents the application of a synchronous rule r that combines nodes corresponding to non-terminals in r. |
Computing Feature Expectations | Decoder states can include additional information as well, such as local configurations for dependency language model scoring. |
Computing Feature Expectations | The weight of h is the incremental score contributed to all translations containing the rule application, including translation model features on r and language model features that depend on both r and the English contexts of the child nodes. |
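Computing Feature Expectations | A minimal sketch (hypothetical field names) of the node representation described above: each state records the source span, the grammar symbol, and the boundary target words needed for n-gram language model scoring, so states that agree on these fields can be merged:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DecoderState:
    """One node of the translation hypergraph."""
    span: Tuple[int, int]            # (i, j): source positions covered
    symbol: str                      # grammar non-terminal over that span
    left_context: Tuple[str, ...]    # leftmost n-1 target words of the partial translation
    right_context: Tuple[str, ...]   # rightmost n-1 target words of the partial translation
```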
Experimental Results | All four systems used two language models: one trained from the combined English sides of both parallel texts, and another, larger, language model trained on 2 billion words of English text (1 billion for Chinese-English SBMT). |
Experiment | For the relevance retrieval model, we faithfully reproduce the passage-based language model with pseudo-relevance feedback (Lee et al., 2008). |
Term Weighting and Sentiment Analysis | IR models such as the Vector Space model (VS), probabilistic models such as BM25, and Language Modeling (LM), albeit differing in approach and measure, employ heuristics and formal modeling to evaluate the relevance of a term to a document (Fang et al., 2004). |
Term Weighting and Sentiment Analysis | In our experiments, we use the Vector Space model with pivoted normalization (VS), the probabilistic model (BM25), and language modeling with Dirichlet smoothing (LM). |
Term Weighting and Sentiment Analysis | With proper assumptions and derivations, p(w | d) can be connected to language modeling approaches. |
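Term Weighting and Sentiment Analysis | For reference, the usual Dirichlet-smoothed estimate behind the LM variant above is shown below, with c(w, d) the count of w in document d, |d| the document length, p(w | C) the collection model, and μ the smoothing parameter:

```latex
p(w \mid d) \;=\; \frac{c(w, d) + \mu\, p(w \mid C)}{|d| + \mu}
```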
Collaborative Decoding | Similar to a language model score, n-gram consensus-based feature values cannot be summed up from smaller hypotheses. |
Discussion | They also empirically show that n-gram agreement is the most important factor for improvement apart from language models. |
Experiments | The language model used for all models (including the decoding models and system combination models described in Section 2.6) is a 5-gram model trained on the English part of the bilingual data and the Xinhua portion of the LDC English Gigaword corpus, version 3. |
Experiments | We parsed the language model training data with the Berkeley parser, and then trained a dependency language model based on the parsing output. |
Decoding | 1 to a log-linear model (Och and Ney, 2002) that uses the following eight features: relative frequencies in two directions, lexical weights in two directions, number of rules used, language model score, number of target words produced, and the probability of matched source tree (Mi et al., 2008). |
Decoding | We use the cube pruning method (Chiang, 2007) to approximately intersect the translation forest with the language model. |
Experiments | A trigram language model was trained on the English sentences of the training corpus. |
Related Work | In machine translation, the concept of packed forest is first used by Huang and Chiang (2007) to characterize the search space of decoding with language models . |
Introduction | And Knight and Hatzivassiloglou (1995) use a language model for selecting a fluent sentence among the vast number of surface realizations corresponding to a single semantic representation. |
Introduction | The top-ranked candidate is selected for presentation and verbalized using a language model interfaced with RealPro (Lavoie and Rambow, 1997), a text generation engine. |
The Story Generator | Since we do not know a priori which of these parameters will result in a grammatical sentence, we generate all possible combinations and select the most likely one according to a language model. |
The Story Generator | We used the SRI toolkit to train a trigram language model on the British National Corpus, with interpolated Kneser-Ney smoothing and perplexity as the scoring metric for the generated sentences. |
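The Story Generator | A minimal sketch (hypothetical names; the scorer stands in for an externally trained trigram model such as one built with the SRI toolkit, and choosing the highest log-probability plays the same role as choosing the lowest perplexity) of the selection step described above:

```python
import itertools

def best_realization(slot_options, lm_logprob):
    """Enumerate every combination of surface choices (one option per slot) and
    return the combination the language model scores highest.

    slot_options: list of lists of candidate strings, one list per slot.
    lm_logprob:   callable mapping a token list to a log-probability.
    """
    candidates = (list(combo) for combo in itertools.product(*slot_options))
    return max(candidates, key=lm_logprob)
```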
Previous Work | These approaches have focused on modeling statistical or syntactic phrasal relations within the language modeling framework for information retrieval. |
Previous Work | Srikanth and Srihari (2003) and Maisonnasse et al. (2005) examined the effectiveness of syntactic relations in a query using the language modeling framework. |
Previous Work | Several studies (Song and Croft, 1999; Miller et al., 1999; Gao et al., 2004; Metzler and Croft, 2005) investigated the effectiveness of the language modeling approach in modeling statistical phrases such as n-grams or proximity-based phrases. |
Proposed Method | We start out by presenting a simple phrase-based language modeling retrieval model that assumes uniform contribution of words and phrases. |
Treebank Translation and Dependency Transformation | In detail, word-based decoding is used, which adopts a log-linear framework as in (Och and Ney, 2002) with only two features, a translation model and a language model. |
Treebank Translation and Dependency Transformation | The latter is the language model, a word trigram model trained from the CTB. |
Treebank Translation and Dependency Transformation | Thus the decoding process is actually only determined by the language model. |
Experiments | For the language model, we used the SRI Language Modeling Toolkit (Stolcke, 2002) to train a 4-gram model on the Xinhua portion of the Gigaword corpus. |
Joint Decoding | There are also features independent of derivations, such as the language model and word penalty. |
Joint Decoding | Although left-to-right decoding might enable a more efficient use of language models and hopefully produce better translations, we adopt bottom-up decoding in this paper just for convenience. |
Experimental Setup | For the language model, we used a 5-gram model with modified Kneser-Ney smoothing (Kneser and Ney, 1995) trained on the English side of our training data as well as portions of the Gigaword v2 English corpus. |
Experimental Setup | For the language model, we used a 5-gram model trained on the English portion of the whole training data plus portions of the Gigaword v2 corpus. |
Hierarchical Phrase-based System | Given e and f as the source and target phrases associated with the rule, typical features used are the rule’s translation probability P_trans(f|e) and its inverse P_trans(e|f), and the lexical probability P_lex(f|e) and its inverse P_lex(e|f). Systems generally also employ a word penalty, a phrase penalty, and a target language model feature. |
Experiments | In the experiments, we train the translation model on the FBIS corpus (7.2M Chinese + 9.2M English words) and train a 4-gram language model on the Xinhua portion of the English Gigaword corpus (181M words) using the SRILM toolkit (Stolcke, 2002). |
NonContiguous Tree sequence Alignment-based Model | 2) The bi-lexical translation probabilities; 3) The target language model
The Pisces decoder | On the other hand, to simplify the language model computation, we only compute it for source-side contiguous translational hypotheses, neglecting any gaps on the target side. |
The Features | It is built into a unigram language model without smoothing for each term. |
The Features | This feature function measures the Kullback-Leibler divergence (KL divergence) between the language models associated with the two inputs. |
The Features | Similarly, the local context is built into a unigram language model without smoothing for each term; the feature function outputs KL divergence between the models. |
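The Features | A minimal sketch (hypothetical names) of the two feature functions described above: build an unsmoothed unigram model for each input and report the KL divergence between them; skipping terms that the second model assigns zero probability is an assumption made here to keep the unsmoothed case finite:

```python
import math
from collections import Counter

def unigram_lm(tokens):
    """Unsmoothed unigram distribution over the given tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def kl_divergence(p, q):
    """D(P || Q) between two unigram models; terms with q(w) == 0 are skipped."""
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items() if q.get(w, 0.0) > 0.0)

def kl_feature(tokens_a, tokens_b):
    """Feature value: KL divergence between the two inputs' unigram models."""
    return kl_divergence(unigram_lm(tokens_a), unigram_lm(tokens_b))
```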
Introduction | One is the n-gram model over different units, such as word-level bigram/trigram models (Bangalore and Rambow, 2000; Langkilde, 2000), or factored language models integrated with syntactic tags (White et al.
Introduction | (2009) present a dependency-spanning tree algorithm for word ordering, which first builds dependency trees to decide linear precedence between heads and modifiers then uses an n-gram language model to order siblings. |
Log-linear Models | We linearize the dependency relations by computing n-gram models, similar to traditional word-based language models, except using the names of dependency relations instead of words. |
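Log-linear Models | A minimal sketch (hypothetical names; bigrams only, with the dependents taken in surface order) of the linearization just described: each word is replaced by the name of its dependency relation, and n-gram counts are collected over those relation names exactly as for a word-based language model:

```python
from collections import Counter

def relation_bigrams(parse):
    """parse: list of (word, relation_name) pairs in linear (surface) order.
    Returns bigram counts over relation names rather than words."""
    relations = ["<s>"] + [rel for _, rel in parse] + ["</s>"]
    return Counter(zip(relations, relations[1:]))

def train_relation_lm(parsed_corpus):
    """Accumulate relation-name bigram counts over a corpus of parses."""
    counts = Counter()
    for parse in parsed_corpus:
        counts.update(relation_bigrams(parse))
    return counts

# Example: train_relation_lm([[("John", "nsubj"), ("eats", "root"), ("apples", "dobj")]])
```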