Index of papers in Proc. ACL 2010 that mention
  • language model
Bicknell, Klinton and Levy, Roger
Explaining between-word regressions
This simple example just illustrates the point that if a reader is combining noisy visual information with a language model, then confidence in previous regions will sometimes fall.
Models of eye movements in reading
Unfortunately, however, the Mr. Chips model simplifies the problem of reading in a number of ways: First, it uses a unigram model as its language model, and thus fails to use any information in the linguistic context to help with word identification.
Models of eye movements in reading
Specifically, our model identifies the words in a sentence by performing Bayesian inference combining noisy input from a realistic visual model with a language model that takes context into account.
Reading as Bayesian inference
Specifically, the model begins reading with a prior distribution over possible identities of a sentence given by its language model.
Reading as Bayesian inference
The model’s prior distribution over the identity of the sentence given the language model is updated to a posterior distribution taking into account both the language model and the visual input obtained thus far.
Reading as Bayesian inference
Given the visual input and a language model, inferences about the identity of the sentence w can be made by standard Bayesian inference, where the prior is given by the language model and the likelihood is a function of the total visual input obtained from the first to the ith timestep, I_i.
Simulation 1
5.1.2 Language model
Simulation 1
Our reader’s language model was an unsmoothed bigram model created using a vocabulary set con-
Simulation 1
Specifically, we constructed the model’s initial belief state (i.e., the distribution over sentences given by its language model) by directly translating the bigram model into a wFSA in the log semiring.
Simulation 2
6.1.3 Language model
language model is mentioned in 15 sentences in this paper.
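A minimal sketch of the inference these snippets describe (the notation here is illustrative rather than the paper's own): the reader's posterior over sentence identities combines the language-model prior with the likelihood of the visual input accumulated through timestep i,

    P(\mathbf{w} \mid I_{1:i}) \;\propto\; P_{\mathrm{LM}}(\mathbf{w}) \, P(I_{1:i} \mid \mathbf{w})

Confidence in an earlier region can then fall whenever new visual input shifts this posterior toward competing identities of words already read, which is how the first snippet motivates between-word regressions.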
Corlett, Eric and Penn, Gerald
Experiment
In the event that a trigram or bigram would be found in the plaintext that was not counted in the language model, add-one smoothing was used.
Experiment
Our character-level language model used was developed from the first 1.5 million characters of the Wall Street Journal section of the Penn Tree-bank corpus.
Introduction
If the text from which a language model is trained is of a different genre than the plaintext of a cipher, the unigraph letter frequencies may differ substantially from those of the language model, and so frequency counting will be misleading.
Introduction
Such inefficiency indicates that integer programming may simply be the wrong tool for the job, possibly because language model probabilities computed from empirical data are not smoothly distributed enough over the space in which a cutting-plane method would attempt to compute a linear relaxation of this problem.
Introduction
This difference in difficulty, while real, is not inherent, but rather an artefact of the character-level n-gram language models that they (and we) use, in which preponderant evidence of differences in short character sequences is necessary for the model to clearly favour one letter-substitution mapping over another.
Terminology
Every possible full solution to a cipher C will produce a plaintext string with some associated language model probability, and we will consider the best possible solution to be the one that gives the highest probability.
Terminology
For the sake of concreteness, we will assume here that the language model is a character-level trigram model.
The Algorithm
Backpointers are necessary to reference one of the two language model probabilities.
The Algorithm
Cells that would produce inconsistencies are left at zero, and these as well as cells that the language model assigns zero to can only produce zero entries in later columns.
The Algorithm
The n_p × n_p cells of every column i do not depend on each other, only on the cells of the previous two columns i − 1 and i − 2, as well as the language model.
language model is mentioned in 17 sentences in this paper.
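A minimal Python sketch of the kind of scoring these snippets rely on: the log-probability of a candidate plaintext under a character-level trigram model with add-one smoothing. The function names, count structures, and training text are illustrative assumptions, not the authors' code.

    import math
    from collections import Counter

    def train_char_trigram(text):
        """Collect character trigram and bigram counts plus the character vocabulary."""
        tri = Counter(text[i:i + 3] for i in range(len(text) - 2))
        bi = Counter(text[i:i + 2] for i in range(len(text) - 1))
        return tri, bi, set(text)

    def logprob(plaintext, tri, bi, vocab):
        """Add-one smoothed trigram log-probability of a candidate plaintext."""
        v = len(vocab)
        lp = 0.0
        for i in range(2, len(plaintext)):
            trigram = plaintext[i - 2:i + 1]
            bigram = plaintext[i - 2:i]
            lp += math.log((tri[trigram] + 1) / (bi[bigram] + v))
        return lp

In the sense of the Terminology snippets, the best full solution to a cipher is then the key whose decoded plaintext maximizes this score.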
Duan, Xiangyu and Zhang, Min and Li, Haizhou
Conclusion
Removing the power of the higher-order language model and longer maximum phrase length, which are inherent in pseudo-words, shows that pseudo-words still improve translation performance significantly over unary words.
Experiments and Results
The pipeline uses GIZA++ model 4 (Brown et al., 1993; Och and Ney, 2003) for pseudo-word alignment, uses Moses (Koehn et al., 2007) as the phrase-based decoder, and uses the SRI Language Modeling Toolkit to train the language model with modified Kneser-Ney smoothing (Kneser and Ney, 1995; Chen and Goodman, 1998).
Experiments and Results
A 5-gram language model is trained on the English side of the parallel corpus.
Experiments and Results
The Xinhua portion of the English Gigaword3 corpus is used together with the English side of the large corpus to train a 4-gram language model.
Introduction
Further experiments removing the power of the higher-order language model and longer maximum phrase length, which are inherent in pseudo-words, show that pseudo-words still improve translation performance significantly over unary words.
language model is mentioned in 11 sentences in this paper.
Durrani, Nadir and Sajjad, Hassan and Fraser, Alexander and Schmid, Helmut
Our Approach
3.1.1 Language Model
Our Approach
The language model (LM)
Our Approach
The parameters of the language model are learned from a monolingual Urdu corpus.
language model is mentioned in 18 sentences in this paper.
Pitler, Emily and Louis, Annie and Nenkova, Ani
Indicators of linguistic quality
3.1 Word choice: language models
Indicators of linguistic quality
Language models (LM) are a way of computing how familiar a text is to readers using the distribution of words from a large background corpus.
Indicators of linguistic quality
We built unigram, bigram, and trigram language models with Good-Turing smoothing over the New York Times (NYT) section of the English Gigaword corpus (over 900 million words).
Results and discussion
Coh-Metrix, which has been proposed as a comprehensive characterization of text, does not perform as well as the language model and the entity coherence classes, which contain considerably fewer features related to only one aspect of text.
Results and discussion
It is apparent from the results that continuity, entity coherence, sentence fluency and language models are the most powerful classes of features that should be used in automation of evaluation and against which novel predictors of text quality should be compared.
Results and discussion
For example, the language model features, which are the second best class for the system-level, do not fare as well at the input-level.
language model is mentioned in 12 sentences in this paper.
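A small sketch of how language models are typically turned into the word-choice features described above: score a text by its average per-word log-probability under a background n-gram model. The bigram table, fallback probability, and function name below are hypothetical; the paper itself used Good-Turing smoothed unigram, bigram, and trigram models estimated on the NYT portion of Gigaword.

    import math

    def avg_logprob(tokens, bigram_prob, unk=1e-7):
        """Average per-word log-probability under a background bigram model.

        bigram_prob: dict mapping (previous_word, word) -> probability (assumed precomputed).
        unk: small fallback probability for unseen bigrams (an assumption of this sketch).
        """
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            total += math.log(bigram_prob.get((prev, word), unk))
        return total / max(len(tokens) - 1, 1)

Higher (less negative) values indicate word sequences that are more familiar relative to the background corpus, which is the notion of familiarity the Indicators snippet appeals to.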
Aker, Ahmet and Gaizauskas, Robert
Abstract
Our results show that summaries biased by dependency pattern models lead to significantly higher ROUGE scores than both n-gram language models reported in previous work and also Wikipedia baseline summaries.
Introduction
They also experimented with representing such conceptual models using n-gram language models derived from corpora consisting of collections of descriptions of instances of specific object types (e.g.
Introduction
a corpus of descriptions of churches, a corpus of bridge descriptions, and so on) and reported results showing that incorporating such n-gram language models as a feature in a feature-based extractive summarizer improves the quality of automatically generated summaries.
Introduction
The main weakness of n-gram language models is that they only capture very local information about short term sequences and cannot model long distance dependencies between terms.
Representing conceptual models 2.1 Object type corpora
2.2 N-gram language models
Representing conceptual models 2.1 Object type corpora
Aker and Gaizauskas (2009) experimented with uni-gram and bi-gram language models to capture the features commonly used when describing an object type and used these to bias the sentence selection of the summarizer towards the sentences that contain these features.
Representing conceptual models 2.1 Object type corpora
As in Song and Croft (1999) they used their language models in a gener-
language model is mentioned in 16 sentences in this paper.
Mi, Haitao and Liu, Qun
Abstract
We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality.
Decoding
where the first two terms are the translation and language model probabilities, e(o) is the target string (English sentence) for derivation o, the third and fourth terms are the dependency language model probabilities on the target side computed with words and POS tags separately, D_e(o) is the target dependency tree of o, the fifth term is the parsing probability of the source-side tree T_c(o) ∈ F_c, ill(o) is the penalty for the number of ill-formed dependency structures in o, and the last two terms are the derivation and translation length penalties, respectively.
Decoding
For each node, we use the cube pruning technique (Chiang, 2007; Huang and Chiang, 2007) to produce partial hypotheses and compute all the feature scores including the dependency language model score (Section 4.1).
Decoding
4.1 Dependency Language Model Computing
Experiments
We also store the POS tag information for each word in dependency trees, and compute two different dependency language models for words and POS tags in dependency tree separately.
Experiments
We use SRI Language Modeling Toolkit (Stolcke, 2002) to train a 4-gram language model with Kneser-Ney smoothing on the first 1/3 of the Xinhua portion of Giga-word corpus.
Experiments
This suggests that using dependency language model really improves the translation quality by less than 1 BLEU point.
language model is mentioned in 15 sentences in this paper.
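As background, the Decoding snippet above enumerates the components of what is, in form, the standard weighted-product (log-linear) combination used in such decoders; schematically, with feature functions φ_k and weights λ_k ranging over exactly the features listed there,

    \mathrm{score}(o) \;=\; \prod_{k} \phi_k(o)^{\lambda_k}, \qquad o^{*} \;=\; \arg\max_{o} \, \mathrm{score}(o)

so the dependency language model enters simply as two more weighted factors, one over words and one over POS tags.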
May, Jonathan and Knight, Kevin and Vogler, Heiko
Decoding Experiments
We add an English syntax language model £ to the cascade of transducers just described to better simulate an actual machine translation decoding task.
Decoding Experiments
The language model is cast as an identity WTT and thus fits naturally into the experimental framework.
Decoding Experiments
In our experiments we try several different language models to demonstrate varying performance of the application algorithms.
language model is mentioned in 11 sentences in this paper.
Beaufort, Richard and Roekhaut, Sophie and Cougnon, Louise-Amélie and Fairon, Cédrick
Conclusion and perspectives
It would also be interesting to test the impact of another lexical language model, learned on non-SMS sentences.
Evaluation
The language model used in the evaluation is a 3-gram.
Evaluation
(2008a), who showed on a French corpus comparable to ours that, while using a larger language model is always rewarded, the improvement quickly decreases with every higher order and is already quite small between 2-gram and 3-gram.
Overview of the system
In our system, all lexicons, language models and sets of rules are compiled into finite-state machines (FSMs) and combined with the input text by composition (0).
Overview of the system
Third, a combination of the lattice of solutions with a language model, and the choice of the best sequence of lexical units.
Related work
A language model is then applied on the word lattice, and the most probable word sequence is finally chosen by applying a best-path algorithm on the lattice.
The normalization models
All tokens Tj of S are concatenated together and composed with the lexical language model LM.
The normalization models
4.6 The language model
The normalization models
Our language model is an n-gram of lexical forms, smoothed by linear interpolation (Chen and Goodman, 1998), estimated on the normalized part of our training corpus and compiled into a weighted FST LMw.
language model is mentioned in 10 sentences in this paper.
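For reference, the linear-interpolation smoothing mentioned in the last snippet (Chen and Goodman, 1998) has the general recursive form, for a context h and its shortened context h',

    P_{\mathrm{interp}}(w \mid h) \;=\; \lambda_h \, P_{\mathrm{ML}}(w \mid h) \;+\; (1 - \lambda_h) \, P_{\mathrm{interp}}(w \mid h')

with the weights λ_h estimated on held-out data; the smoothed n-gram model is then what gets compiled into the weighted FST LMw used in the compositions described above.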
Wu, Stephen and Bachrach, Asaf and Cardenas, Carlos and Schuler, William
Discussion
This is particularly true when the sentence structure is defined in a language model that is psycholinguistically plausible (here, bounded-memory right-corner form).
Discussion
This accords with an understated result of Boston et al.’s eye-tracking study (2008a): a richer language model predicts eye movements during reading better than an oversimplified one.
Discussion
Frank (2009) similarly reports improvements in the reading-time predictiveness of unlexicalized surprisal when using a language model that is more plausible than PCFGs.
Introduction
Ideally, a psychologically-plausible language model would produce a surprisal that would correlate better with linguistic complexity.
Introduction
Therefore, the specification of how to encode a syntactic language model is of utmost importance to the quality of the metric.
Introduction
The purpose of this paper is to determine whether the language model defined by the HHMM parser can also predict reading times; it would be strange if a psychologically plausible model did not also produce viable complexity metrics.
Parsing Model
Both of these metrics fall out naturally from the time-series representation of the language model.
Parsing Model
With the understanding of what operations need to occur, a formal definition of the language model is in order.
language model is mentioned in 10 sentences in this paper.
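As background for these snippets, the complexity metric at issue is surprisal, which any incremental language model defines for the word at position t given the words before it:

    \mathrm{surprisal}(w_t) \;=\; -\log P(w_t \mid w_1, \ldots, w_{t-1})

The claim in the Discussion snippets is that a psycholinguistically plausible model of this conditional distribution, here the bounded-memory right-corner HHMM, yields surprisal values that predict reading times better than an oversimplified one.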
Mitchell, Jeff and Lapata, Mirella and Demberg, Vera and Keller, Frank
Abstract
In this paper we analyze reading times in terms of a single predictive measure which integrates a model of semantic composition with an incremental parser and a language model .
Integrating Semantic Constraint into Surprisal
While surprisal is a theoretically well-motivated measure, formalizing the idea of linguistic processing being highly predictive in terms of probabilistic language models, the measurement of semantic constraint in terms of vector similarities lacks a clear motivation.
Integrating Semantic Constraint into Surprisal
This can be achieved by turning a vector model of semantic similarity into a probabilistic language model .
Integrating Semantic Constraint into Surprisal
There are in fact a number of approaches to deriving language models from distributional models of semantics (e.g., Bellegarda 2000; Coccaro and Jurafsky 1998; Gildea and Hofmann 1999).
Models of Processing Difficulty
The basic idea is that the processing costs relating to the expectations of the language processor can be expressed in terms of the probabilities assigned by some form of language model to the input.
Models of Processing Difficulty
Surprisal could be also defined using a vanilla language model that does not take any structural or grammatical information into account (Frank 2009).
language model is mentioned in 9 sentences in this paper.
Turian, Joseph and Ratinov, Lev-Arie and Bengio, Yoshua
Clustering-based word representations
So it is a class-based bigram language model .
Clustering-based word representations
Deschacht and Moens (2009) use a latent-variable language model to improve semantic role labeling.
Distributed representations
Word embeddings are typically induced using neural language models , which use neural networks as the underlying predictive model (Bengio, 2008).
Distributed representations
Historically, training and testing of neural language models has been slow, scaling as the size of the vocabulary for each model computation (Bengio et al., 2001; Bengio et al., 2003).
Distributed representations
Collobert and Weston (2008) presented a neural language model that could be trained over billions of words, because the gradient of the loss was computed stochastically over a small sample of possible outputs, in a spirit similar to Bengio and Sénecal (2003).
Introduction
Neural language models (Bengio et al., 2001; Schwenk & Gauvain, 2002; Mnih & Hinton, 2007; Collobert & Weston, 2008), on the other hand, induce dense real-valued low-dimensional word embeddings.
Introduction
(See Bengio (2008) for a more complete list of references on neural language models.)
Unlabeled Data
These auxiliary tasks are sometimes specific to the supervised task, and sometimes general language modeling tasks like “predict the missing word”.
language model is mentioned in 8 sentences in this paper.
Huang, Fei and Yates, Alexander
Abstract
We leverage recently-developed techniques for learning representations of text using latent-variable language models , and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling.
Introduction
Using latent-variable language models , we learn representations of texts that provide novel kinds of features to our supervised learning algorithms.
Introduction
The next section provides background information on learning representations for NLP tasks using latent-variable language models .
Introduction
2 Open-Domain Representations Using Latent-Variable Language Models
language model is mentioned in 7 sentences in this paper.
Yeniterzi, Reyyan and Oflazer, Kemal
Conclusions
The problem is essentially one of generating multiple candidate sentences with the unattached function words ambiguously positioned (say in a lattice) and then using a second language model to rerank these sentences and select the target sentence.
Experimental Setup and Results
Furthermore, in factored models, we can employ different language models for different factors.
Experimental Setup and Results
We believe that the use of multiple language models (some much less sparse than the surface LM) in the factored baseline is the main reason for the improvement.
Experimental Setup and Results
3.2.3 Experiments with higher-order language models
Introduction
The main reason given for these problems was that the same statistical translation, reordering and language modeling mechanisms were being employed to both determine the morphological structure of the words and, at the same time, get the global order of the words correct.
language model is mentioned in 7 sentences in this paper.
Sun, Xu and Gao, Jianfeng and Micol, Daniel and Quirk, Chris
Related Work
First, a pre-defined confusion set is used to generate candidate corrections; then a scoring model, such as a trigram language model or naïve Bayes classifier, is used to rank the candidates according to their context (e.g., Golding and Roth, 1996; Mangu and Brill, 1997; Church et al., 2007).
Related Work
(2009) present a query speller system in which both the error model and the language model are trained using Web data.
Related Work
Typically, a language model (source model) is used to capture contextual information, while an error model (channel model) is considered to be context free in that it does not take into account any contextual information in modeling word transformation probabilities.
The Baseline Speller System
where the error model P(Q|C) models the transformation probability from C to Q, and the language model P(C) models how likely C is a correctly spelled query.
The Baseline Speller System
The language model (the second factor) is a backoff bigram model trained on the tokenized form of one year of query logs, using maximum likelihood estimation with absolute discounting smoothing.
The Baseline Speller System
Since we define the logarithm of the probabilities of the language model and the error model (i.e., the edit distance function) as features, the ranker can be viewed as a more general framework, subsuming the source channel model as a special case.
language model is mentioned in 6 sentences in this paper.
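Written out, the source-channel model the Baseline Speller System snippets describe selects the correction C* for a query Q as

    C^{*} \;=\; \arg\max_{C} P(C \mid Q) \;=\; \arg\max_{C} P(Q \mid C) \, P(C)

with P(C) given by the backoff bigram language model trained on query logs and P(Q|C) by the error model; as the last snippet notes, treating the two log-probabilities as ranker features subsumes this source-channel model as a special case.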
Wuebker, Joern and Mauser, Arne and Ney, Hermann
Alignment
A language model is not used in this case, as the system is constrained to the given target sentence and thus the language model score has no effect on the alignment.
Alignment
To deal with this problem, instead of simple phrase length restriction, we propose to apply the leaving-one-out method, which is also used for language modeling techniques (Kneser and Ney, 1995).
Experimental Evaluation
The baseline system is a standard phrase-based SMT system with eight features: phrase translation and word lexicon probabilities in both translation directions, phrase penalty, word penalty, language model score and a simple distance-based reordering model.
Experimental Evaluation
We used a 4-gram language model with modified Kneser-Ney discounting for all experiments.
Introduction
The phrase model is combined with a language model, word lexicon models, word and phrase penalties, and many others.
Related Work
They report improvements over a phrase-based model that uses an inverse phrase model and a language model .
language model is mentioned in 6 sentences in this paper.
Xiao, Tong and Zhu, Jingbo and Zhu, Muhua and Wang, Huizhen
Background
Since all the member systems share the same data resources, such as language model and translation table, we only need to keep one copy of the required resources in memory.
Background
Another method to speed up the system is to accelerate the n-gram language model with n-gram caching techniques.
Background
If the required n-gram hits the cache, the corresponding n-gram probability is returned from the cached copy rather than by re-fetching the original data in the language model.
language model is mentioned in 6 sentences in this paper.
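A minimal Python sketch of the n-gram caching idea in the Background snippets; the class name, interface, and size bound are illustrative assumptions, not the authors' implementation.

    class CachedLM:
        """Wrap a language model with a simple n-gram probability cache."""

        def __init__(self, lm, max_size=1_000_000):
            self.lm = lm          # underlying model exposing a .prob(ngram) method (assumed interface)
            self.cache = {}
            self.max_size = max_size

        def prob(self, ngram):
            ngram = tuple(ngram)
            if ngram in self.cache:        # cache hit: return the stored probability
                return self.cache[ngram]
            p = self.lm.prob(ngram)        # cache miss: fetch from the original model data
            if len(self.cache) < self.max_size:
                self.cache[ngram] = p
            return p

Because decoding queries the same n-grams repeatedly, returning the cached copy instead of re-fetching from the original model is what yields the speed-up the snippets describe.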
Laskowski, Kornel
Conceptual Framework
In language modeling practice, one finds the likelihood P(w | Θ) of a word sequence w of a given length under a model Θ to be an inconvenient measure for comparison.
Discussion
This makes it suitable for comparison of conversational genres, in much the same way as are general language models of words.
Discussion
Accordingly, as for language models, density estimation in future turn-taking models may be im-
Introduction
The current work attempts to address this problem by proposing a simple framework, which, at least conceptually, borrows quite heavily from the standard language modeling paradigm.
language model is mentioned in 4 sentences in this paper.
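As general background to the Conceptual Framework snippet (this is standard language-modeling practice, not a formula taken from the paper): the likelihood of a sequence shrinks with its length, so language modeling normally compares models using a length-normalized quantity such as perplexity,

    \mathrm{PPL}(\mathbf{w}; \Theta) \;=\; P(\mathbf{w} \mid \Theta)^{-1/|\mathbf{w}|}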
Feng, Yansong and Lapata, Mirella
Abstractive Caption Generation
Specifically, we use an adaptive language model (Kneser et al., 1997) that modifies an
Abstractive Caption Generation
where P(w_i ∈ C | w_i ∈ D) is the probability of w_i appearing in the caption given that it appears in the document D, and P_adap(w_i | w_{i-1}, w_{i-2}) is the language model adapted with probabilities from our image annotation model:
Experimental Setup
The scaling parameter β for the adaptive language model was also tuned on the development set using the range [0.5, 0.9].
language model is mentioned in 3 sentences in this paper.
Snyder, Benjamin and Barzilay, Regina and Knight, Kevin
Evaluation Tasks and Results
To produce baseline cognate identification predictions, we calculate the probability of each latent Hebrew letter sequence predicted by the HMM, and compare it to a uniform character-level Ugaritic language model (as done by our model, to avoid automatically assigning higher cognate probability to shorter Ugaritic words).
Inference
We also calculate this probability using a uniform unigram character-level language model (it thus depends only on the number of characters in u_i).
Model
Otherwise, a lone word u_i is generated according to a uniform character-level language model.
language model is mentioned in 3 sentences in this paper.
Wang, Jia and Li, Qing and Chen, Yuanzhu Peter and Lin, Zhangxi
Conclusion and Future Work
By combining such information with traditional statistical language models , it is capable of suggesting relevant articles that meet the dynamic nature of a discussion in social media.
Experimental Evaluation
The second one, LM, is based on statistical language models for relevant information retrieval (Ponte and Croft, 1998).
Experimental Evaluation
It builds a probabilistic language model for each article, and ranks them by query likelihood.
language model is mentioned in 3 sentences in this paper.
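A brief Python sketch of query-likelihood ranking in the spirit of Ponte and Croft (1998), as referenced above; the Jelinek-Mercer smoothing, parameter value, and function name are illustrative assumptions rather than details from the paper.

    import math
    from collections import Counter

    def query_likelihood(query_tokens, doc_tokens, coll_prob, lam=0.8):
        """Log-likelihood of the query under a smoothed unigram model of one article.

        coll_prob: collection-level unigram probabilities (assumed precomputed).
        lam: interpolation weight between article and collection models (an assumption).
        """
        counts = Counter(doc_tokens)
        dlen = len(doc_tokens)
        score = 0.0
        for w in query_tokens:
            p_doc = counts[w] / dlen if dlen else 0.0
            p_coll = coll_prob.get(w, 1e-9)
            score += math.log(lam * p_doc + (1 - lam) * p_coll)
        return score

Articles are then ranked for a given discussion-derived query by this score, which is the query-likelihood ranking the Experimental Evaluation snippets mention.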
Chen, Boxing and Foster, George and Kuhn, Roland
Experiments
We trained two language models: the first one is a 4-gram LM which is estimated on the target side of the texts used in the large data condition.
Experiments
Both language models are used for both tasks.
Experiments
Only the target-language half of the parallel training data are used to train the language model in this task.
language model is mentioned in 3 sentences in this paper.
Wu, Xianchao and Matsuzaki, Takuya and Tsujii, Jun'ichi
Experiments
Here, the first item is the language model (LM) probability, where τ(d) is the target string of derivation d; the second item is the translation length penalty; and the third item is the translation score, which is decomposed into a product of feature values of rules:
Experiments
SRI Language Modeling Toolkit (Stolcke, 2002) was employed to train 5-gram English and Japanese LMs on the training set.
Related Work
By introducing supertags into the target language side, i.e., the target language model and the target side of the phrase table, significant improvement was achieved for Arabic-to-English translation.
language model is mentioned in 3 sentences in this paper.
Xiong, Deyi and Zhang, Min and Li, Haizhou
Features
To some extent, these two features have a similar function to a target language model or a POS-based target language model.
Related Work
(2009) study several confidence features based on mutual information between words, and on n-gram and backward n-gram language models, for word-level and sentence-level CE.
SMT System
We build a four-gram language model using the SRILM toolkit (Stolcke, 2002), which is trained
language model is mentioned in 3 sentences in this paper.