Decoding | The language model (LM) scoring is directly integrated into the cube pruning algorithm.
Decoding | Thus, LM estimates are available for all considered hypotheses. |
Decoding | Because the output trees can remain discontiguous after hypothesis creation, LM scoring has to be done individually over all output trees. |
Approach to Sentence-Level Dialect Identification | The aforementioned approach relies on language models (LMs) and MSA and EDA morphological analyzers to decide whether each word is (a) MSA, (b) EDA, (c) both (MSA and EDA), or (d) OOV.
Approach to Sentence-Level Dialect Identification | The following variants of the underlying token-level system are built to assess how the level of preprocessing applied to the underlying LM affects the performance of the overall sentence-level dialect identification process: (1) Surface, (2) Tokenized, (3) CODAfied, and (4) Tokenized-CODA.
Approach to Sentence-Level Dialect Identification | Orthography-Normalized (CODAfied) LM: since DA is not originally a written form of Arabic, no standard orthography exists for it.
Related Work | Amazon Mechanical Turk and try a language modeling (LM) approach to solve the problem.
Comparative Study | Figure 4: bilingual LM illustration. |
Comparative Study | 4.3 Bilingual LM |
Comparative Study | We build a 9-gram LM using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing.
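A minimal sketch of how such a model could be built, assuming SRILM is installed and on the PATH; the file names below are placeholders, not the paper's actual data:

```python
import subprocess

# Sketch: build a 9-gram LM with SRILM's ngram-count using modified
# Kneser-Ney discounting with interpolation, then report perplexity
# on held-out text. File names are hypothetical placeholders.
subprocess.run(
    ["ngram-count",
     "-order", "9",
     "-text", "train.tok",            # tokenized training text, one sentence per line
     "-kndiscount", "-interpolate",   # modified Kneser-Ney smoothing
     "-lm", "bilingual.9gram.arpa"],  # output LM in ARPA format
    check=True)

subprocess.run(
    ["ngram", "-order", "9",
     "-lm", "bilingual.9gram.arpa",
     "-ppl", "dev.tok"],              # perplexity on a development set
    check=True)
```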
Experiments | Our baseline is a phrase-based decoder, which includes the following models: an n-gram target-side language model (LM), a phrase translation model, and a word-based lexicon model.
Experiments | Lowercased training data from the GALE task (Table 1, UN corpus not included); alignment trained with GIZA++; tuning corpus: NIST06; test corpora: NIST02, 03, 04, 05, and 08; 5-gram LM (1,694,412,027 running words) trained with the SRILM toolkit (Stolcke, 2002) using modified Kneser-Ney smoothing; LM training data: target side of the bilingual data.
Abstract | We propose an online approach integrating a source LM and/or back-transliteration and an English LM.
Experiments | We observed a slight improvement by incorporating the source LM, with a 0.48-point F-value increase over the baseline, which translates to a 4.65-point Katakana F-value change and a 16.0% (3.56% to 2.99%) WER reduction, mainly due to its higher Katakana word rate (11.2%).
Experiments | This type of error is reduced by +LM-P, e.g., *プラス チック purasu chikku "*plus tick" is corrected to プラスチック purasuchikku "plastic" due to LM projection.
Experiments | performance, which may be because one cannot limit where the source LM features are applied. |
Introduction | We refer to this process of transliterating unknown words into another language and using the target LM as LM projection. |
Introduction | Since the model employs a general transliteration model and a general English LM , it achieves robust WS for unknown words. |
Use of Language Model | As we mentioned in Section 2, English LM knowledge helps split transliterated compounds. |
Use of Language Model | We use LM projection, which is a combination of back-transliteration and an English model, by extending the normal lattice building process as follows:
Use of Language Model | Then, edges are spanned between these extended English nodes, instead of between the original nodes, by additionally taking into consideration English LM features (IDs 21 and 22 in Table 1): $\phi^{LMP}(w_i) = \log p(w_i)$ and $\phi^{LMP}(w_{i-1}, w_i) = \log p(w_{i-1}, w_i)$.
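A rough sketch of how those two edge features could be computed, assuming the projected (back-transliterated) English forms and English LM probability lookups are available; all function names here are hypothetical:

```python
import math

def lm_projection_edge_features(prev_en, cur_en, en_unigram_prob, en_bigram_prob):
    """Return the English LM features attached to an edge between two
    projected English nodes: the unigram log probability of the current
    word and the bigram log probability of the word pair, mirroring the
    features described above (IDs 21 and 22)."""
    return {
        "en_lm_unigram": math.log(en_unigram_prob(cur_en)),
        "en_lm_bigram": math.log(en_bigram_prob(prev_en, cur_en)),
    }
```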
Word Segmentation Model | Figure 1: Example lattice with LM projection |
Baseline MT | The LM used for decoding is a log-linear combination of four word n-gram LMs built on different English corpora (details described in Section 5.1), with the LM weights optimized on a development set by minimum error rate training (MERT), to estimate the probability of a word given the preceding words.
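A minimal sketch of such a log-linear combination, assuming each component LM exposes a conditional word probability and the weights are tuned on a development set (e.g., by MERT); all names are placeholders:

```python
import math

def combined_lm_logprob(word, history, lms, weights):
    """Log-linear combination of several word n-gram LMs: the decoder-facing
    score of a word given its history is the weighted sum of the individual
    log probabilities. `lms` are callables returning p(word | history);
    `weights` are the tuned interpolation weights."""
    return sum(w * math.log(lm(word, history)) for lm, w in zip(lms, weights))
```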
Experiments | LM1 is a 7-gram LM trained on the target ...
Experiments | LM2 is a 7-gram LM trained only on the English monolingual discussion forums data listed above. |
Experiments | LM3 is a 4-gram LM trained on the web genre among the target side of all parallel text (i.e., web text from pre-BOLT parallel text and BOLT released discussion forum parallel text). |
Introduction | Optimize name translation and context translation simultaneously and conduct name translation driven decoding with language model (LM) based selection (Section 3.2).
Related Work | enable the LM to decide which translations to choose when encountering the names in the texts (Ji et al., 2009). |
Related Work | The LM selection method often assigns an inappropriate weight to the additional name translation table because it is constructed independently from translation of context words; therefore after weighted voting most correct name translations are not used in the final translation output. |
Related Work | More importantly, in these approaches the MT model was still mostly treated as a “black-box” because neither the translation model nor the LM was updated or adapted specifically for names. |
Discriminative Reranking for OCR | Base features include the HMM and LM scores produced by the OCR system. |
Discriminative Reranking for OCR | Word LM features ("LM-word") include the log probabilities of the hypothesis obtained using n-gram LMs with n ∈ {1, ...}.
Discriminative Reranking for OCR | The LMs are built using the SRI Language Modeling Toolkit (Stolcke, 2002).
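A small sketch of what such an "LM-word" feature block could look like, assuming each order-n LM exposes a sentence-level log-probability method; the interface and names are illustrative only:

```python
def word_lm_features(hypothesis, lms_by_order):
    """One reranking feature per n-gram order: the log probability of the
    hypothesis (a list of tokens) under the order-n word LM. These features
    would sit alongside the base HMM and LM scores from the OCR system."""
    return {f"lm_word_{n}": lm.logprob(hypothesis)
            for n, lm in sorted(lms_by_order.items())}
```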
Experiments | Our baseline is based on the sum of the logs of the HMM and LM scores. |
Experiments | For LM training we used 220M words from Arabic Gigaword 3, and 2.4M words from each of the "print" and "hand" ground-truth annotations.
Introduction | The BBN Byblos OCR system (Natarajan et al., 2002; Prasad et al., 2008; Saleem et al., 2009), which we use in this paper, relies on a hidden Markov model (HMM) to recover the sequence of characters from the image, and uses an n-gram language model (LM) to emphasize the fluency of the output.
Introduction | For an input image, the OCR decoder generates an n-best list of hypotheses each of which is associated with HMM and LM scores. |
Previous Work | Our LM experiments also affirmed the importance of in-domain English LMs. |
Previous Work | Table 1 header: Train | LM | BLEU | OOV
Previous Work | Table 1: Baseline results using the EG and AR training sets with GW and EGen corpora for LM training
Proposed Methods 3.1 Egyptian to EG’ Conversion | Then we used a trigram LM that we built from the aforementioned Aljazeera articles to pick the most likely candidate in context. |
Proposed Methods 3.1 Egyptian to EG’ Conversion | We simply multiplied the character-level transformation probability by the LM probability, giving them equal weight.
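A compact sketch of that selection step, assuming scoring functions for the character-level transformation model and the trigram LM; all names are hypothetical:

```python
def pick_most_likely(candidates, translit_logprob, lm_logprob, context):
    """Pick the candidate that maximizes the product of the character-level
    transformation probability and the trigram LM probability in context.
    With equal weights, multiplying probabilities is just summing log-probs."""
    return max(candidates,
               key=lambda cand: translit_logprob(cand) + lm_logprob(cand, context))
```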
Proposed Methods 3.1 Egyptian to EG’ Conversion | S1 and S2 are trained on the EG’ data, with EGen and with both EGen and GW used for LM training, respectively.
Bayesian MT Decipherment via Hash Sampling | A high value for the LM concentration parameter α ensures that the LM probabilities do not deviate too far from the original fixed base distribution during sampling.
Decipherment Model for Machine Translation | For P(e), we use a word n-gram language model (LM) trained on monolingual target text.
Experiments and Results | We observe that our method produces much better results than the others even with a 2-gram LM.
Experiments and Results | With a 3-gram LM, the new method achieves the best performance; the highest BLEU score reported on this task.
Experiments and Results | Bayesian Hash Sampling with 2-gram LM: vocab=full (V6), add_fertility=no: 4.2; vocab=pruned*, add_fertility=yes: 5.3
Experiments | Other features include lexical weighting in both directions, word count, a distance-based RM, a 4-gram LM trained on the target side of the parallel data, and a 6-gram English Gigaword LM.
Experiments | Some of the results reported above involved linear TM mixtures, but none of them involved linear LM mixtures. |
Experiments | For instance, with an initial Chinese system that employs linear mixture LM adaptation (lin-lm) and has a BLEU of 32.1, adding 1-feature VSM adaptation (+vsm, joint) improves performance to 33.1 (improvement significant at p < 0.01), while adding 3-feature VSM instead (+vsm, 3 feat.)
Introduction | Both were studied in (Foster and Kuhn, 2007), which concluded that the best approach was to combine sub-models of the same type (for instance, several different TMs or several different LMs) linearly, while combining models of different types (for instance, a mixture TM with a mixture LM) log-linearly.
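To make the distinction concrete, here is a small sketch contrasting the two combination schemes; the probabilities and weights are placeholders, not values from the cited work:

```python
import math

def linear_mixture(probs, lambdas):
    """Linear mixture of same-type models (e.g., several LMs):
    p(w | h) = sum_i lambda_i * p_i(w | h), with the lambdas summing to 1."""
    return sum(l * p for l, p in zip(lambdas, probs))

def loglinear_combination(probs, lambdas):
    """Log-linear combination across model types (e.g., a mixture TM with a
    mixture LM): the score is sum_i lambda_i * log p_i, as in a standard
    log-linear SMT model."""
    return sum(l * math.log(p) for l, p in zip(lambdas, probs))
```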
Inference with First Order Variables | The language model score $h(s', LM)$ of $s'$, based on a large web corpus;
Inference with First Order Variables | $w_{l,m,k} = \nu_{LM}\, h(s', LM) + \sum_t \lambda_t\, f(s', t) + \ldots$
Inference with First Order Variables | $w_{Noun,2,singular} = \nu_{LM}\, h(s', LM) + \lambda_{ART}\, f(s', ART) + \lambda_{PREP}\, f(s', PREP) + \lambda_{NOUN}\, f(s', NOUN) + \mu_{ART}\, g(s', ART) + \mu_{PREP}\, g(s', PREP) + \mu_{NOUN}\, g(s', NOUN)$.
Inference with Second Order Variables | $w_{u,v} = \nu_{LM}\, h(s', LM) + \sum_t \lambda_t\, f(s', t) + \ldots$
Discussion | In our preliminary experiments with the smaller trigram LM, MERT did better on MT05 in the smaller feature set, and MIRA had a small advantage in two cases.
Experiments | We trained a 4-gram LM on the |
Experiments | In preliminary experiments with a smaller trigram LM, our RM method consistently yielded the highest scores in all Chinese-English tests, up to 1.6 BLEU and 6.4 TER better than MIRA, the second-best performer.
The Relative Margin Machine in SMT | Table row: 4-gram LM | 24M | 600M | —
Translation Model Architecture | We find that an adaptation of the TM and LM to the full development set (system “1 cluster”) yields the smallest improvements over the unadapted baseline. |
Translation Model Architecture | For the IT test set, the system with gold labels and TM adaptation yields an improvement of 0.7 BLEU (21.1 → 21.8), LM adaptation yields 1.3 BLEU (21.1 → 22.4), and adapting both models outperforms the baseline by 2.1 BLEU (21.1 → 23.2).
Translation Model Architecture | TM adaptation with 8 clusters (21.1 → 21.8 → 22.1), or LM adaptation with 4 or 8 clusters (21.1 → 22.4 → 23.1).
Experiments | Methods (MAP, P@10): 1. VSM (0.242, 0.226); 2. LM (0.385, 0.242); 3. Jeon et al. ...
Experiments | Row 1 and row 2 are two baseline systems, which model the relevance score using VSM (Cao et al., 2010) and a language model (LM) (Zhai and Lafferty, 2001; Cao et al., 2010) in the term space.
Experiments | (1) Monolingual translation models significantly outperform the VSM and LM (row 1 and row 2).
Introduction | As a principled approach to capturing semantic word relations, word-based translation models are built using IBM Model 1 (Brown et al., 1993) and have been shown to outperform traditional models (e.g., VSM, BM25, LM) for question retrieval.