Bayesian Inference for Zodiac and Other Homophonic Ciphers
Ravi, Sujith and Knight, Kevin

Article Structure

Abstract

We introduce a novel Bayesian approach for deciphering complex substitution ciphers.

Introduction

Substitution ciphers have been used widely in the past to encrypt secrets behind messages.

Letter Substitution Ciphers

We use natural language processing techniques to attack letter substitution ciphers.

Decipherment

Given a ciphertext message c1...cn, the goal of decipherment is to uncover the hidden plaintext message p1...pn.

Experiments and Results

We run decipherment experiments on different types of letter substitution ciphers (described in Section 2).

Conclusion

In this work, we presented a novel Bayesian decipherment approach that can effectively solve a variety of substitution ciphers.

Topics

LM

Appears in 10 sentences as: LM (11)
In Bayesian Inference for Zodiac and Other Homophonic Ciphers
  1. We build a statistical English language model (LM) for the plaintext source model P(p), which assigns a probability to any English letter sequence.
    Page 4, “Decipherment”
  2. We build an interpolated word+n-gram LM and use it to assign P(p) probabilities to any plaintext letter sequence p1...pn. The advantage is that it helps direct the sampler towards plaintext hypotheses that resemble natural language: high-probability letter sequences which form valid words such as “H E L L O” instead of sequences like “T X H R T”.
    Page 6, “Decipherment”
  3. We set the interpolation weights for the word and n-gram LM to (0.9, 0.1); a toy sketch of this interpolation follows this list.
    Page 6, “Decipherment”
  4. The word-based LM is constructed from a dictionary consisting of 9,881 frequently occurring words collected from Wikipedia articles.
    Page 6, “Decipherment”
  5. We train the letter n-gram LM on 50 million words of English text available from the Linguistic Data Consortium.
    Page 6, “Decipherment”
  6. Even with a 3-gram letter LM, our method yields a +63% improvement in decipherment accuracy over EM on the homophonic cipher with spaces.
    Page 7, “Experiments and Results”
  7. We observe that the word+3-gram LM proves highly effective when tackling more complex ciphers and cracks the Zodiac-408 cipher.
    Page 7, “Experiments and Results”
  8. Letter n-gram versus word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in a +75% improvement in decipherment accuracy.
    Page 8, “Experiments and Results”
  9. On a 98-letter simple substitution cipher, EM using 3-gram LM achieves 41% accuracy, whereas the method from Ravi and Knight (2009) scores 84% accuracy.
    Page 8, “Experiments and Results”
  10. Our Bayesian method performs the best in this case, achieving 100% accuracy with a word+3-gram LM.
    Page 8, “Experiments and Results”
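
Items 2 and 3 above quote the interpolated word+n-gram plaintext model and its (0.9, 0.1) weights. The following is a minimal Python sketch of what such a linear interpolation could look like; the toy dictionary, the unknown-word floor, and the uniform stand-in for the letter model are illustrative assumptions, not the authors' implementation (the paper's word LM is built from a 9,881-word dictionary and its letter LM from 50 million words of text).

```python
# Minimal sketch of a linearly interpolated plaintext model,
# P(p) = 0.9 * P_word(p) + 0.1 * P_ngram(p).
# The toy dictionary and the uniform letter model below are placeholders.

TOY_DICTIONARY = {"HELLO": 0.5, "WORLD": 0.5}   # hypothetical word probabilities
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "          # 26 letters plus space

def word_lm_prob(plaintext: str) -> float:
    """P_word(p): product of per-word probabilities, with a small floor
    for words that are not in the dictionary."""
    prob = 1.0
    for word in plaintext.split():
        prob *= TOY_DICTIONARY.get(word, 1e-9)
    return prob

def letter_lm_prob(plaintext: str) -> float:
    """Stand-in for a letter n-gram LM: a uniform letter model."""
    return (1.0 / len(LETTERS)) ** len(plaintext)

def interpolated_prob(plaintext: str, w_word: float = 0.9, w_ngram: float = 0.1) -> float:
    """Interpolated plaintext probability with the weights quoted above."""
    return w_word * word_lm_prob(plaintext) + w_ngram * letter_lm_prob(plaintext)

# A hypothesis made of valid words outscores a random-looking letter string.
print(interpolated_prob("HELLO WORLD") > interpolated_prob("TXHRT QZKLM"))   # True
```

Under the real models, this is the effect that pushes the sampler towards hypotheses such as “H E L L O” rather than “T X H R T”.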

n-gram

Appears in 9 sentences as: n-gram (10)
In Bayesian Inference for Zodiac and Other Homophonic Ciphers
  1. Knight et al. (2006) use the Expectation Maximization (EM) algorithm (Dempster et al., 1977) to search for the best probabilistic key using letter n-gram models.
    Page 1, “Introduction”
  2. Ravi and Knight (2008) formulate decipherment as an integer programming problem and provide an exact method to solve simple substitution ciphers by using letter n-gram models along with deterministic key constraints.
    Page 1, “Introduction”
  3. Our new method combines information from word dictionaries along with letter n-gram models, providing a robust decipherment model which offsets the disadvantages faced by previous approaches.
    Page 2, “Introduction”
  4. Combining letter n-gram language models with word dictionaries: Many existing probabilistic approaches use statistical letter n-gram language models of English to assign P(p) probabilities to plaintext hypotheses during decipherment (a toy letter-trigram sketch follows this list).
    Page 6, “Decipherment”
  5. We set the interpolation weights for the word and n-gram LM to (0.9, 0.1).
    Page 6, “Decipherment”
  6. We train the letter n-gram LM on 50 million words of English text available from the Linguistic Data Consortium.
    Page 6, “Decipherment”
  7. EM Method using letter n-gram LMs, following the approach of Knight et al. (2006).
    Page 7, “Experiments and Results”
  8. Letter n-gram versus word+n-gram LMs: Figure 2 shows that using a word+3-gram LM instead of a 3-gram LM results in a +75% improvement in decipherment accuracy.
    Page 8, “Experiments and Results”
  9. Unlike previous approaches, our method combines information from letter n-gram language models and word dictionaries and provides a robust decipherment model.
    Page 8, “Conclusion”
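
Item 4 refers to letter n-gram language models assigning P(p) probabilities to plaintext hypotheses. As a rough, self-contained sketch of such a model, the Python below estimates an add-one-smoothed letter trigram LM from a toy training string and uses it for scoring; the training text, the smoothing choice, and the 27-symbol alphabet are assumptions for illustration, not the 50-million-word LDC-trained LM from the paper.

```python
import math
from collections import defaultdict

def train_letter_trigrams(text: str):
    """Collect trigram and context (bigram) counts over a letter sequence."""
    tri, ctx = defaultdict(int), defaultdict(int)
    padded = "  " + text                      # pad so the first letters have context
    for i in range(2, len(padded)):
        history, nxt = padded[i - 2:i], padded[i]
        tri[(history, nxt)] += 1
        ctx[history] += 1
    return tri, ctx

def letter_trigram_logprob(text: str, tri, ctx, vocab_size: int = 27) -> float:
    """Add-one-smoothed log P(p) under the letter trigram model."""
    padded = "  " + text
    logp = 0.0
    for i in range(2, len(padded)):
        history, nxt = padded[i - 2:i], padded[i]
        logp += math.log((tri[(history, nxt)] + 1) / (ctx[history] + vocab_size))
    return logp

# Toy usage: an English-looking string scores higher than a random one.
tri, ctx = train_letter_trigrams("HELLO THERE HELLO WORLD HELLO HELP")
print(letter_trigram_logprob("HELLO", tri, ctx) >
      letter_trigram_logprob("TXHRT", tri, ctx))        # True
```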

language models

Appears in 6 sentences as: language model (3) language models (4)
In Bayesian Inference for Zodiac and Other Homophonic Ciphers
  1. Our method uses a decipherment model which combines information from letter n-gram language models as well as word dictionaries.
    Page 1, “Abstract”
  2. We build a statistical English language model (LM) for the plaintext source model P(p), which assigns a probability to any English letter sequence.
    Page 4, “Decipherment”
  3. For the plaintext source model, we use probabilities from an English language model, and for the channel model, we specify a uniform distribution (i.e., a plaintext letter can be substituted with any given cipher type with equal probability).
    Page 6, “Decipherment”
  4. Combining letter n-gram language models with word dictionaries: Many existing probabilistic approaches use statistical letter n-gram language models of English to assign P(p) probabilities to plaintext hypotheses during decipherment.
    Page 6, “Decipherment”
  5. For letter substitution decipherment we want to keep the language model probabilities fixed during training, and hence we set the prior on that model to be high (α = 10^4); a sketch of how such a prior behaves follows this list.
    Page 7, “Decipherment”
  6. Unlike previous approaches, our method combines information from letter n-gram language models and word dictionaries and provides a robust decipherment model.
    Page 8, “Conclusion”
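
Items 3 and 5 above mention the channel's uniform base distribution and a very high prior that keeps the source LM probabilities fixed. The sketch below shows a Chinese-restaurant-process-style cache probability of the kind such Bayesian models typically use; the function name, the cache representation, and the 54-symbol toy alphabet are assumptions, not code or notation taken from the paper.

```python
def crp_prob(plain_letter, cipher_symbol, cache_counts, alpha, base_prob):
    """CRP-style incremental probability of substituting plain_letter with
    cipher_symbol, given counts of mappings already chosen in the current
    sample (cache_counts), a concentration parameter alpha, and a base
    distribution base_prob(cipher_symbol | plain_letter)."""
    pair_count = cache_counts.get((plain_letter, cipher_symbol), 0)
    context_count = sum(n for (p, _), n in cache_counts.items() if p == plain_letter)
    return (pair_count + alpha * base_prob(cipher_symbol, plain_letter)) / (context_count + alpha)

# Toy usage: a uniform base distribution over a hypothetical 54-symbol cipher
# alphabet (the channel's uniform base), and a small alpha so that the
# observed cache counts dominate.
uniform_base = lambda c, p: 1.0 / 54
cache = {("E", "#"): 3, ("E", "%"): 1}
print(crp_prob("E", "#", cache, alpha=0.1, base_prob=uniform_base))  # about 0.73
```

With alpha set very large (e.g. 10^4), the cache counts are swamped by the base distribution, which is the behaviour item 5 describes for the source model.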

natural language

Appears in 4 sentences as: natural language (4)
In Bayesian Inference for Zodiac and Other Homophonic Ciphers
  1. We use natural language processing techniques to attack letter substitution ciphers.
    Page 2, “Letter Substitution Ciphers”
  2. In a letter substitution cipher, every letter p in the natural language (plaintext) sequence is replaced by a cipher token c, according to some substitution key.
    Page 2, “Letter Substitution Ciphers”
  3. Bayesian inference methods have become popular in natural language processing (Goldwater and Griffiths, 2007; Finkel et al., 2005; Blunsom et al., 2009; Chiang et al., 2010).
    Page 4, “Decipherment”
  4. A common phenomenon observed while modeling natural language problems is sparsity.
    Page 4, “Decipherment”

model parameters

Appears in 3 sentences as: model parameter (1) model parameters (2)
In Bayesian Inference for Zodiac and Other Homophonic Ciphers
  1. These methods are attractive for their ability to manage uncertainty about model parameters and allow one to incorporate prior knowledge during inference.
    Page 4, “Decipherment”
  2. Our goal is to estimate the channel model parameters θ in order to maximize the probability of the observed ciphertext c (the objective is written out after this list).
    Page 4, “Decipherment”
  3. The base distribution P0 represents prior knowledge about the model parameter distributions.
    Page 6, “Decipherment”
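
Item 2 ends where the objective itself would appear. Reconstructing the standard noisy-channel formulation that these quotes describe (the notation is inferred from the surrounding definitions rather than copied from the paper), the estimation problem can be written as:

```latex
\hat{\theta} \;=\; \arg\max_{\theta} P_{\theta}(c)
           \;=\; \arg\max_{\theta} \sum_{p} P(p) \, \prod_{i=1}^{n} P_{\theta}(c_i \mid p_i)
```

where P(p) is the plaintext source model (the English LM), P_θ(c_i | p_i) is the channel model whose prior knowledge is encoded in the base distribution P0, and the Bayesian approach samples plaintext hypotheses under this model rather than relying on EM's point estimate of θ.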
