Decipherment Complexity in 1:1 Substitution Ciphers
Nuhn, Malte and Ney, Hermann

Article Structure

Abstract

In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.

Introduction

The decipherment approach for MT has recently gained popularity for training and adapting translation models using only monolingual data.

Related Work

In recent years, a large number of publications on the automatic decipherment of substitution ciphers have appeared.

Definitions

In the following we will use the machine translation notation and denote the ciphertext with f_1^N = f_1 ... f_N.

Topics

language model

Appears in 20 sentences as: language model (18) language models (2)
In Decipherment Complexity in 1:1 Substitution Ciphers
  1. In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.
    Page 1, “Abstract”
  2. The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language.
    Page 1, “Introduction”
  3. This might be related to the fact that a statistical formulation of the decipherment problem has not been analyzed with respect to n-gram language models: This paper shows the close relationship of the decipherment problem to the quadratic assignment problem.
    Page 1, “Introduction”
  4. In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP).
    Page 1, “Introduction”
  5. Section 5 shows the connection between the quadratic assignment problem and decipherment using a bigram language model.
    Page 1, “Introduction”
  6. gram language model.
    Page 2, “Related Work”
  7. denotes the language model.
    Page 2, “Definitions”
  8. Depending on the structure of the language model, Equation 2 can be further simplified.
    Page 2, “Definitions”
  9. Similarly, we define language model matrices S for the unigram and the bigram case.
    Page 3, “Definitions”
  10. The unigram language model S_f is defined as
    Page 3, “Definitions”
  11. Similarly, for the bigram language model matrix S_ff', we define
    Page 3, “Definitions”
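The count definitions excerpted above (unigram counts N_f, bigram counts N_ff', and the sentence boundary token $) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code, and the function name `ngram_counts` is my own:

```python
from collections import Counter

def ngram_counts(cipher):
    """Unigram counts N_f and bigram counts N_ff' of a ciphertext,
    with the sentence boundary token '$' padding both ends."""
    unigrams = Counter(cipher)
    padded = ["$"] + list(cipher) + ["$"]
    bigrams = Counter(zip(padded, padded[1:]))
    return unigrams, bigrams

N_f, N_ff = ngram_counts("ABACAB")
# e.g. N_f['A'] == 3, N_ff[('A', 'B')] == 2, N_ff[('$', 'A')] == 1
```

By construction, exactly one bigram starts with $ (into f_1) and exactly one ends with $ (out of f_N), matching the boundary constraint quoted under the "bigram" topic below.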

bigram

Appears in 14 sentences as: Bigram (1) bigram (11) bigrams (2)
In Decipherment Complexity in 1:1 Substitution Ciphers
  1. In this paper we show that even for the case of 1:1 substitution ciphers—which encipher plaintext symbols by exchanging them with a unique substitute—finding the optimal decipherment with respect to a bigram language model is NP-hard.
    Page 1, “Abstract”
  2. Section 5 shows the connection between the quadratic assignment problem and decipherment using a bigram language model.
    Page 1, “Introduction”
  3. similarly define the bigram count N_ff' of f, f' ∈ V_f as
    Page 2, “Definitions”
  4. (a) N_ff' are integer counts > 0 of bigrams found in the ciphertext f_1^N.
    Page 2, “Definitions”
  5. (b) Given the first and last token of the cipher f_1 and f_N, the bigram counts involving the sentence boundary token $ need to fulfill
    Page 2, “Definitions”
  6. Similarly, we define language model matrices S for the unigram and the bigram case.
    Page 3, “Definitions”
  7. Similarly, for the bigram language model matrix S_ff', we define
    Page 3, “Definitions”
  8. 5 Decipherment Using Bigram LMs
    Page 3, “Definitions”
  9. (Bauer, 2010) arrives at a similar optimization problem for the “combined method of frequency matching” using bigrams and mentions that it can be seen as a combinatorial problem for which no efficient solution is known.
    Page 4, “Definitions”
  10. Show that S_ff' is a valid bigram language model matrix.
    Page 5, “Definitions”
  11. We now show that S_ff' is a valid bigram language model matrix:
    Page 5, “Definitions”
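The bigram objective behind these excerpts can be sketched as follows. As a hedged illustration (the names `score` and `brute_force_decipher` are mine, and the boundary token is omitted for brevity): a candidate 1:1 substitution is scored by the sum over bigrams of N_ff' · log p of the substituted pair, and without an efficient algorithm one is left enumerating all |V|! substitutions, in line with the NP-hardness result:

```python
import math
from itertools import permutations

def score(subst, bigram_counts, lm):
    """Sum over bigrams of N_ff' * log p(subst(f), subst(f'))."""
    return sum(n * math.log(lm[subst[f], subst[g]])
               for (f, g), n in bigram_counts.items())

def brute_force_decipher(cipher_syms, plain_syms, bigram_counts, lm):
    """Exhaustively try all 1:1 substitutions -- exponential in |V|."""
    best = max(permutations(plain_syms),
               key=lambda p: score(dict(zip(cipher_syms, p)), bigram_counts, lm))
    return dict(zip(cipher_syms, best))

# toy example with a hypothetical bigram LM over plaintext symbols {a, b}
counts = {('A', 'B'): 2, ('B', 'A'): 1}
lm = {('a', 'b'): 0.6, ('b', 'a'): 0.4}
# brute_force_decipher(['A', 'B'], ['a', 'b'], counts, lm) == {'A': 'a', 'B': 'b'}
```

The exhaustive search is only feasible for toy vocabularies; the paper's point is that no polynomial-time shortcut exists for this bigram case unless P = NP.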

unigram

Appears in 9 sentences as: unigram (9)
In Decipherment Complexity in 1:1 Substitution Ciphers
  1. In Section 4 we show that decipherment using a unigram language model corresponds to solving a linear sum assignment problem (LSAP).
    Page 1, “Introduction”
  2. Given a ciphertext f_1^N, we define the unigram count N_f of f ∈ V_f as
    Page 2, “Definitions”
  3. Similarly, we define language model matrices S for the unigram and the bigram case.
    Page 3, “Definitions”
  4. The unigram language model S_f is defined as
    Page 3, “Definitions”
  5. 4 Decipherment Using Unigram LMs: 4.1 Problem Definition
    Page 3, “Definitions”
  6. When using a unigram language model, Equation 2 simplifies to finding
    Page 3, “Definitions”
  7. Figure 1: Linear sum assignment problem for a cipher with V_e = {a, b, c}, V_f = {A, B, C}, unigram counts N_f, and unigram probabilities p(e).
    Page 3, “Definitions”
  8. However, since the matrix c_ij occurring for the decipherment using a unigram language model can be represented as the product c_ij = a_i · b_j, the decipherment problem can be solved more easily: In the Section “Optimal Matching”, (Bauer, 2010) shows that in this case the optimal assignment is found by sorting the jobs j_i by a_i and the workers w_j by b_j and then assigning the jobs j_i to the workers w_j that have the same rank in the respective sorted lists.
    Page 3, “Definitions”
  9. We have shown the correspondence between solving 1:1 substitution ciphers and the linear sum assignment problem and the quadratic assignment problem: When using unigram language models, the decipherment problem is equivalent to the linear sum assignment problem and solvable in polynomial time.
    Page 7, “Definitions”
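The sorting argument in excerpt 8 can be sketched directly. Assuming, as an illustration rather than the paper's code, a dictionary of counts N_f and unigram probabilities p(e): because the cost factors as c_ij = N_f(i) · log p(e_j), the rearrangement inequality makes matching by rank optimal:

```python
def unigram_decipher(counts, probs):
    """Optimal 1:1 substitution under a unigram LM: since c_ij factors as
    N_f(i) * log p(e_j), pairing symbols by frequency rank maximizes the
    objective (rearrangement inequality) -- no search over assignments."""
    ranked_f = sorted(counts, key=counts.get, reverse=True)
    ranked_e = sorted(probs, key=probs.get, reverse=True)
    return dict(zip(ranked_f, ranked_e))

# most frequent cipher symbol maps to most probable plaintext symbol
subst = unigram_decipher({'A': 5, 'B': 1, 'C': 3}, {'a': 0.5, 'b': 0.2, 'c': 0.3})
# subst == {'A': 'a', 'C': 'c', 'B': 'b'}
```

For a general LSAP cost matrix that does not factor this way, polynomial-time algorithms such as the Hungarian method still apply, consistent with excerpt 9.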

translation models

Appears in 3 sentences as: translation model (1) translation models (2)
In Decipherment Complexity in 1:1 Substitution Ciphers
  1. The decipherment approach for MT has recently gained popularity for training and adapting translation models using only monolingual data.
    Page 1, “Introduction”
  2. The general idea is to find those translation model parameters that maximize the probability of the translations of a given source text in a given language model of the target language.
    Page 1, “Introduction”
  3. (Ravi and Knight, 2011), (Nuhn et al., 2012), and (Dou and Knight, 2012) treat natural language translation as a deciphering problem including phenomena like reordering, insertion, and deletion and are able to train translation models using only monolingual data.
    Page 2, “Related Work”
