Encrypting Data
Encrypting Data Using Auto-Encoders
The overall procedure is as follows.
Training Set And Mapping
We generate a very long string (around 1000-1500 words) for training. This is again generated by randomly shuffling printable characters (code points 32-122, as in the code below) and sampling words from them. The code for this is below.
import random

def sequence(word_length):
    # Printable characters with code points 32-122
    characters = [chr(i) for i in range(32, 123)]
    # Shuffle them
    random.shuffle(characters)
    # Join the first `word_length` shuffled characters into a word
    word = "".join(characters[:word_length])
    return word

def random_paragraph_generator():
    # Generate a random number of words
    no_of_words = random.randint(1000, 1500)
    para = ""
    # Build the paragraph from words of random length
    for i in range(no_of_words):
        word_length = random.randint(1, 10)
        word = sequence(word_length)
        para += word
        para += " "
    # Encode in UTF-8 and return
    return para.encode("utf-8")
We now have a long paragraph, which we simply one-hot encode. Here our hashmap acts as a slight hint of supervision for learning the latent distribution. To learn this latent distribution, we train an encoder and a decoder.
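As a minimal sketch (assuming characters are indexed by their position in the same printable character set used by the generator above), the one-hot encoding step could look like this:

import numpy as np

# Character set matching the generator above (code points 32-122)
CHARSET = [chr(i) for i in range(32, 123)]
CHAR_TO_INDEX = {c: i for i, c in enumerate(CHARSET)}

def one_hot_encode(paragraph):
    # paragraph: plain-text string drawn from CHARSET
    encoded = np.zeros((len(paragraph), len(CHARSET)), dtype=np.float32)
    for pos, ch in enumerate(paragraph):
        encoded[pos, CHAR_TO_INDEX[ch]] = 1.0
    return encoded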
Encoder and Decoder
The encoder's inputs are the one-hot encoded data, hinted with the hashmap labels. Thus we train the encoder to map one-hot encoded data to the hashmap, learning a latent distribution (function approximation).
The encoder's outputs are passed as inputs to the decoder. The decoder learns to reconstruct the original plain text from the latent distribution, outputting one-hot encoded data, which is very simple to decode.
We implemented this logic with the tensorflow.keras API.
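A minimal sketch of such an encoder-decoder pair in tensorflow.keras, assuming a vocabulary of 91 printable characters; the layer sizes, activations, and the build_autoencoder helper name are illustrative assumptions, not the exact architecture:

import tensorflow as tf

VOCAB_SIZE = 91  # printable characters 32-122, matching the generator above

def build_autoencoder(encoding_size):
    # Encoder (encrypter): one-hot encoded character -> hashmap/latent encoding
    encoder = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(VOCAB_SIZE,)),
        tf.keras.layers.Dense(encoding_size, activation="sigmoid"),
    ])
    # Decoder (decrypter): latent encoding -> reconstructed one-hot character
    decoder = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(encoding_size,)),
        tf.keras.layers.Dense(VOCAB_SIZE, activation="softmax"),
    ])
    # Joint model for end-to-end training; the batch dimension stays None (dynamic)
    inputs = tf.keras.Input(shape=(VOCAB_SIZE,))
    autoencoder = tf.keras.Model(inputs, decoder(encoder(inputs)))
    return encoder, decoder, autoencoder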
Training
We train the encoder and decoder with Mean Squared Error loss and the Adam optimizer. We keep a small learning rate of around 1e-3. Since there are not many gradients to compute, the batch size is kept dynamic, as per TensorFlow, using None.
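Under these settings, the training step can be sketched as below, reusing the hypothetical build_autoencoder and one_hot_encode helpers above; the reconstruction target (x_onehot predicting itself) and the epoch count are assumptions:

encoder, decoder, autoencoder = build_autoencoder(encoding_size=32)

# MSE loss and Adam optimizer with a small learning rate of 1e-3
autoencoder.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="mse",
)

# x_onehot: one-hot encoded training paragraph, shape (num_characters, VOCAB_SIZE);
# the leading batch dimension is left as None (dynamic) by tensorflow.keras
x_onehot = one_hot_encode(random_paragraph_generator().decode("utf-8"))
autoencoder.fit(x_onehot, x_onehot, epochs=100)  # epoch count is a placeholder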
Training both the encoder and decoder jointly takes around 3-4 minutes on a CPU and about 2 minutes on a GPU.
Ensembling Networks
Once we have trained one set of encoder (encrypter) and decoder (decrypter), we can use a similar configuration to train another. We trained two such sets of encrypters and decrypters, each with a slightly different mapping created by the mapping algorithm. The smaller encrypter-decrypter network had a hashmap with encoding size 32, and the larger one had encoding size 56. This allows us to create more secure networks by ensembling them, as sketched below.
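With the hypothetical build_autoencoder helper above, the two configurations could be created along these lines:

# Two encrypter/decrypter sets with different hashmap encoding sizes
small_encoder, small_decoder, small_ae = build_autoencoder(encoding_size=32)
large_encoder, large_decoder, large_ae = build_autoencoder(encoding_size=56)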
Encrypting With Ensemble Networks
In this case we have two sets of encrypters and decrypters. We can use each of them to encrypt new data, marking each set with an ID. We generate a cryptographically secure random number that allocates each ID to a part of the long text data, so each network encrypts the data allocated to it. The sequence of allocated IDs is used as a public key.
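A hedged sketch of this allocation step, using Python's secrets module for cryptographically secure randomness; the chunking scheme, chunk size, and helper name are assumptions, not the original implementation:

import secrets

def allocate_ids(text, network_ids, chunk_size=16):
    # Split the long text into chunks and allocate each chunk to a network ID
    # using a cryptographically secure random choice; the resulting ID sequence
    # acts as the public key. chunk_size is an illustrative assumption.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    allocation = [secrets.choice(network_ids) for _ in chunks]
    return chunks, allocation

# Example: two networks with IDs 0 and 1
chunks, public_key = allocate_ids("some long plain text to encrypt", [0, 1])
# Each chunk is then encrypted by the network whose ID appears at that position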
More on keys on the next page.