Compression as Encryption


All encryption methods (except the one-time pad) are vulnerable to decryption by statistical means when used with ASCII plaintext. On the other hand, even trivial encryption of highly compressed data cannot be broken without vast amounts of encrypted messages. Interestingly, cryptographers don't seem to spend much effort on compression.
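
To see the statistical leak concretely, here is a minimal Python sketch (the sample text is invented): the index of coincidence of ASCII plaintext sits far above the 1/256 of uniformly random bytes, while compressed output is nearly flat.

    import random
    import zlib
    from collections import Counter

    def index_of_coincidence(data: bytes) -> float:
        # Chance that two bytes drawn at random from the data are equal.
        # Uniformly random bytes give about 1/256 (0.0039); the skewed
        # byte statistics of ASCII text give far more, which is exactly
        # what a statistical attack exploits.
        n = len(data)
        return sum(c * (c - 1) for c in Counter(data).values()) / (n * (n - 1))

    random.seed(0)
    words = [b"attack", b"at", b"dawn", b"the", b"enemy", b"camp", b"sleeps"]
    plaintext = b" ".join(random.choice(words) for _ in range(5000))

    print(f"ASCII plaintext:   {index_of_coincidence(plaintext):.4f}")
    print(f"after compression: {index_of_coincidence(zlib.compress(plaintext, 9)):.4f}")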

The great failing of compression methods is that they don't do a very good job at the start of a message. A solution is to pre-load the compressor's counts and dictionary with a known dataset before starting on the message. The more similar the dataset is in form and content to the message, the better the compression.
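
This priming is what zlib's preset-dictionary feature provides; a minimal sketch in Python (the HTTP-flavoured dataset and messages are invented for illustration):

    import zlib

    # A known dataset similar in form and content to the expected messages.
    dataset = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"

    def compress_primed(message: bytes, dataset: bytes) -> bytes:
        # zdict pre-loads the compressor's history window, so matches are
        # available from the very first byte of the message.
        c = zlib.compressobj(level=9, zdict=dataset)
        return c.compress(message) + c.flush()

    msg = b"GET /robots.txt HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"
    print(len(zlib.compress(msg, 9)))           # unprimed: poor start-of-message ratio
    print(len(compress_primed(msg, dataset)))   # primed: noticeably smaller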

In fact, one can regard this dataset as a key to an encryption method, for without it decompression is impossible. Any additional encryption is purely gratuitous.
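
The same zlib sketch shows the dataset acting as a key: the library simply refuses to decompress the stream without it. (zlib only verifies a checksum of the dictionary, so this illustrates the idea rather than proving its strength.)

    import zlib

    dataset = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"
    msg = b"GET /robots.txt HTTP/1.1\r\nHost: example.com\r\n\r\n"

    c = zlib.compressobj(level=9, zdict=dataset)
    stream = c.compress(msg) + c.flush()

    # With the shared dataset, decompression works.
    d = zlib.decompressobj(zdict=dataset)
    assert d.decompress(stream) + d.flush() == msg

    # Without it, zlib refuses: the stream signals that a preset
    # dictionary is required.
    try:
        zlib.decompressobj().decompress(stream)
    except zlib.error as e:
        print("undecipherable without the dataset:", e)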

In traditional cryptography terms, the length of the key is given by the log (base 2) of the total number of files from which the dataset was chosen. For TLS purposes, one could use any file on the web as a temporary key. (This has the advantage of making a brute-force attack inherently slow: each guess means fetching a candidate file and attempting a full decompression.)
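
The arithmetic, with made-up corpus sizes:

    import math

    # The dataset's effective key length is log2 of the number of files it
    # might have been chosen from; the corpus sizes here are invented.
    for candidates in (2**20, 10**9, 10**12):
        print(f"{candidates:>13} files -> {math.log2(candidates):.0f}-bit key")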


Another useful application of compression is in creating large random numbers from semi-random data. Specifically, "random" user input by keyboard or mouse is partly random and partly predictable. Data compression would remove most of the predictable component, especially if designed for that purpose.
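
A minimal sketch of the idea with simulated timings: the random-module jitter below stands in for real keystroke noise, and off-the-shelf zlib stands in for a compressor actually designed for the purpose.

    import random
    import zlib

    # Simulated inter-keystroke delays: a predictable base of ~120 ms plus
    # about three bits of jitter each (the random module stands in for the
    # human here).
    timings = bytes(120 + random.randrange(8) for _ in range(1000))

    condensed = zlib.compress(timings, 9)
    print(len(timings), "semi-random bytes ->", len(condensed), "bytes kept")
    # With only ~3 bits of jitter per keystroke, 1000 bytes of raw input
    # carry roughly 375 bytes of real information, and that is about what
    # survives compression.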

Data compression normally has to be able to reconstruct the original data, but that is not an objective here. So information regarding character counts and match lengths (which are somewhat predictable) need not be included. Instead, one would strive to approach the information-theoretic limit.
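
A sketch of how close an ordinary compressor gets to that limit on a skewed, memoryless source (the sample data is invented):

    import math
    import random
    import zlib
    from collections import Counter

    def entropy_bits(data: bytes) -> float:
        # Zeroth-order Shannon entropy: the theoretical floor for a
        # memoryless source with this symbol distribution.
        n = len(data)
        return -n * sum(c / n * math.log2(c / n) for c in Counter(data).values())

    # A skewed source: 'A' 5/8 of the time, 'B' 2/8, 'C' 1/8.
    random.seed(1)
    data = bytes(random.choice(b"AAAAABBC") for _ in range(4000))

    print(f"theoretical limit: {entropy_bits(data):.0f} bits")
    print(f"deflate output:    {len(zlib.compress(data, 9)) * 8} bits")
    # The gap is framing and reconstruction bookkeeping (headers, block
    # structure, match lengths) that a one-way condenser could discard.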


See also LZA Compression.