Experiments in Encryption
by John Kormylo
E.L.F. Software
A good pseudo-random number generator is also a good
encryption method, in that neither should be predictable or
statistically biased, and neither should repeat within any
reasonable period.
The RC4 algorithm is an efficient and elegant approach with
a theoretical period of 256 squared times 256 factorial bytes.
Unfortunately, it suffers from short cycles.  The most common
case has a period of 256*255 bytes, and there are 254!
different such loops (about 1 in 64K odds of landing in one).
There are many others as well, although the odds of finding
them are astronomically small.
We know they exist since they occur in the 2 and 3 bit
versions of the algorithm. Specifically, the 3 bit version has
2,580,480 possible states which occur in 2 loops of 955,496, 4
loops of 29,032, 2 loops of 9624, 8 loops of 4696, 8 loops of
3008, 2 loops of 472, 2 loops of 264, 4 loops of 120, 720 loops
of 56, 12 loops of 24, and 1 loop each of 322120, 53000, 44264,
9432, 648 and 456 states.
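Loop structures like this can be checked by brute force.  The
sketch below is my own illustration, assuming the textbook RC4
update (i advances by 1 each step); it indexes every (i, j, sort
array) state via the Lehmer code of the permutation and walks each
loop.  Whatever the individual loop lengths turn out to be, they
must sum to 2,580,480, since the update is a bijection.

```c
#include <assert.h>
#include <stdlib.h>

#define N 8                      /* 3 bit version: values 0..7 */
#define NPERM 40320              /* 8! permutations of the sort array */
#define NSTATES (NPERM * N * N)  /* 2,580,480 states (S, i, j) */

/* Rank a state (i, j, S) as a single index using the Lehmer code of S. */
static long encode(int i, int j, const unsigned char *S)
{
    long rank = 0;
    for (int a = 0; a < N; a++) {
        int smaller = 0;
        for (int b = a + 1; b < N; b++)
            if (S[b] < S[a]) smaller++;
        rank = rank * (N - a) + smaller;
    }
    return (rank * N + i) * N + j;
}

/* Inverse of encode(). */
static void decode(long idx, int *i, int *j, unsigned char *S)
{
    int digits[N], pool[N];
    *j = (int)(idx % N); idx /= N;
    *i = (int)(idx % N); idx /= N;
    for (int a = N - 1; a >= 0; a--) {
        digits[a] = (int)(idx % (N - a));
        idx /= (N - a);
    }
    for (int a = 0; a < N; a++) pool[a] = a;
    for (int a = 0; a < N; a++) {
        S[a] = (unsigned char) pool[digits[a]];
        for (int b = digits[a]; b < N - 1; b++) pool[b] = pool[b + 1];
    }
}

/* Walk every loop of the 3 bit RC4 update; return total states visited
   and (through nloops) the number of distinct loops found. */
long count_cycle_states(long *nloops)
{
    unsigned char *seen = calloc(NSTATES, 1);
    long total = 0;
    if (!seen) return -1;
    *nloops = 0;
    for (long start = 0; start < NSTATES; start++) {
        if (seen[start]) continue;
        long idx = start;
        do {
            int i, j;
            unsigned char S[N], t;
            seen[idx] = 1;
            total++;
            decode(idx, &i, &j, S);
            i = (i + 1) % N;         /* the RC4 generator step */
            j = (j + S[i]) % N;
            t = S[i]; S[i] = S[j]; S[j] = t;
            idx = encode(i, j, S);
        } while (idx != start);
        (*nloops)++;
    }
    free(seen);
    return total;
}
```

On a modern machine this enumerates all 2,580,480 states in a few
seconds; since the update is reversible, every state lies on exactly
one loop and the total always comes back as 2,580,480.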
I've tried a number of variations of the (3 bit) RC4
algorithm, and the best so far has 20,643,840 states which occur
in 2 loops of 5,793,758, 4 loops of 1,672,280, 1 loop of
1,316,872, 2 loops of 242,344, 1 loop of 194,216, 2 loops of
167,640, 2 loops of 11,320, 2 loops of 1320, 4 loops of 1208, 8
loops of 664, 2 loops of 184 and 2 loops of 168 states.  While
this algorithm still suffers from short cycles, there are far
fewer of them and they are generally larger than those for (3
bit) RC4.
Since the algorithm is reversible (one can compute the
previous state from the current one) all states must link
together in loops, as opposed to having multiple branches feeding
into a smaller loop (strange attractor).
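A sketch (mine, again using the plain RC4 step rather than JK1's)
showing the inversion explicitly: undo the swap first, and the
previous j falls out by subtraction.

```c
#include <assert.h>
#include <string.h>

/* One forward step of the RC4 generator on state (i, j, S). */
void step(unsigned char *i, unsigned char *j, unsigned char S[256])
{
    unsigned char t;
    (*i)++;                    /* unsigned char wraps mod 256 */
    *j += S[*i];
    t = S[*i]; S[*i] = S[*j]; S[*j] = t;
}

/* The inverse step: undo the swap, then recover j and i. */
void step_back(unsigned char *i, unsigned char *j, unsigned char S[256])
{
    unsigned char t;
    t = S[*i]; S[*i] = S[*j]; S[*j] = t;   /* un-swap */
    *j -= S[*i];               /* S[i] now holds its pre-swap value */
    (*i)--;
}
```

Stepping forward any number of times and then backward the same
number of times restores the state exactly, so every state has
exactly one predecessor and the loops cannot merge.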
I suspect that the best PRNGs all suffer from short cycles.
Any algorithm which hits every possible state in one loop is too
methodical to be a good PRNG. Ideally one would like an
algorithm which consists of only a few very large loops.
A sure fix for short cycles is to save the initial state and
check for its re-occurrence, then perform some modification to
kick the generator into a different loop.  However, the odds of
finding a short loop are so small that such a test will almost
certainly never be of any use.  On the other hand, an encryption
algorithm that is ALMOST always secure is simply not acceptable.
Processing Keys
First of all, no two keys should produce the same initial
state, at least not until the key size begins to approach the
number of states possible (256 factorial corresponds to 1684 bits
or 211 bytes). Secondly, the sort array should be well
randomized before encryption begins.
RC4 achieves this by using almost exactly the same algorithm
for processing the key as for generating code, and by repeating
the key for 256 bytes. This forces every entry in the sort array
to be swapped at least once and on average twice. (It also has
the drawback that one of the variables is always 0 initially.)
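For comparison, the standard RC4 key schedule can be sketched as
follows (reconstructed from public descriptions of RC4, not the
author's code; variable names are mine).  The key is recycled
across all 256 positions, which is what guarantees every entry is
swapped at least once, and j starting at 0 is the drawback just
noted:

```c
#include <assert.h>
#include <stddef.h>

/* Standard RC4 key schedule: initialize S to the identity, then
   perform 256 swaps driven by the key repeated end to end. */
void rc4_ksa(unsigned char S[256], const unsigned char *key, size_t keylen)
{
    unsigned char j = 0, t;            /* j is always 0 initially */
    for (int i = 0; i < 256; i++)
        S[i] = (unsigned char) i;
    for (int i = 0; i < 256; i++) {
        j = (unsigned char)(j + S[i] + key[i % keylen]);
        t = S[i]; S[i] = S[j]; S[j] = t;   /* every S[i] swapped at least once */
    }
}
```

After keying, S is still a permutation of 0..255; only the order
has been scrambled.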
I tweaked the keying process to reduce the number of
unreachable states and/or duplicate keys.  There are n^n possible
keys and n^3 * n! possible states.  For the 3 bit version (n=8)
this comes to 20,643,840 states and 16,777,216 keys. The final
keying algorithm had 6,104,000 duplicate keys, which comes to
14,539,840 unreachable states.
Keys typically come in two varieties: one-time keys, which
are long and generated by a PRNG, and short keys, which are ASCII
and memorised.  While one would like the process to work well for
short keys, the main problem with short keys is that one can
always perform an exhaustive search of all possible short keys in
order to break the code.
There are lots of ways to improve the performance for
short keys. One suggestion is to use the short key to create a
long key by "encrypting" a given set of bytes. At least that way
one would have to repeat the exhaustive search for each
application.
Output
The sole purpose of the output stage is to hide the state.
RC4 does this by adding two relatively random numbers together
and running the sum (modulo 256) through the sort array again.
Even if the same number appears several times in a row and
assuming the corresponding sort array entry hasn't changed, there
are still 256 possible combinations of the two quantities which
could produce that number.
Combining additional relatively random quantities would
further reduce the amount of information revealed. Also, there
are other operations besides modulo addition one could use.
The problem with exclusive or is that there is no "bit mixing."
Addition (or subtraction) includes bit mixing, but only in one
direction. A one bit rotate (either direction) combined with an
exclusive OR produces equal uncertainty in every bit.
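That operation can be written in portable C (the Rotate() used
later is an assembly routine; this is my equivalent): a one-bit
left rotate of the accumulator followed by an exclusive OR, so a
change in any input bit can eventually reach every output bit
position.

```c
#include <assert.h>

/* One-bit left rotate of l, then exclusive OR with x.
   For 8 bit values this is ((l << 1) | (l >> 7)) ^ x. */
unsigned char rotate_xor(unsigned char l, unsigned char x)
{
    return (unsigned char)(((l << 1) | (l >> 7)) ^ x);
}
```

For example, rotate_xor(0x80, 0x00) is 0x01: the high bit wraps
around to the low bit, something plain addition can never do.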
JK1 Encryption
The following C code contains the 8 bit version of the JK1
algorithm (no testing). Rotate() is an assembly language
function to perform a bit rotation and exclusive or.
static unsigned char s[256];

unsigned char Rotate(unsigned char l, unsigned char x);  /* in assembly */

void JK1code(
    short nkey,     /* number of bytes in key */
    void *key,      /* array or structure containing key */
    long ncode,     /* number of bytes to be encrypted */
    void *code)     /* data to be encrypted */
{
    unsigned char i, j, k, l, incr, ii, jj, kk, *p;
    short n;

    i = j = k = l = 0;
    for (n = 0; n < 256; n++) s[n] = (unsigned char) n;
    /* keying algorithm */
    for (p = key; nkey--; p++) {
        i += *p;
        j += s[i];
        k -= jj = s[j];        /* minus instead of plus */
        s[j] = kk = s[k];      /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);     /* l = ((l << 1) | (l >> 7)) ^ kk */
    }
    incr = s[l] | 1;           /* must be odd */
    /* coding algorithm */
    for (p = code; ncode--; p++) {
        i += incr;
        j += ii = s[i];
        l = Rotate(l, ii);
        k -= jj = s[j];        /* minus instead of plus */
        l = Rotate(l, jj);
        s[j] = kk = s[k];      /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);
        *p ^= s[l];            /* exclusive OR code with "random" number */
    }
    return;
}
The primary reason for using a third index (k) was to avoid
the most common short cycle for the RC4 algorithm, which occurs
when j=i+1 and s[i]=1. Because k decrements, one would need
s[j]=255 as well as s[i]=1, and there is no way one swap can move
both numbers ahead one place.
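The RC4 short cycle referred to here can be reproduced directly.
The following sketch (mine, using the textbook RC4 step) starts
from one state in that family, i=0 and j=1 with the identity
array, and counts steps until the complete state recurs; the count
comes back as 256*255 = 65280.

```c
#include <assert.h>
#include <string.h>

/* Count the RC4 generator steps needed to return to the starting
   state, beginning from the short-cycle state i=0, j=1, S=identity
   (so the entry at the next i is 1, as the condition requires). */
long rc4_cycle_length(void)
{
    unsigned char S[256], S0[256], i = 0, j = 1, t;
    long steps = 0;
    for (int n = 0; n < 256; n++) S[n] = (unsigned char) n;
    memcpy(S0, S, 256);
    do {
        i++;                   /* unsigned char wraps mod 256 */
        j += S[i];
        t = S[i]; S[i] = S[j]; S[j] = t;
        steps++;
    } while (!(i == 0 && j == 1 && memcmp(S, S0, 256) == 0));
    return steps;              /* 256 * 255 = 65280 */
}
```

Inside this family j stays one ahead of i on every step, so each
step swaps two adjacent entries; that is the regularity the extra
index k is designed to break.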
The addition of the fourth variable (l) was to equalize
uncertainty in the bits.  It does not contribute in any way to
the length of the loops (except to disguise them).
Short Cycle Analysis
At this point it might be useful to mention that for any two
states in which j-i, k-i and s[n+i] (for all n) are the same, the
sequences s[i], s[j] and s[k] will repeat.  The output itself
will not repeat, but one should be able to predict the output
long before the cycle completes.  If a loop does not include such
an occurrence, then there will exist 8 (or 256 for the 8 bit
version) separate loops with the same period.  If such
occurrences involve only multiples of 2 or 4 (etc.), then there
will be the same number of loops with that period.
In other words, if there are 2 loops of 168 states, then the
detectable period is only 42 states long (2 x 168 states spread
across 8 offsets).
Studying the short cycles for JK1 reveals no patterns. The
algorithm cranks along in all its complexity until suddenly
everything is back the way it was at the start (or offset
slightly). As a PRNG, there is nothing wrong with the short
cycles except that they are short.
Another interesting fact is that changing the increment for
i to any other odd number produces the same number of loops with
the exact same lengths.  However, the states that make up these
new loops are not the same.  Experimenting with the 3 bit
new loops are not the same. Experimenting with the 3 bit
version, the larger loops for an increment of 1 overlapped all of
the loops for an increment of 3, but the smaller loops only
overlapped the larger loops. The worst case was a loop of length
1320 that overlapped a loop of length 11320.
This could be useful when a short cycle is detected. Since
the vast number of states belong to the longer loops, almost any
change to a state in a short loop will kick it into a larger
loop. However, for any particular change (even running the key
through again) there is a state for which this change does
nothing. If instead one were to change the increment, then not
only will one be in a larger loop, but one will be in a totally
different loop than ANY of those using an increment of 1.
In fact, I decided to go back and modify the keying
algorithm to make the initial increment variable as well. Since
any odd value will do, why not take advantage of all the
additional loops?
Testing
I ran the 8 bit version starting from the default (no key)
state for over three billion states without repeating. While this
is but a tiny fraction of the total number of states, it
represented many days of solid computing (using my admittedly
slow computer).
While s[i] flunked badly, both s[j] and s[k] performed well
using chi-squared and gap tests (at least for gaps up to about 10
times 256). A number of ways to combine these variables were
tested, and all performed well. The final choice was based more
on intuition than objective performance.
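For reference, the chi-squared statistic used in this kind of
testing can be sketched as follows (my illustration; it applies to
any byte stream, not to JK1 specifically).  With 255 degrees of
freedom, a statistic far above roughly 300 signals bias:

```c
#include <assert.h>

/* Chi-squared statistic of a byte stream against the uniform
   distribution: sum of (observed - expected)^2 / expected. */
double chi_squared_bytes(const unsigned char *buf, long n)
{
    long counts[256] = {0};
    double expected = (double) n / 256.0, stat = 0.0;
    for (long k = 0; k < n; k++) counts[buf[k]]++;
    for (int v = 0; v < 256; v++) {
        double d = (double) counts[v] - expected;
        stat += d * d / expected;
    }
    return stat;
}
```

A perfectly flat stream scores 0, a constant stream scores
enormously; good generator output should hover near the degrees of
freedom (about 255).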
Key Protocols
It should be noted that two messages encrypted with the same
key can be exclusive ORed with each other, removing the
encryption and leaving two text messages exclusive ORed together
(easily separated).
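This weakness is easy to demonstrate (a sketch of mine, using an
arbitrary made-up keystream): XORing two ciphertexts produced with
the same keystream cancels the keystream entirely, leaving only
the two plaintexts XORed together.

```c
#include <assert.h>

/* Encrypt two messages with the SAME keystream and check that
   c1 ^ c2 equals m1 ^ m2 -- the keystream cancels completely. */
int keystream_cancels(void)
{
    const unsigned char m1[8] = "ATTACK!", m2[8] = "DEFEND!";
    const unsigned char ks[8] = { 0x3E, 0xA1, 0x5C, 0x07,
                                  0xD9, 0x42, 0x88, 0xF0 };
    for (int n = 0; n < 8; n++) {
        unsigned char c1 = m1[n] ^ ks[n];   /* same keystream twice */
        unsigned char c2 = m2[n] ^ ks[n];
        if ((c1 ^ c2) != (m1[n] ^ m2[n])) return 0;
    }
    return 1;
}
```

No knowledge of the key is needed; the attacker works directly on
m1 ^ m2 using the statistics of the plaintexts.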
One solution is to encrypt a one-time random key using the
fixed key, then use the one-time key to encrypt the data. Random
keys XORed together cannot be separated. The cost is that one
must include the encrypted key in the file (increasing its size).
Alternatively, one could always perform data compression
before encryption, removing the statistical information needed to
separate the two messages.
For small files, I like to append the CRC checksum to a
fixed key, since it is usually included in the file header anyway
(to verify decryption).
Signatures
While RC4 is not generally used to produce signatures, both
it and JK1 can produce them. Specifically, the keying algorithm
can be used to process an entire file, and the resulting "random"
numbers used for however large or small a signature is desired
(although anything over 215 bytes is probably a waste).  The
important thing is that the maps from key to state and from state
to output are both sensitive and non-reversible.