Experiments in Encryption
by John Kormylo, E.L.F. Software

A good pseudo-random number generator is also a good encryption method: neither should be predictable or statistically biased, and neither should repeat within a reasonable period. The RC4 algorithm is an efficient and elegant approach with a theoretical period of 256 squared times 256 factorial bytes. Unfortunately, it suffers from short cycles. The most common such case has a period of 256*255 bytes, and there are 254! different such loops (1/64K odds). There are many others as well, although the odds of finding them are astronomically small. We know they exist since they occur in the 2 and 3 bit versions of the algorithm. Specifically, the 3 bit version has 2,580,480 possible states, which occur in 2 loops of 955,496, 4 loops of 29,032, 2 loops of 9624, 8 loops of 4696, 8 loops of 3008, 2 loops of 472, 2 loops of 264, 4 loops of 120, 720 loops of 56, 12 loops of 24, and 1 loop each of 322,120, 53,000, 44,264, 9432, 648 and 456 states.

I've tried a number of variations of the (3 bit) RC4 algorithm, and the best so far has 20,643,840 states, which occur in 2 loops of 5,793,758, 4 loops of 1,672,280, 1 loop of 1,316,872, 2 loops of 242,344, 1 loop of 194,216, 2 loops of 167,640, 2 loops of 11,320, 2 loops of 1320, 4 loops of 1208, 8 loops of 664, 2 loops of 184 and 2 loops of 168 states. While this algorithm still suffers from short cycles, there are far fewer of them, and they are generally larger than those for (3 bit) RC4. Since the algorithm is reversible (one can compute the previous state from the current one), all states must link together in loops, as opposed to having multiple branches feeding into a smaller loop (a strange attractor).

I suspect that the best PRNGs all suffer from short cycles. Any algorithm which hits every possible state in one loop is too methodical to be a good PRNG. Ideally one would like an algorithm which consists of only a few very large loops.
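For reference, the standard RC4 generator whose loop structure is analyzed above fits in a few lines of C. This is a minimal sketch; the rc4_state, rc4_init and rc4_byte names are mine, not from any particular library. The state counted in the loop analysis is exactly the struct below: the 256-entry sort array plus the two indices, giving 256 squared times 256 factorial possibilities.

```c
#include <stddef.h>

/* The full RC4 state: the sort array S plus the two indices i and j. */
typedef struct {
    unsigned char S[256];
    unsigned char i, j;
} rc4_state;

/* Key scheduling: repeat the key across 256 swaps of the sort array. */
void rc4_init(rc4_state *st, const unsigned char *key, size_t keylen)
{
    unsigned char j = 0, t;
    int n;

    for (n = 0; n < 256; n++)
        st->S[n] = (unsigned char)n;
    for (n = 0; n < 256; n++) {
        j = (unsigned char)(j + st->S[n] + key[n % keylen]);
        t = st->S[n]; st->S[n] = st->S[j]; st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* One step: add two relatively random numbers and run the sum
   (modulo 256) through the sort array again to get the output byte. */
unsigned char rc4_byte(rc4_state *st)
{
    unsigned char t;

    st->i = (unsigned char)(st->i + 1);
    st->j = (unsigned char)(st->j + st->S[st->i]);
    t = st->S[st->i]; st->S[st->i] = st->S[st->j]; st->S[st->j] = t;
    return st->S[(unsigned char)(st->S[st->i] + st->S[st->j])];
}
```

The 2 and 3 bit versions mentioned above are the same algorithm with the array size and arithmetic reduced modulo 4 or 8, which is what makes their loop structure small enough to enumerate exhaustively.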
A sure fix for short cycles is to save the initial state and check for its re-occurrence, then perform some modification to kick it into a different loop. However, the odds of landing in a short loop are so small that such a test will almost certainly never be of any use. On the other hand, an encryption algorithm that is ALMOST always secure is simply not acceptable.

Processing Keys

First, no two keys should produce the same initial state, at least not until the key size begins to approach the number of possible states (256 factorial corresponds to 1684 bits, or 211 bytes). Second, the sort array should be well randomized before encryption begins. RC4 achieves this by using almost exactly the same algorithm for processing the key as for generating code, and by repeating the key for 256 bytes. This forces every entry in the sort array to be swapped at least once, and on average twice. (It also has the drawback that one of the variables is always 0 initially.)

I tweaked the keying process to reduce the number of unreachable states and/or duplicate keys. There are n^n possible keys and n^3 * n! possible states. For the 3 bit version (n=8) this comes to 20,643,840 states and 16,777,216 keys. The final keying algorithm had 6,104,000 duplicate keys, which comes to 14,539,840 unreachable states.

Keys typically come in two varieties: one-time keys, which are long and generated by a PRNG, and short keys, which are ASCII and memorised. While one would like the process to work well for short keys, the main problem with short keys is that one can always perform an exhaustive search of all possible short keys in order to break the code. There are many ways to improve the performance for short keys. One suggestion is to use the short key to create a long key by "encrypting" a given set of bytes. At least that way one would have to repeat the exhaustive search for each application.

Output

The sole purpose of the output stage is to hide the state.
RC4 does this by adding two relatively random numbers together and running the sum (modulo 256) through the sort array again. Even if the same number appears several times in a row, and assuming the corresponding sort array entry hasn't changed, there are still 256 possible combinations of the two quantities which could produce that number. Combining additional relatively random quantities would further reduce the amount of information revealed.

Also, there are other operations besides modulo addition one could use. The problem with exclusive OR is that there is no "bit mixing." Addition (or subtraction) includes bit mixing, but only in one direction. A one bit rotate (either direction) combined with an exclusive OR produces equal uncertainty in every bit.

JK1 Encryption

The following C code contains the 8 bit version of the JK1 algorithm (no testing). Rotate() is an assembly language function to perform a bit rotation and exclusive OR. The primary reason for using a third index (k) was to avoid the most common short cycle for the RC4 algorithm, which occurs when j=i+1 and s[i]=1.

static unsigned char s[256];

void JK1code(
    short nkey,   /* number of bytes in key */
    void *key,    /* array or structure containing key */
    long ncode,   /* number of bytes to be encrypted */
    void *code)   /* data to be encrypted */
{
    unsigned char i, j, k, l, incr, ii, jj, kk, *p;
    short n;

    i = j = k = l = 0;
    for (n = 0; n < 256; n++)
        s[n] = (unsigned char) n;

    /* keying algorithm */
    for (p = key; nkey--; p++) {
        i += *p;
        j += s[i];
        k -= jj = s[j];     /* minus instead of plus */
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);  /* l = ((l << 1) | (l >> 7)) ^ kk */
    }
    incr = s[l] | 1;        /* must be odd */

    /* coding algorithm */
    for (p = code; ncode--; p++) {
        i += incr;
        j += ii = s[i];
        l = Rotate(l, ii);
        k -= jj = s[j];     /* minus instead of plus */
        l = Rotate(l, jj);
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);
        *p ^= s[l];         /* exclusive OR code with "random" number */
    }
    return;
}
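Rotate() itself is trivial to express in portable C; the following matches the operation given in the listing's comment (only the assembly-language packaging is the author's):

```c
/* Portable C version of the listing's Rotate(): rotate l left one bit as
   an 8-bit quantity, then XOR in the mixing byte kk. The cast back to
   unsigned char discards the bit that (l << 1) pushes past position 7,
   after (l >> 7) has wrapped it around to position 0. */
unsigned char Rotate(unsigned char l, unsigned char kk)
{
    return (unsigned char)(((l << 1) | (l >> 7)) ^ kk);
}
```

Because the rotate touches every bit position equally and the XOR flips bits without carries, repeated application spreads uncertainty into all eight bits, which is exactly why it was chosen for the output stage.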
Because k decrements, one would need s[j]=255 as well as s[i]=1, and there is no way one swap can move both numbers ahead one place. The fourth variable (l) was added to equalize the uncertainty in the bits. It does not contribute in any way to the length of the loops (except to disguise them).

Short Cycle Analysis

At this point it might be useful to mention that for any two states in a loop where j-i, k-i and s[n+i] (for all n) are the same, the sequence of s[i], s[j] and s[k] values will repeat. The output itself will not repeat, but one should be able to predict the output long before the cycle completes. If the loop does not include such an occurrence, then there will exist 8 (or 256 for the 8 bit version) separate loops with the same period. If such occurrences involve only multiples of 2 or 4 (etc.), then there will be the same number of loops with that period. In other words, if there are 2 loops of 168 states, then the detectable period is only 42 states long.

Studying the short cycles for JK1 reveals no patterns. The algorithm cranks along in all its complexity until suddenly everything is back the way it was at the start (or offset slightly). As a PRNG, there is nothing wrong with the short cycles except that they are short.

Another interesting fact is that changing the increment for i to any other odd number produces the same number of loops with the exact same lengths. However, the states that make up these new loops are not the same. Experimenting with the 3 bit version, the larger loops for an increment of 1 overlapped all of the loops for an increment of 3, but the smaller loops only overlapped the larger loops. The worst case was a loop of length 1320 that overlapped a loop of length 11,320. This could be useful when a short cycle is detected. Since the vast majority of states belong to the longer loops, almost any change to a state in a short loop will kick it into a larger loop.
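Loop lengths like those quoted above can be measured by brute force on the 3 bit version. The sketch below is my reconstruction from the listing (the names step3 and loop_length are illustrative); l and the output XOR are omitted since, as noted, they do not affect loop length:

```c
#include <string.h>

#define N 8  /* 3 bit version: sort array of 8 entries */

/* State of the 3 bit coding loop: the sort array plus the three indices. */
typedef struct {
    unsigned char s[N];
    unsigned char i, j, k;
} state3;

/* One step of the coding loop, with all arithmetic reduced modulo 8. */
static void step3(state3 *st, unsigned char incr)
{
    unsigned char jj;

    st->i = (unsigned char)((st->i + incr) % N);
    st->j = (unsigned char)((st->j + st->s[st->i]) % N);
    jj = st->s[st->j];
    st->k = (unsigned char)((st->k + N - jj) % N);  /* minus instead of plus */
    st->s[st->j] = st->s[st->k];                    /* swap s[j] and s[k] */
    st->s[st->k] = jj;
}

/* Iterate until the starting state recurs; since the step is a bijection
   on a finite set, it must recur, and the count is the loop length. */
unsigned long loop_length(state3 start, unsigned char incr)
{
    state3 st = start;
    unsigned long n = 0;

    do {
        step3(&st, incr);
        n++;
    } while (memcmp(&st, &start, sizeof st) != 0);
    return n;
}
```

Running loop_length from every not-yet-visited state and tallying the results is how a census like the one above (2 loops of 5,793,758 states, and so on) can be reproduced; repeating the census with incr = 3 instead of 1 exhibits the overlap behaviour described in the text.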
However, for any particular change (even running the key through again) there is a state for which this change does nothing. If instead one were to change the increment, then not only will one be in a larger loop, but one will be in a totally different loop than ANY of those using an increment of 1. In fact, I decided to go back and modify the keying algorithm to make the initial increment variable as well. Since any odd value will do, why not take advantage of all the additional loops?

Testing

I ran the 8 bit version starting from the default (no key) state for over three billion states without repeating. While this is but a tiny fraction of the total number of states, it represented many days of solid computing (using my admittedly slow computer). While s[i] flunked badly, both s[j] and s[k] performed well using chi-squared and gap tests (at least for gaps up to about 10 times 256). A number of ways to combine these variables were tested, and all performed well. The final choice was based more on intuition than objective performance.

Key Protocols

It should be noted that two messages encrypted with the same key can be exclusive ORed with each other, removing the encryption and leaving two text messages exclusive ORed together (easily separated). One solution is to encrypt a one-time random key using the fixed key, then use the one-time key to encrypt the data. Random keys XORed together cannot be separated. The cost is that one must include the encrypted key in the file (increasing its size). Alternatively, one could always perform data compression before encryption, removing the statistical information needed to separate the two messages. For small files, I like to append the CRC checksum to a fixed key, since it is usually included in the file header anyway (to verify decryption).

Signatures

While RC4 is not generally used to produce signatures, both it and JK1 can produce them.
Specifically, the keying algorithm can be used to process an entire file, and the resulting "random" numbers used for however large or small a signature is desired (although anything over 215 bytes is probably a waste). The important thing is that the maps from key to state and from state to output are both sensitive and non-reversible.
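As an illustration of this signature idea, the sketch below is a hypothetical helper reconstructed from the JK1 listing (not the author's released code; the names jk1_signature and rot are mine). It absorbs the whole message with the keying loop, then draws signature bytes from the coding loop with no data mixed in:

```c
#include <stddef.h>

/* Portable stand-in for the listing's assembly Rotate(). */
static unsigned char rot(unsigned char l, unsigned char kk)
{
    return (unsigned char)(((l << 1) | (l >> 7)) ^ kk);
}

/* JK1-style signature sketch: key the state with the entire message,
   then emit siglen bytes of keystream as the signature. */
void jk1_signature(const unsigned char *data, size_t n,
                   unsigned char *sig, size_t siglen)
{
    unsigned char s[256];
    unsigned char i = 0, j = 0, k = 0, l = 0, ii, jj, kk, incr;
    size_t m;

    for (m = 0; m < 256; m++)
        s[m] = (unsigned char)m;

    /* keying algorithm, run over the whole message */
    for (m = 0; m < n; m++) {
        i += data[m];
        j += s[i];
        k -= jj = s[j];     /* minus instead of plus */
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = rot(l, kk);
    }
    incr = s[l] | 1;        /* must be odd */

    /* coding algorithm with no data mixed in: outputs are the signature */
    for (m = 0; m < siglen; m++) {
        i += incr;
        j += ii = s[i];
        l = rot(l, ii);
        k -= jj = s[j];
        l = rot(l, jj);
        s[j] = kk = s[k];
        s[k] = jj;
        l = rot(l, kk);
        sig[m] = s[l];
    }
}
```

Note that this is a keyless digest as written; prepending a secret key to the data would turn it into a keyed signature, and either way the sensitivity of the keying map is what makes near-identical files produce unrelated signatures.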