Experiments in Encryption
by John Kormylo, E.L.F. Software

A good pseudo-random number generator is also a good encryption method: neither should be predictable or statistically biased, and neither should repeat within a reasonable period. The RC4 algorithm is an efficient and elegant approach with a theoretical period of 256 squared times 256 factorial bytes. Unfortunately, it suffers from short cycles. The most common such case has a period of 256*255 bytes, and there are 254! different such loops (1/64K odds). There are many others as well, although the odds of finding them are astronomically small. We know they exist since they occur in the 2 and 3 bit versions of the algorithm. Specifically, the 3 bit version has 2,580,480 possible states, which occur in 2 loops of 955,496, 4 loops of 29,032, 2 loops of 9624, 8 loops of 4696, 8 loops of 3008, 2 loops of 472, 2 loops of 264, 4 loops of 120, 720 loops of 56, 12 loops of 24, and 1 loop each of 322,120, 53,000, 44,264, 9432, 648 and 456 states.

I've tried a number of variations of the (3 bit) RC4 algorithm, and the best so far has 20,643,840 states, which occur in 2 loops of 5,793,758, 4 loops of 1,672,280, 1 loop of 1,316,872, 2 loops of 242,344, 1 loop of 194,216, 2 loops of 167,640, 2 loops of 11,320, 2 loops of 1320, 4 loops of 1208, 8 loops of 664, 2 loops of 184 and 2 loops of 168 states. While this algorithm still suffers from short cycles, there are far fewer of them, and they are generally larger than those for (3 bit) RC4. Since the algorithm is reversible (one can compute the previous state from the current one), all states must link together in loops, as opposed to having multiple branches feeding into a smaller loop (a strange attractor).

I suspect that the best PRNGs all suffer from short cycles. Any algorithm which hits every possible state in one loop is too methodical to be a good PRNG. Ideally one would like an algorithm which consists of only a few very large loops.
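For reference, the standard RC4 generator whose loop structure is analyzed above fits in a few lines of C. This is a minimal sketch; the rc4_state, rc4_init and rc4_byte names are mine, not from any particular library. The state counted in the loop analysis is exactly the struct below: the 256-entry sort array plus the two indices, giving 256 squared times 256 factorial possibilities.

```c
#include <stddef.h>

/* The full RC4 state: the sort array S plus the two indices i and j. */
typedef struct {
    unsigned char S[256];
    unsigned char i, j;
} rc4_state;

/* Key scheduling: repeat the key across 256 swaps of the sort array. */
void rc4_init(rc4_state *st, const unsigned char *key, size_t keylen)
{
    unsigned char j = 0, t;
    int n;

    for (n = 0; n < 256; n++)
        st->S[n] = (unsigned char)n;
    for (n = 0; n < 256; n++) {
        j = (unsigned char)(j + st->S[n] + key[n % keylen]);
        t = st->S[n]; st->S[n] = st->S[j]; st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* One step: add two relatively random numbers and run the sum
   (modulo 256) through the sort array again to get the output byte. */
unsigned char rc4_byte(rc4_state *st)
{
    unsigned char t;

    st->i = (unsigned char)(st->i + 1);
    st->j = (unsigned char)(st->j + st->S[st->i]);
    t = st->S[st->i]; st->S[st->i] = st->S[st->j]; st->S[st->j] = t;
    return st->S[(unsigned char)(st->S[st->i] + st->S[st->j])];
}
```

The 2 and 3 bit versions mentioned above are the same algorithm with the array size and arithmetic reduced modulo 4 or 8, which is what makes their loop structure small enough to enumerate exhaustively.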
A sure fix for short cycles is to save the initial state and check for its re-occurrence, then perform some modification to kick it into a different loop. However, the odds of landing in a short loop are so small that such a test will almost certainly never be of any use. On the other hand, an encryption algorithm that is ALMOST always secure is simply not acceptable.

Processing Keys

First, no two keys should produce the same initial state, at least not until the key size begins to approach the number of possible states (256 factorial corresponds to 1684 bits, or 211 bytes). Second, the sort array should be well randomized before encryption begins. RC4 achieves this by using almost exactly the same algorithm for processing the key as for generating code, and by repeating the key for 256 bytes. This forces every entry in the sort array to be swapped at least once, and on average twice. (It also has the drawback that one of the variables is always 0 initially.)

I tweaked the keying process to reduce the number of unreachable states and/or duplicate keys. There are n^n possible keys and n^3 * n! possible states. For the 3 bit version (n=8) this comes to 20,643,840 states and 16,777,216 keys. The final keying algorithm had 6,104,000 duplicate keys, which comes to 14,539,840 unreachable states.

Keys typically come in two varieties: one-time keys, which are long and generated by a PRNG, and short keys, which are ASCII and memorised. While one would like the process to work well for short keys, the main problem with short keys is that one can always perform an exhaustive search of all possible short keys in order to break the code. There are many ways to improve the performance for short keys. One suggestion is to use the short key to create a long key by "encrypting" a given set of bytes. At least that way one would have to repeat the exhaustive search for each application.

Output

The sole purpose of the output stage is to hide the state.
RC4 does this by adding two relatively random numbers together and running the sum (modulo 256) through the sort array again. Even if the same number appears several times in a row, and assuming the corresponding sort array entry hasn't changed, there are still 256 possible combinations of the two quantities which could produce that number. Combining additional relatively random quantities would further reduce the amount of information revealed.

Also, there are other operations besides modulo addition one could use. The problem with exclusive OR is that there is no "bit mixing." Addition (or subtraction) includes bit mixing, but only in one direction. A one bit rotate (either direction) combined with an exclusive OR produces equal uncertainty in every bit.

JK1 Encryption

The following C code contains the 8 bit version of the JK1 algorithm (no testing). Rotate() is an assembly language function to perform a bit rotation and exclusive OR. The primary reason for using a third index (k) was to avoid the most common short cycle for the RC4 algorithm, which occurs when j=i+1 and s[i]=1.

static unsigned char s[256];

void JK1code(
    short nkey,   /* number of bytes in key */
    void *key,    /* array or structure containing key */
    long ncode,   /* number of bytes to be encrypted */
    void *code)   /* data to be encrypted */
{
    unsigned char i, j, k, l, incr, ii, jj, kk, *p;
    short n;

    i = j = k = l = 0;
    for (n = 0; n < 256; n++)
        s[n] = (unsigned char) n;

    /* keying algorithm */
    for (p = key; nkey--; p++) {
        i += *p;
        j += s[i];
        k -= jj = s[j];     /* minus instead of plus */
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);  /* l = ((l << 1) | (l >> 7)) ^ kk */
    }
    incr = s[l] | 1;        /* must be odd */

    /* coding algorithm */
    for (p = code; ncode--; p++) {
        i += incr;
        j += ii = s[i];
        l = Rotate(l, ii);
        k -= jj = s[j];     /* minus instead of plus */
        l = Rotate(l, jj);
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = Rotate(l, kk);
        *p ^= s[l];         /* exclusive OR code with "random" number */
    }
    return;
}
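Rotate() itself is trivial to express in portable C; the following matches the operation given in the listing's comment (only the assembly-language packaging is the author's):

```c
/* Portable C version of the listing's Rotate(): rotate l left one bit as
   an 8-bit quantity, then XOR in the mixing byte kk. The cast back to
   unsigned char discards the bit that (l << 1) pushes past position 7,
   after (l >> 7) has wrapped it around to position 0. */
unsigned char Rotate(unsigned char l, unsigned char kk)
{
    return (unsigned char)(((l << 1) | (l >> 7)) ^ kk);
}
```

Because the rotate touches every bit position equally and the XOR flips bits without carries, repeated application spreads uncertainty into all eight bits, which is exactly why it was chosen for the output stage.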
Because k decrements, one would need s[j]=255 as well as s[i]=1, and there is no way one swap can move both numbers ahead one place. The fourth variable (l) was added to equalize the uncertainty in the bits. It does not contribute in any way to the length of the loops (except to disguise them).

Short Cycle Analysis

At this point it might be useful to mention that for any two states in a loop where j-i, k-i and s[n+i] (for all n) are the same, the sequence of s[i], s[j] and s[k] values will repeat. The output itself will not repeat, but one should be able to predict the output long before the cycle completes. If the loop does not include such an occurrence, then there will exist 8 (or 256 for the 8 bit version) separate loops with the same period. If such occurrences involve only multiples of 2 or 4 (etc.), then there will be the same number of loops with that period. In other words, if there are 2 loops of 168 states, then the detectable period is only 42 states long.

Studying the short cycles for JK1 reveals no patterns. The algorithm cranks along in all its complexity until suddenly everything is back the way it was at the start (or offset slightly). As a PRNG, there is nothing wrong with the short cycles except that they are short.

Another interesting fact is that changing the increment for i to any other odd number produces the same number of loops with the exact same lengths. However, the states that make up these new loops are not the same. Experimenting with the 3 bit version, the larger loops for an increment of 1 overlapped all of the loops for an increment of 3, but the smaller loops only overlapped the larger loops. The worst case was a loop of length 1320 that overlapped a loop of length 11,320. This could be useful when a short cycle is detected. Since the vast majority of states belong to the longer loops, almost any change to a state in a short loop will kick it into a larger loop.
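Loop lengths like those quoted above can be measured by brute force on the 3 bit version. The sketch below is my reconstruction from the listing (the names step3 and loop_length are illustrative); l and the output XOR are omitted since, as noted, they do not affect loop length:

```c
#include <string.h>

#define N 8  /* 3 bit version: sort array of 8 entries */

/* State of the 3 bit coding loop: the sort array plus the three indices. */
typedef struct {
    unsigned char s[N];
    unsigned char i, j, k;
} state3;

/* One step of the coding loop, with all arithmetic reduced modulo 8. */
static void step3(state3 *st, unsigned char incr)
{
    unsigned char jj;

    st->i = (unsigned char)((st->i + incr) % N);
    st->j = (unsigned char)((st->j + st->s[st->i]) % N);
    jj = st->s[st->j];
    st->k = (unsigned char)((st->k + N - jj) % N);  /* minus instead of plus */
    st->s[st->j] = st->s[st->k];                    /* swap s[j] and s[k] */
    st->s[st->k] = jj;
}

/* Iterate until the starting state recurs; since the step is a bijection
   on a finite set, it must recur, and the count is the loop length. */
unsigned long loop_length(state3 start, unsigned char incr)
{
    state3 st = start;
    unsigned long n = 0;

    do {
        step3(&st, incr);
        n++;
    } while (memcmp(&st, &start, sizeof st) != 0);
    return n;
}
```

Running loop_length from every not-yet-visited state and tallying the results is how a census like the one above (2 loops of 5,793,758 states, and so on) can be reproduced; repeating the census with incr = 3 instead of 1 exhibits the overlap behaviour described in the text.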
However, for any particular change (even running the key through again) there is a state for which this change does nothing. If instead one were to change the increment, then not only will one be in a larger loop, but one will be in a totally different loop than ANY of those using an increment of 1. In fact, I decided to go back and modify the keying algorithm to make the initial increment variable as well. Since any odd value will do, why not take advantage of all the additional loops?

Testing

I ran the 8 bit version starting from the default (no key) state for over three billion states without repeating. While this is but a tiny fraction of the total number of states, it represented many days of solid computing (using my admittedly slow computer). While s[i] flunked badly, both s[j] and s[k] performed well using chi-squared and gap tests (at least for gaps up to about 10 times 256). A number of ways to combine these variables were tested, and all performed well. The final choice was based more on intuition than objective performance.

Key Protocols

It should be noted that two messages encrypted with the same key can be exclusive ORed with each other, removing the encryption and leaving two text messages exclusive ORed together (easily separated). One solution is to encrypt a one-time random key using the fixed key, then use the one-time key to encrypt the data. Random keys XORed together cannot be separated. The cost is that one must include the encrypted key in the file (increasing its size). Alternatively, one could always perform data compression before encryption, removing the statistical information needed to separate the two messages. For small files, I like to append the CRC checksum to a fixed key, since it is usually included in the file header anyway (to verify decryption).

Signatures

While RC4 is not generally used to produce signatures, both it and JK1 can produce them.
Specifically, the keying algorithm can be used to process an entire file, and the resulting "random" numbers used for however large or small a signature is desired (although anything over 215 bytes is probably a waste). The important thing is that the maps from key to state and from state to output are both sensitive and non-reversible.
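As an illustration of this signature idea, the sketch below is a hypothetical helper reconstructed from the JK1 listing (not the author's released code; the names jk1_signature and rot are mine). It absorbs the whole message with the keying loop, then draws signature bytes from the coding loop with no data mixed in:

```c
#include <stddef.h>

/* Portable stand-in for the listing's assembly Rotate(). */
static unsigned char rot(unsigned char l, unsigned char kk)
{
    return (unsigned char)(((l << 1) | (l >> 7)) ^ kk);
}

/* JK1-style signature sketch: key the state with the entire message,
   then emit siglen bytes of keystream as the signature. */
void jk1_signature(const unsigned char *data, size_t n,
                   unsigned char *sig, size_t siglen)
{
    unsigned char s[256];
    unsigned char i = 0, j = 0, k = 0, l = 0, ii, jj, kk, incr;
    size_t m;

    for (m = 0; m < 256; m++)
        s[m] = (unsigned char)m;

    /* keying algorithm, run over the whole message */
    for (m = 0; m < n; m++) {
        i += data[m];
        j += s[i];
        k -= jj = s[j];     /* minus instead of plus */
        s[j] = kk = s[k];   /* swap s[j] and s[k] */
        s[k] = jj;
        l = rot(l, kk);
    }
    incr = s[l] | 1;        /* must be odd */

    /* coding algorithm with no data mixed in: outputs are the signature */
    for (m = 0; m < siglen; m++) {
        i += incr;
        j += ii = s[i];
        l = rot(l, ii);
        k -= jj = s[j];
        l = rot(l, jj);
        s[j] = kk = s[k];
        s[k] = jj;
        l = rot(l, kk);
        sig[m] = s[l];
    }
}
```

Note that this is a keyless digest as written; prepending a secret key to the data would turn it into a keyed signature, and either way the sensitivity of the keying map is what makes near-identical files produce unrelated signatures.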