Packing a Packed Hash
by John Kormylo
E.L.F. Software
I have used two different algorithm for producing packed hashes.
One works quickly most of the time, but gives you nothing if a
perfect hash doesn't exist. The other is slower but will produce
a minimal hash under all circumstances.
The first method (packing) ostensively tries all possible ways to
pack the nodes into a given array, althought it usually gets a
solution on the first try. About the only thing clever about
this method is that it sorts "families" in order of difficulty
(number of children and the spread between min and max), and
packs them in order of decreasing difficulty. Also, it makes no
attempt to pack "only children," since they can be put anywhere.
The "families" are put into the first space in the array they can
be fit into (searching the gaps in previously added "families").
The second method (merging) takes pairs of families and forms
larger blocks by overlapping them. This continues until no two
blocks can be overlapped, or if all of the blocks will fit end-
to-end in a perfect hash. The clever part of this algorithm has
to do with the fact that the matrix of overlaps for all possible
pairs is often too large to fit into RAM (worst case 64K by 64K).
So instead I save a sorted list (in a binary tree) of the largest
N values which will fit into RAM.
Also, when two blocks are merged, one row and column of the
matrix were eliminated and another row and column had to be
recomputed. With the sorted list, as each entry is popped off
the top, it is eliminated or recomputed as indicated and put back
into the sorted tree. If the recomputed overlap is worse than
the worst pair currently in the list, the entry is thrown away.
When all the pairs in the list have either been merged or
eliminated, one computes all possible overlaps between the
remaining pairs of blocks to form a new list.
Based on empirical evidence, the larger a hash table is, the more
likely it is to have a perfect solution, in which case "packing"
is the best approach. With smaller hash tables, "packing" will
fail quickly. So only when "packing" has failed should "merging"
be attempted.