@@ -731,12 +731,12 @@ lookups::
   | 0       |             = 0xffffffffU
   +---------+
   | UNUSED  |
-  +---------+
-  | UNUSED  |     DELETED = DUK_HOBJECT_HASHIDX_DELETED
-  +---------+             = 0xfffffffeU
+  +---------+     DELETED = DUK_HOBJECT_HASHIDX_DELETED
+  | UNUSED  |             = 0xfffffffeU
+  +---------+
                   DELETED entries don't terminate hash
                   probe sequences, UNUSED entries do.
 
-  Here, e_size = 5, e_next = 3, h_size = 7.
+  Here, e_size = 5, e_next = 3, h_size = 8.
 
 .. FIXME for some unknown reason the illustration breaks with pandoc
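
To make the marker semantics concrete, a minimal C sketch of a hash part
lookup; this is not Duktape's actual code (``hash_find``, ``hashidx`` and
``entry_keys`` are hypothetical names, only the marker values come from the
illustration above)::

  #include <stdint.h>
  #include <string.h>

  #define HASHIDX_UNUSED  0xffffffffUL   /* unused slot: ends the probe sequence */
  #define HASHIDX_DELETED 0xfffffffeUL   /* deleted slot: probing continues past it */

  /* Walk the probe sequence; return an entry part index, or -1 if the key
   * is not present.  'start' is the initial probe position; a linear step
   * is used here, the termination rules are the point of the sketch.
   */
  int64_t hash_find(const uint32_t *hashidx, uint32_t h_size,
                    const char *const *entry_keys, const char *key,
                    uint32_t start) {
      uint32_t i = start;
      uint32_t n;

      for (n = 0; n < h_size; n++) {
          uint32_t e = hashidx[i];
          if (e == HASHIDX_UNUSED) {
              return -1;           /* UNUSED terminates the probe sequence */
          }
          if (e != HASHIDX_DELETED && strcmp(entry_keys[e], key) == 0) {
              return (int64_t) e;  /* found: index into the entry part */
          }
          i = (i + 1) % h_size;    /* next position in the probe sequence */
      }
      return -1;                   /* probed the entire table */
  }

The DELETED marker is needed because clearing a slot outright could cut a
probe sequence short and hide entries inserted past the removed key.
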
@@ -815,8 +815,7 @@ Hash part details
 
 The hash part maps a key ``K`` to an index ``I`` of the entry part or
 indicates that ``K`` does not exist. The hash part uses a `closed hash
 table`__, i.e. the hash table has a fixed size and a certain key has
-multiple possible locations in a *probe sequence*. The current probe
-sequence uses a variant of *double hashing*.
+multiple possible locations in a *probe sequence*.
 
 __ http://en.wikipedia.org/wiki/Hash_table#Open_addressing
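
The *double hashing* variant referred to by the removed sentence is the
scheme deleted in the next hunk: both the initial index and the probe step
are derived from the string hash, and ``gcd(S, h_size) = 1`` ensures the
sequence covers the whole table. A small self-contained sketch of that idea
(the ``probe_steps`` values below are illustrative stand-ins, not Duktape's
actual table; see ``genhashsizes.py``)::

  #include <stdio.h>
  #include <stdint.h>

  static const uint32_t probe_steps[32] = {
      3, 5, 7, 11, 13, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
      67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139
  };

  int main(void) {
      uint32_t h_size = 17;                 /* minimum prime of the old scheme */
      uint32_t string_hash = 0xdeadbeefUL;  /* stand-in for a real string hash */
      uint32_t idx = string_hash % h_size;  /* I: initial probe index */
      uint32_t step = probe_steps[string_hash % 32];
      uint32_t n;

      for (n = 0; n < h_size; n++) {
          /* gcd(step, h_size) == 1, so every slot is visited exactly once */
          printf("probe %2u -> slot %2u\n", (unsigned) n, (unsigned) idx);
          idx = (idx + step) % h_size;
      }
      return 0;
  }

With this input the step is 61 and the loop prints each of the 17 slots
exactly once before the sequence repeats.
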
@@ -834,46 +833,18 @@ is either an index to the entry part, or one of two markers:
 Hash table size (``h_size``) is selected relative to the maximum number
 of inserted elements ``N`` (equal to ``e_size`` in practice) in two steps:
 
-#. A temporary value ``T`` is selected relative to the number of entries,
-   as ``c * N`` where ``c`` is currently about 1.2.
-
-#. ``T`` is rounded upwards to the closest prime from a pre-generated
-   list of primes with an approximately fixed prime-to-prime ratio.
-
-   The list of primes is generated by ``genhashsizes.py``, and is encoded
-   in a bit packed format, decoded on the fly. See ``genhashsizes.py``
-   for details.
-
-   The fact that the hash table size is a prime simplifies probe sequence
-   handling: it is easy to select probe steps which are guaranteed to
-   cover all entries of the hash table.
-
-The ratio between successive primes is currently about 1.15.
-As a result, the hash table size is about 1.2-1.4 times larger than
-the maximum number of properties in the entry part. This implies a
-maximum hash table load factor of about 72-83%.
-
-The current minimum prime used is 17.
+#. Find lowest N so that ``2 ** N >= e_size``.
+
+#. Use ``2 ** (N + 1)`` as hash size. This guarantees load factor is
+   lower than 0.5 after resize.
 
 The probe sequence for a certain key is guaranteed to walk through every
-hash table entry, and is generated as follows:
-
-#. The initial hash index is computed directly from the string hash,
-   modulo hash table size as: ``I = string_hash % h_size``.
-
-#. The probe step is then selected from a pre-generated table of 32
-   probe steps as: ``S = probe_steps[string_hash % 32]``.
-
-   The probe steps are guaranteed to be non-zero and relatively prime
-   to all precomputed hash table size primes. See ``genhashsizes.py``.
-
-   Currently the precomputed steps are small primes which are not present
-   in the precomputed hash size primes list. Technically they don't need
-   to be primes (or small), as long as they are relatively prime to all
-   possible hash table sizes, i.e. ``gcd(S, h_size)=1``, to guarantee that
-   the probe sequence walks through all entries of the hash.
-
-#. The probe sequence is: ``(X + i*S) % h_size`` where i=0,1,...,h_size-1.
+hash table entry. Currently the probe sequence is simply:
+
+* ``(X + i) % h_size`` where i=0,1,...,h_size-1.
+
+This isn't ideal for avoiding clustering (double hashing would be better)
+but is cache friendly and works well enough with low load factors.
 
 When looking up an element from the hash table, we walk through the probe
 sequence looking at the hash table entries. If a UNUSED entry is found, the
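
The new rules are easy to express in code. A hedged sketch
(``select_hash_size`` and ``probe_index`` are illustrative helper names,
not Duktape's API; the logic is just the two sizing steps and the
``(X + i) % h_size`` sequence from the hunk above)::

  #include <stdint.h>

  /* Sizing: lowest 2**N >= e_size, then doubled, keeping the load factor
   * below 0.5 right after a resize.  (Sketch only: no overflow guard.)
   */
  uint32_t select_hash_size(uint32_t e_size) {
      uint32_t p = 1;
      while (p < e_size) {
          p <<= 1;                /* step 1: find lowest 2**N >= e_size */
      }
      return p << 1;              /* step 2: use 2**(N + 1) as hash size */
  }

  /* Probe position i for a key: with a power-of-2 h_size the modulus
   * reduces to a bit mask, and consecutive slots are cache friendly.
   */
  uint32_t probe_index(uint32_t string_hash, uint32_t i, uint32_t h_size) {
      return (string_hash + i) & (h_size - 1);   /* == (X + i) % h_size */
  }

For example, ``select_hash_size(20)`` returns 64 (``2 ** 5 = 32 >= 20``,
doubled), giving a post-resize load factor of at most about 0.31.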