
Internal doc updates for object hash

pull/1284/head
Sami Vaarala, 8 years ago
commit 6f583677af
doc/hobject-design.rst (51 lines changed)

@@ -731,12 +731,12 @@ lookups::
     +---------+
     |    0    |
     +---------+   UNUSED  = DUK_HOBJECT_HASHIDX_UNUSED
     | UNUSED  |           = 0xffffffffU
     +---------+
     | DELETED |   DELETED = DUK_HOBJECT_HASHIDX_DELETED
     +---------+           = 0xfffffffeU
     | UNUSED  |
     +---------+
 
     DELETED entries don't terminate hash
     probe sequences, UNUSED entries do.
 
-    Here, e_size = 5, e_next = 3, h_size = 7.
+    Here, e_size = 5, e_next = 3, h_size = 8.
.. FIXME for some unknown reason the illustration breaks with pandoc
@@ -815,8 +815,7 @@ Hash part details
 The hash part maps a key ``K`` to an index ``I`` of the entry part or
 indicates that ``K`` does not exist.  The hash part uses a `closed hash
 table`__, i.e. the hash table has a fixed size and a certain key has
-multiple possible locations in a *probe sequence*.  The current probe
-sequence uses a variant of *double hashing*.
+multiple possible locations in a *probe sequence*.
 
 __ http://en.wikipedia.org/wiki/Hash_table#Open_addressing
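The closed-hashing lookup rule described above (together with the UNUSED/DELETED markers from the first hunk) can be sketched in Python. This is an illustrative model, not Duktape's actual C code: the function name, the linear probe step, and the demo values are assumptions; only the marker semantics come from the document.

```python
# Sketch of closed-hash lookup with UNUSED/DELETED markers.
# Constants mirror DUK_HOBJECT_HASHIDX_{UNUSED,DELETED}; the probe
# step here is a simple linear step for brevity.
UNUSED = 0xFFFFFFFF   # terminates a probe sequence: key is absent
DELETED = 0xFFFFFFFE  # does NOT terminate a probe sequence: keep probing

def hash_lookup(hash_part, entries, key, string_hash):
    h_size = len(hash_part)
    x = string_hash % h_size
    for i in range(h_size):
        slot = hash_part[(x + i) % h_size]
        if slot == UNUSED:
            return None           # key definitely not in the table
        if slot == DELETED:
            continue              # deleted hole, probe sequence continues
        if entries[slot] == key:  # slot is an index into the entry part
            return slot
    return None

# Tiny demo: a DELETED hole sits on "foo"'s first probe position,
# so the lookup must probe past it to find the entry.
entries = ["foo", "bar", "baz"]
hash_part = [UNUSED] * 8
hash_part[1] = DELETED
hash_part[2] = 0                  # "foo" lives one probe step later

print(hash_lookup(hash_part, entries, "foo", 1))  # 0 (found past DELETED)
print(hash_lookup(hash_part, entries, "bar", 5))  # None (hit UNUSED)
```

If DELETED entries terminated probe sequences, deleting one property could make unrelated live properties unreachable; that is why only UNUSED stops the walk.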
@@ -834,46 +833,18 @@ is either an index to the entry part, or one of two markers:
 Hash table size (``h_size``) is selected relative to the maximum number
 of inserted elements ``N`` (equal to ``e_size`` in practice) in two steps:
 
-#. A temporary value ``T`` is selected relative to the number of entries,
-   as ``c * N`` where ``c`` is currently about 1.2.
-
-#. ``T`` is rounded upwards to the closest prime from a pre-generated
-   list of primes with an approximately fixed prime-to-prime ratio.
-   The list of primes is generated by ``genhashsizes.py``, and is encoded
-   in a bit packed format, decoded on the fly.  See ``genhashsizes.py``
-   for details.
-
-   The fact that the hash table size is a prime simplifies probe sequence
-   handling: it is easy to select probe steps which are guaranteed to
-   cover all entries of the hash table.
-
-   The ratio between successive primes is currently about 1.15.
-   As a result, the hash table size is about 1.2-1.4 times larger than
-   the maximum number of properties in the entry part.  This implies a
-   maximum hash table load factor of about 72-83%.
-
-   The current minimum prime used is 17.
+#. Find lowest N so that ``2 ** N >= e_size``.
+
+#. Use ``2 ** (N + 1)`` as hash size.  This guarantees load factor is
+   lower than 0.5 after resize.
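The new power-of-two sizing rule added above is small enough to sketch directly. The function name is illustrative; the rule itself (lowest ``N`` with ``2 ** N >= e_size``, then ``2 ** (N + 1)``) is taken from the text:

```python
# Sketch of the power-of-two hash sizing rule: find the lowest N with
# 2**N >= e_size, then use 2**(N+1) as h_size, so the load factor
# right after a resize stays below 0.5.
def select_hash_size(e_size):
    n = 0
    while (1 << n) < e_size:
        n += 1
    return 1 << (n + 1)

print(select_hash_size(5))   # 2**3 = 8 >= 5, so h_size = 2**4 = 16
print(select_hash_size(16))  # 2**4 = 16 >= 16, so h_size = 2**5 = 32
```

Doubling past the smallest sufficient power of two is what buys the sub-0.5 load factor: ``e_size <= 2 ** N``, so ``e_size / h_size <= 2 ** N / 2 ** (N + 1) = 0.5``.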
 The probe sequence for a certain key is guaranteed to walk through every
-hash table entry, and is generated as follows:
-
-#. The initial hash index is computed directly from the string hash,
-   modulo hash table size as: ``I = string_hash % h_size``.
-
-#. The probe step is then selected from a pre-generated table of 32
-   probe steps as: ``S = probe_steps[string_hash % 32]``.
-
-   The probe steps are guaranteed to be non-zero and relatively prime
-   to all precomputed hash table size primes.  See ``genhashsizes.py``.
-
-   Currently the precomputed steps are small primes which are not present
-   in the precomputed hash size primes list.  Technically they don't need
-   to be primes (or small), as long as they are relatively prime to all
-   possible hash table sizes, i.e. ``gcd(S, h_size) = 1``, to guarantee that
-   the probe sequence walks through all entries of the hash.
-
-#. The probe sequence is: ``(X + i*S) % h_size`` where i=0,1,...,h_size-1.
+hash table entry.  Currently the probe sequence is simply:
+
+* ``(X + i) % h_size`` where i=0,1,...,h_size-1.
+
+This isn't ideal for avoiding clustering (double hashing would be better)
+but is cache friendly and works well enough with low load factors.
When looking up an element from the hash table, we walk through the probe
sequence looking at the hash table entries. If a UNUSED entry is found, the
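Both probe schemes in the hunk above rely on the same invariant: the sequence ``(X + i*S) % h_size`` visits every slot exactly once iff ``gcd(S, h_size) = 1``. A small sketch verifying this; the concrete sizes and steps are illustrative, not Duktape's generated tables:

```python
from math import gcd

def probe_sequence(x, s, h_size):
    # (X + i*S) % h_size for i = 0, 1, ..., h_size-1
    return [(x + i * s) % h_size for i in range(h_size)]

# Old scheme: prime h_size (e.g. 17) with a step relatively prime to it,
# so the sequence is a permutation of all slots.
assert gcd(3, 17) == 1
assert sorted(probe_sequence(5, 3, 17)) == list(range(17))

# New scheme: power-of-two h_size with linear probing (S = 1), which
# trivially satisfies gcd(1, h_size) == 1 for any h_size.
assert sorted(probe_sequence(5, 1, 16)) == list(range(16))

# Counterexample: an even step over a power-of-two table shares a factor
# with h_size and only ever touches half of the slots.
assert len(set(probe_sequence(5, 2, 16))) == 8
print("probe coverage invariant holds")
```

This is why the old design needed a carefully generated step table (steps coprime to every possible prime size) while the new design gets full coverage for free by fixing ``S = 1``.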
