@@ -731,12 +731,12 @@ lookups::
   | 0       |             = 0xffffffffU
   +---------+
   | UNUSED  |
-  +---------+
-  | UNUSED  |     DELETED = DUK_HOBJECT_HASHIDX_DELETED
-  +---------+             = 0xfffffffeU
+  +---------+     DELETED = DUK_HOBJECT_HASHIDX_DELETED
+  | UNUSED  |             = 0xfffffffeU
+  +---------+
                   DELETED entries don't terminate hash
                   probe sequences, UNUSED entries do.
 
-  Here, e_size = 5, e_next = 3, h_size = 7.
+  Here, e_size = 5, e_next = 3, h_size = 8.
 
 .. FIXME for some unknown reason the illustration breaks with pandoc
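
To make the marker semantics concrete, a minimal C sketch of a hash part
lookup; this is not Duktape's actual code (``hash_find``, ``hashidx`` and
``entry_keys`` are hypothetical names, only the marker values come from the
illustration above)::

  #include <stdint.h>
  #include <string.h>

  #define HASHIDX_UNUSED  0xffffffffUL   /* unused slot: ends the probe sequence */
  #define HASHIDX_DELETED 0xfffffffeUL   /* deleted slot: probing continues past it */

  /* Walk the probe sequence; return an entry part index, or -1 if the key
   * is not present.  'start' is the initial probe position; a linear step
   * is used here, the termination rules are the point of the sketch.
   */
  int64_t hash_find(const uint32_t *hashidx, uint32_t h_size,
                    const char *const *entry_keys, const char *key,
                    uint32_t start) {
      uint32_t i = start;
      uint32_t n;

      for (n = 0; n < h_size; n++) {
          uint32_t e = hashidx[i];
          if (e == HASHIDX_UNUSED) {
              return -1;           /* UNUSED terminates the probe sequence */
          }
          if (e != HASHIDX_DELETED && strcmp(entry_keys[e], key) == 0) {
              return (int64_t) e;  /* found: index into the entry part */
          }
          i = (i + 1) % h_size;    /* next position in the probe sequence */
      }
      return -1;                   /* probed the entire table */
  }

The DELETED marker is needed because clearing a slot outright could cut a
probe sequence short and hide entries inserted past the removed key.
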
@@ -815,8 +815,7 @@ Hash part details
 
 The hash part maps a key ``K`` to an index ``I`` of the entry part or
 indicates that ``K`` does not exist. The hash part uses a `closed hash
 table`__, i.e. the hash table has a fixed size and a certain key has
-multiple possible locations in a *probe sequence*. The current probe
-sequence uses a variant of *double hashing*.
+multiple possible locations in a *probe sequence*.
 
 __ http://en.wikipedia.org/wiki/Hash_table#Open_addressing
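
The *double hashing* variant referred to by the removed sentence is the
scheme deleted in the next hunk: both the initial index and the probe step
are derived from the string hash, and ``gcd(S, h_size) = 1`` ensures the
sequence covers the whole table. A small self-contained sketch of that idea
(the ``probe_steps`` values below are illustrative stand-ins, not Duktape's
actual table; see ``genhashsizes.py``)::

  #include <stdio.h>
  #include <stdint.h>

  static const uint32_t probe_steps[32] = {
      3, 5, 7, 11, 13, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61,
      67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131, 137, 139
  };

  int main(void) {
      uint32_t h_size = 17;                 /* minimum prime of the old scheme */
      uint32_t string_hash = 0xdeadbeefUL;  /* stand-in for a real string hash */
      uint32_t idx = string_hash % h_size;  /* I: initial probe index */
      uint32_t step = probe_steps[string_hash % 32];
      uint32_t n;

      for (n = 0; n < h_size; n++) {
          /* gcd(step, h_size) == 1, so every slot is visited exactly once */
          printf("probe %2u -> slot %2u\n", (unsigned) n, (unsigned) idx);
          idx = (idx + step) % h_size;
      }
      return 0;
  }

With this input the step is 61 and the loop prints each of the 17 slots
exactly once before the sequence repeats.
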
@@ -834,46 +833,18 @@ is either an index to the entry part, or one of two markers:
 Hash table size (``h_size``) is selected relative to the maximum number
 of inserted elements ``N`` (equal to ``e_size`` in practice) in two steps:
 
-#. A temporary value ``T`` is selected relative to the number of entries,
-   as ``c * N`` where ``c`` is currently about 1.2.
-
-#. ``T`` is rounded upwards to the closest prime from a pre-generated
-   list of primes with an approximately fixed prime-to-prime ratio.
-
-   The list of primes is generated by ``genhashsizes.py``, and is encoded
-   in a bit packed format, decoded on the fly. See ``genhashsizes.py``
-   for details.
-
-   The fact that the hash table size is a prime simplifies probe sequence
-   handling: it is easy to select probe steps which are guaranteed to
-   cover all entries of the hash table.
-
-The ratio between successive primes is currently about 1.15.
-As a result, the hash table size is about 1.2-1.4 times larger than
-the maximum number of properties in the entry part. This implies a
-maximum hash table load factor of about 72-83%.
-
-The current minimum prime used is 17.
+#. Find lowest N so that ``2 ** N >= e_size``.
+
+#. Use ``2 ** (N + 1)`` as hash size. This guarantees load factor is
+   lower than 0.5 after resize.
 
 The probe sequence for a certain key is guaranteed to walk through every
-hash table entry, and is generated as follows:
-
-#. The initial hash index is computed directly from the string hash,
-   modulo hash table size as: ``I = string_hash % h_size``.
-
-#. The probe step is then selected from a pre-generated table of 32
-   probe steps as: ``S = probe_steps[string_hash % 32]``.
-
-   The probe steps are guaranteed to be non-zero and relatively prime
-   to all precomputed hash table size primes. See ``genhashsizes.py``.
-
-   Currently the precomputed steps are small primes which are not present
-   in the precomputed hash size primes list. Technically they don't need
-   to be primes (or small), as long as they are relatively prime to all
-   possible hash table sizes, i.e. ``gcd(S, h_size)=1``, to guarantee that
-   the probe sequence walks through all entries of the hash.
-
-#. The probe sequence is: ``(X + i*S) % h_size`` where i=0,1,...,h_size-1.
+hash table entry. Currently the probe sequence is simply:
+
+* ``(X + i) % h_size`` where i=0,1,...,h_size-1.
+
+This isn't ideal for avoiding clustering (double hashing would be better)
+but is cache friendly and works well enough with low load factors.
 
 When looking up an element from the hash table, we walk through the probe
 sequence looking at the hash table entries. If a UNUSED entry is found, the
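
The new rules are easy to express in code. A hedged sketch
(``select_hash_size`` and ``probe_index`` are illustrative helper names,
not Duktape's API; the logic is just the two sizing steps and the
``(X + i) % h_size`` sequence from the hunk above)::

  #include <stdint.h>

  /* Sizing: lowest 2**N >= e_size, then doubled, keeping the load factor
   * below 0.5 right after a resize.  (Sketch only: no overflow guard.)
   */
  uint32_t select_hash_size(uint32_t e_size) {
      uint32_t p = 1;
      while (p < e_size) {
          p <<= 1;                /* step 1: find lowest 2**N >= e_size */
      }
      return p << 1;              /* step 2: use 2**(N + 1) as hash size */
  }

  /* Probe position i for a key: with a power-of-2 h_size the modulus
   * reduces to a bit mask, and consecutive slots are cache friendly.
   */
  uint32_t probe_index(uint32_t string_hash, uint32_t i, uint32_t h_size) {
      return (string_hash + i) & (h_size - 1);   /* == (X + i) % h_size */
  }

For example, ``select_hash_size(20)`` returns 64 (``2 ** 5 = 32 >= 20``,
doubled), giving a post-resize load factor of at most about 0.31.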