Discussion on double-to-fastint conversion check

10 years ago · 483a82ce9b
1 changed file with 269 additions and 0 deletions
--- a/doc/tagged-integer-type.rst
+++ b/doc/tagged-integer-type.rst
@@ -199,3 +199,272 @@ In addition to these, user code may have some practical dependencies, such as:
  Signed 41.6 fixed point provides a fractional increment of 0.015625;
  for the scheduler, this would mean about 15.6ms resolution, which is not
  that great.
+
+Efficient check for double-to-fastint conversion
+================================================
+
+Criteria
+--------
+
+For an IEEE double to be representable as a fast integer, it must be:
+
+* A whole number
+
+* In the signed 48-bit range, i.e. [-2^47, 2^47 - 1]
+
+* Not a negative zero, assuming that the integer zero is taken to represent
+  a positive zero
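
As a baseline, the three criteria can be checked with ordinary floating point operations. The sketch below is a slow reference version that could be used to cross-check the bit-level algorithms. The function name `is_fastint_reference` is hypothetical. The sketch assumes that fastints are signed 48-bit integers in the range [-2^47, 2^47 - 1].

```c
#include <math.h>
#include <stdint.h>

/* Slow reference check for the fastint criteria (hypothetical helper).
 * Assumes fastints are signed 48-bit integers: [-2^47, 2^47 - 1]. */
int is_fastint_reference(double d) {
    /* Range check; NaN fails both comparisons and is rejected here too,
     * as are infinities. */
    if (!(d >= -140737488355328.0 && d <= 140737488355327.0)) {
        return 0;
    }
    /* Whole number check: the cast is safe because d is within int64 range. */
    if ((double) (int64_t) d != d) {
        return 0;
    }
    /* -0.0 == 0.0 compares true, so look at the sign bit explicitly. */
    if (d == 0.0 && signbit(d)) {
        return 0;
    }
    return 1;
}
```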
+
+What to optimize for
+--------------------
+
+This algorithm is needed when Duktape:
+
+* Parses a number and checks whether to represent the number as a double or
+  a fastint
+
+* Executes internal code with no fastint handling; in this case any fastint
+  inputs are first coerced to doubles and then back to fastints if the result
+  fits
+
+* Executes internal code with fastint handling, with one or more of the
+  inputs not matching the fastint "fast path" but the result possibly fitting
+  into a fastint
+
+The "fast path" for fastint operations doesn't execute this algorithm because
+both inputs and outputs are fastints and Duktape detects this in the fast path
+preconditions.  Given this, an aggressive memory-speed tradeoff (e.g. a table
+for each exponent) doesn't make sense.
+
+The speed of this algorithm affects two scenarios:
+
+1. Computations where the numbers involved are outside the fastint range.  Here
+   it's important to quickly determine that a fastint representation is not
+   possible.
+
+2. Computations where the numbers can be represented as fastints (at least some
+   of the time), but one or more operations don't have a fastint "fast path" so
+   that the numbers get upgraded to an IEEE double and then need to be downgraded
+   back to a fastint.
+
+Both cases matter, but for typical embedded code the latter case matters more.
+In other words, the code should be optimized for the case where a fastint fit
+is possible.
+
+Exponent and sign by cases
+--------------------------
+
+An IEEE double has a sign (1 bit), an exponent (11 bits, biased by 1023), and
+a 52-bit stored mantissa.  The mantissa has an implicit (not stored) leading
+'1' digit, except when the exponent is 0 (zeroes and denormals).  The exponent
+value 2047 is reserved for infinities and NaNs.
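
The following minimal sketch shows how to extract the three fields in C. It assumes IEEE doubles and that `double` and `uint64_t` share the same byte order. The `double_parts` struct and the `split_double` function are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the raw IEEE double fields. */
typedef struct {
    unsigned int sgn;   /* 1 bit */
    unsigned int exp;   /* 11 bits, biased by 1023 */
    uint64_t mant;      /* 52 stored bits; the implicit leading 1 is not included */
} double_parts;

double_parts split_double(double d) {
    uint64_t bits;
    double_parts p;

    /* Assumes double and uint64_t use the same byte order. */
    memcpy(&bits, &d, sizeof(bits));
    p.sgn = (unsigned int) (bits >> 63);
    p.exp = (unsigned int) ((bits >> 52) & 0x7ffU);
    p.mant = bits & 0x000fffffffffffffULL;
    return p;
}
```

For example, 3.0 is 1.5 * 2^1, so its biased exponent is 1024 and only the highest stored mantissa bit (bit 51) is set.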
+
+Going through the possible exponent values:
+
+* If the exponent is 0:
+
+  - The number is a fastint only if the sign bit is zero (positive) and the
+    entire mantissa is zero.  This corresponds to +0.
+
+  - If the mantissa is non-zero, the number is a denormal, which is never a
+    fastint.
+
+* If the exponent is in the range [1, 1022] the number is not a fastint: even
+  with the largest such exponent, 1022, the implicit mantissa bit corresponds
+  to 0.5, so the number is a non-zero fraction with magnitude below 1.
+
+* If the exponent is exactly 1023:
+
+  - The number is only a fastint if the stored mantissa is all zeroes.
+    This corresponds to +/- 1.
+
+* If the exponent is exactly 1024:
+
+  - The number is only a fastint if the 51 lowest bits of the mantissa are
+    all zeroes.  This corresponds to the numbers +/- 2 and +/- 3.
+
+* Generalizing, if the exponent is in the range [1023,1069], the number is
+  a fastint if and only if:
+
+  - The lowest N bits of the mantissa are zero, where N = 52 - (exp - 1023).
+    The sign bit may be either 0 or 1.
+
+  - N can also be expressed as: N = 1075 - exp.
+
+* If the exponent is exactly 1070:
+
+  - The number is only a fastint if the sign bit is set (negative) and the
+    stored mantissa is all zeroes.  This corresponds to -2^47.  The positive
+    counterpart +2^47 does not fit into the fastint range.
+
+* If the exponent is in the range [1071,2047] the number is never a fastint:
+
+  - For exponents [1071,2046] the number is too large to be a fastint.
+
+  - For exponent 2047 the number is an infinity (zero mantissa) or a NaN
+    (non-zero mantissa), neither of which is a valid fastint.
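
The rule for the common case can be tried out on concrete values. The sketch below computes N = 1075 - exp and tests the lowest N mantissa bits. It returns -1 when the exponent falls outside [1023, 1069], because one of the special cases applies there instead. The helper names are hypothetical. The sketch assumes IEEE doubles with the same byte order as `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* Nonzero if the lowest n bits of 'mant' are zero; assumes 0 < n <= 52. */
static int low_bits_zero(uint64_t mant, int n) {
    return (mant & ((1ULL << n) - 1ULL)) == 0;
}

/* Applies the "exponent in [1023, 1069]" rule with N = 1075 - exp.
 * Returns -1 if the exponent is outside that range. */
int middle_range_rule(double d) {
    uint64_t bits;
    int exp;

    memcpy(&bits, &d, sizeof(bits));
    exp = (int) ((bits >> 52) & 0x7ff);
    if (exp < 1023 || exp > 1069) {
        return -1;
    }
    return low_bits_zero(bits & 0x000fffffffffffffULL, 1075 - exp);
}
```

Take 3.0, which has exponent 1024 and N = 51. Only mantissa bit 51 is set, so the check passes. For 2.5, which also has exponent 1024, bit 50 is set and the check fails.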
+
+Pseudocode 1
+------------
+
+The algorithm::
+
+    is_fastint(sgn, exp, mant):
+        if exp == 0:
+            return sgn == 0 and mzero(mant, 52)
+        else if exp < 1023:
+            return false
+        else if exp < 1070:
+            return mzero(mant, 1075 - exp)
+        else if exp == 1070:
+            return sgn == 1 and mzero(mant, 52)
+        else:
+            return false
+
+The ``mzero`` helper predicate returns true if the mantissa given has its
+lowest ``n`` bits zero.
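
Below is a direct C transcription of Pseudocode 1, with `mzero` implemented as a mask test. It takes fields that have already been extracted: the sign bit, the biased exponent, and the 52-bit stored mantissa. The function names are hypothetical, and `mzero` assumes 0 < n <= 52.

```c
#include <stdint.h>

/* mzero(): nonzero if the lowest n bits of mant are zero (0 < n <= 52). */
static int mzero(uint64_t mant, int n) {
    return (mant & ((1ULL << n) - 1ULL)) == 0;
}

/* Transcription of Pseudocode 1 on raw IEEE fields. */
int is_fastint_fields(unsigned int sgn, int exp, uint64_t mant) {
    if (exp == 0) {
        return sgn == 0 && mzero(mant, 52);   /* only +0 */
    } else if (exp < 1023) {
        return 0;                             /* 0 < |x| < 1 */
    } else if (exp < 1070) {
        return mzero(mant, 1075 - exp);       /* the common case */
    } else if (exp == 1070) {
        return sgn == 1 && mzero(mant, 52);   /* only -2^47 */
    } else {
        return 0;                             /* too large, NaN, infinity */
    }
}
```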
+
+Non-zero integers in the fastint range will fall into the case where a certain
+computed number of low mantissa bits must be checked to be zero.  As discussed
+above, the algorithm should be optimized for the "input fits fastint" case.
+
+Pseudocode 2
+------------
+
+Some rewriting::
+
+    is_fastint(sgn, exp, mant):
+        nzero = 1075 - exp
+        if nzero >= 6 and nzero <= 52:  // exp 1069 ... exp 1023
+            // exponents 1023 to 1069: regular handling, common case
+            return mzero(mant, nzero)
+        else if nzero == 1075:
+            // exponent 0: irregular handling, but still common (positive zero)
+            return sgn == 0 and mzero(mant, 52)
+        else if nzero == 5:
+            // exponent 1070: irregular handling, rare case
+            return sgn == 1 and mzero(mant, 52)
+        else:
+            // exponents [1,1022] and [1071,2047], rare case
+            return false
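
Pseudocode 2 might be transcribed to C as shown below. The sketch takes one liberty beyond the pseudocode: it folds the two comparisons 6 <= nzero <= 52 into a single unsigned comparison, which is a common range-check idiom. Values of nzero below 6 wrap around to large unsigned numbers and fail the test. The function name is hypothetical. The sketch assumes IEEE doubles that share the byte order of `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

int is_fastint_nzero(double d) {
    uint64_t bits;
    uint64_t mant;
    unsigned int sgn;
    int nzero;

    memcpy(&bits, &d, sizeof(bits));
    sgn = (unsigned int) (bits >> 63);
    mant = bits & 0x000fffffffffffffULL;
    nzero = 1075 - (int) ((bits >> 52) & 0x7ff);

    /* 6 <= nzero <= 52 as one unsigned comparison */
    if ((unsigned int) (nzero - 6) <= 46U) {
        /* exponents 1023 to 1069: regular handling, common case */
        return (mant & ((1ULL << nzero) - 1ULL)) == 0;
    } else if (nzero == 1075) {
        /* exponent 0: only +0 qualifies */
        return sgn == 0 && mant == 0;
    } else if (nzero == 5) {
        /* exponent 1070: only -2^47 qualifies */
        return sgn == 1 && mant == 0;
    } else {
        /* exponents [1,1022] and [1071,2047] */
        return 0;
    }
}
```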
+
+C algorithm with a lookup table
+-------------------------------
+
+The common case ``nzero`` values lie in the range [6, 52] and correspond to
+mantissa masks.  Compute a mask index instead as nzero - 6 = 1069 - exp::
+
+    static const duk_uint64_t mzero_masks[47] = {
+        0x000000000000003fULL,  /* exp 1069, nzero 6 */
+        0x000000000000007fULL,  /* exp 1068, nzero 7 */
+        0x00000000000000ffULL,  /* exp 1067, nzero 8 */
+        0x00000000000001ffULL,  /* exp 1066, nzero 9 */
+        /* ... */
+        0x0003ffffffffffffULL,  /* exp 1025, nzero 50 */
+        0x0007ffffffffffffULL,  /* exp 1024, nzero 51 */
+        0x000fffffffffffffULL,  /* exp 1023, nzero 52 */
+    };
+
+    int is_fastint(duk_int64_t d) {
+        /* d holds the IEEE double bit pattern; d < 0 iff the sign bit is set. */
+        duk_uint64_t mant = (duk_uint64_t) d & 0x000fffffffffffffULL;
+        int exp = (int) (((duk_uint64_t) d >> 52) & 0x07ffU);
+        int idx = 1069 - exp;
+
+        if (idx >= 0 && idx <= 46) {  /* exponents 1069 to 1023 */
+            return (mzero_masks[idx] & mant) == 0;
+        } else if (idx == 1069) {  /* exponent 0 */
+            return (d >= 0) && (mant == 0);
+        } else if (idx == -1) {  /* exponent 1070 */
+            return (d < 0) && (mant == 0);
+        } else {
+            return 0;
+        }
+    }
+
+The memory cost of the mask table is 8x47 = 376 bytes.  This can be halved
+e.g. by using a table of 32-bit values with separate cases for nzero >= 32
+and nzero < 32.
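
The halved table could be realized as follows. The sketch assumes that the double has already been split into its high and low 32-bit words, where `hi` holds the sign, the exponent, and the top 20 mantissa bits. For nzero < 32, an entry masks the low word. For nzero >= 32, the low word must be entirely zero, and the entry masks the low bits of the high word. The table and function names are hypothetical. At 47 entries of 4 bytes each, the table takes 188 bytes.

```c
#include <stdint.h>

/* Index is nzero - 6 = 1069 - exp. */
static const uint32_t mzero_masks32[47] = {
    /* nzero 6..31: mask applies to the low word */
    0x0000003fUL, 0x0000007fUL, 0x000000ffUL, 0x000001ffUL,  /* 6-9 */
    0x000003ffUL, 0x000007ffUL, 0x00000fffUL, 0x00001fffUL,  /* 10-13 */
    0x00003fffUL, 0x00007fffUL, 0x0000ffffUL, 0x0001ffffUL,  /* 14-17 */
    0x0003ffffUL, 0x0007ffffUL, 0x000fffffUL, 0x001fffffUL,  /* 18-21 */
    0x003fffffUL, 0x007fffffUL, 0x00ffffffUL, 0x01ffffffUL,  /* 22-25 */
    0x03ffffffUL, 0x07ffffffUL, 0x0fffffffUL, 0x1fffffffUL,  /* 26-29 */
    0x3fffffffUL, 0x7fffffffUL,                              /* 30-31 */
    /* nzero 32..52: low word must be zero, mask applies to the high word */
    0x00000000UL, 0x00000001UL, 0x00000003UL, 0x00000007UL,  /* 32-35 */
    0x0000000fUL, 0x0000001fUL, 0x0000003fUL, 0x0000007fUL,  /* 36-39 */
    0x000000ffUL, 0x000001ffUL, 0x000003ffUL, 0x000007ffUL,  /* 40-43 */
    0x00000fffUL, 0x00001fffUL, 0x00003fffUL, 0x00007fffUL,  /* 44-47 */
    0x0000ffffUL, 0x0001ffffUL, 0x0003ffffUL, 0x0007ffffUL,  /* 48-51 */
    0x000fffffUL                                             /* 52 */
};

int is_fastint_tab32(uint32_t hi, uint32_t lo) {
    int exp = (int) ((hi >> 20) & 0x7ff);
    int idx = 1069 - exp;

    if (idx >= 0 && idx <= 25) {          /* nzero 6..31 */
        return (lo & mzero_masks32[idx]) == 0;
    } else if (idx >= 26 && idx <= 46) {  /* nzero 32..52 */
        return lo == 0 && (hi & mzero_masks32[idx]) == 0;
    } else if (idx == 1069) {             /* exponent 0: +0 only */
        return hi == 0 && lo == 0;
    } else if (idx == -1) {               /* exponent 1070: -2^47 only */
        return (hi & 0x800fffffUL) == 0x80000000UL && lo == 0;
    }
    return 0;
}
```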
+
+Unfortunately the expected case (exponents 1023 to 1069) involves a mask
+check with a variable mask, so it may be unsuitable for direct inlining in
+the most important hot spots.
+
+C algorithm with a computed mask
+--------------------------------
+
+Since this algorithm only runs outside the proper fastint "fast path", it
+may be more sensible to avoid the memory cost and compute the masks::
+
+    int is_fastint(duk_int64_t d) {
+        /* d holds the IEEE double bit pattern; d < 0 iff the sign bit is set. */
+        duk_uint64_t mant = (duk_uint64_t) d & 0x000fffffffffffffULL;
+        int exp = (int) (((duk_uint64_t) d >> 52) & 0x07ffU);
+        int shift = exp - 1023;
+
+        if (shift >= 0 && shift <= 46) {  /* exponents 1023 to 1069 */
+            return ((0x000fffffffffffffULL >> shift) & mant) == 0;
+        } else if (shift == -1023) {  /* exponent 0 */
+            return (d >= 0) && (mant == 0);
+        } else if (shift == 47) {  /* exponent 1070 */
+            return (d < 0) && (mant == 0);
+        } else {
+            return 0;
+        }
+    }
+
+On middle endian platforms (e.g. some older ARM configurations) this
+algorithm first needs to swap the 32-bit halves of the double.  Changing the
+mask checks to operate on 32-bit halves would make the algorithm work on
+platforms without 64-bit integer types, and would also remove the need for
+swapping the halves on middle endian platforms.
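
The following sketch illustrates why a 32-bit formulation avoids the swap. Such an algorithm only needs the IEEE high and low words, and on a middle endian platform it can simply pick them up in the opposite order. Nothing has to be swapped back into a 64-bit value first. Here `raw` is assumed to be the double's 8 bytes read as a little endian `uint64_t`. The names `dbl_words` and `words_from_raw` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical pair of IEEE words: hi = sign, exponent and top 20 mantissa
 * bits; lo = low 32 mantissa bits. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} dbl_words;

/* On a little endian platform the IEEE high word is the upper half of 'raw';
 * on a middle endian platform the 32-bit halves are stored swapped, so the
 * high word is the lower half.  Either way no 64-bit swap is needed. */
dbl_words words_from_raw(uint64_t raw, int middle_endian) {
    dbl_words w;
    if (middle_endian) {
        w.hi = (uint32_t) raw;
        w.lo = (uint32_t) (raw >> 32);
    } else {
        w.hi = (uint32_t) (raw >> 32);
        w.lo = (uint32_t) raw;
    }
    return w;
}
```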
+
+C algorithm with 32-bit operations and a computed mask
+------------------------------------------------------
+
+::
+
+    int is_fastint(duk_uint32_t hi, duk_uint32_t lo) {
+        int exp = (hi >> 20) & 0x07ff;
+        int shift = exp - 1023;
+
+        if (shift >= 0 && shift <= 46) {  /* exponents 1023 to 1069 */
+            if (shift <= 20) {
+                /* 0x000fffff'ffffffff -> 0x00000000'ffffffff */
+                return (((0x000fffffUL >> shift) & hi) == 0) && (lo == 0);
+            } else {
+                /* 0x00000000'ffffffff -> 0x00000000'0000003f */
+                return (((0xffffffffUL >> (shift - 20)) & lo) == 0);
+            }
+        } else if (shift == -1023) {  /* exponent 0 */
+            /* return ((hi & 0x800fffffUL) == 0x00000000UL) && (lo == 0); */
+            return (hi == 0) && (lo == 0);
+        } else if (shift == 47) {  /* exponent 1070 */
+            return ((hi & 0x800fffffUL) == 0x80000000UL) && (lo == 0);
+        } else {
+            return 0;
+        }
+    }
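
The 32-bit variant is easy to get subtly wrong near the word boundary (shift 20). One way to gain confidence is to compare it against the 64-bit computed-mask variant. The sketch below restates both variants, passing the IEEE bit pattern as an unsigned 64-bit value instead of `duk_int64_t`. It adds a hypothetical `check_both` helper that returns -1 if the two variants disagree. The sketch assumes IEEE doubles that share the byte order of `uint64_t`.

```c
#include <stdint.h>
#include <string.h>

/* 64-bit computed-mask variant; d is the raw IEEE bit pattern. */
static int is_fastint64(uint64_t d) {
    uint64_t mant = d & 0x000fffffffffffffULL;
    int shift = (int) ((d >> 52) & 0x7ff) - 1023;

    if (shift >= 0 && shift <= 46) {         /* exponents 1023 to 1069 */
        return ((0x000fffffffffffffULL >> shift) & mant) == 0;
    } else if (shift == -1023) {             /* exponent 0: +0 only */
        return (d >> 63) == 0 && mant == 0;
    } else if (shift == 47) {                /* exponent 1070: -2^47 only */
        return (d >> 63) == 1 && mant == 0;
    }
    return 0;
}

/* 32-bit computed-mask variant on the IEEE high and low words. */
static int is_fastint32(uint32_t hi, uint32_t lo) {
    int shift = (int) ((hi >> 20) & 0x7ff) - 1023;

    if (shift >= 0 && shift <= 46) {
        if (shift <= 20) {
            return ((0x000fffffUL >> shift) & hi) == 0 && lo == 0;
        } else {
            return ((0xffffffffUL >> (shift - 20)) & lo) == 0;
        }
    } else if (shift == -1023) {
        return hi == 0 && lo == 0;
    } else if (shift == 47) {
        return (hi & 0x800fffffUL) == 0x80000000UL && lo == 0;
    }
    return 0;
}

/* Hypothetical helper: the shared result, or -1 if the variants disagree. */
int check_both(double x) {
    uint64_t bits;
    int r64, r32;

    memcpy(&bits, &x, sizeof(bits));
    r64 = is_fastint64(bits);
    r32 = is_fastint32((uint32_t) (bits >> 32), (uint32_t) bits);
    return r64 == r32 ? r64 : -1;
}
```

The test values 1048577.0 (2^20 + 1) and 1048576.5 (2^20 + 0.5) land exactly on shift 20. There the 32-bit variant checks only the low word.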
+
+
+Future work
+===========
+
+Skipping the double-to-fastint test sometimes
+---------------------------------------------
+
+The double-to-fastint check can safely err on the side of caution and decide
+to represent a fastint-compatible number as a double.  This opens up the
+possibility of skipping the double-to-fastint check in some cases, which
+may improve performance and reduce code size.
+
+For instance, when ``Math.cos()`` pushes its result on the stack, it's
+probably quite a safe bet that the number won't fit a fastint, so it could
+be written as a double directly without a double-to-fastint downgrade
+check.  If the result does happen to be fastint-compatible (-1, 0, or 1), it
+will be stored as a double and downgraded to a fastint by the first operation
+that does execute the downgrade check.  To support this, there could be a macro
+like ``DUK_TVAL_SET_NUMBER_NOFASTINT``.
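
The next sketch contrasts a checked setter with a setter in the style of `DUK_TVAL_SET_NUMBER_NOFASTINT`. It uses a hypothetical, simplified tagged value, which is not Duktape's actual `duk_tval` layout, together with the computed-mask check. An addition shows how a value stored as a double gets downgraded by the next operation that runs the check. All names here are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified tagged value; not Duktape's duk_tval. */
typedef struct {
    int is_fastint;
    int64_t i;   /* valid when is_fastint != 0 */
    double d;    /* valid when is_fastint == 0 */
} tval;

/* Computed-mask double-to-fastint check (signed 48-bit fastints). */
static int fits_fastint(double x) {
    uint64_t bits, mant;
    int shift;
    memcpy(&bits, &x, sizeof(bits));
    mant = bits & 0x000fffffffffffffULL;
    shift = (int) ((bits >> 52) & 0x7ff) - 1023;
    if (shift >= 0 && shift <= 46) {
        return ((0x000fffffffffffffULL >> shift) & mant) == 0;
    } else if (shift == -1023) {
        return (bits >> 63) == 0 && mant == 0;
    } else if (shift == 47) {
        return (bits >> 63) == 1 && mant == 0;
    }
    return 0;
}

/* Checked setter: downgrades to a fastint when the value fits. */
void tval_set_number(tval *tv, double x) {
    if (fits_fastint(x)) {
        tv->is_fastint = 1;
        tv->i = (int64_t) x;
    } else {
        tv->is_fastint = 0;
        tv->d = x;
    }
}

/* Unchecked setter in the spirit of DUK_TVAL_SET_NUMBER_NOFASTINT:
 * always stores a double, e.g. for a Math.cos() result. */
void tval_set_number_nofastint(tval *tv, double x) {
    tv->is_fastint = 0;
    tv->d = x;
}

double tval_get_number(const tval *tv) {
    return tv->is_fastint ? (double) tv->i : tv->d;
}

/* An operation without a fastint fast path: computes with doubles and
 * stores the result through the checked setter. */
void tval_add(tval *out, const tval *a, const tval *b) {
    tval_set_number(out, tval_get_number(a) + tval_get_number(b));
}
```

Here a 1.0 stored through the unchecked setter stays a double until `tval_add` runs the check on its result.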
+
+Another option is to run the double-to-fastint check randomly, or only every
+Nth time it is needed (N could be fairly large, or e.g. the prime 17).  From
+a performance point of view this should be acceptable: if a number is
+incorrectly stored as a double and is involved in a lot of operations,
+chances are it will get downgraded quite quickly, as long as the check
+interval does not happen to correlate with the pattern of operations
+performed on it (a prime N makes such a correlation less likely).  This
+approach may not be worth it, because an optimized fastint downgrade check
+should have quite reasonable performance, and such an approach would have
+no effect on the actual fastint fast path (inputs are fastints, outputs are
+fastints).
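
Below is a sketch of the "every Nth time" variant, with a hypothetical global counter and N = 17. Answering "no" is always safe, because the value then just stays a double. The full check, here the computed-mask version, runs only on every 17th call. The counter is a plain global and is not thread safe. All names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define DOWNGRADE_CHECK_INTERVAL 17U  /* hypothetical; a prime, as in the text */

static unsigned int downgrade_counter = 0;  /* not thread safe */

/* Computed-mask double-to-fastint check (signed 48-bit fastints). */
static int fits_fastint(double x) {
    uint64_t bits, mant;
    int shift;
    memcpy(&bits, &x, sizeof(bits));
    mant = bits & 0x000fffffffffffffULL;
    shift = (int) ((bits >> 52) & 0x7ff) - 1023;
    if (shift >= 0 && shift <= 46) {
        return ((0x000fffffffffffffULL >> shift) & mant) == 0;
    } else if (shift == -1023) {
        return (bits >> 63) == 0 && mant == 0;
    } else if (shift == 47) {
        return (bits >> 63) == 1 && mant == 0;
    }
    return 0;
}

/* Nonzero if x should be stored as a fastint.  Only every Nth call runs the
 * full check; other calls conservatively return 0 and the value stays a
 * double until a later check downgrades it. */
int sampled_fits_fastint(double x) {
    if (++downgrade_counter < DOWNGRADE_CHECK_INTERVAL) {
        return 0;
    }
    downgrade_counter = 0;
    return fits_fastint(x);
}
```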