diff --git a/doc/tagged-integer-type.rst b/doc/tagged-integer-type.rst
index c0542a0c..71e676aa 100644
--- a/doc/tagged-integer-type.rst
+++ b/doc/tagged-integer-type.rst
@@ -199,3 +199,272 @@ In addition to these, user code may have some practical dependencies, such as:
 Signed 41.6 fixed point provides a fractional increment of 0.015625; for the
 scheduler, this would mean about 15.6ms resolution, which is not that great.
+
+Efficient check for double-to-fastint conversion
+================================================
+
+Criteria
+--------
+
+For an IEEE double to be representable as a fast integer, it must be:
+
+* A whole number
+
+* In the signed 48-bit range, i.e. [-2^47, 2^47 - 1]
+
+* Not a negative zero, assuming that the integer zero is taken to represent
+  a positive zero
+
+What to optimize for
+--------------------
+
+This algorithm is needed when Duktape:
+
+* Parses a number and checks whether to represent the number as a double or
+  a fastint
+
+* Executes internal code with no fastint handling; in this case any fastint
+  inputs are first coerced to doubles and then back to fastints if the result
+  fits
+
+* Executes internal code with fastint handling, with one or more of the
+  inputs not matching the fastint "fast path" but the result possibly fitting
+  into a fastint
+
+The "fast path" for fastint operations doesn't execute this algorithm because
+both inputs and outputs are fastints and Duktape detects this in the fast path
+preconditions. Given this, an aggressive memory-speed tradeoff (e.g. a table
+for each exponent) doesn't make sense.
+
+The speed of this algorithm affects two scenarios:
+
+1. Computations where the numbers involved are outside the fastint range. Here
+   it's important to quickly determine that a fastint representation is not
+   possible.
+
+2. Computations where the numbers can be represented as fastints (at least some
+   of the time), but one or more operations don't have a fastint "fast path" so
+   that the numbers get upgraded to an IEEE double and then need to be downgraded
+   back to a fastint.
+
+Both cases matter, but for typical embedded code the latter case matters more.
+In other words, the code should be optimized for the case where a fastint fit
+is possible.
+
+Exponent and sign by cases
+--------------------------
+
+An IEEE double has a sign (1 bit), an exponent (11 bits), and a 52-bit stored
+mantissa. The mantissa has an implicit (not stored) leading '1' digit, except
+for zeroes, denormals, NaNs, and infinities.
+
+Going through the possible exponent values:
+
+* If the exponent is 0:
+
+  - The number is a fastint only if the sign bit is zero (positive) and the
+    entire mantissa is all zeroes. This corresponds to +0.
+
+  - If the mantissa is non-zero, the number is a denormal and not a fastint.
+
+* If the exponent is in the range [1, 1022] the number is not a fastint
+  because it is non-zero and its magnitude is below 1 (the implicit mantissa
+  bit corresponds to at most 0.5).
+
+* If the exponent is exactly 1023:
+
+  - The number is only a fastint if the stored mantissa is all zeroes.
+    This corresponds to +/- 1.
+
+* If the exponent is exactly 1024:
+
+  - The number is only a fastint if the 51 lowest bits of the mantissa are
+    all zeroes. This corresponds to the numbers +/- 2 and +/- 3.
+
+* Generalizing, if the exponent is in the range [1023,1069], the number is
+  a fastint if and only if:
+
+  - The lowest N bits of the mantissa are zero, where N = 52 - (exp - 1023),
+    with either sign.
+
+  - N can also be expressed as: N = 1075 - exp.
+
+* If the exponent is exactly 1070:
+
+  - The number is only a fastint if the sign bit is set (negative) and the
+    stored mantissa is all zeroes. This corresponds to -2^47. The positive
+    counterpart +2^47 does not fit into the fastint range.
+
+* If the exponent is in the range [1071,2047] the number is never a fastint:
+
+  - For exponents [1071,2046] the number is too large to be a fastint.
+
+  - For exponent 2047 the number is a NaN or infinity depending on the
+    mantissa contents, neither a valid fastint.
+
+Pseudocode 1
+------------
+
+The algorithm::
+
+    is_fastint(sgn, exp, mant):
+        if exp == 0:
+            return sgn == 0 and mzero(mant, 52)
+        else if exp < 1023:
+            return false
+        else if exp < 1070:
+            return mzero(mant, 1075 - exp)
+        else if exp == 1070:
+            return sgn == 1 and mzero(mant, 52)
+        else:
+            return false
+
+The ``mzero(mant, n)`` helper predicate returns true if the mantissa given
+has its lowest ``n`` bits zero.
+
+Non-zero integers in the fastint range (other than -2^47) will fall into the
+case where a certain computed number of low mantissa bits must be checked to
+be zero. As discussed above, the algorithm should be optimized for the
+"input fits fastint" case.
+
+Pseudocode 2
+------------
+
+Some rewriting::
+
+    is_fastint(sgn, exp, mant):
+        nzero = 1075 - exp
+        if nzero >= 6 and nzero <= 52:  // exp 1023 ... exp 1069
+            // exponents 1023 to 1069: regular handling, common case
+            return mzero(mant, nzero)
+        else if nzero == 1075:
+            // exponent 0: irregular handling, but still common (positive zero)
+            return sgn == 0 and mzero(mant, 52)
+        else if nzero == 5:
+            // exponent 1070: irregular handling, rare case
+            return sgn == 1 and mzero(mant, 52)
+        else:
+            // exponents [1,1022] and [1071,2047], rare case
+            return false
+
+C algorithm with a lookup table
+-------------------------------
+
+The common case ``nzero`` values are in the range [6, 52] and correspond to
+mantissa masks. Compute a mask index instead as nzero - 6 = 1069 - exp::
+
+    duk_uint64_t mzero_masks[47] = {
+        0x000000000000003fULL,  /* exp 1069, nzero 6 */
+        0x000000000000007fULL,  /* exp 1068, nzero 7 */
+        0x00000000000000ffULL,  /* exp 1067, nzero 8 */
+        0x00000000000001ffULL,  /* exp 1066, nzero 9 */
+        /* ... */
+        0x0003ffffffffffffULL,  /* exp 1025, nzero 50 */
+        0x0007ffffffffffffULL,  /* exp 1024, nzero 51 */
+        0x000fffffffffffffULL,  /* exp 1023, nzero 52 */
+    };
+
+    int is_fastint(duk_int64_t d) {
+        /* d: the 64-bit IEEE double representation as a signed integer */
+        int exp = (int) ((d >> 52) & 0x07ff);
+        int idx = 1069 - exp;
+
+        if (idx >= 0 && idx <= 46) {  /* exponents 1069 to 1023 */
+            /* masks only cover mantissa bits, so d can be masked directly */
+            return (mzero_masks[idx] & d) == 0;
+        } else if (idx == 1069) {  /* exponent 0 */
+            return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0);
+        } else if (idx == -1) {  /* exponent 1070 */
+            return (d < 0) && ((d & 0x000fffffffffffffULL) == 0);
+        } else {
+            return 0;
+        }
+    }
+
+The memory cost of the mask table is 8x47 = 376 bytes. This can be halved
+e.g. by using a table of 32-bit values with separate cases for nzero >= 32
+and nzero < 32.
+
+Unfortunately the expected case (exponents 1023 to 1069) involves a mask
+check with a variable mask, so it may be unsuitable for direct inlining in
+the most important hot spots.
+
+C algorithm with a computed mask
+--------------------------------
+
+Since this algorithm only runs outside the proper fastint "fast path" it
+may be more sensible to avoid a memory tradeoff and compute the masks::
+
+    int is_fastint(duk_int64_t d) {
+        /* d: the 64-bit IEEE double representation as a signed integer */
+        int exp = (int) ((d >> 52) & 0x07ff);
+        int shift = exp - 1023;
+
+        if (shift >= 0 && shift <= 46) {  /* exponents 1023 to 1069 */
+            return ((0x000fffffffffffffULL >> shift) & d) == 0;
+        } else if (shift == -1023) {  /* exponent 0 */
+            return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0);
+        } else if (shift == 47) {  /* exponent 1070 */
+            return (d < 0) && ((d & 0x000fffffffffffffULL) == 0);
+        } else {
+            return 0;
+        }
+    }
+
+For middle endian machines (e.g. some ARM platforms) this algorithm first
+needs swapping of the 32-bit parts. Changing the mask checks to operate on
+32-bit parts would make the algorithm work on more platforms and would also
+remove the need for swapping the parts on middle endian platforms.
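The computed-mask check above works on the raw bit pattern of the double rather than on its numeric value. The following is a minimal sketch of how a caller might obtain that pattern and run the check, assuming plain ``<stdint.h>`` types instead of the ``duk_*`` typedefs and a platform where doubles and 64-bit integers share the same byte order (so not a middle endian target). The names ``double_bits``, ``is_fastint_bits``, ``is_fastint_double`` and ``check_bit_examples`` are hypothetical and only used for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the bit pattern of a double into a 64-bit
 * integer. Assumes both types use the same byte order.
 */
static uint64_t double_bits(double x) {
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Computed-mask check on the raw bit pattern (unsigned variant). */
static int is_fastint_bits(uint64_t u) {
    const uint64_t mant_mask = 0x000fffffffffffffULL;
    int exp = (int) ((u >> 52) & 0x07ff);
    int sgn = (int) (u >> 63);
    int shift = exp - 1023;

    if (shift >= 0 && shift <= 46) {   /* exponents 1023 to 1069 */
        return ((mant_mask >> shift) & u) == 0;
    } else if (shift == -1023) {       /* exponent 0: only +0 qualifies */
        return sgn == 0 && (u & mant_mask) == 0;
    } else if (shift == 47) {          /* exponent 1070: only -2^47 */
        return sgn == 1 && (u & mant_mask) == 0;
    }
    return 0;                          /* too small, too large, NaN, inf */
}

static int is_fastint_double(double x) {
    return is_fastint_bits(double_bits(x));
}

/* A few boundary cases taken from the exponent case analysis. */
static void check_bit_examples(void) {
    assert(is_fastint_double(0.0));
    assert(!is_fastint_double(-0.0));
    assert(is_fastint_double(1.0));
    assert(is_fastint_double(-3.0));
    assert(!is_fastint_double(0.5));
    assert(!is_fastint_double(1.5));
    assert(is_fastint_double(140737488355327.0));    /* 2^47 - 1 */
    assert(!is_fastint_double(140737488355328.0));   /* +2^47 */
    assert(is_fastint_double(-140737488355328.0));   /* -2^47 */
    assert(!is_fastint_double(-140737488355329.0));  /* -2^47 - 1 */
}
```

Using an unsigned pattern and an explicit sign bit avoids relying on the behavior of right shifts on negative signed values; otherwise the logic follows the computed-mask version.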
+
+C algorithm with 32-bit operations and a computed mask
+------------------------------------------------------
+
+The same check operating on the high and low 32-bit parts of the double::
+
+    int is_fastint(duk_uint32_t hi, duk_uint32_t lo) {
+        int exp = (hi >> 20) & 0x07ff;
+        int shift = exp - 1023;
+
+        if (shift >= 0 && shift <= 46) {  /* exponents 1023 to 1069 */
+            if (shift <= 20) {
+                /* 0x000fffff'ffffffff -> 0x00000000'ffffffff */
+                return (((0x000fffffUL >> shift) & hi) == 0) && (lo == 0);
+            } else {
+                /* 0x00000000'ffffffff -> 0x00000000'0000003f */
+                return (((0xffffffffUL >> (shift - 20)) & lo) == 0);
+            }
+        } else if (shift == -1023) {  /* exponent 0 */
+            /* return ((hi & 0x800fffffUL) == 0x00000000UL) && (lo == 0); */
+            return (hi == 0) && (lo == 0);
+        } else if (shift == 47) {  /* exponent 1070 */
+            return ((hi & 0x800fffffUL) == 0x80000000UL) && (lo == 0);
+        } else {
+            return 0;
+        }
+    }
+
+
+Future work
+===========
+
+Skipping the double-to-fastint test sometimes
+---------------------------------------------
+
+The double-to-fastint check can safely err on the side of caution and decide
+to represent a fastint-compatible number as a double. This opens up the
+possibility of skipping the double-to-fastint test in some cases which
+may improve performance and reduce code size.
+
+For instance, when ``Math.cos()`` pushes its result on the stack, it's
+probably quite a safe bet that the number won't fit a fastint, so it could
+be written as a double directly without a double-to-fastint downgrade
+check. In case it is a fastint (-1, 0, or 1) it will be represented as a
+double but will be downgraded to a fastint by the first operation that
+does execute the downgrade check. To support this, there could be a macro
+like ``DUK_TVAL_SET_NUMBER_NOFASTINT``.
+
+Another option is to run the double-to-fastint check randomly or e.g. only
+every Nth time it is needed (N could be quite large, e.g. the prime 17).
+This should be quite OK from a performance point of view. If a number is
+unnecessarily stored as a double and is involved in a lot of operations,
+chances are it will get downgraded quite quickly, as long as the check
+interval does not happen to line up with how often that number reaches a
+downgrade check.
+This approach may not be worth it because an optimized fastint downgrade
+check should have quite reasonable performance, and such an approach would
+have no effect on the actual fastint fast path (inputs are fastints,
+outputs are fastints).
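The "every Nth time" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: ``CHECK_INTERVAL``, ``fits_fastint``, ``sampled_downgrade_check`` and ``check_sampling_examples`` are hypothetical names, the counter is passed in by the caller rather than stored wherever Duktape would actually keep it, and ``fits_fastint`` is an arithmetic stand-in for the bit-level check rather than one of the algorithms described earlier.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

#define CHECK_INTERVAL 17U  /* hypothetical N, taken from the text */

/* Stand-in for the real downgrade check, written arithmetically. */
static int fits_fastint(double x) {
    int64_t i;

    /* Range test first; the negated form also rejects NaN. */
    if (!(x >= -140737488355328.0 && x <= 140737488355327.0)) {
        return 0;
    }
    i = (int64_t) x;                 /* in range, so the cast is defined */
    if ((double) i != x) {
        return 0;                    /* not a whole number */
    }
    return !(i == 0 && signbit(x));  /* reject negative zero */
}

/* Returns 1 to store the value as a fastint, 0 to store it as a double.
 * The full check only runs on every CHECK_INTERVAL-th call; otherwise the
 * value is conservatively kept as a double, which is always safe.
 */
static int sampled_downgrade_check(double x, unsigned int *counter) {
    if (++(*counter) < CHECK_INTERVAL) {
        return 0;
    }
    *counter = 0;
    return fits_fastint(x);
}

static void check_sampling_examples(void) {
    unsigned int counter = 0;
    int i;

    /* The first 16 calls skip the check even for a fastint-compatible value. */
    for (i = 0; i < 16; i++) {
        assert(sampled_downgrade_check(1.0, &counter) == 0);
    }
    /* The 17th call runs the full check and downgrades. */
    assert(sampled_downgrade_check(1.0, &counter) == 1);
}
```

The sketch never reports a fastint fit for a value that does not fit; it can only miss a possible downgrade, which matches the "err on the side of caution" property described above.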