|
|
@ -199,3 +199,272 @@ In addition to these, user code may have some practical dependencies, such as: |
|
|
|
Signed 41.6 fixed point provides a fractional increment of 0.015625; |
|
|
|
for the scheduler, this would mean about 15.6ms resolution, which is not |
|
|
|
that great. |
|
|
|
|
|
|
|
Efficient check for double-to-fastint conversion |
|
|
|
================================================ |
|
|
|
|
|
|
|
Criteria |
|
|
|
-------- |
|
|
|
|
|
|
|
For an IEEE double to be representable as a fast integer, it must be: |
|
|
|
|
|
|
|
* A whole number |
|
|
|
|
|
|
|
* In the 48-bit range |
|
|
|
|
|
|
|
* Not a negative zero, assuming that the integer zero is taken to represent |
|
|
|
a positive zero |
|
|
|
|
|
|
|
What to optimize for |
|
|
|
-------------------- |
|
|
|
|
|
|
|
This algorithm is needed when Duktape: |
|
|
|
|
|
|
|
* Parses a number and checks whether to represent the number as a double or |
|
|
|
a fastint |
|
|
|
|
|
|
|
* Executes internal code with no fastint handling; in this case any fastint |
|
|
|
inputs are first coerced to doubles and then back to fastints if the result |
|
|
|
fits |
|
|
|
|
|
|
|
* Executes internal code with fastint handling, with one or more of the |
|
|
|
inputs not matching the fastint "fast path" but the result possibly fitting |
|
|
|
into a fastint |
|
|
|
|
|
|
|
The "fast path" for fastint operations doesn't execute this algorithm because |
|
|
|
both inputs and outputs are fastints and Duktape detects this in the fast path |
|
|
|
preconditions. Given this, an aggressive memory-speed tradeoff (e.g. a table |
|
|
|
for each exponent) doesn't make sense. |
|
|
|
|
|
|
|
The speed of this algorithm affects two scenarios: |
|
|
|
|
|
|
|
1. Computations where the numbers involved are outside the fastint range. Here |
|
|
|
it's important to quickly determine that a fastint representation is not |
|
|
|
possible. |
|
|
|
|
|
|
|
2. Computations where the numbers can be represented as fastints (at least some |
|
|
|
of the time), but one or more operations don't have a fastint "fast path" so |
|
|
|
that the numbers get upgraded to an IEEE double and then need to be downgraded |
|
|
|
back to a fastint. |
|
|
|
|
|
|
|
Both cases matter, but for typical embedded code the latter case matters more. |
|
|
|
In other words, the code should be optimized for the case where a fastint fit |
|
|
|
is possible. |
|
|
|
|
|
|
|
Exponent and sign by cases |
|
|
|
-------------------------- |
|
|
|
|
|
|
|
An IEEE double has a sign (1 bit), an exponent (11 bits), and a 52-bit stored |
|
|
|
mantissa. The mantissa has an implicit (not stored) leading '1' digit, except |
|
|
|
for denormals, NaNs, and infinities. |
|
|
|
|
|
|
|
Going through the possible exponent values: |
|
|
|
|
|
|
|
* If exponent is 0: |
|
|
|
|
|
|
|
- The number is a fastint only if the sign bit is zero (positive) and the |
|
|
|
entire mantissa is all zeroes. This corresponds to +0. |
|
|
|
|
|
|
|
- If the mantissa is non-zero, the number is a denormal. |
|
|
|
|
|
|
|
* If the exponent is in the range [1, 1022] the number is not a fastint |
|
|
|
because the implicit mantissa bit corresponds to the number 0.5. |
|
|
|
|
|
|
|
* If exponent is exactly 1023: |
|
|
|
|
|
|
|
- The number is only a fastint if the stored mantissa is all zeroes. |
|
|
|
This corresponds to +/- 1. |
|
|
|
|
|
|
|
* If exponent is exactly 1024: |
|
|
|
|
|
|
|
- The number is only a fastint if 51 lowest bits of the mantissa are all |
|
|
|
zeroes. This corresponds to the numbers +/- 2 and +/- 3. |
|
|
|
|
|
|
|
* Generalizing, if the exponent is in the range [1023,1069], the number is |
|
|
|
a fastint if and only if: |
|
|
|
|
|
|
|
- The lowest N bits of the mantissa are zero, where N = 52 - (exp - 1023), |
|
|
|
with either sign. |
|
|
|
|
|
|
|
- N can also be expressed as: N = 1075 - exp. |
|
|
|
|
|
|
|
* If exponent is exactly 1070: |
|
|
|
|
|
|
|
- The number is only a fastint if the sign bit is set (negative) and the |
|
|
|
stored mantissa is all zeroes. This corresponds to -2^47. The positive |
|
|
|
counterpart +2^47 does not fit into the fastint range. |
|
|
|
|
|
|
|
* If exponent is [1071,2047] the number is never a fastint: |
|
|
|
|
|
|
|
- For exponents [1071,2046] the number is too large to be a fastint. |
|
|
|
|
|
|
|
- For exponent 2047 the number is a NaN or infinity depending on the |
|
|
|
mantissa contents, neither a valid fastint. |
|
|
|
|
|
|
|
Pseudocode 1 |
|
|
|
------------ |
|
|
|
|
|
|
|
The algorithm:: |
|
|
|
|
|
|
|
is_fastint(sgn, exp, mant): |
|
|
|
if exp == 0: |
|
|
|
return sign == 0 and mzero(mant, 52) |
|
|
|
else if exp < 1023: |
|
|
|
return false |
|
|
|
else if exp < 1070: |
|
|
|
return mzero(mant, 1075 - exp) |
|
|
|
else if exp == 1070: |
|
|
|
return sign == 1 and mzero(mant, 52) |
|
|
|
else: |
|
|
|
return false |
|
|
|
|
|
|
|
The ``mzero`` helper predicate returns true if the mantissa given has its |
|
|
|
lowest ``n`` bits zero. |
|
|
|
|
|
|
|
Non-zero integers in the fastint range will fall into the case where a certain |
|
|
|
computed number of low mantissa bits must be checked to be zero. As discussed |
|
|
|
above, the algorithm should be optimized for the "input fits fastint" case. |
|
|
|
|
|
|
|
Pseudocode 2 |
|
|
|
------------ |
|
|
|
|
|
|
|
Some rewriting:: |
|
|
|
|
|
|
|
is_fastint(sgn, exp, mant): |
|
|
|
nzero = 1075 - exp |
|
|
|
if nzero >= 52 and nzero <= 6: // exp 1023 ... exp 1069 |
|
|
|
// exponents 1023 to 1069: regular handling, common case |
|
|
|
return mzero(mant, nzero) |
|
|
|
else if nzero == 1075: |
|
|
|
// exponent 0: irregular handling, but still common (positive zero) |
|
|
|
return sign == 0 and mzero(mant, 52) |
|
|
|
else if nzero == 5: |
|
|
|
// exponent 1070: irregular handling, rare case |
|
|
|
return sign == 1 and mzero(mant, 52) |
|
|
|
else: |
|
|
|
// exponents [1,1022] and [1071,2047], rare case |
|
|
|
return false |
|
|
|
|
|
|
|
C algorithm with a lookup table |
|
|
|
------------------------------- |
|
|
|
|
|
|
|
The common case ``nzero`` values are between [6, 52] and correspond to |
|
|
|
mantissa masks. Compute a mask index instead as nzero - 6 = 1069 - exp:: |
|
|
|
|
|
|
|
duk_uint64_t mzero_masks[47] = { |
|
|
|
0x000000000000003fULL, /* exp 1069, nzero 6 */ |
|
|
|
0x000000000000007fULL, /* exp 1068, nzero 7 */ |
|
|
|
0x00000000000000ffULL, /* exp 1067, nzero 8 */ |
|
|
|
0x00000000000001ffULL, /* exp 1066, nzero 9 */ |
|
|
|
/* ... */ |
|
|
|
0x0003ffffffffffffULL, /* exp 1025, nzero 50 */ |
|
|
|
0x0007ffffffffffffULL, /* exp 1024, nzero 51 */ |
|
|
|
0x000fffffffffffffULL, /* exp 1023, nzero 52 */ |
|
|
|
}; |
|
|
|
|
|
|
|
int is_fastint(duk_int64_t d) { |
|
|
|
int exp = (d >> 52) & 0x07ff; |
|
|
|
int idx = 1069 - exp; |
|
|
|
|
|
|
|
if (idx >= 0 && idx <= 46) { /* exponents 1069 to 1023 */ |
|
|
|
return (mzero_masks[idx] & mant) == 0; |
|
|
|
} else if (idx == 1069) { /* exponent 0 */ |
|
|
|
return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0); |
|
|
|
} else if (idx == -1) { /* exponent 1070 */ |
|
|
|
return (d < 0) && ((d & 0x000fffffffffffffULL) == 0); |
|
|
|
} else { |
|
|
|
return 0; |
|
|
|
} |
|
|
|
}; |
|
|
|
|
|
|
|
The memory cost of the mask table is 8x47 = 376 bytes. This can be halved |
|
|
|
e.g. by using a table of 32-bit values with separate cases for nzero >= 32 |
|
|
|
and nzero < 32. |
|
|
|
|
|
|
|
Unfortunately the expected case (exponents 1023 to 1069) involves a mask |
|
|
|
check with a variable mask, so it may be unsuitable for direct inlining in |
|
|
|
the most important hot spots. |
|
|
|
|
|
|
|
C algorithm with a computed mask |
|
|
|
-------------------------------- |
|
|
|
|
|
|
|
Since this algorithm only runs outside the proper fastint "fast path" it |
|
|
|
may be more sensible to avoid a memory tradeoff and compute the masks:: |
|
|
|
|
|
|
|
int is_fastint(duk_int64_t d) { |
|
|
|
int exp = (d >> 52) & 0x07ff; |
|
|
|
int shift = exp - 1023; |
|
|
|
|
|
|
|
if (shift >= 0 && shift <= 46) { /* exponents 1023 to 1069 */ |
|
|
|
return ((0x000fffffffffffffULL >> shift) & mant) == 0; |
|
|
|
} else if (shift == -1023) { /* exponent 0 */ |
|
|
|
return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0); |
|
|
|
} else if (shift == 47) { /* exponent 1070 */ |
|
|
|
return (d < 0) && ((d & 0x000fffffffffffffULL) == 0); |
|
|
|
} else { |
|
|
|
return 0; |
|
|
|
} |
|
|
|
}; |
|
|
|
|
|
|
|
For middle endian machines (ARM) this algorithm first needs swapping |
|
|
|
of the 32-bit parts. By changing the mask checks to operate on 32-bit |
|
|
|
parts the algorithm would work on more platforms and would also remove |
|
|
|
the need for swapping the parts on middle endian platforms. |
|
|
|
|
|
|
|
C algorithm with 32-bit operations and a computed mask |
|
|
|
------------------------------------------------------ |
|
|
|
|
|
|
|
:: |
|
|
|
|
|
|
|
int is_fastint(duk_uint32_t hi, duk_uint32_t lo) { |
|
|
|
int exp = (hi >> 20) & 0x07ff; |
|
|
|
int shift = exp - 1023; |
|
|
|
|
|
|
|
if (shift >= 0 && shift <= 46) { /* exponents 1023 to 1069 */ |
|
|
|
if (shift <= 20) { |
|
|
|
/* 0x000fffff'ffffffff -> 0x00000000'ffffffff */ |
|
|
|
return (((0x000fffffUL >> shift) & hi) == 0) && (lo == 0); |
|
|
|
} else { |
|
|
|
/* 0x00000000'ffffffff -> 0x00000000'0000003f */ |
|
|
|
return (((0xffffffffUL >> (shift - 20)) & lo) == 0); |
|
|
|
} |
|
|
|
} else if (shift == -1023) { /* exponent 0 */ |
|
|
|
/* return ((hi & 0x800fffffUL) == 0x00000000UL) && (lo == 0); */ |
|
|
|
return (hi == 0) && (lo == 0); |
|
|
|
} else if (shift == 47) { /* exponent 1070 */ |
|
|
|
return ((hi & 0x800fffffUL) == 0x80000000UL) && (lo == 0); |
|
|
|
} else { |
|
|
|
return 0; |
|
|
|
} |
|
|
|
}; |
|
|
|
|
|
|
|
|
|
|
|
Future work |
|
|
|
=========== |
|
|
|
|
|
|
|
Skipping the double-to-fastint test sometimes |
|
|
|
--------------------------------------------- |
|
|
|
|
|
|
|
The double-to-fastint can safely err on the side of caution and decide to |
|
|
|
represent a fastint-compatible number as a double. This opens up the |
|
|
|
possibility of skipping the double-to-fastint test in some cases which |
|
|
|
may improve performance and reduce code size. |
|
|
|
|
|
|
|
For instance, when ``Math.cos()`` pushes its result on the stack, it's |
|
|
|
probably quite a safe bet that the number won't fit a fastint, so it could |
|
|
|
be written as a double directly without a double-to-fastint downgrade |
|
|
|
check. In case it is a fastint (-1, 0, or 1) it will be represented as a |
|
|
|
double but will be downgraded to a fastint by the first operation that |
|
|
|
does execute the downgrade check. To support this, there could be a macro |
|
|
|
like ``DUK_TVAL_SET_NUMBER_NOFASTINT``. |
|
|
|
|
|
|
|
Another option is to run the double-to-fastint check randomly or e.g. only |
|
|
|
every Nth time it is needed (N could be quite large, e.g. the prime 17). |
|
|
|
This should be quite OK from a performance point of view. If a number is |
|
|
|
incorrectly stored as a double and is involved in a lot of operations, |
|
|
|
chances are it will get downgraded quite quickly, as long as the check |
|
|
|
interval does not unluckily correlate with the downgrade check frequency. |
|
|
|
This approach may not be worth it because an optimized fastint downgrade |
|
|
|
check should have quite reasonable performance, and such an approach would |
|
|
|
have no effect on the actual fastint fast path (inputs are fastints, |
|
|
|
outputs are fastints). |
|
|
|