mirror of https://github.com/svaarala/duktape.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
954 lines
35 KiB
954 lines
35 KiB
============
|
|
Fastint type
|
|
============
|
|
|
|
Overview
|
|
========
|
|
|
|
ECMAScript has a single number type which is required to be an IEEE double.
|
|
This is a potential performance issue in some embedded environments where
|
|
hardware floating point numbers (at least IEEE doubles) are not available
|
|
and software floating point emulation performs poorly.
|
|
|
|
Duktape provides optional support for fast integers or "fastints" which
|
|
allows Duktape to represent numbers internally either as IEEE doubles or
|
|
48-bit signed integers. Duktape will transparently upgrade integers to
|
|
doubles when necessary (e.g. when an integer operation overflows) and
|
|
downgrade doubles to integers when possible.
|
|
|
|
Because a double-to-integer downgrade check is relatively expensive, it is
|
|
only applied in specific situations. Currently:
|
|
|
|
* All compiler constants are represented as fastints if possible.
|
|
|
|
* Unary plus performs a ToNumber() coercion and also downgrades an IEEE
|
|
double to a fastint if possible.
|
|
|
|
* All function return values are automatically downgraded to fastints if
|
|
possible.
|
|
|
|
* Thread yield/resume values are automatically downgraded to fastints if
|
|
possible.
|
|
|
|
Fastints don't affect ECMAScript semantics and are completely transparent
|
|
to user C and ECMAScript code: all conversions are automatic.
|
|
|
|
To enable fastint support, simply define:
|
|
|
|
* ``DUK_USE_FASTINT``
|
|
|
|
You should measure the impact of enabling fastint support for your target
|
|
platform and ECMAScript code base. Fastint support is not an automatic
|
|
performance win: while the fast path is a clear improvement for soft float
|
|
(and even some hard float) platforms, there is a run-time cost of doing
|
|
fastint downgrade checks and other book-keeping. Very roughly:
|
|
|
|
* Code that benefits most from fastint upsides (e.g. heavy integer arithmetic
|
|
in large loops) can run about 1000% faster on soft float platforms.
|
|
|
|
* Code that suffers most from fastint downsides can run about 10% more
|
|
slowly.
|
|
|
|
* Executable size will increase by about 7-10kB.
|
|
|
|
This document provides tips for using fastints, and provides some background
|
|
on the approach chosen. Some specific fastint algorithms used by Duktape are
|
|
also described in detail.
|
|
|
|
Application considerations
|
|
==========================
|
|
|
|
Because fastints are transparent to user code, the only real consideration is
|
|
to make sure performance critical sections take advantage of fastints. Some
|
|
tips for using fastints:
|
|
|
|
* Because a double-to-fastint downgrade check is only done for specific
|
|
operations, make sure that integer values don't accidentally become
|
|
IEEE doubles.
|
|
|
|
There's no easy way to check how a number is represented internally.
|
|
However, ``Duktape.info()`` provides a way to peek into the internal
|
|
representation. An example algorithm is provided in
|
|
``polyfills/duktape-isfastint.js``. You can use this polyfill to debug
|
|
your code if necessary.
|
|
|
|
* When in doubt, you can use unary plus to force a number to be downgrade
|
|
checked::
|
|
|
|
// Result is exactly 1, but is represented internally as a double.
|
|
var t = Math.PI / Math.PI;
|
|
|
|
// Result is exactly 1, downgrade checked, and is represented
|
|
// internally as a fastint.
|
|
var t = +(Math.PI / Math.PI);
|
|
|
|
* All function return values from both ECMAScript and Duktape/C functions
|
|
are automatically downgraded to fastints. So, the following value can be
|
|
trusted to be 3 and represented internally as a fastint::
|
|
|
|
// Resulting 'three' is a fastint because Math.floor() return
|
|
// value (double 3) is automatically downgraded to a fastint.
|
|
var three = Math.floor(Math.PI);
|
|
|
|
Same applies to any user functions::
|
|
|
|
function my_max(a, b) {
|
|
// For the call below, 'b' is 1 but is not represented as a
|
|
// fastint here. Only when we return is the return value 1
|
|
// downgraded into a fastint.
|
|
return (a >= b ? a : b);
|
|
}
|
|
|
|
// 't' is exactly 1, and represented internally as a fastint.
|
|
var t = my_max(0, Math.PI / Math.PI);
|
|
|
|
* All compiler constants are automatically downgraded to fastints when
|
|
possible. For example, all constants below will be fastints::
|
|
|
|
var i, n;
|
|
|
|
for (i = 0, n = 1e6; i < n; i++) {
|
|
// All 'i' values here will be fastints.
|
|
}
|
|
|
|
* Note that the number syntax doesn't affect the fastint downgrade check;
|
|
only the final value matters. All of the following will be represented
|
|
as fastints::
|
|
|
|
t = 1;
|
|
t = 1.0;
|
|
t = 100e-2;
|
|
t = 0.01e2;
|
|
|
|
Similarly, constant folding, when possible, will be done before doing the
|
|
downgrade check, so the following will be represented as a fastint::
|
|
|
|
t = 123.123 / 123.123; // fastint 1
|
|
|
|
But because ``Math.PI`` needs a runtime lookup, the following will not be
|
|
a fastint::
|
|
|
|
t = Math.PI / Math.PI; // double 1
|
|
|
|
* Non-fastint values will "taint" fastints in operations so that the result
|
|
will be represented as a double instead of a fastint::
|
|
|
|
t1 = 123; // fastint
|
|
t2 = 0.5; // double
|
|
t3 = t1 + t2; // <fastint> + <double> -> <double>
|
|
t4 = t3 - t2; // <double> - <double> -> <double>
|
|
t5 = +t4; // restore into fastint representation
|
|
|
|
While adding and subtracting ``t2`` is a net zero change and ``t4`` would
|
|
be fastint compatible, it will not be represented as a fastint internally
|
|
until the next explicit downgrade check. Here unary plus is used to get
|
|
the result back into fastint representation.
|
|
|
|
* Negative zero cannot be represented as a fastint. Ordinary ECMAScript
|
|
code will very rarely deal with negative zeros. Negative zero can "taint"
|
|
a fastint, too::
|
|
|
|
t1 = 123; // fastint
|
|
t2 = -0; // double
|
|
t3 = t1 + t2; // <fastint> + <double> -> <double> (!)
|
|
|
|
Here the result is a double even when an innocent zero value is added to
|
|
a fastint. When in doubt you can use unary plus to ensure the result is
|
|
a fastint if it's fastint compatible.
|
|
|
|
* When doing Duktape API calls from C code, prefer API calls which take
|
|
integer arguments. Such API calls will typically have fastint support.
|
|
For example::
|
|
|
|
// Value pushed will be 1, represented internally as a double.
|
|
duk_push_number(ctx, 1.0);
|
|
|
|
// Value pushed will be 1, represented internally as a fastint.
|
|
duk_push_int(ctx, 1);
|
|
|
|
* Because the fastint support is transparent from a semantics perspective,
|
|
Duktape fastint fast path and downgrade behavior may change in future
|
|
versions. Such changes won't change outward behavior but may affect
|
|
code performance.
|
|
|
|
As a general rule, optimize for fastints only in code sections where it
|
|
really matters for performance, e.g. heavy loops.
|
|
|
|
Detecting that a number is represented as a fastint internally
|
|
==============================================================
|
|
|
|
There's no explicit API for this now, but ``Duktape.info()`` provides the
|
|
necessary information (in a highly fragile manner though). For instance,
|
|
you can use something like::
|
|
|
|
/* Fastint tag depends on duk_tval packing */
|
|
var fastintTag = (Duktape.info(true)[1] >= 0xfff0 ?
|
|
0xfff1 /* tag for packed duk_tval) :
|
|
1 /* tag for unpacked duk_tval */ );
|
|
|
|
function isFastint(x) {
|
|
if (typeof x !== 'number') {
|
|
return false;
|
|
}
|
|
return Duktape.info(x)[1] === fastintTag;
|
|
}
|
|
|
|
There's an example polyfill which provides ``Duktape.isFastint()`` in:
|
|
|
|
* polyfills/duktape-isfastint.js
|
|
|
|
.. note:: This is fragile and may stop working when internal tag number
|
|
changes are made. Such changes are possible even in minor version
|
|
updates.
|
|
|
|
Fastints and Duktape internals
|
|
==============================
|
|
|
|
A few notes on how fastints are used internally, what macros are used, etc.
|
|
|
|
Fastint aware vs. unaware code
|
|
-----------------------------
|
|
|
|
Fastint support is optional and added between ``#if defined()`` guards::
|
|
|
|
#if defined(DUK_USE_FASTINT)
|
|
...
|
|
#endif
|
|
|
|
Number handling will be either:
|
|
|
|
* fastint unaware: requires no changes to existing code
|
|
|
|
* fastint aware: requires fastint detection, e.g. in switch-case statements
|
|
and then usage of fastint aware macros
|
|
|
|
Type switch cases
|
|
-----------------
|
|
|
|
The minimum change necessary is to ensure fastints are handled in type
|
|
switch-cases::
|
|
|
|
/* ... */
|
|
|
|
switch(DUK_TVAL_GET_TAG(tv)) {
|
|
case DUK_TAG_UNDEFINED:
|
|
/* ... */
|
|
#if defined(DUK_USE_FASTINT)
|
|
case DUK_TAG_FASTINT:
|
|
/* no direct support, fall through */
|
|
#endif
|
|
default:
|
|
/* number, double or fastint; use fastint unaware macros
|
|
* which will automatically upgrade a fastint to a double
|
|
* when necessary:
|
|
*/
|
|
|
|
duk_double_t d = DUK_TVAL_GET_NUMBER(tv); /* auto upgrade */
|
|
/* ... */
|
|
}
|
|
|
|
Even without this change the default clause will capture ``DUK_TAG_FASTINT``
|
|
values but it's preferable to have the fall through happen explicitly.
|
|
|
|
Fastint aware code will have specific code in the ``DUK_TAG_FASTINT`` case,
|
|
and the ``default`` case can then assume the number is represented as a
|
|
double. The ``default`` case must be written carefully so that it also works
|
|
correctly when fastints are disabled.
|
|
|
|
Getting numbers/fastints
|
|
------------------------
|
|
|
|
Fastint unaware code uses::
|
|
|
|
DUK_TVAL_GET_NUMBER(tv)
|
|
|
|
which will always evaluate to a double, and automatically upgrades a fastint
|
|
to a double. The implementation with fastints enabled is something like::
|
|
|
|
#define DUK_TVAL_GET_NUMBER(v) \
|
|
(DUK_TVAL_IS_FASTINT(v) ? \
|
|
(duk_double_t) DUK_TVAL_GET_FASTINT(v) : \
|
|
DUK_TVAL_GET_DOUBLE(v))
|
|
|
|
The extra compared to a direct read has a small runtime cost, but only when
|
|
fastints are enabled. When they're not enabled, ``DUK_TVAL_GET_NUMBER()``
|
|
will just read a double.
|
|
|
|
Fastint aware code uses the following::
|
|
|
|
/* When 'tv' is known to be a fastint, e.g. switch DUK_TAG_FASTINT or
|
|
* explicit check.
|
|
*/
|
|
DUK_TVAL_GET_FASTINT(tv) /* result is duk_int64_t */
|
|
|
|
/* When 'tv' is known to be a fastint, and we just need the lowest 32 bits
|
|
* as a duk_uint32_t.
|
|
*/
|
|
DUK_TVAL_GET_FASTINT_U32(tv) /* result is duk_uint32_t */
|
|
|
|
/* Similarly for a duk_int32_t. */
|
|
DUK_TVAL_GET_FASTINT_I32(tv) /* result is duk_int32_t */
|
|
|
|
/* When 'tv' is known to be a double, e.g. switch or explicit check. */
|
|
DUK_TVAL_GET_DOUBLE(tv)
|
|
|
|
The ``DUK_TVAL_GET_DOUBLE(tv)`` macro is also defined when fastints are not
|
|
enabled; in that case it's simply a synonym for ``DUK_TVAL_GET_NUMBER()``
|
|
because all numbers are represented as doubles. It should only be used when
|
|
in the fastint enabled case the number is known to be represented as a double.
|
|
|
|
This allows control structures like::
|
|
|
|
/* Fictional ToBoolean()-like operation. */
|
|
|
|
switch(DUK_TVAL_GET_TAG(tv)) {
|
|
...
|
|
#if defined(DUK_USE_FASTINT)
|
|
case DUK_TAG_FASTINT:
|
|
/* Fastints enabled and 'tv' is a fastint. */
|
|
return (DUK_TVAL_GET_FASTINT(tv) != 0 ? 1 : 0);
|
|
#endif
|
|
default:
|
|
/* Fastints enabled and 'tv' is a double, or fastints disabled. */
|
|
return (DUK_TVAL_GET_DOUBLE(tv) != 0.0 ? 1 : 0);
|
|
}
|
|
|
|
Setting numbers/fastints
|
|
------------------------
|
|
|
|
Fastint unaware code uses::
|
|
|
|
DUK_TVAL_SET_NUMBER(tv, d);
|
|
|
|
This sets the number always into an internal double representation, i.e.
|
|
no double-to-fastint downgrade is automatically done. (This was one
|
|
design option, but it turns out double-to-fastint coercion test is quite
|
|
expensive and adds a considerable overhead to the fastint unaware slow
|
|
path.)
|
|
|
|
Fastint aware which wants to set a double and downgrade it automatically
|
|
into a fastint when possible uses::
|
|
|
|
DUK_TVAL_SET_NUMBER_CHKFAST(tv, d);
|
|
|
|
This macro concretely calls into a helper function so there's a performance
|
|
penalty involved. Downgrade checks are only added to specific places where
|
|
they provide the most benefit.
|
|
|
|
Fastint aware code which wants to set a double explicitly (with no fastint
|
|
downgrade check) uses::
|
|
|
|
DUK_TVAL_SET_DOUBLE(tv, d);
|
|
|
|
Fastint aware code which wants to set a fastint explicitly (and has ensured
|
|
that the value is fastint compatible) uses::
|
|
|
|
/* 'i' must be in 48-bit signed range */
|
|
DUK_TVAL_SET_FASTINT(tv, i); /* i is duk_int64_t */
|
|
|
|
/* 'i' must be in 32-bit unsigned range */
|
|
DUK_TVAL_SET_FASTINT_U32(tv, i); /* i is duk_uint32_t */
|
|
|
|
/* 'i' must be in 32-bit signed range */
|
|
DUK_TVAL_SET_FASTINT_I32(tv, i); /* i is duk_int32_t */
|
|
|
|
The macros are also available when fastints are disabled, and will just
|
|
write a double with no checks or additional overhead. This is just a
|
|
convenience to reduce the number of ``#if defined()`` guards in call sites.
|
|
For example, ``DUK_TVAL_SET_FASTINT_U32`` coerces the uint32 argument to a
|
|
double when fastints are disabled.
|
|
|
|
In-place double-to-fastint downgrade check
|
|
------------------------------------------
|
|
|
|
The following macro is used to perform an in-place double-to-fastint
|
|
downgrade check::
|
|
|
|
DUK_TVAL_CHKFAST_INPLACE(tv);
|
|
|
|
The target ``tv`` can have any type; the macro first checks if the value
|
|
is a double and if so, if it can be fastint coerced.
|
|
|
|
When fastint support is disabled, the macro is a no-op.
|
|
|
|
Type checks
|
|
-----------
|
|
|
|
Fastint unaware code checks for a number (either double or fastint) using::
|
|
|
|
DUK_TVAL_IS_NUMBER(tv)
|
|
|
|
Fastint aware code uses::
|
|
|
|
/* Number represented as a fastint */
|
|
DUK_TVAL_IS_FASTINT(tv)
|
|
|
|
/* Number represented as a double */
|
|
DUK_TVAL_IS_DOUBLE(tv)
|
|
|
|
The following is defined even when fastints are disabled to support the
|
|
switch code structure described above::
|
|
|
|
/* When fastints disabled, same as DUK_TVAL_IS_NUMBER() */
|
|
DUK_TVAL_IS_DOUBLE(tv)
|
|
|
|
Background
|
|
==========
|
|
|
|
This section provides some background, discussion, and issues on various
|
|
approaches to integer support. It's not up to date with the current
|
|
implementation.
|
|
|
|
Approaches to integer support
|
|
-----------------------------
|
|
|
|
* Replace the tagged IEEE double number type with an integer or a fixed point
|
|
type. This will necessarily break ECMAScript compliance to some extent, but
|
|
it would be nice if at least number range was sufficient for 32-bit bit ops
|
|
and to represent e.g. Dates.
|
|
|
|
* Same as above, but also reserve a few bits for one or more special values
|
|
like NaNs, to maintain compatibility better. For instance, NaN is used to
|
|
signify an invalid Date, and is also used as a coercion result to signal a
|
|
coercion error.
|
|
|
|
* Extend the tagged type to support both an IEEE double and an integer or a
|
|
fixed point type. Convert between the two either fully transparently (to
|
|
maintain full ECMAScript semantics) or in selected situations, chosen for
|
|
either convenience or performance.
|
|
|
|
* Extend the tagged type to support both an IEEE double and an integer or a
|
|
fixed point type. Extend the public API and ECMAScript environment to
|
|
expose the new integer type explicitly. The upside is minimal performance
|
|
cost because there are fewer automatic conversion checks. The downside is
|
|
a significant API change and introduction of custom language features.
|
|
|
|
* Same as above, but expose the integer type only for user C code; keep the
|
|
ECMAScript environment unaware of the change.
|
|
|
|
Implementation issues
|
|
---------------------
|
|
|
|
* When there is no need to represent IEEE doubles, the 8-byte tagged duk_tval
|
|
no longer needs to conform to the IEEE double constraints (NaN space reuse).
|
|
Instead, it can be split e.g. into an 8-bit tag and 56-bit type-specific
|
|
value.
|
|
|
|
* When there is a need to represent both integers and IEEE doubles, the 8-byte
|
|
duk_tval must conform to the IEEE double representation, i.e. there are 16
|
|
bits of a special tag value and 48-bit type specific value.
|
|
|
|
* Should there be a C typedef for a Duktape number? Currently the public
|
|
API and Duktape internals assume numbers can be read/written as doubles.
|
|
Changing the public API will break compilation (or at least cause warnings)
|
|
for user code, if the integer changes are visible in the API.
|
|
|
|
* Does the integer change need to be made everywhere at once, so that all
|
|
code (including the compiler, etc) must support the underlying integer
|
|
type before the change is complete?
|
|
|
|
Alternatively, Duktape could read and write numbers as doubles by default
|
|
internally (with automatic conversion back and forth as needed) and
|
|
integer-aware optimizations would only be applied in places where it matters,
|
|
such as arithmetic. In particular, there would be no need to deal with
|
|
integer representation in the compiler as it would normally have a minimal
|
|
impact.
|
|
|
|
* Integer representations above 32 bits would normally use a 64-bit integer
|
|
type for arithmetic. However, some older platforms don't have such a type
|
|
(there are workarounds for this e.g. in ``duk_numconv.c``). So either the
|
|
integer arithmetic must also be implemented with 32-bit replacements, or
|
|
the representation won't be available if 64-bit types are not available.
|
|
|
|
Representation options
|
|
----------------------
|
|
|
|
Double type + separate integer / fixed point type (compliant)
|
|
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
|
|
|
|
In this case the 8-byte tagged type must conform to the IEEE NaN space
|
|
reuse, so 16 bits are lost to the type tag and 48 bits are available
|
|
for the value.
|
|
|
|
* Double and up to 48-bit integer (sign + 47-bit range). Integers are nice
|
|
and intuitive, but won't fit the full 53-bit integer range supported by
|
|
IEEE doubles, so some must fall back into the double representation (not a
|
|
big limitation). Date values and binary operations work.
|
|
|
|
* Double and a fixed point with up to 48 bit representation, e.g. sign +
|
|
41.6. To support reasonable Date values, the integer part must be at least
|
|
41 bits. To support bit operations without falling back to IEEE doubles,
|
|
the integer part must support both signed and unsigned 32-bit values.
|
|
Binary fractions require some additional shifting to implement, and user
|
|
code is not very likely to contain specific binary fractions, so they would
|
|
only benefit code specifically crafted to use them.
|
|
|
|
* Double and 32-bit signed or unsigned integer: 32-bit arithmetic is nice
|
|
but unfortunately not enough to support ECMAScript bit operations which
|
|
require the range -0x80000000 to 0xffffffff (sign + 32 bits, a 33-bit
|
|
representation). This would not be a compliance issue as Duktape would
|
|
fall back to the IEEE double for some values, but if fast bit operations
|
|
are important matter, this is not a good option. If bit operations don't
|
|
matter, then this is a nice option in that it avoids the 64-bit arithmetic
|
|
issue.
|
|
|
|
Only integer / fixed point type (non-compliant)
|
|
:::::::::::::::::::::::::::::::::::::::::::::::
|
|
|
|
Here the 8-byte tagged type can be split e.g. into a 8-bit type and a 56-bit
|
|
value which allows more range.
|
|
|
|
* 56-bit signed integer (sign + 55 bits): covers the IEEE integer range
|
|
(53-bit), Date values work, bit ops work. Lack of any fractions makes
|
|
built-in Math functions mostly useless (e.g. Math.random() will always
|
|
return zero), and some user code is likely to break.
|
|
|
|
* Sign and 47.8 or 45.10 fixed point: provides enough fractions to be
|
|
useful, Date values work, bit ops work. Math functions are somewhat
|
|
useful again.
|
|
|
|
* Sign and 41.14 fixed point: maximum number of fraction bits while keeping
|
|
Date values (and bit ops) working.
|
|
|
|
* Sign and 32.23 fixed point: maximum number of fraction bits while keeping
|
|
bit ops working and providing user code the reasonable and intuitive
|
|
guarantee that 32-bit integers (signed and unsigned) work. Date values
|
|
won't work.
|
|
|
|
* 32-bit unsigned integer or 32-bit signed integer: closest to what's fast
|
|
and convenient on typical embedded systems, but some bit operations stop
|
|
working because taken together they need the -0x80000000 to 0xffffffff
|
|
range (there are both signed and unsigned bit ops). Date values won't
|
|
work.
|
|
|
|
Dependencies on IEEE double or range
|
|
------------------------------------
|
|
|
|
Specification and Duktape dependencies:
|
|
|
|
* Signed integers are quite widely required, so having no support for negative
|
|
values is probably not an option.
|
|
|
|
* At least 32-bit unsigned integers are needed for array and string lengths.
|
|
|
|
* A sign + a 32-bit range (33-bit representation) are needed for bit ops,
|
|
which provide both signed and unsigned 32-bit results. The required range
|
|
is -0x80000000 to 0xffffffff.
|
|
|
|
* The Date built-in uses an integer millisecond value for time values. This
|
|
representation is used both internally and in the external Date API.
|
|
|
|
- 40 (unsigned) bits is not enough to represent the current time, it only
|
|
represents timestamps up to November 2004.
|
|
|
|
- 41 (unsigned) bits is enough to represent timestamps up to September
|
|
2039.
|
|
|
|
- The Date API never uses fractions, and in fact the specification requires
|
|
that the internal value is integer coerced (to milliseconds), so Date
|
|
does not require fractions to work properly.
|
|
|
|
- The implication for using only an integer / fixed point representation
|
|
is that the integer part must contain a sign and at least 41 bits.
|
|
For example, for a 48-bit representation sign + 41.6 fixed point is
|
|
enough, and would provide 1/64 fractions.
|
|
|
|
- It would be easy to fix the internal Date representation to work with any
|
|
fixed point representation with enough bits (e.g. sign + 32.15), but
|
|
because the integer millisecond values are used in the public Date API
|
|
too, this doesn't solve anything.
|
|
|
|
* Signed zero semantics (separation of negative and positive zero) are
|
|
are required and explicitly specified, but ECMAScript itself doesn't
|
|
really depend on being able to use a negative zero, and neither does
|
|
Duktape.
|
|
|
|
* NaN values are used in several places as significant internal or
|
|
external values. Invalid Date values are represented by having a
|
|
NaN as the Date object's internal time value. String-to-number
|
|
coercion relies on using a NaN to indicate a coercion error
|
|
(``Number('foo') === NaN``). If a NaN value is not available, the
|
|
best replacement is probably zero.
|
|
|
|
* Infinities are used in math functions but ECMAScript itself doesn't
|
|
rely on being able to use them, and neither does Duktape.
|
|
|
|
* Duktape packs some internal values into double representation, this is
|
|
used at least by:
|
|
|
|
- The compiler for declaration book-keeping. The needed bit count is
|
|
not large (32 bits should more than suffice, for 2**24 inner functions).
|
|
|
|
- Error object tracedata format, which needs 32 bits + a few flags;
|
|
40 bits should suffice.
|
|
|
|
In addition to these, user code may have some practical dependencies, such as:
|
|
|
|
* Being able to represent at least signed and unsigned 32 bits, so that all
|
|
ECMAScript bit operations work as expected.
|
|
|
|
* Being able to represent at least some fractional values. For instance,
|
|
suppose a custom scheduler used second-based timestamps for timers; it
|
|
would then require a reasonable number of fractions to work properly.
|
|
Signed 41.6 fixed point provides a fractional increment of 0.015625;
|
|
for the scheduler, this would mean about 15.6ms resolution, which is not
|
|
that great.
|
|
|
|
Efficient check for double-to-fastint downgrade
|
|
===============================================
|
|
|
|
Overview
|
|
--------
|
|
|
|
For an IEEE double to be representable as a fast integer, it must be:
|
|
|
|
* A whole number
|
|
|
|
* In the signed 48-bit range
|
|
|
|
* Not a negative zero, assuming that the integer zero is taken to represent
|
|
a positive zero
|
|
|
|
This algorithm is needed when Duktape does an explicit downgrade check to see
|
|
if a double value can be represented as a fastint.
|
|
|
|
The "fast path" for fastint operations doesn't execute this algorithm because
|
|
both inputs and outputs are fastints and Duktape detects this in the fast path
|
|
preconditions. Even so, the performance of the downgrade check matters for
|
|
overall performance.
|
|
|
|
Exponent and sign by cases
|
|
--------------------------
|
|
|
|
An IEEE double has a sign (1 bit), an exponent (11 bits), and a 52-bit stored
|
|
mantissa. The mantissa has an implicit (not stored) leading '1' digit, except
|
|
for denormals, NaNs, and infinities.
|
|
|
|
Going through the possible exponent values:
|
|
|
|
* If exponent is 0:
|
|
|
|
- The number is a fastint only if the sign bit is zero (positive) and the
|
|
entire mantissa is all zeroes. This corresponds to +0.
|
|
|
|
- If the mantissa is non-zero, the number is a denormal.
|
|
|
|
* If the exponent is in the range [1, 1022] the number is not a fastint
|
|
because the implicit mantissa bit corresponds to the number 0.5.
|
|
|
|
* If exponent is exactly 1023:
|
|
|
|
- The number is only a fastint if the stored mantissa is all zeroes.
|
|
This corresponds to +/- 1.
|
|
|
|
* If exponent is exactly 1024:
|
|
|
|
- The number is only a fastint if 51 lowest bits of the mantissa are all
|
|
zeroes (with the top bit either zero or one). This corresponds to the
|
|
numbers +/- 2 and +/- 3.
|
|
|
|
* Generalizing, if the exponent is in the range [1023,1069], the number is
|
|
a fastint if and only if:
|
|
|
|
- The lowest N bits of the mantissa are zero, where N = 52 - (exp - 1023),
|
|
with either sign.
|
|
|
|
- N can also be expressed as: N = 1075 - exp.
|
|
|
|
* If exponent is exactly 1070:
|
|
|
|
- The number is only a fastint if the sign bit is set (negative) and the
|
|
stored mantissa is all zeroes. This corresponds to -2^47. The positive
|
|
counterpart +2^47 does not fit into the fastint range.
|
|
|
|
* If exponent is [1071,2047] the number is never a fastint:
|
|
|
|
- For exponents [1071,2046] the number is too large to be a fastint.
|
|
|
|
- For exponent 2047 the number is a NaN or infinity depending on the
|
|
mantissa contents, neither a valid fastint.
|
|
|
|
Pseudocode 1
|
|
------------
|
|
|
|
The algorithm::
|
|
|
|
is_fastint(sgn, exp, mant):
|
|
if exp == 0:
|
|
return sign == 0 and mzero(mant, 52)
|
|
else if exp < 1023:
|
|
return false
|
|
else if exp < 1070:
|
|
return mzero(mant, 1075 - exp)
|
|
else if exp == 1070:
|
|
return sign == 1 and mzero(mant, 52)
|
|
else:
|
|
return false
|
|
|
|
The ``mzero`` helper predicate returns true if the mantissa given has its
|
|
lowest ``n`` bits zero.
|
|
|
|
Non-zero integers in the fastint range will fall into the case where a certain
|
|
computed number of low mantissa bits must be checked to be zero. As discussed
|
|
above, the algorithm should be optimized for the "input fits fastint" case.
|
|
|
|
Pseudocode 2
|
|
------------
|
|
|
|
Some rewriting::
|
|
|
|
is_fastint(sgn, exp, mant):
|
|
nzero = 1075 - exp
|
|
if nzero >= 52 and nzero <= 6: // exp 1023 ... exp 1069
|
|
// exponents 1023 to 1069: regular handling, common case
|
|
return mzero(mant, nzero)
|
|
else if nzero == 1075:
|
|
// exponent 0: irregular handling, but still common (positive zero)
|
|
return sign == 0 and mzero(mant, 52)
|
|
else if nzero == 5:
|
|
// exponent 1070: irregular handling, rare case
|
|
return sign == 1 and mzero(mant, 52)
|
|
else:
|
|
// exponents [1,1022] and [1071,2047], rare case
|
|
return false
|
|
|
|
C algorithm with a lookup table
|
|
-------------------------------
|
|
|
|
The common case ``nzero`` values are between [6, 52] and correspond to
|
|
mantissa masks. Compute a mask index instead as nzero - 6 = 1069 - exp::
|
|
|
|
duk_uint64_t mzero_masks[47] = {
|
|
0x000000000000003fULL, /* exp 1069, nzero 6 */
|
|
0x000000000000007fULL, /* exp 1068, nzero 7 */
|
|
0x00000000000000ffULL, /* exp 1067, nzero 8 */
|
|
0x00000000000001ffULL, /* exp 1066, nzero 9 */
|
|
/* ... */
|
|
0x0003ffffffffffffULL, /* exp 1025, nzero 50 */
|
|
0x0007ffffffffffffULL, /* exp 1024, nzero 51 */
|
|
0x000fffffffffffffULL, /* exp 1023, nzero 52 */
|
|
};
|
|
|
|
int is_fastint(duk_int64_t d) {
|
|
int exp = (d >> 52) & 0x07ff;
|
|
int idx = 1069 - exp;
|
|
|
|
if (idx >= 0 && idx <= 46) { /* exponents 1069 to 1023 */
|
|
return (mzero_masks[idx] & mant) == 0;
|
|
} else if (idx == 1069) { /* exponent 0 */
|
|
return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0);
|
|
} else if (idx == -1) { /* exponent 1070 */
|
|
return (d < 0) && ((d & 0x000fffffffffffffULL) == 0);
|
|
} else {
|
|
return 0;
|
|
}
|
|
};
|
|
|
|
The memory cost of the mask table is 8x47 = 376 bytes. This can be halved
|
|
e.g. by using a table of 32-bit values with separate cases for nzero >= 32
|
|
and nzero < 32.
|
|
|
|
Unfortunately the expected case (exponents 1023 to 1069) involves a mask
|
|
check with a variable mask, so it may be unsuitable for direct inlining in
|
|
the most important hot spots.
|
|
|
|
C algorithm with a computed mask
|
|
--------------------------------
|
|
|
|
Since this algorithm only runs outside the proper fastint "fast path" it
|
|
may be more sensible to avoid a memory tradeoff and compute the masks::
|
|
|
|
int is_fastint(duk_int64_t d) {
|
|
int exp = (d >> 52) & 0x07ff;
|
|
int shift = exp - 1023;
|
|
|
|
if (shift >= 0 && shift <= 46) { /* exponents 1023 to 1069 */
|
|
return ((0x000fffffffffffffULL >> shift) & mant) == 0;
|
|
} else if (shift == -1023) { /* exponent 0 */
|
|
/* return (d >= 0) && ((d & 0x000fffffffffffffULL) == 0); */
|
|
return (d == 0);
|
|
} else if (shift == 47) { /* exponent 1070 */
|
|
return (d < 0) && ((d & 0x000fffffffffffffULL) == 0);
|
|
} else {
|
|
return 0;
|
|
}
|
|
};
|
|
|
|
C algorithm with a computed mask, unsigned
|
|
------------------------------------------
|
|
|
|
Using an unsigned 64-bit integer for the input::
|
|
|
|
int is_fastint(duk_uint64_t d) {
|
|
int exp = (d >> 52) & 0x07ff;
|
|
int shift = exp - 1023;
|
|
|
|
if (shift >= 0 && shift <= 46) { /* exponents 1023 to 1069 */
|
|
return ((0x000fffffffffffffULL >> shift) & mant) == 0;
|
|
} else if (shift == -1023) { /* exponent 0 */
|
|
/* return ((d & 0x800fffffffffffffULL) == 0); */
|
|
return (d == 0);
|
|
} else if (shift == 47) { /* exponent 1070 */
|
|
return ((d & 0x800fffffffffffffULL) == 0x8000000000000000ULL);
|
|
} else {
|
|
return 0;
|
|
}
|
|
};
|
|
|
|
C algorithm with 32-bit operations and a computed mask
|
|
------------------------------------------------------
|
|
|
|
For middle endian machines (ARM) this algorithm first needs swapping
|
|
of the 32-bit parts. By changing the mask checks to operate on 32-bit
|
|
parts the algorithm would work on more platforms and would also remove
|
|
the need for swapping the parts on middle endian platforms::
|
|
|
|
int is_fastint(duk_uint32_t hi, duk_uint32_t lo) {
|
|
int exp = (hi >> 20) & 0x07ff;
|
|
int shift = exp - 1023;
|
|
|
|
if (shift >= 0 && shift <= 46) { /* exponents 1023 to 1069 */
|
|
if (shift <= 20) {
|
|
/* 0x000fffff'ffffffff -> 0x00000000'ffffffff */
|
|
return (((0x000fffffUL >> shift) & hi) == 0) && (lo == 0);
|
|
} else {
|
|
/* 0x00000000'ffffffff -> 0x00000000'0000003f */
|
|
return (((0xffffffffUL >> (shift - 20)) & lo) == 0);
|
|
}
|
|
} else if (shift == -1023) { /* exponent 0 */
|
|
/* return ((hi & 0x800fffffUL) == 0x00000000UL) && (lo == 0); */
|
|
return (hi == 0) && (lo == 0);
|
|
} else if (shift == 47) { /* exponent 1070 */
|
|
return ((hi & 0x800fffffUL) == 0x80000000UL) && (lo == 0);
|
|
} else {
|
|
return 0;
|
|
}
|
|
};
|
|
|
|
Performance notes
|
|
-----------------
|
|
|
|
Coercing a double to an int64_t seems to be very slow on some platforms, so it
|
|
may be faster to get the fastint out of the IEEE double value with custom C
|
|
code. The code doesn't need to handle denormals, NaNs, etc, so it can be much
|
|
simpler than a full coercion routine.
|
|
|
|
There's a standard trick which is based on adding a double constant that
|
|
forces the mantissa to be shifted so that the integer value can be directly
|
|
extracted. See e.g.:
|
|
|
|
* http://stackoverflow.com/questions/17035464/a-fast-method-to-round-a-double-to-a-32-bit-int-explained
|
|
|
|
A similar trick is used in the number-to-double upgrade, see below.
|
|
|
|
Efficient check for number-to-double upgrade
|
|
============================================
|
|
|
|
Slow path code often needs to handle a number which may be either a fastint or
|
|
a double. The code needs to read the value efficiently as a double. To
|
|
minimize the slow path penalty, this check and conversion from a fastint to
|
|
a double (if necessary) needs to be fast.
|
|
|
|
The algorithm has two parts: (1) detecting that the value is a fastint, and
|
|
(2) converting a fastint into a double if necessary.
|
|
|
|
Checking for a fastint
|
|
----------------------
|
|
|
|
Checking for a fastint is easy:
|
|
|
|
* For packed duk_tval: if 16 highest bits are 0xfff1 (DUK_TAG_FASTINT) the
|
|
value is a fastint.
|
|
|
|
* For unpacked duk_tval: compare tag value similarly.
|
|
|
|
Trivial fastint-to-double conversion
|
|
------------------------------------
|
|
|
|
Converting a fastint into a double could be done by:
|
|
|
|
1. Sign extending the 48-bit value into a signed 64-bit value; the sign
|
|
extension can be achieved by two shifts.
|
|
|
|
2. Coercing the 64-bit value to a double.
|
|
|
|
Example::
|
|
|
|
duk_int64_t tmp = du.ull[DUK_DBL_IDX_ULL0];
|
|
tmp = (tmp << 16) >> 16; /* sign extend */
|
|
return (duk_double_t) tmp;
|
|
|
|
Unfortunately this is very slow, at least on some soft float platforms
|
|
where this was tested on.
|
|
|
|
Alternate fastint-to-double conversion
|
|
--------------------------------------
|
|
|
|
Because the input number range is 48-bit signed (and zero) the conversion can
|
|
be optimized a great deal. Let's first consider a positive value [1,2^47-1]:
|
|
|
|
* Construct an IEEE double with:
|
|
|
|
- Sign = 0
|
|
|
|
- Exponent field = 1023 + 52 = 1075
|
|
|
|
- Mantissa = the 52-bit fastint value aligned to the right of the field,
|
|
i.e. padded with zero bits on the left
|
|
|
|
* Because of the implicit leading 1-bit, the value represented is 2^52 +
|
|
fastint_value. Floating point subtract 2^52 to yield the final result.
|
|
|
|
The C code for this could be something like::
|
|
|
|
/* For fastint value [1,2^47-1]. */
|
|
du.ull[DUK_DBL_IDX_ULL0] = (duk_uint64_t) fastint_value |
|
|
(duk_uint64_t) 0x4330000000000000ULL;
|
|
du.d = du.d - 4503599627370496.0; /* 1<<52 */
|
|
return du.d;
|
|
|
|
Negative values need similar handling but the double sign bit needs to be set.
|
|
It's good to avoid sign extending the 48-bit value::
|
|
|
|
/* For fastint value [-2^47,-1]. */
|
|
du.ull[DUK_DBL_IDX_ULL0] = ((duk_uint64_t) (-fastint_value) &
|
|
(duk_uint64_t) 0x000fffffffffffffULL) |
|
|
(duk_uint64_t) 0xc330000000000000ULL;
|
|
du.d = du.d + 4503599627370496.0; /* 1<<52 */
|
|
return du.d;
|
|
|
|
Zero fastint is simply represented as an IEEE double with all bits zero, which
|
|
unfortunately needs a separate case.
|
|
|
|
In the concrete implementation the fastint_value might include the fastint
|
|
duk_tval tag and be masked out also for the positive number case.
|
|
|
|
Future work
|
|
===========
|
|
|
|
Fastint on platforms with no 64-bit integer type
|
|
------------------------------------------------
|
|
|
|
Currently fastint support can only be used if the platform/compiler has
|
|
support for a 64-bit integer type. This limitation could be removed by
|
|
implementing alternative fastint fast paths which only relied on 32-bit
|
|
arithmetic.
|
|
|
|
32-bit fastint
|
|
--------------
|
|
|
|
It might be worth investigating if a signed or unsigned 32-bit fastint
|
|
(instead of a signed 48-bit fastint) would be more useful. Fast path
|
|
arithmetic would certainly be faster.
|
|
|
|
The downside would be that some bit operations won't be possible: to
|
|
fully support all bit operations both signed and unsigned 32-bit values
|
|
are needed.
|
|
|
|
Optimize upgrade and downgrade
|
|
------------------------------
|
|
|
|
These operations are very important for performance so perhaps inline
|
|
assembler optimization would be useful for specific platforms, e.g. ARM.
|
|
|
|
The current C algorithms can also be optimized further.
|
|
|