Resolving the effective 'this' value happens on every call. The previous
implementation for non-strict functions used stack API calls. The reworked
implementation accesses the value stack directly, which matters for this
performance-critical code path.
Reorder tags to accommodate a separate 'unused' tag so that 'undefined' can
become a single tag write (instead of tag + value like booleans). This is
good because 'undefined' values are involved in e.g. value stack resizes and
are performance relevant.
Also reorder tags so that the "is heap allocated" check can be a single bit
test instead of a comparison when using non-packed duk_tval. This makes every
DECREF potentially faster because an "is heap allocated" test appears in
every DECREF.
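A minimal sketch of the two tag layout ideas above for a non-packed tagged
value. The tag numbers, the tval struct and the function names are made up
for illustration and are not Duktape's actual DUK_TAG_xxx values or macros.
The only point is that when all heap-allocated tags share one bit, the check
is a single AND, and that 'undefined' needs no value part:

```c
#include <assert.h>

/* Hypothetical tag numbering, for illustration only.  Heap-allocated
 * types all have bit 0x08 set, all other types have it clear.
 */
enum {
    TAG_NUMBER    = 0x00,
    TAG_UNDEFINED = 0x01,  /* no value part: a single tag write */
    TAG_NULL      = 0x02,
    TAG_BOOLEAN   = 0x03,  /* tag + value */
    TAG_UNUSED    = 0x04,  /* marker only, never a real value */
    TAG_STRING    = 0x08,  /* heap allocated */
    TAG_OBJECT    = 0x09,  /* heap allocated */
    TAG_BUFFER    = 0x0a   /* heap allocated */
};

#define TAG_HEAP_BIT 0x08

/* Hypothetical non-packed tagged value: explicit tag plus value union. */
typedef struct {
    unsigned int tag;
    union {
        double d;
        int i;
        void *heaphdr;
    } v;
} tval;

/* Single bit test.  Without a shared bit this would have to be a
 * comparison such as (tv->tag >= TAG_STRING).
 */
static int tval_is_heap_allocated(const tval *tv) {
    return (tv->tag & TAG_HEAP_BIT) != 0;
}

/* 'undefined' is fully described by its tag, so one write is enough. */
static void tval_set_undefined(tval *tv) {
    tv->tag = TAG_UNDEFINED;
}
```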
Because "unused" is not intended to appear anywhere in actual use (e.g. as
a value stack value, as a property value, etc), "unused" values will fall
into the default clause of DUK_TAG_xxx switch case statements. Add an assert
to every such default clause to verify that the value is not "unused".
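The pattern can be sketched as follows. The tag names and the helper
function are hypothetical, and plain assert() stands in for whatever
assert macro the code base uses. Numbers are assumed to be handled by the
default clause here, which is why an "unused" tag would also end up there:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tags, for illustration only. */
enum {
    TAG_UNDEFINED,
    TAG_NULL,
    TAG_BOOLEAN,
    TAG_STRING,
    TAG_UNUSED,
    TAG_NUMBER
};

/* Map a tag to a type name.  Tags without an explicit case fall into
 * the default clause, so that is where "unused" must be ruled out.
 */
static const char *tag_to_typename(int tag) {
    switch (tag) {
    case TAG_UNDEFINED:
        return "undefined";
    case TAG_NULL:
        return "null";
    case TAG_BOOLEAN:
        return "boolean";
    case TAG_STRING:
        return "string";
    default:
        /* "unused" must never reach actual use. */
        assert(tag != TAG_UNUSED);
        return "number";
    }
}
```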
Remove duk_push_unused() as it should no longer be used. It was only used
by the debugger protocol; refuse an inbound "unused" value in the debugger.
This does not break compatibility because a debug client never had a
legitimate reason to send requests with "unused" values.
'Leave as undefined' seems to be the best overall value stack initialization
policy. While 'leave as garbage' is marginally better in a few cases (mostly
when refcounting is disabled) it's probably not worth keeping two policies
around.
Change the current value stack policy from "unused above top" to either
"undefined above top" or "garbage above top", depending on a config
option. Change mark-and-sweep and debug print code to only process
entries between [0,top[ in either case.
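A minimal sketch of the "undefined above top" policy, assuming a toy value
stack. The valstack struct, the vs_xxx() helpers and the tag values are
hypothetical. The sketch shows the invariant (every slot at or above top
is 'undefined'), one possible benefit of it (pushing 'undefined' only
moves top), and a scan restricted to [0,top[:

```c
#include <assert.h>
#include <stddef.h>

#define VS_SIZE 8

/* Hypothetical tags and tagged value, for illustration only. */
enum { TAG_UNDEFINED = 1, TAG_NUMBER = 2 };

typedef struct {
    int tag;
    double num;
} tval;

typedef struct {
    tval slots[VS_SIZE];
    size_t top;  /* index of the first slot above the active range */
} valstack;

/* Establish the invariant: all slots >= top are 'undefined'. */
static void vs_init(valstack *vs) {
    size_t i;
    for (i = 0; i < VS_SIZE; i++) {
        vs->slots[i].tag = TAG_UNDEFINED;
    }
    vs->top = 0;
}

static void vs_push_number(valstack *vs, double d) {
    tval *tv;
    assert(vs->top < VS_SIZE);
    tv = &vs->slots[vs->top++];
    assert(tv->tag == TAG_UNDEFINED);  /* nothing old to release */
    tv->tag = TAG_NUMBER;
    tv->num = d;
}

/* The slot is already 'undefined', so only top needs to move. */
static void vs_push_undefined(valstack *vs) {
    assert(vs->top < VS_SIZE);
    vs->top++;
}

/* Popping restores the invariant with a single tag write. */
static void vs_pop(valstack *vs) {
    assert(vs->top > 0);
    vs->slots[--vs->top].tag = TAG_UNDEFINED;
}

/* Scans such as marking or debug printing only visit [0,top[. */
static size_t vs_count_numbers(const valstack *vs) {
    size_t i, n = 0;
    for (i = 0; i < vs->top; i++) {
        if (vs->slots[i].tag == TAG_NUMBER) {
            n++;
        }
    }
    return n;
}
```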
Both policies have potential upsides and downsides based on performance
measurement. This commit provides both policies; "undefined above top"
will probably be the only policy left however, because "garbage above
top" is only better in a few cases and mostly without refcounts.
Other minor improvements:
- Rework index validation to use duk_uidx_t and more shared code.
- Add cached thr->valstack_size value.
These replace the much-repeated idiom:

    duk_tval tv_tmp;
    DUK_TVAL_SET_TVAL(&tv_tmp, tv_dst);
    DUK_TVAL_SET_TVAL(tv_dst, tv_src);
    DUK_TVAL_INCREF(thr, tv_src);
    DUK_TVAL_DECREF(thr, &tv_tmp);  /* side effects */

with (e.g.):

    DUK_TVAL_SET_TVAL_UPDREF(thr, tv_dst, tv_src);

This reduces line count, and also allows the set-and-update-refcount
sequence to be optimized in detail.
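The ordering inside the idiom can be illustrated with a toy refcounted
value. The heaphdr and tval structs and the tval_xxx() functions below are
hypothetical stand-ins, not Duktape's macros; setting a 'finalized' flag
stands in for freeing the object and for any side effects that may follow.
The sketch assumes the DECREF comes last so that the destination already
holds its new value when side effects run; doing the INCREF first also
keeps self-assignment safe:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap object header, for illustration only. */
typedef struct {
    int refcount;
    int finalized;  /* stand-in for "freed, side effects may have run" */
} heaphdr;

/* Hypothetical tagged value: h == NULL means "not heap allocated". */
typedef struct {
    heaphdr *h;
} tval;

static void tval_incref(tval *tv) {
    if (tv->h != NULL) {
        tv->h->refcount++;
    }
}

static void tval_decref(tval *tv) {
    if (tv->h != NULL && --tv->h->refcount == 0) {
        tv->h->finalized = 1;
    }
}

/* Set-and-update-refcount in one place: save the old value, overwrite
 * the destination, INCREF the new value, and only then DECREF the old
 * value.
 */
static void tval_set_tval_updref(tval *tv_dst, tval *tv_src) {
    tval tv_tmp = *tv_dst;
    *tv_dst = *tv_src;
    tval_incref(tv_src);
    tval_decref(&tv_tmp);  /* side effects */
}
```

Once the sequence lives in a single helper, call sites shrink to one line
and the sequence itself can be tuned without touching them.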
- Add missing 't = 0' initializer to test-assign-addto. Without the
  initializer 't' becomes NaN and the test is, for some reason, over 20x
  slower on x86. If 't' is initialized, performance on x86 is fine. (It's
  worth investigating why falling out of the fastint fast path is so
  costly on x86 but not on x64; NaN normalization?)
- Add missing test-assign-reg.pl.
- Add specific addition tests, where numbers involved are (a) fastints,
  (b) doubles, (c) NaNs. These tests demonstrate the roughly 10x
  difference on x86 for NaNs and other IEEE doubles -- with and without
  packed duk_tval.
Outer function has a setjmp catchpoint and handles all longjmps.
Inner function is the plain bytecode executor which is capable of
an internal restart but doesn't handle longjmps.
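The split can be sketched with a toy interpreter. The opcodes, the global
catchpoint and the run_xxx() functions are hypothetical and much simpler
than a real executor: the inner function only dispatches opcodes and
throws with longjmp(), while the outer function owns the setjmp()
catchpoint, handles the longjmp and re-enters the inner function:

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical catchpoint; a single global for brevity. */
static jmp_buf catchpoint;

/* Inner function: plain dispatch loop.  It never calls setjmp(); an
 * error is thrown with longjmp() and caught by the outer function.
 * Toy opcodes: 0 = return, 1 = increment, 2 = throw.
 */
static int run_inner(const int *code, volatile int *pc) {
    int acc = 0;
    for (;;) {
        int op = code[(*pc)++];
        switch (op) {
        case 0:
            return acc;
        case 1:
            acc++;
            break;
        case 2:
            longjmp(catchpoint, 1);
        default:
            assert(0);
        }
    }
}

/* Outer function: owns the catchpoint and handles all longjmps, then
 * restarts the inner function at the current pc.  Locals modified
 * between setjmp() and longjmp() are volatile so their values are
 * well defined after the jump.
 */
static int run_outer(const int *code, int *n_caught) {
    volatile int pc = 0;
    volatile int caught = 0;
    int ret;

    if (setjmp(catchpoint) != 0) {
        caught++;  /* "handle" the error, then resume */
    }
    ret = run_inner(code, &pc);
    *n_caught = caught;
    return ret;
}
```

With the program { 1, 2, 1, 1, 0 } the throw is caught once and the inner
function is restarted with a fresh accumulator, so the result is 2. One
common motivation for this kind of split, an assumption here rather than
something stated above, is that the hot dispatch function contains no
setjmp() and the compiler is freer to keep its locals in registers.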
Also make the executor's 'fun' local variable config option dependent. x64
is slightly faster without it, x86 is slightly faster with it.