Switch to using WTF-8 for duk_hstring string representation. The main
differences to previous extended CESU-8/UTF-8 are: (1) valid surrogate
pairs are automatically combined to UTF-8 on string intern while invalid
surrogate characters are encoded in CESU-8, and (2) ECMAScript code always
sees surrogate pairs for non-BMP characters.
Together, these make it more natural to work with non-BMP strings for both
ECMAScript (which no longer sees extended codepoints as before) and native
code (which now sees valid UTF-8 for non-BMP whenever possible).
Internally the main change is in string interning which now always sanitizes
input strings (but no Symbols) to WTF-8. Also all call sites where the byte
representation of strings are dealt with need fixing. WTF-8 leads to some
challenges because it's no longer possible to e.g. find a substring with a
naive byte compare: surrogate characters may either appear directly (CESU-8)
or baked into a non-BMP UTF-8 byte sequence.
The main places where this needs complex handling include:
* charCodeAt / codePointAt
* Extracting a substring
* String .replace()
* String .startsWith() and .endsWith()
* String .split() and search functions (like .indexOf())
* RegExp matching
* String cache behavior
This commit fixes all the necessary sites with minimal baseline implementations
which are in some cases much slower than the previous CESU-8 ones. Further work
is needed to optimize the WTF-8 variants to perform close to CESU-8.
* Fix Proxy behavior for Array.isArray() and duk_is_array(), when
argument is a proxied Array.
* Fix JSON serialization of proxied Arrays.
* Fix Object.prototype.toString() behavior for proxied Arrays.
* Rename hobject helper functions for property entry access.
* Add helpers for looking up _Formals and _Varmap without inheritance.
* Add duk_xget_owndataprop() + variants for internal property lookups
without inheritance or side effects.
* Date built-in fix for .setTime(), .setYear(), etc on a frozen Date
instance. These should be allowed because they operate on an internal
slot which is not under normal property access control. Previously
the operations were incorrectly rejected with a TypeError.
* Changes to internal property definition and lookup here and there.
Mostly the changes should be no-op because e.g. inheritance only
matters if the internal property might be missing (which is not
usually the case).
While an application should never use the DUK_INTERNAL_SYMBOL() macro
to write internal symbols, reading Duktape internal symbols (even with
no versioning guarantees) is sometimes useful.
* Add internal helper for OrdinaryHasInstance(), needed by
Function.prototype[@@hasInstance].
* Add internal helper duk_get_method_stridx() which implements the ES2015
GetMethod() specification function.
* Add internal helper duk_to_boolean_top_pop() which occurs in a few
places and shaves some footprint.
* Extend duk_instanceof() and the internal duk_js_instanceof() helper
to include a target @@hasInstance check.
* Add Symbol.hasInstance and Function.prototype[@@hasInstance].
* Add support for non-default attributes for function properties,
needed by Function.prototype[@@hasInstance]. This also fixes
the incorrect attributes of performance.now().
Both duk_hthread and duk_context typedefs resolve to struct duk_hthread
internally. In external API duk_context resolves to struct duk_hthread
which is intentionally left undefined as the struct itself is not
dereferenced. Change internal code to use duk_hthread exclusively which
removes unnecessary and awkward thr <-> ctx casts from internals.
The basic guidelines are:
* Public API uses duk_context in prototype declarations. The intent is to
hide the internal type, and there's already a wide dependency on the
type name.
* All internal code, both declarations and definitions, use duk_hthread
exclusively. This is done even for API functions, i.e. an API function
declared as "void duk_foo(duk_context *ctx);" is then defined as
"void duk_foo(duk_hthread *thr);".
Remove the special ecma-to-ecma call setup code and just use the normal
unprotected call setup code for that instead. Most of the code is the
same; just before calling into the bytecode executor check if the current
executor can be reused, and if so, indicate the situation using a special
return code.
Also remove internal duk_handle_call_protected() and implement all
protected API calls via duk_safe_call(). This reduces footprint and code
duplication further.
Rework call handling to use helpers more to make the call handling code
easier to follow.
Various other minor changer, e.g. DUK_OP_NEW is now DUK_OP_CONSCALL and
bytecode sets up the initial default instance.
* Wrap checks to duk_require_stack() and variants.
* Wrap check to value stack grow.
* Add internal helper duk_set_top_and_wipe(); for now it's just two
duk_set_top() calls but can be optimized later.
* Make value stack and call stack limits configurable via DUK_USE_xxx
options. Also make value stack grow/shrink constants configurable.
* Rewrite value stack grow/shrink check primitives for better hot/cold path
handling.
* Use a proportional spare for grow and shrink sizes so that applications
needing a large value stack have fewer value stack resizes.
* Grow value stack allocation when entering a call or when explicitly requested
via e.g. duk_require_stack().
* Never shrink the value stack when entering a call, so that the unwind path
is guaranteed to have value stack to handle a protected call return. This
guarantee is only needed for protected call but is now applied to all calls
for simplicity.
* Don't perform a value stack shrink check at all in function return anymore.
It would be OK from protected call semantics perspective to do a shrink
attempt without throwing if it fails.
* Perform a value stack shrink check in mark-and-sweep only for now. When
emergency GC is running, shrink to a minimal size respecting current value
stack reserve.
Popping without DECREF allows "stealing" of refcounts when temporaries are
first placed on the value stack (useful e.g. in property table realloc when
array part is abandoned).
* Change plain buffers to inherit from Uint8Array. This affects a lot of
small things like Object.prototype.toString() output, enumeration of plain
buffers, etc. It also changes JSON serialization for plain buffers because
the index properties are enumerable as with Uint8Array instances.
* Disable JSON stringify fastpath for plain buffers for now so that the
virtual index properties get serialized correctly.
* Remove ArrayBuffer non-standard virtual properties.
* Remove DataView non-standard virtual properties.
* Move .byteLength, .byteOffset, .BYTES_PER_ELEMENT, and .buffer into
inherited getters as required in ES6. However, the .length property
remains a virtual own property for now (it too is an inherited getter
in ES6).
* Move ArrayBuffer.allocPlain() and ArrayBuffer.plainOf() to
Uint8Array.allocPlain() and Uint8Array.plainOf() to match the
semantics change for plain buffers.
* Fix Node.js buffer .slice() behavior, the returned Node.js buffer
would have ArrayBuffer.isView == 0 which doesn't match the revised
Node.js behavior (Buffers being Uint8Array instances)
* Reject ArrayBuffers with a view offset/length in Node.js Buffer .slice()
rather than accept such ArrayBuffers without actually respecting the
view offset/length.
* Allow a plain buffer or a lightfunc as a constructor "replacement object"
return value.
* Strings with 0xFF byte prefix are considered special symbols: they have
typeof "symbol" but still mostly behave as strings (e.g. allow ToString)
so that existing code dealing with internal keys, especially inside
Duktape, can work with fewer changes.
* Strings with 0x80 byte prefix are global symbols, e.g. Symbol.for('foo')
creates the byte representatio: 0x80 "foo"
* Strings with 0x81 byte prefix are unique symbols; the 0x81 byte is followed
by the Symbol description, and an internal string component ensuring
uniqueness is separated by a 0xFF byte (which can never appear anywhere in
an extended UTF-8 string). The unique suffix is up to Duktape internals,
currently two 32-bit counters are used. For example:
0x81 "mySymbol" 0xFF "0-17".
* Well-known symbols use the 0x81 prefix but lack a unique suffix, so their
format is 0x81 <description> 0xFF.
* ES6 distinguishes between an undefined symbol description and an empty
string symbol description. This distinction is not currently visible via
Ecmascript bindings but may be visible in the future. Append an extra
0xFF to the unique suffix when the description is undefined, i.e.
0x81 0xFF <unique suffix> 0xFF.
* Anonymous functions don't have an own '.name' property but inherit
an empty string .name from Function.prototype.
* new Function() returns a function whose name is 'anonymous'.
Shifting signed values is implementation defined behavior so use unsigned
shifts and casts to sign extend (e.g. (duk_idx_t) (duk_int8_t) (x >> 24)).
Also avoid signed shift for lightfunc magic macro.
Rename and reuse a previously internal duk_push_object_internal() which just
pushes a bare object (= object without an internal prototype) which is useful
for various dict / tracking map purposes.