Change function .arguments and .caller behavior to be in line with the
latest specification:
* No own .arguments or .caller property for Function instances. V8 provides
non-strict functions with null .caller and .arguments properties (but they
are not required by the specification).
* Inherit .arguments and .caller thrower from Function.prototype.
Also change %ThrowTypeError%.name to the empty string to match V8.
Fix a realloc() memory leak which would happen when:
- a previous allocation exists ('ptr' is non-NULL); and
- the new 'size' is zero; and
- a voluntary GC (or GC torture) causes the initial realloc() attempt
to be bypassed
In this case the slow path would incorrectly assume that it was entered
after realloc() had returned a NULL, which for a zero new 'size' would
mean that 'ptr' was successfully freed and no further action was necessary.
But because the realloc() had actually been bypassed, this would cause the
old 'ptr' to leak.
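The bug class can be illustrated with a generic sketch (not Duktape's actual allocator code; `fast_path_bypassed` stands in for the voluntary GC / GC torture condition). The key point is that the slow path must free 'ptr' itself when the initial realloc() never ran:

```c
#include <stdlib.h>
#include <stddef.h>

/* Illustrative sketch only.  Assumes the platform semantics the commit
 * text relies on: realloc(ptr, 0) frees 'ptr' and returns NULL. */
void *sketch_realloc_checked(void *ptr, size_t size, int fast_path_bypassed) {
    void *res;

    if (!fast_path_bypassed) {
        res = realloc(ptr, size);
        if (res != NULL || size == 0) {
            /* For size == 0, a NULL result means 'ptr' was freed. */
            return res;
        }
    }

    /* Slow path: we can no longer assume realloc() already ran. */
    if (size == 0) {
        free(ptr);  /* Fix: free explicitly; the buggy version returned
                     * NULL here assuming realloc() had freed 'ptr'. */
        return NULL;
    }
    return realloc(ptr, size);
}
```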
Restructure string intern check:
- Compute string hash and perform strtable lookup; if found, it
must already be a valid Symbol or valid WTF-8 data so no WTF-8
sanitization steps are needed. Return found string.
- Otherwise perform a "keepcheck" to see if the candidate string
can be used as is (i.e. it is a valid Symbol or valid WTF-8).
If so, we know it's not in the strtable, so intern the string.
- Otherwise the string needs WTF-8 sanitization. After sanitizing,
rehash the sanitized data, perform another strtable lookup and
return existing string or intern the sanitized string.
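The flow above can be sketched with a toy model; everything below is illustrative rather than Duktape internals (a linear table instead of a hashed strtable, pure ASCII as the stand-in for "valid Symbol or valid WTF-8", and '?' substitution as the stand-in sanitizer):

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_MAX 64
static const char *table[TABLE_MAX];
static int table_count = 0;

static const char *table_lookup(const char *s) {
    for (int i = 0; i < table_count; i++) {
        if (strcmp(table[i], s) == 0) {
            return table[i];
        }
    }
    return NULL;
}

static const char *table_insert(const char *s) {
    char *copy = malloc(strlen(s) + 1);
    strcpy(copy, s);
    table[table_count] = copy;
    return table[table_count++];
}

/* Stand-in for "valid Symbol or valid WTF-8": accept pure ASCII. */
static int keepcheck(const char *s) {
    for (; *s != '\0'; s++) {
        if ((unsigned char) *s >= 0x80) {
            return 0;
        }
    }
    return 1;
}

/* Stand-in for WTF-8 sanitization: replace non-ASCII bytes with '?'. */
static void sanitize(const char *src, char *dst) {
    for (; *src != '\0'; src++) {
        *dst++ = ((unsigned char) *src >= 0x80) ? '?' : *src;
    }
    *dst = '\0';
}

static const char *intern(const char *candidate) {
    char buf[256];
    const char *found;

    /* Step 1: lookup first; anything already in the table is known valid. */
    found = table_lookup(candidate);
    if (found != NULL) {
        return found;
    }

    /* Step 2: keepcheck; if usable as is, intern directly (we already
     * know it is not in the table). */
    if (keepcheck(candidate)) {
        return table_insert(candidate);
    }

    /* Step 3: sanitize, look up again, return or intern the result. */
    sanitize(candidate, buf);
    found = table_lookup(buf);
    return (found != NULL) ? found : table_insert(buf);
}
```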
This speeds up string intern processing for (1) strings already in
the string table and (2) valid WTF-8 strings which should be the
vast majority of strings interned. Only strings that are invalid
WTF-8, i.e. contain uncombined surrogate pairs or outright invalid
data, will need sanitization.
Other minor changes:
- Add some WTF-8 documentation to tentative 3.0 release notes.
- Add a 3.0 release entry.
* Remove lazy charlen support. Since the entire input string must be
WTF-8 sanitized anyway, charlen can be computed while validating
(eventually avoiding extra book-keeping for ASCII strings).
* Improve WTF-8 forward/backward search performance (no substring
operations) when the search string is valid UTF-8. The reference
implementation is still used for non-UTF-8 search strings, to be
optimized later.
* Minor testcase improvements.
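Computing charlen during validation, as the first bullet above describes, can be sketched roughly as follows. This is an illustrative single-pass plain-UTF-8 shape check (it does not handle the WTF-8 surrogate cases, and the name is made up), counting code points as it validates:

```c
#include <stddef.h>

/* Walk the bytes once, checking sequence shape and counting code
 * points, so no separate lazy charlen bookkeeping is needed.
 * Returns -1 on invalid input, else the character count. */
static long validate_and_count(const unsigned char *p, size_t len) {
    long count = 0;
    size_t i = 0;

    while (i < len) {
        unsigned char b = p[i];
        size_t n;
        if (b < 0x80) n = 1;
        else if ((b & 0xE0) == 0xC0) n = 2;
        else if ((b & 0xF0) == 0xE0) n = 3;
        else if ((b & 0xF8) == 0xF0) n = 4;
        else return -1;
        if (i + n > len) return -1;
        for (size_t j = 1; j < n; j++) {
            if ((p[i + j] & 0xC0) != 0x80) return -1;  /* Not a continuation byte. */
        }
        i += n;
        count++;
    }
    return count;
}
```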
Switch to using WTF-8 for the duk_hstring string representation. The main
differences from the previous extended CESU-8/UTF-8 representation are:
(1) valid surrogate pairs are automatically combined to UTF-8 on string
intern while unpaired surrogates are encoded in CESU-8, and (2) ECMAScript
code always sees surrogate pairs for non-BMP characters.
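Assuming standard UTF-16/UTF-8 arithmetic, the pair-combining step can be sketched as follows (the function name is illustrative, not a Duktape internal):

```c
#include <stddef.h>

/* Combine a valid UTF-16 surrogate pair into the 4-byte UTF-8 sequence
 * that WTF-8 interning would store.  Returns 0 if the code units do not
 * form a valid pair, in which case the intern step would instead keep
 * the lone surrogate in its 3-byte CESU-8 form. */
static size_t combine_surrogates_utf8(unsigned hi, unsigned lo, unsigned char out[4]) {
    if (hi < 0xD800 || hi > 0xDBFF || lo < 0xDC00 || lo > 0xDFFF) {
        return 0;  /* Not a valid pair. */
    }
    unsigned cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
    out[0] = (unsigned char) (0xF0u | (cp >> 18));
    out[1] = (unsigned char) (0x80u | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80u | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80u | (cp & 0x3F));
    return 4;
}
```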
Together, these make it more natural to work with non-BMP strings for both
ECMAScript (which no longer sees extended codepoints as before) and native
code (which now sees valid UTF-8 for non-BMP whenever possible).
Internally the main change is in string interning, which now always
sanitizes input strings (but not Symbols) to WTF-8. All call sites where
the byte representation of strings is dealt with also need fixing. WTF-8
leads to some challenges because it's no longer possible to e.g. find a
substring with a naive byte compare: surrogate characters may either
appear directly (in CESU-8 form) or be baked into a non-BMP UTF-8 byte
sequence.
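For example, the single code point U+10400 has two possible byte shapes, so a naive byte search for one shape misses the other:

```c
#include <string.h>
#include <stddef.h>

/* U+10400 as combined UTF-8 (what WTF-8 stores for a valid pair) and
 * as a CESU-8 surrogate pair (D801 DC00 encoded separately). */
static const unsigned char utf8_u10400[]  = { 0xF0, 0x90, 0x90, 0x80 };
static const unsigned char cesu8_u10400[] = { 0xED, 0xA0, 0x81, 0xED, 0xB0, 0x80 };

/* Naive byte-wise substring search, the kind that is no longer
 * sufficient once both shapes can occur. */
static int naive_contains(const unsigned char *hay, size_t hlen,
                          const unsigned char *needle, size_t nlen) {
    if (nlen > hlen) {
        return 0;
    }
    for (size_t i = 0; i + nlen <= hlen; i++) {
        if (memcmp(hay + i, needle, nlen) == 0) {
            return 1;
        }
    }
    return 0;
}
```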
The main places where this needs complex handling include:
* charCodeAt / codePointAt
* Extracting a substring
* String .replace()
* String .startsWith() and .endsWith()
* String .split() and search functions (like .indexOf())
* RegExp matching
* String cache behavior
This commit fixes all the necessary sites with minimal baseline implementations
which are in some cases much slower than the previous CESU-8 ones. Further work
is needed to optimize the WTF-8 variants to perform close to CESU-8.
Isolate all char-offset-to-byte-offset and character access calls
behind helpers to help prepare for a switch to WTF-8 representation.
This change should have no visible effect yet.
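A helper of this kind might look as follows (hypothetical name, and it assumes valid UTF-8 data for simplicity). The point is that callers stop doing byte arithmetic directly, so the underlying representation can later switch to WTF-8 without touching them:

```c
#include <stddef.h>

/* Map a character offset to a byte offset by skipping UTF-8
 * continuation bytes (0x80..0xBF) to land on character starts. */
static size_t char_off_to_byte_off(const unsigned char *p, size_t len, size_t char_off) {
    size_t byte_off = 0;

    while (char_off > 0 && byte_off < len) {
        byte_off++;
        while (byte_off < len && (p[byte_off] & 0xC0) == 0x80) {
            byte_off++;  /* Continuation byte, not a character start. */
        }
        char_off--;
    }
    return byte_off;
}
```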
Join surrogate pairs (encoded in CESU-8) in the string intern check,
with unoptimized code. This allows working on the WTF-8 representation
when the joining is manually enabled. The test code is disabled by
default, so it should not affect current behavior.
NaN normalization check should use a full NaN check to decide when to
normalize, even when using partial NaN initialization. The fix here
is to switch to full NaN initialization in general.
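A full NaN check of the kind meant here looks at the complete IEEE double bit pattern (exponent all ones AND mantissa nonzero), rather than only the top bits a partial scheme inspects. A minimal sketch, with illustrative names:

```c
#include <string.h>
#include <stdint.h>
#include <math.h>

/* Full NaN check on the IEEE-754 double bit pattern. */
static int full_nan_check(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return ((bits & 0x7FF0000000000000ULL) == 0x7FF0000000000000ULL) &&
           ((bits & 0x000FFFFFFFFFFFFFULL) != 0);  /* Mantissa zero would be infinity. */
}

/* Rewrite any NaN to one canonical quiet-NaN bit pattern. */
static double normalize_nan(double d) {
    if (full_nan_check(d)) {
        uint64_t canon = 0x7FF8000000000000ULL;
        memcpy(&d, &canon, sizeof d);
    }
    return d;
}
```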
* Add 'json' to function names for consistency, e.g. duk__dec_xxx() to
duk__json_dec_xxx().
* Remove 'JSON' from recursion limit error messages, it is usually
apparent from the context and can be shared by CBOR encode/decode.
Input 'val' pointer may be a value stack pointer, which may become
stale if the variable lookup reallocates the current value stack.
This can happen e.g. in a with(proxy).
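The stale-pointer hazard is generic to any reallocatable stack; a minimal illustration (not Duktape's actual code, all names made up) is to copy the value out before any operation that may move the backing storage:

```c
#include <stdlib.h>
#include <stddef.h>

typedef struct {
    double *items;
    size_t count;
    size_t cap;
} val_stack;

static void stack_grow(val_stack *st) {
    size_t ncap = (st->cap != 0) ? st->cap * 2 : 4;
    double *ni = realloc(st->items, ncap * sizeof(double));
    if (ni != NULL) {
        st->items = ni;
        st->cap = ncap;
    }
}

/* Read items[idx] and push it.  The value is copied out BEFORE the
 * possible grow: keeping 'double *val = &st->items[idx]' across the
 * grow would be exactly the stale-pointer bug described above. */
static double safe_read_then_push(val_stack *st, size_t idx) {
    double val = st->items[idx];
    if (st->count == st->cap) {
        stack_grow(st);  /* May realloc and move st->items. */
    }
    st->items[st->count++] = val;
    return val;
}
```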
Rename duk_is_null_or_undefined() to duk_is_nullish() to better match
current ECMAScript terminology. Keep duk_is_null_or_undefined() as a
deprecated API macro. Add internal DUK_TVAL_IS_NULLISH().