==========================
ES6 Symbols in Duktape 2.x
==========================

Overview
========

Duktape 2.x adds ES6 Symbol support. Duktape 1.x internal keys are unified
with the Symbol concept and are considered a custom "hidden symbol" type
which is not normally visible to Ecmascript code. C code can, however, access
hidden symbols.

The internal implementation is similar to existing internal keys. Symbols
are represented as ``duk_hstring`` heap objects, with the string data
containing a byte prefix which is invalid (extended) UTF-8 so that it can
never occur for normal Ecmascript strings, or even strings with non-BMP
codepoints. Object-coerced symbols have a special object class and the
underlying symbol is stored in ``_Value``, similarly to e.g. Number objects.

Representation basics:

* Symbols have an external type ``DUK_TYPE_STRING``.

* Symbols have internal type tag ``DUK_TAG_STRING``.

* Symbols can be distinguished internally from ordinary strings by looking
  up the ``DUK_HSTRING_FLAG_SYMBOL`` flag. Hidden symbols also have the
  ``DUK_HSTRING_FLAG_HIDDEN`` flag set.
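
The representation basics can be illustrated with a minimal sketch that computes the two flags once, when a string is interned, and afterwards tests only the cached flags. The struct, the function names and the flag bit values are hypothetical stand-ins, not Duktape's actual ``duk_hstring`` layout or ``DUK_HSTRING_FLAG_*`` values; the byte rule (0x80, 0x81 or 0xFF marks a symbol, 0xFF also marks it hidden) is the one listed under "Unifying with Duktape internal keys".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; Duktape's real
 * DUK_HSTRING_FLAG_SYMBOL / DUK_HSTRING_FLAG_HIDDEN values differ. */
#define SKETCH_FLAG_SYMBOL (1u << 0)
#define SKETCH_FLAG_HIDDEN (1u << 1)

/* Simplified stand-in for duk_hstring: cached flags plus the raw bytes. */
typedef struct {
    uint32_t flags;
    const unsigned char *data;
    size_t blen;
} sketch_hstring;

/* Derive the symbol flags from the initial byte (0x80, 0x81 or 0xFF). */
static uint32_t sketch_symbol_flags(const unsigned char *p, size_t blen) {
    if (blen == 0) {
        return 0;  /* the empty string is an ordinary string */
    }
    if (p[0] == 0xff) {
        return SKETCH_FLAG_SYMBOL | SKETCH_FLAG_HIDDEN;
    }
    if (p[0] == 0x80 || p[0] == 0x81) {
        return SKETCH_FLAG_SYMBOL;
    }
    return 0;
}

/* "Intern" a string: the flags are computed once, at creation time. */
static void sketch_intern(sketch_hstring *h, const unsigned char *p, size_t blen) {
    h->data = p;
    h->blen = blen;
    h->flags = sketch_symbol_flags(p, blen);
}

/* Later checks only look at the cached flags, not at the string bytes. */
static int sketch_is_symbol(const sketch_hstring *h) {
    return (h->flags & SKETCH_FLAG_SYMBOL) != 0;
}

static int sketch_is_hidden(const sketch_hstring *h) {
    return (h->flags & SKETCH_FLAG_HIDDEN) != 0;
}
```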

Behavior basics:

* Symbols are visible to Ecmascript code as required by ES6 and later.
  Hidden symbols are not visible through e.g.
  ``Object.getOwnPropertySymbols()``. They can only be accessed if a
  reference to the hidden symbol string is somehow available, e.g. via a
  C binding.

* Symbols are visible to the public C API as strings: ``duk_is_string()``
  is true, ``duk_get_string()`` returns a pointer to the symbol internal
  string representation, etc. C code can create symbols simply by pushing
  C strings with a specific format, see below.

* While symbols are strings in the C API, coercion semantics are based on
  the Ecmascript behavior. For example, ``duk_to_string()`` applied to a
  symbol throws a ``TypeError``.
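
Because symbols are plain strings at the C level, creating a hidden symbol key from C amounts to building a C string with the right initial byte. The sketch below assumes the 0xFF prefix formats listed in the "Internal key formats" table; the macro ``SKETCH_HIDDEN_SYMBOL`` and the checker function are hypothetical helpers (check ``duktape.h`` for what your Duktape version actually provides), and in real code the resulting string would be passed to a call such as ``duk_push_string()``.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: prefix a string literal with the 0xFF hidden symbol
 * byte. The prefix is a separate literal so that a key starting with a hex
 * digit (e.g. "abc") is not absorbed into the \x escape. */
#define SKETCH_HIDDEN_SYMBOL(lit) ("\xff" lit)

/* Example application key; in real code it would be pushed with e.g.
 * duk_push_string() and used as a property key. */
static const char *const app_key = SKETCH_HIDDEN_SYMBOL("myBinding");

/* Check the table's rule for application hidden symbols: 0xFF followed by
 * an ASCII byte outside [A-Z], or by a second 0xFF. Expects a
 * NUL-terminated string. */
static int sketch_is_app_hidden_key(const char *s) {
    const unsigned char *p = (const unsigned char *) s;
    if (p[0] != 0xff) {
        return 0;
    }
    if (p[1] == 0xff) {
        return 1;
    }
    return p[1] <= 0x7f && !(p[1] >= 'A' && p[1] <= 'Z');
}
```

Writing the prefix as a separate literal matters: ``"\xffabc"`` would be parsed as the single hex escape ``\xffabc`` rather than 0xFF followed by ``abc``.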

See:

* https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Symbol

* http://www.2ality.com/2014/12/es6-symbols.html

Internal key formats
====================

Duktape custom hidden Symbols have an initial 0xFF byte prefix, which matches
the existing convention for Duktape 1.x internal keys. While all bytes in the
range [0xC0,0xFE] are valid initial bytes for Duktape's extended UTF-8 flavor,
the continuation bytes [0x80,0xBF] are never a valid first byte, so they are
used for ES6 symbols (and reserved for other future uses) in Duktape 2.x.

+-----------------------------------------------+-----------------------------------------------------------------+
| Internal string format                        | Description                                                     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> SomeUpperCaseValue                       | Hidden symbol (Duktape specific) used by Duktape internals.     |
|                                               | Previously called internal properties. First byte is 0xFF,      |
|                                               | second is from [A-Z].                                           |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> anyOtherValue                            | Hidden symbol (Duktape specific) used by application code.      |
|                                               | First byte is 0xFF, second is ASCII (0x00-0x7F) but not         |
|                                               | from [A-Z].                                                     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> <ff> anyOtherValue                       | Hidden symbol (Duktape specific) used by application code.      |
|                                               | First and second bytes are 0xFF, remaining bytes arbitrary.     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <80> symbolDescription                        | Global symbol with description 'symbolDescription' created      |
|                                               | using Symbol.for().                                             |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff> uniqueSuffix      | Local symbol with description 'symbolDescription'. Trailing     |
|                                               | unique string makes the symbol unique. The unique suffix is     |
|                                               | opaque and chosen arbitrarily by Duktape. It's unique within a  |
|                                               | Duktape heap (across all global environments).                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix                        | Local symbol with an empty description. Unique suffix makes     |
|                                               | each such symbol unique. The unique suffix is arbitrary but     |
|                                               | must not contain the 0xFF byte.                                 |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix <ff>                   | Local symbol with undefined description. ES6 differentiates     |
|                                               | internally between symbols with an empty string description vs. |
|                                               | symbols with an undefined description.                          |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff>                   | Well-known symbol with description 'symbolDescription'.         |
|                                               | Well-known symbols (like Symbol.iterator) are local symbols     |
|                                               | which are still shared across "code realms". Any fixed suffix   |
|                                               | that never collides with runtime-generated unique local         |
|                                               | symbols works; currently an empty suffix is used.               |
+-----------------------------------------------+-----------------------------------------------------------------+
| <82 to bf>                                    | Initial bytes 0x82 to 0xBF are reserved for future use.         |
+-----------------------------------------------+-----------------------------------------------------------------+
| <00 to 7f>                                    | Valid ASCII initial byte.                                       |
+-----------------------------------------------+-----------------------------------------------------------------+
| <c0 to f7>                                    | Valid standard UTF-8 (or CESU-8) initial byte.                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <f8 to fe>                                    | Valid extended UTF-8 initial byte.                              |
+-----------------------------------------------+-----------------------------------------------------------------+
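
To make the table concrete, here is a sketch of a classifier that maps an internal string to one of the rows above. The enum and function names are hypothetical. The parsing relies on 0xFF never occurring inside a valid (extended) UTF-8 description, so the first 0xFF after a 0x81 prefix ends the description. Malformed input (0x81 with no separator) gets its own value rather than being guessed at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One value per row group of the table above. */
typedef enum {
    KIND_PLAIN,           /* 0x00-0x7F or 0xC0-0xFE: ordinary string */
    KIND_HIDDEN_DUKTAPE,  /* <ff> [A-Z]... */
    KIND_HIDDEN_APP,      /* <ff> other ASCII..., or <ff> <ff> ... */
    KIND_GLOBAL,          /* <80> description */
    KIND_LOCAL,           /* <81> description <ff> uniqueSuffix */
    KIND_LOCAL_EMPTY,     /* <81> <ff> uniqueSuffix */
    KIND_LOCAL_UNDEFINED, /* <81> <ff> uniqueSuffix <ff> */
    KIND_WELLKNOWN,       /* <81> description <ff> */
    KIND_RESERVED,        /* 0x82-0xBF */
    KIND_MALFORMED        /* <81> without any 0xFF separator */
} sketch_kind;

static sketch_kind sketch_classify(const unsigned char *p, size_t blen) {
    const unsigned char *sep;

    if (blen == 0) {
        return KIND_PLAIN;
    }
    if (p[0] == 0xff) {
        if (blen >= 2 && p[1] >= 'A' && p[1] <= 'Z') {
            return KIND_HIDDEN_DUKTAPE;
        }
        return KIND_HIDDEN_APP;
    }
    if (p[0] == 0x80) {
        return KIND_GLOBAL;
    }
    if (p[0] == 0x81) {
        /* The first 0xFF after the prefix terminates the description. */
        sep = memchr(p + 1, 0xff, blen - 1);
        if (sep == NULL) {
            return KIND_MALFORMED;
        }
        if (sep == p + 1) {
            /* Empty description: a trailing 0xFF marks "undefined". */
            return (blen > 2 && p[blen - 1] == 0xff) ? KIND_LOCAL_UNDEFINED
                                                     : KIND_LOCAL_EMPTY;
        }
        /* Nothing after the separator: fixed (empty) well-known suffix. */
        return (sep == p + blen - 1) ? KIND_WELLKNOWN : KIND_LOCAL;
    }
    if (p[0] >= 0x82 && p[0] <= 0xbf) {
        return KIND_RESERVED;
    }
    return KIND_PLAIN;
}
```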

Useful comparisons (``p`` is a pointer to the string data), for internal use
only:

* ``p[0] == 0xff || (p[0] & 0xc0) == 0x80``: some kind of Symbol, either a
  Duktape hidden Symbol or an ES6 Symbol.

* ``p[0] == 0xff``: hidden symbol, user or Duktape.

* ``(p[0] & 0xc0) == 0x80``: ES6 Symbol, visible to Ecmascript code.
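
The three comparisons above translate directly into C predicates. This is only a sketch with hypothetical function names, each taking a pointer to the data of a non-empty string. Note that ``(p[0] & 0xc0) == 0x80`` also accepts the reserved initial bytes 0x82 to 0xBF, so code that must tell those apart has to test 0x80 and 0x81 explicitly.

```c
#include <assert.h>

/* Any kind of symbol: Duktape hidden symbol or ES6 symbol. */
static int sketch_is_any_symbol(const unsigned char *p) {
    return p[0] == 0xff || (p[0] & 0xc0) == 0x80;
}

/* Hidden symbol, either user or Duktape internal. */
static int sketch_is_hidden_symbol(const unsigned char *p) {
    return p[0] == 0xff;
}

/* ES6 symbol visible to Ecmascript code (initial byte 0x80-0xBF). */
static int sketch_is_es6_symbol(const unsigned char *p) {
    return (p[0] & 0xc0) == 0x80;
}
```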

Global symbols
==============

Global symbols are the same across separate global environments and even across
Duktape heaps. ES6 Section 19.4.2.1:

  The GlobalSymbolRegistry is a List that is globally available.
  It is shared by all Code Realms.

and ES6 Section 8.2:

  Before it is evaluated, all ECMAScript code must be associated with a Realm.
  Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript
  global environment, all of the ECMAScript code that is loaded within the
  scope of that global environment, and other associated state and resources.

The current approach satisfies these requirements simply by giving a globally
registered Symbol a fixed format, so that if a Symbol with the same
description is created in another Duktape thread (or even another Duktape
heap), its internal representation will be identical. No explicit registry
is maintained.
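
A sketch of why no registry is needed: the internal form of ``Symbol.for(desc)`` depends only on the description, so two independently built copies (say, in two different heaps) are byte-for-byte identical. The helper name is hypothetical and buffer handling is simplified; within one heap, Duktape's string interning would then map equal bytes to the same ``duk_hstring``.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: internal form of Symbol.for(desc), i.e. the 0x80
 * byte followed by the description bytes. Returns the byte length, or 0
 * if out is too small. */
static size_t sketch_global_symbol(const char *desc, unsigned char *out, size_t outsize) {
    size_t dlen = strlen(desc);

    if (outsize < dlen + 1) {
        return 0;
    }
    out[0] = 0x80;
    memcpy(out + 1, desc, dlen);
    return dlen + 1;
}
```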

Well-known symbols
==================

Well-known symbols (such as ``Symbol.iterator``) are distinct from any local or
global symbols. ES6 Section 6.1.5.1:

  Well-known symbols are built-in Symbol values that are explicitly referenced
  by algorithms of this specification. They are typically used as the keys of
  properties whose values serve as extension points of a specification algorithm.
  Unless otherwise specified, well-known symbols values are shared by all Code
  Realms (8.2).

The fixed representation described above ensures that well-known symbols are
the same across all code realms (and even across Duktape heaps). The internal
representation is essentially the same as for a unique local symbol, but the
suffix that makes local symbols unique is missing. Thus, well-known symbols
behave like local symbols except that they have a fixed representation.
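
The difference between a well-known symbol and a unique local symbol with the same description can be sketched as follows: both use the 0x81 prefix and a 0xFF separator, but only the local one carries a generated suffix. The counter is a hypothetical stand-in for Duktape's unique suffix source (a real implementation would keep it in heap state, and the actual suffix format is opaque); decimal digits are one arbitrary choice that satisfies the table's no-0xFF rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unique-suffix source; a real implementation would keep this
 * in per-heap state. Any suffix free of 0xFF bytes satisfies the table. */
static unsigned long sketch_symbol_counter = 0;

/* Local symbol: <81> desc <ff> uniqueSuffix. Returns length, 0 on error. */
static size_t sketch_local_symbol(const char *desc, unsigned char *out, size_t outsize) {
    char suffix[24];
    size_t dlen = strlen(desc);
    int slen = snprintf(suffix, sizeof(suffix), "%lu", sketch_symbol_counter++);

    if (slen < 0 || outsize < dlen + 2 + (size_t) slen) {
        return 0;
    }
    out[0] = 0x81;
    memcpy(out + 1, desc, dlen);
    out[1 + dlen] = 0xff;
    memcpy(out + 2 + dlen, suffix, (size_t) slen);
    return dlen + 2 + (size_t) slen;
}

/* Well-known symbol: same layout, but with the fixed (empty) suffix. */
static size_t sketch_wellknown_symbol(const char *desc, unsigned char *out, size_t outsize) {
    size_t dlen = strlen(desc);

    if (outsize < dlen + 2) {
        return 0;
    }
    out[0] = 0x81;
    memcpy(out + 1, desc, dlen);
    out[1 + dlen] = 0xff;
    return dlen + 2;
}
```

Every call to ``sketch_wellknown_symbol()`` yields the same bytes, while two calls to ``sketch_local_symbol()`` with the same description yield different ones.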

Unifying with Duktape internal keys
===================================

Necessary changes to add symbol behavior:

* Strings with initial byte 0x80, 0x81, or 0xFF are flagged as symbols
  (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF, the hidden
  symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is also set.

* ``typeof(sym)`` should return "symbol" rather than "string". This is done
  for Duktape hidden symbols too.

* ``ToString(sym)`` must be rejected for a symbol, while ``String(sym)``
  must specifically check for symbols. Coercion needs to strip the possible
  "unique suffix" when coming up with the Symbol description.

* Symbols should be safe from accidental enumeration, JSON serialization, etc.
  This is already the case because internal keys are excluded from these
  in Duktape 1.x.

* ``Object.getOwnPropertySymbols()`` should return a list of symbol properties
  for an object, but filter out Duktape hidden symbols.

* ``Object(sym)`` should create an object with internal class "Symbol",
  with the plain symbol value stored behind ``_Value`` (hidden symbol
  property) as for Number objects, etc.

* Non-strict comparison needs to handle symbols. ToPrimitive() coercion
  may be enough to ensure ``sym == Object(sym)`` is accepted.

* Property code needs to accept plain Symbols as is (treated like any other
  strings), and Symbol objects should look up their internal string value
  (instead of being coerced to e.g. ``Symbol(symbolDescription)``). Current
  code just uses ``ToString()``, which causes a TypeError.

* Dozens of similar semantics checks are needed throughout the code base.
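
As an example of the ``String(sym)`` item, the sketch below formats a symbol's descriptive string, ``Symbol(`` + description + ``)``, from its internal bytes, stripping the unique suffix. The function name and error convention are hypothetical. Following ES6, an undefined description is treated like an empty one, so both produce ``Symbol()``; rejecting hidden symbols here is an assumption of the sketch, not documented Duktape behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write "Symbol(<description>)" for an ES6 symbol,
 * given its internal bytes. Returns 0 on success, -1 if p is not an ES6
 * symbol (0x80/0x81 prefix), is malformed, or out is too small. */
static int sketch_symbol_descriptive_string(const unsigned char *p, size_t blen,
                                            char *out, size_t outsize) {
    const unsigned char *sep;
    size_t dlen;

    if (blen == 0) {
        return -1;
    }
    if (p[0] == 0x80) {
        dlen = blen - 1;  /* global symbol: the rest is the description */
    } else if (p[0] == 0x81) {
        /* Local or well-known: the description ends at the first 0xFF;
         * whatever follows (unique suffix, undefined marker) is stripped. */
        sep = memchr(p + 1, 0xff, blen - 1);
        if (sep == NULL) {
            return -1;
        }
        dlen = (size_t) (sep - (p + 1));
    } else {
        return -1;  /* plain string, hidden symbol, or reserved prefix */
    }
    if (outsize < dlen + 9) {  /* "Symbol(" + ")" + NUL = 9 bytes */
        return -1;
    }
    memcpy(out, "Symbol(", 7);
    memcpy(out + 7, p + 1, dlen);
    out[7 + dlen] = ')';
    out[8 + dlen] = '\0';
    return 0;
}
```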

Some design questions
=====================

How should C code see Symbols?
------------------------------

Easiest approach:

* Symbols are not enumerated by ``duk_enum()`` unless requested. Either fold
  symbols in with internal keys or add a separate flag; maybe rename the
  existing internal keys flag.

* Property operations work with symbols and internal keys without distinction.

* An API call to create a symbol from C code, hiding the construction of the
  internal string.

Best naming for Duktape internal keys
-------------------------------------

With https://github.com/svaarala/duktape/pull/979, Duktape internal properties
would become unreachable from Ecmascript code, even if you construct the
internal string using a buffer and then try to use it as an object key.
This offers more protection for sandboxing than ES6 Symbols, which can be
enumerated.

The current name for Duktape 1.x internal keys is "hidden symbols". Some
alternatives considered:

* Internal symbol: easy to confuse with specification symbols, for example.
  One benefit would be that the term is close to "internal property".

* Hidden symbol: conveys the semantics (assuming GH-797) pretty well.

* Private symbol

* Native symbol

* Invisible symbol