diff --git a/doc/symbols.rst b/doc/symbols.rst new file mode 100644 index 00000000..d3c9551f --- /dev/null +++ b/doc/symbols.rst @@ -0,0 +1,230 @@ +========================== +ES6 Symbols in Duktape 2.x +========================== + +Overview +======== + +Duktape 2.x adds ES6 Symbol support. Duktape 1.x internal keys are unified +with the Symbol concept, and are considered a custom "hidden symbol" type +which is not normally visible to Ecmascript code. C code can access hidden +symbols, however. + +The internal implementation is similar to existing internal keys. Symbols +are represented as ``duk_hstring`` heap objects, with the string data +containing a byte prefix which is invalid (extended) UTF-8 so that it can +never occur for normal Ecmascript strings, or even strings with non-BMP +codepoints. Object coerced strings have a special object class and the +underlying symbol is stored in ``_Value`` similarly to e.g. Number object. + +Representation basics: + +* Symbols have an external type ``DUK_TYPE_STRING``. + +* Symbols have internal type tag ``DUK_TAG_STRING``. + +* Symbols can be distinguished internally from ordinary strings by looking + up the ``DUK_HSTRING_FLAG_SYMBOL`` flag. Hidden symbols also have the + ``DUK_HSTRING_FLAG_HIDDEN`` set. + +Behavior basics: + +* Symbols are visible to Ecmascript code as required by ES6 and later. + Hidden symbols are not visible through e.g. + ``Object.getOwnPropertySymbols()``. They can only be accessed if a + reference to the hidden symbol string is somehow available, e.g. via a + C binding. + +* Symbols are visible to the public C API as strings: ``duk_is_string()`` + is true, ``duk_get_string()`` returns a pointer to the symbol internal + string representation, etc. C code can create symbols simply by pushing + C strings with a specific format, see below. + +* While symbols are strings in the C API, coercion semantics are based on + the Ecmascript behavior. For example, ``duk_to_string()`` applied to a + symbol throws a ``TypeError``. + +See: + +* https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Symbol + +* http://www.2ality.com/2014/12/es6-symbols.html + +Internal key formats +==================== + +Duktape custom hidden Symbols have an initial 0xFF byte prefix, which matches +the existing convention for Duktape 1.x internal keys. While all bytes in the +range [0xC0,0xFE] are valid initial bytes for Duktape's extended UTF-8 flavor, +the continuation bytes [0x80,0xBF] are never a valid first byte so they are used +for ES6 symbols (and reserved for other future uses) in Duktape 2.x. + ++-----------------------------------------------+-----------------------------------------------------------------+ +| Internal string format | Description | ++-----------------------------------------------+-----------------------------------------------------------------+ +| SomeUpperCaseValue | Hidden symbol (Duktape specific) used by Duktape internals. | +| | Previously called internal properties. First byte is 0xFF, | +| | second is from [A-Z]. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| anyOtherValue | Hidden symbol (Duktape specific) used by application code. | +| | First byte is 0xFF, second is ASCII (0x00-0x7f) but not | +| | from [A-Z]. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| anyOtherValue | Hidden symbol (Duktape specific) used by application code. | +| | First and second bytes are 0xFF, remaining bytes arbitrary. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <80> symbolDescription | Global symbol with description 'symbolDescription' created | +| | using Symbol.for(). | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <81> symbolDescription uniqueSuffix | Local symbol with description 'symbolDescription'. Trailing | +| | unique string makes the symbol unique. The unique suffix is | +| | opaque and chosen arbitrarily by Duktape. It's unique within a | +| | Duktape heap (across all global environments). | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <81> uniqueSuffix | Local symbol with an empty description. Unique suffix makes | +| | each such symbol unique. The unique suffix is arbitrary but | +| | must not contain the 0xFF byte. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <81> uniqueSuffix | Local symbol with undefined description. ES6 differentiates | +| | internally between symbols with an empty string description vs. | +| | symbols with an undefined description. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <81> symbolDescription | Well known symbol with description 'symbolDescription'. Well | +| | known symbols (like Symbol.iterator) are local symbols which | +| | are still shared across "code realms". Any fixed suffix never | +| | colliding with runtime generated unique local symbols works, | +| | currently an empty suffix is used. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <82 to bf> | Initial bytes 0x82 to 0xBF are reserved for future use. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| <00 to 7f> | Valid ASCII initial byte. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| | Valid standard UTF-8 (or CESU-8) initial byte. | ++-----------------------------------------------+-----------------------------------------------------------------+ +| | Valid extended UTF-8 initial byte. | ++-----------------------------------------------+-----------------------------------------------------------------+ + +Useful comparisons (``p`` is pointer to string data) for internal use only: + +* ``p[0] == 0xff || (p[0] & 0xc0) == 0x80``: some kind of Symbol, either Duktape + hidden Symbol or an ES6 Symbol. + +* ``p[0] == 0xff``: hidden symbol, user or Duktape + +* ``(p[0] & 0xc0) == 0x80``: ES6 Symbol, visible to Ecmascript code + +Global symbols +============== + +Global symbols are the same across separate global environments and even across +Duktape heaps. ES6 Section 19.4.2.1: + + The GlobalSymbolRegistry is a List that is globally available. + It is shared by all Code Realms. + +and ES6 Section 8.2: + + Before it is evaluated, all ECMAScript code must be associated with a Realm. + Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript + global environment, all of the ECMAScript code that is loaded within the + scope of that global environment, and other associated state and resources. + +The current approach satisfies these simply by making a globally registered +Symbol have a fixed format so that if a Symbol with the same description is +created in another Duktape thread (or even Duktape heap), its internal +representation will be identical. No explicit registry is maintained. + +Well-known symbols +================== + +Well-known symbols (such as ``Symbol.iterator``) are distinct from any local or +global symbols. ES6 Section 6.1.5.1: + + Well-known symbols are built-in Symbol values that are explicitly referenced + by algorithms of this specification. They are typically used as the keys of + properties whose values serve as extension points of a specification algorithm. + Unless otherwise specified, well-known symbols values are shared by all Code + Realms (8.2). + +The fixed representation described above ensures that well-known symbols are +the same across all code realms (and even across Duktape heaps). The internal +representation is essentially the same as for a unique local symbol, but the +suffix that makes local symbols unique is missing. Thus, they behave like +local symbols other than having a fixed representation. + +Unifying with Duktape internal keys +=================================== + +Necessary changes to add symbol behavior: + +* Strings with initial byte 0x80, 0x81, or 0xFF are flagged as symbols + (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF, also the + hidden symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is set. + +* ``typeof(sym)`` should return "symbol" rather than string. This is done + for Duktape hidden symbols too. + +* ``ToString(sym)`` must be rejected for a symbol, while ``String(sym)`` + must specifically check for symbols. Coercion needs to strip possible + "unique suffix" when coming up with the Symbol description. + +* Symbols should be safe from accidental enumeration, JSON serialization, etc. + This is actually already the case because internal keys are already excluded + in Duktape 1.x. + +* ``Object.getOwnPropertySymbols(``) should return a list of symbol properties + for an object, but filter out Duktape hidden symbols. + +* ``Object(sym)`` should create an object with internal class "Symbol", + with the plain symbol value stored behind ``_Value`` (hidden symbol + property) as for Number objects, etc. + +* Non-strict comparison needs to handle symbols. ToPrimitive() coercion + is maybe enough to ensure ``sym == Object(sym)`` is accepted. + +* Property code needs to accept plain Symbols as is (treated like any other + strings), and Symbol objects should look up their internal string value + (instead of being coerced to e.g. ``Symbol(symbolDescription)``. Current + code just uses ``ToString()`` which causes a TypeError. + +* Dozens of similar semantics checks throughout the code base. + +Some design questions +===================== + +How should C code see Symbols? +------------------------------ + +Easiest approach: + +* Symbols are not enumerated by duk_enum() unless requested. Either fold in with + internal keys, add a separate flags. Maybe rename existing internal keys + flag. + +* Property operations work with symbols and internal keys without distinction. + +* API call to create a symbol from C code. Hides the construction of the internal + string. + +Best naming for Duktape internal keys +------------------------------------- + +With https://github.com/svaarala/duktape/pull/979 Duktape internal properties +would become unreachable from Ecmascript code, even if you construct the +internal string using a buffer and then try to use it as an object key. +This offers more protection for sandboxing than ES6 Symbols which can be +enumerated. + +Current naming for Duktape 1.x internal keys is "hidden symbols". Some +alternatives considered: + +* Internal symbol: easy to confuse with specification symbols for example. + One benefit would be that as a term close to "internal property". + +* Hidden symbol: conveys semantics (assuming GH-797) pretty well. + +* Private symbol + +* Native symbol + +* Invisible symbol