=============================
ES2015 Symbols in Duktape 2.x
=============================

Overview
========

Duktape 2.x adds ES2015 Symbol support. Duktape 1.x internal keys are unified
with the Symbol concept and are treated as a custom "hidden symbol" type
which is not normally visible to ECMAScript code. C code can access hidden
symbols, however.

The internal implementation is similar to that of the existing internal keys.
Symbols are represented as ``duk_hstring`` heap objects whose string data
begins with a byte prefix which is invalid (extended) UTF-8, so that it can
never occur in normal ECMAScript strings, or even in strings with non-BMP
codepoints. Object-coerced symbols have a special object class, and the
underlying symbol is stored in ``_Value``, as for e.g. a Number object.

Representation basics:

* Symbols have an external type ``DUK_TYPE_STRING``.

* Symbols have internal type tag ``DUK_TAG_STRING``.

* Symbols can be distinguished internally from ordinary strings by checking
  the ``DUK_HSTRING_FLAG_SYMBOL`` flag. Hidden symbols also have the
  ``DUK_HSTRING_FLAG_HIDDEN`` flag set.

Behavior basics:

* Symbols are visible to ECMAScript code as required by ES2015 and later.
  Hidden symbols are not visible through e.g.
  ``Object.getOwnPropertySymbols()``. They can only be accessed if a
  reference to the hidden symbol string is somehow available, e.g. via a
  C binding.

* Symbols are visible to the public C API as strings: ``duk_is_string()``
  is true, ``duk_get_string()`` returns a pointer to the symbol internal
  string representation, etc. C code can create symbols simply by pushing
  C strings with a specific format, see below.

* While symbols are strings in the C API, coercion semantics are based on
  the ECMAScript behavior. For example, ``duk_to_string()`` applied to a
  symbol throws a ``TypeError``.

See:

* https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Symbol

* http://www.2ality.com/2014/12/es6-symbols.html

Internal key formats
====================

Initial bytes in the ranges [0x00,0x7F] and [0xC0,0xFE] are valid for Duktape's
extended UTF-8 flavor. The byte 0xFF and the range [0x80,0xBF] are free to be
used as symbol markers.

+-----------------------------------------------+-----------------------------------------------------------------+
| Internal string format                        | Description                                                     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> anyValue                                 | Hidden symbol (Duktape specific) used by application code.      |
|                                               | Prior to Duktape 2.2 Duktape internal hidden symbols also used  |
|                                               | the 0xFF prefix followed by a capital letter (A-Z). Starting    |
|                                               | from Duktape 2.2 all 0xFF prefixed strings are reserved for     |
|                                               | application code.                                               |
+-----------------------------------------------+-----------------------------------------------------------------+
| <80> symbolDescription                        | Global symbol with description 'symbolDescription' created      |
|                                               | using Symbol.for().                                             |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff> uniqueSuffix      | Local symbol with description 'symbolDescription'. Trailing     |
|                                               | unique string makes the symbol unique. The unique suffix is     |
|                                               | opaque and chosen arbitrarily by Duktape. It's unique within a  |
|                                               | Duktape heap (across all global environments).                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix                        | Local symbol with an empty description. Unique suffix makes     |
|                                               | each such symbol unique. The unique suffix is arbitrary but     |
|                                               | must not contain the 0xFF byte.                                 |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix <ff>                   | Local symbol with undefined description. ES2015 differentiates  |
|                                               | internally between symbols with an empty string description vs. |
|                                               | symbols with an undefined description.                          |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff>                   | Well-known symbol with description 'symbolDescription'.         |
|                                               | Well-known symbols (like Symbol.iterator) are local symbols     |
|                                               | which are still shared across "code realms". Any fixed suffix   |
|                                               | that never collides with runtime-generated unique local symbols |
|                                               | works; currently an empty suffix is used.                       |
+-----------------------------------------------+-----------------------------------------------------------------+
| <82> anyValue                                 | Hidden symbol (Duktape specific) used by Duktape internals.     |
|                                               | User code should never use this byte prefix or rely on any      |
|                                               | Duktape internal hidden symbols.                                |
+-----------------------------------------------+-----------------------------------------------------------------+
| <83 to be>                                    | Reserved for future use; behavior is undefined (Duktape 2.1     |
|                                               | interprets these as symbols, Duktape 2.2 does not; don't rely   |
|                                               | on either behavior).                                            |
+-----------------------------------------------+-----------------------------------------------------------------+
| <bf>                                          | Initial byte marker for bytecode dump format since Duktape 2.2. |
+-----------------------------------------------+-----------------------------------------------------------------+
| <00 to 7f>                                    | Valid ASCII initial byte.                                       |
+-----------------------------------------------+-----------------------------------------------------------------+
| <c0 to f7>                                    | Valid standard UTF-8 (or CESU-8) initial byte.                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <f8 to fe>                                    | Valid extended UTF-8 initial byte.                              |
+-----------------------------------------------+-----------------------------------------------------------------+

There are public API macros (``DUK_HIDDEN_SYMBOL()`` etc.) to create symbol
literals from C code.
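
As a reading aid, the table can be treated as a decision procedure on the
initial byte. The sketch below models it in ECMAScript on plain arrays of byte
values. It is derived only from the table above and is not Duktape's C
implementation: the names ``classifyInternalString`` and ``asciiText`` and the
returned labels are hypothetical, and only ASCII descriptions are decoded.

```javascript
// Hypothetical model of the format table; not Duktape source code.

// Decode bytes[start..end) as ASCII text (enough for this illustration).
function asciiText(bytes, start, end) {
    var s = '';
    for (var i = start; i < end; i++) {
        s += String.fromCharCode(bytes[i]);
    }
    return s;
}

// 'bytes' is an array of byte values (0..255) holding an internal string.
function classifyInternalString(bytes) {
    if (bytes.length === 0) {
        return { kind: 'string' };  // empty string is an ordinary string
    }
    var b = bytes[0];
    if (b === 0xff) { return { kind: 'hidden (application)' }; }
    if (b === 0x82) { return { kind: 'hidden (Duktape internal)' }; }
    if (b === 0x80) {
        return { kind: 'global', description: asciiText(bytes, 1, bytes.length) };
    }
    if (b === 0x81) {
        // The description ends at the first 0xFF after the prefix byte.
        var sep = bytes.indexOf(0xff, 1);
        if (sep < 0) {
            return { kind: 'malformed' };  // label is an assumption; not covered by the table
        }
        var description = asciiText(bytes, 1, sep);
        if (sep === bytes.length - 1) {
            // Empty suffix: fixed representation of a well-known symbol.
            return { kind: 'well-known', description: description };
        }
        if (bytes[bytes.length - 1] === 0xff) {
            // Trailing 0xFF after the unique suffix: undefined description.
            return { kind: 'local', description: undefined };
        }
        return { kind: 'local', description: description };
    }
    if (b === 0xbf) { return { kind: 'bytecode dump marker' }; }
    if (b >= 0x83 && b <= 0xbe) { return { kind: 'reserved' }; }
    return { kind: 'string' };  // valid (extended) UTF-8 initial byte
}
```

For example, ``classifyInternalString([0x81, 0x66, 0x6f, 0x6f, 0xff, 0x31])``
reports a local symbol with description ``foo``: the prefix byte and everything
from the first 0xFF onwards are stripped.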

Global symbols
==============

Global symbols are the same across separate global environments and even across
Duktape heaps. ES2015 Section 19.4.2.1:

    The GlobalSymbolRegistry is a List that is globally available.
    It is shared by all Code Realms.

and ES2015 Section 8.2:

    Before it is evaluated, all ECMAScript code must be associated with a Realm.
    Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript
    global environment, all of the ECMAScript code that is loaded within the
    scope of that global environment, and other associated state and resources.

The current approach satisfies these requirements simply by giving a globally
registered Symbol a fixed format, so that if a Symbol with the same description
is created in another Duktape thread (or even another Duktape heap), its
internal representation will be identical. No explicit registry is maintained.
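
A minimal sketch of why no registry is needed: if the internal representation
of a global symbol is just the 0x80 marker followed by the description, two
independent constructions from the same description are byte-for-byte
identical. The helpers ``globalSymbolBytes`` and ``sameBytes`` are hypothetical
and handle ASCII descriptions only; the last lines show the corresponding
standard ``Symbol.for()`` behavior as seen from ECMAScript code.

```javascript
// Hypothetical helper: internal bytes of a global symbol following the
// '<80> symbolDescription' format (ASCII descriptions only in this sketch).
function globalSymbolBytes(description) {
    var bytes = [0x80];
    for (var i = 0; i < description.length; i++) {
        bytes.push(description.charCodeAt(i));
    }
    return bytes;
}

// Byte-wise comparison of two arrays.
function sameBytes(a, b) {
    if (a.length !== b.length) { return false; }
    for (var i = 0; i < a.length; i++) {
        if (a[i] !== b[i]) { return false; }
    }
    return true;
}

// Two independent constructions (think: two threads or two heaps) agree
// without consulting any shared table.
var fromHeapA = globalSymbolBytes('app.key');
var fromHeapB = globalSymbolBytes('app.key');
var identical = sameBytes(fromHeapA, fromHeapB);

// ECMAScript-visible counterpart (standard ES2015 behavior).
var sameSymbol = Symbol.for('app.key') === Symbol.for('app.key');
var registryKey = Symbol.keyFor(Symbol.for('app.key'));
var localDiffers = Symbol('app.key') !== Symbol.for('app.key');
```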

Well-known symbols
==================

Well-known symbols (such as ``Symbol.iterator``) are distinct from any local or
global symbols. ES2015 Section 6.1.5.1:

    Well-known symbols are built-in Symbol values that are explicitly referenced
    by algorithms of this specification. They are typically used as the keys of
    properties whose values serve as extension points of a specification algorithm.
    Unless otherwise specified, well-known symbols values are shared by all Code
    Realms (8.2).

The fixed representation described above ensures that well-known symbols are
the same across all code realms (and even across Duktape heaps). The internal
representation is essentially the same as for a unique local symbol, but the
suffix that makes local symbols unique is empty. Thus, apart from having a
fixed representation, they behave like local symbols.
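
The distinctness from local and global symbols can be observed from ECMAScript
code. The following sketch relies only on standard ES2015 behavior; it assumes
the engine exposes ``Symbol.iterator``, which is not verified here for any
particular Duktape build.

```javascript
// A well-known symbol is neither a registry (global) symbol nor equal to
// any freshly created local symbol, even with a matching description.
var wellKnown = Symbol.iterator;

var notInRegistry = Symbol.keyFor(wellKnown) === undefined;
var notGlobal = Symbol.for('Symbol.iterator') !== wellKnown;
var notLocal = Symbol('Symbol.iterator') !== wellKnown;

// The description is still available through explicit String() coercion.
var text = String(wellKnown);
```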

Unifying with Duktape internal keys
===================================

Necessary changes to add symbol behavior:

* Strings with initial byte 0x80, 0x81, 0x82 or 0xFF are flagged as symbols
  (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF or 0x82, the
  hidden symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is also set.

* ``typeof(sym)`` should return "symbol" rather than "string". This is done
  for Duktape hidden symbols too.

* ``ToString(sym)`` must be rejected for a symbol, while ``String(sym)``
  must specifically check for symbols. Coercion needs to strip a possible
  "unique suffix" when coming up with the Symbol description.

* Symbols should be safe from accidental enumeration, JSON serialization, etc.
  This is already the case because internal keys are already excluded
  in Duktape 1.x.

* ``Object.getOwnPropertySymbols()`` should return a list of symbol properties
  for an object, but filter out Duktape hidden symbols.

* ``Object(sym)`` should create an object with internal class "Symbol",
  with the plain symbol value stored behind ``_Value`` (hidden symbol
  property) as for Number objects, etc.

* Non-strict comparison needs to handle symbols. ToPrimitive() coercion
  may be enough to ensure ``sym == Object(sym)`` is accepted.

* Property code needs to accept plain Symbols as is (treated like any other
  strings), and Symbol objects should look up their internal string value
  (instead of being coerced to e.g. ``Symbol(symbolDescription)``). Current
  code just uses ``ToString()`` which causes a TypeError.

* Dozens of similar semantics checks throughout the code base.
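
The ECMAScript-visible side of the list above can be summarized as a set of
checks. This is a sketch of the standard ES2015 behavior that the changes aim
for, not a Duktape test case; the helper ``implicitCoercionThrows`` and the
``results`` object are made up for illustration.

```javascript
var sym = Symbol('demo');
var obj = {};
obj[sym] = 123;     // symbol-keyed property
obj.plain = 'x';    // ordinary string-keyed property

// Implicit ToString() on a symbol must throw a TypeError.
function implicitCoercionThrows(value) {
    try {
        return ('' + value) === null;  // not reached for a symbol
    } catch (e) {
        return e instanceof TypeError;
    }
}

var results = {
    typeofPlain: typeof sym,                        // "symbol", not "string"
    explicitString: String(sym),                    // explicit coercion is allowed
    implicitThrows: implicitCoercionThrows(sym),
    keys: Object.keys(obj),                         // symbols are not enumerated
    json: JSON.stringify(obj),                      // nor serialized
    ownSymbols: Object.getOwnPropertySymbols(obj),
    typeofWrapped: typeof Object(sym),
    wrappedClass: Object.prototype.toString.call(Object(sym)),
    looseEqual: sym == Object(sym),                 // ToPrimitive() unwraps the object
    strictEqual: sym === Object(sym),
    keyedByWrapper: obj[Object(sym)]                // Symbol object used as a property key
};
```

Note that ``obj[Object(sym)]`` reaches the same property as ``obj[sym]``: the
Symbol object is unwrapped to its plain symbol value rather than coerced to a
string.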

Some design questions
=====================

How should C code see Symbols?
------------------------------

Easiest approach:

* Symbols are not enumerated by ``duk_enum()`` unless requested. Either fold
  this in with the internal keys flag or add a separate flag. Maybe rename
  the existing internal keys flag.

* Property operations work with symbols and internal keys without distinction.

* Provide an API call to create a symbol from C code. This hides the
  construction of the internal string.

Best naming for Duktape internal keys
-------------------------------------

With https://github.com/svaarala/duktape/pull/979 Duktape internal properties
would become unreachable from ECMAScript code, even if you construct the
internal string using a buffer and then try to use it as an object key.
This offers more protection for sandboxing than ES2015 Symbols, which can be
enumerated.
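
To illustrate the sandboxing point, the sketch below shows that a value stored
under an ordinary ES2015 symbol can be recovered by any code that holds the
object, without a reference to the symbol itself. It uses only standard
ES2015 calls; ``readAllSymbolValues`` is a hypothetical helper name.

```javascript
// A "secret" stored under an ordinary ES2015 symbol...
var secretKey = Symbol('secret');
var record = { name: 'visible' };
record[secretKey] = 'not actually private';

// ...is reachable through enumeration of the object's own symbol keys.
function readAllSymbolValues(target) {
    return Object.getOwnPropertySymbols(target).map(function (key) {
        return target[key];
    });
}

var leaked = readAllSymbolValues(record);
```

Hidden symbols, by contrast, are filtered out of
``Object.getOwnPropertySymbols()``, so this kind of discovery does not apply to
them.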

Current naming for Duktape 1.x internal keys is "hidden symbols". Some
alternatives considered:

* Internal symbol: easy to confuse with e.g. specification symbols. One
  benefit is that the term is close to "internal property".

* Hidden symbol: conveys semantics (assuming GH-797) pretty well.

* Private symbol

* Native symbol

* Invisible symbol