mirror of https://github.com/svaarala/duktape.git
Sami Vaarala
8 years ago
committed by
GitHub
1 changed files with 230 additions and 0 deletions
@ -0,0 +1,230 @@ |
|||||
==========================
ES6 Symbols in Duktape 2.x
==========================

Overview
========

Duktape 2.x adds ES6 Symbol support. Duktape 1.x internal keys are unified
with the Symbol concept, and are considered a custom "hidden symbol" type
which is not normally visible to Ecmascript code. C code can access hidden
symbols, however.

The internal implementation is similar to existing internal keys. Symbols
are represented as ``duk_hstring`` heap objects, with the string data
containing a byte prefix which is invalid (extended) UTF-8 so that it can
never occur for normal Ecmascript strings, or even strings with non-BMP
codepoints. Object coerced symbols have a special object class and the
underlying symbol is stored in ``_Value``, as for e.g. a Number object.

Representation basics:

* Symbols have an external type ``DUK_TYPE_STRING``.

* Symbols have internal type tag ``DUK_TAG_STRING``.

* Symbols can be distinguished internally from ordinary strings by checking
  the ``DUK_HSTRING_FLAG_SYMBOL`` flag. Hidden symbols also have the
  ``DUK_HSTRING_FLAG_HIDDEN`` flag set.

Behavior basics:

* Symbols are visible to Ecmascript code as required by ES6 and later.
  Hidden symbols are not visible through e.g.
  ``Object.getOwnPropertySymbols()``. They can only be accessed if a
  reference to the hidden symbol string is somehow available, e.g. via a
  C binding.

* Symbols are visible to the public C API as strings: ``duk_is_string()``
  is true, ``duk_get_string()`` returns a pointer to the symbol's internal
  string representation, etc. C code can create symbols simply by pushing
  C strings with a specific format, see below.

* While symbols are strings in the C API, coercion semantics are based on
  the Ecmascript behavior. For example, ``duk_to_string()`` applied to a
  symbol throws a ``TypeError``.

See:

* https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Symbol

* http://www.2ality.com/2014/12/es6-symbols.html

Internal key formats
====================

Duktape custom hidden Symbols have an initial 0xFF byte prefix, which matches
the existing convention for Duktape 1.x internal keys. While all bytes in the
range [0xC0,0xFE] are valid initial bytes for Duktape's extended UTF-8 flavor,
the continuation bytes [0x80,0xBF] are never a valid first byte so they are used
for ES6 symbols (and reserved for other future uses) in Duktape 2.x.

+-----------------------------------------------+-----------------------------------------------------------------+
| Internal string format                        | Description                                                     |
+===============================================+=================================================================+
| <ff> SomeUpperCaseValue                       | Hidden symbol (Duktape specific) used by Duktape internals.     |
|                                               | Previously called internal properties. First byte is 0xFF,      |
|                                               | second is from [A-Z].                                           |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> anyOtherValue                            | Hidden symbol (Duktape specific) used by application code.      |
|                                               | First byte is 0xFF, second is ASCII (0x00-0x7f) but not         |
|                                               | from [A-Z].                                                     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <ff> <ff> anyOtherValue                       | Hidden symbol (Duktape specific) used by application code.      |
|                                               | First and second bytes are 0xFF, remaining bytes arbitrary.     |
+-----------------------------------------------+-----------------------------------------------------------------+
| <80> symbolDescription                        | Global symbol with description 'symbolDescription' created      |
|                                               | using Symbol.for().                                             |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff> uniqueSuffix      | Local symbol with description 'symbolDescription'. Trailing     |
|                                               | unique string makes the symbol unique. The unique suffix is     |
|                                               | opaque and chosen arbitrarily by Duktape. It's unique within a  |
|                                               | Duktape heap (across all global environments).                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix                        | Local symbol with an empty description. Unique suffix makes     |
|                                               | each such symbol unique. The unique suffix is arbitrary but     |
|                                               | must not contain the 0xFF byte.                                 |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> <ff> uniqueSuffix <ff>                   | Local symbol with undefined description. ES6 differentiates     |
|                                               | internally between symbols with an empty string description vs. |
|                                               | symbols with an undefined description.                          |
+-----------------------------------------------+-----------------------------------------------------------------+
| <81> symbolDescription <ff>                   | Well known symbol with description 'symbolDescription'. Well    |
|                                               | known symbols (like Symbol.iterator) are local symbols which    |
|                                               | are still shared across "code realms". Any fixed suffix never   |
|                                               | colliding with runtime generated unique local symbols works,    |
|                                               | currently an empty suffix is used.                              |
+-----------------------------------------------+-----------------------------------------------------------------+
| <82 to bf>                                    | Initial bytes 0x82 to 0xBF are reserved for future use.         |
+-----------------------------------------------+-----------------------------------------------------------------+
| <00 to 7f>                                    | Valid ASCII initial byte.                                       |
+-----------------------------------------------+-----------------------------------------------------------------+
| <c0 to f7>                                    | Valid standard UTF-8 (or CESU-8) initial byte.                  |
+-----------------------------------------------+-----------------------------------------------------------------+
| <f8 to fe>                                    | Valid extended UTF-8 initial byte.                              |
+-----------------------------------------------+-----------------------------------------------------------------+

Useful comparisons (``p`` is a pointer to the string data) for internal use only:

* ``p[0] == 0xff || (p[0] & 0xc0) == 0x80``: some kind of Symbol, either a Duktape
  hidden Symbol or an ES6 Symbol.

* ``p[0] == 0xff``: hidden symbol, either application or Duktape internal.

* ``(p[0] & 0xc0) == 0x80``: ES6 Symbol, visible to Ecmascript code.

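The initial-byte comparisons can be wrapped in small predicates. The following
is a minimal sketch in plain C, not Duktape's actual code: the function names
are hypothetical and only the byte tests themselves come from this document.
Note that the mask test also matches the reserved initial bytes 0x82 to 0xBF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers; 'p' points to the string data, 'len' is its byte length. */

/* Hidden symbol (application or Duktape internal): initial byte is 0xFF. */
static int sym_is_hidden(const unsigned char *p, size_t len) {
    assert(p != NULL);
    return len > 0 && p[0] == 0xffU;
}

/* ES6 symbol: initial byte is a UTF-8 continuation byte, [0x80,0xBF]. */
static int sym_is_es6(const unsigned char *p, size_t len) {
    assert(p != NULL);
    return len > 0 && (p[0] & 0xc0U) == 0x80U;
}

/* Any kind of symbol, hidden or ES6. */
static int sym_is_symbol(const unsigned char *p, size_t len) {
    return sym_is_hidden(p, len) || sym_is_es6(p, len);
}
```

The length check keeps the empty string from being misclassified, since it has
no initial byte to inspect.
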
Global symbols
==============

Global symbols are the same across separate global environments and even across
Duktape heaps. ES6 Section 19.4.2.1:

    The GlobalSymbolRegistry is a List that is globally available.
    It is shared by all Code Realms.

and ES6 Section 8.2:

    Before it is evaluated, all ECMAScript code must be associated with a Realm.
    Conceptually, a realm consists of a set of intrinsic objects, an ECMAScript
    global environment, all of the ECMAScript code that is loaded within the
    scope of that global environment, and other associated state and resources.

The current approach satisfies these simply by making a globally registered
Symbol have a fixed format so that if a Symbol with the same description is
created in another Duktape thread (or even Duktape heap), its internal
representation will be identical. No explicit registry is maintained.

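To illustrate why no registry is needed, the sketch below builds the
``<80> symbolDescription`` byte sequence for a global symbol. It is plain C
written for this document under stated assumptions: ``make_global_symbol()`` is
a hypothetical helper, not a Duktape API call, and the description is assumed
to be a NUL-terminated C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write <80> followed by the description into 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small (a valid
 * result is always at least 1 byte, so 0 is unambiguous).
 */
static size_t make_global_symbol(const char *desc, unsigned char *out, size_t out_size) {
    size_t desc_len;

    assert(desc != NULL && out != NULL);
    desc_len = strlen(desc);
    if (out_size < desc_len + 1) {
        return 0;
    }
    out[0] = 0x80U;                   /* global symbol prefix byte */
    memcpy(out + 1, desc, desc_len);  /* description follows as is */
    return desc_len + 1;
}
```

Because the output depends only on the description, two independent callers
(for example, two separate heaps) end up with byte-identical strings.
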
Well-known symbols
==================

Well-known symbols (such as ``Symbol.iterator``) are distinct from any local or
global symbols. ES6 Section 6.1.5.1:

    Well-known symbols are built-in Symbol values that are explicitly referenced
    by algorithms of this specification. They are typically used as the keys of
    properties whose values serve as extension points of a specification algorithm.
    Unless otherwise specified, well-known symbols values are shared by all Code
    Realms (8.2).

The fixed representation described above ensures that well-known symbols are
the same across all code realms (and even across Duktape heaps). The internal
representation is essentially the same as for a unique local symbol, but the
suffix that makes local symbols unique is missing. Thus, they behave like
local symbols apart from having a fixed representation.

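The relationship between local and well-known symbols can be sketched as
follows. This is an illustrative plain C helper, not Duktape code:
``make_local_symbol()`` and its string-valued suffix argument are assumptions
made for the example, since the real unique suffix is opaque and chosen by
Duktape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write <81> description <ff> suffix into 'out'.
 * An empty suffix yields the fixed well-known symbol form.
 * Returns the number of bytes written, or 0 if 'out' is too small.
 */
static size_t make_local_symbol(const char *desc, const char *suffix,
                                unsigned char *out, size_t out_size) {
    size_t desc_len, suffix_len, total;

    assert(desc != NULL && suffix != NULL && out != NULL);
    desc_len = strlen(desc);
    suffix_len = strlen(suffix);
    total = 1 + desc_len + 1 + suffix_len;
    if (out_size < total) {
        return 0;
    }
    out[0] = 0x81U;                                 /* local symbol prefix */
    memcpy(out + 1, desc, desc_len);
    out[1 + desc_len] = 0xffU;                      /* separator byte */
    memcpy(out + 2 + desc_len, suffix, suffix_len); /* assumed unique part */
    return total;
}
```

Two calls with the same description but different suffixes give different
strings, while a call with an empty suffix always gives the same bytes, which
matches the well-known symbol behavior described in this section.
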
Unifying with Duktape internal keys
===================================

Necessary changes to add symbol behavior:

* Strings with initial byte 0x80, 0x81, or 0xFF are flagged as symbols
  (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF, the
  hidden symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is also set.

* ``typeof(sym)`` should return "symbol" rather than "string". This is done
  for Duktape hidden symbols too.

* ``ToString(sym)`` must be rejected for a symbol, while ``String(sym)``
  must specifically check for symbols. Coercion needs to strip a possible
  "unique suffix" when coming up with the Symbol description.

* Symbols should be safe from accidental enumeration, JSON serialization, etc.
  This is actually already the case because internal keys are already excluded
  in Duktape 1.x.

* ``Object.getOwnPropertySymbols()`` should return a list of symbol properties
  for an object, but filter out Duktape hidden symbols.

* ``Object(sym)`` should create an object with internal class "Symbol",
  with the plain symbol value stored behind ``_Value`` (hidden symbol
  property) as for Number objects, etc.

* Non-strict comparison needs to handle symbols. ``ToPrimitive()`` coercion
  may be enough to ensure that ``sym == Object(sym)`` compares equal.

* Property code needs to accept plain Symbols as is (treated like any other
  strings), and Symbol objects should look up their internal string value
  (instead of being coerced to e.g. ``Symbol(symbolDescription)``). Current
  code just uses ``ToString()`` which causes a TypeError.

* Dozens of similar semantics checks throughout the code base.

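The description extraction mentioned in the list (stripping a possible unique
suffix) might look roughly like the sketch below. It is derived only from the
format table in this document and is not Duktape's implementation; the function
name and return convention are hypothetical, and hidden symbols are simply
reported as having no ES6 description.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns:
 *    1: description found, *desc and *desc_len are set (may be empty)
 *    0: symbol has an undefined description
 *   -1: not an ES6 symbol in one of the documented formats
 */
static int symbol_description(const unsigned char *p, size_t len,
                              const unsigned char **desc, size_t *desc_len) {
    const unsigned char *sep;

    assert(p != NULL && desc != NULL && desc_len != NULL);
    if (len == 0) {
        return -1;
    }
    if (p[0] == 0x80U) {                 /* global: <80> description */
        *desc = p + 1;
        *desc_len = len - 1;
        return 1;
    }
    if (p[0] != 0x81U) {
        return -1;                       /* ordinary string, hidden, or reserved */
    }
    sep = (const unsigned char *) memchr(p + 1, 0xff, len - 1);
    if (sep == NULL) {
        return -1;                       /* malformed: separator byte missing */
    }
    if (sep == p + 1 && len > 2 && p[len - 1] == 0xffU) {
        return 0;                        /* <81> <ff> uniqueSuffix <ff> */
    }
    *desc = p + 1;                       /* bytes between <81> and first <ff> */
    *desc_len = (size_t) (sep - (p + 1));
    return 1;
}
```

The sketch relies on the description never containing a 0xFF byte, which holds
because 0xFF is not a valid byte in Duktape's extended UTF-8.
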
Some design questions
=====================

How should C code see Symbols?
------------------------------

Easiest approach:

* Symbols are not enumerated by ``duk_enum()`` unless requested. Either fold
  them in with internal keys or add a separate flag. Maybe rename the existing
  internal keys flag.

* Property operations work with symbols and internal keys without distinction.

* API call to create a symbol from C code. Hides the construction of the internal
  string.

Best naming for Duktape internal keys
-------------------------------------

With https://github.com/svaarala/duktape/pull/979 Duktape internal properties
would become unreachable from Ecmascript code, even if you construct the
internal string using a buffer and then try to use it as an object key.
This offers more protection for sandboxing than ES6 Symbols which can be
enumerated.

Current naming for Duktape 1.x internal keys is "hidden symbols". Some
alternatives considered:

* Internal symbol: easy to confuse with specification symbols for example.
  One benefit would be that the term is close to "internal property".

* Hidden symbol: conveys semantics (assuming GH-797) pretty well.

* Private symbol

* Native symbol

* Invisible symbol