From ed3ef811011093740768ace2c0d21647bf8bfabd Mon Sep 17 00:00:00 2001
From: Sami Vaarala
Date: Fri, 8 Sep 2017 00:50:52 +0300
Subject: [PATCH] Internal doc update for 0x82 symbol prefix

---
 doc/debugger.rst       |  6 ++++--
 doc/hobject-design.rst |  4 ++--
 doc/json.rst           |  4 ++--
 doc/low-memory.rst     |  2 +-
 doc/symbols.rst        | 46 +++++++++++++++++-------------------------
 5 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/doc/debugger.rst b/doc/debugger.rst
index 3fe63553..a20f8620 100644
--- a/doc/debugger.rst
+++ b/doc/debugger.rst
@@ -2077,8 +2077,10 @@ The flags field is an unsigned integer bitmask with the following bits:
 +---------+-----------------------------------------------------------------+
 | 0x10    | Property is virtual, matches DUK_PROPDESC_FLAG_VIRTUAL.         |
 +---------+-----------------------------------------------------------------+
-| 0x100   | Property is internal, and not visible to ordinary Ecmascript    |
-|         | code. Currently set when initial key byte is 0xFF.              |
+| 0x100   | Property key is a Symbol.                                       |
++---------+-----------------------------------------------------------------+
+| 0x200   | Property is a hidden Symbol which is not visible to ordinary    |
+|         | Ecmascript code.                                                |
 +---------+-----------------------------------------------------------------+
 
 For artificial properties (returned by GetHeapObjInfo) the property attributes
diff --git a/doc/hobject-design.rst b/doc/hobject-design.rst
index b7fad049..d6fb0f18 100644
--- a/doc/hobject-design.rst
+++ b/doc/hobject-design.rst
@@ -1512,8 +1512,8 @@ properties is simple: since all standard keys encode into valid UTF-8
 sequences (valid CESU-8 sequences to be exact) in memory, internal properties
 are prefixed with an invalid UTF-8 sequence which standard Ecmascript code
 cannot generate and thus cannot access. The current prefix is a single
-``0xff`` byte. The prefix is denoted with an underscore in this document;
-e.g. ``_Map`` would be represented as the byte sequence: ``0xff`` ``'M'``
+``0x82`` byte. The prefix is denoted with an underscore in this document;
+e.g. ``_Map`` would be represented as the byte sequence: ``0x82`` ``'M'``
 ``'a'`` ``'p'`` in memory. User C code can also use internal properties for
 its own purposes, as long as the property names don't conflict with Duktape's
 internal properties.
diff --git a/doc/json.rst b/doc/json.rst
index ad4a02b7..d17bdb26 100644
--- a/doc/json.rst
+++ b/doc/json.rst
@@ -237,8 +237,8 @@ solution is:
   into the output value.
 
 * The current UTF-8/CESU-8 decoding is not strict, so this is mainly
-  triggered for invalid initial bytes (0xFF) or when a codepoint has been
-  truncated (end of buffer).
+  triggered for invalid initial bytes (e.g. 0xFF) or when a codepoint has
+  been truncated (end of buffer).
 
 This is by no means an optimal solution and produces quite interesting
 results at times.
diff --git a/doc/low-memory.rst b/doc/low-memory.rst
index e21c6154..84518ec5 100644
--- a/doc/low-memory.rst
+++ b/doc/low-memory.rst
@@ -937,7 +937,7 @@ Ecmascript function footprint
   allocation size is not double that of final bytecode, as that is awkward
   for pool allocators.
 
-* Improve property format, e.g. ``_formals`` is now a regular array which
+* Improve property format, e.g. ``_Formals`` is now a regular array which
   is quite wasteful; it could be converted to a ``\xFF`` separated string
   for instance.
 
diff --git a/doc/symbols.rst b/doc/symbols.rst
index 3050b394..b93bb550 100644
--- a/doc/symbols.rst
+++ b/doc/symbols.rst
@@ -53,25 +53,18 @@ See:
 Internal key formats
 ====================
 
-Duktape custom hidden Symbols have an initial 0xFF byte prefix, which matches
-the existing convention for Duktape 1.x internal keys. While all bytes in the
-range [0xC0,0xFE] are valid initial bytes for Duktape's extended UTF-8 flavor,
-the continuation bytes [0x80,0xBF] are never a valid first byte so they are used
-for ES2015 symbols (and reserved for other future uses) in Duktape 2.x.
+Initial bytes in the ranges [0x00,0x7F] and [0xC0,0xFE] are valid for Duktape's
+extended UTF-8 flavor. The byte 0xFF and the range [0x80,0xBF] are free to be
+used as symbol markers.
 
 +-----------------------------------------------+-----------------------------------------------------------------+
 | Internal string format                        | Description                                                     |
 +-----------------------------------------------+-----------------------------------------------------------------+
-| <ff> SomeUpperCaseValue                       | Hidden symbol (Duktape specific) used by Duktape internals.     |
-|                                               | Previously called internal properties. First byte is 0xFF,      |
-|                                               | second is from [A-Z].                                           |
-+-----------------------------------------------+-----------------------------------------------------------------+
-| <ff> anyOtherValue                            | Hidden symbol (Duktape specific) used by application code.      |
-|                                               | First byte is 0xFF, second is ASCII (0x00-0x7f) but not         |
-|                                               | from [A-Z].                                                     |
-+-----------------------------------------------+-----------------------------------------------------------------+
-| <ff> <ff> anyOtherValue                       | Hidden symbol (Duktape specific) used by application code.      |
-|                                               | First and second bytes are 0xFF, remaining bytes arbitrary.     |
+| <ff> anyValue                                 | Hidden symbol (Duktape specific) used by application code.      |
+|                                               | Prior to Duktape 2.2 Duktape internal hidden symbols also used  |
+|                                               | the 0xFF prefix followed by a capital letter (A-Z). Starting    |
+|                                               | from Duktape 2.2 all 0xFF prefixed strings are reserved for     |
+|                                               | application code.                                               |
 +-----------------------------------------------+-----------------------------------------------------------------+
 | <80> symbolDescription                        | Global symbol with description 'symbolDescription' created      |
 |                                               | using Symbol.for().                                             |
@@ -95,7 +88,13 @@ for ES2015 symbols (and reserved for other future uses) in Duktape 2.x.
 |                                               | colliding with runtime generated unique local symbols works,    |
 |                                               | currently an empty suffix is used.                              |
 +-----------------------------------------------+-----------------------------------------------------------------+
-| <82 to bf>                                    | Initial bytes 0x82 to 0xBF are reserved for future use.         |
+| <82> anyValue                                 | Hidden symbol (Duktape specific) used by Duktape internals.     |
+|                                               | User code should never use this byte prefix or rely on any      |
+|                                               | Duktape internal hidden Symbols.                                |
++-----------------------------------------------+-----------------------------------------------------------------+
+| <83 to bf>                                    | Reserved for future use, behavior is undefined (Duktape 2.1     |
+|                                               | interprets as Symbols, Duktape 2.2 does not, don't rely on      |
+|                                               | either behavior).                                               |
 +-----------------------------------------------+-----------------------------------------------------------------+
 | <00 to 7f>                                    | Valid ASCII initial byte.                                       |
 +-----------------------------------------------+-----------------------------------------------------------------+
@@ -104,15 +103,6 @@ for ES2015 symbols (and reserved for other future uses) in Duktape 2.x.
 |                                               | Valid extended UTF-8 initial byte.                              |
 +-----------------------------------------------+-----------------------------------------------------------------+
 
-Useful comparisons (``p`` is pointer to string data) for internal use only:
-
-* ``p[0] == 0xff || (p[0] & 0xc0) == 0x80``: some kind of Symbol, either Duktape
-  hidden Symbol or an ES2015 Symbol.
-
-* ``p[0] == 0xff``: hidden symbol, user or Duktape
-
-* ``(p[0] & 0xc0) == 0x80``: ES2015 Symbol, visible to Ecmascript code
-
 There are public API macros (DUK_HIDDEN_SYMBOL() etc) to create symbol
 literals from C code.
 
@@ -160,9 +150,9 @@ Unifying with Duktape internal keys
 
 Necessary changes to add symbol behavior:
 
-* Strings with initial byte 0x80, 0x81, or 0xFF are flagged as symbols
-  (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF, also the
-  hidden symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is set.
+* Strings with initial byte 0x80, 0x81, 0x82 or 0xFF are flagged as symbols
+  (``DUK_HSTRING_FLAG_SYMBOL``). If the initial byte is 0xFF or 0x82, also
+  the hidden symbol flag (``DUK_HSTRING_FLAG_HIDDEN``) is set.
 
 * ``typeof(sym)`` should return "symbol" rather than string. This is done
   for Duktape hidden symbols too.