Merge pull request #1202 from svaarala/symbol-documentation-updates

Documentation updates for initial symbol support
8 years ago · f117807f29
20 changed files with 193 additions and 182 deletions
--- a/doc/sandboxing.rst
+++ b/doc/sandboxing.rst
@@ -25,6 +25,11 @@ carefully written with these sandboxing goals in mind.

 This document describes best practices for Duktape sandboxing.

+There's a YAML config file with some useful default options for sandboxing,
+and comments on what options you might consider:
+
+* ``config/examples/security_sensitive.yaml``
+
 .. note:: This document describes the current status of sandboxing features
          which is not yet a complete solution.

@@ -88,12 +93,12 @@ Verbose error messages may cause sandboxing security issues:
 * When ``DUK_USE_PARANOID_ERRORS`` is not set, offending object/key is
  summarized in an error message of some rejected property operations.
  If object keys contain potentially sensitive information, you should
-  enable this option.
+  enable this option.  Enable ``DUK_USE_PARANOID_ERRORS``.

 * When stack traces are enabled an attacker may gain useful information from
  the stack traces.  Further, access to the internal ``_Tracedata`` property
  provides access to call chain functions even when references to them are not
-  available directly.
+  available directly.  Disable ``DUK_USE_TRACEBACKS``.

 Replace the global object
 -------------------------
@@ -124,30 +129,32 @@ Risky bindings:
  finalizers are a sandboxing risk.  It's also possible to override or unset a
  finalizer which the sandbox relies on.

-* Since Duktape 2.x buffer bindings no longer provide a way create "internal"
-  strings which allow access to internal properties.  See separate section on
-  internal properties.
+* Since Duktape 2.x buffer bindings no longer provide a way to create hidden
+  Symbols (called "internal strings" in Duktape 1.x) which allow access to
+  internal properties.  See separate section on internal properties.

 You should also:

-* Remove the ``require`` module loading function in the global object.
-  If you need module loading in the sandbox, it's better to write a specific,
+* Remove the ``require`` module loading function in the global object
+  (since Duktape 2.x it's no longer present by default).  If you need
+  module loading in the sandbox, it's better to write a specific,
  constrained module loader for that environment.

 Restrict access to internal properties
 --------------------------------------

-Internal properties are intended to be used by Duktape and user C code
-to store "hidden properties" in objects.  The mechanism currently relies on
-using strings whose internal representation contains invalid UTF-8/CESU-8 data,
-in concrete terms, a 0xFF prefix.  These are called "internal strings".  Since
+Internal properties are used by Duktape and user C code to store "hidden
+properties" in objects.  The mechanism currently relies on "hidden Symbols"
+(called "internal keys" or "internal strings" in Duktape 1.x).  These are
+strings whose internal representation contains invalid UTF-8/CESU-8 data
+(see ``doc/symbols.rst`` for a description of the current formats).  Because
 all standard Ecmascript strings are represented as CESU-8, such strings cannot
 normally be created by Ecmascript code.  The properties are also never
-enumerated or otherwise exposed to Ecmascript code, so that the only way to
-access them from Ecmascript code is to have access to an "internal string"
-acting as the property key.
+enumerated or otherwise exposed to Ecmascript code (not even by
+``Object.getOwnPropertySymbols()``) so that the only way to access them from
+Ecmascript code is to have access to a hidden Symbol acting as the property key.

-C code can create internal keys very easily, which can provide a way to access
+C code can create hidden Symbols very easily, which can provide a way to access
 internal properties.  For example::

    // Assume an application native binding returns an internal key pushed
@@ -165,18 +172,18 @@ be modified, concrete security issues may arise.  For instance, if an internal
 property stores a raw pointer to a native handle (such as a ``FILE *``),
 changing its value can lead to a potentially exploitable segfault.

-Since Duktape 2.x Ecmascript code cannot create internal keys using standard
-Ecmascript code and the built-in bindings alone.  To prevent access to internal
-keys, ensure that no native bindings provided by the sandboxing environment
+Since Duktape 2.x Ecmascript code cannot create hidden Symbols using standard
+Ecmascript code and the built-in bindings alone.  To prevent access to hidden
+Symbols, ensure that no native bindings provided by the sandboxing environment
 accidentally return such strings.  The easiest way to ensure this is to make
 sure all strings pushed on the value stack are properly CESU-8 encoded.

 It's also good practice to ensure that sandboxed code has minimal access to
-objects with potentially dangerous keys like raw pointers.
+objects with potentially dangerous properties like raw pointers.

-.. note:: There's a future work issue, potentially included in Duktape 2.x,
+.. note:: There's a future work issue, potentially included in Duktape 3.x,
          for preventing access to internal properties from Ecmascript code
-          even when using the correct internal key.
+          even when using the correct hidden Symbol as a lookup key.

 Restrict access to function instances
 -------------------------------------
@@ -234,9 +241,9 @@ string methods with a plain base value::

    print("foo".toUpperCase());

-Duktape 1.0 will use the original built-in prototype functions in these
-inheritance situations.  There is currently no way to replace these built-ins
-so that the replacements would be used for instead (see
+Duktape uses the original built-in prototype functions in these inheritance
+situations.  There is currently no way to replace these built-ins so that the
+replacements would be used instead (see
 ``test-dev-sandbox-prototype-limitation.js``).

 As a result, sandboxed code will always have access to the built-in prototype
@@ -261,7 +268,7 @@ objects which participate in implicit inheritance:
  through explicit construction (if constructors visible) or implicitly
  through internal errors, e.g. ``/foo\123/`` which throws a SyntaxError

-* ``ArrayBuffer.prototype``: through buffer values (if available); since
+* ``Uint8Array.prototype``: through buffer values (if available); since
  there is no buffer literal, user cannot construct buffer values directly

 * ``Duktape.Pointer.prototype`` through pointer values (if available); since
@@ -367,7 +374,7 @@ vulnerabilities.  To avoid such issues:
  must match; patch version may vary as bytecode format doesn't change in
  patch versions.

-* Ensure integrity of bytecode being loaded e.g. by checksumming.
+* Ensure integrity of bytecode being loaded e.g. by checksumming or signing.

 * If bytecode is transported over the network or other unsafe media,
  use cryptographic means (keyed hashing, signatures, or similar) to
--- a/website/api/duk_del_prop.yaml
+++ b/website/api/duk_del_prop.yaml
@@ -33,9 +33,10 @@ summary: |
  <ul>
  <li>The target value is automatically coerced to an object.  However, this
      object is a temporary one, so deleting its properties is not very useful.</li>
-  <li>The <code>key</code> argument is internally coerced to a string.  There is
-      an internal fast path for arrays and numeric indices which avoids an
-      explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
+  <li>The <code>key</code> argument is internally coerced using ToPropertyKey()
+      coercion which results in a string or a Symbol.  There is an internal
+      fast path for arrays and numeric indices which avoids an explicit string
+      coercion, so use a numeric <code>key</code> when applicable.</li>
  </ul>

  <p>If the target is a Proxy object which implements the <code>deleteProperty</code>
--- a/website/api/duk_enum.yaml
+++ b/website/api/duk_enum.yaml
@@ -19,9 +19,20 @@ summary: |
      properties are enumerated</td>
  </tr>
  <tr>
-  <td>DUK_ENUM_INCLUDE_INTERNAL</td>
-  <td>Enumerate also internal properties, by default internal properties
-      are not enumerated</td>
+  <td>DUK_ENUM_INCLUDE_HIDDEN</td>
+  <td>Also enumerate hidden Symbols; by default hidden Symbols are not
+      enumerated.  Use together with <code>DUK_ENUM_INCLUDE_SYMBOLS</code>.
+      In Duktape 1.x this flag was called <code>DUK_ENUM_INCLUDE_INTERNAL</code>.</td>
+  </tr>
+  <tr>
+  <td>DUK_ENUM_INCLUDE_SYMBOLS</td>
+  <td>Include Symbols in the enumeration result.  Hidden Symbols are not
+      included unless <code>DUK_ENUM_INCLUDE_HIDDEN</code> is specified.</td>
+  </tr>
+  <tr>
+  <td>DUK_ENUM_EXCLUDE_STRINGS</td>
+  <td>Exclude strings from the enumeration result.  By default strings are
+      included.</td>
  </tr>
  <tr>
  <td>DUK_ENUM_OWN_PROPERTIES_ONLY</td>
@@ -39,6 +50,10 @@ summary: |
      enumeration result rather than per inheritance level, this has the
      effect of sorting array indices (even when inherited)</td>
  </tr>
+  <tr>
+  <td>DUK_ENUM_NO_PROXY_BEHAVIOR</td>
+  <td>Enumerate a Proxy object itself without invoking Proxy behaviors.</td>
+  </tr>
  </table>

  <p>Without any flags the enumeration behaves like <code>for-in</code>:
--- a/website/api/duk_get_prop.yaml
+++ b/website/api/duk_get_prop.yaml
@@ -34,9 +34,10 @@ summary: |
  <li>The target value is automatically coerced to an object.  For instance,
      a string is converted to a <code>String</code> and you can access its
      <code>"length"</code> property.</li>
-  <li>The <code>key</code> argument is internally coerced to a string.  There is
-      an internal fast path for arrays and numeric indices which avoids an
-      explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
+  <li>The <code>key</code> argument is internally coerced using ToPropertyKey()
+      coercion which results in a string or a Symbol.  There is an internal
+      fast path for arrays and numeric indices which avoids an explicit string
+      coercion, so use a numeric <code>key</code> when applicable.</li>
  </ul>

  <p>If the target is a Proxy object which implements the <code>get</code> trap,
--- a/website/api/duk_get_string.yaml
+++ b/website/api/duk_get_string.yaml
@@ -21,6 +21,8 @@ summary: |
  this differs from how buffer data pointers are handled (for technical reasons).
  </div>

+  <div include="symbols-are-strings.html" />
+
 example: |
  const char *buf;

--- a/website/api/duk_get_type.yaml
+++ b/website/api/duk_get_type.yaml
@@ -11,6 +11,8 @@ summary: |
  <code>DUK_TYPE_xxx</code> or <code>DUK_TYPE_NONE</code> if <code>idx</code>
  is invalid.</p>

+  <div include="symbols-are-strings.html" />
+
 example: |
  if (duk_get_type(ctx, -3) == DUK_TYPE_NUMBER) {
      printf("value is a number\n");
--- a/website/api/duk_get_type_mask.yaml
+++ b/website/api/duk_get_type_mask.yaml
@@ -15,6 +15,8 @@ summary: |
  (the <code><a href="#duk_check_type_mask">duk_check_type_mask()</a></code> call is
  even more convenient for this purpose).</p>

+  <div include="symbols-are-strings.html" />
+
 example: |
  if (duk_get_type_mask(ctx, -3) & (DUK_TYPE_MASK_STRING |
                                    DUK_TYPE_MASK_NUMBER)) {
--- a/website/api/duk_has_prop.yaml
+++ b/website/api/duk_has_prop.yaml
@@ -27,9 +27,10 @@ summary: |
  <li>The target value is automatically coerced to an object.  For instance,
      a string is converted to a <code>String</code> and you can check for its
      <code>"length"</code> property.</li>
-  <li>The <code>key</code> argument is internally coerced to a string.  There is
-      an internal fast path for arrays and numeric indices which avoids an
-      explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
+  <li>The <code>key</code> argument is internally coerced using ToPropertyKey()
+      coercion which results in a string or a Symbol.  There is an internal
+      fast path for arrays and numeric indices which avoids an explicit string
+      coercion, so use a numeric <code>key</code> when applicable.</li>
  </ul>

  <p>If the target is a Proxy object which implements the <code>has</code> trap,
--- a/website/api/duk_is_string.yaml
+++ b/website/api/duk_is_string.yaml
@@ -10,6 +10,8 @@ summary: |
  <p>Returns 1 if value at <code>idx</code> is a string, otherwise
  returns 0.  If <code>idx</code> is invalid, also returns 0.</p>

+  <div include="symbols-are-strings.html" />
+
 example: |
  if (duk_is_string(ctx, -3)) {
      /* ... */
--- a/website/api/duk_push_string.yaml
+++ b/website/api/duk_push_string.yaml
@@ -17,7 +17,14 @@ summary: |
  to the stack and <code>NULL</code> is returned.  This behavior differs from
  <code><a href="#duk_push_lstring">duk_push_lstring</a></code> on purpose.</p>

-  <p>C code should normally only push valid CESU-8 strings to the stack.</p>
+  <div class="note">
+  C code should normally only push valid CESU-8 strings to the stack.
+  Some invalid CESU-8/UTF-8 byte sequences are reserved for special
+  uses such as representing Symbol values.  When you push such an invalid
+  byte sequence, the value on the value stack will behave like a string for
+  C code but will appear as a <code>Symbol</code> for Ecmascript code.
+  See <a href="guide.html#symbols">Symbols</a> for more discussion.
+  </div>

  <p>If input string might contain internal NUL characters, use
  <code><a href="#duk_push_lstring">duk_push_lstring()</a></code> instead.</p>
--- a/website/api/duk_put_prop.yaml
+++ b/website/api/duk_put_prop.yaml
@@ -35,9 +35,10 @@ summary: |
      transitory objects (see
      <a href="http://www.ecma-international.org/ecma-262/5.1/#sec-8.7.2">PutValue (V, W)</a>,
      step 7 of the special [[Put]] variant).</li>
-  <li>The <code>key</code> argument is internally coerced to a string.  There is
-      an internal fast path for arrays and numeric indices which avoids an
-      explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
+  <li>The <code>key</code> argument is internally coerced using ToPropertyKey()
+      coercion which results in a string or a Symbol.  There is an internal
+      fast path for arrays and numeric indices which avoids an explicit string
+      coercion, so use a numeric <code>key</code> when applicable.</li>
  </ul>

  <p>If the target is a Proxy object which implements the <code>set</code> trap,
--- a/website/api/duk_require_string.yaml
+++ b/website/api/duk_require_string.yaml
@@ -11,6 +11,8 @@ summary: |
  but throws an error if the value at <code>idx</code> is not a string
  or if the index is invalid.</p>

+  <div include="symbols-are-strings.html" />
+
 example: |
  const char *buf;

--- a/website/api/duk_to_string.yaml
+++ b/website/api/duk_to_string.yaml
@@ -14,6 +14,10 @@ summary: |

  <div include="ref-custom-type-coercion.html" />

+  <div class="note">
+  ToString() coercion for a Symbol value causes a TypeError.
+  </div>
+
  <div class="note">
  In Duktape 2.x plain buffers mimic ArrayBuffer objects and will usually
  ToString() coerce to "[object ArrayBuffer]".  To convert buffer or buffer
--- a/website/api/symbols-are-strings.html
+++ b/website/api/symbols-are-strings.html
@@ -0,0 +1,7 @@
+<div class="note">
+Symbol values are visible in the C API as strings, e.g. <code>duk_is_string()</code>
+is true (this behavior is similar to Duktape 1.x internal strings).  Symbols are
+still an experimental feature.  For now, you can distinguish Symbols from ordinary
+strings by looking at their initial byte, see
+<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>.
+</div>
--- a/website/buildsite.py
+++ b/website/buildsite.py
@@ -955,7 +955,7 @@ def generateGuide():
 	navlinks.append(['#finalization', 'Finalization'])
 	navlinks.append(['#coroutines', 'Coroutines'])
 	navlinks.append(['#virtualproperties', 'Virtual properties'])
-	navlinks.append(['#internalproperties', 'Internal properties'])
+	navlinks.append(['#symbols', 'Symbols'])
 	navlinks.append(['#bytecodedumpload', 'Bytecode dump/load'])
 	navlinks.append(['#threading', 'Threading'])
 	navlinks.append(['#sandboxing', 'Sandboxing'])
@@ -1006,7 +1006,7 @@ def generateGuide():
 	res += processRawDoc('guide/finalization.html')
 	res += processRawDoc('guide/coroutines.html')
 	res += processRawDoc('guide/virtualproperties.html')
-	res += processRawDoc('guide/internalproperties.html')
+	res += processRawDoc('guide/symbols.html')
 	res += processRawDoc('guide/bytecodedumpload.html')
 	res += processRawDoc('guide/threading.html')
 	res += processRawDoc('guide/sandboxing.html')
--- a/website/guide/custombehavior.html
+++ b/website/guide/custombehavior.html
@@ -9,13 +9,13 @@ other relevant specifications.</p>
 access to Duktape specific features.  Also the buffer, pointer, and lightfunc
 types are custom.</p>

-<h2>Internal properties</h2>
+<h2>Hidden Symbols</h2>

-<p>Objects may have <a href="#internalproperties">internal properties</a> which
-are essentially hidden from normal code: they won't be enumerated or returned
-even by e.g. <code>Object.getOwnPropertyNames()</code>.  Ordinary Ecmascript
-code cannot refer to such properties because the property keys intentionally
-use invalid UTF-8 (<code>0xFF</code> prefix byte).</p>
+<p>Objects may have properties with <a href="#symbols">hidden Symbol</a> keys.
+These are similar to ES2015 Symbols but won't be enumerated or returned even
+from <code>Object.getOwnPropertySymbols()</code>.  Ordinary Ecmascript code cannot
+refer to such properties because the keys intentionally use an invalid (extended)
+UTF-8 representation.</p>

 <h2>"use duk notail" directive</h2>

--- a/website/guide/internalproperties.html
+++ b/website/guide/internalproperties.html
@@ -1,110 +0,0 @@
-<h1 id="internalproperties">Internal properties</h1>
-
-<p>Duktape supports non-standard <b>internal properties</b> which are
-essentially hidden from user code.  They can only be accessed by a
-direct property read/write, and are never enumerated, serialized by
-<code>JSON.stringify()</code> or returned from built-in functions such
-as <code>Object.getOwnPropertyNames()</code>.</p>
-
-<p>Duktape uses internal properties for various implementation specific
-purposes, such as storing an object's finalizer reference, the internal
-value held by <code>Number</code> and <code>Date</code>, etc.  User code
-can also use internal properties for its own purposes, e.g. to
-store "hidden state" in objects, as long as the property names never
-conflict with current or future Duktape internal keys (this is ensured
-by the naming convention described below).  User code should never try
-to access Duktape's internal properties: the set of internal properties
-used can change arbitrarily between versions.</p>
-
-<p>Internal properties are distinguished from other properties by the
-property key: if the byte representation of a property key begins with
-a <code>0xFF</code> byte Duktape automatically treats the property as an
-internal property.  Such a string is referred to as an <b>internal string</b>.
-The initial byte makes the key invalid UTF-8 (even invalid extended UTF-8),
-which ensures that (1) internal properties never conflict with normal Unicode
-property names and that (2) ordinary Ecmascript code cannot accidentally access
-them.  The initial prefix byte is often represented by an underscore in
-documentation for readability, e.g. <code>_Value</code> is used instead
-of <code>\xFFValue</code>.</p>
-
-<p>The following naming convention is used.  The convention ensures that
-Duktape and user internal properties never conflict:</p>
-<table>
-<tr>
-<th>Type</th>
-<th>Example (C)</th>
-<th>Bytes</th>
-<th>Description</th>
-</tr>
-<tr>
-<td>Duktape</td>
-<td><code>"\xFF" "Value"</code></td>
-<td><code>ff 56 61 6c 75 65</code></td>
-<td>First character is always uppercase, followed by <code>[a-z0-9_]*</code>.</td>
-</tr>
-<tr>
-<td>User</td>
-<td><code>"\xFF" "myprop"</code></td>
-<td><code>ff 6d 79 70 72 6f 70</code></td>
-<td>First character must not be uppercase to avoid conflict with
-current or future Duktape keys.</td>
-</tr>
-<tr>
-<td>User</td>
-<td><code>"\xFF\xFF" &lt;arbitrary&gt;</code></td>
-<td><code>ff ff &lt;arbitrary&gt;</code></td>
-<td>Double <code>0xFF</code> prefix followed by arbitrary data.</td>
-</tr>
-</table>
-
-<p>In some cases the internal key needed by user code is not static, e.g.
-it can be dynamically generated by serializing a pointer or perhaps the
-bytes are from an external source.  In this case it is safest to use
-two <code>0xFF</code> prefix bytes as the example above shows.</p>
-
-<div class="note">
-Note that the <code>0xFF</code> prefix cannot be expressed as a valid
-Ecmascript string.  For example, the internal string <code>\xFFxyz</code>
-would appear as the bytes <code>ff 78 79 7a</code> in memory, while the
-Ecmascript string <code>"\u00ffxyz"</code> would be represented as the
-CESU-8 bytes <code>c3 bf 78 79 7a</code> in memory.
-</div>
-
-<p>Creating an internal string is easy from C code:</p>
-<pre class="c-code">
-/* Create an internal string, which can then be used to read/write internal
- * properties, and can be passed on to Ecmascript code like any other string.
- * Terminating a string literal after a hex escape is safest to avoid some
- * ambiguous cases like "\xffab".
- */
-duk_push_string(ctx, "\xff" "myprop");
-</pre>
-
-<p>For more discussion on C string hex escaping, see
-<a href="https://github.com/svaarala/duktape/blob/master/misc/c_hex_esc.c">c_hex_esc.c</a>.</p>
-
-<p>Internal strings cannot be created from Ecmascript code using the default
-built-ins alone.  However, application code can easily add such a binding
-using the C API which must be considered in sandboxing.</p>
-
-<p>There's no special access control for internal properties: if user code has
-access to the property name (string), it can read/write the property value.
-The default Ecmascript built-ins don't provide a way of creating an internal
-string: buffer-to-string coercions always involve an encoding such as UTF-8
-which will reject or replace invalid byte sequences.  However, C code can
-easily create internal strings.  When sandboxing, ensure that custom C bindings
-don't accidentally provide a mechanism to create internal strings by e.g.
-converting a buffer as-is to a string.</p>
-
-<p>As a concrete example the internal value of a <code>Date</code> instance
-can be accessed as follows:</p>
-<pre class="ecmascript-code">
-// Print the internal timestamp of a Date instance.  Assumes a hypothetical
-// rawBufferToString() custom C binding which takes an input buffer and pushes
-// the bytes as-is as a string using duk_push_lstring(), thus creating an
-// internal string.
-
-var key = rawBufferToString(Duktape.dec('hex', 'ff56616c7565'));  // \xFFValue
-var dt = new Date(123456);
-print('internal value is:', dt[key]);  // prints 123456
-</pre>
--- a/website/guide/intro.html
+++ b/website/guide/intro.html
@@ -177,7 +177,7 @@ wrappers are discussed in detail.</p>
 <a href="#finalization">Finalization</a>,
 <a href="#coroutines">Coroutines</a>,
 <a href="#virtualproperties">Virtual properties</a>,
-<a href="#internalproperties">Internal properties</a>,
+<a href="#symbols">Symbols</a>,
 <a href="#bytecodedumpload">Bytecode dump/load</a>,
 <a href="#threading">Threading</a>,
 <a href="#sandboxing">Sandboxing</a>.
--- a/website/guide/stacktypes.html
+++ b/website/guide/stacktypes.html
@@ -10,7 +10,7 @@
 <tr><td><a href="#type-null">null</a></td><td>DUK_TYPE_NULL</td><td>DUK_TYPE_MASK_NULL</td><td><code>null</code></td></tr>
 <tr><td><a href="#type-boolean">boolean</a></td><td>DUK_TYPE_BOOLEAN</td><td>DUK_TYPE_MASK_BOOLEAN</td><td><code>true</code> and <code>false</code></td></tr>
 <tr><td><a href="#type-number">number</a></td><td>DUK_TYPE_NUMBER</td><td>DUK_TYPE_MASK_NUMBER</td><td>IEEE double</td></tr>
-<tr><td><a href="#type-string">string</a></td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable (plain) string</td></tr>
+<tr><td><a href="#type-string">string</a></td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable (plain) string or (plain) Symbol</td></tr>
 <tr><td><a href="#type-object">object</a></td><td>DUK_TYPE_OBJECT</td><td>DUK_TYPE_MASK_OBJECT</td><td>object with properties</td></tr>
 <tr><td><a href="#type-buffer">buffer</a></td><td>DUK_TYPE_BUFFER</td><td>DUK_TYPE_MASK_BUFFER</td><td>mutable (plain) byte buffer, fixed/dynamic/external; mimics an ArrayBuffer</td></tr>
 <tr><td><a href="#type-pointer">pointer</a></td><td>DUK_TYPE_POINTER</td><td>DUK_TYPE_MASK_POINTER</td><td>opaque pointer (void *)</td></tr>
@@ -172,17 +172,18 @@ come out.  Don't rely on NaNs preserving their exact form.</p>

 <h2 id="type-string">String</h2>

-<p>The <b>string</b> type is an arbitrary byte sequence of a certain length which
-may contain internal NUL (0x00) values.  Strings are always automatically NUL
-terminated for C coding convenience.  The NUL terminator is not counted as part
-of the string length.  For instance, the string <code>"foo"</code> has byte length 3
-and is stored in memory as <code>{ 'f', 'o', 'o', '\0' }</code>.  Because of the
-guaranteed NUL termination, strings can always be pointed to using a simple
-<code>const char *</code> as long as internal NULs are not an issue; if they are,
-the explicit byte length of the string can be queried with the API.  Calling code
-can refer directly to the string data held by Duktape.  Such string data
-pointers are valid (and stable) for as long as a string is reachable in the
-Duktape heap.</p>
+<p>The <b>string</b> stack type is used to represent both plain strings and
+plain Symbols (introduced in ES2015).  A string is an arbitrary byte sequence
+of a certain length which may contain internal NUL (0x00) values.  Strings are
+always automatically NUL terminated for C coding convenience.  The NUL terminator
+is not counted as part of the string length.  For instance, the string
+<code>"foo"</code> has byte length 3 and is stored in memory as
+<code>{ 'f', 'o', 'o', '\0' }</code>.  Because of the guaranteed NUL termination,
+strings can always be pointed to using a simple <code>const char *</code> as long
+as internal NULs are not an issue for the application; if they are, the explicit
+byte length of the string can be queried with the API.  Calling code can refer
+directly to the string data held by Duktape.  Such string data pointers are valid
+(and stable) for as long as a string is reachable in the Duktape heap.</p>

 <p>Strings are <a href="http://en.wikipedia.org/wiki/String_interning">interned</a>
 for efficiency: only a single copy of a certain string ever exists at a time.
@@ -212,13 +213,7 @@ characters as-is which is convenient for C code.  For example:</p>
 can be represented with UTF-8, and codepoints above that up to full 32 bits
 can be represented with
 <a href="http://en.wikipedia.org/wiki/UTF-8#Extending_from_31_bit_to_36_bit_range">extended UTF-8</a>.
-Non-standard strings are used for storing internal object properties; using a
-non-standard string ensures that such properties never conflict with properties
-accessible using standard Ecmascript strings.  Non-standard strings can be given
-to Ecmascript built-in functions, but since behavior may not be exactly
-specified, results may vary.</p>
-
-<p>The extended UTF-8 encoding used by Duktape is described in the table below.
+The extended UTF-8 encoding used by Duktape is described in the table below.
 The leading byte is shown in binary (with "x" marking data bits) while
 continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</p>

@@ -241,8 +236,22 @@ continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</
 the leading byte will be <code>0xFE</code> which conflicts with Unicode byte order
 marker encoding.  This is not a practical concern in Duktape's internal use.</p>

-<p>The leading <code>0xFF</code> byte never appears in Duktape's extended UTF-8
-encoding, and is used to implement <a href="#internalproperties">internal properties</a>.</p>
+<p>Finally, invalid extended UTF-8 byte sequences are used for special purposes
+such as representing Symbol values.  Invalid extended UTF-8/CESU-8 byte sequences
+never conflict with standard Ecmascript strings (which are CESU-8) and will remain
+cleanly separated within object property tables.  For more information see
+<a href="#symbols">Symbols</a> and
+<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>.</p>
+
+<p>Strings with invalid extended UTF-8 sequences can be pushed on the value stack
+from C code and also passed to Ecmascript functions, with two caveats:</p>
+<ul>
+<li>If the invalid byte sequence matches the internal format used to represent
+    Symbols, the value will appear as a Symbol rather than a string for Ecmascript
+    code.  For example, <code>typeof val</code> will be <code>symbol</code>.</li>
+<li>Behavior of string operations on invalid byte sequences is not well defined;
+    results may vary and may change even in minor Duktape version updates.</li>
+</ul>

 <h2 id="type-object">Object</h2>

--- a/website/guide/symbols.html
+++ b/website/guide/symbols.html
@@ -0,0 +1,58 @@
+<a name="internalproperties"></a>  <!-- legacy links -->
+<h1 id="symbols">Symbols</h1>
+
+<p>Duktape supports ES2015 Symbols and also provides a Duktape specific
+<b>hidden Symbol</b> variant similar to internal strings in Duktape 1.x.
+Hidden Symbols differ from ES2015 Symbols in that they're hidden from
+ordinary Ecmascript code: they can't be created from Ecmascript code,
+won't be enumerated or JSON-serialized, and won't be returned even from
+<code>Object.getOwnPropertySymbols()</code>.  Properties with hidden Symbol
+keys can only be accessed by a direct property read/write when holding a
+reference to a hidden Symbol.</p>
+
+<p>Duktape uses hidden Symbols for various implementation specific purposes,
+such as storing an object's finalizer reference.  User code can also use hidden
+Symbols for its own purposes, e.g. to store hidden state in objects.  User code
+should never try to access Duktape's hidden Symbol keyed properties: the set of
+such properties can change arbitrarily between versions.</p>
+
+<p>Symbols of all kinds are represented internally using byte sequences which
+are invalid UTF-8; see
+<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>
+for the current formats in use.  When C code pushes a string using e.g.
+<code>duk_push_string()</code> and the byte sequence matches an internal
+Symbol format, the string value is automatically interpreted as a Symbol.</p>
+
+<div class="note">
+Note that these internal byte sequences cannot be created from Ecmascript
+code as a valid Ecmascript string.  For example, a hidden Symbol might be
+represented using <code>\xFFxyz</code>, i.e. the byte sequence
+<code>ff 78 79 7a</code>, while the Ecmascript string <code>"\u00ffxyz"</code>
+would be represented as the CESU-8 bytes <code>c3 bf 78 79 7a</code> in memory.
+</div>
+
+<p>Creating a Symbol is straightforward from C code:</p>
+<pre class="c-code">
+/* Create a hidden Symbol which can then be used to read/write properties.
+ * The Symbol can be passed on to Ecmascript code like any other string or
+ * Symbol.  Terminating a string literal after a hex escape is safest to
+ * avoid some ambiguous cases like "\xffab".
+ */
+duk_push_string(ctx, "\xff" "mySymbol");
+</pre>
+
+<p>For more discussion on C string hex escaping, see
+<a href="https://github.com/svaarala/duktape/blob/master/misc/c_hex_esc.c">c_hex_esc.c</a>.</p>
+
+<p>Hidden Symbols cannot be created from Ecmascript code using the default
+built-ins alone.  Standard ES2015 Symbols can be created using the
+<code>Symbol</code> built-in, e.g. as <code>Symbol.for('foo')</code>.
+When sandboxing, ensure that application C bindings don't accidentally provide
+a mechanism to create hidden Symbols by e.g. converting an input buffer as-is
+to a string without applying an encoding.</p>
+
+<p>There's currently no special access control for properties with hidden
+Symbol keys: if user code has access to the Symbol, it can read/write the
+property value.  This will most likely change in future major versions so
+that Ecmascript code cannot access a property with a hidden Symbol key,
+even when holding a reference to the hidden Symbol value.</p>
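
To illustrate the sandboxing advice in the new ``symbols.html`` text, here is a minimal, self-contained sketch of a check that an application C binding might run before pushing externally supplied bytes as a string. It only looks for the ``0xFF`` initial byte used in the hidden Symbol example above. The helper name ``looks_like_hidden_symbol()`` is hypothetical and not part of the Duktape API, and other Symbol formats (see ``doc/symbols.rst``) are not covered, so a real sandbox would more likely validate the whole input as CESU-8/UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper (not a Duktape API call): returns 1 if the byte
 * sequence starts with the 0xFF prefix used in the hidden Symbol example,
 * 0 otherwise.  Other Symbol formats are deliberately not checked here.
 */
int looks_like_hidden_symbol(const char *data, size_t len) {
    if (data == NULL || len == 0) {
        return 0;
    }
    /* Cast through unsigned char so the comparison also works on
     * platforms where plain char is signed.
     */
    return ((unsigned char) data[0] == 0xffU) ? 1 : 0;
}
```

A binding could call this with the pointer and length it is about to pass to ``duk_push_lstring()`` and reject the input when the result is 1, rather than handing the bytes to sandboxed code as-is.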