Merge pull request #1202 from svaarala/symbol-documentation-updates

Documentation updates for initial symbol support
pull/1221/head
Sami Vaarala 8 years ago
committed by GitHub
parent
commit
f117807f29
  1. doc/sandboxing.rst (59 lines changed)
  2. website/api/duk_del_prop.yaml (7 lines changed)
  3. website/api/duk_enum.yaml (21 lines changed)
  4. website/api/duk_get_prop.yaml (7 lines changed)
  5. website/api/duk_get_string.yaml (2 lines changed)
  6. website/api/duk_get_type.yaml (2 lines changed)
  7. website/api/duk_get_type_mask.yaml (2 lines changed)
  8. website/api/duk_has_prop.yaml (7 lines changed)
  9. website/api/duk_is_string.yaml (2 lines changed)
  10. website/api/duk_push_string.yaml (9 lines changed)
  11. website/api/duk_put_prop.yaml (7 lines changed)
  12. website/api/duk_require_string.yaml (2 lines changed)
  13. website/api/duk_to_string.yaml (4 lines changed)
  14. website/api/symbols-are-strings.html (7 lines changed)
  15. website/buildsite.py (4 lines changed)
  16. website/guide/custombehavior.html (12 lines changed)
  17. website/guide/internalproperties.html (110 lines changed)
  18. website/guide/intro.html (2 lines changed)
  19. website/guide/stacktypes.html (51 lines changed)
  20. website/guide/symbols.html (58 lines changed)

59
doc/sandboxing.rst

@ -25,6 +25,11 @@ carefully written with these sandboxing goals in mind.
This document describes best practices for Duktape sandboxing.
There's a YAML config file with some useful default options for sandboxing,
and comments on what options you might consider:
* ``config/examples/security_sensitive.yaml``
.. note:: This document describes the current status of sandboxing features
which is not yet a complete solution.
@ -88,12 +93,12 @@ Verbose error messages may cause sandboxing security issues:
* When ``DUK_USE_PARANOID_ERRORS`` is not set, offending object/key is
summarized in an error message of some rejected property operations.
If object keys contain potentially sensitive information, you should
enable this option.
enable this option. Enable ``DUK_USE_PARANOID_ERRORS``.
* When stack traces are enabled an attacker may gain useful information from
the stack traces. Further, access to the internal ``_Tracedata`` property
provides access to call chain functions even when references to them are not
available directly.
available directly. Disable ``DUK_USE_TRACEBACKS``.
Replace the global object
-------------------------
@ -124,30 +129,32 @@ Risky bindings:
finalizers are a sandboxing risk. It's also possible to override or unset a
finalizer which the sandbox relies on.
* Since Duktape 2.x buffer bindings no longer provide a way create "internal"
strings which allow access to internal properties. See separate section on
internal properties.
* Since Duktape 2.x buffer bindings no longer provide a way to create hidden
Symbols (called "internal strings" in Duktape 1.x) which allow access to
internal properties. See separate section on internal properties.
You should also:
* Remove the ``require`` module loading function in the global object.
If you need module loading in the sandbox, it's better to write a specific,
* Remove the ``require`` module loading function in the global object
(since Duktape 2.x it's no longer present by default). If you need
module loading in the sandbox, it's better to write a specific,
constrained module loader for that environment.
Restrict access to internal properties
--------------------------------------
Internal properties are intended to be used by Duktape and user C code
to store "hidden properties" in objects. The mechanism currently relies on
using strings whose internal representation contains invalid UTF-8/CESU-8 data,
in concrete terms, a 0xFF prefix. These are called "internal strings". Since
Internal properties are used by Duktape and user C code to store "hidden
properties" in objects. The mechanism currently relies on "hidden Symbols"
(called "internal keys" or "internal strings" in Duktape 1.x). These are
strings whose internal representation contains invalid UTF-8/CESU-8 data
(see ``doc/symbols.rst`` for description of the current formats). Because
all standard Ecmascript strings are represented as CESU-8, such strings cannot
normally be created by Ecmascript code. The properties are also never
enumerated or otherwise exposed to Ecmascript code, so that the only way to
access them from Ecmascript code is to have access to an "internal string"
acting as the property key.
enumerated or otherwise exposed to Ecmascript code (not even by
``Object.getOwnPropertySymbols()``) so that the only way to access them from
Ecmascript code is to have access to a hidden Symbol acting as the property key.
C code can create internal keys very easily, which can provide a way to access
C code can create hidden Symbols very easily, which can provide a way to access
internal properties. For example::
// Assume an application native binding returns an internal key pushed
@ -165,18 +172,18 @@ be modified, concrete security issues may arise. For instance, if an internal
property stores a raw pointer to a native handle (such as a ``FILE *``),
changing its value can lead to a potentially exploitable segfault.
Since Duktape 2.x Ecmascript code cannot create internal keys using standard
Ecmascript code and the built-in bindings alone. To prevent access to internal
keys, ensure that no native bindings provided by the sandboxing environment
Since Duktape 2.x Ecmascript code cannot create hidden Symbols using standard
Ecmascript code and the built-in bindings alone. To prevent access to hidden
Symbols, ensure that no native bindings provided by the sandboxing environment
accidentally return such strings. The easiest way to ensure this is to make
sure all strings pushed on the value stack are properly CESU-8 encoded.
It's also good practice to ensure that sandboxed code has minimal access to
objects with potentially dangerous keys like raw pointers.
objects with potentially dangerous properties like raw pointers.
.. note:: There's a future work issue, potentially included in Duktape 2.x,
.. note:: There's a future work issue, potentially included in Duktape 3.x,
for preventing access to internal properties from Ecmascript code
even when using the correct internal key.
even when using the correct hidden Symbol as a lookup key.
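
The advice above (only push properly CESU-8 encoded strings) can be sketched as a validation step that a native binding runs on externally supplied bytes before pushing them. ``is_valid_cesu8`` is a hypothetical helper, not part of the Duktape API. It assumes CESU-8 means 1-3 byte UTF-8 style sequences, with surrogate halves encoded as separate 3-byte sequences, and it rejects everything else, including the ``0xFF`` prefix mentioned for hidden Symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Duktape API call): returns 1 if the bytes are
 * well-formed CESU-8 under the assumptions stated above, 0 otherwise.
 */
int is_valid_cesu8(const unsigned char *data, size_t len) {
    size_t i = 0;

    assert(data != NULL || len == 0);

    while (i < len) {
        unsigned char b = data[i];
        size_t need;  /* number of continuation bytes expected */
        size_t k;

        if (b < 0x80) {
            need = 0;  /* ASCII */
        } else if (b >= 0xc2 && b <= 0xdf) {
            need = 1;  /* U+0080 to U+07FF */
        } else if (b >= 0xe0 && b <= 0xef) {
            need = 2;  /* U+0800 to U+FFFF, including surrogate halves */
        } else {
            /* Stray continuation byte, overlong c0/c1 lead byte,
             * 4+ byte lead byte, or fe/ff.
             */
            return 0;
        }

        if (need > len - i - 1) {
            return 0;  /* sequence truncated at end of input */
        }
        if (b == 0xe0 && data[i + 1] < 0xa0) {
            return 0;  /* overlong 3-byte form */
        }
        for (k = 1; k <= need; k++) {
            if ((data[i + k] & 0xc0) != 0x80) {
                return 0;  /* not a continuation byte */
            }
        }
        i += need + 1;
    }
    return 1;
}
```

A binding could call such a check before ``duk_push_lstring()`` and throw an error or substitute replacement characters when it fails; which of the two is appropriate depends on the application.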
Restrict access to function instances
-------------------------------------
@ -234,9 +241,9 @@ string methods with a plain base value::
print("foo".toUpperCase());
Duktape 1.0 will use the original built-in prototype functions in these
inheritance situations. There is currently no way to replace these built-ins
so that the replacements would be used for instead (see
Duktape uses the original built-in prototype functions in these inheritance
situations. There is currently no way to replace these built-ins so that the
replacements would be used instead (see
``test-dev-sandbox-prototype-limitation.js``).
As a result, sandboxed code will always have access to the built-in prototype
@ -261,7 +268,7 @@ objects which participate in implicit inheritance:
through explicit construction (if constructors visible) or implicitly
through internal errors, e.g. ``/foo\123/`` which throws a SyntaxError
* ``ArrayBuffer.prototype``: through buffer values (if available); since
* ``Uint8Array.prototype``: through buffer values (if available); since
there is no buffer literal, user cannot construct buffer values directly
* ``Duktape.Pointer.prototype`` through pointer values (if available); since
@ -367,7 +374,7 @@ vulnerabilities. To avoid such issues:
must match; patch version may vary as bytecode format doesn't change in
patch versions.
* Ensure integrity of bytecode being loaded e.g. by checksumming.
* Ensure integrity of bytecode being loaded e.g. by checksumming or signing.
* If bytecode is transported over the network or other unsafe media,
use cryptographic means (keyed hashing, signatures, or similar) to

7
website/api/duk_del_prop.yaml

@ -33,9 +33,10 @@ summary: |
<ul>
<li>The target value is automatically coerced to an object. However, this
object is a temporary one, so deleting its properties is not very useful.</li>
<li>The <code>key</code> argument is internally coerced to a string. There is
an internal fast path for arrays and numeric indices which avoids an
explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
<li>The <code>key</code> argument is internally coerced using ToPropertyKey()
coercion which results in a string or a Symbol. There is an internal
fast path for arrays and numeric indices which avoids an explicit string
coercion, so use a numeric <code>key</code> when applicable.</li>
</ul>
<p>If the target is a Proxy object which implements the <code>deleteProperty</code>

21
website/api/duk_enum.yaml

@ -19,9 +19,20 @@ summary: |
properties are enumerated</td>
</tr>
<tr>
<td>DUK_ENUM_INCLUDE_INTERNAL</td>
<td>Enumerate also internal properties, by default internal properties
are not enumerated</td>
<td>DUK_ENUM_INCLUDE_HIDDEN</td>
<td>Enumerate also hidden Symbols, by default hidden Symbols are not
enumerated. Use together with <code>DUK_ENUM_INCLUDE_SYMBOLS</code>.
In Duktape 1.x this flag was called <code>DUK_ENUM_INCLUDE_INTERNAL</code>.</td>
</tr>
<tr>
<td>DUK_ENUM_INCLUDE_SYMBOLS</td>
<td>Include Symbols in the enumeration result. Hidden Symbols are not
included unless <code>DUK_ENUM_INCLUDE_HIDDEN</code> is specified.</td>
</tr>
<tr>
<td>DUK_ENUM_EXCLUDE_STRINGS</td>
<td>Exclude strings from the enumeration result. By default strings are
included.</td>
</tr>
<tr>
<td>DUK_ENUM_OWN_PROPERTIES_ONLY</td>
@ -39,6 +50,10 @@ summary: |
enumeration result rather than per inheritance level, this has the
effect of sorting array indices (even when inherited)</td>
</tr>
<tr>
<td>DUK_ENUM_NO_PROXY_BEHAVIOR</td>
<td>Enumerate a Proxy object itself without invoking Proxy behaviors.</td>
</tr>
</table>
<p>Without any flags the enumeration behaves like <code>for-in</code>:

7
website/api/duk_get_prop.yaml

@ -34,9 +34,10 @@ summary: |
<li>The target value is automatically coerced to an object. For instance,
a string is converted to a <code>String</code> and you can access its
<code>"length"</code> property.</li>
<li>The <code>key</code> argument is internally coerced to a string. There is
an internal fast path for arrays and numeric indices which avoids an
explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
<li>The <code>key</code> argument is internally coerced using ToPropertyKey()
coercion which results in a string or a Symbol. There is an internal
fast path for arrays and numeric indices which avoids an explicit string
coercion, so use a numeric <code>key</code> when applicable.</li>
</ul>
<p>If the target is a Proxy object which implements the <code>get</code> trap,

2
website/api/duk_get_string.yaml

@ -21,6 +21,8 @@ summary: |
this differs from how buffer data pointers are handled (for technical reasons).
</div>
<div include="symbols-are-strings.html" />
example: |
const char *buf;

2
website/api/duk_get_type.yaml

@ -11,6 +11,8 @@ summary: |
<code>DUK_TYPE_xxx</code> or <code>DUK_TYPE_NONE</code> if <code>idx</code>
is invalid.</p>
<div include="symbols-are-strings.html" />
example: |
if (duk_get_type(ctx, -3) == DUK_TYPE_NUMBER) {
printf("value is a number\n");

2
website/api/duk_get_type_mask.yaml

@ -15,6 +15,8 @@ summary: |
(the <code><a href="#duk_check_type_mask">duk_check_type_mask()</a></code> call is
even more convenient for this purpose).</p>
<div include="symbols-are-strings.html" />
example: |
if (duk_get_type_mask(ctx, -3) & (DUK_TYPE_MASK_STRING |
DUK_TYPE_MASK_NUMBER)) {

7
website/api/duk_has_prop.yaml

@ -27,9 +27,10 @@ summary: |
<li>The target value is automatically coerced to an object. For instance,
a string is converted to a <code>String</code> and you can check for its
<code>"length"</code> property.</li>
<li>The <code>key</code> argument is internally coerced to a string. There is
an internal fast path for arrays and numeric indices which avoids an
explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
<li>The <code>key</code> argument is internally coerced using ToPropertyKey()
coercion which results in a string or a Symbol. There is an internal
fast path for arrays and numeric indices which avoids an explicit string
coercion, so use a numeric <code>key</code> when applicable.</li>
</ul>
<p>If the target is a Proxy object which implements the <code>has</code> trap,

2
website/api/duk_is_string.yaml

@ -10,6 +10,8 @@ summary: |
<p>Returns 1 if value at <code>idx</code> is a string, otherwise
returns 0. If <code>idx</code> is invalid, also returns 0.</p>
<div include="symbols-are-strings.html" />
example: |
if (duk_is_string(ctx, -3)) {
/* ... */

9
website/api/duk_push_string.yaml

@ -17,7 +17,14 @@ summary: |
to the stack and <code>NULL</code> is returned. This behavior differs from
<code><a href="#duk_push_lstring">duk_push_lstring</a></code> on purpose.</p>
<p>C code should normally only push valid CESU-8 strings to the stack.</p>
<div class="note">
C code should normally only push valid CESU-8 strings to the stack.
Some invalid CESU-8/UTF-8 byte sequences are reserved for special
uses such as representing Symbol values. When you push such an invalid
byte sequence, the value on the value stack will behave like a string for
C code but will appear as a <code>Symbol</code> for Ecmascript code.
See <a href="guide.html#symbols">Symbols</a> for more discussion.
</div>
<p>If input string might contain internal NUL characters, use
<code><a href="#duk_push_lstring">duk_push_lstring()</a></code> instead.</p>

7
website/api/duk_put_prop.yaml

@ -35,9 +35,10 @@ summary: |
transitory objects (see
<a href="http://www.ecma-international.org/ecma-262/5.1/#sec-8.7.2">PutValue (V, W)</a>,
step 7 of the special [[Put]] variant).</li>
<li>The <code>key</code> argument is internally coerced to a string. There is
an internal fast path for arrays and numeric indices which avoids an
explicit string coercion, so use a numeric <code>key</code> when applicable.</li>
<li>The <code>key</code> argument is internally coerced using ToPropertyKey()
coercion which results in a string or a Symbol. There is an internal
fast path for arrays and numeric indices which avoids an explicit string
coercion, so use a numeric <code>key</code> when applicable.</li>
</ul>
<p>If the target is a Proxy object which implements the <code>set</code> trap,

2
website/api/duk_require_string.yaml

@ -11,6 +11,8 @@ summary: |
but throws an error if the value at <code>idx</code> is not a string
or if the index is invalid.</p>
<div include="symbols-are-strings.html" />
example: |
const char *buf;

4
website/api/duk_to_string.yaml

@ -14,6 +14,10 @@ summary: |
<div include="ref-custom-type-coercion.html" />
<div class="note">
ToString() coercion for a Symbol value causes a TypeError.
</div>
<div class="note">
In Duktape 2.x plain buffers mimic ArrayBuffer objects and will usually
ToString() coerce to "[object ArrayBuffer]". To convert buffer or buffer

7
website/api/symbols-are-strings.html

@ -0,0 +1,7 @@
<div class="note">
Symbol values are visible in the C API as strings, e.g. <code>duk_is_string()</code>
is true (this behavior is similar to Duktape 1.x internal strings). Symbols are
still an experimental feature. For now, you can distinguish Symbols from ordinary
strings by looking at their initial byte, see
<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>.
</div>
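
As a minimal sketch of the initial-byte check mentioned in the note: the helper below only recognizes the ``0xFF`` prefix that this commit's documentation shows for hidden Symbols. ``has_hidden_symbol_prefix`` is a hypothetical name, not a Duktape API call, and other Symbol kinds use other prefixes that are documented in symbols.rst and not covered here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the byte sequence starts with the 0xFF
 * prefix used for hidden Symbols, 0 otherwise. Other Symbol formats are
 * deliberately not handled in this sketch.
 */
int has_hidden_symbol_prefix(const char *data, size_t len) {
    assert(data != NULL || len == 0);
    return len > 0 && (unsigned char) data[0] == 0xff;
}
```

In real code the pointer and length would typically come from ``duk_get_lstring()``, so that internal NUL bytes do not truncate the value being inspected.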

4
website/buildsite.py

@ -955,7 +955,7 @@ def generateGuide():
navlinks.append(['#finalization', 'Finalization'])
navlinks.append(['#coroutines', 'Coroutines'])
navlinks.append(['#virtualproperties', 'Virtual properties'])
navlinks.append(['#internalproperties', 'Internal properties'])
navlinks.append(['#symbols', 'Symbols'])
navlinks.append(['#bytecodedumpload', 'Bytecode dump/load'])
navlinks.append(['#threading', 'Threading'])
navlinks.append(['#sandboxing', 'Sandboxing'])
@ -1006,7 +1006,7 @@ def generateGuide():
res += processRawDoc('guide/finalization.html')
res += processRawDoc('guide/coroutines.html')
res += processRawDoc('guide/virtualproperties.html')
res += processRawDoc('guide/internalproperties.html')
res += processRawDoc('guide/symbols.html')
res += processRawDoc('guide/bytecodedumpload.html')
res += processRawDoc('guide/threading.html')
res += processRawDoc('guide/sandboxing.html')

12
website/guide/custombehavior.html

@ -9,13 +9,13 @@ other relevant specifications.</p>
access to Duktape specific features. Also the buffer, pointer, and lightfunc
types are custom.</p>
<h2>Internal properties</h2>
<h2>Hidden Symbols</h2>
<p>Objects may have <a href="#internalproperties">internal properties</a> which
are essentially hidden from normal code: they won't be enumerated or returned
even by e.g. <code>Object.getOwnPropertyNames()</code>. Ordinary Ecmascript
code cannot refer to such properties because the property keys intentionally
use invalid UTF-8 (<code>0xFF</code> prefix byte).</p>
<p>Objects may have properties with <a href="#symbols">hidden Symbol</a> keys.
These are similar to ES2015 Symbols but won't be enumerated or returned from even
<code>Object.getOwnPropertySymbols()</code>. Ordinary Ecmascript code cannot
refer to such properties because the keys intentionally use an invalid (extended)
UTF-8 representation.</p>
<h2>"use duk notail" directive</h2>

110
website/guide/internalproperties.html

@ -1,110 +0,0 @@
<h1 id="internalproperties">Internal properties</h1>
<p>Duktape supports non-standard <b>internal properties</b> which are
essentially hidden from user code. They can only be accessed by a
direct property read/write, and are never enumerated, serialized by
<code>JSON.stringify()</code> or returned from built-in functions such
as <code>Object.getOwnPropertyNames()</code>.</p>
<p>Duktape uses internal properties for various implementation specific
purposes, such as storing an object's finalizer reference, the internal
value held by <code>Number</code> and <code>Date</code>, etc. User code
can also use internal properties for its own purposes, e.g. to
store "hidden state" in objects, as long as the property names never
conflict with current or future Duktape internal keys (this is ensured
by the naming convention described below). User code should never try
to access Duktape's internal properties: the set of internal properties
used can change arbitrarily between versions.</p>
<p>Internal properties are distinguished from other properties by the
property key: if the byte representation of a property key begins with
a <code>0xFF</code> byte Duktape automatically treats the property as an
internal property. Such a string is referred to as an <b>internal string</b>.
The initial byte makes the key invalid UTF-8 (even invalid extended UTF-8),
which ensures that (1) internal properties never conflict with normal Unicode
property names and that (2) ordinary Ecmascript code cannot accidentally access
them. The initial prefix byte is often represented by an underscore in
documentation for readability, e.g. <code>_Value</code> is used instead
of <code>\xFFValue</code>.</p>
<p>The following naming convention is used. The convention ensures that
Duktape and user internal properties never conflict:</p>
<table>
<tr>
<th>Type</th>
<th>Example (C)</th>
<th>Bytes</th>
<th>Description</th>
</tr>
<tr>
<td>Duktape</td>
<td><code>"\xFF" "Value"</code></td>
<td><code>ff 56 61 6c 75 65</code></td>
<td>First character is always uppercase, followed by <code>[a-z0-9_]*</code>.</td>
</tr>
<tr>
<td>User</td>
<td><code>"\xFF" "myprop"</code></td>
<td><code>ff 6d 79 70 72 6f 70</code></td>
<td>First character must not be uppercase to avoid conflict with
current or future Duktape keys.</td>
</tr>
<tr>
<td>User</td>
<td><code>"\xFF\xFF" &lt;arbitrary&gt;</code></td>
<td><code>ff ff &lt;arbitrary&gt;</code></td>
<td>Double <code>0xFF</code> prefix followed by arbitrary data.</td>
</tr>
</table>
<p>In some cases the internal key needed by user code is not static, e.g.
it can be dynamically generated by serializing a pointer or perhaps the
bytes are from an external source. In this case it is safest to use
two <code>0xFF</code> prefix bytes as the example above shows.</p>
<div class="note">
Note that the <code>0xFF</code> prefix cannot be expressed as a valid
Ecmascript string. For example, the internal string <code>\xFFxyz</code>
would appear as the bytes <code>ff 78 79 7a</code> in memory, while the
Ecmascript string <code>"\u00ffxyz"</code> would be represented as the
CESU-8 bytes <code>c3 bf 78 79 7a</code> in memory.
</div>
<p>Creating an internal string is easy from C code:</p>
<pre class="c-code">
/* Create an internal string, which can then be used to read/write internal
* properties, and can be passed on to Ecmascript code like any other string.
* Terminating a string literal after a hex escape is safest to avoid some
* ambiguous cases like "\xffab".
*/
duk_push_string(ctx, "\xff" "myprop");
</pre>
<p>For more discussion on C string hex escaping, see
<a href="https://github.com/svaarala/duktape/blob/master/misc/c_hex_esc.c">c_hex_esc.c</a>.</p>
<p>Internal strings cannot be created from Ecmascript code using the default
built-ins alone. However, application code can easily add such a binding
using the C API which must be considered in sandboxing.</p>
<p>There's no special access control for internal properties: if user code has
access to the property name (string), it can read/write the property value.
The default Ecmascript built-ins don't provide a way of creating an internal
string: buffer-to-string coercions always involve an encoding such as UTF-8
which will reject or replace invalid byte sequences. However, C code can
easily create internal strings. When sandboxing, ensure that custom C bindings
don't accidentally provide a mechanism to create internal strings by e.g.
converting a buffer as-is to a string.</p>
<p>As a concrete example the internal value of a <code>Date</code> instance
can be accessed as follows:</p>
<pre class="ecmascript-code">
// Print the internal timestamp of a Date instance. Assumes a hypothetical
// rawBufferToString() custom C binding which takes an input buffer and pushes
// the bytes as-is as a string using duk_push_lstring(), thus creating an
// internal string.
var key = rawBufferToString(Duktape.dec('hex', 'ff56616c7565')); // \xFFValue
var dt = new Date(123456);
print('internal value is:', dt[key]); // prints 123456
</pre>

2
website/guide/intro.html

@ -177,7 +177,7 @@ wrappers are discussed in detail.</p>
<a href="#finalization">Finalization</a>,
<a href="#coroutines">Coroutines</a>,
<a href="#virtualproperties">Virtual properties</a>,
<a href="#internalproperties">Internal properties</a>,
<a href="#symbols">Symbols</a>,
<a href="#bytecodedumpload">Bytecode dump/load</a>,
<a href="#threading">Threading</a>,
<a href="#sandboxing">Sandboxing</a>.

51
website/guide/stacktypes.html

@ -10,7 +10,7 @@
<tr><td><a href="#type-null">null</a></td><td>DUK_TYPE_NULL</td><td>DUK_TYPE_MASK_NULL</td><td><code>null</code></td></tr>
<tr><td><a href="#type-boolean">boolean</a></td><td>DUK_TYPE_BOOLEAN</td><td>DUK_TYPE_MASK_BOOLEAN</td><td><code>true</code> and <code>false</code></td></tr>
<tr><td><a href="#type-number">number</a></td><td>DUK_TYPE_NUMBER</td><td>DUK_TYPE_MASK_NUMBER</td><td>IEEE double</td></tr>
<tr><td><a href="#type-string">string</a></td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable (plain) string</td></tr>
<tr><td><a href="#type-string">string</a></td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable (plain) string or (plain) Symbol</td></tr>
<tr><td><a href="#type-object">object</a></td><td>DUK_TYPE_OBJECT</td><td>DUK_TYPE_MASK_OBJECT</td><td>object with properties</td></tr>
<tr><td><a href="#type-buffer">buffer</a></td><td>DUK_TYPE_BUFFER</td><td>DUK_TYPE_MASK_BUFFER</td><td>mutable (plain) byte buffer, fixed/dynamic/external; mimics an ArrayBuffer</td></tr>
<tr><td><a href="#type-pointer">pointer</a></td><td>DUK_TYPE_POINTER</td><td>DUK_TYPE_MASK_POINTER</td><td>opaque pointer (void *)</td></tr>
@ -172,17 +172,18 @@ come out. Don't rely on NaNs preserving their exact form.</p>
<h2 id="type-string">String</h2>
<p>The <b>string</b> type is an arbitrary byte sequence of a certain length which
may contain internal NUL (0x00) values. Strings are always automatically NUL
terminated for C coding convenience. The NUL terminator is not counted as part
of the string length. For instance, the string <code>"foo"</code> has byte length 3
and is stored in memory as <code>{ 'f', 'o', 'o', '\0' }</code>. Because of the
guaranteed NUL termination, strings can always be pointed to using a simple
<code>const char *</code> as long as internal NULs are not an issue; if they are,
the explicit byte length of the string can be queried with the API. Calling code
can refer directly to the string data held by Duktape. Such string data
pointers are valid (and stable) for as long as a string is reachable in the
Duktape heap.</p>
<p>The <b>string</b> stack type is used to represent both plain strings and
plain Symbols (introduced in ES2015). A string is an arbitrary byte sequence
of a certain length which may contain internal NUL (0x00) values. Strings are
always automatically NUL terminated for C coding convenience. The NUL terminator
is not counted as part of the string length. For instance, the string
<code>"foo"</code> has byte length 3 and is stored in memory as
<code>{ 'f', 'o', 'o', '\0' }</code>. Because of the guaranteed NUL termination,
strings can always be pointed to using a simple <code>const char *</code> as long
as internal NULs are not an issue for the application; if they are, the explicit
byte length of the string can be queried with the API. Calling code can refer
directly to the string data held by Duktape. Such string data pointers are valid
(and stable) for as long as a string is reachable in the Duktape heap.</p>
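
The difference between the NUL-terminated view and the explicit byte length can be illustrated with plain C, without any Duktape calls. The names below are hypothetical and the data is a static literal standing in for string data held by Duktape:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Seven data bytes with an internal NUL, plus the implicit terminator. */
static const char with_nul[] = "foo\0bar";

/* Length as seen through a plain const char *: stops at the first NUL. */
size_t c_string_length(const char *s) {
    assert(s != NULL);
    return strlen(s);
}

/* Explicit byte length: counts the internal NUL, excludes the terminator. */
size_t explicit_length(void) {
    return sizeof(with_nul) - 1;
}
```

Here ``c_string_length(with_nul)`` sees only ``"foo"``, while the explicit length covers all seven bytes. With Duktape the explicit length would be queried through the API (for example with ``duk_get_lstring()``) rather than with ``sizeof``.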
<p>Strings are <a href="http://en.wikipedia.org/wiki/String_interning">interned</a>
for efficiency: only a single copy of a certain string ever exists at a time.
@ -212,13 +213,7 @@ characters as-is which is convenient for C code. For example:</p>
can be represented with UTF-8, and codepoints above that up to full 32 bits
can be represented with
<a href="http://en.wikipedia.org/wiki/UTF-8#Extending_from_31_bit_to_36_bit_range">extended UTF-8</a>.
Non-standard strings are used for storing internal object properties; using a
non-standard string ensures that such properties never conflict with properties
accessible using standard Ecmascript strings. Non-standard strings can be given
to Ecmascript built-in functions, but since behavior may not be exactly
specified, results may vary.</p>
<p>The extended UTF-8 encoding used by Duktape is described in the table below.
The extended UTF-8 encoding used by Duktape is described in the table below.
The leading byte is shown in binary (with "x" marking data bits) while
continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</p>
@ -241,8 +236,22 @@ continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</
the leading byte will be <code>0xFE</code> which conflicts with Unicode byte order
marker encoding. This is not a practical concern in Duktape's internal use.</p>
<p>The leading <code>0xFF</code> byte never appears in Duktape's extended UTF-8
encoding, and is used to implement <a href="#internalproperties">internal properties</a>.</p>
<p>Finally, invalid extended UTF-8 byte sequences are used for special purposes
such as representing Symbol values. Invalid extended UTF-8/CESU-8 byte sequences
never conflict with standard Ecmascript strings (which are CESU-8) and will remain
cleanly separated within object property tables. For more information see
<a href="#symbols">Symbols</a> and
<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>.</p>
<p>Strings with invalid extended UTF-8 sequences can be pushed on the value stack
from C code and also passed to Ecmascript functions, with two caveats:</p>
<ul>
<li>If the invalid byte sequence matches the internal format used to represent
Symbols, the value will appear as a Symbol rather than a string for Ecmascript
code. For example, <code>typeof val</code> will be <code>symbol</code>.</li>
<li>Behavior of string operations on invalid byte sequences is not well defined
and results may vary, and change even in minor Duktape version updates.</li>
</ul>
<h2 id="type-object">Object</h2>

58
website/guide/symbols.html

@ -0,0 +1,58 @@
<a name="internalproperties"></a> <!-- legacy links -->
<h1 id="symbols">Symbols</h1>
<p>Duktape supports ES2015 Symbols and also provides a Duktape specific
<b>hidden Symbol</b> variant similar to internal strings in Duktape 1.x.
Hidden Symbols differ from ES2015 Symbols in that they're hidden from
ordinary Ecmascript code: they can't be created from Ecmascript code,
won't be enumerated or JSON-serialized, and won't be returned even from
<code>Object.getOwnPropertySymbols()</code>. Properties with hidden Symbol
keys can only be accessed by a direct property read/write when holding a
reference to a hidden Symbol.</p>
<p>Duktape uses hidden Symbols for various implementation specific purposes,
such as storing an object's finalizer reference. User code can also use hidden
Symbols for its own purposes, e.g. to store hidden state in objects. User code
should never try to access Duktape's hidden Symbol keyed properties: the set of
such properties can change arbitrarily between versions.</p>
<p>Symbols of all kinds are represented internally using byte sequences which
are invalid UTF-8; see
<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>
for the current formats in use. When C code pushes a string using e.g.
<code>duk_push_string()</code> and the byte sequence matches an internal
Symbol format, the string value is automatically interpreted as a Symbol.</p>
<div class="note">
Note that the internal UTF-8 byte sequences cannot be created from Ecmascript
code as a valid Ecmascript string. For example, a hidden Symbol might be
represented using <code>\xFFxyz</code>, i.e. the byte sequence
<code>ff 78 79 7a</code>, while the Ecmascript string <code>"\u00ffxyz"</code>
would be represented as the CESU-8 bytes <code>c3 bf 78 79 7a</code> in memory.
</div>
<p>Creating a Symbol is straightforward from C code:</p>
<pre class="c-code">
/* Create a hidden Symbol which can then be used to read/write properties.
* The Symbol can be passed on to Ecmascript code like any other string or
* Symbol. Terminating a string literal after a hex escape is safest to
* avoid some ambiguous cases like "\xffab".
*/
duk_push_string(ctx, "\xff" "mySymbol");
</pre>
<p>For more discussion on C string hex escaping, see
<a href="https://github.com/svaarala/duktape/blob/master/misc/c_hex_esc.c">c_hex_esc.c</a>.</p>
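
The hex escape point can be checked with standard C alone. A C hex escape consumes every following hex digit, so <code>"\xffab"</code> is read as one out-of-range escape rather than three bytes, whereas closing the literal after the escape gives the intended bytes. The sketch below uses hypothetical names and no Duktape calls:

```c
#include <assert.h>
#include <stddef.h>

/* Closing the literal after the hex escape yields the bytes ff 61 62. */
static const char hidden_key[] = "\xff" "ab";

/* Byte length of the key, excluding the NUL terminator. */
size_t hidden_key_length(void) {
    return sizeof(hidden_key) - 1;
}

/* Byte at index i as an unsigned value; the terminator is readable too. */
unsigned int hidden_key_byte(size_t i) {
    assert(i < sizeof(hidden_key));
    return (unsigned char) hidden_key[i];
}
```

The same split-literal pattern is what the <code>duk_push_string()</code> call above relies on.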
<p>Hidden Symbols cannot be created from Ecmascript code using the default
built-ins alone. Standard ES2015 Symbols can be created using the
<code>Symbol</code> built-in, e.g. as <code>Symbol.for('foo')</code>.
When sandboxing, ensure that application C bindings don't accidentally provide
a mechanism to create hidden Symbols by e.g. converting an input buffer as-is
to a string without applying an encoding.</p>
<p>There's currently no special access control for properties with hidden
Symbol keys: if user code has access to the Symbol, it can read/write the
property value. This will most likely change in future major versions so
that Ecmascript code cannot access a property with a hidden Symbol key,
even when holding a reference to the hidden Symbol value.</p>