<h1 id="stacktypes">Stack types</h1>

<h2 id="type-table">Overview</h2>

<div class="table-wrap">
<table>
<tr class="header"><th>Type</th><th>Type constant</th><th>Type mask constant</th><th>Description</th></tr>
<tr><td><a href="#type-none">(none)</a></td><td>DUK_TYPE_NONE</td><td>DUK_TYPE_MASK_NONE</td><td>no type (missing value, invalid index, etc)</td></tr>
<tr><td><a href="#type-undefined">undefined</a></td><td>DUK_TYPE_UNDEFINED</td><td>DUK_TYPE_MASK_UNDEFINED</td><td><code>undefined</code></td></tr>
<tr><td><a href="#type-null">null</a></td><td>DUK_TYPE_NULL</td><td>DUK_TYPE_MASK_NULL</td><td><code>null</code></td></tr>
<tr><td><a href="#type-boolean">boolean</a></td><td>DUK_TYPE_BOOLEAN</td><td>DUK_TYPE_MASK_BOOLEAN</td><td><code>true</code> and <code>false</code></td></tr>
<tr><td><a href="#type-number">number</a></td><td>DUK_TYPE_NUMBER</td><td>DUK_TYPE_MASK_NUMBER</td><td>IEEE double</td></tr>
<tr><td><a href="#type-string">string</a></td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable (plain) string or (plain) Symbol</td></tr>
<tr><td><a href="#type-object">object</a></td><td>DUK_TYPE_OBJECT</td><td>DUK_TYPE_MASK_OBJECT</td><td>object with properties</td></tr>
<tr><td><a href="#type-buffer">buffer</a></td><td>DUK_TYPE_BUFFER</td><td>DUK_TYPE_MASK_BUFFER</td><td>mutable (plain) byte buffer, fixed/dynamic/external; mimics an Uint8Array</td></tr>
<tr><td><a href="#type-pointer">pointer</a></td><td>DUK_TYPE_POINTER</td><td>DUK_TYPE_MASK_POINTER</td><td>opaque pointer (void *)</td></tr>
<tr><td><a href="#type-lightfunc">lightfunc</a></td><td>DUK_TYPE_LIGHTFUNC</td><td>DUK_TYPE_MASK_LIGHTFUNC</td><td>plain Duktape/C function pointer (non-object); mimics a Function</td></tr>
</table>
</div>

<h2 id="type-memory-allocations">Memory allocations</h2>

<p>The following stack types involve additional heap allocations:</p>

<ul>
<li>String: a single allocation contains a combined heap and string header,
followed by the immutable string data.</li>
<li>Object: one allocation is used for a combined heap and object header,
and another allocation is used for object properties. The property
allocation contains both array entries and normal properties, and if
the object is large enough, a hash table to speed up lookups.</li>
<li>Buffer: for fixed buffers a single allocation contains a combined heap
and buffer header, followed by the mutable fixed-size buffer. For
dynamic buffers the current buffer is allocated separately. For
external buffers a single heap object is allocated and points to
a user buffer.</li>
</ul>

<p>Note that while strings and buffers are considered primitive (pass-by-value)
types, they are heap allocated from a memory allocation viewpoint.</p>
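
<p>As a rough illustration of the single-allocation layout described for
strings, the sketch below places a header and NUL-terminated data in one
<code>malloc()</code> block using a C99 flexible array member. The names
<code>sketch_str</code> and <code>sketch_str_new</code> are hypothetical and
the fields are stand-ins; Duktape's actual headers are not shown in this
document and are likely to differ.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: header fields followed directly by the string bytes. */
typedef struct {
    size_t refcount;   /* stand-in for a heap header field */
    size_t blen;       /* byte length, NUL terminator not counted */
    char data[];       /* string bytes plus a terminating NUL */
} sketch_str;

/* Allocate header and data in one block; returns NULL on allocation failure. */
static sketch_str *sketch_str_new(const char *bytes, size_t blen) {
    sketch_str *s = malloc(sizeof(sketch_str) + blen + 1);
    if (s == NULL) {
        return NULL;
    }
    s->refcount = 1;
    s->blen = blen;
    memcpy(s->data, bytes, blen);
    s->data[blen] = '\0';
    return s;
}
```

<p>Because the data lives inside the same allocation as the header, freeing
the header frees the data too, and the data address never changes while the
allocation is alive.</p>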

<h2 id="type-pointer-stability">Pointer stability</h2>

<p>Heap objects allocated by Duktape have stable pointers: the objects are
not relocated in memory while they are reachable from a garbage collection
point of view. This is the case for the main heap object, but not
necessarily for any additional allocations related to the object, such as
dynamic property tables or the data area of a dynamic buffer. A heap object is
reachable e.g. when it resides on the value stack of a reachable thread or
is reachable through the global object. Once a heap object becomes
unreachable any pointers held by user C code referring to the object are
unsafe and should no longer be dereferenced.</p>

<p>In practice the only heap allocated data directly referenced by user code
are strings, fixed buffers, and dynamic buffers. The data area of strings
and fixed buffers is stable; it is safe to keep a C pointer referring to the
data even after a Duktape/C function returns as long as the string or fixed
buffer remains reachable from a garbage collection point of view at all times.
Note that this is <i>not</i> the case for Duktape/C value stack arguments, for
instance, unless specific arrangements are made.</p>

<p>The data area of a dynamic buffer does <b>not</b> have a stable pointer.
The buffer itself has a heap header with a stable address but the current
buffer is allocated separately and potentially relocated when the buffer
is resized. It is thus unsafe to hold a pointer to a dynamic buffer's data
area across a buffer resize, and it's probably best not to hold a pointer
after a Duktape/C function returns (as there would be no easy way of being
sure that the buffer hadn't been resized). The data area of an external
buffer also has a potentially changing pointer, but the pointer is only
changed by an explicit user API call.</p>
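
<p>The relocation hazard can be reproduced without Duktape. The sketch below
models a dynamic buffer as a stable header plus a separately allocated data
area that is resized with <code>realloc()</code>. All names
(<code>sketch_dynbuf</code> and its helpers) are hypothetical, and the use of
<code>realloc()</code> is an assumption made for illustration only. The point
is that callers keep an offset and re-derive the address from the header on
each access instead of caching a data pointer.</p>

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic buffer: the header is stable, the data area is not. */
typedef struct {
    unsigned char *data;  /* separately allocated, may move on resize */
    size_t size;          /* current size in bytes */
} sketch_dynbuf;

/* Resize the data area, zero-filling any new bytes.
 * Returns 0 on success, -1 on failure; zero sizes are rejected to keep
 * the sketch simple.
 */
static int sketch_dynbuf_resize(sketch_dynbuf *b, size_t new_size) {
    unsigned char *p;
    if (new_size == 0) {
        return -1;
    }
    p = realloc(b->data, new_size);
    if (p == NULL) {
        return -1;
    }
    if (new_size > b->size) {
        memset(p + b->size, 0, new_size - b->size);
    }
    b->data = p;  /* any previously cached data pointer is now suspect */
    b->size = new_size;
    return 0;
}

/* Re-derive the element address from the header on every access. */
static unsigned char *sketch_dynbuf_at(sketch_dynbuf *b, size_t offset) {
    return (offset < b->size) ? b->data + offset : NULL;
}
```

<p>A pointer obtained from <code>sketch_dynbuf_at()</code> before a resize
must be treated as invalid afterwards, while the offset passed to it remains
meaningful.</p>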

<p>For external buffers the stability of the data pointer is up to the user
code which sets and updates the pointer.</p>

<h2 id="type-masks">Type masks</h2>

<p>Type masks allow calling code to easily check whether a type belongs to
a certain type set. For instance, to check that a certain stack value is
a number, string, or an object:</p>

<pre class="c-code">
if (duk_get_type_mask(ctx, -3) & (DUK_TYPE_MASK_NUMBER |
                                  DUK_TYPE_MASK_STRING |
                                  DUK_TYPE_MASK_OBJECT)) {
    printf("type is number, string, or object\n");
}
</pre>

<p>There is a specific API call for matching a set of types even more
conveniently:</p>

<pre class="c-code">
if (duk_check_type_mask(ctx, -3, DUK_TYPE_MASK_NUMBER |
                                 DUK_TYPE_MASK_STRING |
                                 DUK_TYPE_MASK_OBJECT)) {
    printf("type is number, string, or object\n");
}
</pre>

<p>These are faster and more compact than the alternatives:</p>

<pre class="c-code">
// alt 1
if (duk_is_number(ctx, -3) || duk_is_string(ctx, -3) || duk_is_object(ctx, -3)) {
    printf("type is number, string, or object\n");
}

// alt 2
int t = duk_get_type(ctx, -3);
if (t == DUK_TYPE_NUMBER || t == DUK_TYPE_STRING || t == DUK_TYPE_OBJECT) {
    printf("type is number, string, or object\n");
}
</pre>
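
<p>The reason a mask test is compact is that set membership reduces to a
single bitwise AND. The standalone sketch below shows the idea with
hypothetical <code>SKETCH_</code> names. It assumes each mask is a single bit
derived from a small integer type tag; the numeric values are illustrative
and are not taken from the Duktape headers.</p>

```c
/* Hypothetical type tags, numbered consecutively from zero. */
enum {
    SKETCH_TYPE_NONE = 0,
    SKETCH_TYPE_UNDEFINED,
    SKETCH_TYPE_NULL,
    SKETCH_TYPE_BOOLEAN,
    SKETCH_TYPE_NUMBER,
    SKETCH_TYPE_STRING,
    SKETCH_TYPE_OBJECT
};

/* One bit per type tag (an assumption of this sketch). */
#define SKETCH_MASK(t) (1U << (t))

/* Returns 1 if 'type' is a member of the set 'mask', 0 otherwise. */
static int sketch_check_type_mask(int type, unsigned int mask) {
    return (SKETCH_MASK(type) & mask) ? 1 : 0;
}
```

<p>With this representation a set of any size costs one AND and one
comparison, whereas the chained comparisons grow with the number of types
in the set.</p>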

<h2 id="type-none">None</h2>

<p>The <b>none</b> type is not actually a type but is used in the API to
indicate that a value does not exist, a stack index is invalid, etc.</p>

<h2 id="type-undefined">Undefined</h2>

<p>The <b>undefined</b> type maps to Ecmascript <code>undefined</code>, which is
distinct from <code>null</code>.</p>

<p>Values read from outside the active value stack range read back as
<b>undefined</b>.</p>

<h2 id="type-null">Null</h2>

<p>The <b>null</b> type maps to Ecmascript <code>null</code>.</p>

<h2 id="type-boolean">Boolean</h2>

<p>The <b>boolean</b> type is represented in the C API as an integer: zero for false,
and non-zero for true.</p>

<p>Whenever giving boolean values as arguments in API calls, any non-zero value is
accepted as a "true" value. Whenever API calls return boolean values, the value
<code>1</code> is always used for a "true" value. This allows certain C idioms to be
used. For instance, a bitmask can be built directly based on API call return values,
as follows:
</p>

<pre class="c-code">
/* this works and generates nice code */
int bitmask = (duk_get_boolean(ctx, -3) << 2) |
              (duk_get_boolean(ctx, -2) << 1) |
              duk_get_boolean(ctx, -1);

/* more verbose variant not relying on "true" being represented by 1 */
int bitmask = ((duk_get_boolean(ctx, -3) ? 1 : 0) << 2) |
              ((duk_get_boolean(ctx, -2) ? 1 : 0) << 1) |
              (duk_get_boolean(ctx, -1) ? 1 : 0);

/* another verbose variant */
int bitmask = (duk_get_boolean(ctx, -3) ? (1 << 2) : 0) |
              (duk_get_boolean(ctx, -2) ? (1 << 1) : 0) |
              (duk_get_boolean(ctx, -1) ? 1 : 0);
</pre>

<h2 id="type-number">Number</h2>

<p>The <b>number</b> type is an IEEE double, including +/- Infinity and NaN values.
Zero sign is also preserved. An IEEE double represents all integers up to 53 bits
accurately.</p>
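
<p>Both properties can be checked in plain C, independent of Duktape. The
helpers below are hypothetical names introduced for this sketch: one tests
whether adding 1 to a double still produces a new value (which stops being
true at 2<sup>53</sup>), the other detects negative zero using the standard
<code>signbit()</code> macro.</p>

```c
#include <math.h>

/* 2^53: above this, not every integer has an exact double representation. */
#define SKETCH_2_POW_53 9007199254740992.0

/* Returns 1 if adding 1.0 to 'd' produces a different double.
 * The volatile store forces rounding to double precision.
 */
static int sketch_can_increment(double d) {
    volatile double next = d + 1.0;
    return next != d;
}

/* Returns 1 for negative zero, 0 for positive zero and everything else. */
static int sketch_is_negative_zero(double d) {
    return d == 0.0 && signbit(d) != 0;
}
```

<p>In practice this means integer counters stored as numbers are exact up to
2<sup>53</sup>, and that <code>-0.0</code> and <code>0.0</code> compare equal
yet remain distinguishable.</p>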

<p>IEEE double allows NaN values to have additional signaling bits. Because these
bits are used by Duktape internal tagged type representation (when using 8-byte
packed values), NaN values in the Duktape API are normalized. Concretely, if you
push a certain NaN value to the value stack, another (normalized) NaN value may
come out. Don't rely on NaNs preserving their exact form.</p>
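
<p>To make "normalized" concrete, the sketch below builds two NaNs with
different bit patterns and maps both to one quiet NaN. Everything here is
illustrative: the helper names are hypothetical, and the canonical pattern
<code>0x7ff8000000000000</code> is an assumption of the sketch, not a
documented Duktape value.</p>

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer. */
static uint64_t sketch_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Reinterpret a 64-bit integer as a double. */
static double sketch_from_bits(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Map every NaN to one quiet NaN pattern; other values pass through.
 * The pattern chosen here is an assumption, not Duktape's documented one.
 */
static double sketch_normalize_nan(double d) {
    return isnan(d) ? sketch_from_bits(UINT64_C(0x7ff8000000000000)) : d;
}
```

<p>Code that smuggles data in NaN payload bits would lose it after such a
step, which is why NaN payloads should not be relied on across the API.</p>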

<h2 id="type-string">String</h2>

<p>The <b>string</b> stack type is used to represent both plain strings and
plain Symbols (introduced in ES2015). A string is an arbitrary byte sequence
of a certain length which may contain internal NUL (0x00) values. Strings are
always automatically NUL terminated for C coding convenience. The NUL terminator
is not counted as part of the string length. For instance, the string
<code>"foo"</code> has byte length 3 and is stored in memory as
<code>{ 'f', 'o', 'o', '\0' }</code>. Because of the guaranteed NUL termination,
strings can always be pointed to using a simple <code>const char *</code> as long
as internal NULs are not an issue for the application; if they are, the explicit
byte length of the string can be queried with the API. Calling code can refer
directly to the string data held by Duktape. Such string data pointers are valid
(and stable) for as long as a string is reachable in the Duktape heap.</p>
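
<p>The difference between relying on the NUL terminator and using an explicit
byte length shows up as soon as the data contains an internal NUL. The
standalone sketch below uses a hypothetical helper,
<code>sketch_count_byte()</code>, that walks a byte string by length. With
Duktape the pointer and length would come from the API (for example
<code>duk_get_lstring()</code>); here a static array stands in for them.</p>

```c
#include <stddef.h>
#include <string.h>

/* Count occurrences of 'ch' in a byte string of explicit length 'blen'.
 * Unlike a strlen()-bounded loop, this does not stop at an internal NUL.
 */
static size_t sketch_count_byte(const char *data, size_t blen, char ch) {
    size_t i, n = 0;
    for (i = 0; i < blen; i++) {
        if (data[i] == ch) {
            n++;
        }
    }
    return n;
}
```

<p>For the 7-byte string <code>"foo\0boo"</code>, <code>strlen()</code>
reports 3 and a length-bounded scan sees all 7 bytes, so the two approaches
disagree on how many <code>'o'</code> bytes the string contains.</p>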

<p>Strings are <a href="http://en.wikipedia.org/wiki/String_interning">interned</a>
for efficiency: only a single copy of a certain string ever exists at a time.
Strings are immutable and must NEVER be changed by calling C code. Doing so will
lead to very mysterious issues which are hard to diagnose.</p>

<p>Calling code most often deals with Ecmascript strings, which may contain
arbitrary 16-bit codepoints in the range U+0000 to U+FFFF but cannot represent
non-<a href="http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane">BMP</a>
codepoints. This is how strings are defined in the Ecmascript standard.
In Duktape, Ecmascript strings are encoded with
<a href="http://en.wikipedia.org/wiki/CESU-8">CESU-8</a> encoding. CESU-8
matches <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> except that it
allows codepoints in the surrogate pair range (U+D800 to U+DFFF) to be encoded
directly (prohibited in UTF-8). CESU-8, like UTF-8, encodes all 7-bit ASCII
characters as-is which is convenient for C code. For example:</p>

<ul>
<li>U+0041 ("A") encodes to <code>41</code>.</li>
<li>U+1234 (ETHIOPIC SYLLABLE SEE) encodes to <code>e1 88 b4</code>.</li>
<li>U+D812 (high surrogate) encodes to <code>ed a0 92</code>. This would be
<a href="http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points">invalid UTF-8</a>.</li>
</ul>
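
<p>The three encodings listed above can be reproduced with a small encoder
for a single 16-bit code unit. This is a minimal sketch with a hypothetical
function name, not Duktape's internal encoder. It deliberately applies no
special handling to surrogates, which is the one place it departs from a
strict UTF-8 encoder.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit code unit into 'out' (at least 3 bytes) and return the
 * number of bytes written. Surrogates (U+D800 to U+DFFF) are encoded like
 * any other code unit.
 */
static size_t sketch_cesu8_encode(uint16_t cu, unsigned char *out) {
    if (cu < 0x80) {
        out[0] = (unsigned char) cu;
        return 1;
    }
    if (cu < 0x800) {
        out[0] = (unsigned char) (0xc0 | (cu >> 6));
        out[1] = (unsigned char) (0x80 | (cu & 0x3f));
        return 2;
    }
    out[0] = (unsigned char) (0xe0 | (cu >> 12));
    out[1] = (unsigned char) (0x80 | ((cu >> 6) & 0x3f));
    out[2] = (unsigned char) (0x80 | (cu & 0x3f));
    return 3;
}
```

<p>Calling it with <code>0x0041</code>, <code>0x1234</code>, and
<code>0xd812</code> yields the byte sequences <code>41</code>,
<code>e1 88 b4</code>, and <code>ed a0 92</code> respectively.</p>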

<a name="extended-utf8"></a>
<p>Duktape also uses extended strings internally. Codepoints up to U+10FFFF
can be represented with UTF-8, and codepoints above that up to full 32 bits
can be represented with
<a href="http://en.wikipedia.org/wiki/UTF-8#Extending_from_31_bit_to_36_bit_range">extended UTF-8</a>.
The extended UTF-8 encoding used by Duktape is described in the table below.
The leading byte is shown in binary (with "x" marking data bits) while
continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</p>

<table>
<thead>
<tr class="header"><th>Codepoint range</th><th>Bits</th><th>Byte sequence</th><th>Notes</th></tr>
</thead>
<tbody>
<tr><td>U+0000 to U+007F</td><td>7</td><td>0xxxxxxx</td><td></td></tr>
<tr><td>U+0080 to U+07FF</td><td>11</td><td>110xxxxx C</td><td></td></tr>
<tr><td>U+0800 to U+FFFF</td><td>16</td><td>1110xxxx C C</td><td>U+D800 to U+DFFF allowed (unlike UTF-8)</td></tr>
<tr><td>U+1 0000 to U+1F FFFF</td><td>21</td><td>11110xxx C C C</td><td>Above U+10FFFF allowed (unlike UTF-8)</td></tr>
<tr><td>U+20 0000 to U+3FF FFFF</td><td>26</td><td>111110xx C C C C</td><td></td></tr>
<tr><td>U+400 0000 to U+7FFF FFFF</td><td>31</td><td>1111110x C C C C C</td><td></td></tr>
<tr><td>U+8000 0000 to U+F FFFF FFFF</td><td>36</td><td>11111110 C C C C C C</td><td>Only 32 bits used in practice (up to U+FFFF FFFF)</td></tr>
</tbody>
</table>
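
<p>The table above translates fairly directly into code. The sketch below is
a hypothetical encoder written from the table alone, not a copy of Duktape's
implementation: one function picks the sequence length from the codepoint
range, the other fills the continuation bytes from the end and then sets the
leading byte.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes needed for 'cp' according to the table above. */
static size_t sketch_xutf8_length(uint32_t cp) {
    if (cp < 0x80UL) return 1;
    if (cp < 0x800UL) return 2;
    if (cp < 0x10000UL) return 3;
    if (cp < 0x200000UL) return 4;
    if (cp < 0x4000000UL) return 5;
    if (cp < 0x80000000UL) return 6;
    return 7;
}

/* Encode 'cp' into 'out' (at least 7 bytes); returns the byte count. */
static size_t sketch_xutf8_encode(uint32_t cp, unsigned char *out) {
    /* Leading byte markers indexed by sequence length (index 0 unused). */
    static const unsigned char lead[8] = {
        0x00, 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe
    };
    size_t len = sketch_xutf8_length(cp);
    size_t i;

    /* Fill continuation bytes from the end, 6 data bits each. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char) (0x80 | (cp & 0x3f));
        cp >>= 6;
    }
    /* Whatever bits remain go into the leading byte. */
    out[0] = (unsigned char) (lead[len] | cp);
    return len;
}
```

<p>For example, U+10FFFF encodes to <code>f4 8f bf bf</code> as in standard
UTF-8, while U+110000 (outside UTF-8) encodes to <code>f4 90 80 80</code>,
and any codepoint of U+80000000 or above starts with <code>fe</code>.</p>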

<p>The downside of the encoding for codepoints above U+7FFFFFFF is that
the leading byte will be <code>0xFE</code> which conflicts with Unicode byte order
mark encoding. This is not a practical concern in Duktape's internal use.</p>

<p>Finally, invalid extended UTF-8 byte sequences are used for special purposes
such as representing Symbol values. Invalid extended UTF-8/CESU-8 byte sequences
never conflict with standard Ecmascript strings (which are CESU-8) and will remain
cleanly separated within object property tables. For more information see
<a href="#symbols">Symbols</a> and
<a href="https://github.com/svaarala/duktape/blob/master/doc/symbols.rst">symbols.rst</a>.</p>

<p>Strings with invalid extended UTF-8 sequences can be pushed on the value stack
from C code and also passed to Ecmascript functions, with two caveats:</p>
<ul>
<li>If the invalid byte sequence matches the internal format used to represent
Symbols, the value will appear as a Symbol rather than a string for Ecmascript
code. For example, <code>typeof val</code> will be <code>symbol</code>.</li>
<li>Behavior of string operations on invalid byte sequences is not well defined;
results may vary and may change even in minor Duktape version updates.</li>
</ul>

<h2 id="type-object">Object</h2>

<p>The <b>object</b> type includes Ecmascript objects and arrays, functions,
threads (coroutines), and buffer objects. In other words, anything with
properties is an object. Properties are key-value pairs with a string key
and an arbitrary value (including <b>undefined</b>).</p>

<p>Objects may participate in garbage collection finalization.</p>

<h2 id="type-buffer">Buffer</h2>

<p>The plain <b>buffer</b> type is a raw buffer for user data. It's much
more memory efficient than standard buffer object types like Uint8Array
or Node.js Buffer. There are three plain buffer sub-types:</p>
<table>
<tr>
<th>Buffer sub-type</th>
<th>Data pointer</th>
<th>Resizable</th>
<th>Memory managed by</th>
<th>Description</th>
</tr>
<tr>
<td>Fixed</td>
<td>Stable, non-NULL</td>
<td>No</td>
<td>Duktape</td>
<td>Buffer size is fixed at creation, memory is managed automatically by
Duktape. Fixed buffers have an unchanging (stable) non-NULL data
pointer.</td>
</tr>
<tr>
<td>Dynamic</td>
<td>Unstable, may be NULL</td>
<td>Yes</td>
<td>Duktape</td>
<td>Buffer size can be changed after creation, memory is managed
automatically by Duktape. Requires two memory allocations internally
to allow resizing. Data pointer may change if buffer is resized.
Zero-size buffer may have a NULL data pointer.</td>
</tr>
<tr>
<td>External</td>
<td>Unstable, may be NULL</td>
<td>Yes</td>
<td>Duktape and user code</td>
<td>Buffer data is externally allocated. Duktape allocates and manages
a heap allocated buffer header structure, while the data area pointer
and length are configured by user code explicitly. External buffers
are useful to allow a buffer to point to data structures outside
Duktape heap, e.g. a frame buffer allocated by a graphics library.
Zero-size buffer may have a NULL data pointer.</td>
</tr>
</table>

<p>Unlike strings, buffer data areas are not automatically NUL-terminated
and calling code must never access bytes beyond the currently allocated
buffer size. The data pointer of a zero-size dynamic or external buffer
may be NULL; fixed buffers always have a non-NULL data pointer.</p>

<p>Fixed and dynamic buffers are automatically garbage collected. This
also means that C code must not hold onto a buffer data pointer unless the
buffer is reachable to Duktape, e.g. resides in an active value stack.
The data area of an external buffer is not automatically garbage collected
so user code is responsible for managing its life cycle. User code must also
make sure that no external buffer value is accessed when the external buffer
is no longer (or not yet) valid.</p>

<p>Plain buffers mimic Uint8Array objects to a large extent, so they are
a non-standard, memory efficient alternative to working with e.g. ArrayBuffer
and typed arrays. The standard buffer objects are implemented on top of plain
buffers, so that e.g. an ArrayBuffer will be backed by a plain buffer. See
<a href="#bufferobjects">Buffer objects</a> for more discussion.</p>

<p>A few notes:</p>
<ul>
<li>Because the value written to a buffer index is number coerced, assigning
a one-character value does not work as often expected. For instance,
<code>buf[123] = 'x'</code> causes zero to be written to the buffer, as
ToNumber('x') is NaN, which is stored as zero. For clarity, you should only
assign number values, e.g. <code>buf[123] = 0x78</code>.</li>
<li>There is a fast path for reading and writing numeric indices of plain buffer
values, e.g. <code>x = buf[123]</code> or <code>buf[123] = x</code>. A similar
fast path exists for the different buffer object values.</li>
<li>Buffer virtual properties are not currently implemented in
<code>defineProperty()</code>, so you can't write to buffer indices or
buffer <code>length</code> with <code>defineProperty()</code> now (an attempt
to do so results in a <code>TypeError</code>).</li>
</ul>

<h2 id="type-pointer">Pointer</h2>

<p>The <b>pointer</b> type is a raw, uninterpreted C pointer, essentially
a <code>void *</code>. Pointers can be used to point to native objects (memory
allocations, handles, etc), but because Duktape doesn't know their use, they
are not automatically garbage collected. You can, however, put one or more
pointers inside an object and use the object finalizer to free the
native resources related to the pointer(s).</p>

<h2 id="type-lightfunc">Lightfunc</h2>

<p>The <b>lightfunc</b> type is a plain Duktape/C function pointer and a small
set of control flags packed into a single tagged value which requires no separate
heap allocations. The control flags (16 bits currently) encode:
(1) number of stack arguments expected by the Duktape/C function (0 to 14 or
varargs), (2) virtual <code>length</code> property value (0 to 15), and
(3) a magic value (-128 to 127).</p>
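
<p>The three ranges add up to exactly 16 bits: 0 to 14 plus a varargs marker
needs 4 bits, 0 to 15 needs 4 bits, and -128 to 127 needs 8 bits. The sketch
below packs and unpacks such a flags field. The bit positions, the choice of
15 as the varargs marker, and all <code>sketch_lf_</code> names are
assumptions made for illustration; the text above does not specify the actual
layout.</p>

```c
/* Hypothetical 16-bit layout: magic in bits 8-15, length in bits 4-7,
 * nargs in bits 0-3 with the value 15 reserved to mean varargs.
 */
#define SKETCH_LF_NARGS_VARARGS 15

/* Pack the three fields into the low 16 bits of the result. */
static unsigned int sketch_lf_pack(int magic, unsigned int length, unsigned int nargs) {
    return (((unsigned int) magic & 0xffU) << 8) |
           ((length & 0x0fU) << 4) |
           (nargs & 0x0fU);
}

/* Extract the magic value, sign-extending the 8-bit field. */
static int sketch_lf_magic(unsigned int flags) {
    int m = (int) ((flags >> 8) & 0xffU);
    return (m >= 128) ? m - 256 : m;
}

/* Extract the virtual length (0 to 15). */
static unsigned int sketch_lf_length(unsigned int flags) {
    return (flags >> 4) & 0x0fU;
}

/* Extract nargs (0 to 14, or SKETCH_LF_NARGS_VARARGS). */
static unsigned int sketch_lf_nargs(unsigned int flags) {
    return flags & 0x0fU;
}
```

<p>Packing everything into one small integer is what lets the function
pointer and its metadata travel together in a tagged value with no heap
allocation.</p>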

<p>Lightfuncs are a separate tagged type in the Duktape C API, but behave mostly
like Function objects for Ecmascript code. They have significant limitations
compared to ordinary Function objects, the most important being:</p>

<ul>
<li>Lightfuncs cannot hold own properties. They have a fixed virtual <code>.name</code>
property which appears in tracebacks, and a virtual <code>.length</code> property.
Other properties are inherited from <code>Function.prototype</code>.</li>
<li>Lightfuncs can be used as constructor functions, but cannot have a
<code>.prototype</code> property. If you need to construct objects which
don't inherit from <code>Object.prototype</code> (the default), you need to
either (a) construct and return an instance explicitly in the constructor or
(b) explicitly override the internal prototype of the default instance in the
constructor.</li>
<li>Lightfuncs can be used as accessor properties (getters/setters), but they
get converted to actual functions so that their memory advantage is lost.
See
<a href="https://github.com/svaarala/duktape/blob/master/tests/ecmascript/test-dev-lightfunc-accessor.js">test-dev-lightfunc-accessor.js</a>.</li>
<li>Lightfuncs cannot have a finalizer as they are a primitive type and
don't have a reference count field or otherwise participate in garbage
collection. See
<a href="https://github.com/svaarala/duktape/blob/master/tests/ecmascript/test-dev-lightfunc-finalizer.js">test-dev-lightfunc-finalizer.js</a>.</li>
</ul>

<p>Lightfuncs are useful for very low memory environments where the memory
impact of ordinary Function objects matters. See
<a href="#functionobjects">Function objects</a> for more discussion.</p>