mirror of https://github.com/svaarala/duktape.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
315 lines
16 KiB
315 lines
16 KiB
<h1 id="types">Stack types</h1>
|
|
|
|
<p>Duktape stack types are:</p>
|
|
|
|
<div class="table-wrap">
|
|
<table>
|
|
<tr class="header"><th>Type</th><th>Type constant</th><th>Type mask constant</th><th>Description</th></tr>
|
|
<tr><td>(none)</td><td>DUK_TYPE_NONE</td><td>DUK_TYPE_MASK_NONE</td><td>no type (missing value, invalid index, etc)</td></tr>
|
|
<tr><td>undefined</td><td>DUK_TYPE_UNDEFINED</td><td>DUK_TYPE_MASK_UNDEFINED</td><td><code>undefined</code></td></tr>
|
|
<tr><td>null</td><td>DUK_TYPE_NULL</td><td>DUK_TYPE_MASK_NULL</td><td><code>null</code></td></tr>
|
|
<tr><td>boolean</td><td>DUK_TYPE_BOOLEAN</td><td>DUK_TYPE_MASK_BOOLEAN</td><td><code>true</code> and <code>false</code></td></tr>
|
|
<tr><td>number</td><td>DUK_TYPE_NUMBER</td><td>DUK_TYPE_MASK_NUMBER</td><td>IEEE double</td></tr>
|
|
<tr><td>string</td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable string</td></tr>
|
|
<tr><td>object</td><td>DUK_TYPE_OBJECT</td><td>DUK_TYPE_MASK_OBJECT</td><td>object with properties</td></tr>
|
|
<tr><td>buffer</td><td>DUK_TYPE_BUFFER</td><td>DUK_TYPE_MASK_BUFFER</td><td>mutable byte buffer, fixed/dynamic</td></tr>
|
|
<tr><td>pointer</td><td>DUK_TYPE_POINTER</td><td>DUK_TYPE_MASK_POINTER</td><td>opaque pointer (void *)</td></tr>
|
|
</table>
|
|
</div>
|
|
|
|
<h2 id="type-memory-allocations">Memory allocations</h2>
|
|
|
|
<p>The following stack types involve additional heap allocations:</p>
|
|
|
|
<ul>
|
|
<li>String: a single allocation contains a combined heap and string header,
|
|
followed by the immutable string data.</li>
|
|
<li>Object: one allocation is used for a combined heap and object header,
|
|
and another allocation is used for object properties. The property
|
|
allocation contains both array entries and normal properties, and if
|
|
the object is large enough, a hash table to speed up lookups.</li>
|
|
<li>Buffer: for fixed buffers a single allocation contains a combined heap
|
|
and buffer header, followed by the mutable fixed-size buffer. For
|
|
dynamic buffers the current buffer is allocated separately.</li>
|
|
</ul>
|
|
|
|
<p>Note that while strings are considered a primitive (pass-by-value)
|
|
type in Ecmascript, they are a heap allocated type from a memory allocation
|
|
viewpoint.</p>
|
|
|
|
<h2 id="type-pointer-stability">Pointer stability</h2>
|
|
|
|
<p>Heap objects allocated by Duktape have stable pointers: the objects are
|
|
not relocated in memory while they are reachable from a garbage collection
|
|
point of view. This is the case for the main heap object, but not
|
|
necessarily for any additional allocations related to the object, such as
|
|
dynamic property tables or dynamic buffer data area. A heap object is
|
|
reachable e.g. when it resides on the value stack of a reachable thread or
|
|
is reachable through the global object. Once a heap object becomes
|
|
unreachable any pointers held by user C code referring to the object are
|
|
unsafe and should no longer be dereferenced.</p>
|
|
|
|
<p>In practice the only heap allocated data directly referenced by user code
|
|
are strings, fixed buffers, and dynamic buffers. The data area of strings
|
|
and fixed buffers is stable; it is safe to keep a C pointer referring to the
|
|
data even after a Duktape/C function returns as long the string or fixed
|
|
buffer remains reachable from a garbage collection point of view at all times.
|
|
Note that this is not usually <i>not</i> the case for Duktape/C value stack
|
|
arguments, for instance, unless specific arrangements are made.</p>
|
|
|
|
<p>The data area of a dynamic buffer does <b>not</b> have a stable pointer.
|
|
The buffer itself has a heap header with a stable address but the current
|
|
buffer is allocated separately and potentially relocated when the buffer
|
|
is resized. It is thus unsafe to hold a pointer to a dynamic buffer's data
|
|
area across a buffer resize, and it's probably best not to hold a pointer
|
|
after a Duktape/C function returns (how would you reliably know when the
|
|
buffer is resized?).</p>
|
|
|
|
<h2 id="type-masks">Type masks</h2>
|
|
|
|
<p>Type masks allows calling code to easily check whether a type belongs to
|
|
a certain type set. For instance, to check that a certain stack value is
|
|
a number, string, or an object:</p>
|
|
|
|
<pre class="c-code">
|
|
if (duk_get_type_mask(ctx, -3) & (DUK_TYPE_MASK_NUMBER |
|
|
DUK_TYPE_MASK_STRING |
|
|
DUK_TYPE_MASK_OBJECT)) {
|
|
printf("type is number, string, or object\n");
|
|
}
|
|
</pre>
|
|
|
|
<p>There is a specific API call for matching a set of types even more
|
|
conveniently:</p>
|
|
|
|
<pre class="c-code">
|
|
if (duk_check_type_mask(ctx, -3, DUK_TYPE_MASK_NUMBER |
|
|
DUK_TYPE_MASK_STRING |
|
|
DUK_TYPE_MASK_OBJECT)) {
|
|
printf("type is number, string, or object\n");
|
|
}
|
|
</pre>
|
|
|
|
<p>These are faster and more compact than the alternatives:</p>
|
|
|
|
<pre class="c-code">
|
|
// alt 1
|
|
if (duk_is_number(ctx, -3) || duk_is_string(ctx, -3) || duk_is_object(ctx, -3)) {
|
|
printf("type is number, string, or object\n");
|
|
}
|
|
|
|
// alt 2
|
|
int t = duk_get_type(ctx, -3);
|
|
if (t == DUK_TYPE_NUMBER || t == DUK_TYPE_STRING || t == DUK_TYPE_OBJECT) {
|
|
printf("type is number, string, or object\n");
|
|
}
|
|
</pre>
|
|
|
|
<h2 id="type-none">None</h2>
|
|
|
|
<p>The <b>none</b> type is not actually a type but is used in the API to
|
|
indicate that a value does not exist, a stack index is invalid, etc.</p>
|
|
|
|
<h2 id="type-undefined">Undefined</h2>
|
|
|
|
<p>The <b>undefined</b> type maps to Ecmascript <code>undefined</code>, which is
|
|
distinguished from a <code>null</code>.</p>
|
|
|
|
<p>Values read from outside the active value stack range read back as
|
|
<b>undefined</b>.</p>
|
|
|
|
<h2 id="type-null">Null</h2>
|
|
|
|
<p>The <b>null</b> type maps to Ecmascript <code>null</code>.</p>
|
|
|
|
<h2 id="type-boolean">Boolean</h2>
|
|
|
|
<p>The <b>boolean</b> type is represented in the C API as an integer: zero for false,
|
|
and non-zero for true.</p>
|
|
|
|
<p>Whenever giving boolean values as arguments in API calls, any non-zero value is
|
|
accepted as a "true" value. Whenever API calls return boolean values, the value
|
|
<code>1</code> is always used for a "true" value. This allows certain C idioms to be
|
|
used. For instance, a bitmask can be built directly based on API call return values,
|
|
as follows:
|
|
</p>
|
|
|
|
<pre class="c-code">
|
|
// this works and generates nice code
|
|
int bitmask = (duk_get_boolean(ctx, -3) << 2) |
|
|
(duk_get_boolean(ctx, -2) << 1) |
|
|
duk_get_boolean(ctx, -1);
|
|
|
|
// more verbose variant not relying on "true" being represented by 1
|
|
int bitmask = ((duk_get_boolean(ctx, -3) ? 1 : 0) << 2) |
|
|
((duk_get_boolean(ctx, -2) ? 1 : 0) << 1) |
|
|
(duk_get_boolean(ctx, -1) ? 1 : 0);
|
|
|
|
// another verbose variant
|
|
int bitmask = (duk_get_boolean(ctx, -3) ? (1 << 2) : 0) |
|
|
(duk_get_boolean(ctx, -2) ? (1 << 1) : 0) |
|
|
(duk_get_boolean(ctx, -1) ? 1 : 0);
|
|
</pre>
|
|
|
|
<h2 id="type-number">Number</h2>
|
|
|
|
<p>The <b>number</b> type is an IEEE double, including +/- Infinity and NaN values.
|
|
Zero sign is also preserved. An IEEE double represents all integers up to 53 bits
|
|
accurately.</p>
|
|
|
|
<p>IEEE double allows NaN values to have additional signaling bits. Because these
|
|
bits are used by Duktape internal tagged type representation (when using 8-byte
|
|
packed values), NaN values in the Duktape API are normalized. Concretely, if you
|
|
push a certain NaN value to the value stack, another (normalized) NaN value may
|
|
come out. Don't rely on NaNs preserving their exact form.</p>
|
|
|
|
<h2 id="type-string">String</h2>
|
|
|
|
<p>The <b>string</b> type is a raw byte sequence of a certain length which may
|
|
contain internal NUL (0x00) values. Strings are always automatically NUL
|
|
terminated for C coding convenience. The NUL terminator is not counted as part
|
|
of the string length. For instance, the string <code>"foo"</code> has byte length 3
|
|
and is stored in memory as <code>{ 'f', 'o', 'o', '\0' }</code>. Because of the
|
|
guaranteed NUL termination, strings can always be pointed to using a simple
|
|
<code>const char *</code> as long as internal NULs are not an issue; if they are,
|
|
the explicit byte length of the string can be queried with the API. Calling code
|
|
can refer directly to the string data held by Duktape. Such string data
|
|
pointers are valid (and stable) for as long as a string is reachable in the
|
|
Duktape heap.</p>
|
|
|
|
<p>Strings are <a href="http://en.wikipedia.org/wiki/String_interning">interned</a>
|
|
for efficiency: only a single copy of a certain string ever exists at a time.
|
|
Strings are immutable and must NEVER be changed by calling C code. Doing so will
|
|
lead to very mysterious issues which are hard to diagnose.</p>
|
|
|
|
<p>Calling code most often deals with Ecmascript strings, which may contain
|
|
arbitrary 16-bit codepoints (the whole range 0x0000 to 0xFFFF) but cannot represent
|
|
non-<a href="http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane">BMP</a>
|
|
codepoints (this is how strings are defined in the Ecmascript standard).
|
|
In Duktape, Ecmascript strings are encoded with
|
|
<a href="http://en.wikipedia.org/wiki/CESU-8">CESU-8</a> encoding. CESU-8
|
|
matches <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> except that it
|
|
allows codepoints in the surrogate pair range (U+D800 to U+DFFF) to be encoded
|
|
directly; these are prohibited in UTF-8. CESU-8, like UTF-8, encodes all 7-bit
|
|
ASCII characters as-is which is convenient for C code. For example:</p>
|
|
|
|
<ul>
|
|
<li>U+0041 ("A") encodes to <code>41</code>.</li>
|
|
<li>U+1234 (ETHIOPIC SYLLABLE SEE) encodes to <code>e1 88 b4</code>.</li>
|
|
<li>U+D812 (high surrogate) encodes to <code>ed a0 92</code> (this would be
|
|
<a href="http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points">invalid UTF-8</a>).</li>
|
|
</ul>
|
|
|
|
<p>Duktape also allows extended strings internally. Codepoints up to U+10FFFF
|
|
can be represented with UTF-8, and codepoints above that up to full 32 bits
|
|
can be represented with
|
|
<a href="http://en.wikipedia.org/wiki/UTF-8#Extending_from_31_bit_to_36_bit_range">"extended UTF-8"</a>.
|
|
Non-standard strings are used for storing internal object properties; using a
|
|
non-standard string ensures that such properties never conflict with properties
|
|
accessible using standard Ecmascript strings. Non-standard strings can be given
|
|
to Ecmascript built-in functions, but since behavior may not be exactly
|
|
specified, results may vary.</p>
|
|
|
|
<p>Duktape uses internal object properties to record internal implementation
|
|
related fields in e.g. function objects. For example, a finalizer reference is
|
|
stored in an internal finalizer property. Such internal keys are kept separate
|
|
from valid Ecmascript keys by using a byte sequence which can never occur in
|
|
a valid CESU-8 string; consequently, standard Ecmascript code cannot accidentally
|
|
reference such fields. Internal properties are never enumerable, and are not
|
|
returned by e.g. <code>Object.getOwnPropertyNames()</code>. Currently, internal
|
|
property names begin with an <code>0xFF</code> byte followed by the property name.
|
|
For instance the finalizer property key consists of the byte <code>0xFF</code>
|
|
followed by the ASCII string <code>"finalizer"</code>. In internal documentation
|
|
this property would usually be referred to as <code>_finalizer</code> for convenience.
|
|
You should never read or write internal properties directly.</p>
|
|
|
|
<p>The "extended UTF-8" encoding used by Duktape is described in the table below.
|
|
The leading byte is shown in binary (with "x" marking data bits) while
|
|
continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</p>
|
|
|
|
<table>
|
|
<thead>
|
|
<tr class="header"><th>Codepoint range</th><th>Bits</th><th>Byte sequence</th></tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr><td>U+0000 to U+007F</td><td>7</td><td>0xxxxxxx</td></tr>
|
|
<tr><td>U+0080 to U+07FF</td><td>11</td><td>110xxxxx C</td></tr>
|
|
<tr><td>U+0800 to U+FFFF</td><td>16</td><td>1110xxxx C C</td></tr>
|
|
<tr><td>U+1 0000 to U+1F FFFF</td><td>21</td><td>11110xxx C C C</td></tr>
|
|
<tr><td>U+20 0000 to U+3FF FFFF</td><td>26</td><td>111110xx C C C C</td></tr>
|
|
<tr><td>U+400 0000 to U+7FFF FFFF</td><td>31</td><td>1111110x C C C C C</td></tr>
|
|
<tr><td>U+8000 0000 to U+F FFFF FFFF</td><td>36 (32 used)</td><td>11111110 C C C C C C</td></tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<p>The downside of the encoding for codepoints above U+7FFFFFFF is that
|
|
the leading byte will be <code>0xFE</code> which conflicts with Unicode byte order
|
|
marker encoding. This is not a practical concern in Duktape's internal use.</p>
|
|
|
|
<h2 id="type-object">Object</h2>
|
|
|
|
<p>The <b>object</b> type includes Ecmascript objects and arrays, functions, and
|
|
threads (coroutines). In other words, anything with properties is an object.
|
|
Properties are key-value pairs with a string key and an arbitrary value
|
|
(including <b>undefined</b>).</p>
|
|
|
|
<p>Objects may participate in garbage collection finalization.</p>
|
|
|
|
<h2 id="type-buffer">Buffer</h2>
|
|
|
|
<p>The <b>buffer</b> type is a raw buffer for user data of either fixed or dynamic
|
|
size. The size of a fixed buffer is given at its creation, and fixed buffers
|
|
have an unchanging (stable) data pointer. Dynamic buffers may change during their
|
|
life time at the cost of having a (potentially) changing data pointer. Dynamic
|
|
buffers also need two memory allocations internally, while fixed buffers only
|
|
need one. The data pointer of a zero-size dynamic buffer may (or may not) be
|
|
<code>NULL</code> which must be handled by calling code properly (i.e. a <code>NULL</code>
|
|
data pointer only indicates an error if the requested size is non-zero).
|
|
Unlike strings, buffer data areas are not automatically NUL terminated and calling
|
|
code must not access the bytes following the allocated buffer size.</p>
|
|
|
|
<p>Buffers are automatically garbage collected. This also means that C code
|
|
must not hold onto a buffer data pointer unless the buffer is reachable to
|
|
Duktape, e.g. resides in an active value stack.</p>
|
|
|
|
<p>The buffer type is not standard Ecmascript. There are a few
|
|
different Ecmascript typed array specifications, though, see e.g.
|
|
<a href="http://www.khronos.org/registry/typedarray/specs/latest/">Typed Array Specification</a>.
|
|
These will be implemented on top of raw arrays, most likely.
|
|
</p>
|
|
|
|
<p>Like strings, buffer values have a <code>length</code> property and
|
|
array index properties for reading and writing individual bytes in the buffer.
|
|
The value of a indexed byte (<code>buf[123]</code>) is a number in the
|
|
range 0...255 which represents a byte value (written values are coerced to
|
|
integer modulo 256). This differs from string behavior where the indexed
|
|
values are one-character strings (much more expensive). The <code>length</code>
|
|
property is read-only at the moment (so you can't resize a string by assigning
|
|
to the length property). These properties are available for both plain buffer
|
|
values and buffer object values.</p>
|
|
|
|
<p>A few notes:</p>
|
|
<ul>
|
|
<li>Because the value written to a buffer index is number coerced, assigning
|
|
a one-character value does not work as often expected. For instance,
|
|
<code>buf[123] = 'x'</code> causes zero to be written to the buffer, as
|
|
ToNumber('x') = 0. For clarity, you should only assign number values,
|
|
e.g. <code>buf[123] = 0x78</code>.</li>
|
|
<li>There is a fast path for reading and writing numeric indices of plain buffer
|
|
values, e.g. <code>x = buf[123]</code> or <code>buf[123] = x</code>. This
|
|
fast path is not active when the base value is a Buffer object.</li>
|
|
<li>Buffer virtual properties are not currently implemented in
|
|
<code>defineProperty()</code>, so you can't write to buffer indices or
|
|
buffer <code>length</code> with <code>defineProperty()</code> now (attempt
|
|
to do so results in a <code>TypeError</code>).</li>
|
|
</ul>
|
|
|
|
<h2 id="type-pointer">Pointer</h2>
|
|
|
|
<p>The <b>pointer</b> type is a raw, uninterpreted C pointer, essentially
|
|
a <code>void *</code>. Pointers can be used to point to native objects (memory
|
|
allocations, handles, etc), but because Duktape doesn't know their use, they
|
|
are not automatically garbage collected. You can, however, put one or more
|
|
pointers inside an object and use the object finalizer to free the
|
|
native resources related to the pointer(s).</p>
|
|
|
|
|