mirror of https://github.com/svaarala/duktape.git
<hr> <!-- this improves readability on e.g. elinks and w3m -->
<h2 id="types">Types</h2>

<p>Duktape stack types are:</p>

<div class="table-wrap">
<table>
<tr class="header"><th>Type</th><th>Type constant</th><th>Type mask constant</th><th>Description</th></tr>
<tr><td>(none)</td><td>DUK_TYPE_NONE</td><td>DUK_TYPE_MASK_NONE</td><td>no type (missing value, invalid index, etc)</td></tr>
<tr><td>undefined</td><td>DUK_TYPE_UNDEFINED</td><td>DUK_TYPE_MASK_UNDEFINED</td><td><tt>undefined</tt></td></tr>
<tr><td>null</td><td>DUK_TYPE_NULL</td><td>DUK_TYPE_MASK_NULL</td><td><tt>null</tt></td></tr>
<tr><td>boolean</td><td>DUK_TYPE_BOOLEAN</td><td>DUK_TYPE_MASK_BOOLEAN</td><td><tt>true</tt> and <tt>false</tt></td></tr>
<tr><td>number</td><td>DUK_TYPE_NUMBER</td><td>DUK_TYPE_MASK_NUMBER</td><td>IEEE double</td></tr>
<tr><td>string</td><td>DUK_TYPE_STRING</td><td>DUK_TYPE_MASK_STRING</td><td>immutable string</td></tr>
<tr><td>object</td><td>DUK_TYPE_OBJECT</td><td>DUK_TYPE_MASK_OBJECT</td><td>object with properties</td></tr>
<tr><td>buffer</td><td>DUK_TYPE_BUFFER</td><td>DUK_TYPE_MASK_BUFFER</td><td>mutable byte buffer, fixed/dynamic</td></tr>
<tr><td>pointer</td><td>DUK_TYPE_POINTER</td><td>DUK_TYPE_MASK_POINTER</td><td>opaque pointer (void *)</td></tr>
</table>
</div>

<h3>Memory allocations</h3>

<p>The following stack types involve additional heap allocations:</p>

<ul>
<li>String: a single allocation contains a combined heap and string header,
    followed by the immutable string data.</li>
<li>Object: one allocation is used for a combined heap and object header,
    and another allocation is used for object properties. The property
    allocation contains both array entries and normal properties, and if
    the object is large enough, a hash table to speed up lookups.</li>
<li>Buffer: for fixed buffers a single allocation contains a combined heap
    and buffer header, followed by the mutable fixed-size buffer. For
    dynamic buffers the current buffer is allocated separately.</li>
</ul>

<p>Note that while strings are considered a primitive (pass-by-value)
type in Ecmascript, they are a heap allocated type from a memory allocation
viewpoint.</p>
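
<p>As a rough illustration of the single-allocation string layout described
above, the sketch below places a header and the string data in one
<tt>malloc()</tt> block. The <tt>my_hstring</tt> struct and the
<tt>my_hstring_alloc()</tt> helper are hypothetical names invented for this
example; they are not Duktape's actual internal structures, whose fields are
not described here:</p>

<pre class="c-code">
#include <stdlib.h>
#include <string.h>

/* Hypothetical combined header; real Duktape headers differ. */
typedef struct {
    size_t refcount;  /* stand-in for generic heap header fields */
    size_t blen;      /* byte length, excluding the NUL terminator */
    char data[];      /* string bytes follow the header in the same block */
} my_hstring;

/* Allocate header, string data, and NUL terminator with one malloc() call. */
static my_hstring *my_hstring_alloc(const char *src, size_t blen) {
    my_hstring *h = (my_hstring *) malloc(sizeof(my_hstring) + blen + 1);
    if (h == NULL) {
        return NULL;
    }
    h->refcount = 0;
    h->blen = blen;
    memcpy(h->data, src, blen);
    h->data[blen] = '\0';
    return h;
}
</pre>

<p>Because the data lives in the same block as the header, freeing the header
pointer releases the whole string in one step.</p>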

<h3>Type masks</h3>

<p>Type masks allow calling code to easily check whether a type belongs to
a certain type set. For instance, to check that a certain stack value is
a number, a string, or an object:</p>

<pre class="c-code">
if (duk_get_type_mask(ctx, -3) & (DUK_TYPE_MASK_NUMBER |
                                  DUK_TYPE_MASK_STRING |
                                  DUK_TYPE_MASK_OBJECT)) {
    printf("type is number, string, or object\n");
}
</pre>

FIXME: shortcut? duk_match_type_mask(ctx, -3, DUK_TYPE_MASK_NUMBER | DUK_TYPE_MASK_STRING | DUK_TYPE_MASK_OBJECT);
<p>The type mask check is faster and more compact than the alternatives:</p>

<pre class="c-code">
// alt 1
if (duk_is_number(ctx, -3) || duk_is_string(ctx, -3) || duk_is_object(ctx, -3)) {
    printf("type is number, string, or object\n");
}

// alt 2
int t = duk_get_type(ctx, -3);
if (t == DUK_TYPE_NUMBER || t == DUK_TYPE_STRING || t == DUK_TYPE_OBJECT) {
    printf("type is number, string, or object\n");
}
</pre>
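
<p>The reason a single mask test can replace several comparisons can be
sketched as follows. The sketch assumes that each type mask is a single bit
derived from the type number; this is an assumption made for illustration, as
the numeric values of the constants are not given here. The <tt>MY_</tt>
names are hypothetical stand-ins, not Duktape API symbols:</p>

<pre class="c-code">
/* Hypothetical type numbering, for illustration only. */
enum {
    MY_TYPE_NONE = 0,
    MY_TYPE_UNDEFINED,
    MY_TYPE_NULL,
    MY_TYPE_BOOLEAN,
    MY_TYPE_NUMBER,
    MY_TYPE_STRING,
    MY_TYPE_OBJECT,
    MY_TYPE_BUFFER,
    MY_TYPE_POINTER
};

/* Assumed relationship: one mask bit per type. */
#define MY_TYPE_MASK(t)  (1U << (t))

/* Returns 1 if type 't' belongs to the type set 'mask', 0 otherwise. */
static int my_type_in_set(int t, unsigned int mask) {
    return (MY_TYPE_MASK(t) & mask) != 0;
}
</pre>

<p>With this layout, a set of types is just the bitwise OR of the individual
masks, and membership is a single AND operation regardless of how many types
the set contains.</p>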

<h3>None</h3>

<p>The <b>none</b> type is not actually a type but is used in the API to
indicate that a value does not exist, a stack index is invalid, etc.</p>

<h3>Undefined</h3>

<p>The <b>undefined</b> type maps to Ecmascript <tt>undefined</tt>, which is
distinct from <tt>null</tt>.</p>

<p>Values read from outside the active value stack range read back as
<b>undefined</b>.</p>

<h3>Null</h3>

<p>The <b>null</b> type maps to Ecmascript <tt>null</tt>.</p>
<h3>Boolean</h3>

<p>The <b>boolean</b> type is represented in the C API as an integer: zero for false,
and non-zero for true.</p>

<p>When boolean values are given as arguments to API calls, any non-zero value is
accepted as a "true" value. When API calls return boolean values, the value
<tt>1</tt> is always used for a "true" value. This allows certain C idioms to be
used. For instance, a bitmask can be built directly from API call return values,
as follows:
</p>

<pre class="c-code">
// this works and generates nice code
int bitmask = (duk_get_boolean(ctx, -3) << 2) |
              (duk_get_boolean(ctx, -2) << 1) |
              duk_get_boolean(ctx, -1);

// more verbose variant not relying on "true" being represented by 1
int bitmask = ((duk_get_boolean(ctx, -3) ? 1 : 0) << 2) |
              ((duk_get_boolean(ctx, -2) ? 1 : 0) << 1) |
              (duk_get_boolean(ctx, -1) ? 1 : 0);

// another verbose variant
int bitmask = (duk_get_boolean(ctx, -3) ? (1 << 2) : 0) |
              (duk_get_boolean(ctx, -2) ? (1 << 1) : 0) |
              (duk_get_boolean(ctx, -1) ? 1 : 0);
</pre>

<h3>Number</h3>

<p>The <b>number</b> type is an IEEE double, including +/- Infinity and NaN values.
Zero sign is also preserved. An IEEE double represents all integers of up to 53 bits
(magnitude up to 2<sup>53</sup>) exactly.</p>

<p>IEEE double allows NaN values to have additional signaling bits. Because these
bits are used by Duktape internal tagged type representation (when using 8-byte
packed values), NaN values in the Duktape API are normalized. Concretely, if you
push a certain NaN value to the value stack, another (normalized) NaN value may
come out. Don't rely on NaNs preserving their exact form.</p>
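
<p>Both points can be checked with plain C, independent of Duktape. The
sketch below tests whether an integer survives a round trip through a double,
and shows what normalizing a NaN could look like. The helper names and the
particular canonical NaN bit pattern are assumptions chosen for this example;
the bit pattern Duktape actually uses is not specified here:</p>

<pre class="c-code">
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if 'v' survives a round trip through a double unchanged. */
static int my_roundtrips_exactly(int64_t v) {
    double d = (double) v;
    return (int64_t) d == v;
}

/* Reinterpret a 64-bit pattern as a double (used to build NaN payloads). */
static double my_double_from_bits(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof(d));
    return d;
}

/* Return the bit pattern of 'd', replacing any NaN with one canonical
 * quiet NaN. The canonical pattern is an assumption for illustration.
 */
static uint64_t my_normalized_bits(double d) {
    uint64_t bits;
    if (isnan(d)) {
        return UINT64_C(0x7ff8000000000000);
    }
    memcpy(&bits, &d, sizeof(bits));
    return bits;
}
</pre>

<p>For example, 2<sup>53</sup> round trips exactly while 2<sup>53</sup> + 1
does not, and two NaNs with different payload bits become indistinguishable
after normalization.</p>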

<h3>String</h3>

<p>The <b>string</b> type is a raw byte sequence of a certain length which may
contain internal NUL (0x00) values. Strings are always automatically NUL
terminated for C coding convenience. The NUL terminator is not counted as part
of the string length. For instance, the string <tt>"foo"</tt> has byte length 3
and is stored in memory as <tt>{ 'f', 'o', 'o', '\0' }</tt>. Because of the
guaranteed NUL termination, strings can always be pointed to using a simple
<tt>const char *</tt> as long as internal NULs are not an issue; if they are,
the explicit byte length of the string can be queried with the API. Calling code
can refer directly to the string data held by Duktape. Such string data
pointers are valid (and stable) for as long as a string is reachable in the
Duktape heap.</p>

<p>Strings are <a href="http://en.wikipedia.org/wiki/String_interning">interned</a>
for efficiency: only a single copy of a certain string ever exists at a time.
Strings are immutable and must NEVER be changed by calling C code. Doing so will
lead to very mysterious issues which are hard to diagnose.</p>

<p>Calling code most often deals with Ecmascript strings, which may contain
arbitrary 16-bit codepoints (the whole range 0x0000 to 0xFFFF) but cannot represent
non-<a href="http://en.wikipedia.org/wiki/Basic_Multilingual_Plane#Basic_Multilingual_Plane">BMP</a>
codepoints (this is how strings are defined in the Ecmascript standard).
In Duktape, Ecmascript strings are encoded with
<a href="http://en.wikipedia.org/wiki/CESU-8">CESU-8</a> encoding. CESU-8
matches <a href="http://en.wikipedia.org/wiki/UTF-8">UTF-8</a> except that it
allows codepoints in the surrogate pair range (U+D800 to U+DFFF) to be encoded
directly; these are prohibited in UTF-8. CESU-8, like UTF-8, encodes all 7-bit
ASCII characters as-is which is convenient for C code. For example:</p>

<ul>
<li>U+0041 ("A") encodes to <tt>41</tt>.</li>
<li>U+1234 (ETHIOPIC SYLLABLE SEE) encodes to <tt>e1 88 b4</tt>.</li>
<li>U+D812 (high surrogate) encodes to <tt>ed a0 92</tt> (this would be
    <a href="http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points">invalid UTF-8</a>).</li>
</ul>
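
<p>The three encodings listed above can be reproduced with a small encoder
for a single 16-bit codepoint. This is a minimal sketch of the CESU-8 rules as
described here, not Duktape's own encoder; <tt>my_cesu8_encode()</tt> is a
hypothetical name:</p>

<pre class="c-code">
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit codepoint into 'out' (at least 3 bytes) and return
 * the number of bytes written. Surrogates (U+D800 to U+DFFF) are encoded
 * directly, like any other 16-bit value.
 */
static size_t my_cesu8_encode(uint16_t cp, unsigned char *out) {
    if (cp < 0x80) {
        out[0] = (unsigned char) cp;                     /* 0xxxxxxx */
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char) (0xc0 | (cp >> 6));     /* 110xxxxx */
        out[1] = (unsigned char) (0x80 | (cp & 0x3f));   /* 10xxxxxx */
        return 2;
    }
    out[0] = (unsigned char) (0xe0 | (cp >> 12));        /* 1110xxxx */
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3f));
    out[2] = (unsigned char) (0x80 | (cp & 0x3f));
    return 3;
}
</pre>

<p>A strict UTF-8 encoder would differ only in rejecting the surrogate range
instead of encoding it.</p>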

<p>Duktape also allows extended strings internally. Codepoints up to U+10FFFF
can be represented with UTF-8, and codepoints above that up to full 32 bits
can be represented with
<a href="http://en.wikipedia.org/wiki/UTF-8#Extending_from_31_bit_to_36_bit_range">"extended UTF-8"</a>.
Non-standard strings are used for storing internal object properties; using a
non-standard string ensures that such properties never conflict with properties
accessible using standard Ecmascript strings. Non-standard strings can be given
to Ecmascript built-in functions, but since behavior may not be exactly
specified, results may vary.</p>

<p>Duktape uses internal object properties to record internal implementation
related fields in e.g. function objects. For example, a finalizer reference is
stored in an internal finalizer property. Such internal keys are kept separate
from valid Ecmascript keys by using a byte sequence which can never occur in
a valid CESU-8 string; consequently, standard Ecmascript code cannot accidentally
reference such fields. Internal properties are never enumerable, and are not
returned by e.g. <tt>Object.getOwnPropertyNames()</tt>. Currently, internal
property names begin with an <tt>0xFF</tt> byte followed by the property name.
For instance the finalizer property key consists of the byte <tt>0xFF</tt>
followed by the ASCII string <tt>"finalizer"</tt>. In internal documentation
this property would usually be referred to as <tt>_finalizer</tt> for convenience.
You should never read or write internal properties directly.</p>
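
<p>The byte layout of such a key can be sketched as follows, purely to
illustrate why it cannot collide with a normal string: CESU-8 leading bytes
never exceed <tt>0xEF</tt> and continuation bytes lie in the range
<tt>0x80</tt> to <tt>0xBF</tt>, so <tt>0xFF</tt> never appears in valid
CESU-8 data. The helper names are hypothetical, and application code should
not construct or access internal keys:</p>

<pre class="c-code">
#include <stddef.h>
#include <string.h>

/* Write the 0xFF marker followed by 'name' (plus a NUL terminator) into
 * 'out'. Returns the key byte length excluding the terminator, or 0 if
 * 'out' is too small.
 */
static size_t my_make_internal_key(const char *name, unsigned char *out, size_t out_size) {
    size_t nlen = strlen(name);
    if (out_size < nlen + 2) {
        return 0;
    }
    out[0] = 0xff;
    memcpy(out + 1, name, nlen);
    out[nlen + 1] = 0x00;
    return nlen + 1;
}

/* Returns 1 if the key begins with the 0xFF internal marker. */
static int my_is_internal_key(const unsigned char *key, size_t len) {
    return len > 0 && key[0] == 0xff;
}
</pre>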

<p>The "extended UTF-8" encoding used by Duktape is described in the table below.
The leading byte is shown in binary (with "x" marking data bits) while
continuation bytes are marked with "C" (indicating the bit sequence 10xxxxxx):</p>

<table>
<thead>
<tr class="header"><th>Codepoint range</th><th>Bits</th><th>Byte sequence</th></tr>
</thead>
<tbody>
<tr><td>U+0000 to U+007F</td><td>7</td><td>0xxxxxxx</td></tr>
<tr><td>U+0080 to U+07FF</td><td>11</td><td>110xxxxx C</td></tr>
<tr><td>U+0800 to U+FFFF</td><td>16</td><td>1110xxxx C C</td></tr>
<tr><td>U+1 0000 to U+1F FFFF</td><td>21</td><td>11110xxx C C C</td></tr>
<tr><td>U+20 0000 to U+3FF FFFF</td><td>26</td><td>111110xx C C C C</td></tr>
<tr><td>U+400 0000 to U+7FFF FFFF</td><td>31</td><td>1111110x C C C C C</td></tr>
<tr><td>U+8000 0000 to U+F FFFF FFFF</td><td>36 (32 used)</td><td>11111110 C C C C C C</td></tr>
</tbody>
</table>
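
<p>The table translates fairly directly into code. The following is a minimal
sketch of an encoder that follows the rows above for any 32-bit codepoint;
it is not Duktape's internal implementation, and
<tt>my_xutf8_encode()</tt> is a hypothetical name:</p>

<pre class="c-code">
#include <stddef.h>
#include <stdint.h>

/* Encode 'cp' into 'out' (at least 7 bytes) using the extended UTF-8
 * scheme in the table; returns the number of bytes written.
 */
static size_t my_xutf8_encode(uint32_t cp, unsigned char *out) {
    /* Leading byte marker bits, indexed by total encoded length. */
    static const unsigned char lead[] = {
        0x00, 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc, 0xfe
    };
    size_t len, i;

    if (cp < 0x80UL) { len = 1; }
    else if (cp < 0x800UL) { len = 2; }
    else if (cp < 0x10000UL) { len = 3; }
    else if (cp < 0x200000UL) { len = 4; }
    else if (cp < 0x4000000UL) { len = 5; }
    else if (cp < 0x80000000UL) { len = 6; }
    else { len = 7; }

    /* Fill continuation bytes (10xxxxxx) from the end, 6 bits at a time. */
    for (i = len - 1; i > 0; i--) {
        out[i] = (unsigned char) (0x80 | (cp & 0x3f));
        cp >>= 6;
    }

    /* Remaining bits share the leading byte with its marker bits. */
    out[0] = (unsigned char) (lead[len] | cp);
    return len;
}
</pre>

<p>For instance, U+10FFFF encodes to <tt>f4 8f bf bf</tt> (identical to
standard UTF-8), while the codepoint 0x80000000 needs the 7-byte form and
encodes to <tt>fe 82 80 80 80 80 80</tt>.</p>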

<p>The downside of the encoding for codepoints above U+7FFFFFFF is that
the leading byte will be <tt>0xFE</tt>, which conflicts with the Unicode byte order
mark encoding (the UTF-16 byte order mark consists of the bytes <tt>0xFE</tt> and
<tt>0xFF</tt>). This is not a practical concern in Duktape's internal use.</p>

<h3>Object</h3>

<p>The <b>object</b> type includes Ecmascript objects and arrays, functions, and
threads (coroutines). In other words, anything with properties is an object.
Properties are key-value pairs with a string key and an arbitrary value
(including <b>undefined</b>).</p>

<p>Objects may participate in garbage collection finalization.</p>

<h3>Buffer</h3>

<p>The <b>buffer</b> type is a raw buffer for user data of either fixed or dynamic
size. The size of a fixed buffer is given at its creation, and fixed buffers
have an unchanging (stable) data pointer. Dynamic buffers may change size during their
lifetime at the cost of having a (potentially) changing data pointer. Dynamic
buffers also need two memory allocations internally, while fixed buffers only
need one.</p>
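
<p>The trade-off for dynamic buffers can be illustrated with a small
stand-alone model: the header and the data are separate allocations, and
resizing may move the data. The <tt>my_dynbuf</tt> type and its functions are
hypothetical and only mimic the behavior described above; they are not part
of the Duktape API:</p>

<pre class="c-code">
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic buffer: header and data are separate allocations. */
typedef struct {
    size_t size;
    unsigned char *data;  /* may move whenever the buffer is resized */
} my_dynbuf;

/* Create a zero-filled buffer; returns NULL on allocation failure. */
static my_dynbuf *my_dynbuf_new(size_t size) {
    my_dynbuf *b = (my_dynbuf *) malloc(sizeof(my_dynbuf));
    if (b == NULL) {
        return NULL;
    }
    b->data = (unsigned char *) calloc(size ? size : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->size = size;
    return b;
}

/* Resize, zero-filling any new bytes. Returns 0 on success, -1 on failure.
 * Callers must re-read b->data afterwards: the old pointer may be stale.
 */
static int my_dynbuf_resize(my_dynbuf *b, size_t new_size) {
    unsigned char *p = (unsigned char *) realloc(b->data, new_size ? new_size : 1);
    if (p == NULL) {
        return -1;
    }
    if (new_size > b->size) {
        memset(p + b->size, 0, new_size - b->size);
    }
    b->data = p;
    b->size = new_size;
    return 0;
}

/* Release both allocations. */
static void my_dynbuf_free(my_dynbuf *b) {
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
</pre>

<p>The header pointer stays valid across resizes, which is why the data
pointer should be looked up again after any operation that may resize the
buffer rather than cached by calling code.</p>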

<p>Buffers are automatically garbage collected. This also means that C code
must not hold onto a buffer data pointer unless the buffer is reachable to
Duktape, e.g. resides in an active value stack.</p>

<p>The buffer type is not standard Ecmascript. There are a few
different Ecmascript typed array specifications, though, see e.g.
<a href="http://www.khronos.org/registry/typedarray/specs/latest/">Typed Array Specification</a>.
These will most likely be implemented on top of raw arrays.
</p>

<h3>Pointer</h3>

<p>The <b>pointer</b> type is a raw, uninterpreted C pointer, essentially
a <tt>void *</tt>. Pointers can be used to point to native objects (memory
allocations, handles, etc), but because Duktape doesn't know their use, they
are not automatically garbage collected. You can, however, put one or more
pointers inside an object and use the object finalizer to free the
native resources related to the pointer(s).</p>