|
@ -1,10 +1,8 @@ |
|
|
|
|
|
<a name="performance-overview"></a> <!-- for old links --> |
|
|
|
|
|
<a name="performance-characteristics"></a> <!-- for old links --> |
|
|
|
|
|
<a name="performance-json-stringify-fastpath"></a> <!-- for old links --> |
|
|
<h1 id="performance">Performance</h1> |
|
|
<h1 id="performance">Performance</h1> |
|
|
|
|
|
|
|
|
<p>This section discussed Duktape specific performance characteristics and |
|
|
|
|
|
provides some hints to avoid Duktape specific performance pitfalls.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2 id="performance-overview">Performance overview</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>Duktape is an interpreted engine with currently no JIT support. It uses |
|
|
<p>Duktape is an interpreted engine with currently no JIT support. It uses |
|
|
reference counting which makes memory usage tight at the cost of some execution |
|
|
reference counting which makes memory usage tight at the cost of some execution |
|
|
performance. Overall Duktape performance should be similar to other interpreted |
|
|
performance. Overall Duktape performance should be similar to other interpreted |
|
@ -12,463 +10,6 @@ languages. However, up to Duktape 1.1 there has been very little performance |
|
|
work and Duktape is significantly slower than e.g. Lua right now. Performance |
|
|
work and Duktape is significantly slower than e.g. Lua right now. Performance |
|
|
work has been scheduled for spring 2015.</p> |
|
|
work has been scheduled for spring 2015.</p> |
|
|
|
|
|
|
|
|
<h2 id="performance-characteristics">Duktape performance characteristics</h2> |
|
|
<p>See <a href="http://wiki.duktape.org/HowtoPerformance.html">How to optimize performance</a> |
|
|
|
|
|
for discussion of Duktape performance characteristics and hints to optimize code |
|
|
<h3>String interning</h3> |
|
|
for performance.</p> |
|
|
|
|
|
|
|
|
<p>Strings are |
|
|
|
|
|
<a href="http://en.wikipedia.org/wiki/String_interning">interned</a>: only |
|
|
|
|
|
a single copy of a certain string exists at any point in time. Interning |
|
|
|
|
|
a string involves hashing the string and looking up a global string table |
|
|
|
|
|
to see whether the string is already present. If so, a pointer to the |
|
|
|
|
|
existing string is returned; if not, the string is inserted into the string |
|
|
|
|
|
table, potentially involving a string table resize. While a string remains |
|
|
|
|
|
reachable, it has a unique and a stable pointer which allows byte-by-byte |
|
|
|
|
|
string comparisons to be converted to simple pointer comparisons. Also, |
|
|
|
|
|
string hashes are computed during interning which makes the use of string |
|
|
|
|
|
keys in internal hash tables efficient.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>There are many downsides also. Strings cannot be modified in-place but |
|
|
|
|
|
a copy needs to be made for every modification. For instance, repeated |
|
|
|
|
|
string concatenation creates a temporary value for each intermediate string |
|
|
|
|
|
which is especially bad if a result string is built one character at a time. |
|
|
|
|
|
Duktape internal primitives, such as string case conversion and array |
|
|
|
|
|
<code>join()</code>, try to avoid these downsides by minimizing the number |
|
|
|
|
|
of temporary strings created.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h3>String memory representation and the string cache</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>The internal memory representation for strings is extended UTF-8, which |
|
|
|
|
|
represents each ASCII character with a single byte but uses two or more |
|
|
|
|
|
bytes to represent non-ASCII characters. This reduces memory footprint for |
|
|
|
|
|
most strings and makes strings easy to interact with in C code. However, |
|
|
|
|
|
it makes random access expensive for non-ASCII strings. Random access is |
|
|
|
|
|
needed for operations such as extracting a substring or looking up a |
|
|
|
|
|
character at a certain character index.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Duktape automatically detects pure ASCII strings (based on the fact that |
|
|
|
|
|
their character and byte length are identical) and provides efficient random |
|
|
|
|
|
access to such strings.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>However, when a string contains non-ASCII characters a <b>string cache</b> |
|
|
|
|
|
is used to resolve a character index to an internal byte index. Duktape |
|
|
|
|
|
maintains a few (internal define <code>DUK_HEAP_STRCACHE_SIZE</code>, |
|
|
|
|
|
currently 4) string cache entries which remember the last byte offset and |
|
|
|
|
|
character offset for recently accessed strings. Character index lookups |
|
|
|
|
|
near a cached character/byte offset can be efficiently handled by scanning |
|
|
|
|
|
backwards or forwards from the cached location. When a string access cannot |
|
|
|
|
|
be resolved using the cache, the string is scanned either from the beginning |
|
|
|
|
|
or the end, which is obviously very expensive for large strings. The cache |
|
|
|
|
|
is maintained with a very simple LRU mechanism and is transparent to both |
|
|
|
|
|
Ecmascript and C code.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>The string cache makes simple loops like the following efficient:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var i; |
|
|
|
|
|
var n = inp.length; |
|
|
|
|
|
for (i = 0; i < n; i++) { |
|
|
|
|
|
print(inp.charCodeAt(i)); |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>When random accesses are made from here and there to multiple strings, |
|
|
|
|
|
the strings may very easily fall out of the cache and become expensive |
|
|
|
|
|
at least for longer strings.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Note that the cache never maintains more than one entry for each |
|
|
|
|
|
string, so the following would be very inefficient:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var i; |
|
|
|
|
|
var n = inp.length; |
|
|
|
|
|
for (i = 0; i < n; i++) { |
|
|
|
|
|
// Accessing the string alternatively from beginning and end will |
|
|
|
|
|
// have a major performance impact. |
|
|
|
|
|
print(inp.charCodeAt(i)); |
|
|
|
|
|
print(inp.charCodeAt(n - 1 - i)); |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>As mentioned above, these performance issues are avoided entirely for |
|
|
|
|
|
ASCII strings which behave as one would expect. More generally, Duktape |
|
|
|
|
|
provides fast paths for ASCII characters and pure ASCII strings in internal |
|
|
|
|
|
algorithms whenever applicable. This applies to algorithms such as case |
|
|
|
|
|
conversion, regexp matching, etc.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Buffer accesses</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>There is a fast path for reading and writing numeric indices of plain buffer |
|
|
|
|
|
values, e.g. <code>x = buf[123]</code> or <code>buf[123] = x</code>. The |
|
|
|
|
|
fast path avoids coercing the index to a string (here <code>"123"</code>) |
|
|
|
|
|
before attempting a lookup.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>There's a similar fast path for buffer objects (Duktape.Buffer, Node.js |
|
|
|
|
|
Buffer, ArrayBuffer, and TypedArray views), but it is about 20% slower.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Object/array storage</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>Object properties are stored in a linear key/value list which provides |
|
|
|
|
|
stable ordering (insertion order). When an object has enough properties |
|
|
|
|
|
(internal define <code>DUK_HOBJECT_E_USE_HASH_LIMIT</code>, currently 32), |
|
|
|
|
|
a hash lookup table is also allocated to speed up property lookups. Even |
|
|
|
|
|
in this case the key ordering is retained which is a practical requirement |
|
|
|
|
|
for an Ecmascript implementation. The hash part is avoided for most objects |
|
|
|
|
|
because it increases memory footprint and doesn't significantly speed up |
|
|
|
|
|
property lookups for very small objects.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>For most objects property lookup thus involves a linear comparison |
|
|
|
|
|
against the object's property table. Because properties are kept in the |
|
|
|
|
|
property table in their insertion order, properties added earlier are |
|
|
|
|
|
slightly faster to access than those added later. When the object grows |
|
|
|
|
|
large enough to gain a hash table this effect disappears.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Array elements are stored in a special "array part" to reduce memory |
|
|
|
|
|
footprint and to speed up access. Accessing an array with a numeric index |
|
|
|
|
|
officially first coerces the number to a string (e.g. <code>x[123]</code> |
|
|
|
|
|
to <code>x["123"]</code>) and then does a string key lookup; when an object |
|
|
|
|
|
has an array part no temporary string is actually created in most cases.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>The array part can be "sparse", i.e. contain unmapped entries. Duktape |
|
|
|
|
|
occasionally rechecks the density of the array part, and it it becomes too |
|
|
|
|
|
sparse the array part is abandoned (current limit is roughly: if fewer than |
|
|
|
|
|
25% of array part elements are mapped, the array part is abandoned). The |
|
|
|
|
|
array entries are then converted to ordinary object properties, with every |
|
|
|
|
|
mapped array index converted to an explicit string key (such as |
|
|
|
|
|
<code>"123"</code>), which is relatively expensive. If an array part has |
|
|
|
|
|
once been abandoned, it is never recreated even if the object would be dense |
|
|
|
|
|
enough to warrant an array part.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Elements in the array part are required to be plain properties (not |
|
|
|
|
|
accessors) and have default property attributes (writable, enumerable, |
|
|
|
|
|
and configurable). If any element deviates from this, the array part is |
|
|
|
|
|
again abandoned and array elements converted to ordinary properties.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Identifier access</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>Duktape has two modes for storing and accessing identifiers (function |
|
|
|
|
|
arguments, local variables, function declarations): a fast path and a slow |
|
|
|
|
|
path. The fast path is used when an identifier can be bound to a virtual |
|
|
|
|
|
machine register, i.e., a fixed index in a virtual stack frame allocated |
|
|
|
|
|
for a function. Identifier access is then simply an array lookup. The |
|
|
|
|
|
slow path is used when the fast path cannot be safely used; identifier |
|
|
|
|
|
accesses are then converted to explicit property lookups on either |
|
|
|
|
|
external or internal objects, which is more than an order of magnitude |
|
|
|
|
|
slower.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>To keep identifier accesses in the fast path:</p> |
|
|
|
|
|
<ul> |
|
|
|
|
|
<li>Execute (almost all) inside Ecmascript functions, not in the top-level |
|
|
|
|
|
global or eval code: global/eval code never uses fast path identifier |
|
|
|
|
|
accesses (however, function code inside global/eval does)</li> |
|
|
|
|
|
<li>Store frequently accessed values in local variables instead of looking |
|
|
|
|
|
them up from the global object or other objects</li> |
|
|
|
|
|
</ul> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Enumeration</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>When an object is enumerated, with either the <code>for-in</code> statement |
|
|
|
|
|
or <code>Object.keys()</code>, Duktape first traverses the target object and its |
|
|
|
|
|
prototype chain and forms an internal enumeration object, which contains all |
|
|
|
|
|
the enumeration keys as strings. |
|
|
|
|
|
In particular, all array indices (or character indices in case of |
|
|
|
|
|
strings) are converted and interned into string values before enumeration and |
|
|
|
|
|
they remain interned until the enumeration completes. This can be memory |
|
|
|
|
|
intensive especially if large arrays or strings are enumerated.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Note, however, that iterating a string or an array with <code>for-in</code> |
|
|
|
|
|
and expecting the array elements or string indices to be enumerated in an |
|
|
|
|
|
ascending order is non-portable. Such behavior, while guaranteed by many |
|
|
|
|
|
implementations including Duktape, is not guaranteed by the Ecmascript |
|
|
|
|
|
standard.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h3>Function features</h3> |
|
|
|
|
|
|
|
|
|
|
|
<p>Ecmascript has a lot of features which make function entry and execution |
|
|
|
|
|
quite expensive. The general goal of the Duktape Ecmascript compiler is to |
|
|
|
|
|
avoid all the troublesome features for most functions while providing full |
|
|
|
|
|
compatibility for the rest.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>An ideal compiled function has all its variables and functions bound to |
|
|
|
|
|
virtual machine registers to allow fast path identifier access, avoids |
|
|
|
|
|
creation of the <code>arguments</code> object on entry, avoids creation of |
|
|
|
|
|
explicit lexical environment records upon entry and during execution, and |
|
|
|
|
|
avoids storing any lexical environment related control information such as |
|
|
|
|
|
internal identifier-to-register binding tables.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>The following features have a significant impact on execution performance:</p> |
|
|
|
|
|
<ul> |
|
|
|
|
|
<li>access to the <code>arguments</code> object: requires creation of an |
|
|
|
|
|
expensive object upon function entry in case it is accessed</li> |
|
|
|
|
|
<li>a direct call to <code>eval()</code>: requires initialization of |
|
|
|
|
|
the <code>arguments</code> and full identifier binding information |
|
|
|
|
|
needs to be retained in case evaluated code needs it</li> |
|
|
|
|
|
<li>global and eval code in general: identifiers are never bound to |
|
|
|
|
|
virtual machine registers but use explicit property lookups instead</li> |
|
|
|
|
|
</ul> |
|
|
|
|
|
|
|
|
|
|
|
<p>The following features have a more moderate impact:</p> |
|
|
|
|
|
<ul> |
|
|
|
|
|
<li><code>try-catch-finally</code> statement: the dynamic binding required |
|
|
|
|
|
by the catch variable is relatively expensive</li> |
|
|
|
|
|
<li><code>with</code> statement: the object binding required is relatively |
|
|
|
|
|
expensive</li> |
|
|
|
|
|
<li>use of bound functions, i.e. functions created with |
|
|
|
|
|
<code>Function.prototype.bind()</code>: function invocation is slowed |
|
|
|
|
|
down by handling of bound function objects and argument shuffling</li> |
|
|
|
|
|
<li>more than about 250 formal arguments, literals, and active temporaries: |
|
|
|
|
|
causes bytecode to use register shuffling</li> |
|
|
|
|
|
</ul> |
|
|
|
|
|
|
|
|
|
|
|
<p>To avoid these, isolate performance critical parts into separate minimal |
|
|
|
|
|
functions which avoid using the features mentioned above.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Minimize use of temporary strings</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>All temporary strings are interned. It is particularly bad to accumulate |
|
|
|
|
|
strings in a loop:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var t = ''; |
|
|
|
|
|
for (var i = 0; i < 1024; i++) { |
|
|
|
|
|
t += 'x'; |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>This will intern 1025 strings. Execution time is <code>O(n^2)</code> |
|
|
|
|
|
where <code>n</code> is the loop limit. It is better to use a temporary |
|
|
|
|
|
array instead:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var t = []; |
|
|
|
|
|
for (var i = 0; i < 1024; i++) { |
|
|
|
|
|
t[i] = 'x'; |
|
|
|
|
|
} |
|
|
|
|
|
t = t.join(''); |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Here, <code>x</code> will be interned once into a function constant, and |
|
|
|
|
|
each array entry simply refers to the same string, typically costing only 8 |
|
|
|
|
|
bytes per array entry. The final <code>Array.prototype.join()</code> avoids |
|
|
|
|
|
unnecessary interning and creates the final string in one go.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Avoid large non-ASCII strings if possible</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>Avoid operations which require access to a random character offset inside |
|
|
|
|
|
a large string containing one or more non-ASCII characters. Such accesses |
|
|
|
|
|
require use of the internal "string cache" and may, in the worst case, require |
|
|
|
|
|
a brute force scanning of the string to find the correct byte offset |
|
|
|
|
|
corresponding to the character offset.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Case conversion and other Unicode related operations have fast paths for |
|
|
|
|
|
ASCII codepoints but fall back to a slow path for non-ASCII codepoints. The |
|
|
|
|
|
slow path is size optimized, not speed optimized, and often involve linear |
|
|
|
|
|
range matching.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Iterate over plain buffer values, not Buffer objects</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>Both buffer objects and plain buffer values have a fast path when buffer |
|
|
|
|
|
contents are accessed with numeric indices. The fast path for plain buffer |
|
|
|
|
|
values is a bit faster, so your code will run faster if you get the plain |
|
|
|
|
|
buffer before iteration:</p> |
|
|
|
|
|
|
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var b, i, n; |
|
|
|
|
|
|
|
|
|
|
|
// Buffer object, typeof is 'object' |
|
|
|
|
|
var bufferValue = new Duktape.Buffer('foo'); |
|
|
|
|
|
print(typeof bufferValue); // 'object' |
|
|
|
|
|
|
|
|
|
|
|
// Get plain buffer, if already plain, no harm |
|
|
|
|
|
b = bufferValue.valueOf(); |
|
|
|
|
|
print(typeof b); // always 'buffer' |
|
|
|
|
|
|
|
|
|
|
|
n = b.length; |
|
|
|
|
|
for (i = 0; i < n; i++) { |
|
|
|
|
|
print(i, b[i]); // fast path for plain buffer access |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>When creating buffers, note that <code>new Duktape.Buffer(x)</code> always |
|
|
|
|
|
creates a Buffer object, while <code>Duktape.Buffer(x)</code> returns a plain |
|
|
|
|
|
buffer value. This mimics how Ecmascript <code>new String()</code> and |
|
|
|
|
|
<code>String()</code> work. Plain buffers should be preferred whenever |
|
|
|
|
|
possible.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Avoid sparse arrays when possible</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>If an array becomes too sparse at any point, Duktape will abandon the |
|
|
|
|
|
array part permanently and convert array properties to explicit string keyed |
|
|
|
|
|
properties. This may happen for instance if an array is initialized with a |
|
|
|
|
|
descending index:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var arr = []; |
|
|
|
|
|
for (var i = 1000; i >= 0; i--) { |
|
|
|
|
|
// bad: first write will abandon array part permanently |
|
|
|
|
|
arr[i] = i * i; |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Right after the first array write the array part would contain 1001 |
|
|
|
|
|
entries with only one mapped array element. The density of the array would |
|
|
|
|
|
thus be less than 0.1%. This is way below the density limit for abandoning |
|
|
|
|
|
the array part, so the array part is abandoned immediately. At the end the |
|
|
|
|
|
array part would be 100% dense but will never be restored. Using an ascending |
|
|
|
|
|
index fixes the issue:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var arr = []; |
|
|
|
|
|
for (var i = 0; i <= 1000; i++) { |
|
|
|
|
|
arr[i] = i * i; |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Setting the <code>length</code> property of an array manually does not, |
|
|
|
|
|
by itself, cause an array part to be abandoned. To simplify a bit, the array |
|
|
|
|
|
density check compares the number of mapped elements relative to the highest |
|
|
|
|
|
used element (actually allocated size). The <code>length</code> property does |
|
|
|
|
|
not affect the check. Although setting an array length beforehand may |
|
|
|
|
|
effectively pre-allocate an array in some implementations, it has no such |
|
|
|
|
|
effect in Duktape, at least at the moment. For example:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var arr = []; |
|
|
|
|
|
arr.length = 1001; // array part not abandoned, but no speedup in Duktape |
|
|
|
|
|
for (var i = 0; i <= 1000; i++) { |
|
|
|
|
|
arr[i] = i * i; |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Iterate arrays with explicit indices, not a "for-in"</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>Because the internal enumeration object contains all (used) array |
|
|
|
|
|
indices converted to string values, avoid <code>for-in</code> enumeration |
|
|
|
|
|
of at least large arrays. As a concrete example, consider:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var a = []; |
|
|
|
|
|
for (var i = 0; i < 1000000; i++) { |
|
|
|
|
|
a[i] = i; |
|
|
|
|
|
} |
|
|
|
|
|
for (var i in a) { |
|
|
|
|
|
// Before this loop is first entered, a million strings ("0", "1", |
|
|
|
|
|
// ..., "999999") will be interned. |
|
|
|
|
|
print(i, a[i]); |
|
|
|
|
|
} |
|
|
|
|
|
// The million strings become garbage collectable only here. |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>The internal enumeration object created in this example would contain a |
|
|
|
|
|
million interned string keys for "0", "1", ..., "999999". All of these keys |
|
|
|
|
|
would remain reachable for the entire duration of the enumeration. The |
|
|
|
|
|
following code would perform much better (and would be more portable, as it |
|
|
|
|
|
makes no assumptions on enumeration order):</p> |
|
|
|
|
|
|
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var a = []; |
|
|
|
|
|
for (var i = 0; i < 1000000; i++) { |
|
|
|
|
|
a[i] = i; |
|
|
|
|
|
} |
|
|
|
|
|
var n = a.length; |
|
|
|
|
|
for (var i = 0; i < n; i++) { |
|
|
|
|
|
print(i, a[i]); |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Minimize top-level global/eval code</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>Identifier accesses in global and eval code always use slow path instructions |
|
|
|
|
|
to ensure correctness. This is at least a few orders of magnitude slower than |
|
|
|
|
|
the fast path where identifiers are mapped to registers of a function |
|
|
|
|
|
activation.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>So, this is slow:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
for (var i = 0; i < 100; i++) { |
|
|
|
|
|
print(i); |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Each read and write of <code>i</code> will be an explicit environment |
|
|
|
|
|
record lookup, essentially a property lookup from an internal environment |
|
|
|
|
|
record object, with the string key <code>i</code>.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>Optimize by putting most code into a function:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
function main() { |
|
|
|
|
|
for (var i = 0; i < 100; i++) { |
|
|
|
|
|
print(i); |
|
|
|
|
|
} |
|
|
|
|
|
} |
|
|
|
|
|
main(); |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Here, <code>i</code> will be mapped to a function register, and each |
|
|
|
|
|
access will be a simple register reference (basically a pointer to a |
|
|
|
|
|
tagged value), which is much faster than the slow path.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>If you don't want to name an explicit function, use:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
(function() { |
|
|
|
|
|
var i; |
|
|
|
|
|
|
|
|
|
|
|
for (i = 0; i < 100; i++) { |
|
|
|
|
|
print(i); |
|
|
|
|
|
} |
|
|
|
|
|
})(); |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Eval code provides an implicit return value which also has a performance |
|
|
|
|
|
impact. Consider, for instance, the following:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
var res = eval("if (4 > 3) { 'foo'; } else { 'bar'; }"); |
|
|
|
|
|
print(res); // prints 'foo' |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>To support such code the compiler emits bytecode to store a statement's |
|
|
|
|
|
implicit return value to a temporary register in case it is needed. These |
|
|
|
|
|
instructions slow down execution and increase bytecode size unnecessarily.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2>Prefer local variables over external ones</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>When variables are bound to virtual machine registers, identifier lookups |
|
|
|
|
|
are much faster than using explicit property lookups on the global object or |
|
|
|
|
|
on other objects.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>When an external value or function is required multiple times, copy it to |
|
|
|
|
|
a local variable instead:</p> |
|
|
|
|
|
<pre class="ecmascript-code"> |
|
|
|
|
|
function slow(x) { |
|
|
|
|
|
var i; |
|
|
|
|
|
|
|
|
|
|
|
// 'x.length' is an explicit property lookup and happens on every loop |
|
|
|
|
|
for (i = 0; i < x.length; i++) { |
|
|
|
|
|
// 'print' causes a property lookup to the global object |
|
|
|
|
|
print(x[i]); |
|
|
|
|
|
} |
|
|
|
|
|
} |
|
|
|
|
|
|
|
|
|
|
|
function fast(x) { |
|
|
|
|
|
var i; |
|
|
|
|
|
var n = x.length; |
|
|
|
|
|
var p = print; |
|
|
|
|
|
|
|
|
|
|
|
// every access in the loop now happens through register-bound identifiers |
|
|
|
|
|
for (i = 0; i < n; i++) { |
|
|
|
|
|
p(x[i]); |
|
|
|
|
|
} |
|
|
|
|
|
} |
|
|
|
|
|
</pre> |
|
|
|
|
|
|
|
|
|
|
|
<p>Use such optimizations only where it matters, because they often reduce |
|
|
|
|
|
code readability.</p> |
|
|
|
|
|
|
|
|
|
|
|
<h2 id="performance-json-stringify-fastpath">JSON.stringify() fast path</h2> |
|
|
|
|
|
|
|
|
|
|
|
<p>There's a fast path for <code>JSON.stringify()</code> serialization which |
|
|
|
|
|
is used when there is no "replacer" argument and no indent. The fast path |
|
|
|
|
|
assumes that it can serialize the argument value without risking any side |
|
|
|
|
|
effects (which might mutate the value being serialized), and will fall back |
|
|
|
|
|
to the slower default algorithm if necessary. This happens at least when: |
|
|
|
|
|
(1) any object has a <code>.toJSON()</code> property, and (2) any object |
|
|
|
|
|
property is a getter. See: |
|
|
|
|
|
<a href="https://github.com/svaarala/duktape/blob/master/tests/ecmascript/test-bi-json-enc-fastpath.js">test-bi-json-enc-fastpath.js</a> |
|
|
|
|
|
for detailed notes on current limitations; the fast path preconditions and |
|
|
|
|
|
limitations are very likely to change between Duktape releases.</p> |
|
|
|
|
|
|
|
|
|
|
|
<p>The fast path is currently not enabled by default. To enable, ensure |
|
|
|
|
|
<code>DUK_USE_JSON_STRINGIFY_FASTPATH</code> is enabled in your |
|
|
|
|
|
<code>duk_config.h</code>.</p> |
|
|
|
|
|