diff --git a/website/guide/performance.html b/website/guide/performance.html index a4137839..deebfbb0 100644 --- a/website/guide/performance.html +++ b/website/guide/performance.html @@ -1,10 +1,8 @@ + + +
This section discussed Duktape specific performance characteristics and -provides some hints to avoid Duktape specific performance pitfalls.
- -Duktape is an interpreted engine with currently no JIT support. It uses reference counting which makes memory usage tight at the cost of some execution performance. Overall Duktape performance should be similar to other interpreted @@ -12,463 +10,6 @@ languages. However, up to Duktape 1.1 there has been very little performance work and Duktape is significantly slower than e.g. Lua right now. Performance work has been scheduled for spring 2015.
-Strings are -interned: only -a single copy of a certain string exists at any point in time. Interning -a string involves hashing the string and looking up a global string table -to see whether the string is already present. If so, a pointer to the -existing string is returned; if not, the string is inserted into the string -table, potentially involving a string table resize. While a string remains -reachable, it has a unique and a stable pointer which allows byte-by-byte -string comparisons to be converted to simple pointer comparisons. Also, -string hashes are computed during interning which makes the use of string -keys in internal hash tables efficient.
- -There are many downsides also. Strings cannot be modified in-place but
-a copy needs to be made for every modification. For instance, repeated
-string concatenation creates a temporary value for each intermediate string
-which is especially bad if a result string is built one character at a time.
-Duktape internal primitives, such as string case conversion and array
-join()
, try to avoid these downsides by minimizing the number
-of temporary strings created.
The internal memory representation for strings is extended UTF-8, which -represents each ASCII character with a single byte but uses two or more -bytes to represent non-ASCII characters. This reduces memory footprint for -most strings and makes strings easy to interact with in C code. However, -it makes random access expensive for non-ASCII strings. Random access is -needed for operations such as extracting a substring or looking up a -character at a certain character index.
- -Duktape automatically detects pure ASCII strings (based on the fact that -their character and byte length are identical) and provides efficient random -access to such strings.
- -However, when a string contains non-ASCII characters a string cache
-is used to resolve a character index to an internal byte index. Duktape
-maintains a few (internal define DUK_HEAP_STRCACHE_SIZE
,
-currently 4) string cache entries which remember the last byte offset and
-character offset for recently accessed strings. Character index lookups
-near a cached character/byte offset can be efficiently handled by scanning
-backwards or forwards from the cached location. When a string access cannot
-be resolved using the cache, the string is scanned either from the beginning
-or the end, which is obviously very expensive for large strings. The cache
-is maintained with a very simple LRU mechanism and is transparent to both
-Ecmascript and C code.
The string cache makes simple loops like the following efficient:
--var i; -var n = inp.length; -for (i = 0; i < n; i++) { - print(inp.charCodeAt(i)); -} -- -
When random accesses are made from here and there to multiple strings, -the strings may very easily fall out of the cache and become expensive -at least for longer strings.
- -Note that the cache never maintains more than one entry for each -string, so the following would be very inefficient:
--var i; -var n = inp.length; -for (i = 0; i < n; i++) { - // Accessing the string alternatively from beginning and end will - // have a major performance impact. - print(inp.charCodeAt(i)); - print(inp.charCodeAt(n - 1 - i)); -} -- -
As mentioned above, these performance issues are avoided entirely for -ASCII strings which behave as one would expect. More generally, Duktape -provides fast paths for ASCII characters and pure ASCII strings in internal -algorithms whenever applicable. This applies to algorithms such as case -conversion, regexp matching, etc.
- -There is a fast path for reading and writing numeric indices of plain buffer
-values, e.g. x = buf[123]
or buf[123] = x
. The
-fast path avoids coercing the index to a string (here "123"
)
-before attempting a lookup.
There's a similar fast path for buffer objects (Duktape.Buffer, Node.js -Buffer, ArrayBuffer, and TypedArray views), but it is about 20% slower.
- -Object properties are stored in a linear key/value list which provides
-stable ordering (insertion order). When an object has enough properties
-(internal define DUK_HOBJECT_E_USE_HASH_LIMIT
, currently 32),
-a hash lookup table is also allocated to speed up property lookups. Even
-in this case the key ordering is retained which is a practical requirement
-for an Ecmascript implementation. The hash part is avoided for most objects
-because it increases memory footprint and doesn't significantly speed up
-property lookups for very small objects.
For most objects property lookup thus involves a linear comparison -against the object's property table. Because properties are kept in the -property table in their insertion order, properties added earlier are -slightly faster to access than those added later. When the object grows -large enough to gain a hash table this effect disappears.
- -Array elements are stored in a special "array part" to reduce memory
-footprint and to speed up access. Accessing an array with a numeric index
-officially first coerces the number to a string (e.g. x[123]
-to x["123"]
) and then does a string key lookup; when an object
-has an array part no temporary string is actually created in most cases.
The array part can be "sparse", i.e. contain unmapped entries. Duktape
-occasionally rechecks the density of the array part, and it it becomes too
-sparse the array part is abandoned (current limit is roughly: if fewer than
-25% of array part elements are mapped, the array part is abandoned). The
-array entries are then converted to ordinary object properties, with every
-mapped array index converted to an explicit string key (such as
-"123"
), which is relatively expensive. If an array part has
-once been abandoned, it is never recreated even if the object would be dense
-enough to warrant an array part.
Elements in the array part are required to be plain properties (not -accessors) and have default property attributes (writable, enumerable, -and configurable). If any element deviates from this, the array part is -again abandoned and array elements converted to ordinary properties.
- -Duktape has two modes for storing and accessing identifiers (function -arguments, local variables, function declarations): a fast path and a slow -path. The fast path is used when an identifier can be bound to a virtual -machine register, i.e., a fixed index in a virtual stack frame allocated -for a function. Identifier access is then simply an array lookup. The -slow path is used when the fast path cannot be safely used; identifier -accesses are then converted to explicit property lookups on either -external or internal objects, which is more than an order of magnitude -slower.
- -To keep identifier accesses in the fast path:
-When an object is enumerated, with either the for-in
statement
-or Object.keys()
, Duktape first traverses the target object and its
-prototype chain and forms an internal enumeration object, which contains all
-the enumeration keys as strings.
-In particular, all array indices (or character indices in case of
-strings) are converted and interned into string values before enumeration and
-they remain interned until the enumeration completes. This can be memory
-intensive especially if large arrays or strings are enumerated.
Note, however, that iterating a string or an array with for-in
-and expecting the array elements or string indices to be enumerated in an
-ascending order is non-portable. Such behavior, while guaranteed by many
-implementations including Duktape, is not guaranteed by the Ecmascript
-standard.
Ecmascript has a lot of features which make function entry and execution -quite expensive. The general goal of the Duktape Ecmascript compiler is to -avoid all the troublesome features for most functions while providing full -compatibility for the rest.
- -An ideal compiled function has all its variables and functions bound to
-virtual machine registers to allow fast path identifier access, avoids
-creation of the arguments
object on entry, avoids creation of
-explicit lexical environment records upon entry and during execution, and
-avoids storing any lexical environment related control information such as
-internal identifier-to-register binding tables.
The following features have a significant impact on execution performance:
-arguments
object: requires creation of an
- expensive object upon function entry in case it is accessedeval()
: requires initialization of
- the arguments
and full identifier binding information
- needs to be retained in case evaluated code needs itThe following features have a more moderate impact:
-try-catch-finally
statement: the dynamic binding required
- by the catch variable is relatively expensivewith
statement: the object binding required is relatively
- expensiveFunction.prototype.bind()
: function invocation is slowed
- down by handling of bound function objects and argument shufflingTo avoid these, isolate performance critical parts into separate minimal -functions which avoid using the features mentioned above.
- -All temporary strings are interned. It is particularly bad to accumulate -strings in a loop:
--var t = ''; -for (var i = 0; i < 1024; i++) { - t += 'x'; -} -- -
This will intern 1025 strings. Execution time is O(n^2)
-where n
is the loop limit. It is better to use a temporary
-array instead:
-var t = []; -for (var i = 0; i < 1024; i++) { - t[i] = 'x'; -} -t = t.join(''); -- -
Here, x
will be interned once into a function constant, and
-each array entry simply refers to the same string, typically costing only 8
-bytes per array entry. The final Array.prototype.join()
avoids
-unnecessary interning and creates the final string in one go.
Avoid operations which require access to a random character offset inside -a large string containing one or more non-ASCII characters. Such accesses -require use of the internal "string cache" and may, in the worst case, require -a brute force scanning of the string to find the correct byte offset -corresponding to the character offset.
- -Case conversion and other Unicode related operations have fast paths for -ASCII codepoints but fall back to a slow path for non-ASCII codepoints. The -slow path is size optimized, not speed optimized, and often involve linear -range matching.
- -Both buffer objects and plain buffer values have a fast path when buffer -contents are accessed with numeric indices. The fast path for plain buffer -values is a bit faster, so your code will run faster if you get the plain -buffer before iteration:
- --var b, i, n; - -// Buffer object, typeof is 'object' -var bufferValue = new Duktape.Buffer('foo'); -print(typeof bufferValue); // 'object' - -// Get plain buffer, if already plain, no harm -b = bufferValue.valueOf(); -print(typeof b); // always 'buffer' - -n = b.length; -for (i = 0; i < n; i++) { - print(i, b[i]); // fast path for plain buffer access -} -- -
When creating buffers, note that new Duktape.Buffer(x)
always
-creates a Buffer object, while Duktape.Buffer(x)
returns a plain
-buffer value. This mimics how Ecmascript new String()
and
-String()
work. Plain buffers should be preferred whenever
-possible.
If an array becomes too sparse at any point, Duktape will abandon the -array part permanently and convert array properties to explicit string keyed -properties. This may happen for instance if an array is initialized with a -descending index:
--var arr = []; -for (var i = 1000; i >= 0; i--) { - // bad: first write will abandon array part permanently - arr[i] = i * i; -} -- -
Right after the first array write the array part would contain 1001 -entries with only one mapped array element. The density of the array would -thus be less than 0.1%. This is way below the density limit for abandoning -the array part, so the array part is abandoned immediately. At the end the -array part would be 100% dense but will never be restored. Using an ascending -index fixes the issue:
--var arr = []; -for (var i = 0; i <= 1000; i++) { - arr[i] = i * i; -} -- -
Setting the length
property of an array manually does not,
-by itself, cause an array part to be abandoned. To simplify a bit, the array
-density check compares the number of mapped elements relative to the highest
-used element (actually allocated size). The length
property does
-not affect the check. Although setting an array length beforehand may
-effectively pre-allocate an array in some implementations, it has no such
-effect in Duktape, at least at the moment. For example:
-var arr = []; -arr.length = 1001; // array part not abandoned, but no speedup in Duktape -for (var i = 0; i <= 1000; i++) { - arr[i] = i * i; -} -- -
Because the internal enumeration object contains all (used) array
-indices converted to string values, avoid for-in
enumeration
-of at least large arrays. As a concrete example, consider:
-var a = []; -for (var i = 0; i < 1000000; i++) { - a[i] = i; -} -for (var i in a) { - // Before this loop is first entered, a million strings ("0", "1", - // ..., "999999") will be interned. - print(i, a[i]); -} -// The million strings become garbage collectable only here. -- -
The internal enumeration object created in this example would contain a -million interned string keys for "0", "1", ..., "999999". All of these keys -would remain reachable for the entire duration of the enumeration. The -following code would perform much better (and would be more portable, as it -makes no assumptions on enumeration order):
- --var a = []; -for (var i = 0; i < 1000000; i++) { - a[i] = i; -} -var n = a.length; -for (var i = 0; i < n; i++) { - print(i, a[i]); -} -- -
Identifier accesses in global and eval code always use slow path instructions -to ensure correctness. This is at least a few orders of magnitude slower than -the fast path where identifiers are mapped to registers of a function -activation.
- -So, this is slow:
--for (var i = 0; i < 100; i++) { - print(i); -} -- -
Each read and write of i
will be an explicit environment
-record lookup, essentially a property lookup from an internal environment
-record object, with the string key i
.
Optimize by putting most code into a function:
--function main() { - for (var i = 0; i < 100; i++) { - print(i); - } -} -main(); -- -
Here, i
will be mapped to a function register, and each
-access will be a simple register reference (basically a pointer to a
-tagged value), which is much faster than the slow path.
If you don't want to name an explicit function, use:
--(function() { - var i; - - for (i = 0; i < 100; i++) { - print(i); - } -})(); -- -
Eval code provides an implicit return value which also has a performance -impact. Consider, for instance, the following:
--var res = eval("if (4 > 3) { 'foo'; } else { 'bar'; }"); -print(res); // prints 'foo' -- -
To support such code the compiler emits bytecode to store a statement's -implicit return value to a temporary register in case it is needed. These -instructions slow down execution and increase bytecode size unnecessarily.
- -When variables are bound to virtual machine registers, identifier lookups -are much faster than using explicit property lookups on the global object or -on other objects.
- -When an external value or function is required multiple times, copy it to -a local variable instead:
--function slow(x) { - var i; - - // 'x.length' is an explicit property lookup and happens on every loop - for (i = 0; i < x.length; i++) { - // 'print' causes a property lookup to the global object - print(x[i]); - } -} - -function fast(x) { - var i; - var n = x.length; - var p = print; - - // every access in the loop now happens through register-bound identifiers - for (i = 0; i < n; i++) { - p(x[i]); - } -} -- -
Use such optimizations only where it matters, because they often reduce -code readability.
- -There's a fast path for JSON.stringify()
serialization which
-is used when there is no "replacer" argument and no indent. The fast path
-assumes that it can serialize the argument value without risking any side
-effects (which might mutate the value being serialized), and will fall back
-to the slower default algorithm if necessary. This happens at least when:
-(1) any object has a .toJSON()
property, and (2) any object
-property is a getter. See:
-test-bi-json-enc-fastpath.js
-for detailed notes on current limitations; the fast path preconditions and
-limitations are very likely to change between Duktape releases.
The fast path is currently not enabled by default. To enable, ensure
-DUK_USE_JSON_STRINGIFY_FASTPATH
is enabled in your
-duk_config.h
.
See How to optimize performance +for discussion of Duktape performance characteristics and hints to optimize code +for performance.