======================= Low memory environments ======================= Overview ======== This document describes suggested feature options for reducing Duktape memory usage for memory-constrained environments, which are one important portability target for Duktape. The default Duktape options are quite memory conservative, and significant Ecmascript programs can be executed with e.g. 1MB of memory. Currently realistic memory targets are roughly: * 256-384kB flash memory (code) and 256kB system RAM - Duktape compiled with default options is feasible - Duktape compiles to around 200-210kB of code (x86), so 256kB is technically feasible but leaves little space for user bindings, hardware initialization, communications, etc; 384kB is a more realistic flash target * 256-384kB flash memory (code) and 128kB system RAM - Duktape feature options are needed to reduce memory usage - A custom pool-based memory allocation with manually tuned pools may be required - Aggressive measures like lightweight functions, 16-bit fields for various internal structures (strings, buffers, objects), pointer compression, external strings, etc may need to be used * 256kB flash memory (code) and 96kB system RAM - Requires a bare metal system, possibly a custom C library, etc. - http://pt.slideshare.net/seoyounghwang77/js-onmicrocontrollers There are four basic goals for low memory optimization: 1. Reduce Duktape code (flash) footprint. This is currently a low priority item because flash size doesn't seem to be a bottleneck for most users. 2. Reduce initial memory usage of a Duktape heap. This provides a baseline for memory usage which won't be available for user code (technically some memory can be reclaimed by deleting some built-ins after heap creation). 3. Minimize the growth of the Duktape heap relative to the scope and complexity of user code, so that as large programs as possible can be compiled and executed in a given space. Important contributing factors include the footprint of user-defined Ecmascript and Duktape/C functions, the size of compiled bytecode, etc. 4. Make remaining memory allocations as friendly as possible for the memory allocator, especially a pool-based memory allocator. Concretely, prefer small chunks over large contiguous allocations. Suggested feature options ========================= * Use the default memory management settings: although reference counting increases heap header size, it also reduces memory usage fluctuation which is often more important than absolute footprint. * Reduce error handling footprint with one or more of: - ``DUK_OPT_NO_AUGMENT_ERRORS`` - ``DUK_OPT_NO_TRACEBACKS`` - ``DUK_OPT_NO_VERBOSE_ERRORS`` - ``DUK_OPT_NO_PC2LINE`` * If you don't need the Duktape-specific additional JX/JC formats, use: - ``DUK_OPT_NO_JX`` - ``DUK_OPT_NO_JC`` * Features borrowed from Ecmascript E6 can usually be disabled: - ``DUK_OPT_NO_ES6_OBJECT_SETPROTOTYPEOF`` - ``DUK_OPT_NO_ES6_OBJECT_PROTO_PROPERTY`` - ``DUK_OPT_NO_ES6_PROXY`` * If you don't need regexp support, use: - ``DUK_OPT_NO_REGEXP_SUPPORT``. * Duktape debug code uses a large, static temporary buffer for formatting debug log lines. If you're running with debugging enabled, use e.g. the following to reduce this overhead: - ``-DDUK_OPT_DEBUG_BUFSIZE=2048`` More aggressive options ======================= The following may be needed for very low memory environments (e.g. 128kB system RAM): * Consider using lightweight functions for your Duktape/C bindings and to force Duktape built-ins to be lightweight functions: - ``DUK_OPT_LIGHTFUNC_BUILTINS`` * Enable other 16-bit fields to reduce header size; these are typically used together (all or none): - ``DUK_OPT_REFCOUNT16`` - ``DUK_OPT_STRHASH16`` - ``DUK_OPT_STRLEN16`` - ``DUK_OPT_BUFLEN16`` - ``DUK_OPT_OBJSIZES16`` * Enable heap pointer compression, assuming pointers provided by your allocator can be packed into 16 bits: - ``DUK_OPT_HEAPPTR16`` - ``DUK_OPT_HEAPPTR_ENC16`` - ``DUK_OPT_HEAPPTR_DEC16`` * Enable data pointer compression if possible. Note that these pointers can point to arbitrary memory locations (outside Duktape heap) so this may not be possible even if Duktape heap pointers can be compressed: - ``DUK_OPT_DATAPTR16`` - ``DUK_OPT_DATAPTR_ENC16`` - ``DUK_OPT_DATAPTR_DEC16`` - **UNIMPLEMENTED AT THE MOMENT** * Enable C function pointer compression if possible. Duktape compiles to around 200kB of code, so assuming an alignment of 4 this may only be possible if there is less than 56kB of user code. - ``DUK_OPT_FUNCPTR16`` - ``DUK_OPT_FUNCPTR_ENC16`` - ``DUK_OPT_FUNCPTR_DEC16`` - **UNIMPLEMENTED AT THE MOMENT** * Enable struct packing in compiler options if your platform doesn't have strict alignment requirements, e.g. on gcc/x86 you can: - `-fpack-struct=1` or `-fpack-struct=2` Notes on potential low memory measures ====================================== Pointer compression ------------------- Can be applied throughout (where it matters) for three pointer types: * Compressed 16-bit Duktape heap pointers, assuming Duktape heap pointers can fit into 16 bits, e.g. max 256kB memory pool with 4-byte alignment * Compressed 16-bit function pointers, assuming C function pointers can fit into 16 bits * Compressed 16-bit non-Duktape-heap data pointers, assuming C data pointers can fit into 16 bits Pointer compression can be quite slow because often memory mappings are not linear, so the required operations are not trivial. NULL also needs specific handling. Heap headers ------------ * Compressed 16-bit heap pointers * 16-bit field for refcount * Move one struct specific field (e.g. 16-bit string length) into the unused bits of the ``duk_heaphdr`` 32-bit flags field Objects ------- * Tweak growth factors to keep objects always or nearly always compact * 16-bit field for property count, array size, etc. * Drop hash part entirely: it's rarely needed in low memory environments and hash part size won't need to be tracked * Compressed pointers Strings ------- * Use an indirect string type which stores string data behind a pointer (same as dynamic buffer); allow user code to indicate which C strings are immutable and can be used in this way * Allow user code to move a string to e.g. memory-mapped flash when it is interned or when the compiler interns its constants (this is referred to as "static strings" or "external strings") * Memory map built-in strings (about 2kB bit packed) directly from flash * 16-bit fields for string char and byte length * 16-bit string hash * Rework string table to avoid current issues: (1) large reallocations, (2) rehashing needs both old and new string table as it's not in-place. Multiple options, including: - Separate chaining (open hashing, closed addressing) with a fixed or bounded top level hash table - Various tree structures like red-black trees * Compressed pointers Duktape/C function footprint ---------------------------- * Lightweight functions, converting built-ins into lightweight functions * Lightweight functions for user Duktape/C binding functions * Magic value to share native code cheaply for multiple function objects * Compressed pointers Ecmascript function footprint ----------------------------- * Motivation - Small lexically nested callbacks are often used in Ecmascript code, so it's important to keep their size small * Reduce property count: - _pc2line: can be dropped, lose line numbers in tracebacks - _formals: can be dropped for most functions (affects debugging) - _varmap: can be dropped for most functions (affects debugging) * Reduce compile-time maximum alloc size for bytecode: currently each instruction takes 8 bytes, 4 bytes for the instruction itself and 4 bytes for line number. Change this into two allocations so that the maximum allocation size is not double that of final bytecode, as that is awkward for pool allocators. * Improve property format, e.g. ``_formals`` is now a regular array which is quite wasteful; it could be converted to a ``\xFF`` separated string for instance. * Spawn ``.prototype`` on demand to eliminate one unnecessary object per function * Use virtual properties when possible, e.g. if ``nargs`` equals desired ``length``, use virtual property for it (either non-writable or create concrete property when written) * Write bytecode and pc2line to flash during compilation * Compressed pointers Contiguous allocations ---------------------- Unbounded contiguous allocations are a problem for pool allocators. There are at least the following sources for these: * Large user strings and buffers. Not much can be done about these without a full rework of the Duktape C programming model (which assumes string and buffer data is available as plain ``const char *``). * Bytecode/const buffer for long Ecmascript functions: - Bytecode and constants can be placed in separate buffers. - Bytecode could be "segmented" so that bytecode would be stored in chunks (e.g. 64 opcodes = 256 bytes). An explicit JUMP to jump from page to page could make the executor impact minimal. - During compilation Duktape uses a single buffer to track bytecode instructions and their line numbers. This takes 8 bytes per instruction while the final bytecode takes 4 bytes per instruction. This is easy to fix by using two separate buffers. * Value stacks of Duktape threads. Start from 1kB and grow without (practical) bound depending on call nesting. * Catch and call stacks of Duktape threads. Also contiguous but since these are much smaller, they're unlikely to be a problem before the value stack becomes one. Notes on function memory footprint ================================== Normal function representation ------------------------------ In Duktape 1.0.0 functions are represented as: * A ``duk_hcompiledfunction`` (a superset of ``duk_hobject``): represents an Ecmascript function which may have a set of properties, and points to the function's data area (bytecode, constants, inner function refs). * A ``duk_hnativefunction`` (a superset of ``duk_hobject``): represents a Duktape/C function which may also have a set of properties. A pointer to the C function is inside the ``duk_hnativefunction`` structure. In Duktape 1.1.0 a lightfunc type is available: * A lightfunc is an 8-byte ``duk_tval`` with no heap allocations, and provides a cheap way to represent many Duktape/C functions. RAM footprints for each type are discussed below. Ecmascript functions -------------------- An ordinary Ecmascript function takes around 300-500 bytes of RAM. There are three objects involved: - a function template - a function instance (multiple instances can be created from one template) - automatic prototype object allocated for the function instance The function template is used to instantiate a function. The resulting function is not dependent on the template after creation, so that the template can be garbage collected. However, the template often remains reachable in callback style programming, through the enclosing function's inner function templates table. The function instance contains a ``.prototype`` property while the prototype contains a ``.constructor`` property, so that both functions require a property table. This is the case even for the majority of user functions which will never be used as constructors; built-in functions are oddly exempt from having an automatic prototype. Duktape/C functions ------------------- A Duktape/C function takes about 70-80 bytes of RAM. Unlike Ecmascript functions, Duktape/C function are already stripped of unnecessary properties and don't have an automatic prototype object. Even so, there are close to 200 built-in functions, so the footprint of the ``duk_hnativefunction`` objects is around 14-16kB, not taking into account allocator overhead. Duktape/C lightfuncs -------------------- Lightfuncs require only a ``duk_tval``, 8 bytes. There are no additional heap allocations.