==============================
Moving objects to code section
==============================

For low memory targets it'd be very useful to be able to relocate built-in
strings, built-in objects, and even user strings/objects into the code
section to avoid RAM usage.  This document provides some design notes on
the issues involved.

Misc notes
==========

* Built-ins in code section will be read only, but need to share the same
  structure as their RAM counterparts (for the most part) to avoid
  complicating internals too much.  Any attempt to write over the structs
  (flags, refcount, heap next/prev link pointers, etc) may cause a
  segfault, depending on the platform.

* Because internal structures like ``duk_heaphdr``, ``duk_hstring``, and
  ``duk_hobject`` are normally writable, static ROM initializers (which are
  ``const``) must eventually be cast into non-const pointers.  This causes
  harmless warnings and/or requires a cast workaround to silence the
  warnings.

* If objects are moved into the code section, their property tables will
  become fixed.  The objects will be implicitly non-extensible, so it'd
  make sense to mark them as non-extensible explicitly.

* Somewhat unintuitively the properties of non-writable ROM objects need to
  be "writable" from the Ecmascript attributes perspective to ensure it's
  possible to create objects inheriting from the ROM objects and establish
  overriding properties on the objects created.

* From a mark-and-sweep perspective the built-ins form an island: there can
  be references to the built-ins from outside, but the built-ins cannot
  reference values outside.  One implication of this is that when
  mark-and-sweep encounters a read-only built-in, it can stop marking and
  doesn't need to recurse further because no heap allocated objects can be
  reached via the built-ins.

* While most built-ins can be read-only, a lot of user code will expect to
  see a writable global object.
  A memory conservative approach for this is to use an empty global object
  which inherits its (unchanged) properties from a ROM-based global object
  ancestor.

* Pointer compression of ROM pointers is non-trivial because ROM pointers
  are outside Duktape heap and because ROM pointers must also be compressed
  at compile time to create proper static initializers.

Object structure fields
=======================

duk_heaphdr
-----------

* Can't update flags.

  - Mark-and-sweep: cannot mark object reachable, temproot, etc.
    Built-in strings/objects must not be marked "visited".

  - No finalizer support: cannot mark finalizable, finalized.

* Can't update refcount.

  - Refcount macros: must check and avoid writing refcount; increases code
    footprint because affects every INCREF/DECREF.  If user code is
    compiled without refcounts this impact is avoided.

* Can't use prev/next pointers.

  - Objects cannot be queued to e.g. refzero or finalization lists.

  - No refcount handling, no refzero queuing, no finalization queueing.

  - Pointers are set to NULL.

duk_hstring
-----------

* String hash will be fixed at dist time.

  - Can't incorporate a dynamic seed into the string hash.  The string hash
    seed must be dropped or fixed at dist time.

  - Multiple hash variants must be precomputed.

* Strings can be added to the string table, but this would be
  counterproductive (and unnecessary).

  - Strings don't have a prev/next pointer at all, string table refers to
    them from outside.

  - But adding built-in strings to the string table would consume ~2-4
    bytes per string.

* If ``duk_hstring`` objects are in code section, their pointers will be
  outside of Duktape heap.

  - Current solution is for ROM pointers to be compressed specially, see
    discussion below.

duk_hobject
-----------

* Many built-in objects are actually native functions, so the relevant
  structure is often ``duk_hnativefunction``.

* Heap header will have more bits in use, but no effect otherwise.

* Property table pointer 'p' is pointer compressed and assumed to be in
  Duktape heap at the moment.

  - Solved using ROM pointer compression.

* Internal prototype pointer 'prototype' is pointer compressed and assumed
  to be in Duktape heap at the moment.

  - Solved using ROM pointer compression.

* Entry, array, and hash sizes are not an issue.

* If hash part is present (for large objects), need to duplicate hash
  probing in dist code.  For lowmem environments hash part is usually
  disabled, so this should not be necessary in practice.

duk_hobject properties allocation
---------------------------------

* There are three layouts: static initializers generated by
  ``genbuiltins.py`` must duplicate each and choose active layout at
  compile time.

* Property key pointer is *not* compressed at the moment.

* Property value format depends on packing.  To avoid union initializers,
  separate ``duk_tval_string``, ``duk_tval_boolean``, etc struct
  definitions may be needed.

  - A lot of variants will be needed for packed and unpacked ``duk_tval``,
    endianness variants for packed ``duk_tval``, etc.

  - Union initializers would be available assuming C99 which might be
    reasonable for ROM object support.

* Property values may contain heap pointers; these are not compressed so
  this should not be an issue even if pointers are for built-ins moved to
  code section (objects or strings).

* Property values may be in circular references, so built-in objects may
  need forward declarations before their actual definition.

* Properties can be accessors; ``duk_propvalue`` is a union.  Probably best
  to duplicate into ``duk_propvalue_dataprop`` and
  ``duk_propvalue_accessorprop`` so that non-union initializers can be used
  (which is more portable).

duk_hnativefunction
-------------------

Same issues as ``duk_hobject`` plus the following:

* Function pointer should have no issues.

* Nargs and magic should have no issues.
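
The forward declaration requirement noted above for circular references
between built-in objects can be illustrated with a minimal sketch.  The
``ex_rom_obj`` structure and the object names are hypothetical stand-ins
chosen for illustration; they are not the actual ``duk_hobject`` layout or
the initializers emitted by ``genbuiltins.py``:

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a ROM object; not Duktape's
 * actual duk_hobject layout.
 */
typedef struct ex_rom_obj ex_rom_obj;
struct ex_rom_obj {
    const char *label;            /* illustrative only */
    const ex_rom_obj *prototype;  /* internal prototype, may be NULL */
    const ex_rom_obj *peer;       /* e.g. a .prototype or .constructor value */
};

/* Forward declarations so that a static initializer can refer to an
 * object whose definition appears later in the file.
 */
extern const ex_rom_obj ex_rom_constructor;
extern const ex_rom_obj ex_rom_prototype;

/* constructor.prototype -> prototype object (defined below) */
const ex_rom_obj ex_rom_constructor = {
    "constructor", NULL, &ex_rom_prototype
};

/* prototype.constructor -> constructor object (defined above) */
const ex_rom_obj ex_rom_prototype = {
    "prototype", NULL, &ex_rom_constructor
};
```

Because the address of an object with static storage duration is an address
constant, both initializers can be resolved at link time and the objects can
remain ``const``.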

duk_hcompiledfunction
---------------------

Same issues as ``duk_hobject`` plus the following:

* Bytecode constants table may refer to ROM builtins, should not be a
  difficult issue.

* Bytecode needs to be precompiled, which is mainly a tooling issue.

User strings and objects
========================

The initializers for built-in strings and objects are rather arcane:

* To avoid union initializers a lot of concrete initializers where union(s)
  are replaced with specific values are needed.  These seem impossible to
  automate cleanly with e.g. macros.

* At least strings and property tables have variable size initializers.
  Property tables further have multiple initializer variants depending on
  chosen object memory layout.

The best approach for supporting user strings and objects so far is to
include them in the dist (genbuiltins) processing:

* User build script runs the dist process, giving YAML metadata file(s) to
  provide information about user strings and/or objects.

* The user strings and objects are merged with the active built-in strings
  and objects and then processed normally: all layout variants are
  generated, arcane initializers are generated etc.

The downside of this is that a fresh dist is required as part of the user
build, but this will probably become the normal process for low memory
targets anyway to support proper optional built-ins.

Practical issues
================

Avoiding writes on read-only objects
------------------------------------

* Refcount operations

* Setting/clearing heaphdr or object/string flags

* Reallocating object ``props`` allocation (e.g. compact)

* Inserting, deleting, and modifying properties

* Compact, seal, freeze

* ``setPrototypeOf``: from Ecmascript code this is prevented because the
  ROM built-ins are not extensible and setPrototypeOf() fails if the target
  is not extensible.

* ``duk_set_prototype``: from C code allowed for any object, must reject
  this for read-only objects explicitly.
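
Two of the write-avoidance points above (refcount operations and an explicit
rejection of prototype updates from C code) can be sketched as follows.  This
is only an illustration under assumed names: ``EX_FLAG_READONLY``,
``ex_hdr``, ``ex_obj``, and ``ex_set_prototype()`` are hypothetical and do not
correspond to Duktape's actual flags, structures, or API:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit and layouts for illustration; Duktape's actual
 * duk_heaphdr and duk_hobject structures differ.
 */
#define EX_FLAG_READONLY  (1U << 0)

typedef struct ex_hdr {
    uint32_t flags;
    uint32_t refcount;
} ex_hdr;

typedef struct ex_obj {
    ex_hdr hdr;
    const struct ex_obj *prototype;
} ex_obj;

#define EX_IS_READONLY(h)  (((h)->flags & EX_FLAG_READONLY) != 0U)

/* Refcount updates become no-ops for read-only headers, at the cost of
 * one extra check in every INCREF/DECREF.
 */
#define EX_INCREF(h)  do { \
        if (!EX_IS_READONLY((h))) { (h)->refcount++; } \
    } while (0)
#define EX_DECREF(h)  do { \
        if (!EX_IS_READONLY((h)) && (h)->refcount > 0U) { (h)->refcount--; } \
    } while (0)

/* A stand-in ROM object: const, so it can be placed in the code section. */
const ex_obj ex_rom_object = { { EX_FLAG_READONLY, 1U }, NULL };

/* Explicit rejection for a C-level prototype update.  Returns 0 on
 * success, -1 if the target is read-only (nothing is written then).
 */
int ex_set_prototype(ex_obj *target, const ex_obj *proto) {
    if (EX_IS_READONLY(&target->hdr)) {
        return -1;
    }
    target->prototype = proto;
    return 0;
}
```

The read-only check always happens before any store, so a ``const`` object
cast to a non-const pointer is only ever read through that pointer.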

Dynamic initializations
-----------------------

* ``Duktape.modLoaded`` is established using an initialization JS script.
  This won't work with a read-only Duktape; modLoaded has to be part of
  init data.

* ``Duktape.errCreate`` and ``Duktape.errThrow`` must be established when
  creating the ROM built-ins as they cannot be set at runtime.

* ``Date.prototype.toGMTString`` must be the *same object reference* as
  ``Date.prototype.toUTCString``.  For RAM built-ins this was handled by
  omitting ``toGMTString`` from the init data and inserting it during init.
  For ROM built-ins both have to be present from the beginning, but must
  point to the same object.

Compile time pointer compression
--------------------------------

If ROM strings/objects are enabled and pointer compression is used, Duktape
must be able to compress and decompress ROM pointers pointing to strings
and objects using the user-supplied compression macros.  This poses a few
issues:

* ROM pointers are outside the Duktape heap which must be accounted for in
  the compression/decompression macros.  A simple approach is to dedicate a
  certain compressed pointer value range for ROM pointers.

* Compressed ROM pointers are needed to express the static ROM initializers
  so ROM pointers must compress to values *known at compile time*.

There are at least two basic approaches to this:

- Require user code to provide a pointer compression macro which supports
  ROM pointers and can be computed at compile time (which is critical so
  that the string/object initializer can go into the read only section).
  This can get very messy.

- Collect a list of ROM pointers in need of compression during genbuiltins
  (the list itself also going to ROM).  Expose this list to the user
  provided compression macros which can then determine if a certain pointer
  is a ROM pointer, and use the list index to compute a compressed pointer.
  For example, if the range ``[0xf800,0xffff]`` is dedicated to compressed
  ROM pointers, the first ROM pointer in the list would compress to
  0xf800 + 0 = 0xf800, the second to 0xf800 + 1 = 0xf801, etc.  Pointer
  decompression would similarly recognize that range and use the ROM
  pointer list to decompress the pointer.  Finally, genbuiltins can also
  compress pointers to that range, assuming it knows the base value
  (0xf800 here).

For now the approach is based on that ROM pointer table; the integration
with user code is not (yet) very clean, see:

* ``examples/cmdline/duk_cmdline_ajduk.c``
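
The ROM pointer table approach described above can be sketched roughly as
follows.  This is not the code in the referenced example file; all names
(``ex_rom_pointers``, ``ex_enc16()``, ``ex_dec16()``, ``EX_ROMPTR_COMPRESSED``)
and the toy heap compression scheme are assumptions made for illustration.
Only the ``0xf800`` base and the "base plus list index" rule come from the
text:

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ROMPTR_FIRST  0xf800U  /* start of range reserved for ROM pointers */
#define EX_ROMPTR_COUNT  3U

/* Stand-ins for ROM strings/objects (hypothetical). */
static const int ex_rom_a = 1;
static const int ex_rom_b = 2;
static const int ex_rom_c = 3;

/* ROM pointer list, as genbuiltins might emit it; also lives in ROM. */
static const void * const ex_rom_pointers[EX_ROMPTR_COUNT] = {
    &ex_rom_a, &ex_rom_b, &ex_rom_c
};

/* Compile-time compression: list index plus known base value. */
#define EX_ROMPTR_COMPRESSED(idx)  ((uint16_t) (EX_ROMPTR_FIRST + (idx)))

/* A field in a static initializer referring to ex_rom_a (index 0). */
static const uint16_t ex_rom_b_proto16 = EX_ROMPTR_COMPRESSED(0U);

/* Toy heap: a pointer compresses to (byte offset / 4) + 1, NULL to 0.
 * Values stay far below EX_ROMPTR_FIRST for this heap size.
 */
static unsigned char ex_heap[1024];

uint16_t ex_enc16(const void *p) {
    size_t i;

    if (p == NULL) {
        return 0U;
    }
    /* ROM pointers: scan the list, use the index as the compressed value. */
    for (i = 0; i < EX_ROMPTR_COUNT; i++) {
        if (ex_rom_pointers[i] == p) {
            return (uint16_t) (EX_ROMPTR_FIRST + i);
        }
    }
    /* Otherwise assume a 4-byte aligned pointer inside ex_heap. */
    return (uint16_t) ((((size_t) ((const unsigned char *) p - ex_heap)) >> 2) + 1U);
}

void *ex_dec16(uint16_t v) {
    if (v == 0U) {
        return NULL;
    }
    if (v >= EX_ROMPTR_FIRST) {
        size_t idx = (size_t) v - EX_ROMPTR_FIRST;
        /* Const is cast away; callers must not write through the result. */
        return (idx < EX_ROMPTR_COUNT) ? (void *) ex_rom_pointers[idx] : NULL;
    }
    return (void *) (ex_heap + ((((size_t) v) - 1U) << 2));
}
```

The linear scan keeps the sketch short; a real integration would likely want
a cheaper way to recognize ROM pointers, for instance an address range check
where the platform's memory map permits it.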