=================
Memory management
=================

Overview
========

Duktape memory management is based on the following basic concepts:

* **Allocation functions**.  The user provides a set of functions for
  allocating, reallocating, and freeing blocks of memory.  These "raw"
  functions can be used directly, but the implementation also provides
  variants which behave the same as the raw functions externally but force
  a garbage collection if an allocation attempt fails due to out of memory.
  Both of these variants are used internally, and can also be used by
  external code.

* **Heap element tracking**.  Actual memory management happens on the heap
  level.  Heap elements are tracked after being allocated, which allows
  unreachable elements to be freed by reference counting and/or
  mark-and-sweep garbage collection.  Freeing a heap causes all related
  allocations to be freed, regardless of their reference count or
  reachability.

* **Reference counting and/or mark-and-sweep**.  These algorithms are used
  to detect which heap elements can be freed.  A finalizer method may be
  executed when an element is about to be freed by reference counting or
  mark-and-sweep.

This document covers the memory management related aspects of the
implementation:

* The raw allocation functions and their behavior

* The heap memory layout (for "tracked" allocations)

* Details of the reference counting algorithm

* Details of the mark-and-sweep algorithm

* Implementation notes, such as how to manage reference counting correctly,
  how code must be structured to work correctly with potential
  ``longjmp()``\ s, etc

Duktape supports three basic models for memory management; one of these
is selected during build:

#. Reference counting and mark-and-sweep, has reclamation for reference
   loops

#. Reference counting alone, has no reclamation for reference loops

#. Mark-and-sweep alone, has reclamation for reference loops but memory
   usage fluctuates considerably between mark-and-sweep collections

At a high level, the implementation code must ensure that reference counts
and heap element reachability are consistently and correctly updated where
reference relationships are changed.  In particular, reachability and
reference counts must be correct whenever an operation which may cause a
``longjmp()`` or a garbage collection is performed.  This is very tricky in
practice.  There is a "GC torture" compilation option to shake out memory
management bugs.

Some terminology
================

Heap element
  The term "heap-allocated element" or "heap element" is used to refer
  broadly to all memory allocations which are automatically tracked.  The
  term "heap-allocated object" or "heap object" is not used because it is
  easy to confuse with other notions of an "object".  In particular, all
  Ecmascript objects are heap elements, but there are other heap element
  types too.

  Heap-allocated elements subject to memory management are:

  * ``duk_hstring``

  * ``duk_hobject`` and its subtypes

  * ``duk_hbuffer``

  Only ``duk_hobject`` contains further internal references to other heap
  elements.  These references are kept in the object property table and the
  object internal prototype pointer.  Currently only ``duk_hobject`` may
  have a finalizer.

  Heap elements have a **stable pointer** which means that the (main) heap
  element is not relocated during its lifetime.  Auxiliary allocations
  referenced by the heap element (such as an object property table) can be
  reallocated/relocated.

Reference
  A pointer from a source heap element to a target heap element.  The
  reference count of the target heap element must be incremented when a
  reference is created and decremented when the reference is removed.

  Only ``duk_hobject`` heap elements currently contain references, either
  through object properties (keys and values) or the object internal
  prototype reference.

Borrowed reference
  A pointer from a source heap element to a target heap element which is
  not reflected in the target element's reference count.  Borrowed
  references can be used when an actual reference is guaranteed to exist
  somewhere while the borrowed reference is in use.  If this cannot be
  guaranteed, the resulting bugs will be very difficult to diagnose.

Weak reference
  A pointer to a target heap element which is not reflected in the target
  element's reference count.  A weak reference can exist even when no other
  references to the target exist, and does not prevent collection of the
  target.  However, if the target is collected, the weak reference must be
  deleted to avoid dangling pointers.

  Currently there is no user visible support for weak references as such.
  Weak references would be useful for e.g. cache data structures.

  However, there are specialized internal weak references which need to be
  taken into account.  For instance, there is a "string access cache" which
  optimizes access to individual characters of strings.  This cache weakly
  references heap strings and must be updated when strings are collected.

Finalizer
  Objects (``duk_hobject`` and its subtypes) stored in the heap may have a
  finalizer, which is called when the object is about to be freed.  This
  allows user code to e.g. free native resources related to the object.
  A finalizer could, for instance, close a native socket or free memory
  allocated outside Duktape tracking.  Finalizers are not required or
  supported by the E5 standard.

  Finalizers require a separate implementation mechanism for reference
  counting and mark-and-sweep; these two implementations need to coexist
  peacefully.

Allocation functions
====================

Raw functions
-------------

When creating a ``duk_heap``, three memory allocation related functions
are associated with the heap: ``alloc``, ``realloc``, and ``free``.
The related typedefs are::

  typedef void *(*duk_alloc_function) (void *udata, size_t size);
  typedef void *(*duk_realloc_function) (void *udata, void *ptr, size_t size);
  typedef void (*duk_free_function) (void *udata, void *ptr);

The semantics of these functions are essentially the same as their ANSI C
equivalents.  In particular:

* The return value for a zero-sized ``alloc`` and ``realloc`` may be
  ``NULL`` or some non-``NULL``, unique pointer value.  Whatever the return
  value is, it must be accepted by ``realloc`` and ``free``.

* ``realloc(NULL, size)`` is equivalent to ``malloc(size)``.

* ``realloc(ptr, 0)`` is equivalent to ``free(ptr)`` (assuming ``ptr`` is
  not ``NULL``), and must either return ``NULL`` or some non-``NULL`` unique
  pointer value accepted by ``realloc`` and ``free``.

* ``free(NULL)`` is a no-op.

The default implementations map directly to the corresponding ANSI C
functions (``udata`` is ignored).  If the platform allocator does not
fulfill the ANSI C requirements, replacement functions must be provided by
user code.

The memory returned by the allocation and reallocation functions must be
properly aligned to support Duktape data structures.  In particular, it
must be possible, as far as alignment is concerned, to store a ``double``
or an ``int64_t`` at the start of the returned memory.  This does not
always imply alignment by 8: on x86 there is usually no alignment
requirement at all, while on ARM alignment by 4 usually suffices.  Even
when not strictly required, some level of alignment is often good for
performance.  (Technically these alignment requirements differ from the
ANSI C requirements, especially when the allocation size is smaller than 8
bytes, but these cases don't really matter with Duktape.)

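
As an illustration of the contract described above, the following is a
minimal sketch of a user-supplied allocator set which counts live blocks
through ``udata``.  The typedefs are repeated from the text so that the
example is self-contained.  The names ``my_alloc_stats``, ``my_alloc``,
``my_realloc``, ``my_free`` and ``my_alloc_demo`` are hypothetical, and
registering the functions into a heap is not shown.  The sketch relies on
the platform ``malloc()`` returning memory aligned for ``double``, which
ANSI C guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Typedefs repeated from the text above. */
typedef void *(*duk_alloc_function) (void *udata, size_t size);
typedef void *(*duk_realloc_function) (void *udata, void *ptr, size_t size);
typedef void (*duk_free_function) (void *udata, void *ptr);

/* Hypothetical statistics block, passed to every call through udata. */
typedef struct {
    size_t live_blocks;   /* blocks currently allocated */
    size_t failed_calls;  /* non-zero sized requests that returned NULL */
} my_alloc_stats;

void *my_alloc(void *udata, size_t size) {
    my_alloc_stats *st = (my_alloc_stats *) udata;
    void *res;
    if (size == 0) {
        return NULL;  /* NULL is an allowed result for a zero-sized request */
    }
    res = malloc(size);
    if (res != NULL) { st->live_blocks++; } else { st->failed_calls++; }
    return res;
}

void *my_realloc(void *udata, void *ptr, size_t size) {
    my_alloc_stats *st = (my_alloc_stats *) udata;
    void *res;
    if (size == 0) {
        /* realloc(ptr, 0) behaves like free(ptr) and returns NULL here. */
        if (ptr != NULL) { free(ptr); st->live_blocks--; }
        return NULL;
    }
    res = realloc(ptr, size);
    if (res == NULL) {
        st->failed_calls++;  /* the old block (if any) is still valid */
        return NULL;
    }
    if (ptr == NULL) { st->live_blocks++; }  /* realloc(NULL, size) is an alloc */
    return res;
}

void my_free(void *udata, void *ptr) {
    my_alloc_stats *st = (my_alloc_stats *) udata;
    if (ptr == NULL) {
        return;  /* free(NULL) is a no-op */
    }
    free(ptr);
    st->live_blocks--;
}

/* Calls the functions through the typedef'd pointers and returns the
 * number of blocks still live at the end. */
size_t my_alloc_demo(void) {
    my_alloc_stats st = { 0, 0 };
    duk_alloc_function alloc_fn = my_alloc;
    duk_realloc_function realloc_fn = my_realloc;
    duk_free_function free_fn = my_free;
    void *p, *q, *t;

    p = alloc_fn(&st, 16);
    q = realloc_fn(&st, NULL, 32);  /* behaves like an alloc */
    t = realloc_fn(&st, q, 64);     /* grow; on failure q stays valid */
    if (t != NULL) { q = t; }
    free_fn(&st, NULL);             /* no-op */
    free_fn(&st, p);
    q = realloc_fn(&st, q, 0);      /* behaves like a free */
    assert(q == NULL);
    return st.live_blocks;
}
```

Keeping the counters behind ``udata`` rather than in globals means each
heap can be given its own statistics block.
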

Internal macros
---------------

The following internal macros use the raw allocation functions and do not
trigger garbage collection or any other side effects:

* ``DUK_ALLOC_RAW``

* ``DUK_REALLOC_RAW``

* ``DUK_FREE_RAW``

The natural downside of using these macros is that an allocation or
reallocation may fail even if some memory would be available after a
garbage collection.

The following internal macros may trigger a garbage collection (even when
not strictly out of memory):

* ``DUK_ALLOC``, ``DUK_ALLOC_CHECKED``

* ``DUK_ALLOC_ZEROED``, ``DUK_ALLOC_CHECKED_ZEROED``

* ``DUK_REALLOC``, ``DUK_REALLOC_CHECKED``

* ``DUK_REALLOC_INDIRECT``, ``DUK_REALLOC_INDIRECT_CHECKED``

* ``DUK_FREE``, ``DUK_FREE_CHECKED``

Triggering a garbage collection has a wide set of possible side effects.
If a finalizer is executed, arbitrary Ecmascript or even native code may
run.  Garbage collection side effects are discussed in detail in a separate
section below.

Memory reallocation (e.g. ``DUK_REALLOC()``) has a particularly nasty
interaction with garbage collection.  Mark-and-sweep side effects may
potentially change the original pointer being reallocated.  This must be
taken into account when retrying the reallocation operation.  There is a
separate macro for these cases, ``DUK_REALLOC_INDIRECT()``, see detailed
discussion below.

Note that even if user code is allocating buffers to be used outside of
automatic memory management, the garbage collection triggering variants are
usually preferable because memory pressure is then communicated properly
between user allocations and Duktape managed allocations.  Use the raw
variants only when invoking a garbage collection would be detrimental; this
is rarely the case, especially for user code.

Because a (non-raw) memory allocation or reallocation may invoke garbage
collection, any function or macro call which allocates memory directly or
indirectly may have such side effects.
Any direct or indirect checked memory allocations may also throw an
out-of-memory error (leading ultimately to a ``longjmp()``).

Public API
----------

The heap-associated memory allocation functions can also be called by user
code through the exposed API.  This is useful for e.g. C functions which
need temporary buffers.  Note, however, that such allocations are, of
course, not automatically managed so care must be taken to avoid memory
leaks caused by e.g. errors (``longjmp()``\ s) in user code and the
functions it calls.

The raw API calls behave essentially as direct wrappers for the memory
management functions registered into the heap.  The API calls providing
garbage collection are unchecked and simply return a ``NULL`` on errors.
A ``NULL`` is only returned when an allocation request cannot be satisfied
even after garbage collection.  Except in fatal errors, the API calls are
guaranteed to return and will hide e.g. errors thrown by finalizer
functions.

Another alternative, perhaps more robust, is to push a ``buffer`` object
into the value stack; the buffer will be automatically memory managed.
Also, if the buffer is a fixed size one, a stable pointer can be obtained
after allocation and passed anywhere in user code without further checks.
The buffer is viable until it is no longer reachable (i.e. is pushed off
the value stack and is not stored in any reachable object or variable).

The public API is::

  /* no garbage collection */
  void *duk_alloc_raw(duk_context *ctx, size_t size);
  void duk_free_raw(duk_context *ctx, void *ptr);
  void *duk_realloc_raw(duk_context *ctx, void *ptr, size_t size);

  /* may cause garbage collection, doesn't longjmp() */
  void *duk_alloc(duk_context *ctx, size_t size);
  void duk_free(duk_context *ctx, void *ptr);
  void *duk_realloc(duk_context *ctx, void *ptr, size_t size);

DUK_REALLOC() issues with mark-and-sweep; DUK_REALLOC_INDIRECT()
----------------------------------------------------------------

There is a subtle gotcha when using DUK_REALLOC().  If the initial attempt
to reallocate fails, the DUK_REALLOC() implementation will trigger a
mark-and-sweep and then retry the reallocation.  This does not work if the
mark-and-sweep may have an effect on the original pointer being
reallocated.  In that case, the second attempt to reallocate will use an
invalid "original pointer"!

A more concrete example of reallocating a valuestack (``thr->valstack``):

* Calling code calls ``DUK_REALLOC(thr, thr->valstack, new_size)``.
  Assume that the value of ``thr->valstack`` is ``P1`` at this point.

* The ``DUK_REALLOC()`` implementation attempts to use the raw realloc,
  giving ``P1`` as its pointer argument.  This attempt fails.

* A mark-and-sweep is triggered.  The mark-and-sweep invokes a number of
  finalizer methods, which cause **the same valstack** to be resized.
  This resize succeeds, and the ``thr->valstack`` pointer is updated to
  ``P2``.

* The ``DUK_REALLOC()`` implementation retries the raw realloc, again
  giving ``P1`` as its pointer argument.  Here, ``P1`` is a garbage pointer
  and the realloc call has undefined behavior.

The correct pointer for the second realloc would be ``P2``.  However, the
helper behind the macro doesn't know where the pointer came from.

A naive approach is to use an indirect realloc function which gets a
pointer to the storage location of the pointer being reallocated (e.g.
``(void **) &thr->valstack``).
The realloc implementation then looks up the current pointer again right
before every reallocation, which works correctly even if the pointer has
been changed by garbage collection.  Note that heap headers have stable
pointers, so the header which contains the pointer is never relocated and
the location of the pointer itself never changes.  Even so, this approach
suffers from C type-punning and strict aliasing issues.  Such issues could
be fixed by changing all the base pointers to a union but this would be
very invasive, of course.

The current solution is to use an indirect realloc function which gets a
callback function with a userdata pointer as its argument.  The callback is
used to request the current value of the pointer being reallocated.  This
bloats code to be strict aliasing compatible, but is the most portable way.

Implications:

* DUK_REALLOC_RAW() can be used reliably for anything, but is not
  guaranteed to succeed (even if memory would be available after garbage
  collection).

* DUK_REALLOC() can be used reliably for pointers which are guaranteed not
  to be affected by mark-and-sweep -- considering that mark-and-sweep runs
  arbitrary code, including even arbitrary native functions, e.g. as part
  of object finalization.

* DUK_REALLOC_INDIRECT() (or DUK_ALLOC() + DUK_FREE()) should be used for
  pointers which are not stable across a mark-and-sweep.  The storage
  location of such pointers must be stable, e.g. reside in the main
  allocation of a heap object.

Heap structure
==============

Overview
--------

All heap-allocated elements must be recorded in the ``duk_heap``, either
as part of the string table (for ``duk_hstring`` elements) or as part of
the "heap allocated" list (or temporary work queues).  This is required so
that all allocated elements can always be enumerated and freed, regardless
of their reference counts or reachability.

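
The tracking requirement can be illustrated with a minimal sketch of a
"heap allocated" list.  This is not the actual implementation: the types
``sketch_elem`` and ``sketch_heap`` and all function names are
hypothetical, the header is heavily simplified, and strings (which live in
the string table instead) are left out.  The point is only that heap
destruction walks the list and frees everything, without consulting
reference counts or reachability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified element header (not the real duk_heaphdr). */
typedef struct sketch_elem {
    struct sketch_elem *next;
    struct sketch_elem *prev;
    size_t refcount;
} sketch_elem;

/* Hypothetical heap structure holding only the "heap allocated" list. */
typedef struct {
    sketch_elem *heap_allocated;  /* list head */
    size_t tracked;               /* number of elements on the list */
} sketch_heap;

/* Allocate an element and link it to the head of the tracking list. */
sketch_elem *sketch_track_new(sketch_heap *heap) {
    sketch_elem *e = (sketch_elem *) calloc(1, sizeof(sketch_elem));
    if (e == NULL) {
        return NULL;
    }
    e->next = heap->heap_allocated;
    if (e->next != NULL) {
        e->next->prev = e;
    }
    heap->heap_allocated = e;
    heap->tracked++;
    return e;
}

/* Unlink one element from the tracking list and free it. */
void sketch_track_free(sketch_heap *heap, sketch_elem *e) {
    if (e->prev != NULL) {
        e->prev->next = e->next;
    } else {
        heap->heap_allocated = e->next;
    }
    if (e->next != NULL) {
        e->next->prev = e->prev;
    }
    heap->tracked--;
    free(e);
}

/* Heap destruction: free every tracked element, ignoring refcounts.
 * Returns the number of elements freed. */
size_t sketch_heap_free_all(sketch_heap *heap) {
    size_t freed = 0;
    while (heap->heap_allocated != NULL) {
        sketch_track_free(heap, heap->heap_allocated);
        freed++;
    }
    return freed;
}

/* Two elements keep a non-zero refcount (as in a reference loop), one is
 * freed individually; destroying the heap still frees the other two. */
size_t sketch_track_demo(void) {
    sketch_heap heap = { NULL, 0 };
    sketch_elem *a = sketch_track_new(&heap);
    sketch_elem *b = sketch_track_new(&heap);
    sketch_elem *c = sketch_track_new(&heap);
    if (a == NULL || b == NULL || c == NULL) {
        sketch_heap_free_all(&heap);
        return 0;
    }
    a->refcount = 1;
    b->refcount = 1;
    sketch_track_free(&heap, c);
    return sketch_heap_free_all(&heap);
}
```

A doubly linked list is used in the sketch so that a single element can be
unlinked in constant time when it is freed on its own.
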
Heap elements which are currently in use somewhere must have a positive
reference count, and they must be reachable through the actual reachability
roots starting from the ``duk_heap`` structure.  These form the actual
reachability graph from a garbage collection point of view; any objects
tracked by the heap but not part of the reachability graph are garbage and
can be freed.  Such objects, assuming reference counts are correct, either
have a zero reference count or belong to a reference cycle.

The following figure summarizes the elements managed by a single heap
structure, with arrows indicating basic reachability or ownership
relationships::

  All non-string heap elements reside in one of the following
  object lists:

    * "heap allocated"
    * "refzero work list"
    * "mark-and-sweep finalization work list"

            +-------------+  h_next  +-------------+  h_next
    .------>| duk_hobject |<-------->| duk_hbuffer |<--------> ...
    |       +-------------+ (h_prev) +-------------+ (h_prev)
    |
  +==========+     (Above illustrates "heap allocated", there are
  | duk_heap |     similar lists for "refzero" and "finalization")
  +==========+
    |  |
    |  |
    |  |           All duk_hstrings reside in the string table.
    |  |
    |  |       +--------+
    |  |       : string :      +-------------+
    |  +------>: intern :----->| duk_hstring |
    |  |       : table  :      +-------------+
    |  |       +--------+         ^      ^
    |  |                          |      :
    |  |       +------+           |      :
    |  +------>: strs :-----------'      :
    |  |       +------+  (built-in       :
    |  |                  strings)       :
    |  |       +--------+                :
    |  `------>: string :                :
    |          : access :- - - - - - - - '
    |          : cache  : (weak refs)
    |          +--------+
    |
    |
    |     (reachability graph roots)
    |
    |     +-------------+
    +---> | duk_hthread |  heap_thread: internal thread, also used
    |     +-------------+  for (some) finalization
    |
    |     +-------------+
    `---> | duk_hthread |  curr_thread: currently running thread
          +-------------+
            |
            |
            |    +----------+      +-------------+
            +--->: builtins :----->| duk_hobject |
            |    +----------+      +-------------+
            |                        |
            |                        +--> object properties
            |                        |
            |                        `--> (type specific)
            +--> object properties
            |
            +--> value stack
            |
            +--> call stack
            |
            +--> catch stack
            |                         +-------------+
            `--> resumer -----------> | duk_hthread |
                 (another duk_hthread +-------------+
                  or NULL)

Notation::

  +=====+      +-----+         +-----+
  | xxx |      | xxx |         : xxx :
  +=====+      +-----+         +-----+

  backbone   heap element     auxiliary

(Many details are omitted from the figure; for instance, there are back
pointers and duplicate pointers for faster access which are not illustrated
at all.)

The primary memory management models relate to the figure as follows
(omitting details such as recursion depth limits, finalization, interaction
between reference counting and mark-and-sweep, etc):

* Reference counting works by inspecting a reference count field which is
  a part of the header of every heap allocated element (including
  strings).  Whenever a reference is removed, the reference count of the
  target is decreased, and if the reference count becomes zero, the target
  object can be freed.  Before freeing, any outgoing references from the
  object must be iterated and the reference counts of the target heap
  elements need to be decreased, possibly setting off a cascade of further
  "refzero" situations.  Note that incoming references don't need to be
  considered: if reference counts are correct and the reference count of
  the current object is zero, there cannot be any live incoming references.

* Mark-and-sweep works by traversing the reachability graph originating
  from the ``duk_heap`` structure, marking all reachable objects, and then
  walking the comprehensive "heap allocated" list to see which objects are
  unreachable and can be freed.

The only "backbone" element which is not itself a heap element is the
``duk_heap`` object.  Heap elements include both internal and external
objects which may reference each other in an arbitrary conceptual graph.
Finally, auxiliary elements are either struct members or additional
allocations "owned" by the main heap element types.  They are an integral
part of their parent element and cannot be referenced directly by other
elements.  They are freed when their parent is freed.

The primary roots for reachability are the threads referenced by the heap
object.  In particular, the currently running thread is reachable, and the
thread structure maintains a pointer to the thread which resumed the
current thread (if any).

All heap element references ultimately reside in:

* Object properties

* Thread value stack

* Thread call stack

* Thread catch stack

* Thread resumer reference

* Compiled function constant table

* Compiled function inner function table

These references form the heap-level reachability graph, and provide the
basis for mark-and-sweep collection.

There are, of course, temporary references to both heap-allocated and
non-heap-allocated memory areas in CPU registers and the stack frames of
the C call stack.  Such references must be very carefully maintained: an
abrupt completion (concretely, a ``longjmp()``) will unwind the C stack to
some catch point (concretely, a ``setjmp()``) and any such references are
lost.  Also, any unreachable heap elements may be freed if a mark-and-sweep
is triggered directly or indirectly.
See separate discussion on error handling and memory management.

Heap elements
-------------

All heap tracked elements have a shared header structure, ``duk_heaphdr``,
defined in ``duk_heaphdr.h``.  String elements use a smaller
``duk_heaphdr_string`` header which is a prefix of ``duk_heaphdr``.  The
difference between these two headers is that ``duk_heaphdr_string`` does
not contain next/previous links required to maintain heap allocated objects
in a single or double linked list.  These are not needed because strings
are always kept in the heap-level string intern table, and are thus
enumerable (regardless of their reachability) through the string intern
hash table.

Heap-allocated elements are always allocated with a fixed size, and are
never reallocated (and hence never moved) during their life cycle.  This
allows all heap-allocated elements to be pointed to with *stable pointers*.
Non-fixed parts of an element are allocated separately and pointed to by
the main heap element.  Such allocations are "owned" by the heap element
and are automatically freed when the heap element is freed.  The upside of
having stable pointers is simplicity and compatibility with existing
allocators.  The downside is that memory fragmentation may become an issue
over time because there is no way to compact the heap.

The full size of the fixed part of the heap element needs to be known at
the time of allocation.

Normally, heap elements are typed by the tagged value (``duk_tval``) which
holds the heap pointer, or if the heap element reference is in a struct
field, the field is usually already correctly typed through its C type
(e.g. a field might have the type "``duk_hcompiledfunction *``").  However,
heap elements do have a "heap type" field as part of the ``h_flags`` field
of the header; this is not normally used, but is needed by e.g. reference
counting.
As a separate issue, some heap types (such as ``duk_hobject``) have
"sub-types" with various extended memory layouts; these are not reflected
in the heap type.

The current specific heap element types are:

* ``duk_hstring`` (heap type ``DUK_HTYPE_STRING``):

  + Fixed size allocation consisting of a header with string data following
    the header.  Header does not contain next/previous pointers (uses
    ``duk_heaphdr_string``).

  + No references to other heap elements.

* ``duk_hobject`` (heap type ``DUK_HTYPE_OBJECT``):

  + Fixed size allocation consisting of a header, whose size depends on the
    object type (``duk_hobject``, ``duk_hthread``,
    ``duk_hcompiledfunction``, or ``duk_hnativefunction``).

  + The specific "sub type" and its associated struct definition can be
    determined using object flags, using the macros:

    - ``DUK_HOBJECT_IS_COMPILEDFUNCTION``

    - ``DUK_HOBJECT_IS_NATIVEFUNCTION``

    - ``DUK_HOBJECT_IS_THREAD``

    - If none of the above are true, the object is a plain object
      (``duk_hobject`` without any extended structure)

  + Properties are stored in a separate, dynamic allocation, and contain
    references to other heap elements.

  + For ``duk_hcompiledfunction``, function bytecode, constants, and
    references to inner functions are stored in a fixed ``duk_hbuffer``
    referenced by the ``duk_hcompiledfunction`` header.  These provide
    further references to other heap elements.

  + For ``duk_hthread`` the heap header contains references to the value
    stack, call stack, catch stack, etc, which provide references to other
    heap elements.

* ``duk_hbuffer`` (heap type ``DUK_HTYPE_BUFFER``):

  + Fixed buffer (``DUK_HBUFFER_HAS_DYNAMIC()`` is false):

    - Fixed size allocation consisting of a header with buffer data
      following the header.

  + Dynamic buffer (``DUK_HBUFFER_HAS_DYNAMIC()`` is true):

    - Fixed size allocation consisting of a header with a pointer to the
      current buffer allocation following the header.

    - Buffer data is allocated separately and the buffer may be resized.
      The address of the buffer data may change during a resize.

  + No references to other heap elements.

String table
============

String interning
----------------

All strings are `interned`__ into the heap level string table: only one,
immutable copy of any particular string is ever stored at a certain point
in time.

.. __: http://en.wikipedia.org/wiki/String_interning

When a new string is constructed e.g. by string concatenation, the string
table is checked to see if the resulting string has already been interned.
If yes, the existing string is used; if not, the string is added to the
string table.  Regardless, the string is represented by a ``duk_hstring``
pointer which is stable for the lifetime of the string.

String interning has many nice features:

* When a string is interned, precomputations can be done and stored as part
  of the string representation.  For example, a string hash can be
  precomputed and used elsewhere in e.g. hash tables.  Other precomputations
  would also be possible, e.g. numeric conversions (not currently used).

* Strings can be compared using direct pointer comparisons without
  comparing actual string data, since at any given time, a given string can
  only have one ``duk_hstring`` instance with a stable address.

* Memory is saved for strings which occur multiple times.  For instance,
  object properties of the same name are simply referenced with a string
  pointer instead of storing multiple instances of the same property name.

But, there are downsides as well:

* String manipulation is slower because any intermediate, referenceable
  results need to be interned (which implies string hashing, a lookup from
  the string table, etc).  This can be mitigated e.g. by doing string
  concatenation of multiple parts in an atomic fashion.

* For small strings which only occur once or twice in the heap, there is
  additional overhead in the interned ``duk_hstring`` heap element compared
  to simply storing the string in an object's property table, for instance.
* Using string values as "data buffers" which are continuously manipulated
  (appended or prepended to, sliced, etc) is very inefficient and causes a
  lot of garbage collection churn.  Buffer objects should be used instead,
  but these are not part of the Ecmascript standard.

Memory management of strings
----------------------------

Interned strings are garbage collected normally when they are no longer
needed.  They are later re-interned if they are needed again; at this point
they usually get a different pointer than before.

String table algorithm
----------------------

The string table structure is similar to the "entry part" of the
``duk_hobject`` property allocation:

* Closed hash table (probe sequences).  Probe sequences use an initial
  index based on string hash value, and a probe step looked up from a
  precomputed table of step values using a string hash value based index.

* Hash table size is rounded upwards to a prime in a precomputed sequence.
  Hash table load factor is kept within a certain range by resizing
  whenever necessary.

* Deleted entries are explicitly marked DELETED to avoid breaking hash
  probe chains.  DELETED entries are eliminated on rehashing, and are
  counted as "used" entries before a resize to ensure there are always NULL
  entries in the string table to break probe sequences.

For more details, see:

* ``hstring-design.txt`` for discussion on the string hash algorithm.

* ``hobject-design.txt``, entry part hash algorithm, for discussion on the
  basic closed hash structure.

.. note:: This discussion should be expanded.

Reference counting
==================

Introduction
------------

For background, see:

* http://en.wikipedia.org/wiki/Reference_counting

In basic reference counting each heap object has a reference count field
which indicates how many other objects in the heap point to this object.
Whenever a new reference is created, its target object's reference count is
incremented; whenever a reference is destroyed, its target object's
reference count is decremented.  If a reference count goes to zero when it
is decremented, the object can be freed directly.  When the object is freed,
any heap objects it refers to need to have their reference counts
decremented, which may trigger an arbitrarily long chain of objects to be
freed recursively.

There are variations of reference counting where objects are not freed
immediately after their reference count goes to zero.  Objects-to-be-freed
can be managed in a work list and freed later.  However, for our purposes
it is useful to free any reference counted objects as soon as possible
(otherwise we could just use the mark-and-sweep collector).

There are also reference counting variants which handle reference loops
correctly without resorting to mark-and-sweep.  These seem to be too
complex in practice for a small interpreter.

Reference counting increases code size, decreases performance due to
reference count updates, and increases heap header size for every object.
On the other hand it minimizes variance in memory usage (compared to plain
mark-and-sweep, even an incremental one) and is very useful for small
scripts running without a pre-allocated heap.  Reference counting also
reduces the impact of having non-relocatable heap elements: memory
fragmentation still happens, but is comparable to memory fragmentation
encountered by ordinary C code.

Reference count field
---------------------

The reference count field is embedded into the ``duk_heaphdr`` structure
whose layout varies depending on the memory management model chosen for the
build.  The reference count field applies to all heap allocated elements,
including strings, so it appears in the header before the next/previous
pointers required for managing non-string heap elements.

The current struct definitions are in ``duk_heaphdr.h``.
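
The layout idea can be sketched as follows.  This is only an illustration
under stated assumptions, not the actual definitions: all ``sketch_``
names and the ``h_refcount`` and ``h_common`` field names are hypothetical,
and the sketch embeds the shorter header as the first member of the longer
one, which is one portable way to get a shared prefix.  Because the
reference count lives in the shared prefix, the same helpers work for
string and non-string elements alike.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical short header: flags and reference count only. */
typedef struct sketch_heaphdr_string {
    uint32_t h_flags;
    size_t h_refcount;
} sketch_heaphdr_string;

/* Hypothetical full header: the short header first, then list links. */
typedef struct sketch_heaphdr {
    sketch_heaphdr_string h_common;  /* shared prefix */
    struct sketch_heaphdr *h_next;
    struct sketch_heaphdr *h_prev;
} sketch_heaphdr;

/* Increment the refcount of any element; NULL is tolerated and yields 0.
 * A pointer to a struct may be converted to a pointer to its first
 * member, so both header types can be passed in. */
size_t sketch_incref(void *elem) {
    sketch_heaphdr_string *h = (sketch_heaphdr_string *) elem;
    if (h == NULL) {
        return 0;
    }
    return ++h->h_refcount;
}

/* Decrement the refcount and return the new value; NULL yields 0.
 * Freeing on zero is deliberately left out of this sketch. */
size_t sketch_decref(void *elem) {
    sketch_heaphdr_string *h = (sketch_heaphdr_string *) elem;
    if (h == NULL) {
        return 0;
    }
    assert(h->h_refcount > 0);
    return --h->h_refcount;
}
```
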
Two structures are defined:

* ``duk_heaphdr``: applies to all heap elements except strings.

* ``duk_heaphdr_string``: applies to strings, beginning of struct matches
  ``duk_heaphdr``.

The reference count field must have enough bits to ensure that it will
never overflow.  This is easy to satisfy by making the field as large as a
data pointer type.  Currently ``size_t`` is used which is technically
incorrect (one could for instance have a platform with maximum allocation
size of 32 bits but a memory space of 64 bits).

Reference count macros
----------------------

Macros:

* ``DUK_TVAL_INCREF``

* ``DUK_TVAL_DECREF``

* ``DUK_HEAPHDR_INCREF``

* ``DUK_HEAPHDR_DECREF``

* and a bunch of heap element type specific INCREF/DECREF macros, defined
  in ``duk_heaphdr.h``

Notes on macro semantics:

* The macros tolerate ``NULL`` pointers, which are simply ignored.  This
  reduces caller code size but requires a pointer check which is
  unnecessary in the vast majority of cases.

* An ``INCREF`` is guaranteed not to have any side effects.

* A ``DECREF`` may have a wide variety of side effects.

  + ``DECREF`` may free the target object and an arbitrary number of other
    objects whose reference count drops to zero as a result.

  + If a finalizer is invoked, arbitrary C or Ecmascript code is executed
    which may have essentially arbitrary side effects, including triggering
    the mark-and-sweep garbage collector.

  + The mark-and-sweep garbage collector may also be voluntarily invoked at
    the end of "refzero" handling.

  + Any ``duk_tval`` pointers pointing to dynamic structures (like a value
    stack) may be invalidated; heap element pointers are not affected
    because they are stable.  See discussion on "side effects" below for
    more particulars on the implementation impact.

Updating reference counts
-------------------------

Updating reference counts is a bit tricky.  Some important rules:

* Whenever a ``longjmp()`` or garbage collection may occur, reachability
  and reference counts must be correct.
* If a reference count drops to zero, even temporarily, the target is
  *immediately* freed.  If this is not desired, ``INCREF``/``DECREF`` order
  may need to be changed.

* A ``DECREF`` call may invalidate *any* ``duk_tval`` pointers to resizable
  locations, such as the value stack.  It may also invalidate indices to
  object property structures if a property allocation is resized.  So,
  ``DECREF`` must be called with utmost care.

Note that it is *not enough* to artificially increase a target's reference
count to prevent the object from being freed, at least when mark-and-sweep
collection is also enabled.  Mark-and-sweep may be triggered very easily,
and *will* free an unreachable object, regardless of its reference count,
unless specific measures are taken to avoid it.  In fact, mark-and-sweep
*must* collect unreachable objects with a non-zero reference count, to deal
with reference loops which cannot be collected using reference counting
alone.

Even if mark-and-sweep issues were avoided (perhaps with a flag preventing
collection), if a reference count is artificially increased without there
being a corresponding, actual heap-based reference to the target, there
must be a guarantee that the reference count is also decreased later.  This
would require a ``setjmp()`` catchpoint.

Specific considerations:

* ``DECREF`` + ``INCREF`` on the same target object is dangerous.  If the
  refcount drops to zero between the calls, the object is freed.  It's
  usually preferable to do ``INCREF`` + ``DECREF`` instead to avoid this
  potential issue.

The INCREF algorithm
--------------------

The ``INCREF`` algorithm is very simple:

1. If the target reference is ``NULL`` or the target is not a heap element,
   return.

2. Increase the target's reference count by one.

The practical implementation depends on whether ``INCREF`` is used on a
tagged value pointer or a heap element pointer.

The DECREF algorithm
--------------------

The ``DECREF`` algorithm is a bit more complicated:

1. If the target reference is ``NULL`` or the target is not a heap element,
   return.

2. Decrease the target's reference count by one.

3. If the reference count dropped to zero:

   a. If mark-and-sweep is currently running, ignore and return.
      (Note: mark-and-sweep is expected to perform a full reachability
      analysis and have correct reference counts at the end of the
      mark-and-sweep algorithm.)

   b. If the target is a string:

      1. Remove the string from the string table.

      2. Remove any references to the string from the "string access cache"
         (which accelerates character index to byte index conversions).
         Note that this is a special, internal "weak" reference.

      3. Free the string.  There are no auxiliary allocations to free for
         strings.

      4. Return.

   c. If the target is a buffer:

      1. Remove the buffer from the "heap allocated" list.

      2. If the buffer is dynamic, free the auxiliary buffer (which is
         allocated separately).

      3. Free the buffer.

      4. Return.

   d. Else the target is an object:

      1. Move the object from the "heap allocated" list to the "refzero"
         work list.  Note that this prevents the mark-and-sweep algorithm
         from freeing the object (the "sweep" phase does not affect objects
         in the "refzero" work list).

      2. If the "refzero" algorithm is already running, return.

      3. Else, call the "refzero" algorithm to free pending objects.
         The refzero algorithm returns when the entire work list has been
         successfully cleared.

      4. Return.

The REFZERO algorithm
---------------------

The ``DECREF`` algorithm ensures that only one instance of the "refzero"
algorithm may run at any given time.  The "refzero" work list model is used
to avoid an unbounded C call stack depth caused by a cascade of reference
counts which drop to zero.

The algorithm is as follows:

1. While the "refzero" work list is not empty:

   a. Let ``O`` be the element at the head of the work list.
      Note:

      * ``O`` is always an object, because only objects are placed in the
        work list.

      * ``O`` must not be removed from the work list yet.  ``O`` must be on
        the work list in case a finalizer is executed, so that a
        mark-and-sweep triggered by the finalizer works correctly
        (concretely: to be able to clear the
        ``DUK_HEAPHDR_FLAG_REACHABLE`` of the object).

   b. If ``O`` is an object (this is always the case, currently), and has a
      finalizer (i.e. has a ``_finalizer`` internal property):

      1. Create a ``setjmp()`` catchpoint.

      2. Increase the reference count of ``O`` temporarily by one (back
         to 1).

      3. Note: the presence of ``O`` in the "refzero" work list is enough
         to guarantee that the mark-and-sweep algorithm will not free the
         object despite it not being reachable.

      4. Call the finalizer method.  Ignore the return value and a possible
         error thrown by the finalizer (except for debug logging an error).
         Any error or other ``longjmp()`` is caught by the ``setjmp()``
         catchpoint.
         Note:

         * The thread used for finalization is currently the thread which
           executed ``DECREF``.  *This is liable to be changed later.*

      5. Regardless of how the finalizer finishes, decrease the reference
         count of ``O`` by one.

      6. If the reference count of ``O`` is non-zero, the object has been
         "rescued" and:

         a. Place the object back into the "heap allocated" list (and debug
            log the object as "rescued").

         b. Continue the while-loop with the next object.

   c. Remove ``O`` from the work list.

   d. Call ``DECREF`` for any references that ``O`` contains (this is
      called "refcount finalization" in the source).  Concretely:

      * String: no internal references.

      * Buffer: no internal references.

      * Object: properties contain references; specific sub-types (like
        ``duk_hthread``) contain further references.

      * Note: this step is recursive with respect to ``DECREF`` but not the
        "refzero" algorithm: a ``DECREF`` is executed inside a ``DECREF``
        which started the "refzero" algorithm, but the inner ``DECREF``
        doesn't restart the "refzero" algorithm.  Recursion is thus limited
        to two levels.

   e.
Free any auxiliary references (such as object properties) contained in ``O``, and finally ``O`` itself. 2. Check for a voluntary mark-and-sweep. Notes: * "Churning" the work list requires that the type of a heap element can be determined by looking at the heap header. + This is one of the rare places where this would be necessary: usually the tagged type of a ``duk_tval`` is sufficient to type an arbitrary value, and when following pointer references from one heap element to another, the pointers themselves are typed. + Right now, this type determination is not actually needed because only object (``duk_hobject``) values will be placed in the work list. * The finalizer thread selection is not a trivial issue, especially for mark-and-sweep. See discussion under mark-and-sweep. * Because the reference count is artificially increased by one during finalization, the object being finalized cannot encounter a "refcount drops to zero" situation while being finalized (assuming of course that all ``INCREF`` and ``DECREF`` calls are properly "nested"). * If mark-and-sweep is triggered during finalization, the target may or may not be reachable, but will have a non-zero reference count in either case due to the artificial ``INCREF`` in the finalization algorithm. The reference count is inconsistent with the actual reference count in the reachability graph but this is not an issue for mark-and-sweep. In any case, mark-and-sweep will not free any object in the "refzero" work list, regardless of its reachability status, so mark-and-sweep during REFZERO is not a problem. * Although finalization increases C call stack size, another finalization triggered by reference counting cannot occur while finalization for one object is in progress: any objects whose refcounts drop to zero during finalization are simply placed in the refzero work list and dealt with when the object being finalized has been fully processed.
However, there can still be **two** active finalizers at the same time, one initiated by reference counting and another by a mark-and-sweep triggered inside REFZERO. Background on the refzero algorithm, limiting C recursion depth --------------------------------------------------------------- When a reference count drops to zero, the heap element will be freed. If the heap element contains references (like an Ecmascript object does), all target elements need to be ``DECREF``'d before the element is freed. These ``DECREF`` calls may cause the reference count of further elements to drop to zero; this "cascade" of zero reference counts may be arbitrarily long. Since we need to live with limited and sometimes very small C stacks in some embedded environments (some environments may have less than 64 kilobytes of usable stack), the reference count zero handling must have a limited C recursion level to work reliably. This is currently handled by using a "work list" model. Heap elements whose reference count has dropped to zero are placed in a "to be freed" work list (see ``duk_heap`` structure, ``refzero_list`` member in ``duk_heap.h``). The list is then freed using a loop which frees one element at a time until the list is empty. New elements may be added to the list while it is being iterated. The C recursion level is fixed. The ``h_prev``/``h_next`` fields of the ``duk_heaphdr`` structure, normally used for the "heap allocated" list, are used for the "refzero" work list. Because ``duk_hstring``\ s do not have embedded references, they are freed directly when their reference count drops to zero. This is fortunate, because strings don't have ``h_prev``/``h_next`` fields at all. *Finalization* of an object whose refcount becomes zero is very useful for e.g. freeing any native resources or handles associated with an object. For instance, socket or file handles can be closed when the object is being freed.
The finalizer is an internal method associated with a ``duk_hobject`` which is called just before the object is freed either by reference counting or by the mark-and-sweep collector. The finalizer gets a reference to the object in question, and may "rescue" the reference. Mark-and-sweep may be triggered during the "refzero" algorithm, currently only by finalization. If mark-and-sweep is triggered, it must not touch any object in the "refzero" work list (i.e. any object whose reference count is zero, but which has not yet been processed). Mark-and-sweep ============== Introduction ------------ For background, see: * http://en.wikipedia.org/wiki/Garbage_collection_(computer_science) The variant used is a "stop the world" mark-and-sweep collector, which is used instead of an incremental one for simplicity and small footprint. When combined with reference counting, the mark-and-sweep collector is only required for handling reference cycles anyway, so the particular variant is not that important. A definite downside of a "stop the world" collector is that it introduces an annoying pause in application behavior which is otherwise avoided by reference counting. The mark-and-sweep algorithm used has support for: * object finalization (requires two collector passes) * object compaction (in emergency mode) * string table resizing An "emergency mode" is provided for situations where allocation fails repeatedly, even after a few ordinary mark-and-sweep attempts. In emergency mode the collector tries to find memory even by expensive means (such as forcibly compacting object property allocations). Control flags are also provided to limit side effects of mark-and-sweep, which is required to implement a few critical algorithms: resizing the string table, and resizing object property allocation. During these operations mark-and-sweep must avoid interfering with the object being resized.
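
The escalation from ordinary to emergency collections can be sketched roughly as follows. This is a minimal illustration and not the actual Duktape code: ``sketch_heap``, ``sketch_alloc_checked()``, the attempt limits, and the fake allocator used for demonstration are all hypothetical names and values chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; they are not Duktape identifiers. */
#define SKETCH_MS_FLAG_EMERGENCY (1u << 0)
#define SKETCH_NORMAL_ATTEMPTS   2   /* ordinary collections tried first */
#define SKETCH_TOTAL_ATTEMPTS    5   /* ordinary + emergency collections */

typedef struct {
    void *(*raw_alloc)(void *udata, size_t size);
    void (*mark_and_sweep)(void *udata, unsigned flags);
    void *udata;
} sketch_heap;

/* Try the raw allocator; on failure run a collection and retry. The first
 * few collections are ordinary, later ones request emergency mode.
 */
static void *sketch_alloc_checked(sketch_heap *heap, size_t size) {
    void *res;
    int i;

    assert(heap != NULL);
    res = heap->raw_alloc(heap->udata, size);
    if (res != NULL) {
        return res;
    }
    for (i = 0; i < SKETCH_TOTAL_ATTEMPTS; i++) {
        unsigned flags = (i >= SKETCH_NORMAL_ATTEMPTS) ? SKETCH_MS_FLAG_EMERGENCY : 0u;
        heap->mark_and_sweep(heap->udata, flags);
        res = heap->raw_alloc(heap->udata, size);
        if (res != NULL) {
            return res;
        }
    }
    return NULL;  /* still out of memory; the caller decides what to do */
}

/* Demonstration helpers: an allocator that only succeeds after a given
 * number of collections have run.
 */
typedef struct {
    int collections_needed;
    int collections_run;
    int emergency_runs;
} sketch_fake_state;

static void *sketch_fake_alloc(void *udata, size_t size) {
    sketch_fake_state *st = (sketch_fake_state *) udata;
    if (st->collections_run < st->collections_needed) {
        return NULL;
    }
    return malloc(size);
}

static void sketch_fake_gc(void *udata, unsigned flags) {
    sketch_fake_state *st = (sketch_fake_state *) udata;
    st->collections_run++;
    if (flags & SKETCH_MS_FLAG_EMERGENCY) {
        st->emergency_runs++;
    }
}
```

With the fake allocator configured to need three collections, the call succeeds on the first emergency pass; if memory never becomes available, the function gives up after ``SKETCH_TOTAL_ATTEMPTS`` collections and returns ``NULL``.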
Mark-and-sweep flags -------------------- Mark-and-sweep control flags are defined in ``duk_heap.h``: * ``DUK_MS_FLAG_EMERGENCY`` * ``DUK_MS_FLAG_NO_STRINGTABLE_RESIZE`` * ``DUK_MS_FLAG_NO_FINALIZERS`` * ``DUK_MS_FLAG_NO_OBJECT_COMPACTION`` In addition to the explicitly requested flags, the bit mask in ``mark_and_sweep_base_flags`` in ``duk_heap`` is bitwise ORed into the requested flags to form effective flags. The flags added to the "base flags" control restrictions on mark-and-sweep side effects, and are used for certain critical sections. To protect against such side effects, the critical algorithms: * Store the original value of ``heap->mark_and_sweep_base_flags`` * Set the suitable restriction flags into ``heap->mark_and_sweep_base_flags`` * Attempt the allocation / reallocation operation, *without throwing errors* * Restore the ``heap->mark_and_sweep_base_flags`` to its previous value * Examine the allocation result and act accordingly It is important not to throw an error without restoring the base flags field. The concrete flags used are: * String table resize: + ``DUK_MS_FLAG_NO_STRINGTABLE_RESIZE``: prevents another stringtable resize attempt when one is already running + ``DUK_MS_FLAG_NO_FINALIZERS``: prevent finalizers from adding new interned strings to the string table, possibly requiring a resize + ``DUK_MS_FLAG_NO_OBJECT_COMPACTION``: prevent object compaction, because object compaction may lead to an array part being abandoned, which leads to string interning of array keys. * Object property allocation resize: + ``DUK_MS_FLAG_NO_FINALIZERS``: prevent finalizers from manipulating the properties of any object. It would suffice to protect only the object being resized, but a finalizer may potentially operate on any set of objects; hence no finalizers are executed at all. + ``DUK_MS_FLAG_NO_OBJECT_COMPACTION``: prevent objects from being compacted (i.e., resized). 
It would suffice to protect only the object being resized from a recursive resize; this is currently not done, however, but would be easy to fix. Heap header flags ----------------- The following flags in the heap element header are used for controlling mark-and-sweep: * ``DUK_HEAPHDR_FLAG_REACHABLE``: element is reachable through the reachability graph * ``DUK_HEAPHDR_FLAG_TEMPROOT``: element's reachability has been marked, but its children have not been processed; this is required to limit the C recursion level * ``DUK_HEAPHDR_FLAG_FINALIZABLE``: element is not reachable after the first marking pass (see algorithm), has a finalizer, and the finalizer has not been called in the previous mark-and-sweep round; object will be moved to the finalization work list and will be considered (temporarily) a reachability root * ``DUK_HEAPHDR_FLAG_FINALIZED``: element's finalizer has been executed, and if still unreachable, object can be collected These are referred to as ``REACHABLE``, ``TEMPROOT``, ``FINALIZABLE``, and ``FINALIZED`` below for better readability. All the flags are clear when a heap element is first allocated. Explicit "clearing passes" are avoided by careful handling of the flags so that the flags are always in a known state when mark-and-sweep begins and ends. Basic algorithm --------------- The mark-and-sweep algorithm is triggered by a failed memory allocation either in "normal" mode or "emergency" mode. Emergency mode is used if a normal mark-and-sweep pass did not resolve the allocation failure; the emergency mode is a more aggressive attempt to free memory. Mark-and-sweep is controlled by a set of flags. The effective flags set is a bitwise OR of explicit flags and "base flags" stored in ``heap->mark_and_sweep_base_flags``. The "base flags" essentially prohibit specific garbage collection operations (like finalizers) when a certain critical code section is active. The mark-and-sweep algorithm is as follows: 1. 
The ``REACHABLE`` and ``TEMPROOT`` flags of all heap elements are assumed to be cleared at this point. * Note: this is the case for all elements regardless of whether they reside in the string table, the "heap allocated" list, the "refzero" work list, or anywhere else. 2. **Mark phase**. The reachability graph is traversed recursively, and the ``REACHABLE`` flag is set for all reachable elements. This is complicated by the necessity to impose a limit on maximum C recursion depth: a. At the beginning the heap level flag ``DUK_HEAP_FLAG_MARKANDSWEEP_RECLIMIT_REACHED`` is asserted to be cleared. b. The reachability graph of the heap is traversed with a depth-first algorithm: 1. Marking starts from the reachability roots: * the heap structure itself (including the current thread, its resuming thread, etc) * the "refzero_list" for reference counting 2. If the reachability traversal hits the C recursion limit (``mark_and_sweep_recursion_limit`` member of the heap) for some heap element ``E``: a. The ``DUK_HEAP_FLAG_MARKANDSWEEP_RECLIMIT_REACHED`` flag is set. b. The reachability status of ``E`` is updated, but its internal references are not processed (to avoid further recursion). c. The ``TEMPROOT`` flag is set for ``E``, indicating that it should be processed later. 3. Unreachable objects which need finalization (but whose finalizers haven't been executed in the last round) are marked FINALIZABLE and are marked as reachable with the normal recursive marking algorithm. 4. The algorithm of step 2 (handling ``TEMPROOT`` markings) is repeated to ensure reachability graph has been fully processed (elements are marked reachable and TEMPROOT flags are set), also for the objects just marked FINALIZABLE. c. While the ``DUK_HEAP_FLAG_MARKANDSWEEP_RECLIMIT_REACHED`` flag is set for the heap: 1. Clear the ``DUK_HEAP_FLAG_MARKANDSWEEP_RECLIMIT_REACHED`` flag of the heap. 2.
Scan all elements in the "heap allocated" or "refzero work list" (note that "refzero work list" *must* be included here but not in the sweep phase). For each element with the ``TEMPROOT`` flag set: a. Clear the ``TEMPROOT`` flag. b. Process the internal references of the element recursively, imposing a similar recursion limit as before (i.e. setting the ``DUK_HEAP_FLAG_MARKANDSWEEP_RECLIMIT_REACHED`` flag etc). 3. **Sweep phase 1 (refcount adjustments)**. Inspect all heap elements in the "heap allocated" list (string table doesn't need to be considered as strings have no internal references): a. If the heap element would be freed in sweep phase 2 (i.e., element is not reachable, and has no finalizer which needs to be run): 1. Decrease reference counts of heap elements the element points to, but don't execute "refzero" queueing or the "refzero" algorithm. Any elements whose refcount drops to zero will be dealt with by mark-and-sweep and objects in the refzero list are handled by reference counting. 4. **Sweep phase 2 (actual freeing)**. Inspect all heap elements in the "heap allocated" list and the string table (note that objects in the "refzero" work list are NOT processed and thus never freed here): a. If the heap element is ``REACHABLE``: 1. If ``FINALIZED`` is set, the object has been rescued by the finalizer. This requires no action as such, but can be debug logged. 2. Clear ``REACHABLE`` and ``FINALIZED`` flags. 3. Continue with next heap element. b. Else the heap element is not reachable, and: 1. If the heap element is a ``duk_hobject`` (its heap type is ``DUK_HTYPE_OBJECT``) and the object has a finalizer (i.e. it has the internal property ``_finalizer``), and the ``FINALIZED`` flag is not set: a. Move the heap element from "heap allocated" to "to be finalized" work list. b. Continue with next heap element. 2. Free the element and any of its "auxiliary allocations". 3. Continue with next heap element. 5. For every heap element in the "refzero" work list: a.
Clear the element's ``REACHABLE`` flag. (See notes below why this seemingly unnecessary step is in fact necessary.) 6. If doing an emergency mark-and-sweep and object compaction is not explicitly prohibited by heap flags: a. Compact the object's property allocation in the hopes of freeing memory for the emergency. 7. If string table resize is not explicitly prohibited by heap flags: a. Compact and rehash the string table. This can be controlled by build flags as it may not be appropriate in all environments. 8. Run finalizers: a. While the "to be finalized" work queue is not empty: 1. Select object from head of the list. 2. Set up a ``setjmp()`` catchpoint. 3. Execute finalizer. Note: * The thread used for this is the currently running thread (``heap->curr_thread``), or if no thread is running, ``heap->heap_thread``. This is liable to change in the future. 4. Ignore finalizer result (except for logging errors). 5. Mark the object ``FINALIZED``. 6. Move the object back to the "heap allocated" list. The object will be collected on the next pass if it is still unreachable. (Regardless of actual reachability, the ``REACHABLE`` flag of the object is clear at this point.) 9. Finish. a. All ``TEMPROOT`` and ``REACHABLE`` flags are clear at this point. b. All "heap allocated" elements either (a) are reachable and have a non-zero reference count, or (b) were finalized and their reachability status is unknown. c. The "to be finalized" list is empty. d. No object in the "refzero" work list has been freed. Notes: * Elements on the refzero list are considered reachability roots, as we need to preserve both the object itself (which happens automatically because we don't sweep the refzero_list) and its children. If the refzero list elements were not considered reachability roots, their children might be swept by the sweep phase. 
This would be problematic for processing the objects in the refzero list, regardless of whether they have a finalizer or not, as some references would be dangling pointers. * Elements marked FINALIZABLE are considered reachability roots to ensure that their children (e.g. property values) are not swept during the sweep phase. This would obviously be problematic for running the finalizer, regardless of whether the object would be rescued or not. * While mark-and-sweep is running: + Another mark-and-sweep cannot execute. + A ``DECREF`` resulting in a zero reference count is not processed at all. The object is not placed into the "refzero" work list, as mark-and-sweep is assumed to be a comprehensive pass, including running finalizers. * Finalizers are executed after the sweep phase to ensure that finalizers have as much available memory as possible. Since mark-and-sweep is running, if a finalizer runs out of memory, no memory can be reclaimed as recursive mark-and-sweep is explicitly blocked. This is probably a very minor issue in practice. * Finalizers could be executed from their work list after the mark-and-sweep has finished to allow mark-and-sweep to run if mark-and-sweep is required by a finalizer. The mark-and-sweep could then append more objects to be finalized into the "to be finalized" work list; this is not a problem. However, since finalizers are used with a rather limited scope, this is not currently done. * The sweep phase is divided into two separate scans: one to adjust refcounts and one to actually free the objects. If these were performed in a single heap scan, refcount adjustments might refer to already freed heap elements (dangling pointers). This may happen even without reference counting bugs for unreachable reference loops. 
* Clearing the ``REACHABLE`` flags explicitly for objects in the "refzero" list is necessary: + The "refzero" work list is not processed at all in the sweep phase but the marking phase could theoretically mark objects in the "refzero" work list. Since the sweeping phase is the only place where ``REACHABLE`` flags are cleared, some object in the "refzero" work list might be left with its ``REACHABLE`` flag set at the end of the algorithm. At first it might seem that this can never happen if reference counts are correct: all objects in the "refzero" work list are unreachable by definition. However, this is not the case for objects with finalizers. + A finalizer call made by the "refzero" algorithm makes the object reachable again (through the finalizer thread value stack; the finalizer method itself can also create reachable references for the target). If a mark-and-sweep is triggered during finalization, the target will be marked ``REACHABLE`` during the mark phase. Thus, ``REACHABLE`` flags of "refzero" work list elements must be cleared explicitly after or during the sweep phase. Note that there is a small "hole" in the reclamation right now, when mark-and-sweep finalizers are used: * If a finalizer executed by mark-and-sweep removes a reference to another object (not the object being finalized), causing the target object's reference count to drop to zero, the object is *not* placed in the "refzero" work list, as mark-and-sweep is still running. * As a result, the object will be unreachable and will not be freed by the reference count algorithm, regardless of whether the object was part of a reference loop. Instead, the next mark-and-sweep will free the object. If the object has a finalizer, the finalizer will be called later than would be preferable. * This is not ideal but will not result in memory leaks, so it's not really worth fixing right now. 
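
The recursion-limited mark phase described in the basic algorithm can be illustrated with a small toy model. This is only a sketch under simplifying assumptions: the element layout, the ``sketch_*`` names, and the fixed child array are hypothetical, and only a single "heap allocated" list is scanned for ``TEMPROOT`` elements (the text above also requires scanning the "refzero" work list).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy types and flags; not the Duktape definitions. */
#define SKETCH_FLAG_REACHABLE (1u << 0)
#define SKETCH_FLAG_TEMPROOT  (1u << 1)
#define SKETCH_MAX_CHILDREN   2

typedef struct sketch_elem sketch_elem;
struct sketch_elem {
    unsigned flags;
    sketch_elem *next;                          /* "heap allocated" list link */
    sketch_elem *children[SKETCH_MAX_CHILDREN]; /* outgoing references */
};

typedef struct {
    sketch_elem *heap_allocated;
    int reclimit_reached;   /* heap level "recursion limit reached" flag */
    int recursion_limit;
    int recursion_depth;
    int max_depth_seen;     /* instrumentation for the example only */
} sketch_heap;

static void sketch_mark(sketch_heap *heap, sketch_elem *e);

/* Mark the children of an element, accounting for one recursion level. */
static void sketch_mark_children(sketch_heap *heap, sketch_elem *e) {
    int i;
    heap->recursion_depth++;
    if (heap->recursion_depth > heap->max_depth_seen) {
        heap->max_depth_seen = heap->recursion_depth;
    }
    for (i = 0; i < SKETCH_MAX_CHILDREN; i++) {
        sketch_mark(heap, e->children[i]);
    }
    heap->recursion_depth--;
}

/* Mark one element; at the recursion limit, defer its children by
 * setting TEMPROOT instead of recursing further.
 */
static void sketch_mark(sketch_heap *heap, sketch_elem *e) {
    if (e == NULL || (e->flags & SKETCH_FLAG_REACHABLE)) {
        return;
    }
    e->flags |= SKETCH_FLAG_REACHABLE;
    if (heap->recursion_depth >= heap->recursion_limit) {
        e->flags |= SKETCH_FLAG_TEMPROOT;
        heap->reclimit_reached = 1;
        return;
    }
    sketch_mark_children(heap, e);
}

/* Rescan the heap list until no deferred (TEMPROOT) elements remain. */
static void sketch_mark_temproots(sketch_heap *heap) {
    while (heap->reclimit_reached) {
        sketch_elem *e;
        heap->reclimit_reached = 0;
        for (e = heap->heap_allocated; e != NULL; e = e->next) {
            if (e->flags & SKETCH_FLAG_TEMPROOT) {
                e->flags &= ~SKETCH_FLAG_TEMPROOT;
                sketch_mark_children(heap, e);
            }
        }
    }
}

/* Example helper: link an array into a reference chain and the heap list. */
static void sketch_link_chain(sketch_heap *heap, sketch_elem *elems, size_t n) {
    size_t i;
    for (i = 0; i < n; i++) {
        elems[i].children[0] = (i + 1 < n) ? &elems[i + 1] : NULL;
        elems[i].next = heap->heap_allocated;
        heap->heap_allocated = &elems[i];
    }
}
```

Marking a reference chain longer than the recursion limit then takes several rescans of the heap list, but the C recursion depth never exceeds the limit. The trade-off is extra linear scans of the heap in exchange for a bounded C stack.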
Interactions between reference counting and mark-and-sweep ========================================================== If mark-and-sweep is triggered e.g. by an out-of-memory condition, reference counting is essentially "disabled" for the duration of the mark-and-sweep phase: * Reference counts are updated normally. In fact, mark-and-sweep uses the same refcount macros to update element refcounts while freeing them. * If a reference count reaches zero after a ``DECREF`` operation, the object is not freed nor is it placed on the "refzero" work list because mark-and-sweep is expected to deal with the object directly. If the "refzero" algorithm is triggered first (with some objects in the "refzero" work list), mark-and-sweep may be triggered while the "refzero" algorithm is running. In more detail: * A ``DECREF`` happens while neither mark-and-sweep nor "refzero" algorithm is running. * A reference count reaches zero, and the object is placed on the "refzero" work list and the "refzero" algorithm is invoked. * The "refzero" algorithm cannot trigger another "refzero" algorithm to execute recursively. Instead, the work list is churned until it becomes empty. Any objects whose reference count reaches zero are added to the work list, though, so will be processed eventually. * The "refzero" algorithm may trigger a mark-and-sweep while it is running, e.g. by running a finalizer which runs out of memory: + This mark-and-sweep will mark any elements in the "refzero" work list but will not free them. + While the mark-and-sweep is running, no new elements are placed into the "refzero" work list, even if their reference count reaches zero. Instead, the mark-and-sweep algorithm is assumed to deal with them. + The mark-and-sweep algorithm may also execute finalizers, so two finalizers (but no more) can be running simultaneously, though on different objects. + Another recursive mark-and-sweep run cannot happen.
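
The interplay between ``DECREF``, the "refzero" work list, and a running mark-and-sweep can be sketched with a toy model. The code below is illustrative only: all ``sketch_*`` names are hypothetical, each object holds at most one outgoing reference, finalizers are omitted, and a simple ``ms_running`` flag stands in for an active mark-and-sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy types; not the Duktape structures. */
typedef struct sketch_obj sketch_obj;
struct sketch_obj {
    size_t refcount;
    sketch_obj *child;   /* single outgoing reference, may be NULL */
    sketch_obj *next;    /* "refzero" work list link */
};

typedef struct {
    sketch_obj *refzero_head;
    sketch_obj *refzero_tail;
    int refzero_running;
    int ms_running;      /* stands in for "mark-and-sweep is running" */
    int freed_count;
    int depth;           /* instrumentation: current DECREF nesting */
    int max_depth;       /* instrumentation: deepest DECREF nesting seen */
} sketch_heap;

static void sketch_refzero(sketch_heap *heap);

static void sketch_decref(sketch_heap *heap, sketch_obj *obj) {
    if (obj == NULL) {
        return;
    }
    heap->depth++;
    if (heap->depth > heap->max_depth) {
        heap->max_depth = heap->depth;
    }
    /* While mark-and-sweep runs, a zero refcount is left for it to handle. */
    if (--obj->refcount == 0 && !heap->ms_running) {
        obj->next = NULL;
        if (heap->refzero_tail != NULL) {
            heap->refzero_tail->next = obj;
        } else {
            heap->refzero_head = obj;
        }
        heap->refzero_tail = obj;
        if (!heap->refzero_running) {
            sketch_refzero(heap);  /* only the outermost DECREF gets here */
        }
    }
    heap->depth--;
}

/* Churn the work list until empty; inner DECREFs only append to the list. */
static void sketch_refzero(sketch_heap *heap) {
    heap->refzero_running = 1;
    while (heap->refzero_head != NULL) {
        sketch_obj *obj = heap->refzero_head;
        heap->refzero_head = obj->next;
        if (heap->refzero_head == NULL) {
            heap->refzero_tail = NULL;
        }
        sketch_decref(heap, obj->child);  /* "refcount finalization" */
        free(obj);
        heap->freed_count++;
    }
    heap->refzero_running = 0;
}

/* Example helper: build a chain of n objects, each with refcount 1. */
static sketch_obj *sketch_make_chain(size_t n) {
    sketch_obj *head = NULL;
    size_t i;
    for (i = 0; i < n; i++) {
        sketch_obj *obj = (sketch_obj *) calloc(1, sizeof(*obj));
        if (obj == NULL) {
            abort();
        }
        obj->child = head;  /* takes over the existing reference to head */
        obj->refcount = 1;
        head = obj;
    }
    return head;
}
```

Dropping the only reference to a long chain frees every object while ``DECREF`` nesting stays at two levels. With ``ms_running`` set, the same ``DECREF`` only updates the count and leaves the object for mark-and-sweep.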
Finalizer behavior ================== General notes: * If reference counting is used, finalizers are called either when reference count drops to zero, or when mark-and-sweep wants to collect the object (which is required for circular references and may also happen if reference counts have been incorrectly updated for whatever reason). * If mark-and-sweep is used, finalizers are called only when mark-and-sweep wants to collect the object. * Finalizer may reinstate a reference to the target object. In this case the object is "rescued" and its finalizer may be called again if it becomes unreachable again. Regardless of whether an object is rescued or not, it's a good practice to make the finalizer re-entrant, i.e. allow multiple finalizer calls even if the finalizer doesn't rescue the object. * Finalizers are guaranteed to run when objects are collected, unless a heap is destroyed forcibly. The Duktape API ``duk_destroy_heap()`` call runs a few rounds of mark-and-sweep to allow finalizers a chance to run at least once before the heap is forcibly freed. This allows user code to e.g. free any native resources more or less reliably, but note that there may also be reachable objects with user-allocated resources not tracked by Duktape. * The finalizer return value is ignored. Also, if the finalizer throws an error, this is only debug logged but is considered to be a successful finalization. * The thread running a finalizer is not very logical right now and is liable to change: + Reference counting: the thread which executed ``DECREF`` is used as the finalizer thread. + Mark-and-sweep: the thread which caused mark-and-sweep is used as the finalizer thread; if there is no active thread, ``heap->heap_thread`` is used instead. * The finalizer may technically launch other threads and do arbitrary things in general, but it is a good practice to make the finalizer very simple and unintrusive. Ideally it should only operate on the target object and its properties. 
* A finalizer should not be able to terminate any threads in the active call stack, in particular the thread which triggered a finalization or the finalizer thread (if these are different). Finalizer thread selection is currently not optimal; there are several approaches: * The thread triggering mark-and-sweep is not a good thread for finalization, as it may be from a different conceptual virtual machine, and may thus have a different global context (global object) than where the finalized object was created. * A heap-level dedicated finalizer thread has a similar problem: the finalizer will run in a different global context than where the finalized object was created. Voluntary mark-and-sweep interval ================================= There are many ways to decide when to do a voluntary mark-and-sweep pass: byte count based, object count based, probabilistic, etc. The current approach is to count the number of heap objects and heap strings kept at the end of a mark-and-sweep pass, and initialize the voluntary sweep trigger count based on that as (the computation actually happens using fixed point arithmetic):: trigger_count = ((kept_objects + kept_strings) * MULT) + ADD // MULT and ADD are tuning parameters The trigger count is decreased on every memory (re)allocation, and for every object processed by the refzero algorithm. If the trigger reaches zero when memory is about to be (re)allocated, a voluntary mark-and-sweep pass is done. When ``MULT`` is 1 and ``ADD`` is 0, a voluntary sweep is done when the number of "operations" matches the previous heap object/string count. When reference counting is enabled, ``MULT`` can be quite large (e.g. 10) because only circular references need to be swept. When reference counting is not enabled, ``MULT`` should be closer to 1 (or even below). The ``ADD`` tuning parameter is not that important; its purpose is to avoid too frequent mark-and-sweep on very small heaps and to counteract some inaccuracy of fixed point arithmetic. 
Implementation issues ===================== Error handling -------------- When a ``longjmp()`` takes place, the C stack is unwound and all references to the unwound part of the stack are lost. To avoid memory leaks and other correctness issues, care must be taken to: * Ensure that the reference count of every heap-allocated element is correct whenever entering code which may ``longjmp()``. * Ensure that all heap-allocated objects which should be subject to automatic garbage collection are reachable whenever entering code which may ``longjmp()``. * Use a ``setjmp()`` catchpoint whenever control must be regained to clean up properly. To avoid the need for ``setjmp()`` catchpoints, many innermost helper functions return error codes rather than throwing errors. This makes error handling a bit easier. Side effects of memory management --------------------------------- Automatic memory management may be triggered by various operations, and has a wide variety of side effects which must be taken into account by calling code. This affects internal code in particular, which must be very careful not to reference dangling pointers, deal with valstack and object property allocation resizes, etc. The fundamental triggers for memory management side effects are: * An attempt to ``alloc`` or ``realloc`` memory may trigger a garbage collection. A collection is triggered by an out-of-memory condition, but a voluntary garbage collection also occurs periodically. A ``free`` operation cannot, at the moment, trigger a collection. * An explicit request for garbage collection. * A ``DECREF`` operation which drops the target heap element reference count to zero triggers the element (and possibly a bunch of other elements) to be freed, and may invoke a number of finalizers. Also, a mark-and-sweep may be triggered (e.g. by finalizers or voluntarily). The following primitives do not trigger any side effects: * An ``INCREF`` operation never causes a side effect. 
* A ``free`` operation never causes a side effect. Because of finalizers, the side effects of a ``DECREF`` and a mark-and-sweep are potentially the same as running arbitrary C or Ecmascript code, including: * Calling (further) finalizer functions (= running arbitrary Ecmascript and C code). * Resizing object allocations, value stacks, catch stacks, call stacks, buffers, object property allocations, etc. * Compacting object property allocations, abandoning array parts. * In particular: + Any ``duk_tval`` pointers referring to any value stack may be invalidated, because any value stack may be resized. Value stack indices are OK. + Any ``duk_tval`` pointers referring to any object property values may be invalidated, because any property allocation may be resized. Also, any indices to object property slots may be invalidated due to "compaction" which happens during a property allocation resize. + Heap element pointers are stable, so they are never affected. The side effects can be avoided by many techniques: * Refer to value stack using a numeric index. * Make a copy of a ``duk_tval`` to a C local to ensure the value can still be used after a side effect occurs. If the value is primitive, it will be OK in any case. If the value is a heap reference, the reference uses a stable pointer which is OK as long as the target is still reachable. * Re-lookup object property slots after a potential side effect. Misc notes ========== Garbage collection of value stacks ---------------------------------- While an Ecmascript function is running, the value stack frame allocated for it has a minimum size matching the "register count" of the function. All of these registers are reachable from a mark-and-sweep viewpoint, even if the values held by the registers are never referenced by the bytecode of the function. For instance, any temporaries created during expression evaluation may leave unused but technically reachable values behind.
Consider for instance::

  function f(x,y,z) {
    var w = (x + y) + z;
  }

The bytecode created for this will:

* Compute ``x + y`` into a temporary register ``T``.

* Compute ``T + z`` into the register allocated for ``w``.

Until the function exits, ``T`` remains reachable for mark-and-sweep. If
``T`` holds a heap element (e.g. a string), that element has a positive
reference count.

The situation resolves itself when the function exits or when the temporary
register ``T`` is reused by the evaluation of another expression, so this is
not usually a relevant issue. However:

* If a function runs in an infinite loop, such references may never become
  collectable. Consider, for instance, a main event loop which never exits.

* Even if the function eventually exits, such references may cause an
  out-of-memory situation before the function exits. The out-of-memory
  situation may not be recoverable using garbage collection because the
  values are technically reachable until the exit.

There is currently no actual solution to this issue, but any code containing
an infinite loop should be structured to avoid "dangling values", e.g. by
using an auxiliary function for any computations::

  function stuff() {
    // ...
  }

  function infloop() {
    for (;;) {
      stuff();
    }
  }

The issue could be fixed technically by:

* Making the function use an actual stack of values instead of direct
  register references. This would make function evaluation slower.

* Adding a bytecode instruction to "wipe" any registers above a certain
  index to ensure they contain no bogus references. These could be issued
  after expression evaluation or in loop headers. This would bloat
  bytecode.
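
As a rough illustration of the register "wipe" idea, the following C sketch
models a register frame in which a stale temporary keeps a heap element
alive until it is wiped. All names here (``demo_elem``, ``demo_set_reg()``,
``demo_wipe_regs()``, ``demo_run()``) are hypothetical and are not part of
Duktape; the real value stack and reference counting code is considerably
more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified heap element: only a reference count. */
typedef struct {
    int refcount;
} demo_elem;

static int demo_freed = 0;  /* number of elements freed so far */

static void demo_incref(demo_elem *e) {
    if (e != NULL) {
        e->refcount++;
    }
}

static void demo_decref(demo_elem *e) {
    if (e != NULL) {
        assert(e->refcount > 0);
        if (--e->refcount == 0) {
            free(e);
            demo_freed++;
        }
    }
}

/* Store a value into a register, updating reference counts.  The new
 * value is INCREF'd before the old one is DECREF'd so that storing the
 * same element again cannot free it.
 */
static void demo_set_reg(demo_elem **regs, size_t idx, demo_elem *val) {
    demo_elem *old = regs[idx];
    demo_incref(val);
    regs[idx] = val;
    demo_decref(old);
}

/* "Wipe" all registers at index >= first so that stale temporaries no
 * longer keep their targets alive.
 */
static void demo_wipe_regs(demo_elem **regs, size_t first, size_t count) {
    size_t i;
    for (i = first; i < count; i++) {
        demo_set_reg(regs, i, NULL);
    }
}

/* Model of f(x,y,z): register 0 is 'w', register 1 is the temporary T.
 * Returns the number of elements freed by the wipe, or -1 if the
 * allocation fails.
 */
int demo_run(void) {
    demo_elem *regs[2] = { NULL, NULL };
    demo_elem *t = (demo_elem *) calloc(1, sizeof(demo_elem));
    int before;

    if (t == NULL) {
        return -1;
    }
    demo_set_reg(regs, 1, t);    /* T = x + y; the register holds a reference */
    before = demo_freed;         /* T is still referenced: nothing freed yet */
    demo_wipe_regs(regs, 1, 2);  /* wipe registers above 'w' */
    return demo_freed - before;
}
```

In this model the element held by ``T`` is freed only when the wipe runs,
which mirrors the trade-off described above: the extra wipe step costs
something on every use but bounds how long a stale temporary can stay
reachable.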
Function closures are reference loops by default
------------------------------------------------

Function closures contain a reference loop by default::

  var f = function() {};
  print(f.prototype.constructor === f);  // --> true

Unless user code explicitly sets a different ``f.prototype``, every function
closure requires a mark-and-sweep to be collected, which makes plain
reference counting unattractive if there are a lot of function temporaries.
With reference counting alone, such temporaries are never reclaimed while
the heap is in use and are only freed when the heap is destroyed. This
should be fixed in the future if possible.

Requirements for tracking heap allocated objects
------------------------------------------------

Mark-and-sweep only requires a single (forward) linked list to track
objects. Objects are inserted at the head, and scanned linearly during
mark-and-sweep. The sweep phase can remove an object by keeping track of
its predecessor when traversing the list. The same applies to work lists.

Reference counting requires the ability to remove an arbitrarily chosen
object from the heap allocated list. To do this efficiently, a doubly
linked list is needed to avoid scanning the list from the beginning.

Future work
===========

* During object property allocation resize, don't prevent compaction of
  other objects in mark-and-sweep.

* Special handling for built-in strings and objects, so that they can be
  allocated from a contiguous buffer and only freed when the heap is freed.

* Incremental mark-and-sweep, at least as an option in semi real-time
  environments.

* Optimize reference count handling in performance critical code sections.
  For instance:

  - a primitive to ``INCREF`` a slice of tagged values would be useful

  - often the target of an ``INCREF`` can be assumed to be non-NULL; a fast
    path macro could assert for this but otherwise avoid checking for it

* Develop a fix for the function temporary register reachability issue.

* Develop a fix for the function instance prototype reference loop issue.
* Add a figure of where objects may reside (string table, heap allocated, refzero work list, mark-and-sweep to be finalized work list).
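
The reference count optimizations suggested in the list above (a slice
``INCREF`` primitive and a non-NULL fast path) might look roughly like the
following. This is only a sketch: the tagged value layout and the names
``demo_heaphdr``, ``demo_tval``, ``demo_incref_fast()``,
``demo_incref_slice()`` and ``demo_slice_example()`` are hypothetical and do
not reflect the actual ``duk_tval`` representation or Duktape's macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heap element header with just a reference count. */
typedef struct {
    size_t refcount;
} demo_heaphdr;

/* Hypothetical tagged value: either a primitive number or a heap pointer. */
typedef enum { DEMO_TAG_NUMBER, DEMO_TAG_HEAPPTR } demo_tag;

typedef struct {
    demo_tag tag;
    union {
        double number;
        demo_heaphdr *heapptr;
    } v;
} demo_tval;

/* Fast path INCREF: the caller guarantees 'h' is non-NULL, so the check is
 * only an assertion and disappears when NDEBUG is defined.
 */
static void demo_incref_fast(demo_heaphdr *h) {
    assert(h != NULL);
    h->refcount++;
}

/* INCREF every heap-allocated value in a slice of 'count' tagged values.
 * Primitive values are skipped.  Returns the number of INCREFs performed.
 */
size_t demo_incref_slice(demo_tval *tv, size_t count) {
    size_t i;
    size_t n = 0;

    for (i = 0; i < count; i++) {
        if (tv[i].tag == DEMO_TAG_HEAPPTR) {
            demo_incref_fast(tv[i].v.heapptr);
            n++;
        }
    }
    return n;
}

/* Small self-check: one element is referenced twice in a slice of three
 * values.  Returns the element's final reference count.
 */
size_t demo_slice_example(void) {
    demo_heaphdr h = { 0 };
    demo_tval vals[3];

    vals[0].tag = DEMO_TAG_HEAPPTR;
    vals[0].v.heapptr = &h;
    vals[1].tag = DEMO_TAG_NUMBER;
    vals[1].v.number = 123.0;
    vals[2].tag = DEMO_TAG_HEAPPTR;
    vals[2].v.heapptr = &h;

    (void) demo_incref_slice(vals, 3);
    return h.refcount;
}
```

Because an ``INCREF`` never causes a side effect, a loop like this can
safely hold a raw pointer into the slice for its whole duration; the same
would not be true of a ``DECREF`` loop, which may trigger finalizers and
resizes.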