============ Side effects ============ Overview ======== Duktape is a single threaded interpreter, so when the internal C code deals with memory allocations, pointers, and internal data structures it is safe to assume, for example, that pointers are stable while they're being used and that internal state and data structures are not modified simultaneously from other threads. However, many internal operations trigger quite extensive side effects such as resizing the value stack (invalidating any pointers to it) or clobbering the current heap error handling (longjmp) state. There are a few primary causes for the side effects, such as memory management reallocating data structures, finalizer invocation, and Proxy trap invocation. The primary causes are also triggered by a lot of secondary causes. The practical effect is that any internal helper should be assumed to potentially invoke arbitrary side effects unless there's a specific reason to assume otherwise. Some of the side effects can be surprising when simply looking at calling code, which makes side effects an error prone element when maintaining Duktape internals. Incorrect call site assumptions can cause immediate issues like segfaults, assert failures, or valgrind warnings. But it's also common for an incorrect assumption to work out fine in practice, only to be triggered by rare conditions like voluntary mark-and-sweep or a unrecoverable out-of-memory error happening in just the right place. Such bugs have crept into the code base several times -- they're easy to make and hard to catch with tests or code review. This document describes the different side effects, how they may be triggered, what mechanisms are in place to deal with them internally, and how tests try to cover side effects. Basic side effect categories ============================ Primary causes -------------- Side effects are ultimately caused by: * A refcount dropping to zero, causing a "refzero cascade" where a set of objects is refcount finalized and freed. If any objects in the cascade have finalizers, the finalizer calls have a lot of side effects. Object freeing itself is nearly side effect free, but does invalidate any pointers to unreachable but not-yet-freed objects which are held at times. * Mark-and-sweep similarly frees objects and can make finalizer calls. Mark-and-sweep may also resize/compact the string table and object property tables. The set of mark-and-sweep side effects are likely to slowly change over time (e.g. better emergency GC capabilities). * Error throwing overwrites heap-wide error handling state, and causes a long control transfer. Concrete impact on call site is that e.g. calling code may not be able to store/restore internal flags or counters if an error gets thrown. Almost anything involving a memory allocation, property operation, etc may throw. Any operation doing a DECREF may thus have side effects. Any operation doing anything to cause a mark-and-sweep (like allocating memory) may similarly have side effects. Finalizers cause the most wide ranging side effects, but even with finalizers disabled there are significant side effects in mark-and-sweep. Full side effects ----------------- The most extensive type of side effect is arbitrary code execution, caused by e.g. a finalizer or a Proxy trap call (and a number of indirect causes). The potential side effects are very wide: * Because a call is made, the value stack may be grown (but not shrunk) and its base pointer may change. As a result, any duk_tval pointers to the value stack are (potentially) invalidated. Since Duktape 2.2 duk_activation and duk_catcher structs are allocated separately and have a stable pointer. Before Duktape 2.2 duk_activations were held in a call stack and duk_catchers in a catch stack, and their pointers might be invalidated by side effects. * Value stack allocated size may grow or shrink. However, value stack bottom, top, and reserved space won't change. * An error throw may happen, clobbering heap longjmp state. This is a problem particularly in error handling where we're dealing with a previous throw. A long control transfer may skip intended cleanup code. * A new thread may be resumed and yielded from. The resumed thread may even duk_suspend(). * A native thread switch may occur, for an arbitrarily long time, if any function called uses duk_suspend() and duk_resume(). This is not currently supported for finalizers, but may happen, for example, for Proxy trap calls. * Because called code may operate on any object (except those we're certain not to be reachable), objects may undergo arbitrary mutation. For example, object properties may be added, deleted, or modified; dynamic and external buffer data pointers may change; external buffer length may change. An object's property table may be resized and its base pointer may change, invalidating both pointers to the property table. Object property slot indices may also be invalidated due to object resize/compaction. The following will be stable at all times: * Value stack entries in the current activation won't be unwound or modified. Similarly, the current call stack and catch stack entries and entries below them won't be unwound or modified. * All heap object (duk_heaphdr) pointers are valid and stable regardless of any side effects, provided that the objects in question are reachable and correctly refcounted for. Called code cannot (in the absence of bugs) remove references from previous activations in the call stack and thread resume chain. * In particular, while duk_tval pointers to the value stack may change, if an object pointer is encapsulated in a duk_tval, the pointer to the actual object is still stable. * All string data pointers, including external strings. String data is immutable, and can't be reallocated or relocated. * All fixed buffer data pointers, because fixed buffer data follows the stable duk_heaphdr directly. Dynamic and external buffer data pointers are not stable. Side effects without finalizers, but with mark-and-sweep allowed ---------------------------------------------------------------- If code execution side effects (finalizer calls, Proxy traps, getter/setter calls, etc) are avoided, most of the side effects are avoided. In particular, refzero situations are then side effect free because object freeing has no side effects beyond memory free calls. The following side effects still remain: * Refzero processing still frees objects whose refcount reaches zero. Any pointers to such objects will thus be invalidated. This may happen e.g. when a borrowed pointer is used and that pointer loses its backing reference. * Mark-and-sweep may reallocate/compact the string table. This affects the string table data structure pointers and indices/offsets into them. Strings themselves are not affected (but unreachable strings may be freed). * Mark-and-sweep may reallocate/compact object property tables. All property keys and values will remain reachable, but pointers and indices to an object property table may be invalidated. This mostly affects property code which often finds a property's "slot index" and then operates on the index. * Mark-and-sweep may free unreachable objects, invalidating any pointers to them. This affects only objects which have been allocated and added to heap_allocated list. Objects not on heap_allocated list are not affected because mark-and-sweep isn't aware of them; such objects are thus safe from collection, but at risk for leaking if an error is thrown, so such situations are usually very short lived. Other side effects don't happen with current mark-and-sweep implementation. For example, the following don't happen (but could, if mark-and-sweep scope and side effect lockouts are changed): * Thread value stack is never reallocated and all pointers to duk_tvals remain valid; duk_activation and duk_catcher pointers are stable in Duktape 2.2. (This could easily change if mark-and-sweep were to "compact" the value stack in an emergency GC.) The mark-and-sweep side effects listed above are not fundamental to the engine and could be removed if they became inconvenient. For example, it's nice that emergency GC can compact objects in an attempt to free memory, but it's not a critical feature (and many other engines don't do it either). Side effects with finalizers and mark-and-sweep disabled -------------------------------------------------------- When both finalizers and mark-and-sweep are disabled, the only remaining side effects come from DECREF (plain or NORZ): * Refzero processing still frees objects whose refcount reaches zero. Any pointers to such objects will thus be invalidated. This may happen e.g. when a borrowed pointer is used and that pointer loses its backing reference. When DECREF operations happen during mark-and-sweep they get handled specially: the refcounts are updated normally, but the objects are never freed or even queued to refzero_list. This is done because mark-and-sweep will free any unreachable objects; DECREF still gets called because mark-and-sweep finalizes refcounts of any freed objects (or rather other objects they point to) so that refcounts remain in sync. Controls in place ================= Finalizer execution, pf_prevent_count ------------------------------------- Objects with finalizers are queued to finalize_list and are processed later by duk_heap_process_finalize_list(). The queueing doesn't need any side effect protection as it is side effect free. duk_heap_process_finalize_list() is guarded by heap->pf_prevent_count which prevents recursive finalize_list processing. If the count is zero on entry, it's bumped and finalize_list is processed until it becomes empty. New finalizable objects may be queued while the list is being processed, but only the first call will process the list. If the count is non-zero on entry, the call is a no-op. The count can also be bumped upwards to prevent finalizer execution in the first place, even if no call site is currently processing finalizers. If the count is bumped, there must be a reliable mechanism of unbumping the count or finalizer execution will be prevented permanently. Because only the first finalizer processing site processes the finalize_list, using duk_suspend() from a finalizer or anything called by a finalizer is not currently supported. If duk_suspend() were called in a finalizer, finalization would be stuck until duk_resume() was called. Processing finalizers from multiple call sites would by itself be relatively straightforward (each call site would just process the list head or notice it is NULL and finish); however, at present mark-and-sweep also needs to be disabled while finalizers run. Mark-and-sweep prevent count, ms_prevent_count ---------------------------------------------- Stacking counter to prevent mark-and-sweep. Also used to prevent recursive mark-and-sweep entry when mark-and-sweep runs. Mark-and-sweep running, ms_running ---------------------------------- This flag is set only when mark-and-sweep is actually running, and doesn't stack because recursive mark-and-sweep is not allowed. The flag is used by DECREF macros to detect that mark-and-sweep is running and that objects must not be queued to refzero_list or finalize_list; their refcounts must still be updated. Mark-and-sweep flags, ms_base_flags ----------------------------------- Mark-and-sweep base flags from duk_heap are ORed to mark-and-sweep argument flags. This allows a section of code to avoid e.g. object compaction regardless of how mark-and-sweep gets triggered. Using the base flags is useful when mark-and-sweep by itself is desirable but e.g. object compaction is not. Finalizers are prevented using a separate flag. Calling code must restore the flags reliably -- e.g. catching errors or having assurance of no errors being thrown in any situation. It might be nice to make this easier by allowing flags to be modified, the modification flagged, and for error throw handling to do the restoration automatically. Creating an error object, creating_error ---------------------------------------- This flag is set when Duktape internals are creating an error to be thrown. If an error happens during that process (which includes a user errCreate() callback), the flag is set and avoids recursion. A pre-allocated "double error" object is thrown instead. Call stack unwind or handling an error, error_not_allowed --------------------------------------------------------- This flag is only enabled when using assertions. It is set in code sections which must be protected against an error being thrown. This is a concern because currently the error state is global in duk_heap and doesn't stack, so an error throw (even a caught and handled one) clobbers the state which may be fatal in code sections working to handle an error. DECREF NORZ (no refzero) macros ------------------------------- DECREF NORZ (no refzero) macro variants behave the same as plain DECREF macros except that they don't trigger side effects. Since Duktape 2.1 NORZ macros will handle refzero cascades inline (freeing all the memory directly); however, objects with finalizers will be placed in finalize_list without finalizer calls being made. Once a code segment with NORZ macros is complete, DUK_REFZERO_CHECK_{SLOW,FAST}() should be called. The macro checks for any pending finalizers and processes them. No error catcher is necessary: error throw path also calls the macros and processes pending finalizers. (The NORZ name is a bit of a misnomer since Duktape 2.1 reworks.) Mitigation, test coverage ========================= There are several torture test options to exercise side effect handling: * Triggering a mark-and-sweep for every allocation (and in a few other places like DECREF too). * Causing a simulated finalizer run with error throwing and call side effects every time a finalizer might have executed. Some specific cold paths like out-of-memory errors in critical places are difficult to exercise with black box testing. There is a small set of DUK_USE_INJECT_xxx config options which allow errors to be injected into specific critical functions. These can be combined with e.g. valgrind and asserts, to cover assertions, memory leaks, and memory safety. Operations causing side effects =============================== The main reasons and controls for side effects are covered above. Below is a (non-exhaustive) list of common operations with side effects. Any internal helper may invoke some of these primitives and thus also have side effects. DUK_ALLOC() * May trigger voluntary or emergency mark-and-sweep, with arbitrary code execution side effects. DUK_REALLOC() * May trigger voluntary or emergency mark-and-sweep, with arbitrary code execution side effects. * In particular, if reallocating e.g. the value stack, the triggered mark-and-sweep may change the base pointer being reallocated! To avoid this, the DUK_REALLOC_INDIRECT() call queries the base pointer from the caller for every realloc() attempt. DUK_FREE() * No side effects at present. Property read, write, delete, existence check * May invoke getters, setters, and Proxy traps with arbitrary code execution side effects. * Memory allocation is potentially required for every operation, thus causing arbitrary code execution side effects. Memory allocation is obviously needed for property writes, but any other operations may also allocate memory e.g. to coerce a number to a string. Value stack pushes * Pushing to the value stack is side effect free. The space must be allocated beforehand, and a pushed value is INCREF'd if it isn't primitive, and INCREF is side effect free. * A duk_check_stack() / duk_require_stack() + push has arbitrary side effects because of a potential reallocation. Value stack pops * Popping a value may invoke a finalizer, and thus may cause arbitrary code execution side effects. Value stack coercions * Value stack type coercions may, depending on the coercion, invoke methods like .toString() and .valueOf(), and thus have arbitrary code execution side effects. Even failed attempts may cause side effects due to memory allocation attempts. * In specific cases it may be safe to conclude that a coercion is side effect free; for example, doing a ToNumber() conversion on a plain string is side effect free at present. (This may not always be the case in the future, e.g. if numbers become heap allocated.) * Some coercions not involving an explicit method call may require an allocation call -- which may then trigger a voluntary or emergency mark-and-sweep leading to arbitrary code execution side effects. INCREF * No side effects at present. Object is never freed or queued anywhere. DECREF_NORZ * No side effects other than freeing one or more objects, strings, and buffers. The freed objects don't have finalizers; objects with finalizers are queued to finalize_list but finalizers are not executed. * Queries finalizer existence which is side effect free. * When mark-and-sweep is running, DECREF_NORZ adjusts target refcount but won't do anything else like queue object to refzero_list or free it; that's up to mark-and-sweep. DECREF * If refcount doesn't reach zero, no side effects. * If refcount reaches zero, one or more objects, strings, and buffers are freed which is side effect free. Objects with finalizers are queued to finalize_list, and the list is processed when the cascade of objects without finalizers has been freed. Finalizer execution had arbitrary code execution side effects. * Queries finalizer existence which is side effect free. * When mark-and-sweep is running, DECREF adjusts target refcount but won't do anything else. * All objects on finalize_list have an artificial +1 refcount bump, so that they can never trigger refzero processing (assuming refcounts are correct). This allows refzero code to assume a refzero object is on heap_allocated. duk__refcount_free_pending() * As of Duktape 2.1 no side effects, just frees objects without a finalizer until refzero_list is empty. (Equivalent in Duktape 2.0 and prior would process finalizers inline.) * Recursive entry is prevented; first caller processes a cascade until it's done. Pending finalizers are executed after the refzero_list is empty (unless prevented). Finalizers are executed outside of refzero_list processing protection so DECREF freeing may happen normally during finalizer execution. Mark-and-sweep * Queries finalizer existence which is side effect free. * Object compaction. * String table compaction. * Recursive entry prevented. * Executes finalizers after mark-and-sweep is complete (unless prevented), which has arbitrary code execution side effects. Finalizer execution happens outside of mark-and-sweep protection, and may trigger mark-and-sweep. However, when mark-and-sweep runs with finalize_list != NULL, rescue decisions are postponed to avoid incorrect rescue decisions caused by finalize_list being (artificially) treated as a reachability root; in concrete terms, FINALIZED flags are not cleared so they'll be rechecked later. Error throw * Overwrites heap longjmp state, so an error throw while handling a previous one is a fatal error. * Because finalizer calls may involve error throws, finalizers cannot be executed in error handling (at least without storing/restoring longjmp state). * Memory allocation may involve side effects or fail with out-of-memory, so it must be used carefully in error handling. For example, creating an object may potentially fail, throwing an error inside error handling. The error that is thrown is constructed *before* error throwing critical section begins. * Protected call error handling must also never throw (without catching) for sandboxing reasons: the error handling path of a protected call is assumed to never throw. * Ecmascript try-catch handling isn't currently fully protected against out of memory: if setting up the catch execution fails, an out-of-memory error is propagated from the try-catch block. Try-catch isn't as safe as protected calls for sandboxing. Even if catch execution setup didn't allocate memory, it's difficult to write script code that is fully memory allocation free (whereas writing C code which is allocation free is much easier). * Mark-and-sweep without error throwing or (finalizer) call side effects is OK. Debugger message writes * Code writing a debugger message to the current debug client transport must ensure, somehow, that it will never happen when another function is doing the same (including nested call to itself). * If nesting happens, memory unsafe behavior won't happen, but the debug connection becomes corrupted. * There are some current issues for debugger message handling, e.g. debugger code uses duk_safe_to_string() which may have side effects or even busy loop. Call sites needing side effect protection ========================================= Error throw and resulting unwind * Must protect against another error: longjmp state doesn't nest. * Prevent finalizers, avoid Proxy traps and getter/setter calls. * Avoid out-of-memory error throws, trial allocation is OK. * Refzero with pure memory freeing is OK. * Mark-and-sweep without finalizer execution is OK. Object and string table compaction is OK, at least present. * Error code must be very careful not to throw an error in any part of the error unwind process. Otherwise sandboxing/protected call guarantees are broken, and some of the side effect prevention changes are not correctly undone (e.g. pf_prevent_count is bumped again!). There are asserts in place for the entire critical part (heap->error_not_allowed). Success unwind * Must generally avoid (or protect against) error throws: otherwise state may be only partially unwound. Same issues as with error unwind. * However, if the callstack state is consistent, it may be safe to throw in specific places in the success unwind code path. String table resize * String table resize must be protected against string interning. * Prevent finalizers, avoid Proxy traps. * Avoid any throws, so that state is not left incomplete. * Refzero with pure memory freeing is OK. * Mark-and-sweep without finalizer execution is OK. As of Duktape 2.1 string interning is OK because it no longer causes a recursive string table resize regardless of interned string count. String table itself protects against recursive resizing, so both object and string table compaction attempts are OK. Object property table resize * Prevent compaction of the object being resized. * In practice, prevent finalizers (they may mutate objects) and proxy traps. Prevent compaction of all objects because there's no fine grained control now (could be changed). JSON fast path * Prevent all side effects affecting property tables which are walked by the fast path. * Prevent object and string table compaction, mark-and-sweep otherwise OK. Object property slot updates (e.g. data -> accessor conversion) * Property slot index being modified must not change. * Prevent finalizers and proxy traps/getters (which may operate on the object). * Prevent object compaction which affects slot indices even when properties are not deleted. * In practice, use NORZ macros which avoids all relevant side effects.