============
Side effects
============
Overview
========
Duktape is a single threaded interpreter, so when the internal C code deals
with memory allocations, pointers, and internal data structures it is safe
to assume, for example, that pointers are stable while they're being used and
that internal state and data structures are not modified simultaneously from
other threads.
However, many internal operations trigger quite extensive side effects such
as resizing the value stack (invalidating any pointers to it) or clobbering
the current heap error handling (longjmp) state. There are a few primary
causes for the side effects, such as memory management reallocating data
structures, finalizer invocation, and Proxy trap invocation. The primary
causes are also triggered by a lot of secondary causes. The practical effect
is that any internal helper should be assumed to potentially invoke arbitrary
side effects unless there's a specific reason to assume otherwise.
Some of the side effects can be surprising when simply looking at calling
code, which makes side effects an error prone element when maintaining Duktape
internals. Incorrect call site assumptions can cause immediate issues like
segfaults, assert failures, or valgrind warnings. But it's also common for
an incorrect assumption to work out fine in practice, only to be triggered by
rare conditions like voluntary mark-and-sweep or an unrecoverable out-of-memory
error happening in just the right place. Such bugs have crept into the code
base several times -- they're easy to make and hard to catch with tests or
code review.
This document describes the different side effects, how they may be triggered,
what mechanisms are in place to deal with them internally, and how tests try
to cover side effects.
Basic side effect categories
============================
Primary causes
--------------
Side effects are ultimately caused by:
* A refcount dropping to zero, causing a "refzero cascade" where a set of
objects is refcount finalized and freed. If any objects in the cascade
have finalizers, the finalizer calls have a lot of side effects. Object
freeing itself is nearly side effect free, but does invalidate any pointers
to unreachable but not-yet-freed objects which are held at times.
* Mark-and-sweep similarly frees objects and can make finalizer calls.
Mark-and-sweep may also resize/compact the string table and object property
tables. The set of mark-and-sweep side effects is likely to change slowly
over time (e.g. better emergency GC capabilities).
* Error throwing overwrites heap-wide error handling state, and causes a long
control transfer. The concrete impact on a call site is that e.g. calling
code may not be able to store/restore internal flags or counters if an error
gets thrown. Almost anything involving a memory allocation, property
operation, etc. may throw.
Any operation doing a DECREF may thus have side effects. Any operation doing
anything to cause a mark-and-sweep (like allocating memory) may similarly have
side effects. Finalizers cause the most wide ranging side effects, but even
with finalizers disabled there are significant side effects in mark-and-sweep.
Full side effects
-----------------
The most extensive type of side effect is arbitrary code execution, caused
by e.g. a finalizer or a Proxy trap call (and a number of indirect causes).
The potential side effects are very wide:
* Because a call is made, the value stack may be grown (but not shrunk) and
its base pointer may change. As a result, any duk_tval pointers to the
value stack are (potentially) invalidated. Since Duktape 2.2 duk_activation
and duk_catcher structs are allocated separately and have a stable pointer.
Before Duktape 2.2 duk_activations were held in a call stack and duk_catchers
in a catch stack, and their pointers might be invalidated by side effects.
* Value stack allocated size may grow or shrink. However, value stack bottom,
top, and reserved space won't change.
* An error throw may happen, clobbering heap longjmp state. This is a
problem particularly in error handling where we're dealing with a previous
throw. A long control transfer may skip intended cleanup code.
* A new thread may be resumed and yielded from. The resumed thread may even
duk_suspend().
* A native thread switch may occur, for an arbitrarily long time, if any
function called uses duk_suspend() and duk_resume(). This is not currently
supported for finalizers, but may happen, for example, for Proxy trap calls.
* Because called code may operate on any object (except those we're certain
not to be reachable), objects may undergo arbitrary mutation. For example,
object properties may be added, deleted, or modified; dynamic and external
buffer data pointers may change; external buffer length may change. An
object's property table may be resized and its base pointer may change,
invalidating any pointers to the property table. Object property slot
indices may also be invalidated due to object resize/compaction.
The following will be stable at all times:
* Value stack entries in the current activation won't be unwound or modified.
Similarly, the current call stack and catch stack entries and entries below
them won't be unwound or modified.
* All heap object (duk_heaphdr) pointers are valid and stable regardless of
any side effects, provided that the objects in question are reachable and
correctly refcounted for. Called code cannot (in the absence of bugs)
remove references from previous activations in the call stack and thread
resume chain.
* In particular, while duk_tval pointers to the value stack may change, if
an object pointer is encapsulated in a duk_tval, the pointer to the actual
object is still stable.
* All string data pointers, including external strings. String data is
immutable, and can't be reallocated or relocated.
* All fixed buffer data pointers, because fixed buffer data follows the stable
duk_heaphdr directly. Dynamic and external buffer data pointers are not
stable.
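The difference between an unstable value stack pointer and a stable index can be illustrated with a small stand-alone model. This is only a sketch: ``toy_stack``, ``toy_grow()`` and ``toy_demo_stable_index()`` are hypothetical names, not Duktape internals, and a plain ``double`` stands in for a duk_tval.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy model of a value stack; not Duktape's real layout. */
typedef struct {
    double *base;  /* may move whenever the stack is resized */
    size_t size;   /* allocated slots */
} toy_stack;

/* Models a side effect that resizes the stack; 'base' may change. */
static int toy_grow(toy_stack *st, size_t new_size) {
    double *p = (double *) realloc(st->base, new_size * sizeof(double));
    if (p == NULL) {
        return 0;
    }
    st->base = p;
    st->size = new_size;
    return 1;
}

/* Returns 1 if the value is still readable through its index after a resize. */
int toy_demo_stable_index(void) {
    toy_stack st = { NULL, 0 };
    size_t idx = 2;
    double *slot;
    int ok;

    if (!toy_grow(&st, 4)) {
        return 0;
    }
    slot = st.base + idx;  /* raw pointer: valid only until the next side effect */
    *slot = 123.0;

    if (!toy_grow(&st, 4096)) {  /* side effect: 'slot' must be treated as stale */
        free(st.base);
        return 0;
    }
    assert(idx < st.size);
    slot = st.base + idx;  /* re-derive the pointer from the stable index */
    ok = (*slot == 123.0);
    free(st.base);
    return ok;
}
```

The same pattern applies to anything that is looked up through a relocatable base pointer: keep the index or the heap object pointer across a call that may have side effects, and re-derive the raw pointer afterwards.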
Side effects without finalizers, but with mark-and-sweep allowed
----------------------------------------------------------------
If code execution side effects (finalizer calls, Proxy traps, getter/setter
calls, etc) are avoided, most of the side effects are avoided. In particular,
refzero situations are then side effect free because object freeing has no
side effects beyond memory free calls.
The following side effects still remain:
* Refzero processing still frees objects whose refcount reaches zero.
Any pointers to such objects will thus be invalidated. This may happen
e.g. when a borrowed pointer is used and that pointer loses its backing
reference.
* Mark-and-sweep may reallocate/compact the string table. This affects
the string table data structure pointers and indices/offsets into them.
Strings themselves are not affected (but unreachable strings may be freed).
* Mark-and-sweep may reallocate/compact object property tables. All property
keys and values will remain reachable, but pointers and indices to an object
property table may be invalidated. This mostly affects property code which
often finds a property's "slot index" and then operates on the index.
* Mark-and-sweep may free unreachable objects, invalidating any pointers to
them. This affects only objects which have been allocated and added to
the heap_allocated list. Objects not on the heap_allocated list are not
affected because mark-and-sweep isn't aware of them; such objects are thus
safe from collection, but at risk of leaking if an error is thrown, so such
situations are usually very short lived.
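The slot index hazard described in the list above can be sketched with a toy property table. All names here (``toy_props``, ``toy_find_slot()``, ``toy_compact()``) are hypothetical and the layout is deliberately much simpler than a real property table; the point is only that a cached index goes stale after compaction while a fresh lookup by key still works.

```c
#include <assert.h>

#define TOY_MAX_PROPS 8

/* Hypothetical toy property table; key 0 marks a deleted (unused) slot. */
typedef struct {
    int keys[TOY_MAX_PROPS];
    int vals[TOY_MAX_PROPS];
    int used;  /* number of slots in use, including deleted ones */
} toy_props;

/* Linear lookup: returns the slot index for 'key', or -1 if not found. */
static int toy_find_slot(const toy_props *p, int key) {
    int i;
    for (i = 0; i < p->used; i++) {
        if (p->keys[i] == key) {
            return i;
        }
    }
    return -1;
}

/* Models compaction: deleted slots are squeezed out, so indices shift. */
static void toy_compact(toy_props *p) {
    int i, n = 0;
    for (i = 0; i < p->used; i++) {
        if (p->keys[i] != 0) {
            p->keys[n] = p->keys[i];
            p->vals[n] = p->vals[i];
            n++;
        }
    }
    p->used = n;
}

/* Returns 1 if the cached index went stale but a fresh lookup finds the value. */
int toy_demo_slot_shift(void) {
    toy_props p = { { 10, 0, 30 }, { 1, 0, 3 }, 3 };  /* slot 1 is deleted */
    int cached = toy_find_slot(&p, 30);  /* slot index before compaction */
    int fresh;

    toy_compact(&p);                 /* side effect: 'cached' is now stale */
    fresh = toy_find_slot(&p, 30);   /* look the slot up again by key */
    assert(fresh >= 0);
    return (fresh != cached) && (p.vals[fresh] == 3);
}
```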
Other side effects don't happen with the current mark-and-sweep implementation.
For example, the following don't happen (but could, if mark-and-sweep scope
and side effect lockouts are changed):
* Thread value stack is never reallocated and all pointers to duk_tvals remain
valid; duk_activation and duk_catcher pointers are stable since Duktape 2.2.
(This could easily change if mark-and-sweep were to "compact" the value stack
in an emergency GC.)
The mark-and-sweep side effects listed above are not fundamental to the
engine and could be removed if they became inconvenient. For example, it's
nice that emergency GC can compact objects in an attempt to free memory, but
it's not a critical feature (and many other engines don't do it either).
Side effects with finalizers and mark-and-sweep disabled
--------------------------------------------------------
When both finalizers and mark-and-sweep are disabled, the only remaining side
effects come from DECREF (plain or NORZ):
* Refzero processing still frees objects whose refcount reaches zero.
Any pointers to such objects will thus be invalidated. This may happen
e.g. when a borrowed pointer is used and that pointer loses its backing
reference.
When DECREF operations happen during mark-and-sweep they get handled specially:
the refcounts are updated normally, but the objects are never freed or even
queued to refzero_list. This is done because mark-and-sweep will free any
unreachable objects; DECREF still gets called because mark-and-sweep finalizes
refcounts of any freed objects (or rather other objects they point to) so that
refcounts remain in sync.
Controls in place
=================
Finalizer execution, pf_prevent_count
-------------------------------------
Objects with finalizers are queued to finalize_list and are processed later
by duk_heap_process_finalize_list(). The queueing doesn't need any side
effect protection as it is side effect free.
duk_heap_process_finalize_list() is guarded by heap->pf_prevent_count which
prevents recursive finalize_list processing. If the count is zero on entry,
it's bumped and finalize_list is processed until it becomes empty. New
finalizable objects may be queued while the list is being processed, but
only the first call will process the list. If the count is non-zero on entry,
the call is a no-op.
The count can also be bumped upwards to prevent finalizer execution in the
first place, even if no call site is currently processing finalizers. If the
count is bumped, there must be a reliable mechanism of unbumping the count or
finalizer execution will be prevented permanently.
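The guard logic can be modelled roughly as follows. This is a sketch under simplifying assumptions, not the actual implementation: ``toy_heap`` is a hypothetical struct, ``pending`` stands in for the length of finalize_list, and the first toy finalizer deliberately queues another object and re-enters list processing to show that the nested call is a no-op.

```c
#include <assert.h>

/* Hypothetical toy heap: 'pending' models the length of finalize_list. */
typedef struct {
    int pf_prevent_count;
    int pending;
    int finalizers_run;
} toy_heap;

void toy_process_finalize_list(toy_heap *heap);

/* Toy finalizer; the first one queues another object and re-enters processing. */
static void toy_run_finalizer(toy_heap *heap) {
    heap->finalizers_run++;
    if (heap->finalizers_run == 1) {
        heap->pending++;                  /* new finalizable object queued */
        toy_process_finalize_list(heap);  /* nested call: must be a no-op */
    }
}

void toy_process_finalize_list(toy_heap *heap) {
    if (heap->pf_prevent_count != 0) {
        return;  /* already processing, or finalizers explicitly prevented */
    }
    heap->pf_prevent_count++;
    while (heap->pending > 0) {  /* the outermost call drains the whole list */
        heap->pending--;
        toy_run_finalizer(heap);
    }
    heap->pf_prevent_count--;
    assert(heap->pf_prevent_count == 0);
}

/* Returns the number of finalizers run, starting with two queued objects. */
int toy_demo_finalizers(int prevent) {
    toy_heap heap = { 0, 2, 0 };
    heap.pf_prevent_count = prevent;
    toy_process_finalize_list(&heap);
    return heap.finalizers_run;
}
```

With ``prevent`` set to 0 the outermost call runs all three finalizers (the two queued initially plus the one queued during processing). With ``prevent`` set to 1 nothing runs, which also shows why a bumped count must always be unbumped.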
Because only the first finalizer processing site processes the finalize_list,
using duk_suspend() from a finalizer or anything called by a finalizer is not
currently supported. If duk_suspend() were called in a finalizer, finalization
would be stuck until duk_resume() was called. Processing finalizers from
multiple call sites would by itself be relatively straightforward (each call
site would just process the list head or notice it is NULL and finish);
however, at present mark-and-sweep also needs to be disabled while finalizers
run.
Mark-and-sweep prevent count, ms_prevent_count
----------------------------------------------
This is a stacking counter used to prevent mark-and-sweep. It is also used
to prevent recursive mark-and-sweep entry while mark-and-sweep runs.
Mark-and-sweep running, ms_running
----------------------------------
This flag is set only when mark-and-sweep is actually running, and doesn't
stack because recursive mark-and-sweep is not allowed.
The flag is used by DECREF macros to detect that mark-and-sweep is running
and that objects must not be queued to refzero_list or finalize_list; their
refcounts must still be updated.
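A minimal model of how a DECREF could consult such a flag is shown below. It is an illustrative sketch only: ``toy_heap``, ``toy_obj`` and ``toy_decref()`` are hypothetical, and setting ``freed`` stands in for the whole refzero processing path.

```c
#include <assert.h>

/* Hypothetical toy heap and object; field names only loosely mirror the text. */
typedef struct { int ms_running; int freed_count; } toy_heap;
typedef struct { int refcount; int freed; } toy_obj;

/* Toy DECREF: the refcount is always updated, but refzero handling is
 * skipped while mark-and-sweep is running (the sweep frees the object).
 */
void toy_decref(toy_heap *heap, toy_obj *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount != 0) {
        return;
    }
    if (heap->ms_running) {
        return;  /* leave the object for mark-and-sweep to free */
    }
    obj->freed = 1;  /* stands in for refzero processing */
    heap->freed_count++;
}

/* Returns 1 if the DECREF freed the object, 0 if it was left alone. */
int toy_demo_decref(int ms_running) {
    toy_heap heap = { 0, 0 };
    toy_obj obj = { 1, 0 };

    heap.ms_running = ms_running;
    toy_decref(&heap, &obj);
    assert(obj.refcount == 0);  /* refcount is updated in both cases */
    return obj.freed;
}
```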
Mark-and-sweep flags, ms_base_flags
-----------------------------------
Mark-and-sweep base flags from duk_heap are ORed with mark-and-sweep argument
flags. This allows a section of code to avoid e.g. object compaction
regardless of how mark-and-sweep gets triggered.
Using the base flags is useful when mark-and-sweep by itself is desirable
but e.g. object compaction is not. Finalizers are prevented using a
separate flag.
Calling code must restore the flags reliably -- e.g. catching errors or having
assurance of no errors being thrown in any situation. It might be nice to
make this easier by allowing flags to be modified, the modification flagged,
and for error throw handling to do the restoration automatically.
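The save, modify, restore pattern for the base flags might look roughly like this. The flag names and values are invented for illustration (they are not the real flag definitions), and the sketch assumes that nothing inside the protected section can throw.

```c
#include <assert.h>

/* Hypothetical flag values for illustration only. */
#define TOY_MS_FLAG_NO_OBJECT_COMPACTION  (1U << 0)
#define TOY_MS_FLAG_EMERGENCY             (1U << 1)

typedef struct { unsigned int ms_base_flags; } toy_heap;

/* Effective flags are the call site flags ORed with the heap-wide base flags. */
unsigned int toy_ms_effective_flags(const toy_heap *heap, unsigned int flags) {
    return flags | heap->ms_base_flags;
}

/* Runs a "critical section" with compaction disabled and restores the flags.
 * Returns the effective flags that a mark-and-sweep triggered inside the
 * section would see.
 */
unsigned int toy_demo_base_flags(void) {
    toy_heap heap = { 0U };
    unsigned int prev = heap.ms_base_flags;
    unsigned int seen;

    heap.ms_base_flags |= TOY_MS_FLAG_NO_OBJECT_COMPACTION;
    seen = toy_ms_effective_flags(&heap, TOY_MS_FLAG_EMERGENCY);
    heap.ms_base_flags = prev;  /* must be reached on every exit path */

    assert(heap.ms_base_flags == 0U);
    return seen;
}
```

Here an emergency mark-and-sweep triggered inside the section still sees the no-compaction flag, even though the triggering call site did not ask for it.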
Creating an error object, creating_error
----------------------------------------
This flag is set when Duktape internals are creating an error to be thrown.
If an error happens during that process (which includes a user errCreate()
callback), the flag is already set, which prevents recursion: a pre-allocated
"double error" object is thrown instead.
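The recursion guard can be sketched as below. This is a simplified model, not the engine's code: errors are represented by string pointers, "throwing" is modelled as a return value rather than a longjmp, and ``toy_create_error()`` and ``toy_failing_cb()`` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_heap toy_heap;
typedef const char *(*toy_err_create_cb)(toy_heap *heap);

/* Hypothetical toy heap with a user callback run during error creation. */
struct toy_heap {
    int creating_error;
    toy_err_create_cb err_create;  /* models a user errCreate() callback */
};

/* Models the pre-allocated "double error" object. */
static const char *const TOY_DOUBLE_ERROR = "double error";

const char *toy_create_error(toy_heap *heap) {
    const char *err = "normal error";

    if (heap->creating_error) {
        return TOY_DOUBLE_ERROR;  /* error during error creation: no recursion */
    }
    heap->creating_error = 1;
    if (heap->err_create != NULL) {
        err = heap->err_create(heap);  /* may itself fail */
    }
    assert(heap->creating_error == 1);
    heap->creating_error = 0;
    return err;
}

/* A callback that fails, which leads to a nested error creation attempt. */
static const char *toy_failing_cb(toy_heap *heap) {
    return toy_create_error(heap);
}

/* Returns 1 if the double error was produced, 0 for a normal error. */
int toy_demo_double_error(int callback_fails) {
    toy_heap heap = { 0, NULL };

    if (callback_fails) {
        heap.err_create = toy_failing_cb;
    }
    return toy_create_error(&heap) == TOY_DOUBLE_ERROR;
}
```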
Call stack unwind or handling an error, error_not_allowed
---------------------------------------------------------
This flag is only enabled when using assertions. It is set in code sections
which must be protected against an error being thrown. This is a concern
because currently the error state is global in duk_heap and doesn't stack,
so an error throw (even a caught and handled one) clobbers the state which
may be fatal in code sections working to handle an error.
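The clobbering problem can be shown with a toy model in which a single heap-wide slot holds the thrown value. This is a sketch only: ``lj_value`` is a hypothetical stand-in for the heap-wide error state, no real longjmp is performed, and the ``error_not_allowed`` check is modelled with a plain ``assert()``.

```c
#include <assert.h>

/* Hypothetical toy heap: a single heap-wide slot holds the thrown value. */
typedef struct {
    int lj_value;           /* models the heap-wide error state */
    int error_not_allowed;  /* assertion aid only */
} toy_heap;

/* Toy throw: overwrites the heap-wide state. */
void toy_throw(toy_heap *heap, int value) {
    assert(!heap->error_not_allowed);
    heap->lj_value = value;
}

/* Handler that lets a nested (caught) throw happen: the value is clobbered. */
int toy_handle_unsafe(toy_heap *heap) {
    toy_throw(heap, 999);   /* nested error, caught and handled elsewhere */
    return heap->lj_value;  /* no longer the original error */
}

/* Handler that marks its critical section; a throw inside would assert. */
int toy_handle_guarded(toy_heap *heap) {
    int value;

    heap->error_not_allowed = 1;
    value = heap->lj_value;  /* read the state; no throwing calls in here */
    heap->error_not_allowed = 0;
    return value;
}

/* Throws 42 and returns the value the chosen handler ends up seeing. */
int toy_demo_clobber(int guarded) {
    toy_heap heap = { 0, 0 };

    toy_throw(&heap, 42);
    return guarded ? toy_handle_guarded(&heap) : toy_handle_unsafe(&heap);
}
```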
DECREF NORZ (no refzero) macros
-------------------------------
DECREF NORZ (no refzero) macro variants behave the same as plain DECREF macros
except that they don't trigger side effects. Since Duktape 2.1 NORZ macros
will handle refzero cascades inline (freeing all the memory directly); however,
objects with finalizers will be placed in finalize_list without finalizer
calls being made.
Once a code segment with NORZ macros is complete, DUK_REFZERO_CHECK_{SLOW,FAST}()
should be called. The macro checks for any pending finalizers and processes
them. No error catcher is necessary: the error throw path also calls the
macros and processes pending finalizers. (The NORZ name is a bit of a
misnomer since the Duktape 2.1 rework.)
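The intended usage pattern, a run of NORZ operations followed by a single check, can be sketched as follows. The functions are hypothetical toy stand-ins (``toy_decref_norz()`` for the NORZ macros, ``toy_refzero_check()`` for the check macro) and the model ignores memory freeing entirely.

```c
#include <assert.h>

typedef struct { int finalize_pending; int finalizers_run; } toy_heap;
typedef struct { int refcount; int has_finalizer; } toy_obj;

/* Toy NORZ decref: may queue the object, but never runs a finalizer. */
void toy_decref_norz(toy_heap *heap, toy_obj *obj) {
    assert(obj->refcount > 0);
    if (--obj->refcount == 0 && obj->has_finalizer) {
        heap->finalize_pending++;  /* queued only; no call is made here */
    }
}

/* Toy counterpart of the refzero check: runs any pending finalizers. */
void toy_refzero_check(toy_heap *heap) {
    while (heap->finalize_pending > 0) {
        heap->finalize_pending--;
        heap->finalizers_run++;  /* arbitrary side effects would happen here */
    }
}

/* Returns 1 if no finalizer ran inside the NORZ section and both ran after. */
int toy_demo_norz(void) {
    toy_heap heap = { 0, 0 };
    toy_obj a = { 1, 1 };
    toy_obj b = { 1, 1 };
    int ran_inside;

    toy_decref_norz(&heap, &a);  /* side effect free section begins */
    toy_decref_norz(&heap, &b);
    ran_inside = heap.finalizers_run;  /* pointers and indices still valid here */

    toy_refzero_check(&heap);  /* section complete: side effects allowed */
    return (ran_inside == 0) && (heap.finalizers_run == 2);
}
```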
Mitigation, test coverage
=========================
There are several torture test options to exercise side effect handling:
* Triggering a mark-and-sweep for every allocation (and in a few other places
like DECREF too).
* Causing a simulated finalizer run with error throwing and call side effects
every time a finalizer might have executed.
Some specific cold paths like out-of-memory errors in critical places are
difficult to exercise with black box testing. There is a small set of
DUK_USE_INJECT_xxx config options which allow errors to be injected into
specific critical functions. These can be combined with e.g. valgrind and
asserts, to cover assertions, memory leaks, and memory safety.
Operations causing side effects
===============================
The main reasons and controls for side effects are covered above. Below is
a (non-exhaustive) list of common operations with side effects. Any internal
helper may invoke some of these primitives and thus also have side effects.
DUK_ALLOC()
* May trigger voluntary or emergency mark-and-sweep, with arbitrary
code execution side effects.
DUK_REALLOC()
* May trigger voluntary or emergency mark-and-sweep, with arbitrary
code execution side effects.
* In particular, if reallocating e.g. the value stack, the triggered
mark-and-sweep may change the base pointer being reallocated!
To avoid this, the DUK_REALLOC_INDIRECT() call queries the base pointer
from the caller for every realloc() attempt.
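The idea behind an indirect realloc can be sketched as below. This is an assumption-laden model, not the real macro: ``toy_realloc_indirect()``, ``toy_owner`` and the ``fail_next`` failure injection are all hypothetical, and the toy GC always relocates the buffer to make the hazard visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type: returns the current base pointer on demand. */
typedef void *(*toy_ptr_getter)(void *udata);

typedef struct {
    char *buf;
    size_t size;
    int fail_next;  /* simulate this many allocation failures */
} toy_owner;

static void *toy_get_buf(void *udata) {
    return ((toy_owner *) udata)->buf;
}

/* Models a GC side effect that relocates the buffer being reallocated. */
static void toy_gc(toy_owner *o) {
    char *moved = (char *) malloc(o->size);
    if (moved != NULL) {
        memcpy(moved, o->buf, o->size);
        free(o->buf);
        o->buf = moved;
    }
}

void *toy_realloc_indirect(toy_owner *o, toy_ptr_getter get, size_t new_size) {
    int attempt;

    assert(new_size > 0);
    for (attempt = 0; attempt < 3; attempt++) {
        void *cur = get(o);  /* re-query: an earlier GC may have moved it */
        void *res = NULL;
        if (o->fail_next > 0) {
            o->fail_next--;  /* injected failure */
        } else {
            res = realloc(cur, new_size);
        }
        if (res != NULL) {
            return res;
        }
        toy_gc(o);  /* try to free memory; may change o->buf */
    }
    return NULL;
}

/* Returns 1 if the data survives a failed attempt, a GC move, and a retry. */
int toy_demo_realloc_indirect(void) {
    toy_owner o = { NULL, 4, 1 };
    char *res;
    int ok;

    o.buf = (char *) malloc(o.size);
    if (o.buf == NULL) {
        return 0;
    }
    memcpy(o.buf, "abc", 4);
    res = (char *) toy_realloc_indirect(&o, toy_get_buf, 64);
    if (res == NULL) {
        free(o.buf);
        return 0;
    }
    o.buf = res;
    o.size = 64;
    ok = (strcmp(o.buf, "abc") == 0);
    free(o.buf);
    return ok;
}
```

If the base pointer were captured once before the loop, the second attempt would pass an already freed pointer to ``realloc()``; querying it through the callback on every attempt avoids that.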
DUK_FREE()
* No side effects at present.
Property read, write, delete, existence check
* May invoke getters, setters, and Proxy traps with arbitrary code execution
side effects.
* Memory allocation is potentially required for every operation, thus causing
arbitrary code execution side effects. Memory allocation is obviously
needed for property writes, but any other operations may also allocate
memory e.g. to coerce a number to a string.
Value stack pushes
* Pushing to the value stack is side effect free. The space must be allocated
beforehand, and a pushed value is INCREF'd if it isn't primitive, and INCREF
is side effect free.
* A duk_check_stack() / duk_require_stack() + push has arbitrary side effects
because of a potential reallocation.
Value stack pops
* Popping a value may invoke a finalizer, and thus may cause arbitrary code
execution side effects.
Value stack coercions
* Value stack type coercions may, depending on the coercion, invoke methods
like .toString() and .valueOf(), and thus have arbitrary code execution
side effects. Even failed attempts may cause side effects due to memory
allocation attempts.
* In specific cases it may be safe to conclude that a coercion is side effect
free; for example, doing a ToNumber() conversion on a plain string is side
effect free at present. (This may not always be the case in the future,
e.g. if numbers become heap allocated.)
* Some coercions not involving an explicit method call may require an
allocation call -- which may then trigger a voluntary or emergency
mark-and-sweep leading to arbitrary code execution side effects.
INCREF
* No side effects at present. Object is never freed or queued anywhere.
DECREF_NORZ
* No side effects other than freeing one or more objects, strings, and
buffers. The freed objects don't have finalizers; objects with finalizers
are queued to finalize_list but finalizers are not executed.
* Queries finalizer existence which is side effect free.
* When mark-and-sweep is running, DECREF_NORZ adjusts target refcount but
won't do anything else like queue object to refzero_list or free it; that's
up to mark-and-sweep.
DECREF
* If refcount doesn't reach zero, no side effects.
* If refcount reaches zero, one or more objects, strings, and buffers are
freed which is side effect free. Objects with finalizers are queued to
finalize_list, and the list is processed when the cascade of objects without
finalizers has been freed. Finalizer execution has arbitrary code execution
side effects.
* Queries finalizer existence which is side effect free.
* When mark-and-sweep is running, DECREF adjusts target refcount but won't
do anything else.
* All objects on finalize_list have an artificial +1 refcount bump, so that
they can never trigger refzero processing (assuming refcounts are correct).
This allows refzero code to assume a refzero object is on heap_allocated.
duk__refcount_free_pending()
* As of Duktape 2.1 no side effects, just frees objects without a finalizer
until refzero_list is empty. (Equivalent in Duktape 2.0 and prior would
process finalizers inline.)
* Recursive entry is prevented; first caller processes a cascade until it's
done. Pending finalizers are executed after the refzero_list is empty
(unless prevented). Finalizers are executed outside of refzero_list
processing protection so DECREF freeing may happen normally during finalizer
execution.
Mark-and-sweep
* Queries finalizer existence which is side effect free.
* Object compaction.
* String table compaction.
* Recursive entry prevented.
* Executes finalizers after mark-and-sweep is complete (unless prevented),
which has arbitrary code execution side effects. Finalizer execution
happens outside of mark-and-sweep protection, and may trigger mark-and-sweep.
However, when mark-and-sweep runs with finalize_list != NULL, rescue
decisions are postponed to avoid incorrect rescue decisions caused by
finalize_list being (artificially) treated as a reachability root; in
concrete terms, FINALIZED flags are not cleared so they'll be rechecked
later.
Error throw
* Overwrites heap longjmp state, so an error throw while handling a previous
one is a fatal error.
* Because finalizer calls may involve error throws, finalizers cannot be
executed in error handling (at least without storing/restoring longjmp
state).
* Memory allocation may involve side effects or fail with out-of-memory, so
it must be used carefully in error handling. For example, creating an object
may potentially fail, throwing an error inside error handling. The error
that is thrown is constructed *before* error throwing critical section
begins.
* Protected call error handling must also never throw (without catching) for
sandboxing reasons: the error handling path of a protected call is assumed
to never throw.
* ECMAScript try-catch handling isn't currently fully protected against
out-of-memory conditions: if setting up the catch execution fails, an
out-of-memory error is propagated from the try-catch block. Try-catch isn't
as safe as protected calls for sandboxing. Even if catch execution setup
didn't allocate memory, it's difficult to write script code that is fully
memory allocation free
(whereas writing C code which is allocation free is much easier).
* Mark-and-sweep without error throwing or (finalizer) call side effects is
OK.
Debugger message writes
* Code writing a debugger message to the current debug client transport
must ensure, somehow, that it will never happen when another function
is doing the same (including nested call to itself).
* If nesting happens, memory unsafe behavior won't happen, but the debug
connection becomes corrupted.
* There are some current issues for debugger message handling, e.g. debugger
code uses duk_safe_to_string() which may have side effects or even enter a
busy loop.
Call sites needing side effect protection
=========================================
Error throw and resulting unwind
* Must protect against another error: longjmp state doesn't nest.
* Prevent finalizers, avoid Proxy traps and getter/setter calls.
* Avoid out-of-memory error throws; trial allocation is OK.
* Refzero with pure memory freeing is OK.
* Mark-and-sweep without finalizer execution is OK. Object and string
table compaction is OK, at least at present.
* Error code must be very careful not to throw an error in any part of the
error unwind process. Otherwise sandboxing/protected call guarantees are
broken, and some of the side effect prevention changes are not correctly
undone (e.g. pf_prevent_count is bumped again!). There are asserts in place
for the entire critical part (heap->error_not_allowed).
Success unwind
* Must generally avoid (or protect against) error throws: otherwise state may
be only partially unwound. Same issues as with error unwind.
* However, if the callstack state is consistent, it may be safe to throw in
specific places in the success unwind code path.
String table resize
* String table resize must be protected against string interning.
* Prevent finalizers, avoid Proxy traps.
* Avoid any throws, so that state is not left incomplete.
* Refzero with pure memory freeing is OK.
* Mark-and-sweep without finalizer execution is OK. As of Duktape 2.1
string interning is OK because it no longer causes a recursive string
table resize regardless of interned string count. String table itself
protects against recursive resizing, so both object and string table
compaction attempts are OK.
Object property table resize
* Prevent compaction of the object being resized.
* In practice, prevent finalizers (they may mutate objects) and Proxy
traps. Prevent compaction of all objects because there's currently no
fine-grained control (could be changed).
JSON fast path
* Prevent all side effects affecting property tables which are walked
by the fast path.
* Prevent object and string table compaction, mark-and-sweep otherwise OK.
Object property slot updates (e.g. data -> accessor conversion)
* Property slot index being modified must not change.
* Prevent finalizers and Proxy traps/getters (which may operate on the object).
* Prevent object compaction which affects slot indices even when properties
are not deleted.
* In practice, use NORZ macros which avoid all relevant side effects.