
==================
Sandboxing Duktape
==================

Overview
========

Sandboxed environments allow execution of untrusted code with two broad
goals in mind:

1. Security: prevent untrusted code from doing unsafe operations such as
   accessing memory directly, causing segfaults, etc.

2. Availability: prevent untrusted code from hogging resources, e.g.
   consuming all available memory or entering an infinite loop.

For some sandbox environments it's sufficient to protect against accidents,
e.g. user code accidentally entering an infinite loop.  In other environments
the executed code may be potentially hostile, which is of course much more
challenging to secure against.

Duktape provides mechanisms to allow these goals to be achieved for running
untrusted Ecmascript code.  All C code is expected to be trusted and must be
carefully written with these sandboxing goals in mind.

This document describes best practices for Duktape sandboxing.

.. note:: This document is in a rough draft state.  Duktape 1.0 does not yet
          have full support for sandboxing, e.g. there is no bytecode
          execution timeout yet.  Sandboxing shortcomings will be fixed in
          later versions.

Suggested measures
==================

Isolation approaches
--------------------

There are two basic alternatives to sandboxing Ecmascript code with Duktape:

* Use a separate Duktape heap for each sandbox

* Use a separate Duktape thread (with a separate global environment) for
  each sandbox

Pros and cons of using a Duktape heap for sandboxing:

* Duktape heaps cannot exchange values or object references except through
  explicit serialization, so values or references don't accidentally leak
  from one heap to another.

* Each Duktape heap can be assigned its own memory pool, which allows separate
  memory limits to be placed for each sandbox.

* Multiple native threads can be used in parallel to execute code in different
  heaps.  Only one native thread may be active at a time for each particular
  heap, however.

* One downside is that there is some per-heap overhead which accumulates for
  each sandbox.  If the memory pool for a heap is pre-allocated, some of the
  pool will be unused.  If memory is not pre-allocated, actual memory usage
  follows the live data quite closely because reference counting frees most
  unreachable values promptly.

Pros and cons of using a Duktape thread for sandboxing:

* Duktape threads can exchange values and object references.  This can be
  useful at times, but is also a risk for sandboxing.

* If you create a thread with a separate global environment
  (``duk_push_thread_new_globalenv()``), two threads can be isolated to
  a large extent.  It's still possible to leak values and references
  between threads through Duktape/C functions, carelessly written finalizers,
  and so on.

* All threads in a certain heap share the same memory pool which means that
  one sandbox can starve other sandboxes of memory.

* Only a single native thread can execute Ecmascript code at a time, so
  sandboxes sharing a heap cannot run in parallel.

These two approaches can of course be mixed: you can have multiple heaps,
each with one or more sandboxed threads.

Replace the global object
-------------------------

The first thing you should do is replace the global object with a minimal
replacement, providing only those bindings that are absolutely necessary
for the sandboxed environment.  Sometimes this means the full E5 bindings;
sometimes it means just a few bindings that sandboxed code is expecting.

In general you should be confident that every exposed binding is safe from
both security and availability viewpoints.

Risky bindings:

* The ``Duktape`` object provides access to Duktape internals in several ways,
  which is not ideal.  It may also gain new properties in new Duktape versions,
  and these are easy to overlook, so the safest default is to hide the object
  from sandboxed code.  You can still cherry pick individual functions to
  be exposed directly or through a wrapper.

* ``Duktape.Buffer`` allows creation of buffers and internal keys (through
  buffer-to-string coercion) and thus provides access to internal properties.
  See separate section on internal properties.

* ``Duktape.dec()`` allows decoding of string data into a buffer value and thus
  provides access to internal properties.

* ``Duktape.act()`` provides access to calling functions which may matter to
  some sandboxing environments.

* ``Duktape.fin()`` provides access to setting and getting a finalizer.  Since
  a finalizer may run in a different thread than where it was created,
  finalizers are a sandboxing risk.

You should also:

* Remove the ``require`` module loading function from the global object.
  If you need module loading in the sandbox, it's better to write a specific,
  constrained module loader for that environment.
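
As a rough illustration of such a constrained loader, the sketch below builds
a ``require`` replacement that can only return modules which trusted setup
code has registered in advance.  The names ``makeSandboxRequire`` and
``modules`` are hypothetical and not part of the Duktape API, and the sketch
assumes that the registered module objects are themselves safe to expose::

    // Hypothetical helper, not a Duktape API: returns a require() that
    // resolves module IDs only from a fixed, pre-registered table.
    function makeSandboxRequire(modules) {
        var table = Object.create(null);  // no inherited keys like 'toString'
        Object.keys(modules).forEach(function (id) {
            table[id] = modules[id];
        });

        return function require(id) {
            if (typeof id !== 'string' ||
                !Object.prototype.hasOwnProperty.call(table, id)) {
                throw new Error('module not available in sandbox');
            }
            return table[id];
        };
    }

Trusted initialization code would then place the returned function into the
replacement global object as ``require``; any module ID that is not in the
table fails with an error instead of reaching a general purpose loader.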

Restrict access to internal properties
--------------------------------------

Internal properties are intended to be used by Duktape and user C code
to store "hidden properties" in objects.  The mechanism currently relies on
using strings with an invalid UTF-8 encoding which cannot normally be
created by Ecmascript code.  Such properties should be non-writable and
non-configurable when possible, but it's still a risk to let user code
access them.

If Ecmascript code has access to buffer values, it can easily create internal
keys and then access internal properties, e.g.::

    // With access to Duktape.dec: decodes to \xFFfoo, invalid UTF-8 data
    var key = Duktape.dec('hex', 'ff666f6f');

    // With an arbitrary buffer value 'buf' (with length >= 1)
    buf[0] = 0xff;  // create invalid utf-8 prefix
    var key = String(buf).substring(0, 1) + 'foo';

The risk in being able to access a certain internal property depends on the
internal property in question.  Some internal properties are non-writable and
non-configurable, so the sandboxed code can only read the property value; quite
often this is not an issue by itself.  If the value of an internal property can
be modified, concrete security issues may arise.  For instance, if an internal
property stores a raw pointer to a native handle (such as a ``FILE *``),
changing its value can lead to a potentially exploitable segfault.

To prevent access to internal keys:

* Ensure that sandboxed code has no direct access to buffer values, either
  by creating one using ``Duktape.Buffer`` or through some C binding which
  returns a buffer value in some way.

* Ensure that sandboxed code has minimal access to objects with potentially
  dangerous keys like raw pointers.

* If user code needs to deal with buffers, provide access through an accessor
  object without giving direct access to the underlying buffer.
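
The last point might look roughly like the following sketch, in which trusted
setup code wraps a buffer in an object offering only bounds-checked byte
access.  ``makeBufferAccessor`` and its method names are hypothetical; ``buf``
can be any indexable value with a ``length``, and the sketch assumes that
sandboxed code is only ever given the returned accessor, never ``buf``
itself::

    // Hypothetical helper: exposes byte-level access to 'buf' without ever
    // handing out a reference to the underlying buffer value.
    function makeBufferAccessor(buf) {
        var len = buf.length;

        function checkIndex(i) {
            // Reject non-numbers, fractions, NaN, and out-of-range indices.
            if (typeof i !== 'number' || i % 1 !== 0 || i < 0 || i >= len) {
                throw new RangeError('buffer index out of range');
            }
            return i;
        }

        return Object.freeze({
            length: function () { return len; },
            get: function (i) { return buf[checkIndex(i)]; },
            set: function (i, v) { buf[checkIndex(i)] = v & 0xff; }
        });
    }

Because the accessor only returns numbers, sandboxed code never holds a value
that could be coerced into a string with an invalid UTF-8 prefix.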

The fact that access to buffer values provides access to internal properties
is not ideal.  There are several future work issues to improve this situation,
e.g. to prevent access to internal properties from Ecmascript code even with
the correct internal string key.

Restrict access to function instances
-------------------------------------

In some environments giving the user code access to calling functions can be
dangerous.  For instance, if user code gets access to a function it cannot
normally see through bindings in the global object, it can call that function
and perhaps sidestep sandboxing.

Prevent access to function references in the call stack:

* Prevent access to ``Duktape.act()`` which provides programmatic access to
  the call stack and its function references.

* If ``DUK_OPT_NONSTD_FUNC_CALLER_PROPERTY`` is enabled, the ``caller``
  property provides access to calling functions.  Don't use this option
  with sandboxing, or at least carefully control the ``caller`` property
  values seen by the sandboxed code.

* The ``_Tracedata`` internal property of error objects contains references
  to calling functions.  Because this property is internal, sandboxed code
  has no access to it as long as access to internal properties is prevented
  in general.

Restrict access to finalizers
-----------------------------

Allowing user code to run a finalizer is dangerous: a finalizer can
execute in a different thread than the one where it was created, so it can
potentially breach sandboxing.

Suggestions for sandboxing:

* Don't give user code access to ``Duktape.fin()``.

* Don't give user code access to internal properties: with access to
  internal properties, user code can read/write the internal finalizer
  reference directly.

* Write finalizers very carefully.  Make minimal assumptions about which
  thread they run in, i.e. which global object they see.

* For sandboxed environments it may be sensible to make all finalizers
  native code so that they can access the necessary thread contexts
  regardless of the finalizer thread.

Sanitize built-in prototype objects
-----------------------------------

Plain values inherit implicitly from built-in prototype objects.  For instance,
string values inherit from ``String.prototype``, which allows one to access
string methods with a plain base value::

    print("foo".toUpperCase());

Duktape 1.0 will use the original built-in prototype functions in these
inheritance situations.  There is currently no way to replace these built-ins
so that the replacements would be used instead (see
``test-dev-sandbox-prototype-limitation.js``).

As a result, sandboxed code will always have access to the built-in prototype
objects which participate in implicit inheritance:

* ``Boolean.prototype``: through plain booleans such as ``true``

* ``Number.prototype``: through numbers such as ``123``

* ``String.prototype``: through strings such as ``"foo"``

* ``Object.prototype``: through object literals such as ``{}``

* ``Array.prototype``: through array literals such as ``[]``

* ``Function.prototype``: through function expressions and declarations,
  such as ``function(){}``

* ``RegExp.prototype``: through RegExp literals such as ``/foo/``

* ``Error.prototype`` and all subclasses like ``URIError.prototype``:
  through explicit construction (if constructors visible) or implicitly
  through internal errors, e.g. ``/foo\123/`` which throws a SyntaxError

* ``Duktape.Buffer.prototype``: through buffer values (if available); since
  there is no buffer literal, user code cannot construct buffer values
  directly

* ``Duktape.Pointer.prototype``: through pointer values (if available); since
  there is no pointer literal, user code cannot construct pointer values
  directly

It's not sufficient to avoid exposing these prototype objects in a replacement
global object: Duktape will use the original built-in prototype objects
regardless when dealing with plain value inheritance.  It is possible, however,
to delete individual properties of the prototype objects, e.g.::

    delete String.prototype.toUpperCase

This will cause the original example to fail::

    delete String.prototype.toUpperCase
    print("foo".toUpperCase());  // TypeError: call target not an object

Suggestions for sandboxing:

* Be aware that user code can reach built-in prototypes via implicit
  inheritance from various plain values.

* Sanitize built-in prototype objects by deleting unnecessary methods.
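
One way to apply the last suggestion is to keep an explicit allowlist for
each prototype and delete everything else, so that methods you did not
review are removed by default.  The helper name ``sanitizePrototype`` and the
allowlist in the trailing comment are hypothetical; this is only a sketch,
meant to be run by trusted setup code before any sandboxed code executes::

    // Hypothetical helper: delete every own property of 'proto' whose name
    // is not listed in 'keep'.  Returns the names that were removed.
    function sanitizePrototype(proto, keep) {
        var removed = [];
        Object.getOwnPropertyNames(proto).forEach(function (name) {
            var desc;
            if (keep.indexOf(name) >= 0) {
                return;
            }
            desc = Object.getOwnPropertyDescriptor(proto, name);
            if (desc && desc.configurable) {
                // Non-configurable properties cannot be deleted; skip them.
                delete proto[name];
                removed.push(name);
            }
        });
        return removed;
    }

    // Example call (the allowlist is illustrative only):
    // sanitizePrototype(String.prototype, [ 'charAt', 'indexOf', 'substring' ]);

The returned list can be logged during setup to check what was actually
removed on a given Duktape version.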

**XXX: This will probably need improvement.  There may need to be an API to
replace all built-in values.  They are kept in an internal array so perhaps
just exposing a primitive to set arbitrary values in the array would be
sufficient (though cryptic).**

Use the bytecode execution timeout mechanism
--------------------------------------------

**XXX: Bytecode execution timeout not yet implemented in Duktape 1.0.**

The bytecode execution timeout mechanism allows a user callback to interact
with the bytecode executor to forcibly abort execution if a script has been
running for too long.  The mechanism relies on Duktape/C functions always
returning to the bytecode executor within a reasonable time so that the
execution timeout check can be done from time to time.  (Because there is only
one execution thread, the executor cannot interrupt on-going Duktape/C calls
otherwise.)

Duktape tries to place execution time and recursion depth limits on risky
internal operations.  For instance, there is a sanity limit on the number of
operations executed during regexp matching.  When these internal limits are
hit, a ``RangeError`` is thrown.  User code can catch such an error and
continue execution.  However, the error will return control to the bytecode
executor so that the execution timeout mechanism can kick in if necessary.

Use a fixed size memory pool for the sandbox
--------------------------------------------

You should usually restrict the amount of memory that the sandbox can use.

One common approach is to use a pooled memory allocator to impose a hard
limit on the memory available to the sandbox.  The pool can be split into
fixed size blocks, or a free-list based allocator can be used.

Another approach is to use wrappers around standard ``malloc``, ``realloc``,
and ``free`` and keep track of total allocated memory.  One difficulty with
this is that ``realloc`` calls are not given the original allocation size,
so you need to track that separately, e.g. by prepending a small header to
every allocated memory block.

Review your C bindings for safety
---------------------------------

Review every C binding exposed to the sandbox.  There should be no way to
violate the safety goals through the C binding.  In particular:

* It shouldn't be possible to cause memory unsafe behavior.

* It shouldn't be possible to execute for an unreasonable amount of time
  within the C binding.

* It shouldn't be possible to access internal properties indirectly
  through the C binding.

Particular issues to look out for:

* Check typing of all arguments.  Avoid ``NULL`` pointers by using the
  "require" variants of getters (e.g. ``duk_require_string()`` instead
  of ``duk_get_string()``), which throw an error on a type mismatch
  instead of returning ``NULL``.

* Check every loop for termination.  Add a sanity termination limit if
  a loop is suspect.  Your goal is to return to the bytecode executor so
  that bytecode execution timeout can happen.

* When creating buffer values, avoid returning them to the caller and
  avoid using the buffer values e.g. as property lookup keys (which could
  accidentally access an internal property).

* When calling platform APIs, ensure they can never block indefinitely.

* Also ensure that native code doesn't compromise sandboxing goals at a
  higher level.  For instance, an API call must not allow sandboxed code
  to perform unauthenticated database writes or breach memory safety
  through file I/O on a Unix device file.