======= Buffers ======= Overview ======== Ecmascript did not originally have a binary array (or binary string) data type so various approaches are used: * Khronos typed array: - https://www.khronos.org/registry/typedarray/specs/latest/ - ``svn co -r 30720 https://cvs.khronos.org/svn/repos/registry/trunk/public/typedarray`` - https://developer.mozilla.org/en/docs/Web/JavaScript/Typed_arrays - http://www.html5rocks.com/en/tutorials/webgl/typed_arrays/ - http://clokep.blogspot.fi/2012/11/javascript-typed-arrays-pain.html - http://blogs.msdn.com/b/ie/archive/2011/12/01/working-with-binary-data-using-typed-arrays.aspx - https://www.inkling.com/read/javascript-definitive-guide-david-flanagan-6th/chapter-22/typed-arrays-and-arraybuffers * ES2015 has adopted the Khronos specification, with more specific semantics: - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-arraybuffer-constructor - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-typedarray-constructors - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-arraybuffer-objects - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-dataview-objects These section links point to specific built-ins but link to the above: - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-constructor-properties-of-the-global-object-arraybuffer - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-constructor-properties-of-the-global-object-dataview - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-uint8array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-uint8clampedarray - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-uint16array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-uint32array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-int8array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-int16array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-int32array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-float32array - http://www.ecma-international.org/ecma-262/6.0/index.html#sec-float64array ES2015 spec: - http:/www.ecma-international.org/ecma-262/6.0/index.html/ * Node.js Buffer: - http://nodejs.org/api/buffer.html - https://nodejs.org/docs/v6.9.1/api/buffer.html - https://github.com/joyent/node/blob/master/lib/buffer.js * Duktape has a custom plain buffer type which has a minimal memory footprint: - The plain buffer data type which can point to a fixed size buffer, a dynamic (resizable) buffer, or an external (user allocated) buffer. - The plain buffer behaves like Uint8Array for Ecmascript code but maintains separate typing in the C API. It object coerces to an actual ``Uint8Array`` sharing the same underlying storage. * Blob; not very relevant for Duktape: - https://developer.mozilla.org/en-US/docs/Web/API/Blob This document describes how various buffer types have been implemented in Duktape. The goal is to minimize footprint, so the internal buffer type implementation shares a lot of code even though multiple APIs are provided. Duktape buffer support ====================== Overview -------- Duktape currently supports the following buffer and buffer-related values: * Plain Duktape buffer * Node.js Buffer object * ArrayBuffer, DataView, and TypedArray (Uint8Array etc) objects The plain buffer value forms the basic underlying type for all the other buffer types; the relationship is similar to a plain string and a String object. Plain buffers can be fixed, dynamic, or external: * Fixed buffers cannot be resized but have a stable data pointer. * Dynamic buffers can be resized at the cost of an unstable data pointer. You can also "steal" the current buffer allocation through the duk_steal_buffer() API call. * External buffers point to user-allocated external data area whose pointer and length can be changed but Duktape won't resize or automatically free the buffer. Plain buffers have Uint8Array-like virtual properties for buffer byte indices, .length, .byteOffset, .byteLength, and .BYTES_PER_ELEMENT. Assignment has the same semantics as Uint8Array: bytes are written with modulo 256 semantics, bytes read back as unsigned 8-bit values. The plain buffer type is designed to be as friendly as possible for low level embedded programming, and has a minimal footprint because there's no Ecmascript object associated with it. It is mostly intended to be accessed from C code. Duktape also uses buffer values internally. The various buffer and view objects ultimately point to an underlying buffer and provide access to the full buffer or a slice/view of the buffer using either accessor methods (like getUint32()) or virtual index properties. All buffer objects are internally implemented using the ``duk_hbufobj`` type which makes it easy to mix values between different APIs. As a result Duktape e.g. accepts a Node.js Buffer as an input for a Khronos DataView. duk_hbufobj ----------- The internal ``duk_hbufobj`` type is a heap-allocated structure extended from ``duk_hobject``. In addition having the usual object property tables and such, it has C struct fields for quick access: * A reference to an underlying plain buffer value (``duk_hbuffer``), which may be of any type: fixed, dynamic, or external. * A byte offset/length pair providing a window into the underlying buffer. These values directly map to the virtual ``byteOffset`` and ``byteLength`` properties. * An element type and a shift count. These provide enough information to support Khronos TypedArray views so that index values can be mapped to byte offsets and encoded/decoded appropriately. The virtual ``length`` property indicates the number of *elements* (not bytes) available, and is provided by dividing the byte length field with the element size (rounding downwards). The virtual ``BYTES_PER_ELEMENT`` is provided based on the element shift count (as "1 << shift"). The following figure illustrates these for a fictional Int16-view:: : 0 1: 2 3 4 5 6 7 8 9 10 11:12 13 14 15 : +------+-----------------------------+------------+ | xx xx:xx xx xx xx xx xx xx xx xx xx:xx xx xx xx | underlying buffer +------+-----------------------------+------------+ (16 bytes) : : : : : : : : : : : : shift is 1, element size is : : : : : : (1 << 1) => 2 bytes |-----|-----|-----|-----|-----| (= .BYTES_PER_ELEMENT) : [0] : [1] : [2] : [3] : [4] : : : elem. type is Int16 (signed) : : :<--->: (2-byte elements) byte offset: 2 (= .byteOffset) byte length: 10 (= .byteLength) => view maps byte range [2,12[ length in elements: 5 (= .length) virtual indices: 0, 1, 2, 3, 4 Each ``duk_hbufobj`` has virtual index behavior with indices mapping logically to elements in the range [0,length[. Elements may be signed or unsigned integers of multiple sizes, IEEE floats, or IEEE doubles. All accesses to the underlying buffer are byte-based, and no alignment is required by Duktape; however, Khronos TypedArray specification restricts creation of non-element-aligned views. All multi-byte elements are accessed in the host endianness (this is required by the ES2015 TypedArray specification). A ``duk_hbufobj`` acts as a both a buffer representation (providing Node.js Buffer and ArrayBuffer) and a view representation (prodiving e.g. DataView, Uint8Array, and other TypedArray views). It supports both a direct 1:1 mapping to an underlying buffer and a slice/view mapping to a subset of the buffer. The byteLength/byteOffset pair provides a logical window for the buffer object. The underlying buffer may be smaller, e.g. as a result of a dynamic buffer being resized after a ``duk_hbufobj`` was created. For example:: +------+---------------------+ | xx xx:xx xx xx xx xx xx xx | / / / / underlying buffer resized to 9 bytes +------+---------------------+ : : : : : : : : : : ? : ? : index 3 is only partially mapped : : : : : : inde4 5 is not mapped |-----|-----|-----|-----|-----: : [0] : [1] : [2] : [3] : [4] : This is not intended to be a normal usage scenario, so the main goal for Duktape is only to provide memory safe behavior: * The virtual properties (byteLength, byteOffset, length) are unchanged. * Attempt to read outside the view (fully or partially) returns zero values. * Attempt to write outside the view (fully or partially) is silently ignored. * Other operations requiring access to the underlying buffer vary in behavior, some operations are silently skipped, some cause a TypeError, etc. Beyond memory safety, any specific behavior is not part of versioning guarantees and may change even between minor versions. Summary of buffer-related values -------------------------------- +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Type | Specification | .length | .byteLength | .byteOffset | .BYTES_PER_ELEMENT | .buffer | [index] | Element type | Read coercion | Write coercion | Endianness | Accessor methods | Notes | +===================+===============+================+=============+=============+====================+=========+=========+==============+===============+=====================+=============+==================+===================================+ | plain buffer | Duktape | yes (bytes) | yes | yes | 1 | no | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | no | Mimic Uint8Array, inherit | | | | | | | | | | | | | | | from Uint8Array.prototype. | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Buffer | Node.js | yes (bytes) | yes | yes | 1 | no | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | yes | Based on Node.js v0.12.1. | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | ArrayBuffer | TypedArray | no | yes | no | no | no | no | n/a | n/a | n/a | n/a | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | DataView | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | yes | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Int8Array | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | int8 | int8 | ToUint32() & 0xff | n/a | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Uint8Array | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | ToUint32() & 0xff | n/a | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Uint8ClampedArray | TypedArray | yes (bytes) | yes | yes | 1 | yes | yes | uint8 | uint8 | special | n/a | no | Write: special clamp/round. | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Int16Array | TypedArray | yes (elements) | yes | yes | 2 | yes | yes | int16 | int16 | ToUint32() & 0xffff | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Uint16Array | TypedArray | yes (elements) | yes | yes | 2 | yes | yes | uint16 | uint16 | ToUint32() & 0xffff | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Int32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | int32 | int32 | ToUint32() | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Uint32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | uint32 | uint32 | ToUint32() | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Float32Array | TypedArray | yes (elements) | yes | yes | 4 | yes | yes | float | float | cast to float | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ | Float64Array | TypedArray | yes (elements) | yes | yes | 8 | yes | yes | double | double | cast to double | host | no | | +-------------------+---------------+----------------+-------------+-------------+--------------------+---------+---------+--------------+---------------+---------------------+-------------+------------------+-----------------------------------+ Notes: * A plain buffer mimics an Uint8Array wherever possible, and inherits methods and other properties through ``Uint8Array.prototype``. * DataView and Node.js Buffer inherit a set of accessor methods from their prototype. These accessors allow fields of different width and type to be manipulated directly. Endianness can be specified, but is limited to little/big (there's no support for ARM mixed endian IEEE doubles). * TypedArray views are host endian. Their byte offset relative to the ArrayBuffer they are used on must also be a multiple of the element size (i.e. views must be naturally aligned). These requirements are not very useful from Duktape point of view but they are required by the ES2015 specification. (It would be trivial to use a specific endianness or allow unaligned views because Duktape works with the values byte-by-byte anyway.) * ``Uint8ClampedArray`` has a very specific clamping and rounding behavior which differs from all other view types. * An unsigned ``ToUint32()`` coercion is used in writing signed values too. For the bytes written to memory the signedness of this coercion doesn't really matter. Built-in objects related to buffers ----------------------------------- Duktape plain buffer value: * None Node.js Buffer: * Buffer * Buffer.prototype * SlowBuffer, only available if one does: require("buffer") and omitted from Duktape implementation TypedArray: * ArrayBuffer * ArrayBuffer.prototype * DataView * DataView.prototype * Int8Array * Int8Array.prototype * Uint8Array * Uint8Array.prototype * Uint8ClampedArray * Uint8ClampedArray.prototype * Int16Array * Int16Array.prototype * Uint16Array * Uint16Array.prototype * Int32Array * Int32Array.prototype * Uint32Array * Uint32Array.prototype * Float32Array * Float32Array.prototype * Float64Array * Float64Array.prototype None of the prototype objects are mandated by the Khronos specification but are present in ES2015. Conversions between buffer values --------------------------------- Because Duktape supports three Buffer object APIs, it's important that buffer values can be comfortably exchanged between the APIs (none of the API specifications require such behavior, of course). As a general rule: * Any Buffer object/view (implemented internally as a ``duk_hbufobj``) is accepted by any API expecting a specific object/view. For example, Khronos DataView() constructor accepts a Node.js Buffer, and Node.js Buffer() accepts a Uint8Array as an input. * A plain Duktape buffer is accepted as if it was coerced to an Uint8Array. To simplify implementation many internals actually do an explicit Uint8Array coercion when given plain buffers. This general rules is complicated by a few practical issues: * Some APIs create slices/views that share an underlying buffer value, while others create copies. Both behaviors are necessary in some situations. * A slice/view which doesn't map 1:1 to an underlying buffer cannot be coerced to a plain buffer value without copying, as the extra offset and length information is not supported for plain buffer values. The current mixing behavior is described in Duktape Wiki: * http://wiki.duktape.org/HowtoBuffers.html Buffer values in the Duktape C API ---------------------------------- The C API for plain buffer and buffer object handling is described in Duktape Wiki: * http://wiki.duktape.org/HowtoBuffers.html Node.js Buffer notes ==================== The Node.js ``Buffer`` type is widely used in server-side programming but is not standardized as such. Specification notes ------------------- Specification notes: * A Buffer may point to a slice of an underlying buffer. * String-to-buffer coercion has a set of encoding values (other than UTF-8). * Buffer prototype's ``slice()`` does not copy contents of the slice, but creates a new Buffer which points to the same underlying buffer. This is similar to the TypedArray ``subarray()`` operation, but different from the ArrayBuffer ``slice()`` operation which creates a new buffer for the slice. With typed arrays a non-copying slice would just be a new view on top of a previous one instead of a new ArrayBuffer. * The ``slice()`` operation provides offsetted access to the underlying buffer (same as with e.g. Uint8Array). However, a slice is a fully fledged buffer and can be used to create another slice() etc. * Buffers have virtual index properties and a virtual 'length' property. * Reads and writes have an optional offset and value range check which causes an error for out-of-bounds indices (RangeError) and values (TypeError); the behavior is not always consistent, and chosen Duktape behavior is documented in testcases. When the checks are disabled (noAssert == true), the behavior is memory unsafe and variable; some memory unsafe behavior results. Duktape semantics are always memory safe even at the cost of some performance. * Buffer accessor method read and write offsets are byte offsets regardless of data type being accessed. This is similar to Khronos DataView, but different from Khronos TypedArray views whose indices are element-based. * There are no alignment requirements for field access. This also matches Khronos DataView behavior, but differs from Khronos TypedArrays which must be aligned. * write(U)Int(LE|BE) and read(U)Int(LE|BE) operate on variable-size integers (up to 48-bit) and caller selects number of bytes (and endianness) to read or write. * Newly created buffers don't seem to be zeroed automatically. Duktape zeroes buffer data as a side effect of underlying ``duk_hbuffer`` values being automatically zeroed. However, if DUK_USE_ZERO_BUFFER_DATA is not set, Node.js Buffers are not zeroed. * Buffer inspect() provides a limited hex dump of buffer contents. Duktape doesn't currently provide a similar function by default. * SlowBuffer: probably not needed. * User code can ``require('buffer')``; this is not supported by Duktape. Implementation notes -------------------- * Representation must point to a plain buffer and also needs internal slice offset/length properties to implement slice semantics. Slices must be valid inputs for other slices; such slice-of-slice objects can point to the same plain buffer with offset/length pairs resolved at each step. * For fast operations, guaranteed property slots could be used. Alternatively a dedicated ``duk_hobject`` subtype can be used. (The latter was chosen.) * Should be optional and disabled by default because of footprint concerns. * Should have a toLogString() which prints inspect() output or some other useful oneliner? Buffers are not automatically zeroed ------------------------------------ :: > b = new Buffer(16) > b.fill(0) undefined > b Range checks and partial writes ------------------------------- By default offset and value ranges are checked:: > b.writeUInt8(0x101, 0) TypeError: value is out of bounds at TypeError () at checkInt (buffer.js:784:11) [...] With an explicit option asserts can be turned off. With assertions disabled invalid offsets are ignored and values are treated with modulo semantics:: > b.writeUInt8(0x101, 0, true) undefined > b When writing values larger than a byte, partial writes are allowed:: > b.fill(0) undefined > b.writeUInt32BE(0xdeadbeef, 13) RangeError: Trying to write outside buffer length at RangeError () at checkInt (buffer.js:788:11) [...] > b.writeUInt32BE(0xdeadbeef, 13, true) undefined > b > b.fill(0) undefined > b.writeUInt32BE(0xdeadbeef, -1, true) undefined > b However, such values are not actually "dropped" but can actually be read back with an unchecked out-of-bounds read:: > b = new Buffer(16); b.fill(0); b.writeUInt32BE(0xdeadbeef, -1, true); b > b.readUInt32BE(-1, true).toString(16) 'deadbeef' > b.fill(1); b > b.readUInt32BE(-1, true).toString(16) 'de010101' This is not just a "safe zone" to avoid implementing partial writes: the out-of-bounds offsets can be large:: > b = new Buffer(16); b.fill(0); b.writeUInt32BE(0xdeadbeef, -10000, true); b > b.readUInt32BE(-10003, true).toString(16) 'de' > b.readUInt32BE(-10000, true).toString(16) 'deadbeef' Running under valgrind this causes no valgrind gripes, so apparently this is supported behavior. It might be caused by "buffer sharing" where Node.js actually uses a large Buffer to provide multiple smaller Buffers (as slices), and these out-of-bounds accesses hit the shared large Buffer. Sometimes memory unsafe behavior occurs, though. This behavior is difficult to implement in Duktape, so probably the best approach is to either ignore partial reads/writes, or implement them in an actual "clipping" manner. Khronos typed array notes ========================= The Khronos typed array specification is related to HTML canvas and WebGL programming. Some of the design choices are affected by this, e.g. the endianness handling and clamped byte write support. The Khronos specification has been refined and merged into ES2015 so this specification has an official status now. Specification notes ------------------- * ArrayBuffer wraps an underlying buffer object, ArrayBufferView and DataView classes provide "windowed" access to some underlying ArrayBuffer. A buffer object can be "neutered". Apparently neutering happens when "transferring" an ArrayBuffer which is HTML specific. Unsure if neutering needs to be supported. * ArrayBuffer does not have virtual indices or 'length' behavior, but TypedArray views do. DataView does not have virtual indices but e.g. V8 provides them in practice. * ArrayBuffer has 'byteLength'. Views have a 'byteLength' and a 'length', where 'length' refers to number of elements, not bytes. For example a Uint32Array view with length 4 would have byteLength 16. (For internal reasons, all Duktape ArrayBuffer and view objects provide 'length', 'byteLength', and 'byteOffset'.) * ArrayBufferView classes are host endian. DataView is endian independent because caller specifies endianness for each call. * TypedArray instances must be created with a byte offset that is a multiple of the element size (i.e. aligned). DataView doesn't have this restriction. (This requirement is unnecessary for Duktape because the implementation never assumes alignment. But, this requirement is implemented for compatibility.) * NaN handling is rather fortunate, as it is compatible with packed duk_tval: in other words, NaNs can be substituted with one another. When coerced to integer, NaN is coerced to zero. * Modulo semantics for number writes, except Uint8ClampedArray which provides clamped semantics with special rounding when writin values. Both modulo and clamping coerces NaN to zero. With modulo semantics flooring is used (1.999 writes as 1) while clamped semantics uses a specific form of rounding. * For the clamping behavior, see: - http://heycam.github.io/webidl/#Clamp - http://heycam.github.io/webidl/#es-type-mapping - http://heycam.github.io/webidl/#es-byte Steps for unsigned byte (octet) clamped coercion: - Set x to min(max(x, 0), 2^8 - 1). - Round x to the nearest integer, choosing the even integer if it lies halfway between two, and choosing +0 rather than -0. - Return the IDL octet value that represents the same numeric value as x. * Error is thrown for out-of-bounds accesses. * When using ``set()`` the arrays may refer to the same underlying array and the write source and destination may overlap. Must handle as if a temporary copy was made, i.e. like ``memmove()``. * DataView and Node.js buffer have similar (but not identical) methods, which can share the same underlying implementation. Endianness is specified with an argument in DataView but is implicit in Node.js buffer:: // DataView setUint16(unsigned long byteOffset, unsigned short value, optional boolean littleEndian) // Node.js buffer buf.writeUInt16LE(value, offset, [noAssert]) buf.writeUInt16BE(value, offset, [noAssert]) Unfortunately also the argument order (value/offset) are swapped. * There are explicit zeroing guarantees for ArrayBuffer constructor and typedarray constructors, so buffer data must be zeroed even when DUK_USE_ZERO_BUFFER_DATA is not set. Implementation notes -------------------- * ArrayBuffer wraps an underlying buffer object. A buffer object can be "neutered". * ArrayBufferView classes and DataView refer to an underlying ArrayBuffer, and may have an offset. These could be implemented similar to Node.js Buffer: refer to a plain underlying buffer, byte offset, and byte length in internal properties. Reference to the original ArrayBuffer (boxed buffer) is unfortunately also needed, via the '.buffer' property. * There are a lot of classes in the typed array specification. Each class is an object, so this is rather heavyweight. * Should be optional and disabled by default because of footprint concerns. Merged read/write algorithm for element access ============================================== This section describes a merged algorithm for reading and writing fields (uint8, int8, uint16, int16, etc) with the explicit read/write calls provided by DataView and Node.js Buffer. The same native code can be used with "magic" value providing flags for differences in behavior. Virtual index properties also need handling; they can either be implemented separately or call into this algorithm. Summary of read methods ----------------------- Related methods are summarized in the table below, notes: * "buf.XXX" refers to Node.JS Buffer instance methods (inherited) * "dv.XXX" refers to Khronos DataView instance methods (inherited) * "XyzArray index" refers to Khronos typed array view number index reads * Endianness "user" means that caller gives a littleEndian flag so that effective endianness is either big or little (there's no support for ARM mixed endian) * Endianness "host" means that host endianness is used * When reading values, there's no clamping behavior because integers are converted to IEEE doubles upon read in the natural way (zeroes read out as positive zeroes). * Bounds "arg" means argument indicates yes/no, "yes" means bounds are checked, "n/a" means not applicable. Virtual indices don't really have bounds checking, as any reads outside the range [0,length[ just become concrete string-keyed property lookups. +-------------------------+--------+-------+--------+---------------------------------------------------+ | Method | Endian | Bytes | Bounds | Notes | +=========================+========+=======+========+===================================================+ | buf.readIntLE | little | 1-6 | arg | Can read up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readIntBE | big | 1-6 | arg | Can read up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUIntLE | little | 1-6 | arg | Can read up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUIntBE | big | 1-6 | arg | Can read up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readInt8 | n/a | 1 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUInt8 | n/a | 1 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readInt16LE | little | 2 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readInt16BE | big | 2 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUInt16LE | little | 2 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUInt16BE | big | 2 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readInt32LE | little | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readInt32BE | big | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUInt32LE | little | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readUInt32BE | big | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readFloatLE | little | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readFloatBE | big | 4 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readDoubleLE | little | 8 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | buf.readDoubleBE | big | 8 | arg | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getInt8 | n/a | 1 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getUint8 | n/a | 1 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getInt16 | user | 2 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getUint16 | user | 2 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getInt32 | user | 4 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getUint32 | user | 4 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getFloat32 | user | 4 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | DataView.getFloat64 | user | 8 | yes | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Int8Array index | n/a | 1 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Uint8Array index | n/a | 1 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Uint8ClampedArray index | n/a | 1 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Int16Array index | host | 2 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Uint16Array index | host | 2 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Int32Array index | host | 4 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Uint32Array index | host | 4 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Float32Array index | host | 4 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ | Float64Array index | host | 8 | n/a | | +-------------------------+--------+-------+--------+---------------------------------------------------+ Summary of write methods ------------------------ Related methods are summarized in the table below, notes: * "buf.XXX" refers to Node.JS Buffer instance methods (inherited) * "dv.XXX" refers to Khronos DataView instance methods (inherited) * "XyzArray index" refers to Khronos typed array view number index writes * Endianness "user" means that caller gives a littleEndian flag so that effective endianness is either big or little (there's no support for ARM mixed endian) * Endianness "host" means that host endianness is used * Coercion behavior describes how an input value is coerced into an integer value; usually truncation but there are special cases. "truncate*" means that truncation happens in Node.js Buffer API calls when "noAssert==true"; a TypeError occurs for out-of-range writes (though fractional values are still silently accepted). * Bounds "arg" means argument indicates yes/no, "yes" means bounds are checked, "n/a" means not applicable. Virtual indices don't really have bounds checking, as any writes outside the range [0,length[ just become concrete string-keyed properties of the object (provided the object is extensible). * Return value of Node.js Buffer write calls is the number of bytes written. TypedArray write return value is ``undefined``. * Node.js Buffer write() method is left out because it's not an element write +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Method | Endian | Bytes | Bounds | Coercion | Notes | +=========================+========+=======+========+===========+===================================================+ | buf.writeIntLE | little | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeIntBE | big | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUIntLE | little | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUIntBE | big | 1-6 | arg | truncate* | Can write up to 48-bit integers, caller specifies | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeInt8 | n/a | 1 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUInt8 | n/a | 1 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeInt16LE | little | 2 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeInt16BE | big | 2 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUInt16LE | little | 2 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUInt16BE | big | 2 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeInt32LE | little | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeInt32BE | big | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUInt32LE | little | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeUInt32BE | big | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeFloatLE | little | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeFloatBE | big | 4 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeDoubleLE | little | 8 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | buf.writeDoubleBE | big | 8 | arg | truncate* | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setInt8 | n/a | 1 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setUint8 | n/a | 1 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setInt16 | user | 2 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setUint16 | user | 2 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setInt32 | user | 4 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setUint32 | user | 4 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setFloat32 | user | 4 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | DataView.setFloat64 | user | 8 | yes | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Int8Array index | n/a | 1 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Uint8Array index | n/a | 1 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Uint8ClampedArray index | n/a | 1 | n/a | special | Coercion is rounding with specific rules | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Int16Array index | host | 2 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Uint16Array index | host | 2 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Int32Array index | host | 4 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Uint32Array index | host | 4 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Float32Array index | host | 4 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ | Float64Array index | host | 8 | n/a | truncate | | +-------------------------+--------+-------+--------+-----------+---------------------------------------------------+ Implementation notes ==================== TypedArray inheritance ---------------------- The prototype chain for a TypedArray instance in V8 is:: view object -> Uint8Array.prototype -> Object.prototype This means that view properties like ``set()`` and ``subarray()`` are provided by the prototype, and each view type has its own prototype with these properties. This duplicates the properties several times. Duktape now inherits from an intermediate object:: view object -> Uint8Array.prototype -> TypedArray prototype -> Object.prototype The ``set()`` and ``subarray()`` methods are inherited from the intermediate prototype object. This reduces property count by about 16 at the cost of one additional object. ES2015 makes this the standard model; the TypedArreay prototype is referred to as %TypedArrayPrototype% intrinsic object in the ES2015 specification. View/slice notes ---------------- * Affects all code that accesses the underlying buffer through an Object reference (Buffer, ArrayBuffer, DataView, Uint8Array, etc): - Must look up internal plain buffer but also check for offset/length information. - Lookups should be fast, so: + Use an extended structure like for compiled functions + Use slotted internal properties (must be non-configurable so that their location won't change by accident) * Need reference to underlying buffer: - Could use a raw pointer to the buffer data as long as there's also a buffer reference to avoid freeing the underlying data. - But a raw pointer would only work with a fixed buffer which has a stable buffer pointer. - So, must reference the original buffer and figure out its data area dynamically. * Need byte offset and length for the view: - These should be validated on creation so that sanity checks are not necessary for every access. - If internal properties, should be non-writable and non-configurable to ensure that only C code can create a situation where assertions fail. * Need element size for the view: - For Node.js Buffer the element size is the byte size. For TypedArrays it may be 1, 2, 4, or 8 bytes. - Virtual "length" property must provide length in elements. Maintain two length fields (byte and element) or only the other and shift as necessary. - Virtual element "length": easier index/bound checks, virtual "length" read needs no change. Must be taken into account when byte length is needed. Buffer validity checksand unbacked buffers ------------------------------------------ To ensure memory safety, all memory accesses need to be checked against the size of the underlying buffer even if the access is within the configured view/slice. This is needed because an underlying buffer may be dynamic or external and can be resized/reconfigured at any point. In particular, the underlying buffer may be resized as a side effect of any operation that triggers code to run: the code may call into user code which manipulates the buffer. As a result, the following checks must be made just before an operation and there must be no side effects between the check and the operation: * Checking that byte range is covered by underlying buffer * Checking that bufferobject is neutered (buf == NULL vs. buf != NULL) Future work =========== Missing ES2015 features ----------------------- General semantics: * ToLength() coercion allows ArrayBuffer and typed array length up to ``2^53 - 1``. * Virtual index getters/setters don't handle out-of-bound accesses correctly (they should not be inherited through the inheritance chain). * Behavior for "detached" ArrayBuffers don't necessarily implement the behavior described in http://www.ecma-international.org/ecma-262/6.0/#sec-properties-of-the-arraybuffer-instances: "... all operators to access or modify data contained in the ArrayBuffer instance will fail." However, there's no support for creating a detached buffer now, so this doesn't really matter. * Coercion behavior may be correct, but needs to be checked for typed arrays and DataView. ArrayBuffer: * ``ArrayBuffer.prototype[@@toStringTag]`` missing. DataView: * ``DataView.prototype.buffer`` is an accessor property, currently ``.buffer`` is a concrete property. * ``DataView.prototype[@@toStringTag]`` missing. Typed arrays: * Typed array constructors like ``Uint8Array`` should inherit from an unnamed prototype object which hosts shared properties like ``.from()``. * ``%TypedArray%.from`` missing. * ``%TypedArray%.of`` missing. * ``%TypedArray%.prototype`` is ``%TypedArrayPrototype%`` which Duktape is actually already using. * The ``.buffer`` property should be inherited from ``%TypedArray%.prototype.buffer`` instead of being a concrete property. * The ``.byteLength`` property should be inherited, but is virtual. The difference matters if inheritance relationship is altered. * The ``.byteOffset`` property should be inherited, but is virtual. * The ``.length`` property should be inherited, but is virtual. * ``%TypedArray%.prototype.copyWithin()`` is missing. * ``%TypedArray%.prototype.entries()`` is missing. * ``%TypedArray%.prototype.every()`` is missing. * ``%TypedArray%.prototype.fill()`` is missing. * ``%TypedArray%.prototype.filter()`` is missing. * ``%TypedArray%.prototype.find()`` is missing. * ``%TypedArray%.prototype.findIndex()`` is missing. * ``%TypedArray%.prototype.forEach()`` is missing. * ``%TypedArray%.prototype.indexOf()`` is missing. * ``%TypedArray%.prototype.join()`` is missing. * ``%TypedArray%.prototype.keys()`` is missing. * ``%TypedArray%.prototype.lastIndexOf()`` is missing. * ``%TypedArray%.prototype.map()`` is missing. * ``%TypedArray%.prototype.reduce()`` is missing. * ``%TypedArray%.prototype.reduceRight()`` is missing. * ``%TypedArray%.prototype.reverse()`` is missing. * ``%TypedArray%.prototype.set()`` exists, semantics need to be checked. * ``%TypedArray%.prototype.slice()`` is missing. * ``%TypedArray%.prototype.some()`` is missing. * ``%TypedArray%.prototype.sort()`` is missing. * ``%TypedArray%.prototype.subarray()`` exists, semantics need to be checked. * ``%TypedArray%.prototype.toLocaleString()`` is missing. * ``%TypedArray%.prototype.toString()`` is missing. * ``%TypedArray%.prototype.values()`` is missing. * ``%TypedArray%.prototype[@@iterator]()`` is missing. * ``get %TypedArray%.prototype[@@toStringTag]`` is missing. The initial implementations for some of the missing methods can be the equivalent methods in ``Array.prototype`` with the caveat that ``.length`` should be accessed directly without invoking side effects. For now this would not be an issue because typed array ``.length`` is a virtual own property, and accessing it has no side effects. Update to newer Node.js Buffer API version ------------------------------------------ Current: * https://nodejs.org/docs/v0.12.1/api/buffer.html Latest at time of writing: * https://nodejs.org/docs/v6.9.1/api/buffer.html Gap between current implementation and latest: * Buffers can be for-of iterated; buf.values(), buf.keys(), and buf.entries() create iterators. For-of iteration requires @@iterator support. * ``new Buffer(...)`` is deprecated in favor of ``Buffer.from()``, ``Buffer.alloc()``, and ``Buffer.allocUnsafe()``. * ``new Buffer(arrayBuffer[, byteOffset [, length]])`` variant is not supported. This variant is already deprecated. * ``new Buffer(string[, encoding])`` does not handle encoding correctly. The internal string representation (CESU-8, extended UTF-8, or even invalid UTF-8) is used as buffer bytes as-is. This is also incorrect for Node.js Buffer v0.12.1 (and already incorrect in master). * ``Buffer.alloc(size[, fill[, encoding]])`` is missing. * ``Buffer.allocUnsafe(size)`` is missing. In practice could be implemented as ``Buffer.alloc(size)`` (ignoring any further parameters) so that a zero-filled buffer is allocated. * ``Buffer.allocUnsafeSlow(size)`` is missing. Could be implemented as ``Buffer.alloc(size)`` (ignoring any further parameters). * ``Buffer.byteLength(string[, encoding])`` ignores the encoding argument and just returns the byte length of the internal string representation (CESU-8 typically, but not always). It also doesn't handle buffer, Uint8Array, etc values which now have special handling in v6.9.1. * ``Buffer.from()`` is missing. It can share most code with the constructor. * ``Buffer.isEncoding()`` is implemented but (still) only narrowly recognizes the exact string ``"utf8"``. * ``Buffer.poolSize`` is not provided. The value is meaningless if it's not used by the implementation (i.e. no "unsafe" buffers are implemented). * ``SlowBuffer`` is not implemented; it's part of v0.12.1 and deprecated (but present) in v6.9.1. If deprecated features are supported, it should be implemented. * ``buf.compare()`` has additional arguments in v6.9.1 (source/target indices) which are not implemented. * ``buf.copy()`` return value has been specified explicitly, must compare against current implementation (also for truncated copys). * ``buf.entries()`` missing. Depends on general iterator support. * ``buf.fill()`` has an explicit encoding argument which has an effect if the fill argument is a string. Depends on generally supporting string encodings for Buffer API. * ``buf.indexOf()`` missing. Note that this is not the same as the typed array indexOf() because it recognizes Buffers. * ``buf.includes()`` missing. Note that this is not the same as typed array includes(). * ``buf.keys()`` missing. * ``buf.lastIndexOf()`` missing. Note that this is not the same as typed array lastIndexOf(). * ``buf.length`` writability comments in v6.9.1 may need documentation. * ``buf.readDoubleBE()``, ``buf.writeDoubleBE()`` and all the other read/write accessors seem to be the same in v6.9.1. Duktape doesn't implement the ``noAssert`` argument and always checks the offsets (which should be within the specification because: Setting noAssert to true allows offset to be beyond the end of buf, but the result should be considered undefined behavior. * ``buf.swap16()``, ``buf.swap32()``, ``buf.swap64()`` missing. * ``buf.toString()`` always decodes the buffer using UTF-8 (with replacement characters for invalid sequences), and ignores the encoding argument. * ``buf.values()`` missing. * ``buf.write()`` doesn't implement encoding. In both v0.12.1 and v6.9.1 partially encoded characters won't be written at all so that a few bytes at the end of the buffer may (apparently) be left untouched on a truncated write. Duktape doesn't currently implement this behavior. * ``buffer.INSPECT_MAX_BYTES`` not implemented. It's a property on the ``require('buffer')`` module rather than ``Buffer`` or a ``Buffer`` instance. * ``SlowBuffer`` is not implemented. Other notes: * Deprecated features could be moved behind a config option, e.g. ``DUK_USE_NODEJS_BUFFER_DEPRECATED``. * Node.js Buffer binding should have its own config option, e.g. ``DUK_USE_NODEJS_BUFFER``. Improve consistency of argument coercion ---------------------------------------- For Node.js Buffer bindings there's considerable variation of how arguments are coerced (in both Node.js and Duktape; and these are not always the same now). Improve consistency either by matching Node.js more closely, or by making Duktape specific behavior more consistent with itself. Add support for neutering (detached buffer) ------------------------------------------- Currently not supported. Neutering an ArrayBuffer must also affect all views referencing that ArrayBuffer. Because duk_hbufobj has a direct duk_hbuffer pointer (not a pointer to ArrayBuffer which is stored as .buffer) the neutering cannot be implemented by replacing the duk_hbuffer pointer with zero, as that wouldn't affect all the shared views. Instead, neutering probably needs to be implemented at the plain buffer level; for example, by adding a "neutered" flag to duk_hbuffer. A dynamic buffer can also be resized to zero bytes at neutering time. Another option is to support neutering only when the underlying buffer is dynamic, and simply resize the buffer to zero bytes. This produces much of the required behavior (e.g. zero .byteLength) but not all (e.g. zero .byteOffset). So an explicit neutered check, or a change in data structures, may be necessary. In ES2015 neutering seems to be covered under the name "detached buffer" and many operations on detached buffers (like reads and writes) throw a TypeError which is close to what current code is doing: - See e.g. Step 9 of http://www.ecma-international.org/ecma-262/6.0/index.html#sec-setviewvalue Configurable endianness for TypedArray views -------------------------------------------- Change duk_hbufobj so that it records requested endianness explicitly: host, little, or big endian. Then use the specified endianness in readfield and writefield internal primitives. This should be relatively straightforward to do, and perhaps useful. Allow non-aligned views ----------------------- The ES2015 alignment limitation is not necessary with Duktape because all element accesses are ultimately done using byte-by-byte reads without making any alignment assumptions. Additional arguments to TypedArray .set() ----------------------------------------- It would be nice to be able to specify an offset/length (or offset/end) for a .set() call, so that one could:: v1.set(v2, 5, 10); Currently one needs to do something like:: v1.set(v2.subarray(5, 15)); Additional arguments to TypedArray constructor ---------------------------------------------- It would be nice to have offset/length when constructing a TypedArray from another TypedArray. Node.js .parent property ------------------------ Not currently included in Node.js Buffer instances. Testcase coverage improvements ------------------------------ * Fine-grained tests for argument/this coercion * Property attributes * Object.defineProperty() and Object.getOwnPropertyDescriptor() for virtual properties * Constructing DataView and TypedArray from another view (allowed now but semantics may need improvement) * Node.js Buffer slice() coverage, argument coercion, etc. Low memory support ------------------ Implement low-memory support (16-bit fields, pointer compression, etc) for Buffer objects. Currently buffer objects will have "long" fields. Improve fastint support ----------------------- Improve fastint handling for buffer indices, lengths, values, etc. Unsorted future work -------------------- * Clean up ``duk_hbufobj`` ``buf == NULL`` handling. Perhaps don't allow ``NULL`` at all; this depends on the neutering / detached buffer solution. * Implement and test for integer arithmetic wrap checks e.g. when coercing an index into a byte offset by shifting. * duk_to_buffer(): coerce a Buffer object into a plain buffer value (similarly to how duk_to_string() coerces a String to a plain string)? Slice information will be lost unless a copy is made. * duk_is_buffer(): return true for a Buffer object? For comparison, duk_is_string() returns false for a String object, so returning false might be most consistent. * Other Duktape C API changes to interact with Buffer objects. * Node.js Buffer.isBuffer(): what is the best behavior for plain buffer and other buffer object values? * What to do with Node.js SlowBuffer, INSPECT_MAX_BYTES, and code that does ``require('buffer')``? * Mixing buffer types between APIs: go through the various cases, document, add testcases, etc. * Implement fast path for Node.js Buffer constructor when argument is another duk_hbufobj (now reads indexed properties explicitly). * Duktape C API tests for buffer handling. * Duktape C API test exercising "underlying buffer doesn't cover logical buffer slice" cases which cannot be exercised with plain Ecmascript code. * Document Buffer object relationship to JSON, JX, and JC. * Explicit maximum element and byte size checks for all operations that create new bufferobjects. * Change the TypedArray subarray() implementation to avoid copying the argument internal prototype and use a "default" prototype instead (e.g. Uint8Array.prototype instead of copying the argument internal prototype which may be different).