You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
 
 
 
Sami Vaarala bcb37439d6 Preparations for 2.4.0 release 5 years ago
..
Makefile Makefile whitespace consistency in extras 7 years ago
README.rst CBOR extra improvements 5 years ago
cbordecode.py Add placeholder CBOR as (unreleased) extra 7 years ago
duk_cbor.c Preparations for 2.4.0 release 5 years ago
duk_cbor.h CBOR extra improvements 7 years ago
jsoncbor.c Use 'ECMAScript' spelling in extras/ 7 years ago
run_testvectors.js CBOR extra improvements 7 years ago

README.rst

============
CBOR binding
============

Overview
========

C functions to encode/decode values in CBOR format, and a simple command
line utility to convert between JSON and CBOR.

To integrate CBOR into your application:

* Call ``duk_cbor_encode()`` and ``duk_cbor_decode()`` directly if a C API
is enough.

* Call ``duk_cbor_init()`` to register a global ``CBOR`` object with
ECMAScript bindings ``CBOR.encode()`` and ``CBOR.decode()``, roughly
matching https://github.com/paroga/cbor-js.

Basic usage of the ``jsoncbor`` conversion tool::

$ make jsoncbor
[...]
$ cat test.json | ./jsoncbor -e # writes CBOR to stdout
$ cat test.cbor | ./jsoncbor -d # writes JSON to stdout

CBOR objects are decoded into ECMAScript objects, with non-string keys
coerced into strings.

Direct support for CBOR is likely to be included in the Duktape API in the
future. This extra will then become unnecessary.

CBOR
====

CBOR is a standard format for JSON-like binary interchange. It is
faster and smaller, and can encode more data types than JSON. In particular,
binary data can be serialized without encoding e.g. in base-64. These
properties make it useful for storing state files, IPC, etc.

Some CBOR shortcomings for preserving information:

* No property attribute or inheritance support.

* No DAGs or looped graphs.

* Array objects with properties lose their non-index properties.

* Array objects with gaps lose their gaps as they read back as undefined.

* Buffer objects and views lose much of their detail besides the raw data.

* ECMAScript strings cannot be fully represented; strings must be UTF-8.

* Functions and native objects lose most of their detail.

* CBOR tags are useful to provide soft decoding information, but the tags
are just integers from an IANA controlled space with no space for custom
tags. So tags cannot be easily used for private, application specific tags.
IANA allows reserving custom tags with little effort however, see
https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml.

Future work
===========

General:

* Add flags to control encode/decode behavior.

* Allow decoding with a trailer so that stream parsing is easier.
Similar change would be useful for JSON decoding.

* Reserve CBOR tag for missing value.

* Reserve other necessary CBOR tags.

* Explicit support for encoding with and without side effects (e.g.
skipping Proxy traps and getters).

* JSON encoding supports .toJSON(), maybe something like .toCBOR()?

* Optimize encoding and decoding more.

Encoding:

* Tagging of typed arrays:
https://datatracker.ietf.org/doc/draft-ietf-cbor-array-tags/.
Mixed endian encode must convert to e.g. little endian because
no mixed endian tag exists.

* Encoding typed arrays as integer arrays instead?

* Float16Array encoding support (once/if supported by main engine).

* Tagging of array gaps, once IANA reservation is complete:
https://github.com/svaarala/duktape/blob/master/doc/cbor-missing-tag.rst.

* Support 64-bit integer when encoding, e.g. up to 2^53?

* Definite-length object encoding even when object has more than 23 keys.

* Map/Set encoding (once supported in the main engine), maybe tagged
so they decode back into Map/Set.

* Bigint encoding (once supported in the main engine), as tagged byte
strings like in Python CBOR.

* String encoding options: combining surrogate pairs, tagging non-UTF-8
byte strings so they decode back to string, using U+FFFD replacement,
etc.

* Detection of Symbols, encode them in a useful tagged form.

* Better encoding of functions.

* Hook for serialization, to allow caller to serialize values (especially
objects) in a context specific manner (e.g. serialize functions with
IPC metadata to allow them to be called remotely). Such a hook should
be able to emit tag(s) to mark custom values for decode processing.

Decoding:

* Typed array decoding support. Should decoder convert to host
endianness?

* Float16Array decoding support (once/if supported by main engine).

* Decoding objects with non-string keys, could be represented as a Map.

* Use bare objects and arrays when decoding?

* Use a Map rather than a plain object when decoding, which would allow
non-string keys.

* Bigint decoding (once supported in the main engine).

* Decoding of non-BMP codepoints into surrogate pairs.

* Decoding of Symbols when call site indicates it is safe.

* Hooking for revival, to allow caller to revive objects in a context
specific manner (e.g. revive serialized function objects into IPC
proxy functions). Such a hook should have access to encoding tags,
so that revival can depend on tags present.

* Option to compact decoded objects and arrays.

* Improve fastint decoding support, e.g. decode non-optimally encoded
integers as fastints, decode compatible floating point values as
fastints.