Custom JSON formats

Ecmascript JSON shortcomings

The standard JSON format has a number of shortcomings when used with Ecmascript:

undefined and function values are not supported
NaN and Infinity values are not supported
Duktape custom types are, of course, not supported
Codepoints above the BMP cannot be represented except as surrogate pairs
Codepoints above U+10FFFF cannot be represented even as surrogate pairs
The output is not printable ASCII, which is often inconvenient

These limitations are part of the Ecmascript specification, which explicitly prohibits more lenient behavior. Duktape provides two more programmer-friendly custom JSON format variants, JSONX and JSONC, described below.
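Some of the shortcomings listed above can be observed with the standard JSON API alone. The following is a minimal sketch; the variable names are only illustrative:

```javascript
// Standard JSON.stringify() drops or replaces values it cannot represent.
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ], baz: undefined, quux: 1/0 };
var enc = JSON.stringify(obj);
// enc is: {"foo":null,"bar":[1,null,3],"quux":null}

// The original values cannot be recovered when parsing back.
var dec = JSON.parse(enc);
// dec.foo is null (was NaN), dec.bar[1] is null (was undefined),
// and the 'baz' property is missing entirely.

// Non-ASCII characters are emitted as is, so the output is not pure ASCII.
var str = JSON.stringify('k\u00f6h\u00e4');
// str is a 6-character string: a quote, k, U+00F6, h, U+00E4, a quote
```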

Custom JSONX format

JSONX encodes all values in a very readable manner and parses back almost all values faithfully (function values being the most important exception). Output is pure printable ASCII, codepoints above U+FFFF are encoded with a custom escape format, and quotes around object keys are omitted in most cases. JSONX is not JSON compatible, but it is a very readable format best suited to debugging, logging, and similar uses.

JSONX is used as follows:

var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.jxEnc(obj));
// prints out: {foo:NaN,bar:[1,undefined,3]}

var dec = Duktape.jxDec('{ foo: 123, bar: undefined, quux: NaN }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 undefined NaN

Custom JSONC format

JSONC encodes all values into standard JSON. Values not supported by standard JSON are encoded as objects with a marker key beginning with an underscore (e.g. {"_ptr":"0xdeadbeef"}). Such values parse back as ordinary objects, but you can revive them manually more or less reliably. Output is pure printable ASCII; codepoints above U+FFFF are encoded as plain string data with the format "U+nnnnnnnn" (e.g. U+0010fedc).

JSONC is used as follows:

var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.jcEnc(obj));
// prints out: {"foo":{"_nan":true},"bar":[1,{"_undef":true},3]}

var dec = Duktape.jcDec('{ "foo": 123, "bar": {"_undef":true}, "quux": {"_nan":true} }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 [object Object] [object Object]

The JSONC decoder is essentially the same as the standard JSON decoder at the moment: all JSONC outputs are valid JSON and no custom syntax is needed. As shown in the example, custom values (like {"_undef":true}) are not revived automatically. They parse back as ordinary objects instead.
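Manual revival could be sketched roughly as follows. The helper name reviveJsonc is hypothetical and not part of Duktape. The sketch assumes that a marker is an object with exactly one key whose value is true, handles only the _undef, _nan, _inf and _ninf markers, and leaves other markers (such as _buf, _ptr and _func) as ordinary objects. Because JSONC output is valid JSON, the standard JSON.parse() is used for decoding here. Ordinary user data that happens to look like a marker would also be converted, which is one reason such revival is only more or less reliable.

```javascript
// Hypothetical helper (not part of Duktape): revive a few JSONC marker
// objects in place after decoding with a standard JSON parser.
function reviveJsonc(val) {
    var i, k, keys;

    if (val === null || typeof val !== 'object') {
        return val;  // primitives need no revival
    }
    if (Array.isArray(val)) {
        for (i = 0; i < val.length; i++) {
            val[i] = reviveJsonc(val[i]);
        }
        return val;
    }

    keys = Object.keys(val);
    if (keys.length === 1 && val[keys[0]] === true) {
        // Assumed marker shape: a single key with the value true.
        switch (keys[0]) {
        case '_undef': return undefined;
        case '_nan': return NaN;
        case '_inf': return Infinity;
        case '_ninf': return -Infinity;
        }
    }

    // Not a recognized marker: revive property values recursively.
    for (i = 0; i < keys.length; i++) {
        k = keys[i];
        val[k] = reviveJsonc(val[k]);
    }
    return val;
}

var revived = reviveJsonc(JSON.parse(
    '{ "foo": 123, "bar": {"_undef":true}, "quux": [ {"_nan":true}, {"_ninf":true} ] }'));
// revived.foo is 123, revived.bar is undefined,
// revived.quux[0] is NaN and revived.quux[1] is -Infinity
```

Note that revived.bar remains an own property whose value is undefined; it is not removed from the object.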

Codepoints above U+FFFF and invalid UTF-8 data

All standard Ecmascript strings are valid CESU-8 data internally, so behavior for codepoints above U+FFFF never poses compliance issues. However, Duktape strings may contain extended UTF-8 codepoints and may even contain invalid UTF-8 data.

The Duktape JSON implementation, including the standard Ecmascript JSON API, uses replacement characters to deal with invalid UTF-8 data. The resulting string may look a bit odd, but this behavior is preferable to throwing an error.

JSON format examples

The table below summarizes how different values are encoded in each format:

Value	Standard JSON	JSONX	JSONC	Notes
undefined	n/a	`undefined`	`{"_undef":true}`	Standard JSON: encoded as `null` inside arrays, otherwise omitted
null	`null`	`null`	`null`	standard JSON
true	`true`	`true`	`true`	standard JSON
false	`false`	`false`	`false`	standard JSON
123.4	`123.4`	`123.4`	`123.4`	standard JSON
NaN	`null`	`NaN`	`{"_nan":true}`	Standard JSON: always encoded as `null`
Infinity	`null`	`Infinity`	`{"_inf":true}`	Standard JSON: always encoded as `null`
-Infinity	`null`	`-Infinity`	`{"_ninf":true}`	Standard JSON: always encoded as `null`
köhä	`"köhä"`	`"k\xf6h\xe4"`	`"k\u00f6h\u00e4"`
U+00FC	`"\u00fc"`	`"\xfc"`	`"\u00fc"`
U+ABCD	`"\uabcd"`	`"\uabcd"`	`"\uabcd"`
U+1234ABCD	`"U+1234abcd"`	`"\U1234abcd"`	`"U+1234abcd"`	Non-BMP characters are not standard Ecmascript, JSONX format borrowed from Python
object	`{"my_key":123}`	`{my_key:123}`	`{"my_key":123}`	ASCII keys matching identifier requirements are encoded without quotes in JSONX
array	`["foo","bar"]`	`["foo","bar"]`	`["foo","bar"]`
buffer	n/a	`|deadbeef|`	`{"_buf":"deadbeef"}`
pointer	n/a	`(0xdeadbeef)` or `(DEADBEEF)`	`{"_ptr":"0xdeadbeef"}` or `{"_ptr":"DEADBEEF"}`	Representation inside parentheses or quotes is platform specific
NULL pointer	n/a	`(null)`	`{"_ptr":"null"}`
function	n/a	`{_func:true}`	`{"_func":true}`	Standard JSON: encoded as `null` inside arrays, otherwise omitted

Limitations

Current limitations include:

Only enumerable own properties are serialized in any of the formats.
Array properties (other than the entries) are not serialized. Serializing them would be useful in e.g. logging, for instance as [1,2,3,"type":"point"].
There is no automatic revival of special values when parsing JSONC data.
There is no canonical encoding. This would be easy to arrange with a simple option to sort object keys during encoding.
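Until such an option exists, a canonical encoding for plain data can be approximated in user code. The sketch below is an illustration of the key sorting idea for standard JSON only, not Duktape's encoder. The function name canonicalStringify is hypothetical. It assumes plain objects and arrays, and it ignores toJSON() methods and the replacer and indent arguments of JSON.stringify().

```javascript
// Hypothetical helper (not part of Duktape): encode a value as standard
// JSON with object keys emitted in sorted order.
function canonicalStringify(val) {
    var parts = [], keys, i, enc;

    if (val === null || typeof val !== 'object') {
        // Primitives: JSON.stringify() returns undefined for undefined
        // and function values, which the callers below handle.
        return JSON.stringify(val);
    }
    if (Array.isArray(val)) {
        for (i = 0; i < val.length; i++) {
            enc = canonicalStringify(val[i]);
            // Unsupported values become null inside arrays.
            parts.push(enc === undefined ? 'null' : enc);
        }
        return '[' + parts.join(',') + ']';
    }

    // Only enumerable own properties, in sorted key order.
    keys = Object.keys(val).sort();
    for (i = 0; i < keys.length; i++) {
        enc = canonicalStringify(val[keys[i]]);
        if (enc !== undefined) {
            // Unsupported values are omitted from objects.
            parts.push(JSON.stringify(keys[i]) + ':' + enc);
        }
    }
    return '{' + parts.join(',') + '}';
}

var canon1 = canonicalStringify({ b: 1, a: [ 2, undefined, { d: 1, c: 2 } ] });
var canon2 = canonicalStringify({ a: [ 2, undefined, { c: 2, d: 1 } ], b: 1 });
// Both are: {"a":[2,null,{"c":2,"d":1}],"b":1}
```

Two objects with the same properties then encode to the same string regardless of the order in which their keys were added.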

(See internal documentation for more future work issues.)