
<h1 id="customjson">Custom JSON formats</h1>
<h2>Ecmascript JSON shortcomings</h2>
<p>The standard JSON format has a number of shortcomings when used with
Ecmascript:</p>
<ul>
<li><code>undefined</code> and function values are not supported</li>
<li>NaN and infinity values are not supported</li>
<li>Duktape custom types are, of course, not supported</li>
<li>Codepoints above BMP cannot be represented except as surrogate pairs</li>
<li>Codepoints above U+10FFFF cannot be represented even as surrogate pairs</li>
<li>The output is not printable ASCII which is often inconvenient</li>
</ul>
<p>These limitations are part of the Ecmascript specification, which
explicitly prohibits more lenient behavior. Duktape provides two more
programmer-friendly custom JSON formats: <b>JX</b> and <b>JC</b>,
described below.</p>
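<p>As a rough illustration of the first few shortcomings, the sketch below
runs a handful of problem values through the standard <code>JSON</code>
object. It uses only standard Ecmascript calls, so it should behave the same
in Duktape and in other engines; the variable names are illustrative.</p>
<pre class="ecmascript-code">
// Values that standard JSON cannot represent faithfully.
var input = { a: undefined, b: 0/0, c: [ undefined, function () {} ], d: -0, e: 1/0 };
var text = JSON.stringify(input);
// text is {"b":null,"c":[null,null],"d":0,"e":null}

var back = JSON.parse(text);
// back has no "a" key, back.b and back.e are null instead of NaN and Infinity,
// and 1 / back.d is Infinity, i.e. the sign of the negative zero was lost.
</pre>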
<h2>Custom JX format</h2>
<p>JX encodes all values in a very readable manner and parses back
almost all values in a faithful manner (function values being the most
important exception). Output is pure printable ASCII, codepoints above
U+FFFF are encoded with a custom escape format, and quotes around object
keys are omitted in most cases. JX is not JSON compatible, but it is very
readable, which makes it most suitable for debugging, logging, etc.</p>
<p>JX is used as follows:</p>
<pre class="ecmascript-code">
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.enc('jx', obj));
// prints out: {foo:NaN,bar:[1,undefined,3]}
var dec = Duktape.dec('jx', '{ foo: 123, bar: undefined, quux: NaN }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 undefined NaN
</pre>
<h2>Custom JC format</h2>
<p>JC encodes all values into standard JSON. Values not supported by
standard JSON are encoded as objects with a marker key beginning with an
underscore (e.g. <code>{"_ptr":"0xdeadbeef"}</code>). Such values parse
back as ordinary objects. However, you can revive them manually more or
less reliably. Output is pure printable ASCII; codepoints above U+FFFF
are encoded as plain string data with the format "U+nnnnnnnn"
(e.g. <code>U+0010fedc</code>).</p>
<p>JC is used as follows:</p>
<pre class="ecmascript-code">
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
print(Duktape.enc('jc', obj));
// prints out: {"foo":{"_nan":true},"bar":[1,{"_undef":true},3]}
var dec = Duktape.dec('jc', '{ "foo": 123, "bar": {"_undef":true}, "quux": {"_nan":true} }');
print(dec.foo, dec.bar, dec.quux);
// prints out: 123 [object Object] [object Object]
</pre>
<p>The JC decoder is essentially the same as the standard JSON decoder
at the moment: all JC outputs are valid JSON and no custom syntax is needed.
As shown in the example, custom values (like <code>{"_undef":true}</code>)
are <b>not</b> revived automatically. They parse back as ordinary objects
instead.</p>
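<p>Manual revival could look roughly like the sketch below. The helper
<code>reviveJc</code> and the table <code>JC_MARKERS</code> are hypothetical
names, not part of the Duktape API. Because JC output is valid JSON, the
sketch parses with <code>JSON.parse()</code> so that it also runs outside
Duktape; inside Duktape, <code>Duktape.dec('jc', ...)</code> can be used for
the parsing step instead.</p>
<pre class="ecmascript-code">
// Hypothetical marker table: only markers with an obvious Ecmascript value.
// Markers such as _ptr, _buf and _func are left as ordinary objects.
var JC_MARKERS = { _undef: undefined, _nan: NaN, _inf: Infinity, _ninf: -Infinity };

// Hypothetical helper: returns a copy of a parsed JC value with markers revived.
function reviveJc(value) {
    if (value === null || typeof value !== 'object') { return value; }
    if (Array.isArray(value)) { return value.map(function (v) { return reviveJc(v); }); }
    var keys = Object.keys(value);
    if (keys.length === 1) {
        if (value[keys[0]] === true) {
            if (Object.prototype.hasOwnProperty.call(JC_MARKERS, keys[0])) {
                return JC_MARKERS[keys[0]];
            }
        }
    }
    var out = {};
    keys.forEach(function (k) { out[k] = reviveJc(value[k]); });
    return out;
}

var parsed = JSON.parse('{ "foo": 123, "bar": {"_undef":true}, "quux": {"_nan":true} }');
var revived = reviveJc(parsed);
// revived.foo is 123, revived.bar is undefined (the key is still present),
// and revived.quux is NaN.
</pre>
<p>A plain recursive walk is used here instead of the reviver argument of
<code>JSON.parse()</code>, because a reviver that returns
<code>undefined</code> deletes the property rather than setting it. Revival
is only "more or less" reliable: a genuine user object such as
<code>{"_nan":true}</code> cannot be told apart from a marker.</p>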
<h2>Codepoints above U+FFFF and invalid UTF-8 data</h2>
<p>All standard Ecmascript strings are valid CESU-8 data internally, so
behavior for codepoints above U+FFFF never poses compliance issues. However,
Duktape strings may contain extended UTF-8 codepoints and may even contain
invalid UTF-8 data.</p>
<p>The Duktape JSON implementation, including the standard Ecmascript JSON API,
uses replacement characters to deal with invalid UTF-8 data. The resulting
string may look a bit odd, but this behavior is preferable to throwing an
error.</p>
<h2>JSON format examples</h2>
<p>The table below summarizes how different values encode in each
encoding:</p>
<table>
<thead>
<tr>
<th>Value</th>
<th>Standard JSON</th>
<th>JX</th>
<th>JC</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>undefined</td>
<td>n/a</td>
<td><code>undefined</code></td>
<td><code>{"_undef":true}</code></td>
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
</tr>
<tr>
<td>null</td>
<td><code>null</code></td>
<td><code>null</code></td>
<td><code>null</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>true</td>
<td><code>true</code></td>
<td><code>true</code></td>
<td><code>true</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>false</td>
<td><code>false</code></td>
<td><code>false</code></td>
<td><code>false</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>123.4</td>
<td><code>123.4</code></td>
<td><code>123.4</code></td>
<td><code>123.4</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>+0</td>
<td><code>0</code></td>
<td><code>0</code></td>
<td><code>0</code></td>
<td>standard JSON</td>
</tr>
<tr>
<td>-0</td>
<td><code>0</code></td>
<td><code>-0</code></td>
<td><code>-0</code></td>
<td>Standard JSON allows <code>-0</code> but serializes negative
zero as <code>0</code> (losing the sign unnecessarily)</td>
</tr>
<tr>
<td>NaN</td>
<td><code>null</code></td>
<td><code>NaN</code></td>
<td><code>{"_nan":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
</tr>
<tr>
<td>Infinity</td>
<td><code>null</code></td>
<td><code>Infinity</code></td>
<td><code>{"_inf":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
</tr>
<tr>
<td>-Infinity</td>
<td><code>null</code></td>
<td><code>-Infinity</code></td>
<td><code>{"_ninf":true}</code></td>
<td>Standard JSON: always encoded as <code>null</code></td>
</tr>
<tr>
<td>k&#xf6;h&#xe4;</td>
<td><code>"k&#xf6;h&#xe4;"</code></td>
<td><code>"k\xf6h\xe4"</code></td>
<td><code>"k\u00f6h\u00e4"</code></td>
<td></td>
</tr>
<tr>
<td>U+00FC</td>
<td><code>"\u00fc"</code></td>
<td><code>"\xfc"</code></td>
<td><code>"\u00fc"</code></td>
<td></td>
</tr>
<tr>
<td>U+ABCD</td>
<td><code>"\uabcd"</code></td>
<td><code>"\uabcd"</code></td>
<td><code>"\uabcd"</code></td>
<td></td>
</tr>
<tr>
<td>U+1234ABCD</td>
<td><code>"U+1234abcd"</code></td>
<td><code>"\U1234abcd"</code></td>
<td><code>"U+1234abcd"</code></td>
<td>Non-BMP characters are not standard Ecmascript; the JX escape format is borrowed from Python</td>
</tr>
<tr>
<td>object</td>
<td><code>{"my_key":123}</code></td>
<td><code>{my_key:123}</code></td>
<td><code>{"my_key":123}</code></td>
<td>ASCII keys matching identifier requirements are encoded without quotes in JX</td>
</tr>
<tr>
<td>array</td>
<td><code>["foo","bar"]</code></td>
<td><code>["foo","bar"]</code></td>
<td><code>["foo","bar"]</code></td>
<td></td>
</tr>
<tr>
<td>buffer</td>
<td><code>n/a</code></td>
<td><code>|deadbeef|</code></td>
<td><code>{"_buf":"deadbeef"}</code></td>
<td></td>
</tr>
<tr>
<td>pointer</td>
<td><code>n/a</code></td>
<td><code>(0xdeadbeef)<br />(DEADBEEF)</code></td>
<td><code>{"_ptr":"0xdeadbeef"}<br />{"_ptr":"DEADBEEF"}</code></td>
<td>Representation inside parentheses or quotes is platform specific</td>
</tr>
<tr>
<td>NULL pointer</td>
<td><code>n/a</code></td>
<td><code>(null)</code></td>
<td><code>{"_ptr":"null"}</code></td>
<td></td>
</tr>
<tr>
<td>function</td>
<td><code>n/a</code></td>
<td><code>{_func:true}</code></td>
<td><code>{"_func":true}</code></td>
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
</tr>
</tbody>
</table>
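<p>For the plain Ecmascript values in the table, the JC column can be
approximated on any engine with a replacer function passed to
<code>JSON.stringify()</code>. This is only a sketch for illustration:
<code>jcReplacer</code> is a hypothetical name, not a Duktape API, and
Duktape's own encoder is <code>Duktape.enc('jc', ...)</code>.</p>
<pre class="ecmascript-code">
// Hypothetical replacer: maps values that standard JSON drops or nulls
// to the JC marker objects listed in the table.
function jcReplacer(key, value) {
    if (value === undefined) { return { _undef: true }; }
    if (typeof value === 'function') { return { _func: true }; }
    if (typeof value === 'number') {
        if (value !== value) { return { _nan: true }; }
        if (value === Infinity) { return { _inf: true }; }
        if (value === -Infinity) { return { _ninf: true }; }
    }
    return value;
}

var jcText = JSON.stringify({ foo: 0/0, bar: [ 1, undefined, 3 ] }, jcReplacer);
// jcText is {"foo":{"_nan":true},"bar":[1,{"_undef":true},3]}
</pre>
<p>The approximation does not cover everything JC does: negative zero is
still serialized as <code>0</code>, non-ASCII characters are not escaped, and
Duktape custom types such as buffers and pointers are not handled.</p>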
<h2>Limitations</h2>
<p>Some limitations include:</p>
<ul>
<li>Only enumerable own properties are serialized in any of the formats.</li>
<li>Array properties (other than the entries) are not serialized. Serializing
them would be useful in e.g. logging, for instance as <code>[1,2,3,"type":"point"]</code>.</li>
<li>There is no automatic revival of special values when parsing JC data.</li>
<li>There is no canonical encoding. This would be easy to arrange with a simple
option to sort object keys during encoding.</li>
</ul>
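<p>Until such an option exists, key sorting can be approximated in user code
before encoding. The sketch below uses a hypothetical helper,
<code>sortKeys</code>, and assumes that the engine enumerates string keys in
insertion order, which Ecmascript E5 does not strictly guarantee. It handles
only plain objects and arrays.</p>
<pre class="ecmascript-code">
// Hypothetical helper: returns a copy whose object keys are inserted
// in sorted order at every nesting level.
function sortKeys(value) {
    if (value === null || typeof value !== 'object') { return value; }
    if (Array.isArray(value)) { return value.map(function (v) { return sortKeys(v); }); }
    var out = {};
    Object.keys(value).sort().forEach(function (k) { out[k] = sortKeys(value[k]); });
    return out;
}

var first = JSON.stringify(sortKeys({ b: 1, a: { d: 2, c: 3 } }));
var second = JSON.stringify(sortKeys({ a: { c: 3, d: 2 }, b: 1 }));
// first and second are both {"a":{"c":3,"d":2},"b":1}
</pre>
<p>The same pre-processing step can be applied before
<code>Duktape.enc('jx', ...)</code> or <code>Duktape.enc('jc', ...)</code>.</p>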
<p>(See internal documentation for more future work issues.)</p>