mirror of https://github.com/svaarala/duktape.git
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
255 lines
7.4 KiB
255 lines
7.4 KiB
<h1 id="customjson">Custom JSON formats</h1>
|
|
|
|
<h2>Ecmascript JSON shortcomings</h2>
|
|
|
|
<p>The standard JSON format has a number of shortcomings when used with
|
|
Ecmascript:</p>
|
|
<ul>
|
|
<li><code>undefined</code> and function values are not supported</li>
|
|
<li>NaN and infinity values are not supported</li>
|
|
<li>Duktape custom types are, of course, not supported</li>
|
|
<li>Codepoints above BMP cannot be represented except as surrogate pairs</li>
|
|
<li>Codepoints above U+10FFFF cannot be represented even as surrogate pairs</li>
|
|
<li>The output is not printable ASCII which is often inconvenient</li>
|
|
</ul>
|
|
|
|
<p>These limitations are part of the Ecmascript specification which
|
|
explicitly prohibits more lenient behavior. Duktape provides two more
|
|
programmer friendly custom JSON format variants: <b>JX</b> and <b>JC</b>,
|
|
described below.</p>
|
|
|
|
<h2>Custom JX format</h2>
|
|
|
|
<p>JX encodes all values in a very readable manner and parses back
|
|
almost all values in a faithful manner (function values being the most
|
|
important exception). Output is pure printable ASCII, codepoints above
|
|
U+FFFF are encoded with a custom escape format, and quotes around object
|
|
keys are omitted in most cases. JX is not JSON compatible but a very
|
|
readable format, most suitable for debugging, logging, etc.</p>
|
|
|
|
<p>JX is used as follows:</p>
|
|
<pre class="ecmascript-code">
|
|
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
|
|
print(Duktape.enc('jx', obj));
|
|
// prints out: {foo:NaN,bar:[1,undefined,3]}
|
|
|
|
var dec = Duktape.dec('jx', '{ foo: 123, bar: undefined, quux: NaN }');
|
|
print(dec.foo, dec.bar, dec.quux);
|
|
// prints out: 123 undefined NaN
|
|
</pre>
|
|
|
|
<h2>Custom JC format</h2>
|
|
|
|
<p>JC encodes all values into standard JSON. Values not supported by
|
|
standard JSON are encoded as objects with a marker key beginning with an
|
|
underscore (e.g. <code>{"_ptr":"0xdeadbeef"}</code>). Such values parse
|
|
back as ordinary objects. However, you can revive them manually more or
|
|
less reliably. Output is pure printable ASCII; codepoints above U+FFFF
|
|
are encoded as plain string data with the format "U+nnnnnnnn"
|
|
(e.g. <code>U+0010fedc</code>).</p>
|
|
|
|
<p>JC is used as follows:</p>
|
|
<pre class="ecmascript-code">
|
|
var obj = { foo: 0/0, bar: [ 1, undefined, 3 ] };
|
|
print(Duktape.enc('jc', obj));
|
|
// prints out: {"foo":{"_nan":true},"bar":[1,{"_undef":true},3]}
|
|
|
|
var dec = Duktape.dec('jc', '{ "foo": 123, "bar": {"_undef":true}, "quux": {"_nan":true} }');
|
|
print(dec.foo, dec.bar, dec.quux);
|
|
// prints out: 123 [object Object] [object Object]
|
|
</pre>
|
|
|
|
<p>The JC decoder is essentially the same as the standard JSON decoder
|
|
at the moment: all JC outputs are valid JSON and no custom syntax is needed.
|
|
As shown in the example, custom values (like <code>{"_undef":true}</code>)
|
|
are <b>not</b> revived automatically. They parse back as ordinary objects
|
|
instead.</p>
|
|
|
|
<h2>Codepoints above U+FFFF and invalid UTF-8 data</h2>
|
|
|
|
<p>All standard Ecmascript strings are valid CESU-8 data internally, so
|
|
behavior for codepoints above U+FFFF never poses compliance issues. However,
|
|
Duktape strings may contain extended UTF-8 codepoints and may even contain
|
|
invalid UTF-8 data.</p>
|
|
|
|
<p>The Duktape JSON implementation, including the standard Ecmascript JSON API,
|
|
use replacement characters to deal with invalid UTF-8 data. The resulting
|
|
string may look a bit odd, but this behavior is preferable to throwing an
|
|
error.</p>
|
|
|
|
<h2>JSON format examples</h2>
|
|
|
|
<p>The table below summarizes how different values encode in each
|
|
encoding:</p>
|
|
|
|
<table>
|
|
<thead>
|
|
<tr>
|
|
<th>Value</th>
|
|
<th>Standard JSON</th>
|
|
<th>JX</th>
|
|
<th>JC</th>
|
|
<th>Notes</th>
|
|
</tr>
|
|
</thead>
|
|
<tbody>
|
|
<tr>
|
|
<td>undefined</td>
|
|
<td>n/a</td>
|
|
<td><code>undefined</code></td>
|
|
<td><code>{"_undef":true}</code></td>
|
|
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
|
|
</tr>
|
|
<tr>
|
|
<td>null</td>
|
|
<td><code>null</code></td>
|
|
<td><code>null</code></td>
|
|
<td><code>null</code></td>
|
|
<td>standard JSON</td>
|
|
</tr>
|
|
<tr>
|
|
<td>true</td>
|
|
<td><code>true</code></td>
|
|
<td><code>true</code></td>
|
|
<td><code>true</code></td>
|
|
<td>standard JSON</td>
|
|
</tr>
|
|
<tr>
|
|
<td>false</td>
|
|
<td><code>false</code></td>
|
|
<td><code>false</code></td>
|
|
<td><code>false</code></td>
|
|
<td>standard JSON</td>
|
|
</tr>
|
|
<tr>
|
|
<td>123.4</td>
|
|
<td><code>123.4</code></td>
|
|
<td><code>123.4</code></td>
|
|
<td><code>123.4</code></td>
|
|
<td>standard JSON</td>
|
|
</tr>
|
|
<tr>
|
|
<td>+0</td>
|
|
<td><code>0</code></td>
|
|
<td><code>0</code></td>
|
|
<td><code>0</code></td>
|
|
<td>standard JSON</td>
|
|
</tr>
|
|
<tr>
|
|
<td>-0</td>
|
|
<td><code>0</code></td>
|
|
<td><code>-0</code></td>
|
|
<td><code>-0</code></td>
|
|
<td>Standard JSON allows <code>-0</code> but serializes negative
|
|
zero as <code>0</code> (losing the sign unnecessarily)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>NaN</td>
|
|
<td><code>null</code></td>
|
|
<td><code>NaN</code></td>
|
|
<td><code>{"_nan":true}</code></td>
|
|
<td>Standard JSON: always encoded as <code>null</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>Infinity</td>
|
|
<td><code>null</code></td>
|
|
<td><code>Infinity</code></td>
|
|
<td><code>{"_inf":true}</code></td>
|
|
<td>Standard JSON: always encoded as <code>null</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td>-Infinity</td>
|
|
<td><code>null</code></td>
|
|
<td><code>-Infinity</code></td>
|
|
<td><code>{"_ninf":true}</code></td>
|
|
<td>Standard JSON: always encoded as <code>null</code></td>
|
|
</tr>
|
|
<tr>
|
|
<td>köhä</td>
|
|
<td><code>"köhä"</code></td>
|
|
<td><code>"k\xf6h\xe4"</code></td>
|
|
<td><code>"k\u00f6h\u00e4"</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>U+00FC</td>
|
|
<td><code>"\u00fc"</code></td>
|
|
<td><code>"\xfc"</code></td>
|
|
<td><code>"\u00fc"</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>U+ABCD</td>
|
|
<td><code>"\uabcd"</code></td>
|
|
<td><code>"\uabcd"</code></td>
|
|
<td><code>"\uabcd"</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>U+1234ABCD</td>
|
|
<td><code>"U+1234abcd"</code></td>
|
|
<td><code>"\U1234abcd"</code></td>
|
|
<td><code>"U+1234abcd"</code></td>
|
|
<td>Non-BMP characters are not standard Ecmascript, JX format borrowed from Python</td>
|
|
</tr>
|
|
<tr>
|
|
<td>object</td>
|
|
<td><code>{"my_key":123}</code></td>
|
|
<td><code>{my_key:123}</code></td>
|
|
<td><code>{"my_key":123}</code></td>
|
|
<td>ASCII keys matching identifer requirements encoded without quotes in JX</td>
|
|
</tr>
|
|
<tr>
|
|
<td>array</td>
|
|
<td><code>["foo","bar"]</code></td>
|
|
<td><code>["foo","bar"]</code></td>
|
|
<td><code>["foo","bar"]</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>buffer</td>
|
|
<td><code>n/a</code></td>
|
|
<td><code>|deadbeef|</code></td>
|
|
<td><code>{"_buf":"deadbeef"}</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>pointer</td>
|
|
<td><code>n/a</code></td>
|
|
<td><code>(0xdeadbeef)<br />(DEADBEEF)</code></td>
|
|
<td><code>{"_ptr":"0xdeadbeef"}<br />{"_ptr":"DEADBEEF"}</code></td>
|
|
<td>Representation inside parentheses or quotes is platform specific</td>
|
|
</tr>
|
|
<tr>
|
|
<td>NULL pointer</td>
|
|
<td><code>n/a</code></td>
|
|
<td><code>(null)</code></td>
|
|
<td><code>{"_ptr":"null"}</code></td>
|
|
<td></td>
|
|
</tr>
|
|
<tr>
|
|
<td>function</td>
|
|
<td><code>n/a</code></td>
|
|
<td><code>{_func:true}</code></td>
|
|
<td><code>{"_func":true}</code></td>
|
|
<td>Standard JSON: encoded as <code>null</code> inside arrays, otherwise omitted</td>
|
|
</tr>
|
|
</tbody>
|
|
</table>
|
|
|
|
<h2>Limitations</h2>
|
|
|
|
<p>Some limitations include:</p>
|
|
|
|
<ul>
|
|
<li>Only enumerable own properties are serialized in any of the formats.</li>
|
|
<li>Array properties (other than the entries) are not serialized. This would
|
|
be useful in e.g. logging, e.g. as <code>[1,2,3,"type":"point"]</code>.</li>
|
|
<li>There is no automatic revival of special values when parsing JC data.</li>
|
|
<li>There is no canonical encoding. This would be easy to arrange with a simple
|
|
option to sort object keys during encoding.</li>
|
|
</ul>
|
|
|
|
<p>(See internal documentation for more future work issues.)</p>
|
|
|
|
|