mirror of https://github.com/svaarala/duktape.git
Sami Vaarala
11 years ago
4 changed files with 274 additions and 16 deletions
@ -0,0 +1,199 @@
==========
Test cases
==========

Introduction
============

There are two separate test case sets for Duktape:

1. Ecmascript test cases for testing Ecmascript compliance

2. Duktape API test cases for testing that the exposed user API works

Ecmascript test cases
=====================

How to test?
------------

There are many unit testing frameworks for Ecmascript such as `Jasmine`_
(see also `List of unit testing frameworks`_). However, when testing an
Ecmascript *implementation*, a testing framework cannot always assume
that even simple language features like functions or exceptions work
correctly.

How to do automatic testing then?

.. _Jasmine: http://pivotal.github.com/jasmine/
.. _List of unit testing frameworks: http://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#JavaScript

The current solution is to run an Ecmascript test case file with a command
line interpreter and compare the resulting ``stdout`` text to the expected
output. Control information, including the expected ``stdout`` results, is
embedded in Ecmascript comments which the test runner parses.
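
The comparison step can be sketched roughly as follows. This is only an
illustration, not the actual runner code: the helper names
``extractExpected`` and ``testPasses`` are hypothetical, and concatenating
all expected blocks in file order is an assumption.

```javascript
// Hypothetical helper: collect the text inside every /*=== ... ===*/ block.
// Assumption: multiple expected blocks are concatenated in file order.
function extractExpected(source) {
    var re = /\/\*===\n([\s\S]*?)===\*\//g;
    var parts = [];
    var m;
    while ((m = re.exec(source)) !== null) {
        parts.push(m[1]);
    }
    return parts.join('');
}

// Hypothetical check: a test passes when stdout equals the expected text.
function testPasses(source, actualStdout) {
    return extractExpected(source) === actualStdout;
}

// A small test case source assembled line by line for illustration.
var source = [
    '/*===',
    'hello world',
    '===*/',
    'print("hello world");',
    '/*===',
    'second test',
    '===*/',
    'print("second test");',
    ''
].join('\n');

console.log(testPasses(source, 'hello world\nsecond test\n'));
```

Because only ``stdout`` text is compared, the test case itself needs nothing
from the engine beyond the ability to print.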

The intent of the test cases is to test various features of the implementation
against the specification *and real world behavior*. Thus, the tests are
*not* intended to be strict conformance tests: implementation specific
features and internal behavior are also covered by tests. However, whenever
possible, test output can be compared to the output of other Ecmascript
engines, currently: Rhino, NodeJS (V8), and Smjs.

Test case scripts write their output using the ``print()`` function. If
``print()`` is not available for a particular interpreter (as is the case
with NodeJS), a prologue defining it is injected.
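
A prologue of this kind could look roughly like the sketch below. The exact
prologue used by the test runner is not shown in this document; the helper
name ``formatPrintArgs`` and the choice to join arguments with single spaces
are assumptions made for illustration.

```javascript
// Hypothetical helper: join all arguments with single spaces (assumed format).
function formatPrintArgs(args) {
    return Array.prototype.slice.call(args).join(' ');
}

// Define print() only when the host interpreter does not already provide it.
if (typeof print !== 'function') {
    var print = function () {
        // console.log appends the newline that print() is expected to add.
        console.log(formatPrintArgs(arguments));
    };
}

print('hello world');
```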

Test case format
----------------

Test cases are plain Ecmascript files ending with the extension ``.js`` with
special markup inside comments.

Example::

  /*
   *  Example test.
   *
   *  Expected result is delimited as follows; the expected response
   *  here is "hello world\n".
   */

  /*---
  {
      "slow": false,
      "_comment": "optional metadata is encoded as a single JSON object"
  }
  ---*/

  /*===
  hello world
  ===*/

  if (1) {
      print("hello world");   /* automatic newline */
  } else {
      print("not quite");
  }

  /*===
  second test
  ===*/

  /* there can be multiple "expected" blocks (but only one metadata block) */
  print("second test");

The metadata block and all metadata keys are optional. Boolean flags
default to false if the metadata block or the key is not present. Current
metadata keys:

* ``slow``: if true, test is slow and increased time limits are applied
  to avoid incorrect timeout errors.

* ``skip``: if true, test is not finished yet, and a failure is not
  counted towards the failure count.

* ``custom``: if true, some implementation dependent features are tested,
  and comparison to other Ecmascript engines is not relevant.
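
The defaulting rule for metadata can be sketched as below. The function name
``parseMetadata`` is hypothetical and the regular expression is only one
plausible way to locate the block; the real runner may parse it differently.

```javascript
// Hypothetical helper: read the optional /*--- ... ---*/ metadata block.
// Every known boolean flag defaults to false when the block or key is absent.
function parseMetadata(source) {
    var meta = { slow: false, skip: false, custom: false };
    var m = /\/\*---\n([\s\S]*?)---\*\//.exec(source);
    if (m) {
        var parsed = JSON.parse(m[1]);
        Object.keys(meta).forEach(function (key) {
            // Only accept real booleans; anything else keeps the default.
            if (typeof parsed[key] === 'boolean') {
                meta[key] = parsed[key];
            }
        });
    }
    return meta;
}

console.log(parseMetadata('/*---\n{ "slow": true }\n---*/\nprint(1);'));
console.log(parseMetadata('print(1);'));
```

Unknown keys such as ``_comment`` are simply ignored in this sketch.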

Practices
---------

Indentation
:::::::::::

Indent with spaces, 4 spaces per level.

Verifying exception type
::::::::::::::::::::::::

Since Ecmascript doesn't require specific error messages for errors
thrown, the messages should not be inspected or printed out in test
cases. Ecmascript does require specific error types though (such as
``TypeError``). These can be verified by printing the ``name``
property of an error object.

For instance::

  try {
      null.foo = 1;
  } catch (e) {
      print(e.name);
  }

prints::

  TypeError

When an error is not supposed to occur in a successful test run, the
exception message can (and should) be printed, as it makes it easier
to resolve a failing test case. This can be done most easily as::

  try {
      null.foo = 1;
  } catch (e) {
      print(e);
  }

Test cases
----------

Test case filenames consist of lowercase words delimited by dashes, e.g.::

  test-stmt-trycatch.js

The first part of each test case filename is ``test``. The second part
indicates a major test category. The test categories are not very strictly
defined, and there is currently no tracking of specification coverage.

Test cases starting with ``test-dev-`` are development time test cases
which demonstrate a particular issue and may not be very well documented.

Test cases starting with ``test-dev-bug-`` illustrate a particular
development time bug which has usually already been fixed.
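
The naming convention can be checked mechanically, for example as in the
sketch below. The function ``classifyTestName`` is hypothetical, and allowing
digits inside the words is an assumption; the filename
``test-dev-bug-example.js`` is made up for illustration.

```javascript
// Hypothetical helper: split a test case filename into its naming parts.
// Assumption: words may contain lowercase letters and digits.
function classifyTestName(filename) {
    var m = /^test-([a-z0-9]+(?:-[a-z0-9]+)*)\.js$/.exec(filename);
    if (!m) {
        return null;   // does not follow the naming convention
    }
    var words = m[1].split('-');
    return {
        category: words[0],
        isDev: words[0] === 'dev',
        isDevBug: words[0] === 'dev' && words[1] === 'bug'
    };
}

console.log(classifyTestName('test-stmt-trycatch.js'));
console.log(classifyTestName('test-dev-bug-example.js'));
```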

Duktape API test cases
======================

Test case format
----------------

Test case files are C files with a ``test()`` function. The test function
gets as its argument an already initialized ``duk_context *`` and prints out
text to ``stdout``. The test case can assume ``duktape.h`` and common headers
like ``stdio.h`` have been included. There are also some predefined macros
(like ``TEST_SAFE_CALL()`` and ``TEST_PCALL()``) to minimize duplication in
test case code.

Expected output is defined as for Ecmascript test cases. There is currently
no metadata.

Example::

  /*===
  Hello world from Ecmascript!
  Hello world from C!
  ===*/

  void test(duk_context *ctx) {
      duk_push_string(ctx, "print('Hello world from Ecmascript!');");
      duk_eval(ctx);
      printf("Hello world from C!\n");
  }

Test runner
===========

The current test runner is a NodeJS program which handles both Ecmascript
and API test cases. See ``runtests/runtests.js``.

Future work
===========

* Put test cases in a directory hierarchy instead (``test/stmt/trycatch.js``),
  which perhaps scales better (at the expense of adding hassle to e.g.
  grepping).

* Keep the simple input-output model but add includes. There is a lot of
  boilerplate now for basic things like dumping descriptors.

@ -0,0 +1,60 @@
=========================
URI encoding and decoding
=========================

Specification notes
===================

Reserved set / unescaped set
----------------------------

The "unescaped set" for encoding and the "reserved set" for decoding always
consist of only ASCII codepoints. Thus comparing codepoints against the sets
should only be necessary when processing ASCII range characters.

When encoding, step 4.c will catch characters in the "unescaped set" and
encode them as-is into the output. Note that these can only be single-byte
ASCII characters. If we go to step 4.d, the codepoint may either be ASCII
or non-ASCII, and will be escaped regardless.
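
The encoding behavior described above can be observed with the standard
``encodeURIComponent()`` and ``encodeURI()`` functions. This is a small
illustrative sketch; the variable names are arbitrary.

```javascript
// Characters in the unescaped set are copied to the output as-is.
var unescapedAscii = encodeURIComponent('abc-_.~');

// ASCII characters outside the unescaped set are percent escaped.
var escapedAscii = encodeURIComponent(' ');

// Non-ASCII codepoints are UTF-8 encoded and every byte is escaped.
var escapedNonAscii = encodeURIComponent('\u00e4');

// The unescaped set differs between the two functions: '/' is kept by
// encodeURI() but escaped by encodeURIComponent().
var slashInUri = encodeURI('/');
var slashInComponent = encodeURIComponent('/');

console.log(unescapedAscii, escapedAscii, escapedNonAscii,
            slashInUri, slashInComponent);
```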

When decoding percent escaped codepoints, one-byte encoded codepoints (i.e.
ASCII) are checked in step 4.d.vi; multi-byte encoded codepoints in the BMP
range are checked in step 4.d.vii but codepoints above the BMP are not
checked.

Apparently the idea here is to ensure no characters in the reserved set are
decoded from percent escapes even if invalid UTF-8 (non-shortest) encodings
are allowed. Because characters above the BMP are encoded with surrogate
pairs, the formula for surrogate pairs ensures that the codepoint cannot be
below U+00010000 (0x10000 is added to the surrogate pair bits), and thus no
check against the "reserved set" is needed.

However, the end of Section 15.1.3 states:

  RFC 3629 prohibits the decoding of invalid UTF-8 octet sequences. For
  example, the invalid sequence C0 80 must not decode into the character
  U+0000. Implementations of the Decode algorithm are required to throw a
  URIError when encountering such invalid sequences.

Because the "reserved set" and the "unescaped set" always consist of only
ASCII codepoints, the check in step 4.d.vii should not be necessary. The
UTF-8 validity check happens in step 4.d.vii.8.
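
The decoding rules above can be exercised with the standard decoding
functions, printing the error ``name`` on failure. The helper
``decodeOutcome`` is hypothetical and only exists to keep the sketch short.

```javascript
// Hypothetical helper: return the decoded string, or the error name on failure.
function decodeOutcome(decoder, input) {
    try {
        return decoder(input);
    } catch (e) {
        return e.name;
    }
}

// '/' is in the reserved set of decodeURI(), so the escape is left as-is.
var keptEscaped = decodeOutcome(decodeURI, '%2F');

// decodeURIComponent() has an empty reserved set, so '/' is decoded.
var decodedSlash = decodeOutcome(decodeURIComponent, '%2F');

// A codepoint above the BMP (U+1F600) decodes into a surrogate pair.
var astral = decodeOutcome(decodeURIComponent, '%F0%9F%98%80');

// Non-shortest (invalid) UTF-8 encodings of U+0000 and '/' are rejected.
var overlongNul = decodeOutcome(decodeURIComponent, '%C0%80');
var overlongSlash = decodeOutcome(decodeURIComponent, '%C0%AF');

console.log(keptEscaped, decodedSlash, astral.length, overlongNul, overlongSlash);
```

Since non-shortest encodings are rejected outright, a reserved set character
can only ever arrive through a one-byte escape.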

Decoding characters outside BMP
-------------------------------

The URI decoding algorithm requires that UTF-8 encoded codepoints consisting
of more than 4 encoded bytes are rejected. A 4-byte encoding carries 21
codepoint bits (3 in the first byte and 6 in each of the three continuation
bytes), so the maximum codepoint which can be expressed is U+1FFFFF. However,
since the bytes must also be valid UTF-8 (step 4.d.vii.8), the highest
allowed codepoint is actually U+10FFFF.
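
The limits can be illustrated with the standard ``decodeURIComponent()``.
The helper ``decodeOutcome`` is hypothetical; the byte sequences are the
UTF-8 style encodings of U+10FFFF, U+110000, and a 5-byte form.

```javascript
// Hypothetical helper: return the decoded string, or the error name on failure.
function decodeOutcome(input) {
    try {
        return decodeURIComponent(input);
    } catch (e) {
        return e.name;
    }
}

// 3 + 6 + 6 + 6 = 21 payload bits in a 4-byte encoding.
var maxFourBytePayload = (1 << 21) - 1;

// F4 8F BF BF encodes U+10FFFF, the highest codepoint that is valid UTF-8.
var highestValid = decodeOutcome('%F4%8F%BF%BF');

// F4 90 80 80 would encode U+110000, which fits in 21 bits but is not valid.
var aboveRange = decodeOutcome('%F4%90%80%80');

// Encodings longer than 4 bytes are rejected.
var fiveBytes = decodeOutcome('%F8%88%80%80%80');

console.log(maxFourBytePayload.toString(16), highestValid.length,
            aboveRange, fiveBytes);
```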

It would be nice to be able to:

* decode higher codepoints because Duktape can represent them

* decode codepoints up to U+10FFFF without surrogate pairs

Because the API requirements are strict, these cannot be added to the standard
API without breaking compliance. Custom URI encoding/decoding functions could
provide these extended semantics.