mirror of https://github.com/svaarala/duktape.git
Sami Vaarala
11 years ago

==========
Test cases
==========

Introduction
============

There are two separate test case sets for Duktape:

1. Ecmascript test cases for testing Ecmascript compliance

2. Duktape API test cases for testing that the exposed user API works

Ecmascript test cases
=====================

How to test?
------------

There are many unit testing frameworks for Ecmascript such as `Jasmine`_
(see also `List of unit testing frameworks`_). However, when testing an
Ecmascript *implementation*, a testing framework cannot always assume
that even simple language features like functions or exceptions work
correctly.

How to do automatic testing then?

.. _Jasmine: http://pivotal.github.com/jasmine/
.. _List of unit testing frameworks: http://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#JavaScript

The current solution is to run an Ecmascript test case file with a command
line interpreter and compare the resulting ``stdout`` text to the expected
output. Control information, including the expected ``stdout`` output, is
embedded into Ecmascript comments which the test runner parses.
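
As a rough illustration of the parsing side, the C sketch below collects the expected ``stdout`` text from all blocks delimited by ``/*===`` and ``===*/`` lines, the format described under "Test case format". The function name ``extract_expected()`` is hypothetical. The actual runner is a NodeJS program and may parse files differently.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical sketch: collect the text of every expected-output block
// in a test source. A block opens with a "/*===" line and closes with a
// "===*/" line; the lines in between, newlines included, are appended
// to the result. Returns a malloc()'d string, or NULL on failure.
char *extract_expected(const char *src) {
    // The output is a concatenation of disjoint substrings of src,
    // so it can never be longer than src itself.
    char *out = malloc(strlen(src) + 1);
    size_t len = 0;
    const char *p = src;

    if (out == NULL) {
        return NULL;
    }
    while ((p = strstr(p, "/*===\n")) != NULL) {
        const char *body = p + strlen("/*===\n");
        const char *end = strstr(body, "===*/");

        if (end == NULL) {
            break;  // unterminated block: ignore it
        }
        memcpy(out + len, body, (size_t) (end - body));
        len += (size_t) (end - body);
        p = end + strlen("===*/");
    }
    out[len] = '\0';
    return out;
}
```

A runner built this way would compare the returned string with the captured ``stdout`` of the interpreter (e.g. with ``strcmp()``) and report a failure on any difference.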

The intent of the test cases is to test various features of the implementation
against the specification *and real world behavior*. Thus, the tests are
*not* intended to be strict conformance tests: implementation specific
features and internal behavior are also covered by tests. However, whenever
possible, test output can be compared to the output of other Ecmascript
engines, currently: Rhino, NodeJS (V8), and Smjs.

Test case scripts write their output using the ``print()`` function. If
``print()`` is not available for a particular interpreter (as is the case
with NodeJS), a prologue defining it is injected.
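
The C sketch below shows one way to inject such a prologue. The prologue text and the ``inject_prologue()`` helper are assumptions for illustration, not the prologue the actual runner uses. The prologue defines ``print()`` on top of ``console.log()``, and only does so when no ``print()`` exists yet.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical prologue for engines without a global print(), e.g. NodeJS.
// Arguments are converted with String() and joined with spaces;
// console.log() appends the newline that print() is expected to emit.
static const char PRINT_PROLOGUE[] =
    "if (typeof print !== 'function') {\n"
    "    print = function () {\n"
    "        console.log(Array.prototype.map.call(arguments, String).join(' '));\n"
    "    };\n"
    "}\n";

// Return a malloc()'d copy of the test source with the prologue
// prepended, or NULL if allocation fails.
char *inject_prologue(const char *test_source) {
    size_t plen = strlen(PRINT_PROLOGUE);
    size_t slen = strlen(test_source);
    char *out = malloc(plen + slen + 1);

    if (out == NULL) {
        return NULL;
    }
    memcpy(out, PRINT_PROLOGUE, plen);
    memcpy(out + plen, test_source, slen + 1);  // copies the '\0' too
    return out;
}
```

Prepending code has one side effect. A ``"use strict"`` directive at the very start of a test file is no longer in directive position, so such a test would run in non-strict mode under the injected prologue.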

Test case format
----------------

Test cases are plain Ecmascript files with the extension ``.js`` and
special markup inside comments.

Example::

  /*
   * Example test.
   *
   * Expected result is delimited as follows; the expected response
   * here is "hello world\n".
   */

  /*---
  {
      "slow": false,
      "_comment": "optional metadata is encoded as a single JSON object"
  }
  ---*/

  /*===
  hello world
  ===*/

  if (1) {
      print("hello world");   /* automatic newline */
  } else {
      print("not quite");
  }

  /*===
  second test
  ===*/

  /* there can be multiple "expected" blocks (but only one metadata block) */
  print("second test");

The metadata block and all metadata keys are optional. Boolean flags
default to false if the metadata block or the key is not present. Current
metadata keys:

* ``slow``: if true, the test is slow and increased time limits are applied
  to avoid incorrect timeout errors.

* ``skip``: if true, the test is not finished yet, and a failure is not
  counted towards the fail count.

* ``custom``: if true, some implementation dependent features are tested,
  and comparison to other Ecmascript engines is not relevant.
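
The defaulting rules above can be sketched as follows, assuming the text of the metadata block has already been extracted. The names ``struct test_meta``, ``meta_flag()`` and ``parse_meta()`` are hypothetical. The flag lookup is a deliberately naive string search, not a real JSON parser.

```c
#include <ctype.h>
#include <string.h>

// Hypothetical metadata record; every flag defaults to 0 (false).
struct test_meta {
    int slow;
    int skip;
    int custom;
};

// Naive lookup (NOT a JSON parser): returns 1 if the text contains the
// quoted key followed by optional whitespace, ':', optional whitespace
// and the literal true; otherwise 0, so a missing key means false.
static int meta_flag(const char *meta, const char *key) {
    char quoted[64];
    size_t klen = strlen(key);
    const char *p;

    if (klen + 3 > sizeof(quoted)) {
        return 0;
    }
    quoted[0] = '"';
    memcpy(quoted + 1, key, klen);
    quoted[klen + 1] = '"';
    quoted[klen + 2] = '\0';

    p = strstr(meta, quoted);
    if (p == NULL) {
        return 0;
    }
    p += klen + 2;
    while (isspace((unsigned char) *p)) p++;
    if (*p != ':') {
        return 0;
    }
    p++;
    while (isspace((unsigned char) *p)) p++;
    return strncmp(p, "true", 4) == 0;
}

// A NULL argument stands for "no metadata block": all flags stay false.
struct test_meta parse_meta(const char *meta) {
    struct test_meta m = { 0, 0, 0 };

    if (meta != NULL) {
        m.slow = meta_flag(meta, "slow");
        m.skip = meta_flag(meta, "skip");
        m.custom = meta_flag(meta, "custom");
    }
    return m;
}
```

A runner could then apply a longer timeout when ``slow`` is set, and leave a failing test out of the fail count when ``skip`` is set.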

Practices
---------

Indentation
:::::::::::

Indent with spaces, 4 spaces per indentation level.

Verifying exception type
::::::::::::::::::::::::

Since Ecmascript doesn't require specific error messages for errors
thrown, the messages should not be inspected or printed out in test
cases. Ecmascript does require specific error types though (such as
``TypeError``). These can be verified by printing the ``name``
property of an error object.

For instance::

  try {
      null.foo = 1;
  } catch (e) {
      print(e.name);
  }

prints::

  TypeError

When an error is not supposed to occur in a successful test run, the
exception message can (and should) be printed, as it makes it easier
to resolve a failing test case. This can be done most easily as::

  try {
      null.foo = 1;
  } catch (e) {
      print(e);
  }

Test cases
----------

Test case filenames consist of lowercase words delimited by dashes, e.g.::

  test-stmt-trycatch.js

The first part of each test case name is ``test``. The second part indicates
a major test category. The test categories are not very strictly defined, and
there is currently no tracking of specification coverage.

Test cases starting with ``test-dev-`` are development time test cases
which demonstrate a particular issue and may not be very well documented.

Test cases starting with ``test-dev-bug-`` illustrate a particular
development time bug which has usually already been fixed.
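
The naming convention is regular enough to parse mechanically. The helpers below are hypothetical and not part of the actual runner. They extract the major category and detect the ``test-dev-`` and ``test-dev-bug-`` prefixes.

```c
#include <string.h>

// Copy the major category (the second dash-delimited word, e.g. "stmt"
// in "test-stmt-trycatch.js") into buf. Returns 0 on success, or -1 if
// the name does not start with "test-" or buf is too small.
int test_category(const char *name, char *buf, size_t bufsize) {
    const char *start;
    size_t n;

    if (strncmp(name, "test-", 5) != 0) {
        return -1;
    }
    start = name + 5;
    n = strcspn(start, "-.");  // the category ends at the next '-' or '.'
    if (n == 0 || n + 1 > bufsize) {
        return -1;
    }
    memcpy(buf, start, n);
    buf[n] = '\0';
    return 0;
}

// Development time test cases ("test-dev-...", including "test-dev-bug-...").
int is_dev_test(const char *name) {
    return strncmp(name, "test-dev-", 9) == 0;
}

// Development time bug reproductions ("test-dev-bug-...").
int is_dev_bug_test(const char *name) {
    return strncmp(name, "test-dev-bug-", 13) == 0;
}
```

Under this sketch, development time test cases simply report ``dev`` as their category.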

Duktape API test cases
======================

Test case format
----------------

Test case files are C files with a ``test()`` function. The test function
gets an already initialized ``duk_context *`` as its argument and prints its
output to ``stdout``. The test case can assume that ``duktape.h`` and common
headers like ``stdio.h`` have been included. There are also some predefined
macros (like ``TEST_SAFE_CALL()`` and ``TEST_PCALL()``) to minimize
duplication in test case code.

Expected output is defined as for Ecmascript test cases. There is currently
no metadata.

Example::

  /*===
  Hello world from Ecmascript!
  Hello world from C!
  ===*/

  void test(duk_context *ctx) {
      /* duk_eval() replaces the source string on the value stack with
       * the evaluation result, which is popped as it is not needed.
       */
      duk_push_string(ctx, "print('Hello world from Ecmascript!');");
      duk_eval(ctx);
      duk_pop(ctx);
      printf("Hello world from C!\n");
  }

Test runner
===========

The current test runner is a NodeJS program which handles both Ecmascript
and API test cases. See ``runtests/runtests.js``.

Future work
===========

* Put test cases in a directory hierarchy instead (``test/stmt/trycatch.js``).
  This perhaps scales better, at the expense of adding hassle to e.g. grepping.

* Keep the simple input-output model but add includes. There is a lot of
  boilerplate now for basic things like dumping descriptors.

=========================
URI encoding and decoding
=========================

Specification notes
===================

Reserved set / unescaped set
----------------------------

The "unescaped set" for encoding and the "reserved set" for decoding always
consist of only ASCII codepoints. Thus comparing codepoints against the sets
should only be necessary when processing ASCII range characters.
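
Because the sets are ASCII-only, a membership test can reject every codepoint at or above 0x80 without looking at the set. The sketch below builds the ``encodeURI()`` and ``decodeURI()`` sets from the ES5 URI grammar (``uriReserved``, ``uriMark``, letters and digits, plus ``#``). The function names are hypothetical, and this is not Duktape's actual implementation.

```c
#include <stdint.h>
#include <string.h>

// Character classes from the ES5 URI grammar (Section 15.1.3).
#define URI_RESERVED  ";/?:@&=+$,"
#define URI_MARK      "-_.!~*'()"

// Membership in a set given as a string of ASCII characters. All sets
// involved are pure ASCII, so cp >= 0x80 is rejected without a lookup.
static int in_ascii_set(uint32_t cp, const char *set) {
    if (cp == 0 || cp >= 0x80) {
        return 0;  // cp == 0 excluded: strchr() would match the NUL
    }
    return strchr(set, (int) cp) != NULL;
}

// uriUnescaped: letters, decimal digits and uriMark.
static int is_uri_unescaped(uint32_t cp) {
    return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') ||
           (cp >= '0' && cp <= '9') || in_ascii_set(cp, URI_MARK);
}

// Unescaped set of encodeURI(): uriReserved + uriUnescaped + "#".
int encode_uri_unescaped(uint32_t cp) {
    return in_ascii_set(cp, URI_RESERVED "#") || is_uri_unescaped(cp);
}

// Reserved set of decodeURI(): uriReserved + "#".
int decode_uri_reserved(uint32_t cp) {
    return in_ascii_set(cp, URI_RESERVED "#");
}
```

``encodeURIComponent()`` and ``decodeURIComponent()`` use narrower sets (``uriUnescaped`` and the empty set, respectively), but the same ASCII shortcut applies.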

When encoding, step 4.c will catch characters in the "unescaped set" and
encode them as-is into the output. Note that these can only be single-byte
ASCII characters. If we go to step 4.d, the codepoint may be either ASCII
or non-ASCII, and will be escaped regardless.
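
The C sketch below applies steps 4.c and 4.d to a single codepoint. It assumes surrogate pairs have already been combined into one codepoint, so the URIError for unpaired surrogates is out of scope. ``uri_encode_cp()`` and ``uri_unescaped()`` are hypothetical names. The predicate implements ``uriUnescaped``, the set used by ``encodeURIComponent()``.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// uriUnescaped (letters, digits, "-_.!~*'()"); ASCII-only by definition.
static int uri_unescaped(uint32_t cp) {
    return cp != 0 && cp < 0x80 &&
           ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') ||
            (cp >= '0' && cp <= '9') || strchr("-_.!~*'()", (int) cp) != NULL);
}

// Encode one codepoint (cp <= 0x10FFFF, no lone surrogates) into out.
// Step 4.c: set members are copied as-is. Step 4.d: everything else,
// ASCII or not, is UTF-8 encoded and each byte written as %XY.
// Returns the number of chars written; out needs room for 13 bytes.
size_t uri_encode_cp(uint32_t cp, int (*unescaped)(uint32_t), char *out) {
    unsigned char bytes[4];
    size_t n, i;

    if (unescaped(cp)) {
        out[0] = (char) cp;  // always a single ASCII character
        out[1] = '\0';
        return 1;
    }
    if (cp < 0x80) {
        bytes[0] = (unsigned char) cp;
        n = 1;
    } else if (cp < 0x800) {
        bytes[0] = (unsigned char) (0xC0 | (cp >> 6));
        bytes[1] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {
        bytes[0] = (unsigned char) (0xE0 | (cp >> 12));
        bytes[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        bytes[2] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 3;
    } else {
        bytes[0] = (unsigned char) (0xF0 | (cp >> 18));
        bytes[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        bytes[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        bytes[3] = (unsigned char) (0x80 | (cp & 0x3F));
        n = 4;
    }
    for (i = 0; i < n; i++) {
        sprintf(out + 3 * i, "%%%02X", (unsigned) bytes[i]);
    }
    return 3 * n;
}
```

With this predicate, a space is escaped as ``%20`` and U+00E4 as ``%C3%A4``, while ``a`` is copied as-is.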

When decoding percent-escaped codepoints, one-byte encoded codepoints (i.e.
ASCII) are checked in step 4.d.vi; multi-byte encoded codepoints in the BMP
range are checked in step 4.d.vii, but codepoints above the BMP are not
checked.

Apparently the idea here is to ensure that no characters in the reserved set
are decoded from percent escapes even if invalid UTF-8 (non-shortest)
encodings were allowed. Because characters above the BMP are encoded with
surrogate pairs, the formula for surrogate pairs ensures that the codepoint
cannot be below U+00010000 (0x10000 is added to the surrogate pair bits), and
thus no check against the "reserved set" is needed.
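
The arithmetic behind that argument is shown in the small C sketch below; the helper names are hypothetical. Splitting subtracts 0x10000 and distributes the remaining 20 bits over the two surrogates. Recombining adds 0x10000 back, so the smallest possible result is exactly U+10000.

```c
#include <stdint.h>

// Split a non-BMP codepoint (0x10000..0x10FFFF) into a UTF-16
// surrogate pair.
void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo) {
    uint32_t v = cp - 0x10000;  // 20 bits remain
    *hi = (uint16_t) (0xD800 + (v >> 10));
    *lo = (uint16_t) (0xDC00 + (v & 0x3FF));
}

// Recombine a pair (hi in 0xD800..0xDBFF, lo in 0xDC00..0xDFFF).
// Because 0x10000 is added back, the result always lies in
// 0x10000..0x10FFFF and can never be an ASCII codepoint.
uint32_t from_surrogates(uint16_t hi, uint16_t lo) {
    return 0x10000 + (((uint32_t) (hi - 0xD800) << 10) |
                      (uint32_t) (lo - 0xDC00));
}
```

``from_surrogates(0xD800, 0xDC00)`` yields 0x10000 and ``from_surrogates(0xDBFF, 0xDFFF)`` yields 0x10FFFF, which bounds every possible result.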

However, at the end of Section 15.1.3:

  RFC 3629 prohibits the decoding of invalid UTF-8 octet sequences. For
  example, the invalid sequence C0 80 must not decode into the character
  U+0000. Implementations of the Decode algorithm are required to throw a
  URIError when encountering such invalid sequences.

Because the "reserved set" / "unescaped set" always consists of only ASCII
codepoints, and a valid (shortest form) multi-byte UTF-8 sequence can never
decode to an ASCII codepoint, the check in step 4.d.vii should not be
necessary. The UTF-8 validity check, which rejects such non-shortest
encodings, happens in step 4.d.vii.8.
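
The C sketch below is a strict validity check for one multi-byte sequence, roughly corresponding to step 4.d.vii.8. The function name and the -1 error convention are assumptions. Single ASCII bytes are assumed to be handled separately (step 4.d.vi). The check rejects overlong forms such as C0 80, encoded surrogates, and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

// Decode one UTF-8 sequence of n (2..4) already percent-decoded bytes.
// Returns the codepoint, or -1 where Decode would throw URIError.
long utf8_decode_strict(const unsigned char *b, size_t n) {
    // Smallest codepoint that legitimately needs n bytes.
    static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t cp;
    size_t i;

    if (n == 2 && (b[0] & 0xE0) == 0xC0) {
        cp = b[0] & 0x1F;
    } else if (n == 3 && (b[0] & 0xF0) == 0xE0) {
        cp = b[0] & 0x0F;
    } else if (n == 4 && (b[0] & 0xF8) == 0xF0) {
        cp = b[0] & 0x07;
    } else {
        return -1;  // stray continuation byte, 5+ byte lead, or bad length
    }
    for (i = 1; i < n; i++) {
        if ((b[i] & 0xC0) != 0x80) {
            return -1;  // not a continuation byte
        }
        cp = (cp << 6) | (b[i] & 0x3F);
    }
    if (cp < min_cp[n]) {
        return -1;  // overlong (non-shortest) form, e.g. C0 80 for U+0000
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return -1;  // surrogate codepoints are not valid in UTF-8
    }
    if (cp > 0x10FFFF) {
        return -1;  // beyond the Unicode range
    }
    return (long) cp;
}
```

With the overlong check in place, no multi-byte sequence can produce a codepoint below 0x80. That is why the reserved set check for multi-byte BMP codepoints can never match.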

Decoding characters outside BMP
-------------------------------

The URI decoding algorithm requires that UTF-8 encoded codepoints consisting
of more than 4 encoded bytes are rejected. A 4-byte encoding contains 21 bits,
so the maximum codepoint which can be expressed is U+1FFFFF. However, since
the bytes must also be valid UTF-8 (step 4.d.vii.8), the highest allowed
codepoint is actually U+10FFFF.
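
The bit count behind these limits is written out below, where ``x`` marks the payload bits of the 4-byte form:

```latex
\underbrace{11110xxx}_{3\ \text{bits}}
\;\underbrace{10xxxxxx\;10xxxxxx\;10xxxxxx}_{3 \times 6\ =\ 18\ \text{bits}}
\quad\Longrightarrow\quad 3 + 18 = 21\ \text{bits},
\qquad 2^{21} - 1 = \texttt{0x1FFFFF} > \texttt{0x10FFFF}
```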

It would be nice to be able to:

* decode higher codepoints, because Duktape can represent them

* decode codepoints up to U+10FFFF without surrogate pairs

Because the requirements for the standard functions are strict, these cannot
be added to the standard API without breaking compliance. Custom URI
encoding/decoding functions could provide these extended semantics.