add some internal docs, improve READMEs, fix dist script

11 years ago · 09d5d5e7fd
4 changed files with 274 additions and 16 deletions
--- a/README.txt.dist
+++ b/README.txt.dist
@@ -2,12 +2,12 @@
 Duktape
 =======

-Duktape is a small and portable Ecmascript E5/E5.1 implementation.
-It is intended to be easily embeddable into C programs, with a C API
-similar in spirit to Lua's.
+Duktape is a small and portable Ecmascript E5/E5.1 implementation.  It is
+intended to be easily embeddable into C programs, with a C API similar in
+spirit to Lua's.

-The goal is to support the full E5 feature set like Unicode strings
-and regular expressions.  Other feature highlights include:
+The goal is to support the full E5 feature set like Unicode strings and
+regular expressions.  Other feature highlights include:

  * Custom types (like pointers and buffers) for C integration

@@ -45,13 +45,13 @@ To build an example command line tool, use the following::
  Hello world!
  = undefined

-The source code should currently compile cleanly on Linux and OSX
-(Darwin), for both x86 and ARM.  The goal is of course to compile
-on almost any reasonable platform.
+The source code should currently compile cleanly on Linux, OSX (Darwin), and
+FreeBSD, for both x86 and ARM.  The goal is of course to compile on almost
+any reasonable platform.

-There is a separate tar ball for developing Duktape: it contains
-internal documentation and unit tests which are not necessary to
-use Duktape.
+There is a separate tar ball ("full distribution") for developing Duktape.
+It contains internal documentation and unit tests which are not necessary
+to use Duktape.

 Duktape is licensed under the MIT license (see ``LICENSE.txt``).
 MurmurHash2 is used internally; it is also under the MIT license.
--- a/doc/testcases.txt
+++ b/doc/testcases.txt
@@ -0,0 +1,199 @@
+==========
+Test cases
+==========
+
+Introduction
+============
+
+There are two separate test case sets for Duktape:
+
+1. Ecmascript test cases for testing Ecmascript compliance
+
+2. Duktape API test cases for testing that the exposed user API works
+
+Ecmascript test cases
+=====================
+
+How to test?
+------------
+
+There are many unit testing frameworks for Ecmascript such as `Jasmine`_
+(see also `List of unit testing frameworks`_).  However, when testing an
+Ecmascript *implementation*, a testing framework cannot always assume
+that even simple language features like functions or exceptions work
+correctly.
+
+How to do automatic testing then?
+
+.. _Jasmine: http://pivotal.github.com/jasmine/
+.. _List of unit testing frameworks: http://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#JavaScript
+
+The current solution is to run an Ecmascript test case file with a command
+line interpreter and compare the resulting ``stdout`` text to the expected
+output.  Control information, including the expected ``stdout`` result, is
+embedded into Ecmascript comments which the test runner parses.
+
+The intent of the test cases is to test various features of the implementation
+against the specification *and real world behavior*.  Thus, the tests are
+*not* intended to be strict conformance tests: implementation specific
+features and internal behavior are also covered by tests.  However, whenever
+possible, test output can be compared to output of other Ecmascript engines,
+currently: Rhino, NodeJS (V8), and Smjs.
+
+Test case scripts write their output using the ``print()`` function.  If
+``print()`` is not available in a particular interpreter (as is the case
+with NodeJS), a prologue defining it is injected.
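
A minimal sketch of what such a prologue could look like, assuming a
NodeJS-like host.  The helper name ``makePrint`` and the choice to join
arguments with a single space are assumptions made for illustration; this is
not the code the test runner actually injects.

```javascript
// Hypothetical prologue sketch: build a print() that writes its arguments,
// joined by a space and followed by a newline, through a given writer.
function makePrint(write) {
    return function print() {
        write(Array.prototype.join.call(arguments, ' ') + '\n');
    };
}

// Define a global print() only when the host does not provide one.
if (typeof print === 'undefined') {
    globalThis.print = makePrint(function (s) {
        process.stdout.write(s);
    });
}

print('hello world');
```

Passing the writer in as a parameter keeps the helper usable with any output
sink, for instance a string buffer when checking the output format.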
+
+Test case format
+----------------
+
+Test cases are plain Ecmascript files ending with the extension ``.js`` with
+special markup inside comments.
+
+Example::
+
+  /*
+   *  Example test.
+   *
+   *  Expected result is delimited as follows; the expected response
+   *  here is "hello world\n".
+   */
+
+  /*---
+  {
+     "slow": false,
+     "_comment": "optional metadata is encoded as a single JSON object"
+  }
+  ---*/
+
+  /*===
+  hello world
+  ===*/
+
+  if (1) {
+      print("hello world");   /* automatic newline */
+  } else {
+      print("not quite");
+  }
+
+  /*===
+  second test
+  ===*/
+
+  /* there can be multiple "expected" blocks (but only one metadata block) */
+  print("second test");
+
+The metadata block and all metadata keys are optional.  Boolean flags
+default to false if the metadata block or the key is not present.  Current
+metadata keys:
+
+* ``slow``: if true, test is slow and increased time limits are applied
+  to avoid incorrect timeout errors.
+
+* ``skip``: if true, test is not finished yet, and a failure is not
+  counted towards the failure count.
+
+* ``custom``: if true, some implementation dependent features are tested,
+  and comparison to other Ecmascript engines is not relevant.
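
The format above can be parsed with a couple of regular expressions.  The
following is a rough sketch only: ``parseTestcase`` is a hypothetical name and
the parsing approach is an assumption, not taken from ``runtests/runtests.js``.
It assumes the markers start at the beginning of a line, as they would in an
actual test case file.

```javascript
// Sketch: extract metadata and expected stdout from a test case source.
function parseTestcase(source) {
    var meta = {};
    var expected = '';
    var m;

    // Optional metadata: a single JSON object between /*--- and ---*/.
    m = /\/\*---\n([\s\S]*?)\n---\*\//.exec(source);
    if (m) {
        meta = JSON.parse(m[1]);
    }

    // Expected stdout: all /*=== ... ===*/ blocks concatenated in order.
    var re = /\/\*===\n([\s\S]*?)===\*\//g;
    while ((m = re.exec(source)) !== null) {
        expected += m[1];
    }

    // Boolean flags default to false when the block or the key is missing.
    return {
        slow: !!meta.slow,
        skip: !!meta.skip,
        custom: !!meta.custom,
        expected: expected
    };
}

var example = [
    '/*---',
    '{ "slow": true }',
    '---*/',
    '/*===',
    'hello world',
    '===*/',
    'print("hello world");',
    '/*===',
    'second test',
    '===*/',
    'print("second test");'
].join('\n');

var info = parseTestcase(example);
// info.expected is 'hello world\nsecond test\n' and info.slow is true
```

A runner built this way would compare ``info.expected`` against the captured
``stdout`` of the interpreter with a plain string comparison.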
+
+Practices
+---------
+
+Indentation
+:::::::::::
+
+Indent with spaces, 4 spaces per level.
+
+Verifying exception type
+::::::::::::::::::::::::
+
+Since Ecmascript doesn't require specific error messages for errors
+thrown, the messages should not be inspected or printed out in test
+cases.  Ecmascript does require specific error types though (such as
+``TypeError``).  These can be verified by printing the ``name``
+property of an error object.
+
+For instance::
+
+  try {
+      null.foo = 1;
+  } catch (e) {
+      print(e.name);
+  }
+
+prints::
+
+  TypeError
+
+When an error is not supposed to occur in a successful test run, the
+exception message can (and should) be printed, as it makes it easier
+to resolve a failing test case.  This can be done most easily as::
+
+  try {
+      null.foo = 1;
+  } catch (e) {
+      print(e);
+  }
+
+Test cases
+----------
+
+Test case filenames consist of lowercase words delimited by dashes, e.g.::
+
+  test-stmt-trycatch.js
+
+The first part of each test case is ``test``.  The second part indicates a
+major test category.  The test categories are not very strictly defined, and
+there is currently no tracking of specification coverage.
+
+Test cases starting with ``test-dev-`` are development time test cases
+which demonstrate a particular issue and may not be very well documented.
+
+Test cases starting with ``test-dev-bug-`` illustrate a particular
+development time bug which has usually already been fixed.
+
+Duktape API test cases
+======================
+
+Test case format
+----------------
+
+Test case files are C files with a ``test()`` function.  The test function
+gets as its argument an already initialized ``duk_context *`` and prints out
+text to ``stdout``.  The test case can assume ``duktape.h`` and common headers
+like ``stdio.h`` have been included.  There are also some predefined macros
+(like ``TEST_SAFE_CALL()`` and ``TEST_PCALL()``) to minimize duplication in
+test case code.
+
+Expected output is defined as for Ecmascript test cases.  There is currently
+no metadata.
+
+Example::
+
+  /*===
+  Hello world from Ecmascript!
+  Hello world from C!
+  ===*/
+
+  void test(duk_context *ctx) {
+      duk_push_string(ctx, "print('Hello world from Ecmascript!');");
+      duk_eval(ctx);
+      printf("Hello world from C!\n");
+  }
+
+Test runner
+===========
+
+The current test runner is a NodeJS program which handles both Ecmascript
+and API testcases.  See ``runtests/runtests.js``.
+
+Future work
+===========
+
+* Put test cases in a directory hierarchy instead (``test/stmt/trycatch.js``),
+  which perhaps scales better (at the expense of adding hassle to e.g. grepping).
+
+* Keep simple input-output model but add includes.  There is a lot of
+  boilerplate now for basic things like dumping descriptors.
+
+
--- a/doc/uri.txt
+++ b/doc/uri.txt
@@ -0,0 +1,60 @@
+=========================
+URI encoding and decoding
+=========================
+
+Specification notes
+===================
+
+Reserved set / unescaped set
+----------------------------
+
+The "unescaped set" for encoding and the "reserved set" for decoding always
+consist of only ASCII codepoints.  Thus comparing codepoints against the sets
+should only be necessary when processing ASCII range characters.
+
+When encoding, step 4.c will catch characters in the "unescaped set" and
+encode them as-is into the output.  Note that these can only be single-byte
+ASCII characters.  If we go to step 4.d, the codepoint may either be ASCII
+or non-ASCII, and will be escaped regardless.
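
The encoding behavior can be observed through the standard functions.  This is
an illustrative sketch of the specified behavior as seen from script code, not
Duktape internals; the variable names are arbitrary.

```javascript
// Characters in the unescaped set are emitted as-is (step 4.c).
var unescaped = "abcXYZ019-_.!~*'()";
var same = encodeURIComponent(unescaped) === unescaped;

// Anything else is UTF-8 encoded and percent escaped (step 4.d),
// regardless of whether the codepoint is ASCII or not.
var space = encodeURIComponent(' ');       // '%20'
var auml = encodeURIComponent('\u00e4');   // '%C3%A4'

// encodeURI() uses a larger unescaped set which also includes the
// reserved characters and '#'.
var slashUri = encodeURI('/');                  // '/'
var slashComponent = encodeURIComponent('/');   // '%2F'
```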
+
+When decoding percent escaped codepoints, one-byte encoded codepoints (i.e.
+ASCII) are checked in step 4.d.vi; multi-byte encoded codepoints in the BMP
+range are checked in step 4.d.vii but codepoints above BMP are not checked.
+
+Apparently the idea here is to ensure no characters in the reserved set are
+decoded from percent escapes even if invalid UTF-8 (non-shortest) encodings
+are allowed.  Because characters above BMP are encoded with surrogate pairs,
+the formula for surrogate pairs ensures that the codepoint cannot be below
+U+00010000 (0x10000 is added to the surrogate pair bits), and thus no check
+against the "reserved set" is needed.
+
+However, at the end of Section 15.1.3:
+
+  RFC 3629 prohibits the decoding of invalid UTF-8 octet sequences. For
+  example, the invalid sequence C0 80 must not decode into the character
+  U+0000. Implementations of the Decode algorithm are required to throw a
+  URIError when encountering such invalid sequences.
+
+Because "reserved set" / "unescaped set" always consists of only ASCII
+codepoints, the check in step 4.d.vii should not be necessary.  The UTF-8
+validity check happens in step 4.d.vii.8.
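
The decoding rules discussed above can be illustrated with the standard
functions.  This is a small sketch of the required behavior in a compliant
engine; ``decodeErrorName`` is a hypothetical helper introduced only for the
example.

```javascript
// decodeURI() leaves escapes of reserved set characters untouched, while
// decodeURIComponent() has an empty reserved set and decodes them.
var kept = decodeURI('%2F');               // '%2F'
var decoded = decodeURIComponent('%2F');   // '/'

// Return the name of the error thrown when decoding, if any.
function decodeErrorName(input) {
    try {
        decodeURIComponent(input);
        return 'no error';
    } catch (e) {
        return e.name;
    }
}

// The non-shortest encoding of U+0000 (C0 80) must be rejected.
var overlongNul = decodeErrorName('%C0%80');     // 'URIError'

// A non-shortest encoding of '/' (C0 AF) is rejected the same way, so a
// reserved set character cannot be produced through an invalid encoding.
var overlongSlash = decodeErrorName('%C0%AF');   // 'URIError'
```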
+
+Decoding characters outside BMP
+-------------------------------
+
+The URI decoding algorithm requires that UTF-8 encoded codepoints consisting
+of more than 4 encoded bytes are rejected.  A 4-byte encoding carries 21 bits,
+so the maximum codepoint which can be expressed is U+1FFFFF.  However, since
+the bytes must also be valid UTF-8 (step 4.d.vii.8) the highest allowed
+codepoint is actually U+10FFFF.
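
The arithmetic and the resulting limits can be checked as follows.  This is a
sketch of the behavior required of a compliant engine; the variable names are
arbitrary.

```javascript
// Payload bits of a 4-byte UTF-8 sequence: 3 bits in the lead byte
// (11110xxx) plus 6 bits in each of three continuation bytes (10xxxxxx).
var bits = 3 + 3 * 6;                    // 21
var maxByBits = Math.pow(2, bits) - 1;   // 0x1fffff

// U+10FFFF (F4 8F BF BF) is the highest valid codepoint and decodes into
// a surrogate pair: 0x10FFFF - 0x10000 = 0xFFFFF, split into 10 + 10 bits.
var top = decodeURIComponent('%F4%8F%BF%BF');
var isPair = top.length === 2 &&
             top.charCodeAt(0) === 0xdbff &&
             top.charCodeAt(1) === 0xdfff;

// U+110000 (F4 90 80 80) fits in 21 bits but is not valid UTF-8, so
// decoding it must fail.
var errName;
try {
    decodeURIComponent('%F4%90%80%80');
    errName = 'no error';
} catch (e) {
    errName = e.name;                    // 'URIError'
}
```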
+
+It would be nice to be able to:
+
+* decode higher codepoints because Duktape can represent them
+
+* decode codepoints up to U+10FFFF without surrogate pairs
+
+Because the API requirements are strict, these cannot be added to the standard
+API without breaking compliance.  Custom URI encoding/decoding functions could
+provide these extended semantics.
+
--- a/make_full.sh
+++ b/make_full.sh
@@ -30,6 +30,8 @@ for i in \
 	doc/number_conversion.txt \
 	doc/regexp.txt \
 	doc/sorting.txt \
+	doc/uri.txt \
+	doc/testcases.txt \
 	; do
 	cp --parents $i $FULL/
 done
@@ -43,20 +45,17 @@ for i in \
 done

 for i in \
-	examples/test.c \
 	examples/cmdline/duk_cmdline.c \
 	examples/cmdline/duk_ncurses.c \
 	examples/cmdline/duk_socket.c \
 	examples/cmdline/duk_fileio.c \
-	examples/coffee/mandel.js \
-	examples/coffee/hello.js \
-	examples/coffee/globals.js \
+	examples/coffee/Makefile \
 	examples/coffee/mandel.coffee \
 	examples/coffee/hello.coffee \
 	examples/coffee/globals.coffee \
+	examples/hello/hello.c \
 	examples/Makefile.cmdline \
 	examples/Makefile.example \
-	examples/hello/hello.c \
 	; do
 	cp --parents $i $FULL/
 done