add some internal docs, improve READMEs, fix dist script

11 years ago · 09d5d5e7fd
4 changed files with 274 additions and 16 deletions
--- a/README.txt.dist
+++ b/README.txt.dist
@@ -2,12 +2,12 @@
 Duktape
 =======
-Duktape is a small and portable Ecmascript E5/E5.1 implementation.
+Duktape is a small and portable Ecmascript E5/E5.1 implementation.  It is
-It is intended to be easily embeddable into C programs, with a C API
+intended to be easily embeddable into C programs, with a C API similar in
-similar in spirit to Lua's.
+spirit to Lua's.
-The goal is to support the full E5 feature set like Unicode strings
+The goal is to support the full E5 feature set like Unicode strings and
-and regular expressions.  Other feature highlights include:
+regular expressions.  Other feature highlights include:
  * Custom types (like pointers and buffers) for C integration
@@ -45,13 +45,13 @@ To build an example command line tool, use the following::
  Hello world!
  = undefined
-The source code should currently compile cleanly on Linux and OSX
+The source code should currently compile cleanly on Linux, OSX (Darwin), and
-(Darwin), for both x86 and ARM.  The goal is of course to compile
+FreeBSD, for both x86 and ARM.  The goal is of course to compile on almost
-on almost any reasonable platform.
+any reasonable platform.
-There is a separate tar ball for developing Duktape: it contains
+There is a separate tar ball ("full distribution") for developing Duktape.
-internal documentation and unit tests which are not necessary to
+It contains internal documentation and unit tests which are not necessary
-use Duktape.
+to use Duktape.
 Duktape is licensed under the MIT license (see ``LICENSE.txt``).
 MurmurHash2 is used internally; it is also under the MIT license.
--- a/doc/testcases.txt
+++ b/doc/testcases.txt
@@ -0,0 +1,199 @@
 ==========
 Test cases
 ==========
 Introduction
 ============
 There are two separate test case sets for Duktape:
 1. Ecmascript test cases for testing Ecmascript compliance
 2. Duktape API test cases for testing that the exposed user API works
 Ecmascript test cases
 =====================
 How to test?
 ------------
 There are many unit testing frameworks for Ecmascript such as `Jasmine`_
 (see also `List of unit testing frameworks`_).  However, when testing an
 Ecmascript *implementation*, a testing framework cannot always assume
 that even simple language features like functions or exceptions work
 correctly.
 How to do automatic testing then?
 .. _Jasmine: http://pivotal.github.com/jasmine/
 .. _List of unit testing frameworks: http://en.wikipedia.org/wiki/List_of_unit_testing_frameworks#JavaScript
 The current solution is to run an Ecmascript test case file with a command
 line interpreter and compare the resulting ``stdout`` text to the expected
 output.  Control information, including the expected ``stdout`` result, is
 embedded in Ecmascript comments which the test runner parses.
 The intent of the test cases is to test various features of the implementation
 against the specification *and real world behavior*.  Thus, the tests are
 *not* intended to be strict conformance tests: implementation specific
 features and internal behavior are also covered by tests.  However, whenever
 possible, test output can be compared to output of other Ecmascript engines,
 currently: Rhino, NodeJS (V8), and Smjs.
 Test case scripts write their output using the ``print()`` function.  If
 ``print()`` is not available in a particular interpreter (as is the case
 with NodeJS), a prologue defining it is injected.
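 The prologue injection described above could look roughly like the sketch
 below.  The names ``PRINT_PROLOGUE`` and ``prepareSource`` are hypothetical
 and not taken from the actual test runner; the sketch also assumes that
 ``print()`` should join its arguments with spaces and end the line.

```javascript
// Hypothetical prologue text; the real runner's prologue may differ.
var PRINT_PROLOGUE = [
    'function print() {',
    '    // Assumption: join arguments with spaces; console.log adds the newline.',
    '    console.log(Array.prototype.join.call(arguments, " "));',
    '}',
    ''
].join('\n');

// Prepend the prologue only when the target interpreter lacks print().
function prepareSource(source, hasPrint) {
    return hasPrint ? source : PRINT_PROLOGUE + source;
}
```

 The prepared source can then be handed to the interpreter as-is, so the
 test case file itself never needs to know which engine runs it.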
 Test case format
 ----------------
 Test cases are plain Ecmascript files ending with the extension ``.js`` with
 special markup inside comments.
 Example::
  /*
   *  Example test.
   *
   *  Expected result is delimited as follows; the expected response
   *  here is "hello world\n".
   */
  /*---
  {
     "slow": false,
     "_comment": "optional metadata is encoded as a single JSON object"
  }
  ---*/
  /*===
  hello world
  ===*/
  if (1) {
      print("hello world");   /* automatic newline */
  } else {
      print("not quite");
  }
  /*===
  second test
  ===*/
  /* there can be multiple "expected" blocks (but only one metadata block) */
  print("second test");
 The metadata block and all metadata keys are optional.  Boolean flags
 default to false if the metadata block or the key is not present.  Current
 metadata keys:
 * ``slow``: if true, test is slow and increased time limits are applied
  to avoid incorrect timeout errors.
 * ``skip``: if true, test is not finished yet, and a failure is not
  counted towards the failure count.
 * ``custom``: if true, some implementation dependent features are tested,
  and comparison to other Ecmascript engines is not relevant.
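 The markup above could be parsed along the following lines.  This is a
 minimal sketch, not the code in ``runtests/runtests.js``: the function name
 ``parseTestCase`` and the regular expressions are assumptions, and the
 sketch assumes the delimiters start at the beginning of a line.

```javascript
// Hypothetical helper, not taken from runtests/runtests.js.
function parseTestCase(source) {
    var meta = {};
    var expected = '';
    var m;

    // Optional metadata block: a single JSON object between /*--- and ---*/.
    m = /\/\*---\n([\s\S]*?)\n---\*\//.exec(source);
    if (m) {
        meta = JSON.parse(m[1]);
    }

    // Expected stdout: concatenation of all /*=== ... ===*/ blocks, in order.
    var re = /\/\*===\n([\s\S]*?)===\*\//g;
    while ((m = re.exec(source)) !== null) {
        expected += m[1];
    }

    // Boolean flags default to false when the block or the key is missing.
    return {
        slow: !!meta.slow,
        skip: !!meta.skip,
        custom: !!meta.custom,
        expected: expected
    };
}
```

 A runner built on this would compare ``expected`` against the captured
 ``stdout`` of the interpreter and consult the flags to adjust time limits
 or failure accounting.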
 Practices
 ---------
 Indentation
 :::::::::::
 Indent with spaces, 4 spaces per level.
 Verifying exception type
 ::::::::::::::::::::::::
 Since Ecmascript doesn't require specific error messages for errors
 thrown, the messages should not be inspected or printed out in test
 cases.  Ecmascript does require specific error types though (such as
 ``TypeError``).  These can be verified by printing the ``name``
 property of an error object.
 For instance::
  try {
      null.foo = 1;
  } catch (e) {
      print(e.name);
  }
 prints::
  TypeError
 When an error is not supposed to occur in a successful test run, the
 exception message can (and should) be printed, as it makes it easier
 to resolve a failing test case.  This can be done most easily as::
  try {
      null.foo = 1;
  } catch (e) {
      print(e);
  }
 Test cases
 ----------
 Test case filenames consist of lowercase words delimited by dashes, e.g.::
  test-stmt-trycatch.js
 The first part of each test case is ``test``.  The second part indicates a
 major test category.  The test categories are not very strictly defined, and
 there is currently no tracking of specification coverage.
 Test cases starting with ``test-dev-`` are development time test cases
 which demonstrate a particular issue and may not be very well documented.
 Test cases starting with ``test-dev-bug-`` illustrate a particular
 development time bug which has usually already been fixed.
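 The naming convention can be checked mechanically.  The sketch below is
 illustrative only: ``classifyTestCase`` is a hypothetical helper, and
 allowing digits inside the words is an assumption.

```javascript
// Hypothetical helper: split a test case filename into its naming parts.
// Returns null when the name does not follow the convention.
function classifyTestCase(filename) {
    // "test" followed by one or more dash-delimited lowercase words.
    if (!/^test(-[a-z0-9]+)+\.js$/.test(filename)) {
        return null;
    }
    var parts = filename.slice(0, -3).split('-');
    return {
        category: parts[1],
        development: parts[1] === 'dev',
        bug: parts[1] === 'dev' && parts[2] === 'bug'
    };
}
```

 For ``test-stmt-trycatch.js`` this yields the category ``stmt`` with both
 flags false.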
 Duktape API test cases
 ======================
 Test case format
 ----------------
 Test case files are C files with a ``test()`` function.  The test function
 gets as its argument an already initialized ``duk_context *`` and prints
 text to ``stdout``.  The test case can assume ``duktape.h`` and common headers
 like ``stdio.h`` have been included.  There are also some predefined macros
 (like ``TEST_SAFE_CALL()`` and ``TEST_PCALL()``) to minimize duplication in
 test case code.
 Expected output is defined as for Ecmascript test cases.  There is currently
 no metadata.
 Example::
  /*===
  Hello world from Ecmascript!
  Hello world from C!
  ===*/
  void test(duk_context *ctx) {
      duk_push_string(ctx, "print('Hello world from Ecmascript!');");
      duk_eval(ctx);
      printf("Hello world from C!\n");
  }
 Test runner
 ===========
 The current test runner is a NodeJS program which handles both Ecmascript
 and API testcases.  See ``runtests/runtests.js``.
 Future work
 ===========
 * Put test cases in a directory hierarchy instead (``test/stmt/trycatch.js``),
  which may scale better (at the expense of adding hassle to e.g. grepping).
 * Keep simple input-output model but add includes.  There is a lot of
  boilerplate now for basic things like dumping descriptors.
--- a/doc/uri.txt
+++ b/doc/uri.txt
@@ -0,0 +1,60 @@
 =========================
 URI encoding and decoding
 =========================
 Specification notes
 ===================
 Reserved set / unescaped set
 ----------------------------
 The "unescaped set" for encoding and the "reserved set" for decoding always
 consist of only ASCII codepoints.  Thus comparing codepoints against the sets
 should only be necessary when processing ASCII range characters.
 When encoding, step 4.c will catch characters in the "unescaped set" and
 encode them as-is into the output.  Note that these can only be single-byte
 ASCII characters.  If we go to step 4.d, the codepoint may either be ASCII
 or non-ASCII, and will be escaped regardless.
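 The two encoding paths can be observed with the standard
 ``encodeURIComponent()`` function.  The sample characters below are chosen
 for illustration only; ``encodeSamples`` is a hypothetical helper.

```javascript
// Characters in the "unescaped set" pass through as-is (step 4.c); anything
// else, ASCII or not, is percent-escaped as UTF-8 (step 4.d).
function encodeSamples() {
    var samples = [
        'a',        // unescaped set: copied as-is
        '-',        // unescaped set: copied as-is
        ' ',        // ASCII but not in the unescaped set
        '/',        // ASCII but not in the unescaped set for encodeURIComponent
        '\u00e4',   // non-ASCII, two UTF-8 bytes
        '\u20ac'    // non-ASCII, three UTF-8 bytes
    ];
    return samples.map(function (ch) {
        return encodeURIComponent(ch);
    });
}

console.log(encodeSamples().join(' '));
// prints: a - %20 %2F %C3%A4 %E2%82%AC
```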
 When decoding percent escaped codepoints, one-byte encoded codepoints (i.e.
 ASCII) are checked in step 4.d.vi; multi-byte encoded codepoints in the BMP
 range are checked in step 4.d.vii but codepoints above BMP are not checked.
 Apparently the idea here is to ensure no characters in the reserved set are
 decoded from percent escapes even if invalid UTF-8 (non-shortest) encodings
 are allowed.  Because characters above BMP are encoded with surrogate pairs,
 the formula for surrogate pairs ensures that the codepoint cannot be below
 U+00010000 (0x10000 is added to the surrogate pair bits), and thus no check
 against the "reserved set" is needed.
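 Both points can be illustrated with the standard decoding functions.  The
 helper name ``combineSurrogates`` is hypothetical; the formula is the usual
 UTF-16 surrogate pair combination.

```javascript
// Combine a surrogate pair into a codepoint.  Because 0x10000 is always
// added, the result can never fall into the ASCII range.
function combineSurrogates(hi, lo) {
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

// A one-byte escape for a reserved character is kept escaped by decodeURI().
console.log(decodeURI('%2F'));            // prints: %2F
console.log(decodeURIComponent('%2F'));   // prints: /

// A four-byte escape (here U+10000) decodes into a surrogate pair.
var decoded = decodeURIComponent('%F0%90%80%80');
console.log(decoded.length);              // prints: 2
console.log(combineSurrogates(decoded.charCodeAt(0),
                              decoded.charCodeAt(1)).toString(16));
// prints: 10000
```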
 However, at the end of Section 15.1.3:
  RFC 3629 prohibits the decoding of invalid UTF-8 octet sequences. For
  example, the invalid sequence C0 80 must not decode into the character
  U+0000. Implementations of the Decode algorithm are required to throw a
  URIError when encountering such invalid sequences.
 Because "reserved set" / "unescaped set" always consists of only ASCII
 codepoints, the check in step 4.d.vii should not be necessary.  The UTF-8
 validity check happens in step 4.d.vii.8.
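 The required rejection of non-shortest encodings can be checked in the same
 style as the exception tests elsewhere in the documentation, by printing
 only the error name.  ``decodeOrErrorName`` is a hypothetical helper.

```javascript
// Return the decoded string, or the error name if decoding throws.
function decodeOrErrorName(input) {
    try {
        return decodeURIComponent(input);
    } catch (e) {
        return e.name;
    }
}

console.log(decodeOrErrorName('%2F'));      // valid one-byte escape, prints: /
console.log(decodeOrErrorName('%C0%80'));   // overlong U+0000, prints: URIError
console.log(decodeOrErrorName('%C0%AF'));   // overlong "/", prints: URIError
```

 Since the overlong forms are rejected by the UTF-8 validity check, a
 reserved character cannot be smuggled in through a multi-byte escape.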
 Decoding characters outside BMP
 -------------------------------
 The URI decoding algorithm requires that UTF-8 encoded codepoints consisting
 of more than 4 encoded bytes are rejected.  A 4-byte encoding carries 21 bits
 (3 + 6 + 6 + 6), so the maximum codepoint it can express is U+1FFFFF.  However, since
 the bytes must also be valid UTF-8 (step 4.d.vii.8) the highest allowed
 codepoint is actually U+10FFFF.
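 The limits can be probed with the standard ``decodeURIComponent()``
 function.  This is a small sketch; ``tryDecode`` is a hypothetical helper
 and the sample escapes are chosen for illustration.

```javascript
// Return the decoded string, or the error name if decoding throws.
function tryDecode(input) {
    try {
        return decodeURIComponent(input);
    } catch (e) {
        return e.name;
    }
}

// 3 payload bits in the lead byte plus 3 * 6 bits in continuation bytes.
var maxFourByte = (1 << 21) - 1;
console.log(maxFourByte.toString(16));              // prints: 1fffff

// U+10FFFF is the highest codepoint accepted; it decodes to a surrogate pair.
console.log(tryDecode('%F4%8F%BF%BF') === '\uDBFF\uDFFF');   // prints: true

// U+110000 and U+1FFFFF fit in four bytes but are not valid UTF-8.
console.log(tryDecode('%F4%90%80%80'));             // prints: URIError
console.log(tryDecode('%F7%BF%BF%BF'));             // prints: URIError

// A five-byte sequence is rejected outright.
console.log(tryDecode('%F8%88%80%80%80'));          // prints: URIError
```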
 It would be nice to be able to:
 * decode higher codepoints because Duktape can represent them
 * decode codepoints up to U+10FFFF without surrogate pairs
 Because the API requirements are strict, these cannot be added to the standard
 API without breaking compliance.  Custom URI encoding/decoding functions could
 provide these extended semantics.
--- a/make_full.sh
+++ b/make_full.sh
@@ -30,6 +30,8 @@ for i in \
 	doc/number_conversion.txt \
 	doc/regexp.txt \
 	doc/sorting.txt \
 	doc/uri.txt \
 	doc/testcases.txt \
 	; do
 	cp --parents $i $FULL/
 done
@@ -43,20 +45,17 @@ for i in \
 done
 for i in \
 	examples/test.c \
 	examples/cmdline/duk_cmdline.c \
 	examples/cmdline/duk_ncurses.c \
 	examples/cmdline/duk_socket.c \
 	examples/cmdline/duk_fileio.c \
-	examples/coffee/mandel.js \
+	examples/coffee/Makefile \
 	examples/coffee/hello.js \
 	examples/coffee/globals.js \
 	examples/coffee/mandel.coffee \
 	examples/coffee/hello.coffee \
 	examples/coffee/globals.coffee \
 	examples/hello/hello.c \
 	examples/Makefile.cmdline \
 	examples/Makefile.example \
 	examples/hello/hello.c \
 	; do
 	cp --parents $i $FULL/
 done