Regexp doc RST fixes

10 years ago · 3c533bd30e
1 changed file with 32 additions and 21 deletions
--- a/doc/regexp.rst
+++ b/doc/regexp.rst
@@ -12,18 +12,26 @@ challenging.  See the following three excellent articles by Russ Cox
 for background:

 * http://swtch.com/~rsc/regexp/regexp1.html
+
 * http://swtch.com/~rsc/regexp/regexp2.html
+
 * http://swtch.com/~rsc/regexp/regexp3.html

 Ecmascript regular expression set is described in E5 Section 15.10,
 and includes:

 * Disjunction
+
 * Quantifiers, counted repetition and both greedy and minimal variants
+
 * Assertions, negative and positive lookaheads
+
 * Character classes, normal and inverted
+
 * Captures and backreferences
+
 * Unicode character support
+
 * Unanchored matching (only) (e.g. ``/x/.exec('fooxfoo')`` matches ``'x'``)

 Counted repetition quantifiers, assertions, captures, and backreferences
@@ -36,10 +44,14 @@ and compactness.  More generally, the following prioritized requirements
 should be fulfilled:

 #. Ecmascript compatibility
+
 #. Compactness
+
 #. Avoiding deep or unbounded C recursion, and providing recursion and
   execution time sanity limits
+
 #. Regexp execution performance
+
 #. Regexp compilation performance

 Further, it should be possible to leave out regexp support during
@@ -411,11 +423,11 @@ which is useful for encoding bytecode jump distances.

 The compiled regexp begins with a header, containing:

- * unsigned integer: flags, any combination of ``DUK_RE_FLAG_*``
+* unsigned integer: flags, any combination of ``DUK_RE_FLAG_*``

- * unsigned integer: ``nsaved`` (number of save slots), which should be
-   ``2n+2`` where ``n`` equals ``NCapturingParens`` (number of capture
-   groups)
+* unsigned integer: ``nsaved`` (number of save slots), which should be
+  ``2n+2`` where ``n`` equals ``NCapturingParens`` (number of capture
+  groups)

 Regexp body bytecode then follows.  Each instruction consists of an opcode
 value (``DUK_REOP_*``) (encoded as an unsigned integer) followed by a
@@ -598,17 +610,17 @@ will be one byte shorter than ``len2``, but ``len2`` will be correct.
 For instance, if the code block in the second example had been 1022 bytes
 long:

- * The first offset ``L1 - L2 - 1`` would be -1023 which is converted to
-   the unsigned value ``2*1023+1 = 2047 = 0x7ff``.  This encodes to two
-   UTF-8 bytes, i.e. ``len1 = 2``.
+* The first offset ``L1 - L2 - 1`` would be -1023 which is converted to
+  the unsigned value ``2*1023+1 = 2047 = 0x7ff``.  This encodes to two
+  UTF-8 bytes, i.e. ``len1 = 2``.

- * The second offset ``L1 - L2 - 1 - 2`` would be -1025 which is converted
-   to the unsigned value ``2*1025+1 = 2051 = 0x803``.  This encodes to
-   *three* UTF-8 bytes, i.e. ``len2 = 3``.
+* The second offset ``L1 - L2 - 1 - 2`` would be -1025 which is converted
+  to the unsigned value ``2*1025+1 = 2051 = 0x803``.  This encodes to
+  *three* UTF-8 bytes, i.e. ``len2 = 3``.

- * The final skip offset ``L1 - L2 - 1 - 3`` is -1026, which converts to
-   the unsigned value ``2*1026+1 = 2053 = 0x805``.  This again encodes to
-   three UTF-8 bytes, and is thus "self consistent".
+* The final skip offset ``L1 - L2 - 1 - 3`` is -1026, which converts to
+  the unsigned value ``2*1026+1 = 2053 = 0x805``.  This again encodes to
+  three UTF-8 bytes, and is thus "self consistent".

 This could also be solved into closed form directly.

@@ -753,15 +765,15 @@ it during execution.
 During regexp execution, regexp flags are kept in the regexp matching
 context, and affect opcode execution as follows:

- * global (``/g``): does not affect regexp execution, only the behavior of
-   ``RegExp.prototype.exec()`` and ``RegExp.prototype.toString()``.
+* global (``/g``): does not affect regexp execution, only the behavior of
+  ``RegExp.prototype.exec()`` and ``RegExp.prototype.toString()``.

- * ignoreCase (``/i``): affects all opcodes which match characters or
-   character ranges, through the ``Canonicalize`` operation defined in
-   E5 Section 15.10.2.8.  It also affects ``RegExp.prototype.toString()``.
+* ignoreCase (``/i``): affects all opcodes which match characters or
+  character ranges, through the ``Canonicalize`` operation defined in
+  E5 Section 15.10.2.8.  It also affects ``RegExp.prototype.toString()``.

- * multiline (``/m``): affects the start and end assertion opcodes
-   (``^`` and ``$``).  It also affects ``RegExp.prototype.toString()``.
+* multiline (``/m``): affects the start and end assertion opcodes
+  (``^`` and ``$``).  It also affects ``RegExp.prototype.toString()``.

 A bytecode opcode for matching a string instead of an individual character
 seems useful at first glance.  The compiler could join successive
@@ -1201,4 +1213,3 @@ Executor

 * Optimized primitive for testing a regexp (match without captures) would be
  easy by just skipping 'save' instructions but would waste space.
-