Merge pull request #547 from svaarala/regexp-literal-brace-cleanups

Cleanups for literal regexp curly brace handling
9 years ago · f936da50d3
24 changed files with 293 additions and 129 deletions
--- a/RELEASES.rst
+++ b/RELEASES.rst
@@ -1367,6 +1367,11 @@ Planned
 1.5.0 (XXXX-XX-XX)
 ------------------

+* Allow non-standard unescaped braces ('{' and '}') in regular expressions
+  when no valid quantifier can be parsed; this improves compatibility with
+  existing Javascript code which often assumes support for some non-standard
+  regexp syntax (GH-142, GH-513, GH-547)
+
 * Fix potentially memory unsafe behavior when a refcount-triggered finalizer
  function rescues an object; the memory unsafe behavior doesn't happen
  immediately which makes the cause of the unsafe behavior difficult to
--- a/config/config-options/DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER.yaml
+++ b/config/config-options/DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  For better compatibility with existing code, enable non-standard
  Array.prototype.concat() behavior for trailing non-existent elements of
--- a/config/config-options/DUK_USE_NONSTD_ARRAY_MAP_TRAILER.yaml
+++ b/config/config-options/DUK_USE_NONSTD_ARRAY_MAP_TRAILER.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  For better compatibility with existing code, enable non-standard
  Array.prototype.map() behavior for trailing non-existent elements of
--- a/config/config-options/DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT.yaml
+++ b/config/config-options/DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  For better compatibility with existing code, enable non-standard
  Array.prototype.splice() behavior when the second argument (deleteCount)
--- a/config/config-options/DUK_USE_NONSTD_FUNC_CALLER_PROPERTY.yaml
+++ b/config/config-options/DUK_USE_NONSTD_FUNC_CALLER_PROPERTY.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: false
 tags:
  - ecmascript
+  - compliance
 description: >
  Add a non-standard "caller" property to non-strict function instances
  for better compatibility with existing code.  The semantics of this
--- a/config/config-options/DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY.yaml
+++ b/config/config-options/DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: false
 tags:
  - ecmascript
+  - compliance
 description: >
  Add a non-standard "source" property to function instances.  This allows
  function toString() to print out the actual function source.  The property
--- a/config/config-options/DUK_USE_NONSTD_FUNC_STMT.yaml
+++ b/config/config-options/DUK_USE_NONSTD_FUNC_STMT.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  Enable support for function declarations outside program or function top
  level (also known as "function statements").  Such declarations are
--- a/config/config-options/DUK_USE_NONSTD_GETTER_KEY_ARGUMENT.yaml
+++ b/config/config-options/DUK_USE_NONSTD_GETTER_KEY_ARGUMENT.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  Give getter calls the accessed property name as an additional non-standard
  argument.  This allows a single getter function to be reused for multiple
--- a/config/config-options/DUK_USE_NONSTD_JSON_ESC_U2028_U2029.yaml
+++ b/config/config-options/DUK_USE_NONSTD_JSON_ESC_U2028_U2029.yaml
@@ -4,6 +4,7 @@ introduced: 1.1.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  When enabled, Duktape JSON.stringify() will escape U+2028 and U+2029 which
  is non-compliant behavior.  This is recommended to make JSON.stringify()
--- a/config/config-options/DUK_USE_NONSTD_REGEXP_BRACES.yaml
+++ b/config/config-options/DUK_USE_NONSTD_REGEXP_BRACES.yaml
@@ -1,10 +1,12 @@
 define: DUK_USE_NONSTD_REGEXP_BRACES
 feature_enables: DUK_OPT_NONSTD_REGEXP_BRACES
-introduced: 1.3.2
+introduced: 1.5.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
-  Enable support for non-standard '{' literal. Ecmascript requires
-  curly braces to be escaped, but most regex engine support them
-  when they are not used in valid quantifier. This option is recommended.
+  Enable support for non-standard '{' and '}' literals.  Ecmascript requires
+  literal curly braces to be escaped, but most Ecmascript engines support them
+  when they are not used in a valid quantifier.  This option is recommended
+  because a lot of existing code depends on non-standard literal braces.
--- a/config/config-options/DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE.yaml
+++ b/config/config-options/DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  Enable support for non-standard regexp dollar escape "\$".  This option is
  recommended because such regexps are used by existing code bases.
--- a/config/config-options/DUK_USE_NONSTD_SETTER_KEY_ARGUMENT.yaml
+++ b/config/config-options/DUK_USE_NONSTD_SETTER_KEY_ARGUMENT.yaml
@@ -4,6 +4,7 @@ introduced: 1.0.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  Give setter calls the accessed property name as an additional non-standard
  argument.  This allows a single setter function to be reused for multiple
--- a/config/config-options/DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT.yaml
+++ b/config/config-options/DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT.yaml
@@ -4,6 +4,7 @@ introduced: 1.2.0
 default: true
 tags:
  - ecmascript
+  - compliance
 description: >
  Allow 32-bit codepoints in String.fromCharCode().  This is non-compliant
  (the E5.1 specification has a ToUint16() coercion for the codepoints) but
--- a/config/examples/compliance.yaml
+++ b/config/examples/compliance.yaml
@@ -0,0 +1,14 @@
+# Enable compliant behavior; defaults favor "real world" compatibility.
+
+DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER: false
+DUK_USE_NONSTD_ARRAY_MAP_TRAILER: false
+DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT: false
+DUK_USE_NONSTD_FUNC_CALLER_PROPERTY: false
+DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY: false
+DUK_USE_NONSTD_FUNC_STMT: false
+DUK_USE_NONSTD_GETTER_KEY_ARGUMENT: false
+DUK_USE_NONSTD_JSON_ESC_U2028_U2029: false
+DUK_USE_NONSTD_REGEXP_BRACES: false
+DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE: false
+DUK_USE_NONSTD_SETTER_KEY_ARGUMENT: false
+DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT: false
--- a/config/tags.yaml
+++ b/config/tags.yaml
@@ -9,6 +9,9 @@ ecmascript:
 ecmascript6:
  title: Ecmascript Edition 6 (ES6) feature options

+compliance:
+  title: Compliance related options
+
 debugger:
  title: Debugger options

--- a/doc/emscripten-status.rst
+++ b/doc/emscripten-status.rst
@@ -15,7 +15,10 @@ Tweaks needed:

 * ``--memory-init-file 0``: don't use an external memory file.

-* Some RegExps need to be fixed, see ``util/fix_emscripten.py``.
+* Emscripten expects a function's ``.toString()`` output to match a certain
+  pattern which is not guaranteed (and which Duktape doesn't match), see
+  ``util/fix_emscripten.py``.  Since Duktape 1.5.0, fixes for non-standard
+  unescaped curly braces in regexps are no longer needed.

 Normally this suffices.  If you're running Duktape with a small amount of
 memory (e.g. when running the Duktape command line tool with the ``-r``
--- a/doc/regexp.rst
+++ b/doc/regexp.rst
@@ -380,6 +380,20 @@ Empty quantifier bodies in complex quantifiers
  This problem could also be fixed for complex quantifiers, but the
  fix is not as trivial as for simple quantifiers.

+Non-standard RegExp syntax in existing code
+:::::::::::::::::::::::::::::::::::::::::::
+
+Some Ecmascript code bases depend on non-standard RegExp syntax, such as
+using literal braces without escaping::
+
+    /{(\d+)}/    non-standard
+    /\{(\d+)\}/  standard
+
+Duktape's regexp engine supports a few non-standard expressions to reduce
+issues with existing code.  A longer term, more flexible solution is to
+allow the built-in minimal engine to be replaced with an external engine
+with wider regexp syntax, better performance, etc.
+
 Miscellaneous
 :::::::::::::

--- a/src/duk_lexer.c
+++ b/src/duk_lexer.c
@@ -179,7 +179,7 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
 	duk_ucodepoint_t x;
 	duk_small_uint_t contlen;
 	const duk_uint8_t *p, *p_end;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 	duk_ucodepoint_t mincp;
 #endif
 	duk_int_t input_line;
@@ -243,21 +243,21 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
 		} else if (x < 0xe0UL) {
 			/* 110x xxxx   10xx xxxx  */
 			contlen = 1;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 			mincp = 0x80UL;
 #endif
 			x = x & 0x1fUL;
 		} else if (x < 0xf0UL) {
 			/* 1110 xxxx   10xx xxxx   10xx xxxx */
 			contlen = 2;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 			mincp = 0x800UL;
 #endif
 			x = x & 0x0fUL;
 		} else if (x < 0xf8UL) {
 			/* 1111 0xxx   10xx xxxx   10xx xxxx   10xx xxxx */
 			contlen = 3;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 			mincp = 0x10000UL;
 #endif
 			x = x & 0x07UL;
@@ -288,7 +288,7 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
 		if (x > 0x10ffffUL) {
 			goto error_encoding;
 		}
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 		if (x < mincp || (x >= 0xd800UL && x <= 0xdfffUL) || x == 0xfffeUL) {
 			goto error_encoding;
 		}
@@ -352,7 +352,7 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
 	duk_small_uint_t len;
 	duk_small_uint_t i;
 	const duk_uint8_t *p;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 	duk_ucodepoint_t mincp;
 #endif
 	duk_size_t input_offset;
@@ -407,21 +407,21 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
 	} else if (x < 0xe0UL) {
 		/* 110x xxxx   10xx xxxx  */
 		len = 2;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 		mincp = 0x80UL;
 #endif
 		x = x & 0x1fUL;
 	} else if (x < 0xf0UL) {
 		/* 1110 xxxx   10xx xxxx   10xx xxxx */
 		len = 3;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 		mincp = 0x800UL;
 #endif
 		x = x & 0x0fUL;
 	} else if (x < 0xf8UL) {
 		/* 1111 0xxx   10xx xxxx   10xx xxxx   10xx xxxx */
 		len = 4;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 		mincp = 0x10000UL;
 #endif
 		x = x & 0x07UL;
@@ -452,7 +452,7 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
 	if (x > 0x10ffffUL) {
 		goto error_encoding;
 	}
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
 	if (x < mincp || (x >= 0xd800UL && x <= 0xdfffUL) || x == 0xfffeUL) {
 		goto error_encoding;
 	}
@@ -564,7 +564,7 @@ DUK_INTERNAL void duk_lexer_initctx(duk_lexer_ctx *lex_ctx) {
 	DUK_ASSERT(lex_ctx != NULL);

 	DUK_MEMZERO(lex_ctx, sizeof(*lex_ctx));
-#ifdef DUK_USE_EXPLICIT_NULL_INIT
+#if defined(DUK_USE_EXPLICIT_NULL_INIT)
 #if defined(DUK_USE_LEXER_SLIDING_WINDOW)
 	lex_ctx->window = NULL;
 #endif
@@ -814,7 +814,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
 			}
 			goto restart_lineupdate;
 		} else if (regexp_mode) {
-#ifdef DUK_USE_REGEXP_SUPPORT
+#if defined(DUK_USE_REGEXP_SUPPORT)
 			/*
 			 *  "/" followed by something in regexp mode.  See E5 Section 7.8.5.
 			 *
@@ -1169,7 +1169,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
 						/* Zero escape (also allowed in non-strict mode) */
 						ch = 0;
 						/* adv = 2 - 1 default OK */
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
 					} else if (strict_mode) {
 						/* No other escape beginning with a digit in strict mode */
 						DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
@@ -1411,7 +1411,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
 			DUK__ADVANCECHARS(lex_ctx, 2);
 			int_only = 1;
 			allow_hex = 1;
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
 		} else if (!strict_mode && x == '0' && DUK__ISDIGIT(y)) {
 			/* Note: if DecimalLiteral starts with a '0', it can only be
 			 * followed by a period or an exponent indicator which starts
@@ -1471,7 +1471,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
 		            DUK_S2N_FLAG_ALLOW_FRAC |
 		            DUK_S2N_FLAG_ALLOW_NAKED_FRAC |
 		            DUK_S2N_FLAG_ALLOW_EMPTY_FRAC |
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
 		            (strict_mode ? 0 : DUK_S2N_FLAG_ALLOW_AUTO_OCT_INT) |
 #endif
 		            DUK_S2N_FLAG_ALLOW_AUTO_HEX_INT;
@@ -1528,7 +1528,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
 	}
 }

-#ifdef DUK_USE_REGEXP_SUPPORT
+#if defined(DUK_USE_REGEXP_SUPPORT)

 /*
 *  Parse a RegExp token.  The grammar is described in E5 Section 15.10.
@@ -1609,31 +1609,31 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
 		duk_uint_fast32_t val1 = 0;
 		duk_uint_fast32_t val2 = DUK_RE_QUANTIFIER_INFINITE;
 		duk_small_int_t digits = 0;
+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
+		duk_lexer_point lex_pt;
+#endif

+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
 		/*
-		 *  Store lexer position, restoring if quantifier is invalid
+		 *  Store lexer position, restoring if quantifier is invalid.
 		 */
-
-#ifdef DUK_USE_NONSTD_REGEXP_BRACES
-		duk_lexer_point lex_pt;
 		DUK_LEXER_GETPOINT(lex_ctx, &lex_pt);
 #endif

 		for (;;) {
-			DUK__ADVANCECHARS(lex_ctx, 1); /* eat '{' on entry */
+			DUK__ADVANCECHARS(lex_ctx, 1);  /* eat '{' on entry */
 			x = DUK__L0();
 			if (DUK__ISDIGIT(x)) {
 				digits++;
 				val1 = val1 * 10 + (duk_uint_fast32_t) duk__hexval(lex_ctx, x);
 			} else if (x == ',') {
-				if (digits >= DUK__MAX_RE_QUANT_DIGITS) {
-					DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
-					          "invalid regexp quantifier (too many digits)");
+				if (digits > DUK__MAX_RE_QUANT_DIGITS) {
+					goto invalid_quantifier;
 				}
 				if (val2 != DUK_RE_QUANTIFIER_INFINITE) {
 					goto invalid_quantifier;
 				}
-				if ( DUK__L1() == '}') {
+				if (DUK__L1() == '}') {
 					/* form: { DecimalDigits , }, val1 = min count */
 					if (digits == 0) {
 						goto invalid_quantifier;
@@ -1647,9 +1647,8 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
 				val1 = 0;
 				digits = 0;  /* not strictly necessary because of lookahead '}' above */
 			} else if (x == '}') {
-				if (digits >= DUK__MAX_RE_QUANT_DIGITS) {
-					DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
-						"invalid regexp quantifier (too many digits)");
+				if (digits > DUK__MAX_RE_QUANT_DIGITS) {
+					goto invalid_quantifier;
 				}
 				if (digits == 0) {
 					goto invalid_quantifier;
@@ -1677,10 +1676,11 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
 		}
 		advtok = DUK__ADVTOK(0, DUK_RETOK_QUANTIFIER);
 		break;
-invalid_quantifier:
-#ifdef DUK_USE_NONSTD_REGEXP_BRACES
-
-		/* Failed to match the quantifier, restore lexer */
+ invalid_quantifier:
+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
+		/* Failed to match the quantifier, restore lexer and parse
+		 * opening brace as a literal.
+		 */
 		DUK_LEXER_SETPOINT(lex_ctx, &lex_pt);
 		advtok = DUK__ADVTOK(1, DUK_RETOK_ATOM_CHAR);
 		out_token->num = '{';
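
The fallback rule in the hunks above can be modelled outside the lexer. The following JavaScript is only a rough sketch of that rule, not Duktape's implementation: the function name `lexLeftBrace`, the `allowLiteralBraces` parameter (standing in for `DUK_USE_NONSTD_REGEXP_BRACES`), the token object shape, and the error message are all hypothetical. The 9-digit limit is taken from the test comments in this change. Checks that happen outside the quantifier lexing shown here (for example min/max ordering) are not modelled.

```javascript
// Simplified model only; names and structure are illustrative, not Duktape's.
const MAX_QUANT_DIGITS = 9;  // digit limit stated in the test comments

// Classify the token starting at pattern[pos], which must be '{'.
function lexLeftBrace(pattern, pos, allowLiteralBraces) {
  // Accepted quantifier forms: {n}  {n,}  {n,m}
  const m = /^\{(\d+)(,(\d*))?\}/.exec(pattern.slice(pos));
  const tooLong = (s) => s !== undefined && s.length > MAX_QUANT_DIGITS;

  if (m !== null && !tooLong(m[1]) && !tooLong(m[3])) {
    const min = Number(m[1]);
    let max;
    if (m[2] === undefined) {
      max = min;               // {n}
    } else if (m[3] === '') {
      max = Infinity;          // {n,}
    } else {
      max = Number(m[3]);      // {n,m}
    }
    return { type: 'quantifier', min: min, max: max, length: m[0].length };
  }

  if (!allowLiteralBraces) {
    // Hypothetical message; models the build without literal brace support.
    throw new SyntaxError('invalid regexp quantifier');
  }

  // Rewind to the brace and emit it as an ordinary character.
  return { type: 'char', value: '{', length: 1 };
}

console.log(lexLeftBrace('a{1,2}', 1, true).type);    // quantifier
console.log(lexLeftBrace('a{1,2,3}', 1, true).type);  // char
```

In this model a rejected quantifier consumes only the `{` itself, so the characters after it are lexed again as ordinary pattern characters, which mirrors the restore-and-reparse step in the C code.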
--- a/tests/ecmascript/test-dev-regexp-quantifier-digits.js
+++ b/tests/ecmascript/test-dev-regexp-quantifier-digits.js
@@ -0,0 +1,63 @@
+/*
+ *  Duktape has an internal digit limit (9 digits) for regexp quantifier
+ *  min/max counts.
+ */
+
+/*---
+{
+    "custom": true
+}
+---*/
+
+/*===
+["xxx"]
+null
+===*/
+
+// 8 digits
+try {
+    print(eval("JSON.stringify(/x{3,99999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
+try {
+    print(eval("JSON.stringify(/x{88888888,99999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
+
+/*===
+["xxx"]
+null
+===*/
+
+// 9 digits, still accepted
+try {
+    print(eval("JSON.stringify(/x{3,999999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
+try {
+    print(eval("JSON.stringify(/x{333333333,999999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
+
+/*===
+null
+null
+===*/
+
+// 10 digits: exceeds the digit limit.  Without DUK_USE_NONSTD_REGEXP_BRACES
+// this is a SyntaxError; with it the quantifier is rejected and the curly
+// brace is treated as a literal, so the match fails (null).
+try {
+    print(eval("JSON.stringify(/x{3,9999999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
+try {
+    print(eval("JSON.stringify(/x{3333333333,9999999999}/.exec('xxx'))"));
+} catch (e) {
+    print(e);
+}
--- a/tests/ecmascript/test-regexp-non-std-brace.js
+++ b/tests/ecmascript/test-regexp-non-std-brace.js
@@ -1,75 +0,0 @@
-var t;
-
-/*===
-a{abc}
-a{1b}
-a{2,b}
-===*/
-
-// Any non-valid character cancels quantifier parsing
-
-t = /a{.*}/.exec("aa{abc}");
-print(t[0]);
-t = /a{1.}/.exec("aa{1b}");
-print(t[0]);
-t = /a{2,.}/.exec("aa{2,b}");
-print(t[0]);
-
-/*===
-a{abc}
-===*/
-
-// Closing brace is allowed
-t = /a\{.*}/.exec("aa{abc}");
-print(t[0]);
-
-/*===
-a{1}
-a{1,2}
-===*/
-
-// Valid quantifier but for the closing brace
-t = /a{1\}/.exec("aa{1}");
-print(t[0]);
-t = /a{1,2\}/.exec("aa{1,2}");
-print(t[0]);
-
-/*===
-{1111111111111111111111111
-===*/
-
-// Do not fail on digits before , or }
-t = /{1111111111111111111111111/.exec('{1111111111111111111111111');
-print(t[0]);
-
-/*===
-a{}
-a{,}
-a{1,2,3}
-===*/
-
-//On parsing failure, treat as a brace
-
-t = /a{}/.exec('a{}');
-print(t[0]);
-
-t = /a{,}/.exec('a{,}');
-print(t[0]);
-
-t = /a{1,2,3}/.exec('a{1,2,3}');
-print(t[0]);
-
-
-/*===
-SyntaxError
-===*/
-
-// Current implementation does not allow all types of error
-
-// Too many numbers
-try {
-    eval("/{1111111111111111111111111}/.exec('foo');");
-    print("no exception");
-} catch (e) {
-    print(e.name);
-}
--- a/tests/ecmascript/test-regexp-nonstandard-brace.js
+++ b/tests/ecmascript/test-regexp-nonstandard-brace.js
@@ -0,0 +1,105 @@
+/*
+ *  Ecmascript regexp pattern character production does not allow unescaped
+ *  literal curly braces in any position, but many Ecmascript regexp engines
+ *  allow them when the meaning is unambiguous.  Since version 1.5.0 Duktape
+ *  also allows unescaped literal curly braces in regexps.
+ */
+
+// Behavior is custom because e.g. quantifier digit limits are Duktape specific.
+/*---
+{
+    "custom": true
+}
+---*/
+
+var t;
+
+/*===
+a{abc}
+a{1b}
+a{2,b}
+===*/
+
+// Any invalid character cancels quantifier parsing, and causes the left
+// curly brace to be treated as a literal (i.e. same as /\{/).
+
+t = /a{.*}/.exec("aa{abc}");
+print(t[0]);
+t = /a{1.}/.exec("aa{1b}");
+print(t[0]);
+t = /a{2,.}/.exec("aa{2,b}");
+print(t[0]);
+
+/*===
+a{abc}
+===*/
+
+// Unescaped right (closing) brace is allowed anywhere outside a quantifier
+// because it's unambiguous.
+
+t = /a\{.*}/.exec("aa{abc}");
+print(t[0]);
+
+/*===
+a{1}
+a{1,2}
+===*/
+
+// Valid quantifier except for the closing brace: quantifier parsing is
+// cancelled and left curly brace is treated as a literal.
+
+t = /a{1\}/.exec("aa{1}");
+print(t[0]);
+t = /a{1,2\}/.exec("aa{1,2}");
+print(t[0]);
+
+/*===
+{1111111111111111111111111
+===*/
+
+// Do not fail on a long digit sequence which is not terminated by , or }.
+
+t = /{1111111111111111111111111/.exec('{1111111111111111111111111');
+print(t[0]);
+
+/*===
+a{}
+a{,}
+a{1,2,3}
+===*/
+
+// On any quantifier parsing failure, treat as a literal brace.
+
+t = /a{}/.exec('a{}');
+print(t[0]);
+
+t = /a{,}/.exec('a{,}');
+print(t[0]);
+
+t = /a{1,2,3}/.exec('a{1,2,3}');
+print(t[0]);
+
+/*===
+{1111111111111111111111111,}
+{1111111111111111111111111,2222222222222222222222222222}
+{1111,1111111111}
+xxxxxxxxxxx
+===*/
+
+// Duktape has an internal limit on the number of digits in a quantifier
+// min/max value (currently 9).  When the limit is exceeded the quantifier
+// is rejected and the left curly brace is parsed as a literal, as in the
+// first three cases below.
+
+t = /{1111111111111111111111111,}/.exec('{1111111111111111111111111,}foo');
+print(t[0]);
+
+t = /{1111111111111111111111111,2222222222222222222222222222}/.exec('{1111111111111111111111111,2222222222222222222222222222}');
+print(t[0]);
+
+t = /{1111,1111111111}/.exec('{1111,1111111111}foo');
+print(t[0]);
+
+// Here the max value has exactly 9 digits so it's treated as a valid quantifier.
+t = /x{11,111111111}/.exec('xxxxxxxxxxx');
+print(t[0]);
--- a/util/fix_emscripten.py
+++ b/util/fix_emscripten.py
@@ -12,28 +12,33 @@ replacements = {
 	# RegExp fix, now fixed in the Emscripten repository and should no longer
 	# be necessary.
 	# https://github.com/kripken/emscripten/commit/277ac5239057721ebe3c6e7813dc478eeab2cea0
-	r"""if (/<?{ ?[^}]* ?}>?/.test(type)) return true""":
-		r"""if (/<?\{ ?[^}]* ?\}>?/.test(type)) return true""",
+	# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+	#r"""if (/<?{ ?[^}]* ?}>?/.test(type)) return true""":
+	#	r"""if (/<?\{ ?[^}]* ?\}>?/.test(type)) return true""",

 	# GH-11: Another RegExp escaping fix.
-	r"""var sourceRegex = /^function\s\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
-		r"""var sourceRegex = /^function\s\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
-	r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
-		r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
+	# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+	#r"""var sourceRegex = /^function\s\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
+	#	r"""var sourceRegex = /^function\s\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
+	#r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
+	#	r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",

 	# GH-11: Attempt to parse a function's toString() output with a RegExp.
 	# The RegExp makes invalid assumptions and won't parse Duktape's function
 	# toString output ("function empty() {/* source code*/)}").
 	# This stopgap will prevent a 'TypeError: invalid base reference for property read'
 	# and allows at least a hello world to run.
+	# Still needed with Duktape 1.5.0 because the issue is what Emscripten
+	# expects from .toString() of a function.
 	r"""var parsed = jsfunc.toString().match(sourceRegex).slice(1);""":
 		r"""var parsed = (jsfunc.toString().match(sourceRegex) || []).slice(1);""",
 	r"""jsfunc.toString().match(sourceRegex).slice(1);""":
 		r"""(jsfunc.toString().match(sourceRegex) || []).slice(1);""",

 	# Newer emscripten has this at least with -O2
-	r"""/^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/""":
-		r"""/^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/""",
+	# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+	#r"""/^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/""":
+	#	r"""/^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/""",
 }

 repl_keys = replacements.keys()
--- a/website/guide/compatibility.html
+++ b/website/guide/compatibility.html
@@ -59,7 +59,8 @@ Javascript.  There are no known issues.</p>
 <p><a href="https://github.com/Microsoft/TypeScript/">TypeScript</a>
 compiles to Javascript.  There are no known issues with compiling TypeScript
 using the Microsoft TypeScript compiler (in the ES5/CommonJS mode) and
-running the resulting Javascript using Duktape.</p>
+running the resulting Javascript using Duktape.  It's also possible to
+<a href="http://wiki.duktape.org/CompatibilityTypeScript.html">run the TypeScript compiler with Duktape</a>.</p>

 <h2 id="compatibility-underscorejs">Underscore.js</h2>

@@ -93,8 +94,10 @@ support yet, no "heap object" can be provided.</p>

 <p><a href="https://github.com/kripken/emscripten">Emscripten</a> compiles
 C/C++ into Javascript.  Duktape is currently Emscripten compatible except
-for a few RegExp issues, see:
+for an assumption about the format of a function's <code>toString()</code>
+output, see:
 <a href="https://github.com/svaarala/duktape/blob/master/util/fix_emscripten.py">fix_emscripten.py</a>.
+Since Duktape 1.5.0, fixes for non-standard unescaped curly braces in regexps are no longer needed.
 </p>

 <p>As of Duktape 1.3 there is support for Khronos/ES6 TypedArray which improves
--- a/website/guide/custombehavior.html
+++ b/website/guide/custombehavior.html
@@ -93,16 +93,27 @@ binding in any of the points A, B, or C.</p>

 <h2>RegExp leniency</h2>

-<p>Although not allowed by E5.1, the following escape is allowed in RegExp
-syntax:</p>
+<p>Most Ecmascript engines support more syntax than guaranteed by the
+<a href="http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.1">Ecmascript
+E5.1 specification (Section 15.10.1 Patterns)</a>.  As a result there's quite
+a lot of code that won't work with strict Ecmascript regexp syntax.  Duktape also
+allows some non-standard syntax to better support existing code (you can turn
+this non-standard behavior off using config options if you prefer).</p>
+
+<p>Curly braces (<code>{</code> and <code>}</code>) are treated as literals
+when they don't parse as a valid quantifier:</p>
+
 <pre>
-  /\$/       /* matches dollar literally, non-standard */
-  /\u0024/   /* same, standard */
+  /{(\d+)}/    /* left curly, digits, right curly; non-standard */
+  /\{(\d+)\}/  /* same, standard */
 </pre>

-<p>This escape occurs in real world code so it is allowed.  (More leniency
-will be added in future versions to deal with real world RegExps; dollar
-escapes are not the only issue.)</p>
+<p>Escaping a dollar sign as <code>\$</code> is not allowed by E5.1, but
+is accepted by Duktape:</p>
+<pre>
+  /\$/       /* matches dollar literally; non-standard */
+  /\u0024/   /* same, standard */
+</pre>

 <h2>Array.prototype.splice() when deleteCount not given</h2>