
Merge pull request #547 from svaarala/regexp-literal-brace-cleanups

Cleanups for literal regexp curly brace handling
pull/548/head
Sami Vaarala 9 years ago
parent
commit
f936da50d3
  1. RELEASES.rst (5 lines changed)
  2. config/config-options/DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER.yaml (1)
  3. config/config-options/DUK_USE_NONSTD_ARRAY_MAP_TRAILER.yaml (1)
  4. config/config-options/DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT.yaml (1)
  5. config/config-options/DUK_USE_NONSTD_FUNC_CALLER_PROPERTY.yaml (1)
  6. config/config-options/DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY.yaml (1)
  7. config/config-options/DUK_USE_NONSTD_FUNC_STMT.yaml (1)
  8. config/config-options/DUK_USE_NONSTD_GETTER_KEY_ARGUMENT.yaml (1)
  9. config/config-options/DUK_USE_NONSTD_JSON_ESC_U2028_U2029.yaml (1)
  10. config/config-options/DUK_USE_NONSTD_REGEXP_BRACES.yaml (10)
  11. config/config-options/DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE.yaml (1)
  12. config/config-options/DUK_USE_NONSTD_SETTER_KEY_ARGUMENT.yaml (1)
  13. config/config-options/DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT.yaml (1)
  14. config/examples/compliance.yaml (14)
  15. config/tags.yaml (3)
  16. doc/emscripten-status.rst (5)
  17. doc/regexp.rst (14)
  18. src/duk_lexer.c (64)
  19. tests/ecmascript/test-dev-regexp-quantifier-digits.js (63)
  20. tests/ecmascript/test-regexp-non-std-brace.js (75)
  21. tests/ecmascript/test-regexp-nonstandard-brace.js (105)
  22. util/fix_emscripten.py (21)
  23. website/guide/compatibility.html (7)
  24. website/guide/custombehavior.html (25)

5
RELEASES.rst

@ -1367,6 +1367,11 @@ Planned
1.5.0 (XXXX-XX-XX)
------------------
* Allow non-standard unescaped braces ('{' and '}') in regular expressions
when no valid quantifier can be parsed; this improves compatibility with
existing Javascript code which often assumes support for some non-standard
regexp expressions (GH-142, GH-513, GH-547)
* Fix potentially memory unsafe behavior when a refcount-triggered finalizer
function rescues an object; the memory unsafe behavior doesn't happen
immediately which makes the cause of the unsafe behavior difficult to

1
config/config-options/DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
For better compatibility with existing code, enable non-standard
Array.prototype.concat() behavior for trailing non-existent elements of

1
config/config-options/DUK_USE_NONSTD_ARRAY_MAP_TRAILER.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
For better compatibility with existing code, enable non-standard
Array.prototype.map() behavior for trailing non-existent elements of

1
config/config-options/DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
For better compatibility with existing code, enable non-standard
Array.prototype.splice() behavior when the second argument (deleteCount)

1
config/config-options/DUK_USE_NONSTD_FUNC_CALLER_PROPERTY.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: false
tags:
- ecmascript
- compliance
description: >
Add a non-standard "caller" property to non-strict function instances
for better compatibility with existing code. The semantics of this

1
config/config-options/DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: false
tags:
- ecmascript
- compliance
description: >
Add a non-standard "source" property to function instances. This allows
function toString() to print out the actual function source. The property

1
config/config-options/DUK_USE_NONSTD_FUNC_STMT.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
Enable support for function declarations outside program or function top
level (also known as "function statements"). Such declarations are

1
config/config-options/DUK_USE_NONSTD_GETTER_KEY_ARGUMENT.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
Give getter calls the accessed property name as an additional non-standard
argument. This allows a single getter function to be reused for multiple

1
config/config-options/DUK_USE_NONSTD_JSON_ESC_U2028_U2029.yaml

@ -4,6 +4,7 @@ introduced: 1.1.0
default: true
tags:
- ecmascript
- compliance
description: >
When enabled, Duktape JSON.stringify() will escape U+2028 and U+2029 which
is non-compliant behavior. This is recommended to make JSON.stringify()

10
config/config-options/DUK_USE_NONSTD_REGEXP_BRACES.yaml

@ -1,10 +1,12 @@
 define: DUK_USE_NONSTD_REGEXP_BRACES
 feature_enables: DUK_OPT_NONSTD_REGEXP_BRACES
-introduced: 1.3.2
+introduced: 1.5.0
 default: true
 tags:
 - ecmascript
+- compliance
 description: >
-Enable support for non-standard '{' literal. Ecmascript requires
-curly braces to be escaped, but most regex engine support them
-when they are not used in valid quantifier. This option is recommended.
+Enable support for non-standard '{' and '}' literals. Ecmascript requires
+literal curly braces to be escaped, but most Ecmascript engines support them
+when they are not used in valid quantifier. This option is recommended
+because a lot of existing code depends on non-standard literal braces.

1
config/config-options/DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
Enable support for non-standard regexp dollar escape "\$". This option is
recommended because such regexps are used by existing code bases.

1
config/config-options/DUK_USE_NONSTD_SETTER_KEY_ARGUMENT.yaml

@ -4,6 +4,7 @@ introduced: 1.0.0
default: true
tags:
- ecmascript
- compliance
description: >
Give setter calls the accessed property name as an additional non-standard
argument. This allows a single setter function to be reused for multiple

1
config/config-options/DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT.yaml

@ -4,6 +4,7 @@ introduced: 1.2.0
default: true
tags:
- ecmascript
- compliance
description: >
Allow 32-bit codepoints in String.fromCharCode(). This is non-compliant
(the E5.1 specification has a ToUint16() coercion for the codepoints) but

14
config/examples/compliance.yaml

@ -0,0 +1,14 @@
# Enable compliant behavior, defaults favor "real world" compatibility.
DUK_USE_NONSTD_ARRAY_CONCAT_TRAILER: false
DUK_USE_NONSTD_ARRAY_MAP_TRAILER: false
DUK_USE_NONSTD_ARRAY_SPLICE_DELCOUNT: false
DUK_USE_NONSTD_FUNC_CALLER_PROPERTY: false
DUK_USE_NONSTD_FUNC_SOURCE_PROPERTY: false
DUK_USE_NONSTD_FUNC_STMT: false
DUK_USE_NONSTD_GETTER_KEY_ARGUMENT: false
DUK_USE_NONSTD_JSON_ESC_U2028_U2029: false
DUK_USE_NONSTD_REGEXP_BRACES: false
DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE: false
DUK_USE_NONSTD_SETTER_KEY_ARGUMENT: false
DUK_USE_NONSTD_STRING_FROMCHARCODE_32BIT: false

3
config/tags.yaml

@ -9,6 +9,9 @@ ecmascript:
ecmascript6:
title: Ecmascript Edition 6 (ES6) feature options
compliance:
title: Compliance related options
debugger:
title: Debugger options

5
doc/emscripten-status.rst

@ -15,7 +15,10 @@ Tweaks needed:
* ``--memory-init-file 0``: don't use an external memory file.
-* Some RegExps need to be fixed, see ``util/fix_emscripten.py``.
+* Emscripten expects a function's ``.toString()`` to match a certain
+pattern which is not guaranteed (and Duktape doesn't match), see
+``util/fix_emscripten.py``. Since Duktape 1.5.0 non-standard regexp
+fixes for unescaped curly braces are no longer needed.
Normally this suffices. If you're running Duktape with a small amount of
memory (e.g. when running the Duktape command line tool with the ``-r``

14
doc/regexp.rst

@ -380,6 +380,20 @@ Empty quantifier bodies in complex quantifiers
This problem could also be fixed for complex quantifiers, but the
fix is not as trivial as for simple quantifiers.
Non-standard RegExp syntax in existing code
:::::::::::::::::::::::::::::::::::::::::::
Some Ecmascript code bases depend on non-standard RegExp syntax, such as
using literal braces without escaping::
/{(\d+)}/ non-standard
/\{(\d+)\}/ standard
Duktape's regexp engine supports a few non-standard expressions to reduce
issues with existing code. A longer term, more flexible solution is to
allow the built-in minimal engine to be replaced with an external engine
with wider regexp syntax, better performance, etc.
Miscellaneous
:::::::::::::
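The non-standard and standard spellings added to doc/regexp.rst in the hunk above can be compared directly. The following is a minimal sketch, assuming an engine that accepts unescaped literal braces (Duktape 1.5.0 or later with DUK_USE_NONSTD_REGEXP_BRACES enabled, or most other Ecmascript engines); the variable names and the input string are illustrative only.

```javascript
// Non-standard: unescaped braces that do not form a valid quantifier.
var loose = /{(\d+)}/.exec('id{42}');

// Standard E5.1 spelling of the same pattern.
var strict = /\{(\d+)\}/.exec('id{42}');

// Both spellings match '{42}' and capture '42'.
var same = loose !== null && strict !== null &&
           loose[0] === strict[0] && loose[1] === strict[1];

// A well-formed quantifier keeps its usual meaning: this matches 'aa'.
var quantified = /a{2}/.exec('caab');
```

With the option disabled, the first literal is expected to be rejected with a SyntaxError when the script is compiled, so code that must run on strict engines should use the escaped spelling.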

64
src/duk_lexer.c

@ -179,7 +179,7 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
duk_ucodepoint_t x;
duk_small_uint_t contlen;
const duk_uint8_t *p, *p_end;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
duk_ucodepoint_t mincp;
#endif
duk_int_t input_line;
@ -243,21 +243,21 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
} else if (x < 0xe0UL) {
/* 110x xxxx 10xx xxxx */
contlen = 1;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x80UL;
#endif
x = x & 0x1fUL;
} else if (x < 0xf0UL) {
/* 1110 xxxx 10xx xxxx 10xx xxxx */
contlen = 2;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x800UL;
#endif
x = x & 0x0fUL;
} else if (x < 0xf8UL) {
/* 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx */
contlen = 3;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x10000UL;
#endif
x = x & 0x07UL;
@ -288,7 +288,7 @@ DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t s
if (x > 0x10ffffUL) {
goto error_encoding;
}
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
if (x < mincp || (x >= 0xd800UL && x <= 0xdfffUL) || x == 0xfffeUL) {
goto error_encoding;
}
@ -352,7 +352,7 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
duk_small_uint_t len;
duk_small_uint_t i;
const duk_uint8_t *p;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
duk_ucodepoint_t mincp;
#endif
duk_size_t input_offset;
@ -407,21 +407,21 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
} else if (x < 0xe0UL) {
/* 110x xxxx 10xx xxxx */
len = 2;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x80UL;
#endif
x = x & 0x1fUL;
} else if (x < 0xf0UL) {
/* 1110 xxxx 10xx xxxx 10xx xxxx */
len = 3;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x800UL;
#endif
x = x & 0x0fUL;
} else if (x < 0xf8UL) {
/* 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx */
len = 4;
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
mincp = 0x10000UL;
#endif
x = x & 0x07UL;
@ -452,7 +452,7 @@ DUK_LOCAL duk_codepoint_t duk__read_char(duk_lexer_ctx *lex_ctx) {
if (x > 0x10ffffUL) {
goto error_encoding;
}
-#ifdef DUK_USE_STRICT_UTF8_SOURCE
+#if defined(DUK_USE_STRICT_UTF8_SOURCE)
if (x < mincp || (x >= 0xd800UL && x <= 0xdfffUL) || x == 0xfffeUL) {
goto error_encoding;
}
@ -564,7 +564,7 @@ DUK_INTERNAL void duk_lexer_initctx(duk_lexer_ctx *lex_ctx) {
DUK_ASSERT(lex_ctx != NULL);
DUK_MEMZERO(lex_ctx, sizeof(*lex_ctx));
-#ifdef DUK_USE_EXPLICIT_NULL_INIT
+#if defined(DUK_USE_EXPLICIT_NULL_INIT)
#if defined(DUK_USE_LEXER_SLIDING_WINDOW)
lex_ctx->window = NULL;
#endif
@ -814,7 +814,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
}
goto restart_lineupdate;
} else if (regexp_mode) {
-#ifdef DUK_USE_REGEXP_SUPPORT
+#if defined(DUK_USE_REGEXP_SUPPORT)
/*
* "/" followed by something in regexp mode. See E5 Section 7.8.5.
*
@ -1169,7 +1169,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
/* Zero escape (also allowed in non-strict mode) */
ch = 0;
/* adv = 2 - 1 default OK */
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
} else if (strict_mode) {
/* No other escape beginning with a digit in strict mode */
DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
@ -1411,7 +1411,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
DUK__ADVANCECHARS(lex_ctx, 2);
int_only = 1;
allow_hex = 1;
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
} else if (!strict_mode && x == '0' && DUK__ISDIGIT(y)) {
/* Note: if DecimalLiteral starts with a '0', it can only be
* followed by a period or an exponent indicator which starts
@ -1471,7 +1471,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
DUK_S2N_FLAG_ALLOW_FRAC |
DUK_S2N_FLAG_ALLOW_NAKED_FRAC |
DUK_S2N_FLAG_ALLOW_EMPTY_FRAC |
-#ifdef DUK_USE_OCTAL_SUPPORT
+#if defined(DUK_USE_OCTAL_SUPPORT)
(strict_mode ? 0 : DUK_S2N_FLAG_ALLOW_AUTO_OCT_INT) |
#endif
DUK_S2N_FLAG_ALLOW_AUTO_HEX_INT;
@ -1528,7 +1528,7 @@ void duk_lexer_parse_js_input_element(duk_lexer_ctx *lex_ctx,
}
}
-#ifdef DUK_USE_REGEXP_SUPPORT
+#if defined(DUK_USE_REGEXP_SUPPORT)
/*
* Parse a RegExp token. The grammar is described in E5 Section 15.10.
@ -1609,31 +1609,31 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
duk_uint_fast32_t val1 = 0;
duk_uint_fast32_t val2 = DUK_RE_QUANTIFIER_INFINITE;
duk_small_int_t digits = 0;
+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
+duk_lexer_point lex_pt;
+#endif
+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
/*
-* Store lexer position, restoring if quantifier is invalid
+* Store lexer position, restoring if quantifier is invalid.
*/
-#ifdef DUK_USE_NONSTD_REGEXP_BRACES
-duk_lexer_point lex_pt;
DUK_LEXER_GETPOINT(lex_ctx, &lex_pt);
#endif
for (;;) {
-DUK__ADVANCECHARS(lex_ctx, 1); /* eat '{' on entry */
+DUK__ADVANCECHARS(lex_ctx, 1); /* eat '{' on entry */
x = DUK__L0();
if (DUK__ISDIGIT(x)) {
digits++;
val1 = val1 * 10 + (duk_uint_fast32_t) duk__hexval(lex_ctx, x);
} else if (x == ',') {
-if (digits >= DUK__MAX_RE_QUANT_DIGITS) {
-DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
-"invalid regexp quantifier (too many digits)");
+if (digits > DUK__MAX_RE_QUANT_DIGITS) {
+goto invalid_quantifier;
}
if (val2 != DUK_RE_QUANTIFIER_INFINITE) {
goto invalid_quantifier;
}
-if ( DUK__L1() == '}') {
+if (DUK__L1() == '}') {
/* form: { DecimalDigits , }, val1 = min count */
if (digits == 0) {
goto invalid_quantifier;
@ -1647,9 +1647,8 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
val1 = 0;
digits = 0; /* not strictly necessary because of lookahead '}' above */
} else if (x == '}') {
-if (digits >= DUK__MAX_RE_QUANT_DIGITS) {
-DUK_ERROR(lex_ctx->thr, DUK_ERR_SYNTAX_ERROR,
-"invalid regexp quantifier (too many digits)");
+if (digits > DUK__MAX_RE_QUANT_DIGITS) {
+goto invalid_quantifier;
}
if (digits == 0) {
goto invalid_quantifier;
@ -1677,10 +1676,11 @@ DUK_INTERNAL void duk_lexer_parse_re_token(duk_lexer_ctx *lex_ctx, duk_re_token
}
advtok = DUK__ADVTOK(0, DUK_RETOK_QUANTIFIER);
break;
-invalid_quantifier:
-#ifdef DUK_USE_NONSTD_REGEXP_BRACES
-/* Failed to match the quantifier, restore lexer */
+invalid_quantifier:
+#if defined(DUK_USE_NONSTD_REGEXP_BRACES)
+/* Failed to match the quantifier, restore lexer and parse
+ * opening brace as a literal.
+ */
DUK_LEXER_SETPOINT(lex_ctx, &lex_pt);
advtok = DUK__ADVTOK(1, DUK_RETOK_ATOM_CHAR);
out_token->num = '{';
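The save-and-restore idea in the lexer hunks above (remember the position at `{`, try to read a quantifier, and fall back to a literal `{` when that fails) can be sketched outside the C code. This is a simplified illustration only: the function names, the token object shape, and the use of a helper regexp are assumptions and not taken from duk_lexer.c. It accepts the E5.1 quantifier forms `{n}`, `{n,}` and `{n,m}`, and the 9-digit limit is the one stated in the tests of this commit.

```javascript
// Simplified sketch; names are hypothetical and do not come from duk_lexer.c.
var MAX_QUANT_DIGITS = 9;  // digit limit described in the tests of this commit

// Try to read "{n}", "{n,}" or "{n,m}" starting at src[pos] === '{'.
// Returns null when the text is not a valid quantifier.
function tryParseQuantifier(src, pos) {
    var m = /^\{(\d+)(?:(,)(\d*))?\}/.exec(src.slice(pos));
    if (!m) {
        return null;
    }
    var maxDigits = m[3] || '';
    if (m[1].length > MAX_QUANT_DIGITS || maxDigits.length > MAX_QUANT_DIGITS) {
        return null;  // too many digits: reject instead of throwing
    }
    var min = parseInt(m[1], 10);
    var max = min;  // form {n}
    if (m[2] === ',') {
        // form {n,} has no upper bound, form {n,m} has an explicit one
        max = (maxDigits === '') ? Infinity : parseInt(maxDigits, 10);
    }
    return { min: min, max: max, next: pos + m[0].length };
}

// Lex the token at an unescaped '{': a quantifier if valid, else a literal '{'.
function lexLeftBrace(src, pos) {
    var q = tryParseQuantifier(src, pos);
    if (q !== null) {
        return { type: 'quantifier', min: q.min, max: q.max, next: q.next };
    }
    // Fallback: consume only the '{' so the following characters are lexed
    // normally, similar in spirit to restoring the saved lexer point.
    return { type: 'char', value: '{', next: pos + 1 };
}
```

For example, `lexLeftBrace('a{3,5}', 1)` yields a quantifier token, while `lexLeftBrace('a{abc}', 1)` yields a literal `{` token and leaves `abc}` to be lexed as ordinary characters. Rejecting over-long digit runs by returning null rather than throwing corresponds to the change from `DUK_ERROR` to `goto invalid_quantifier` in the hunks above.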

63
tests/ecmascript/test-dev-regexp-quantifier-digits.js

@ -0,0 +1,63 @@
/*
* Duktape has an internal digit limit (9 digits) for regexp quantifier
* min/max counts.
*/
/*---
{
"custom": true
}
---*/
/*===
["xxx"]
null
===*/
// 8 digits
try {
print(eval("JSON.stringify(/x{3,99999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}
try {
print(eval("JSON.stringify(/x{88888888,99999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}
/*===
["xxx"]
null
===*/
// 9 digits, still accepted
try {
print(eval("JSON.stringify(/x{3,999999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}
try {
print(eval("JSON.stringify(/x{333333333,999999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}
/*===
null
null
===*/
// 10 digits: SyntaxError without non-standard literal curly braces
// (DUK_USE_NONSTD_REGEXP_BRACES), treated as a literal with non-standard
// curly braces.
try {
print(eval("JSON.stringify(/x{3,9999999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}
try {
print(eval("JSON.stringify(/x{3333333333,9999999999}/.exec('xxx'))"));
} catch (e) {
print(e);
}

75
tests/ecmascript/test-regexp-non-std-brace.js

@ -1,75 +0,0 @@
var t;
/*===
a{abc}
a{1b}
a{2,b}
===*/
// Any non-valid character cancels quantifier parsing
t = /a{.*}/.exec("aa{abc}");
print(t[0]);
t = /a{1.}/.exec("aa{1b}");
print(t[0]);
t = /a{2,.}/.exec("aa{2,b}");
print(t[0]);
/*===
a{abc}
===*/
// Closing brace is allowed
t = /a\{.*}/.exec("aa{abc}");
print(t[0]);
/*===
a{1}
a{1,2}
===*/
// Valid quantifier but for the closing brace
t = /a{1\}/.exec("aa{1}");
print(t[0]);
t = /a{1,2\}/.exec("aa{1,2}");
print(t[0]);
/*===
{1111111111111111111111111
===*/
// Do not fail on digits before , or }
t = /{1111111111111111111111111/.exec('{1111111111111111111111111');
print(t[0]);
/*===
a{}
a{,}
a{1,2,3}
===*/
//On parsing failure, treat as a brace
t = /a{}/.exec('a{}');
print(t[0]);
t = /a{,}/.exec('a{,}');
print(t[0]);
t = /a{1,2,3}/.exec('a{1,2,3}');
print(t[0]);
/*===
SyntaxError
===*/
// Current implementation does not allow all types of error
// Too many numbers
try {
eval("/{1111111111111111111111111}/.exec('foo');");
print("no exception");
} catch (e) {
print(e.name);
}

105
tests/ecmascript/test-regexp-nonstandard-brace.js

@ -0,0 +1,105 @@
/*
* Ecmascript regexp pattern character production does not allow literal
* curly braces in any position, but many Ecmascript regexp engines allow
* them when the meaning is unambiguous. Since Duktape 1.5.0 Duktape also
* allows literal curly braces in regexps.
*/
// Behavior is custom because e.g. quantifier digit limits are Duktape specific.
/*---
{
"custom": true
}
---*/
var t;
/*===
a{abc}
a{1b}
a{2,b}
===*/
// Any invalid character cancels quantifier parsing, and causes the left
// curly brace to be treated as a literal (i.e. same as /\{/).
t = /a{.*}/.exec("aa{abc}");
print(t[0]);
t = /a{1.}/.exec("aa{1b}");
print(t[0]);
t = /a{2,.}/.exec("aa{2,b}");
print(t[0]);
/*===
a{abc}
===*/
// Unescaped right (closing) brace is allowed anywhere outside a quantifier
// because it's unambiguous.
t = /a\{.*}/.exec("aa{abc}");
print(t[0]);
/*===
a{1}
a{1,2}
===*/
// Valid quantifier except for the closing brace: quantifier parsing is
// cancelled and left curly brace is treated as a literal.
t = /a{1\}/.exec("aa{1}");
print(t[0]);
t = /a{1,2\}/.exec("aa{1,2}");
print(t[0]);
/*===
{1111111111111111111111111
===*/
// Do not fail on digits before , or }.
t = /{1111111111111111111111111/.exec('{1111111111111111111111111');
print(t[0]);
/*===
a{}
a{,}
a{1,2,3}
===*/
// On any quantifier parsing failure, treat as a literal brace.
t = /a{}/.exec('a{}');
print(t[0]);
t = /a{,}/.exec('a{,}');
print(t[0]);
t = /a{1,2,3}/.exec('a{1,2,3}');
print(t[0]);
/*===
{1111111111111111111111111,}
{1111111111111111111111111,2222222222222222222222222222}
{1111,1111111111}
xxxxxxxxxxx
===*/
// Duktape has an internal limitation on the maximum number of quantifier
// digits: in this case the limits are exceeded and the quantifier is
// rejected and the curly brace is then parsed as a literal. At the moment
// the maximum number of digits allowed for quantifier min/max value is 9.
t = /{1111111111111111111111111,}/.exec('{1111111111111111111111111,}foo');
print(t[0]);
t = /{1111111111111111111111111,2222222222222222222222222222}/.exec('{1111111111111111111111111,2222222222222222222222222222}');
print(t[0]);
t = /{1111,1111111111}/.exec('{1111,1111111111}foo');
print(t[0]);
// Here the max limit is exactly 9 digits so it's treated as a valid quantifier.
t = /x{11,111111111}/.exec('xxxxxxxxxxx');
print(t[0]);

21
util/fix_emscripten.py

@ -12,28 +12,33 @@ replacements = {
# RegExp fix, now fixed in the Emscripten repository and should no longer
# be necessary.
# https://github.com/kripken/emscripten/commit/277ac5239057721ebe3c6e7813dc478eeab2cea0
-r"""if (/<?{ ?[^}]* ?}>?/.test(type)) return true""":
-r"""if (/<?\{ ?[^}]* ?\}>?/.test(type)) return true""",
+# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+#r"""if (/<?{ ?[^}]* ?}>?/.test(type)) return true""":
+# r"""if (/<?\{ ?[^}]* ?\}>?/.test(type)) return true""",
# GH-11: Another RegExp escaping fix.
-r"""var sourceRegex = /^function\s\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
-r"""var sourceRegex = /^function\s\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
-r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
-r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
+# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+#r"""var sourceRegex = /^function\s\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
+# r"""var sourceRegex = /^function\s\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
+#r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/;""":
+# r"""var sourceRegex = /^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/;""",
# GH-11: Attempt to parse a function's toString() output with a RegExp.
# The RegExp makes invalid assumptions and won't parse Duktape's function
# toString output ("function empty() {/* source code*/)}").
# This stopgap will prevent a 'TypeError: invalid base reference for property read'
# and allows at least a hello world to run.
+# Still needed with Duktape 1.5.0 because the issue is what Emscripten
+# expects from .toString() of a function.
r"""var parsed = jsfunc.toString().match(sourceRegex).slice(1);""":
r"""var parsed = (jsfunc.toString().match(sourceRegex) || []).slice(1);""",
r"""jsfunc.toString().match(sourceRegex).slice(1);""":
r"""(jsfunc.toString().match(sourceRegex) || []).slice(1);""",
# Newer emscripten has this at least with -O2
-r"""/^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/""":
-r"""/^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/""",
+# Duktape 1.5.0: no longer needed with non-standard regexp curly brace support
+#r"""/^function\s*\(([^)]*)\)\s*{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?}$/""":
+# r"""/^function\s*\(([^)]*)\)\s*\{\s*([^*]*?)[\s;]*(?:return\s*(.*?)[;\s]*)?\}$/""",
}
repl_keys = replacements.keys()

7
website/guide/compatibility.html

@ -59,7 +59,8 @@ Javascript. There are no known issues.</p>
<p><a href="https://github.com/Microsoft/TypeScript/">TypeScript</a>
compiles to Javascript. There are no known issues with compiling TypeScript
using the Microsoft TypeScript compiler (in the ES5/CommonJS mode) and
-running the resulting Javascript using Duktape.</p>
+running the resulting Javascript using Duktape. It's also possible to
+<a href="http://wiki.duktape.org/CompatibilityTypeScript.html">run the TypeScript compiler with Duktape</a>.</p>
<h2 id="compatibility-underscorejs">Underscore.js</h2>
@ -93,8 +94,10 @@ support yet, no "heap object" can be provided.</p>
<p><a href="https://github.com/kripken/emscripten">Emscripten</a> compiles
C/C++ into Javascript. Duktape is currently Emscripten compatible except
-for a few RegExp issues, see:
+for an assumption about the format of a function's <code>toString()</code>
+output, see:
<a href="https://github.com/svaarala/duktape/blob/master/util/fix_emscripten.py">fix_emscripten.py</a>.
+Since Duktape 1.5.0 fixes for non-standard regexps are no longer needed.
</p>
<p>As of Duktape 1.3 there is support for Khronos/ES6 TypedArray which improves

25
website/guide/custombehavior.html

@ -93,16 +93,27 @@ binding in any of the points A, B, or C.</p>
<h2>RegExp leniency</h2>
-<p>Although not allowed by E5.1, the following escape is allowed in RegExp
-syntax:</p>
+<p>Most Ecmascript engines support more syntax than guaranteed by the
+<a href="http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.1">Ecmascript
+E5.1 specification (Section 15.10.1 Patterns)</a>. As a result there's quite
+a lot of code that won't work with strict Ecmascript regexp syntax. Duktape also
+allows some non-standard syntax to better support existing code (you can turn
+this non-standard behavior off using config options if you prefer).</p>
+<p>Curly braces (<code>{</code> and <code>}</code>) are treated as literals
+when they don't parse as a valid quantifier:</p>
<pre>
-/\$/ /* matches dollar literally, non-standard */
-/\u0024/ /* same, standard */
+/{(\d+)}/ /* left curly, digits, right curly; non-standard */
+/\{(\d+)\}/ /* same, standard */
</pre>
-<p>This escape occurs in real world code so it is allowed. (More leniency
-will be added in future versions to deal with real world RegExps; dollar
-escapes are not the only issue.)</p>
+<p>Escaping a dollar sign as <code>\$</code> is not allowed by E5.1, but
+is accepted by Duktape:</p>
+<pre>
+/\$/ /* matches dollar literally; non-standard */
+/\u0024/ /* same, standard */
+</pre>
<h2>Array.prototype.splice() when deleteCount not given</h2>
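The dollar escape described in the RegExp leniency hunk above can be checked against its standard spellings. This is a minimal sketch, assuming an engine that accepts `\$` (Duktape with DUK_USE_NONSTD_REGEXP_DOLLAR_ESCAPE enabled, or most other Ecmascript engines); the input string and variable names are illustrative only, and the character class form `[$]` is an additional standard spelling not mentioned in the guide text.

```javascript
var input = 'cost: $5';

// Non-standard under E5.1, accepted by Duktape by default.
var escaped = /\$/.exec(input);

// Standard spellings that match the same literal dollar sign.
var unicode = /\u0024/.exec(input);
var charClass = /[$]/.exec(input);

// All three find '$' at the same index (6) in the input.
var agree = escaped.index === unicode.index &&
            unicode.index === charClass.index;
```

Code that must also run with the option disabled can use either standard spelling without changing what the pattern matches.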
