/*
* ECMAScript compiler.
*/
#ifndef DUK_JS_COMPILER_H_INCLUDED
#define DUK_JS_COMPILER_H_INCLUDED
/* ECMAScript compiler limits */
#define DUK_COMPILER_TOKEN_LIMIT 100000000L /* 1e8: protects against deeply nested inner functions */
/* maximum loopcount for peephole optimization */
#define DUK_COMPILER_PEEPHOLE_MAXITER 3
/* maximum bytecode length in instructions */
#define DUK_COMPILER_MAX_BYTECODE_LENGTH (256L * 1024L * 1024L) /* 256M instructions; 1 GB at 4 bytes per instruction */
/*
* Compiler intermediate values
*
* Intermediate values describe either plain values (e.g. strings or
* numbers) or binary operations which have not yet been coerced into
* either a left-hand-side or right-hand-side role (e.g. object property).
*/
#define DUK_IVAL_NONE 0 /* no value */
#define DUK_IVAL_PLAIN 1 /* register, constant, or value */
#define DUK_IVAL_ARITH 2 /* binary arithmetic; DUK_OP_ADD, DUK_OP_EQ, other binary ops */
#define DUK_IVAL_PROP 3 /* property access */
#define DUK_IVAL_VAR 4 /* variable access */
#define DUK_ISPEC_NONE 0 /* no value */
#define DUK_ISPEC_VALUE 1 /* value resides in 'valstack_idx' */
#define DUK_ISPEC_REGCONST 2 /* value resides in a register or constant */
/* Bit mask which indicates that a regconst is a constant instead of a register.
* Chosen so that when a regconst is cast to duk_int32_t, all consts are
* negative values.
*/
#define DUK_REGCONST_CONST_MARKER 0x80000000UL
/* type to represent a reg/const reference during compilation */
typedef duk_uint32_t duk_regconst_t;
/* type to represent a straight register reference, with <0 indicating none */
typedef duk_int32_t duk_reg_t;
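/*
* Example (illustrative sketch, not part of Duktape): how the const marker
* bit distinguishes constants from registers in a reg/const reference.
* The example_* names are hypothetical stand-ins for duk_regconst_t and
* DUK_REGCONST_CONST_MARKER; the helpers are assumptions made for this
* sketch and are not Duktape APIs.
*
```c
#include <assert.h>
#include <stdint.h>

// Stand-ins for duk_regconst_t and DUK_REGCONST_CONST_MARKER.
typedef uint32_t example_regconst_t;
#define EXAMPLE_CONST_MARKER 0x80000000UL

// Hypothetical helper: tag a constant table index as a constant reference.
example_regconst_t example_make_const(uint32_t const_index) {
    // The index must not already use the marker bit.
    assert((const_index & EXAMPLE_CONST_MARKER) == 0);
    return (example_regconst_t) (const_index | EXAMPLE_CONST_MARKER);
}

// Hypothetical helper: nonzero if the reference denotes a constant.
int example_is_const(example_regconst_t rc) {
    return (rc & EXAMPLE_CONST_MARKER) != 0;
}

// Hypothetical helper: strip the marker to recover the plain index.
uint32_t example_index(example_regconst_t rc) {
    return (uint32_t) (rc & ~EXAMPLE_CONST_MARKER);
}

// Signed view of the reference; constants come out negative on
// two's complement targets (the conversion is implementation-defined).
int32_t example_as_signed(example_regconst_t rc) {
    return (int32_t) rc;
}
```
*
* A plain register number such as 7 has the marker bit clear, so it is
* reported as a register and its signed view stays non-negative.
*/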
typedef struct {
duk_small_uint_t t; /* DUK_ISPEC_XXX */
duk_regconst_t regconst;
duk_idx_t valstack_idx; /* always set; points to a reserved valstack slot */
} duk_ispec;
typedef struct {
/*
* PLAIN: x1
* ARITH: x1 <op> x2
* PROP: x1.x2
* VAR: x1 (name)
*/
/* XXX: can be optimized for smaller footprint esp. on 32-bit environments */
duk_small_uint_t t; /* DUK_IVAL_XXX */
duk_small_uint_t op; /* bytecode opcode (or extraop) for binary ops */
duk_ispec x1;
duk_ispec x2;
} duk_ivalue;
/*
* Bytecode instruction representation during compilation
*
* Contains the actual instruction and (optionally) debug info.
*/
struct duk_compiler_instr {
duk_instr_t ins;
#if defined(DUK_USE_PC2LINE)
duk_uint32_t line;
#endif
};
/*
* Compiler state
*/
#define DUK_LABEL_FLAG_ALLOW_BREAK (1 << 0)
#define DUK_LABEL_FLAG_ALLOW_CONTINUE (1 << 1)
#define DUK_DECL_TYPE_VAR 0
#define DUK_DECL_TYPE_FUNC 1
/* XXX: optimize to 16 bytes */
typedef struct {
duk_small_uint_t flags;
duk_int_t label_id; /* numeric label_id (-1 reserved as marker) */
duk_hstring *h_label; /* borrowed label name */
duk_int_t catch_depth; /* catch depth at point of definition */
duk_int_t pc_label; /* pc of label statement:
* pc+1: break jump site
* pc+2: continue jump site
*/
/* Fast jumps (which avoid longjmp) jump directly to the jump sites
* which are always known even while the iteration/switch statement
* is still being parsed. A final peephole pass "straightens out"
* the jumps.
*/
} duk_labelinfo;
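/*
* Example (illustrative sketch, not part of Duktape): locating the fixed
* jump sites that follow a label statement, as described for pc_label
* above (pc+1 is the break jump site, pc+2 is the continue jump site).
* example_labelinfo is a hypothetical, trimmed-down stand-in for
* duk_labelinfo. Returning -1 when the matching flag is not set is an
* assumption of this sketch, not the compiler's actual lookup logic.
*
```c
#include <assert.h>
#include <stddef.h>

// Local copies of the label flags, so the sketch is self-contained.
#define EXAMPLE_LABEL_FLAG_ALLOW_BREAK (1 << 0)
#define EXAMPLE_LABEL_FLAG_ALLOW_CONTINUE (1 << 1)

// Hypothetical stand-in for duk_labelinfo with only the fields used here.
typedef struct {
    unsigned int flags;
    int pc_label; // pc of the label statement
} example_labelinfo;

// Return the pc of the break jump site, or -1 if break is not allowed.
int example_break_site(const example_labelinfo *li) {
    assert(li != NULL);
    if (!(li->flags & EXAMPLE_LABEL_FLAG_ALLOW_BREAK)) {
        return -1;
    }
    return li->pc_label + 1;
}

// Return the pc of the continue jump site, or -1 if continue is not allowed.
int example_continue_site(const example_labelinfo *li) {
    assert(li != NULL);
    if (!(li->flags & EXAMPLE_LABEL_FLAG_ALLOW_CONTINUE)) {
        return -1;
    }
    return li->pc_label + 2;
}
```
*
* Because both sites sit at fixed offsets from pc_label, a break or
* continue can be emitted as a jump to a known pc before the rest of the
* statement has been parsed.
*/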
/* Compiling state of one function, eventually converted to duk_hcompfunc */
struct duk_compiler_func {
/* These pointers are at the start of the struct so that they pack
* nicely. Mixing pointers and integer values is bad on some
* platforms (e.g. if int is 32 bits and pointers are 64 bits).
*/
duk_bufwriter_ctx bw_code; /* bufwriter for code */
duk_hstring *h_name; /* function name (borrowed reference), ends up in _name */
/* h_code: held in bw_code */
duk_hobject *h_consts; /* array */
duk_hobject *h_funcs; /* array of function templates: [func1, offset1, line1, func2, offset2, line2]
* offset/line points to closing brace to allow skipping on pass 2
*/
duk_hobject *h_decls; /* array of declarations: [ name1, val1, name2, val2, ... ]
* valN = (typeN) | (fnum << 8), where fnum is inner func number (0 for vars)
* record function and variable declarations in pass 1
*/
duk_hobject *h_labelnames; /* array of active label names */
duk_hbuffer_dynamic *h_labelinfos; /* C array of duk_labelinfo */
duk_hobject *h_argnames; /* array of formal argument names (-> _Formals) */
duk_hobject *h_varmap; /* variable map for pass 2 (identifier -> register number or null (unmapped)) */
/* Value stack indices for tracking objects. */
/* code_idx: not needed */
duk_idx_t consts_idx;
duk_idx_t funcs_idx;
duk_idx_t decls_idx;
duk_idx_t labelnames_idx;
duk_idx_t labelinfos_idx;
duk_idx_t argnames_idx;
duk_idx_t varmap_idx;
/* Temp reg handling. */
duk_reg_t temp_first; /* first register that is a temporary (below: variables) */
duk_reg_t temp_next; /* next temporary register to allocate */
duk_reg_t temp_max; /* highest value reached by temp_next (temp_max - 1 is highest used reg) */
/* Shuffle registers if large number of regs/consts. */
duk_reg_t shuffle1;
duk_reg_t shuffle2;
duk_reg_t shuffle3;
/* Stats for current expression being parsed. */
duk_int_t nud_count;
duk_int_t led_count;
duk_int_t paren_level; /* parenthesis count, 0 = top level */
duk_bool_t expr_lhs; /* expression is left-hand-side compatible */
duk_bool_t allow_in; /* current paren level allows 'in' token */
/* Misc. */
duk_int_t stmt_next; /* statement id allocation (running counter) */
duk_int_t label_next; /* label id allocation (running counter) */
duk_int_t catch_depth; /* catch stack depth */
duk_int_t with_depth; /* with stack depth (affects identifier lookups) */
duk_int_t fnum_next; /* inner function numbering */
duk_int_t num_formals; /* number of formal arguments */
duk_reg_t reg_stmt_value; /* register for writing value of 'non-empty' statements (global or eval code), -1 is marker */
#if defined(DUK_USE_DEBUGGER_SUPPORT)
duk_int_t min_line; /* XXX: typing (duk_hcompfunc has duk_uint32_t) */
duk_int_t max_line;
#endif
/* Status booleans. */
duk_uint8_t is_function; /* is an actual function (not global/eval code) */
duk_uint8_t is_eval; /* is eval code */
duk_uint8_t is_global; /* is global code */
duk_uint8_t is_setget; /* is a setter/getter */
duk_uint8_t is_decl; /* is a function declaration (as opposed to function expression) */
duk_uint8_t is_strict; /* function is strict */
duk_uint8_t is_notail; /* function must not be tail called */
duk_uint8_t in_directive_prologue; /* parsing in "directive prologue", recognize directives */
duk_uint8_t in_scanning; /* parsing in "scanning" phase (first pass) */
duk_uint8_t may_direct_eval; /* function may call direct eval */
duk_uint8_t id_access_arguments; /* function refers to 'arguments' identifier */
duk_uint8_t id_access_slow; /* function makes one or more slow path accesses that won't match own static variables */
duk_uint8_t id_access_slow_own; /* function makes one or more slow path accesses that may match own static variables */
duk_uint8_t is_arguments_shadowed; /* argument/function declaration shadows 'arguments' */
duk_uint8_t needs_shuffle; /* function needs shuffle registers */
duk_uint8_t reject_regexp_in_adv; /* reject RegExp literal on next advance() call; needed for handling IdentifierName productions */
};
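/*
* Example (illustrative sketch, not part of Duktape): packing and
* unpacking the declaration values stored in h_decls, following the
* formula valN = (typeN) | (fnum << 8) given in duk_compiler_func.
* The example_* names are hypothetical; the 8-bit mask for the type
* field is an assumption inferred from the shift amount.
*
```c
#include <assert.h>

// Local copies of DUK_DECL_TYPE_VAR and DUK_DECL_TYPE_FUNC.
#define EXAMPLE_DECL_TYPE_VAR 0
#define EXAMPLE_DECL_TYPE_FUNC 1

// Pack a declaration type and an inner function number into one value.
int example_decl_pack(int decl_type, int fnum) {
    assert(decl_type >= 0 && decl_type <= 0xff);
    assert(fnum >= 0);
    return decl_type | (fnum << 8);
}

// Recover the declaration type from the low 8 bits.
int example_decl_type(int val) {
    return val & 0xff;
}

// Recover the inner function number from the remaining bits.
int example_decl_fnum(int val) {
    return val >> 8;
}
```
*
* For a variable declaration fnum is 0, so the packed value is simply
* the declaration type.
*/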
struct duk_compiler_ctx {
duk_hthread *thr;
/* filename being compiled (ends up in functions' '_filename' property) */
duk_hstring *h_filename; /* borrowed reference */
/* lexing (tokenization) state (contains two valstack slot indices) */
duk_lexer_ctx lex;
/* current and previous token for parsing */
duk_token prev_token;
duk_token curr_token;
duk_idx_t tok11_idx; /* curr_token slot1 (matches 'lex' slot1_idx) */
duk_idx_t tok12_idx; /* curr_token slot2 (matches 'lex' slot2_idx) */
duk_idx_t tok21_idx; /* prev_token slot1 */
duk_idx_t tok22_idx; /* prev_token slot2 */
/* recursion limit */
duk_int_t recursion_depth;
duk_int_t recursion_limit;
/* code emission temporary */
duk_int_t emit_jumpslot_pc;
/* current function being compiled (embedded instead of pointer for more compact access) */
duk_compiler_func curr_func;
};
/*
* Prototypes
*/
#define DUK_JS_COMPILE_FLAG_EVAL (1 << 0) /* source is eval code (not global) */
#define DUK_JS_COMPILE_FLAG_STRICT (1 << 1) /* strict outer context */
#define DUK_JS_COMPILE_FLAG_FUNCEXPR (1 << 2) /* source is a function expression (used for Function constructor) */
DUK_INTERNAL_DECL void duk_js_compile(duk_hthread *thr, const duk_uint8_t *src_buffer, duk_size_t src_length, duk_small_uint_t flags);
#endif /* DUK_JS_COMPILER_H_INCLUDED */