This patch allows the following code to run without allocating on the heap:
super().foo(...)
Before this patch such a call would allocate a super object on the heap and
then load the foo method and call it right away. The super object is only
needed to perform the lookup of the method and not needed after that. This
patch makes an optimisation to allocate the super object on the C stack and
discard it right after use.
Changes in code size due to this patch are:
bare-arm: +128
minimal: +232
unix x64: +416
unix nanbox: +364
stmhal: +184
esp8266: +340
cc3200: +128
This patch moves some common code from the individual inline assemblers to
the compiler, the code that calls the emit-glue to assign the machine code
to the functions scope.
This patch adds the MICROPY_EMIT_INLINE_XTENSA option, which, when
enabled, allows the @micropython.asm_xtensa decorator to be used.
The following opcodes are currently supported (ax is a register, a0-a15):
ret_n()
callx0(ax)
j(label)
jx(ax)
beqz(ax, label)
bnez(ax, label)
mov(ax, ay)
movi(ax, imm) # imm can be full 32-bit, uses l32r if needed
and_(ax, ay, az)
or_(ax, ay, az)
xor(ax, ay, az)
add(ax, ay, az)
sub(ax, ay, az)
mull(ax, ay, az)
l8ui(ax, ay, imm)
l16ui(ax, ay, imm)
l32i(ax, ay, imm)
s8i(ax, ay, imm)
s16i(ax, ay, imm)
s32i(ax, ay, imm)
l16si(ax, ay, imm)
addi(ax, ay, imm)
ball(ax, ay, label)
bany(ax, ay, label)
bbc(ax, ay, label)
bbs(ax, ay, label)
beq(ax, ay, label)
bge(ax, ay, label)
bgeu(ax, ay, label)
blt(ax, ay, label)
bnall(ax, ay, label)
bne(ax, ay, label)
bnone(ax, ay, label)
Upon entry to the assembly function the registers a0, a12, a13, a14 are
pushed to the stack and the stack pointer (a1) decreased by 16. Upon
exit, these registers and the stack pointer are restored, and ret.n is
executed to return to the caller (caller address is in a0).
Note that the ABI for the Xtensa emitters is non-windowing.
The 3 kinds of comprehensions are similar enough that merging their emit
functions reduces code size. Decreases in code size in bytes are:
bare-arm:24, minimal:96, unix(NDEBUG,x86-64):328, stmhal:80, esp8266:76.
unix-cpy was originally written to get semantic equivalent with CPython
without writing functional tests. When writing the initial
implementation of uPy it was a long way between lexer and functional
tests, so the half-way test was to make sure that the bytecode was
correct. The idea was that if the uPy bytecode matched CPython 1-1 then
uPy would be proper Python if the bytecodes acted correctly. And having
matching bytecode meant that it was less likely to miss some deep
subtlety in the Python semantics that would require an architectural
change later on.
But that is all history and it no longer makes sense to retain the
ability to output CPython bytecode, because:
1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode
changes from version to version, and seems to have changed quite a bit
in 3.5. There's no point in changing the bytecode output to match
CPython anymore.
2. uPy and CPy do different optimisations to the bytecode which makes it
harder to match.
3. The bytecode tests are not run. They were never part of Travis and
are not run locally anymore.
4. The EMIT_CPYTHON option needs a lot of extra source code which adds
heaps of noise, especially in compile.c.
5. Now that there is an extensive test suite (which tests functionality)
there is no need to match the bytecode. Some very subtle behaviour is
tested with the test suite and passing these tests is a much better
way to stay Python-language compliant, rather than trying to match
CPy bytecode.
Previous to this patch each time a bytes object was referenced a new
instance (with the same data) was created. With this patch a single
bytes object is created in the compiler and is loaded directly at execute
time as a true constant (similar to loading bignum and float objects).
This saves on allocating RAM and means that bytes objects can now be
used when the memory manager is locked (eg in interrupts).
The MP_BC_LOAD_CONST_BYTES bytecode was removed as part of this.
Generated bytecode is slightly larger due to storing a pointer to the
bytes object instead of the qstr identifier.
Code size is reduced by about 60 bytes on Thumb2 architectures.
This fixes a long standing problem that viper code generation gave
terrible error messages, and actually no errors on pyboard where
assertions are disabled.
Now all compile-time errors are raised as proper Python exceptions, and
are of type ViperTypeError.
Addresses issue #940.
When just the bytecode emitter is needed there is no need to have a
dynamic method table for the emitter back-end, and we can instead
directly call the mp_emit_bc_XXX functions. This gives a significant
reduction in code size and a very slight performance boost for the
compiler.
This patch saves 1160 bytes code on Thumb2 and 972 bytes on x86, when
native emitters are disabled.
Overall savings in code over the last 3 commits are:
bare-arm: 1664 bytes.
minimal: 2136 bytes.
stmhal: 584 bytes (it has native emitter enabled).
cc3200: 1736 bytes.
First pass for the compiler is computing the scope (eg if an identifier
is local or not) and originally had an entire table of methods dedicated
to this, most of which did nothing. With changes from previous commit,
this set of methods can be removed and the methods from the bytecode
emitter used instead, with very little modification -- this is what is
done in this commit.
This factoring has little to no impact on the speed of the compiler
(tested by compiling 3763 Python scripts and timing it).
This factoring reduces code size by about 270-300 bytes on Thumb2 archs,
and 400 bytes on x86.
Previous to this patch, a big-int, float or imag constant was interned
(made into a qstr) and then parsed at runtime to create an object each
time it was needed. This is wasteful in RAM and not efficient. Now,
these constants are parsed straight away in the parser and turned into
objects. This allows constants with large numbers of digits (so
addresses issue #1103) and takes us a step closer to #722.
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte. In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient. When null termination is needed it must be
done explicitly using vstr_null_terminate.
This patch makes the MICROPY_PY_BUILTINS_SLICE compile-time option
fully disable the builtin slice operation (when set to 0). This
includes removing the slice sytanx from the grammar. Now, enabling
slice costs 4228 bytes on unix x64, and 1816 bytes on stmhal.
This patch makes MICROPY_PY_BUILTINS_SET compile-time option fully
disable the builtin set object (when set to 0). This includes removing
set constructor/comprehension from the grammar, the compiler and the
emitters. Now, enabling set costs 8168 bytes on unix x64, and 3576
bytes on stmhal.
Native emitter can now compile try/except blocks using nlr_push/nlr_pop.
It probably only works for 1 level of exception handling. It doesn't
work on Thumb (only x64).
Native emitter can also handle some additional op codes.
With this patch, 198 tests now pass using "-X emit=native" option to
micropython.
Needed to pop the iterator object when breaking out of a for loop. Need
also to be careful to unwind exception handler before popping iterator.
Addresses issue #635.
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
3 emitter functions are needed only for emitcpy, and so we can #if them
out when compiling with emitcpy support.
Also remove unused SETUP_LOOP bytecode.
Closed over variables are now passed on the stack, instead of creating a
tuple and passing that. This way memory for the closed over variables
can be allocated within the closure object itself. See issue #510 for
background.
Very little has changed. In Python 3.4 they removed the opcode
STORE_LOCALS, but in Micro Python we only ever used this for CPython
compatibility, so it was a trivial thing to remove. It also allowed to
clean up some dead code (eg the 0xdeadbeef in class construction), and
now class builders use 1 less stack word.
Python 3.4.0 introduced the LOAD_CLASSDEREF opcode, which I have not
yet understood. Still, all tests (apart from bytecode test) still pass.
Bytecode tests needs some more attention, but they are not that
important anymore.