When an exception is raised and is to be handled by the VM, it is stored
on the Python value stack so the bytecode can access it. CPython stores
3 objects on the stack for each exception: exc type, exc instance and
traceback. uPy followed this approach, but it turns out not to be
necessary. Instead, it is enough to store just the exception instance on
the Python value stack. The only place where the 3 values are needed
explicitly is for the __exit__ handler of a with-statement context, but
for these cases the 3 values can be extracted from the single exception
instance.
This patch removes the need to store 3 values on the stack, and instead
just stores the exception instance.
Code size is reduced by about 50-100 bytes, the compiler and VM are
slightly simpler, generate bytecode is smaller (by 2 bytes for each try
block), and the Python value stack is reduced in size for functions that
handle exceptions.
The 3 kinds of comprehensions are similar enough that merging their emit
functions reduces code size. Decreases in code size in bytes are:
bare-arm:24, minimal:96, unix(NDEBUG,x86-64):328, stmhal:80, esp8266:76.
They are sugar for marking function as generator, "yield from"
and pep492 python "semantically equivalents" respectively.
@dpgeorge was the original author of this patch, but @pohmelie made
changes to implement `async for` and `async with`.
Previous to this patch, the "**b" in "a**b" had its own parse node with
just one item (the "b"). Now, the "b" is just the last element of the
power parse-node. This saves (a tiny bit of) RAM when compiling.
This new compile-time option allows to make the bytecode compiler
configurable at runtime by setting the fields in the mp_dynamic_compiler
structure. By using this feature, the compiler can generate bytecode
that targets any MicroPython runtime/VM, regardless of the host and
target compile-time settings.
Options so far that fall under this dynamic setting are:
- maximum number of bits that a small int can hold;
- whether caching of lookups is used in the bytecode;
- whether to use unicode strings or not (lexer behaviour differs, and
therefore generated string constants differ).
Before this patch, (x+y)*z would be parsed to a tree that contained a
redundant identity parse node corresponding to the parenthesis. With
this patch such nodes are optimised away, which reduces memory
requirements for expressions with parenthesis, and simplifies the
compiler because it doesn't need to handle this identity case.
A parenthesis parse node is still needed for tuples.
MICROPY_ENABLE_COMPILER can be used to enable/disable the entire compiler,
which is useful when only loading of pre-compiled bytecode is supported.
It is enabled by default.
MICROPY_PY_BUILTINS_EVAL_EXEC controls support of eval and exec builtin
functions. By default they are only included if MICROPY_ENABLE_COMPILER
is enabled.
Disabling both options saves about 40k of code size on 32-bit x86.
To use, put the following in mpconfigport.h:
#define MICROPY_OBJ_REPR (MICROPY_OBJ_REPR_D)
#define MICROPY_FLOAT_IMPL (MICROPY_FLOAT_IMPL_DOUBLE)
typedef int64_t mp_int_t;
typedef uint64_t mp_uint_t;
#define UINT_FMT "%llu"
#define INT_FMT "%lld"
Currently does not work with native emitter enabled.
It makes much more sense to do constant folding in the parser while the
parse tree is being built. This eliminates the need to create parse
nodes that will just be folded away. The code is slightly simpler and a
bit smaller as well.
Constant folding now has a configuration option,
MICROPY_COMP_CONST_FOLDING, which is enabled by default.
With this patch parse nodes are allocated sequentially in chunks. This
reduces fragmentation of the heap and prevents waste at the end of
individually allocated parse nodes.
Saves roughly 20% of RAM during parse stage.
Function annotations are only needed when the native emitter is enabled
and when the current scope is emitted in viper mode. All other times
the annotations can be skipped completely.
unix-cpy was originally written to get semantic equivalent with CPython
without writing functional tests. When writing the initial
implementation of uPy it was a long way between lexer and functional
tests, so the half-way test was to make sure that the bytecode was
correct. The idea was that if the uPy bytecode matched CPython 1-1 then
uPy would be proper Python if the bytecodes acted correctly. And having
matching bytecode meant that it was less likely to miss some deep
subtlety in the Python semantics that would require an architectural
change later on.
But that is all history and it no longer makes sense to retain the
ability to output CPython bytecode, because:
1. It outputs CPython 3.3 compatible bytecode. CPython's bytecode
changes from version to version, and seems to have changed quite a bit
in 3.5. There's no point in changing the bytecode output to match
CPython anymore.
2. uPy and CPy do different optimisations to the bytecode which makes it
harder to match.
3. The bytecode tests are not run. They were never part of Travis and
are not run locally anymore.
4. The EMIT_CPYTHON option needs a lot of extra source code which adds
heaps of noise, especially in compile.c.
5. Now that there is an extensive test suite (which tests functionality)
there is no need to match the bytecode. Some very subtle behaviour is
tested with the test suite and passing these tests is a much better
way to stay Python-language compliant, rather than trying to match
CPy bytecode.
Previous to this patch there were some cases where line numbers for
errors were 0 (unknown). Now the compiler attempts to give a better
line number where possible, in some cases giving the line number of the
closest statement, and other cases the line number of the inner-most
scope of the error (eg the line number of the start of the function).
This helps to give good (and sometimes exact) line numbers for
ViperTypeError exceptions.
This patch also makes sure that the first compile error (eg SyntaxError)
that is encountered is reported (previously it was the last one that was
reported).
ViperTypeError now includes filename and function name where the error
occurred. The line number is the line number of the start of the
function definition, which is the best that can be done without a lot
more work.
Partially addresses issue #1381.
Previous to this patch each time a bytes object was referenced a new
instance (with the same data) was created. With this patch a single
bytes object is created in the compiler and is loaded directly at execute
time as a true constant (similar to loading bignum and float objects).
This saves on allocating RAM and means that bytes objects can now be
used when the memory manager is locked (eg in interrupts).
The MP_BC_LOAD_CONST_BYTES bytecode was removed as part of this.
Generated bytecode is slightly larger due to storing a pointer to the
bytes object instead of the qstr identifier.
Code size is reduced by about 60 bytes on Thumb2 architectures.
This fixes a long standing problem that viper code generation gave
terrible error messages, and actually no errors on pyboard where
assertions are disabled.
Now all compile-time errors are raised as proper Python exceptions, and
are of type ViperTypeError.
Addresses issue #940.