This commit adds first-class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
This has been fully tested and is working on unix x86 and x86-64, and
stm32. Basic tests have also been done with the esp8266 port. Performance
of existing native code is unchanged.
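As a rough illustration of the features listed above, the following sketch
exercises send, throw and a yield enclosed in an exception handler. It is
plain Python so that it also runs on CPython; under MicroPython the function
would additionally carry the @micropython.native decorator to go through the
native emitter. The name averager is made up for this example.

```python
def averager():
    # Running average. The yield sits inside a try block, so the
    # exception-handler state has to survive each suspend and resume.
    total = 0
    count = 0
    avg = None
    while True:
        try:
            value = yield avg
        except ValueError:
            # throw(ValueError) resets the accumulator and keeps going.
            total = 0
            count = 0
            avg = None
        else:
            total += value
            count += 1
            avg = total / count


gen = averager()
next(gen)                      # run to the first yield
first = gen.send(10)
second = gen.send(20)
reset = gen.throw(ValueError)  # handled inside the generator
third = gen.send(4)
```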
The exception object is now stored at the start of the state instead of at
the end (index n_state - 1). It was originally (way back in v1.0) put at
the end of the state because the VM didn't have a pointer to the start.
But now that the VM takes an mp_code_state_t pointer it does have a pointer
to the start of the state, so it can put the exception object there.
This commit saves about 30 bytes of code on all architectures, and, more
importantly, reduces C-stack usage by a couple of words (8 bytes on Thumb2
and 16 bytes on x86-64) for every (non-generator) call of a bytecode
function because fun_bc_call no longer needs to remember the n_state
variable.
This commit implements PEP 479, which disallows raising StopIteration inside
a generator to signal that it should be finished. Instead, the generator
should simply return when it is complete.
See https://www.python.org/dev/peps/pep-0479/ for details.
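A small sketch of the behaviour PEP 479 describes, as seen from Python code:
a StopIteration raised inside a generator body is turned into a
RuntimeError, while a plain return finishes the generator cleanly. The
helper names old_style, new_style and drain are made up for this example.

```python
def old_style():
    yield 1
    raise StopIteration  # pre-PEP 479 way of finishing a generator


def new_style():
    yield 1
    return  # the supported way to finish


def drain(gen_func):
    # Collect all items, or report a RuntimeError raised while iterating.
    try:
        return list(gen_func()), None
    except RuntimeError as exc:
        return None, type(exc).__name__


old_result = drain(old_style)
new_result = drain(new_style)
```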
With the recent change b488a4a848, a
generating function now has the same layout in memory as a normal bytecode
function, and so can reuse the latter's attribute accessor code to
implement __name__.
For generating functions there is no need to wrap the bytecode function in
a generator wrapper instance. Instead the type of the bytecode function
can be changed to mp_type_gen_wrap. This reduces code size and saves a
block of GC heap RAM for each generator.
The code_state.old_globals variable is there to save the globals state so
should be used for this purpose, to avoid the need for additional local
variables on the C stack.
This implements the .pend_throw(exc) method, which sets up an exception to
be triggered on the next call to the generator's .__next__() or .send()
method. This is unlike .throw(), which immediately starts to execute the
generator to process the exception. This effectively adds Future-like
capabilities to the generator protocol (the exception will be raised in the
future).
The need for such a method arose in order to implement the uasyncio
wait_for() function efficiently (its behavior is clearly "Future"-like, and
normally it would require introducing an expensive Future wrapper around all
native coroutines, like upstream asyncio does).
py/objgenerator: pend_throw: Return previous pended value.
This effectively allows storing an additional value (not necessarily an
exception) in a coroutine while it's not being executed. uasyncio has
exactly this use case: to mark a coro waiting in the I/O queue (and thus
not executed in the normal scheduling queue), for the purpose of
implementing the wait_for() function (cancellation of such a waiting coro
by a timeout).
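The semantics described above can be approximated in pure Python. This is
only an emulation for illustration: PendingThrowGen is a hypothetical
wrapper class, whereas the real method lives on MicroPython generator
objects. How the real implementation treats a pended non-exception value on
resume is not reproduced here; the sketch assumes it is simply cleared.

```python
class PendingThrowGen:
    """Hypothetical wrapper emulating pend_throw() on a plain generator."""

    def __init__(self, gen):
        self._gen = gen
        self._pending = None

    def pend_throw(self, value):
        # Store the value without running the generator, and hand back
        # whatever was pended before.
        previous = self._pending
        self._pending = value
        return previous

    def send(self, value):
        pending, self._pending = self._pending, None
        if isinstance(pending, BaseException):
            # The pended exception is raised only now, on resume.
            return self._gen.throw(pending)
        # Assumption of this sketch: non-exception markers are dropped.
        return self._gen.send(value)

    def __next__(self):
        return self.send(None)


def worker():
    try:
        while True:
            yield "tick"
    except ValueError as exc:
        yield "cancelled: " + str(exc)


task = PendingThrowGen(worker())
step1 = next(task)
prev1 = task.pend_throw(False)                  # mark as "waiting"
prev2 = task.pend_throw(ValueError("timeout"))  # replace marker
step2 = next(task)
```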
This commit essentially reverts aa9dbb1b03
where this if-condition was added. It seems that even when that commit
was made the code was never reached by any tests, nor reachable by
analysis (see below). The same is true with the code as it currently
stands: no test triggers this if-condition, nor any uasyncio examples.
Analysing the flow of the program also shows that it's not reachable:
==START==
-> to trigger this if condition mp_execute_bytecode() must return
   MP_VM_RETURN_YIELD with *sp==MP_OBJ_STOP_ITERATION
   -> mp_execute_bytecode() can only return MP_VM_RETURN_YIELD from the
      MP_BC_YIELD_VALUE bytecode, which can happen in 2 ways:
      -> 1) from a "yield <x>" in bytecode, but <x> must always be a proper
            object, never MP_OBJ_STOP_ITERATION; ==END1==
      -> 2) via yield from, via mp_resume() which must return
            MP_VM_RETURN_YIELD with ret_value==MP_OBJ_STOP_ITERATION, which
            can happen in 3 ways:
            -> 1) it delegates to mp_obj_gen_resume(); go back to ==START==
            -> 2) it returns MP_VM_RETURN_YIELD directly but with a guard that
                  ret_val!=MP_OBJ_STOP_ITERATION; ==END2==
            -> 3) it returns MP_VM_RETURN_YIELD with ret_val set from
                  mp_call_method_n_kw(), but mp_call_method_n_kw() must return a
                  proper object, never MP_OBJ_STOP_ITERATION; ==END3==
The above shows there is no way to trigger the if-condition and it can be
removed.
Header files that are considered internal to the py core and should not
normally be included directly are:
    py/nlr.h - internal nlr configuration and declarations
    py/bc0.h - contains bytecode macro definitions
    py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
    py/obj.h - includes runtime0.h and defines everything to use the
        mp_obj_t type
    py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
        and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
Taking the address of a local variable leads to increased stack usage, so
the mp_decode_uint_skip() function is added to reduce the need for taking
addresses. The changes in this patch reduce stack usage of a Python call
by 8 bytes on ARM Thumb, by 16 bytes on non-windowing Xtensa archs, and by
16 bytes on x86-64. Code size is also slightly reduced on most archs by
around 32 bytes.
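To make the idea concrete, here is a Python sketch of decoding and skipping
a variable-length uint. The byte layout is an assumption of this sketch (7
payload bits per byte, most significant group first, high bit set on every
byte except the last), and the function names only mirror the C ones.
Returning the new position by value is the Python analogue of not having to
pass the address of a local pointer.

```python
def decode_uint(buf, pos):
    """Decode one uint starting at pos; return (value, new_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not (byte & 0x80):  # high bit clear marks the last byte
            return value, pos


def decode_uint_skip(buf, pos):
    """Step over one encoded uint without computing its value."""
    while buf[pos] & 0x80:
        pos += 1
    return pos + 1


data = bytes([0x81, 0x00, 0x05])  # encodes 128, then 5
first = decode_uint(data, 0)
after_first = decode_uint_skip(data, 0)
second = decode_uint(data, after_first)
```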
Instead of caching data that is constant (code_info, const_table and
n_state), store just a pointer to the underlying function object from which
this data can be derived.
This helps reduce stack usage for the case when the mp_code_state_t
structure is stored on the stack, as well as heap usage when it's stored
on the heap.
The downside is that the VM becomes a little more complex because it now
needs to derive the data from the underlying function object. But this
doesn't impact the performance by much (if at all) because most of the
decoding of data is done outside the main opcode loop. Measurements using
pystone show that little to no performance is lost.
This patch also fixes a nasty bug whereby the bytecode can be reclaimed by
the GC during execution. With this patch there is always a pointer to the
function object held by the VM during execution, since it's stored in the
mp_code_state_t structure.
Allows iterating over the following without allocating on the heap:
- tuple
- list
- string, bytes
- bytearray, array
- dict (not dict.keys, dict.values, dict.items)
- set, frozenset
Allows calling the following without allocating heap memory:
- all, any, min, max, sum
TODO: still need to allocate stack memory in bytecode for iter_buf.
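For reference, these are the kinds of Python constructs the lists above
refer to. The sketch only shows the constructs; whether they actually avoid
the heap depends on the MicroPython build, and it cannot be observed when
run on CPython. The variable names are made up for this example.

```python
samples = (3, 1, 4, 1, 5)         # tuple
flags = [True, True, False]       # list
name = "upy"                      # string
raw = bytearray(b"\x01\x02\x03")  # bytearray

# for-loops over built-in containers
total = 0
for x in samples:
    total += x

letters = 0
for ch in name:
    letters += 1

# builtins that consume an iterable
results = (all(flags), any(flags), min(samples), max(samples), sum(raw))
```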
Checks for the number of args are removed where guaranteed by the function
descriptor, and self checking is replaced with mp_check_self(). In a few
cases, an exception is raised instead of asserting.
This patch changes the type signature of the .make_new and .call object
method slots to use size_t for n_args and n_kw (was mp_uint_t). This makes
code more efficient when mp_uint_t is larger than a machine word. It doesn't
affect ports where size_t and mp_uint_t have the same size.
This allows the mp_obj_t type to be configured to something other than a
pointer-sized primitive type.
This patch also includes additional changes to allow the code to compile
when sizeof(mp_uint_t) != sizeof(void*), such as using size_t instead of
mp_uint_t, and various casts.
Unfortunately, MP_OBJ_STOP_ITERATION has no means to pass an associated
value, so we can't optimize a StopIteration exception with a (non-None)
argument to MP_OBJ_STOP_ITERATION.
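At the Python level the distinction looks like this: a generator that
returns a value finishes with a StopIteration carrying that value, which a
bare sentinel could not represent, while one that just falls off the end
carries None. The function names are made up for this example.

```python
def compute():
    yield "working"
    return 42  # surfaces as StopIteration with value 42


def finish_plain():
    yield "working"
    # falls off the end: StopIteration with value None


def run(gen):
    # Drive the generator to completion and return the StopIteration value.
    try:
        while True:
            next(gen)
    except StopIteration as stop:
        return stop.value


with_value = run(compute())
without_value = run(finish_plain())
```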
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined per port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how strings are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (fewer layers
to go through, fewer arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
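The shape of this interface can be sketched in Python. This is a loose
analogue, not the C API: the Print class stands in for mp_print_t, and the
argument order of the print_strn callback (data, string, length) is an
assumption of this sketch. It shows one backend-defining callback, with the
string and formatted paths both funnelling into it.

```python
class Print:
    """Hypothetical stand-in for mp_print_t: opaque data plus print_strn."""

    def __init__(self, data, print_strn):
        self.data = data
        self.print_strn = print_strn


def print_str(print_obj, s):
    # Direct path: hand the string straight to the backend.
    print_obj.print_strn(print_obj.data, s, len(s))
    return len(s)


def printf(print_obj, fmt, *args):
    # Formatting path: build the text, then reuse the direct path.
    return print_str(print_obj, fmt % args)


def list_strn(data, s, length):
    # Example backend that collects output in a list.
    data.append(s[:length])


captured = []
to_list = Print(captured, list_strn)
print_str(to_list, "hello ")
printf(to_list, "%s=%d", "n_state", 3)
output = "".join(captured)
```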
This patch gets full function argument passing working with the native
emitter. This includes named args, keyword args, default args, var args
and var keyword args. It is fully Python compliant.
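For illustration, a single function signature that uses every argument form
listed above. It is plain Python so that it also runs on CPython; under
MicroPython the function would additionally carry the @micropython.native
decorator. The name describe is made up for this example.

```python
def describe(a, b=2, *args, c, d=4, **kwargs):
    # a: positional, b: default, *args: var args,
    # c: keyword-only, d: keyword-only with default, **kwargs: var kw args
    return (a, b, args, c, d, kwargs)


minimal = describe(1, c=3)
full = describe(1, 20, 30, 40, c=3, d=5, e=6)
```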
It reuses the bytecode mp_setup_code_state function to do all the hard
work. This function is slightly adjusted to accommodate native calls,
and the native emitter is made to emit a prelude and code-info similar
to those of bytecode.
This saves a lot of RAM for 2 reasons:
1. For functions that don't have default values, var args or var kw
args (which is a large number of functions in the general case), the
mp_obj_fun_bc_t type now fits in 1 GC block (previously needed 2 because
of the extra pointer to point to the arg_names array). So this saves 16
bytes per function (32 bytes on 64-bit machines).
2. Combining separate memory regions generally saves RAM because the
unused bytes at the end of the GC block are saved for 1 of the blocks
(since that block doesn't exist on its own anymore). So generally this
saves 8 bytes per function.
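The second point can be checked with a little arithmetic. The sketch below
assumes a GC block of 16 bytes (the 32-bit figure implied above) and uses
made-up region sizes; it is not a measurement of real objects. Merging two
allocations saves either nothing or one whole block, and averaged over all
tail sizes the saving comes out at about half a block.

```python
BLOCK = 16  # assumed GC block size in bytes on a 32-bit machine


def allocated(size, block=BLOCK):
    # Bytes actually consumed: size rounded up to whole blocks.
    return -(-size // block) * block


def saving(a, b, block=BLOCK):
    # Bytes saved by allocating a and b as one region instead of two.
    return allocated(a, block) + allocated(b, block) - allocated(a + b, block)


# Hypothetical sizes: a 20-byte object plus a 12-byte array.
example = saving(20, 12)  # 32 + 16 separately, 32 combined

# Average over every combination of tail sizes from 1 to 16 bytes.
sizes = range(1, BLOCK + 1)
average = sum(saving(a, b) for a in sizes for b in sizes) / (BLOCK * BLOCK)
```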
Tested by importing lots of modules:
- 64-bit Linux gave about an 8% RAM saving for 86k of used RAM.
- pyboard gave about a 6% RAM saving for 31k of used RAM.
Code-info size, block name, source name, n_state and n_exc_stack now use
variable length encoded uints. This saves 7-9 bytes per bytecode
function for most functions.
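A sketch of why variable-length encoding helps: small values, which are the
common case for these fields, fit in a single byte. The byte layout is an
assumption of this sketch (7 payload bits per byte, most significant group
first, high bit set on every byte except the last), and encode_uint is a
hypothetical helper, not a function from the source tree.

```python
def encode_uint(value):
    """Encode a non-negative int as a variable-length byte string."""
    groups = [value & 0x7F]  # last byte: high bit clear
    value >>= 7
    while value:
        groups.append(0x80 | (value & 0x7F))  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(groups))


small = encode_uint(5)
medium = encode_uint(300)
lengths = [len(encode_uint(v)) for v in (0, 127, 128, 16383, 16384)]
```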
This improves stack usage in callers of mp_execute_bytecode2, and is a step
towards unifying the execution interface for functions and generators
(which is important because generators don't even support the full forms
of argument passing (keywords, etc.)).