This patch is a code optimisation, trading text bytes for speed. On
pyboard it's an increase of 0.06% in code size for a gain (in pystone
performance) of roughly 6.5%.
The patch optimises load/store/delete of attributes in user defined classes
by not looking up special accessors (@property, __get__, __delete__,
__set__, __setattr__ and __getattr_) if they are guaranteed not to exist in
the class.
Currently, if you do my_obj.foo() then the runtime has to do a few checks
to see if foo is a property or has __get__, and if so delegate the call.
And for stores things like my_obj.foo = 1 has to first check if foo is a
property or has __set__ defined on it.
Doing all those checks each and every time the attribute is accessed has a
performance penalty. This patch eliminates all those checks for cases when
it's guaranteed that the checks will always fail, ie no attributes are
properties nor have any special accessor methods defined on them.
To make this guarantee it checks all attributes of a user-defined class
when it is first created. If any of the attributes of the user class are
properties or have special accessors, or any of the base classes of the
user class have them, then it sets a flag in the class to indicate that
special accessors must be checked for. Then in the load/store/delete code
it checks this flag to see if it can take the shortcut and optimise the
lookup.
It's an optimisation that's pretty widely applicable because it improves
lookup performance for all methods of user defined classes, and stores of
attributes, at least for those that don't have special accessors. And, it
allows to enable descriptors with minimal additional runtime overhead if
they are not used for a particular user class.
There is one restriction on dynamic class creation that has been introduced
by this patch: a user-defined class cannot go from zero special accessors
to one special accessor (or more) after that class has been subclassed. If
the script attempts this an AttributeError is raised (see addition to
tests/misc/non_compliant.py for an example of this case).
The cost in code space bytes for the optimisation in this patch is:
unix x64: +528
unix nanbox: +508
stm32: +192
cc3200: +200
esp8266: +332
esp32: +244
Performance tests that were done:
- on unix x86-64, pystone improved by about 5%
- on pyboard, pystone improved by about 6.5%, from 1683 up to 1794
- on pyboard, bm_chaos (from CPython benchmark suite) improved by about 5%
- on esp32, pystone improved by about 30% (but there are caching effects)
- on esp32, bm_chaos improved by about 11%
This VFS component allows to mount a host POSIX filesystem within the uPy
VFS sub-system. All traditional POSIX file access then goes through the
VFS, allowing to sandbox a uPy process to a certain sub-dir of the host
system, as well as mount other filesystem types alongside the host
filesystem.
Since a long time now, mp_obj_type_t no longer refers explicitly to
mp_stream_p_t but rather to an abstract "const void *protocol". So there's
no longer any need to define mp_stream_p_t in obj.h and it can go with all
its associated definitions in stream.h. Pretty much all users of this type
will already include the stream header.
The code_state.old_globals variable is there to save the globals state so
should be used for this purpose, to avoid the need for additional local
variables on the C stack.
Without this, if GC threshold is hit and there is not enough memory left to
satisfy the request, gc_collect() will run a second time and the search for
memory will happen again and will fail again.
Thanks to @adritium for pointing out this issue, see #3786.
Under ubsan, when evaluating hash(-0.) the following diagnostic occurs:
../../py/objfloat.c:102:15: runtime error: negation of
-9223372036854775808 cannot be represented in type 'mp_int_t' (aka
'long'); cast to an unsigned type to negate this value to itself
So do just that, to tell the compiler that we want to perform this
operation using modulo arithmetic rules.
Before this, ubsan would detect a problem when executing
hash(006699999999999999999999999999999999999999999999999999999999999999999999)
../../py/mpz.c:1539:20: runtime error: left shift of 1067371580458 by
32 places cannot be represented in type 'mp_int_t' (aka 'long')
When the overflow does occur it now happens as defined by the rules of
unsigned arithmetic.
When computing e.g. hash(0.4e3) with ubsan enabled, a diagnostic like the
following would occur:
../../py/objfloat.c:91:30: runtime error: shift exponent 44 is too
large for 32-bit type 'int'
By casting constant "1" to the right type the intended value is preserved.
Fuzz testing combined with the undefined behavior sanitizer found that
parsing unreasonable float literals like 1e+9999999999999 resulted in
undefined behavior due to overflow in signed integer arithmetic, and a
wrong result being returned.
There is no need to use the mp_int_t type which may be 64-bits wide, there
is enough bit-width in a normal int to parse reasonable exponents. Using
int helps to reduce code size for 64-bit ports, especially nan-boxing
builds. (Similarly for the "dig" variable which is now an unsigned int.)
Calling memset(NULL, value, 0) is not standards compliant so we must add an
explicit check that emit->label_offsets is indeed not NULL before calling
memset (this pointer will be NULL on the first pass of the parse tree and
it's more logical / safer to check this pointer rather than check that the
pass is not the first one).
Code sanitizers will warn if NULL is passed as the first value to memset,
and compilers may optimise the code based on the knowledge that any pointer
passed to memset is guaranteed not to be NULL.
Before this patch:
>>> print(')
... ')
Traceback (most recent call last):
File "<stdin>", line 1
SyntaxError: invalid syntax
After this patch:
>>> print(')
Traceback (most recent call last):
File "<stdin>", line 1
SyntaxError: invalid syntax
This matches CPython and prevents getting stuck in REPL continuation when a
1-quote is unmatched.
Before this patch, when using the switch statement for dispatch in the VM
(not computed goto) a pending exception check was done after each opcode.
This is not necessary and this patch makes the pending exception check only
happen when explicitly requested by certain opcodes, like jump. This
improves performance of the VM by about 2.5% when using the switch.
This patch fixes the macro so you can pass any name in, and the macro will
make more sense if you're reading it on its own. It worked previously
because n_state is always passed in as n_state_out_var.
gcc 8.0 supports the naked attribute for x86 systems so it can now be used
here. And in fact it is necessary to use this for nlr_push because gcc 8.0
no longer generates a prelude for this function (even without the naked
attribute).
This patch moves the start of the root pointer section in mp_state_ctx_t
so that it skips entries that are not pointers and don't need scanning.
Previously, the start of the root pointer section was at the very beginning
of the mp_state_ctx_t struct (which is the beginning of mp_state_thread_t).
This was the original assembler version of the NLR code was hard-coded to
have the nlr_top pointer at the start of this state structure. But now
that the NLR code is partially written in C there is no longer this
restriction on the location of nlr_top (and a comment to this effect has
been removed in this patch).
So now the root pointer section starts part way through the
mp_state_thread_t structure, after the entries which are not root pointers.
This patch also moves the non-pointer entries for MICROPY_ENABLE_SCHEDULER
outside the root pointer section.
Moving non-pointer entries out of the root pointer section helps to make
the GC more precise and should help to prevent some cases of collectable
garbage being kept.
This patch also has a measurable improvement in performance of the
pystone.py benchmark: on unix x86-64 and stm32 there was an improvement of
roughly 0.6% (tested with both gcc 7.3 and gcc 8.1).
This patch changes 2 things in the endianness detection:
1. Don't assume that __BYTE_ORDER__ not being __ORDER_LITTLE_ENDIAN__ means
that the machine is big endian, so add an explicit check that this macro
is indeed __ORDER_BIG_ENDIAN__ (same with __BYTE_ORDER, __LITTLE_ENDIAN
and __BIG_ENDIAN). A machine could have PDP endianness.
2. Remove the checks which base their autodetection decision on whether any
little or big endian macros are defined (eg __LITTLE_ENDIAN__ or
__BIG_ENDIAN__). Just because a system defines these does not mean it
has that endianness.
See issue #3760.
For cases where size_t is smaller than mp_int_t (eg nan-boxing builds) the
difference between two size_t's is not sign extended into mp_int_t and so
the result is never negative. This patch fixes this bug by using ssize_t
for the type of the result.
This gives dir() better behaviour when listing the attributes of a user
type that defines __getattr__: it will now not list those attributes for
which __getattr__ raises AttributeError (meaning the attribute is not
supported by the object).
This patch fixes the possibility of a crash of the REPL when tab-completing
an object which raises an exception when its attributes are accessed.
See issue #3729.
This new helper function acts like mp_load_method_maybe but is wrapped in
an NLR handler so it can catch exceptions. It prevents AttributeError from
propagating out, and optionally all other exceptions. This helper can be
used to fully implement hasattr (see follow-up commit), and also for cases
where mp_load_method_maybe is used but it must now raise an exception.