This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.
The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:
diff of scores (higher is better)
N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
diff of scores (higher is better)
N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion
about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
The PIO state machines on the RP2040 have 4 word deep TX and RX FIFOs. If
you only need one direction, you can "merge" them into either a single 8
word deep TX or RX FIFO.
We simply add constants to the PIO object, and set the appropriate bits in
`shiftctrl`.
Resolves#6854.
Signed-off-by: Tim Radvan <tim@tjvr.org>
There were a few changes that had broken this example, specifically
2cdf1d25f5 removed file.c from ports/unix.
And (at least for MacOS) mp_state_ctx must be placed in the BSS with
-fno-common so it is visible to the linker.
Signed-off-by: Santeri Paavolainen <santtu@iki.fi>
It's a bit of a pitfall with user C modules that including them in the
build does not automatically enable them. This commit changes the docs and
examples for user C modules to encourage writers of user C modules to
enable them unconditionally. This makes things simpler and covers most use
cases.
See discussion in issue #6960, and also #7086.
Signed-off-by: Damien George <damien@micropython.org>
examples/usercmodule/micropython.cmake:
Root micropython.cmake file is responsible for including modules.
examples/usercmodule/cexample/micropython.cmake:
examples/usercmodule/cppexample/micropython.cmake:
Module micropython.cmake files define the target and link it to usermod.
Signed-off-by: Phil Howard <phil@pimoroni.com>
The "word" referred to by BYTES_PER_WORD is actually the size of mp_obj_t
which is not always the same as the size of a pointer on the target
architecture. So rename this config value to better reflect what it
measures, and also prefix it with MP_.
For uses of BYTES_PER_WORD in setting the stack limit this has been
changed to sizeof(void *), because the stack usually grows with
machine-word sized values (eg an nlr_buf_t has many machine words in it).
Signed-off-by: Damien George <damien@micropython.org>
This commit adds a new port "rp2" which targets the new Raspberry Pi RP2040
microcontroller.
The build system uses pure cmake (with a small Makefile wrapper for
convenience). The USB driver is TinyUSB, and there is a machine module
with most of the standard classes implemented. Some examples are provided
in the examples/rp2/ directory.
Work done in collaboration with Graham Sanderson.
Signed-off-by: Damien George <damien@micropython.org>
This widens the characteristic/descriptor flags to 16-bit, to allow setting
encryption/authentication requirements.
Sets the required flags for NimBLE and btstack implementations.
The BLE.FLAG_* constants will eventually be deprecated in favour of copy
and paste Python constants (like the IRQs).
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Add working example code to provide a starting point for users with files
that they can just copy, and include the modules in the coverage test to
verify the complete user C module build functionality. The cexample module
uses the code originally found in cmodules.rst, which has been updated to
reflect this and partially rewritten with more complete information.
Updating to Black v20.8b1 there are two changes that affect the code in
this repository:
- If there is a trailing comma in a list (eg [], () or function call) then
that list is now written out with one line per element. So remove such
trailing commas where the list should stay on one line.
- Spaces at the start of """ doc strings are removed.
Signed-off-by: Damien George <damien@micropython.org>
This commit adds the IRQ_GATTS_INDICATE_DONE BLE event which will be raised
with the status of gatts_indicate (unlike notify, indications require
acknowledgement).
An example of its use is added to ble_temperature.py, and to the multitests
in ble_characteristic.py.
Implemented for btstack and nimble bindings, tested in both directions
between unix/btstack and pybd/nimble.
According to Supplement to the Bluetooth Core Specification v8 Part A
1.3.1, to support BR/EDR the code should set the fifth bit (Simultaneous LE
and BR/EDR to Same Device Capable (Controller)) and fourth bit
(Simultaneous LE and BR/EDR to Same Device Capable (Host)) of the flag.
This commit makes sure that all discovery complete and read/write status
events set the status to zero on success.
The status value will be implementation-dependent on non-success cases.
On btstack there's no status associated with the read result, it comes
through as a separate event. This allows you to detect read failures or
timeouts.
There doesn't appear to be any use for only triggering on specific events,
so it's just easier to number them sequentially. This makes them smaller
values so they take up only 1 byte in the ringbuf, only 1 byte for the
opcode in the bytecode, and makes room for more events.
Also add a couple of new event types that need to be implemented (to avoid
re-numbering later).
And rename _COMPLETE and _STATUS to _DONE for consistency.
In the future the "trigger" keyword argument can be reinstated by requiring
the user to compute the bitmask, eg:
ble.irq(handler, 1 << _IRQ_SCAN_RESULT | 1 << _IRQ_SCAN_DONE)
For ports that have a system malloc which is not garbage collected (eg
unix, esp32), the stream object for the DB must be retained separately to
prevent it from being reclaimed by the MicroPython GC (because the
berkeley-db library uses malloc to allocate the DB structure which stores
the only reference to the stream).
Although in some cases the user code will explicitly retain a reference to
the underlying stream because it needs to call close() on it, this is not
always the case, eg in cases where the DB is intended to live forever.
Fixes issue #5940.
No functionality change is intended with this commit, it just consolidates
the separate implementations of GC helper code to the lib/utils/ directory
as a general set of helper functions useful for any port. This reduces
duplication of code, and makes it easier for future ports or embedders to
get the GC implementation correct.
Ports should now link against gchelper_native.c and either gchelper_m0.s or
gchelper_m3.s (currently only Cortex-M is supported but other architectures
can follow), or use the fallback gchelper_generic.c which will work on
x86/x64/ARM.
The gc_helper_get_sp function from gchelper_m3.s is not really GC related
and was only used by cc3200, so it has been moved to that port and renamed
to cortex_m3_get_sp.