Current users of fixed vstr buffers (building file paths) assume that there
is no overflow and do not check for overflow after building the vstr. This
has the potential to lead to NULL pointer dereferences
(when vstr_null_terminated_str returns NULL because it can't allocate RAM
for the terminating byte) and stat'ing and loading invalid path names (due
to the path being truncated). The safest and simplest thing to do in these
cases is just raise an exception if a write goes beyond the end of a fixed
vstr buffer, which is what this patch does. It also simplifies the vstr
code.
Now there is just one function to allocate a new vstr, namely vstr_new
(in addition to vstr_init etc). The caller of this function should know
what initial size to allocate for the buffer, or at least have some policy
or config option, instead of leaving it to a default (as it was before).
The vstr.had_error flag was a relic from the very early days which assumed
that the malloc functions (eg m_new, m_renew) returned NULL if they failed
to allocate. But that's no longer the case: these functions will raise an
exception if they fail.
Since it was impossible for had_error to be set, this patch introduces no
change in behaviour.
An alternative option would be to change the malloc calls to the _maybe
variants, which return NULL instead of raising, but then a lot of code
will need to explicitly check if the vstr had an error and raise if it
did.
The code-size savings for this patch are, in bytes: bare-arm:188,
minimal:456, unix(NDEBUG,x86-64):368, stmhal:228, esp8266:360.
Effect measured on esp8266 port:
Before:
>>> pystone_lowmem.main(10000)
Pystone(1.2) time for 10000 passes = 44214 ms
This machine benchmarks at 226 pystones/second
>>> pystone_lowmem.main(10000)
Pystone(1.2) time for 10000 passes = 44246 ms
This machine benchmarks at 226 pystones/second
After:
>>> pystone_lowmem.main(10000)
Pystone(1.2) time for 10000 passes = 44343ms
This machine benchmarks at 225 pystones/second
>>> pystone_lowmem.main(10000)
Pystone(1.2) time for 10000 passes = 44376ms
This machine benchmarks at 225 pystones/second
vstr_null_terminated_str is almost certainly a vstr finalization operation,
so it should add the requested NUL byte, and not try to pre-allocate more.
The previous implementation could actually allocate double of the buffer
size.
I checked the entire codebase, and every place that vstr_init_len
was called, there was a call to mp_obj_new_str_from_vstr after it.
mp_obj_new_str_from_vstr always tries to reallocate a new buffer
1 byte larger than the original to store the terminating null
character.
In many cases, if we allocated the initial buffer to be 1 byte
longer, we can prevent this extra allocation, and just reuse
the originally allocated buffer.
Asking to read 256 bytes and only getting 100 will still cause
the extra allocation, but if you ask to read 256 and get 256
then the extra allocation will be optimized away.
Yes - the reallocation is optimized in the heap to try and reuse
the buffer if it can, but it takes quite a few cycles to figure
this out.
Note by Damien: vstr_init_len should now be considered as a
string-init convenience function and used only when creating
null-terminated objects.
Previous to this patch the printing mechanism was a bit of a tangled
mess. This patch attempts to consolidate printing into one interface.
All (non-debug) printing now uses the mp_print* family of functions,
mainly mp_printf. All these functions take an mp_print_t structure as
their first argument, and this structure defines the printing backend
through the "print_strn" function of said structure.
Printing from the uPy core can reach the platform-defined print code via
two paths: either through mp_sys_stdout_obj (defined pert port) in
conjunction with mp_stream_write; or through the mp_plat_print structure
which uses the MP_PLAT_PRINT_STRN macro to define how string are printed
on the platform. The former is only used when MICROPY_PY_IO is defined.
With this new scheme printing is generally more efficient (less layers
to go through, less arguments to pass), and, given an mp_print_t*
structure, one can call mp_print_str for efficiency instead of
mp_printf("%s", ...). Code size is also reduced by around 200 bytes on
Thumb2 archs.
This cleans up vstr so that it's a pure "variable buffer", and the user
can decide whether they need to add a terminating null byte. In most
places where vstr is used, the vstr did not need to be null terminated
and so this patch saves code size, a tiny bit of RAM, and makes vstr
usage more efficient. When null termination is needed it must be
done explicitly using vstr_null_terminate.
With this patch str/bytes construction is streamlined. Always use a
vstr to build a str/bytes object. If the size is known beforehand then
use vstr_init_len to allocate only required memory. Otherwise use
vstr_init and the vstr will grow as needed. Then use
mp_obj_new_str_from_vstr to create a str/bytes object using the vstr
memory.
Saves code ROM: 68 bytes on stmhal, 108 bytes on bare-arm, and 336 bytes
on unix x64.
This patch allows to reuse vstr memory when creating str/bytes object.
This improves memory usage.
Also saves code ROM: 128 bytes on stmhal, 92 bytes on bare-arm, and 88
bytes on unix x64.
It seems most sensible to use size_t for measuring "number of bytes" in
malloc and vstr functions (since that's what size_t is for). We don't
use mp_uint_t because malloc and vstr are not Micro Python specific.
Blanket wide to all .c and .h files. Some files originating from ST are
difficult to deal with (license wise) so it was left out of those.
Also merged modpyb.h, modos.h, modstm.h and modtime.h in stmhal/.
vstr is initially intended to deal with arbitrary-length strings. By
providing a bit lower-level API calls, it will be also useful to deal
with arbitrary-length I/O buffers (the difference from strings is that
buffers are filled from "outside", via I/O).
Another issue, especially aggravated by I/O buffer use, is alloc size
vs actual size length. If allocated 1Mb for buffer, but actually
read 1 byte, we don't want to keep rest of 1Mb be locked by this I/O
result, but rather return it to heap ASAP ("shrink" buffer before passing
it to qstr_from_str_take()).