This commit parallelizes almost everything that can currently be
parallelized. With that, it also introduces a framework for easily
parallelizing other parts of the compiler.
Code for baremetal targets already compiles slightly faster because it
can parallelize the compilation of supporting assembly files. However,
the speedup is especially noticeable when libraries (compiler-rt,
picolibc) also need to be compiled: they will be compiled in parallel
next to the Go files using all available cores. On my dual core laptop
(4 cores if you count hyperthreading) this cuts compilation time roughly
in half when compiling something for a Cortex-M board after running
`tinygo clean`.
Eventually we might want to start using the FPU, but the easy option
right now is to simply disable it everywhere. Previously, it depended on
whether Clang was built as part of TinyGo or it was an external binary.
By setting the floating point mode explicitly, such inconsistencies are
avoided.
This commit creates a new cortex-m4 target which can be the central
place for setting FPU-related settings across all Cortex-M4 chips.
The frame pointer was already omitted in the object files that TinyGo
emits, but wasn't yet omitted in the C files it compiles. Omitting the
frame pointer is good for code size (and perhaps performance).
The frame pointer was originally used for printing stack traces in a
debugger. However, advances in DWARF debug info have made it largely
unnecessary (debug info contains enough information now to recover the
frame pointer even without an explicit frame pointer register). In fact,
GDB has been able to produce backtraces in TinyGo compiled code for a
while now while it didn't include a frame pointer.
The main change is in building the libraries, where -fshort-enums was
passed on RISC-V while other C files weren't compiled with this setting.
Note: the test already passed before this change, but it seems like a
good idea to explicitly test for enum size consistency.
There is also not a particular reason not to pass -fshort-enums on
RISC-V. Perhaps it's better to do it there too (on baremetal targets
that don't have to worry about binary compatibility).
This refactor makes adding a new library (such as a libc) much easier in
the future as it avoids a lot of duplicate code. Additionally, CI should
become a little bit faster (~15s) as build-builtins now uses the build
cache.