This commit makes the output of `tinygo test` similar to that of `go
test`. It changes the following things in the process:
* Running multiple tests in a single command is now possible. They
aren't paralellized yet.
* Packages with no test files won't crash TinyGo, instead it logs it
in the same way the Go toolchain does.
With this is possible to enable e.g., SIMD in WASM using -llvm-features
+simd128. Multiple features can be specified separated by comma,
e.g., -llvm-features +simd128,+tail-call
With help from @deadprogram and @aykevl.
At the moment, all targets use the Clang compiler to compile C and
assembly files. There is no good reason to make this configurable
anymore and in fact it will make future changes more complicated (and
thus more likely to have bugs). Therefore, I've removed support for
setting the compiler.
Note that the same is not true for the linker. While it makes sense to
standardize on the Clang compiler (because if Clang doesn't support a
target, TinyGo is unlikely to support it either), linkers will remain
configurable for the foreseeable future. One example is Xtensa, which is
supported by the Xtensa LLVM fork but doesn't have support in ld.lld
yet.
I've also fixed a bug in compileAndCacheCFile: it wasn't using the right
CFlags for caching purposes. This could lead to using stale caches. This
commit fixes that too.
The -Qunused-arguments flag disables the warning where some flags are
not relevant to a compilation. This commonly happens when compiling
assembly files (.s or .S files) because some flags are specific to C and
not relevant to assembly.
Because practically all baremetal targets need some form of assembly,
this flag is added to most CFlags. This creates a lot of noise. And it
is also added for compiling C code where it might hide bugs (by hiding
the fact a flag is actually unused).
This commit adds the flag to all assembly compilations and removes them
from all target JSON files.
This commit implements replacing some global variables with a different
value, if the global variable has no initializer. For example, if you
have:
package main
var version string
you can replace the value with -ldflags="-X main.version=0.2".
Right now it only works for uninitialized globals. The Go tooling also
supports initialized globals (var version = "<undefined>") but that is a
bit hard to combine with how initialized globals are currently
implemented.
The current implementation still allows caching package IR files while
making sure the values don't end up in the build cache. This means
compiling a program multiple times with different values will use the
cached package each time, inserting the string value only late in the
build process.
Fixes#1045
This should result in a small compile time reduction for incremental
builds, somewhere around 5-9%.
This commit, while small, required many previous commits to not regress
binary size. Right now binary size is basically identical with very few
changes in size (the only baremetal program that changed in size did so
with a 4 byte increase).
This commit is one extra step towards doing as much work as possible in
the parallel and cached package build step, out of the serial LTO phase.
Later improvements in this area have this change as a prerequisite.
This simplifies future changes. While the move itself is very simple, it
required some other changes to a few transforms that create new
functions to add the optsize attribute manually. It also required
abstracting away the optimization level flags (based on the -opt flag)
so that it can easily be retrieved from the config object.
This commit does not impact binary size on baremetal and WebAssembly.
I've seen a few tests on linux/amd64 grow slightly in size, but I'm not
too worried about those.
This patch adds support for passing CFLAGS added in #cgo lines of the
CGo preprocessing phase to the compiler when compiling C files inside
packages. This is expected and convenient but didn't work before.
This probably won't speed up the build on multicore systems (the build
is still dominated by the whole-program optimization step) but should be
useful at a later date for other optimizations. For example, I intend to
eventually optimize each package individually including C files, which
should enable cross-language optimizations (inlining C functions into Go
functions, for example). For that to work, accurate dependency tracking
is important.
Add a 'result' member to the compileJob struct which is used by the link
job to get all the paths that should be linked together. This is not yet
necessary (the paths are fixed), but soon the paths are only known after
a linker dependency has run.
This results in a significant speedup in some cases. For example, this
runs over twice as fast with a warm cache:
tinygo build -o test.elf ./testdata/stdlib.go
This should help a lot with edit-compile-test cycles, that typically
only modify a single package.
This required some changes to the interp package to deal with globals
created in a previous run of the interp package and to deal with
external globals (that can't be loaded from or stored to).
This commit switches from the previous behavior of compiling the whole
program at once, to compiling every package in parallel and linking the
LLVM bitcode files together for further whole-program optimization.
This is a small performance win, but it has several advantages in the
future:
- There are many more things that can be done per package in parallel,
avoiding the bottleneck at the end of the compiler phase. This
should speed up the compiler futher.
- This change is a necessary step towards a non-LTO build mode for
fast incremental builds that only rebuild the changed package, when
compiler speed is more important than binary size.
- This change refactors the compiler in such a way that it will be
easier to inspect the IR for one package only. Inspecting this IR
will be very helpful for compiler developers.
This optimization level wasn't working before because some passes expect
some globals to be cleaned up afterwards. Cleaning these globals is
easy, just add the pass necessary for it. This shouldn't reduce the
usefulness of the -opt=0 build flag as most optimizations are still
skipped.
This way is more consistent with how picolibc is specified and allows
generating a helpful error message. This error message should never be
generated for TinyGo binary releases, only when doing local development.
This avoids external commands from finishing after the TinyGo command
exits. For example, when testing out compiler-rt on AVR, I got the
following error:
$ go install; and tinygo run -target=atmega1284p ./testdata/calls.go
[... Clang error removed ...]
error: failed to build /home/ayke/src/github.com/tinygo-org/tinygo/lib/compiler-rt/lib/builtins/extendsfdf2.c: exit status 1
error: unable to open output file '/tmp/tinygo361380649/build-lib-compiler-rt/ffsdi2.c.o': 'No such file or directory'
1 error generated.
That last error ("unable to open output file") is a spurious error
because the temporary directory has already been removed. This commit
waits until all running jobs have finished before returning, so that
these errors won't happen.
This flag is important for the Xtensa backend because by default a more
powerful backend (ESP32) is assumed. Therefore, compiling for the
ESP8266 won't work by default and needs the -mcpu flag.
This key was intended as some sort of cache key (as the name indicates)
but that never happened. Let's remove it to avoid clutter. The cacheLoad
and cacheStore functions are only used for C libraries (libc,
compiler-rt) so their caching behavior is a bit different from other
things worth caching.
Moving settings to a separate config struct has two benefits:
- It decouples the compiler a bit from other packages, most
importantly the compileopts package. Decoupling is generally a good
thing.
- Perhaps more importantly, it precisely specifies which settings are
used while compiling and affect the resulting LLVM module. This will
be necessary for caching the LLVM module.
While it would have been possible to cache without this refactor, it
would have been very easy to miss a setting and thus let the
compiler work with invalid/stale data.
This commit parallelizes almost everything that can currently be
parallelized. With that, it also introduces a framework for easily
parallelizing other parts of the compiler.
Code for baremetal targets already compiles slightly faster because it
can parallelize the compilation of supporting assembly files. However,
the speedup is especially noticeable when libraries (compiler-rt,
picolibc) also need to be compiled: they will be compiled in parallel
next to the Go files using all available cores. On my dual core laptop
(4 cores if you count hyperthreading) this cuts compilation time roughly
in half when compiling something for a Cortex-M board after running
`tinygo clean`.
This commit finally introduces unit tests for the compiler, to check
whether input Go code is converted to the expected output IR.
To make this necessary, a few refactors were needed. Hopefully these
refactors (to compile a program package by package instead of all at
once) will eventually become standard, so that packages can all be
compiled separate from each other and be cached between compiles.
This commit switches to LLVM 11 for builds with LLVM linked statically
(e.g. `make`). It does not yet switch the default for builds dynamically
linked to LLVM, that should be done in a later change.
This commit also changes to use the default host toolchain (probably
GCC) instead of Clang as the default compiler in CI. There were some
issues with Clang 3.8 in CI and hopefully this will fix it.
Additionally it updates the way LLVM is built on Windows, with
-DLLVM_ENABLE_PIC=OFF (which should have been used all along). This
change makes it possible to revert a hack to build libclang manually and
instead uses the libclang static library like on all other operating
systems, simplifying the Makefile.
On Fedora 33+, there is a buggy package that installs to
`/usr/lib64/clang/{version}/lib`, even on 32-bit systems. The original
code sees the `/usr/lib64/clang/{version}` directory, checks for an
`include` subdirectory, and then gives up because it doesn't exist.
To be more robust, check both `/usr/lib64/clang/{version}/include` and
`/usr/lib/clang/{version}/include`, and only allow versions that match
the LLVM major version used to build tinygo.
Unfortunately, the .rodata section can't be stored in flash. Instead, an
explicit .progmem section should be used, which is supported in LLVM as
address space 1 but not exposed to normal programs.
Eventually a pass should be written that converts trivial const globals
of which all loads are visible to be in addrspace 1, to get the benefits
of storing those globals directly in ROM.
This can be useful to test improvements in LLVM master and to make it
possible to support LLVM 11 for the most part already before the next
release. That also allows catching LLVM bugs early to fix them upstream.
Note that tests do not yet pass for this LLVM version, but the TinyGo
compiler can be built with the binaries from apt.llvm.org (at the time
of making this commit).
On 64-bit Fedora, `lib64` is where the clang headers are, not `lib`. For
multiarch systems, both will exist, but it's likely you want 64-bit, so
check that first.
Test binaries must be run in the source directory of the package to be
tested. This wasn't done, leading to a few "file not found" errors.
This commit implements this. Unfortunately, it does not allow more
packages to be tested as both affected packages (debug/macho and
debug/plan9obj) will still fail with this patch even though the "file
not found" errors are gone.
Right now this requires setting the -port parameter, but other than that
it totally works (if esptool.py is installed). It works by converting
the ELF file to the custom ESP32 image format and flashing that using
esptool.py.
Interrupts store 32 bytes on the current stack, which may be a goroutine
stack. After that the interrupt switches to the main stack pointer so
nothing more is pushed to the current stack. However, these 32 bytes
were not included in the stack size calculation.
This commit adds those 32 bytes. The code is rather verbose, but that is
intentional to make sure it is readable. This is tricky code that's hard
to get right, so I'd rather keep it well documented.
This is a big change that will determine the stack size for many
goroutines automatically. Functions that aren't recursive and don't call
function pointers can in many cases have an automatically determined
worst case stack size. This is useful, as the stack size is usually much
lower than the previous hardcoded default of 1024 bytes: somewhere
around 200-500 bytes is common.
A side effect of this change is that the default stack sizes (including
the stack size for other architectures such as AVR) can now be changed
in the config JSON file, making it tunable per application.
Currently, turning optimizations off causes compile failures.
We rely on the optimizer removing some dead symbols.
Avoid providing an option that does not work right now.
In the future once everything has been fixed we can re-enable this.
For now, this is just an extra flag that can be used to print stack
frame information, but this is intended to provide a way to determine
stack sizes for goroutines at compile time in many cases.
Stack sizes are often somewhere around 350 bytes so are in fact not all
that big usually. Once this can be determined at compile time in many
cases, it is possible to use this information when available and as a
result increase the fallback stack size if the size cannot be determined
at compile time. This should reduce stack overflows while at the same
time reducing RAM consumption in many cases.
Interesting output for testdata/channel.go:
function stack usage (in bytes)
Reset_Handler 332
.Lcommand-line-arguments.fastreceiver 220
.Lcommand-line-arguments.fastsender 192
.Lcommand-line-arguments.iterator 192
.Lcommand-line-arguments.main$1 184
.Lcommand-line-arguments.main$2 200
.Lcommand-line-arguments.main$3 200
.Lcommand-line-arguments.main$4 328
.Lcommand-line-arguments.receive 176
.Lcommand-line-arguments.selectDeadlock 72
.Lcommand-line-arguments.selectNoOp 72
.Lcommand-line-arguments.send 184
.Lcommand-line-arguments.sendComplex 192
.Lcommand-line-arguments.sender 192
.Lruntime.run$1 548
This shows that the stack size (if these numbers are correct) can in
fact be determined automatically in many cases, especially for small
goroutines. One of the great things about Go is lightweight goroutines,
and reducing stack sizes is very important to make goroutines
lightweight on microcontrollers.
This is necessary to avoid a circular dependency in the loader (which
soon will need to read the Go version) and because it seems like a
better place anyway.
Eventually we might want to start using the FPU, but the easy option
right now is to simply disable it everywhere. Previously, it depended on
whether Clang was built as part of TinyGo or it was an external binary.
By setting the floating point mode explicitly, such inconsistencies are
avoided.
This commit creates a new cortex-m4 target which can be the central
place for setting FPU-related settings across all Cortex-M4 chips.