This commit improves accuracy of the -size=full flag in a big way.
Instead of relying on symbol names to figure out by which package
symbols belong, it will instead mostly use DWARF debug information
(specifically, debug line tables and debug information for global
variables) relying on symbols only for some specific things. This is
much more accurate: it also accounts for inlined functions.
For example, here is how it looked previously when compiling a personal
project:
code rodata data bss | flash ram | package
1902 333 0 0 | 2235 0 | (bootstrap)
46 256 0 0 | 302 0 | github
0 454 0 0 | 454 0 | handleHardFault$string
154 24 4 4 | 182 8 | internal/task
2498 83 5 2054 | 2586 2059 | machine
0 16 24 130 | 40 154 | machine$alloc
1664 32 12 8 | 1708 20 | main
0 0 0 200 | 0 200 | main$alloc
2476 79 0 36 | 2555 36 | runtime
576 0 0 0 | 576 0 | tinygo
9316 1277 45 2432 | 10638 2477 | (sum)
11208 - 48 6548 | 11256 6596 | (all)
And here is how it looks now:
code rodata data bss | flash ram | package
------------------------------- | --------------- | -------
1509 0 12 23 | 1521 35 | (unknown)
660 0 0 0 | 660 0 | C compiler-rt
58 0 0 0 | 58 0 | C picolibc
0 0 0 4096 | 0 4096 | C stack
174 0 0 0 | 174 0 | device/arm
6 0 0 0 | 6 0 | device/sam
598 256 0 0 | 854 0 | github.com/aykevl/ledsgo
320 24 0 4 | 344 4 | internal/task
1414 99 24 2181 | 1537 2205 | machine
726 352 12 208 | 1090 220 | main
3002 542 0 36 | 3544 36 | runtime
848 0 0 0 | 848 0 | runtime/volatile
70 0 0 0 | 70 0 | time
550 0 0 0 | 550 0 | tinygo.org/x/drivers/ws2812
------------------------------- | --------------- | -------
9935 1273 48 6548 | 11256 6596 | total
There are some notable differences:
* Odd packages like main$alloc and handleHardFault$string are gone,
instead their code is put in the correct package.
* C libraries and the stack are now included in the list, they were
previously part of the (bootstrap) pseudo-package.
* Unknown bytes are slightly reduced. It should be possible to reduce
it significantly more in the future: most of it is now caused by
interface invoke wrappers.
* Inlined functions are now correctly attributed. For example, the
runtime/volatile package is normally entirely inlined.
* There is no difference between (sum) and (all) anymore. A better
code size algorithm now counts the code/data sizes correctly.
* And last (but not least) there is a stylistic change: the table now
looks more like a table. Especially the summary should be clearer
now.
Future goals:
* Improve debug information so that the (unknown) pseudo-package is
reduced in size or even eliminated altogether.
* Add support for other file formats, most importantly WebAssembly.
* Perhaps provide a way to expand this report per file, or in a
machine-readable format like JSON or CSV.
This commit has a few related changes:
* It sets the optsize attribute immediately in the compiler instead of
adding it to each function afterwards in a loop. This seems to me
like the more appropriate way to do it.
* It centralizes setting the optsize attribute in the transform
package, to make later changes easier.
* It sets the optsize in a few more places: to runtime.initAll and to
WebAssembly i64 wrappers.
This commit does not affect the binary size of any of the smoke tests,
so should be risk-free.
This adds support for stdio in picolibc and fixes wasm_exec.js so that
it can also support C puts. With this, C stdout works on all supported
platforms.
LLDB mostly works on most platforms, but it is still lacking in some
features. For example, it doesn't seem to support RISC-V yet (coming in
LLVM 12), it only partially supports AVR (no stacktraces), and it
doesn't seem to support the Ctrl-C keyboard command when running a
binary for another platform (e.g. with GOOS=arm64). However, it does
mostly work, even on baremetal systems.
... instead of setting a special -target= value. This is more robust and
makes sure that the test actually tests different arcitectures as they
would be compiled by TinyGo. As an example, the bug of the bugfix in the
previous commit ("arm: use armv7 instead of thumbv7") would have been
caught if this change was applied earlier.
I've decided to put GOOS/GOARCH in compileopts.Options, as it makes
sense to me to treat them the same way as command line parameters.
For example, the following did not work before but does work with this
change:
// int add(int a, int b) {
// return a + b;
// }
import "C"
func main() {
println("add:", C.add(3, 5))
}
Even better, the functions in the header are compiled together with the
rest of the Go code and so they can be optimized together! Currently,
inlining is not yet allowed but const-propagation across functions
works. This should be improved in the future.
This is a loose collection of small fixes flagged by staticcheck:
- dead code
- regexp expressions not using backticks (`foobar` / "foobar")
- redundant types of slice and map initializers
- misc other fixes
Not all of these seem very useful to me, but in particular dead code is
nice to fix. I've fixed them all just so that if there are problems,
they aren't hidden in the noise of less useful issues.
Instead of keeping a slice of jobs to run, let the runJobs function
determine which jobs should be run by investigating all dependencies.
This has two benefits:
- The code is somewhat cleaner, as no 'jobs' slice needs to be
maintained while constructing the dependency graph.
- Eventually, some jobs might not be required by any dependency.
While it's possible to avoid adding them to the slice, the simpler
solution is to build a new slice from the dependencies which will
only include required dependencies by design.
This change adds support for the ESP32-C3, a new chip from Espressif. It
is a RISC-V core so porting was comparatively easy.
Most peripherals are shared with the (original) ESP32 chip, but with
subtle differences. Also, the SVD file I've used gives some
peripherals/registers a different name which makes sharing code harder.
Eventually, when an official SVD file for the ESP32 is released, I
expect that a lot of code can be shared between the two chips.
More information: https://www.espressif.com/en/products/socs/esp32-c3
TODO:
- stack scheduler
- interrupts
- most peripherals (SPI, I2C, PWM, etc)
Stripping debug information at link time also allows relocation
compression (aka linker relaxations). Keeping debug information at
compile time and optionally stripping it at link time has some
advantages:
* Automatic stack sizes on Cortex-M rely on the presence of debug
information.
* Some parts of the compiler now rely on the presence of debug
information for proper diagnostics.
* It works better with the cache: there is no distinction between
debug and no-debug builds.
* It makes it easier (or possible at all) to enable debug information
in the wasi-libc library without big downsides.
Static libraries should be added at the end of the linker command, after
all object files. If that isn't done, that's _usually_ not a problem,
unless there are duplicate symbols. In that case, weird dependency
issues can arise. To solve that, object files (that may include symbols
to override symbols in the library) should be listed first on the
command line and then the static libraries should be listed.
This fixes an issue with overriding some symbols in wasi-libc.
On ARM, the stack has to be aligned to 8 bytes on function calls, but
not necessarily within a function. Leaf functions can take advantage of
this by not keeping the stack aligned so they can avoid pushing one
register. However, because regular functions might expect an aligned
stack, the interrupt controller will forcibly re-align the stack when an
interrupt happens in such a leaf function (controlled by the STKALIGN
flag, defaults to on). This means that stack size calculation (as used
in TinyGo) needs to make sure this extra space for stack re-alignment is
available.
This commit fixes this by aligning the stack size that will be used for
new goroutines.
Additionally, it increases the stack canary size from 4 to 8 bytes, to
keep the stack aligned. This is not strictly necessary but is required
by the AAPCS so let's do it anyway just to be sure.
This commit makes the output of `tinygo test` similar to that of `go
test`. It changes the following things in the process:
* Running multiple tests in a single command is now possible. They
aren't paralellized yet.
* Packages with no test files won't crash TinyGo, instead it logs it
in the same way the Go toolchain does.
With this is possible to enable e.g., SIMD in WASM using -llvm-features
+simd128. Multiple features can be specified separated by comma,
e.g., -llvm-features +simd128,+tail-call
With help from @deadprogram and @aykevl.
At the moment, all targets use the Clang compiler to compile C and
assembly files. There is no good reason to make this configurable
anymore and in fact it will make future changes more complicated (and
thus more likely to have bugs). Therefore, I've removed support for
setting the compiler.
Note that the same is not true for the linker. While it makes sense to
standardize on the Clang compiler (because if Clang doesn't support a
target, TinyGo is unlikely to support it either), linkers will remain
configurable for the foreseeable future. One example is Xtensa, which is
supported by the Xtensa LLVM fork but doesn't have support in ld.lld
yet.
I've also fixed a bug in compileAndCacheCFile: it wasn't using the right
CFlags for caching purposes. This could lead to using stale caches. This
commit fixes that too.
The -Qunused-arguments flag disables the warning where some flags are
not relevant to a compilation. This commonly happens when compiling
assembly files (.s or .S files) because some flags are specific to C and
not relevant to assembly.
Because practically all baremetal targets need some form of assembly,
this flag is added to most CFlags. This creates a lot of noise. And it
is also added for compiling C code where it might hide bugs (by hiding
the fact a flag is actually unused).
This commit adds the flag to all assembly compilations and removes them
from all target JSON files.
This commit implements replacing some global variables with a different
value, if the global variable has no initializer. For example, if you
have:
package main
var version string
you can replace the value with -ldflags="-X main.version=0.2".
Right now it only works for uninitialized globals. The Go tooling also
supports initialized globals (var version = "<undefined>") but that is a
bit hard to combine with how initialized globals are currently
implemented.
The current implementation still allows caching package IR files while
making sure the values don't end up in the build cache. This means
compiling a program multiple times with different values will use the
cached package each time, inserting the string value only late in the
build process.
Fixes#1045
This should result in a small compile time reduction for incremental
builds, somewhere around 5-9%.
This commit, while small, required many previous commits to not regress
binary size. Right now binary size is basically identical with very few
changes in size (the only baremetal program that changed in size did so
with a 4 byte increase).
This commit is one extra step towards doing as much work as possible in
the parallel and cached package build step, out of the serial LTO phase.
Later improvements in this area have this change as a prerequisite.
This simplifies future changes. While the move itself is very simple, it
required some other changes to a few transforms that create new
functions to add the optsize attribute manually. It also required
abstracting away the optimization level flags (based on the -opt flag)
so that it can easily be retrieved from the config object.
This commit does not impact binary size on baremetal and WebAssembly.
I've seen a few tests on linux/amd64 grow slightly in size, but I'm not
too worried about those.
This patch adds support for passing CFLAGS added in #cgo lines of the
CGo preprocessing phase to the compiler when compiling C files inside
packages. This is expected and convenient but didn't work before.
This probably won't speed up the build on multicore systems (the build
is still dominated by the whole-program optimization step) but should be
useful at a later date for other optimizations. For example, I intend to
eventually optimize each package individually including C files, which
should enable cross-language optimizations (inlining C functions into Go
functions, for example). For that to work, accurate dependency tracking
is important.
Add a 'result' member to the compileJob struct which is used by the link
job to get all the paths that should be linked together. This is not yet
necessary (the paths are fixed), but soon the paths are only known after
a linker dependency has run.
This results in a significant speedup in some cases. For example, this
runs over twice as fast with a warm cache:
tinygo build -o test.elf ./testdata/stdlib.go
This should help a lot with edit-compile-test cycles, that typically
only modify a single package.
This required some changes to the interp package to deal with globals
created in a previous run of the interp package and to deal with
external globals (that can't be loaded from or stored to).
This commit switches from the previous behavior of compiling the whole
program at once, to compiling every package in parallel and linking the
LLVM bitcode files together for further whole-program optimization.
This is a small performance win, but it has several advantages in the
future:
- There are many more things that can be done per package in parallel,
avoiding the bottleneck at the end of the compiler phase. This
should speed up the compiler futher.
- This change is a necessary step towards a non-LTO build mode for
fast incremental builds that only rebuild the changed package, when
compiler speed is more important than binary size.
- This change refactors the compiler in such a way that it will be
easier to inspect the IR for one package only. Inspecting this IR
will be very helpful for compiler developers.
This optimization level wasn't working before because some passes expect
some globals to be cleaned up afterwards. Cleaning these globals is
easy, just add the pass necessary for it. This shouldn't reduce the
usefulness of the -opt=0 build flag as most optimizations are still
skipped.
This way is more consistent with how picolibc is specified and allows
generating a helpful error message. This error message should never be
generated for TinyGo binary releases, only when doing local development.
This avoids external commands from finishing after the TinyGo command
exits. For example, when testing out compiler-rt on AVR, I got the
following error:
$ go install; and tinygo run -target=atmega1284p ./testdata/calls.go
[... Clang error removed ...]
error: failed to build /home/ayke/src/github.com/tinygo-org/tinygo/lib/compiler-rt/lib/builtins/extendsfdf2.c: exit status 1
error: unable to open output file '/tmp/tinygo361380649/build-lib-compiler-rt/ffsdi2.c.o': 'No such file or directory'
1 error generated.
That last error ("unable to open output file") is a spurious error
because the temporary directory has already been removed. This commit
waits until all running jobs have finished before returning, so that
these errors won't happen.
This flag is important for the Xtensa backend because by default a more
powerful backend (ESP32) is assumed. Therefore, compiling for the
ESP8266 won't work by default and needs the -mcpu flag.
This key was intended as some sort of cache key (as the name indicates)
but that never happened. Let's remove it to avoid clutter. The cacheLoad
and cacheStore functions are only used for C libraries (libc,
compiler-rt) so their caching behavior is a bit different from other
things worth caching.
Moving settings to a separate config struct has two benefits:
- It decouples the compiler a bit from other packages, most
importantly the compileopts package. Decoupling is generally a good
thing.
- Perhaps more importantly, it precisely specifies which settings are
used while compiling and affect the resulting LLVM module. This will
be necessary for caching the LLVM module.
While it would have been possible to cache without this refactor, it
would have been very easy to miss a setting and thus let the
compiler work with invalid/stale data.
This commit parallelizes almost everything that can currently be
parallelized. With that, it also introduces a framework for easily
parallelizing other parts of the compiler.
Code for baremetal targets already compiles slightly faster because it
can parallelize the compilation of supporting assembly files. However,
the speedup is especially noticeable when libraries (compiler-rt,
picolibc) also need to be compiled: they will be compiled in parallel
next to the Go files using all available cores. On my dual core laptop
(4 cores if you count hyperthreading) this cuts compilation time roughly
in half when compiling something for a Cortex-M board after running
`tinygo clean`.
This commit finally introduces unit tests for the compiler, to check
whether input Go code is converted to the expected output IR.
To make this necessary, a few refactors were needed. Hopefully these
refactors (to compile a program package by package instead of all at
once) will eventually become standard, so that packages can all be
compiled separate from each other and be cached between compiles.
This commit switches to LLVM 11 for builds with LLVM linked statically
(e.g. `make`). It does not yet switch the default for builds dynamically
linked to LLVM, that should be done in a later change.
This commit also changes to use the default host toolchain (probably
GCC) instead of Clang as the default compiler in CI. There were some
issues with Clang 3.8 in CI and hopefully this will fix it.
Additionally it updates the way LLVM is built on Windows, with
-DLLVM_ENABLE_PIC=OFF (which should have been used all along). This
change makes it possible to revert a hack to build libclang manually and
instead uses the libclang static library like on all other operating
systems, simplifying the Makefile.
On Fedora 33+, there is a buggy package that installs to
`/usr/lib64/clang/{version}/lib`, even on 32-bit systems. The original
code sees the `/usr/lib64/clang/{version}` directory, checks for an
`include` subdirectory, and then gives up because it doesn't exist.
To be more robust, check both `/usr/lib64/clang/{version}/include` and
`/usr/lib/clang/{version}/include`, and only allow versions that match
the LLVM major version used to build tinygo.
Unfortunately, the .rodata section can't be stored in flash. Instead, an
explicit .progmem section should be used, which is supported in LLVM as
address space 1 but not exposed to normal programs.
Eventually a pass should be written that converts trivial const globals
of which all loads are visible to be in addrspace 1, to get the benefits
of storing those globals directly in ROM.
This can be useful to test improvements in LLVM master and to make it
possible to support LLVM 11 for the most part already before the next
release. That also allows catching LLVM bugs early to fix them upstream.
Note that tests do not yet pass for this LLVM version, but the TinyGo
compiler can be built with the binaries from apt.llvm.org (at the time
of making this commit).
On 64-bit Fedora, `lib64` is where the clang headers are, not `lib`. For
multiarch systems, both will exist, but it's likely you want 64-bit, so
check that first.