ThinLTO results in a small code size reduction, which is nice
(especially on these very small chips). It also brings us one step
closer to using ThinLTO everywhere.
This flag controls whether to convert external i64 parameters for use in
a browser-like environment.
This flag was needed in the past because back then we only supported
wasm on browsers but no WASI. Now, I can't think of a reason why anybody
would want to change the default. For `-target=wasm` (used for
browser-like environments), the wasm_exec.js file expects this
i64-via-stack ABI. For WASI, there is no limitation on i64 values and
`-wasm-abi=generic` is the default.
There was a bug in the wasm ABI lowering pass (found using
AddressSanitizer on LLVM 15) that resulted in a rather subtle memory
corruption. This commit fixes this issues.
A number of llvm.Const* functions (in particular extractvalue and
insertvalue) were removed in LLVM 15, so we have to use a builder
instead. This builder will create the same constant values, it simply
uses a different API.
Scanning of allocas was entirely broken on WebAssembly. The code
intended to do this was never run. There were also no tests.
Looking into this further, I found that it is actually not really
necessary to do that: the C stack can be scanned conservatively and in
fact this was already done for goroutine stacks (because they live on
the heap and are always referenced). It wasn't done for the system stack
however.
With these fixes, I believe code should be both faster *and* more
correct.
I found this in my work to get opaque pointers supported in LLVM 15,
because the code that was never reached now finally got run and was
actually quite buggy.
Go 1.19 started reformatting code in a way that makes it more obvious
how it will be rendered on pkg.go.dev. It gets it almost right, but not
entirely. Therefore, I had to modify some of the comments so that they
are formatted correctly.
Previously, the MakeGCStackSlots pass would attempt to pop the stack chain before a tail call.
This resulted in use-after-free bugs when the tail call allocated memory and used a value allocated by its caller.
Instead of trying to move the stack chain pop, remove the tail flag from the call.
Precise globals require a whole program optimization pass that is hard
to support when building packages separately. This patch removes support
for these globals by converting the last use (Linux) to use
linker-defined symbols instead.
For details, see: https://github.com/tinygo-org/tinygo/issues/2870
This shrinks transform.Optimize() a little bit, working towards the goal
of https://github.com/tinygo-org/tinygo/issues/2870. I ran the smoke
tests and there is no practical downside: one test got smaller (??) and
one had a different .hex hash, but other than that there was no
difference.
This should also make TinyGo a liiitle bit faster but it's probably not
even measurable.
The transform package is the more appropriate location for package-level
optimizations, to match `transform.Optimize` for whole-program
optimizations.
This is just a refactor, to make later changes easier to read.
In https://github.com/tinygo-org/tinygo/issues/2777, a poison value
ended up in `runtime.alloc`. This shouldn't happen, especially not for
well written code. So I'm not sure why it happens. But here is a fix
anyway.
There used to be a difference between `byte` and `uint8` in interface
methods. These are aliases, so they should be treated the same.
This patch introduces a custom serialization format for types,
circumventing the `Type.String()` method that is slightly wrong for our
purposes.
This also fixes an issue with the `any` keyword in Go 1.18, which
suffers from the same problem (but this time actually leads to a crash).
ThinLTO optimizes across LLVM modules at link time. This means that
optimizations (such as inlining and const-propagation) are possible
between C and Go. This makes this change especially useful for CGo, but
not just for CGo. By doing some optimizations at link time, the linker
can discard some unused functions and this leads to a size reduction on
average. It does increase code size in some cases, but that's true for
most optimizations.
I've excluded a number of targets for now (wasm, avr, xtensa, windows,
macos). They can probably be supported with some more work, but that
should be done in separate PRs.
Overall, this change results in an average 3.24% size reduction over all
the tinygo.org/x/drivers smoke tests.
TODO: this commit runs part of the pass pipeline twice. We should set
the PrepareForThinLTO flag in the PassManagerBuilder for even further
reduced code size (0.7%) and improved compilation speed.
This removes the parentHandle argument from the internal calling convention.
It was formerly used to implment coroutines.
Now that coroutines have been removed, it is no longer necessary.
This adds support for building with `-tags=llvm13` and switches to LLVM
13 for tinygo binaries that are statically linked against LLVM.
Some notes on this commit:
* Added `-mfloat-abi=soft` to all Cortex-M targets because otherwise
nrfx would complain that floating point was enabled on Cortex-M0.
That's not the case, but with `-mfloat-abi=soft` the `__SOFTFP__`
macro is defined which silences this warning.
See: https://reviews.llvm.org/D100372
* Changed from `--sysroot=<root>` to `-nostdlib -isystem <root>` for
musl because with Clang 13, even with `--sysroot` some system
libraries are used which we don't want.
* Changed all `-Xclang -internal-isystem -Xclang` to simply
`-isystem`, for consistency with the above change. It appears to
have the same effect.
* Moved WebAssembly function declarations to the top of the file in
task_asyncify_wasm.S because (apparently) the assembler has become
more strict.
When I wrote the code originally, I didn't know about SetAlignment so I
hacked a way around it by allocating [...]uintptr types. However, this
allocates a few too many bytes in some cases.
This commit changes this to only allocate the space that we actually
need.
The code size effect is mixed, but generally positive. The combined
average is reduced by 0.27% with more programs being reduced in size
than are increasing in size.
This change implements a new "scheduler" for WebAssembly using binaryen's asyncify transform.
This is more reliable than the current "coroutines" transform, and works with non-Go code in the call stack.
runtime (js/wasm): handle scheduler nesting
If WASM calls into JS which calls back into WASM, it is possible for the scheduler to nest.
The event from the callback must be handled immediately, so the task cannot simply be deferred to the outer scheduler.
This creates a minimal scheduler loop which is used to handle such nesting.
This matches Clang, and with that, it adds support for inlining between
Go and C because LLVM only allows inlining if the "target-cpu" and
"target-features" string attributes match.
For example, take a look at the following code:
// int add(int a, int b) {
// return a + b;
// }
import "C"
func main() {
println(C.add(3, 5))
}
The 'add' function is not inlined into the main function before this
commit, but after it, it can be inlined and trivially be optimized to
`println(8)`.
This is fake debug info. It doesn't point to a source location because
there is no source location. However, it helps to correctly attribute
code size usage to particular packages.
I've also updated builder/sizes.go with some debugging helpers.
Instead of doing everything in the interrupt lowering pass, generate
some more code in gen-device to declare interrupt handler functions and
do some work in the compiler so that interrupt lowering becomes a lot
simpler.
This has several benefits:
- Overall code is smaller, in particular the interrupt lowering pass.
- The code should be a bit less "magical" and instead a bit easier to
read. In particular, instead of having a magic
runtime.callInterruptHandler (that is fully written by the interrupt
lowering pass), the runtime calls a generated function like
device/sifive.InterruptHandler where this switch already exists in
code.
- Debug information is improved. This can be helpful during actual
debugging but is also useful for other uses of DWARF debug
information.
For an example on debug information improvement, this is what a
backtrace might look like before this commit:
Breakpoint 1, 0x00000b46 in UART0_IRQHandler ()
(gdb) bt
#0 0x00000b46 in UART0_IRQHandler ()
#1 <signal handler called>
[..etc]
Notice that the debugger doesn't see the source code location where it
has stopped.
After this commit, breaking at the same line might look like this:
Breakpoint 1, (*machine.UART).handleInterrupt (arg1=..., uart=<optimized out>) at /home/ayke/src/github.com/tinygo-org/tinygo/src/machine/machine_nrf.go:200
200 uart.Receive(byte(nrf.UART0.RXD.Get()))
(gdb) bt
#0 (*machine.UART).handleInterrupt (arg1=..., uart=<optimized out>) at /home/ayke/src/github.com/tinygo-org/tinygo/src/machine/machine_nrf.go:200
#1 UART0_IRQHandler () at /home/ayke/src/github.com/tinygo-org/tinygo/src/device/nrf/nrf51.go:176
#2 <signal handler called>
[..etc]
By now, the debugger sees an actual source location for UART0_IRQHandler
(in the generated file) and an inlined function.
This matches the behavior of Clang, which uses optsize for -Os and adds
minsize for -Oz.
The code size change is all over the map, but using a hacked together
size comparison tool I've found that there is a slight reduction in
binary size overall (-1.6% with the tinygo smoke tests and -0.8% for the
drivers smoke test).
This commit has a few related changes:
* It sets the optsize attribute immediately in the compiler instead of
adding it to each function afterwards in a loop. This seems to me
like the more appropriate way to do it.
* It centralizes setting the optsize attribute in the transform
package, to make later changes easier.
* It sets the optsize in a few more places: to runtime.initAll and to
WebAssembly i64 wrappers.
This commit does not affect the binary size of any of the smoke tests,
so should be risk-free.
This layout parameter is currently always nil and ignored, but will
eventually contain a pointer to a memory layout.
This commit also adds module verification to the transform tests, as I
found out that it didn't (and therefore didn't initially catch all
bugs).
This commit simplifies the IR a little bit: instead of calling
pseudo-functions runtime.interfaceImplements and
runtime.interfaceMethod, real declared functions are being called that
are then defined in the interface lowering pass. This should simplify
the interaction between various transformation passes. It also reduces
the number of lines of code, which is generally a good thing.
This adds support for a construct like this:
type foo func(fn foo)
Unfortunately, LLVM cannot create function pointers that look like this.
LLVM only supports named types for structs (not for pointers) and thus
can't add a pointer to a function type of the same type to a parameter
of that function type.
The fix is simple: cast all function pointers to a void function, in
LLVM IR:
void ()*
Raw function pointers are cast to this type before storing, and cast
back to the regular function type before calling. This means that
function parameters will never refer to its own type because raw
function types are fixed at that one type.
Somehow, this does have an effect on binary size in some cases. The
effect is small and goes both ways. On top of that, there is work
underway in LLVM which would make all pointer types opaque (without a
pointee type). This would make this whole commit useless and therefore
should fix any size increases that might happen.
https://llvm.org/docs/OpaquePointers.html
... instead of setting a special -target= value. This is more robust and
makes sure that the test actually tests different arcitectures as they
would be compiled by TinyGo. As an example, the bug of the bugfix in the
previous commit ("arm: use armv7 instead of thumbv7") would have been
caught if this change was applied earlier.
I've decided to put GOOS/GOARCH in compileopts.Options, as it makes
sense to me to treat them the same way as command line parameters.
This change fixes a bug in which `alloca` memory lifetimes would not extend past the suspend of an asynchronous tail call.
This would typically manifest as memory corruption, and could happen with or without normal suspending calls within the function.
Bug 1790 ("musttail call must precede a ret with an optional bitcast")
is caused by the GC stack slot pass inserting a store instruction
between a musttail call and a return instruction. This is not allowed in
LLVM IR.
One solution would be to remove the musttail. That would probably work,
but 1) the go-llvm API doesn't support this and 2) this might have
unforeseen consequences. What I've done in this commit is to move the
store instruction to a position earlier in the basic block, just after
the last access to the GC stack slot alloca.
Thanks to @fgsch for a very small repro, which I've used as a regression
test.
This patch adds a new pragma for functions and globals to set the
section name. This can be useful to place a function or global in a
special device specific section, for example:
* Functions may be placed in RAM to make them run faster, or in flash
(if RAM is the default) to not let them take up RAM.
* DMA memory may only be placed in a special memory area.
* Some RAM may be faster than other RAM, and some globals may be
performance critical thus placing them in this special RAM area can
help.
* Some (large) global variables may need to be placed in external RAM,
which can be done by placing them in a special section.
To use it, you have to place a function or global in a special section,
for example:
//go:section .externalram
var externalRAMBuffer [1024]byte
This can then be placed in a special section of the linker script, for
example something like this:
.bss.extram (NOLOAD) : {
*(.externalram)
} > ERAM