This is a big commit that changes the way runtime type information is stored in
the binary. Instead of compressing it and storing it in a number of sidetables,
it is stored similar to how the Go compiler toolchain stores it (but still more
compactly).
This has a number of advantages:
* It is much easier to add new features to reflect support. They can simply
be added to these structs without requiring massive changes (especially in
the reflect lowering pass).
* It removes the reflect lowering pass, which was a large amount of hard to
understand and debug code.
* The reflect lowering pass also required merging all LLVM IR into one
module, which is terrible for performance especially when compiling large
amounts of code. See issue 2870 for details.
* It is (probably!) easier to reason about for the compiler.
The downside is that it increases code size a bit, especially when reflect is
involved. I hope to fix some of that in later patches.
This was actually surprising once I got TinyGo to build on Windows 11
ARM64. All the changes are exactly what you'd expect for a new
architecture, there was no special weirdness just for arm64.
Actually getting TinyGo to build was kind of involved though. The very
short summary is: install arm64 versions of some pieces of software
(like golang, cmake) instead of installing them though choco. In
particular, use the llvm-mingw[1] toolchain instead of using standard
mingw.
[1]: https://github.com/mstorsjo/llvm-mingw/releases
Using ThinLTO manages to optimize binaries quite significantly. The
exact amount varies a lot by program but it's about 10-15% usually.
Don't remove non-ThinLTO support yet. It would not surprise me if this
triggered some unintended side effect. Eventually, non-ThinLTO support
should be removed though.
This allows you to expand {tmpDir} in the json "emulator" field, and
uses it in wasmtime instead of custom TMPDIR mapping logic.
Before, we had custom logic for wasmtime to create a separate tmpDir
when running go tests. This overwrite the TMPDIR variable when running,
after making a mount point. A simpler way to accomplish the end goal of
writing temp files is to use wasmtime's map-dir instead. When code is
compiled to wasm with the wasi target, tempDir is always /tmp, so we
don't need to add variables (since we know what it is). Further, the
test code is the same between normal go and run through wasmtime. So, we
don't need to make a separate temp dir first, and avoiding that reduces
logic, as well makes it easier to swap out the emulator (for wazero
which has no depedencies). To map the correct directory, this introduces
a {tmpDir} token whose value is the host-specific value taken from
`os.TempDir()`.
The motivation I have for this isn't so much to clean up the wasmtime
code, but allow wazero to execute the same tests. After this change, the
only thing needed to pass tests is to change the emulator, due to
differences in how wazero deals with relative lookups (they aren't
restricted by default, so there's not a huge amount of custom logic
needed).
In other words, installing wazero from main, `make tinygo-test-wasi`
works with no other changes except this PR and patching
`targets/wasi.json`.
```json
"emulator": "wazero run -mount=.:/ -mount={tmpDir}:/tmp {}",
```
On that note, if there's a way to override the emulator via arg or env,
this would be even better, but in any case patching json is fine.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
I found that when I enable ThinLTO, a miscompilation triggers that had
been hidden all the time previously. The bug appears to happen as
follows:
1. TinyGo generates a function with a runtime.trackPointer call, but
without an alloca (or the alloca gets optimized away).
2. LLVM sees that no alloca needs to be kept alive across the
runtime.trackPointer call, and therefore it adds the 'tail' flag.
One of the effects of this flag is that it makes it undefined
behavior to keep allocas alive across the call (which is still safe
at that point).
3. The GC lowering pass adds a stack slot alloca and converts
runtime.trackPointer calls into alloca stores.
The last step triggers the bug: the compiler inserts an alloca where
there was none before but that's not valid as long as the 'tail' flag is
present.
This patch fixes the bug in a somewhat dirty way, by always creating a
dummy alloca so that LLVM won't do the optimization in step 2 (and
possibly other optimizations that rely on there being no alloca
instruction).
This is purely a refactor, to make the next change simpler.
The wordpack functionality used to be necessary in transform passes, but
now it isn't anymore so let's not complicate this more than necessary.
This implements the block-based GC as a partially precise GC. This means
that for most heap allocations it is known which words contain a pointer
and which don't. This should in theory make the GC faster (because it
can skip non-pointer object) and have fewer false positives in a GC
cycle. It does however use a bit more RAM to store the layout of each
object.
Right now this GC seems to be slower than the conservative GC, but
should be less likely to run out of memory as a result of false
positives.
This makes it much easier to read the value at runtime, as pointer
indices are naturally little endian. It should not affect anything else
in the program.
Most of the code of the conservative GC can be reused for the precise
GC. So before adding precise GC support, this commit just moves code
around to make the next commit cleaner. It is a non-functional change.
Proof: https://godbolt.org/z/as4EM3713
Essentially, this means that there are objects on arm64 that have a
16-byte alignment and so we have to respect that when we allocate things
on the heap.
This function provides a mechanism to watch for changes to the GODEBUG
environment variable. For now, we'll not implement it. It might be
useful in the future, when it can always be added.