The interp package was assuming that all targets were little-endian. But
that's not true: we now have a big-endian target (GOARCH=mips).
This fixes the interp package to use the appropriate byte order for a
given target.
This is a big change: apart from removing LLVM 14 it also removes typed
pointer support (which was only fully supported in LLVM up to version
14). This removes about 200 lines of code, but more importantly removes
a ton of special cases for LLVM 14.
This is a big commit that changes the way runtime type information is stored in
the binary. Instead of compressing it and storing it in a number of sidetables,
it is stored similar to how the Go compiler toolchain stores it (but still more
compactly).
This has a number of advantages:
* It is much easier to add new features to reflect support. They can simply
be added to these structs without requiring massive changes (especially in
the reflect lowering pass).
* It removes the reflect lowering pass, which was a large amount of hard to
understand and debug code.
* The reflect lowering pass also required merging all LLVM IR into one
module, which is terrible for performance especially when compiling large
amounts of code. See issue 2870 for details.
* It is (probably!) easier to reason about for the compiler.
The downside is that it increases code size a bit, especially when reflect is
involved. I hope to fix some of that in later patches.
This implements the block-based GC as a partially precise GC. This means
that for most heap allocations it is known which words contain a pointer
and which don't. This should in theory make the GC faster (because it
can skip non-pointer object) and have fewer false positives in a GC
cycle. It does however use a bit more RAM to store the layout of each
object.
Right now this GC seems to be slower than the conservative GC, but
should be less likely to run out of memory as a result of false
positives.
This makes it much easier to read the value at runtime, as pointer
indices are naturally little endian. It should not affect anything else
in the program.
These instructions sometimes pop up in LLVM 15, but they never occured
in LLVM 14. Implementing them is relatively straightforward: simply
generalize the code that already exists for llvm.ICmp interpreting.
A number of llvm.Const* functions (in particular extractvalue and
insertvalue) were removed in LLVM 15, so we have to use a builder
instead. This builder will create the same constant values, it simply
uses a different API.
This is necessary for the next commit. The next commit would otherwise
cause an issue with the following constant operation:
i64 add (i64 ptrtoint (%runtime.machHeader* @_mh_execute_header to i64), i64 32)
Constant globals can't have been modified, even if a pointer is passed
externally. Therefore, don't treat it as such in hasExternalStore.
In addition, it doesn't make sense to update values of constant globals
after the interp pass is finished. So don't do this.
TODO: track whether objects are actually modified and only update the
globals if this is the case.
This commit will use the memory layout information for heap allocations
added in the previous commit to determine LLVM types, instead of
guessing their types based on the content. This fixes a bug in which
recursive data structures (such as doubly linked lists) would result in
a compiler stack overflow due to infinite recursion.
Not all heap allocations have a memory layout yet, but this can be
incrementally fixed in the future. So far, this commit should fix
(almost?) all cases of this stack overflow issue.
Instead of doing lots of complicated calculations to get the shortest
GEP, I'll just cast it to i8*, do the GEP, and optionally cast to the
requested type.
This currently produces ugly constant expressions, but once LLVM
switches to opaque pointer types all of this shouldn't matter anymore.
This is uncommon, but it does happen if the source pointer is a bitcast
of a global. For example, if a struct is cast to an i8*, it's possible
to index beyond what would appear to be the size of the pointer (i8*).
This fixes https://github.com/tinygo-org/tinygo/issues/1884.
My original plan to fix this was much more complicated, but then I
realized that the output type doesn't matter anyway and I can simply
cast the type to an *i8 and perform a GEP on that pointer.
The markExternal function is used when a global (function or global
variable) is somehow run at runtime. All the other globals it refers to
are from then on no longer known at compile time, so can't be used by
the interp package anymore.
This can also include inline assembly. While it is possible to modify
globals that way, it is only possible to modify exported globals:
similar to calling an undefined function (in C for example).
The interp package is in many cases able to execute map functions in the
runtime directly. This is probably slower than adding special support
for them in the interp package and also doesn't cover all cases (most
importantly, map keys that contain pointers) but removing this code also
removes a large amount of code that needs to be maintained and is
susceptible to hard-to-find bugs.
As a side effect, this resulted in different output of the
testdata/map.go test because the test relied on the existing iteration
order of TinyGo maps. I've updated the test to not rely on this test,
making the output compatible with what the Go toolchain would output.
This results in a significant speedup in some cases. For example, this
runs over twice as fast with a warm cache:
tinygo build -o test.elf ./testdata/stdlib.go
This should help a lot with edit-compile-test cycles, that typically
only modify a single package.
This required some changes to the interp package to deal with globals
created in a previous run of the interp package and to deal with
external globals (that can't be loaded from or stored to).
This commit replaces a number of panics with returning an error value as
a result of changing the toLLVMValue method signature. This should make
it easier to diagnose issues.
This distinction was useful before when reflect wasn't properly
supported. Back then it made sense to only include method sets that were
actually used in an interface. But now that it is possible to get to
other values (for example, by extracting fields from structs) and it is
possible to turn them back into interfaces, it is necessary to preserve
all method sets that can possibly be used in the program in a type
assert, interface assert or interface method call.
In the future, this logic will need to be revisited again when
reflect.New or reflect.Zero gets implemented.
Code size increases a bit in some cases, but usually in a very limited
way (except for one outlier in the drivers smoke tests). The next commit
will improve the situation significantly.
GetElementPtr would not work on values that weren't pointers. Because
fixed addresses (often used in memory-mapped I/O) are integers rather
than pointers in interp, it would return an error.
This resulted in the teensy40 target not compiling correctly since the
interp package rewrite. This commit should fix that.
During a run of interp, some memory (for example, memory allocated
through runtime.alloc) may not have a known LLVM type. This memory is
alllocated by creating an i8 array.
This does not necessarily work, as i8 has no alignment requirements
while the allocated object may have allocation requirements. Therefore,
the resulting global may have an alignment that is too loose.
This works on some microcontrollers but notably does not work on a
Cortex-M0 or Cortex-M0+, as all load/store operations must be aligned.
This commit fixes this by setting the alignment of untyped memory to the
maximum alignment. The determination of "maximum alignment" is not
great but should get the job done on most architectures.
For a full explanation, see interp/README.md. In short, this rewrite is
a redesign of the partial evaluator which improves it over the previous
partial evaluator. The main functional difference is that when
interpreting a function, the interpretation can be rolled back when an
unsupported instruction is encountered (for example, an actual unknown
instruction or a branch on a value that's only known at runtime). This
also means that it is no longer necessary to scan functions to see
whether they can be interpreted: instead, this package now just tries to
interpret it and reverts when it can't go further.
This new design has several benefits:
* Most errors coming from the interp package are avoided, as it can
simply skip the code it can't handle. This has long been an issue.
* The memory model has been improved, which means some packages now
pass all tests that previously didn't pass them.
* Because of a better design, it is in fact a bit faster than the
previous version.
This means the following packages now pass tests with `tinygo test`:
* hash/adler32: previously it would hang in an infinite loop
* math/cmplx: previously it resulted in errors
This also means that the math/big package can be imported. It would
previously fail with a "interp: branch on a non-constant" error.