* winch: Solidify bounds check for dynamic heaps
This commit fixes an edge case in bounds checks for dynamic heaps.
https://github.com/bytecodealliance/wasmtime/pull/8157/files erroneously
tied the bounds check operation (more concretely, the overflow check) to the size derived from the heap
type. Even though offsets and access sizes are validated ahead-of-time
and bound to the heap type, in the case of overflow checking, we must
ensure that the operation size is tied to the target's pointer size to
avoid clamping the access size and offset addition, which would result
in missing an out-of-bounds memory access.
This commit also adds a disassembly test to avoid introducing
regressions in the future.
Additionally, this commit adds more comments around why `pointer_size`
is used for certain bounds checking operations.
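To illustrate why the operation size matters, here is a minimal sketch (not Winch's actual code) of doing the overflow-prone addition at the target's pointer width instead of at the width implied by the heap type:

```rust
// Minimal sketch, not Winch's actual code: the bounds check computes
// `index + offset + access_size` and compares it against the heap length.
// Doing that addition at 32 bits (the width implied by a 32-bit heap type)
// can silently wrap, making an out-of-bounds access look in-bounds; doing it
// at the target's pointer width keeps the full value visible.
fn bounds_check_ok(index: u32, offset: u32, access_size: u32, heap_len: u64) -> bool {
    // Widen every operand to the pointer-sized type *before* adding.
    let end = index as u64 + offset as u64 + access_size as u64;
    end <= heap_len
}

fn main() {
    // With 32-bit arithmetic, 0xffff_fff0 + 0x10 + 0x10 would wrap to 0x10
    // and wrongly pass a bounds check against a small heap.
    assert!(!bounds_check_ok(0xffff_fff0, 0x10, 0x10, 0x1_0000));
}
```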
* Update disassembly test
* Add v128.const support to Winch
* Remove next_vr and vector_reg_for methods
* Adjust alignment and slot size for v128
* Forgot to update disas tests
* Update unit tests
* Use 8 byte stack slot sizes
* Fix broken unit tests, add tests for vecs, and use ty size for vecs
* Wasmtime: Implement the custom-page-sizes proposal
This commit adds support for the custom-page-sizes proposal to Wasmtime:
https://github.com/WebAssembly/custom-page-sizes
I've migrated, fixed some bugs within, and extended the `*.wast` tests for this
proposal from the `wasm-tools` repository. I intend to upstream them into the
proposal shortly.
There is a new `wasmtime::Config::wasm_custom_page_sizes_proposal` method to
enable or disable the proposal. It is disabled by default.
Our fuzzing config has been updated to turn this feature on/off as dictated by
the arbitrary input given to us from the fuzzer.
Additionally, there were getting to be so many constructors for
`wasmtime::MemoryType` that I added a builder rather than add yet another
constructor.
In general, we store the `log2(page_size)` rather than the page size
directly. This helps cut down on invalid states and properties we need to
assert.
I've also intentionally written this code such that supporting any power of two
page size (rather than just the exact values `1` and `65536` that are currently
valid) will essentially just involve updating `wasmparser`'s validation and
removing some debug asserts in Wasmtime.
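For illustration, a minimal sketch of the `log2(page_size)` representation (field and method names here are hypothetical, not Wasmtime's actual API):

```rust
// Hypothetical sketch of storing log2(page_size) rather than the page size.
struct MemoryTypeInner {
    /// log2 of the page size; only 0 (1-byte pages) and 16 (64 KiB pages)
    /// are valid under the current proposal.
    page_size_log2: u8,
}

impl MemoryTypeInner {
    fn page_size(&self) -> u64 {
        debug_assert!(
            self.page_size_log2 == 0 || self.page_size_log2 == 16,
            "only page sizes of 1 and 65536 are currently valid"
        );
        1u64 << self.page_size_log2
    }

    /// Convert a size in pages to a size in bytes, propagating overflow.
    fn pages_to_bytes(&self, pages: u64) -> Option<u64> {
        pages.checked_mul(self.page_size())
    }
}
```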
* Update error string expectation
* Remove debug logging
* Use a right shift instead of a division
* fix error message expectation again
* remove page size from VMMemoryDefinition
* fix size of VMMemoryDefinition again
* Only dynamically check for `-1` sentinel for 1-byte page sizes
* Import functions that are used a few times
* Better handle overflows when rounding up to the host page size
Propagate errors instead of returning a value that is not actually a rounded up
version of the input.
Delay rounding up various config sizes until runtime instead of eagerly doing it
at config time (which isn't even guaranteed to work, so we already had to have a
backup plan to round up at runtime, since we might be cross-compiling wasm or
not have the runtime feature enabled).
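A minimal sketch of the overflow-aware rounding described above (the real Wasmtime helper differs in its error type and where it lives):

```rust
// Sketch: round `size` up to a multiple of the host page size, returning an
// error instead of a wrapped (and therefore not-actually-rounded-up) value.
fn round_up_to_host_page(size: u64, host_page_size: u64) -> Result<u64, String> {
    assert!(host_page_size.is_power_of_two());
    let mask = host_page_size - 1;
    size.checked_add(mask)
        .map(|s| s & !mask)
        .ok_or_else(|| format!("{size:#x} overflows when rounded up to the host page size"))
}
```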
* Fix some anyhow and nostd errors
* Add missing rounding up to host page size at runtime
* Add validate feature to wasmparser dep
* Add some new rounding in a few places, due to no longer rounding in config methods
* Avoid actually trying to allocate the whole address space in the `massive_64_bit_still_limited` test
The point of the test is to ensure that we hit the limiter, so just cancel the
allocation from the limiter, and otherwise avoid MIRI attempting to allocate a
bunch of memory after we hit the limiter.
* prtest:full
* Revert "Avoid actually trying to allocate the whole address space in the `massive_64_bit_still_limited` test"
This reverts commit ccfa34a78dd3d53e49a6158ca03077d42ce8bcd7.
* miri: don't attempt to allocate more than 4GiB of memory
It seems that rather than returning a null pointer from `std::alloc::alloc`,
miri will sometimes choose to simply crash the whole program.
* remove duplicate prelude import after rebasing
Fixes: https://github.com/bytecodealliance/wasmtime/issues/8632
This commit fixes the handling of f64 comparison operations in Winch.
f64 comparison operations are defined as
`[f64 f64] -> [i32]`
The previous implementation was widening the result to `[i64]`, which
caused issues with stack shuffling in multi-value returns.
Similar to https://github.com/bytecodealliance/wasmtime/pull/8481 but for struct
types instead of array types.
Note that this is support for only defining these types in Wasm or the host; we
don't support allocating instances of these types yet. That will come in follow
up PRs.
This commit adds support for defining array types from Wasm or the host, and
managing them inside the engine's types registry. It does not introduce support
for allocating or manipulating array values. That functionality will come in
future pull requests.
We had something hacked together to support `(ref.i31 (i32.const N))`. It wasn't
a long-term solution. This is the first time that we have to really deal with
multi-instruction const expressions.
This commit introduces a tiny interpreter to evaluate const expressions.
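To make the idea concrete, here is a toy sketch of such an evaluator (the real one in Wasmtime handles more operators and produces GC references; treat this purely as an illustration of the stack-machine shape):

```rust
// Toy const-expression evaluator: a small stack machine over a fixed set of
// operators, evaluated eagerly at instantiation time.
enum ConstOp {
    I32Const(i32),
    I32Add,
    RefI31, // wrap the i32 on top of the stack into an i31ref payload
}

fn eval_const_expr(ops: &[ConstOp]) -> Option<i64> {
    let mut stack: Vec<i64> = Vec::new();
    for op in ops {
        match op {
            ConstOp::I32Const(v) => stack.push(*v as i64),
            ConstOp::I32Add => {
                let b = stack.pop()? as i32;
                let a = stack.pop()? as i32;
                stack.push(a.wrapping_add(b) as i64);
            }
            ConstOp::RefI31 => {
                // `(ref.i31 ...)` keeps only the low 31 bits of its operand.
                let v = stack.pop()?;
                stack.push(v & 0x7fff_ffff);
            }
        }
    }
    // A valid const expression leaves exactly one value on the stack.
    stack.pop()
}
```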
### The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest
of the runtime and the compiler. It does this by introducing two new traits,
very similar to a subset of [those proposed in the Wasm GC RFC], although not
all equivalent functionality has been added yet because Wasmtime doesn't
support, for example, GC structs yet:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run
collections within them, and execute the various GC barriers the collector
requires.
Rather than monomorphize all of Wasmtime on this trait, we use it
as a dynamic trait object. This does imply some virtual call overhead and
missing some inlining (and resulting post-inlining) optimization
opportunities. However, it is *much* less disruptive to the existing embedder
API, results in a cleaner embedder API anyways, and we don't believe that VM
runtime/embedder code is on the hot path for working with the GC at this time
anyways (that would be the actual Wasm code, which has inlined GC barriers
and direct calls and all of that). In the future, once we have optimized
enough of the GC that such code is ever hot, we have options we can
investigate at that time to avoid these dynamic virtual calls, like only
enabling one single collector at build time and then creating a static type
alias like `type TheOneGcImpl = ...;` based on the compile time
configuration, and using this type alias in the runtime rather than a dynamic
trait object.
The `GcRuntime` trait additionally defines a method to reset a GC heap, for
use by the pooling allocator. This allows reuse of GC heaps across different
stores. This integration is very rudimentary at the moment, and is missing
all kinds of configuration knobs that we should have before deploying Wasm GC
in production. This commit is large enough as it is already! Ideally, in the
future, I'd like to make it so that GC heaps receive their memory region,
rather than allocate/reserve it themselves, and let each slot in the pooling
allocator's memory pool be *either* a linear memory or a GC heap. This would
unask various capacity planning questions such as "what percent of memory
capacity should we dedicate to linear memories vs GC heaps?". It also seems
like basically all the same configuration knobs we have for linear memories
apply equally to GC heaps (see also the "Indexed Heaps" section below).
2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements
GC barriers for various operations on GC-managed references. The Rust code
calls into this trait dynamically via a trait object, but since it is
customizing the CLIF that is generated for Wasm code, the Wasm code itself is
not making dynamic, indirect calls for GC barriers. The `GcCompiler`
implementation can inline the parts of GC barrier that it believes should be
inline, and leave out-of-line calls to rare slow paths.
All that said, there is still only a single implementation of each of these
traits: the existing deferred reference-counting (DRC) collector. So there is a
bunch of code motion in this commit as the DRC collector was further isolated
from the rest of the runtime and moved to its own submodule. That said, this was
not *purely* code motion (see "Indexed Heaps" below) so it is worth not simply
skipping over the DRC collector's code in review.
### Indexed Heaps
This commit does bake in a couple assumptions that must be shared across all
collector implementations, such as a shared `VMGcHeader` that all objects
allocated within a GC heap must begin with, but the most notable and
far-reaching of these assumptions is that all collectors will use "indexed
heaps".
What we are calling indexed heaps are basically the three following invariants:
1. All GC heaps will be a single contiguous region of memory, and all GC objects
will be allocated within this region of memory. The collector may ask the
system allocator for additional memory, e.g. to maintain its free lists, but
GC objects themselves will never be allocated via `malloc`.
2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into
the GC heap's contiguous region of memory. We never hold raw pointers to GC
objects (although, of course, we have to compute them and use them
temporarily when actually accessing objects). This means that deref'ing GC
pointers is equivalent to deref'ing linear memory pointers: we need to add a
base and we also check that the GC pointer/index is within the bounds of the
GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly
common technique among high-performance GC
implementations[^compressed-oops][^v8-ptr-compression] so we are in good
company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that
is an element of an `(array (ref any))` is untrusted, and bounds checked on
access. This means that, for example, we do not store the raw pointer to an
`externref`'s host object inside the GC heap. Instead an `externref` now
stores an ID that can be used to index into a side table in the store that
holds the actual `Box<dyn Any>` host object, and accessing that side table is
always checked.
[^compressed-oops]: See ["Compressed OOPs" in
OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)
[^v8-ptr-compression]: See [V8's pointer
compression](https://v8.dev/blog/pointer-compression).
The good news with regards to all the bounds checking that this scheme implies
is that we can use all the same virtual memory tricks that linear memories use
to omit explicit bounds checks. Additionally, (2) means that the sizes of GC
objects are that much smaller (and therefore that much more cache friendly)
because they are only holding onto 32-bit, rather than 64-bit, references to
other GC objects. (We can, in the future, support GC heaps up to 16GiB in size
without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment
and storing aligned indices rather than byte indices, while still leaving the
bottom bit available for tagging as an `i31ref` discriminant. Should we ever
need to support even larger GC heap capacities, we could go to full 64-bit
references, but we would need explicit bounds checks.)
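A small sketch of what invariant (2) above boils down to when dereferencing a GC reference (types and names here are illustrative, not Wasmtime's actual ones):

```rust
// Sketch: a GC "pointer" is a 32-bit index into the GC heap's contiguous
// region, and turning it into a raw pointer is base + bounds-checked offset,
// exactly like a linear memory access.
struct GcHeap {
    base: *mut u8,
    len: u32,
}

impl GcHeap {
    /// Compute a raw pointer for an object of `size` bytes at index `gc_ref`,
    /// only after checking that the whole access stays inside the heap.
    fn deref(&self, gc_ref: u32, size: u32) -> Option<*mut u8> {
        let end = gc_ref.checked_add(size)?;
        if end <= self.len {
            // The raw pointer is only used temporarily while accessing the
            // object; it is never stored back into the heap.
            Some(unsafe { self.base.add(gc_ref as usize) })
        } else {
            None
        }
    }
}
```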
The biggest benefit of indexed heaps is that, because we are (explicitly or
implicitly) bounds checking GC heap accesses, and because we are not otherwise
trusting any data from inside the GC heap, we greatly reduce how badly things
can go wrong in the face of collector bugs and GC heap corruption. We are
essentially sandboxing the GC heap region, the same way that linear memory is a
sandbox. GC bugs could lead to the guest program accessing the wrong GC object,
or getting garbage data from within the GC heap. But only garbage data from
within the GC heap, never outside it. The worst that could happen would be if we
decided not to zero out GC heaps between reuse across stores (which is a valid
trade off to make, since zeroing a GC heap is a defense-in-depth technique
similar to zeroing a Wasm stack and not semantically visible in the absence of
GC bugs) and then a GC bug would allow the current Wasm guest to read old GC
data from the old Wasm guest that previously used this GC heap. But again, it
could never access host data.
Taken altogether, this allows for collector implementations that are nearly free
from `unsafe` code, and unsafety can otherwise be targeted and limited in scope,
such as interactions with JIT code. Most importantly, we do not have to maintain
critical invariants across the whole system -- invariants which can't be nicely
encapsulated or abstracted -- to preserve memory safety. Such holistic
invariants that refuse encapsulation are otherwise generally a huge safety
problem with GC implementations.
### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here
was to be sure that I was actually calling GC barriers at all the correct
places. I couldn't be sure before. Now, you can still explicitly copy a raw GC
reference without running GC barriers if you need to and understand why that's
okay (aka you are implementing the collector), but that is something you have to
opt into explicitly by calling `unchecked_copy`. The default now is that you
can't just copy the reference, and instead call an explicit `clone` method (not
*the* `Clone` trait, because we need to pass in the GC heap context to run the
GC barriers) and it is hard to forget to do that accidentally. This resulted in
a pretty big amount of churn, but I am wayyyyyy more confident that the correct
GC barriers are called at the correct times now than I was before.
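A sketch of the resulting API shape (the trait and method signatures below are hypothetical simplifications of what is described above):

```rust
// Sketch: duplicating a GC reference either runs barriers via an explicit
// `clone` that takes the GC heap context, or is an explicit, collector-only
// `unchecked_copy`. There is deliberately no `Clone`/`Copy` impl.
struct VMGcRef {
    index: u32, // 32-bit index into the GC heap (see "Indexed Heaps" above)
}

trait GcBarriers {
    /// The collector's clone barrier (e.g. bump a reference count).
    fn clone_barrier(&mut self, gc_ref: &VMGcRef) -> VMGcRef;
}

impl VMGcRef {
    /// Duplicate the reference, running the appropriate GC barriers.
    fn clone(&self, heap: &mut dyn GcBarriers) -> VMGcRef {
        heap.clone_barrier(self)
    }

    /// Duplicate the raw reference *without* barriers; only collector
    /// implementations should opt into this.
    fn unchecked_copy(&self) -> VMGcRef {
        VMGcRef { index: self.index }
    }
}
```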
### `i31ref`
I started this commit by trying to add `i31ref` support. And it grew into the
whole traits interface because I found that I needed to abstract GC barriers
into helpers anyways to avoid running them for `i31ref`s, so I figured that I
might as well add the whole traits interface. In comparison, `i31ref` support is
much easier and smaller than that other part! But it was also difficult to pull
apart from this commit, sorry about that!
---------------------
Overall, I know this is a very large commit. I am super happy to have some
synchronous meetings to walk through this all, give an overview of the
architecture, answer questions directly, etc... to make review easier!
prtest:full
This commit fixes an accidental issue introduced in #8018 where using an
element segment which had been dropped with an `externref` table would
cause a panic. The panic happened due to an assertion that tables are
being used with the right type of item, and that assertion was being violated.
The underlying issue was that dropped element segments are modeled as an
empty element segment but the empty element segment was using the
"functions" encoding as opposed to the "expressions" encoding. This
meant that code later assumed that due to the use of functions the table
must be a table-of-functions, but this was not correct for
externref-based tables.
The fix in this commit is to instead model the encoding as an
"expressions" list which means that the table type is dispatched on to
call the appropriate initializer.
There is no memory safety issue with this mistake as the assertion was
specifically targeted at preventing memory safety issues. This does, however,
enable any WebAssembly module to panic a host.
Closes #8281
* Canonicalize fpromote/fdemote operations
This commit changes the strategy implemented in #8146 to canonicalize
promotes/demotes of floats to additionally handle #8179.
Closes #8179
* Canonicalize fvpromote_low/fvdemote as well
* Enhance `typed-funcrefs.wast` test with more cases
Have the same function with slightly different variations to compare
codegen between the possible strategies.
* Skip type checks on tables that don't need it
This commit implements an optimization to skip type checks in
`call_indirect` for tables that don't require it. With the
function-references proposal it's possible to have tables of a single
type of function as opposed to today's default `funcref` which is a
heterogenous set of functions. In this situation it's possible that a
`call_indirect`'s type tag matches the type tag of a
`table`-of-typed-`funcref`-values, meaning that it's impossible for the
type check to fail.
The type check of a function pointer in `call_indirect` is refactored
here to take the table's type into account. Various things are shuffled
around to ensure that the right traps still show up in the right places
but the important part is that, when possible, the type check is omitted
entirely.
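In simplified terms (the real logic in `func_environ.rs` also has to keep the traps in the right places), the decision looks roughly like this sketch, with hypothetical types:

```rust
// Simplified sketch of the decision described above: the per-call type check
// can only be skipped when the table holds exactly one concrete function
// type and that type is the one `call_indirect` expects.
enum TableElemType {
    /// A plain `funcref` table: a heterogeneous set of functions.
    FuncRef,
    /// A table of typed funcref values for a single concrete function type.
    Typed { type_index: u32 },
}

fn can_skip_type_check(table_ty: &TableElemType, expected_type_index: u32) -> bool {
    match table_ty {
        TableElemType::FuncRef => false,
        TableElemType::Typed { type_index } => *type_index == expected_type_index,
    }
}
```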
* Update crates/cranelift/src/func_environ.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Move remaining `*.wat` tests out of cranelift-wasm/wasmtests
Move these up to Wasmtime's misc testsuite to get translated and
instantiated by Wasmtime.
Note that the max-function-index-in-name-section test was removed here
as that's tested by the support added in #3509.
* Remove cranelift-wasm test for name section
This is pretty thoroughly tested elsewhere in Wasmtime that we respect
the name section, for example many of the trap tests assert that the
name of the function comes from the text format.
* Move reachability tests out of cranelift-wasm
Instead add them to the disassembly test suite to ensure we don't
generate dead code. Additionally this has a lot of coverage via fuzzing
too.
* Move more tests out of cranelift-wasm
Move them into `tests/disas` so we can easily see the CLIF.
This commit updates the nan-canonicalization pass that Cranelift does to
canonicalize bitcasts from arbitrary integers in addition to floating
point arithmetic operations.
Closes #8145
This commit aims to address #8116 by fixing these two instructions to
load the proper amount of bytes when a load is sunk into them. Currently
the instruction variant used here requires an aligned `XmmMem` which
is today auto-translated with a 16-byte load unconditionally. For loads
near the end of memory this loads too much and can erroneously cause a
trap. The fix in this commit is to force the load to happen manually
with the appropriate type rather than a 16-byte type.
I'll note that `XmmMem` is left as an argument in this case because the
AVX variants of these instructions can continue to leverage unaligned
accesses. This also means that the test added here won't fail on a
machine with AVX support, it needs to be explicitly disabled.
Closes #8116
This commit completely replaces the `XmmMemAligned` operand in
`XmmCmove` with `Xmm` instead. Looking more into the fix in #8113, I was
looking to add some `*.wast` runtime tests to assert that the fix works
at the wasm layer in addition to the instruction selection layer. I was
poking around and there's a second user of `XmmCmove` which wasn't
addressed in #8113.
In #8113 the only caller of `cmove_xmm` was updated to ensure that
everything was always in memory. This was done to both prevent the
upgrade-to-an-aligned-load from loading too much but additionally to
ensure that the load always happened, regardless of the condition. There
was a second constructor of `XmmCmove`, however, from `cmove_or_xmm`.
This is triggered when the condition is a `f64.ne` instruction, for
example, and ran into the same bug that #8112 was exposing.
To fix both of these and prevent any future issues about skipping a load
by accident due to control flow this commit removes the `XmmMemAligned`
argument entirely from `XmmCmove` and replaces it with `Xmm`. This
prevents sinking loads entirely and sidesteps all these issues at the
type level.
This commit updates the allocation scheme for resources in the component
model to start at 1 instead of 0 when communicating with components.
This is an implementation of WebAssembly/component-model#284.
While this broke a number of tests we have, this shouldn't actually break
any components in practice. The broken tests were all overly-precise in
their assertions and error messages and this shouldn't idiomatically
come up in any guest language, so this should not be a practically
breaking change.
This change additionally places an upper limit on the maximum
allocatable index at `1 << 30` which is also specified in the above PR.
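A minimal sketch of the allocation rule described above (ignoring reuse of freed slots and everything else the real table does):

```rust
// Sketch: indices handed to components start at 1 and are capped at 1 << 30.
struct ResourceIndexAllocator {
    next: u32,
}

impl ResourceIndexAllocator {
    const MAX_RESOURCE_INDEX: u32 = 1 << 30;

    fn new() -> Self {
        // Index 0 is never handed out to a component.
        ResourceIndexAllocator { next: 1 }
    }

    fn allocate(&mut self) -> Option<u32> {
        if self.next > Self::MAX_RESOURCE_INDEX {
            return None; // upper limit on the maximum allocatable index
        }
        let index = self.next;
        self.next += 1;
        Some(index)
    }
}
```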
This commit is born out of a fuzz bug on x64 that was discovered recently.
Today, on `main`, and in the 17.0.1 release Wasmtime will panic when compiling
this wasm module for x64:
```wat
(module
  (func (result v128)
    i32.const 0
    i32x4.splat
    f64x2.convert_low_i32x4_u))
```
panicking with:
```
thread '<unnamed>' panicked at /home/alex/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cranelift-codegen-0.104.1/src/machinst/lower.rs:766:21:
should be implemented in ISLE: inst = `v6 = fcvt_from_uint.f64x2 v13 ; v13 = const0`, type = `Some(types::F64X2)`
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
Bisection points to the "cause" of this regression as #7859, which
more-or-less means that this has always been an issue and that PR just
happened to expose the issue. What's happening here is that egraph
optimizations are turning the IR into a form that the x64 backend can't
codegen. Namely there's no general purpose lowering of i64x2 being
converted to f64x2. The Wasm frontend never produces this but the
optimizations internally end up producing this.
Notably here the result of this function is constant and what's
happening is that a convert-of-a-splat is happening. In lieu of adding
the full general lowering to x64 (which is perhaps overdue since this is
the second or third time this panic has been triggered) I've opted to
add constant propagation optimizations for int-to-float conversions.
These are all based on the Rust `as` operator which has the same
semantics as Cranelift. This is enough to fix the issue here for the
time being.
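As a worked example, the folds amount to evaluating the corresponding Rust `as` conversion on the constant at compile time (the function names below are illustrative, not Cranelift's actual rule names):

```rust
// Constant folding for int-to-float conversions: Rust's `as` operator has
// the same semantics as Cranelift's conversions, so folding a constant is
// just evaluating the conversion up front.
fn fold_fcvt_from_uint_to_f64(value: u64) -> f64 {
    value as f64
}

fn fold_fcvt_from_sint_to_f32(value: i32) -> f32 {
    value as f32
}

fn main() {
    // The fuzz case above splats `i32.const 0` and converts it to f64x2,
    // so every lane folds to 0.0.
    assert_eq!(fold_fcvt_from_uint_to_f64(0), 0.0);
    assert_eq!(fold_fcvt_from_sint_to_f32(-1), -1.0);
}
```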
* winch: Add support for WebAssembly loads/stores
Closes https://github.com/bytecodealliance/wasmtime/issues/6529
This patch adds support for all the instructions involving WebAssembly
loads and stores for 32-bit memories. Given that the `memory64` proposal
is not enabled by default, this patch doesn't include an
implementation/tests for it; in theory minimal tweaks to the
current implementation will be needed in order to support 64-bit
memories.
Implementation-wise, this change follows a similar pattern as Cranelift
in order to calculate addresses for dynamic/static heaps, the main
difference being that in some cases doing less work at compile time is
preferred; for example, the current implementation only checks for the
general case of out-of-bounds access for dynamic heaps.
Another important detail regarding the implementation is the
introduction of `MacroAssembler::wasm_load` and
`MacroAssembler::wasm_store`, which internally use a common
implementation for loads and stores, with the only difference that the
`wasm_*` variants set the right flags in order to signal that these
operations are not trusted and might trap.
Finally, given that this change introduces support for the last set of
instructions missing for a Wasm MVP, it removes most of Winch's copy of
the spectest suite, and switches over to using the official test suite
where possible (for tests that don't use SIMD or Reference Types).
Follow-up items:
* Before doing any deep benchmarking I'm planning on landing a couple of
improvements regarding compile times that I've identified in parallel
to this change.
* The `imports.wast` tests are disabled because I've identified a bug
with `call_indirect`, which is not related to this change and exists
in main.
* Find a way to run the `tests/all/memory.rs` (or perhaps most of
integration tests) with Winch.
--
prtest:full
* Review comments
* Add stack overflow tests
* Add stack overflow tests for indirect calls
* Check for stack overflow on function entry
* Ignore the call tests on windows, as stack overflows trap
* Bless the winch filetests
* Update the wasm-tools family of crates
Brings in support for validating gc instructions, but they're all left
disabled for now.
* Update fuzz test case generation
* More test fixes, remove stray files
* More test fixes
* Rebase
* winch: Multi-Value Part 2: Blocks
This commit adds support for the Multi-Value proposal for blocks.
In general, this change introduces multiple building blocks to enable
supporting arbitrary params and results in blocks:
* `BlockType`: Introduce a block type to categorize each
block; this makes it easier to handle blocks per type and also
makes it possible to defer the calculation of the `ABIResults` until
they are actually needed, rather than calculating everything upfront
even though it might not be needed (when in an unreachable state).
* Push/pop operations are now frame aware. Given that each
`ControlStackFrame` contains all the information needed regarding
params and results, this change moves the implementation of the
push and pop operations to the `ControlStackFrame` struct.
* `StackState`: this struct holds the entry and exit invariants of each
block; these invariants are pre-computed when entering the block and
used throughout the code generation, to handle params, results and
assert the respective invariants.
In terms of the mechanics of the implementation: when entering each
block, if there are results on the stack, the expected stack pointer
offsets will be calculated via the `StackState`, and the `target_offset`
will be used to create the block's `RetArea`. Note that when entering
the block and calculating the `StackState` no space is actually reserved
for the results; any space increase in the stack is deferred until the
results are popped from the value stack via
`ControlStackFrame::pop_abi_results`.
The trickiest bit of the implementation is handling constant values that
need to be placed in the right location on the machine stack. Given that
constants are generally not spilled, this means that in order to keep
the machine and value stacks in sync (spilled-values-wise), values must
be shuffled to ensure that constants end up in the expected result locations.
See the comment in `ControlStackFrame::adjust_stack_results` for more
details.
* Review fixes
* winch: Add memory instructions
This commit adds support for the following memory instructions to Winch:
* `data.drop`
* `memory.init`
* `memory.fill`
* `memory.copy`
* `memory.size`
* `memory.grow`
In general the implementation is similar to how other builtin-based
instructions are handled (e.g. table instructions), which involves stack
manipulation prior to emitting a builtin function call, with the
exception of `memory.size`, which involves loading the current length
from the `VMContext`.
* Emit right shift instead of division to obtain the memory size in pages
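A tiny sketch of the computation the bullet above refers to: with 64 KiB pages, converting the byte length loaded from the `VMContext` into pages is a shift rather than a division.

```rust
// 64 KiB Wasm pages: length-in-pages is a right shift by 16 of the byte
// length, avoiding a division instruction.
const WASM_PAGE_SIZE_LOG2: u32 = 16;

fn byte_length_to_pages(byte_len: u64) -> u64 {
    byte_len >> WASM_PAGE_SIZE_LOG2
}

fn main() {
    assert_eq!(byte_length_to_pages(10 * 65536), 10);
}
```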
This commit reworks the `br_table` logic so that it correctly handles
all the jumps involved to each of the targets.
Even though it is safe to use the default branch for type information,
it is not safe to use it to derive the base stack pointer and base value
stack length. This change ensures that each target offset is taken into
account to balance the value stack prior to each jump.
Follow up to:
https://github.com/bytecodealliance/wasmtime/pull/7547
In which I overlooked this change and the fuzzer found an issue with the
following program:
```wat
(module
(func (export "") (result i32)
block (result i32)
i32.const 0
end
i32.const 0
i32.const 0
br_table 0
)
)
```
This commit ensures that the stack pointer is correctly positioned when
emitting br_table.
We can't know for sure which branch will be taken, but since all
branches must share the same type information, we can be certain that
the expectations regarding the stack pointer are the same, and thus we can
use the default target in order to ensure the correct placement.
* winch: Introduce `ABIParams` and `ABIResults`
This commit prepares Winch to support WebAssembly Multi-Value.
The most notable piece of this change is the introduction of the
`ABIParams` and `ABIResults` structs which are type wrappers around the
concept of an `ABIOperand`, which is the underlying main representation
of a param or result.
This change also consolidates how the size for WebAssembly types is
derived by introducing `ABI::sizeof`, as well as introducing
`ABI::stack_slot_size` to concretely indicate the stack slot size in
bytes for stack params, which is ABI dependent.
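As a rough sketch of the distinction (the concrete sizes below are assumptions for an x64-style ABI, not a statement of Winch's exact values):

```rust
// Sketch: `sizeof` is the natural byte size of a Wasm type, while the stack
// slot size is a fixed, ABI-dependent amount reserved per stack param.
enum WasmValType {
    I32,
    I64,
    F32,
    F64,
}

fn sizeof(ty: &WasmValType) -> u8 {
    match ty {
        WasmValType::I32 | WasmValType::F32 => 4,
        WasmValType::I64 | WasmValType::F64 => 8,
    }
}

// Assumed here: 8-byte stack slots (see the "Use 8 byte stack slot sizes"
// entry earlier in this log); the real value is ABI dependent.
const fn stack_slot_size() -> u8 {
    8
}
```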
* winch: Add the necessary ABI building blocks for multi-value
This change adds the necessary changes at the ABI level in order to
handle multi-value.
The most notable modifications in this change are:
* Modifying Winch's default ABI to reverse the order of results,
ensuring that results that go on the stack always come first;
this makes it easier to respect the following two stack invariants:
* Spilled memory values always precede register values
* Spilled values are stored from oldest to newest, matching their
respective locations on the machine stack.
* Modify all calling conventions supported by Winch so that only one result (the first one) is stored in
registers. This differs from their vanilla counterparts in that these
ABIs can handle multiple results in registers. Given that Winch is not
a generic code generator, keeping the ABI close to what Wasmtime
expects makes it easier to pass multiple results at trampolines.
* Add more multi-value tests
This commit adds more tests for multi-value and improves documentation.
prtest:full
* Address review feedback
This commit properly derives a scratch register for a particular
WebAssembly type. The included spec test uncovered that the previous
implementation used an integer scratch register to assign float stack
arguments, which resulted in a panic.
This change is a follow up to https://github.com/bytecodealliance/wasmtime/pull/7443;
after it landed I realized that Winch doesn't include spec tests for
local.get and local.set.
Those tests uncovered a bug in the handling of the constant pool: given
Winch's single-pass nature, there's very little room to know all the
constants ahead of time and register them all at once at emission
time; instead they are emitted when they are needed by an instruction.
Even though Cranelift's machinery is capable of deduplicating constants in
the pool, `register_constant` assumes and checks that each constant
is only pushed once. In Winch's case, since we emit as we go, we
need to carefully check whether the constant is one that was not emitted
before, and only register it in that case. Otherwise we break the invariant
that each constant is only registered once.
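A sketch of that "check before registering" pattern (hypothetical helper, not Winch's or Cranelift's actual code):

```rust
use std::collections::HashMap;

// Because constants are emitted lazily, the same constant may be requested
// more than once; it must only be registered in the pool the first time.
struct LazyConstantPool {
    registered: HashMap<Vec<u8>, u32>, // constant bytes -> pool handle
    next_handle: u32,
}

impl LazyConstantPool {
    fn get_or_register(&mut self, bytes: &[u8]) -> u32 {
        if let Some(&handle) = self.registered.get(bytes) {
            // Seen earlier in this function: reuse the existing handle
            // instead of violating the register-once invariant.
            return handle;
        }
        let handle = self.next_handle;
        self.next_handle += 1;
        self.registered.insert(bytes.to_vec(), handle);
        handle
    }
}
```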
* Implement support for `thread` in `*.wast` tests
This commit implements support for `thread` and `wait` in `*.wast` files
and imports the upstream spec test suite from the `threads` proposal.
This additionally and hopefully makes it a bit easier to write threaded
tests in the future if necessary too.
* Fix compile of fuzzing
* winch(x64): Add support for table instructions
This change adds support for the following table instructions:
`elem.drop`, `table.copy`, `table.set`, `table.get`, `table.fill`,
`table.grow`, `table.size`, `table.init`.
This change also introduces partial support for the `Ref` WebAssembly
type, more concretely the `Func` heap type, which means that all the
table instructions above only work with this WebAssembly type as of this
change.
Finally, this change is also a small follow up to the primitives
introduced in https://github.com/bytecodealliance/wasmtime/pull/7100,
more concretely:
* `FnCall::with_lib`: tracks the presence of a libcall and ensures that
any result registers are freed right when the call is emitted.
* `MacroAssembler::table_elem_addr` returns an address rather than the
value of the address, making it convenient for other use cases like
`table.set`.
--
prtest:full
* chore: Make stack functions take impl IntoIterator<..>
* Update winch/codegen/src/codegen/call.rs
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Remove a dangling `dbg!`
* Add comment on branching
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* winch(x64): Call indirect
This change adds support for the `call_indirect` instruction to Winch.
Libcalls are a pre-requisite for supporting `call_indirect` in order to
lazily initialize funcrefs. This change adds support for libcalls to
Winch by introducing a `BuiltinFunctions` struct similar to Cranelift's
`BuiltinFunctionSignatures` struct.
In general, libcalls are handled like any other function call, with the
only difference that given that not all the information to fulfill the
function call might be known up-front, control is given to the caller
for finalizing the call.
The introduction of function references also involves dealing with
pointer-sized loads and stores, so this change also adds the required
functionality to `FuncEnv` and `MacroAssembler` to be pointer aware,
making it straightforward to derive an `OperandSize` or `WasmType` from
the target's pointer size.
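A hypothetical sketch of what "pointer aware" means here: deriving an operand size from the target's pointer width instead of hard-coding 64 bits.

```rust
// Illustrative only: map the target's pointer width to the operand size used
// for pointer-sized loads and stores.
enum OperandSize {
    S32,
    S64,
}

fn pointer_operand_size(pointer_size_bytes: u8) -> OperandSize {
    match pointer_size_bytes {
        4 => OperandSize::S32,
        8 => OperandSize::S64,
        other => unreachable!("unsupported pointer size: {other} bytes"),
    }
}
```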
Finally, given the complexity of the `call_indirect` instruction, this
change bundles an improvement to the register allocator, allowing it to
track allocatable vs non-allocatable registers; this is done to
avoid any mistakes when allocating/de-allocating registers that are not
allocatable.
--
prtest:full
* Address review comments
* Fix typos
* Better documentation for `new_unchecked`
* Introduce `max` for `BitSet`
* Make allocatable property `u64`
* winch(calls): Overhaul `FnCall`
This commit simplifies `FnCall`'s interface, making its usage more
uniform throughout the compiler. In summary, this change:
* Avoids side effects in the `FnCall::new` constructor, and also makes
it the only constructor.
* Exposes `FnCall::save_live_registers` and
`FnCall::calculate_call_stack_space` to calculate the stack space
consumed by the call and so that the caller can decide which one to
use at callsites depending on their use-case.
* tests: Fix regset tests
* Bump wasm-tools crates
Two major changes/reasons for this update:
* Primarily pulling in support for semicolons-in-WIT files. Semicolons are
not currently required, though, so I'll follow up later with actual
semicolons.
* The syntax for parsing `(if ...)` was fixed in `wast`. Previously it
did not require `(then ...)` but this is required by the spec. New
spec tests require this as well. This breaks existing text format
tests which don't use `(then ...)` inside of an `(if ...)`. Most tests
were updated by hand but `embenchen_*` tests were updated by running
through the old parser to produce non-s-expression using code.
* Fix an example `*.wat`
* Update wasm-tools family of crates
Mostly minor updates, but staying up-to-date.
* Update text format syntax
* Update cargo vet entries
* Update more old text syntax
* winch: Support f32.abs and f64.abs on x64
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add an implementation of f32.neg and f64.neg
* Enable spec tests for winch with f{32,64}.{neg,abs}
* Enable differential fuzzing for f{32,64}.{neg,abs} for winch
* Comments from code review
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* winch: Add support for `br_table`
This change adds support for the `br_table` instruction, including
several modifications to the existing control flow implementation:
* Improved handling of jumps to loops: Previously, the compiler erroneously
treated the result of loop blocks as the definitive result of the jump. This
change fixes this bug.
* Streamlined result handling and stack pointer balancing: In the past, these
operations were executed in two distinct steps, complicating the process of
ensuring the correct invariants when emitting unconditional jumps. To simplify
this, `CodeGenContext::unconditional_jump` is introduced. This function
guarantees all necessary invariants are met, encapsulating the entire operation
within a single function for easier understanding and maintenance.
* Handling of unreachable state at the end of a function: when reaching the end
of a function in an unreachable state, clear the stack and ensure that the
machine stack pointer is correctly placed according to the expectations of the
outermost block.
In addition to the above refactoring, the main implementation of the
`br_table` instruction involves emitting labels for each target. Within each
label, an unconditional jump is emitted to the frame's label, ensuring correct
stack pointer balancing when the jump is emitted.
While it is possible to optimize this process by avoiding intermediate labels
when balancing isn't required, I've opted to maintain the current
implementation until such optimization becomes necessary.
* chore: Rust fmt
* fuzzing: Add `BrTable` to the list of supported instructions
* docs: Improve documentation for `unconditional_jump`