This commit adds lowering rules for these instructions
on the x64 backend when the `phadd{w,d}` instructions are not available.
Additionally this implements `iadd_pairwise` for i8x16 types which, while
not used by wasm, enables running the CLIF runtest on x64.
* Update adapter build
* Rename the binary artifact to `wasi_snapshot_preview1.wasm` and update
build scripts to account for this.
* Update documentation to mention difference between reactor/command
builds.
Closes #6569
* More renaming
* winch(x64) Add support for if/else
This change adds the necessary building blocks to support control flow;
this change also adds support for the `If` / `Else` operators.
This change does not include multi-value support. The idea is to add
support for multi-value across the compiler (functions and blocks) as
a separate future change.
The general gist of the change is to track the presence of control flow
frames as part of the code generation context and emit the corresponding
labels and instructions as control flow blocks are found.
* PR review
* Allocate 64 slots for `ControlStackFrames`
* Explicitly track else branches through an else entry in
`ControlStackFrame`
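As a rough illustration of the frame tracking described above, the sketch below models a control stack that hands out labels for `If` / `Else` / `End`. The names `Label`, `ControlStack`, `push_if`, `enter_else`, and `end` are hypothetical, and the shape of `ControlStackFrame` here is an assumption; none of this mirrors Winch's actual types.

```rust
/// Hypothetical label handle; a real backend would use its own label type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Label(pub u32);

/// Assumed frame shape: the else branch is tracked as its own entry.
#[derive(Debug, PartialEq, Eq)]
pub enum ControlStackFrame {
    /// An `if` whose `else` has not been seen.
    If { else_label: Label, end_label: Label },
    /// An `if` whose `else` branch has been entered.
    Else { end_label: Label },
}

#[derive(Default)]
pub struct ControlStack {
    frames: Vec<ControlStackFrame>,
    next_label: u32,
    /// Labels bound so far, in emission order.
    pub bound: Vec<Label>,
}

impl ControlStack {
    fn fresh_label(&mut self) -> Label {
        let label = Label(self.next_label);
        self.next_label += 1;
        label
    }

    /// `If`: push a frame and return the label to branch to when the
    /// condition is false.
    pub fn push_if(&mut self) -> Label {
        let else_label = self.fresh_label();
        let end_label = self.fresh_label();
        self.frames.push(ControlStackFrame::If { else_label, end_label });
        else_label
    }

    /// `Else`: bind the else label and return the end label that the
    /// "then" branch must jump to. Returns `None` if the top frame is
    /// not an `If` without an else.
    pub fn enter_else(&mut self) -> Option<Label> {
        match self.frames.pop()? {
            ControlStackFrame::If { else_label, end_label } => {
                self.bound.push(else_label);
                self.frames.push(ControlStackFrame::Else { end_label });
                Some(end_label)
            }
            other => {
                self.frames.push(other);
                None
            }
        }
    }

    /// `End`: bind whatever labels are still pending and pop the frame.
    pub fn end(&mut self) -> Option<()> {
        match self.frames.pop()? {
            ControlStackFrame::If { else_label, end_label } => {
                self.bound.push(else_label);
                self.bound.push(end_label);
            }
            ControlStackFrame::Else { end_label } => self.bound.push(end_label),
        }
        Some(())
    }

    pub fn depth(&self) -> usize {
        self.frames.len()
    }
}
```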
I believe that historically it was difficult to write a 128-bit constant
in ISLE but nowadays ISLE supports `u128` integer literals so it's now
possible to do that. This commit moves some existing constants in
`x64/lower/isle.rs` into `lower.isle` directly to more easily understand
them when reading over instruction lowerings by avoiding having to
context switch between ISLE and Rust to understand the value of a constant.
* cranelift: Remove the `fcvt_low_from_sint` instruction
This commit removes this instruction since it's a combination of
`swiden_low` plus `fcvt_from_sint`. This was used by the WebAssembly
`f64x2.convert_low_i32x4_s` instruction previously but the corresponding
unsigned variant of the instruction, `f64x2.convert_low_i32x4_u`, used a
`uwiden_low` plus `fcvt_from_uint` combo. To help simplify Cranelift's
instruction set and to make these two instructions mirrors of each other
the Cranelift instruction is removed.
The s390x and AArch64 backend lowering rules for this instruction could
simply be deleted as the previous combination of the `swiden_low` and
`fcvt_from_sint` lowering rules produces the same code. The x64 backend
moved its lowering to a special case of the `fcvt_from_sint` lowering.
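To illustrate why the combined instruction is redundant, here is a lane-level model in plain Rust. The arrays stand in for vector registers and the function names simply reuse the CLIF instruction names; this is a sketch of the semantics, not Cranelift code.

```rust
/// Model of `swiden_low` on i32x4: sign-extend the two low lanes to i64x2.
pub fn swiden_low(v: [i32; 4]) -> [i64; 2] {
    [i64::from(v[0]), i64::from(v[1])]
}

/// Model of `fcvt_from_sint` on i64x2: convert each lane to f64.
pub fn fcvt_from_sint(v: [i64; 2]) -> [f64; 2] {
    [v[0] as f64, v[1] as f64]
}

/// Model of the removed `fcvt_low_from_sint`: convert the two low i32
/// lanes directly to f64.
pub fn fcvt_low_from_sint(v: [i32; 4]) -> [f64; 2] {
    [f64::from(v[0]), f64::from(v[1])]
}
```

Every `i32` is exactly representable as an `f64`, so the two-step form gives the same lanes as the combined one.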
* Fix cranelift-fuzzgen build
This commit adds a targeted optimization aimed at fixing #6562 as a
temporary measure for now. The "real" fix for #6562 is to add a full
lowering of `fcvt_from_uint` to the x64 backend, but for now adding this
rule should fix the specific issue cropping up.
Closes #6562
In #5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.
So let's undo that change. Fixes #6548.
Adds the widening add instructions from the V spec. These are `vwadd{u,}.{w,v}{v,x}`.
This also adds a bunch of rules to try to match these instructions, some
of which end up being quite complex.
Rules that match `{u,s}widen_high` are the same as their `{u,s}widen_low` counterparts
but they first do a `vslidedown` of half the vector, to bring the top lanes down.
`uwiden_low` rules are the same as the `swiden_low` rules, but they use `vwaddu.*`
instead of `vwadd.*` which is the unsigned version of the instruction.
Now, in each of these groups of rules we have a few different instructions.
`vwadd.wv` does 2*SEW = 2*SEW + SEW, which just means that the elements in the RHS
vector are first sign extended before doing the addition. The only trick here is
that since the result is 2*SEW we must use a vstate type that has half the element
size of the type that we want to end up with. So to end up with an i32x4 `iadd` we need
to pass in an i16x4 type as the vstate type.
`vwadd.vv` does 2*SEW = SEW + SEW, so as long as both sides are extended we can
use this instruction. Again we must pass in a type with half the element size.
`vwadd.wx` and `vwadd.vx` do the same thing, but the RHS is expected to be an extended
and splatted X register, so we try to match exactly that. To make these rules
more applicable I've previously added some egraph rules (#6533) that convert
`{u,s}widen_{low,high}` into `splat+{u,s}extend`. This way we only have to try to
match the splat version, which reduces the number of rules.
All of these rules use `vstate_mf2`. This sets the LMUL setting to 1/2, meaning
that at most we will read half of the source vector registers, and the result
is guaranteed to fit in a single destination register. Otherwise the CPU could
have to write the result into multiple registers, which is something that the
ISA supports, but adds a bunch of constraints that we don't need here.
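For reference, the lane arithmetic described above can be modelled in plain Rust with SEW = 16, so the result lanes are 32 bits wide. Arrays stand in for vector registers, and the function names reuse the instruction mnemonics purely for illustration; this is a sketch of the semantics, not backend code.

```rust
use std::array;

/// `vwadd.vv`: 2*SEW = SEW + SEW, both operands sign extended.
pub fn vwadd_vv(a: [i16; 4], b: [i16; 4]) -> [i32; 4] {
    array::from_fn(|i| i32::from(a[i]) + i32::from(b[i]))
}

/// `vwaddu.vv`: same shape, but both operands are zero extended.
pub fn vwaddu_vv(a: [u16; 4], b: [u16; 4]) -> [u32; 4] {
    array::from_fn(|i| u32::from(a[i]) + u32::from(b[i]))
}

/// `vwadd.wv`: 2*SEW = 2*SEW + SEW, only the RHS is sign extended.
/// The wide addition wraps on overflow.
pub fn vwadd_wv(a: [i32; 4], b: [i16; 4]) -> [i32; 4] {
    array::from_fn(|i| a[i].wrapping_add(i32::from(b[i])))
}
```

Note how the inputs of `vwadd_vv` are i16x4 while the result is i32x4, which is why the vstate type has half the element size of the result type.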
* riscv64: Add SIMD Load+Extends
* riscv64: Add SIMD `{u,s}widen_{low,high}`
* riscv64: Add `gen_slidedown_half`
This isn't really necessary yet, but we are going to make a lot of use of it
in the widening arithmetic instructions, so might as well add it now.
* riscv64: Add multi widen SIMD instructions
* riscv64: Typo Fix
This commit fixes the implementation of `pop_to_reg`. In the previous
implementation, whenever a specific register was requested as the
destination register and a register-to-register move happened, the
source register was never marked as free.
This issue became more evident with more complex programs involving
control flow and division for example.
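A simplified model of the bug and its fix is sketched below. `RegSet`, its bitmask representation, and this signature of `pop_to_reg` are all hypothetical and far simpler than Winch's real register allocator; the sketch also assumes the requested destination is available. The point is only the `free(src)` call after the move.

```rust
/// Hypothetical register set: bit `i` set means register `i` is free.
#[derive(Debug)]
pub struct RegSet {
    free_mask: u32,
}

impl RegSet {
    pub fn new(free_mask: u32) -> Self {
        RegSet { free_mask }
    }

    pub fn is_free(&self, reg: u8) -> bool {
        self.free_mask & (1 << reg) != 0
    }

    /// Mark `reg` as allocated.
    pub fn take(&mut self, reg: u8) {
        self.free_mask &= !(1 << reg);
    }

    /// Mark `reg` as free again.
    pub fn free(&mut self, reg: u8) {
        self.free_mask |= 1 << reg;
    }
}

/// Pop a value that currently lives in `src`. If a specific register is
/// requested and differs from `src`, record a move and release `src`.
pub fn pop_to_reg(
    regs: &mut RegSet,
    src: u8,
    named: Option<u8>,
    moves: &mut Vec<(u8, u8)>,
) -> u8 {
    match named {
        Some(dst) if dst != src => {
            regs.take(dst);
            moves.push((src, dst));
            // The step that was missing: after the move the source
            // register no longer holds a live value.
            regs.free(src);
            dst
        }
        _ => src,
    }
}
```

Without the `free(src)` call, every such move would permanently remove one register from the pool, which matches the way the issue surfaced in larger programs.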
This is the first of a series of patches to support control flow in
Winch.
This change exposes `MachLabel` from cranelift for it to be consumed by
Winch's `MacroAssembler` and `Assembler`.
The previous implementation of Imm12::maybe_from_u64 did not match the
constant values 0xfff or 0xfff000, even though those are expressible in
the aarch64 12-bit immediate format.
Also the explicit test for 0 was unnecessary; it's a valid example of
all bits outside the least-significant 12 bits being 0.
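The intended check can be sketched as a standalone function. The `Imm12` struct below, with its `bits` and `shift12` fields, is a simplified stand-in written for this example rather than a copy of the Cranelift source.

```rust
/// Simplified model of an aarch64 12-bit arithmetic immediate:
/// 12 bits of payload, optionally shifted left by 12.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Imm12 {
    pub bits: u16,
    pub shift12: bool,
}

impl Imm12 {
    pub fn maybe_from_u64(val: u64) -> Option<Imm12> {
        if (val & !0xfff_u64) == 0 {
            // Only the low 12 bits may be set. This also covers 0.
            Some(Imm12 { bits: val as u16, shift12: false })
        } else if (val & !(0xfff_u64 << 12)) == 0 {
            // Only bits 12..24 may be set.
            Some(Imm12 { bits: (val >> 12) as u16, shift12: true })
        } else {
            None
        }
    }
}
```

With mask-based tests like these, `0xfff` and `0xfff000` are both accepted and zero needs no special case.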
* Force `execute_across_threads` to use multiple threads
Currently this uses tokio's `spawn_blocking` but that will reuse threads
in its thread pool. Instead spawn a thread and perform a single poll on
that thread to force lots of fresh threads to be used and ideally stress
TLS management further.
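The "one fresh thread per poll" idea can be sketched with only the standard library. This is not the test helper itself: the signature, the returned poll count, the no-op waker, and the `YieldTimes` future are assumptions made for the example.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// Waker that does nothing; the loop below simply re-polls.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Drive `future` to completion, performing each poll on a newly
/// spawned thread. Returns the output and the number of polls.
pub fn execute_across_threads<F>(future: F) -> (F::Output, usize)
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    let mut future: Pin<Box<F>> = Box::pin(future);
    let mut polls = 0;
    loop {
        polls += 1;
        let handle = thread::spawn(move || {
            let waker = Waker::from(Arc::new(NoopWake));
            let mut cx = Context::from_waker(&waker);
            let polled = future.as_mut().poll(&mut cx);
            match polled {
                Poll::Ready(value) => Ok(value),
                // Hand the future back so it is not dropped (cancelled).
                Poll::Pending => Err(future),
            }
        });
        match handle.join().expect("polling thread panicked") {
            Ok(value) => return (value, polls),
            Err(pending) => future = pending,
        }
    }
}

/// Example future: returns `Pending` the given number of times, then 42.
pub struct YieldTimes(pub u32);

impl Future for YieldTimes {
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<u32> {
        if self.0 == 0 {
            Poll::Ready(42)
        } else {
            self.0 -= 1;
            Poll::Pending
        }
    }
}
```

Calling `execute_across_threads(YieldTimes(3))` polls on four distinct threads before yielding the result.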
* Add a guard against using stale stacks
This commit adds a guard to Wasmtime's async support to double-check
that when a call to `poll` is finished that the currently active TLS
activation pointer does not point to the stack that is being switched
off of. This is attempting to be a bit of a defense-in-depth measure to
prevent stale pointers from sticking around in TLS. This is currently
happening and causing #6493 which can result in unsoundness but
currently is manifesting as a crash.
* Fix a soundness issue with the component model and async
This commit addresses #6493 by fixing a soundness issue with the async
implementation of the component model. This issue has been present
since async support was first added to the component
model and doesn't represent a recent regression. The underlying problem
is that one of the base assumptions of the trap handling code is that
there's only one single activation in TLS that needs to be pushed/popped
when a stack is switched (e.g. a fiber is switched to or from). In the
case of the component model there might be two activations: one for an
invocation of a component function and then a second for an invocation
of a `realloc` function to return results back to wasm (e.g. in the case
an imported function returns a list).
This problem is fixed by changing how TLS is managed in the presence of
fibers. Previously when a fiber was suspended it would pop a single
activation from the top of the stack and save that to get pushed when
the fiber was resumed. This has the benefit of maintaining an entire
linked list of activations for the current thread but has the problem
above where it doesn't handle a fiber with multiple activations on it.
Instead now TLS management is done when a fiber is resumed instead of
suspended. Instead of pushing/popping a single activation the entire
linked list of activations is tracked for a particular fiber and stored
within the fiber itself. In this manner resuming a fiber will push
all activations onto the current thread and suspending a fiber will pop
all activations for the fiber (and store them as a new linked list in
the fiber's state itself).
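The new scheme can be modelled roughly as follows. `ThreadTls`, `Fiber`, and `Activation` are hypothetical names, and a `Vec` stands in for the linked list of activations; this is a sketch of the bookkeeping only, not Wasmtime's implementation.

```rust
/// Hypothetical activation record, identified by a number.
pub type Activation = u32;

/// Per-thread chain of activations; the last element is the newest.
#[derive(Default)]
pub struct ThreadTls {
    pub stack: Vec<Activation>,
}

/// A fiber stores every activation that belongs to it while suspended.
#[derive(Default)]
pub struct Fiber {
    pub saved: Vec<Activation>,
    /// Depth of the thread's stack at the time of the last resume.
    base: Option<usize>,
}

impl Fiber {
    /// Resuming pushes all of the fiber's activations onto the thread.
    pub fn resume(&mut self, tls: &mut ThreadTls) {
        self.base = Some(tls.stack.len());
        tls.stack.append(&mut self.saved);
    }

    /// Suspending pops every activation pushed since the resume, however
    /// many there are, and stores them in the fiber.
    pub fn suspend(&mut self, tls: &mut ThreadTls) {
        let base = self.base.take().expect("fiber is not running");
        self.saved = tls.stack.split_off(base);
    }
}
```

Because `suspend` removes everything above the recorded base rather than a single entry, a fiber holding both a component-function activation and a `realloc` activation leaves nothing stale behind on the thread.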
The end result is that all activations on a fiber should now be managed
correctly, regardless of how many there are. The main downside of this
commit is that fiber suspension and resumption are more complicated, but
the hope there is that fiber suspension typically corresponds with I/O
not being ready or similar so the order of magnitude of TLS operations
isn't too significant compared to the I/O overhead.
Closes #6493
* Review comments
* Fix restoration during panic
* Allow async yield from epoch interruption callback
When an epoch interruption deadline arrives, previously it was possible
to yield to the async executor, or to invoke a callback on the wasm
stack, but not both. This changes the API to allow callbacks to run and
then request yielding to the async executor.
* Fix Wasmtime C API implementation
* Upgrade file-per-thread-logger to v0.2.0
Signed-off-by: Benjamin Bouvier <public@benj.me>
* Update audits too
Signed-off-by: Benjamin Bouvier <public@benj.me>
---------
Signed-off-by: Benjamin Bouvier <public@benj.me>
This commit fixes the helper function `execute_across_threads` in tests
to actually execute across threads. A refactoring of this function
in #3975 accidentally introduced a bug where the future provided was
polled once and then dropped, cancelling it instead of executing it to
completion.
This is necessary for implementing callee-pops calling conventions, as is
required for tail calls. This is just a small part of tail calls, and doesn't
implement everything, but is a good piece to land on its own so that the eventual PR
isn't so huge.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This change relaxes what kinds of modules can be run when wasi-threads
is enabled via `--wasi-modules experimental-wasi-threads`. Previously,
as reported in #6153, simple modules that made no use of thread spawning
or shared memories were preemptively rejected when the wasi-threads
context was created. This is too restrictive.
Instead, this change does the following:
- it moves the check for whether a module is valid according to the
wasi-threads specification to the point a new thread is spawned; this
resolves #6153
- as noted in #6153, this change also adds a better error message
indicating that wasi-threads expects a shared memory import
- the way this is implemented also improves module instantiation: by
constructing an `InstancePre` once when the `WasiThreadsCtx` is built,
we might shave off a bit of time from the "spawn a thread" call;
this supersedes a similar effort in #5741
This commit removes the SSE4.1 requirement for the `enable_simd` CLIF
feature. This means that the new baseline required is SSSE3 for the
WebAssembly SIMD proposal. Many existing codegen tests were
updated to explicitly enable `has_sse41` and runtests were updated to
test with and without SSE4.1.
Wasmtime's fuzzing is additionally updated to flip the SSE4.1 feature to
enable fuzz-testing this.