This commit fixes a test that has failed on CI and seems flaky. This
test asserts that stderr/stdout are writable and that a 200ms timeout
does not elapse when calculating this. The implementation, on Windows at
least, of `poll_oneoff` always reports stdout/stderr as "ready for
write", meaning that this test should trivially pass. The flakiness comes
from the timeout parameter where apparently sometimes on CI the
determination of this takes more than 200ms. This means that the timer
subscription may or may not fire depending on the timing of the test,
and currently the test always fails if the timeout subscription fires.
This commit updates the test to store whether a timeout has passed and
only fail if `poll_oneoff` is attempted when the timeout has already
elapsed. This will allow the timeout to fire so long as the two streams
are considered writable at the same time, achieving the desired result
of the test to assert that, without timing out, both stdout and stderr
are considered writable.
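A minimal sketch of the revised test logic; the `poll_once` closure stands
in for the real `poll_oneoff` call, and the helper name is hypothetical,
not the test's exact code:

```rust
use std::time::{Duration, Instant};

// Only fail if a poll is *attempted* after the deadline has already
// elapsed; a timer firing alongside the writable events is fine.
fn poll_until_both_writable(mut poll_once: impl FnMut() -> (bool, bool)) {
    let deadline = Instant::now() + Duration::from_millis(200);
    let mut timed_out = false;
    loop {
        assert!(!timed_out, "poll_oneoff attempted after the timeout elapsed");
        let (stdout_ready, stderr_ready) = poll_once();
        timed_out = Instant::now() >= deadline;
        if stdout_ready && stderr_ready {
            return; // both streams writable, with or without the timer firing
        }
    }
}
```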
* Fix inconsistent deadlines for monotonic clock timeouts
Rename `MonotonicClockSubscription`'s `deadline` field to
`absolute_deadline` and change the code that computes the value to
compute an absolute time rather than a relative time, so that it's
interpreted consistently everywhere.
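As a rough illustration of the rename (the struct and field names come from
this commit; the surrounding helpers are hypothetical), the stored value is
now an absolute timestamp rather than a relative delta:

```rust
struct MonotonicClockSubscription {
    absolute_deadline: u64, // absolute nanoseconds, not a relative timeout
}

// Compute the absolute deadline once, at subscription time...
fn subscribe(now_ns: u64, timeout_ns: u64) -> MonotonicClockSubscription {
    MonotonicClockSubscription {
        absolute_deadline: now_ns.saturating_add(timeout_ns),
    }
}

// ...so every consumer interprets it the same way.
fn is_expired(s: &MonotonicClockSubscription, now_ns: u64) -> bool {
    now_ns >= s.absolute_deadline
}
```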
Fixes #6588.
* Polling on stdio is not yet implemented on Windows.
These are implemented as a combination of two steps, mask generation and
mask expansion. Our comparison rules only return their results as a mask
register, so we need to expand the mask into lane-sized elements (see the
scalar sketch below).
We have 20 (!) comparison instructions, nearly the full table of all IntCC codes
in VV, VX and VI formats. However, there are some holes in this table.
They are:
* `vmsltu.vi`
* `vmslt.vi`
* `vmsgtu.vv`
* `vmsgt.vv`
* `vmsgeu.*`
* `vmsge.*`
Most of these can be replaced with the inverted IntCC instruction; however,
this commit only implements the existing instructions without any inversion,
plus the inverted VV versions of `sgtu`/`sgt`/`sgeu`/`sge`, since we need them
to get the full `icmp` functionality.
I've split the actual mask expansion into its own separate rule since we are
going to need it for the `fcmp` rules as well.
The instruction selection for `icmp` is in a separate rule simply because the
rules end up less verbose than if they were inlined directly into the `icmp` rule.
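For intuition, here is a scalar (non-backend) illustration of the two steps
for a 4-lane 16-bit equality compare; the real lowering emits vector
instructions, this just shows what mask generation and mask expansion compute:

```rust
fn icmp_eq_i16x4(a: [i16; 4], b: [i16; 4]) -> [i16; 4] {
    // Step 1: mask generation (what e.g. `vmseq.vv` produces): one bit per lane.
    let mut mask: u8 = 0;
    for i in 0..4 {
        if a[i] == b[i] {
            mask |= 1 << i;
        }
    }
    // Step 2: mask expansion: each mask bit becomes a lane-sized element,
    // all-ones (-1) for true and all-zeros for false.
    let mut out = [0i16; 4];
    for i in 0..4 {
        out[i] = if mask & (1 << i) != 0 { -1 } else { 0 };
    }
    out
}
```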
This change adds support for the `loop`, `br` and `br_if` instructions
as well as unreachable code handling. Whenever an instruction that
affects reachability is emitted (`br` in the case of this PR), the
compiler will enter into an unreachable code state, essentially ignoring
most of the subsequent instructions. When handling the unreachable code
state some instructions are still observed, in order to determine if
reachability should be restored.
This change, particularly the handling of unreachable code, adds all the
necessary building blocks to the compiler to emit other instructions
that affect reachability (e.g. `unreachable`, `return`).
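A minimal sketch of the reachability state machine described above (names
are illustrative; the actual change dispatches through a visitor rather than
matching on strings, which are used here only to keep the sketch short):

```rust
#[derive(Clone, Copy)]
enum Reachability {
    Reachable,
    // Depth of blocks entered *while* unreachable, so reachability is only
    // restored at the `end` of the frame that became unreachable.
    Unreachable { depth: u32 },
}

fn visit_op(state: &mut Reachability, op: &str) {
    *state = match *state {
        Reachability::Reachable => {
            if op == "br" {
                Reachability::Unreachable { depth: 0 }
            } else {
                Reachability::Reachable // ...and emit code for `op` as usual
            }
        }
        Reachability::Unreachable { depth } => match op {
            // Still observed while unreachable:
            "block" | "loop" | "if" => Reachability::Unreachable { depth: depth + 1 },
            "end" if depth == 0 => Reachability::Reachable,
            "end" => Reachability::Unreachable { depth: depth - 1 },
            // Everything else is ignored while unreachable.
            _ => Reachability::Unreachable { depth },
        },
    };
}
```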
Address review feedback
* Rename `branch_target` to `is_branch_target`
* Use the visitor pattern to handle unreachable code
Avoid string comparison and split unreachable handling functions
* x64: Add non-SSSE3 lowerings of `pshufb`
Or, more accurately, add lowerings which don't use `pshufb`'s
functionality at all where possible or otherwise fall back to a new
libcall. This particular instruction seemed uniquely difficult to
implement in the backend so I decided to "cop out" and use a libcall
instead. The libcall will be used for `popcnt`, `shuffle`, and
`swizzle` instructions when SSSE3 isn't available.
* Implement SSE2 popcnt with Hacker's Delight
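The bit-twiddling approach is the classic population count from Hacker's
Delight; for reference, a scalar 32-bit version (the function name is
illustrative, and the actual lowering presumably applies the same masks
lane-wise with SSE2 operations):

```rust
fn popcnt32(mut x: u32) -> u32 {
    x = x - ((x >> 1) & 0x5555_5555);                 // 2-bit partial sums
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // 4-bit partial sums
    x = (x + (x >> 4)) & 0x0f0f_0f0f;                 // 8-bit partial sums
    x.wrapping_mul(0x0101_0101) >> 24                 // sum bytes into top byte
}
```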
* x64: Implement passing vector arguments in the fastcall convention
Windows says that vector arguments are passed indirectly, so handle that
here through the `ABIArg::ImplicitPtrArg` variant. Some additional
handling is added to the general machinst backend.
* Update `gen_load_base_offset` for x64
* Fill out remaining bits of fastcall and vector parameters
* Remove now-unnecessary `Clone` bound
A `shuffle` specialization can fall back to the default implementation,
and two other rules already gated on SSE4.1 for other instructions need
a second clause for SSSE3 as well.
Note that the `shuffle` variant will get tested in a subsequent commit
that adds a `pshufb` fallback.
The Bytecode Alliance didn't actually audit these crates but rather
simply trusts them, per the notes. Previously we didn't have a way
to express this distinction, but now we do.
* Cranelift: Adjust virtual SP after `tail` call-conv callees return
Callees that use the `tail` calling convention will pop stack arguments on
behalf of their callers. They do not, however, adjust the caller's virtual SP,
so that still needs to happen in our ABI and `CallSite` code. This is, however,
slightly trickier than just emitting a nominal SP adjustment pseudo-instruction:
we cannot let regalloc spill or reload values between the call and the SP
adjustment, since the stack offsets would be off by the size of the stack
arguments to the call. Therefore, we add the number of bytes that the
callee pops to the `CallInfo` structures and have emission update the virtual SP
atomically with respect to the call itself.
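A hedged sketch of that bookkeeping (field and function names are
illustrative, not Cranelift's exact API, and the sign convention is likewise
illustrative):

```rust
struct CallInfo {
    // Bytes of stack arguments a `tail` callee pops on behalf of the caller.
    callee_pop_size: u32,
}

fn emit_call_and_adjust(info: &CallInfo, virtual_sp_offset: &mut i64) {
    // ... emit the call instruction itself here ...
    // The virtual-SP update happens in the same emission step as the call,
    // so regalloc never sees a point between the two where it could insert
    // a spill or reload using stale stack offsets.
    *virtual_sp_offset -= i64::from(info.callee_pop_size);
}
```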
Fixes #6581
Fixes #6582
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Have `fuzzgen` generate functions with the `tail` calling convention
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Add i32.popcnt and i64.popcnt to winch
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Add fallback implementation for popcnt
Move popcnt fallback up into the macroassembler.
Share code between 32-bit and 64-bit popcnt
Add Popcnt to winch differential fuzzing
* Use _rr functions where possible
* Avoid using scratch register for popcnt
The scratch register was getting clobbered by the calls to `and`,
so this is instead passing in a CodeGenContext to the masm's `popcnt`
and letting it handle its own registers.
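A rough sketch of the shape of that fix (illustrative types and names, not
Winch's exact API): `popcnt` asks the context for its own temporary instead
of assuming a shared scratch register survives the intermediate operations:

```rust
struct Reg(u8);

struct CodeGenContext {
    free_gprs: Vec<Reg>,
}

impl CodeGenContext {
    fn any_gpr(&mut self) -> Reg {
        self.free_gprs.pop().expect("a free register")
    }
    fn free_gpr(&mut self, reg: Reg) {
        self.free_gprs.push(reg);
    }
}

fn popcnt_fallback(ctx: &mut CodeGenContext) {
    // Owned for the whole sequence, so the intermediate `and`s cannot
    // clobber it behind our back.
    let tmp = ctx.any_gpr();
    // ... emit the mask/shift/add sequence using `tmp` here ...
    ctx.free_gpr(tmp);
}
```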
* Add filetests for the fallback popcnt impls
* Address PR comments
* Update filetests
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
This commit adds lowering rules for these instructions on the x64 backend
when the `phadd{w,d}` instructions are not available.
Additionally this implements `iadd_pairwise` for i8x16 types, which, while
not used by wasm, enables running the CLIF runtest on x64.
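As I understand `iadd_pairwise`'s semantics, a scalar illustration for the
i8x16 case: adjacent lanes of each operand are summed, with `x`'s sums
filling the low half of the result and `y`'s the high half:

```rust
fn iadd_pairwise_i8x16(x: [i8; 16], y: [i8; 16]) -> [i8; 16] {
    let mut out = [0i8; 16];
    for i in 0..8 {
        out[i] = x[2 * i].wrapping_add(x[2 * i + 1]);
        out[i + 8] = y[2 * i].wrapping_add(y[2 * i + 1]);
    }
    out
}
```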
* Update adapter build
* Rename the binary artifact to `wasi_snapshot_preview1.wasm` and update
build scripts to account for this.
* Update documentation to mention difference between reactor/command
builds.
Closes #6569
* More renaming
* winch(x64) Add support for if/else
This change adds the necessary building blocks to support control flow;
this change also adds support for the `If` / `Else` operators.
This change does not include multi-value support. The idea is to add
support for multi-value across the compiler (functions and blocks) as
a separate future change.
The general gist of the change is to track the presence of control flow
frames as part of the code generation context and emit the corresponding
labels and instructions as control flow blocks are found.
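A minimal sketch of the frame-tracking idea (hypothetical names, not Winch's
exact types): each `if` pushes a frame carrying the labels that the matching
`else` and `end` operators will later bind:

```rust
struct Label(u32);

struct ControlStackFrame {
    else_label: Label,
    exit_label: Label,
}

fn visit_if(frames: &mut Vec<ControlStackFrame>, next_label: &mut u32) {
    let mut fresh = || {
        let label = Label(*next_label);
        *next_label += 1;
        label
    };
    let frame = ControlStackFrame {
        else_label: fresh(),
        exit_label: fresh(),
    };
    // ... emit a conditional branch to `frame.else_label` here; the matching
    // `else`/`end` will pop the frame and bind its labels ...
    frames.push(frame);
}
```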
* PR review
* Allocate 64 slots for `ControlStackFrames`
* Explicitly track else branches through an else entry in
`ControlStackFrame`
I believe that historically it was difficult to write a 128-bit constant
in ISLE, but nowadays ISLE supports `u128` integer literals, so it's now
possible to do that. This commit moves some existing constants in
`x64/lower/isle.rs` into `lower.isle` directly to more easily understand
them when reading over instruction lowerings by avoiding having to
context switch between ISLE and Rust to understand the value of a constant.
* cranelift: Remove the `fcvt_low_from_sint` instruction
This commit removes this instruction since it's a combination of
`swiden_low` plus `fcvt_from_sint`. This was used by the WebAssembly
`f64x2.convert_low_i32x4_s` instruction previously but the corresponding
unsigned variant of the instruction, `f64x2.convert_low_i32x4_u`, used a
`uwiden_low` plus `fcvt_from_uint` combo. To help simplify Cranelift's
instruction set and to make these two instructions mirrors of each other,
the Cranelift instruction is removed.
The s390x and AArch64 backend lowering rules for this instruction could
simply be deleted as the previous combination of the `swiden_low` and
`fcvt_from_sint` lowering rules produces the same code. The x64 backend
moved its lowering to a special case of the `fcvt_from_sint` lowering.
* Fix cranelift-fuzzgen build
This commit adds a targeted optimization aimed at fixing #6562 as a
temporary measure for now. The "real" fix for #6562 is to add a full
lowering of `fcvt_from_uint` to the x64 backend, but for now adding this
rule should fix the specific issue cropping up.
Closes #6562
In #5382 ("egraph support: rewrite to work in terms of CLIF data
structures"), we added the `trace-log` feature to the set of default
features for `cranelift-codegen`. I think this was an accident, probably
added while debugging and overlooked when cleaning up to merge.
So let's undo that change. Fixes #6548.
Adds the widening add instructions from the V spec. These are `vwadd{u,}.{w,v}{v,x}`.
This also adds a bunch of rules to try to match these instructions, and some
of these end up being quite complex.
Rules that match `{u,s}widen_high` are the same as their `{u,s}widen_low` counterparts
but they first do a `vslidedown` of half the vector, to bring the top lanes down.
`uwiden_low` rules are the same as the `swiden_low` rules, but they use
`vwaddu.*`, the unsigned version of the instruction, instead of `vwadd.*`.
Now, in each of these groups of rules we have a few different instructions.
`vwadd.wv` does 2*SEW = 2*SEW + SEW; this just means that the elements in the RHS
vector are first sign-extended before doing the addition. The only trick here is
that since the result is 2*SEW we must use a vstate type that has half the element
size of the type that we want to end up with. So to end up with an i32x4 `iadd` we
need to pass in an i16x4 type as the vstate type.
`vwadd.vv` does 2*SEW = SEW + SEW, so as long as both sides are extended we can
use this instruction. Again we must pass in a type with half the element size.
`vwadd.wx` and `vwadd.vx` do the same thing, but the RHS is expected to be an
extended and splatted X register, so we try to match exactly that. To make these rules
more applicable I've previously added some egraph rules (#6533) that convert
`{u,s}widen_{low,high}` into `splat+{u,s}extend`, this way we only have to try to
match the splat version, which reduces the number of rules.
All of these rules use `vstate_mf2`. This sets LMUL to 1/2, meaning
that at most we will read half of the source vector registers, and the result
is guaranteed to fit in a single destination register. Otherwise the CPU could
have to write the result into multiple registers, which is something that the
ISA supports, but that adds a bunch of constraints that we don't need here.
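For intuition, a scalar illustration of the `vwadd.vv` case (2*SEW = SEW +
SEW): both inputs are sign-extended and the sums come out at twice the
element width, which is why the lowering passes a vstate type with half the
element size of the result (the function name is illustrative):

```rust
fn vwadd_vv_i16_to_i32(a: [i16; 4], b: [i16; 4]) -> [i32; 4] {
    let mut out = [0i32; 4];
    for i in 0..4 {
        // Sign-extend each operand, then add at the doubled width.
        out[i] = i32::from(a[i]) + i32::from(b[i]);
    }
    out
}
```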
* riscv64: Add SIMD Load+Extends
* riscv64: Add SIMD `{u,s}widen_{low,high}`
* riscv64: Add `gen_slidedown_half`
This isn't really necessary yet, but we are going to make a lot of use of it
in the widening arithmetic instructions, so we might as well add it now.
* riscv64: Add multi widen SIMD instructions
* riscv64: Typo Fix
This commit fixes the implementation of `pop_to_reg`. In the previous
implementation, whenever a specific register was requested as the
destination register and a register-to-register move happened, the
source register was never marked as free.
This issue became more evident with more complex programs involving
control flow and division, for example.
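A rough sketch of the fix's shape (illustrative names and register tracking,
not Winch's exact API):

```rust
#[derive(Clone, Copy, PartialEq)]
struct Reg(u8);

struct Masm; // stand-in for the macro-assembler

impl Masm {
    fn mov_rr(&mut self, _src: Reg, _dst: Reg) { /* emit a reg-to-reg move */ }
}

fn pop_to_reg(masm: &mut Masm, free_regs: &mut Vec<Reg>, src: Reg, dst: Reg) -> Reg {
    if src != dst {
        masm.mov_rr(src, dst);
        // The fix: after moving into the requested destination, the source
        // register must be marked free again (previously it leaked).
        free_regs.push(src);
    }
    dst
}
```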
This is the first of a series of patches to support control flow in
Winch.
This change exposes `MachLabel` from Cranelift for it to be consumed by
Winch's `MacroAssembler` and `Assembler`.