This commit removes an assert that checked that the stack pointer
position at the end of a call is greater than or equal to the
position registered at the callsite.
Even though this is true in most cases, there are cases in which this
invariant is not met, as well as cases in which the stack pointer will
inevitably be greater than the position registered at the callsite:
1. When the call setup doesn't spill any values and instead it
only consumes memory values from the value stack, the stack pointer
can end up being less than what it was at the callsite.
2. When the call setup spills values that are not going to be consumed
by the call (not used as params to the function) the stack pointer
position can end up being greater than what it was at the callsite.
The assert was originally introduced to ensure the right deallocation of
stack space consumed by the call, and it could be improved by applying
the heuristics mentioned above, but I prefer to remove it since we
already assert when emitting the epilogue that both the value stack and
machine stack are in the correct state when finishing compilation.
This change includes an extra test in which the original invariant
doesn't hold (case 2 described above occurs).
* Remove Wasmtime ABIs from Cranelift
This commit removes the `Wasmtime*` family of ABIs from Cranelift. These
were originally added to support multi-value in Wasmtime via the
`TypedFunc` API, but they should now no longer be necessary. In general
this is a higher-level Wasmtime concern than something all backends of
Cranelift should have to deal with.
Today with recent refactorings it's possible to remove the reliance on
ABI details for multi-value and instead codify it directly into the
Cranelift IR generated. For example wasm calls are able to have a
"purely internal" ABI which Wasmtime's Rust code doesn't see at all, and
the Rust code only interacts with the native ABI. The native ABI is
redefined to be what the previous Wasmtime ABIs were, which is to return
the first of a 2+ value return through a register (native return value)
and everything else through a return pointer.
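As a rough illustration of the return scheme described above (first value in the native return slot, the rest through a return pointer), here is a minimal sketch. The function names and the three-result example are hypothetical and are not taken from Cranelift or Wasmtime; a `&mut` array stands in for the raw return pointer.

```rust
/// Hypothetical callee with three results. The first result is the ordinary
/// return value; the remaining two are written through `ret_area`, which
/// plays the role of the return pointer.
pub fn sum_diff_prod(a: i64, b: i64, ret_area: &mut [i64; 2]) -> i64 {
    ret_area[0] = a.wrapping_sub(b);
    ret_area[1] = a.wrapping_mul(b);
    a.wrapping_add(b)
}

/// Hypothetical caller: allocates the return area, makes the call, and
/// reassembles all three results.
pub fn call_sum_diff_prod(a: i64, b: i64) -> (i64, i64, i64) {
    let mut ret_area = [0i64; 2];
    let first = sum_diff_prod(a, b, &mut ret_area);
    (first, ret_area[0], ret_area[1])
}
```

The caller owns the storage for the extra results, so the callee's signature stays a plain single-value return plus one extra pointer argument.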
* Remove some wasmtime_system_v usage in tests
* Add back WasmtimeSystemV for s390x
* Fix some docs and references in winch
* Fix another doc link
Pass in the calling convention so we can make an informed decision of which
register to use as a stack-limit temporary register. We need to choose different
registers on x64, for example, when using the `tail` calling convention vs
system v calling convention.
This is a leftover from #3302 where we used to inject a special `vmctx`
struct if the test requested it. We've removed that capability in #5386
but accidentally left this in, which caused some weird handling of these
test invocations.
* poll.wit: we can return a list<bool> now
* adapter: fixes for poll-oneoff returning list<bool>
* wasi preview2: fixes for poll-oneoff returning list<bool>
* adapter: manually import poll-oneoff to avoid pulling in std to allocate return vec
* comment describing the skip functions
* preview2: refactor WasiCtxBuilder impl, and make WasiCtx fields private
The optional-fields WasiCtxBuilder was an intermediate stepping stone
which has outlived its usefulness - there is now only one sensible
default value for each field, so we no longer need to expose the Default
(empty) constructor.
There is no need for the WasiCtx::builder method being an alias for
default, and it creates user confusion.
Finally, the WasiCtx itself is a private implementation detail for use
in this crate only. The WasiCtxBuilder is the only way to customize
its contents. We don't want to let users modify the fields of WasiCtx
after it has been used by a wasm guest, because the guest may have made
assumptions that fields won't change - e.g. the stderr field is fetched
once by the adapter and assumed to always be the same, environment
variables are copied into libc once and assumed to always be the same.
* fix tests
On x86_64, for example, the temporary register was previously hardcoded to `r11`
which was available as a non-argument, caller-saved register on both system v
and fast call. However in the `tail` calling convention it is used as an
argument register.
This commit plumbs through the calling convention to where we are choosing a
temporary register for stack probes so that we can make an informed decision.
Fixes #6640
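The idea of choosing the temporary register from the calling convention can be sketched as below. This is not the Cranelift implementation: the `CallConv` enum, the candidate order, and in particular the `Tail` argument-register list are assumptions made for illustration. The only facts taken from the text are that `r11` is free on System V and fastcall and is an argument register under `tail`.

```rust
/// Calling conventions named in the text; the enum itself is illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CallConv {
    SystemV,
    WindowsFastcall,
    Tail,
}

/// Integer argument registers per convention. The `Tail` list is a
/// placeholder (an assumption): it only encodes that `r11` carries an
/// argument there.
fn argument_registers(call_conv: CallConv) -> &'static [&'static str] {
    match call_conv {
        CallConv::SystemV => &["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
        CallConv::WindowsFastcall => &["rcx", "rdx", "r8", "r9"],
        CallConv::Tail => &["rdi", "rsi", "rdx", "rcx", "r8", "r9", "r10", "r11"],
    }
}

/// Picks the first caller-saved candidate that is not an argument register
/// for the given calling convention.
pub fn stack_limit_temp_register(call_conv: CallConv) -> Option<&'static str> {
    // Hypothetical preference order; all three are caller-saved on x86_64.
    const CANDIDATES: [&str; 3] = ["r11", "r10", "rax"];
    let args = argument_registers(call_conv);
    CANDIDATES.iter().copied().find(|reg| !args.contains(reg))
}
```

With these assumed tables, System V and fastcall keep `r11`, while `tail` falls through to a register that does not carry an argument.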
When the size of the tail callee's and tail caller's stack arguments is the
same, we don't need to copy the return address and then write it to the stack
again, since its original location will also be its final location. That is, we
were previously reading a value and then writing it back to the exact same
location.
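A minimal sketch of the check described above, assuming the return address sits directly below the stack arguments so that its displacement is the difference between the two argument-area sizes. The function name and the sign convention are hypothetical.

```rust
/// Returns how far (in bytes) the return address would have to move to make
/// room for the callee's stack arguments, or `None` when its original
/// location is also its final location and no copy is needed.
pub fn return_address_shift(
    caller_stack_arg_bytes: u32,
    callee_stack_arg_bytes: u32,
) -> Option<i64> {
    if caller_stack_arg_bytes == callee_stack_arg_bytes {
        // Same size: reading and rewriting the value would be a no-op.
        return None;
    }
    // Assumed convention: a negative value means "toward lower addresses".
    Some(i64::from(caller_stack_arg_bytes) - i64::from(callee_stack_arg_bytes))
}
```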
* [wit-bindgen] provide more control over type ownership
This replaces the `duplicate_if_necessary` parameter to
`wasmtime::component::bindgen` with a new `ownership` parameter which provides
finer-grained control over whether and how generated types own their fields.
The default is `Ownership::Owning`, which means types own their fields
regardless of how they are used in functions. These types are passed by reference
when used as parameters to guest-exported functions. Note that this also
affects how unnamed types (e.g. `list<list<string>>`) are passed: using a
reference only at the top level (e.g. `&[Vec<String>]` instead of `&[&[&str]]`,
which is more difficult to construct when using non-`'static` data).
The other option is `Ownership::Borrowing`, which includes a
`duplicate_if_necessary` field, providing the same code generation strategy as
was used prior to this change.
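To show why `&[Vec<String>]` is easier to build from non-`'static` data than `&[&[&str]]`, here is a small standalone sketch. The functions below are hypothetical stand-ins for generated bindings and do not use `wasmtime::component::bindgen`; they only mirror the two parameter shapes mentioned above.

```rust
/// Parameter shape in the owning style: a reference at the top level only.
pub fn total_len_owning(rows: &[Vec<String>]) -> usize {
    rows.iter().flatten().map(|s| s.len()).sum()
}

/// Parameter shape in the fully borrowed style.
pub fn total_len_borrowing(rows: &[&[&str]]) -> usize {
    rows.iter().flat_map(|r| r.iter()).map(|s| s.len()).sum()
}

/// Builds data at runtime and passes it to both shapes.
pub fn demo(input: &str) -> (usize, usize) {
    // Non-'static data: one row of words per input line.
    let owned: Vec<Vec<String>> = input
        .lines()
        .map(|line| line.split_whitespace().map(String::from).collect())
        .collect();

    // Owning shape: a single borrow of the existing vector.
    let a = total_len_owning(&owned);

    // Borrowed shape: two extra layers of temporary vectors are needed
    // just to hold the intermediate references.
    let inner: Vec<Vec<&str>> = owned
        .iter()
        .map(|row| row.iter().map(String::as_str).collect())
        .collect();
    let outer: Vec<&[&str]> = inner.iter().map(Vec::as_slice).collect();
    let b = total_len_borrowing(&outer);

    (a, b)
}
```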
If we're happy with this approach, I'll open another PR in the `wit-bindgen`
repo to match.
This also fixes a bug that caused named types to be considered owned and/or
borrowed when they shouldn't have been due to having fields with unnamed types
which were owned and/or borrowed in unrelated interfaces.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* fix test breakage
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
---------
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Lean on wasmparser's type information when translating components
Before this commit Wasmtime would build up its own representation of
type information independently of wasmparser, effectively duplicating
things such as type scopes and managing indices. This to some degree is
required because Wasmtime's type information is serialized into compiled
images and wasmparser's information isn't easily serialized. With the
advent of resources in components, however, the task of doing this
correctly has become much more difficult.
When component translation was first written it was more difficult to
acquire type information from `wasmparser::Validator`. Nowadays there's
helpful type information exposed every step of the way which makes it
much easier to get at the types of items while we're translating rather
than only at the very end (or not at all). This is one of the
motivations for this commit where now it's possible to avoid duplicating
the work `wasmparser` is doing whereas before it wasn't as easy.
Additionally, resources in the component model perform a
number of involved type substitutions during instantiation of arbitrary
components, which isn't implemented in Wasmtime today. Some of this will
be required for Wasmtime to correctly implement resources, so instead of
doing all that again I've decided to replace Wasmtime's management of
types with wasmparser's management of types.
The main difference in this commit is that Wasmtime no longer tracks
type information during translation and that conversion into Wasmtime's
type hierarchy now has a different entry point. Previously conversion
would happen based on raw types read from the wasm file (think
index-based things) whereas now translation happens on `wasmparser`'s
parsed and validated hierarchy of types (think ID-based things rather
than index-based things). This makes translation slightly more involved
but overall it's largely performing the same work.
One gotcha with this PR is that core wasm modules using GC types and
typed function references could theoretically have worked previously but
they no longer work. It turns out that `wasmparser` is not correctly
surfacing type information in components for core modules that use GC
types, namely because `wasmparser`'s validated type hierarchy uses the
same core wasm types as what's read raw from the type section. This in
turn means that the index-based format can't be resolved. This is a bug
in `wasmparser` which will need resolving but is a big chunk of work to
take on, so for now the component model will panic on these sorts of
modules (which are disabled by default anyway).
Overall the end-goal of this commit is to ease the implementation of
resources a bit by heavily relying on `wasmparser`'s understanding of
resources, chiefly the functionality of performing type substitutions on
subcomponent instantiations.
* Fix a warning
* Fix factc build
* Refactor compilation of component-related functions
Extract some common functionality out to helper functions to reduce the
boilerplate involved in compiling component-related functions. This is
in preparation for upcoming resource-related work which adds more
functions to compile.
* Fix a typo
* aarch64: Fix `AuthenticatedRet` when stack bytes are popped
This commit fixes an accidental issue with #6478 where returns did not
work when pointer authentication was enabled and stack bytes needed to be
popped during the return. In this situation an authenticated return
instruction was used, such as `retab`, and no extra stack bytes were
popped. The fix here is to use the non-`retab` path, which handles stack
bytes being popped, if there are stack bytes to pop.
Closes #6567
* Still use `retab` for `is_hint: false`
The memory-store format of `pextrw` requires SSE4.1 despite `pextrw`
itself only requiring SSE2. This commit updates this lowering to require
an extra feature.
* x64: Fix ISA requirement for `pmaddubsw`
This was erroneously listed as SSE2 but this is actually an SSSE3
instruction. This commit updates the ISA requirement as well as related
lowerings to have everything work with an SSE2 baseline as well which is
mostly plumbing to make sure the relaxed-simd dot product instructions
use it when it's available but otherwise fall back to the deterministic
lowering.
* Add require ssse3 feature
* Fix preexisting wat test
This commit removes a setting for Cranelift which I've found a bit
confusing historically and I think is no longer necessary. The setting
is currently documented as enabling SIMD instructions, but that only
sort of works for the x64 backend and none of the other backends look at
it. Historically this was used to flag to Cranelift that a higher x64
baseline feature set is required for codegen but as of #6625 that's no
longer necessary.
Otherwise it seems more Cranelift-like nowadays to say that vector
instructions generate SIMD instructions where non-vector instructions
probably don't, but may still depending on activated CPU features. In
that sense I'm not sure if a dedicated `enable_simd` setting is still
motivated, so this PR removes it.
This renames some features in the x86 backend such as `use_avx_simd` to
`use_avx` since the `_simd` part is no longer part of the computation
now that `enable_simd` is gone.
* issue-6592: args accumulator match entry block params
* Formatting
* Formatting
* added push_non_formal to aarch64 and x86 implementations of add_ret_area_ptr case in compute_arg_locs
---------
Co-authored-by: Cameron <cameron@Camerons-MacBook-Pro.local>
All instructions in Cranelift now have lowerings for SSE2 as a baseline,
even if they're not exactly the speediest things in the world. This
enables lowering the baseline required for the SIMD proposal for
WebAssembly to SSE2, the base features set of x86_64. Lots of tests were
updated here to remove explicit `has_foo=false` annotations as they no
longer have any effect.
Additionally fuzzing has been updated to enable disabling `sse3` and
`ssse3` which will help stress-test all previously-added lowerings.
* Add a `set_random` function to `WasiCtxBuilder`.
Add a `set_random` function to `WasiCtxBuilder`, allowing users that
construct a default interface via `WasiCtxBuilder::default()` or
`WasiCtx::builder()` to initialize the `random` state.
Fixes #6576.
* Rename `set_random_for_testing` to `set_secure_random_to_custom_generator`
Remove the `cfg(test)`, and add comments advising users of the potential
hazards.
* rustfmt
* Fix documentation links.
I originally thought this was the cause of PRs bouncing last night but
now that they're landing again I'm less certain that this is the cause.
Nevertheless this seems good to do regardless.
It is needed because cargo features are additive. Given a situation where you have two dependencies:
* One depends on cranelift aarch64 codegen, so it enables the `aarch64` feature.
* The other depends on cranelift native codegen, so it does not enable any arch feature.
Given the additive property of features, cranelift-codegen will be built with the `aarch64` feature for both dependencies (assuming they use the same cranelift version), so the native ISA will not be included (unless it is aarch64).
With the `host-arch` feature added here, the native host ISA can now be explicitly requested without risk of another crate in the dependency tree disabling it.
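The reasoning above can be modelled in a few lines. This is a simplified model, not the actual cranelift-codegen build logic: the list of architecture feature names and the rule "no arch feature means build the host ISA" are assumptions based on the description in this message.

```rust
use std::collections::BTreeSet;

/// Cargo unifies features: the crate is built once with the union of the
/// features requested by every dependent.
pub fn unify_features(requests: &[&[&'static str]]) -> BTreeSet<&'static str> {
    requests.iter().flat_map(|r| r.iter().copied()).collect()
}

/// Simplified model of whether the host ISA ends up in the build.
pub fn builds_host_isa(features: &BTreeSet<&str>, host: &str) -> bool {
    // Assumed set of architecture feature names, for illustration only.
    const ARCH_FEATURES: [&str; 4] = ["x86", "aarch64", "s390x", "riscv64"];
    let any_arch_requested = ARCH_FEATURES.iter().any(|a| features.contains(*a));

    features.contains("host-arch") // explicitly requested
        || features.contains(host) // host happens to be a requested arch
        || !any_arch_requested // nothing requested: default to the host
}
```

In this model, one dependent asking for `aarch64` and another asking for nothing yields `{aarch64}`, which drops an x86 host; adding `host-arch` to the second request keeps it.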
This commit fixes a test that has failed on CI and seems flaky. This
test asserts that stderr/stdout are writable and that a 200ms timeout
does not elapse when calculating this. The implementation, on Windows at
least, of `poll_oneoff` always reports stdout/stderr as "ready for
write" meaning that this test should trivially pass. The flakiness comes
from the timeout parameter where apparently sometimes on CI the
determination of this takes more than 200ms. This means that the timer
subscription may or may not fire depending on the timing of the test,
and currently the test always fails if the timeout subscription fires.
This commit updates the test to store whether a timeout has passed and
only fail if `poll_oneoff` is attempted when the timeout has already
elapsed. This will allow the timeout to fire so long as the two streams
are considered writable at the same time, achieving the desired result
of the test to assert that, without timing out, both stdout and stderr
are considered writable.
* Fix inconsistent deadlines for monotonic clock timeouts
Rename `MonotonicClockSubscription`'s `deadline` field to
`absolute_deadline` and change the code that computes the value to
compute an absolute time rather than a relative time, so that it's
interpreted consistently everywhere.
Fixes #6588.
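A minimal sketch of the "always store an absolute time" approach described above. Only the struct name and the `absolute_deadline` field come from this message; the constructor, its parameters, and the `remaining` helper are hypothetical.

```rust
/// Subscription that always stores its deadline as an absolute time.
#[derive(Debug, PartialEq, Eq)]
pub struct MonotonicClockSubscription {
    pub absolute_deadline: u64,
}

impl MonotonicClockSubscription {
    /// `timeout` is an absolute timestamp when `is_absolute` is set,
    /// otherwise a duration relative to `now`. Either way the stored value
    /// is absolute, so every consumer interprets it the same way.
    pub fn new(now: u64, timeout: u64, is_absolute: bool) -> Self {
        let absolute_deadline = if is_absolute {
            timeout
        } else {
            now.saturating_add(timeout)
        };
        Self { absolute_deadline }
    }

    /// Time left until the deadline, clamped at zero once it has passed.
    pub fn remaining(&self, now: u64) -> u64 {
        self.absolute_deadline.saturating_sub(now)
    }
}
```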
* Polling on stdio is not yet implemented on Windows.
These are implemented as a combination of two steps, mask generation and
mask expansion. Our comparison rules only return their results as a mask
register, so we need to expand the mask into lane sized elements.
We have 20 (!) comparison instructions, nearly the full table of all IntCC codes
in VV, VX and VI formats. However there are some holes in this table.
They are:
* `vmsltu.vi`
* `vmslt.vi`
* `vmsgtu.vv`
* `vmsgt.vv`
* `vmsgeu.*`
* `vmsge.*`
Most of these can be replaced with the inverted IntCC instruction, however
this commit only implements the existing instructions without any inversion
and the inverted VV versions of `sgtu`/`sgt`/`sgeu`/`sge` since we need them
to get the full icmp functionality.
I've split the actual mask expansion into its own separate rule since we are
going to need it for the `fcmp` rules as well.
The instruction selection for `icmp` is on a separate rule simply because the
rules end up less verbose than if they were inlined directly into the `icmp` rule.
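The two steps can be sketched outside of ISLE as follows. This is an illustrative model rather than the actual lowering rules: the `IntCC` enum here is a local copy, `select_vv` and `expand_mask` are hypothetical names, and the `bool` in the result means "swap the two operands", which is how the missing greater-than forms are expressed with the existing less-than instructions.

```rust
/// Local stand-in for Cranelift's integer condition codes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum IntCC {
    Equal,
    NotEqual,
    SignedLessThan,
    UnsignedLessThan,
    SignedLessThanOrEqual,
    UnsignedLessThanOrEqual,
    SignedGreaterThan,
    UnsignedGreaterThan,
    SignedGreaterThanOrEqual,
    UnsignedGreaterThanOrEqual,
}

/// Mask generation for the VV format: returns the mnemonic and whether the
/// operands must be swapped (`a > b` is computed as `b < a`).
pub fn select_vv(cc: IntCC) -> (&'static str, bool) {
    match cc {
        IntCC::Equal => ("vmseq.vv", false),
        IntCC::NotEqual => ("vmsne.vv", false),
        IntCC::SignedLessThan => ("vmslt.vv", false),
        IntCC::UnsignedLessThan => ("vmsltu.vv", false),
        IntCC::SignedLessThanOrEqual => ("vmsle.vv", false),
        IntCC::UnsignedLessThanOrEqual => ("vmsleu.vv", false),
        IntCC::SignedGreaterThan => ("vmslt.vv", true),
        IntCC::UnsignedGreaterThan => ("vmsltu.vv", true),
        IntCC::SignedGreaterThanOrEqual => ("vmsle.vv", true),
        IntCC::UnsignedGreaterThanOrEqual => ("vmsleu.vv", true),
    }
}

/// Mask expansion: bit `i` of the mask becomes lane `i`, all ones when the
/// comparison held and zero otherwise.
pub fn expand_mask(mask: u64, lanes: usize) -> Vec<i32> {
    assert!(lanes <= 64, "this sketch keeps the mask in a single u64");
    (0..lanes)
        .map(|i| if (mask >> i) & 1 == 1 { -1 } else { 0 })
        .collect()
}
```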
This change adds support for the `loop`, `br` and `br_if` instructions
as well as unreachable code handling. Whenever an instruction that
affects reachability is emitted (`br` in the case of this PR), the
compiler will enter into an unreachable code state, essentially ignoring
most of the subsequent instructions. When handling the unreachable code
state some instructions are still observed, in order to determine if
reachability should be restored.
This change, particularly the handling of unreachable code, adds all the
necessary building blocks to the compiler to emit other instructions
that affect reachability (e.g. `unreachable`, `return`).
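The unreachable-state idea can be sketched with a toy instruction set. This is a simplified model and not the compiler's code: the `Op` enum, the `Frame` struct, and `emitted_ops` are hypothetical, and loops are left out because a branch to a loop targets its head rather than its end. The `is_branch_target` name follows the review feedback listed in this message.

```rust
/// Toy instruction set for the sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Op {
    Block,
    Br(u32),
    Const(i32),
    End,
}

struct Frame {
    /// Set when a reachable `br` targets the end of this block.
    is_branch_target: bool,
    /// Whether the block was opened while code was reachable.
    entered_reachable: bool,
}

/// Returns the instructions a compiler following this scheme would emit.
pub fn emitted_ops(ops: &[Op]) -> Vec<Op> {
    let mut frames: Vec<Frame> = Vec::new();
    let mut reachable = true;
    let mut out = Vec::new();

    for &op in ops {
        match op {
            Op::Block => {
                // Still tracked when unreachable so that `End` stays balanced.
                frames.push(Frame { is_branch_target: false, entered_reachable: reachable });
                if reachable {
                    out.push(op);
                }
            }
            Op::Br(depth) => {
                if reachable {
                    // A depth past the outermost block targets the function
                    // itself, which has no frame in this sketch.
                    if let Some(idx) = frames.len().checked_sub(1 + depth as usize) {
                        frames[idx].is_branch_target = true;
                    }
                    out.push(op);
                    reachable = false;
                }
            }
            Op::Const(_) => {
                if reachable {
                    out.push(op);
                }
            }
            Op::End => {
                let frame = frames.pop().expect("unbalanced `End`");
                if frame.entered_reachable {
                    out.push(op);
                }
                // Reachability is restored only if something branched here.
                if frame.is_branch_target {
                    reachable = true;
                }
            }
        }
    }
    out
}
```

`Block` and `End` are the instructions that are "still observed" while unreachable: they are needed to keep the control stack balanced and to decide when reachability comes back.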
Address review feedback
* Rename `branch_target` to `is_branch_target`
* Use the visitor pattern to handle unreachable code
Avoid string comparison and split unreachable handling functions
* x64: Add non-SSSE3 lowerings of `pshufb`
Or, more accurately, add lowerings which don't use `pshufb`'s
functionality at all where possible or otherwise fall back to a new
libcall. This particular instruction seemed uniquely difficult to
implement in the backend so I decided to "cop out" and use a libcall
instead. The libcall will be used for `popcnt`, `shuffle`, and
`swizzle` instructions when SSSE3 isn't available.
* Implement SSE2 popcnt with Hacker's Delight
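For reference, the classic Hacker's Delight bit-counting steps look like this when applied to eight byte lanes packed in a `u64`. This is a scalar sketch of the technique only; it is not the SSE2 instruction sequence emitted by the backend, and the function name is hypothetical.

```rust
/// Per-byte population count: each byte of the result holds the number of
/// set bits in the corresponding byte of `x`.
pub fn popcnt_bytes(x: u64) -> u64 {
    const M1: u64 = 0x5555_5555_5555_5555; // 01 repeated
    const M2: u64 = 0x3333_3333_3333_3333; // 0011 repeated
    const M4: u64 = 0x0f0f_0f0f_0f0f_0f0f; // 00001111 repeated

    // Counts per 2-bit field.
    let x = x - ((x >> 1) & M1);
    // Counts per 4-bit field.
    let x = (x & M2) + ((x >> 2) & M2);
    // Counts per byte; the mask drops bits that leaked in from the neighbour.
    (x + (x >> 4)) & M4
}
```

The same shift, mask, and add steps map onto vector shifts, ANDs, and adds, which is what makes the approach usable without SSSE3.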
* x64: Implement passing vector arguments in the fastcall convention
Windows says that vector arguments are passed indirectly so handle that
here through the `ABIArg::ImplicitPtrArg` variant. Some additional
handling is added to the general machinst backend.
* Update `gen_load_base_offset` for x64
* Fill out remaining bits of fastcall and vector parameters
* Remove now-unnecessary `Clone` bound
A `shuffle` specialization can fall back to the default implementation
and, otherwise, two other rules already gated on SSE4.1 for other
instructions need a second clause for SSSE3 as well.
Note that the `shuffle` variant will get tested in a subsequent commit
that adds a `pshufb` fallback.
The Bytecode Alliance didn't actually audit these crates but rather
simply trusts them, per the notes. Previously we didn't have a way
to express this distinction, but now we do.