* cranelift: Only enable winch calling convention for x64
This commit is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/8198; it ensures that the Winch calling convention is only allowed when the architecture is x64.
* Update comment
There were two remaining uses, both in cranelift-tools, in the `wasm`
and `souper-harvest` subcommands.
These days it's better to use `wasmtime compile --emit-clif` rather than
`clif-util wasm`, in order to see the CLIF that we actually generate
rather than a rough approximation of it.
And `clif-util souper-harvest` will give more relevant results if you
feed it CLIF from `wasmtime compile --emit-clif` as well. To make that
easier to do, I've made souper-harvest accept multiple input files, as
well as accepting directories which recursively contain input files.
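
As a rough illustration of the input handling described above, here is a minimal sketch of expanding a mixed list of files and directories into a flat list of files. The names `collect_inputs` and `visit` are hypothetical and are not taken from the clif-util sources.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: expand each input into the files it denotes.
/// Plain files are kept as-is; directories are walked recursively.
fn collect_inputs(inputs: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for input in inputs {
        visit(input, &mut files)?;
    }
    // Sort so the result does not depend on directory iteration order.
    files.sort();
    Ok(files)
}

/// Push `path` if it is a file, otherwise recurse into its entries.
fn visit(path: &Path, files: &mut Vec<PathBuf>) -> io::Result<()> {
    if path.is_dir() {
        for entry in fs::read_dir(path)? {
            visit(&entry?.path(), files)?;
        }
    } else {
        files.push(path.to_path_buf());
    }
    Ok(())
}

fn main() {
    let inputs: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    let files = collect_inputs(&inputs).expect("failed to read inputs");
    for file in files {
        println!("{}", file.display());
    }
}
```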
There have been more fuzzbugs than expected and the onslaught of issues
is something I definitely don't have time to deal with right now; let's
try again later in the year (unless someone else wants to drive this!).
This puts the fuzzing logic under an off-by-default feature so it can
still be tested and developed in-tree as desired.
* Exit through Cranelift-generated trampolines for builtins
This commit changes how builtin functions in Wasmtime (think
`memory.grow`) are implemented. These functions are required to exit
through some manner of trampoline to handle runtime requirements for
backtracing right now. Currently this is done via inline assembly for
each architecture (or external assembly for s390x). This is a bit
unfortunate as it's a lot of hand-coding and making sure everything is
right, and it's not easy to update as it's multiple platforms to update.
The change in this commit is to use Cranelift-generated trampolines for
this purpose instead. The path for invoking a builtin function now looks
like:
* Wasm code calls a statically known symbol for each builtin.
* The statically known symbol will perform exit trampoline duties (e.g.
pc/fp/etc) and then load a function pointer to the host
implementation.
* The host implementation is invoked and then proceeds as usual.
The main new piece in this PR is that, while all wasm modules and
functions are still compiled in parallel, one output of this compilation
phase is now the set of builtin functions that each function requires.
These sets are then unioned together and a trampoline for each required
builtin is generated just afterwards. That means that at most one
trampoline is generated per builtin per module.
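
The union step can be pictured with the sketch below. It is only a model: `Builtin`, `CompiledFunction` and `builtins_needing_trampolines` are hypothetical names, and the real compilation outputs carry far more than a list of builtins.

```rust
use std::collections::BTreeSet;

/// A few builtins, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Builtin {
    MemoryGrow,
    TableGrow,
    TableFill,
}

/// Hypothetical per-function compilation output: which builtins the
/// function's code calls.
struct CompiledFunction {
    required_builtins: Vec<Builtin>,
}

/// Union the per-function requirements so that each builtin gets at
/// most one trampoline per module.
fn builtins_needing_trampolines(funcs: &[CompiledFunction]) -> BTreeSet<Builtin> {
    funcs
        .iter()
        .flat_map(|f| f.required_builtins.iter().copied())
        .collect()
}

fn main() {
    let funcs = vec![
        CompiledFunction { required_builtins: vec![Builtin::MemoryGrow, Builtin::TableGrow] },
        CompiledFunction { required_builtins: vec![Builtin::MemoryGrow] },
        CompiledFunction { required_builtins: vec![Builtin::TableFill, Builtin::TableGrow] },
    ];
    for builtin in builtins_needing_trampolines(&funcs) {
        println!("generate trampoline for {builtin:?}");
    }
}
```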
This work is inspired by #8135 and my own personal desire to have as
much about our ABI details flowing through Cranelift as we can. This in
theory makes it more flexible to deal with future improvements to our
ABI.
prtest:full
* Fix some build issues
* Update winch test expectations
* Update Winch to use new builtin shims.
This commit refactors the Winch compiler to use the new trampolines for
all Wasmtime builtins created in the previous commits. This required a
fair bit of refactoring to handle plumbing through a new kind of
relocation and function call.
Winch's `FuncEnv` now contains a `PrimaryMap` from `UserExternalNameRef`
to `UserExternalName`. This is because there's now more than one kind of
name than just wasm function relocations, so the raw index space of
`UserExternalNameRef` is no longer applicable. This required threading
`FuncEnv` to more locations along with some refactorings to ensure that
lifetimes work out ok.
The `CompiledFunction` no longer stores a trait object of how to map
name refs to names and now directly has a `PrimaryMap`. This also means
that Winch's return value from its `TargetIsa` is a `CompiledFunction`
as opposed to the previous just-a-`MachBuffer` so it can also package up
all the relocation information. This ends up having `winch-codegen`
depend on `wasmtime-cranelift-shared` as a new dependency.
* Review feedback
The wasmtime-cranelift-shared crate is not as useful as it once was, as
it's no longer possible to build wasmtime with only winch; winch uses
the trampolines generated by cranelift now.
* Disable argument packing with the winch cc if extension is present
* Don't generate functions that use SIMD and the Winch calling convention
* Handle i128 values for the winch calling convention
* Use some `byte_add` methods from Rust 1.75
Now that we're able to, use some convenience methods from the 1.75
release of Rust.
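
For readers unfamiliar with these methods, a small standalone example of `byte_add` on a raw pointer follows. The function `read_u32_at_byte_offset` is made up for this illustration and does not appear in Wasmtime.

```rust
/// Read the `u32` located `byte_offset` bytes past the start of `words`.
/// Illustrative helper only.
fn read_u32_at_byte_offset(words: &[u32], byte_offset: usize) -> u32 {
    // Keep the access aligned and inside the slice.
    assert!(byte_offset % 4 == 0 && byte_offset + 4 <= words.len() * 4);
    let base: *const u32 = words.as_ptr();
    // Before Rust 1.75 this needed a round trip through `*const u8`:
    //     base.cast::<u8>().add(byte_offset).cast::<u32>()
    // SAFETY: the assertion above guarantees an aligned, in-bounds read.
    unsafe { base.byte_add(byte_offset).read() }
}

fn main() {
    let words = [10u32, 20, 30, 40];
    println!("{}", read_u32_at_byte_offset(&words, 8));
}
```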
* Use `Atomic*::from_ptr` from Rust 1.75
Helps clean up some casts and clarify local intent.
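
A minimal standalone example of `AtomicU32::from_ptr`, for context. The wrapper `atomic_increment` is hypothetical and is not a function from this change.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Atomically increment the `u32` behind `ptr`, returning the old value.
///
/// # Safety
/// `ptr` must be valid, aligned for `AtomicU32`, and must not be accessed
/// non-atomically by another thread while the returned reference is live.
unsafe fn atomic_increment(ptr: *mut u32) -> u32 {
    // `from_ptr` replaces hand-written casts such as
    // `&*(ptr as *const AtomicU32)`.
    let atomic: &AtomicU32 = unsafe { AtomicU32::from_ptr(ptr) };
    atomic.fetch_add(1, Ordering::SeqCst)
}

fn main() {
    let mut counter: u32 = 41;
    // SAFETY: `counter` is a live, aligned local used only by this thread.
    let previous = unsafe { atomic_increment(&mut counter) };
    println!("previous = {previous}, now = {counter}");
}
```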
* Remove callee saves from Winch's MacroAssembler trait
prtest:mingw-x64
* Remove the unused callee_saved_regs function
* Removed the unused callee_saved function from Winch's aarch64 backend
* Remove additional unused functions from the Winch ABI trait
* PCC: x64: insertlane instructions read only scalar-sized values.
Also fix `clamp_range` on greater-than-64-bit values: no range fact is
possible in this case (propagate `Option` a bit deeper to represent
this).
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67538.
* Rebase to latest main with leaf-function changes and update test expectations.
Currently even when the `wmemcheck` Cargo feature is disabled the
various related libcalls are still compiled into `wasmtime-runtime`.
Additionally their signatures are translated when lowering functions,
although the signatures are never used. This commit adds `#[cfg]`
annotations to compile these all out when they're not enabled.
Applying this change, however, uncovered a subtle bug in our libcalls.
Libcalls are numbered in-order as-listed in the macro ignoring `#[cfg]`,
but they're assigned a runtime slot in a `VMBuiltinFunctionsArray`
structure which does respect `#[cfg]`. This meant, for example, that if
`gc` was enabled and `wmemcheck` was disabled, as is the default for our
tests, then there was a hole in the numbering where libcall numbers were
mismatched at runtime and compile time.
To fix this I've first added a const assertion that the runtime-number
of libcalls equals the build-time number of libcalls. I then updated the
macro a bit to plumb the `#[cfg]` differently and now libcalls are
unconditionally defined regardless of cfgs but the implementation is
`std::process::abort()` if it's compiled out.
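
The shape of the fix can be sketched as follows. This is a simplified model rather than the real macro: `LibcallArray`, `COMPILE_TIME_LIBCALLS`, the function signatures, and the use of a `wmemcheck` cfg on a single function are all assumptions made for illustration.

```rust
use std::mem::size_of;

type Libcall = fn(u32) -> u32;

/// Number of libcalls as the compiler numbers them: every entry is
/// counted, regardless of which features are enabled.
const COMPILE_TIME_LIBCALLS: usize = 3;

/// Runtime table of libcalls. Every slot always exists so that the
/// indices agree with the compiler's numbering.
#[repr(C)]
struct LibcallArray {
    memory_grow: Libcall,
    check_malloc: Libcall, // stands in for a feature-gated libcall
    table_grow: Libcall,
}

// Const assertion: the build fails if the two counts ever diverge.
const _: () = assert!(size_of::<LibcallArray>() == COMPILE_TIME_LIBCALLS * size_of::<Libcall>());

fn memory_grow(pages: u32) -> u32 {
    pages
}

fn table_grow(elems: u32) -> u32 {
    elems
}

// With the feature enabled, the real implementation is used.
#[cfg(feature = "wmemcheck")]
fn check_malloc(addr: u32) -> u32 {
    addr
}

// With the feature disabled the slot is still defined, but aborts if
// it is ever reached.
#[cfg(not(feature = "wmemcheck"))]
fn check_malloc(_addr: u32) -> u32 {
    std::process::abort()
}

static LIBCALLS: LibcallArray = LibcallArray { memory_grow, check_malloc, table_grow };

fn main() {
    // Read (but do not call) the compiled-out slot.
    let _stub: Libcall = LIBCALLS.check_malloc;
    println!("{}", (LIBCALLS.table_grow)(7));
}
```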
This ended up having a large-ish impact on the `disas` test suite. Lots
of functions have fewer signature translations because wmemcheck, even
when disabled, was translating a few signatures. This also had some
assembly changes, too, because I believe functions are considered leaves
based on whether they declare a signature or not, so declaring an unused
signature was preventing all wasm functions from being considered leaves.
* Bump MSRV to 1.75.0
Coupled with today's release of 1.77.0. Today's release actually has
some nice functions and such I think we'll want to use in Wasmtime but
we'll need to wait 3 months to be able to use them.
* Fix dead code warning in onnx
* Cranelift: make `bitcast`s between integer and reference types do a copy
This avoids putting multiple conflicting regalloc constraints on values that
would otherwise be aliases of each other (one constraint that the value must be
in a register as a function argument and another that it must be in a stack slot
for a safepoint) by splitting the value in two and each split value getting its
own constraint.
Fixes #8180
* Fix test expectations
* Return the last result through registers in the winch calling convention
* Add a run test for winch calling convention functions
* Disable the Winch calling convention in cranelift's aarch64 backend
* Remove the aarch64 winc.clif test
* Skip realignment for winch results on the stack
* cranelift: Optimize select_spectre_guard, carefully
This commit makes two changes to our treatment of
`select_spectre_guard`.
First, stop annotating this instruction as having any side effects. We
only care that if its value result is used, then it's computed without
branching on the condition input. We don't otherwise care when the value
is computed, or if it's computed at all.
Second, introduce some carefully selected ISLE egraph rewrites for this
instruction. These particular rewrites are those where we can statically
determine which SSA value will be the result of the instruction. Since
there is no actual choice involved, there's no way to accidentally
introduce speculation on the condition input.
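
To make "statically determine which SSA value will be the result" concrete, here is a small Rust model of that kind of rewrite. The two cases shown (a constant condition, and identical arms) are my own illustration of the idea; the actual rules are written in ISLE and may differ. `Value`, `Cond` and `simplify_guard` are hypothetical names.

```rust
/// Stand-in for an SSA value number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Value(u32);

/// The condition input: either a known constant or an arbitrary value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Cond {
    Const(u64),
    Dynamic(Value),
}

/// Return the value the guard must produce when that can be decided
/// statically, or `None` when the instruction has to stay.
fn simplify_guard(cond: Cond, if_true: Value, if_false: Value) -> Option<Value> {
    match cond {
        // Constant condition: the result is known at compile time.
        Cond::Const(0) => Some(if_false),
        Cond::Const(_) => Some(if_true),
        // Both arms are the same value: the condition is irrelevant.
        Cond::Dynamic(_) if if_true == if_false => Some(if_true),
        // Otherwise there is a real choice, so nothing is rewritten.
        Cond::Dynamic(_) => None,
    }
}

fn main() {
    let (a, b) = (Value(1), Value(2));
    println!("{:?}", simplify_guard(Cond::Const(1), a, b));
    println!("{:?}", simplify_guard(Cond::Dynamic(Value(9)), a, b));
}
```

In every case that returns `Some`, no choice depends on the runtime condition, so there is nothing for the processor to speculate on.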
* Add filetests
* Add a `ModuleBuilder` type to the `wasmtime` crate
This commit is extracted from #8055 and adds a new `ModuleBuilder` type
which is intended to be able to further configure compilation beyond
what the constructors of `Module` already provide. For example in #8055
knobs will be added to process `*.dwp` files for more debuginfo
processing.
Co-authored-by: yowl00 <scott.waye@hubse.com>
* Fix build
* Review feedback
* Rename ModuleBuilder to CodeBuilder
* Fix doc errors
---------
Co-authored-by: yowl00 <scott.waye@hubse.com>
* Implement opt-in for enabling WASI to block the current thread
Currently all of Wasmtime's implementation of WASI is built on Tokio,
but some operations are currently not asynchronous such as opening a
file or reading a directory. Internally these use `spawn_blocking` to
satisfy the requirements of async users of WASI to avoid blocking the
current thread. This use of `spawn_blocking`, however, ends up causing
mostly just performance overhead in the use case of the CLI, for
example, where async wasm is not used. That then leads to this commit,
implementing an opt-in mechanism to be able to block the current thread.
A `WasiCtx` now has a flag indicating whether it's ok to block the
current thread and that's carried to various filesystem operations that
use `spawn_blocking`. The call to `spawn_blocking` is now conditional
and skipped if this flag is set.
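
The conditional dispatch can be sketched like this. To keep the example dependency-free it uses `std::thread::spawn` as a stand-in for Tokio's `spawn_blocking`; `Ctx` and `run_blocking` are hypothetical names, not the `WasiCtx` API.

```rust
use std::thread;

/// Hypothetical stand-in for the context that carries the flag.
struct Ctx {
    allow_blocking_current_thread: bool,
}

/// Run a blocking operation either inline or on another thread,
/// depending on the flag.
fn run_blocking<T, F>(ctx: &Ctx, f: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    if ctx.allow_blocking_current_thread {
        // Opted in: no hand-off, just do the work here.
        f()
    } else {
        // Default: move the work off the current thread.
        thread::spawn(f).join().expect("blocking task panicked")
    }
}

fn main() {
    let ctx = Ctx { allow_blocking_current_thread: true };
    let len = run_blocking(&ctx, || "pretend this read a file".len());
    println!("{len}");
}
```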
Additionally the invocation point in the CLI for wasm modules is wrapped
in a Tokio runtime to avoid entering/exiting Tokio in the "leaves" when
wasm calls the host, as happens today. This hits a better fast path in
Tokio that appears to be more efficient.
Semantically this should not result in any change for CLI programs
except in one case: file writes. By default writes on `output-stream` in
WASI are asynchronous meaning that only one write can be in flight at a
time. That being said all current users are immediately blocking waiting
for this write to finish anyway, so this commit won't end up changing
much. It's already the case that file reads are always blocking, for
example. If necessary in the future though this can be further
special-cased at the preview1 layer.
* Thread around allow_blocking_current_thread less
Pass around `&File` instead.
This commit correctly handles the result of the memory grow builtin function. Previously, it was assumed that the result of memory grow must be of the target's pointer type, which doesn't accurately represent the address space covered by the memory type.
* PCC: support imported memories as well.
Exposed by a fuzzbug
(https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67429); rather
than exclude from fuzzing, it seemed easier to just implement. We need
to define a new memory type to describe the memory definition struct
pointed to by vmctx, and set up points-to facts appropriately.
* Review feedback: abstract out a bit of the pointer-and-memtype handling logic.
* Canonicalize fpromote/fdemote operations
This commit changes the strategy implemented in #8146 to canonicalize
promotes/demotes of floats to additionally handle #8179.
Closes #8179
* Canonicalize fvpromote_low/fvdemote as well
Currently, every access to a table element does a bounds-check with a
conditional branch to a block that explicitly traps.
Instead, when Spectre mitigations are enabled, let's change the address
computation to return a null pointer for out-of-bounds accesses, and
then allow the subsequent load or store to trap.
This is less code in that case since we can reuse instructions we needed
anyway.
We return the MemFlags that the memory access should use, in addition to
the address it should access. That way we don't record trap metadata on
memory access instructions which can't actually trap due to being
preceded by a `trapnz`-based bounds check, when Spectre mitigations are
disabled.
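
The two strategies can be modeled in plain Rust as below. This is a sketch of the logic only: `Access`, `Trap` and `table_entry_access` are hypothetical, the `may_trap` field stands in for the returned MemFlags, and the `if` in the Spectre branch stands in for a branch-free `select_spectre_guard`.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Trap {
    TableOutOfBounds,
}

/// The address to access plus whether the access itself needs trap
/// metadata.
#[derive(Debug, PartialEq, Eq)]
struct Access {
    addr: u64,
    may_trap: bool,
}

fn table_entry_access(
    base: u64,
    bound: u64,
    elem_size: u64,
    index: u64,
    spectre_mitigations: bool,
) -> Result<Access, Trap> {
    let in_bounds = index < bound;
    if spectre_mitigations {
        // Out-of-bounds indices yield a null address; the following
        // load or store is what traps.
        let addr = if in_bounds { base + index * elem_size } else { 0 };
        Ok(Access { addr, may_trap: true })
    } else if in_bounds {
        // The explicit check already ran, so the access cannot trap.
        Ok(Access { addr: base + index * elem_size, may_trap: false })
    } else {
        // Explicit bounds check failed: trap before any access.
        Err(Trap::TableOutOfBounds)
    }
}

fn main() {
    println!("{:?}", table_entry_access(0x1000, 4, 8, 2, true));
    println!("{:?}", table_entry_access(0x1000, 4, 8, 9, true));
    println!("{:?}", table_entry_access(0x1000, 4, 8, 9, false));
}
```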
In addition, when the table has constant size and the element index is a
constant and mid-end optimization is enabled, this allows the
bounds-check to be constant folded away. Later, #8139 will let us
optimize away the select_spectre_guard instruction in this case too.
Once we also implement #8160, `tests/disas/typed-funcrefs.wat` should be
almost as fast as native indirect function calls.
In fact, we can even use this fact to infer ranges of the results!
(Previously we had a "default" copy-and-paste across all instruction
cases that 64-bit reg writes produced 64 bits of undefinedness and we
hadn't tightened the spec here.)
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67436.
We have various constant-propagation/folding rules in the mid-end that
generate new `iconst`s in place of other expressions. We got a fuzzbug
with PCC wherein it was not able to verify that an iadd-iadd-uextend
combination generating a Wasm address was in-range when rules
reassociated the iadds to put constants together. Rather than carefully
augment all rules to propagate constant facts only where they exist on
the inputs, I opted to add a hook to the optimizer to generate brand-new
assertions on *all* iconsts that we insert. This adds a little more work
during verification (not too much hopefully: it's pretty low-overhead to
check that `mov $1, %rax` puts `1` in `rax`) but should provide broader
coverage of interesting corner-cases where optimization breaks the PCC
chain.
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67432.
* Enhance `typed-funcrefs.wast` test with more cases
Have the same function with slightly different variations to compare
codegen between the possible strategies.
* Skip type checks on tables that don't need it
This commit implements an optimization to skip type checks in
`call_indirect` for tables that don't require it. With the
function-references proposal it's possible to have tables of a single
type of function as opposed to today's default `funcref` which is a
heterogeneous set of functions. In this situation it's possible that a
`call_indirect`'s type tag matches the type tag of a
`table`-of-typed-`funcref`-values, meaning that it's impossible for the
type check to fail.
The type check of a function pointer in `call_indirect` is refactored
here to take the table's type into account. Various things are shuffled
around to ensure that the right traps still show up in the right places
but the important part is that, when possible, the type check is omitted
entirely.
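
The core decision can be sketched as a small predicate. The names here (`TableElem`, `TypeIndex`, `needs_signature_check`) are hypothetical, and the sketch conservatively keeps the check whenever the types differ; the real implementation may treat that case differently.

```rust
/// Stand-in for an interned function type index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TypeIndex(u32);

/// What a table may hold: any function, or only functions of one type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableElem {
    Funcref,
    Typed(TypeIndex),
}

/// Whether `call_indirect` with the given expected type still needs a
/// runtime signature check for this table.
fn needs_signature_check(table: TableElem, expected: TypeIndex) -> bool {
    match table {
        // Untyped table: any function could be present.
        TableElem::Funcref => true,
        // Typed table: the check is redundant when the types match.
        TableElem::Typed(actual) => actual != expected,
    }
}

fn main() {
    let ty = TypeIndex(3);
    println!("{}", needs_signature_check(TableElem::Funcref, ty));
    println!("{}", needs_signature_check(TableElem::Typed(ty), ty));
}
```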
* Update crates/cranelift/src/func_environ.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* x64: Refactor lowerings for `insertlane`
Going through old PRs I stumbled across #2716 which is quite old at this
point. Upon adding the tests to `main` I see that most of it is actually
implemented except for load-lane-from-memory where the lane size is 8 or
16 bits. That requires explicitly opting-in with `sinkable_load_exact`
so this PR now subsumes the tests of #2716 in addition to implementing
this missing hole in lowerings.
This refactoring shuffles around where definitions are located to more
easily have access to `Value` to perform the relevant match. The generic
`vec_insert_lane` helper is now gone as well in lieu of direct matches
on `insertlane` lowerings.
Closes #2716
* Remove a no-longer-needed helper function
* Rename `-S common` to `-S cli`.
The "common" in `-S common` came from "wasi-common" which came from the
idea of having code in common between Wasmtime, Lucet, and others. It
doesn't have a clear meaning for end users, and has a risk of being
interpreted as "common" functionality that's generally available
everywhere.
This PR renames `-S common` to `-S cli`, and documents it as including
the WASI CLI APIs, to clarify its purpose. `-S common` is still accepted,
with a warning.
* Fix formatting in RELEASES.md.
* Disable the warning about `-S common` for now.
* Don't look up trap codes twice on traps
Currently whenever a signal or trap is handled in Wasmtime we perform
two lookups of the trap code. One during the trap handling itself to
ensure we can handle the trap, and then a second once the trap is
handled. There's not really any need to do two here, however, as the
result of the first can be carried over to the second.
While I was here refactoring things I also changed how some return
values are encoded, such as `take_jmp_buf_if_trap` now returns a more
self-descriptive enum.
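
A toy model of the idea: the single lookup produces a self-descriptive enum that already carries the trap code, so later stages have nothing to look up again. All names here (`TrapTest`, `TrapTable`, `test_if_trap`) and the enum's shape are assumptions for illustration, not the types from this change.

```rust
use std::cell::Cell;
use std::collections::BTreeMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TrapCode {
    MemoryOutOfBounds,
    IntegerDivisionByZero,
}

/// Result of testing a faulting pc, carrying the code when it is a trap.
#[derive(Debug, PartialEq, Eq)]
enum TrapTest {
    NotWasm,
    Trap(TrapCode),
}

/// Hypothetical table mapping faulting pcs to trap codes; `lookups`
/// counts how often the table is consulted.
struct TrapTable {
    codes: BTreeMap<usize, TrapCode>,
    lookups: Cell<u32>,
}

impl TrapTable {
    fn lookup(&self, pc: usize) -> Option<TrapCode> {
        self.lookups.set(self.lookups.get() + 1);
        self.codes.get(&pc).copied()
    }

    /// One lookup while handling the fault; the code rides along in the
    /// returned enum.
    fn test_if_trap(&self, pc: usize) -> TrapTest {
        match self.lookup(pc) {
            Some(code) => TrapTest::Trap(code),
            None => TrapTest::NotWasm,
        }
    }
}

fn main() {
    let mut codes = BTreeMap::new();
    codes.insert(0x10, TrapCode::MemoryOutOfBounds);
    codes.insert(0x20, TrapCode::IntegerDivisionByZero);
    let table = TrapTable { codes, lookups: Cell::new(0) };
    println!("{:?} after {} lookup(s)", table.test_if_trap(0x20), table.lookups.get());
}
```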
* Fix dead code warning on MIRI
* Update crates/environ/src/trap_encoding.rs
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
* Fix min-platform build
---------
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
This commit uses the support from #8162 to skip null function pointer
checks when performing an indirect call. Instead of an explicit check
the segfault from accessing the null function pointer is caught and
annotated with the appropriate trap.
Closes #5291
* Enhance test around stack overflow
This commit enhances the `host_always_has_some_stack` test a bit in light
of some thinking around #8135. Notably this test ensures that the host
itself never segfaults even if the wasm exhausts all of its stack. There
are a number of ways that we can exit out to the host, though, and only
one was tested previously. This commit updates to ensure more cases are
covered.
* Fix test to test correct thing
* Update tests/all/stack_overflow.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This allows us to make a single pass over an argument's slots instead of
needing to first pre-allocate temporaries, because we don't need to hold
a borrow of `ctx`. For the same reason, we can also emit the new
instructions directly instead of buffering them and copying them at the
end.
This approach also moves less data around. A SmallInstVec is a usize
plus four `M::Inst`s, which are 32 or 40 bytes each, while an ABIArg is
only 40 bytes. Since the SmallVecs inside ABIArg almost never spill to
the heap, cloning one uses less memory than allocating temporary space
for a few instructions.
* Remove wasm-c-api submodule
This submodule hasn't been updated in ~3 years at this point and we
additionally don't need most of the submodule. Instead add a script to
copy the files we need and verify in CI that the files are up-to-date.
This also makes using the C API a bit nicer: you no longer need two
`include` directories from a Wasmtime source tree, since one now
suffices.
* Don't format wasm.h{,h} vendored files