Bake in the `*mut VMContext` to `&mut Instance` translation into the
macro-generated trampolines to avoid the need to use
`Instance::from_vmctx` with an extra level of indentation everywhere.
Additionally some libcalls are now entirely safe code as their one
unsafe operation was the `VMContext` to `Instance` translation.
* Make Wasmtime compatible with Stacked Borrows in MIRI
The fact that Wasmtime executes correctly under Tree Borrows but not
Stacked Borrows is a bit suspect and given what I've since learned about
the aliasing models I wanted to give it a stab to get things working
with Stacked Borrows. It turns out that this wasn't all that difficult,
but required two underlying changes:
* First the implementation of `Instance::vmctx` is now specially crafted
in an intentional way to preserve the provenance of the returned
pointer. This way all `&Instance` pointers will return a `VMContext`
pointer with the same provenance and acquiring the pointer won't
accidentally invalidate all prior pointers.
* Second the conversion from `VMContext` to `Instance` has been updated
to work with provenance and such. Previously the conversion looked
like `&mut VMContext -> &mut Instance`, but I think this didn't play
well with MIRI because `&mut VMContext` has no provenance over any
data since it's zero-sized. Instead now the conversion is from `*mut
VMContext` to `&mut Instance` where we know that `*mut VMContext` has
provenance over the entire instance allocation. This shuffled a fair
bit around to handle the new closure-based API to prevent escaping
pointers, but otherwise no major change other than the structure and
the types in play.
This commit additionally picks up a dependency on the `sptr` crate which
is a crate for prototyping strict-provenance APIs in Rust. This is I
believe intended to be upstreamed into Rust one day (it's in the
standard library as a Nightly-only API right now) but in the meantime
this is a stable alternative.
* Clean up manual `unsafe impl Send` impls
This commit adds a new wrapper type `SendSyncPtr<T>` which automatically
impls the `Send` and `Sync` traits based on the `T` type contained.
Otherwise it works similarly to `NonNull<T>`. This helps clean up a
number of manual annotations of `unsafe impl {Send,Sync} for ...`
throughout the runtime.
* Remove pointer-to-integer casts with tables
In an effort to enable MIRI's "strict provenance" mode this commit
removes the integer-to-pointer casts in the runtime `Table`
implementation for Wasmtime. Most of the bits were already there to
track all this, so this commit plumbed around the various pointer types
and with the help of the `sptr` crate preserves the provenance of all
related pointers.
* Remove integer-to-pointer casts in CoW management
The `MemoryImageSlot` type stored a `base: usize` field mostly because I
was too lazy to have a `Send`/`Sync` type as a pointer, so this commit
updates it to use `SendSyncPtr<u8>` and then plumbs the pointer-ness
throughout the implementation. This removes all integer-to-pointer casts
and has pointers stores as actual pointers when they're at rest.
* Remove pointer-to-integer casts in "raw" representations
This commit changes the "raw" representation of `Func` and `ExternRef`
to a `*mut c_void` instead of the previous `usize`. This is done to
satisfy MIRI's requirements with strict provenance, properly marking the
intermediate value as a pointer rather than round-tripping through
integers.
* Minor remaining cleanups
* Switch to Stacked Borrows for MIRI on CI
Additionally enable the strict-provenance features to force warnings
emitted today to become errors.
* Fix a typo
* Replace a negative offset with `sub`
* Comment the sentinel value
* Use NonNull::dangling
In #6338 that PR is bouncing on CI due to Windows running out of disk
space. Locally a `./ci/run-tests.sh` compile produces a ~13G build
directory. Testing on CI it looks like Windows builders have ~13G of
free space on them for GitHub Actions. I think something in that PR has
tipped us just over the edge of requiring too much space, although I'm
not sure exactly what.
To address the issue this commit disables DWARF debug information
entirely on all builders on CI. Not only should this save a sliver of
compile time but this reduces a local build directory to ~4.7G, over a
50% reduction. That should keep us fitting in CI for more time to come
hopefully.
* x64: Add non-SSE 4.1 lowerings for `insertlane` instructions
This commit avoids the use of `pinsr*` when SSE 4.1 is enabled by using
alternative means of inserting values into vectors.
* Clarify comments
* Remove the initializer from a global's type information
This commit removes the `Global::initializer` field to instead only
store type information about the global rather than its initialization
state. Instead now the initializer is stored separately from the type of
the global, and only for defined globals. This removes the need in a few
locations to synthesize dummy initializers.
* Actually delete what I intended to delete
* Simplify global initializer loop
* Remove lingering references to `experimental_x64`
This hasn't been experimental for quite a long time now and these are
all mostly old vestiges of when the current backend was under
development, so remove them all as they're no longer necessary.
* Try to fix test
* Try again to fix tests
* add fuel parameter to control plane
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* remove unused cranelift setting
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* update fuel parameter documentation
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* include control plane in printed test case
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* print control plane as comment
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* set fuel limit in cranelift settings
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* separate cli arg parsing from arbitrary impl
`ControlPlane::new` is replaced with an `Arbitrary` implementation
and `ControlPlane::set_fuel`, as it was before.
The CLI argument parsing inside the fuzz target `cranelift-fuzzgen`
is separated from the `Arbitrary` implementation of `TestCase`.
To achieve this, the `TestCase` doesn't carry a build TargetIsa
anymore, but it it generated with an isa- and flags-builder.
The CLI argument can then modify the flags further before the
`TargetIsa` is built.
The `TestCase.isa_builder` is wrapper in an `Rc` such that optimized
test cases can share the same one and it doesn't need to be made
`Clone`.
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* parse control planes with clif-util
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* move parsing logic from cranelift-control to cranelift-reader
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* remove parsing and settings plumbing
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
---------
Co-authored-by: Falk Zwimpfer <24669719+FalkZ@users.noreply.github.com>
Co-authored-by: Moritz Waser <mzrw.dev@pm.me>
* test: clean up `wasi_testsuite` test files
When running `cargo run wasi_testsuite -- --nocapture`, some test cases
generate files and folders that need to be removed for subsequent test
runs to pass. These paths are all suffixed with `*.cleanup` so we walk
the suite directory before and after execution to clean up these files.
* test: fix how `wasi_testsuite` maps directories
This change fixes how directories are mapped from the host environment
to the WebAssembly guest when the `wasi_testsuite` test is run.
Previously, Wasmtime's `--dir` flag was used which expects the passed
directory to be in the current working directory. In this case, though,
the directories were buried down `tests/wasi-testsuite/wasi-common/...`
so the correct flag is `--mapdir`. Switching to this flag allows a large
number of the ignored tests to now pass.
* wasi: add the `wasi-testsuite` tests for wasi-common
As described [here], this uses the `prod/testsuite-base` branch in which
the tests are built as `.wasm` files.
[here]: https://github.com/WebAssembly/wasi-testsuite/#getting-started
* chore: update `walkdir` everywhere to its latest version
This is done in order to use it for `wasi_testsuite` testing.
* vet: extend `walkdir`'s exemption
* test: factor out `get_wasmtime_command`
This will be helpful for `wasi_testsuite` testing.
* test: use all `wasi-testsuite` test cases
This change alters the `wasi_testsuite` test to run all of the available
test cases in [wasi-testsuite]. This involved making the test runner a
bit more robust to the various shapes of JSON specifications in that
project. Unfortunately, the `wasi_testsuite` test fails some of the
cases, so I added a `WASI_COMMON_IGNORE_LIST` to avoid these
temporarily. (This may remind some of the Wasm testsuite ignore lists in
Cranelift; those relied on `build.rs` to create a `#[test]` for each
test case, which I felt is not yet needed here).
It's unclear to me why the tests are failing. It could be because:
- wasi-common has a bug
- wasi-testsuite overspecifies (or incorrectly specifies) a test
- the test runner incorrectly configures Wasmtime's CLI execution.
But this change makes it easier to resolve this. Remove the file from
`WASI_COMMON_IGNORE_LIST` and run `cargo test wasi_testsuite --
--nocapture`. The printed output will show the expected result, the
actual result, and a command to replicate the failure from the command
line.
[wasi-testsuite]: https://github.com/WebAssembly/wasi-testsuite
* review: add "shrinking" comment
* Unify differential result comparison logic
Execution of functions handles the case where one instance stack
overflows and the other didn't, but instantiation didn't handle this
case. This commit fixes this issue by sharing the result comparison
logic between the two branches.
* Update crates/fuzzing/src/oracles.rs
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
---------
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
* Run some tests in MIRI on CI
This commit is an implementation of getting at least chunks of Wasmtime
to run in MIRI on CI. The full test suite is not possible to run in MIRI
because MIRI cannot run Cranelift-produced code at runtime (aka it
doesn't support JITs). Running MIRI, however, is still quite valuable if
we can manage it because it would have trivially detected
GHSA-ch89-5g45-qwc7, our most recent security advisory. The goal of this
PR is to select a subset of the test suite to execute in CI under MIRI
and boost our confidence in the copious amount of `unsafe` code in
Wasmtime's runtime.
Under MIRI's default settings, which is to use the [Stacked
Borrows][stacked] model, much of the code in `Instance` and `VMContext`
is considered invalid. Under the optional [Tree Borrows][tree] model,
however, this same code is accepted. After some [extremely helpful
discussion][discuss] on the Rust Zulip my current conclusion is that
what we're doing is not fundamentally un-sound but we need to model it
in a different way. This PR, however, uses the Tree Borrows model for
MIRI to get something onto CI sooner rather than later, and I hope to
follow this up with something that passed Stacked Borrows. Additionally
that'll hopefully make this diff smaller and easier to digest.
Given all that, the end result of this PR is to get 131 separate unit
tests executing on CI. These unit tests largely exercise the embedding
API where wasm function compilation is not involved. Some tests compile
wasm functions but don't run them, but compiling wasm through Cranelift
in MIRI is so slow that it doesn't seem worth it at this time. This does
mean that there's a pretty big hole in MIRI's test coverage, but that's
to be expected as we're a JIT compiler after all.
To get tests working in MIRI this PR uses a number of strategies:
* When platform-specific code is involved there's now `#[cfg(miri)]` for
MIRI's version. For example there's a custom-built "mmap" just for
MIRI now. Many of these are simple noops, some are `unimplemented!()`
as they shouldn't be reached, and some are slightly nontrivial
implementations such as mmaps and trap handling (for native-to-native
function calls).
* Many test modules are simply excluded via `#![cfg(not(miri))]` at the
top of the file. This excludes the entire module's worth of tests from
MIRI. Other modules have `#[cfg_attr(miri, ignore)]` annotations to
ignore tests by default on MIRI. The latter form is used in modules
where some tests work and some don't. This means that all future test
additions will need to be effectively annotated whether they work in
MIRI or not. My hope though is that there's enough precedent in the
test suite of what to do to not cause too much burden.
* A number of locations are fixed with respect to MIRI's analysis. For
example `ComponentInstance`, the component equivalent of
`wasmtime_runtime::Instance`, was actually left out from the fix for
the CVE by accident. MIRI dutifully highlighted the issues here and
I've fixed them locally. Some locations fixed for MIRI are changed to
something that looks similar but is subtly different. For example
retaining items in a `Store<T>` is now done with a Wasmtime-specific
`StoreBox<T>` type. This is because, with MIRI's analyses, moving a
`Box<T>` invalidates all pointers derived from this `Box<T>`. We don't
want these semantics, so we effectively have a custom `Box<T>` to suit
our needs in this regard.
* Some default configuration is different under MIRI. For example most
linear memories are dynamic with no guards and no space reserved for
growth. Settings such as parallel compilation are disabled. These are
applied to make MIRI "work by default" in more places ideally. Some
tests which perform N iterations of something perform fewer iterations
on MIRI to not take quite so long.
This PR is not intended to be a one-and-done-we-never-look-at-it-again
kind of thing. Instead this is intended to lay the groundwork to
continuously run MIRI in CI to catch any soundness issues. This feels,
to me, overdue given the amount of `unsafe` code inside of Wasmtime. My
hope is that over time we can figure out how to run Wasm in MIRI but
that may take quite some time. Regardless this will be adding nontrivial
maintenance work to contributors to Wasmtime. MIRI will be run on CI for
merges, MIRI will have test failures when everything else passes,
MIRI's errors will need to be deciphered by those who have probably
never run MIRI before, things like that. Despite all this to me it seems
worth the cost at this time. Just getting this running caught two
possible soundness bugs in the component implementation that could have
had a real-world impact in the future!
[stacked]: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md
[tree]: https://perso.crans.org/vanille/treebor/
[discuss]: https://rust-lang.zulipchat.com/#narrow/stream/269128-miri/topic/Tree.20vs.20Stacked.20Borrows.20.26.20a.20debugging.20question
* Update alignment comment
This PR updates the link in Wasmtime's README at the root of the repo to
link to Cranelift's new website rather than its subdirectory; and also
updates the README in `cranelift/` to point to the new website,
https://cranelift.dev/.
This pulls in Kerollmops/slice-group-by#20 which is necessary to get
Cranelift "clean" in MIRI with Stacked Borrows. I plan on leveraging
this in a subsequent commit to #6332 which turns on Stacked Borrows for
Wasmtime, but currently it fails due to this transitive dependency of
Cranelift, hence the update.
* wasmtime: Fix resetting stack-walking registers when entering/exiting Wasm
Fixes a regression from #6262, originally reported in
https://github.com/bytecodealliance/wasmtime-dotnet/pull/245
The issue was that we would enter Wasm and save the stack-walking registers but
never clear them after Wasm returns. Then if a host-to-host call tried to
capture a stack, we would mistakenly attempt to use those stale registers to
start the stack walk. This mistake would be caught by an assertion, triggering a
panic.
This commit fixes the issue by managing the save/restore in the
`CallThreadState` construction/drop, rather than in the old `set_prev`
method.
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
* Plumb through `VMRuntimeLimits` when capturing stack traces
This way we can differentiate between the same module loaded in different stores
and avoid leaking other stores' frames into our backtraces.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
The previous timeout was 20 seconds which while it won't ever time out
on OSS-Fuzz a wasm module is highly unlikely to do anything interesting
past the first bit as if it takes longer it's probably an uninteresting
infinite loop. Additionally this should improve loading a whole corpus
as test cases won't randomly take 20 seconds to load.
* winch: Handle relocations and traps
This change introduces handling of traps and relocations in Winch, which
was left out in https://github.com/bytecodealliance/wasmtime/pull/6119.
In order to so, this change moves the `CompiledFunction` struct to the
`wasmtime-cranelift-shared` crate, allowing Cranelift and Winch to
operate on a single, shared representation, following some of the ideas
discussed in https://github.com/bytecodealliance/wasmtime/pull/5944.
Even though Winch doesn't rely on all the fields of `CompiledFunction`,
it eventually will. With the addition of relocations and traps it
started to be more evident that even if we wanted to have different
representations of a compiled function, they would end up being very
similar.
This change also consolidates where the `traps` and `relocations` of the
`CompiledFunction` get created, by introducing a constructor that
operates on a `MachBufferFinalized<Final>`, esentially encapsulating
this process in a single place for both Winch and Cranelift.
* Rework the shared `CompiledFunction`
This commit reworks the shared `CompiledFunction` struct. The compiled
function now contains the essential pieces to derive all the information
to create the final object file and to derive the debug information for
the function.
This commit also decouples the dwarf emission process by introducing
a `metadata` field in the `CompiledFunction` struct, which is used as
the central structure for dwarf emission.
* Improve longevity for fuzzing corpus of wasm modules
This commit is an improvement to the longevity of Wasmtime's corpus of
fuzz inputs to the `instantiate` fuzzer. Currently the input to this
fuzzers is arbitrary binary data which is a "DNA" of sorts of what to
do. This DNA changes over time as we update the fuzzer and add
configuration options, for example. When this happens though the
meaning of all existing inputs in the corpus changes because they all
have slightly different meanings now. The goal of this commit is to
improve the usefulness of a historical corpus, with respect to the
WebAssembly modules generated, across changes to the DNA.
A custom mutator is now provided for the `instantiate` fuzzer. This
mutator will not only perform libfuzzer's default mutation for the input
but will additionally place an "envelope" around the fuzz input. Namely,
the fuzz input is encoded as a valid WebAssembly module where the actual
input to the fuzzer is a trailing custom section. When the fuzzer runs
over this input it will read the custom section, perform any
configuration generation necessary, and then use the envelope module as
the actual input to the fuzzer instead of whatever was generated from
the fuzz input. This means that when a future update is made to the DNA
of a module the interpretation of the fuzz input section will change but
the module in question will not change. This means that any interesting
shapes of modules with respect to instructions should be preserved over
time in theory.
Some consequences of this strategy, however, are:
* If the DNA changes then it's difficult to produce minor mutations of
the original module. This is because mutations generate a module based
on the new DNA which is likely much different than the preexisting
module. This mainly just means that libFuzzer will have to rediscover
how to mutate up into interesting shapes on DNA changes but it'll
still be able to retain all the existing interesting modules.
Additionally this can be mitigate with the integration of
`wasm-mutate` perhaps into these fuzzers as well.
* Protection is necessary against libFuzzer itself with respect to the
module. The existing fuzzers only expect valid modules to be created,
but libFuzzer can now create mutations which leave the trailing
section in place, meaning the module is no longer valid. One option is
to record a cryptographic hash in the fuzz input section of the
previous module, only using the module if the hashes match. This
approach will not work over time in the face of binary format changes,
however. For example the multi-memory proposal changed binary
encodings a year or so ago meaning that any previous fuzz-generated
cases would no longer be guaranteed to be valid. The strategy settled
by this PR is to pass a flag to the execution function indicating if
the module is "known valid" and gracefully handle error if it isn't
(for example if it's a prior test case).
I'll note that this new strategy of fuzzing is not applied to the
`differential` fuzzer. This could theoretically use the same strategy
but it relies much more strictly on being able to produce a module with
properties like NaN canonicalization, resource limits, fuel to limit
execution, etc. While it may be possible to integrate this with
`differential` in the future I figured it'd be better to start with the
`instantiate` fuzzer and go from there.
* Fix doc build
* wasmtime: In-process sampling profiler
Unlike the existing profiling options, this works on all platforms and
does not rely on any external profiling tools like perf or VTune. On the
other hand, it can only profile time spent in the WebAssembly guest, not
in Wasmtime itself or other host code. Also it can't measure time as
precisely as platform-native tools can.
The profile is saved in the Firefox processed format, which can be
viewed using https://profiler.firefox.com/.
* Ensure func_offset is populated
* Refactor
* Review comments
* Move GuestProfiler to the wasmtime crate
* Document the new GuestProfiler API
* Add TODO comments for future work
* Use module_offset, not func_offset, as fallback PC
* Minimize work done during `sample()`
Use fxprof_processed_profile's support for looking up symbols to avoid
looking up the same PC more than once per profile.
* Keep profiler state in the store
Also extend the documentation based on review comments.
* Import debugid audit from Mozilla again
* Split out platform-specific logic for `Mmap`
This commit refactors the implementation of the `wasmtime_runtime::Mmap`
structure to have the platform-specific bits separated by file rather
than interspersed throughout `mmap.rs`. I plan in the near future to add
a faux implementation for `cfg(miri)` to get some tests running with
miri on CI.
At the same time this additionally updates the interface of `Mmap` to be
more miri-friendly in the sense of ensuring that mutability is all in
the right place and we don't eagerly mark items as safe too soon. For
example it seems questionable that previously you could get a mutable
slice to readonly memory. Probably not going to cause any issues, but
this interface should hopefully be more verification-friendly.
* Fix tests on Windows
* x64: Add non-SSE 4.1 lowerings of min/max instructions
This commit updates the x64 backend to avoid using various `p{min,max}*`
instructions if SSE 4.1 isn't enabled. These instructions are used for
comparisons as well as the `{u,s}{min,max}` instructions. Alternative
lowerings are primarily drawn from LLVM.
Through this refactoring the x64 backend now has also grown (not the
most efficient) lowerings for vector comparisons with `i64x2` types,
which it previously largely didn't have. This enabled copying some
non-x86_64 tests into the main test files for various operations.
* Review comments
This fixes an issue in the AArch64 backend where a `load_addr` helper
was used exclusively for lowering `splat`-of-a-loaded-address. This
helper expanded in some cases to a pseudo-`LoadAddr` instruction but the
lowering of this instruction doesn't actually exhaustively handle all
`AMode` values.
The fix in this commit is to remove the `load_addr` helper altogether to
remove the need to go from an `AMode` back to a `Reg`, instead going
directly from an address to a register. The one small wrinkle is a small
helper now to add the immediate offset to the address register, but
that's not too too bad to write.
By avoiding the `LoadAddr` instruction the unimplemented cases aren't
hit, so the codegen issue should be fixed.
This commit updates the test case generation for the `component_api`
fuzzer to prepare for an update to the `arbitrary` crate. The current
algorithm, with the latest `arbitrary` crate, generates a 20MB source
file which is a bit egregious. The goal here was to get that under
control by altering the parameters of test case generation and
additionally changing the structure of what's generated.
The new strategy is to have a limited set of "type fuel" which is
consumed as a type is generated. This bounds the maximal size of a type
in addition to its depth as prior. Additionally a fixed set of types are
generated first and then test cases select from these types as opposed
to test cases always generating types for themselves. Coupled together
this brings the size of the generated file back into the 200K range as
it was before.
* Slightly shrink compiled wasm modules
This commit shuffles trampolines to the end of a compiled ELF file
instead of interspersed throughout. Additionally trampolines are no
longer given a higher alignment requirement than is required by the ISA
as is given to functions since they're not perf critical.
The savings here are quite minor, only 0.3% locally on
spidermonkey.wasm.
* Fix winch compile
* Return a more descriptive `FunctionAlignment` from `TargetIsa`
* Push alignment further into Cranelift
Remove the need for taking a function's alignment and an ISA's alignment
and combining them, instead only using the function's alignment as the
source of truth.
* Review comments
* aarch64: Remove unnecessary saves on calls with differing ABIs
The AArch64 AAPCS calling convention, which all our backends use for
register allocation, indicates that the low 64-bits of the v8-v15
registers are all caller-saved. Cranelift doesn't track precisely which
registers are used, however, so it says the entire register is a
caller-saved register. This currently interacts poorly where a call from
one function to another where the ABIs differ forces the caller to save
all of v8-v15 in the prologue and restore it in the epilogue.
Currently in all cases, however, this isn't actually necessary. The
AArch64 backend also has an optimization where if both the caller and
the callee are using the same ABI then the clobbers of a `call`
instruction are not counted in the clobber set for the function since
nothing new can be clobbered. This way if `v8` is never used, for
example, it's not considered clobbered and it's not saved. This logic,
however, is comparing ABIs exactly which means that different names for
the same ABI, which don't differ in register allocation, force saves to
happen.
This comes up with trampolines generated by Wasmtime where the calling
convention of the trampoline is `WasmtimeSystemV`, for example, where
the callee (a wasm function) is `Fast`. Because these differ it means
that all trampolines are generating saves/restores of registers, despite
the actual underlying calling convention being the same.
This commit updates the optimization that skips including a `call`
instruction in the clobber set by comparing the caller and callee's ABI
clobber sets. If both clobber the same registers then for the purposes
of clobbers it's as-if they were the same ABI, so the `call` can be
skipped.
Overall this removes unnecessary saves/restores in trampolines generated
by Cranelift and shrinks the size of spidermonkey.wasm by 2% after #6262.
* Use a subset check instead
Move the setup/teardown of contexts for each function's compile into a
new `FunctionCompiler` structure which handles more internally between
trampolines and normal wasm functions.
We only have caller-checked function references, and it is unlikely we will have
any other kind for quite a long time.
Also remove all old `anyfunc`s and replace them with some variation of
`func_ref`.
Also consolidate style to use `func_ref` instead of `funcref` except when that
would be a breaking change to the public API or a comment is using the `funcref`
shorthand from WAT.
This trims down the `[exemptions]` list ever-so-slightly by following
the suggestions of `cargo vet suggest` and updating a few crates across
some minor versions.
Since the latest updates to our release process which transitioned to
merge queues it appears that patch release create incorrectly named
tarballs. The version in the tarball is based on the branch name, which
doesn't change for patch releases, so the version needs to come from
`Cargo.toml`. Thankfully there's already a helpful shell script to do
that so use the shell script instead of using the branch name.
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.
This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well-explained by these differences explain, and ultimately is not a
big deal.
However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for it the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.
Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
prtest:full