In the scheduler code it is only commented out. We need to really
redesign that whole deal if we are going to support pollables in other
crates; that is too ambitious for this first PR.
The previous timeout was 20 seconds. While that will never time out on
OSS-Fuzz, a wasm module is highly unlikely to do anything interesting
past the first bit; if it takes longer it's probably an uninteresting
infinite loop. Additionally this should improve loading a whole corpus
since test cases won't randomly take 20 seconds to load.
* winch: Handle relocations and traps
This change introduces handling of traps and relocations in Winch, which
was left out in https://github.com/bytecodealliance/wasmtime/pull/6119.
In order to do so, this change moves the `CompiledFunction` struct to the
`wasmtime-cranelift-shared` crate, allowing Cranelift and Winch to
operate on a single, shared representation, following some of the ideas
discussed in https://github.com/bytecodealliance/wasmtime/pull/5944.
Even though Winch doesn't rely on all the fields of `CompiledFunction`,
it eventually will. With the addition of relocations and traps it
started to be more evident that even if we wanted to have different
representations of a compiled function, they would end up being very
similar.
This change also consolidates where the `traps` and `relocations` of the
`CompiledFunction` get created, by introducing a constructor that
operates on a `MachBufferFinalized<Final>`, essentially encapsulating
this process in a single place for both Winch and Cranelift.
* Rework the shared `CompiledFunction`
This commit reworks the shared `CompiledFunction` struct. The compiled
function now contains the essential pieces to derive all the information
to create the final object file and to derive the debug information for
the function.
This commit also decouples the dwarf emission process by introducing
a `metadata` field in the `CompiledFunction` struct, which is used as
the central structure for dwarf emission.
* Improve longevity for fuzzing corpus of wasm modules
This commit is an improvement to the longevity of Wasmtime's corpus of
fuzz inputs to the `instantiate` fuzzer. Currently the input to this
fuzzer is arbitrary binary data which is a "DNA" of sorts of what to
do. This DNA changes over time as we update the fuzzer and add
configuration options, for example. When this happens though the
meaning of all existing inputs in the corpus changes because they all
have slightly different meanings now. The goal of this commit is to
improve the usefulness of a historical corpus, with respect to the
WebAssembly modules generated, across changes to the DNA.
A custom mutator is now provided for the `instantiate` fuzzer. This
mutator will not only perform libfuzzer's default mutation for the input
but will additionally place an "envelope" around the fuzz input. Namely,
the fuzz input is encoded as a valid WebAssembly module where the actual
input to the fuzzer is a trailing custom section. When the fuzzer runs
over this input it will read the custom section, perform any
configuration generation necessary, and then use the envelope module as
the actual input to the fuzzer instead of whatever was generated from
the fuzz input. This means that when a future update is made to the DNA
of a module the interpretation of the fuzz input section will change but
the module in question will not change. This means that any interesting
shapes of modules with respect to instructions should be preserved over
time in theory.
Some consequences of this strategy, however, are:
* If the DNA changes then it's difficult to produce minor mutations of
the original module. This is because mutations generate a module based
on the new DNA which is likely much different than the preexisting
module. This mainly just means that libFuzzer will have to rediscover
how to mutate up into interesting shapes on DNA changes but it'll
still be able to retain all the existing interesting modules.
Additionally this can be mitigated by perhaps integrating
`wasm-mutate` into these fuzzers as well.
* Protection is necessary against libFuzzer itself with respect to the
module. The existing fuzzers only expect valid modules to be created,
but libFuzzer can now create mutations which leave the trailing
section in place, meaning the module is no longer valid. One option is
to record a cryptographic hash in the fuzz input section of the
previous module, only using the module if the hashes match. This
approach will not work over time in the face of binary format changes,
however. For example the multi-memory proposal changed binary
encodings a year or so ago meaning that any previous fuzz-generated
cases would no longer be guaranteed to be valid. The strategy settled
on by this PR is to pass a flag to the execution function indicating
whether the module is "known valid" and to gracefully handle errors if
it isn't (for example if it's a prior test case).
I'll note that this new strategy of fuzzing is not applied to the
`differential` fuzzer. This could theoretically use the same strategy
but it relies much more strictly on being able to produce a module with
properties like NaN canonicalization, resource limits, fuel to limit
execution, etc. While it may be possible to integrate this with
`differential` in the future I figured it'd be better to start with the
`instantiate` fuzzer and go from there.
* Fix doc build
* wasmtime: In-process sampling profiler
Unlike the existing profiling options, this works on all platforms and
does not rely on any external profiling tools like perf or VTune. On the
other hand, it can only profile time spent in the WebAssembly guest, not
in Wasmtime itself or other host code. Also it can't measure time as
precisely as platform-native tools can.
The profile is saved in the Firefox processed format, which can be
viewed using https://profiler.firefox.com/.
* Ensure func_offset is populated
* Refactor
* Review comments
* Move GuestProfiler to the wasmtime crate
* Document the new GuestProfiler API
* Add TODO comments for future work
* Use module_offset, not func_offset, as fallback PC
* Minimize work done during `sample()`
Use fxprof_processed_profile's support for looking up symbols to avoid
looking up the same PC more than once per profile.
* Keep profiler state in the store
Also extend the documentation based on review comments.
* Import debugid audit from Mozilla again
* Split out platform-specific logic for `Mmap`
This commit refactors the implementation of the `wasmtime_runtime::Mmap`
structure to have the platform-specific bits separated by file rather
than interspersed throughout `mmap.rs`. I plan in the near future to add
a faux implementation for `cfg(miri)` to get some tests running with
miri on CI.
At the same time this additionally updates the interface of `Mmap` to be
more miri-friendly in the sense of ensuring that mutability is all in
the right place and we don't eagerly mark items as safe too soon. For
example it seems questionable that previously you could get a mutable
slice to readonly memory. Probably not going to cause any issues, but
this interface should hopefully be more verification-friendly.
* Fix tests on Windows
* x64: Add non-SSE 4.1 lowerings of min/max instructions
This commit updates the x64 backend to avoid using various `p{min,max}*`
instructions if SSE 4.1 isn't enabled. These instructions are used for
comparisons as well as the `{u,s}{min,max}` instructions. Alternative
lowerings are primarily drawn from LLVM.
Through this refactoring the x64 backend now has also grown (not the
most efficient) lowerings for vector comparisons with `i64x2` types,
which it previously largely didn't have. This enabled copying some
non-x86_64 tests into the main test files for various operations.
* Review comments
This fixes an issue in the AArch64 backend where a `load_addr` helper
was used exclusively for lowering `splat`-of-a-loaded-address. This
helper expanded in some cases to a pseudo-`LoadAddr` instruction but the
lowering of this instruction doesn't actually exhaustively handle all
`AMode` values.
The fix in this commit is to remove the `load_addr` helper altogether to
remove the need to go from an `AMode` back to a `Reg`, instead going
directly from an address to a register. The one small wrinkle is a small
helper now to add the immediate offset to the address register, but
that's not too bad to write.
By avoiding the `LoadAddr` instruction the unimplemented cases aren't
hit, so the codegen issue should be fixed.
This commit updates the test case generation for the `component_api`
fuzzer to prepare for an update to the `arbitrary` crate. The current
algorithm, with the latest `arbitrary` crate, generates a 20MB source
file which is a bit egregious. The goal here was to get that under
control by altering the parameters of test case generation and
additionally changing the structure of what's generated.
The new strategy is to have a limited set of "type fuel" which is
consumed as a type is generated. This bounds the maximal size of a type
in addition to its depth as prior. Additionally a fixed set of types are
generated first and then test cases select from these types as opposed
to test cases always generating types for themselves. Coupled together
this brings the size of the generated file back into the 200K range as
it was before.
* Slightly shrink compiled wasm modules
This commit shuffles trampolines to the end of a compiled ELF file
instead of interspersed throughout. Additionally trampolines are no
longer given the higher alignment requirement that functions receive,
only what the ISA requires, since they're not perf critical.
The savings here are quite minor, only 0.3% locally on
spidermonkey.wasm.
* Fix winch compile
* Return a more descriptive `FunctionAlignment` from `TargetIsa`
* Push alignment further into Cranelift
Remove the need for taking a function's alignment and an ISA's alignment
and combining them, instead only using the function's alignment as the
source of truth.
* Review comments
* aarch64: Remove unnecessary saves on calls with differing ABIs
The AArch64 AAPCS calling convention, which all our backends use for
register allocation, indicates that the low 64-bits of the v8-v15
registers are all caller-saved. Cranelift doesn't track precisely which
registers are used, however, so it says the entire register is a
caller-saved register. This currently interacts poorly where a call from
one function to another where the ABIs differ forces the caller to save
all of v8-v15 in the prologue and restore it in the epilogue.
Currently in all cases, however, this isn't actually necessary. The
AArch64 backend also has an optimization where if both the caller and
the callee are using the same ABI then the clobbers of a `call`
instruction are not counted in the clobber set for the function since
nothing new can be clobbered. This way if `v8` is never used, for
example, it's not considered clobbered and it's not saved. This logic,
however, is comparing ABIs exactly which means that different names for
the same ABI, which don't differ in register allocation, force saves to
happen.
This comes up with trampolines generated by Wasmtime where the calling
convention of the trampoline is `WasmtimeSystemV`, for example, where
the callee (a wasm function) is `Fast`. Because these differ it means
that all trampolines are generating saves/restores of registers, despite
the actual underlying calling convention being the same.
This commit updates the optimization that skips including a `call`
instruction in the clobber set by comparing the caller and callee's ABI
clobber sets. If both clobber the same registers then for the purposes
of clobbers it's as-if they were the same ABI, so the `call` can be
skipped.
Overall this removes unnecessary saves/restores in trampolines generated
by Cranelift and shrinks the size of spidermonkey.wasm by 2% after #6262.
* Use a subset check instead
Move the setup/teardown of contexts for each function's compile into a
new `FunctionCompiler` structure which handles more internally between
trampolines and normal wasm functions.
* redesign wasi ctx and builder
* in ctx, resources are only ever owned by the table
* preopen dirs and stdio are just indexes into the table that we hand to the caller
* builder requires less boxing, in general, takes more impl traits
* tests: rewrite the way we generate tests to not construct store
* uncomment WasiCtx's pool methods
We only have caller-checked function references, and it is unlikely we will have
any other kind for quite a long time.
Also remove all old `anyfunc`s and replace them with some variation of
`func_ref`.
Also consolidate style to use `func_ref` instead of `funcref` except when that
would be a breaking change to the public API or a comment is using the `funcref`
shorthand from WAT.
* adapter: special case for NOTDIR instead of BADF fixes path_open_dirfd test.
* get_dir now fails with NOTDIR for non-dir files
* get rid of special case
* complete test coverage for all path_ functions giving NOTDIR errors
This trims down the `[exemptions]` list ever-so-slightly by following
the suggestions of `cargo vet suggest` and updating a few crates across
some minor versions.
Since the latest updates to our release process which transitioned to
merge queues it appears that patch releases create incorrectly named
tarballs. The version in the tarball is based on the branch name, which
doesn't change for patch releases, so the version needs to come from
`Cargo.toml`. Thankfully there's already a helpful shell script to do
that so use the shell script instead of using the branch name.
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.
This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple of
nanoseconds, which seems well explained by these differences, and is
ultimately not a big deal.
However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for if the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.
Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
prtest:full
* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidate for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused lots of portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
This fixes the miscompilation with LLVM 16 in the immediate future and
should be at least a meager line of defense against issues like this in
the future. This is not a long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
* Update release notes
* Fix dates and fill out more notes
* tests: remove all use of rights for anything besides path_open read | write
* wasi-common and friends: delete all Caps from FileEntry and DirEntry
the sole thing rights are used to determine is whether a path_open
is opening for reading and writing.
* x64: Add non-SSE4.1 lowerings of `pmov{s,z}x*`
This commit adds lowerings for a suite of sign/zero extension
instructions which don't require SSE4.1. Like before these lowerings are
based on LLVM's output.
This commit also deletes special cases for `i16x8.extmul_{low,high}_*`
since the output of the special case is the same as the default lowering
of the component instructions used within.
* Remove SSE4.1 specialization of `uwiden_high`
LLVM prefers the `punpckh*`-based lowerings and at least according to
`llvm-mca` these are slightly better cycle-wise too.
Previously an `event` filter was applied to lookup the merge queue's
github run ID but this filter doesn't work after #6288. The filter isn't
strictly necessary, though, so remove it.
Use this to prime caches used by PRs to `main` and additionally the
merge queue used to merge into `main`.
While I'm here additionally update the trigger for merge-queue-based PRs
to use `merge_group:` now that it's been fixed.
Closes #6285
It seems that this fell through given that the incremental cache is
behind a cargo feature. I noticed this while building
`cranelift-codegen` via `cargo build --all-features`.
I decided to add a check in CI to hopefully prevent this in the future,
but I'm happy to remove it / update it if there's a better way or another way.
Several of these badges were out of date, with some crates in wide production
use marked as "experimental". Instead of trying to keep them up to date, just
remove them, since they are [no longer displayed on crates.io].
[no longer displayed on crates.io]: https://doc.rust-lang.org/cargo/reference/manifest.html#the-badges-section