This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.
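The following is a minimal Rust sketch of the "one entry point per calling convention" idea. Everything in it is hypothetical and heavily simplified. `FuncRefSketch` is not Wasmtime's `VMCallerCheckedFuncRef`. The "array" convention is modeled as a `&mut [u64]` buffer that holds the arguments and receives the results. Wasm's own calling convention cannot be named from Rust, so it is modeled as a plain `fn` pointer.

```rust
/// Hypothetical "array" convention: arguments are read from one buffer and
/// results are written back into that same buffer.
type ArrayCall = fn(values: &mut [u64]);

/// Hypothetical "native" convention: an ordinary typed C-ABI function pointer.
type NativeCall = extern "C" fn(a: u64, b: u64) -> u64;

/// Hypothetical "wasm" convention. Real Wasm code uses a Cranelift-internal
/// convention that Rust cannot name, so a plain Rust `fn` stands in for it.
type WasmCall = fn(a: u64, b: u64) -> u64;

/// Simplified stand-in for a func ref carrying one entry point per convention.
/// Callers pick the field matching the convention they already use instead of
/// chaining trampolines that convert between conventions.
struct FuncRefSketch {
    wasm_call: WasmCall,
    array_call: ArrayCall,
    native_call: NativeCall,
}

/// The function body itself, reachable through all three entry points.
fn add_wasm(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// Array-convention entry: unpack args from the buffer, write the result back.
fn add_array(values: &mut [u64]) {
    values[0] = add_wasm(values[0], values[1]);
}

/// Native-convention entry: a typed C-ABI wrapper around the same body.
extern "C" fn add_native(a: u64, b: u64) -> u64 {
    add_wasm(a, b)
}

/// Builds the func ref for the hypothetical `add` function.
fn add_func_ref() -> FuncRefSketch {
    FuncRefSketch {
        wasm_call: add_wasm,
        array_call: add_array,
        native_call: add_native,
    }
}
```

A host caller that holds a value buffer would call `(f.array_call)(&mut buf)`. A typed host caller would call `(f.native_call)(2, 3)` directly. Neither caller passes through the other's entry point.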
This is generally a nice ~5-10% speedup to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple of
nanoseconds, which these differences seem to explain well, and ultimately it is
not a big deal.
However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for when the host calls via the array calling convention
and one for when the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case we `call_indirect`
a host function of that type that uses the native calling convention. Much of
this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow-up work to offset this code size regression in the future.
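Below is a back-of-the-envelope model of the trampoline counts described above, written as a hypothetical Rust sketch. Each escaping function is represented only by its signature/type index (`u32`), and the real compiler's bookkeeping is certainly more involved. Take a module with three escaping functions that share two distinct signatures, and two types in total. Under this model it goes from 2 trampolines to 2 × 3 + 2 = 8, and each of those trampolines needs its own `.eh_frame` entry.

```rust
use std::collections::BTreeSet;

/// Old scheme: one trampoline per distinct signature among escaping
/// functions, deduplicated by signature.
fn old_trampoline_count(escaping_func_sigs: &[u32]) -> usize {
    escaping_func_sigs.iter().collect::<BTreeSet<_>>().len()
}

/// New scheme: two trampolines per escaping function (array-convention and
/// native-convention entry), plus one Wasm-to-native trampoline per type.
fn new_trampoline_count(escaping_func_sigs: &[u32], num_module_types: usize) -> usize {
    2 * escaping_func_sigs.len() + num_module_types
}
```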
Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines at roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.
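The sketch below shows the "patch once ahead of time, copy on the hot path" pattern in Rust. All of the types are hypothetical stand-ins rather than Wasmtime's real `InstancePre` or `Func` internals. Function pointers are modeled as `usize`, and the module's Wasm-to-native trampolines are modeled as a map from type index to trampoline.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a host function created without a compiler:
/// it has a native entry point but no Wasm-convention entry point of its own.
#[derive(Clone, Debug, PartialEq)]
struct HostFuncSketch {
    type_index: u32,
    native_call: usize,
    wasm_call: Option<usize>,
}

/// Stand-in for a compiled module, which supplies one Wasm-to-native
/// trampoline per type.
struct ModuleSketch {
    wasm_to_native_trampolines: HashMap<u32, usize>,
}

/// Stand-in for `InstancePre`: imports are patched once, ahead of time.
struct InstancePreSketch {
    imports: Vec<HostFuncSketch>,
}

impl InstancePreSketch {
    /// Slow path, run once: fill in each missing `wasm_call` from the
    /// module's trampolines. Returns `None` if a type has no trampoline.
    fn new(module: &ModuleSketch, mut imports: Vec<HostFuncSketch>) -> Option<Self> {
        for func in imports.iter_mut() {
            if func.wasm_call.is_none() {
                let trampoline = module.wasm_to_native_trampolines.get(&func.type_index)?;
                func.wasm_call = Some(*trampoline);
            }
        }
        Some(InstancePreSketch { imports })
    }

    /// Hot path, run per instantiation: no lookups or patching, just hand
    /// out the already-patched imports.
    fn instantiate(&self) -> Vec<HostFuncSketch> {
        self.imports.clone()
    }
}
```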
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
prtest:full
* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidates for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused many portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
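This hypothetical Rust sketch shows the read/write split described above: `*const` from `&self` and `*mut` only from `&mut self`. It is simplified in two ways. First, `InstanceSketch` owns its VMContext as a plain slice, whereas the real `Instance` addresses raw memory trailing the struct. Second, it takes a slot index instead of a generic byte offset.

```rust
/// Hypothetical stand-in for `Instance`; the VMContext is just an owned slice.
struct InstanceSketch {
    vmctx: Box<[u64]>,
}

impl InstanceSketch {
    fn new(slots: usize) -> Self {
        InstanceSketch { vmctx: vec![0; slots].into_boxed_slice() }
    }

    /// Read-only access from `&self`: returning `*const` signals that the
    /// pointee must not be written through this pointer.
    fn vmctx_plus_offset(&self, index: usize) -> *const u64 {
        &self.vmctx[index] as *const u64
    }

    /// Mutable access requires `&mut self`, so any caller that writes must
    /// itself hold, and propagate outwards, a `&mut` borrow.
    fn vmctx_plus_offset_mut(&mut self, index: usize) -> *mut u64 {
        &mut self.vmctx[index] as *mut u64
    }

    fn get(&self, index: usize) -> u64 {
        // SAFETY: the pointer comes from an in-bounds slice element and is
        // only read while `self` is borrowed.
        unsafe { *self.vmctx_plus_offset(index) }
    }

    fn set(&mut self, index: usize, value: u64) {
        // SAFETY: the pointer is derived from `&mut self`, so writing through
        // it does not violate any `&self` (noalias readonly) assumption.
        unsafe { *self.vmctx_plus_offset_mut(index) = value }
    }
}
```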
This fixes the miscompilation with LLVM 16 for now and should be at least a
meager line of defense against similar issues in the future. This is not a
long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
* Update release notes
* Fix dates and fill out more notes
* x64: Remove incorrect `amode_add` lowering rules
This commit removes two incorrect rules as part of the x64 backend's
computation of addressing modes. These two rules folded a zero-extended
32-bit computation into the address mode operand. This isn't correct:
the 32-bit computation must wrap (truncate) to 32 bits before it is
zero-extended, but once folded into the address mode the computation
happens with 64-bit operands, so the truncation never happens.
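The following sketch works through the truncation difference in plain Rust integer arithmetic. It assumes the folded 32-bit computation is an addition. The message above only says "a zero-extended 32-bit computation", and this sketch does not model the actual ISLE lowering rules. The two functions agree on small inputs and diverge as soon as the 32-bit add would wrap.

```rust
/// Intended semantics: add in 32 bits with wraparound, zero-extend the
/// result to 64 bits, then add it to the 64-bit base address.
fn correct_address(base: u64, x: u32, y: u32) -> u64 {
    base.wrapping_add(u64::from(x.wrapping_add(y)))
}

/// What folding into the address mode effectively computes: the operands are
/// added as 64-bit values, so the 32-bit wraparound never happens.
fn folded_address(base: u64, x: u32, y: u32) -> u64 {
    base.wrapping_add(u64::from(x)).wrapping_add(u64::from(y))
}
```

For example, with `base = 0x1000`, `x = 0xFFFF_FFFF`, and `y = 1`, the correct address is `0x1000`, but the folded form yields `0x1_0000_1000`.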
* Add release notes
* Add release notes for 3.0.1
* Update some version directives for crates in Wasmtime
* Mark anything with `publish = false` as version 0.0.0
* Mark the icache coherence crate with the same version as Wasmtime
* Fix manifest directives
* Update release date of Wasmtime 3.0.0
* Update release date for 3.0.0
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Update 0.40.0 release notes
Not a ton happened in terms of user-facing improvements here so I
outlined some internal changes as well. The cumulative effect of
improving compile times is Sightglass showing 30-40% improvements for
major benchmarks. Additionally I wrote down a note indicating that this
is likely the last `0.*` release and the next release of Wasmtime on
September 20 is planned to be 1.0.
* Remove perf-related relnotes
* Call out s390x simd at the top-level
* Upgrade to regalloc2 v0.2.3 to get bugfix from bytecodealliance/regalloc2#60.
* Update RELEASES.md.
* Update two compile tests based on slightly shifting regalloc output.
* Decouple some more `Config` methods from each other
This commit defers validation of stack sizes and guard sizes until
`Engine::new` to avoid odd interactions that depend on the order in which
`Config` methods are invoked.
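The following hypothetical Rust sketch illustrates deferred validation. The setters only record values, and the cross-field check runs once when the engine is built, so setter order cannot matter. The field names echo Wasmtime's `max_wasm_stack` and `async_stack_size` settings, but `ConfigSketch`, `EngineSketch`, and the exact constraint checked are assumptions made for illustration.

```rust
/// Hypothetical config: setters only record values and never cross-check
/// them, so the order in which they are called cannot matter.
#[derive(Default)]
struct ConfigSketch {
    max_wasm_stack: usize,
    async_stack_size: usize,
}

impl ConfigSketch {
    fn max_wasm_stack(&mut self, size: usize) -> &mut Self {
        self.max_wasm_stack = size;
        self
    }

    fn async_stack_size(&mut self, size: usize) -> &mut Self {
        self.async_stack_size = size;
        self
    }
}

/// Hypothetical engine: all cross-field validation happens once, here.
struct EngineSketch {
    max_wasm_stack: usize,
}

impl EngineSketch {
    fn new(config: &ConfigSketch) -> Result<Self, String> {
        // Illustrative constraint: the async stack must be able to hold at
        // least the maximum amount of Wasm stack.
        if config.async_stack_size < config.max_wasm_stack {
            return Err(format!(
                "async stack size {} is smaller than max wasm stack {}",
                config.async_stack_size, config.max_wasm_stack
            ));
        }
        Ok(EngineSketch { max_wasm_stack: config.max_wasm_stack })
    }
}
```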
* Fix C API
* Typos
This commit refactored `Config` to use a separate `CompilerConfig` field instead
of operating on `CompilerBuilder` directly to make all its methods idempotent.
Fixes #4189
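Here is a hypothetical Rust sketch of the refactoring pattern described above. `Config` records settings in a plain-data `CompilerConfig`-like struct, and the builder is populated only once, from the final recorded values. Setting the same option twice therefore has the same effect as setting it once. None of these types or method names are Wasmtime's real API.

```rust
use std::collections::BTreeMap;

/// Hypothetical builder that, like a direct builder API, accumulates every
/// setting it is handed.
#[derive(Default)]
struct CompilerBuilderSketch {
    applied: Vec<(String, String)>,
}

impl CompilerBuilderSketch {
    fn set(&mut self, name: &str, value: &str) {
        self.applied.push((name.to_string(), value.to_string()));
    }
}

/// Hypothetical plain-data record of compiler settings. Re-setting a name
/// overwrites the previous value, so repeated calls are idempotent.
#[derive(Default)]
struct CompilerConfigSketch {
    settings: BTreeMap<String, String>,
}

#[derive(Default)]
struct ConfigSketch {
    compiler_config: CompilerConfigSketch,
}

impl ConfigSketch {
    fn compiler_setting(&mut self, name: &str, value: &str) -> &mut Self {
        self.compiler_config
            .settings
            .insert(name.to_string(), value.to_string());
        self
    }

    /// Deferred step: the builder is created and configured only once, from
    /// the final recorded settings (e.g. when the engine is built).
    fn build_compiler(&self) -> CompilerBuilderSketch {
        let mut builder = CompilerBuilderSketch::default();
        for (name, value) in &self.compiler_config.settings {
            builder.set(name, value);
        }
        builder
    }
}
```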