Since the latest updates to our release process, which transitioned to
merge queues, patch releases appear to create incorrectly named
tarballs. The version in the tarball name is based on the branch name,
which doesn't change for patch releases, so the version needs to come
from `Cargo.toml`. Thankfully there's already a helpful shell script that
does this, so use it instead of the branch name.
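As a rough illustration of taking the version from the manifest rather than from the branch name, here is a minimal Rust sketch. It is not the shell script mentioned above: `package_version`, `tarball_name`, and the `project-v…tar.gz` naming scheme are hypothetical, and the parser only understands a literal `version = "x.y.z"` line inside `[package]`.

```rust
/// Returns the `version` value from the `[package]` table of a manifest.
/// Deliberately naive (assumption): it only handles `version = "x.y.z"`
/// written on a single line, not workspace inheritance or inline tables.
pub fn package_version(manifest: &str) -> Option<&str> {
    let mut in_package = false;
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Track which table we are in; only `[package]` counts.
            in_package = line == "[package]";
            continue;
        }
        if !in_package {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "version" {
                return Some(value.trim().trim_matches('"'));
            }
        }
    }
    None
}

/// Hypothetical tarball naming scheme, for illustration only.
pub fn tarball_name(manifest: &str) -> Option<String> {
    package_version(manifest).map(|v| format!("project-v{v}.tar.gz"))
}
```

Because the name is derived from the manifest contents, a patch release that bumps `version` on an unchanged branch still gets a distinct tarball name.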
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.
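To make the idea concrete, here is a heavily simplified Rust model of a function reference that carries one entry point per calling convention. Everything in it is hypothetical: `FuncRef` is not the real `VMCallerCheckedFuncRef`, it models only two of the three conventions, and the signatures are invented for illustration.

```rust
/// Hypothetical, simplified model: one entry point per calling convention.
pub struct FuncRef {
    /// "Array" convention: arguments and results travel through a slice.
    pub array_call: fn(&mut [u64]),
    /// "Native" convention: arguments and results are ordinary values.
    pub native_call: fn(u64, u64) -> u64,
}

fn add_native(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

/// Adapter prepared once, up front, instead of chaining trampolines at
/// call time: it unpacks the slice and writes the result into slot 0.
fn add_array(values: &mut [u64]) {
    values[0] = add_native(values[0], values[1]);
}

pub fn add_func_ref() -> FuncRef {
    FuncRef {
        array_call: add_array,
        native_call: add_native,
    }
}
```

A caller simply picks the field that matches its own convention, for example `(f.native_call)(2, 3)`, and never converts between conventions on the call path.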
This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well explained by these differences, and ultimately is not a
big deal.
However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for if the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.
Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
prtest:full
* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidates for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused lots of portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
This fixes the miscompilation with LLVM 16 in the immediate future and
should be at least a meager line of defense against issues like this in
the future. This is not a long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
* Update release notes
* Fix dates and fill out more notes
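For the `vmctx_plus_offset` change described in the miscompile fix above, the shape of the split can be sketched as follows. This is a toy model under stated assumptions: `Instance` here is a hypothetical stand-in holding a small fixed array rather than the real trailing `VMContext`, and `slot`/`set_slot` are invented helpers.

```rust
/// Hypothetical stand-in for the runtime's `Instance`.
pub struct Instance {
    vmctx: [u64; 4],
}

impl Instance {
    pub fn new() -> Self {
        Instance { vmctx: [0; 4] }
    }

    /// Shared access only ever hands out a `*const T`.
    ///
    /// Safety: `offset` must be in bounds and suitably aligned for `T`.
    pub unsafe fn vmctx_plus_offset<T>(&self, offset: usize) -> *const T {
        unsafe { (self.vmctx.as_ptr() as *const u8).add(offset).cast() }
    }

    /// Mutation requires `&mut self`, so callers that write must say so.
    ///
    /// Safety: `offset` must be in bounds and suitably aligned for `T`.
    pub unsafe fn vmctx_plus_offset_mut<T>(&mut self, offset: usize) -> *mut T {
        unsafe { (self.vmctx.as_mut_ptr() as *mut u8).add(offset).cast() }
    }

    pub fn slot(&self, index: usize) -> u64 {
        assert!(index < self.vmctx.len());
        unsafe { *self.vmctx_plus_offset::<u64>(index * 8) }
    }

    pub fn set_slot(&mut self, index: usize, value: u64) {
        assert!(index < self.vmctx.len());
        unsafe { *self.vmctx_plus_offset_mut::<u64>(index * 8) = value }
    }
}
```

With this shape, a `&self` method that tries to write through the pointer no longer compiles without an explicit cast, which is what forced the `&mut self` requirement to propagate outward.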
* tests: remove all use of rights for anything besides path_open read | write
* wasi-common and friends: delete all Caps from FileEntry and DirEntry
the sole thing rights are used to determine is whether a path_open
is opening for reading and writing.
* x64: Add non-SSE4.1 lowerings of `pmov{s,z}x*`
This commit adds lowerings for a suite of sign/zero extension
instructions which don't require SSE4.1. Like before these lowerings are
based on LLVM's output.
This commit also deletes special cases for `i16x8.extmul_{low,high}_*`
since the output of the special case is the same as the default lowering
of all the component instructions used within as well.
* Remove SSE4.1 specialization of `uwiden_high`
LLVM prefers the `punpckh*`-based lowerings and at least according to
`llvm-mca` these are slightly better cycle-wise too.
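For readers unfamiliar with how a widening extension can be done without SSE4.1, the following scalar Rust model shows one well-known SSE2 idiom for the byte-to-word case: interleave with itself and shift arithmetically for sign extension, or interleave with zero for zero extension. Whether these are the exact sequences chosen in the commits above is an assumption; the function names are hypothetical.

```rust
/// Scalar model of `punpcklbw v, v` followed by `psraw 8`: each low byte
/// is duplicated into both halves of a 16-bit lane, then an arithmetic
/// shift right by 8 leaves the sign-extended byte.
pub fn sext_low_via_unpack(v: [u8; 16]) -> [i16; 8] {
    let mut out = [0i16; 8];
    for (lane, byte) in out.iter_mut().zip(v.iter()) {
        let b = *byte as u16;
        let interleaved = (b << 8) | b;
        *lane = (interleaved as i16) >> 8;
    }
    out
}

/// Scalar model of `punpcklbw v, zero`: the high half of each 16-bit
/// lane comes from a zeroed register, which is a zero extension.
pub fn zext_low_via_unpack(v: [u8; 16]) -> [u16; 8] {
    let mut out = [0u16; 8];
    for (lane, byte) in out.iter_mut().zip(v.iter()) {
        let zero: u16 = 0;
        *lane = (zero << 8) | (*byte as u16);
    }
    out
}
```

The high-half variants follow the same pattern with the upper eight bytes, which is where the `punpckh*` family comes in.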
Previously an `event` filter was applied to look up the merge queue's
GitHub run ID but this filter doesn't work after #6288. The filter isn't
strictly necessary, though, so remove it.
Use this to prime caches used by PRs to `main` and additionally the
merge queue used to merge into `main`.
While I'm here additionally update the trigger for merge-queue-based PRs
to use `merge_group:` now that it's been fixed.
Closes #6285
It seems that this fell through given that the incremental cache is
behind a cargo feature. I noticed this while building
`cranelift-codegen` via `cargo build --all-features`.
I decided to add a check in CI to hopefully prevent this in the future,
but I'm happy to remove it / update it if there's a better way or another way.
Several of these badges were out of date, with some crates in wide production
use marked as "experimental". Instead of trying to keep them up to date, just
remove them, since they are [no longer displayed on crates.io].
[no longer displayed on crates.io]: https://doc.rust-lang.org/cargo/reference/manifest.html#the-badges-section
* riscv64: Swap order of `VecAluRRR` source registers
These were accidentally reversed from what we declare in the isle emit helper
* riscv64: Add SIMD `isub`
* riscv64: Add SIMD `imul`
* riscv64: Add `{u,s}mulhi`
* riscv64: Add `b{and,or,xor}`
* cranelift: Move `imul.i8x16` runtest to separate file
Looks like x86 does not implement it
* riscv64: Better formatting for `VecAluOpRRR`
* cranelift: Enable x86 SIMD tests with `has_sse41=false`
The `Config` needs to be mutable while building a compiler, but in a
build configuration without a compiler, declaring it as `mut` produces a
warning since nothing else needs that.
I found the existing workaround for this warning confusing, so this PR
removes `mut` from the binding for `config` and instead re-binds the
variable in builds where we call `build_compiler`.
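The re-binding pattern looks roughly like the sketch below. The names are assumptions made for illustration: `Config` is a toy struct, `finish` is a hypothetical function, and the `compiler` cargo feature stands in for whatever condition actually gates `build_compiler`.

```rust
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub compiler_built: bool,
}

// Only exists in builds that have a compiler (hypothetical feature name).
#[cfg(feature = "compiler")]
fn build_compiler(config: &mut Config) {
    config.compiler_built = true;
}

pub fn finish(config: Config) -> Config {
    // `config` is immutable by default, so builds without a compiler see
    // no `unused_mut` warning. Builds with a compiler shadow it with a
    // mutable binding just long enough to call `build_compiler`.
    #[cfg(feature = "compiler")]
    let config = {
        let mut config = config;
        build_compiler(&mut config);
        config
    };
    config
}
```

The mutable binding is scoped to the one place that needs it, so there is no `#[allow(unused_mut)]` or similar workaround to explain.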
* winch(fuzz): Initial support for differential fuzzing
This commit introduces initial support for differential fuzzing for Winch. In
order to fuzz winch, this change introduces the `winch` cargo feature. When the
`winch` cargo feature is enabled the differential fuzz target uses `wasmi` as
the differential engine and `wasm-smith` and `single-inst` as the module sources.
The intention behind this change is to have a *local* approach for fuzzing and
verifying programs generated by Winch and to have an initial implementation that
will allow us to eventually enable this change by default. Currently it's not
worth it to enable this change by default given all the filtering that needs to
happen to ensure that the generated modules are supported by Winch.
It's worth noting that the Wasm filtering code will be temporary, until Winch
reaches feature parity in terms of Wasm operators.
* Check build targets with the `winch` feature flag
* Rename fuzz target feature to `fuzz-winch`
This crate re-exports the `Backtrace` type at top-level from a nested
module. `Backtrace` in turn has `Frame` in its public API, which is not
re-exported anywhere. This is legal and external users can call methods
on `Frame`, but it doesn't appear in the rustdocs, making it
unnecessarily difficult to figure out how to use this API. Re-exporting
`Frame` fixes that, and also allows naming the type directly if needed.
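A minimal sketch of the situation, with hypothetical fields and methods (`pc`, `from_pcs`, `frames`, and `first_pc` are invented; only the `Backtrace`/`Frame` relationship comes from the description above):

```rust
mod backtrace {
    /// Appears in `Backtrace`'s public API.
    pub struct Frame {
        pc: usize,
    }

    impl Frame {
        pub fn pc(&self) -> usize {
            self.pc
        }
    }

    pub struct Backtrace {
        frames: Vec<Frame>,
    }

    impl Backtrace {
        pub fn from_pcs(pcs: &[usize]) -> Self {
            Backtrace {
                frames: pcs.iter().map(|&pc| Frame { pc }).collect(),
            }
        }

        pub fn frames(&self) -> &[Frame] {
            &self.frames
        }
    }
}

// Re-exporting both types makes `Frame` show up in the docs and lets
// users name it, for example in a function signature.
pub use backtrace::{Backtrace, Frame};

pub fn first_pc(frame: &Frame) -> usize {
    frame.pc()
}
```

Without `Frame` in the `pub use`, code outside the crate could still call `frames()[0].pc()` but could not write a signature like `first_pc`'s.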
OSS-Fuzz found a case where the `differential` fuzzer was failing and
the underlying cause was that Wasmtime was hitting an OOM while Wasmi
wasn't. This meant that the two modules were producing "different
results" since memories had differing lengths, but this isn't a failure
we're interested in. This commit updates the differential fuzzer to
discard the test case once the Wasmtime half reaches OOM.
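The decision logic amounts to something like the following sketch. The `Outcome` and `Verdict` types and the `compare` function are hypothetical and much simpler than the real fuzzer, which compares memories and globals as well as results.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome {
    Returned(Vec<u64>),
    Trapped,
    OutOfMemory,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Match,
    Mismatch,
    Discard,
}

/// Compares the Wasmtime result against the other engine's result.
/// An OOM on the Wasmtime side is not an interesting difference, so the
/// whole test case is thrown away instead of being reported.
pub fn compare(wasmtime: &Outcome, other: &Outcome) -> Verdict {
    if *wasmtime == Outcome::OutOfMemory {
        return Verdict::Discard;
    }
    if wasmtime == other {
        Verdict::Match
    } else {
        Verdict::Mismatch
    }
}
```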
* Make streams owned by request/response that they are tied to.
* Address comments, fix tests.
* Address comment.
* Update crates/wasi-http/src/streams_impl.rs
Co-authored-by: Pat Hickey <pat@moreproductive.org>
* Switch to BytesMut
---------
Co-authored-by: Pat Hickey <pat@moreproductive.org>
* Allow WASI to open directories without O_DIRECTORY
The `O_DIRECTORY` flag is a request that open should fail if the named
path is not a directory. Opening a path which turns out to be a
directory is not supposed to fail if this flag is not specified.
However, wasi-common required callers to use it when opening
directories.
With this PR, we always open the path the same way whether or not the
`O_DIRECTORY` flag is specified. However, after opening it, we `stat` it
to check whether it turned out to be a directory, and determine which
operations the file descriptor should support accordingly. In addition,
we explicitly check whether the precondition defined by `O_DIRECTORY` is
satisfied.
Closes #4947 and closes #4967, which were earlier attempts at fixing the
same issue, but which had race conditions.
prtest:full
* Add tests from #4967/#4947
This test was authored by Roman Volosatovs <rvolosatovs@riseup.net> as
part of #4947.
* Tests: Close FDs before trying to unlink files
On Windows, when opening a path which might be a directory using
`CreateFile`, cap-primitives also removes the `FILE_SHARE_DELETE` mode.
That means that if we implement WASI's `path_open` such that it always
uses `CreateFile` on Windows, for both files and directories, then
holding an open file handle prevents deletion of that file.
So I'm changing these test programs to make sure they've closed the
handle before trying to delete the file.
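The open-then-`stat` approach from the `O_DIRECTORY` change above can be sketched with the standard library alone. This is an illustration, not the wasi-common code: `Opened` and `open_path` are hypothetical, the error value is a placeholder, and it assumes a Unix-like host where `File::open` succeeds on a directory.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// What kind of descriptor the open produced (hypothetical type).
pub enum Opened {
    File(File),
    Dir(File),
}

/// Opens `path` the same way regardless of `o_directory`, then inspects
/// the already-open handle to decide what it is. Checking the handle,
/// not the path, avoids a race between the check and the open.
pub fn open_path(path: &Path, o_directory: bool) -> io::Result<Opened> {
    let file = File::open(path)?;
    let is_dir = file.metadata()?.is_dir();
    if o_directory && !is_dir {
        // Explicitly enforce the precondition that `O_DIRECTORY` requests.
        return Err(io::Error::new(io::ErrorKind::Other, "not a directory"));
    }
    Ok(if is_dir {
        Opened::Dir(file)
    } else {
        Opened::File(file)
    })
}
```

In line with the last change above, any caller that plans to delete the file afterwards should drop the returned handle first.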
* Remove ModuleCompiledFunction
The same information can be retrieved using
ctx.compiled_code().unwrap().code_info().total_size
In addition for Module implementations that don't immediately compile the
given function there is no correct value that can be returned.
* Don't give anonymous functions and data objects an internal name
This internal name can conflict if a module is serialized and then
deserialized into another module. It also wasn't used by any of the
Module implementations anyway.
* Allow serializing all cranelift-module data structures
This allows a Module implementation to serialize its internal state and
deserialize it in another compilation session. For example to implement
LTO or to load the module into cranelift-interpreter.
* Use expect
`poll_oneoff` uses `system_interface::ReadReady` to compute how many
bytes are ready to be read, which is part of the Preview1 `poll_oneoff`
API. This updates to system-interface 0.25.7 which has a fix to handle
special files such as /dev/urandom and /dev/null properly.
Fixes #6239.
This commit adds lowerings to the x64 backend for two more CLIF
instructions that currently require SSE 4.1. These lowerings are
inspired by LLVM's lowerings and avoid the use of SSE 4.1 instructions.
* riscv64: Remove unused code
* riscv64: Add vector types
* riscv64: Initial Vector ABI Load/Stores
* riscv64: Vector Loads/Stores
* riscv64: Fix `vsetvli` encoding error
* riscv64: Add SIMD `iadd` runtests
* riscv64: Rename `VecSew`
The SEW name is correct, but only for VType. We also use this type
in loads/stores as the Effective Element Width, so the name isn't
quite correct in that case.
* ci: Add V extension to RISC-V QEMU
* riscv64: Misc Cleanups
* riscv64: Check V extension in `load`/`store` for SIMD
* riscv64: Fix `sumop` doc comment
* cranelift: Fix comment typo
* riscv64: Add convert for VType and VecElementWidth
* riscv64: Remove VecElementWidth converter
This updates to rustix 0.37.13, which contains some features we can use to
implement more features in wasi-common for the wasi-sockets API. This also
pulls in several other updates to avoid having multiple versions of rustix.
This does introduce multiple versions of windows-sys, as the errno and tokio
crates are currently using 0.45 while rustix and other dependencies have
updated to 0.48; PRs updating these are already in flight so this will
hopefully be resolved soon.
It also includes cap-std 1.0.14, which disables the use of `openat2` and
`statx` on Android, fixing a bug where some Android devices crash the
process when those syscalls are executed.
* Fix default architecture for winch
This updates the `winch/codegen/build.rs` script to default to the
target architecture being compiled for as opposed to the host
architecture that's performing the compile.
Closes #6241
* Auto-enable other future architectures
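In a build script, `cfg!(target_arch = "...")` and `std::env::consts::ARCH` describe the host that runs the script, while Cargo passes the architecture being compiled for in the `CARGO_CFG_TARGET_ARCH` environment variable. A minimal sketch of preferring the latter, with hypothetical function names (this is not the actual `winch/codegen/build.rs`):

```rust
/// Picks the architecture to enable by default: the compilation target if
/// Cargo told us what it is, otherwise the host as a fallback.
pub fn default_arch(cargo_cfg_target_arch: Option<&str>, host_arch: &str) -> String {
    cargo_cfg_target_arch.unwrap_or(host_arch).to_string()
}

/// How a build script might call it. `CARGO_CFG_TARGET_ARCH` is set by
/// Cargo for build scripts; outside of one this falls back to the host.
pub fn default_arch_from_env() -> String {
    let target = std::env::var("CARGO_CFG_TARGET_ARCH").ok();
    default_arch(target.as_deref(), std::env::consts::ARCH)
}
```

When cross-compiling from an x86_64 host to aarch64, this yields `aarch64` rather than `x86_64`.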
This commit marks the loads of `*mut VMContext` and the callee function
pointer as `readonly` in the context of indirect function calls and
additionally calls to imported functions (which are indirect). Once a
`VMCallerCheckedAnyfunc` is initialized it's never modified so it should
be valid to mark these as readonly and if called in a loop should be
hoistable outside of the loop.
This test was not meaningfully executing, because wasi-common never
provides rights containing RIGHTS_PATH_FILESTAT_SET_SIZE - this flag is
not even defined in wasi-common/src/dir.rs as one of the DirCaps flags.
Even when you get rid of that guard that skips the meat of the test,
path_open was being called with OFLAGS_TRUNC but without
RIGHTS_FD_WRITE, which boils down to an `open(2)` with OFLAGS_TRUNC set
and none of the access modes set, so it will always fail with EINVAL.
So, it doesn't look like this test ever would have meaningfully passed,
even in the pre-wiggle-rewrite version of wasi-common it landed in back in
late 2019. Late 2019! before the pandemic! our eyes were so full of
stars and dreams of the future!
The behaviors we really care about for truncation are taken care of
by the fd_filestat_set test, which shows fd_filestat_set_size works
correctly, and the file_truncation test, which shows that opening
with OFLAGS_TRUNC will truncate the file.
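The truncate-without-write failure described above has a close analogue in Rust's standard library, which rejects that combination of open options up front. The sketch below uses only `std::fs::OpenOptions`; the helper name is hypothetical, and it illustrates the failure mode rather than the wasi-common code path.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Requests truncation without any access mode, mirroring a `path_open`
/// with `OFLAGS_TRUNC` but no `RIGHTS_FD_WRITE`. The standard library
/// documents this combination as invalid, so the call returns an error
/// whether or not `path` exists.
pub fn open_trunc_without_write(path: &Path) -> io::Result<()> {
    OpenOptions::new().truncate(true).open(path).map(|_| ())
}
```

Adding `.write(true)` to the builder is what makes the same call valid, which matches the observation that the old test could never have succeeded as written.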
* Add support for binary/octal literals to ISLE
In a number of recent x64 changes some u8 immediates are interpreted
as four bit-packed 2-bit numbers and I have a tough time going between
hex and these bit-packed numbers. I've been writing `0xAA == 0b...` in
comments to indicate the intent but I figured it'd be a bit clearer if
the binary literal was accepted directly!
This is a minor update to the ISLE lexer to allow for binary `0b00...`
and octal `0o00...` literals in the same manner as hex literals. Some
comments in the x64 backend are then removed to use the binary literal
syntax directly.
* Update ISLE reference for octal/binary
* Update ISLE tests for octal/binary
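The prefix handling added to the lexer can be pictured with a small standalone Rust function. It is a sketch, not the ISLE lexer: `parse_int_literal` is a hypothetical name, it handles only non-negative literals, and it ignores any other syntax the real lexer may accept.

```rust
/// Parses a decimal, hex (`0x`), binary (`0b`), or octal (`0o`) literal.
pub fn parse_int_literal(text: &str) -> Option<i64> {
    let (digits, radix) = if let Some(rest) = text.strip_prefix("0x") {
        (rest, 16)
    } else if let Some(rest) = text.strip_prefix("0b") {
        (rest, 2)
    } else if let Some(rest) = text.strip_prefix("0o") {
        (rest, 8)
    } else {
        (text, 10)
    };
    // Reject signs and stray characters; empty input fails below.
    if !digits.chars().all(|c| c.is_digit(radix)) {
        return None;
    }
    i64::from_str_radix(digits, radix).ok()
}
```

With this, `0b10101010` and `0xAA` parse to the same value, so the four 2-bit fields of an immediate can be written out directly instead of being explained in a comment.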
This method returns a Hash, the output of which can be used to index
precompiled binaries from one Engine instance that can be deserialized
by another Engine instance.
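One way to picture the intended use is a cache keyed by a hash of everything that affects compatibility. The sketch below is an assumption-laden illustration: `EngineSettings`, `compatibility_key`, and `ArtifactCache` are hypothetical, the fields are invented, and `DefaultHasher` is used only because it is in the standard library (its output is not guaranteed stable across Rust releases, so a persistent cache would want a fixed hash function).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the settings that decide whether a
/// precompiled binary can be loaded by a given engine.
#[derive(Hash, Clone)]
pub struct EngineSettings {
    pub target: String,
    pub opt_level: u8,
    pub version: String,
}

/// Feeds the settings into a hasher and returns the digest as the key.
pub fn compatibility_key(settings: &EngineSettings) -> u64 {
    let mut hasher = DefaultHasher::new();
    settings.hash(&mut hasher);
    hasher.finish()
}

/// Precompiled artifacts indexed by compatibility key.
pub struct ArtifactCache {
    by_key: HashMap<u64, Vec<u8>>,
}

impl ArtifactCache {
    pub fn new() -> Self {
        ArtifactCache { by_key: HashMap::new() }
    }

    pub fn insert(&mut self, settings: &EngineSettings, bytes: Vec<u8>) {
        self.by_key.insert(compatibility_key(settings), bytes);
    }

    pub fn get(&self, settings: &EngineSettings) -> Option<&[u8]> {
        self.by_key.get(&compatibility_key(settings)).map(|b| b.as_slice())
    }
}
```

Two engines configured identically produce the same key, so an artifact stored by one can be found by the other.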