When computing the amount of space we need in a stack frame for
the stack limit check, we were only counting spill slots and explicit
stack-slots. However, we need to account for all uses of the stack which
occur before the next stack limit check. That includes clobbers and any
stack arguments we want to pass to callees.
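To illustrate, here's a minimal sketch of the corrected accounting; the field names are assumptions for illustration, not Cranelift's actual frame-layout code:

```rust
// Sketch: the stack limit check must cover every use of the stack that
// can happen before the next check, not just spills and explicit slots.
struct FrameLayout {
    spill_slots_size: u32,
    explicit_slots_size: u32,
    clobber_size: u32,       // callee-saved registers we push
    outgoing_args_size: u32, // stack arguments passed to callees
}

impl FrameLayout {
    fn stack_check_size(&self) -> u32 {
        // Previously only the first two terms were counted.
        self.spill_slots_size
            + self.explicit_slots_size
            + self.clobber_size
            + self.outgoing_args_size
    }
}
```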
The maximum amount that we could have missed by is essentially bounded
by the number of arguments which could be passed to a function. In
Wasmtime, that is limited by `MAX_WASM_FUNCTION_PARAMS` in
`wasmparser::limits`, which is set to 1,000, and the largest arguments
are 16-byte vectors, so this could undercount by about 16kB.
This is not a security issue according to Wasmtime's security policy
(https://docs.wasmtime.dev/security-what-is-considered-a-security-vulnerability.html)
because it's the embedder's responsibility to ensure that the stack
where Wasmtime is running has enough extra space on top of the
configured `max_wasm_stack` size, and getting within 16kB of the host
stack size is too small to be safe even with this fixed.
However, this was definitely not the intended behavior when stack limit
checks or stack probes are enabled, and anyone with non-default
configurations or non-Wasmtime uses of Cranelift should evaluate whether
this bug impacts their use case.
(For reference: When Wasmtime is used in async mode or on Linux, the
default stack size is 1.5MB larger than the default WebAssembly stack
limit, so such configurations are typically safe regardless. On the
other hand, on macOS the default non-async stack size for threads other
than the main thread is the same size as the default for
`max_wasm_stack`, so that is too small with or without this bug fix.)
When this variant is used within a specific target backend, we know
exactly which address mode to generate, so using the target-independent
`StackAMode` doesn't buy us anything.
This PR ensures that `StackAMode` is only constructed by
target-independent code in `machinst::abi`, so that it's easier to
figure out how each of the variants is used.
Since #6850, we've been able to rely on `iconst` instructions having
their immediate operands' high bits zeroed before lowering.
As a result, a couple of places in `x64/lower.rs` can now be expressed
more simply.
Out of an abundance of caution, I added a debug-assertion when constants
are looked up during lowering, to check that earlier phases really did
ensure the high bits are zero.
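A rough sketch of that assertion (a hypothetical helper for illustration, not the actual lowering code):

```rust
// When an `iconst` immediate is looked up during lowering, bits above
// the controlling type's width must already be zero.
fn check_iconst_imm(imm: u64, ty_bits: u32) -> u64 {
    if ty_bits < 64 {
        let mask = (1u64 << ty_bits) - 1;
        debug_assert_eq!(imm & !mask, 0, "iconst high bits not zeroed");
    }
    imm
}
```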
I also got rid of an `expect` where a simple pattern-match will do.
* winch: Add support for address maps
Closes https://github.com/bytecodealliance/wasmtime/issues/8095
This commit adds support for generating address maps for Winch.
Given that source code locations and machine code offsets are machine independent, one objective of this change is to introduce minimal methods to the MacroAssembler and Assembler implementations, accommodating the bulk of the work in the ISA-independent `CodeGen` module.
* Update method documentation to match implementation
This commit fixes an issue with errors in the `wasmtime-wasi-http` crate
by using the `trappable_error_type` bindgen configuration option in the
same manner as other WASI interfaces in the `wasmtime-wasi` crate.
Unfortunately, due to coherence, the `TrappableError<T>` type itself
could not be used, but it was small enough that duplicating it wasn't
much effort.
Closes #8269
### The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest
of the runtime and the compiler. It does this by introducing two new traits,
very similar to a subset of [those proposed in the Wasm GC RFC],
although not all of the equivalent functionality has been added, because
Wasmtime doesn't support, for example, GC structs yet:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run
collections within them, and execute the various GC barriers the collector
requires.
Rather than monomorphize all of Wasmtime on this trait, we use it
as a dynamic trait object. This does imply some virtual call overhead and
missing some inlining (and resulting post-inlining) optimization
opportunities. However, it is *much* less disruptive to the existing embedder
API, results in a cleaner embedder API anyways, and we don't believe that VM
runtime/embedder code is on the hot path for working with the GC at this time
anyways (that would be the actual Wasm code, which has inlined GC barriers
and direct calls and all of that). In the future, once we have optimized
enough of the GC that such code is ever hot, we have options we can
investigate at that time to avoid these dynamic virtual calls, like only
enabling one single collector at build time and then creating a static type
alias like `type TheOneGcImpl = ...;` based on the compile time
configuration, and using this type alias in the runtime rather than a dynamic
trait object.
The `GcRuntime` trait additionally defines a method to reset a GC heap, for
use by the pooling allocator. This allows reuse of GC heaps across different
stores. This integration is very rudimentary at the moment, and is missing
all kinds of configuration knobs that we should have before deploying Wasm GC
in production. This commit is large enough as it is already! Ideally, in the
future, I'd like to make it so that GC heaps receive their memory region,
rather than allocate/reserve it themselves, and let each slot in the pooling
allocator's memory pool be *either* a linear memory or a GC heap. This would
unask various capacity planning questions such as "what percent of memory
capacity should we dedicate to linear memories vs GC heaps?". It also seems
like basically all the same configuration knobs we have for linear memories
apply equally to GC heaps (see also the "Indexed Heaps" section below).
2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements
GC barriers for various operations on GC-managed references. The Rust code
calls into this trait dynamically via a trait object, but since it is
customizing the CLIF that is generated for Wasm code, the Wasm code itself is
not making dynamic, indirect calls for GC barriers. The `GcCompiler`
implementation can inline the parts of a GC barrier that it believes should be
inlined, and leave out-of-line calls to the rare slow paths. (A sketch of both
traits follows this list.)
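Here is a minimal sketch of the two traits as dynamically-dispatched objects; all names and signatures are illustrative assumptions rather than Wasmtime's exact API:

```rust
// The runtime-side trait: heap creation, collection, and barriers all
// hang off of it, behind dynamic dispatch.
trait GcRuntime {
    fn new_gc_heap(&self) -> Box<dyn GcHeap>;
}

trait GcHeap {
    fn gc(&mut self);
}

// The compiler-side trait: it customizes the CLIF emitted for Wasm, so
// the *compiled guest code* makes no virtual calls for barriers.
trait GcCompiler {
    fn write_barrier(&mut self /* , builder, src, dst, ... */);
}

// One collector as a trait object, rather than monomorphizing all of
// Wasmtime over a collector type parameter:
struct Engine {
    collector: Box<dyn GcRuntime>,
}
```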
All that said, there is still only a single implementation of each of these
traits: the existing deferred reference-counting (DRC) collector. So there is a
bunch of code motion in this commit as the DRC collector was further isolated
from the rest of the runtime and moved to its own submodule. However, this was
not *purely* code motion (see "Indexed Heaps" below), so it is worth not simply
skipping over the DRC collector's code in review.
### Indexed Heaps
This commit does bake in a couple of assumptions that must be shared across all
collector implementations, such as a shared `VMGcHeader` that all objects
allocated within a GC heap must begin with, but the most notable and
far-reaching of these assumptions is that all collectors will use "indexed
heaps".
What we are calling indexed heaps is basically the following three invariants:
1. All GC heaps will be a single contiguous region of memory, and all GC objects
will be allocated within this region of memory. The collector may ask the
system allocator for additional memory, e.g. to maintain its free lists, but
GC objects themselves will never be allocated via `malloc`.
2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into
the GC heap's contiguous region of memory. We never hold raw pointers to GC
objects (although, of course, we have to compute them and use them
temporarily when actually accessing objects). This means that deref'ing GC
pointers is equivalent to deref'ing linear memory pointers: we need to add a
base and check that the GC pointer/index is within the bounds of the
GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly
common technique among high-performance GC
implementations[^compressed-oops][^v8-ptr-compression] so we are in good
company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that
is an element of an `(array (ref any))` is untrusted, and bounds checked on
access. This means that, for example, we do not store the raw pointer to an
`externref`'s host object inside the GC heap. Instead an `externref` now
stores an ID that can be used to index into a side table in the store that
holds the actual `Box<dyn Any>` host object, and accessing that side table is
always checked.
[^compressed-oops]: See ["Compressed OOPs" in
OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)
[^v8-ptr-compression]: See [V8's pointer
compression](https://v8.dev/blog/pointer-compression).
The good news with regards to all the bounds checking that this scheme implies
is that we can use all the same virtual memory tricks that linear memories use
to omit explicit bounds checks. Additionally, (2) means that the sizes of GC
objects are that much smaller (and therefore that much more cache friendly)
because they are only holding onto 32-bit, rather than 64-bit, references to
other GC objects. (We can, in the future, support GC heaps up to 16GiB in size
without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment
and storing aligned indices rather than byte indices, while still leaving the
bottom bit available for tagging as an `i31ref` discriminant. Should we ever
need to support even larger GC heap capacities, we could go to full 64-bit
references, but we would need explicit bounds checks.)
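As a rough illustration of invariant (2), with purely illustrative types rather than Wasmtime's actual code, deref'ing a GC reference looks just like a linear-memory access:

```rust
// A GC reference is a 32-bit index into one contiguous region.
struct VMGcRef(u32);

struct GcHeapRegion {
    base: *mut u8,
    len: u32,
}

impl GcHeapRegion {
    /// Compute a raw pointer for a temporary access, bounds checking the
    /// index first (or relying on guard pages, as linear memories do).
    fn deref(&self, gc_ref: &VMGcRef, access_size: u32) -> *mut u8 {
        let end = gc_ref.0.checked_add(access_size).expect("offset overflow");
        assert!(end <= self.len, "GC index out of bounds");
        unsafe { self.base.add(gc_ref.0 as usize) }
    }
}
```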
The biggest benefit of indexed heaps is that, because we are (explicitly or
implicitly) bounds checking GC heap accesses, and because we are not otherwise
trusting any data from inside the GC heap, we greatly reduce how badly things
can go wrong in the face of collector bugs and GC heap corruption. We are
essentially sandboxing the GC heap region, the same way that linear memory is a
sandbox. GC bugs could lead to the guest program accessing the wrong GC object,
or getting garbage data from within the GC heap. But only garbage data from
within the GC heap, never outside it. The worst that could happen would be if we
decided not to zero out GC heaps between reuse across stores (which is a valid
trade off to make, since zeroing a GC heap is a defense-in-depth technique
similar to zeroing a Wasm stack and not semantically visible in the absence of
GC bugs) and then a GC bug would allow the current Wasm guest to read old GC
data from the old Wasm guest that previously used this GC heap. But again, it
could never access host data.
Taken altogether, this allows for collector implementations that are nearly free
from `unsafe` code, and unsafety can otherwise be targeted and limited in scope,
such as interactions with JIT code. Most importantly, we do not have to maintain
critical invariants across the whole system -- invariants which can't be nicely
encapsulated or abstracted -- to preserve memory safety. Such holistic
invariants that refuse encapsulation are otherwise generally a huge safety
problem with GC implementations.
### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here
was to be sure that I was actually calling GC barriers at all the correct
places. I couldn't be sure before. Now, you can still explicitly copy a raw GC
reference without running GC barriers if you need to and understand why that's
okay (aka you are implementing the collector), but that is something you have to
opt into explicitly by calling `unchecked_copy`. The default now is that you
can't just copy the reference; instead you call an explicit `clone` method (not
*the* `Clone` trait, because we need to pass in the GC heap context to run the
GC barriers), which makes it hard to skip a barrier accidentally. This resulted in
a pretty big amount of churn, but I am wayyyyyy more confident that the correct
GC barriers are called at the correct times now than I was before.
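A small sketch of that discipline (assumed names, not the actual implementation):

```rust
struct GcHeapCtx;

impl GcHeapCtx {
    fn clone_barrier(&mut self, _r: &VMGcRef) {
        // e.g. increment a reference count in the DRC collector.
    }
}

struct VMGcRef(u32); // deliberately neither `Clone` nor `Copy`

impl VMGcRef {
    /// Not the `Clone` trait: we need the heap context to run barriers.
    fn clone(&self, heap: &mut GcHeapCtx) -> VMGcRef {
        heap.clone_barrier(self);
        VMGcRef(self.0)
    }

    /// Only for code implementing the collector itself.
    fn unchecked_copy(&self) -> VMGcRef {
        VMGcRef(self.0)
    }
}
```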
### `i31ref`
I started this commit by trying to add `i31ref` support, and it grew into the
whole traits interface because I found that I needed to abstract GC barriers
into helpers anyways to avoid running them for `i31ref`s, so I figured I might
as well do the whole thing. In comparison, `i31ref` support is much easier and
smaller than that other part! But it was also difficult to pull apart from this
commit, sorry about that!
---------------------
Overall, I know this is a very large commit. I am super happy to have some
synchronous meetings to walk through this all, give an overview of the
architecture, answer questions directly, etc... to make review easier!
prtest:full
The `gen_spill` and `gen_reload` methods on `Callee` are used to emit
appropriate moves between registers and the stack, as directed by the
register allocator.
These moves always apply to a single register at a time, even if that
register was originally part of a group of registers. For example, when
an I128 is represented using two 64-bit registers, either of those
registers may be spilled independently.
As a result, the `load_spillslot`/`store_spillslot` helpers were more
general than necessary, which in turn required extra complexity in the
`gen_load_stack_multi`/`gen_store_stack_multi` helpers. None of these
helpers were used in any other context, so all that complexity was
unnecessary.
Inlining all four helpers and then simplifying eliminates a lot of code
without changing the output of the compiler.
These helpers were also the only uses of `StackAMode::offset`, so I've
deleted that. While I was there, I also deleted `StackAMode::get_type`,
which was introduced in #8151 and became unused again in #8246.
* cranelift: Minimize ways to manipulate instruction results
In particular, remove support for detaching/attaching/appending
instruction results.
The AliasAnalysis pass used detach_results, but leaked the detached
ValueList; using clear_results instead is better.
The verifier's `test_printing_contextual_errors` needed to get the
verifier to produce an error containing a pretty-printed instruction,
and did so by appending too many results. Instead, failing to append any
results gets a similar error out of the verifier, without requiring that
we expose the easy-to-misuse append_result method. However, `iconst` is
not a suitable instruction for this version of the test because its
result type is its controlling type, so failing to create any results
caused assertion failures rather than the desired verifier error. I
switched to `f64const` which has a non-polymorphic type.
The DFG's `aliases` test cleared both results of an instruction and then
reattached one of them. Since we have access to DFG internals in these
tests, it's easier to directly manipulate the relevant ValueList than to
use these unsafe methods.
The only other use of attach/append was in `make_inst_results_reusing`
which decided which to use based on whether a particular result was
supposed to reuse an existing value. Inlining both methods there
revealed that they were nearly identical and could have most of their
code factored out.
While I was looking at uses of `DataFlowGraph::results`, I also
simplified replace_with_aliases a little bit.
* Review comments
* cranelift: Specialize StackAMode::FPOffset
The StackAMode::FPOffset address mode was always used together with
fp_to_arg_offset, to compute addresses within the current stack frame's
argument area.
Instead, introduce a new StackAMode::ArgOffset variant specifically for
stack addresses within the current frame's argument area. The details of
how to find the argument area are folded into the conversion from the
target-independent StackAMode into target-dependent address-mode types.
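Here's an illustrative sketch of that division of labor (not the actual backend code): target-independent code only names an offset within the argument area, and the backend's conversion decides where that area lives:

```rust
enum StackAMode {
    /// Offset into the current frame's argument area.
    ArgOffset(i64),
}

/// Hypothetical backend whose argument area sits at a fixed offset from
/// the frame pointer (playing the role of `fp_to_arg_offset`).
const FP_TO_ARG_AREA: i64 = 16;

enum BackendAmode {
    FpOffset(i64),
}

fn convert(amode: StackAMode) -> BackendAmode {
    match amode {
        // Fold the argument-area location into the conversion itself.
        StackAMode::ArgOffset(off) => BackendAmode::FpOffset(off + FP_TO_ARG_AREA),
    }
}
```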
Currently, fp_to_arg_offset returns a target-specific constant, so I've
preserved that constant in each backend's address-mode conversion.
However, in general the location of the argument area may depend on
calling convention, flags, or other concerns. Also, it may not always be
desirable to use a frame pointer register as the base to find the
argument area. I expect some backends will eventually need to introduce
new synthetic addressing modes to resolve argument-area offsets after
register allocation, when the full frame layout is known.
I also cleaned up a couple of minor things while I was in the area:
- Determining argument extension type was written in a confusing way and
also had a typo in the comment describing it.
- riscv64's AMode::offset was only used in one place and is clearer
when inlined.
* Review comments
@bjorn3 correctly pointed out that I had changed the overflow behavior
of this address computation.
The existing code always added the result of `fp_to_arg_offset` using
`i64` addition. It used Rust's default overflow behavior for addition,
which panics in debug builds and wraps in release builds.
In this commit I'm preserving that behavior:
- s390x doesn't add anything, so can't overflow.
- aarch64 and riscv64 use `i64` offsets in `FPOffset` address modes, so
the addition is still using `i64` addition.
- x64 does a checked narrowing to `i32`, so it's important to do the
addition before that, on the wider `i64` offset (sketched below).
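In miniature, the preserved x64 behavior looks like this (an illustration, not the backend's actual code):

```rust
// Rust's default `+` on i64 panics on overflow in debug builds and
// wraps in release builds; the checked narrowing happens afterwards.
fn x64_arg_offset(fp_to_arg_offset: i64, offset: i64) -> i32 {
    let wide = fp_to_arg_offset + offset; // i64 addition, as before
    i32::try_from(wide).expect("argument-area offset must fit in i32")
}
```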
* Move bind into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move start_connect into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move finish_connect into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move *_listen into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move accept into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move address methods into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move various option methods into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move shutdown methods into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move finish bind methods into Tcp type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Change connect's return type
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move shutdown over to io::Result
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Rearrange some code
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
* Move bind to io Error
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
---------
Signed-off-by: Ryan Levick <ryan.levick@fermyon.com>
Along the same lines as #8280, I occasionally get emails about failures
here, but that's not too useful, so ignore the errors here and let them
get triaged elsewhere if necessary.
This was originally added when Cargo would git clone the index and the
significant size of the index meant we got nontrivial speedups during
the cloning process. Nowadays though Cargo does a much more CI-efficient
method by default where it uses an HTTP index instead. This removes the
original need for caching since the index operations should now be much
faster, probably more so than saving/restoring the cache.
This additionally removes the caching of registry downloads and git
clones, since in theory the cache isn't all that much faster than what
Cargo is already doing.
* More flags like `--dir` and `--env` are moved into `RunCommon` to be
shared between `wasmtime serve` and `wasmtime run`, meaning that the
`serve` command can now configure environment variables.
* A small test has been added as well as infrastructure for running
tests with `wasmtime serve` itself. Previously there were no tests
that executed `wasmtime serve`.
* The `test_programs` crate had a small refactoring to avoid
double-generation of http bindings.
This commit fixes an accidental issue introduced in #8018 where using an
element segment which had been dropped with an `externref` table would
cause a panic. The panic happened due to an assertion that tables are
used with the right type of item, and that assertion was being violated.
The underlying issue was that dropped element segments are modeled as an
empty element segment but the empty element segment was using the
"functions" encoding as opposed to the "expressions" encoding. This
meant that code later assumed that due to the use of functions the table
must be a table-of-functions, but this was not correct for
externref-based tables.
The fix in this commit is to instead model the encoding as an
"expressions" list which means that the table type is dispatched on to
call the appropriate initializer.
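Sketched with an assumed enum shape (not `wasmtime-environ`'s exact types), the fix looks like:

```rust
struct ConstExpr; // stand-in for an element expression

enum TableSegmentElements {
    Functions(Vec<u32>), // function indices
    Expressions(Vec<ConstExpr>),
}

fn dropped_segment() -> TableSegmentElements {
    // Previously `Functions(Vec::new())`, which later code took to mean
    // "this must be a table of functions" and panicked for externref
    // tables. An empty expressions list dispatches on the table type.
    TableSegmentElements::Expressions(Vec::new())
}
```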
There is no memory safety issue with this mistake, as the assertion was
specifically targeted at preserving memory safety. This does, however,
enable any WebAssembly module to panic the host.
Closes #8281
Currently this workflow fails about 50% of the time due to what appear
to be rate-limiting issues. I'm personally receiving emails every time
this fails and would rather not continue to receive emails. I've updated
this workflow to allow failure, which should hopefully stop the
notifications.
* cranelift: Fix indirect tail calls on aarch64/riscv64
x64 now uses a dedicated codegen strategy for tail-calls which writes
the callee's stack arguments directly into their final location, but
aarch64 and riscv64 still use our previous strategy of setting up for a
normal function call and then moving the new stack frame up to overwrite
the old one. In order to emit correct code for a normal function call,
we need to fake a normal Call/CallIndirect opcode rather than
ReturnCall/ReturnCallIndirect.
The new tests are a combination of x64's compile-tests for
return-call-indirect.clif, plus a large-stack-frame test from
aarch64/riscv64's return-call.clif modified to do an indirect call.
* Review comments: runtests and x64 compile tests
This commit is a refactoring and modernization of wiggle's
`BorrowChecker` implementation. This type is quite old and predates
everything related to the component model, for example. It additionally
predates the implementation of WASI threads for Wasmtime. In general,
this type is old and has not been updated in a long time.
Originally a `BorrowChecker` was intended to be a somewhat cheap method
of enabling the host to have active safe shared and mutable borrows to
guest memory. Over time though this hasn't really panned out. The WASI
threads proposal, for example, doesn't allow safe shared or mutable
borrows at all. Instead everything must be modeled as a copy in or copy
out of data. This means that all of `wasmtime-wasi` and `wasi-common`
have largely already been rewritten in such a way to minimize borrows
into linear memory.
Nowadays the only types that represent safe borrows are the `GuestSlice`
type and its equivalents (e.g. `GuestSliceMut`, `GuestStr`, etc). These
are minimally used throughout `wasi-common` and `wasmtime-wasi` and when
they are used they're typically isolated to a small region of memory.
This is all coupled with the fact that `BorrowChecker` never ended up
being optimized. Effectively it's a `Mutex<HashMap<..>>`, and a pretty
expensive one at that. The `Mutex` is required because `&BorrowChecker`
must both allow mutations and be `Sync`. The `HashMap` is used to
implement precise byte-level region checking to fulfill the original
design requirements of what `wiggle` was envisioned to be.
Given all that, this commit guts `BorrowChecker`'s implementation and
functionality. The type is now effectively a glorified `RefCell` for the
entire span of linear memory. Regions are no longer considered when
borrows are made and instead a shared borrow is considered as borrowing
the entirety of shared memory. This means that it's no longer possible
to have a safe shared borrow and a mutable borrow at the same time, even
if they're disjoint.
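For illustration, a coarse borrow checker along these lines could be as simple as the following sketch (not wiggle's actual implementation):

```rust
use std::sync::atomic::{AtomicI32, Ordering};

/// One counter guards all of linear memory, like a `Sync` `RefCell`,
/// instead of a `Mutex<HashMap>` of byte ranges.
/// 0 = unborrowed, n > 0 = n shared borrows, -1 = mutably borrowed.
struct BorrowChecker(AtomicI32);

impl BorrowChecker {
    fn shared_borrow(&self) -> bool {
        self.0
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                if n >= 0 { Some(n + 1) } else { None }
            })
            .is_ok()
    }

    fn mut_borrow(&self) -> bool {
        self.0
            .compare_exchange(0, -1, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}
```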
The goal of this commit is to address performance issues seen in #7973
which I've seen locally as well. The heavyweight implementation of
`BorrowChecker` isn't really buying us much nowadays, especially with
much development having since moved on to the component model. The hope
is that this much coarser way of implementing borrow checking, which
should be much more easily optimizable, is sufficient for the needs of
WASI and not a whole lot else.
This commit refactors the `wasmtime-runtime` crate to avoid the
`std::panic` module entirely if it's compiled with `panic=abort`. From
an optimization perspective this is not really required since it'll
optimize the same either way with `-Cpanic=abort`, but avoiding
`std::panic` can help make the code a bit more portable. This
refactoring bundles in the `catch_unwind` with the longjmp of the panic
to keep the `#[cfg]` in one location. Callers are then updated as
appropriate.
* bump tokio-rustls
Note that rustls is not on the latest minor version since tokio-rustls
has not yet updated to it.
* Add vet exemptions
* Update ureq to trim the crate graph
* Add vet for ureq
* Fix compile on riscv
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
According to GCC, i128 arguments must always be passed either fully in
registers or fully on the stack. Until recently, LLVM allowed splitting
an i128 into one half passed in a register and the other half passed on
the stack; this was recently fixed. For cg_clif to remain ABI-compatible
with cg_llvm, it is necessary to apply the same fix to Cranelift.
See also https://blog.rust-lang.org/2024/03/30/i128-layout-update.html
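The rule in miniature (illustrative only, not Cranelift's actual ABI code):

```rust
enum Assignment {
    Registers, // both 64-bit halves in registers
    Stack,     // both halves on the stack; never half-and-half
}

fn assign_i128(remaining_param_regs: usize) -> Assignment {
    // An i128 needs two 64-bit slots; if both don't fit in the remaining
    // parameter registers, the whole value goes to the stack.
    if remaining_param_regs >= 2 {
        Assignment::Registers
    } else {
        Assignment::Stack
    }
}
```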
I'm not sure why the test fails this way locally, but I was spuriously
receiving a "connection refused" error instead of a "connection
timeout" error. I've updated the test case to accept either error here
to fix the spurious errors I'm seeing locally.
* Lift all serde deps to the workspace level
Deduplicate some versions mentioned throughout crates in the workspace.
* Lift `bincode` deps to the workspace configuration level
Deduplicate some mentioned versions throughout.
* Lift libc deps up to the workspace root
As with prior commits, deduplicate some versions mentioned.
* Update Wasmtime's policy on `cargo vet`
This was discussed at today's Wasmtime meeting out of some concerns
around our current policies. Namely I felt the current state of affairs
is not striking the right balance between cost and benefit with our
usage of `cargo vet`. After discussion we've reached consensus around
two changes to our `cargo vet` policy documented here in this PR:
* An exemption can be added for "popular crates" at any time with no
review required. This should handle most big crates that are needed
for various dependencies. The thinking behind this is that a
supply-chain attack against these crates is highly likely to be
detected in a short time due to their popularity. Coupled with the
fact that changes to Wasmtime take a minimum of two weeks to get
released, this makes it an unlikely exploitation vector.
* Maintainers are recommended to push directly to contributors' PRs for
`cargo vet` entries instead of making separate PRs. This avoids the
need for contributor rebasing and additionally solves the problem
where the `vet` entries land in a separate PR but then the
contributor's PR takes much longer to land. In the interim some `vet`
entries have been cleaned up by accident which requires re-landing the
PR to add the entries.
* Review comments
* Bad relocation type generated
When SSE is disabled on x86_64 and a float libcall is generated, an
incorrect relocation type, R_X86_64_8, may be emitted; the correct type
is R_X86_64_64.
* Gate support for the wasm `threads` proposal behind a Cargo feature
This commit moves support for the `threads` proposal behind a new
on-by-default Cargo feature: `threads`. This is intended to support
building Wasmtime with fewer runtime dependencies such as those required
for the atomic operations on memories.
This additionally adds the `gc` feature in a few missing places too.
* Fix compile of C API without threads
* egraph: Resolve all aliases at once
This way we can use the linear-time alias rewriting pass, and then avoid
having to think about value aliases ever again.
* Resolve aliases in facts and values_labels
When resolving aliases in values_labels, this discards debug info on
values which are replaced by aliases. However, that is equivalent to the
existing behavior in `Lower::get_value_labels`, which resolves value
aliases first and only then looks for attached debug info.
* Fix rustdoc warnings on Nightly
I noticed during a failed doc build of another PR that a number of
warnings were being emitted, so resolve all of those here.
* Fix more warnings
* Fix rebase conflicts
* Add a `compile` feature to `wasmtime-environ`
This commit adds a compile-time feature to remove some dependencies of
the `wasmtime-environ` crate. This compiles out support for compiling
modules/components and makes the crate slimmer in terms of amount of
code compiled along with its dependencies. Much of this should already
have been statically removed by native linkers so this likely won't have
any compile-size impact, but it's a nice-to-have in terms of
organization.
This has a fair bit of shuffling around of code, but apart from
renamings and movement there are no major changes here.
* Fix compile issue
* Gate `ModuleTranslation` and its methods on `compile`
* Fix doc link
* Fix doc link
Sometimes, when in the course of silly optimizations to make the most of
one's registers, one might want to pack two `i64`s into one `v128`, and
one might want to do it without any loads or stores.
In clang targeting Wasm at least, building an `i64x2` (with
`wasm_i64x2_make(a, b)` from `<wasm_simd128.h>`) will generate an
`i64x2.splat` to create a new v128 with lane 0's value in both lanes,
then an `i64x2.replace_lane` to put lane 1's value in place. Or, in the
case that one of the lanes is zero, it will generate a `v128.const 0`
then insert the other lane.
Cranelift's lowerings for both of these patterns on x64 are slightly
less optimal than they could be.
- For the former (replace-lane of splat), the 64-bit value is moved over
to an XMM register, then the rest of the `splat` semantics are
implemented by a `pshufd` (shuffle), even though we're just about to
overwrite the only other lane. We could omit that shuffle instead, and
everything would work fine.
This optimization is specific to `i64x2` (that is, only two lanes): we
need to know that the only other lane that the `splat` is splatting
into is overwritten. We could in theory match a chain of
replace-lane operators for higher-lane-count types, but let's save
that for the case that we actually need it later.
- For the latter (replace-lane of constant zero), the load of a constant
zero from the constant pool is the part that bothers me most. While I
like zeroed memory as much as the next person, there is a vector XOR
instruction *right there* under our noses, and we'd be silly not to
use it. This applies to any `vconst 0`, not just ones that occur as a
source to replace-lane.
* Plumb coredump feature to `wasmtime-runtime`
The `wasmtime` crate already has a `coredump` feature, but whether or
not it's enabled, the `wasmtime-runtime` crate still captures a core
dump.
Use this flag in the `wasmtime` crate to plumb support to
`wasmtime-runtime` to skip capture if it's not enabled.
* Fix a typo
This commit fixes a mistake in #8181 which meant that the caching for
components was no longer working. The mistake is fixed in this commit as
well as a new test being added too.
* Add documentation and examples for `wasmtime-wasi`
This commit adds lots of missing documentation and examples to top-level
types in `wasmtime-wasi`, mostly related to WASIp2. I've additionally
made a number of small refactorings here to try to make the APIs a bit
more straightforward and symmetric and simplify where I can.
* Remove `bindings::wasi` (reexports are still present under `bindings`)
* Rename `bindings::sync_io` to `bindings::sync`
* Generate fewer bindings in `bindings::sync` by pulling shared ones in
from the `bindings` module.
* Change `WasiCtxBuilder::preopened_dir` to take a path instead of a
`Dir` argument to avoid the need for another import.
* Synchronize `wasmtime_wasi_http::{add_to_linker, sync::add_to_linker}`
in terms of interfaces added.
* Remove `wasmtime_wasi::command` and move the generated types to the
`bindings` module.
* Move top-level add-to-linker functions to
`wasmtime_wasi::add_to_linker_sync` and
`wasmtime_wasi::add_to_linker_async`.
Closes #8187
Closes #8188
* Add documentation for `wasmtime_wasi::preview1` and refactor
This commit adds documentation for the `wasmtime_wasi::preview1` module
and additionally refactors it as well. Previously this was based on a
similar design as WASIp2 with a "view trait" and various bits and
pieces, but the design constraints of WASIp1 lends itself to a simpler
solution of "just" passing around `WasiP1Ctx` instead. This goes back to
what `wasi-common` did of sorts where the `add_to_linker_*` functions
only need a projection from `&mut T` to `&mut WasiP1Ctx`, a concrete
type, which simplifies the module and usage.
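A sketch of that projection-based design, with stubbed types rather than the exact `wasmtime-wasi` signatures:

```rust
struct WasiP1Ctx;
struct Linker<T>(std::marker::PhantomData<T>);

fn add_to_linker_sync<T: 'static>(
    _linker: &mut Linker<T>,
    _project: fn(&mut T) -> &mut WasiP1Ctx,
) {
    // Each wasi_snapshot_preview1 host function would call `_project`
    // to reach the concrete context; no "view trait" is required.
}

// Embedder usage: embed the ctx and hand out a projection.
struct MyState {
    wasi: WasiP1Ctx,
}

fn wire_up(linker: &mut Linker<MyState>) {
    add_to_linker_sync(linker, |s| &mut s.wasi);
}
```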
* Small refactorings to `preopened_dir`
* Add `WasiCtx::builder`.
* Fix typo
* Review comments
* Add GrowFrame and ShrinkFrame instructions for moving the frame
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Experimentally emit grow/shrink frame instructions for x64 tail calls
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Reuse the epilogue generation functions for tail call emission
Instead of building and copying the new frame over the old one, make use
of the frame shrink/grow pseudo-instructions to move the frame, and then
reuse the existing epilogue generation functions to setup the tail call.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Enable callee saves with the tail calling convention on x64
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Remove the requirement that indirect calls go through r15 with the tail cc
* Stop using r14 for a temporary during the stack check with the tail cc
* Apply suggestions from code review
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* Remove constants in favor of reusing values computed for FrameLayout
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Suggestions from review
* Rename the grow/shrink frame instructions, and adjust their comments
* Comments on ArgLoc
* Add more tests for return_call, and fix grow/shrink arg area printing
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Co-authored-by: Jamey Sharp <jamey@minilop.net>
With all Winch tests moved to `tests/disas` in #8243 plus the support of
`wasmtime compile -C compiler=winch`, this tool should in theory be
supplanted by other alternatives nowadays. This commit removes the
executable and the `winch-filetests` support.
* Switch Winch tests to ATT syntax
* Update all test expectations
* Move all winch tests to `disas` folder
* Add `test = "winch"` to `disas`
* Add `test = "winch"` to all winch test files
* Stub out bits to get AArch64 Winch tests working
* Update expectations for all aarch64 winch tests
* Update flags in Winch tests
Use CLI syntax, as that's what `flags` was repurposed as in the new test
suite.
* Update all test expectations for x64 winch
* Omit more offsets by default
* Delete now-dead code
* Update an error message
* Update non-winch test expectations
* Disassemble `*.cwasm` for `compile` disas tests
This commit changes how the `compile` mode of the `disas` test suite
works. Previously this would use `--emit-clif` and run the Cranelift
pipeline for each individual function and use the custom VCode-based
disassembly for instruction output. This commit instead uses the raw
binary coming out of Wasmtime. The ELF file itself is parsed and is
disassembled in a manner similar to Winch tests.
The goal of this commit is somewhat twofold:
* Lay the groundwork to migrate all Winch-based filetests to
`tests/disas`.
* Test the raw output from Cranelift/Wasmtime which includes
optimizations like branch chomping in the `MachBuffer`.
This commit doesn't itself move the Winch tests yet, that's left for a
future commit.
* Update all test expectations for new output
* Fix PR-based CI when too many files are changed