Rework the compilation of amodes in the aarch64 backend to stop reusing registers and instead generate fresh virtual registers for intermediates. This resolves some SSA checker violations with the aarch64 backend, and as a nice side-effect removes some unnecessary movs in the generated code.
As loading constants on aarch64 can take up to 4 instructions, we need to plumb through some additional registers. Rather than pass a fixed list of registers in, pass an allocation function.
* Simplify the `ModuleRuntimeInfo` trait slightly
Fold two functions into one as they're only called from one location
anyway.
* Remove ModuleRuntimeInfo::signature
This is redundant as the array mapping is already stored within the
`VMContext` so that can be consulted rather than having a separate trait
function for it. This required altering the `Global` creation slightly
to work correctly in this situation.
* Remove a now-dead constant
* Shared `VMOffsets` across instances
This commit removes the computation of `VMOffsets` to being per-module
instead of per-instance. The `VMOffsets` structure is also quite large
so this shaves off 112 bytes per instance which isn't a huge impact but
should help lower the cost of instantiating small modules.
* Remove `InstanceAllocator::adjust_tunables`
This is no longer needed or necessary with the pooling allocator.
* Fix compile warning
* Fix a vtune warning
* Fix pooling tests
* Fix another test warning
* Add a `WasmBacktrace::new()` constructor
This commit adds a method of manually capturing a backtrace of
WebAssembly frames within a `Store`. The new constructor can be called
with any `AsContext` values, primarily `&Store` and `&Caller`, during
host functions to inspect the calling state.
For now this does not respect the `Config::wasm_backtrace` option and
instead unconditionally captures the backtrace. It's hoped that this can
continue to adapt to needs of embedders by making it more configurable
int he future if necessary.
Closes#5339
* Split `new` into `capture` and `force_capture`
* Remove some custom error types in Wasmtime
These types are mostly cumbersome to work with nowadays that `anyhow` is
used everywhere else. This commit removes `InstantiationError` and
`SetupError` in favor of using `anyhow::Error` throughout. This can
eventually culminate in creation of specific errors for embedders to
downcast to but for now this should be general enough.
* Fix Windows build
* cranelift-wasm: Remove `ModuleTranslationState`
We were putting data into it, but never reading data out of it. Can be removed.
* cranelift-wasm: move `funct_state.rs` sub module to `state.rs`
Since it is the only submodule of `state` it can just be the whole `state`
module.
* Treat `-` as an alias to `/dev/stdin`
This applies to unix targets only,
as Windows does not have an appropriate alternative.
* Add tests for piped modules from stdin
This applies to unix targets only,
as Windows does not have an appropriate alternative.
* Move precompiled module detection into wasmtime
Previously, wasmtime-cli checked the module to be loaded is
precompiled or not, by pre-opening the given file path to
check if the "\x7FELF" header exists.
This commit moves this branch into the `Module::from_trusted_file`,
which is only invoked with `--allow-precompiled` flag on CLI.
The initial motivation of the commit is, feeding a module to wasmtime
from piped inputs, is blocked by the pre-opening of the module.
The `Module::from_trusted_file`, assumes the --allow-precompiled flag
so there is no piped inputs, happily mmap-ing the module to test
if the header exists.
If --allow-precompiled is not supplied, the existing `Module::from_file`
will be used, without the additional header check as the precompiled
modules are intentionally not allowed on piped inputs for security measures.
One caveat of this approach is that the user may be confused if
he or she tries to execute a precompiled module without
--allow-precompiled, as wasmtime shows an 'input bytes aren't valid
utf-8' error, not directly getting what's going wrong.
So this commit includes a hack-ish workaround for this.
Thanks to @jameysharp for suggesting this idea with a detailed guidance.
Avoid reusing a destination virtual register for 64-bit constants in the s390x backend. This change addresses a case identified by the regalloc2 ssa validator, as the destination register was written to twice when constants were generated via the MachInst::gen_constant function.
Some of our ISLE rules can never fire because there's a higher-priority
rule that will always fire instead.
Sometimes the worst that can happen is we generate sub-optimal output.
That's not so bad but we'd still like to know about it so we can fix it.
In other cases there might be instructions which can't be lowered in
isolation. If a general rule for lowering one of the instructions is
higher-priority than the rule for lowering the combined sequence, then
lowering the combined sequence will always fail.
Either way, this is always a bug, so make it a fatal error if we can
detect it.
Introduce a temporary for an intermediate value in the lowering of div in the x64 backend. Additionally, add a src argument to the shift_r smart constructor, which is why the diff got larger than just the div lowering.
Avoid reusing output registers in make_i64x2_from_lanes by threading the output name instead, and using smart constructors for x64_pinsrd instead of constructing the instructions directly.
* Cranelift: implement `heap_{load,store}` instruction legalization
This does not remove `heap_addr` yet, but it does factor out the common
bounds-check-and-compute-the-native-address functionality that is shared between
all of `heap_{addr,load,store}`.
Finally, this adds a missing optimization for when we can dedupe explicit bounds
checks for static memories and Spectre mitigations.
* Cranelift: Enable `heap_load_store_*` run tests on all targets
* Turn off probestack by default in Cranelift
The probestack feature is not implemented for the aarch64 and s390x
backends and currently the on-by-default status requires the aarch64 and
s390x implementations to be a stub. Turning off probestack by default
allows the s390x and aarch64 backends to panic with an error message to
avoid providing a false sense of security. When the probestack option is
implemented for all backends, however, it may be reasonable to
re-enable.
* aarch64: Improve codegen for AMode fallback
Currently the final fallback for finalizing an `AMode` will generate
both a constant-loading instruction as well as an `add` instruction to
the base register into the same temporary. This commit improves the
codegen by removing the `add` instruction and folding the final add into
the finalized `AMode`. This changes the `extendop` used but both
registers are 64-bit so shouldn't be affected by the extending
operation.
* aarch64: Implement inline stack probes
This commit implements inline stack probes for the aarch64 backend in
Cranelift. The support here is modeled after the x64 support where
unrolled probes are used up to a particular threshold after which a loop
is generated. The instructions here are similar in spirit to x64 except
that unlike x64 the stack pointer isn't modified during the unrolled
loop to avoid needing to re-adjust it back up at the end of the loop.
* Enable inline probestack for AArch64 and Riscv64
This commit enables inline probestacks for the AArch64 and Riscv64
architectures in the same manner that x86_64 has it enabled now. Some
more testing was additionally added since on Unix platforms we should be
guaranteed that Rust's stack overflow message is now printed too.
* Enable probestack for aarch64 in cranelift-fuzzgen
* Address review comments
* Remove implicit stack overflow traps from x64 backend
This commit removes implicit `StackOverflow` traps inserted by the x64
backend for stack-based operations. This was historically required when
stack overflow was detected with page faults but Wasmtime no longer
requires that since it's not suitable for wasm modules which call host
functions. Additionally no other backend implements this form of
implicit trap-code additions so this is intended to synchronize the
behavior of all the backends.
This fixes a test added prior for aarch64 to properly abort the process
instead of accidentally being caught by Wasmtime.
* Fix a style issue
* wasmtime: enable stack probing for x86_64 targets.
This commit unconditionally enables stack probing for x86_64 targets.
On Windows, stack probing is always required because of the way Windows commits
stack pages (via guard page access).
Fixes#5340.
* Remove SIMD types from test case.
Was missing some '$' characters and so was comparing string literals against
string literals instead of variable values against string literals. Regenerated
tests to fix them and add missing tests.
* Cranelift: consider heap's guard pages when legalizing `heap_addr`
Fixes#5328
* Update comment to align more directly with implementation
* Add legalization tests for `heap_addr` and offset guard pages
Ulrich Weigand identified two bugs in this code due to it falsely
claiming there were unreachable rules in the s390x backend. The fixes
are:
- Add constraints for pure constructors.
I didn't notice that a constructor which is declared pure (which
currently implies that it is fallible), when used on the left-hand side
of a rule, can cause the rule to fail to match. Therefore, any
constructors on the left-hand side must be noted as additional
constraints on the rule, so that overlap checking can see them.
- Ignore subset-overlaps for rules with equality constraints
This eliminates false positives when checking for unreachable rules. It
introduces false negatives instead but we prefer to fail to detect an
error instead of claiming that valid input is wrong. We can implement a
more accurate check later.
- `Store::add_fuel` documentation said it'd panic when the engine is
not configured to have fuel, but it in fact returns an error[1].
- There was a typo in `Store::add_fuel`'s documentation (either either).
I "fixed" the typo by rewording the section using the same wording
as `Store::fuel_consumed` (_if fuel consumption is not enabled
via..._)
[1]ff5abfd993/crates/wasmtime/src/store.rs (L1397-L1400)
- Remove remaining references to Miette
- Borrow implementation of `line_starts` from codespan-reporting
- Clean up a use of `Result` that no longer conflicts with a local
definition
- When printing plain errors, add a blank line between errors for
readability
Fix shadowing identified in #5322 for imul and swiden_high/swiden_low/uwiden_high/uwiden_low combinations in the x64 backend, and remove some redundant rules from the aarch64 dynamic neon ruleset. Additionally, add tests to the x64 backend showing that the imul specializations are firing.
The main change here is that io-lifetimes 1.0 switches to use the I/O safety
feature in the standard library rather than providing its own copy.
This also updates to windows-sys 0.42.0 and rustix 0.36.
Add assertions to the OperandCollector that show we're not using pinned vregs, and use reg_fixed_nonallocatable constraints when a real register is used with other constraint generation functions like reg_use etc.
* Clear affine slots when dropping a `Module`
This commit implements a resource usage optimization for Wasmtime with
the pooling instance allocator by ensuring that when a `Module` is
dropped its backing virtual memory mappings are all removed. Currently
when a `Module` is dropped it releases a strong reference to its
internal memory image but the memory image may stick around in
individual pooling instance allocator slots. When using the `Random`
allocation strategy, for example, this means that the memory images
could stick around for a long time.
While not a pressing issue this has resource usage implications for
Wasmtime. Namely removing a `Module` does not guarantee the memfd, if in
use for a memory image, is closed and deallocated within the kernel.
Unfortunately simply closing the memfd is not sufficient as well as the
mappings into the address space additionally all need to be removed for
the kernel to release the resources for the memfd. This means that to
release all kernel-level resources for a `Module` all slots which have
the memory image mapped in must have the slot reset.
This problem isn't particularly present when using the `NextAvailable`
allocation strategy since the number of lingering memfds is proportional
to the maximum concurrent size of wasm instances. With the `Random` and
`ReuseAffinity` strategies, however, it's much more prominent because
the number of lingering memfds can reach the total number of slots
available. This can appear as a leak of kernel-level memory which can
cause other system instability.
To fix this issue this commit adds necessary instrumentation to `Drop
for Module` to purge all references to the module in the pooling
instance allocator. All index allocation strategies now maintain
affinity tracking to ensure that regardless of the strategy in use a
module that is dropped will remove all its memory mappings. A new
allocation method was added to the index allocator for allocating an
index without setting affinity and only allocating affine slots. This is
used to iterate over all the affine slots without holding the global
index lock for an unnecessarily long time while mappings are removed.
* Review comments
This gets this package to build for `x86_64`, but does not include any
cache clearing for `aarch64-freebsd`. Which will probably build, but not work.
As far as I can tell membarriers are still in the process of being added.
See: https://reviews.freebsd.org/D32360
Since FreeBSD is mirroring the Linux membarrier syscall we may be able
to reuse the same implementation for both Linux and FreeBSD for AArch64.
There were several issues with ISLE's existing error reporting
implementation.
- When using Miette for more readable error reports, it would panic if
errors were reported from multiple files in the same run.
- Miette is pretty heavy-weight for what we're doing, with a lot of
dependencies.
- The `Error::Errors` enum variant led to normalization steps in many
places, to avoid using that variant to represent a single error.
This commit:
- replaces Miette with codespan-reporting
- gets rid of a bunch of cargo-vet exemptions
- replaces the `Error::Errors` variant with a new `Errors` type
- removes source info from `Error` variants so they're easy to construct
- adds source info only when formatting `Errors`
- formats `Errors` with a custom `Debug` impl
- shares common code between ISLE's callers, islec and cranelift-codegen
- includes a source snippet even with fancy-errors disabled
I tried to make this a series of smaller commits but I couldn't find any
good split points; everything was too entangled with everything else.
This commit refactors the internals of `wasmtime_runtime::SharedMemory`
a bit to expose the necessary functions to invoke from the
`wasmtime::SharedMemory` layer. Notably some items are moved out of the
`RwLock` from prior, such as the type and the `VMMemoryDefinition`.
Additionally the organization around the `atomic_*` methods has been
redone to ensure that the `wasmtime`-layer abstraction has a single
method to call into which everything else uses as well.
* Update release date of Wasmtime 3.0.0
* Update release date for 3.0.0
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
From the documentation of `CondVar::wait_timeout`:
> The semantics of this function are equivalent to wait except that the thread
> will be blocked for roughly no longer than `dur`. This method should not be
> used for precise timing due to anomalies such as preemption or platform
> differences that might not cause the maximum amount of time waited to be
> precisely `dur`.
Therefore, go to sleep again, if the thread has not slept long enough.
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
* bench: add more WASI benchmarks
This follows up on #5274 to add several more scenarios with which to
benchmark WASI performance:
- `open-file.wat`: opens and closes a file
- `read-file.wat`: opens a file, reads 4K bytes from it, then closes it
- `read-dir.wat`: reads a directory's entries
Each benchmark is hand-crafted WAT to more clearly control what WASI
calls are made. As with #5274, these modules' sole entry point takes a
parameter indicating the number of iterations to run in order to use
`criterion`'s `iter_custom` feature.
* fix: reduce expected size of directory entries
* Cranelift: Define `heap_load` and `heap_store` instructions
* Cranelift: Implement interpreter support for `heap_load` and `heap_store`
* Cranelift: Add a suite runtests for `heap_{load,store}`
There are so many knobs we can twist for heaps and I wanted to exhaustively test
all of them, so I wrote a script to generate the tests. I've checked in the
script in case we want to make any changes in the future, but I don't think it
is worth adding this to CI to check that scripts are up to date or anything like
that.
* Review feedback
* feat: implement memory.atomic.notify,wait32,wait64
Added the parking_spot crate, which provides the needed registry for the
operations.
Signed-off-by: Harald Hoyer <harald@profian.com>
* fix: change trap message for HeapMisaligned
The threads spec test wants "unaligned atomic"
instead of "misaligned memory access".
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add test for atomic wait on non-shared memory
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests/spec_testsuite/proposals/threads
without pooling and reference types.
Also "shared_memory" is added to the "spectest" interface.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add atomics_notify.wast
checking that notify with 0 waiters returns 0 on shared and non-shared
memory.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests for atomic wait on shared memory
- return 2 - timeout for 0
- return 2 - timeout for 1000ns
- return 1 - invalid value
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
It seems they were mistakenly added to the `wasmtime_valunion` union whereas it is actually the `ValRaw` Rust type (represented by `wasmtime_val_raw`) that is affected by the change.