A fuzz bug was hit last night where the root of the fuzz bug appears to
be exhaustion of the virtual address space. The specific case in
question instantiated a module with ~100 memories ~100 times, and each
memory reserved ~8gb of the virtual address space. This takes around 47
bits of addressable memory which is mighty close to the limit of what
can be done on x86_64, so this commit reduces the number of memories
that an instance may have when coming out of `wasm-smith`.
* Implement runtime checks for compilation settings
This commit fills out a few FIXME annotations by implementing run-time
checks that when a `Module` is created it has compatible codegen
settings for the current host (as `Module` is proof of "this code can
run"). This is done by implementing new `Engine`-level methods which
validate compiler settings. These settings are validated on
`Module::new` as well as when loading serialized modules.
Settings are split into two categories, one for "shared" top-level
settings and one for ISA-specific settings. Both categories now have
allow-lists hardcoded into `Engine` which indicate the acceptable values
for each setting (if applicable). ISA-specific settings are checked with
the Rust standard library's `std::is_x86_feature_detected!` macro. Other
macros for other platforms are not stable at this time but can be added
here if necessary.
Closes#3897
* Fix fall-through logic to actually be correct
* Use a `OnceCell`, not an `AtomicBool`
* Fix some broken tests
In some use cases it is desirable to provide a custom snapshot WASI
context. Facilitate this by depending on a combination of traits
required rather than concrete type in the signature.
Signed-off-by: Roman Volosatovs <rvolosatovs@riseup.net>
After adding the `call`-oriented benchmark recently I just noticed that
running benchmarks on CI is taking 30+ minutes which is not intended.
Instead of running a full benchmark run on CI (which I believe we're not
looking at anyway) instead only run the benchmarks for a single
iteration to ensure they still work but otherwise don't collect
statistics about them.
Additionally cap the number of parallel instantiations to 16 to avoid
running tons of tests for machines with lots of cpus.
This avoids a situation such as for the last point release where we had
different point releases all use the same branch name which forced us
make one release at a time. By putting the version number in the branch
name that should hopefully fix this.
If either end stack overflows we can't validate the other side since the
other side, depending on codegen settings, may have been successful, hit
a different trap, or also stack overflowed.
The goal of this new benchmark, `call`, is to help us measure the
overhead of both calling into WebAssembly from the host as well as
calling the host from WebAssembly. There's lots of various ways to
measure this so this benchmark is a bit large but should hopefully be
pretty thorough. It's expected that this benchmark will rarely be run in
its entirety but rather only a subset of the benchmarks will be run at
any one given time.
Some metrics measured here are:
* Typed vs Untyped vs Unchecked - testing the cost of both calling wasm
with these various methods as well as having wasm call the host where
the host function is defined with these various methods.
* With and without `call_hook` - helps to measure the overhead of the
`Store::call_hook` API.
* Synchronous and Asynchronous - measures the overhead of calling into
WebAssembly asynchronously (with and without the pooling allocator) in
addition to defining host APIs in various methods when wasm is called
asynchronously.
Currently all the numbers are as expected, notably:
* Host calling WebAssembly is ~25ns of overhead
* WebAssembly calling the host is ~3ns of overhead
* "Unchecked" is a bit slower than "typed", and "Untyped" is slower than
unchecked.
* Asynchronous wasm calling a synchronous host function has ~3ns of
overhead (nothing more than usual).
* Asynchronous calls are much slower, on the order of 2-3us due to
`madvise`.
Lots of other fiddly bits that can be measured here, but this will
hopefully help establish a benchmark through which we can measure in the
future in addition to measuring changes such as #3876
This commit fixes calling the call hooks configured in a store for host
functions defined with `Func::new_unchecked` or similar. I believe that
this was just an accidental oversight and there's no fundamental reason
to not support this.
Another instance similar to #3879 where when doing differential tests
the pooling allocator configuration needs to be updated to allow for a
possible table.
* Move spec interpreter fuzzing behind a Cargo feature
Building the spec interpreter requires a local installation of Ocaml and
now libgmp which isn't always available, so this enables the ability to
disable building the spec interpreter by using `cargo +nightly fuzz
build --no-default-features`. The spec interpreter is still built by
default but if fuzzers are being built locally and the spec interpreter
isn't needed then this should enable it to be relatively easily
opted-out of.
* Tweak manifest directives
This commit removes some `.unwrap()` annotations around casts between
integers to either be infallible or handle errors. This fixes a panic in
a fuzz test case that popped out for memory64-using modules. The actual
issue here is pretty benign, we were just too eager about assuming
things fit into 32-bit.
The recent removal of `ModuleLimits` meant that the update to the
fuzzers could quickly fail where the instance size limit was set to
something small (like 0) and then nothing would succeed in compilation.
This allows the modules to fail to compile and then continues to the
next fuzz input in these situations.
I frequently notice that the fuzz build of `cranelift-codegen` takes an
extremely long time and recently realized that one issue is that when
fuzzers are built we enable all of the backends in `cranelift-codegen`
but AFAIK only the native backend is actually fuzzed. I traced the
inclusion of `all-arch` back to #2323, specifically [this comment][1]
and it looks like now that the old backend is removed this should be
able to be removed as well.
[1]: https://github.com/bytecodealliance/wasmtime/pull/2323#discussion_r515228552
This seems to have intended to allow overrides but the specific Makefile
syntax used didn't actually allow overrides, so update that to allow env
vars from the outside world to override the variable (needed locally on
AArch64 I'm building on which has a different path to libgmp)
Spec tests require multi-value to be enabled and wasm-smith recently
made this a fuzz-input option, so override the fuzz input as we do for
other features and force-enable multi-value.
This commit updates the build script which clones the spec interpreter
for fuzzing to specifically pin at a hardcoded revision. This is
intended at improving reproducibility if we hit any issues while fuzzing
to ensure that the same wasmtime revision is always using the same spec
interpreter revision.
This commit aims to achieve the goal of being able to run the test suite
on Windows with `--test-threads 1`, or more notably allowing Wasmtime's
defaults to work better with the main thread on Windows which appears to
have a smaller stack by default than Linux by comparison. In decreasing
the default wasm stack size a test is also update to probe for less
stack to work on Windows' main thread by default, ideally allowing the
full test suite to work with `--test-threads 1` (although this isn't
added to CI as it's not really critical).
Closes#3857
* Shrink the size of the anyfunc table in `VMContext`
This commit shrinks the size of the `VMCallerCheckedAnyfunc` table
allocated into a `VMContext` to be the size of the number of "escaped"
functions in a module rather than the number of functions in a module.
Escaped functions include exports, table elements, etc, and are
typically an order of magnitude smaller than the number of functions in
general. This should greatly shrink the `VMContext` for some modules
which while we aren't necessarily having any problems with that today
shouldn't cause any problems in the future.
The original motivation for this was that this came up during the recent
lazy-table-initialization work and while it no longer has a direct
performance benefit since tables aren't initialized at all on
instantiation it should still improve long-running instances
theoretically with smaller `VMContext` allocations as well as better
locality between anyfuncs.
* Fix some tests
* Remove redundant hash set
* Use a helper for pushing function type information
* Use a more descriptive `is_escaping` method
* Clarify a comment
* Fix condition
* Remove the `ModuleLimits` pooling configuration structure
This commit is an attempt to improve the usability of the pooling
allocator by removing the need to configure a `ModuleLimits` structure.
Internally this structure has limits on all forms of wasm constructs but
this largely bottoms out in the size of an allocation for an instance in
the instance pooling allocator. Maintaining this list of limits can be
cumbersome as modules may get tweaked over time and there's otherwise no
real reason to limit the number of globals in a module since the main
goal is to limit the memory consumption of a `VMContext` which can be
done with a memory allocation limit rather than fine-tuned control over
each maximum and minimum.
The new approach taken in this commit is to remove `ModuleLimits`. Some
fields, such as `tables`, `table_elements` , `memories`, and
`memory_pages` are moved to `InstanceLimits` since they're still
enforced at runtime. A new field `size` is added to `InstanceLimits`
which indicates, in bytes, the maximum size of the `VMContext`
allocation. If the size of a `VMContext` for a module exceeds this value
then instantiation will fail.
This involved adding a few more checks to `{Table, Memory}::new_static`
to ensure that the minimum size is able to fit in the allocation, since
previously modules were validated at compile time of the module that
everything fit and that validation no longer happens (it happens at
runtime).
A consequence of this commit is that Wasmtime will have no built-in way
to reject modules at compile time if they'll fail to be instantiated
within a particular pooling allocator configuration. Instead a module
must attempt instantiation see if a failure happens.
* Fix benchmark compiles
* Fix some doc links
* Fix a panic by ensuring modules have limited tables/memories
* Review comments
* Add back validation at `Module` time instantiation is possible
This allows for getting an early signal at compile time that a module
will never be instantiable in an engine with matching settings.
* Provide a better error message when sizes are exceeded
Improve the error message when an instance size exceeds the maximum by
providing a breakdown of where the bytes are all going and why the large
size is being requested.
* Try to fix test in qemu
* Flag new test as 64-bit only
Sizes are all specific to 64-bit right now
The current definition of `ValueSlice` is not usable, since any call to
a constructor returning a `ValueSlice` will extend the mutable borrow
on the context taken by the constructor call, with the result that it
cannot be passed to any other constructor ever.
Re-implement `ValueSlice` as a pair of a `ValueList` identifer plus an
offset into the list. This type can simply be copied without requiring
a borrow on the context.
This changes the output of the `lower` constructor from a
`ValueRegs` to a new `InstOutput` type, which is a vector
of `ValueRegs`.
Code in `lower_common` is updated to use this new type to
handle instructions with multiple outputs. All back-ends
are updated to use the new type.
This PR makes use of the new implicit-conversion feature of the ISLE DSL
that was introduced in #3807 in order to make the lowering rules
significantly simpler and more concise.
The basic idea is to eliminate the repetitive and mechanical use of
terms that convert from one type to another when there is only one real
way to do the conversion -- for example, to go from a `WritableReg` to a
`Reg`, the only sensible way is to use `writable_reg_to_reg`.
This PR generally takes any term of the form "A_to_B" and makes it an
automatic conversion, as well as some others that are similar in spirit.
The notable exception to the pure-value-convsion category is the
`put_in_reg` family of operations, which actually do have side-effects.
However, as noted in the doc additions in #3807, this is fine as long as
the side-effects are idempotent. And on balance, making `put_in_reg`
automatic is a significant clarity win -- together with other operand
converters, it enables rules like:
```
;; Add two registers.
(rule (lower (has_type (fits_in_64 ty)
(iadd x y)))
(add ty x y))
```
There may be other converters that we could define to make the rules
even simpler; we can make such improvements as we think of them, but
this should be a good start!
This commit fixes a potential issue where the fast-path instantiate in
`MemoryImageSlot` where when the previous image is compared against the
new image it only performed file descriptor equality, but nowadays with
loading images from `*.cwasm` files there might be multiple images in
the same file so the offsets also need to be considered. I think this
isn't really easy to hit today, it would require combining both module
linking and multi-memory which gets into the realm of being pretty
esoteric so I haven't added a test case here for this.
It seems our `compile` fuzz target for ISLE has not been regularly
tested, as it was never updated for the `isle` -> `cranelift_isle` crate
renaming. This PR fixes it to compile again.
This also includes a simple fix in the typechecking: when verifying that
a term decl is valid, we might insert a term ID into the name->ID map
before fully checking that all of the types exist, and then skipping
(for error recovery purposes) the actual push onto the term-signature
vector if one of the types does have an error. This phantom TID can
later cause a panic. The fix is to avoid adding to the map until we have
fully verified the term decl.
Add support for implicit type conversions to ISLE.
This feature allows the DSL user to register to the compiler that a
particular term (used as a constructor or extractor) converts from one
type to another. The compiler will then *automatically* insert this term
whenever a type mismatch involving that specific pair of types occurs.
This significantly cleans up many uses of the ISLE DSL. For example,
when defining the compiler backends, we often have newtypes like `Gpr`
around `Reg` (signifying a particular type of register); we can define
a conversion from Gpr to Reg automatically.
Conversions can also have side-effects, as long as these side-effects
are idempotent. For example, `put_value_in_reg` in a compiler backend
has the effect of marking the value as used, causing codegen to produce
it, and assigns a register to the value; but multiple invocations of
this will return the same register for the same value. Thus it is safe
to use it as an implicit conversion that may be invoked multiple times.
This is documented in the ISLE-Cranelift integration document.
This PR also adds some testing infrastructure to the ISLE compiler,
checking that "pass" tests pass through the DSL compiler, "fail" tests
do not, and "link" tests are able to generate code and link that code
with corresponding Rust code.
Without async fuzzing, we won't be able to test the most interesting
aspects of epoch interruption, namely the
interrupt/update-deadline/resume flow. However, the "trap on epoch
change" behavior works even for synchronous stores, so we can fuzz with
this the same way we fuzz with the interrupt flag.
* fuzzing: Add a custom mutator based on `wasm-mutate`
* fuzz: Add a version of the `compile` fuzz target that uses `wasm-mutate`
* Update `wasmparser` dependencies
* Panic on resetting image slots back to anonymous memory
This commit updates `Drop for MemoryImageSlot` to panic instead of
ignoring errors when resetting memory back to a clean slate. On reading
some of this code again for a different change I realized that if an
error happens in `reset_with_anon_memory` it would be possible,
depending on where another error happened, to leak memory from one image
to another.
For example if `clear_and_remain_ready` failed its `madvise` (for
whatever reason) and didn't actually reset any memory, then if `Drop for
MemoryImageSlot` also hit an error trying to remap memory (for whatever
reason), then nothing about memory has changed and when the
`MemoryImageSlot` is recreated it'll think that it's 0-length when
actually it's a bit larger and may leak data.
I don't think this is a serious problem since we don't know any
situation under which the `madvise` would fail and/or the resetting with
anonymous memory, but given that these aren't expected to fail I figure
it's best to be a bit more defensive here and/or loud about failures.
* Update a comment