* cranelift: Reorganize test suite
Group some SIMD operations by instruction.
* cranelift: Deduplicate some shift tests
Also add new tests for the mod behaviour.
* aarch64: Lower shifts with mod behaviour
* x64: Lower shifts with mod behaviour
* wasmtime: Don't mask SIMD shifts
* Implement variant translation in fused adapters
This commit implements the most general case of variants for fused
adapter trampolines. Additionally a number of other primitive types are
filled out here to assist with testing variants. The implementation
internally was relatively straightforward given the shape of variants,
but there's room for future optimization as necessary especially around
converting locals to various types.
This commit also introduces a "one off" fuzzer for adapters to ensure
that the generated adapter is valid. I hope to extend this fuzz
generator as more types are implemented to assist in various corner
cases that might arise. For now the fuzzer simply tests that the output
wasm module is valid, not that it actually executes correctly. I hope to
integrate with a fuzzer along the lines of #4307 one day to test the
run-time-correctness of the generated adapters as well, at which point
this fuzzer would become obsolete.
Finally this commit also fixes an issue with `u8` translation where
upper bits weren't zero'd out and were passed raw across modules.
Instead smaller-than-32 types now all mask out their upper bits and do
sign-extension as appropriate for unsigned/signed variants.
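As a rough illustration of the masking described above, the following sketch shows what canonicalizing sub-32-bit values carried in a 32-bit core wasm value can look like. The function names are hypothetical and this is plain Rust arithmetic, not the wasm instruction sequence the adapter compiler actually emits.

```rust
/// Zero-extend: keep only the low 8 bits of a raw 32-bit value (unsigned `u8`).
fn canonicalize_u8(raw: i32) -> i32 {
    raw & 0xff
}

/// Sign-extend the low 8 bits of a raw 32-bit value (signed `s8`).
fn canonicalize_s8(raw: i32) -> i32 {
    (raw as i8) as i32
}

/// Zero-extend: keep only the low 16 bits (unsigned `u16`).
fn canonicalize_u16(raw: i32) -> i32 {
    raw & 0xffff
}

/// Sign-extend the low 16 bits (signed `s16`).
fn canonicalize_s16(raw: i32) -> i32 {
    (raw as i16) as i32
}
```

With this shape, garbage in the upper bits of the incoming value never crosses the module boundary: only the low 8 or 16 bits influence the result.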
* Fuzz memory64 in the new trampoline fuzzer
Currently memory64 isn't supported elsewhere in the component model
implementation of Wasmtime but the trampoline compiler seems as good a
place as any to ensure that it at least works in isolation. This plumbs
through fuzz input into a `memory64` boolean which gets fed into
compilation. Some miscellaneous bugs were fixed as a result to ensure
that memory64 trampolines all validate correctly.
* Tweak manifest for doc build
DHAT reports that when compiling the Spidermonkey Sightglass benchmark,
there are over 100k of these Vec allocations, averaging less than 4
bytes, and with an average lifetime of only about 500 instructions.
This function is only called from one place, which immediately converts
it into an iterator. So this commit just returns the iterator that was
previously being collected into a Vec. The iterator has to borrow from
the DataFlowGraph, so this would change borrow-check results, but in the
one caller that turns out to be okay.
(That sole caller is in cranelift/codegen/src/machinst/lower.rs, in
Lower::lower().)
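The change described above follows a common Rust pattern: return a borrowing iterator instead of collecting into a short-lived `Vec`. A minimal sketch, where `Graph`, `targets_vec` and `targets_iter` are hypothetical stand-ins and not the actual `DataFlowGraph` API:

```rust
/// Hypothetical stand-in for a structure that owns per-instruction data.
struct Graph {
    targets: Vec<Vec<u32>>,
}

impl Graph {
    /// Before: allocates a fresh `Vec` on every call.
    fn targets_vec(&self, inst: usize) -> Vec<u32> {
        self.targets[inst].iter().copied().collect()
    }

    /// After: returns an iterator that borrows from `self`, so no allocation
    /// happens. The caller cannot mutate the graph while the iterator lives.
    fn targets_iter(&self, inst: usize) -> impl Iterator<Item = u32> + '_ {
        self.targets[inst].iter().copied()
    }
}
```

The `+ '_` bound is what ties the iterator's lifetime to the borrow of `self`, which is the borrow-check change mentioned above.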
According to Sightglass, this is a compile-time improvement of between
2% and 12% on the Spidermonkey benchmark:
instantiation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm
Δ = 14628.76 ± 10318.59 (confidence = 99%)
main-0e6ffd024.so is 0.87x to 0.98x faster than no-small-vecs.so!
no-small-vecs.so is 1.02x to 1.14x faster than main-0e6ffd024.so!
[142023 187464.24 301522] main-0e6ffd024.so
[103742 172835.48 263917] no-small-vecs.so
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm
Δ = 362392705.93 ± 267070467.06 (confidence = 99%)
main-0e6ffd024.so is 0.89x to 0.98x faster than no-small-vecs.so!
no-small-vecs.so is 1.02x to 1.12x faster than main-0e6ffd024.so!
[3655734131 5522594697.83 6471126699] main-0e6ffd024.so
[3278129811 5160201991.90 5810600015] no-small-vecs.so
First, we switch from a `BTreeSet` to a `HashSet` because clearing a `BTreeSet`
will deallocate the btree's nodes but clearing a `HashSet` will not deallocate
the backing hash table, saving the space to reuse for future insertions.
Then, we reuse the same set (and therefore the same allocation) across every
call to `can_optimize_var_lookup`.
This results in a 1.22x to 1.32x speed up on various Sightglass benchmarks:
```
compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm
Δ = 39478181.76 ± 3441880.32 (confidence = 99%)
main.so is 0.75x to 0.79x faster than reuse-set.so!
reuse-set.so is 1.27x to 1.32x faster than main.so!
[160128343 172174751.09 213325968] main.so
[115055695 132696569.33 200782128] reuse-set.so
compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm
Δ = 22576954.88 ± 1830771.68 (confidence = 99%)
main.so is 0.77x to 0.81x faster than reuse-set.so!
reuse-set.so is 1.25x to 1.29x faster than main.so!
[100449245 106820149.65 118628066] main.so
[77039172 84243194.77 128168647] reuse-set.so
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm
Δ = 664533554.97 ± 22109170.05 (confidence = 99%)
main.so is 0.81x to 0.82x faster than reuse-set.so!
reuse-set.so is 1.22x to 1.23x faster than main.so!
[3549762523 3640587103.35 3798662501] main.so
[2793335181 2976053548.38 3192950484] reuse-set.so
```
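The set-reuse pattern behind these numbers can be sketched as follows. `Checker` and `all_unique` are hypothetical names used for illustration; the real change applies the same idea to `can_optimize_var_lookup`.

```rust
use std::collections::HashSet;

/// Holds a scratch set that is reused across calls.
struct Checker {
    seen: HashSet<u32>,
}

impl Checker {
    fn new() -> Self {
        Checker { seen: HashSet::new() }
    }

    /// Hypothetical query: returns true if `items` contains no duplicates.
    fn all_unique(&mut self, items: &[u32]) -> bool {
        // `clear` removes the elements but keeps the backing hash table
        // allocated, so later insertions can reuse it.
        self.seen.clear();
        items.iter().all(|&x| self.seen.insert(x))
    }
}
```

Keeping the set in a long-lived struct rather than creating it inside the function is what lets the allocation survive from one call to the next.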
* Don't try to report file size or timestamps for stdio streams.
Calling `File::metadata()` on a stdio stream handle fails on Windows, where
the stdio streams are not files.
This `File::metadata()` call was effectively only being used to add file size
and timestamps to the result of `filestat_get`. It's common for users to
redirect stdio streams to interesting places, and applications
generally shouldn't change their behavior depending on the size or
timestamps of the file when the streams are redirected to one, so just
leave these fields set to 0, which is commonly understood to represent
"unknown".
Fixes #4497.
* Remove dependency on `more-asserts`
In my recent adventures to do a bit of gardening on our dependencies I
noticed that there's a new major version for the `more-asserts` crate.
Instead of updating to this though I've opted to instead remove the
dependency since I don't think we heavily lean on this crate and
otherwise one-off prints are probably sufficient to avoid the need for
pulling in a whole crate for this.
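For illustration, a comparison assertion such as `more_asserts::assert_le!(a, b)` can be approximated with a plain `assert!` and a custom message. The helper name `check_le` is hypothetical and only serves to show the shape of the replacement.

```rust
/// Asserts `a <= b` and prints both operands on failure, which is roughly
/// the extra information a dedicated comparison-assert macro would give.
fn check_le(a: usize, b: usize) {
    assert!(a <= b, "expected {} <= {}", a, b);
}
```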
* Remove exemption for `more-asserts`
* Add initial support for fused adapter trampolines
This commit lands a significant new piece of functionality to Wasmtime's
implementation of the component model in the form of the implementation
of fused adapter trampolines. Internally within a component, core wasm
modules can communicate with each other by having their exports
`canon lift`'d to get `canon lower`'d into a different component. This
signifies that two components are communicating through a statically
known interface via the canonical ABI at this time. Previously Wasmtime
was able to identify that this communication was happening but it simply
panicked with `unimplemented!` upon seeing it. This commit is the
beginning of filling out this panic location with an actual
implementation.
The implementation route chosen here for fused adapters is to use a
WebAssembly module itself for the implementation. This means that, at
compile time of a component, Wasmtime is generating core WebAssembly
modules which then get recursively compiled within Wasmtime as well. The
choice to use WebAssembly itself as the implementation of fused adapters
stems from a few motivations:
* This does not represent a significant increase in the "trusted
compiler base" of Wasmtime. Getting the Wasm -> CLIF translation
correct once is hard enough much less for an entirely different IR to
CLIF. By generating WebAssembly no new interactions with Cranelift are
added which drastically reduces the possibilities for mistakes.
* Using WebAssembly means that component adapters are insulated from
miscompilations and mistakes. If something goes wrong it's defined
well within the WebAssembly specification how it goes wrong and what
happens as a result. This means that the "blast zone" for a wrong
adapter is the component instance but not the entire host itself.
Accesses to linear memory are guaranteed to be in-bounds and otherwise
handled via well-defined traps.
* A fully-finished fused adapter compiler is expected to be a
significant and quite complex component of Wasmtime. Functionality
along these lines is expected to be needed for Web-based polyfills of
the component model and by using core WebAssembly it provides the
opportunity to share code between Wasmtime and these polyfills for the
component model.
* Finally the runtime implementation of managing WebAssembly modules is
already implemented and quite easy to integrate with, so representing
fused adapters with WebAssembly results in very little extra support
necessary for the runtime implementation of instantiating and managing
a component.
The compiler added in this commit is dubbed Wasmtime's Fused Adapter
Compiler of Trampolines (FACT) because who doesn't like deriving a name
from an acronym. Currently the trampoline compiler is limited in its
support for interface types and only supports a few primitives. I plan
on filing future PRs to flesh out the support here for all the variants
of `InterfaceType`. For now this PR is primarily focused on all of the
other infrastructure for the addition of a trampoline compiler.
With the choice to use core WebAssembly to implement fused adapters it
means that adapters need to be inserted into a module. Unfortunately
adapters cannot all go into a single WebAssembly module because adapters
themselves have dependencies which may be provided transitively through
instances that were instantiated with other adapters. This means that a
significant chunk of this PR (`adapt.rs`) is dedicated to determining
precisely which adapters go into precisely which adapter modules. This
partitioning process attempts to make large modules wherever it can to
cut down on core wasm instantiations but is likely not optimal as
it's just a simple heuristic today.
With all of this added together it's now possible to start writing
`*.wast` tests that internally have adapted modules communicating with
one another. A `fused.wast` test suite was added as part of this PR
which is the beginning of tests for the support of the fused adapter
compiler added in this PR. Currently this is primarily testing some
various topologies of adapters along with direct/indirect modes. This
will grow many more tests over time as more types are supported.
Overall I'm not 100% satisfied with the testing story of this PR. When a
test fails it's very difficult to debug since everything is written in
the text format of WebAssembly meaning there's no "conveniences" to
print out the state of the world when things go wrong and easily debug.
I think this will become even more apparent as more tests are written
for more types in subsequent PRs. At this time though I know of no
better alternative other than leaning pretty heavily on fuzz-testing to
ensure this is all exercised.
* Fix an unused field warning
* Fix tests in `wasmtime-runtime`
* Add some more tests for compiled trampolines
* Remap exports when injecting adapters
The exports of a component were accidentally left unmapped which meant
that they indexed the instance indexes pre-adapter module insertion.
* Fix typo
* Rebase conflicts
* fuzzgen: Use Switch interface
Turns out this is an interface that the frontend provides.
We should fuzz it.
* cranelift: Restrict index range in Switch emission on fuzzgen
* x64: Add VEX Instruction Encoder
This uses a similar builder pattern to the EVEX Encoder.
Does not yet support memory accesses.
* x64: Add FMA Flag
* x64: Implement SIMD `fma`
* x64: Use 4 register Vex Inst
* x64: Reorder VEX pretty print args
* Allow 64-bit vectors and implement for interpreter
The AArch64 backend already supports 64-bit vectors; this simply allows
instructions to make use of that.
Implemented support for 64-bit vectors within the interpreter to allow
interpret runtests to use them.
Copyright (c) 2022 Arm Limited
* Disable 64-bit SIMD `iaddpairwise` tests on s390x
Copyright (c) 2022 Arm Limited
* [AArch64] Port SIMD narrowing to ISLE
Fvdemote, snarrow, unarrow and uunarrow.
Also refactor the aarch64 instruction descriptions to parameterize
on ScalarSize instead of using different opcodes.
The zero_value pure constructor has been introduced and used by the
integer narrow operations and it replaces, and extends, the compare
zero patterns.
Copyright (c) 2022, Arm Limited.
* use short 'if' patterns
This enables more runtests to be executed on s390x. Doing so
uncovered two back-end bugs, which are fixed as well:
- The result of cls was always off by one.
- The result of popcnt.i16 had uninitialized high bits.
In addition, I found a bug in the load-op-store.clif test case:
    v3 = heap_addr.i64 heap0, v1, 4
    v4 = iconst.i64 42
    store.i32 v4, v3
This was clearly intended to perform a 32-bit store, but
actually performs a 64-bit store (it seems the type annotation
of the store opcode is ignored, and the type of the operand
is used instead). That bug did not show any noticeable symptoms
on little-endian architectures, but broke on big-endian.
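To see why the wider store goes unnoticed on little-endian targets, consider this sketch. It assumes the value is later read back with a 32-bit load from the same address; the function name is hypothetical and the byte arrays stand in for memory.

```rust
/// Performs a 64-bit store of `value` in the given byte order, then reads
/// back only the first 4 bytes as a 32-bit load at the same address would.
fn store64_load32(value: i64, big_endian: bool) -> i32 {
    let bytes = if big_endian {
        value.to_be_bytes()
    } else {
        value.to_le_bytes()
    };
    // The 32-bit load sees only the lowest-addressed 4 bytes.
    let low: [u8; 4] = [bytes[0], bytes[1], bytes[2], bytes[3]];
    if big_endian {
        i32::from_be_bytes(low)
    } else {
        i32::from_le_bytes(low)
    }
}
```

On little-endian the low-order bytes come first, so the 32-bit load still sees 42. On big-endian the first four bytes are the high-order half of the 64-bit value, so the load sees 0. In both cases the store also overwrites four bytes beyond the intended range.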
* support dynamic function calls in component model
This addresses #4310, introducing a new `component::values::Val` type for
representing component values dynamically, as well as `component::types::Type`
for representing the corresponding interface types. It also adds a `call` method
to `component::func::Func`, which takes a slice of `Val`s as parameters and
returns a `Result<Val>` representing the result.
Note that I've moved `post_return` and `call_raw` from `TypedFunc` to `Func`
since there was nothing specific to `TypedFunc` about them, and I wanted to
reuse them. The code in both is unchanged beyond the trivial tweaks to make
them fit in their new home.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* order variants and match cases more consistently
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* implement lift for String, Box<str>, etc.
This also removes the redundant `store` parameter from `Type::load`.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* implement code review feedback
This fixes a few issues:
- Bad offset calculation when lowering
- Missing variant padding
- Style issues regarding `types::Handle`
- Missed opportunities to reuse `Lift` and `Lower` impls
It also adds forwarding `Lift` impls for `Box<[T]>`, `Vec<T>`, etc.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* move `new_*` methods to specific `types` structs
Per review feedback, I've moved `Type::new_record` to `Record::new_val` and
added a `Type::unwrap_record` method; likewise for the other kinds of types.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* make tuple, option, and expected type comparisons recursive
These types should compare as equal across component boundaries as long as their
type parameters are equal.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* improve error diagnostic in `Type::check`
We now distinguish between more failure cases to provide an informative error
message.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* address review feedback
- Remove `WasmStr::to_str_from_memory` and `WasmList::get_from_memory`
- add `try_new` methods to various `values` types
- avoid using `ExactSizeIterator::len` where we can't trust it
- fix over-constrained bounds on forwarded `ComponentType` impls
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* rearrange code per review feedback
- Move functions from `types` to `values` module so we can make certain struct fields private
- Rename `try_new` to just `new`
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* remove special-case equality test for tuples, options, and expecteds
Instead, I've added a FIXME comment and will open an issue to do recursive
structural equality testing.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* cranelift: Restrict `br_table` to `i32` indices
In #4498 it was proposed that we should only accept `i32` indices
to `br_table`. The rationale is that larger types give users a false
sense of flexibility (since we don't support jump tables larger than a
`u32` can index), and narrower types exercise poorly tested paths that
are safer to remove.
* cranelift: Reduce directly from i128 to i32 in Switch
Converted the existing implementations for the following opcodes to ISLE
on AArch64:
- `sqrt`
- `fneg`
- `fabs`
- `fpromote`
- `fdemote`
- `ceil`
- `floor`
- `trunc`
- `nearest`
Copyright (c) 2022 Arm Limited
In #4502 we discovered a bug in the switch api where it would emit
`icmp_imm`'s with types that were not able to fully represent the
destination index.
We now reject these inputs. The index val must always have a
type that is capable of addressing the entire range of inputs.
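The condition being enforced can be sketched as a simple width check. This is an assumption about the shape of the check, not the code in the `Switch` implementation, and `type_can_represent` is a hypothetical name.

```rust
/// Returns true if an unsigned integer type that is `bits` wide can
/// represent every index up to and including `max_index`.
fn type_can_represent(bits: u32, max_index: u128) -> bool {
    // A 128-bit type holds any u128; otherwise compare against 2^bits.
    bits >= 128 || max_index < (1u128 << bits)
}
```

For example, an `i8` index value cannot address entry 256 of a switch, so such an input would be rejected under this rule.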
* Add cmake compatibility to c-api
* Add CMake documentation to wasmtime.h
* Add CMake instructions in examples
* Modify CI for CMake support
* Use correct rust in CI
* Trigger build
* Refactor run-examples
* Reintroduce example_to_run in run-examples
* Replace run-examples crate with cmake
* Fix markdown formatting in examples readme
* Fix cmake test quotes
* Build rust wasm before cmake tests
* Pass CTEST_OUTPUT_ON_FAILURE
* Another cmake test
* Handle os differences in cmake test
* Fix bugs in memory and multimemory examples
I noticed that `TableOp::insert` had assertions that `num_params` and
`table_size` were greater than 0, but no assert for `num_globals`. These
asserts couldn't be hit because the `*_RANGE` constants were all set to
a minimum of 1.
But the only reason I can see to prohibit 0-sized tables, locals, or
globals, was because indexes into those spaces were generated with the
`%` operator. Allowing 0-sized spaces requires not generating the
corresponding instructions at all when there are no valid indexes.
So I pushed the final selection of which table/local/global to access
earlier, to the moment when we're picking which TableOps to run. Then,
instead of generating a random u8 or u32 and taking the remainder to get
it into the right range, I can just ask `arbitrary` to generate a number
in the right range to begin with.
So this now explores some size-0 corners that it didn't before, and it
doesn't require reasoning about whether remainder can divide by zero.
Also I think it uses fewer bits of the `Unstructured` input to produce
the same cases, and I hope that lets libFuzzer more quickly find bits it
can mutate to get to novel coverage paths.
On s390x, we do not have a frame pointer that can be used to chain
stack frames for easy unwinding. Instead, our ABI defines a stack
"backchain" mechanism that can be used to the same effect.
This PR uses that backchain mechanism to implement the new
preserve_frame_pointers flag introduced here:
https://github.com/bytecodealliance/wasmtime/pull/4469
This includes some changes from @bnjbvr to the trace-logging/annotation
to reduce overhead when logging is enabled but only non-RA2 subsystems
are at `Trace` level.
* Components: ignore type exports (for now).
This commit updates component translation to ignore type exports for now.
Components generated with `wit-component` contain type exports to give names to
types used within the component's functions based on the component's wit
definition.
The intention is to allow bindings to be generated with meaningful names
directly from a component. In the future, type exports (and imports) may be
used for more than this purpose to support things like resource types.
This commit effectively ignores type exports when translating the component as
they are not useful to executing a component at this time.
Closes #4415.
* Code review feedback.
* fuzzgen: Add float support
Add support for generating floats and some float instructions.
* fuzzgen: Enable NaN Canonicalization
Both IEEE754 and the Wasm spec are somewhat loose about what is allowed
to be returned from NaN producing operations. And in practice this changes
from X86 to Aarch64 and others. Even in the same host machine, the
interpreter may produce a code sequence different from cranelift that
generates different NaN's but produces legal results according to the spec.
These differences cause spurious failures in the fuzzer. To fix this
we enable the NaN Canonicalization pass that replaces any NaN's produced
with a single fixed canonical NaN value.
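The effect of canonicalization can be illustrated on a single `f32` value. This sketch assumes the canonical NaN is the positive quiet NaN with bit pattern `0x7fc00000`; the real pass operates on Cranelift IR rather than on Rust floats, and the names here are hypothetical.

```rust
/// Assumed canonical NaN bit pattern for 32-bit floats (positive quiet NaN).
const CANON_NAN_F32: u32 = 0x7fc0_0000;

/// Replaces any NaN with the canonical NaN and leaves other values untouched.
fn canonicalize_f32(x: f32) -> f32 {
    if x.is_nan() {
        f32::from_bits(CANON_NAN_F32)
    } else {
        x
    }
}
```

After this step, two backends that produced NaNs with different sign or payload bits compare bit-for-bit equal, which is what removes the spurious fuzzer failures.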
* fuzzgen: Use `MultiAry` when inserting opcodes
This deduplicates a few inserters!
* Skip new `table_ops` test under emulation
When emulating we already have to disable most pooling-allocator related
tests so this commit carries over that logic to the new fuzz test which
may run some configurations with the pooling allocator depending on the
random input.
* Fix panics in s390x codegen related to aliases
This commit fixes an issue introduced as part of the fix for
GHSA-5fhj-g3p3-pq9g. The `reftyped_vregs` list given to `regalloc2` is
not allowed to have duplicates in it. While the list originally
doesn't have duplicates, it may have them once aliases are applied.
The fix here is to perform another pass to remove duplicates
after the aliases have been processed.
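A minimal sketch of the two steps, alias resolution followed by deduplication, is shown below. It uses plain `u32` values as stand-ins for virtual registers and a `HashMap` as a hypothetical alias table, and it assumes alias chains are acyclic; none of this mirrors the actual Cranelift data structures.

```rust
use std::collections::HashMap;

/// Resolves each vreg through the alias table, then removes duplicates.
fn resolve_and_dedup(vregs: &[u32], aliases: &HashMap<u32, u32>) -> Vec<u32> {
    let mut out: Vec<u32> = vregs
        .iter()
        .map(|&v| {
            // Follow the alias chain to its end (assumed to be acyclic).
            let mut cur = v;
            while let Some(&next) = aliases.get(&cur) {
                cur = next;
            }
            cur
        })
        .collect();
    // Two distinct vregs may resolve to the same target, so dedup afterwards.
    out.sort_unstable();
    out.dedup();
    out
}
```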
* Improve cranelift disassembly of stack maps
Print out extra information about stack maps such as their contents and
other related metadata available. Additionally, print out addresses
in hex to line up with the rest of the disassembly.
* Improve the `table_ops` fuzzer
* Generate more instructions by default
* Fix negative indices appearing in `table.{get,set}`
* Assert that the generated traps are the expected ones, so that other
accidental errors are not reported as a fuzzing success.
* Fix `reftype_vregs` reported to `regalloc2`
This fixes a mistake in the register allocation of Cranelift functions
where functions using reference-typed arguments incorrectly report which
virtual registers are reference-typed values if there are vreg aliases
in play. The fix here is to apply the vreg aliases to the final list of
reftyped regs which is eventually passed to `regalloc2`.
The main consequence of this fix is that functions which previously
accidentally didn't have correct stack maps should now have the missing
stack maps.
* Add a test that `table_ops` gc's eventually
* Add a comment about new alias resolution
* Update crates/fuzzing/src/oracles.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add some comments
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
With branch protections enabled that would otherwise mean that the PR
cannot be landed since CI is now required to run. These date-update PRs
typically come at odd off-hours for Wasmtime anyway so it should be fine
to run CI.
Preserving frame pointers -- even inside leaf functions -- makes it easy to
capture the stack of a running program, without requiring any side tables or
metadata (like `.eh_frame` sections). Many sampling profilers and similar tools
walk frame pointers to capture stacks. Enabling this option will play nice with
those tools.
These were for x86 (32-bit) where the ISA didn't have instructions for these
things, but now that we don't support that, and always have SSE2 for x86_64, we
never need or use these libcalls anymore.