Right now this is only on some crates such as `wasmtime` itself and
`wasmtime-cli`, but applying it to all crates helps with version
selection for users of, for example, just Cranelift.
* Introduce the `pulley-interpreter` crate
This commit is the first step towards implementing
https://github.com/bytecodealliance/rfcs/pull/35
This commit introduces the `pulley-interpreter` crate which contains the Pulley
bytecode definition, encoder, decoder, disassembler, and interpreter.
This is still very much a work in progress! It is expected that we will tweak
encodings and bytecode definitions, that we will overhaul the interpreter (to,
for example, optionally support the unstable Rust `explicit_tail_calls`
feature), and otherwise make large changes. This is just a starting point to get
the ball rolling.
Subsequent commits and pull requests will do things like add the Cranelift
backend to produce Pulley bytecode from Wasm as well as the runtime integration
to run the Pulley interpreter inside Wasmtime.
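Purely as an illustration of the kind of pieces such a crate contains (this is emphatically not Pulley's actual instruction set, encoding, or API), a bytecode definition plus a register-machine interpreter loop in miniature looks something like:

```rust
/// Hypothetical, illustrative bytecode; the real Pulley instructions,
/// encodings, and register sets are defined in the `pulley-interpreter` crate.
enum Op {
    /// Load a constant into register `dst`.
    Const { dst: u8, imm: i64 },
    /// `dst += src`.
    Add { dst: u8, src: u8 },
    /// Stop interpreting; the result is in register 0.
    Ret,
}

/// A toy interpreter loop over already-decoded ops (the real crate also has an
/// encoder, a decoder over raw bytes, and a disassembler).
fn interpret(code: &[Op]) -> i64 {
    let mut regs = [0i64; 16];
    for op in code {
        match op {
            Op::Const { dst, imm } => regs[*dst as usize] = *imm,
            Op::Add { dst, src } => regs[*dst as usize] += regs[*src as usize],
            Op::Ret => break,
        }
    }
    regs[0]
}
```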
* remove stray fn main
* Add small tests for special x registers
* Remove now-unused import
* always generate 0 pc rel offsets in arbitrary
* Add doc_auto_cfg feature for docs.rs
* enable all optional features for docs.rs
* Consolidate `BytecodeStream::{advance,get1,get2,...}` into `BytecodeStream::read`
* fix fuzz targets build
* inherit workspace lints in pulley's fuzz crate
* Merge fuzz targets into one target; fix a couple small fuzz bugs
* Add Pulley to our cargo vet config
* Add pulley as a crate to publish
* Move Pulley fuzz target into top level fuzz directory
* Refactor the internals of `FunctionBuilder::insert_safepoint_spills` into a few smaller methods
* Initialize a logger for the `cranelift-fuzzgen` fuzz target
* Resolve aliases before inserting values into the live set
This fixes a fuzz bug found in the development of
https://github.com/bytecodealliance/wasmtime/pull/8941
These were originally a SpiderMonkey-ism and have been unused ever
since. The mechanism was introduced for GC integration, where the runtime could
do something to make Cranelift code hit a trap and pause for a GC and then
resume execution once GC completed. But it is unclear that, as implemented, this
is actually a useful mechanism for doing that (compared to, say, loading from
some well-known page and having the GC protect that page and catch signals to
interrupt the mutator, or simply branching and doing a libcall). And if someone
has that particular use case in the future (Wasmtime and its GC integration
doesn't need exactly this) then we can design something for what is actually
needed at that time, instead of carrying this cruft forward forever.
We can't meaningfully audit the other WebAssembly implementations that
we use for differential fuzzing, such as wasmi and especially v8. Let's
acknowledge that the effort to do so is not practical for us, and focus
our vetting efforts on crates that developers and users are more likely
to build.
This reduces our estimated audit backlog by over three million lines,
according to `cargo vet suggest`.
Note that our crates which depend on those engines, such as
wasmtime-fuzzing, are not published to crates.io, so if we fall victim
to a supply chain attack against dependencies of these crates, the set
of folks who might be impacted is limited.
Although there is value in also auditing code that might be run by
people who clone our git repository, in this case I propose that anyone
who is concerned about the risks of supply chain attacks against their
development systems should be running fuzzers inside a sandbox. After
all, it's a fuzzer: it's specifically designed to try to do anything.
* Add a fuzzer for async wasm
This commit revives a very old branch of mine to add a fuzzer for
Wasmtime in async mode. This work was originally blocked on
llvm/llvm-project#53891 and, while that's still an issue, this commit
now contains a workaround for it. Support for async fuzzing required a good
deal of refactorings and changes, and the highlights are:
* The main part is that new intrinsics,
`__sanitizer_{start,finish}_fiber_switch`, are now invoked around the
stack-switching routines of fibers. This only works on Unix and is set
to only compile when ASAN is enabled (otherwise everything is a noop).
This required refactoring of things to get it all in just the right
way for ASAN since it appears that these functions not only need to be
called but more-or-less need to be adjacent to each other in the code.
My guess is that while we're switching ASAN is in a "weird state" and
it's not ready to run arbitrary code.
* Stacks are a problem. The above issue in LLVM outlines how stacks
cannot be deallocated at this time because if the deallocated virtual
memory is later used for the heap then ASAN will have a false positive
about stack overflow. To deal with this, stacks in ASAN mode use a
special allocation path that never deallocates them. This logic
additionally applies to the pooling allocator, which uses a different
stack allocation strategy under ASAN.
With all of the above a new fuzzer is added. This fuzzer generates an
arbitrary module, selects an arbitrary means of async (e.g.
epochs/fuel), and then tries to execute the exports of the module with
various values. In general the fuzzer is looking for crashes/panics as
opposed to correct answers as there's no oracle here. This is also
intended to stress the code used to switch on and off stacks.
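For reference, the underlying hooks are part of compiler-rt's public sanitizer interface (declared there as `__sanitizer_start_switch_fiber` and `__sanitizer_finish_switch_fiber`); a rough, hypothetical sketch of how such hooks bracket a stack switch, not Wasmtime's actual fiber code, is:

```rust
use std::ffi::c_void;

extern "C" {
    // Declarations from compiler-rt's sanitizer interface; these symbols only
    // exist when the program is built and linked with ASAN enabled.
    fn __sanitizer_start_switch_fiber(
        fake_stack_save: *mut *mut c_void,
        bottom: *const c_void,
        size: usize,
    );
    fn __sanitizer_finish_switch_fiber(
        fake_stack_save: *mut c_void,
        bottom_old: *mut *const c_void,
        size_old: *mut usize,
    );
}

/// Hypothetical wrapper: announce the upcoming switch to ASAN, perform the
/// architecture-specific stack switch, then announce that it finished.
unsafe fn switch_with_asan(
    new_stack_bottom: *const c_void,
    new_stack_size: usize,
    do_switch: impl FnOnce(),
) {
    let mut fake_stack = std::ptr::null_mut();
    __sanitizer_start_switch_fiber(&mut fake_stack, new_stack_bottom, new_stack_size);
    do_switch(); // the real assembly stack switch happens in here
    let mut old_bottom = std::ptr::null();
    let mut old_size = 0;
    __sanitizer_finish_switch_fiber(fake_stack, &mut old_bottom, &mut old_size);
}
```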
* Fix non-async build
* Remove unused import
* Review comments
* Fix compile on MIRI
* Fix Windows build
### The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest
of the runtime and the compiler. It does this by introducing two new traits,
very similar to a subset of [those proposed in the Wasm GC RFC], although not
all equivalent functionality has been added because Wasmtime doesn't
support, for example, GC structs yet:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run
collections within them, and execute the various GC barriers the collector
requires.
Rather than monomorphize all of Wasmtime on this trait, we use it
as a dynamic trait object. This does imply some virtual call overhead and
missing some inlining (and resulting post-inlining) optimization
opportunities. However, it is *much* less disruptive to the existing embedder
API, results in a cleaner embedder API anyways, and we don't believe that VM
runtime/embedder code is on the hot path for working with the GC at this time
anyways (that would be the actual Wasm code, which has inlined GC barriers
and direct calls and all of that). In the future, once we have optimized
enough of the GC that such code is ever hot, we have options we can
investigate at that time to avoid these dynamic virtual calls, like only
enabling one single collector at build time and then creating a static type
alias like `type TheOneGcImpl = ...;` based on the compile time
configuration, and using this type alias in the runtime rather than a dynamic
trait object.
The `GcRuntime` trait additionally defines a method to reset a GC heap, for
use by the pooling allocator. This allows reuse of GC heaps across different
stores. This integration is very rudimentary at the moment, and is missing
all kinds of configuration knobs that we should have before deploying Wasm GC
in production. This commit is large enough as it is already! Ideally, in the
future, I'd like to make it so that GC heaps receive their memory region,
rather than allocate/reserve it themselves, and let each slot in the pooling
allocator's memory pool be *either* a linear memory or a GC heap. This would
unask various capacity planning questions such as "what percent of memory
capacity should we dedicate to linear memories vs GC heaps?". It also seems
like basically all the same configuration knobs we have for linear memories
apply equally to GC heaps (see also the "Indexed Heaps" section below).
2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements
GC barriers for various operations on GC-managed references. The Rust code
calls into this trait dynamically via a trait object, but since it is
customizing the CLIF that is generated for Wasm code, the Wasm code itself is
not making dynamic, indirect calls for GC barriers. The `GcCompiler`
implementation can inline the parts of GC barrier that it believes should be
inline, and leave out-of-line calls to rare slow paths.
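Purely as an illustration of the split (hypothetical names and signatures, not the actual Wasmtime trait definitions), the shape is roughly:

```rust
// Hypothetical shapes only; the real traits have more methods and different
// signatures.
trait GcRuntime {
    /// Create a new GC heap for a store.
    fn new_heap(&self) -> Box<dyn GcHeap>;
}

trait GcHeap {
    /// Run a collection within this heap.
    fn collect(&mut self);
    /// Reset the heap so the pooling allocator can reuse it in another store.
    fn reset(&mut self);
}

trait GcCompiler {
    /// Emit the (possibly partially inlined) write-barrier CLIF for storing a
    /// GC reference into a GC-managed location; `()` stands in for the
    /// Cranelift IR builder and value types the real method takes.
    fn emit_write_barrier(&mut self, clif_builder: &mut ());
}

// The runtime and compiler then hold `Box<dyn GcRuntime>` / `Box<dyn GcCompiler>`
// trait objects rather than monomorphizing all of Wasmtime over a collector type.
```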
All that said, there is still only a single implementation of each of these
traits: the existing deferred reference-counting (DRC) collector. So there is a
bunch of code motion in this commit as the DRC collector was further isolated
from the rest of the runtime and moved to its own submodule. That said, this was
not *purely* code motion (see "Indexed Heaps" below) so it is worth not simply
skipping over the DRC collector's code in review.
### Indexed Heaps
This commit does bake in a couple assumptions that must be shared across all
collector implementations, such as a shared `VMGcHeader` that all objects
allocated within a GC heap must begin with, but the most notable and
far-reaching of these assumptions is that all collectors will use "indexed
heaps".
What we are calling indexed heaps is basically the following three invariants:
1. All GC heaps will be a single contiguous region of memory, and all GC objects
will be allocated within this region of memory. The collector may ask the
system allocator for additional memory, e.g. to maintain its free lists, but
GC objects themselves will never be allocated via `malloc`.
2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into
the GC heap's contiguous region of memory. We never hold raw pointers to GC
objects (although, of course, we have to compute them and use them
temporarily when actually accessing objects). This means that deref'ing GC
pointers is equivalent to deref'ing linear memory pointers: we need to add a
base and we also check that the GC pointer/index is within the bounds of the
GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly
common technique among high-performance GC
implementations[^compressed-oops][^v8-ptr-compression] so we are in good
company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that
is an element of an `(array (ref any))` is untrusted, and bounds checked on
access. This means that, for example, we do not store the raw pointer to an
`externref`'s host object inside the GC heap. Instead an `externref` now
stores an ID that can be used to index into a side table in the store that
holds the actual `Box<dyn Any>` host object, and accessing that side table is
always checked.
[^compressed-oops]: See ["Compressed OOPs" in
OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)
[^v8-ptr-compression]: See [V8's pointer
compression](https://v8.dev/blog/pointer-compression).
The good news with regards to all the bounds checking that this scheme implies
is that we can use all the same virtual memory tricks that linear memories use
to omit explicit bounds checks. Additionally, (2) means that the sizes of GC
objects are that much smaller (and therefore that much more cache friendly)
because they are only holding onto 32-bit, rather than 64-bit, references to
other GC objects. (We can, in the future, support GC heaps up to 16GiB in size
without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment
and storing aligned indices rather than byte indices, while still leaving the
bottom bit available for tagging as an `i31ref` discriminant. Should we ever
need to support even larger GC heap capacities, we could go to full 64-bit
references, but we would need explicit bounds checks.)
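To make invariant (2) concrete, here is a minimal sketch, using hypothetical types rather than Wasmtime's actual `VMGcRef` API, of what dereferencing a 32-bit GC reference amounts to under this scheme:

```rust
/// Hypothetical 32-bit GC reference: an offset into the GC heap, never a raw pointer.
#[derive(Clone, Copy)]
struct GcIndex(u32);

/// The single contiguous region of memory backing a GC heap.
struct GcHeapRegion {
    base: *mut u8,
    len: usize,
}

impl GcHeapRegion {
    /// Bounds-checked conversion of an index into a temporary raw pointer,
    /// exactly analogous to a linear-memory access: add a base, check the bound.
    fn deref(&self, gc_ref: GcIndex, access_size: usize) -> Option<*mut u8> {
        let offset = gc_ref.0 as usize;
        if offset.checked_add(access_size)? > self.len {
            return None; // out of bounds: treated as a GC heap trap
        }
        Some(unsafe { self.base.add(offset) })
    }
}
```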
The biggest benefit of indexed heaps is that, because we are (explicitly or
implicitly) bounds checking GC heap accesses, and because we are not otherwise
trusting any data from inside the GC heap, we greatly reduce how badly things
can go wrong in the face of collector bugs and GC heap corruption. We are
essentially sandboxing the GC heap region, the same way that linear memory is a
sandbox. GC bugs could lead to the guest program accessing the wrong GC object,
or getting garbage data from within the GC heap. But only garbage data from
within the GC heap, never outside it. The worst that could happen would be if we
decided not to zero out GC heaps between reuse across stores (which is a valid
trade off to make, since zeroing a GC heap is a defense-in-depth technique
similar to zeroing a Wasm stack and not semantically visible in the absence of
GC bugs) and then a GC bug would allow the current Wasm guest to read old GC
data from the old Wasm guest that previously used this GC heap. But again, it
could never access host data.
Taken altogether, this allows for collector implementations that are nearly free
from `unsafe` code, and unsafety can otherwise be targeted and limited in scope,
such as interactions with JIT code. Most importantly, we do not have to maintain
critical invariants across the whole system -- invariants which can't be nicely
encapsulated or abstracted -- to preserve memory safety. Such holistic
invariants that refuse encapsulation are otherwise generally a huge safety
problem with GC implementations.
### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here
was to be sure that I was actually calling GC barriers at all the correct
places. I couldn't be sure before. Now, you can still explicitly copy a raw GC
reference without running GC barriers if you need to and understand why that's
okay (aka you are implementing the collector), but that is something you have to
opt into explicitly by calling `unchecked_copy`. The default now is that you
can't just copy the reference; you must instead call an explicit `clone` method
(not *the* `Clone` trait, because we need to pass in the GC heap context to run
the GC barriers), so it is hard to forget to do that accidentally. This resulted in
a pretty big amount of churn, but I am wayyyyyy more confident that the correct
GC barriers are called at the correct times now than I was before.
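A simplified, hypothetical sketch of the resulting API shape (the real methods take Wasmtime's actual GC heap context and run real barriers):

```rust
struct VMGcRef(u32); // deliberately neither `Clone` nor `Copy`

struct GcHeapContext; // stand-in for the real GC heap / barrier context

impl VMGcRef {
    /// The normal way to duplicate a reference: takes the heap context so the
    /// collector's clone barrier can run.
    fn clone(&self, heap: &mut GcHeapContext) -> VMGcRef {
        let _ = heap; // a real collector would run its GC barrier here
        VMGcRef(self.0)
    }

    /// Escape hatch for collector internals only: copy without running barriers.
    fn unchecked_copy(&self) -> VMGcRef {
        VMGcRef(self.0)
    }
}
```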
### `i31ref`
I started this commit by trying to add `i31ref` support. And it grew into the
whole traits interface because I found that I needed to abstract GC barriers
into helpers anyways to avoid running them for `i31ref`s, so I figured that I
might as well add the whole traits interface. In comparison, `i31ref` support is
much easier and smaller than that other part! But it was also difficult to pull
apart from this commit, sorry about that!
---------------------
Overall, I know this is a very large commit. I am super happy to have some
synchronous meetings to walk through this all, give an overview of the
architecture, answer questions directly, etc... to make review easier!
prtest:full
* Run all `*.wast` tests in fuzzing
Currently we have a `spectest` fuzzer which uses fuzz input to generate
an arbitrary configuration for Wasmtime and then executes the spec test.
This ensures that no matter the configuration Wasmtime can pass spec
tests. This commit expands this testing to include all `*.wast` tests we
have in this repository. While we don't have a ton, we still have some
significant ones, like the one in #8118, which will only reproduce when
turning knobs on CPU features.
* Fix CLI build
* Fix wast testing
* Update some CI dependencies
* Update to the latest nightly toolchain
* Update mdbook
* Update QEMU for cross-compiled testing
* Update `cargo nextest` for usage with MIRI
prtest:full
* Remove lots of unnecessary imports
* Downgrade qemu as 8.2.1 seems to segfault
* Remove more imports
* Remove unused winch trait method
* Fix warnings about unused trait methods
* More unused imports
* More unused imports
First of all, thread a "chaos mode" control-plane into Context::optimize
and from there into EgraphPass, OptimizeCtx, and Elaborator.
In this commit we use the control-plane to change the following
behaviors in ways which shouldn't cause incorrect results:
- Dominator-tree block traversal order for both the rule application and
elaboration passes
- Order of evaluating optimization alternatives from `simplify`
- Choose worst values instead of best in each eclass
Co-authored-by: L. Pereira <lpereira@fastly.com>
This commit fully enables usage of Winch in the `differential` fuzzer
against all other engines with no special cases. I attempted enabling
Winch for the other fuzzers as well, but Winch doesn't currently
implement all the methods required for generating various trampolines,
so it's limited to the `differential` fuzzer for now.
This adds Winch as an "engine" and additionally ensures that, when
Winch is configured, the wasm proposals it doesn't support are disabled
(similar to how enabling `wasmi` disables proposals that `wasmi`
doesn't support).
This does reduce fuzzing of Winch slightly in that the reference-types
proposal is completely disabled for Winch rather than half-enabled where
Winch doesn't implement `externref` operations yet but does implement
`funcref` operations. This, however, enables integrating it more cleanly
into the rest of the fuzzing infrastructure with fewer special cases.
* Update the wasm-tools family of crates
Pulling in some updates to improve how WIT is managed in this
repository. No changes just yet, however, just pulling in the updates
first.
* Fix tests
* Fix fuzzer build
* winch: Add support for WebAssembly loads/stores
Closes https://github.com/bytecodealliance/wasmtime/issues/6529
This patch adds support for all the instructions involving WebAssembly
loads and stores for 32-bit memories. Given that the `memory64` proposal
is not enabled by default, this patch doesn't include an
implementation/tests for it; in theory minimal tweaks to the
current implementation will be needed in order to support 64-bit
memories.
Implementation-wise, this change follows a similar pattern as Cranelift
to calculate addresses for dynamic/static heaps, the main difference
being that in some cases doing less work at compile time is preferred;
for example, the current implementation only checks for the general case
of out-of-bounds access for dynamic heaps.
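As a rough illustration (plain Rust with made-up names, not the actual emitted machine code), the "general case" check for a dynamic-heap access boils down to something like:

```rust
/// Hypothetical helper showing the shape of the bounds check for dynamic
/// heaps: compare the end of the access against the current heap bound and
/// trap before computing the effective address.
fn checked_heap_addr(
    heap_base: usize,
    heap_bound: u64,  // current byte length of the memory, loaded at runtime
    index: u64,       // the Wasm address operand
    offset: u64,      // the static offset immediate
    access_size: u64, // 1, 2, 4, or 8 bytes
) -> Result<usize, &'static str> {
    let end = index
        .checked_add(offset)
        .and_then(|n| n.checked_add(access_size))
        .ok_or("out of bounds memory access")?;
    if end > heap_bound {
        return Err("out of bounds memory access");
    }
    Ok(heap_base + index as usize + offset as usize)
}
```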
Another important detail regarding the implementation is the
introduction of `MacroAssembler::wasm_load` and
`MacroAssembler::wasm_store`, which internally use a common
implementation for loads and stores, the only difference being that the
`wasm_*` variants set the right flags to signal that these operations
are not trusted and might trap.
Finally, given that this change introduces support for the last set of
instructions missing for a Wasm MVP, it removes most of Winch's copy of
the spectest suite, and switches over to using the official test suite
where possible (for tests that don't use SIMD or Reference Types).
Follow-up items:
* Before doing any deep benchmarking I'm planning on landing a couple of
improvements regarding compile times that I've identified in parallel
to this change.
* The `imports.wast` tests are disabled because I've identified a bug
with `call_indirect`, which is not related to this change and exists
in main.
* Find a way to run the `tests/all/memory.rs` (or perhaps most of
integration tests) with Winch.
--
prtest:full
* Review comments
* winch: Multi-Value Part 2: Blocks
This commit adds support for the Multi-Value proposal for blocks.
In general, this change introduces multiple building blocks to enable
supporting arbitrary params and results in blocks:
* `BlockType`: Introduce a block type to categorize each block; this
makes it easier to handle blocks per type and also makes it possible to
defer the calculation of the `ABIResults` until they are actually
needed, rather than calculating everything upfront even though it might
not be needed (when in an unreachable state).
* Push/pop operations are now frame aware. Given that each
`ControlStackFrame` contains all the information needed regarding
params and results, this change moves the implementation of the
push and pop operations to the `ControlStackFrame` struct.
* `StackState`: this struct holds the entry and exit invariants of each
block; these invariants are pre-computed when entering the block and
used throughout code generation to handle params and results and to
assert the respective invariants.
In terms of the mechanics of the implementation: when entering each
block, if there are results on the stack, the expected stack pointer
offsets will be calculated via the `StackState`, and the `target_offset`
will be used to create the block's `RetArea`. Note that when entering
the block and calculating the `StackState`, no space is actually reserved
for the results; any increase in stack space is deferred until the
results are popped from the value stack via
`ControlStackFrame::pop_abi_results`.
The trickiest bit of the implementation is handling constant values that
need to be placed in the right location on the machine stack. Given that
constants are generally not spilled, in order to keep the machine and
value stacks in sync (spilled-values-wise), values must be shuffled to
ensure that constants end up in the locations expected for the results.
See the comment in `ControlStackFrame::adjust_stack_results` for more
details.
* Review fixes
* winch: Add memory instructions
This commit adds support for the following memory instructions to winch:
* `data.drop`
* `memory.init`
* `memory.fill`
* `memory.copy`
* `memory.size`
* `memory.grow`
In general the implementation is similar to how other builtin-based
instructions are handled (e.g. table instructions), which involves stack
manipulation prior to emitting a builtin function call, with the
exception of `memory.size`, which involves loading the current length
from the `VMContext`.
* Emit right shift instead of division to obtain the memory size in pages
This commit tightens the fuzzing criteria for Winch. The previous
implementation only accounted for unsupported instructions. However,
unsupported types can also cause the fuzzer to crash.
Winch currently doesn't support `v128` and most of the `Ref` types.
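A rough sketch of the kind of type predicate this tightening implies (using a simplified, hypothetical value-type enum rather than the fuzzer's real types):

```rust
/// Hypothetical, simplified value-type enum standing in for the real one.
enum Ty {
    I32,
    I64,
    F32,
    F64,
    V128,
    FuncRef,
    ExternRef,
}

/// Conservative check for whether Winch can currently compile a given type.
fn winch_supports_type(ty: &Ty) -> bool {
    match ty {
        Ty::I32 | Ty::I64 | Ty::F32 | Ty::F64 => true,
        // v128 and most reference types are not supported yet, so skip modules
        // whose signatures mention them.
        Ty::V128 | Ty::FuncRef | Ty::ExternRef => false,
    }
}
```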
* Configure Rust lints at the workspace level
This commit adds necessary configuration knobs to have lints configured
at the workspace level in Wasmtime rather than the crate level. This
uses a feature of Cargo first released with 1.74.0 (last week): the
`[workspace.lints]` table. This should help create a more consistent set
of lints applied across all crates in our workspace in addition to
possibly running select clippy lints on CI as well.
* Move `unused_extern_crates` to the workspace level
This commit configures a `deny` lint level for the
`unused_extern_crates` lint to the workspace level rather than the
previous configuration at the individual crate level.
* Move `trivial_numeric_casts` to workspace level
* Change workspace lint levels to `warn`
CI will ensure that these don't get checked into the codebase and
otherwise provide fewer speed bumps for in-process development.
* Move `unstable_features` lint to workspace level
* Move `unused_import_braces` lint to workspace level
* Start running Clippy on CI
This commit configures our CI to run `cargo clippy --workspace` for all
merged PRs. Historically this hasn't been all that feasible due to the
amount of configuration required to control the number of warnings on
CI, but with Cargo's new `[lints]` table it's possible to have a
one-liner to silence all lints from Clippy by default. This commit sets
the `all` lint in Clippy to `allow` to disable warnings from Clippy by
default. The goal of this PR is to enable selective access
to Clippy lints for Wasmtime on CI.
* Selectively enable `clippy::cast_sign_loss`
This would have fixed #7558, so try to head off future issues with that
by warning against this situation in a few crates. This lint is still
quite noisy for Cranelift, for example, so it's not worthwhile at
this time to enable it for the whole workspace.
* Fix CI error
prtest:full
* Update to arbitrary 1.3.1
And use workspace dependencies for arbitrary.
* Prune cargo vet's supply-chain files
These are the mechanical changes made by running `cargo vet prune`, which was
suggested to me when I ran `cargo vet`.
* winch(x64): Add support for table instructions
This change adds support for the following table instructions:
`elem.drop`, `table.copy`, `table.set`, `table.get`, `table.fill`,
`table.grow`, `table.size`, `table.init`.
This change also introduces partial support for the `Ref` WebAssembly
type, more concretely the `Func` heap type, which means that all the
table instructions above only work with this WebAssembly type as of this
change.
Finally, this change is also a small follow up to the primitives
introduced in https://github.com/bytecodealliance/wasmtime/pull/7100,
more concretely:
* `FnCall::with_lib`: tracks the presence of a libcall and ensures that
any result registers are freed right when the call is emitted.
* `MacroAssembler::table_elem_addr` returns an address rather than the
value of the address, making it convenient for other use cases like
`table.set`.
--
prtest:full
* chore: Make stack functions take impl IntoIterator<..>
* Update winch/codegen/src/codegen/call.rs
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Remove a dangling `dbg!`
* Add comment on branching
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* winch(x64): Call indirect
This change adds support for the `call_indirect` instruction to Winch.
Libcalls are a prerequisite for supporting `call_indirect` in order to
lazily initialize funcrefs. This change adds support for libcalls to
Winch by introducing a `BuiltinFunctions` struct similar to Cranelift's
`BuiltinFunctionSignatures` struct.
In general, libcalls are handled like any other function call, the only
difference being that, since not all the information needed to fulfill
the function call might be known up-front, control is given to the
caller for finalizing the call.
The introduction of function references also involves dealing with
pointer-sized loads and stores, so this change also adds the required
functionality to `FuncEnv` and `MacroAssembler` to be pointer aware,
making it straightforward to derive an `OperandSize` or `WasmType` from
the target's pointer size.
Finally, given the complexity of the `call_indirect` instruction, this
change bundles an improvement to the register allocator, allowing it to
track allocatable vs. non-allocatable registers; this is done to
avoid any mistakes when allocating/de-allocating registers that are not
allocatable.
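A rough sketch of the allocatable-mask idea (hypothetical field and method names, not Winch's actual `RegSet`):

```rust
/// Hypothetical register set: one bit per register, plus a mask of which
/// registers the allocator is ever allowed to hand out.
struct RegSet {
    available: u64,   // bit set => register currently free
    allocatable: u64, // bit set => register may be allocated at all
}

impl RegSet {
    fn request(&mut self, reg: u8) -> bool {
        let bit = 1u64 << reg;
        // Non-allocatable registers (e.g. the stack pointer or a pinned
        // scratch register) are never handed out, even if marked "free".
        if self.allocatable & bit == 0 || self.available & bit == 0 {
            return false;
        }
        self.available &= !bit;
        true
    }

    fn release(&mut self, reg: u8) {
        let bit = 1u64 << reg;
        debug_assert!(self.allocatable & bit != 0, "released a non-allocatable register");
        self.available |= bit;
    }
}
```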
--
prtest:full
* Address review comments
* Fix typos
* Better documentation for `new_unchecked`
* Introduce `max` for `BitSet`
* Make allocatable property `u64`
* winch(calls): Overhaul `FnCall`
This commit simplifies `FnCall`'s interface, making its usage more
uniform throughout the compiler. In summary, this change:
* Avoids side effects in the `FnCall::new` constructor, and also makes
it the only constructor.
* Exposes `FnCall::save_live_registers` and
`FnCall::calculate_call_stack_space` to calculate the stack space
consumed by the call, so that the caller can decide which one to
use at callsites depending on their use-case.
* tests: Fix regset tests
* winch: Support f32.abs and f64.abs on x64
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add an implementation of f32.neg and f64.neg
* Enable spec tests for winch with f{32,64}.{neg,abs}
* Enable differential fuzzing for f{32,64}.{neg,abs} for winch
* Comments from code review
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* winch: Add support for `br_table`
This change adds support for the `br_table` instruction, including
several modifications to the existing control flow implementation:
* Improved handling of jumps to loops: Previously, the compiler erroneously
treated the result of loop blocks as the definitive result of the jump. This
change fixes this bug.
* Streamlined result handling and stack pointer balancing: In the past, these
operations were executed in two distinct steps, complicating the process of
ensuring the correct invariants when emitting unconditional jumps. To simplify
this, `CodeGenContext::unconditional_jump` is introduced. This function
guarantees all necessary invariants are met, encapsulating the entire operation
within a single function for easier understanding and maintenance.
* Handling of unreachable state at the end of a function: when reaching the end
of a function in an unreachable state, clear the stack and ensure that the
machine stack pointer is correctly placed according to the expectations of the
outermost block.
In addition to the above refactoring, the main implementation of the
`br_table` instruction involves emitting labels for each target. Within each
label, an unconditional jump is emitted to the frame's label, ensuring correct
stack pointer balancing when the jump is emitted.
While it is possible to optimize this process by avoiding intermediate labels
when balancing isn't required, I've opted to maintain the current
implementation until such optimization becomes necessary.
* chore: Rust fmt
* fuzzing: Add `BrTable` to list of supported instructions
* docs: Improve documentation for `unconditional_jump`
* Fix some warnings on nightly Rust
* Fix some more fuzz-test cases from pooling changes
This commit addresses some more fallout from #6835 by updating some
error messages and adding clauses for new conditions. Namely:
* Module compilation is now allowed to fail when the module may have
more memories/tables than the pooling allocator allows per-module.
* The error message for the core instance limit being reached has been
updated.
* winch: Initial support for floats
This change introduces the necessary building blocks to support floats in
Winch as well as support for both `f32.const` and `f64.const` instructions.
To achieve support for floats, this change adds several key enhancements to the
compiler:
* Constant pool: A constant pool is implemented, at the Assembler level, using the machinery
exposed by Cranelift's `VCode` and `MachBuffer`. Float immediates are stored
using their bit representation in the value stack, and whenever they are
used at the MacroAssembler level they are added to the constant pool;
from that point on, they are referenced through a `Constant` addressing
mode, which gets translated to a RIP-relative addressing mode during emission.
* More precise value tagging: aside from immediates, from which the type can
be easily inferred, all the other value stack entries (`Memory`, `Reg`, and `Local`) are
modified to explicitly contain a WebAssembly type. This allows for better
instruction selection.
--
prtest:full
* fix: Account for relative sp position when pushing float regs
This was an oversight of the initial implementation. When pushing float
registers, always return an address that is relative to the current position of
the stack pointer, essentially storing to (%rsp). The previous implementation
accounted for static addresses, which is not correct.
* fix: Introduce `stack_arg_slot_size_for_type`
This correctly calculates the stack argument slot sizes instead of
overallocating `word_bytes`, since for `f32` floating-point values we only need
to worry about loading/storing 4 bytes.
* fix: Correctly type the result register.
The previous version wrongly typed the register as a general purpose register.
* refactor: Re-write `add_constants` through `add_constant`
* docs: Replace old comment
* chore: Rust fmt
* refactor: Index regset per register class
This commit implements `std::ops::{Index, IndexMut}` for `RegSet` to index each
of the bitsets by class. This reduces boilerplate and repetition throughout the
code generation context, register allocator, and register set.
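An illustrative sketch of that impl (simplified stand-in types, not the actual Winch definitions):

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical register classes and per-class bitsets.
#[derive(Clone, Copy)]
enum RegClass {
    Int,
    Float,
}

struct BitSet(u64);

struct RegSet {
    int: BitSet,
    float: BitSet,
}

impl Index<RegClass> for RegSet {
    type Output = BitSet;
    fn index(&self, class: RegClass) -> &BitSet {
        match class {
            RegClass::Int => &self.int,
            RegClass::Float => &self.float,
        }
    }
}

impl IndexMut<RegClass> for RegSet {
    fn index_mut(&mut self, class: RegClass) -> &mut BitSet {
        match class {
            RegClass::Int => &mut self.int,
            RegClass::Float => &mut self.float,
        }
    }
}

// Call sites can then write `regset[RegClass::Float]` instead of matching on
// the class at every use.
```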
* refactor: Correctly size callee saved registers
To comply with the expectations of the underlying architecture: for example, in
AArch64 only the low 64 bits of VRegs are callee-saved (the D-view), while in the
`fastcall` calling convention it's expected that the callee saves the entire 128
bits of registers xmm6-xmm15.
This change also fixes the stores/loads of callee-saved float registers in the
fastcall calling convention, as in the previous implementation only the low 64
bits were saved/restored.
* docs: Add comment regarding typed-based spills
* Wasmtime: Rename `IndexAllocator` to `ModuleAffinityIndexAllocator`
We will have multiple kinds of index allocators soon, so clarify which one this
is.
* Wasmtime: Introduce a simple index allocator
This will be used in future commits refactoring the pooling allocator.
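For illustration, a "simple" index allocator in this sense is essentially a free list over a fixed range of slot indices; a minimal sketch (not the actual Wasmtime type) might look like:

```rust
/// Minimal free-list index allocator: hands out indices in `0..capacity` and
/// recycles whatever has been freed.
struct SimpleIndexAllocator {
    next_unused: u32,
    capacity: u32,
    free: Vec<u32>,
}

impl SimpleIndexAllocator {
    fn new(capacity: u32) -> Self {
        Self { next_unused: 0, capacity, free: Vec::new() }
    }

    fn alloc(&mut self) -> Option<u32> {
        if let Some(index) = self.free.pop() {
            return Some(index); // reuse a previously freed slot
        }
        if self.next_unused < self.capacity {
            let index = self.next_unused;
            self.next_unused += 1;
            return Some(index); // hand out a brand new slot
        }
        None // the pool is fully allocated
    }

    fn free(&mut self, index: u32) {
        self.free.push(index);
    }
}
```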
* Wasmtime: refactor the pooling allocator for components
We used to have one index allocator, an index per instance, and give out N
tables and M memories to every instance regardless of how many tables and
memories they need.
Now we have an index allocator for memories and another for tables. An instance
isn't associated with a single index; each of its memories and tables has its
own index. We allocate exactly as many tables and memories as the instance
actually needs.
Ultimately, this gives us better component support, where a component instance
might have varying numbers of internal tables and memories.
Additionally, you can now limit the number of tables, memories, and core
instances a single component can allocate from the pooling allocator, even if
there is the capacity for that many available. This is to give embedders tools
to limit individual component instances and prevent them from hogging too much
of the pooling allocator's resources.
* Remove unused file
Messed up from rebasing, this code is actually just inline in the index
allocator module.
* Address review feedback
* Fix benchmarks build
* Fix ignoring test under miri
The `async_functions` module is not even compiled-but-ignored with miri; it is
completely `cfg`ed off. Therefore we have to do the same with this test that
imports stuff from that module.
* Fix doc links
* Allow testing utilities to be unused
The exact `cfg`s that unlock the tests that use these are platform- and
feature-dependent and end up being like 5 things and super long. It's simpler to
just allow unused for when we are testing on other platforms or don't have the
compile-time features enabled.
* Debug assert that the pool is empty on drop, per Alex's suggestion
Also fix a couple scenarios where we could leak indices if allocating an index
for a memory/table succeeded but then creating the memory/table itself failed.
* Fix windows compile errors
This commit adds support for the `local.tee` instruction. This change
also introduces a refactoring to the original implementation of
`local.set` to be able to share most of the code for the implementation
of `local.tee`.
This change adds support for the `loop`, `br` and `br_if` instructions
as well as unreachable code handling. Whenever an instruction that
affects reachability is emitted (`br` in the case of this PR), the
compiler will enter into an unreachable code state, essentially ignoring
most of the subsequent instructions. When handling the unreachable code
state some instructions are still observed, in order to determine if
reachability should be restored.
This change, particularly the handling of unreachable code, adds all the
necessary building blocks to the compiler to emit other instructions
that affect reachability (e.g. `unreachable`, `return`).
Address review feedback
* Rename `branch_target` to `is_branch_target`
* Use the visitor pattern to handle unreachable code
Avoid string comparison and split unreachable handling functions
* Add i32.popcnt and i64.popcnt to winch
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Add fallback implementation for popcnt
Move popcnt fallback up into the macroassembler.
Share code between 32-bit and 64-bit popcnt
Add Popcnt to winch differential fuzzing
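For reference, the classic SWAR bit-twiddling sequence that such a fallback lowers to, sketched here as plain Rust for the 64-bit case (the 32-bit case uses the same structure with narrower masks), is:

```rust
/// The standard SWAR population count a popcnt fallback typically emits when
/// the hardware instruction is unavailable.
fn popcnt64_fallback(mut x: u64) -> u64 {
    x = x - ((x >> 1) & 0x5555_5555_5555_5555); // per-2-bit counts
    x = (x & 0x3333_3333_3333_3333) + ((x >> 2) & 0x3333_3333_3333_3333); // per-nibble
    x = (x + (x >> 4)) & 0x0f0f_0f0f_0f0f_0f0f; // per-byte
    x.wrapping_mul(0x0101_0101_0101_0101) >> 56 // sum all bytes into the top byte
}

#[test]
fn matches_hardware_popcnt() {
    for v in [0u64, 1, u64::MAX, 0xdead_beef_cafe_f00d] {
        assert_eq!(popcnt64_fallback(v), v.count_ones() as u64);
    }
}
```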
* Use _rr functions where possible
* Avoid using scratch register for popcnt
The scratch register was getting clobbered by the calls to `and`,
so this instead passes in a `CodeGenContext` to the masm's `popcnt`
and lets it handle its own registers.
* Add filetests for the fallback popcnt impls
* address PR comments
* Update filetests
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* winch(x64): Add support for if/else
This change adds the necessary building blocks to support control flow;
this change also adds support for the `If` / `Else` operators.
This change does not include multi-value support. The idea is to add
support for multi-value across the compiler (functions and blocks) as
a separate future change.
The general gist of the change is to track the presence of control flow
frames as part of the code generation context and emit the corresponding
labels and instructions as control flow blocks are found.
* PR review
* Allocate 64 slots for `ControlStackFrames`
* Explicitly track else branches through an else entry in
`ControlStackFrame`