* Fix handling of `Tunables` on cross-compiles
This commit fixes how Wasmtime handles `Tunables` when targeting
non-host platforms (or namely platforms with different pointer widths).
Previously the host's `Tunables` would always be used instead of the
target's tunables which meant that modules couldn't be loaded on the
other platform due to the host having differing tunables by default.
This commit updates tunables in `wasmtime::Config` to all be optional
and loading the actual `Tunables` is deferred until the target is known
during `Engine`-creation time.
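As a rough illustration of the deferral described above, here is a minimal sketch. The field names, the default values, and the `ConfigTunables`, `resolve`, and `default_for_pointer_width` names are all hypothetical simplifications for this example and are not taken from Wasmtime's actual definitions.

```rust
/// Hypothetical, simplified stand-in for the fully resolved settings.
#[derive(Debug, Clone, PartialEq)]
pub struct Tunables {
    pub static_memory_bound: u64,
    pub guard_size: u64,
}

impl Tunables {
    /// Defaults depend on the *target* pointer width, not the host's.
    /// The concrete numbers are illustrative only.
    pub fn default_for_pointer_width(bits: u8) -> Tunables {
        match bits {
            64 => Tunables { static_memory_bound: 1 << 32, guard_size: 1 << 31 },
            _ => Tunables { static_memory_bound: 10 << 20, guard_size: 64 << 10 },
        }
    }
}

/// What the configuration stores: every field is optional, and `None`
/// means "use whatever the target's default turns out to be".
#[derive(Debug, Default, Clone)]
pub struct ConfigTunables {
    pub static_memory_bound: Option<u64>,
    pub guard_size: Option<u64>,
}

impl ConfigTunables {
    /// Called once the target is known: start from the target's defaults
    /// and apply only the explicit overrides.
    pub fn resolve(&self, target_pointer_bits: u8) -> Tunables {
        let mut tunables = Tunables::default_for_pointer_width(target_pointer_bits);
        if let Some(v) = self.static_memory_bound {
            tunables.static_memory_bound = v;
        }
        if let Some(v) = self.guard_size {
            tunables.guard_size = v;
        }
        tunables
    }
}
```

With this shape, an explicit override survives for any target, while unset fields pick up the defaults of the platform being compiled for rather than those of the host.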
* Fix warning
Purely mechanical, no functional changes.
This is to help differentiate between value types (i32, i64, reference types,
etc...) and defined types (function signatures, struct definitions, array
definitions).
* wasmtime: Rename `SignatureFooBar` to `TypeFooBar`
No functional changes, just the following mechanical renames:
* `VMSharedSignatureIndex` to `VMSharedTypeIndex`
* `SignatureIndex` to `TypeIndex`
* `SignatureRegistry` to `TypeRegistry`
* and more
This is intended to start paving the way for Wasm GC support, where there are
more than just function signatures in a Wasm module's type section, and we are
going to need to register non-function-signature types in the registry as well,
for things like casting between reference types and passing reference types
across Wasm modules.
* Reintroduce different index types for module-interned types vs Wasm-index-space types
* Fix a couple unused-import warnings
* Break more data dependencies in float-related instructions
This commit takes a stab at #7816 without diving a whole lot into it. I
noticed that the loop started with `vcvtss2sd` which is along the same
lines as previous false dependencies found earlier in PRs such as #7098.
I had forgotten these instructions at the time and meant to go back and
touch them up and #7731 has provided sufficient motivation to do so!
Locally this takes that test case from 1.6s to 0.4s for me.
* Fix inst emit tests
* Update winch codegen
* Enable all winch tests on windows
prtest:mingw-x64
* Plumb through x64 unwind info creation
* Add the frame regs unwind info
* Emit UnwindInfo::SaveReg instructions
* Review feedback
* Comment the offset_downward_to_clobbers value
* Add stack overflow tests
* Add stack overflow tests for indirect calls
* Check for stack overflow on function entry
* Ignore the call tests on windows, as stack overflows trap
* Bless the winch filetests
* Omit instruction offsets in winch disassembly when possible
Only emit instruction offsets at basic block boundaries, or for all
instructions after a return. The reason behind the latter is that many
of the instructions after a return are traps, and will be common jump
targets.
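The offset rule above can be sketched as follows. The `Inst` struct and `render` function are hypothetical and only model the decision of when to print an offset; they are not the filetest infrastructure's actual code.

```rust
/// Hypothetical, simplified view of one disassembled instruction.
pub struct Inst {
    pub offset: u32,
    pub text: &'static str,
    /// True when this instruction begins a basic block.
    pub starts_block: bool,
    /// True when this instruction is a return.
    pub is_return: bool,
}

/// Print the offset only at basic block boundaries, or for every
/// instruction that follows a return (those are mostly traps and are
/// likely jump targets).
pub fn render(insts: &[Inst]) -> Vec<String> {
    let mut seen_return = false;
    let mut out = Vec::new();
    for inst in insts {
        if inst.starts_block || seen_return {
            out.push(format!("{:>4x}: {}", inst.offset, inst.text));
        } else {
            // Keep the mnemonic column aligned with offset-bearing lines.
            out.push(format!("      {}", inst.text));
        }
        if inst.is_return {
            seen_return = true;
        }
    }
    out
}
```

Omitting most offsets keeps the expected output stable when unrelated instructions change size, which is presumably the motivation for this change.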
* Update winch tests
* Configure full offset output in the `disasm` function
* winch: Multi-Value Part 2: Blocks
This commit adds support for the Multi-Value proposal for blocks.
In general, this change introduces multiple building blocks to enable
supporting arbitrary params and results in blocks:
* `BlockType`: Introduce a block type, to categorize the type of each
block, this makes it easier to categorize blocks per type and also
makes it possible to defer the calculation of the `ABIResults` until
they are actually needed rather than calculating everything upfront
even though they might not be needed (when in an unreachable state).
* Push/pop operations are now frame aware. Given that each
`ControlStackFrame` contains all the information needed regarding
params and results, this change moves the implementation of the
push and pop operations to the `ControlStackFrame` struct.
* `StackState`: this struct holds the entry and exit invariants of each
block; these invariants are pre-computed when entering the block and
used throughout the code generation, to handle params, results and
assert the respective invariants.
In terms of the mechanics of the implementation: when entering each
block, if there are results on the stack, the expected stack pointer
offsets will be calculated via the `StackState`, and the `target_offset`
will be used to create the block's `RetArea`. Note that when entering
the block and calculating the `StackState` no space is actually reserved
for the results; any space increase in the stack is deferred until the
results are popped from the value stack via
`ControlStackFrame::pop_abi_results`.
The trickiest bit of the implementation is handling constant values that
need to be placed on the right location on the machine stack. Given that
constants are generally not spilled, this means that in order to keep
the machine and value stack in sync (spilled-values-wise), values must
be shuffled to ensure that constants are placed in the expected location
results-wise.
See the comment in `ControlStackFrame::adjust_stack_results` for more
details.
* Review fixes
* winch: Add memory instructions
This commit adds support for the following memory instructions to winch:
* `data.drop`
* `memory.init`
* `memory.fill`
* `memory.copy`
* `memory.size`
* `memory.grow`
In general the implementation is similar to how other instructions
implemented via builtins are handled (e.g. table instructions), which
involves stack manipulation prior to emitting a builtin function call,
with the exception of `memory.size`, which involves loading the current
length from the `VMContext`.
* Emit right shift instead of division to obtain the memory size in pages
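Since a WebAssembly page is 64 KiB (2^16 bytes), dividing an unsigned byte length by the page size is the same as shifting it right by 16. A minimal sketch of that equivalence, with hypothetical function names:

```rust
/// WebAssembly pages are 64 KiB, i.e. 2^16 bytes.
pub const WASM_PAGE_SIZE: u64 = 1 << 16;
pub const WASM_PAGE_SHIFT: u32 = 16;

/// Memory size in pages computed with a division.
pub fn pages_by_division(bytes: u64) -> u64 {
    bytes / WASM_PAGE_SIZE
}

/// Same result computed with a right shift, which is what the commit
/// above emits instead of a division.
pub fn pages_by_shift(bytes: u64) -> u64 {
    bytes >> WASM_PAGE_SHIFT
}
```

The two agree for every unsigned input because the divisor is a power of two.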
This commit reworks the `br_table` logic so that it correctly handles
all the jumps involved to each of the targets.
Even though it is safe to use the default branch for type information,
it is not safe to use it to derive the base stack pointer and base value
stack length. This change ensures that each target offset is taken into
account to balance the value stack prior to each jump.
Follow up to:
https://github.com/bytecodealliance/wasmtime/pull/7547
In which I overlooked this change and the fuzzer found an issue with the
following program:
```wat
(module
(func (export "") (result i32)
block (result i32)
i32.const 0
end
i32.const 0
i32.const 0
br_table 0
)
)
```
This commit ensures that the stack pointer is correctly positioned when
emitting br_table.
We can't know for sure which branch will be taken, but since all
branches must share the same type information, we can be certain that
the expectations regarding the stack pointer are the same and thus can
we can use the default target in order to ensure the correct placement.
* Winch: cleanup stack in br_if in non-fallthrough case
* Remove unnecessary refetch of sp_offsets
* Refactoring based on PR feedback
* Have SPOffset implement Ord
This commit updates to the latest wasm-tools and `wit-bindgen` to bring
the family of crates forward. This update notably includes Nick's work
on packed indices in the `wasmparser` crate for validation for the
upcoming implementation of GC types. This meant that translation from
`wasmparser` types to Wasmtime types now may work with a "type id"
instead of just a type index which required plumbing not only Wasmtime's
own type information but additionally `wasmparser`'s type information
throughout translation.
This required a fair bit of refactoring to get this working but no
change in functionality is intended, only a different way of doing
everything prior.
* Winch: fix bug by spilling when calling a func
* Forgot to commit new filetest
* Only support WasmHeapType::Func
* Elaborate on call_indirect jump details
* Update docs for call
* Verify stack is only consts and memory entries
* Configure Rust lints at the workspace level
This commit adds necessary configuration knobs to have lints configured
at the workspace level in Wasmtime rather than the crate level. This
uses a feature of Cargo first released with 1.74.0 (last week) of the
`[workspace.lints]` table. This should help create a more consistent set
of lints applied across all crates in our workspace in addition to
possibly running select clippy lints on CI as well.
* Move `unused_extern_crates` to the workspace level
This commit configures a `deny` lint level for the
`unused_extern_crates` lint to the workspace level rather than the
previous configuration at the individual crate level.
* Move `trivial_numeric_casts` to workspace level
* Change workspace lint levels to `warn`
CI will ensure that these don't get checked into the codebase and
otherwise provide fewer speed bumps for in-process development.
* Move `unstable_features` lint to workspace level
* Move `unused_import_braces` lint to workspace level
* Start running Clippy on CI
This commit configures our CI to run `cargo clippy --workspace` for all
merged PRs. Historically this hasn't been all that feasible due to the
amount of configuration required to control the number of warnings on
CI, but with Cargo's new `[lints]` table it's possible to have a
one-liner to silence all lints from Clippy by default. This commit by
default sets the `all` lint in Clippy to `allow` to by-default disable
warnings from Clippy. The goal of this PR is to enable selective access
to Clippy lints for Wasmtime on CI.
* Selectively enable `clippy::cast_sign_loss`
This would have fixed #7558 so try to head off future issues with that
by warning against this situation in a few crates. This lint is still
quite noisy though for Cranelift for example so it's not worthwhile at
this time to enable it for the whole workspace.
* Fix CI error
prtest:full
This commit solidifies the approach for unreachable code handling in
control flow.
Prior to this change, at unconditional jump sites, the compiler would
reset the machine stack as well as the value stack. Even though this
approach might seem natural at first, it actually broke several of the
invariants that must be met at the end of each control block; this was
especially noticeable with programs that conditionally entered an
unreachable state, for example:
```wat
(module
(func (;0;) (param i32) (result i32)
local.get 0
local.get 0
if (result i32)
i32.const 1
return
else
i32.const 2
end
i32.sub
)
(export "main" (func 0))
)
```
The approach followed in this commit ensures that all the invariants are
met and introduces more guardrails around those invariants. In short,
instead of resetting the value stack at unconditional jump sites, the
value stack handling is deferred until the reachability analysis
restores the reachability of the code generation process, ensuring that
the value stack contains the exact amount of values expected by the
frame where reachability is restored. Given that unconditional jumps
reset the machine stack, when the reachability of the code generation
process is restored, the SP offset is also restored which should match
the size of the value stack.
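A loose model of the deferred handling described above is sketched here. The `Frame` and `CodeGen` types and their methods are hypothetical and heavily simplified; in particular, values are represented by string labels and the machine stack is not modeled at all.

```rust
/// Hypothetical, simplified control frame: the value stack length when
/// the block was entered and the number of results it produces.
pub struct Frame {
    pub base_len: usize,
    pub results: usize,
}

pub struct CodeGen {
    pub reachable: bool,
    pub value_stack: Vec<&'static str>,
}

impl CodeGen {
    /// At `return`/`br`: only mark the code as unreachable. The value
    /// stack is deliberately left untouched at the jump site.
    pub fn unconditional_jump(&mut self) {
        self.reachable = false;
    }

    /// At the point where reachability is restored: make the value stack
    /// hold exactly what the frame expects.
    pub fn restore_reachability(&mut self, frame: &Frame) {
        if !self.reachable {
            // Drop anything pushed by the dead code path...
            self.value_stack.truncate(frame.base_len);
            // ...and account for the frame's results.
            for _ in 0..frame.results {
                self.value_stack.push("result");
            }
            self.reachable = true;
        }
        // Guardrail: the invariant must hold on both paths.
        debug_assert_eq!(self.value_stack.len(), frame.base_len + frame.results);
    }
}
```

In the `if`/`else` program above, the `return` in the first arm would mark the code unreachable, and the stack would only be normalized once the frame restores reachability.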
* winch: Introduce `ABIParams` and `ABIResults`
This commit prepares Winch to support WebAssembly Multi-Value.
The most notable piece of this change is the introduction of the
`ABIParams` and `ABIResults` structs which are type wrappers around the
concept of an `ABIOperand`, which is the underlying main representation
of a param or result.
This change also consolidates how the size for WebAssembly types is
derived by introducing `ABI::sizeof`, as well as introducing
`ABI::stack_slot_size` to concretely indicate the stack slot size in
bytes for stack params, which is ABI dependent.
* winch: Add the necessary ABI building blocks for multi-value
This change adds the necessary changes at the ABI level in order to
handle multi-value.
The most notable modifications in this change are:
* Modifying Winch's default ABI to reverse the order of results,
ensuring that results that go in the stack should always come first;
this makes it easier to respect the following two stack invariants:
* Spilled memory values always precede register values
* Spilled values are stored from oldest to newest, matching their
respective locations on the machine stack.
* Modify all calling conventions supported by Winch so that only one
result, the first one, is stored in registers. This differs from their
vanilla counterparts, which can handle multiple results in
registers. Given that Winch is not
a generic code generator, keeping the ABI close to what Wasmtime
expects makes it easier to pass multiple results at trampolines.
* Add more multi-value tests
This commit adds more tests for multi-value and improves documentation.
prtest:full
* Address review feedback
This patch fixes how jumps are handled in `br_table`; prior to this
change, `br_table` was implemented using
`CodeGenContext::unconditional_jump`; this function ensures, among other
invariants, that the value stack and stack pointer are balanced
according to the expectation of the target branch. Even though in
`br_table` there's a branch to a potentially known location, it's
impossible to be certain at compile time which branch will be taken; in
that regard, `br_table` behaves more like `br_if`. Using
`unconditional_jump` resulted in the stack being manipulated multiple
times and breaking the other existing invariants around stack balancing.
This commit makes it so that `br_table` doesn't rely on
`unconditional_jump` anymore and instead it delegates control flow to
the target branch, which will ensure that the value stack and stack
pointer are correctly balanced when restoring reachability, very similar
to what happens with `br_if`.
This issue was discovered while fuzzing and a file test is included with
the test case.
This commit improves unconditional jumps by balancing the stack pointer
as well as the value stack when the current stack pointer and value
stack are greater than the target stack pointer and value stack. The
invariant that this change maintains is that the value stack should
always reflect the state of the machine stack. The value stack might
have excess stack values in the presence of a fallthrough (`br_if` or
`br_table`) in
which the target branch is not known at compile time; in this situation
instructions like `return` or `br` discard any excess values.
This commit properly derives a scratch register for a particular
WebAssembly type. The included spec test uncovered that the previous
implementation used an int scratch register to assign float stack
arguments, which resulted in a panic.
This change is a follow up to https://github.com/bytecodealliance/wasmtime/pull/7443;
after it landed I realized that Winch doesn't include spec tests for
local.get and local.set.
Those tests uncovered a bug in the handling of the constant pool: given
Winch's single-pass nature, there's very little room to know all the
constants ahead of time and to register them all at once at emission
time; instead they are emitted when they are needed by an instruction.
Even though Cranelift's machinery is capable of deduplicating constants
in the pool, `register_constant` assumes and checks that each constant
is only pushed once. In Winch's case, since we emit as we go, we
need to carefully check whether the constant was emitted before,
and register it only if it was not. Otherwise we break the invariant that
each constant should only be registered once.
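The "register only once" check can be sketched with a small lookup table. `ConstantPool`, `ConstId`, and `get_or_register` are hypothetical names for this example and do not mirror Cranelift's or Winch's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical handle to a constant in the pool.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ConstId(pub usize);

/// Hypothetical pool that remembers which constants were already emitted.
#[derive(Default)]
pub struct ConstantPool {
    ids: HashMap<Vec<u8>, ConstId>,
    registered: Vec<Vec<u8>>,
}

impl ConstantPool {
    /// Returns the id for `bytes`, registering the constant only the
    /// first time it is seen. Later requests reuse the existing id, so
    /// the "registered once" invariant holds even when emitting as we go.
    pub fn get_or_register(&mut self, bytes: &[u8]) -> ConstId {
        if let Some(id) = self.ids.get(bytes) {
            return *id;
        }
        let id = ConstId(self.registered.len());
        self.registered.push(bytes.to_vec());
        self.ids.insert(bytes.to_vec(), id);
        id
    }

    /// Number of distinct constants registered so far.
    pub fn registered_count(&self) -> usize {
        self.registered.len()
    }
}
```

Requesting the same float constant from two different instructions then yields the same id and a single registration.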
* Update wasm-tools crates
This commit updates the wasm-tools family of crates for a number of
notable updates:
* bytecodealliance/wasm-tools#1257 - wasmparser's ID-based
infrastructure has been refactored to have more precise types for each
ID rather than one all-purpose `TypeId`.
* bytecodealliance/wasm-tools#1262 - the implementation of
"implementation imports" for the component model which both updates
the binary format in addition to adding more syntactic forms of
imports.
* bytecodealliance/wasm-tools#1260 - a new encoding scheme for component
information for `wit-component` in objects (not used by Wasmtime but
used by bindings generators).
Translation for components needed to be updated to account for the first
change, but otherwise this was a straightforward update.
* Remove a TODO
While not a large amount of binary size, if the purpose of the
`--no-default-features` build is to showcase "minimal Wasmtime" then we may
as well try to make `clap` as small as possible.
* winch: Add a subset of known libcalls and improve call emission
This change is a follow up to:
- https://github.com/bytecodealliance/wasmtime/pull/7155
- https://github.com/bytecodealliance/wasmtime/pull/7035
One of the objectives of this change is to make it easy to emit
function calls at the MacroAssembler layer, for cases in which it's
challenging to know ahead-of-time if a particular functionality can be
achieved natively (e.g. rounding and SSE4.2). The original implementation
of function call emission made this objective difficult to achieve and
it was also difficult to reason about.
I decided to simplify the overall approach to function calls as part of
this PR; in essence, the `call` module now exposes a single function
`FnCall::emit` which is responsible for gathering the dependencies and
orchestrating the emission of the call. This new approach deliberately
avoids holding any state regarding the function call for simplicity.
This change also standardizes the usage of `Callee` as the main
entrypoint for function call emission, as of this change 4 `Callee`
types exist (`Local`, `Builtin`, `Import`, `FuncRef`), each callee kind
is mappable to a `CalleeKind` which is the materialized version of
a callee which Cranelift understands.
This change also moves the creation of the `BuiltinFunctions` to the
`ISA` level given that they can be safely used across multiple function
compilations.
Finally, this change also introduces support for some of the
"well-known" libcalls and hooks those libcalls at the
`MacroAssembler::float_round` callsite.
--
prtest:full
* Review comments
* Remove unnecessary `into_iter`
* Fix remaining lifetime parameter names
* winch(x64): Add support for table instructions
This change adds support for the following table instructions:
`elem.drop`, `table.copy`, `table.set`, `table.get`, `table.fill`,
`table.grow`, `table.size`, `table.init`.
This change also introduces partial support for the `Ref` WebAssembly
type, more concretely the `Func` heap type, which means that all the
table instructions above only work with this WebAssembly type as of this
change.
Finally, this change is also a small follow up to the primitives
introduced in https://github.com/bytecodealliance/wasmtime/pull/7100,
more concretely:
* `FnCall::with_lib`: tracks the presence of a libcall and ensures that
any result registers are freed right when the call is emitted.
* `MacroAssembler::table_elem_addr` returns an address rather than the
value of the address, making it convenient for other use cases like
`table.set`.
--
prtest:full
* chore: Make stack functions take impl IntoIterator<..>
* Update winch/codegen/src/codegen/call.rs
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Remove a dangling `dbg!`
* Add comment on branching
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Handle `lower_branch` consistently amongst backends
This commit is a refactoring to consistently implement `lower_branch`
among Cranelift's backends. Previously each backend had its own means of
extracting labels and shuffling along information, and now there's
prelude methods for all backends to access and use. This changes a few
display impls but the actual meat of what's emitted shouldn't change
amongst the backends.
* Fix compile
* winch(x64): Call indirect
This change adds support for the `call_indirect` instruction to Winch.
Libcalls are a pre-requisite for supporting `call_indirect` in order to
lazily initialize funcrefs. This change adds support for libcalls to
Winch by introducing a `BuiltinFunctions` struct similar to Cranelift's
`BuiltinFunctionSignatures` struct.
In general, libcalls are handled like any other function call, with the
only difference that given that not all the information to fulfill the
function call might be known up-front, control is given to the caller
for finalizing the call.
The introduction of function references also involves dealing with
pointer-sized loads and stores, so this change also adds the required
functionality to `FuncEnv` and `MacroAssembler` to be pointer aware,
making it straightforward to derive an `OperandSize` or `WasmType` from
the target's pointer size.
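A minimal sketch of what "pointer aware" could look like. The enums are cut down to two variants each and the helper names `ptr_operand_size` and `ptr_wasm_type` are hypothetical; only the idea of mapping the target's pointer size to an operand size or Wasm type comes from the text above.

```rust
/// Simplified operand sizes; only the two pointer-relevant ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperandSize {
    S32,
    S64,
}

/// Simplified Wasm value types; only the two integer types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WasmType {
    I32,
    I64,
}

/// Hypothetical helper: operand size for a pointer-sized load or store.
pub fn ptr_operand_size(pointer_bytes: u8) -> OperandSize {
    match pointer_bytes {
        4 => OperandSize::S32,
        8 => OperandSize::S64,
        other => panic!("unsupported pointer size: {} bytes", other),
    }
}

/// Hypothetical helper: the Wasm integer type wide enough for a pointer.
pub fn ptr_wasm_type(pointer_bytes: u8) -> WasmType {
    match ptr_operand_size(pointer_bytes) {
        OperandSize::S32 => WasmType::I32,
        OperandSize::S64 => WasmType::I64,
    }
}
```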
Finally, given the complexity of the `call_indirect` instruction, this
change bundles an improvement to the register allocator, allowing it to
track the allocatable vs non-allocatable registers; this is done to
avoid any mistakes when allocating/de-allocating registers that are not
allocatable.
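One way to picture the allocatable tracking is a pair of bit masks, sketched below. The `RegSet` type and its methods are hypothetical; the only details borrowed from this log are the bit-set representation and the `u64` allocatable mask. In particular, silently ignoring a `free` of a non-allocatable register is a choice made for this sketch.

```rust
/// Hypothetical register set over at most 64 registers.
pub struct RegSet {
    /// Bit `i` set: register `i` is currently free.
    available: u64,
    /// Bit `i` set: register `i` may ever be handed out.
    allocatable: u64,
}

impl RegSet {
    /// All allocatable registers start out free.
    pub fn new(allocatable: u64) -> Self {
        Self { available: allocatable, allocatable }
    }

    /// Hand out the lowest-numbered free register, if any.
    pub fn alloc(&mut self) -> Option<u8> {
        if self.available == 0 {
            return None;
        }
        let reg = self.available.trailing_zeros() as u8;
        self.available &= !(1u64 << reg);
        Some(reg)
    }

    /// Return a register to the set. Registers outside the allocatable
    /// mask (e.g. scratch or frame registers) are ignored, so they can
    /// never end up being handed out by mistake.
    pub fn free(&mut self, reg: u8) {
        assert!(reg < 64, "register index out of range");
        let bit = 1u64 << reg;
        if (self.allocatable & bit) != 0 {
            self.available |= bit;
        }
    }
}
```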
--
prtest:full
* Address review comments
* Fix typos
* Better documentation for `new_unchecked`
* Introduce `max` for `BitSet`
* Make allocatable property `u64`
* winch(calls): Overhaul `FnCall`
This commit simplifies `FnCall`'s interface making its usage more
uniform throughout the compiler. In summary, this change:
* Avoids side effects in the `FnCall::new` constructor, and also makes
it the only constructor.
* Exposes `FnCall::save_live_registers` and
`FnCall::calculate_call_stack_space` to calculate the stack space
consumed by the call and so that the caller can decide which one to
use at callsites depending on their use-case.
* tests: Fix regset tests
This patch refactors all of the ISA/ABI specific prolog/epilog
generation code around the following two ideas:
1. Separate *planning* of the function's frame layout from the
actual *implementation* within prolog / epilog code.
2. No longer overload different purposes (middle-end register
tracking, platform-specific details like authorization modes,
and pop-stack-on-return) into a single return instruction.
As to 1., the new approach is based around a FrameLayout data
structure, which collects all information needed to emit prolog
and epilog code, specifically the list of clobbered registers,
and the sizes of all areas of the function's stack frame.
This data structure is now computed *once*, before any code is
emitted, and stored in the Callee data structure. ABIs need to
implement this via a new compute_frame_layout callback, which
gets all data from common code needed to make all decisions
around stack layout in one place.
The FrameLayout is then used going forward to answer all questions
about frame sizes, and it is passed to all ABI routines involved
in prolog / epilog code generation. [ This removes a lot of
duplicated calculation, e.g. the list of clobbered registers is
now only computed once and re-used everywhere. ]
This in turn allows reducing the number of distinct callbacks
ABIs need to implement, and simplifies common code logic around
how and when to call them. In particular, we now only have the
following four routines, which are always called in this order:
gen_prologue_frame_setup
gen_clobber_save
gen_clobber_restore
gen_epilogue_frame_restore
The main differences to before are:
- frame_setup/restore are now called unconditionally (the target
ABI can look in the FrameLayout to detect the case where no
frame setup is required and skip whatever it thinks appropriate
in that case)
- there is no separate gen_prologue_start; if the target needs
to do anything here, it can now just do it instead in
gen_prologue_frame_setup
- common code no longer attempts to emit a return instruction;
instead the target can do whatever is necessary/optimal in
gen_epilogue_frame_restore
[ In principle we could also just have a single gen_prologue
and gen_epilogue callback - I didn't implement this because
then all the stack checking / probing logic would have to be
moved to target code as well. ]
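The plan-once, then-emit structure can be sketched as below. The callback names come from the text above, but the signatures, the `FrameLayout` fields, the string-based "instructions", and the `ToyAbi` implementation are all hypothetical simplifications and do not match Cranelift's real definitions.

```rust
/// Hypothetical, simplified frame plan computed once per function.
#[derive(Debug, Clone)]
pub struct FrameLayout {
    pub clobbered_regs: Vec<u8>,
    pub setup_area_size: u32,
    pub clobber_size: u32,
    pub fixed_frame_size: u32,
}

/// Simplified ABI interface; instructions are modeled as strings.
pub trait Abi {
    fn compute_frame_layout(&self, clobbers: &[u8], fixed_frame_size: u32) -> FrameLayout;
    fn gen_prologue_frame_setup(&self, layout: &FrameLayout) -> Vec<String>;
    fn gen_clobber_save(&self, layout: &FrameLayout) -> Vec<String>;
    fn gen_clobber_restore(&self, layout: &FrameLayout) -> Vec<String>;
    fn gen_epilogue_frame_restore(&self, layout: &FrameLayout) -> Vec<String>;
}

/// Common code: the callbacks are always invoked, in a fixed order.
pub fn emit_prologue<A: Abi>(abi: &A, layout: &FrameLayout) -> Vec<String> {
    let mut code = abi.gen_prologue_frame_setup(layout);
    code.extend(abi.gen_clobber_save(layout));
    code
}

pub fn emit_epilogue<A: Abi>(abi: &A, layout: &FrameLayout) -> Vec<String> {
    let mut code = abi.gen_clobber_restore(layout);
    code.extend(abi.gen_epilogue_frame_restore(layout));
    code
}

/// Toy target used only to exercise the interface.
pub struct ToyAbi;

impl Abi for ToyAbi {
    fn compute_frame_layout(&self, clobbers: &[u8], fixed_frame_size: u32) -> FrameLayout {
        let needs_frame = !clobbers.is_empty() || fixed_frame_size > 0;
        FrameLayout {
            clobbered_regs: clobbers.to_vec(),
            setup_area_size: if needs_frame { 16 } else { 0 },
            clobber_size: 8 * clobbers.len() as u32,
            fixed_frame_size,
        }
    }

    fn gen_prologue_frame_setup(&self, layout: &FrameLayout) -> Vec<String> {
        // The target inspects the layout and skips setup when not needed.
        if layout.setup_area_size == 0 {
            return Vec::new();
        }
        vec!["push fp".to_string(), "mov fp, sp".to_string()]
    }

    fn gen_clobber_save(&self, layout: &FrameLayout) -> Vec<String> {
        layout.clobbered_regs.iter().map(|r| format!("save r{}", r)).collect()
    }

    fn gen_clobber_restore(&self, layout: &FrameLayout) -> Vec<String> {
        layout.clobbered_regs.iter().rev().map(|r| format!("restore r{}", r)).collect()
    }

    fn gen_epilogue_frame_restore(&self, layout: &FrameLayout) -> Vec<String> {
        // The target, not common code, decides how to return.
        let mut code = Vec::new();
        if layout.setup_area_size != 0 {
            code.push("pop fp".to_string());
        }
        code.push("ret".to_string());
        code
    }
}
```

Because every callback receives the same precomputed `FrameLayout`, none of them needs to recompute the clobber list or the frame sizes.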
As to 2., currently targets are required to implement a single
"Ret" return instruction. This is initially used during
register allocation to hold a list of return preg/vreg pairs.
During epilog emission, this is replaced by another copy of
the same "Ret" instruction that now carries various platform
specific data (e.g. authorization modes on aarch64), and is
also overloaded to handle the case where the ABI requires
that a number of bytes are popped during return.
This is a bit unfortunate in that it blows up the size of
the instruction data, and also forces targets (that do not
have a "ret N" instruction like Intel) into duplicated and
possibly sub-optimal implementations of stack adjustment
during low-level emission of the return instruction.
The new approach separates these concerns. Initially, common
code emits a new "Rets" instruction that is completely parallel
to the existing "Args", and is used only during register
allocation holding the preg/vreg pairs. That instruction
-like now- is replaced during epilog emission - but unlike
now the replacement is now completely up to the target, which
can do whatever it needs in gen_epilogue_frame_restore.
This would typically emit some platform-specific low-level
"Ret" instruction instead of the regalloc "Rets". It also
allows non-Intel targets to just create a normal (or even
optimized) stack adjustment sequence before its low-level "Ret".
[ In particular, on riscv64 pop-stack-before-return currently
emits two distinct stack adjustment instructions immediately
after one another. These could now be easily merged, but that's
not yet done in this patch. ]
No functional change intended on any target.