This commit removes an assert that checked that the stack pointer
position at the end of a call is greater than or equal to the
position registered at the callsite.
Even though this is true in most cases, there are cases in which this
invariant is not met, as well as cases in which the stack pointer will
inevitably be greater than the position registered at the callsite:
1. When the call setup doesn't spill any values and instead it
only consumes memory values from the value stack, the stack pointer
can end up being less than what it was at the callsite.
2. When the call setup spills values that are not going to be consumed
by the call (not used as params to the function) the stack pointer
position can end up being greater than what it was at the callsite.
The assert was originally introduced to ensure the right deallocation of
stack space consumed by the call, and it could be improved by applying
the heuristics mentioned above, but I prefer to remove it since we
already assert when emitting the epilogue that both the value stack and
machine stack are in the correct state when finishing compilation.
This change includes an extra test in which the original invariant
doesn't hold (case 2 described above occurs).
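The two cases above can be illustrated with a toy model. This is only a sketch: the `Entry` type, the `SLOT` size and the `sp_after_call` helper are hypothetical and are not Winch's actual data structures. It assumes that non-argument register values are spilled during call setup and that memory slots consumed as arguments are released after the call.

```rust
/// Size in bytes of one machine-stack slot in this toy model (an assumption).
const SLOT: u32 = 8;

/// Hypothetical, simplified location of a value-stack entry.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Entry {
    /// The value lives in a register.
    Reg,
    /// The value lives in a machine-stack slot.
    Mem,
}

/// Returns the stack pointer offset after a call. `stack` is the value stack
/// (top of stack last) and the last `params` entries are the call arguments.
/// Assumes `params <= stack.len()` and a consistent `sp_at_callsite`.
fn sp_after_call(stack: &[Entry], params: usize, sp_at_callsite: u32) -> u32 {
    let (live, args) = stack.split_at(stack.len() - params);
    let mut sp = sp_at_callsite;
    // Call setup: register values that are not arguments get spilled.
    sp += SLOT * live.iter().filter(|e| **e == Entry::Reg).count() as u32;
    // After the call: memory slots consumed as arguments are released.
    sp -= SLOT * args.iter().filter(|e| **e == Entry::Mem).count() as u32;
    sp
}
```

With two memory arguments and nothing to spill the offset ends up lower than at the callsite (case 1); with one spilled non-argument register it ends up higher (case 2).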
* Remove Wasmtime ABIs from Cranelift
This commit removes the `Wasmtime*` family of ABIs from Cranelift. These
were originally added to support multi-value in Wasmtime via the
`TypedFunc` API, but they should now no longer be necessary. In general
this is a higher-level Wasmtime concern than something all backends of
Cranelift should have to deal with.
Today with recent refactorings it's possible to remove the reliance on
ABI details for multi-value and instead codify it directly into the
Cranelift IR generated. For example wasm calls are able to have a
"purely internal" ABI which Wasmtime's Rust code doesn't see at all, and
the Rust code only interacts with the native ABI. The native ABI is
redefined to be what the previous Wasmtime ABIs were, which is to return
the first of a 2+ value return through a register (native return value)
and everything else through a return pointer.
* Remove some wasmtime_system_v usage in tests
* Add back WasmtimeSystemV for s390x
* Fix some docs and references in winch
* Fix another doc link
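As a rough illustration of the redefined native ABI described above (first result through the native return value, the rest through a return pointer), here is a minimal sketch in plain Rust. The function names and the use of a slice as the return area are assumptions made for the example; this is not code generated by Cranelift or Wasmtime.

```rust
/// Hypothetical callee with three results: the first result is the native
/// return value, the remaining results are written through a
/// caller-provided return area.
fn callee(ret_area: &mut [i64]) -> i64 {
    ret_area[0] = 20;
    ret_area[1] = 30;
    10
}

/// Caller side: reserve space for all results but the first, make the call,
/// then reassemble the full tuple of results.
fn call_three_results() -> (i64, i64, i64) {
    let mut ret_area = [0i64; 2];
    let first = callee(&mut ret_area);
    (first, ret_area[0], ret_area[1])
}
```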
This change adds support for the `loop`, `br` and `br_if` instructions
as well as unreachable code handling. Whenever an instruction that
affects reachability is emitted (`br` in the case of this PR), the
compiler will enter into an unreachable code state, essentially ignoring
most of the subsequent instructions. When handling the unreachable code
state some instructions are still observed, in order to determine if
reachability should be restored.
This change, particularly the handling of unreachable code, adds all the
necessary building blocks to the compiler to emit other instructions
that affect reachability (e.g. `unreachable`, `return`).
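The unreachable code state described above can be sketched with a toy visitor. This is a simplification under stated assumptions: `Op`, `Compiler` and `compile` are hypothetical, `br` carries no label, and reachability is always restored at the `end` of the frame in which the code became unreachable.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    Block,
    Br,
    End,
    I32Const(i32),
    I32Add,
}

#[derive(Default)]
struct Compiler {
    /// True while the compiler is in the unreachable code state.
    unreachable: bool,
    /// Number of blocks opened while unreachable that are still open.
    skipped_depth: usize,
    /// Instructions for which code was actually emitted.
    emitted: Vec<Op>,
}

impl Compiler {
    fn visit(&mut self, op: Op) {
        if self.unreachable {
            // Only control flow is observed, to know when to restore reachability.
            match op {
                Op::Block => self.skipped_depth += 1,
                Op::End if self.skipped_depth > 0 => self.skipped_depth -= 1,
                Op::End => {
                    self.unreachable = false;
                    self.emitted.push(op);
                }
                _ => {}
            }
            return;
        }
        self.emitted.push(op);
        if op == Op::Br {
            self.unreachable = true;
        }
    }
}

fn compile(ops: &[Op]) -> Vec<Op> {
    let mut compiler = Compiler::default();
    for &op in ops {
        compiler.visit(op);
    }
    compiler.emitted
}
```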
Address review feedback
* Rename `branch_target` to `is_branch_target`
* Use the visitor pattern to handle unreachable code
Avoid string comparison and split unreachable handling functions
* Cranelift: Adjust virtual SP after `tail` call-conv callees return
Callees that use the `tail` calling convention will pop stack arguments from the
stack for their callers. They do not, however, adjust the caller's virtual SP,
so that still needs to happen in our ABI and `CallSite` code. This is, however,
slightly trickier than just emitting a nominal SP adjustment pseudo-instruction
because we cannot let regalloc attempt to spill or reload values between the
call and the SP adjustment because the stack offsets will be off by the size of
the stack arguments to the call. Therefore, we add the number of bytes that the
callee pops to the `CallInfo` structures and have emission update the virtual SP
atomically with regards to the call itself.
Fixes #6581
Fixes #6582
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Cranelift: Have `fuzzgen` generate functions with the `tail` calling convention
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Add i32.popcnt and i64.popcnt to winch
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Add fallback implementation for popcnt
Move popcnt fallback up into the macroassembler.
Share code between 32-bit and 64-bit popcnt
Add Popcnt to winch differential fuzzing
* Use _rr functions where possible
* Avoid using scratch register for popcnt
The scratch register was getting clobbered by the calls to `and`,
so this is instead passing in a CodeGenContext to the masm's `popcnt`
and letting it handle its own registers
* Add filetests for the fallback popcnt impls
* address PR comments
* Update filetests
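For reference, a fallback popcnt of the kind mentioned above is commonly written with the well-known bit-parallel (SWAR) technique. The sketch below shows that technique in plain Rust; the function names are hypothetical and the instruction sequence Winch actually emits is not claimed to match it.

```rust
/// Bit-parallel population count for 32-bit values.
fn popcnt32_fallback(x: u32) -> u32 {
    // Count bits in pairs, then nibbles, then bytes.
    let x = x - ((x >> 1) & 0x5555_5555);
    let x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333);
    let x = (x + (x >> 4)) & 0x0f0f_0f0f;
    // Sum all byte counts into the top byte.
    x.wrapping_mul(0x0101_0101) >> 24
}

/// Same steps with 64-bit masks and a 56-bit final shift.
fn popcnt64_fallback(x: u64) -> u64 {
    let x = x - ((x >> 1) & 0x5555_5555_5555_5555);
    let x = (x & 0x3333_3333_3333_3333) + ((x >> 2) & 0x3333_3333_3333_3333);
    let x = (x + (x >> 4)) & 0x0f0f_0f0f_0f0f_0f0f;
    x.wrapping_mul(0x0101_0101_0101_0101) >> 56
}
```

The two functions differ only in mask width and final shift, which is what makes sharing code between the 32-bit and 64-bit variants straightforward.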
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* winch(x64) Add support for if/else
This change adds the necessary building blocks to support control flow;
this change also adds support for the `If` / `Else` operators.
This change does not include multi-value support. The idea is to add
support for multi-value across the compiler (functions and blocks) as
a separate future change.
The general gist of the change is to track the presence of control flow
frames as part of the code generation context and emit the corresponding
labels and instructions as control flow blocks are found.
* PR review
* Allocate 64 slots for `ControlStackFrames`
* Explicitly track else branches through an else entry in
`ControlStackFrame`
This commit fixes the implementation of `pop_to_reg`. In the previous
implementation, whenever a specific register was requested as the
destination register and a register-to-register move happened, the
source register was never marked as free.
This issue became more evident with more complex programs involving
control flow and division for example.
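A minimal sketch of the fix, using a toy register set: `RegSet`, `Val` and this `pop_to_reg` signature are hypothetical and much simpler than Winch's real code. The key line is the `release` of the source register after the move.

```rust
/// Toy register set: bit `i` set means register `i` is free.
struct RegSet {
    free: u32,
}

impl RegSet {
    fn new(count: u8) -> Self {
        RegSet { free: (1u32 << count) - 1 }
    }
    fn is_free(&self, r: u8) -> bool {
        self.free & (1u32 << r) != 0
    }
    fn take(&mut self, r: u8) {
        assert!(self.is_free(r), "register already in use");
        self.free &= !(1u32 << r);
    }
    fn release(&mut self, r: u8) {
        self.free |= 1u32 << r;
    }
}

/// Toy value-stack entry.
enum Val {
    Reg(u8),
    Imm(i32),
}

/// Pops the top of the value stack into the specific register `dst`,
/// recording any move that has to be emitted.
fn pop_to_reg(regs: &mut RegSet, stack: &mut Vec<Val>, dst: u8, moves: &mut Vec<String>) -> u8 {
    match stack.pop().expect("value stack is empty") {
        // Already in the requested register: nothing to do.
        Val::Reg(src) if src == dst => dst,
        Val::Reg(src) => {
            regs.take(dst);
            moves.push(format!("mov r{}, r{}", dst, src));
            // The step that was missing: the source register is free again.
            regs.release(src);
            dst
        }
        Val::Imm(v) => {
            regs.take(dst);
            moves.push(format!("mov r{}, {}", dst, v));
            dst
        }
    }
}
```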
This is necessary for implementing callee-pops calling conventions, as is
required for tail calls. This is just a small part of tail calls, and doesn't
implement everything, but is a good piece to land on its own so that the eventual PR
isn't so huge.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This commit goes through all proc-macros we have in this repository and
ensures that they're all flagged with `test = false` and `doctest =
false`. This comes about as I was curious why CI time was 40m which felt
a little long and upon inspection the cross-compiled builders were
taking upwards of 30 minutes just to build everything (not including
running tests) where the non-cross-compiled builders took only about
10-15 minutes to build everything.
Further investigation into this discrepancy showed that a lot of crates
are being double-compiled in a cross-compiled situation. This is
expected at a base level and something Cargo transparently handles, for
example if a build script and the final binary need the same dependency
then it's gotta get compiled twice. What was odd is that large portions
of the Wasmtime crate graph were being compiled more than they should
be.
I tracked this down to some `dev-dependencies` for procedural macros
pointing at wasmtime crates. This makes sense for the `tests/*.rs`-style
tests which are always compiled for the target, but tests for the
proc-macro itself would be compiled for the host. By disabling tests and
doctests for the proc macro itself this removes the need for the
host-compiled version of these dependencies.
Overall this reduces a full compile of all tests from ~840 units of work
to 700 units of work according to Cargo. The set of extra crates
compiled in a cross-compiled workflow is not much smaller than in a
non-cross-compiled workflow and they all generally "make sense" as core
shared dependencies which are rooted in both Wasmtime and some
proc-macro's dependency tree, for example.
* Make wasmtime-types type check
* Make wasmtime-environ type check.
* Make wasmtime-runtime type check
* Make cranelift-wasm type check
* Make wasmtime-cranelift type check
* Make wasmtime type check
* Make wasmtime-wast type check
* Make testsuite compile
* Address Luna's comments
* Restore compatibility with effect-handlers/wasm-tools#func-ref-2
* Add function refs feature flag; support testing
* Provide function references support in helpers
- Always support Index in blocktypes
- Support Index as table type by pretending to be Func
- Etc
* Implement ref.as_non_null
* Add br_on_null
* Update Cargo.lock to use wasm-tools with peek
This will ultimately be reverted when we refer to
wasm-tools#function-references, which doesn't have peek, but does have type
annotations on CallRef
* Add call_ref
* Support typed function references in ref.null
* Implement br_on_non_null
* Remove extraneous flag; default func refs false
* Use IndirectCallToNull trap code for call_ref
* Factor common call_indirect / call_ref into a fn
* Remove copypasta clippy attribute / format
* Add some more tests for typed table instructions
There certainly need to be many more, but this at least catches the bugs fixed
in the next commit
* Fix missing typed cases for table_grow, table_fill
* Document trap code; remove answered question
* Mark wasm-tools to wasmtime reftype infallible
* Fix reversed conditional
* Scope externref/funcref shorthands within WasmRefType
* Merge with upstream
* Make wasmtime compile again
* Fix warnings
* Remove Bot from the type algebra
* Fix table tests.
`wast::Cranelift::spec::function_references::table`
`wast::Cranelift::spec::function_references::table_pooling`
* Fix table{get,set} tests.
```
wast::Cranelift::misc::function_references::table_get
wast::Cranelift::misc::function_references::table_get_pooling
wast::Cranelift::misc::function_references::table_set
wast::Cranelift::misc::function_references::table_set_pooling
```
* Insert subtype check to fix local_get tests.
```
wast::Cranelift::spec::function_references::local_get
wast::Cranelift::spec::function_references::local_get_pooling
```
* Fix compilation of `br_on_non_null`.
The branch destinations were the other way round... :-)
Fixes the following test failures:
```
wast::Cranelift::spec::function_references::br_on_non_null
wast::Cranelift::spec::function_references::br_on_non_null_pooling
```
* Fix ref_as_non_null tests.
The test was failing due to the wrong error message being printed. As
per the upstream folks' suggestion we were using the trap code
`IndirectCallToNull`, but it produces an unexpected error message.
This commit reinstates the `NullReference` trap code. It produces the
expected error message. We will have to chat with the maintainers
upstream about how to handle these "test failures".
Fixes the following test failures:
```
wast::Cranelift::spec::function_references::ref_as_non_null
wast::Cranelift::spec::function_references::ref_as_non_null_pooling
```
* Fix a call_ref regression.
* Fix global tests.
Extend `is_matching_assert_invalid_error_message` to circumvent the textual error message failure.
Fixes the following test failures:
```
wast::Cranelift::spec::function_references::global
wast::Cranelift::spec::function_references::global_pooling
```
* Cargo update
* Update
* Spell out some cases in match_val
* Disgusting hack to subvert limitations of type reconstruction.
The function `wasmtime::values::Val::ty()` attempts to reconstruct
the type of its underlying value purely based on the shape of the
value. With the function references proposal this sort of reconstruction
is no longer complete as a source reference type may have been
nullable. Nullability is not inferrable by looking at the shape of the
runtime object alone.
Consequently, the runtime cannot reconstruct the type for
`Val::FuncRef` and `Val::ExternRef` by looking at their respective
shapes.
* Address workflows comments.
* null reference => null_reference for CLIF parsing compliance.
* Delete duplicate-loads-dynamic-memory-egraph (again)
* Idiomatic code change.
* Nullability subtyping + fix non-null storage check.
This commit removes the `hacky_eq` check in `func.rs`. Instead it is
replaced by a subtype check. This subtype check occurs in
`externals.rs` too.
This commit also fixes a bug. Previously, it was possible to store a
null reference into a non-null table cell. I have added two new test
cases for this bug: one for funcrefs and another for externrefs.
* Trigger unimplemented for typed function references. Format values.rs
* run cargo fmt
* Explicitly match on HeapType::Extern.
* Address cranelift-related feedback
* Remove PartialEq,Eq from ValType, RefType, HeapType.
* Pin wasmparser to a fairly recent commit.
* Run cargo fmt
* Ignore tail call tests.
* Remove garbage
* Revert changes to wasmtime public API.
* Run cargo fmt
* Get more CI passing (#19)
* Undo Cargo.lock changes
* Fix build of cranelift tests
* Implement link-time matches relation. Disable tests failing due to lack of public API support.
* Run cargo fmt
* Run cargo fmt
* Initial implementation of eager table initialization
* Tidy up eager table initialisation
* Cargo fmt
* Ignore type-equivalence test
* Replace TODOs with descriptive comments.
* Various changes found during review (#21)
* Clarify a comment
This isn't only used for null references
* Resolve a TODO in local init
Don't initialize non-nullable locals to null, instead skip
initialization entirely and wasm validation will ensure it's always
initialized in the scope where it's used.
* Clarify a comment and skipping the null check.
* Remove a stray comment
* Change representation of `WasmHeapType`
Use a `SignatureIndex` instead of a `u32` which while not 100% correct
should be more correct. This additionally renames the `Index` variant to
`TypedFunc` to leave space for future types which aren't functions to
not all go into an `Index` variant.
This required updates to Winch because `wasmtime_environ` types can no
longer be converted back to their `wasmparser` equivalents. Additionally
this means that all type translation needs to go through some form of
context to resolve indices which is now encapsulated in a `TypeConvert`
trait implemented in various locations.
* Refactor table initialization
Reduce some duplication and simplify some data structures to have a more
direct form of table initialization and a bit more graceful handling of
element-initialized tables. Additionally element-initialize tables are
now treated the same as if there's a large element segment initializing
them.
* Clean up some unrelated changes
* Simplify Table bindings slightly
* Remove a no-longer-needed TODO
* Add a FIXME for `SignatureIndex` in `WasmHeapType`
* Add a FIXME for panicking on exposing function-references types
* Fix a warning on nightly
* Fix tests for winch and cranelift
* Cargo fmt
* Fix arity mismatch in aarch64/abi
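The nullability rule from the "Nullability subtyping + fix non-null storage check" entry above can be sketched as follows. `HeapKind`, `RefTy`, `is_subtype` and `can_store` are hypothetical names chosen for the example and are not Wasmtime's types; the sketch only models funcref/externref heap types and ignores typed function indices.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HeapKind {
    Func,
    Extern,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct RefTy {
    nullable: bool,
    heap: HeapKind,
}

/// `sub` is a subtype of `sup` when the heap types agree and `sub` is not
/// "more nullable" than `sup`: a non-null reference may be used where a
/// nullable one is expected, but not the other way round.
fn is_subtype(sub: RefTy, sup: RefTy) -> bool {
    sub.heap == sup.heap && (sup.nullable || !sub.nullable)
}

/// A value may be stored into a table cell only if its type is a subtype of
/// the cell's type. A null reference has a nullable type, so it is rejected
/// by non-null cells.
fn can_store(cell: RefTy, value: RefTy) -> bool {
    is_subtype(value, cell)
}
```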
---------
Co-authored-by: Daniel Hillerström <daniel.hillerstrom@ed.ac.uk>
Co-authored-by: Daniel Hillerström <daniel.hillerstrom@huawei.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
This commit is a follow up to https://github.com/bytecodealliance/wasmtime/pull/6443,
in which we discussed potentially having `PtrSize` and `ABI` as
associated types to the `MacroAssembler` trait.
I considered having `PtrSize` associated to the `ABI`, but given the
amount of ABI details needed at the `MacroAssembler` level, I decided to
go with the approach in this change.
The chosen approach ended up cutting a decent amount of boilerplate from
the `MacroAssembler` itself, but also from each of the touchpoints where
the `MacroAssembler` is used.
This change also standardizes the signatures of the `ABI` trait. Some of
them borrowed `&self` and some didn't, but in practice, there's no need
to have any of them borrow `&self`.
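The shape of this design can be sketched as below. The trait and type names (`PtrWidth`, `AbiInfo`, `Masm`, `X64Masm`, and so on) are hypothetical stand-ins rather than the real `PtrSize`, `ABI` and `MacroAssembler` definitions; the point is only that ABI details become reachable from the assembler type through associated types, with no `&self` receiver.

```rust
/// Hypothetical stand-in for pointer-size information.
trait PtrWidth {
    const BYTES: u8;
}

/// Hypothetical stand-in for ABI details; note that no method borrows `&self`.
trait AbiInfo {
    fn stack_align() -> u8;
}

/// Hypothetical stand-in for the assembler trait with both associated types.
trait Masm {
    type Ptr: PtrWidth;
    type Abi: AbiInfo;
}

struct X64Ptr;
impl PtrWidth for X64Ptr {
    const BYTES: u8 = 8;
}

struct X64Abi;
impl AbiInfo for X64Abi {
    fn stack_align() -> u8 {
        16
    }
}

struct X64Masm;
impl Masm for X64Masm {
    type Ptr = X64Ptr;
    type Abi = X64Abi;
}

/// Generic code only needs the assembler type to reach the ABI details.
fn aligned_frame_size<M: Masm>(locals_bytes: u32) -> u32 {
    let align = u32::from(<M::Abi as AbiInfo>::stack_align());
    (locals_bytes + align - 1) / align * align
}

fn pointer_bytes<M: Masm>() -> u8 {
    <M::Ptr as PtrWidth>::BYTES
}
```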
This commit is a small cleanup to drop the usage of the `FuncEnv` trait.
In https://github.com/bytecodealliance/wasmtime/pull/6358, we agreed on
making `winch-codegen` directly depend on `wasmtime-environ`.
Introducing a direct relationship between `winch-codegen` and
`wasmtime-environ` means that the `FuncEnv` trait is no longer serving
its original purpose, and we can drop the usage of the trait and use the
types exposed from `winch-codegen` directly instead.
Even though this change drops the `FuncEnv` trait, it still keeps
a `FuncEnv` struct, which is used during code generation.
* winch(trampolines): Save SP, FP and return address
This change is a follow-up to https://github.com/bytecodealliance/wasmtime/pull/6358
This change implements the necessary stores of SP, FP and return address
for fast stack walking.
* Ignore backtrace test on Windows
Temporarily ignoring Winch's trap test on Windows while
support for unwind information is added.
* winch: Implement new trampolines
This change is a follow-up to
https://github.com/bytecodealliance/wasmtime/pull/6262, in which the new
trampolines, described [here](https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#new-trampolines-and-vmcallercheckedanyfunc-changes),
were introduced to Wasmtime.
This change focuses on the `array-to-wasm`,
`native-to-wasm` and `wasm-to-native` trampolines to restore Winch's
working state prior to the introduction of the new trampolines. It's
worth noting that the new approach for trampolines makes it easier to support
the `TypedFunc` API in Winch. Prior to the introduction of the new
trampolines, it was not obvious how to approach it.
This change also introduces a pinned register that will hold the
`VMContext` pointer, which is loaded in the `*-to-wasm` trampolines;
the `VMContext` register is a pre-requisite to this change to support
the `wasm-to-native` trampolines.
Lastly, with the introduction of the `VMContext` register and the
`wasm-to-native` trampolines, this change also introduces support for
calling function imports, which is a variation of the already existing
calls to locally defined functions.
The other notable piece of this change aside from the trampolines is
`winch-codegen`'s dependency on `wasmtime-environ`. Winch is so closely
tied to the concepts exposed by the wasmtime crates that it makes sense
to tie them together. Even though the separation provides some
advantages, like easier testing in some cases, in the long run there's
probably going to be less need to test Winch in isolation and rather
we'd rely more on integration style tests which require all of Wasmtime
pieces anyway (fuzzing, spec tests, etc).
This change doesn't update the existing implementation of
`winch_codegen::FuncEnv`, but the intention is to update that part after
this change.
prtest:full
* tests: Ignore miri in Winch integration tests
* Remove hardcoded alignment and addend
* Add a cranelift setting for padding between basic blocks
Various relocations, jumps, and such require special handling in
`MachBuffer` with respect to islands to ensure that everything gets
emitted correctly. This commit adds a setting to synthetically insert
padding at the end of every basic block to help stress this logic with
more minimal test cases. The setting is disabled by default but is
something that we should be able to turn on during fuzzing, for example.
* aarch64: Fix out-of-range `Ldr19` relocations
This commit fixes a bug in the AArch64 backend, and possibly others,
where constants were unconditionally forced to be at the end of the
function when they sometimes couldn't be. For example the `Ldr19`
relocation has a 512k range meaning that if an instruction near the
beginning of a function accesses a constant at the end of a function and
the function is >1M, then the relocation cannot be resolved. This is all
handled internally with `MachBuffer`'s handling of islands but the
problem with constants is that the labels (and the constant values)
weren't defined until the end of the function.
The first attempt at fixing this was to move the calls to
`defer_constant` to the beginning of emission. This would enable the
constants to get deferred as necessary. This was problematic, however,
because it only solved the forwards case (aka your constant was forced
to the end of the function which is too far away). The backwards case,
aka your constant is way too far behind you, was a new problem that
arose.
To fix all of these issues constants are now handled differently inside
of the `MachBuffer`. Previously constants were all pre-assigned a
label-per-constant and all references to the constant would use that
single label. Instead a new heuristic has been added where constants
record their size/alignment at the start of emission and labels are
lazily deferred. When a label for a constant is requested then a label
is lazily allocated or a previously-allocated label for this constant is
returned. When an island is emitted then all emitted constants get
their labels cleared. This intends to balance the previous behavior,
where multiple uses of a constant only emit the constant once, with fixing
this issue simply. This means that constants may get
emitted multiple times, since each reference to a constant after an
island is generated will be guaranteed to generate a new label, even if
it's in-range to access. This can perhaps be fixed in the future with a
more clever API where the `LabelUse` is passed into the function which
converts a constant to a label, but that's left as a refactoring for a
future date.
This commit also moves an `alignment: u32` field into the
`MachBufferFinalized` itself since that's now a function of whatever
constants actually got emitted. Additionally note that constant
emission in the middle of a function doesn't actually emit anything,
instead recording markers of where constants need to go. Then when a
buffer is finalized the constants are passed in to get access to the
data which fills in everything as it's referenced.
* Fuzz the `bb_padding_log2` setting
This commit hooks up the previously-added setting to Cranelift to
Wasmtime's fuzzing infrastructure. This will automatically configure the
setting based on the fuzz input to add a bit of "chaos" to the emitted
code. This should hopefully help expose the issue fixed previously via
fuzzing which otherwise won't generate massive functions.
* Realign back to an instruction boundary
Otherwise misaligned instructions were getting emitted and tripping
various asserts.
* Fix riscv64 testing
* Rename codegen setting to bb_padding_log2_minus_one
Allow for inserting one byte of padding.
* Doc updates
* Thread through shared flags differently
Don't use `EmitInfo`, instead pass in to vcode emission
* Fix s390x tests
* Combine island calculations during vcode emission
Fixes an off-by-just-a-few error if the two island checks are done
separately after a basic block.
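The lazily allocated constant labels described in the `Ldr19` entry above can be sketched with a toy pool. `ConstPool`, `label_for` and `emit_island` are hypothetical names and this is not `MachBuffer`'s implementation; the sketch only shows that a label is reused until an island is emitted, after which the next reference gets a fresh label (and the constant is emitted again).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Label(u32);

struct ConstPool {
    /// Per constant: the label handed out since the last island, if any.
    pending: Vec<Option<Label>>,
    next_label: u32,
    /// (constant index, label) pairs in the order they were emitted.
    emitted: Vec<(usize, Label)>,
}

impl ConstPool {
    fn new(num_constants: usize) -> Self {
        ConstPool {
            pending: vec![None; num_constants],
            next_label: 0,
            emitted: Vec::new(),
        }
    }

    /// Returns the label for a constant, allocating one lazily.
    fn label_for(&mut self, constant: usize) -> Label {
        if let Some(label) = self.pending[constant] {
            return label;
        }
        let label = Label(self.next_label);
        self.next_label += 1;
        self.pending[constant] = Some(label);
        label
    }

    /// Emits every referenced constant and clears its label, so later
    /// references allocate a new label.
    fn emit_island(&mut self) {
        for (index, slot) in self.pending.iter_mut().enumerate() {
            if let Some(label) = slot.take() {
                self.emitted.push((index, label));
            }
        }
    }
}
```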
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function
pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a
dedicated calling convention, so callers just choose the version that works for
them. This is as opposed to the previous behavior where we would chain together
many trampolines that converted between calling conventions, sometimes up to
four on the way into Wasm and four more on the way back out. See [0] for
details.
[0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths
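As a very rough sketch of the idea of one entry point per calling convention: the `FuncRef` struct below is hypothetical and heavily simplified. The real function pointers also take context pointers and use different signatures, which are omitted here as an assumption for brevity.

```rust
/// Hypothetical function reference with one entry point per calling
/// convention, so no conversion trampolines need to be chained at call time.
struct FuncRef {
    /// Entry point for Wasm callers.
    wasm_call: fn(i64, i64) -> i64,
    /// Entry point for callers that pass arguments in an array of raw values.
    array_call: fn(&mut [i64]),
    /// Entry point for host callers using the native calling convention.
    native_call: fn(i64, i64) -> i64,
}

fn add(a: i64, b: i64) -> i64 {
    a.wrapping_add(b)
}

/// Array convention in this sketch: arguments arrive in a slice and the
/// result is written back into slot 0.
fn add_array(args: &mut [i64]) {
    args[0] = add(args[0], args[1]);
}

fn add_func_ref() -> FuncRef {
    FuncRef {
        wasm_call: add,
        array_call: add_array,
        native_call: add,
    }
}

/// A caller that only knows the argument list at runtime picks `array_call`.
fn call_dynamically(f: &FuncRef, a: i64, b: i64) -> i64 {
    let mut args = [a, b];
    (f.array_call)(&mut args);
    args[0]
}
```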
Thanks to @bjorn3 for the initial idea of having multiple function pointers for
different calling conventions.
This is generally a nice ~5-10% speed up to our call benchmarks across the
board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from
Wasm to the host, which have a minor regression. We hypothesize that this is
because the old hand-written assembly trampolines did not maintain a call frame
and do a tail call, but the new Cranelift-generated trampolines do maintain a
call frame and do a regular call. The regression is only a couple nanoseconds,
which seems well-explained by these differences, and ultimately is not a
big deal.
However, this does lead to a ~5% code size regression for compiled modules.
Before, we compiled a trampoline per escaping function's signature and we
deduplicated these trampolines by signature. Now we compile two trampolines per
escaping function: one for if the host calls via the array calling convention
and one for if the host calls via the native calling convention. Additionally,
we compile a trampoline for every type in the module, in case there is a native
calling convention function from the host that we `call_indirect` of that
type. Much of this is in the `.eh_frame` section in the compiled module, because
each of our trampolines needs an entry there. Note that the `.eh_frame` section
is not required for Wasmtime's correctness, and you can disable its generation
to shrink compiled module code size; we just emit it to play nice with external
unwinders and profilers. We believe there are code size gains available for
follow up work to offset this code size regression in the future.
Backing up a bit: the reason each Wasm module needs to provide these
Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow
embedders to create functions even when there is no compiler available, so they
cannot bring their own trampoline. Instead the Wasm module has to supply
it. This in turn means that we need to look up and patch in these Wasm-to-native
trampolines during roughly instantiation time. But instantiation is super hot,
and we don't want to add more passes over imports or any extra work on this
path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in
ahead of time.
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
Co-Authored-By: Alex Crichton <alex@alexcrichton.com>
prtest:full
* Fix default architecture for winch
This updates the `winch/codegen/build.rs` script to default to the
target architecture being compiled for as opposed to the host
architecture that's performing the compile.
Closes #6241
* Auto-enable other future architectures
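A build script can distinguish the target from the host because Cargo sets `CARGO_CFG_TARGET_ARCH` for build scripts. The sketch below shows that approach; the helper names and the `"x64"`/`"arm64"` strings are assumptions for illustration, not the contents of `winch/codegen/build.rs`.

```rust
/// Maps a target architecture name to a hypothetical ISA identifier.
/// Returns `None` when the architecture is unknown or not provided.
fn default_isa(target_arch: Option<&str>) -> Option<&'static str> {
    match target_arch? {
        "x86_64" => Some("x64"),
        "aarch64" => Some("arm64"),
        _ => None,
    }
}

/// Reads the architecture being compiled *for* (not the host architecture)
/// from the environment Cargo provides to build scripts.
fn isa_from_env() -> Option<&'static str> {
    let arch = std::env::var("CARGO_CFG_TARGET_ARCH").ok();
    default_isa(arch.as_deref())
}
```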
This commit improves ABI support in Winch's trampolines mainly by:
* Adding support for the `fastcall` calling convention.
* By storing/restoring callee-saved registers.
One of the explicit goals of this change is to make tests available in the x86_64 target
as a whole and remove the need to exclude the Windows target.
This commit also introduces a `CallingConvention` enum, to better
reflect the subset of calling conventions that are supported by Winch.
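To illustrate why the calling convention matters for storing and restoring callee-saved registers, here is a small sketch. The enum variants and the method are hypothetical (the source does not list the enum's contents); the register lists are the general-purpose callee-saved registers of the System V and Windows x64 conventions, excluding the stack pointer.

```rust
/// Hypothetical subset of calling conventions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CallingConvention {
    SystemV,
    WindowsFastcall,
}

impl CallingConvention {
    /// General-purpose registers on x86_64 that a callee must preserve.
    fn callee_saved_gprs(self) -> &'static [&'static str] {
        match self {
            CallingConvention::SystemV => &["rbx", "rbp", "r12", "r13", "r14", "r15"],
            CallingConvention::WindowsFastcall => {
                &["rbx", "rbp", "rdi", "rsi", "r12", "r13", "r14", "r15"]
            }
        }
    }
}
```

A trampoline that clobbers `rdi` or `rsi`, for example, would have to save them under the Windows convention but not under System V.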
* Adding in trampoline compiling method for ISA
* Adding support for indirect call to memory address
* Refactoring frame to externalize defined locals, so it removes WASM dependencies in trampoline case
* Adding initial version of trampoline for testing
* Refactoring trampoline to be re-used by other architectures
* Initial wiring for winch with wasmtime
* Add a Wasmtime CLI option to select `winch`
This is effectively an option to select the `Strategy` enumeration.
* Implement `Compiler::compile_function` for Winch
Hook this into the `TargetIsa::compile_function` hook as well. Currently
this doesn't take into account `Tunables`, but that's left as a TODO for
later.
* Filling out Winch append_code method
* Adding back in changes from previous branch
Most of these are a WIP. It's missing trampolines for x64, but a basic
one exists for aarch64. It's missing the handling of arguments that
exist on the stack.
It currently imports `cranelift_wasm::WasmFuncType` since it's what's
passed to the `Compiler` trait. It's a bit awkward to use in the
`winch_codegen` crate since it mostly operates on `wasmparser` types.
I've had to hack in a conversion to get things working. Long term, I'm
not sure it's wise to rely on this type but it seems like it's easier on
the Cranelift side when creating the stub IR.
* Small API changes to make integration easier
* Adding in new FuncEnv, only a stub for now
* Removing unneeded parts of the old PoC, and refactoring trampoline code
* Moving FuncEnv into a separate file
* More comments for trampolines
* Adding in winch integration tests for first pass
* Using new addressing method to fix stack pointer error
* Adding test for stack arguments
* Only run tests on x86 for now, it's more complete for winch
* Add in missing documentation after rebase
* Updating based on feedback in draft PR
* Fixing formatting on doc comment for argv register
* Running formatting
* Lock updates, and turning on winch feature flags during tests
* Updating configuration with comments to no longer gate Strategy enum
* Using the winch-environ FuncEnv, but it required changing the sig
* Proper comment formatting
* Removing wasmtime-winch from dev-dependencies, adding the winch feature makes this not necessary
* Update doc attr to include winch check
* Adding winch feature to doc generation, which seems to fix the feature error in CI
* Add the `component-model` feature to the cargo doc invocation in CI
To match the metadata used by the docs.rs invocation when building docs.
* Add a comment clarifying the usage of `component-model` for docs.rs
* Correctly order wasmtime-winch and winch-environ in the publish script
* Ensure x86 test dependencies are included in cfg(target_arch)
* Further constrain Winch tests to x86_64 _and_ unix
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* winch(x64): Initial implementation for function calls
This change adds the main building blocks for calling locally defined
functions. Support for function imports will be added iteratively after this
change lands and once trampolines are supported.
To support function calls, this change introduces the following functionality to
the MacroAssembler:
* `pop` to pop the machine stack into a given register, which in the case of
this change, translates to the x64 pop instruction.
* `call` to emit a call to locally defined functions.
* `address_from_sp` to construct memory addresses with the SP as a base.
* `free_stack` to emit the necessary instructions to reclaim stack space.
The heavy lifting of setting up and emitting the function call is done through
the implementation of `FnCall`.
* Fix spill behaviour in function calls and add more documentation
This commit adds more detailed documentation to the `call.rs` module.
It also fixes a couple of bugs, mainly:
* The previous commit didn't account for memory addresses used as arguments for
the function call. Any memory entry in the value stack used as a function
argument should be tracked and then used to claim that memory when the function
call ends. We could `pop` and do this implicitly, but we can also track this
down and emit a single instruction to decrement the stack pointer, which will
result in better code.
* Introduce a differentiator between addresses relative or absolute to the stack
pointer. When passing arguments in the stack -- assuming that SP at that point
is aligned for the function call -- we should store the arguments relative to
the absolute position of the stack pointer and when addressing a memory entry in
the Wasm value stack, we should use an address relative to the offset and the
position of the stack pointer.
* Simplify tracking of the stack space needed for emitting a function call
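The distinction between the two kinds of addresses described above can be sketched with a toy model. Only `address_from_sp` and `free_stack` are names from the text; the `Masm` struct, `reserve_stack`, `address_at_sp` and the exact arithmetic are assumptions for illustration. The model tracks how many bytes have been pushed (`sp_offset`) and returns displacements from the current stack pointer.

```rust
/// Toy assembler state: bytes pushed since the start of the frame.
struct Masm {
    sp_offset: u32,
}

impl Masm {
    /// Grows the stack and returns the offset identifying the new slot.
    fn reserve_stack(&mut self, bytes: u32) -> u32 {
        self.sp_offset += bytes;
        self.sp_offset
    }

    /// Shrinks the stack by `bytes`.
    fn free_stack(&mut self, bytes: u32) {
        self.sp_offset -= bytes;
    }

    /// Displacement from the current SP of a value-stack memory entry that
    /// was created when the tracked offset was `offset`.
    fn address_from_sp(&self, offset: u32) -> u32 {
        self.sp_offset - offset
    }

    /// Displacement for an outgoing stack argument: measured directly from
    /// the current (already aligned) SP, independent of the tracked offset.
    fn address_at_sp(&self, offset: u32) -> u32 {
        offset
    }
}
```

As more space is reserved, the displacement of an existing value-stack entry grows, while the displacement of an outgoing argument slot stays fixed relative to the stack pointer.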
* Add a `MachBuffer::defer_trap` method
This commit adds a new method to `MachBuffer` to defer trap opcodes to
the end of a function in a similar manner to how constants are deferred
to the end of the function. This is useful for backends which frequently
use `TrapIf`-style opcodes. Currently a jump is emitted which skips the
next instruction, a trap, and then execution continues normally. While
there isn't any pressing problem with this construction the trap opcode
is in the middle of the instruction stream as opposed to "off on the
side" despite rarely being taken.
With this method in place all the backends (except riscv64 since I
couldn't figure it out easily enough) have a new lowering of their
`TrapIf` opcode. Now a trap is deferred, which returns a label, and then
that label is jumped to when executing the trap. A fixup is then
recorded in `MachBuffer` to get patched later on during emission, or at
the end of the function. Subsequently all `TrapIf` instructions
translate to a single branch plus a single trap at the end of the
function.
I've additionally further updated some more lowerings in the x64 backend
which were explicitly using traps to instead use `TrapIf` where
applicable to avoid jumping over traps mid-function. Other backends
didn't appear to have many jump-over-the-next-trap patterns.
Lots of tests have had their expectations updated here which should
reflect all the traps being sunk to the end of functions.
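The shape of the deferred-trap lowering can be sketched with a toy buffer. This is an illustration only: `ToyBuffer`, `Inst` and `Label` are hypothetical types, and the real `MachBuffer::defer_trap` signature and fixup machinery are not reproduced here.

```rust
/// Hypothetical label identifying a deferred trap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Label(usize);

/// Hypothetical, heavily simplified instruction stream.
#[derive(Debug, PartialEq, Eq)]
enum Inst {
    /// Conditional branch to a label (the `TrapIf` lowering).
    BranchIf(Label),
    /// Any other instruction, by name.
    Other(&'static str),
    /// Trap bound to a label, carrying its trap code.
    Trap { label: Label, code: &'static str },
}

#[derive(Default)]
struct ToyBuffer {
    insts: Vec<Inst>,
    deferred: Vec<(Label, &'static str)>,
}

impl ToyBuffer {
    /// Record a trap to be emitted later and return the label to branch to.
    fn defer_trap(&mut self, code: &'static str) -> Label {
        let label = Label(self.deferred.len());
        self.deferred.push((label, code));
        label
    }

    /// Append an instruction to the main instruction stream.
    fn emit(&mut self, inst: Inst) {
        self.insts.push(inst);
    }

    /// Finish the function: all deferred traps are sunk to the end.
    fn finish(mut self) -> Vec<Inst> {
        for (label, code) in self.deferred.drain(..) {
            self.insts.push(Inst::Trap { label, code });
        }
        self.insts
    }
}
```

With this structure, a `TrapIf` becomes one `BranchIf` in the hot path and one `Trap` after the last ordinary instruction.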
* Print trap code on all platforms
* Emit traps before constants
* Preserve source location information for traps
* Fix test expectations
* Attempt to fix s390x
The MachBuffer was registering trap codes with the first byte of the
trap, but the SIGILL handler was expecting it to be registered with the
last byte of the trap. Exploit the fact that SIGILL is always raised by a
2-byte instruction and always march backwards 2 bytes for SIGILL, continuing
to march backwards 1 byte for SIGFPE-generating instructions.
* Back out s390x changes
* Back out more s390x bits
* Review comments
* x64: Take SIGFPE signals for divide traps
Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.
There's no particular reason, however, for Wasmtime to avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap under conditions similar to those wasm requires.
This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:
* When `avoid_div_traps=true`, traps are no longer emitted for `div`
instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
for the `div` instruction is a divide-by-zero.
This means that the codegen for `udiv` and `urem` no longer has any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).
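The semantic mismatch for signed division can be illustrated in plain Rust. This is a reference model of the wasm `i32.div_s` and `i32.rem_s` behavior, not the Cranelift lowering. The `Trap` enum and function names are hypothetical.

```rust
/// Hypothetical trap kinds, mirroring the two cases discussed above.
#[derive(Debug, PartialEq, Eq)]
enum Trap {
    DivideByZero,
    IntegerOverflow,
}

/// Signed division: traps on a zero divisor and on INT_MIN / -1.
/// The explicit zero check is what lets the two traps be told apart.
fn wasm_div_s(a: i32, b: i32) -> Result<i32, Trap> {
    if b == 0 {
        return Err(Trap::DivideByZero);
    }
    // `checked_div` returns `None` only for overflow here, i.e. INT_MIN / -1.
    a.checked_div(b).ok_or(Trap::IntegerOverflow)
}

/// Signed remainder: traps only on a zero divisor. INT_MIN rem -1 is 0,
/// whereas x86 `idiv` would fault, hence the -1 check in the lowering.
fn wasm_rem_s(a: i32, b: i32) -> Result<i32, Trap> {
    if b == 0 {
        return Err(Trap::DivideByZero);
    }
    Ok(a.wrapping_rem(b))
}
```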
This is unlikely to yield much of a speedup, but it was
something I noticed during #6008 which seemed like it'd be good to clean
up. Plus, Wasmtime's signal handling was already set up to catch
`SIGFPE`; it was just never firing.
* Remove the `avoid_div_traps` cranelift setting
With no known users currently, removing this should be possible and helps
simplify the x64 backend.
* x64: GC more support for avoid_div_traps
Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.
* x64: Store div trap code in instruction itself
* Keep divisors in registers, not in memory
Don't accidentally fold multiple traps together
* Handle EXC_ARITHMETIC on macos
* Update emit tests
* Update winch and tests
This commit introduces the `winch-environ` crate. This crate's responsibility is
to provide a shared implementation of the `winch_codegen::FuncEnv` trait,
which is Winch's function compilation environment, used to resolve module- and
runtime-specific information needed by the code generation, such as resolving
all the details about a callee in a WebAssembly module, or resolving specific
information from the `VMContext`.
As of this change, the implementation only includes the necessary pieces to
resolve a function callee in a WebAssembly module. The idea is to evolve the
`winch_codegen::FuncEnv` trait as we evolve Winch's code generation.
* x64: Add precise-output tests for div traps
This adds a suite of `*.clif` files which are intended to test the
`avoid_div_traps=true` compilation of the `{s,u}{div,rem}` instructions.
* x64: Remove conditional regalloc in `Div` instruction
Move the 8-bit `Div` logic into a dedicated `Div8` instruction to avoid
having conditionally-used registers with respect to regalloc.
* x64: Migrate non-trapping `udiv`/`urem` to ISLE
* x64: Port checked `udiv` to ISLE
* x64: Migrate urem entirely to ISLE
* x64: Use `test` instead of `cmp` to compare-to-zero
* x64: Port `sdiv` lowering to ISLE
* x64: Port `srem` lowering to ISLE
* Tidy up regalloc behavior and fix tests
* Update docs and winch
* Review comments
* Reword again
* More refactoring test fixes
* More test fixes
* Enable the native target by default in winch
Match cranelift-codegen's build script where if no architecture is
explicitly enabled then the host architecture is implicitly enabled.
* Refactor Cranelift's ISA builder to share more with Winch
This commit refactors the `Builder` type to have a type parameter
representing the finished ISA with Cranelift and Winch having their own
typedefs for `Builder` to represent their own builders. The intention is
to use this shared functionality to produce more shared code between the
two codegen backends.
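A rough sketch of the idea, assuming a builder that stores a target triple, a flag map and a constructor function. All names here (`Builder` fields, `ToyCraneliftIsa`, `ToyWinchIsa`, the helper functions) are hypothetical and do not mirror the real Cranelift types.

```rust
use std::collections::BTreeMap;

/// Hypothetical flag storage: setting name to value.
type Flags = BTreeMap<String, String>;

/// Builder generic over the finished ISA type `T`.
struct Builder<T> {
    triple: String,
    flags: Flags,
    /// Backend-specific constructor invoked by `finish`.
    constructor: fn(String, Flags) -> T,
}

impl<T> Builder<T> {
    fn new(triple: &str, constructor: fn(String, Flags) -> T) -> Self {
        Builder {
            triple: triple.to_string(),
            flags: Flags::new(),
            constructor,
        }
    }

    /// Shared flag-setting logic, independent of the backend.
    fn set(&mut self, name: &str, value: &str) {
        self.flags.insert(name.to_string(), value.to_string());
    }

    /// Produce the backend-specific ISA.
    fn finish(self) -> T {
        (self.constructor)(self.triple, self.flags)
    }
}

#[derive(Debug, PartialEq)]
struct ToyCraneliftIsa {
    triple: String,
    flags: Flags,
}

#[derive(Debug, PartialEq)]
struct ToyWinchIsa {
    triple: String,
    flag_count: usize,
}

/// Each backend gets its own typedef over the shared builder.
type ToyCraneliftBuilder = Builder<ToyCraneliftIsa>;
type ToyWinchBuilder = Builder<ToyWinchIsa>;

fn make_cranelift(triple: String, flags: Flags) -> ToyCraneliftIsa {
    ToyCraneliftIsa { triple, flags }
}

fn make_winch(triple: String, flags: Flags) -> ToyWinchIsa {
    ToyWinchIsa { triple, flag_count: flags.len() }
}

fn cranelift_builder(triple: &str) -> ToyCraneliftBuilder {
    Builder::new(triple, make_cranelift)
}

fn winch_builder(triple: &str) -> ToyWinchBuilder {
    Builder::new(triple, make_winch)
}
```

The flag handling lives in one place while each backend only supplies its constructor.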
* Moving compiler shared components to a separate crate
* Restore native flag inference in compiler building
This fixes an oversight from the previous commits to use
`cranelift-native` to infer flags for the native host when using default
settings with Wasmtime.
* Move `Compiler::page_size_align` into wasmtime-environ
The `cranelift-codegen` crate doesn't need this and winch wants the same
implementation, so shuffle it around so everyone has access to it.
* Fill out `Compiler::{flags, isa_flags}` for Winch
These are easy enough to plumb through with some shared code for
Wasmtime.
* Plumb the `is_branch_protection_enabled` flag for Winch
Just forwarding an isa-specific setting accessor.
* Moving executable creation to shared compiler crate
* Adding builder back in and removing from shared crate
* Refactoring the shared pieces for the `CompilerBuilder`
I decided to move a couple things around from Alex's initial changes.
Instead of having the shared builder do everything, I went back to
having each compiler have a distinct builder implementation. I
refactored most of the flag setting logic into a single shared location,
so we can still reduce the amount of code duplication.
With them being separate, we don't need to maintain things like
`LinkOpts` which Winch doesn't currently use. We also have an avenue to
error when certain flags are sent to Winch if we don't support them. I'm
hoping this will make things more maintainable as we build out Winch.
I'm still unsure about keeping everything shared in a single crate
(`cranelift_shared`). It's starting to feel like this crate is doing too
much, which makes it difficult to name. There does seem to be a need for
two distinct abstractions: creating the final executable and the handling
of shared/ISA flags when building the compiler. I could make them into
two separate crates, but there doesn't seem to be enough there yet to
justify it.
* Documentation updates, and renaming the finish method
* Adding back in a default temporarily to pass tests, and removing some unused imports
* Fixing winch tests with wrong method name
* Removing unused imports from codegen shared crate
* Apply documentation formatting updates
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Adding back in cranelift_native flag inferring
* Adding new shared crate to publish list
* Adding write feature to pass cargo check
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Refactor the structure and responsibilities of `CodeGenContext`
This commit refactors how the `CodeGenContext` is used throughout the code
generation process, making it easier to pass it around when more flexibility is
desired in the MacroAssembler to perform the lowering of certain instructions.
As of this change, the responsibility of the `CodeGenContext` is to provide an
interface for operations that require orchestration between the register
allocator, the value stack and the function's frame. The MacroAssembler is removed
from the `CodeGenContext` and is passed as a dependency where needed, effectively
using it as an independent code generation interface only.
By giving more responsibilities to the `CodeGenContext` we can clearly separate
the concerns of the register allocator, which previously did more than it
should (e.g. popping values and spilling).
This change ultimately allows passing in the `CodeGenContext` to the
`MacroAssembler` when a given instruction cannot be generically described
through a common interface, allowing each implementation to decide the best way
to lower a particular instruction.
* winch: Add support for the WebAssembly `<i32|i64>.div_*` instructions
Given that some architectures have very specific requirements on how to handle
division, this change passes the `CodeGenContext` as a dependency to the `div`
MacroAssembler instruction so that each implementation can decide how to lower the
division. On architectures where division can be expressed as an ordinary binary
operation, this approach also allows relying on the
`CodeGenContext::i32_binop` or `CodeGenContext::i64_binop` helpers.
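To illustrate why the context is passed to `div`, here is a toy sketch with two hypothetical MacroAssembler implementations. `ToyContext`, `ToyMasm`, `FixedRegMasm` and `ThreeOperandMasm` are invented for this example, and operand locations are plain strings rather than real registers.

```rust
/// Toy stand-in for the code generation context: a stack of operand locations.
struct ToyContext {
    stack: Vec<String>,
}

/// Toy MacroAssembler interface: `div` receives the context so that each
/// implementation can decide how operands are popped and placed.
trait ToyMasm {
    fn div(&mut self, ctx: &mut ToyContext);
    fn emitted(&self) -> &[String];
}

/// x64-like: the dividend must sit in fixed registers before `idiv`.
#[derive(Default)]
struct FixedRegMasm {
    out: Vec<String>,
}

/// aarch64-like: division is an ordinary three-operand instruction.
#[derive(Default)]
struct ThreeOperandMasm {
    out: Vec<String>,
}

impl ToyMasm for FixedRegMasm {
    fn div(&mut self, ctx: &mut ToyContext) {
        let divisor = ctx.stack.pop().expect("divisor on the value stack");
        let dividend = ctx.stack.pop().expect("dividend on the value stack");
        self.out.push(format!("mov eax, {}", dividend));
        self.out.push("cdq".to_string()); // sign-extend eax into edx
        self.out.push(format!("idiv {}", divisor));
        ctx.stack.push("eax".to_string()); // quotient lands in eax
    }

    fn emitted(&self) -> &[String] {
        &self.out
    }
}

impl ToyMasm for ThreeOperandMasm {
    fn div(&mut self, ctx: &mut ToyContext) {
        let rhs = ctx.stack.pop().expect("divisor on the value stack");
        let lhs = ctx.stack.pop().expect("dividend on the value stack");
        self.out.push(format!("sdiv {0}, {0}, {1}", lhs, rhs));
        ctx.stack.push(lhs); // result reuses the left operand's register
    }

    fn emitted(&self) -> &[String] {
        &self.out
    }
}
```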
This patch adds complete support for the `sub` and `add` WebAssembly instructions
for x64, and complete support for the `add` WebAssembly instruction for aarch64.
This patch also refactors how the binary operations get constructed within the
`VisitOperator` trait implementation. The refactor adds methods in the
`CodeGenContext` to abstract all the common steps to emit binary operations,
making this process less repetitive and less brittle (e.g. forgetting to push the resulting value
to the stack, or forgetting to free registers after use).
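The common steps could look roughly like the following toy helper. This is a sketch under simplifying assumptions (operands are already in registers, no spilling), and `ToyContext`, `ToyMasm` and `Reg` are hypothetical types rather than Winch's actual ones.

```rust
/// Hypothetical register handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Reg(u8);

/// Toy MacroAssembler that records emitted instructions as text.
#[derive(Default)]
struct ToyMasm {
    emitted: Vec<String>,
}

impl ToyMasm {
    fn add(&mut self, dst: Reg, src: Reg) {
        self.emitted.push(format!("add r{}, r{}", dst.0, src.0));
    }

    fn sub(&mut self, dst: Reg, src: Reg) {
        self.emitted.push(format!("sub r{}, r{}", dst.0, src.0));
    }
}

/// Toy context owning the value stack and the pool of free registers.
struct ToyContext {
    stack: Vec<Reg>,
    free: Vec<Reg>,
}

impl ToyContext {
    /// Shared steps for every binary operation: pop both operands, let the
    /// caller emit the instruction, free the source register and push the
    /// result, so no call site can forget any of these steps.
    fn i32_binop<F>(&mut self, masm: &mut ToyMasm, emit: F)
    where
        F: FnOnce(&mut ToyMasm, Reg, Reg),
    {
        let src = self.stack.pop().expect("right operand on the value stack");
        let dst = self.stack.pop().expect("left operand on the value stack");
        emit(masm, dst, src);
        self.free.push(src);
        self.stack.push(dst);
    }
}
```

Each operator visitor then reduces to a one-line call such as `ctx.i32_binop(&mut masm, |m, dst, src| m.add(dst, src))`.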
This patch also improves test coverage and refactors the filetests directory to make it
easier to add tests for other instructions.