* add cloning for String attributes
* use into_owned instead of to_vec to avoid a clone if possible.
* runs, but does not substitute
* show vars from c program, start cleanup
* tidy
* resolve conflicts
* remove WASI folder
* Add module_builder
add dwarf_package to state for cache
* move dwarf loading to module_environ.rs
pass the dwarf as binary as low as module_environ.rs
* pass dwarf package rather than add to debug_info
* tidy option/result nested if
* revert some toml and whitespace.
* add features cranelift,winch to module_builder and compute_artifacts
remove some `use`s
* address some feedback
remove unused 'use's
* address some feedback
remove unused 'use's
* move wat feature condition to cover whole method.
* More feedback
Another try at wat feature move
* Another try at wat feature move
* change gimli exemption version
add typed-arena exemption
* add None for c-api
* move `use` to #cfg
* fix another config build
* revert unwanted code deletion
* move inner function closer to use
* revert extra param to Module::new
* workaround object crate bug.
* add missing parameter
* add missing parameter
* Merge remote-tracking branch 'origin/main' into dwarf-att-string
# Conflicts:
# crates/wasmtime/src/engine.rs
# crates/wasmtime/src/runtime/module.rs
# src/common.rs
* remove module
* use common gimli version of 28.1
* remove wasm feature, revert gimli version
* remove use of object for wasm dwarf
* remove NativeFile workaround, add feature for dwp loading
* sync winch signature
* revert bench api change
* add dwarf for no cache feature
* put back merge loss of module kind
* remove param from docs
* add dwarf fission lldb test
* simplify and include test source
* clang-format
* address feedback, remove packages
add docs
simplify return type
* remove Default use on ModuleTypesBuilder
* Remove an `unwrap()` and use `if let` instead
* Use `&[u8]` instead of `&Vec<u8>`
* Remove an `unwrap()` and return `None` instead
* Clean up some code in `transform_dwarf`
* Clean up some code in `replace_unit_from_split_dwarf`
* Clean up some code in `split_unit`
* Minor refactorings and documentation in `CodeBuilder`
* Restrict visibility of `dwarf_package_binary`
* Revert supply-chain folder changes
* Fix compile error on nightly
* prtest:full
* use lldb 15
* prtest:full
* load dwp when loading wasm bytes with path
* correct source file name
* remove debug
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Other backends have functions for passing any registers in an AMode to
the operand collector. Use the same pattern in this backend too: it's
simpler to use, and also sets up for more changes I'm planning.
`Lower::set_vreg_alias` is only used in one place. Inlining it
eliminates some borrow-checker challenges and allows the trace log to
include more context.
When OperandCollector's reg_use/reg_late_use/reg_def/reg_early_def
methods are handed a Reg that refers to a physical ("real") register,
they all delegate to reg_fixed_nonallocatable, ignoring the constraint
kinds and positions. This behavior was introduced in #5132.
In several cases, the s390x backend was calling those methods with the
result of the `gpr` or `writable_gpr` functions, which return physical
registers. In these cases we can now be more explicit that this is a
non-allocatable register.
In addition, this PR reverts #4973 and #5121 because they became
unnecessary, again due to #5132.
Fixes https://github.com/bytecodealliance/wasmtime/issues/8446
The WebAssembly tail call proposal is currently not supported in Winch. This commit returns an error when trying to enable the tail call proposal while using Winch as the compiler.
Even though the issue linked above doesn't make use of any of the tail instructions, the trampolines were generated using Cranelift's tail call calling convention.
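As a hedged illustration of the embedder-visible effect (names are from the public `wasmtime` crate and this assumes the `winch` cargo feature is enabled; exactly where the error surfaces is an assumption of this sketch):

```rust
use wasmtime::{Config, Engine, Strategy};

fn main() {
    let mut config = Config::new();
    config.strategy(Strategy::Winch);
    config.wasm_tail_call(true);

    // With this change the incompatible combination is reported as an error
    // instead of silently producing Cranelift-style tail-call trampolines.
    match Engine::new(&config) {
        Ok(_) => println!("engine created"),
        Err(e) => println!("configuration rejected: {e}"),
    }
}
```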
* Add a fuzzer for async wasm
This commit revives a very old branch of mine to add a fuzzer for
Wasmtime in async mode. This work was originally blocked on
llvm/llvm-project#53891, and while that's still an issue this change now
contains a workaround for it. Support for async fuzzing required a good
deal of refactoring, and the highlights are:
* The main part is that the new intrinsics
  `__sanitizer_{start,finish}_switch_fiber` are now invoked around the
  stack-switching routines of fibers (a sketch follows this list). This
  only works on Unix and is set to compile only when ASAN is enabled
  (otherwise everything is a noop). This required some refactoring to get
  everything in just the right shape for ASAN, since it appears that these
  functions not only need to be called but more-or-less need to be
  adjacent to each other in the code. My guess is that while we're
  switching stacks ASAN is in a "weird state" and not ready to run
  arbitrary code.
* Stacks are a problem. The LLVM issue above outlines how stacks cannot
  be deallocated at this time, because if the deallocated virtual memory
  is later used for the heap then ASAN will report a false-positive stack
  overflow. To deal with this, stacks get special treatment in ASAN mode
  via an allocation path that never deallocates them. This logic also
  applies to the pooling allocator, which uses a different stack
  allocation strategy under ASAN.
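As referenced in the first bullet, here is a minimal sketch of the hook usage, assuming Unix and the standard sanitizer interface. The `switch_with_asan` wrapper and its surroundings are hypothetical and only link when building with ASAN, but the two extern declarations follow LLVM's documented fiber annotation API:

```rust
use std::ffi::c_void;
use std::ptr;

extern "C" {
    // Provided by the ASAN runtime (sanitizer/common_interface_defs.h).
    fn __sanitizer_start_switch_fiber(
        fake_stack_save: *mut *mut c_void,
        bottom: *const c_void,
        size: usize,
    );
    fn __sanitizer_finish_switch_fiber(
        fake_stack_save: *mut c_void,
        bottom_old: *mut *const c_void,
        size_old: *mut usize,
    );
}

/// Hypothetical wrapper: announce the switch to ASAN, perform the raw stack
/// switch, then tell ASAN the switch has completed. The two intrinsic calls
/// stay immediately adjacent to the actual switch, as described above.
unsafe fn switch_with_asan(
    target_bottom: *const c_void,
    target_size: usize,
    raw_switch: impl FnOnce(),
) {
    let mut fake_stack = ptr::null_mut();
    __sanitizer_start_switch_fiber(&mut fake_stack, target_bottom, target_size);
    raw_switch(); // the real fiber switch happens here
    let (mut old_bottom, mut old_size) = (ptr::null(), 0usize);
    __sanitizer_finish_switch_fiber(fake_stack, &mut old_bottom, &mut old_size);
}
```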
With all of the above a new fuzzer is added. This fuzzer generates an
arbitrary module, selects an arbitrary means of async (e.g.
epochs/fuel), and then tries to execute the exports of the module with
various values. In general the fuzzer is looking for crashes/panics as
opposed to correct answers as there's no oracle here. This is also
intended to stress the code used to switch on and off stacks.
* Fix non-async build
* Remove unused import
* Review comments
* Fix compile on MIRI
* Fix Windows build
On x86-64, instructions which take 32-bit immediate operands in 64-bit
mode sign-extend those immediates.
To accommodate that, we currently only generate an immediate if the
64-bit constant we want is unchanged by truncating it to 32 bits and then
sign-extending back to 64 bits. Otherwise we put the 64-bit constant in
the constant pool.
However, if the constant's type is I32, we don't care about its upper
32 bits. In that case it's safe to generate an immediate whether the
sign bit is set or not. We should never need to use the constant pool
for types smaller than 64 bits on x64.
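A small illustrative check for the rule above (helper names are made up for this sketch, not the backend's):

```rust
/// Is `value` unchanged by truncating to 32 bits and sign-extending back,
/// i.e. can it be encoded as an imm32 that the CPU sign-extends to 64 bits?
fn fits_as_sign_extended_imm32(value: u64) -> bool {
    value as i64 == (value as i32) as i64
}

/// For I32 (or narrower) constants the upper 32 bits are don't-care, so an
/// immediate is always acceptable; only true 64-bit constants need the check.
fn can_use_imm32(type_bits: u32, value: u64) -> bool {
    type_bits <= 32 || fits_as_sign_extended_imm32(value)
}

fn main() {
    assert!(can_use_imm32(32, 0xffff_ffff)); // fine as an I32 immediate
    assert!(!can_use_imm32(64, 0xffff_ffff)); // would sign-extend to -1 as I64
}
```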
Before this, Cranelift ABI code would emit a stack-load instruction for
every stack argument and add all register arguments to the `args`
pseudo-instruction, whether those arguments were used or not.
However, we already know which arguments are used at that point because
we need the analysis for load-sinking, so it's easy to filter the unused
arguments out.
This avoids generating loads that are immediately dead, which is good
for the generated code. It also slightly reduces the size of the
register allocation problem, which is a small win in compile time.
This also changes which registers RA2 chooses in some cases because it
no longer considers unused defs from the `args` pseudo-instruction.
There was an existing method named `arg_is_needed_in_body` which sounded
like it should be the right place to implement this. However, that
method was only used for Baldrdash integration and has been a stub since
that integration was removed in #4571. Also it didn't have access to the
`value_ir_uses` map needed here. But the place where that method was
called does have access to that map and was perfect for this.
Also, don't emit a `dummy_use` pseudo-instruction for the vmctx if it's
otherwise unused everywhere: we want to drop it from the `args`
instruction in that case, and RA2 would then complain that it's used
without being defined.
Furthermore, don't emit debug info specially for the vmctx parameter,
because it's already emitted for all block parameters including vmctx.
Thanks to @elliottt for doing the initial investigation of this change
with me, and to @cfallin for helping me track down the `dummy_use` false
dependency.
Inequality comparisons between i128 values were previously eight
instructions and this reduces them to two, plus one move if one of the
inputs is still live afterward.
Equality comparisons were six instructions and are now three, plus up to
two moves if both inputs are still live afterward.
This removes 45 instructions from the test in x64/i128.clif that
generates all possible i128 comparisons. In addition to using fewer
instructions for each comparison, it also reduces register pressure
enough that the function no longer spills.
Conditional branches on i128 values are a special case but similar
optimizations shrink them from six instructions to two.
This brings Cranelift in line with what rustc+LLVM generates for
equivalent 128-bit comparisons.
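For a sense of what a two-instruction unsigned comparison looks like, here is an illustrative Rust model of the compare-then-subtract-with-borrow idea on 64-bit halves; it is not the backend's lowering, just a demonstration that the final borrow answers `a < b`:

```rust
/// Unsigned `a < b` on 128-bit values computed from 64-bit halves: compare the
/// low halves to produce a borrow, then subtract the high halves with that
/// borrow; the final borrow out is the comparison result.
fn u128_lt_via_halves(a: u128, b: u128) -> bool {
    let (a_lo, a_hi) = (a as u64, (a >> 64) as u64);
    let (b_lo, b_hi) = (b as u64, (b >> 64) as u64);

    let borrow_lo = a_lo < b_lo; // "cmp" of the low halves
    let (diff, borrow1) = a_hi.overflowing_sub(b_hi); // "sbb" of the high halves
    let (_, borrow2) = diff.overflowing_sub(borrow_lo as u64);
    borrow1 || borrow2
}

fn main() {
    for &(a, b) in &[(0u128, 1u128), (u128::MAX, 0), (1 << 64, (1 << 64) - 1)] {
        assert_eq!(u128_lt_via_halves(a, b), a < b);
        assert_eq!(u128_lt_via_halves(b, a), b < a);
    }
}
```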
This commit fixes a panic when a host function defined with `Func::new`
returned GC references and was called in async mode. The logic to
auto-gc before the return values go to wasm asserted that a synchronous
GC was possible but the context this function is called in could be
either async or sync. The fix applied in this commit is to remove the
auto-gc. This means that hosts will need to explicitly GC in these
situations until auto-gc is re-added back to Wasmtime.
cc #8433 as this will make the behavior consistent, but we'll want to
re-add the gc behavior.
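As a hedged sketch of what "explicitly GC" can look like for an embedder (this assumes the default `gc` feature and the synchronous `Store::gc` entry point; when and how often to collect is entirely up to the embedder):

```rust
use wasmtime::{Engine, Store};

fn main() {
    let engine = Engine::default();
    let mut store: Store<()> = Store::new(&engine, ());

    // ... define `Func::new` host functions that return GC references and
    // call them from wasm here ...

    // With the auto-GC on this path removed, the embedder decides when to
    // collect, e.g. periodically between calls into wasm.
    store.gc();
}
```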
* Update release date of Wasmtime 20.0.0
* Update the release date
---------
Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Previously, branching on an i128 whose value, after truncation to u64, was
larger than i64::MAX would result in the branch value incorrectly being
interpreted with its upper 64 bits set to 1 rather than 0.
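A tiny model of the failure mode (illustrative only, not the backend code): the upper half of the widened branch value must come from the value itself rather than from sign-extending the truncated low half.

```rust
fn split_correct(v: u128) -> (u64, u64) {
    (v as u64, (v >> 64) as u64) // upper half taken from the value: 0 here
}

fn split_buggy(v: u128) -> (u64, u64) {
    let lo = v as u64;
    (lo, ((lo as i64) >> 63) as u64) // wrongly sign-extends the low half
}

fn main() {
    let v = (i64::MAX as u128) + 1; // low half is larger than i64::MAX
    assert_eq!(split_correct(v), (1 << 63, 0));
    assert_eq!(split_buggy(v), (1 << 63, u64::MAX)); // upper 64 bits become 1s
}
```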
* x64: Rename fields of `CmpRmiR` and swap them
This commit is the start of work that is the equivalent of #8362 but for
comparisons of general purpose instructions. Here only the instruction
arguments are swapped and renamed. All callers preserve the
right-to-left ordering and a subsequent commit will swap them to
left-to-right.
* x64: Swap `test_rr` operands in Winch
* x64: Swap operands of `cmp_rr` in Winch
* x64: Swap operands of `cmp_ir` in Winch
* x64: Swap operands for the `cmp` masm method in Winch
This additionally needed to affect the `branch` method. Methods now
additionally document what's the left-hand-side and what's the
right-hand-side.
Of note here is that the `branch` instruction used the terms "lhs" and
"rhs" incorrectly: the left/right-ness was swapped relative to the actual
comparison. This causes no issues, however, since `branch` was only ever
used when both operands were the same or a reflexive condition was used.
* x64: Swap operands for `Inst::cmp_rmi_r`
* x64: Swap operand order of `cmp_rmi_r` ISLE helper
Also update all callers to swap orders as well.
* x64: Swap operand order of `x64_test` helper
* Swap operand order of `x64_cmp_imm` helper
* x64: Swap operand order of `cmp`
* x64: Define `x64_cmp_imm` with `x64_cmp`
Minor refactoring which clarifies that this is just a normal `cmp`
except with a different signature.
* x64: Use `x64_cmp_imm` in a few more locations
A bit easier on the eyes to read.
* We can only have one mutable borrow at a time after #8277
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add a test like pread/pwrite, but for read/write
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Fix use-after-free in externref example
This fixes a typo in the `externref.c` example where a deallocated
`wasmtime_val_t` was used by accident. Additionally this introduces
scoping to prevent this from arising again.
* Run clang-format
* Fix compilation of C example
* Always use r10 for indirect return calls, and indirect winch calls
Use `r10` for the destination of indirect return calls, and indirect
calls using the winch calling convention, as it is a caller-saved
register.
For tail calls, this ensures that we won't accidentally pick a
callee-saved register for the destination, clobbering it when we restore
callee-saves in the call to `emit_return_call_common_sequence`.
For winch calls, using `r10` instead of `r15` means that it's still
possible to use the pinned register in combination with winch.
* Always use `r11` for the temp in return_call
* Switch ReturnCallUnknown to taking the callee as a Reg
* x64: Remove unnecessary `put_in_xmm` in `x64_ucomis`
This commit is a result of [discussion on Zulip][Zulip] which concluded
that there's no longer any need for these `put_in_xmm` methods. The
condition these were previously protecting against is now upheld by
support added in #4061.
I've additionally added a number of new tests here to assert that load
sinking isn't happening where it's not supposed to, and for this commit
I've manually verified it's all as expected.
[Zulip]: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/x64.2C.20fcmp.2C.20and.20load-op.20merging/near/433319056
* x64: Change signature of `x64_ucomis` to take registers
Previously this was the only instruction helper which took a `Value` as
an argument. Now it works like all other instruction helpers by taking a
typed register as an argument, namely `Xmm` and `XmmMem`.
* x64: Add AVX encodings of float-compare instructions
This fills out the final remaining AVX-related instructions that
Cranelift can emit, so now Cranelift should use AVX instructions for all
float/simd operations when AVX is enabled.
* Fix test expectation
* x64: Refactor float comparisons and their representations
Currently the `XmmCmpRmR` instruction variant has a `dst` and a `src`
field. The instruction doesn't actually write to `dst`, however, and the
constructor of `xmm_cmp_rm_r` takes the `src` first followed by the
`dst`. This is inconsistent with most other xmm-related instructions
where the "src1" comes first and the "src2", which is a memory operand,
comes second. This memory-operand-second pattern also matches the Intel
manuals more closely.
This commit refactors the `XmmCmpRmR` instruction variant with the
following changes:
* `dst` is renamed to `src1`
* `src` is renamed to `src2`
* The `xmm_cmp_rm_r` helpers, and callers, swapped their arguments to
take `Xmm` first and `XmmMem` second.
* The `x64_ptest` instruction followed suit as it was modelled after the
same.
* Callers of `x64_ucomis` swapped their arguments to preserve the
operand orders.
* The `Inst::xmm_cmp_rm_r` helper swapped operand order and additionally
callers were updated.
* The VCode rendering of `XmmCmpRmR` swapped order of its operands,
explaining changes in rendering of filetests (although machine code is
not changing here).
The changes were then additionally propagated to Winch as well. In Winch
the `src`/`dst` naming was inherited so it was renamed to `src1` and
`src2` which swapped operands as well. In the case of Winch there was
additionally an accident in `float_cmp_op` where values were popped in
reverse order. This swapping-of-swapping all worked out prior, but to
get all the names to align correctly I've swapped this to be more
consistent. Sorry there's a lot of swaps-of-swaps here, but the basic
idea is that the low-level instruction constructor swapped its arguments,
so to preserve the same (correct) output today something else needed to
be swapped as well. In Winch's case it wasn't the immediate caller of the
instruction constructor since that method looked correct, but it was
instead a higher-level `float_cmp_op` which then called a helper which
then called the low-level constructor which had operands swapped.
* Review comments
* Use callee-save registers for the riscv64 tail calling convention
Switch to using the same set of callee saved registers as the default
calling convention on riscv64.
* Stop constraining regular call indirect to use t0 as the destination
* Update tests
* Eagerly allocate the incoming argument area for riscv64 tail calls
* Update tests
* Inline clobber restore when emitting return calls, and only adjust sp once
* Indicate that incoming argument offsets are negative
While reviewing #8393 I found the existing `load_constant64_full`
function nearly incomprehensible, so I rewrote it. It has many fewer
cases now.
I've chosen an implementation which I believe generates exactly the same
instructions as the prior implementation in all cases. There are some
choices we could make which would synthesize the same constants using a
different sequence of instructions, but in almost all cases we'd produce
the same number of instructions, so it doesn't make much difference.
This algorithm is simple and produces readable disassembly listings so
we might as well preserve the existing behavior.
* Rework the riscv64 prologue to spill clobbers with positive offsets to SP
* Also modify gen_clobber_restore to load positive offsets from SP
This fixes a bug: we're not supposed to write below SP, as a signal
handler is allowed to clobber anything below it. Writing our clobbers to
negative offsets of SP leaves a window open where they might be
clobbered if we receive a signal while executing the prologue or
epilogue.
* Fix x64 clobber offsets with unwind info
The stack offsets provided to UnwindInst::SaveReg instructions are
relative to the beginning of the clobber area of the stack, and on main
the x64 backend computes those offsets by assuming a SP-relative offset
and then adjusting it. This is problematic, as any changes to the frame
layout might require updates in two locations: where we compute the
initial SP offset, and where we compute the adjustment to make the
offset relative to the clobber area again.
This PR fixes this issue by making the offset we track in the loop
always relative to the beginning of the clobber area, and adding the
offset from SP only when emitting the store. This relies on the
assumption that the clobber area is sized according to the constraints
of the registers stored within it (16-byte alignment is required for
floating point registers) but this assumption is enforced by existing
behavior in compute_clobber_size.
This also fixes a long-standing bug with floating-point register spill
offsets: the offset was recorded before it was aligned to 16 bytes,
leaving the resulting unwind info potentially pointing at the wrong
offset.
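A minimal sketch of that ordering fix (the helper below is a stand-in, not the backend's code): the unwind offset has to be recorded after the 16-byte alignment, not before.

```rust
fn align_to(offset: u32, align: u32) -> u32 {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

fn main() {
    let offset = 8u32; // next free byte in the clobber area
    let recorded_too_early = offset; // old behavior: saved before aligning
    let aligned = align_to(offset, 16); // FP saves need 16-byte alignment
    let recorded_after_align = aligned; // fixed behavior: saved after aligning
    assert_ne!(recorded_too_early, recorded_after_align); // 8 vs 16
}
```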
* Make the same refactoring to clobber_restore
This reverts the key parts of e3a08d4c40
(#8151), because it turns out that we didn't need that abstraction.
Several changes in the last month have enabled this:
- #8292 and then #8316 allow us to refer to either incoming or outgoing
argument areas in a (mostly) consistent way
- #8327, #8377, and #8383 demonstrate that we never need to delay
writing stack arguments directly to their final location
prtest:full
* Fix pcc error on some workloads
- `attach_constant_fact` should create facts for constants based on their type
- `MovK` generates incorrect fact ranges
* Update cranelift/codegen/src/isa/aarch64/pcc.rs
Co-authored-by: Jamey Sharp <jamey@minilop.net>
* adjust test format
---------
Co-authored-by: Jamey Sharp <jamey@minilop.net>
The `compute_addr` helper only works on real address modes, and one of
its callers checks that it only passes a real address mode to it. On x64
we have different Rust types for address modes which are real versus
synthetic, so we can simplify a little by reflecting this requirement in
the types.
Since the vector of clobbered registers is sorted first by class, we can
return slices of that vector instead of allocating a temporary vector
for each class.
This relies on the `RealReg` sort order and on the frame layout's
`clobbered_callee_saves` being sorted in natural order.
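A self-contained sketch of the idea with stand-in types (not Cranelift's actual `RealReg`): once the clobbers are sorted by class, each class is one contiguous slice of the vector and no per-class allocation is needed.

```rust
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum Class { Int, Float }

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Saved { class: Class, index: u8 }

/// Given clobbers sorted by (class, index), return the integer and float
/// portions as borrowed slices instead of building new vectors.
fn by_class(clobbers: &[Saved]) -> (&[Saved], &[Saved]) {
    let split = clobbers.partition_point(|r| r.class == Class::Int);
    clobbers.split_at(split)
}

fn main() {
    let clobbers = [
        Saved { class: Class::Int, index: 1 },
        Saved { class: Class::Int, index: 5 },
        Saved { class: Class::Float, index: 2 },
    ];
    let (ints, floats) = by_class(&clobbers);
    assert_eq!((ints.len(), floats.len()), (2, 1));
}
```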
Semantically, a "real register" is supposed to be a physical register,
so let's use the type dedicated for that purpose.
This has the advantage that `PReg` is only one byte while `VReg` is four
bytes, so the arrays where we record collections of `RealReg` become
smaller.
There was an implementation of `From<VReg> for RealReg` which was not a
sensible conversion, because not all `VReg`s are valid `RealReg`s. I
could have replaced it with a `TryFrom` implementation but it wasn't
used anywhere important, so I'm just deleting it instead.
Winch was using that `VReg`->`RealReg` conversion, but only in the
implementation of another conversion that was itself unused, so I'm
deleting that conversion as well. It's easy to implement correctly (the
Winch `Reg` type is identical to `RealReg`, so all conversions for the
latter are readily available) but as far as I can tell Winch doesn't
need to use Cranelift's register wrappers or RA2's virtual register
type, so it's simpler to just delete those conversions.
The riscv64 backend was relying on quirks of the existing conversions
between `RealReg` and `VReg` when emitting clobber saves and restores.
Just using the generic conversions between `RealReg` and `Reg` is
simpler and works correctly with the rest of these changes.
* Ensure that indirect tail calls on x64 don't clobber the destination register
* Add the fuzzbug to the x64 isa tests as well
* Use `reg_early_def` in the ReturnCall case as well
This method isn't actually correct if `split_off` is used on the
returned values again, as it's missing some extra index bookkeeping.
That being said, nothing in Wasmtime currently uses it, so remove it
entirely instead of keeping it around.
* Avoid copying the frame for tail calls on aarch64
To mirror the implementation in the x64 backend, switch to eagerly
reserving enough space in the incoming argument area for any tail call
present in the function being compiled.
prtest:macos-arm64
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Add a TODO about how to remove the direct SP manipulation
* Fix comments
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>