This commit reworks the `br_table` logic so that it correctly handles
all the jumps involved to each of the targets.
Even though it is safe to use the default branch for type information,
it is not safe to use it to derive the base stack pointer and base value
stack length. This change ensures that each target offset is taken into
account to balance the value stack prior to each jump.
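As a rough illustration of why the default target cannot stand in for every other target, the sketch below computes one stack adjustment per `br_table` target. The `Frame` and `Adjustment` types and the `adjustments` function are hypothetical and are not Winch's actual types; result values are ignored for brevity.

```rust
/// Hypothetical view of a control frame: the state the machine stack and
/// the value stack must be restored to before jumping to the frame's label.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Frame {
    /// Stack pointer offset (in bytes) when the frame was entered.
    base_sp_offset: u32,
    /// Value stack length when the frame was entered.
    base_stack_len: usize,
}

/// Adjustment required before jumping to one particular target.
#[derive(Debug, PartialEq, Eq)]
struct Adjustment {
    /// Bytes of machine stack to free.
    free_bytes: u32,
    /// Value stack entries to discard.
    drop_values: usize,
}

/// Computes one adjustment per target instead of reusing the adjustment
/// derived from the default target for every jump.
fn adjustments(current_sp: u32, current_len: usize, targets: &[Frame]) -> Vec<Adjustment> {
    targets
        .iter()
        .map(|t| Adjustment {
            free_bytes: current_sp.saturating_sub(t.base_sp_offset),
            drop_values: current_len.saturating_sub(t.base_stack_len),
        })
        .collect()
}
```

Two targets with the same type information can still have different base offsets, so each one gets its own adjustment.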
Follow up to:
https://github.com/bytecodealliance/wasmtime/pull/7547
In which I overlooked this change and the fuzzer found an issue with the
following program:
```wat
(module
  (func (export "") (result i32)
    block (result i32)
      i32.const 0
    end
    i32.const 0
    i32.const 0
    br_table 0
  )
)
```
This commit ensures that the stack pointer is correctly positioned when
emitting br_table.
We can't know for sure which branch will be taken, but since all
branches must share the same type information, we can be certain that
the expectations regarding the stack pointer are the same and thus we
can use the default target in order to ensure the correct placement.
* Winch: cleanup stack in br_if in non-fallthrough case
* Remove unnecessary refetch of sp_offsets
* Refactoring based on PR feedback
* Have SPOffset implement Ord
This commit updates to the latest wasm-tools and `wit-bindgen` to bring
the family of crates forward. This update notably includes Nick's work
on packed indices in the `wasmparser` crate for validation for the
upcoming implementation of GC types. This meant that translation from
`wasmparser` types to Wasmtime types now may work with a "type id"
instead of just a type index which required plumbing not only Wasmtime's
own type information but additionally `wasmparser`'s type information
throughout translation.
This required a fair bit of refactoring to get this working but no
change in functionality is intended, only a different way of doing
everything prior.
* Winch: fix bug by spilling when calling a func
* Forgot to commit new filetest
* Only support WasmHeapType::Func
* Elaborate on call_indirect jump details
* Update docs for call
* Verify stack is only consts and memory entries
* Configure Rust lints at the workspace level
This commit adds necessary configuration knobs to have lints configured
at the workspace level in Wasmtime rather than the crate level. This
uses a feature of Cargo first released with 1.74.0 (last week) of the
`[workspace.lints]` table. This should help create a more consistent set
of lints applied across all crates in our workspace in addition to
possibly running select clippy lints on CI as well.
* Move `unused_extern_crates` to the workspace level
This commit configures a `deny` lint level for the
`unused_extern_crates` lint to the workspace level rather than the
previous configuration at the individual crate level.
* Move `trivial_numeric_casts` to workspace level
* Change workspace lint levels to `warn`
CI will ensure that these don't get checked into the codebase and
otherwise provide fewer speed bumps for in-process development.
* Move `unstable_features` lint to workspace level
* Move `unused_import_braces` lint to workspace level
* Start running Clippy on CI
This commit configures our CI to run `cargo clippy --workspace` for all
merged PRs. Historically this hasn't been all that feasible due to the
amount of configuration required to control the number of warnings on
CI, but with Cargo's new `[lints]` table it's possible to have a
one-liner to silence all lints from Clippy by default. This commit by
default sets the `all` lint in Clippy to `allow` to by-default disable
warnings from Clippy. The goal of this PR is to enable selective access
to Clippy lints for Wasmtime on CI.
* Selectively enable `clippy::cast_sign_loss`
This would have fixed #7558 so try to head off future issues with that
by warning against this situation in a few crates. This lint is still
quite noisy though for Cranelift for example so it's not worthwhile at
this time to enable it for the whole workspace.
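For readers unfamiliar with the lint, the sketch below shows the general kind of cast that `clippy::cast_sign_loss` flags, next to a checked alternative. It is a generic illustration and not the code from the referenced issue; the function names are made up.

```rust
use std::convert::TryFrom;

/// The pattern the lint warns about: `as` silently reinterprets a negative
/// value as a large unsigned one. The lint is explicitly allowed here so the
/// sketch stays warning-free when checked with Clippy.
#[allow(clippy::cast_sign_loss)]
fn lossy(offset: i32) -> u32 {
    offset as u32
}

/// A checked alternative: negative inputs are reported instead of wrapped.
fn checked(offset: i32) -> Option<u32> {
    u32::try_from(offset).ok()
}
```

With the lint set to `warn`, a cast like the one in `lossy` has to be acknowledged explicitly rather than slipping in unnoticed.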
* Fix CI error
prtest:full
This commit solidifies the approach for unreachable code handling in
control flow.
Prior to this change, at unconditional jump sites, the compiler would
reset the machine stack as well as the value stack. Even though this
approach might seem natural at first, it actually broke several of the
invariants that must be met at the end of each control block. This was
especially noticeable with programs that conditionally entered an
unreachable state, for example:
```wat
(module
  (func (;0;) (param i32) (result i32)
    local.get 0
    local.get 0
    if (result i32)
      i32.const 1
      return
    else
      i32.const 2
    end
    i32.sub
  )
  (export "main" (func 0))
)
```
The approach followed in this commit ensures that all the invariants are
met and introduces more guardrails around those invariants. In short,
instead of resetting the value stack at unconditional jump sites, the
value stack handling is deferred until the reachability analysis
restores the reachability of the code generation process, ensuring that
the value stack contains the exact amount of values expected by the
frame where reachability is restored. Given that unconditional jumps
reset the machine stack, when the reachability of the code generation
process is restored, the SP offset is also restored which should match
the size of the value stack.
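The deferred handling described above can be sketched as follows. All names (`CodeGen`, `Frame`, `Val`, `end_of_frame`) are hypothetical simplifications rather than Winch's real types, and block results are modelled as register entries purely for illustration.

```rust
/// Where a value stack entry lives in this simplified model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Val {
    Mem,
    Reg,
}

/// Hypothetical control frame, recording the state at block entry.
struct Frame {
    base_stack_len: usize,
    base_sp_offset: u32,
    /// Number of results the block leaves on the value stack.
    results: usize,
}

/// Hypothetical code generation state.
struct CodeGen {
    value_stack: Vec<Val>,
    sp_offset: u32,
    reachable: bool,
}

impl CodeGen {
    /// `br` or `return`: resets the machine stack only; the value stack is
    /// left untouched and the code becomes unreachable.
    fn unconditional_jump(&mut self, target: &Frame) {
        self.sp_offset = target.base_sp_offset;
        self.reachable = false;
    }

    /// `else` or `end` of `frame`: if the code was unreachable, the value
    /// stack is trimmed only now, to exactly what the frame expects.
    fn end_of_frame(&mut self, frame: &Frame) {
        if !self.reachable {
            self.value_stack.truncate(frame.base_stack_len);
            self.value_stack
                .extend(std::iter::repeat(Val::Reg).take(frame.results));
            self.sp_offset = frame.base_sp_offset;
            self.reachable = true;
        }
        // Guardrail: the frame's expectation holds in both cases.
        assert_eq!(self.value_stack.len(), frame.base_stack_len + frame.results);
    }
}
```

In the `if`/`return` program above, the `return` would call `unconditional_jump` with the function frame, while the `else` restores reachability against the `if` frame.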
* winch: Introduce `ABIParams` and `ABIResults`
This commit prepares Winch to support WebAssembly Multi-Value.
The most notable piece of this change is the introduction of the
`ABIParams` and `ABIResults` structs which are type wrappers around the
concept of an `ABIOperand`, which is the underlying main representation
of a param or result.
This change also consolidates how the size for WebAssembly types is
derived by introducing `ABI::sizeof`, as well as introducing
`ABI::stack_slot_size` to concretely indicate the stack slot size in
bytes for stack params, which is ABI dependent.
* winch: Add the necessary ABI building blocks for multi-value
This change adds the necessary changes at the ABI level in order to
handle multi-value.
The most notable modifications in this change are:
* Modifying Winch's default ABI to reverse the order of results,
ensuring that results that go in the stack should always come first;
this makes it easier to respect the following two stack invariants:
* Spilled memory values always precede register values
* Spilled values are stored from oldest to newest, matching their
respective locations on the machine stack.
* Modify all calling conventions supported by Winch so that only the
first result is stored in a register. This differs from their vanilla
counterparts, which can return multiple results in registers. Given that Winch is not
a generic code generator, keeping the ABI close to what Wasmtime
expects makes it easier to pass multiple results at trampolines.
* Add more multi-value tests
This commit adds more tests for multi-value and improves documentation.
prtest:full
* Address review feedback
This patch fixes how jumps are handled in `br_table`; prior to this
change, `br_table` was implemented using
`CodeGenContext::unconditional_jump`; this function ensures, among other
invariants, that the value stack and stack pointer are balanced
according to the expectations of the target branch. Even though in
`br_table` there's a branch to a potentially known location, it's
impossible to be certain at compile time which branch will be taken; in
that regard, `br_table` behaves more like `br_if`. Using
`unconditional_jump` resulted in the stack being manipulated multiple
times and breaking the other existing invariants around stack balancing.
This commit makes it so that `br_table` doesn't rely on
`unconditional_jump` anymore and instead it delegates control flow to
the target branch, which will ensure that the value stack and stack
pointer are correctly balanced when restoring reachability, very similar
to what happens with `br_if`.
This issue was discovered while fuzzing and a file test is included with
the test case.
This commit improves unconditional jumps by balancing the stack pointer
as well as the value stack when the current stack pointer and value
stack are greater than the target stack pointer and value stack. The
invariant that this change maintains is that the value stack should
always reflect the state of the machine stack. The value stack might
have excess stack values in the presence of a fallthrough (`br_if` or
`br_table`) in which the target branch is not known at compile time; in
this situation instructions like `return` or `br` discard any excess
values.
This commit properly derives a scratch register for a particular
WebAssembly type. The included spec test uncovered that the previous
implementation used an int scratch register to assign float stack
arguments, which resulted in a panic.
This change is a follow up to https://github.com/bytecodealliance/wasmtime/pull/7443;
after it landed I realized that Winch doesn't include spec tests for
local.get and local.set.
Those tests uncovered a bug in the handling of the constant pool: given
Winch's single-pass nature, there's very little room to know all the
constants ahead of time and to register them all at once at emission
time; instead they are emitted when they are needed by an instruction.
Even though Cranelift's machinery is capable of deduplicating constants in
the pool, `register_constant` assumes and checks that each constant
is only pushed once. In Winch's case, since we emit as we go, we
need to carefully check whether the constant was emitted before, and
register it only if it was not. Otherwise we break the invariant that
each constant should only be registered once.
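A minimal sketch of the "register only once" check, assuming a pool that rejects duplicate registrations. `Pool`, `Emitter`, and `add_constant` here are hypothetical stand-ins and do not mirror Cranelift's or Winch's actual APIs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a constant pool that, like the one described
/// above, must see each constant registered at most once.
#[derive(Default)]
struct Pool {
    registered: Vec<Vec<u8>>,
}

impl Pool {
    /// Registers `bytes` and returns its handle; panics on a duplicate.
    fn register(&mut self, bytes: &[u8]) -> usize {
        assert!(
            !self.registered.iter().any(|b| b.as_slice() == bytes),
            "constant registered twice"
        );
        self.registered.push(bytes.to_vec());
        self.registered.len() - 1
    }
}

/// Emits constants on demand and remembers which ones were already
/// registered, so the pool's invariant is never violated.
#[derive(Default)]
struct Emitter {
    pool: Pool,
    seen: HashMap<Vec<u8>, usize>,
}

impl Emitter {
    fn add_constant(&mut self, bytes: &[u8]) -> usize {
        if let Some(&handle) = self.seen.get(bytes) {
            // Already emitted: reuse the handle instead of registering again.
            return handle;
        }
        let handle = self.pool.register(bytes);
        self.seen.insert(bytes.to_vec(), handle);
        handle
    }
}
```

Requesting the same constant twice returns the same handle, and the pool only ever sees one registration for it.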
* Update wasm-tools crates
This commit updates the wasm-tools family of crates for a number of
notable updates:
* bytecodealliance/wasm-tools#1257 - wasmparser's ID-based
infrastructure has been refactored to have more precise types for each
ID rather than one all-purpose `TypeId`.
* bytecodealliance/wasm-tools#1262 - the implementation of
"implementation imports" for the component model which both updates
the binary format in addition to adding more syntactic forms of
imports.
* bytecodealliance/wasm-tools#1260 - a new encoding scheme for component
information for `wit-component` in objects (not used by Wasmtime but
used by bindings generators).
Translation for components needed to be updated to account for the first
change, but otherwise this was a straightforward update.
* Remove a TODO
While this is not a large amount of binary size, if the purpose of the
`--no-default-features` build is to showcase "minimal Wasmtime" then we may
as well try to make `clap` as small as possible.
* winch: Add a subset of known libcalls and improve call emission
This change is a follow up to:
- https://github.com/bytecodealliance/wasmtime/pull/7155
- https://github.com/bytecodealliance/wasmtime/pull/7035
One of the objectives of this change is to make it easy to emit
function calls at the MacroAssembler layer, for cases in which it's
challenging to know ahead-of-time if a particular functionality can be
achieved natively (e.g. rounding and SSE4.2). The original implementation
of function call emission made this objective difficult to achieve and
was also difficult to reason about.
I decided to simplify the overall approach to function calls as part of
this PR; in essence, the `call` module now exposes a single function
`FnCall::emit` which is responsible for gathering the dependencies and
orchestrating the emission of the call. This new approach deliberately
avoids holding any state regarding the function call for simplicity.
This change also standardizes the usage of `Callee` as the main
entrypoint for function call emission, as of this change 4 `Callee`
types exist (`Local`, `Builtin`, `Import`, `FuncRef`), each callee kind
is mappable to a `CalleeKind` which is the materialized version of
a callee which Cranelift understands.
This change also moves the creation of the `BuiltinFunctions` to the
`ISA` level given that they can be safely used across multiple function
compilations.
Finally, this change also introduces support for some of the
"well-known" libcalls and hooks those libcalls at the
`MacroAssembler::float_round` callsite.
--
prtest:full
* Review comments
* Remove unnecessary `into_iter`
* Fix remaining lifetime parameter names
* winch(x64): Add support for table instructions
This change adds support for the following table instructions:
`elem.drop`, `table.copy`, `table.set`, `table.get`, `table.fill`,
`table.grow`, `table.size`, `table.init`.
This change also introduces partial support for the `Ref` WebAssembly
type, more concretely the `Func` heap type, which means that all the
table instructions above only work with this WebAssembly type as of this
change.
Finally, this change is also a small follow up to the primitives
introduced in https://github.com/bytecodealliance/wasmtime/pull/7100,
more concretely:
* `FnCall::with_lib`: tracks the presence of a libcall and ensures that
any result registers are freed right when the call is emitted.
* `MacroAssembler::table_elem_addr` returns an address rather than the
value of the address, making it convenient for other use cases like
`table.set`.
--
prtest:full
* chore: Make stack functions take impl IntoIterator<..>
* Update winch/codegen/src/codegen/call.rs
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Remove a dangling `dbg!`
* Add comment on branching
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Handle `lower_branch` consistently amongst backends
This commit is a refactoring to consistently implement `lower_branch`
among Cranelift's backends. Previously each backend had its own means of
extracting labels and shuffling along information, and now there's
prelude methods for all backends to access and use. This changes a few
display impls but the actual meat of what's emitted shouldn't change
amongst the backends.
* Fix compile
* winch(x64): Call indirect
This change adds support for the `call_indirect` instruction to Winch.
Libcalls are a pre-requisite for supporting `call_indirect` in order to
lazily initialize funcrefs. This change adds support for libcalls to
Winch by introducing a `BuiltinFunctions` struct similar to Cranelift's
`BuiltinFunctionSignatures` struct.
In general, libcalls are handled like any other function call, with the
only difference that given that not all the information to fulfill the
function call might be known up-front, control is given to the caller
for finalizing the call.
The introduction of function references also involves dealing with
pointer-sized loads and stores, so this change also adds the required
functionality to `FuncEnv` and `MacroAssembler` to be pointer aware,
making it straightforward to derive an `OperandSize` or `WasmType` from
the target's pointer size.
Finally, given the complexity of the `call_indirect` instruction, this
change bundles an improvement to the register allocator, allowing it to
track the allocatable vs non-allocatable registers; this is done to
avoid any mistakes when allocating/de-allocating registers that are not
allocatable.
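One way to picture the allocatable versus non-allocatable tracking is a pair of `u64` bitmasks. This is a sketch under that assumption; `RegBitSet` and its methods are hypothetical names, not the register allocator's real interface.

```rust
/// Hypothetical register set backed by `u64` bitmasks, one bit per register.
struct RegBitSet {
    /// Registers that are currently free.
    available: u64,
    /// Registers that may never be handed out or returned (for example the
    /// stack pointer or a scratch register).
    non_allocatable: u64,
}

impl RegBitSet {
    fn new(allocatable: u64, non_allocatable: u64) -> Self {
        assert_eq!(allocatable & non_allocatable, 0, "sets must be disjoint");
        Self { available: allocatable, non_allocatable }
    }

    fn is_non_allocatable(&self, reg: u32) -> bool {
        (self.non_allocatable & (1u64 << reg)) != 0
    }

    /// Takes a specific register if it is allocatable and currently free.
    fn allocate(&mut self, reg: u32) -> bool {
        let bit = 1u64 << reg;
        if self.is_non_allocatable(reg) || (self.available & bit) == 0 {
            return false;
        }
        self.available &= !bit;
        true
    }

    /// Returns a register to the set; a no-op for non-allocatable registers,
    /// so freeing one by mistake cannot make it available.
    fn free(&mut self, reg: u32) {
        if !self.is_non_allocatable(reg) {
            self.available |= 1u64 << reg;
        }
    }
}
```

Keeping the non-allocatable mask separate means an accidental `free` of a reserved register never leaks it into the pool of available registers.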
--
prtest:full
* Address review comments
* Fix typos
* Better documentation for `new_unchecked`
* Introduce `max` for `BitSet`
* Make allocatable property `u64`
* winch(calls): Overhaul `FnCall`
This commit simplifies `FnCall`'s interface making its usage more
uniform throughout the compiler. In summary, this change:
* Avoids side effects in the `FnCall::new` constructor, and also makes
it the only constructor.
* Exposes `FnCall::save_live_registers` and
`FnCall::calculate_call_stack_space` to calculate the stack space
consumed by the call and so that the caller can decide which one to
use at callsites depending on their use-case.
* tests: Fix regset tests
This patch refactors all of the ISA/ABI specific prolog/epilog
generation code around the following two ideas:
1. Separate *planning* of the function's frame layout from the
actual *implementation* within prolog / epilog code.
2. No longer overload different purposes (middle-end register
tracking, platform-specific details like authorization modes,
and pop-stack-on-return) into a single return instruction.
As to 1., the new approach is based around a FrameLayout data
structure, which collects all information needed to emit prolog
and epilog code, specifically the list of clobbered registers,
and the sizes of all areas of the function's stack frame.
This data structure is now computed *once*, before any code is
emitted, and stored in the Callee data structure. ABIs need to
implement this via a new compute_frame_layout callback, which
gets all data from common code needed to make all decisions
around stack layout in one place.
The FrameLayout is then used going forward to answer all questions
about frame sizes, and it is passed to all ABI routines involved
in prolog / epilog code generation. [ This removes a lot of
duplicated calculation, e.g. the list of clobbered registers is
now only computed once and re-used everywhere. ]
This in turn allows reducing the number of distinct callbacks
ABIs need to implement, and simplifies common code logic around
how and when to call them. In particular, we now only have the
following four routines, which are always called in this order:
gen_prologue_frame_setup
gen_clobber_save
gen_clobber_restore
gen_epilogue_frame_restore
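The split between planning and implementation can be sketched roughly as below. The struct is a simplified, hypothetical stand-in for the FrameLayout described above (its fields, the 8-byte slot size, and the 16-byte setup area are assumptions), and `emit` produces placeholder strings rather than real instructions.

```rust
/// Simplified, hypothetical frame layout: computed once, then only read.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FrameLayout {
    clobbered_regs: Vec<u8>,
    setup_area_size: u32,
    clobber_size: u32,
    fixed_frame_storage_size: u32,
}

/// Planning: decides the whole layout before any code is emitted.
fn compute_frame_layout(clobbered: &[u8], fixed_storage: u32) -> FrameLayout {
    let mut regs = clobbered.to_vec();
    regs.sort_unstable();
    regs.dedup();
    // Assumption: every clobbered register occupies one 8-byte slot.
    let clobber_size = 8 * regs.len() as u32;
    let needs_frame = clobber_size + fixed_storage > 0;
    FrameLayout {
        clobbered_regs: regs,
        setup_area_size: if needs_frame { 16 } else { 0 },
        clobber_size,
        fixed_frame_storage_size: fixed_storage,
    }
}

/// Implementation: the four steps run in a fixed order and only consult
/// the precomputed layout.
fn emit(layout: &FrameLayout) -> Vec<String> {
    let mut out = Vec::new();
    // 1. gen_prologue_frame_setup (may decide that no setup is needed)
    if layout.setup_area_size > 0 {
        out.push("push fp".to_string());
    }
    // 2. gen_clobber_save
    for reg in &layout.clobbered_regs {
        out.push(format!("save r{}", reg));
    }
    out.push("<body>".to_string());
    // 3. gen_clobber_restore
    for reg in layout.clobbered_regs.iter().rev() {
        out.push(format!("restore r{}", reg));
    }
    // 4. gen_epilogue_frame_restore, which also emits the return
    if layout.setup_area_size > 0 {
        out.push("pop fp".to_string());
    }
    out.push("ret".to_string());
    out
}
```

Because the clobber list is computed once in `compute_frame_layout`, the save and restore steps cannot disagree about which registers are involved.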
The main differences to before are:
- frame_setup/restore are now called unconditionally (the target
ABI can look in the FrameLayout to detect the case where no
frame setup is required and skip whatever it thinks appropriate
in that case)
- there is no separate gen_prologue_start; if the target needs
to do anything here, it can now just do it instead in
gen_prologue_frame_setup
- common code no longer attempts to emit a return instruction;
instead the target can do whatever is necessary/optimal in
gen_epilogue_frame_restore
[ In principle we could also just have a single gen_prologue
and gen_epilogue callback - I didn't implement this because
then all the stack checking / probing logic would have to be
moved to target code as well. ]
As to 2., currently targets are required to implement a single
"Ret" return instruction. This is initially used during
register allocation to hold a list of return preg/vreg pairs.
During epilog emission, this is replaced by another copy of
the same "Ret" instruction that now carries various platform
specific data (e.g. authorization modes on aarch64), and is
also overloaded to handle the case where the ABI requires
that a number of bytes are popped during return.
This is a bit unfortunate in that it blows up the size of
the instruction data, and also forces targets (that do not
have a "ret N" instruction like Intel) into duplicated and
possibly sub-optimal implementations of stack adjustment
during low-level emission of the return instruction.
The new approach separates these concerns. Initially, common
code emits a new "Rets" instruction that is completely parallel
to the existing "Args", and is used only during register
allocation holding the preg/vreg pairs. That instruction
-like now- is replaced during epilog emission - but unlike
now the replacement is now completely up to the target, which
can do whatever it needs in gen_epilogue_frame_restore.
This would typically emit some platform-specific low-level
"Ret" instruction instead of the regalloc "Rets". It also
allows non-Intel targets to just create a normal (or even
optimized) stack adjustment sequence before its low-level "Ret".
[ In particular, on riscv64 pop-stack-before-return currently
emits two distinct stack adjustment instructions immediately
after one another. These could now be easily merged, but that's
not yet done in this patch. ]
No functional change intended on any target.
This change is a small refactoring to some of the MacroAssembler functions to
use `Reg` instead of `RegImm` where appropriate (e.g. when the operand is a
destination).
@elliottt pointed this out while working on https://github.com/bytecodealliance/wasmtime/pull/6982
This change also changes the signature of `float_abs` and `float_neg`, which can
be simplified to take a single register.
* Improve lowering of store_imm on x64
Adds a new x64 rule for directly lowering stores of immediates with a MOV instruction.
* Ensure that the MovImmM operand fits in an i32 and add tests.
* Update winch to handle MovImmM change
* winch: Support f32.abs and f64.abs on x64
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Add an implementation of f32.neg and f64.neg
* Enable spec tests for winch with f{32,64}.{neg,abs}
* Enable differential fuzzing for f{32,64}.{neg,abs} for winch
* Comments from code review
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* winch: Add support for `br_table`
This change adds support for the `br_table` instruction, including
several modifications to the existing control flow implementation:
* Improved handling of jumps to loops: Previously, the compiler erroneously
treated the result of loop blocks as the definitive result of the jump. This
change fixes this bug.
* Streamlined result handling and stack pointer balancing: In the past, these
operations were executed in two distinct steps, complicating the process of
ensuring the correct invariants when emitting unconditional jumps. To simplify
this, `CodeGenContext::unconditional_jump` is introduced. This function
guarantees all necessary invariants are met, encapsulating the entire operation
within a single function for easier understanding and maintenance.
* Handling of unreachable state at the end of a function: when reaching the end
of a function in an unreachable state, clear the stack and ensure that the
machine stack pointer is correctly placed according to the expectations of the
outermost block.
In addition to the above refactoring, the main implementation of the
`br_table` instruction involves emitting labels for each target. Within each
label, an unconditional jump is emitted to the frame's label, ensuring correct
stack pointer balancing when the jump is emitted.
While it is possible to optimize this process by avoiding intermediate labels
when balancing isn't required, I've opted to maintain the current
implementation until such optimization becomes necessary.
* chore: Rust fmt
* fuzzing: Add `BrTable` to list of supported instructions
* docs: Improve documentation for `unconditional_jump`
By not activating the `derive` feature on `serde`, the compilation speed
can be improved by a lot. This is because `serde` can then compile in
parallel to `serde_derive`, allowing it to finish compilation possibly
even before `serde_derive`, unblocking all the crates waiting for
`serde` to start compiling much sooner.
As it turns out, the compile time of a project is primarily determined
by the depth of its dependency graph rather than its width. In other
words, a crate's compile times aren't affected
by how many crates it depends on, but rather by the longest chain of
dependencies that it needs to wait on. In many cases `serde` is part of
that long chain, as it is part of a long chain if the `derive` feature
is active:
`proc-macro2` compile build script > `proc-macro2` run build script >
`proc-macro2` > `quote` > `syn` > `serde_derive` > `serde` >
`serde_json` (or any crate that depends on serde)
By decoupling it from `serde_derive`, the chain is shortened and compile
times get much better.
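The depth-over-width argument can be made concrete with a small critical-path calculation. This is a sketch that assumes unlimited build parallelism; the per-crate costs used with it are invented numbers, not measurements, and `finish_time` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Returns the earliest time at which `krate` can finish compiling, which is
/// the cost of the longest dependency chain ending at `krate`.
/// `deps` maps a crate to (its own compile cost, the crates it waits on).
fn finish_time(
    krate: &str,
    deps: &HashMap<&str, (u32, Vec<&str>)>,
    memo: &mut HashMap<String, u32>,
) -> u32 {
    if let Some(&t) = memo.get(krate) {
        return t;
    }
    let (cost, waits_on) = &deps[krate];
    // A crate can only start once its slowest dependency has finished.
    let start = waits_on
        .iter()
        .map(|d| finish_time(d, deps, memo))
        .max()
        .unwrap_or(0);
    let t = start + cost;
    memo.insert(krate.to_string(), t);
    t
}
```

With made-up costs of 3, 1, 6, 4, 5, and 3 units for `proc-macro2`, `quote`, `syn`, `serde_derive`, `serde`, and `serde_json`, the chained setup finishes `serde_json` at 22 units, while removing the `serde` to `serde_derive` edge lets it finish at 8.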
Check this issue for a deeper elaboration:
https://github.com/serde-rs/serde/issues/2584
For `wasmtime` I'm seeing a reduction from 24.75s to 22.45s when
compiling in `release` mode. This is because wasmtime through `gimli`
has a dependency on `indexmap` which can only start compiling when
`serde` is finished, which you want to happen as early as possible so
some of wasmtime's dependencies can start compiling.
To measure the full effect, the dependencies can't by themselves
activate the `derive` feature. I've upstreamed a patch for
`fxprof-processed-profile` which was the only dependency that activated
it for `wasmtime` (not yet published to crates.io). `wasmtime-cli` and
co. may need patches for their dependencies to see a similar
improvement.
* winch: Add support for parametric instructions
This commit introduces support for the drop and select instructions.
Additionally, it refactors the CodeGenContext::drop_last implementation,
enhancing flexibility for callers to determine the handling of elements to be
dropped. This refactoring simplifies scenarios where a Memory entry is at the
top of the stack.
* refactor: Use `cmov` instead of local control flow
* winch: Derive `OperandSize` from the value type
This change is a small refactor to how we've been handling the operand size
parameter passed to some of the `CodeGenContext` operations, namely,
`pop_to_reg` and `move_val_to_reg`.
Given the more precise value tagging introduced in:
https://github.com/bytecodealliance/wasmtime/pull/6860,
it's now possible to derive the operand size from the type associated to a value
stack entry, which:
* Makes the usage of the functions mentioned above less error prone.
* Allows a simplification of the two function definitions mentioned above.
* Results in better instruction selection in some cases.
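A minimal sketch of deriving the operand size from a typed stack entry. The enums reuse names from the text, but their definitions here are simplified assumptions, and `TypedReg` and `move_size` are hypothetical.

```rust
/// Simplified stand-in for the WebAssembly value types handled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WasmType {
    I32,
    I64,
    F32,
    F64,
}

/// Simplified stand-in for the operand size used during emission.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OperandSize {
    S32,
    S64,
}

impl From<WasmType> for OperandSize {
    fn from(ty: WasmType) -> Self {
        match ty {
            WasmType::I32 | WasmType::F32 => OperandSize::S32,
            WasmType::I64 | WasmType::F64 => OperandSize::S64,
        }
    }
}

/// Hypothetical typed value stack entry held in a register.
struct TypedReg {
    reg: u8,
    ty: WasmType,
}

/// The size no longer has to be passed by the caller; it follows from the
/// type recorded on the entry itself.
fn move_size(entry: &TypedReg) -> OperandSize {
    entry.ty.into()
}
```

Since the size comes from the entry, a caller cannot accidentally request a 64-bit move for a 32-bit value.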
* chore: Update filetests
* winch: Initial support for floats
This change introduces the necessary building blocks to support floats in
Winch as well as support for both `f32.const` and `f64.const` instructions.
To achieve support for floats, this change adds several key enhancements to the
compiler:
* Constant pool: A constant pool is implemented, at the Assembler level, using the machinery
exposed by Cranelift's `VCode` and `MachBuffer`. Float immediates are stored
using their bit representation in the value stack, and whenever they are
used at the MacroAssembler level they are added to the constant
pool, from that point on, they are referenced through a `Constant` addressing
mode, which gets translated to a RIP-relative addressing mode during emission.
* More precise value tagging: aside from immediates, from which the type can
be easily inferred, all the other value stack entries (`Memory`, `Reg`, and `Local`) are
modified to explicitly contain a WebAssembly type. This allows for better
instruction selection.
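Storing float immediates by their bit representation can be sketched as follows. The `Val` enum and `pool_bytes` are hypothetical, and the little-endian byte order is an assumption made for the x64 case.

```rust
/// Hypothetical immediate entry on the value stack. Floats are kept as raw
/// bits, which makes the type `Eq` and preserves exact bit patterns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Val {
    I32(i32),
    F32(u32),
    F64(u64),
}

impl Val {
    fn f32(v: f32) -> Self {
        Val::F32(v.to_bits())
    }

    fn f64(v: f64) -> Self {
        Val::F64(v.to_bits())
    }

    /// Bytes that would be placed in the constant pool when the immediate
    /// is finally used (little-endian, assuming an x64 target).
    fn pool_bytes(self) -> Vec<u8> {
        match self {
            Val::I32(v) => v.to_le_bytes().to_vec(),
            Val::F32(bits) => bits.to_le_bytes().to_vec(),
            Val::F64(bits) => bits.to_le_bytes().to_vec(),
        }
    }
}
```

Comparing bits rather than float values keeps `0.0` and `-0.0` distinct and lets a NaN immediate compare equal to itself.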
--
prtest:full
* fix: Account for relative sp position when pushing float regs
This was an oversight of the initial implementation. When pushing float
registers, always return an address that is relative to the current position of
the stack pointer, essentially storing to (%rsp). The previous implementation
accounted for static addresses, which is not correct.
* fix: Introduce `stack_arg_slot_size_for_type`
This correctly calculates the stack argument slot sizes instead of
overallocating `word_bytes` for every argument; for `f32` values we only
need to worry about loading/storing 4 bytes.
* fix: Correctly type the result register.
The previous version wrongly typed the register as a general purpose register.
* refactor: Re-write `add_constants` through `add_constant`
* docs: Replace old comment
* chore: Rust fmt
* refactor: Index regset per register class
This commit implements `std::ops::{Index, IndexMut}` for `RegSet` to index each
of the bitsets by class. This reduces boilerplate and repetition through the
code generation context, register allocator and register set.
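Indexing by register class might look like the sketch below. The `RegSet` here is a simplified, hypothetical version with two `u64` bitsets; the real type likely carries more state.

```rust
use std::ops::{Index, IndexMut};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RegClass {
    Int,
    Float,
}

/// Simplified register set: one bitset per register class.
struct RegSet {
    gpr: u64,
    fpr: u64,
}

impl Index<RegClass> for RegSet {
    type Output = u64;

    fn index(&self, class: RegClass) -> &u64 {
        match class {
            RegClass::Int => &self.gpr,
            RegClass::Float => &self.fpr,
        }
    }
}

impl IndexMut<RegClass> for RegSet {
    fn index_mut(&mut self, class: RegClass) -> &mut u64 {
        match class {
            RegClass::Int => &mut self.gpr,
            RegClass::Float => &mut self.fpr,
        }
    }
}

impl RegSet {
    /// Marks `reg` of the given class as taken; returns whether it was free.
    /// The same code path serves both classes thanks to indexing.
    fn take(&mut self, class: RegClass, reg: u32) -> bool {
        let bit = 1u64 << reg;
        let was_free = (self[class] & bit) != 0;
        self[class] &= !bit;
        was_free
    }
}
```

The `match` on the class lives in one place, so callers write `set[class]` instead of repeating it at every use.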
* refactor: Correctly size callee saved registers
To comply with the expectation of the underlying architecture: for example in
Aarch64, only the low 64 bits of VRegs are callee saved (the D-view) and in the
`fastcall` calling convention it's expected that the callee saves the entire 128
bits of the register xmm6-xmm15.
This change also fixes the stores/loads of callee saved float registers in the
fastcall calling convention, as in the previous implementation only the low 64
bits were saved/restored.
* docs: Add comment regarding typed-based spills
This commit prepares for the introduction of float support to Winch. Initially,
I intended to include this change as part of the original change supporting
floats, but that change is already sizable enough.
This modification simplifies the Assembler and MacroAssembler interfaces, as
well as the interaction and responsibilities between them, by:
* Eliminating the `Operand` abstraction, which didn't offer a substantial
benefit over simply using the MacroAssembler's `RegImm` and `Address`
abstractions as operands where necessary. This approach also reduces the number
of conversions required prior to emission.
* Shifting the instruction dispatch responsibility solely to the MacroAssembler,
rather than having this responsibility shared across both abstractions. This was
always the original intention behind the MacroAssembler. As a result, function
definitions at the Assembler layer become simpler.
This change also introduces richer type information for immediates, which
results in better instruction selection in some cases, and it's also needed to
support floats.
* Use `Offset32` as `i32` in ISLE
This commit updates the x64 and aarch64 backends to use the `i32`
primitive type in ISLE when working with an `Offset32` instead of a
`u32`. This matches the intended representation of `Offset32` as a type
which is signed internally and represents how offsets on instructions
are often negative too.
This does not actually change any end results of compilation and instead
is intended to be "just" an internal refactoring with fewer casts and
more consistent handling of offsets.
* aarch64: Define the `PairAMode` type in ISLE
This commit moves the definition of the `PairAMode` enum into ISLE
instead of its current Rust-defined location. This is in preparation for
the next commit where all AMode calculations will be moved into ISLE.
* aarch64: Fix a copy/paste typo loading vectors
This commit fixes an assertion that can be tripped in the aarch64
backend where a 64-bit load was accidentally flagged as a 128-bit load.
This was found in future work which ended up tripping the assertion a
bit earlier.
* aarch64: Move AMode computation into ISLE
This commit moves the computation of the `AMode` enum for addressing
from Rust into ISLE. This enables deleting a good deal of Rust code in
favor of (hopefully) more readable ISLE code.
This does not mirror the preexisting logic exactly but instead takes a
different approach for generating the `AMode`. Previously the entire
chain of `iadd`s input into an address were unfolded into 32-bit and
64-bit operations and then those were re-combined as possible into an
`AMode` (possibly emitting `add` instructions). Instead now pattern
matching is used to represent this. The net result is that amodes are
emitted slightly differently here and there in a number of updated test
cases.
I've tried to verify in all test cases that the number of instructions
has not increased and the same logical operation is happening. The exact
`AMode` may differ but at least instruction-wise this shouldn't be a
regression. My hope is that if anything needs changing that can be
represented with updates to the rule precedence in ISLE or various other
special cases.
One part I found a little surprising was that the immediate with a
load/store instruction is not actually used much of the time. I naively
thought that the mid-end optimizations would move iadd immediates into
the load/store immediate but that is not the case. This necessitated two
extra ISLE rules to manually peel off immediates and fold them into the
load/store immediate.
* aarch64: Remove `NarrowValueMode`
This is no longer needed after the prior commit
* Remove deny.toml exception for wasm-coredump-builder
This isn't used any more so no need to continue to list this.
* Update Wasmtime's pretty_env_logger dependency
This removes a `deny.toml` exception for that crate, but `openvino-sys`
still depends on `pretty_env_logger 0.4.0` so a new exception is added
for that.
* Update criterion and clap dependencies
This commit started out by updating the `criterion` dependency to remove
an entry in `deny.toml`, but that ended up transitively requiring a
`clap` dependency upgrade from 3.x to 4.x because `criterion` uses
pieces of clap 4.x. Most of this commit is then dedicated to updating
clap 3.x to 4.x which was relatively simple, mostly renaming attributes
here and there.
* Update gimli-related dependencies
I originally wanted to remove the `indexmap` clause in `deny.toml` but
enough dependencies haven't updated from 1.9 to 2.0 that it wasn't
possible. In the meantime though this updates some various dependencies
to bring them to the latest and a few of them now use `indexmap` 2.0.
* Update deps to remove `windows-sys 0.45.0`
This involved updating tokio/mio and then providing new audits for new
crates. The tokio exemption was updated from its old version to the new
version and tokio remains un-audited.
* Update `syn` to 2.x.x
This required a bit of rewriting for the component-macro related bits
but otherwise was pretty straightforward. The `syn` 1.x.x track is still
present in the wasi-crypto tree at this time.
I've additionally added some trusted audits for my own publications of
`wasm-bindgen`.
* Update bitflags to 2.x.x
This updates Wasmtime's dependency on the `bitflags` crate to the 2.x.x
track to keep it up-to-date.
* Update the cap-std family of crates
This bumps them all to the next major version to keep up with updates.
I've additionally added trusted entries for publishes of cap-std crates
from Dan.
There's still lingering references to rustix 0.37.x which will need to
get weeded out over time.
* Update memoffset dependency to latest
Avoids having two versions in our crate graph.
* Fix tests
* Update try_from for wiggle flags
* Fix build on AArch64 Linux
* Enable `event` for rustix on Windows too
* Update wasm-tools dependencies
* Get tests passing after wasm-tools update
Mostly dealing with updates to `wasmparser`'s API.
* Update `cargo vet` for new crates
* Add `equivalent`, `hashbrown`, and `quote` to the list of trusted
authors. We already trust these authors for other crates.
* Pull in some upstream audits for various deps.
* I've audited the `pulldown-cmark` dependency upgrade myself.
This commit adds support for the `local.tee` instruction. This change
also introduces a refactoring to the original implementation of
`local.set` to be able to share most of the code for the implementation
of `local.tee`.