* Enable the tail calling convention by default
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Move conditional default initialization to its own method
* Fix comment about when tail calls get enabled automatically
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This commit removes all our `#[cfg_attr(..., doc(cfg(...)))]`
annotations throughout Wasmtime and `wasmtime-wasi`. These are all
replaced with `feature(doc_auto_cfg)` which automatically infers the
attribute to show rather than requiring us to duplicate it.
Spot-checking the docs, this looks just as readable while being much
easier to maintain over time.
Similar to https://github.com/bytecodealliance/wasmtime/pull/8481 but for struct
types instead of array types.
Note that this is support for only defining these types in Wasm or the host; we
don't support allocating instances of these types yet. That will come in follow
up PRs.
* riscv64: Add minimal support for Zfa Extension
This commit adds support for the Zfa extension and implements the `fminm`/`fmaxm` instructions and lowerings.
The Zfa extension provides additional floating point related instructions not included in either F/D extensions.
* riscv64: Enable Zfa in CI runner
* Migrate the `wasmtime-environ` crate to `no_std`
This commit migrates the `wasmtime-environ` crate to being tagged with
`#![no_std]` by default. Only the `component-model` and `gc` features
can be built without `std`; all other features implicitly
activate the `std` feature as they currently require it one way or
another. CI is updated to build `wasmtime-environ` with these two
features active on a no_std platform.
Additionally, this disables the `std` feature workspace-wide for the
`target-lexicon`, `indexmap`, `object`, and `gimli` dependencies. For
object/gimli all other crates in the workspace now enable the `std`
feature, but for `wasmtime-environ` this activation is omitted.
The `thiserror` dependency was dropped from `wasmtime-environ` and
additionally `hashbrown` was added for explicit usage of maps.
* Always enable `std` for environ for now
prtest:full
* Add some more std features
* wasi-nn: remove Git submodule
To more closely align with the conventions in the `wasmtime-wasi` and
`wasmtime-wasi-http` crates, this change removes the Git submodule that
previously provided the WIT and WITX files for `wasmtime-wasi-nn`. Like
those other crates, the syncing of wasi-nn WIT and WITX files will
happen manually for the time being. This is the first PR towards
upgrading the wasi-nn implementation to match recent spec changes and
better preview2-ABI compatibility.
prtest:full
* ci: auto-vendor the wasi-nn WIT files
* Double the default allowed table elements
This commit doubles the default allowed table elements per table in the
pooling allocator from 10k to 20k. This helps to, by default, run the
module produced in #8504.
* Update docs on defaults
We can't meaningfully audit the other WebAssembly implementations that
we use for differential fuzzing, such as wasmi and especially v8. Let's
acknowledge that the effort to do so is not practical for us, and focus
our vetting efforts on crates that developers and users are more likely
to build.
This reduces our estimated audit backlog by over three million lines,
according to `cargo vet suggest`.
Note that our crates which depend on those engines, such as
wasmtime-fuzzing, are not published to crates.io, so if we fall victim
to a supply chain attack against dependencies of these crates, the folks
who might be impacted are limited.
Although there is value in also auditing code that might be run by
people who clone our git repository, in this case I propose that anyone
who is concerned about the risks of supply chain attacks against their
development systems should be running fuzzers inside a sandbox. After
all, it's a fuzzer: it's specifically designed to try to do anything.
`wasi-nn`'s test program suite is light at the moment but, in order to
expand it, this change factors out some of the common bits that are
being used in the `test-programs` crate. Since all of the tests perform
some kind of image classification, the new `nn` module gains `classify`
and `sort_results` functions to help with this exact case.
prtest:full
* Migrate the wasmtime-types crate to no_std
This commit is where no_std for Wasmtime starts to get a bit
interesting. Specifically the `wasmtime-types` crate is the first crate
that depends on some nontrivial crates that also need to be migrated to
`no_std`. This PR disables the default feature of `wasmparser` by
default and additionally does the same for `serde`. This enables them to
compile in `no_std` contexts by default and default features will be
enabled elsewhere in this repository as necessary.
This also opts to drop the `thiserror` dependency entirely in favor of a
manual `Display` implementation with a cfg'd implementation of `Error`.
As before CI checks are added for `wasmtime-types` with a `no_std`
target itself to ensure the crate and all dependencies all avoid `std`.
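As a rough illustration of the "manual `Display` plus cfg'd `Error`" pattern described above, here is a minimal sketch. The `ExampleError` type and its variants are hypothetical and are not the crate's actual error type; the `std` feature name is assumed to match the Cargo feature mentioned in this message.

```rust
use core::fmt;

/// Hypothetical error type, used only for illustration.
#[derive(Debug)]
enum ExampleError {
    UnsupportedType(u32),
    InvalidModule,
}

// Hand-written replacement for what a `thiserror` derive would generate.
// `core::fmt` is available without `std`.
impl fmt::Display for ExampleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ExampleError::UnsupportedType(idx) => {
                write!(f, "unsupported type index {}", idx)
            }
            ExampleError::InvalidModule => write!(f, "invalid module"),
        }
    }
}

// The `Error` impl only exists when the (assumed) `std` feature is enabled,
// so builds without that feature never name `std`.
#[cfg(feature = "std")]
impl std::error::Error for ExampleError {}
```

With the feature disabled the type is still printable through `Display`; only the `Error` trait integration is gated.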
* Fix adapter build
* Wasmtime: add one-entry call-indirect caching.
In WebAssembly, an indirect call is somewhat slow, because of the
indirection required by CFI (control-flow integrity) sandboxing. In
particular, a "function pointer" in most source languages compiled to
Wasm is represented by an index into a table of funcrefs. The
`call_indirect` instruction then has to do the following steps to invoke
a function pointer:
- Load the funcref table's base and length values from the vmctx.
- Bounds-check the invoked index against the actual table size; trap if
out-of-bounds.
- Spectre mitigation (cmove) on that bounds-check.
- Load the `vmfuncref` from the table given base and index.
- For lazy table init, check if this is a non-initialized funcref
pointer, and initialize the entry.
- Load the signature from the funcref struct and compare it against the
`call_indirect`'s expected signature; trap if wrong.
- Load the actual code pointer for the callee's Wasm-ABI entry point.
- Load the callee vmctx (which may be different for a cross-module
call).
- Put that vmctx in arg 0, our vmctx in arg 1, and invoke the loaded
code pointer with an indirect call instruction.
Compare and contrast to the process involved in invoking a native
function pointer:
- Invoke the code pointer with an indirect call instruction.
This overhead buys us something -- it is part of the SFI sandbox
boundary -- but it is very repetitive and unnecessary work in *most*
cases when indirect function calls are performed repeatedly (such as
within an inner loop).
This PR introduces the idea of *caching*: if we know that the result of
all the above checks won't change, then if we use the same index as "the
last time" (for some definition), we can skip straight to the "invoke
the code pointer" step, with a cached code pointer from that last time.
Concretely, it introduces a two-word struct inlined into the vmctx for
each `call_indirect` instruction in the module (up to a limit):
- The last invoked index;
- The code pointer that index corresponded to.
When compiling the module, we check whether the table could possibly be
mutated at a given index once read: for example by instructions like
`table.set`, or because the whole table is exported and thus writable
from the outside. We also check whether index 0 is a non-null funcref.
If neither of these things is true, then we know we can cache an
index-to-code-pointer mapping, and we know we can use index 0 as a
sentinel for "no cached value".
We then make use of the struct for each indirect call site and generate
code to check if the index matches; if so, call cached pointer; if not,
load the vmfuncref, check the signature, check that the callee vmctx is
the same as caller (intra-module case), and stash the code pointer and
index away (fill the cache), then make the call.
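The fast-path/slow-path logic described above can be modeled in plain Rust. This is only a sketch of the idea, not the generated code: `FuncRef`, `CacheSlot`, `Instance`, `Trap`, and the `slow_paths` counter are hypothetical names, and the explicit `index != 0` test on the fast path is this sketch's own way of honoring the sentinel.

```rust
/// Hypothetical stand-in for a funcref table entry.
#[derive(Clone, Copy)]
struct FuncRef {
    sig: u32,
    vmctx: usize,
    code: fn(i32) -> i32,
}

#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBounds,
    NullFuncRef,
    BadSignature,
}

/// Two-word cache slot, one per call site: last index and its code pointer.
/// Index 0 doubles as the "no cached value" sentinel.
struct CacheSlot {
    index: u32,
    code: Option<fn(i32) -> i32>,
}

struct Instance {
    vmctx: usize,
    table: Vec<Option<FuncRef>>,
    /// Counts slow-path executions, purely for demonstration.
    slow_paths: u32,
}

impl Instance {
    /// Models one call site; `expected_sig` is fixed for a given `slot`.
    fn call_indirect(
        &mut self,
        slot: &mut CacheSlot,
        index: u32,
        expected_sig: u32,
        arg: i32,
    ) -> Result<i32, Trap> {
        // Fast path: same non-sentinel index as last time.
        if index != 0 && index == slot.index {
            if let Some(code) = slot.code {
                return Ok(code(arg));
            }
        }
        // Slow path: bounds check, null check, signature check.
        self.slow_paths += 1;
        let entry = self.table.get(index as usize).copied().ok_or(Trap::OutOfBounds)?;
        let func = entry.ok_or(Trap::NullFuncRef)?;
        if func.sig != expected_sig {
            return Err(Trap::BadSignature);
        }
        // Only fill the cache for the intra-module case.
        if func.vmctx == self.vmctx {
            slot.index = index;
            slot.code = Some(func.code);
        }
        Ok((func.code)(arg))
    }
}

// Sample callees for trying the sketch out.
fn double(x: i32) -> i32 { x * 2 }
fn inc(x: i32) -> i32 { x + 1 }
```

Calling the same index twice through one slot runs the slow path only once. A callee whose vmctx differs from the caller's is still invoked correctly but never cached.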
On an in-development branch of SpiderMonkey-in-Wasm with ICs (using
indirect calls), this is about a 20% speedup; I haven't yet measured on
other benchmarks. It is expected that this might be an
instantiation-time slowdown due to a larger vmctx (but we could use
madvise to zero if needed).
This feature is off by default right now.
* Addressed review feedback.
* Added some more comments.
* Allow unused VMCallIndirectCache struct (defined for parity with other bits but not needed in actual runtime).
* Add a limit to the number of call-indirect cache slots.
* Fix merge conflict: handle ConstOp element offset.
* Review feedback.
These lists of ranges always cover contiguous ranges of an index space,
meaning the start of one range is the same as the end of the previous
range, so we can cut storage in half by only storing one endpoint of
each range.
This in turn means we don't have to keep track of the other endpoint
while building these lists, reducing the state we need to keep while
building vcode and simplifying the various build steps.
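To make the storage saving concrete, here is a minimal sketch of a list of contiguous ranges that stores only each range's end. The `Ranges` name and its methods are illustrative assumptions, not necessarily the in-tree API.

```rust
use std::ops::Range;

/// Contiguous ranges over an index space, stored as end points only.
/// The start of range `i` is the end of range `i - 1` (or 0 for the first).
#[derive(Default)]
struct Ranges {
    ends: Vec<u32>,
}

impl Ranges {
    /// Append a range that begins where the previous one ended.
    fn push_end(&mut self, end: u32) {
        debug_assert!(end >= self.ends.last().copied().unwrap_or(0));
        self.ends.push(end);
    }

    fn len(&self) -> usize {
        self.ends.len()
    }

    /// Reconstruct the full range from the two neighboring end points.
    fn get(&self, i: usize) -> Range<u32> {
        let start = if i == 0 { 0 } else { self.ends[i - 1] };
        start..self.ends[i]
    }
}
```

While building, the only state needed is the current length of the underlying index space, which becomes the next `end`.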
* cranelift/x64: Fix XmmRmREvex pretty-printing
The operand collector had these operands in src1/src2/dst order, but the
pretty-printer fetched the allocations in dst/src1/src2 order instead.
Although our pretty-printer looked like it was printing src1/src2/dst,
because it consumed operands in the wrong order, what it actually
printed was src2/dst/src1.
Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU
objdump agrees.)
In the only filetest covering the vpsraq instruction, our output agreed
with Capstone because register allocation picked the same register for
both src1 and dst, so the two orders were indistinguishable. I've
extended the filetest to force register allocation to pick different
registers.
This format is also used for vpmullq, but we didn't have any compile
filetests covering that instruction, so I've added one with the same
register allocation pattern.
Now our pretty-printer agrees with Capstone on both instructions.
* Fix emit-tests and vpermi2b
This test for vpmullq had what we have now determined is the wrong order
for src1 and src2.
There were no emit-tests for vpsraq, so I added one.
The vpermi2b tests used the wrong form of the Inst enum, judging by the
assertions that are in x64_get_operands (which is not exercised by emit
tests) and the fact that we never use that form for that instruction
anywhere else.
Pretty-printing vpermi2b disagreed with Capstone in the same way as
vpsraq and vpmullq. I've fixed that form to agree with Capstone as well,
aside from the duplicated src1/dst operands, which are required to be
different before register allocation and equal afterward.
* wasmtime: Use ConstExpr for element segment offsets
This shouldn't change any behavior currently, but prepares us for
supporting extended constant expressions.
* Fix clippy::cast_sign_loss lint
* Expose `wasmtime-runtime` as `crate::runtime::vm` internally for the `wasmtime` crate
* Rewrite uses of `wasmtime_runtime` to `crate::runtime::vm`
* Remove dep on `wasmtime-runtime` from `wasmtime-cli`
* Move the `wasmtime-runtime` crate into the `wasmtime::runtime::vm` module
* Update labeler for merged crates
* Fix `publish verify`
prtest:full
This establishes the property that the VCode's various lists of ranges
each fully cover the index range of another list. Previously, the
block_succ_range list covered the first half of block_succs_preds, and
the block_pred_range list covered the second half.
While I was in the area, I replaced the O(n log n) sort in
compute_preds_from_succs with a linear-time counting sort, which uses
less temporary storage and directly computes the ranges we want as a
byproduct.
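A linear-time counting sort of this kind might look roughly like the following. The function name matches the one mentioned above, but the signature and the flat `usize` representation are assumptions made for this sketch.

```rust
/// Given successor edges in flat form (`succ_ends[b]` is the end of block
/// `b`'s slice of `succs`), compute the predecessor lists in the same form.
/// Returns `(pred_ends, preds)`.
fn compute_preds_from_succs(
    succ_ends: &[usize],
    succs: &[usize],
) -> (Vec<usize>, Vec<usize>) {
    let num_blocks = succ_ends.len();

    // Pass 1: count incoming edges per block.
    let mut cursor = vec![0usize; num_blocks];
    for &succ in succs {
        cursor[succ] += 1;
    }

    // Exclusive prefix sum: turn counts into each block's start offset.
    let mut total = 0;
    for slot in cursor.iter_mut() {
        let count = *slot;
        *slot = total;
        total += count;
    }

    // Pass 2: scatter each edge's source block into its target's bucket.
    let mut preds = vec![0usize; succs.len()];
    let mut begin = 0;
    for (block, &end) in succ_ends.iter().enumerate() {
        for &succ in &succs[begin..end] {
            preds[cursor[succ]] = block;
            cursor[succ] += 1;
        }
        begin = end;
    }

    // Each cursor has advanced from its bucket's start to its end, so the
    // range end points fall out as a byproduct.
    (cursor, preds)
}
```

The only temporary storage is one counter per block, and that vector is returned as the list of range ends.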
Now that #8486 landed, allowing us to resolve aliases in machine
instructions, we have ensured that all VReg aliases are resolved by the
time we're done building the VCode. Therefore we only need to keep track
of the aliases map before that.
The VReg allocator is also dropped when we finish building the VCode,
and it makes sense to track aliases there. This lets us maintain an
invariant, that PCC facts are only stored on VRegs which are not
aliased, while only reasoning locally within VRegAllocator.
I've changed the trace-log output to print the VCode immediately before
it's finalized, along with key details in the VRegAllocator. This allows
seeing the instructions before aliases are rewritten, although they're
in reverse order at that point. There's another trace-log message
somewhere else which logs the finalized VCode, so you can see both.
Previously, the initial capacity of the vreg_aliases map was set to ten
times the number of basic blocks in the function. However we can make a
better estimate based on the number of SSA values in the function, and
use that to preallocate storage for other things in VRegAllocator too.
Keeping the aliases outside the VCode fixes previous borrow-checker
challenges, which is a nice bonus.
* move fx hash to workspace level dep
* change internal fxhash to use fxhash crate
* remove unneeded HashSet import
* change fxhash crate to rustc hash
* undo migration to rustc hash
* manually implement hash function from fxhash
* change to rustc hash
Instead of
> Performing build step for
'wasmtime-crate''WASMTIME_CARGO_BINARY-NOTFOUND' is not recognized as an
internal or external command, operable program or batch file.
this will now instead output
> "cargo" was not found. Ensure "cargo" is in PATH. Aborting...
This paves the way for more implementations of this OperandVisitor trait
which can do different things with the operands.
Of particular note, this commit introduces a second implementation which
is used only in the s390x backend and only to implement a debug
assertion. Previously, s390x used an OperandCollector::no_reuse_def
method to implement this assertion, but I didn't want to require that
all implementors of the new trait have to provide that method, so this
captures the same check but keeps it local to where it's needed.
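The shape of this design can be sketched as a visitor trait with two implementations: one that collects operands and one that only performs a check. Everything below (`Kind`, `Collector`, `ReuseCounter`, the toy `Inst`) is hypothetical and much simpler than the real trait.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Kind {
    Use,
    Def,
    ReuseDef,
}

/// Simplified stand-in for the operand visitor trait.
trait OperandVisitor {
    fn visit(&mut self, reg: &mut u32, kind: Kind);
}

/// First implementation: record every operand in visit order.
struct Collector {
    operands: Vec<(u32, Kind)>,
}

impl OperandVisitor for Collector {
    fn visit(&mut self, reg: &mut u32, kind: Kind) {
        self.operands.push((*reg, kind));
    }
}

/// Second implementation: only counts reuse-defs, for a local assertion.
struct ReuseCounter {
    reuse_defs: usize,
}

impl OperandVisitor for ReuseCounter {
    fn visit(&mut self, _reg: &mut u32, kind: Kind) {
        if kind == Kind::ReuseDef {
            self.reuse_defs += 1;
        }
    }
}

/// Toy instruction set.
enum Inst {
    Add { dst: u32, lhs: u32, rhs: u32 },
    Neg { dst: u32, src: u32 },
}

impl Inst {
    /// One operand walk serves every visitor implementation.
    fn visit_operands(&mut self, v: &mut impl OperandVisitor) {
        match self {
            Inst::Add { dst, lhs, rhs } => {
                v.visit(lhs, Kind::Use);
                v.visit(rhs, Kind::Use);
                v.visit(dst, Kind::Def);
            }
            Inst::Neg { dst, src } => {
                v.visit(src, Kind::Use);
                v.visit(dst, Kind::ReuseDef);
            }
        }
    }
}

/// The check lives next to the code that needs it, not on the trait.
fn has_no_reuse_def(inst: &mut Inst) -> bool {
    let mut counter = ReuseCounter { reuse_defs: 0 };
    inst.visit_operands(&mut counter);
    counter.reuse_defs == 0
}
```

A backend that needs the assertion would call something like `debug_assert!(has_no_reuse_def(&mut inst))`, and other implementors of the trait carry no extra method.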
The operand collector and the instruction emitter for Inst::Mov both
placed the `rm` register before `rd`, so the emitted code was correct,
but the pretty-printer used the opposite order and so printed the
operands backwards. Note that the VCode disassembly disagreed with
Capstone's disassembly of the emitted machine code.
We know the type of each VReg at the moment when we allocate it, so we
never need to set the type again. That also means we don't need to
deduplicate reftyped VRegs as we go, although it's still possible to
have duplicates after aliases are resolved.
This commit adds support for defining array types from Wasm or the host, and
managing them inside the engine's types registry. It does not introduce support
for allocating or manipulating array values. That functionality will come in
future pull requests.
Now all registers passed to the operand collector are mutably borrowed
directly out of their original locations in the Inst, so it is possible
to update them in place.
As an initial demonstration of the utility of this change, the results
of the VReg renamer are applied directly to the instructions during
operand collection, and then all VReg aliases are cleared after operand
collection.
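As a small model of applying renames in place, the sketch below walks each instruction's registers through mutable borrows, rewrites them through an alias map, and then clears the map. The `Inst` layout, `for_each_reg`, and `resolve` are hypothetical, and the alias map is assumed to be acyclic.

```rust
use std::collections::HashMap;

/// Toy instruction with registers stored inline.
struct Inst {
    dst: u32,
    srcs: [u32; 2],
}

impl Inst {
    /// Hand out each register as `&mut` straight from its home in the Inst.
    fn for_each_reg(&mut self, mut f: impl FnMut(&mut u32)) {
        for src in &mut self.srcs {
            f(src);
        }
        f(&mut self.dst);
    }
}

/// Follow alias chains to the final register (assumes no cycles).
fn resolve(aliases: &HashMap<u32, u32>, mut reg: u32) -> u32 {
    while let Some(&target) = aliases.get(&reg) {
        reg = target;
    }
    reg
}

/// Rewrite every register in place, then drop the aliases: once the
/// instructions hold the resolved registers, the map is no longer needed.
fn rename_in_place(insts: &mut [Inst], aliases: &mut HashMap<u32, u32>) {
    for inst in insts.iter_mut() {
        inst.for_each_reg(|reg| *reg = resolve(aliases, *reg));
    }
    aliases.clear();
}
```

Because the callback receives `&mut u32` rather than a copy, no second pass is needed to write the results back into the instructions.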
Most of this commit consists of deleting noise from the many
`get_operands` implementations in all the backends: most ampersands and
asterisks, and all uses of the `ref` keyword.
* Add an example for wasi-nn WinML backend.
This example is similar to classification-component-onnx, but it's not a wasm
component, and it's for the WinML backend. It also includes step-by-step
instructions for running this example.
* Touch up the documentation for this example
This change removes some duplicated information and tweaks some of the
wording.
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Gate type-builder types from `wasmtime-environ` on `compile`
This commit gates the `*Builder` types related to building sets of types
in the `wasmtime-environ` crate on the `compile` feature. This helps
bring in less code when the feature is disabled and helps exclude some
dependencies for the upcoming `no_std` migration as well.
This commit doesn't change any behavior; it just moves code around.
* Remove no-longer-needed import
prtest:full
Currently the CMakeLists.txt is designed to only be consumed by a parent
project, which limits its flexibility. Adding a `project()` function
allows it to also be built as its own project.
Also set `USES_TERMINAL_BUILD` to `TRUE` in `ExternalProject_Add()`,
which allows getting the installation progress output from `cargo
build`.
This removes about a million lines from our estimated audit backlog
according to `cargo vet suggest`.
If I understand the Criterion documentation correctly, I believe this
means that generating HTML reports from Criterion benchmarks now
requires having gnuplot installed, because it can't fall back to using
the pure-Rust "plotters" crate.
The implementation of `PRegSet::from(self.vcode.machine_env())` does a
surprising amount of work: it lazily initializes a `OnceLock` in the
backend, then loops over six vectors of registers and adds them all to
the `PRegSet`.
We could likely implement that better, but in the meantime at least we
can avoid repeating that work for every single machine instruction. The
`PRegSet` itself only takes a few words to store so it's cheap to just
keep it around.
I discovered this because when the call to `self.vcode.machine_env()` is
in the middle of the loop, that prevents holding a mutable borrow on
parts of `self.vcode`, which I want to be able to do in another PR.
Looks like GitHub is changing `macos-latest` to arm64, so change the test
builder that tests x64 macOS to `macos-13`, which is the last builder that
wasn't arm64. Additionally, drop `macos-14` from the C API tests since
testing `macos-latest` should be sufficient.
When lower_branch_blockparam_args is called, the instructions which
define the values used as blockparam args haven't been lowered yet, so
we haven't set any aliases referring to them yet, so there's no point
checking.
Note that block-param argument aliases are currently resolved after
operand collection, so this work does happen eventually.
Also, this method doesn't add any instructions to self.ir_insts, so
there's no need to call finish_ir_inst.
With this change, VReg alias resolution is purely local to vcode.rs.
* Start migrating some Wasmtime crates to no_std
This commit is the first in what will be multiple PRs to migrate
Wasmtime to being compatible with `#![no_std]`. This work is outlined
in #8341 and the rough plan I have in mind is to go on a crate-by-crate
basis and use CI as a "ratchet" to ensure that `no_std` compat is
preserved. In that sense this PR is a bit of a template for future PRs.
This PR migrates a few small crates to `no_std`, basically those that
need no changes beyond simply adding the attribute. The nontrivial parts
introduced in this PR are:
* CI is introduced to verify that a subset of crates can indeed be
built on a `no_std` target. The target selected is
`x86_64-unknown-none`, which is known to not have `std`, so any attempt
to use `std` results in a build error.
* The `anyhow` crate, which `wasmtime-jit-icache-coherence` now depends
on, has its `std` feature disabled by default in Wasmtime's workspace.
This means that some crates which require `std` now need to explicitly
enable the feature, but it means that by-default its usage is
appropriate for `no_std`.
The first point should provide CI checks that compatibility with
`no_std` indeed works, at least from an "it compiles" perspective. Note
that it's not sufficient to test with a target like
`x86_64-unknown-linux-gnu` because `extern crate std` will work on that
target, even when `#![no_std]` is active.
The second point however is likely to increase maintenance burden
in Wasmtime unfortunately. Namely we'll inevitably, either here or in
the future, forget to turn on some feature for some crate that's not
covered in CI checks. While I've tried to do my best here in covering it
there's no guarantee that everything will work and the combinatorial
explosion of what could be checked in CI can't all be added to CI.
Instead we'll have to rely on bug fixes, users, and perhaps point
releases to add more use cases to CI over time as we see fit.
* Add another std feature
* Another std feature
* Enable anyhow/std for another crate
* Activate `std` in more crates
* Fix miri build
* Fix compile on riscv64
prtest:full
* Fix min-platform example build
* Fix icache-coherence again
We have a perfectly good data structure for holding sets of physical
registers. We don't need a hash-set to determine whether we already
added a register to an array when we can just add it to a PRegSet and
then extract a suitable array out of that.
This change also makes the computed array sorted, which isn't
particularly important since the backends all modify it and then sort it
again, but is generally a nice property to have.
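The idea of using a register bitset for both deduplication and ordering can be sketched as follows. `RegSet` here is a hypothetical 128-bit stand-in written for this example, not the real `PRegSet` type.

```rust
/// Minimal bitset over register numbers 0..128.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
struct RegSet {
    bits: u128,
}

impl RegSet {
    /// Adding the same register twice is a no-op, so no hash-set is needed.
    fn add(&mut self, reg: u8) {
        assert!(reg < 128);
        self.bits |= 1u128 << reg;
    }

    fn contains(&self, reg: u8) -> bool {
        reg < 128 && (self.bits >> reg) & 1 == 1
    }

    /// Extract the members; walking bits low to high yields a sorted array.
    fn to_sorted_vec(&self) -> Vec<u8> {
        let mut bits = self.bits;
        let mut out = Vec::with_capacity(bits.count_ones() as usize);
        while bits != 0 {
            out.push(bits.trailing_zeros() as u8);
            bits &= bits - 1; // clear the lowest set bit
        }
        out
    }
}
```

Inserting registers in any order, with repeats, produces a deduplicated array in ascending order without a separate sort.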