* riscv64: Specify rounding modes in instructions
This commit updates how floating-point instructions specify their float
rounding mode (FRM). Previously instructions stored `Option<FRM>` and
this would mostly be `None`. All floating-point instructions in RISC-V
have a 3-bit `rm` field, and most encode the FRM into this field but
some have a required encoding of this field. For example `fsgnj.s` uses
the `rm` field to differentiate between `fsgnj`, `fsgnjx`, and `fsgnjn`.
Instructions like `fadd` however use this field for a rounding mode.
All FPU instructions now store `FRM` directly. Instruction helpers like
`fadd` require this to be specified explicitly. Instruction helpers
for `fsgnj` do not take this as an argument and hardcode the field
as necessary. This means that all lowerings of floating point
instructions, where relevant, now specify a rounding mode.
Previously the default rounding mode was to use the `fcsr` register,
meaning that the rounding mode would be determined dynamically at
runtime depending on the status of this register. Cranelift semantics,
however, are derivative of WebAssembly semantics which specify
round-to-nearest ties-to-even. This PR additionally fixes this
discrepancy by using `FRM::RNE` in all existing instructions instead of
`FRM::Fcsr`.
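A minimal sketch of the encoding described above, not the backend's
actual code. The `Frm` enum and the `encode_*` helpers are hypothetical
names; the bit values follow the RISC-V `rm` field encoding, where
`0b111` selects the dynamic mode read from `fcsr`.

```rust
/// Rounding modes as encoded in the 3-bit `rm` field (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Frm {
    Rne = 0b000,  // round to nearest, ties to even
    Rtz = 0b001,  // round towards zero
    Rdn = 0b010,  // round down
    Rup = 0b011,  // round up
    Rmm = 0b100,  // round to nearest, ties to max magnitude
    Fcsr = 0b111, // dynamic: use the mode stored in `fcsr`
}

/// Major opcode shared by the floating-point computational instructions.
const OP_FP: u32 = 0b101_0011;

/// Packs the fields of an R-type floating-point instruction.
fn encode_r(funct7: u32, rs2: u32, rs1: u32, rm: u32, rd: u32) -> u32 {
    (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (rm << 12) | (rd << 7) | OP_FP
}

/// `fadd.s` uses `rm` as a rounding mode, so the caller must supply one.
pub fn encode_fadd_s(rd: u32, rs1: u32, rs2: u32, frm: Frm) -> u32 {
    encode_r(0b000_0000, rs2, rs1, frm as u32, rd)
}

/// `fsgnj.s` uses `rm` as an opcode selector, so the helper hardcodes it.
pub fn encode_fsgnj_s(rd: u32, rs1: u32, rs2: u32) -> u32 {
    encode_r(0b001_0000, rs2, rs1, 0b000, rd)
}
```

Passing `Frm::Rne` instead of `Frm::Fcsr` only changes bits 14:12 of the
encoded word, which is the whole difference between a static and a
dynamic rounding mode.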
* riscv64: Refactor float-to-int conversions
This commit removes the `FcvtToInt` macro-instruction in the riscv64
backend in favor of decomposing it into individual operations for
`fcvt_to_{s,u}int*` instructions. This additionally provides a slightly
different lowering for the `*_sat` operations which doesn't use
branches. The non-saturating operations continue to have a number of
branches and their code has changed slightly due to how immediates are
loaded. Overall everything is in ISLE now instead of being split up.
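One way a branch-free saturating conversion can work, sketched under
assumptions rather than copied from the new lowering: the RISC-V
`fcvt.w.s` instruction already clamps out-of-range inputs but produces
the maximum integer for NaN, so the NaN case can be masked to zero with
the result of a self-comparison. `hw_fcvt_w_s` is a hypothetical
stand-in that models the hardware instruction.

```rust
/// Models `fcvt.w.s` with round-towards-zero: out-of-range inputs clamp
/// and NaN converts to `i32::MAX`. (Stand-in for the hardware behavior.)
pub fn hw_fcvt_w_s(x: f32) -> i32 {
    if x.is_nan() {
        i32::MAX
    } else {
        // Rust's `as` cast truncates towards zero and saturates.
        x as i32
    }
}

/// Saturating float-to-int where NaN becomes 0, without branching on
/// the input in the "lowered" sequence itself.
pub fn fcvt_to_sint_sat(x: f32) -> i32 {
    let converted = hw_fcvt_w_s(x);
    // Like `feq x, x`: 1 when `x` is an ordinary number, 0 when NaN.
    let not_nan = (x == x) as i32;
    // Negating turns 1 into an all-ones mask and leaves 0 as 0.
    converted & not_nan.wrapping_neg()
}
```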
* riscv64: Clean up some dead code in the backend
Don't put `#![allow(dead_code)]` at the root, instead place it on some
smaller items.
* Fix emission tests
* Add regression tests and bless output
Closes #5992
Closes #5993
* Enable i8/i16 saturating float-to-int in fuzzgen
* Better `fcvt_*_bound` implementations
* Fix typo in match orderings
* Fix tests on x64
Where float-to-int isn't implemented for i8/i16
With a custom standard library disabling all features means disabling
some support for feature-detection macros of the native platform. This
meant that `wasmtime compile` output locally wasn't runnable in
`wasmtime-min run` because it couldn't correctly detect that features
were in fact available.
In an effort to simplify the many fuel related APIs, simplify the
interface here to a single counter with get and set methods.
Additionally the async yield is reduced to an interval of the total fuel
instead of injecting fuel, so it's easy to still reason about how much
fuel is left even with yielding turned on.
Internally this works by keeping two counters - one the VM uses to
increment towards 0 for fuel, the other to track how much is in
"reserve". Then when we're out of gas, we pull from the reserve to
refuel and continue. We use the reserve in two cases: one for overflow
of the fuel (which is an i64 and the API expresses fuel as u64) and the
other for async yielding, where the yield interval acts as a cap on
how much we can refuel with.
This also means that `get_fuel` can return the full range of `u64`;
before this change it could only return up to `i64::MAX`. This is
important because this PR is removing the functionality to track fuel
consumption, and this makes the API less error prone for embedders to
track consumption themselves.
Note that the VM counter, which is stored as an `i64`, can be
positive if an instruction "costs" multiple units of fuel when the fuel
runs out.
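The two-counter scheme can be modeled roughly as follows. This is a
sketch under assumptions, not Wasmtime's internal code: `FuelModel` and
its method names are hypothetical, and the yield interval is assumed to
be nonzero when present.

```rust
/// Hypothetical model of the two-counter scheme.
pub struct FuelModel {
    /// Counter the VM increments towards 0; negative while fuel remains.
    vm_counter: i64,
    /// Fuel held back because it did not fit in an `i64` or because of
    /// the yield interval.
    reserve: u64,
    /// Optional cap on how much fuel is handed to the VM at once.
    yield_interval: Option<u64>,
}

impl FuelModel {
    pub fn new(fuel: u64, yield_interval: Option<u64>) -> Self {
        let mut model = FuelModel { vm_counter: 0, reserve: 0, yield_interval };
        model.set_fuel(fuel);
        model
    }

    /// Splits `fuel` between the VM counter and the reserve.
    pub fn set_fuel(&mut self, fuel: u64) {
        let cap = self.yield_interval.unwrap_or(u64::MAX);
        let injected = fuel.min(cap).min(i64::MAX as u64);
        self.reserve = fuel - injected;
        self.vm_counter = -(injected as i64);
    }

    /// Total fuel left, covering the full `u64` range.
    pub fn get_fuel(&self) -> u64 {
        if self.vm_counter <= 0 {
            self.reserve.saturating_add(self.vm_counter.unsigned_abs())
        } else {
            // The last instruction overshot; charge that to the reserve.
            self.reserve.saturating_sub(self.vm_counter as u64)
        }
    }

    /// What the VM does when an operation costs `cost` units.
    pub fn consume(&mut self, cost: i64) {
        self.vm_counter = self.vm_counter.saturating_add(cost);
    }

    /// Called when the VM counter reaches 0 or above. Returns `false`
    /// when no fuel is left at all.
    pub fn refuel(&mut self) -> bool {
        let remaining = self.get_fuel();
        if remaining == 0 {
            return false;
        }
        self.set_fuel(remaining);
        true
    }
}
```

With a yield interval, each `refuel` call is the natural point at which
an embedding would yield before continuing.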
prtest:full
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This was mistakenly removed during #7300 with some local testing I was
doing. This will eventually be required to get a minimal build of the C
API but for now I didn't intend on deleting this so I wanted to rectify
my mistake.
While not a large amount of binary size, if the purpose of the
`--no-default-features` build is to showcase "minimal Wasmtime" then we
may as well try to make `clap` as small as possible.
* riscv64: Extend distance trampolines can jump
Use a PIC-friendly set of instructions to allow the destination of the
trampoline to be more than 4k away from the tail call site of the
trampoline itself.
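The commit message does not name the instructions used. As an assumption
for illustration only, a common PIC-friendly pattern on RISC-V is an
`auipc`/`jalr` pair, which reaches roughly 2 GiB either side of the
program counter by splitting the offset into an upper 20-bit part and a
signed 12-bit lower part. `split_pcrel` is a hypothetical helper showing
that split.

```rust
/// Splits a PC-relative byte offset into the immediate for `auipc`
/// (upper part) and the signed 12-bit immediate for `jalr` (lower part).
pub fn split_pcrel(offset: i32) -> (i32, i32) {
    let offset = offset as i64;
    // Adding 0x800 rounds to the nearest multiple of 4096, so the
    // remainder fits in the signed range -2048..=2047.
    let hi20 = (offset + 0x800) >> 12;
    let lo12 = offset - (hi20 << 12);
    (hi20 as i32, lo12 as i32)
}
```

The rounding step matters because the lower immediate is sign-extended:
an offset of `0x1800` becomes upper `2` and lower `-0x800`, not upper
`1` and lower `0x800`.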
* Build "min" artifacts on CI
This commit updates the binary artifacts produced by CI to include "min"
builds where all default features are disabled. Additionally all the
stops are pulled out in terms of build flags, nightly versions, etc, to get
a build that is as small as possible without actual source code changes.
This effectively codifies the instructions in #7282 into an easily
downloadable artifact.
No new tarballs are created for github releases but instead tarballs
that previously had a `wasmtime` executable for example now have a
`wasmtime-min` executable. Furthermore the C API which previously had
`libwasmtime.so` for example now has `libwasmtime-min.so`. The intention
is that the minimum-size artifacts are handy for determining a rough
size of Wasmtime but they're not so important that it seems worthwhile
to dedicate entire release entries for.
CI is refactored to support these minimum builds with separate builders.
This means that a single tarball produced as a final result is actually
two separate tarballs merged together, one for the normal build we do
today plus a new "min" tarball produced on the new "min" builders.
Various scripts and CI organization have been adjusted accordingly.
While here I went ahead and enabled `panic=abort` and debuginfo
stripping in our current release artifacts. While this doesn't affect a
whole lot it's less to upload to GitHub Actions all the time.
* Fix Windows unzip
This reworks the way that remat and LICM interact during aegraph
elaboration. In principle, both happen during the same single-pass "code
placement" algorithm: we decide where to place pure instructions (those
that are eligible for movement), and remat pushes them one way while
LICM pushes them the other.
The interaction is a little more subtle than simple heuristic priority,
though -- it's really a decision ordering issue. A remat'd value wants to sink
as deep into the loop nest as it can (to the use's block), but we don't
know *where* the uses go until we process them (and make LICM-related
choices), and we process uses after defs during elaboration. Or more
precisely, we have some work at the use before recursively processing
the def, and some work after the recursion returns; and the LICM
decision happens after recursion returns, because LICM wants to know
where the defs are to know how high we can hoist. (The recursion is
itself unrolled into a state machine on an explicit stack so that's a
little hard to see but that's what is happening in principle.)
The solution here is to make remat a separate just-in-time thing, once
we have arg values. Just before we plug the final arg values into the
elaborated instruction, we ask: is this a remat'd value, and if so, do
we have a copy of the computation in this block yet. If not, we make
one. This has to happen in two places (the main elab loop and the
toplevel driver from the skeleton).
The one downside of this solution is that it doesn't handle *recursive*
rematerialization by default. This means that if we, for example, decide
to remat single-constant-arg adds (as we actually do in our current
rules), we won't then also recursively remat the constant arg to those
adds. This can be seen in the `licm.clif` test case. This doesn't seem
to be a dealbreaker to me because most such cases will be able to fold
the constants anyway (they happen mostly because of pointer
pre-computations: a loop over structs in Wasm computes heap_base + p +
offset, and naive LICM pulls a `heap_base + offset` out of the loop for
every struct field accessed in the loop, with horrible register pressure
resulting; that's why we have that remat rule. Most such offsets are
pretty small.).
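The just-in-time step can be pictured with a small model. This is a
sketch only: `RematModel`, `arg_for_use`, and the integer `Block` and
`Value` ids are hypothetical and much simpler than the real elaborator
state.

```rust
use std::collections::{HashMap, HashSet};

pub type Block = u32;
pub type Value = u32;

/// Hypothetical bookkeeping for just-in-time rematerialization.
pub struct RematModel {
    /// Values that the optimization rules marked as rematerializable.
    remat_values: HashSet<Value>,
    /// Copies already created, keyed by (block, original value).
    copies: HashMap<(Block, Value), Value>,
    /// Next unused value id, standing in for cloning an instruction.
    next_value: Value,
}

impl RematModel {
    pub fn new(remat_values: &[Value], first_fresh_value: Value) -> Self {
        RematModel {
            remat_values: remat_values.iter().copied().collect(),
            copies: HashMap::new(),
            next_value: first_fresh_value,
        }
    }

    /// Called just before plugging `arg` into an instruction placed in
    /// `use_block`. Returns the value the instruction should use.
    pub fn arg_for_use(&mut self, use_block: Block, arg: Value) -> Value {
        if !self.remat_values.contains(&arg) {
            return arg;
        }
        if let Some(&copy) = self.copies.get(&(use_block, arg)) {
            return copy;
        }
        // No copy in this block yet: make one. The copy's own args are
        // not revisited, so the remat is not recursive.
        let copy = self.next_value;
        self.next_value += 1;
        self.copies.insert((use_block, arg), copy);
        copy
    }
}
```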
Fixes #7283.
@jameysharp and I noticed a couple of refactoring opportunities while
reading through the elaboration pass:
* The elaboration loop doesn't need to match on the top of the stack as
a reference, because each case pops it immediately. Instead we can pop
and match on the popped value.
* Computing the cost of a `Result` value that's not in the DFG was using
the block that contains the instruction to determine the level, but
since the instruction is already known to not be in the DFG, this
would default to `LoopLevel::root()` unconditionally. This also meant
that the `Cost::at_level` function turned into an identity function on
the cost given, making it unnecessary.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Update WASI versions to `0.2.0-rc-2023-11-05`
This commit updates the version numbers on `main` to no longer clash
with the 14.0.0 release after #7299. The version number is chosen as the
branch point for the 15.0.0 release of Wasmtime, at which point we'll
update the versions again.
* Update another version
This commit changes the `--dir` argument on the `wasmtime` CLI to be
`HOST::GUEST` rather than `GUEST::HOST`. This matches Docker for example
and is a little more consistent with only `--dir path` where the first
argument is always treated as a host directory.
In terms of breaking-ness the movement from `--mapdir` to `--dir` hasn't
been released with Wasmtime 14 yet so my hope is that this can land on
both `main` and Wasmtime 14.0.0 before it's released to avoid any
breakage other than existing scripts migrating from `--mapdir` to
`--dir`.
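A sketch of how such an argument can be split, not the CLI's actual
parser. `parse_dir` is a hypothetical helper, and exposing the directory
under the same path in the guest when no `::` is present is an
assumption.

```rust
/// Splits a `--dir` value into `(host, guest)` paths.
pub fn parse_dir(arg: &str) -> (&str, &str) {
    match arg.split_once("::") {
        // `HOST::GUEST`: the host directory always comes first.
        Some((host, guest)) => (host, guest),
        // Plain `path`: assumed to use the same path on both sides.
        None => (arg, arg),
    }
}
```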
* PCC: draw the rest of the owl: fully-working PCC on hello-world Wasm on aarch64
This needed a bit more inference / magic than I was hoping for at first,
specifically around constants and adds. Some instructions can now
generate facts on their output registers, even if not stated. This
breaks away from the "breadcrumbs" idea, but seems to be the most
practical solution to a general problem that we have mini-lowering steps
in various places without careful preservation of PCC facts. Two
particular aspects:
- Constants: amodes on aarch64 can decompose into new
constant-generation instructions, and we need precise ranges on those
to properly check them. To avoid making the ISLE rules nightmarish,
it's best to reuse the existing semantics definitions of the Add* ALU
insts, and add a few rules for MovK/MovZ/MovN.
- Adds: similarly, amodes decompose into de-novo add instructions with
no facts. To handle this, there's now a notion of "propagating" facts
that cause an instruction with a propagating fact on the input to
generate a fact on the output.
Together, these heuristics mean that we'll eagerly generate a fact for
`mem(mt0, 0, 0) + 8 -> mem(mt0, 8, 8)`, but we won't, for example,
generate ranges on every single integer operation.
With these changes and a few other misc fixes, this PR can now check a
nontrivial "hello world" Wasm on aarch64 down to the machine-code level:
```
$ target/release/wasmtime compile -C enable-pcc=y ../wasm-tests/helloworld-rs.wasm
```
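The "propagating" behavior for adds can be sketched as below. The `Fact`
enum and `add_const` are simplified, hypothetical stand-ins for the real
PCC types: a memory-pointer fact on the input yields a shifted fact on
the output, while a plain range does not produce anything.

```rust
/// Simplified, hypothetical fact representation.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Fact {
    /// An integer known to lie in `min..=max`.
    Range { min: u64, max: u64 },
    /// A pointer into memory type `ty` at an offset in
    /// `min_offset..=max_offset`.
    Mem { ty: u32, min_offset: i64, max_offset: i64 },
}

/// Fact for `input + k`, generated only when the input fact propagates.
pub fn add_const(input: Option<&Fact>, k: i64) -> Option<Fact> {
    match input? {
        // mem(ty, lo, hi) + k -> mem(ty, lo + k, hi + k)
        Fact::Mem { ty, min_offset, max_offset } => Some(Fact::Mem {
            ty: *ty,
            min_offset: min_offset.checked_add(k)?,
            max_offset: max_offset.checked_add(k)?,
        }),
        // Ranges are not propagated eagerly in this sketch.
        Fact::Range { .. } => None,
    }
}
```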
* Review feedback.
* Move `wasmtime explore` behind a Cargo feature
Enable this Cargo feature by default, but enable building the CLI
without the `explore` subcommand.
* Move the `wast` subcommand behind a Cargo feature
* Move support for `wat` behind a CLI feature
This was already conditional in crates such as `wasmtime` and this makes
it an optional dependency of the CLI as well.
* Move CLI cache support behind a Cargo feature
Additionally refactor `wasmtime-cli-flags` to not unconditionally pull
in caching support by removing its `default` feature and appropriately
enabling it from the CLI.
* Move `rayon` behind an optional feature
* Move `http-body-util` dependency behind `serve` feature
* Add a Cargo feature for compiling out log statements
This sets the static features of `log` and `tracing` to statically
remove all log statements from the binary to cut down on binary size.
* Move logging support behind a Cargo feature
Enables statically removing logging support in addition to the previous
compiling out log statements themselves.
* Move demangling support behind a Cargo feature
* Enable building the CLI without cranelift
Compile out the `compile` subcommand for example.
* Gate all profiling support behind one feature flag
This commit moves vtune/jitdump support behind a single `profiling`
feature flag that additionally includes the guest profiler dependencies
now too.
* Move support for core dumps behind a feature flag
* Move addr2line behind a feature
* Fix rebase
* Document cargo features and a minimal build
* Tidy up the source a bit
* Rename compile-out-logging
* Document disabling logging
* Note the host architecture as well
* Fix tests
* Fix tests
* Mention debuginfo stripping
* Fix CI configuration for checking features
* Fix book tests
* Update lock file after rebase
* Enable coredump feature by default
While valid in WIT I keep finding it jarring to have different
indentation in `wasi:sockets`. Additionally all other WASI WITs are
using four spaces or two, so make it a bit more consistent.
* Implement support for `thread` in `*.wast` tests
This commit implements support for `thread` and `wait` in `*.wast` files
and imports the upstream spec test suite from the `threads` proposal.
This hopefully also makes it a bit easier to write threaded
tests in the future if necessary too.
* Fix compile of fuzzing
This adds a new `send_request` method to `WasiHttpView`, allowing embedders to
override the default implementation with their own if they desire. The default
implementation behaves exactly as before.
I've also added a few new `wasi-http` tests: one to test the above, and two
others to test streaming and concurrency. These tests are ports of the
`test_wasi_http_echo` and `test_wasi_http_hash_all` tests in the
[Spin](https://github.com/fermyon/spin) integration test suite. The component
they instantiate is likewise ported from the Spin
`wasi-http-rust-streaming-outgoing-body` component.
Fixes #7259
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Put versions in all WASI WIT files
This commit starts exercising the versioning feature of WIT by ensuring
that all WASI descriptions have a version associated with them. The
version chosen is 0.2.0 which reflects the upcoming "preview 2" release
where in theory 0.1.0 was claimed by preview1. This is intended to stay
as 0.2.0 for now and we'll determine how best to update these numbers in
the future once preview2 is released.
Closes #7171
* Allow omitting versions in `with` keys
As a convenience for now this enables omitting the version of an
interface from a `with` key. This has a risk of not working well if two
packages are present and one has a version and one doesn't, but that's
left as a PR to fix in the future as the benefit of avoiding repetition
seems good for now.
* Allow omitting versions in trappable_error_types
* Use 0.2.0-rc-2023-10-18 as a version number
* More test fixes
* Fix another test
* Remove usage of `BTreeMap` for compiler flags
No need for a data structure here really, a simple list with static
strings works alright.
* Fix winch compile and a warning
* Fix test compile
* c-api: Remove reexport of wasmtime crate
This is a follow-up to #6765 to remove this reexport since it was
originally added to use both the C API and the `wasmtime` crate in the
same downstream crate, but this should be possible through Cargo with:
    [dependencies]
    wasmtime = "13"
    wasmtime-c-api = { version = "13", package = "wasmtime-c-api" }
and that way `wasmtime::*` is available as well as `wasmtime_c_api::*`.
* Add `pub use wasmtime;`
* Enable threads, multi-memory, and relaxed-simd by default
This commit enables by default three proposals that were advanced to
stage 4 in the proposal process at last week's in-person CG meeting.
These proposals are:
* `threads` - atomic instructions and shared memory
* `multi-memory` - the ability to have more than one linear memory
* `relaxed-simd` - new simd instructions that may differ across
platforms
These proposals are now all enabled by default in Wasmtime. Note that
they can still be selectively disabled via `Config` options.
* Fix an x64-specific test
This turns out to be quite simple: the fundamental operation during
egraph-based optimization is to *merge* eclasses, which is an assertion
that their value is equal. Since the values of either side of the merge
are equal, a fact about one side is a fact about the other, and
vice-versa.
Since we only support *one* fact at most per value, we can't take the
union of all facts; instead, if one side is missing a fact, or if both
sides have exactly the same fact, we keep that one; otherwise we go to a
special "conflict" fact that can't support any check. This is edging
closer to a fact-lattice, but I opted not to go there with a full
meet-function merge yet, for simplicity. (It's a little complex because
of the "minimum fact" we can know about a value based on its type -- if
we're going to do something better, I think we should account for that
too.)
In any case, that complexity mostly isn't needed, because the two cases
that happen in reality are (i) opt rules rewrite to a new node, which
will have no facts at all, so facts just propagate; or (ii) GVN merges
two values in the input program into one, but if both are the same
value, in practice the Wasm PCC frontend (for example) should be
producing the same facts on both values, so the merge is trivial.
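The merge rule described above fits in a few lines. This is a sketch
with hypothetical names (`ValueFact`, `merge_facts`), not the actual
Cranelift types.

```rust
/// Hypothetical fact type with an explicit conflict state.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ValueFact {
    /// An integer known to lie in `min..=max`.
    Range { min: u64, max: u64 },
    /// Two different facts met; supports no check at all.
    Conflict,
}

/// Combines the facts of two values whose eclasses are being merged.
pub fn merge_facts(a: Option<ValueFact>, b: Option<ValueFact>) -> Option<ValueFact> {
    match (a, b) {
        // One side has no fact: the other side's fact (if any) carries over.
        (None, other) | (other, None) => other,
        // Both sides agree exactly: keep that fact.
        (Some(x), Some(y)) if x == y => Some(x),
        // Otherwise fall to the conflict fact rather than computing a meet.
        (Some(_), Some(_)) => Some(ValueFact::Conflict),
    }
}
```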
* PCC: add facts to global values, parse and print them. No verification yet.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: propagate facts on GV loads and check them.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: support propagating facts on iteratively-elaborated GVs as well.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: fix up Wasmtime uses of GVs after refactors to memflags handling.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: working end-to-end for static memories!
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: add toplevel Wasmtime option `-C enable-pcc=y`.
* Fix filetests build.
* Review feedback, and blessed test updates due to GV legalization changes.
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This is needed for soundness when verifying accesses to memtype fields:
it's not enough to know that we're accessing an offset in `0` up to
`field_offset` inclusive, we need to know the access is actually to
`field_offset`.
The simplest change that validates this turned out to be the most
general one: making ranges two-sided rather than one-sided. The
transform is *mostly* mechanical, but a few new tests verify that ranges
are updated on both sides, and some fail-tests verify that "fuzzily
imprecise" pointers to struct fields fail to validate.
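To illustrate why the lower bound matters, here is a sketch with
hypothetical names (`OffsetRange`, `one_sided_ok`, `is_exactly`): a
"fuzzy" range passes a check that only looks at the upper bound, but is
rejected once both sides are compared.

```rust
/// Hypothetical two-sided range on a pointer's offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct OffsetRange {
    pub min: u64,
    pub max: u64,
}

impl OffsetRange {
    /// Adding a constant updates both sides of the range.
    pub fn add_const(self, k: u64) -> Option<OffsetRange> {
        Some(OffsetRange {
            min: self.min.checked_add(k)?,
            max: self.max.checked_add(k)?,
        })
    }

    /// Two-sided check: the access must be to exactly `field_offset`.
    pub fn is_exactly(self, field_offset: u64) -> bool {
        self.min == field_offset && self.max == field_offset
    }
}

/// The old, one-sided style of check: only the upper bound is known.
pub fn one_sided_ok(range: OffsetRange, field_offset: u64) -> bool {
    range.max <= field_offset
}
```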
Some notable changes here are:
* The `wasm-tools` crates have enabled the `relaxed-simd`, `threads`,
and `multi-memory` proposals by default. For now I've left these
disabled-by-default in Wasmtime to get enabled in a future PR.
* The `wast` crate has support for parsing `thread` and `wait`
constructs from the `threads` proposal for WebAssembly. They're left
unimplemented for now and return errors. This will get filled in in a
future update.
This removes the need for the awkward "max-range fact is subsumed by
anything" rule noted by @fitzgen in [this
comment](https://github.com/bytecodealliance/wasmtime/pull/7231#discussion_r1358573147).
It also makes checking a little more efficient and logically clear, as
only the facts that the frontend/producer added are verified, rather
than all default facts as well.
* riscv64: Refactor `rotl` rules
Move from `inst.isle` to `lower.isle` since it's the only caller,
reorganize the rules to be a bit cleaner, add immediate shifting
specializations.
* riscv64: Refactor `rotr` lowerings
Same as the prior `rotl` lowerings, move the rules to `lower.isle` and
additionally add constant rules.
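For reference, the identity these lowerings build on when no native
rotate instruction is available is a pair of shifts combined with an
or. The helpers below are an illustrative sketch, not the ISLE rules
themselves; with a constant amount both shift counts are known at
compile time, which is what makes an immediate specialization possible.

```rust
/// Rotate left built from two shifts; `amt` is taken modulo 64.
pub fn rotl64(x: u64, amt: u32) -> u64 {
    let amt = amt & 63;
    // `(64 - amt) & 63` keeps the right shift in range when `amt` is 0.
    (x << amt) | (x >> ((64 - amt) & 63))
}

/// Rotate right is the mirror image.
pub fn rotr64(x: u64, amt: u32) -> u64 {
    let amt = amt & 63;
    (x >> amt) | (x << ((64 - amt) & 63))
}
```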
* Fix shift-by-128
* Remove empty comments
* cranelift: Add egraph rules for `bswap`
WebAssembly doesn't currently have a byte-swapping instruction so this
commit pattern matches what LLVM currently generates to simplify to a
single `bswap` instruction in CLIF which can be lowered within each
backend to respective native instructions.
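As an illustration of the kind of expression being matched, a 32-bit
byte swap written with shifts and masks looks like this. The exact shape
LLVM emits may differ; this is one common form, and `bswap32_shifts` is
a hypothetical name.

```rust
/// Byte swap spelled out with shifts, masks, and ors.
pub fn bswap32_shifts(x: u32) -> u32 {
    (x << 24)                       // byte 0 -> byte 3
        | ((x << 8) & 0x00FF_0000)  // byte 1 -> byte 2
        | ((x >> 8) & 0x0000_FF00)  // byte 2 -> byte 1
        | (x >> 24)                 // byte 3 -> byte 0
}
```

The whole expression is equivalent to a single `swap_bytes` call, which
is what lets a rewrite rule replace it with one `bswap`.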
* Use an `optimize` instead of `compile` test
This commit optimizes some `select_spectre_guard` patterns which are
frequently generated with the wasm backend when dynamic bounds checks
are enabled by pattern-matching when one of the values being selected is
zero. This enables shaving off a few instructions which can be constant
folded away.
* Support set_fuel in store APIs
Fixes: https://github.com/bytecodealliance/wasmtime/issues/5109
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
* rename set_fuel to reset_fuel
To make it more clear that consumed fuel is being reset.
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
* update out of date documentation for fuel in C API
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
---------
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>