The `wasmtime-cpp` test suite uncovered an issue where asking for the
frames of a trap would fail immediately after the trap was created. In
addition to fixing this issue I've also updated the documentation of
`Trap::frames` to indicate when it returns `None`.
I was running tests recently and was surprised that the `--test all`
test was taking more than a minute to run, which I didn't recall it ever
doing historically. A bisection pointed to #4183 as
the cause and after re-reviewing I realized I forgot that we capture
unresolved backtraces by default (and don't actually resolve them
anywhere yet but that's a problem for another day) rather than resolved
backtraces. This means that it's intended that we use
`Backtrace::new_unresolved` instead of `Backtrace::new` in the
traphandlers crate.
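As a rough sketch of the difference, using the `backtrace` crate directly:

```rust
// A minimal sketch with the `backtrace` crate: capture cheaply up front and
// defer symbol resolution until frames are actually needed.
use backtrace::Backtrace;

fn capture() -> Backtrace {
    // Records raw frame addresses only; no DWARF decoding happens here.
    Backtrace::new_unresolved()
}

fn symbolicate(mut bt: Backtrace) -> Backtrace {
    // The expensive step: resolve addresses to symbols/filenames on demand.
    bt.resolve();
    bt
}
```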
The reason that tests were running so slowly is that the tests which
deal with deep stacks (e.g. stack overflow) would take forever in
testing as the Rust-based decoding of DWARF information is egregiously
slow in unoptimized mode. I did discover independently that optimizing
these dependencies makes the tests ~6x faster, but that's irrelevant if
we're not symbolicating in the first place.
* sorta working in runtime
* wasmtime-runtime: get rid of wasm-backtrace feature
* wasmtime: factor to make backtrace recording optional. not configurable yet
* get rid of wasm-backtrace features
* trap tests: now a Trap optionally contains backtrace
* eliminate wasm-backtrace feature
* code review fixes
* ci: no more wasm-backtrace feature
* c_api: backtraces always enabled
* config: unwind required by backtraces and ref types
* plumbed
* test that disabling backtraces works
* code review comments
* fuzzing generator: wasm_backtrace is a runtime config now
* doc fix
* Make `ValRaw` fields private
Force access to go through constructors and accessors to localize the
knowledge about little-endianness. This was spawned by a mistake I made
in #4039 regarding endianness.
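A minimal sketch of the pattern (illustrative, not the actual `ValRaw`
definition):

```rust
// The field is private and always stored little-endian; constructors and
// accessors do the conversion, so endianness knowledge lives in one place.
#[derive(Copy, Clone)]
pub struct ValRaw {
    i32: i32, // little-endian representation
}

impl ValRaw {
    pub fn i32(val: i32) -> ValRaw {
        ValRaw { i32: val.to_le() }
    }

    pub fn get_i32(&self) -> i32 {
        i32::from_le(self.i32)
    }
}
```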
* Fix some tests
* Component model changes
* components: Implement the ability to call component exports
This commit is an implementation of the typed method of calling
component exports. This is intended to represent the most efficient way
of calling a component in Wasmtime, similar to what `TypedFunc`
represents today for core wasm.
Internally this contains all the traits and implementations necessary to
invoke component exports with any type signature (e.g. arbitrary
parameters and/or results). The expectation is that for imports we'll
reuse all of this infrastructure except in reverse (arguments and
results will be swapped when defining imports).
Some features of this implementation are:
* Arbitrary type hierarchies are supported
* The Rust-standard `Option`, `Result`, `String`, `Vec<T>`, and tuple
types all map down to the corresponding type in the component model.
* Basic UTF-16 string support is implemented as proof-of-concept to show
what handling might look like. This will need further testing and
benchmarking.
* Arguments can be behind "smart pointers", so for example
`&Rc<Arc<[u8]>>` corresponds to `list<u8>` in interface types.
* Bulk copies from linear memory never happen unless explicitly
instructed to do so.
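As a hedged sketch of what calling a typed export might look like (the
method names and signatures here, such as `get_typed_func`, are assumptions
about the API shape rather than a settled interface):

```rust
// Hypothetical usage sketch; names and signatures are assumptions.
use wasmtime::Store;
use wasmtime::component::Instance;

fn call_export(store: &mut Store<()>, instance: &Instance) -> anyhow::Result<String> {
    // Type-check the export once against its component-model signature...
    let func = instance.get_typed_func::<(&str,), String>(&mut *store, "greet")?;
    // ...after which calls are statically typed, analogous to core wasm's
    // `TypedFunc`, with no per-call type interpretation.
    let greeting = func.call(&mut *store, ("hello",))?;
    Ok(greeting)
}
```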
The goal of this commit is to create the ability to actually invoke wasm
components. This path is expected to set the performance ceiling for
these calls: invoking WebAssembly this way should ideally be optimal.
One major missing piece of this is a `#[derive]`
of some sort to generate Rust types for arbitrary `*.wit` types such as
custom records, variants, flags, unions, etc. The current trait impls
for tuples and `Result<T, E>` are expected to have fleshed out most of
what such a derive would look like.
There are some downsides and missing pieces to this commit and method of
calling components, however, such as:
* Passing `&[u8]` to WebAssembly is currently not optimal. Ideally this
compiles down to a `memcpy`-equivalent somewhere but that currently
doesn't happen due to all the bounds checks of copying data into
memory. I have been unsuccessful so far at getting these bounds checks
to be removed.
* There is no finalization at this time (the "post return" functionality
in the canonical ABI). Implementing this should be relatively
straightforward but at this time requires `wasmparser` changes to
catch up with the current canonical ABI.
* There is no guarantee that results of a wasm function will be
validated. Results are validated as they are consumed, but this means
that if a function returns an invalid string which the host never looks
at then no trap will be generated. This is probably not the intended
semantics of hosts in the component model.
* At this time there's no support for memory64 memories, just a bunch of
`FIXME`s to get around to. It's expected that this won't be too
onerous, however. Some extra care will need to ensure that the various
methods related to size/alignment all optimize to the same thing they
do today (e.g. constants).
* The return value of a typed component function is either `T` or
`Value<T>`, and it depends on the ABI details of `T` and whether it
takes up more than one return value slot or not. This is an
ABI-implementation detail which is being forced through to the API
layer which is pretty unfortunate. For example if you say the return
value of a function is `(u8, u32)` then it's a runtime type-checking
error. I don't know of a great way to solve this at this time.
Overall I'm feeling optimistic about this trajectory of implementing
value lifting/lowering in Wasmtime. While there are a number of
downsides, none seem completely insurmountable. There's naturally still
a good deal of work left on the component model, but this should be a
significant step towards implementing and testing it.
* Review comments
* Write tests for calling functions
This commit adds a new test file for actually executing functions and
testing their results. This is not written as a `*.wast` test yet since
it's not 100% clear if that's the best way to do that for now (given
that dynamic signatures aren't supported yet). The tests themselves
could all largely be translated to `*.wast` testing in the future,
though, if supported.
Along the way a number of minor issues were fixed with lowerings with
the bugs exposed here.
* Fix an endian mistake
* Fix a typo and the `memory.fill` instruction
RA2 recently removed the need for a dedicated scratch register for
cyclic moves (bytecodealliance/regalloc2#51). This has moderate positive
performance impact on function bodies that were register-constrained, as
it means that one more register is available. In Sightglass, I measured
+5-8% on `blake3-scalar`, at least among current benchmarks.
* Remove unused srcloc in MachReloc
* Remove unused srcloc in MachTrap
* Use `into_iter` on array in bench code to suppress a warning
* Remove unused srcloc in MachCallSite
Previously, the pinned register (enabled by the `enable_pinned_reg`
Cranelift setting and used via the `get_pinned_reg` and `set_pinned_reg`
CLIF ops) was only used when Cranelift was embedded in SpiderMonkey, in
order to support a pinned heap register. SpiderMonkey has its own
calling convention in Cranelift (named after the integration layer,
"Baldrdash").
However, the feature is more general, and should be usable with the
default system calling convention too, e.g. SysV or Windows Fastcall.
This PR fixes the ABI code to properly treat the pinned register as a
globally allocated register -- and hence an implicit input and output to
every function, not saved/restored in the prologue/epilogue -- for SysV
on x86-64 and aarch64, and Fastcall on x86-64.
Fixes #4170.
* Initial skeleton of some component model processing
This commit is the first of what will likely be many to implement the
component model proposal in Wasmtime. This will be structured as a
series of incremental commits, most of which haven't been written yet.
My hope is to make this incremental and over time to make this easier to
review and easier to test each step in isolation.
Here much of the skeleton of how components are going to work in
Wasmtime is sketched out. This is not a complete implementation of the
component model so it's not all that useful yet, but some things you can
do are:
* Process the type section into a representation amenable for working
with in Wasmtime.
* Process the module section and register core wasm modules.
* Process the instance section for core wasm modules.
* Process core wasm module imports.
* Process core wasm instance aliasing.
* Ability to compile a component with core wasm embedded.
* Ability to instantiate a component with no imports.
* Ability to get functions from this component.
This is already starting to diverge from the previous module linking
representation where a `Component` will try to avoid unnecessary
metadata about the component and instead internally only have the bare
minimum necessary to instantiate the module. My hope is we can avoid
constructing most of the index spaces during instantiation only for it
all to be thrown away. Additionally I'm predicting that we'll need to
see through the processing where possible to know how to generate
adapters and where they are fused.
At this time you can't actually call a component's functions, and that's
the next PR that I would like to make.
* Add tests for the component model support
This commit uses the recently updated wasm-tools crates to add tests for
the component model added in the previous commit. This involved updating
the `wasmtime-wast` crate for component-model changes. Currently the
component support there is quite primitive, but enough to at least
instantiate components and verify the internals of Wasmtime are all
working correctly. Additionally some simple tests for the embedding API
have also been added.
* Improve the `wasmtime` crate's README
This commit is me finally getting back to #2688 and improving the README
of the `wasmtime` crate. Currently we have a [pretty drab README][drab]
that doesn't really convey what we want about Wasmtime.
While I was doing this I opted to update the feature list of Wasmtime as
well in the main README (which is mirrored into the crate readme),
namely adding a bullet point for "secure" which I felt was missing
relative to how we think about Wasmtime.
Naturally there's a lot of ways to paint this shed, so feedback is of
course welcome on this! (I'm not the best writer myself)
[drab]: https://crates.io/crates/wasmtime/0.37.0
* Expand the "Fast" bullet a bit more
* Reference the book from the wasmtime crate
* Update more security docs
Also merge the sandboxing security page with the main security page to
avoid the empty security page.
This PR adds a basic *alias analysis*, and optimizations that use it.
This is a "mid-end optimization": it operates on CLIF, the
machine-independent IR, before lowering occurs.
The alias analysis (or maybe more properly, a sort of memory-value
analysis) determines when it can prove that a particular memory
location holds a given SSA value, and when it can, it replaces any
loads of that location with that value.
This subsumes two common optimizations:
* Redundant load elimination: when the same memory address is loaded two
times, and it can be proven that no intervening operations will write
to that memory, then the second load is *redundant* and its result
must be the same as the first. We can use the first load's result and
remove the second load.
* Store-to-load forwarding: when a load can be proven to access exactly
the memory written by a preceding store, we can replace the load's
result with the store's data operand, and remove the load.
Both of these optimizations rely on a "last store" analysis that is a
sort of coloring mechanism, split across disjoint categories of abstract
state. The basic idea is that every memory-accessing operation is put
into one of N disjoint categories; it is disallowed for memory to ever
be accessed by an op in one category and later accessed by an op in
another category. (The frontend must ensure this.)
Then, given this, we scan the code and determine, for each
memory-accessing op, the single prior instruction that is the last
store to its category. This "colors" the instruction: it is, in a sense, a
static name for that version of memory.
This analysis provides an important invariant: if two operations access
memory with the same last-store, then *no other store can alias* in the
time between that last store and these operations. This must-not-alias
property, together with a check that the accessed address is *exactly
the same* (same SSA value and offset) and that other attributes of the
access (type, extension mode) are the same, lets us prove that the
results are the same.
Given last-store info, we scan the instructions and build a table from
"memory location" key (last store, address, offset, type, extension) to
known SSA value stored in that location. A store inserts a new mapping.
A load may also insert a new mapping, if we didn't already have one.
Then when a load occurs and an entry already exists for its "location",
we can reuse the value. This is either redundant load elimination or
store-to-load forwarding, depending on where the value came from.
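A small sketch of that table (the types here are illustrative stand-ins for
Cranelift's actual instruction/value handles, not its real data structures):

```rust
use std::collections::HashMap;

// Illustrative stand-ins for Cranelift's instruction/value handles.
type Inst = u32;
type Value = u32;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct MemoryLoc {
    last_store: Option<Inst>, // coloring: the reaching "last store", if any
    address: Value,           // SSA value of the address
    offset: i32,
    ty: u8,        // type of the access
    extension: u8, // extension mode of the access
}

#[derive(Default)]
struct MemValues {
    known: HashMap<MemoryLoc, Value>,
}

impl MemValues {
    // A store establishes a new known value for its location.
    fn visit_store(&mut self, loc: MemoryLoc, data: Value) {
        self.known.insert(loc, data);
    }

    // A load either reuses an already-known value (redundant load
    // elimination or store-to-load forwarding) or records its own result.
    fn visit_load(&mut self, loc: MemoryLoc, result: Value) -> Option<Value> {
        if let Some(&v) = self.known.get(&loc) {
            Some(v)
        } else {
            self.known.insert(loc, result);
            None
        }
    }
}
```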
Note that this *does* work across basic blocks: the last-store analysis
is a full iterative dataflow pass, and we are careful to check dominance
of a previously-defined value before aliasing to it at a potentially
redundant load. So we will do the right thing if we only have a
"partially redundant" load (loaded already but only in one predecessor
block), but we will also correctly reuse a value if there is a store or
load above a loop and a redundant load of that value within the loop, as
long as no potentially-aliasing stores happen within the loop.
* Refactor binary-compatible-builds for releases
I was poking around this yesterday and noticed a few things that could
be improved for our release builds:
* The CentOS container for the x86_64 builds contained a bunch of extra
tooling we no longer need, such as python3 and a C++ compiler. Along
with custom toolchain things this could all get removed since the C we
include now is quite simple.
* The aarch64 and s390x cross-compiled builds had relatively high glibc
version requirements compared to the x86_64 build. This was because we
don't use a container to build the cross-compiled binaries. I added
containers here along the lines of the x86_64 build to use an older
glibc to build the release binary to lower our version requirement.
This lowers the aarch64 version requirement from glibc 2.28 to 2.17.
Additionally the s390x requirement dropped from 2.28 to 2.16.
* To make the containers a bit easier to read/write I added
`Dockerfile`s for them in a new `ci/docker` directory instead of
hardcoding install commands in JS.
This isn't intended to be a really big change or anything for anyone,
but it's intended to keep our Linux-based builds consistent at least as
best we can.
* Remove temporary change
* Improve documentation around `ResourceLimiter`
This commit takes a pass through the `Store::limiter` method and related
types/traits to improve the documentation with an example and to catch
the docs up with recent developments.
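For reference, a hedged sketch of the pattern the docs cover (signatures are
from memory of this era's API and may differ across versions):

```rust
use wasmtime::{Engine, ResourceLimiter, Store};

struct Limits {
    max_memory_bytes: usize,
}

impl ResourceLimiter for Limits {
    // Return false to deny a requested growth of a linear memory.
    fn memory_growing(&mut self, _current: usize, desired: usize, _max: Option<usize>) -> bool {
        desired <= self.max_memory_bytes
    }

    // Return false to deny a requested growth of a table.
    fn table_growing(&mut self, _current: u32, desired: u32, _max: Option<u32>) -> bool {
        desired <= 10_000
    }
}

struct StoreData {
    limits: Limits,
}

fn main() {
    let engine = Engine::default();
    let mut store = Store::new(
        &engine,
        StoreData { limits: Limits { max_memory_bytes: 1 << 20 } },
    );
    // The closure projects the limiter out of the store's data; it is
    // consulted on every memory/table growth request.
    store.limiter(|data| &mut data.limits);
}
```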
Closes #4138
* Fix a broken doc link
This is required now that the simd proposal has been merged into the
upstream specification: to run the spec tests it must always be enabled
instead of being left to the whims of the fuzzer about whether to enable
it or not.
* Update the wasm-tools family of crates
This commit updates these crates as used by Wasmtime for the recently
published versions to pull in changes necessary to support the component
model. I've split this out from #4005 to make it clear what's impacted
here and #4005 can simply rebase on top of this to pick up the necessary
changes.
* More test fixes
A new version of rustc was released this morning and we have a few small
breakages on our CI which need fixing:
* A new warning was coming out of the c-api crate about an unneeded
`unsafe` block.
* The panic message of a test in `cranelift-object` needed updating
since the standard library changed how it formats strings containing a
nul byte.
I ran across a case in Wasmtime today where a poor error message came
out of the CLI. For example before this commit you would get:
```
$ cargo run wast --wasm-features component-model foo.wast
error: Invalid value "wast" for '<MODULE>': module name cannot be the same as a subcommand
```
and now after this commit you get:
```
$ cargo run wast --wasm-features component-model foo.wast
error: Invalid value "component-model" for '--wasm-features <FEATURE,FEATURE,...>': unsupported WebAssembly feature 'component-model'
```
I believe this was an accidental regression from #4082 since Wasmtime
0.36.0 produces the error message as expected.
I opted to invert the conditional logic for falling back to the `run`
subcommand. Instead of a small set of error kinds that print the
first-level error (with everything else falling back), a small set of
error kinds is now used to trigger the fallback to the `run` subcommand,
and the first-level error is printed by default. My hope is that as
`ErrorKind` is extended over time with various sorts of
argument-parsing errors this'll be more robust, because most of the
time we want the CLI invocation to print out the normal CLI error; it's
only in a select few cases that using `run` is likely to succeed.
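A hedged sketch of the inverted check (the exact set of error kinds is
illustrative, not the precise allowlist in the commit):

```rust
use clap::ErrorKind;

// Only a small allowlist of parse errors falls back to the implicit `run`
// subcommand; everything else surfaces the top-level CLI error.
fn fall_back_to_run(kind: ErrorKind) -> bool {
    matches!(
        kind,
        ErrorKind::InvalidSubcommand
            | ErrorKind::UnrecognizedSubcommand
            | ErrorKind::UnknownArgument
    )
}
```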
* Upgrade to regalloc2 0.1.3.
This pulls in bytecodealliance/regalloc2#49, which slightly improves
codegen in some cases where a safepoint (for reference-typed values)
occurs in the same liverange as a register-constrained use. For
example, in bytecodealliance/wasmtime#3785, an extra move instruction
appeared and a callee-save register was used (necessitating a more
expensive prologue) because of suboptimal splitting heuristics, which
this PR fixes. The updated RA2 heuristics appear to have no measured
downsides in existing benchmarks and improve the manually-observed
codegen issue.
* Update filetests where regalloc2 improvement altered behavior with reftypes.
As per discussion today, when we have a holiday (affecting any regular
attendee), we will push the meeting by a week. This does mean we
sometimes have meetings in contiguous weeks, but given the number of
topics we usually have to discuss, erring on the side of more discussion
time (rather than just canceling) is probably not a bad thing.
For the rest of this calendar year, given an otherwise regular
biweekly-on-Mondays cadence, the holiday conflicts I am aware of are: US
Memorial Day (falls on Mon May 30, meeting pushed to Mon Jun 6); US
Labor Day (falls on Mon Sept 5, pushed to Mon Sept 12). If there are any
other holidays in the below dates, I'm happy to update further!
* Run a callback when the interruption epoch is reached
Adds Store::epoch_deadline_callback. This accepts a callback which, when
invoked, can mutate the store's contents. The callback can either return
an error (in which case we trap) or return a delta which we'll use to
set the new epoch deadline.
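A hedged sketch of usage (the callback signature is as I recall it from this
change and may differ in later versions):

```rust
use wasmtime::{Config, Engine, Store, Trap};

struct State {
    should_stop: bool,
}

fn setup() -> Store<State> {
    let mut config = Config::new();
    // Epoch interruption must be enabled for deadlines to fire at all.
    config.epoch_interruption(true);
    let engine = Engine::new(&config).unwrap();

    let mut store = Store::new(&engine, State { should_stop: false });
    store.set_epoch_deadline(1);
    store.epoch_deadline_callback(|state| {
        if state.should_stop {
            // Returning an error traps the running wasm.
            Err(Trap::new("interrupted"))
        } else {
            // Otherwise extend the deadline by the returned delta.
            Ok(1)
        }
    });
    store
}
```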
* Add a basic test for epoch interruption callback
* Some small nits
- Remove use of &mut in the pattern match
- Return both yields and state from run_and_count_yields_or_trap in
test code and assert on them separately.
- Add a test for trapping on a state failure.
In #4143 we made ISLE compilation part of the normal build flow again,
to avoid the issues with the checked-in source. To make this acceptably
fast, we cut down dependencies of the ISLE compiler, so the "fancy"
error printing is now optional. When not included, it just prints error
messages to stderr in a list. However, this did not include file
locations. It might be nice to have this without enabling the "fancy
printing" and waiting for that to build.
Fortunately most of the plumbing for this was already present (we had it
at one point before switching to miette). This PR adds back locations to
the basic error output. It now looks like:
```
Error building ISLE files: ISLE errors:
src/isa/aarch64/inst.isle:1:1: parse error: Unexpected token Symbol("asdf")
```
* Don't attempt to track the generated clif.isle in cargo
This causes the build script to rerun every time for me.
* Put build script debug messages on stderr instead of stdout
This keeps stdout reserved for cargo build script directives
Current codegen had a number of logic errors confusing
NAND with AND WITH COMPLEMENT, and NOR with OR WITH COMPLEMENT.
Add support for the missing z15 instructions and fix logic.
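For clarity, the operations that were being conflated, as illustrative Rust
bitwise equivalents:

```rust
// The four operations whose codegen was being confused:
fn nand(a: u64, b: u64) -> u64 { !(a & b) }              // NOT (a AND b)
fn and_with_complement(a: u64, b: u64) -> u64 { a & !b } // a AND (NOT b)
fn nor(a: u64, b: u64) -> u64 { !(a | b) }               // NOT (a OR b)
fn or_with_complement(a: u64, b: u64) -> u64 { a | !b }  // a OR (NOT b)
```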
The HashMap implementation is significantly simpler than the BTreeMap
implementation. Because of this, switching reduces the compilation time
of cranelift-isle by about 10%.
```
# Before
$ hyperfine --prepare "cargo clean" "cargo build"
Benchmark 1: cargo build
  Time (mean ± σ):      5.221 s ±  0.094 s    [User: 10.659 s, System: 0.734 s]
  Range (min … max):    5.151 s …  5.420 s    10 runs

# After
$ hyperfine --prepare "cargo clean" "cargo build"
Benchmark 1: cargo build
  Time (mean ± σ):      4.746 s ±  0.150 s    [User: 9.109 s, System: 0.721 s]
  Range (min … max):    4.630 s …  5.144 s    10 runs
```
This PR fixes #4066: it modifies the Cranelift `build.rs` workflow to
invoke the ISLE DSL compiler on every compilation, rather than only
when the user specifies a special "rebuild ISLE" feature.
The main benefit of this change is that it vastly simplifies the mental
model required of developers, and removes a bunch of failure modes
we have tried to work around in other ways. There is now just one
"source of truth", the ISLE source itself, in the repository, and so there
is no need to understand a special "rebuild" step and how to handle
merge errors. There is no special process needed to develop the compiler
when modifying the DSL. And there is no "noise" in the git history produced
by constantly-regenerated files.
The two main downsides we discussed in #4066 are:
- Compile time could increase, by adding more to the "meta" step before the main build;
- It becomes less obvious where the source definitions are (everything becomes
more "magic"), which makes exploration and debugging harder.
This PR addresses each of these concerns:
1. To maintain reasonable compile time, it includes work to cut down the
dependencies of the `cranelift-isle` crate to *nothing* (only the Rust stdlib),
in the default build. It does this by putting the error-reporting bits
(`miette` crate) under an optional feature, and the logging (`log` crate) under
a feature-controlled macro, and manually writing an `Error` impl rather than
using `thiserror`. This completely avoids proc macros and the `syn` build slowness.
The user can still get nice errors out of `miette`: this is enabled by specifying
a Cargo feature `--features isle-errors`.
2. To allow the user to optionally inspect the generated source, which nominally
lives in a hard-to-find path inside `target/` now, this PR adds a feature `isle-in-source-tree`
that, as implied by the name, moves the target for ISLE generated source into
the source tree, at `cranelift/codegen/isle_generated_source/`. It seems reasonable
to do this when an explicit feature (opt-in) is specified because this is how ISLE regeneration
currently works as well. To prevent surprises, if the feature is *not* specified, the
build fails if this directory exists.
In #4104 we discussed whether it makes sense for the division and
remainder ops to support vector types. We concluded that because most
hardware doesn't support it directly, it probably is not ideal to force
all backends to polyfill it. In the future we can always reverse this
decision, perhaps with a platform-independent legalization.
This PR restricts the allowed types on the CLIF ops to integer types
only.
This test was added between the last CI run on #4088 and its merge to
main, and the changes in #4088 (use of constants directly in instruction
via load from constant pool, rather than from a register initialized by
a separate instruction) cause it to fail now.
This PR alters the test to be invariant to regalloc and argument
decisions during lowering, as the test is really checking (per the
comment) that we get two cmoves without an intervening move. As such, it
just matches the instruction opcodes, irrespective of the arguments.
* Allow emitting u64 constants into constant pool.
* Use constant pool for constants on x64 that do not fit in a simm32 and are needed as a RegMem or RegMemImm.
* Fix rip-relative addressing bug in pinsrd emission.
* Narrow `allow(dead_code)` declarations
Having module wide `allow(dead_code)` may hide some code that's really
dead. In this commit I just narrowed the declarations to the specific
enum variants that were not used (as it seems reasonable to keep them
and their handling in all the matches, for future use). And the compiler
found more dead code that I think we can remove safely in the short
term.
With this, the only files annotated with a module-wide
`allow(dead_code)` are isle-generated files.
* resurrect some functions as test helpers
* ISLE compiler: fix priority-trie interval bug. (#4093)
This PR fixes a bug in the ISLE compiler related to rule priorities.
An important note first: the bug did not affect the correctness of the
Cranelift backends, either in theory (because the rules should be
correct applied in any order, even contrary to the stated priorities)
or in practice (because the generated code actually does not change at
all with the DSL compiler fix, only with a separate minimized bug
example).
The issue was a simple swap of `min` for `max` (see first
commit). This is the minimal fix, I think, to get a correct
priority-trie with the minimized bug example in this commit.
However, while debugging this, I started to convince myself that the
complexity of merging multiple priority ranges using the sort of
hybrid interval tree / string-matching trie data structure was
unneeded. The original design was built with the assumption we might
have a bunch of different priority levels, and would need the
efficiency of merging where possible. But in practice we haven't used
priorities this way: the vast majority of lowering rules exist at the
default (priority 0), and just a few overrides are explicitly at prio
1, 2 or (rarely) 3.
So, it turns out to be a lot simpler to label trie edges with (prio,
symbol) rather than (prio-range, symbol), and delete the whole mess of
interval-splitting logic on insertion. It's easier (IMHO) to convince
oneself that the resulting insertion algorithm is correct.
I was worried that this might impact the size of the generated Rust
code or its runtime, but in fact, to my initial surprise (but it makes
sense given the above "rarely used" factor), the generated code with
this compiler fix is *exactly the same*. I rebuilt with `--features
rebuild-isle,all-arch` but... there were no diffs to commit! This is
to me the simplest evidence that we didn't really need that
complexity.
* Fix earlier commit from #4093: properly sort trie.
This commit fixes an in-hindsight-obvious bug in #4093: the trie's edges
must be sorted recursively, not just at the top level.
With this fix, the generated code differs only in one cosmetic way (a
let-binding moves) but otherwise is the same.
This includes @fitzgen's fix to the CI (from the revert in #4102) that
deletes manifests to actually check that the checked-in source is
consistent with the checked-in compiler. The force-rebuild step is now
in a shell script for convenience: anyone hacking on the ISLE compiler
itself can use this script to more easily rebuild everything.
* Add note to build.rs to remind to update force-rebuild-isle.sh
This commit fixes an issue introduced in #4046 where the checks for
ensuring that the memory initialization image for a module was
constrained in its size failed to trigger and a very small module could
produce an arbitrarily large memory image.
The bug in question was that if a module only had empty data segments at
arbitrarily small and large addresses then the loop which checks whether
or not the image is allowed was skipped entirely, since the memory was
seen to have no data. The fix here is to skip segments that are
empty to ensure that if the validation loop is skipped then no data
segments will be processed to create the image (and the module won't end
up having an image in the end).
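A minimal sketch of the shape of the fix (a hypothetical helper, not the
actual Wasmtime code):

```rust
// Empty segments are skipped up front so they can neither be missed by the
// validation loop nor force an image into existence.
fn memory_image_allowed(segments: &[(u64, Vec<u8>)], max_image_size: u64) -> bool {
    for (offset, data) in segments {
        if data.is_empty() {
            continue; // an empty segment contributes nothing to the image
        }
        let end = match offset.checked_add(data.len() as u64) {
            Some(end) => end,
            None => return false,
        };
        if end > max_image_size {
            return false; // image would be too large; don't build one
        }
    }
    true
}
```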
* Revert "ISLE compiler: fix priority-trie interval bug. (#4093)"
This reverts commit 2122337112.
* ci: Delete ISLE manifest files to force an ISLE rebuild
This makes it so that we actually rebuild ISLE rather than just check that the
manifests are up-to-date.
The automated release process failed last night because the `./publish
bump` process failed to automatically increment the version number. This
is fixed in #4097 but it also seems prudent to run this continually on
CI as well to ensure that this doesn't regress again.