After adding the `call`-oriented benchmark recently I just noticed that
running benchmarks on CI is taking 30+ minutes which is not intended.
Instead of running a full benchmark run on CI (which I believe we're not
looking at anyway) instead only run the benchmarks for a single
iteration to ensure they still work but otherwise don't collect
statistics about them.
Additionally cap the number of parallel instantiations to 16 to avoid
running tons of tests for machines with lots of cpus.
This avoids a situation such as for the last point release where we had
different point releases all use the same branch name which forced us
make one release at a time. By putting the version number in the branch
name that should hopefully fix this.
It seems our `compile` fuzz target for ISLE has not been regularly
tested, as it was never updated for the `isle` -> `cranelift_isle` crate
renaming. This PR fixes it to compile again.
This also includes a simple fix in the typechecking: when verifying that
a term decl is valid, we might insert a term ID into the name->ID map
before fully checking that all of the types exist, and then skipping
(for error recovery purposes) the actual push onto the term-signature
vector if one of the types does have an error. This phantom TID can
later cause a panic. The fix is to avoid adding to the map until we have
fully verified the term decl.
Triage is scheduled to run once every 5 minutes but it's often queued up
during the day as builders are otherwise occupied with actual CI builds.
This can end up in a lot of triage tasks queued up back-to-back. While
this doesn't seem to be a huge issue one thing I suspect is that this is
perhaps somewhat related to API rate limits getting hit when recent
versions were published. In any case there's no need for each and every
triage run to do something, it's fine to only have one at a time
pending.
Looks like the 0.34.1 release is missing artifacts and some jobs
building artifacts ended up being cancelled because of API rate limits
being hit on the builders. Artifacts are uploaded to the job, however,
which means we can always go back and grab them to upload them, unless
the whole job was cancelled. For 0.34.1 it looks like the Linux builder
hit an error but its error then subsequently cancelled the Windows
builders, so we don't actually have artifacts for Windows for the 0.34.1
release. This will hopefully prevent this from causing further issues in
the future where if one builder hits an error while uploading artifacts
the others will continue and we can manually upload what's missing if
necessary.
cc #3812
* Automatically label Wasmtime `Config` object changes
* Add the "Label Messager" github action
This allows us to have custom messages left in comments on issues and pull
requests when they get labeled with a specific label.
* Add a message for `wasmtime:config`-labeled pull requests
* CI: Consolidate issue/PR triage workflows
In #3721, we have been discussing what to do about the ARM32 backend in
Cranelift. Currently, this backend supports only 32-bit types, which is
insufficient for full Wasm-MVP; it's missing other critical bits, like
floating-point support; and it has only ever been exercised, AFAIK, via
the filetests for the individual CLIF instructions that are implemented.
We were very very thankful for the original contribution of this
backend, even in its partial state, and we had hoped at the time that we
could eventually mature it in-tree until it supported e.g. Wasm and
other use-cases. But that hasn't yet happened -- to the blame of no-one,
to be clear, we just haven't had a contributor with sufficient time.
Unfortunately, the existence of the backend and lack of active
maintainer now potentially pose a bit of a burden as we hope to make
continuing changes to the backend framework. For example, the ISLE
migration, and the use of regalloc2 that it will allow, would need all
of the existing lowering patterns in the hand-written ARM32 backend to
be rewritten as ISLE rules.
Given that we don't currently have the resources to do this, we think
it's probably best if we, sadly, for now remove this partial backend.
This is not in any way a statement of what we might accept in the
future, though. If, in the future, an ARM32 backend updated to our
latest codebase with an active maintainer were to appear, we'd be happy
to merge it (and likewise for any other architecture!). But for now,
this is probably the best path. Thanks again to the original contributor
@jmkrauz and we hope that this work can eventually be brought back and
reused if someone has the time to do so!
We currently skip some tests when running our qemu-based tests for
aarch64 and s390x. Qemu has broken madvise(MADV_DONTNEED) semantics --
specifically, it just ignores madvise() [1].
We could continue to whack-a-mole the tests whenever we create new
functionality that relies on madvise() semantics, but ideally we'd just
have emulation that properly emulates!
The earlier discussions on the qemu mailing list [2] had a proposed
patch for this, but (i) this patch doesn't seem to apply cleanly anymore
(it's 3.5 years old) and (ii) it's pretty complex due to the need to
handle qemu's ability to emulate differing page sizes on host and guest.
It turns out that we only really need this for CI when host and guest
have the same page size (4KiB), so we *could* just pass the madvise()s
through. I wouldn't expect such a patch to ever land upstream in qemu,
but it satisfies our needs I think. So this PR modifies our CI setup to
patch qemu before building it locally with a little one-off patch.
[1]
https://github.com/bytecodealliance/wasmtime/pull/2518#issuecomment-747280133
[2]
https://lists.gnu.org/archive/html/qemu-devel/2018-08/msg05416.html
As first suggested by Jan on the Zulip here [1], a cheap and effective
way to obtain copy-on-write semantics of a "backing image" for a Wasm
memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism
provided by the Linux kernel allows us to create anonymous,
in-memory-only files that we can use for this mapping, so we can
construct the image contents on-the-fly then effectively create a CoW
overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)`
will discard the CoW overlay, returning the mapping to its original
state.
By itself this is almost enough for a very fast
instantiation-termination loop of the same image over and over,
without changing the address space mapping at all (which is
expensive). The only missing bit is how to implement
heap *growth*. But here memfds can help us again: if we create another
anonymous file and map it where the extended parts of the heap would
go, we can take advantage of the fact that a `mmap()` mapping can
be *larger than the file itself*, with accesses beyond the end
generating a `SIGBUS`, and the fact that we can cheaply resize the
file with `ftruncate`, even after a mapping exists. So we can map the
"heap extension" file once with the maximum memory-slot size and grow
the memfd itself as `memory.grow` operations occur.
The above CoW technique and heap-growth technique together allow us a
fastpath of `madvise()` and `ftruncate()` only when we re-instantiate
the same module over and over, as long as we can reuse the same
slot. This fastpath avoids all whole-process address-space locks in
the Linux kernel, which should mean it is highly scalable. It also
avoids the cost of copying data on read, as the `uffd` heap backend
does when servicing pagefaults; the kernel's own optimized CoW
logic (same as used by all file mmaps) is used instead.
[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772
This updates our `push-tag.yml` workflow to check all commits to
`release-*` branches in addition to the `main` branch for cmomits which
should be tagged.
Actually poking around the `github` context it looks like for
workflow-dispatch-triggered-events `base_ref` is blank but `ref_name`
has the branch name we're interested in.
This was accidentally ommitted from our CI configuration which meant
that release branches didn't get PR CI. They still won't get on-merge CI
but that shouldn't be an issue because the PR CI is the full CI.
* Update to cap-std 0.22.0.
The main change relevant to Wasmtime here is that this includes the
rustix fix for compilation errors on Rust nightly with the `asm!` macro.
* Add itoa to deny.toml.
* Update the doc and fuzz builds to the latest Rust nightly.
* Update to libc 0.2.112 to pick up the `POLLRDHUP` fix.
* Update to cargo-fuzz 0.11, for compatibility with Rust nightly.
This appears to be the fix for rust-fuzz/cargo-fuzz#277.
Peepmatic was an early attempt at a DSL for peephole optimizations, with the
idea that maybe sometime in the future we could user it for instruction
selection as well. It didn't really pan out, however:
* Peepmatic wasn't quite flexible enough, and adding new operators or snippets
of code implemented externally in Rust was a bit of a pain.
* The performance was never competitive with the hand-written peephole
optimizers. It was *very* size efficient, but that came at the cost of
run-time efficiency. Everything was table-based and interpreted, rather than
generating any Rust code.
Ultimately, because of these reasons, we never turned Peepmatic on by default.
These days, we just landed the ISLE domain-specific language, and it is better
suited than Peepmatic for all the things that Peepmatic was originally designed
to do. It is more flexible and easy to integrate with external Rust code. It is
has better time efficiency, meeting or even beating hand-written code. I think a
small part of the reason why ISLE excels in these things is because its design
was informed by Peepmatic's failures. I still plan on continuing Peepmatic's
mission to make Cranelift's peephole optimizer passes generated from DSL rewrite
rules, but using ISLE instead of Peepmatic.
Thank you Peepmatic, rest in peace!
- The Windows line-ending canonicalization was incomplete: we need to
canonicalize the manifest text itself too!
- The "meta deterministic check" runs the cranelift-codegen build script
N times outside of the source tree, examining what it produces to
ensure the output is always the same (is detministic). This works fine
when everything comes from the internal DSL, but when reading ISLE,
this breaks because we no longer have the ISLE source paths.
The initial ISLE integration did not hit this because without the
`rebuild-isle` feature, it simply did nothing in the build script;
now, with the manifest check, we hit the issue.
The fix for now is just to turn off all ISLE-specific behavior in the
build script by setting a special-purpose Cargo feature in the
specific CI job. Eventually IMHO we should remove or find a better way
to do this check.
This commit adds the `pooling-allocator` feature to both the `wasmtime` and
`wasmtime-runtime` crates.
The feature controls whether or not the pooling allocator implementation is
built into the runtime and exposed as a supported instance allocation strategy
in the wasmtime API.
The feature is on by default for the `wasmtime` crate.
Closes#3513.
* Change the bump-version workflow's schedule
Either I don't understand cron or GitHub doesn't understand cron. It's
not clear which. I think that
https://github.com/bytecodealliance/wasmtime/pull/3511 may have fallen
within our schedule but it was supposed to be on a weekday. Otherwise
https://github.com/bytecodealliance/wasmtime/pull/3499 was certainly
spurious. This commit moves to a simpler "just do it on the same day
each month" and we can manually figure out weekdays and such. Hopefully
this should reduce the number of spurious PRs we're getting to bump
versions.
This also removes the script to force a version bump since I found a
button on the GitHub UI to do the same thing. Additionally I've updated
the patch-release documentation to use this button. Note that this
button takes inputs as well which means we can further automate patch
releases to look even more like normal release process, differing only
in one part of the argument used to trigger the workflow.
* Fix a typo
* Automate more of Wasmtime's release process
This change revamps the release process for Wasmtime and intends to make
it nearly 100% automated for major release and hopefully still pretty
simple for patch releases. New workflows are introduced as part of
this commit:
* Once a month a PR is created with major version bumps
* Specifically hinted commit messages to the `main` branch will get
tagged and pushed to the main repository.
* On tags we'll now not only build releases after running CI but
additionally crates will be published to crates.io.
In conjunction with other changes this means that the release process
for a new major version of Wasmtime is simply merging a PR. Patch
releases will involve running some steps locally but most of the
nitty-gritty should be simply merging the PR that's generated.
* Use an anchor in a regex
Before this commit we actually have two builders checking for security
advisories on CI, one is `cargo audit` and one is `cargo deny`. The
`cargo deny` builder is slightly different in that it checks a few other
things about our dependency tree such as licenses, duplicates, etc. This
commit removes the advisory check from `cargo deny` on CI and then moves
the `cargo audit` check to a separate workflow.
The `cargo audit` check will now run nightly and will open an issue on
the Wasmtime repository when an advisory is found. This should help make
it such that our CI is never broken by the publication of an advisory
but we're still promptly notified whenever an advisory is made. I've
updated the release process notes to indicate that the open issues
should be double-checked to ensure that there are no open advisories
that we need to take care of.
On the build side, this commit introduces two things:
1. The automatic generation of various ISLE definitions for working with
CLIF. Specifically, it generates extern type definitions for clif opcodes and
the clif instruction data `enum`, as well as extractors for matching each clif
instructions. This happens inside the `cranelift-codegen-meta` crate.
2. The compilation of ISLE DSL sources to Rust code, that can be included in the
main `cranelift-codegen` compilation.
Next, this commit introduces the integration glue code required to get
ISLE-generated Rust code hooked up in clif-to-x64 lowering. When lowering a clif
instruction, we first try to use the ISLE code path. If it succeeds, then we are
done lowering this instruction. If it fails, then we proceed along the existing
hand-written code path for lowering.
Finally, this commit ports many lowering rules over from hand-written,
open-coded Rust to ISLE.
In the process of supporting ISLE, this commit also makes the x64 `Inst` capable
of expressing SSA by supporting 3-operand forms for all of the existing
instructions that only have a 2-operand form encoding:
dst = src1 op src2
Rather than only the typical x86-64 2-operand form:
dst = dst op src
This allows `MachInst` to be in SSA form, since `dst` and `src1` are
disentangled.
("3-operand" and "2-operand" are a little bit of a misnomer since not all
operations are binary operations, but we do the same thing for, e.g., unary
operations by disentangling the sole operand from the result.)
There are two motivations for this change:
1. To allow ISLE lowering code to have value-equivalence semantics. We want ISLE
lowering to translate a CLIF expression that evaluates to some value into a
`MachInst` expression that evaluates to the same value. We want both the
lowering itself and the resulting `MachInst` to be pure and referentially
transparent. This is both a nice paradigm for compiler writers that are
authoring and maintaining lowering rules and is a prerequisite to any sort of
formal verification of our lowering rules in the future.
2. Better align `MachInst` with `regalloc2`'s API, which requires that the input
be in SSA form.
This commit removes the Lightbeam backend from Wasmtime as per [RFC 14].
This backend hasn't received maintenance in quite some time, and as [RFC
14] indicates this doesn't meet the threshold for keeping the code
in-tree, so this commit removes it.
A fast "baseline" compiler may still be added in the future. The
addition of such a backend should be in line with [RFC 14], though, with
the principles we now have for stable releases of Wasmtime. I'll close
out Lightbeam-related issues once this is merged.
[RFC 14]: https://github.com/bytecodealliance/rfcs/pull/14
* Use rsix to make system calls in Wasmtime.
`rsix` is a system call wrapper crate that we use in `wasi-common`,
which can provide the following advantages in the rest of Wasmtime:
- It eliminates some `unsafe` blocks in Wasmtime's code. There's
still an `unsafe` block in the library, but this way, the `unsafe`
is factored out and clearly scoped.
- And, it makes error handling more consistent, factoring out code for
checking return values and `io::Error::last_os_error()`, and code that
does `errno::set_errno(0)`.
This doesn't cover *all* system calls; `rsix` doesn't implement
signal-handling APIs, and this doesn't cover calls made through `std` or
crates like `userfaultfd`, `rand`, and `region`.
* Finished the Markdown Parser Example for Wasmtime
* Made requested changes
* Tiny change to explanation of `--dir` CLI arg
* Add `bash` annotations to shell script code blocks
* Trying to fix the markdown example bug
* Figured out rustdoc, and what needed to be done
* Made requested changes
Co-authored-by: Till Schneidereit <till@tillschneidereit.net>
* Remove unnecessary into_iter/map
Forgotten from a previous refactoring, this variable was already of the
right type!
* Move `wasmtime_jit::Compiler` into `wasmtime`
This `Compiler` struct is mostly a historical artifact at this point and
wasn't necessarily pulling much weight any more. This organization also
doesn't lend itself super well to compiling out `cranelift` when the
`Compiler` here is used for both parallel iteration configuration
settings as well as compilation.
The movement into `wasmtime` is relatively small, with
`Module::build_artifacts` being the main function added here which is a
merging of the previous functions removed from the `wasmtime-jit` crate.
* Add a `cranelift` compile-time feature to `wasmtime`
This commit concludes the saga of refactoring Wasmtime and making
Cranelift an optional dependency by adding a new Cargo feature to the
`wasmtime` crate called `cranelift`, which is enabled by default.
This feature is implemented by having a new cfg for `wasmtime` itself,
`cfg(compiler)`, which is used wherever compilation is necessary. This
bubbles up to disable APIs such as `Module::new`, `Func::new`,
`Engine::precompile_module`, and a number of `Config` methods affecting
compiler configuration. Checks are added to CI that when built in this
mode Wasmtime continues to successfully build. It's hoped that although
this is effectively "sprinkle `#[cfg]` until things compile" this won't
be too too bad to maintain over time since it's also an use case we're
interested in supporting.
With `cranelift` disabled the only way to create a `Module` is with the
`Module::deserialize` method, which requires some form of precompiled
artifact.
Two consequences of this change are:
* `Module::serialize` is also disabled in this mode. The reason for this
is that serialized modules contain ISA/shared flags encoded in them
which were used to produce the compiled code. There's no storage for
this if compilation is disabled. This could probably be re-enabled in
the future if necessary, but it may not end up being all that necessary.
* Deserialized modules are not checked to ensure that their ISA/shared
flags are compatible with the host CPU. This is actually already the
case, though, with normal modules. We'll likely want to fix this in
the future using a shared implementation for both these locations.
Documentation should be updated to indicate that `cranelift` can be
disabled, although it's not really the most prominent documentation
because this is expected to be a somewhat niche use case (albeit
important, just not too common).
* Always enable cranelift for the C API
* Fix doc example builds
* Fix check tests on GitHub Actions
* Reimplement how unwind information is stored
This commit is a major refactoring of how unwind information is stored
after compilation of a function has finished. Previously we would store
the raw `UnwindInfo` as a result of compilation and this would get
serialized/deserialized alongside the rest of the ELF object that
compilation creates. Whenever functions were registered with
`CodeMemory` this would also result in registering unwinding information
dynamically at runtime, which in the case of Unix, for example, would
dynamically created FDE/CIE entries on-the-fly.
Eventually I'd like to support compiling Wasmtime without Cranelift, but
this means that `UnwindInfo` wouldn't be easily available to decode into
and create unwinding information from. To solve this I've changed the
ELF object created to have the unwinding information encoded into it
ahead-of-time so loading code into memory no longer needs to create
unwinding tables. This change has two different implementations for
Windows/Unix:
* On Windows the implementation was much easier. The unwinding
information on Windows is already stored after the function itself in
the text section. This was actually slightly duplicated in object
building and in code memory allocation. Now the object building
continues to do the same, recording unwinding information after
functions, and code memory no longer manually tracks this.
Additionally Wasmtime will emit a special custom section in the object
file with unwinding information which is the list of
`RUNTIME_FUNCTION` structures that `RtlAddFunctionTable` expects. This
means that the object file has all the information precompiled into it
and registration at runtime is simply passing a few pointers around to
the runtime.
* Unix was a little bit more difficult than Windows. Today a `.eh_frame`
section is created on-the-fly with offsets in FDEs specified as the
absolute address that functions are loaded at. This absolute
address hindered the ability to precompile the FDE into the object
file itself. I've switched how addresses are encoded, though, to using
`DW_EH_PE_pcrel` which means that FDE addresses are now specified
relative to the FDE itself. This means that we can maintain a fixed
offset between the `.eh_frame` loaded in memory and the beginning of
code memory. When doing so this enables precompiling the `.eh_frame`
section into the object file and at runtime when loading an object no
further construction of unwinding information is needed.
The overall result of this commit is that unwinding information is no
longer stored in its cranelift-data-structure form on disk. This means
that this unwinding information format is only present during
compilation, which will make it that much easier to compile out
cranelift in the future.
This commit also significantly refactors `CodeMemory` since the way
unwinding information is handled is not much different from before.
Previously `CodeMemory` was suitable for incrementally adding more and
more functions to it, but nowadays a `CodeMemory` either lives per
module (in which case all functions are known up front) or it's created
once-per-`Func::new` with two trampolines. In both cases we know all
functions up front so the functionality of incrementally adding more and
more segments is no longer needed. This commit removes the ability to
add a function-at-a-time in `CodeMemory` and instead it can now only
load objects in their entirety. A small helper function is added to
build a small object file for trampolines in `Func::new` to handle
allocation there.
Finally, this commit also folds the `wasmtime-obj` crate directly into
the `wasmtime-cranelift` crate and its builder structure to be more
amenable to this strategy of managing unwinding tables.
It is not intentional to have any real functional change as a result of
this commit. This might accelerate loading a module from cache slightly
since less work is needed to manage the unwinding information, but
that's just a side benefit from the main goal of this commit which is to
remove the dependence on cranelift unwinding information being available
at runtime.
* Remove isa reexport from wasmtime-environ
* Trim down reexports of `cranelift-codegen`
Remove everything non-essential so that only the bits which will need to
be refactored out of cranelift remain.
* Fix debug tests
* Review comments