This patch fixes how jumps are handled in `br_table`; prior to this
change, `br_table` was implemented using
`CodeGenContext::unconditional_jump`; this function ensures, among other
invariants that the value stack and stack pointer must be balanced
according to the expectation of the target branch. Even though in
`br_table` there's branch to a potentially known location, it's
impossible be certain at compile time, which branch will be taken; in
that regard, `br_table` behaves more like `br_if`. Using
`unconditional_jump` resulted in the stack being manipulated multiple
times and breaking the other existing invariants around stack balancing.
This commit makes it so that `br_table` doesn't rely on
`unconditional_jump` anymore and instead it delegates control flow to
the target branch, which will ensure that the value stack and stack
pointer are correctly balanced when restoring reachability, very similar
to what happens with `br_if`.
This issue was discovered while fuzzing and a file test is included with
the test case.
* ISLE: Allow callers to pass in storage for multi returns
This allows callers to reuse storage and avoid allocations.
Internal constructors calling internal constructors won't reuse storage (at
least for now) but it is an option for the user code that is calling into ISLE.
Part of #7500
* Cranelift: Reuse storage for optimized values coming out of ISLE in the midend
* Make child fields immutable
* Add `get_fields` and remove `FieldMapMutability`
Clean up the interface to immutable fields by adding a different
accessor.
* Clean up the diff
* delete wasi-clocks timezone interface: import wit changes from https://github.com/WebAssembly/wasi-clocks/pull/55
* remove other references to wasi:clocks/timezone in wits
* remove todo! implementation of clocks/timezone and add_to_linker funcs
This commit improves unconditional jumps by balancing the stack pointer
as well as the value stack when the current stack pointer and value
stack are greater than the target stack pointer and value stack. The invariant that
this changes maintains is that the the value stack should always reflect
the the state of the machine stack. The value stack might have excess
stack values in a presence of a fallthrough (`br_if` or `br_table`) in
which the target branch is not known at compile time; in this situation
instructions like `return` or `br` discard any excess values.
Currently `cargo run test filetests` in the `cranelift` directory prints
a bunch of `ERROR:` logs for the pcc tests which I'm assuming is
expected as they're testing failures. This commit turns down the log
level to `info!` so it's suppressed by default in this situation.
* mpk: reenable MPK support with vendor string check
In #7446 I disabled MPK support temporarily due to failures in CI runs.
Looking into this further in #7445, I discovered that it is due to how
`has_cpuid_bit_set` works on different x86 machines: Intel's `CPUID`
instruction reports support for MPK in a certain leaf bit, AMD does it
some other (unknown?) way. The CI problem boiled down to occasional runs
on AMD machines that would fail with `SIGILL` because the AMD machine
reported that it had MPK support when it really did not. This change
fixes the issue by first checking if the CPU vendor string is
`GenuineIntel` before inspecting the MPK `CPUID` leaf bit.
Closes#7445.
* review: use `u32::from_le_bytes` to self-document the string check
* Remove redundant unicast prefix.
* Add is-listening / SO_ACCEPTCONN
* Be stricter in what implementations must and must not accept.
* Add remaining keep-alive options
prtest:full
* MacOS fix
Depending on which backends were enabled or not at compile time we might or
might not use some of these imports, which results in warnings. Moved imports
and re-exports around a little to avoid these warnings.
No functional changes.
This commit fixes a mistake where these tests were not being run on
macOS after signal handling was moved behind a `Config` flag instead of
being a crate feature. Additionally these tests are moved to their own
test file to ensure that each process either uses Mach Ports or signals
(this one uses signals which is non-default).
This commit properly derives a scratch register for a particular
WebAssembly type. The included spec test uncovered that the previous
implementation used a int scratch register to assign float stack
arguments, which resulted in a panic.
This increases instruction-level parallelism and shrinks live ranges. It also
canonicalizes into the shallow-and-wide form for reassociating constants
together for cprop.
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Trevor Elliott <telliott@fastly.com>
* Cranelift: Switch egraph `Cost` to a struct with named fields
Mechanical change.
* Cranelift: Break op cost ties with expression depth in egraphs
This means that, when the opcode cost is the same, we prefer shallow and wide
expressions to narrow and deep. For example, `(a + b) + (c + d)` is preferred to
`((a + b) + c) + d`. This is beneficial because it exposes more
instruction-level parallelism and shortens live ranges.
Co-Authored-By: Trevor Elliott <telliott@fastly.com>
* Cranelift: Bitpack the egraph `Cost` structure
Co-Authored-By: Chris Fallin <chris@cfallin.org>
Co-Authored-By: Trevor Elliott <telliott@fastly.com>
* Make it so you can't construct `Cost::inifinity()` by accident
* Use fold to code golf
---------
Co-authored-by: Trevor Elliott <telliott@fastly.com>
Co-authored-by: Chris Fallin <chris@cfallin.org>
* Cranelift: Fix union node bitpacking
It turns out we have just been taking the newest rewrite's value for a eclass
union and never actually comparing costs and taking the value with the minimum
cost. Whoops!
Fixing this made some test expectations fail, which we resolved by tweaking the
cost function to give materializing constants nonzero cost. This way we prefer
`-x` to `0 - x`.
We also made elaboration function break ties between values with the same cost
with the value index. It prefers larger value indices, since the original
value's index will be lower than all of its rewritten values' indices. This
heuristically prefers rewritten values because we hope our rewrites are all
improvements even when the cost function can't show that.
Co-Authored-By: Chris Fallin <chris@cfallin.org>
Co-Authored-By: Trevor Elliott <telliott@fastly.com>
* Add more information to assertion message
* Fix off-by-one bug in assertion
* Limit number of matches consumed from ISLE
We generally want to clamp down and avoid runaway behavior here.
But there also seems to be some sort of rustc/llvm bug on Rust 1.71 that is
causing iteration to wild here. This commit avoids that bug.
* Update test expectation
* prtest:full
---------
Co-authored-by: Chris Fallin <chris@cfallin.org>
Co-authored-by: Trevor Elliott <telliott@fastly.com>
On PRs it's helpful to see full failures if they arise, but for non-PRs
they're typically in the background so continuing after a job fails
isn't going to help much.
Pull in bytecodealliance/wasm-tools#1277 to switch the defaults for some
WIT behaviors:
* Semicolons are now required by default in WIT files
* Wasm-encoded WIT packages now use the "new" format by default.
WIT files without semicolons can still be parsed with
`WIT_REQUIRE_SEMICOLONS=0`. The "old" format for wasm-encoded WIT
packages can be emitted via `WIT_COMPONENT_ENCODING_V2=0`. Note that for
wasm-encoded WIT packages both the old and the new format can be decoded
irregardless of env vars.
* Refactor ip_name_lookup test
* Update ip-name-lookup::resolve-addresses
- Remove the non-essential parameters
- Lift the restriction against parsing IP addresses. Implementations would still have to parse IP addresses to decide whether or not to return an error
* Deduplicate to_canonical
* Refactor & move utilities. So that UDP can use them too.
* Minor tweak in error code while connecting
* Refactor: move portability workarounds out of the TCP implementation.
* Sync up UDP implementation and tests
based on the existing TCP implementation & tests.
* Make test less flaky in CI.
* Remove a writer task in wasi-http
This commit updates the wasi-http `BodyWriteStream` implementation to
avoid an extra task spawned to manage the outgoing body. Some local
profiling shows that a "hello world" component currently maxes out at
500rps locally which is suprisingly low.
Some investigation shows that one of the primary reasons for this is
concurrency-related issues. For example `taskset -c 1 wasmtime serve ...`
has a significantly better rps than without `taskset`. My hypothesis is
that the extra tasks for both reading the incoming body and writing the
outgoing body are introducing opportunities for the guest to block and
get work-stolen to another thread in the "happy path" where reads/writes
are immediately ready and don't need blocking.
For example writing to the body currently sends the chunk of bytes to a
task which then sends it to another task. This means that if a guest
writes some bytes and then waits for readiness it'll synchronize on
these bytes by doing a round-trip with another task. This opportunity
for blocking apparently lets something, be it Linux or the Tokio
scheduler, to work-steal the wasm task to another thread.
Once a wasm task is work-stolen to another thread it can significantly
increase the cost of VM management syscalls because any happy path the
kernel might have for a single thread is no longer applicable, meaning
that many costs start shooting through the roof.
The fix here in this PR is to remove a helper task when writing HTTP
bodies. This extra task didn't end up doing much and was largely just a
proxy into a channel, so the channel is now directly manipulated
instead. Namely the `flush` implementation in the old worker task didn't
actually do anything and so the only thing that needs to be managed in
the stream is sending chunks along through the channel. This proved
relatively easy and enables the "happy path" where if a body is written
and then flushed it's immediately "ready" so no opportunity for a
context switch happens and wasm stays planted on its thread.
The end result of this PR is that `wasmtime serve` goes from 500rps to
5000rps (10x increase) for the simple "hello world" example that I'm
working with.
* Don't buffer up a pending message
* PCC: refactor heap memory-type creation to allow for use of GVs in facts.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: WIP: add some facts for dynamic memories.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: WIP: add fact to loaded length on dynamic memories.
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* PCC: add facts for dynamic bounds-checks and extend PCC verification as needed.
This commit also adds an integration test to Wasmtime that checks all
dynamic and static memory cases on x86-64 and aarch64.
Test expected-output changes are due to a change from `iadd_imm(x, -k)`
to `isub(x, iconst(k))` in bounds-check code to facilitate facts on the
computation.
* turn off trace-log in cranelift-codegen.
* rustfmt
* Review feedback.
* Avoid unused-item warnings on s390x/riscv64 in pcc test.
* ignore PCC test in miri run.
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Rename header-error to validation-error
* Make outbound-request setters fallible
* Add additional tests for setters
* Revert changes to header-error, and use a plain `result` for validation
* Doc fix for fallible setters
* Import fussing
This commit is in the same spirit as #7461 and is aimed at removing an
extra task when interoperating between WebAssembly and Hyper. The
intention here is to ensure that when data is ready immediately a
context switch between Tokio tasks isn't necessary. I'm still not sure
why this context switch would tank performance so much.
The goal of this commit is to remove the Tokio task associated with
reading bodies. Currently a task is spawned with two channels coming
out: one for frames and one for trailers. Instead the body is now read
directly when data frames are read and trailers are managed similarly.
The interactions here are somewhat nontrivial due to the management of
resources and how the body needs to jump from the body stream object to
the trailers object, but it's overall not the worst thing in the world.
Locally on Linux this takes a "hello world" component which starts out
by reading its HTTP body and then responds with a string from 6.7k rps
locally via `wasmtime serve` to 16.8k rps. This restores the performance
of wasi-http to be in-line with the historical SDK that Fermyon offered,
for example, that's based on a custom HTTP interface. Note that these
performance numbers use #7461 as a baseline.
This commit has a downside, however, that if the incoming body is not
being subscribed to actively by the wasm then no activity will happen on
the host. This is a function of `poll` isn't being actively watched
unless `subscribe`'s result is being actively watched via a wasm call to
`wasi:io/poll`. Manual investigation of Hyper shows that this is
probably ok for now since bodies are primarily channel-based which seems
like the tasks doing the actual I/O are managed elsewhere. In that sense
this shouldn't cause any immediate impact, but Hyper's situation in the
future could change or my analysis could additionally be wrong. Given
the performance benefits here though I'd be tempted to push in favor of
this commit and handle "poll in the background" as necessary in the
future if needed.
* Print the error from the wasi-http incoming handler to stderr
Prior to this change, the error was never printed out, making it a challenge to figure out what caused the error
* use log::error instead of eprintln
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Update src/commands/serve.rs
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Enable copy-on-write memories for components
This commit fixes a mistake in component compilation where copy-on-write
and lazy initialization of function tables was accidentally not
performed. I believe this is an accidental regression from a previous
refactor, and this commit now ensures that the shared infrastructure
between components and core modules accounts for copy-on-write and lazy
table initialization.
* Add some tests
This change is a follow up to https://github.com/bytecodealliance/wasmtime/pull/7443;
after it landed I realized that Winch doesn't include spec tests for
local.get and loca.set.
Those tests uncovered a bug on the handling of the constant pool: given
Winch's singlepass nature, there's very little room know all the
constants ahead of time and to register them all at once at emission
time; instead they are emitted when they are needed by an instruction.
Even though Cranelift's machinery is capable of deuplicated constants in
the pool, `register_constant` assumes and checks that each constat
should only be pushed once. In Winch's case, since we emit as we go, we
need to carefully check if the constant is one was not emitted before,
and if that's the case, register it. Else we break the invariant that
each constant should only be registered once.
This allows clangd to not import the headers in the wasmtime directory,
as not all of them include their dependencies correctly. This means that
clangd can import a header directly and it breaks the build, as the
order `wasmtime.h` #includes is important.
See: https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
* Enable `wasmtime/wat` when the `wat` feature is enabled on the CLI
* Additionally always enable the `wasmtime/wat` feature for the
`wasmtime-wast` crate since the text format is used by some tests.
Closes#7460
* aarch64: Add support for `tbz` and `tbnz`
I noticed these instructions when glancing at some disassembly for other
code and also noticed that Cranelift didn't have support for them. This
adds a few new lowerings for conditional branches in the aarch64 backend
which are for testing a bit and branching if it's zero or not zero.
* Review comments on naming