* Implement `Display` for `wasmtime-types` things
We were pretty much implementing it already, and it is generally useful.
* Fix compiler errors (on non-nightly rust? shrug)
* A handful of little tweaks for GC types
Splitting this out from a larger PR so that it is easier to review and
everything can land quicker.
* Don't use `Cow`, just use references
This commit is aimed at fixing an accidental regression from #8861 where
the `wasmtime serve` subcommand stopped reporting some instances of
authority and scheme. After discussion in #8878 the new logic
implemented is:
* Creation of an incoming request now explicitly requires specifying a
scheme which is out-of-band information about how the surrounding
server received the request.
* The authority is stored separately within a request and is inferred
from the URI's authority or the `Host` header if present. If neither
are present then an error is returned.
Closes #8878
This commit fixes an issue on aarch64 Linux where static memory
initialization and custom page sizes interact badly.
When static memory initialization is enabled, the chunks used to
initialize a linear memory are built in host-page-size increments.
This is done to enable page-mapping via copy-on-write where
possible. With the custom page sizes proposal, however, for the first
time it's possible for a linear memory to be smaller than this chunk of
memory. This means that a virtual memory allocation of a single host
page can be made which is smaller than the initialization chunk.
This currently only happens on aarch64 Linux, where we coarsely
approximate the host page size as 64k while many hosts run with 4k
pages. A 64k initializer is therefore created while the host only
allocates 4k for a linear memory, so memory initialization can crash
when the 64k initializer is copied into the 4k memory.
This was not caught via fuzzing because fuzzing only runs on x86_64.
This was not caught via CI because on CI guard pages are disabled
entirely on QEMU and we got lucky in that a number of virtual memory
allocations were all placed next to each other meaning that this copy
was probably corrupting some other memory. Locally this was found by
running tests on `main` as-is on AArch64 Linux (by bjorn3).
This commit implements a few safeguards and a fix for this issue:
* On CI with QEMU modestly-size guard pages are now enabled to catch
this sooner in development should it happen again in the future.
* An `assert!` is added during memory initialization that the memory
copy is indeed valid. This causes the tests to fail as-is on `main`
even on x86_64.
* The issue itself is fixed by bailing out of static memory
initialization should the host page size exceed the wasm page size
which can now happen on aarch64 Linux with smaller page sizes.
* closes #2827: remove srclocs from bugpoint
* cranelift(fix): bugpoint only removes sourcelocs if function still crashes
* fix formatting
---------
Co-authored-by: zleyyij <zleyyij@users.noreply.github.com>
* Add wasi adapter provider template which is materialised in CI
* Add rustfmt component to adapter CI
* Draft an extra publish step for the adapter provider
* Check adapter provider in a separate step with adapter artifacts
* Use artifact downloads in the publish action as well
* Record results from adapter provider step as well
* Refactor to use composite actions
* Add missing shell property
* Fix spelling mistake
* Try using the env context
I'm not entirely sure what causes this but Wasmtime shouldn't panic with
invalid DWARF. In general (#5537) Wasmtime's support for DWARF needs to
be rewritten, but in the meantime let's play whack-a-mole with panics
and try to paper over issues.
Closes #8884
Closes #8904
* Wasmtime: Pop GC LIFO roots even when there is no GC heap
We can create and root `i31ref`s without ever allocating a GC heap for the
store, so we can't guard popping LIFO roots on the presence of a GC heap or else
we risk unbounded growth of the LIFO root set.
* Fix build with the gc feature disabled
* Update the wasm-tools family of crates
This notably brings in a limitation where component model flags types
must have 32 or fewer flags in accordance with the transition plan of
https://github.com/WebAssembly/component-model/issues/370. A feature
flag is added to go back to the previous behavior to avoid breaking
anyone too much.
This additionally brings in a fix for a panic when validating invalid
modules with tail calls.
* Add vet entries
* Use Ubuntu-16.04 for x86_64-linux binary-compatible-builds
* Revert "Use Ubuntu-16.04 for x86_64-linux binary-compatible-builds"
This reverts commit 5625941dee.
* Use AlmaLinux 8
prtest:full
Fixes: https://github.com/bytecodealliance/wasmtime/issues/8848
Similar to all the control instructions, any state must be explicitly
saved before emitting the code for `br_if`.
This commit ensures that live locals and registers are explicitly saved
before emitting the code for `br_if`. Prior to this commit, live
locals and registers were not saved every time causing incorrect
behavior in cases where the calculation of the conditional argument
didn't trigger a spill.
This change introduces the explicit spill after calculating the branch
condition argument to minimize memory traffic in case the conditional is
already in a register.
This commit fixes writes to stdout/stderr which don't end in a newline
so that they are no longer split across lines with a prefix on each line. Instead
internally a flag is used to track whether a prefix is required at the
beginning of each chunk.
* wasi-nn: use resources
Recent discussion in the wasi-nn proposal (see [wasi-nn#59], e.g.) has
concluded that the right approach for representing wasi-nn "things"
(tensors, graph, etc.) is with a component model _resource_. This
sweeping change brings Wasmtime's implementation in line with that
decision.
Initially I had structured this PR to remove all of the WITX-based
implementation (#8530). But, after consulting in a Zulip [thread] on
what other WASI proposals aim to do, this PR pivoted to support _both_
the WITX-based and WIT-based ABIs (e.g., preview1 era versus preview2,
component model era). What is clear is that the WITX-based specification
will remain "frozen in time" while the WIT-based implementation moves
forward.
What that means for this PR is a "split world" paradigm. In many places,
we have to distinguish between the `wit` and `witx` versions of the same
thing. This change isn't the end state yet: it's a big step forward
towards bringing Wasmtime back in line with the WIT spec but, despite my
best efforts, doesn't fully fix all the TODOs left behind over several
years of development. I have, however, taken the liberty to refactor and
fix various parts as I came across them (e.g., the ONNX backend). I plan
to continue working on this in future PRs to figure out a good error
paradigm (the current one is too wordy) and device residence.
[wasi-nn#59]: https://github.com/WebAssembly/wasi-nn/pull/59
[thread]: https://bytecodealliance.zulipchat.com/#narrow/stream/219900-wasi/topic/wasi-nn's.20preview1.20vs.20preview2.20timeline
prtest:full
* vet: audit `ort`-related crate updates
* Simplify `WasiNnView`
With @alexcrichton's help, this change removes the `trait WasiNnView`
and `struct WasiNnImpl` wrapping that the WIT-based implementation used
for accessing the host context. Instead, `WasiNnView` is now a `struct`
containing the mutable references it needs to make things work. This
unwraps one complex layer of abstraction, though it does have the
downside that it complicates CLI code to split borrows of `Host`.
* Temporarily disable WIT check
* Refactor errors to use `trappable_error_type`
This change simplifies the return types of the host implementations of
the WIT-based wasi-nn. There is more work to be done with errors, e.g.,
to catch up with the upstream decision to return errors as resources.
But this is better than the previous mess.
* Cranelift: Take user stack maps through lowering and emission
Previously, user stack maps were inserted by the frontend and preserved in the
mid-end. This commit takes them from the mid-end CLIF into the backend vcode,
and then from that vcode into the finalized mach buffer during emission.
During lowering, we compile the `UserStackMapEntry`s into packed
`UserStackMap`s. This is the appropriate moment in time to do that coalescing,
packing, and compiling because the stack map entries are immutable from this
point on.
Additionally, we include user stack maps in the `Debug` and disassembly
implementations for vcode, just after their associated safepoint
instructions. This allows us to see the stack maps we are generating when
debugging, as well as write filetests that check we are generating the expected
stack maps for the correct instructions.
Co-Authored-By: Trevor Elliott <telliott@fastly.com>
* uncomment debug assert that was commented out for debugging
* Address review feedback
* remove new method that was actually never needed
---------
Co-authored-by: Trevor Elliott <telliott@fastly.com>
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.
Moreover, in #8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.
In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in #8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.
Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
* riscv64: Increase max inst size
* riscv64: Emit islands in return call sequence
* riscv64: Update worst case size tests
Having duplicate registers was preventing
some moves from being generated
* Improve some documentation of the `wasmtime-wasi` crate
Show a few examples of using `with` to point to upstream `wasmtime-wasi`
for bindings.
* Refactor and document the `wasmtime-wasi-http` more
This commit primarily adds a complete example of using
`wasmtime-wasi-http` to the documentation. Along the way I've done a
number of other refactorings too:
* `bindgen!`-generated `*Pre` structures now implement `Clone`.
* `bindgen!`-generated `*Pre` structures now have an `engine` method.
* `bindgen!`-generated `*Pre` structures now have an `instance_pre` method.
* The structure of `wasmtime-wasi-http` now matches `wasmtime-wasi`,
notably:
* The `proxy` module is removed
* `wasmtime_wasi_http::add_to_linker_{a,}sync` is the top level
add-to-linker function.
* The `bindings` module now contains `Proxy` and `ProxyPre` along with
a `sync` submodule.
* The `bindings` module contains all bindings for `wasi:http` things.
* The `add_only_*` methods are un-hidden and documented.
* Code processing `req` has been simplified by avoiding
decomposing-and-reconstructing a request.
* The `new_incoming_request` method is now generic to avoid callers
having to do boxing/mapping themselves.
* Update expanded macro expectations
* Remove unused import
This reduces the size of wasi_snapshot_preview1.command.wasm from 75029
bytes to 52212 bytes for a total win of 22817 bytes. This is done by
deduplicating most of the trap messages and the code for printing those
trap messages. Also got some small wins by making the assertion message
shorter.
This reduces the size of wasi_snapshot_preview1.command.wasm from 79625
bytes to 75029 bytes for a total win of 4596 bytes. Of this reduction
enabling LTO is responsible for 3103 bytes, while enabling bulk-memory
is responsible for 1493 bytes
* upgrade to wasm-tools 0.211.1
* code review
* cargo vet: auto imports
* fuzzing: fix wasm-smith changes
* fuzzing: changes for HeapType
* Configure features on `Parser` when parsing
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
The identifier for the `cold` calling convention overlaps with the
`cold` keyword for basic blocks so handle another kind of token when
parsing signatures.
The epoch interruption implementation caches the current deadline in a
register, and avoids reloading that cache until the cached deadline has
passed.
However, the first epoch check happens immediately after the cache has
been populated on function entry, so there's never a reason to reload
the cache at that point. It only needs to be reloaded in loops. So this
commit eliminates the double-check on function entry.
When Cranelift optimizations are enabled, the alias analysis pass
correctly detected that this load was redundant, and the egraph pass
optimized away the `icmp` as well. However, since we don't do
conditional constant propagation, the branch couldn't be optimized away.
On x86 this lowered to a redundant `cmp`/`jae` pair of instructions in
every function entry, which this commit eliminates.
To keep track of what code we're generating for epoch interruptions,
I've also added disas tests with a trivial infinite loop.
This was accidentally broken in #8692. It turns out bitcasts from i128 to i128 are legal; that PR inadvertently removed support for that case.
This is now added to a runtest to ensure it works on all platforms.
This commit raises the default setting of `max_memory_size` in the
pooling allocator from 10M to 4G. This won't actually impact the virtual
memory reserved by the pooling allocator, because we already reserve 6G
of virtual memory for each linear memory; this change instead allows
access to all of it by default. This matches the default behavior of Wasmtime for
the non-pooling allocator which is to not artificially limit memory by
default.
The main impact of this setting is that the memory-protection-keys
feature, which is disabled by default, will have no effect by default
unless `max_memory_size` is also configured to something smaller than
4G. The documentation has been updated to this effect.
Closes #8846
I noticed that the wasm_memory64 flag was left out of Config's debug impl,
so rather than add it, I decided to use the `bitflags::Flags::FLAGS`
const to iterate the complete set of flags.
The downside of this change is that it will print flags which do not
have a setter in Config, e.g. `wasm_component_model_nested_names`.
An alternative to this change is, rather than expanding out the single
`features: WasmFeatures` member into many different debug_struct fields,
the debug impl of WasmFeatures is used.
Here is a sample debug of Config with this change:
Config { debug_info: None, wasm_mutable_global: true, wasm_saturating_float_to_int: true, wasm_sign_extension: true, wasm_reference_types: true, wasm_multi_value: true, wasm_bulk_memory: true, wasm_simd: true, wasm_relaxed_simd: false, wasm_threads: false, wasm_shared_everything_threads: false, wasm_tail_call: false, wasm_floats: true, wasm_multi_memory: false, wasm_exceptions: false, wasm_memory64: false, wasm_extended_const: false, wasm_component_model: false, wasm_function_references: false, wasm_memory_control: false, wasm_gc: false, wasm_custom_page_sizes: false, wasm_component_model_values: false, wasm_component_model_nested_names: false, parallel_compilation: true, compiler_config: CompilerConfig { strategy: Some(Cranelift), target: None, settings: {"opt_level": "speed", "enable_verifier": "true"}, flags: {}, cache_store: None, clif_dir: None, wmemcheck: false }, parse_wasm_debuginfo: false }
This commit removes the `simm32` extractor from lowerings as it's not as
useful as it was when it was first introduced. Nowadays an `Imm64` needs
to be interpreted with the type known as well to understand whether bits
being masked off is significant or not. The old `simm32` extractor only
took `Imm64` meaning that it was unable to do this and wouldn't match
negative numbers. This is because the high 32 bits of `Imm64` were
always zero and `simm32` would take the `i64` value from `Imm64` and try
to convert it to an `i32`.
This commit replaces `simm32`, and uses of it, with a new extractor
`i32_from_iconst`. This matches the preexisting `i64_from_iconst` and is
able to take the type of the value into account and produce a correctly
sign-extended value.
cc #8706
* Add tests for patterns I'm about to optimize
* x64: Optimize vector compare-and-branch
This commit implements lowering optimizations for the `vall_true` and
`vany_true` CLIF instructions when combined with `brif`. This is in the
same manner as `icmp` and `fcmp` combined with `brif` where the result
of the comparison is never materialized into a general purpose register
which helps lower register pressure and remove some instructions.
* x64: Optimize `vconst` with an all-ones pattern
This has a single-instruction lowering which doesn't load from memory so
it's probably cheaper than loading all-ones from memory.
* cranelift-entity: Implement `EntitySet` in terms of `cranelift_bitset::CompoundBitSet`
* Shrink the size of `CompoundBitSet` so we don't perturb vmctx size test expectations
* Update vmctx size test expectations anyways because we shrunk "too much"
* Move `cranelift-bitset` to the front of `CRATES_TO_PUBLISH`