* it's tracing-subscriber now, not pretty_env_logger
* point to tracing-subscriber's docs on filter directives
* correct invocation of wasmtime in the example
* Add release binaries for x86_64-musl
This was requested in bytecodealliance/wasmtime-py#237 and shouldn't
cost us too much in terms of CI resources and maintenance overhead.
* Fix combining rustflags
prtest:full
* Update wit-bindgen
This commit updates wit-bindgen to 0.25 and applies some "extra
trickery" to work around the now-default providing of the realloc
symbol.
* Add audits
We intend to use this when computing liveness of GC references in
`cranelift-frontend` to manually construct safepoints and ultimately remove
`r{32,64}` reference types from CLIF, `cranelift-codegen`, and `regalloc2`.
Co-authored-by: Trevor Elliott <telliott@fastly.com>
* cranelift: Always consider sret arguments used
In #8438 we stopped emitting register bindings for unused arguments,
based on the use-counts from `compute_use_states`. However, that doesn't
count the use of the struct-return argument that's automatically added
after lowering when the `rets` instruction is generated in the epilogue.
As a result, using a struct-return argument caused register allocation
to panic due to the VReg not being defined anywhere.
This commit adds a use to the struct-return argument so that it's always
available in the epilogue.
Fixes #8659
* Review comments
* Add Cranelift and Winch features to the C API
This commit adds `cranelift` and `winch` features to the C API and
enables them by default. This means that the C API can now be built
without compiler support to only support loading precompiled binaries.
Closes #7349
* Fix doc link
* More doc fixes
* Add more doc input dirs
This commit adds the `winch` feature to the default feature set of the
`wasmtime-cli` package meaning that the `wasmtime` CLI will, by default,
have support for the Winch compiler.
* Refactor installation of C API and features supported
This commit overhauls and refactors the management of the building of
the C API. Instead of this being script-based it's now done entirely
through CMake and CMake is the primary focus for building the C API. For
example building the C API release artifacts is now done through CMake
instead of through a shell script's `cargo build` and manually moving
artifacts.
The benefits that this brings are:
* The C API now properly hides symbols in its header files that weren't
enabled at build time. This is done through a new build-time generated
`conf.h` templated on a `conf.h.in` file in the source tree.
* The C API's project now supports enabling/disabling Cargo features to
have finer-grained support over what's included (plus auto-management
of the header file).
* Building the C API and managing it is now exclusively done through
CMake. For example invoking `doxygen` now lives in CMake, installation
lives there, etc.
The `CMakeLists.txt` file for the C API is overhauled in the process of
doing this. The build now primarily matches on the Rust target being
built rather than the host target. Additionally installation will now
install both the static and shared libraries instead of just one.
Additionally during this refactoring various bits and pieces of
Android-specific code were all removed. Management of the C toolchain
feels best left in scope of the caller (e.g. configuring `CC_*` env vars
and such) rather than here.
prtest:full
* Don't use `option` for optional strings
* Invert release build check
Also adjust some indentation
* Fix more indentation
* Remove no-longer-used variable
* Reduce duplication in feature macro
This commit enables the `Func::new` constructor and related other
functions when `cranelift` and `winch` features are both disabled,
meaning this is now available in compiler-less builds. This builds on
the support of #8629.
* Update the frame layout comment
* Remove more references to nominal SP
* Remove the nominal_sp_offset from backend emit states
* Continue removing references to the nominal sp
* Remove nominal-sp from the aarch64 backend
* Remove nominal-sp from the s390x backend
* Remove nominal-sp from the riscv64 backend
* Remove old comment
* Remove the native ABI calling convention from Wasmtime
This commit proposes removing the "native abi" calling convention used
in Wasmtime. For background this ABI dates back to the origins of
Wasmtime. Originally Wasmtime only had `Func::call` and eventually I
added `TypedFunc` with `TypedFunc::call` and `Func::wrap` for a faster
path. At the time given the state of trampolines it was easiest to call
WebAssembly code directly without any trampolines using the native ABI
that wasm used at the time. This is the original source of the native
ABI and it's persisted over time under the assumption that it's faster
than the array ABI due to keeping arguments in registers rather than
spilling them to the stack.
Over time, however, this design decision of using the native ABI has not
aged well. Trampolines have changed quite a lot in the meantime and it's
no longer possible for the host to call wasm without a trampoline, for
example. Compilations nowadays maintain both native and array
trampolines for wasm functions in addition to host functions. There's a
large split between `Func::new` and `Func::wrap`. Overall, there's quite
a lot of weight that we're pulling for the design decision of using the
native ABI.
Functionally this hasn't ever really been the end of the world.
Trampolines aren't a known issue in terms of performance or code size.
There's no known faster way to invoke WebAssembly from the host (or
vice-versa). One major downside of this design, however, is that
`Func::new` requires Cranelift as a backend to exist. This is due to the
fact that it needs to synthesize various entries in the matrix of ABIs
we have that aren't available at any other time. While this is itself
not the worst of issues it means that the C API cannot be built without
a compiler because the C API does not have access to `Func::wrap`.
Overall I'd like to reevaluate given where Wasmtime is today whether it
makes sense to keep the native ABI trampolines. Sure they're supposed to
be fast, but are they really that much faster than the array-call ABI as
an alternative? This commit is intended to measure this.
This commit removes the native ABI calling convention entirely. For
example `VMFuncRef` is now one pointer smaller. All of `TypedFunc` now
uses `*mut ValRaw` for loads/stores rather than dealing with ABI
business. The benchmarks with this PR are:
* `sync/no-hook/core - host-to-wasm - typed - nop` - 5% faster
* `sync/no-hook/core - host-to-wasm - typed - nop-params-and-results` - 10% slower
* `sync/no-hook/core - wasm-to-host - typed - nop` - no change
* `sync/no-hook/core - wasm-to-host - typed - nop-params-and-results` - 7% faster
These numbers are a bit surprising as I would have suspected no change
in both "nop" benchmarks as well as both being slower in the
params-and-results benchmarks. Regardless it is apparent that this is
not a major change in terms of performance given Wasmtime's current
state. In general my hunch is that there are more expensive sources of
overhead than reads/writes from the stack when dealing with wasm values
(e.g. trap handling, store management, etc).
Overall this commit feels like a large simplification of what we
currently do in `TypedFunc`:
* The number of ABIs that Wasmtime deals with is reduced by one. ABIs
are pretty much always tricky and having fewer moving parts should
help improve the understandability of the system.
* All of the `WasmTy` trait methods and `TypedFunc` infrastructure are
simplified. Traits now work with simple `load`/`store` methods rather
than various other flavors of conversion.
* The multi-return-value handling of the native ABI is all gone now
which gave rise to significant complexity within Wasmtime's Cranelift
translation layer in addition to the `TypedFunc` backing traits.
* This aligns components and core wasm where components always use the
array ABI and now core wasm additionally will always use the array ABI
when communicating with the host.
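As a rough illustration of the `load`/`store` style mentioned in the list above, here is a minimal sketch. `RawSlot`, `RawTy`, and `call_add` are hypothetical names made up for this example and are not Wasmtime's actual `ValRaw` or `WasmTy` definitions; the sketch assumes an array-style call where arguments and results share one slice of raw slots.

```rust
/// Hypothetical 64-bit raw slot standing in for a raw wasm value.
/// (Assumption for illustration; not Wasmtime's `ValRaw`.)
#[derive(Clone, Copy, Default)]
struct RawSlot(u64);

/// Hypothetical trait: each value type only knows how to write itself
/// into a slot and read itself back out.
trait RawTy: Sized {
    fn store(self, slot: &mut RawSlot);
    fn load(slot: &RawSlot) -> Self;
}

impl RawTy for i32 {
    fn store(self, slot: &mut RawSlot) {
        // Keep only the low 32 bits; the upper bits are zeroed.
        slot.0 = self as u32 as u64;
    }
    fn load(slot: &RawSlot) -> Self {
        slot.0 as u32 as i32
    }
}

impl RawTy for i64 {
    fn store(self, slot: &mut RawSlot) {
        slot.0 = self as u64;
    }
    fn load(slot: &RawSlot) -> Self {
        slot.0 as i64
    }
}

impl RawTy for f64 {
    fn store(self, slot: &mut RawSlot) {
        slot.0 = self.to_bits();
    }
    fn load(slot: &RawSlot) -> Self {
        f64::from_bits(slot.0)
    }
}

/// A stand-in callee using an array-style convention: it reads two `i32`
/// arguments from the slots and writes the result back into slot 0.
fn call_add(slots: &mut [RawSlot]) {
    let a = i32::load(&slots[0]);
    let b = i32::load(&slots[1]);
    a.wrapping_add(b).store(&mut slots[0]);
}
```

The point of the sketch is only that caller and callee agree on a single in-memory layout, so no per-signature register convention is needed on the host side.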
I'll note that this still leaves a major ABI "complexity": native
functions do not have a wasm ABI function pointer until
they're "attached" to a `Store` with a `Module`. That's required to
avoid needing Cranelift for creating host functions and that property is
still true today. This is a bit simpler to understand though now that
`Func::new` and `Func::wrap` are treated uniformly rather than one being
special-cased.
* Fix miri unsafety
prtest:full
* Use bytes for maximum size of linear memory with pooling
This commit changes configuration of the pooling allocator to use a
byte-based unit rather than a page based unit. The previous
`PoolingAllocatorConfig::memory_pages` configuration option configures
the maximum size that a linear memory may grow to at runtime. This is an
important factor in calculation of stripes for MPK and is also a
coarse-grained knob apart from `StoreLimiter` to limit memory
consumption. This configuration option has been renamed to
`max_memory_size` and documented that it's in terms of bytes rather than
pages as before.
Additionally, the documented constraint that `max_memory_size` must be
smaller than `static_memory_bound` is now enforced as a minor clean-up as
part of this PR.
* Review comments
* Fix benchmark build
* cranelift: expand umbrella crate with more crates
* Break the dependency cycle between cranelift-jit and cranelift
---------
Co-authored-by: Trevor Elliott <telliott@fastly.com>
* gen_nominal_sp_adj now returns a smallvec
* Remove the virtual sp offset from the x64 backend
* Remove the virtual sp offset from the aarch64 backend
* Remove the virtual sp offset from the riscv64 backend
* Remove the virtual sp offset from the s390x backend
* Remove gen_nominal_sp_adj, and argument area management functions
* Remove get_virtual_sp_offset_from_state
* Code review suggestions
* Use WASM function names in compiled objects
Instead of generating symbol names in the format
"wasm[$MODULE_ID]::function[$FUNCTION_INDEX]", generate (if possible)
something more readable, such as "wasm[$MODULE_ID]::$FUNCTION_NAME".
This helps when debugging or profiling the generated code.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Ensure symbol names are cleaned up and have function indexes
Filter symbol names to include only characters that are usually used
for function names, and that might be produced by name mangling.
Replace everything else with a question mark (and all repeated question
marks by a single one), and then truncate to a length of 96 characters.
This should be enough to not only avoid passing user-controlled strings
to tools such as "perf" and "objdump", but also make it easier to
disambiguate symbols that might have the same name but different
indices.
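The cleaning steps described above can be sketched as follows. This is a minimal illustration, not the code from this change: `clean_symbol` is a hypothetical helper, and the allowed character set (ASCII alphanumerics plus `_`) is an assumption.

```rust
/// Hypothetical helper: sanitize a user-controlled function name.
///
/// - characters outside the allowed set become `?`
/// - runs of replaced characters collapse into a single `?`
/// - the result is truncated to `max_len` characters
///
/// The allowed set used here is an assumption for illustration.
fn clean_symbol(name: &str, max_len: usize) -> String {
    let mut out = String::new();
    let mut last_was_placeholder = false;
    for c in name.chars() {
        // Everything pushed below is ASCII, so the byte length of `out`
        // equals its character count.
        if out.len() >= max_len {
            break;
        }
        if c.is_ascii_alphanumeric() || c == '_' {
            out.push(c);
            last_was_placeholder = false;
        } else if !last_was_placeholder {
            out.push('?');
            last_was_placeholder = true;
        }
    }
    out
}
```

Calling `clean_symbol(name, 96)` would apply the 96-character limit mentioned above.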
* Make symbol cleaning slightly more efficient
* Update symbol names to be closer to what tests expect
* Ensure only alphanumeric ASCII characters are allowed in a symbol name
* Ensure sliced symbol name is within its bounds
* Update test expectations after adding function name to symbol name
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Cranelift: add alignment parameter to stack slots.
Fixes #6716.
Currently, stack slots are aligned only to a machine-word
boundary. This is insufficient for some use-cases: for example, storing
SIMD data or structs that require a larger alignment.
This PR adds a parameter to the `StackSlotData` to specify alignment,
and the associated logic to the CLIF parser and printer. It updates the
shared ABI code to compute the stackslot layout taking the alignment
into account. In order to ensure the alignment is always a power of two,
it is stored as a shift amount (log2 of actual alignment) in the IR.
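To illustrate why a shift amount guarantees a power-of-two alignment, here is a small layout sketch. `SlotSketch` and `layout` are hypothetical stand-ins, not Cranelift's `StackSlotData` or its ABI code, and the sketch assumes slots are laid out at increasing offsets with no overflow handling.

```rust
/// Hypothetical stand-in for a stack slot description.
/// (Not Cranelift's actual `StackSlotData`.)
#[derive(Clone, Copy, Debug)]
struct SlotSketch {
    /// Size of the slot in bytes.
    size: u32,
    /// log2 of the required alignment in bytes. Because only the exponent
    /// is stored, every representable alignment is a power of two.
    align_shift: u8,
}

impl SlotSketch {
    /// The alignment in bytes.
    fn align(&self) -> u32 {
        1 << self.align_shift
    }
}

/// Lay out slots in order, returning each slot's offset and the total size.
fn layout(slots: &[SlotSketch]) -> (Vec<u32>, u32) {
    let mut offsets = Vec::with_capacity(slots.len());
    let mut next = 0u32;
    for slot in slots {
        // Round `next` up to the slot's alignment. The mask trick is only
        // valid because the alignment is a power of two.
        let mask = slot.align() - 1;
        let start = (next + mask) & !mask;
        offsets.push(start);
        next = start + slot.size;
    }
    (offsets, next)
}
```

For example, a 4-byte slot followed by a 16-byte slot with `align_shift: 4` places the second slot at offset 16 rather than 4.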
* Apply suggestions from code review
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Update filetest.
* Update alignment to ValRaw vector.
* Fix printer test.
* cargo-fmt from suggestion update.
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
This fixes an accidental regression from #8616 where page alignment was
implicitly happening due to how configuration was processed but it
wasn't re-added in the refactoring.
The egraph pass was already doing this, when it ran, and it never adds
any aliases. So do it slightly earlier and unconditionally, and avoid
needing to resolve any aliases during lowering.
Consumption of non-allocatable operands was added in #5253 and #5132,
and removed in #8524 and following PRs. Now they are not only ignored by
regalloc2, but the placeholders that it leaves in the allocation results
are also ignored by Cranelift. So let's stop adding them to the operands
list entirely.
This commit builds on the support from #8448 to remove all blanket impls
from the WASI crates and instead replace them with concrete impls. This
is slightly functionally different from before where impls are now on
trait objects meaning dynamic dispatch is involved where previously
static dispatch was used. That being said the perf hit here is expected
to be negligible-to-nonexistent since the implementations are large
enough that the dynamic dispatch won't be the hot path.
The motivations for this commit are:
* Removes the need for an odd `skip_mut_forwarding_impls` option - but
this'll be left for a bit in case others need it.
* Improves incremental compile time of these crates where the crates
themselves now contain all object code for all of WASI instead of
forcing the final consumer to codegen everything (although there's
still a significant amount monomorphized).
* Improves future compatibility with refactorings of
bindgen-generated-traits and such because blanket impls are pretty
hard to work around where concrete impls are easier to reason about
(and document).
The latter is what Wasmtime uses today but it pulls in parsers for all
object formats supported by `object`. In the context of Wasmtime,
however, we know that all objects produced are 64-bit ELF files so
there's no need to pull in, for example, a COFF parser as that'll always
return an error anyway. This commit switches uses of the `object::File`
convenience to `ElfFile64` instead.
* Change `Tunables::static_memory_bound` to bytes
This commit changes the wasm-page-sized `static_memory_bound` field to
a byte-defined unit rather than a page-defined unit. To
accomplish this the field is renamed to `static_memory_reservation` and
all references are updated. This builds on the support from #8608 to
remove another page-based variable from the internals of Wasmtime.
* Fix tests
* Test that wasi file streams can handle read(0)
* Zero-sized reads don't fail for file streams
* Accidentally removed the `read(0)` when refactoring the test
* Allow env/args/preopens to exceed 64k in size
This commit fixes an issue with the wasip1 adapter published with
Wasmtime which currently limits the size of environment variables,
arguments, and preopens to not exceed 64k. This bug comes from the fact
that we more-or-less forgot to account for this when designing the
adapter initially. The adapter allocates a single WebAssembly page for
itself but does not have a means of allocating much more than that. It's
technically possible to continue to call `memory.grow` or possibly
`cabi_realloc` from the original main module but it's pretty awkward.
The solution in this commit is to take an alternative approach to how
these properties are all processed. Previously arguments/env
vars/preopens were all allocated once within the adapter and stored
statically. This means that after startup they're copied from where they
reside in-memory, but the downside is that we have to have enough memory
to hold everything. This commit instead tries to "stream" the items so
they're never held entirely within the adapter itself.
The general idea in this commit is to use the "align" parameter to
`cabi_import_realloc` to figure out what's being allocated and route the
allocation to the destination. For example an allocation with alignment
1 is a string and can go directly into a user-supplied pointer where an
allocation with alignment 4 is a pointer-based allocation which must be
stored within the adapter, but only temporarily.
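The routing idea can be sketched like this. `Destination` and `route` are hypothetical names for illustration only; the real adapter's `cabi_import_realloc` carries more state, and treating any other alignment as unsupported is an assumption of this sketch.

```rust
/// Hypothetical destinations for an allocation made while lowering
/// arguments, environment variables, or preopens.
#[derive(Debug, PartialEq)]
enum Destination {
    /// String bytes: written straight into the caller-supplied buffer.
    UserBuffer,
    /// Pointer-sized list data: kept in adapter memory, but only
    /// temporarily.
    AdapterScratch,
}

/// Decide where an allocation goes based only on its alignment.
/// Returns `None` for alignments this sketch does not expect.
fn route(align: usize) -> Option<Destination> {
    match align {
        1 => Some(Destination::UserBuffer),
        4 => Some(Destination::AdapterScratch),
        _ => None,
    }
}
```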
With this redesign it's now possible for the sum total of
args/envs/preopens to exceed 64k. The new limitation is that the
max-length string plus size of the max length of these arrays must be
less than 64k. This should be a more reasonable limit than before where
any one individual argument/env var is unlikely to exceed 64k (or get
close).
Closes #8556
* Comment descriptors are closed
* Update crates/wasi-preview1-component-adapter/src/descriptors.rs
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* Turn down process limits for macOS
Looks like a 1M env block is a bit too large.
---------
Co-authored-by: Trevor Elliott <awesomelyawesome@gmail.com>
* wasmtime: Make table lazy-init configurable
Lazy initialization of tables has trade-offs that we haven't explored in
a while. Making it configurable makes it easier to test the effects of
these trade-offs on a variety of WebAssembly programs, and allows
embedders to decide whether the trade-offs are worth-while for their use
cases.
* Review comments
This commit aims to address #8607 by dynamically determining whether the
pooling allocator should be used rather than unconditionally using it.
It looks like some systems don't have enough virtual memory to support
the default configuration settings so this should help `wasmtime serve`
work on those systems.
Closes #8607
The toml file specifies version `0.4.1` instead of `0.4.2`.
Using version `0.4.1` produces a compile error:
```
error[E0432]: unresolved import `mach2::ndr`
  --> external/crate_index__wasmtime-runtime-20.0.2/src/sys/unix/machports.rs:44:12
   |
44 | use mach2::ndr::*;
   |            ^^^ could not find `ndr` in `mach2`
```
That's because `ndr` was added in version `0.4.2`.
Note that the lock file specifies version `0.4.2` which explains
why this error doesn't happen normally.
This introduces a `DecommitQueue` for batching decommits together in the pooling
allocator:
* Deallocating a memory/table/stack enqueues their associated regions of memory
for decommit; it no longer immediately returns the associated slot to the
pool's free list. If the queue's length has reached the configured batch size,
then we flush the queue by running all the decommits, and finally returning
the memory/table/stack slots to their respective pools and free lists.
* Additionally, if allocating a new memory/table/stack fails because the free
list is empty (aka we've reached the max concurrently-allocated limit for this
entity) then we fall back to a slow path before propagating the error. This
slow path flushes the decommit queue and then retries allocation, hoping that
the queue flush reclaimed slots and made them available for this fallback
allocation attempt. This involved defining a new `PoolConcurrencyLimitError`
to match on, which is also exposed in the public embedder API.
It is also worth noting that we *always* use this new decommit queue now. To
keep the existing behavior, where e.g. a memory's decommits happen immediately
on deallocation, you can use a batch size of one. This effectively disables
queueing, forcing all decommits to be flushed immediately.
The default decommit batch size is one.
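A minimal model of the queueing behavior described above, assuming a single pool of integer slots. `PoolSketch`, `LimitReached`, and the `flushes` counter are hypothetical names for this sketch; the real `DecommitQueue` tracks memory regions and performs actual decommits.

```rust
/// Hypothetical error for "no free slots", standing in for the pool's
/// concurrency-limit error.
#[derive(Debug, PartialEq)]
struct LimitReached;

/// Hypothetical pool of slots with a batched decommit queue.
struct PoolSketch {
    free_slots: Vec<usize>,
    /// Slots whose memory still needs to be decommitted.
    queue: Vec<usize>,
    batch_size: usize,
    /// Number of non-empty flushes, tracked only for demonstration.
    flushes: usize,
}

impl PoolSketch {
    fn new(slots: usize, batch_size: usize) -> Self {
        PoolSketch {
            free_slots: (0..slots).collect(),
            queue: Vec::new(),
            batch_size: batch_size.max(1),
            flushes: 0,
        }
    }

    /// Run all queued decommits, then return the slots to the free list.
    fn flush(&mut self) {
        if self.queue.is_empty() {
            return;
        }
        // A real implementation would decommit the memory regions here.
        self.free_slots.append(&mut self.queue);
        self.flushes += 1;
    }

    /// Enqueue the slot; flush once the batch size is reached.
    fn deallocate(&mut self, slot: usize) {
        self.queue.push(slot);
        if self.queue.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Fast path: pop a free slot. Slow path: flush the queue and retry
    /// before reporting that the limit was reached.
    fn allocate(&mut self) -> Result<usize, LimitReached> {
        if let Some(slot) = self.free_slots.pop() {
            return Ok(slot);
        }
        self.flush();
        self.free_slots.pop().ok_or(LimitReached)
    }
}
```

With `batch_size` set to one, every `deallocate` flushes immediately, which matches the "effectively disables queueing" behavior described above.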
This commit, with batch size of one, consistently gives me an increase on
`wasmtime serve`'s requests-per-second versus its parent commit, as measured by
`benches/wasmtime-serve-rps.sh`. I get ~39K RPS on this commit compared to ~35K
RPS on the parent commit. This is quite puzzling to me. I was expecting no
change, and hoping there wouldn't be a regression. I was not expecting a speed
up. I cannot explain this result at this time.
prtest:full
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Change `MemoryStyle::Static` to store bytes, not pages
This commit is inspired by me looking at some configuration in the
pooling allocator and noticing that configuration of wasm pages vs bytes
of linear memory is somewhat inconsistent in `Config`. In the end I'd
like to remove or update the `memory_pages` configuration in the pooling
allocator to being bytes of linear memory instead to be more consistent
with `Config` (and additionally anticipate the custom-page-sizes
wasm proposal where terms-of-pages will become ambiguous). The first
step in this change is to update one of the lowest layered usages of
pages, the `MemoryStyle::Static` configuration.
Note that this is not a trivial conversion because the purpose of
carrying around pages instead of bytes is that bytes may overflow where
overflow-with-pages typically happens during validation. This means that
extra care is taken to handle errors related to overflow to ensure that
everything is still reported at the same time.
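The overflow concern can be illustrated with a checked conversion. This is a sketch only: `pages_to_bytes` and `fits_static` are hypothetical helpers, not functions from this change, and the 64 KiB page size is the standard WebAssembly page size.

```rust
/// Standard WebAssembly page size in bytes (64 KiB).
const WASM_PAGE_SIZE: u64 = 65536;

/// Convert a page count to bytes, returning `None` if the byte count
/// does not fit in a `u64`. (Hypothetical helper for illustration.)
fn pages_to_bytes(pages: u64) -> Option<u64> {
    pages.checked_mul(WASM_PAGE_SIZE)
}

/// Hypothetical check: does a memory of `pages` pages fit within a
/// byte-denominated static bound? An overflowing size never fits.
fn fits_static(pages: u64, static_bound_bytes: u64) -> bool {
    match pages_to_bytes(pages) {
        Some(bytes) => bytes <= static_bound_bytes,
        None => false,
    }
}
```

Treating overflow as "does not fit" keeps the error at the point of comparison rather than letting a wrapped byte count pass silently.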
* Update crates/wasmtime/src/runtime/vm/instance/allocator/pooling/memory_pool.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Fix tests
* Really fix tests
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This is the final type system change for Wasm GC: the ability to explicitly
declare supertypes and finality. A final type may not be a supertype of another
type. A concrete heap type matches another concrete heap type if its concrete
type is a subtype (potentially transitively) of the other heap type's concrete
type.
Next, I'll begin support for allocating GC structs and arrays at runtime.
I've also implemented `O(1)` subtype checking in the types registry:
In a type system with single inheritance, the subtyping relationships between
all types form a set of trees. The root of each tree is a type that has no
supertype; each node's immediate children are the types that directly subtype
that node.
For example, consider these types:
    class Base {}
    class A subtypes Base {}
    class B subtypes Base {}
    class C subtypes A {}
    class D subtypes A {}
    class E subtypes C {}
These types produce the following tree:
               Base
              /    \
             A      B
            / \
           C   D
          /
         E
Note the following properties:
1. If `sub` is a subtype of `sup` (either directly or transitively) then
`sup` *must* be on the path from `sub` up to the root of `sub`'s tree.
2. Additionally, `sup` *must* be the `i`th node down from the root in that path,
where `i` is the length of the path from `sup` to its tree's root.
Therefore, if we maintain a vector containing the path to the root for each
type, then we can simply check if `sup` is at index `supertypes(sup).len()`
within `supertypes(sub)`.
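A minimal sketch of this check, using the `Base`/`A`/.../`E` hierarchy from the example. `RegistrySketch`, `declare`, and `TypeId` are hypothetical names, not the actual types registry; the sketch assumes each type's chain lists its proper supertypes from the root down and excludes the type itself, so reflexive subtyping is handled separately.

```rust
/// Hypothetical type identifier: an index into the registry.
type TypeId = usize;

/// Hypothetical registry storing, for each type, the chain of its
/// supertypes ordered from the root down (excluding the type itself).
struct RegistrySketch {
    supertypes: Vec<Vec<TypeId>>,
}

impl RegistrySketch {
    fn new() -> Self {
        RegistrySketch { supertypes: Vec::new() }
    }

    /// Declare a new type with an optional direct supertype.
    fn declare(&mut self, parent: Option<TypeId>) -> TypeId {
        let chain = match parent {
            Some(p) => {
                // The parent's chain followed by the parent itself.
                let mut chain = self.supertypes[p].clone();
                chain.push(p);
                chain
            }
            None => Vec::new(),
        };
        self.supertypes.push(chain);
        self.supertypes.len() - 1
    }

    /// Constant-time subtype check: `sup` must sit at its own depth
    /// within `sub`'s chain.
    fn is_subtype(&self, sub: TypeId, sup: TypeId) -> bool {
        if sub == sup {
            return true;
        }
        let depth = self.supertypes[sup].len();
        self.supertypes[sub].get(depth) == Some(&sup)
    }
}
```

The check costs one length lookup and one indexed read regardless of how deep the hierarchy is; the trade-off is that each type stores its full supertype chain.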
* Remove unused generated `add_root_to_linker` method
* WIP: bindgen GetHost
* Compile with Rust 1.78+
Use <https://users.rust-lang.org/t/generic-closure-returns-that-can-capture-arguments/76513/3>
as a guide of how to implement this by making the `GetHost` trait a bit
uglier.
* Add an option to skip `&mut T -> T` impls
Also enable this for WASI crates since they do their own thing with
`WasiView` for now. A future refactoring should be able to remove this
option entirely and switch wasi crates to a new design of `WasiView`.
* Update test expectations
* Review comments
* Undo temporary change
* Handle some TODOs
* Remove no-longer-relevant note
* Fix wasmtime-wasi-http doc link
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
The egraph pass and the dead-code elimination pass both remove
instructions whose results are unused. If the optimization level is
"none", neither pass runs, and if it's anything else both passes run. I
don't think we should do this work twice.
Note that the DCE pass is different than the "eliminate unreachable
code" pass, which removes entire blocks that are unreachable from the
entry block. That pass might still be necessary.
* Move wast tests to their own test suite
This commit moves testing of `*.wast` files out of the `all` test suite
binary and into its own separate binary. The motivation for this is
well-described in #4861 with one of the chief reasons being that if the
test suite is run and then a new file is added re-running the test suite
won't see the file.
The `libtest-mimic` crate provides an easy way of regaining most of the
features of the `libtest` harness such as parallel test execution and
filters, meaning that it's pretty easy to switch everything over. The
only slightly-tricky bit was redoing the filter for whether a test is
ignored or not, but most of the pieces were copied over from the
previous `build.rs` logic.
Closes #4861
* Fix the `all` suite
* Review comments
* Add Android release binaries to CI
This commit is inspired by #6480 and historical asks for Android
binaries. This does the bare minimum necessary to configure C compilers
such that we can produce binaries but I'll admit that I'm no Android
developer myself so I have no idea if these are actually suitable for
use anywhere. Otherwise though this build subsumes the preexisting check
in CI that the build works for Android, so that part is removed too.
This additionally changes how the NDK is managed from before. Previously
a GitHub Action was used to download Java and the NDK and additionally
used the `cargo ndk` subcommand. That's all removed now in favor of
configuring C compilers directly with a pre-installed version of the NDK
which should help reduce the CI dependencies a bit.
* Review comments
* List Android as tier 3 target