This commit switches to add `all-features = true` for both the
`wasmtime` and `wasmtime-environ` crates on docs.rs. I occasionally look
at `wasmtime-environ` online as it can be helpful for exploration and
otherwise the docs are empty by default. Otherwise using `all-features`
for Wasmtime also includes APIs specific to Winch, for example.
* Remove the native ABI calling convention from Wasmtime
This commit proposes removing the "native abi" calling convention used
in Wasmtime. For background this ABI dates back to the origins of
Wasmtime. Originally Wasmtime only had `Func::call` and eventually I
added `TypedFunc` with `TypedFunc::call` and `Func::wrap` for a faster
path. At the time given the state of trampolines it was easiest to call
WebAssembly code directly without any trampolines using the native ABI
that wasm used at the time. This is the original source of the native
ABI and it's persisted over time under the assumption that it's faster
than the array ABI due to keeping arguments in registers rather than
spilling them to the stack.
Over time, however, this design decision of using the native ABI has not
aged well. Trampolines have changed quite a lot in the meantime and it's
no longer possible for the host to call wasm without a trampoline, for
example. Compilations nowadays maintain both native and array
trampolines for wasm functions in addition to host functions. There's a
large split between `Func::new` and `Func::wrap`. Overall, there's quite
a lot of weight that we're pulling for the design decision of using the
native ABI.
Functionally this hasn't ever really been the end of the world.
Trampolines aren't a known issue in terms of performance or code size.
There's no known faster way to invoke WebAssembly from the host (or
vice-versa). One major downside of this design, however, is that
`Func::new` requires Cranelift as a backend to exist. This is due to the
fact that it needs to synthesize various entries in the matrix of ABIs
we have that aren't available at any other time. While this is itself
not the worst of issues it means that the C API cannot be built without
a compiler because the C API does not have access to `Func::wrap`.
Overall I'd like to reevaluate given where Wasmtime is today whether it
makes sense to keep the native ABI trampolines. Sure they're supposed to
be fast, but are they really that much faster than the array-call ABI as
an alternative? This commit is intended to measure this.
This commit removes the native ABI calling convention entirely. For
example `VMFuncRef` is now one pointer smaller. All of `TypedFunc` now
uses `*mut ValRaw` for loads/stores rather than dealing with ABI
business. The benchmarks with this PR are:
* `sync/no-hook/core - host-to-wasm - typed - nop` - 5% faster
* `sync/no-hook/core - host-to-wasm - typed - nop-params-and-results` - 10% slower
* `sync/no-hook/core - wasm-to-host - typed - nop` - no change
* `sync/no-hook/core - wasm-to-host - typed - nop-params-and-results` - 7% faster
These numbers are a bit surprising as I would have suspected no change
in both "nop" benchmarks as well as both being slower in the
params-and-results benchmarks. Regardless it is apparent that this is
not a major change in terms of performance given Wasmtime's current
state. In general my hunch is that there are more expensive sources of
overhead than reads/writes from the stack when dealing with wasm values
(e.g. trap handling, store management, etc).
Overall this commit feels like a large simplification of what we
currently do in `TypedFunc`:
* The number of ABIs that Wasmtime deals with is reduced by one. ABIs
are pretty much always tricky and having fewer moving parts should
help improve the understandability of the system.
* All of the `WasmTy` trait methods and `TypedFunc` infrastructure is
simplified. Traits now work with simple `load`/`store` methods rather
than various other flavors of conversion.
* The multi-return-value handling of the native ABI is all gone now
which gave rise to significant complexity within Wasmtime's Cranelift
translation layer in addition to the `TypedFunc` backing traits.
* This aligns components and core wasm where components always use the
array ABI and now core wasm additionally will always use the array ABI
when communicating with the host.
I'll note that this still leaves a major ABI "complexity" with respect
to native functions do not have a wasm ABI function pointer until
they're "attached" to a `Store` with a `Module`. That's required to
avoid needing Cranelift for creating host functions and that property is
still true today. This is a bit simpler to understand though now that
`Func::new` and `Func::wrap` are treated uniformly rather than one being
special-cased.
* Fix miri unsafety
prtest:full
* Change `Tunables::static_memory_bound` to bytes
This commit changes the wasm-page-sized `static_memory_bound` field to
instead being a byte-defined unit rather than a page-defined unit. To
accomplish this the field is renamed to `static_memory_reservation` and
all references are updated. This builds on the support from #8608 to
remove another page-based variable from the internals of Wasmtime.
* Fix tests
* wasmtime: Make table lazy-init configurable
Lazy initialization of tables has trade-offs that we haven't explored in
a while. Making it configurable makes it easier to test the effects of
these trade-offs on a variety of WebAssembly programs, and allows
embedders to decide whether the trade-offs are worth-while for their use
cases.
* Review comments
* Change `MemoryStyle::Static` to store bytes, not pages
This commit is inspired by me looking at some configuration in the
pooling allocator and noticing that configuration of wasm pages vs bytes
of linear memory is somewhat inconsistent in `Config`. In the end I'd
like to remove or update the `memory_pages` configuration in the pooling
allocator to being bytes of linear memory instead to be more consistent
with `Config` (and additionally anticipate the custom-page-sizes
wasm proposal where terms-of-pages will become ambiguous). The first
step in this change is to update one of the lowest layered usages of
pages, the `MemoryStyle::Static` configuration.
Note that this is not a trivial conversion because the purpose of
carrying around pages instead of bytes is that bytes may overflow where
overflow-with-pages typically happens during validation. This means that
extra care is taken to handle errors related to overflow to ensure that
everything is still reported at the same time.
* Update crates/wasmtime/src/runtime/vm/instance/allocator/pooling/memory_pool.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
* Fix tests
* Really fix tests
---------
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This is the final type system change for Wasm GC: the ability to explicitly
declare supertypes and finality. A final type may not be a supertype of another
type. A concrete heap type matches another concrete heap type if its concrete
type is a subtype (potentially transitively) of the other heap type's concrete
type.
Next, I'll begin support for allocating GC structs and arrays at runtime.
I've also implemented `O(1)` subtype checking in the types registry:
In a type system with single inheritance, the subtyping relationships between
all types form a set of trees. The root of each tree is a type that has no
supertype; each node's immediate children are the types that directly subtype
that node.
For example, consider these types:
class Base {}
class A subtypes Base {}
class B subtypes Base {}
class C subtypes A {}
class D subtypes A {}
class E subtypes C {}
These types produce the following tree:
Base
/ \
A B
/ \
C D
/
E
Note the following properties:
1. If `sub` is a subtype of `sup` (either directly or transitively) then
`sup` *must* be on the path from `sub` up to the root of `sub`'s tree.
2. Additionally, `sup` *must* be the `i`th node down from the root in that path,
where `i` is the length of the path from `sup` to its tree's root.
Therefore, if we maintain a vector containing the path to the root for each
type, then we can simply check if `sup` is at index `supertypes(sup).len()`
within `supertypes(sub)`.
* wasmtime(gc): Fix wasm-to-native trampoline lookup for subtyping
Previously, we would look up a wasm-to-native trampoline in the Wasm module
based on the host function's type. With Wasm GC and subtyping, this becomes
problematic because a Wasm module can import a function of type `T` but the host
can define a function of type `U` where `U <: T`. And if the Wasm has never
defined type `U` then it wouldn't have a trampoline for it. But our trampolines
don't actually care, they treat all reference values within the same type
hierarchy identically. So the trampoline for `T` would have worked in
practice. But once we find a trampoline for a function, we cache it and reuse it
every time that function is used in the same store again. Even if the function
is imported with its precise type somewhere else. So then we would have a
trampoline of the wrong type. But this happened to be okay in practice because
the trampolines happen not to inspect their arguments or do anything with them
other than forward them between calling convention locations. But relying on
that accidental invariant seems fragile and like a gun aimed at the future's
feet.
This commit makes that invariant non-accidental, centering it and hopefully
making it less fragile by doing so, by making every function type have an
associated "trampoline type". A trampoline type is the original function type
but where all the reference types in its params and results are replaced with
the nullable top versions, e.g. `(ref $my_struct)` is replaced with `(ref null
any)`. Often a function type is its own associated trampoline type, as is the
case for all functions that don't have take or return any references, for
example. Then, all trampoline lookup begins by first getting the trampoline type
of the actual function type, or actual import type, and then only afterwards
finding for the pre-compiled trampoline in the Wasm module.
Fixes https://github.com/bytecodealliance/wasmtime/issues/8432
Co-Authored-By: Jamey Sharp <jsharp@fastly.com>
* Fix no-std build
---------
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
* Always fall back to custom platform for Wasmtime
This commit updates Wasmtime's platform support to no longer require an
opt-in `RUSTFLAGS` `--cfg` flag to be specified. With `no_std` becoming
officially supported this should provide a better onboarding experience
where the fallback custom platform is used. This will cause linker
errors if the symbols aren't implemented and searching/googling should
lead back to our docs/repo (eventually, hopefully).
* Change Wasmtime's TLS state to a single pointer
This commit updates the management of TLS to rely on just a single
pointer rather than a pair of a pointer and a `bool`. Additionally
management of the TLS state is pushed into platform-specific modules to
enable different means of managing it, namely the "custom" platform now
has a C function required to implement TLS state for Wasmtime.
* Delay conversion to `Instant` in atomic intrinsics
The `Duration` type is available in `no_std` but the `Instant` type is
not. The intention is to only support the `threads` proposal if `std` is
active but to assist with this split push the `Duration` further into
Wasmtime to avoid using a type that can't be mentioned in `no_std`.
* Gate more parts of Wasmtime on the `profiling` feature
Move `serde_json` to an optional dependency and gate the guest profiler
entirely on the `profiling` feature.
* Refactor conversion to `anyhow::Error` in `wasmtime-environ`
Have a dedicated trait for consuming `self` in addition to a
`Result`-friendly trait.
* Gate `gimli` in Wasmtime on `addr2line`
Cut down the dependency list if `addr2line` isn't enabled since then
the dependency is not used. While here additionally lift the version
requirement for `addr2line` up to the workspace level.
* Update `bindgen!` to have `no_std`-compatible output
Pull most types from Wasmtime's `__internal` module as the source of
truth.
* Use an `Option` for `gc_store` instead of `OnceCell`
No need for synchronization here when mutability is already available in
the necessary contexts.
* Enable embedder-defined host feature detection
* Add `#![no_std]` support to the `wasmtime` crate
This commit enables compiling the `runtime`, `gc`, and `component-model`
features of the `wasmtime` crate on targets that do not have `std`. This
tags the crate as `#![no_std]` and then updates everything internally to
import from `core` or `alloc` and adapt for the various idioms. This
ended up requiring some relatively extensive changes, but nothing too
too bad in the grand scheme of things.
* Require `std` for the perfmap profiling agent
prtest:full
* Fix build on wasm
* Fix windows build
* Remove unused import
* Fix Windows/Unix build without `std` feature
* Fix some doc links
* Remove unused import
* Fix build of wasi-common in isolation
* Fix no_std build on macos
* Re-fix build
* Fix standalone build of wasmtime-cli-flags
* Resolve a merge conflict
* Review comments
* Remove unused import
* Call-indirect caching: protect against out-of-bounds table index during prescan.
Call-indirect caching requires a "prescan" of a module's code section
during compilation in order to know which tables are possibly written,
and to count call-indirect callsites so that each separate function
compilation can enable caching if possible and use the right slot
range.
This prescan is not integrated with the usual validation logic (nor
should it be, probably, for performance), so it's possible that an
out-of-bounds table index or other illegal instruction could be
present. We previously indexed into an internal data
structure (`table_plans`) with this index, allowing for a compilation
panic on certain invalid modules before validation would have caught
it. This PR fixes that with a one-off check at the single point that
we interpret a parameter (the table index) from an instruction during
the prescan.
* Add test case.
This commit fixes a fuzz bug that popped up where the
`cache_call_indirects` feature wasn't reading memory64-based offsets
correctly. This is due to (not great) API design in `wasmparser`
(introduced by me) where wasmparser by default won't read 64-bit offsets
unless explicitly allowed to. This is to handle how spec tests assert
that overlong 32-bit encodings are invalid as the memory64 proposal
isn't merged into the spec yet.
The fix here is to call `allow_memarg64` with whether memory64 is
enabled or not and then that'll enable reading these overlong and/or
larger offsets correctly.
Similar to https://github.com/bytecodealliance/wasmtime/pull/8481 but for struct
types instead of array types.
Note that this is support for only defining these types in Wasm or the host; we
don't support allocating instances of these types yet. That will come in follow
up PRs.
* Migrate the `wasmtime-environ` crate to `no_std`
This commit migrates the `wasmtime-environ` crate to by default being
tagged with `#![no_std]`. Only the `component-model` and `gc` features
are able to be built without `std`, all other features will implicitly
activate the `std` feature as they currently require it one way or
another. CI is updated to build `wasmtime-environ` with these two
features active on a no_std platform.
This additionally, for the workspace, disables the `std` feature for the
`target-lexicon`, `indexmap`, `object`, and `gimli` dependencies. For
object/gimli all other crates in the workspace now enable the `std`
feature, but for `wasmtime-environ` this activation is omitted.
The `thiserror` dependency was dropped from `wasmtime-environ` and
additionally `hashbrown` was added for explicit usage of maps.
* Always enable `std` for environ for now
prtest:full
* Add some more std features
* Migrate the wasmtime-types crate to no_std
This commit is where no_std for Wasmtime starts to get a bit
interesting. Specifically the `wasmtime-types` crate is the first crate
that depends on some nontrivial crates that also need to be migrated to
`no_std`. This PR disables the default feature of `wasmparser` by
default and additionally does the same for `serde`. This enables them to
compile in `no_std` contexts by default and default features will be
enabled elsewhere in this repository as necessary.
This also opts to drop the `thiserror` dependency entirely in favor of a
manual `Display` implementation with a cfg'd implementation of `Error`.
As before CI checks are added for `wasmtime-types` with a `no_std`
target itself to ensure the crate and all dependencies all avoid `std`.
* Fix adapter build
* Wasmtime: add one-entry call-indirect caching.
In WebAssembly, an indirect call is somewhat slow, because of the
indirection required by CFI (control-flow integrity) sandboxing. In
particular, a "function pointer" in most source languages compiled to
Wasm is represented by an index into a table of funcrefs. The
`call_indirect` instruction then has to do the following steps to invoke
a function pointer:
- Load the funcref table's base and length values from the vmctx.
- Bounds-check the invoked index against the actual table size; trap if
out-of-bounds.
- Spectre mitigation (cmove) on that bounds-check.
- Load the `vmfuncref` from the table given base and index.
- For lazy table init, check if this is a non-initialized funcref
pointer, and initialize the entry.
- Load the signature from the funcref struct and compare it against the
`call_indirect`'s expected signature; trap if wrong.
- Load the actual code pointer for the callee's Wasm-ABI entry point.
- Load the callee vmctx (which may be different for a cross-module
call).
- Put that vmctx in arg 0, our vmctx in arg 1, and invoke the loaded
code pointer with an indirect call instruction.
Compare and contrast to the process involved in invoking a native
function pointer:
- Invoke the code pointer with an indirect call instruction.
This overhead buys us something -- it is part of the SFI sandbox
boundary -- but it is very repetitive and unnecessary work in *most*
cases when indirect function calls are performed repeatedly (such as
within an inner loop).
This PR introduces the idea of *caching*: if we know that the result of
all the above checks won't change, then if we use the same index as "the
last time" (for some definition), we can skip straight to the "invoke
the code pointer" step, with a cached code pointer from that last time.
Concretely, it introduces a two-word struct inlined into the vmctx for
each `call_indirect` instruction in the module (up to a limit):
- The last invoked index;
- The code pointer that index corresponded to.
When compiling the module, we check whether the table could possibly be
mutable at a given index once read: any instructions like `table.set`,
or the whole table exported thus writable from the outside. We also
check whether index 0 is a non-null funcref. If neither of these things
are true, then we know we can cache an index-to-code-pointer mapping,
and we know we can use index 0 as a sentinel for "no cached value".
We then make use of the struct for each indirect call site and generate
code to check if the index matches; if so, call cached pointer; if not,
load the vmfuncref, check the signature, check that the callee vmctx is
the same as caller (intra-module case), and stash the code pointer and
index away (fill the cache), then make the call.
On an in-development branch of SpiderMonkey-in-Wasm with ICs (using
indirect calls), this is about a 20% speedup; I haven't yet measured on
other benchmarks. It is expected that this might be an
instantiation-time slowdown due to a larger vmctx (but we could use
madvise to zero if needed).
This feature is off by default right now.
* Addressed review feedback.
* Added some more comments.
* Allow unused VMCallIndirectCache struct (defined for parity with other bits but not needed in actual runtime).
* Add a limit to the number of call-indirect cache slots.
* Fix merge conflict: handle ConstOp element offset.
* Review feedback.
* wasmtime: Use ConstExpr for element segment offsets
This shouldn't change any behavior currently, but prepares us for
supporting extended constant expressions.
* Fix clippy::cast_sign_loss lint
* Expose `wasmtime-runtime` as `crate::runtime::vm` internally for the `wasmtime` crate
* Rewrite uses of `wasmtime_runtime` to `crate::runtime::vm`
* Remove dep on `wasmtime-runtime` from `wasmtime-cli`
* Move the `wasmtime-runtime` crate into the `wasmtime::runtime::vm` module
* Update labeler for merged crates
* Fix `publish verify`
prtest:full
This commit adds support for defining array types from Wasm or the host, and
managing them inside the engine's types registry. It does not introduce support
for allocating or manipulating array values. That functionality will come in
future pull requests.
* Gate type-builder types from `wasmtime-environ` on `compile`
This commit gates the `*Builder` types related to building sets of types
in the `wasmtime-environ` crate on the `compile` feature. This helps
bring in less code when the feature is disabled and helps exclude some
dependencies for the upcoming `no_std` migration as well.
This commit doesn't change anything, it's just moving code around.
* Remove no-longer-needed import
prtest:full
* Start migrating some Wasmtime crates to no_std
This commit is the first in what will be multiple PRs to migrate
Wasmtime to being compatible with `#![no_std]`. This work is outlined
in #8341 and the rough plan I have in mind is to go on a crate-by-crate
basis and use CI as a "ratchet" to ensure that `no_std` compat is
preserved. In that sense this PR is a bit of a template for future PRs.
This PR migrates a few small crates to `no_std`, basically those that
need no changes beyond simply adding the attribute. The nontrivial parts
introduced in this PR are:
* CI is introduced to verify that a subset of crates can indeed be
built on a `no_std` target. The target selected is
`x86_64-unknown-none` which is known to not have `std` and will result
in a build error if it's attempted to be used.
* The `anyhow` crate, which `wasmtime-jit-icache-coherence` now depends
on, has its `std` feature disabled by default in Wasmtime's workspace.
This means that some crates which require `std` now need to explicitly
enable the feature, but it means that by-default its usage is
appropriate for `no_std`.
The first point should provide CI checks that compatibility with
`no_std` indeed works, at least from an "it compiles" perspective. Note
that it's not sufficient to test with a target like
`x86_64-unknown-linux-gnu` because `extern crate std` will work on that
target, even when `#![no_std]` is active.
The second point however is likely to increase maintenance burden
in Wasmtime unfortunately. Namely we'll inevitably, either here or in
the future, forget to turn on some feature for some crate that's not
covered in CI checks. While I've tried to do my best here in covering it
there's no guarantee that everything will work and the combinatorial
explosion of what could be checked in CI can't all be added to CI.
Instead we'll have to rely on bug fixes, users, and perhaps point
releases to add more use cases to CI over time as we see fit.
* Add another std feature
* Another std feature
* Enable anyhow/std for another crate
* Activate `std` in more crates
* Fix miri build
* Fix compile on riscv64
prtest:full
* Fix min-platform example build
* Fix icache-coherence again
Wasmtime and Cranelift have a few miscellaenous use cases for "just take
this Rust type and make it bytes", for example Wasmtime's serialization
of internal metadata into a compiled module. Previously Wasmtime used
the `bincode` crate for performing these tasks as the format was
generally optimized to be small and fast, not general purpose (e.g.
JSON). The `bincode` crate on crates.io doesn't work on `no_std`,
however, and with the work in #8341 that's an issue now for Wasmtime.
This crate switches instead to the `postcard` crate. This crate is
listed in Serde's documentation as:
> Postcard, a no_std and embedded-systems friendly compact binary
> format.
While I've not personally used it before it checks all the boxes we
relied on `bincode` for and additionally works with `no_std`. After
auditing the crate this commit then switches out Wasmtime's usage of
`bincode` for `postcard` throughout the repository.
* Rename `WasmHeapType::Concrete(_)` to `WasmHeapType::ConcreteFunc(_)`
* Rename `wasmtime::HeapType::Concrete` to `wasmtime::HeapType::ConcreteFunc`
* Introduce Wasm sub- and composite-types
Right now, these are only ever final function types that don't have a supertype,
but this refactoring paves the way for array and struct types, and lets us make
sure that `match`es are exhaustive for when we add new enum variants. (Although
I did add an `unwrap_func` helper for use when it is clear that the type should
be a function type, and if it isn't then we should panic.)
We had something hacked together to support `(ref.i31 (i32.const N))`. It wasn't
a long-term solution. This is the first time that we have to really deal with
multi-instruction const expressions.
This commit introduces a tiny interpreter to evaluate const expressions.
* add cloning for String attributes
* use into_owned instead of to_vec to avoid a clone if possible.
* runs, but does not substitute
* show vars from c program, start cleanup
* tiday
* resolve conflicts
* remove WASI folder
* Add module_builder
add dwarf_package to state for cache
* move dwarf loading to module_environ.rs
pass the dwarf as binary as low as module_environ.rs
* pass dwarf package rather than add to debug_info
* tidy option/result nested if
* revert some toml and whitespace.
* add features cranelift,winch to module_builder and compute_artifacts
remove some `use`s
* address some feedback
remove unused 'use's
* address some feedback
remove unused 'use's
* move wat feature condition to cover whole method.
* More feedback
Another try at wat feature move
* Another try at wat feature move
* change gimli exemption version
add typed-arena exemption
* add None for c-api
* move `use` to #cfg
* fix another config build
* revert unwanted code deletion
* move inner function closer to use
* revert extra param to Module::new
* workaround object crate bug.
* add missing parameter
* add missing parameter
* Merge remote-tracking branch 'origin/main' into dwarf-att-string
# Conflicts:
# crates/wasmtime/src/engine.rs
# crates/wasmtime/src/runtime/module.rs
# src/common.rs
* remove moduke
* use common gimli version of 28.1
* remove wasm feature, revert gimli version
* remove use of object for wasm dwarf
* remove NativeFile workaround, add feature for dwp loading
* sync winch signature
* revert bench api change
* add dwarf for no cache feature
* put back merge loss of module kind
* remove param from docs
* add dwarf fission lldb test
* simplify and include test source
* clang-format
* address feedback, remove packages
add docs
simplify return type
* remove Default use on ModuleTypesBuilder
* Remove an `unwrap()` and use `if let` instead
* Use `&[u8]` instead of `&Vec<u8>`
* Remove an `unwrap()` and return `None` instead
* Clean up some code in `transform_dwarf`
* Clean up some code in `replace_unit_from_split_dwarf`
* Clean up some code in `split_unit`
* Minor refactorings and documentation in `CodeBuilder`
* Restrict visibility of `dwarf_package_binary`
* Revert supply-chain folder changes
* Fix compile error on nightly
* prtest:full
* prtest:full
* prtest:full
* prtest:full
* prtest:full
* prtest:full
* prtest:full
* prtest:full
* use lldb 15
* prtest:full
* prtest:full
* load dwp when loading wasm bytes with path
* correct source file name
* remove debug
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
\### The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest
of the runtime and the compiler. It does this by introducing two new traits,
very similar to a subset of [those proposed in the Wasm GC RFC], although not
all equivalent functionality has been added yet because Wasmtime doesn't
support, for example, GC structs yet:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run
collections within them, and execute the various GC barriers the collector
requires.
Rather than monomorphize all of Wasmtime on this trait, we use it
as a dynamic trait object. This does imply some virtual call overhead and
missing some inlining (and resulting post-inlining) optimization
opportunities. However, it is *much* less disruptive to the existing embedder
API, results in a cleaner embedder API anyways, and we don't believe that VM
runtime/embedder code is on the hot path for working with the GC at this time
anyways (that would be the actual Wasm code, which has inlined GC barriers
and direct calls and all of that). In the future, once we have optimized
enough of the GC that such code is ever hot, we have options we can
investigate at that time to avoid these dynamic virtual calls, like only
enabling one single collector at build time and then creating a static type
alias like `type TheOneGcImpl = ...;` based on the compile time
configuration, and using this type alias in the runtime rather than a dynamic
trait object.
The `GcRuntime` trait additionally defines a method to reset a GC heap, for
use by the pooling allocator. This allows reuse of GC heaps across different
stores. This integration is very rudimentary at the moment, and is missing
all kinds of configuration knobs that we should have before deploying Wasm GC
in production. This commit is large enough as it is already! Ideally, in the
future, I'd like to make it so that GC heaps receive their memory region,
rather than allocate/reserve it themselves, and let each slot in the pooling
allocator's memory pool be *either* a linear memory or a GC heap. This would
unask various capacity planning questions such as "what percent of memory
capacity should we dedicate to linear memories vs GC heaps?". It also seems
like basically all the same configuration knobs we have for linear memories
apply equally to GC heaps (see also the "Indexed Heaps" section below).
2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements
GC barriers for various operations on GC-managed references. The Rust code
calls into this trait dynamically via a trait object, but since it is
customizing the CLIF that is generated for Wasm code, the Wasm code itself is
not making dynamic, indirect calls for GC barriers. The `GcCompiler`
implementation can inline the parts of GC barrier that it believes should be
inline, and leave out-of-line calls to rare slow paths.
All that said, there is still only a single implementation of each of these
traits: the existing deferred reference-counting (DRC) collector. So there is a
bunch of code motion in this commit as the DRC collector was further isolated
from the rest of the runtime and moved to its own submodule. That said, this was
not *purely* code motion (see "Indexed Heaps" below) so it is worth not simply
skipping over the DRC collector's code in review.
\### Indexed Heaps
This commit does bake in a couple assumptions that must be shared across all
collector implementations, such as a shared `VMGcHeader` that all objects
allocated within a GC heap must begin with, but the most notable and
far-reaching of these assumptions is that all collectors will use "indexed
heaps".
What we are calling indexed heaps are basically the three following invariants:
1. All GC heaps will be a single contiguous region of memory, and all GC objects
will be allocated within this region of memory. The collector may ask the
system allocator for additional memory, e.g. to maintain its free lists, but
GC objects themselves will never be allocated via `malloc`.
2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into
the GC heap's contiguous region of memory. We never hold raw pointers to GC
objects (although, of course, we have to compute them and use them
temporarily when actually accessing objects). This means that deref'ing GC
pointers is equivalent to deref'ing linear memory pointers: we need to add a
base and we also check that the GC pointer/index is within the bounds of the
GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly
common technique among high-performance GC
implementations[^compressed-oops][^v8-ptr-compression] so we are in good
company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that
is an element of an `(array (ref any))` is untrusted, and bounds checked on
access. This means that, for example, we do not store the raw pointer to an
`externref`'s host object inside the GC heap. Instead an `externref` now
stores an ID that can be used to index into a side table in the store that
holds the actual `Box<dyn Any>` host object, and accessing that side table is
always checked.
[^compressed-oops]: See ["Compressed OOPs" in
OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)
[^v8-ptr-compression]: See [V8's pointer
compression](https://v8.dev/blog/pointer-compression).
The good news with regards to all the bounds checking that this scheme implies
is that we can use all the same virtual memory tricks that linear memories use
to omit explicit bounds checks. Additionally, (2) means that the sizes of GC
objects is that much smaller (and therefore that much more cache friendly)
because they are only holding onto 32-bit, rather than 64-bit, references to
other GC objects. (We can, in the future, support GC heaps up to 16GiB in size
without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment
and storing aligned indices rather than byte indices, while still leaving the
bottom bit available for tagging as an `i31ref` discriminant. Should we ever
need to support even larger GC heap capacities, we could go to full 64-bit
references, but we would need explicit bounds checks.)
The biggest benefit of indexed heaps is that, because we are (explicitly or
implicitly) bounds checking GC heap accesses, and because we are not otherwise
trusting any data from inside the GC heap, we greatly reduce how badly things
can go wrong in the face of collector bugs and GC heap corruption. We are
essentially sandboxing the GC heap region, the same way that linear memory is a
sandbox. GC bugs could lead to the guest program accessing the wrong GC object,
or getting garbage data from within the GC heap. But only garbage data from
within the GC heap, never outside it. The worse that could happen would be if we
decided not to zero out GC heaps between reuse across stores (which is a valid
trade off to make, since zeroing a GC heap is a defense-in-depth technique
similar to zeroing a Wasm stack and not semantically visible in the absence of
GC bugs) and then a GC bug would allow the current Wasm guest to read old GC
data from the old Wasm guest that previously used this GC heap. But again, it
could never access host data.
Taken altogether, this allows for collector implementations that are nearly free
from `unsafe` code, and unsafety can otherwise be targeted and limited in scope,
such as interactions with JIT code. Most importantly, we do not have to maintain
critical invariants across the whole system -- invariants which can't be nicely
encapsulated or abstracted -- to preserve memory safety. Such holistic
invariants that refuse encapsulation are otherwise generally a huge safety
problem with GC implementations.
\### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here
was to be sure that I was actually calling GC barriers at all the correct
places. I couldn't be sure before. Now, you can still explicitly copy a raw GC
reference without running GC barriers if you need to and understand why that's
okay (aka you are implementing the collector), but that is something you have to
opt into explicitly by calling `unchecked_copy`. The default now is that you
can't just copy the reference, and instead call an explicit `clone` method (not
*the* `Clone` trait, because we need to pass in the GC heap context to run the
GC barriers) and it is hard to forget to do that accidentally. This resulted in
a pretty big amount of churn, but I am wayyyyyy more confident that the correct
GC barriers are called at the correct times now than I was before.
\### `i31ref`
I started this commit by trying to add `i31ref` support. And it grew into the
whole traits interface because I found that I needed to abstract GC barriers
into helpers anyways to avoid running them for `i31ref`s, so I figured that I
might as well add the whole traits interface. In comparison, `i31ref` support is
much easier and smaller than that other part! But it was also difficult to pull
apart from this commit, sorry about that!
---------------------
Overall, I know this is a very large commit. I am super happy to have some
synchronous meetings to walk through this all, give an overview of the
architecture, answer questions directly, etc... to make review easier!
prtest:full
* Lift all serde deps to the workspace level
Deduplicate some versions mentioned throughout crates in the workspace.
* Lift `bincode` deps to the workspace configuration level
Deduplicate some mentioned versions throughout.
* Lift libc deps up to the workspace root
As with prior commits, deduplicate some versions mentioned.
* Gate support for the wasm `threads` proposal behind a Cargo feature
This commit moves support for the `threads` proposal behind a new
on-by-default Cargo feature: `threads`. This is intended to support
building Wasmtime with fewer runtime dependencies such as those required
for the atomic operations on memories.
This additionally adds the `gc` feature in a few missing places too.
* Fix compile of C API without threads
* Add a `compile` feature to `wasmtime-environ`
This commit adds a compile-time feature to remove some dependencies of
the `wasmtime-environ` crate. This compiles out support for compiling
modules/components and makes the crate slimmer in terms of amount of
code compiled along with its dependencies. Much of this should already
have been statically removed by native linkers so this likely won't have
any compile-size impact, but it's a nice-to-have in terms of
organization.
This has a fair bit of shuffling around of code, but apart from
renamings and movement there are no major changes here.
* Fix compile issue
* Gate `ModuleTranslation` and its methods on `compile`
* Fix doc link
* Fix doc link
This commit slightly improves the error message of feeding a component
into a module-taking API. This at least tries to convey that a component
was recognized but a module was expected.
* Exit through Cranelift-generated trampolines for builtins
This commit changes how builtin functions in Wasmtime (think
`memory.grow`) are implemented. These functions are required to exit
through some manner of trampoline to handle runtime requirements for
backtracing right now. Currently this is done via inline assembly for
each architecture (or external assembly for s390x). This is a bit
unfortunate as it's a lot of hand-coding and making sure everything is
right, and it's not easy to update as it's multiple platforms to update.
The change in this commit is to instead use Cranelift-generated
trampolines for this purpose instead. The path for invoking a builtin
function now looks like:
* Wasm code calls a statically known symbol for each builtin.
* The statically known symbol will perform exit trampoline duties (e.g.
pc/fp/etc) and then load a function pointer to the host
implementation.
* The host implementation is invoked and then proceeds as usual.
The main new piece for this PR is that all wasm modules and functions
are compiled in parallel but an output of this compilation phase is what
builtin functions are required. All builtin functions are then unioned
together into one set and then anything required is generated just
afterwards. That means that only one builtin-trampoline per-module is
generated per-builtin.
This work is inspired by #8135 and my own personal desire to have as
much about our ABI details flowing through Cranelift as we can. This in
theory makes it more flexible to deal with future improvements to our
ABI.
prtest:full
* Fix some build issues
* Update winch test expectations
* Update Winch to use new builtin shims.
This commit refactors the Winch compiler to use the new trampolines for
all Wasmtime builtins created in the previous commits. This required a
fair bit of refactoring to handle plumbing through a new kind of
relocation and function call.
Winch's `FuncEnv` now contains a `PrimaryMap` from `UserExternalNameRef`
to `UserExternalName`. This is because there's now more than one kind of
name than just wasm function relocations, so the raw index space of
`UserExternalNameRef` is no longer applicable. This required threading
`FuncEnv` to more locations along with some refactorings to ensure that
lifetimes work out ok.
The `CompiledFunction` no longer stores a trait object of how to map
name refs to names and now directly has a `Primarymap`. This also means
that Winch's return value from its `TargetIsa` is a `CompiledFunction`
as opposed to the previous just-a-`MachBuffer` so it can also package up
all the relocation information. This ends up having `winch-codegen`
depend on `wasmtime-cranelift-shared` as a new dependency.
* Review feedback
Currently even when the `wmemcheck` Cargo feature is disabled the
various related libcalls are still compiled into `wasmtime-runtime`.
Additionally their signatures are translated when lowering functions,
although the signatures are never used. This commit adds `#[cfg]`
annotations to compile these all out when they're not enabled.
Applying this change, however, uncovered a subtle bug in our libcalls.
Libcalls are numbered in-order as-listed in the macro ignoring `#[cfg]`,
but they're assigned a runtime slot in a `VMBuiltinFunctionsArray`
structure which does respect `#[cfg]`. This meant, for example, that if
`gc` was enabled and `wmemcheck` was disabled, as is the default for our
tests, then there was a hole in the numbering where libcall numbers were
mismatched at runtime and compile time.
To fix this I've first added a const assertion that the runtime-number
of libcalls equals the build-time number of libcalls. I then updated the
macro a bit to plumb the `#[cfg]` differently and now libcalls are
unconditionally defined regardless of cfgs but the implementation is
`std::process::abort()` if it's compiled out.
This ended up having a large-ish impact on the `disas` test suite. Lots
of functions have fewer signatures translation because wmemcheck, even
when disabled, was translating a few signatures. This also had some
assembly changes, too, because I believe functions are considered leaves
based on whether they declare a signature or not, so declaring an unused
signature was preventing all wasm functions from being considered leaves.
* Don't lookup trap codes twice on traps
Currently whenever a signal or trap is handled in Wasmtime we perform
two lookups of the trap code. One during the trap handling itself to
ensure we can handle the trap, and then a second once the trap is
handled. There's not really any need to do two here, however, as the
result of the first can be carried over to the second.
While I was here refactoring things I also changed how some return
values are encoded, such as `take_jmp_buf_if_trap` now returns a more
self-descriptive enum.
* Fix dead code warning on MIRI
* Update crates/environ/src/trap_encoding.rs
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
* Fix min-platform build
---------
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
* Use wasmtime-cranelift compiler for winch trampolines
* Plumb the winch calling convention through Tunables
* Guard setting the winch_callable tunable
* Allow the trampolines field to be unused when component-model is disabled
This commit is a large refactor of the `Val` type as used with
components to remove inherent type information present currently. The
`Val` type is now only an AST of what a component model value looks like
and cannot fully describe the type that it is without further context.
For example enums only store the case that's being used, not the full
set of cases.
The motivation for this commit is to make it simpler to use and
construct `Val`, especially in the face of resources. Some problems
solved here are:
* With resources in play managing type information is not trivial and
can often be surprising. For example if you learn the type of a
function from a component and the instantiate the component twice the
type information is not suitable to use with either function due to
exported resources acquiring unique types on all instantiations.
* Functionally it's much easier to construct values when type
information is not required as it no longer requires probing various
pieces for type information here and there.
* API-wise there's far less for us to maintain as there's no need for a
type-per-variant of component model types. Pieces now fit much more
naturally into a `Val` shape without extra types.
* Functionally when working with `Val` there's now only one typecheck
instead of two. Previously a typecheck was performed first when a
`Val` was created and then again later when it was passed to wasm. Now
the typecheck only happens when passed to wasm.
It's worth pointing out that `Val` as-is is a pretty inefficient
representation of component model values, for example flags are stored
as a list of strings. While semantically correct this is quite
inefficient for most purposes other than "get something working". To
that extent my goal is to, in the future, add traits that enable
building a custom user-defined `Val` (of sorts), but still dynamically.
This should enable embedders to opt-in to a more efficient
representation that relies on contextual knowledge.
* Fix `wasmtime settings` command.
Currently the `settings` command panics because the tunables are not set in the
compiler builder.
This commit creates a default tunables based on either the target triple passed
on the command line or uses the default for the host.
Fixes#8058.
* Additional clean up.
This commit updates Wasmtime to support `global.get` in constant
expressions when located in table initializers and element segments.
Pre-reference-types this never came up because there was no valid
`global.get` that would typecheck. After the reference-types proposal
landed however this became possible but Wasmtime did not support it.
This was surfaced in #6705 when the spec test suite was updated and has
a new test that exercises this functionality.
This commit both updates the spec test suite and additionally adds
support for this new form of element segment and table initialization
expression.
The fact that Wasmtime hasn't supported this until now also means that
we have a gap in our fuzz-testing infrastructure. The `wasm-smith`
generator is being updated in bytecodealliance/wasm-tools#1426 to
generate modules with this particular feature and I've tested that with
that PR fuzzing here eventually generates an error before this PR.
Closes#6705