This moves Wasmtime over from the old, regalloc-based stack maps system to the
new "user" stack maps system.
Removing the old regalloc-based stack maps system is left for follow-up work.
Right now this is only enabled on some crates such as `wasmtime` itself and
`wasmtime-cli`, but applying it to all crates helps with version
selection for consumers using just Cranelift, for example.
* Derive `Copy` for `Val`
* Fix `clippy::clone_on_copy` for the whole repo
* Enforce `clippy::clone_on_copy` for the workspace
* Fix some more `clippy::clone_on_copy` warnings that got missed
* Fix test of wasmtime-fuzzing with `fuzz-pcc` enabled
Allow PCC errors when generating fuzz modules since the feature is
currently gated and the errors haven't all been weeded out yet.
* Gate wmemcheck instrumentation more in `wasmtime-cranelift`
This fixes an issue where, when the `wmemcheck` feature was enabled, the
`wmemcheck`-related instrumentation around `malloc` and `free` was
unconditionally enabled. The fix is to additionally check the runtime
`wmemcheck` setting as well.
Closes #8997
* Forward the `wmemcheck` feature from `wasmtime-cranelift` to `wasmtime-environ`
Not necessarily strictly required but it should help prevent issues
such as #8997 in the future.
* Update wasm-tools and wit-bindgen crates
This keeps things updated and pulls in some new features from
wasm-tools:
* Multiple returns in the component model are now gated by default. This
is reflected in a new `Config` option and CLI flag. The hope is to
remove this feature if no one ends up needing it.
* Support for more `shared` things was added, but the feature is always
disabled, so internal handlers just panic.
Tests were updated to not use multiple returns where appropriate, except
for one test which is specifically testing for multiple returns and how
it's reflected into the Rust type system.
* Fix fuzz build
In the original development of this feature, guided by JS AOT
compilation to Wasm of a microbenchmark heavily focused on IC sites, I
was seeing a ~20% speedup. However, in more recent measurements, on full
programs (e.g., the Octane benchmark suite), the benefit is more like
5%.
Moreover, in #8870, I attempted to switch over to a direct-mapped cache,
to address a current shortcoming of the design, namely that it has a
hard-capped number of callsites it can apply to (50k) to limit impact on
VMContext struct size. With all of the needed checks for correctness,
though, that change results in a 2.5% slowdown relative to no caching at
all, so it was dropped.
In the process of thinking through that, I discovered the current design
on `main` incorrectly handles null funcrefs: it invokes a null code pointer,
rather than loading a field from a null struct pointer. The latter was
specifically designed to cause the necessary Wasm trap in #8159, but I
had missed that the call to a null code pointer would not have the same
effect. As a result, we actually can crash the VM (safely at least, but
still no good vs. a proper Wasm trap!) with the feature enabled. (It's
off by default still.) That could be fixed too, but at this point with
the small benefit on real programs, together with the limitation on
module size for full benefit, I think I'd rather opt for simplicity and
remove the cache entirely.
Thus, this PR removes call-indirect caching. It's not a direct revert
because the original PR refactored the call-indirect generation into
smaller helpers and IMHO it's a bit nicer to keep that. But otherwise
all traces of the setting, code pre-scan during compilation and special
conditions tracked on tables, and codegen changes are gone.
* upgrade to wasm-tools 0.211.1
* code review
* cargo vet: auto imports
* fuzzing: fix wasm-smith changes
* fuzzing: changes for HeapType
* Configure features on `Parser` when parsing
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Disable memory protection keys by default at compile time
This commit gates memory protection keys behind a new Cargo feature
which is disabled by default. Memory protection keys are already
disabled by default on all platforms and are only configured to possibly
work with Linux x64. When enabled, however, it unconditionally adds a
small amount of overhead to WebAssembly entries/exits even if the
feature is disabled at runtime for the same reason that the `call-hook`
feature adds overhead. With `call-hook` being disabled by default
in #8808 it seemed reasonable to additionally gate memory protection
keys to avoid needing to disable features in Wasmtime to get the best
performance for wasm<->host calls.
* Enable Wasmtime feature for fuzzing
* Wasmtime: Implement the custom-page-sizes proposal
This commit adds support for the custom-page-sizes proposal to Wasmtime:
https://github.com/WebAssembly/custom-page-sizes
I've migrated, fixed some bugs within, and extended the `*.wast` tests for this
proposal from the `wasm-tools` repository. I intend to upstream them into the
proposal shortly.
There is a new `wasmtime::Config::wasm_custom_page_sizes_proposal` method to
enable or disable the proposal. It is disabled by default.
Our fuzzing config has been updated to turn this feature on/off as dictated by
the arbitrary input given to us from the fuzzer.
Additionally, there were getting to be so many constructors for
`wasmtime::MemoryType` that I added a builder rather than add yet another
constructor.
In general, we store the `log2(page_size)` rather than the page size
directly. This helps cut down on invalid states and properties we need to
assert.
I've also intentionally written this code such that supporting any power of two
page size (rather than just the exact values `1` and `65536` that are currently
valid) will essentially just involve updating `wasmparser`'s validation and
removing some debug asserts in Wasmtime.
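As a quick illustration of why storing `log2(page_size)` cuts down on invalid
states (and why a right shift can replace a division, per a later commit),
here's a hedged sketch; these are not Wasmtime's actual helpers:
```rust
// Any u8 exponent is a valid power-of-two page size, so "non-power-of-two
// page size" is unrepresentable.
fn page_size(page_size_log2: u8) -> u64 {
    1 << page_size_log2
}

// "Bytes / page size" becomes a right shift by the stored exponent.
fn bytes_to_pages(byte_size: u64, page_size_log2: u8) -> u64 {
    byte_size >> page_size_log2
}

fn main() {
    assert_eq!(page_size(0), 1); // 1-byte pages
    assert_eq!(page_size(16), 65536); // default Wasm pages
    assert_eq!(bytes_to_pages(4 * 65536, 16), 4);
}
```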
* Update error string expectation
* Remove debug logging
* Use a right shift instead of a division
* fix error message expectation again
* remove page size from VMMemoryDefinition
* fix size of VMMemoryDefinition again
* Only dynamically check for `-1` sentinel for 1-byte page sizes
* Import functions that are used a few times
* Better handle overflows when rounding up to the host page size
Propagate errors instead of returning a value that is not actually a rounded up
version of the input.
Delay rounding up various config sizes until runtime instead of eagerly doing it
at config time (which isn't even guaranteed to work, so we already had to have a
backup plan to round up at runtime, since we might be cross-compiling wasm or
not have the runtime feature enabled).
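A minimal sketch of the overflow-aware rounding described here, assuming a
power-of-two page size (not the exact in-tree helper):
```rust
// Returns `None` on overflow instead of a value that isn't actually a
// rounded-up version of the input.
fn checked_round_up_to_page(size: usize, page_size: usize) -> Option<usize> {
    debug_assert!(page_size.is_power_of_two());
    // `size + (page_size - 1)` is where the overflow can happen.
    Some(size.checked_add(page_size - 1)? & !(page_size - 1))
}

fn main() {
    assert_eq!(checked_round_up_to_page(100, 4096), Some(4096));
    assert_eq!(checked_round_up_to_page(usize::MAX, 4096), None);
}
```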
* Fix some anyhow and nostd errors
* Add missing rounding up to host page size at runtime
* Add validate feature to wasmparser dep
* Add some new rounding in a few places, due to no longer rounding in config methods
* Avoid actually trying to allocate the whole address space in the `massive_64_bit_still_limited` test
The point of the test is to ensure that we hit the limiter, so just cancel the
allocation from the limiter, and otherwise avoid MIRI attempting to allocate a
bunch of memory after we hit the limiter.
* prtest:full
* Revert "Avoid actually trying to allocate the whole address space in the `massive_64_bit_still_limited` test"
This reverts commit ccfa34a78dd3d53e49a6158ca03077d42ce8bcd7.
* miri: don't attempt to allocate more than 4GiB of memory
It seems that rather than returning a null pointer from `std::alloc::alloc`,
miri will sometimes choose to simply crash the whole program.
* remove duplicate prelude import after rebasing
We have slightly different bounds checks for when Spectre mitigations are
enabled or disabled, so add a knob to our fuzzing machinery to exercise all
cases.
* Enable rustc's `unused-lifetimes` lint
This lint is allow-by-default and doesn't seem to have any false positives in
Wasmtime's codebase so enable it to help clean up vestiges of
old refactorings.
* Remove another unused lifetime
* Remove another unused lifetime
* Use bytes for maximum size of linear memory with pooling
This commit changes configuration of the pooling allocator to use a
byte-based unit rather than a page-based unit. The previous
`PoolingAllocatorConfig::memory_pages` configuration option configured
the maximum size that a linear memory may grow to at runtime. This is an
important factor in the calculation of stripes for MPK and is also a
coarse-grained knob, apart from `StoreLimiter`, to limit memory
consumption. This configuration option has been renamed to
`max_memory_size` and documented as being in terms of bytes rather than
pages as before.
Additionally, the documented constraint that `max_memory_size` must be
smaller than `static_memory_bound` is now enforced, as a minor
clean-up in this PR as well.
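A hedged example of the renamed knob (type and method names per this
description and wasmtime's pooling API; details may vary by version):
```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> anyhow::Result<()> {
    let mut pool = PoolingAllocationConfig::default();
    // Formerly `memory_pages(...)` in 64 KiB pages; now a byte count.
    pool.max_memory_size(128 << 20); // each linear memory may grow to 128 MiB
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    let _engine = Engine::new(&config)?;
    Ok(())
}
```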
* Review comments
* Fix benchmark build
* wasmtime: Make table lazy-init configurable
Lazy initialization of tables has trade-offs that we haven't explored in
a while. Making it configurable makes it easier to test the effects of
these trade-offs on a variety of WebAssembly programs, and allows
embedders to decide whether the trade-offs are worthwhile for their use
cases.
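For example, assuming the knob landed as a `Config` method named
`table_lazy_init` (hedged sketch):
```rust
use wasmtime::{Config, Engine};

fn main() -> anyhow::Result<()> {
    let mut config = Config::new();
    // `true` (the default) keeps lazy table initialization; `false`
    // initializes table elements eagerly at instantiation time.
    config.table_lazy_init(false);
    let _engine = Engine::new(&config)?;
    Ok(())
}
```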
* Review comments
This introduces a `DecommitQueue` for batching decommits together in the pooling
allocator:
* Deallocating a memory/table/stack enqueues their associated regions of memory
for decommit; it no longer immediately returns the associated slot to the
pool's free list. If the queue's length has reached the configured batch size,
then we flush the queue by running all the decommits, and finally returning
the memory/table/stack slots to their respective pools and free lists.
* Additionally, if allocating a new memory/table/stack fails because the free
list is empty (aka we've reached the max concurrently-allocated limit for this
entity) then we fall back to a slow path before propagating the error. This
slow path flushes the decommit queue and then retries allocation, hoping that
the queue flush reclaimed slots and made them available for this fallback
allocation attempt. This involved defining a new `PoolConcurrencyLimitError`
to match on, which is also exposed in the public embedder API.
It is also worth noting that we *always* use this new decommit queue now. To
keep the existing behavior, where e.g. a memory's decommits happen immediately
on deallocation, you can use a batch size of one. This effectively disables
queueing, forcing all decommits to be flushed immediately.
The default decommit batch size is one.
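A hedged sketch of picking a batch size, assuming the pooling config gained a
`decommit_batch_size` method as part of this work:
```rust
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() -> anyhow::Result<()> {
    let mut pool = PoolingAllocationConfig::default();
    // 1 (the default) flushes every decommit immediately; larger values
    // batch decommits and return slots to the free lists on flush.
    pool.decommit_batch_size(128);
    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pool));
    let _engine = Engine::new(&config)?;
    Ok(())
}
```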
This commit, with batch size of one, consistently gives me an increase on
`wasmtime serve`'s requests-per-second versus its parent commit, as measured by
`benches/wasmtime-serve-rps.sh`. I get ~39K RPS on this commit compared to ~35K
RPS on the parent commit. This is quite puzzling to me. I was expecting no
change, and hoping there wouldn't be a regression. I was not expecting a speed
up. I cannot explain this result at this time.
prtest:full
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=68735.
That fuzzbug bisected to the call-indirect caching changes, but this
turned out to be a red herring (the options added in that PR mean that
the fuzzbug config deserializes differently prior to the commit). In
any case, it's an easy fix -- it appears that V8 added a new error
message, so we need to add it to the allowlist of messages that we
expect for a table out-of-bounds condition.
* Bump Wasmtime's MSRV to 1.76.0
* Update Rust in CI to 1.78.0, the current stable
* Update nightly tests to the latest nightly
prtest:full
* Fix check-cfg with nightly
* More check-cfg fixes
* Remove an async cfg
This is no longer specified for the root crate.
* Move definition of Wasmtime's nightly into one place
Don't change a bunch of places when this is updated, try to update just
one single location instead.
* Wasmtime: add one-entry call-indirect caching.
In WebAssembly, an indirect call is somewhat slow, because of the
indirection required by CFI (control-flow integrity) sandboxing. In
particular, a "function pointer" in most source languages compiled to
Wasm is represented by an index into a table of funcrefs. The
`call_indirect` instruction then has to do the following steps to invoke
a function pointer:
- Load the funcref table's base and length values from the vmctx.
- Bounds-check the invoked index against the actual table size; trap if
out-of-bounds.
- Spectre mitigation (cmove) on that bounds-check.
- Load the `vmfuncref` from the table given base and index.
- For lazy table init, check whether this is an uninitialized funcref
pointer and, if so, initialize the entry.
- Load the signature from the funcref struct and compare it against the
`call_indirect`'s expected signature; trap if wrong.
- Load the actual code pointer for the callee's Wasm-ABI entry point.
- Load the callee vmctx (which may be different for a cross-module
call).
- Put that vmctx in arg 0, our vmctx in arg 1, and invoke the loaded
code pointer with an indirect call instruction.
Compare and contrast to the process involved in invoking a native
function pointer:
- Invoke the code pointer with an indirect call instruction.
This overhead buys us something -- it is part of the SFI sandbox
boundary -- but it is very repetitive and unnecessary work in *most*
cases when indirect function calls are performed repeatedly (such as
within an inner loop).
This PR introduces the idea of *caching*: if we know that the result of
all the above checks won't change, then if we use the same index as "the
last time" (for some definition), we can skip straight to the "invoke
the code pointer" step, with a cached code pointer from that last time.
Concretely, it introduces a two-word struct inlined into the vmctx for
each `call_indirect` instruction in the module (up to a limit):
- The last invoked index;
- The code pointer that index corresponded to.
When compiling the module, we check whether the table could possibly be
mutated at a given index once read: for example via instructions like
`table.set`, or because the whole table is exported and thus writable
from the outside. We also check whether index 0 is a non-null funcref.
If neither of these things is true, then we know we can cache an
index-to-code-pointer mapping, and we know we can use index 0 as a
sentinel for "no cached value".
We then make use of the struct for each indirect call site and generate
code to check if the index matches; if so, call cached pointer; if not,
load the vmfuncref, check the signature, check that the callee vmctx is
the same as caller (intra-module case), and stash the code pointer and
index away (fill the cache), then make the call.
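In Rust-flavored pseudocode, the per-call-site probe looks roughly like this
(illustrative only; the real sequence is emitted as CLIF):
```rust
#[repr(C)]
struct CallIndirectCache {
    index: u32,      // last invoked table index; 0 is the "empty" sentinel
    code_ptr: usize, // code pointer that `index` resolved to last time
}

fn call_indirect_probe(cache: &mut CallIndirectCache, index: u32) -> usize {
    if index == cache.index {
        // Fast path: same index as last time, so skip the bounds check,
        // lazy init, and signature check and reuse the cached pointer.
        cache.code_ptr
    } else {
        // Slow path: the full checked lookup described above, then fill
        // the cache (only when the callee vmctx matches the caller's).
        let code_ptr = full_checked_lookup(index);
        cache.index = index;
        cache.code_ptr = code_ptr;
        code_ptr
    }
}

fn full_checked_lookup(_index: u32) -> usize {
    unimplemented!("stand-in for the uncached call_indirect sequence")
}
```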
On an in-development branch of SpiderMonkey-in-Wasm with ICs (using
indirect calls), this is about a 20% speedup; I haven't yet measured on
other benchmarks. It is expected that this might be an
instantiation-time slowdown due to a larger vmctx (but we could use
madvise to zero if needed).
This feature is off by default right now.
* Addressed review feedback.
* Added some more comments.
* Allow unused VMCallIndirectCache struct (defined for parity with other bits but not needed in actual runtime).
* Add a limit to the number of call-indirect cache slots.
* Fix merge conflict: handle ConstOp element offset.
* Review feedback.
This commit adds support for defining array types from Wasm or the host, and
managing them inside the engine's types registry. It does not introduce support
for allocating or manipulating array values. That functionality will come in
future pull requests.
* Rename `WasmHeapType::Concrete(_)` to `WasmHeapType::ConcreteFunc(_)`
* Rename `wasmtime::HeapType::Concrete` to `wasmtime::HeapType::ConcreteFunc`
* Introduce Wasm sub- and composite-types
Right now, these are only ever final function types that don't have a supertype,
but this refactoring paves the way for array and struct types, and lets us make
sure that `match`es are exhaustive for when we add new enum variants. (Although
I did add an `unwrap_func` helper for use when it is clear that the type should
be a function type, and if it isn't then we should panic.)
* Add a fuzzer for async wasm
This commit revives a very old branch of mine to add a fuzzer for
Wasmtime in async mode. This work was originally blocked on
llvm/llvm-project#53891 and, while that's still an issue, this PR now
contains a workaround for it. Support for async fuzzing required a good
deal of refactorings and changes, and the highlights are:
* The main part is that new intrinsics,
`__sanitizer_{start,finish}_switch_fiber`, are now invoked around the
stack-switching routines of fibers. This only works on Unix and is set
to only compile when ASAN is enabled (otherwise everything is a noop).
This required refactoring of things to get it all in just the right
way for ASAN since it appears that these functions not only need to be
called but more-or-less need to be adjacent to each other in the code.
My guess is that while we're switching ASAN is in a "weird state" and
it's not ready to run arbitrary code.
* Stacks are a problem. The above issue in LLVM outlines how stacks
cannot be deallocated at this time because if the deallocated virtual
memory is later used for the heap then ASAN will have a false positive
about stack overflow. To handle this, stacks are treated specially in
ASAN mode by using a special allocation path that never deallocates
stacks. This logic additionally applies to the pooling allocator which
uses a different stack allocation strategy with ASAN.
With all of the above a new fuzzer is added. This fuzzer generates an
arbitrary module, selects an arbitrary means of async (e.g.
epochs/fuel), and then tries to execute the exports of the module with
various values. In general the fuzzer is looking for crashes/panics as
opposed to correct answers as there's no oracle here. This is also
intended to stress the code used to switch on and off stacks.
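The fuzzer's choice between async means boils down to standard `Config` knobs,
roughly as follows (sketch; the fuzzer's own plumbing is omitted):
```rust
use wasmtime::{Config, Engine};

fn engine_for_async_fuzzing(use_epochs: bool) -> anyhow::Result<Engine> {
    let mut config = Config::new();
    config.async_support(true);
    if use_epochs {
        config.epoch_interruption(true); // yield to the host on epoch bumps
    } else {
        config.consume_fuel(true); // yield when fuel runs out
    }
    Engine::new(&config)
}
```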
* Fix non-async build
* Remove unused import
* Review comments
* Fix compile on MIRI
* Fix Windows build
### The `GcRuntime` and `GcCompiler` Traits
This commit factors out the details of the garbage collector away from the rest
of the runtime and the compiler. It does this by introducing two new traits,
very similar to a subset of [those proposed in the Wasm GC RFC], although not
all equivalent functionality has been added yet because Wasmtime doesn't
support, for example, GC structs:
[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface
1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run
collections within them, and execute the various GC barriers the collector
requires.
Rather than monomorphize all of Wasmtime on this trait, we use it
as a dynamic trait object. This does imply some virtual call overhead and
missing some inlining (and resulting post-inlining) optimization
opportunities. However, it is *much* less disruptive to the existing embedder
API, results in a cleaner embedder API anyways, and we don't believe that VM
runtime/embedder code is on the hot path for working with the GC at this time
anyways (that would be the actual Wasm code, which has inlined GC barriers
and direct calls and all of that). In the future, once we have optimized
enough of the GC that such code is ever hot, we have options we can
investigate at that time to avoid these dynamic virtual calls, like only
enabling one single collector at build time and then creating a static type
alias like `type TheOneGcImpl = ...;` based on the compile time
configuration, and using this type alias in the runtime rather than a dynamic
trait object.
The `GcRuntime` trait additionally defines a method to reset a GC heap, for
use by the pooling allocator. This allows reuse of GC heaps across different
stores. This integration is very rudimentary at the moment, and is missing
all kinds of configuration knobs that we should have before deploying Wasm GC
in production. This commit is large enough as it is already! Ideally, in the
future, I'd like to make it so that GC heaps receive their memory region,
rather than allocate/reserve it themselves, and let each slot in the pooling
allocator's memory pool be *either* a linear memory or a GC heap. This would
unask various capacity planning questions such as "what percent of memory
capacity should we dedicate to linear memories vs GC heaps?". It also seems
like basically all the same configuration knobs we have for linear memories
apply equally to GC heaps (see also the "Indexed Heaps" section below).
2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements
GC barriers for various operations on GC-managed references. The Rust code
calls into this trait dynamically via a trait object, but since it is
customizing the CLIF that is generated for Wasm code, the Wasm code itself is
not making dynamic, indirect calls for GC barriers. The `GcCompiler`
implementation can inline the parts of GC barrier that it believes should be
inline, and leave out-of-line calls to rare slow paths.
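Per the descriptions above, the rough shape of these two traits is something
like the following (a simplified sketch, not the real definitions):
```rust
/// Runtime half: creates GC heaps, runs collections, and provides the
/// barriers the collector requires. Used as a dynamic trait object
/// (`dyn GcRuntime`), trading some virtual-call overhead for a much
/// less disruptive embedder API.
trait GcRuntime {
    fn new_gc_heap(&self) -> Box<dyn GcHeap>;
}

trait GcHeap {
    /// Run a collection within this heap.
    fn gc(&mut self);
    /// Reset the heap so the pooling allocator can reuse it across stores.
    fn reset(&mut self);
}

/// Compiler half: emits the CLIF implementing GC barriers. Only the Rust
/// code driving compilation goes through a trait object; the generated
/// Wasm code itself makes no dynamic calls for barriers.
trait GcCompiler {
    // e.g. methods to translate reads/writes of GC references into CLIF
    // with inline fast paths and out-of-line calls to rare slow paths.
}
```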
All that said, there is still only a single implementation of each of these
traits: the existing deferred reference-counting (DRC) collector. So there is a
bunch of code motion in this commit as the DRC collector was further isolated
from the rest of the runtime and moved to its own submodule. That said, this was
not *purely* code motion (see "Indexed Heaps" below) so it is worth not simply
skipping over the DRC collector's code in review.
### Indexed Heaps
This commit does bake in a couple assumptions that must be shared across all
collector implementations, such as a shared `VMGcHeader` that all objects
allocated within a GC heap must begin with, but the most notable and
far-reaching of these assumptions is that all collectors will use "indexed
heaps".
What we are calling indexed heaps are basically the three following invariants:
1. All GC heaps will be a single contiguous region of memory, and all GC objects
will be allocated within this region of memory. The collector may ask the
system allocator for additional memory, e.g. to maintain its free lists, but
GC objects themselves will never be allocated via `malloc`.
2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into
the GC heap's contiguous region of memory. We never hold raw pointers to GC
objects (although, of course, we have to compute them and use them
temporarily when actually accessing objects). This means that deref'ing GC
pointers is equivalent to deref'ing linear memory pointers: we need to add a
base and we also check that the GC pointer/index is within the bounds of the
GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly
common technique among high-performance GC
implementations[^compressed-oops][^v8-ptr-compression] so we are in good
company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that
is an element of an `(array (ref any))` is untrusted, and bounds checked on
access. This means that, for example, we do not store the raw pointer to an
`externref`'s host object inside the GC heap. Instead an `externref` now
stores an ID that can be used to index into a side table in the store that
holds the actual `Box<dyn Any>` host object, and accessing that side table is
always checked.
[^compressed-oops]: See ["Compressed OOPs" in
OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)
[^v8-ptr-compression]: See [V8's pointer
compression](https://v8.dev/blog/pointer-compression).
The good news with regards to all the bounds checking that this scheme implies
is that we can use all the same virtual memory tricks that linear memories use
to omit explicit bounds checks. Additionally, (2) means that the sizes of GC
objects are that much smaller (and therefore that much more cache friendly)
because they are only holding onto 32-bit, rather than 64-bit, references to
other GC objects. (We can, in the future, support GC heaps up to 16GiB in size
without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment
and storing aligned indices rather than byte indices, while still leaving the
bottom bit available for tagging as an `i31ref` discriminant. Should we ever
need to support even larger GC heap capacities, we could go to full 64-bit
references, but we would need explicit bounds checks.)
The biggest benefit of indexed heaps is that, because we are (explicitly or
implicitly) bounds checking GC heap accesses, and because we are not otherwise
trusting any data from inside the GC heap, we greatly reduce how badly things
can go wrong in the face of collector bugs and GC heap corruption. We are
essentially sandboxing the GC heap region, the same way that linear memory is a
sandbox. GC bugs could lead to the guest program accessing the wrong GC object,
or getting garbage data from within the GC heap. But only garbage data from
within the GC heap, never outside it. The worst that could happen would be if we
decided not to zero out GC heaps between reuse across stores (which is a valid
trade off to make, since zeroing a GC heap is a defense-in-depth technique
similar to zeroing a Wasm stack and not semantically visible in the absence of
GC bugs) and then a GC bug would allow the current Wasm guest to read old GC
data from the old Wasm guest that previously used this GC heap. But again, it
could never access host data.
Taken altogether, this allows for collector implementations that are nearly free
from `unsafe` code, and unsafety can otherwise be targeted and limited in scope,
such as interactions with JIT code. Most importantly, we do not have to maintain
critical invariants across the whole system -- invariants which can't be nicely
encapsulated or abstracted -- to preserve memory safety. Such holistic
invariants that refuse encapsulation are otherwise generally a huge safety
problem with GC implementations.
### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore
`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here
was to be sure that I was actually calling GC barriers at all the correct
places. I couldn't be sure before. Now, you can still explicitly copy a raw GC
reference without running GC barriers if you need to and understand why that's
okay (aka you are implementing the collector), but that is something you have to
opt into explicitly by calling `unchecked_copy`. The default now is that you
can't just copy the reference, and instead call an explicit `clone` method (not
*the* `Clone` trait, because we need to pass in the GC heap context to run the
GC barriers) and it is hard to forget to do that accidentally. This resulted in
a pretty big amount of churn, but I am wayyyyyy more confident that the correct
GC barriers are called at the correct times now than I was before.
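A minimal mock of that discipline (names follow the description above; not
the actual definitions):
```rust
struct VMGcRef(u32); // deliberately neither `Clone` nor `Copy`

struct GcHeapContext; // stands in for the context needed to run barriers

impl VMGcRef {
    /// Explicit, opt-in copy with no barriers: only for code that is
    /// implementing the collector itself.
    fn unchecked_copy(&self) -> VMGcRef {
        VMGcRef(self.0)
    }

    /// The default way to duplicate a reference: an explicit method (not
    /// the `Clone` trait) so the GC heap context can run clone barriers.
    fn clone(&self, _heap: &mut GcHeapContext) -> VMGcRef {
        // ...the clone barrier would run via `_heap` here...
        VMGcRef(self.0)
    }
}
```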
### `i31ref`
I started this commit by trying to add `i31ref` support. And it grew into the
whole traits interface because I found that I needed to abstract GC barriers
into helpers anyways to avoid running them for `i31ref`s, so I figured that I
might as well add the whole traits interface. In comparison, `i31ref` support is
much easier and smaller than that other part! But it was also difficult to pull
apart from this commit, sorry about that!
---------------------
Overall, I know this is a very large commit. I am super happy to have some
synchronous meetings to walk through this all, give an overview of the
architecture, answer questions directly, etc... to make review easier!
prtest:full
There have been more fuzzbugs than expected and the onslaught of issues is
something I definitely don't have time to deal with right now; let's try
again later in the year (unless someone else wants to drive this!).
This puts the fuzzing logic under an off-by-default feature so it can
still be tested and developed in-tree as desired.
* Update PCC test to expose failure.
* Reduce test coverage only to fully-static (bounds-check-elided) case.
* Configure PCC when fuzzing
* Ensure that we panic for pcc errors in the instantiate fuzz target
* Adjust Wasmtime configuration generation: PCC forces static memory configuration.
* Properly force memory config when fuzzing PCC.
---------
Co-authored-by: Trevor Elliott <telliott@fastly.com>
* Run all `*.wast` tests in fuzzing
Currently we have a `spectest` fuzzer which uses fuzz input to generate
an arbitrary configuration for Wasmtime and then executes the spec test.
This ensures that no matter the configuration Wasmtime can pass spec
tests. This commit expands this testing to include all `*.wast` tests we
have in this repository. While we don't have a ton we still have some
significant ones like in #8118 which will only reproduce when turning
knobs on CPU features.
* Fix CLI build
* Fix wast testing
* Remove type information from dynamic component funcs
This commit removes the `&Component` argument from the
`component::Linker::func_new` API. This is inspired by #8062, where `Val`
holds less type information as well, in addition to the realization that
type-checking happens at runtime rather than at instantiation time.
This argument was originally added to mirror
`wasmtime::Linker::func_new` which takes a type argument of the core
wasm function that's being defined. Unlike core wasm, though, component
functions already have to carry along their type information as part of
function calls to handle resources correctly. This means that when a
host function is invoked the type is already known of all the parameters
and results. Additionally values are already required to be type-checked
going back into wasm, so there's less of a need to perform an additional
type-check up front.
The main consequence of this commit is that it's a bit more difficult
for embeddings to know what the expected types of results are. No type
information is provided when a host function is defined, not even
function arity. This means that when the host function is invoked it may
not know how many results are expected to be produced and of what type.
Typically though a bindings generator is used somewhere along the way so
that's expected to alleviate this issue.
Finally my hope is to enhance this "dynamic" API in the future with a
bit more information so the type information is more readily accessible
at runtime. For now though hosts will have to "simply know what to do".
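Concretely, defining a dynamic host function now looks roughly like this
(hedged sketch; signatures approximate):
```rust
use wasmtime::component::{Linker, Val};
use wasmtime::{Engine, StoreContextMut};

fn define(engine: &Engine) -> anyhow::Result<Linker<()>> {
    let mut linker = Linker::<()>::new(engine);
    // No `&Component` argument and no up-front type information: arity
    // and types are only known when the function is actually invoked.
    linker.root().func_new(
        "log",
        |_store: StoreContextMut<'_, ()>, params: &[Val], _results: &mut [Val]| {
            println!("called with {} params", params.len());
            Ok(())
        },
    )?;
    Ok(linker)
}
```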
* Update crates/wasmtime/src/runtime/component/linker.rs
Co-authored-by: Joel Dice <joel.dice@fermyon.com>
* Fix doc links
* Fix component call benchmarks
---------
Co-authored-by: Joel Dice <joel.dice@fermyon.com>
This commit is a large refactor of the `Val` type as used with
components to remove inherent type information present currently. The
`Val` type is now only an AST of what a component model value looks like
and cannot fully describe the type that it is without further context.
For example enums only store the case that's being used, not the full
set of cases.
The motivation for this commit is to make it simpler to use and
construct `Val`, especially in the face of resources. Some problems
solved here are:
* With resources in play managing type information is not trivial and
can often be surprising. For example if you learn the type of a
function from a component and then instantiate the component twice, the
type information is not suitable to use with either function due to
exported resources acquiring unique types on all instantiations.
* Functionally it's much easier to construct values when type
information is not required as it no longer requires probing various
pieces for type information here and there.
* API-wise there's far less for us to maintain as there's no need for a
type-per-variant of component model types. Pieces now fit much more
naturally into a `Val` shape without extra types.
* Functionally when working with `Val` there's now only one typecheck
instead of two. Previously a typecheck was performed first when a
`Val` was created and then again later when it was passed to wasm. Now
the typecheck only happens when passed to wasm.
It's worth pointing out that `Val` as-is is a pretty inefficient
representation of component model values, for example flags are stored
as a list of strings. While semantically correct this is quite
inefficient for most purposes other than "get something working". To
that extent my goal is to, in the future, add traits that enable
building a custom user-defined `Val` (of sorts), but still dynamically.
This should enable embedders to opt-in to a more efficient
representation that relies on contextual knowledge.
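For example, after this change a `Val` is just the value's shape (hedged
sketch; variant definitions approximate):
```rust
use wasmtime::component::Val;

fn main() {
    // An enum value stores only the case in use, not the full set of cases:
    let color = Val::Enum("green".to_string());
    // Flags are a list of set flag names: semantically correct, though
    // inefficient, as noted above.
    let perms = Val::Flags(vec!["read".to_string(), "write".to_string()]);
    let _ = (color, perms);
}
```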
* Define garbage collection rooting APIs
Rooting prevents GC objects from being collected while they are actively being
used.
We have a few sometimes-conflicting goals with our GC rooting APIs:
1. Safety: It should never be possible to get a use-after-free bug because the
user misused the rooting APIs, the collector "mistakenly" determined an
object was unreachable and collected it, and then the user tried to access
the object. This is our highest priority.
2. Moving GC: Our rooting APIs should support moving collectors (such as generational
and compacting collectors) where an object might get relocated after a
collection and we need to update the GC root's pointer to the moved
object. This means we either need cooperation and internal mutability from
individual GC roots as well as the ability to enumerate all GC roots on the
native Rust stack, or we need a level of indirection.
3. Performance: Our rooting APIs should generally be as low-overhead as
possible. They definitely shouldn't require synchronization and locking to
create, access, and drop GC roots.
4. Ergonomics: Our rooting APIs should be, if not a pleasure, then at least not
a burden for users. Additionally, the API's types should be `Sync` and `Send`
so that they work well with async Rust.
For example, goals (3) and (4) are in conflict when we think about how to
support (2). Ideally, for ergonomics, a root would automatically unroot itself
when dropped. But in the general case that requires holding a reference to the
store's root set, and that root set needs to be held simultaneously by all GC
roots, and they each need to mutate the set to unroot themselves. That implies
`Rc<RefCell<...>>` or `Arc<Mutex<...>>`! The former makes the store and GC root
types not `Send` and not `Sync`. The latter imposes synchronization and locking
overhead. So we instead make GC roots indirect and require passing in a store
context explicitly to unroot in the general case. This trades worse ergonomics
for better performance and support for moving GC and async Rust.
Okay, with that out of the way, this module provides two flavors of rooting
API. One for the common, scoped lifetime case, and another for the rare case
where we really need a GC root with an arbitrary, non-LIFO/non-scoped lifetime:
1. `RootScope` and `Rooted<T>`: These are used for temporarily rooting GC
objects for the duration of a scope. Upon exiting the scope, they are
automatically unrooted. The internal implementation takes advantage of the
LIFO property inherent in scopes, making creating and dropping `Rooted<T>`s
and `RootScope`s super fast and roughly equivalent to bump allocation.
This type is vaguely similar to V8's [`HandleScope`].
[`HandleScope`]: https://v8.github.io/api/head/classv8_1_1HandleScope.html
Note that `Rooted<T>` can't be statically tied to its context scope via a
lifetime parameter, unfortunately, as that would allow the creation and use
of only one `Rooted<T>` at a time, since the `Rooted<T>` would take a borrow
of the whole context.
This supports the common use case for rooting and provides good ergonomics.
2. `ManuallyRooted<T>`: This is the fully general rooting API used for holding
onto non-LIFO GC roots with arbitrary lifetimes. However, users must manually
unroot them. Failure to manually unroot a `ManuallyRooted<T>` before it is
dropped will result in the GC object (and everything it transitively
references) leaking for the duration of the `Store`'s lifetime.
This type is roughly similar to SpiderMonkey's [`PersistentRooted<T>`],
although they avoid the manual-unrooting with internal mutation and shared
references. (Our constraints mean we can't do those things, as
explained above.)
[`PersistentRooted<T>`]: http://devdoc.net/web/developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS::PersistentRooted.html
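A hypothetical usage sketch of the two flavors (shapes follow the
descriptions above; exact signatures vary across versions):
```rust
use wasmtime::{ExternRef, ManuallyRooted, RootScope, Rooted, Store};

fn scoped_use(store: &mut Store<()>) -> anyhow::Result<()> {
    let mut scope = RootScope::new(&mut *store);
    // Rooted only for the duration of `scope`; unrooted automatically
    // when the scope exits (LIFO, roughly bump-allocation cheap).
    let r: Rooted<ExternRef> = ExternRef::new(&mut scope, "hello")?;
    let _ = r;
    Ok(())
}

fn long_lived(store: &mut Store<()>) -> anyhow::Result<ManuallyRooted<ExternRef>> {
    let mut scope = RootScope::new(&mut *store);
    let r = ExternRef::new(&mut scope, 42)?;
    // Arbitrary-lifetime root: the caller must eventually unroot it
    // explicitly or the object leaks for the store's lifetime.
    let root = r.to_manually_rooted(&mut scope)?;
    Ok(root)
}
```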
At the end of the day, both `Rooted<T>` and `ManuallyRooted<T>` are just tagged
indices into the store's `RootSet`. This indirection allows working with Rust's
borrowing discipline (we use `&mut Store` to represent mutable access to the GC
heap) while still allowing rooted references to be moved around without tying up
the whole store in borrows. Additionally, and crucially, this indirection allows
us to update the *actual* GC pointers in the `RootSet` and support moving GCs
(again, as mentioned above).
* Reorganize GC-related submodules in `wasmtime-runtime`
* Reorganize GC-related submodules in `wasmtime`
* Use `Into<StoreContext[Mut]<'a, T>>` for `ExternRef::data[_mut]` methods
* Run rooting tests under MIRI
* Make `into_abi` take an `AutoAssertNoGc`
* Don't use atomics to update externref ref counts anymore
* Try to make lifetimes/safety more-obviously correct
Remove some transmute methods, assert that `VMExternRef`s are the only valid
`VMGcRef`, etc.
* Update externref constructor examples
* Make `GcRefImpl::transmute_ref` a non-default trait method
* Make inline fast paths for GC LIFO scopes
* Make `RootSet::unroot_gc_ref` an `unsafe` function
* Move Hash and Eq for Rooted, move to impl methods
* Remove type parameter from `AutoAssertNoGc`
Just wrap a `&mut StoreOpaque` directly.
* Make a bunch of internal `ExternRef` methods that deal with raw `VMGcRef`s take `AutoAssertNoGc` instead of `StoreOpaque`
* Fix compile after rebase
* rustfmt
* revert unrelated egraph changes
* Fix non-gc build
* Mark `AutoAssertNoGc` methods inline
* review feedback
* Temporarily remove externref support from the C API
Until we can add proper GC rooting.
* Remove doxygen reference to temp deleted function
* Remove need to `allow(private_interfaces)`
* Fix call benchmark compilation
* Wasmtime: Add a `gc` cargo feature
This controls whether support for `ExternRef` and its associated deferred,
reference-counting garbage collector is enabled at compile time or not. It will
also be used similarly for Wasmtime's full Wasm GC support as that gets
added.
* Add CI for `gc` Cargo feature
* Cut down on the number of `#[cfg(feature = "gc")]`s outside the implementation of `[VM]ExternRef`
* Fix wasmparser reference types configuration with GC disabled/enabled
* More config fix
* doc cfg
* Make the dummy `VMExternRefActivationsTable` inhabited
* Fix winch tests
* final review bits
* Enable wasmtime's gc cargo feature for the C API
* Enable wasmtime's gc cargo feature from wasmtime-cli-flags
* enable gc cargo feature in a couple other crates