rust jit wasm webassembly runtime wasi cranelift sandbox wasmtime

History

Nick Fitzgerald 0fa130131d Add `GcRuntime` and `GcCompiler` traits; `i31ref` support (#8196 ) \### The `GcRuntime` and `GcCompiler` Traits This commit factors out the details of the garbage collector away from the rest of the runtime and the compiler. It does this by introducing two new traits, very similar to a subset of [those proposed in the Wasm GC RFC], although not all equivalent functionality has been added yet because Wasmtime doesn't support, for example, GC structs yet: [those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface 1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run collections within them, and execute the various GC barriers the collector requires. Rather than monomorphize all of Wasmtime on this trait, we use it as a dynamic trait object. This does imply some virtual call overhead and missing some inlining (and resulting post-inlining) optimization opportunities. However, it is much less disruptive to the existing embedder API, results in a cleaner embedder API anyways, and we don't believe that VM runtime/embedder code is on the hot path for working with the GC at this time anyways (that would be the actual Wasm code, which has inlined GC barriers and direct calls and all of that). In the future, once we have optimized enough of the GC that such code is ever hot, we have options we can investigate at that time to avoid these dynamic virtual calls, like only enabling one single collector at build time and then creating a static type alias like `type TheOneGcImpl = ...;` based on the compile time configuration, and using this type alias in the runtime rather than a dynamic trait object. The `GcRuntime` trait additionally defines a method to reset a GC heap, for use by the pooling allocator. This allows reuse of GC heaps across different stores. This integration is very rudimentary at the moment, and is missing all kinds of configuration knobs that we should have before deploying Wasm GC in production. This commit is large enough as it is already! Ideally, in the future, I'd like to make it so that GC heaps receive their memory region, rather than allocate/reserve it themselves, and let each slot in the pooling allocator's memory pool be either a linear memory or a GC heap. This would unask various capacity planning questions such as "what percent of memory capacity should we dedicate to linear memories vs GC heaps?". It also seems like basically all the same configuration knobs we have for linear memories apply equally to GC heaps (see also the "Indexed Heaps" section below). 2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements GC barriers for various operations on GC-managed references. The Rust code calls into this trait dynamically via a trait object, but since it is customizing the CLIF that is generated for Wasm code, the Wasm code itself is not making dynamic, indirect calls for GC barriers. The `GcCompiler` implementation can inline the parts of GC barrier that it believes should be inline, and leave out-of-line calls to rare slow paths. All that said, there is still only a single implementation of each of these traits: the existing deferred reference-counting (DRC) collector. So there is a bunch of code motion in this commit as the DRC collector was further isolated from the rest of the runtime and moved to its own submodule. That said, this was not purely code motion (see "Indexed Heaps" below) so it is worth not simply skipping over the DRC collector's code in review. \### Indexed Heaps This commit does bake in a couple assumptions that must be shared across all collector implementations, such as a shared `VMGcHeader` that all objects allocated within a GC heap must begin with, but the most notable and far-reaching of these assumptions is that all collectors will use "indexed heaps". What we are calling indexed heaps are basically the three following invariants: 1. All GC heaps will be a single contiguous region of memory, and all GC objects will be allocated within this region of memory. The collector may ask the system allocator for additional memory, e.g. to maintain its free lists, but GC objects themselves will never be allocated via `malloc`. 2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into the GC heap's contiguous region of memory. We never hold raw pointers to GC objects (although, of course, we have to compute them and use them temporarily when actually accessing objects). This means that deref'ing GC pointers is equivalent to deref'ing linear memory pointers: we need to add a base and we also check that the GC pointer/index is within the bounds of the GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly common technique among high-performance GC implementations[^compressed-oops][^v8-ptr-compression] so we are in good company. 3. Anything stored inside the GC heap is untrusted. Even each GC reference that is an element of an `(array (ref any))` is untrusted, and bounds checked on access. This means that, for example, we do not store the raw pointer to an `externref`'s host object inside the GC heap. Instead an `externref` now stores an ID that can be used to index into a side table in the store that holds the actual `Box<dyn Any>` host object, and accessing that side table is always checked. [^compressed-oops]: See ["Compressed OOPs" in OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops) [^v8-ptr-compression]: See [V8's pointer compression](https://v8.dev/blog/pointer-compression). The good news with regards to all the bounds checking that this scheme implies is that we can use all the same virtual memory tricks that linear memories use to omit explicit bounds checks. Additionally, (2) means that the sizes of GC objects is that much smaller (and therefore that much more cache friendly) because they are only holding onto 32-bit, rather than 64-bit, references to other GC objects. (We can, in the future, support GC heaps up to 16GiB in size without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment and storing aligned indices rather than byte indices, while still leaving the bottom bit available for tagging as an `i31ref` discriminant. Should we ever need to support even larger GC heap capacities, we could go to full 64-bit references, but we would need explicit bounds checks.) The biggest benefit of indexed heaps is that, because we are (explicitly or implicitly) bounds checking GC heap accesses, and because we are not otherwise trusting any data from inside the GC heap, we greatly reduce how badly things can go wrong in the face of collector bugs and GC heap corruption. We are essentially sandboxing the GC heap region, the same way that linear memory is a sandbox. GC bugs could lead to the guest program accessing the wrong GC object, or getting garbage data from within the GC heap. But only garbage data from within the GC heap, never outside it. The worse that could happen would be if we decided not to zero out GC heaps between reuse across stores (which is a valid trade off to make, since zeroing a GC heap is a defense-in-depth technique similar to zeroing a Wasm stack and not semantically visible in the absence of GC bugs) and then a GC bug would allow the current Wasm guest to read old GC data from the old Wasm guest that previously used this GC heap. But again, it could never access host data. Taken altogether, this allows for collector implementations that are nearly free from `unsafe` code, and unsafety can otherwise be targeted and limited in scope, such as interactions with JIT code. Most importantly, we do not have to maintain critical invariants across the whole system -- invariants which can't be nicely encapsulated or abstracted -- to preserve memory safety. Such holistic invariants that refuse encapsulation are otherwise generally a huge safety problem with GC implementations. \### `VMGcRef` is NOT `Clone` or `Copy` Anymore `VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here was to be sure that I was actually calling GC barriers at all the correct places. I couldn't be sure before. Now, you can still explicitly copy a raw GC reference without running GC barriers if you need to and understand why that's okay (aka you are implementing the collector), but that is something you have to opt into explicitly by calling `unchecked_copy`. The default now is that you can't just copy the reference, and instead call an explicit `clone` method (not the `Clone` trait, because we need to pass in the GC heap context to run the GC barriers) and it is hard to forget to do that accidentally. This resulted in a pretty big amount of churn, but I am wayyyyyy more confident that the correct GC barriers are called at the correct times now than I was before. \### `i31ref` I started this commit by trying to add `i31ref` support. And it grew into the whole traits interface because I found that I needed to abstract GC barriers into helpers anyways to avoid running them for `i31ref`s, so I figured that I might as well add the whole traits interface. In comparison, `i31ref` support is much easier and smaller than that other part! But it was also difficult to pull apart from this commit, sorry about that! --------------------- Overall, I know this is a very large commit. I am super happy to have some synchronous meetings to walk through this all, give an overview of the architecture, answer questions directly, etc... to make review easier! prtest:full		7 months ago
..
fuzz_targets	Add `GcRuntime` and `GcCompiler` traits; `i31ref` support (#8196)	7 months ago
.gitignore	fuzzing: remove the in-repo corpus	5 years ago
Cargo.toml	Run all `*.wast` tests in fuzzing (#8121)	8 months ago
README.md	fuzzgen: Allow restricting generated opcodes with env var (#7433)	1 year ago
build.rs	Update some CI dependencies (#7983)	9 months ago

README.md

`cargo fuzz` Targets for Wasmtime

This crate defines various libFuzzer fuzzing targets for Wasmtime, which can be run via cargo fuzz.

These fuzz targets just glue together pre-defined test case generators with oracles and pass libFuzzer-provided inputs to them. The test case generators and oracles themselves are independent from the fuzzing engine that is driving the fuzzing process and are defined in wasmtime/crates/fuzzing.

Example

To start fuzzing run the following command, where $MY_FUZZ_TARGET is one of the available fuzz targets:

cargo fuzz run $MY_FUZZ_TARGET

Available Fuzz Targets

At the time of writing, we have the following fuzz targets:

api_calls: stress the Wasmtime API by executing sequences of API calls; only the subset of the API is currently supported.
compile: Attempt to compile libFuzzer's raw input bytes with Wasmtime.
compile-maybe-invalid: Attempt to compile a wasm-smith-generated Wasm module with code sequences that may be invalid.
cranelift-fuzzgen: Generate a Cranelift function and check that it returns the same results when compiled to the host and when using the Cranelift interpreter; only a subset of Cranelift IR is currently supported.
cranelift-icache: Generate a Cranelift function A, applies a small mutation to its source, yielding a function A', and checks that A compiled + incremental compilation generates the same machine code as if A' was compiled from scratch.
differential: Generate a Wasm module, evaluate each exported function with random inputs, and check that Wasmtime returns the same results as a choice of another engine: the Wasm spec interpreter (see the wasm-spec-interpreter crate), the wasmi interpreter, V8 (through the v8 crate), or Wasmtime itself run with a different configuration.
instantiate: Generate a Wasm module and Wasmtime configuration and attempt to compile and instantiate with them.
instantiate-many: Generate many Wasm modules and attempt to compile and instantiate them concurrently.
spectests: Pick a random spec test and run it with a generated configuration.
table_ops: Generate a sequence of externref table operations and run them in a GC environment.

The canonical list of fuzz targets is the .rs files in the fuzz_targets directory:

ls wasmtime/fuzz/fuzz_targets/

Corpora

While you can start from scratch, libFuzzer will work better if it is given a corpus of seed inputs to kick start the fuzzing process. We maintain a corpus for each of these fuzz targets in a dedicated repo on github.

You can use our corpora by cloning it and placing it at wasmtime/fuzz/corpus:

git clone \
    https://github.com/bytecodealliance/wasmtime-libfuzzer-corpus.git \
    wasmtime/fuzz/corpus

Reproducing a Fuzz Bug

When investigating a fuzz bug (especially one found by OSS-Fuzz), use the following steps to reproduce it locally:

Download the test case (either the "Minimized Testcase" or "Unminimized Testcase" from OSS-Fuzz will do).
Run the test case in the correct fuzz target:
```
cargo +nightly fuzz run <target> <test case>
```
If all goes well, the bug should reproduce and libFuzzer will dump the failure stack trace to stdout
For more debugging information, run the command above with RUST_LOG=debug to print the configuration and WebAssembly input used by the test case (see uses of log_wasm in the wasmtime-fuzzing crate).

Target specific options

`cranelift-fuzzgen`

Fuzzgen supports passing the FUZZGEN_ALLOWED_OPS environment variable, which when available restricts the instructions that it will generate.

Running FUZZGEN_ALLOWED_OPS=ineg,ishl cargo fuzz run cranelift-fuzzgen will run fuzzgen but only generate ineg or ishl opcodes.

`cranelift-icache`

The icache target also uses the fuzzgen library, thus also supports the FUZZGEN_ALLOWED_OPS enviornment variable as described in the cranelift-fuzzgen section above.