Tree: 6907868078

Branches:
cfallin/lucet-pr612-base
fitzgen-patch-1
main
pch/bound_tcp_userland_buffer
pch/bump_wasm_tools_210
pch/cli_wasi_legacy
pch/component_call_hooks
pch/resource_table
pch/resource_table_2
pch/upstream_wave
release-0.32.0
release-0.33.0
release-0.34.0
release-0.35.0
release-0.36.0
release-0.37.0
release-0.38.0
release-0.39.0
release-0.40.0
release-1.0.0
release-10.0.0
release-11.0.0
release-12.0.0
release-13.0.0
release-14.0.0
release-15.0.0
release-16.0.0
release-17.0.0
release-18.0.0
release-19.0.0
release-2.0.0
release-20.0.0
release-21.0.0
release-22.0.0
release-23.0.0
release-24.0.0
release-3.0.0
release-4.0.0
release-5.0.0
release-6.0.0
release-7.0.0
release-8.0.0
release-9.0.0
revert-9191-trevor/upgrade-regalloc
revert-union-find
stable-v0.26
trevor/fuzz-pcc
trevor/hyper-rc4
trevor/io-error-interface

Tags:
0.2.0
0.3.0
cranelift-v0.31.0
cranelift-v0.32.0
cranelift-v0.33.0
cranelift-v0.34.0
cranelift-v0.35.0
cranelift-v0.36.0
cranelift-v0.37.0
cranelift-v0.39.0
cranelift-v0.40.0
cranelift-v0.41.0
cranelift-v0.42.0
cranelift-v0.43.0
cranelift-v0.43.1
cranelift-v0.44.0
cranelift-v0.45.0
cranelift-v0.46.0
cranelift-v0.46.1
cranelift-v0.60.0
cranelift-v0.61.0
cranelift-v0.62.0
cranelift-v0.69.0
dev
filecheck-v0.0.1
minimum-viable-wasi-proxy-serve
v0.10.0
v0.11.0
v0.12.0
v0.15.0
v0.16.0
v0.17.0
v0.18.0
v0.19.0
v0.2.0
v0.20.0
v0.21.0
v0.22.0
v0.22.1
v0.23.0
v0.24.0
v0.25.0
v0.26.0
v0.26.1
v0.27.0
v0.28.0
v0.29.0
v0.3.0
v0.30.0
v0.31.0
v0.32.0
v0.32.1
v0.33.0
v0.33.1
v0.34.0
v0.34.1
v0.34.2
v0.35.0
v0.35.1
v0.35.2
v0.35.3
v0.36.0
v0.37.0
v0.38.0
v0.38.1
v0.38.2
v0.38.3
v0.39.0
v0.39.1
v0.4.0
v0.40.0
v0.40.1
v0.6.0
v0.8.0
v0.9.0
v1.0.0
v1.0.1
v1.0.2
v10.0.0
v10.0.1
v10.0.2
v11.0.0
v11.0.1
v11.0.2
v12.0.0
v12.0.1
v12.0.2
v13.0.0
v13.0.1
v14.0.0
v14.0.1
v14.0.2
v14.0.3
v14.0.4
v15.0.0
v15.0.1
v16.0.0
v17.0.0
v17.0.1
v17.0.2
v17.0.3
v18.0.0
v18.0.1
v18.0.2
v18.0.3
v18.0.4
v19.0.0
v19.0.1
v19.0.2
v2.0.0
v2.0.1
v2.0.2
v20.0.0
v20.0.1
v20.0.2
v21.0.0
v21.0.1
v22.0.0
v23.0.0
v23.0.1
v23.0.2
v24.0.0
v3.0.0
v3.0.1
v4.0.0
v4.0.1
v5.0.0
v5.0.1
v6.0.0
v6.0.1
v6.0.2
v7.0.0
v7.0.1
v8.0.0
v8.0.1
v9.0.0
v9.0.1
v9.0.2
v9.0.3
v9.0.4
6 Commits (69078680785df7330dd967d00f8d82cf590b5179)
Jamey Sharp | 3befbe52c9 | 7 months ago

cranelift: Drop unused arguments before regalloc (#8438)
Before this, Cranelift ABI code would emit a stack-load instruction for every stack argument and add all register arguments to the `args` pseudo-instruction, whether those arguments were used or not. However, we already know which arguments are used at that point because we need the analysis for load-sinking, so it's easy to filter the unused arguments out.

This avoids generating loads that are immediately dead, which is good for the generated code. It also slightly reduces the size of the register allocation problem, which is a small win in compile time. This also changes which registers RA2 chooses in some cases because it no longer considers unused defs from the `args` pseudo-instruction.

There was an existing method named `arg_is_needed_in_body` which sounded like it should be the right place to implement this. However, that method was only used for Baldrdash integration and has been a stub since that integration was removed in #4571. Also it didn't have access to the `value_ir_uses` map needed here. But the place where that method was called does have access to that map and was perfect for this.

Also, don't emit a `dummy_use` pseudo-instruction for the vmctx if it's otherwise unused everywhere, as we want to drop it from the `args` instruction in that case and then RA2 complains that it's used without being defined. Furthermore, don't emit debug info specially for the vmctx parameter, because it's already emitted for all block parameters including vmctx.

Thanks to @elliottt for doing the initial investigation of this change with me, and to @cfallin for helping me track down the `dummy_use` false dependency.
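A minimal sketch of the filtering this describes, with hypothetical names standing in for Cranelift's actual internals (the commit's `value_ir_uses` is a map; it is treated here as a simple set of used values):

```rust
use std::collections::HashSet;

/// Stand-in for Cranelift's `ir::Value`; purely illustrative.
type Value = u32;

/// Keep only the arguments the function body actually uses. The use
/// set is already computed for load-sinking, so this filter is cheap:
/// unused register args drop out of the `args` pseudo-instruction and
/// unused stack args never get a stack-load emitted at all.
fn used_arguments(all_args: &[Value], value_ir_uses: &HashSet<Value>) -> Vec<Value> {
    all_args
        .iter()
        .copied()
        .filter(|arg| value_ir_uses.contains(arg))
        .collect()
}
```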
Nick Fitzgerald | 0fa130131d | 7 months ago

Add `GcRuntime` and `GcCompiler` traits; `i31ref` support (#8196)
### The `GcRuntime` and `GcCompiler` Traits

This commit factors out the details of the garbage collector away from the rest of the runtime and the compiler. It does this by introducing two new traits, very similar to a subset of [those proposed in the Wasm GC RFC], although not all equivalent functionality has been added yet because Wasmtime doesn't support, for example, GC structs yet:

[those proposed in the Wasm GC RFC]: https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface

1. The `GcRuntime` trait: This trait defines how to create new GC heaps, run collections within them, and execute the various GC barriers the collector requires.

   Rather than monomorphize all of Wasmtime on this trait, we use it as a dynamic trait object. This does imply some virtual call overhead and missing some inlining (and resulting post-inlining) optimization opportunities. However, it is *much* less disruptive to the existing embedder API, results in a cleaner embedder API anyways, and we don't believe that VM runtime/embedder code is on the hot path for working with the GC at this time anyways (that would be the actual Wasm code, which has inlined GC barriers and direct calls and all of that). In the future, once we have optimized enough of the GC that such code is ever hot, we have options we can investigate at that time to avoid these dynamic virtual calls, like only enabling one single collector at build time and then creating a static type alias like `type TheOneGcImpl = ...;` based on the compile time configuration, and using this type alias in the runtime rather than a dynamic trait object.

   The `GcRuntime` trait additionally defines a method to reset a GC heap, for use by the pooling allocator. This allows reuse of GC heaps across different stores. This integration is very rudimentary at the moment, and is missing all kinds of configuration knobs that we should have before deploying Wasm GC in production. This commit is large enough as it is already! Ideally, in the future, I'd like to make it so that GC heaps receive their memory region, rather than allocate/reserve it themselves, and let each slot in the pooling allocator's memory pool be *either* a linear memory or a GC heap. This would unask various capacity planning questions such as "what percent of memory capacity should we dedicate to linear memories vs GC heaps?". It also seems like basically all the same configuration knobs we have for linear memories apply equally to GC heaps (see also the "Indexed Heaps" section below).

2. The `GcCompiler` trait: This trait defines how to emit CLIF that implements GC barriers for various operations on GC-managed references.

   The Rust code calls into this trait dynamically via a trait object, but since it is customizing the CLIF that is generated for Wasm code, the Wasm code itself is not making dynamic, indirect calls for GC barriers. The `GcCompiler` implementation can inline the parts of a GC barrier that it believes should be inline, and leave out-of-line calls to rare slow paths.

All that said, there is still only a single implementation of each of these traits: the existing deferred reference-counting (DRC) collector. So there is a bunch of code motion in this commit as the DRC collector was further isolated from the rest of the runtime and moved to its own submodule. That said, this was not *purely* code motion (see "Indexed Heaps" below) so it is worth not simply skipping over the DRC collector's code in review.
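A condensed sketch of the trait-object shape described in (1) and (2) above; all names and signatures here are illustrative rather than Wasmtime's actual API:

```rust
/// How to create GC heaps and run collections; held as a trait
/// object, so Wasmtime is not monomorphized over the collector.
trait GcRuntime {
    fn new_heap(&self) -> Box<dyn GcHeap>;
}

trait GcHeap {
    /// Run a collection within this heap.
    fn collect(&mut self);
    /// Reset the heap so the pooling allocator can reuse it for a
    /// different store.
    fn reset(&mut self);
}

/// How to emit CLIF for GC barriers: the dynamic dispatch happens at
/// compile time, while the generated Wasm code itself contains the
/// inlined fast paths and out-of-line slow-path calls.
trait GcCompiler {
    fn emit_read_barrier(&self /* , clif_builder, gc_ref, ... */);
}

/// The engine holds the collector behind `dyn`, keeping the embedder
/// API free of a collector type parameter.
struct Engine {
    gc: Box<dyn GcRuntime>,
}
```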
### Indexed Heaps

This commit does bake in a couple assumptions that must be shared across all collector implementations, such as a shared `VMGcHeader` that all objects allocated within a GC heap must begin with, but the most notable and far-reaching of these assumptions is that all collectors will use "indexed heaps". What we are calling indexed heaps are basically the three following invariants:

1. All GC heaps will be a single contiguous region of memory, and all GC objects will be allocated within this region of memory. The collector may ask the system allocator for additional memory, e.g. to maintain its free lists, but GC objects themselves will never be allocated via `malloc`.

2. A pointer to a GC-managed object (i.e. a `VMGcRef`) is a 32-bit offset into the GC heap's contiguous region of memory. We never hold raw pointers to GC objects (although, of course, we have to compute them and use them temporarily when actually accessing objects). This means that deref'ing GC pointers is equivalent to deref'ing linear memory pointers: we need to add a base and we also check that the GC pointer/index is within the bounds of the GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly common technique among high-performance GC implementations[^compressed-oops][^v8-ptr-compression] so we are in good company.

3. Anything stored inside the GC heap is untrusted. Even each GC reference that is an element of an `(array (ref any))` is untrusted, and bounds checked on access. This means that, for example, we do not store the raw pointer to an `externref`'s host object inside the GC heap. Instead an `externref` now stores an ID that can be used to index into a side table in the store that holds the actual `Box<dyn Any>` host object, and accessing that side table is always checked.

[^compressed-oops]: See ["Compressed OOPs" in OpenJDK.](https://wiki.openjdk.org/display/HotSpot/CompressedOops)

[^v8-ptr-compression]: See [V8's pointer compression](https://v8.dev/blog/pointer-compression).

The good news with regards to all the bounds checking that this scheme implies is that we can use all the same virtual memory tricks that linear memories use to omit explicit bounds checks. Additionally, (2) means that the sizes of GC objects are that much smaller (and therefore that much more cache friendly) because they are only holding onto 32-bit, rather than 64-bit, references to other GC objects.

(We can, in the future, support GC heaps up to 16GiB in size without losing 32-bit GC pointers by taking advantage of `VMGcHeader` alignment and storing aligned indices rather than byte indices, while still leaving the bottom bit available for tagging as an `i31ref` discriminant. Should we ever need to support even larger GC heap capacities, we could go to full 64-bit references, but we would need explicit bounds checks.)

The biggest benefit of indexed heaps is that, because we are (explicitly or implicitly) bounds checking GC heap accesses, and because we are not otherwise trusting any data from inside the GC heap, we greatly reduce how badly things can go wrong in the face of collector bugs and GC heap corruption. We are essentially sandboxing the GC heap region, the same way that linear memory is a sandbox. GC bugs could lead to the guest program accessing the wrong GC object, or getting garbage data from within the GC heap. But only garbage data from within the GC heap, never outside it.
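A minimal sketch of invariant (2): deref'ing a 32-bit GC index works like a linear-memory access, adding a base and bounds-checking the index (illustrative types, not Wasmtime's):

```rust
/// A 32-bit index into the GC heap, never a raw pointer; an
/// illustrative stand-in for `VMGcRef`.
struct GcRef(u32);

/// The GC heap's single contiguous region of memory.
struct GcHeap {
    base: *mut u8,
    len: u32,
}

impl GcHeap {
    /// Deref'ing a GC ref mirrors deref'ing a linear-memory pointer:
    /// add the base and check the index against the heap bounds. With
    /// virtual-memory guard regions the explicit check can often be
    /// omitted, just as for linear memories.
    fn deref(&self, gc_ref: GcRef, size: u32) -> Option<*mut u8> {
        let end = gc_ref.0.checked_add(size)?;
        if end > self.len {
            return None;
        }
        // The index has been bounds-checked; the raw pointer is only
        // computed temporarily, for the duration of the access.
        Some(unsafe { self.base.add(gc_ref.0 as usize) })
    }
}
```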
The worst that could happen would be if we decided not to zero out GC heaps between reuse across stores (which is a valid trade off to make, since zeroing a GC heap is a defense-in-depth technique similar to zeroing a Wasm stack and not semantically visible in the absence of GC bugs) and then a GC bug would allow the current Wasm guest to read old GC data from the old Wasm guest that previously used this GC heap. But again, it could never access host data.

Taken altogether, this allows for collector implementations that are nearly free from `unsafe` code, and unsafety can otherwise be targeted and limited in scope, such as interactions with JIT code. Most importantly, we do not have to maintain critical invariants across the whole system -- invariants which can't be nicely encapsulated or abstracted -- to preserve memory safety. Such holistic invariants that refuse encapsulation are otherwise generally a huge safety problem with GC implementations.

### `VMGcRef` is *NOT* `Clone` or `Copy` Anymore

`VMGcRef` used to be `Clone` and `Copy`. It is not anymore. The motivation here was to be sure that I was actually calling GC barriers at all the correct places. I couldn't be sure before. Now, you can still explicitly copy a raw GC reference without running GC barriers if you need to and understand why that's okay (aka you are implementing the collector), but that is something you have to opt into explicitly by calling `unchecked_copy`. The default now is that you can't just copy the reference, and instead call an explicit `clone` method (not *the* `Clone` trait, because we need to pass in the GC heap context to run the GC barriers) and it is hard to forget to do that accidentally. This resulted in a pretty big amount of churn, but I am wayyyyyy more confident that the correct GC barriers are called at the correct times now than I was before.

### `i31ref`

I started this commit by trying to add `i31ref` support. And it grew into the whole traits interface because I found that I needed to abstract GC barriers into helpers anyways to avoid running them for `i31ref`s, so I figured that I might as well add the whole traits interface. In comparison, `i31ref` support is much easier and smaller than that other part! But it was also difficult to pull apart from this commit, sorry about that!

---------------------

Overall, I know this is a very large commit. I am super happy to have some synchronous meetings to walk through this all, give an overview of the architecture, answer questions directly, etc... to make review easier! prtest:full
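Circling back to the `VMGcRef` section above, a small sketch of that ownership discipline (illustrative types, not Wasmtime's actual ones): copying is no longer implicit, cloning takes the GC heap context so barriers can run, and raw copies require a loudly named opt-in.

```rust
/// No `#[derive(Clone, Copy)]` here: implicit copies are gone.
struct VMGcRef(u32);

/// Stand-in for whatever context the collector needs to run barriers.
struct GcHeapContext;

impl VMGcRef {
    /// An inherent `clone` method rather than the `Clone` trait,
    /// because the GC heap context is needed to run clone barriers.
    fn clone(&self, _heap: &mut GcHeapContext) -> VMGcRef {
        // ... run the collector's clone barrier here ...
        VMGcRef(self.0)
    }

    /// For collector implementations only: copy the raw reference
    /// without running barriers. Deliberately noisy to call.
    fn unchecked_copy(&self) -> VMGcRef {
        VMGcRef(self.0)
    }
}
```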
Alex Crichton | 70b076d3e5 | 8 months ago

Migrate all Winch filetests to `tests/disas` (#8243)
* Switch Winch tests to ATT syntax
* Update all test expectations
* Move all winch tests to `disas` folder
* Add `test = "winch"` to `disas`
* Add `test = "winch"` to all winch test files
* Stub out bits to get AArch64 Winch tests working
* Update expectations for all aarch64 winch tests
* Update flags in Winch tests

  Use CLI syntax as that's what `flags` was repurposed as in the new test suite.

* Update all test expectations for x64 winch
* Omit more offsets by default
* Delete now-dead code
* Update an error message
* Update non-winch test expectations
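For reference, a migrated Winch filetest plausibly opens with a directive header along these lines; the `;;!` syntax is the `tests/disas` suite's, but this particular file and its contents are hypothetical:

```wat
;;! target = "x86_64"
;;! test = "winch"

(module
  (func (export "answer") (result i32)
    (i32.const 42)))
```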
Alex Crichton | ad308105fb | 8 months ago

Disassemble `*.cwasm` for `compile` disas tests (#8237)
* Disassemble `*.cwasm` for `compile` disas tests

  This commit changes how the `compile` mode of the `disas` test suite works. Previously this would use `--emit-clif` and run the Cranelift pipeline for each individual function and use the custom VCode-based disassembly for instruction output. This commit instead uses the raw binary coming out of Wasmtime. The ELF file itself is parsed and is disassembled in a manner similar to Winch tests.

  The goal of this commit is somewhat twofold:

  * Lay the groundwork to migrate all Winch-based filetests to `tests/disas`.
  * Test the raw output from Cranelift/Wasmtime which includes optimizations like branch chomping in the `MachBuffer`.

  This commit doesn't itself move the Winch tests yet, that's left for a future commit.

* Update all test expectations for new output
* Fix PR-based CI when too many files are changed
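A rough sketch of the approach, assuming the harness parses the ELF with the `object` crate and disassembles the raw bytes with `capstone`; the actual harness differs in detail:

```rust
use capstone::prelude::*;
use object::{Object, ObjectSection};

fn disassemble_cwasm(cwasm: &[u8]) -> anyhow::Result<String> {
    // A `*.cwasm` produced by `wasmtime compile` is an ELF image;
    // the compiled machine code lives in its text section.
    let elf = object::File::parse(cwasm)?;
    let text = elf
        .section_by_name(".text")
        .ok_or_else(|| anyhow::anyhow!("no .text section"))?;
    let code = text.data()?;

    // Disassemble the final bytes rather than pretty-printing VCode,
    // so post-`MachBuffer` effects like branch chomping show up.
    let cs = Capstone::new()
        .x86()
        .mode(arch::x86::ArchMode::Mode64)
        .build()?;
    let mut out = String::new();
    for insn in cs.disasm_all(code, text.address())?.iter() {
        out.push_str(&format!(
            "{:#x}: {} {}\n",
            insn.address(),
            insn.mnemonic().unwrap_or(""),
            insn.op_str().unwrap_or(""),
        ));
    }
    Ok(out)
}
```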
Alex Crichton | ead7f735b4 | 8 months ago

Compile out wmemcheck-related libcalls when not enabled (#8203)
Currently even when the `wmemcheck` Cargo feature is disabled the various related libcalls are still compiled into `wasmtime-runtime`. Additionally their signatures are translated when lowering functions, although the signatures are never used. This commit adds `#[cfg]` annotations to compile these all out when they're not enabled.

Applying this change, however, uncovered a subtle bug in our libcalls. Libcalls are numbered in-order as-listed in the macro ignoring `#[cfg]`, but they're assigned a runtime slot in a `VMBuiltinFunctionsArray` structure which does respect `#[cfg]`. This meant, for example, that if `gc` was enabled and `wmemcheck` was disabled, as is the default for our tests, then there was a hole in the numbering where libcall numbers were mismatched at runtime and compile time.

To fix this I've first added a const assertion that the runtime number of libcalls equals the build-time number of libcalls. I then updated the macro a bit to plumb the `#[cfg]` differently and now libcalls are unconditionally defined regardless of cfgs but the implementation is `std::process::abort()` if it's compiled out.

This ended up having a large-ish impact on the `disas` test suite. Lots of functions have fewer signatures translated because wmemcheck, even when disabled, was translating a few signatures. This had some assembly changes, too, because I believe functions are considered leaves based on whether they declare a signature or not, so declaring an unused signature was preventing all wasm functions from being considered leaves.
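A condensed reconstruction of the numbering hazard and the shape of the fix; the function name, feature gate, and counts here are illustrative, not the real macro:

```rust
// Libcalls are numbered by position at compile time, while runtime
// slots live in a table. If the table respects `#[cfg]` but the
// numbering does not, everything after a compiled-out entry shifts.
// The fix: define every libcall unconditionally, abort in bodies
// whose feature is compiled out, and pin the two counts together.

pub extern "C" fn check_malloc() {
    #[cfg(feature = "wmemcheck")]
    {
        // ... real implementation when the feature is enabled ...
    }
    // A compiled-out libcall keeps its number but must never
    // actually be reached at runtime.
    #[cfg(not(feature = "wmemcheck"))]
    std::process::abort();
}

// Hypothetical counts; a build-time assertion catches any drift.
const COMPILE_TIME_LIBCALLS: usize = 42;
const RUNTIME_SLOTS: usize = 42;
const _: () = assert!(COMPILE_TIME_LIBCALLS == RUNTIME_SLOTS);
```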
Chris Fallin | afaf1c73f6 | 8 months ago

PCC: x64: 32- and 64-bit XMM loads/stores are 32 and 64 bits, respectively. (#8177)
These are definitely not 128 bits wide; let's make the PCC model aware of that!

Fixes https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=67427.