//! Build program to generate a program which runs all the testsuites.
//!
//! By generating a separate `#[test]` test for each file, we allow `cargo test`
//! to automatically run the files in parallel.
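//!
//! The generated file has roughly the following shape (an illustrative
//! sketch only: the real module and test names are derived from the
//! directory layout and compilation strategy below, and the test bodies
//! are elided here):
//!
//! ```ignore
//! #[cfg(test)]
//! #[allow(non_snake_case)]
//! mod Cranelift {
//!     mod misc {
//!         mod multi_memory {
//!             #[test]
//!             fn some_wast_file() {
//!                 // ...runs tests/misc_testsuite/multi-memory/some_wast_file.wast
//!             }
//!         }
//!     }
//! }
//! ```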
use anyhow::Context;
use std::env;
use std::fmt::Write;
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;

fn main() -> anyhow::Result<()> {
    println!("cargo:rerun-if-changed=build.rs");
    set_commit_info_for_rustc();

    let out_dir = PathBuf::from(
        env::var_os("OUT_DIR").expect("The OUT_DIR environment variable must be set"),
    );
    let mut out = String::new();

    for strategy in &["Cranelift", "Winch"] {
        writeln!(out, "#[cfg(test)]")?;
        writeln!(out, "#[allow(non_snake_case)]")?;
        if *strategy == "Winch" {
            // We only test Winch on x86_64, for now.
            writeln!(out, "#[cfg(all(target_arch = \"x86_64\"))]")?;
        }
        writeln!(out, "mod {} {{", strategy)?;

        with_test_module(&mut out, "misc", |out| {
            test_directory(out, "tests/misc_testsuite", strategy)?;
            test_directory_module(out, "tests/misc_testsuite/multi-memory", strategy)?;
            test_directory_module(out, "tests/misc_testsuite/simd", strategy)?;
test_directory_module(out, "tests/misc_testsuite/tail-call", strategy)?;
test_directory_module(out, "tests/misc_testsuite/threads", strategy)?;
test_directory_module(out, "tests/misc_testsuite/memory64", strategy)?;
test_directory_module(out, "tests/misc_testsuite/component-model", strategy)?;
test_directory_module(out, "tests/misc_testsuite/function-references", strategy)?;
test_directory_module(out, "tests/misc_testsuite/gc", strategy)?;
// The testsuite of Winch is a subset of the official
// WebAssembly test suite, until parity is reached. This
// check is in place to prevent Cranelift from duplicating
// tests.
if *strategy == "Winch" {
test_directory_module(out, "tests/misc_testsuite/winch", strategy)?;
}
Ok(())
})?;
with_test_module(&mut out, "spec", |out| {
let spec_tests = test_directory(out, "tests/spec_testsuite", strategy)?;
// Skip running spec_testsuite tests if the submodule isn't checked
// out.
if spec_tests > 0 {
test_directory_module(out, "tests/spec_testsuite/proposals/memory64", strategy)?;
                test_directory_module(
                    out,
                    "tests/spec_testsuite/proposals/function-references",
                    strategy,
                )?;
test_directory_module(out, "tests/spec_testsuite/proposals/gc", strategy)?;
test_directory_module(
out,
"tests/spec_testsuite/proposals/multi-memory",
strategy,
)?;
test_directory_module(out, "tests/spec_testsuite/proposals/threads", strategy)?;
test_directory_module(
out,
"tests/spec_testsuite/proposals/relaxed-simd",
strategy,
)?;
test_directory_module(out, "tests/spec_testsuite/proposals/tail-call", strategy)?;
            } else {
                println!(
                    "cargo:warning=The spec testsuite is disabled. To enable, run `git submodule \
                     update --remote`."
                );
            }
            Ok(())
        })?;

        writeln!(out, "}}")?;
    }

    // Write out our auto-generated tests and opportunistically format them
    // with `rustfmt` if it's installed.
    let output = out_dir.join("wast_testsuite_tests.rs");
    fs::write(&output, out)?;
    drop(Command::new("rustfmt").arg(&output).status());
    Ok(())
}
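
/// Generate a `mod` named after `path`'s final directory component, filled
/// with tests for each `.wast` file found in that directory.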
fn test_directory_module(
    out: &mut String,
    path: impl AsRef<Path>,
    strategy: &str,
) -> anyhow::Result<usize> {
    let path = path.as_ref();
    let testsuite = &extract_name(path);
    with_test_module(out, testsuite, |out| test_directory(out, path, strategy))
}
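
/// Generate a test for each `.wast` file in `path`, once with the pooling
/// allocator disabled and once with it enabled, and return the number of
/// files found.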
fn test_directory(
    out: &mut String,
    path: impl AsRef<Path>,
    strategy: &str,
) -> anyhow::Result<usize> {
    let path = path.as_ref();
    let mut dir_entries: Vec<_> = path
        .read_dir()
        .context(format!("failed to read {:?}", path))?
        .map(|r| r.expect("reading testsuite directory entry"))
        .filter_map(|dir_entry| {
            let p = dir_entry.path();
            let ext = p.extension()?;
            // Only look at wast files.
            if ext != "wast" {
                return None;
            }
            // Ignore files starting with `.`, which could be editor temporary
            // files.
            if p.file_stem()?.to_str()?.starts_with('.') {
                return None;
            }
            Some(p)
        })
        .collect();

    dir_entries.sort();

    let testsuite = &extract_name(path);
    for entry in dir_entries.iter() {
        write_testsuite_tests(out, entry, testsuite, strategy, false)?;
        write_testsuite_tests(out, entry, testsuite, strategy, true)?;
    }

    Ok(dir_entries.len())
}

/// Extract a valid Rust identifier from the stem of a path.
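///
/// For example, `tests/misc_testsuite/multi-memory` becomes `multi_memory`.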
fn extract_name(path: impl AsRef<Path>) -> String {
    path.as_ref()
        .file_stem()
        .expect("filename should have a stem")
        .to_str()
        .expect("filename should be representable as a string")
        .replace(['-', '/'], "_")
}
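
/// Run `f`, wrapping everything it writes to `out` in a
/// `mod <testsuite> { ... }` block.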
fn with_test_module<T>(
    out: &mut String,
    testsuite: &str,
    f: impl FnOnce(&mut String) -> anyhow::Result<T>,
) -> anyhow::Result<T> {
    out.push_str("mod ");
    out.push_str(testsuite);
    out.push_str(" {\n");
    let result = f(out)?;
    out.push_str("}\n");
    Ok(result)
}
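
/// Write a single `#[test]` function for the wast file at `path` into `out`,
/// marking it `#[ignore]` when the combination of testsuite, test, and
/// strategy is known not to work (see `ignore` below).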
fn write_testsuite_tests(
    out: &mut String,
    path: impl AsRef<Path>,
    testsuite: &str,
    strategy: &str,
    pooling: bool,
) -> anyhow::Result<()> {
    let path = path.as_ref();
    let testname = extract_name(path);

    writeln!(out, "#[test]")?;
    // Ignore when using QEMU for running tests (limited memory).
if ignore(testsuite, &testname, strategy) {
writeln!(out, "#[ignore]")?;
} else {
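// MIRI can't execute JIT-compiled code, so tests that aren't otherwise
// ignored are still skipped by default under MIRI.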
writeln!(out, "#[cfg_attr(miri, ignore)]")?;
}
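// The generated test looks roughly like the following (the test name and
// path here are illustrative):
//
//     #[test]
//     #[cfg_attr(miri, ignore)]
//     fn r#some_test_pooling() {
//         let _ = env_logger::try_init();
//         crate::wast::run_wast(r#"tests/misc_testsuite/some_test.wast"#,
//             crate::wast::Strategy::Cranelift, true).unwrap();
//     }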
writeln!(
out,
"fn r#{}{}() {{",
&testname,
if pooling { "_pooling" } else { "" }
)?;
writeln!(out, " let _ = env_logger::try_init();")?;
writeln!(
out,
" crate::wast::run_wast(r#\"{}\"#, crate::wast::Strategy::{}, {}).unwrap();",
path.display(),
strategy,
pooling,
)?;
writeln!(out, "}}")?;
writeln!(out)?;
Ok(())
}

/// Returns `true` if the given test isn't supported yet and should be
/// ignored.
fn ignore(testsuite: &str, testname: &str, strategy: &str) -> bool {
assert!(strategy == "Cranelift" || strategy == "Winch");
// Ignore some tests when testing Winch.
if strategy == "Winch" {
if testsuite == "misc_testsuite" {
let denylist = [
"externref_id_function",
"int_to_float_splat",
"issue6562",
"many_table_gets_lead_to_gc",
"mutable_externref_globals",
"no_mixup_stack_maps",
"no_panic",
"simple_ref_is_null",
"table_grow_with_funcref",
];
return denylist.contains(&testname);
}
if testsuite == "spec_testsuite" {
let denylist = [
"br_table",
"global",
"table_fill",
"table_get",
"table_set",
"table_grow",
"table_size",
"elem",
"select",
"unreached_invalid",
"linking",
]
.contains(&testname);
let ref_types = testname.starts_with("ref_");
let simd = testname.starts_with("simd_");
return denylist || ref_types || simd;
}
if testsuite == "memory64" {
return testname.starts_with("simd") || testname.starts_with("threads");
}
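// Beyond the testsuites handled above, Winch only runs its own
// `winch`-specific tests; ignore everything else.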
if testsuite != "winch" {
return true;
}
}
// This file is currently empty, and the `wast` crate doesn't parse
// empty files.
if testname.contains("memory_copy1") {
return true;
}
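// The Wasm GC proposal isn't fully implemented yet, so ignore the bulk
// of the upstream `gc` test suite.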
if testsuite == "gc" {
if [
"array_copy",
"array_fill",
"array_init_data",
"array_init_elem",
"array",
"binary_gc",
"binary",
"br_on_cast_fail",
"br_on_cast",
"br_on_non_null",
"br_on_null",
"br_table",
"call_ref",
"data",
"elem",
"extern",
"func",
"global",
"if",
"linking",
"local_get",
"local_init",
"ref_as_non_null",
"ref_cast",
"ref_eq",
"ref_is_null",
"ref_null",
"ref_test",
"ref",
"return_call_indirect",
"return_call_ref",
"return_call",
"select",
"struct",
"table_sub",
"table",
"type_canon",
"type_equivalence",
"type_rec",
"type_subtyping",
"unreached_invalid",
"unreached_valid",
]
.contains(&testname)
{
return true;
}
}
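// Finally, ignore tests that the current target architecture can't
// run yet.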
match env::var("CARGO_CFG_TARGET_ARCH").unwrap().as_str() {
"s390x" => {
// TODO(#6530): These tests require tail calls, but s390x
// doesn't support them yet.
testsuite == "function_references" || testsuite == "tail_call"
}
_ => false,
}
}
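
/// Make the current git commit hash and date available to rustc as
/// environment variables so they can be embedded in Wasmtime's version
/// info.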
fn set_commit_info_for_rustc() {
if !Path::new(".git").exists() {
return;
}
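// Ask git for the full hash, abbreviated (9-character) hash, and short
// commit date of HEAD.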
let output = match Command::new("git")
.arg("log")
.arg("-1")
.arg("--date=short")
.arg("--format=%H %h %cd")
.arg("--abbrev=9")
.output()
{
Ok(output) if output.status.success() => output,
_ => return,
};
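// The output has the shape `<full-hash> <abbrev-hash> <date>`; pull the
// pieces out by splitting on whitespace.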
let stdout = String::from_utf8(output.stdout).unwrap();
let mut parts = stdout.split_whitespace();
let mut next = || parts.next().unwrap();
println!("cargo:rustc-env=WASMTIME_GIT_HASH={}", next());
println!(
"cargo:rustc-env=WASMTIME_VERSION_INFO={} ({} {})",
env!("CARGO_PKG_VERSION"),
next(),
next()
);
}