Tree:
4005a813e8
cfallin/lucet-pr612-base
fitzgen-patch-1
main
pch/bound_tcp_userland_buffer
pch/bump_wasm_tools_210
pch/cli_wasi_legacy
pch/component_call_hooks
pch/resource_table
pch/resource_table_2
pch/upstream_wave
release-0.32.0
release-0.33.0
release-0.34.0
release-0.35.0
release-0.36.0
release-0.37.0
release-0.38.0
release-0.39.0
release-0.40.0
release-1.0.0
release-10.0.0
release-11.0.0
release-12.0.0
release-13.0.0
release-14.0.0
release-15.0.0
release-16.0.0
release-17.0.0
release-18.0.0
release-19.0.0
release-2.0.0
release-20.0.0
release-21.0.0
release-22.0.0
release-23.0.0
release-24.0.0
release-3.0.0
release-4.0.0
release-5.0.0
release-6.0.0
release-7.0.0
release-8.0.0
release-9.0.0
revert-9191-trevor/upgrade-regalloc
revert-union-find
stable-v0.26
trevor/fuzz-pcc
trevor/hyper-rc4
trevor/io-error-interface
0.2.0
0.3.0
cranelift-v0.31.0
cranelift-v0.32.0
cranelift-v0.33.0
cranelift-v0.34.0
cranelift-v0.35.0
cranelift-v0.36.0
cranelift-v0.37.0
cranelift-v0.39.0
cranelift-v0.40.0
cranelift-v0.41.0
cranelift-v0.42.0
cranelift-v0.43.0
cranelift-v0.43.1
cranelift-v0.44.0
cranelift-v0.45.0
cranelift-v0.46.0
cranelift-v0.46.1
cranelift-v0.60.0
cranelift-v0.61.0
cranelift-v0.62.0
cranelift-v0.69.0
dev
filecheck-v0.0.1
minimum-viable-wasi-proxy-serve
v0.10.0
v0.11.0
v0.12.0
v0.15.0
v0.16.0
v0.17.0
v0.18.0
v0.19.0
v0.2.0
v0.20.0
v0.21.0
v0.22.0
v0.22.1
v0.23.0
v0.24.0
v0.25.0
v0.26.0
v0.26.1
v0.27.0
v0.28.0
v0.29.0
v0.3.0
v0.30.0
v0.31.0
v0.32.0
v0.32.1
v0.33.0
v0.33.1
v0.34.0
v0.34.1
v0.34.2
v0.35.0
v0.35.1
v0.35.2
v0.35.3
v0.36.0
v0.37.0
v0.38.0
v0.38.1
v0.38.2
v0.38.3
v0.39.0
v0.39.1
v0.4.0
v0.40.0
v0.40.1
v0.6.0
v0.8.0
v0.9.0
v1.0.0
v1.0.1
v1.0.2
v10.0.0
v10.0.1
v10.0.2
v11.0.0
v11.0.1
v11.0.2
v12.0.0
v12.0.1
v12.0.2
v13.0.0
v13.0.1
v14.0.0
v14.0.1
v14.0.2
v14.0.3
v14.0.4
v15.0.0
v15.0.1
v16.0.0
v17.0.0
v17.0.1
v17.0.2
v17.0.3
v18.0.0
v18.0.1
v18.0.2
v18.0.3
v18.0.4
v19.0.0
v19.0.1
v19.0.2
v2.0.0
v2.0.1
v2.0.2
v20.0.0
v20.0.1
v20.0.2
v21.0.0
v21.0.1
v22.0.0
v23.0.0
v23.0.1
v23.0.2
v24.0.0
v3.0.0
v3.0.1
v4.0.0
v4.0.1
v5.0.0
v5.0.1
v6.0.0
v6.0.1
v6.0.2
v7.0.0
v7.0.1
v8.0.0
v8.0.1
v9.0.0
v9.0.1
v9.0.2
v9.0.3
v9.0.4
${ noResults }
44 Commits (4005a813e81e7f1423598d873fdb9a07919e54e9)
Author | SHA1 | Message | Date |
---|---|---|---|
Hamir Mahal |
46098121bc
|
style: simplify string formatting (#9047)
* style: simplify string formatting * fix: formatting in `benches/call.rs` |
3 months ago |
Alex Crichton |
05095c1868
|
Rename the `wasm32-wasi` target to `wasm32-wasip1` (#8867)
This rename is happening in upstream Rust and should be in enough places now. prtest:full |
3 months ago |
Nick Fitzgerald |
f2e689cd86
|
Introduce `wasmtime::StructRef` and allocating Wasm GC structs (#8933)
* Introduce `wasmtime::StructRef` and allocating Wasm GC structs This commit introduces the `wasmtime::StructRef` type and support for allocating Wasm GC structs from the host. This commit does *not* add support for the `struct.new` family of Wasm instructions; guests still cannot allocate Wasm GC objects yet, but initial support should be pretty straightforward after this commit lands. The `StructRef` type has everything you expect from other value types in the `wasmtime` crate: * A method to get its type or check whether it matches a given type * An implementation of `WasmTy` so that it can be used with `Func::wrap`-style APIs * The ability to upcast it into an `AnyRef` and to do checked downcasts in the opposite direction There are, additionally, methods for getting, setting, and enumerating a `StructRef`'s fields. To allocate a `StructRef`, we need proof that the struct type we are allocating is being kept alive for the duration that the allocation may live. This is required for many reasons, but a basic example is getting a struct instance's type from the embedder API: this does a type-index-to-`StructType` lookup and conversion and if the type wasn't kept alive, then the type-index lookup will result in what is logically a use-after-free bug. This won't be a problem for Wasm guests (when we get around to implementing allocation for them) since their module defines the type, the store holds onto its instances' modules, and the allocation cannot outlive the store. For the host, we need another method of keeping the object's type alive, since it might be that the host defined the type and there is no module that also defined it, let alone such a module that is being kept alive in the store. The solution to the struct-type-lifetime problem that this commit implements for hosts is for the store to hold a hash set of `RegisteredType`s specifically for objects which were allocated via the embedder API. But we also don't want to do a hash lookup on every allocation, so we also implement a `StructRefPre` type. A `StructRefPre` is proof that the embedder has inserted a `StructType`'s inner `RegisteredType` into a store. Structurally, it is a pair of the struct type and a store id. All `StructRef` allocation methods require a `StructRefPre` argument, which does a fast store id check, rather than a whole hash table insertion. I opted to require `StructRefPre` in all allocation cases -- even though this has the downside of always forcing callers to create one before they allocate, even if they are only allocating a single object -- because of two reasons. First, this avoids needing to define duplicate methods, with and without a `StructRefPre` argument. Second, this avoids a performance footgun in the API where users don't realize that they *can* avoid extra work by creating a single `StructRefPre` and then using it multiple times. Anecdotally, I've heard multiple people complain about instantiation being slower than advertised but it turns out they weren't using `InstancePre`, and I'd like to avoid that situation for allocation if we can. * Move `allow(missing_docs)` up to `gc::disabled` module instead of each `impl` * Rename `cast` to `unchecked_cast` * fix `GcHeapOutOfMemory` error example in doc example * document additional error case for `StructRef::new` * Use `unpack` method instead of open-coding it * deallocate on failed initialization * Refactor field access methods to share more code And define `fields()` in terms of `field()` rather than the other way around. * Add upcast methods from structref to anyref * Remove duplicate type checking and add clarifying comments about initializing vs writing fields * make the `PodValType` trait safe * fix benchmarks build * prtest:full * add miri ignores to new tests that call into wasm |
4 months ago |
Shane Snover |
05fe62829e
|
Refactor wasmtime::Func to "unsplat" arguments for the async API (#8732)
* Complete implementation in wasmtime * Get the impl of IntoFunc to point at the new HostContext from_closure method to tie it back to the new implementation * A little bit of cleanup to comments and naming * Update doc comment for Func wrap_async |
5 months ago |
Alex Crichton |
f676c1760d
|
Enable rustc's `unused-lifetimes` lint (#8711)
* Enable rustc's `unused-lifetimes` lint This is allow-by-default doesn't seem to have any false positives in Wasmtime's codebase so enable it by default to help clean up vestiges of old refactorings. * Remove another unused lifetime * Remove another unused lifetime |
5 months ago |
Alex Crichton |
906ea01734
|
Use bytes for maximum size of linear memory with pooling (#8628)
* Use bytes for maximum size of linear memory with pooling This commit changes configuration of the pooling allocator to use a byte-based unit rather than a page based unit. The previous `PoolingAllocatorConfig::memory_pages` configuration option configures the maximum size that a linear memory may grow to at runtime. This is an important factor in calculation of stripes for MPK and is also a coarse-grained knob apart from `StoreLimiter` to limit memory consumption. This configuration option has been renamed to `max_memory_size` and documented that it's in terms of bytes rather than pages as before. Additionally the documented constraint of `max_memory_size` must be smaller than `static_memory_bound` is now additionally enforced as a minor clean-up as part of this PR as well. * Review comments * Fix benchmark build |
6 months ago |
Nick Fitzgerald |
e1f8b9b74b
|
Wasmtime(pooling allocator): Batch decommits (#8590)
This introduces a `DecommitQueue` for batching decommits together in the pooling allocator: * Deallocating a memory/table/stack enqueues their associated regions of memory for decommit; it no longer immediately returns the associated slot to the pool's free list. If the queue's length has reached the configured batch size, then we flush the queue by running all the decommits, and finally returning the memory/table/stack slots to their respective pools and free lists. * Additionally, if allocating a new memory/table/stack fails because the free list is empty (aka we've reached the max concurrently-allocated limit for this entity) then we fall back to a slow path before propagating the error. This slow path flushes the decommit queue and then retries allocation, hoping that the queue flush reclaimed slots and made them available for this fallback allocation attempt. This involved defining a new `PoolConcurrencyLimitError` to match on, which is also exposed in the public embedder API. It is also worth noting that we *always* use this new decommit queue now. To keep the existing behavior, where e.g. a memory's decommits happen immediately on deallocation, you can use a batch size of one. This effectively disables queueing, forcing all decommits to be flushed immediately. The default decommit batch size is one. This commit, with batch size of one, consistently gives me an increase on `wasmtime serve`'s requests-per-second versus its parent commit, as measured by `benches/wasmtime-serve-rps.sh`. I get ~39K RPS on this commit compared to ~35K RPS on the parent commit. This is quite puzzling to me. I was expecting no change, and hoping there wouldn't be a regression. I was not expecting a speed up. I cannot explain this result at this time. prtest:full Co-authored-by: Jamey Sharp <jsharp@fastly.com> |
6 months ago |
Alex Crichton |
6a710b92d8
|
Remove type information from dynamic component funcs (#8070)
* Remove type information from dynamic component funcs This commit removes the `&Component` argument from the `component::Linker::func_new` API. This is inspired by #8062 where `Val` holds less type information as well in addition to the realization that type-checking happens at runtime rather than instantiation time. This argument was originally added to mirror `wasmtime::Linker::func_new` which takes a type argument of the core wasm function that's being defined. Unlike core wasm, though, component functions already have to carry along their type information as part of function calls to handle resources correctly. This means that when a host function is invoked the type is already known of all the parameters and results. Additionally values are already required to be type-checked going back into wasm, so there's less of a need to perform an additional type-check up front. The main consequence of this commit is that it's a bit more difficult for embeddings to know what the expected types of results are. No type information is provided when a host function is defined, not even function arity. This means that when the host function is invoked it may not know how many results are expected to be produced and of what type. Typically though a bindings generator is used somewhere along the way so that's expected to alleviate this issue. Finally my hope is to enhance this "dynamic" API in the future with a bit more information so the type information is more readily accessible at runtime. For now though hosts will have to "simply know what to do". * Update crates/wasmtime/src/runtime/component/linker.rs Co-authored-by: Joel Dice <joel.dice@fermyon.com> * Fix doc links * Fix component call benchmarks --------- Co-authored-by: Joel Dice <joel.dice@fermyon.com> |
8 months ago |
Nick Fitzgerald |
bd2ea901d3
|
Define garbage collection rooting APIs (#8011)
* Define garbage collection rooting APIs Rooting prevents GC objects from being collected while they are actively being used. We have a few sometimes-conflicting goals with our GC rooting APIs: 1. Safety: It should never be possible to get a use-after-free bug because the user misused the rooting APIs, the collector "mistakenly" determined an object was unreachable and collected it, and then the user tried to access the object. This is our highest priority. 2. Moving GC: Our rooting APIs should moving collectors (such as generational and compacting collectors) where an object might get relocated after a collection and we need to update the GC root's pointer to the moved object. This means we either need cooperation and internal mutability from individual GC roots as well as the ability to enumerate all GC roots on the native Rust stack, or we need a level of indirection. 3. Performance: Our rooting APIs should generally be as low-overhead as possible. They definitely shouldn't require synchronization and locking to create, access, and drop GC roots. 4. Ergonomics: Our rooting APIs should be, if not a pleasure, then at least not a burden for users. Additionally, the API's types should be `Sync` and `Send` so that they work well with async Rust. For example, goals (3) and (4) are in conflict when we think about how to support (2). Ideally, for ergonomics, a root would automatically unroot itself when dropped. But in the general case that requires holding a reference to the store's root set, and that root set needs to be held simultaneously by all GC roots, and they each need to mutate the set to unroot themselves. That implies `Rc<RefCell<...>>` or `Arc<Mutex<...>>`! The former makes the store and GC root types not `Send` and not `Sync`. The latter imposes synchronization and locking overhead. So we instead make GC roots indirect and require passing in a store context explicitly to unroot in the general case. This trades worse ergonomics for better performance and support for moving GC and async Rust. Okay, with that out of the way, this module provides two flavors of rooting API. One for the common, scoped lifetime case, and another for the rare case where we really need a GC root with an arbitrary, non-LIFO/non-scoped lifetime: 1. `RootScope` and `Rooted<T>`: These are used for temporarily rooting GC objects for the duration of a scope. Upon exiting the scope, they are automatically unrooted. The internal implementation takes advantage of the LIFO property inherent in scopes, making creating and dropping `Rooted<T>`s and `RootScope`s super fast and roughly equivalent to bump allocation. This type is vaguely similar to V8's [`HandleScope`]. [`HandleScope`]: https://v8.github.io/api/head/classv8_1_1HandleScope.html Note that `Rooted<T>` can't be statically tied to its context scope via a lifetime parameter, unfortunately, as that would allow the creation and use of only one `Rooted<T>` at a time, since the `Rooted<T>` would take a borrow of the whole context. This supports the common use case for rooting and provides good ergonomics. 2. `ManuallyRooted<T>`: This is the fully general rooting API used for holding onto non-LIFO GC roots with arbitrary lifetimes. However, users must manually unroot them. Failure to manually unroot a `ManuallyRooted<T>` before it is dropped will result in the GC object (and everything it transitively references) leaking for the duration of the `Store`'s lifetime. This type is roughly similar to SpiderMonkey's [`PersistentRooted<T>`], although they avoid the manual-unrooting with internal mutation and shared references. (Our constraints mean we can't do those things, as mentioned explained above.) [`PersistentRooted<T>`]: http://devdoc.net/web/developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/JSAPI_reference/JS::PersistentRooted.html At the end of the day, both `Rooted<T>` and `ManuallyRooted<T>` are just tagged indices into the store's `RootSet`. This indirection allows working with Rust's borrowing discipline (we use `&mut Store` to represent mutable access to the GC heap) while still allowing rooted references to be moved around without tying up the whole store in borrows. Additionally, and crucially, this indirection allows us to update the *actual* GC pointers in the `RootSet` and support moving GCs (again, as mentioned above). * Reorganize GC-related submodules in `wasmtime-runtime` * Reorganize GC-related submodules in `wasmtime` * Use `Into<StoreContext[Mut]<'a, T>` for `Externref::data[_mut]` methods * Run rooting tests under MIRI * Make `into_abi` take an `AutoAssertNoGc` * Don't use atomics to update externref ref counts anymore * Try to make lifetimes/safety more-obviously correct Remove some transmute methods, assert that `VMExternRef`s are the only valid `VMGcRef`, etc. * Update extenref constructor examples * Make `GcRefImpl::transmute_ref` a non-default trait method * Make inline fast paths for GC LIFO scopes * Make `RootSet::unroot_gc_ref` an `unsafe` function * Move Hash and Eq for Rooted, move to impl methods * Remove type parameter from `AutoAssertNoGc` Just wrap a `&mut StoreOpaque` directly. * Make a bunch of internal `ExternRef` methods that deal with raw `VMGcRef`s take `AutoAssertNoGc` instead of `StoreOpaque` * Fix compile after rebase * rustfmt * revert unrelated egraph changes * Fix non-gc build * Mark `AutoAssertNoGc` methods inline * review feedback * Temporarily remove externref support from the C API Until we can add proper GC rooting. * Remove doxygen reference to temp deleted function * Remove need to `allow(private_interfaces)` * Fix call benchmark compilation |
8 months ago |
Nick Fitzgerald |
ff93bce067
|
Wasmtime: Finish support for the typed function references proposal (#7943)
* Wasmtime: Finish support for the typed function references proposal While we supported the function references proposal inside Wasm, we didn't support it on the "outside" in the Wasmtime embedder APIs. So much of the work here is exposing typed function references, and their type system updates, in the embedder API. These changes include: * `ValType::FuncRef` and `ValType::ExternRef` are gone, replaced with the introduction of the `RefType` and `HeapType` types and a `ValType::Ref(RefType)` variant. * `ValType` and `FuncType` no longer implement `Eq` and `PartialEq`. Instead there are `ValType::matches` and `FuncType::matches` methods which check directional subtyping. I also added `ValType::eq` and `FuncType::eq` static methods for the rare case where someone needs to check precise equality, but that is almost never actually the case, 99.99% of the time you want to check subtyping. * There are also public `Val::matches_ty` predicates for checking if a value is an instance of a type, as well as internal helpers like `Val::ensure_matches_ty` that return a formatted error if the value does not match the given type. These helpers are used throughout Wasmtime internals now. * There is now a dedicated `wasmtime::Ref` type that represents reference values. Table operations have been updated to take and return `Ref`s rather than `Val`s. Furthermore, this commit also includes type registry changes to correctly manage lifetimes of types that reference other types. This wasn't previously an issue because the only thing that could reference types that reference other types was a Wasm module that added all the types that could reference each other at the same time and removed them all at the same time. But now that the previously discussed work to expose these things in the embedder API is done, type lifetime management in the registry becomes a little trickier because the embedder might grab a reference to a type that references another type, and then unload the Wasm module that originally defined that type, but then the user should still be able use that type and the other types it transtively references. Before, we were refcounting individual registry entries. Now, we still are refcounting individual entries, but now we are also accounting for type-to-type references and adding a new type to the registry will increment the refcounts of each of the types that it references, and removing a type from the registry will decrement the refcounts of each of the types it references, and then recursively (logically, not literally) remove any types whose refcount has now reached zero. Additionally, this PR adds support for subtyping to `Func::typed`- and `Func::wrap`-style APIs. For result types, you can always use a supertype of the WebAssembly function's actual declared return type in `Func::typed`. And for param types, you can always use a subtype of the Wasm function's actual declared param type. Doing these things essentially erases information but is always correct. But additionally, for functions which take a reference to a concrete type as a parameter, you can also use the concrete type's supertype. Consider a WebAssembly function that takes a reference to a function with a concrete type: `(ref null <func type index>)`. In this scenario, there is no static `wasmtime::Foo` Rust type that corresponds to that particular Wasm-defined concrete reference type because Wasm modules are loaded dynamically at runtime. You *could* do `f.typed::<Option<NoFunc>, ()>()`, and while that is correctly typed and valid, it is often overly restrictive. The only value you could call the resulting typed function with is the null function reference, but we'd like to call it with non-null function references that happen to be of the correct type. Therefore, `f.typed<Option<Func>, ()>()` is also allowed in this case, even though `Option<Func>` represents `(ref null func)` which is the supertype, not subtype, of `(ref null <func type index>)`. This does imply some minimal dynamic type checks in this case, but it is supported for better ergonomics, to enable passing non-null references into the function. We can investigate whether it is possible to use generic type parameters and combinators to define Rust types that precisely match concrete reference types in future, follow-up pull requests. But for now, we've made things usable, at least. Finally, this also takes the first baby step towards adding support for the Wasm GC proposal. Right now the only thing that is supported is `nofunc` references, and this was mainly to make testing function reference subtyping easier. But that does mean that supporting `nofunc` references entailed also adding a `wasmtime::NoFunc` type as well as the `Config::wasm_gc(enabled)` knob. So we officially have an in-progress implementation of Wasm GC in Wasmtime after this PR lands! Fixes https://github.com/bytecodealliance/wasmtime/issues/6455 * Fix WAT in test to be valid * Check that dependent features are enabled for function-references and GC * Remove unnecessary engine parameters from a few methods Ever since `FuncType`'s internal `RegisteredType` holds onto its own `Engine`, we don't need these anymore. Still useful to keep the `Engine` parameter around for the `ensure_matches` methods because that can be used to check correct store/engine usage for embedders. * Add missing dependent feature enabling for some tests * Remove copy-paste bit from test * match self to show it is uninhabited * Add a missing `is_v128` method * Short circuit a few func type comparisons * Turn comment into part of doc comment * Add test for `Global::new` and subtyping * Add tests for embedder API, tables, and subtyping * Add an embedder API test for setting globals and subtyping * Construct realloc's type from its index, rather than from scratch * Help LLVM better optimize our dynamic type checks in `TypedFunc::call_raw` * Fix call benchmark compilation * Change `WasmParams::into_abi` to take the whole func type instead of iter of params * Fix doc links prtest:full * Fix size assertion on s390x |
9 months ago |
Nick Fitzgerald |
b736585e25
|
Add a script for benchmarking `wasmtime serve`'s requests per second (#7955)
There are a *ton* of different options such a script could have, and things we could tweak. This is just meant to be a shared starting point so it is easy to get up and running with something. Example output: ``` ./benches/wasmtime-serve-rps.sh -O pooling-allocator hello_wasi_http.wasm Finished `release` profile [optimized] target(s) in 0.17s Running `target/release/wasmtime serve -O pooling-allocator hello_wasi_http.wasm` Serving HTTP on http://0.0.0.0:8080/ Running `wasmtime serve` in background as pid 2476839 Benchmarking for 10 seconds... Summary: Total: 10.0025 secs Slowest: 0.0138 secs Fastest: 0.0001 secs Average: 0.0013 secs Requests/sec: 39461.5419 Response time histogram: 0.000 [1] | 0.001 [277602] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 0.003 [111637] |■■■■■■■■■■■■■■■■ 0.004 [4182] |■ 0.006 [720] | 0.007 [301] | 0.008 [143] | 0.010 [76] | 0.011 [26] | 0.012 [19] | 0.014 [7] | Latency distribution: 10% in 0.0006 secs 25% in 0.0009 secs 50% in 0.0012 secs 75% in 0.0016 secs 90% in 0.0019 secs 95% in 0.0022 secs 99% in 0.0031 secs Details (average, fastest, slowest): DNS+dialup: 0.0000 secs, 0.0001 secs, 0.0138 secs DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs req write: 0.0000 secs, 0.0000 secs, 0.0092 secs resp wait: 0.0012 secs, 0.0001 secs, 0.0137 secs resp read: 0.0000 secs, 0.0000 secs, 0.0113 secs Status code distribution: [200] 394714 responses ``` |
9 months ago |
Pat Hickey |
2b00a541f3
|
Make wasi-common self-contained, deprecate exports from wasmtime-wasi (#7881)
* WIP: try to make wasi-common self contained. * rebase: cargo.lock * remove all dependencies between wasi-common and wasmtime-wasi * use wasi-common directly throughout tests, benches, examples, cli run * wasi-threads: use wasi-common's maybe_exit_on_error in spawned thread not a very modular design, but at this point wasi-common and wasi-threads are forever wed * fix wasmtime's docs * re-introduce wasmtime-wasi's exports of wasi-common definitions behind deprecated * factor out determining i32 process exit code and remove libc dep because rustix provides the same constant * commands/run: inline the logic about aborting on trap since this is the sole place in the codebase its used * Add high-level summary to wasi-common's top-level doc comment. * c-api: fix use of wasi_cap_std_sync => wasi_common::sync, wasmtime_wasi => wasi_common * fix tokio example * think better of combining downcast and masking into one method * fix references to wasmtime_wasi in docs prtest:full * benches: use wasi-common * cfg-if around use of rustix::process because that doesnt exist on windows * wasi-common: include tests, caught by verify-publish * fix another bench * exit requires wasmtime dep. caught by verify-publish. |
9 months ago |
Nick Fitzgerald |
8652011f69
|
Refactor `wasmtime::FuncType` to hold a handle to its registered type (#7892)
* Refactor `wasmtime::FuncType` to hold a handle to its registered type Rather than holding a copy of the type directly, it now holds a `RegisteredType` which internally is * A `VMSharedTypeIndex` pointing into the engine's types registry. * An `Arc` handle to the engine's type registry. * An `Arc` handle to the actual type. The last exists only to keep it so that accessing a `wasmtime::FuncType`'s parameters and results fast, avoiding any new locking on call hot paths. This is helping set the stage for further types and `TypeRegistry` refactors needed for Wasm GC. * Update the C API for the function types refactor prtest:full * rustfmt * Fix benches build |
9 months ago |
Andrew Brown |
f3b5478bfc
|
mpk: allow benchmarking MPK (#7787)
* mpk: also force MPK during benchmarking This change takes advantage of the `WASMTIME_TEST_FORCE_MPK` environment variable to force its use in the `call.rs` benchmarks. I see differences in several cases when running: ```console $ taskset --cpu-list 0-15 cargo bench -- async $ WASMTIME_TEST_FORCE_MPK=1 taskset --cpu-list 0-15 cargo bench -- async ``` To properly isolate the MPK effects, this adds a `sync-pool` option (`IsAsync::NoPooling`). * mpk: disable PKRU read unless logging is turned on After noticing some minor effects with this PKRU read, I chose to avoid it unless logging is enabled. A perfectly valid alternative would be to remove the logging altogether, but I have found this very helpful when trying to troubleshoot MPK issues. * mpk: inline PKRU functions To avoid any unnecessary call overhead, we hint to the compiler that `pkru::read` and `pkru::write` should be inlined. |
10 months ago |
Nick Fitzgerald |
17f5fffa53
|
Add component call micro-benchmarks (#6981)
This commit adds the component equivalents of the existing core Wasm call micro-benchmarks. This also adds a sprinkling of `#[inline]` to some functions that I noticed when glancing at some profiles. The two most important numbers: ``` sync/no-hook/component - host-to-wasm - typed - nop time: [75.849 ns 76.476 ns 77.247 ns] sync/no-hook/component - wasm-to-host - typed - nop time: [33.614 ns 33.872 ns 34.170 ns] ``` The full benchmark results are in here: <details> ``` $ cargo bench --features component-model --bench call 'component' Finished bench [optimized] target(s) in 0.19s Running benches/call.rs (target/release/deps/call-4d8d1585dd2825a2) sync/no-hook/component - host-to-wasm - typed - nop time: [75.849 ns 76.476 ns 77.247 ns] Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) high mild 4 (4.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop time: [108.29 ns 109.66 ns 111.51 ns] Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [79.968 ns 80.756 ns 81.728 ns] Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [210.27 ns 211.72 ns 213.34 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/hook-sync/component - host-to-wasm - typed - nop time: [76.840 ns 77.295 ns 77.770 ns] Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe sync/hook-sync/component - host-to-wasm - untyped - nop time: [109.63 ns 110.42 ns 111.26 ns] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe sync/hook-sync/component - host-to-wasm - typed - nop-params-and-results time: [81.324 ns 82.344 ns 83.663 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe sync/hook-sync/component - host-to-wasm - untyped - nop-params-and-results time: [211.84 ns 215.06 ns 219.22 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe async/no-hook/component - host-to-wasm - typed - nop time: [23.759 µs 23.969 µs 24.221 µs] Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe async/no-hook/component - host-to-wasm - untyped - nop time: [23.941 µs 24.093 µs 24.254 µs] Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe async/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [24.286 µs 24.459 µs 24.629 µs] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe async/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [24.258 µs 24.390 µs 24.528 µs] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe async/hook-sync/component - host-to-wasm - typed - nop time: [24.055 µs 24.224 µs 24.408 µs] Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high severe async/hook-sync/component - host-to-wasm - untyped - nop time: [24.217 µs 24.364 µs 24.517 µs] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe async/hook-sync/component - host-to-wasm - typed - nop-params-and-results time: [24.207 µs 24.331 µs 24.463 µs] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe async/hook-sync/component - host-to-wasm - untyped - nop-params-and-results time: [24.607 µs 24.767 µs 24.936 µs] Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe async-pool/no-hook/component - host-to-wasm - typed - nop time: [456.89 ns 459.65 ns 462.68 ns] Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async-pool/no-hook/component - host-to-wasm - untyped - nop time: [490.07 ns 492.87 ns 495.88 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe async-pool/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [471.68 ns 475.01 ns 478.59 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe async-pool/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [597.02 ns 600.61 ns 604.53 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe async-pool/hook-sync/component - host-to-wasm - typed - nop time: [458.06 ns 460.82 ns 463.77 ns] Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) high severe async-pool/hook-sync/component - host-to-wasm - untyped - nop time: [494.20 ns 497.65 ns 501.48 ns] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe async-pool/hook-sync/component - host-to-wasm - typed - nop-params-and-results time: [472.40 ns 476.08 ns 480.10 ns] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe async-pool/hook-sync/component - host-to-wasm - untyped - nop-params-and-results time: [598.55 ns 603.79 ns 610.18 ns] Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop time: [33.614 ns 33.872 ns 34.170 ns] Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) high mild 8 (8.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [37.416 ns 37.700 ns 38.002 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop time: [58.126 ns 58.478 ns 58.846 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [170.14 ns 171.33 ns 172.68 ns] Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe sync/hook-sync/component - wasm-to-host - typed - nop time: [33.336 ns 33.556 ns 33.796 ns] Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe sync/hook-sync/component - wasm-to-host - typed - nop-params-and-results time: [37.399 ns 37.654 ns 37.904 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe sync/hook-sync/component - wasm-to-host - untyped - nop time: [58.438 ns 58.924 ns 59.485 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe sync/hook-sync/component - wasm-to-host - untyped - nop-params-and-results time: [169.36 ns 170.44 ns 171.60 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe async/no-hook/component - wasm-to-host - typed - nop time: [33.882 ns 34.198 ns 34.573 ns] Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [37.407 ns 37.820 ns 38.371 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe async/no-hook/component - wasm-to-host - untyped - nop time: [58.400 ns 58.937 ns 59.537 ns] Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe async/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [170.15 ns 171.72 ns 173.52 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe async/no-hook/component - wasm-to-host - async-typed - nop time: [48.383 ns 48.801 ns 49.317 ns] Found 5 outliers among 100 measurements (5.00%) 4 (4.00%) high mild 1 (1.00%) high severe async/no-hook/component - wasm-to-host - async-typed - nop-params-and-results time: [59.723 ns 60.158 ns 60.657 ns] Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe async/hook-sync/component - wasm-to-host - typed - nop time: [33.537 ns 34.056 ns 34.742 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe async/hook-sync/component - wasm-to-host - typed - nop-params-and-results time: [37.390 ns 37.888 ns 38.562 ns] Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe async/hook-sync/component - wasm-to-host - untyped - nop time: [58.506 ns 58.906 ns 59.361 ns] Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe async/hook-sync/component - wasm-to-host - untyped - nop-params-and-results time: [170.70 ns 172.62 ns 174.80 ns] Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe async/hook-sync/component - wasm-to-host - async-typed - nop time: [48.308 ns 48.764 ns 49.267 ns] Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe async/hook-sync/component - wasm-to-host - async-typed - nop-params-and-results time: [57.503 ns 57.887 ns 58.307 ns] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe async-pool/no-hook/component - wasm-to-host - typed - nop time: [33.473 ns 33.792 ns 34.142 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe async-pool/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [37.523 ns 38.040 ns 38.638 ns] Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe async-pool/no-hook/component - wasm-to-host - untyped - nop time: [57.989 ns 58.350 ns 58.737 ns] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe async-pool/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [169.55 ns 170.93 ns 172.48 ns] Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe async-pool/no-hook/component - wasm-to-host - async-typed - nop time: [48.323 ns 48.700 ns 49.144 ns] Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) high mild 4 (4.00%) high severe async-pool/no-hook/component - wasm-to-host - async-typed - nop-params-and-results time: [57.521 ns 58.090 ns 58.739 ns] Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/component - wasm-to-host - typed - nop time: [33.379 ns 33.602 ns 33.838 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/component - wasm-to-host - typed - nop-params-and-results time: [37.361 ns 37.608 ns 37.857 ns] Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/component - wasm-to-host - untyped - nop time: [58.523 ns 58.848 ns 59.180 ns] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe async-pool/hook-sync/component - wasm-to-host - untyped - nop-params-and-results time: [170.59 ns 171.57 ns 172.63 ns] Found 2 outliers among 100 measurements (2.00%) 2 (2.00%) high severe async-pool/hook-sync/component - wasm-to-host - async-typed - nop time: [48.265 ns 48.520 ns 48.794 ns] Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe async-pool/hook-sync/component - wasm-to-host - async-typed - nop-params-and-results time: [57.619 ns 57.918 ns 58.234 ns] Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe ``` </details> |
1 year ago |
Nick Fitzgerald |
a34427a3d2
|
Wasmtime: refactor the pooling allocator for components (#6835)
* Wasmtime: Rename `IndexAllocator` to `ModuleAffinityIndexAllocator` We will have multiple kinds of index allocators soon, so clarify which one this is. * Wasmtime: Introduce a simple index allocator This will be used in future commits refactoring the pooling allocator. * Wasmtime: refactor the pooling allocator for components We used to have one index allocator, an index per instance, and give out N tables and M memories to every instance regardless how many tables and memories they need. Now we have an index allocator for memories and another for tables. An instance isn't associated with a single instance, each of its memories and tables have an index. We allocate exactly as many tables and memories as the instance actually needs. Ultimately, this gives us better component support, where a component instance might have varying numbers of internal tables and memories. Additionally, you can now limit the number of tables, memories, and core instances a single component can allocate from the pooling allocator, even if there is the capacity for that many available. This is to give embedders tools to limit individual component instances and prevent them from hogging too much of the pooling allocator's resources. * Remove unused file Messed up from rebasing, this code is actually just inline in the index allocator module. * Address review feedback * Fix benchmarks build * Fix ignoring test under miri The `async_functions` module is not even compiled-but-ignored with miri, it is completely `cfg`ed off. Therefore we ahve to do the same with this test that imports stuff from that module. * Fix doc links * Allow testing utilities to be unused The exact `cfg`s that unlock the tests that use these are platform and feature dependent and ends up being like 5 things and super long. Simpler to just allow unused for when we are testing on other platforms or don't have the compile time features enabled. * Debug assert that the pool is empty on drop, per Alex's suggestion Also fix a couple scenarios where we could leak indices if allocating an index for a memory/table succeeded but then creating the memory/table itself failed. * Fix windows compile errors |
1 year ago |
Alex Crichton |
bb3734bd72
|
Change preview2 builder methods to use `&mut self` (#6770)
* Change preview2 builder methods to use `&mut self` This commit changes the `WasiCtxBuilder` for preview2 to use a builder pattern more similar to `std::process::Command` where methods take `&mut self` and return `&mut Self` instead of taking `self` and returning `Self`. This pattern enables more easily building up configuration over time throughout code where ownership transfer might otherwise be awkward. A small caveat to this is that the ergonomics of this pattern only really work out well if the final "build" method takes `&mut self` as well. In this situation it's difficult to try to figure out what's supposed to happen if this method is called twice, so I left it to panic for now so we can more easily update it in the future possibly. * Synchronize preview1/preview2 builders * Move preview1 builders to `&mut`-style * Rename methods on preview2 builder to match names on the preview1 builders * Fix C API * Fix more tests * Fix benchmark build * Fix unused variable |
1 year ago |
Marcin S |
ba6c9fe212
|
Add benchmark for `deserialize` to compare with `deserialize_file` (#6420)
I was curious how `deserialize` compared with `deserialize_file`. There was already a `bench_deserialize_module` so I expanded it to support `deserialize`. Some results: ``` Benchmarking deserialize/deserialize/wasi.wasm Benchmarking deserialize/deserialize/wasi.wasm: Warming up for 3.0000 s Benchmarking deserialize/deserialize/wasi.wasm: Collecting 100 samples in estimated 5.4918 s (50k iterations) Benchmarking deserialize/deserialize/wasi.wasm: Analyzing deserialize/deserialize/wasi.wasm time: [107.44 µs 108.00 µs 108.56 µs] Found 9 outliers among 100 measurements (9.00%) 6 (6.00%) high mild 3 (3.00%) high severe Benchmarking deserialize/deserialize_file/wasi.wasm Benchmarking deserialize/deserialize_file/wasi.wasm: Warming up for 3.0000 s Benchmarking deserialize/deserialize_file/wasi.wasm: Collecting 100 samples in estimated 5.0337 s (56k iterations) Benchmarking deserialize/deserialize_file/wasi.wasm: Analyzing deserialize/deserialize_file/wasi.wasm time: [89.572 µs 89.705 µs 89.872 µs] Found 14 outliers among 100 measurements (14.00%) 9 (9.00%) high mild 5 (5.00%) high severe ``` |
1 year ago |
Nick Fitzgerald |
913efdf24d
|
wasmtime: Overhaul trampolines (#6262)
This commit splits `VMCallerCheckedFuncRef::func_ptr` into three new function pointers: `VMCallerCheckedFuncRef::{wasm,array,native}_call`. Each one has a dedicated calling convention, so callers just choose the version that works for them. This is as opposed to the previous behavior where we would chain together many trampolines that converted between calling conventions, sometimes up to four on the way into Wasm and four more on the way back out. See [0] for details. [0] https://github.com/bytecodealliance/rfcs/blob/main/accepted/tail-calls.md#a-review-of-our-existing-trampolines-calling-conventions-and-call-paths Thanks to @bjorn3 for the initial idea of having multiple function pointers for different calling conventions. This is generally a nice ~5-10% speed up to our call benchmarks across the board: both Wasm-to-host and host-to-Wasm. The one exception is typed calls from Wasm to the host, which have a minor regression. We hypothesize that this is because the old hand-written assembly trampolines did not maintain a call frame and do a tail call, but the new Cranelift-generated trampolines do maintain a call frame and do a regular call. The regression is only a couple nanoseconds, which seems well-explained by these differences explain, and ultimately is not a big deal. However, this does lead to a ~5% code size regression for compiled modules. Before, we compiled a trampoline per escaping function's signature and we deduplicated these trampolines by signature. Now we compile two trampolines per escaping function: one for if the host calls via the array calling convention and one for it the host calls via the native calling convention. Additionally, we compile a trampoline for every type in the module, in case there is a native calling convention function from the host that we `call_indirect` of that type. Much of this is in the `.eh_frame` section in the compiled module, because each of our trampolines needs an entry there. Note that the `.eh_frame` section is not required for Wasmtime's correctness, and you can disable its generation to shrink compiled module code size; we just emit it to play nice with external unwinders and profilers. We believe there are code size gains available for follow up work to offset this code size regression in the future. Backing up a bit: the reason each Wasm module needs to provide these Wasm-to-native trampolines is because `wasmtime::Func::wrap` and friends allow embedders to create functions even when there is no compiler available, so they cannot bring their own trampoline. Instead the Wasm module has to supply it. This in turn means that we need to look up and patch in these Wasm-to-native trampolines during roughly instantiation time. But instantiation is super hot, and we don't want to add more passes over imports or any extra work on this path. So we integrate with `wasmtime::InstancePre` to patch these trampolines in ahead of time. Co-Authored-By: Jamey Sharp <jsharp@fastly.com> Co-Authored-By: Alex Crichton <alex@alexcrichton.com> prtest:full |
2 years ago |
Alex Crichton |
63d80fc509
|
Remove the need to have a `Store` for an `InstancePre` (#5683)
* Remove the need to have a `Store` for an `InstancePre` This commit relaxes a requirement of the `InstancePre` API, notably its construction via `Linker::instantiate_pre`. Previously this function required a `Store<T>` to be present to be able to perform type-checking on the contents of the linker, and now this requirement has been removed. Items stored within a linker are either a `HostFunc`, which has type information inside of it, or an `Extern`, which doesn't have type information inside of it. Due to the usage of `Extern` this is why a `Store` was required during the `InstancePre` construction process, it's used to extract the type of an `Extern`. This commit implements a solution where the type information of an `Extern` is stored alongside the `Extern` itself, meaning that the `InstancePre` construction process no longer requires a `Store<T>`. One caveat of this implementation is that some items, such as tables and memories, technically have a "dynamic type" where during type checking their current size is consulted to match against the minimum size required of an import. This no longer works when using `Linker::instantiate_pre` as the current size used is the one when it was inserted into the linker rather than the one available at instantiation time. It's hoped, however, that this is a relatively esoteric use case that doesn't impact many real-world users. Additionally note that this is an API-breaking change. Not only is the `Store` argument removed from `Linker::instantiate_pre`, but some other methods such as `Linker::define` grew a `Store` argument as the type needs to be extracted when an item is inserted into a linker. Closes #5675 * Fix the C API * Fix benchmark compilation * Add C API docs * Update crates/wasmtime/src/linker.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com> |
2 years ago |
Andrew Brown |
4899537328
|
bench: add more WASI benchmarks (#5309)
* bench: add more WASI benchmarks This follows up on #5274 to add several more scenarios with which to benchmark WASI performance: - `open-file.wat`: opens and closes a file - `read-file.wat`: opens a file, reads 4K bytes from it, then closes it - `read-dir.wat`: reads a directory's entries Each benchmark is hand-crafted WAT to more clearly control what WASI calls are made. As with #5274, these modules' sole entry point takes a parameter indicating the number of iterations to run in order to use `criterion`'s `iter_custom` feature. * fix: reduce expected size of directory entries |
2 years ago |
Alex Crichton |
b0939f6626
|
Remove explicit `S` type parameters (#5275)
* Remove explicit `S` type parameters This commit removes the explicit `S` type parameter on `Func::typed` and `Instance::get_typed_func`. Historical versions of Rust required that this be a type parameter but recent rustcs support a mixture of explicit type parameters and `impl Trait`. This removes, at callsites, a superfluous `, _` argument which otherwise never needs specification. * Fix mdbook examples |
2 years ago |
Andrew Brown |
8426904129
|
bench: benchmark several common WASI scenarios (#5274)
In order to properly understand the impact of providing thread-safe implmentations of WASI contexts (#5235), we need benchmarks that measure the current performance of WASI calls using Wiggle. This change adds several common WASI scenarios as WAT files (see `benches/wasi/*.wat`) and benchmarks them with `criterion`. Using `criterion`'s `iter_custom`, the WAT file runs the desired number of benchmark iterations internally and the total duration of the runs is divided to get the average time for each loop iteration. Why WAT? When compiling these benchmarks from Rust to `wasm32-wasi`, the output files are large, contain other WASI imports than the desired ones, and overall it is difficult to tell if we are measuring what we expect. By hand-writing the WAT, it is (slightly) more clear what each benchmark is doing. |
2 years ago |
Alex Crichton |
b14551d7ca
|
Refactor configuration for the pooling allocator (#5205)
This commit changes the APIs in the `wasmtime` crate for configuring the pooling allocator. I plan on adding a few more configuration options in the near future and the current structure was feeling unwieldy for adding these new abstractions. The previous `struct`-based API has been replaced with a builder-style API in a similar shape as to `Config`. This is done to help make it easier to add more configuration options in the future through adding more methods as opposed to adding more field which could break prior initializations. |
2 years ago |
Nick Fitzgerald |
a2f846f124
|
Don't re-capture backtraces when propagating traps through host frames (#5049)
* Add a benchmark for traps with many Wasm<-->host calls on the stack * Add a test for expected Wasm stack traces with Wasm<--host calls on the stack when we trap * Don't re-capture backtraces when propagating traps through host frames This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, it trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and is therefore just a proper `O(n)` time operation, as it is intended to be. Now we explicitly track whether we need to capture a Wasm backtrace or not when raising a trap. This unfortunately isn't as straightforward as one might hope, however, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm backtrace inside `wasmtime_runtime` but in order to determine whether to do that or not we need to reflect on the `anyhow::Error` and see if it is a `wasmtime::Trap` that already has a backtrace or not. This can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types-- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a *ton* of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces which relies on module registration information data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace or not, and then determine whether we need a backtrace or not at each of that function's call sites, which are in `wasmtime` and therefore can do the reflection to determine whether the user trap already has a backtrace or not. Phew! Fixes #5037 * debug assert that we don't record unnecessary backtraces for traps * Add assertions around `needs_backtrace` Unfortunately we can't do debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some()); because `needs_backtrace` doesn't consider whether Wasm backtraces have been disabled via config. * Consolidate `needs_backtrace` calculation followed by calling `raise_user_trap` into one place |
2 years ago |
Nick Fitzgerald |
46782b18c2
|
`wasmtime`: Implement fast Wasm stack walking (#4431)
* Always preserve frame pointers in Wasmtime This allows us to efficiently and simply capture Wasm stacks without maintaining and synchronizing any safety-critical side tables between the compiler and the runtime. * wasmtime: Implement fast Wasm stack walking Why do we want Wasm stack walking to be fast? Because we capture stacks whenever there is a trap and traps actually happen fairly frequently with short-lived programs and WASI's `exit`. Previously, we would rely on generating the system unwind info (e.g. `.eh_frame`) and using the system unwinder (via the `backtrace`crate) to walk the full stack and filter out any non-Wasm stack frames. This can, unfortunately, be slow for two primary reasons: 1. The system unwinder is doing `O(all-kinds-of-frames)` work rather than `O(wasm-frames)` work. 2. System unwind info and the system unwinder need to be much more general than a purpose-built stack walker for Wasm needs to be. It has to handle any kind of stack frame that any compiler might emit where as our Wasm frames are emitted by Cranelift and always have frame pointers. This translates into implementation complexity and general overhead. There can also be unnecessary-for-our-use-cases global synchronization and locks involved, further slowing down stack walking in the presence of multiple threads trying to capture stacks in parallel. This commit introduces a purpose-built stack walker for traversing just our Wasm frames. To find all the sequences of Wasm-to-Wasm stack frames, and ignore non-Wasm stack frames, we keep a linked list of `(entry stack pointer, exit frame pointer)` pairs. This linked list is maintained via Wasm-to-host and host-to-Wasm trampolines. Within a sequence of Wasm-to-Wasm calls, we can use frame pointers (which Cranelift preserves) to find the next older Wasm frame on the stack, and we keep doing this until we reach the entry stack pointer, meaning that the next older frame will be a host frame. The trampolines need to avoid a couple stumbling blocks. First, they need to be compiled ahead of time, since we may not have access to a compiler at runtime (e.g. if the `cranelift` feature is disabled) but still want to be able to call functions that have already been compiled and get stack traces for those functions. Usually this means we would compile the appropriate trampolines inside `Module::new` and the compiled module object would hold the trampolines. However, we *also* need to support calling host functions that are wrapped into `wasmtime::Func`s and there doesn't exist *any* ahead-of-time compiled module object to hold the appropriate trampolines: ```rust // Define a host function. let func_type = wasmtime::FuncType::new( vec![wasmtime::ValType::I32], vec![wasmtime::ValType::I32], ); let func = Func::new(&mut store, func_type, |_, params, results| { // ... Ok(()) }); // Call that host function. let mut results = vec![wasmtime::Val::I32(0)]; func.call(&[wasmtime::Val::I32(0)], &mut results)?; ``` Therefore, we define one host-to-Wasm trampoline and one Wasm-to-host trampoline in assembly that work for all Wasm and host function signatures. These trampolines are careful to only use volatile registers, avoid touching any register that is an argument in the calling convention ABI, and tail call to the target callee function. This allows forwarding any set of arguments and any returns to and from the callee, while also allowing us to maintain our linked list of Wasm stack and frame pointers before transferring control to the callee. These trampolines are not used in Wasm-to-Wasm calls, only when crossing the host-Wasm boundary, so they do not impose overhead on regular calls. (And if using one trampoline for all host-Wasm boundary crossing ever breaks branch prediction enough in the CPU to become any kind of bottleneck, we can do fun things like have multiple copies of the same trampoline and choose a random copy for each function, sharding the functions across branch predictor entries.) Finally, this commit also ends the use of a synthetic `Module` and allocating a stubbed out `VMContext` for host functions. Instead, we define a `VMHostFuncContext` with its own magic value, similar to `VMComponentContext`, specifically for host functions. <h2>Benchmarks</h2> <h3>Traps and Stack Traces</h3> Large improvements to taking stack traces on traps, ranging from shaving off 64% to 99.95% of the time it used to take. <details> ``` multi-threaded-traps/0 time: [2.5686 us 2.5808 us 2.5934 us] thrpt: [0.0000 elem/s 0.0000 elem/s 0.0000 elem/s] change: time: [-85.419% -85.153% -84.869%] (p = 0.00 < 0.05) thrpt: [+560.90% +573.56% +585.84%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe multi-threaded-traps/1 time: [2.9021 us 2.9167 us 2.9322 us] thrpt: [341.04 Kelem/s 342.86 Kelem/s 344.58 Kelem/s] change: time: [-91.455% -91.294% -91.096%] (p = 0.00 < 0.05) thrpt: [+1023.1% +1048.6% +1070.3%] Performance has improved. Found 6 outliers among 100 measurements (6.00%) 1 (1.00%) high mild 5 (5.00%) high severe multi-threaded-traps/2 time: [2.9996 us 3.0145 us 3.0295 us] thrpt: [660.18 Kelem/s 663.47 Kelem/s 666.76 Kelem/s] change: time: [-94.040% -93.910% -93.762%] (p = 0.00 < 0.05) thrpt: [+1503.1% +1542.0% +1578.0%] Performance has improved. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high severe multi-threaded-traps/4 time: [5.5768 us 5.6052 us 5.6364 us] thrpt: [709.68 Kelem/s 713.63 Kelem/s 717.25 Kelem/s] change: time: [-93.193% -93.121% -93.052%] (p = 0.00 < 0.05) thrpt: [+1339.2% +1353.6% +1369.1%] Performance has improved. multi-threaded-traps/8 time: [8.6408 us 9.1212 us 9.5438 us] thrpt: [838.24 Kelem/s 877.08 Kelem/s 925.84 Kelem/s] change: time: [-94.754% -94.473% -94.202%] (p = 0.00 < 0.05) thrpt: [+1624.7% +1709.2% +1806.1%] Performance has improved. multi-threaded-traps/16 time: [10.152 us 10.840 us 11.545 us] thrpt: [1.3858 Melem/s 1.4760 Melem/s 1.5761 Melem/s] change: time: [-97.042% -96.823% -96.577%] (p = 0.00 < 0.05) thrpt: [+2821.5% +3048.1% +3281.1%] Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild many-modules-registered-traps/1 time: [2.6278 us 2.6361 us 2.6447 us] thrpt: [378.11 Kelem/s 379.35 Kelem/s 380.55 Kelem/s] change: time: [-85.311% -85.108% -84.909%] (p = 0.00 < 0.05) thrpt: [+562.65% +571.51% +580.76%] Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe many-modules-registered-traps/8 time: [2.6294 us 2.6460 us 2.6623 us] thrpt: [3.0049 Melem/s 3.0235 Melem/s 3.0425 Melem/s] change: time: [-85.895% -85.485% -85.022%] (p = 0.00 < 0.05) thrpt: [+567.63% +588.95% +608.95%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe many-modules-registered-traps/64 time: [2.6218 us 2.6329 us 2.6452 us] thrpt: [24.195 Melem/s 24.308 Melem/s 24.411 Melem/s] change: time: [-93.629% -93.551% -93.470%] (p = 0.00 < 0.05) thrpt: [+1431.4% +1450.6% +1469.5%] Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild many-modules-registered-traps/512 time: [2.6569 us 2.6737 us 2.6923 us] thrpt: [190.17 Melem/s 191.50 Melem/s 192.71 Melem/s] change: time: [-99.277% -99.268% -99.260%] (p = 0.00 < 0.05) thrpt: [+13417% +13566% +13731%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high mild many-modules-registered-traps/4096 time: [2.7258 us 2.7390 us 2.7535 us] thrpt: [1.4876 Gelem/s 1.4955 Gelem/s 1.5027 Gelem/s] change: time: [-99.956% -99.955% -99.955%] (p = 0.00 < 0.05) thrpt: [+221417% +223380% +224881%] Performance has improved. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe many-stack-frames-traps/1 time: [1.4658 us 1.4719 us 1.4784 us] thrpt: [676.39 Kelem/s 679.38 Kelem/s 682.21 Kelem/s] change: time: [-90.368% -89.947% -89.586%] (p = 0.00 < 0.05) thrpt: [+860.23% +894.72% +938.21%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe many-stack-frames-traps/8 time: [2.4772 us 2.4870 us 2.4973 us] thrpt: [3.2034 Melem/s 3.2167 Melem/s 3.2294 Melem/s] change: time: [-85.550% -85.370% -85.199%] (p = 0.00 < 0.05) thrpt: [+575.65% +583.51% +592.03%] Performance has improved. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe many-stack-frames-traps/64 time: [10.109 us 10.171 us 10.236 us] thrpt: [6.2525 Melem/s 6.2925 Melem/s 6.3309 Melem/s] change: time: [-78.144% -77.797% -77.336%] (p = 0.00 < 0.05) thrpt: [+341.22% +350.38% +357.55%] Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe many-stack-frames-traps/512 time: [126.16 us 126.54 us 126.96 us] thrpt: [4.0329 Melem/s 4.0461 Melem/s 4.0583 Melem/s] change: time: [-65.364% -64.933% -64.453%] (p = 0.00 < 0.05) thrpt: [+181.32% +185.17% +188.71%] Performance has improved. Found 4 outliers among 100 measurements (4.00%) 4 (4.00%) high severe ``` </details> <h3>Calls</h3> There is, however, a small regression in raw Wasm-to-host and host-to-Wasm call performance due the new trampolines. It seems to be on the order of about 2-10 nanoseconds per call, depending on the benchmark. I believe this regression is ultimately acceptable because 1. this overhead will be vastly dominated by whatever work a non-nop callee actually does, 2. we will need these trampolines, or something like them, when implementing the Wasm exceptions proposal to do things like translate Wasm's exceptions into Rust's `Result`s, 3. and because the performance improvements to trapping and capturing stack traces are of such a larger magnitude than this call regressions. <details> ``` sync/no-hook/host-to-wasm - typed - nop time: [28.683 ns 28.757 ns 28.844 ns] change: [+16.472% +17.183% +17.904%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 1 (1.00%) low mild 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/host-to-wasm - untyped - nop time: [42.515 ns 42.652 ns 42.841 ns] change: [+12.371% +14.614% +17.462%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop time: [33.936 ns 34.052 ns 34.179 ns] change: [+25.478% +26.938% +28.369%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/host-to-wasm - typed - nop-params-and-results time: [34.290 ns 34.388 ns 34.502 ns] change: [+40.802% +42.706% +44.526%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 5 (5.00%) high mild 8 (8.00%) high severe sync/no-hook/host-to-wasm - untyped - nop-params-and-results time: [62.546 ns 62.721 ns 62.919 ns] change: [+2.5014% +3.6319% +4.8078%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe sync/no-hook/host-to-wasm - unchecked - nop-params-and-results time: [42.609 ns 42.710 ns 42.831 ns] change: [+20.966% +22.282% +23.475%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 4 (4.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop time: [29.546 ns 29.675 ns 29.818 ns] change: [+20.693% +21.794% +22.836%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 3 (3.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop time: [45.448 ns 45.699 ns 45.961 ns] change: [+17.204% +18.514% +19.590%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop time: [34.334 ns 34.437 ns 34.558 ns] change: [+23.225% +24.477% +25.886%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe sync/hook-sync/host-to-wasm - typed - nop-params-and-results time: [36.594 ns 36.763 ns 36.974 ns] change: [+41.967% +47.261% +52.086%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe sync/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [63.541 ns 63.831 ns 64.194 ns] change: [-4.4337% -0.6855% +2.7134%] (p = 0.73 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe sync/hook-sync/host-to-wasm - unchecked - nop-params-and-results time: [43.968 ns 44.169 ns 44.437 ns] change: [+18.772% +21.802% +24.623%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 3 (3.00%) high mild 12 (12.00%) high severe async/no-hook/host-to-wasm - typed - nop time: [4.9612 us 4.9743 us 4.9889 us] change: [+9.9493% +11.911% +13.502%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async/no-hook/host-to-wasm - untyped - nop time: [5.0030 us 5.0211 us 5.0439 us] change: [+10.841% +11.873% +12.977%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe async/no-hook/host-to-wasm - typed - nop-params-and-results time: [4.9273 us 4.9468 us 4.9700 us] change: [+4.7381% +6.8445% +8.8238%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async/no-hook/host-to-wasm - untyped - nop-params-and-results time: [5.1151 us 5.1338 us 5.1555 us] change: [+9.5335% +11.290% +13.044%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/hook-sync/host-to-wasm - typed - nop time: [4.9330 us 4.9394 us 4.9467 us] change: [+10.046% +11.038% +12.035%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 5 (5.00%) high mild 7 (7.00%) high severe async/hook-sync/host-to-wasm - untyped - nop time: [5.0073 us 5.0183 us 5.0310 us] change: [+9.3828% +10.565% +11.752%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe async/hook-sync/host-to-wasm - typed - nop-params-and-results time: [4.9610 us 4.9839 us 5.0097 us] change: [+9.0857% +11.513% +14.359%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [5.0995 us 5.1272 us 5.1617 us] change: [+9.3600% +11.506% +13.809%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop time: [2.4242 us 2.4316 us 2.4396 us] change: [+7.8756% +8.8803% +9.8346%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 5 (5.00%) high mild 3 (3.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop time: [2.5102 us 2.5155 us 2.5210 us] change: [+12.130% +13.194% +14.270%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 4 (4.00%) high mild 8 (8.00%) high severe async-pool/no-hook/host-to-wasm - typed - nop-params-and-results time: [2.4203 us 2.4310 us 2.4440 us] change: [+4.0380% +6.3623% +8.7534%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe async-pool/no-hook/host-to-wasm - untyped - nop-params-and-results time: [2.5501 us 2.5593 us 2.5700 us] change: [+8.8802% +10.976% +12.937%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop time: [2.4135 us 2.4190 us 2.4254 us] change: [+8.3640% +9.3774% +10.435%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop time: [2.5172 us 2.5248 us 2.5357 us] change: [+11.543% +12.750% +13.982%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/host-to-wasm - typed - nop-params-and-results time: [2.4214 us 2.4353 us 2.4532 us] change: [+1.5158% +5.0872% +8.6765%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 2 (2.00%) high mild 13 (13.00%) high severe async-pool/hook-sync/host-to-wasm - untyped - nop-params-and-results time: [2.5499 us 2.5607 us 2.5748 us] change: [+10.146% +12.459% +14.919%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe sync/no-hook/wasm-to-host - nop - typed time: [6.6135 ns 6.6288 ns 6.6452 ns] change: [+37.927% +38.837% +39.869%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.930 ns 15.993 ns 16.067 ns] change: [+3.9583% +5.6286% +7.2430%] (p = 0.00 < 0.05) Performance has regressed. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe sync/no-hook/wasm-to-host - nop - untyped time: [20.596 ns 20.640 ns 20.690 ns] change: [+4.3293% +5.2047% +6.0935%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - untyped time: [42.659 ns 42.882 ns 43.159 ns] change: [-2.1466% -0.5079% +1.2554%] (p = 0.58 > 0.05) No change in performance detected. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe sync/no-hook/wasm-to-host - nop - unchecked time: [10.671 ns 10.691 ns 10.713 ns] change: [+83.911% +87.620% +92.062%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe sync/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.136 ns 11.190 ns 11.263 ns] change: [-29.719% -28.446% -27.029%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop - typed time: [6.7964 ns 6.8087 ns 6.8226 ns] change: [+21.531% +24.206% +27.331%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 4 (4.00%) high mild 10 (10.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.865 ns 15.921 ns 15.985 ns] change: [+4.8466% +6.3330% +7.8317%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe sync/hook-sync/wasm-to-host - nop - untyped time: [21.505 ns 21.587 ns 21.677 ns] change: [+8.0908% +9.1943% +10.254%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 4 (4.00%) high mild 4 (4.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [44.018 ns 44.128 ns 44.261 ns] change: [-1.4671% -0.0458% +1.2443%] (p = 0.94 > 0.05) No change in performance detected. Found 14 outliers among 100 measurements (14.00%) 5 (5.00%) high mild 9 (9.00%) high severe sync/hook-sync/wasm-to-host - nop - unchecked time: [11.264 ns 11.326 ns 11.387 ns] change: [+80.225% +81.659% +83.068%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.816 ns 11.865 ns 11.920 ns] change: [-29.152% -28.040% -26.957%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 8 (8.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop - typed time: [6.6221 ns 6.6385 ns 6.6569 ns] change: [+43.618% +44.755% +45.965%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 6 (6.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.884 ns 15.929 ns 15.983 ns] change: [+3.5987% +5.2053% +6.7846%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 3 (3.00%) high mild 13 (13.00%) high severe async/no-hook/wasm-to-host - nop - untyped time: [20.615 ns 20.702 ns 20.821 ns] change: [+6.9799% +8.1212% +9.2819%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.956 ns 42.207 ns 42.521 ns] change: [-4.3057% -2.7730% -1.2428%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async/no-hook/wasm-to-host - nop - unchecked time: [10.440 ns 10.474 ns 10.513 ns] change: [+83.959% +85.826% +87.541%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.476 ns 11.512 ns 11.554 ns] change: [-29.857% -28.383% -26.978%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 1 (1.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe async/no-hook/wasm-to-host - nop - async-typed time: [26.427 ns 26.478 ns 26.532 ns] change: [+6.5730% +7.4676% +8.3983%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [28.557 ns 28.693 ns 28.880 ns] change: [+1.9099% +3.7332% +5.9731%] (p = 0.00 < 0.05) Performance has regressed. Found 15 outliers among 100 measurements (15.00%) 1 (1.00%) high mild 14 (14.00%) high severe async/hook-sync/wasm-to-host - nop - typed time: [6.7488 ns 6.7630 ns 6.7784 ns] change: [+19.935% +22.080% +23.683%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.928 ns 16.031 ns 16.149 ns] change: [+5.5188% +6.9567% +8.3839%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 9 (9.00%) high mild 2 (2.00%) high severe async/hook-sync/wasm-to-host - nop - untyped time: [21.930 ns 22.114 ns 22.296 ns] change: [+4.6674% +7.7588% +10.375%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.684 ns 42.858 ns 43.081 ns] change: [-5.2957% -3.4693% -1.6217%] (p = 0.00 < 0.05) Performance has improved. Found 14 outliers among 100 measurements (14.00%) 2 (2.00%) high mild 12 (12.00%) high severe async/hook-sync/wasm-to-host - nop - unchecked time: [11.026 ns 11.053 ns 11.086 ns] change: [+70.751% +72.378% +73.961%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.840 ns 11.900 ns 11.982 ns] change: [-27.977% -26.584% -24.887%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async/hook-sync/wasm-to-host - nop - async-typed time: [27.601 ns 27.709 ns 27.882 ns] change: [+8.1781% +9.1102% +10.030%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 3 (3.00%) high mild 6 (6.00%) high severe async/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [28.955 ns 29.174 ns 29.413 ns] change: [+1.1226% +3.0366% +5.1126%] (p = 0.00 < 0.05) Performance has regressed. Found 13 outliers among 100 measurements (13.00%) 7 (7.00%) high mild 6 (6.00%) high severe async-pool/no-hook/wasm-to-host - nop - typed time: [6.5626 ns 6.5733 ns 6.5851 ns] change: [+40.561% +42.307% +44.514%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - typed time: [15.820 ns 15.886 ns 15.969 ns] change: [+4.1044% +5.7928% +7.7122%] (p = 0.00 < 0.05) Performance has regressed. Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) high mild 13 (13.00%) high severe async-pool/no-hook/wasm-to-host - nop - untyped time: [20.481 ns 20.521 ns 20.566 ns] change: [+6.7962% +7.6950% +8.7612%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 6 (6.00%) high mild 5 (5.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - untyped time: [41.834 ns 41.998 ns 42.189 ns] change: [-3.8185% -2.2687% -0.7541%] (p = 0.01 < 0.05) Change within noise threshold. Found 13 outliers among 100 measurements (13.00%) 3 (3.00%) high mild 10 (10.00%) high severe async-pool/no-hook/wasm-to-host - nop - unchecked time: [10.353 ns 10.380 ns 10.414 ns] change: [+82.042% +84.591% +87.205%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 4 (4.00%) high mild 3 (3.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - unchecked time: [11.123 ns 11.168 ns 11.228 ns] change: [-30.813% -29.285% -27.874%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 11 (11.00%) high mild 1 (1.00%) high severe async-pool/no-hook/wasm-to-host - nop - async-typed time: [27.442 ns 27.528 ns 27.638 ns] change: [+7.5215% +9.9795% +12.266%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/no-hook/wasm-to-host - nop-params-and-results - async-typed time: [29.014 ns 29.148 ns 29.312 ns] change: [+2.0227% +3.4722% +4.9047%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild 1 (1.00%) high severe async-pool/hook-sync/wasm-to-host - nop - typed time: [6.7916 ns 6.8116 ns 6.8325 ns] change: [+20.937% +22.050% +23.281%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 5 (5.00%) high mild 6 (6.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - typed time: [15.917 ns 15.975 ns 16.051 ns] change: [+4.6404% +6.4217% +8.3075%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 5 (5.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop - untyped time: [21.558 ns 21.612 ns 21.679 ns] change: [+8.1158% +9.1409% +10.217%] (p = 0.00 < 0.05) Performance has regressed. Found 9 outliers among 100 measurements (9.00%) 2 (2.00%) high mild 7 (7.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - untyped time: [42.475 ns 42.614 ns 42.775 ns] change: [-6.3613% -4.4709% -2.7647%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 3 (3.00%) high mild 15 (15.00%) high severe async-pool/hook-sync/wasm-to-host - nop - unchecked time: [11.150 ns 11.195 ns 11.247 ns] change: [+74.424% +77.056% +79.811%] (p = 0.00 < 0.05) Performance has regressed. Found 14 outliers among 100 measurements (14.00%) 3 (3.00%) high mild 11 (11.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - unchecked time: [11.639 ns 11.695 ns 11.760 ns] change: [-30.212% -29.023% -27.954%] (p = 0.00 < 0.05) Performance has improved. Found 15 outliers among 100 measurements (15.00%) 7 (7.00%) high mild 8 (8.00%) high severe async-pool/hook-sync/wasm-to-host - nop - async-typed time: [27.480 ns 27.712 ns 27.984 ns] change: [+2.9764% +6.5061% +9.8914%] (p = 0.00 < 0.05) Performance has regressed. Found 8 outliers among 100 measurements (8.00%) 6 (6.00%) high mild 2 (2.00%) high severe async-pool/hook-sync/wasm-to-host - nop-params-and-results - async-typed time: [29.218 ns 29.380 ns 29.600 ns] change: [+5.2283% +7.7247% +10.822%] (p = 0.00 < 0.05) Performance has regressed. Found 16 outliers among 100 measurements (16.00%) 2 (2.00%) high mild 14 (14.00%) high severe ``` </details> * Add s390x support for frame pointer-based stack walking * wasmtime: Allow `Caller::get_export` to get all exports * fuzzing: Add a fuzz target to check that our stack traces are correct We generate Wasm modules that keep track of their own stack as they call and return between functions, and then we periodically check that if the host captures a backtrace, it matches what the Wasm module has recorded. * Remove VM offsets for `VMHostFuncContext` since it isn't used by JIT code * Add doc comment with stack walking implementation notes * Document the extra state that can be passed to `wasmtime_runtime::Backtrace` methods * Add extensive comments for stack walking function * Factor architecture-specific bits of stack walking out into modules * Initialize store-related fields in a vmctx to null when there is no store yet Rather than leaving them as uninitialized data. * Use `set_callee` instead of manually setting the vmctx field * Use a more informative compile error message for unsupported architectures * Document unsafety of `prepare_host_to_wasm_trampoline` * Use `bti c` instead of `hint #34` in inline aarch64 assembly * Remove outdated TODO comment * Remove setting of `last_wasm_exit_fp` in `set_jit_trap` This is no longer needed as the value is plumbed through to the backtrace code directly now. * Only set the stack limit once, in the face of re-entrancy into Wasm * Add comments for s390x-specific stack walking bits * Use the helper macro for all libcalls If we forget to use it, and then trigger a GC from the libcall, that means we could miss stack frames when walking the stack, fail to find live GC refs, and then get use after free bugs. Much less risky to always use the helper macro that takes care of all of that for us. * Use the `asm_sym!` macro in Wasm-to-libcall trampolines This macro handles the macOS-specific underscore prefix stuff for us. * wasmtime: add size and align to `externref` assertion error message * Extend the `stacks` fuzzer to have host frames in between Wasm frames This way we get one or more contiguous sequences of Wasm frames on the stack, instead of exactly one. * Add documentation for aarch64-specific backtrace helpers * Clarify that we only support little-endian aarch64 in trampoline comment * Use `.machine z13` in s390x assembly file Since apparently our CI machines have pretty old assemblers that don't have `.machine z14`. This should be fine though since these trampolines don't make use of anything that is introduced in z14. * Fix aarch64 build * Fix macOS build * Document the `asm_sym!` macro * Add windows support to the `wasmtime-asm-macros` crate * Add windows support to host<--->Wasm trampolines * Fix trap handler build on windows * Run `rustfmt` on s390x trampoline source file * Temporarily disable some assertions about a trap's backtrace in the component model tests Follow up to re-enable this and fix the associated issue: https://github.com/bytecodealliance/wasmtime/issues/4535 * Refactor libcall definitions with less macros This refactors the `libcall!` macro to use the `foreach_builtin_function!` macro to define all of the trampolines. Additionally the macro surrounding each libcall itself is no longer necessary and helps avoid too many macros. * Use `VMOpaqueContext::from_vm_host_func_context` in `VMHostFuncContext::new` * Move `backtrace` module to be submodule of `traphandlers` This avoids making some things `pub(crate)` in `traphandlers` that really shouldn't be. * Fix macOS aarch64 build * Use "i64" instead of "word" in aarch64-specific file * Save/restore entry SP and exit FP/return pointer in the face of panicking imported host functions Also clean up assertions surrounding our saved entry/exit registers. * Put "typed" vs "untyped" in the same position of call benchmark names Regardless if we are doing wasm-to-host or host-to-wasm * Fix stacks test case generator build for new `wasm-encoder` * Fix build for s390x * Expand libcalls in s390x asm * Disable more parts of component tests now that backtrace assertions are a bit tighter * Remove assertion that can maybe fail on s390x Co-authored-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com> |
2 years ago |
Nick Fitzgerald |
7000b0a4cf
|
wasmtime: Add criterion micro benchmarks for traps (#4398)
* wasmtime: Rename host->wasm trampolines As we introduce new types of trampolines, having clear names for our existing trampolines will be helpful. * Fix typo in docs for `VMCOMPONENT_MAGIC` * wasmtime: Add criterion micro benchmarks for traps |
2 years ago |
Alex Crichton |
82a31680d6
|
Use a `StoreOpaque` during backtraces for metadata (#4325)
Previous to this commit Wasmtime would use the `GlobalModuleRegistry` when learning information about a trap such as its trap code, the symbols for each frame, etc. This has a downside though of holding a global read-write lock for the duration of this operation which hinders registration of new modules in parallel. In addition there was a fair amount of internal duplication between this "global module registry" and the store-local module registry. Finally relying on global state for information like this gets a bit more brittle over time as it seems best to scope global queries to precisely what's necessary rather than holding extra information. With the refactoring in wasm backtraces done in #4183 it's now possible to always have a `StoreOpaque` reference when a backtrace is collected for symbolication and otherwise Trap-identification purposes. This commit adds a `StoreOpaque` parameter to the `Trap::from_runtime` constructor and then plumbs that everywhere. Note that while doing this I changed the internal `traphandlers::lazy_per_thread_init` function to no longer return a `Result` and instead just `panic!` on Unix if memory couldn't be allocated for a stack. This removed quite a lot of error-handling code for a case that's expected to quite rarely happen. If necessary in the future we can add a fallible initialization point but this feels like a better default balance for the code here. With a `StoreOpaque` in use when a trap is being symbolicated that means we have a `ModuleRegistry` which can be used for queries and such. This meant that the `GlobalModuleRegistry` state could largely be dismantled and moved to per-`Store` state (within the `ModuleRegistry`, mostly just moving methods around). The final state is that the global rwlock is not exclusively scoped around insertions/deletions/`is_wasm_trap_pc` which is just a lookup and atomic add. Otherwise symbolication for a backtrace exclusively uses store-local state now (as intended). The original motivation for this commit was that frame information lookup and pieces were looking to get somewhat complicated with the addition of components which are a new vector of traps coming out of Cranelift-generated code. My hope is that by having a `Store` around for more operations it's easier to plumb all this through. |
2 years ago |
Alex Crichton |
f4b9020913
|
Change wasm-to-host trampolines to take the values_vec size (#4192)
* Change wasm-to-host trampolines to take the values_vec size This commit changes the ABI of wasm-to-host trampolines, which are only used right now for functions created with `Func::new`, to pass along the size of the `values_vec` argument. Previously the trampoline simply received `*mut ValRaw` and assumed that it was the appropriate size. By receiving a size as well we can thread through `&mut [ValRaw]` internally instead of `*mut ValRaw`. The original motivation for this is that I'm planning to leverage these trampolines for the component model for host-defined functions. Out of an abundance of caution of making sure that everything lines up I wanted to be able to write down asserts about the size received at runtime compared to the size expected. This overall led me to the desire to thread this size parameter through on the assumption that it would not impact performance all that much. I ran two benchmarks locally from the `call.rs` benchmark and got: * `sync/no-hook/wasm-to-host - nop - unchecked` - no change * `sync/no-hook/wasm-to-host - nop-params-and-results - unchecked` - 5% slower This is what I roughly expected in that if nothing actually reads the new parameter (e.g. no arguments) then threading through the parameter is effectively otherwise free. Otherwise though accesses to the `ValRaw` storage is now bounds-checked internally in Wasmtime instead of assuming it's valid, leading to the 5% slowdown (~9.6ns to ~10.3ns). If this becomes a peformance bottleneck for a particular use case then we should be fine to remove the bounds checking here or otherwise only bounds check in debug mode, otherwise I plan on leaving this as-is. Of particular note this also changes the C API for `*_unchecked` functions where the C callback now receives the size of the array as well. * Add docs |
2 years ago |
Alex Crichton |
a02a609528
|
Make `ValRaw` fields private (#4186)
* Make `ValRaw` fields private Force accessing to go through constructors and accessors to localize the knowledge about little-endian-ness. This is spawned since I made a mistake in #4039 about endianness. * Fix some tests * Component model changes |
2 years ago |
Benjamin Bouvier |
6e828df632
|
Remove unused `SourceLoc` in many `Mach` data structures (#4180)
* Remove unused srcloc in MachReloc * Remove unused srcloc in MachTrap * Use `into_iter` on array in bench code to suppress a warning * Remove unused srcloc in MachCallSite |
2 years ago |
Alex Crichton |
3f3afb455e
|
Remove support for userfaultfd (#4040)
This commit removes support for the `userfaultfd` or "uffd" syscall on Linux. This support was originally added for users migrating from Lucet to Wasmtime, but the recent developments of kernel-supported copy-on-write support for memory initialization wound up being more appropriate for these use cases than usefaultfd. The main reason for moving to copy-on-write initialization are: * The `userfaultfd` feature was never necessarily intended for this style of use case with wasm and was susceptible to subtle and rare bugs that were extremely difficult to track down. We were never 100% certain that there were kernel bugs related to userfaultfd but the suspicion never went away. * Handling faults with userfaultfd was always slow and single-threaded. Only one thread could handle faults and traveling to user-space to handle faults is inherently slower than handling them all in the kernel. The single-threaded aspect in particular presented a significant scaling bottleneck for embeddings that want to run many wasm instances in parallel. * One of the major benefits of userfaultfd was lazy initialization of wasm linear memory which is also achieved with the copy-on-write initialization support we have right now. * One of the suspected benefits of userfaultfd was less frobbing of the kernel vma structures when wasm modules are instantiated. Currently the copy-on-write support has a mitigation where we attempt to reuse the memory images where possible to avoid changing vma structures. When comparing this to userfaultfd's performance it was found that kernel modifications of vmas aren't a worrisome bottleneck so copy-on-write is suitable for this as well. Overall there are no remaining benefits that userfaultfd gives that copy-on-write doesn't, and copy-on-write solves a major downsides of userfaultfd, the scaling issue with a single faulting thread. Additionally copy-on-write support seems much more robust in terms of kernel implementation since it's only using standard memory-management syscalls which are heavily exercised. Finally copy-on-write support provides a new bonus where read-only memory in WebAssembly can be mapped directly to the same kernel cache page, even amongst many wasm instances of the same module, which was never possible with userfaultfd. In light of all this it's expected that all users of userfaultfd should migrate to the copy-on-write initialization of Wasmtime (which is enabled by default). |
3 years ago |
Alex Crichton |
7c3dd3398d
|
Reduce benchmark runtime on CI (#3896)
After adding the `call`-oriented benchmark recently I just noticed that running benchmarks on CI is taking 30+ minutes which is not intended. Instead of running a full benchmark run on CI (which I believe we're not looking at anyway) instead only run the benchmarks for a single iteration to ensure they still work but otherwise don't collect statistics about them. Additionally cap the number of parallel instantiations to 16 to avoid running tons of tests for machines with lots of cpus. |
3 years ago |
Alex Crichton |
8c9c72caaa
|
Add a benchmark for measuring call overhead (#3883)
The goal of this new benchmark, `call`, is to help us measure the overhead of both calling into WebAssembly from the host as well as calling the host from WebAssembly. There's lots of various ways to measure this so this benchmark is a bit large but should hopefully be pretty thorough. It's expected that this benchmark will rarely be run in its entirety but rather only a subset of the benchmarks will be run at any one given time. Some metrics measured here are: * Typed vs Untyped vs Unchecked - testing the cost of both calling wasm with these various methods as well as having wasm call the host where the host function is defined with these various methods. * With and without `call_hook` - helps to measure the overhead of the `Store::call_hook` API. * Synchronous and Asynchronous - measures the overhead of calling into WebAssembly asynchronously (with and without the pooling allocator) in addition to defining host APIs in various methods when wasm is called asynchronously. Currently all the numbers are as expected, notably: * Host calling WebAssembly is ~25ns of overhead * WebAssembly calling the host is ~3ns of overhead * "Unchecked" is a bit slower than "typed", and "Untyped" is slower than unchecked. * Asynchronous wasm calling a synchronous host function has ~3ns of overhead (nothing more than usual). * Asynchronous calls are much slower, on the order of 2-3us due to `madvise`. Lots of other fiddly bits that can be measured here, but this will hopefully help establish a benchmark through which we can measure in the future in addition to measuring changes such as #3876 |
3 years ago |
Alex Crichton |
15bb0c6903
|
Remove the `ModuleLimits` pooling configuration structure (#3837)
* Remove the `ModuleLimits` pooling configuration structure This commit is an attempt to improve the usability of the pooling allocator by removing the need to configure a `ModuleLimits` structure. Internally this structure has limits on all forms of wasm constructs but this largely bottoms out in the size of an allocation for an instance in the instance pooling allocator. Maintaining this list of limits can be cumbersome as modules may get tweaked over time and there's otherwise no real reason to limit the number of globals in a module since the main goal is to limit the memory consumption of a `VMContext` which can be done with a memory allocation limit rather than fine-tuned control over each maximum and minimum. The new approach taken in this commit is to remove `ModuleLimits`. Some fields, such as `tables`, `table_elements` , `memories`, and `memory_pages` are moved to `InstanceLimits` since they're still enforced at runtime. A new field `size` is added to `InstanceLimits` which indicates, in bytes, the maximum size of the `VMContext` allocation. If the size of a `VMContext` for a module exceeds this value then instantiation will fail. This involved adding a few more checks to `{Table, Memory}::new_static` to ensure that the minimum size is able to fit in the allocation, since previously modules were validated at compile time of the module that everything fit and that validation no longer happens (it happens at runtime). A consequence of this commit is that Wasmtime will have no built-in way to reject modules at compile time if they'll fail to be instantiated within a particular pooling allocator configuration. Instead a module must attempt instantiation see if a failure happens. * Fix benchmark compiles * Fix some doc links * Fix a panic by ensuring modules have limited tables/memories * Review comments * Add back validation at `Module` time instantiation is possible This allows for getting an early signal at compile time that a module will never be instantiable in an engine with matching settings. * Provide a better error message when sizes are exceeded Improve the error message when an instance size exceeds the maximum by providing a breakdown of where the bytes are all going and why the large size is being requested. * Try to fix test in qemu * Flag new test as 64-bit only Sizes are all specific to 64-bit right now |
3 years ago |
Alex Crichton |
1cb08d4e67
|
Minor instantiation benchmark updates (#3790)
This commit has a few minor updates and some improvements to the instantiation benchmark harness: * A `once_cell::unsync::Lazy` type is now used to guard creation of modules/engines/etc. This enables running singular benchmarks to be much faster since the benchmark no longer compiles all other benchmarks that are filtered out. Unfortunately I couldn't find a way in criterion to test whether a `BenchmarkId` is filtered out or not so we rely on the runtime laziness to initialize on the first run for benchmarks that do so. * All files located in `benches/instantiation` are now loaded for benchmarking instead of a hardcoded list. This makes it a bit easier to throw files into the directory and have them benchmarked instead of having to recompile when working with new files. * Finally a module deserialization benchmark was added to measure the time it takes to deserialize a precompiled module from disk (inspired by discussion on #3787) While I was at it I also upped some limits to be able to instantiate cfallin's `spidermonkey.wasm`. |
3 years ago |
Alex Crichton |
43b37944ff
|
Tweak parallelism and the instantiation benchmark (#3775)
Currently the "sequential" and "parallel" benchmarks reports somewhat different timings. For sequential it's time-to-instantiate, but for parallel it's time-to-instantiate-10k instances. The parallelism in the parallel benchmark can also theoretically be affected by rayon's work-stealing. For example if rayon doesn't actually do any work stealing at all then this ends up being a sequential test again. Otherwise though it's possible for some threads to finish much earlier as rayon isn't guaranteed to keep threads busy. This commit applies a few updates to the benchmark: * First an `InstancePre<T>` is now used instead of a `Linker<T>` to front-load type-checking and avoid that on each instantiation (and this is generally the fastest path to instantiate right now). * Next the instantiation benchmark is changed to measure one instantiation-per-iteration to measure per-instance instantiation to better compare with sequential numbers. * Finally rayon is removed in favor of manually creating background threads that infinitely do work until we tell them to stop. These background threads are guaranteed to be working for the entire time the benchmark is executing and should theoretically exhibit what the situation that there's N units of work all happening at once. I also applied some minor updates here such as having the parallel instantiation defined conditionally for multiple modules as well as upping the limits of the pooling allocator to handle a large module (rustpython.wasm) that I threw at it. |
3 years ago |
Nick Fitzgerald | e3a423c3e8 |
Remove some uses of `foo.expect(&format!(...))` pattern
This eagerly evaluates the `format!` and produces a `String` with a heap allocation, regardless whether `foo` is `Some`/`Ok` or `None`/`Err`. Using `foo.unwrap_or_else(|| panic!(...))` makes it so that the error message formatting is only evaluated if `foo` is `None`/`Err`. |
3 years ago |
Alex Crichton |
7ce46043dc
|
Add guard pages to the front of linear memories (#2977)
* Add guard pages to the front of linear memories This commit implements a safety feature for Wasmtime to place guard pages before the allocation of all linear memories. Guard pages placed after linear memories are typically present for performance (at least) because it can help elide bounds checks. Guard pages before a linear memory, however, are never strictly needed for performance or features. The intention of a preceding guard page is to help insulate against bugs in Cranelift or other code generators, such as CVE-2021-32629. This commit adds a `Config::guard_before_linear_memory` configuration option, defaulting to `true`, which indicates whether guard pages should be present both before linear memories as well as afterwards. Guard regions continue to be controlled by `{static,dynamic}_memory_guard_size` methods. The implementation here affects both on-demand allocated memories as well as the pooling allocator for memories. For on-demand memories this adjusts the size of the allocation as well as adjusts the calculations for the base pointer of the wasm memory. For the pooling allocator this will place a singular extra guard region at the very start of the allocation for memories. Since linear memories in the pooling allocator are contiguous every memory already had a preceding guard region in memory, it was just the previous memory's guard region afterwards. Only the first memory needed this extra guard. I've attempted to write some tests to help test all this, but this is all somewhat tricky to test because the settings are pretty far away from the actual behavior. I think, though, that the tests added here should help cover various use cases and help us have confidence in tweaking the various `Config` settings beyond their defaults. Note that this also contains a semantic change where `InstanceLimits::memory_reservation_size` has been removed. Instead this field is now inferred from the `static_memory_maximum_size` and guard size settings. This should hopefully remove some duplication in these settings, canonicalizing on the guard-size/static-size settings as the way to control memory sizes and virtual reservations. * Update config docs * Fix a typo * Fix benchmark * Fix wasmtime-runtime tests * Fix some more tests * Try to fix uffd failing test * Review items * Tweak 32-bit defaults Makes the pooling allocator a bit more reasonable by default on 32-bit with these settings. |
3 years ago |
Pat Hickey | 035d541fd9 |
add docs
|
3 years ago |
Pat Hickey | 2a4c51b77d |
switch eager vs lazy instantiation to a criterion bench
|
3 years ago |
Alex Crichton |
7a1b7cdf92
|
Implement RFC 11: Redesigning Wasmtime's APIs (#2897)
Implement Wasmtime's new API as designed by RFC 11. This is quite a large commit which has had lots of discussion externally, so for more information it's best to read the RFC thread and the PR thread. |
3 years ago |
Pat Hickey |
0f5bdc6497
|
only wasi_cap_std_sync and wasi_tokio need to define WasiCtxBuilders (#2917)
* wasmtime-wasi: re-exporting this WasiCtxBuilder was shadowing the right one wasi-common's WasiCtxBuilder is really only useful wasi_cap_std_sync and wasi_tokio to implement their own Builder on top of. This re-export of wasi-common's is 1. not useful and 2. shadow's the re-export of the right one in sync::*. * wasi-common: eliminate WasiCtxBuilder, make the builder methods on WasiCtx instead * delete wasi-common::WasiCtxBuilder altogether just put those methods directly on &mut WasiCtx. As a bonus, the sync and tokio WasiCtxBuilder::build functions are no longer fallible! * bench fixes * more test fixes |
4 years ago |
Peter Huene |
1b8efa7bbd
|
Implement simple benchmarks for instantiation.
This adds benchmarks around module instantiation using criterion. Both the default (i.e. on-demand) and pooling allocators are tested sequentially and in parallel using a thread pool. Instantiation is tested with an empty module, a module with a single page linear memory, a larger linear memory with a data initializer, and a "hello world" Rust WASI program. |
4 years ago |