cranelift

Commit Graph

Author	SHA1	Message	Date
Chris Fallin	6011420557	Pooling allocator: add a reuse-affinity policy. This policy attempts to reuse the same instance slot for subsequent instantiations of the same module. This is particularly useful when using a pooling backend such as memfd that benefits from this reuse: for example, in the memfd case, instantiating the same module into the same slot allows us to avoid several calls to mmap() because the same mappings can be reused. The policy tracks a freelist per "compiled module ID", and when allocating a slot for an instance, tries these two options in order: 1. A slot from the freelist for this module (i.e., last used for another instantiation of this particular module), or 2. A slot that was last used by some other module or never before. The "victim" slot for choice 2 is randomly chosen. The data structures are carefully designed so that all updates are O(1), and there is no retry-loop in any of the random selection. This policy is now the default when the memfd backend is selected via the `memfd-allocator` feature flag.	3 years ago
Chris Fallin	9880eba2a8	Skip memfd tests when on qemu, due to differing madvise semantics.	3 years ago
Chris Fallin	d7b04f5ced	Review comments.	3 years ago
Chris Fallin	0ec45d3ae4	Add additional tests for MemFdSlot.	3 years ago
Chris Fallin	94410a8d4b	Review comments.	3 years ago
Dan Gohman	ffa9fe32aa	Use is-terminal instead of atty. Following up on #3696, use the new is-terminal crate to test for a tty rather than having platform-specific logic in Wasmtime. The is-terminal crate has a platform-independent API which takes a handle. This also updates the tree to cap-std 0.24 etc., to avoid depending on multiple versions of io-lifetimes at once, as enforced by the cargo deny check.	3 years ago
Chris Fallin	84a8368e88	Fix to the optimization: mprotect(NONE) sometimes needed after skipping the initial mmap.	3 years ago
Chris Fallin	01e6bb81fb	Review feedback.	3 years ago
Chris Fallin	0ff8f6ab20	Make build-config magic use memfd by default.	3 years ago
Chris Fallin	ccfa245261	Optimization: only mprotect the new bit of heap, not all of it. (This was not a correctness bug, but is an obvious performance bug...)	3 years ago
Chris Fallin	982df2f2e5	Review feedback.	3 years ago
Chris Fallin	570dee63f3	Use MemFdSlot in the on-demand allocator as well.	3 years ago
Chris Fallin	3702e81d30	Remove ftruncate-trick for heap growth with memfd backend. Testing so far with recent Wasmtime has not been able to show the need for avoiding the process-wide mmap lock in real-world use-cases. As such, the technique of using an anonymous file and ftruncate() to extend it seems unnecessary; instead, memfd can always use anonymous zeroed memory for heap backing where the CoW image is not present, and mprotect() to extend the heap limit by changing page protections.	3 years ago
Chris Fallin	b73ac83c37	Add a pooling allocator mode based on copy-on-write mappings of memfds. As first suggested by Jan on the Zulip here [1], a cheap and effective way to obtain copy-on-write semantics of a "backing image" for a Wasm memory is to mmap a file with `MAP_PRIVATE`. The `memfd` mechanism provided by the Linux kernel allows us to create anonymous, in-memory-only files that we can use for this mapping, so we can construct the image contents on-the-fly then effectively create a CoW overlay. Furthermore, and importantly, `madvise(MADV_DONTNEED, ...)` will discard the CoW overlay, returning the mapping to its original state. By itself this is almost enough for a very fast instantiation-termination loop of the same image over and over, without changing the address space mapping at all (which is expensive). The only missing bit is how to implement heap growth. But here memfds can help us again: if we create another anonymous file and map it where the extended parts of the heap would go, we can take advantage of the fact that a `mmap()` mapping can be larger than the file itself, with accesses beyond the end generating a `SIGBUS`, and the fact that we can cheaply resize the file with `ftruncate`, even after a mapping exists. So we can map the "heap extension" file once with the maximum memory-slot size and grow the memfd itself as `memory.grow` operations occur. The above CoW technique and heap-growth technique together allow us a fastpath of `madvise()` and `ftruncate()` only when we re-instantiate the same module over and over, as long as we can reuse the same slot. This fastpath avoids all whole-process address-space locks in the Linux kernel, which should mean it is highly scalable. It also avoids the cost of copying data on read, as the `uffd` heap backend does when servicing pagefaults; the kernel's own optimized CoW logic (same as used by all file mmaps) is used instead. 
[1] https://bytecodealliance.zulipchat.com/#narrow/stream/206238-general/topic/Copy.20on.20write.20based.20instance.20reuse/near/266657772	3 years ago
Alex Crichton	a25f7bdba5	Don't copy `VMBuiltinFunctionsArray` into each `VMContext` (#3741) * Don't copy `VMBuiltinFunctionsArray` into each `VMContext` This is another PR along the lines of "let's squeeze all possible performance we can out of instantiation". Before this PR we would copy, by value, the contents of `VMBuiltinFunctionsArray` into each `VMContext` allocated. This array of function pointers is modestly-sized but growing over time as we add various intrinsics. Additionally it's the exact same for all `VMContext` allocations. This PR attempts to speed up instantiation slightly by instead storing an indirection to the function array. This means that calling a builtin intrinsic is a tad bit slower since it requires two loads instead of one (one to get the base pointer, another to get the actual address). Otherwise though `VMContext` initialization is now simply setting one pointer instead of doing a `memcpy` from one location to another. With some macro-magic this commit also replaces the previous implementation with one that's more `const`-friendly which also gets us compile-time type-checks of libcalls as well as compile-time verification that all libcalls are defined. Overall, as with #3739, the win is very modest here. Locally I measured a speedup from 1.9us to 1.7us taken to instantiate an empty module with one function. While small at these scales it's still a 10% improvement! * Review comments	3 years ago
Alex Crichton	2f494240f8	Lazily allocate the bump-alloc chunk in the externref table (#3739) This commit updates the allocation of a `VMExternRefActivationsTable` structure to perform zero malloc memory allocations. Previously it would allocate a page-size of `chunk` plus some space in hash sets for future insertions. The main trick implemented here is that after the first gc during the slow path the fast chunk allocation is allocated and configured. The motivation for this PR is that given our recent work to further refine and optimize the instantiation process this allocation started to show up in a nontrivial fashion. Most modules today never touch this table anyway as almost none of them use reference types, so the time spent allocating and deallocating the table per-store was largely wasted time. Concretely on a microbenchmark this PR speeds up instantiation of a module with one function by 30%, decreasing the instantiation cost from 1.8us to 1.2us. Overall a pretty minor win but when the instantiation times we're measuring start being in the single-digit microseconds this win ends up getting magnified!	3 years ago
Nick Fitzgerald	19f8d94959	Expand on activations table invariants comment in `libcalls.rs`	3 years ago
Nick Fitzgerald	cbc6f6071f	Fix a debug assertion in `externref` garbage collections When we GC, we assert the invariant that all `externref`s we find on the stack have a corresponding entry in the `VMExternRefActivationsTable`. However, we also might be in code that is in the process of fixing up this invariant and adding an entry to the table, but the table's bump chunk is full, and so we do a GC and then add the entry into the table. This will ultimately maintain our desired invariant, but there is a moment in time when we are doing the GC where the invariant is relaxed which is okay because the reference will be in the table before we return to Wasm or do anything else. This isn't a possible UAF, in other words. To make it so that the assertion won't trip, we explicitly insert the reference into the table before we GC, so that the invariant is not relaxed across a possibly-GCing operation (even though it would be safe in this particular case).	3 years ago
Dan Gohman	881c19473d	Use `ptr::cast` instead of `as` casts in several places. (#3507) `ptr::cast` has the advantage of being unable to silently cast `*const T` to `*mut T`. This turned up several places that were performing such casts, which this PR also fixes.	3 years ago
Chris Fallin	8a55b5c563	Add epoch-based interruption for cooperative async timeslicing. This PR introduces a new way of performing cooperative timeslicing that is intended to replace the "fuel" mechanism. The tradeoff is that this mechanism interrupts with less precision: not at deterministic points where fuel runs out, but rather when the Engine enters a new epoch. The generated code instrumentation is substantially faster, however, because it does not need to do as much work as when tracking fuel; it only loads the global "epoch counter" and does a compare-and-branch at backedges and function prologues. This change has been measured as ~twice as fast as fuel-based timeslicing for some workloads, especially control-flow-intensive workloads such as the SpiderMonkey JS interpreter on Wasm/WASI. The intended interface is that the embedder of the `Engine` performs an `engine.increment_epoch()` call periodically, e.g. once per millisecond. An async invocation of a Wasm guest on a `Store` can specify a number of epoch-ticks that are allowed before an async yield back to the executor's event loop. (The initial amount and automatic "refills" are configured on the `Store`, just as for fuel.) This call does only signal-safe work (it increments an `AtomicU64`) so could be invoked from a periodic signal, or from a thread that wakes up once per period.	3 years ago
Mrmaxmeier	2afd6900f4	runtime: expose DefaultMemoryCreator (#3670)	3 years ago
Piotr Sikora	642102e699	Fix build with clang on s390x. (#3673) Signed-off-by: Piotr Sikora <piotrsikora@google.com>	3 years ago
wasmtime-publish	8043c1f919	Release Wasmtime 0.33.0 (#3648) * Bump Wasmtime to 0.33.0 [automatically-tag-and-release-this-commit] * Update relnotes for 0.33.0 * Wordsmithing relnotes Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	3 years ago
Dan Gohman	7b346b1f12	Update to cap-std 0.22.0. (#3611) * Update to cap-std 0.22.0. The main change relevant to Wasmtime here is that this includes the rustix fix for compilation errors on Rust nightly with the `asm!` macro. * Add itoa to deny.toml. * Update the doc and fuzz builds to the latest Rust nightly. * Update to libc 0.2.112 to pick up the `POLLRDHUP` fix. * Update to cargo-fuzz 0.11, for compatibility with Rust nightly. This appears to be the fix for rust-fuzz/cargo-fuzz#277.	3 years ago
wasmtime-publish	c1c4c59670	Release Wasmtime 0.32.0 (#3589) * Bump Wasmtime to 0.32.0 [automatically-tag-and-release-this-commit] * Update release notes for 0.32.0 Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	3 years ago
Dan Gohman	ea0cb971fb	Update to rustix 0.26.2. (#3521) This pulls in a fix for Android, where Android's seccomp policy on older versions is to make `openat2` irrecoverably crash the process, so we have to do a version check up front rather than relying on `ENOSYS` to determine if `openat2` is supported. And it pulls in the fix for the link errors when multiple versions of rsix/rustix are linked in. And it has updates for two crate renamings: rsix has been renamed to rustix, and unsafe-io has been renamed to io-extras.	3 years ago
Peter Huene	58aab85680	Add the `pooling-allocator` feature. This commit adds the `pooling-allocator` feature to both the `wasmtime` and `wasmtime-runtime` crates. The feature controls whether or not the pooling allocator implementation is built into the runtime and exposed as a supported instance allocation strategy in the wasmtime API. The feature is on by default for the `wasmtime` crate. Closes #3513.	3 years ago
Alex Crichton	6bcee7f5f7	Add a configuration option to force "static" memories (#3503) * Add a configuration option to force "static" memories In poking around at some things earlier today I realized that one configuration option for memories we haven't exposed from embeddings like the CLI is to forcibly limit the size of memory growth and force using a static memory style. This means that the CLI, for example, can't limit memory growth by default and memories are only limited in size by what the OS can give and the wasm's own memory type. This configuration option means that the CLI can artificially limit the size of wasm linear memories. Additionally another motivation for this is for testing out various codegen ramifications of static/dynamic memories. This is the only way to force a static memory, by default, for wasm64 memories with no maximum size listed for example. * Review feedback	3 years ago
wasmtime-publish	c1a6a0523d	Release Wasmtime 0.31.0 (#3489) * Bump Wasmtime to 0.31.0 [automatically-tag-and-release-this-commit] * Update 0.31.0 release notes Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com> Co-authored-by: Alex Crichton <alex@alexcrichton.com>	3 years ago
Alex Crichton	490d49a768	Adjust dependency directives between crates (#3420) * Adjust dependency directives between crates This commit is a preparation for the release process for Wasmtime. The specific changes here are to delineate which crates are "public", and all version requirements on non-public crates will now be done with `=A.B.C` version requirements instead of today's `A.B.C` version requirements. The purpose for doing this is to assist with patch releases that might happen in the future. Patch releases of wasmtime are already required to not break the APIs of "public" crates, but no such guarantee is given about "internal" crates. This means that a patch release runs the risk, for example, of breaking an internal API. In doing so though we would also need to release a new major version of the internal crate, but we wouldn't have a great hole in the number scheme of major versions to do so. By using `=A.B.C` requirements for internal crates it means we can safely ignore strict semver-compatibility between releases of internal crates for patch releases, since the only consumers of the crate will be the corresponding patch release of the `wasmtime` crate itself (or other public crates). The `publish.rs` script has been updated with a check to verify that dependencies on internal crates are all specified with an `=` dependency, and dependencies on all public crates are without a `=` dependency. This will hopefully make it so we don't have to worry about what to use where, we just let CI tell us what to do. Using this modification all version dependency declarations have been updated. Note that some crates were adjusted to simply remove their `version` requirement in cases such as the crate wasn't published anyway (`publish = false` was specified) or it's in the `dev-dependencies` section which doesn't need version specifiers for path dependencies. 
* Switch to normal semver deps for cranelift dependencies These crates will now all be considered "public" where in patch releases they will be guaranteed to not have breaking changes.	3 years ago
Alex Crichton	2f2c5231b4	Add Alex's solution for null handling in TlsRestore	3 years ago
Pat Hickey	b00d811e83	code review	3 years ago
Pat Hickey	52542b6c01	mock enough of the store to pass the uffd test	3 years ago
Pat Hickey	efef0769fe	make uffd test compile, but not pass	3 years ago
Pat Hickey	0370d5c1a2	code review suggestion	3 years ago
Pat Hickey	a1301f8dae	add table_grow_failed	3 years ago
Pat Hickey	5aef8f47c8	catch panic in libcalls for memory and table grow	3 years ago
Pat Hickey	6c70b81ff5	review feedback	3 years ago
Pat Hickey	a5007f318f	runtime: use anyhow::Error instead of Box<dyn std::error::Error...>	3 years ago
Pat Hickey	67a6c27e22	pooling needs the store earlier	3 years ago
Pat Hickey	147c8f8ed7	rename	3 years ago
Pat Hickey	18a355e092	give synchronous ResourceLimiter an async alternative	3 years ago
Steve	807619a874	as requested: cargo fmt	3 years ago
Pat Hickey	8554d69e4b	update userfaultfd to 0.4.1 (#3442) which updates nix to 0.23.0, getting rid of the benign RUSTSEC-2021-0119 in our dep tree	3 years ago
Steve	92a10d1ace	Added resolve_vmctx_memory function to enable debuggers to resolve sandbox pointers - required because sandbox 'this' pointer cannot be resolved by lldb any other way as lldb expects "this" and "self" to be standard pointers, not sandbox handles.	3 years ago
Alex Crichton	5b3b459ad5	Fix some nightly dead code warnings (#3404) * Fix some nightly dead code warnings Looks like the "struct field not used" lint has improved on nightly and caught a few more instances of fields that were never actually read. * Fix windows	3 years ago
Dan Gohman	e5ebef1b94	Use `empty()` instead of `NONE` with rsix flags types. `empty()` is provided by all `bitflags` types, so it's more idiomatic than having `NONE` values.	3 years ago
Alex Crichton	bfdbd10a13	Add `_unchecked` variants of `Func` APIs for the C API (#3350) Add `_unchecked` variants of `Func` APIs for the C API This commit is what is hopefully going to be my last installment within the saga of optimizing function calls in/out of WebAssembly modules in the C API. This is yet another alternative approach to #3345 (sorry) but also contains everything necessary to make the C API fast. As in #3345 the general idea is just moving checks out of the call path in the same style of `TypedFunc`. This new strategy takes inspiration from previously learned attempts and effectively "just" exposes how we previously passed `*mut u128` through trampolines for arguments/results. This storage format is formalized through a new `ValRaw` union that is exposed from the `wasmtime` crate. By doing this it made it relatively easy to expose two new APIs:
* `Func::new_unchecked`
* `Func::call_unchecked`
These are the same as their checked equivalents except that they're `unsafe` and they work with `*mut ValRaw` rather than safe slices of `Val`. Working with these eschews type checks and such and requires callers/embedders to do the right thing. These two new functions are then exposed via the C API with new functions, enabling C to have a fast-path of calling/defining functions. This fast path is akin to `Func::wrap` in Rust, although that API can't be built in C due to C not having generics in the same way that Rust has. For some benchmarks, the benchmarks here are:
* `nop` - Call a wasm function from the host that does nothing and returns nothing.
* `i64` - Call a wasm function from the host, the wasm function calls a host function, and the host function returns an `i64` all the way out to the original caller.
* `many` - Call a wasm function from the host, the wasm calls host function with 5 `i32` parameters, and then an `i64` result is returned back to the original host
* `i64` host - just the overhead of the wasm calling the host, so the wasm calls the host function in a loop.
* `many` host - same as `i64` host, but calling the `many` host function.
All numbers in this table are in nanoseconds, and this is just one measurement as well so there's bound to be some variation in the precise numbers here.
| Name      | Rust | C (before) | C (after) |
|-----------|------|------------|-----------|
| nop       | 19   | 112        | 25        |
| i64       | 22   | 207        | 32        |
| many      | 27   | 189        | 34        |
| i64 host  | 2    | 38         | 5         |
| many host | 7    | 75         | 8         |
The main conclusion here is that the C API is significantly faster than before when using the `_unchecked` variants of APIs. The Rust implementation is still the ceiling (or floor I guess?) for performance. The main reason that C is slower than Rust is that a little bit more has to travel through memory where on the Rust side of things we can monomorphize and inline a bit more to get rid of that. Overall though the costs are way way down from where they were originally and I don't plan on doing a whole lot more myself at this time. There's various things we theoretically could do I've considered but implementation-wise I think they'll be much more weighty. Tweak `wasmtime_externref_t` API comments	3 years ago
Advance Software	a8467d0824	Exports symbols to be shared with external GDB/JIT debugging interfac… (#3373) * Exports symbols to be shared with external GDB/JIT debugging interface tools. Windows O/S specific requirement. * Moved comments into platform specific compiler directive sections.	3 years ago
Steve	20f54bc252	Moved comments into platform specific compiler directive sections.	3 years ago
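
The reuse-affinity policy described in commit 6011420557 can be sketched roughly as follows. This is a minimal illustration and not the Wasmtime implementation: `AffinityPool`, `ModuleId`, `alloc`, `release`, and the xorshift generator are all hypothetical names chosen for this example. It assumes one global free list plus one free list per module, with stored indices so that every removal is an O(1) `swap_remove` and the random victim is picked with a single index draw (no retry loop).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the "compiled module ID".
type ModuleId = u64;

struct AffinityPool {
    /// Every free slot, in arbitrary order.
    free: Vec<usize>,
    /// For each slot: its index in `free`, or `None` while allocated.
    free_idx: Vec<Option<usize>>,
    /// Free slots grouped by the module that used them last.
    per_module: HashMap<ModuleId, Vec<usize>>,
    /// For each free slot with affinity: (module, index in that module's list).
    affinity: Vec<Option<(ModuleId, usize)>>,
    /// State of a tiny xorshift generator (an assumption; any RNG would do).
    rng: u64,
}

impl AffinityPool {
    fn new(slots: usize) -> Self {
        AffinityPool {
            free: (0..slots).collect(),
            free_idx: (0..slots).map(Some).collect(),
            per_module: HashMap::new(),
            affinity: vec![None; slots],
            rng: 0x9E37_79B9_7F4A_7C15,
        }
    }

    fn next_random(&mut self) -> u64 {
        self.rng ^= self.rng << 13;
        self.rng ^= self.rng >> 7;
        self.rng ^= self.rng << 17;
        self.rng
    }

    /// Remove `slot` from the global free list in O(1).
    fn unlink_free(&mut self, slot: usize) {
        let i = self.free_idx[slot].take().expect("slot must be free");
        self.free.swap_remove(i);
        if let Some(&moved) = self.free.get(i) {
            self.free_idx[moved] = Some(i);
        }
    }

    /// Remove `slot` from its per-module free list (if any) in O(1).
    fn unlink_affinity(&mut self, slot: usize) {
        if let Some((module, i)) = self.affinity[slot].take() {
            let list = self.per_module.get_mut(&module).expect("list exists");
            list.swap_remove(i);
            if let Some(&moved) = list.get(i) {
                self.affinity[moved] = Some((module, i));
            }
        }
    }

    /// Prefer a slot last used by `module`; otherwise evict a random free slot.
    fn alloc(&mut self, module: ModuleId) -> Option<usize> {
        if self.free.is_empty() {
            return None;
        }
        let preferred = self.per_module.get(&module).and_then(|l| l.last().copied());
        let slot = match preferred {
            Some(s) => s,
            None => {
                let i = (self.next_random() % self.free.len() as u64) as usize;
                self.free[i]
            }
        };
        self.unlink_free(slot);
        self.unlink_affinity(slot);
        Some(slot)
    }

    /// Return `slot` to the pool, remembering which module used it last.
    fn release(&mut self, slot: usize, module: ModuleId) {
        assert!(self.free_idx[slot].is_none(), "slot is already free");
        self.free_idx[slot] = Some(self.free.len());
        self.free.push(slot);
        let list = self.per_module.entry(module).or_default();
        self.affinity[slot] = Some((module, list.len()));
        list.push(slot);
    }
}
```

Releasing a slot for a module and then allocating again for the same module returns the same slot, which is the case the commit message says lets the memfd backend reuse its existing mappings.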
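
Commit a25f7bdba5 replaces a by-value copy of the builtin function array with a single pointer to a shared array. The toy sketch below illustrates that trade-off under stated assumptions: `Builtin`, `BUILTINS`, the two stub functions, `CtxByValue`, and `CtxByPointer` are hypothetical and only stand in for `VMBuiltinFunctionsArray` and `VMContext`.

```rust
/// Hypothetical signature shared by the toy "builtins".
type Builtin = fn(u32) -> u32;

fn memory_grow_stub(pages: u32) -> u32 {
    pages + 1
}

fn table_grow_stub(elems: u32) -> u32 {
    elems * 2
}

/// One shared, immutable array of function pointers for the whole process.
static BUILTINS: [Builtin; 2] = [memory_grow_stub, table_grow_stub];

/// "Before": every context carries its own copy of the array.
#[allow(dead_code)]
struct CtxByValue {
    builtins: [Builtin; 2],
}

/// "After": every context stores one pointer to the shared array.
struct CtxByPointer {
    builtins: &'static [Builtin; 2],
}

impl CtxByPointer {
    /// Initialization sets a single pointer instead of copying the array.
    fn new() -> Self {
        CtxByPointer { builtins: &BUILTINS }
    }

    /// Calling a builtin now takes two loads: the array base, then the entry.
    fn call(&self, index: usize, arg: u32) -> u32 {
        (self.builtins[index])(arg)
    }
}
```

The pointer-based context stays one pointer wide no matter how many builtins are added, whereas the by-value context grows with the array.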
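
The point of commit 881c19473d, that `ptr::cast` changes the pointee type without changing mutability, can be seen in a small example. The function name `first_byte` is made up for illustration.

```rust
/// Read the first in-memory byte of a `u32` through a raw pointer.
fn first_byte(value: &u32) -> u8 {
    let p: *const u32 = value;
    // `cast` changes the pointee type but keeps the pointer `*const`.
    let bytes: *const u8 = p.cast::<u8>();
    // By contrast, `p as *mut u8` would compile and silently drop constness,
    // while `let m: *mut u8 = p.cast::<u8>();` is rejected by the compiler.
    // SAFETY: `bytes` points into a live, initialized `u32`.
    unsafe { *bytes }
}
```

For example, `first_byte(&u32::from_ne_bytes([7, 1, 2, 3]))` reads the byte that was placed first in memory.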
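
The epoch mechanism from commit 8a55b5c563 can be modeled in plain Rust as an atomic counter that guest code compares against a deadline at each loop backedge. This is a simplified model, not Wasmtime's generated code: `ToyEngine`, `ToyGuest`, `Outcome`, and `ticks_per_slice` are hypothetical; only the `increment_epoch` name and the `AtomicU64` counter come from the commit message.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Toy stand-in for an engine: it only holds the global epoch counter.
struct ToyEngine {
    epoch: AtomicU64,
}

impl ToyEngine {
    fn new() -> Self {
        ToyEngine { epoch: AtomicU64::new(0) }
    }

    /// A single atomic increment, so it is cheap to call periodically.
    fn increment_epoch(&self) {
        self.epoch.fetch_add(1, Ordering::Relaxed);
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Finished(u64),
    Yielded,
}

/// Toy guest: sums 0..limit and checks the epoch at every loop backedge.
struct ToyGuest {
    next: u64,
    limit: u64,
    sum: u64,
    deadline: u64,
    ticks_per_slice: u64,
}

impl ToyGuest {
    fn new(engine: &ToyEngine, limit: u64, ticks_per_slice: u64) -> Self {
        let now = engine.epoch.load(Ordering::Relaxed);
        ToyGuest { next: 0, limit, sum: 0, deadline: now + ticks_per_slice, ticks_per_slice }
    }

    /// Run until the work is done or the epoch reaches the deadline.
    fn run(&mut self, engine: &ToyEngine) -> Outcome {
        while self.next < self.limit {
            // The "instrumentation": one load plus a compare-and-branch.
            let now = engine.epoch.load(Ordering::Relaxed);
            if now >= self.deadline {
                // Refill the budget, then yield back to the caller.
                self.deadline = now + self.ticks_per_slice;
                return Outcome::Yielded;
            }
            self.sum += self.next;
            self.next += 1;
        }
        Outcome::Finished(self.sum)
    }
}
```

In this model the guest yields only after the embedder has advanced the epoch past its deadline, which matches the commit's description of less precise but cheaper interruption compared with fuel.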


448 Commits (0f9ac11bfbb8c6c876813781af484e07f998a25c)