* Remove `once-cell` dependency.
* Remove function address `BTreeMap` from `CompiledModule` in favor of binary
searching finished functions directly.
* Use `with_capacity` when populating `CompiledModule` finished functions and
trampolines.
This commit refactors the store frame information to eliminate the copying of
data out from `CompiledModule`.
It also moves the population of a `BTreeMap` out of the frame information and
into `CompiledModule` where it is only ever calculated once rather than at
every new module instantiation into a `Store`. The map is also lazy-initialized
so the cost of populating the map is incurred only when a trap occurs.
This should help improve instantiation time of modules with a large number of
functions and functions with lots of instructions.
* Fully support multiple returns in Wasmtime
For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.
The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.
This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.
These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!
Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.
Closes#1178
* Tweak some names and add docs
* "fix" lightbeam compile
* Fix TODO with dummy environ
* Unwind info is a property of the target, not the ABI
* Remove lightbeam unused imports
* Attempt to fix arm64
* Document new ABIs aren't stable
* Fix filetests to use the right target
* Don't always do 64-bit stores with cranelift
This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.
* At least get tests passing on the old backend
* Fix a typo
* Add some filetests with mixed abi calls
* Get `multi` example working
* Fix doctests on old x86 backend
* Add a mixture of wasmtime/system_v tests
This commit hides the existing WebAssembly feature CLI options (e.g.
`--enable-simd`) and adds a `--wasm-features` flag that enables multiple
(or all) WebAssembly features.
Features can be disabled by prefixing the value with `-`, e.g.
`--wasm-features=-simd`.
async methods used by wiggle currently need to Not have the Send
constraint, so rather than make all use sites pass the argument
to the re-exported async_trait macro, define a new macro that
applies the argument.
* Combine stack-based cleanups for faster wasm calls
This commit is an extension of #2757 where the goal is to optimize entry
into WebAssembly. Currently wasmtime has two stack-based cleanups when
entering wasm, one for the externref activation table and another for
stack limits getting reset. This commit fuses these two cleanups
together into one and moves some code around which enables less captures
for fewer closures and such to speed up calls in to wasm a bit more.
Overall this drops the execution time from 88ns to 80ns locally for me.
This also updates the atomic orderings when updating the stack limit
from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting
point the usage here should be safe to use `Relaxed` since we're not
using the atomics to actually protect any memory, it's simply receiving
signals from other threads.
* Determine whether a pc is wasm via a global map
The macOS implementation of traps recently changed to using mach ports
for handlers instead of signal handlers. This means that a previously
relied upon invariant, each thread fixes its own trap, was broken. The
macOS implementation worked around this by maintaining a global map from
thread id to thread local information, however, to solve the problem.
This global map is quite slow though. It involves taking a lock and
updating a hash map on all calls into WebAssembly. In my local testing
this accounts for >70% of the overhead of calling into WebAssembly on
macOS. Naturally it'd be great to remove this!
This commit fixes this issue and removes the global lock/map that is
updated on all calls into WebAssembly. The fix is to maintain a global
map of wasm modules and their trap addresses in the `wasmtime` crate.
Doing so is relatively simple since we're already tracking this
information at the `Store` level.
Once we've got a global map then the macOS implementation can use this
from a foreign thread and everything works out.
Locally this brings the overhead, on macOS specifically, of calling into
wasm from 80ns to ~20ns.
* Fix compiles
* Review comments
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).
Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.
The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.
This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.
Closes#2708.
* Switch macOS to using mach ports for trap handling
This commit moves macOS to using mach ports instead of signals for
handling traps. The motivation for this is listed in #2456, namely that
once mach ports are used in a process that means traditional UNIX signal
handlers won't get used. This means that if Wasmtime is integrated with
Breakpad, for example, then Wasmtime's trap handler never fires and
traps don't work.
The `traphandlers` module is refactored as part of this commit to split
the platform-specific bits into their own files (it was growing quite a
lot for one inline `cfg_if!`). The `unix.rs` and `windows.rs` files
remain the same as they were before with a few minor tweaks for some
refactored interfaces. The `macos.rs` file is brand new and lifts almost
its entire implementation from SpiderMonkey, adapted for Wasmtime
though.
The main gotcha with mach ports is that a separate thread is what
services the exception. Some unsafe magic allows this separate thread to
read non-`Send` and temporary state from other threads, but is hoped to
be safe in this context. The unfortunate downside is that calling wasm
on macOS now involves taking a global lock and modifying a global hash
map twice-per-call. I'm not entirely sure how to get out of this cost
for now, but hopefully for any embeddings on macOS it's not the end of
the world.
Closes#2456
* Add a sketch of arm64 apple support
* store: maintain CallThreadState mapping when switching fibers
* cranelift/aarch64: generate unwind directives to disable pointer auth
Aarch64 post ARMv8.3 has a feature called pointer authentication,
designed to fight ROP/JOP attacks: some pointers may be signed using new
instructions, adding payloads to the high (previously unused) bits of
the pointers. More on this here: https://lwn.net/Articles/718888/
Unwinders on aarch64 need to know if some pointers contained on the call
frame contain an authentication code or not, to be able to properly
authenticate them or use them directly. Since native code may have
enabled it by default (as is the case on the Mac M1), and the default is
that this configuration value is inherited, we need to explicitly
disable it, for the only kind of supported pointers (return addresses).
To do so, we set the value of a non-existing dwarf pseudo register (34)
to 0, as documented in
https://github.com/ARM-software/abi-aa/blob/master/aadwarf64/aadwarf64.rst#note-8.
This is done at the function granularity, in the spirit of Cranelift
compilation model. Alternatively, a single directive could be generated
in the CIE, generating less information per module.
* Make exception handling work on Mac aarch64 too
* fibers: use a breakpoint instruction after the final call in wasmtime_fiber_start
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Add `anyhow` dependency to `wasmtime-runtime`.
* Revert `get_data` back to `fn`.
* Remove `DataInitializer` and box the data in `Module` translation instead.
* Improve comments on `MemoryInitialization`.
* Remove `MemoryInitialization::OutOfBounds` in favor of proper bulk memory
semantics.
* Use segmented memory initialization except for when the uffd feature is
enabled on Linux.
* Validate modules with the allocator after translation.
* Updated various functions in the runtime to return `anyhow::Result`.
* Use a slice when copying pages instead of `ptr::copy_nonoverlapping`.
* Remove unnecessary casts in `OnDemandAllocator::deallocate`.
* Better document the `uffd` feature.
* Use WebAssembly page-sized pages in the paged initialization.
* Remove the stack pool from the uffd handler and simply protect just the guard
pages.
This commit implements copying paged initialization data upon a fault of a
linear memory page.
If the initialization data is "paged", then the appropriate pages are copied
into the Wasm page (or zeroed if the page is not present in the
initialization data).
If the initialization data is not "paged", the Wasm page is zeroed so that
module instantiation can initialize the pages.
This commit implements the pooling instance allocator.
The allocation strategy can be set with `Config::with_allocation_strategy`.
The pooling strategy uses the pooling instance allocator to preallocate a
contiguous region of memory for instantiating modules that adhere to various
limits.
The intention of the pooling instance allocator is to reserve as much of the
host address space needed for instantiating modules ahead of time and to reuse
committed memory pages wherever possible.
* Update wasm-tools crates
* Update Wasm SIMD spec tests
* Invert 'experimental_x64_should_panic' logic
By doing this, it is easier to see which spec tests currently panic. The new tests correspond to recently-added instructions.
* Fix: ignore new spec tests for all backends
* Implement support for `async` functions in Wasmtime
This is an implementation of [RFC 2] in Wasmtime which is to support
`async`-defined host functions. At a high level support is added by
executing WebAssembly code that might invoke an asynchronous host
function on a separate native stack. When the host function's future is
not ready we switch back to the main native stack to continue execution.
There's a whole bunch of details in this commit, and it's a bit much to
go over them all here in this commit message. The most important changes
here are:
* A new `wasmtime-fiber` crate has been written to manage the low-level
details of stack-switching. Unixes use `mmap` to allocate a stack and
Windows uses the native fibers implementation. We'll surely want to
refactor this to move stack allocation elsewhere in the future. Fibers
are intended to be relatively general with a lot of type paremters to
fling values back and forth across suspension points. The whole crate
is a giant wad of `unsafe` unfortunately and involves handwritten
assembly with custom dwarf CFI directives to boot. Definitely deserves
a close eye in review!
* The `Store` type has two new methods -- `block_on` and `on_fiber`
which bridge between the async and non-async worlds. Lots of unsafe
fiddly bits here as we're trying to communicate context pointers
between disparate portions of the code. Extra eyes and care in review
is greatly appreciated.
* The APIs for binding `async` functions are unfortunately pretty ugly
in `Func`. This is mostly due to language limitations and compiler
bugs (I believe) in Rust. Instead of `Func::wrap` we have a
`Func::wrapN_async` family of methods, and we've also got a whole
bunch of `Func::getN_async` methods now too. It may be worth
rethinking the API of `Func` to try to make the documentation page
actually grok'able.
This isn't super heavily tested but the various test should suffice for
engaging hopefully nearly all the infrastructure in one form or another.
This is just the start though!
[RFC 2]: https://github.com/bytecodealliance/rfcs/pull/2
* Add wasmtime-fiber to publish script
* Save vector/float registers on ARM too.
* Fix a typo
* Update lock file
* Implement periodically yielding with fuel consumption
This commit implements APIs on `Store` to periodically yield execution
of futures through the consumption of fuel. When fuel runs out a
future's execution is yielded back to the caller, and then upon
resumption fuel is re-injected. The goal of this is to allow cooperative
multi-tasking with futures.
* Fix compile without async
* Save/restore the frame pointer in fiber switching
Turns out this is another caller-saved register!
* Simplify x86_64 fiber asm
Take a leaf out of aarch64's playbook and don't have extra memory to
load/store these arguments, instead leverage how `wasmtime_fiber_switch`
already loads a bunch of data into registers which we can then
immediately start using on a fiber's start without any extra memory
accesses.
* Add x86 support to wasmtime-fiber
* Add ARM32 support to fiber crate
* Make fiber build file probing more flexible
* Use CreateFiberEx on Windows
* Remove a stray no-longer-used trait declaration
* Don't reach into `Caller` internals
* Tweak async fuel to eventually run out.
With fuel it's probably best to not provide any way to inject infinite
fuel.
* Fix some typos
* Cleanup asm a bit
* Use a shared header file to deduplicate some directives
* Guarantee hidden visibility for functions
* Enable gc-sections on macOS x86_64
* Add `.type` annotations for ARM
* Update lock file
* Fix compile error
* Review comments
This adds the ability to add feature flags (e.g. `--features wasi-nn`) when compiling `wasmtime-bench-api` to allow benchmarking Wasmtime with WASI proposals included. Note that due to https://github.com/rust-lang/cargo/issues/5364, these features are only available:
- in the `crates/bench-api` directory, e.g. `pushd crates/bench-api; cargo build --features wasi-crypto`
- or from the top-level project directory using `-Zpackage-features`, e.g. `OPENVINO_INSTALL_DIR=/opt/intel/openvino cargo +nightly build -p wasmtime-bench-api -Zpackage-features --features wasi-nn`
This commit updates to the 0.9 version of the witx crate implemented in
WebAssembly/wasi#395. This new version drastically changes code
generation and how we interface with the crate. The intention is to
abstract the code generation aspects and allow code generators to
implement much more low-level instructions to enable more flexible APIs
in the future. Additionally a bunch of `*.witx` files were updated in
the WASI repository.
It's worth pointing out, however, that `wasi-common` does not change as
a result of this change. The shape of the APIs that we need to implement
are effectively the same and the only difference is that the shim
functions generated by wiggle are a bit different.