I had no idea this was still in the repository, much less building!
There are much different ways to use wasmtime in Rust nowadays, such as
the `wasmtime` crate!
Previously the inclusion of the `criterion` crate had brought in a
transitive dependency to `cast`, which used old versions of several
libraries. Now that https://github.com/japaric/cast.rs/pull/26 is merged
and a new version published, we can update `cast` and remove the
cargo-deny rules for the duplicated, older versions.
This adds benchmarks around module instantiation using criterion.
Both the default (i.e. on-demand) and pooling allocators are tested
sequentially and in parallel using a thread pool.
Instantiation is tested with an empty module, a module with a single page
linear memory, a larger linear memory with a data initializer, and a "hello
world" Rust WASI program.
This change adds a criterion-enabled benchmark, x64-evex-encoding, to
compare the performance of the builder pattern used to encode EVEX
instructions in the new x64 backend against the function pattern
used to encode EVEX instructions in the legacy x86 backend. At face
value, the results imply that the builder pattern is faster, but no
efforts were made to analyze and optimize these approaches further.
* Upgrade to the latest versions of gimli, addr2line, object
And adapt to API changes. New gimli supports wasm dwarf, resulting in
some simplifications in the debug crate.
* upgrade gimli usage in linux-specific profiling too
* Add "continue" statement after interpreting a wasm local dwarf opcode
* wasi-nn: turn it on by default
This change makes the wasi-nn Cargo feature a default feature. Previously, a wasi-nn user would have to build a separate Wasmtime binary (e.g. `cargo build --features wasi-nn ...`) to use wasi-nn and the resulting binary would require OpenVINO shared libraries to be present in the environment in order to run (otherwise it would fail immediately with linking errors). With recent changes to the `openvino` crate, the wasi-nn implementation can defer the loading of the OpenVINO shared libraries until runtime (i.e., when the user Wasm program calls `wasi_ephemeral_nn::load`) and display a user-level error if anything goes wrong (e.g., the OpenVINO libraries are not present on the system). This runtime-linking addition allows the wasi-nn feature to be turned on by default and shipped with upcoming releases of Wasmtime. This change should be transparent for users who do not use wasi-nn: the `openvino` crate is small and the newly-available wasi-nn imports only affect programs in which they are used.
For those interested in reviewing the runtime linking approach added to the `openvino` crate, see https://github.com/intel/openvino-rs/pull/19.
* wasi-nn spec path: don't use canonicalize
* Allow dependencies using the ISC license
The ISC license should be [just as permissive](https://choosealicense.com/licenses/isc) as MIT, e.g., with no additional limitations.
* Add a `--wasi-modules` flag
This flag controls which WASI modules are made available to the Wasm program. This initial commit enables `wasi-common` by default (equivalent to `--wasi-modules=all`) and allows `wasi-nn` and `wasi-crypto` to be added in either individually (e.g., `--wasi-modules=wasi-nn`) or as a group (e.g., `--wasi-modules=all-experimental`).
* wasi-crypto: fix unused dependency
Co-authored-by: Pat Hickey <pat@moreproductive.org>
This removes an existing dependency on the byteorder crate in favor of
using std equivalents directly.
While not an issue for wasmtime per se, cranelift is now part of the
critical path of building and testing Rust, and minimizing dependencies,
even small ones, can help reduce the time and bandwidth required.
* Remove `once-cell` dependency.
* Remove function address `BTreeMap` from `CompiledModule` in favor of binary
searching finished functions directly.
* Use `with_capacity` when populating `CompiledModule` finished functions and
trampolines.
This commit refactors the store frame information to eliminate the copying of
data out from `CompiledModule`.
It also moves the population of a `BTreeMap` out of the frame information and
into `CompiledModule` where it is only ever calculated once rather than at
every new module instantiation into a `Store`. The map is also lazy-initialized
so the cost of populating the map is incurred only when a trap occurs.
This should help improve instantiation time of modules with a large number of
functions and functions with lots of instructions.
* Fully support multiple returns in Wasmtime
For quite some time now Wasmtime has "supported" multiple return values,
but only in the mose bare bones ways. Up until recently you couldn't get
a typed version of functions with multiple return values, and never have
you been able to use `Func::wrap` with functions that return multiple
values. Even recently where `Func::typed` can call functions that return
multiple values it uses a double-indirection by calling a trampoline
which calls the real function.
The underlying reason for this lack of support is that cranelift's ABI
for returning multiple values is not possible to write in Rust. For
example if a wasm function returns two `i32` values there is no Rust (or
C!) function you can write to correspond to that. This commit, however
fixes that.
This commit adds two new ABIs to Cranelift: `WasmtimeSystemV` and
`WasmtimeFastcall`. The intention is that these Wasmtime-specific ABIs
match their corresponding ABI (e.g. `SystemV` or `WindowsFastcall`) for
everything *except* how multiple values are returned. For multiple
return values we simply define our own version of the ABI which Wasmtime
implements, which is that for N return values the first is returned as
if the function only returned that and the latter N-1 return values are
returned via an out-ptr that's the last parameter to the function.
These custom ABIs provides the ability for Wasmtime to bind these in
Rust meaning that `Func::wrap` can now wrap functions that return
multiple values and `Func::typed` no longer uses trampolines when
calling functions that return multiple values. Although there's lots of
internal changes there's no actual changes in the API surface area of
Wasmtime, just a few more impls of more public traits which means that
more types are supported in more places!
Another change made with this PR is a consolidation of how the ABI of
each function in a wasm module is selected. The native `SystemV` ABI,
for example, is more efficient at returning multiple values than the
wasmtime version of the ABI (since more things are in more registers).
To continue to take advantage of this Wasmtime will now classify some
functions in a wasm module with the "fast" ABI. Only functions that are
not reachable externally from the module are classified with the fast
ABI (e.g. those not exported, used in tables, or used with `ref.func`).
This should enable purely internal functions of modules to have a faster
calling convention than those which might be exposed to Wasmtime itself.
Closes#1178
* Tweak some names and add docs
* "fix" lightbeam compile
* Fix TODO with dummy environ
* Unwind info is a property of the target, not the ABI
* Remove lightbeam unused imports
* Attempt to fix arm64
* Document new ABIs aren't stable
* Fix filetests to use the right target
* Don't always do 64-bit stores with cranelift
This was overwriting upper bits when 32-bit registers were being stored
into return values, so fix the code inline to do a sized store instead
of one-size-fits-all store.
* At least get tests passing on the old backend
* Fix a typo
* Add some filetests with mixed abi calls
* Get `multi` example working
* Fix doctests on old x86 backend
* Add a mixture of wasmtime/system_v tests
This commit hides the existing WebAssembly feature CLI options (e.g.
`--enable-simd`) and adds a `--wasm-features` flag that enables multiple
(or all) WebAssembly features.
Features can be disabled by prefixing the value with `-`, e.g.
`--wasm-features=-simd`.
async methods used by wiggle currently need to Not have the Send
constraint, so rather than make all use sites pass the argument
to the re-exported async_trait macro, define a new macro that
applies the argument.
* Combine stack-based cleanups for faster wasm calls
This commit is an extension of #2757 where the goal is to optimize entry
into WebAssembly. Currently wasmtime has two stack-based cleanups when
entering wasm, one for the externref activation table and another for
stack limits getting reset. This commit fuses these two cleanups
together into one and moves some code around which enables less captures
for fewer closures and such to speed up calls in to wasm a bit more.
Overall this drops the execution time from 88ns to 80ns locally for me.
This also updates the atomic orderings when updating the stack limit
from `SeqCst` to `Relaxed`. While `SeqCst` is a reasonable starting
point the usage here should be safe to use `Relaxed` since we're not
using the atomics to actually protect any memory, it's simply receiving
signals from other threads.
* Determine whether a pc is wasm via a global map
The macOS implementation of traps recently changed to using mach ports
for handlers instead of signal handlers. This means that a previously
relied upon invariant, each thread fixes its own trap, was broken. The
macOS implementation worked around this by maintaining a global map from
thread id to thread local information, however, to solve the problem.
This global map is quite slow though. It involves taking a lock and
updating a hash map on all calls into WebAssembly. In my local testing
this accounts for >70% of the overhead of calling into WebAssembly on
macOS. Naturally it'd be great to remove this!
This commit fixes this issue and removes the global lock/map that is
updated on all calls into WebAssembly. The fix is to maintain a global
map of wasm modules and their trap addresses in the `wasmtime` crate.
Doing so is relatively simple since we're already tracking this
information at the `Store` level.
Once we've got a global map then the macOS implementation can use this
from a foreign thread and everything works out.
Locally this brings the overhead, on macOS specifically, of calling into
wasm from 80ns to ~20ns.
* Fix compiles
* Review comments
This bumps target-lexicon and adds support for the AppleAarch64 calling
convention. Specifically for WebAssembly support, we only have to worry
about the new stack slots convention. Stack slots don't need to be at
least 8-bytes, they can be as small as the data type's size. For
instance, if we need stack slots for (i32, i32), they can be located at
offsets (+0, +4). Note that they still need to be properly aligned on
the data type they're containing, though, so if we need stack slots for
(i32, i64), we can't start the i64 slot at the +4 offset (it must start
at the +8 offset).
Added one test that was failing on the Mac M1, as well as other tests
stressing different yet similar situations.
This commit splits out a `FiberStack` from `Fiber`, allowing the instance
allocator trait to return `FiberStack` rather than raw stack pointers. This
keeps the stack creation mostly in `wasmtime_fiber`, but now the on-demand
instance allocator can make use of it.
The instance allocators no longer have to return a "not supported" error to
indicate that the store should allocate its own fiber stack.
This includes a bunch of cleanup in the instance allocator to scope stacks to
the new "async" feature in the runtime.
Closes#2708.
* Switch macOS to using mach ports for trap handling
This commit moves macOS to using mach ports instead of signals for
handling traps. The motivation for this is listed in #2456, namely that
once mach ports are used in a process that means traditional UNIX signal
handlers won't get used. This means that if Wasmtime is integrated with
Breakpad, for example, then Wasmtime's trap handler never fires and
traps don't work.
The `traphandlers` module is refactored as part of this commit to split
the platform-specific bits into their own files (it was growing quite a
lot for one inline `cfg_if!`). The `unix.rs` and `windows.rs` files
remain the same as they were before with a few minor tweaks for some
refactored interfaces. The `macos.rs` file is brand new and lifts almost
its entire implementation from SpiderMonkey, adapted for Wasmtime
though.
The main gotcha with mach ports is that a separate thread is what
services the exception. Some unsafe magic allows this separate thread to
read non-`Send` and temporary state from other threads, but is hoped to
be safe in this context. The unfortunate downside is that calling wasm
on macOS now involves taking a global lock and modifying a global hash
map twice-per-call. I'm not entirely sure how to get out of this cost
for now, but hopefully for any embeddings on macOS it's not the end of
the world.
Closes#2456
* Add a sketch of arm64 apple support
* store: maintain CallThreadState mapping when switching fibers
* cranelift/aarch64: generate unwind directives to disable pointer auth
Aarch64 post ARMv8.3 has a feature called pointer authentication,
designed to fight ROP/JOP attacks: some pointers may be signed using new
instructions, adding payloads to the high (previously unused) bits of
the pointers. More on this here: https://lwn.net/Articles/718888/
Unwinders on aarch64 need to know if some pointers contained on the call
frame contain an authentication code or not, to be able to properly
authenticate them or use them directly. Since native code may have
enabled it by default (as is the case on the Mac M1), and the default is
that this configuration value is inherited, we need to explicitly
disable it, for the only kind of supported pointers (return addresses).
To do so, we set the value of a non-existing dwarf pseudo register (34)
to 0, as documented in
https://github.com/ARM-software/abi-aa/blob/master/aadwarf64/aadwarf64.rst#note-8.
This is done at the function granularity, in the spirit of Cranelift
compilation model. Alternatively, a single directive could be generated
in the CIE, generating less information per module.
* Make exception handling work on Mac aarch64 too
* fibers: use a breakpoint instruction after the final call in wasmtime_fiber_start
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
* Add `anyhow` dependency to `wasmtime-runtime`.
* Revert `get_data` back to `fn`.
* Remove `DataInitializer` and box the data in `Module` translation instead.
* Improve comments on `MemoryInitialization`.
* Remove `MemoryInitialization::OutOfBounds` in favor of proper bulk memory
semantics.
* Use segmented memory initialization except for when the uffd feature is
enabled on Linux.
* Validate modules with the allocator after translation.
* Updated various functions in the runtime to return `anyhow::Result`.
* Use a slice when copying pages instead of `ptr::copy_nonoverlapping`.
* Remove unnecessary casts in `OnDemandAllocator::deallocate`.
* Better document the `uffd` feature.
* Use WebAssembly page-sized pages in the paged initialization.
* Remove the stack pool from the uffd handler and simply protect just the guard
pages.
This commit implements copying paged initialization data upon a fault of a
linear memory page.
If the initialization data is "paged", then the appropriate pages are copied
into the Wasm page (or zeroed if the page is not present in the
initialization data).
If the initialization data is not "paged", the Wasm page is zeroed so that
module instantiation can initialize the pages.