This PR removes all minutes and agendas in `meetings/`. These were
previously hosted in this repository, but we found that it makes things
somewhat more complex with respect to CI configuration and merge
permissions to have both small, CI-less changes to the text in
`meetings/` as well as changes to everything else in one repository.
The minutes and agendas have been split out into the repository at
https://github.com/bytecodealliance/meetings/, with all history
preserved. Future agenda additions and minutes contributions should go
there as PRs.
Finally, this PR adds a small note to our "Contributing" doc to note the
existence of the meetings and invite folks to ask to join if interested.
* Cranelift: make `ir::Type` a `u16`.
* Cranelift: pack ValueData back into 64 bits.
After extending `Type` to a `u16`, `ValueData` became 12 bytes rather
than 8. This packs it back down to 8 bytes (64 bits) by stealing two
bits from the `Type` for the enum discriminant (leaving 14 bits for the
type itself).
Performance comparison (3-way between original (`ty-u8`), 16-bit `Type`
(`ty-u16`), and this PR (`ty-packed`)):
```
~/work/sightglass% target/release/sightglass-cli benchmark \
-e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \
--iterations-per-process 10 --processes 2 \
benchmarks-next/spidermonkey/benchmark.wasm
compilation
benchmarks-next/spidermonkey/benchmark.wasm
cycles
[20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so
[22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so
[20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so
nanoseconds
[5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so
[5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so
[5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so
```
So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses
performance by 4.5% (5.683s -> 5.723s), while this PR gets 14 bits for a 1.0%
cost (5.683s -> 5.723s). That's still not great, and we can likely do better,
but it's a start.
* Fix test failure: entities to/from u32 via `{from,to}_bits`, not `{from,to}_u32`.
* Update differential fuzzing configuration
This uses some new features of `wasm-smith` and additionally tweaks the
existing fuzz configuration:
* More than one function is now allowed to be generated. There's no
particular reason to limit differential execution to just one and we
may want to explore other interesting module shapes.
* More than one function type is now allowed to possibly allow more
interesting `block` types.
* Memories are now allowed to grow beyond one page, but still say small
by staying underneath 10 pages.
* Tables are now always limited in their growth to ensure consistent
behavior across engines (e.g. with the pooling allocator vs v8).
* The `export_everything` feature is used instead of specifying a
min/max number of exports.
The `wasmi` differential fuzzer was updated to still work if memory is
exported, but otherwise the v8 differential fuzzer already worked if a
function was exported but a memory wasn't. Both fuzzers continue to
execute only the first exported function.
Also notable from this update is that the `SwarmConfig` from
`wasm-smith` will now include an arbitrary `allowed_instructions`
configuration which may help explore the space of interesting modules
more effectively.
* Fix typos
OSS-fuzz long-ago discovered https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=45662
which we currently believe to be a bug in v8. I originally thought it
was going to be fixed with https://bugs.chromium.org/p/v8/issues/detail?id=12722
but that no longer appears to be the case now that the `v8` crate has
caught up and it still isn't fixed. Personally I've sort of lost an
appetite for continuing to debug these issues so I figure it's best to
just disable reference types with v8 for now and exercise the rest of
the engine, e.g. simd.
`fmin`/`fmax` are defined as returning -0.0 as smaller than 0.0.
This is not how the IEEE754 views these values and the interpreter was
returning the wrong value in these operations since it was just using the
standard IEEE754 comparisons.
This also tries to preserve NaN information by avoiding passing NaN's
through any operation that could canonicalize it.
* Bump versions of wasm-tools crates
Note that this leaves new features in the component model, outer type
aliases for core wasm types, unimplemented for now.
* Move to crates.io-based versions of tools
* ci: replace OpenVINO installer action
To test wasi-nn, we currently use an OpenVINO backend. The Wasmtime CI
must install OpenVINO using a custom GitHub action. This CI action has
not been updated in some time and in the meantime OpenVINO (and the
OpenVINO crates) have released several new versions.
https://github.com/abrown/install-openvino-action is an external action
that we plan to keep up to date with the latest releases. This change
replaces the current CI action with that one.
* wasi-nn: upgrade openvino dependency to v0.4.1
This eliminates a `lazy_static` dependency and changes a few parameters
to pass by reference. Importantly, it enables support for the latest
versions of OpenVINO (v2022.*) in wasi-nn.
* ci: update wasi-nn script to source correct env script
* ci: really use the correct path for the env script
Also, clarify which directory OpenVINO is installed in (the symlink may
not be present).
* Update tracing-core to a version which doesn't depend on lazy-static.
* Update crossbeam-utils to a version that doesn't depend on lazy-static.
* Update crossbeam-epoch to a version that doesn't depend on lazy-static.
* Update clap to a version that doesn't depend on lazy-static.
* Convert Wasmtime's own use of lazy_static to once_cell.
* Make `GDB_REGISTRATION`'s comment a doc comment.
* Fix compilation on Windows.
This commit adds support to Wasmtime for components which themselves
export instances. The support here adds new APIs for how instance
exports are accessed in the embedding API. For now this is mostly just a
first-pass where the API is somewhat confusing and has a lot of
lifetimes. I'm hoping that over time we can figure out how to simplify
this but for now it should at least be expressive enough for exploring
the exports of an instance.
* cranelift: Implement `fma` on interpreter
* cranelift: Implement `fabs` on interpreter
* cranelift: Fix `fneg` implementation on interpreter
`fneg` was implemented as `0 - x` which is not correct according to the
standard since that operation makes no guarantees on what the output
is when the input is `NaN`. However for `fneg` the output for `NaN`
inputs is fully defined.
* cranelift: Implement `fcopysign` on interpreter
* support enums with more than 256 variants in derive macro
This addresses #4361. Technically, we now support up to 2^32 variants, which is
the maximum for the canonical ABI. In practice, though, the derived code for
enums with even just 2^16 variants takes a prohibitively long time to compile.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* simplify `LowerExpander::expand_variant` code
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
When setting up a copy on write image, we add several seals, to prevent
the image from being resized or modified. Set all the seals in a single
call, rather than doing one call per seal.
This defines the full set of 32 128-bit vector registers on s390x.
(Note that the VRs overlap the existing FPRs.) In addition, this
adds support to use all 32 vector registers to implement floating-
point operations, by using vector floating-point instructions with
the 'W' bit set to operate only on the first element.
This part of the vector instruction set mostly matches the old FP
instruction set, with two exceptions:
- There is no vector version of the COPY SIGN instruction. Instead,
now use a VECTOR SELECT with an appropriate bit mask to implement
the fcopysign operation.
- There are no vector version of the float <-> int conversion
instructions where source and target differ in bit size. Use
appropriate multiple conversion steps instead. This also requires
use of explicit checking to implement correct overflow handling.
As a side effect, this version now also implements the i8 / i16
variants of all conversions, which had been missing so far.
For all operations except those two above, we continue to use the
old FP instruction if applicable (i.e. if all operands happen to
have been allocated to the original FP register set), and use the
vector instruction otherwise.
* support variant, enum, and union derives
This is the second stage of implementing #4308. It adds support for deriving
variant, enum, and union impls for `ComponentType`, `Lift`, and `Lower`. It
also fixes derived record impls for generic `struct`s, which I had intended to
support in my previous commit, but forgot to test.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* deduplicate component-macro code
Thanks to @jameysharp for the suggestion!
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Move from passing and returning u8 and u16 values to u32 in many of
the functions. This removes a number of type conversions and gives
a small compilation time speedup, around ~0.7% on my aarch64 machine.
Copyright (c) 2022, Arm Limited.
Now that lowering is fully done in ISLE, clean up some code remnants
in lower.rs. In particular, move code to lower/isle.rs where
possible, and inline lower_insn_to_regs into its caller and simplify.
* Wasmtime: disable unwind_info unless needed
fixes#4350
Otherwise wasm modules will be built with unwind info, even if
backtraces are disabled. This can get expensive in deeply recursive
modules.
* Wasmtime: test that disabling backtraces disables unwind_info
* fix: make sure we have unwind_info when the engine needs it
This commit updates the management of the `may_enter` flag in line with
WebAssembly/component-model#57. Namely the `may_enter` flag is now
exclusively managed in the `canon lift` function (which is
`TypedFunc::call`) and is only unset after post-return completes
successfully. This implements semantics where if any trap happens for
any reason (lifting, lowering, execution, imports, etc) then the
instance is considered permanently poisoned and can no longer be
entered.
Tests needed many updates to create new instances where previously the
same instance was reused after it had an erroneous state.
The more I read over this again the more I think that these should be
constants to explicitly indicate that we're supposed to be able to
optimize for them. Currently I'm predicting that adding memory64 support
will probably double the surface area of each trait (e.g. `lower32` and
`lower64`) rather than have a parameter passed around. This is in the
hopes that having specialized 32 and 64-bit paths will enable better
optimizations within each path instead of having to check all bounds
everywhere.
Additionally one day I'd like to have `fn load(bytes: &[u8;
Self::SIZE32])` but that doesn't work today in Rust.
This adds infrastructure to allow implementing call and return
instructions in ISLE, and migrates the s390x back-end.
To implement ABI details, this patch creates public accessors
for `ABISig` and makes them accessible in ISLE. All actual
code generation is then done in ISLE rules, following the
information provided by that signature.
[ Note that the s390x back end never requires multiple slots for
a single argument - the infrastructure to handle this should
already be present, however. ]
To implement loops in ISLE rules, this patch uses regular tail
recursion, employing a `Range` data structure holding a range
of integers to be looped over.
* Implement `canon lower` of a `canon lift` function in the same component
This commit implements the "degenerate" logic for implementing a
function within a component that is lifted and then immediately lowered
again. In this situation the lowered function will immediately generate
a trap and doesn't need to implement anything else.
The implementation in this commit is somewhat heavyweight but I think is
probably justified moreso in future additions to the component model
rather than what exactly is here right now. It's not expected that this
"always trap" functionality will really be used all that often since it
would generally mean a buggy component, but the functionality plumbed
through here is hopefully going to be useful for implementing
component-to-component adapter trampolines.
Specifically this commit implements a strategy where the `canon.lower`'d
function is generated by Cranelift and simply has a single trap
instruction when called, doing nothing else. The main complexity comes
from juggling around all the data associated with these functions,
primarily plumbing through the traps into the `ModuleRegistry` to
ensure that the global `is_wasm_trap_pc` function returns `true` and at
runtime when we lookup information about the trap it's all readily
available (e.g. translating the trapping pc to a `TrapCode`).
* Fix non-component build
* Fix some offset calculations
* Only create one "always trap" per signature
Use an internal map to deduplicate during compilation.
This is the first stage of implementing
https://github.com/bytecodealliance/wasmtime/issues/4308, i.e. derive macros for
`ComponentType`, `Lift`, and `Lower` for composite types in the component model.
This stage only covers records; I expect the other composite types will follow a
similar pattern.
It borrows heavily from the work Jamey Sharp did in
https://github.com/bytecodealliance/wasmtime/pull/4217. Thanks for that, and
thanks to both Jamey and Alex Crichton for their excellent review feedback.
Thanks also to Brian for pairing up on the initial draft.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
Currently I don't know how we can reasonably implement this. Given all
the signatures of how we call functions and how functions are called on
the host there's no real feasible way that I know of to hook these two
up "seamlessly". This means that a component which reexports an imported
function can't be run in Wasmtime.
One of the main reasons for this is that when calling a component
function Wasmtime wants to lower arguments first and then have them
lifted when the host is called. With a reexport though there's not
actually anything to lower into so we'd sort of need something similar
to a table on the side or maybe a linear memory and that seems like it'd
get quite complicated quite quickly for not really all that much
benefit. As-such for now this simply returns a first-class error (rather
than the current panic) in situations like this.
* Implement lowered-then-lifted functions
This commit is a few features bundled into one, culminating in the
implementation of lowered-then-lifted functions for the component model.
It's probably not going to be used all that often but this is possible
within a valid component so Wasmtime needs to do something relatively
reasonable. The main things implemented in this commit are:
* Component instances are now assigned a `RuntimeComponentInstanceIndex`
to differentiate each one. This will be used in the future to detect
fusion (one instance lowering a function from another instance). For
now it's used to allocate separate `VMComponentFlags` for each
internal component instance.
* The `CoreExport<FuncIndex>` of lowered functions was changed to a
`CoreDef` since technically a lowered function can use another lowered
function as the callee. This ended up being not too difficult to plumb
through as everything else was already in place.
* A need arose to compile host-to-wasm trampolines which weren't already
present. Currently wasm in a component is always entered through a
host-to-wasm trampoline but core wasm modules are the source of all
the trampolines. In the case of a lowered-then-lifted function there
may not actually be any core wasm modules, so component objects now
contain necessary trampolines not otherwise provided by the core wasm
objects. This feature required splitting a new function into the
`Compiler` trait for creating a host-to-wasm trampoline. After doing
this core wasm compilation was also updated to leverage this which
further enabled compiling trampolines in parallel as opposed to the
previous synchronous compilation.
* Review comments
* Migrate from `winapi` to `windows-sys`
I believe that Microsoft itself is supporting the development of
`windows-sys` and it's also used by `cap-std` now so this switches
Wasmtime's dependencies on Windows APIs from the `winapi` crate to the
`windows-sys` crate. We still have `winapi` in our dependency graph but
that may get phased out over time.
* Make windows-sys a target-specific dependency
* Improve error message for failed function compiles
Add in the wasm function index, the name if specified, and the function
offset in the original file to assist in debugging failed function compiles.
* Review commments
- Handle call instructions' clobbers with the clobbers API, using RA2's
clobbers bitmask (bytecodealliance/regalloc2#58) rather than clobbers
list;
- Pull in changes from bytecodealliance/regalloc2#59 for much more sane
edge-case behavior w.r.t. liverange splitting.
* Change how unwind information is stored on Windows
Unwind information on Windows is stored in two separate locations. The
first location is the unwind information itself which corresponds to
`UNWIND_INFO`. The second location is a list of `RUNTIME_INFO`
structures which point to function bodes and `UNWIND_INFO` structures.
Currently in Wasmtime the `UNWIND_INFO` structures are stored just after
functions themselves with a somewhat cryptic comment indicating that
Windows prefers this (I'm unsure as to the provenance of this comment).
The `RUNTIME_INFO` data is then stored in a separate section which has
the custom name of `_wasmtime_winx64_unwind`.
After my recent foray into trying to debug windows-2022 bad unwind
information again I realized though that Windows actually has official
sections for these two unwind information items. The `.xdata` section is
used to store the `UNWIND_INFO` structures and the `.pdata` section
stores the `RUNTIME_INFO` list. To try to be somewhat idiomatic and
perhaps one day even hook into standard Windows debugging tools I went
ahead and refactored how our unwind information is stored to match this.
Perhaps the main benefit of this is that it reduces the size of the
read/execute section of the binary. Previously the unwind information
was executable since it was stored in the `.text` section, but
unnecessarily so. Now it's in a read-only section which is in theory a
small amount of hardening.
Otherwise though I don't think this will really help all that much to
hook up in to standard debugging tools like `objdump` because it's all
still stored in an ELF file rather than a COFF file.
* Review comments
This commit extends the `WasmList<T>` type to have an
`as_slice`-lookalike method (now renamed to `as_le_slice`) for all
integer types rather than just the `u8` type. With the guarantees of the
component model it's known that all lists are aligned in linear memory.
Additionally linear memories themselves are also generally guaranteed to
be aligned. This means that hosts where the primitive integer alignment
is at most the size (which I think is basically all host platforms) can
get a raw view into memory for the wasm linear memory for slices of
these types.
Note, though, that the remaining caveat after alignment is endianness.
Big-endian hosts need to be aware that the integers aren't stored in a
native format. Previously tools like wit-bindgen have added an `Le<T>`
wrapper but for now I've opted to instead use a method that has "le" in
the name - `as_le_slice`. I'm hoping that this is a clear enough
indicator for users to little-endian conversions as appropriate when
reading the values within the slice.
Turns out that `adr` doesn't work in inline assembly within LLVM on
arm macOS, or at least not how we were using it. This switches instead
to an `adrp` and `add` pair which seems to convince the linker that the
relocations should all fit. The same pattern is used on Linux as well
only it has different syntax (so much for a portable assembler) for
consistency. Performance isn't really an issue here so there's no need
to go out of our way to get the single-instruction operand working.
The previous `cls` code was producing wrong results when fed with a -1 i8.
The fix here is to sign extend instead of zero extending since we want
to keep the sign bit as one in order for it to be counted correctly
in the cls instruction
This also merges the interpreter only tests now that aarch64
correctly supports this instruction