* winch: Solidify bounds check for dynamic heaps
This commit fixes an edge case for bounds checks for dynamic heaps.
https://github.com/bytecodealliance/wasmtime/pull/8157/files erroneously
tied the bounds check operation (more concretely the overflow check) to the size derived from the heap
type. Even though offsets and access sizes are validated ahead-of-time
and bound to the heap type, in the case of overflow checking, we must
ensure that the operation size is tied to the target's pointer size to
avoid clamping the access size and offset addition, which would result
in missing an out-of-bounds memory access.
This commit also adds a disassembly test to avoid introducing
regressions in the future.
Additionally, this commit adds more comments around why `pointer_size`
is used for certain bounds checking operations.
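The hazard described above can be illustrated with a minimal sketch. This is not Winch's code: the function names and the 64 KiB heap length are hypothetical, and the example assumes a 32-bit heap type on a 64-bit target. It only shows why doing the addition at the heap type's width can wrap and hide an out-of-bounds access.

```rust
/// Hypothetical check performed at pointer width (64 bits here).
/// The end of the access is computed without any wrap-around.
fn in_bounds_ptr_width(index: u32, offset: u32, access_size: u32, heap_len: u64) -> bool {
    let end = (index as u64)
        .checked_add(offset as u64)
        .and_then(|v| v.checked_add(access_size as u64));
    match end {
        Some(end) => end <= heap_len,
        None => false,
    }
}

/// Hypothetical buggy variant: the addition is done at the heap type's
/// width (32 bits), so it silently wraps before the comparison.
fn in_bounds_heap_width(index: u32, offset: u32, access_size: u32, heap_len: u64) -> bool {
    let end = index.wrapping_add(offset).wrapping_add(access_size);
    (end as u64) <= heap_len
}

fn main() {
    let heap_len = 65536u64; // assumed heap size for the illustration
    let index = u32::MAX - 3; // 0xFFFF_FFFC
    // At 32 bits, 0xFFFF_FFFC + 8 wraps to 4, so the access looks valid.
    assert!(in_bounds_heap_width(index, 0, 8, heap_len));
    // At pointer width the sum is 0x1_0000_0004, which is out of bounds.
    assert!(!in_bounds_ptr_width(index, 0, 8, heap_len));
    // An ordinary in-bounds access passes both checks.
    assert!(in_bounds_ptr_width(16, 8, 8, heap_len));
}
```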
* Update disassembly test
Precompiled artifacts for macOS show this for `wasmtime --version`
wasmtime-cli 24.0.0 (6fc3d274c 2024-08-20)
whereas for Linux they show
wasmtime-cli 24.0.0
and this is due to `git` not being available in the build environment on
Linux.
The skeleton unit may contain attributes that don't appear in
the split unit. In particular, this includes DW_AT_ranges and
DW_AT_comp_dir.
Also, set the correct form for the default directory in the
line program. Previously, the different form meant that we emitted
the directory again when a file used it. Copying the DW_AT_comp_dir
attribute is required for use of the default directory to work
successfully.
This commit is similar to #8976 where it's fixing some typos in the
encoding of the `adc` and `sbb` instructions used in Cranelift. These
appear to have copy/paste typos where the non-register-based opcodes
weren't updated from the `add` and `sub` opcodes. This problem was
exposed from a fuzz test case after #9136 landed. The fuzz test case is
minimized and included here as a new runtest and new emit tests are
additionally added.
This commit fixes an issue where a `Value` was both load-sunk and used
as-is, meaning it was both sunk and not. That triggered a panic in the
backend since this isn't valid. The reason for this is due to how some
ISLE rules were written where a `Value` was both implicitly coerced into
an `XmmMem` and an `Xmm`. This issue is similar to #4815 for example.
The fix in this commit is to force the operands into registers which
prevents load sinking which wouldn't work here anyway.
This panic was introduced in #5841 which is quite old at this point.
This bug does not affect WebAssembly translation due to how the `v128`
type maps to `i8x16` in Cranelift by default.
Closes #9143
* pulley: use enums for `{X,F,V}Reg`
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* pulley: add `BinaryOperands`
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* pulley: use `BinaryOperands` for binary operators
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
---------
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
This commit builds on the support added in #8450 to extend our simple
interpreter with support for the `extended-const` proposal to
WebAssembly. This is required when updating the spec-test-submodule
since `extended-const` was merged into the mainline specification and
some proposals are starting to rebase on that.
* Move `compute_use_states` to be able to test it
Make it a free function so that it doesn't depend on the `I` type variable of `Lower`
* Don't force `Multiple` on multi-result instructions
This commit is a result of [discussion on Zulip][Zulip] and is
attempting to fix an issue where some 128-bit instructions aren't fully
benefitting from load sinking on x64. At a high level 128-bit
addition isn't able to sink loads into instructions for halves of the
128-bit operation. At a lower level the reason for this is that
currently all operands of a multiple-result instruction are considered
multiply-used (as each result could be used) which prevents load
sinking.
Operations on 128-bit integers may be coupled with `isplit` afterwards
which is a multi-result instruction. This then means that the `Multiple`
state flows backwards to the 128-bit operation and all its operands,
including whatever is necessary to produce the individual components of
each 128-bit integer.
The fix in this commit is to introduce the concept of a "root"
instruction from the perspective of the calculation of `ValueUseState`.
In other words `ValueUseState` is no longer an accurate picture of the
function as a whole, but only the parts of the function rooted at a
"root" instruction. This is currently defined as multi-result
instructions meaning that `isplit` for example is a root instruction.
This is coupled with documentation/changes to
`get_value_as_source_or_const` to never allow looking through root
instructions (or considering them a `UniqueUse`).
This commit additionally updates some documentation in a few places and
refactors some usage of `get_value_as_source_or_const` to use other
helpers instead, reducing the callers of `get_value_as_source_or_const`
that need to be audited when modifying this function.
[Zulip]: https://bytecodealliance.zulipchat.com/#narrow/stream/217117-cranelift/topic/ValueUseState.3A.3AMultiple.20and.20multi-result.20instructions/near/462833578
* Review comments
* Remove unneeded `cu_low_pc` parameter
`unit.low_pc` is the same value.
* Simplify `clone_line_program`
We were parsing things that had already been parsed into `gimli::Unit`,
such as `DW_AT_comp_dir`, `DW_AT_name`, and `DW_AT_stmt_list`.
* Use `gimli::Dwarf` methods in more places
This moves Wasmtime over from the old, regalloc-based stack maps system to the
new "user" stack maps system.
Removing the old regalloc-based stack maps system is left for follow-up work.
This adds `pooling-table-elements` and `pooling-max-core-instance-size` options
to the CLI, allowing the user to override the defaults.
I found myself needing to override both of these settings when running an ASP.NET
Core application built using the Native AOT feature of the .NET runtime.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Implement a few minor optimizations around 128-bit integers
This commit implements a few minor changes for `i128` in both the egraph
optimizations and lowerings for x64. The optimization pass will now
transform `iconcat` into a `uextend` or `sextend` where appropriate.
The x64 backend then pattern-matches this to produce slightly more
optimal machine code. Additionally the x64 backend now handles
memory/immediate operands a bit better when the argument to a 128-bit
operation is an `iconcat`.
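The arithmetic identities behind the `iconcat` rewrite can be sketched in plain Rust. This is an assumption about what the rules rely on, not the actual egraph rules: the helper functions below are hypothetical models of the CLIF operations on a 64-bit low half and a 64-bit high half.

```rust
/// Model of `iconcat`: build a 128-bit value from low and high halves.
fn iconcat(lo: u64, hi: u64) -> u128 {
    ((hi as u128) << 64) | lo as u128
}

/// Model of `uextend` from 64 to 128 bits.
fn uextend(x: u64) -> u128 {
    x as u128
}

/// Model of `sextend` from 64 to 128 bits.
fn sextend(x: u64) -> u128 {
    (x as i64) as i128 as u128
}

fn main() {
    for x in [0u64, 1, 42, u64::MAX, 1 << 63, (1 << 63) - 1] {
        // A zero high half is the same as a zero-extension.
        assert_eq!(iconcat(x, 0), uextend(x));
        // A high half made of copies of the sign bit is a sign-extension.
        let sign_fill = ((x as i64) >> 63) as u64;
        assert_eq!(iconcat(x, sign_fill), sextend(x));
    }
}
```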
* Update test expectations
* Match iadd lowering rules for isub
* Add the ability to generate async drop methods for resources.
In the component model, `resource.drop` is a canonical built-in without a proper name. So I invented a custom naming scheme for the component bindgen config. I went with:
`"[drop]{resource-name}"` where `{resource-name}` is the name as defined in WIT. e.g. `"[drop]input-stream"`.
This shouldn't conflict with anything existing in the wild as WIT identifiers are not allowed to contain square brackets.
* Make `input-stream::drop` & `output-stream::drop` async
* Prevent the guest from (inadvertently) spawning an unlimited amount of background tasks through the FileOutputStream
* Properly clean up background tasks for even more types of I/O streams.
Unlike FileOutputStream, the background tasks of these stream types are truly async. So aborting _without_ awaiting them was probably already good enough in practice. Nonetheless, waiting for the background tasks to actually shut down just seems like good resource management "hygiene" to me.
* Let HostInput/OutputStream provide specialized blocking_* implementations.
* Rename filesystem's `spawn_blocking` -> `run_blocking`
* Implement specialized `blocking_write_and_flush` for FileOutputStream.
- `write` now always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`.
- `blocking_write_and_flush` is newly overridden and continues to do the `allow_blocking_current_thread` trickery that `write` used to do.
* Implement `HostInputStream` for `FileInputStream`.
- `read` always spawns the syscall on a background task, regardless of `allow_blocking_current_thread`.
- `blocking_read` performs the `run_blocking`/`allow_blocking_current_thread` trickery.
* In Preview1 adapter: ignore BlockingMode and always perform blocking I/O, as that's what preview1 did.
* Remove special case for FileInputStream and change InputStream enum to be a type alias, just like OutputStream
* Remove `[method]output-stream.forward` from bindgen config. It does not exist.
* Refactor `blocking_splice` to take advantage of specialized `blocking_read` & `blocking_write_and_flush` implementations
* Defer to regular `read` from within `blocking_read` to reduce duplication of logic.
The DWARF sections in `DebugInputContext` were always set to the DWARF sections
from the .wasm file. However, when parsing split DWARF, most of the time we
want to be using the sections from the .dwp file instead. We were already
passing `gimli::Dwarf` to most places, so having the sections in
`DebugInputContext` was no benefit. So a lot of this commit is replacing
`context` with `dwarf`.
Next, sometimes we were using the wrong `gimli::Dwarf`. In general, we want
to use the skeleton unit for parsing line info, and the split unit for
everything else. `clone_unit` was wrongly using the DWARF associated with
the skeleton unit in many places. Instead of changing all of those places,
I've kept the variable names `dwarf` and `unit`, but changed which DWARF they
refer to. This also fits better with the non-split operation. So some of
the changes in this commit are updating places that were already correct
to use the new variable meanings.
Finally, this commit adds a call to `copy_relocated_attributes`. This copies
some info from the skeleton unit that is needed when parsing the split unit.
* Cranelift: Add a new backend for emitting Pulley bytecode
This commit adds two new backends for Cranelift that emit 32- and 64-bit Pulley
bytecode. The backends are both actually the same, with a common implementation
living in `cranelift/codegen/src/isa/pulley_shared`. Each backend configures an
ISA flag that determines the pointer size, and lowering inspects this flag's
value when lowering memory accesses.
To avoid multiple ISLE compilation units, and to avoid compiling duplicate
copies of Pulley's generated `MInst`, I couldn't use `MInst` as the `MachInst`
implementation directly. Instead, there is an `InstAndKind` type that is a
newtype over the generated `MInst` but which also carries a phantom type
parameter that implements the `PulleyTargetKind` trait. There are two
implementations of this trait, a 32- and 64-bit version. This is necessary
because there are various static trait methods for the mach backend which we
must implement, and which return the pointer width, but don't have access to any
`self`. Therefore, we are forced to monomorphize some amount of code. This type
parameter is fairly infectious, and all the "big" backend
types (`PulleyBackend<P>`, `PulleyABICallSite<P>`, etc...) are parameterized
over it. Nonetheless, not everything is parameterized over a `PulleyTargetKind`,
and we manage to avoid duplicate `MInst` definitions and lowering code.
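The phantom-type-parameter pattern described above can be sketched as follows. All names here (`TargetKind`, `Target32`, `Target64`, the two-variant `Inst`) are simplified stand-ins chosen for illustration, not the definitions in `pulley_shared`. The point is that a static method with no `self` can still report the pointer width because the width is carried in the type.

```rust
use std::marker::PhantomData;

/// Stand-in for the target-kind trait: a static method, no `self`.
trait TargetKind {
    fn pointer_width_bytes() -> u8;
}

struct Target32;
struct Target64;

impl TargetKind for Target32 {
    fn pointer_width_bytes() -> u8 {
        4
    }
}

impl TargetKind for Target64 {
    fn pointer_width_bytes() -> u8 {
        8
    }
}

/// Stand-in for the single generated instruction enum.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Inst {
    Nop,
    Ret,
}

/// Newtype over `Inst` that also carries the target kind as a phantom type.
struct InstAndKind<P: TargetKind> {
    inst: Inst,
    _kind: PhantomData<P>,
}

impl<P: TargetKind> From<Inst> for InstAndKind<P> {
    fn from(inst: Inst) -> Self {
        InstAndKind { inst, _kind: PhantomData }
    }
}

impl<P: TargetKind> InstAndKind<P> {
    /// Static method: answers without access to any instance.
    fn pointer_width_bytes() -> u8 {
        P::pointer_width_bytes()
    }

    fn inst(&self) -> Inst {
        self.inst
    }
}

fn main() {
    let a: InstAndKind<Target32> = Inst::Nop.into();
    let b: InstAndKind<Target64> = Inst::Ret.into();
    assert_eq!(a.inst(), Inst::Nop);
    assert_eq!(b.inst(), Inst::Ret);
    assert_eq!(InstAndKind::<Target32>::pointer_width_bytes(), 4);
    assert_eq!(InstAndKind::<Target64>::pointer_width_bytes(), 8);
}
```

Only the thin wrapper is monomorphized per target in this sketch; the `Inst` enum itself is defined once.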
Note that many methods are still stubbed out with `todo!`s. It is expected that
we will fill in those implementations as the work on Pulley progresses.
* Trust the `pulley-interpreter` crate, as it is part of our workspace
* fix some clippy warnings
* Fix a dead-code warning from inside generated code
* Use a helper for emitting br_if+comparison instructions
* Add a helper for converting `Reg` to `pulley_interpreter::XReg`
* Add version to pulley workspace dependency
* search the pulley directory for crates in the publish script
Looks like CMake before 3.20.0 doesn't generate newlines at all without
this configuration option. CMake 3.20.0 and later, however, generate
newlines by default which is why this didn't show up in CI or
development.
Closes #9126
* Pulley: Add memory access instructions with 64-bit offsets
I had trimmed these instructions from the original upstreaming of the Pulley
interpreter because I had mistakenly believed that they were unused. Turns out
they are needed for Cranelift's Pulley backend to allow for lowering certain
address modes to a single instruction. The alternative, lowering the address
modes to a sequence of instructions, would be a bit annoying and these
instructions seem generally useful.
* rebase on top of indexing changes for `MachineState`
* Refactor use of `CodeBuilder` on the CLI
This commit refactors `wasmtime run` and `wasmtime compile` to
unconditionally use `CodeBuilder` internally. This will in theory help
out in the future if more debug-related options are added to
`CodeBuilder` for example. This refactoring required some changes to
`CodeBuilder` to be able to support a query about whether the internal
bytes were a component or a module. The text format is now converted to
binary immediately when supplied rather than during the compilation
phase. This in turn required some API changes to make the selection of
supporting the text format a compile-time choice of method rather than a
runtime value.
* Fix compile
* Fix no-cranelift build of CLI
* ISLE: reduce allocations when lexing integers
Instead of creating a temporary `Vec<u8>`, use a slice of the original
underlying `buf`, and only allocate a temporary `String` if it contains
an `_`.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
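The allocation-avoiding idea in the integer lexing change above can be sketched with `Cow`. This is a simplified illustration, not the ISLE lexer: the function names are hypothetical, and it handles only unsigned decimal digits.

```rust
use std::borrow::Cow;

/// Borrow the original slice when there is nothing to strip, and allocate a
/// temporary `String` only when the text contains an underscore.
fn strip_underscores(digits: &str) -> Cow<'_, str> {
    if digits.contains('_') {
        Cow::Owned(digits.replace('_', ""))
    } else {
        Cow::Borrowed(digits)
    }
}

/// Hypothetical helper: parse decimal digits that may contain underscores.
fn parse_decimal(digits: &str) -> Option<u128> {
    strip_underscores(digits).parse().ok()
}

fn main() {
    // No underscore: the input slice is reused, no allocation.
    assert!(matches!(strip_underscores("12345"), Cow::Borrowed(_)));
    // Underscore present: a temporary owned string is created.
    assert!(matches!(strip_underscores("1_000"), Cow::Owned(_)));
    assert_eq!(parse_decimal("1_000_000"), Some(1_000_000));
    assert_eq!(parse_decimal("12x"), None);
}
```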
* ISLE: don't use `vec![]` macro in lexer tests
`Vec` can be compared directly against arrays, since the standard library
implements `PartialEq` between them.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karlwfmeakin@gmail.com>
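A small sketch of the test style this enables, assuming a hypothetical `lex_words` helper in place of the real lexer: the expected value is written as a plain array rather than `vec![...]`.

```rust
/// Hypothetical stand-in for a lexer that returns a `Vec` of tokens.
fn lex_words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    // `Vec<&str>` compared against `[&str; 3]` without building a second `Vec`.
    assert_eq!(lex_words("decl pure foo"), ["decl", "pure", "foo"]);
    // An array of a different length simply compares unequal.
    assert_ne!(lex_words("a b"), ["a"]);
}
```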
* ISLE: create `Files`
Centralize all file related arenas in `Files` struct.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: don't track line/col in `Pos`
They are already tracked in `Files`, so no need to track them in `Pos`
as well. This lets us simplify the implementation of `Lexer::advance_pos`
a bit.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: don't pass `Files` into every pass
`Files` was being threaded through a lot of passes where it wasn't
needed. It is only needed for reporting errors in `compile.rs` and for
reporting line numbers when printing in `codegen.rs`.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: store `&str` in `Lexer`
Store the text being lexed as `&str`, rather than `&[u8]`, so that
substrings don't need to be rechecked for UTF-8 validity when lexing
identifiers or integers.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: add `peek_byte` helper for lexer
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: tests for lexing integers
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
* ISLE: don't parse integers twice
Instead of trying to parse an integer as an `i128`, and then as an
`u128` if that fails, parse it only as a `u128` and then check for
`i128::MIN`.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
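The single-parse approach for integers can be sketched as below. This is an illustration under assumptions, not the ISLE lexer: it handles decimal text only, and it assumes that non-negative values above `i128::MAX` are kept by reinterpreting their bits, which may differ from the real behavior.

```rust
/// Hypothetical helper: parse an optionally negated decimal integer by
/// parsing the magnitude once as `u128`.
fn parse_int(text: &str) -> Option<i128> {
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text),
    };
    let magnitude: u128 = digits.parse().ok()?;
    if negative {
        if magnitude == i128::MIN.unsigned_abs() {
            // 2^127 does not fit in `i128`, but its negation is `i128::MIN`.
            Some(i128::MIN)
        } else {
            // Any other magnitude must fit in `i128` before being negated.
            i128::try_from(magnitude).ok().map(|value| -value)
        }
    } else {
        // Assumption: large unsigned values are reinterpreted bit-for-bit.
        Some(magnitude as i128)
    }
}

fn main() {
    assert_eq!(parse_int("42"), Some(42));
    assert_eq!(parse_int("-42"), Some(-42));
    assert_eq!(parse_int(&i128::MIN.to_string()), Some(i128::MIN));
    assert_eq!(parse_int(&u128::MAX.to_string()), Some(-1));
    assert_eq!(parse_int("abc"), None);
}
```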
---------
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
Signed-off-by: Karl Meakin <karlwfmeakin@gmail.com>
* Add the ability to generate async drop methods for resources.
In the component model, `resource.drop` is a canonical built-in without a proper name. So I invented a custom naming scheme for the component bindgen config. I went with:
`"[drop]{resource-name}"` where `{resource-name}` is the name as defined in WIT. e.g. `"[drop]input-stream"`.
This shouldn't conflict with anything existing in the wild as WIT identifiers are not allowed to contain square brackets.
* Add test for resource_async
* Improve codegen for enums with many cases
This commit improves the compile time of generating bindings for enums
with many cases in them (e.g. 1000+). This is done by optimizing for
enums specifically rather than handling them generically like other
variants which can reduce the amount of code going into rustc to O(1)
instead of O(N) with the number of cases. This in turn can greatly
reduce compile time.
The tradeoff made in this commit is that enums are now required to have
`#[repr(...)]` annotations along with no Rust-level discriminants
specified. This enables the use of a `transmute` to lift a discriminant
into Rust with a simple bounds check. Previously this was one large
`match` statement.
Closes #9081
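The bounds-check-plus-`transmute` lifting described above can be sketched as follows. The `Color` enum, the `COLOR_CASES` constant, and `lift_color` are hypothetical and hand-written here; the generated bindings may look different.

```rust
/// Example enum with a `#[repr(...)]` annotation and no explicit
/// discriminants, so its cases are numbered 0, 1, 2 in order.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Color {
    Red,
    Green,
    Blue,
}

/// Number of cases in `Color` (kept in sync by hand in this sketch).
const COLOR_CASES: u8 = 3;

/// Lift a raw discriminant with one comparison instead of an N-arm `match`.
fn lift_color(discriminant: u8) -> Option<Color> {
    if discriminant < COLOR_CASES {
        // SAFETY: `Color` is `repr(u8)` with implicit discriminants
        // 0..COLOR_CASES, and `discriminant` was just checked to be in range.
        Some(unsafe { std::mem::transmute::<u8, Color>(discriminant) })
    } else {
        None
    }
}

fn main() {
    assert_eq!(lift_color(0), Some(Color::Red));
    assert_eq!(lift_color(1), Some(Color::Green));
    assert_eq!(lift_color(2), Some(Color::Blue));
    assert_eq!(lift_color(3), None);
}
```

The amount of code stays constant as cases are added, which is the O(1) property the commit message refers to.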
* Fix some tests
* Add repr tag in fuzzing
* Fix syntax for Rust 1.78
* Update nightly used in CI
Move it up beyond the LLVM 19 upgrade to see if we have any issues with
LLVM 19.
prtest:full
* Update nightly version
* Fix some warnings on nightly
* Alternative fix for warnings
* More lint fixes
* More warning tweaks
* Clean up dist build configuration
* Move updating `$PATH` to the `main.js` script which is the one that
mounts `/rust/bin` so that knowledge isn't spread around.
* Remove some unused env vars in docker containers.
* Forward cargo/rust-specific env vars to the build from outside of
containers to the build itself.
prtest:full
* Change how musl rustflags are configured
* More rustflags changes
* Review feedback
This commit adds a CI check that `cargo vet` still works when versions are
bumped. It's hoped that this prevents a repeat of #9115 by
ensuring we get all the various entries right in our vet
configuration.
Right now this is only on some crates such as `wasmtime` itself and
`wasmtime-cli`, but by applying it to all crates it helps with version
selection of those using just Cranelift for example.
We were using a string, but the DWARF standard specifies that:
"The value of the DW_AT_decl_file attribute corresponds to a file number from
the line number information table...".
Additionally, a typo meant we were overwriting the file attribute with
the value that was meant to be used for a DW_AT_decl_line attribute.
* Use cmake to build wasmtime-c-api
* Properly expose features when building via cmake
* Install all headers to same directory
* Add vets
* attempt to fix ci
* Run all tests on CI
prtest:full
* Set CARGO_BUILD_TARGET; add CMakeLists to package
* Update comment on github action
* Attempt to fix android build
* Fix source dir modifications of c-api build
* Re-add BINARY_DIR option
* Fix build
* Move header installation to a cmake script
Try to avoid dealing with cmake configuration/platforms/etc.
* Tweak build of headers
* Install headers in build dir for examples
* Add cmake files to dist, fix header install dir
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Use the `Index` and `IndexMut` traits for accessing registers, rather
than having to define getters and setters for every register class.
Copyright (c) 2024, Arm Limited.
Signed-off-by: Karl Meakin <karl.meakin@arm.com>
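A minimal sketch of register access through `Index` and `IndexMut`, assuming simplified newtype registers and a two-class `MachineState`. These definitions are hypothetical and much smaller than the interpreter's real types.

```rust
use std::ops::{Index, IndexMut};

/// Hypothetical integer and float register names.
#[derive(Clone, Copy)]
struct XReg(u8);
#[derive(Clone, Copy)]
struct FReg(u8);

/// Hypothetical machine state with one array per register class.
struct MachineState {
    x_regs: [u64; 32],
    f_regs: [f64; 32],
}

impl MachineState {
    fn new() -> Self {
        MachineState { x_regs: [0; 32], f_regs: [0.0; 32] }
    }
}

impl Index<XReg> for MachineState {
    type Output = u64;
    fn index(&self, reg: XReg) -> &u64 {
        &self.x_regs[usize::from(reg.0)]
    }
}

impl IndexMut<XReg> for MachineState {
    fn index_mut(&mut self, reg: XReg) -> &mut u64 {
        &mut self.x_regs[usize::from(reg.0)]
    }
}

impl Index<FReg> for MachineState {
    type Output = f64;
    fn index(&self, reg: FReg) -> &f64 {
        &self.f_regs[usize::from(reg.0)]
    }
}

impl IndexMut<FReg> for MachineState {
    fn index_mut(&mut self, reg: FReg) -> &mut f64 {
        &mut self.f_regs[usize::from(reg.0)]
    }
}

fn main() {
    let mut state = MachineState::new();
    // The register's type selects the register class; no per-class
    // getter or setter is needed.
    state[XReg(3)] = 42;
    state[FReg(0)] = 1.5;
    assert_eq!(state[XReg(3)] + 1, 43);
    assert_eq!(state[FReg(0)], 1.5);
}
```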
* riscv64: Refactor FpuRR instruction emission
Previously we had opcodes for each instruction length variant, with the width field implicitly embedded in the `funct7` field of the opcode.
This works, but the instructions are defined with the width field having a few more possible values. In order to add FP16 support to these instructions we would have to duplicate the opcodes.
Instead make these opcodes width-agnostic and specify the width during emission.
* riscv64: Refactor FpuRRR instruction emission
Previously we had opcodes for each instruction length variant, with the width field implicitly embedded in the `funct7` field of the opcode.
This works, but the instructions are defined with the width field having a few more possible values. In order to add FP16 support to these instructions we would have to duplicate the opcodes.
Instead make these opcodes width-agnostic and specify the width during emission.
* riscv64: Refactor FpuRRRR instruction emission
Previously we had opcodes for each instruction length variant, with the width field implicitly embedded in the `funct7` field of the opcode.
This works, but the instructions are defined with the width field having a few more possible values. In order to add FP16 support to these instructions we would have to duplicate the opcodes.
Instead make these opcodes width-agnostic and specify the width during emission.
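The relationship between a width-agnostic opcode and the emitted `funct7` can be sketched as below. The `FpuWidth` enum, the constants, and the `funct7` helper are hypothetical names for illustration, not Cranelift's types. The bit layout follows the RISC-V floating-point encoding, where the low two bits of `funct7` are the format field (`S` = 00, `D` = 01, `H` = 10, `Q` = 11) and the upper five bits select the operation.

```rust
/// Format (width) field of a RISC-V floating-point instruction.
#[derive(Clone, Copy)]
enum FpuWidth {
    S = 0b00, // single precision
    D = 0b01, // double precision
    H = 0b10, // half precision (FP16)
    Q = 0b11, // quad precision
}

/// Width-agnostic operation selectors: the upper five bits of `funct7`.
const FADD: u32 = 0b00000;
const FSUB: u32 = 0b00001;
const FMUL: u32 = 0b00010;

/// Combine the operation with the width at emission time.
fn funct7(funct5: u32, width: FpuWidth) -> u32 {
    (funct5 << 2) | width as u32
}

fn main() {
    assert_eq!(funct7(FADD, FpuWidth::S), 0b0000000); // fadd.s
    assert_eq!(funct7(FADD, FpuWidth::D), 0b0000001); // fadd.d
    assert_eq!(funct7(FSUB, FpuWidth::D), 0b0000101); // fsub.d
    assert_eq!(funct7(FMUL, FpuWidth::S), 0b0001000); // fmul.s
    // Half precision needs no new opcode, only a different width.
    assert_eq!(funct7(FADD, FpuWidth::H), 0b0000010); // fadd.h
    assert_eq!(funct7(FADD, FpuWidth::Q), 0b0000011); // fadd.q
}
```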
* riscv64: Fix emit tests
* riscv64: Run `cargo fmt`
* Add FP16 and I64 support for wasi-nn WinML backend.
Some devices may not support FP32.
prtest:full
* Remove unnecessary features.
* Address comments.
* Check alignment before from_raw_parts.
* Implement PartialEq for Tensor.
* Remove duplicated shape info from set_input.
* Update alignment checker.
* Add comments about creating TensorFloat16Bit from f32 array.
* Use PartialEq attribute.
* Audit new WinML dependencies
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
This commit groups together the registers that have to be collected from
a signal handler to correctly report a trap: namely, the program counter
and frame pointer, as of the time that the trap occurred.
I also moved the call to set_jit_trap inside test_if_trap for every
platform that uses both methods. Only the implementation for Mach ports
still needs to call set_jit_trap because it doesn't use test_if_trap.
In addition I'm fixing an unrelated doc comment that I stumbled across
while working on this.