`wiggle` looks for an exported `Memory` named `"memory"` to use for its
guest slices. This change allows it to use a `SharedMemory` if this is
the kind of memory used for the export.
It is `unsafe` to use shared memory in Wiggle because of broken Rust
guarantees: previously, Wiggle could hand out slices to WebAssembly
linear memory that could be concurrently modified by some other thread.
With the introduction of Wiggle's new `UnsafeGuestSlice` (#5225, #5229,
#5264), Wiggle should now correctly communicate its guarantees through
its API.
We can encode more constants into 12-bit immediates if we do the following
rewrite for comparisons with odd constants:
A >= B + 1
==> A - 1 >= B
==> A > B
This commit refactors the internals of `wiggle` to have fewer raw pointers and more liberally use `&[UnsafeCell<_>]`. The purpose of this refactoring is to more strictly thread through lifetime information throughout the crate to avoid getting it wrong. Additionally storing `UnsafeCell<T>` at rest pushes the unsafety of access to the leaves of modifications where Rust safety guarantees are upheld. Finally this provides what I believe is a safer internal representation of `WasmtimeGuestMemory` since it technically holds onto `&mut [u8]` un-soundly as other `&mut T` pointers are handed out.
Additionally generated `GuestTypeTransparent` impls in the `wiggle` macro were removed because they are not safe for shared memories as-is and otherwise aren't needed for WASI today. The trait has been updated to indicate that all bit patterns must be valid in addition to having the same representation on the host as in the guest to accomodate this.
* wiggle: adapt Wiggle strings for shared use
This is an extension of #5229 for the `&str` and `&mut str` types. As
documented there, we are attempting to maintain Rust guarantees for
slices that Wiggle hands out in the presence of WebAssembly shared
memory, in which case multiple threads could be modifying the underlying
data of the slice.
This change changes the API of `GuestPtr` to return an `Option` which is
`None` when attempting to view the WebAssembly data as a string and the
underlying WebAssembly memory is shared. This reuses the
`UnsafeGuestSlice` structure from #5229 to do so and appropriately marks
the region as borrowed in Wiggle's manual borrow checker. Each original
call site in this project's WASI implementations is fixed up to `expect`
that a non-shared memory is used. (Note that I can find no uses of
`GuestStrMut` in the WASI implementations).
* wiggle: make `GuestStr*` containers wrappers of `GuestSlice*`
This change makes it possible to reuse the underlying logic in
`UnsafeGuestSlice` and the `GuestSlice*` implementations to continue to
expose the `GuestStr` and `GuestStrMut` types. These types now are
simple wrappers of their `GuestSlice*` variant. The UTF-8 validation
that distinguished `GuestStr*` now lives in the `TryFrom`
implementations for each type.
* cranelift-isle: New IR and revised overlap checks
* Improve error reporting
* Avoid "unused argument" warnings a nicer way
* Remove unused fields
* Minimize diff and "fix" error handling
I had tried to use Miette "right" and made things worse somehow. Among
other changes, revert all my changes to unrelated parts of `error.rs`
and `error_miette.rs`.
* Review comments: Rename "unmatchable" to "unreachable"
* Review comments: newtype wrappers, not type aliases
* Review comments: more comments on overlap checks
* Review comments: Clarify `normalize_equivalence_classes`
* Review comments: use union-find instead of linked list
This saves about 50 lines of code in the trie_again module. The
union-find implementation is about twice as long as that, counting
comments and doc-tests, but that's a worth-while tradeoff.
However, this makes `normalize_equivalence_classes` slower, because now
finding all elements of an equivalence class takes time linear in the
total size of all equivalence classes. If that ever turns out to be a
problem in practice we can find some way to optimize `remove_set_of`.
* Review comments: Hide constraints HashMap
We want to enforce that consumers of this representation can't observe
non-deterministic ordering in any of its public types.
* Review comments: Normalize equivalence classes incrementally
I'm not sure whether this is a good idea. It doesn't make the logic
particularly simpler, and I think it will do more work if three or more
binding sites with enum-variant constraints get set equal to each other.
* More comments and other clarifications
* Revert "Review comments: Normalize equivalence classes incrementally"
* Even more comments
* wiggle: adapt Wiggle guest slices for `unsafe` shared use
When multiple threads can concurrently modify a WebAssembly shared
memory, the underlying data for a Wiggle `GuestSlice` and
`GuestSliceMut` could change due to access from other threads. This
breaks Rust guarantees when `&[T]` and `&mut [T]` slices are handed out.
This change modifies `GuestPtr` to make `as_slice` and `as_slice_mut`
return an `Option` which is `None` when the underlying WebAssembly
memory is shared.
But WASI implementations still need access to the underlying WebAssembly
memory, both to read to it and write from it. This change adds new APIs:
- `GuestPtr::to_vec` copies the bytes from WebAssembly memory (from
which we can safely take a `&[T]`)
- `GuestPtr::as_unsafe_slice_mut` returns a wrapper `struct` from which
we can `unsafe`-ly return a mutable slice (users must accept the
unsafety of concurrently modifying a `&mut [T]`)
This approach allows us to maintain Wiggle's borrow-checking
infrastructure, which enforces the guarantee that Wiggle will not modify
overlapping regions, e.g. This is important because the underlying
system calls may expect this. Though other threads may modify the same
underlying region, this is impossible to prevent; at least Wiggle will
not be able to do so.
Finally, the changes to Wiggle's API are propagated to all WASI
implementations in Wasmtime. For now, code locations that attempt to get
a guest slice will panic if the underlying memory is shared. Note that
Wiggle is not enabled for shared memory (that will come later in
something like #5054), but when it is, these panics will be clear
indicators of locations that must be re-implemented in a thread-safe
way.
* review: remove double cast
* review: refactor to include more logic in 'UnsafeGuestSlice'
* review: add reference to #4203
* review: link all thread-safe WASI fixups to #5235
* fix: consume 'UnsafeGuestSlice' during conversion to safe versions
* review: remove 'as_slice' and 'as_slice_mut'
* review: use 'as_unsafe_slice_mut' in 'to_vec'
* review: add `UnsafeBorrowResult`
This commit updates the index allocation performed in the pooling
allocator with a few refactorings:
* With `cfg(fuzzing)` a deterministic rng is now used to improve
reproducibility of fuzz test cases.
* The `Mutex` was pushed inside of `IndexAllocator`, renamed from
`PoolingAllocationState`.
* Randomness is now always done through a `SmallRng` stored in the
`IndexAllocator` instead of using `thread_rng`.
* The `is_empty` method has been removed in favor of an `Option`-based
return on `alloc`.
This refactoring is additionally intended to encapsulate more
implementation details of `IndexAllocator` to more easily allow for
alternate implementations in the future such as lock-free approaches
(possibly).
This commit adds the missing "out of fuel" trap code to the C API.
Without this, calls to `wasmtime_trap_code` will trigger an unreachable panic
on traps from running out of fuel.
This commit prepares the `winch` crate for updating `wasm-tools`,
notably changing a bit about how the visitation of operators works. This
moves the function body and wasm validator out of the `CodeGen`
structure and into parameters threaded into the emission of the actual
function.
Additionally the `VisitOperator` implementation was updated to remove
the explicit calls to the validator, favoring instead a macro-generated
solution to guarantee that all validation happens before any translation
proceeds. This means that the `VisitOperator for CodeGen` impl is now
infallible and the various methods have been inlined into the trait
methods as well as removing the `Result<_>`.
Finally this commit updates translation to call `validator.finish(..)`
which is required to perform the final validation steps of the function
body.
This commit updates the default random context inserted into a
`WasiCtxt` to be seeded from `thread_rng` rather than the system's
entropy. This avoids an unconditional syscall on the creation of all
`WasiCtx` structures shouldn't reduce the quality of the random numbers
produced.
* Fix CI after CVE fixes
Alas we can't run CI ahead of time so this fixes various minor build
issues from the merging of the recent CVE fixes. Note that I plan to
publish the advisories once CI issues are sorted out.
* Fix mmap/free of zero bytes
This is a common pattern in sema, so factor it out.
Since this version uses `intern` instead of `intern_mut`, it might be a
tiny bit faster when errors occur due to not writing names into maps
then. When no error occurs, ISLE should do exactly the same work with or
without this commit.
This commit is an attempt at improving the safety of using the return
value of the `SharedMemory::data` method. Previously this returned
`*mut [u8]` which, while correct, is unwieldy and unsafe to work with.
The new return value of `&[UnsafeCell<u8>]` has a few advantages:
* The lifetime of the returned data is now connected to the
`SharedMemory` itself, removing the possibility for a class of errors
of accidentally using the prior `*mut [u8]` beyond its original lifetime.
* It's not possibly to safely access `.len()` as opposed to requiring an
`unsafe` dereference before.
* The data internally within the slice is now what retains the `unsafe`
bits, namely indicating that accessing any memory inside of the
contents returned is `unsafe` but addressing it is safe.
I was inspired by the `wiggle`-based discussion on #5229 and felt it
appropriate to apply a similar change here.
* Unconditionally use `MemoryImageSlot`
This commit removes the internal branching within the pooling instance
allocator to sometimes use a `MemoryImageSlot` and sometimes now.
Instead this is now unconditionally used in all situations on all
platforms. This fixes an issue where the state of a slot could get
corrupted if modules being instantiated switched from having images to
not having an image or vice versa.
The bulk of this commit is the removal of the `memory-init-cow`
compile-time feature in addition to adding Windows support to the
`cow.rs` file.
* Fix compile on Unix
* Add a stricter assertion for static memory bounds
Double-check that when a memory is allocated the configuration required
is satisfied by the pooling allocator.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
The `is_root` flag to `translate_pattern` just determines whether the
`rule_term` argument is used, which begs a larger cleanup. But that
cleanup is less clear if `is_root` is set anywhere aside from the call
in `collect_rules`. So I wanted to get confirmation that this particular
use of that flag is incorrect first.
These two arguments (`is_root` and `rule_term`) are used to prevent
expansion of a term as an internal extractor ("macro") if:
- that term is also an internal constructor
- and it's the root term on the left-hand side of the current rule
- and the pattern we're currently translating has no parents.
I'm not sure what it should mean to use the term you're currently
defining as the root pattern on the left-hand side of an if-let in the
same rule, but I don't think it should have this particular special
treatment.
It's nice to be able to report these names after sema analysis completes
so rule authors can recognize which names they used.
This isn't used anywhere yet, but I'm planning to use it during codegen,
and the rule-verification folks wanted something like this for debugging
output.
* Cranelift: Make `heap_addr` return calculated `base + index + offset`
Rather than return just the `base + index`.
(Note: I've chosen to use the nomenclature "index" for the dynamic operand and
"offset" for the static immediate.)
This move the addition of the `offset` into `heap_addr`, instead of leaving it
for the subsequent memory operation, so that we can Spectre-guard the full
address, and not allow speculative execution to read the first 4GiB of memory.
Before this commit, we were effectively doing
load(spectre_guard(base + index) + offset)
Now we are effectively doing
load(spectre_guard(base + index + offset))
Finally, this also corrects `heap_addr`'s documented semantics to say that it
returns an address that will trap on access if `index + offset + access_size` is
out of bounds for the given heap, rather than saying that the `heap_addr` itself
will trap. This matches the implemented behavior for static memories, and after
https://github.com/bytecodealliance/wasmtime/pull/5190 lands (which is blocked
on this commit) will also match the implemented behavior for dynamic memories.
* Update heap_addr docs
* Factor out `offset + size` to a helper
While checking the call graph of extractors during semantic validation,
save `TermId` instead of `Sym`. The types are both just integer indexes,
but the `TermId` is more useful here. Saving it avoids needing to check
for failed map lookups twice, which simplifies the implementation.
Modify return pseudo-instructions to have pairs of registers: virtual and real. This allows us to constrain the virtual registers to the real ones specified by the abi, instead of directly emitting moves to those real registers.
* Wasmtime+Cranelift: strip out some dead x86-32 code.
I was recently pointed to fastly/Viceroy#200 where it seems some folks
are trying to compile Wasmtime (via Viceroy) for Windows x86-32 and the
failures may not be loud enough. I've tried to reproduce this
cross-compiling to i686-pc-windows-gnu from Linux and hit build failures
(as expected) in several places. Nevertheless, while trying to discern
what others may be attempting, I noticed some dead x86-32-specific code
in our repo, and figured it would be a good idea to clean this up.
Otherwise, it (i) sends some mixed messages -- "hey look, this codebase
does support x86-32" -- and (ii) keeps untested code around, which is
generally not great.
This PR removes x86-32-specific cases in traphandlers and unwind code,
and Cranelift's native feature detection. It adds helpful compile-error
messages in a few cases. If we ever support x86-32 (contributors
welcome! The big missing piece is Cranelift support; see #1980), these
compile errors and git history should be enough to recover any knowledge
we are now encoding in the source.
I left the x86-32 support in `wasmtime-fiber` alone because that seems
like a bit of a special case -- foundation library, separate from the
rest of Wasmtime, with specific care to provide a (presumably working)
full 32-bit version.
* Remove some extraneous compile_error!s, already covered by others.
This change is the first in a series of changes to support shared memory
in Wiggle. Since Wiggle was written under the assumption of
single-threaded guest-side access, this change introduces a `shared`
field to guest memories in order to flag when this assumption will not
be the case. This change always sets `shared` to `false`; once a few
more pieces are in place, `shared` will be set dynamically when a shared
memory is detected, e.g., in a change like #5054.
Using the `shared` field, we can now decide to load Wiggle values
differently under the new assumptions. This change makes the guest
`T::read` and `T::write` calls into `Relaxed` atomic loads and stores in
order to maintain WebAssembly's expected memory consistency guarantees.
We choose Rust's `Relaxed` here to match the `Unordered` memory
consistency described in the [memory model] section of the ECMA spec.
These relaxed accesses are done unconditionally, since we theorize that
the performance benefit of an additional branch vs a relaxed load is
not much.
[memory model]: https://tc39.es/ecma262/multipage/memory-model.html#sec-memory-model
Since 128-bit scalar types do not have `Atomic*` equivalents, we remove
their `T::read` and `T::write` implementations here. They are unused by
any WASI implementations in the project.
* Implement support for dynamic memories in the pooling allocator
This is a continuation of the thrust in #5207 for reducing page faults
and lock contention when using the pooling allocator. To that end this
commit implements support for efficient memory management in the pooling
allocator when using wasm that is instrumented with bounds checks.
The `MemoryImageSlot` type now avoids unconditionally shrinking memory
back to its initial size during the `clear_and_remain_ready` operation,
instead deferring optional resizing of memory to the subsequent call to
`instantiate` when the slot is reused. The instantiation portion then
takes the "memory style" as an argument which dictates whether the
accessible memory must be precisely fit or whether it's allowed to
exceed the maximum. This in effect enables skipping a call to `mprotect`
to shrink the heap when dynamic memory checks are enabled.
In terms of page fault and contention this should improve the situation
by:
* Fewer calls to `mprotect` since once a heap grows it stays grown and
it never shrinks. This means that a write lock is taken within the
kernel much more rarely from before (only asymptotically now, not
N-times-per-instance).
* Accessed memory after a heap growth operation will not fault if it was
previously paged in by a prior instance and set to zero with `memset`.
Unlike #5207 which requires a 6.0 kernel to see this optimization this
commit enables the optimization for any kernel.
The major cost of choosing this strategy is naturally the performance
hit of the wasm itself. This is being looked at in PRs such as #5190 to
improve Wasmtime's story here.
This commit does not implement any new configuration options for
Wasmtime but instead reinterprets existing configuration options. The
pooling allocator no longer unconditionally sets
`static_memory_bound_is_maximum` and then implements support necessary
for this memory type. This other change to this commit is that the
`Tunables::static_memory_bound` configuration option is no longer gating
on the creation of a `MemoryPool` and it will now appropriately size to
`instance_limits.memory_pages` if the `static_memory_bound` is to small.
This is done to accomodate fuzzing more easily where the
`static_memory_bound` will become small during fuzzing and otherwise the
configuration would be rejected and require manual handling. The spirit
of the `MemoryPool` is one of large virtual address space reservations
anyway so it seemed reasonable to interpret the configuration this way.
* Skip zero memory_size cases
These are causing errors to happen when fuzzing and otherwise in theory
shouldn't be too interesting to optimize for anyway since they likely
aren't used in practice.
Remove the dependency on VCode for VReg allocation. This will simplify the changes in #5172, as that PR introduces the need to allocate temporary registers from the ABI context.
This change also allows us to remove some fields from VCode: reftyped_vregs_set and have_ref_values.
Add a MemFlags operand to the bitcast instruction, where only the
`big` and `little` flags are accepted. These define the lane order
to be used when casting between types of different lane counts.
Update all users to pass an appropriate MemFlags argument.
Implement lane swaps where necessary in the s390x back-end.
This is the final part necessary to fix
https://github.com/bytecodealliance/wasmtime/issues/4566.
Previously extracting an exit code was only possibly on a `wasm_trap_t`
which will never successfully have an exit code on it, so the exit code
extractor is moved over to `wasmtime_error_t`. Additionally extracting a
wasm trace from a `wasmtime_error_t` is added since traces happen on
both traps and errors now.
I noticed this in the backtrace of something that timed out on oss-fuzz
and there's no need to include this information in trampolines, so this
removes the extra sections from being generated.