* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidate for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused lots of portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
This fixes the miscompilation with LLVM 16 in the immediate future and
should be at least a meager line of defense against issues like this in
the future. This is not a long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
* Update release notes
* Fix release dates
* Fix export translation for components.
Exports in the component model cause a new index to be added to the index space
of the item being exported.
This commit updates component translation so that translation of component
export sections properly updates internal lists representing those index
spaces.
* Code review feedback.
This will allow us to build developer tools for Wasmtime and Cranelift like WAT
and asm side-by-side viewers (a la Godbolt).
These are not proper public APIs, so they are marked `doc(hidden)` and have
comments saying they are only for use within this repo's workspace.
* Enable the native target by default in winch
Match cranelift-codegen's build script where if no architecture is
explicitly enabled then the host architecture is implicitly enabled.
* Refactor Cranelift's ISA builder to share more with Winch
This commit refactors the `Builder` type to have a type parameter
representing the finished ISA with Cranelift and Winch having their own
typedefs for `Builder` to represent their own builders. The intention is
to use this shared functionality to produce more shared code between the
two codegen backends.
* Moving compiler shared components to a separate crate
* Restore native flag inference in compiler building
This fixes an oversight from the previous commits to use
`cranelift-native` to infer flags for the native host when using default
settings with Wasmtime.
* Move `Compiler::page_size_align` into wasmtime-environ
The `cranelift-codegen` crate doesn't need this and winch wants the same
implementation, so shuffle it around so everyone has access to it.
* Fill out `Compiler::{flags, isa_flags}` for Winch
These are easy enough to plumb through with some shared code for
Wasmtime.
* Plumb the `is_branch_protection_enabled` flag for Winch
Just forwarding an isa-specific setting accessor.
* Moving executable creation to shared compiler crate
* Adding builder back in and removing from shared crate
* Refactoring the shared pieces for the `CompilerBuilder`
I decided to move a couple things around from Alex's initial changes.
Instead of having the shared builder do everything, I went back to
having each compiler have a distinct builder implementation. I
refactored most of the flag setting logic into a single shared location,
so we can still reduce the amount of code duplication.
With them being separate, we don't need to maintain things like
`LinkOpts` which Winch doesn't currently use. We also have an avenue to
error when certain flags are sent to Winch if we don't support them. I'm
hoping this will make things more maintainable as we build out Winch.
I'm still unsure about keeping everything shared in a single crate
(`cranelift_shared`). It's starting to feel like this crate is doing too
much, which makes it difficult to name. There does seem to be a need for
two distinct abstraction: creating the final executable and the handling
of shared/ISA flags when building the compiler. I could make them into
two separate crates, but there doesn't seem to be enough there yet to
justify it.
* Documentation updates, and renaming the finish method
* Adding back in a default temporarily to pass tests, and removing some unused imports
* Fixing winch tests with wrong method name
* Removing unused imports from codegen shared crate
* Apply documentation formatting updates
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Adding back in cranelift_native flag inferring
* Adding new shared crate to publish list
* Adding write feature to pass cargo check
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Initial support for the Relaxed SIMD proposal
This commit adds initial scaffolding and support for the Relaxed SIMD
proposal for WebAssembly. Codegen support is supported on the x64 and
AArch64 backends on this time.
The purpose of this commit is to get all the boilerplate out of the way
in terms of plumbing through a new feature, adding tests, etc. The tests
are copied from the upstream repository at this time while the
WebAssembly/testsuite repository hasn't been updated.
A summary of changes made in this commit are:
* Lowerings for all relaxed simd opcodes have been added, currently all
exhibiting deterministic behavior. This means that few lowerings are
optimal on the x86 backend, but on the AArch64 backend, for example,
all lowerings should be optimal.
* Support is added to codegen to, eventually, conditionally generate
different code based on input codegen flags. This is intended to
enable codegen to more efficient instructions on x86 by default, for
example, while still allowing embedders to force
architecture-independent semantics and behavior. One good example of
this is the `f32x4.relaxed_fmadd` instruction which when deterministic
forces the `fma` instruction, but otherwise if the backend doesn't
have support for `fma` then intermediate operations are performed
instead.
* Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the
x86 backend as they're now exercised by the deterministic lowerings of
relaxed simd instructions.
* Sample codegen tests for added for x86 and aarch64 for some relaxed
simd instructions.
* Wasmtime embedder support for the relaxed-simd proposal and forcing
determinism have been added to `Config` and the CLI.
* Support has been added to the `*.wast` runtime execution for the
`(either ...)` matcher used in the relaxed-simd proposal.
* Tests for relaxed-simd are run both with a default `Engine` as well as
a "force deterministic" `Engine` to test both configurations.
* All tests from the upstream repository were copied into Wasmtime.
These tests should be deleted when WebAssembly/testsuite is updated.
* x64: Add x86-specific lowerings for relaxed simd
This commit builds on the prior commit and adds an array of `x86_*`
instructions to Cranelift which have semantics that match their
corresponding x86 equivalents. Translation for relaxed simd is then
additionally updated to conditionally generate different CLIF for
relaxed simd instructions depending on whether the target is x86 or not.
This means that for AArch64 no changes are made but for x86 most relaxed
instructions now lower to some x86-equivalent with slightly different
semantics than the "deterministic" lowering.
* Add libcall support for fma to Wasmtime
This will be required to implement the `f32x4.relaxed_madd` instruction
(and others) when an x86 host doesn't specify the `has_fma` feature.
* Ignore relaxed-simd tests on s390x and riscv64
* Enable relaxed-simd tests on s390x
* Update cranelift/codegen/meta/src/shared/instructions.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Add a FIXME from review
* Add notes about deterministic semantics
* Don't default `has_native_fma` to `true`
* Review comments and rebase fixes
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
This notably updates `wasmparser` for updates to the relaxed-simd
proposal and an implementation of the function-references proposal.
Additionally there are some minor bug fixes being picked up for WIT and
the component model.
This commit fixes a panic related to type imports where an import of a
type didn't correctly declare the new type index on the Wasmtime side of
things. Additionally this plumbs more support throughout Wasmtime to
support type imports, namely that they do not need to be supplied
through a `Linker`. This additionally implements a feature where empty
instances, even transitively, do not need to be supplied by a Wasmtime
embedder. This means that instances which only have types, for example,
do not need to be supplied into a `Linker` since no runtime information
for them is required anyway.
Closes#5775
At some point what is now `funcref` was called `anyfunc` and the spec changed,
but we didn't update our internal names. This does that.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This commit fixes more cases from #5565 where `export` items introducing
indices wasn't handled by accident. Additionally this fixes support for
aliasing types from instances which largely wasn't working before. Most
of the fixes here are about correctly maintaining Wasmtime's view of the
type index spaces.
* Update WIT tooling used by Wasmtime
This commit updates the WIT tooling, namely the wasm-tools family of
crates, with recent updates. Notably:
* bytecodealliance/wasm-tools#867
* bytecodealliance/wasm-tools#871
This updates index spaces in components and additionally bumps the
minimum required version of the component binary format to be consumed
by Wasmtime (because of the index space changes). Additionally WIT
tooling now fully supports `use`.
Note that WIT tooling doesn't, at this time, fully support packages and
depending on remotely defined WIT packages. Currently WIT still needs to
be vendored in the project. It's hoped that future work with `cargo
component` and possible integration here could make the story about
depending on remotely-defined WIT more ergonomic and streamlined.
* Fix `bindgen!` codegen tests
* Add a test for `use` paths an implement support
* Update to crates.io versions of wasm-tools
* Uncomment codegen tests
* Resolve libcall relocations for older CPUs
Long ago Wasmtime used to have logic for resolving relocations
post-compilation for libcalls which I ended up removing during
refactorings last year. As #5563 points out, however, it's possible to
get Wasmtime to panic by disabling SSE features which forces Cranelift
to use libcalls for some floating-point operations instead. Note that
this also requires disabling SIMD because SIMD support has a baseline of
SSE 4.2.
This commit pulls back the old implementations of various libcalls and
reimplements logic necessary to have them work on CPUs without SSE 4.2
Closes#5563
* Fix log message in `wast` support
* Fix offset listed in relocations
Be sure to factor in the offset of the function itself
* Review comments
This commit fixes a bug with components by changing how DWARF
information from a wasm binary is copied over to the final compiled
artifact. Note that this is not the Wasmtime-generated DWARF but rather
the native wasm DWARF itself used in backtraces.
Previously the wasm dwarf was inserted into sections `.*.wasm` where `*`
was `debug_info`, `debug_str`, etc -- one per `gimli::SectionId` as
found in the original wasm module. This does not work with components,
however, where modules did not correctly separate their debug
information into separate sections or otherwise disambiguate. The fix in
this commit is to instead smash all the debug information together into
one large section and store offsets into that giant section. This is
similar to the `name`-section scraping or the trap metadata section
where one section contains all the data for all the modules in a component.
This simplifies the object file parsing by only looking for one section
name and doesn't add all that much complexity to serializing and looking
up dwarf information as well.
* Simplify the `ModuleRuntimeInfo` trait slightly
Fold two functions into one as they're only called from one location
anyway.
* Remove ModuleRuntimeInfo::signature
This is redundant as the array mapping is already stored within the
`VMContext` so that can be consulted rather than having a separate trait
function for it. This required altering the `Global` creation slightly
to work correctly in this situation.
* Remove a now-dead constant
* Shared `VMOffsets` across instances
This commit removes the computation of `VMOffsets` to being per-module
instead of per-instance. The `VMOffsets` structure is also quite large
so this shaves off 112 bytes per instance which isn't a huge impact but
should help lower the cost of instantiating small modules.
* Remove `InstanceAllocator::adjust_tunables`
This is no longer needed or necessary with the pooling allocator.
* Fix compile warning
* Fix a vtune warning
* Fix pooling tests
* Fix another test warning
* feat: implement memory.atomic.notify,wait32,wait64
Added the parking_spot crate, which provides the needed registry for the
operations.
Signed-off-by: Harald Hoyer <harald@profian.com>
* fix: change trap message for HeapMisaligned
The threads spec test wants "unaligned atomic"
instead of "misaligned memory access".
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add test for atomic wait on non-shared memory
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests/spec_testsuite/proposals/threads
without pooling and reference types.
Also "shared_memory" is added to the "spectest" interface.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add atomics_notify.wast
checking that notify with 0 waiters returns 0 on shared and non-shared
memory.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests for atomic wait on shared memory
- return 2 - timeout for 0
- return 2 - timeout for 1000ns
- return 1 - invalid value
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Harald Hoyer <harald@profian.com>
This commit replaces `wasmtime_environ::TrapCode` with `wasmtime::Trap`.
This is possible with past refactorings which slimmed down the `Trap`
definition in the `wasmtime` crate to a simple `enum`. This means that
there's one less place that all the various trap opcodes need to be
listed in Wasmtime.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
* Pull `Module` out of `ModuleTextBuilder`
This commit is the first in what will likely be a number towards
preparing for serializing a compiled component to bytes, a precompiled
artifact. To that end my rough plan is to merge all of the compiled
artifacts for a component into one large object file instead of having
lots of separate object files and lots of separate mmaps to manage. To
that end I plan on eventually using `ModuleTextBuilder` to build one
large text section for all core wasm modules and trampolines, meaning
that `ModuleTextBuilder` is no longer specific to one module. I've
extracted out functionality such as function name calculation as well as
relocation resolving (now a closure passed in) in preparation for this.
For now this just keeps tests passing, and the trajectory for this
should become more clear over the following commits.
* Remove component-specific object emission
This commit removes the `ComponentCompiler::emit_obj` function in favor
of `Compiler::emit_obj`, now renamed `append_code`. This involved
significantly refactoring code emission to take a flat list of functions
into `append_code` and the caller is responsible for weaving together
various "families" of functions and un-weaving them afterwards.
* Consolidate ELF parsing in `CodeMemory`
This commit moves the ELF file parsing and section iteration from
`CompiledModule` into `CodeMemory` so one location keeps track of
section ranges and such. This is in preparation for sharing much of this
code with components which needs all the same sections to get tracked
but won't be using `CompiledModule`. A small side benefit from this is
that the section parsing done in `CodeMemory` and `CompiledModule` is no
longer duplicated.
* Remove separately tracked traps in components
Previously components would generate an "always trapping" function
and the metadata around which pc was allowed to trap was handled
manually for components. With recent refactorings the Wasmtime-standard
trap section in object files is now being generated for components as
well which means that can be reused instead of custom-tracking this
metadata. This commit removes the manual tracking for the `always_trap`
functions and plumbs the necessary bits around to make components look
more like modules.
* Remove a now-unnecessary `Arc` in `Module`
Not expected to have any measurable impact on performance, but
complexity-wise this should make it a bit easier to understand the
internals since there's no longer any need to store this somewhere else
than its owner's location.
* Merge compilation artifacts of components
This commit is a large refactoring of the component compilation process
to produce a single artifact instead of multiple binary artifacts. The
core wasm compilation process is refactored as well to share as much
code as necessary with the component compilation process.
This method of representing a compiled component necessitated a few
medium-sized changes internally within Wasmtime:
* A new data structure was created, `CodeObject`, which represents
metadata about a single compiled artifact. This is then stored as an
`Arc` within a component and a module. For `Module` this is always
uniquely owned and represents a shuffling around of data from one
owner to another. For a `Component`, however, this is shared amongst
all loaded modules and the top-level component.
* The "module registry" which is used for symbolicating backtraces and
for trap information has been updated to account for a single region
of loaded code holding possibly multiple modules. This involved adding
a second-level `BTreeMap` for now. This will likely slow down
instantiation slightly but if it poses an issue in the future this
should be able to be represented with a more clever data structure.
This commit additionally solves a number of longstanding issues with
components such as compiling only one host-to-wasm trampoline per
signature instead of possibly once-per-module. Additionally the
`SignatureCollection` registration now happens once-per-component
instead of once-per-module-within-a-component.
* Fix compile errors from prior commits
* Support AOT-compiling components
This commit adds support for AOT-compiled components in the same manner
as `Module`, specifically adding:
* `Engine::precompile_component`
* `Component::serialize`
* `Component::deserialize`
* `Component::deserialize_file`
Internally the support for components looks quite similar to `Module`.
All the prior commits to this made adding the support here
(unsurprisingly) easy. Components are represented as a single object
file as are modules, and the functions for each module are all piled
into the same object file next to each other (as are areas such as data
sections). Support was also added here to quickly differentiate compiled
components vs compiled modules via the `e_flags` field in the ELF
header.
* Prevent serializing exported modules on components
The current representation of a module within a component means that the
implementation of `Module::serialize` will not work if the module is
exported from a component. The reason for this is that `serialize`
doesn't actually do anything and simply returns the underlying mmap as a
list of bytes. The mmap, however, has `.wasmtime.info` describing
component metadata as opposed to this module's metadata. While rewriting
this section could be implemented it's not so easy to do so and is
otherwise seen as not super important of a feature right now anyway.
* Fix windows build
* Fix an unused function warning
* Update crates/environ/src/compilation.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This adds a new field `types` to `ModuleTranslation`, so that
consumers can have access to the module type information known after
validation has finished. This change is useful when consumers want to
have access to the type information in wasmparser's terms rather than
in wasmtime_environ's equivalent types (e.g. `WasmFuncType`).
* Plumb type exports in components around more
This commit adds some more plumbing for type exports to ensure that they
show up in the final compiled representation of a component. For now
they continued to be ignored for all purposes in the embedding API
itself but I found this useful to explore in `wit-bindgen` based tooling
which is leveraging the component parsing in Wasmtime.
* Add a field to `ModuleTranslation` to store the original wasm
This commit adds a field to be able to refer back to the original wasm
binary for a `ModuleTranslation`. This field is used in the upcoming
support for host generation in `wit-component` to "decompile" a
component into core wasm modules to get instantiated. This is used to
extract a core wasm module from the original component.
* FIx a build warning
* allow a ComponentTypeRef::Type to point to a component TypeDef
* component matching: don't assert exported Interface type definitions are "defined"
types may be exported by their name for consumption by some component
runtimes, but in wasmtime this doesn't matter (we lift and lower to
types, not define them) so we should ignore these.
* component-model instance tests: show that an import can export a type definition
this is meaningless, but it should be accepted. (previously rejected)
* Update wasm-tools dependencies
This update brings in a number of features such as:
* The component model binary format and AST has been slightly adjusted
in a few locations. Names are dropped from parameters/results now in
the internal representation since they were not used anyway. At this
time the ability to bind a multi-return function has not been exposed.
* The `wasmparser` validator pass will now share allocations with prior
functions, providing what's probably a very minor speedup for Wasmtime
itself.
* The text format for many component-related tests now requires named
parameters.
* Some new relaxed-simd instructions are updated to be ignored.
I hope to have a follow-up to expose the multi-return ability to the
embedding API of components.
* Update audit information for new crates
* Leverage Cargo's workspace inheritance feature
This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:
* All of the Wasmtime-family of crates now use `version.workspace =
true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
typing the same version number in a lot of different places (e.g. the
`wasmparser = "0.89.0"` is now in just one spot.
Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.
This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- everything's dependency
lists are still the same.
* Fix wasi-crypto tests
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
I noticed an oss-fuzz-based timeout that was reported for the
`component_api` fuzzer where the adapter module generated takes 1.5
seconds to compile the singular function in release mode (no fuzzing
enabled). The test case in question was a deeply recursive
list-of-list-of-etc and only one function was generated instead of
multiple. I updated the cost of strings/lists to cost more in the
approximate cost calculation which now forces the one giant function to
get split up and the large function is now split up into multiple
smaller function that take milliseconds to compile.
Add a function_alignment function to the TargetIsa trait, and use it to align functions when generating objects. Additionally, collect the maximum alignment required for pc-relative constants in functions and pass that value out. Use the max of these two values when padding functions for alignment.
This fixes a bug on x86_64 where rip-relative loads to sse registers could cause a segfault, as functions weren't always guaranteed to be aligned to 16-byte addresses.
Fixes#4812
* components: Limit the recursive size of types in Wasmtime
This commit is aimed at fixing #4814 by placing a hard limit on the
maximal recursive depth a type may have in the component model. The
component model theoretically allows for infinite recursion but many
various types of operations within the component model are naturally
written as recursion over the structure of a type which can lead to
stack overflow with deeply recursive types. Some examples of recursive
operations are:
* Lifting and lowering a type - currently the recursion here is modeled
in Rust directly with `#[derive]` implementations as well as the
implementations for the `Val` type.
* Compilation of adapter trampolines which iterates over the type
structure recursively.
* Historically many various calculations like the size of a type, the
flattened representation of a type, etc, were all done recursively.
Many of these are more efficiently done via other means but it was
still natural to implement these recursively initially.
By placing a hard limit on type recursion Wasmtime won't be able to load
some otherwise-valid modules. The hope, though, is that no human-written
program is likely to ever reach this limit. This limit can be revised
and/or the locations with recursion revised if it's ever reached.
The implementation of this feature is done by generalizing the current
flattened-representation calculation which now keeps track of a type's
depth and size. The size calculation isn't used just yet but I plan to
use it in fixing #4816 and it was natural enough to write here as well.
The depth is checked after a type is translated and if it exceeds the
maximum then an error is returned.
Additionally the `Arbitrary for Type` implementation was updated to
prevent generation of a type that's too-recursive.
Closes#4814
* Remove unused size calculation
* Bump up just under the limit
This commit is a (second?) attempt at improving the generation of
adapter modules to avoid excessively large functions for fuzz-generated
inputs.
The first iteration of adapters simply translated an entire type
inline per-function. This proved problematic however since the size of
the adapter function was on the order of the overall size of a type,
which can be exponential for a type that is otherwise defined in linear
size.
The second iteration of adapters performed a split where memory-based
types would always be translated with individual functions. The theory
here was that once a type was memory-based it was large enough to not
warrant inline translation in the original function and a separate
outlined function could be shared and otherwise used to deduplicate
portions of the original giant function. This again proved problematic,
however, since the splitting heuristic was quite naive and didn't take
into account large stack-based types.
This third iteration in this commit replaces the previous system with a
similar but slightly more general one. Each adapter function now has a
concept of fuel which is decremented each time a layer of a type is
translated. When fuel runs out further translations are deferred to
outlined functions. The fuel counter should hopefully provide a sort of
reasonable upper bound on the size of a function and the outlined
functions should ideally provide the ability to be called from multiple
places and therefore deduplicate what would otherwise be a massive
function.
This final iteration is another attempt at guaranteeing that an adapter
module is linear in size with respect to the input type section of the
original module. Additionally this iteration uniformly handles stack and
memory-based translations which means that stack-based translations
can't go wild in their function size and memory-based translations may
benefit slightly from having at least a little bit of inlining
internally.
The immediate impact of this is that the `component_api` fuzzer seems to
be running at a faster rate than before. Otherwise #4825 is sufficient
to invalidate preexisting fuzz-bugs and this PR is hopefully the final
nail in the coffin to prevent further timeouts for small inputs cropping
up.
Closes#4816
* Upgrade wasm-tools crates, namely the component model
This commit pulls in the latest versions of all of the `wasm-tools`
family of crates. There were two major changes that happened in
`wasm-tools` in the meantime:
* bytecodealliance/wasm-tools#697 - this commit introduced a new API for
more efficiently reading binary operators from a wasm binary. The old
`Operator`-based reading was left in place, however, and continues to
be what Wasmtime uses. I hope to update Wasmtime in a future PR to use
this new API, but for now the biggest change is...
* bytecodealliance/wasm-tools#703 - this commit was a major update to
the component model AST. This commit almost entirely deals with the
fallout of this change.
The changes made to the component model were:
1. The `unit` type no longer exists. This was generally a simple change
where the `Unit` case in a few different locations were all removed.
2. The `expected` type was renamed to `result`. This similarly was
relatively lightweight and mostly just a renaming on the surface. I
took this opportunity to rename `val::Result` to `val::ResultVal` and
`types::Result` to `types::ResultType` to avoid clashing with the
standard library types. The `Option`-based types were handled with
this as well.
3. The payload type of `variant` and `result` types are now optional.
This affected many locations that calculate flat type
representations, ABI information, etc. The `#[derive(ComponentType)]`
macro now specifically handles Rust-defined `enum` types which have
no payload to the equivalent in the component model.
4. Functions can now return multiple parameters. This changed the
signature of invoking component functions because the return value is
now bound by `ComponentNamedList` (renamed from `ComponentParams`).
This had a large effect in the tests, fuzz test case generation, etc.
5. Function types with 2-or-more parameters/results must uniquely name
all parameters/results. This mostly affected the text format used
throughout the tests.
I haven't added specifically new tests for multi-return but I changed a
number of tests to use it. Additionally I've updated the fuzzers to all
exercise multi-return as well so I think we should get some good
coverage with that.
* Update version numbers
* Use crates.io
* Fix a compile error on nightly Rust
It looks like Rust nightly has gotten a bit more strict about
attributes-on-expressions and previously accepted code is no longer
accepted. This commit updates the generated code for a macro to a form
which is accepted by rustc.
* Fix a soundness issue with lowering variants
This commit fixes a soundness issue lowering variants in the component
model where host memory could be leaked to the guest module by accident.
In reviewing code recently for `Val::lower` I noticed that the variant
lowering was extending the payload with `ValRaw::u32(0)` to
appropriately fit the size of the variant. In reading this it appeared
incorrect to me due to the fact that it should be `ValRaw::u64(0)` since
up to 64-bits can be read. Additionally this implementation was also
incorrect because the lowered representation of the payload itself was
not possibly zero-extended to 64-bits to accommodate other variants.
It turned out these issues were benign because with the dynamic
surface area to the component model the arguments were all initialized
to 0 anyway. The static version of the API, however, does not initialize
arguments to 0 and I wanted to initially align these two implementations
so I updated the variant implementation of lowering for dynamic values
and removed the zero-ing of arguments.
To test this change I updated the `debug` mode of adapter module
generation to assert that the upper bits of values in wasm are always
zero when the value is casted down (during `stack_get` which only
happens with variants). I then threaded through the `debug` boolean
configuration parameter into the dynamic and static fuzzers.
To my surprise this new assertion tripped even after the fix was
applied. It turns out, though, that there was other leakage of bits
through other means that I was previously unaware of. At the primitive
level lowerings of types like `u32` will have a `Lower` representation
of `ValRaw` and the lowering is simply `dst.write(ValRaw::i32(self))`,
or the equivalent thereof. The problem, that the fuzzers detected, with
this pattern is that the `ValRaw` type is 16-bytes, and
`ValRaw::i32(X)` only initializes the first 4. This meant that all the
lowerings for all primitives were writing up to 12 bytes of garbage from
the host for the wasm module to read.
It turned out that this write of a `ValRaw` was sometimes 16 bytes and
sometimes the appropriate size depending on the number of optimizations
in play. With enough inlining for example `dst.write(ValRaw::i32(self))`
would only write 4 bytes, as expected. In debug mode though without
inlining 16 bytes would be written, including the garbage from the upper
bits.
To solve this issue I ended up taking a somewhat different approach. I
primarily updated the `ValRaw` constructors to simply always extend the
values internally to 64-bits, meaning that the low 8 bytes of a `ValRaw`
is always initialized. This prevents any undefined data from leaking
from the host into a wasm module, and means that values are also
zero-extended even if they're only used in 32-bit contexts outside of a
variant. This felt like the best fix for now, though, in terms of
not really having a performance impact while additionally not requiring
a rewrite of all lowerings.
This solution ended up also neatly removing the "zero out the entire
payload" logic that was previously require. Now after a payload is
lowered only the tail end of the payload, up to the size of the variant,
is zeroed out. This means that each lowered argument is written to at
most once which should hopefully be a small performance boost for
calling into functions as well.
* Optimize flat type representation calculations
Previously calculating the flat type representation would be done
recursively for an entire type tree every time it was visited.
Additionally the flat type representation was entirely built only to be
thrown away if it was too large at the end. This chiefly presented a
source of recursion based on the type structure in the component model
which fuzzing does not like as it reports stack overflows.
This commit overhauls the representation of flat types in Wasmtime by
caching the representation for each type in the compile-time
`ComponentTypesBuilder` structure. This avoids recalculating each time
the flat representation is queried and additionally allows opportunity
to have more short-circuiting to avoid building overly-large vectors.
* Remove duplicate flat count calculation in wasmtime
Roughly share the infrastructure in the `wasmtime-environ` crate, namely
the non-recursive and memoizing nature of the calculation.
* Fix component fuzz build
* Fix example compile
A `GtU` condition needed to actually be `GeU`, as the comment right
above it stated but apparently I forgot to translate the comment to
actual code. This fixes a fuzz bug that arose from oss-fuzz over the
weekend.
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation, and that act as a compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accomodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location, it's automatically detected instead the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes#4155.
This commit is an effort to reduce the amount of complexity around
managing the size/alignment calculations of types in the canonical ABI.
Previously the logic for the size/alignment of a type was spread out
across a number of locations. While each individual calculation is not
really the most complicated thing in the world having the duplication in
so many places was constantly worrying me.
I've opted in this commit to centralize all of this within the runtime
at least, and now there's only one "duplicate" of this information in
the fuzzing infrastructure which is to some degree less important to
deduplicate. This commit introduces a new `CanonicalAbiInfo` type to
house all abi size/align information for both memory32 and memory64.
This new type is then used pervasively throughout fused adapter
compilation, dynamic `Val` management, and typed functions. This type
was also able to reduce the complexity of the macro-generated code
meaning that even `wasmtime-component-macro` is performing less math
than it was before.
One other major feature of this commit is that this ABI information is
now saved within a `ComponentTypes` structure. This avoids recursive
querying of size/align information frequently and instead effectively
caching it. This was a worry I had for the fused adapter compiler which
frequently sought out size/align information and would recursively
descend each type tree each time. The `fact-valid-module` fuzzer is now
nearly 10x faster in terms of iterations/s which I suspect is due to
this caching.
In #4640 a feature was added to adapter modules that whenever
translation goes through memory it instead goes through a helper
function as opposed to inlining it directly. The generation of the
helper function happened recursively at compile time, however, and sure
enough oss-fuzz has found an input which blows the host stack at compile
time.
This commit removes the compile-time recursion from the adapter compiler
when translating these helper functions by deferring the translation to
a worklist which is processed after the original function is translated.
This makes the stack-based recursion instead heap-based, removing the
stack overflow.
The spec was expected to change to not bounds-check 0-byte lists/strings
but has since been updated to match `memory.copy` which does indeed
check the pointer for 0-byte copies.
This commit implements a scheme I've been meaning to work on in the
adapter compiler where instead of always generating a fresh local for
all operations locals may now be reused. Locals generated are explicitly
free'd when their lexical scope has ended, allowing reuse in translation
of later types in the adapter.
This also implements a new scheme for initializing locals where
previously a local could simply be generated, but now the local must be
fused with its initializer where a `local.{tee,set}` instruction is
always generated. This should help prevent a bug I ran into with strings
where one usage of a local was forgotten to be initialized which meant
that when it was used during a loop it may have had a stale value from
before.
Modeling this in Rust isn't possible at compile time unfortunately so I
opted for the next best thing, runtime panics. If a local is
accidentally not released back to the pool of free locals then it will
panic. The fuzzer for simply generating and validating adapter modules
should be good at exercising this and it weeded out a few forgotten
free's and should be good now.
* Improve the `component_api` fuzzer on a few dimensions
* Update the generated component to use an adapter module. This involves
two core wasm instances communicating with each other to test that
data flows through everything correctly. The intention here is to fuzz
the fused adapter compiler. String encoding options have been plumbed
here to exercise differences in string encodings.
* Use `Cow<'static, ...>` and `static` declarations for each static test
case to try to cut down on rustc codegen time.
* Add `Copy` to derivation of fuzzed enums to make `derive(Clone)`
smaller.
* Use `Store<Box<dyn Any>>` to try to cut down on codegen by
monomorphizing fewer `Store<T>` implementation.
* Add debug logging to print out what's flowing in and what's flowing
out for debugging failures.
* Improve `Debug` representation of dynamic value types to more closely
match their Rust counterparts.
* Fix a variant issue with adapter trampolines
Previously the offset of the payload was calculated as the discriminant
aligned up to the alignment of a singular case, but instead this needs
to be aligned up to the alignment of all cases to ensure all cases start
at the same location.
* Fix a copy/paste error when copying masked integers
A 32-bit load was actually doing a 16-bit load by accident since it was
copied from the 16-bit load-and-mask case.
* Fix f32/i64 conversions in adapter modules
The adapter previously erroneously converted the f32 to f64 and then to
i64, where instead it should go from f32 to i32 to i64.
* Fix zero-sized flags in adapter modules
This commit corrects the size calculation for zero-sized flags in
adapter modules.
cc #4592
* Fix a variant size calculation bug in adapters
This fixes the same issue found with variants during normal host-side
fuzzing earlier where the size of a variant needs to align up the
summation of the discriminant and the maximum case size.
* Implement memory growth in libc bump realloc
Some fuzz-generated test cases are copying lists large enough to exceed
one page of memory so bake in a `memory.grow` to the bump allocator as
well.
* Avoid adapters of exponential size
This commit is an attempt to avoid adapters being exponentially sized
with respect to the type hierarchy of the input. Previously all
adaptation was done inline within each adapter which meant that if
something was structured as `tuple<T, T, T, T, ...>` the translation of
`T` would be inlined N times. For very deeply nested types this can
quickly create an exponentially sized adapter with types of the form:
(type $t0 (list u8))
(type $t1 (tuple $t0 $t0))
(type $t2 (tuple $t1 $t1))
(type $t3 (tuple $t2 $t2))
;; ...
where the translation of `t4` has 8 different copies of translating
`t0`.
This commit changes the translation of types through memory to almost
always go through a helper function. The hope here is that it doesn't
lose too much performance because types already reside in memory.
This can still lead to exponentially sized adapter modules to a lesser
degree where if the translation all happens on the "stack", e.g. via
`variant`s and their flat representation then many copies of one
translation could still be made. For now this commit at least gets the
problem under control for fuzzing where fuzzing doesn't trivially find
type hierarchies that take over a minute to codegen the adapter module.
One of the main tricky parts of this implementation is that when a
function is generated the index that it will be placed at in the final
module is not known at that time. To solve this the encoded form of the
`Call` instruction is saved in a relocation-style format where the
`Call` isn't encoded but instead saved into a different area for
encoding later. When the entire adapter module is encoded to wasm these
pseudo-`Call` instructions are encoded as real instructions at that
time.
* Fix some memory64 issues with string encodings
Introduced just before #4623 I had a few mistakes related to 64-bit
memories and mixing 32/64-bit memories.
* Actually insert into the `translate_mem_funcs` map
This... was the whole point of having the map!
* Assert memory growth succeeds in bump allocator
* Implement strings in adapter modules
This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.
The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:
* Fused adapters are first identified to determine src/dst string
encodings. This statically fixes what transcoding operation is being
performed.
* The generated adapter will be responsible for managing calls to
`realloc` and performing bounds checks. The adapter itself does not
perform memory copies or validation of string contents, however.
Instead each transcoding operation is modeled as an imported function
into the adapter module. This means that the adapter module
dynamically, during compile time, determines what string transcoders
are needed. Note that an imported transcoder is not only parameterized
over the transcoding operation but additionally which memory is the
source and which is the destination.
* The imported core wasm functions are modeled as a new
`CoreDef::Transcoder` structure. These transcoders end up being small
Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
load the actual base pointer of memory and add it to the relative
pointers passed as function arguments. This trampoline then calls a
transcoder "libcall" which enters Rust-defined functions for actual
transcoding operations.
* Each possible transcoding operation is implemented in Rust with a
unique name and a unique signature depending on the needs of the
transcoder. I've tried to document inline what each transcoder does.
This means that the `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.
Additionally in this PR is a full implementation in the host for the
`latin1+utf16` encoding which means that both lifting and lowering host
strings now works with this encoding.
Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If an when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.
Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests have also been added to ensure
that validity of encoding is actually checked.
* Fix a temporarily commented out case
* Fix wasmtime-runtime tests
* Update deny.toml configuration
* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now
* Split up the `translate_string` method
Move out all the closures and package up captured state into smaller
lists of arguments.
* Test out-of-bounds for zero-length strings
When an adapter module depends on a particular core wasm instance this
means that it actually depends on not only that instance but all prior
core wasm instances as well. This is because core wasm instances must be
instantiated in the specified order within a component and that cannot
change depending on the dataflow between adapters. This commit fixes a
possible panic from linearizing the component dfg where an adapter
module tried to depend on an instance that hadn't been instantiated yet
because the ordering dependency between core wasm instances hadn't been
modeled.
* components: ignore export aliases to types in translation.
Currently, translation is ignoring type exports from components during
translation by skipping over them before adding them to the exports map.
If a component instantiates an inner component and aliases a type export of
that instance, it will cause wasmtime to panic with a failure to find the
export in the exports map.
The fix is to add a representation for exported types to the map that is simply
ignored when encountered. This also makes it easier to track places where we
would have to support type exports in translation in the future.
* Keep type information for type exports.
This commit keeps the type information for type exports so that types can be
properly aliased from an instance export and thereby adjusting the type index
space accordingly.
* Add a simple test case for type exports for the component model.