* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidates for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused lots of portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these cases a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
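Roughly, the split looks like this (sketched with stand-in types rather
than the real `Instance` internals):

    struct VMContext {
        _opaque: [u8; 0],
    }

    struct Instance {
        vmctx: *mut VMContext,
    }

    impl Instance {
        /// Shared access only ever hands out `*const T`, so the `noalias
        /// readonly` assumption LLVM makes about `&self` is never violated
        /// by a write through the returned pointer.
        unsafe fn vmctx_plus_offset<T>(&self, offset: u32) -> *const T {
            (self.vmctx as *const u8).add(offset as usize).cast()
        }

        /// Anything that writes into the `VMContext` must now go through
        /// `&mut self`, forcing callers to propagate the mutability outward.
        unsafe fn vmctx_plus_offset_mut<T>(&mut self, offset: u32) -> *mut T {
            (self.vmctx as *mut u8).add(offset as usize).cast()
        }
    }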
This fixes the miscompilation with LLVM 16 in the immediate future and
should be at least a meager line of defense against issues like this in
the future. This is not a long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
* Update release notes
* Fix release date
This commit fixes more cases from #5565 where `export` items that
introduce indices were accidentally not handled. Additionally this fixes support for
aliasing types from instances which largely wasn't working before. Most
of the fixes here are about correctly maintaining Wasmtime's view of the
type index spaces.
* Update WIT tooling used by Wasmtime
This commit updates the WIT tooling, namely the wasm-tools family of
crates, to pull in recent changes. Notably:
* bytecodealliance/wasm-tools#867
* bytecodealliance/wasm-tools#871
This updates index spaces in components and additionally bumps the
minimum required version of the component binary format to be consumed
by Wasmtime (because of the index space changes). Additionally WIT
tooling now fully supports `use`.
Note that WIT tooling doesn't, at this time, fully support packages and
depending on remotely defined WIT packages. Currently WIT still needs to
be vendored in the project. It's hoped that future work with `cargo
component` and possible integration here could make the story about
depending on remotely-defined WIT more ergonomic and streamlined.
* Fix `bindgen!` codegen tests
* Add a test for `use` paths and implement support
* Update to crates.io versions of wasm-tools
* Uncomment codegen tests
* Resolve libcall relocations for older CPUs
Long ago Wasmtime used to have logic for resolving relocations
post-compilation for libcalls which I ended up removing during
refactorings last year. As #5563 points out, however, it's possible to
get Wasmtime to panic by disabling SSE features which forces Cranelift
to use libcalls for some floating-point operations instead. Note that
this also requires disabling SIMD because SIMD support has a baseline of
SSE 4.2.
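For reference, a configuration along these lines is what pushes Cranelift
onto the libcall path for operations like `f32.ceil`. This is a hedged
sketch using the public `Config` knobs; the `has_sse41`/`has_sse42` names
here are Cranelift x64 settings and the `wasmtime`/`anyhow` crates are
assumed:

    use wasmtime::{Config, Engine, Module};

    fn main() -> anyhow::Result<()> {
        let mut config = Config::new();
        // SIMD has a baseline of SSE 4.2, so it must be turned off first.
        config.wasm_simd(false);
        unsafe {
            // Cranelift target settings are inherently unchecked, hence `unsafe`.
            config.cranelift_flag_set("has_sse41", "false");
            config.cranelift_flag_set("has_sse42", "false");
        }
        let engine = Engine::new(&config)?;
        // Without SSE 4.1, `f32.ceil` lowers to a libcall whose relocation
        // must be resolved when the compiled code is loaded.
        let _module = Module::new(
            &engine,
            r#"(module (func (param f32) (result f32) local.get 0 f32.ceil))"#,
        )?;
        Ok(())
    }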
This commit pulls back the old implementations of various libcalls and
reimplements the logic necessary to have them work on CPUs without SSE 4.2.
Closes #5563
* Fix log message in `wast` support
* Fix offset listed in relocations
Be sure to factor in the offset of the function itself
* Review comments
This commit fixes a bug with components by changing how DWARF
information from a wasm binary is copied over to the final compiled
artifact. Note that this is not the Wasmtime-generated DWARF but rather
the native wasm DWARF itself used in backtraces.
Previously the wasm dwarf was inserted into sections `.*.wasm` where `*`
was `debug_info`, `debug_str`, etc -- one per `gimli::SectionId` as
found in the original wasm module. This does not work with components,
however, where modules did not correctly separate their debug
information into separate sections or otherwise disambiguate. The fix in
this commit is to instead smash all the debug information together into
one large section and store offsets into that giant section. This is
similar to the `name`-section scraping or the trap metadata section
where one section contains all the data for all the modules in a component.
This simplifies the object file parsing by only looking for one section
name and doesn't add much complexity to serializing and looking up the
DWARF information either.
* Simplify the `ModuleRuntimeInfo` trait slightly
Fold two functions into one as they're only called from one location
anyway.
* Remove ModuleRuntimeInfo::signature
This is redundant as the array mapping is already stored within the
`VMContext` so that can be consulted rather than having a separate trait
function for it. This required altering the `Global` creation slightly
to work correctly in this situation.
* Remove a now-dead constant
* Shared `VMOffsets` across instances
This commit moves the computation of `VMOffsets` from being per-instance
to per-module. The `VMOffsets` structure is also quite large
so this shaves off 112 bytes per instance which isn't a huge impact but
should help lower the cost of instantiating small modules.
* Remove `InstanceAllocator::adjust_tunables`
This is no longer needed by the pooling allocator.
* Fix compile warning
* Fix a vtune warning
* Fix pooling tests
* Fix another test warning
* feat: implement memory.atomic.notify,wait32,wait64
Added the parking_spot crate, which provides the registry of waiters
needed for these operations.
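As a quick illustration of the new support through the embedding API (a
minimal sketch assuming the `wasmtime` and `anyhow` crates), notifying an
address with no waiters returns 0, which is what the tests below also
check in wast form:

    use wasmtime::{Config, Engine, Instance, Module, Store};

    fn main() -> anyhow::Result<()> {
        let mut config = Config::new();
        config.wasm_threads(true);
        let engine = Engine::new(&config)?;
        let module = Module::new(
            &engine,
            r#"(module
                   (memory 1 1 shared)
                   (func (export "notify") (result i32)
                       ;; wake up to 1 waiter at address 0; with no waiters this is 0
                       (memory.atomic.notify (i32.const 0) (i32.const 1))))"#,
        )?;
        let mut store = Store::new(&engine, ());
        let instance = Instance::new(&mut store, &module, &[])?;
        let notify = instance.get_typed_func::<(), i32>(&mut store, "notify")?;
        assert_eq!(notify.call(&mut store, ())?, 0);
        Ok(())
    }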
Signed-off-by: Harald Hoyer <harald@profian.com>
* fix: change trap message for HeapMisaligned
The threads spec test wants "unaligned atomic"
instead of "misaligned memory access".
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add test for atomic wait on non-shared memory
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests/spec_testsuite/proposals/threads
without pooling and reference types.
Also "shared_memory" is added to the "spectest" interface.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add atomics_notify.wast
checking that notify with 0 waiters returns 0 on shared and non-shared
memory.
Signed-off-by: Harald Hoyer <harald@profian.com>
* tests: add tests for atomic wait on shared memory
- returns 2 (timed out) for a timeout of 0
- returns 2 (timed out) for a timeout of 1000ns
- returns 1 (not equal) when the expected value does not match
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
* fixup! feat: implement memory.atomic.notify,wait32,wait64
Signed-off-by: Harald Hoyer <harald@profian.com>
This commit replaces `wasmtime_environ::TrapCode` with `wasmtime::Trap`.
This is possible with past refactorings which slimmed down the `Trap`
definition in the `wasmtime` crate to a simple `enum`. This means that
there's one less place where all the various trap codes need to be
listed in Wasmtime.
Before, we would do a `heap_addr` to translate the given Wasm memory address
into a native memory address and pass it into the libcall that implemented the
atomic operation, which would then treat the address as a Wasm memory address
and pass it to `validate_atomic_addr` to be bounds checked a second time. This
is a bit nonsensical, as we are validating a native memory address as if it were
a Wasm memory address.
Now, we no longer do a `heap_addr` to translate the Wasm memory address to a
native memory address. Instead, we pass the Wasm memory address to the libcall,
and the libcall is responsible for doing the bounds check (by calling
`validate_atomic_addr` with the correct type of memory address now).
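A simplified sketch of the new contract (names and signature here are
illustrative, not the actual `validate_atomic_addr`):

    /// The libcall now receives the raw *Wasm* address, bounds-checks it
    /// itself, and only then turns it into a native pointer.
    fn wasm_addr_to_native(
        memory_base: *mut u8, // base of the linear memory
        memory_len: u64,      // current byte length of the linear memory
        wasm_addr: u64,       // address as the Wasm guest sees it
        access_size: u64,     // size of the atomic access in bytes
    ) -> Option<*mut u8> {
        // Out of bounds (including address overflow) means the operation traps.
        let end = wasm_addr.checked_add(access_size)?;
        if end > memory_len {
            return None;
        }
        Some(unsafe { memory_base.add(wasm_addr as usize) })
    }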
* Pull `Module` out of `ModuleTextBuilder`
This commit is the first of what will likely be a number of commits
working towards serializing a compiled component to bytes as a
precompiled artifact. To that end my rough plan is to merge all of the compiled
artifacts for a component into one large object file instead of having
lots of separate object files and lots of separate mmaps to manage. To
that end I plan on eventually using `ModuleTextBuilder` to build one
large text section for all core wasm modules and trampolines, meaning
that `ModuleTextBuilder` is no longer specific to one module. I've
extracted out functionality such as function name calculation as well as
relocation resolving (now a closure passed in) in preparation for this.
For now this just keeps tests passing, and the trajectory for this
should become more clear over the following commits.
* Remove component-specific object emission
This commit removes the `ComponentCompiler::emit_obj` function in favor
of `Compiler::emit_obj`, now renamed `append_code`. This involved
significantly refactoring code emission to take a flat list of functions
into `append_code`, and the caller is responsible for weaving together
various "families" of functions and un-weaving them afterwards.
* Consolidate ELF parsing in `CodeMemory`
This commit moves the ELF file parsing and section iteration from
`CompiledModule` into `CodeMemory` so one location keeps track of
section ranges and such. This is in preparation for sharing much of this
code with components which needs all the same sections to get tracked
but won't be using `CompiledModule`. A small side benefit from this is
that the section parsing done in `CodeMemory` and `CompiledModule` is no
longer duplicated.
* Remove separately tracked traps in components
Previously components would generate an "always trapping" function
and the metadata around which pc was allowed to trap was handled
manually for components. With recent refactorings the Wasmtime-standard
trap section in object files is now being generated for components as
well which means that can be reused instead of custom-tracking this
metadata. This commit removes the manual tracking for the `always_trap`
functions and plumbs the necessary bits around to make components look
more like modules.
* Remove a now-unnecessary `Arc` in `Module`
Not expected to have any measurable impact on performance, but
complexity-wise this should make it a bit easier to understand the
internals since there's no longer any need to store this somewhere else
than its owner's location.
* Merge compilation artifacts of components
This commit is a large refactoring of the component compilation process
to produce a single artifact instead of multiple binary artifacts. The
core wasm compilation process is refactored as well to share as much
code as necessary with the component compilation process.
This method of representing a compiled component necessitated a few
medium-sized changes internally within Wasmtime:
* A new data structure was created, `CodeObject`, which represents
metadata about a single compiled artifact. This is then stored as an
`Arc` within a component and a module. For `Module` this is always
uniquely owned and represents a shuffling around of data from one
owner to another. For a `Component`, however, this is shared amongst
all loaded modules and the top-level component.
* The "module registry" which is used for symbolicating backtraces and
for trap information has been updated to account for a single region
of loaded code holding possibly multiple modules. This involved adding
a second-level `BTreeMap` for now. This will likely slow down
instantiation slightly but if it poses an issue in the future this
should be able to be represented with a more clever data structure.
This commit additionally solves a number of longstanding issues with
components such as compiling only one host-to-wasm trampoline per
signature instead of possibly once-per-module. Additionally the
`SignatureCollection` registration now happens once-per-component
instead of once-per-module-within-a-component.
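For the curious, the two-level lookup reads roughly like the following
sketch (names and fields are hypothetical, not the actual
`ModuleRegistry`):

    use std::collections::BTreeMap;
    use std::sync::Arc;

    struct CompiledModuleStub; // stand-in for a compiled core wasm module

    struct LoadedCode {
        start: usize,
        // Keyed by the last text address of each module in this region,
        // mapping to (module's first text address, the module itself).
        modules: BTreeMap<usize, (usize, Arc<CompiledModuleStub>)>,
    }

    struct ModuleRegistry {
        // Keyed by the last address of each loaded region of code.
        loaded_code: BTreeMap<usize, LoadedCode>,
    }

    impl ModuleRegistry {
        /// Find the module whose compiled text contains `pc`, if any.
        fn lookup_module(&self, pc: usize) -> Option<&Arc<CompiledModuleStub>> {
            // The first key >= pc is the only region that could contain pc.
            let (_, code) = self.loaded_code.range(pc..).next()?;
            if pc < code.start {
                return None;
            }
            // Same trick a second time to find the module within the region.
            let (_, (start, module)) = code.modules.range(pc..).next()?;
            if pc < *start {
                return None;
            }
            Some(module)
        }
    }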
* Fix compile errors from prior commits
* Support AOT-compiling components
This commit adds support for AOT-compiled components in the same manner
as `Module`, specifically adding:
* `Engine::precompile_component`
* `Component::serialize`
* `Component::deserialize`
* `Component::deserialize_file`
Internally the support for components looks quite similar to `Module`.
All the prior commits leading up to this one made adding the support here
(unsurprisingly) easy. Components are represented as a single object
file as are modules, and the functions for each module are all piled
into the same object file next to each other (as are areas such as data
sections). Support was also added here to quickly differentiate compiled
components vs compiled modules via the `e_flags` field in the ELF
header.
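In embedder terms the new surface mirrors `Module`; a minimal sketch
(assuming the `component-model` and `wat` features are enabled):

    use wasmtime::component::Component;
    use wasmtime::{Config, Engine};

    fn main() -> anyhow::Result<()> {
        let mut config = Config::new();
        config.wasm_component_model(true);
        let engine = Engine::new(&config)?;

        // AOT-compile a component to bytes which can be written to disk...
        let bytes = engine.precompile_component(b"(component)")?;

        // ...and later loaded without recompiling. Deserialization is `unsafe`
        // because the bytes are trusted to be a valid Wasmtime artifact.
        let component = unsafe { Component::deserialize(&engine, &bytes)? };
        let _ = component;
        Ok(())
    }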
* Prevent serializing exported modules on components
The current representation of a module within a component means that the
implementation of `Module::serialize` will not work if the module is
exported from a component. The reason for this is that `serialize`
doesn't actually do anything and simply returns the underlying mmap as a
list of bytes. The mmap, however, has `.wasmtime.info` describing
component metadata as opposed to this module's metadata. While rewriting
this section could be implemented, it's not easy to do and isn't
considered a particularly important feature right now anyway.
* Fix windows build
* Fix an unused function warning
* Update crates/environ/src/compilation.rs
Co-authored-by: Nick Fitzgerald <fitzgen@gmail.com>
This adds a new field `types` to `ModuleTranslation`, so that
consumers can have access to the module type information known after
validation has finished. This change is useful when consumers want to
have access to the type information in wasmparser's terms rather than
in wasmtime_environ's equivalent types (e.g. `WasmFuncType`).
* Plumb type exports in components around more
This commit adds some more plumbing for type exports to ensure that they
show up in the final compiled representation of a component. For now
they continue to be ignored for all purposes in the embedding API
itself but I found this useful to explore in `wit-bindgen` based tooling
which is leveraging the component parsing in Wasmtime.
* Add a field to `ModuleTranslation` to store the original wasm
This commit adds a field to be able to refer back to the original wasm
binary for a `ModuleTranslation`. This field is used in the upcoming
support for host generation in `wit-component` to "decompile" a
component into core wasm modules to get instantiated. This is used to
extract a core wasm module from the original component.
* Fix a build warning
* allow a ComponentTypeRef::Type to point to a component TypeDef
* component matching: don't assert exported Interface type definitions are "defined"
types may be exported by their name for consumption by some component
runtimes, but in wasmtime this doesn't matter (we lift and lower to
types, not define them) so we should ignore these.
* component-model instance tests: show that an import can export a type definition
this is meaningless, but it should be accepted. (previously rejected)
* Update wasm-tools dependencies
This update brings in a number of features such as:
* The component model binary format and AST have been slightly adjusted
in a few locations. Names are dropped from parameters/results now in
the internal representation since they were not used anyway. At this
time the ability to bind a multi-return function has not been exposed.
* The `wasmparser` validator pass will now share allocations with prior
functions, providing what's probably a very minor speedup for Wasmtime
itself.
* The text format for many component-related tests now requires named
parameters.
* Some new relaxed-simd instructions are updated to be ignored.
I hope to have a follow-up to expose the multi-return ability to the
embedding API of components.
* Update audit information for new crates
* Leverage Cargo's workspace inheritance feature
This commit is an attempt to reduce the complexity of the Cargo
manifests in this repository with Cargo's workspace-inheritance feature
becoming stable in Rust 1.64.0. This feature allows specifying fields in
the root workspace `Cargo.toml` which are then reused throughout the
workspace. For example this PR shares definitions such as:
* All of the Wasmtime-family of crates now use `version.workspace =
true` to have a single location which defines the version number.
* All crates use `edition.workspace = true` to have one default edition
for the entire workspace.
* Common dependencies are listed in `[workspace.dependencies]` to avoid
typing the same version number in a lot of different places (e.g. the
`wasmparser = "0.89.0"` dependency is now listed in just one spot).
Currently the workspace-inheritance feature doesn't allow having two
different versions to inherit, so all of the Cranelift-family of crates
still manually specify their version. The inter-crate dependencies,
however, are shared amongst the root workspace.
This feature can be seen as a method of "preprocessing" of sorts for
Cargo manifests. This will help us develop Wasmtime but shouldn't have
any actual impact on the published artifacts -- every crate's dependency
list is still the same.
* Fix wasi-crypto tests
* Initial forward-edge CFI implementation
Give the user the option to start all basic blocks that are targets
of indirect branches with the BTI instruction introduced by the
Branch Target Identification extension to the Arm instruction set
architecture.
Copyright (c) 2022, Arm Limited.
* Refactor `from_artifacts` to avoid second `make_executable` (#1)
This involves "parsing" twice but this is parsing just the header of an
ELF file so it's not a very intensive operation and should be ok to do
twice.
* Address the code review feedback
Copyright (c) 2022, Arm Limited.
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
This commit replaces #4869 and represents the actual version bump that
should have happened had I remembered to bump the in-tree version of
Wasmtime to 1.0.0 prior to the branch-cut date. Alas!
I noticed an oss-fuzz-based timeout that was reported for the
`component_api` fuzzer where the adapter module generated takes 1.5
seconds to compile the singular function in release mode (no fuzzing
enabled). The test case in question was a deeply recursive
list-of-list-of-etc and only one function was generated instead of
multiple. I updated strings/lists to cost more in the approximate cost
calculation, which now forces the one giant function to be split up into
multiple smaller functions that each take milliseconds to compile.
Add a function_alignment function to the TargetIsa trait, and use it to align functions when generating objects. Additionally, collect the maximum alignment required for pc-relative constants in functions and pass that value out. Use the max of these two values when padding functions for alignment.
This fixes a bug on x86_64 where rip-relative loads to sse registers could cause a segfault, as functions weren't always guaranteed to be aligned to 16-byte addresses.
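The padding rule boils down to rounding up by the larger of the two
alignments; a hypothetical helper illustrating it:

    /// Offset at which the next function should be placed, honoring both the
    /// ISA's preferred function alignment and the strictest alignment needed
    /// by any pc-relative constant in the body (both are powers of two).
    fn next_function_offset(current_end: u64, function_alignment: u64, constant_alignment: u64) -> u64 {
        let align = function_alignment.max(constant_alignment);
        // Round `current_end` up to the next multiple of `align`.
        (current_end + align - 1) & !(align - 1)
    }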
Fixes #4812
* components: Limit the recursive size of types in Wasmtime
This commit is aimed at fixing #4814 by placing a hard limit on the
maximal recursive depth a type may have in the component model. The
component model theoretically allows for infinite recursion but many
various types of operations within the component model are naturally
written as recursion over the structure of a type which can lead to
stack overflow with deeply recursive types. Some examples of recursive
operations are:
* Lifting and lowering a type - currently the recursion here is modeled
in Rust directly with `#[derive]` implementations as well as the
implementations for the `Val` type.
* Compilation of adapter trampolines which iterates over the type
structure recursively.
* Historically many various calculations like the size of a type, the
flattened representation of a type, etc, were all done recursively.
Many of these are more efficiently done via other means but it was
still natural to implement these recursively initially.
By placing a hard limit on type recursion Wasmtime won't be able to load
some otherwise-valid modules. The hope, though, is that no human-written
program is likely to ever reach this limit. This limit can be revised
and/or the locations with recursion revised if it's ever reached.
The implementation of this feature is done by generalizing the current
flattened-representation calculation which now keeps track of a type's
depth and size. The size calculation isn't used just yet but I plan to
use it in fixing #4816 and it was natural enough to write here as well.
The depth is checked after a type is translated and if it exceeds the
maximum then an error is returned.
Additionally the `Arbitrary for Type` implementation was updated to
prevent generation of a type that's too recursive.
Closes #4814
* Remove unused size calculation
* Bump up just under the limit
This commit is a (second?) attempt at improving the generation of
adapter modules to avoid excessively large functions for fuzz-generated
inputs.
The first iteration of adapters simply translated an entire type
inline per-function. This proved problematic however since the size of
the adapter function was on the order of the overall size of a type,
which can be exponential for a type that is otherwise defined in linear
size.
The second iteration of adapters performed a split where memory-based
types would always be translated with individual functions. The theory
here was that once a type was memory-based it was large enough to not
warrant inline translation in the original function and a separate
outlined function could be shared and otherwise used to deduplicate
portions of the original giant function. This again proved problematic,
however, since the splitting heuristic was quite naive and didn't take
into account large stack-based types.
This third iteration in this commit replaces the previous system with a
similar but slightly more general one. Each adapter function now has a
concept of fuel which is decremented each time a layer of a type is
translated. When fuel runs out further translations are deferred to
outlined functions. The fuel counter should hopefully provide a sort of
reasonable upper bound on the size of a function and the outlined
functions should ideally provide the ability to be called from multiple
places and therefore deduplicate what would otherwise be a massive
function.
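Conceptually the fuel mechanism looks something like this sketch
(hypothetical types, not the adapter compiler's actual code):

    enum Strategy {
        /// Keep translating this layer inline in the current adapter function.
        Inline,
        /// Emit (or reuse) an outlined helper function and call it instead.
        OutlinedHelper,
    }

    struct AdapterFunc {
        fuel: u32,
    }

    impl AdapterFunc {
        /// Translating one layer of a type costs one unit of fuel; once the
        /// budget is exhausted the rest of the translation is outlined.
        fn next_layer(&mut self) -> Strategy {
            match self.fuel.checked_sub(1) {
                Some(rest) => {
                    self.fuel = rest;
                    Strategy::Inline
                }
                None => Strategy::OutlinedHelper,
            }
        }
    }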
This final iteration is another attempt at guaranteeing that an adapter
module is linear in size with respect to the input type section of the
original module. Additionally this iteration uniformly handles stack and
memory-based translations which means that stack-based translations
can't go wild in their function size and memory-based translations may
benefit slightly from having at least a little bit of inlining
internally.
The immediate impact of this is that the `component_api` fuzzer seems to
be running at a faster rate than before. Otherwise #4825 is sufficient
to invalidate preexisting fuzz-bugs and this PR is hopefully the final
nail in the coffin to prevent further timeouts for small inputs cropping
up.
Closes #4816
* Upgrade wasm-tools crates, namely the component model
This commit pulls in the latest versions of all of the `wasm-tools`
family of crates. There were two major changes that happened in
`wasm-tools` in the meantime:
* bytecodealliance/wasm-tools#697 - this commit introduced a new API for
more efficiently reading binary operators from a wasm binary. The old
`Operator`-based reading was left in place, however, and continues to
be what Wasmtime uses. I hope to update Wasmtime in a future PR to use
this new API, but for now the biggest change is...
* bytecodealliance/wasm-tools#703 - this commit was a major update to
the component model AST. This commit almost entirely deals with the
fallout of this change.
The changes made to the component model were:
1. The `unit` type no longer exists. This was generally a simple change
where the `Unit` cases in a few different locations were all removed.
2. The `expected` type was renamed to `result`. This similarly was
relatively lightweight and mostly just a renaming on the surface. I
took this opportunity to rename `val::Result` to `val::ResultVal` and
`types::Result` to `types::ResultType` to avoid clashing with the
standard library types. The `Option`-based types were handled with
this as well.
3. The payload type of `variant` and `result` types is now optional.
This affected many locations that calculate flat type
representations, ABI information, etc. The `#[derive(ComponentType)]`
macro now specifically handles Rust-defined `enum` types which have
no payload, mapping them to the equivalent in the component model.
4. Functions can now return multiple results. This changed the
signature of invoking component functions because the return value is
now bound by `ComponentNamedList` (renamed from `ComponentParams`).
This had a large effect in the tests, fuzz test case generation, etc.
5. Function types with 2-or-more parameters/results must uniquely name
all parameters/results. This mostly affected the text format used
throughout the tests.
I haven't specifically added new tests for multi-return, but I changed a
number of tests to use it. Additionally I've updated the fuzzers to all
exercise multi-return as well so I think we should get some good
coverage with that.
* Update version numbers
* Use crates.io
* Fix a compile error on nightly Rust
It looks like Rust nightly has gotten a bit more strict about
attributes-on-expressions and previously accepted code is no longer
accepted. This commit updates the generated code for a macro to a form
which is accepted by rustc.
* Fix a soundness issue with lowering variants
This commit fixes a soundness issue lowering variants in the component
model where host memory could be leaked to the guest module by accident.
In reviewing code recently for `Val::lower` I noticed that the variant
lowering was extending the payload with `ValRaw::u32(0)` to
appropriately fit the size of the variant. In reading this it appeared
incorrect to me due to the fact that it should be `ValRaw::u64(0)` since
up to 64-bits can be read. Additionally this implementation was also
incorrect because the lowered representation of the payload itself was
not possibly zero-extended to 64-bits to accommodate other variants.
It turned out these issues were benign because with the dynamic
surface area to the component model the arguments were all initialized
to 0 anyway. The static version of the API, however, does not initialize
arguments to 0 and I wanted to initially align these two implementations
so I updated the variant implementation of lowering for dynamic values
and removed the zero-ing of arguments.
To test this change I updated the `debug` mode of adapter module
generation to assert that the upper bits of values in wasm are always
zero when the value is casted down (during `stack_get` which only
happens with variants). I then threaded through the `debug` boolean
configuration parameter into the dynamic and static fuzzers.
To my surprise this new assertion tripped even after the fix was
applied. It turns out, though, that there was other leakage of bits
through other means that I was previously unaware of. At the primitive
level lowerings of types like `u32` will have a `Lower` representation
of `ValRaw` and the lowering is simply `dst.write(ValRaw::i32(self))`,
or the equivalent thereof. The problem with this pattern, which the
fuzzers detected, is that the `ValRaw` type is 16 bytes, and
`ValRaw::i32(X)` only initializes the first 4. This meant that all the
lowerings for all primitives were writing up to 12 bytes of garbage from
the host for the wasm module to read.
It turned out that this write of a `ValRaw` was sometimes 16 bytes and
sometimes the appropriate size depending on the number of optimizations
in play. With enough inlining for example `dst.write(ValRaw::i32(self))`
would only write 4 bytes, as expected. In debug mode though without
inlining 16 bytes would be written, including the garbage from the upper
bits.
To solve this issue I ended up taking a somewhat different approach. I
primarily updated the `ValRaw` constructors to simply always extend the
values internally to 64-bits, meaning that the low 8 bytes of a `ValRaw`
is always initialized. This prevents any undefined data from leaking
from the host into a wasm module, and means that values are also
zero-extended even if they're only used in 32-bit contexts outside of a
variant. This felt like the best fix for now, though, in terms of
not really having a performance impact while additionally not requiring
a rewrite of all lowerings.
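A simplified stand-in for the 16-byte `ValRaw` illustrating the shape of
the fix (not the actual definition):

    #[derive(Copy, Clone)]
    struct ValRawStub {
        bytes: [u8; 16],
    }

    impl ValRawStub {
        fn u64(v: u64) -> ValRawStub {
            // The low 8 bytes are always fully initialized by constructors.
            let mut bytes = [0u8; 16];
            bytes[..8].copy_from_slice(&v.to_le_bytes());
            ValRawStub { bytes }
        }

        fn u32(v: u32) -> ValRawStub {
            // Zero-extend so the upper bits are defined zeros rather than
            // whatever happened to be in host memory at the time.
            ValRawStub::u64(v as u64)
        }
    }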
This solution ended up also neatly removing the "zero out the entire
payload" logic that was previously require. Now after a payload is
lowered only the tail end of the payload, up to the size of the variant,
is zeroed out. This means that each lowered argument is written to at
most once which should hopefully be a small performance boost for
calling into functions as well.
* Optimize flat type representation calculations
Previously calculating the flat type representation would be done
recursively for an entire type tree every time it was visited.
Additionally the flat type representation was entirely built only to be
thrown away if it was too large at the end. This chiefly presented a
source of recursion based on the type structure in the component model
which fuzzing does not like as it reports stack overflows.
This commit overhauls the representation of flat types in Wasmtime by
caching the representation for each type in the compile-time
`ComponentTypesBuilder` structure. This avoids recalculating each time
the flat representation is queried and additionally allows opportunity
to have more short-circuiting to avoid building overly-large vectors.
* Remove duplicate flat count calculation in wasmtime
Roughly share the infrastructure in the `wasmtime-environ` crate, namely
the non-recursive and memoizing nature of the calculation.
* Fix component fuzz build
* Fix example compile
A `GtU` condition needed to actually be `GeU`, as the comment right
above it stated but apparently I forgot to translate the comment to
actual code. This fixes a fuzz bug that arose from oss-fuzz over the
weekend.
This is the implementation of https://github.com/bytecodealliance/wasmtime/issues/4155, using the "inverted API" approach suggested by @cfallin (thanks!) in Cranelift, and a trait object to provide a backend for an all-included experience in Wasmtime.
After the suggestion of Chris, `Function` has been split into mostly two parts:
- on the one hand, `FunctionStencil` contains all the fields required during compilation and acts as the compilation cache key: if two function stencils are the same, then the result of their compilation (`CompiledCodeBase<Stencil>`) will be the same. This makes caching trivial, as the only thing to cache is the `FunctionStencil`.
- on the other hand, `FunctionParameters` contain the... function parameters that are required to finalize the result of compilation into a `CompiledCode` (aka `CompiledCodeBase<Final>`) with proper final relocations etc., by applying fixups and so on.
Most changes are here to accommodate those requirements, in particular that `FunctionStencil` should be `Hash`able to be used as a key in the cache:
- most source locations are now relative to a base source location in the function, and as such they're encoded as `RelSourceLoc` in the `FunctionStencil`. This required changes so that there's no need to explicitly mark a `SourceLoc` as the base source location; it's automatically detected the first time a non-default `SourceLoc` is set.
- user-defined external names in the `FunctionStencil` (aka before this patch `ExternalName::User { namespace, index }`) are now references into an external table of `UserExternalNameRef -> UserExternalName`, present in the `FunctionParameters`, and must be explicitly declared using `Function::declare_imported_user_function`.
- some refactorings have been made for function names:
- `ExternalName` was used as the type for a `Function`'s name; while it thus allowed `ExternalName::Libcall` in this place, this would have been quite confusing to use it there. Instead, a new enum `UserFuncName` is introduced for this name, that's either a user-defined function name (the above `UserExternalName`) or a test case name.
- The future of `ExternalName` is likely to become a full reference into the `FunctionParameters`'s mapping, instead of being "either a handle for user-defined external names, or the thing itself for other variants". I'm running out of time to do this, and this is not trivial as it implies touching ISLE which I'm less familiar with.
The cache computes a sha256 hash of the `FunctionStencil`, and uses this as the cache key. No equality check (using `PartialEq`) is performed in addition to the hash being the same, as we hope that this is sufficient data to avoid collisions.
A basic fuzz target has been introduced that tries to do the bare minimum:
- check that a function successfully compiled and cached will be also successfully reloaded from the cache, and returns the exact same function.
- check that a trivial modification in the external mapping of `UserExternalNameRef -> UserExternalName` hits the cache, and that other modifications don't hit the cache.
- This last check is less efficient and less likely to happen, so probably should be rethought a bit.
Thanks to both @alexcrichton and @cfallin for your very useful feedback on Zulip.
Some numbers show that for a large wasm module we're using internally, this is a 20% compile-time speedup, because so many `FunctionStencil`s are the same, even within a single module. For a group of modules that have a lot of code in common, we get hit rates up to 70% when they're used together. When a single function changes in a wasm module, every other function is reloaded; that's still slower than I expect (between 10% and 50% of the overall compile time), so there's likely room for improvement.
Fixes #4155.
This commit is an effort to reduce the amount of complexity around
managing the size/alignment calculations of types in the canonical ABI.
Previously the logic for the size/alignment of a type was spread out
across a number of locations. While each individual calculation is not
really the most complicated thing in the world, having the duplication in
so many places was constantly worrying me.
I've opted in this commit to centralize all of this within the runtime
at least, and now there's only one "duplicate" of this information in
the fuzzing infrastructure which is to some degree less important to
deduplicate. This commit introduces a new `CanonicalAbiInfo` type to
house all abi size/align information for both memory32 and memory64.
This new type is then used pervasively throughout fused adapter
compilation, dynamic `Val` management, and typed functions. This type
was also able to reduce the complexity of the macro-generated code
meaning that even `wasmtime-component-macro` is performing less math
than it was before.
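The rough shape of the centralized information (field and function names
here are illustrative) is a record of size/alignment for both pointer
widths, with the canonical-ABI layout rules computed in one place:

    #[derive(Copy, Clone)]
    struct AbiInfo {
        size32: u32,
        align32: u32,
        size64: u32,
        align64: u32,
    }

    fn align_to(offset: u32, align: u32) -> u32 {
        (offset + align - 1) / align * align
    }

    impl AbiInfo {
        /// Canonical ABI record layout: each field is placed at its natural
        /// alignment and the total size is rounded up to the record alignment.
        fn record<'a>(fields: impl Iterator<Item = &'a AbiInfo>) -> AbiInfo {
            let mut ret = AbiInfo { size32: 0, align32: 1, size64: 0, align64: 1 };
            for f in fields {
                ret.size32 = align_to(ret.size32, f.align32) + f.size32;
                ret.align32 = ret.align32.max(f.align32);
                ret.size64 = align_to(ret.size64, f.align64) + f.size64;
                ret.align64 = ret.align64.max(f.align64);
            }
            ret.size32 = align_to(ret.size32, ret.align32);
            ret.size64 = align_to(ret.size64, ret.align64);
            ret
        }
    }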
One other major feature of this commit is that this ABI information is
now saved within a `ComponentTypes` structure. This avoids repeatedly
recomputing size/align information recursively and instead effectively
caches it. This was a worry I had for the fused adapter compiler which
frequently sought out size/align information and would recursively
descend each type tree each time. The `fact-valid-module` fuzzer is now
nearly 10x faster in terms of iterations/s which I suspect is due to
this caching.
In #4640 a feature was added to adapter modules where translation that
goes through memory is routed through a helper function instead of being
inlined directly. The generation of the
helper function happened recursively at compile time, however, and sure
enough oss-fuzz has found an input which blows the host stack at compile
time.
This commit removes the compile-time recursion from the adapter compiler
when translating these helper functions by deferring the translation to
a worklist which is processed after the original function is translated.
This makes the stack-based recursion instead heap-based, removing the
stack overflow.
The spec was expected to change to not bounds-check 0-byte lists/strings
but has since been updated to match `memory.copy` which does indeed
check the pointer for 0-byte copies.
This commit implements a scheme I've been meaning to work on in the
adapter compiler where instead of always generating a fresh local for
all operations, locals may now be reused. Generated locals are explicitly
freed when their lexical scope has ended, allowing their reuse in the translation
of later types in the adapter.
This also implements a new scheme for initializing locals where
previously a local could simply be generated, but now the local must be
fused with its initializer where a `local.{tee,set}` instruction is
always generated. This should help prevent a bug I ran into with strings
where one local was accidentally never initialized, which meant
that when it was used during a loop it may have had a stale value from
before.
Modeling this in Rust isn't possible at compile time unfortunately so I
opted for the next best thing, runtime panics. If a local is
accidentally not released back to the pool of free locals then it will
panic. The fuzzer which simply generates and validates adapter modules
should be good at exercising this; it already weeded out a few forgotten
frees, so this should be in good shape now.
* Improve the `component_api` fuzzer on a few dimensions
* Update the generated component to use an adapter module. This involves
two core wasm instances communicating with each other to test that
data flows through everything correctly. The intention here is to fuzz
the fused adapter compiler. String encoding options have been plumbed
here to exercise differences in string encodings.
* Use `Cow<'static, ...>` and `static` declarations for each static test
case to try to cut down on rustc codegen time.
* Add `Copy` to derivation of fuzzed enums to make `derive(Clone)`
smaller.
* Use `Store<Box<dyn Any>>` to try to cut down on codegen by
monomorphizing fewer `Store<T>` implementations.
* Add debug logging to print out what's flowing in and what's flowing
out for debugging failures.
* Improve `Debug` representation of dynamic value types to more closely
match their Rust counterparts.
* Fix a variant issue with adapter trampolines
Previously the offset of the payload was calculated as the discriminant
aligned up to the alignment of a singular case, but instead this needs
to be aligned up to the alignment of all cases to ensure all cases start
at the same location.
* Fix a copy/paste error when copying masked integers
A 32-bit load was actually doing a 16-bit load by accident since it was
copied from the 16-bit load-and-mask case.
* Fix f32/i64 conversions in adapter modules
The adapter previously erroneously converted the f32 to f64 and then to
i64, where instead it should go from f32 to i32 to i64.
* Fix zero-sized flags in adapter modules
This commit corrects the size calculation for zero-sized flags in
adapter modules.
cc #4592
* Fix a variant size calculation bug in adapters
This fixes the same issue found with variants during normal host-side
fuzzing earlier, where the size of a variant needs to be the sum of the
discriminant and the maximum case size, aligned up.
* Implement memory growth in libc bump realloc
Some fuzz-generated test cases are copying lists large enough to exceed
one page of memory so bake in a `memory.grow` to the bump allocator as
well.
* Avoid adapters of exponential size
This commit is an attempt to avoid adapters being exponentially sized
with respect to the type hierarchy of the input. Previously all
adaptation was done inline within each adapter which meant that if
something was structured as `tuple<T, T, T, T, ...>` the translation of
`T` would be inlined N times. For very deeply nested types this can
quickly create an exponentially sized adapter with types of the form:
(type $t0 (list u8))
(type $t1 (tuple $t0 $t0))
(type $t2 (tuple $t1 $t1))
(type $t3 (tuple $t2 $t2))
;; ...
where the translation of `t4` has 8 different copies of translating
`t0`.
This commit changes the translation of types through memory to almost
always go through a helper function. The hope here is that it doesn't
lose too much performance because types already reside in memory.
This can still lead to exponentially sized adapter modules to a lesser
degree where if the translation all happens on the "stack", e.g. via
`variant`s and their flat representation then many copies of one
translation could still be made. For now this commit at least gets the
problem under control for fuzzing where fuzzing doesn't trivially find
type hierarchies that take over a minute to codegen the adapter module.
One of the main tricky parts of this implementation is that when a
function is generated the index that it will be placed at in the final
module is not known at that time. To solve this the encoded form of the
`Call` instruction is saved in a relocation-style format where the
`Call` isn't encoded but instead saved into a different area for
encoding later. When the entire adapter module is encoded to wasm these
pseudo-`Call` instructions are encoded as real instructions at that
time.
* Fix some memory64 issues with string encodings
Introduced just before #4623 I had a few mistakes related to 64-bit
memories and mixing 32/64-bit memories.
* Actually insert into the `translate_mem_funcs` map
This... was the whole point of having the map!
* Assert memory growth succeeds in bump allocator
* Implement strings in adapter modules
This commit is a hefty addition to Wasmtime's support for the component
model. This implements the final remaining type (in the current type
hierarchy) unimplemented in adapter module trampolines: strings. Strings
are the most complicated type to implement in adapter trampolines
because they are highly structured chunks of data in memory (according
to specific encodings). Additionally each lift/lower operation can
choose its own encoding for strings meaning that Wasmtime, the host, may
have to convert between any pairwise ordering of string encodings.
The `CanonicalABI.md` in the component-model repo in general specifies
all the fiddly bits of string encoding so there's not a ton of wiggle
room for Wasmtime to get creative. This PR largely "just" implements
that. The high-level architecture of this implementation is:
* Fused adapters are first identified to determine src/dst string
encodings. This statically fixes what transcoding operation is being
performed.
* The generated adapter will be responsible for managing calls to
`realloc` and performing bounds checks. The adapter itself does not
perform memory copies or validation of string contents, however.
Instead each transcoding operation is modeled as an imported function
into the adapter module. This means that the adapter module
dynamically, during compile time, determines what string transcoders
are needed. Note that an imported transcoder is not only parameterized
over the transcoding operation but additionally which memory is the
source and which is the destination.
* The imported core wasm functions are modeled as a new
`CoreDef::Transcoder` structure. These transcoders end up being small
Cranelift-compiled trampolines. The Cranelift-compiled trampoline will
load the actual base pointer of memory and add it to the relative
pointers passed as function arguments. This trampoline then calls a
transcoder "libcall" which enters Rust-defined functions for actual
transcoding operations.
* Each possible transcoding operation is implemented in Rust with a
unique name and a unique signature depending on the needs of the
transcoder. I've tried to document inline what each transcoder does.
This means that the `Module::translate_string` in adapter modules is by
far the largest translation method. The main reason for this is due to
the management around calling the imported transcoder functions in the
face of validating string pointer/lengths and performing the dance of
`realloc`-vs-transcode at the right time. I've tried to ensure that each
individual case in transcoding is documented well enough to understand
what's going on as well.
Additionally in this PR is a full implementation in the host for the
`latin1+utf16` encoding which means that both lifting and lowering host
strings now works with this encoding.
Currently the implementation of each transcoder function is likely far
from optimal. Where possible I've leaned on the standard library itself
and for latin1-related things I'm leaning on the `encoding_rs` crate. I
initially tried to implement everything with `encoding_rs` but was
unable to uniformly do so easily. For now I settled on trying to get a
known-correct (even in the face of endianness) implementation for all of
these transcoders. If and when performance becomes an issue it should be
possible to implement more optimized versions of each of these
transcoding operations.
Testing this commit has been somewhat difficult and my general plan,
like with the `(list T)` type, is to rely heavily on fuzzing to cover
the various cases here. In this PR though I've added a simple test that
pushes some statically known strings through all the pairs of encodings
between source and destination. I've attempted to pick "interesting"
strings that one way or another stress the various paths in each
transcoding operation to ideally get full branch coverage there.
Additionally a suite of "negative" tests have also been added to ensure
that validity of encoding is actually checked.
* Fix a temporarily commented out case
* Fix wasmtime-runtime tests
* Update deny.toml configuration
* Add `BSD-3-Clause` for the `encoding_rs` crate
* Remove some unused licenses
* Add an exemption for `encoding_rs` for now
* Split up the `translate_string` method
Move out all the closures and package up captured state into smaller
lists of arguments.
* Test out-of-bounds for zero-length strings
When an adapter module depends on a particular core wasm instance this
means that it actually depends on not only that instance but all prior
core wasm instances as well. This is because core wasm instances must be
instantiated in the specified order within a component and that cannot
change depending on the dataflow between adapters. This commit fixes a
possible panic from linearizing the component dfg where an adapter
module tried to depend on an instance that hadn't been instantiated yet
because the ordering dependency between core wasm instances hadn't been
modeled.
* components: ignore export aliases to types in translation.
Currently, translation ignores type exports from components by skipping
over them before adding them to the exports map.
If a component instantiates an inner component and aliases a type export of
that instance, it will cause wasmtime to panic with a failure to find the
export in the exports map.
The fix is to add a representation for exported types to the map that is simply
ignored when encountered. This also makes it easier to track places where we
would have to support type exports in translation in the future.
* Keep type information for type exports.
This commit keeps the type information for type exports so that types can be
properly aliased from an instance export, adjusting the type index
space accordingly.
* Add a simple test case for type exports for the component model.
* Add a dataflow-based representation of components
This commit updates the inlining phase of compiling a component to
creating a dataflow-based representation of a component instead of
creating a final `Component` with a linear list of initializers. This
dataflow graph is then linearized in a final step to create the actual
final `Component`.
The motivation for this commit stems primarily from my work implementing
strings in fused adapters. In doing this my plan is to defer most
low-level transcoding to the host itself rather than implementing that
in the core wasm adapter modules. This means that small
cranelift-generated trampolines will be used for adapter modules to call
which then call "transcoding libcalls". The cranelift-generated
trampolines will get raw pointers into linear memory and pass those to
the libcall which core wasm doesn't have access to when passing
arguments to an import.
Implementing this with the previous representation of a `Component` was
becoming too tricky to bear. The initialization of a transcoder needed
to happen at just the right time: before the adapter module which needed
it was instantiated but after the linear memories referenced had been
extracted into the `VMComponentContext`. The difficulty here is further
compounded by the current adapter module injection pass already being
quite complicated. Adapter modules are already renumbering the index
space of runtime instances and shuffling items around in the
`GlobalInitializer` list. Perhaps the worst part of this was that
memories could already be referenced by host function imports or exports
to the host, and if adapters referenced the same memory it shouldn't be
referenced twice in the component. This meant that `ExtractMemory`
initializers ideally needed to be shuffled around in the initializer
list to happen as early as possible instead of wherever they happened to
show up during translation.
Overall I did my best to implement the transcoders but everything always
came up short. I have decided to throw my hands up in the air and try a
completely different approach to this, namely the dataflow-based
representation in this commit. This makes it much easier to edit the
component after initial translation for injection of adapters, injection
of transcoders, adding dependencies on possibly-already-existing items,
etc. The adapter module partitioning pass in this commit was greatly
simplified to something which I believe is functionally equivalent but
is probably an order of magnitude easier to understand.
The biggest downside of this representation I believe is having a
duplicate representation of a component. The `component::info` was
largely duplicated into the `component::dfg` module in this commit.
Personally though I think this is a more appropriate tradeoff than
before because it's very easy to reason about "convert representation A
to B" code whereas it was very difficult to reason about shuffling
around `GlobalInitializer` items in optimal fashions. This may also have
a cost at compile-time in terms of shuffling data around, but my hope is
that we have lots of other low-hanging fruit to optimize if it ever
comes to that which allows keeping this easier-to-understand
representation.
Finally, to reiterate, the final representation of components is not
changed by this PR. To the runtime internals everything is still the
same.
* Fix compile of factc
This addresses #4307.
For the static API we generate 100 arbitrary test cases at build time, each of
which includes 0-5 parameter types, a result type, and a WAT fragment containing
an imported function and an exported function. The exported function calls the
imported function, which is implemented by the host. At runtime, the fuzz test
selects a test case at random and feeds it zero or more sets of arbitrary
parameters and results, checking that values which flow host-to-guest and
guest-to-host make the transition unchanged.
The fuzz test for the dynamic API follows a similar pattern, the only difference
being that test cases are generated at runtime.
Signed-off-by: Joel Dice <joel.dice@fermyon.com>
* Wasmtime: Add a pointer to `VMRuntimeLimits` in component contexts
* Save exit Wasm FP and PC in component-to-host trampolines
Fixes #4535
* Add comment about why we deref the trampoline's FP
* Update some tests to use new `vmruntime_limits_*` methods
This commit aims to improve the readability of supporting the memory64
proposal in the `fact` adapter trampoline compiler. Previously there
were a few sprinkled blocks that used `if` to generate different
instructions inline, but as I've worked on support for strings this has
become pretty unwieldy as strings do far more memory manipulation than
other type conversions. A pattern that's easier to read is to have
small instruction helpers that take the pointer width as an argument and
internally dispatch to the correct instruction. This keeps the main
translation code branch-free and a bit easier to follow. Additionally
for more complicated branching logic it allows for deduplicating the
main translation path by having lots of little branches instead of one
large branch with everything duplicated on both halves.
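For example, a helper in that style might look like the following sketch
(hypothetical; shown with the `wasm-encoder` crate):

    use wasm_encoder::{Function, Instruction};

    #[derive(Copy, Clone)]
    enum PtrWidth {
        P32,
        P64,
    }

    /// Add a constant to the pointer on top of the stack; the 32/64-bit
    /// dispatch lives here so the main translation path stays branch-free.
    fn ptr_add_const(func: &mut Function, width: PtrWidth, delta: u64) {
        match width {
            PtrWidth::P32 => {
                func.instruction(&Instruction::I32Const(delta as i32));
                func.instruction(&Instruction::I32Add);
            }
            PtrWidth::P64 => {
                func.instruction(&Instruction::I64Const(delta as i64));
                func.instruction(&Instruction::I64Add);
            }
        }
    }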
This is a collection of some minor renamings, refactorings, sharing of
code, etc. This was all discovered during my addition of string support
to adapter functions and I figured it'd be best to frontload this and
land it ahead of the full patch since it's getting complex.
This commit fixes trampoline compilation for lists where the loop condition
would only branch if the amount remaining was 0 instead of not 0.
It resulted in only the first element of the list being copied.
* Implement fused adapters for `(list T)` types
This commit implements one of the two remaining types for adapter
fusion, lists. This implementation is particularly tricky for a number
of reasons:
* Lists have a number of validity checks which need to be carefully
implemented. For example the byte length of the list passed to
allocation in the destination module could overflow the 32-bit index
space. Additionally lists in 32-bit memories need a check that their
final address is in-bounds in the address space.
* In the effort to go ahead and support memory64 at the lowest layers
this is where much of the magic happens. Lists are naturally always
stored in memory and shifting between 64/32-bit address spaces
is done here. This notably required plumbing an `Options` around
during flattening/size/alignment calculations due to the size/types of
lists changing depending on the memory configuration.
I've also added a small `factc` program in this commit which should
hopefully assist in exploring and debugging adapter modules. This takes
as input a component (text or binary format) and then generates an
adapter module for all component function signatures found internally.
This commit notably does not include tests for lists. I tried to figure
out a good way to add these but I felt like there were too many cases to
test and the tests would otherwise be extremely verbose. Instead I think
the best testing strategy for this commit will be through #4537 which
should be relatively extensible to testing adapters between modules in
addition to host-based lifting/lowering.
* Improve handling of lists of 0-size types
* Skip overflow checks on byte sizes for 0-size types
* Skip the copy loop entirely when src/dst are both 0
* Skip the increments of src/dst pointers if either is 0-size
* Update semantics for zero-sized lists/strings
When a list/string has a 0-byte-size the base pointer is no longer
verified to be in-bounds to match the supposedly desired adapter
semantics where no trap happens because no turn of the loop happens.
This commit goes through and updates support in the various argument
passing routines to support 0-sized flags. A bit of a degenerate case
but clarified in WebAssembly/component-model#76 as intentional.