* Fix miscompile from functions mutating `VMContext`
This commit fixes a miscompilation in Wasmtime on LLVM 16 where methods
on `Instance` which mutated the state of the internal `VMContext` were
optimized to not actually mutate the state. The root cause of this issue
is a change in LLVM which takes advantage of `noalias readonly` pointers
which is how `&self` methods are translated. This means that `Instance`
methods which take `&self` but actually mutate the `VMContext` end up
being undefined behavior from LLVM's point of view, meaning that the
writes are candidates for removal.
The fix applied here is intended to be a temporary one while a more
formal fix, ideally backed by `cargo miri` verification, is implemented
on `main`. The fix here is to change the return value of
`vmctx_plus_offset` to return `*const T` instead of `*mut T`. This
caused lots of portions of the runtime code to stop compiling because
mutations were indeed happening. To cover these a new
`vmctx_plus_offset_mut` method was added which notably takes `&mut self`
instead of `&self`. This forced all callers which may mutate to reflect
the `&mut self` requirement, propagating that outwards.
This fixes the miscompilation with LLVM 16 in the immediate future and
should be at least a meager line of defense against issues like this in
the future. This is not a long-term fix, though, since `cargo miri`
still does not like what's being done in `Instance` and with
`VMContext`. That fix is likely to be more invasive, though, so it's
being deferred to later.
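As a rough illustration of the accessor split described above, the sketch below uses a hypothetical `Instance` whose inline `vmctx` array stands in for the real trailing `VMContext`. Only the names `vmctx_plus_offset` and `vmctx_plus_offset_mut` come from this commit; `get_slot` and `set_slot` are invented for the example.

```rust
/// Hypothetical stand-in for `Instance`. The real type is followed in memory
/// by a variably-sized `VMContext` rather than holding a fixed array.
struct Instance {
    vmctx: [u64; 4],
}

impl Instance {
    /// Shared access only hands out a read-only pointer.
    unsafe fn vmctx_plus_offset<T>(&self, offset: usize) -> *const T {
        unsafe { (self.vmctx.as_ptr() as *const u8).add(offset).cast() }
    }

    /// Mutation requires `&mut self`, so callers must prove unique access.
    unsafe fn vmctx_plus_offset_mut<T>(&mut self, offset: usize) -> *mut T {
        unsafe { (self.vmctx.as_mut_ptr() as *mut u8).add(offset).cast() }
    }

    /// Illustrative reader (not part of the real API).
    fn get_slot(&self, index: usize) -> u64 {
        assert!(index < self.vmctx.len());
        unsafe { *self.vmctx_plus_offset::<u64>(index * 8) }
    }

    /// Illustrative writer (not part of the real API); note the `&mut self`.
    fn set_slot(&mut self, index: usize, value: u64) {
        assert!(index < self.vmctx.len());
        unsafe { *self.vmctx_plus_offset_mut::<u64>(index * 8) = value }
    }
}
```

With this shape, a method that writes through the context cannot be declared `&self` without failing to compile, which is the "meager line of defense" mentioned above.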
* Update release notes
* Fix release dates
* Backport "Allow WASI to open directories without O_DIRECTORY" (#6163)
The `O_DIRECTORY` flag is a request that open should fail if the named
path is not a directory. Opening a path which turns out to be a
directory is not supposed to fail if this flag is not specified.
However, wasi-common required callers to use it when opening
directories.
With this PR, we always open the path the same way whether or not the
`O_DIRECTORY` flag is specified. However, after opening it, we `stat` it
to check whether it turned out to be a directory, and determine which
operations the file descriptor should support accordingly. In addition,
we explicitly check whether the precondition defined by `O_DIRECTORY` is
satisfied.
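The open-then-`stat` flow described above can be sketched with the standard library alone. This is not the wasi-common implementation: `open_path` and `Opened` are hypothetical names, and the sketch assumes a Unix-like host where `File::open` succeeds on directories.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// What the opened path turned out to be (hypothetical type).
#[derive(Debug)]
enum Opened {
    File(File),
    Dir(File),
}

/// `o_directory` models the caller having passed `O_DIRECTORY`.
fn open_path(path: &Path, o_directory: bool) -> io::Result<Opened> {
    // Open the same way whether or not the flag was given.
    let handle = File::open(path)?;
    // Then stat the open handle to learn what it actually is.
    let is_dir = handle.metadata()?.is_dir();
    // Explicitly enforce the `O_DIRECTORY` precondition.
    if o_directory && !is_dir {
        return Err(io::Error::new(io::ErrorKind::Other, "not a directory"));
    }
    Ok(if is_dir {
        Opened::Dir(handle)
    } else {
        Opened::File(handle)
    })
}
```

The returned variant is what would decide which operations the descriptor supports.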
On Windows, when opening a path which might be a directory using
`CreateFile`, cap-primitives also removes the `FILE_SHARE_DELETE` mode.
That means that if we implement WASI's `path_open` such that it always
uses `CreateFile` on Windows, for both files and directories, then
holding an open file handle prevents deletion of that file.
* Update RELEASES.md
* Fix export translation for components.
Exports in the component model cause a new index to be added to the index space
of the item being exported.
This commit updates component translation so that translation of component
export sections properly updates internal lists representing those index
spaces.
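A minimal model of the behavior being fixed, assuming a single index space tracked as a list. `IndexSpace` and its methods are hypothetical and are not the types used in Wasmtime's component translation.

```rust
/// Hypothetical model of one component index space (for example, functions).
#[derive(Default)]
struct IndexSpace {
    /// Entry `i` records which definition index `i` refers to.
    items: Vec<usize>,
    next_definition: usize,
}

impl IndexSpace {
    /// Defining a new item appends a fresh index.
    fn define(&mut self) -> u32 {
        let def = self.next_definition;
        self.next_definition += 1;
        self.items.push(def);
        (self.items.len() - 1) as u32
    }

    /// Exporting an item ALSO appends a new index that aliases the same item.
    fn export(&mut self, index: u32) -> u32 {
        let def = self.items[index as usize];
        self.items.push(def);
        (self.items.len() - 1) as u32
    }

    fn definition(&self, index: u32) -> usize {
        self.items[index as usize]
    }
}
```

If `export` did not push, every item defined after an export would be off by one relative to the indices used in the binary.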
* Code review feedback.
This commit changes the signature of the `Store::epoch_deadline_callback` to
take in `StoreContextMut` instead of a mutable reference to the store's data.
This is useful in cases in which the callback definition needs access to the
Store to be able to use other methods that take in `AsContext`/`AsContextMut`,
such as `WasmtimeBacktrace::capture`.
* Add support for generating perf maps for simple perf profiling
* add missing enum entry in C code
* bugfix: use hexa when printing the code region's length too (thanks bjorn3!)
* sanitize file name + use bufwriter
* introduce --profile CLI flag for wasmtime
* Update doc and doc comments for new --profile option
* remove redundant FromStr import
* Apply review feedback: make_line receives a Write impl, report errors
* fix tests?
* better docs
* cranelift-wasm: Add a bounds-checking optimization for dynamic memories and guard pages
This is a new special case for when we know that there are enough guard pages to
cover the memory access's offset and access size.
The precise should-we-trap condition is
    index + offset + access_size > bound
However, if we instead check only the partial condition
    index > bound
then the most out of bounds that the access can be, while that partial check
still succeeds, is `offset + access_size`.
However, when we have a guard region that is at least as large as `offset +
access_size`, we can rely on the virtual memory subsystem handling these
out-of-bounds errors at runtime. Therefore, the partial `index > bound` check is
sufficient for this heap configuration.
Additionally, this has the advantage that a series of Wasm loads that use the
same dynamic index operand but different static offset immediates -- which is a
common code pattern when accessing multiple fields in the same struct that is in
linear memory -- will all emit the same `index > bound` check, which we can GVN.
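The argument above can be checked with plain integer arithmetic. The helper names below are made up for the illustration and do not correspond to Cranelift functions; overflow is ignored by keeping the numbers small.

```rust
/// The precise condition: the access ends past the bound.
fn precise_should_trap(index: u64, offset: u64, access_size: u64, bound: u64) -> bool {
    index + offset + access_size > bound
}

/// The partial condition that is cheap to emit and easy to deduplicate.
fn partial_should_trap(index: u64, bound: u64) -> bool {
    index > bound
}

/// True if the access ends inside linear memory or its guard region, i.e. it
/// either succeeds or faults in pages the runtime knows about.
fn lands_in_memory_or_guard(
    index: u64,
    offset: u64,
    access_size: u64,
    bound: u64,
    guard_size: u64,
) -> bool {
    index + offset + access_size <= bound + guard_size
}
```

Whenever `guard_size >= offset + access_size`, every `index` that passes the partial check satisfies `lands_in_memory_or_guard`, so the precise check is not needed for safety.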
* cranelift: Add WAT tests for accessing dynamic memories with the same index but different offsets
The bounds check comparison is GVN'd but we still branch on values we should
know will always be true if we get this far in the code. These are actual `br_if`s
in the non-Spectre code and, when Spectre mitigations are enabled,
`select_spectre_guard`s that we should know will always go a certain way.
Improving the non-Spectre case is pretty straightforward: walk the dominator
tree and remember which values we've already branched on at this point, and
therefore we can simplify any further conditional branches on those same values
into direct jumps.
Improving the Spectre case requires something that is morally the same, but has
a few snags:
* We don't have actual `br_if`s to determine whether the bounds checking
condition succeeded or not. We need to instead reason about dominating
`select_spectre_guard; {load, store}` instruction pairs.
* We have to be SUPER careful about reasoning "through" `select_spectre_guard`s.
Our general rule is never to do that, since it could break the speculative
execution sandboxing that the instruction is designed for.
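For the non-Spectre case only, the dominator-tree walk could look roughly like the sketch below. It is not Cranelift code: `Block`, `Terminator`, and `simplify_branches` are hypothetical, and the sketch assumes each branch target has the branching block as its only predecessor, so that reaching the target implies the branch outcome.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Terminator {
    BrIf { cond: u32, then_block: usize, else_block: usize },
    Jump(usize),
    Return,
}

struct Block {
    term: Terminator,
    /// Children of this block in the dominator tree.
    dom_children: Vec<usize>,
}

/// Walk the dominator tree, remembering the outcome of each dominating branch.
fn simplify_branches(blocks: &mut [Block], block: usize, known: &mut HashMap<u32, bool>) {
    // A branch on an already-decided value becomes a direct jump.
    if let Terminator::BrIf { cond, then_block, else_block } = blocks[block].term {
        if let Some(&value) = known.get(&cond) {
            let target = if value { then_block } else { else_block };
            blocks[block].term = Terminator::Jump(target);
        }
    }
    let term = blocks[block].term;
    let children = blocks[block].dom_children.clone();
    for child in children {
        // Which fact, if any, holds on entry to this child?
        let fact = match term {
            Terminator::BrIf { cond, then_block, else_block } if then_block != else_block => {
                if child == then_block {
                    Some((cond, true))
                } else if child == else_block {
                    Some((cond, false))
                } else {
                    None
                }
            }
            _ => None,
        };
        let saved = fact.map(|(cond, value)| (cond, known.insert(cond, value)));
        simplify_branches(blocks, child, known);
        // Restore the map so sibling subtrees do not see this fact.
        if let Some((cond, previous)) = saved {
            match previous {
                Some(value) => {
                    known.insert(cond, value);
                }
                None => {
                    known.remove(&cond);
                }
            }
        }
    }
}
```

The Spectre case is deliberately left out, since it needs reasoning about `select_spectre_guard` pairs rather than branches.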
* Validate faulting addresses are valid to fault on
This commit adds a defense-in-depth measure to Wasmtime which is
intended to mitigate the impact of CVEs such as GHSA-ff4p-7xrq-q5r8.
Currently Wasmtime will catch `SIGSEGV` signals for WebAssembly code so
long as the instruction which faulted is an allow-listed instruction
(aka has a trap code listed for it). With the recent security issue,
however, the problem was that a wasm guest could exploit a compiler bug
to access memory outside of its sandbox. If the access was successful
there's no real way to detect that, but if the access was unsuccessful
then Wasmtime would happily swallow the `SIGSEGV` and report a nominal
trap. To embedders, this might look like nothing is going awry.
The new strategy implemented here in this commit is to attempt to be
more robust towards these sorts of failures. When a `SIGSEGV` is raised
the faulting pc is recorded, and the address of the
inaccessible location is recorded as well. After the WebAssembly stack is
unwound and control returns to Wasmtime, which has access to a `Store`,
Wasmtime will now translate this faulting address
to a wasm address. This process should be guaranteed to succeed as
WebAssembly should only be able to access a well-defined region of
memory for all linear memories in a `Store`.
If no linear memory in a `Store` could contain the faulting address,
then Wasmtime now prints a scary message and aborts the process. The
purpose of this is to catch these sorts of bugs, make them very loud
errors, and hopefully mitigate impact. This would continue to not
mitigate the impact of a guest successfully loading data outside of its
sandbox, but if a guest was doing a sort of probing strategy trying to
find valid addresses then any invalid access would turn into a process
crash which would immediately be noticed by embedders.
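The address translation step can be pictured as a range lookup over the store's linear memories. The types and the function below are hypothetical simplifications, not Wasmtime's internals; a `None` result corresponds to the case where the process would be aborted.

```rust
/// Hypothetical description of one linear memory's address range.
struct MemoryRegion {
    /// Host address of the start of linear memory.
    base: usize,
    /// Current size of the memory as seen by wasm.
    accessible_len: usize,
    /// Total reservation, including guard pages that wasm may fault on.
    reserved_len: usize,
}

/// What a trap report could carry once the fault has been attributed.
#[derive(Debug, PartialEq)]
struct WasmFault {
    memory_index: usize,
    wasm_addr: usize,
    memory_size: usize,
}

/// Translate a faulting host address into a wasm address, if any memory in
/// the store could have produced it.
fn wasm_fault(memories: &[MemoryRegion], faulting_addr: usize) -> Option<WasmFault> {
    memories.iter().enumerate().find_map(|(memory_index, m)| {
        // Addresses below `base` cannot belong to this memory.
        let wasm_addr = faulting_addr.checked_sub(m.base)?;
        (wasm_addr < m.reserved_len).then(|| WasmFault {
            memory_index,
            wasm_addr,
            memory_size: m.accessible_len,
        })
    })
}
```

A caller would treat `None` as a sandbox violation and abort loudly instead of reporting an ordinary trap.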
While I was here I went ahead and additionally took a stab at #3120.
Traps due to `SIGSEGV` will now report the size of linear memory and the
address that was being accessed in addition to the bland "access out of
bounds" error. While this is still somewhat bland in the context of a
high level source language it's hopefully at least a little bit more
actionable for some. I'll note though that this isn't a guaranteed
contextual message since only the default configuration for Wasmtime
generates `SIGSEGV` on out-of-bounds memory accesses. Dynamically
bounds-checked configurations, for example, don't do this.
Testing-wise I unfortunately am not aware of a great way to test this.
The closest equivalent would be something like an `unsafe` method
`Config::allow_wasm_sandbox_escape`. In lieu of adding tests, though, I
can confirm that during development the crashing message works just
fine as it took a while on macOS to figure out where the faulting address
was recorded in the exception information which meant I had lots of
instances of recording an address of a trap not accessible from wasm.
* Fix tests
* Review comments
* Fix compile after refactor
* Fix compile on macOS
* Fix trap test for s390x
s390x rounds faulting addresses to 4k boundaries.
* x64: Take SIGFPE signals for divide traps
Prior to this commit Wasmtime would configure `avoid_div_traps=true`
unconditionally for Cranelift. This, for the division-based
instructions, would change emitted code to explicitly trap on trap
conditions instead of letting the `div` x86 instruction trap.
There's no specific reason for Wasmtime, however, to specifically avoid
traps in the `div` instruction. This means that the extra generated
branches on x86 aren't necessary since the `div` and `idiv` instructions
already trap for similar conditions as wasm requires.
This commit instead disables the `avoid_div_traps` setting for
Wasmtime's usage of Cranelift. Subsequently the codegen rules were
updated slightly:
* When `avoid_div_traps=true`, traps are no longer emitted for `div`
instructions.
* The `udiv`/`urem` instructions now list their trap as divide-by-zero
instead of integer overflow.
* The lowering for `sdiv` was updated to still explicitly check for zero
but the integer overflow case is deferred to the instruction itself.
* The lowering of `srem` no longer checks for zero and the listed trap
for the `div` instruction is a divide-by-zero.
This means that the codegen for `udiv` and `urem` no longer have any
branches. The codegen for `sdiv` removes one branch but keeps the
zero-check to differentiate the two kinds of traps. The codegen for
`srem` removes one branch but keeps the -1 check since the semantics of
`srem` mismatch with the semantics of `idiv` with a -1 divisor
(specifically for INT_MIN).
This is unlikely to have really all that much of a speedup but was
something I noticed during #6008 which seemed like it'd be good to clean
up. Plus Wasmtime's signal handling was already set up to catch
`SIGFPE`, it was just never firing.
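For reference, the wasm-level semantics that the signed lowerings have to preserve can be written in a few lines. The `Trap` enum and function names are illustrative only and are not Cranelift or Wasmtime types.

```rust
#[derive(Debug, PartialEq)]
enum Trap {
    IntegerDivisionByZero,
    IntegerOverflow,
}

/// Signed division: two distinct traps, hence the retained zero check.
fn div_s(a: i32, b: i32) -> Result<i32, Trap> {
    if b == 0 {
        return Err(Trap::IntegerDivisionByZero);
    }
    // `checked_div` only fails here for `i32::MIN / -1`.
    a.checked_div(b).ok_or(Trap::IntegerOverflow)
}

/// Signed remainder: `i32::MIN % -1` is defined to be 0 in wasm, whereas
/// x86 `idiv` would fault, hence the retained -1 check.
fn rem_s(a: i32, b: i32) -> Result<i32, Trap> {
    if b == 0 {
        return Err(Trap::IntegerDivisionByZero);
    }
    Ok(a.wrapping_rem(b))
}
```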
* Remove the `avoid_div_traps` cranelift setting
With no known users currently, removing this should be possible and helps
simplify the x64 backend.
* x64: GC more support for avoid_div_traps
Remove the `validate_sdiv_divisor*` pseudo-instructions and clean up
some of the ISLE rules now that `div` is allowed to itself trap
unconditionally.
* x64: Store div trap code in instruction itself
* Keep divisors in registers, not in memory
Don't accidentally fold multiple traps together
* Handle EXC_ARITHMETIC on macos
* Update emit tests
* Update winch and tests
Takes the approach described in #6004, but also creates a wrapper for the monotonic time that encapsulates the `creation_time` field as well, since they logically belong together and are always used together.
This makes it easier to configure `WasiCtx` with custom clocks as well as disable them for security or determinism reasons.
Closes #6004.
Similar to the `--trap-unknown-imports` option, which defines unknown function
imports with functions that trap when called, this new
`--default-values-unknown-imports` option defines unknown function imports with
a function that returns the default values for the result types (either zero or
null depending on the value type).
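The "zero or null" rule can be sketched as a mapping from result types to values. The `ValType` and `Val` enums below are simplified stand-ins defined for this example, not the `wasmtime` crate's types.

```rust
/// Simplified stand-in for a wasm value type.
#[derive(Clone, Copy)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
    FuncRef,
    ExternRef,
}

/// Simplified stand-in for a wasm value.
#[derive(Debug, PartialEq)]
enum Val {
    I32(i32),
    I64(i64),
    F32(f32),
    F64(f64),
    NullFuncRef,
    NullExternRef,
}

/// Numeric types default to zero, reference types to null.
fn default_value(ty: ValType) -> Val {
    match ty {
        ValType::I32 => Val::I32(0),
        ValType::I64 => Val::I64(0),
        ValType::F32 => Val::F32(0.0),
        ValType::F64 => Val::F64(0.0),
        ValType::FuncRef => Val::NullFuncRef,
        ValType::ExternRef => Val::NullExternRef,
    }
}

/// What a stubbed-out import would return for a given result signature.
fn default_results(results: &[ValType]) -> Vec<Val> {
    results.iter().copied().map(default_value).collect()
}
```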
This commit fixes a few minor issues that Nick and I ran into walking
through some code with the `wasmtime explore` command:
* When a new function is reached the address map iterator is advanced
past the prior function to avoid accidentally attributing instructions
across functions.
* A `<` comparison was changed to `<=` to fix some off-by-one
attributions from instructions to wasm instructions.
* The `skipdata` option is enabled in Capstone to avoid truncating
AArch64 disassemblies too early.
This implements Godbolt Compiler Explorer-like functionality for Wasmtime and
Cranelift. Given a Wasm module, it compiles the module to native code and then
writes a standalone HTML file that gives a split pane view between the WAT and
ASM disassemblies.
Maps to the corresponding `wasmtime::Config` option. The motivation here
is largely completeness and was something I was looking into with the
failures in #5970
This will allow us to build developer tools for Wasmtime and Cranelift like WAT
and asm side-by-side viewers (a la Godbolt).
These are not proper public APIs, so they are marked `doc(hidden)` and have
comments saying they are only for use within this repo's workspace.
This follows the same strategy pioneered by the `wit-bindgen` guest Rust
bindgen which keeps track of the latest name of an interface for how to
refer to an interface.
Closes #5961
I was debugging [an issue] recently where it appears that the underlying
cause was a discrepancy in the size/align of a WIT type between Wasmtime
and `wit-parser`. This commit adds compile-time assertions that the size
of a WIT type is the same with `wit-parser` as it is in Wasmtime since
the two have different systems to calculate the size of a type. The hope
is that this will head off any future issues if they crop up.
[an issue]: https://github.com/bytecodealliance/wit-bindgen/issues/526
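The general technique is a `const` assertion that fails the build when two independently computed layouts disagree. In the sketch below the "other" layout is hard-coded as `EXPECTED_SIZE` and `EXPECTED_ALIGN`, which are assumptions standing in for values that would come from `wit-parser`; `Record` is a hypothetical type.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical Rust-side representation of a WIT record `{ a: u8, b: u32 }`.
#[repr(C)]
struct Record {
    a: u8,
    b: u32,
}

// Layout as computed by the other system (hard-coded for this sketch).
const EXPECTED_SIZE: usize = 8;
const EXPECTED_ALIGN: usize = 4;

// Evaluated at compile time: a mismatch is a build error, not a runtime bug.
const _: () = assert!(size_of::<Record>() == EXPECTED_SIZE);
const _: () = assert!(align_of::<Record>() == EXPECTED_ALIGN);
```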
* Enable the native target by default in winch
Match cranelift-codegen's build script where if no architecture is
explicitly enabled then the host architecture is implicitly enabled.
* Refactor Cranelift's ISA builder to share more with Winch
This commit refactors the `Builder` type to have a type parameter
representing the finished ISA with Cranelift and Winch having their own
typedefs for `Builder` to represent their own builders. The intention is
to use this shared functionality to produce more shared code between the
two codegen backends.
* Moving compiler shared components to a separate crate
* Restore native flag inference in compiler building
This fixes an oversight from the previous commits to use
`cranelift-native` to infer flags for the native host when using default
settings with Wasmtime.
* Move `Compiler::page_size_align` into wasmtime-environ
The `cranelift-codegen` crate doesn't need this and winch wants the same
implementation, so shuffle it around so everyone has access to it.
* Fill out `Compiler::{flags, isa_flags}` for Winch
These are easy enough to plumb through with some shared code for
Wasmtime.
* Plumb the `is_branch_protection_enabled` flag for Winch
Just forwarding an isa-specific setting accessor.
* Moving executable creation to shared compiler crate
* Adding builder back in and removing from shared crate
* Refactoring the shared pieces for the `CompilerBuilder`
I decided to move a couple things around from Alex's initial changes.
Instead of having the shared builder do everything, I went back to
having each compiler have a distinct builder implementation. I
refactored most of the flag setting logic into a single shared location,
so we can still reduce the amount of code duplication.
With them being separate, we don't need to maintain things like
`LinkOpts` which Winch doesn't currently use. We also have an avenue to
error when certain flags are sent to Winch if we don't support them. I'm
hoping this will make things more maintainable as we build out Winch.
I'm still unsure about keeping everything shared in a single crate
(`cranelift_shared`). It's starting to feel like this crate is doing too
much, which makes it difficult to name. There does seem to be a need for
two distinct abstractions: creating the final executable and the handling
of shared/ISA flags when building the compiler. I could make them into
two separate crates, but there doesn't seem to be enough there yet to
justify it.
* Documentation updates, and renaming the finish method
* Adding back in a default temporarily to pass tests, and removing some unused imports
* Fixing winch tests with wrong method name
* Removing unused imports from codegen shared crate
* Apply documentation formatting updates
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Adding back in cranelift_native flag inferring
* Adding new shared crate to publish list
* Adding write feature to pass cargo check
---------
Co-authored-by: Alex Crichton <alex@alexcrichton.com>
Co-authored-by: Saúl Cabrera <saulecabrera@gmail.com>
* Initial support for the Relaxed SIMD proposal
This commit adds initial scaffolding and support for the Relaxed SIMD
proposal for WebAssembly. Codegen is supported on the x64 and
AArch64 backends at this time.
The purpose of this commit is to get all the boilerplate out of the way
in terms of plumbing through a new feature, adding tests, etc. The tests
are copied from the upstream repository at this time while the
WebAssembly/testsuite repository hasn't been updated.
A summary of the changes made in this commit:
* Lowerings for all relaxed simd opcodes have been added, currently all
exhibiting deterministic behavior. This means that few lowerings are
optimal on the x86 backend, but on the AArch64 backend, for example,
all lowerings should be optimal.
* Support is added to codegen to, eventually, conditionally generate
different code based on input codegen flags. This is intended to
enable codegen to more efficient instructions on x86 by default, for
example, while still allowing embedders to force
architecture-independent semantics and behavior. One good example of
this is the `f32x4.relaxed_fmadd` instruction which when deterministic
forces the `fma` instruction, but otherwise if the backend doesn't
have support for `fma` then intermediate operations are performed
instead.
* Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the
x86 backend as they're now exercised by the deterministic lowerings of
relaxed simd instructions.
* Sample codegen tests were added for x86 and aarch64 for some relaxed
simd instructions.
* Wasmtime embedder support for the relaxed-simd proposal and forcing
determinism have been added to `Config` and the CLI.
* Support has been added to the `*.wast` runtime execution for the
`(either ...)` matcher used in the relaxed-simd proposal.
* Tests for relaxed-simd are run both with a default `Engine` as well as
a "force deterministic" `Engine` to test both configurations.
* All tests from the upstream repository were copied into Wasmtime.
These tests should be deleted when WebAssembly/testsuite is updated.
* x64: Add x86-specific lowerings for relaxed simd
This commit builds on the prior commit and adds an array of `x86_*`
instructions to Cranelift which have semantics that match their
corresponding x86 equivalents. Translation for relaxed simd is then
additionally updated to conditionally generate different CLIF for
relaxed simd instructions depending on whether the target is x86 or not.
This means that for AArch64 no changes are made but for x86 most relaxed
instructions now lower to some x86-equivalent with slightly different
semantics than the "deterministic" lowering.
* Add libcall support for fma to Wasmtime
This will be required to implement the `f32x4.relaxed_madd` instruction
(and others) when an x86 host doesn't specify the `has_fma` feature.
* Ignore relaxed-simd tests on s390x and riscv64
* Enable relaxed-simd tests on s390x
* Update cranelift/codegen/meta/src/shared/instructions.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Add a FIXME from review
* Add notes about deterministic semantics
* Don't default `has_native_fma` to `true`
* Review comments and rebase fixes
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
This notably updates `wasmparser` for updates to the relaxed-simd
proposal and an implementation of the function-references proposal.
Additionally there are some minor bug fixes being picked up for WIT and
the component model.
* Change the name of wit-bindgen's host implementation traits.
Instead of naming the host implementation trait something like
`wasi_filesystem::WasiFilesystem`, name it `wasi_filesystem::Host`, and
avoid using the identifier `Host` in other places.
This fixes a collision when generating bindings for the current
wasi-clock API, which contains an interface `wall-clock` which contains
a type `wall-clock`, which created a naming collision on the name
`WallClock`.
* Update tests to use the new trait name.
* Fix one more.
* Add the new test interface to the simple-wasi world.
Early on in WASI, we weren't sure whether we should allow preopens to be
closed, so conservatively, we disallowed them. Among other things, this
protected assumptions in wasi-libc that it can hold onto preopen file
descriptors and rely on them always being open.
However now, I think it makes sense to relax this restriction. wasi-libc
itself doesn't expose the preopen file descriptors, so users shouldn't
ever be closing them naively, unless they have wild closes. And
toolchains other than wasi-libc may want to close preopens as a way to
drop privileges once the main file handles are opened.
* Add a Result type alias
* Refer to the type in top-level docs
* Use this inside the documentation for the bindgen! macro
* Fix tests
* Address small PR feedback
* Simply re-export anyhow types
* Remove globals from parking spot tests
Use `std::thread::scope` to keep everything local to just the tests.
* Fix a panic due to a race in `unpark` and `park`
This commit fixes a panic in the `ParkingSpot` implementation where an
`unpark` signal may not get acknowledged when a waiter times out,
causing the waiter to remove itself from the internal map but panic
thinking that it missed an unpark signal.
The fix in this commit is to consume unpark signals when a timeout
happens. This can lead to another possible race I've detailed in the
comments which I believe is allowed by the specification of park/unpark
in wasm.
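The idea of consuming a pending signal on timeout can be shown with a single-slot sketch built on `Mutex` and `Condvar`. This is far simpler than the real `ParkingSpot`, which tracks many waiters keyed by address; `Spot` and `WaitResult` are hypothetical names.

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum WaitResult {
    Ok,
    TimedOut,
}

/// A single parking slot; the `bool` records a not-yet-consumed unpark.
#[derive(Default)]
struct Spot {
    notified: Mutex<bool>,
    cvar: Condvar,
}

impl Spot {
    fn unpark(&self) {
        *self.notified.lock().unwrap() = true;
        self.cvar.notify_one();
    }

    fn park(&self, timeout: Duration) -> WaitResult {
        let guard = self.notified.lock().unwrap();
        let (mut notified, _timeout_result) = self
            .cvar
            .wait_timeout_while(guard, timeout, |notified| !*notified)
            .unwrap();
        // Even if the wait timed out, a signal may have raced in. Consume it
        // here instead of treating the leftover signal as a logic error.
        if *notified {
            *notified = false;
            WaitResult::Ok
        } else {
            WaitResult::TimedOut
        }
    }
}
```

The flag is inspected rather than the timeout result, so a signal that arrives at the same moment as the timeout is reported as a wakeup and does not leak into a later `park`.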
* Update crates/runtime/src/parking_spot.rs
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
---------
Co-authored-by: Andrew Brown <andrew.brown@intel.com>
* Update world-selection in `bindgen!` macro
Inspired by bytecodealliance/wit-bindgen#494 specifying a world or
document to bindgen is now optional as it's inferred if there's only one
`default world` in a package's documents.
* Add cargo-vet entry
This commit fixes a panic related to type imports where an import of a
type didn't correctly declare the new type index on the Wasmtime side of
things. Additionally this plumbs more support throughout Wasmtime to
support type imports, namely that they do not need to be supplied
through a `Linker`. This additionally implements a feature where empty
instances, even transitively, do not need to be supplied by a Wasmtime
embedder. This means that instances which only have types, for example,
do not need to be supplied into a `Linker` since no runtime information
for them is required anyway.
Closes #5775
This works around a `rustc` bug where compiling with LTO
will sometimes strip out some of the trampoline entrypoint
symbols resulting in a linking failure.
* Update wasm-tools crates
Pulls in a new component binary format which should hopefully be the
last update for a while.
* Update cargo vet configuration
* Add support for WASI sockets to C API
Add support for WASI sockets in the C API by adding a new API to handle
preopening sockets for clients. This uses HashMap instead of Vec for
preopened sockets to identify if caller has called in more than once
with the same FD number. If so, then we return false so caller is given
hint that they are attempting to overwrite an already existing socket
FD.
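The duplicate detection that motivates `HashMap` over `Vec` can be sketched as follows. `PreopenedSockets` is a hypothetical container, generic over the socket type so the example does not need to open real sockets; it is not the C API's actual data structure.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical registry of preopened sockets keyed by FD number.
struct PreopenedSockets<S> {
    by_fd: HashMap<u32, S>,
}

impl<S> PreopenedSockets<S> {
    fn new() -> Self {
        Self { by_fd: HashMap::new() }
    }

    /// Returns `false`, leaving the existing entry untouched, if `fd` was
    /// already registered. The boolean is what a C caller would see.
    fn preopen(&mut self, fd: u32, socket: S) -> bool {
        match self.by_fd.entry(fd) {
            Entry::Occupied(_) => false,
            Entry::Vacant(slot) => {
                slot.insert(socket);
                true
            }
        }
    }
}
```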
* Apply suggestions from code review
Co-authored-by: Peter Huene <peter@huene.dev>
* s/stdlistener/listener/
---------
Co-authored-by: Peter Huene <peter@huene.dev>
* wasi-threads: fix import name
As @TerrorJack pointed out in #5484, that PR implements an older
name--`thread_spawn`. This change uses the now-official name from the
specification--`thread-spawn`.
* fix: update name in test
At some point what is now `funcref` was called `anyfunc` and the spec changed,
but we didn't update our internal names. This does that.
Co-authored-by: Jamey Sharp <jsharp@fastly.com>
This commit includes a set of changes that add initial support for `wasi-threads` to Wasmtime:
* feat: remove mutability from the WasiCtx Table
This patch adds interior mutability to the WasiCtx Table and the Table elements.
Major pain points:
* `File` only needs `RwLock<cap_std::fs::File>` to implement
`File::set_fdflags()` on Windows, because of [1]
* Because `File` needs a `RwLock` and `RwLock*Guard` cannot
be held across an `.await`, the `async` from
`async fn num_ready_bytes(&self)` had to be removed
* Because `File` needs a `RwLock` and `RwLock*Guard` cannot
be dereferenced in `pollable`, the signature of
`fn pollable(&self) -> Option<rustix::fd::BorrowedFd>`
changed to `fn pollable(&self) -> Option<Arc<dyn AsFd + '_>>`
[1] da238e324e/src/fs/fd_flags.rs (L210-L217)
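A minimal sketch of the interior-mutability pattern, assuming a table that hands out `Arc`s to type-erased entries. This `Table` is a simplified, hypothetical model and not the wasi-common type; entries are never removed here, so the next descriptor is just the current length.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical table whose methods all take `&self`, so it can be shared
/// between threads without an outer `&mut`.
struct Table {
    entries: RwLock<HashMap<u32, Arc<dyn Any + Send + Sync>>>,
}

impl Table {
    fn new() -> Self {
        Self { entries: RwLock::new(HashMap::new()) }
    }

    /// Insert through a shared reference; the lock provides the mutability.
    fn push(&self, value: Arc<dyn Any + Send + Sync>) -> u32 {
        let mut entries = self.entries.write().unwrap();
        let fd = entries.len() as u32;
        entries.insert(fd, value);
        fd
    }

    /// Look up an entry and downcast it to the expected concrete type.
    fn get<T: Any + Send + Sync>(&self, fd: u32) -> Option<Arc<T>> {
        let entry = self.entries.read().unwrap().get(&fd)?.clone();
        entry.downcast::<T>().ok()
    }
}
```

Because `get` returns an owned `Arc`, no lock guard escapes the method, which sidesteps the guard-across-`.await` problem listed above.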
* wasi-threads: add an initial implementation
This change is a first step toward implementing `wasi-threads` in
Wasmtime. We may find that it has some missing pieces, but the core
functionality is there: when `wasi::thread_spawn` is called by a running
WebAssembly module, a function named `wasi_thread_start` is found in the
module's exports and called in a new instance. The shared memory of the
original instance is reused in the new instance.
This new WASI proposal is in its early stages and details are still
being hashed out in the [spec] and [wasi-libc] repositories. Due to its
experimental state, the `wasi-threads` functionality is hidden behind
both a compile-time and runtime flag: one must build with `--features
wasi-threads` but also run the Wasmtime CLI with `--wasm-features
threads` and `--wasi-modules experimental-wasi-threads`. One can
experiment with `wasi-threads` by running:
```console
$ cargo run --features wasi-threads -- \
--wasm-features threads --wasi-modules experimental-wasi-threads \
<a threads-enabled module>
```
Threads-enabled Wasm modules are not yet easy to build. Hopefully this
is resolved soon, but in the meantime see the use of
`THREAD_MODEL=posix` in the [wasi-libc] repository for some clues on
what is necessary. Wiggle complicates things by requiring the Wasm
memory to be exported with a certain name and `wasi-threads` also
expects that memory to be imported; this build-time obstacle can be
overcome with the `--import-memory --export-memory` flags only available
in the latest Clang tree. Due to all of this, the included tests are
written directly in WAT--run these with:
```console
$ cargo test --features wasi-threads -p wasmtime-cli -- cli_tests
```
[spec]: https://github.com/WebAssembly/wasi-threads
[wasi-libc]: https://github.com/WebAssembly/wasi-libc
This change does not protect the WASI implementations themselves from
concurrent access. This is already complete in previous commits or left
for future commits in certain cases (e.g., wasi-nn).
* wasi-threads: factor out process exit logic
As is being discussed [elsewhere], either calling `proc_exit` or
trapping in any thread should halt execution of all threads. The
Wasmtime CLI already has logic for adapting a WebAssembly error code to
a code expected in each OS. This change factors out this logic to a new
function, `maybe_exit_on_error`, for use within the `wasi-threads`
implementation.
This will work reasonably well for CLI users of Wasmtime +
`wasi-threads`, but embedders will want something better in the future:
when a `wasi-threads` thread fails, they may not want their application
to exit. Handling this is tricky, because it will require cancelling the
threads spawned by the `wasi-threads` implementation, something that is
not trivial to do in Rust. With this change, we defer that work until
later in order to provide a working implementation of `wasi-threads` for
experimentation.
[elsewhere]: https://github.com/WebAssembly/wasi-threads/pull/17
* review: work around `fd_fdstat_set_flags`
In order to make progress with wasi-threads, this change temporarily
works around limitations induced by `wasi-common`'s
`fd_fdstat_set_flags` to allow `&mut self` use in the implementation.
Eventual resolution is tracked in
https://github.com/bytecodealliance/wasmtime/issues/5643. This change
makes several related helper functions (e.g., `set_fdflags`) take `&mut
self` as well.
* test: use `wait`/`notify` to improve `threads.wat` test
Previously, the test simply executed in a loop for some hardcoded number
of iterations. This change uses `wait` and `notify` and atomic
operations to keep track of when the spawned threads are done and join
on the main thread appropriately.
* various fixes and tweaks due to the PR review
---------
Signed-off-by: Harald Hoyer <harald@profian.com>
Co-authored-by: Harald Hoyer <harald@profian.com>
Co-authored-by: Alex Crichton <alex@alexcrichton.com>