cranelift

Commit Graph

Author	SHA1	Message	Date
Trevor Elliott	4c88acbb89	Test all backends when a runtest is modified (#5872 ) * Test all backends when a runtest is modified * Check that this triggers all backend tests * Revert "Check that this triggers all backend tests" This reverts commit `1d12536d04`.	2 years ago
Jamey Sharp	5cfb461945	Only emit ISLE/egraph terms for single-value insts (#5848 ) For instructions with no results (such as branches and stores) or instructions with multiple results (such as add with carry), we have assertions checking that an optimization rule doesn't try to match on or construct such instructions. When we generate terms for matching or constructing instructions, the terms for these instructions are guaranteed to panic if they're ever used. So let's just not generate them. In the future we may wish to generate terms with different types for these instructions, to make them usable in ISLE rules for optimization that fall outside our current egraph constraints.	2 years ago
Ryan Levick	6d6bd0ea1c	Result alias for convienient use of anyhow::Error without depending on anyhow (#5853 ) * Add a Result type alias * Refer to the type in top-level docs * Use this inside the documentation for the bindgen! macro * Fix tests * Address small PR feedback * Simply re-export anyhow types	2 years ago
Jamey Sharp	7d790fcdfe	x64: Only branch once in br_table (#5850 ) This uses the `cmov`, which was previously necessary for Spectre mitigation, to clamp the table index instead of zeroing it. By then placing the default target as the last entry in the table, we can use just one branch instruction in all cases. Since there isn't a bounds-check branch any more, this sequence no longer needs Spectre mitigation. And since we don't need to be careful about preserving flags, half the instructions can be removed from this pseudoinstruction and emitted as regular instructions instead. This is a net savings of three bytes in the encoding of x64's br_table pseudoinstruction. The generated code can sometimes be longer overall because the blocks are emitted in a slightly different order. My benchmark results show a very small effect on runtime performance with this change. The spidermonkey benchmark in Sightglass runs "1.01x faster" than main by instructions retired, but with no significant difference in CPU cycles. I think that means it rarely hit the default case in any br_table instructions it executed. The pulldown-cmark benchmark in Sightglass runs "1.01x faster" than main by CPU cycles, but main runs "1.00x faster" by instructions retired. I think that means this benchmark hit the default case a significant amount of the time, so it executes a few more instructions per br_table, but maybe the branches were predicted better.	2 years ago
Trevor Elliott	c5d9d5b10f	Remove module-level code generation tests (#5870 ) * Remove module-level code generation tests * Add cold block tests for each backend * Better cold block tests	2 years ago
Alex Crichton	f91640ffab	Fix a panic due to a race in unpark and park (#5871 ) * Remove globals from parking spot tests Use `std:🧵:scope` to keep everything local to just the tests. * Fix a panic due to a race in `unpark` and `park` This commit fixes a panic in the `ParkingSpot` implementation where an `unpark` signal may not get acknowledged when a waiter times out, causing the waiter to remove itself from the internal map but panic thinking that it missed an unpark signal. The fix in this commit is to consume unpark signals when a timeout happens. This can lead to another possible race I've detailed in the comments which I believe is allowed by the specification of park/unpark in wasm. * Update crates/runtime/src/parking_spot.rs Co-authored-by: Andrew Brown <andrew.brown@intel.com> --------- Co-authored-by: Andrew Brown <andrew.brown@intel.com>	2 years ago
Alex Crichton	3fc3bc9ec8	x64: Fill out more AVX instructions (#5849 ) * x64: Fill out more AVX instructions This commit fills out more AVX instructions for SSE counterparts currently used. Many of these instructions do not benefit from the 3-operand form that AVX uses but instead benefit from being able to use `XmmMem` instead of `XmmMemAligned` which may be able to avoid some extra temporary registers in some cases. * Review comments	2 years ago
Trevor Elliott	8abfe928d6	Reuse the DominatorTree postorder travesal in BlockLoweringOrder (#5843 ) * Rework the blockorder module to reuse the dom tree's cfg postorder * Update domtree tests * Treat br_table with an empty jump table as multiple block exits * Bless tests * Change branch_idx to succ_idx and fix the comment	2 years ago
Ulrich Weigand	4314210162	s390x: Fix implementation of {s,u}{min,max} (#5864 ) When expanding a min/max operation to a pair of icmp + select, do not attempt to expand the input value operands twice, as this might fail with memory operands. Fixes https://github.com/bytecodealliance/wasmtime/issues/5859.	2 years ago
Afonso Bordado	fc080c739e	fuzzgen: Add `AtomicRMW` (#5861 )	2 years ago
Ulrich Weigand	9719147f91	s390x: Fix integer overflow during negation (#5866 ) Use wrapping_neg in i{64,32,16}_from_negated_value to avoid Rust aborts due to integer overflow. The resulting INT_MIN is already handled correctly in subsequent operations. Fixes https://github.com/bytecodealliance/wasmtime/issues/5863.	2 years ago
Alex Crichton	761e44bd36	Fix running WASI tests in isolation (#5865 ) Closes #5860	2 years ago
Noa	4f7746da60	Have StoreContext::data return &'a T (#5855 )	2 years ago
Andrew Brown	f6b16a7178	wasi-threads: fix use of `wait` in test (#5858 ) As @yamt points out [here], the `wait`/`notify` pairing used in this manual WAT test was not effective. The `wait` always immediately returned, meaning that the main thread essentially spins until a counter is atomically incremented. This is fine for test correctness, but was not the original intent, which was lost in a refactoring. This change uses the `$i` local to keep track of the counter value we expect to see for the `wait`, so that the `wait`/`notify` pair actually waits as expected. [here]: https://github.com/bytecodealliance/wasmtime/pull/5484#discussion_r1101200012	2 years ago
Jan-Justin van Tonder	0521155896	cranelift: Add atomic_rmw to interpreter (#5817 ) (#5856 ) As per the linked issue, atomic_rmw was implemented without specific regard for thread safety. Additionally, the relevant filetest (atomic-rmw-little.clif) was enabled and altered to fix an inccorrect call to test function `%atomic_rmw_and_i64` after setting up test function `%atomic_rmw_and_i32`.	2 years ago
Afonso Bordado	f6c6bc2155	riscv64: Improve signed and zero extend codegen (#5844 ) * riscv64: Remove unused code * riscv64: Group extend rules * riscv64: Remove more unused rules * riscv64: Cleanup existing extension rules * riscv64: Move the existing Extend rules to ISLE * riscv64: Use `sext.w` when extending * riscv64: Remove duplicate extend tests * riscv64: Use `zbb` instructions when extending values * riscv64: Use `zbkb` extensions when zero extending * riscv64: Enable additional tests for extend i128 * riscv64: Fix formatting for `Inst::Extend` * riscv64: Reverse register for pack * riscv64: Misc Cleanups * riscv64: Cleanup extend rules	2 years ago
Afonso Bordado	6e6a1034d7	riscv64: Add bitmanip extension flags (#5847 )	2 years ago
Alex Crichton	bd3dcd313d	x64: Add more `fma` instruction lowerings (#5846 ) The relaxed-simd proposal for WebAssembly adds a fused-multiply-add operation for `v128` types so I was poking around at Cranelift's existing support for its `fma` instruction. I was also poking around at the x86_64 ISA's offerings for the FMA operation and ended up with this PR that improves the lowering of the `fma` instruction on the x64 backend in a number of ways: * A libcall-based fallback is now provided for `f32x4` and `f64x2` types in preparation for eventual support of the relaxed-simd proposal. These encodings are horribly slow, but it's expected that if FMA semantics must be guaranteed then it's the best that can be done without the `fma` feature. Otherwise it'll be up to producers (e.g. Wasmtime embedders) whether wasm-level FMA operations should be FMA or multiply-then-add. * In addition to the existing `vfmadd213` instructions opcodes were added for `vfmadd132`. The `132` variant is selected based on which argument can have a sinkable load. * Any argument in the `fma` CLIF instruction can now have a `sinkable_load` and it'll generate a single FMA instruction. * All `vfnmadd*` opcodes were added as well. These are pattern-matched where one of the arguments to the CLIF instruction is an `fneg`. I opted to not add a new CLIF instruction here since it seemed like pattern matching was easy enough but I'm also not intimately familiar with the semantics here so if that's the preferred approach I can do that too.	2 years ago
Alex Crichton	d82ebcc102	x64: Enable load-coalescing for SSE/AVX instructions (#5841 ) * x64: Enable load-coalescing for SSE/AVX instructions This commit unlocks the ability to fold loads into operands of SSE and AVX instructions. This is beneficial for both function size when it happens in addition to being able to reduce register pressure. Previously this was not done because most SSE instructions require memory to be aligned. AVX instructions, however, do not have alignment requirements. The solution implemented here is one recommended by Chris which is to add a new `XmmMemAligned` newtype wrapper around `XmmMem`. All SSE instructions are now annotated as requiring an `XmmMemAligned` operand except for a new new instruction styles used specifically for instructions that don't require alignment (e.g. `movdqu`, `sd`, and `ss` instructions). All existing instruction helpers continue to take `XmmMem`, however. This way if an AVX lowering is chosen it can be used as-is. If an SSE lowering is chosen, however, then an automatic conversion from `XmmMem` to `XmmMemAligned` kicks in. This automatic conversion only fails for unaligned addresses in which case a load instruction is emitted and the operand becomes a temporary register instead. A number of prior `Xmm` arguments have now been converted to `XmmMem` as well. One change from this commit is that loading an unaligned operand for an SSE instruction previously would use the "correct type" of load, e.g. `movups` for f32x4 or `movup` for f64x2, but now the loading happens in a context without type information so the `movdqu` instruction is generated. According to [this stack overflow question][question] it looks like modern processors won't penalize this "wrong" choice of type when the operand is then used for f32 or f64 oriented instructions. Finally this commit improves some reuse of logic in the `put_in__mem` helper to share code with `sinkable_load` and avoid duplication. With this in place some various ISLE rules have been updated as well. In the tests it can be seen that AVX-instructions are now automatically load-coalesced and use memory operands in a few cases. [question]: https://stackoverflow.com/questions/40854819/is-there-any-situation-where-using-movdqu-and-movupd-is-better-than-movups * Fix tests * Fix move-and-extend to be unaligned These don't have alignment requirements like other xmm instructions as well. Additionally add some ISA tests to ensure that their output is tested. * Review comments	2 years ago
Alex Crichton	c65de1f1b1	x64: Remove conditional `SseOpcode::uses_src1` (#5842 ) This is a follow-up to comments in #5795 to remove some cruft in the x64 instruction model to ensure that the shape of an `Inst` reflects what's going to happen in regalloc and encoding. This accessor was used to handle `round`, `pextr`, and `pshufb` instructions. The `round` ones had already moved to the appropriate `XmmUnary` variant and `pshufb` was additionally moved over to that variant as well. The `pextr*` instructions got a new `Inst` variant and additionally had their constructors slightly modified to no longer require the type as input. The encoding for these instructions now automatically handles the various type-related operands through a new `SseOpcode::Pextrq` operand to represent 64-bit movements.	2 years ago
Alex Crichton	e6a5ec3fde	x64: Tidy up some handling of sinkable loads (#5840 ) This commit refactors a bit about how sinkable loads are handled in the x64 backend. The intention is to bring most handling around sinkable loads up to date with the current state of the backend since things have changed since these were originally introduced, namely automatic conversions between types in ISLE. For example the `Value` type can be automatically converted to `RegMem` to perform load sinking, but some rules are still explicitly doing matching themselves. Here I've removed explicit handling of immediates and sinkable loads when they're the right-hand-side of an operation. These cases are already handle by the "base case" when converting a `Value` to a `RegMemImm`. Instead only rules explicitly for left-hand-side immediates and sinkable loads remain. This helps cut down on the number of explicit rules needed. Additionally in the same manner that `Value` can be automatically converted to `RegMem` I've added automatic conversions from `SinkableLoad` to `RegMem` and the various other newtypes. This helps cut down a bit on rule verbosity where `sink_load_*` is largely no longer necessary.	2 years ago
Afonso Bordado	0f51338def	riscv64: Clear the top 32bits in the `br_table` index (#5831 ) We were unintentionally relying on these to be zeroed when jumping.	2 years ago
Saúl Cabrera	4d954f5c0e	winch: Add support for `<i32\|i64>.rem_*` WebAssembly instructions (#5823 ) This commit adds support for i32 and i64 remainder instructions for x64.	2 years ago
Alex Crichton	c26a65a854	x64: Add most remaining AVX lowerings (#5819 ) * x64: Add most remaining AVX lowerings This commit goes through `inst.isle` and adds a corresponding AVX lowering for most SSE lowerings. I opted to skip instructions where the SSE lowering didn't read/modify a register, such as `roundps`. I think that AVX will benefit these instructions when there's load-merging since AVX doesn't require alignment, but I've deferred that work to a future PR. Otherwise though in this PR I think all (or almost all) of the 3-operand forms of AVX instructions are supported with their SSE counterparts. This should ideally improve codegen slightly by removing register pressure and the need for `movdqa` between registers. I've attempted to ensure that there's at least one codegen test for all the new instructions. As a side note, the recent capstone integration into `precise-output` tests helped me catch a number of encoding bugs much earlier than otherwise, so I've found that incredibly useful in tests! * Move `vpinsr` instructions to their own variant Use true `XmmMem` and `GprMem` types in the instruction as well to get more type-level safety for what goes where. Remove `Inst::produces_const` accessor Instead of conditionally defining regalloc and various other operations instead add dedicated `MInst` variants for operations which are intended to produce a constant to have more clear interactions with regalloc and printing and such. * Fix tests * Register traps in `MachBuffer` for load-folding ops This adds a missing `add_trap` to encoding of VEX instructions with memory operands to ensure that if they cause a segfault that there's appropriate metadata for Wasmtime to understand that the instruction could in fact trap. This fixes a fuzz test case found locally where v8 trapped and Wasmtime didn't catch the signal and crashed the fuzzer.	2 years ago
wasmtime-publish	ad128b6811	Update release date of Wasmtime 6.0.0 (#5836 ) Co-authored-by: Wasmtime Publish <wasmtime-publish@users.noreply.github.com>	2 years ago
Afonso Bordado	1e6c94bec1	cranelift-object: Make sections read only by default (#5619 ) This changes the default section type to be `ReadOnlyDataWithRel` instead of `Data`. On COFF types the CRT initializers do not run unless their section is read only. The new SectionKind makes these sections read only for COFF and MachO, but leaves it as Writable as required by ELF.	2 years ago
Jamey Sharp	539c42e590	Audit `object` crate update to 0.30.3 (#5827 ) This audit is needed for #5619. I'm going ahead and updating Cargo.toml and Cargo.lock at the same time because no source code changes are required for this update.	2 years ago
Saúl Cabrera	7ec925122d	winch: Add support for the `<i32\|i64>.div_` instructions (#5807 ) Refactor the structure and responsibilities of `CodeGenContext` This commit refactors how the `CodeGenContext` is used throughout the code generation process, making it easier to pass it around when more flexibility is desired in the MacroAssembler to perform the lowering of certain instructions. As of this change, the responsibility of the `CodeGenContext` is to provide an interface for operations that require an orchestration between the register allocator, the value stack and function's frame. The MacroAssembler is removed from the CodeGenContext as is passed as a dependency where needed, effectly using it as an independent code generation interface only. By giving more responsibilities to the `CodeGenContext` we can clearly separate the concerns of the register allocator, which previously did more than it should (e.g. popping values and spilling). This change ultimately allows passing in the `CodeGenContext` to the `MacroAssembler` when a given instruction cannot be generically described through a common interface. Allowing each implementation to decide the best way to lower a particular instruction. * winch: Add support for the WebAssembly `<i32\|i64>.div_*` instructions Given that some architectures have very specific requirements on how to handle division, this change uses `CodeGenContext` as a dependency to the `div` MacroAssembler instruction to ensure that each implementation can decide on how to lower the division. This approach also allows -- in architectures where division can be expressed as an ordinary binary operation -- to rely on the `CodeGenContext::i32_binop` or `CodeGenContext::i64_binop` helpers.	2 years ago
Afonso Bordado	853ff787f3	fuzzgen: Refactor name and signature generation (#5764 ) * fuzzgen: Move cranelift type generation into CraneliftArbitrary * fuzzgen: Deduplicate DataValue generation * fuzzgen: Remove unused code * fuzzgen: Pass allowed function calls into `FunctionGenerator`	2 years ago
Afonso Bordado	a7bd65d116	fuzzgen: Allow inline stackprobes for riscv64 (#5822 )	2 years ago
Trevor Elliott	a139ed6d56	Fix the postorder traversal in the DominatorTree (#5821 ) Fix the postorder traversal computed by the `DominatorTree`. It was recording nodes in the wrong order depending on the order child nodes were visited. Consider the following program: ``` function %foo2(i8) -> i8 { block0(v0: i8): brif v0, block1, block2 block1: return v0 block2: jump block1 } ``` The postorder produced by the previous implementation was: ``` block2 block1 block0 ``` Which is incorrect, as `block1` is branched to by `block2`. Changing the branch order in the function would also change the postorder result, yielding the expected order with `block1` emitted first. The problem was that when pushing successor nodes onto the stack, the old implementation would also mark them SEEN. This would then prevent them from being pushed on the stack again in the future, which is incorrect as they might be visited by other nodes that have not yet been pushed. This causes nodes to potentially show up later in the postorder traversal than they should. This PR reworks the implementation of `DominatorTree::compute` to produce an order where `block1` is always returned first, regardless of the branch order in the original program. Co-authored-by: Jamey Sharp <jsharp@fastly.com>	2 years ago
Peter Huene	4fc768df36	Fix incorrect shadowing of `world` option in component bindgen macro options. (#5813 )	2 years ago
Berkus Decker	c8fa1b845f	Fix typo (#5814 )	2 years ago
Fuu	db9efcb099	Correct some spelling errors in a comment (#5812 )	2 years ago
Alex Crichton	453330b2db	x64: Add rudimentary support for some AVX instructions (#5795 ) * x64: Add rudimentary support for some AVX instructions I was poking around Spidermonkey's wasm backend and saw that the various assembler functions used are all `v`-prefixed which look like they're intended for use with AVX instructions. I looked at Cranelift and it currently doesn't have support for many AVX-based instructions, so I figured I'd take a crack at it! The support added here is a bit of a mishmash when viewed alone, but my general goal was to take a single instruction from the SIMD proposal for WebAssembly and migrate all of its component instructions to AVX. I, by random chance, picked a pretty complicated instruction of `f32x4.min`. This wasm instruction is implemented on x64 with 4 unique SSE instructions and ended up being a pretty good candidate. Further digging about AVX-vs-SSE shows that there should be two major benefits to using AVX over SSE: Primarily AVX instructions largely use a three-operand form where two input registers are operated with and an output register is also specified. This is in contrast to SSE's predominant one-register-is-input-but-also-output pattern. This should help free up the register allocator a bit and additionally remove the need for movement between registers. * As #4767 notes the memory-based operations of VEX-encoded instructions (aka AVX instructions) do not have strict alignment requirements which means we would be able to sink loads and stores into individual instructions instead of having separate instructions. So I set out on my journey to implement the instructions used by `f32x4.min`. The first few were fairly easy. The machinst backends are already of the shape "take these inputs and compute the output" where the x86 requirement of a register being both input and output is postprocessed in. This means that the `inst.isle` creation helpers for SSE instructions were already of the correct form to use AVX. I chose to add new `rule` branches for the instruction creation helpers, for example `x64_andnps`. The new `rule` conditionally only runs if AVX is enabled and emits an AVX instruction instead of an SSE instruction for achieving the same goal. This means that no lowerings of clif instructions were modified, instead just new instructions are being generated. The VEX encoding was previously not heavily used in Cranelift. The only current user are the FMA-style instructions that Cranelift has at this time. These FMA instructions have one extra operand than `vandnps`, for example, so I split the existing `XmmRmRVex` into a few more variants to fit the shape of the instructions that needed generating for `f32x4.min`. This was accompanied then with more AVX opcode definitions, more emission support, etc. Upon implementing all of this it turned out that the test suite was failing on my machine due to the memory-operand encodings of VEX instructions not being supported. I didn't explicitly add those in myself but some preexisting RIP-relative addressing was leaking into the new instructions with existing tests. I opted to go ahead and fill out the memory addressing modes of VEX encoding to get the tests passing again. All-in-all this PR adds new instructions to the x64 backend for a number of AVX instructions, updates 5 existing instruction producers to use AVX instructions conditionally, implements VEX memory operands, and adds some simple tests for the new output of `f32x4.min`. The existing runtest for `f32x.min` caught a few intermediate bugs along the way and I additionally added a plain `target x86_64` to that runtest to ensure that it executes with and without AVX to test the various lowerings. I'll also note that this, and future support, should be well-fuzzed through Wasmtime's fuzzing which may explicitly disable AVX support despite the machine having access to AVX, so non-AVX lowerings should be well-tested into the future. It's also worth mentioning that I am not an AVX or VEX or x64 expert. Implementing the memory operand part for VEX was the hardest part of this PR and while I think it should be good someone else should definitely double-check me. Additionally I haven't added many instructions to the x64 backend yet so I may have missed obvious places to tests or such, so am happy to follow-up with anything to be more thorough if necessary. Finally I should note that this is just the tip of the iceberg when it comes to AVX. My hope is to get some of the idioms sorted out to make it easier for future PRs to add one-off instruction lowerings or such. * Review feedback	2 years ago
Alex Crichton	f8ca67cdc6	Handle failures in the `determine` step (#5810 ) I saw some PRs fail this step earlier today due to rate limits but it ended up not failing the entire PR's CI due to it not being listed in the final set of dependencies, so add it there.	2 years ago
Alex Crichton	44fa189575	Run labeler triage once an hour (#5811 ) Instead of every 5 minutes since this seems to eat through the rate limits pretty quickly if it ends up running.	2 years ago
Trevor Elliott	d711872d63	Refactor collect_branches_and_targets to not need a smallvec (#5803 ) * Refactor collect_branches_and_targets to not need a smallvec Basic blocks are terminated by at most one branch instruction now, so we can use that assumption in `collect_branches_and_targets` to return the last instruction we saw instead. * Review comments	2 years ago
Chris Fallin	c7e2571866	egraphs: disable GVN of effectful idempotent ops (temporarily). (#5808 ) This is a short-term fix to the same bug that #5800 is addressing (#5796), but with less risk: it simply turns off GVN'ing of effectful but idempotent ops. Because we have an upcoming release, and this is a miscompile (albeit to do with trapping behavior), we would like to make the simplest possible fix that avoids the bug, and backport it. I will then rebase #5800 on top of a revert of this followed by the more complete fix.	2 years ago
Alex Crichton	3ce439ce57	Update PR tests slightly and when what runs where (#5805 ) * Don't run LLDB tests on PRs These take an extra minute or so, so only run them on the full test suite of a merge instead of on all PRs as well. * Add a test for the x64 isa files This guarantees that if cranelift's x64 backend is modified that the tests will be run on a PR, even if other backends were also modified.	2 years ago
Alex Crichton	65633db244	Only run deny/vet on CI if `Cargo.lock` changes (#5806 ) These mostly only validate changes to `Cargo.lock` so skip these checks by default on PRs which generally never need to trigger them. If `Cargo.lock` changes, however, then run them for PRs.	2 years ago
Alex Crichton	cae3b26623	x64: Improve codegen for vectors with constant shift amounts (#5797 ) I stumbled across this working on #5795 and figured this was a nice opportunity to improve the codegen here.	2 years ago
Alex Crichton	1efee4abdf	Update CI to use GitHub's Merge Queue (#5766 ) GitHub recently made its merge queue feature available for use in public repositories owned by organizations meaning that the Wasmtime repository is a candidate for using this. GitHub's Merge Queue feature is a system that's similar to Rust's bors integration where PRs are tested before merging and only passing PRs are merged. This implements the "not rocket science" rule where the `main` branch of Wasmtime, for example, is always tested and passes CI. This is in contrast to our current implementation of CI where PRs are merged when they pass their own CI, but the code that was tested is not guaranteed to be the state of `main` when the PR is merged, meaning that we're at risk now of a failing `main` branch despite all merged PRs being green. While this has happened with Wasmtime this is not a common occurrence, however. The main motivation, instead, to use GitHub's Merge Queue feature is that it will enable Wasmtime to greatly reduce the amount of CI running on PRs themselves. Currently the full test suite runs on every push to every PR, meaning that our workers on GitHub Actions are frequently clogged throughout weekdays and PRs can take quite some time to come back with a successful run. Through the use of a Merge Queue, however, we're able to configure only a small handful of checks to run on PRs while deferring the main body of checks to happening on the merge-via-the-queue itself. This is hoped to free up capacity on CI and overall improve CI times for Wasmtime and Cranelift developers. The implementation of all of this required quite a lot of plumbing and retooling of our CI. I've been testing this in an [external repository][testrepo] and I think everything is working now. A list of changes made in this PR are: * The `build.yml` workflow is merged back into the `main.yml` workflow as the original reason to split it out is not longer applicable (it'll run on all merges). This was also done to fit in the dependency graph of jobs of one workflow. * Publication of the `gh-pages` branch, the `dev` tag artifacts, and release artifacts have been moved to a separate `publish-artifacts.yml` workflow. This workflow runs on all pushes to `main` and all tags. This workflow no longer actually preforms any builds, however, and relies on a merge queue or similar being used for branches/tags where artifacts are downloaded from the workflow run to be uploaded. For pushes to `main` this works because a merge queue is run meaning that by the time the push happens all artifacts are ready. For release branches this is handled by.. * The `push-tag.yml` workflow is subsumed by the `main.yml` workflow. CI for a tag being pushed will upload artifacts to a release in GitHub, meaning that all builds must finish first for the commit. The `main.yml` workflow at the end now scans commits for the preexisting magical marker and pushes a tag if necessary. * CI is currently a flat list of "run all these jobs" and this is now rearchitected to a "fan out" approach where some jobs run to determine the next jobs to run which then get "joined" into a finish step. The purpose for this is somewhat nuanced and this has implications for CI runtime as well. The Merge Queue feature requires branches to be protected with "these checks must pass" and then the same checks are gates both to enter the merge queue as well as pass the merge queue. The saving grace, however, is that a "skipped" check counts as passing, meaning checks can be skipped on PRs but run to completion on the merge queue. A problem with this though is the build matrix used for tests where PRs want to only run one element of the build matrix ideally but there's no means on GitHub Actions right now for the skipped entries to show up as skipped easily (or not that I know of). This means that the "join" step serves the purpose of being the single gate for both PR and merge queue CI and there's just more inputs to it for merge queue CI. The major consequence of this decision is that GitHub's actions scheduling doesn't work out well here. Jobs are scheduled in a FIFO order meaning that the job for "ok complete the CI run" is queued up after everything else has completed, possibly after lots of other CI requests in the middle for other PRs. The hope here is that by using a merge queue we can keep CI relatively under control and this won't affect merge times too much. * All jobs in the `main.yml` workflow will not automatically cancel the entire run if they fail. Previously this fail-fast behavior was only part of the matrix runs (and just for that matrix), but this is required to make the merge queue expedient. The gate of the merge queue is the final "join" step which is only executed once all dependencies have finished. This means, for example, that if rustfmt fails quickly then the tests which take longer might run for quite awhile before the join step reports failure, meaning that the PR sits in the queue for longer than needed being tested when we know it's already going to fail. By having all jobs cancel the run this means that failures immediately bail out and mark the whole job as cancelled. * A new "determine" CI job was added to determine what CI actually needs to run. This is a "choke point" which is scheduled at the start of CI that quickly figures out what else needs to be run. This notably indicates whether large swaths of ci (the `run-full` flag) like the build matrix are executed. Additionally this dynamically calculates a matrix of tests to run based on a new `./ci/build-test-matrix.js` script. Various inputs are considered for this such as: 1. All pushes, meaning merge queue branches or release-branch merges, will run full CI. 2. PRs to release branches will run full CI. 3. PRs to `main`, the most common, determine what to run based on what's modified and what's in the commit message. Some examples for (3) above are if modifications are made to `cranelift/codegen/src/isa/` then that corresponding builder is executed on CI. If the `crates/c-api` directory is modified then the CMake-based tests are run on PRs but are otherwise skipped. Annotations in commit messages such as `prtest:` can be used to explicitly request testing. Before this PR merges to `main` would perform two full runs of CI: one on the PR itself and one on the merge to `main`. Note that the one as a merge to `main` was quite frequently cancelled due to a merge happening later. Additionally before this PR there was always the risk of a bad merge where what was merged ended up creating a `main` that failed CI to to a non-code-related merge conflict. After this PR merges to `main` will perform one full run of CI, the one as part of the merge queue. PRs themselves will perform one test job most of the time otherwise. The `main` branch is additionally always guaranteed to pass tests via the merge queue feature. For release branches, before this PR merges would perform two full builds - one for the PR and one for the merge. A third build was then required for the release tag itself. This is now cut down to two full builds, one for the PR and one for the merge. The reason for this is that the merge queue feature currently can't be used for our wildcard-based `release-` branch protections. It is now possible, however, to turn on required CI checks for the `release-` branch PRs so we can at least have a "hit the button and forget" strategy for merging PRs now. Note that this change to CI is not without its risks. The Merge Queue feature is still in beta and is quite new for GitHub. One bug that Trevor and I uncovered is that if a PR is being tested in the merge queue and a contributor pushes to their PR then the PR isn't removed from the merge queue but is instead merged when CI is successful, losing the changes that the contributor pushed (what's merged is what was tested). We suspect that GitHub will fix this, however. Additionally though there's the risk that this may increase merge time for PRs to Wasmtime in practice. The Merge Queue feature has the ability to "batch" PRs together for a merge but this is only done if concurrent builds are allowed. This means that if 5 PRs are batched together then 5 separate merges would be created for the stack of 5 PRs. If the CI for all 5 merged together passes then everything is merged, otherwise a PR is kicked out. We can't easily do this, however, since a major purpose for the merge queue for us would be to cut down on usage of CI builders meaning the max concurrency would be set to 1 meaning that only one PR at a time will be merged. This means PRs may sit in the queue for awhile since previously many `main`-based builds are cancelled due to subsequent merges of other PRs, but now they must all run to 100% completion. [testrepo]: https://github.com/bytecodealliance/wasmtime-merge-queue-testing	2 years ago
Trevor Elliott	80c147d9c0	Rework br_table to use BlockCall (#5731 ) Rework br_table to use BlockCall, allowing us to avoid adding new nodes during ssa construction to hold block arguments. Additionally, many places where we previously matched on InstructionData to extract branch destinations can be replaced with a use of branch_destination or branch_destination_mut.	2 years ago
Andrew Brown	c3c16eb207	wasi-threads: build the crate in the CLI application by default (#5782 ) This change adds the `wasmtime-wasi-threads` crate as a default crate for the CLI application. This is no change for embedders of Wasmtime: they would still have to include `wasmtime-wasi-threads` manually. Enabling the crate by default in the CLI application has several benefits, e.g., that it is simpler to experiment with and that it will be part of more test runs (and thus bugs can be discovered more quickly). Users will still have to add `--wasi-modules=experimental-wasi-threads` to enable wasi-threads on the command line.	2 years ago
Pat Hickey	d30ce3192b	wasmtime::component re-exports all Val variant types. Closes #5788 (#5790 )	2 years ago
Chris Fallin	c15c4ed23d	Cranelift: upgrade to regalloc2 0.6.1. (#5799 ) * Cranelift: upgrade to regalloc2 0.6.1. Fixes #5791 by pulling in bytecodealliance/regalloc2#113. * Add cargo-vet entry for regalloc2 0.6.1.	2 years ago
Trevor Elliott	cc073593a4	Fix block label printing in precise-output tests (#5798 ) As a follow-up to #5780, disassemble the regions identified by bb_starts, falling back on disassembling the whole buffer. This ensures that instructions like br_table that introduce a lot of constants don't throw off capstone for the remainder of the function. --------- Co-authored-by: Jamey Sharp <jamey@minilop.net>	2 years ago
Trevor Elliott	f04decc4a1	Use capstone to validate precise-output tests (#5780 ) Use the capstone library to disassemble precise-output tests, in addition to pretty-printing their vcode.	2 years ago
Afonso Bordado	eabd43a178	aarch64: Support GOT Relative relocations in PIC mode (#5550 ) * cranelift: Add `adrp` encoding to AArch64 backend * cranelift: Support GOT Symbol References in AArch64 * cranelift: Add MachO GOT relocations * cranelift: Do not mark the GOT PageOffset12 MachO relocation as relative	2 years ago

1 2 3 4 5 ...

10919 Commits (4c88acbb897c3843c5f0f11eb0f1b0729656e18b) All Branches Search

10919 Commits (4c88acbb897c3843c5f0f11eb0f1b0729656e18b)

All Branches