* Remove the `Paged` memory initialization variant
This commit simplifies the `MemoryInitialization` enum by removing the
`Paged` variant. The `Paged` variant was originally added for uffd, and
that support has since been removed in #4040. The variant is no longer
necessary in its own right; it only remained as an intermediate step in
building the `Static` variant of initialized memory (which
copy-on-write uses). As a result this commit largely reworks the static
memory initialization steps and folds the two methods together.
* Apply suggestions from code review
Co-authored-by: Peter Huene <peter@huene.dev>
As discussed previously, we need a way to be able to configure Wasmtime when running it in the Sightglass benchmark infrastructure. The easiest way to do this seemed to be to pass a string from Sightglass to the `bench-api` library and parse this in the same way that Wasmtime parses its CLI flags. The structure that contains these flags is `CommonOptions`, so it has been moved to its own crate to be depended on by both `wasmtime-cli` and `wasmtime-bench-api`. Also, this change adds an externally-visible function for parsing a string into `CommonOptions`, which is used for configuring an engine.
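As a rough sketch of the approach (the flag and function names here are hypothetical; the real `CommonOptions` lives in its own crate and has many more fields), the externally-visible parser splits the string into whitespace-separated arguments and feeds them through the same derive-based parser the CLI uses:
```
use structopt::StructOpt;

#[derive(StructOpt)]
struct CommonOptions {
    /// One example of a shared flag; the real struct has many more.
    #[structopt(long)]
    opt_level: Option<String>,
}

fn parse_common_options(flags: &str) -> Result<CommonOptions, structopt::clap::Error> {
    // Prepend a dummy argv[0], which the parser skips. Note that plain
    // whitespace splitting (no quote handling) is a simplification.
    let argv = std::iter::once("wasmtime").chain(flags.split_whitespace());
    CommonOptions::from_iter_safe(argv)
}
```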
In #3744, we identified that extra `mov` instructions were inserted in
between the `cmov` instructions that CLIF's `select` lowers to. The
switch to regalloc2 resolved this, and this test checks that no
intervening `mov`s are inserted. Closes #3744.
The pretty-printing had swapped dst and src2; this was introduced when
we moved to RA2 (sorry about that! IMHO we should do something to
automate the mapping between regalloc arg collection and pretty
printing/emission).
`src2` comes at the end because it has a variable number of register
mentions; this is in line with how many of the other inst formats work.
Actual emitted code was never incorrect, just the pretty-printing.
Updated test golden outputs look correct to me now, including the one
that we saw was incorrect in #3945.
This PR fixes a bug in the ISLE compiler related to rule priorities.
An important note first: the bug did not affect the correctness of the
Cranelift backends, either in theory (because the rules should be
correct when applied in any order, even contrary to the stated
priorities) or in practice (because the generated code actually does
not change at all with the DSL compiler fix, only with a separate
minimized bug example).
The issue was a simple swap of `min` for `max` (see first
commit). This is the minimal fix, I think, to get a correct
priority-trie with the minimized bug example in this commit.
However, while debugging this, I started to convince myself that the
complexity of merging multiple priority ranges using the sort of
hybrid interval tree / string-matching trie data structure was
unneeded. The original design was built with the assumption we might
have a bunch of different priority levels, and would need the
efficiency of merging where possible. But in practice we haven't used
priorities this way: the vast majority of lowering rules exist at the
default (priority 0), and just a few overrides are explicitly at prio
1, 2 or (rarely) 3.
So, it turns out to be a lot simpler to label trie edges with (prio,
symbol) rather than (prio-range, symbol), and delete the whole mess of
interval-splitting logic on insertion. It's easier (IMHO) to convince
oneself that the resulting insertion algorithm is correct.
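A minimal sketch of the simplified structure (names and types assumed for illustration, not the actual ISLE compiler internals): each trie edge carries a single `(priority, symbol)` label, so inserting a rule is a plain trie insertion with no interval splitting:
```
use std::cmp::Reverse;
use std::collections::BTreeMap;

/// Hypothetical stand-ins for the trie's match symbols and rule ids.
type Sym = u32;
type RuleId = usize;

#[derive(Default)]
struct TrieNode {
    // Keyed by (Reverse(prio), symbol) so iteration visits
    // higher-priority edges first.
    edges: BTreeMap<(Reverse<i64>, Sym), TrieNode>,
    rule: Option<RuleId>,
}

impl TrieNode {
    fn insert(&mut self, prio: i64, symbols: &[Sym], rule: RuleId) {
        let mut node = self;
        for &sym in symbols {
            // No range-splitting: just follow or create the labeled edge.
            node = node.edges.entry((Reverse(prio), sym)).or_default();
        }
        node.rule = Some(rule);
    }
}
```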
I was worried that this might impact the size of the generated Rust
code or its runtime, but in fact, to my initial surprise (though it makes
sense given the above "rarely used" factor), the generated code with
this compiler fix is *exactly the same*. I rebuilt with `--features
rebuild-isle,all-arch` but... there were no diffs to commit! This is
to me the simplest evidence that we didn't really need that
complexity.
This PR refactors the x64 backend address-mode lowering to use an
incremental-build approach, where it considers each node in a tree of
`iadd`s that feed into a load/store address and, at each step, builds
the best possible `Amode`. It will combine an arbitrary number of
constant offsets (an extension beyond the current rules), and can
capture a left-shifted (scaled) index in any position of the tree
(another extension).
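As an illustrative sketch (simplified types, not the actual backend code), the incremental approach keeps a best-so-far `Amode` and folds each node of the `iadd` tree into it, accumulating any number of constant offsets and capturing a shifted index wherever one appears:
```
struct Reg(u32);

/// The two addressing shapes we build up incrementally.
enum Amode {
    /// base + offset
    ImmReg { base: Reg, offset: i32 },
    /// base + (index << shift) + offset
    ImmRegRegShift { base: Reg, index: Reg, shift: u8, offset: i32 },
}

impl Amode {
    /// Fold another constant from the iadd tree into the offset;
    /// arbitrarily many constants can be accumulated this way.
    fn add_offset(self, c: i32) -> Amode {
        match self {
            Amode::ImmReg { base, offset } =>
                Amode::ImmReg { base, offset: offset.wrapping_add(c) },
            Amode::ImmRegRegShift { base, index, shift, offset } =>
                Amode::ImmRegRegShift { base, index, shift, offset: offset.wrapping_add(c) },
        }
    }

    /// Capture a left-shifted (scaled) index from any position in the
    /// tree, if the index slot is still free.
    fn add_shifted_index(self, index: Reg, shift: u8) -> Amode {
        match self {
            Amode::ImmReg { base, offset } =>
                Amode::ImmRegRegShift { base, index, shift, offset },
            other => other, // slot taken; caller materializes an add instead
        }
    }
}
```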
This doesn't have any measurable performance improvement on our Wasm
benchmarks in Sightglass, unfortunately, because the IR lowered from
wasm32 will do address computation in 32 bits and then `uextend` it to
add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64
bits because this loses the wraparound semantics.
(We could label adds as "expected not to overflow", and allow *those* to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this. This is `add nuw` (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)
Nevertheless, (i) this generalizes the cases we can handle, which should
be a good thing, all other things being equal (and in this case, no
compile time impact was measured); and (ii) might benefit non-Wasm
frontends.
Currently, we have partial Spectre mitigation: we protect heap accesses
with dynamic bounds checks. Specifically, we guard against errant
accesses on the misspeculated path beyond the bounds-check conditional
branch by adding a conditional move that is also dependent on the
bounds-check condition. This data dependency on the condition is not
speculated and thus will always pick the "safe" value (in the heap case,
a NULL address) on the misspeculated path, until the pipeline flushes
and recovers onto the correct path.
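In Rust-like pseudocode, the guard looks roughly like the sketch below; in the generated code the select is a `cmove`, so it is resolved by a data dependency on the bounds-check condition rather than a (speculatable) branch:
```
/// Hypothetical illustration of the guarded address computation.
fn guarded_addr(base: usize, index: usize, bound: usize, elem_size: usize) -> usize {
    let in_bounds = index < bound; // same condition the branch tests
    let addr = base.wrapping_add(index.wrapping_mul(elem_size));
    // Lowered as a conditional move: picks the "safe" NULL address
    // whenever the bounds check fails, even under misspeculation.
    if in_bounds { addr } else { 0 }
}
```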
This PR uses the same technique both for table accesses -- used to
implement Wasm tables -- and for jumptables, used to implement Wasm
`br_table` instructions.
In the case of Wasm tables, the cmove picks the table base address on
the misspeculated path. This is equivalent to reading the first table
entry. This prevents loads of arbitrary data addresses on the
misspeculated path.
In the case of `br_table`, the cmove picks index 0 on the misspeculated
path. This is safer than allowing a branch to an address loaded from an
index under misspeculation (i.e., it preserves control-flow integrity
even under misspeculation).
The table mitigation is controlled by a Cranelift setting, on by
default. The `br_table` mitigation is always on, because it is part of
the single pseudoinstruction used to lower `br_table`. In both cases,
the impact should be minimal: a single extra cmove in a (relatively)
rarely-used operation.
The table mitigation is architecture-independent (happens during
legalization); the br_table mitigation has been implemented for both x64
and aarch64. (I don't know enough about s390x to implement this
confidently there, but would happily review a PR to do the same on that
platform.)
This PR removes "argument polarity": the feature of ISLE extractors that lets them take
inputs aside from the value to be matched.
Cases that need this expressivity have been subsumed by #4072 with if-let clauses;
we can now finally remove this misfeature of the language, which has caused significant
confusion and has always felt like a bit of a hack.
This PR (i) removes the feature from the ISLE compiler; (ii) removes it from the reference
documentation; and (iii) refactors away all uses of the feature in our three existing
backends written in ISLE.
In #4072 I mistakenly put the subsection about if-let clauses in the
language doc just below the next section header, so it's in the wrong
section ("mapping to Rust"). This moves it back upward to where it
should be. Sorry about that!
While working with the runtime `Memory` object, it became clear that
some refactoring was needed. In order to implement shared memory from
the threads proposal, we must be able to atomically change the memory
size. Previously, the split into the `Memory::Static` and
`Memory::Dynamic` variants meant that any attempt to lock forced us to
duplicate logic in various places.
This change moves `enum Memory { Static..., Dynamic... }` to simply
`struct Memory(Box<dyn RuntimeLinearMemory>)`. A new type,
`ExternalMemory`, takes the place of `Memory::Static` and also
implements the `RuntimeLinearMemory` trait, allowing `Memory` to contain
the same two options as before: `MmapMemory` for `Memory::Dynamic` and
`ExternalMemory` for `Memory::Static`. To interface with the
`PoolingAllocator`, this change also required the ability to downcast to
the internal representation.
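A rough sketch of the resulting shape (trait methods abbreviated, names assumed):
```
use std::any::Any;

trait RuntimeLinearMemory: Send + Sync {
    fn byte_size(&self) -> usize;
    fn grow_to(&mut self, size: usize) -> Result<(), String>;
    // Downcasting hook, needed so e.g. the pooling allocator can
    // recover its concrete representation from the boxed trait object.
    fn as_any_mut(&mut self) -> &mut dyn Any;
}

/// One struct instead of an enum: behavior differs only by which
/// trait implementation is boxed inside.
struct Memory(Box<dyn RuntimeLinearMemory>);

struct MmapMemory { /* mmap-backed; replaces Memory::Dynamic */ }
struct ExternalMemory { /* user-supplied region; replaces Memory::Static */ }
```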
This PR adds support for `if-let` clauses, as proposed in
bytecodealliance/rfcs#21. These clauses augment the left-hand side
(pattern-matching phase) of rules in the ISLE instruction-selection DSL
with sub-patterns matching on sub-expressions. The ability to list
additional match operations to perform, out-of-line from the original
toplevel pattern, greatly simplifies some idioms. See the RFC for more
details and examples of use.
This fixes a bug where the ISLE compiler would refuse to accept
out-of-order declarations in the order of: (i) use of an implicit
conversion backed by an extern constructor; (ii) extern declaration
for that constructor.
The issue was one of phase separation: we were capturing and noting
"extern constructor" status on terms in the same pass in which we were
typechecking and resolving implicit conversions. Given this knowledge,
the fix is straightforward: externs are picked up in a prior pass.
* Update to clap 3.0
This commit migrates all CLI commands internally used in this project
from structopt/clap2 to clap 3. The intent here is to ensure that we're
using maintained versions of the dependencies as structopt and clap 2
are less maintained nowadays. Most transitions were pretty
straightforward, mostly dealing with the differences between
structopt/clap 2 and clap 3.
* Fix a number of `cargo deny` errors
This commit fixes a few errors around duplicate dependencies which
arose from the prior update to clap3. This also uses a new feature in
`deny.toml`, `skip-tree`, which allows having a bit more targeted
ignores for skips of duplicate version checks. This showed a few more
locations in Wasmtime itself where we could update some dependencies.
* Run a `cargo update` over our dependencies
This'll notably fix a `cargo audit` error where we have a pinned version
of the `regex` crate which has a CVE assigned to it.
* Update to `object` and `hashbrown` crates
Prune some duplicate versions showing up from the previous `cargo update`.
Currently, a variable can be named in two different ways in an ISLE
pattern. One can write a pattern like `(T x y)` that binds the two
args of `T` with the subpatterns `x` and `y`, each of which match
anything and capture the value as a bound variable. Or, one can write
a pattern like `(T x =x)`, where the first arg pattern `x` captures
the value in `x` and the second arg pattern `=x` matches only the same
value that was already captured.
It turns out (thanks to @fitzgen for this insight here [1]) that this
distinction can actually be inferred easily: if `x` isn't bound, then
mentioning it binds it; otherwise, it matches only the already-bound
variable. There's no concern about ordering (one mention binding
vs. the other) because (i) the value is equal either way, and (ii) the
types at both sites must be the same.
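A small sketch of the inference rule (hypothetical names, not the actual compiler code): the first mention of a variable binds it, and any later mention compiles to an equality check against the already-bound value:
```
use std::collections::HashSet;

/// What a variable mention in a pattern should compile to.
enum VarMatch {
    /// First mention: capture the matched value under this name.
    Bind(String),
    /// Later mention: require equality with the already-bound value.
    EqualTo(String),
}

fn infer_var(name: &str, bound: &mut HashSet<String>) -> VarMatch {
    // `insert` returns true only if the name was not yet bound.
    if bound.insert(name.to_string()) {
        VarMatch::Bind(name.to_string())
    } else {
        VarMatch::EqualTo(name.to_string())
    }
}
```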
This language tweak seems like it should simplify things nicely! We
can remove the `=x` syntax later if we want, but this PR doesn't do
so.
[1] https://github.com/bytecodealliance/wasmtime/pull/4071#discussion_r859111513
Also fix and extend the current implementation:
- AtomicRMWOp::Clr != AtomicRmwOp::And, as the input needs to be
  inverted first (see the sketch after this list).
- Inputs to the cmp for the RMWLoop case are sign-extended when
needed.
- Lower Xchg to Swp.
- Lower Sub to Add with a negated input.
- Added more runtests.
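For the first point, a tiny sketch of the identity involved: AArch64's `LDCLR` performs an atomic bit-clear (`mem & !src`), so implementing an atomic `And` with it requires complementing the input first.
```
// LDCLR computes `mem & !src`, so And maps to Clr with an inverted
// input: a & b == a & !(!b).
fn clr(mem: u64, src: u64) -> u64 { mem & !src }
fn and_via_clr(mem: u64, src: u64) -> u64 { clr(mem, !src) }

fn main() {
    assert_eq!(and_via_clr(0b1100, 0b1010), 0b1100 & 0b1010);
}
```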
Copyright (c) 2022, Arm Limited.
x64 backend: add lowerings with load-op-store fusion.
These lowerings use the `OP [mem], reg` forms (or in AT&T syntax, `OP
%reg, (mem)`) -- i.e., x86 instructions that load from memory, perform
an ALU operation, and store the result, all in one instruction. Using
these instruction forms, we can merge three CLIF ops together: a load,
an arithmetic operation, and a store.
The recent work in #4061 introduced a notion of "unique uses" for CLIF
values that both simplified the load-op merging rules and allowed
loads to merge in some more places.
Unfortunately there's one factor that PR didn't account for: a unique
use at the CLIF level could become a multiple-use at the VCode level,
when a lowering uses a value multiple times!
Making this less error-prone in general is hard, because we don't know
the lowering in VCode until it's emitted, so we can't ahead-of-time
know that a value will be used multiple times and prevent its
merging. But we *can* know in the lowerings themselves when we're
doing this. At least we get a panic from regalloc when we get this
wrong; no bad code (uninitialized register being read) should ever
come from a backend bug like this.
This is still a bit less than ideal, but for now the fix is: in
`cmp_and_choose` in the x64 backend (which compares values, then
picks one or the other with a cmove), explicitly put values in
registers.
Fixes #4067 (thanks @Mrmaxmeier for the report!).
* Cranelift: fix #3953: rework single/multiple-use logic in lowering.
This PR addresses the longstanding issue with loads trying to merge
into compares on x86-64, and more generally, with the lowering
framework falsely recognizing "single uses" of one op by
another (which would normally allow merging of side-effecting ops like
loads) when there is *indirect* duplication.
To fix this, we replace the direct `value_uses` count with a
transitive notion of uniqueness (not unlike Rust's `&`/`&mut` and how
a `&mut` downgrades to `&` when accessed through another `&`!). A
value is used multiple times transitively if it has multiple direct
uses, or is used by another op that is used multiple times
transitively.
The canonical example of badness is:
```
v1 := load
v2 := ifcmp v1, ...
v3 := selectif v2, ...
v4 := selectif v2, ...
```
both `v3` and `v4` effectively merge the `ifcmp` (`v2`), so even
though the use of `v1` is "unique", it is codegenned twice. This is
why we ~~can't have nice things~~ can't merge loads into
compares (#3953).
There is quite a subtle and interesting design space around this
problem and how we might solve it. See the long doc-comment on
`ValueUseState` in this PR for more justification for the particular
design here. In particular, this design deliberately simplifies a bit
relative to an "optimal" solution: some uses can *become* unique
depending on merging, but we don't design our data structures for such
updates because that would require significant extra costly
tracking (some sort of transitive refcounting). For example, in the
above, if `selectif` somehow did not merge `ifcmp`, then we would only
codegen the `ifcmp` once into its result register (and use that
register twice); then the load *is* uniquely used, and could be
merged. But that requires transitioning from "multiple use" back to
"unique use" with careful tracking as we do pattern-matching, which
I've chosen to make out-of-scope here for now. In practice, I don't
think it will matter too much (and we can always improve later).
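A condensed sketch of the lattice (the enum name is from this PR; the method names here are invented for illustration):
```
#[derive(Clone, Copy, PartialEq, Eq)]
enum ValueUseState {
    Unused,
    Once,
    Multiple,
}

impl ValueUseState {
    /// Account for one more direct use; saturates at `Multiple`.
    fn inc(self) -> Self {
        match self {
            ValueUseState::Unused => ValueUseState::Once,
            _ => ValueUseState::Multiple,
        }
    }

    /// Inherit from the instruction consuming this value: if that
    /// instruction's result is used multiple times, every operand is
    /// effectively used multiple times too (the transitive rule).
    fn inherit(self, user: Self) -> Self {
        if user == ValueUseState::Multiple {
            ValueUseState::Multiple
        } else {
            self
        }
    }
}
```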
With this PR, we can now re-enable load-op merging for compares. A
subsequent commit does this.
* Update x64 backend to allow load-op merging for `cmp`.
* Update filetests.
* Add test for cmp-mem merging on x64.
* Comment fixes.
* Rework ValueUseState analysis for better performance.
* Update s390x filetest: iadd_ifcout cannot merge loads anymore because it has multiple outputs (ValueUseState limitation)
* Address review comments.
* Update more workflows to only this repository
This adds `if: github.repository == 'bytecodealliance/wasmtime'` to a
few more workflows related to the release process which should only run
in this repository and no other (e.g. forks).
* Also only run verify-publish in the upstream repo
No need for local development to be burdened with ensuring everything
is actually publishable; that's just a concern for the main repository.
* Gate workflows which need secrets on this repository
* Fix the release process's latest step
The automated release of 0.36.0 was attempted last night but it failed
due to a [failure on CI][bad]. This failure comes about because it was
trying to change the release date of 0.35.0, which ended up not
modifying any files, so `git` failed to commit as no files were
changed.
The original bug though was that 0.35.0 was being changed instead of
0.36.0. The reason for this is that the script used
`--sort=-committerdate` to determine the latest branch. I forgot,
though, that with backports it's possible for 0.35.0 to have a more
recent commit date than 0.36.0 (as is currently the case). This commit
updates the script to perform a numerical sort outside of git to get the
latest release branch.
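A hypothetical sketch of the numerical sort, assuming `release-X.Y.Z` branch names (the actual script may differ):
```
/// Pick the newest release branch by parsed version number rather
/// than by committer date.
fn latest_release_branch(branches: &[&str]) -> Option<String> {
    branches
        .iter()
        .filter_map(|b| {
            let v = b.strip_prefix("release-")?;
            let mut parts = v.split('.').map(|p| p.parse::<u64>().ok());
            Some(((parts.next()??, parts.next()??, parts.next()??), b.to_string()))
        })
        .max_by_key(|(ver, _)| *ver)
        .map(|(_, name)| name)
}

fn main() {
    let branches = ["release-0.35.0", "release-0.36.0"];
    // 0.36.0 wins even if 0.35.0 has a more recent backport commit.
    assert_eq!(latest_release_branch(&branches).as_deref(), Some("release-0.36.0"));
}
```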
Additionally this adds in some `set -ex` commands for the shell which
should help print out commands as they're run and assist in future
debugging.
[bad]: https://github.com/bytecodealliance/wasmtime/runs/6087188708
* Replace sed with Rust
* Reduce contention on the global module rwlock
This commit intends to close #4025 by reducing contention on the global
rwlock Wasmtime has for module information during instantiation and
dropping a store. Currently registration of a module into this global
map happens during instantiation, but this can be a hot path as
embeddings may want to, in parallel, instantiate modules.
Instead this switches to a strategy of inserting into the global module
map when a `Module` is created and then removing it from the map when
the `Module` is dropped. Registration in a `Store` now preserves the
entire `Module` within the store as opposed to trying to only save it
piecemeal. In reality the only piece that wasn't saved within a store
was the `TypeTables` which was pretty inconsequential for core wasm
modules anyway.
This means that instantiation should now clone a singular `Arc` into a
`Store` per `Module` (previously it cloned two) with zero management on
the global rwlock as that happened at `Module` creation time.
Additionally dropping a `Store` again involves zero rwlock management
and only a single `Arc` drop per-instantiated module (previously it was
two).
In the process of doing this I also went ahead and removed the
`Module::new_with_name` API. This has historically been difficult to
support across the various iterations of `ModuleInner`'s internals
because it involves mutating a `Module` after it's been created. My hope
is that this API is pretty rarely used and/or isn't super important, so
it's ok to remove.
Finally this change removes some internal `Arc` layerings that are no
longer necessary, attempting to use either `T` or `&T` where possible
without dealing with the overhead of an `Arc`.
Closes #4025
* Move back to a `BTreeMap` in `ModuleRegistry`
This commit implements an optimization to help improve concurrent
instantiation of a module on many threads simultaneously. One measured
bottleneck has been the reference count modification on
`Arc<HostFunc>`. Each host function stored within a `Linker<T>` is
wrapped in an `Arc<HostFunc>` structure, and when any of those host
functions are inserted into a store the reference count is incremented.
When the store is dropped the reference count is then decremented.
This ends up meaning that when a module imports N functions it ends up
doing 2N atomic modifications over the lifetime of the instance. For
embeddings where the `Linker<T>` is rarely modified but instances are
frequently created this can be a surprising bottleneck to creating many
instances.
A change implemented here is to optimize the instantiation process when
using an `InstancePre<T>`. An `InstancePre` serves as an opportunity to
take the list of items used to instantiate a module and wrap them all up
in an `Arc<[T]>`. Everything is going to get cloned into a `Store<T>`
anyway so to optimize this the `Arc<[T]>` is cloned at the top-level and
then nothing else is cloned internally. This continues to, however,
preserve a strong reference count for all contained items to prevent
them from being deallocated.
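A simplified sketch of the idea (types invented for illustration): snapshotting the imports into one `Arc<[T]>` means instantiation performs a single atomic increment regardless of how many functions are imported, while still keeping every contained item alive:
```
use std::sync::Arc;

struct HostFunc;

enum Definition {
    HostFunc(Arc<HostFunc>),
    // ... other import kinds
}

struct InstancePre {
    /// Built once when the `InstancePre` is created.
    items: Arc<[Definition]>,
}

impl InstancePre {
    fn instantiate(&self) -> Arc<[Definition]> {
        // One atomic increment total, regardless of import count; the
        // outer Arc keeps every contained item alive for the store.
        self.items.clone()
    }
}
```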
A new variant of `FuncKind` was added for host functions which is
effectively stored via `*mut HostFunc`. This variant is unsafe to create
and manage and has been documented internally.
Performance-wise the overall impact of this change is somewhat minor.
It's already a somewhat esoteric situation for this atomic increment
and decrement to be a bottleneck, given the number of concurrent
instances being created. In
my measurements I've seen that this can reduce instantiation time by up
to 10% for a module that imports two dozen functions. For larger modules
with more imports this is expected to have a larger win.
We should still get the same amount of fuzzing using libfuzzer's mutators and
using `wasm-mutate` as a mutator now, but they can share the same corpus,
allowing mutations that one performed but the other didn't to reach new areas.
This tells Cranelift to run regalloc2's symbolic verifier on the results
of register allocation after compiling each function.
We already fuzz regalloc2 independently, but that provides coverage
using regalloc2's purpose-built (synthetic) `Function` implementation.
This fuzz target with this change, in contrast, exercises regalloc2
with the particular details of whatever code Cranelift generates.
Testing the whole pipeline together and ensuring that the register
allocation is still valid is at least as important as fuzzing regalloc2
independently, IMHO.
Fuzzed locally for a brief time (~10M inputs) to smoke-test; let's see
what oss-fuzz can find (hopefully it's boring)!
With these fixes, all this PR has to do is instantiate and run the
checker on the `regalloc2::Output`. This is off by default, and is
enabled by setting the `regalloc_checker` Cranelift option.
This restores the old functionality provided by e.g. the
`backtracking_checked` regalloc algorithm setting rather than
`backtracking` when we were still on regalloc.rs.
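Enabling it through Cranelift's settings API looks roughly like this (the setting name comes from this PR; the surrounding code is a sketch):
```
use cranelift_codegen::settings::{self, Configurable};

fn flags_with_checker() -> settings::Flags {
    let mut builder = settings::builder();
    builder
        .set("regalloc_checker", "true")
        .expect("regalloc_checker is a boolean Cranelift setting");
    settings::Flags::new(builder)
}
```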
Currently wasmtime's async tests use a mixture of `#[tokio::test]` and
`dummy_waker`. To be consistent this moves all tests possible to
`#[tokio::test]`; just two need to keep using `dummy_waker` (now
renamed to `noop_waker`) due to what they're testing.
regalloc2 is a bit pickier about critical edges than regalloc.rs was,
because of how it inserts moves. In particular, if a branch has any
arguments (e.g., a conditional branch or br_table), its successors must
all have only one predecessor, so we can do edge moves at the top of
successor blocks rather than at the end of this block. Otherwise, moves
that semantically must come after the block's last uses (the branch's
args) would be placed before it.
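Expressed as a check over a hypothetical CFG representation, the invariant is roughly:
```
/// If a block's terminator carries arguments, every successor must
/// have exactly one predecessor, so edge moves can go at the top of
/// the successor instead of before the branch's own uses.
fn check_branch_args_invariant(
    branch_has_args: bool,
    succ_pred_counts: &[usize],
) -> Result<(), String> {
    if branch_has_args && succ_pred_counts.iter().any(|&n| n != 1) {
        return Err("critical edge not split before a branch with args".into());
    }
    Ok(())
}
```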
This is almost always the case, because crit-edge splitting ensures that
if we have more than one succ, all our succs will have only one pred.
This is because branch kinds that take arguments (fixed args, not the
blockparam args) tend to have more than one successor: conditionals and
br_tables.
However, a fuzzbug recently illuminated one corner case I had missed: a
br_table can have *one* successor only, if it has a default target and
an empty table. In this case, crit-edge splitting will happily skip a
split and assume that we can insert edge moves at the end of the block
with the br_table. But this will fail.
regalloc2 explicitly checks this and bails with a panic, rather than
continue, so no miscompilation is possible; but without this fix, we
will get these panics on br_tables with empty tables.
This commit removes support for the `userfaultfd` or "uffd" syscall on
Linux. This support was originally added for users migrating from Lucet
to Wasmtime, but the recent developments of kernel-supported
copy-on-write support for memory initialization wound up being more
appropriate for these use cases than userfaultfd. The main reasons for
moving to copy-on-write initialization are:
* The `userfaultfd` feature was never necessarily intended for this
style of use case with wasm and was susceptible to subtle and rare
bugs that were extremely difficult to track down. We were never 100%
certain that there were kernel bugs related to userfaultfd but the
suspicion never went away.
* Handling faults with userfaultfd was always slow and single-threaded.
Only one thread could handle faults and traveling to user-space to
handle faults is inherently slower than handling them all in the
kernel. The single-threaded aspect in particular presented a
significant scaling bottleneck for embeddings that want to run many
wasm instances in parallel.
* One of the major benefits of userfaultfd was lazy initialization of
wasm linear memory which is also achieved with the copy-on-write
initialization support we have right now.
* One of the suspected benefits of userfaultfd was less frobbing of the
kernel vma structures when wasm modules are instantiated. Currently
the copy-on-write support has a mitigation where we attempt to reuse
the memory images where possible to avoid changing vma structures.
When comparing this to userfaultfd's performance it was found that
kernel modifications of vmas aren't a worrisome bottleneck so
copy-on-write is suitable for this as well.
Overall there are no remaining benefits that userfaultfd provides that
copy-on-write doesn't, and copy-on-write solves a major downside of
userfaultfd: the scaling issue with a single faulting thread.
Additionally copy-on-write support seems much more robust in terms of
kernel implementation since it's only using standard memory-management
syscalls which are heavily exercised. Finally copy-on-write support
provides a new bonus where read-only memory in WebAssembly can be mapped
directly to the same kernel cache page, even amongst many wasm instances
of the same module, which was never possible with userfaultfd.
In light of all this it's expected that all users of userfaultfd should
migrate to the copy-on-write initialization of Wasmtime (which is
enabled by default).
Previously, the block successor accumulation and the blockparam branch
arg setup were decoupled. The lowering backend implicitly specified
the order of successor edges via its `MachTerminator` enum on the last
instruction in the block, while the `Lower` toplevel
machine-independent driver set up blockparam branch args in the edge
order seen in CLIF.
In some cases, these orders did not match -- for example, when the
conditional branch depended on an FP condition that was implemented by
swapping taken/not-taken edges and inverting the condition code.
This PR refactors the successor handling to be centralized in `Lower`
rather than flow through the terminator `MachInst`, and adds a
successor block and its blockparam args at the same time, ensuring the
orders match.
Following the merge of regalloc2 support, this became slower because we
are stricter about the critical-edge invariant, generating a separate
edge block for every out-edge even if two or more out-edges go to the
same successor (this is significant in cases of `br_table` with many
entries having the same target block, for example).
Many of those edge blocks are empty and end up collapsed by the
MachBuffer, which leads to a large set of aliased labels.
The invariant validation will dutifully iterate over all the data
structures at every step, validating all of our conditions. But this
gets way slower in the new context, to the point that we'll probably
have some fuzz timeouts.
This was pointed out in [1] but I missed removing this in #3989. Given
that `MachBuffer` has been around for nearly two years now, has been
fuzzed continuously with the invariant validation for that time, and
also has a correctness proof in the comments, it's probably reasonable
to remove this high (recently increased) cost from the fuzzing-specific
compilation configuration.
[1]
https://github.com/bytecodealliance/wasmtime/pull/3989#discussion_r847712263
Merge Mov32 and Mov64 into a single instruction parameterized by a new
OperandSize field. Also combine the Mov[K,N,Z] into a single instruction
with a new opcode to select between the operations.
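A hypothetical sketch of the merged instruction shapes (field names invented; the actual backend types differ):
```
enum OperandSize { Size32, Size64 }
enum MoveWideOp { MovK, MovN, MovZ }

enum Inst {
    // One register move, parameterized by operand size instead of
    // separate Mov32/Mov64 variants.
    Mov { size: OperandSize, rd: u8, rm: u8 },
    // One move-wide instruction whose opcode selects MOVK/MOVN/MOVZ.
    MovWide { op: MoveWideOp, size: OperandSize, rd: u8, imm: u16, shift: u8 },
}
```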
Copyright (c) 2022, Arm Limited.