* Cranelift: Use a fixpoint loop to compute the best value for each eclass

  Fixes #7857

* Remove fixpoint loop early-continue optimization
* Add document describing optimization rule invariants
* Make select optimizations use subsume
* Remove invalid debug assert
* Remove now-unused methods
* Add commutative adds to cost tests

pull/7880/head
Nick Fitzgerald, 9 months ago, committed by GitHub

6 changed files with 294 additions and 67 deletions
@@ -1,5 +1,81 @@

Rules here are allowed to rewrite pure expressions arbitrarily,
using the same inputs as the original, or fewer. In other words, we
cannot pull a new eclass id out of thin air and refer to it, other
than a piece of the input or a new node that we construct; but we
can freely rewrite e.g. `x+y-y` to `x`.
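
The constraint above can be stated as a checkable property. This is a minimal sketch under stated assumptions: it uses a hypothetical toy `Expr` type in Rust, not Cranelift's IR or ISLE. It expresses the `x+y-y` to `x` rewrite as a function, then checks that the result uses only inputs the original already used.

```rust
use std::collections::BTreeSet;

/// Hypothetical toy IR (not Cranelift's): `Input` leaves stand in for
/// eclass ids that already exist.
#[derive(Clone, Debug, PartialEq, Eq)]
enum Expr {
    Input(u32),
    Add(Box<Expr>, Box<Expr>),
    Sub(Box<Expr>, Box<Expr>),
}

/// Collect the ids of every input an expression refers to.
fn inputs(e: &Expr, out: &mut BTreeSet<u32>) {
    match e {
        Expr::Input(id) => {
            out.insert(*id);
        }
        Expr::Add(a, b) | Expr::Sub(a, b) => {
            inputs(a, out);
            inputs(b, out);
        }
    }
}

/// The rewrite `(x + y) - y => x`; returns `None` when it does not apply.
fn rewrite_add_sub(e: &Expr) -> Option<Expr> {
    if let Expr::Sub(lhs, rhs) = e {
        if let Expr::Add(x, y) = &**lhs {
            if y == rhs {
                return Some((**x).clone());
            }
        }
    }
    None
}

/// The invariant from the text: the result may use the original's inputs,
/// or fewer, but never an input that was not already there.
fn uses_no_new_inputs(before: &Expr, after: &Expr) -> bool {
    let (mut b, mut a) = (BTreeSet::new(), BTreeSet::new());
    inputs(before, &mut b);
    inputs(after, &mut a);
    a.is_subset(&b)
}
```

Building the result only from pieces of the matched input is what guarantees the subset property. A rule that returned a fresh `Input(n)` would fail `uses_no_new_inputs`.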

# Rules for Writing Optimization Rules

For both correctness and compile speed, we must be careful with our rules. A lot
of it boils down to the fact that, unlike traditional e-graphs, our rules are
*directional*.

1. Rules should not rewrite to worse code: the right-hand side should be at
   least as good as the left-hand side.

   For example, the rule

       x => (add x 0)

   is disallowed, but swapping its left- and right-hand sides produces a rule
   that is allowed.

   Any kind of canonicalizing rule that intends to help subsequent rules match
   and unlock further optimizations (e.g. floating constants to the right side
   for our constant-propagation rules to match) must produce canonicalized
   output that is no worse than its noncanonical input.

   We assume this invariant as a heuristic to break ties between two
   otherwise-equal-cost expressions in various places, making up for some
   limitations of our explicit cost function.

2. Any rule whose right-hand side drops value-uses that existed in its
   left-hand side MUST use `subsume`.

   For example, the rule

       (select 1 x y) => x

   MUST use `subsume`.

   This is required for correctness because, once a value-use is removed, some
   e-nodes in the e-class are more equal than others: they are no longer
   interchangeable at every use site. There might be uses of `x` in a scope
   where `y` is not available, and so emitting `(select 1 x y)` in place of `x`
   in such cases would introduce uses of `y` where it is not defined.

3. Avoid overly general rewrites like commutativity and associativity. Instead,
   prefer targeted instances of the rewrite (for example, canonicalizing adds
   where one operand is a constant such that the constant is always the add's
   second operand, rather than general commutativity for adds) or even writing
   the "same" optimization rule multiple times.

   For example, the commutativity in the first rule in the following snippet is
   bad because it will match even when the first operand is not an add:

       ;; Commute to allow `(foo (add ...) x)`, when we see it, to match.
       (foo x y) => (foo y x)

       ;; Optimize.
       (foo x (add ...)) => (bar x)

   Better is to commute only when we know that canonicalizing in this way will
   definitely allow the subsequent optimization rule to match:

       ;; Canonicalize all adds to `foo`'s second operand.
       (foo (add ...) x) => (foo x (add ...))

       ;; Optimize.
       (foo x (add ...)) => (bar x)

   But even better in this case is to write the "same" optimization multiple
   times:

       (foo (add ...) x) => (bar x)
       (foo x (add ...)) => (bar x)

   The cost of rule-matching is amortized by the ISLE compiler, whereas the
   intermediate result of each rewrite allocates new e-nodes and requires
   storage in the dataflow graph. Therefore, additional rules are cheaper than
   additional e-nodes.

   Commutativity and associativity in particular can cause huge amounts of
   e-graph bloat.

   One day we intend to extend ISLE with built-in support for commutativity, so
   we don't need to author the redundant commutations ourselves:
   https://github.com/bytecodealliance/wasmtime/issues/6128
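
Rule 1 can be phrased as a predicate over a cost function. The minimal Rust sketch below assumes a hypothetical toy cost model in which each `Add` costs 1 and variables and constants are free. This is not Cranelift's actual cost function. Under this model, `x => (add x 0)` is rejected, while its reverse and a constant-to-the-right canonicalization are accepted.

```rust
/// Hypothetical toy expression type (not Cranelift's IR).
#[derive(Clone, Debug, PartialEq, Eq)]
enum Expr {
    Var(u32),
    Const(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Assumed cost model for illustration: each `Add` costs 1, leaves are free.
fn cost(e: &Expr) -> u32 {
    match e {
        Expr::Var(_) | Expr::Const(_) => 0,
        Expr::Add(a, b) => 1 + cost(a) + cost(b),
    }
}

/// Rule 1 as a predicate: a directional rewrite `lhs => rhs` is acceptable
/// only if `rhs` costs no more than `lhs`.
fn rule_is_acceptable(lhs: &Expr, rhs: &Expr) -> bool {
    cost(rhs) <= cost(lhs)
}

/// Convenience constructor for `(add a b)`.
fn add(a: Expr, b: Expr) -> Expr {
    Expr::Add(Box::new(a), Box::new(b))
}
```

A canonicalizing rule such as `(add 1 x) => (add x 1)` passes with equal cost on both sides. That matches the text's requirement that canonical output be "no worse" than its noncanonical input.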
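
The hazard behind rule 2 can be modeled by tracking which values each e-node uses. The `ENode` and `EClass` types in this hypothetical Rust sketch are illustrative only. The sketch also simplifies `subsume` to "the new node replaces the other members of the e-class," which only approximates ISLE's semantics. In the model, a plain union of `(select 1 x y)` with `x` leaves a member that is invalid at a use site where only `x` is defined. Subsuming keeps only `x`.

```rust
use std::collections::BTreeSet;

/// Hypothetical e-node: a label plus the set of values it uses.
#[derive(Clone, Debug, PartialEq, Eq)]
struct ENode {
    label: &'static str,
    uses: BTreeSet<&'static str>,
}

impl ENode {
    fn new(label: &'static str, uses: &[&'static str]) -> Self {
        ENode { label, uses: uses.iter().copied().collect() }
    }
}

/// Hypothetical e-class. Extraction may pick any member, so every member
/// must be valid wherever the e-class is used.
#[derive(Debug, Default)]
struct EClass {
    members: Vec<ENode>,
}

impl EClass {
    /// Plain union: the new node joins the existing members.
    fn union(&mut self, node: ENode) {
        self.members.push(node);
    }

    /// Simplified model of `subsume`: the new node replaces all other members.
    fn subsume(&mut self, node: ENode) {
        self.members.clear();
        self.members.push(node);
    }

    /// True if every member uses only values in `available`.
    fn valid_where(&self, available: &BTreeSet<&'static str>) -> bool {
        self.members.iter().all(|n| n.uses.is_subset(available))
    }
}
```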
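
Rule 3's warning that commutativity and associativity "can cause huge amounts of e-graph bloat" can be made concrete in an idealized model. Assume a fully saturated e-graph for a sum of n distinct leaves in which every commutative and associative rewrite has been applied. This is a simplification: the sketch does not model how Cranelift actually applies its rules. Each subset of two or more leaves then becomes its own e-class. Each such class holds one `add` e-node per ordered split of the subset into two nonempty halves, and the total comes to 3^n + 1 - 2^(n+1) e-nodes. The Rust sketch counts this both by enumeration and by the closed form.

```rust
/// Count `add` e-nodes in a hypothetical fully saturated e-graph for
/// x0 + x1 + ... + x(n-1) under commutativity and associativity.
/// Every subset of >= 2 leaves is an e-class with one e-node per ordered
/// (left, right) split into two nonempty parts.
fn ac_enodes_by_enumeration(n: u32) -> u64 {
    let mut total = 0u64;
    for mask in 0u64..(1u64 << n) {
        let k = mask.count_ones();
        if k >= 2 {
            // 2^k ordered splits, minus the two that leave one side empty.
            total += (1u64 << k) - 2;
        }
    }
    total
}

/// Closed form of the same sum, ordered to avoid unsigned underflow
/// for n = 0 and n = 1.
fn ac_enodes_closed_form(n: u32) -> u64 {
    3u64.pow(n) + 1 - (1u64 << (n + 1))
}
```

Under these assumptions, four leaves already yield 50 `add` e-nodes and ten leaves yield 57002. The original expression needs only n - 1.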

@@ -0,0 +1,37 @@

test optimize
set enable_verifier=true
set opt_level=speed
target x86_64

;; This test case should optimize just fine, and should definitely not produce
;; CLIF that has verifier errors like
;;
;; error: inst10 (v12 = select.f32 v11, v4, v10 ; v11 = 1): uses value arg
;; from non-dominating block4

function %foo() {
block0:
    v0 = iconst.i64 0
    v2 = f32const 0.0
    v9 = f32const 0.0
    v20 = fneg v2
    v18 = fcmp eq v20, v20
    v4 = select v18, v2, v20
    v8 = iconst.i32 0
    v11 = iconst.i32 1
    brif v0, block2, block3

block2:
    brif.i32 v8, block4(v2), block4(v9)

block4(v10: f32):
    v12 = select.f32 v11, v4, v10
    v13 = bitcast.i32 v12
    store v13, v0
    trap user0

block3:
    v15 = bitcast.i32 v4
    store v15, v0
    return
}
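
The verifier error quoted in this test is a dominance violation. This Rust sketch computes dominator sets for the function's CFG with the classic iterative data-flow formulation: Dom(entry) = {entry}, and Dom(b) = {b} plus the intersection of Dom(p) over all predecessors p. The `dominators` function and the numeric block encoding are illustrative, not Cranelift's dominator-tree API. The sketch shows that block4, which defines `v10`, does not dominate block3, where `v4` is used. On one plausible reading, consistent with rule 2 in the first hunk, this is why `(select 1 v4 v10)` cannot stand in for `v4` there.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Iterative dominator computation over an edge list of (from, to) blocks.
fn dominators(entry: u32, edges: &[(u32, u32)]) -> BTreeMap<u32, BTreeSet<u32>> {
    // Collect every block mentioned in the CFG.
    let mut blocks = BTreeSet::new();
    blocks.insert(entry);
    for &(from, to) in edges {
        blocks.insert(from);
        blocks.insert(to);
    }
    // Start from "all blocks dominate b", except at the entry.
    let mut dom: BTreeMap<u32, BTreeSet<u32>> = BTreeMap::new();
    for &b in &blocks {
        let init = if b == entry { std::iter::once(entry).collect() } else { blocks.clone() };
        dom.insert(b, init);
    }
    // Shrink the sets until nothing changes (a fixpoint).
    let mut changed = true;
    while changed {
        changed = false;
        for &b in &blocks {
            if b == entry {
                continue;
            }
            // Intersect the dominator sets of all predecessors of `b`.
            let mut acc: Option<BTreeSet<u32>> = None;
            for &(from, to) in edges {
                if to != b {
                    continue;
                }
                let pred_dom = &dom[&from];
                acc = Some(match acc {
                    None => pred_dom.clone(),
                    Some(s) => s.intersection(pred_dom).copied().collect(),
                });
            }
            let mut new = acc.unwrap_or_default();
            new.insert(b);
            if new != dom[&b] {
                dom.insert(b, new);
                changed = true;
            }
        }
    }
    dom
}
```

For this test, the edges are block0 to block2, block0 to block3, and block2 to block4. Both arms of the `brif.i32` in block2 target block4, and block3 and block4 have no successors.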