* egraph support: rewrite to work in terms of CLIF data structures.

This work rewrites the "egraph"-based optimization framework in Cranelift to operate on aegraphs (acyclic egraphs) represented in the CLIF itself, rather than as a separate data structure to which and from which we translate the CLIF.

The basic idea is to add a new kind of value, a "union", that is like an alias but refers to two other values rather than one. This allows us to represent an eclass of enodes (values) as a tree. The union node allows a value to have *multiple representations*: either constituent value could be used, and (in well-formed CLIF produced by correct optimization rules) they must be equivalent.

Like the old egraph infrastructure, we take advantage of acyclicity and eager rule application to do optimization in a single pass. Like before, we integrate GVN (during the optimization pass) and LICM (during elaboration). Unlike the old egraph infrastructure, everything stays in the DataFlowGraph. "Pure" enodes are represented as instructions that have values attached, but that are not placed into the function layout. When entering "egraph" form, we remove them from the layout while optimizing. When leaving "egraph" form, during elaboration, we can place an instruction back into the layout the first time we elaborate the enode; if we elaborate it more than once, we clone the instruction.

The implementation performs two passes overall:

- One, a forward pass in RPO (to see defs before uses), that (i) removes "pure" instructions from the layout and (ii) optimizes as it goes. As before, we eagerly optimize, so we form the entire union of optimized forms of a value before we see any uses of that value. This lets us rewrite uses to use the most "up-to-date" form of the value and canonicalize and optimize that form. The eager rewriting and acyclic representation make each other work: we could not eagerly rewrite if there were cycles, and acyclicity does not miss optimization opportunities only because the first time we introduce a value, we immediately produce its "best" form. This design choice is also what allows us to avoid the "parent pointers" and fixpoint loop of traditional egraphs. This forward optimization pass keeps a scoped hashmap to "intern" nodes (thus performing GVN), and also interleaves on a per-instruction level with alias analysis. The interleaving allows alias analysis to see the most optimized form of each address (so it can see equivalences), and allows the next value to see any equivalences (reuses of loads or stored values) that alias analysis uncovers.

- Two, a forward pass in domtree preorder that "elaborates" pure enodes back into the layout, possibly in multiple places if needed. This pass tracks the loop nest and hoists nodes as needed, performing LICM as it goes. Note that by doing this in forward order, we avoid the "fixpoint" that traditional LICM needs: we hoist a def before its uses, so when we place a node, we place it in the right place the first time rather than moving it later.

This PR replaces the old (a)egraph implementation. It removes both the cranelift-egraph crate and the logic in cranelift-codegen that uses it.

On `spidermonkey.wasm` running a simple recursive Fibonacci microbenchmark, this work shows a 5.5% compile-time reduction and a 7.7% runtime improvement (speedup).

Most of this implementation was done in (very productive) pair programming sessions with Jamey Sharp, thus:

Co-authored-by: Jamey Sharp <jsharp@fastly.com>

* Review feedback.
* Review feedback.
* Review feedback.
* Bugfix: cprop rule: `(x + k1) - k2` becomes `x - (k2 - k1)`, not `x - (k1 - k2)`.

Co-authored-by: Jamey Sharp <jsharp@fastly.com>
Chris Fallin, 2 years ago, committed by GitHub
42 changed files with 1839 additions and 3833 deletions
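The "union" value idea in the commit message can be sketched with hypothetical, simplified types (these are not Cranelift's actual data structures): an eclass is a tree of union values, each leaf is one representation, and elaboration walks the tree to pick the lowest-cost representative.

```rust
/// Illustrative sketch of an eclass-as-tree: a value is either a
/// concrete node (with some abstract cost) or a union of two
/// equivalent representations.
#[derive(Debug)]
pub enum Value {
    /// A concrete computed value with an abstract cost.
    Node { cost: u32 },
    /// Two equivalent representations of the same value.
    Union(Box<Value>, Box<Value>),
}

/// During elaboration, pick the cheapest representation in the tree.
pub fn best_cost(v: &Value) -> u32 {
    match v {
        Value::Node { cost } => *cost,
        Value::Union(a, b) => best_cost(a).min(best_cost(b)),
    }
}
```

Because the representation is acyclic and unions are built eagerly before any use is seen, this walk terminates and can be done in a single pass.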
@@ -0,0 +1,168 @@
//! A hashmap with "external hashing": nodes are hashed or compared for
//! equality only with some external context provided on lookup/insert.
//! This allows very memory-efficient data structures where
//! node-internal data references some other storage (e.g., offsets into
//! an array or pool of shared data).

use hashbrown::raw::RawTable;
use std::hash::{Hash, Hasher};

/// Trait that allows for equality comparison given some external
/// context.
///
/// Note that this trait is implemented by the *context*, rather than
/// the item type, for somewhat complex lifetime reasons (lack of GATs
/// to allow `for<'ctx> Ctx<'ctx>`-like associated types in traits on
/// the value type).
pub trait CtxEq<V1: ?Sized, V2: ?Sized> {
    /// Determine whether `a` and `b` are equal, given the context in
    /// `self`.
    fn ctx_eq(&self, a: &V1, b: &V2) -> bool;
}

/// Trait that allows for hashing given some external context.
pub trait CtxHash<Value: ?Sized>: CtxEq<Value, Value> {
    /// Compute the hash of `value`, given the context in `self`.
    fn ctx_hash<H: Hasher>(&self, state: &mut H, value: &Value);
}

/// A null-comparator context type for underlying value types that
/// already have `Eq` and `Hash`.
#[derive(Default)]
pub struct NullCtx;

impl<V: Eq + Hash> CtxEq<V, V> for NullCtx {
    fn ctx_eq(&self, a: &V, b: &V) -> bool {
        a.eq(b)
    }
}
impl<V: Eq + Hash> CtxHash<V> for NullCtx {
    fn ctx_hash<H: Hasher>(&self, state: &mut H, value: &V) {
        value.hash(state);
    }
}

/// A bucket in the hash table.
///
/// Some performance-related design notes: we cache the hashcode for
/// speed, as this often buys a few percent speed in
/// interning-table-heavy workloads. We only keep the low 32 bits of
/// the hashcode, for memory efficiency: in common use, `K` and `V`
/// are often 32 bits also, and a 12-byte bucket is measurably better
/// than a 16-byte bucket.
struct BucketData<K, V> {
    hash: u32,
    k: K,
    v: V,
}

/// A HashMap that takes external context for all operations.
pub struct CtxHashMap<K, V> {
    raw: RawTable<BucketData<K, V>>,
}

impl<K, V> CtxHashMap<K, V> {
    /// Create an empty hashmap with pre-allocated space for the given
    /// capacity.
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            raw: RawTable::with_capacity(capacity),
        }
    }
}

fn compute_hash<Ctx, K>(ctx: &Ctx, k: &K) -> u32
where
    Ctx: CtxHash<K>,
{
    let mut hasher = crate::fx::FxHasher::default();
    ctx.ctx_hash(&mut hasher, k);
    hasher.finish() as u32
}

impl<K, V> CtxHashMap<K, V> {
    /// Insert a new key-value pair, returning the old value associated
    /// with this key (if any).
    pub fn insert<Ctx>(&mut self, k: K, v: V, ctx: &Ctx) -> Option<V>
    where
        Ctx: CtxEq<K, K> + CtxHash<K>,
    {
        let hash = compute_hash(ctx, &k);
        match self.raw.find(hash as u64, |bucket| {
            hash == bucket.hash && ctx.ctx_eq(&bucket.k, &k)
        }) {
            Some(bucket) => {
                let data = unsafe { bucket.as_mut() };
                Some(std::mem::replace(&mut data.v, v))
            }
            None => {
                let data = BucketData { hash, k, v };
                self.raw
                    .insert_entry(hash as u64, data, |bucket| bucket.hash as u64);
                None
            }
        }
    }

    /// Look up a key, returning a borrow of the value if present.
    pub fn get<'a, Q, Ctx>(&'a self, k: &Q, ctx: &Ctx) -> Option<&'a V>
    where
        Ctx: CtxEq<K, Q> + CtxHash<Q> + CtxHash<K>,
    {
        let hash = compute_hash(ctx, k);
        self.raw
            .find(hash as u64, |bucket| {
                hash == bucket.hash && ctx.ctx_eq(&bucket.k, k)
            })
            .map(|bucket| {
                let data = unsafe { bucket.as_ref() };
                &data.v
            })
    }
}

#[cfg(test)]
mod test {
    use super::*;
    use std::hash::Hash;

    #[derive(Clone, Copy, Debug)]
    struct Key {
        index: u32,
    }
    struct Ctx {
        vals: &'static [&'static str],
    }
    impl CtxEq<Key, Key> for Ctx {
        fn ctx_eq(&self, a: &Key, b: &Key) -> bool {
            self.vals[a.index as usize].eq(self.vals[b.index as usize])
        }
    }
    impl CtxHash<Key> for Ctx {
        fn ctx_hash<H: Hasher>(&self, state: &mut H, value: &Key) {
            self.vals[value.index as usize].hash(state);
        }
    }

    #[test]
    fn test_basic() {
        let ctx = Ctx {
            vals: &["a", "b", "a"],
        };

        let k0 = Key { index: 0 };
        let k1 = Key { index: 1 };
        let k2 = Key { index: 2 };

        assert!(ctx.ctx_eq(&k0, &k2));
        assert!(!ctx.ctx_eq(&k0, &k1));
        assert!(!ctx.ctx_eq(&k2, &k1));

        let mut map: CtxHashMap<Key, u64> = CtxHashMap::with_capacity(4);
        assert_eq!(map.insert(k0, 42, &ctx), None);
        assert_eq!(map.insert(k2, 84, &ctx), Some(42));
        assert_eq!(map.get(&k1, &ctx), None);
        assert_eq!(*map.get(&k0, &ctx).unwrap(), 84);
    }
}
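The "external hashing" pattern above (keys are small indices; hashing and equality go through a context that owns the actual data) can be illustrated in isolation with a hypothetical string-pool context using only the standard library; these are not the actual Cranelift types.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A context owning the real key data; "keys" elsewhere are just
/// `u32` indices into this pool, so they stay 4 bytes each.
pub struct StrPool(pub Vec<String>);

impl StrPool {
    /// Hash a key *through* the context rather than hashing the
    /// index itself: equal pooled strings hash equally.
    pub fn ctx_hash(&self, index: u32) -> u64 {
        let mut h = DefaultHasher::new();
        self.0[index as usize].hash(&mut h);
        h.finish()
    }

    /// Compare two keys through the context.
    pub fn ctx_eq(&self, a: u32, b: u32) -> bool {
        self.0[a as usize] == self.0[b as usize]
    }
}
```

This is the same memory-efficiency win the module doc describes: the table stores 4-byte indices, while the context holds the shared storage.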
@@ -0,0 +1,97 @@
//! Cost functions for egraph representation.

use crate::ir::Opcode;

/// A cost of computing some value in the program.
///
/// Costs are measured in an arbitrary unit that we represent in a
/// `u32`. The ordering is meant to be meaningful, but the value of a
/// single unit is arbitrary (and "not to scale"). We use a collection
/// of heuristics to try to make this approximation at least usable.
///
/// We start by defining costs for each opcode (see `pure_op_cost`
/// below). The cost of computing some value, initially, is the cost
/// of its opcode, plus the cost of computing its inputs.
///
/// We then adjust the cost according to loop nests: for each
/// loop-nest level, we multiply by 1024. Because we only have 32
/// bits, we limit this scaling to a loop-level of two (i.e., multiply
/// by 2^20 ~= 1M).
///
/// Arithmetic on costs is always saturating: we don't want to wrap
/// around and return to a tiny cost when adding the costs of two very
/// expensive operations. It is better to approximate and lose some
/// precision than to lose the ordering by wrapping.
///
/// Finally, we reserve the highest value, `u32::MAX`, as a sentinel
/// that means "infinite". This is separate from the finite costs and
/// not reachable by doing arithmetic on them (even when overflowing)
/// -- we saturate just *below* infinity. (This is done by the
/// `finite()` method.) An infinite cost is used to represent a value
/// that cannot be computed, or otherwise serve as a sentinel when
/// performing search for the lowest-cost representation of a value.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub(crate) struct Cost(u32);
impl Cost {
    pub(crate) fn at_level(&self, loop_level: usize) -> Cost {
        let loop_level = std::cmp::min(2, loop_level);
        let multiplier = 1u32 << ((10 * loop_level) as u32);
        Cost(self.0.saturating_mul(multiplier)).finite()
    }

    pub(crate) fn infinity() -> Cost {
        // 2^32 - 1 is, uh, pretty close to infinite... (we use `Cost`
        // only for heuristics and always saturate so this suffices!)
        Cost(u32::MAX)
    }

    pub(crate) fn zero() -> Cost {
        Cost(0)
    }

    /// Clamp this cost at a "finite" value. Can be used in
    /// conjunction with saturating ops to avoid saturating into
    /// `infinity()`.
    fn finite(self) -> Cost {
        Cost(std::cmp::min(u32::MAX - 1, self.0))
    }
}

impl std::default::Default for Cost {
    fn default() -> Cost {
        Cost::zero()
    }
}

impl std::ops::Add<Cost> for Cost {
    type Output = Cost;
    fn add(self, other: Cost) -> Cost {
        Cost(self.0.saturating_add(other.0)).finite()
    }
}

/// Return the cost of a *pure* opcode. Caller is responsible for
/// checking that the opcode came from an instruction that satisfies
/// `inst_predicates::is_pure_for_egraph()`.
pub(crate) fn pure_op_cost(op: Opcode) -> Cost {
    match op {
        // Constants.
        Opcode::Iconst | Opcode::F32const | Opcode::F64const => Cost(0),
        // Extends/reduces.
        Opcode::Uextend | Opcode::Sextend | Opcode::Ireduce | Opcode::Iconcat | Opcode::Isplit => {
            Cost(1)
        }
        // "Simple" arithmetic.
        Opcode::Iadd
        | Opcode::Isub
        | Opcode::Band
        | Opcode::BandNot
        | Opcode::Bor
        | Opcode::BorNot
        | Opcode::Bxor
        | Opcode::BxorNot
        | Opcode::Bnot => Cost(2),
        // Everything else (pure).
        _ => Cost(3),
    }
}
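The saturate-then-clamp arithmetic used by `Cost` can be exercised on bare `u32`s. This is a hypothetical standalone sketch mirroring `finite()`, `Add`, and `at_level`, not the type above itself:

```rust
/// Clamp just below the `u32::MAX` "infinity" sentinel.
pub fn finite(c: u32) -> u32 {
    c.min(u32::MAX - 1)
}

/// Saturating add that can never reach the infinity sentinel.
pub fn add_cost(a: u32, b: u32) -> u32 {
    finite(a.saturating_add(b))
}

/// Scale by 1024 per loop level, capped at level 2 (i.e., x2^20).
pub fn at_level(c: u32, loop_level: usize) -> u32 {
    let level = loop_level.min(2);
    finite(c.saturating_mul(1u32 << (10 * level)))
}
```

The key property is that ordering is preserved: two huge costs sum to "very large but still finite", never wrapping back to a small value and never colliding with the infinity sentinel.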
File diff suppressed because it is too large
@@ -1,366 +0,0 @@
//! Node definition for EGraph representation.

use super::PackedMemoryState;
use crate::ir::{Block, DataFlowGraph, InstructionImms, Opcode, RelSourceLoc, Type};
use crate::loop_analysis::LoopLevel;
use cranelift_egraph::{CtxEq, CtxHash, Id, Language, UnionFind};
use cranelift_entity::{EntityList, ListPool};
use std::hash::{Hash, Hasher};

#[derive(Debug)]
pub enum Node {
    /// A blockparam. Effectively an input/root; does not refer to
    /// predecessors' branch arguments, because this would create
    /// cycles.
    Param {
        /// CLIF block this param comes from.
        block: Block,
        /// Index of blockparam within block.
        index: u32,
        /// Type of the value.
        ty: Type,
        /// The loop level of this Param.
        loop_level: LoopLevel,
    },
    /// A CLIF instruction that is pure (has no side-effects). Not
    /// tied to any location; we will compute a set of locations at
    /// which to compute this node during lowering back out of the
    /// egraph.
    Pure {
        /// The instruction data, without SSA values.
        op: InstructionImms,
        /// eclass arguments to the operator.
        args: EntityList<Id>,
        /// Type of result, if one.
        ty: Type,
        /// Number of results.
        arity: u16,
    },
    /// A CLIF instruction that has side-effects or is otherwise not
    /// representable by `Pure`.
    Inst {
        /// The instruction data, without SSA values.
        op: InstructionImms,
        /// eclass arguments to the operator.
        args: EntityList<Id>,
        /// Type of result, if one.
        ty: Type,
        /// Number of results.
        arity: u16,
        /// The source location to preserve.
        srcloc: RelSourceLoc,
        /// The loop level of this Inst.
        loop_level: LoopLevel,
    },
    /// A projection of one result of an `Inst` or `Pure`.
    Result {
        /// `Inst` or `Pure` node.
        value: Id,
        /// Index of the result we want.
        result: usize,
        /// Type of the value.
        ty: Type,
    },

    /// A load instruction. Nominally a side-effecting `Inst` (and
    /// included in the list of side-effecting roots so it will always
    /// be elaborated), but represented as a distinct kind of node so
    /// that we can leverage deduplication to do
    /// redundant-load-elimination for free (and make store-to-load
    /// forwarding much easier).
    Load {
        // -- identity depends on:
        /// The original load operation. Must have one argument, the
        /// address.
        op: InstructionImms,
        /// The type of the load result.
        ty: Type,
        /// Address argument. The actual address has an offset, which is
        /// included in `op` (and thus already considered as part of
        /// the key).
        addr: Id,
        /// The abstract memory state that this load accesses.
        mem_state: PackedMemoryState,

        // -- not included in dedup key:
        /// Source location, for traps. Not included in Eq/Hash.
        srcloc: RelSourceLoc,
    },
}

impl Node {
    pub(crate) fn is_non_pure(&self) -> bool {
        match self {
            Node::Inst { .. } | Node::Load { .. } => true,
            _ => false,
        }
    }
}

/// Shared pools for type and id lists in nodes.
pub struct NodeCtx {
    /// Arena for arg eclass-ID lists.
    pub args: ListPool<Id>,
}

impl NodeCtx {
    pub(crate) fn with_capacity_for_dfg(dfg: &DataFlowGraph) -> Self {
        let n_args = dfg.value_lists.capacity();
        Self {
            args: ListPool::with_capacity(n_args),
        }
    }
}

impl NodeCtx {
    fn ids_eq(&self, a: &EntityList<Id>, b: &EntityList<Id>, uf: &mut UnionFind) -> bool {
        let a = a.as_slice(&self.args);
        let b = b.as_slice(&self.args);
        a.len() == b.len() && a.iter().zip(b.iter()).all(|(&a, &b)| uf.equiv_id_mut(a, b))
    }

    fn hash_ids<H: Hasher>(&self, a: &EntityList<Id>, hash: &mut H, uf: &mut UnionFind) {
        let a = a.as_slice(&self.args);
        for &id in a {
            uf.hash_id_mut(hash, id);
        }
    }
}

impl CtxEq<Node, Node> for NodeCtx {
    fn ctx_eq(&self, a: &Node, b: &Node, uf: &mut UnionFind) -> bool {
        match (a, b) {
            (
                &Node::Param {
                    block,
                    index,
                    ty,
                    loop_level: _,
                },
                &Node::Param {
                    block: other_block,
                    index: other_index,
                    ty: other_ty,
                    loop_level: _,
                },
            ) => block == other_block && index == other_index && ty == other_ty,
            (
                &Node::Result { value, result, ty },
                &Node::Result {
                    value: other_value,
                    result: other_result,
                    ty: other_ty,
                },
            ) => uf.equiv_id_mut(value, other_value) && result == other_result && ty == other_ty,
            (
                &Node::Pure {
                    ref op,
                    ref args,
                    ty,
                    arity: _,
                },
                &Node::Pure {
                    op: ref other_op,
                    args: ref other_args,
                    ty: other_ty,
                    arity: _,
                },
            ) => *op == *other_op && self.ids_eq(args, other_args, uf) && ty == other_ty,
            (
                &Node::Inst { ref args, .. },
                &Node::Inst {
                    args: ref other_args,
                    ..
                },
            ) => self.ids_eq(args, other_args, uf),
            (
                &Node::Load {
                    ref op,
                    ty,
                    addr,
                    mem_state,
                    ..
                },
                &Node::Load {
                    op: ref other_op,
                    ty: other_ty,
                    addr: other_addr,
                    mem_state: other_mem_state,
                    // Explicitly exclude: `inst` and `srcloc`. We
                    // want loads to merge if identical in
                    // opcode/offset, address expression, and last
                    // store (this does implicit
                    // redundant-load-elimination).
                    //
                    // Note however that we *do* include `ty` (the
                    // type) and match on that: we otherwise would
                    // have no way of disambiguating loads of
                    // different widths to the same address.
                    ..
                },
            ) => {
                op == other_op
                    && ty == other_ty
                    && uf.equiv_id_mut(addr, other_addr)
                    && mem_state == other_mem_state
            }
            _ => false,
        }
    }
}

impl CtxHash<Node> for NodeCtx {
    fn ctx_hash(&self, value: &Node, uf: &mut UnionFind) -> u64 {
        let mut state = crate::fx::FxHasher::default();
        std::mem::discriminant(value).hash(&mut state);
        match value {
            &Node::Param {
                block,
                index,
                ty: _,
                loop_level: _,
            } => {
                block.hash(&mut state);
                index.hash(&mut state);
            }
            &Node::Result {
                value,
                result,
                ty: _,
            } => {
                uf.hash_id_mut(&mut state, value);
                result.hash(&mut state);
            }
            &Node::Pure {
                ref op,
                ref args,
                ty,
                arity: _,
            } => {
                op.hash(&mut state);
                self.hash_ids(args, &mut state, uf);
                ty.hash(&mut state);
            }
            &Node::Inst { ref args, .. } => {
                self.hash_ids(args, &mut state, uf);
            }
            &Node::Load {
                ref op,
                ty,
                addr,
                mem_state,
                ..
            } => {
                op.hash(&mut state);
                ty.hash(&mut state);
                uf.hash_id_mut(&mut state, addr);
                mem_state.hash(&mut state);
            }
        }

        state.finish()
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub(crate) struct Cost(u32);
impl Cost {
    pub(crate) fn at_level(&self, loop_level: LoopLevel) -> Cost {
        let loop_level = std::cmp::min(2, loop_level.level());
        let multiplier = 1u32 << ((10 * loop_level) as u32);
        Cost(self.0.saturating_mul(multiplier)).finite()
    }

    pub(crate) fn infinity() -> Cost {
        // 2^32 - 1 is, uh, pretty close to infinite... (we use `Cost`
        // only for heuristics and always saturate so this suffices!)
        Cost(u32::MAX)
    }

    pub(crate) fn zero() -> Cost {
        Cost(0)
    }

    /// Clamp this cost at a "finite" value. Can be used in
    /// conjunction with saturating ops to avoid saturating into
    /// `infinity()`.
    fn finite(self) -> Cost {
        Cost(std::cmp::min(u32::MAX - 1, self.0))
    }
}

impl std::default::Default for Cost {
    fn default() -> Cost {
        Cost::zero()
    }
}

impl std::ops::Add<Cost> for Cost {
    type Output = Cost;
    fn add(self, other: Cost) -> Cost {
        Cost(self.0.saturating_add(other.0)).finite()
    }
}

pub(crate) fn op_cost(op: &InstructionImms) -> Cost {
    match op.opcode() {
        // Constants.
        Opcode::Iconst | Opcode::F32const | Opcode::F64const => Cost(0),
        // Extends/reduces.
        Opcode::Uextend | Opcode::Sextend | Opcode::Ireduce | Opcode::Iconcat | Opcode::Isplit => {
            Cost(1)
        }
        // "Simple" arithmetic.
        Opcode::Iadd
        | Opcode::Isub
        | Opcode::Band
        | Opcode::BandNot
        | Opcode::Bor
        | Opcode::BorNot
        | Opcode::Bxor
        | Opcode::BxorNot
        | Opcode::Bnot => Cost(2),
        // Everything else.
        _ => Cost(3),
    }
}

impl Language for NodeCtx {
    type Node = Node;

    fn children<'a>(&'a self, node: &'a Node) -> &'a [Id] {
        match node {
            Node::Param { .. } => &[],
            Node::Pure { args, .. } | Node::Inst { args, .. } => args.as_slice(&self.args),
            Node::Load { addr, .. } => std::slice::from_ref(addr),
            Node::Result { value, .. } => std::slice::from_ref(value),
        }
    }

    fn children_mut<'a>(&'a mut self, node: &'a mut Node) -> &'a mut [Id] {
        match node {
            Node::Param { .. } => &mut [],
            Node::Pure { args, .. } | Node::Inst { args, .. } => args.as_mut_slice(&mut self.args),
            Node::Load { addr, .. } => std::slice::from_mut(addr),
            Node::Result { value, .. } => std::slice::from_mut(value),
        }
    }

    fn needs_dedup(&self, node: &Node) -> bool {
        match node {
            Node::Pure { .. } | Node::Load { .. } => true,
            _ => false,
        }
    }
}

#[cfg(test)]
mod test {
    #[test]
    #[cfg(target_pointer_width = "64")]
    fn node_size() {
        use super::*;
        assert_eq!(std::mem::size_of::<InstructionImms>(), 16);
        assert_eq!(std::mem::size_of::<Node>(), 32);
    }
}
@@ -1,293 +0,0 @@
//! Last-store tracking via alias analysis.
//!
//! We partition memory state into several *disjoint pieces* of
//! "abstract state". There are a finite number of such pieces:
//! currently, we call them "heap", "table", "vmctx", and "other". Any
//! given address in memory belongs to exactly one disjoint piece.
//!
//! One never tracks which piece a concrete address belongs to at
//! runtime; this is a purely static concept. Instead, all
//! memory-accessing instructions (loads and stores) are labeled with
//! one of these four categories in the `MemFlags`. It is forbidden
//! for a load or store to access memory under one category and a
//! later load or store to access the same memory under a different
//! category. This is ensured to be true by construction during
//! frontend translation into CLIF and during legalization.
//!
//! Given that this non-aliasing property is ensured by the producer
//! of CLIF, we can compute a *may-alias* property: one load or store
//! may-alias another load or store if both access the same category
//! of abstract state.
//!
//! The "last store" pass helps to compute this aliasing: we perform a
//! fixpoint analysis to track the last instruction that *might have*
//! written to a given part of abstract state. We also track the block
//! containing this store.
//!
//! We can't say for sure that the "last store" *did* actually write
//! that state, but we know for sure that no instruction *later* than
//! it (up to the current instruction) did. However, we can get a
//! must-alias property from this: if at a given load or store, we
//! look backward to the "last store", *AND* we find that it has
//! exactly the same address expression and value type, then we know
//! that the current instruction's access *must* be to the same memory
//! location.
//!
//! To get this must-alias property, we leverage the node
//! hashconsing. We design the Eq/Hash (node identity relation
//! definition) of the `Node` struct so that all loads with (i) the
//! same "last store", and (ii) the same address expression, and (iii)
//! the same opcode-and-offset, will deduplicate (the first will be
//! computed, and the later ones will use the same value). Furthermore
//! we have an optimization that rewrites a load into the stored value
//! of the last store *if* the last store has the same address
//! expression and constant offset.
//!
//! This gives us two optimizations, "redundant load elimination" and
//! "store-to-load forwarding".
//!
//! In theory we could also do *dead-store elimination*, where if a
//! store overwrites a value earlier written by another store, *and*
//! if no other load/store to the abstract state category occurred,
//! *and* no other trapping instruction occurred (at which point we
//! need an up-to-date memory state because post-trap-termination
//! memory state can be observed), *and* we can prove the original
//! store could not have trapped, then we can eliminate the original
//! store. Because this is so complex, and the conditions for doing it
//! correctly when post-trap state must be correct likely reduce the
//! potential benefit, we don't yet do this.
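The redundant-load-elimination idea described above (loads deduplicate on last store, address expression, and opcode/offset/type) can be sketched with a hypothetical key type and a plain `HashMap`; this is not the actual hashconsing machinery, just the keying discipline:

```rust
use std::collections::HashMap;

/// Hypothetical dedup key for a load: the same (last-store, address)
/// pair implies a must-alias access, so the loaded value can be reused.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct LoadKey {
    /// Index of the last store that might have written this state.
    pub last_store: u32,
    /// Identifier of the (already-GVN'd) address expression.
    pub addr: u32,
}

/// Look up a prior load with the same key, or record this one.
pub fn lookup_or_record(
    known: &mut HashMap<LoadKey, u32>,
    key: LoadKey,
    loaded_value: u32,
) -> u32 {
    *known.entry(key).or_insert(loaded_value)
}
```

A second load with an identical key returns the first load's value (redundant-load elimination); a load after a *different* store gets a fresh entry, since it may observe different memory.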
use crate::flowgraph::ControlFlowGraph;
use crate::fx::{FxHashMap, FxHashSet};
use crate::inst_predicates::has_memory_fence_semantics;
use crate::ir::{Block, Function, Inst, InstructionData, MemFlags, Opcode};
use crate::trace;
use cranelift_entity::{EntityRef, SecondaryMap};
use smallvec::{smallvec, SmallVec};

/// For a given program point, the vector of last-store instruction
/// indices for each disjoint category of abstract state.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct LastStores {
    heap: MemoryState,
    table: MemoryState,
    vmctx: MemoryState,
    other: MemoryState,
}

/// State of memory seen by a load.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash, Default)]
pub enum MemoryState {
    /// State at function entry: nothing is known (but it is one
    /// consistent value, so two loads from "entry" state at the same
    /// address will still provide the same result).
    #[default]
    Entry,
    /// State just after a store by the given instruction. The
    /// instruction is a store from which we can forward.
    Store(Inst),
    /// State just before the given instruction. Used for abstract
    /// value merges at merge-points when we cannot name a single
    /// producing site.
    BeforeInst(Inst),
    /// State just after the given instruction. Used when the
    /// instruction may update the associated state, but is not a
    /// store whose value we can cleanly forward. (E.g., perhaps a
    /// barrier of some sort.)
    AfterInst(Inst),
}

/// Memory state index, packed into a u32.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct PackedMemoryState(u32);

impl From<MemoryState> for PackedMemoryState {
    fn from(state: MemoryState) -> Self {
        match state {
            MemoryState::Entry => Self(0),
            MemoryState::Store(i) => Self(1 | ((i.index() as u32) << 2)),
            MemoryState::BeforeInst(i) => Self(2 | ((i.index() as u32) << 2)),
            MemoryState::AfterInst(i) => Self(3 | ((i.index() as u32) << 2)),
        }
    }
}

impl PackedMemoryState {
    /// Does this memory state refer to a specific store instruction?
    pub fn as_store(&self) -> Option<Inst> {
        if self.0 & 3 == 1 {
            Some(Inst::from_bits(self.0 >> 2))
        } else {
            None
        }
    }
}
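The 2-bit tag packing used by `PackedMemoryState` can be mirrored in a standalone sketch with hypothetical helper functions (same bit layout as the `From` impl and `as_store` above: low 2 bits are the variant tag, the instruction index lives in the upper 30 bits):

```rust
/// Tag values in the low 2 bits of the packed word.
pub const TAG_ENTRY: u32 = 0;
pub const TAG_STORE: u32 = 1;
pub const TAG_BEFORE_INST: u32 = 2;

/// Pack a `Store(inst)` state: tag in bits 0..2, index in bits 2..32.
pub fn pack_store(inst_index: u32) -> u32 {
    TAG_STORE | (inst_index << 2)
}

/// Recover the store instruction index, if the tag says `Store`.
pub fn as_store(packed: u32) -> Option<u32> {
    if packed & 3 == TAG_STORE {
        Some(packed >> 2)
    } else {
        None
    }
}
```

Packing the state into a `u32` keeps it `Copy` and cheap to embed in the load node's dedup key.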
impl LastStores {
    fn update(&mut self, func: &Function, inst: Inst) {
        let opcode = func.dfg[inst].opcode();
        if has_memory_fence_semantics(opcode) {
            self.heap = MemoryState::AfterInst(inst);
            self.table = MemoryState::AfterInst(inst);
            self.vmctx = MemoryState::AfterInst(inst);
            self.other = MemoryState::AfterInst(inst);
        } else if opcode.can_store() {
            if let Some(memflags) = func.dfg[inst].memflags() {
                *self.for_flags(memflags) = MemoryState::Store(inst);
            } else {
                self.heap = MemoryState::AfterInst(inst);
                self.table = MemoryState::AfterInst(inst);
                self.vmctx = MemoryState::AfterInst(inst);
                self.other = MemoryState::AfterInst(inst);
            }
        }
    }

    fn for_flags(&mut self, memflags: MemFlags) -> &mut MemoryState {
        if memflags.heap() {
            &mut self.heap
        } else if memflags.table() {
            &mut self.table
        } else if memflags.vmctx() {
            &mut self.vmctx
        } else {
            &mut self.other
        }
    }

    fn meet_from(&mut self, other: &LastStores, loc: Inst) {
        let meet = |a: MemoryState, b: MemoryState| -> MemoryState {
            match (a, b) {
                (a, b) if a == b => a,
                _ => MemoryState::BeforeInst(loc),
            }
        };

        self.heap = meet(self.heap, other.heap);
        self.table = meet(self.table, other.table);
        self.vmctx = meet(self.vmctx, other.vmctx);
        self.other = meet(self.other, other.other);
    }
}
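The `meet_from` lattice join can be sketched standalone with a hypothetical, simplified `State` enum: equal incoming states pass through a merge point unchanged, while any disagreement collapses to an unknown-at-this-point value.

```rust
/// Simplified stand-in for `MemoryState` (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    /// Function-entry state: nothing known.
    Entry,
    /// State after a store by instruction `n`.
    Store(u32),
    /// State merged at instruction `n`: no single producing site.
    BeforeInst(u32),
}

/// Meet two abstract states at merge point `loc`.
pub fn meet(a: State, b: State, loc: u32) -> State {
    if a == b {
        a
    } else {
        State::BeforeInst(loc)
    }
}
```

Because `BeforeInst(loc)` is stable once reached, repeatedly re-meeting the same inputs converges, which is what lets the worklist-based fixpoint in the analysis terminate.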
|||
|
|||
/// An alias-analysis pass.
pub struct AliasAnalysis {
    /// Last-store instruction (or none) for a given load. Use a hash map
    /// instead of a `SecondaryMap` because this is sparse.
    load_mem_state: FxHashMap<Inst, PackedMemoryState>,
}

impl AliasAnalysis {
    /// Perform an alias analysis pass.
    pub fn new(func: &Function, cfg: &ControlFlowGraph) -> AliasAnalysis {
        log::trace!("alias analysis: input is:\n{:?}", func);
        let block_input = Self::compute_block_input_states(func, cfg);
        let load_mem_state = Self::compute_load_last_stores(func, block_input);
        AliasAnalysis { load_mem_state }
    }

    fn compute_block_input_states(
        func: &Function,
        cfg: &ControlFlowGraph,
    ) -> SecondaryMap<Block, Option<LastStores>> {
        let mut block_input = SecondaryMap::with_capacity(func.dfg.num_blocks());
        let mut worklist: SmallVec<[Block; 16]> = smallvec![];
        let mut worklist_set = FxHashSet::default();
        let entry = func.layout.entry_block().unwrap();
        worklist.push(entry);
        worklist_set.insert(entry);
        block_input[entry] = Some(LastStores::default());

        while let Some(block) = worklist.pop() {
            worklist_set.remove(&block);
            let state = block_input[block].clone().unwrap();

            trace!("alias analysis: input to {} is {:?}", block, state);

            let state = func
                .layout
                .block_insts(block)
                .fold(state, |mut state, inst| {
                    state.update(func, inst);
                    trace!("after {}: state is {:?}", inst, state);
                    state
                });

            for succ in cfg.succ_iter(block) {
                let succ_first_inst = func.layout.first_inst(succ).unwrap();
                let succ_state = &mut block_input[succ];
                let old = succ_state.clone();
                if let Some(succ_state) = succ_state.as_mut() {
                    succ_state.meet_from(&state, succ_first_inst);
                } else {
                    *succ_state = Some(state);
                };
                let updated = *succ_state != old;

                if updated && worklist_set.insert(succ) {
                    worklist.push(succ);
                }
            }
        }

        block_input
    }

    fn compute_load_last_stores(
        func: &Function,
        block_input: SecondaryMap<Block, Option<LastStores>>,
    ) -> FxHashMap<Inst, PackedMemoryState> {
        let mut load_mem_state = FxHashMap::default();
        load_mem_state.reserve(func.dfg.num_insts() / 8);

        for block in func.layout.blocks() {
            let mut state = block_input[block].clone().unwrap();

            for inst in func.layout.block_insts(block) {
                trace!(
                    "alias analysis: scanning at {} with state {:?} ({:?})",
                    inst,
                    state,
                    func.dfg[inst],
                );

                // N.B.: we match `Load` specifically, and not any
                // other kinds of loads (or any opcode such that
                // `opcode.can_load()` returns true), because some
                // "can load" instructions actually have very
                // different semantics (are not just a load of a
                // particularly-typed value). For example, atomic
                // (load/store, RMW, CAS) instructions "can load" but
                // definitely should not participate in store-to-load
                // forwarding or redundant-load elimination. Our goal
                // here is to provide a `MemoryState` just for plain
                // old loads whose semantics we can completely reason
                // about.
                if let InstructionData::Load {
                    opcode: Opcode::Load,
                    flags,
                    ..
                } = func.dfg[inst]
                {
                    let mem_state = *state.for_flags(flags);
                    trace!(
                        "alias analysis: at {}: load with mem_state {:?}",
                        inst,
                        mem_state,
                    );

                    load_mem_state.insert(inst, mem_state.into());
                }

                state.update(func, inst);
            }
        }

        load_mem_state
    }

    /// Get the state seen by a load, if any.
    pub fn get_state_for_load(&self, inst: Inst) -> Option<PackedMemoryState> {
        self.load_mem_state.get(&inst).copied()
    }
}
@ -0,0 +1,74 @@
//! Simple union-find data structure.

use crate::trace;
use cranelift_entity::{packed_option::ReservedValue, EntityRef, SecondaryMap};
use std::hash::Hash;

/// A union-find data structure. The data structure can allocate
/// `Id`s, indicating eclasses, and can merge eclasses together.
#[derive(Clone, Debug, PartialEq)]
pub struct UnionFind<Idx: EntityRef> {
    parent: SecondaryMap<Idx, Val<Idx>>,
}

#[derive(Clone, Debug, PartialEq)]
struct Val<Idx>(Idx);

impl<Idx: EntityRef + ReservedValue> Default for Val<Idx> {
    fn default() -> Self {
        Self(Idx::reserved_value())
    }
}

impl<Idx: EntityRef + Hash + std::fmt::Display + Ord + ReservedValue> UnionFind<Idx> {
    /// Create a new `UnionFind` with the given capacity.
    pub fn with_capacity(cap: usize) -> Self {
        UnionFind {
            parent: SecondaryMap::with_capacity(cap),
        }
    }

    /// Add an `Idx` to the `UnionFind`, with its own equivalence class
    /// initially. All `Idx`s must be added before being queried or
    /// unioned.
    pub fn add(&mut self, id: Idx) {
        debug_assert!(id != Idx::reserved_value());
        self.parent[id] = Val(id);
    }

    /// Find the canonical `Idx` of a given `Idx`.
    pub fn find(&self, mut node: Idx) -> Idx {
        while node != self.parent[node].0 {
            node = self.parent[node].0;
        }
        node
    }

    /// Find the canonical `Idx` of a given `Idx`, updating the data
    /// structure in the process so that future queries for this `Idx`
    /// (and others in its chain up to the root of the equivalence
    /// class) will be faster.
    pub fn find_and_update(&mut self, mut node: Idx) -> Idx {
        // "Path splitting" mutating find (Tarjan and Van Leeuwen).
        debug_assert!(node != Idx::reserved_value());
        while node != self.parent[node].0 {
            let next = self.parent[self.parent[node].0].0;
            debug_assert!(next != Idx::reserved_value());
            self.parent[node] = Val(next);
            node = next;
        }
        debug_assert!(node != Idx::reserved_value());
        node
    }

    /// Merge the equivalence classes of the two `Idx`s.
    pub fn union(&mut self, a: Idx, b: Idx) {
        let a = self.find_and_update(a);
        let b = self.find_and_update(b);
        let (a, b) = (std::cmp::min(a, b), std::cmp::max(a, b));
        if a != b {
            // Always canonicalize toward lower IDs.
            self.parent[b] = Val(a);
            trace!("union: {}, {}", a, b);
        }
    }
}
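The structure above depends on `cranelift-entity` index types; the core algorithm is easier to see in a standalone sketch over plain `usize` indices. This is an illustrative reimplementation, not the crate's API: `find_and_update` does "path splitting" (each visited node is re-pointed at its grandparent, so every traversal shortens the chain it walks), and `union` canonicalizes toward the lower id, matching the comment in the real code.

```rust
// Standalone sketch of a path-splitting union-find, mirroring the
// structure above but with plain `usize` indices (illustrative names,
// not the cranelift types).
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    // Each element starts in its own singleton class.
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    // Path-splitting find: re-point each node on the walked chain at
    // its grandparent, halving future chain lengths as a side effect.
    fn find_and_update(&mut self, mut node: usize) -> usize {
        while node != self.parent[node] {
            let next = self.parent[self.parent[node]];
            self.parent[node] = next;
            node = next;
        }
        node
    }

    // Merge two classes, always canonicalizing toward the lower id.
    fn union(&mut self, a: usize, b: usize) {
        let a = self.find_and_update(a);
        let b = self.find_and_update(b);
        let (lo, hi) = (a.min(b), a.max(b));
        if lo != hi {
            self.parent[hi] = lo;
        }
    }
}

fn main() {
    let mut uf = UnionFind::new(5);
    uf.union(3, 4);
    uf.union(1, 2);
    uf.union(2, 4);
    // All of 1..=4 now canonicalize to the lowest id in the class, 1.
    assert_eq!(uf.find_and_update(4), 1);
    assert_eq!(uf.find_and_update(3), 1);
    // 0 was never unioned and remains its own class.
    assert_eq!(uf.find_and_update(0), 0);
    println!("ok");
}
```

Path splitting gives the same near-constant amortized bounds as path compression but needs only one pass over the chain, which is why it is a common choice for hot interning paths like this one.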
@ -1,24 +0,0 @@
[package]
authors = ["The Cranelift Project Developers"]
name = "cranelift-egraph"
version = "0.92.0"
description = "acyclic-egraph (aegraph) implementation for Cranelift"
license = "Apache-2.0 WITH LLVM-exception"
documentation = "https://docs.rs/cranelift-egraph"
repository = "https://github.com/bytecodealliance/wasmtime"
edition = "2021"

[dependencies]
cranelift-entity = { workspace = true }
log = { workspace = true }
smallvec = { workspace = true }
indexmap = { version = "1.9.1" }
hashbrown = { version = "0.12.2", features = ["raw"] }
fxhash = "0.2.1"

[features]
default = []

# Enable detailed trace-level debug logging. Excluded by default to
# omit the dynamic overhead of checking the logging level.
trace-log = []
@ -1,524 +0,0 @@
//! Vectors allocated in arenas, with small per-vector overhead.

use std::marker::PhantomData;
use std::mem::MaybeUninit;
use std::ops::Range;

/// A vector of `T` stored within a `BumpArena`.
///
/// This is something like a normal `Vec`, except that all accesses
/// and updates require a separate borrow of the `BumpArena`. This, in
/// turn, makes the Vec itself very compact: only three `u32`s (12
/// bytes). The `BumpSlice` variant is only two `u32`s (8 bytes) and
/// is sufficient to reconstruct a slice, but not grow the vector.
///
/// The `BumpVec` does *not* implement `Clone` or `Copy`; it
/// represents unique ownership of a range of indices in the arena. If
/// dropped, those indices will be unavailable until the arena is
/// freed. This is "fine" (it is normally how arena allocation
/// works). To explicitly free and make available for some
/// allocations, a very rudimentary reuse mechanism exists via
/// `BumpVec::free(arena)`. (The allocation path opportunistically
/// checks the first range on the freelist, and can carve off a piece
/// of it if larger than needed, but it does not attempt to traverse
/// the entire freelist; this is a compromise between bump-allocation
/// speed and memory efficiency, which also influences speed through
/// cached-memory reuse.)
///
/// The type `T` should not have a `Drop` implementation. This
/// typically means that it does not own any boxed memory,
/// sub-collections, or other resources. This is important for the
/// efficiency of the data structure (otherwise, to call `Drop` impls,
/// the arena needs to track which indices are live or dead; the
/// BumpVec itself cannot do the drop because it does not retain a
/// reference to the arena). Note that placing a `T` with a `Drop`
/// impl in the arena is still *safe*, because leaking (that is, never
/// calling `Drop::drop()`) is safe. It is merely less efficient, and
/// so should be avoided if possible.
#[derive(Debug)]
pub struct BumpVec<T> {
    base: u32,
    len: u32,
    cap: u32,
    _phantom: PhantomData<T>,
}

/// A slice in an arena: like a `BumpVec`, but has a fixed size that
/// cannot grow. The size of this struct is one 32-bit word smaller
/// than `BumpVec`. It is copyable/cloneable because it will never be
/// freed.
#[derive(Debug, Clone, Copy)]
pub struct BumpSlice<T> {
    base: u32,
    len: u32,
    _phantom: PhantomData<T>,
}

#[derive(Default)]
pub struct BumpArena<T> {
    vec: Vec<MaybeUninit<T>>,
    freelist: Vec<Range<u32>>,
}

impl<T> BumpArena<T> {
    /// Create a new arena into which one can allocate `BumpVec`s.
    pub fn new() -> Self {
        Self {
            vec: vec![],
            freelist: vec![],
        }
    }

    /// Create a new arena, pre-allocating space for `cap` total `T`
    /// elements.
    pub fn arena_with_capacity(cap: usize) -> Self {
        Self {
            vec: Vec::with_capacity(cap),
            freelist: Vec::with_capacity(cap / 16),
        }
    }

    /// Create a new `BumpVec` with the given pre-allocated capacity
    /// and zero length.
    pub fn vec_with_capacity(&mut self, cap: usize) -> BumpVec<T> {
        let cap = u32::try_from(cap).unwrap();
        if let Some(range) = self.maybe_freelist_alloc(cap) {
            BumpVec {
                base: range.start,
                len: 0,
                cap,
                _phantom: PhantomData,
            }
        } else {
            let base = self.vec.len() as u32;
            for _ in 0..cap {
                self.vec.push(MaybeUninit::uninit());
            }
            BumpVec {
                base,
                len: 0,
                cap,
                _phantom: PhantomData,
            }
        }
    }

    /// Create a new `BumpVec` with a single element. The capacity is
    /// also only one element; growing the vector further will require
    /// a reallocation.
    pub fn single(&mut self, t: T) -> BumpVec<T> {
        let mut vec = self.vec_with_capacity(1);
        unsafe {
            self.write_into_index(vec.base, t);
        }
        vec.len = 1;
        vec
    }

    /// Create a new `BumpVec` with the sequence from an iterator.
    pub fn from_iter<I: Iterator<Item = T>>(&mut self, i: I) -> BumpVec<T> {
        let base = self.vec.len() as u32;
        self.vec.extend(i.map(|item| MaybeUninit::new(item)));
        let len = self.vec.len() as u32 - base;
        BumpVec {
            base,
            len,
            cap: len,
            _phantom: PhantomData,
        }
    }

    /// Append two `BumpVec`s, returning a new one. Consumes both
    /// vectors. This will use the capacity at the end of `a` if
    /// possible to move `b`'s elements into place; otherwise it will
    /// need to allocate new space.
    pub fn append(&mut self, a: BumpVec<T>, b: BumpVec<T>) -> BumpVec<T> {
        if (a.cap - a.len) >= b.len {
            self.append_into_cap(a, b)
        } else {
            self.append_into_new(a, b)
        }
    }

    /// Helper: read the `T` out of a given arena index. After
    /// reading, that index becomes uninitialized.
    unsafe fn read_out_of_index(&self, index: u32) -> T {
        // Note that we don't actually *track* uninitialized status
        // (and this is fine because we will never `Drop` and we never
        // allow a `BumpVec` to refer to an uninitialized index, so
        // the bits are effectively dead). We simply read the bits out
        // and return them.
        self.vec[index as usize].as_ptr().read()
    }

    /// Helper: write a `T` into the given arena index. Index must
    /// have been uninitialized previously.
    unsafe fn write_into_index(&mut self, index: u32, t: T) {
        self.vec[index as usize].as_mut_ptr().write(t);
    }

    /// Helper: move a `T` from one index to another. Old index
    /// becomes uninitialized and new index must have previously been
    /// uninitialized.
    unsafe fn move_item(&mut self, from: u32, to: u32) {
        let item = self.read_out_of_index(from);
        self.write_into_index(to, item);
    }

    /// Helper: push a `T` onto the end of the arena, growing its
    /// storage. The `T` to push is read out of another index, and
    /// that index subsequently becomes uninitialized.
    unsafe fn push_item(&mut self, from: u32) -> u32 {
        let index = self.vec.len() as u32;
        let item = self.read_out_of_index(from);
        self.vec.push(MaybeUninit::new(item));
        index
    }

    /// Helper: append `b` into the capacity at the end of `a`.
    fn append_into_cap(&mut self, mut a: BumpVec<T>, b: BumpVec<T>) -> BumpVec<T> {
        debug_assert!(a.cap - a.len >= b.len);
        for i in 0..b.len {
            // Safety: initially, the indices in `b` are initialized;
            // the indices in `a`'s cap, beyond its length, are
            // uninitialized. We move the initialized contents from
            // `b` to the tail beyond `a`, and we consume `b` (so it
            // no longer exists), and we update `a`'s length to cover
            // the initialized contents in their new location.
            unsafe {
                self.move_item(b.base + i, a.base + a.len + i);
            }
        }
        a.len += b.len;
        b.free(self);
        a
    }

    /// Helper: return a range of indices that are available
    /// (uninitialized) according to the freelist for `len` elements,
    /// if possible.
    fn maybe_freelist_alloc(&mut self, len: u32) -> Option<Range<u32>> {
        if let Some(entry) = self.freelist.last_mut() {
            if entry.len() >= len as usize {
                let base = entry.start;
                entry.start += len;
                if entry.start == entry.end {
                    self.freelist.pop();
                }
                return Some(base..(base + len));
            }
        }
        None
    }

    /// Helper: append `a` and `b` into a completely new allocation.
    fn append_into_new(&mut self, a: BumpVec<T>, b: BumpVec<T>) -> BumpVec<T> {
        // New capacity: round up to a power of two.
        let len = a.len + b.len;
        let cap = round_up_power_of_two(len);

        if let Some(range) = self.maybe_freelist_alloc(cap) {
            for i in 0..a.len {
                // Safety: the indices in `a` must be initialized. We read
                // out the item and copy it to a new index; the old index
                // is no longer covered by a BumpVec, because we consume
                // `a`.
                unsafe {
                    self.move_item(a.base + i, range.start + i);
                }
            }
            for i in 0..b.len {
                // Safety: the indices in `b` must be initialized. We read
                // out the item and copy it to a new index; the old index
                // is no longer covered by a BumpVec, because we consume
                // `b`.
                unsafe {
                    self.move_item(b.base + i, range.start + a.len + i);
                }
            }

            a.free(self);
            b.free(self);

            BumpVec {
                base: range.start,
                len,
                cap,
                _phantom: PhantomData,
            }
        } else {
            self.vec.reserve(cap as usize);
            let base = self.vec.len() as u32;
            for i in 0..a.len {
                // Safety: the indices in `a` must be initialized. We read
                // out the item and copy it to a new index; the old index
                // is no longer covered by a BumpVec, because we consume
                // `a`.
                unsafe {
                    self.push_item(a.base + i);
                }
            }
            for i in 0..b.len {
                // Safety: the indices in `b` must be initialized. We read
                // out the item and copy it to a new index; the old index
                // is no longer covered by a BumpVec, because we consume
                // `b`.
                unsafe {
                    self.push_item(b.base + i);
                }
            }
            let len = self.vec.len() as u32 - base;

            for _ in len..cap {
                self.vec.push(MaybeUninit::uninit());
            }

            a.free(self);
            b.free(self);

            BumpVec {
                base,
                len,
                cap,
                _phantom: PhantomData,
            }
        }
    }

    /// Returns the size of the backing `Vec`.
    pub fn size(&self) -> usize {
        self.vec.len()
    }
}

fn round_up_power_of_two(x: u32) -> u32 {
    debug_assert!(x > 0);
    debug_assert!(x < 0x8000_0000);
    let log2 = 32 - (x - 1).leading_zeros();
    1 << log2
}

impl<T> BumpVec<T> {
    /// Returns a slice view of this `BumpVec`, given a borrow of the
    /// arena.
    pub fn as_slice<'a>(&'a self, arena: &'a BumpArena<T>) -> &'a [T] {
        let maybe_uninit_slice =
            &arena.vec[(self.base as usize)..((self.base + self.len) as usize)];
        // Safety: the index range we represent must be initialized.
        unsafe { std::mem::transmute(maybe_uninit_slice) }
    }

    /// Returns a mutable slice view of this `BumpVec`, given a
    /// mutable borrow of the arena.
    pub fn as_mut_slice<'a>(&'a mut self, arena: &'a mut BumpArena<T>) -> &'a mut [T] {
        let maybe_uninit_slice =
            &mut arena.vec[(self.base as usize)..((self.base + self.len) as usize)];
        // Safety: the index range we represent must be initialized.
        unsafe { std::mem::transmute(maybe_uninit_slice) }
    }

    /// Returns the length of this vector. Does not require access to
    /// the arena.
    pub fn len(&self) -> usize {
        self.len as usize
    }

    /// Returns the capacity of this vector. Does not require access
    /// to the arena.
    pub fn cap(&self) -> usize {
        self.cap as usize
    }

    /// Reserve `extra_len` capacity at the end of the vector,
    /// reallocating if necessary.
    pub fn reserve(&mut self, extra_len: usize, arena: &mut BumpArena<T>) {
        let extra_len = u32::try_from(extra_len).unwrap();
        if self.cap - self.len < extra_len {
            if self.base + self.cap == arena.vec.len() as u32 {
                for _ in 0..extra_len {
                    arena.vec.push(MaybeUninit::uninit());
                }
                self.cap += extra_len;
            } else {
                let new_cap = self.cap + extra_len;
                let new = arena.vec_with_capacity(new_cap as usize);
                unsafe {
                    for i in 0..self.len {
                        arena.move_item(self.base + i, new.base + i);
                    }
                }
                self.base = new.base;
                self.cap = new.cap;
            }
        }
    }

    /// Push an item, growing the capacity if needed.
    pub fn push(&mut self, t: T, arena: &mut BumpArena<T>) {
        if self.cap > self.len {
            unsafe {
                arena.write_into_index(self.base + self.len, t);
            }
            self.len += 1;
        } else if (self.base + self.cap) as usize == arena.vec.len() {
            arena.vec.push(MaybeUninit::new(t));
            self.cap += 1;
            self.len += 1;
        } else {
            let new_cap = round_up_power_of_two(self.cap + 1);
            let extra = new_cap - self.cap;
            self.reserve(extra as usize, arena);
            unsafe {
                arena.write_into_index(self.base + self.len, t);
            }
            self.len += 1;
        }
    }

    /// Clone, if `T` is cloneable.
    pub fn clone(&self, arena: &mut BumpArena<T>) -> BumpVec<T>
    where
        T: Clone,
    {
        let mut new = arena.vec_with_capacity(self.len as usize);
        for i in 0..self.len {
            let item = self.as_slice(arena)[i as usize].clone();
            new.push(item, arena);
        }
        new
    }

    /// Truncate the length to a smaller-or-equal length.
    pub fn truncate(&mut self, len: usize) {
        let len = len as u32;
        assert!(len <= self.len);
        self.len = len;
    }

    /// Consume the BumpVec and return its indices to a free pool in
    /// the arena.
    pub fn free(self, arena: &mut BumpArena<T>) {
        arena.freelist.push(self.base..(self.base + self.cap));
    }

    /// Freeze the capacity of this BumpVec, turning it into a slice,
    /// for a smaller struct (8 bytes rather than 12). Once this
    /// exists, it is copyable, because the slice will never be freed.
    pub fn freeze(self, arena: &mut BumpArena<T>) -> BumpSlice<T> {
        if self.cap > self.len {
            arena
                .freelist
                .push((self.base + self.len)..(self.base + self.cap));
        }
        BumpSlice {
            base: self.base,
            len: self.len,
            _phantom: PhantomData,
        }
    }
}

impl<T> BumpSlice<T> {
    /// Returns a slice view of the `BumpSlice`, given a borrow of the
    /// arena.
    pub fn as_slice<'a>(&'a self, arena: &'a BumpArena<T>) -> &'a [T] {
        let maybe_uninit_slice =
            &arena.vec[(self.base as usize)..((self.base + self.len) as usize)];
        // Safety: the index range we represent must be initialized.
        unsafe { std::mem::transmute(maybe_uninit_slice) }
    }

    /// Returns a mutable slice view of the `BumpSlice`, given a
    /// mutable borrow of the arena.
    pub fn as_mut_slice<'a>(&'a mut self, arena: &'a mut BumpArena<T>) -> &'a mut [T] {
        let maybe_uninit_slice =
            &mut arena.vec[(self.base as usize)..((self.base + self.len) as usize)];
        // Safety: the index range we represent must be initialized.
        unsafe { std::mem::transmute(maybe_uninit_slice) }
    }

    /// Returns the length of the `BumpSlice`.
    pub fn len(&self) -> usize {
        self.len as usize
    }
}

impl<T> std::default::Default for BumpVec<T> {
    fn default() -> Self {
        BumpVec {
            base: 0,
            len: 0,
            cap: 0,
            _phantom: PhantomData,
        }
    }
}

impl<T> std::default::Default for BumpSlice<T> {
    fn default() -> Self {
        BumpSlice {
            base: 0,
            len: 0,
            _phantom: PhantomData,
        }
    }
}

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_round_up() {
        assert_eq!(1, round_up_power_of_two(1));
        assert_eq!(2, round_up_power_of_two(2));
        assert_eq!(4, round_up_power_of_two(3));
        assert_eq!(4, round_up_power_of_two(4));
        assert_eq!(32, round_up_power_of_two(24));
        assert_eq!(0x8000_0000, round_up_power_of_two(0x7fff_ffff));
    }

    #[test]
    fn test_basic() {
        let mut arena: BumpArena<u32> = BumpArena::new();

        let a = arena.single(1);
        let b = arena.single(2);
        let c = arena.single(3);
        let ab = arena.append(a, b);
        assert_eq!(ab.as_slice(&arena), &[1, 2]);
        assert_eq!(ab.cap(), 2);
        let abc = arena.append(ab, c);
        assert_eq!(abc.len(), 3);
        assert_eq!(abc.cap(), 4);
        assert_eq!(abc.as_slice(&arena), &[1, 2, 3]);
        assert_eq!(arena.size(), 9);
        let mut d = arena.single(4);
        // Should have reused the freelist.
        assert_eq!(arena.size(), 9);
        assert_eq!(d.len(), 1);
        assert_eq!(d.cap(), 1);
        assert_eq!(d.as_slice(&arena), &[4]);
        d.as_mut_slice(&mut arena)[0] = 5;
        assert_eq!(d.as_slice(&arena), &[5]);
        abc.free(&mut arena);
        let d2 = d.clone(&mut arena);
        let dd = arena.append(d, d2);
        // Should have reused the freelist.
        assert_eq!(arena.size(), 9);
        assert_eq!(dd.as_slice(&arena), &[5, 5]);
        let mut e = arena.from_iter([10, 11, 12].into_iter());
        e.push(13, &mut arena);
        assert_eq!(arena.size(), 13);
        e.reserve(4, &mut arena);
        assert_eq!(arena.size(), 17);
        let _f = arena.from_iter([1, 2, 3, 4, 5, 6, 7, 8].into_iter());
        assert_eq!(arena.size(), 25);
        e.reserve(8, &mut arena);
        assert_eq!(e.cap(), 16);
        assert_eq!(e.as_slice(&arena), &[10, 11, 12, 13]);
        // `e` must have been copied now that `f` is at the end of the
        // arena.
        assert_eq!(arena.size(), 41);
    }
}
@ -1,281 +0,0 @@
//! A hashmap with "external hashing": nodes are hashed or compared for
//! equality only with some external context provided on lookup/insert.
//! This allows very memory-efficient data structures where
//! node-internal data references some other storage (e.g., offsets into
//! an array or pool of shared data).

use super::unionfind::UnionFind;
use hashbrown::raw::{Bucket, RawTable};
use std::hash::{Hash, Hasher};
use std::marker::PhantomData;

/// Trait that allows for equality comparison given some external
/// context.
///
/// Note that this trait is implemented by the *context*, rather than
/// the item type, for somewhat complex lifetime reasons (lack of GATs
/// to allow `for<'ctx> Ctx<'ctx>`-like associated types in traits on
/// the value type).
///
/// Furthermore, the `ctx_eq` method includes a `UnionFind` parameter,
/// because in practice we require this and a borrow to it cannot be
/// included in the context type without GATs (similarly to above).
pub trait CtxEq<V1: ?Sized, V2: ?Sized> {
    /// Determine whether `a` and `b` are equal, given the context in
    /// `self` and the union-find data structure `uf`.
    fn ctx_eq(&self, a: &V1, b: &V2, uf: &mut UnionFind) -> bool;
}

/// Trait that allows for hashing given some external context.
pub trait CtxHash<Value: ?Sized>: CtxEq<Value, Value> {
    /// Compute the hash of `value`, given the context in `self` and
    /// the union-find data structure `uf`.
    fn ctx_hash(&self, value: &Value, uf: &mut UnionFind) -> u64;
}

/// A null-comparator context type for underlying value types that
/// already have `Eq` and `Hash`.
#[derive(Default)]
pub struct NullCtx;

impl<V: Eq + Hash> CtxEq<V, V> for NullCtx {
    fn ctx_eq(&self, a: &V, b: &V, _: &mut UnionFind) -> bool {
        a.eq(b)
    }
}

impl<V: Eq + Hash> CtxHash<V> for NullCtx {
    fn ctx_hash(&self, value: &V, _: &mut UnionFind) -> u64 {
        let mut state = fxhash::FxHasher::default();
        value.hash(&mut state);
        state.finish()
    }
}

/// A bucket in the hash table.
///
/// Some performance-related design notes: we cache the hashcode for
/// speed, as this often buys a few percent speed in
/// interning-table-heavy workloads. We only keep the low 32 bits of
/// the hashcode, for memory efficiency: in common use, `K` and `V`
/// are often 32 bits also, and a 12-byte bucket is measurably better
/// than a 16-byte bucket.
struct BucketData<K, V> {
    hash: u32,
    k: K,
    v: V,
}

/// A HashMap that takes external context for all operations.
pub struct CtxHashMap<K, V> {
    raw: RawTable<BucketData<K, V>>,
}

impl<K, V> CtxHashMap<K, V> {
    /// Create an empty hashmap.
    pub fn new() -> Self {
        Self {
            raw: RawTable::new(),
        }
    }

    /// Create an empty hashmap with pre-allocated space for the given
    /// capacity.
    pub fn with_capacity(capacity: usize) -> Self {
        Self {
            raw: RawTable::with_capacity(capacity),
        }
    }
}

impl<K, V> CtxHashMap<K, V> {
    /// Insert a new key-value pair, returning the old value associated
    /// with this key (if any).
    pub fn insert<Ctx: CtxEq<K, K> + CtxHash<K>>(
        &mut self,
        k: K,
        v: V,
        ctx: &Ctx,
        uf: &mut UnionFind,
    ) -> Option<V> {
        let hash = ctx.ctx_hash(&k, uf) as u32;
        match self.raw.find(hash as u64, |bucket| {
            hash == bucket.hash && ctx.ctx_eq(&bucket.k, &k, uf)
        }) {
            Some(bucket) => {
                let data = unsafe { bucket.as_mut() };
                Some(std::mem::replace(&mut data.v, v))
            }
            None => {
                let data = BucketData { hash, k, v };
                self.raw
                    .insert_entry(hash as u64, data, |bucket| bucket.hash as u64);
                None
            }
        }
    }

    /// Look up a key, returning a borrow of the value if present.
    pub fn get<'a, Q, Ctx: CtxEq<K, Q> + CtxHash<Q> + CtxHash<K>>(
        &'a self,
        k: &Q,
        ctx: &Ctx,
        uf: &mut UnionFind,
    ) -> Option<&'a V> {
        let hash = ctx.ctx_hash(k, uf) as u32;
        self.raw
            .find(hash as u64, |bucket| {
                hash == bucket.hash && ctx.ctx_eq(&bucket.k, k, uf)
            })
            .map(|bucket| {
                let data = unsafe { bucket.as_ref() };
                &data.v
            })
    }

    /// Return an Entry cursor on a given bucket for a key, allowing
    /// for fetching the current value or inserting a new one.
    #[inline(always)]
    pub fn entry<'a, Ctx: CtxEq<K, K> + CtxHash<K>>(
        &'a mut self,
        k: K,
        ctx: &'a Ctx,
        uf: &mut UnionFind,
    ) -> Entry<'a, K, V> {
        let hash = ctx.ctx_hash(&k, uf) as u32;
        match self.raw.find(hash as u64, |bucket| {
            hash == bucket.hash && ctx.ctx_eq(&bucket.k, &k, uf)
        }) {
            Some(bucket) => Entry::Occupied(OccupiedEntry {
                bucket,
                _phantom: PhantomData,
            }),
            None => Entry::Vacant(VacantEntry {
                raw: &mut self.raw,
                hash,
                key: k,
            }),
        }
    }
}

/// An entry in the hashmap.
pub enum Entry<'a, K: 'a, V> {
    Occupied(OccupiedEntry<'a, K, V>),
    Vacant(VacantEntry<'a, K, V>),
}

/// An occupied entry.
pub struct OccupiedEntry<'a, K, V> {
    bucket: Bucket<BucketData<K, V>>,
    _phantom: PhantomData<&'a ()>,
}

impl<'a, K: 'a, V> OccupiedEntry<'a, K, V> {
    /// Get the value.
    pub fn get(&self) -> &'a V {
        let bucket = unsafe { self.bucket.as_ref() };
        &bucket.v
    }
}

/// A vacant entry.
pub struct VacantEntry<'a, K, V> {
    raw: &'a mut RawTable<BucketData<K, V>>,
    hash: u32,
    key: K,
}

impl<'a, K, V> VacantEntry<'a, K, V> {
    /// Insert a value.
    pub fn insert(self, v: V) -> &'a V {
        let bucket = self.raw.insert(
            self.hash as u64,
            BucketData {
                hash: self.hash,
                k: self.key,
                v,
            },
            |bucket| bucket.hash as u64,
        );
        let data = unsafe { bucket.as_ref() };
        &data.v
    }
}

#[cfg(test)]
mod test {
    use super::*;
    use std::hash::Hash;

    #[derive(Clone, Copy, Debug)]
    struct Key {
        index: u32,
    }

    struct Ctx {
        vals: &'static [&'static str],
    }

    impl CtxEq<Key, Key> for Ctx {
        fn ctx_eq(&self, a: &Key, b: &Key, _: &mut UnionFind) -> bool {
            self.vals[a.index as usize].eq(self.vals[b.index as usize])
        }
    }

    impl CtxHash<Key> for Ctx {
        fn ctx_hash(&self, value: &Key, _: &mut UnionFind) -> u64 {
            let mut state = fxhash::FxHasher::default();
            self.vals[value.index as usize].hash(&mut state);
            state.finish()
        }
    }

    #[test]
    fn test_basic() {
        let ctx = Ctx {
            vals: &["a", "b", "a"],
        };
        let mut uf = UnionFind::new();

        let k0 = Key { index: 0 };
        let k1 = Key { index: 1 };
        let k2 = Key { index: 2 };

        assert!(ctx.ctx_eq(&k0, &k2, &mut uf));
        assert!(!ctx.ctx_eq(&k0, &k1, &mut uf));
        assert!(!ctx.ctx_eq(&k2, &k1, &mut uf));

        let mut map: CtxHashMap<Key, u64> = CtxHashMap::new();
        assert_eq!(map.insert(k0, 42, &ctx, &mut uf), None);
        assert_eq!(map.insert(k2, 84, &ctx, &mut uf), Some(42));
        assert_eq!(map.get(&k1, &ctx, &mut uf), None);
        assert_eq!(*map.get(&k0, &ctx, &mut uf).unwrap(), 84);
    }

    #[test]
    fn test_entry() {
        let mut ctx = Ctx {
            vals: &["a", "b", "a"],
        };
        let mut uf = UnionFind::new();

        let k0 = Key { index: 0 };
        let k1 = Key { index: 1 };
        let k2 = Key { index: 2 };

        let mut map: CtxHashMap<Key, u64> = CtxHashMap::new();
        match map.entry(k0, &mut ctx, &mut uf) {
            Entry::Vacant(v) => {
                v.insert(1);
            }
            _ => panic!(),
        }
        match map.entry(k1, &mut ctx, &mut uf) {
            Entry::Vacant(_) => {}
            Entry::Occupied(_) => panic!(),
        }
        match map.entry(k2, &mut ctx, &mut uf) {
            Entry::Occupied(o) => {
                assert_eq!(*o.get(), 1);
            }
            _ => panic!(),
        }
    }
}
@@ -1,666 +0,0 @@
//! # ægraph (aegraph, or acyclic e-graph) implementation.
//!
//! An aegraph is a form of e-graph. We will first describe the
//! e-graph, then the aegraph as a slightly less powerful but highly
//! optimized variant of it.
//!
//! The main goal of this library is to be explicitly memory-efficient
//! and light on allocations. We need to be as fast and as small as
//! possible in order to minimize impact on compile time in a
//! production compiler.
//!
//! ## The e-graph
//!
//! An e-graph, or equivalence graph, is a kind of node-based
//! intermediate representation (IR) data structure that consists of
//! *eclasses* and *enodes*. An eclass contains one or more enodes;
//! semantically an eclass is like a value, and an enode is one way to
//! compute that value. If several enodes are in one eclass, the data
//! structure is asserting that any of these enodes, if evaluated,
//! would produce the value.
//!
//! An e-graph also contains a deduplicating hash-map of nodes, so if
//! the user creates the same e-node more than once, they get the same
//! e-class ID.
//!
//! In the usual use-case, an e-graph is used to build a sea-of-nodes
//! IR for a function body or other expression-based code, and then
//! *rewrite rules* are applied to the e-graph. Each rewrite
//! potentially introduces a new e-node that is equivalent to an
//! existing e-node, and then unions the two e-nodes' classes
//! together.
//!
//! In the trivial case this results in an e-class containing a series
//! of newly added e-nodes -- all known forms of an expression -- but
//! note that if a rewrite rule rewrites into an existing e-node
//! (discovered via deduplication), rewriting can result in unioning
//! of two e-classes that have existed for some time.
//!
//! An e-graph's enodes refer to *classes* for their arguments, rather
//! than other nodes directly. This is key to the ability of an
//! e-graph to canonicalize: when two e-classes that are already used
//! as arguments by other e-nodes are unioned, all e-nodes that refer
//! to those e-classes are themselves re-canonicalized. This can
//! result in "cascading" unioning of eclasses, in a process that
//! discovers the transitive implications of all individual
//! equalities. This process is known as "equality saturation".
//!
//! ## The acyclic e-graph (aegraph)
//!
//! An e-graph is powerful, but it can also be expensive to build and
//! saturate: there are often many different forms an expression can
//! take (because many different rewrites are possible), and cascading
//! canonicalization requires heavyweight data structure bookkeeping
//! that is expensive to maintain.
//!
//! This crate introduces the aegraph: an acyclic e-graph. This data
//! structure stores an e-class as an *immutable persistent data
//! structure*. An id can refer to some *level* of an eclass: a
//! snapshot of the nodes in the eclass at one point in time. The
//! nodes referred to by this id never change, though the eclass may
//! grow later.
//!
//! A *union* is also an operation that creates a new eclass id: the
//! original eclass IDs refer to the original eclass contents, while
//! the id resulting from the `union()` operation refers to an eclass
//! that has all nodes.
//!
//! In order to allow for adequate canonicalization, an enode normally
//! stores the *latest* eclass id for each argument, but computes
//! hashes and equality using a *canonical* eclass id. We define such
//! a canonical id with a union-find data structure, just as for a
//! traditional e-graph. It is normally the lowest id referring to
//! part of the eclass.
//!
//! The persistent/immutable nature of this data structure yields one
//! extremely important property: it is acyclic! This simplifies
//! operation greatly:
//!
//! - When "elaborating" out of the e-graph back to linearized code,
//!   so that we can generate machine code, we do not need to break
//!   cycles. A given enode cannot indirectly refer back to itself.
//!
//! - When applying rewrite rules, the nodes visible from a given id
//!   for an eclass never change. This means that we only need to
//!   apply rewrite rules at that node id *once*.
//!
//! ## Data Structure and Example
//!
//! Each eclass id refers to a table entry ("eclass node", which is
//! different than an "enode") that can be one of:
//!
//! - A single enode;
//! - An enode and an earlier eclass id it is appended to (a "child"
//!   eclass node);
//! - A "union node" with two earlier eclass ids.
//!
//! Building the aegraph consists solely of adding new entries to the
//! end of this table of eclass nodes. An enode referenced from any
//! given eclass node can only refer to earlier eclass ids.
//!
//! For example, consider the following eclass table:
//!
//! ```plain
//!
//!    eclass/enode table
//!
//!     eclass1    iconst(1)
//!     eclass2    blockparam(block0, 0)
//!     eclass3    iadd(eclass1, eclass2)
//! ```
//!
//! This represents the expression `iadd(blockparam(block0, 0),
//! iconst(1))` (as the sole enode for eclass3).
//!
//! Now, say that as we further build the function body, we add
//! another enode `iadd(eclass3, iconst(1))`. The `iconst(1)` will be
//! deduplicated to `eclass1`, and the toplevel `iadd` will become its
//! own new eclass (`eclass4`).
//!
//! ```plain
//!     eclass4    iadd(eclass3, eclass1)
//! ```
//!
//! Now we apply our body of rewrite rules, and these rules can
//! combine `x + 1 + 1` into `x + 2`; so we get:
//!
//! ```plain
//!     eclass5    iconst(2)
//!     eclass6    union(iadd(eclass2, eclass5), eclass4)
//! ```
//!
//! Note that we added the nodes for the new expression, and then we
//! union'd it with the earlier `eclass4`. Logically this represents a
//! single eclass that contains two nodes -- the `x + 1 + 1` and `x +
//! 2` representations -- and the *latest* id for the eclass,
//! `eclass6`, can reach all nodes in the eclass (here the node stored
//! in `eclass6` and the earlier one in `eclass4`).
//!
//! ## aegraph vs. egraph
//!
//! Where does an aegraph fall short of an e-graph -- or in other
//! words, why maintain the data structures to allow for full
//! (re)canonicalization at all, with e.g. parent pointers to
//! recursively update parents?
//!
//! This question deserves further study, but right now, it appears
//! that the difference is limited to a case like the following:
//!
//! - expression E1 is interned into the aegraph.
//! - expression E2 is interned into the aegraph. It uses E1 as an
//!   argument to one or more operators, and so refers to the
//!   (currently) latest id for E1.
//! - expression E3 is interned into the aegraph. A rewrite rule fires
//!   that unions E3 with E1.
//!
//! In an e-graph, the last action would trigger a re-canonicalization
//! of all "parents" (users) of E1; so E2 would be re-canonicalized
//! using an id that represents the union of E1 and E3. At
//! code-generation time, E2 could choose to use a value computed by
//! either E1's or E3's operator. In an aegraph, this is not the case:
//! E2's e-class and e-nodes are immutable once created, so E2 refers
//! only to E1's representation of the value (a "slice" of the whole
//! e-class).
//!
//! While at first this sounds quite limiting, there actually appears
//! to be a nice mutually-beneficial interaction with the immediate
//! application of rewrite rules: by applying all rewrites we know
//! about right when E1 is interned, E2 can refer to the best version
//! when it is created. The above scenario only leads to a missed
//! optimization if:
//!
//! - a rewrite rule exists from E3 to E1, but not E1 to E3; and
//! - E3 is *cheaper* than E1.
//!
//! Or in other words, this only matters if there is a rewrite rule
//! that rewrites into a more expensive direction. This is unlikely
//! for the sorts of rewrite rules we plan to write; it may matter
//! more if many possible equalities are expressed, such as
//! associativity, commutativity, etc.
//!
//! Note that the above represents the best of our understanding, but
//! there may be cases we have missed; a more complete examination of
//! this question would involve building a full equality saturation
//! loop on top of the (a)egraph in this crate, and testing with many
//! benchmarks to see if it makes any difference.
//!
//! ## Rewrite Rules (FLAX: Fast Localized Aegraph eXpansion)
//!
//! The most common use of an e-graph or aegraph is to serve as the IR
//! for a compiler. In this use-case, we usually wish to transform the
//! program using a body of rewrite rules that represent valid
//! transformations (equivalent and hopefully simpler ways of
//! computing results). An aegraph supports applying rules in a fairly
//! straightforward way: whenever a new eclass entry is added to the
//! table, we invoke a toplevel "apply all rewrite rules" entry
//! point. This entry point creates new nodes as needed, and when
//! done, unions the rewritten nodes with the original. We thus
//! *immediately* expand a new value into all of its representations.
//!
//! This immediate expansion stands in contrast to a traditional
//! "equality saturation" e-graph system, in which it is usually best
//! to apply rules in batches and then fix up the
//! canonicalization. This approach was introduced in the `egg`
//! e-graph engine [^1]. We call our system FLAX (because flax is an
//! alternative to egg): Fast Localized Aegraph eXpansion.
//!
//! The reason that this is possible in an aegraph but not
//! (efficiently, at least) in a traditional e-graph is that the data
//! structure nodes are immutable once created: an eclass id will
//! always refer to a fixed set of enodes. There is no
//! recanonicalizing of eclass arguments as they union; but also this
//! is not usually necessary, because args will have already been
//! processed and eagerly rewritten as well. In other words, eager
//! rewriting and the immutable data structure mutually allow each
//! other to be practical; both work together.
//!
//! [^1]: M Willsey, C Nandi, Y R Wang, O Flatt, Z Tatlock, P
//!       Panchekha. "egg: Fast and Flexible Equality Saturation." In
//!       POPL 2021. <https://dl.acm.org/doi/10.1145/3434304>

use cranelift_entity::PrimaryMap; |
|||
use cranelift_entity::{entity_impl, packed_option::ReservedValue, SecondaryMap}; |
|||
use smallvec::{smallvec, SmallVec}; |
|||
use std::fmt::Debug; |
|||
use std::hash::Hash; |
|||
use std::marker::PhantomData; |
|||
|
|||
mod bumpvec; |
|||
mod ctxhash; |
|||
mod unionfind; |
|||
|
|||
pub use bumpvec::{BumpArena, BumpSlice, BumpVec}; |
|||
pub use ctxhash::{CtxEq, CtxHash, CtxHashMap, Entry}; |
|||
pub use unionfind::UnionFind; |
|||
|
|||
/// An eclass ID.
|
|||
#[derive(Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] |
|||
pub struct Id(u32); |
|||
entity_impl!(Id, "eclass"); |
|||
|
|||
impl Id { |
|||
pub fn invalid() -> Id { |
|||
Self::reserved_value() |
|||
} |
|||
} |
|||
impl std::default::Default for Id { |
|||
fn default() -> Self { |
|||
Self::invalid() |
|||
} |
|||
} |
|||
|
|||
/// A trait implemented by all "languages" (types that can be enodes).
|
|||
pub trait Language: CtxEq<Self::Node, Self::Node> + CtxHash<Self::Node> { |
|||
type Node: Debug; |
|||
fn children<'a>(&'a self, node: &'a Self::Node) -> &'a [Id]; |
|||
fn children_mut<'a>(&'a mut self, ctx: &'a mut Self::Node) -> &'a mut [Id]; |
|||
fn needs_dedup(&self, node: &Self::Node) -> bool; |
|||
} |
|||
|
|||
/// A trait that allows the aegraph to compute a property of each
|
|||
/// node as it is created.
|
|||
pub trait Analysis { |
|||
type L: Language; |
|||
type Value: Clone + Default; |
|||
fn for_node( |
|||
&self, |
|||
ctx: &Self::L, |
|||
n: &<Self::L as Language>::Node, |
|||
values: &SecondaryMap<Id, Self::Value>, |
|||
) -> Self::Value; |
|||
fn meet(&self, ctx: &Self::L, v1: &Self::Value, v2: &Self::Value) -> Self::Value; |
|||
} |
|||
|
|||
/// Conditionally-compiled trace-log macro. (Borrowed from
|
|||
/// `cranelift-codegen`; it's not worth factoring out a common
|
|||
/// subcrate for this.)
|
|||
#[macro_export] |
|||
macro_rules! trace { |
|||
($($tt:tt)*) => { |
|||
if cfg!(feature = "trace-log") { |
|||
::log::trace!($($tt)*); |
|||
} |
|||
}; |
|||
} |
|||
|
|||
/// An egraph.
|
|||
pub struct EGraph<L: Language, A: Analysis<L = L>> { |
|||
/// Node-allocation arena.
|
|||
pub nodes: Vec<L::Node>, |
|||
/// Hash-consing map from Nodes to eclass IDs.
|
|||
node_map: CtxHashMap<NodeKey, Id>, |
|||
/// Eclass definitions. Each eclass consists of an enode, and
|
|||
/// child pointer to the rest of the eclass.
|
|||
pub classes: PrimaryMap<Id, EClass>, |
|||
/// Union-find for canonical ID generation. This lets us name an
|
|||
/// eclass with a canonical ID that is the same for all
|
|||
/// generations of the class.
|
|||
pub unionfind: UnionFind, |
|||
/// Analysis and per-node state.
|
|||
pub analysis: Option<(A, SecondaryMap<Id, A::Value>)>, |
|||
} |
|||
|
|||
/// A reference to a node.
|
|||
#[derive(Clone, Copy, Debug)] |
|||
pub struct NodeKey { |
|||
index: u32, |
|||
} |
|||
|
|||
impl NodeKey { |
|||
fn from_node_idx(node_idx: usize) -> NodeKey { |
|||
NodeKey { |
|||
index: u32::try_from(node_idx).unwrap(), |
|||
} |
|||
} |
|||
|
|||
/// Get the node for this NodeKey, given the `nodes` from the
|
|||
/// appropriate `EGraph`.
|
|||
pub fn node<'a, N>(&self, nodes: &'a [N]) -> &'a N { |
|||
&nodes[self.index as usize] |
|||
} |
|||
|
|||
fn bits(self) -> u32 { |
|||
self.index |
|||
} |
|||
|
|||
fn from_bits(bits: u32) -> Self { |
|||
NodeKey { index: bits } |
|||
} |
|||
} |
|||
|
|||
struct NodeKeyCtx<'a, 'b, L: Language> { |
|||
nodes: &'a [L::Node], |
|||
node_ctx: &'b L, |
|||
} |
|||
|
|||
impl<'a, 'b, L: Language> CtxEq<NodeKey, NodeKey> for NodeKeyCtx<'a, 'b, L> { |
|||
fn ctx_eq(&self, a: &NodeKey, b: &NodeKey, uf: &mut UnionFind) -> bool { |
|||
let a = a.node(self.nodes); |
|||
let b = b.node(self.nodes); |
|||
self.node_ctx.ctx_eq(a, b, uf) |
|||
} |
|||
} |
|||
|
|||
impl<'a, 'b, L: Language> CtxHash<NodeKey> for NodeKeyCtx<'a, 'b, L> { |
|||
fn ctx_hash(&self, value: &NodeKey, uf: &mut UnionFind) -> u64 { |
|||
self.node_ctx.ctx_hash(value.node(self.nodes), uf) |
|||
} |
|||
} |
|||
|
|||
/// An EClass entry. Contains either a single new enode and a child
|
|||
/// eclass (i.e., adds one new enode), or unions two child eclasses
|
|||
/// together.
|
|||
#[derive(Debug, Clone, Copy)] |
|||
pub struct EClass { |
|||
// formats:
|
|||
//
|
|||
// 00 | unused (31 bits) | NodeKey (31 bits)
|
|||
// 01 | eclass_child (31 bits) | NodeKey (31 bits)
|
|||
// 10 | eclass_child_1 (31 bits) | eclass_child_id_2 (31 bits)
|
|||
bits: u64, |
|||
} |
|||
|
|||
impl EClass { |
|||
fn node(node: NodeKey) -> EClass { |
|||
let node_idx = node.bits() as u64; |
|||
debug_assert!(node_idx < (1 << 31)); |
|||
EClass { |
|||
bits: (0b00 << 62) | node_idx, |
|||
} |
|||
} |
|||
|
|||
fn node_and_child(node: NodeKey, eclass_child: Id) -> EClass { |
|||
let node_idx = node.bits() as u64; |
|||
debug_assert!(node_idx < (1 << 31)); |
|||
debug_assert!(eclass_child != Id::invalid()); |
|||
let child = eclass_child.0 as u64; |
|||
debug_assert!(child < (1 << 31)); |
|||
EClass { |
|||
bits: (0b01 << 62) | (child << 31) | node_idx, |
|||
} |
|||
} |
|||
|
|||
fn union(child1: Id, child2: Id) -> EClass { |
|||
debug_assert!(child1 != Id::invalid()); |
|||
let child1 = child1.0 as u64; |
|||
debug_assert!(child1 < (1 << 31)); |
|||
|
|||
debug_assert!(child2 != Id::invalid()); |
|||
let child2 = child2.0 as u64; |
|||
debug_assert!(child2 < (1 << 31)); |
|||
|
|||
EClass { |
|||
bits: (0b10 << 62) | (child1 << 31) | child2, |
|||
} |
|||
} |
|||
|
|||
/// Get the node, if any, from a node-only or node-and-child
|
|||
/// eclass.
|
|||
pub fn get_node(&self) -> Option<NodeKey> { |
|||
self.as_node() |
|||
.or_else(|| self.as_node_and_child().map(|(node, _)| node)) |
|||
} |
|||
|
|||
/// Get the first child, if any.
|
|||
pub fn child1(&self) -> Option<Id> { |
|||
self.as_node_and_child() |
|||
.map(|(_, p1)| p1) |
|||
.or(self.as_union().map(|(p1, _)| p1)) |
|||
} |
|||
|
|||
/// Get the second child, if any.
|
|||
pub fn child2(&self) -> Option<Id> { |
|||
self.as_union().map(|(_, p2)| p2) |
|||
} |
|||
|
|||
/// If this EClass is just a lone enode, return it.
|
|||
pub fn as_node(&self) -> Option<NodeKey> { |
|||
if (self.bits >> 62) == 0b00 { |
|||
let node_idx = (self.bits & ((1 << 31) - 1)) as u32; |
|||
Some(NodeKey::from_bits(node_idx)) |
|||
} else { |
|||
None |
|||
} |
|||
} |
|||
|
|||
/// If this EClass is one new enode and a child, return the node
|
|||
/// and child ID.
|
|||
pub fn as_node_and_child(&self) -> Option<(NodeKey, Id)> { |
|||
if (self.bits >> 62) == 0b01 { |
|||
let node_idx = (self.bits & ((1 << 31) - 1)) as u32; |
|||
let child = ((self.bits >> 31) & ((1 << 31) - 1)) as u32; |
|||
Some((NodeKey::from_bits(node_idx), Id::from_bits(child))) |
|||
} else { |
|||
None |
|||
} |
|||
} |
|||
|
|||
/// If this EClass is the union variety, return the two child
|
|||
/// EClasses. Both are guaranteed not to be `Id::invalid()`.
|
|||
pub fn as_union(&self) -> Option<(Id, Id)> { |
|||
if (self.bits >> 62) == 0b10 { |
|||
let child1 = ((self.bits >> 31) & ((1 << 31) - 1)) as u32; |
|||
let child2 = (self.bits & ((1 << 31) - 1)) as u32; |
|||
Some((Id::from_bits(child1), Id::from_bits(child2))) |
|||
} else { |
|||
None |
|||
} |
|||
} |
|||
} |
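The 2-bit-tag packing in the `EClass` format comment can be illustrated standalone. This is a hypothetical miniature, not the crate's API: it uses plain `u32` fields in place of `NodeKey` and `Id`, and the `pack`/`unpack` names are invented for the sketch.

```rust
// Miniature of the EClass packing scheme: a 2-bit tag in the top of
// a u64, with two 31-bit payload fields below it. Illustrative only.
const TAG_NODE: u64 = 0b00;
const TAG_NODE_AND_CHILD: u64 = 0b01;
const TAG_UNION: u64 = 0b10;
const MASK31: u64 = (1 << 31) - 1;

fn pack(tag: u64, hi: u32, lo: u32) -> u64 {
    debug_assert!((hi as u64) < (1 << 31) && (lo as u64) < (1 << 31));
    (tag << 62) | ((hi as u64) << 31) | (lo as u64)
}

fn unpack(bits: u64) -> (u64, u32, u32) {
    (
        bits >> 62,
        ((bits >> 31) & MASK31) as u32,
        (bits & MASK31) as u32,
    )
}

fn main() {
    // Node-only: the upper payload field is unused (zero).
    assert_eq!(unpack(pack(TAG_NODE, 0, 9)), (TAG_NODE, 0, 9));
    // Node-and-child: child eclass id 7, node index 42.
    assert_eq!(unpack(pack(TAG_NODE_AND_CHILD, 7, 42)), (TAG_NODE_AND_CHILD, 7, 42));
    // Union: two child eclass ids.
    assert_eq!(unpack(pack(TAG_UNION, 3, 5)), (TAG_UNION, 3, 5));
}
```

Both payload fields fit in 31 bits, so the whole entry stays in a single `u64`, which is why `EClass` is `Copy` and cheap to store in a `PrimaryMap`.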

/// A new or existing `T` when adding to a deduplicated set or data
/// structure, like an egraph.
#[derive(Clone, Copy, Debug)]
pub enum NewOrExisting<T> {
    New(T),
    Existing(T),
}

impl<T> NewOrExisting<T> {
    /// Get the underlying value.
    pub fn get(self) -> T {
        match self {
            NewOrExisting::New(t) => t,
            NewOrExisting::Existing(t) => t,
        }
    }
}

impl<L: Language, A: Analysis<L = L>> EGraph<L, A>
where
    L::Node: 'static,
{
    /// Create a new aegraph.
    pub fn new(analysis: Option<A>) -> Self {
        let analysis = analysis.map(|a| (a, SecondaryMap::new()));
        Self {
            nodes: vec![],
            node_map: CtxHashMap::new(),
            classes: PrimaryMap::new(),
            unionfind: UnionFind::new(),
            analysis,
        }
    }

    /// Create a new aegraph with the given capacity.
    pub fn with_capacity(nodes: usize, analysis: Option<A>) -> Self {
        let analysis = analysis.map(|a| (a, SecondaryMap::with_capacity(nodes)));
        Self {
            nodes: Vec::with_capacity(nodes),
            node_map: CtxHashMap::with_capacity(nodes),
            classes: PrimaryMap::with_capacity(nodes),
            unionfind: UnionFind::with_capacity(nodes),
            analysis,
        }
    }

    /// Add a new node.
    pub fn add(&mut self, node: L::Node, node_ctx: &L) -> NewOrExisting<Id> {
        // Push the node. We can then build a NodeKey that refers to
        // it and look for an existing interned copy. If one exists,
        // we can pop the pushed node and return the existing Id.
        let node_idx = self.nodes.len();
        trace!("adding node: {:?}", node);
        let needs_dedup = node_ctx.needs_dedup(&node);
        self.nodes.push(node);

        let key = NodeKey::from_node_idx(node_idx);
        if needs_dedup {
            let ctx = NodeKeyCtx {
                nodes: &self.nodes[..],
                node_ctx,
            };

            match self.node_map.entry(key, &ctx, &mut self.unionfind) {
                Entry::Occupied(o) => {
                    let eclass_id = *o.get();
                    self.nodes.pop();
                    trace!(" -> existing id {}", eclass_id);
                    NewOrExisting::Existing(eclass_id)
                }
                Entry::Vacant(v) => {
                    // We're creating a new eclass now.
                    let eclass_id = self.classes.push(EClass::node(key));
                    trace!(" -> new node and eclass: {}", eclass_id);
                    self.unionfind.add(eclass_id);

                    // Add to interning map with a NodeKey referring to the eclass.
                    v.insert(eclass_id);

                    // Update analysis.
                    let node_ctx = ctx.node_ctx;
                    self.update_analysis_new(node_ctx, eclass_id, key);

                    NewOrExisting::New(eclass_id)
                }
            }
        } else {
            let eclass_id = self.classes.push(EClass::node(key));
            self.unionfind.add(eclass_id);
            NewOrExisting::New(eclass_id)
        }
    }

    /// Merge one eclass into another, maintaining the acyclic
    /// property (args must have lower eclass Ids than the eclass
    /// containing the node with those args). Returns the Id of the
    /// merged eclass.
    pub fn union(&mut self, ctx: &L, a: Id, b: Id) -> Id {
        assert_ne!(a, Id::invalid());
        assert_ne!(b, Id::invalid());
        let (a, b) = (std::cmp::max(a, b), std::cmp::min(a, b));
        trace!("union: id {} and id {}", a, b);
        if a == b {
            trace!(" -> no-op");
            return a;
        }

        self.unionfind.union(a, b);

        // If the younger eclass has no child, we can link it
        // directly and return that eclass. Otherwise, we create a new
        // union eclass.
        if let Some(node) = self.classes[a].as_node() {
            trace!(
                " -> id {} is one-node eclass; making into node-and-child with id {}",
                a,
                b
            );
            self.classes[a] = EClass::node_and_child(node, b);
            self.update_analysis_union(ctx, a, a, b);
            return a;
        }

        let u = self.classes.push(EClass::union(a, b));
        self.unionfind.add(u);
        self.unionfind.union(u, b);
        trace!(" -> union id {} and id {} into id {}", a, b, u);
        self.update_analysis_union(ctx, u, a, b);
        u
    }

    /// Get the canonical ID for an eclass. This may be an older
    /// generation, so will not be able to see all enodes in the
    /// eclass; but it will allow us to unambiguously refer to an
    /// eclass, even across merging.
    pub fn canonical_id_mut(&mut self, eclass: Id) -> Id {
        self.unionfind.find_and_update(eclass)
    }

    /// Get the canonical ID for an eclass. This may be an older
    /// generation, so will not be able to see all enodes in the
    /// eclass; but it will allow us to unambiguously refer to an
    /// eclass, even across merging.
    pub fn canonical_id(&self, eclass: Id) -> Id {
        self.unionfind.find(eclass)
    }

    /// Get the enodes for a given eclass.
    pub fn enodes(&self, eclass: Id) -> NodeIter<L, A> {
        NodeIter {
            stack: smallvec![eclass],
            _phantom1: PhantomData,
            _phantom2: PhantomData,
        }
    }

    /// Update analysis for a given eclass node (new-enode case).
    fn update_analysis_new(&mut self, ctx: &L, eclass: Id, node: NodeKey) {
        if let Some((analysis, state)) = self.analysis.as_mut() {
            let node = node.node(&self.nodes);
            state[eclass] = analysis.for_node(ctx, node, state);
        }
    }

    /// Update analysis for a given eclass node (union case).
    fn update_analysis_union(&mut self, ctx: &L, eclass: Id, a: Id, b: Id) {
        if let Some((analysis, state)) = self.analysis.as_mut() {
            let a = &state[a];
            let b = &state[b];
            state[eclass] = analysis.meet(ctx, a, b);
        }
    }

    /// Get the analysis value for a given eclass. Panics if no analysis is present.
    pub fn analysis_value(&self, eclass: Id) -> &A::Value {
        &self.analysis.as_ref().unwrap().1[eclass]
    }
}

/// An iterator over all nodes in an eclass.
///
/// Because eclasses are immutable once created, this does *not* need
/// to hold an open borrow on the egraph; it is free to add new nodes,
/// while our existing Ids will remain valid.
pub struct NodeIter<L: Language, A: Analysis<L = L>> {
    stack: SmallVec<[Id; 8]>,
    _phantom1: PhantomData<L>,
    _phantom2: PhantomData<A>,
}

impl<L: Language, A: Analysis<L = L>> NodeIter<L, A> {
    #[inline(always)]
    pub fn next<'a>(&mut self, egraph: &'a EGraph<L, A>) -> Option<&'a L::Node> {
        while let Some(next) = self.stack.pop() {
            let eclass = egraph.classes[next];
            if let Some(node) = eclass.as_node() {
                return Some(&egraph.nodes[node.index as usize]);
            } else if let Some((node, child)) = eclass.as_node_and_child() {
                if child != Id::invalid() {
                    self.stack.push(child);
                }
                return Some(&egraph.nodes[node.index as usize]);
            } else if let Some((child1, child2)) = eclass.as_union() {
                debug_assert!(child1 != Id::invalid());
                debug_assert!(child2 != Id::invalid());
                self.stack.push(child2);
                self.stack.push(child1);
                continue;
            } else {
                unreachable!("Invalid eclass format");
            }
        }
        None
    }
}
@@ -1,85 +0,0 @@
//! Simple union-find data structure.

use crate::{trace, Id};
use cranelift_entity::SecondaryMap;
use std::hash::{Hash, Hasher};

/// A union-find data structure. The data structure can allocate
/// `Id`s, indicating eclasses, and can merge eclasses together.
#[derive(Clone, Debug)]
pub struct UnionFind {
    parent: SecondaryMap<Id, Id>,
}

impl UnionFind {
    /// Create a new `UnionFind`.
    pub fn new() -> Self {
        UnionFind {
            parent: SecondaryMap::new(),
        }
    }

    /// Create a new `UnionFind` with the given capacity.
    pub fn with_capacity(cap: usize) -> Self {
        UnionFind {
            parent: SecondaryMap::with_capacity(cap),
        }
    }

    /// Add an `Id` to the `UnionFind`, with its own equivalence class
    /// initially. All `Id`s must be added before being queried or
    /// unioned.
    pub fn add(&mut self, id: Id) {
        self.parent[id] = id;
    }

    /// Find the canonical `Id` of a given `Id`.
    pub fn find(&self, mut node: Id) -> Id {
        while node != self.parent[node] {
            node = self.parent[node];
        }
        node
    }

    /// Find the canonical `Id` of a given `Id`, updating the data
    /// structure in the process so that future queries for this `Id`
    /// (and others in its chain up to the root of the equivalence
    /// class) will be faster.
    pub fn find_and_update(&mut self, mut node: Id) -> Id {
        // "Path halving" mutating find (Tarjan and Van Leeuwen):
        // point each visited node at its grandparent as we walk up.
        let orig = node;
        while node != self.parent[node] {
            let next = self.parent[self.parent[node]];
            self.parent[node] = next;
            node = next;
        }
        trace!("find_and_update: {} -> {}", orig, node);
        node
    }

    /// Merge the equivalence classes of the two `Id`s.
    pub fn union(&mut self, a: Id, b: Id) {
        let a = self.find_and_update(a);
        let b = self.find_and_update(b);
        let (a, b) = (std::cmp::min(a, b), std::cmp::max(a, b));
        if a != b {
            // Always canonicalize toward lower IDs.
            self.parent[b] = a;
            trace!("union: {}, {}", a, b);
        }
    }

    /// Determine if two `Id`s are equivalent, after
    /// canonicalizing. Update union-find data structure during our
    /// canonicalization to make future lookups faster.
    pub fn equiv_id_mut(&mut self, a: Id, b: Id) -> bool {
        self.find_and_update(a) == self.find_and_update(b)
    }

    /// Hash an `Id` after canonicalizing it. Update union-find data
    /// structure to make future lookups/hashing faster.
    pub fn hash_id_mut<H: Hasher>(&mut self, hash: &mut H, id: Id) {
        let id = self.find_and_update(id);
        id.hash(hash);
    }
}
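The mutating find above can be exercised in a self-contained sketch. This hypothetical miniature (the `Uf` name and plain `usize` indices are invented for illustration; the crate uses `Id` and `SecondaryMap`) shows the same grandparent-shortcutting find and the lower-id canonicalization rule:

```rust
// Standalone union-find mirroring the structure above: find
// shortcuts each visited node to its grandparent, and union always
// canonicalizes toward the lower id.
struct Uf {
    parent: Vec<usize>,
}

impl Uf {
    fn new(n: usize) -> Uf {
        // Each element starts as its own singleton class.
        Uf { parent: (0..n).collect() }
    }

    fn find_and_update(&mut self, mut node: usize) -> usize {
        while node != self.parent[node] {
            // Point the current node at its grandparent and jump there.
            let next = self.parent[self.parent[node]];
            self.parent[node] = next;
            node = next;
        }
        node
    }

    fn union(&mut self, a: usize, b: usize) {
        let a = self.find_and_update(a);
        let b = self.find_and_update(b);
        let (a, b) = (a.min(b), a.max(b));
        if a != b {
            // Lower id becomes the canonical representative.
            self.parent[b] = a;
        }
    }
}

fn main() {
    let mut uf = Uf::new(5);
    uf.union(0, 1);
    uf.union(3, 4);
    assert_eq!(uf.find_and_update(1), 0);
    uf.union(1, 4);
    // All of {0, 1, 3, 4} now canonicalize to the lowest id, 0.
    assert_eq!(uf.find_and_update(4), 0);
    assert_eq!(uf.find_and_update(2), 2);
}
```

Canonicalizing toward lower ids is what lets the aegraph use the canonical id as a stable, generation-independent name for an eclass when hashing and comparing enodes.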