
Add meeting notes for Cranelift meeting 2021-08-23

pull/3224/head
Benjamin Bouvier 3 years ago
committed by GitHub
parent
commit
6dba669e54
  1. 124
      meetings/cranelift/2021/cranelift-08-23.md

@@ -18,4 +18,128 @@
### Attendees
in no particular order:
- CF: Chris Fallin
- AB: Andrew Brown
- B3: bjorn3
- UW: Ulrich Weigand
- SP: Sam Parker
- AK: Anton Kirilov
- Afonso Bordado
- Johnnie Birch
- BB: Benjamin Bouvier
### Notes
- Semantics of booleans: https://github.com/bytecodealliance/wasmtime/issues/3205
- Inconsistencies between different backends
- Tribal knowledge about this, mostly
- New uses of boolean types, e.g. cg_clif
- Q: should there be a defined repr for a boolean type?
- Q: what does it mean to have a bool that’s wider than 1 bit?
- Historically, we did have wider-than-1-bit bools. They have to be all zeroes or
all ones. Use case: bitcast from boolean to other types to get vector masks.
- SIMD vector compare instructions are better handled for this use case
- Q: what are the semantics of storing/loading bool from memory + casting to/from
ints?
- Historically, it was a validator error to load/store a bool from/to memory
- Two main options:
- A. false = 0, true = 1, wider-than-1-bit is 1 (zero-extended)
- B. wider-than-1-bit is all ones
- UW: b1’s documentation says it can’t be loaded/stored from/to memory
- CF: not true as of last week (fuzz bug), need to update doc
- AB: SIMD bool types must have a known bit repr
- Q: do we want boolean types at the clif level to behave like the other types (can
be stored/loaded), or do we want to forbid memory accesses to those?
- SP/UW: Do we know of any arch that has sub-byte loads/stores? Sounds like no.
- AB: fine to not mandate a repr on b1, but useful to have a repr for SIMD
vectors, since bool vectors are likely to be stored
- UW: doc is outdated for bool vectors (still mentions forbidden
loads/stores)
- Q: why do we want a bool type?
- CF: we could just remove all the bool types altogether
- AB: what about return values of SIMD compare?
- CF: only remove all the scalar bool types
- UW: weird to have bool types only for vector
- CF: could have b1 for scalar, and b128 for vectors, only
- UW: what’s the benefit of e.g. b8 over i8 at the IR level?
- CF: bitmasking stuff will depend on the actual IR type
- AB: could remove a few `raw_bitcast` if we didn’t have so many bool
types
- CF: still want b1, do not allow load/store of bools, do not allow bitcast
(they don’t have a repr)
- B3: how would vselect work without bools?
- AB: bool vectors give guarantees about the actual repr, so that’s nice
- CF: lowering can’t rely on a b128 loaded from memory actually being all ones or
all zeroes, so we would have to canonicalize anyway
- AK: could have shorter aarch64 sequences if we knew about the repr of
bool vectors
- AK: instead of canonicalization, could pattern-match up the operand tree to
check that the value was produced by an inst that generates all zeroes or
all ones
- CF: Proposal: we have wider bool types, and they are guaranteed to be
canonicalized (insert checks on loads/stores/bitcasts). Impl could be
compare-to-0?
- UW: or shifts, depending on the situation. Would be a factor slower in any
case.
- AB: what about the use case where, when lowering wasm to clif, we load a
v128 and use it as a mask in another wasm simd op?
- CF: would need to cast to a bool type
- Semantics of `raw_bitcast`?
- Useful to convert from a CLIF type to another, without any change
at the machine level
- CF: think about it for some more time, and get back to it?
- No one disagrees, so everyone agrees
- Please make suggestions in the issue
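
To make the two candidate representations and the canonicalization proposal above concrete, here is a minimal sketch in plain Rust. It is not Cranelift code: a `u8` stands in for a hypothetical `b8`, and every function name is made up for illustration. It shows option A (zero-extended), option B (all ones), the compare-to-0 and shift-based canonicalizations mentioned by CF and UW, and why a bitwise select needs a canonical mask.

```rust
/// Option A: false = 0, true = 1, zero-extended to the full width.
pub fn bool_zero_extended(b: bool) -> u8 {
    b as u8
}

/// Option B: false = 0, true = all ones.
pub fn bool_all_ones(b: bool) -> u8 {
    // 0 - 1 wraps to 0xFF, 0 - 0 stays 0.
    0u8.wrapping_sub(b as u8)
}

/// Canonicalize an arbitrary byte to an all-ones/all-zeroes mask
/// using a compare-to-0: any nonzero input counts as true.
pub fn canonicalize_compare(raw: u8) -> u8 {
    0u8.wrapping_sub((raw != 0) as u8)
}

/// Shift-based alternative, assuming only the low bit is meaningful:
/// move bit 0 into the sign position, then arithmetic-shift it back down.
pub fn canonicalize_low_bit(raw: u8) -> u8 {
    (((raw << 7) as i8) >> 7) as u8
}

/// Bitwise select: takes bits from `if_true` where `mask` is 1,
/// and from `if_false` where `mask` is 0.
pub fn bitselect(mask: u8, if_true: u8, if_false: u8) -> u8 {
    (mask & if_true) | (!mask & if_false)
}
```

With a canonical mask, `bitselect(0xFF, 0xF0, 0x0F)` yields `0xF0`. With an option-A value used directly as a mask, `bitselect(0x01, 0xF0, 0x0F)` yields `0x0E`, which is neither input. Note also that the two canonicalizations disagree on inputs such as `0x40`, so which one is correct depends on the representation that gets chosen.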
- ISLE: https://github.com/bytecodealliance/rfcs/pull/15
- AK: want to be able to spend less/more time to do pattern-matching according to
opt level. Would need runtime flags for this. Could this be implemented via the
extractors?
- CF: possible to have a switch at meta-compile time to exclude certain rules.
Should it be a compile-time flag, or a runtime flag (more complicated)?
- AK: really want a runtime flag to get really fast compile times
- AK: also need a way to pattern match on CPU extensions
- CF: would be a runtime flag as well
- Afonso Bordado: commented about having these kinds of predicates on instructions;
proposal to use the `when` syntax
- CF: implicit conditioning: no special marking, but if a rule uses e.g. an
avx512-only inst, automatically detect it and add a predicate on the whole rule
saying that it requires the CPU ext.
- UW: how does it compare with LLVM?
- CF: Studied related work in pre-RFC (#13). Pattern-matching DSL similar to what
LLVM does. ISLE is less broad in scope than TableGen and would only be used
for codegen. ISLE is simpler.
- UW: in the LLVM community there’s been a push away from SelectionDAG
- CF: bigger compile times. It’s a tradeoff with dev productivity, and we did have very
subtle bugs in the past. LLVM is moving to FastISel (?) because it’s faster. We’re
building the foundational level of rules, “simple” pattern matching, so it’s nice to
have a DSL at this point. How to make it fast in the long run is an open research
question.
- UW: In LLVM, FastISel handles the more common use cases and then redirects to
SelectionDAG if complicated cases show up. GlobalISel is supposed to be more
global (can match across basic blocks).
- CF: Could have a system with foundational rules + simple optimization rules that
don’t try to match very deep.
- AB: would like to try out some code when it’s ready so as to give more targeted
feedback.
- BB: risk of scattering code between Rust extern functions + high-level DSL.
Some old problems are becoming new again. Reinventing many concepts
present in legalization, concept overload for newcomers. Risk of seeing bugs in
the “system”, which are much harder to debug than when just looking at handwritten
code. Tradeoff between developer experience and complexity, as said before.
- CF: re: FFI, mostly isolated. Re: complexity, “test and fuzz the crap out of it” :). Re:
cognitive load, tribal knowledge is starting to appear in the current system (how
to properly pattern-match without causing subtle errors?). Should be better in
a lot of ways.
- AK: emphasis on getting better documentation (blog posts / internal docs).
- CF: if the system is complicated and requires lots of docs, it’s not ideal. Want to
make the system easy to understand and have good docs.
- AB: generated code should be in-tree, for better discoverability.
- CF: agreed, would help compile times + we could maybe include comments in
generated code.
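
As an illustration of the runtime-flag idea raised above by AK and CF (gating rules on opt level and on CPU extensions), here is a small sketch in Rust. It is not ISLE and not Cranelift's flags API: the `Flags`, `OptLevel`, `Rule` and `Expr` types, the rule names, and the `select` function are all hypothetical. Each rule carries its predicates as data, and the matcher skips rules whose predicates fail before attempting the pattern.

```rust
/// Hypothetical optimization levels; `Quick` sorts below `Full`.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum OptLevel {
    Quick,
    Full,
}

/// Hypothetical runtime settings consulted by the matcher.
#[derive(Clone, Copy, Debug)]
pub struct Flags {
    pub opt_level: OptLevel,
    pub has_avx512: bool,
}

/// Toy IR, just large enough to have a shallow and a deep pattern.
#[derive(Debug)]
pub enum Expr {
    Const(i64),
    Reg(u8),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// A lowering rule together with the predicates that guard it.
pub struct Rule {
    pub name: &'static str,
    pub min_opt_level: OptLevel,
    pub needs_avx512: bool,
    pub pattern: fn(&Expr) -> bool,
}

fn is_add(e: &Expr) -> bool {
    matches!(e, Expr::Add(..))
}

/// Deeper pattern: an add whose left operand is a multiply.
fn is_mul_add(e: &Expr) -> bool {
    match e {
        Expr::Add(lhs, _) => matches!(**lhs, Expr::Mul(..)),
        _ => false,
    }
}

/// Rules in priority order: most specific first.
pub fn rules() -> Vec<Rule> {
    vec![
        Rule { name: "fused_mul_add", min_opt_level: OptLevel::Full, needs_avx512: false, pattern: is_mul_add },
        Rule { name: "avx512_add", min_opt_level: OptLevel::Quick, needs_avx512: true, pattern: is_add },
        Rule { name: "plain_add", min_opt_level: OptLevel::Quick, needs_avx512: false, pattern: is_add },
    ]
}

/// Return the first rule whose predicates hold and whose pattern matches.
pub fn select(rules: &[Rule], flags: Flags, expr: &Expr) -> Option<&'static str> {
    rules
        .iter()
        .filter(|r| flags.opt_level >= r.min_opt_level)
        .filter(|r| !r.needs_avx512 || flags.has_avx512)
        .find(|r| (r.pattern)(expr))
        .map(|r| r.name)
}
```

In this sketch the predicates are checked before the pattern, so at a low opt level the deep pattern is never even attempted. A compile-time switch, as CF mentions, would instead drop the gated rules from the table when it is generated.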
- Status updates:
- UW: CI for s390: qemu patches are now in main, so some qemu version should work out
of the box. However, something (either qemu or wasmtime) doesn’t build anymore on
s390. Looking into it before being able to run s390 in CI.
- AK: more aarch64 tests run on qemu. Also have native runners.
