|
|
@ -18,4 +18,128 @@ |
|
|
|
|
|
|
|
### Attendees |
|
|
|
|
|
|
|
in no particular order: |
|
|
|
|
|
|
|
- CF: Chris Fallin |
|
|
|
- AB: Andrew Brown |
|
|
|
- B3: bjorn3 |
|
|
|
- UW: Ulrich Weigand |
|
|
|
- SP: Sam Parker |
|
|
|
- AK: Anton Kirilov |
|
|
|
- Afonso Bordado |
|
|
|
- Johnnie Birch |
|
|
|
- BB: Benjamin Bouvier |
|
|
|
|
|
|
|
### Notes |
|
|
|
|
|
|
|
- Semantics of booleans: https://github.com/bytecodealliance/wasmtime/issues/3205 |
|
|
|
- Inconsistencies between different backends |
|
|
|
- Tribal knowledge about this, mostly |
|
|
|
- New uses of boolean types, e.g. cg_clif |
|
|
|
- Q: should there be repr for a boolean type? |
|
|
|
- Q: what does it mean to have a bool that’s wider than 1 bit? |
|
|
|
- Historically, did have those wider-than-1 bit. Have to be all 0 or 1. Use |
|
|
|
case: bitcast from boolean to other types to get vector masks. |
|
|
|
- SIMD vector compare instructions are better handled for this use case |
|
|
|
- Q: what are the semantics of storing/loading bool from memory + casting to/from |
|
|
|
ints? |
|
|
|
- Historically, validator error to load/store bool from memory |
|
|
|
- Two main options: |
|
|
|
- A. false = 0, true = 1, wider-than-1-bit is 1 (zero-extended) |
|
|
|
- B. wider-than-1-bit is all ones |
|
|
|
- UW: b1’s documentation says it can’t be loaded/stored from/to memory |
|
|
|
- CF: not true as of last week (fuzz bug), need to update doc |
|
|
|
- AB: SIMD bool types must have a known bit repr |
|
|
|
- Q: do we want boolean types at the clif level to behave as the others (can be |
|
|
|
stored/loaded), or do we want to forbid memory accesses to those? |
|
|
|
- SP/UW: Do we know any arch that has sub-byte load/store? Sounds like |
|
|
|
no. |
|
|
|
- AB: fine to not mandate a repr on b1, but useful to have a repr for SIMD |
|
|
|
vectors, since bool vectors are likely to be stored |
|
|
|
- UW: doc is outdated for bool vectors (still mentions forbidden |
|
|
|
loads/stores) |
|
|
|
- Q: why do we want a bool type? |
|
|
|
- CF: we could just remove all the bool types overall |
|
|
|
- AB: what about return values of SIMD compare? |
|
|
|
- CF: only remove all the scalar bool types |
|
|
|
- UW: weird to have bool types only for vector |
|
|
|
- CF: could have b1 for scalar, and b128 for vectors, only |
|
|
|
- UW: what’s the benefit of e.g. b8 over i8 at the IR level? |
|
|
|
- CF: bitmasking stuff will depend on the actual IR type |
|
|
|
- AB: could remove a few `raw_bitcast` if we didn’t have so many bool |
|
|
|
types |
|
|
|
- CF: still want b1, do not allow load/store of bools, do not allow bitcast |
|
|
|
(they don’t have a repr) |
|
|
|
- B3: how would vselect work without bools? |
|
|
|
- AB: bool vectors give guarantees about the actual repr, so that’s nice |
|
|
|
- CF: can’t rely on lowering that the result of loading a b128 from memory is |
|
|
|
actually all ones or zeroes, so would have to canonicalize anyways |
|
|
|
- AK: could have shorter aarch64 sequences if we knew about the repr of |
|
|
|
bool vectors |
|
|
|
- AK: instead of canonicalization, could use pattern-matching up the |
|
|
|
operand tree that the value got produced by an inst that generated all0 or |
|
|
|
all1 |
|
|
|
- CF: Proposal: we have wider bool types, and they are guaranteed to be |
|
|
|
canonicalized (insert checks for load/stores/bitcast). Impl could be |
|
|
|
compare-to-0? |
|
|
|
- UW: or shifts, depend on the situation. Would be a factor slower in any |
|
|
|
case. |
|
|
|
- AB: what about the use case where lowering wasm to clif, we load an |
|
|
|
v128 and use it as a mask in another wasm simd op? |
|
|
|
- CF: would need to cast to a bool type |
|
|
|
- Semantics of `raw_bitcast`? |
|
|
|
- Useful to convert from a CLIF type to another, without any change |
|
|
|
at the machine level |
|
|
|
- CF: think about it for some more time, and get back to it? |
|
|
|
- No one disagrees, so everyone agrees |
|
|
|
- Please make suggestions in the issue |
|
|
|
- ISLE: https://github.com/bytecodealliance/rfcs/pull/15 |
|
|
|
- AK: want to be able to spend less/more time to do pattern-matching according to |
|
|
|
opt level. Would need runtime flags for this. Could this be implemented via the |
|
|
|
extractors? |
|
|
|
- CF: possible to have a switch at meta-compile time to exclude certain rules. |
|
|
|
Should it be a compile-time flag, or a runtime flag (more complicated)? |
|
|
|
- AK: really want a runtime flag to get really fast compile times |
|
|
|
- AK: also need a way to pattern match on CPU extensions |
|
|
|
- CF: would be a runtime flag as well |
|
|
|
- Afonso Bordado: commented about having this kind of predicates on instructions; |
|
|
|
proposal to use the `when` syntax |
|
|
|
- CF: implicit conditioning: no special marking, but if a rule uses an e.g. |
|
|
|
avx512-only inst, automatically detect it and add a predicate on the whole rule |
|
|
|
that it requires the CPU ext. |
|
|
|
- UW: how does it compare with LLVM? |
|
|
|
- CF: Studied related work in pre-RFC (#13). Pattern-matching DSL similar to what |
|
|
|
LLVM does. ISLE is less broad in scope than TableGen and would only be used |
|
|
|
for codegen. ISLE is simpler. |
|
|
|
- UW: in the LLVM community there’s been a push away from SelectionDag |
|
|
|
- CF: bigger compile times. It’s a tradeoff with dev productivity + we did have very |
|
|
|
subtle bugs in the past. LLVM moving to FastISel? because it’s faster. We’re |
|
|
|
building the foundational level of rules, “simple” pattern matching, nice to have a |
|
|
|
DSL at this point. How to make it fast in long run is an open research question. |
|
|
|
- UW: In LLVM, FastISel handles more common use cases and then redirects to |
|
|
|
SelectionDag if complicated cases show up. GlobalISel is supposed to be more |
|
|
|
global (can match across basic blocks). |
|
|
|
- CF: Could have a system with foundational rules + simple optimization rules that |
|
|
|
don’t try to match very deep. |
|
|
|
- AB: would like to try out some code when it’s ready so as to give more targeted |
|
|
|
feedback. |
|
|
|
- BB: risk of scattering code between Rust extern functions + high-level DSL. |
|
|
|
Some old problems are becoming new again. Reinventing many concepts |
|
|
|
present in legalization, concepts overload for newcomers. Risk of seeing bugs in |
|
|
|
the “system”, much harder to debug vs just looking at handwritten code. Tradeoff |
|
|
|
between developer experience and complexity, as said before. |
|
|
|
- CF: re: FFI, mostly isolated. re: complexity, “test and fuzz the crap out of it” :). Re: |
|
|
|
cognitive load, tribal knowledge is starting to appear in the current system (how |
|
|
|
to properly do pattern match without causing subtle errors?). Should be better in |
|
|
|
a lot of ways. |
|
|
|
- AK: emphasis on getting better documentation (blog posts / internal docs). |
|
|
|
- CF: if the system is complicated and requires lots of docs, it’s not ideal. Want to |
|
|
|
make the system easy to understand and have good docs. |
|
|
|
- AB: generated code should be in-tree, for better discoverability. |
|
|
|
- CF: agreed, would help compile times + we could maybe include comments in |
|
|
|
generated code. |
|
|
|
- Status updates: |
|
|
|
- UW: CI for s390, qemu patches now in main, some qemu version should work out of the |
|
|
|
box. Yet it (either qemu or wasmtime) doesn’t build anymore on s390. Looking into it |
|
|
|
before being able to run s390 in CI. |
|
|
|
- AK: more aarch64 tests run on qemu. Also have native runners. |
|
|
|