This crate contains oracles, generators, and fuzz targets for use with fuzzing
engines (e.g. libFuzzer). This doesn't contain the actual
`libfuzzer_sys::fuzz_target!` definitions (those are in the `peepmatic-fuzz`
crate) but does those definitions are one liners calling out to functions
defined in this crate.
This crate provides testing utilities for `peepmatic`, and a test-only
instruction set we can use to check that various optimizations do or don't
apply.
Peepmatic is a DSL for peephole optimizations and compiler for generating
peephole optimizers from them. The user writes a set of optimizations in the
DSL, and then `peepmatic` compiles the set of optimizations into an efficient
peephole optimizer:
```
DSL ----peepmatic----> Peephole Optimizer
```
The generated peephole optimizer has all of its optimizations' left-hand sides
collapsed into a compact automata that makes matching candidate instruction
sequences fast.
The DSL's optimizations may be written by hand or discovered mechanically with a
superoptimizer like [Souper][]. Eventually, `peepmatic` should have a verifier
that ensures that the DSL's optimizations are sound, similar to what [Alive][]
does for LLVM optimizations.
[Souper]: https://github.com/google/souper
[Alive]: https://github.com/AliveToolkit/alive2
The `peepmatic-runtime` crate contains everything required to use a
`peepmatic`-generated peephole optimizer.
In short: build times and code size.
If you are just using a peephole optimizer, you shouldn't need the functions
to construct it from scratch from the DSL (and the implied code size and
compilation time), let alone even build it at all. You should just
deserialize an already-built peephole optimizer, and then use it.
That's all that is contained here in this crate.
This crate provides the derive macros used by `peepmatic`, notable AST-related
derives that enumerate child AST nodes, and operator-related derives that
provide helpers for type checking.
The `peepmatic-automata` crate builds and queries finite-state transducer
automata.
A transducer is a type of automata that has not only an input that it
accepts or rejects, but also an output. While regular automata check whether
an input string is in the set that the automata accepts, a transducer maps
the input strings to values. A regular automata is sort of a compressed,
immutable set, and a transducer is sort of a compressed, immutable key-value
dictionary. A [trie] compresses a set of strings or map from a string to a
value by sharing prefixes of the input string. Automata and transducers can
compress even better: they can share both prefixes and suffixes. [*Index
1,600,000,000 Keys with Automata and Rust* by Andrew Gallant (aka
burntsushi)][burntsushi-blog-post] is a top-notch introduction.
If you're looking for a general-purpose transducers crate in Rust you're
probably looking for [the `fst` crate][fst-crate]. While this implementation
is fully generic and has no dependencies, its feature set is specific to
`peepmatic`'s needs:
* We need to associate extra data with each state: the match operation to
evaluate next.
* We can't provide the full input string up front, so this crate must
support incremental lookups. This is because the peephole optimizer is
computing the input string incrementally and dynamically: it looks at the
current state's match operation, evaluates it, and then uses the result as
the next character of the input string.
* We also support incremental insertion and output when building the
transducer. This is necessary because we don't want to emit output values
that bind a match on an optimization's left-hand side's pattern (for
example) until after we've succeeded in matching it, which might not
happen until we've reached the n^th state.
* We need to support generic output values. The `fst` crate only supports
`u64` outputs, while we need to build up an optimization's right-hand side
instructions.
This implementation is based on [*Direct Construction of Minimal Acyclic
Subsequential Transducers* by Mihov and Maurel][paper]. That means that keys
must be inserted in lexicographic order during construction.
[trie]: https://en.wikipedia.org/wiki/Trie
[burntsushi-blog-post]: https://blog.burntsushi.net/transducers/#ordered-maps
[fst-crate]: https://crates.io/crates/fst
[paper]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3698&rep=rep1&type=pdf
This reduces the size of the Inst enum from 112 bytes to 48 bytes.
Using DHAT on a regex-rs.wasm benchmark, `valgrind --tool=dhat clif-util compile --target aarch64`
The total number of allocated bytes, drops by around 170 MB.
At t-gmax drops by 3 MB.
Using `perf stat clif-util compile --target aarch64`, the instructions count dropped by 0.6%. Cache misses dropped by 6%. Cycles dropped by 2.3%.
This commit adds a suite of `wasmtime_funcref_table_*` APIs which mirror
the standard APIs but have a few differences:
* More errors are returned. For example error messages are communicated
through `wasmtime_error_t` and out-of-bounds vs load of null can be
differentiated in the `get` API.
* APIs take `wasm_func_t` instead of `wasm_ref_t`. Given the recent
decision to remove subtyping from the anyref proposal it's not clear
how the C API for tables will be affected, so for now these APIs are
all specialized to only funcref tables.
* Growth now allows access to the previous size of the table, if
desired, which mirrors the `table.grow` instruction.
This was originally motivated by bytecodealliance/wasmtime-go#5 where
the current APIs we have for working with tables don't quite work. We
don't have a great way to take an anyref constructed from a `Func` and
get the `Func` back out, so for now this sidesteps those concerns while
we sort out the anyref story.
It's intended that once the anyref story has settled and the official C
API has updated we'll likely delete these wasmtime-specific APIs or
implement them as trivial wrappers around the official ones.
* Remove Cranelift's OutOfBounds trap, which is no longer used.
* Change proc_exit to unwind instead of exit the host process.
This implements the semantics in https://github.com/WebAssembly/WASI/pull/235.
Fixes#783.
Fixes#993.
* Fix exit-status tests on Windows.
* Revert the wiggle changes and re-introduce the wasi-common implementations.
* Move `wasi_proc_exit` into the wasmtime-wasi crate.
* Revert the spec_testsuite change.
* Remove the old proc_exit implementations.
* Make `TrapReason` an implementation detail.
* Allow exit status 2 on Windows too.
* Fix a documentation link.
* Really fix a documentation link.
The `wasmtime` crate currently lives in `crates/api` for historical
reasons, because we once called it `wasmtime-api` crate. This creates a
stumbling block for new contributors.
As discussed on Zulip, rename the directory to `crates/wasmtime`.
Several links were broken by line-breaks between the link caption and
the link itself. This commit fixes them by moving each on its own line.
Co-authored-by: k.bagrov <k.bagrov@g.nsu.ru>
If the scratch register is caller-saved, then it might appear in fixed
ranges because of call clobbers. Instead, use a register that's not
caller-saved and has no predefined use in the ABI.
SmallVec<[Value; 1]>, not as a Vec<Value>. This isn't a useful change for any
non-developer use of Cranelift, but it does significantly reduce the amount of
allocation "noise" seen when tuning the new backend pipeline as driven by
clif-util reading .clif files. In one case the number of malloc calls
declined by about 20% with this change.
In stark contrast with every reasonable architecture, X86-32 does not
pass any parameters in registers. Because of that we have to resort
to reading arguments from stack without being able to use the stack
slot machinery.
(This wouldn't have been avoidable even by pinning a register because
there is a trampoline in wasmtime with the C ABI that Cranelift needs
to be able to call.)
compilation.
This saves ~0.14% instruction count, ~0.18% allocated bytes, and ~1.5%
allocated blocks on a `clif-util wasm` compilation of `bz2.wasm` for
aarch64.
that only one value ever flows. Has been observed to improve generated code
run times by up to 8%. Compilation cost increases by about 0.6%, but up to 7%
total cost has been observed to be saved; iow it can be a significant win in
terms of compilation time, overall.
This is an incomplete version of a Cranelift IR interpreter: only a small subset of instructions are implemented and (known) missing parts are marked with TODO or FIXME.
* Introduce strongly-typed system primitives
This commit does a lot of reshuffling and even some more. It introduces
strongly-typed system primitives which are: `OsFile`, `OsDir`, `Stdio`,
and `OsOther`. Those primitives are separate structs now, each implementing
a subset of `Handle` methods, rather than all being an enumeration of some
supertype such as `OsHandle`. To summarise the structs:
* `OsFile` represents a regular file, and implements fd-ops
of `Handle` trait
* `OsDir` represents a directory, and primarily implements path-ops, plus
`readdir` and some common fd-ops such as `fdstat`, etc.
* `Stdio` represents a stdio handle, and implements a subset of fd-ops
such as `fdstat` _and_ `read_` and `write_vectored` calls
* `OsOther` currently represents anything else and implements a set similar
to that implemented by `Stdio`
This commit is effectively an experiment and an excercise into better
understanding what's going on for each OS resource/type under-the-hood.
It's meant to give us some intuition in order to move on with the idea
of having strongly-typed handles in WASI both in the syscall impl as well
as at the libc level.
Some more minor changes include making `OsHandle` represent an OS-specific
wrapper for a raw OS handle (Unix fd or Windows handle). Also, since `OsDir`
is tricky across OSes, we also have a supertype of `OsHandle` called
`OsDirHandle` which may store a `DIR*` stream pointer (mainly BSD). Last but not
least, the `Filetype` and `Rights` are now computed when the resource is created,
rather than every time we call `Handle::get_file_type` and `Handle::get_rights`.
Finally, in order to facilitate the latter, I've converted `EntryRights` into
`HandleRights` and pushed them into each `Handle` implementor.
* Do not adjust rights on Stdio
* Clean up testing for TTY and escaping writes
* Implement AsFile for dyn Handle
This cleans up a lot of repeating boilerplate code todo with
dynamic dispatch.
* Delegate definition of OsDir to OS-specific modules
Delegates defining `OsDir` struct to OS-specific modules (BSD, Linux,
Emscripten, Windows). This way, `OsDir` can safely re-use `OsHandle`
for raw OS handle storage, and can store some aux data such as an
initialized stream ptr in case of BSD. As a result, we can safely
get rid of `OsDirHandle` which IMHO was causing unnecessary noise and
overcomplicating the design. On the other hand, delegating definition
of `OsDir` to OS-specific modules isn't super clean in and of itself
either. Perhaps there's a better way of handling this?
* Check if filetype of OS handle matches WASI filetype when creating
It seems prudent to check if the passed in `File` instance is of
type matching that of the requested WASI filetype. In other words,
we'd like to avoid situations where `OsFile` is created from a
pipe.
* Make AsFile fallible
Return `EBADF` in `AsFile` in case a `Handle` cannot be made into
a `std::fs::File`.
* Remove unnecessary as_file conversion
* Remove unnecessary check for TTY for Stdio handle type
* Fix incorrect stdio ctors on Unix
* Split Stdio into three separate types: Stdin, Stdout, Stderr
* Rename PendingEntry::File to PendingEntry::OsHandle to avoid confusion
* Rename OsHandle to RawOsHandle
Also, since `RawOsHandle` on *nix doesn't need interior mutability
wrt the inner raw file descriptor, we can safely swap the `RawFd`
for `File` instance.
* Add docs explaining what OsOther is
* Allow for stdio to be non-character-device (e.g., piped)
* Return error on bad preopen rather than panic
This PR changes the aarch64 ABI implementation to use positive offsets
from SP, rather than negative offsets from FP, to refer to spill slots
and stack-local storage. This allows for better addressing-mode options,
and hence slightly better code: e.g., the unsigned scaled 12-bit offset
mode can be used to reach anywhere in a 32KB frame without extra
address-construction instructions, whereas negative offsets are limited
to a signed 9-bit unscaled mode (-256 bytes).
To enable this, the PR introduces a notion of "nominal SP offsets" as a
virtual addressing mode, lowered during the emission pass. The offsets
are relative to "SP after adjusting downward to allocate stack/spill
slots", but before pushing clobbers. This allows the addressing-mode
expressions to be generated before register allocation (or during it,
for spill/reload sequences).
To convert these offsets into *true* offsets from SP, we need to track
how much further SP is moved downward, and compensate for this. We do so
with "virtual SP offset adjustment" pseudo-instructions: these are seen
by the emission pass, and result in no instruction (0 byte output), but
update state that is now threaded through each instruction emission in
turn. In this way, we can push e.g. stack args for a call and adjust
the virtual SP offset, allowing reloads from nominal-SP-relative
spillslots while we do the argument setup with "real SP offsets" at the
same time.