fn main() -> anyhow::Result<()> {
    component::generate_static_api_tests()?;

    Ok(())
}
mod component {
    use anyhow::{anyhow, Context, Error, Result};
    use arbitrary::Unstructured;
    use component_fuzz_util::{Declarations, TestCase, Type, MAX_TYPE_DEPTH};
    use proc_macro2::TokenStream;
    use quote::quote;
    use rand::rngs::StdRng;
    use rand::{Rng, SeedableRng};
    use std::env;
    use std::fmt::Write;
    use std::fs;
    use std::iter;
    use std::path::PathBuf;
    use std::process::Command;

    pub fn generate_static_api_tests() -> Result<()> {
        println!("cargo:rerun-if-changed=build.rs");
        let out_dir = PathBuf::from(
            env::var_os("OUT_DIR").expect("The OUT_DIR environment variable must be set"),
        );

        let mut out = String::new();
        write_static_api_tests(&mut out)?;
        let output = out_dir.join("static_component_api.rs");
        fs::write(&output, out)?;

        // Formatting the generated file is best-effort: any failure to run
        // `rustfmt` is deliberately ignored.
        drop(Command::new("rustfmt").arg(&output).status());

        Ok(())
    }

    fn write_static_api_tests(out: &mut String) -> Result<()> {
        let seed = if let Ok(seed) = env::var("WASMTIME_FUZZ_SEED") {
            seed.parse::<u64>()
                .with_context(|| anyhow!("expected u64 in WASMTIME_FUZZ_SEED"))?
        } else {
            StdRng::from_entropy().gen()
        };

        eprintln!(
            "using seed {seed} (set WASMTIME_FUZZ_SEED={seed} in your environment to reproduce)"
        );

        let mut rng = StdRng::seed_from_u64(seed);

        const TYPE_COUNT: usize = 50;
        const MAX_ARITY: u32 = 5;
        const TEST_CASE_COUNT: usize = 100;

        let mut type_fuel = 1000;
        let mut types = Vec::new();
        let name_counter = &mut 0;
        let mut declarations = TokenStream::new();
        let mut tests = TokenStream::new();

        // First generate a set of types to select from.
        for _ in 0..TYPE_COUNT {
            let ty = gen(&mut rng, |u| {
                // Only discount fuel if the generation was successful,
                // otherwise we'll get more random data and try again.
                let mut fuel = type_fuel;
                let ret = Type::generate(u, MAX_TYPE_DEPTH, &mut fuel);
                if ret.is_ok() {
                    type_fuel = fuel;
                }
                ret
            })?;

            let name = component_fuzz_util::rust_type(&ty, name_counter, &mut declarations);
            types.push((name, ty));
        }

        // Next generate a set of static API test cases driven by the above
        // types.
        for index in 0..TEST_CASE_COUNT {
            let (case, rust_params, rust_results) = gen(&mut rng, |u| {
                let mut params = Vec::new();
                let mut results = Vec::new();
                let mut rust_params = TokenStream::new();
                let mut rust_results = TokenStream::new();
                for _ in 0..u.int_in_range(0..=MAX_ARITY)? {
                    let (name, ty) = u.choose(&types)?;
                    params.push(ty);
                    rust_params.extend(name.clone());
                    rust_params.extend(quote!(,));
                }
                for _ in 0..u.int_in_range(0..=MAX_ARITY)? {
                    let (name, ty) = u.choose(&types)?;
                    results.push(ty);
                    rust_results.extend(name.clone());
                    rust_results.extend(quote!(,));
                }

                let case = TestCase {
                    params,
                    results,
                    encoding1: u.arbitrary()?,
                    encoding2: u.arbitrary()?,
                };
                Ok((case, rust_params, rust_results))
            })?;

            let Declarations {
                types,
                type_instantiation_args,
                params,
                results,
                import_and_export,
                encoding1,
                encoding2,
            } = case.declarations();

            let test = quote!(#index => component_types::static_api_test::<(#rust_params), (#rust_results)>(
                input,
                {
                    static DECLS: Declarations = Declarations {
                        types: Cow::Borrowed(#types),
                        type_instantiation_args: Cow::Borrowed(#type_instantiation_args),
                        params: Cow::Borrowed(#params),
                        results: Cow::Borrowed(#results),
                        import_and_export: Cow::Borrowed(#import_and_export),
                        encoding1: #encoding1,
                        encoding2: #encoding2,
                    };
                    &DECLS
                }
            ),);

            tests.extend(test);
        }

        let module = quote! {
            #[allow(unused_imports)]
            fn static_component_api_target(input: &mut libfuzzer_sys::arbitrary::Unstructured) -> libfuzzer_sys::arbitrary::Result<()> {
                use anyhow::Result;
                use component_fuzz_util::Declarations;
                use component_test_util::{self, Float32, Float64};
                use libfuzzer_sys::arbitrary::{self, Arbitrary};
                use std::borrow::Cow;
                use std::sync::{Arc, Once};
                use wasmtime::component::{ComponentType, Lift, Lower};
                use wasmtime_fuzzing::generators::component_types;

                const SEED: u64 = #seed;

                static ONCE: Once = Once::new();

                ONCE.call_once(|| {
                    eprintln!(
                        "Seed {SEED} was used to generate static component API fuzz tests.\n\
                         Set WASMTIME_FUZZ_SEED={SEED} in your environment at build time to reproduce."
                    );
                });

                #declarations

                match input.int_in_range(0..=(#TEST_CASE_COUNT-1))? {
                    #tests
                    _ => unreachable!()
                }
            }
        };

        write!(out, "{module}")?;

        Ok(())
    }
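
    /// Runs `f` on an `Unstructured` backed by random bytes drawn from `rng`.
    ///
    /// Whenever `f` reports `NotEnoughData`, another 1000 to 1999 random bytes
    /// are appended to the buffer and `f` is retried; any other error is
    /// returned to the caller.
    fn gen<T>(
        rng: &mut StdRng,
        mut f: impl FnMut(&mut Unstructured<'_>) -> arbitrary::Result<T>,
    ) -> Result<T> {
        let mut bytes = Vec::new();
        loop {
            let count = rng.gen_range(1000..2000);
            bytes.extend(iter::repeat_with(|| rng.gen::<u8>()).take(count));

            match f(&mut Unstructured::new(&bytes)) {
                Ok(ret) => break Ok(ret),
                Err(arbitrary::Error::NotEnoughData) => (),
                Err(error) => break Err(Error::from(error)),
            }
        }
    }
}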