* Remove -fno-builtin.
-fno-builtin suppresses optimizations such as turning calls to `sqrt`
or `fabs` into `f64.sqrt` or `f64.abs` instructions inline. Libc code
itself benefits from these optimizations.
I originally added this flag because historically it was needed when
building libc to avoid the compiler pattern-matching the body of memcpy
into a memcpy call, however clang no longer requires this.
* Expand the comment about why we use USE_DL_PREFIX in dlmalloc.
This isn't strictly required, as wasm SIMD loads and stores work on
unaligned memory. However, it may provide better performance. That said,
this isn't currently studied by any benchmarking.