
Merge pull request #526 from alexcrichton/cache-docs

Move cache configuration documentation into book
pull/527/head
Dan Gohman 5 years ago
committed by GitHub
parent
commit
94044100f9
No known key found for this signature in database GPG Key ID: 4AEE18F83AFDEB23
  1. ci/build-tarballs.sh (2 changed lines)
  2. crates/environ/src/cache/config.rs (24 changed lines)
  3. docs/CACHE_CONFIGURATION.md (279 changed lines)
  4. docs/cli-cache.md (277 changed lines)
  5. installer/msi/wasmtime.wxs (4 changed lines)

2
ci/build-tarballs.sh

@ -32,7 +32,7 @@ mktarball() {
# Create the main tarball of binaries
bin_pkgname=wasmtime-$TAG-$platform
mkdir tmp/$bin_pkgname
-cp LICENSE README.md docs/CACHE_CONFIGURATION.md tmp/$bin_pkgname
+cp LICENSE README.md tmp/$bin_pkgname
mv bins-$src/{wasmtime,wasm2obj}$exe tmp/$bin_pkgname
chmod +x tmp/$bin_pkgname/{wasmtime,wasm2obj}$exe
mktarball $bin_pkgname

24
crates/environ/src/cache/config.rs

@ -171,7 +171,7 @@ pub fn create_new_config<P: AsRef<Path> + Debug>(
let content = "\
# Comment out certain settings to use default values.
# For more settings, please refer to the documentation:
-# https://github.com/CraneStation/wasmtime/blob/master/docs/CACHE_CONFIGURATION.md
+# https://cranestation.github.io/wasmtime/cli-cache.html
[cache]
enabled = true
@ -204,34 +204,34 @@ lazy_static! {
// At the moment of writing, the modules couldn't depend on anothers,
// so we have at most one module per wasmtime instance
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_WORKER_EVENT_QUEUE_SIZE: u64 = 0x10;
const WORKER_EVENT_QUEUE_SIZE_WARNING_TRESHOLD: u64 = 3;
// should be quick and provide good enough compression
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_BASELINE_COMPRESSION_LEVEL: i32 = zstd::DEFAULT_COMPRESSION_LEVEL;
// should provide significantly better compression than baseline
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_OPTIMIZED_COMPRESSION_LEVEL: i32 = 20;
// shouldn't be to low to avoid recompressing too many files
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_OPTIMIZED_COMPRESSION_USAGE_COUNTER_THRESHOLD: u64 = 0x100;
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_CLEANUP_INTERVAL: Duration = Duration::from_secs(60 * 60);
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_OPTIMIZING_COMPRESSION_TASK_TIMEOUT: Duration = Duration::from_secs(30 * 60);
// the default assumes problems with timezone configuration on network share + some clock drift
// please notice 24 timezones = max 23h difference between some of them
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_ALLOWED_CLOCK_DRIFT_FOR_FILES_FROM_FUTURE: Duration =
Duration::from_secs(60 * 60 * 24);
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_FILE_COUNT_SOFT_LIMIT: u64 = 0x10_000;
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_FILES_TOTAL_SIZE_SOFT_LIMIT: u64 = 1024 * 1024 * 512;
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_FILE_COUNT_LIMIT_PERCENT_IF_DELETING: u8 = 70;
-// if changed, update CACHE_CONFIGURATION.md
+// if changed, update cli-cache.md
const DEFAULT_FILES_TOTAL_SIZE_LIMIT_PERCENT_IF_DELETING: u8 = 70;
// Deserializers of our custom formats

279
docs/CACHE_CONFIGURATION.md

@ -1,279 +0,0 @@
Wasmtime Cache Configuration
============================
The cache configuration file uses the [toml] format.
You can create a configuration file at the default location with:
```
$ wasmtime --create-cache-config
```
It prints the location whether or not the command succeeds.
Please refer to the `--help` message for using a custom location.
All settings, except `enabled`, are **optional**.
If the setting is not specified, the **default** value is used.
***Thus, if you don't know what values to use, don't specify them.***
The default values might be tuned in the future.
Wasmtime assumes all the options are in the `cache` section.
Example config:
```toml
[cache]
enabled = true
directory = "/nfs-share/wasmtime-cache/"
cleanup-interval = "30m"
files-total-size-soft-limit = "1Gi"
```
Please refer to the [cache system] section to learn how it works.
If you think some default value should be tuned, some new settings
should be introduced or some behavior should be changed, you are
welcome to discuss it and contribute to [the Wasmtime repository].
[the Wasmtime repository]: https://github.com/CraneStation/wasmtime
Setting `enabled`
-----------------
- **type**: boolean
- **format**: `true | false`
- **default**: `true`
Specifies whether the cache system is used or not.
This field is *mandatory*.
The default value is used when no configuration file is specified
and none exists at the default location.
[`enabled`]: #setting-enabled
Setting `directory`
-----------------
- **type**: string (path)
- **default**: look up `cache_dir` in [directories] crate
Specifies where the cache directory is. Must be an absolute path.
[`directory`]: #setting-directory
Setting `worker-event-queue-size`
-----------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"16"`
Size of the [cache worker] event queue.
If the queue is full, incoming cache usage events will be dropped.
[`worker-event-queue-size`]: #setting-worker-event-queue-size
Setting `baseline-compression-level`
------------------
- **type**: integer
- **default**: `3`, the default zstd compression level
Compression level used when a new cache file is being written by the [cache system].
Wasmtime uses [zstd] compression.
[`baseline-compression-level`]: #setting-baseline-compression-level
Setting `optimized-compression-level`
------------------
- **type**: integer
- **default**: `20`
Compression level used when the [cache worker] decides to recompress a cache file.
Wasmtime uses [zstd] compression.
[`optimized-compression-level`]: #setting-optimized-compression-level
Setting `optimized-compression-usage-counter-threshold`
------------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"256"`
One of the conditions for the [cache worker] to recompress a cache file
is that the file's usage count exceeds this threshold.
[`optimized-compression-usage-counter-threshold`]: #setting-optimized-compression-usage-counter-threshold
Setting `cleanup-interval`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"1h"`
When the [cache worker] is notified about a cache file being updated by the [cache system]
and this interval has already passed since the last cleanup,
the worker will attempt a new cleanup.
Please also refer to [`allowed-clock-drift-for-files-from-future`].
[`cleanup-interval`]: #setting-cleanup-interval
Setting `optimizing-compression-task-timeout`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"30m"`
When the [cache worker] decides to recompress a cache file, it makes sure that
no other worker has started the task for this file within the last
[`optimizing-compression-task-timeout`] interval.
If some worker has started working on it, other workers skip this task.
Please also refer to the [`allowed-clock-drift-for-files-from-future`] section.
[`optimizing-compression-task-timeout`]: #setting-optimizing-compression-task-timeout
Setting `allowed-clock-drift-for-files-from-future`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"1d"`
### Locks
When the [cache worker] attempts to acquire a lock for some task,
it checks whether another worker has already acquired such a lock.
To be fault tolerant and eventually execute every task,
the locks expire after some interval.
However, because of clock drift and differing timezones,
a lock can appear to have been created in the future.
This setting defines a tolerance limit for these locks.
If the system time has been changed (e.g. set two years back),
the [cache system] should still work properly.
Thus, these locks will be treated as expired
(assuming the tolerance is not too big).
### Cache files
As with the locks, the cache files or their metadata might
have a modification time in the distant future.
The cache system tries to keep these files as long as possible.
If the limits are not reached, the cache files will not be deleted.
Otherwise, they will be treated as the oldest files, so they might survive.
If the user actually uses the cache file, the modification time will be updated.
[`allowed-clock-drift-for-files-from-future`]: #setting-allowed-clock-drift-for-files-from-future
Setting `file-count-soft-limit`
------------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"65536"`
Soft limit for the file count in the cache directory.
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`file-count-soft-limit`]: #setting-file-count-soft-limit
Setting `files-total-size-soft-limit`
------------------
- **type**: string (disk space)
- **format**: `"{integer}(K | Ki | M | Mi | G | Gi | T | Ti | P | Pi)?"`
- **default**: `"512Mi"`
Soft limit for the total size* of files in the cache directory.
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
*this is the file size, not the space physically occupied on the disk.
[`files-total-size-soft-limit`]: #setting-files-total-size-soft-limit
Setting `file-count-limit-percent-if-deleting`
------------------
- **type**: string (percent)
- **format**: `"{integer}%"`
- **default**: `"70%"`
If [`file-count-soft-limit`] is exceeded and the [cache worker] performs the cleanup task,
then the worker will delete some cache files, so after the task,
the file count should not exceed
[`file-count-soft-limit`] * [`file-count-limit-percent-if-deleting`].
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`file-count-limit-percent-if-deleting`]: #setting-file-count-limit-percent-if-deleting
Setting `files-total-size-limit-percent-if-deleting`
------------------
- **type**: string (percent)
- **format**: `"{integer}%"`
- **default**: `"70%"`
If [`files-total-size-soft-limit`] is exceeded and the [cache worker] performs the cleanup task,
then the worker will delete some cache files, so after the task,
the files total size should not exceed
[`files-total-size-soft-limit`] * [`files-total-size-limit-percent-if-deleting`].
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`files-total-size-limit-percent-if-deleting`]: #setting-files-total-size-limit-percent-if-deleting
[toml]: https://github.com/toml-lang/toml
[directories]: https://crates.io/crates/directories
[cache system]: #how-does-the-cache-work
[cache worker]: #how-does-the-cache-work
[zstd]: https://facebook.github.io/zstd/
[Least Recently Used (LRU)]: https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)
How does the cache work?
========================
**This is an implementation detail and might change in the future.**
The information provided here is meant to help you understand the big picture
and configure the cache.
There are two main components - the *cache system* and the *cache worker*.
Cache system
------------
Handles GET and UPDATE cache requests.
- **GET request** - simply loads the cache from disk if it is there.
- **UPDATE request** - compresses received data with [zstd] and [`baseline-compression-level`], then writes the data to the disk.
In case of successful handling of a request, it notifies the *cache worker* about this
event using the queue.
The queue has a limited size of [`worker-event-queue-size`]. If it is full, it will drop
new events until the *cache worker* pops some event from the queue.
Cache worker
------------
The cache worker runs in a single thread with lower priority and pops events from the queue
in a loop handling them one by one.
### On GET request
1. Read the statistics file for the cache file,
increase the usage counter and write it back to the disk.
2. Attempt recompressing the cache file if all of the following conditions are met:
- usage counter exceeds [`optimized-compression-usage-counter-threshold`],
- the file is compressed with compression level lower than [`optimized-compression-level`],
- no other worker has started working on this particular task within the last
[`optimizing-compression-task-timeout`] interval.
When recompressing, [`optimized-compression-level`] is used as a compression level.
### On UPDATE request
1. Write a fresh statistics file for the cache file.
2. Clean up the cache if no worker has attempted to do this within the last [`cleanup-interval`].
During this task:
- all unrecognized files and expired task locks in the cache directory will be deleted
- if [`file-count-soft-limit`] or [`files-total-size-soft-limit`] is exceeded,
then recognized files will be deleted according to
[`file-count-limit-percent-if-deleting`] and [`files-total-size-limit-percent-if-deleting`].
Wasmtime uses the [Least Recently Used (LRU)] cache replacement policy and requires that
the filesystem maintains proper mtime (modification time) of the files.
Files with future mtimes are treated specially - more details
in [`allowed-clock-drift-for-files-from-future`].
### Metadata files
- every cached WebAssembly module has its own statistics file
- every lock is a file

277
docs/cli-cache.md

@ -1,3 +1,278 @@
# Cache Configuration of `wasmtime`
... more coming soon
The cache configuration file uses the [toml] format.
You can create a configuration file at the default location with:
```
$ wasmtime --create-cache-config
```
It prints the location whether or not the command succeeds.
Please refer to the `--help` message for using a custom location.
All settings, except `enabled`, are **optional**.
If the setting is not specified, the **default** value is used.
***Thus, if you don't know what values to use, don't specify them.***
The default values might be tuned in the future.
Wasmtime assumes all the options are in the `cache` section.
Example config:
```toml
[cache]
enabled = true
directory = "/nfs-share/wasmtime-cache/"
cleanup-interval = "30m"
files-total-size-soft-limit = "1Gi"
```
Please refer to the [cache system] section to learn how it works.
If you think some default value should be tuned, some new settings
should be introduced or some behavior should be changed, you are
welcome to discuss it and contribute to [the Wasmtime repository].
[the Wasmtime repository]: https://github.com/CraneStation/wasmtime
Setting `enabled`
-----------------
- **type**: boolean
- **format**: `true | false`
- **default**: `true`
Specifies whether the cache system is used or not.
This field is *mandatory*.
The default value is used when no configuration file is specified
and none exists at the default location.
[`enabled`]: #setting-enabled
Setting `directory`
-----------------
- **type**: string (path)
- **default**: look up `cache_dir` in [directories] crate
Specifies where the cache directory is. Must be an absolute path.
[`directory`]: #setting-directory
Setting `worker-event-queue-size`
-----------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"16"`
Size of the [cache worker] event queue.
If the queue is full, incoming cache usage events will be dropped.
[`worker-event-queue-size`]: #setting-worker-event-queue-size
Setting `baseline-compression-level`
------------------
- **type**: integer
- **default**: `3`, the default zstd compression level
Compression level used when a new cache file is being written by the [cache system].
Wasmtime uses [zstd] compression.
[`baseline-compression-level`]: #setting-baseline-compression-level
Setting `optimized-compression-level`
------------------
- **type**: integer
- **default**: `20`
Compression level used when the [cache worker] decides to recompress a cache file.
Wasmtime uses [zstd] compression.
[`optimized-compression-level`]: #setting-optimized-compression-level
Setting `optimized-compression-usage-counter-threshold`
------------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"256"`
One of the conditions for the [cache worker] to recompress a cache file
is that the file's usage count exceeds this threshold.
[`optimized-compression-usage-counter-threshold`]: #setting-optimized-compression-usage-counter-threshold
Setting `cleanup-interval`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"1h"`
When the [cache worker] is notified about a cache file being updated by the [cache system]
and this interval has already passed since the last cleanup,
the worker will attempt a new cleanup.
Please also refer to [`allowed-clock-drift-for-files-from-future`].
[`cleanup-interval`]: #setting-cleanup-interval
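
The duration format above, `"{integer}(s | m | h | d)"`, can be illustrated with a small parser. This is only a sketch of how such a string might map to a `std::time::Duration`; the function name `parse_duration` is hypothetical and is not taken from the Wasmtime sources, and details such as whitespace handling are assumptions.

```rust
use std::time::Duration;

/// Hypothetical helper: parses strings such as "30m" or "1d" into a `Duration`.
/// Returns `None` for a missing or unknown unit, a non-numeric value, or overflow.
fn parse_duration(text: &str) -> Option<Duration> {
    let text = text.trim();
    // The unit is the last character; everything before it must be digits.
    let unit = text.chars().last()?;
    let digits = &text[..text.len() - unit.len_utf8()];
    let seconds_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 60 * 60,
        'd' => 60 * 60 * 24,
        _ => return None,
    };
    // Reject empty input and signs such as "+5", which `u64::from_str` would accept.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let value: u64 = digits.parse().ok()?;
    value.checked_mul(seconds_per_unit).map(Duration::from_secs)
}
```

Under these assumptions, the default `"1h"` corresponds to 3600 seconds, and a bare number without a unit is rejected because the unit is not optional in the format.
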
Setting `optimizing-compression-task-timeout`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"30m"`
When the [cache worker] decides to recompress a cache file, it makes sure that
no other worker has started the task for this file within the last
[`optimizing-compression-task-timeout`] interval.
If some worker has started working on it, other workers skip this task.
Please also refer to the [`allowed-clock-drift-for-files-from-future`] section.
[`optimizing-compression-task-timeout`]: #setting-optimizing-compression-task-timeout
Setting `allowed-clock-drift-for-files-from-future`
------------------
- **type**: string (duration)
- **format**: `"{integer}(s | m | h | d)"`
- **default**: `"1d"`
### Locks
When the [cache worker] attempts to acquire a lock for some task,
it checks whether another worker has already acquired such a lock.
To be fault tolerant and eventually execute every task,
the locks expire after some interval.
However, because of clock drift and differing timezones,
a lock can appear to have been created in the future.
This setting defines a tolerance limit for these locks.
If the system time has been changed (e.g. set two years back),
the [cache system] should still work properly.
Thus, these locks will be treated as expired
(assuming the tolerance is not too big).
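
The expiry rule described above can be sketched as a pure function. This is an illustration under stated assumptions, not the worker's actual code: `lock_is_expired` and its parameters are hypothetical names, and the exact comparison operators are a guess at the intended behavior.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical check: decides whether a lock file with the given
/// modification time should be treated as expired.
fn lock_is_expired(
    lock_mtime: SystemTime,
    now: SystemTime,
    task_timeout: Duration,
    allowed_drift: Duration,
) -> bool {
    match now.duration_since(lock_mtime) {
        // The lock was created in the past: it expires once it is older
        // than the task timeout.
        Ok(age) => age >= task_timeout,
        // The lock appears to come from the future: tolerate it only while
        // it is within the allowed clock drift, otherwise treat it as expired.
        Err(err) => err.duration() > allowed_drift,
    }
}
```

With the default `"1d"` tolerance, a lock stamped one hour ahead of the local clock is still respected, while a lock stamped two years ahead is treated as expired.
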
### Cache files
As with the locks, the cache files or their metadata might
have a modification time in the distant future.
The cache system tries to keep these files as long as possible.
If the limits are not reached, the cache files will not be deleted.
Otherwise, they will be treated as the oldest files, so they might survive.
If the user actually uses the cache file, the modification time will be updated.
[`allowed-clock-drift-for-files-from-future`]: #setting-allowed-clock-drift-for-files-from-future
Setting `file-count-soft-limit`
------------------
- **type**: string (SI prefix)
- **format**: `"{integer}(K | M | G | T | P)?"`
- **default**: `"65536"`
Soft limit for the file count in the cache directory.
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`file-count-soft-limit`]: #setting-file-count-soft-limit
Setting `files-total-size-soft-limit`
------------------
- **type**: string (disk space)
- **format**: `"{integer}(K | Ki | M | Mi | G | Gi | T | Ti | P | Pi)?"`
- **default**: `"512Mi"`
Soft limit for the total size* of files in the cache directory.
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
*this is the file size, not the space physically occupied on the disk.
[`files-total-size-soft-limit`]: #setting-files-total-size-soft-limit
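
To make the disk space format above concrete, here is a minimal parsing sketch. It assumes that plain suffixes (`K`, `M`, ...) are decimal multiples and that `i` suffixes (`Ki`, `Mi`, ...) are binary multiples; the function name `parse_disk_space` is hypothetical and the real deserializer may differ in details.

```rust
/// Hypothetical helper: parses strings such as "512Mi", "1G" or "4096"
/// into a number of bytes. Returns `None` on malformed input or overflow.
fn parse_disk_space(text: &str) -> Option<u64> {
    let text = text.trim();
    // The numeric part is the leading run of ASCII digits.
    let split = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    let (digits, suffix) = text.split_at(split);
    let multiplier: u64 = match suffix {
        "" => 1,
        "K" => 1_000,
        "Ki" => 1 << 10,
        "M" => 1_000_000,
        "Mi" => 1 << 20,
        "G" => 1_000_000_000,
        "Gi" => 1 << 30,
        "T" => 1_000_000_000_000,
        "Ti" => 1 << 40,
        "P" => 1_000_000_000_000_000,
        "Pi" => 1 << 50,
        _ => return None,
    };
    // An empty digit string fails to parse and yields `None`.
    digits.parse::<u64>().ok()?.checked_mul(multiplier)
}
```

Under these assumptions the default `"512Mi"` is 536870912 bytes, which matches the `1024 * 1024 * 512` constant in `config.rs`.
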
Setting `file-count-limit-percent-if-deleting`
------------------
- **type**: string (percent)
- **format**: `"{integer}%"`
- **default**: `"70%"`
If [`file-count-soft-limit`] is exceeded and the [cache worker] performs the cleanup task,
then the worker will delete some cache files, so after the task,
the file count should not exceed
[`file-count-soft-limit`] * [`file-count-limit-percent-if-deleting`].
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`file-count-limit-percent-if-deleting`]: #setting-file-count-limit-percent-if-deleting
Setting `files-total-size-limit-percent-if-deleting`
------------------
- **type**: string (percent)
- **format**: `"{integer}%"`
- **default**: `"70%"`
If [`files-total-size-soft-limit`] is exceeded and the [cache worker] performs the cleanup task,
then the worker will delete some cache files, so after the task,
the files total size should not exceed
[`files-total-size-soft-limit`] * [`files-total-size-limit-percent-if-deleting`].
This doesn't include files with metadata.
To learn more, please refer to the [cache system] section.
[`files-total-size-limit-percent-if-deleting`]: #setting-files-total-size-limit-percent-if-deleting
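
As a worked example of the two percentage settings, the sketch below computes the target a cleanup aims for from a soft limit and a percentage. The function name `limit_after_cleanup` is hypothetical, and rounding down is an assumption about how the product is evaluated.

```rust
/// Hypothetical helper: the value a cleanup tries to get under,
/// i.e. soft limit * percent / 100, rounded down.
fn limit_after_cleanup(soft_limit: u64, percent: u8) -> u64 {
    // Widen to u128 so the multiplication cannot overflow.
    let scaled = u128::from(soft_limit) * u128::from(percent) / 100;
    // Saturate instead of truncating if a percentage above 100 is passed.
    scaled.min(u128::from(u64::MAX)) as u64
}
```

With the defaults, a file count soft limit of 65536 and 70% give a target of 45875 files, and a total size soft limit of 512Mi (536870912 bytes) and 70% give a target of 375809638 bytes.
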
[toml]: https://github.com/toml-lang/toml
[directories]: https://crates.io/crates/directories
[cache system]: #how-does-the-cache-work
[cache worker]: #how-does-the-cache-work
[zstd]: https://facebook.github.io/zstd/
[Least Recently Used (LRU)]: https://en.wikipedia.org/wiki/Cache_replacement_policies#Least_recently_used_(LRU)
How does the cache work?
========================
**This is an implementation detail and might change in the future.**
The information provided here is meant to help you understand the big picture
and configure the cache.
There are two main components - the *cache system* and the *cache worker*.
Cache system
------------
Handles GET and UPDATE cache requests.
- **GET request** - simply loads the cache from disk if it is there.
- **UPDATE request** - compresses received data with [zstd] and [`baseline-compression-level`], then writes the data to the disk.
In case of successful handling of a request, it notifies the *cache worker* about this
event using the queue.
The queue has a limited size of [`worker-event-queue-size`]. If it is full, it will drop
new events until the *cache worker* pops some event from the queue.
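
The drop-when-full behavior of the queue can be sketched with a bounded channel from the Rust standard library. The event type and function names below are hypothetical, and using `std::sync::mpsc::sync_channel` is an assumption made for illustration rather than a statement about the actual implementation.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};

/// Hypothetical event type sent from the cache system to the cache worker.
#[derive(Debug, PartialEq)]
enum CacheEvent {
    OnCacheGet(String),
    OnCacheUpdate(String),
}

/// Creates a bounded queue holding at most `capacity` pending events.
fn new_event_queue(capacity: usize) -> (SyncSender<CacheEvent>, Receiver<CacheEvent>) {
    sync_channel(capacity)
}

/// Tries to enqueue an event without blocking.
/// Returns `false` if the event was dropped (queue full or worker gone).
fn notify_worker(queue: &SyncSender<CacheEvent>, event: CacheEvent) -> bool {
    queue.try_send(event).is_ok()
}
```

Because `try_send` never blocks, the cache system is not slowed down by a busy worker; the cost is that some usage events are lost while the queue is full.
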
Cache worker
------------
The cache worker runs in a single thread with lower priority and pops events from the queue
in a loop handling them one by one.
### On GET request
1. Read the statistics file for the cache file,
increase the usage counter and write it back to the disk.
2. Attempt recompressing the cache file if all of the following conditions are met:
- usage counter exceeds [`optimized-compression-usage-counter-threshold`],
- the file is compressed with compression level lower than [`optimized-compression-level`],
- no other worker has started working on this particular task within the last
[`optimizing-compression-task-timeout`] interval.
When recompressing, [`optimized-compression-level`] is used as a compression level.
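
The three recompression conditions listed above can be written as a single predicate. The struct and field names are hypothetical and only mirror the settings described in this document.

```rust
/// Hypothetical view of the statistics kept for one cache file.
struct CacheFileStats {
    usage_count: u64,
    compression_level: i32,
}

/// Hypothetical subset of the configuration relevant to recompression.
struct RecompressionPolicy {
    usage_counter_threshold: u64,
    optimized_level: i32,
}

/// Returns `true` only if all conditions for recompression are met.
/// `task_recently_started` stands for "another worker has started this
/// task within the task timeout".
fn should_recompress(
    stats: &CacheFileStats,
    policy: &RecompressionPolicy,
    task_recently_started: bool,
) -> bool {
    stats.usage_count > policy.usage_counter_threshold
        && stats.compression_level < policy.optimized_level
        && !task_recently_started
}
```
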
### On UPDATE request
1. Write a fresh statistics file for the cache file.
2. Clean up the cache if no worker has attempted to do this within the last [`cleanup-interval`].
During this task:
- all unrecognized files and expired task locks in the cache directory will be deleted
- if [`file-count-soft-limit`] or [`files-total-size-soft-limit`] is exceeded,
then recognized files will be deleted according to
[`file-count-limit-percent-if-deleting`] and [`files-total-size-limit-percent-if-deleting`].
Wasmtime uses the [Least Recently Used (LRU)] cache replacement policy and requires that
the filesystem maintains proper mtime (modification time) of the files.
Files with future mtimes are treated specially - more details
in [`allowed-clock-drift-for-files-from-future`].
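
The deletion step of the cleanup task can be sketched as an LRU selection over file metadata. This is a simplified model: all type and function names are hypothetical, the targets are assumed to be the soft limits scaled by the two percentage settings, and the special handling of files with future mtimes is left out.

```rust
/// Hypothetical metadata for one recognized cache file.
#[derive(Debug, Clone)]
struct CacheFile {
    name: String,
    size: u64,
    mtime_secs: u64,
}

/// Hypothetical limits: the soft limits trigger deletion,
/// the targets say how far to shrink once it is triggered.
struct CleanupLimits {
    count_soft_limit: u64,
    size_soft_limit: u64,
    count_target: u64,
    size_target: u64,
}

/// Returns the names of the files to delete, least recently used first.
fn select_files_to_delete(mut files: Vec<CacheFile>, limits: &CleanupLimits) -> Vec<String> {
    let mut count = files.len() as u64;
    let mut total_size: u64 = files.iter().map(|f| f.size).sum();
    // Nothing is deleted unless at least one soft limit is exceeded.
    if count <= limits.count_soft_limit && total_size <= limits.size_soft_limit {
        return Vec::new();
    }
    // Oldest modification time first, i.e. least recently used first.
    files.sort_by_key(|f| f.mtime_secs);
    let mut deleted = Vec::new();
    for file in files {
        if count <= limits.count_target && total_size <= limits.size_target {
            break;
        }
        count -= 1;
        total_size -= file.size;
        deleted.push(file.name);
    }
    deleted
}
```

Deleting down to a target below the soft limit, rather than just below the limit itself, means the next cleanup does not have to delete files again right away.
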
### Metadata files
- every cached WebAssembly module has its own statistics file
- every lock is a file

4
installer/msi/wasmtime.wxs

@ -58,9 +58,6 @@
<Component Id="README" Guid="*">
<File Id="README.md" Source="README.md" KeyPath="yes" Checksum="yes"/>
</Component>
-<Component Id="CACHE_CONFIGURATION" Guid="*">
-<File Id="CACHE_CONFIGURATION.md" Source="docs\CACHE_CONFIGURATION.md" KeyPath="yes" Checksum="yes"/>
-</Component>
</DirectoryRef>
<DirectoryRef Id="BINDIR">
@ -77,7 +74,6 @@
<ComponentRef Id="wasm2obj.exe" />
<ComponentRef Id="LICENSE" />
<ComponentRef Id="README" />
-<ComponentRef Id="CACHE_CONFIGURATION" />
<ComponentRef Id="InstallDir" />
</Feature>
<Feature Id="AddToPath"
