emmi_data_management.diskcache.cache_benchmark

Attributes

Functions

underlying_fs([cache_size])

This function returns the underlying filesystem, either S3 or OCI, based on the USE_OCI flag.

naive_fs(cache_size)

naive_sqlite_fs(cache_size)

local_fs(cache_size)

initialize_worker(fs_factory)

This is the initializer function for each worker process.

read_file(path)

benchmark(fs_factory, name, cache_size, num_workers, ...)

Module Contents

emmi_data_management.diskcache.cache_benchmark.USE_OCI = False
emmi_data_management.diskcache.cache_benchmark.underlying_fs(cache_size=0)

This function returns the underlying filesystem, either S3 or OCI, based on the USE_OCI flag.

If USE_OCI is False, it returns an S3 filesystem pointing to a MinIO server. This requires a local MinIO server: download the binary from https://www.min.io/download?platform=linux&arch=amd64, then run:

MINIO_BROWSER=off ./minio server --address ":9123" /mnt/localdisk/benchmark/minio
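As a rough illustration of the MinIO case, an fsspec S3 filesystem could be pointed at a server listening on port 9123 along the following lines. This is a sketch under assumptions, not the module's actual code: the helper names `minio_storage_options` and `make_minio_fs` are hypothetical, the credentials are MinIO's defaults when no root user is configured, and the endpoint is assumed to be plain HTTP on localhost.

```python
def minio_storage_options(port=9123, key="minioadmin", secret="minioadmin"):
    """Build keyword arguments for an s3fs filesystem targeting a local MinIO server.

    Assumptions: default MinIO credentials and an unencrypted localhost endpoint.
    """
    return {
        "key": key,
        "secret": secret,
        # s3fs forwards client_kwargs to the underlying S3 client.
        "client_kwargs": {"endpoint_url": f"http://localhost:{port}"},
    }


def make_minio_fs(**overrides):
    """Create the filesystem; needs fsspec and s3fs installed and MinIO running."""
    import fsspec  # imported lazily so the options helper works without it

    return fsspec.filesystem("s3", **minio_storage_options(**overrides))


options = minio_storage_options()
```

Calling `make_minio_fs()` only succeeds once the MinIO server is up; `minio_storage_options()` can be inspected without any server.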

emmi_data_management.diskcache.cache_benchmark.naive_fs(cache_size)
Parameters:

cache_size (int)

emmi_data_management.diskcache.cache_benchmark.naive_sqlite_fs(cache_size)
Parameters:

cache_size (int)

emmi_data_management.diskcache.cache_benchmark.local_fs(cache_size)
Parameters:

cache_size (int)

emmi_data_management.diskcache.cache_benchmark.local
emmi_data_management.diskcache.cache_benchmark.initialize_worker(fs_factory)

This is the initializer function for each worker process. It creates an instance of HeavyObject and assigns it to a global variable.

Parameters:

fs_factory (collections.abc.Callable[[], fsspec.AbstractFileSystem])
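
The initializer pattern described here can be sketched with the standard library alone. Everything in this example is an assumption made for illustration: `FakeFileSystem` stands in for a real fsspec filesystem, `init_worker` and `read_path` are hypothetical names, and a `threading.local()` holder is used so that each worker thread keeps its own instance (with process-based executors a plain module global would behave the same way).

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: a module-level holder that gives each worker its own filesystem.
_worker_state = threading.local()


class FakeFileSystem:
    """Hypothetical stand-in for an fsspec filesystem, backed by a dict."""

    def __init__(self):
        self.files = {"a.txt": b"alpha", "b.txt": b"beta"}

    def cat_file(self, path):
        return self.files[path]


def init_worker(fs_factory):
    """Runs once in every worker before it handles any task."""
    _worker_state.fs = fs_factory()


def read_path(path):
    """Read one file through the worker's own filesystem and return its size."""
    return len(_worker_state.fs.cat_file(path))


def run(paths, num_workers=2):
    with ThreadPoolExecutor(
        max_workers=num_workers,
        initializer=init_worker,
        initargs=(FakeFileSystem,),
    ) as pool:
        return list(pool.map(read_path, paths))


sizes = run(["a.txt", "b.txt", "a.txt"])
```

Building the filesystem in the initializer means the expensive construction happens once per worker rather than once per file read.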

emmi_data_management.diskcache.cache_benchmark.read_file(path)
emmi_data_management.diskcache.cache_benchmark.benchmark(fs_factory, name, cache_size, num_workers, prefix, max_files=100, warmup_fraction=0.0, executor=ThreadPoolExecutor)
Parameters:
emmi_data_management.diskcache.cache_benchmark.impls
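
The parameters in the `benchmark` signature above suggest a timed parallel read loop. The sketch below is one possible shape for such a loop, not the module's implementation. In particular, it assumes that `max_files` caps the number of paths and that `warmup_fraction` is the share of those paths read untimed before measurement starts; the function name `timed_reads` and the returned dictionary are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed_reads(read, paths, num_workers, max_files=100,
                warmup_fraction=0.0, executor=ThreadPoolExecutor):
    """Read paths in parallel and time the non-warmup portion.

    `read` is any callable that takes a path and returns the number of bytes read.
    """
    paths = list(paths)[:max_files]
    n_warmup = int(len(paths) * warmup_fraction)
    with executor(max_workers=num_workers) as pool:
        # Warmup reads (assumed semantics): executed, but excluded from timing.
        list(pool.map(read, paths[:n_warmup]))
        start = time.perf_counter()
        n_bytes = sum(pool.map(read, paths[n_warmup:]))
        elapsed = time.perf_counter() - start
    return {"files": len(paths) - n_warmup, "bytes": n_bytes, "seconds": elapsed}


# Usage with a dummy reader that reports the path length as the byte count.
result = timed_reads(
    lambda path: len(path),
    [f"file-{i}" for i in range(10)],
    num_workers=2,
    max_files=8,
    warmup_fraction=0.25,
)
```

Passing the executor class as a parameter makes it possible to compare thread-based and process-based workers with the same loop, provided the `read` callable can be pickled for the process case.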