emmi_data_management.verification

Attributes

Exceptions

ParallelErrors

Exception carrying the (item, exception) pairs collected from a parallel run.

Classes

FailAction

Action to take when a file fails verification.

FileRecord

VerificationResult

VerificationType

Kind of check used to verify files (by size or by hash).

Functions

parallel_map_collect_errors(fn, items[, max_workers])

hash_file(path[, chunk_size_mb])

Reads file bytes as chunks and hashes them.

build_manifest(root[, jobs, include_hash])

Creates a manifest from the given root directory.

save_manifest(manifest, path)

Saves the given manifest to the given path.

load_manifest(path)

Loads the given manifest from the given path.

verify_tree(root, manifest[, jobs, require_hash])

Verifies the given manifest against the given root directory.

Module Contents

emmi_data_management.verification.HashType
emmi_data_management.verification.T
emmi_data_management.verification.R
class emmi_data_management.verification.FailAction

Bases: str, enum.Enum

Action to take when a file fails verification.

WARN = 'warn'
DELETE = 'delete'
REDOWNLOAD = 'redownload'
ABORT = 'abort'
describe()
Return type:

str
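Because FailAction mixes in str, its members compare equal to their plain string values and can be looked up directly from a configuration string or CLI flag. The sketch below mirrors the four documented members in a local enum for illustration only; it does not import the library, and describe() is left out because its output is not documented here.

```python
from enum import Enum


class FailAction(str, Enum):
    # Local mirror of the documented members, for illustration only.
    WARN = "warn"
    DELETE = "delete"
    REDOWNLOAD = "redownload"
    ABORT = "abort"


# A value read from a config file maps straight to a member by value lookup.
action = FailAction("redownload")
print(action is FailAction.REDOWNLOAD)  # True

# The str mixin makes members compare equal to their raw values.
print(action == "redownload")  # True

# .value yields the plain string, e.g. for JSON serialization.
print(action.value)  # redownload
```

An unknown value such as FailAction("ignore") raises ValueError, which gives early validation of user input for free.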

class emmi_data_management.verification.FileRecord
path: str
size: int
hash: str | None
etag: str | None
source: dict[str, Any] | None
class emmi_data_management.verification.VerificationResult
ok: list[str]
missing: list[str]
extra: list[str]
size_mismatch: list[str]
hash_mismatch: list[str]
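VerificationResult groups relative paths by outcome, while a caller often needs a single pass/fail decision. The sketch below mirrors the five documented fields as a dataclass (whether the library itself uses a dataclass is an assumption) and adds a hypothetical is_clean helper, not part of the documented API, that treats missing or mismatched files as failures and, optionally, unexpected extra files as well.

```python
from dataclasses import dataclass, field


@dataclass
class VerificationResult:
    # Mirrors the documented fields; each list holds relative file paths.
    ok: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    extra: list[str] = field(default_factory=list)
    size_mismatch: list[str] = field(default_factory=list)
    hash_mismatch: list[str] = field(default_factory=list)


def is_clean(result: VerificationResult, allow_extra: bool = True) -> bool:
    """Hypothetical helper: True when no expected file is missing or corrupted."""
    problems = result.missing + result.size_mismatch + result.hash_mismatch
    if not allow_extra:
        # Optionally treat files absent from the manifest as a failure too.
        problems = problems + result.extra
    return not problems


result = VerificationResult(ok=["a.bin"], extra=["notes.txt"])
print(is_clean(result))  # True
print(is_clean(result, allow_extra=False))  # False
```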
class emmi_data_management.verification.VerificationType

Bases: str, enum.Enum

Kind of check used to verify files (by size or by hash).

SIZE = 'size'
HASH = 'hash'
exception emmi_data_management.verification.ParallelErrors(errors)

Bases: Exception

Exception carrying the (item, exception) pairs collected from a parallel run.

Parameters:

errors (list[tuple[Any, BaseException]])

errors
emmi_data_management.verification.Manifest
emmi_data_management.verification.parallel_map_collect_errors(fn, items, max_workers=8)
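Parameters:
  • fn – Callable applied to each item.

  • items – Items to process.

  • max_workers (int) – Maximum number of parallel workers. Defaults to 8.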
Return type:

list[R]
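The name and the ParallelErrors signature suggest a "map, then report every failure" pattern: run fn over all items, keep going when one fails, and raise a single exception listing each (item, exception) pair. The sketch below is one way to do that under stated assumptions: it uses a thread pool, returns results in input order, and raises only after every item has finished. The library's actual scheduling and error policy may differ, and ParallelErrors here is a local stand-in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class ParallelErrors(Exception):
    # Local stand-in: holds (item, exception) pairs, matching the documented type.
    def __init__(self, errors: list[tuple[Any, BaseException]]):
        super().__init__(f"{len(errors)} item(s) failed")
        self.errors = errors


def parallel_map_collect_errors(
    fn: Callable[[T], R], items: Iterable[T], max_workers: int = 8
) -> list[R]:
    items = list(items)
    errors: list[tuple[Any, BaseException]] = []
    results: list[R] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front so the work runs concurrently.
        futures = [pool.submit(fn, item) for item in items]
        # Collect in input order and keep going after a failure.
        for item, future in zip(items, futures):
            try:
                results.append(future.result())
            except Exception as exc:
                errors.append((item, exc))
    if errors:
        raise ParallelErrors(errors)
    return results


def invert(x: int) -> float:
    return 1 / x


print(parallel_map_collect_errors(invert, [1, 2, 4]))  # [1.0, 0.5, 0.25]
try:
    parallel_map_collect_errors(invert, [1, 0, 2])
except ParallelErrors as err:
    print(err, [item for item, _ in err.errors])  # 1 item(s) failed [0]
```

Collecting all failures before raising means one run reports every bad file at once instead of stopping at the first.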

emmi_data_management.verification.hash_file(path, chunk_size_mb=1)

Reads file bytes as chunks and hashes them.

Parameters:
  • path (pathlib.Path) – Input file path.

  • chunk_size_mb (int) – Chunk size in megabytes.

Returns:

  • The hash of the file.

Return type:

str
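Reading in fixed-size chunks keeps memory use bounded regardless of file size. The sketch below shows the technique with SHA-256 and a hex digest; both the algorithm and the output encoding are assumptions, since this reference does not say which hash the library uses.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_file(path: Path, chunk_size_mb: int = 1) -> str:
    # SHA-256 is an assumed algorithm; the library's choice is not documented here.
    digest = hashlib.sha256()
    chunk_size = chunk_size_mb * 1024 * 1024
    with path.open("rb") as f:
        # Feed the hash one chunk at a time so large files never load fully.
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    print(hash_file(sample))
    # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

On Python 3.11 and later, hashlib.file_digest(f, "sha256") performs the same chunked read.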

emmi_data_management.verification.build_manifest(root, jobs=4, include_hash=True)

Creates a manifest from the given root directory.

Parameters:
  • root (pathlib.Path) – Input root directory.

  • jobs (int) – Number of jobs to run in parallel. Defaults to 4.

  • include_hash (bool) – If True, include the hash of the file in the manifest. Defaults to True.

Returns:

  • A dictionary containing the manifest.

Return type:

Manifest
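A manifest built this way records each file under root with its size and, optionally, its hash. The sketch below assumes a hypothetical layout, mapping each relative POSIX path to {"size": ..., "hash": ...}, and SHA-256 hashing; the real Manifest type and the FileRecord fields it may hold (etag, source) are not reproduced here. Hashing is I/O-bound, so a thread pool sized by jobs reads several files concurrently.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any

# Hypothetical layout: relative POSIX path -> {"size": int, "hash": str | None}.
Manifest = dict[str, dict[str, Any]]


def _sha256(path: Path) -> str:
    # SHA-256 is an assumption; chunked read keeps memory bounded.
    digest = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(1024 * 1024):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, jobs: int = 4, include_hash: bool = True) -> Manifest:
    files = sorted(p for p in root.rglob("*") if p.is_file())

    def record(p: Path) -> tuple[str, dict[str, Any]]:
        entry = {"size": p.stat().st_size, "hash": _sha256(p) if include_hash else None}
        return p.relative_to(root).as_posix(), entry

    # Hash up to `jobs` files at a time.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(record, files))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_bytes(b"abc")
    (root / "sub" / "b.txt").write_bytes(b"")
    manifest = build_manifest(root)
    print(sorted(manifest))  # ['a.txt', 'sub/b.txt']
    print(manifest["a.txt"]["size"])  # 3
```

Storing relative POSIX paths keeps the manifest portable across machines and operating systems.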

emmi_data_management.verification.save_manifest(manifest, path)

Saves the given manifest to the given path.

Parameters:
  • manifest (Manifest) – Input manifest to save.

  • path (pathlib.Path) – Output file path.

Returns:

  • None

Return type:

None

emmi_data_management.verification.load_manifest(path)

Loads the given manifest from the given path.

Parameters:
  • path (pathlib.Path) – Input file path.

Returns:

  • A dictionary containing the manifest.

Return type:

Manifest
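save_manifest and load_manifest form a round trip. The sketch below assumes JSON as the on-disk format, which this reference does not specify, and a hypothetical dictionary layout for Manifest. It writes to a temporary sibling file and then renames it, so an interrupted save never leaves a half-written manifest behind; that atomic-write step is a design choice of the sketch, not documented library behavior.

```python
import json
import tempfile
from pathlib import Path
from typing import Any

# Hypothetical layout: relative path -> {"size": int, "hash": str | None}.
Manifest = dict[str, dict[str, Any]]


def save_manifest(manifest: Manifest, path: Path) -> None:
    # JSON is an assumed format for illustration.
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")
    # Rename over the target so readers see either the old or the new file.
    tmp.replace(path)


def load_manifest(path: Path) -> Manifest:
    return json.loads(path.read_text(encoding="utf-8"))


manifest = {"a.txt": {"size": 3, "hash": None}}
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "meta" / "manifest.json"
    save_manifest(manifest, target)
    print(load_manifest(target) == manifest)  # True
```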

emmi_data_management.verification.verify_tree(root, manifest, jobs=4, require_hash=True)

Verifies the given manifest against the given root directory.

Parameters:
  • root (pathlib.Path) – Input root directory.

  • manifest (Manifest) – Input manifest to verify.

  • jobs (int) – Number of jobs to run in parallel. Defaults to 4.

  • require_hash (bool) – If True, require the manifest to include a hash for each file. Defaults to True.

Returns:

  • An instance of VerificationResult.

Return type:

VerificationResult
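Verification compares the manifest with what is actually on disk and sorts every path into one of the VerificationResult buckets. The sketch below is a sequential illustration (the jobs parameter is omitted for brevity) under several assumptions: the hypothetical manifest layout {"size": ..., "hash": ...}, SHA-256 hashing, a cheap size check before any hashing, and an interpretation of require_hash in which a manifest entry without a hash raises ValueError when True and falls back to a size-only check when False.

```python
import hashlib
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any

# Hypothetical layout: relative path -> {"size": int, "hash": str | None}.
Manifest = dict[str, dict[str, Any]]


@dataclass
class VerificationResult:
    ok: list[str] = field(default_factory=list)
    missing: list[str] = field(default_factory=list)
    extra: list[str] = field(default_factory=list)
    size_mismatch: list[str] = field(default_factory=list)
    hash_mismatch: list[str] = field(default_factory=list)


def _sha256(path: Path) -> str:
    # Whole-file read for brevity; real code would hash in chunks.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_tree(root: Path, manifest: Manifest, require_hash: bool = True) -> VerificationResult:
    result = VerificationResult()
    on_disk = {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}
    for rel, entry in sorted(manifest.items()):
        path = root / rel
        if rel not in on_disk:
            result.missing.append(rel)
        elif path.stat().st_size != entry["size"]:
            # Size differs: skip the expensive hash entirely.
            result.size_mismatch.append(rel)
        elif entry.get("hash") is None:
            if require_hash:
                raise ValueError(f"manifest entry {rel!r} has no hash")
            result.ok.append(rel)  # size-only check passed
        elif _sha256(path) != entry["hash"]:
            result.hash_mismatch.append(rel)
        else:
            result.ok.append(rel)
    # Files on disk that the manifest does not mention.
    result.extra = sorted(on_disk - set(manifest))
    return result


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"abc")
    (root / "b.txt").write_bytes(b"xyz")  # right size, wrong content
    (root / "new.txt").write_bytes(b"!")  # not in the manifest
    manifest = {
        "a.txt": {"size": 3, "hash": hashlib.sha256(b"abc").hexdigest()},
        "b.txt": {"size": 3, "hash": hashlib.sha256(b"abd").hexdigest()},
        "gone.txt": {"size": 1, "hash": None},
    }
    print(verify_tree(root, manifest))
```

Here the result lists a.txt as ok, b.txt as a hash mismatch, gone.txt as missing, and new.txt as extra. Checking size first means corrupted or truncated downloads are usually caught without reading file contents at all.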