emmi_data_management.interfaces.s3
Classes
AWSSecrets | String-valued names of the AWS credential and configuration keys.
S3Object | Typed dictionary (TypedDict) describing an S3 object.
Functions
get_s3_client | Construct an S3 client from managed credentials (env or config).
list_s3_objects | List S3 objects under bucket/prefix with an optional extension filter. Skips directory placeholders (keys ending with '/') and normalizes quoted ETags.
estimate_s3_size | Estimate size of objects under bucket/prefix with an optional extension filter.
fetch_s3_file | Download file from S3 bucket to local directory, preserving the key's subpath.
iter_s3_object_chunks | Stream an S3 object as chunks of bytes. Intended to be used with higher-level atomic writers / hashing in the CLI.
head_s3_object | Lightweight HEAD to retrieve content length and etag (if available).
fetch_s3_prefix | Download all objects under bucket/prefix with an optional extension filter into a local directory.
Module Contents
- class emmi_data_management.interfaces.s3.AWSSecrets
String-valued names of the AWS credential and configuration keys. Each member's value equals its name, matching the environment variable names listed under get_s3_client().
- AWS_ACCESS_KEY_ID = 'AWS_ACCESS_KEY_ID'
- AWS_SECRET_ACCESS_KEY = 'AWS_SECRET_ACCESS_KEY'
- AWS_SESSION_TOKEN = 'AWS_SESSION_TOKEN'
- AWS_REGION = 'AWS_REGION'
- AWS_DEFAULT_REGION = 'AWS_DEFAULT_REGION'
- AWS_ENDPOINT_URL = 'AWS_ENDPOINT_URL'
- class emmi_data_management.interfaces.s3.S3Object
Bases: TypedDict
Typed dictionary describing an S3 object.
- emmi_data_management.interfaces.s3.get_s3_client()
Construct an S3 client from managed credentials (env or config). Expected keys (matching env names):
AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN (optional)
AWS_DEFAULT_REGION or AWS_REGION
AWS_ENDPOINT_URL (optional; for MinIO, on-prem, or other custom endpoints)
Falls back to unsigned access for public buckets when the credentials are missing or empty.
- Return type:
botocore.client.BaseClient
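The credential lookup and unsigned fallback described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: `resolve_client_kwargs` and `build_client` are hypothetical names, the precedence of AWS_DEFAULT_REGION over AWS_REGION is an assumption, and the sketch assumes the client is created with `boto3.client` and that unsigned access uses botocore's `UNSIGNED` signature version.

```python
import os
from typing import Mapping, Optional


def resolve_client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict:
    """Map credential keys to boto3.client keyword arguments (hypothetical helper)."""
    env = os.environ if env is None else env

    def get(name: str) -> Optional[str]:
        # Missing, empty, and whitespace-only values are all treated as absent.
        value = (env.get(name) or "").strip()
        return value or None

    kwargs: dict = {}
    # Assumption: AWS_DEFAULT_REGION wins when both region keys are set.
    region = get("AWS_DEFAULT_REGION") or get("AWS_REGION")
    if region:
        kwargs["region_name"] = region
    endpoint = get("AWS_ENDPOINT_URL")
    if endpoint:
        kwargs["endpoint_url"] = endpoint

    access_key = get("AWS_ACCESS_KEY_ID")
    secret_key = get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        kwargs["aws_access_key_id"] = access_key
        kwargs["aws_secret_access_key"] = secret_key
        token = get("AWS_SESSION_TOKEN")
        if token:
            kwargs["aws_session_token"] = token
        kwargs["unsigned"] = False
    else:
        # No usable credentials: request anonymous access (public buckets).
        kwargs["unsigned"] = True
    return kwargs


def build_client(env: Optional[Mapping[str, str]] = None):
    """Create the client; boto3 is imported lazily so the helper above needs no dependencies."""
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    kwargs = resolve_client_kwargs(env)
    if kwargs.pop("unsigned"):
        kwargs["config"] = Config(signature_version=UNSIGNED)
    return boto3.client("s3", **kwargs)
```

Keeping the key resolution separate from client construction makes the fallback rule easy to check without network access or installed AWS libraries.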
- emmi_data_management.interfaces.s3.list_s3_objects(bucket, prefix, extension=None)
List S3 objects under bucket/prefix with an optional extension filter. Skips directory placeholders (keys ending with '/') and normalizes quoted ETags.
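The filtering rules above (skip placeholders, match the extension, strip ETag quotes) might look like the sketch below. It operates on raw listing entries shaped like the `Contents` items that S3's ListObjectsV2 returns (`Key`, `Size`, `ETag`). The helper name `filter_listing` and the output field names are assumptions for illustration; the actual fields of S3Object are not documented here.

```python
from typing import Iterable, Iterator, Optional


def filter_listing(contents: Iterable[dict], extension: Optional[str] = None) -> Iterator[dict]:
    """Yield cleaned entries from raw listing dicts (hypothetical helper)."""
    for entry in contents:
        key = entry["Key"]
        if key.endswith("/"):
            continue  # directory placeholder, not a real object
        if extension is not None and not key.endswith(extension):
            continue  # does not match the requested extension
        yield {
            "key": key,
            "size": entry.get("Size", 0),
            # S3 returns ETags wrapped in double quotes; strip them.
            "etag": (entry.get("ETag") or "").strip('"'),
        }
```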
- emmi_data_management.interfaces.s3.estimate_s3_size(bucket, prefix, extension=None)
Estimate size of objects under bucket/prefix with an optional extension filter.
- emmi_data_management.interfaces.s3.fetch_s3_file(bucket, key, local_dir)
Download file from S3 bucket to local directory, preserving the key’s subpath.
- Parameters:
bucket (str) – Name of the S3 bucket.
key (str) – File key.
local_dir (pathlib.Path) – Path to local directory.
- Returns:
Local file path.
- Return type:
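"Preserving the key's subpath" means the object's key is mirrored below local_dir. A minimal sketch of that mapping follows; `local_path_for_key` is a hypothetical helper, and the rejection of absolute keys and `..` segments is an extra precaution added here, not something the documentation above states.

```python
from pathlib import Path, PurePosixPath


def local_path_for_key(local_dir: Path, key: str) -> Path:
    """Map an S3 key to a path under local_dir, keeping the key's subdirectories."""
    posix_key = PurePosixPath(key)  # S3 keys always use '/' as the separator
    parts = posix_key.parts
    if not parts or posix_key.is_absolute() or ".." in parts:
        raise ValueError(f"unsafe or empty key: {key!r}")
    return local_dir.joinpath(*parts)


# Typical use with a boto3 S3 client (not executed here):
#   target = local_path_for_key(Path("downloads"), "raw/2024/a.csv")
#   target.parent.mkdir(parents=True, exist_ok=True)
#   client.download_file("my-bucket", "raw/2024/a.csv", str(target))
```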
- emmi_data_management.interfaces.s3.iter_s3_object_chunks(bucket, key, *, chunk_size=1024 * 1024)
Stream an S3 object as chunks of bytes. Intended for use with higher-level atomic writers and hashing in the CLI.
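- Parameters:
bucket – Name of the S3 bucket.
key – Object key.
chunk_size – Keyword-only chunk size in bytes. Defaults to 1024 * 1024 (1 MiB).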
- Yields:
Byte chunks from the object body.
- Return type:
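A chunk iterator like this pairs naturally with an atomic writer that hashes while it writes. The sketch below is one way such a consumer could look, not the CLI's actual code: `write_chunks_atomically` is a hypothetical name and SHA-256 is an arbitrary choice of digest. It accepts any iterable of byte chunks, so it can be exercised without S3.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_chunks_atomically(chunks: Iterable[bytes], target: Path) -> str:
    """Write chunks to target via a temporary file and return the SHA-256 hex digest."""
    target.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".")
    try:
        with os.fdopen(fd, "wb") as handle:
            for chunk in chunks:
                handle.write(chunk)
                digest.update(chunk)
        os.replace(tmp_name, target)  # readers never observe a partially written file
    except BaseException:
        # Clean up the partial temp file on any failure, then propagate the error.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return digest.hexdigest()


# With the function documented above (bucket and key are placeholders):
#   write_chunks_atomically(iter_s3_object_chunks("my-bucket", "raw/a.csv"), Path("out/raw/a.csv"))
```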
- emmi_data_management.interfaces.s3.head_s3_object(bucket, key)
Lightweight HEAD request to retrieve the content length and ETag (if available).
- Returns:
(size_bytes, etag), with the ETag normalized (quotes stripped).
- emmi_data_management.interfaces.s3.fetch_s3_prefix(bucket, prefix, local_dir, extension=None, max_workers=8)
Download all objects under bucket/prefix with an optional extension filter into a local directory.
- Parameters:
bucket (str) – Name of the S3 bucket.
prefix (str) – File prefix.
local_dir (pathlib.Path) – Path to local directory.
extension (str | None) – Optional file extension. Defaults to None.
max_workers (int) – Number of workers to use for downloading. Defaults to 8.
- Returns:
A list of relative paths (keys) written.
- Return type:
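The max_workers parameter suggests the downloads run concurrently. One plausible shape for that fan-out, assuming a thread pool, is sketched below; `download_many` and its `download_one` callback are hypothetical stand-ins, and the sorted return order is a choice made here for determinism rather than documented behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable, List


def download_many(
    keys: Iterable[str],
    download_one: Callable[[str], str],
    max_workers: int = 8,
) -> List[str]:
    """Run download_one(key) for every key on a thread pool and collect the results."""
    written: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_one, key) for key in keys]
        for future in as_completed(futures):
            # result() re-raises any exception raised inside the worker.
            written.append(future.result())
    return sorted(written)
```

Threads suit this workload because each download spends most of its time waiting on network I/O rather than on the CPU.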