emmi_data_management.interfaces.huggingface
Attributes
- HFRepoType
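Functions
- estimate_hf_repo_size – Estimate the total size (bytes) of all files in a HuggingFace repository (model or dataset), optionally filtering by file extension.
- fetch_huggingface_repo_snapshot – Downloads all content from the specified HuggingFace repository.
- fetch_huggingface_file – Downloads a specific file from a HuggingFace repository into a local directory.
- fetch_huggingface_by_extension – Downloads all files with the given extension from a HuggingFace repository.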
Module Contents
- emmi_data_management.interfaces.huggingface.HFRepoType
Type of a HuggingFace repository: either “model” or “dataset”.
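The definition of HFRepoType is not shown on this page. A minimal sketch, assuming it is a `typing.Literal` alias over the two values the parameter descriptions accept; `check_repo_type` is a hypothetical helper for illustration, not part of this module.

```python
from typing import Literal, cast, get_args

# Assumption: HFRepoType modeled as a Literal alias of the two documented values.
HFRepoType = Literal["model", "dataset"]


def check_repo_type(repo_type: str) -> HFRepoType:
    """Hypothetical helper: reject anything other than "model" or "dataset"."""
    allowed = get_args(HFRepoType)  # ("model", "dataset")
    if repo_type not in allowed:
        raise ValueError(f"repo_type must be one of {allowed}, got {repo_type!r}")
    return cast(HFRepoType, repo_type)
```

A `Literal` alias lets static type checkers flag a typo such as `"datasets"` at the call site, while a runtime check like the one above guards values that arrive from configuration files or the command line.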
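- emmi_data_management.interfaces.huggingface.estimate_hf_repo_size(repo_id, repo_type='model', revision='main', extension=None)
Estimate the total size (bytes) of all files in a HuggingFace repository (model or dataset), optionally filtering by file extension.
- Parameters:
repo_id (str) – ID of the HuggingFace repository.
repo_type (HFRepoType) – Repo type, either “model” or “dataset”. Defaults to “model”.
revision (str) – Revision of the repository. Defaults to “main”.
extension (str | None) – File extension to filter by. Defaults to None, in which case all files are counted.
- Returns:
Integer value for the total size in bytes.
- Return type:
int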
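The size estimate amounts to summing per-file sizes, optionally keeping only files with a matching extension. A minimal sketch of that aggregation step, assuming the repository listing is already available as a mapping from file path to size in bytes (for example, metadata fetched from the Hub); `total_size` is hypothetical, and suffix matching with `str.endswith` is an assumption about how `extension` is interpreted.

```python
from typing import Mapping, Optional


def total_size(files: Mapping[str, int], extension: Optional[str] = None) -> int:
    """Sum file sizes in bytes, keeping only paths that end with `extension` if given."""
    return sum(
        size
        for path, size in files.items()
        if extension is None or path.endswith(extension)
    )


# Hypothetical listing: repository path -> size in bytes.
repo_files = {"data/train.parquet": 1_000, "data/test.parquet": 250, "README.md": 40}
print(total_size(repo_files))              # 1290
print(total_size(repo_files, ".parquet"))  # 1250
```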
- emmi_data_management.interfaces.huggingface.fetch_huggingface_repo_snapshot(repo_id, local_dir)
Downloads all content from the specified HuggingFace repository.
- Parameters:
repo_id (str) – ID of the HuggingFace repository.
local_dir (pathlib.Path) – Local directory to download content to.
- Returns:
None
- Return type:
None
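This page does not say how the snapshot is retrieved. A minimal sketch of equivalent behavior, assuming the `huggingface_hub` package and its `snapshot_download` function; `fetch_snapshot_sketch` is a hypothetical stand-in, not this module's implementation.

```python
from pathlib import Path


def fetch_snapshot_sketch(repo_id: str, local_dir: Path) -> None:
    """Hypothetical stand-in: mirror every file of a repository into local_dir."""
    # Imported lazily so this definition loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir.mkdir(parents=True, exist_ok=True)
    # Downloads all files of the repository at its default revision into local_dir.
    snapshot_download(repo_id, local_dir=local_dir)
```

A call such as `fetch_snapshot_sketch("some-org/some-repo", Path("downloads/repo"))` (hypothetical repository ID) requires network access to the Hub.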
- emmi_data_management.interfaces.huggingface.fetch_huggingface_file(repo_id, filename, local_dir, repo_type='model', revision='main')
Downloads a specific file from a HuggingFace repository into a local directory.
- Parameters:
repo_id (str) – ID of the HuggingFace repository.
filename (str) – Filename to download.
local_dir (pathlib.Path) – Local directory to download the file to.
repo_type (HFRepoType) – Repo type, either “model” or “dataset”. Defaults to “model”.
revision (str) – Revision of the repository. Defaults to “main”.
- Returns:
None
- Return type:
None
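For a single file, the parameters map naturally onto a per-file download call. A minimal sketch, assuming the `huggingface_hub` package and its `hf_hub_download` function; `fetch_file_sketch` is hypothetical and mirrors only the documented parameters and defaults.

```python
from pathlib import Path
from typing import Literal


def fetch_file_sketch(
    repo_id: str,
    filename: str,
    local_dir: Path,
    repo_type: Literal["model", "dataset"] = "model",
    revision: str = "main",
) -> None:
    """Hypothetical stand-in: download one file from a repository into local_dir."""
    # Imported lazily so this definition loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        repo_type=repo_type,
        revision=revision,
        local_dir=local_dir,
    )
```

For example, `fetch_file_sketch("some-org/some-dataset", "config.json", Path("downloads"), repo_type="dataset")` (hypothetical repository and file names) would fetch one file from the dataset repository's `main` revision.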
- emmi_data_management.interfaces.huggingface.fetch_huggingface_by_extension(repo_id, extension, local_dir, revision='main', repo_type='dataset', max_workers=8)
Downloads all files with the given extension from a HuggingFace repository.
- Parameters:
repo_id (str) – ID of the HuggingFace repository.
extension (str) – File extension to download.
local_dir (pathlib.Path) – Local directory to download the files to.
revision (str) – Revision of the repository. Defaults to “main”.
repo_type (HFRepoType) – Repo type, either “model” or “dataset”. Defaults to “dataset”.
max_workers (int) – Maximum number of parallel download workers. Defaults to 8.
- Returns:
A list of downloaded files.
- Return type:
list
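The `max_workers` parameter suggests that matching files are downloaded concurrently. A minimal sketch of that pattern with a thread pool, assuming the repository's file names are already listed; `fetch_by_extension_sketch` and the `download` callable are hypothetical, and the fake downloader below only computes target paths so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def fetch_by_extension_sketch(
    filenames: Iterable[str],
    extension: str,
    local_dir: Path,
    download: Callable[[str, Path], Path],
    max_workers: int = 8,
) -> List[Path]:
    """Download every file whose name ends with `extension`, using a thread pool.

    `download` is a hypothetical callable (filename, local_dir) -> local path.
    """
    matches = [name for name in filenames if name.endswith(extension)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `matches`.
        return list(pool.map(lambda name: download(name, local_dir), matches))


def fake_download(name: str, local_dir: Path) -> Path:
    """Stand-in downloader: returns where the file would be stored."""
    return local_dir / name


files = ["a/x.parquet", "README.md", "b/y.parquet"]
result = fetch_by_extension_sketch(files, ".parquet", Path("out"), fake_download, max_workers=2)
print(result)
```

Threads suit this workload because downloads are I/O-bound; `max_workers` caps how many requests run at once.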