How to Use Private Data Sources

Warning

The information below is incomplete and requires extra work. Use it at your own risk!

Interface and Examples

EDM supports two forms of invocation:

  1. emmi-data <SERVICE_NAME> <COMMAND>: run a command that belongs to a specific service. Each service has its own set of commands.

  2. emmi-data <COMMAND>: run a global command that is not tied to a service.

Hugging Face

emmi-data huggingface estimate EmmiAI/AB-UPT
emmi-data huggingface ext EmmiAI/NeuralDEM .th ~/data --type model --manifest-out manifest.json

The ext command downloads all .th files from EmmiAI/NeuralDEM into ~/data. The --manifest-out option writes a manifest to manifest.json, which can later be used for integrity checks (see Verification).

AWS

emmi-data aws estimate noaa-goes16 ABI-L1b-RadC/2023/001/00/
emmi-data aws fetch my-bucket data/prefix/ ./data --extension .parquet --manifest-out s3-manifest.json

The fetch command downloads only the .parquet files found under data/prefix/ in my-bucket into ./data and writes a manifest to s3-manifest.json.
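
A download and its integrity check can be chained in a small shell helper. The sketch below is only an illustration: the function name fetch_and_verify is hypothetical, it uses no flags beyond those shown on this page, and it assumes that a manifest written by --manifest-out is accepted by the verification check command described under Verification.

```shell
# fetch_and_verify BUCKET PREFIX DEST MANIFEST
# Hypothetical helper (not part of emmi-data): download .parquet files,
# then check them against the manifest written during the download.
fetch_and_verify() (
  bucket=$1
  prefix=$2
  dest=$3
  manifest=$4

  # Run the check only if the download itself succeeded.
  emmi-data aws fetch "$bucket" "$prefix" "$dest" \
      --extension .parquet --manifest-out "$manifest" &&
  emmi-data verification check -r "$dest" -m "$manifest" --action redownload
)
```

It would be called as: fetch_and_verify my-bucket data/prefix/ ./data s3-manifest.json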

Verification

Verification checks whether the downloaded files are complete and intact. If a manifest such as manifest.json exists, corrupted or missing files can be redownloaded:

emmi-data verification check -r ./data -m manifest.json --action redownload

If no manifest exists, create one with:

emmi-data verification build -r ./data -m manifest.json
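
To make the idea of manifest-based verification concrete, here is a minimal conceptual sketch using GNU coreutils. It is a stand-in, not the emmi-data implementation: the manifest format of emmi-data is not documented on this page, and the function names build_manifest and check_manifest are hypothetical. The sketch records a SHA-256 checksum per file and later reports every file that is missing or whose checksum no longer matches.

```shell
# Conceptual stand-in only; assumes GNU find, sort, xargs, sha256sum, realpath.
# Keep the manifest file outside ROOT so it does not list itself.

# build_manifest ROOT MANIFEST: record a SHA-256 checksum for every file under ROOT.
build_manifest() (
  root=$1
  manifest=$(realpath "$2")   # resolve before changing directory
  cd "$root" || exit 1
  # Paths are stored relative to ROOT and sorted for a stable manifest.
  find . -type f -print0 | sort -z | xargs -0 -r sha256sum > "$manifest"
)

# check_manifest ROOT MANIFEST: print each file that is missing or corrupted.
check_manifest() (
  root=$1
  manifest=$(realpath "$2")
  cd "$root" || exit 1
  # --quiet hides files that pass; failing entries are printed as "PATH: FAILED ...".
  sha256sum --check --quiet "$manifest" 2>/dev/null | sed -n 's/: FAILED.*$//p'
)
```

An empty result from check_manifest means every listed file is present and unchanged. Any printed path is a candidate for redownloading, which is the role the --action redownload option plays above.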

To explore all options, use the --help flag.