Introduction to Emmi Framework¶

Main Components¶

Warning

The module layout is still evolving. This page describes the current components and may change in future releases.

Emmi Framework is organized into the following Python modules:

emmi - high-level ML and data modules
emmi_data_management - data fetching and storage utilities
emmi_inference - inference tools
ksuit - low-level ML codebase responsible for the heavy-lifting of this framework

Emmi Module Architecture — Core layering of modules¶

How Do They Interact¶

The interaction between existing modules is better described using a typical workflow. Let’s say, a user wants to train a model, like AB-UPT, to do so they have two options:

Use configuration files to set up an experiment
Use code and tailor it to specific needs

In either case, the same underlying shared codebase is used to ensure consistent behavior.

Our main buildings blocks are located in ksuit (think of it as a core package). It takes care of things like object factories, collators, trainers, runners, trackers, etc. All of which have Base classes that can be used as abstract classes to create custom variations, as well as ready-to-use implementations with clearly defined usage patterns.

To account for various levels of expertise (e.g. a seasoned ML engineer, a PhD student, a CFD expert, etc.) we provide multiple abstraction levels. The high-level emmi module gives a list of convenient and frequently used blocks to get things going. It fully relies on ksuit and is ready to be extended with your custom logic when necessary. In most cases it is recommended to extend emmi first rather than diving directly into ksuit.

Data is at the core of any machine learning workflow and we offer tools relevant to data fetching as part of the emmi_data_management. Currently it supports data fetching and validation from HuggingFace and AWS S3. By sharing feedback about your preferred way of storing and accessing data, you can help us prioritize future features. This module doesn’t depend on any of the other modules and can be used standalone and/or part of other modules when needed.

The inference engine offers the flexibility of running inference using arbitrary models via CLI or code. It’s worth mentioning a necessity of registering custom models prior to using the CLI. emmi_inference relies on emmi mainly to have an access to custom models, datasets, etc. The required CLI arguments simply converge to:

a config path the defines data, collators, model related execution parameters, and output settings;
a model type (must have a match the registry);
a checkpoint path.