Introduction to Emmi Framework
==============================

Main Components
---------------

.. warning::

   The module layout is still evolving. This page describes the *current*
   components and may change in future releases.

Emmi Framework is organized into the following Python modules:

- ``emmi`` - high-level ML and data modules
- ``emmi_data_management`` - data fetching and storage utilities
- ``emmi_inference`` - inference tools
- ``ksuit`` - low-level ML codebase responsible for the heavy lifting of this framework

.. figure:: /_static/emmi_module_architecture.png
   :alt: Emmi Module Architecture
   :align: center
   :width: 600px

   Core layering of modules

How Do They Interact?
---------------------

The interaction between the modules is best described with a typical
workflow. Suppose a user wants to train a model such as AB-UPT. To do so,
they have two options:

1. Use configuration files to set up an experiment.
2. Use code and tailor it to their specific needs.

In either case, the same underlying shared codebase is used to ensure
consistent behavior.

Our main building blocks live in ``ksuit`` (think of it as the **core**
package). It takes care of object factories, collators, trainers, runners,
trackers, and similar components. Each of these has a **Base** class that can
serve as an abstract class for creating custom variations, alongside
ready-to-use implementations with clearly defined usage patterns.

To account for different levels of expertise (e.g. a seasoned ML engineer, a
PhD student, or a CFD expert), we provide multiple abstraction levels. The
high-level ``emmi`` module offers a set of convenient, frequently used
building blocks to get things going. It relies entirely on ``ksuit`` and is
ready to be extended with your custom logic when necessary. In most cases we
recommend extending ``emmi`` first rather than diving directly into
``ksuit``.
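The "extend a **Base** class" pattern can be sketched as follows. Note that ``BaseCollator``, its ``collate`` method, and the stand-in base class defined here are illustrative assumptions, not ksuit's actual API; consult the module reference for the real class names and import paths.

.. code-block:: python

   # Hypothetical sketch of the Base-class extension pattern.
   # BaseCollator is a stand-in defined inline; ksuit's real base
   # classes and method signatures may differ.
   from abc import ABC, abstractmethod
   from typing import Any


   class BaseCollator(ABC):
       """Stand-in for a ksuit-style abstract base class."""

       @abstractmethod
       def collate(self, samples: list[Any]) -> Any:
           """Combine individual samples into one batch."""


   class PaddingCollator(BaseCollator):
       """Custom variation: pads variable-length samples to equal length."""

       def __init__(self, pad_value: int = 0):
           self.pad_value = pad_value

       def collate(self, samples: list[list[int]]) -> list[list[int]]:
           longest = max(len(s) for s in samples)
           return [s + [self.pad_value] * (longest - len(s)) for s in samples]


   collator = PaddingCollator()
   batch = collator.collate([[1, 2], [3, 4, 5]])
   # batch == [[1, 2, 0], [3, 4, 5]]

The same shape applies to trainers, runners, and trackers: subclass the relevant **Base** class, override its abstract methods, and the rest of the framework can use your variation interchangeably with the built-in implementations.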
Data is at the core of any machine learning workflow, and we offer data
fetching tools as part of ``emmi_data_management``. It currently supports
data fetching and validation from HuggingFace and AWS S3. By sharing feedback
about your preferred way of storing and accessing data, you can help us
prioritize future features. This module does not depend on any of the other
modules and can be used standalone or as part of other modules when needed.

The inference engine offers the flexibility of running inference with
arbitrary models via the CLI or from code. Note that custom models must be
registered before they can be used through the CLI. ``emmi_inference`` relies
on ``emmi`` mainly to access custom models, datasets, and similar components.
The required CLI arguments converge to:

- a config path that defines data, collators, model-related execution
  parameters, and output settings;
- a model type (which must match an entry in the registry);
- a checkpoint path.
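To illustrate why registration must happen before the CLI can resolve a model type, here is a minimal sketch of a registry pattern. The names ``MODEL_REGISTRY``, ``register_model``, ``build_model``, and ``MySurrogate`` are hypothetical stand-ins for illustration only; emmi_inference's real registration API may look different.

.. code-block:: python

   # Hypothetical sketch of a model registry. All names here are
   # illustrative assumptions, not emmi_inference's actual API.
   MODEL_REGISTRY: dict[str, type] = {}


   def register_model(name: str):
       """Decorator that maps a model-type string to its class."""
       def wrapper(cls):
           MODEL_REGISTRY[name] = cls
           return cls
       return wrapper


   @register_model("my_surrogate")
   class MySurrogate:
       def __init__(self, checkpoint_path: str):
           self.checkpoint_path = checkpoint_path


   def build_model(model_type: str, checkpoint_path: str):
       """Resolve the CLI's model-type argument against the registry."""
       if model_type not in MODEL_REGISTRY:
           raise KeyError(f"Unknown model type: {model_type!r}")
       return MODEL_REGISTRY[model_type](checkpoint_path)


   model = build_model("my_surrogate", "checkpoints/last.ckpt")

Under this pattern, the CLI's three arguments map cleanly onto the code: the model type selects the registered class, the checkpoint path is handed to its constructor, and the config path supplies everything else (data, collators, execution parameters, output settings).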