ksuit.data.samplers.interleaved_sampler
=======================================

.. py:module:: ksuit.data.samplers.interleaved_sampler


Classes
-------

.. autoapisummary::

   ksuit.data.samplers.interleaved_sampler.SamplerIntervalConfig
   ksuit.data.samplers.interleaved_sampler.InterleavedSamplerConfig
   ksuit.data.samplers.interleaved_sampler.InterleavedSampler


Module Contents
---------------

.. py:class:: SamplerIntervalConfig

   Configuration dataclass for setting up the dataloading pipeline, which is structured to load data
   from a "main" dataset (i.e., the dataset used for training) that is interleaved at regular
   intervals with iterations over other datasets (e.g., a test dataset used to calculate a metric
   in a callback).

   :param sampler: Any sampler that would be used in ``torch.utils.data.DataLoader(sampler=...)``.
       Examples: ``RandomSampler`` for a training dataset or ``SequentialSampler`` for evaluation.
   :type sampler: SizedIterable
   :param every_n_epochs: Epoch-based interval. Invokes the callback after every n epochs.
       Mutually exclusive with the other intervals.
   :type every_n_epochs: int | None
   :param every_n_updates: Update-based interval. Invokes the callback after every n updates.
       Mutually exclusive with the other intervals.
   :type every_n_updates: int | None
   :param every_n_samples: Sample-based interval. Invokes the callback after every n samples.
       Mutually exclusive with the other intervals.
   :type every_n_samples: int | None
   :param pipeline: Any function that would be used in ``torch.utils.data.DataLoader(collate_fn=...)``.
   :type pipeline: Optional[callable]
   :param batch_size: Batch size to use for this callback. Default: None (which will use the same
       batch_size as the "main" sampler, i.e., the one used for training).
   :type batch_size: int | None

   .. py:attribute:: sampler
      :type: ksuit.utils.common.SizedIterable

   .. py:attribute:: pipeline
      :type: collections.abc.Callable | None

   .. py:attribute:: every_n_epochs
      :type: int | None
      :value: None

   .. py:attribute:: every_n_updates
      :type: int | None
      :value: None

   .. py:attribute:: every_n_samples
      :type: int | None
      :value: None

   .. py:attribute:: batch_size
      :type: int | None
      :value: None

   .. py:method:: validate_frequency()

      Ensures that exactly one frequency ('every_n_*') is specified and that 'batch_size' is
      present if 'every_n_samples' is used.

   .. py:method:: check_positive_values(v)
      :classmethod:

      Ensures that all integer-based frequency and batch size fields are positive.
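   A minimal sketch of constructing an interval entry, assuming the dataclass accepts its
   documented fields as keyword arguments (``test_dataset`` below is a hypothetical placeholder):

   .. code-block:: python

      from torch.utils.data import SequentialSampler

      from ksuit.data.samplers.interleaved_sampler import SamplerIntervalConfig

      # Hypothetical evaluation dataset; anything sized works for the sampler.
      test_dataset = list(range(10))

      # Run one sequential pass over the test dataset after every training epoch.
      # Exactly one 'every_n_*' interval may be set (see validate_frequency).
      interval = SamplerIntervalConfig(
          sampler=SequentialSampler(test_dataset),
          pipeline=None,  # assumed to fall back to the default collate behavior
          every_n_epochs=1,
      )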
.. py:class:: InterleavedSamplerConfig(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`

   Configuration for the :py:class:`InterleavedSampler`.

   Create a new model by parsing and validating input data from keyword arguments. Raises
   :py:exc:`pydantic_core.ValidationError` if the input data cannot be validated to form a valid
   model. `self` is explicitly positional-only to allow `self` as a field name.

   .. py:attribute:: batch_size
      :type: int

      Batch size to use for creating batches of the main_sampler indices.

   .. py:attribute:: drop_last
      :type: bool
      :value: True

      Whether to drop the last non-full batch of the main_sampler.

   .. py:attribute:: max_epochs
      :type: int | None
      :value: None

      How many epochs to sample at most from the main_sampler. Whichever limit is reached first
      (epochs/updates/samples) stops the sampling.

   .. py:attribute:: max_updates
      :type: int | None
      :value: None

      How many updates to sample at most from the main_sampler. Whichever limit is reached first
      (epochs/updates/samples) stops the sampling.

   .. py:attribute:: max_samples
      :type: int | None
      :value: None

      How many samples to sample at most from the main_sampler. Whichever limit is reached first
      (epochs/updates/samples) stops the sampling.

   .. py:attribute:: start_epoch
      :type: int | None
      :value: None

      At which epoch to start (used for resuming training). Mutually exclusive with `start_update`
      and `start_sample`.

   .. py:attribute:: start_update
      :type: int | None
      :value: None

      At which update to start (used for resuming training). Mutually exclusive with `start_epoch`
      and `start_sample`.

   .. py:attribute:: start_sample
      :type: int | None
      :value: None

      At which sample to start (used for resuming training). Mutually exclusive with `start_epoch`
      and `start_update`.

   .. py:method:: check_positive_values(v)
      :classmethod:

      Ensures that all integer-based limit, start, and batch size fields are positive.

   .. py:method:: validate_stop()

      Ensures that at least one stopping criterion ('max_*') is specified.

   .. py:method:: validate_start()

      Ensures that at most one start ('start_*') is specified.
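   As a pydantic model, the config is constructed from keyword arguments; the concrete values
   below are illustrative only:

   .. code-block:: python

      from ksuit.data.samplers.interleaved_sampler import InterleavedSamplerConfig

      # Train with batches of 256 for at most 10 epochs; whichever 'max_*'
      # limit is reached first stops the sampling.
      config = InterleavedSamplerConfig(batch_size=256, max_epochs=10)

      # Resuming a run: the 'start_*' fields are mutually exclusive,
      # so at most one of them is set.
      resume_config = InterleavedSamplerConfig(
          batch_size=256,
          max_updates=100_000,
          start_update=50_000,
      )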
.. py:class:: InterleavedSampler(train_sampler, config, train_collator = None, callback_samplers = None)

   Sampler that enables efficient dataloading by using a single large dataset containing the
   train/test/... datasets all at once. The sampler samples from different regions of this dataset
   according to its specification.

   For example, consider a training dataset of length 100 and a test dataset of length 10. If the
   sampler is configured with a ``RandomSampler`` over the training dataset indices as main sampler,
   it will repeatedly iterate over the training dataset. If the test dataset is configured with a
   sequential sampler that should be invoked after every epoch, the sampler will first return
   indices for the 100 training samples (randomly sampled) and then indices for the 10 test samples
   (in sequential order).

   :param train_sampler: Sampler that is invoked by default (e.g., to randomly sample from the
       trainset).
   :param config: Configuration for the InterleavedSampler.
   :param train_collator: Collator used to collate samples from indices sampled from the train
       sampler.
   :param callback_samplers: Configurations specifying when the train_sampler should be paused and
       indices from other samplers (e.g., from a testset) should be returned instead. Also
       configures the interval and, optionally, a different batch_size to use for the interleaved
       batches.

   .. py:attribute:: config

   .. py:attribute:: main_sampler

   .. py:attribute:: extra_samplers
      :value: []

   .. py:attribute:: index_offsets

   .. py:attribute:: dataset

   .. py:attribute:: collator

   .. py:attribute:: batch_sampler

   .. py:attribute:: batch_size

   .. py:method:: calculate_start(config, sampler_len)
      :staticmethod:

   .. py:method:: get_data_loader(num_workers = 0, pin_memory = False)

      Creates the DataLoader that uses the InterleavedSampler with the correspondingly configured
      dataset.

      :param num_workers: Number of workers to use.
      :param pin_memory: Whether to use pin memory.

      :returns: DataLoader that uses the InterleavedSampler with the correspondingly configured
          dataset.
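   Putting the pieces together, a sketch of the intended end-to-end usage; the datasets below are
   hypothetical placeholders, and passing the interval configs as a list to ``callback_samplers``
   is an assumption based on the parameter documentation:

   .. code-block:: python

      from torch.utils.data import RandomSampler, SequentialSampler

      from ksuit.data.samplers.interleaved_sampler import (
          InterleavedSampler,
          InterleavedSamplerConfig,
          SamplerIntervalConfig,
      )

      # Hypothetical datasets; anything with __len__ and __getitem__ works.
      train_dataset = list(range(100))
      test_dataset = list(range(10))

      sampler = InterleavedSampler(
          train_sampler=RandomSampler(train_dataset),
          config=InterleavedSamplerConfig(batch_size=4, max_epochs=2),
          callback_samplers=[
              # After every training epoch, interleave one sequential
              # pass over the test dataset.
              SamplerIntervalConfig(
                  sampler=SequentialSampler(test_dataset),
                  pipeline=None,
                  every_n_epochs=1,
              ),
          ],
      )

      # A single DataLoader serves both the training and the interleaved batches.
      loader = sampler.get_data_loader(num_workers=2, pin_memory=True)
      for batch in loader:
          ...  # training or evaluation step, depending on the current region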