ksuit.utils.data.data_container
===============================

.. py:module:: ksuit.utils.data.data_container


Classes
-------

.. autoapisummary::

   ksuit.utils.data.data_container.DataContainer


Module Contents
---------------

.. py:class:: DataContainer(datasets, num_workers = None, pin_memory = True)

   Container that holds datasets and provides utilities for datasets and data loading.

   :param datasets: A dictionary with datasets for the training run.
   :param num_workers: Number of data loading workers to use. If None, uses
       (#CPUs / #GPUs - 1) workers; the ``- 1`` keeps one CPU free for the main process.
       Defaults to None.
   :param pin_memory: Passed as ``pin_memory`` to ``torch.utils.data.DataLoader``.
       Defaults to True.


   .. py:attribute:: logger


   .. py:attribute:: datasets


   .. py:attribute:: num_workers
      :value: None


   .. py:attribute:: pin_memory
      :value: True


   .. py:method:: get_dataset(key = None, properties = None, max_size = None, shuffle_seed = None)

      Returns the dataset identified by ``key`` (or the first dataset if no key is
      provided), optionally wrapped in a :class:`ShuffleWrapper` (via ``shuffle_seed``),
      a :class:`SubsetWrapper` (via ``max_size``) and/or a :class:`PropertySubsetWrapper`
      (via ``properties``). The wrappers can be applied individually or combined; when
      all arguments are provided, the wrapping order is:
      Dataset -> ShuffleWrapper (optional) -> SubsetWrapper (optional) -> PropertySubsetWrapper (optional).

      :param key: Identifier of the dataset. If None, returns the first dataset of the
          ``DataContainer``. Defaults to None.
      :param properties: If defined, overrides the properties to load from the dataset.
          If not defined, uses the properties defined in the dataset itself, or all
          properties if none are defined.
      :param max_size: If defined, wraps the dataset in a :class:`SubsetWrapper` with
          the specified ``max_size``. Defaults to None (no wrapping).
      :param shuffle_seed: If defined, wraps the dataset in a :class:`ShuffleWrapper`
          with the specified ``shuffle_seed``. Defaults to None (no wrapping).
      :returns: Dataset of the DataContainer, optionally wrapped in dataset wrappers.
      :rtype: Dataset


   .. py:method:: get_main_sampler(train_dataset, shuffle = True)

      Creates the ``main_sampler`` for data loading.

      :param train_dataset: Dataset that is used for training.
      :param shuffle: Whether or not to randomly shuffle the sampled indices before
          every epoch. Defaults to True.

      :returns: Sampler to be used for sampling indices of the ``train_dataset``.
      :rtype: Sampler


   .. py:method:: get_data_loader(train_sampler, train_collator, batch_size, epochs, updates, samples, callback_samplers, start_epoch = None)

      Creates a ``torch.utils.data.DataLoader`` that can be used for efficient data
      loading by utilizing an ``InterleavedSampler`` built from the ``main_sampler``,
      the callback sampler configs, and the other arguments passed to this method.

      :param train_sampler: Sampler to be used for the main dataset (i.e., the training dataset).
      :param train_collator: Collator to collate samples from the main dataset (i.e., the training dataset).
      :param batch_size: Batch size to use for training.
      :param epochs: For how many epochs the training lasts.
      :param updates: For how many updates the training lasts.
      :param samples: For how many samples the training lasts.
      :param callback_samplers: List of SamplerIntervalConfigs to use for callback sampling.
      :param start_epoch: At which epoch to start (used for resuming training).
          Mutually exclusive with ``start_update`` and ``start_sample``.

      :returns: Object from which data can be loaded according to the specified configuration.
      :rtype: DataLoader
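
The wrapping order used by ``get_dataset`` (Dataset -> ShuffleWrapper -> SubsetWrapper) can be illustrated with a minimal, stdlib-only sketch. The ``ShuffleWrapper`` and ``SubsetWrapper`` classes below are simplified stand-ins written for this example, not ksuit's actual implementations, and ``PropertySubsetWrapper`` is omitted for brevity:

```python
import random


class ShuffleWrapper:
    """Presents a dataset's items in a seeded, deterministic random order."""

    def __init__(self, dataset, seed):
        self.dataset = dataset
        self.indices = list(range(len(dataset)))
        random.Random(seed).shuffle(self.indices)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]]


class SubsetWrapper:
    """Truncates a dataset to at most max_size items."""

    def __init__(self, dataset, max_size):
        self.dataset = dataset
        self.max_size = min(max_size, len(dataset))

    def __len__(self):
        return self.max_size

    def __getitem__(self, idx):
        if idx >= self.max_size:
            raise IndexError(idx)
        return self.dataset[idx]


def get_dataset(dataset, max_size=None, shuffle_seed=None):
    # Wrapping order from the docstring:
    # Dataset -> ShuffleWrapper (optional) -> SubsetWrapper (optional)
    if shuffle_seed is not None:
        dataset = ShuffleWrapper(dataset, seed=shuffle_seed)
    if max_size is not None:
        dataset = SubsetWrapper(dataset, max_size=max_size)
    return dataset


data = list(range(10))
# Shuffle first, then truncate: a random 3-item subset rather than a
# shuffled view of the first 3 items.
subset = get_dataset(data, max_size=3, shuffle_seed=0)
print(len(subset))  # 3
```

Because shuffling happens before truncation, ``max_size`` together with ``shuffle_seed`` yields a seeded random subset of the full dataset, not merely a reordering of its first ``max_size`` items.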