blocks.transformer_block_config

Classes

TransformerBlockConfig

Configuration for a transformer block.

Module Contents

class blocks.transformer_block_config.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)
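As a pydantic.BaseModel, the config is built from keyword arguments and validated on construction. A minimal usage sketch, assuming the module is importable as blocks.transformer_block_config and that the fields shown are enough to form a valid configuration (the concrete values are illustrative only):

```python
from pydantic import ValidationError

from blocks.transformer_block_config import TransformerBlockConfig

# Illustrative values; the exact set of required fields is an assumption.
config = TransformerBlockConfig(
    hidden_dim=768,
    num_heads=12,
    mlp_expansion_factor=4,
)

# Keyword arguments that cannot be coerced to the annotated types
# raise a pydantic ValidationError.
try:
    TransformerBlockConfig(hidden_dim="not-an-int", num_heads=12)
except ValidationError as err:
    print(err)
```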

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If mlp_hidden_dim is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
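The two fields give equivalent ways to size the MLP. A short sketch of the interplay; the concrete values and the set of required fields are assumptions:

```python
from blocks.transformer_block_config import TransformerBlockConfig

# Explicit MLP width ...
a = TransformerBlockConfig(hidden_dim=768, num_heads=12, mlp_hidden_dim=3072)

# ... or derived as hidden_dim * mlp_expansion_factor (768 * 4 = 3072).
b = TransformerBlockConfig(hidden_dim=768, num_heads=12, mlp_expansion_factor=4)
```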

drop_path: float = None

Probability of dropping the attention or MLP module (stochastic depth). Defaults to 0.0.

normalization_constructor: type

Constructor for the normalization layer.

attention_constructor: type

Constructor of the attention module. Defaults to DotProductAttention.

layerscale: float | None = None

Init scale value to scale layer activations. Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block turns into a Diffusion Transformer (DiT) block.
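For example, supplying condition_dim configures a DiT-style conditioned block. A hedged sketch; the values and the set of required fields are illustrative assumptions:

```python
from blocks.transformer_block_config import TransformerBlockConfig

# condition_dim enables conditioning (DiT-style block); 256 is a placeholder,
# e.g. the width of a timestep or class embedding.
dit_block = TransformerBlockConfig(
    hidden_dim=1024,
    num_heads=16,
    mlp_expansion_factor=4,
    condition_dim=256,
)
```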

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: emmi.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to "truncnormal002".

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()
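The body of set_mlp_hidden_dim() is not documented here. Below is a plausible standalone sketch of the rule it likely enforces, written as a pydantic model validator; the class and everything in it other than the method name are hypothetical stand-ins, not the library's implementation:

```python
from pydantic import BaseModel, model_validator

# Simplified stand-in, not the library's class: only the fields relevant to
# the assumed mlp_hidden_dim resolution rule are included.
class _BlockConfigSketch(BaseModel):
    hidden_dim: int
    mlp_hidden_dim: int | None = None
    mlp_expansion_factor: int | None = None

    @model_validator(mode="after")
    def set_mlp_hidden_dim(self):
        # Assumed behavior: fall back to hidden_dim * mlp_expansion_factor
        # when mlp_hidden_dim is not given; fail if neither is available.
        if self.mlp_hidden_dim is None:
            if self.mlp_expansion_factor is None:
                raise ValueError(
                    "Either mlp_hidden_dim or mlp_expansion_factor must be set."
                )
            self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor
        return self

assert _BlockConfigSketch(hidden_dim=768, mlp_expansion_factor=4).mlp_hidden_dim == 3072
```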