blocks.transformer_block_config

Classes

TransformerBlockConfig

Configuration for a transformer block.

Module Contents

class blocks.transformer_block_config.TransformerBlockConfig(/, **data)

Bases: pydantic.BaseModel

Configuration for a transformer block.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Parameters:

data (Any)
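As a pydantic.BaseModel, the config is built from keyword arguments and validated on construction. A minimal usage sketch, assuming the module is importable as blocks.transformer_block_config and that the fields shown are enough to form a valid configuration (the concrete values are illustrative only):

```python
from pydantic import ValidationError

from blocks.transformer_block_config import TransformerBlockConfig

# Illustrative values; the exact set of required fields is an assumption.
config = TransformerBlockConfig(
    hidden_dim=768,
    num_heads=12,
    mlp_expansion_factor=4,
)

# Keyword arguments that cannot be coerced to the annotated types
# raise a pydantic ValidationError.
try:
    TransformerBlockConfig(hidden_dim="not-an-int", num_heads=12)
except ValidationError as err:
    print(err)
```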

hidden_dim: int = None

Hidden dimension of the transformer block.

num_heads: int = None

Number of attention heads.

mlp_hidden_dim: int | None = None

Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None, an error is raised.

mlp_expansion_factor: int | None = None

Expansion factor for the MLP hidden dimension relative to the hidden dimension. If mlp_hidden_dim is not set, this factor is used to compute it as hidden_dim * mlp_expansion_factor.
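The two fields give equivalent ways to size the MLP. A short sketch of the interplay; the concrete values and the set of required fields are assumptions:

```python
from blocks.transformer_block_config import TransformerBlockConfig

# Explicit MLP width ...
a = TransformerBlockConfig(hidden_dim=768, num_heads=12, mlp_hidden_dim=3072)

# ... or derived as hidden_dim * mlp_expansion_factor (768 * 4 = 3072).
b = TransformerBlockConfig(hidden_dim=768, num_heads=12, mlp_expansion_factor=4)
```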

drop_path: float = None

Probability of dropping the attention or MLP module (stochastic depth). Defaults to 0.0.

normalization_constructor: type

Constructor for the normalization layer.

attention_constructor: type

Constructor of the attention module. Defaults to DotProductAttention.

layerscale: float | None = None

Init scale value to scale layer activations. Defaults to None.

condition_dim: int | None = None

Dimension of the conditioning vector. If None, no conditioning is applied. If provided, the transformer block turns into a Diffusion Transformer (DiT) block.
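For example, supplying condition_dim configures a DiT-style conditioned block. A hedged sketch; the values and the set of required fields are illustrative assumptions:

```python
from blocks.transformer_block_config import TransformerBlockConfig

# condition_dim enables conditioning (DiT-style block); 256 is a placeholder,
# e.g. the width of a timestep or class embedding.
dit_block = TransformerBlockConfig(
    hidden_dim=1024,
    num_heads=16,
    mlp_expansion_factor=4,
    condition_dim=256,
)
```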

bias: bool = None

Whether to use biases in norm/projections. Defaults to True.

eps: float = None

Epsilon value for the layer normalization. Defaults to 1e-6.

init_weights: emmi.types.InitWeightsMode = None

Initialization method for the weight matrices of the network. Defaults to "truncnormal002".

use_rope: bool = None

Whether to use Rotary Positional Embeddings (RoPE).

attention_arguments: dict

Additional arguments for the attention module that are only needed for a specific attention implementation.

set_mlp_hidden_dim()
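The body of set_mlp_hidden_dim() is not documented here. Below is a plausible standalone sketch of the rule it likely enforces, written as a pydantic model validator; the class and everything in it other than the method name are hypothetical stand-ins, not the library's implementation:

```python
from pydantic import BaseModel, model_validator

# Simplified stand-in, not the library's class: only the fields relevant to
# the assumed mlp_hidden_dim resolution rule are included.
class _BlockConfigSketch(BaseModel):
    hidden_dim: int
    mlp_hidden_dim: int | None = None
    mlp_expansion_factor: int | None = None

    @model_validator(mode="after")
    def set_mlp_hidden_dim(self):
        # Assumed behavior: fall back to hidden_dim * mlp_expansion_factor
        # when mlp_hidden_dim is not given; fail if neither is available.
        if self.mlp_hidden_dim is None:
            if self.mlp_expansion_factor is None:
                raise ValueError(
                    "Either mlp_hidden_dim or mlp_expansion_factor must be set."
                )
            self.mlp_hidden_dim = self.hidden_dim * self.mlp_expansion_factor
        return self

assert _BlockConfigSketch(hidden_dim=768, mlp_expansion_factor=4).mlp_hidden_dim == 3072
```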