blocks.transformer_block_config
===============================

.. py:module:: blocks.transformer_block_config


Classes
-------

.. autoapisummary::

   blocks.transformer_block_config.TransformerBlockConfig


Module Contents
---------------

.. py:class:: TransformerBlockConfig(/, **data)

   Bases: :py:obj:`pydantic.BaseModel`


   Configuration for a transformer block.

   Create a new model by parsing and validating input data from keyword arguments.

   Raises [`ValidationError`][pydantic_core.ValidationError] if the input data cannot be
   validated to form a valid model.

   `self` is explicitly positional-only to allow `self` as a field name.


   .. py:attribute:: hidden_dim
      :type: int
      :value: None

      Hidden dimension of the transformer block.


   .. py:attribute:: num_heads
      :type: int
      :value: None

      Number of attention heads.


   .. py:attribute:: mlp_hidden_dim
      :type: int | None
      :value: None

      Hidden dimension of the MLP layer. If set to None, mlp_hidden_dim is set to
      hidden_dim * mlp_expansion_factor in the TransformerConfig. If both are None,
      an error is raised.


   .. py:attribute:: mlp_expansion_factor
      :type: int | None
      :value: None

      Expansion factor for the MLP hidden dimension relative to the hidden dimension.
      If 'mlp_hidden_dim' is not set, this factor is used to compute it as
      hidden_dim * mlp_expansion_factor.


   .. py:attribute:: drop_path
      :type: float
      :value: None

      Probability to drop the attention or MLP module. Defaults to 0.0.


   .. py:attribute:: normalization_constructor
      :type: type

      Constructor for the normalization layer.


   .. py:attribute:: attention_constructor
      :type: type

      Constructor of the attention module. Defaults to DotProductAttention.


   .. py:attribute:: layerscale
      :type: float | None
      :value: None

      Initial scale value used to scale layer activations. Defaults to None.


   .. py:attribute:: condition_dim
      :type: int | None
      :value: None

      Dimension of the conditioning vector. If None, no conditioning is applied.
      If provided, the transformer block turns into a Diffusion Transformer (DiT) block.


   .. py:attribute:: bias
      :type: bool
      :value: None

      Whether to use biases in norm/projections. Defaults to True.


   .. py:attribute:: eps
      :type: float
      :value: None

      Epsilon value for the layer normalization. Defaults to 1e-6.


   .. py:attribute:: init_weights
      :type: emmi.types.InitWeightsMode
      :value: None

      Initialization method for the weight matrices of the network. Defaults to "truncnormal002".


   .. py:attribute:: use_rope
      :type: bool
      :value: None

      Whether to use Rotary Positional Embeddings (RoPE).


   .. py:attribute:: attention_arguments
      :type: dict

      Additional arguments for the attention module that are only needed for a specific
      attention implementation.


   .. py:method:: set_mlp_hidden_dim()
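
The snippet below is a minimal usage sketch, not part of the generated reference. It
instantiates the config with the fields documented above; the choice of
``torch.nn.LayerNorm`` as the normalization constructor and the concrete field values
are illustrative assumptions, and which fields are strictly required is determined by
the actual pydantic model.

.. code-block:: python

   import torch.nn as nn

   from blocks.transformer_block_config import TransformerBlockConfig

   # mlp_hidden_dim is left unset here; per the docs above it would then be
   # derived as hidden_dim * mlp_expansion_factor (see set_mlp_hidden_dim).
   # All values below are example choices, not library defaults.
   config = TransformerBlockConfig(
       hidden_dim=768,
       num_heads=12,
       mlp_expansion_factor=4,
       drop_path=0.0,
       normalization_constructor=nn.LayerNorm,  # any norm-layer constructor
       bias=True,
       eps=1e-6,
       use_rope=True,
   )

   # Pydantic validates the keyword arguments on construction and raises a
   # ValidationError if a field cannot be coerced to its annotated type.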