emmi.modules.blocks.perceiver_transformer_blockpair

Classes

PerceiverTransformerBlock

Block pair consisting of a perceiver block and a transformer block.

Module Contents

class emmi.modules.blocks.perceiver_transformer_blockpair.PerceiverTransformerBlock(hidden_dim, num_heads, transformer_attn_ctor=DotProductAttention, init_weights='truncnormal002', mlp_hidden_dim=None, drop_path=0.0)

Bases: torch.nn.Module


Instantiates a block pair that contains a perceiver block and a transformer block.

Parameters:
  • hidden_dim (int) – Hidden dimension of the transformer block.

  • num_heads (int) – Number of attention heads.

  • transformer_attn_ctor (type) – Class used to construct the attention module of the transformer block. Defaults to DotProductAttention.

  • init_weights (str) – Initialization method for the weight matrices of the network. Defaults to 'truncnormal002'.

  • mlp_hidden_dim (int | None) – Hidden dimension of the feed-forward MLP after the self-attention. Defaults to None.

  • drop_path (float) – Drop path (stochastic depth) rate. Defaults to 0.0.
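The drop_path argument presumably refers to drop path (stochastic depth), which randomly skips whole residual branches per sample during training. Where exactly this class applies it inside its two blocks is not documented here, so the following is only a minimal sketch of the standard technique; the helper name drop_path and its signature are hypothetical and not part of emmi.

```python
import torch


def drop_path(x: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Stochastic depth on a residual branch (hypothetical helper, not emmi's code)."""
    # Identity at inference time or when the rate is zero.
    if p == 0.0 or not training:
        return x
    keep_prob = 1.0 - p
    # One Bernoulli draw per sample, broadcast over all remaining dimensions.
    mask_shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    mask = torch.empty(mask_shape, dtype=x.dtype, device=x.device).bernoulli_(keep_prob)
    # Rescale surviving samples so the expected output equals the input.
    return x * mask / keep_prob


# Typical use inside a residual block:
#   x = x + drop_path(branch(x), p=drop_path_rate, training=self.training)
```

Scaling by 1 / keep_prob keeps the expected activation unchanged, so no correction is needed at inference time, when the branch is always kept.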

perceiver

Perceiver block of the pair.

transformer

Transformer block of the pair.

forward(q, kv, transformer_attn_kwargs=None)

Forward pass of the perceiver and transformer block pair.

Parameters:
  • q (torch.Tensor) – Query tokens with shape (batch_size, num_query_tokens, hidden_dim).

  • kv (torch.Tensor) – Key/value tokens with shape (batch_size, num_kv_tokens, hidden_dim).

  • transformer_attn_kwargs (dict[str, Any] | None) – Keyword arguments for the attention of the transformer block (such as the attention mask). Defaults to None.

Returns:

Result with shape (batch_size, num_query_tokens, hidden_dim).

Return type:

torch.Tensor
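To illustrate the shape contract of forward, here is a minimal, hypothetical PyTorch sketch of a perceiver + transformer block pair. It is not emmi's implementation. It assumes that the perceiver block runs first and that both blocks are pre-norm residual blocks, substitutes torch.nn.MultiheadAttention for DotProductAttention, defaults mlp_hidden_dim to 4 * hidden_dim, omits init_weights and drop_path, and forwards transformer_attn_kwargs (for example key_padding_mask) to the self-attention only. The class name PerceiverTransformerBlockSketch is made up.

```python
from typing import Any, Dict, Optional

import torch
from torch import nn


def _mlp(dim: int, hidden: int) -> nn.Sequential:
    # Pre-norm feed-forward MLP (assumed layout).
    return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim))


class PerceiverTransformerBlockSketch(nn.Module):
    """Hypothetical perceiver + transformer block pair (not emmi's code)."""

    def __init__(self, hidden_dim: int, num_heads: int, mlp_hidden_dim: Optional[int] = None):
        super().__init__()
        # Assumption: default MLP width of 4 * hidden_dim.
        mlp_hidden_dim = mlp_hidden_dim or 4 * hidden_dim
        # Perceiver block: query tokens cross-attend to the kv tokens.
        self.norm_q = nn.LayerNorm(hidden_dim)
        self.norm_kv = nn.LayerNorm(hidden_dim)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.cross_mlp = _mlp(hidden_dim, mlp_hidden_dim)
        # Transformer block: self-attention over the updated query tokens.
        self.norm_self = nn.LayerNorm(hidden_dim)
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.self_mlp = _mlp(hidden_dim, mlp_hidden_dim)

    def forward(
        self,
        q: torch.Tensor,
        kv: torch.Tensor,
        transformer_attn_kwargs: Optional[Dict[str, Any]] = None,
    ) -> torch.Tensor:
        attn_kwargs = transformer_attn_kwargs or {}
        # Perceiver: (B, Nq, D) queries attend to (B, Nkv, D) keys/values.
        kv_n = self.norm_kv(kv)
        x = q + self.cross_attn(self.norm_q(q), kv_n, kv_n, need_weights=False)[0]
        x = x + self.cross_mlp(x)
        # Transformer: self-attention among the Nq tokens; extra kwargs
        # (e.g. key_padding_mask) are passed only to this attention.
        h = self.norm_self(x)
        x = x + self.self_attn(h, h, h, need_weights=False, **attn_kwargs)[0]
        x = x + self.self_mlp(x)
        return x  # (B, Nq, D)


block = PerceiverTransformerBlockSketch(hidden_dim=32, num_heads=4)
q = torch.randn(2, 5, 32)    # (batch_size, num_query_tokens, hidden_dim)
kv = torch.randn(2, 11, 32)  # (batch_size, num_kv_tokens, hidden_dim)
out = block(q, kv)
print(out.shape)  # torch.Size([2, 5, 32])
```

Because the transformer block in this sketch only sees the num_query_tokens outputs of the perceiver block, the result keeps the query shape regardless of num_kv_tokens, matching the documented return shape.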