MultiheadAttention

class dragon.vm.torch.nn.MultiheadAttention(
  embed_dim,
  num_heads,
  dropout=0.0,
  bias=True,
  kdim=None,
  vdim=None
)

Apply the multihead attention. [Vaswani et al., 2017].
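For reference, the operation defined in the cited paper is scaled dot-product attention applied over num_heads parallel projections of the inputs:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\,W^{O},
\qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})

where the per-head dimension d_k equals embed_dim / num_heads.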

__init__

MultiheadAttention.__init__(
  embed_dim,
  num_heads,
  dropout=0.0,
  bias=True,
  kdim=None,
  vdim=None
)

Create a MultiheadAttention module.

Parameters:
  • embed_dim (int) – The dimension of input embeddings.
  • num_heads (int) – The number of parallel heads.
  • dropout (float, optional, default=0.0) – The probability to drop each attention weight.
  • bias (bool, optional, default=True) – Add a bias tensor to the projections or not.
  • kdim (int, optional) – The dimension of key embeddings.
  • vdim (int, optional) – The dimension of value embeddings.
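
A minimal usage sketch. The call convention is assumed to mirror torch.nn.MultiheadAttention (query, key and value tensors of shape (seq_len, batch, embed_dim)); the head count, embedding sizes and tensor shapes below are illustrative only:

from dragon.vm import torch

# Self-attention: 512-dimensional embeddings split across 8 heads.
attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, dropout=0.1)

x = torch.ones(10, 4, 512)  # (seq_len, batch, embed_dim), assumed layout
outputs = attn(x, x, x)     # query = key = value for self-attention

# Cross-attention where keys and values come from 256-dimensional embeddings.
cross_attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, kdim=256, vdim=256)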