MultiheadAttention
- class dragon.vm.torch.nn.MultiheadAttention(
 embed_dim,
 num_heads,
 dropout=0.0,
 bias=True,
 kdim=None,
 vdim=None
 )
- Apply the multihead attention. [Vaswani et al., 2017].
__init__
- MultiheadAttention.__init__(
 embed_dim,
 num_heads,
 dropout=0.0,
 bias=True,
 kdim=None,
 vdim=None
 )
- Create a MultiheadAttention module. A usage sketch follows the parameter list.

Parameters:
- embed_dim (int) – The dimension of input embeddings.
- num_heads (int) – The number of parallel heads.
- dropout (float, optional, default=0.) – The probability of setting attention weights to zero.
- bias (bool, optional, default=True) – Whether to add a bias tensor to the output.
- kdim (int, optional) – The dimension of the key embedding.
- vdim (int, optional) – The dimension of the value embedding.
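
Below is a minimal usage sketch. It assumes Dragon's module mirrors the familiar torch.nn.MultiheadAttention call convention, i.e. the module is called with (query, key, value) tensors shaped (seq_len, batch_size, embed_dim), and that dragon.vm.torch.rand is available for creating test tensors; the shapes and values here are illustrative only.

```python
from dragon.vm import torch
from dragon.vm.torch import nn

# 8 parallel heads over a 512-d embedding:
# each head attends over 512 / 8 = 64 dimensions.
attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, dropout=0.1)

# Assumed call convention: (seq_len, batch_size, embed_dim) inputs,
# mirroring the standard torch.nn.MultiheadAttention interface.
query = torch.rand(10, 4, 512)  # (target_len, batch, embed_dim)
key = torch.rand(20, 4, 512)    # (source_len, batch, embed_dim)
value = torch.rand(20, 4, 512)  # (source_len, batch, embed_dim)

outputs = attn(query, key, value)
```

Per the kdim and vdim parameters above, the key and value inputs may use embedding widths different from embed_dim when those arguments are given.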
 
 
