MultiheadAttention
class dragon.vm.torch.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, kdim=None, vdim=None)

Apply the multihead attention. [Vaswani et al., 2017].
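For reference, the cited paper defines each head as scaled dot-product attention over projected queries, keys, and values:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V, \qquad \mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O$$

with $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$, where $d_k$ is the per-head key dimension (typically ``embed_dim / num_heads``).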
__init__
MultiheadAttention.__init__(embed_dim, num_heads, dropout=0.0, bias=True, kdim=None, vdim=None)

Create a MultiheadAttention module.

- Parameters:
- embed_dim (int) – The dimension of input embeddings.
- num_heads (int) – The number of parallel attention heads.
- dropout (float, optional, default=0.) – The probability of setting an attention weight to zero.
- bias (bool, optional, default=True) – Whether to add a bias to the output projection.
- kdim (int, optional) – The dimension of key embeddings.
- vdim (int, optional) – The dimension of value embeddings.
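A minimal usage sketch follows, assuming dragon.vm.torch mirrors the torch.nn.MultiheadAttention calling convention with (seq_len, batch, embed_dim) inputs and provides a torch-compatible randn; the exact forward signature and return value of this module may differ.

```python
import dragon.vm.torch as torch
from dragon.vm.torch import nn

# Hyper-parameters chosen for the sketch.
embed_dim, num_heads = 64, 8
attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=0.1)

# Inputs in the common (seq_len, batch, embed_dim) layout;
# randn is assumed to be available, as in torch.
query = torch.randn(10, 2, embed_dim)  # target sequence
key = torch.randn(16, 2, embed_dim)    # source sequence
value = torch.randn(16, 2, embed_dim)  # source sequence

# The call is assumed to follow the torch API; depending on the
# implementation it may return the output alone or (output, weights).
outputs = attn(query, key, value)
```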