TransformerDecoderLayer

class dragon.vm.torch.nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu', norm_first=False)

Layer for a standard transformer decoder. [Vaswani et al., 2017]
Examples:
from dragon.vm import torch

memory = torch.ones(4, 2, 8)  # (source length, batch size, d_model)
tgt = torch.ones(5, 2, 8)     # (target length, batch size, d_model)
decoder_layer = torch.nn.TransformerDecoderLayer(d_model=8, nhead=2)
out = decoder_layer(tgt, memory)
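The output has the same shape as tgt (here, (5, 2, 8)): the layer applies self-attention over tgt, cross-attention against memory, and a feedforward network, each followed by dropout and a residual connection.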
__init__

TransformerDecoderLayer.__init__(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu', norm_first=False)

Create a TransformerDecoderLayer.

Parameters:
- d_model (int) – The dimension of features.
- nhead (int) – The number of parallel heads.
- dim_feedforward (int, optional, default=2048) – The dimension of the feedforward network.
- dropout (float, optional, default=0.1) – The dropout ratio.
- activation (str, optional, default='relu') – The activation function.
- norm_first (bool, optional, default=False) – Apply layer normalization before the attention and feedforward blocks (see the sketch below).
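For illustration, a minimal sketch of constructing a pre-norm variant with non-default arguments. The import convention follows the example above; the 'gelu' value and the tensor shapes are assumptions chosen for this example, not guaranteed by the signature:

from dragon.vm import torch

# Pre-norm layer: LayerNorm runs before the attention and feedforward blocks.
decoder_layer = torch.nn.TransformerDecoderLayer(
    d_model=8,
    nhead=2,              # number of attention heads; must divide d_model evenly
    dim_feedforward=32,   # smaller than the 2048 default for this toy example
    dropout=0.2,
    activation='gelu',    # assumption: 'gelu' is accepted besides the 'relu' default
    norm_first=True,      # pre-norm ordering instead of the default post-norm
)
tgt = torch.ones(5, 2, 8)     # (target length, batch size, d_model)
memory = torch.ones(4, 2, 8)  # (source length, batch size, d_model)
out = decoder_layer(tgt, memory)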