TransformerDecoderLayer¶
- class dragon.vm.torch.nn.TransformerDecoderLayer(
 d_model,
 nhead,
 dim_feedforward=2048,
 dropout=0.1,
 activation='relu',
 norm_first=False
 )[source]¶
- Layer for a standard transformer decoder. [Vaswani et al., 2017]
- Examples:

  memory = torch.ones(4, 2, 8)
  tgt = torch.ones(5, 2, 8)
  decoder_layer = torch.nn.TransformerDecoderLayer(d_model=8, nhead=2)
  out = decoder_layer(tgt, memory)
__init__¶
- TransformerDecoderLayer.__init__(
 d_model,
 nhead,
 dim_feedforward=2048,
 dropout=0.1,
 activation='relu',
 norm_first=False
 )[source]¶
- Create a TransformerDecoderLayer.
- Parameters:
- d_model (int) – The dimension of features.
- nhead (int) – The number of parallel heads.
- dim_feedforward (int, optional, default=2048) – The dimension of the feedforward network.
- dropout (float, optional, default=0.1) – The dropout ratio.
- activation (str, optional, default='relu') – The activation function.
- norm_first (bool, optional, default=False) – Apply layer normalization before the attention and feedforward blocks.
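A minimal usage sketch, as an illustration rather than part of the original docs: it constructs a pre-norm layer with non-default settings using only the arguments documented above. It assumes the standard import dragon.vm.torch as torch alias, PyTorch's sequence-first (seq_len, batch, d_model) tensor layout, and that 'gelu' is an accepted activation string, mirroring PyTorch's values. Tensor shapes follow the example at the top of the page.

  import dragon.vm.torch as torch

  # Sketch: pre-norm decoder layer with non-default settings.
  # Passing 'gelu' is an assumption mirroring PyTorch; 'relu' is the default.
  decoder_layer = torch.nn.TransformerDecoderLayer(
      d_model=8,
      nhead=2,                 # 8 features split across 2 attention heads
      dim_feedforward=32,      # inner width of the feedforward network
      dropout=0.2,
      activation='gelu',
      norm_first=True,         # layer norm before attention and feedforward
  )
  memory = torch.ones(4, 2, 8)  # assumed (source_len, batch, d_model)
  tgt = torch.ones(5, 2, 8)     # assumed (target_len, batch, d_model)
  out = decoder_layer(tgt, memory)  # output has the same shape as tgt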
 
 
