axis=-1,

Apply group normalization [Wu & He, 2018].

The normalization is defined as:

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \]
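The statistics in the formula above can be spelled out as follows. This is a sketch that assumes the usual convention from [Wu & He, 2018]: for each sample, \(S_{g}\) (a symbol introduced here for illustration) denotes the set of elements belonging to the channels of group \(g\) across all spatial positions, and the variance is the biased estimate.

\[\mathrm{E}[x] = \frac{1}{|S_{g}|} \sum_{i \in S_{g}} x_{i}, \qquad \mathrm{Var}[x] = \frac{1}{|S_{g}|} \sum_{i \in S_{g}} \left(x_{i} - \mathrm{E}[x]\right)^{2}\]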

It reduces to InstanceNorm if group is 0, or to LayerNorm if group is 1.
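The following NumPy sketch illustrates the computation and the two special cases. It is a plain reference written for this page, not the library's fused kernel. The function name `group_norm_reference` is hypothetical, and the sketch assumes that axis 0 is the batch axis, that `group` counts groups, and that `group=0` means one group per channel.

```python
import numpy as np


def group_norm_reference(x, gamma, beta, axis=-1, group=32, epsilon=1e-5):
    """Reference sketch of group normalization (hypothetical helper)."""
    axis = axis % x.ndim
    channels = x.shape[axis]
    # Assumption: group == 0 selects one group per channel (InstanceNorm).
    num_groups = channels if group == 0 else group
    if channels % num_groups != 0:
        raise ValueError("channels must be divisible by the number of groups")
    # Move the channel axis next to the batch axis: (N, C, *spatial).
    xt = np.moveaxis(x, axis, 1)
    n = xt.shape[0]
    # Gather everything that shares statistics into the last dimension.
    xg = xt.reshape(n, num_groups, -1)
    mean = xg.mean(axis=2, keepdims=True)
    var = xg.var(axis=2, keepdims=True)  # biased variance
    yt = ((xg - mean) / np.sqrt(var + epsilon)).reshape(xt.shape)
    # Apply the per-channel scale (gamma) and shift (beta).
    affine_shape = [1, channels] + [1] * (xt.ndim - 2)
    yt = yt * gamma.reshape(affine_shape) + beta.reshape(affine_shape)
    # Restore the original layout.
    return np.moveaxis(yt, 1, axis)


x = np.random.default_rng(0).standard_normal((2, 4, 4, 8))
gamma, beta = np.ones(8), np.zeros(8)
y_layer = group_norm_reference(x, gamma, beta, group=1)     # LayerNorm case
y_instance = group_norm_reference(x, gamma, beta, group=0)  # InstanceNorm case
```

With `group=1` every non-batch element of a sample shares one mean and variance, and with `group=0` each channel is normalized over its spatial positions only.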

Note that exactly 3 inputs are required, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

  • inputs (Sequence[dragon.Tensor]) – The tensor x, gamma and beta.
  • axis (int, optional, default=-1) – The channel axis.
  • group (int, optional, default=32) – The number of groups.
  • epsilon (float, optional, default=1e-5) – The value of \(\epsilon\).

dragon.Tensor – The output tensor.