Apply the batch normalization [Ioffe & Szegedy, 2015].

The normalization is defined as:

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta \]
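The formula can be checked numerically with the minimal pure-Python sketch below. The helper `batch_norm_channel` is hypothetical and not part of Dragon. It normalizes the values of a single channel across the batch, and it assumes the biased (population) variance for \(\mathrm{Var}[x]\).

```python
import math


def batch_norm_channel(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Hypothetical helper: normalize one channel's values with batch stats."""
    n = len(values)
    mean = sum(values) / n
    # Biased (population) variance over the batch (an assumption of this sketch).
    var = sum((v - mean) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    # Scale by gamma and shift by beta after normalizing.
    return [(v - mean) * inv_std * gamma + beta for v in values]


print(batch_norm_channel([1.0, 2.0, 3.0]))
```

With `gamma=1` and `beta=0` the output has zero mean and (up to `eps`) unit variance. Any other `gamma` and `beta` rescale and shift that normalized result.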

The moving averages of the statistics are calculated as:

\[x_{moving} \leftarrow momentum * x_{moving} + (1 - momentum) * x_{stat} \]
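The update rule is an exponential moving average. In the sketch below, which uses a hypothetical helper named `update_moving`, the batch statistic stays at 1.0 on every step, so after \(k\) steps the moving value started from 0 equals \(1 - momentum^k\).

```python
def update_moving(moving, stat, momentum=0.9):
    """Hypothetical helper: one step of the moving-average update."""
    return momentum * moving + (1.0 - momentum) * stat


# Start from 0 and observe a batch statistic of 1.0 three times.
moving_mean = 0.0
for _ in range(3):
    moving_mean = update_moving(moving_mean, 1.0)
print(moving_mean)  # approximately 0.271, i.e. 1 - 0.9**3
```

A larger `momentum` makes the moving statistics change more slowly and smooths out noise from individual batches.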

Note that exactly 5 inputs are required, i.e., this operator is implemented as the fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

  • inputs (Sequence[dragon.Tensor]) – The tensor x, gamma, beta, mean and var.
  • axis (int, optional, default=-1) – The channel axis.
  • momentum (float, optional, default=0.9) – The momentum of moving average.
  • eps (float, optional, default=1e-5) – The small value added to the variance for numerical stability.
  • use_stats (int, optional, default=-1) – Whether to use the global (moving) stats instead of the batch stats.
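The sketch below shows how the five inputs fit together in a fused forward pass on a channels-last `(N, C)` input, which corresponds to `axis=-1`. It is not Dragon's implementation. The function name `fused_batch_norm` is hypothetical. The real operator takes an integer `use_stats`, but the sketch reduces it to an assumed boolean: `True` normalizes with the stored `mean` and `var`, and `False` normalizes with the batch statistics and returns updated moving averages. The default `-1` is not modeled. The sketch also assumes the biased variance for both the normalization and the moving-variance update.

```python
import math


def fused_batch_norm(x, gamma, beta, mean, var,
                     momentum=0.9, eps=1e-5, use_stats=False):
    """Hypothetical fused batch norm on a channels-last (N, C) input.

    x is a list of N rows of C values; gamma, beta, mean and var have length C.
    Returns (y, new_mean, new_var).
    """
    n, c = len(x), len(x[0])
    if use_stats:
        # Normalize with the stored (moving) statistics; leave them unchanged.
        stat_mean, stat_var = list(mean), list(var)
        new_mean, new_var = list(mean), list(var)
    else:
        # Compute per-channel batch statistics (biased variance).
        stat_mean = [sum(row[j] for row in x) / n for j in range(c)]
        stat_var = [sum((row[j] - stat_mean[j]) ** 2 for row in x) / n
                    for j in range(c)]
        # Fold the batch statistics into the moving averages.
        new_mean = [momentum * m + (1.0 - momentum) * s
                    for m, s in zip(mean, stat_mean)]
        new_var = [momentum * v + (1.0 - momentum) * s
                   for v, s in zip(var, stat_var)]
    y = [[(row[j] - stat_mean[j]) / math.sqrt(stat_var[j] + eps)
          * gamma[j] + beta[j] for j in range(c)] for row in x]
    return y, new_mean, new_var


y, m, v = fused_batch_norm([[1.0, 10.0], [3.0, 30.0]],
                           [1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

In this example the batch means are `[2, 20]` and the batch variances are `[1, 100]`, so both channels normalize to roughly `[-1, 1]`. The moving statistics move one tenth of the way toward the batch values.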

dragon.Tensor – The output tensor.