# Normalization

```python
dragon.operators.norm.BatchNorm(
    inputs,
    axis=-1,
    momentum=0.9,
    eps=1e-05,
    use_stats=-1,
    **kwargs
)
```

Batch Normalization. [Ioffe & Szegedy, 2015].

The number of inputs is required to be 5, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:

- **inputs** (sequence of Tensor) – The inputs, representing [x, mean, var, gamma, beta].
- **axis** (int, optional) – The channel axis.
- **momentum** (float, optional, default=0.9) – The momentum of the moving average.
- **eps** (float, optional, default=1e-5) – The epsilon added for numerical stability.
- **use_stats** (int, optional, default=-1) – Whether to use the global (moving) statistics.

Returns: Tensor – The output tensor, calculated as:

$$\mu_{B} = \frac{1}{m} \sum_{i=1}^{m} x_{i}$$

$$\sigma_{B}^{2} = \frac{1}{m} \sum_{i=1}^{m} (x_{i} - \mu_{B})^{2}$$

$$\hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}}$$

$$y_{i} = \gamma \hat{x}_{i} + \beta$$

The moving average of mean/var is updated as:

$$x_{moving} \leftarrow \text{momentum} \cdot x_{moving} + (1 - \text{momentum}) \cdot x_{stat}$$
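The formulas above can be sketched in NumPy. This is an illustration of the math only, not Dragon's fused implementation; it assumes a channels-last layout where statistics are reduced over the batch axis:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Per-channel statistics over the batch axis (channels-last layout assumed).
    mu = x.mean(axis=0)                    # mu_B
    var = x.var(axis=0)                    # sigma_B^2 (biased, as in the formula)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize
    return gamma * x_hat + beta            # scale and shift

def update_moving(moving, stat, momentum=0.9):
    # x_moving <- momentum * x_moving + (1 - momentum) * x_stat
    return momentum * moving + (1 - momentum) * stat

rng = np.random.default_rng(0)
x_bn = rng.standard_normal((8, 4)).astype(np.float32)
y_bn = batch_norm(x_bn, np.ones(4, np.float32), np.zeros(4, np.float32))
# Each channel of y_bn now has roughly zero mean and unit variance.
```

In the fused operator, `mean` and `var` are passed in as the second and third inputs rather than recomputed by the caller; the sketch recomputes them inline only to show the arithmetic.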
```python
dragon.operators.norm.GroupNorm(
    inputs,
    group=32,
    axis=-1,
    eps=1e-05,
    **kwargs
)
```

Group Normalization. [Wu & He, 2018].

It degenerates to InstanceNorm if group is 0, or to LayerNorm if group is 1.

The number of inputs is required to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:

- **inputs** (sequence of Tensor) – The inputs, representing [x, gamma, beta].
- **group** (int, optional, default=32) – The number of groups.
- **axis** (int, optional) – The channel axis.
- **eps** (float, optional, default=1e-5) – The epsilon added for numerical stability.

Returns: Tensor – The output tensor, calculated as:

$$\mu_{G} = \frac{1}{m} \sum_{i=1}^{m} x_{i}$$

$$\sigma_{G}^{2} = \frac{1}{m} \sum_{i=1}^{m} (x_{i} - \mu_{G})^{2}$$

$$\hat{x}_{i} = \frac{x_{i} - \mu_{G}}{\sqrt{\sigma_{G}^{2} + \epsilon}}$$

$$y_{i} = \gamma \hat{x}_{i} + \beta$$
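A minimal NumPy sketch of the group statistics, assuming a channels-last `(N, C)` layout where each sample's channels are split into `group` groups (spatial axes omitted for brevity; this is not Dragon's fused implementation):

```python
import numpy as np

def group_norm(x, gamma, beta, group, eps=1e-5):
    # x: (N, C), channels last; split channels into `group` groups per sample.
    n, c = x.shape
    xg = x.reshape(n, group, c // group)
    mu = xg.mean(axis=2, keepdims=True)   # mu_G per (sample, group)
    var = xg.var(axis=2, keepdims=True)   # sigma_G^2 per (sample, group)
    x_hat = ((xg - mu) / np.sqrt(var + eps)).reshape(n, c)
    return gamma * x_hat + beta

rng = np.random.default_rng(1)
x_gn = rng.standard_normal((2, 8)).astype(np.float32)
ones, zeros = np.ones(8, np.float32), np.zeros(8, np.float32)
y_gn = group_norm(x_gn, ones, zeros, group=2)
# With group=1, all channels share one group: the LayerNorm degenerate case.
y_gn1 = group_norm(x_gn, ones, zeros, group=1)
```

The `group=1` call illustrates the degenerate LayerNorm case noted above: every sample is normalized over all of its channels at once.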
```python
dragon.operators.norm.LayerNorm(inputs, axis=-1, eps=1e-05, **kwargs)
```

Layer Normalization. [Ba et al., 2016].

The number of inputs is required to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:

- **inputs** (sequence of Tensor) – The inputs, representing [x, gamma, beta].
- **axis** (int, optional) – The channel axis.
- **eps** (float, optional, default=1e-5) – The epsilon added for numerical stability.

Returns: Tensor – The output tensor.
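LayerNorm normalizes each sample over its channel axis. A small NumPy sketch of that reduction (illustrative only, not Dragon's fused implementation):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Statistics over the channel axis, computed independently per sample.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x_ln = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
y_ln = layer_norm(x_ln, np.ones(4, np.float32), np.zeros(4, np.float32))
# mu = 2.5, var = 1.25 -> y_ln is roughly [-1.342, -0.447, 0.447, 1.342]
```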
```python
dragon.operators.norm.InstanceNorm(inputs, axis=-1, eps=1e-05, **kwargs)
```

Instance Normalization. [Ulyanov et al., 2016].

The number of inputs is required to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:

- **inputs** (sequence of Tensor) – The inputs, representing [x, gamma, beta].
- **axis** (int, optional) – The channel axis.
- **eps** (float, optional, default=1e-5) – The epsilon added for numerical stability.

Returns: Tensor – The output tensor.
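InstanceNorm normalizes each (sample, channel) pair over the spatial axes. A NumPy sketch under an assumed channels-last `(N, H, W, C)` layout (illustrative only, not Dragon's fused implementation):

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    # x: (N, H, W, C); statistics over the spatial axes (H, W),
    # computed independently for every (sample, channel) pair.
    mu = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(2)
x_in = rng.standard_normal((2, 4, 4, 3)).astype(np.float32)
y_in = instance_norm(x_in, np.ones(3, np.float32), np.zeros(3, np.float32))
# Every (sample, channel) slice of y_in now has roughly zero mean.
```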
```python
dragon.operators.norm.L2Norm(
    inputs,
    axis=0,
    num_axes=-1,
    eps=1e-05,
    mode='SUM',
    **kwargs
)
```

L2 Normalization. [Liu et al., 2015].

Type Constraints: (float16, float32, float64)

Parameters:

- **inputs** (Tensor) – The input tensor x.
- **axis** (int, optional) – The start axis of the stats region; can be negative.
- **num_axes** (int, optional, default=-1) – The number of axes in the stats region.
- **eps** (float, optional, default=1e-5) – The epsilon added for numerical stability.
- **mode** ({'SUM', 'MEAN'}, optional) – The mode for computing the normalizer.

Returns: Tensor – The output tensor.
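A NumPy sketch of L2 normalization over a single stats axis. The exact semantics of `mode` are an assumption here: `'SUM'` is taken to divide by the root of the sum of squares, and `'MEAN'` by the root of the mean of squares (this is an illustration, not Dragon's implementation):

```python
import numpy as np

def l2_norm(x, axis=0, eps=1e-5, mode='SUM'):
    # Assumed semantics: normalize by the L2 norm of the stats region.
    sq = np.sum(np.square(x), axis=axis, keepdims=True)
    if mode == 'MEAN':
        sq = sq / x.shape[axis]   # mean of squares instead of sum
    return x / np.sqrt(sq + eps)

x_l2 = np.array([[3.0, 4.0]], dtype=np.float32)
y_l2 = l2_norm(x_l2, axis=1)   # norm = sqrt(9 + 16) = 5 -> roughly [0.6, 0.8]
```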