Normalization

dragon.operators.norm.BatchNorm(
   inputs,
   axis=-1,
   momentum=0.9,
   eps=1e-05,
   use_stats=-1,
   **kwargs
)

Batch Normalization. [Ioffe & Szegedy, 2015].

The number of inputs is enforced to be 5, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:
  • inputs (sequence of Tensor) – The inputs, representing [x, mean, var, gamma, beta].
  • axis (int, optional, default=-1) – The channel axis.
  • momentum (float, optional, default=0.9) – The momentum of the moving average.
  • eps (float, optional, default=1e-5) – The epsilon value added for numerical stability.
  • use_stats (int, optional, default=-1) – Whether to use the global statistics.
Returns:

The output tensor, calculated as:

\(\mu_{B} = \frac{1}{m} \sum_{i=1}^{m} x_{i}\)
\(\sigma_{B}^{2} = \frac{1}{m} \sum_{i=1}^{m} (x_{i} - \mu_{B})^{2}\)
\(\hat{x}_{i} = \frac{x_{i} - \mu_{B}}{\sqrt{\sigma_{B}^{2} + \epsilon}}\)
\(y_{i} = \gamma \hat{x}_{i} + \beta\)

The moving average of mean/var, calculated as:

\(x_{moving} \leftarrow Momentum \cdot x_{moving} + (1 - Momentum) \cdot x_{stat}\)

Return type:

Tensor
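
For reference, a minimal NumPy sketch of the training-mode computation above; the helper name batch_norm_ref, the NCHW layout (axis=1), and passing the running statistics explicitly are illustrative assumptions, not the Dragon API:

import numpy as np

def batch_norm_ref(x, gamma, beta, running_mean, running_var,
                   momentum=0.9, eps=1e-5, axis=1):
    # Batch statistics: reduce over every axis except the channel axis.
    reduce_axes = tuple(i for i in range(x.ndim) if i != axis)
    mu = x.mean(axis=reduce_axes, keepdims=True)
    var = x.var(axis=reduce_axes, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Broadcast gamma/beta along the channel axis.
    shape = [1] * x.ndim
    shape[axis] = -1
    y = gamma.reshape(shape) * x_hat + beta.reshape(shape)
    # Moving average of mean/var, as in the formula above.
    running_mean = momentum * running_mean + (1. - momentum) * mu.reshape(-1)
    running_var = momentum * running_var + (1. - momentum) * var.reshape(-1)
    return y, running_mean, running_var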

dragon.operators.norm.GroupNorm(
   inputs,
   group=32,
   axis=-1,
   eps=1e-05,
   **kwargs
)

Group Normalization. [Wu & He, 2018].

It degenerates to InstanceNorm if group is 0, or to LayerNorm if group is 1.

The number of inputs is enforced to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:
  • inputs (sequence of Tensor) – The inputs, representing [x, gamma, beta].
  • group (int, optional, default=32) – The number of groups to split the channels into.
  • axis (int, optional, default=-1) – The channel axis.
  • eps (float, optional, default=1e-5) – The epsilon value added for numerical stability.
Returns:

The output tensor, calculated as:

\(\mu_{G} = \frac{1}{m} \sum_{i=1}^{m} x_{i}\)
\(\sigma_{G}^{2} = \frac{1}{m} \sum_{i=1}^{m} (x_{i} - \mu_{G})^{2}\)
\(\hat{x}_{i} = \frac{x_{i} - \mu_{G}}{\sqrt{\sigma_{G}^{2} + \epsilon}}\)
\(y_{i} = \gamma \hat{x}_{i} + \beta\)

Return type:

Tensor
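
A minimal NumPy sketch of the grouped statistics above, assuming an NCHW input (channel axis 1) and that group denotes the number of groups; the helper name group_norm_ref is illustrative:

import numpy as np

def group_norm_ref(x, gamma, beta, group=32, eps=1e-5):
    n, c, h, w = x.shape
    g = c if group == 0 else group  # group=0 degenerates to InstanceNorm
    xg = x.reshape(n, g, c // g, h, w)
    mu = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    x_hat = ((xg - mu) / np.sqrt(var + eps)).reshape(n, c, h, w)
    return gamma.reshape(1, c, 1, 1) * x_hat + beta.reshape(1, c, 1, 1)

Setting group to 1 in this sketch reproduces the LayerNorm case noted above.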

dragon.operators.norm.LayerNorm(inputs, axis=-1, eps=1e-05, **kwargs)

Layer Normalization. [Ba et al., 2016].

The number of inputs is enforced to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:
  • inputs (sequence of Tensor) – The inputs, representing [x, gamma, beta].
  • axis (int, optional, default=-1) – The channel axis.
  • eps (float, optional, default=1e-5) – The epsilon value added for numerical stability.
Returns:

The output tensor.

Return type:

Tensor
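
A minimal NumPy sketch; it assumes statistics are computed along the channel axis only, which the docstring above does not spell out, and the helper name layer_norm_ref is illustrative:

import numpy as np

def layer_norm_ref(x, gamma, beta, axis=-1, eps=1e-5):
    # Per-sample statistics along the channel axis (an assumption).
    mu = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta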

dragon.operators.norm.InstanceNorm(inputs, axis=-1, eps=1e-05, **kwargs)

Instance Normalization. [Ulyanov et al., 2016].

The number of inputs is enforced to be 3, i.e., this operator is implemented as a fused version.

However, you can still fix gamma and beta by disabling their gradients directly.

Type Constraints: (float16, float32)

Parameters:
  • inputs (sequence of Tensor) – The inputs, representing [x, gamma, beta].
  • axis (int, optional, default=-1) – The channel axis.
  • eps (float, optional, default=1e-5) – The epsilon value added for numerical stability.
Returns:

The output tensor.

Return type:

Tensor
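
A minimal NumPy sketch, assuming an NCHW input so that statistics are taken over the spatial dimensions of each (sample, channel) pair; the helper name instance_norm_ref is illustrative:

import numpy as np

def instance_norm_ref(x, gamma, beta, eps=1e-5):
    # Per-sample, per-channel statistics over H and W.
    c = x.shape[1]
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, c, 1, 1) * x_hat + beta.reshape(1, c, 1, 1)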

dragon.operators.norm.L2Norm(
   inputs,
   axis=0,
   num_axes=-1,
   eps=1e-05,
   mode='SUM',
   **kwargs
)

L2 Normalization. [Liu et al., 2015].

Type Constraints: (float16, float32, float64)

Parameters:
  • inputs (Tensor) – The input tensor x.
  • axis (int, optional, default=0) – The start axis of the stats region; can be negative.
  • num_axes (int, optional, default=-1) – The number of axes of the stats region.
  • eps (float, optional, default=1e-5) – The epsilon value added for numerical stability.
  • mode ({'SUM', 'MEAN'}, optional) – The mode for computing the normalizer.
Returns:

The output tensor.

Return type:

Tensor
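
A minimal NumPy sketch of the two normalizer modes over the stats region [axis, axis + num_axes); treating num_axes=-1 as "until the last axis" and placing eps inside the square root are assumptions, and the helper name l2_norm_ref is illustrative:

import numpy as np

def l2_norm_ref(x, axis=0, num_axes=-1, eps=1e-5, mode='SUM'):
    # Resolve the stats region [axis, axis + num_axes).
    axis = axis % x.ndim
    end = x.ndim if num_axes < 0 else axis + num_axes
    reduce_axes = tuple(range(axis, end))
    sq = np.sum(np.square(x), axis=reduce_axes, keepdims=True)
    if mode == 'MEAN':
        sq /= np.prod([x.shape[i] for i in reduce_axes])
    return x / np.sqrt(sq + eps)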