Adam

class dragon.optimizers.Adam(
  base_lr=0.001,
  beta1=0.9,
  beta2=0.999,
  eps=1e-08,
  **kwargs
)

The optimizer that applies the Adam algorithm. [Kingma & Ba, 2014].

The Adam update is defined as:

\[\text{Adam}(g) = -\frac{\text{lr} * m_{t}}{\sqrt{v_{t}} + \epsilon} \\ \quad \\ \text{where}\quad \begin{cases} m_{t} = \beta_{1} * m_{t-1} + (1 - \beta_{1}) * g \\ v_{t} = \beta_{2} * v_{t-1} + (1 - \beta_{2}) * g^{2} \end{cases} \]
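
As a concrete reading of the update above, here is a minimal NumPy sketch of one Adam step; the names param, grad, m, and v mirror the symbols in the formula and are illustrative only, not part of the dragon API (bias correction is omitted, matching the displayed update):

import numpy as np

def adam_step(param, grad, m, v, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # First moment: m_t = beta1 * m_{t-1} + (1 - beta1) * g
    m = beta1 * m + (1 - beta1) * grad
    # Second moment: v_t = beta2 * v_{t-1} + (1 - beta2) * g^2
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Apply the update: param += Adam(g) = -lr * m_t / (sqrt(v_t) + eps)
    param = param - lr * m / (np.sqrt(v) + eps)
    return param, m, v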

__init__

Adam.__init__(
  base_lr=0.001,
  beta1=0.9,
  beta2=0.999,
  eps=1e-08,
  **kwargs
)

Create an Adam updater.

Parameters:
  • base_lr (float, optional, default=0.001) – The initial value for \(\text{lr}\).
  • beta1 (float, optional, default=0.9) – The initial value for \(\beta_{1}\).
  • beta2 (float, optional, default=0.999) – The initial value for \(\beta_{2}\).
  • eps (float, optional, default=1e-08) – The initial value for \(\epsilon\).
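
For reference, a minimal construction sketch using only the defaults documented above (the variable name optimizer is illustrative):

import dragon

# Create an Adam updater with the documented defaults.
optimizer = dragon.optimizers.Adam(base_lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8)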

Methods

apply_gradients

Optimizer.apply_gradients(
  values_and_grads,
  lr_mult=None,
  decay_mult=None
)

Apply the gradients to the values.

Parameters:
  • values_and_grads (Sequence[Sequence[dragon.Tensor]]) – The values and their gradients.
  • lr_mult (number, optional) – The multiplier to the learning rate.
  • decay_mult (number, optional) – The multiplier to the weight decay.
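
A hedged usage sketch: only the call signature above is taken from this page, and value and grad are assumed to be dragon.Tensor objects obtained elsewhere (e.g. a weight and the gradient computed for it):

optimizer = dragon.optimizers.Adam(base_lr=0.001)

# Each entry pairs a value tensor with its gradient tensor.
optimizer.apply_gradients(
  values_and_grads=[(value, grad)],
  lr_mult=1.0,     # optional scaling applied to the learning rate
  decay_mult=1.0,  # optional scaling applied to the weight decay
)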