Nesterov

class dragon.optimizers.Nesterov(
  lr=0.01,
  momentum=0.9,
  **kwargs
)[source]

The optimizer to apply the NesterovSGD algorithm [Sutskever et al., 2013].

The NesterovSGD update is defined as:

\[
\text{NesterovSGD}(g) = -((1 + \text{momentum}) * m_{t} - \text{momentum} * m_{t-1}) \\
\text{where} \quad m_{t} = \text{momentum} * m_{t-1} + \text{lr} * g
\]
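To make the two formulas concrete, the sketch below replays one step in NumPy; the nesterov_update helper and the toy values are illustrative only and not part of the dragon API.

import numpy as np

lr, momentum = 0.01, 0.9  # the documented defaults

def nesterov_update(m_prev, g):
    # m_t = momentum * m_{t-1} + lr * g
    m_t = momentum * m_prev + lr * g
    # NesterovSGD(g) = -((1 + momentum) * m_t - momentum * m_{t-1})
    update = -((1 + momentum) * m_t - momentum * m_prev)
    return m_t, update

w = np.array([1.0, -2.0])   # toy parameter
g = np.array([0.5, 0.5])    # toy gradient of the loss w.r.t. w
m = np.zeros_like(w)        # momentum buffer starts at zero
m, delta = nesterov_update(m, g)
w += delta                  # w is now [0.9905, -2.0095]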

__init__

Nesterov.__init__(
  lr=0.01,
  momentum=0.9,
  **kwargs
)[source]

Create a Nesterov optimizer.

Parameters:
  • lr (float, optional, default=0.01) – The initial value for \(\text{lr}\).
  • momentum (float, optional, default=0.9) – The initial value for \(\text{momentum}\).
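A minimal construction example, using only the arguments documented above:

import dragon

# Create the optimizer with the documented defaults.
optimizer = dragon.optimizers.Nesterov(lr=0.01, momentum=0.9)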

Methods

apply_gradients

Optimizer.apply_gradients(grads_and_vars)[source]

Apply the gradients to variables.

Parameters:
  • grads_and_vars (Sequence[Sequence[dragon.Tensor]]) – The sequence of update pairs.
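A hedged usage sketch follows. It assumes grads and weights are matching sequences of dragon.Tensor computed elsewhere (hypothetical placeholder names, not part of this page), and it assumes each update pair is ordered as (gradient, variable); neither assumption is confirmed by this page.

import dragon

optimizer = dragon.optimizers.Nesterov(lr=0.01, momentum=0.9)

# grads and weights are hypothetical placeholders for tensors
# produced elsewhere; the (gradient, variable) pair ordering is
# an assumption.
grads_and_vars = list(zip(grads, weights))
optimizer.apply_gradients(grads_and_vars)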