SGD

class dragon.optimizers.SGD(
  lr=0.01,
  momentum=0.9,
  nesterov=False,
  **kwargs
)[source]

The optimizer that applies the SGD algorithm.

The following SGD algorithms are supported:

VanillaSGD, whose update is defined as:

\[\text{VanillaSGD}(g) = \text{lr} * g \]

MomentumSGD [Polyak, 1964], whose update is defined as:

\[\text{MomentumSGD}(g) = \text{lr} * m_{t} \\ \quad \\ \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g \]

NesterovSGD [Sutskever et al., 2013], whose update is defined as:

\[\text{NesterovSGD}(g) = \text{lr} * (\text{momentum} * m_{t} + g) \\ \quad \\ \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g \]
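The sketch below restates the three update rules in plain NumPy for illustration only; it is not Dragon's internal implementation, and all function and variable names are invented for the example.

import numpy as np

def vanilla_sgd(g, lr=0.01):
    # VanillaSGD(g) = lr * g
    return lr * g

def momentum_sgd(g, m, lr=0.01, momentum=0.9):
    # m_t = momentum * m_{t-1} + g
    m = momentum * m + g
    # MomentumSGD(g) = lr * m_t
    return lr * m, m

def nesterov_sgd(g, m, lr=0.01, momentum=0.9):
    # m_t = momentum * m_{t-1} + g
    m = momentum * m + g
    # NesterovSGD(g) = lr * (momentum * m_t + g)
    return lr * (momentum * m + g), m

# One momentum step for a single weight vector.
w, m = np.array([1.0, -2.0]), np.zeros(2)
g = np.array([0.5, 0.1])        # gradient of the loss w.r.t. w
update, m = momentum_sgd(g, m)
w -= update                     # descend along the update direction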

__init__

SGD.__init__(
  lr=0.01,
  momentum=0.9,
  nesterov=False,
  **kwargs
)[source]

Create an SGD optimizer.

Parameters:
  • lr (float, optional, default=0.01) – The initial value for \(\text{lr}\).
  • momentum (float, optional, default=0.9) – The initial value for \(\text{momentum}\).
  • nesterov (bool, optional, default=False) – Whether to switch to the NesterovSGD algorithm.
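
For example, the optimizer can be constructed as follows; this is a minimal sketch that only exercises the constructor arguments documented above.

import dragon

# MomentumSGD with the default hyper-parameters.
optimizer = dragon.optimizers.SGD(lr=0.01, momentum=0.9)

# Switch to NesterovSGD by passing nesterov=True.
nesterov_optimizer = dragon.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)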

Methods

apply_gradients

Optimizer.apply_gradients(grads_and_vars)[source]

Apply the gradients to the variables.

Parameters:
  • grads_and_vars (Sequence[Sequence[dragon.Tensor]]) – The sequence of (gradient, variable) pairs to update.
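
A minimal usage sketch, assuming grad and weight are dragon.Tensor objects produced elsewhere (hypothetical names) and that each inner pair is ordered as (gradient, variable), as the argument name suggests:

import dragon

optimizer = dragon.optimizers.SGD(lr=0.01, momentum=0.9)

# Each entry pairs a gradient with the variable it updates.
optimizer.apply_gradients(grads_and_vars=[(grad, weight)])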