SGD¶
class dragon.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=False, **kwargs)[source]¶
The optimizer to apply the SGD algorithm.
The following SGD algorithms are supported:
VanillaSGD, whose update is defined as:
\[\text{VanillaSGD}(g) = \text{lr} * g\]
MomentumSGD [Polyak, 1964], whose update is defined as:
\[\text{MomentumSGD}(g) = \text{lr} * m_{t} \quad \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g\]
NesterovSGD [Sutskever et al., 2013], whose update is defined as:
\[\text{NesterovSGD}(g) = \text{lr} * (\text{momentum} * m_{t} + g) \quad \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g\]
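To make the relation between the three variants concrete, here is a minimal NumPy sketch of a single update step. It only illustrates the formulas above and is not Dragon's internal implementation; `g` is the gradient and `m` is the momentum buffer carried between steps:

```python
import numpy as np

def sgd_step(g, m, lr=0.01, momentum=0.9, nesterov=False):
    """Return (update, new_m) for one step; m is the running momentum buffer."""
    if momentum == 0.0:
        return lr * g, m                   # VanillaSGD(g) = lr * g
    m = momentum * m + g                   # m_t = momentum * m_{t-1} + g
    if nesterov:
        return lr * (momentum * m + g), m  # NesterovSGD(g) = lr * (momentum * m_t + g)
    return lr * m, m                       # MomentumSGD(g) = lr * m_t

# Example: the parameter is decreased by the returned update.
w = np.array([1.0, -2.0])
g = np.array([0.1, 0.3])
m = np.zeros_like(w)
update, m = sgd_step(g, m, lr=0.01, momentum=0.9)
w -= update
```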
__init__¶
SGD.__init__(lr=0.01, momentum=0.9, nesterov=False, **kwargs)[source]¶
Create an SGD updater.
- Parameters:
- lr (float, optional, default=0.01) – The initial value for \(\text{lr}\).
- momentum (float, optional, default=0.9) – The initial value for \(\text{momentum}\).
- nesterov (bool, optional, default=False) – True to switch to the NesterovSGD optimizer.
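For example, the documented arguments can be combined to select each variant. This is a brief sketch using only the arguments listed above; whether momentum=0 is the intended way to obtain VanillaSGD is an assumption (mathematically it reduces the update to lr * g):

```python
import dragon

# MomentumSGD [Polyak, 1964]: the default behavior.
momentum_sgd = dragon.optimizers.SGD(lr=0.01, momentum=0.9)

# NesterovSGD [Sutskever et al., 2013]: enabled via the ``nesterov`` flag.
nesterov_sgd = dragon.optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)

# Assumed: disabling momentum reduces the update to VanillaSGD (lr * g).
vanilla_sgd = dragon.optimizers.SGD(lr=0.01, momentum=0)
```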
Methods¶
apply_gradients¶
Optimizer.apply_gradients(grads_and_vars)[source]¶
Apply the gradients to the variables.
- Parameters:
- grads_and_vars (Sequence[Sequence[dragon.Tensor]]) – The sequence of update pairs.
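A minimal usage sketch is given below. It assumes `grad` and `var` are existing dragon.Tensor objects (a computed gradient and the parameter it updates) and that each pair is ordered (gradient, variable); neither detail is stated on this page.

```python
import dragon

# Assumed: ``grad`` and ``var`` are dragon.Tensor objects produced elsewhere,
# and each pair is ordered (gradient, variable).
optimizer = dragon.optimizers.SGD(lr=0.01, momentum=0.9)

grads_and_vars = [(grad, var)]  # Sequence[Sequence[dragon.Tensor]]
optimizer.apply_gradients(grads_and_vars)
```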