AdamW
class dragon.optimizers.AdamW(
    lr=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=0.01,
    **kwargs
)

The optimizer that applies the AdamW algorithm [Loshchilov & Hutter, 2017].
The AdamW update is defined as:

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \qquad \hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1} - \text{lr}\left(\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_{t-1}\right)
\end{aligned}
$$

where $g_t$ is the gradient, $\lambda$ is weight_decay, and $\beta_1$, $\beta_2$, $\epsilon$ correspond to the beta1, beta2 and eps arguments.
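For concreteness, the following is a NumPy sketch of a single update step. It only transcribes the formulas above for illustration; it is not Dragon's implementation, and the helper name adamw_step is ours.

import numpy as np

def adamw_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update on parameter p given its gradient g (illustrative only)."""
    m = beta1 * m + (1.0 - beta1) * g        # 1st-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * g * g    # 2nd-moment (uncentered variance) estimate
    m_hat = m / (1.0 - beta1 ** t)           # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    # Decoupled weight decay: the penalty acts on p directly instead of being
    # folded into the gradient, which is what distinguishes AdamW from Adam + L2.
    p = p - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * p)
    return p, m, v

# One step on a toy parameter vector.
p = np.array([0.5, -0.3])
m, v = np.zeros_like(p), np.zeros_like(p)
g = np.array([0.1, 0.2])
p, m, v = adamw_step(p, g, m, v, t=1)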
__init__

AdamW.__init__(
    lr=0.001,
    beta1=0.9,
    beta2=0.999,
    eps=1e-08,
    weight_decay=0.01,
    **kwargs
)

Create an AdamW updater (a construction sketch follows the parameter list).

- Parameters:
- lr (float, optional, default=0.001) – The initial learning rate.
- beta1 (float, optional, default=0.9) – The exponential decay rate for the 1st-moment estimates.
- beta2 (float, optional, default=0.999) – The exponential decay rate for the 2nd-moment estimates.
- eps (float, optional, default=1e-8) – The term added to the denominator to improve numerical stability.
- weight_decay (float, optional, default=0.01) – The decoupled weight decay coefficient.
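A minimal construction sketch, using the signature documented above; the hyperparameter values are arbitrary, and omitted arguments keep their defaults.

import dragon

# Create an AdamW updater; arguments mirror the documented signature.
optimizer = dragon.optimizers.AdamW(
    lr=1e-3,            # initial learning rate
    beta1=0.9,          # decay rate for the 1st-moment estimate
    beta2=0.999,        # decay rate for the 2nd-moment estimate
    eps=1e-8,           # numerical-stability term
    weight_decay=0.01,  # decoupled weight decay coefficient
)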
Methods
apply_gradients

Optimizer.apply_gradients(grads_and_vars)

Apply the gradients to the variables (a usage sketch follows the parameter list).

- Parameters:
- grads_and_vars (Sequence[Sequence[dragon.Tensor]]) – The sequence of (gradient, variable) pairs to apply.
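A hedged usage sketch: it assumes weights and grads are sequences of dragon.Tensor objects produced elsewhere (for example, by your training step's backward pass). Those two names are placeholders rather than Dragon API, and the (gradient, variable) ordering follows the argument name grads_and_vars.

import dragon

optimizer = dragon.optimizers.AdamW(lr=1e-3, weight_decay=0.01)

# Placeholders: obtain these from your own model and backward pass.
weights = ...  # sequence of dragon.Tensor parameters to update
grads = ...    # sequence of gradient tensors, one per parameter, same order

# Pair each gradient with the variable it updates, then apply one AdamW step.
optimizer.apply_gradients(list(zip(grads, weights)))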