class dragon.optimizers.AdamW(
lr=0.001,
beta1=0.9,
beta2=0.999,
eps=1e-08,
weight_decay=0.01,
**kwargs
)[source]

The optimizer to apply the AdamW algorithm [Loshchilov & Hutter, 2017].

The AdamW update is defined as:

$\text{AdamW}(g, p) = \text{lr} * (\frac{\text{correction} * m_{t}} {\sqrt{v_{t}} + \epsilon} + \lambda p) \\ \quad \\ \text{where}\quad \begin{cases} \text{correction} = \sqrt{1 - \beta_{2}^{t}} / (1 - \beta_{1}^{t}) \\ m_{t} = \beta_{1} * m_{t-1} + (1 - \beta_{1}) * g \\ v_{t} = \beta_{2} * v_{t-1} + (1 - \beta_{2}) * g^{2} \\ \end{cases}$
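
As a minimal, framework-free sketch of a single step of this update (the function name and explicit state passing are illustrative only, not part of the Dragon API):

```python
import numpy as np

def adamw_step(p, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for parameter p with gradient g at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * g                # running mean of gradients
    v = beta2 * v + (1 - beta2) * g ** 2           # running mean of squared gradients
    correction = np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    update = lr * (correction * m / (np.sqrt(v) + eps) + weight_decay * p)
    return p - update, m, v
```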

## __init__

AdamW.__init__(
lr=0.001,
beta1=0.9,
beta2=0.999,
eps=1e-08,
weight_decay=0.01,
**kwargs
)[source]

Create an AdamW updater.

Parameters:
• lr (float, optional, default=0.001) – The initial value for $\text{lr}$.
• beta1 (float, optional, default=0.9) – The initial value for $\beta_{1}$.
• beta2 (float, optional, default=0.999) – The initial value for $\beta_{2}$.
• eps (float, optional, default=1e-8) – The initial value for $\epsilon$.
• weight_decay (float, optional, default=0.01) – The initial value for $\lambda$.
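
For example, the optimizer can be constructed with the documented defaults, or with selected hyperparameters overridden (a minimal sketch; the surrounding training code is assumed and not shown):

```python
import dragon

# Default hyperparameters: lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.01
optimizer = dragon.optimizers.AdamW()

# Or override selected values explicitly
optimizer = dragon.optimizers.AdamW(lr=0.0005, weight_decay=0.05)
```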

## Methods

Optimizer.apply_gradients(grads_and_vars)[source]

Apply the gradients to the given variables.
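
A hypothetical training step might pass the gradients and variables as pairs; the (gradient, variable) ordering below is an assumption based on the TensorFlow-style convention this API resembles, not something this page confirms:

```python
# `grads` and `variables` stand in for gradients and variables produced elsewhere;
# the (gradient, variable) pairing order is an assumption, not confirmed by this page.
optimizer.apply_gradients(zip(grads, variables))
```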