Adam
class dragon.vm.torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False, **kwargs)
The optimizer to apply the Adam algorithm [Kingma & Ba, 2014].
The Adam update is defined as:
\[\text{Adam}(g) = \text{lr} * (\frac{\text{correction}* m_{t}} {\sqrt{v_{t}} + \epsilon}) \\ \quad \\ \text{where}\quad \begin{cases} \text{correction} = \sqrt{1 - \beta_{2}^{t}} / (1 - \beta_{1}^{t}) \\ m_{t} = \beta_{1} * m_{t-1} + (1 - \beta_{1}) * g \\ v_{t} = \beta_{2} * v_{t-1} + (1 - \beta_{2}) * g^{2} \end{cases} \]
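To make the formula concrete, here is a minimal pure-Python sketch of the update for a single scalar parameter. The helper name `adam_update` and the toy objective are hypothetical and chosen only for illustration; subtracting the update from the parameter is the usual gradient-descent convention and is assumed here, not taken from the library source.

```python
import math


def adam_update(g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return (update, m_t, v_t) for a scalar gradient g at step t >= 1."""
    m = beta1 * m + (1 - beta1) * g          # first-moment estimate m_t
    v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate v_t
    correction = math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    update = lr * correction * m / (math.sqrt(v) + eps)
    return update, m, v


# Hypothetical toy problem: minimize w**2, whose gradient is 2 * w.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    update, m, v = adam_update(2.0 * w, m, v, t)
    w -= update  # assumption: the update is subtracted from the parameter
```

With zero-initialized moments, the first update has a magnitude very close to lr whatever the scale of the gradient, because the correction term cancels the factors (1 - β1) and sqrt(1 - β2).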
__init__
Adam.__init__(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False, **kwargs)
Create an Adam optimizer.
- Parameters:
- params (Sequence[dragon.vm.torch.nn.Parameter]) – The parameters to optimize.
- lr (float, optional, default=0.001) – The initial value of \(\text{lr}\).
- betas (Tuple[float, float], optional, default=(0.9, 0.999)) – The initial values of \(\beta_{1}\) and \(\beta_{2}\).
- eps (float, optional, default=1e-8) – The initial value of \(\epsilon\).
- weight_decay (float, optional, default=0) – The L2 penalty factor applied to the weights.
- amsgrad (bool, optional, default=False) – True to switch to the AMSGrad optimizer.
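The effect of weight_decay and amsgrad can be sketched on a scalar parameter. This is an illustration under stated assumptions, not the library's implementation: the L2 penalty is assumed to be folded into the gradient as g + weight_decay * w, and AMSGrad is assumed to keep the running maximum of v_t for the denominator. The names `adam_step` and `new_state` are hypothetical.

```python
import math


def new_state():
    """Fresh per-parameter optimizer state (hypothetical layout)."""
    return {"t": 0, "m": 0.0, "v": 0.0, "v_max": 0.0}


def adam_step(w, g, state, lr=0.001, betas=(0.9, 0.999), eps=1e-8,
              weight_decay=0.0, amsgrad=False):
    """Return the updated scalar parameter after one Adam step."""
    beta1, beta2 = betas
    state["t"] += 1
    t = state["t"]
    g = g + weight_decay * w  # assumption: L2 penalty added to the gradient
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    v = state["v"]
    if amsgrad:
        # assumption: AMSGrad divides by the largest v_t seen so far
        state["v_max"] = max(state["v_max"], v)
        v = state["v_max"]
    correction = math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    return w - lr * correction * state["m"] / (math.sqrt(v) + eps)


# One large gradient followed by small ones: AMSGrad keeps the large v_t.
w_plain, s_plain = 1.0, new_state()
w_ams, s_ams = 1.0, new_state()
for g in [10.0, 0.1, 0.1]:
    w_plain = adam_step(w_plain, g, s_plain)
    w_ams = adam_step(w_ams, g, s_ams, amsgrad=True)
```

In this run the AMSGrad variant takes slightly smaller steps after the first iteration, since its denominator never shrinks below the value reached at the large gradient.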
Methods
add_param_group
Optimizer.add_param_group(param_group)
Add a new param group into the optimizer.
param_group is a dict containing the defaults:
    # A group that defines ``lr`` and ``weight_decay``
    param_group = {'params': [], 'lr': 0.01, 'weight_decay': 0.0001}
- Parameters:
- param_group (dict) – The param group to add.
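One plausible reading of how a group's options interact with the optimizer defaults is sketched below in plain Python. The helper `resolve_param_group` is hypothetical, and the fallback rule (options missing from the group are taken from the constructor defaults) is an assumption borrowed from the common PyTorch convention rather than something stated above.

```python
# Defaults as they appear in the Adam constructor signature.
defaults = {
    "lr": 0.001,
    "betas": (0.9, 0.999),
    "eps": 1e-8,
    "weight_decay": 0,
    "amsgrad": False,
}


def resolve_param_group(param_group, defaults):
    """Return a copy of param_group with missing options filled from defaults."""
    if "params" not in param_group:
        raise ValueError("param_group must contain a 'params' entry")
    resolved = dict(defaults)      # start from the optimizer-wide defaults
    resolved.update(param_group)   # group-specific values take precedence
    return resolved


group = resolve_param_group(
    {"params": [], "lr": 0.01, "weight_decay": 0.0001}, defaults
)
```

Under this assumption, the group keeps its own lr and weight_decay while inheriting betas, eps and amsgrad.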
step
Optimizer.step()
Update all parameter groups using gradients.
Call this method after a backward pass:
    x = torch.ones(1, 3, requires_grad=True)
    y = x + 1
    y.backward()
    optimizer.step()
sum_grad
Optimizer.sum_grad()
Sum the gradients of all parameters.
Call this method after each backward pass:
    x = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)
    for epoch in range(2):
        for step in range(3):
            y = x + 1
            y.backward()
            optimizer.sum_grad()
        optimizer.step()
    print(x)  # 0.4
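The value 0.4 in the example can be checked by hand. The sketch below replays the arithmetic in plain Python, assuming that sum_grad() adds each new gradient to a running sum and that step() applies the plain SGD rule to that sum and then starts a fresh one; the variable names are illustrative only.

```python
x = 1.0
lr = 0.1
for epoch in range(2):
    grad_sum = 0.0                # assumption: step() leaves an empty accumulator
    for step in range(3):
        grad = 1.0                # d(x + 1)/dx is 1 for every backward pass
        grad_sum += grad          # what sum_grad() is assumed to do
    x -= lr * grad_sum            # SGD step on the accumulated gradient
print(round(x, 6))  # 0.4
```

Each outer iteration accumulates a gradient of 3 and moves x by 0.3, so two iterations take x from 1.0 to 0.4.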
zero_grad
Optimizer.zero_grad(set_to_none=False)
Set the gradients of all parameters to zero.
This method is usually unnecessary, because the gradients are overwritten in the next computation.
However, if some gradients are not computed at every iteration, remember to set them to none before calling step(...):
    m1 = torch.nn.Linear(3, 3)
    m2 = torch.nn.Linear(3, 3)
    x = torch.ones(1, 3, requires_grad=True)
    for i in range(10):
        x = m1(x)
        if i in (2, 4, 6):
            x += m2(x)
        optimizer.zero_grad(set_to_none=True)
        x.backward()
        optimizer.step()
- Parameters:
- set_to_none (bool, optional, default=False) – Whether to remove the gradients instead of zeroing.
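Why removing a gradient differs from zeroing it can be seen with a scalar Adam sketch. It assumes that the optimizer leaves a parameter untouched when its gradient is None, which is the PyTorch convention and is not confirmed here for this library; the helpers `adam_step` and `run` are hypothetical.

```python
import math


def adam_step(w, g, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam step; a gradient of None skips the parameter."""
    if g is None:
        return w  # assumption: parameters without a gradient are left untouched
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    t = state["t"]
    correction = math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    return w - lr * correction * state["m"] / (math.sqrt(state["v"]) + eps)


def run(second_grad):
    """Take one step with gradient 1.0, then one with second_grad."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    w_first = adam_step(1.0, 1.0, state)
    w_second = adam_step(w_first, second_grad, state)
    return w_first, w_second


w1, w_zeroed = run(0.0)   # gradient zeroed in the second iteration
_, w_none = run(None)     # gradient removed in the second iteration
```

With a zero gradient the parameter still moves, because the first-moment estimate from the previous step is nonzero; with a removed gradient it stays where it was.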