Adam
class dragon.vm.torch.optim.Adam(
  params,
  lr=0.001,
  betas=(0.9, 0.999),
  eps=1e-08,
  weight_decay=0,
  amsgrad=False,
  **kwargs
)
The optimizer that applies the Adam algorithm [Kingma & Ba, 2014].

The Adam update is defined as:

\[\text{Adam}(g) = \text{lr} * \frac{\text{correction} * m_{t}}{\sqrt{v_{t}} + \epsilon} \\ \quad \\ \text{where}\quad \begin{cases} \text{correction} = \sqrt{1 - \beta_{2}^{t}} / (1 - \beta_{1}^{t}) \\ m_{t} = \beta_{1} * m_{t-1} + (1 - \beta_{1}) * g \\ v_{t} = \beta_{2} * v_{t-1} + (1 - \beta_{2}) * g^{2} \end{cases}\]
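As a reading aid only, not Dragon's internal implementation, here is a minimal NumPy sketch of the update above; the names g, m, v and t mirror the symbols in the formula:

    import numpy as np

    def adam_update(g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # Moving averages of the gradient and of its square.
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        # Bias correction, as defined in the formula above.
        correction = np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
        # The value applied to the parameter by this update.
        delta = lr * correction * m / (np.sqrt(v) + eps)
        return delta, m, v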
__init__
Adam.__init__(
  params,
  lr=0.001,
  betas=(0.9, 0.999),
  eps=1e-08,
  weight_decay=0,
  amsgrad=False,
  **kwargs
)
Create an Adam optimizer. A construction sketch follows the parameter list.

Parameters:
- params (Sequence[dragon.vm.torch.nn.Parameter]) – The parameters to optimize.
- lr (float, required) – The initial value for \(\text{lr}\).
- betas (Tuple[float, float], optional, default=(0.9, 0.999)) – The initial values for \(\beta_{1}\) and \(\beta_{2}\).
- eps (float, optional, default=1e-8) – The initial value for \(\epsilon\).
- weight_decay (float, optional, default=0) – The L2 penalty factor applied to the weights.
- amsgrad (bool, optional, default=False) – True to switch to the AMSGrad optimizer.
 
 
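A minimal construction sketch, assuming torch refers to dragon.vm.torch; the Linear module is only a stand-in for any module with parameters:

    model = torch.nn.Linear(3, 3)  # placeholder module to optimize
    optimizer = torch.optim.Adam(
        list(model.parameters()),
        lr=0.001,
        betas=(0.9, 0.999),
        eps=1e-8,
        weight_decay=0,
        amsgrad=False,
    )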
Methods
add_param_group
Optimizer.add_param_group(param_group)

Add a new param group into the optimizer.

param_group is a dict containing the defaults (a usage sketch follows the parameter below):

    # A group that defines ``lr`` and ``weight_decay``
    param_group = {'params': [], 'lr': 0.01, 'weight_decay': 0.0001}

Parameters:
- param_group (dict) – The param group to add.
 
 
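For illustration, a hedged usage sketch of add_param_group; backbone and head are placeholder modules, not part of this API, and the group dict mirrors the shape shown above:

    backbone = torch.nn.Linear(3, 3)
    head = torch.nn.Linear(3, 3)
    optimizer = torch.optim.Adam(
        list(backbone.parameters()), lr=0.001, weight_decay=0.0001)
    # Optimize the extra module with its own hyper-parameters.
    optimizer.add_param_group(
        {'params': list(head.parameters()), 'lr': 0.0001, 'weight_decay': 0.0001})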
step
Optimizer.step()

Update all parameter groups using gradients.

Call this method after a backward pass:

    x = torch.ones(1, 3, requires_grad=True)
    y = x + 1
    y.backward()
    optimizer.step()
sum_grad
Optimizer.sum_grad()

Sum the gradients of all parameters.

Call this method after each backward pass:

    x = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)
    for epoch in range(2):
        for step in range(3):
            y = x + 1
            y.backward()
            optimizer.sum_grad()
        optimizer.step()
    print(x)  # 0.4
zero_grad
Optimizer.zero_grad(set_to_none=False)

Set the gradients of all parameters to zero.

This method is usually not necessary, as the gradients are overwritten in the next computation. However, if some gradients are not computed every time, remember to set them to none before step(...):

    m1 = torch.nn.Linear(3, 3)
    m2 = torch.nn.Linear(3, 3)
    x = torch.ones(1, 3, requires_grad=True)
    for i in range(10):
        x = m1(x)
        if i in (2, 4, 6):
            x += m2(x)
        optimizer.zero_grad(set_to_none=True)
        x.backward()
        optimizer.step()

Parameters:
- set_to_none (bool, optional, default=False) – Whether to remove the gradients instead of zeroing.
 
 
