Optimizer¶
- class dragon.vm.torch.optim.Optimizer(
 params,
 defaults,
 **kwargs
 )[source]¶
- The base class of optimizers.
- Inherit this class to design a new optimizer:

```python
class MyOptimizer(torch.optim.Optimizer):
    def __init__(self, params, hp1, hp2):
        defaults = dict(hp1=hp1, hp2=hp2)
        super(MyOptimizer, self).__init__(params, defaults)
```
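- For instance, the subclass above can be instantiated like any built-in optimizer. A minimal usage sketch (the import path and the param_groups attribute are assumed from the PyTorch-style API; hp1/hp2 are the placeholder hyper-parameters from the snippet):

```python
from dragon.vm import torch  # assumed import path for the dragon.vm.torch namespace

class MyOptimizer(torch.optim.Optimizer):
    def __init__(self, params, hp1, hp2):
        defaults = dict(hp1=hp1, hp2=hp2)
        super(MyOptimizer, self).__init__(params, defaults)

# Build it from a module's parameters, like any built-in optimizer.
m = torch.nn.Linear(3, 3)
optimizer = MyOptimizer(list(m.parameters()), hp1=0.05, hp2=0.9)

# The defaults are merged into every param group.
for group in optimizer.param_groups:
    print(group['hp1'], group['hp2'])  # 0.05 0.9
```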
__init__¶
- Optimizer.__init__(
 params,
 defaults,
 **kwargs
 )[source]¶
- Create an Optimizer. A construction sketch is given after the parameter list.
- Parameters:
- params (Sequence[dragon.vm.torch.nn.Parameter]) – The parameters to optimize.
- defaults (dict) – The pre-defined default hyper-parameters.
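
- In practice, params usually comes from a module and defaults is assembled by the concrete subclass (e.g. SGD packs lr, momentum, weight_decay into it). A minimal construction sketch, assuming the SGD subclass from this module:

```python
from dragon.vm import torch  # assumed import path for the dragon.vm.torch namespace

m = torch.nn.Linear(3, 3)
# ``SGD.__init__`` builds its ``defaults`` dict from the keyword arguments
# and forwards it to ``Optimizer.__init__`` together with ``params``.
optimizer = torch.optim.SGD(list(m.parameters()), lr=0.01, momentum=0.9)
```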
 
 
Methods¶
add_param_group¶
- Optimizer.add_param_group(param_group)[source]¶
- Add a new param group into the optimizer; a grouping sketch is given after the parameter list.
- param_group is a dict containing the defaults:

```python
# A group defining ``lr`` and ``weight_decay``.
param_group = {'params': [], 'lr': 0.01, 'weight_decay': 0.0001}
```

- Parameters:
- param_group (dict) – The param group to add.
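
- For example, parameters can be split into groups with different hyper-parameters. A minimal sketch, assuming an SGD optimizer; keys omitted from the new group are expected to fall back to the defaults:

```python
from dragon.vm import torch  # assumed import path for the dragon.vm.torch namespace

m1 = torch.nn.Linear(3, 3)
m2 = torch.nn.Linear(3, 3)

# The first group is created by the constructor.
optimizer = torch.optim.SGD(list(m1.parameters()), lr=0.01, weight_decay=0.0001)

# Add a second group that overrides the learning rate for ``m2``.
optimizer.add_param_group({'params': list(m2.parameters()), 'lr': 0.1})
```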
 
 
step¶
- Optimizer.step()[source]¶
- Update all parameter groups using the current gradients.
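- A typical update cycle pairs step(...) with a preceding backward pass. A minimal training-loop sketch, assuming an SGD optimizer and an arbitrary scalar objective chosen only for illustration:

```python
from dragon.vm import torch  # assumed import path for the dragon.vm.torch namespace

m = torch.nn.Linear(3, 3)
optimizer = torch.optim.SGD(list(m.parameters()), lr=0.1)

x = torch.ones(2, 3)
for _ in range(5):
    y = m(x)
    loss = (y * y).sum()  # an arbitrary scalar objective
    loss.backward()       # compute gradients for every parameter
    optimizer.step()      # apply one update from the current gradients
```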
sum_grad¶
- Optimizer.sum_grad()[source]¶
- Sum the gradients of all parameters.
- Call this method after each backward pass:

```python
x = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
for epoch in range(2):
    for step in range(3):
        y = x + 1
        y.backward()
        optimizer.sum_grad()
    optimizer.step()
print(x)  # 0.4
```
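- As the example shows, the accumulated gradients are applied in a single update: each epoch sums three unit gradients, so one step(...) moves x by lr * 3 = 0.3, giving 0.4 after two epochs. This makes sum_grad a convenient building block for gradient accumulation over several micro-batches.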
zero_grad¶
- Optimizer.zero_grad(set_to_none=False)[source]¶
- Set the gradients of all parameters to zero.
- This method is usually not necessary, as the gradients will be overwritten by the next computation.
- However, if some gradients are not computed every time, remember to set them to none before step(...):

```python
m1 = torch.nn.Linear(3, 3)
m2 = torch.nn.Linear(3, 3)
x = torch.ones(1, 3, requires_grad=True)
# ``optimizer`` is assumed to have been created over the parameters of m1 and m2.
for i in range(10):
    x = m1(x)
    if i in (2, 4, 6):
        x += m2(x)
    optimizer.zero_grad(set_to_none=True)
    x.backward()
    optimizer.step()
```

- Parameters:
- set_to_none (bool, optional, default=False) – Whether to remove the gradients instead of zeroing.
 
 
