SGD¶
class dragon.vm.torch.optim.SGD(params, lr=&lt;object object&gt;, momentum=0, dampening=0, weight_decay=0, nesterov=False, **kwargs)[source]¶
The optimizer to apply the SGD algorithm.

The following SGD algorithms are supported:

VanillaSGD, whose update is defined as:

\[\text{VanillaSGD}(g) = \text{lr} * g\]

MomentumSGD [Polyak, 1964], whose update is defined as:

\[\text{MomentumSGD}(g) = \text{lr} * m_{t} \quad \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g\]

NesterovSGD [Sutskever et al., 2013], whose update is defined as:

\[\text{NesterovSGD}(g) = \text{lr} * (\text{momentum} * m_{t} + g) \quad \text{where} \quad m_{t} = \text{momentum} * m_{t-1} + g\]

You can use one of them by setting the defaults:

    # Set the ``lr`` only
    vanilla_sgd = torch.optim.SGD(lr=0.1)

    # Set the ``lr`` and ``momentum``
    momentum_sgd = torch.optim.SGD(lr=0.1, momentum=0.9)

    # Set the ``lr``, ``momentum`` and ``nesterov``
    nesterov_sgd = torch.optim.SGD(lr=0.1, momentum=0.9, nesterov=True)
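Putting the pieces together, here is a minimal sketch of driving a MomentumSGD optimizer over a single tensor; the tensor, learning rate, and loop length are illustrative choices, not part of this reference:

    # Minimal usage sketch; values are illustrative.
    x = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1, momentum=0.9)
    for step in range(3):
        y = x + 1              # Forward computation.
        optimizer.zero_grad()  # Optional: gradients are overwritten by the next backward.
        y.backward()           # Compute the gradient of ``y`` w.r.t. ``x``.
        optimizer.step()       # Apply the MomentumSGD update to ``x``.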
__init__¶
SGD.__init__(params, lr=&lt;object object&gt;, momentum=0, dampening=0, weight_decay=0, nesterov=False, **kwargs)[source]¶
Create an SGD optimizer.

Parameters:
- params (Sequence[dragon.vm.torch.nn.Parameter]) – The parameters to optimize.
- lr (float, required) – The initial value for \(\text{lr}\).
- momentum (float, optional, default=0) – The initial value for \(\text{momentum}\).
- dampening (float, optional, default=0) – The dampening for \(\text{momentum}\).
- weight_decay (float, optional, default=0) – The L2 penalty factor applied to the weights.
- nesterov (bool, optional, default=False) – True to switch to the NesterovSGD optimizer.
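As a sketch of how the optional arguments combine at construction time (the concrete values below are illustrative assumptions, not recommendations):

    # NesterovSGD with an L2 penalty; all values are illustrative.
    x = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD(
        [x],
        lr=0.01,
        momentum=0.9,
        weight_decay=0.0001,
        nesterov=True,
    )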
 
 
Methods¶
add_param_group¶
Optimizer.add_param_group(param_group)[source]

Add a new param group into the optimizer.

param_group is a dict containing the defaults:

    # A group defining ``lr`` and ``weight_decay``
    param_group = {'params': [], 'lr': 0.01, 'weight_decay': 0.0001}

Parameters:
- param_group (dict) – The param group to add.
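A short sketch of adding a second group with its own hyper-parameters to an existing optimizer; the tensors and values are illustrative:

    x = torch.ones(1, requires_grad=True)
    y = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)
    # Give ``y`` its own learning rate and L2 penalty.
    optimizer.add_param_group({'params': [y], 'lr': 0.01, 'weight_decay': 0.0001})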
 
 
step¶
Optimizer.step()[source]

Update all parameter groups using gradients.

Call this method after a backward pass:

    x = torch.ones(1, 3, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)  # Construct the optimizer over ``x``.
    y = x + 1
    y.backward()
    optimizer.step()
sum_grad¶
Optimizer.sum_grad()[source]

Sum the gradients of all parameters.

Call this method after each backward pass:

    x = torch.ones(1, requires_grad=True)
    optimizer = torch.optim.SGD([x], lr=0.1)
    for epoch in range(2):
        for step in range(3):
            y = x + 1
            y.backward()
            optimizer.sum_grad()
        optimizer.step()
    print(x)  # 0.4

Each inner loop accumulates a gradient sum of 3 (the gradient of \(y\) with respect to \(x\) is 1), so every step() subtracts \(0.1 * 3 = 0.3\), taking x from 1.0 to 0.4 after two epochs.
zero_grad¶
Optimizer.zero_grad(set_to_none=False)[source]

Set the gradients of all parameters to zero.

This method is usually not necessary, as the gradients will be overwritten by the next computation.

However, if some gradients are not computed every time, remember to set them to none before step(...):

    m1 = torch.nn.Linear(3, 3)
    m2 = torch.nn.Linear(3, 3)
    x = torch.ones(1, 3, requires_grad=True)
    # ``optimizer`` is assumed to be constructed over the parameters of ``m1`` and ``m2``.
    for i in range(10):
        x = m1(x)
        if i in (2, 4, 6):
            x += m2(x)
        optimizer.zero_grad(set_to_none=True)
        x.backward()
        optimizer.step()

Parameters:
- set_to_none (bool, optional, default=False) – Whether to remove the gradients instead of zeroing.
 
 
