RMSprop

class dragon.vm.torch.optim.RMSprop(
  params,
  lr=0.01,
  alpha=0.99,
  eps=1e-08,
  weight_decay=0,
  momentum=0,
  centered=False,
  scale=1,
  clip_norm=0
)[source]

The optimizer that implements the RMSprop algorithm [Hinton et al., 2013].

The RMSprop update is defined as:

\[\text{RMSprop}(g) = -m_{t} \quad \text{where} \quad
\begin{cases}
  v_{t} = \alpha * v_{t-1} + (1 - \alpha) * g^{2} \\
  m_{t} = \text{momentum} * m_{t-1} + \frac{\text{lr} * g}{\sqrt{v_{t}} + \epsilon}
\end{cases}\]
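If it helps to read the rule procedurally, here is a minimal plain-Python sketch of the same equations; the toy loss and the hyperparameter values below are illustrative, not part of Dragon's API or internals:

```python
# A plain-Python sketch of the RMSprop update rule above.
def rmsprop_update(param, g, state, lr=0.01, alpha=0.99, eps=1e-8, momentum=0.0):
    """Apply one RMSprop step to a scalar parameter and return the new value."""
    # v_t = alpha * v_{t-1} + (1 - alpha) * g^2
    state['v'] = alpha * state['v'] + (1.0 - alpha) * g * g
    # m_t = momentum * m_{t-1} + lr * g / (sqrt(v_t) + eps)
    state['m'] = momentum * state['m'] + lr * g / (state['v'] ** 0.5 + eps)
    # The applied update is -m_t.
    return param - state['m']

state = {'v': 0.0, 'm': 0.0}
w = 1.0
for _ in range(5):
    grad = 2.0 * w  # gradient of the toy loss w ** 2
    w = rmsprop_update(w, grad, state, lr=0.1)
```

Note how the running average `v` normalizes the step size: the first step is large because `v` is still warming up, after which the updates settle near the minimum.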

__init__

RMSprop.__init__(
  params,
  lr=0.01,
  alpha=0.99,
  eps=1e-08,
  weight_decay=0,
  momentum=0,
  centered=False,
  scale=1,
  clip_norm=0
)[source]

Create an RMSprop optimizer.

Parameters:
  • params (Sequence[dragon.vm.torch.nn.Parameter]) – The parameters to optimize.
  • lr (float, required) – The initial value for \(\text{lr}\).
  • alpha (float, optional, default=0.99) – The initial value for \(\alpha\).
  • eps (float, optional, default=1e-08) – The initial value for \(\epsilon\).
  • weight_decay (float, optional, default=0) – The L2 penalty factor applied to the weights.
  • momentum (float, optional, default=0) – The initial value for \(\text{momentum}\).
  • centered (bool, optional, default=False) – Whether to switch to the centered variant of the algorithm.
  • scale (float, optional, default=1) – The scaling factor applied to the gradient.
  • clip_norm (float, optional, default=0) – The maximum L2 norm used to clip the gradient.
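How the extra gradient options interact can be sketched in plain Python. The order of operations here (scaling, then weight decay, then norm clipping) is an assumption made for illustration, not a statement about Dragon's internals:

```python
def preprocess_grad(grad, param, scale=1.0, weight_decay=0.0, clip_norm=0.0):
    """Illustrative gradient preprocessing for scale, weight_decay and clip_norm."""
    # Rescale the raw gradient, e.g. to undo loss scaling.
    g = [gi * scale for gi in grad]
    # Add the L2 penalty term: g <- g + weight_decay * param.
    if weight_decay > 0:
        g = [gi + weight_decay * pi for gi, pi in zip(g, param)]
    # Clip to the maximum L2 norm when one is given.
    if clip_norm > 0:
        norm = sum(gi * gi for gi in g) ** 0.5
        if norm > clip_norm:
            g = [gi * clip_norm / norm for gi in g]
    return g

g = preprocess_grad([3.0, 4.0], [1.0, 1.0], clip_norm=1.0)
# norm of [3, 4] is 5, so the clipped gradient is [0.6, 0.8]
```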

Methods

add_param_group

Optimizer.add_param_group(param_group)[source]

Add a new param group into the optimizer.

The param_group argument is a dict containing the defaults:

# A group defining ``lr`` and ``weight_decay``
param_group = {'params': [], 'lr': 0.01, 'weight_decay': 0.0001}
Parameters:
  • param_group (dict) – The param group to add.

step

Optimizer.step()[source]

Update all parameter groups using gradients.

Call this method after a backward pass:

x = torch.ones(1, 3, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
y = x + 1
y.backward()
optimizer.step()

sum_grad

Optimizer.sum_grad()[source]

Sum the gradients of all parameters.

Call this method after each backward pass:

x = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([x], lr=0.1)
for epoch in range(2):
    for step in range(3):
        y = x + 1
        y.backward()
        optimizer.sum_grad()
    optimizer.step()
print(x)  # 0.4
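The printed value 0.4 can be verified with the same arithmetic in plain Python: every backward pass contributes a gradient of 1 to x, sum_grad accumulates three of them per epoch, and each step() subtracts lr times the accumulated gradient:

```python
x = 1.0
lr = 0.1
for epoch in range(2):
    summed = 0.0
    for step in range(3):
        summed += 1.0   # d(x + 1)/dx == 1 on every pass
    x -= lr * summed    # step(): x <- x - lr * accumulated grad
print(round(x, 1))  # 0.4
```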

zero_grad

Optimizer.zero_grad(set_to_none=False)[source]

Set the gradients of all parameters to zero.

This method is usually unnecessary, since the gradients are overwritten during the next backward computation.

However, if some gradients are not computed on every iteration, remember to set them to none before calling step(...):

m1 = torch.nn.Linear(3, 3)
m2 = torch.nn.Linear(3, 3)
optimizer = torch.optim.SGD([*m1.parameters(), *m2.parameters()], lr=0.1)
x = torch.ones(1, 3, requires_grad=True)
for i in range(10):
    x = m1(x)
    if i in (2, 4, 6):
        x += m2(x)
optimizer.zero_grad(set_to_none=True)
x.backward()
optimizer.step()
Parameters:
  • set_to_none (bool, optional, default=False) – Whether to set the gradients to None instead of zeroing them.