Learning rate schedulers

Callbacks that automatically adjust the learning rate based on the number of epochs or other metric measurements.

Learning rate schedulers make it possible to implement a dynamic learning rate policy. These callbacks are wrappers around the native PyTorch schedulers from torch.optim.lr_scheduler.
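
A scheduler callback is typically created before training and passed to the model's fit method via the callbacks argument. The sketch below assumes an already constructed argus model and data loaders (model, train_loader and val_loader are placeholders), and the epoch count is illustrative:

    from argus.callbacks import CosineAnnealingLR

    model.fit(train_loader,
              val_loader=val_loader,
              num_epochs=50,
              # Anneal the learning rate with a cosine schedule
              # over the 50 training epochs.
              callbacks=[CosineAnnealingLR(T_max=50)])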

Currently, the following schedulers are available (see the linked PyTorch documentation for details on the scheduling algorithms themselves):

LambdaLR

class argus.callbacks.LambdaLR(lr_lambda, step_on_iteration=False)[source]

LambdaLR scheduler.

Multiply the learning rate by a factor computed with a given function. The function should take the epoch number (an int) as its only argument.

Parameters
  • lr_lambda (function or list of functions) – Lambda function for the learning rate factor computation.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.LambdaLR.
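
For example, a LambdaLR callback that decays the learning rate by 5% per epoch could be constructed as follows (a minimal sketch; the decay rate is illustrative):

    from argus.callbacks import LambdaLR

    # The factor is applied to the initial learning rate:
    # lr(epoch) = initial_lr * 0.95 ** epoch.
    lr_scheduler = LambdaLR(lambda epoch: 0.95 ** epoch)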

StepLR

class argus.callbacks.StepLR(step_size, gamma=0.1, step_on_iteration=False)[source]

StepLR scheduler.

Multiply the learning rate by a given factor with a given period.

Parameters
  • step_size (int) – Period of learning rate update in epochs.

  • gamma (float, optional) – Multiplicative factor. Defaults to 0.1.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.StepLR.
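
For example, a StepLR callback that halves the learning rate every 10 epochs (illustrative values):

    from argus.callbacks import StepLR

    # Epochs 0-9: lr, epochs 10-19: 0.5 * lr, epochs 20-29: 0.25 * lr, ...
    lr_scheduler = StepLR(step_size=10, gamma=0.5)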

MultiStepLR

class argus.callbacks.MultiStepLR(milestones, gamma=0.1, step_on_iteration=False)[source]

MultiStepLR scheduler.

Multiply the learning rate by a given factor at each epoch from a given list of milestones.

Parameters
  • milestones (list of ints) – List of epoch numbers at which to perform the lr step.

  • gamma (float, optional) – Multiplicative factor. Defaults to 0.1.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.MultiStepLR.
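
For example, a MultiStepLR callback that divides the learning rate by 10 at epochs 30 and 60 (illustrative milestones):

    from argus.callbacks import MultiStepLR

    # Epochs 0-29: lr, epochs 30-59: 0.1 * lr, epochs 60+: 0.01 * lr.
    lr_scheduler = MultiStepLR(milestones=[30, 60], gamma=0.1)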

ExponentialLR

class argus.callbacks.ExponentialLR(gamma, step_on_iteration=False)[source]

ExponentialLR scheduler.

Multiply the learning rate by a given factor on each epoch.

Parameters
  • gamma (float) – Multiplicative factor.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.ExponentialLR.
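
For example, an ExponentialLR callback with an illustrative decay factor:

    from argus.callbacks import ExponentialLR

    # Multiply the learning rate by 0.9 after every epoch.
    lr_scheduler = ExponentialLR(gamma=0.9)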

CosineAnnealingLR

class argus.callbacks.CosineAnnealingLR(T_max, eta_min=0, step_on_iteration=False)[source]

CosineAnnealingLR scheduler.

Set the learning rate of each parameter group using a cosine annealing schedule.

Parameters
  • T_max (int) – Max number of epochs or iterations.

  • eta_min (float, optional) – Min learning rate. Defaults to 0.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.CosineAnnealingLR.
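
For example, a CosineAnnealingLR callback that anneals the learning rate over training iterations rather than epochs (the iteration count and minimum learning rate are illustrative):

    from argus.callbacks import CosineAnnealingLR

    # Anneal the learning rate down to 1e-6 over 10000 training iterations.
    lr_scheduler = CosineAnnealingLR(T_max=10000, eta_min=1e-6,
                                     step_on_iteration=True)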

ReduceLROnPlateau

class argus.callbacks.ReduceLROnPlateau(monitor='val_loss', better='auto', factor=0.1, patience=10, verbose=False, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08)[source]

ReduceLROnPlateau scheduler.

Reduce learning rate when a metric has stopped improving.

Parameters
  • monitor (str, optional) – Metric name to monitor. It should be prefixed with val_ for a metric computed on validation data and train_ for a metric computed on the data from the train loader. A val_loader should be provided during the model fit to make it possible to monitor metrics starting with val_. Defaults to val_loss.

  • better (str, optional) – The metric improvement criterion. Should be ‘min’, ‘max’ or ‘auto’. ‘auto’ means the criterion is taken from the metric itself, which is the appropriate behavior in most cases. Defaults to ‘auto’.

  • factor (float, optional) – Multiplicative factor. Defaults to 0.1.

  • patience (int, optional) – Number of training epochs without improvement of the monitored metric before the learning rate is reduced. Defaults to 10.

  • verbose (bool, optional) – Print info on each update to stdout. Defaults to False.

  • threshold (float, optional) – Threshold for considering the changes significant. Defaults to 1e-4.

  • threshold_mode (str, optional) – Should be ‘rel’ or ‘abs’. Defaults to ‘rel’.

  • cooldown (int, optional) – Number of epochs to wait before resuming normal operation after lr has been updated. Defaults to 0.

  • min_lr (float or list of floats, optional) – Min learning rate. Defaults to 0.

  • eps (float, optional) – Min significant learning rate update. Defaults to 1e-8.

PyTorch docs on torch.optim.lr_scheduler.ReduceLROnPlateau.
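
For example, a ReduceLROnPlateau callback that halves the learning rate after 3 epochs without improvement of the validation accuracy (this assumes an ‘accuracy’ metric is computed during fit, so that val_accuracy is available; the values are illustrative):

    from argus.callbacks import ReduceLROnPlateau

    # Higher accuracy is better, so better='max'.
    lr_scheduler = ReduceLROnPlateau(monitor='val_accuracy', better='max',
                                     factor=0.5, patience=3)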

CyclicLR

class argus.callbacks.CyclicLR(base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, step_on_iteration=True)[source]

CyclicLR scheduler.

Sets the learning rate of each parameter group according to a cyclical learning rate policy.

Parameters
  • base_lr (float or list of floats) – Initial learning rate.

  • max_lr (float or list of floats) – Max learning rate.

  • step_size_up (int, optional) – Increase phase duration in epochs or iterations. Defaults to 2000.

  • step_size_down (int, optional) – Decrease phase duration in epochs or iterations. Defaults to None.

  • mode (str, optional) – Should be ‘triangular’, ‘triangular2’ or ‘exp_range’. Defaults to ‘triangular’.

  • gamma (float, optional) – Constant for the ‘exp_range’ policy. Defaults to 1.

  • scale_fn (function, optional) – Custom scaling policy function. Defaults to None.

  • scale_mode (str, optional) – Should be ‘cycle’ or ‘iterations’. Defaults to ‘cycle’.

  • cycle_momentum (bool, optional) – If True, momentum is cycled inversely to the learning rate between ‘base_momentum’ and ‘max_momentum’. Defaults to True.

  • base_momentum (float or list of floats, optional) – Lower momentum boundaries in the cycle for each parameter group. Defaults to 0.8.

  • max_momentum (float or list of floats, optional) – Upper momentum boundaries in the cycle for each parameter group. Defaults to 0.9.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to True.

PyTorch docs on torch.optim.lr_scheduler.CyclicLR.
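
For example, a CyclicLR callback with illustrative learning rate boundaries and phase lengths:

    from argus.callbacks import CyclicLR

    # Cycle the learning rate between 1e-4 and 1e-2: increase for
    # 2000 iterations, then decrease back over the next 2000.
    lr_scheduler = CyclicLR(base_lr=1e-4, max_lr=1e-2,
                            step_size_up=2000, step_size_down=2000)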

CosineAnnealingWarmRestarts

class argus.callbacks.CosineAnnealingWarmRestarts(T_0, T_mult=1, eta_min=0, step_on_iteration=False)[source]

CosineAnnealingWarmRestarts scheduler.

Set the learning rate of each parameter group using a cosine annealing schedule with warm restarts.

Parameters
  • T_0 (int) – Number of epochs or iterations for the first restart.

  • T_mult (int) – Factor by which T increases after a restart. Defaults to 1.

  • eta_min (float, optional) – Min learning rate. Defaults to 0.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.CosineAnnealingWarmRestarts.
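
For example, a CosineAnnealingWarmRestarts callback with illustrative cycle lengths:

    from argus.callbacks import CosineAnnealingWarmRestarts

    # The first cycle lasts 10 epochs; with T_mult=2 each subsequent
    # cycle is twice as long (10, 20, 40, ... epochs).
    lr_scheduler = CosineAnnealingWarmRestarts(T_0=10, T_mult=2, eta_min=1e-6)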

MultiplicativeLR

class argus.callbacks.MultiplicativeLR(lr_lambda, step_on_iteration=False)[source]

MultiplicativeLR scheduler.

Multiply the learning rate of each parameter group by the factor given in the specified function.

Parameters
  • lr_lambda (function or list) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.

  • step_on_iteration (bool) – Step on each training iteration rather than each epoch. Defaults to False.

PyTorch docs on torch.optim.lr_scheduler.MultiplicativeLR.
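
For example, a MultiplicativeLR callback with an illustrative constant factor:

    from argus.callbacks import MultiplicativeLR

    # Multiply the current learning rate by 0.95 after every epoch
    # (unlike LambdaLR, the factor is applied to the current learning
    # rate rather than to the initial one).
    lr_scheduler = MultiplicativeLR(lambda epoch: 0.95)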

OneCycleLR

class argus.callbacks.OneCycleLR(max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0)[source]

OneCycleLR scheduler.

Sets the learning rate of each parameter group according to the 1cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate.

Parameters
  • max_lr (float or list) – Upper learning rate boundaries in the cycle for each parameter group.

  • total_steps (int) – The total number of steps in the cycle. Note that if a value is not provided here, then it must be inferred by providing a value for epochs and steps_per_epoch. Defaults to None.

  • epochs (int) – The number of epochs to train for. This is used along with steps_per_epoch in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Defaults to None.

  • steps_per_epoch (int) – The number of steps per epoch to train for. This is used along with epochs in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Defaults to None.

  • pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Defaults to 0.3.

  • anneal_strategy (str) – Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Defaults to ‘cos’.

  • cycle_momentum (bool) – If True, momentum is cycled inversely to learning rate between ‘base_momentum’ and ‘max_momentum’. Defaults to True.

  • base_momentum (float or list) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Defaults to 0.85.

  • max_momentum (float or list) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’. Defaults to 0.95.

  • div_factor (float) – Determines the initial learning rate via initial_lr = max_lr / div_factor. Defaults to 25.

  • final_div_factor (float) – Determines the minimum learning rate via min_lr = initial_lr / final_div_factor. Defaults to 1e4.

PyTorch docs on torch.optim.lr_scheduler.OneCycleLR.
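
For example, a OneCycleLR callback for a run of 20 epochs with 500 iterations per epoch (illustrative values; the total number of steps is inferred from epochs and steps_per_epoch):

    from argus.callbacks import OneCycleLR

    # The learning rate warms up from max_lr / div_factor to max_lr over
    # the first 30% of the 10000 steps, then anneals down to a minimum of
    # initial_lr / final_div_factor.
    lr_scheduler = OneCycleLR(max_lr=0.01, epochs=20, steps_per_epoch=500)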