Gradient Accumulation with Dual (optimizer, scheduler) Training #14999
celsofranssa asked this question in code help: NLP / ASR / TTS
Hello, Lightning community,
I am using a dual (optimizer, scheduler) training setup, as shown in the code snippet below:
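The original snippet did not survive extraction; a minimal sketch of what such a `configure_optimizers` might look like follows (the model attributes, optimizer choices, and scheduler settings here are assumptions, not the original code):

```python
import torch
from pytorch_lightning import LightningModule


class DualOptimizerModel(LightningModule):
    # ... model definition omitted ...

    def configure_optimizers(self):
        # one (optimizer, scheduler) pair per parameter group (names assumed)
        optimizer_1 = torch.optim.Adam(self.encoder.parameters(), lr=1e-3)
        optimizer_2 = torch.optim.Adam(self.decoder.parameters(), lr=1e-3)
        scheduler_1 = torch.optim.lr_scheduler.StepLR(optimizer_1, step_size=10)
        scheduler_2 = torch.optim.lr_scheduler.StepLR(optimizer_2, step_size=10)
        return (
            {"optimizer": optimizer_1, "lr_scheduler": scheduler_1, "frequency": 1},
            {"optimizer": optimizer_2, "lr_scheduler": scheduler_2, "frequency": 1},
        )
```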
With `"frequency": 1` on both optimizers, the trainer calls `optimizer_1` in step `i` and `optimizer_2` in step `(i+1)`.

Is there an approach to combine gradient accumulation with this optimization setup, so that `optimizer_1` uses the gradient accumulated over steps `(i-1)` and `i`, while `optimizer_2` uses the gradient accumulated over steps `i` and `(i+1)`?
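For concreteness, one way to express that overlapping accumulation pattern is with Lightning's manual optimization, controlling the accumulation window explicitly in `training_step`. This is only a hypothetical sketch, not part of the original post: `compute_loss` is an assumed helper, the schedulers would also need to be stepped by hand (e.g. via `self.lr_schedulers()`), and it requires `self.automatic_optimization = False` in `__init__`:

```python
def training_step(self, batch, batch_idx):
    opt_1, opt_2 = self.optimizers()

    loss = self.compute_loss(batch)  # hypothetical loss helper
    self.manual_backward(loss)       # gradients accumulate on both parameter sets

    if batch_idx % 2 == 1:
        # odd step i: optimizer_1 consumes the gradient accumulated over steps (i-1) and i
        opt_1.step()
        opt_1.zero_grad()
    else:
        if batch_idx > 0:
            # even step (i+1): optimizer_2 consumes the gradient accumulated over steps i and (i+1)
            opt_2.step()
        # zeroing here (including at batch 0) starts optimizer_2's next two-step window
        opt_2.zero_grad()

    return loss
```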