Hi all, as I'm learning about MLX I got curious about how to use different optimizers with different learning rates for different parts of a model. Then I saw this issue and the subsequent comment, so I'm wondering what the MLX way to accomplish this is. Say, for example, that I have this setup:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim


class Model(nn.Module):
    def __init__(self, output_dims: int, in_dims: int, hidden_dims: int):
        super().__init__()
        self.layer1 = nn.Linear(in_dims, hidden_dims)
        self.layer2 = nn.Linear(hidden_dims, output_dims)

    def __call__(self, x):
        x = self.layer1(x)
        x = mx.maximum(x, 0.0)
        x = self.layer2(x)
        return x


optimizer1 = optim.SGD(learning_rate=learning_rate_1)
optimizer2 = optim.SGD(learning_rate=learning_rate_2)
```

One way is to freeze one layer and update the other, then freeze the second and update the first, and so on (sketched below). Is there a cleaner way than this?
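For concreteness, here is a minimal sketch of that freeze/unfreeze workaround. It reuses `model`, `optimizer1`, and `optimizer2` from the snippet above, and it assumes a `loss_and_grad_function` built with `nn.value_and_grad(model, loss_fn)` so that gradients are only taken for the currently unfrozen parameters:

```python
# Alternate which layer is trainable: only the unfrozen layer's
# parameters appear in the gradient tree, so each optimizer ends up
# touching a single layer per pass.
def alternating_step(inputs, targets):
    # Train layer1 only: freeze everything, then unfreeze layer1.
    model.freeze()
    model.layer1.unfreeze()
    loss, grads = loss_and_grad_function(model, inputs, targets)
    optimizer1.update(model, grads)

    # Swap: freeze everything again and unfreeze layer2.
    model.freeze()
    model.layer2.unfreeze()
    loss, grads = loss_and_grad_function(model, inputs, targets)
    optimizer2.update(model, grads)
    return loss
```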
Answered by awni on Feb 27, 2024
No, you don't need to freeze the different layers. For your specific case you would do something like:

```python
model = Model(...)

def step(inputs, targets):
    loss, grads = loss_and_grad_function(model, inputs, targets)
    # The gradient tree mirrors the module tree, so each optimizer can be
    # given just one sub-module and its matching slice of the gradients.
    optimizer1.update(model.layer1, grads["layer1"])
    optimizer2.update(model.layer2, grads["layer2"])
```
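For completeness, here is one way the surrounding pieces might look. The mean-squared-error loss, the dummy data shapes, and the final `mx.eval` call are assumptions for illustration; the sketch reuses the `model`, `step`, and optimizers defined above:

```python
import mlx.core as mx
import mlx.nn as nn

# A placeholder loss; any scalar-valued function of the model will do.
def loss_fn(model, inputs, targets):
    return mx.mean((model(inputs) - targets) ** 2)

# nn.value_and_grad returns a function that computes the loss and the
# gradients as a nested dict mirroring the module tree, so the
# grads["layer1"] / grads["layer2"] lookups in `step` line up.
loss_and_grad_function = nn.value_and_grad(model, loss_fn)

# One training iteration on dummy data (assumes the model was built
# with in_dims=4 and output_dims=2).
inputs = mx.random.normal((8, 4))
targets = mx.random.normal((8, 2))
step(inputs, targets)

# MLX is lazy: force evaluation of the updated parameters and optimizer state.
mx.eval(model.parameters(), optimizer1.state, optimizer2.state)
```

Because each `optim.SGD` instance keeps its own state and learning rate, the two layers are effectively trained with independent learning rates, with no freezing required.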
Answer selected by kgourgou