Hi all, as I'm learning about MLX I got curious about how to use different optimizers with different learning rates for different parts of a model. Then I saw this issue and the subsequent comment, so I'm wondering what the MLX way to accomplish this is. Say, for example, that I have this setup:

```python
import mlx.core as mx
import mlx.nn as nn
import mlx.optimizers as optim


class Model(nn.Module):
    def __init__(self, output_dims: int, in_dims: int, hidden_dims: int):
        super().__init__()
        self.layer1 = nn.Linear(in_dims, hidden_dims)
        self.layer2 = nn.Linear(hidden_dims, output_dims)

    def __call__(self, x):
        x = self.layer1(x)
        x = mx.maximum(x, 0.0)
        x = self.layer2(x)
        return x


optimizer1 = optim.SGD(learning_rate=learning_rate_1)
optimizer2 = optim.SGD(learning_rate=learning_rate_2)
```

One way is to freeze one layer and update the other, then freeze the second and update the first, and so on (sketched below). Is there a cleaner way than this?
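For concreteness, here is a minimal sketch of that freeze/unfreeze workaround. It reuses `model`, `optimizer1`, and `optimizer2` from the snippet above, and it assumes a `loss_and_grad_function` built with `nn.value_and_grad(model, loss_fn)` so that gradients are only taken for the currently unfrozen parameters:

```python
# Alternate which layer is trainable: only the unfrozen layer's
# parameters appear in the gradient tree, so each optimizer ends up
# touching a single layer per pass.
def alternating_step(inputs, targets):
    # Train layer1 only: freeze everything, then unfreeze layer1.
    model.freeze()
    model.layer1.unfreeze()
    loss, grads = loss_and_grad_function(model, inputs, targets)
    optimizer1.update(model, grads)

    # Swap: freeze everything again and unfreeze layer2.
    model.freeze()
    model.layer2.unfreeze()
    loss, grads = loss_and_grad_function(model, inputs, targets)
    optimizer2.update(model, grads)
    return loss
```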
Answered by awni on Feb 27, 2024
No, you don't need to freeze the different layers. For your specific case you would do something like:

```python
model = Model(...)

def step(inputs, targets):
    loss, grads = loss_and_grad_function(model, inputs, targets)
    # The gradient tree mirrors the module tree, so each optimizer can be
    # given just one sub-module and its matching slice of the gradients.
    optimizer1.update(model.layer1, grads["layer1"])
    optimizer2.update(model.layer2, grads["layer2"])
```
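For completeness, here is one way the surrounding pieces might look. The mean-squared-error loss, the dummy data shapes, and the final `mx.eval` call are assumptions for illustration; the sketch reuses the `model`, `step`, and optimizers defined above:

```python
import mlx.core as mx
import mlx.nn as nn

# A placeholder loss; any scalar-valued function of the model will do.
def loss_fn(model, inputs, targets):
    return mx.mean((model(inputs) - targets) ** 2)

# nn.value_and_grad returns a function that computes the loss and the
# gradients as a nested dict mirroring the module tree, so the
# grads["layer1"] / grads["layer2"] lookups in `step` line up.
loss_and_grad_function = nn.value_and_grad(model, loss_fn)

# One training iteration on dummy data (assumes the model was built
# with in_dims=4 and output_dims=2).
inputs = mx.random.normal((8, 4))
targets = mx.random.normal((8, 2))
step(inputs, targets)

# MLX is lazy: force evaluation of the updated parameters and optimizer state.
mx.eval(model.parameters(), optimizer1.state, optimizer2.state)
```

Because each `optim.SGD` instance keeps its own state and learning rate, the two layers are effectively trained with independent learning rates, with no freezing required.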
Answer selected by kgourgou