[3rd Order AD] Pullback over twice Jacobian #614
Comments
Stacktrace[38:end]
😓 This is third order, which requires forward over an HVP, or reverse over Taylor mode (TaylorDiff.jl), to be remotely efficient. First, I would rewrite the function to use some sort of custom Hessian: you only need the diagonal, so Jacobian over Jacobian is a bad idea performance-wise. There was an AAAI paper (I think) that showed how to compute the diagonal of the second-order derivative very fast, but I can't seem to locate it right now. That said, it would still require a custom rrule in which you use ForwardDiff over an HVP. I don't think anyone is actively working on third order (including me), so all I can do is point you to code that might be helpful: Lines 42 to 63 in afd8555
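For reference, the two building blocks mentioned above can be written out as follows. This is only a sketch in assumed notation, not taken from the issue: $f$ is a scalar function of the input $x$ with parameters $\theta$, $H = \nabla_x^2 f$ is its Hessian, $v$ is an arbitrary direction, $e_i$ is the $i$-th basis vector, and $L$ and $\ell$ are a hypothetical loss.

$$
H(x)\,v \;=\; \left.\frac{\partial}{\partial \varepsilon}\,\nabla_x f(x + \varepsilon v;\theta)\right|_{\varepsilon = 0}
\qquad \text{(forward over reverse: one HVP)}
$$

$$
H_{ii}(x) \;=\; e_i^{\top} H(x)\, e_i \;=\; \left.\frac{\partial^2}{\partial \varepsilon^2}\, f(x + \varepsilon e_i;\theta)\right|_{\varepsilon = 0}
\qquad \text{(second-order directional derivative, i.e. Taylor mode)}
$$

$$
\nabla_\theta L(\theta), \qquad L(\theta) = \ell\big(H_{11}(x;\theta), \dots, H_{nn}(x;\theta)\big)
\qquad \text{(the third-order step: reverse over either of the above)}
$$

Only directional derivatives along $e_i$ are needed for the diagonal, which is why forming the full Jacobian of a Jacobian does more work than necessary.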
Actually, @tansongchen might know how to do this kind of 3rd-order differentiation efficiently.
Thank you @avik-pal, sorry for the 😓.
No worries. I do think it would be a nice-to-have feature if we can get a general implementation up and running. The 😓 was for Zygote giving bad error messages. For nested reverse mode, Zygote becomes type-unstable (beyond the simple cases) and then throws that undefined reference error, which can't be parsed by anyone who doesn't already know what is going wrong (i.e. pretty much the opposite of what an error message should do).
Could you write the mathematical expression for the derivative you want to get? When I hear keywords like "diagonal" and "forward over reverse", I have some confidence that it can be reformulated to make use of TaylorDiff 🤔
Hey @tansongchen, thanks for joining us! Absolutely. I'm trying to learn a mapping between two time-sequences Now, thermodynamic arguments suggest that for some function Ideally, I'd seek as this makes it manifest that the stress There's (at least) two difficulties here:
and then collect the derivatives Please note my shorthand for "functional" dependence Similar structure comes up in Equation (22a) of the following paper, though they use a set of internal variables |
@tansongchen do you think we can rewrite this in terms of TaylorDiff?
It would be easy to do once I support chunking (I will surely be doing that in a month or two). The model can then be evaluated in one pass of TaylorDiff, so this will be a standard Zygote-over-TaylorDiff overlay, which is well understood.
Now that we have better Enzyme support, trying this out with Enzyme might be worthwhile. Right now it will still be messy, but hopefully #738 will make life easier here.
I'm looking to take two Jacobians of the neural network (with respect to its inputs) and then do parameter-based optimization. Up front, I want to thank you for taking the time to read this; I was hesitant to post, given that I probably ought to look somewhere other than nested AD for this purpose.
In my code proper, the pullback goes through and I find a scalar indexing error on back, but I've yet to reproduce this with an MWE. So I apologize if this is more of a "request for help" than an "issue".
At the moment, I get an undefined reference error on the following gradient:
Stacktrace[1:37]: