Question on relation to univariate Taylor series propagation for higher-order derivatives #1
Thanks for bringing up this highly relevant work! Upon reading the paper, I think the idea of evaluating arbitrary derivative tensor elements via forward propagation of univariate Taylor series is similar. I wasn't aware of this work, and based on my experience talking to people at NeurIPS, it is not well known within the ML community, even among people who work on AD in the ML context. The JAX team, who wrote the Taylor-mode AD I used in the paper, wasn't aware of this technique and was quite surprised that one could do this. So yes, your interpretation is right: STDE approximates the derivative tensor with randomized univariate Taylor series propagation.
Thanks again for your insightful questions!
I'm unable to understand why the series you gave would work, though. Also, feel free to continue the discussion if you have further questions or ideas!
What is C here, and could you clarify what you mean by "is known"? I probably have a misunderstanding, but I thought that in your Equation 21, 1/330 and 1/200200 are entries of C? Is it different from interpreting C as the coefficient term of GUW98's Equation 13 above?
Thanks for clarifying! If cancellation is allowed, you could do it with just first-order perturbations using Equation 13 of GUW98, right? So the minimum order could be
It's been a while and I forgot how I came up with that construction! I remember the idea was to construct it so that lower-order perturbations contribute once and only once at the max order.
This C corresponds to the coefficients of the linear derivative operator. For example, for the Laplacian operator, C is simply the identity matrix. The 1/330 and 1/200200 in Eq. 21 of my paper are the "constant_ratio" in your gist. The formula for this ratio is given by Faà di Bruno's formula.
Yes, this is my observation as well. One further comment: I think going for higher jets seems to be slightly better. Here's the sketch:
Thanks for posting the gist, this clears up a lot! I thought you were using
The complexity analysis should also include the order of the jet forward pass. Additionally, it's desirable to consider the simultaneous computation of multiple mixed partial derivatives: GUW98 offers an optimized approach for sharing jet forward computations (see their Eq. 17) that computes all entries of the high-order derivative tensor with better complexity.
I see! Thanks! |
Hi,
Thank you for your interesting work! I’ve been reading your paper and found the idea of stochastically evaluating higher-order derivatives quite intriguing.
I was wondering if you could comment on how your technique relates to the known method of evaluating higher-order derivative tensors by propagating a set of univariate Taylor series. This idea is discussed in this paper and the Evaluating Derivatives book (Chapter 13).
Specifically, could your method be interpreted (or extended) as sampling a set of univariate Taylor series to propagate, with the aim of approximating the derivative tensor? I'd love to hear your thoughts on this connection or any fundamental differences.
Thanks in advance for your response!