Generating Jacobians as Tensors #80
I am not sure what you are asking.
Sorry, I didn't explain very well. Suppose we have a function y = f(x) where y ∈ R^m and x ∈ R^n. The Jacobian, or sensitivity matrix, is then in R^{m×n}. So far so good. Now suppose we want to evaluate the function for N independent samples. We can then write x ∈ R^{N×n} and y ∈ R^{N×m} for the same function f; in MATLAB parlance, y[i,:] = f(x[i,:]). In other words, we add a batch dimension such that a given batch index in y depends only on the exact same batch index in x. I call this construct a tensor.

Now further suppose this function can be executed in a data-parallel fashion, and moreover that the function is AD compatible. This is faster than executing the function in a thread-parallel fashion because of the typically wide SIMD capabilities of modern CPUs and GPUs. Currently, to calculate the Jacobian with CppADCodeGen, we have to run each Jacobian evaluation in parallel in its own memory space, e.g. using OpenMP or TBB. If instead the generated Jacobian and all the associated intermediate values could be emitted as tensors during code generation, so that the Jacobian becomes R^{N×m×n}, we could use one of the off-the-shelf frameworks, e.g. PyTorch, to calculate the Jacobian in a data-parallel manner on a GPU. That would greatly speed up the computation. I hope this makes sense.
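To make the layout concrete, here is a minimal sketch of the batched N×m×n Jacobian tensor described above. The toy function `f` and its hand-coded Jacobian `jac_f` are illustrative stand-ins for AD-generated code, not anything CppAD emits:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Toy function f : R^2 -> R^2, f(x) = (x0*x1, x0 + x1).
// Stands in for an AD-generated model (m = n = 2).
static std::array<double, 2> f(const std::array<double, 2>& x) {
    return {x[0] * x[1], x[0] + x[1]};
}

// Jacobian of f at x, stored row-major as an m x n block:
// df0/dx0 = x1, df0/dx1 = x0, df1/dx0 = 1, df1/dx1 = 1.
static std::array<double, 4> jac_f(const std::array<double, 2>& x) {
    return {x[1], x[0], 1.0, 1.0};
}

// Batched Jacobian: for N samples x[i,:], fill a flat N x m x n tensor
// so that jac[i*m*n + r*n + c] = d f_r / d x_c evaluated at sample i.
std::vector<double> batched_jacobian(const std::vector<std::array<double, 2>>& xs) {
    constexpr std::size_t m = 2, n = 2;
    std::vector<double> jac(xs.size() * m * n);
    for (std::size_t i = 0; i < xs.size(); ++i) {
        const auto Ji = jac_f(xs[i]);
        std::copy(Ji.begin(), Ji.end(), jac.begin() + i * m * n);
    }
    return jac;
}
```

The point of the proposal is that the outer loop over `i` disappears: each per-sample Jacobian block is computed independently, so a data-parallel backend can evaluate all N blocks at once.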
As I understand it, you are asking for the derivative computation to be vectorized? I think this may be possible in CppAD by defining your own Base type, much the way CppADCodeGen does. This Base type would implement element-wise operations on vectors (e.g., +, -, *, /). This would not be an easy task. You can find out about defining your own Base type in the CppAD documentation on Base type requirements. There is one operation there that would not fit in, namely the Integer function; I think this is mainly used by VecAD operations.
Thank you. I think this is definitely a good approach. I will take a look at the documentation you pointed out and see how easy it is to implement.
Start small: just do a minimal extension that only implements a few operations like +, -, *, /.
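A minimal sketch of what such a "start small" vectorized scalar might look like. The name `VecBase` is hypothetical, and this only covers the four arithmetic operators; a real CppAD Base type would also need comparisons, CondExp, the standard math functions, and the Integer function mentioned above:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element-wise "vector scalar": every arithmetic operation
// acts on all N lanes at once, which is the core idea of a vectorized
// CppAD Base type.
struct VecBase {
    std::vector<double> v;
    VecBase() = default;
    VecBase(std::size_t n, double x) : v(n, x) {}
};

// Helper applying a binary op lane by lane.
static VecBase binop(const VecBase& a, const VecBase& b,
                     double (*op)(double, double)) {
    VecBase r(a.v.size(), 0.0);
    for (std::size_t i = 0; i < a.v.size(); ++i) r.v[i] = op(a.v[i], b.v[i]);
    return r;
}

VecBase operator+(const VecBase& a, const VecBase& b) {
    return binop(a, b, [](double x, double y) { return x + y; });
}
VecBase operator-(const VecBase& a, const VecBase& b) {
    return binop(a, b, [](double x, double y) { return x - y; });
}
VecBase operator*(const VecBase& a, const VecBase& b) {
    return binop(a, b, [](double x, double y) { return x * y; });
}
VecBase operator/(const VecBase& a, const VecBase& b) {
    return binop(a, b, [](double x, double y) { return x / y; });
}
```

With a type like this, recording `AD<VecBase>` operations would effectively tape one computation applied to all N samples simultaneously, assuming the remaining Base type requirements can be satisfied.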
CppADCodeGen can already be used for parallel execution, but it has to be done manually using different threads. This will likely produce faster executions than using CppAD directly with vector operations: a single thread using CppADCodeGen can already produce results that are more than 10 times faster than regular CppAD. If you would like to benefit from running on different devices (e.g., GPUs), I would recommend choosing a technology (such as OpenCL or CUDA) and modifying the LanguageC source generator so that it supports it.
I agree that the best speeds would have to include the conversion to compiled code using CppADCodeGen or the CppAD JIT. The CppAD JIT does not support vector Base types (at this time), so you would still have to do as suggested above.
Sorry for my late response. I had to go brush up on my computer architecture. I did a lot of numerical experiments, and here is what I discovered: compilers will not vectorize functions with many statements; they skip them to save optimization time. However, if you split each generated statement into its own function, the statements can be vectorized very easily. I think this would be a lot of work to do, but if you point me to the right place I can attempt it for a small problem. I believe this will result in a tremendous speedup for some interesting problems in control. The bottom line is that every generated statement needs to be wrapped in a function for this to help. It cannot be inlined.
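A rough illustration of the per-statement idea, assuming a toy expression y = x0*x1 + x0 evaluated over a batch of N samples. The function names are invented for the sketch; each generated statement becomes a short, tight loop the optimizer can auto-vectorize, and `noinline` (GCC/Clang attribute) keeps it from being merged back into one large body:

```cpp
#include <cstddef>

#if defined(__GNUC__)
#define NO_INLINE __attribute__((noinline))
#else
#define NO_INLINE
#endif

// Generated statement 1: tmp[i] = x0[i] * x1[i] over the batch.
NO_INLINE void stmt_mul(const double* a, const double* b,
                        double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

// Generated statement 2: y[i] = tmp[i] + x0[i] over the batch.
NO_INLINE void stmt_add(const double* a, const double* b,
                        double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// y = x0 * x1 + x0, evaluated for all n samples, one statement at a time.
void eval_batch(const double* x0, const double* x1,
                double* tmp, double* y, std::size_t n) {
    stmt_mul(x0, x1, tmp, n);
    stmt_add(tmp, x0, y, n);
}
```

The trade-off is extra call overhead and memory traffic for the intermediate arrays, which is why this only pays off when the batch dimension N is large enough for SIMD gains to dominate.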
There are several avenues that you can take. If what you need is to compute derivative information for several independent variable vectors, then the best approach is to parallelize at the highest level possible, to avoid the overhead of handling/synchronizing several threads. If you would like to parallelize the evaluation of derivative information for a single independent variable vector, then you can try to use the existing support; see https://github.com/joaoleal/CppADCodeGen/tree/master/test/cppad/cg/model/threadpool. To create loops with CppADCodeGen, you can try to use the pattern/loop detection options, but combining this with openMP/threadpool would not work. You can also try to place the common code in an atomic function and then call it multiple times (it will reduce the compilation time, but I am not sure what will happen to the execution time). You can also try to experiment with the compiler options (e.g., use -O3). You could also try to combine several of these approaches.
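As a sketch of the "parallelize at the highest level" option: each thread evaluates the model for its own slice of the N samples, with no synchronization needed because the slices are disjoint. Here `model` is a hypothetical stand-in for a compiled CppADCodeGen model evaluation:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for a generated model evaluation (e.g., one Jacobian call).
static double model(double x) { return x * x + 1.0; }

// Evaluate the model for all samples, striping the batch across threads.
// Each thread owns indices t, t + n_threads, t + 2*n_threads, ... so the
// writes to y are disjoint and no locking is required.
void eval_parallel(const std::vector<double>& x, std::vector<double>& y,
                   unsigned n_threads) {
    const std::size_t n = x.size();
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < n; i += n_threads)
                y[i] = model(x[i]);
        });
    }
    for (auto& w : workers) w.join();
}
```

In a real setup, each thread would also own its own model instance (or at least its own work buffers), matching the "own memory space" requirement mentioned earlier in the thread.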
I think that the BLAS linear algebra package does this for you.
Also consider using Eigen for BLAS-style operations: https://eigen.tuxfamily.org/index.php?title=Main_Page
There are use cases, e.g. in optimal control, where the trajectories and the associated gradients and Jacobians need to be calculated in parallel. A simple example would be generating optimal trajectories in phase space, using some statistical objective, for uncertain initial conditions.
Other AD frameworks, such as Google JAX, do a poor job of calculating Jacobians explicitly, but a very good job of vectorizing the calculations in parallel on a GPU.
Is there a way to generate tensors (e.g. PyTorch tensors) as the output of CppAD instead of matrix Jacobians? This would allow for the efficient calculation of explicit Jacobians on a GPU without much effort. I don't think this would be a huge change. I am not asking you to do this, but if you could point out which files need to be modified, I can tackle it.