Updated GPGPU code #96
…st time; however, the performance difference from removing branching will only truly become apparent after DP has been implemented.
What are the exact commands that I have to run?
Oh, the script didn't stage, and I need to learn py.test. I'm too used to gmock. Give me a second.
…ignal version (Including the previous LIF OpenCL script)
Please go into the main directory and run
Thanks
(GeForce GTX 980)
Awesome, thanks! Later on, since the 980 uses compute model 5.1, I may ask you to benchmark a few reduce functions.
I can't squeeze much more performance out of the reduction kernel without raising the requirements to OpenCL 2.0, so I will leave it as it is for the time being and revisit it if we decide to pursue that route.
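For anyone following along who hasn't written a reduction kernel, here is a rough plain-Python sketch of the tree-style reduction a single work-group typically performs. It is only an illustration of the general technique, not the kernel in this PR; the name `tree_reduce` and the zero-padding step are assumptions made for the example.

```python
def tree_reduce(values):
    """Sum `values` the way one work-group tree reduction would (illustrative only)."""
    # Pad to a power of two so every halving step pairs elements evenly.
    n = 1
    while n < len(values):
        n *= 2
    scratch = list(values) + [0.0] * (n - len(values))

    stride = n // 2
    while stride > 0:
        # On a GPU, every index below `stride` would run in parallel,
        # with a barrier before moving on to the next stride.
        for i in range(stride):
            scratch[i] += scratch[i + stride]
        stride //= 2
    return scratch[0]
```

Each pass halves the number of active elements, so a work-group of size n needs about log2(n) synchronized steps. As far as I know, OpenCL 2.0 adds built-in work-group functions such as `work_group_reduce_add`, which is presumably the kind of feature that raising the requirement would unlock.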
OK, so I'm going to try splitting up the workload between GPUs. This won't be full multi-GPU support; it will be highly selective right now, since it would break a lot if I tried to do everything. After that (maybe next week) I want to add a low-requirement mode for ARM, since nengo uses some features that not all ARM processors support. Finally, some time after adding ARM support, I'll work on multithreading memory transfers and task issuing, but this is a huge job and I may need design documentation to complete it. That said, if I can't do this on my own, I may ask a friend of mine who is an expert on deferred processing to assist me.
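To make the idea of splitting the workload concrete, here is a minimal sketch of one way to divide a range of work items into contiguous per-device chunks. This is a guess at the general shape, not the design planned in this PR; `split_workload` and the per-device `weights` are hypothetical names.

```python
def split_workload(n_items, weights):
    """Split range(n_items) into contiguous (start, stop) chunks.

    `weights` holds one positive integer per device (hypothetical: e.g. a
    rough relative speed). Chunk sizes are proportional to the weights.
    """
    total = sum(weights)
    bounds = [0]
    cumulative = 0
    for w in weights:
        cumulative += w
        # Integer arithmetic keeps the boundaries exact; the last one is n_items.
        bounds.append(n_items * cumulative // total)
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


# Two equal devices each get half of 1000 items.
chunks = split_workload(1000, [1, 1])
```

Contiguous chunks keep each device's slice of a buffer in one piece, which is the simplest case for copying data to and from a device.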
Most of gemv is actually very well written (good job lol) and can't be improved much within the current feature set (currently on 1.2; I'd need to increase it to 2.1) without making the code unmaintainable and unreadable. However, raising our tool set to 2.1 would create huge performance increases, so I do want to consider this eventually.
The next version of OpenCL has a mode for scaling graphs with no CPU-to-GPU communication needed. It isn't out yet (I think it comes out mid-June), but I'll be buying a GTX 1080 anyway, so I may be able to play around with it a bit. It may increase performance. (Only the new Pascal and AMD cards will support it.)
Just wanted to post an update: I haven't stopped working on this. Sadly my PC died about 25 days ago, but I should have it up and running soon.
I'll push the code here as I continue to develop it this summer. The LIF code was updated and seems to be working flawlessly. Performance increases are still similar to #92 (e.g., 5% to 10%). If you have an OpenCL-compatible GPU, please run this code and comment below with the percent increase/decrease for the function "test_lif_speed" over 1000 iterations.
Thanks
Louis
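If it helps when reporting numbers, here is a small sketch of how a percent increase/decrease could be computed from two wall-clock timings. It assumes "percent increase" means the change in throughput relative to the baseline; `time_iterations` and `percent_change` are hypothetical helpers and are not part of the test suite.

```python
import time


def time_iterations(step, iterations=1000):
    """Return the wall-clock seconds taken to call `step()` `iterations` times."""
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    return time.perf_counter() - start


def percent_change(baseline_s, updated_s):
    """Percent change in speed; positive means the updated code is faster.

    Assumption: speed is measured as throughput (work per second), so the
    change is (baseline_time / updated_time - 1) * 100.
    """
    return (baseline_s / updated_s - 1.0) * 100.0


# Hypothetical timings: 2.0 s on the old code, 1.6 s on the new code.
change = percent_change(2.0, 1.6)
```

A negative result means the updated code was slower. Please say in your comment which definition you used, since "percent faster" and "percent less time" give different numbers for the same pair of timings.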