Intel i5-5300U HD Graphics doesn't finish running #36
Comments
@rajgott Also try to run:
and post the result here. It is the "built-in clinfo" for OpenCL Caffe.
Here is my Makefile.config. This happens with standard models from BVLC, for example GoogLeNet. INTEL_SPATIAL was enabled; now I enabled LIBDNN as well. Even with this the problem remains. Output of ./build/tools/caffe device_query: Thanks
Can you try with LIBDNN enabled but INTEL_SPATIAL disabled, and also with both disabled? Just to be sure. From your information I can't think of any cause other than a convolution that is stalling.
I tried all combinations of LIBDNN and INTEL_SPATIAL, but inference still stalls. ./build/test/test_all.testbin 1. For example BatchNormLayerTest/2: ./build/test/test_all.testbin 0 is much slower. For example BatchNormLayerTest/2: Thanks
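For comparing a single case across devices, the device id goes first and standard googletest flags still apply. A sketch, assuming the testbin forwards unrecognized flags to googletest (the argument convention is taken from the invocations above):

```shell
# Run only the BatchNorm tests on device 0 (then device 1) to compare speed;
# --gtest_filter is a stock googletest option.
./build/test/test_all.testbin 0 --gtest_filter='*BatchNorm*'
./build/test/test_all.testbin 1 --gtest_filter='*BatchNorm*'
```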
@rajgott I can't really find out what's wrong; it seems you ran the runtest with INTEL_SPATIAL compiled in. You should really compile in the default configuration (LIBDNN and INTEL_SPATIAL off) to find out more...
I have INTEL_SPATIAL and LIBDNN commented out as in the default Makefile.config. CMakeCache.txt shows this: How else can I confirm whether these two are disabled/enabled?
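As a cross-check, the compiled-in switches can be read back from the CMake cache. A minimal sketch; the cache variable names and `:BOOL=OFF` format here are assumed from typical CMake output, so verify them against your own CMakeCache.txt:

```shell
# Hypothetical sample of the relevant CMakeCache.txt entries (the default
# configuration has both switches off):
printf 'USE_INTEL_SPATIAL:BOOL=OFF\nUSE_LIBDNN:BOOL=OFF\n' > CMakeCache.sample
# The same grep run against the real CMakeCache.txt shows what was compiled in:
grep -E 'USE_INTEL_SPATIAL|USE_LIBDNN' CMakeCache.sample
```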
I have the same CPU, and I experience the exact same failure on device 1. However, that's the CPU cores, which we don't plan to use. If I run that test case on device 0 (the real HD Graphics GPU), it passes. I am using the default setting of USE_INTEL_SPATIAL. The problem exposed by the tests is that the GPU backend (device 0) runs many of them somewhere between 60x and 270x slower than device 1. During this time, the test uses about 6% of a CPU core and spends the rest of its time in io_wait.
@mattg-sp
First, thanks for all your help! Second, that command has been running for 75 minutes, using 99% of a CPU core (virtually all user time). I've attached a profile and the call stacks of all the threads. caffe_profile-time_gpu0_bvlc_alexnet-benchmark64.txt
@mattg-sp
It does not stall like that on either the i7-3632QM or i7-6560U integrated GPUs I use for testing (both on beignet-1.1 and Fedora 23/24). @gongzg ideas?
sumac:~/caffe # uname -a
sumac:~/caffe # cat /etc/os-release
I'm having issues attaching files to this post, so I'll attach the rest of the requested details in a separate post.
@naibaf7 @mattg-sp @rajgott, beignet seems to have a CPU-side performance issue with the gradient test case: it runs slower and slower during the iterations and appears to stall. The beignet team is investigating it. But you will not have that issue if you run the convnet-benchmark, or the case Fabian mentioned above, with INTEL_SPATIAL enabled. And for BDW the recommended kernel version is 4.4 or newer; for SKL it is 4.6 or newer.
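The kernel recommendation above is easy to verify on the target machine:

```shell
# Print the running kernel release, e.g. "4.4.0-...", and compare it against
# the recommended minimum (4.4 for BDW, 4.6 for SKL).
uname -r
```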
caffe-device_query.txt The hardware is actually an Intel NUC (model: NUC5i5MYHE) with 16 GB of RAM. Actually, here's the top of /proc/meminfo:
@gongzg thanks, but would that explain the behavior of caffe time -model models/bvlc_alexnet/benchmark64.prototxt? Also, how do I know whether we're using beignet? The runtime I'm using is intel-linux-media-ocl_generic_16.4.4-47109_64bit.tar.gz, which I downloaded from Intel's website. Is that built on beignet? |
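One way to answer the "are we on beignet?" question: beignet identifies itself in the clinfo version strings, while the closed-source Intel runtime reports "Intel(R) OpenCL". A hypothetical helper, assuming typical clinfo output formats (the exact strings may differ between driver releases):

```shell
# Prints the first OpenCL stack name found on stdin; pipe `clinfo` into it,
# e.g.:  clinfo | detect_ocl_stack
detect_ocl_stack() { grep -m1 -oiE 'beignet[-.0-9]*|Intel\(R\) OpenCL'; }
```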
clinfo_dos.txt
@mattg-sp Alternatively, try to compile the most recent beignet from source: https://cgit.freedesktop.org/beignet/ The installation is successful if you can find the "beignet-1.x" string in clinfo.
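A build sketch for beignet from git master, assuming the standard cmake flow (the clone URL is inferred from the cgit page above, and prerequisites such as LLVM 3.6 development packages vary by distro):

```shell
# Fetch, build, and install beignet (paths and URL assumed; adjust to taste).
git clone https://cgit.freedesktop.org/beignet/ beignet
cd beignet && mkdir build && cd build
cmake .. && make -j"$(nproc)"
sudo make install
# Afterwards, `clinfo` should report a "beignet-1.x" platform string.
```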
@mattg-sp let's focus on one configuration at a time. I mean, all of my comments are for USE_INTEL_SPATIAL=ON. And I just saw your clinfo and confirmed that you are using the closed-source OpenCL compiler. But the version is a little out-of-date. Please change to the latest published version at https://software.intel.com/en-us/articles/opencl-drivers#latest_linux_driver. The clinfo should be:
@gongzg |
I believe the closed source SDK came first. It's understandable why people want open source, though. The reason we're using closed source is that we're also using Intel's Media SDK. I'll investigate whether beignet can be used in conjunction with that. |
@naibaf7 that's a little bit complicated. One of the reasons is that the open-source version is for Linux only, while the closed-source version is derived from the Windows OCL driver. @mattg-sp Thanks for your explanation, and yes, the closed-source SDK for Windows came first, then we had the open source for Linux, and then the OpenCL SDK began to support Linux. I would stop this discussion here; let's focus on the issue itself :). If you want to use beignet, the recommended beignet version is git master, and the recommended LLVM version is LLVM 3.6. LLVM evolves very quickly, and sometimes a newer version brings compatibility issues with beignet. You can check the link https://www.freedesktop.org/wiki/Software/Beignet/ which recommends LLVM 3.5 or 3.6. If you have a BDW or HSW machine and want to use the OpenCL SDK, I would suggest version "1.0.47971", which is what I am using on my BDW machine right now and should not have any issue running those benchmarks. If you have a SKL machine, you have beignet support only, so far.
By upgrading my i915 device driver, I was able to resolve the issue of slow tests. Now, all unit tests pass on the GPU except for these:
And those pass on the CPU device. The benchmark64 still hangs on the GPU device, however. I will now investigate using the OpenCL 2.0 runtime and Beignet. |
@mattg-sp
Thanks. I got benchmark32 to work on GPU 0 (Total Time: 12673.4 ms). Incidentally, it's about twice as fast as GPU 1 (Total Time: 25053.4 ms, on a CPU with 2 cores / 4 threads). Wait... now benchmark64 works, too. But I can still scroll back to last night and see the run that didn't work. Nothing has changed since then: no reboots, nor did I run or install anything until I started with benchmark1 this morning. I'm definitely not mistaken; I've checked over the parameters, and I can clearly see that I canceled the failed run after 7m2.418s. Update: even benchmark128 passed, three out of three times so far. Maybe I'll reboot and see if I can get it to hang again.
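For reference, the benchmark invocation being compared here would look like this. A sketch: the model file name is taken from this thread, and `-gpu` is assumed to select the OpenCL device id reported by device_query:

```shell
# Time the forward/backward passes of the 64-image benchmark model on
# device 0 (the HD Graphics GPU in this thread).
./build/tools/caffe time -model models/bvlc_alexnet/benchmark64.prototxt -gpu 0
```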
Oh, I was also going to ask whether any benchmark data from different platforms is collected anywhere. And are the unit test failures I mentioned a few posts ago anything to be concerned about? Are they likely to compromise the integrity of my results? |
@mattg-sp Which device is GPU and which one is CPU on your system (0 and 1)? If 0 is the GPU, then that's not too bad :) |
I have an i5-5300U and want to do inference on the integrated GPU. I can detect the GPU using clDeviceQuery. I compiled and installed Greentea with Intel OpenCL 1.2.
clDeviceQuery.txt
I can test my model on the CPU in ~1 second per image. When I switch to GPU mode, the inference doesn't finish; I have waited 6+ hours. The CPU runs at close to 100% during this run.
Is this normal? Has anyone got it to work on Intel integrated graphics?