
CLBlast support #32

Merged: 27 commits merged into naibaf7:master from psyhtest:CLBlast.support, May 13, 2016

Conversation

@psyhtest commented May 12, 2016

@naibaf7

I've implemented support for Cedric Nugteren's CLBlast library. The 0.6.0 version had a few issues, but the most recent 0.7.0 version seems to have addressed them. In addition, 0.7.0 added support for xASUM, which helped to keep the integration clean.
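
For reference, a host-side xASUM dispatch through CLBlast's C++ API looks roughly like the sketch below. This is illustrative only; the names (asum_on_device, bufX, bufSum) are mine, not the code in this PR, and the call pattern assumes the 0.7.0 C++ interface.

        #include <clblast.h>

        // Sketch of a host-side asum via CLBlast. CLBlast writes the result
        // of xASUM into a device buffer, so a one-element scratch buffer is
        // allocated and read back afterwards.
        float asum_on_device(const size_t N, const cl_mem bufX,
                             const size_t offX, cl_context ctx,
                             cl_command_queue queue) {
          cl_int err = CL_SUCCESS;
          cl_mem bufSum = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                         sizeof(float), nullptr, &err);
          auto status = clblast::Asum<float>(
            N,
            bufSum, 0,       // result buffer and its offset
            bufX, offX, 1,   // input vector, offset, increment
            &queue);
          float result = 0.0f;
          if (status == clblast::StatusCode::kSuccess) {
            // Blocking read of the single-element result back to the host.
            clEnqueueReadBuffer(queue, bufSum, CL_TRUE, 0, sizeof(float),
                                &result, 0, nullptr, nullptr);
          }
          clReleaseMemObject(bufSum);
          return result;
        }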

I've tested this integration on the Samsung Chromebook 2 with the ARM Mali-T628 GPU and v6.0 of the driver, skipping the known test failures (#28, #29, #30) that are currently open for that platform.

Please review.

@bhack commented May 12, 2016

/cc @CNugteren

@naibaf7 (Owner) commented May 13, 2016

@psyhtest
Nice one, thanks. Will be reviewed over the weekend.

@naibaf7 merged commit a95a523 into naibaf7:master on May 13, 2016
@naibaf7 (Owner) commented May 13, 2016

@psyhtest
Merged this into my branch for now, for people who want to test the cutting edge.
I'll do some cleanup and make lint corrections before pushing it to the BVLC repository.

@psyhtest deleted the CLBlast.support branch on May 14, 2016 at 13:17
@psyhtest (Author) commented

@naibaf7 Thanks!

You may notice that in the blocks dispatching calls into CLBlast I use different formatting and explicitly define some constants (e.g. incX, offY). I believe the clBLAS code would benefit from this too, as it would make it more readable, but I understand if you need to follow an established Caffe style.
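
For example, the explicit-constants style looks roughly like this (an illustrative axpy dispatch with hypothetical names, not a literal excerpt from the PR):

        #include <clblast.h>

        // Illustrative only: the offsets and increments are named constants
        // rather than bare literals at the call site.
        void axpy_example(const size_t N, const float alpha, const cl_mem X,
                          cl_mem Y, cl_command_queue queue) {
          const size_t offX = 0;  // offset into X
          const size_t incX = 1;  // stride of X
          const size_t offY = 0;  // offset into Y
          const size_t incY = 1;  // stride of Y
          clblast::Axpy<float>(
            N, alpha,
            X, offX, incX,
            Y, offY, incY,
            &queue);
        }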

Another thing is that even similar code blocks use different styles, e.g.:

        clblast::Scal<float>(
          N, // uppercase
          alpha,
          x, offx, incx, // all lowercase
          &queue);

        clblast::Asum<float>(
          n, // lowercase
          Z, offZ,
          X, offX, incX, // uppercase X, mixed case offX and incX
          &queue);

@bhack commented May 14, 2016

@naibaf7 How is this different from the autotuning code that you are writing?

@naibaf7 (Owner) commented May 14, 2016

@bhack
Greentea LibDNN autotuning, you mean? There I attempt to autotune a fused kernel that does not need an intermediate convolution buffer.
CLBlast is an autotuned BLAS that can be tested against ViennaCL and clBLAS for regular GEMM convolutions.
And of course a BLAS is also needed for other auxiliary operations in the network.
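
For context, a regular GEMM convolution dispatched through a BLAS reduces to a single Gemm call over the im2col-transformed input. A minimal sketch with CLBlast's C++ API follows; the names, dimensions, and row-major layout are illustrative assumptions, not the actual Caffe code:

        #include <clblast.h>

        // Sketch: forward convolution as one GEMM over im2col data.
        // A: weights (M x K), B: im2col buffer (K x N), C: output (M x N).
        void conv_gemm(const size_t M, const size_t N, const size_t K,
                       const cl_mem weights, const cl_mem col_buffer,
                       cl_mem output, cl_command_queue queue) {
          auto status = clblast::Gemm<float>(
            clblast::Layout::kRowMajor,
            clblast::Transpose::kNo, clblast::Transpose::kNo,
            M, N, K,
            1.0f,
            weights, 0, K,     // A, offset, leading dimension
            col_buffer, 0, N,  // B
            0.0f,
            output, 0, N,      // C
            &queue);
          (void)status;  // clblast::StatusCode::kSuccess indicates success
        }

A fused approach like LibDNN, by contrast, avoids materializing col_buffer entirely.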
