
Are you planning to develop the project further? #1

Open · mkaskov opened this issue Oct 17, 2016 · 18 comments
@mkaskov commented Oct 17, 2016

Nice project, very interesting.
I tested it on a Nexus 5 (Snapdragon 800).

Are you planning to develop the project further?

For example:

  • new networks: GoogLeNet (Inception), SqueezeNet, etc.
  • new layers: concat, RNNs of different types
@matinhashemi (Member) commented
Hi

Yes. In fact, we are 1) developing faster mobile GPU algorithms for currently supported layers, and 2) adding compressed neural networks. You are more than welcome to join the project.

Large models, e.g., googlenet, do not fit in the mobile memory and therefore are not part of our development plans, but compressed models, e.g., squeezenet, do fit and we are working on them.
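For rough scale (figures from the published models, not measurements of CNNdroid): AlexNet carries ~60M parameters, i.e., about 60M × 4 bytes ≈ 240 MB as 32-bit floats, while SqueezeNet reaches comparable accuracy with ~1.2M parameters, i.e., under 5 MB.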

Currently, we are focused more on CNNs than on RNNs.

Matin

@mkaskov (Author) commented Oct 18, 2016

Hi
I'm glad to hear it.

  1. On my device (Nexus 5), CNNdroid in parallel mode takes ~800-900 ms to recognize one image (CaffeNet).
  2. Then I tried to add SqueezeNet, but when creating the network I found that CNNdroid does not support the concat and dropout layers. I am now thinking about how to implement them in RenderScript (see the sketch after this list).
  3. Google TensorFlow has a mobile demo app for Android with a pruned inception5h network: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/examples/android I tested it on my device; it takes ~300-500 ms to recognize an image, but the application does not use the mobile GPU. inception5h takes 55 MB of storage.
  4. I tested a few other apps without mobile GPU support (native C++ on Android); their results were 700-1500 ms.
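As a starting point, here is a minimal sketch of what a channel-wise concat layer amounts to, assuming NCHW tensors stored as flat float arrays; the layout and names are illustrative assumptions, not CNNdroid's actual data structures:

```java
// Channel-wise concat for NCHW tensors stored as flat float arrays.
public class ConcatLayer {

    // out = concat(x, y) along the channel axis; x is [n, cx, h, w],
    // y is [n, cy, h, w], result is [n, cx + cy, h, w], all row-major.
    static float[] concatChannels(float[] x, int cx, float[] y, int cy,
                                  int n, int h, int w) {
        int plane = h * w;
        float[] out = new float[n * (cx + cy) * plane];
        for (int b = 0; b < n; b++) {
            // Per batch item: first the cx channels of x, then the cy channels of y.
            System.arraycopy(x, b * cx * plane,
                             out, b * (cx + cy) * plane, cx * plane);
            System.arraycopy(y, b * cy * plane,
                             out, (b * (cx + cy) + cx) * plane, cy * plane);
        }
        return out;
    }

    public static void main(String[] args) {
        // Two tiny feature maps: [1, 2, 2, 2] and [1, 1, 2, 2] -> [1, 3, 2, 2].
        float[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        float[] y = {9, 10, 11, 12};
        float[] out = concatChannels(x, 2, y, 1, 1, 2, 2);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

Since the per-batch copies are independent, this maps naturally onto a RenderScript kernel. Dropout is even easier: Caffe-style dropout only scales activations during training, so at inference the layer is the identity and can simply be skipped.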

@matinhashemi (Member) commented

Yes, the computation times are about what we have measured as well: about 1 second per image.

As I have mentioned, we are adding support for compressed models, e.g., SqueezeNet.

@mkaskov (Author) commented Oct 22, 2016

This is great news. It will be interesting to try.
Does pruning and/or quantizing the network model give a performance boost? Did you test that? (The preparation scripts deploy large *.msg files even for compact models.)
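For reference, a minimal sketch of one common flavor, post-training linear (min/max) 8-bit weight quantization; the scheme and class names are illustrative, not what CNNdroid actually ships:

```java
// Post-training linear (min/max) 8-bit quantization of a float weight array.
public class QuantizedWeights {
    final byte[] q;     // quantized values, interpreted as unsigned 0..255
    final float scale;  // real-valued step size
    final float zero;   // real value that q = 0 maps back to (the minimum)

    QuantizedWeights(float[] w) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float v : w) { min = Math.min(min, v); max = Math.max(max, v); }
        scale = (max > min) ? (max - min) / 255f : 1f;  // guard: constant weights
        zero = min;
        q = new byte[w.length];
        for (int i = 0; i < w.length; i++) {
            q[i] = (byte) Math.round((w[i] - min) / scale);
        }
    }

    // Recover an approximate float weight; & 0xFF restores the unsigned range.
    float dequantize(int i) {
        return (q[i] & 0xFF) * scale + zero;
    }

    public static void main(String[] args) {
        QuantizedWeights qw = new QuantizedWeights(new float[] {-1.0f, 0.25f, 1.0f});
        System.out.printf("%.3f %.3f %.3f%n",
                qw.dequantize(0), qw.dequantize(1), qw.dequantize(2));
        // -> -1.000 0.247 1.000 (small rounding error on the middle value)
    }
}
```

Quantization like this shrinks storage 4x on its own; whether it also speeds up inference depends on whether the kernels compute in 8-bit integers or just dequantize back to float.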

@pavelgonchar commented
Hi,
Is there an advantage to using the GPU-accelerated CNNdroid over CPU-only TensorFlow? Do you have any idea how much faster it is when running comparable networks?

@mkaskov (Author) commented Nov 6, 2016

In the TensorFlow app, Google uses the Inception network. It is one of the best networks in terms of accuracy per computation cost: close to SqueezeNet, but more accurate. Google also uses pruning and quantization to decrease the computation cost.
@matinhashemi said that they are planning to release SqueezeNet. With a pruned and quantized model it may provide a performance boost, maybe ...

@matinhashemi (Member) commented

Hi,
We have not tested TensorFlow yet.

@shrutisharmavsco commented
@mkaskov were you able to run the demo projects on a Nexus 5 without any changes? I am having trouble running them as-is on a Nexus 5 with Android 6.0; I've had to make a lot of changes to the project already.

@mkaskov (Author) commented Dec 9, 2016

@shrutisharmavsco I had to make a lot of additions to get the code running.
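For anyone else hitting this, one likely culprit on Android 6.0 specifically is runtime permissions, since CNNdroid loads its model files from external storage; this is a guess, as the thread doesn't say which changes were needed. A minimal sketch using the support library of that era:

```java
import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;
import android.support.v4.app.ActivityCompat;
import android.support.v4.content.ContextCompat;

public class PermissionHelper {
    static final int REQUEST_STORAGE = 1;  // arbitrary local request code

    // Ask for storage access at runtime; on Android < 6.0 the check passes
    // immediately because manifest permissions are granted at install time.
    static void ensureStoragePermission(Activity activity) {
        if (ContextCompat.checkSelfPermission(activity,
                Manifest.permission.READ_EXTERNAL_STORAGE)
                != PackageManager.PERMISSION_GRANTED) {
            ActivityCompat.requestPermissions(activity,
                    new String[]{ Manifest.permission.READ_EXTERNAL_STORAGE },
                    REQUEST_STORAGE);
        }
    }
}
```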

@AlexandreBriot commented
Regarding the CNNdroid / Android TensorFlow demo comparison, I did some runtime measurements on an HTC One M9.

As mentioned in your paper, with CNNdroid you can process a forward pass within 700 ms per image for a batch size of 16 under optimal conditions. For a batch size of one, runtime is around 1 second for me.

I tested the same AlexNet model by converting it from BVLC Caffe to a TensorFlow .pb file.
Running this model instead of the default Inception model, I checked the mean inference time displayed in the TensorFlow demo's logcat.

  • With the default Bazel build, I got 615 ms inference time, which is better than CNNdroid regardless of GPU usage.

  • Testing the demo with a Gradle build, I realized that inference time was much longer, around 1500 ms. In that case (both solutions built with Gradle), CNNdroid is twice as fast as the TensorFlow demo.

I got a similar difference when comparing inference times for the TensorFlow demo with the Inception model built with Bazel vs. Gradle (480 ms vs. 990 ms).

I have very little knowledge of Gradle/Bazel; does it seem surprising to you to observe such a difference? Do you think we could get a similar improvement by building CNNdroid with Bazel?
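A hedged guess at the cause, since the exact build configurations aren't shown in this thread: the Bazel path compiles the demo's native TensorFlow library from source with optimizations (the demo's instructions use something like `bazel build -c opt //tensorflow/examples/android:tensorflow_demo`), whereas a Gradle build can end up packaging a native library built without those flags; a 2-3x gap between unoptimized and optimized builds of Eigen-heavy code would not be surprising. Building CNNdroid with Bazel would likely not help, though, since its hot loops run in RenderScript kernels rather than in C++ compiled by the app's build system.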

@filipetrocadoferreira commented
Are quantization and pruning on the roadmap for CNNdroid? A quantized SqueezeNet in CNNdroid would be nice :3

@latifisalar (Member) commented
Yes, quantization and pruning are in the pipeline, and we're working to add support for them.

@michaelholm-ce commented
Roughly how close are you to SqueezeNet support? I am interested in trying it when the time comes.

@latifisalar (Member) commented
Sorry for the delayed response; the implementation of version 1 is more or less ready and will hopefully be pushed to the repository next month.

@llhe commented Mar 9, 2017

The underlying libraries for TensorFlow are Eigen and gemmlowp.
Has any benchmark been done comparing the CPU conv/matmul implementations against the GPU shader versions?
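For anyone wanting to try, here is a minimal, self-contained sketch of the CPU-side baseline such a benchmark could start from; the matrix size and the naive loop are illustrative assumptions, not CNNdroid's or TensorFlow's actual kernels, and on Android the same harness would wrap the RenderScript kernel for the GPU side:

```java
// Times a naive single-threaded float matmul as a CPU baseline.
public class MatMulBenchmark {

    // Naive O(n^3) matrix multiply: c += a * b, all matrices n x n, row-major.
    static void matmul(float[] a, float[] b, float[] c, int n) {
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                float aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
    }

    public static void main(String[] args) {
        int n = 512;
        float[] a = new float[n * n], b = new float[n * n], c = new float[n * n];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < n * n; i++) { a[i] = rnd.nextFloat(); b[i] = rnd.nextFloat(); }

        matmul(a, b, c, n);          // warm-up pass so the JIT compiles the loop
        java.util.Arrays.fill(c, 0f);

        long t0 = System.nanoTime();
        matmul(a, b, c, n);
        long t1 = System.nanoTime();
        double gflops = 2.0 * n * n * n / (t1 - t0);  // flops per ns == GFLOP/s
        System.out.printf("%d x %d matmul: %.1f ms, %.2f GFLOP/s%n",
                n, n, (t1 - t0) / 1e6, gflops);
    }
}
```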

@latifisalar (Member) commented
Hi,
Unfortunately, we have not done any of the mentioned benchmarks.

@siftr commented Mar 20, 2017

Hello, I am looking forward to SqueezeNet support in CNNdroid; roughly when will it be available?

@latifisalar (Member) commented
Hi,
It should be ready quite soon; I will post an update on the approximate release time shortly.
