Unexplained performance difference between tflite int8 and fp32 models #13

Open
arun-kumark opened this issue Aug 6, 2021 · 0 comments

Dear all,
I am testing the performance/throughput of fp32 and quantized models on my platform. My configuration is as follows:

tflite-runtime==2.5.0.post1
tensorflow==1.14.0

FP32 on CPU

-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:00:28.796083
-INFO- Throughput: 35.8 fps
-INFO- Latency: 29.5 ms
-INFO- Target          Workload        H/W   Prec  Batch Conc. Metric       Score    Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet       cpu   fp32      1     1 throughput    35.8      fps
-INFO- tensorflow_lite mobilenet       cpu   fp32      1     1 latency       29.5       ms
-INFO- Total runtime: 0:00:28.830364
-INFO- Done

INT8 on CPU

google@localhost:~/mlmark$ harness/mlmark.py -c config/tflite-cpu-mobilenet-int8-throughput.json 
-INFO- Running prediction...
-INFO- Acquired 1 file(s) for model 'MobileNet v1.0'
-INFO- Task runtime: 0:01:00.933346
-INFO- Throughput: 16.9 fps
-INFO- Latency: 65. ms
-INFO- Target          Workload        H/W   Prec  Batch Conc. Metric       Score    Units
-INFO- -----------------------------------------------------------------------------------
-INFO- tensorflow_lite mobilenet       cpu   int8      1     1 throughput    16.9      fps
-INFO- tensorflow_lite mobilenet       cpu   int8      1     1 latency       65.        ms
-INFO- Total runtime: 0:01:00.960828
-INFO- Done

Observations: The throughput of the FP32 model is almost double that of the INT8 model on CPU, but Google's TensorFlow Lite benchmarks report the opposite:
https://www.tensorflow.org/lite/guide/hosted_models#quantized_models
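
In case it helps to reproduce, a minimal standalone timing sketch along these lines (outside the MLMark harness; the model file names below are placeholders for whatever models the harness actually loads) could isolate whether the gap comes from the harness or from the tflite-runtime interpreter itself. It assumes tflite-runtime 2.5, where the Interpreter constructor accepts num_threads:

```python
import time

import numpy as np
from tflite_runtime.interpreter import Interpreter

def bench(model_path, runs=200, threads=1):
    # Load the model with a fixed thread count so fp32 and int8 are
    # compared under the same conditions.
    interpreter = Interpreter(model_path=model_path, num_threads=threads)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    shape, dtype = tuple(inp['shape']), inp['dtype']
    # Random input matching the model: uint8 for the quantized MobileNet,
    # float32 for the fp32 one.
    if np.issubdtype(dtype, np.integer):
        data = np.random.randint(0, 256, size=shape, dtype=dtype)
    else:
        data = np.random.random_sample(shape).astype(dtype)
    interpreter.set_tensor(inp['index'], data)
    interpreter.invoke()  # warm-up run
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000.0  # ms/inference

# Placeholder file names; substitute the models the harness uses.
for path in ('mobilenet_v1_1.0_224.tflite',
             'mobilenet_v1_1.0_224_quant.tflite'):
    print('%s: %.1f ms' % (path, bench(path)))
```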

I also tried replacing the models with the ones from the hosted location above, but the harness gives similar results.

Could you let me know where this is going wrong?

Thanks
Kind Regards
Arun
