'chunk size' = 128MB and 256MB, with automatic chunking scheme not creating 128MB or 256MB of chunk size. #30

Open
tinaok opened this issue Sep 27, 2019 · 3 comments

@tinaok
Collaborator

tinaok commented Sep 27, 2019

I created a 'debug' branch in my repo. For each operation, it prints the dataset information, including the chunk size. I compiled the debug output into the following table:
https://github.com/tinaok/benchmarking/blob/debug/chunksize.md

For the automatic chunking scheme, I tried to recompute the size of each chunk in bytes from the number of elements per chunk shown in the debug output.
For the benchmark with 512MB, chunk size = (520, 384, 320): this gives 520 * 384 * 320 * 8 = 511MB, so it is OK.
But for 256MB, chunk size = (317, 192, 160): this gives 317 * 192 * 160 * 8 = 78MB, so the requested chunk size is not being honored.
Same for 128MB, chunk size = (251, 192, 160): this gives 251 * 192 * 160 * 8 = 62MB, so the requested chunk size is not being honored either.
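
The arithmetic above can be re-checked with a short script. This is only a sketch: the helper name `chunk_megabytes` is hypothetical, the 8 bytes per element assumes float64 (matching the `* 8` factor used above), 1 MB is taken as 10**6 bytes, and the 512MB shape is taken from the factors used in that calculation.

```python
from math import prod

ITEMSIZE = 8  # bytes per element; assumes float64, as in the "* 8" factor above


def chunk_megabytes(chunk_shape, itemsize=ITEMSIZE):
    """Return the size of one chunk in MB, with 1 MB = 10**6 bytes."""
    return prod(chunk_shape) * itemsize / 1e6


# Requested chunk size (MB) -> chunk shape used in the calculations above.
observed = {
    512: (520, 384, 320),
    256: (317, 192, 160),
    128: (251, 192, 160),
}

for requested_mb, shape in observed.items():
    actual_mb = chunk_megabytes(shape)
    share = actual_mb / requested_mb
    print(f"requested {requested_mb}MB, chunks {shape}: "
          f"{actual_mb:.0f}MB ({share:.0%} of the requested size)")
```

Printing the share of the requested size makes the gap easy to see: the 512MB case lands close to its target, while the 256MB and 128MB cases fall well below theirs.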

Any idea what's going on here?

@tinaok
Collaborator Author

tinaok commented Sep 27, 2019

I wrote a small script to reproduce this problem:
https://gist.github.com/tinaok/c2ef193e94508a5ba426979d01e99307
With chunk_size at 380MB or above, automatic chunking creates chunks of the requested size (e.g. 380MB). With chunk_size at 370MB or less, it does not: the chunks come out at less than half the requested size.

@tinaok changed the title from "'chunk size' = 128MB and 256MB, with automatic changing scheme not reproducing 128MB or 256MB of chunking size?" to "'chunk size' = 128MB and 256MB, with automatic chunking scheme not creating 128MB or 256MB of chunk size." on Sep 29, 2019
@tinaok
Collaborator Author

tinaok commented Oct 1, 2019

dask/dask#5439 (comment)
Hi @andersy005 & @kmpaul,
As you can see from this issue, 'auto' does not create chunks of exactly the size set with dask.config.set({"array.chunk-size": '256MB'}).
For example, with the setting at 256MB, it created chunks of only 78MB.
Any thoughts on this?

@andersy005
Member

@tinaok, thank you for pointing this out! This is something worth looking into. I will follow the discussion in dask/dask#5439.
