
Limit on number of global arrays that can be created #3

Open
cjroberts opened this issue Mar 21, 2019 · 10 comments

@cjroberts

Hello, we would like to use ga4py to parallelise the cf-python library for the analysis of geoscientific data (https://cfpython.bitbucket.io/). Empirically, we found a hard limit of 32768 on the number of global arrays that can be created. For some of our use cases we would like to create on the order of millions of arrays simultaneously. Could you tell us why this limit exists and whether it would be possible to overcome it?

@bjpalmer
Member

Charles,

GA maintains an internal list of statically allocated array descriptors that store metadata for global arrays. This list is replicated on every MPI process, so 32768 was chosen as a compromise: large enough to cover most use cases, but not so large that it takes up a significant fraction of available memory. If you want to increase the number of allowable GAs, you can do so fairly easily by setting the MAX_ARRAYS value in global/src/gaconfig.h to whatever value you like.

@cjroberts
Author

Thank you for your quick reply.

Is there a way to estimate how much memory the internal list will require given the MAX_ARRAYS value?

@bjpalmer
Member

For global arrays configured with --enable-i8, the size of the array descriptor is around 820 bytes (it should be smaller for --enable-i4). The total memory for the internal list will therefore be around 820 * MAX_ARRAYS bytes. If you need a precise number, you can get a more accurate value by adding a statement that prints the value of MAX_ARRAYS * sizeof(global_array_t) in the pnga_initialize function in global/src/base.c.
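As a rough back-of-the-envelope check, the per-process cost of the replicated descriptor table can be estimated like this (a minimal sketch assuming the ~820-byte descriptor size quoted above; the exact size depends on the build configuration):

```python
# Rough memory estimate for GA's replicated array-descriptor table.
# Assumes ~820 bytes per descriptor (--enable-i8 build), per the figure above.
DESCRIPTOR_BYTES = 820

def descriptor_table_bytes(max_arrays: int) -> int:
    """Approximate per-process memory used by the internal descriptor list."""
    return DESCRIPTOR_BYTES * max_arrays

# Default build (MAX_ARRAYS = 32768): about 25.6 MiB per MPI process.
print(descriptor_table_bytes(32768) / 2**20)

# Raising MAX_ARRAYS to one million: about 0.76 GiB per MPI process.
print(descriptor_table_bytes(1_000_000) / 2**30)
```

So a million-array limit costs well under 1 GiB per process, which may be tolerable on high-memory nodes but is significant on memory-constrained ones.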

@cjroberts
Author

Thank you, that makes sense. We might also be able to improve our memory-management strategy, but we would like to support users in increasing the maximum number of arrays, since the extra memory may be acceptable on high-memory nodes. Could I also ask whether the value of MAX_ARRAYS can be accessed from the Python layer, so that if a user has changed it we can detect the new limit automatically and keep track of when the number of arrays created gets too large?

@bjpalmer
Member

At the moment there is no way to do this, although I suppose it could be added.

@cjroberts
Author

Please would it be possible to add this to your list of feature requests?

@bjpalmer
Member

So you are looking for a function that is roughly

int GA_Max_allowable_global_arrays()?

@cjroberts
Author

Yes, that's right: a function we can call from ga4py.
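On the Python side, the requested binding could feed a simple bookkeeping guard. This is a hypothetical sketch only: neither GA_Max_allowable_global_arrays() nor a ga4py wrapper for it exists at the time of this thread, and ArrayBudget is an invented helper name.

```python
# Hypothetical sketch: guarding against exhausting GA's descriptor table,
# assuming a future binding exposed the compile-time MAX_ARRAYS value.
# ArrayBudget is an illustrative helper, not part of ga4py.

class ArrayBudget:
    """Track created global arrays against a compile-time limit."""

    def __init__(self, max_arrays: int):
        self.max_arrays = max_arrays
        self.created = 0

    def reserve(self, n: int = 1) -> None:
        """Claim n descriptor slots, raising before the hard limit is hit."""
        if self.created + n > self.max_arrays:
            raise RuntimeError(
                f"would exceed MAX_ARRAYS={self.max_arrays} "
                f"({self.created} arrays already created)"
            )
        self.created += n

# With the proposed binding the limit would come from the C layer, e.g.:
#   budget = ArrayBudget(ga.max_allowable_global_arrays())   # hypothetical
budget = ArrayBudget(32768)   # default MAX_ARRAYS as a stand-in
budget.reserve(100)
```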

@cjroberts
Author

Hello, we just wanted to let you know that we are currently pursuing another approach: using a single massive 1-D 8-bit integer global array to store large numbers of smaller arrays of different shapes and data types. So we might not need this function after all, although we are not yet sure which approach we will ultimately take. Thank you for your time and for the information on how Global Arrays works.
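For readers curious about the packing approach described above, here is a minimal illustration of the idea, with a NumPy array standing in for the single large 1-D int8 global array (a real implementation would read and write byte ranges of the GA via ga.get/ga.put; PackedStore is an invented name):

```python
import numpy as np

# Sketch: store many small arrays of mixed shapes and dtypes inside one flat
# 8-bit integer buffer, keeping an (offset, shape, dtype) index entry per array.

class PackedStore:
    def __init__(self, nbytes: int):
        self.buf = np.zeros(nbytes, dtype=np.int8)   # stand-in for the 1-D int8 GA
        self.index = {}                              # name -> (offset, shape, dtype)
        self.top = 0                                 # next free byte

    def put(self, name, arr):
        """Copy arr's raw bytes into the buffer and record where they went."""
        raw = np.ascontiguousarray(arr).view(np.int8).ravel()
        self.buf[self.top:self.top + raw.size] = raw
        self.index[name] = (self.top, arr.shape, arr.dtype)
        self.top += raw.size

    def get(self, name):
        """Reinterpret the stored bytes back into the original shape and dtype."""
        off, shape, dtype = self.index[name]
        nbytes = int(np.prod(shape)) * dtype.itemsize
        return self.buf[off:off + nbytes].view(dtype).reshape(shape)

store = PackedStore(1024)
store.put("a", np.arange(6, dtype=np.float64).reshape(2, 3))
store.put("b", np.array([1, 2, 3], dtype=np.int32))
print(store.get("a"))   # recovers the original 2x3 float64 array
```

This trades the per-array descriptor cost for the bookkeeping of an index, so only one GA descriptor slot is consumed regardless of how many logical arrays are stored.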
