Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need ygm::container::unordered_map or similar #118

Open
bwpriest opened this issue Dec 22, 2022 · 18 comments
Open

Need ygm::container::unordered_map or similar #118

bwpriest opened this issue Dec 22, 2022 · 18 comments

Comments

@bwpriest
Copy link
Member

It will be useful for performance reasons to maintain a distributed key-value store with an interface like ygm::container::map that uses a hashmap under the hood instead of std::map. std::unorderd_map is a logical choice, but I gather that we will need to do something to ensure that we use different hash functions for assignment of keys to ranks versus keys to hash bins.

@rogerpearce
Copy link
Collaborator

@bwpriest, ++ on this.

However, I would want us to fully convert to a separated key/value interface instead of the std::pair<const key,value>, including for_all(), before we start designing a new hash table. I'm not a big fan of std::unordered_map; IMHO, its not a big enough win over std::map to justify the effort. Fortunately, for our standard insert-more-than-delete use cases, there are much better simple tables we could design.

Perhaps step 1 is to overhaul ygm::map and ygm::multimap to remove all echoes of std::pair<const key,value>.

@bwpriest
Copy link
Member Author

@rogerpearce if we're ready to lose all support for std::pair arguments in map lambdas, then I can make that change swiftly. I think I already converted all of the internal and example lambdas to the new format.

@rogerpearce
Copy link
Collaborator

Lets wait to hear from @steiltre since he will have external code to update. I think its worth the pain to do this now.

@rogerpearce
Copy link
Collaborator

One thing I thought about: For array, we probably want to support both:

array.for_all ( [](size_t index, auto& value) {} ) ;
array.for_all ( [](auto& value) {} ) ;

Also, we need a concept to determine at compile time what the interface is, similar to what "value_type" does for STL. We could have a "visit_types" that is a tuple and we just assume that its interpreted as separate arguments to a callback.

@bwpriest
Copy link
Member Author

array.for_all ( [](size_t index, auto& value) {} ) ; array.for_all ( [](auto& value) {} ) ;

This will be easy to manage for local lambdas. If we want to do something similar for remote lambdas, we'll be unable to distinguish between pointer-value and index-value intents with [](auto, auto &){} signatures.

Also, we need a concept to determine at compile time what the interface is, similar to what "value_type" does for STL. We could have a "visit_types" that is a tuple and we just assume that its interpreted as separate arguments to a callback.

I don't understand what you are proposing here. What would the elements of the visit_types tuple be? index_type and value_type? (index_type, value_type) and value_type?

@steiltre
Copy link
Collaborator

steiltre commented Feb 24, 2023

Lets wait to hear from @steiltre since he will have external code to update. I think its worth the pain to do this now.

I think I've managed to get all of the code I care about updated to the new map API today, or have pegged code I don't plan on supporting/updating to use YGM v0.5. I will merge in the associated PR to develop and send out a warning to anyone this may affect.

@steiltre
Copy link
Collaborator

One thing I thought about: For array, we probably want to support both:

array.for_all ( [](size_t index, auto& value) {} ) ; array.for_all ( [](auto& value) {} ) ;

Any reason we want this for array.for_all and not array.async_visit? I'm not advocating for a more complicated interface (especially given the shortcomings of this functionality highlighted above), just trying to get more context.

@rogerpearce
Copy link
Collaborator

I want it for_all() so that bag and array can behave similarly. I don't see the need for async_visit() to have both interfaces, because bag doesn't have an async_visit.

@bwpriest
Copy link
Member Author

bwpriest commented Feb 24, 2023

I want it for_all() so that bag and array can behave similarly.

In that case, does it make sense to also support:

map.for_all([](value_type &){...});

@rogerpearce
Copy link
Collaborator

I don't see the need for it for map now, but could be convinced if you think it would be useful.

I think both Bag & Array should let you loop over all the elements without an index. Map its natural to loop overall all keys and values. Could have for_all_keys and for_all_values, but again, not necessary right now.

@bwpriest
Copy link
Member Author

I don't see the need for it for map now, but could be convinced if you think it would be useful.

My main argument for doing this is to maintain a similar interface for map and array. In a lot of my codes I like to template the index-feature container away and swap in what I want to use at compile time. For example, if I have my value_type == std::vector<float>, I might want to multiply them all by a normalizing scalar. In this case, I don't care whether the data is stored in an array or map. I just want to normalize the features.

@rogerpearce
Copy link
Collaborator

So would a for_all_values be useful for this? Seems so.

Bag could support for_all_values, and then make it consistent with array, which is really what I'm after.

In some ways we have Array acting as both a value container and an associative container.

When we do finally get a tagged_bag container, will naturally have the key/value concept too.

@steiltre
Copy link
Collaborator

steiltre commented Mar 2, 2023

Trying to summarize the lambda interfaces that everyone wants.

bag::for_all() : (value_type)
bag::for_all_values() : (value_type)

array::for_all() : (index_type, value_type)
array::for_all_values() : (value_type)

map::for_all() : (index_type, value_type)
map::for_all_values() : (value_type)

Does this support everyone's desired functionality? Or do we need to add in some constexpr checks and support array::for_all() and/or map::for_all() that take only value_type as well?

@bwpriest
Copy link
Member Author

bwpriest commented Mar 2, 2023

Or do we need to add in some constexpr checks and support array::for_all() and/or map::for_all() that take only value_type as well?

I don't have a use case for that right now, but I could imagine wanting to for_all() over map keys if they are interesting, like strings

@rogerpearce
Copy link
Collaborator

I thought we decided to do this via adapters, instead of for_all_values and for_all_keys, no?

I do think we should have some typedefs:

key_type -- could be missing for Bag
value_type -- could be missing for set
for_all_types -- is a tuple of arguments that for_all() matches to using std::apply. Could be used to validate lambda with is_invocable(std::apply(lambda, for_all_types))

@rogerpearce
Copy link
Collaborator

I think I figured out how to compile time test if a lambda is properly invocable and if its STL (std::for_each) vs YGM (for_all()).

If we had conatiner::for_each_types it could be robust.

@rogerpearce
Copy link
Collaborator

Actually, we should call them:

for_all_args
async_args
async_args_optional

Each should be a tuple that is expanded before is_invocable.

@steiltre
Copy link
Collaborator

steiltre commented Mar 2, 2023

I thought we decided to do this via adapters, instead of for_all_values and for_all_keys, no?

Ah. I do think that was the idea. Just saw PR #135 and forgot we were trying to handle this more generically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants