Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prov/rxm : Corrected rxm to return an existing MR key in its MR Map. #10371

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nikhilnanal
Copy link
Contributor

if an key/item already exists in its MR map rxm returns an error instead of reflecting the behaviour of the underlying core provider viz returning a key. This patch corrects the behaviour by returning a success to make sure it is consistent with the core provider's behaviour.

if an key/item already exists in its MR map rxm returns an error instead of
reflecting the behaviour of the underlying core provider viz returning a key.
This patch corrects the behaviour by returning a success to make sure it is
consistent with the core provider's behaviour.

Signed-off-by: Nikhil Nanal <[email protected]>
Comment on lines +375 to +378
if (ret != -FI_ENOKEY)
FI_WARN(&rxm_prov, FI_LOG_DOMAIN,
"MR map insert for atomic verification failed %d\n",
ret);
Copy link
Contributor

@j-xiong j-xiong Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of FI_ENOKEY, the key in the mr_map is still associated with an old rxm_mr, which apparently is still valid. We are now in a situation that a key is associated with multiple rxm_mr's, but only one is present in the mr_map.

In rxm_handle_atomic_req, rxm_mr_get_map_entry is called to retrieve the rxm_mr associated with the key. In this case, the old rxm_mr is going to be referenced. That may or may not work as expected. If one close the old rxm_mr. The key will be removed from the mr_map. The atomic req processing would fail even though it shouldn't.

I don't have a good solution for this. It's basically a behavior confliction of cached MR at verbs level and non-cached behavior of the mr_map at rxm level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to do a lookup before inserting the rxm_mr and only insert if lookup fails?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't make any difference. The mr_map is mainly used to find rxm_mr with a key. There can be only one association in mr_map, but we have multiple rxm_mr's.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants