Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning success count from the .populate() call #1050

Merged
merged 23 commits into from
Oct 9, 2023

Conversation

ttngu207
Copy link
Contributor

@ttngu207 ttngu207 commented Sep 2, 2022

Introduce a new argument return_success_count to the .populate() routine, that would return the count of successful make() calls (i.e. number of successful jobs) in one .populate() call.

This functionality allows for us to answer the question: "Did anything happen in my .populate() call?" without having to call the key_source again (e.g. with .progress())

if suppress_errors:
return error_list
if return_success_count:
return sum(success_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If success_list is only used to report the sum, why not use a counter instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with this approach because of the multi-processing logic - calling mp.Pool(). I'm not sure if, say success_count += 1, is robust in the case of parallel multi-processing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense.

Comment on lines 278 to 283
if suppress_errors and return_success_count:
return sum(success_list), error_list
if suppress_errors:
return error_list
if return_success_count:
return sum(success_list)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tuple returns are tough because the order matters. How about we return this as a dict instead?

Suggested change
if suppress_errors and return_success_count:
return sum(success_list), error_list
if suppress_errors:
return error_list
if return_success_count:
return sum(success_list)
ret = {}
if suppress_errors:
ret['error_list'] = error_list
if return_success_count:
ret['success_count'] = sum(success_list)
return ret

This would be a backward-incompatible change, so we need to update users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use this dictionary return, then we can always return the success count and the user does not need to request it.

Copy link
Contributor Author

@ttngu207 ttngu207 Dec 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, but as you said, the biggest thing is breaking backward compatibility and if we can avoid it, we should.
Breaking backward-compatibility for such a minor new feature is not worth it in my view.

Tuple returns are not as clean, but we can compensate with clear documentation of this feature in the docstring

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to break backward compatibility for better design decisions long-term.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, the docstring did not specify the return argument. So it's okay to introduce a modification now with proper documentation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitri-yatsenko , yes, docstring did not specify the return argument. But we would still break backward compatibility if we make the changes you suggested. I'm okay with it if you're okay with it

@@ -322,6 +338,7 @@ def _populate1(
)
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this True. The docstring needs an update to explain. It's a bit odd to have so many different return types.

Copy link
Contributor Author

@ttngu207 ttngu207 Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about returning True/False for success/failure instead of None (if suppress_errors=False)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does it ever return False?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you should add return False to the case when the function returns without populating anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not populating anything constitute a False? False seems to indicate failure (maybe that's just me thinking so).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current function produces True if successful, None if nothing is populated, throws an exception, or returns (key, error). It never returns False. It seems that if a function returns True, it should also return False in other cases. Perhaps instead of None, we should return False. It would not indicate an error, but simply not populating for other reasons than an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I'll update accordingly

@ttngu207
Copy link
Contributor Author

Hi @dimitri-yatsenko , circling back to this as we need this feature for sciops.
I do like much better the return for autopopulate as dict, very clean. But that means breaking backward compatibility. So, are we firm with the decision to do this? What is our protocol for these kind of backward incompatible changes - new major version, warning statement, etc.?

Copy link
Member

@dimitri-yatsenko dimitri-yatsenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you update the CHANGELOG?

@@ -176,6 +177,8 @@ def populate(
:param max_calls: if not None, populate at most this many keys
:param display_progress: if True, report progress_bar
:param processes: number of processes to use. Set to None to use all cores
:param return_success_count: if True, return the count of successful `make()` calls.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the docstring appears to be missing a description of the return argument.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we omit this input and always include the success count as part of the return dict?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unclear what you mean here. The docstring is included there.
Do you mean it is not descriptive enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you mean

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring should include :return: ...

restriction = self.subject.proj(animal="subject_id").fetch("KEY")[0]
d = self.trial.connection.dependencies
d.load()
success_count, _ = self.trial.populate(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will populate return a dict now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the test should reflect that populate returns a dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

make_kwargs=make_kwargs,
)

if processes == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is processes == 0 handled? Perhaps

if not processes:
       return {
            "success_count": 0,
            "error_list": [],
        }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is handled in the else block

datajoint/autopopulate.py Outdated Show resolved Hide resolved
datajoint/autopopulate.py Outdated Show resolved Hide resolved
@@ -322,6 +338,7 @@ def _populate1(
)
if jobs is not None:
jobs.complete(self.target.table_name, self._job_key(key))
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current function produces True if successful, None if nothing is populated, throws an exception, or returns (key, error). It never returns False. It seems that if a function returns True, it should also return False in other cases. Perhaps instead of None, we should return False. It would not indicate an error, but simply not populating for other reasons than an error.

Comment on lines 247 to 252
status = self._populate1(key, jobs, **populate_kwargs)
if status is not None:
if isinstance(status, tuple):
error_list.append(status)
elif status:
success_list.append(1)
Copy link
Member

@dimitri-yatsenko dimitri-yatsenko Oct 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change _populate to return True, False, or (key, error)

Suggested change
status = self._populate1(key, jobs, **populate_kwargs)
if status is not None:
if isinstance(status, tuple):
error_list.append(status)
elif status:
success_list.append(1)
status = self._populate1(key, jobs, **populate_kwargs)
if status is True:
success_list.append(1)
elif isinstance(status, tuple):
error_list.append(status)
else:
assert status is False

@dimitri-yatsenko dimitri-yatsenko changed the title New kwarg for autopopulate - returning success count after the .populate() call Returning success count from the .populate() call Oct 9, 2023
@dimitri-yatsenko dimitri-yatsenko merged commit 10511e7 into datajoint:master Oct 9, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants