-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning success count from the .populate()
call
#1050
Returning success count from the .populate()
call
#1050
Conversation
datajoint/autopopulate.py
Outdated
if suppress_errors: | ||
return error_list | ||
if return_success_count: | ||
return sum(success_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If success_list is only used to report the sum, why not use a counter instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with this approach because of the multi-processing
logic - calling mp.Pool()
. I'm not sure if, say success_count += 1
, is robust in the case of parallel multi-processing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense.
datajoint/autopopulate.py
Outdated
if suppress_errors and return_success_count: | ||
return sum(success_list), error_list | ||
if suppress_errors: | ||
return error_list | ||
if return_success_count: | ||
return sum(success_list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tuple returns are tough because the order matters. How about we return this as a dict instead?
if suppress_errors and return_success_count: | |
return sum(success_list), error_list | |
if suppress_errors: | |
return error_list | |
if return_success_count: | |
return sum(success_list) | |
ret = {} | |
if suppress_errors: | |
ret['error_list'] = error_list | |
if return_success_count: | |
ret['success_count'] = sum(success_list) | |
return ret |
This would be a backward-incompatible change, so we need to update users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use this dictionary return, then we can always return the success count and the user does not need to request it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, but as you said, the biggest thing is breaking backward compatibility and if we can avoid it, we should.
Breaking backward-compatibility for such a minor new feature is not worth it in my view.
Tuple returns are not as clean, but we can compensate with clear documentation of this feature in the docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to break backward compatibility for better design decisions long-term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, the docstring did not specify the return argument. So it's okay to introduce a modification now with proper documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dimitri-yatsenko , yes, docstring did not specify the return argument. But we would still break backward compatibility if we make the changes you suggested. I'm okay with it if you're okay with it
datajoint/autopopulate.py
Outdated
@@ -322,6 +338,7 @@ def _populate1( | |||
) | |||
if jobs is not None: | |||
jobs.complete(self.target.table_name, self._job_key(key)) | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this True
. The docstring needs an update to explain. It's a bit odd to have so many different return types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about returning True/False
for success/failure instead of None (if suppress_errors=False
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When does it ever return False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then you should add return False
to the case when the function returns without populating anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does not populating anything constitute a False
? False
seems to indicate failure (maybe that's just me thinking so).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current function produces True
if successful, None
if nothing is populated, throws an exception, or returns (key, error)
. It never returns False
. It seems that if a function returns True
, it should also return False
in other cases. Perhaps instead of None
, we should return False
. It would not indicate an error, but simply not populating for other reasons than an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'll update accordingly
Hi @dimitri-yatsenko , circling back to this as we need this feature for sciops. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you update the CHANGELOG?
datajoint/autopopulate.py
Outdated
@@ -176,6 +177,8 @@ def populate( | |||
:param max_calls: if not None, populate at most this many keys | |||
:param display_progress: if True, report progress_bar | |||
:param processes: number of processes to use. Set to None to use all cores | |||
:param return_success_count: if True, return the count of successful `make()` calls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the docstring appears to be missing a description of the return argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we omit this input and always include the success count as part of the return dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit unclear what you mean here. The docstring is included there.
Do you mean it is not descriptive enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you mean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring should include :return: ...
tests_old/test_autopopulate.py
Outdated
restriction = self.subject.proj(animal="subject_id").fetch("KEY")[0] | ||
d = self.trial.connection.dependencies | ||
d.load() | ||
success_count, _ = self.trial.populate( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will populate return a dict now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test should reflect that populate
returns a dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
make_kwargs=make_kwargs, | ||
) | ||
|
||
if processes == 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is processes == 0
handled? Perhaps
if not processes:
return {
"success_count": 0,
"error_list": [],
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is handled in the else
block
datajoint/autopopulate.py
Outdated
@@ -322,6 +338,7 @@ def _populate1( | |||
) | |||
if jobs is not None: | |||
jobs.complete(self.target.table_name, self._job_key(key)) | |||
return True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current function produces True
if successful, None
if nothing is populated, throws an exception, or returns (key, error)
. It never returns False
. It seems that if a function returns True
, it should also return False
in other cases. Perhaps instead of None
, we should return False
. It would not indicate an error, but simply not populating for other reasons than an error.
Co-authored-by: Dimitri Yatsenko <[email protected]>
datajoint/autopopulate.py
Outdated
status = self._populate1(key, jobs, **populate_kwargs) | ||
if status is not None: | ||
if isinstance(status, tuple): | ||
error_list.append(status) | ||
elif status: | ||
success_list.append(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change _populate
to return True
, False
, or (key, error)
status = self._populate1(key, jobs, **populate_kwargs) | |
if status is not None: | |
if isinstance(status, tuple): | |
error_list.append(status) | |
elif status: | |
success_list.append(1) | |
status = self._populate1(key, jobs, **populate_kwargs) | |
if status is True: | |
success_list.append(1) | |
elif isinstance(status, tuple): | |
error_list.append(status) | |
else: | |
assert status is False |
.populate()
call.populate()
call
Introduce a new argument
return_success_count
to the.populate()
routine, that would return the count of successfulmake()
calls (i.e. number of successful jobs) in one.populate()
call.This functionality allows for us to answer the question: "Did anything happen in my
.populate()
call?" without having to call thekey_source
again (e.g. with.progress()
)