-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding Completeness Module #141
base: development
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this is a good start, but a couple main points that should be addressed:
- To match the rest of SKDH, ideally this gets refactored so that the completeness module can be used with any incoming data - and can be used in any pipeline to produce an output file indicating the completeness
- This will need unit tests
Minor, but you should also run python-black on these files to ensure consistent formatting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also in general, some of these might need to be broken down smaller so that unit testing can be better achieved - right now a lot of functions have many many code branches (ie if statements) which makes it hard to achieve full coverage of unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss this
test notification |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, its looking a bit better. Main comments:
- there is so much code here that needs comments, both in-line comments explaining what blocks/lines of code are supposed to do, as well as function documentation strings
- Functions likely need to be broken down still smaller so that they dont have so much branching logic, which will make it easier to test. Either that or find ways to write it without if statements
- I still want to remove any study specific information or things from this
- data loading should be removed, and either handled with specific functions in DHEAP extensions. Also written in such a way that it can work with default loaded data from single CSVs, etc
time_periods=None, | ||
timescales=None): | ||
|
||
check_hyperparameters_init( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good candidate for function for the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also put this after the super() call, that way the signature for the class is as it was called by the user
self.time_periods = time_periods | ||
self.timescales = timescales | ||
|
||
def load_subject_data(self, subject_folder, subject, measures): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really want to get away from loading data in anything other than the io
module, since that is where data ingestion is supposed to happen. If we need to write another reader class, that we can discuss. But the new MultiReader
class might be able to handle everything thats needed for this module to run
fname = subject_folder + '/' + measure + '.csv' | ||
df_raw = pd.read_csv(fname) | ||
|
||
assert 'Time Unix (ms)' in df_raw.keys(), '"Time Unix (ms)" column is missing from file ' + fname |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, asserts are really only supposed to be used for testing. Better practice is to raise actual errors that are at least somewhat descriptive of what is the error. For most of these, ValueError would be appropriate
gap_size_mins=5): | ||
""" | ||
Version of plot overview that plots several data streams for only one device/subject. | ||
:param data: list where each element is a df with one column and a time stamp index. Each df will be plotted in one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpydoc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will fix documentation for all functions and styling in the end when we're happy with the code
def completeness_sub_data(): | ||
cwd = str(Path.cwd()) | ||
if cwd.split('/')[-1] == "completeness": | ||
subject_folder = cwd + '/data/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cwd
is not goint to work since these tests can be run from multiple places. Please see other tests that load data (ie gait) for how to handle this properly
data_gaps = np.array([np.timedelta64(10, 'm'), np.timedelta64(30, 'm'), np.timedelta64(1, 'h')]) | ||
subject = 'test_sub' | ||
time_periods = 'daily' | ||
pipe = skdh.completeness.AssessCompleteness(ranges, data_gaps, time_periods, timescales) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your actual runs that are being tested need to be in the test_completeness.py
file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
misread, but setup should be in the actual tests.
|
||
|
||
# Test input check on real data | ||
def test_1_load_data(data_dic): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if using conftest, the arguments need to be the same as the function name, and the value is the value returned
Added completeness as a separate module