Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Completeness Module #141

Open
wants to merge 14 commits into
base: development
Choose a base branch
from
Open

Adding Completeness Module #141

wants to merge 14 commits into from

Conversation

johnsam7
Copy link
Collaborator

Added completeness as a separate module

Copy link
Collaborator

@LukasAdamowicz LukasAdamowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this is a good start, but a couple main points that should be addressed:

  1. To match the rest of SKDH, ideally this gets refactored so that the completeness module can be used with any incoming data - and can be used in any pipeline to produce an output file indicating the completeness
  2. This will need unit tests

Minor, but you should also run python-black on these files to ensure consistent formatting

requirements.txt Outdated Show resolved Hide resolved
src/skdh/completeness/__init__.py Outdated Show resolved Hide resolved
src/skdh/completeness/complete.py Outdated Show resolved Hide resolved
src/skdh/completeness/complete.py Show resolved Hide resolved
src/skdh/completeness/complete.py Outdated Show resolved Hide resolved
src/skdh/completeness/parse.py Outdated Show resolved Hide resolved
src/skdh/completeness/utils.py Outdated Show resolved Hide resolved
src/skdh/completeness/utils.py Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in general, some of these might need to be broken down smaller so that unit testing can be better achieved - right now a lot of functions have many many code branches (ie if statements) which makes it hard to achieve full coverage of unit tests

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this

src/skdh/utility/__init__.py Show resolved Hide resolved
@LukasAdamowicz
Copy link
Collaborator

test notification

Copy link
Collaborator

@LukasAdamowicz LukasAdamowicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, its looking a bit better. Main comments:

  1. there is so much code here that needs comments, both in-line comments explaining what blocks/lines of code are supposed to do, as well as function documentation strings
  2. Functions likely need to be broken down still smaller so that they dont have so much branching logic, which will make it easier to test. Either that or find ways to write it without if statements
  3. I still want to remove any study specific information or things from this
  4. data loading should be removed, and either handled with specific functions in DHEAP extensions. Also written in such a way that it can work with default loaded data from single CSVs, etc

src/skdh/completeness/__init__.py Outdated Show resolved Hide resolved
src/skdh/completeness/complete.py Show resolved Hide resolved
time_periods=None,
timescales=None):

check_hyperparameters_init(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good candidate for function for the class

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also put this after the super() call, that way the signature for the class is as it was called by the user

self.time_periods = time_periods
self.timescales = timescales

def load_subject_data(self, subject_folder, subject, measures):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really want to get away from loading data in anything other than the io module, since that is where data ingestion is supposed to happen. If we need to write another reader class, that we can discuss. But the new MultiReader class might be able to handle everything thats needed for this module to run

fname = subject_folder + '/' + measure + '.csv'
df_raw = pd.read_csv(fname)

assert 'Time Unix (ms)' in df_raw.keys(), '"Time Unix (ms)" column is missing from file ' + fname
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, asserts are really only supposed to be used for testing. Better practice is to raise actual errors that are at least somewhat descriptive of what is the error. For most of these, ValueError would be appropriate

gap_size_mins=5):
"""
Version of plot overview that plots several data streams for only one device/subject.
:param data: list where each element is a df with one column and a time stamp index. Each df will be plotted in one
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpydoc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix documentation for all functions and styling in the end when we're happy with the code

src/skdh/completeness/visualizations.py Show resolved Hide resolved
def completeness_sub_data():
cwd = str(Path.cwd())
if cwd.split('/')[-1] == "completeness":
subject_folder = cwd + '/data/'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cwd is not goint to work since these tests can be run from multiple places. Please see other tests that load data (ie gait) for how to handle this properly

data_gaps = np.array([np.timedelta64(10, 'm'), np.timedelta64(30, 'm'), np.timedelta64(1, 'h')])
subject = 'test_sub'
time_periods = 'daily'
pipe = skdh.completeness.AssessCompleteness(ranges, data_gaps, time_periods, timescales)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your actual runs that are being tested need to be in the test_completeness.py file

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misread, but setup should be in the actual tests.



# Test input check on real data
def test_1_load_data(data_dic):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if using conftest, the arguments need to be the same as the function name, and the value is the value returned

@LukasAdamowicz LukasAdamowicz changed the base branch from main to development May 9, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants