Ids 641 end to end ingestion test on staging server #445

Open
wants to merge 12 commits into
base: main
from

Conversation

LibbyLi667
Collaborator

@LibbyLi667 LibbyLi667 commented Jun 25, 2024

In this PR:
Added a _save_data_status function that saves each file's verification status to a CSV file, so that BIRU users can decide whether to delete ingested files when the automatic clean-up script is disabled. This CSV output can also be disabled, depending on which clean-up strategy is chosen.
Tested cmd_clean.py in ingestion_biru.py.
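
For readers without access to the diff, here is a minimal sketch of what a helper like _save_data_status might do. It is not the code in this PR: the function name save_data_status, the output file name, the column names, and the use of plain string paths are all assumptions made for illustration.

```python
import csv
from pathlib import Path


def save_data_status(
    report_dir: Path, verified: list[str], unverified: list[str]
) -> Path:
    """Write one CSV row per file, recording whether it was verified.

    The file name and the column names are assumptions for this sketch.
    """
    report_path = report_dir / "verification_status.csv"
    with report_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["filepath", "verified"])
        for filepath in verified:
            writer.writerow([filepath, "yes"])
        for filepath in unverified:
            writer.writerow([filepath, "no"])
    return report_path
```

A user could then open the resulting CSV and decide by hand which verified files are safe to delete.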

Member

@uoa-noel uoa-noel left a comment

Hey Libby, thanks for working on this change. There are a couple of things that I thought we could work through, though it may be something to do after this sprint!

@@ -72,7 +73,7 @@ def _is_completed_df(

 def _filter_completed_dfs(
     config: ConfigFromEnv, datafiles: list[RawDatafile], min_file_age: Optional[int]
-) -> list[RawDatafile]:
+) -> Tuple[list[RawDatafile], list[RawDatafile]]:
     """Inspects through a list of datafiles, return datafiles which have completed ingestion."""
Member

Update the docstring to clarify what's being returned, since the function now returns a tuple of two lists.
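
As an illustration of the kind of docstring this asks for, here is a small sketch of a function that returns two lists and says which is which. The Datafile class, the partition_datafiles name, and the meaning assigned to each list are hypothetical stand-ins; the PR's actual RawDatafile type and the real semantics of the two returned lists are not shown in this excerpt.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Datafile:
    """Hypothetical stand-in for the project's RawDatafile."""

    name: str
    verified: bool


def partition_datafiles(
    datafiles: list[Datafile],
) -> Tuple[list[Datafile], list[Datafile]]:
    """Split datafiles by verification state.

    Returns:
        A tuple ``(verified, unverified)``: the first list holds the
        datafiles whose ``verified`` flag is set, the second holds the
        rest. Input order is preserved within each list.
    """
    verified = [df for df in datafiles if df.verified]
    unverified = [df for df in datafiles if not df.verified]
    return verified, unverified
```

Naming the tuple elements in the docstring means callers do not have to read the body to learn which list comes first.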

Comment on lines +135 to +140
+network_base_path = Path(
+    "//files.auckland.ac.nz/research/resmed202000005-biru-shared-drive"
+)
+modified_source_path_base = network_base_path / source_path.parent.relative_to(
+    "/mnt/biru_shared_drive"
+)
Member

Because this command is meant to be generic to different workflows, I'm wondering if it's necessary to prepend BIRU drive paths here?
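
One way to keep the command generic would be to pass the mount root and the network root in as optional settings rather than hard-coding them. The sketch below assumes this approach; remap_source_path and its parameters are hypothetical names, and how the values would be supplied (for example through the existing config object) is left open.

```python
from pathlib import PurePosixPath
from typing import Optional


def remap_source_path(
    source_path: PurePosixPath,
    mount_root: Optional[PurePosixPath] = None,
    network_root: Optional[PurePosixPath] = None,
) -> PurePosixPath:
    """Rewrite a path under mount_root so that it sits under network_root.

    If either root is not configured, or the path is not under
    mount_root, the path is returned unchanged, so workflows without
    a network drive are unaffected.
    """
    if mount_root is None or network_root is None:
        return source_path
    try:
        relative = source_path.relative_to(mount_root)
    except ValueError:
        # The path is not under the configured mount; leave it alone.
        return source_path
    return network_root / relative
```

With this shape, only the BIRU profile would set the two roots, and other workflows would get their paths back untouched.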

Comment on lines +236 to +240

+# save file verification status to a csv file for idw project
+if profile_name == "idw":
+    logger.info("Saving files verification status into a csv file.")
+    _save_data_status(source_data_path, verified_dfs, unverified_dfs)
Member

I think it would be useful to have a separate command for the reporting functionality. The ids clean command mainly deletes files, whereas this functionality creates a report. To prevent accidental deletion of files, it's better for the CLI user to run it as a separate command...
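
A rough sketch of that separation, using the standard-library argparse module, might look like the following. This is only an illustration: the project's real CLI framework is not shown in this excerpt, the report subcommand name is hypothetical, and the handlers return placeholder strings instead of deleting files or writing reports.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with separate 'clean' and 'report' subcommands."""
    parser = argparse.ArgumentParser(prog="ids")
    subcommands = parser.add_subparsers(dest="command", required=True)

    clean = subcommands.add_parser("clean", help="delete verified files")
    clean.add_argument("source_data_path")

    # Hypothetical subcommand: reports only, never deletes anything.
    report = subcommands.add_parser("report", help="write a verification report")
    report.add_argument("source_data_path")
    return parser


def run(argv: list[str]) -> str:
    """Dispatch to the chosen subcommand and describe what would run."""
    args = build_parser().parse_args(argv)
    if args.command == "report":
        return f"report:{args.source_data_path}"
    return f"clean:{args.source_data_path}"
```

Because the report path never reaches the deletion code, a user who only wants the CSV cannot trigger a clean-up by mistake.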

3 participants