-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changes from manual testing #81
Conversation
src/scicat_configuration.py
Outdated
options.urls = { | ||
"datasets" : urljoin(options.host, "datasets"), | ||
"proposals" : urljoin(options.host, "proposals"), | ||
"origdatablocks" : urljoin(options.host, "origdatablocks"), | ||
"instruments": urljoin(options.host, "instruments"), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 4 endpoints don't have to be configurable from the config file, no...?
Can I turn them to be a property
instead?
src/scicat_online_ingestor.py
Outdated
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need it cause we expose main
function as scicat_ingestor
command.
You can just type scicat_ingestor
to start the online ingestor.
(it doesn't hurt to have it though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good overall!
Only a few comments on minor implementation details.
src/scicat_offline_ingestor.py
Outdated
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also these. You can just type background_ingestor
instead of using the file itself as a script.
src/scicat_dataset.py
Outdated
# if file_path.exists() and compute_file_stats: | ||
# return DataFileListItem( | ||
# path=file_path.absolute().as_posix(), | ||
# size=(file_stats := file_path.stat()).st_size, | ||
# time=datetime.datetime.fromtimestamp( | ||
# file_stats.st_ctime, tz=datetime.UTC | ||
# ).strftime("%Y-%m-%dT%H:%M:%S.000Z"), | ||
# chk=_calculate_checksum(file_path, file_hash_algorithm) | ||
# if compute_file_hash | ||
# else None, | ||
# uid=str(file_stats.st_uid), | ||
# gid=str(file_stats.st_gid), | ||
# perm=oct(file_stats.st_mode), | ||
# ) | ||
# else: | ||
# return DataFileListItem( | ||
# path=file_path.absolute().as_posix(), | ||
# time=datetime.datetime.now(tz=datetime.UTC).strftime( | ||
# "%Y-%m-%dT%H:%M:%S.000Z" | ||
# ), | ||
# ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the re-written part is fine. We can remove the commented out part...!
src/scicat_offline_ingestor.py
Outdated
@@ -38,7 +38,10 @@ def build_offline_config() -> OfflineIngestorConfig: | |||
# with ``OnlineIngestorConfig``. | |||
del merged_configuration["kafka"] | |||
|
|||
return build_dataclass(OfflineIngestorConfig, merged_configuration) | |||
config = build_dataclass(OfflineIngestorConfig, merged_configuration) | |||
config.scicat = SciCatOptions.from_configurations(merged_configuration["scicat"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_dataclass
automatically builds the sub-dataclasses as well.
So it should be not needed.
If it wasn't it's a bug so I need to fix it...
src/scicat_dataset.py
Outdated
provided_path = path.split("/")[1:] | ||
provided_path[0] = "/" + provided_path[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this part should be handled in the extract_paths_from_h5_file
.
I'll just clone it and make the CI pass : D |
This PR includes all the changes that resulted from manual testing the offline ingestor.