-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
378 update submission processing to use polars data validator #394
base: main
Are you sure you want to change the base?
378 update submission processing to use polars data validator #394
Conversation
…rs-data-validator
Coverage reportClick to see where and how coverage changed
This report was generated by python-coverage-comment-action |
…rs-data-validator
This was tested on the devpub cluster as well as pytests and docker-compose. I had to merge this with the counters PR (#473) since the database and all that had been updated already with that work. So on devpub, it's a combo of these two. But once one of them is approved I'll do the work to merge them in whatever other PR is left open |
except RuntimeError as re: | ||
log.exception("The file is malformed.", re) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be needed, log.exception contains exc_info=True
, which extracts the exception from the stack.
@@ -14,11 +14,12 @@ asyncpg = "^0.29.0" | |||
regtech-api-commons = {git = "https://github.com/cfpb/regtech-api-commons.git"} | |||
regtech-data-validator = {git = "https://github.com/cfpb/regtech-data-validator.git"} | |||
regtech-regex = {git = "https://github.com/cfpb/regtech-regex.git"} | |||
boto3 = "~1.34.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the downgrade? fsspec related?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, with the boto3 and fsspec tie in in the data-validator. But if I remove that to have the download send back a df, this dependency goes away. Though I'd need to rethink reading in the csv. Right now the filing api just sends over a path and the data validator decides what to do based on path type. s3fs needs the 1.34 version of boto3
pyproject.toml
Outdated
alembic = "^1.13.3" | ||
async-lru = "^2.0.4" | ||
ujson = "^5.10.0" | ||
psutil = "^6.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope!
) | ||
report_path = generate_file_path(period_code, lei, f"{submission.id}_report") | ||
|
||
df_to_download(final_df, report_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can worry about this later, but validator should have no knowledge about how to store results / what to do with the results. the wrapper (in this case the filing API) should be the one that deals with uploading the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think originally I was trying to use a more streaming approach. I don't mind updating all this back to have the filing api know. I had a weird mix of interim solutions going on so this is a carry over.
Closes #378
Changes the filing-api in the following ways: