Chunking SQL export queries, coverage #249
Conversation
0ebe500 to 3258e66
☂️ Python Coverage
Overall Coverage
New Files: No new covered files
Modified Files
888d5e8 to df30b62
df30b62 to 88af48c
token: ${{ secrets.GITHUB_TOKEN }}
thresholdAll: .9
thresholdNew: 1
thresholdModified: .95
I'd be interested in bumping this to 1 as well, since otherwise a repo could easily "stagnate" at .95. But that could be a future thing for sure: get to .95, then go to a full 1.
Yeah, that is my rough roadmap for this - I'm trying to spread the work out rather than just spending a week on tests.
# Note: we assume that, for duckdb, you are unlikely to be dealing with large
# exports, so it will ignore the chunksize parameter, as it does not provide
# a pandas-enabled cursor.
dataframe_chunks, db_schema = db.execute_as_pandas(query, chunksize=chunksize)
nit: we talked about this verbally, but I'm just writing it down so it's somewhere concrete: it might be nice not to mix dataframe libraries if we can - like always talking pyarrow, or moving the choice of dataframe to the database backend, I dunno. Not super important, since we are forcing a schema, but I could imagine a bad type conversion losing data (like float -> int or whatever).
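A rough sketch of what the "always talk pyarrow" idea could look like if the choice of dataframe moved into the database backend. The method name, signature, and default chunk size here are illustrative, not part of this PR; duckdb can hand back Arrow record batches directly, which would sidestep pandas type inference entirely:

```python
# Illustrative sketch only: execute_as_arrow is a hypothetical backend method,
# not something defined in this PR. It streams pyarrow.RecordBatch objects
# straight from duckdb, so no pandas type inference happens along the way.
import duckdb


def execute_as_arrow(connection: duckdb.DuckDBPyConnection, query: str, chunksize: int = 100_000):
    """Yield pyarrow.RecordBatch chunks for a query, chunksize rows at a time."""
    # fetch_record_batch returns a pyarrow.RecordBatchReader, which is iterable
    yield from connection.execute(query).fetch_record_batch(chunksize)
```

A caller could then feed those batches straight into a parquet writer without ever materializing a pandas DataFrame.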
     def parser(self) -> DatabaseParser:
         return DuckDbParser()

     def operational_errors(self) -> tuple[Exception]:
-        return (duckdb.OperationalError,)
+        return (duckdb.OperationalError,)  # pragma: no cover
These methods could be tested - and probably should be?
I can look into this in the future.
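If that testing does happen, a minimal version could look something like the following. The import path, backend class name, and constructor signature are all assumptions for illustration, not taken from this PR:

```python
# Hypothetical test sketch; the DuckDatabaseBackend / DuckDbParser import path
# and the ":memory:" constructor argument are assumptions, not from this PR.
import duckdb

from cumulus_library.databases import DuckDatabaseBackend, DuckDbParser


def test_duckdb_parser_and_operational_errors():
    db = DuckDatabaseBackend(":memory:")
    assert isinstance(db.parser(), DuckDbParser)
    assert db.operational_errors() == (duckdb.OperationalError,)
```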
@@ -337,7 +370,7 @@ def upload_file(
         return f"s3://{bucket}/{s3_key}"

     def close(self) -> None:
-        return self.connection.close()
+        return self.connection.close()  # pragma: no cover
We never close out the connection in tests?
I don't think we have anyplace we're explicitly closing a cursor at the moment, actually.
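If we wanted coverage here without relying on duckdb's behavior after a real close, one option is to swap in a mock connection and check the delegation. A sketch, with the backend class name and import path assumed as above:

```python
# Sketch only: DuckDatabaseBackend and its constructor are assumed names,
# not taken from this PR.
from unittest import mock

from cumulus_library.databases import DuckDatabaseBackend


def test_close_delegates_to_connection():
    db = DuckDatabaseBackend(":memory:")
    db.connection = mock.MagicMock()
    db.close()
    db.connection.close.assert_called_once()
```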
This PR makes the following changes:
- Uses pyarrow to enable large file exports (see the sketch after the checklist)

Checklist
- Documentation (docs/) needs to be updated
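As a sketch of how the chunked pandas export and pyarrow can fit together for large files: the file name, schema handling, and variable bindings below are illustrative; only the db.execute_as_pandas call comes from the diff above, and db, query, and chunksize are assumed to exist as in that hunk.

```python
# Illustrative sketch: write chunks to a single parquet file as they arrive,
# forcing each chunk onto an explicit schema so type drift between chunks
# (e.g. float -> int) can't silently corrupt the export.
import pyarrow
import pyarrow.parquet

# db, query, and chunksize are as in the diff hunk earlier in this PR.
dataframe_chunks, db_schema = db.execute_as_pandas(query, chunksize=chunksize)
schema = pyarrow.schema(db_schema)  # assumes db_schema is an iterable of (name, type) pairs
with pyarrow.parquet.ParquetWriter("export.parquet", schema) as writer:
    for chunk in dataframe_chunks:
        table = pyarrow.Table.from_pandas(chunk, schema=schema, preserve_index=False)
        writer.write_table(table)
```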