Chunking sql export queries, coverage #249

dogversioning · 2024-06-07T17:56:51Z

This PR makes the following changes:

- Exporter is rewritten to use pyarrow to enable large file exports
  - This required regenerating export example data
- Added coverage action to CI
- Added additional tests to clear coverage threshold
- Removed 'all' as a study target since it was increasingly not useful
- Removed remnants of old study creation mechanism

Checklist

- Consider if documentation (like in docs/) needs to be updated
- Consider if tests should be added
- Update template repo if there are changes to study configuration

github-actions · 2024-06-10T14:00:27Z

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines	Covered	Coverage	Threshold	Status
1854	1768	95%	90%	🟢

New Files

No new covered files...

Modified Files

File	Coverage	Status
cumulus_library/actions/exporter.py	97%	🟢
cumulus_library/cli.py	96%	🟢
cumulus_library/cli_parser.py	100%	🟢
cumulus_library/databases.py	96%	🟢
TOTAL	97%	🟢

updated for commit: 7526933 by action🐍

mikix · 2024-06-17T13:22:22Z

.github/workflows/ci.yaml

+            token: ${{ secrets.GITHUB_TOKEN }}
+            thresholdAll: .9
+            thresholdNew: 1
+            thresholdModified: .95


I'd be interested in bumping this to 1 as well, since otherwise, a repo could easily "stagnate" at .95

But that could be a future thing for sure. Get to .95, then go full 1

Yeah, that is my rough roadmap for this: spreading the work out rather than just spending a week on tests.
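
For reference, the three thresholds in this thread can be read as independent gates. The sketch below is only an illustration of that reading: `check_thresholds` is a hypothetical helper, not the coverage action's code, and applying `thresholdModified` per file (rather than to the modified-files total) is an assumption. The figures are taken from the coverage report on this PR.

```python
def check_thresholds(overall, modified, new_files, *,
                     threshold_all=0.9, threshold_modified=0.95,
                     threshold_new=1.0):
    """Return a list of failure messages; an empty list means every gate passes.

    Hypothetical helper mirroring thresholdAll / thresholdModified /
    thresholdNew. Per-file checking of modified files is an assumption.
    """
    failures = []
    if overall < threshold_all:
        failures.append(f"overall: {overall:.0%} < {threshold_all:.0%}")
    for path, cov in modified.items():
        if cov < threshold_modified:
            failures.append(f"modified {path}: {cov:.0%} < {threshold_modified:.0%}")
    for path, cov in new_files.items():
        if cov < threshold_new:
            failures.append(f"new {path}: {cov:.0%} < {threshold_new:.0%}")
    return failures


# Figures from the coverage report above: 1768 of 1854 lines covered.
overall = 1768 / 1854
modified = {
    "cumulus_library/actions/exporter.py": 0.97,
    "cumulus_library/cli.py": 0.96,
    "cumulus_library/cli_parser.py": 1.0,
    "cumulus_library/databases.py": 0.96,
}
new_files = {}  # the report lists no new covered files

# Current settings, then the stricter setting suggested in this thread.
print(check_thresholds(overall, modified, new_files))
print(check_thresholds(overall, modified, new_files, threshold_modified=1.0))
```

Under these assumptions the current settings pass cleanly, while raising the modified-file gate to 1 would flag the three files that sit below 100%.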

mikix · 2024-06-17T14:04:30Z

cumulus_library/actions/exporter.py

+        # Note: we assume that, for duckdb, you are unlikely to be dealing with large
+        # exports, so it will ignore the chunksize parameter, as it does not provide
+        # a pandas enabled cursor.
+        dataframe_chunks, db_schema = db.execute_as_pandas(query, chunksize=chunksize)


nit: we talked about this verbally, but I'm writing it down so it's somewhere concrete. It might be nice not to mix dataframe libraries if we can: either always talk pyarrow, or move the choice of dataframe to the database backend. Not super important, since we are forcing a schema, but I could imagine a bad type conversion losing data (like float -> int).
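
To illustrate both the chunking idea and the type-conversion concern, here is a minimal standard-library sketch. It is not the exporter's implementation: `sqlite3` and `csv` stand in for the real database backend and for pyarrow, and `iter_chunks`, `strict_cast`, and `export_chunked` are hypothetical names. The point is that rows are pulled `chunksize` at a time, and that a forced schema can refuse a float -> int cast that would drop a fractional part.

```python
import csv
import io
import sqlite3


def iter_chunks(cursor, chunksize):
    """Yield lists of at most `chunksize` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            return
        yield rows


def strict_cast(value, target):
    """Cast `value` to `target`, refusing float -> int casts that lose data."""
    if value is None:
        return None
    if target is int and isinstance(value, float) and not value.is_integer():
        raise ValueError(f"lossy cast: {value!r} -> int")
    return target(value)


def export_chunked(conn, query, schema, out, chunksize=2):
    """Stream query results to `out` as CSV; return the number of chunks."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    writer.writerow([name for name, _ in schema])
    chunk_count = 0
    for rows in iter_chunks(cursor, chunksize):
        for row in rows:
            writer.writerow(
                [strict_cast(value, col_type)
                 for value, (_, col_type) in zip(row, schema)]
            )
        chunk_count += 1
    return chunk_count


# Toy data: counts stored as REAL, exported under a schema that forces int.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counts (category TEXT, cnt REAL)")
conn.executemany(
    "INSERT INTO counts VALUES (?, ?)",
    [("a", 10.0), ("b", 20.0), ("c", 30.0)],
)
buffer = io.StringIO()
chunks_written = export_chunked(
    conn,
    "SELECT category, cnt FROM counts ORDER BY category",
    [("category", str), ("cnt", int)],
    buffer,
)
conn.close()
print(chunks_written)
print(buffer.getvalue())
```

Only one chunk is held in memory at a time, and a value such as 1.5 headed for an integer column raises instead of being silently truncated.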

mikix · 2024-06-17T14:11:12Z

cumulus_library/databases.py


    def parser(self) -> DatabaseParser:
        return DuckDbParser()

    def operational_errors(self) -> tuple[Exception]:
-        return (duckdb.OperationalError,)
+        return (duckdb.OperationalError,)  # pragma: no cover


These methods could be tested - and probably should be?

I can look into this in the future.
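
As a rough idea of what such a test might look like, here is a dependency-free sketch. `FakeOperationalError`, `FakeBackend`, and `run_query` are hypothetical stand-ins, not the library's classes; a real test would exercise the DuckDB backend and `duckdb.OperationalError` directly. The sketch annotates the return value as a tuple of exception classes, since that is what an `except` clause expects.

```python
class FakeOperationalError(Exception):
    """Stand-in for duckdb.OperationalError so the sketch has no dependencies."""


class FakeBackend:
    """Hypothetical backend mirroring the shape of the method in the diff."""

    def operational_errors(self) -> tuple[type[Exception], ...]:
        return (FakeOperationalError,)


def run_query(backend, action):
    """Call `action`, turning the backend's operational errors into None."""
    try:
        return action()
    except backend.operational_errors():
        return None


def test_operational_errors_are_catchable():
    backend = FakeBackend()
    errors = backend.operational_errors()
    # Every entry must be an exception class to be usable in `except`.
    assert all(issubclass(err, Exception) for err in errors)

    def failing():
        raise FakeOperationalError("table missing")

    assert run_query(backend, failing) is None
    assert run_query(backend, lambda: 42) == 42


test_operational_errors_are_catchable()
```

A test along these lines would let the `# pragma: no cover` marker be dropped for this method.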

cumulus_library/actions/exporter.py

cumulus_library/databases.py

mikix · 2024-06-17T14:15:50Z

cumulus_library/databases.py

@@ -337,7 +370,7 @@ def upload_file(
        return f"s3://{bucket}/{s3_key}"

    def close(self) -> None:
-        return self.connection.close()
+        return self.connection.close()  # pragma: no cover


We never close out the connection in tests?

I don't think we have anyplace we're explicitly closing a cursor at the moment, actually.
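
One way to cover `close()` without a live database would be to hand the backend a mock connection and check that the call is forwarded. This is a sketch only: `Backend` below is a hypothetical wrapper with the same `close()` body as the diff, not the library's class.

```python
from unittest import mock


class Backend:
    """Hypothetical wrapper mirroring the close() method in the diff."""

    def __init__(self, connection):
        self.connection = connection

    def close(self) -> None:
        return self.connection.close()


def test_close_delegates_to_connection():
    connection = mock.MagicMock()
    backend = Backend(connection)
    backend.close()
    # close() should forward exactly one call, with no arguments.
    connection.close.assert_called_once_with()


test_close_delegates_to_connection()
```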

dogversioning force-pushed the mg/chunk_exports branch 6 times, most recently from 0ebe500 to 3258e66 Compare June 10, 2024 13:54

dogversioning force-pushed the mg/chunk_exports branch 7 times, most recently from 888d5e8 to df30b62 Compare June 12, 2024 16:43

dogversioning changed the title ~~Chunking sql export queries~~ Chunking sql export queries, coverage Jun 12, 2024

Chunking SQL export queries

88af48c

dogversioning force-pushed the mg/chunk_exports branch from df30b62 to 88af48c Compare June 12, 2024 17:17

dogversioning marked this pull request as ready for review June 12, 2024 17:31

mikix approved these changes Jun 17, 2024

cleanup extra files, docstring tweaks

7526933

dogversioning merged commit 23f6bc9 into main Jun 17, 2024
3 checks passed

dogversioning deleted the mg/chunk_exports branch June 17, 2024 15:19
