Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunking sql export queries, coverage #249

Merged
merged 2 commits into from
Jun 17, 2024
Merged

Chunking sql export queries, coverage #249

merged 2 commits into from
Jun 17, 2024

Conversation

dogversioning
Copy link
Contributor

@dogversioning dogversioning commented Jun 7, 2024

This PR makes the following changes:

  • Exporter is rewritten to use pyarrow to enable large file exports
    • This required regenerating export example data
  • Added coverage action to CI
  • Added additional tests to clear coverage threshold
  • Removed 'all' as a study target since it was increasingly not useful
  • Removed remnants of old study creation mechanism

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added
  • Update template repo if there are changes to study configuration

@dogversioning dogversioning force-pushed the mg/chunk_exports branch 6 times, most recently from 0ebe500 to 3258e66 Compare June 10, 2024 13:54
Copy link

github-actions bot commented Jun 10, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines Covered Coverage Threshold Status
1854 1768 95% 90% 🟢

New Files

No new covered files...

Modified Files

File Coverage Status
cumulus_library/actions/exporter.py 97% 🟢
cumulus_library/cli.py 96% 🟢
cumulus_library/cli_parser.py 100% 🟢
cumulus_library/databases.py 96% 🟢
TOTAL 97% 🟢

updated for commit: 7526933 by action🐍

@dogversioning dogversioning force-pushed the mg/chunk_exports branch 7 times, most recently from 888d5e8 to df30b62 Compare June 12, 2024 16:43
@dogversioning dogversioning changed the title Chunking sql export queries Chunking sql export queries, coverage Jun 12, 2024
@dogversioning dogversioning marked this pull request as ready for review June 12, 2024 17:31
token: ${{ secrets.GITHUB_TOKEN }}
thresholdAll: .9
thresholdNew: 1
thresholdModified: .95
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be interested in bumping this to 1 as well, since otherwise, a repo could easily "stagnate" at .95

But that could be a future thing for sure. Get to .95, then go full 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that is my rough roadmap for this - trying to spread this out rather than just spending a week on tests.

# Note: we assume that, for duckdb, you are unlikely to be dealing with large
# exports, so it will ignore the chunksize parameter, as it does not provide
# a pandas enabled cursor.
dataframe_chunks, db_schema = db.execute_as_pandas(query, chunksize=chunksize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we talked about this verbally, but just writing down so it's somewhere concrete: it might be nice to not mix dataframes if we can - like always be talking pyarrow, or move the choice of dataframe to the database backend, I dunno. Not super important, since we are forcing a schema, but I could imagine a bad type conversion losing data (like float -> int or whatever)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cumulus_library/actions/exporter.py Outdated Show resolved Hide resolved
cumulus_library/databases.py Outdated Show resolved Hide resolved

def parser(self) -> DatabaseParser:
return DuckDbParser()

def operational_errors(self) -> tuple[Exception]:
return (duckdb.OperationalError,)
return (duckdb.OperationalError,) # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These methods could be tested - and probably should be?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can look into this in the future

cumulus_library/actions/exporter.py Outdated Show resolved Hide resolved
cumulus_library/databases.py Show resolved Hide resolved
@@ -337,7 +370,7 @@ def upload_file(
return f"s3://{bucket}/{s3_key}"

def close(self) -> None:
return self.connection.close()
return self.connection.close() # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never close out the connection in tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have anyplace we're explicitly closing a cursor at the moment, actually.

main Outdated Show resolved Hide resolved
test Outdated Show resolved Hide resolved
@dogversioning dogversioning merged commit 23f6bc9 into main Jun 17, 2024
3 checks passed
@dogversioning dogversioning deleted the mg/chunk_exports branch June 17, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants