-
-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add retry logic to datastore.get_zipfile_resource. #3658
Conversation
4e2d311
to
1b1bc48
Compare
src/pudl/utils.py
Outdated
@@ -0,0 +1,37 @@ | |||
"""A place for helper functions that *don't* import PUDL and thus can be imported anywhere.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this because I started running into circular import problems.
However, most of the functions in pudl.helpers
don't import PUDL. Only these four do:
create_datasette_metadata_yaml
check_tables_have_metadata
expand_timeseries
convert_cols_dtypes
Should we move those to a separate file instead of making a fresh file for all the functions where we definitely won't run into circular imports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the import error coming up? Is it possible that the call to import pudl
plus the multiple from
calls in pudl.workspace.datastore.py
is the source of the import error and could be fixed through removing the first call, rather than splitting helpers into two not clearly defined separate files?
If not, my preference would be to do what you suggest (move these four into their own file for helpers that rely on other PUDL functions), rather than start a new file with a single function and expect us to migrate over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh. The error was:
> pytest test/unit/helpers_test.py
ImportError while loading conftest '/Users/dazhong-catalyst/work/pudl/test/conftest.py'.
test/conftest.py:15: in <module>
import pudl
src/pudl/__init__.py:5: in <module>
from . import (
src/pudl/analysis/__init__.py:9: in <module>
from . import (
src/pudl/analysis/allocate_gen_fuel.py:145: in <module>
from pudl.metadata.fields import apply_pudl_dtypes
src/pudl/metadata/__init__.py:3: in <module>
from . import (
src/pudl/metadata/classes.py:63: in <module>
from pudl.workspace.datastore import Datastore, ZenodoDoi
src/pudl/workspace/__init__.py:11: in <module>
from . import datastore, resource_cache, setup
src/pudl/workspace/datastore.py:26: in <module>
from pudl.helpers import retry
src/pudl/helpers.py:34: in <module>
from pudl.metadata.classes import DatasetteMetadata
E ImportError: cannot import name 'DatasetteMetadata' from partially initialized module 'pudl.metadata.classes' (most likely d
ue to a circular import) (/Users/dazhong-catalyst/work/pudl/src/pudl/metadata/classes.py)
And for some reason this diff fixed it - currently reading through the import system docs to try to figure out what the heck was going on.
diff --git a/src/pudl/helpers.py b/src/pudl/helpers.py
index 87d9fc41a..f97a27c10 100644
--- a/src/pudl/helpers.py
+++ b/src/pudl/helpers.py
@@ -30,7 +31,7 @@ from dagster import AssetKey, AssetsDefinition, AssetSelection, SourceAsset
from pandas._libs.missing import NAType
import pudl.logging_helpers
-from pudl.metadata.classes import DatasetteMetadata
+import pudl.metadata.classes
from pudl.metadata.fields import apply_pudl_dtypes, get_pudl_dtypes
from pudl.workspace.setup import PudlPaths
@@ -2029,7 +2030,7 @@ def create_datasette_metadata_yaml() -> str:
derived from their datapackage.json as YAML.
"""
pudl_output = PudlPaths().pudl_output
- metadata = DatasetteMetadata.from_data_source_ids(pudl_output)
+ metadata = pudl.metadata.classes.DatasetteMetadata.from_data_source_ids(pudl_output)
return metadata.to_yaml()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my crude mental model is that:
At this specific import time, DatasetteMetadata
hasn't been defined in pudl.metadata.classes
yet, since pudl.metadata.classes
imports pudl.workspace.datastore
which imports pudl.helpers
. But we can import the pudl.metadata.classes
module directly since the import
statement doesn't actually check that DatasetteMetadata
is defined in the module.
The reason that we can import attributes from pudl.metadata.fields
and pudl.workspace.setup
is because they don't circularly depend on pudl.helpers
.
I think long term we should probably aim to have "truly independent helpers" and "helpers that actually need the rest of PUDL" as separate files, to avoid but (a) I'm not sure what to name the two files and (b) this works - so I'm happy to punt on that until the next circular import happens >_<
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. Glad you managed to figure it out, this seems like the much preferable solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions and a request re: the reorganization of the pudl.helpers.py
file. Otherwise, all looks good on my end.
src/pudl/utils.py
Outdated
@@ -0,0 +1,37 @@ | |||
"""A place for helper functions that *don't* import PUDL and thus can be imported anywhere.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the import error coming up? Is it possible that the call to import pudl
plus the multiple from
calls in pudl.workspace.datastore.py
is the source of the import error and could be fixed through removing the first call, rather than splitting helpers into two not clearly defined separate files?
If not, my preference would be to do what you suggest (move these four into their own file for helpers that rely on other PUDL functions), rather than start a new file with a single function and expect us to migrate over time.
test/unit/utils_test.py
Outdated
assert retry(func=func, retry_on=(RuntimeError,)) == 1 | ||
|
||
assert sleep_mock.call_count == 3 | ||
sleep_mock.assert_has_calls([mocker.call(2**x) for x in range(3)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will MagicMock
add a 7 second delay to our unit tests here? Doesn't seem to be the case when I run unit tests, but maybe I'm missing something. If so, we could reduce the number of RuntimeErrror
calls by one to gain a bit of speed back here if this becomes a noticeable lag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! We're bypassing the delay by replacing time.sleep
with just an empty MagicMock()
- see that with mocker.patch(...)
line above.
645b68a
to
ebae71d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much preferable solution and a helpful fix that should help make the nightly builds a hair more robust. All works as expected locally.
For more information, see https://pre-commit.ci
Pull out retry logic into its own util function. Add util file to avoid circular dependencies.
ebae71d
to
4c4ff39
Compare
Overview
Closes #3657.
What problem does this address?
One more step towards figuring out that weird BadZipFile error that shows up sporadically.
Testing
How did you make sure this worked? How can a reviewer verify this?
Bunch of unit tests!
Suppose I could run the pytest-coverage, too, but it might not run into those
BadZipFile
errors that seem to only pop up very sporadically.To-do list