Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ELEX-4647 more unit tests #145

Merged
merged 22 commits into from
Dec 13, 2024
Merged

ELEX-4647 more unit tests #145

merged 22 commits into from
Dec 13, 2024

Conversation

dmnapolitano
Copy link
Contributor

@dmnapolitano dmnapolitano commented Nov 20, 2024

Description

Hi! The changes in this PR:

  • address pylint complaints, although TODO ones still remain; now resolved in PR Resolve pylint and other warnings #146
  • adds a BaseDataHandler abstract class that most DataHandlers are now subclasses of; now resolved in PR Preprocessed data handler superclass #147
  • adds some unit tests for VersionedDataHandler although I'm sure there's more scenarios we want to test there though I'm not exactly sure what 🤔
  • ensures that all top-level imports of external packages are dependencies in setup.py;

Thanks! 🎉

Jira Ticket

ELEX-4647

Test Steps

tox or the elexmodel command of your choice 🎉

@dmnapolitano dmnapolitano requested a review from a team as a code owner November 20, 2024 14:54
Copy link
Collaborator

@lennybronner lennybronner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of adding the BaseDataHandler?

@@ -128,7 +132,7 @@ def wait_for_versions(self, q):
try:
future.result()
yield version, data
except Exception as e:
except Exception as e: # pylint: disable=broad-exception-caught
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we usually specify this with # noqa rather than pyline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmmmm...I think you're right. Let me see what happens if I switch all of these to noqa 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, so TIL that pre-commit looks at noqa lines which are part of the PEP8 standard, but pylint doesn't look at noqa at all 🤔 pylint-dev/pylint#2493

So I guess I'll leave these pylint lines in so that pylint doesn't issue these complaints when we run tox 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't really deal with the issues with tox in the past, and I don't think we should moving forward. I think we can stick to just ones that make our pre-commit hooks fail.

@dmnapolitano
Copy link
Contributor Author

What's the purpose of adding the BaseDataHandler?

It's to unify the interface of all of our data handlers as much as possible and reduce some redundancy 🤷🏻‍♀️

@lennybronner
Copy link
Collaborator

What's the purpose of adding the BaseDataHandler?

It's to unify the interface of all of our data handlers as much as possible and reduce some redundancy 🤷🏻‍♀️

Hmm, I see. I mean the LiveData and the PreprocessedData handler sort of makes sense. but the versioned data handler is very different. In this case the data is results rather then preprocessed data.

@dmnapolitano
Copy link
Contributor Author

What's the purpose of adding the BaseDataHandler?

It's to unify the interface of all of our data handlers as much as possible and reduce some redundancy 🤷🏻‍♀️

Hmm, I see. I mean the LiveData and the PreprocessedData handler sort of makes sense. but the versioned data handler is very different. In this case the data is results rather then preprocessed data.

Hmmmm, yeah, you're right. There's a bit of overlap but not much and it doesn't really fit with the convention of preprocessed vs. results data. Ok, I'll change it back now 😄

@dmnapolitano
Copy link
Contributor Author

What's the purpose of adding the BaseDataHandler?

It's to unify the interface of all of our data handlers as much as possible and reduce some redundancy 🤷🏻‍♀️

Hmm, I see. I mean the LiveData and the PreprocessedData handler sort of makes sense. but the versioned data handler is very different. In this case the data is results rather then preprocessed data.

Hmmmm, yeah, you're right. There's a bit of overlap but not much and it doesn't really fit with the convention of preprocessed vs. results data. Ok, I'll change it back now 😄

Done! 🎉

batch_margin = (
np.diff(results_dem, append=results_dem[-1]) - np.diff(results_gop, append=results_gop[-1])
) / np.diff(results_weights, append=results_weights[-1])
with warnings.catch_warnings():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It happens if you have any zeros in the denominator. Since we probably won't enable this until late into tab anyway, it should be pretty rare, but still, it's nice to not see it in our logs if we know it's a thing we're ok with 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized saying "it happens if you have zeros in the denominator" is not the most useful comment here lol 😅 More like, if you have any results_weights that are still zero due to units that haven't reported, especially in states with a lot of small units (New England, Texas) 🤔

@@ -64,10 +65,12 @@ def get_versioned_results(self, filepath=None):

if self.election_id.startswith("2020-11-03_USA_G"):
path = "elex-models-prod/2020-general/results/pres/current.csv"
elif self.election_id.startswith("2024-11-05_USA_G"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why has this been removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't, the logic was a little redundant here so I cleaned it up right below this 😄

@@ -222,9 +231,8 @@ def get_versioned_predictions(self, filepath=None):
return pd.read_csv(filepath)

if self.election_id.startswith("2020-11-03_USA_G"):
path = "elex-models-prod/2020-general/prediction/pres/current.csv"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why has this been removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the exception is raised, path is unused. We don't need it 🤷🏻‍♀️

@lennybronner
Copy link
Collaborator

I am a little confused by these changes. It's not really what we had discussed. I would pretty strongly prefer to only include the unit tests in this PR (+ the changes to setup.py)

@dmnapolitano
Copy link
Contributor Author

I am a little confused by these changes. It's not really what we had discussed. I would pretty strongly prefer to only include the unit tests in this PR (+ the changes to setup.py)

Hmmm ok, let me see what I can split into separate branches 🤔

@dmnapolitano dmnapolitano marked this pull request as draft December 13, 2024 15:48
@dmnapolitano dmnapolitano marked this pull request as ready for review December 13, 2024 16:56
@dmnapolitano
Copy link
Contributor Author

I am a little confused by these changes. It's not really what we had discussed. I would pretty strongly prefer to only include the unit tests in this PR (+ the changes to setup.py)

Hmmm ok, let me see what I can split into separate branches 🤔

Done! 🎉

@dmnapolitano dmnapolitano merged commit ed6e9fa into develop Dec 13, 2024
3 checks passed
@dmnapolitano dmnapolitano deleted the ELEX-4647-more-unit-tests branch December 13, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants