-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor claims_hosp to use smoothing util #433
base: main
Are you sure you want to change the base?
Conversation
blocked by #437 |
@chinandrew psst -- conflicts |
resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand, the savgol smoother is not meant to be equivalent to the left gauss filter previously used. The results don't line up, but do match up to 0.1 tolerance. I ran using Smoother.left_gauss_linear_smoother
from the smoother utility, and those results line up, so we're changing the smoothing method (and not just moving the function out).
Sorry for the delay, the PR looks good. @krivard are we going to announce this change? The documentation as well?
My understanding is that they should be very close with some slight differences in parameters, but I haven't spent a whole lot of time digging into details. See this slack post. |
This set of refactors is meant only to switch indicators to the smoother utility, and must not make any substantive, announcement-worthy changes to any signals. @chinandrew please switch to the utility version of left gauss linear. We'll handle the conversion to savgol in a separate, statistically-rigorous effort. |
@dshemetov would know for sure, but I think savgol is a generalization of the gaussian linear regression (i.e. savgol with polyfit=1 == left gauss linear). It might be the window length that leads to the differences, which we could adjust if needed. |
For clarification, we have savgol polyfit=1 ~= left_gauss_linear. The small differences arise because savgol truncates the fitting window, while left_gauss_linear uses the entire past, but adds negligible weights to anything beyond the latest 2 weeks. Technically speaking, @mariajahja is right that the underlying signal does change by using savgol with polyfit=1. The difference is on the order of 0.1 for signals with a magnitude in the hundreds, however. That did not seem like an announcement-worthy change to the signal to me. I may be wrong, of course. |
I'm OK either way, but we'd need to change the documentation at the least if we go with savgol. The difference of 0.1 is on the percentage scale, so it isn't really a huge change, but because we do a second thresholding step after smoothing, there were a handful of counties and msas that were actually removed/added on some dates due to the change. I don't see this as an issue, but would be good to document. |
Good to know about the thresholding. Definitely something worth documenting. @krivard I'm good with whatever you think is best here. |
Removes won't actually propagate into the API unless we manually trash all issues of that region-date-value, because we still don't have a missingness/deletion encoding system (it's on deck for December...). I'm not too concerned with the ethics of that (since they're all borderline cases anyhow) but if something comes up that requires code forensics we'll need to remember that the API output and the indicator output won't necessarily match for issues before the release of this change. Probably safest to announce it, especially since it will include an update to the docs. That makes this process a bit more fiddly, especially since claims_hosp isn't hooked up to Automation yet. Here are two options:
|
Closes #432
Summary of Changes: