YoY double-counting/asymmetry bug for leap days #283

Open
kandersolar opened this issue Jun 30, 2021 · 0 comments

@kandersolar
Member

While playing around with #282 I noticed an asymmetry in how leap days contribute to the distribution of YoY slopes. Ignoring filtering and first/last-year complications for a moment, each day in the aggregated series is supposed to contribute to exactly one forward and one backward slope. However, there seems to be a small bug related to leap days where a single point can contribute to three slopes instead of two.
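One way a day can pick up an extra pairing is that a one-year calendar shift is not one-to-one around Feb 29. The sketch below assumes (this is not confirmed in this issue) that the YoY pairing shifts each date forward with pd.DateOffset(years=1); it only illustrates the calendar arithmetic, not the rdtools code itself.

```python
import pandas as pd

# Assumption: earlier dates are shifted forward one calendar year with
# pd.DateOffset(years=1) before being matched against the later year.
one_year = pd.DateOffset(years=1)

dates = pd.to_datetime(['2015-02-28', '2016-02-28', '2016-02-29'])
shifted = [d + one_year for d in dates]

for d, s in zip(dates, shifted):
    print(d.date(), '->', s.date())
# 2015-02-28 -> 2016-02-28  (no 2015 date lands on 2016-02-29)
# 2016-02-28 -> 2017-02-28
# 2016-02-29 -> 2017-02-28  (two 2016 dates collide on 2017-02-28)
```

So in a leap year, Feb 29 has no exact one-year-earlier partner, and in the following year two dates map onto the same Feb 28.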

To reproduce:

import pandas as pd
import rdtools
import matplotlib.pyplot as plt
# constant normalized energy, so without the outlier every YoY slope would be 0
daily_pm = pd.Series(1, index=pd.date_range('2014-01-01', '2017-12-31', freq='d'))
daily_pm.loc['2015-02-28'] = 0  # outlier point that interacts with a leap day
rd, ci, calc_info = rdtools.degradation.degradation_year_on_year(daily_pm)

fig = rdtools.plotting.degradation_summary_plots(rd, ci, calc_info, daily_pm)
fig.axes[1].set_ylim(0, 10)  # shrink y-axis to show detail

[Image: degradation summary plots, including the histogram of YoY slopes]

Note that in the histogram, the left-most bin has height 1 and the right-most bin has height 2. So a single outlier day that interacts with a leap day creates one large negative slope but two large positive slopes. Examining the df variable inside rdtools.degradation.degradation_year_on_year confirms this. Feb 28, 2015 is paired with Feb 28 in both 2014 and 2016, but it is also paired with Feb 29, 2016:

df.loc['2015-02'].tail():

                   dt  energy   dt_right  energy_right dt_shifted  time_diff_years    yoy
dt
2015-02-24 2015-02-24     1.0 2014-02-24           1.0 2015-02-24              1.0    0.0
2015-02-25 2015-02-25     1.0 2014-02-25           1.0 2015-02-25              1.0    0.0
2015-02-26 2015-02-26     1.0 2014-02-26           1.0 2015-02-26              1.0    0.0
2015-02-27 2015-02-27     1.0 2014-02-27           1.0 2015-02-27              1.0    0.0
2015-02-28 2015-02-28     0.0 2014-02-28           1.0 2015-02-28              1.0 -100.0

df.loc['2016-02'].tail():

                   dt  energy   dt_right  energy_right dt_shifted  time_diff_years         yoy
dt
2016-02-25 2016-02-25     1.0 2015-02-25           1.0 2016-02-25          1.00000    0.000000
2016-02-26 2016-02-26     1.0 2015-02-26           1.0 2016-02-26          1.00000    0.000000
2016-02-27 2016-02-27     1.0 2015-02-27           1.0 2016-02-27          1.00000    0.000000
2016-02-28 2016-02-28     1.0 2015-02-28           0.0 2016-02-28          1.00000  100.000000
2016-02-29 2016-02-29     1.0 2015-02-28           0.0 2016-02-28          1.00274   99.726776
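The pairing in these tables can be reproduced with pandas alone. This is a sketch, not the rdtools source: it assumes the YoY step is an as-of merge of each day against the series shifted forward one calendar year, with the default direction='backward' and an assumed 8-day tolerance. The column names mirror the df output above. It then counts how many slopes each day takes part in.

```python
import pandas as pd

# Same synthetic series as the reproduction: constant 1 with one outlier day.
daily = pd.Series(1.0, index=pd.date_range('2014-01-01', '2017-12-31', freq='D'))
daily.loc['2015-02-28'] = 0.0

energy = daily.rename('energy').rename_axis('dt').reset_index()
# Assumption: earlier dates are shifted forward one calendar year and each
# day is matched to the closest shifted date at or before it.
energy['dt_shifted'] = energy['dt'] + pd.DateOffset(years=1)

df = pd.merge_asof(
    energy[['dt', 'energy']],
    energy.sort_values('dt_shifted'),
    left_on='dt',
    right_on='dt_shifted',
    suffixes=('', '_right'),
    tolerance=pd.Timedelta('8D'),  # assumed tolerance; direction defaults to 'backward'
).dropna(subset=['dt_right'])  # first-year days have no earlier partner

# Each remaining row is one YoY slope involving its own day ('dt') and the
# earlier day it was matched to ('dt_right').
uses = pd.concat([df['dt'], df['dt_right']]).value_counts()
print(uses.loc[pd.Timestamp('2015-02-28')])  # 3: one backward slope, two forward slopes
print(uses.max())  # 3 for this series
```

Because 2016-02-29 has no exact partner, the backward search falls back to the shifted 2015-02-28 entry, which is then used a second time.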

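I suspect, but did not verify, that this has to do with pd.merge_asof's default choice of direction='backward': since there is no 2015-02-29, the 2016-02-29 row falls back to the nearest earlier match, which is the dt_shifted=2016-02-28 entry from 2015-02-28 (as the table above shows). Possible solutions: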

  • do nothing because this probably has negligible impact on results
  • filter out leap days from the series before calculating YoY slopes
  • something else?
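The second option, filtering out leap days, could look roughly like the sketch below. drop_leap_days and slope_counts are hypothetical helpers written for this illustration (not rdtools functions), and slope_counts assumes the same as-of-merge pairing with direction='backward' suspected above. The trade-off is that one day of data per leap year is discarded.

```python
import pandas as pd


def drop_leap_days(series: pd.Series) -> pd.Series:
    """Hypothetical helper: return `series` without its Feb 29 entries."""
    idx = series.index
    return series[~((idx.month == 2) & (idx.day == 29))]


def slope_counts(series: pd.Series) -> pd.Series:
    """Count how many YoY pairs each date takes part in.

    Assumes the pairing is an as-of merge against the series shifted forward
    one calendar year (backward direction, assumed 8-day tolerance). This is
    an illustration, not the rdtools implementation.
    """
    energy = series.rename('energy').rename_axis('dt').reset_index()
    energy['dt_shifted'] = energy['dt'] + pd.DateOffset(years=1)
    df = pd.merge_asof(
        energy[['dt', 'energy']],
        energy.sort_values('dt_shifted'),
        left_on='dt',
        right_on='dt_shifted',
        suffixes=('', '_right'),
        tolerance=pd.Timedelta('8D'),
    ).dropna(subset=['dt_right'])
    return pd.concat([df['dt'], df['dt_right']]).value_counts()


daily = pd.Series(1.0, index=pd.date_range('2014-01-01', '2017-12-31', freq='D'))
daily.loc['2015-02-28'] = 0.0

print(slope_counts(daily).max())                  # 3: the outlier is used three times
print(slope_counts(drop_leap_days(daily)).max())  # 2: no day is used more than twice
```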