valid_days unnecessarily slow #108

Closed
hmanfarmer opened this issue Jul 14, 2020 · 4 comments

@hmanfarmer

hmanfarmer commented Jul 14, 2020

This is my first real GitHub issue submission; my apologies if I buggered up the formatting or left out useful info.

Name: pandas-market-calendars
Version: 1.3.5

I was using the valid_days function to calculate the trade days remaining until options expiration dates. This was extremely slow: about 500 ms per call to valid_days. Overall, this added 16 seconds to my app when calculating trade days for one stock's option chain (both puts and calls). The 16 seconds is somewhat inflated, because I was calculating each date twice (once for puts and once for calls), but even 8 seconds is a lot for such a simple set of function calls.

I was able to work around the issue by using the schedule function, caching the schedule, and then indexing and counting schedule to get the same data.
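The duplicated put/call computation mentioned above can also be avoided with memoization. The following is only a minimal sketch using the standard library: `count_trading_days` is a hypothetical stand-in for the expensive `len(valid_days(...))` call, and it counts plain weekdays without any holiday handling, so it is not the library's behavior.

```python
from datetime import date, timedelta
from functools import lru_cache

body_runs = 0  # how many times the expensive body actually executes


@lru_cache(maxsize=None)
def count_trading_days(start, end):
    """Hypothetical stand-in for len(calendar.valid_days(start, end)).

    Counts weekdays in [start, end]; a real market calendar also skips holidays.
    """
    global body_runs
    body_runs += 1
    count = 0
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            count += 1
        day += timedelta(days=1)
    return count


today = date(2020, 7, 14)
expiry = date(2020, 7, 17)
put_days = count_trading_days(today, expiry)   # computed
call_days = count_trading_days(today, expiry)  # served from the cache
```

Memoization only helps for repeated identical `(start, end)` pairs; it does not reduce the cost of the first call for each expiration date.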

valid_days shouldn't be this slow. On one level, this is a feature request, but valid_days is so slow, it might also be fair to consider this a bug.

Here is code to reproduce the issue:

import pandas as pd
import pandas_market_calendars as mcal
from datetime import datetime, timezone
from pyinstrument import Profiler

def slow(dates):
    today = pd.Timestamp.now().floor("D")
    trade_calendar = mcal.get_calendar("NYSE")
    days_to_expiration_array = []
    for date in dates:
        date = pd.to_datetime(date)
        days_to_expiration = len(trade_calendar.valid_days(start_date=today, end_date=date))
        days_to_expiration_array.append(days_to_expiration)
    return days_to_expiration_array

def fast(dates):
    today = pd.Timestamp.now().floor("D")
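    # Use the furthest expiration regardless of input order, so the cached schedule covers every date.
    longest_date = pd.to_datetime(dates).max()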
    trade_calendar = mcal.get_calendar("NYSE")
    schedule = trade_calendar.schedule(today,longest_date)
    days_to_expiration_array = []
    for date in dates:
        date = pd.to_datetime(date)
        days_to_expiration = len(schedule.loc[today:date])
        days_to_expiration_array.append(days_to_expiration)
    return days_to_expiration_array

option_expire_dates = ["2020-07-17","2020-07-24","2020-07-31","2020-08-07","2020-08-14","2020-08-21",
                        "2020-08-28","2020-09-18","2020-11-20","2020-12-18","2021-01-15","2021-06-18",
                        "2021-09-17","2022-01-21","2022-06-17"]

profiler = Profiler()

profiler.start()
slow_days_array = slow(option_expire_dates)
fast_days_array = fast(option_expire_dates)
profiler.stop()

if slow_days_array == fast_days_array:
    print("Outputs match")
else:
    print("Error: Outputs mismatch")

print(slow_days_array)
print(profiler.output_text(unicode=True, color=True))

And the code's output:

Outputs match
[4, 9, 14, 19, 24, 29, 34, 48, 93, 112, 130, 236, 299, 386, 488]

  _     ._   __/__   _ _  _  _ _/_   Recorded: 08:46:08  Samples:  8815
 /_//_/// /_\ / //_// / //_'/ //     Duration: 9.029     CPU time: 8.938
/   _/                      v3.1.3

Program: c:\Users\HomeLaptop\Documents\FarmTrader\bug_example.py

9.026 <module>  bug_example.py:1
├─ 8.412 slow  bug_example.py:6
│  ├─ 8.279 valid_days  pandas_market_calendars\market_calendar.py:203
│  │     [2321 frames hidden]  pandas_market_calendars, pandas, date...
│  └─ 0.117 get_calendar  pandas_market_calendars\calendar_registry.py:18
│        [39 frames hidden]  pandas_market_calendars, pytz, generi...
└─ 0.614 fast  bug_example.py:20
   └─ 0.589 schedule  pandas_market_calendars\market_calendar.py:214
         [933 frames hidden]  pandas_market_calendars, pandas, date...

@rsheftel
Owner

Thank you very much for this. To date this package has not been optimized for performance, so that is definitely something that could be improved. The reason for the difference in speed you are seeing is that every call to valid_days() builds a pandas date_range with the holiday calendars, and that construction is expensive. Since you are calling valid_days() inside the loop, it goes through the holiday construction and date_range() on every iteration. In contrast, your fast solution calls schedule(), which also does the date_range() and holiday construction, but only once, since you call it outside the loop.
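The pattern described above (pay the construction cost once, then answer each date with a cheap lookup) can be sketched with the standard library alone. This is an illustration under stated assumptions: `weekdays_between` is a hypothetical stand-in for a single `valid_days` call and ignores holidays, and the counting uses `bisect` on the sorted result rather than anything from pandas_market_calendars.

```python
from bisect import bisect_right
from datetime import date, timedelta


def weekdays_between(start, end):
    """Hypothetical stand-in for one valid_days(start, end) call.

    Returns the sorted weekdays in [start, end]; holidays are not excluded.
    """
    days = []
    day = start
    while day <= end:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            days.append(day)
        day += timedelta(days=1)
    return days


def days_to_expiration(today, expirations):
    # Build the sorted day list once, covering the furthest expiration.
    trading_days = weekdays_between(today, max(expirations))
    # bisect_right gives the number of entries <= expiration (inclusive count).
    return [bisect_right(trading_days, exp) for exp in expirations]


today = date(2020, 7, 14)
expirations = [date(2020, 7, 17), date(2020, 7, 24), date(2020, 7, 31)]
counts = days_to_expiration(today, expirations)
print(counts)  # [4, 9, 14]
```

With this shape the expensive call happens once per batch of dates, and each lookup is a binary search over the precomputed list.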

There are definitely opportunities for optimization of the library and I would welcome any PR you would like to contribute.

@hmanfarmer
Author

I am interested in helping. As I don't have a strong computer science background, I have zero experience submitting to GitHub, writing unit tests, and the like. If I submit a proposed fix (I assume that 'PR' means something similar to a proposed fix), I would probably need a bit of hand-holding to do all of the other stuff right (unit testing, proper conventions, etc.).

Just wanted to give a heads-up on my situation. Also, I am wrapped up fixing other performance issues with my app right now, so I won't be able to work on this issue until that is done.

@rsheftel
Owner

No problem at all. We were all beginners at one point. There are some great online resources on how to create a pull request; here is one: https://www.digitalocean.com/community/tutorials/how-to-create-a-pull-request-on-github

I or others would be happy to help.

@2torus 2torus mentioned this issue Sep 2, 2020
@rsheftel
Owner

rsheftel commented Sep 3, 2020

This is addressed in #117

@rsheftel rsheftel closed this as completed Sep 3, 2020