Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEP: Deprecate all data reading functionality via pandas-datareader; ensure independence from SPY and FF #536

Merged
merged 39 commits into from
Jun 14, 2018

Conversation

eigenfoo
Copy link
Contributor

@eigenfoo eigenfoo commented May 18, 2018

Closes #538 #534 #532 #530 #525 #495 #484 #435 #413 by deprecating all data reading functionality.
Closes #499 by removing rolling Fama-French plots.

Similar to quantopian/empyrical#97

To quote from the README:

As of early 2018, Yahoo Finance has suffered major API breaks with no stable replacement, and the Google Finance API has not been stable since late 2017 (source). In recent months it has become a greater and greater strain on the empyrical and pyfolio development teams to maintain support for fetching data through pandas-datareader and other third-party libraries, as these APIs are known to be unstable.

As a result, all empyrical (and therefore pyfolio, which is a downstream dependency) support for data reading functionality has been deprecated and will be removed in a future version.

Users should beware that the following functions are now deprecated:

  • pyfolio.utils.default_returns_func
  • pyfolio.utils.get_fama_french
  • pyfolio.utils.get_returns_cached
  • pyfolio.utils.get_symbol_returns_from_yahoo
  • pyfolio.utils.get_treasury_yield
  • pyfolio.utils.cache_dir
  • pyfolio.utils.ensure_directory
  • pyfolio.utils.data_path
  • pyfolio.utils.load_portfolio_risk_factors

Users should expect regular failures from the following functions, pending patches to the Yahoo or Google Finance API:

  • pyfolio.utils.default_returns_func
  • pyfolio.utils.get_symbol_returns_from_yahoo

@eigenfoo
Copy link
Contributor Author

eigenfoo commented May 21, 2018

It seems as if most of pyfolio is already independent of benchmarks. Most functionality in timeseries.py and plotting.py already check if benchmark_rets is None. So this is not as painful a PR as we initially expected, but there's still a lot to do. Mainly testing that all this nonsense actually works.

Brief summary of results:

  • pyfolio now no longer supports any data reading functionality at all: it is incumbent on users to pass in all data, even if this data is SPY or the Fama-French factors.
  • pyfolio should automatically adapt all tear sheet outputs to show what analysis/plots it can, given the data it has. This means that pyfolio is agnostic to data sources: i.e. because we are removing support for any data reading, it does not make sense to have anything tailored to any particular dataset.
  • Small changes to API: some arguments are now required, some are now optional. I've made sure that any changes produce graceful errors.
  • Unit tests have been changed to reflect this new API.

@eigenfoo
Copy link
Contributor Author

eigenfoo commented May 21, 2018

This PR still isn't ready for merging, btw. Unit tests are done but I will need to do some manual tests to make sure everything is plotting correctly, etc.

In the meantime, @twiecki @richafrank requesting review now.

@eigenfoo eigenfoo changed the title DEP: Deprecate all data reading functionality via pandas-datareader DEP: Deprecate all data reading functionality via pandas-datareader; ensure independence from SPY and FF May 21, 2018
@eigenfoo
Copy link
Contributor Author

eigenfoo commented May 22, 2018

@twiecki @richafrank finished manual tests, ready to review and merge now.

Note that for best results, we should ship both pyfolio and empyrical with my latest PRs together. Perhaps new releases?

README.md Outdated
providers and receive an API key in order to access data. These API keys
should be set as environment variables, or passed as an argument to
`pandas-datareader`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well written!

@@ -925,12 +913,15 @@ def create_interesting_times_tear_sheet(
bubble burst, EZB IR, Great Recession (August 2007, March and September
of 2008, Q1 & Q2 2009), flash crash, April and October 2014.

benchmark_rets must be passed, as it is meaningless to analyze performance
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we should make that decision for the user, many events in there are not related to macro market events.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we should change the interesting times tear sheet to simply plot cumulative returns during interesting times? That should probably be a separate PR: I'll file an issue for that though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed under #538

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My read of the code suggested it was already possible to run it without a benchmark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, not too much work to do it.

Copy link
Contributor

@twiecki twiecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, quite a big change that will make our lives so much easier.

@twiecki
Copy link
Contributor

twiecki commented May 23, 2018

Assigning @richafrank to assign an engineering reviewer.

@eigenfoo
Copy link
Contributor Author

@twiecki finished with engineering feedback; ready for another pass.

@twiecki
Copy link
Contributor

twiecki commented Jun 12, 2018

Thanks @eigenfoo. @ssanderson can you check if you feedback was appropriately incorporated?

@@ -116,7 +118,9 @@ def test_create_round_trip_tear_sheet_breakdown(self, kwargs):
@cleanup
def test_create_interesting_times_tear_sheet_breakdown(self,
kwargs):
# FIXME needs benchmark_rets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this still need to be fixed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope!

pyfolio/utils.py Outdated
cache_dir = empyrical.utils.cache_dir
ensure_directory = empyrical.utils.ensure_directory
data_path = empyrical.utils.data_path
_1_bday_ago = empyrical.utils._1_bday_ago
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unused in pyfolio. Do we think external consumers were depending on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't say; I included it just in case though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pyfolio/utils.py Outdated
"""
If the returns is longer than the benchmark's, limit strategy returns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is a bit ambiguous. I would naively interpret "longer than" to mean "has more observations than", but it looks to me like the actual behavior here is that we remove any values from rets that are older than the oldest observation in benchmark_rets. I also wouldn't know what "limit" means without more context.

I would describe the currently-implemented behavior here as something like:

def clip_returns_to_benchmark(rets, benchmark_rets):
    """Drop entries from rets that precede the first entry in benchmark_rets.
    """

That said, is there a reason we're only clipping the left side here? Naively, I'd expect to have similar problems if rets/benchmark_rets don't align in either direction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I had just moved the code without thinking about it. The function now clips from both ends.

@eigenfoo
Copy link
Contributor Author

@twiecki @ssanderson done!

README.md Outdated

### Deprecated: Data Reading via `pandas-datareader`

As of early 2018, Yahoo Finance has suffered major API breaks with no stable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than emphasize that the readers are deprecated, I would focus on making pyfolio independent of a benchmark being present, that's an enhancement. Then discuss how empyrical does not provide them anymore so if you need them, you have to find them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's the case I'd rather put that in the release notes. I'll remove the README update here.

@twiecki
Copy link
Contributor

twiecki commented Jun 14, 2018

Still needs to be added to the release notes.

@twiecki twiecki merged commit 50d8190 into quantopian:master Jun 14, 2018
@twiecki
Copy link
Contributor

twiecki commented Jun 14, 2018

Thanks @eigenfoo!

brian-from-quantrocket pushed a commit to quantrocket-llc/pyfolio that referenced this pull request Aug 7, 2018
DEP: Deprecate all data reading functionality via pandas-datareader; ensure independence from SPY and FF
@EPGM-DES
Copy link

@eigenfoo I was able to access yahoo finance historic prices using fix-yahoo-finance. However when I use price dataframe I receive the following error:

----pyfolio.create_returns_tear_sheet(stock_df)
Traceback (most recent call last):
File "", line 1, in
AttributeError: module 'pyfolio' has no attribute 'create_returns_tear_sheet'

I looked through the function in tears.py but it's unclear if there is another dependency on pandas-datareader in the function. Using fix-yahoo-finance seemed like it could be a helpful, short-term solution to the Google and Yahoo Finance API instability.

@eigenfoo
Copy link
Contributor Author

Hi @EPGM-DES, for future reference, please open up a new GitHub issue for a separate problem you're having: I directed you to this PR since it was what caused your original problem.

Without knowing more about your environment or code, it's hard for me to help you with your error. Could you open another issue with more details?

@MirTunio MirTunio mentioned this pull request Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants