DEP: Deprecate all data reading functionality via pandas-datareader; ensure independence from SPY and FF #536

Merged
merged 39 commits into from
Jun 14, 2018
Changes from 19 commits
Commits
3253719
DEP: deprecate all data reading functionality
eigenfoo May 18, 2018
dca4ba5
DOC: Update README about deprecation
eigenfoo May 18, 2018
c0b7ec7
REV: get_utc_timestamp should not deprecated
eigenfoo May 18, 2018
b3e1469
DEP: deprecate some other funcs
eigenfoo May 18, 2018
ed4e11d
MAINT: remove set_context from tears
eigenfoo May 18, 2018
1bb6f39
DOC: Update copyright year
eigenfoo May 18, 2018
8e36a2c
MAINT: returns tearsheet now indep of benchmarks
eigenfoo May 21, 2018
893bd76
MAINT: interesting times tearsheet now indep of benchmarks
eigenfoo May 21, 2018
a75944f
MAINT: bayesian tearsheet now indep of benchmarks
eigenfoo May 21, 2018
7d7a9cd
MAINT: simple tearsheet now indep of benchmarks
eigenfoo May 21, 2018
d12cf4a
MAINT: timeseries.py now indep of benchmarks
eigenfoo May 21, 2018
fbb511b
BUG: flagged bug for later
eigenfoo May 21, 2018
e27a853
MAINT: remove fama-french plots
eigenfoo May 21, 2018
8c9ea2b
DOC: update documentation and copyright years
eigenfoo May 21, 2018
1ff60fb
MAINT: one more pass through
eigenfoo May 21, 2018
8821774
DOC: flag things to do
eigenfoo May 21, 2018
4b11da6
MAINT: numbers checked
eigenfoo May 21, 2018
91a6910
TST: updated tests
eigenfoo May 21, 2018
5a749ec
TST: fixed tests
eigenfoo May 21, 2018
60ebc82
MAINT: make benchmark optional for interesting times tear sheet
eigenfoo May 23, 2018
7c77d88
DOC: updated plot_perf_stats docstring
eigenfoo Jun 11, 2018
b8a0557
DOC: cleared up confusing regression terminology
eigenfoo Jun 11, 2018
1fac58d
DOC: consistent docstring for factor_returns
eigenfoo Jun 11, 2018
1f91172
MAINT: refactor passthrough functions to empyrical
eigenfoo Jun 11, 2018
b89e6c2
MAINT: do not deprecate register_return_func
eigenfoo Jun 11, 2018
7f28f7a
BUG: fixed typo
eigenfoo Jun 11, 2018
7001240
STY: remove unused imports and deprecation warning
eigenfoo Jun 11, 2018
8f3a243
MAINT: add clip_returns_to_benchmark helper function
eigenfoo Jun 11, 2018
2517419
MAINT: sum indiv sections to get vertical_sections
eigenfoo Jun 11, 2018
81e91f4
BUG: fix typo
eigenfoo Jun 11, 2018
d0255ab
BUG: fixed typo
eigenfoo Jun 11, 2018
b6b4518
DOC: remove suggestions from readme
eigenfoo Jun 11, 2018
dcbbd93
DOC: remove fixme
eigenfoo Jun 13, 2018
f9d3012
REV: put back set_context
eigenfoo Jun 13, 2018
ea215db
DOC: add docstrings for set_context
eigenfoo Jun 13, 2018
cee52d2
MAINT: fix clip_rets_to_bench_rets
eigenfoo Jun 13, 2018
2a6a81f
DEP: remove deprecated functions
eigenfoo Jun 14, 2018
915f6df
DOC: remove readme updates
eigenfoo Jun 14, 2018
389a248
DOC: added WHATSNEW
eigenfoo Jun 14, 2018
56 changes: 56 additions & 0 deletions README.md
@@ -80,6 +80,62 @@ If you find a bug, feel free to [open an issue](https://github.com/quantopian/py
You can also join our [mailing list](https://groups.google.com/forum/#!forum/pyfolio) or
our [Gitter channel](https://gitter.im/quantopian/pyfolio).

## Support

Please [open an issue](https://github.com/quantopian/pyfolio/issues/new) for support.

### Deprecated: Data Reading via `pandas-datareader`

As of early 2018, Yahoo Finance has suffered major API breaks with no stable
Contributor

Rather than emphasizing that the readers are deprecated, I would focus on making pyfolio independent of a benchmark being present; that's an enhancement. Then discuss how empyrical no longer provides the readers, so if you need them, you have to find them elsewhere.

Contributor Author

If that's the case I'd rather put that in the release notes. I'll remove the README update here.

replacement, and the Google Finance API has not been stable since late 2017
[(source)](https://github.com/pydata/pandas-datareader/blob/da18fbd7621d473828d7fa81dfa5e0f9516b6793/README.rst).
In recent months it has become a greater and greater strain on the `empyrical`
and `pyfolio` development teams to maintain support for fetching data through
`pandas-datareader` and other third-party libraries, as these APIs are known to
be unstable.

As a result, all `empyrical` (and therefore `pyfolio`, which is a downstream
dependency) support for data reading functionality has been deprecated and will
be removed in a future version.

Users should beware that the following functions are now deprecated:

- `pyfolio.utils.default_returns_func`
- `pyfolio.utils.get_fama_french`
- `pyfolio.utils.get_returns_cached`
- `pyfolio.utils.get_symbol_returns_from_yahoo`
- `pyfolio.utils.get_treasury_yield`
- `pyfolio.utils.cache_dir`
- `pyfolio.utils.ensure_directory`
- `pyfolio.utils.data_path`
- `pyfolio.utils._1_bday_ago`
- `pyfolio.utils.load_portfolio_risk_factors`
- `pyfolio.utils.register_return_func`
- `pyfolio.utils.get_symbol_rets`

Users should expect regular failures from the following functions, pending
patches to the Yahoo or Google Finance API:

- `pyfolio.utils.default_returns_func`
- `pyfolio.utils.get_symbol_returns_from_yahoo`
- `pyfolio.utils.get_symbol_rets`

For alternative data sources, we suggest the following:

1. Migrate your research workflow to the Quantopian Research environment,
   where there is [free and flexible data access to over 57
   datasets](https://www.quantopian.com/data)
2. Make use of any remaining functional APIs supported by
   `pandas-datareader`. These include:

   - [Morningstar](https://pydata.github.io/pandas-datareader/stable/remote_data.html#remote-data-morningstar)
   - [Quandl](https://pydata.github.io/pandas-datareader/stable/remote_data.html#remote-data-quandl)

   Please note that you may need to create free accounts with these data
   providers and receive an API key in order to access data. These API keys
   should be set as environment variables, or passed as an argument to
   `pandas-datareader`.

Contributor

Well written!
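
The README text above says API keys should be set as environment variables or passed as an argument. A minimal sketch of that lookup order follows. The helper `resolve_api_key` is hypothetical, and the default variable name `QUANDL_API_KEY` is an assumption; check the `pandas-datareader` documentation for the name your version reads.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "QUANDL_API_KEY") -> str:
    """Hypothetical helper: prefer an explicit key, else read the environment."""
    if explicit_key:
        return explicit_key
    # Fall back to the environment variable (name is an assumption).
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            "No API key found: pass one explicitly or set %s" % env_var
        )
    return key
```

The returned string would then be handed to whichever reader is used. How the key is passed depends on the data source and library version, so treat that step as something to verify rather than copy.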


## Contributing

2 changes: 1 addition & 1 deletion pyfolio/deprecate.py
@@ -1,5 +1,5 @@
 """Utilities for marking deprecated functions."""
-# Copyright 2016 Quantopian, Inc.
+# Copyright 2018 Quantopian, Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
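
For readers unfamiliar with how a function is usually marked as deprecated in Python, here is a minimal sketch using only the standard library. It is not the contents of `pyfolio/deprecate.py`, which this diff does not show. The decorator name `deprecated` and the stand-in function `load_benchmark_returns` are hypothetical.

```python
import functools
import warnings


def deprecated(msg=None):
    """Hypothetical decorator: emit a DeprecationWarning on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                msg or "%s is deprecated." % func.__name__,
                category=DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated("Data loading is deprecated; pass returns data in directly.")
def load_benchmark_returns():
    # Stand-in for a data-reading helper; returns dummy numbers only.
    return [0.01, -0.02, 0.005]
```

Calling `load_benchmark_returns()` still returns its value, so existing callers keep working while being told to migrate.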
83 changes: 8 additions & 75 deletions pyfolio/plotting.py
@@ -1,5 +1,5 @@
 #
-# Copyright 2017 Quantopian, Inc.
+# Copyright 2018 Quantopian, Inc.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -140,72 +140,6 @@ def axes_style(style='darkgrid', rc=None):
     return sns.axes_style(style=style, rc=rc)


-def plot_rolling_fama_french(returns,
-                             factor_returns=None,
-                             rolling_window=APPROX_BDAYS_PER_MONTH * 6,
-                             legend_loc='best',
-                             ax=None, **kwargs):
-    """
-    Plots rolling Fama-French single factor betas.
-
-    Specifically, plots SMB, HML, and UMD vs. date with a legend.
-
-    Parameters
-    ----------
-    returns : pd.Series
-        Daily returns of the strategy, noncumulative.
-         - See full explanation in tears.create_full_tear_sheet.
-    factor_returns : pd.DataFrame, optional
-        data set containing the Fama-French risk factors. See
-        utils.load_portfolio_risk_factors.
-    rolling_window : int, optional
-        The days window over which to compute the beta.
-    legend_loc : matplotlib.loc, optional
-        The location of the legend on the plot.
-    ax : matplotlib.Axes, optional
-        Axes upon which to plot.
-    **kwargs, optional
-        Passed to plotting function.
-
-    Returns
-    -------
-    ax : matplotlib.Axes
-        The axes that were plotted on.
-    """
-
-    if ax is None:
-        ax = plt.gca()
-
-    ax.set_title(
-        "Rolling Fama-French single factor betas (%.0f-month)" % (
-            rolling_window / APPROX_BDAYS_PER_MONTH
-        )
-    )
-
-    ax.set_ylabel('Beta')
-
-    rolling_beta = timeseries.rolling_regression(
-        returns,
-        factor_returns=factor_returns,
-        rolling_window=rolling_window)
-
-    rolling_beta = rolling_beta[['SMB', 'HML', 'Mom']]
-    rolling_beta.plot(alpha=0.7, ax=ax, **kwargs)
-
-    ax.axhline(0.0, color='black')
-    ax.legend(['Small cap (SMB)',
-               'High growth (HML)',
-               'Momentum (UMD)'],
-              loc=legend_loc, frameon=True, framealpha=0.5)
-
-    y_axis_formatter = FuncFormatter(utils.two_dec_places)
-    ax.yaxis.set_major_formatter(FuncFormatter(y_axis_formatter))
-    ax.axhline(0.0, color='black')
-    ax.set_xlabel('')
-    ax.set_ylim((-1.0, 1.0))
-    return ax
-
-
 def plot_monthly_returns_heatmap(returns, ax=None, **kwargs):
     """
     Plots a heatmap of returns by month.
@@ -566,9 +500,8 @@ def plot_perf_stats(returns, factor_returns, ax=None):
     returns : pd.Series
         Daily returns of the strategy, noncumulative.
          - See full explanation in tears.create_full_tear_sheet.
-    factor_returns : pd.DataFrame, optional
-        data set containing the Fama-French risk factors. See
-        utils.load_portfolio_risk_factors.
+    factor_returns : pd.DataFrame
+        Data set containing the risk factors.
Contributor

Can we be more specific about what we expect here? "containing the risk factors" doesn't tell me much about what I should be passing here.

     ax : matplotlib.Axes, optional
         Axes upon which to plot.

@@ -601,7 +534,7 @@ def plot_perf_stats(returns, factor_returns, ax=None):
 ]


-def show_perf_stats(returns, factor_returns, positions=None,
+def show_perf_stats(returns, factor_returns=None, positions=None,
                     transactions=None, turnover_denom='AGB',
                     live_start_date=None, bootstrap=False,
                     header_rows=None):
@@ -619,7 +552,7 @@ def show_perf_stats(returns, factor_returns, positions=None,
     returns : pd.Series
         Daily returns of the strategy, noncumulative.
          - See full explanation in tears.create_full_tear_sheet.
-    factor_returns : pd.Series
+    factor_returns : pd.Series, optional
         Daily noncumulative returns of the benchmark.
Contributor

The description here seems like it doesn't match the parameter name. What does this parameter actually do?

Contributor Author

In my mind, there isn't much ambiguity in this docstring: it's the daily returns of some benchmark factor (possibly a risk factor). E.g. it could be the returns of SPY, or the returns of any of the Fama-French factors, etc.

Perhaps "benchmark factor" would be more explicit? I've changed the docstring to reflect that.

Contributor Author

With some searching, I'm realizing that the same parameter has very spotty docstrings in different functions. I'm changing all functions to be this same docstring:

    factor_returns : pd.Series
         Daily noncumulative returns of the benchmark factor to which betas are
         computed. Usually a benchmark such as market returns.
         - This is in the same style as returns.

Contributor (@ssanderson, Jun 13, 2018)

That docstring is definitely an improvement.

My general objection to this parameter name is that `factor_returns` has ambiguous plurality. Is it expected to be the returns of a single factor (in which case, what factor is it?), or is it a dataframe of factor returns à la the Q risk model? Your updated docstring answers this question for me, but ideally we would have a name here that was more self-describing to begin with. (Changing the name probably shouldn't be in the scope of this PR though.)

          - This is in the same style as returns.
     positions : pd.DataFrame, optional
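
To make the docstring agreed on in the thread above concrete, the sketch below builds a single benchmark series "in the same style as returns" (a daily, noncumulative `pd.Series` on a date index) and computes a beta to it. The helper `single_factor_beta` and the toy numbers are illustrative assumptions, not pyfolio or empyrical code.

```python
import pandas as pd


def single_factor_beta(returns: pd.Series, factor_returns: pd.Series) -> float:
    """Illustrative helper: beta of `returns` to one benchmark factor."""
    # Keep only the dates present in both series.
    aligned = pd.concat([returns, factor_returns], axis=1, join="inner").dropna()
    strategy, factor = aligned.iloc[:, 0], aligned.iloc[:, 1]
    # Beta is covariance with the factor divided by the factor's variance.
    return strategy.cov(factor) / factor.var()


dates = pd.date_range("2018-01-02", periods=5, freq="B")
factor_returns = pd.Series([0.01, -0.02, 0.005, 0.0, 0.015], index=dates)
returns = 2.0 * factor_returns  # toy strategy: exactly twice the benchmark
beta = single_factor_beta(returns, factor_returns)
```

Because `factor_returns` here is one series rather than a DataFrame of several factors, the result is a single number, which is the reading of the parameter that the updated docstring settles on.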
@@ -841,7 +774,7 @@ def cone(in_sample_returns (pd.Series),
     ax.set_yscale('log' if logy else 'linear')

     if volatility_match and factor_returns is None:
-        raise ValueError('volatility_match requires passing of'
+        raise ValueError('volatility_match requires passing of '
                          'factor_returns.')
     elif volatility_match and factor_returns is not None:
         bmark_vol = factor_returns.loc[returns.index].std()
@@ -909,7 +842,7 @@ def plot_rolling_beta(returns, factor_returns, legend_loc='best',
     returns : pd.Series
         Daily returns of the strategy, noncumulative.
          - See full explanation in tears.create_full_tear_sheet.
-    factor_returns : pd.Series, optional
+    factor_returns : pd.Series
Contributor

Same general note on this docstring

         Daily noncumulative returns of the benchmark.
          - This is in the same style as returns.
     legend_loc : matplotlib.loc, optional
@@ -1005,7 +938,7 @@ def plot_rolling_volatility(returns, factor_returns=None,

     ax.set_ylabel('Volatility')
     ax.set_xlabel('')
-    if factor_returns.empty:
Contributor

Given that this would have crashed before if factor_returns was None, should we just make this parameter required? It seems like it was effectively required previously.

Contributor Author

@ssanderson I'm not sure I see how this function would crash if factor_returns was None. Could you point out a line number?

Contributor (@ssanderson, Jun 12, 2018)

On this line, we're accessing factor_returns.empty, and None doesn't have an empty attribute, so this would have crashed with an AttributeError if None was passed.

Contributor Author (@eigenfoo, Jun 12, 2018)

@ssanderson yes, which is why I changed it to `if factor_returns is None`; there's no more reference to the `empty` attribute anywhere in this function. Am I missing something?

Contributor

@eigenfoo my point here was that, since factor_returns was previously required for this function to work, we might want to just consider making it actually a required argument and not call this function at all if we don't have a benchmark. Whether or not that makes sense depends on how useful we think this plot is without access to the benchmark.

Contributor Author

Ah I see! No, that was intentional: see this issue. The idea is that we want to make pyfolio as benchmark-independent as possible: if a benchmark is passed, everything works as required. If not, pyfolio simply skips the analyses that depend on benchmarks.

+    if factor_returns is None:
         ax.legend(['Volatility', 'Average volatility'],
                   loc=legend_loc, frameon=True, framealpha=0.5)
     else:
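
The thread above turns on two points: `None` has no `empty` attribute, and a missing benchmark should simply skip the benchmark-dependent part. A minimal sketch of that pattern follows, without matplotlib. The function `volatility_legend_labels` is hypothetical, and the labels returned in the benchmark branch are an assumption, since the diff is cut off at `else:`.

```python
from typing import List, Optional, Sequence


def volatility_legend_labels(
        factor_returns: Optional[Sequence[float]] = None) -> List[str]:
    """Hypothetical helper: choose legend labels with or without a benchmark."""
    # `None` has no `.empty` attribute, so test identity rather than calling
    # `factor_returns.empty`, which would raise AttributeError for None.
    if factor_returns is None:
        return ['Volatility', 'Average volatility']
    # Assumed labels for the benchmark branch (not shown in the diff).
    return ['Volatility', 'Benchmark volatility', 'Average volatility']


print(volatility_legend_labels())
print(volatility_legend_labels([0.01, -0.02]))
```

Defaulting the argument to `None` and branching on it keeps one function usable in both cases, which matches the stated goal of making the tear sheets work whether or not a benchmark is passed.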