DEP: Deprecate all data reading functionality via pandas-datareader #97

eigenfoo · 2018-05-18T18:38:10Z

Deprecates all data-fetching functionality via pandas-datareader, and adds a simple_returns helper function to replace the price-to-returns conversion that the data fetchers previously performed implicitly.

To quote the README:

As of early 2018, Yahoo Finance has suffered major API breaks with no stable replacement, and the Google Finance API has not been stable since late 2017 (source). In recent months it has become a greater and greater strain on the empyrical development team to maintain support for fetching data through pandas-datareader and other third-party libraries, as these APIs are known to be unstable.

As a result, all empyrical support for data reading functionality has been deprecated and will be removed in a future version.

Users should beware that the following functions are now deprecated:

empyrical.utils.cache_dir
empyrical.utils.data_path
empyrical.utils.ensure_directory
empyrical.utils.get_utc_timestamp
empyrical.utils._1_bday_ago
empyrical.utils.get_fama_french
empyrical.utils.load_portfolio_risk_factors
empyrical.utils.default_returns_func
empyrical.utils.get_symbol_returns_from_yahoo

Users should expect regular failures from the following functions, pending patches to the Yahoo or Google Finance API:

empyrical.utils.default_returns_func
empyrical.utils.get_symbol_returns_from_yahoo

eigenfoo · 2018-05-18T18:39:29Z

@twiecki @richafrank requesting review from research and engineering, ready to merge otherwise.

twiecki · 2018-05-22T08:42:25Z

LGTM. One thing we should add, though, is a function that takes prices and computes correctly indexed returns which one can pass into pyfolio.

eigenfoo · 2018-05-22T11:35:53Z

@twiecki done

twiecki · 2018-05-22T11:50:36Z

Thanks, this definitely requires engineering review. CC @richafrank

eigenfoo · 2018-06-01T14:39:23Z

empyrical/utils.py

@@ -24,6 +24,16 @@
 import pandas as pd
 from pandas.tseries.offsets import BDay
 from pandas_datareader import data as web


@twiecki @richafrank there is a known import error with pandas_datareader: it has been fixed upstream here, but has yet to be released in a new version. See this StackOverflow thread, and this pyfolio issue that brought this to my attention.

Wondering what we should do about this. Perhaps wrap it in a try-except, and raise a warning if the import fails? If we're deprecating pandas-datareader functionality anyways, it doesn't make sense to keep this line around if it raises an import error.
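The try-except idea proposed here could look roughly like the sketch below. It is not the code that was merged; the helper name `try_import` and the warning text are assumptions made for illustration.

```python
import importlib
import warnings


def try_import(module_name):
    """Import a module by name, warning instead of failing if it is unavailable.

    Hypothetical helper for illustration; not part of empyrical.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Downgrade the hard failure to a warning so that the rest of the
        # package can still be imported.
        warnings.warn(
            "Unable to import {}: {}. Functions that depend on it will "
            "not work.".format(module_name, exc),
            UserWarning,
        )
        return None


# `web` is None when pandas_datareader is missing or broken on import.
web = try_import("pandas_datareader.data")
```

With this shape, importing the package no longer fails when pandas_datareader is broken; only the already-deprecated data-fetching functions would need to check whether `web` is None before using it.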

twiecki · 2018-06-04T09:01:38Z

That sounds like a good solution to me.

eigenfoo · 2018-06-04T13:31:27Z

Done.

ssanderson

@eigenfoo I took a pass here.

In general the deprecation warning implementation seems reasonable. I had some concerns about how strongly we want to push people toward Quantopian research, and I had some more philosophical comments on the newly-added returns function.

ssanderson · 2018-06-06T14:46:25Z

README.md

+
+For alternative data sources, we suggest the following:
+
+1. Migrate your research workflow to the Quantopian Research environment,


I'm a little torn on whether we want to include this here. On the one hand, it's definitely a useful and relevant resource for a consumer of empyrical. On the other hand, it feels a bit like advertising, which feels a little odd in an open source project.

I'd be curious to hear thoughts on this from others who regularly work on open source (cc @twiecki @richafrank @llllllllll @ehebert, @TimShawver)

Yea I agree...I've been thinking about some similar questions with qgrid since we're going to direct people to research from there. It feels OK to me in the qgrid case because we're going to be using research to host a demo of qgrid. In this case it feels a bit weird because we're not directing them to anything empyrical-specific within research, we're just saying here's another place you can get data.

I don't know if making the statement less imperative would help make it less of an advertisement.
i.e. "we suggest" -> "these options are available".

I lean on the side of removing, so that we don't have to worry about semantics.

Okay, removed all suggestions.

ssanderson · 2018-06-06T14:49:20Z

empyrical/utils.py

@@ -230,10 +274,12 @@ def get_utc_timestamp(dt):
 _1_bday = BDay()


+@deprecated(msg=DATAREADER_DEPRECATION_WARNING)


Why is this being deprecated?

It's only called in functions that this PR is deprecating, and since this function is for internal use only (i.e. starts with an underscore), I thought it would make sense to deprecate it. Is that not good practice?

You could, though the particular message seems a bit confusing for this function. If this is called from deprecated functions, then I'd also be worried about warning spam from this. Since this is a "private" function (in general, functions prefixed with an _ are private by convention), it doesn't seem all that important to me to deprecate this.

Removed deprecation warning.

ssanderson · 2018-06-06T14:50:27Z

empyrical/utils.py

@@ -205,14 +224,39 @@ def ensure_directory(path):
            raise


+def compute_returns(prices):


Why is this function being added in this PR? This seems out of place with the rest of the changes here.

If we do want this, I would probably call this something like simple_returns to distinguish it from log returns. As a general naming note, I'm usually not a fan of including generic verbs like compute in function names. All functions compute things, so the compute_ doesn't really add any useful information for a reader.

Previously, get_symbol_returns_from_yahoo (which is being deprecated) fetched pricing data from Yahoo, and computed returns, as in this line:
https://github.com/quantopian/empyrical/blob/master/empyrical/utils.py#L420

This function is merely to retain support for the conversion from prices to returns. Thinking about it, I agree that putting it in utils is a bit out of place, and that it probably belongs in stats with the name simple_returns.

ssanderson · 2018-06-06T15:00:18Z

empyrical/utils.py

+    """
+
+    rets = prices.pct_change().dropna()
+    rets.index = rets.index.tz_localize('UTC')


Same note as on your other PR re: not updating values in place here.

This stops being a problem if we implement the other changes discussed above.

ssanderson · 2018-06-06T15:39:08Z

empyrical/utils.py

+        and index coerced to be tz-aware.
+    """
+
+    rets = prices.pct_change().dropna()


Calling dropna() and modifying the index here feel philosophically out of place to me for empyrical.

In my mind at least, the primary purpose of empyrical is to be low-level infrastructure that can be shared between Zipline, PyFolio, and Alphalens. Zipline pretty much never wants to call a function that silently discards bad data, or that changes the localization of a datetime index, because both of those behaviors prevent us from noticing bugs, and they incur nontrivial performance overhead.

Calling dropna() in particular here seems a bit fraught if we expect DataFrame input, because the behavior there is that pandas will drop an entire row if there are any NaNs in the row. If we expect the user to have missing data, then that seems like not a great behavior (if they pass enough columns, it seems likely that they'll drop a large percentage of their input data). If we don't expect them to have missing data, then we're just incurring extra cost for no reason.
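A small sketch of the row-dropping behavior described here, using made-up return values (the column names and numbers are purely illustrative):

```python
import numpy as np
import pandas as pd

# Three hypothetical assets; B and C each have one missing observation,
# on different dates.
returns = pd.DataFrame({
    "A": [0.01, 0.02, -0.01, 0.03],
    "B": [0.00, np.nan, 0.01, 0.02],
    "C": [0.02, 0.01, np.nan, 0.01],
})

# DataFrame.dropna() defaults to axis=0, how="any": a row is removed
# if any column in that row is NaN.
cleaned = returns.dropna()

rows_before = len(returns)
rows_after = len(cleaned)
```

Here two missing values cost half of the rows, including two perfectly valid observations for asset A. The more columns are passed, the more likely it becomes that any given row contains at least one NaN.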

If we want to include a simple returns function in empyrical, I would probably put it in stats and write it as something like:

def simple_returns(prices):
    """Compute simple returns from a timeseries of prices.

    <rest of the docstring>
    """
    if isinstance(prices, (pd.DataFrame, pd.Series)):
        return prices.pct_change().iloc[1:]
    else:
        # Assume ndarray
        out = np.diff(prices, axis=0)
        np.divide(out, prices[:-1], out=out)
        return out

This also needs a test if we decide to keep it.
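As a rough idea of what exercising the suggested function might look like, the sketch below restates it in runnable form and applies it to a tiny made-up price series. The docstring and sample prices are assumptions, and the ndarray branch is assumed to receive floating-point input, since the in-place divide would fail on an integer array.

```python
import numpy as np
import pandas as pd


def simple_returns(prices):
    """Compute simple returns from a timeseries of prices (sketch)."""
    if isinstance(prices, (pd.DataFrame, pd.Series)):
        # Drop only the first row, which has no previous price.
        return prices.pct_change().iloc[1:]
    # Assume a float ndarray. np.diff returns a new array, so the
    # in-place divide below does not modify the caller's input.
    out = np.diff(prices, axis=0)
    np.divide(out, prices[:-1], out=out)
    return out


# Made-up prices: up 10%, then down 10%.
price_array = np.array([100.0, 110.0, 99.0])
price_series = pd.Series(price_array)

array_returns = simple_returns(price_array)
series_returns = simple_returns(price_series)
```

Both branches return one fewer observation than the input, and the pandas branch keeps the original index labels of the remaining rows rather than re-localizing or resetting them.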

This will sound bad but I was calling dropna because that was the way it was done before, and I assumed there was a reason for that 😦 I believe that dropna was only used to drop the first row, which would be NaN. Given that, using .iloc[1:] is more watertight.

I agree that this function belongs in stats, though. The only question is whether or not we want empyrical to tz-localize anything. Philosophically, I feel as if that should be the responsibility of downstream packages like alphalens and pyfolio, since empyrical is only responsible for low-level financial calculations, and is not meant to be user-facing at all. This relates to the input sanitization PR for pyfolio.

@twiecki thoughts? I've gone ahead and implemented these changes.

Yes, .iloc[1:] is a bit clearer on the intent.

And yeah, I'd be happy if we just completely did away with the whole tz-aware thing. Since we don't provide any data in the future ourselves, we can probably just ignore it and the user has to ensure that it matches up.

ssanderson · 2018-06-06T15:40:23Z

empyrical/utils.py

+         "version."
+         "\n"
+         "Please use empyrical in the Quantopian Research environment, or "
+         "supply your own data. See README.md for more details.")


A user reading this error message isn't necessarily going to know what README.md you mean. I'd probably make this a link to the GitHub README, and/or add a section in the docs.

ssanderson · 2018-06-06T15:41:55Z

empyrical/utils.py

+         "in empyrical has been deprecated and will be removed in a future "
+         "version."
+         "\n"
+         "Please use empyrical in the Quantopian Research environment, or "


My inclination would be to drop the reference to Quantopian Research here, especially if we include a reference to it in the README.

ssanderson · 2018-06-06T15:44:58Z

empyrical/utils.py

@@ -205,14 +224,39 @@ def ensure_directory(path):
            raise


+def compute_returns(prices):
+    """
+    Computes correctly-indexed returns from prices.


As a reader of this function, I'm not sure I would know what "correctly-indexed" means in this context. Is there something more specific we can say here?

This used to mean 'tz-localized', but with the other comments below I don't think we want to do that after all.

eigenfoo · 2018-06-07T16:22:43Z

Note to self: still need to write a test for the simple_returns function.

eigenfoo · 2018-06-08T19:46:13Z

Not quite sure why tests are failing. The checks fail on my local machine as well, but using pdb doesn't reproduce the error.

Any pointers @ssanderson @twiecki?

eigenfoo · 2018-06-13T14:54:23Z

@twiecki fixed unit tests. Ready for another pass, or merging.

ssanderson · 2018-06-13T15:12:34Z

README.md

+- `empyrical.utils.cache_dir`
+- `empyrical.utils.data_path`
+- `empyrical.utils.ensure_directory`
+- `empyrical.utils._1_bday_ago`


This isn't deprecated anymore.

ssanderson · 2018-06-13T15:17:55Z

@eigenfoo I had one more comment on the README. Otherwise this LGTM.

twiecki · 2018-06-14T14:43:12Z

Thanks @eigenfoo!

eigenfoo added 5 commits May 18, 2018 13:57

DEP: Deprecate all functions using pandas-datareader

d963764

DOC: Update README with deprecation documentation

3ee34b0

STY: Markdown style

c44a0b0

STY: Markdown style again

8ed19fd

REV: revert previous commit

20b2a54

eigenfoo added 4 commits May 18, 2018 14:45

STY: typo

24b9fe0

STY: consistent naming convention

983e4c9

DEP: also deprecate any cacheing of data

ee42d05

DEP: forgot to deprecate additional funcs

c6d2f7e

eigenfoo mentioned this pull request May 18, 2018

DEP: Deprecate all data reading functionality via pandas-datareader; ensure independence from SPY and FF quantopian/pyfolio#536

Merged

REV: get_utc_timestamp should not be deprecated

6c57e1d

ENH: add function to compute returns from prices

43a5e51

twiecki assigned richafrank May 22, 2018

eigenfoo commented Jun 1, 2018

BUG: wrap import in try-except

f8f1682

ssanderson reviewed Jun 6, 2018

eigenfoo added 4 commits June 6, 2018 13:45

MAINT: update deprecation warning

b56456c

MAINT: move simple_returns func to stats module

bc17ec6

MAINT: don't raise deprecation warning for _1_bday_ago

970f9b0

DOC: remove suggestions

f45611f

eigenfoo added 3 commits June 7, 2018 17:40

TST: added test for simple_returns

7e802a2

MAINT: add simple_returns to init

a0b0fa4

TST: fixed simple_returns test

48829b6

STY: use size, not shape

8de4201

TST: tests passing

24b9b0f

ssanderson reviewed Jun 13, 2018

DOC: 1_bday_ago no longer deprecated

c307031

twiecki merged commit 30a5c4c into quantopian:master Jun 14, 2018

eigenfoo mentioned this pull request Dec 12, 2018

KeyError: 'date' #106

Open

