Using joblib to parallelise certain tasks. Feedback before PR #671
Replies: 2 comments
---
I've played with this sort of thing before and never got it to work properly, so I'm a big fan of the concept, especially for obvious bottlenecks like these which can be run in parallel. To specifics:

- I generally don't like new dependencies, I'll be honest; the `tqdm_joblib` helper should be in ...
- The specific code that actually does stuff ... is not super attractive. I'd prefer something like ...
- Why `batch_size = 100`? I'd expect a comment here.
- Make sure there's an option to disable or reduce the process count explicitly, else you risk memory exhaustion for folks with many cores and little RAM.
- YES: My big concern with this sort of thing is that if it doesn't work, then you're sort of buggered (which is why I've never gone down this route). So in practice I've liked the idea of having some kind of config switch to turn parallel and non-parallel behaviour on and off. The problem with that is you either pass the config flag down through 30 different layers of abstraction, or you pull in a default_config and check the flag in there. The former is horrible. The problem with the latter is that you can't override the behaviour by changing the system backtest file; it's very unintuitive and breaks abstraction. I used to have this pattern a lot and I've successfully weeded them all out, so I don't really want to go backwards. The other alternative is to have a flag in the .py file itself, C pre-processor style (a sketch of what I mean follows below). Ideally this would be in ...

Thoughts?
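For illustration, here is a minimal sketch of that "flag in the .py file" idea, combined with an explicit worker cap and a commented `batch_size`. `USE_PARALLEL`, `MAX_PROCESSES` and `calc_all` are invented names for the sake of the example, not anything in pysystemtrade:

```python
# Hypothetical sketch: a module-level flag, C pre-processor style, plus an
# explicit worker cap. All names here are invented for illustration.
from joblib import Parallel, delayed

USE_PARALLEL = True  # flip to False to fall back to the plain serial loop
MAX_PROCESSES = 4    # hard cap on workers: many cores + little RAM = trouble

def calc_all(func, items):
    if not USE_PARALLEL:
        return [func(item) for item in items]
    return Parallel(
        n_jobs=MAX_PROCESSES,
        # batch_size: how many tasks each worker receives per dispatch.
        # Larger batches amortise inter-process overhead when the
        # individual tasks are small.
        batch_size=100,
    )(delayed(func)(item) for item in items)
```

The serial branch doubles as the escape hatch: if the parallel path misbehaves, a one-line edit gets the old behaviour back without touching any config plumbing.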
---
Interesting! I actually implemented the parallelization of risk and forecast calculation some time ago. The patch includes a few hacky modifications, and I wasn't sure whether Rob wanted these in the main branch, so I didn't open a PR. The main annoyance for parallel forecast calculation was having to modify dataBlob, because each process needs a separate connection to Mongo. You have to explicitly call precalculate_forecasts() before running the simulation, because I didn't figure out any other nice way to create the parallel jobs ad hoc. precalculate_forecasts() ends up storing results in the cache, which the simulation then uses later. At least in my testing I haven't run into any oddities, but I don't use it in the production system (only for experimenting and testing different rules in simulations).
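For readers who haven't seen the pattern: a rough sketch of what's being described might look like the following. This is not the actual patch; `calc_forecast` is a hypothetical placeholder for whatever does the real forecast work, and only the `dataBlob` import is genuine pysystemtrade:

```python
# Illustrative sketch only, not the author's patch. calc_forecast is a
# hypothetical placeholder; the real patch modifies dataBlob itself.
from joblib import Parallel, delayed

def _forecast_one(calc_forecast, instrument_code, rule_name):
    # Each worker builds its own dataBlob, because a Mongo connection
    # can't safely be shared across forked processes.
    from sysdata.data_blob import dataBlob
    data = dataBlob()
    key = (instrument_code, rule_name)
    return key, calc_forecast(data, instrument_code, rule_name)

def precalculate_forecasts(calc_forecast, instrument_codes, rule_names, n_jobs=4):
    # Run the whole grid of (instrument, rule) jobs up front...
    results = Parallel(n_jobs=n_jobs)(
        delayed(_forecast_one)(calc_forecast, code, rule)
        for code in instrument_codes
        for rule in rule_names
    )
    # ...and hand back a cache the simulation reads instead of recomputing.
    return dict(results)
```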
---
Hiya All,

I whipped up a quick implementation of `joblib`-based parallelisation (do we use the Queen's English around here or should that have a 'z'?) last night, on `calc_portfolio_risk_series` as a start, as that was a pretty obviously expensive operation. The hope is to put together a reasonable approach to parallelisation that can then be applied across other operations. I'm using `joblib` because the expectation is that for the most part folks are going to be running on a single node, but you can swap out the backend for `dask`, which would allow cluster execution too. I'm a long-time R user but Python's still new to me... so be gentle... Would love some feedback before I send it as a PR. It's pretty easy to play with yourself, just grab `portfolio_risk.py` from my fork: https://github.com/cauldnz/pysystemtrade/blob/chauld/wip/parallel-risk/sysquant/portfolio_risk.py

On my machine (Lenovo P1 w/ i9-10885H) the wall time of the 'Calculating portfolio risk' loop goes from ~100s down to ~30s running `simple_system.py`. Tested on Ubuntu 22.04 running under WSL.

Would love some feedback on:

- Swapping `progressBar` for `tqdm`, really just because there was some good-looking code to borrow, and all signs point to it having lower overhead as well. I think @bug-or-feature did the `progressBar` implementation, so feedback there would be great. (A sketch of the `tqdm`/`joblib` wiring is below.)
- The new dependencies: `joblib` and `tqdm`. Feels reasonable and minimal?

Todos still: ...
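For anyone who wants to try the progress-bar wiring without reading the fork, the widely circulated community recipe for marrying `tqdm` to `joblib` looks roughly like this. It's a sketch of the general pattern, not necessarily the exact code in `portfolio_risk.py`:

```python
# Community recipe: patch joblib's batch-completion callback so each
# completed batch ticks a tqdm progress bar.
import contextlib
import joblib
from joblib import Parallel, delayed
from tqdm import tqdm

@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    """Temporarily route joblib batch completions into a tqdm bar."""
    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        # Restore the original callback and close the bar.
        joblib.parallel.BatchCompletionCallBack = old_callback
        tqdm_object.close()

# Example usage with a toy workload:
with tqdm_joblib(tqdm(desc="Calculating portfolio risk", total=100)):
    results = Parallel(n_jobs=-1)(delayed(pow)(i, 2) for i in range(100))
```

In principle, moving to cluster execution is then a matter of starting a `dask.distributed` `Client` and wrapping the `Parallel(...)` call in `with joblib.parallel_backend("dask"):`, with no change to the task code itself.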