Using joblib to parallelise certain tasks. Feedback before PR #671
Replies: 2 comments
---
I've played with this sort of thing before and never got it to work properly, so I'm a big fan of the concept, especially for obvious bottlenecks like these which can be run in parallel. To specifics:

- I generally don't like new dependencies, I'll be honest; the `tqdm_joblib` helper should be in ...
- The specific code that actually does stuff ... is not super attractive. I'd prefer something like ...
- Why `batch_size = 100`? I'd expect a comment here.
- Make sure there's an option to disable or reduce the process count explicitly, else you risk memory exhaustion for folks with many cores and little RAM.
- YES: My big concern with this sort of thing is that if it doesn't work, then you're sort of buggered (which is why I've never gone down this route). So in practice I've liked the idea of having some kind of config switch to turn parallel and non-parallel behaviour on and off. The problem with that is you either pass the config flag down through 30 different layers of abstraction, or you pull in a default_config and check the flag in there. The former is horrible. The problem with the latter is that you can't override the behaviour by changing the system backtest file; it's very unintuitive and breaks abstraction. I used to have this pattern a lot and I've successfully weeded them all out, so I don't really want to go backwards. The other alternative is to have a flag in the .py file itself, C pre-processor style (a sketch of what I mean follows below). Ideally this would be in ...

Thoughts?
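For illustration, here is a minimal sketch of that "flag in the .py file" idea, combined with an explicit worker cap and a commented `batch_size`. `USE_PARALLEL`, `MAX_PROCESSES` and `calc_all` are invented names for the sake of the example, not anything in pysystemtrade:

```python
# Hypothetical sketch: a module-level flag, C pre-processor style, plus an
# explicit worker cap. All names here are invented for illustration.
from joblib import Parallel, delayed

USE_PARALLEL = True  # flip to False to fall back to the plain serial loop
MAX_PROCESSES = 4    # hard cap on workers: many cores + little RAM = trouble

def calc_all(func, items):
    if not USE_PARALLEL:
        return [func(item) for item in items]
    return Parallel(
        n_jobs=MAX_PROCESSES,
        # batch_size: how many tasks each worker receives per dispatch.
        # Larger batches amortise inter-process overhead when the
        # individual tasks are small.
        batch_size=100,
    )(delayed(func)(item) for item in items)
```

The serial branch doubles as the escape hatch: if the parallel path misbehaves, a one-line edit gets the old behaviour back without touching any config plumbing.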
---
Interesting! I actually implemented the parallelization of risk and forecast calculation some time ago. The patch includes a few hacky modifications, and I wasn't sure whether Rob wanted these in the main branch, so I didn't open a PR. The main annoyance for parallel forecast calculation was having to modify dataBlob, because each process needs a separate connection to Mongo. You have to explicitly call precalculate_forecasts() before running the simulation, because I didn't figure out any other nice way to create the parallel jobs ad hoc. precalculate_forecasts() ends up storing results in the cache, which the simulation then uses later. At least in my testing I haven't run into any oddities, but I don't use it in the production system (only for experimenting and testing different rules in simulations).
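For readers who haven't seen the pattern: a rough sketch of what's being described might look like the following. This is not the actual patch; `calc_forecast` is a hypothetical placeholder for whatever does the real forecast work, and only the `dataBlob` import is genuine pysystemtrade:

```python
# Illustrative sketch only, not the author's patch. calc_forecast is a
# hypothetical placeholder; the real patch modifies dataBlob itself.
from joblib import Parallel, delayed

def _forecast_one(calc_forecast, instrument_code, rule_name):
    # Each worker builds its own dataBlob, because a Mongo connection
    # can't safely be shared across forked processes.
    from sysdata.data_blob import dataBlob
    data = dataBlob()
    key = (instrument_code, rule_name)
    return key, calc_forecast(data, instrument_code, rule_name)

def precalculate_forecasts(calc_forecast, instrument_codes, rule_names, n_jobs=4):
    # Run the whole grid of (instrument, rule) jobs up front...
    results = Parallel(n_jobs=n_jobs)(
        delayed(_forecast_one)(calc_forecast, code, rule)
        for code in instrument_codes
        for rule in rule_names
    )
    # ...and hand back a cache the simulation reads instead of recomputing.
    return dict(results)
```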
---
Hiya All,

I whipped up a quick implementation of `joblib`-based parallelisation (do we use the Queen's English around here or should that have a 'z'?) last night, on `calc_portfolio_risk_series` as a start, as that was a pretty obviously expensive operation. The hope is to put together a reasonable approach to parallelisation that can then be applied across other operations. I'm using `joblib` because the expectation is that for the most part folks are going to be running on a single node, but you can swap out the backend for `dask`, which would allow cluster execution too. I'm a long-time R user but Python's still new to me... so be gentle... Would love some feedback before I send it as a PR. It's pretty easy to play with yourself, just grab `portfolio_risk.py` from my fork: https://github.com/cauldnz/pysystemtrade/blob/chauld/wip/parallel-risk/sysquant/portfolio_risk.py

On my machine (Lenovo P1 w/ i9-10885H) the wall time of the 'Calculating portfolio risk' loop goes from ~100s down to ~30s running `simple_system.py`. Tested on Ubuntu 22.04 running under WSL.

Would love some feedback on:

- Swapping `progressBar` for `tqdm`, really just because there was some good-looking code to borrow, and all signs point to it having lower overhead as well. I think @bug-or-feature did the `progressBar` implementation, so feedback there would be great. (A sketch of the `tqdm`/`joblib` wiring is below.)
- The new dependencies: `joblib` and `tqdm`. Feels reasonable and minimal?

Todos still: ...
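For anyone who wants to try the progress-bar wiring without reading the fork, the widely circulated community recipe for marrying `tqdm` to `joblib` looks roughly like this. It's a sketch of the general pattern, not necessarily the exact code in `portfolio_risk.py`:

```python
# Community recipe: patch joblib's batch-completion callback so each
# completed batch ticks a tqdm progress bar.
import contextlib
import joblib
from joblib import Parallel, delayed
from tqdm import tqdm

@contextlib.contextmanager
def tqdm_joblib(tqdm_object):
    """Temporarily route joblib batch completions into a tqdm bar."""
    class TqdmBatchCompletionCallback(joblib.parallel.BatchCompletionCallBack):
        def __call__(self, *args, **kwargs):
            tqdm_object.update(n=self.batch_size)
            return super().__call__(*args, **kwargs)

    old_callback = joblib.parallel.BatchCompletionCallBack
    joblib.parallel.BatchCompletionCallBack = TqdmBatchCompletionCallback
    try:
        yield tqdm_object
    finally:
        # Restore the original callback and close the bar.
        joblib.parallel.BatchCompletionCallBack = old_callback
        tqdm_object.close()

# Example usage with a toy workload:
with tqdm_joblib(tqdm(desc="Calculating portfolio risk", total=100)):
    results = Parallel(n_jobs=-1)(delayed(pow)(i, 2) for i in range(100))
```

In principle, moving to cluster execution is then a matter of starting a `dask.distributed` `Client` and wrapping the `Parallel(...)` call in `with joblib.parallel_backend("dask"):`, with no change to the task code itself.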